Determining the appropriate number of clusters and the number of units per cluster
There are two main issues to consider when applying sample size calculations to fieldwork plans for cluster surveys: how many clusters are needed, and how many units (individuals or households) are needed per cluster. The two values are interrelated, meaning that decisions about one affect the value of the other.
As noted in Module 5: Sample size, the number of units per cluster affects the DEFF and therefore the required overall sample size. The more clusters it is possible to include, the fewer units are needed per cluster and the more diverse the sample will be. As a general rule, up to around 40 clusters per stratum is a good estimate, with the aim of at least 10 observations per cluster. A greater number of clusters and fewer units per cluster will decrease the DEFF and either improve precision for the same sample size, or maintain precision with a smaller sample size. In the example given in Module 5: Sample size, for a sample of 1200 households, higher precision can be achieved by selecting 60 clusters of 20 households each as opposed to 40 clusters of 30 households each. On the other hand, visiting 60 clusters rather than 40 increases the cost of the survey. This underscores the need to weigh the cost of collecting data and specimens against programmatic needs for a specific level of precision.
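The trade-off between the two designs can be illustrated with a rough calculation. The sketch below uses a commonly cited approximation, DEFF = 1 + (m - 1) × ICC, where m is the number of units per cluster and ICC is the intracluster correlation coefficient. Both the formula and the ICC value of 0.05 are assumptions made here for illustration; Module 5: Sample size may define or estimate DEFF differently, and the function names are hypothetical.

```python
def design_effect(units_per_cluster: float, icc: float) -> float:
    """Approximate DEFF for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (units_per_cluster - 1.0) * icc


def effective_sample_size(n_clusters: int, units_per_cluster: int, icc: float) -> float:
    """Total sample size divided by the approximate DEFF."""
    total = n_clusters * units_per_cluster
    return total / design_effect(units_per_cluster, icc)


ASSUMED_ICC = 0.05  # illustrative assumption, not a value from the manual

# Two ways of allocating the same 1200 households
for n_clusters, m in [(40, 30), (60, 20)]:
    deff = design_effect(m, ASSUMED_ICC)
    ess = effective_sample_size(n_clusters, m, ASSUMED_ICC)
    print(f"{n_clusters} clusters x {m} households: DEFF = {deff:.2f}, "
          f"effective sample size = {ess:.0f}")
```

Under these assumptions, the 60-cluster design has the lower DEFF and the larger effective sample size, which is the pattern described above. The size of the gain depends entirely on the ICC of the indicator being measured.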
General guidelines for deciding on the number of clusters and number of units per cluster are provided in the section on DEFF in Module 5: Sample size. Other factors that influence this decision are geography and time per cluster:
- Geography: In larger countries, the cost and time required for teams to move from one cluster to another can be substantial. In this case, it might be better to select a smaller number of clusters, but never fewer than 25. If the country is very small, or if the survey is being conducted in a single region or province, having a larger number of clusters is a reasonable way to improve precision.
- Time per cluster: The number of households that can be visited, and the amount of data collection that can be completed, in a single day varies between surveys. In some surveys, the questionnaire and specimen collection are brief, whereas in others they are much more time-consuming. In some surveys, specimen collection and interviews are conducted in the household, while in others, survey participants may be asked to go to a central laboratory set up in the cluster. Depending on the survey design, the size of the team, the travel distance between households, and the complexity of the survey, a single team can typically complete visits to five to ten households in one day.
The logistics required for cold chain management are also crucial to consider. In some harder-to-reach clusters with limited access to electricity, it may be necessary to minimize the number of days spent in the cluster if there are specimens that require processing and freezing in the field. Portable freezers and centrifuges that can be plugged into a car or portable generator are available, but the less time spent in areas without a direct power source, the more likely it is that the cold chain can be maintained. There are several ways to minimize the time spent in one cluster: increasing the number of enumerators per team, increasing the number of teams per cluster, and increasing the total number of clusters so that there are fewer households per cluster.
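The effect of these options on time spent in a cluster can be estimated roughly from the five to ten households per team per day mentioned earlier. The sketch below is a planning aid only: the function name and parameters are hypothetical, and it assumes that teams working in the same cluster split the households evenly and that travel time to the cluster is ignored.

```python
import math


def days_in_cluster(households_per_cluster: int,
                    households_per_team_per_day: int,
                    teams_per_cluster: int = 1) -> int:
    """Whole days needed to visit every selected household in one cluster.

    Assumes teams share the workload evenly (a simplifying assumption).
    """
    daily_capacity = households_per_team_per_day * teams_per_cluster
    return math.ceil(households_per_cluster / daily_capacity)


# Compare 30 vs 20 households per cluster at the low and high ends
# of the five-to-ten households per day range, with one or two teams.
for households in (30, 20):
    for rate in (5, 10):
        for teams in (1, 2):
            days = days_in_cluster(households, rate, teams)
            print(f"{households} households, {rate}/day, {teams} team(s): {days} day(s)")
```

For example, at five households per day, a single team needs six days for a cluster of 30 households but four days for a cluster of 20; adding a second team halves both figures.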
In some circumstances, the collection of specimens for a specific biomarker may be complicated and time-consuming. If this is the case, it may be possible to collect data only from a random subsample of the population group of interest within each stratum, and focus on generating a reliable national estimate. The modified relative dose response (MRDR) test for assessing vitamin A deficiency is an example of a biomarker for which a random subsample may be appropriate. Specimen collection for the MRDR test requires the survey participant to avoid vitamin A rich foods for at least two hours before the initial blood draw, consume a dose of vitamin A2 mixed with oil, continue to avoid vitamin A rich foods, and then have a second blood specimen drawn four to six hours later. In addition, laboratory analysis of specimens for the MRDR test is costly. For reasons of feasibility and cost, it may be sufficient to collect specimens for the MRDR test from a single household per cluster.
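One way to draw such a subsample is to pick one household at random from those already selected in each cluster. The sketch below is illustrative only: the cluster and household identifiers are made up, the function name is hypothetical, and it assumes every listed household contains an eligible participant. A fixed seed is used so that the selection can be reproduced and documented.

```python
import random


def pick_mrdr_households(households_by_cluster, seed=None):
    """Randomly choose one household per cluster for MRDR specimen collection.

    households_by_cluster maps a cluster ID to the list of household IDs
    already selected for the main survey (each list must be non-empty).
    """
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return {
        cluster: rng.choice(households)
        for cluster, households in sorted(households_by_cluster.items())
    }


# Hypothetical example: two clusters with three selected households each
selected = {
    "cluster_01": ["HH-101", "HH-102", "HH-103"],
    "cluster_02": ["HH-201", "HH-202", "HH-203"],
}
mrdr_subsample = pick_mrdr_households(selected, seed=2024)
print(mrdr_subsample)
```

Selecting the MRDR household before fieldwork begins, rather than leaving the choice to teams in the field, helps keep the subsample random.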