Module 15 / Data processing

Data processing

Before data analysis can begin, data may need to be prepared or processed to accommodate the statistical software or methods being used.

Categorization of variables

For data on biological specimens and food samples, new variables may need to be created to interpret the results of analysis. For example, haemoglobin concentration needs to be recoded into a categorical variable with two levels: anaemic and non-anaemic. Haemoglobin values can be further categorized as no anaemia, mild anaemia, moderate anaemia or severe anaemia, according to WHO guidance. Additional information on cutoffs, and on the need to adjust for factors such as altitude, smoking and inflammation, can be found in Module 3: Biomarker selection and specimen handling.
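As a minimal sketch of this recoding, the Python functions below derive both the two-level and the four-level variable from a haemoglobin value. They assume the WHO (2011) cutoffs for children aged 6–59 months (anaemia below 110 g/L, moderate anaemia below 100 g/L, severe anaemia below 70 g/L) and values that have already been adjusted for altitude and smoking where required. Other age, sex and physiological groups use different cutoffs, so the thresholds and the function names (is_anaemic, anaemia_severity) are illustrative only.

```python
from typing import Optional

# Assumed cutoffs in g/L: WHO (2011) values for children aged 6-59 months.
# Other population groups need different thresholds (see Module 3).
ANAEMIA_CUTOFF = 110.0   # below this value: anaemic
MODERATE_CUTOFF = 100.0  # below this value: at least moderate anaemia
SEVERE_CUTOFF = 70.0     # below this value: severe anaemia


def is_anaemic(hb_g_per_l: Optional[float]) -> Optional[bool]:
    """Two-level variable: True (anaemic), False (non-anaemic), None if missing."""
    if hb_g_per_l is None:
        return None
    return hb_g_per_l < ANAEMIA_CUTOFF


def anaemia_severity(hb_g_per_l: Optional[float]) -> Optional[str]:
    """Four-level variable: 'none', 'mild', 'moderate' or 'severe'; None if missing."""
    if hb_g_per_l is None:
        return None
    if hb_g_per_l < SEVERE_CUTOFF:
        return "severe"
    if hb_g_per_l < MODERATE_CUTOFF:
        return "moderate"
    if hb_g_per_l < ANAEMIA_CUTOFF:
        return "mild"
    return "none"


# Hypothetical adjusted haemoglobin values (g/L) for five children.
hb_values = [125.0, 104.0, 95.0, 65.0, None]
print([anaemia_severity(hb) for hb in hb_values])
# -> ['none', 'mild', 'moderate', 'severe', None]
```

Missing results are kept as None rather than being coded as non-anaemic, so that they can be excluded from the denominator when prevalence is calculated.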

Calculating anthropometric Z-scores

The recommended standard approach for analysing anthropometric data uses the WHO Child Growth Standards as the reference. Analyses can be done with standard software, such as WHO Anthro, or with macros for SAS, SPSS, Stata and R that can be downloaded from the WHO website and applied directly to the data.

WHO recently developed an online tool for anthropometric data analysis. This tool updates the Anthro methodology to provide more accurate estimates of standard errors and confidence intervals for prevalence and mean Z-scores. The WHO Anthro Survey Analyser, built on R and the R Shiny package, provides interactive graphics for data quality assessment. It also offers a summary report template with key outputs, such as Z-score distribution graphs by various grouping factors, and nutritional status tables with the corresponding prevalence and Z-score statistics. The software is currently available in both online and offline versions.
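The WHO Child Growth Standards describe each indicator with age- and sex-specific L (Box-Cox power), M (median) and S (coefficient of variation) parameters, and a Z-score for a measurement X is Z = ((X/M)^L − 1)/(L·S), or ln(X/M)/S when L = 0. The Python sketch below illustrates only that core formula, the WHO flag for implausible height-for-age Z-scores (below −6 or above +6), and an unweighted stunting prevalence (HAZ below −2). The L, M and S values are placeholders, not values from the WHO tables, and all children are assumed to share one age-sex group. The sketch also omits the WHO restricted calculation for weight-based Z-scores beyond ±3 and any survey weighting or clustering, which the Anthro software, the WHO macros and the Anthro Survey Analyser handle; the function names are hypothetical.

```python
import math
from statistics import mean
from typing import List


def lms_zscore(x: float, l: float, m: float, s: float) -> float:
    """Z-score of measurement x under the LMS method (no restriction beyond +/-3)."""
    if l == 0:
        return math.log(x / m) / s
    return ((x / m) ** l - 1) / (l * s)


def flag_haz(z: float) -> bool:
    """WHO flag for implausible height-for-age Z-scores: below -6 or above +6."""
    return z < -6 or z > 6


def summarise_haz(zscores: List[float]) -> dict:
    """Unweighted summary of non-flagged HAZ values (ignores the survey design)."""
    valid = [z for z in zscores if not flag_haz(z)]
    stunted = sum(1 for z in valid if z < -2)
    return {
        "n_valid": len(valid),
        "n_flagged": len(zscores) - len(valid),
        "stunting_prevalence": stunted / len(valid) if valid else None,
        "mean_haz": mean(valid) if valid else None,
    }


# Placeholder LMS parameters for one hypothetical age-sex group (NOT WHO values);
# for length/height the WHO standards use L = 1.
L, M, S = 1.0, 87.0, 0.04
heights_cm = [87.0, 80.0, 83.5, 90.0, 50.0]  # hypothetical measurements
haz = [lms_zscore(h, L, M, S) for h in heights_cm]
summary = summarise_haz(haz)
print(summary["n_flagged"], summary["stunting_prevalence"])  # -> 1 0.25
```

Here the 50.0 cm value yields a Z-score of about −10.6 and is flagged, so it is excluded before the prevalence and mean Z-score are computed; this is the kind of data quality check the Survey Analyser presents graphically.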