NWU Institutional Repository

Missing data imputation via the expectation-maximization algorithm can improve principal component analysis aimed at deriving biomarker profiles and dietary patterns

dc.contributor.author: Malan, Linda
dc.contributor.author: Smuts, Cornelius M.
dc.contributor.author: Baumgartner, Jeannine
dc.contributor.author: Ricci, Cristian
dc.contributor.researchID: 10091130 - Malan, Linda
dc.contributor.researchID: 24054909 - Baumgartner, Jeannine
dc.contributor.researchID: 29790514 - Ricci, Cristian
dc.contributor.researchID: 20924445 - Smuts, Cornelius Mattheus
dc.date.accessioned: 2020-03-18T12:36:39Z
dc.date.available: 2020-03-18T12:36:39Z
dc.date.issued: 2020
dc.description.abstract: Principal component analysis (PCA) is a popular statistical tool. However, despite numerous advantages, the good practice of imputing missing data before PCA is not common. In the present work, we evaluated the hypothesis that the expectation-maximization (EM) algorithm for missing data imputation is a reliable and advantageous procedure when using PCA to derive biomarker profiles and dietary patterns. To this aim, we used numerical simulations aimed to mimic real data commonly observed in nutritional research. Finally, we showed the advantages and pitfalls of the EM algorithm for missing data imputation applied to plasma fatty acid concentrations and nutrient intakes from real data sets deriving from the US National Health and Nutrition Examination Survey. PCA applied to simulated data having missing values resulted in biased eigenvalues with respect to the original data set without missing values. The bias between the eigenvalues from the original set of data and from the data set with missing values increased with number of missing values and appeared as independent with respect to the correlation structure among variables. On the other hand, when data were imputed, the mean of the eigenvalues over the 10 missing imputation runs overlapped with the ones derived from the PCA applied to the original data set. These results were confirmed when real data sets from the National Health and Nutrition Examination Survey were analyzed. We accept the hypothesis that the EM algorithm for missing data imputation applied before PCA aimed to derive biochemical profiles and dietary patterns is an effective technique especially for relatively small sample sizes. (en_US)
dc.identifier.citation: Malan, L. et al. 2020. Missing data imputation via the expectation-maximization algorithm can improve principal component analysis aimed at deriving biomarker profiles and dietary patterns. Nutrition research, 75:67-76. [https://doi.org/10.1016/j.nutres.2020.01.001] (en_US)
dc.identifier.issn: 0271-5317
dc.identifier.uri: http://hdl.handle.net/10394/34410
dc.identifier.uri: https://www.sciencedirect.com/science/article/pii/S0271531719309583
dc.identifier.uri: https://doi.org/10.1016/j.nutres.2020.01.001
dc.language.iso: en (en_US)
dc.publisher: Elsevier (en_US)
dc.subject: Missing data imputation (en_US)
dc.subject: EM algorithm (en_US)
dc.subject: Principal component analysis (en_US)
dc.subject: Dietary patterns (en_US)
dc.subject: Biochemical profiles (en_US)
dc.title: Missing data imputation via the expectation-maximization algorithm can improve principal component analysis aimed at deriving biomarker profiles and dietary patterns (en_US)
dc.type: Article (en_US)
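
The abstract describes imputing missing values with the EM algorithm before running PCA and then comparing eigenvalues against those from the complete data. The following is a minimal sketch of that general idea, not the authors' procedure: it assumes a multivariate normal model, values missing completely at random, a single EM fit rather than the 10 imputation runs mentioned in the abstract, and PCA on the correlation matrix. The function names `em_impute` and `pca_eigenvalues`, the simulated covariance, the sample size, and the 15% missingness rate are all hypothetical choices made for illustration.

```python
import numpy as np


def em_impute(X, n_iter=100, tol=1e-6):
    """EM for a multivariate normal with missing entries (NaN).

    Returns the data with missing entries replaced by their conditional
    means, plus the estimated mean vector and covariance matrix.
    Hypothetical helper, written for illustration only.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)

    # Initialise with column means and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    Xf = np.where(miss, mu, X)
    sigma = np.cov(Xf, rowvar=False, bias=True)

    for _ in range(n_iter):
        C = np.zeros((p, p))  # accumulated conditional covariances
        # E-step: conditional mean and covariance of the missing part of each row.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # nothing observed in this row
                Xf[i] = mu
                C += sigma
                continue
            S_mo = sigma[np.ix_(m, o)]
            B = S_mo @ np.linalg.pinv(sigma[np.ix_(o, o)])
            Xf[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ S_mo.T

        # M-step: update the mean and covariance from the expected statistics.
        mu_new = Xf.mean(axis=0)
        d = Xf - mu_new
        sigma_new = (d.T @ d + C) / n

        converged = (np.max(np.abs(mu_new - mu)) < tol
                     and np.max(np.abs(sigma_new - sigma)) < tol)
        mu, sigma = mu_new, sigma_new
        if converged:
            break

    return Xf, mu, sigma


def pca_eigenvalues(cov):
    """Eigenvalues of the correlation matrix implied by `cov`, largest first."""
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


# Simulated example (all settings are assumptions, not taken from the paper).
rng = np.random.default_rng(0)
p = 5
true_cov = 0.6 * np.ones((p, p)) + 0.4 * np.eye(p)
X_full = rng.multivariate_normal(np.zeros(p), true_cov, size=200)

X_miss = X_full.copy()
X_miss[rng.random(X_full.shape) < 0.15] = np.nan  # values removed at random

X_imp, mu_hat, sigma_hat = em_impute(X_miss)

eig_full = pca_eigenvalues(np.cov(X_full, rowvar=False))
eig_em = pca_eigenvalues(sigma_hat)
print("complete data:", np.round(eig_full, 3))
print("EM estimate:  ", np.round(eig_em, 3))
```

The eigenvalues here come from the EM covariance estimate rather than from the covariance of the filled-in data, because conditional-mean imputation alone tends to understate variances; this is a design choice of the sketch and may differ from the published method.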

Files

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission