Evaluation of Multiple Imputation with Large Proportions of Missing Data: How Much Is Too Much?

Jin Hyuk Lee; J. Charles Huber Jr.

doi:10.18502/ijph.v50i7.6626

Abstract

Background: Multiple imputation (MI) is known to be an effective method for handling missing data in public health research. However, it is unclear whether the method remains effective when a variable has a high percentage of missing observations.

Methods: Using data from the “Predictive Study of Coronary Heart Disease,” this study examined the effectiveness of multiple imputation when 20% to 80% of observations were missing. Performance was measured by the absolute bias (|bias|) and root mean square error (RMSE) of MI under missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR) assumptions.
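The evaluation design described above can be illustrated with a minimal sketch. It is not the authors' code: the synthetic variables `x` and `z`, the rank-based selection rule, and the helper names `missing_mask` and `bias_and_rmse` are all assumptions made for illustration, and |bias| and RMSE follow their usual definitions because the abstract does not spell them out. The estimator shown is the complete-case mean, used only as a simple baseline.

```python
import numpy as np

def missing_mask(x, z, rate, mechanism, rng):
    """Return a boolean mask (True = missing) that hides a share `rate` of x.

    MCAR: every unit is equally likely to be missing.
    MAR:  units with larger values of the fully observed z are more likely missing.
    NMAR: units with larger values of x itself are more likely missing.
    (Hypothetical selection rule, chosen only for illustration.)
    """
    n = len(x)
    k = int(round(rate * n))
    if mechanism == "MCAR":
        weights = np.ones(n)
    elif mechanism == "MAR":
        weights = z.argsort().argsort() + 1.0  # ranks of z, all positive
    elif mechanism == "NMAR":
        weights = x.argsort().argsort() + 1.0  # ranks of x, all positive
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    idx = rng.choice(n, size=k, replace=False, p=weights / weights.sum())
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

def bias_and_rmse(estimates, truth):
    """Absolute bias and root mean square error of repeated estimates."""
    estimates = np.asarray(estimates, dtype=float)
    abs_bias = abs(estimates.mean() - truth)
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))
    return abs_bias, rmse

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                       # fully observed covariate (synthetic)
x = 0.6 * z + rng.normal(scale=0.8, size=n)  # variable that will have missing values
truth = x.mean()                             # full-data estimate used as the target

for mechanism in ("MCAR", "MAR", "NMAR"):
    for rate in (0.2, 0.5, 0.8):
        # Complete-case mean over 200 simulated missingness patterns.
        estimates = [
            x[~missing_mask(x, z, rate, mechanism, rng)].mean()
            for _ in range(200)
        ]
        abs_bias, rmse = bias_and_rmse(estimates, truth)
        print(f"{mechanism} {rate:.0%}: |bias|={abs_bias:.3f} RMSE={rmse:.3f}")
```

Replacing the complete-case mean with an estimate pooled over imputed data sets would give the MI side of the same comparison.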

Results: The |bias| and RMSE of MI were much smaller than those of complete case analysis (CCA) under all missing mechanisms, especially when the percentage of missing data was high. In addition, the |bias| and RMSE of MI remained consistent as the number of imputations increased from M=10 to M=50. Moreover, when comparing imputation methods, the Markov chain Monte Carlo (MCMC) method had uniformly smaller |bias| and RMSE than the regression method and the predictive mean matching method under all missing mechanisms.
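The role of the number of imputations M can be made concrete with Rubin's rules, the standard way to combine estimates from M imputed data sets. The abstract does not state how the estimates were pooled, so the sketch below is an assumption about the usual procedure, and the function name `pool_rubin` and the input values are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/M) * B, where W is the mean within-imputation
    variance and B is the between-imputation variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    pooled = q.mean()
    within = u.mean()
    between = q.var(ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical estimates from M = 3 imputed data sets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The factor (1 + 1/M) is 1.10 at M=10 and 1.02 at M=50, so for a given between-imputation variance the total variance changes only slightly over that range.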

Conclusion: As the percentage of missing data increases, MI is recommended because it produced less biased estimates than CCA under all missing mechanisms. However, when large proportions of data are missing, proper imputation also requires attention to the number of imputations, the imputation method, and the missing data mechanism.

Issue	Vol 50 No 7 (2021)
Section	Original Article(s)
DOI	https://doi.org/10.18502/ijph.v50i7.6626
Keywords
Public health research; Multiple imputation; Large proportions of missing data; Coronary heart disease

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

How to Cite

Lee JH, Huber JC Jr. Evaluation of Multiple Imputation with Large Proportions of Missing Data: How Much Is Too Much? Iran J Public Health. 2021;50(7):1372-1380.