Sunday, December 13, 2009

Multiple Imputation with Deletion

The sociologist Paul T. von Hippel has written a great article on how to handle missing values when the dependent variable (Y) has missing values as well as the predictors (the X's). Multiple imputation has typically been the gold standard for dealing with missing data, but von Hippel advocates multiple imputation with deletion (MID): you use all cases when generating the multiple imputations, but after imputing you delete the cases whose Y values were imputed. Somewhat surprisingly, given the smaller sample that remains once those cases are dropped, MID usually yields smaller standard errors. Moreover, because the cases with imputed Y's are excluded from the analysis, MID is robust to problems with the imputation model. Check out von Hippel's informative paper.
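To make the MID workflow concrete, here is a minimal sketch in Python with NumPy. It assumes a toy setup: one predictor `x` and one outcome `y`, both with values missing completely at random. The imputer is a hand-rolled chained-equations routine with Bayesian regression draws, standing in for whatever imputation software you would actually use. All function names and the simulated data are hypothetical illustrations, not von Hippel's code. The key MID step is that every case feeds the imputation, but only cases with an observed `y` (including those with an imputed `x`) enter the analysis. The per-imputation estimates are then combined with Rubin's rules.

```python
import numpy as np


def draw_regression(X, y, rng):
    """Draw (beta, sigma) from the posterior of a normal linear model with a
    flat prior; drawing the parameters makes the imputations 'proper'."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n - k)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)


def impute_once(x, y, rng, sweeps=10):
    """Hypothetical chained-equations imputer; ALL cases are used here."""
    miss_x, miss_y = np.isnan(x), np.isnan(y)
    x, y = x.copy(), y.copy()
    x[miss_x] = np.nanmean(x)  # crude starting values
    y[miss_y] = np.nanmean(y)
    for _ in range(sweeps):
        # Impute x given y, then y given x (arrays are updated in place).
        for target, other, miss in ((x, y, miss_x), (y, x, miss_y)):
            X = np.column_stack([np.ones_like(other), other])
            beta, sigma = draw_regression(X[~miss], target[~miss], rng)
            target[miss] = X[miss] @ beta + rng.normal(0, sigma, miss.sum())
    return x, y


def fit_slope(x, y):
    """OLS slope of y on x and its estimated sampling variance."""
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    return beta[1], sigma2 * xtx_inv[1, 1]


def mid(x, y, m=20, seed=0):
    """Multiple imputation, then deletion of cases whose y was imputed."""
    rng = np.random.default_rng(seed)
    y_observed = ~np.isnan(y)
    estimates, variances = [], []
    for _ in range(m):
        x_imp, y_imp = impute_once(x, y, rng)
        # MID step: analyze only cases whose y was actually observed.
        b, v = fit_slope(x_imp[y_observed], y_imp[y_observed])
        estimates.append(b)
        variances.append(v)
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    q_bar = np.mean(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return q_bar, np.sqrt(within + (1 + 1 / m) * between)


# Simulated example data (hypothetical): true slope is 2.
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
x[rng.random(n) < 0.2] = np.nan  # about 20% of x missing at random
y[rng.random(n) < 0.2] = np.nan  # about 20% of y missing at random

est, se = mid(x, y)
print(f"slope = {est:.3f}, SE = {se:.3f}")
```

Note that the imputed `x` values are kept for cases whose `y` was observed; only the cases with imputed `y` are deleted before the analysis. A real analysis would use an established imputation package and a richer imputation model than this two-variable sketch.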