Performance of parametric vs. data mining methods for estimating propensity scores with multilevel data: a Monte Carlo study
Date
2020
Publisher
University of Delaware
Abstract
Randomized controlled trials (RCTs), or randomized experiments, have long been considered the most rigorous method for determining whether a causal effect exists between a treatment and an outcome, such as the effect of an educational intervention. However, RCTs are often infeasible in educational settings for practical or ethical reasons. Under such circumstances, non-randomized observational studies are often used to estimate treatment effects. The propensity score is defined as the conditional probability of receiving treatment given a set of observed pretreatment variables. Under Rubin’s causal model, the aim of conditioning on the propensity score is to improve the quality of estimates by attempting to mimic the balance between groups that randomization produces. Propensity score methods have been developed primarily for single-level data structures. In educational studies, data typically have a clustered or hierarchical structure, where the probability of receiving treatment is a function of both individual-level and cluster-level factors.

Using Monte Carlo simulations, this dissertation compares two tree-based data mining approaches (generalized boosting modeling [GBM] and generalized linear mixed-effects model trees [GLMERTREE]) with two parametric models (multiple logistic regression [MLR] and multilevel logistic regression [RC]) for propensity score estimation under different simulated settings. The study has four primary findings. First, hidden bias from unobserved covariates has a very large impact on the estimate of causal effects: missing covariates render all PSA approaches invalid. Second, under conditions of non-additivity and non-linearity, the data mining approaches predict the propensity score better than the parametric models. However, all four estimation methods can provide unbiased treatment effect estimates when paired with an appropriately specified outcome model. Third, although the MLR and RC outcome models performed similarly on the relative bias of treatment effects, RC offers better precision by producing lower standard errors of treatment effects. Fourth, among the eight combinations of estimation method and outcome model, the GBM-RC combination provided more accurate and precise treatment effect estimates across the greatest number of simulated conditions.

This study has several limitations. First, it did not vary the correlations among covariates; future research could incorporate varied correlations. Second, only balanced cluster sizes were simulated; the effect of unbalanced cluster sizes on the estimation of treatment effects is worth exploring. Third, this study included only propensity score weighting as the conditioning method. Future research can assess the performance of data mining approaches for estimating the propensity score under matching and stratification. Fourth, only one algorithm specification was used when generating the propensity score with GBM. Further research should include different algorithm specifications for GBM with multilevel data.
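As a rough illustration of the propensity score weighting idea described in the abstract, the following minimal Python sketch simulates a small clustered data set, estimates propensity scores with a single-level logistic regression (analogous in spirit to the MLR approach only), and compares a naive group difference with an inverse-probability-weighted estimate. It is not the dissertation's simulation design: the sample sizes, coefficients, true effect of 1.0, the seed, and all function and variable names are assumptions chosen for illustration, and GBM, GLMERTREE, and RC are not implemented here.

```python
import numpy as np


def fit_logistic(X, z, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. Hypothetical helper,
    written here only to keep the sketch dependency-free.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - p)                        # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])   # observed information
        beta += np.linalg.solve(hess, grad)
    return beta


# --- Simulate two-level data (all values below are assumed) ---
rng = np.random.default_rng(2020)
n_clusters, cluster_size = 50, 100
cluster = np.repeat(np.arange(n_clusters), cluster_size)

w = rng.normal(size=n_clusters)[cluster]        # cluster-level covariate
x = rng.normal(size=n_clusters * cluster_size)  # individual-level covariate

# Treatment assignment depends on both individual and cluster factors.
true_ps = 1.0 / (1.0 + np.exp(-(0.5 * x + 0.5 * w)))
z = rng.binomial(1, true_ps)

# Outcome with an assumed true treatment effect of 1.0.
true_effect = 1.0
y = true_effect * z + 1.5 * x + 1.5 * w + rng.normal(size=x.size)

# --- Estimate propensity scores from the observed covariates ---
X = np.column_stack([np.ones_like(x), x, w])
beta_hat = fit_logistic(X, z)
ps = 1.0 / (1.0 + np.exp(-X @ beta_hat))

# --- Inverse probability weights for the average treatment effect ---
weights = z / ps + (1 - z) / (1 - ps)

treated, control = z == 1, z == 0
naive_effect = y[treated].mean() - y[control].mean()
ipw_effect = (
    np.average(y[treated], weights=weights[treated])
    - np.average(y[control], weights=weights[control])
)

print(f"naive difference in means: {naive_effect:.3f}")
print(f"IPW estimate:              {ipw_effect:.3f}")
```

Because both confounders are observed and included in the propensity model in this sketch, weighting should pull the estimate toward the assumed true effect; omitting `w` from `X` would mimic the hidden-bias condition discussed in the abstract.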
Keywords
Data mining, Monte Carlo simulations, Multilevel data, Propensity score analysis, Mathematics education, Student performance