Comparison of Statistical Learning and Predictive Models on Breast Cancer Data and King County Housing Data

Authors: Cai, Yunjiao; Fu, Zhuolun; Zhao, Yuzhe; Hu, Yilin; Ding, Shanshan
Dates: 2017-09-19; 2017-09
URI: http://udspace.udel.edu/handle/19716/21667

Abstract: In this study, we evaluate the predictive performance of popular statistical learning methods, such as discriminant analysis, random forests, support vector machines, and neural networks, on real data. Two datasets, Breast Cancer Diagnosis in Wisconsin (WDBC) and House Sales in King County (KC house), are analyzed separately to obtain the best model for prediction on each. Linear and Quadratic Discriminant Analysis are applied to the WDBC data set. Linear Regression and Elastic Net are applied to the KC house data set. Random Forest, Gradient Boosting Method, Support Vector Machines, and Neural Network are applied to both datasets. Individual models and stacked models are trained and selected using accuracy (for classification) or R-squared (for regression) from repeated cross-validation on the training sets. The final models are evaluated on the test sets.

Language: en-US
Keywords: Machine learning; Prediction; Classification; Regression; Stacking
Type: Working Paper
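
The abstract describes selecting models by their score under repeated cross-validation of the training set. As a minimal sketch of that idea only, and not the authors' implementation (the record does not state which software they used), the following Python code repeats a shuffled k-fold split several times and averages validation accuracy. All function names, the toy data, and the majority-class stand-in model are hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean


def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=3, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for each repeat
        for k in range(n_folds):
            val_idx = order[k::n_folds]  # every n_folds-th shuffled sample
            val_set = set(val_idx)
            train_idx = [i for i in order if i not in val_set]
            yield train_idx, val_idx


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def cv_accuracy(fit, predict, X, y, **cv_kwargs):
    """Mean validation accuracy over all repeats and folds."""
    scores = []
    for train_idx, val_idx in repeated_kfold_indices(len(X), **cv_kwargs):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in val_idx])
        scores.append(accuracy([y[i] for i in val_idx], preds))
    return mean(scores)


# Hypothetical stand-in model: always predict the majority training class.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_val):
    return [model] * len(X_val)


# Toy data (not from either dataset in the paper): 14 negatives, 6 positives.
X = [[float(i)] for i in range(20)]
y = [0] * 14 + [1] * 6

score = cv_accuracy(fit_majority, predict_majority, X, y,
                    n_folds=5, n_repeats=3, seed=0)
print(f"repeated-CV accuracy of majority baseline: {score:.3f}")
```

To compare candidate models in this scheme, one would call `cv_accuracy` once per candidate with the same `seed` so that every model sees identical splits, keep the best-scoring one, and only then score it on the held-out test set. For the regression task the accuracy function would be replaced by R-squared. In practice a library routine such as scikit-learn's `RepeatedKFold` would normally replace the hand-written splitter.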