DATA SCIENCE IN DEVELOPMENT ECONOMICS: USING CLUSTER ANALYSIS TO GENERATE A MULTIVARIATE DEVELOPMENT TAXONOMY

Gross, Andrew
2018-09-27
2018-09-27
2018-05
http://udspace.udel.edu/handle/19716/23851

This paper attempts to apply clustering techniques from data science to the economic problem of generating a country-level development taxonomy. Development taxonomies currently in use suffer from two key issues. First, the taxonomies are based on very few variables and therefore cannot properly represent something as complex and multifaceted as development. Second, the values used to discriminate groups are chosen arbitrarily. In this work, a univariate analysis is performed using the method of kernel density estimation to empirically generate a single-valued taxonomy which can be directly compared with the income group taxonomy published by the World Bank. Next, a definition of development is derived and a multivariate analysis is performed to create a comprehensive development taxonomy using two forms of k-means clustering. The univariate analysis demonstrates the superiority of a data-driven approach to single-valued taxonomy creation. Conversely, it remains inconclusive as to whether cluster analysis can create a well-defined multivariate development taxonomy.

Mathematics and Economics, cluster analysis, multivariate development taxonomy
Thesis