Vaishali Jain, Ted Enamorado, and Cynthia Rudin. 2022. “The Importance of Being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-Based Ethnicity Classification.” Harvard Data Science Review, 4, 3.
Name-based ethnicity classification is the task of predicting ethnicity from a name. It can be a key tool for assessing the fairness of algorithms, conducting demographic studies, and performing political analysis. While previous state-of-the-art approaches for this task rely on complex neural networks that are difficult to understand, troubleshoot, and tune, we provide an interpretable and intuitive solution that outperforms these (overly) complicated models at a fraction of the computational cost. Using our technique, we can analyze patterns in name-ethnicity databases to show connections between ethnic groups in terms of the names they share. We provide techniques to generalize under domain shift by leveraging ‘indistinguishables,’ which are names common to multiple ethnic groups. We also apply our method to estimate how many political donations to each political party came from individuals of various ethnic groups in 2020, in the period leading up to the presidential election.
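To make the notion of ‘indistinguishables’ concrete, the following is a minimal sketch of how names shared by more than one group could be identified in a labeled name list. It is an illustration only and not the authors' method: the function name `find_indistinguishables`, the lowercase normalization, and the placeholder labels (`group_a`, `group_b`, `group_c`) are all assumptions introduced here, and the records are made-up toy data.

```python
from collections import defaultdict


def find_indistinguishables(records):
    """Return names that appear under more than one group label.

    `records` is an iterable of (name, group) pairs. Names are compared
    case-insensitively after trimming whitespace (an assumption made
    for this sketch, not something stated in the abstract).
    """
    groups_by_name = defaultdict(set)
    for name, group in records:
        groups_by_name[name.strip().lower()].add(group)
    # Keep only names associated with two or more distinct groups.
    return {
        name: sorted(groups)
        for name, groups in groups_by_name.items()
        if len(groups) > 1
    }


# Hypothetical toy records with placeholder labels (not real data).
records = [
    ("Alex", "group_a"),
    ("alex", "group_b"),
    ("Sam", "group_a"),
    ("Lee", "group_b"),
    ("Lee", "group_c"),
]

shared = find_indistinguishables(records)
print(shared)
```

In this toy example, "alex" and "lee" are flagged because each occurs under two different labels, while "sam" is not. How such shared names are then used to generalize under domain shift is described in the paper itself and is not reproduced here.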