Statistical Challenges in Analyzing Migrant Backgrounds Among University Students: a Case Study from Italy
Abstract
We explore the methodological issues and statistical complexities of analyzing university students with migrant backgrounds, focusing on Italian data from the University of Milano-Bicocca. With the increasing size of migrant populations and the growth of the second and middle generations, the need has arisen for deeper knowledge of the various strata of this population, including university students with migrant backgrounds. Identifying these students is challenging because migrant background is recorded inconsistently in university datasets. By leveraging both administrative records and an original targeted survey, we propose a methodology to fully identify the study population of students with migrant histories and to distinguish relevant subpopulations within it, such as second-generation students born in Italy. Traditional logistic regression and random forest machine learning models are used and compared to predict migrant status. The primary contribution is an expanded administrative dataset enriched with indicators of students' migrant backgrounds and status. This expanded dataset provides a critical foundation for analyzing the characteristics of students with migration histories across all variables routinely registered in the administrative dataset. Additionally, the findings highlight the presence of selection bias in the targeted survey data, underscoring the need for further research.
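The comparison of logistic regression and random forest classifiers mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (generated with scikit-learn's `make_classification`, not the Milano-Bicocca records), the features are anonymous stand-ins for administrative variables, and ROC AUC is used as the comparison metric only because it is a common choice; the abstract does not state which metric or model settings the authors used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for administrative features (assumption, NOT the paper's data).
# y = 1 plays the role of "student with a migrant background" (minority class).
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.85, 0.15],
    random_state=0,
)

# Hold out a stratified test set so both models are scored on the same unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hyperparameters are illustrative defaults, not the authors' settings.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Predicted probability of the positive class, used to rank test rows.
    scores = model.predict_proba(X_test)[:, 1]
    auc[name] = roc_auc_score(y_test, scores)

for name, value in auc.items():
    print(f"{name}: AUC = {value:.3f}")
```

Scoring both models on one shared held-out split keeps the comparison like-for-like. With real survey-labelled data, the selection bias noted in the abstract would also need to be addressed before trusting either model's predictions for non-respondents.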
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.06166
- Bibcode: 2025arXiv250106166G
- Keywords: Statistics - Applications; 62P25; 62D99
- E-Print: 20 pages, 6 figures