Inference of Demographic Attributes based on Mobile Phone Usage Patterns and Social Network Topology
Mobile phone usage provides a wealth of information that can be used to better understand the demographic structure of a population. In this paper, we focus on the population of Mexican mobile phone users. We first present an observational study of mobile phone usage by gender and age group, and detect significant differences in phone usage among subgroups of the population. We then study the performance of different machine learning (ML) methods for predicting demographic features (namely, age and gender) of unlabeled users by leveraging individual calling patterns as well as the structure of the communication graph. We show that a specific implementation of a diffusion model, which harnesses the graph structure, performs significantly better than standard node-based ML methods. We provide details of the methodology together with an analysis of the robustness of our results to changes in the model parameters. Furthermore, by carefully examining the topological relations of the training nodes (seed nodes) to the rest of the nodes in the network, we identify topological metrics that have a direct influence on the performance of the algorithm.
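To illustrate the general idea of diffusing labels from seed nodes over a communication graph, the following is a minimal sketch of a generic label-propagation scheme. It is not the specific diffusion model evaluated in this paper, whose details are not given in the abstract. The function names `diffuse_labels` and `predict`, the `damping` and `iterations` parameters, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def diffuse_labels(edges, seeds, n_labels, iterations=20, damping=0.5):
    """Generic label diffusion (illustrative; not the paper's exact model).

    edges:  iterable of (u, v) pairs, treated as undirected links
    seeds:  dict mapping labeled (seed) nodes to a label index
    Returns a dict mapping each node to a probability vector over labels.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = set(neighbors) | set(seeds)

    def one_hot(k):
        return [1.0 if i == k else 0.0 for i in range(n_labels)]

    # Seeds start with their known label; all other nodes start uniform.
    uniform = [1.0 / n_labels] * n_labels
    probs = {n: one_hot(seeds[n]) if n in seeds else list(uniform)
             for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            nbrs = neighbors[n]
            if n in seeds or not nbrs:
                # Seed labels are clamped; isolated nodes cannot change.
                updated[n] = probs[n]
                continue
            # Average the neighbors' current distributions ...
            avg = [sum(probs[m][k] for m in nbrs) / len(nbrs)
                   for k in range(n_labels)]
            # ... and blend with the node's own previous distribution.
            updated[n] = [damping * probs[n][k] + (1 - damping) * avg[k]
                          for k in range(n_labels)]
        probs = updated
    return probs


def predict(probs):
    """Pick the most probable label index for every node."""
    return {n: max(range(len(p)), key=p.__getitem__)
            for n, p in probs.items()}


# Toy graph (hypothetical): two chains, each with one seed node.
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f")]
seeds = {"a": 0, "f": 1}
labels = predict(diffuse_labels(edges, seeds, n_labels=2))
```

In this toy example the unlabeled nodes inherit the label of the seed in their own chain. Because each update is a convex combination of probability vectors, every node's distribution continues to sum to one.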