affinity: A System for Latent User Similarity Comparison on Texting Data
Abstract
In the field of social networking services, finding similar users based on profile data is common practice. Smartphones harbor sensor and personal context data that can be used for user profiling. Yet, one vast source of personal data, that is text messaging data, has hardly been studied for user profiling. We see three reasons for this: First, private text messaging data is not shared due to their intimate character. Second, the definition of an appropriate privacy-preserving similarity measure is non-trivial. Third, assessing the quality of a similarity measure on text messaging data representing a potentially infinite set of topics is non-trivial. In order to overcome these obstacles we propose affinity, a system that assesses the similarity between text messaging histories of users reliably and efficiently in a privacy-preserving manner. Private texting data stays on user devices and data for comparison is compared in a latent format that neither allows to reconstruct the comparison words nor any original private plain text. We evaluate our approach by calculating similarities between Twitter histories of 60 US senators. The resulting similarity network reaches an average 85.0% accuracy on a political party classification task.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2019
- DOI:
- 10.48550/arXiv.1904.01897
- arXiv:
- arXiv:1904.01897
- Bibcode:
- 2019arXiv190401897E
- Keywords:
-
- Computer Science - Social and Information Networks