Follow Spam Detection based on Cascaded Social Information
Abstract
In the last decade we have witnessed the explosive growth of online social networking services (SNSs) such as Facebook, Twitter, RenRen and LinkedIn. While SNSs provide diverse benefits, for example fostering interpersonal relationships, community formation and news propagation, they have also attracted an uninvited nuisance. Spammers abuse SNSs as vehicles to spread spam rapidly and widely. Spam, meaning unsolicited or inappropriate messages, significantly impairs the credibility and reliability of services. Therefore, detecting spammers has become an urgent and critical issue in SNSs. This paper deals with follow spam in Twitter. Instead of spreading annoying messages to the public, a spammer follows (subscribes to) legitimate users and is followed back by a legitimate user. Based on the assumption that the online relationships of spammers differ from those of legitimate users, we propose classification schemes that detect follow spammers. In particular, we focus on cascaded social relations and devise two schemes, TSP-Filtering and SS-Filtering, which utilize Triad Significance Profile (TSP) and Social Status (SS), respectively, in a two-hop subnetwork centered at each user. We also propose an ensemble technique, Cascaded-Filtering, that combines both TSP and SS properties. Our experiments on real Twitter datasets demonstrate that the three proposed approaches are very practical. The proposed schemes are scalable because, instead of analyzing the whole network, they inspect user-centered two-hop social networks. Our performance study shows that the proposed methods yield significantly better performance than a prior scheme in terms of true positives and false positives.
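The scalability argument rests on inspecting only a user-centered two-hop subnetwork rather than the whole follow graph. The following is a minimal sketch of that extraction step in Python, not the authors' implementation. The function names, the toy edge list, the choice to ignore edge direction when counting hops, and the reciprocity feature are all assumptions made for illustration; the abstract does not specify how TSP or SS are computed, so neither is attempted here.

```python
from collections import defaultdict


def two_hop_subnetwork(edges, center):
    """Return (nodes, edges) of the subnetwork within two hops of `center`.

    `edges` is a list of (follower, followee) pairs. Hops ignore edge
    direction, which is an assumption of this sketch.
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    one_hop = set(neighbors[center])
    nodes = {center} | one_hop
    for n in one_hop:
        nodes |= neighbors[n]

    # Keep only follow edges whose endpoints both lie in the subnetwork.
    sub_edges = [(s, d) for s, d in edges if s in nodes and d in nodes]
    return nodes, sub_edges


def reciprocity(edges):
    """Fraction of directed edges that are reciprocated (illustrative feature)."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for s, d in edge_set if (d, s) in edge_set)
    return mutual / len(edge_set)


# Hypothetical toy follow graph: (follower, followee)
follows = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "e")]
nodes, sub = two_hop_subnetwork(follows, "a")
```

Because the cost of the extraction depends on the size of the neighborhood rather than on the full graph, a per-user feature computed on `sub` can be evaluated independently for each account.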
- Publication:
- arXiv e-prints
- Pub Date:
- May 2016
- DOI:
- 10.48550/arXiv.1605.00448
- arXiv:
- arXiv:1605.00448
- Bibcode:
- 2016arXiv160500448J
- Keywords:
- Computer Science - Social and Information Networks;
- Computer Science - Information Retrieval;
- 68P01;
- J.4;
- K.6.5
- E-Print:
- 34 pages, 10 figures, Preprint submitted to Elsevier Information Sciences