Stacking classifiers for anti-spam filtering of e-mail

doi:10.48550/arXiv.cs/0106040

We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, or "spam", floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the efficiency of automatically induced anti-spam filters, and that such filters can be used in real-life applications.

Publication: arXiv e-prints

Pub Date: June 2001

DOI: 10.48550/arXiv.cs/0106040

arXiv: arXiv:cs/0106040

Bibcode: 2001cs........6040S

Keywords: Computer Science - Computation and Language; Computer Science - Artificial Intelligence; H.4.3; I.2.6; I.2.7; I.5.4; K.4.1

E-Print: Proceedings of "Empirical Methods in Natural Language Processing" (EMNLP 2001), L. Lee and D. Harman (Eds.), pp. 44-50, Carnegie Mellon University, Pittsburgh, PA, 2001
