Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?

doi:10.48550/arXiv.2306.05871

Recent advances in natural language processing (NLP) have led to the development of large language models (LLMs) such as ChatGPT. This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes. The proposed method involves translating an English dataset into French and training a classifier on the translated data. Results show that the detectors can effectively detect ChatGPT-generated text, with a degree of robustness against basic attack techniques in in-domain settings. However, vulnerabilities are evident in out-of-domain contexts, highlighting the challenge of detecting adversarial text. The study emphasizes caution when applying in-domain testing results to a wider variety of content. We provide our translated datasets and models as open-source resources at https://gitlab.inria.fr/wantoun/robust-chatgpt-detection

Publication: arXiv e-prints
Pub Date: June 2023
DOI: 10.48550/arXiv.2306.05871
arXiv: arXiv:2306.05871
Bibcode: 2023arXiv230605871A
Keywords: Computer Science - Computation and Language
E-Print: Accepted to TALN 2023

Abstract