This paper describes the approach of the UniBuc - NLP team to SemEval 2024 Task 8: Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection. We explored transformer-based and hybrid deep learning architectures. For subtask B, our transformer-based model placed second out of 77 teams with an accuracy of 86.95%, demonstrating the architecture's suitability for this task. However, our models overfit in subtask A, which could potentially be mitigated by fine-tuning less and increasing the maximum sequence length. For subtask C (token-level classification), our hybrid model overfit during training, hindering its ability to detect transitions between human-written and machine-generated text.
Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection
The UniBuc - NLP team used transformer-based and hybrid deep learning models to detect black-box machine-generated text, placing second out of 77 teams in subtask B (86.95% accuracy) but overfitting in subtasks A and C.
- Year: 2024
- Venue: arXiv 2024
- Authors: 3
Attribution
- Abstract & full text: arxiv.org/abs/2405.17964
- TL;DR: Semantic Scholar