Overview of DETOXIS at IberLEF 2021: Detection of TOXicity in comments in Spanish

  1. Amigó Cabrera, Enrique
  2. Rosso, Paolo
  3. Taulé Delor, Mariona
  4. Ariza, Alejandro
  5. Nofre, Montserrat
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2021

Issue: 67

Pages: 209-221

Type: Article


This paper presents the DETOXIS task, DEtection of TOxicity in comments In Spanish, which took place at the Iberian Languages Evaluation Forum workshop (IberLEF 2021) during the SEPLN 2021 conference. It describes the NewsCom-TOX corpus used to train and evaluate the systems, the metrics used to evaluate them, and the results obtained by the different approaches. An analysis of the results obtained by these systems is also provided.

Funding information

The work has been carried out in the framework of the following projects: MISMIS project (PGC2018-096212-B), funded by Ministerio de Ciencia, Innovación y Universidades (Spain); CLiC SGR (2027SGR341), funded by AGAUR (Generalitat de Catalunya); and STERHEOTYPES project (Challenges for Europe), funded by Fondazione Compagnia di San Paolo.


Bibliographic references

  • Amigó, E., J. Gonzalo, S. Mizzaro, and J. Carrillo-de Albornoz. 2020. Effectiveness metric for ordinal classification: Formal properties and experimental results. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
  • Basile, V., C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. Rangel Pardo, P. Rosso, M. Sanguinetti, et al. 2019. SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In 13th International Workshop on Semantic Evaluation, pages 54–63. Association for Computational Linguistics.
  • Davidson, T., D. Warmsley, M. Macy, and I. Weber. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of the International AAAI Conference on Web and Social Media, volume 11.
  • Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla, and M. Taboada. 2020. The SFU opinion and comments corpus: A corpus for the analysis of online news comments. Corpus Pragmatics, 4(2):155–190.
  • Kumar, R., A. K. Ojha, S. Malmasi, and M. Zampieri. 2018. Benchmarking aggression identification in social media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), pages 1–11.
  • Kumar, R., A. K. Ojha, S. Malmasi, and M. Zampieri. 2020. Evaluating aggression identification in social media. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pages 1–5.
  • Moffat, A. and J. Zobel. 2008. Rank-biased precision for measurement of retrieval effectiveness. ACM Transactions on Information Systems (TOIS), 27(1):1–27.
  • Nobata, C., J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th international conference on world wide web, pages 145–153.
  • Nockleby, J. T. 2000. Hate speech. Encyclopedia of the American constitution, 3(2):1277–1279.
  • Poletto, F., V. Basile, M. Sanguinetti, C. Bosco, and V. Patti. 2020. Resources and benchmark corpora for hate speech detection: a systematic review. Language Resources and Evaluation, pages 1–47.
  • Schmidt, A. and M. Wiegand. 2017. A survey on hate speech detection using natural language processing. In Proceedings of the fifth international workshop on natural language processing for social media, pages 1–10.
  • Struß, J. M., M. Siegel, J. Ruppenhofer, M. Wiegand, M. Klenner, et al. 2019. Overview of GermEval task 2, 2019 shared task on the identification of offensive language. In Preliminary proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019).
  • Waseem, Z. and D. Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL student research workshop, pages 88–93.
  • Zampieri, M., P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, and Ç. Çöltekin. 2020. SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020). In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1425–1447. International Committee for Computational Linguistics.