Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press

  1. Lignos, Constantine
  2. Porta Zamorano, Jordi
  3. Álvarez Mellado, Elena
  4. Espinosa-Anke, Luis
  5. Gonzalo Arroyo, Julio
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2021

Issue: 67

Pages: 277-285

Type: Article


Abstract

In this article we present the results of ADoBo 2021, the IberLEF 2021 shared task on the detection of lexical borrowings in the Spanish press. In this task, we approached borrowing detection as a sequence labeling problem. Participants were given a corpus of Spanish newspaper text annotated with unassimilated lexical borrowings (mostly anglicisms) following the BIO scheme. We received nine different systems from four different teams. The results range from 37 to 85 F1 points, which indicates that lexical borrowing detection is still an unsolved problem (especially for previously unseen borrowings) and that traditional lexicographic work could benefit from incorporating current NLP techniques.
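The abstract frames borrowing detection as BIO sequence labeling scored with F1. The sketch below shows how BIO tags can be decoded into labeled spans and how span-level precision, recall, and F1 follow from comparing gold and predicted spans. It assumes exact-match span scoring (a predicted borrowing counts only if its boundaries and label both match the gold annotation) and uses an illustrative `ENG` label; the function names, the label, and the example sentence are hypothetical, and this is not the shared task's official scorer.

```python
from typing import List, Optional, Set, Tuple

# A span is (first token index, index one past the last token, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one sentence's BIO tags into a set of labeled spans.

    An I- tag that does not continue an open span of the same label is
    treated as the start of a new span (one common repair convention;
    an assumption here, not necessarily what the task's scorer does).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # extends the currently open span
        if start is not None:  # any other tag closes the open span
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):  # open a new span
            start, label = i, tag[2:]
    if start is not None:  # close a span that runs to the end of the sentence
        spans.add((start, len(tags), label))
    return spans


def span_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 over a corpus of sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # same boundaries and same label
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented example sentence: "Fue un look muy cool para el photocall"
    gold = [["O", "O", "B-ENG", "O", "B-ENG", "O", "O", "B-ENG"]]
    pred = [["O", "O", "B-ENG", "O", "B-ENG", "O", "O", "O"]]
    p, r, f = span_prf(gold, pred)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    # prints: precision=1.00 recall=0.67 f1=0.80
```

Scoring whole spans rather than individual tokens means that a multi-token borrowing with one wrong boundary counts as both a false positive and a false negative, which is a stricter criterion than token-level accuracy.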
