LyricSIM: Un nuevo dataset y benchmark para la detección de similitud en letras de canciones en español

Benito-Santos, Alejandro; Ghajari, Adrián; Hernández, Pedro; Fresno Fernández, Víctor; Ros, Salvador; González-Blanco García, Elena

Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2023

Issue: 71

Pages: 149-163

Type: Article

Abstract

In this paper, we present a new dataset and benchmark tailored to the task of semantic similarity in song lyrics. Our dataset, originally consisting of 2775 pairs of Spanish songs, was annotated in a collective annotation experiment by 63 native annotators. After collecting and refining the data to ensure a high degree of consensus and data integrity, we obtained 676 high-quality annotated pairs that were used to evaluate the performance of various state-of-the-art monolingual and multilingual language models. Consequently, we established baseline results that we hope will be useful to the community in all future academic and industrial applications conducted in this context.

Bibliographic References

Abe, K., S. Yokoi, T. Kajiwara, and K. Inui. 2022. Why is sentence similarity benchmark not predictive of application-oriented task performance? In Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems, pages 70–87, Online, November. Association for Computational Linguistics.
Agerri, R. and E. Agirre. 2023. Lessons learned from the evaluation of Spanish Language Models. Procesamiento del Lenguaje Natural, 70:157–170, March.
Agirre, E., C. Banea, C. Cardie, D. Cer, M. Diab, A. Gonzalez-Agirre, W. Guo, I. Lopez-Gazpio, M. Maritxalar, R. Mihalcea, G. Rigau, L. Uria, and J. Wiebe. 2015. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2015), pages 252–263, Denver, Colorado, June. Association for Computational Linguistics.
Agirre, E., C. Banea, C. Cardie, D. Cer, M. Diab, A. Gonzalez-Agirre, W. Guo, R. Mihalcea, G. Rigau, and J. Wiebe. 2014. SemEval-2014 task 10: Multilingual semantic textual similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 81–91, Dublin, Ireland, August. Association for Computational Linguistics.
Agirre, E., D. Cer, M. Diab, and A. Gonzalez-Agirre. 2012. SemEval-2012 task 6: A pilot on semantic textual similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 385–393, Montréal, Canada, 7-8 June. Association for Computational Linguistics.
Agirre, E., D. Cer, M. Diab, A. Gonzalez-Agirre, and W. Guo. 2013. *SEM 2013 shared task: Semantic Textual Similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 32–43, Atlanta, Georgia, USA, June. Association for Computational Linguistics.
Cer, D., M. Diab, E. Agirre, I. Lopez-Gazpio, and L. Specia. 2017. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1–14, Vancouver, Canada, August. Association for Computational Linguistics.
Chandrasekaran, D. and V. Mago. 2022. Evolution of Semantic Similarity – A Survey. ACM Computing Surveys, 54(2):1–37, March.
Conneau, A., K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov. 2020. Unsupervised Cross-lingual Representation Learning at Scale, April. arXiv:1911.02116.
da Silva, A. C. M., D. F. Silva, and R. M. Marcacini. 2020. 4mula: A multitask, multimodal, and multilingual dataset of music lyrics and audio features. In Proceedings of the Brazilian Symposium on Multimedia and the Web, WebMedia ’20, pages 145–148, New York, NY, USA. Association for Computing Machinery.
Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, volume 1, pages 4171–4186.
Fell, M. 2020. Natural language processing for music information retrieval: deep analysis of lyrics structure and content. PhD thesis, Université Côte d’Azur, May.
Gutierrez-Fandino, A., J. Armengol-Estapé, M. Pamies, J. Llop-Palao, J. Silveira-Ocampo, C. P. Carrino, C. Armentano-Oller, C. Rodriguez-Penagos, A. Gonzalez-Agirre, and M. Villegas. 2022. MarIA: Spanish Language Models. Procesamiento del Lenguaje Natural, 68:39–60, March.
Haider, T., S. Eger, E. Kim, R. Klinger, and W. Menninghaus. 2020. PO-EMO: Conceptualization, annotation, and modeling of aesthetic emotions in German and English poetry. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1652–1663, Marseille, France, May. European Language Resources Association.
He, P., X. Liu, J. Gao, and W. Chen. 2021. DeBERTa: Decoding-enhanced BERT with Disentangled Attention, October. arXiv:2006.03654.
Krippendorff, K. 2004. Reliability in Content Analysis. Human Communication Research, 30(3):411–433.
Li, W., F. Qi, M. Sun, X. Yi, and J. Zhang. 2021. CCPM: A Chinese classical poetry matching dataset. arXiv preprint arXiv:2106.01979.
Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach, July. arXiv:1907.11692.
Reimers, N. and I. Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, August. arXiv:1908.10084.
Rosa, J. D. l., E. G. Ponferrada, M. Romero, P. Villegas, P. G. d. P. Salas, and M. Grandury. 2022. BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling. Procesamiento del Lenguaje Natural, 68:13–23, March.
Schuff, H., L. Vanderlyn, H. Adel, and N. T. Vu. 2023. How to do human evaluation: A brief introduction to user studies in NLP. Natural Language Engineering, pages 1–24, February.
Wolf, T., L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online, October. Association for Computational Linguistics.
Zhang, X., M. Sun, J. Liu, and X. Li. 2021. Optimal Embedding Calibration for Symbolic Music Similarity, September.

Data source: Dialnet