Paraphrasing, textual entailment, and semantic similarity above word level
- Kovatchev, Venelin Orlinov
- María Antonia Martí Antonín, Supervisor
- Maria Salamó Llorente, Co-supervisor
Defense university: Universitat de Barcelona
Defense date: 9 July 2020
- Julio Gonzalo Arroyo President
- Maite López Sánchez Secretary
- Eduardo Blanco Villar Member
Type: Thesis
Abstract
This dissertation explores the linguistic and computational aspects of the meaning relations that can hold between two or more complex linguistic expressions (phrases, clauses, sentences, paragraphs). In particular, it focuses on Paraphrasing, Textual Entailment, Contradiction, and Semantic Similarity. The thesis is composed of seven articles and is divided into three thematic Parts.
In Part I: “Similarity at the Level of Words and Phrases”, I study the Distributional Hypothesis (DH). DH is central to most contemporary approaches to the automatic processing of meaning and meaning relations within Computational Linguistics (CL) and Natural Language Processing (NLP). Part I of this thesis explores different methodologies for quantifying semantic similarity at the levels of words and short phrases. I measure the importance of corpus size and the role of linguistic preprocessing. I also show that (lexical) semantic similarity can interact with syntax-based compositional rules and result in productive patterns at the phrase level. The research in Part I resulted in the publication of two articles.
In Part II: “Paraphrase Typology and Paraphrase Identification”, I focus on the meaning relation of paraphrasing and the empirical task of automated Paraphrase Identification (PI). Paraphrasing is one of the most widely studied meaning relations in both theoretical and practical research. PI is among the most popular tasks in CL and NLP. In Part II of this thesis I present: 1) EPT: a new typology of the linguistic and reason-based phenomena involved in paraphrasing; 2) WARP-Text: a new web-based annotation interface capable of annotating paraphrase types; 3) ETPC: the largest corpus to date to be annotated with paraphrase types; and 4) a qualitative evaluation framework for automated PI systems. The findings presented in Part II provide in-depth knowledge of the nature of the paraphrasing relation and improve the evaluation, interpretation, and error analysis in the task of PI. The research in Part II resulted in the publication of three articles.
In Part III: “Paraphrasing, Textual Entailment, and Semantic Similarity”, I present a novel direction in the research on textual meaning relations, resulting from joint research carried out on paraphrasing, textual entailment, contradiction, and semantic similarity. Traditionally, these meaning relations are studied in isolation and the transfer of knowledge and resources between them is limited. In Part III of this thesis I present: 1) a methodology for the creation and annotation of corpora containing multiple textual meaning relations; 2) the first corpus annotated independently with Paraphrasing, Textual Entailment, Contradiction, Textual Specificity, and Semantic Similarity; 3) a statistical corpus-based analysis of the interactions, correlations, and overlap between the different meaning relations; 4) SHARel: a shared typology of textual meaning relations; and 5) a corpus of paraphrasing, textual entailment, and contradiction annotated with SHARel. Part III of the thesis gives a new perspective on the research of textual meaning relations. I show that a joint study of multiple meaning relations is both possible and beneficial for processing and analyzing each individual relation. I provide the first empirical data on the interactions between paraphrasing, textual entailment, contradiction, and semantic similarity. The research in Part III resulted in the publication of two articles.
This thesis has advanced our understanding of important issues associated with the empirical analysis, corpus annotation, and computational treatment of textual meaning relations. I have addressed existing gaps in the research field, posed new research questions, and explored novel research directions. The findings and resources presented in this dissertation have been released to the community to facilitate further research and knowledge transfer.