Automatic generation of descriptive related work reports

  1. ABURA'ED, AHMED GHASSAN TAWFIQ
Supervised by:
  1. Horacio Saggion Director

University of defense: Universitat Pompeu Fabra

Date of defense: 16 October 2020

Committee:
  1. Elena Lloret Pastor Chair
  2. Iria da Cunha Fanego Secretary
  3. Leïla Kosseim Committee member

Type: Thesis

Teseo: 638113 DIALNET

Abstract

A related work report is a text that integrates key information from a list of related scientific papers, providing context for the work being presented. Related work reports can be either descriptive or integrative. Integrative related work reports provide a high-level overview and critique of the scientific papers by comparing them with each other, giving fewer details about individual studies. Descriptive related work reports instead provide more in-depth information about each mentioned study, such as the methods and results of the cited works. To write a related work report, scientists have to identify, condense or summarize, and combine relevant information from different scientific papers. This task is complicated by the volume of scientific papers available. In this context, the automatic generation of related work reports is an important problem to tackle. It can be considered an instance of the multi-document summarization problem: given a list of scientific papers, the main objective is to automatically summarize them and generate a related work report. To study the problem of related work generation, we have developed a manually annotated, machine-readable dataset of related work sections, cited papers (i.e. references) and sentences, together with an additional layer of papers citing the references. We have also investigated the relation between a citation context in a citing paper and the scientific paper it cites, so as to properly model cross-document relations. Moreover, we have investigated the identification of explicit and implicit citations to a given scientific paper, an important task in several scientific text mining activities such as citation purpose identification, scientific opinion mining, and scientific summarization.
We present both extractive and abstractive methods to summarize a list of scientific papers by utilizing their citation network. The extractive approach follows three stages: scoring the sentences of the scientific papers based on their citation network, selecting sentences from each scientific paper to be mentioned in the related work report, and generating an organized related work report by grouping together the sentences of the scientific papers that belong to the same topic. The abstractive approach, by contrast, attempts to generate citation sentences to be included in a related work report, taking advantage of current sequence-to-sequence neural architectures and resources that we have created specifically for this task. The thesis also presents and discusses automatic and manual evaluations of the generated related work reports, which show the viability of the proposed approaches.
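
The three stages of the extractive approach (score, select, group by topic) can be illustrated with a minimal sketch. This is not the method used in the thesis: as a stand-in assumption, sentences are scored by Jaccard word overlap with the citation contexts of citing papers, and topics are assigned by overlap with hand-written keyword sets. All function names, parameters, and the toy data are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {w.strip(".,;:()").lower() for w in text.split()} - {""}


def score_sentences(sentences, citation_contexts):
    """Stage 1 (assumed scoring): Jaccard overlap with the citation contexts."""
    context_words = set()
    for ctx in citation_contexts:
        context_words |= tokenize(ctx)
    scores = []
    for sent in sentences:
        words = tokenize(sent)
        union = words | context_words
        scores.append(len(words & context_words) / len(union) if union else 0.0)
    return scores


def select_sentences(sentences, scores, k=1):
    """Stage 2: keep the k highest-scoring sentences of one paper."""
    ranked = sorted(zip(scores, sentences), key=lambda p: p[0], reverse=True)
    return [sent for _, sent in ranked[:k]]


def group_by_topic(selected, topic_keywords):
    """Stage 3 (assumed topic model): assign each sentence to the topic
    whose keyword set it overlaps most, then group by topic."""
    groups = defaultdict(list)
    for paper_id, sent in selected:
        words = tokenize(sent)
        topic = max(topic_keywords, key=lambda t: len(words & topic_keywords[t]))
        groups[topic].append((paper_id, sent))
    return dict(groups)


def generate_report(papers, contexts, topic_keywords, k=1):
    """Run the three stages over every paper and return topic -> sentences."""
    selected = []
    for paper_id, sentences in papers.items():
        scores = score_sentences(sentences, contexts.get(paper_id, []))
        for sent in select_sentences(sentences, scores, k):
            selected.append((paper_id, sent))
    return group_by_topic(selected, topic_keywords)


# Toy input: two cited papers, one citation context each (all invented).
papers = {
    "A": ["We train a neural parser on treebanks.", "The weather was nice."],
    "B": ["We cluster documents with topic models.", "Funding came from a grant."],
}
contexts = {
    "A": ["Smith trained a neural parser."],
    "B": ["Jones used topic models for clustering documents."],
}
topic_keywords = {
    "parsing": {"parser", "treebanks"},
    "clustering": {"cluster", "topic", "models"},
}

report = generate_report(papers, contexts, topic_keywords)
for topic, items in report.items():
    print(topic, items)
```

In this toy run, each paper contributes the sentence that best matches how other papers cite it, and the selected sentences end up under separate topics. A realistic system would replace both the overlap score and the keyword sets with learned models.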