DiSeg: Un segmentador discursivo automático para el español

  1. Cunha Fanego, Iria da
  2. SanJuan, Éric
  3. Torres Moreno, Juan Manuel
  4. Lloberas, Marina
  5. Castellón Masalles, Irene
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 145-152

Type: Article

Abstract

Discourse parsing is currently a prominent research topic, yet no discourse parser exists for Spanish texts. The first step in developing such a tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish; it follows the framework of Rhetorical Structure Theory (Mann and Thompson, 1988) and is based on lexical and syntactic rules. We describe the system and evaluate its performance against a gold standard corpus, obtaining promising results.

Bibliographic References

  • Afantenos, S., P. Denis, P. Muller, and L. Danlos. 2010. “Learning Recursive Segments for Discourse Parsing”. In Proceedings of the Seventh conference on International Language Resources and Evaluation.
  • Alonso, L. 2005. “Representing discourse for automatic text summarization via shallow NLP techniques”. PhD thesis. Barcelona: Universitat de Barcelona.
  • Atserias, J., B. Casas, E. Comelles, M. González, Ll. Padró, and M. Padró. 2006. “FreeLing 1.3: Syntactic and semantic services in an open-source NLP library”. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), ELRA.
  • Carlson, L. and D. Marcu. 2001. Discourse Tagging Reference Manual. ISI Technical Report ISI-TR-545. Los Angeles: University of Southern California.
  • da Cunha, I. and M. Iruskieta (in press). “Comparing rhetorical structures of different languages: The influence of translation strategies”. Discourse Studies 12(5).
  • Dale, R., E. Hovy, D. Rösner, and O. Stock (Eds.). 1992. Aspects of Automated Natural Language Generation. Berlin: Springer.
  • Ghorbel, H., A. Ballim, and G. Coray. 2001. “ROSETTA: Rhetorical and Semantic Environment for Text Alignment”. In P. Rayson, A. Wilson, A. M. McEnery, A. Hardie, and S. Khoja (Eds.). Proceedings of Corpus Linguistics 2001. 224-233. Lancaster, UK.
  • Hovy, E. 1993. “Automated discourse generation using discourse structure relations”. Artificial Intelligence, 63. 341-385.
  • Mann, W.C. and S.A. Thompson. 1988. “Rhetorical structure theory: Toward a functional theory of text organization”. Text, 8(3): 243-281.
  • Marcu, D. 2000a. The Theory and Practice of Discourse Parsing and Summarization. Cambridge, Massachusetts: The MIT Press.
  • Marcu, D. 2000b. “The Rhetorical Parsing of Unrestricted Texts: A Surface-based Approach”. Computational Linguistics, 26(3): 395-448.
  • Marcu, D., L. Carlson, and M. Watanabe. 2000. “The automatic translation of discourse structures”. In Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL'00). Vol. 1. 9-17. Seattle, Washington.
  • Maziero, E.G. and T.A.S. Pardo. 2009. “Metodologia de avaliação automática de estruturas retóricas”. In Proceedings of the 7th Brazilian Symposium in Information and Human Language Technology (STIL). São Carlos, Brazil: Universidade de São Paulo.
  • Maziero, E.G., T.A.S. Pardo, and M.G.V. Nunes. 2007. “Identificação automática de segmentos discursivos: o uso do parser PALAVRAS”. Série de Relatórios do Núcleo Interinstitucional de Lingüística Computacional (NILC). São Carlos, São Paulo.
  • O’Donnell, M. 2000. “RSTTOOL 2.4 – A Markup Tool for Rhetorical Structure Theory”. In Proceedings of the International Natural Language Generation Conference. 253-256.
  • O’Donnell, M., C. Mellish, J. Oberlander, and A. Knott. 2001. “ILEX: An architecture for a dynamic Hypertext generation system”. Natural Language Engineering, 7. 225-250.
  • Pardo, T.A.S. and L.H.M. Rino. 2002. “DMSumm: Review and assessment”. In Proceedings of Advances in Natural Language Processing, Third International Conference (PorTAL 2002). 263-274. Faro, Portugal: Springer.
  • Pardo, T.A.S., M.G.V. Nunes, and L.H.M. Rino. 2004. “DiZer: An Automatic Discourse Analyzer for Brazilian Portuguese”. Lecture Notes in Artificial Intelligence, 3171: 224-234.
  • Pardo, T.A.S. and M.G.V. Nunes. 2008. “On the Development and Evaluation of a Brazilian Portuguese Discourse Parser”. Journal of Theoretical and Applied Computing, 15(2): 43-64.
  • Radev, D. 2000. “A common theory of information fusion from multiple text sources. Step one: Cross document structure”. In L. Dybkjær, K. Hasida, and D. Traum (Eds.). Proceedings of 1st SIGdial Workshop on Discourse and Dialogue. 74-83. Hong Kong.
  • Soricut, R. and D. Marcu. 2003. “Sentence Level Discourse Parsing Using Syntactic and Lexical Information”. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. 149-156. Edmonton, Canada.
  • Sumita, K., K. Ono, T. Chino, T. Ukita, and S. Amano. 1992. “A discourse structure analyzer for Japanese text”. In Proceedings of the International Conference on Fifth Generation Computer Systems. 1133-1140.
  • Taboada, M. and W.C. Mann. 2005. “Applications of rhetorical structure theory”. Discourse Studies, 8(4): 567-588.
  • Tofiloski, M., J. Brooke, and M. Taboada. 2009. “A Syntactic and Lexical-Based Discourse Segmenter”. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics. Singapore.