DiSeg: Un segmentador discursivo automático para el español (DiSeg: An Automatic Discourse Segmenter for Spanish)

Cunha Fanego, Iria da; SanJuan, Éric; Torres Moreno, Juan Manuel; Lloberas, Marina; Castellón Masalles, Irene

Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 145-152

Type: Article

Abstract

Automatic discourse analysis is currently a relevant research topic. However, no discourse parsers exist for Spanish texts. The first step toward developing such a tool is discourse segmentation. In this article we present DiSeg, the first discourse segmenter for Spanish that uses the framework of Rhetorical Structure Theory (Mann and Thompson, 1988) and is based on lexical and syntactic rules. We describe the system and evaluate its output against a gold-standard corpus, obtaining promising results.
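The abstract states only that DiSeg applies lexical and syntactic rules and is evaluated against a gold-standard corpus. The Python sketch below illustrates the general idea of rule-based discourse boundary detection and boundary-level precision/recall/F1 scoring. It is a minimal sketch under loudly stated assumptions: the cue list `MARKERS`, the two toy rules, and the functions `segment_boundaries`, `split_segments` and `boundary_prf` are hypothetical, are not DiSeg's rules (which this record does not describe), and use no syntactic analysis at all; the example sentence and its "gold" boundaries are invented for illustration.

```python
from typing import List, Set, Tuple

# HYPOTHETICAL lexical cues (causal, concessive, contrastive connectives).
# This tiny list is an illustration only, NOT DiSeg's actual rule set.
MARKERS = {"porque", "aunque", "pero", "mientras"}

# Tokens that end a sentence.
SENTENCE_END = {".", "?", "!"}


def segment_boundaries(tokens: List[str]) -> Set[int]:
    """Return indices of tokens that start a new segment (toy rules).

    Index 0 is never returned: the first segment always starts there.
    """
    boundaries: Set[int] = set()
    for i in range(1, len(tokens)):
        if tokens[i - 1] in SENTENCE_END:
            # Toy rule 1: a new sentence opens a new segment.
            boundaries.add(i)
        elif tokens[i].lower() in MARKERS:
            # Toy rule 2: a cue word opens a new segment.
            boundaries.add(i)
    return boundaries


def split_segments(tokens: List[str], boundaries: Set[int]) -> List[List[str]]:
    """Cut the token list at the given boundary indices."""
    inner = sorted(b for b in boundaries if 0 < b < len(tokens))
    cuts = [0] + inner + [len(tokens)]
    return [tokens[a:b] for a, b in zip(cuts, cuts[1:])]


def boundary_prf(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exactly matching boundary positions."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented, pre-tokenized example.
    text = "Juan llegó tarde porque perdió el tren . María , pero no Pedro , llegó pronto ."
    tokens = text.split()
    predicted = segment_boundaries(tokens)
    # Invented gold boundaries: "pero no Pedro" is not a clause, so no cut there.
    gold = {3, 8}
    for segment in split_segments(tokens, predicted):
        print(" ".join(segment))
    print(boundary_prf(predicted, gold))
```

On this invented example the toy rules predict boundaries {3, 8, 10}; against the invented gold set {3, 8} that gives precision ≈ 0.67, recall = 1.0 and F1 ≈ 0.8, because the cue-word rule over-segments at a non-clausal "pero". The sketch counts only exact boundary matches; the record does not say which evaluation measure the authors actually used.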

References

Afantenos, S., P. Denis, P. Muller, and L. Danlos. 2010. “Learning Recursive Segments for Discourse Parsing”. In Proceedings of the Seventh Conference on International Language Resources and Evaluation.
Alonso, L. 2005. “Representing discourse for automatic text summarization via shallow NLP techniques”. Doctoral thesis. Barcelona: Universitat de Barcelona.
Atserias, J., B. Casas, E. Comelles, M. González, Ll. Padró, and M. Padró. 2006. “FreeLing 1.3: Syntactic and semantic services in an open-source NLP library”. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006). ELRA.
Carlson, L. and D. Marcu. 2001. Discourse Tagging Reference Manual. ISI Technical Report ISI-TR-545. Los Angeles: University of Southern California.
da Cunha, I. and M. Iruskieta. In press. “Comparing rhetorical structures of different languages: The influence of translation strategies”. Discourse Studies, 12(5).
Dale, R., E. Hovy, D. Rösner, and O. Stock (Eds.). 1992. Aspects of Automated Natural Language Generation. Berlin: Springer.
Ghorbel, H., A. Ballim, and G. Coray. 2001. “ROSETTA: Rhetorical and Semantic Environment for Text Alignment”. In P. Rayson, A. Wilson, A. M. McEnery, A. Hardie, and S. Khoja (Eds.), Proceedings of Corpus Linguistics 2001. 224-233. Lancaster, UK.
Hovy, E. 1993. “Automated discourse generation using discourse structure relations”. Artificial Intelligence, 63: 341-385.
Mann, W.C. and S.A. Thompson. 1988. “Rhetorical structure theory: Toward a functional theory of text organization”. Text, 8(3): 243-281.
Marcu, D. 2000a. The Theory and Practice of Discourse Parsing and Summarization. Cambridge, Massachusetts: MIT Press.
Marcu, D. 2000b. “The Rhetorical Parsing of Unrestricted Texts: A Surface-based Approach”. Computational Linguistics, 26(3): 395-448.
Marcu, D., L. Carlson, and M. Watanabe. 2000. “The automatic translation of discourse structures”. In Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL'00). Vol. 1. 9-17. Seattle, Washington.
Mazeiro, E.G. and T.A.S. Pardo. 2009. “Metodologia de avaliação automática de estruturas retóricas”. In Proceedings of the 7th Brazilian Symposium in Information and Human Language Technology (STIL). São Carlos, Brazil: Universidade de São Paulo.
Mazeiro, E.G., T.A.S. Pardo, and M.G.V. Nunes. 2007. “Identificação automática de segmentos discursivos: o uso do parser PALAVRAS”. Série de Relatórios do Núcleo Interinstitucional de Lingüística Computacional (NILC). São Carlos, São Paulo.
O'Donnell, M. 2000. “RSTTOOL 2.4 – A Markup Tool for Rhetorical Structure Theory”. In Proceedings of the International Natural Language Generation Conference. 253-256.
O'Donnell, M., C. Mellish, J. Oberlander, and A. Knott. 2001. “ILEX: An architecture for a dynamic Hypertext generation system”. Natural Language Engineering, 7: 225-250.
Pardo, T.A.S. and L.H.M. Rino. 2002. “DMSumm: Review and assessment”. In Proceedings of Advances in Natural Language Processing, Third International Conference (PorTAL 2002). 263-274. Faro, Portugal: Springer.
Pardo, T.A.S., M.G.V. Nunes, and L.H.M. Rino. 2004. “DiZer: An Automatic Discourse Analyzer for Brazilian Portuguese”. Lecture Notes in Artificial Intelligence, 3171: 224-234.
Pardo, T.A.S. and M.G.V. Nunes. 2008. “On the Development and Evaluation of a Brazilian Portuguese Discourse Parser”. Journal of Theoretical and Applied Computing, 15(2): 43-64.
Radev, D. 2000. “A common theory of information fusion from multiple text sources. Step one: Cross document structure”. In L. Dybkjær, K. Hasida, and D. Traum (Eds.), Proceedings of the 1st SIGdial Workshop on Discourse and Dialogue. 74-83. Hong Kong.
Soricut, R. and D. Marcu. 2003. “Sentence Level Discourse Parsing Using Syntactic and Lexical Information”. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. 149-156. Edmonton, Canada.
Sumita, K., K. Ono, T. Chino, T. Ukita, and S. Amano. 1992. “A discourse structure analyzer for Japanese text”. In Proceedings of the International Conference on Fifth Generation Computer Systems. 1133-1140.
Taboada, M. and W.C. Mann. 2005. “Applications of rhetorical structure theory”. Discourse Studies, 8(4): 567-588.
Tofiloski, M., J. Brooke, and M. Taboada. 2009. “A Syntactic and Lexical-Based Discourse Segmenter”. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics. Singapore.

Data source: Dialnet