Clasificación de páginas web en dominio específico

  1. Rangel Pardo, Francisco Manuel
  2. Peñas Padilla, Anselmo
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2008

Issue: 41

Pages: 89-96

Type: Article


Abstract

This paper presents a novel representation that yields high performance in the automatic classification of web pages in specific domains. To this end, the study focuses on obtaining a formal representation of the author's intent when conveying information through the web pages he or she creates, as reflected in the page's meta-information, its link structure, and its URL. A dataset was built in the specific domain of theater, and the proposed approach achieved a performance rating, measured both by the F-measure and by the interval of the committed error, higher than that of existing state-of-the-art methods.
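
The three information sources named in the abstract (meta-information, link structure, and URL) can be illustrated with a minimal sketch. This is not the authors' representation, which the abstract does not detail; the tokenization, the choice of `keywords` and `description` meta tags, and the names `PageSignals`, `extract_features`, and `f_measure` are all assumptions made for illustration. The `f_measure` helper shows the standard F-measure computed from true positives, false positives, and false negatives.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


class PageSignals(HTMLParser):
    """Collects tokens from meta tags and from the paths of outgoing links."""

    def __init__(self):
        super().__init__()
        self.meta_tokens = []
        self.link_tokens = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        # Assumption: only keywords/description count as meta-information.
        if tag == "meta" and name in ("keywords", "description"):
            self.meta_tokens += tokenize(attrs.get("content") or "")
        elif tag == "a" and attrs.get("href"):
            self.link_tokens += tokenize(urlparse(attrs["href"]).path)


def extract_features(url, html):
    """Return token lists for the three sources: meta, links, and URL."""
    parser = PageSignals()
    parser.feed(html)
    parsed = urlparse(url)
    return {
        "meta": parser.meta_tokens,
        "links": parser.link_tokens,
        "url": tokenize(parsed.netloc + " " + parsed.path),
    }


def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-measure from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

The resulting token lists could be fed to any standard text classifier; which features and which classifier the paper actually uses is not stated in the abstract.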