Clasificación de páginas web en dominio específico

  1. Rangel Pardo, Francisco Manuel
  2. Peñas Padilla, Anselmo
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Publication year: 2008

Issue: 41

Pages: 89-96

Type: Article

Abstract

This paper presents a novel representation that yields high performance in the automatic classification of web pages in specific domains. The study focuses on obtaining a formal representation of the author's intent in conveying information about the web pages he or she creates, as reflected in the page's meta-information, in its link structure, and in its URL. A dataset was built in the specific domain of theater, and the presented approach achieved a performance rating, measured both by the F-measure and by the error interval, higher than that of existing state-of-the-art methods.
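
The abstract names three sources of evidence (page meta-information, link structure, and URL) without detailing the representation itself. The following is only a minimal sketch of how such signals might be gathered into a single bag of features, using the Python standard library. The names `tokenize`, `PageSignals`, and `page_features`, the choice of `<title>` and `<meta>` keywords/description as "meta-information", and the `meta:`/`link:`/`url:` prefixes are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and return its runs of letters and digits."""
    return re.findall(r"[^\W_]+", text.lower())


class PageSignals(HTMLParser):
    """Collects <title> text, <meta> keywords/description, and link targets."""

    def __init__(self):
        super().__init__()
        self.meta_text = []   # strings taken from <title> and <meta> tags
        self.hrefs = []       # raw href values of <a> tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Assumption: only these two meta names are treated as meta-information.
            if (attrs.get("name") or "").lower() in ("keywords", "description"):
                self.meta_text.append(attrs.get("content") or "")
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta_text.append(data)


def page_features(url, html):
    """Return a Counter of source-prefixed tokens for one page (hypothetical scheme)."""
    parser = PageSignals()
    parser.feed(html)
    parser.close()

    features = Counter()
    for text in parser.meta_text:
        features.update("meta:" + t for t in tokenize(text))
    for href in parser.hrefs:
        # Only the path of each outgoing link is tokenized here.
        features.update("link:" + t for t in tokenize(urlparse(href).path))
    parsed = urlparse(url)
    features.update("url:" + t for t in tokenize(parsed.netloc + parsed.path))
    return features
```

A call such as `page_features(page_url, page_html)` returns a `Counter` that could be passed to any standard text classifier. Prefixing each token with its source keeps, for example, a word found in the URL distinct from the same word found in a meta tag, which is one simple way to let a classifier weight the three sources differently.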