Clasificación de páginas web en dominio específico

  1. Rangel Pardo, Francisco Manuel
  2. Peñas Padilla, Anselmo
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2008

Issue: 41

Pages: 89-96

Type: Article

Other publications in: Procesamiento del lenguaje natural


This paper presents a novel representation that achieves high performance in the automatic classification of web pages in specific domains. To this end, the study focuses on obtaining a formal representation of the author's intent in conveying information through the web pages he or she creates, an intent that is reflected in the page's meta-information, in its link structure, and in its URL. A dataset was built in the specific domain of theater, and the proposed approach obtained a performance rating, measured both by the F statistic and by the interval of the error committed, higher than that of existing state-of-the-art methods.
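
The abstract names three sources of evidence (meta-information, link structure, URL) but does not describe how they are encoded. As a rough illustration only, and not the authors' actual representation, the following Python sketch shows one way such signals might be collected from a single page. The names `PageSignalParser`, `tokenize`, and `extract_features`, the choice of the `keywords` and `description` meta tags, and the internal/external link counts are all assumptions introduced here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re


class PageSignalParser(HTMLParser):
    """Hypothetical helper: collects meta-tag content and link targets."""

    def __init__(self):
        super().__init__()
        self.meta_text = []  # content of <meta name="keywords|description">
        self.links = []      # href values of <a> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("keywords", "description"):
            self.meta_text.append(attrs.get("content") or "")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-záéíóúüñ0-9]+", text.lower())


def extract_features(url, html):
    """Return an assumed feature dictionary built from URL, meta tags, and links."""
    parser = PageSignalParser()
    parser.feed(html)
    parsed = urlparse(url)

    # A link counts as internal if it is relative or points to the same host.
    internal = 0
    for href in parser.links:
        link_host = urlparse(href).netloc
        if link_host == "" or link_host == parsed.netloc:
            internal += 1

    return {
        "meta_tokens": tokenize(" ".join(parser.meta_text)),
        "url_tokens": tokenize(parsed.netloc + " " + parsed.path),
        "internal_links": internal,
        "external_links": len(parser.links) - internal,
    }
```

A feature dictionary of this kind could then be passed to any standard text classifier; the classifier, the feature weighting, and the evaluation procedure used in the paper are not specified in this record.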