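Evaluación del clustering de páginas web mediante funciones de peso y combinación heurística de criterios [Evaluation of web page clustering using term weighting functions and a heuristic combination of criteria]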

  1. Casillas Rubio, Arantza
  2. Fresno Fernández, Víctor
  3. Martínez Unanue, Raquel
  4. Montalvo Herranz, Soto
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2005

Issue: 35

Pages: 417-424

Type: Article

Other publications in: Procesamiento del lenguaje natural

Abstract

Web page clustering can help in evaluating and searching the results returned by search engines, among other applications. The term weighting function applied to the features selected to represent web pages is a key aspect of the clustering task. In this paper, seven different term weighting functions are evaluated by means of the results of a partitioning clustering algorithm on a reference web page collection. Five of the functions are well-known term weighting functions from text content analysis; the other two are based on a heuristic combination of criteria that takes HTML mark-up information into account. These two representations were proposed in previous works by one of the authors. In addition, two feature reduction methods are applied. We have verified that the best results are obtained when the term weighting function based on a fuzzy combination of criteria is used.