Algoritmo de clustering on-line utilizando metaheurísticas y técnicas de muestreo [On-line clustering algorithm using metaheuristics and sampling techniques]

Casillas Rubio, Arantza; Martínez Unanue, Raquel; González de Lena, Mª Teresa

Journal:

Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2003

Issue: 31

Pages: 57-64

Type: Article

Abstract

Document clustering divides a set of documents into separate clusters (subsets) so that each document is similar to the other documents in its cluster and less similar to, or different from, documents in other clusters. Under certain conditions clustering is a computationally expensive task, for example when working with a very large collection of documents without prior knowledge of the appropriate number of clusters. Moreover, if a solution is needed within a few seconds, conventional methods for calculating the optimal number of clusters are unacceptable. In this paper we propose an algorithm for clustering a set of documents without prior knowledge of the appropriate number of clusters. The emphasis is on reducing computation time, which is why we can say that our algorithm achieves on-line clustering. Our algorithm combines a global stopping rule, genetic algorithms, statistical sampling techniques, and a classic clustering algorithm.
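
The general idea of estimating the number of clusters on a sample, scoring each candidate with a global stopping rule, can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the abstract does not name the stopping rule or the classic clustering algorithm, so the Calinski-Harabasz index and k-means are assumed here; documents are assumed to be numeric vectors (tuples) compared by squared Euclidean distance; and a plain scan over candidate values of k stands in for the genetic search. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations (assumed 'classic' algorithm); returns labels."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels

def calinski_harabasz(points, labels, k):
    """Assumed stopping rule: between-cluster dispersion over within-cluster
    dispersion, each scaled by its degrees of freedom. Requires len(points) > k."""
    n = len(points)
    overall = mean(points)
    between = within = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        center = mean(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

def estimate_k(docs, sample_size, k_range, seed=0):
    """Score each candidate k on a random sample and return the best one.
    A simple scan replaces the genetic search mentioned in the abstract."""
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    best_k, best_score = None, float("-inf")
    for k in k_range:
        score = calinski_harabasz(sample, kmeans(sample, k), k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

Calling `estimate_k(docs, sample_size=200, k_range=range(2, 11))` on a list of document vectors returns the candidate k with the highest index on the sample; the full collection would then be clustered once with that k. Working on a sample is what keeps the cost low, at the price of a noisier estimate when the sample is small.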

Data source: Dialnet