A Multi Criteria Function to Concept Extraction in HTML Environment
- A. RIBEIRO
- FRESNO FERNANDEZ, VICTOR DIEGO
Publisher: CSREA Press
ISBN: 1-892512-82-3
Publication date: 2001
Pages: 1-6
Type: Conference paper
Abstract
At the core of the Internet and World Wide Web revolution is the capacity to share huge quantities of data efficiently. However, the rapid and chaotic growth of the Net has greatly complicated the task of sharing or mining useful information. Any inference process over Internet information requires an adequate characterization of Web pages. The textual part of a page is one of the most important aspects to consider when characterizing it. This textual characterization should be made by extracting an appropriate set of relevant concepts that properly represent the text included in the Web page. This paper presents a method, based essentially on extracting characteristics from the HTML language, for obtaining a set of relevant concepts from a Web page. In addition, a comparative study is presented to validate the proposed approach; it shows that the representations generated by the proposed method are of higher quality than those produced by a commercial tool.