Augmenting Web Page Classifiers with Social Annotations

  1. Zubiaga, Arkaitz
  2. Martínez Unanue, Raquel
  3. Fresno Fernández, Víctor
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2011

Issue: 47

Pages: 189-196

Type: Article

Abstract

Many web documents lack representative textual content, which motivates the study of additional metadata to improve web page classification. Social bookmarking sites offer an accessible source of such metadata in large amounts, in the form of user-provided annotations, yet their use for this task remains relatively unexplored. In this work, we analyze the usefulness of social annotations for web page classification. We evaluate the results at two different categorization levels and analyze their suitability for home pages and deeper pages. We conclude that social annotations could enhance web page classifiers in multiple cases, and we present a method to get the most out of them using classifier committees.
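
To make the idea of a classifier committee concrete, the following is a minimal sketch and not the method of the article itself, which the abstract does not detail. It assumes that each committee member (for example, one classifier trained on page content and another on social annotations) outputs a confidence score per category, and that the committee sums those scores and picks the highest total. The function name `committee_predict`, the category names, and all score values are hypothetical.

```python
from typing import Dict, List

# Assumption: each member reports one confidence score per category
# (for instance an SVM margin); higher means more confident.
Scores = Dict[str, float]


def committee_predict(member_scores: List[Scores]) -> str:
    """Sum each category's scores across members and return the top category."""
    if not member_scores:
        raise ValueError("the committee needs at least one member")
    totals: Dict[str, float] = {}
    for scores in member_scores:
        for category, score in scores.items():
            totals[category] = totals.get(category, 0.0) + score
    # The category with the largest summed score wins.
    return max(totals, key=totals.get)


# Hypothetical scores for a single web page from two committee members.
content_scores = {"Arts": 0.2, "Science": 0.5, "Sports": -0.4}
tag_scores = {"Arts": 0.9, "Science": 0.1, "Sports": -0.6}

prediction = committee_predict([content_scores, tag_scores])
print(prediction)
```

In this made-up example the content-based member alone would choose "Science", while the strong score from the annotation-based member shifts the combined decision to "Arts". Summing scores is only one possible combination rule; majority voting or weighted sums are common alternatives.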
