
An Improved Focussed Web Crawler Algorithm: A Survey

Rachna Singh Thakur, Dr. Pragya Shukla, Nilima Karankar

Abstract


With the rapid growth of information on the World Wide Web, the challenges facing focussed crawlers are also increasing. In this paper we present a survey of focussed crawler approaches published over the past few years, together with their limitations. We also discuss the focussed crawler architecture and its types. Taking into account the challenges and issues of some of these approaches, we propose a focussed crawler algorithm for improving relevance prediction. As a result, a search system is developed that pre-evaluates the search results returned by the search engine. In addition, the task is treated as a classification problem, in which classification is used to find the nearest URLs that contain data of interest to the user.
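
To make the idea of relevance prediction concrete, the following is a minimal sketch of a best-first focussed crawl, not the algorithm proposed in the paper (the abstract does not specify it). It assumes a plain term-frequency cosine similarity as the relevance score, an in-memory link graph instead of real HTTP fetches, and hypothetical names throughout (`focused_crawl`, `pages`, `threshold`, `limit`).

```python
import heapq
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def focused_crawl(seed, topic, pages, threshold=0.1, limit=10):
    """Best-first crawl over an in-memory link graph (illustrative only).

    `pages` maps url -> (page_text, [(link_url, anchor_text), ...]).
    Unvisited links are queued with priority equal to the similarity of
    their anchor text to the topic; a visited page is kept only if its
    own text scores at least `threshold`.
    """
    topic_vec = tf_vector(topic)
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    relevant = []
    while frontier and len(relevant) < limit:
        _, url = heapq.heappop(frontier)
        text, links = pages.get(url, ("", []))
        if cosine(tf_vector(text), topic_vec) >= threshold:
            relevant.append(url)
        for link, anchor in links:
            if link not in seen:
                seen.add(link)
                score = cosine(tf_vector(anchor), topic_vec)
                heapq.heappush(frontier, (-score, link))
    return relevant
```

In this sketch the anchor-text score plays the role of the relevance prediction made before a page is fetched, and the page-text score plays the role of the relevance check made afterwards. A trained classifier could replace either score without changing the frontier logic.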


Keywords


Focussed Web Crawler, Search Engine, Uniform Resource Locator.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.