
PageRank using MapReduce - an Open-Source Framework for Processing Large Data Sets

N. Rehna, N. Minni, F. Jasmine Natchial

Abstract


MapReduce is a simple data-parallel programming model for processing and generating large data sets, designed for scalability and fault tolerance. It was initially created by Google to simplify the development of large-scale web search applications in data centers and has been proposed as the basis of a "data center computer". Many real-world tasks are expressible in this model. This paper introduces a PageRank algorithm for a hyperlink graph, implemented with the MapReduce technique and illustrated with the random web surfer model. The algorithm computes the PageRank of web pages that are distributed in the cloud. In this work, the Hyperlink Graph PageRank (HGPR) algorithm is developed; with it, the PageRanks can be computed and the most visited web pages can then be identified. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines, which allows programmers without any experience in parallel and distributed systems to use the resources of a large distributed system easily. The MapReduce implementation is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use.
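As a rough illustration of how a PageRank pass over an adjacency list can be phrased as a map step and a reduce step, the following is a minimal single-process sketch. It is not the HGPR algorithm itself, whose details the abstract does not give. The damping factor of 0.85, the function names, the toy graph, and the assumption that every page has at least one out-link (no dangling pages) are all illustrative choices, not taken from the paper.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed value; the abstract does not state one


def map_phase(page, rank, out_links):
    """Emit the page's adjacency list, plus an equal rank share per out-link."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("rank", rank / len(out_links))


def reduce_phase(page, values, num_pages):
    """Sum the incoming rank shares and apply the random-surfer formula."""
    out_links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            incoming += payload
    new_rank = (1.0 - DAMPING) / num_pages + DAMPING * incoming
    return page, new_rank, out_links


def pagerank_iteration(graph, ranks):
    """Run one map -> shuffle -> reduce pass inside a single process."""
    grouped = defaultdict(list)  # stands in for the framework's shuffle step
    for page, out_links in graph.items():
        for key, value in map_phase(page, ranks[page], out_links):
            grouped[key].append(value)
    n = len(graph)
    return {page: reduce_phase(page, grouped[page], n)[1] for page in graph}


def pagerank(graph, iterations=30):
    """Start from a uniform distribution and repeat the pass a fixed number of times."""
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks)
    return ranks


# Hypothetical three-page hyperlink graph as an adjacency list.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
final_ranks = pagerank(toy_graph)
most_visited = max(final_ranks, key=final_ranks.get)
```

In a real MapReduce deployment the grouping by key is done by the framework's shuffle, and each iteration is typically a separate job whose output feeds the next; the dictionary used here only imitates that behaviour on one machine.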

Keywords


Adjacency List, Cloud Computing, Damping Factor, HGPR Algorithm, MapReduce, PageRank (PR)






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.