
Enhancing HiveQL Engine Using Map-Join-Reduce

Amruta Kulkarni, Shweta Dharmadhikari

Abstract


We are facing an information explosion, which brings the challenge of building systems that can handle huge volumes of data. Hive is a data warehouse infrastructure built on the Hadoop platform. It provides mechanisms for organizing large data sets stored in HDFS, extracting data from them with MapReduce, and analyzing them.

HiveQL is Hive's query language for data extraction. In addition to its built-in MapReduce functionality, it allows custom MapReduce functions to be plugged in. We consider this HiveQL MapReduce layer as the target for a Map-Join-Reduce enhancement, which calls for a detailed study of the possible performance improvement. The MapReduce processing strategy frequently checkpoints and shuffles intermediate results. MapReduce can be made more scalable and efficient by improving its strategy for handling intermediate data.
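To make the cost of intermediate data concrete, the following is a minimal sketch of a join expressed in the plain map, shuffle, and reduce style. It is an illustration only, not Hive's or Hadoop's implementation; the function names and the `orders` and `customers` tables are hypothetical. It shows that every input record is tagged and passed through the shuffle before any join output is produced.

```python
from collections import defaultdict

def map_phase(table_name, rows, key_index):
    """Tag each row with its source table and emit it under its join key."""
    for row in rows:
        yield row[key_index], (table_name, row)

def shuffle(pairs):
    """Group intermediate pairs by key (on a cluster this moves data over the network)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_join(groups, left_name, right_name):
    """For each key, pair every left row with every right row."""
    for key, tagged in groups.items():
        left = [row for tag, row in tagged if tag == left_name]
        right = [row for tag, row in tagged if tag == right_name]
        for l in left:
            for r in right:
                yield key, l, r

def reduce_side_join(orders, customers):
    """Join two hypothetical tables on their first column.

    Returns the sorted join result and the number of shuffled records.
    """
    pairs = list(map_phase("orders", orders, 0))
    pairs += list(map_phase("customers", customers, 0))
    joined = sorted(reduce_join(shuffle(pairs), "orders", "customers"))
    return joined, len(pairs)
```

In this sketch the shuffled record count always equals the total input size, including rows that never find a join partner, which is the kind of intermediate data handling the abstract identifies as a target for improvement.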

The proposed solution is Map-Join-Reduce. Map-Join-Reduce simplifies data handling by removing the burden of implementing complex join algorithms. We first present UML class diagrams for the HiveQL engine, which illustrate the HiveQL query execution process. We then present the issues encountered while debugging Hive for reverse engineering, along with the Hive build patch that addresses the errors. Finally, we present the proposed solution for Map-Join-Reduce.
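As a rough illustration of the idea, the sketch below runs a query as three explicit stages: a map stage that filters rows, a join stage that combines two data sets, and a reduce stage that aggregates. It is a single-process sketch under stated assumptions, not the authors' design or the HiveQL engine's code; all function names, the tuple layouts, and the `orders` and `customers` data are hypothetical.

```python
from collections import defaultdict

def map_filter(rows, predicate):
    """Map stage: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def join(left_rows, right_rows, left_key, right_key):
    """Join stage: hash join that concatenates matching tuples."""
    index = defaultdict(list)
    for r in right_rows:
        index[r[right_key]].append(r)
    for l in left_rows:
        for r in index.get(l[left_key], []):
            yield l + r

def reduce_aggregate(joined_rows, group_index, value_index):
    """Reduce stage: sum one column per group."""
    totals = defaultdict(int)
    for row in joined_rows:
        totals[row[group_index]] += row[value_index]
    return dict(totals)

def total_by_customer(orders, customers, min_amount):
    """Hypothetical query: total order amount per customer name.

    orders are (customer_id, amount); customers are (customer_id, name).
    """
    kept = map_filter(orders, lambda o: o[1] >= min_amount)
    joined = join(kept, customers, 0, 0)  # rows: (id, amount, id, name)
    return reduce_aggregate(joined, group_index=3, value_index=1)
```

Because the join is its own stage here, the query author supplies only a filter predicate and an aggregation, rather than encoding the join logic inside map and reduce functions.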


Keywords


Hadoop, Hive, HiveQL


References


“Map-Join-Reduce: Toward Scalable and Efficient Data Analysis on Large Clusters”, Dawei Jiang, Anthony K. H. Tung, and Gang Chen, IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 9, September 2011.

For Java methods: http://www.tutorialspoint.com/java/lang/java_lang_runtime.htm

DeveloperGuide - Apache Hive - Apache Software Foundation: https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide

For Hive building https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ

For Hive debugging https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode

For Hive-related queries and fixes: http://stackoverflow.com/

Previously published in the CiiT International Journal, 2013: http://www.ciitresearch.org

Previously published paper in the iPGCON Conference 2015: http://avcoe.org/iPGCON2015/ipgcon2015.html

Hadoop in Action, Chuck Lam, Volume 1.

“A Comparison of Join Algorithms for Log Processing in MapReduce”, Spyros Blanas, Jignesh M. Patel, Vuk Ercegovac, Jun Rao, Eugene J. Shekita, Yuanyuan Tian, SIGMOD’10, June 6–11, 2010, Indianapolis, Indiana, USA. Copyright 2010 ACM 978-1-4503-0032-2/10/06.

“Optimizing Joins in a Map-Reduce Environment”, Foto N. Afrati, Jeffrey D. 22-26, 2010

For class diagrams: http://www.genmymodel.com/class-diagram-online




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.