Enhancing HiveQL Engine Using Map-Join-Reduce
Abstract
We are facing an information explosion, which brings the challenge of building systems that can handle very large volumes of data. Hive is a data warehouse infrastructure built on the Hadoop platform. It provides mechanisms for organizing large data sets, extracting data with MapReduce, and analyzing large data sets stored in HDFS.
HiveQL is the query language Hive uses for data extraction. In addition to its built-in MapReduce functionality, it allows custom MapReduce functions to be plugged in. We consider enhancing this HiveQL MapReduce execution with Map-Join-Reduce, which leads to a detailed study of the performance improvement. The MapReduce processing strategy frequently checkpoints and shuffles intermediate result data. MapReduce can be made more scalable and efficient by improving how this intermediate data is handled.
The proposed solution is Map-Join-Reduce. Map-Join-Reduce simplifies data handling by removing the burden of implementing complex join algorithms. We first present UML class diagrams for the HiveQL engine, which shed light on the HiveQL query execution process. We then describe the debugging issues encountered while reverse engineering the Hive system and the Hive build patch provided for the errors. Finally, we present the proposed Map-Join-Reduce solution.
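The map, join, and reduce stages named above can be illustrated with a minimal in-memory sketch. It is not the Hive implementation or the paper's algorithm: it assumes every data set is joined on one shared key, runs in a single process, and uses hypothetical function names (`map_phase`, `join_phase`, `reduce_phase`) and made-up sample data. The point it illustrates is that the joined tuples flow straight into the aggregation step instead of being written out as the result of a separate join job.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, key_fn, value_fn):
    """Map: emit one (join_key, value) pair per input record of a data set."""
    return [(key_fn(r), value_fn(r)) for r in records]


def join_phase(*mapped_datasets):
    """Join: inner equi-join of all mapped data sets on the shared key.

    Simplifying assumption: every data set uses the same join key.
    """
    grouped = []
    for pairs in mapped_datasets:
        by_key = defaultdict(list)
        for key, value in pairs:
            by_key[key].append(value)
        grouped.append(by_key)

    # Only keys present in every data set survive an inner join.
    common_keys = set(grouped[0]).intersection(*grouped[1:])

    joined = []
    for key in sorted(common_keys):
        # Combine every matching value from each data set for this key.
        for combo in product(*(g[key] for g in grouped)):
            joined.append((key, combo))
    return joined


def reduce_phase(joined, group_fn, value_fn):
    """Reduce: sum value_fn over the joined tuples, grouped by group_fn."""
    totals = defaultdict(float)
    for key, combo in joined:
        totals[group_fn(key, combo)] += value_fn(key, combo)
    return dict(totals)


# Hypothetical sample data: (customer_id, nation) and (customer_id, amount).
customers = [(1, "IN"), (2, "US"), (3, "IN")]
orders = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5), (4, 99.0)]

mapped_customers = map_phase(customers, lambda r: r[0], lambda r: r[1])
mapped_orders = map_phase(orders, lambda r: r[0], lambda r: r[1])

# The joined tuples are passed directly to the reduce step.
joined = join_phase(mapped_customers, mapped_orders)
totals = reduce_phase(joined, lambda k, c: c[0], lambda k, c: c[1])
print(totals)
```

In this sketch the order for customer 4 is dropped by the inner join because no matching customer record exists, and the remaining amounts are summed per nation.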
References
“Map-Join-Reduce: Toward Scalable and Efficient Data Analysis on Large Clusters”, Dawei Jiang, Anthony K. H. Tung, and Gang Chen, IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 9, September 2011.
For Java methods http://www.tutorialspoint.com/java/lang/java_lang_runtime.htm
DeveloperGuide -Apache Hive -Apache Software Foundation https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide
For Hive building https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
For Hive debugging https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode
For Hive related queries and fixes http://stackoverflow.com/
Previously published in CiiT International Journal, 2013, http://www.ciitresearch.org
Previously published paper in IPGCON Conference 2015 http://avcoe.org/iPGCON2015/ipgcon2015.html
Hadoop in Action, Chuck Lam, Volume 1
“A Comparison of Join Algorithms for Log Processing in MapReduce”, Spyros Blanas, Jignesh M. Patel, Vuk Ercegovac, Jun Rao, Eugene J. Shekita, Yuanyuan Tian, SIGMOD’10, June 6–11, 2010, Indianapolis, Indiana, USA. Copyright 2010 ACM 978-1-4503-0032-2/10/06.
“Optimizing Joins in a Map-Reduce Environment”, Foto N. Afrati, Jeffrey D. 22-26, 2010
For class diagram http://www.genmymodel.com/class-diagram-online
This work is licensed under a Creative Commons Attribution 3.0 License.