Dynamic Allocation of Cloud Resources for Parallel Data Processing

C. Sreejith, J.P. Angel Rajula

Abstract


Cloud computing is access to computers and their functionality via the Internet. The cloud computing paradigm spreads computation across a large number of distributed computers rather than a local computer or a single remote server. Its defining characteristics are virtualization, distribution, and dynamic extensibility. Infrastructure-as-a-Service (IaaS) cloud computing focuses on providing a computing infrastructure that leverages system virtualization to allow multiple Virtual Machines (VMs) to be consolidated on one Physical Machine (PM), where VMs often represent components of Application Environments (AEs). Ad-hoc parallel data processing has emerged as one of the killer applications for IaaS clouds. Major cloud computing companies have started to integrate frameworks for parallel data processing, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and may unnecessarily increase processing time and cost. The objective of this paper is to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution.
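
The idea of assigning tasks to different virtual machine types that are started and stopped during job execution can be illustrated with a minimal Python sketch. It is not the framework described in the paper: `Task`, `SimulatedCloud`, `run_job`, and the instance type labels "small" and "large" are hypothetical names chosen for illustration, and the cloud is simulated in memory rather than accessed through a real provider API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    vm_type: str  # hypothetical instance type label, e.g. "small" or "large"


@dataclass
class SimulatedCloud:
    """Stand-in for an IaaS API (assumption): records events, calls no provider."""
    running: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    next_id: int = 0

    def instantiate(self, vm_type):
        # Allocate a new simulated VM of the requested type.
        self.next_id += 1
        vm_id = f"vm-{self.next_id}"
        self.running[vm_id] = vm_type
        self.log.append(("start", vm_id, vm_type))
        return vm_id

    def terminate(self, vm_id):
        # Release the simulated VM so it no longer counts as running.
        vm_type = self.running.pop(vm_id)
        self.log.append(("stop", vm_id, vm_type))


def run_job(tasks, cloud):
    """Run tasks in order, keeping a VM only while consecutive tasks need its type."""
    current_id, current_type = None, None
    for task in tasks:
        if task.vm_type != current_type:
            # The next task needs a different VM type: release the old VM first.
            if current_id is not None:
                cloud.terminate(current_id)
            current_id = cloud.instantiate(task.vm_type)
            current_type = task.vm_type
        cloud.log.append(("run", current_id, task.name))
    if current_id is not None:
        cloud.terminate(current_id)
    return cloud.log


if __name__ == "__main__":
    job = [
        Task("extract", "small"),
        Task("transform", "large"),
        Task("aggregate", "large"),
        Task("load", "small"),
    ]
    for event in run_job(job, SimulatedCloud()):
        print(event)
```

In this sketch the "large" machine exists only for the two tasks that need it, which is the cost argument made above: a static, homogeneous cluster would keep the largest machine type allocated for the whole job.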


Keywords


Many-Task Computing, High-Throughput Computing, Loosely Coupled Applications, Cloud Computing.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.