Fault-Tolerant Scheduling Techniques for Computational Grid

S. Uma Maheswari, A. Shamila Ebenezer

Abstract


Grids are dynamic: resources may enter and leave at any time, in many cases outside the application's control. Grid resources are also heterogeneous, so many grid applications run in environments where interaction faults are more likely to occur between disparate grid nodes. Because resources may also be used outside organizational boundaries, it becomes increasingly difficult to guarantee that a resource being used is not malicious. Given these diverse faults and failure conditions, developing, deploying, and executing long-running applications over the grid remains a challenge. Fault-tolerant scheduling is therefore an imperative step for large-scale computational grid systems, in which geographically distributed nodes often cooperate to execute a task. One motivation of grid computing is to aggregate the power of widely distributed resources and provide non-trivial services to users; an efficient grid scheduling system is essential to achieving this goal. This paper presents an extensive survey of fault-tolerant scheduling techniques: the Distributed Fault-Tolerant Scheduling (DFTS) algorithm, the Volunteer Availability based Fault-Tolerant Scheduling (VAFTS) algorithm, the Reliability Cost Driven (RCD) scheduling algorithm, the Dynamic Reliability-Cost-Driven (DRCD) scheduling algorithm, the efficient fault-tolerant scheduling algorithm (eFRD), the contention-aware fault-tolerant (CAFT) scheduling algorithm, and the efficient Fault-tolerant Reliability Cost Driven (eFRCD) algorithm.

Keywords


Fault, Fault-tolerance, Fault Tolerant Scheduling, Single Resource Manager, Job placement, Replica management.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.