Low Cost Coordinated Checkpointing Algorithm for Mobile Distributed Systems

Surender Kumar, R.K. Chauhan, Parveen Kumar

Abstract


Checkpointing is an efficient fault tolerance technique used in distributed systems. Mobile distributed systems pose additional challenges, such as low bandwidth, mobility, lack of stable storage, frequent disconnections and limited battery life, so fault tolerance techniques designed for distributed systems cannot be applied directly to mobile distributed systems. Hence, checkpointing algorithms that use fewer coordination messages and take close to the minimum number of checkpoints are preferred for mobile environments. However, these two goals are at odds with each other. The time-based approach uses fewer coordination messages but takes more checkpoints than the minimum required. The coordinated checkpointing approach, on the other hand, takes fewer checkpoints than the time-based approach, close to the minimum, but requires more coordination messages. Our proposed coordinated checkpointing approach uses time to coordinate indirectly, which minimizes the number of coordination messages transmitted over the wireless link and reduces the number of checkpoints to near the minimum. The algorithm is non-blocking and forces only the minimum number of dependent processes to take checkpoints.
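
The idea of forcing only dependent processes to checkpoint can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details are not given in this abstract; it only shows the general minimum-process notion, assuming a hypothetical map `depends_on` that records, for each process, the processes from which it has received messages since its last checkpoint. The function name and the process labels are likewise illustrative.

```python
from collections import deque


def minimum_checkpoint_set(initiator, depends_on):
    """Return the processes that must checkpoint together with the initiator.

    depends_on[p] is assumed to hold the processes from which p has received
    a message since its last checkpoint (hypothetical representation).
    """
    required = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        # Every sender that p depends on must also checkpoint; otherwise p's
        # checkpoint would record a receive whose send is not recorded.
        for q in depends_on.get(p, ()):
            if q not in required:
                required.add(q)
                queue.append(q)
    return required


# Illustrative dependencies: P1 received from P2, P2 from P3, P4 from P1.
deps = {"P1": {"P2"}, "P2": {"P3"}, "P3": set(), "P4": {"P1"}}
print(sorted(minimum_checkpoint_set("P1", deps)))  # ['P1', 'P2', 'P3']
```

In this example P4 is not forced to checkpoint, because the initiator P1 does not depend, directly or transitively, on any message sent by P4.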

Keywords


Fault Tolerance, Distributed Systems, Mobile Systems, Checkpointing, Consistent Global State, and Coordinated Checkpointing.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.