New demo video online: Checkpointing and restart

We are pleased to announce that another new demo video is online: the checkpointing demo by John Mehnert-Spahn from the University of Duesseldorf.

All our technical video demos are available here.


XtreemOS at Linux Symposium

The Linux Symposium 2009 was held in Montreal, Canada, from July 13th to July 17th.


John Mehnert-Spahn (UDUS) gave a 45-minute technical presentation on his work "Incremental Checkpointing for Grids" on July 15th. John explained the motivation behind incremental checkpointing, its implementation in Linux SSI, and its integration in XtreemOS. The audience showed considerable interest in the implementation of incremental checkpointing.

Linux Symposium 2009 - John

On July 17th, Surbhi Chitre (INRIA) held a BoF session on XtreemOS. She explained why XtreemOS is attractive and how it is a complete solution for grids. She also described her work on integrating OpenVZ into XtreemOS, why OpenVZ was chosen, and the challenges involved. Her talk also included a brief introduction to XtreemFS features. It was followed by a presentation by Louis Rilling (Kerlabs) on Kerrighed clusters, which XtreemOS supports.

Linux Symposium 2009 - Surbhi


Two XtreemOS papers accepted at Euro-Par 2009



"The Architecture of the XtreemOS Grid Checkpointing Service"


Authors: John Mehnert-Spahn, Thomas Ropars, Michael Schoettner, Christine Morin

Abstract - The EU-funded XtreemOS project implements a grid operating system (OS) transparently exploiting distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) for implementing migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in distributed heterogeneous environments. In this paper we present the architecture of the XtreemGCP service integrating existing system-specific checkpointer solutions. We propose to bridge the gap between grid semantics and system-specific checkpointers by introducing a common kernel checkpointer API that allows using different checkpointers in a uniform way. Our architecture is open to support different checkpointing strategies that can be adapted according to evolving failure situations or changing application requirements. We also present how to avoid resource conflicts during restart. Finally, we discuss measurement numbers showing that the XtreemGCP architecture introduces only minimal overhead.


"Active Optimistic Message Logging for Reliable Execution of MPI Applications"


Authors: Thomas Ropars, Christine Morin

Abstract - To execute MPI applications reliably, fault tolerance mechanisms are needed. Message logging is a well-known solution to provide fault tolerance for MPI applications. It has been proved that it can tolerate a higher failure rate than coordinated checkpointing. However, pessimistic and causal message logging can induce high overhead on failure-free execution. In this paper, we present O2P, a new optimistic message logging protocol, based on active optimistic message logging. Contrary to existing optimistic message logging protocols that save dependency information on reliable storage periodically, O2P logs dependency information as soon as possible to reduce the amount of data piggybacked on application messages. Thus, it reduces the overhead of the protocol on failure-free execution, makes it more scalable and simplifies recovery. O2P is implemented as a module of the Open MPI library. Experiments show that active message logging can effectively improve scalability and performance of optimistic message logging.


Euro-Par conference website:


XtreemOS at PDCAT08

The Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'08) was held in Dunedin, New Zealand, from December 1st to December 4th, 2008.

John Mehnert-Spahn (UDUS) presented XtreemOS in an invited talk, "XtreemOS: Beyond Grid Middleware" (slides), at the workshop "High Performance and Grid Computing", co-located with PDCAT'08.



He also presented the paper "Checkpointing Process Groups in a Grid Environment" in the main track of PDCAT'08.