Entries For: May 2009
The brand new 2nd software release and selected demo
applications will be presented from June 24th to June 25th, 2009 at our
The floorplan is available here.
Conference website: http://www.supercomp.de/isc09/
Efficient Management of Consistent Backups in a Distributed File System
Abstract: Setting up backup infrastructures for large-scale data management systems that can be operated cheaply and accessed with low latency has emerged as a practical problem. As a solution, we present a highly scalable and cost-efficient architecture for backup management in a distributed file system. We describe techniques for the creation of consistent backups at runtime, as well as approaches to resource management in connection with an integrated backup architecture.
COMPSAC 2009: website
Seattle, Washington, July 20-24, 2009
The Doctoral Symposium at COMPSAC will provide an international forum for doctoral students to interact with other students and faculty mentors. Since 2006, COMPSAC has been designated as the IEEE Computer Society Signature Conference on Software Technology and Applications.
The Doctoral Symposium seeks to bring together PhD students working in computer software and applications and related fields. Selected students will have the opportunity to present and discuss their research goals, methodology, and preliminary results within a constructive and international atmosphere.
The Symposium organizers will strive to provide useful guidance for completion of the dissertation research and motivation for a research career. The Symposium is intended for students who have already settled on a specific research proposal and have produced limited preliminary results, but have enough time remaining before their final defense to benefit from the fruitful Symposium discussions. Due to the mentoring aspect of the event, the Symposium will be open only to the students and mentors participating directly in the event.
In coordination with the technical theme of COMPSAC 2009, topics pertaining to software engineering of critical infrastructure systems such as civil, telecommunications, and medical systems will be of particular interest. Related topics include, but are not limited to, requirements analysis, co-analysis and co-design, modeling, design, development, testing, measurement, verification and validation for performance, safety, security, and dependability constraints of such systems. As effective construction of critical infrastructure systems is not limited solely to the field of computer science and engineering and is truly a multidisciplinary effort, submissions addressing multidisciplinary research topics are particularly encouraged.
XLAB is a medium-sized company whose major source of income is the family of ISL Online
business communication solutions. Other products include medical imaging software and the Gaea+ GIS/virtual globe solution.
Having been awarded multiple innovation prizes, XLAB strives to integrate the latest advances in the
fields of communication security, scalability, fault tolerance, and large dataset processing into the aforementioned products, and thus fits naturally with the XtreemOS project.
XLAB's most prominent role in the project is in application execution management, where we
provided an initial prototype of resource selection using various resource filters. We proceeded by supporting the job scheduler with a reservation management system. The job scheduler can thus rely on the timetables kept by each worker node and on a coordinating service at the level of the administration domain. As a future challenge, we will be tackling the reservation of resources across multiple domains.
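The combination of resource filters and per-node timetables described above can be illustrated with a minimal sketch. It is not the XtreemOS implementation: the names `Node`, `select_nodes`, and `reserve_first_free`, the attributes `cpus` and `memory_mb`, and the use of integer time slots are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical worker node that keeps its own reservation timetable."""
    name: str
    cpus: int
    memory_mb: int
    # Accepted reservations as half-open (start, end) intervals.
    timetable: List[Tuple[int, int]] = field(default_factory=list)

    def is_free(self, start: int, end: int) -> bool:
        # The slot is free if it overlaps none of the stored intervals.
        return all(end <= s or start >= e for s, e in self.timetable)

    def reserve(self, start: int, end: int) -> bool:
        # Reject empty or inverted slots and slots that clash with the timetable.
        if start >= end or not self.is_free(start, end):
            return False
        self.timetable.append((start, end))
        return True


# A resource filter is any predicate over a node (an assumption of this sketch).
ResourceFilter = Callable[[Node], bool]


def select_nodes(nodes: List[Node], filters: List[ResourceFilter]) -> List[Node]:
    """Keep only the nodes that pass every filter."""
    return [n for n in nodes if all(f(n) for f in filters)]


def reserve_first_free(nodes: List[Node], filters: List[ResourceFilter],
                       start: int, end: int) -> Optional[Node]:
    """Reserve the slot on the first matching node whose timetable allows it."""
    for node in select_nodes(nodes, filters):
        if node.reserve(start, end):
            return node
    return None
```

In this sketch a domain-level coordinator would call `reserve_first_free` on behalf of the job scheduler; reservations spanning several domains are left out, as in the text.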
In parallel to this work, we provided the implementation of the LinuxSSI scheduler, which performs the load balancing of the system. It relies on an interchangeable set of modules for system metrics and scheduling policies. Hence, it can either take advantage of the ready-made probes into the kernel or use custom ones developed by third parties or by the users themselves.
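The idea of interchangeable metric and policy modules can be sketched as follows. This is a user-space illustration only, not the LinuxSSI kernel interface: `LoadBalancer`, `threshold_policy`, and the representation of a probe as a plain function from node name to load value are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# A metric module maps a node name to a load value (assumed representation).
Metric = Callable[[str], float]
# A policy module maps measured loads to an optional (source, target) migration.
Policy = Callable[[Dict[str, float]], Optional[Tuple[str, str]]]


class LoadBalancer:
    """Combines one pluggable metric with one pluggable policy."""

    def __init__(self, metric: Metric, policy: Policy) -> None:
        self.metric = metric
        self.policy = policy

    def decide(self, nodes: Iterable[str]) -> Optional[Tuple[str, str]]:
        # Sample the metric on every node, then let the policy decide.
        loads = {name: self.metric(name) for name in nodes}
        return self.policy(loads)


def threshold_policy(threshold: float) -> Policy:
    """Build a policy that migrates work from the busiest to the idlest node
    when their load difference exceeds the given threshold."""
    def policy(loads: Dict[str, float]) -> Optional[Tuple[str, str]]:
        if len(loads) < 2:
            return None
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        if loads[busiest] - loads[idlest] > threshold:
            return (busiest, idlest)
        return None
    return policy
```

Swapping in a different probe or policy only requires passing another function to `LoadBalancer`, which is the property the paragraph above attributes to the modular design.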
To support the full set of XtreemOS services, we developed a staging framework
that supports services written in Java and client programs in Java or C. It enables distribution of
the services and asynchronous invocation using Java constructs, without having to deal with the
underlying connectivity issues.
Closely related to the above is our work on security services. We developed a service to maintain
the Virtual Organisation policies, providing a way to restrict the users' jobs to only those resources that belong to the required VO. This restriction is applied already during resource selection.
Our project activities also include the development of an automatic scheduling manager for the
LinuxSSI flavour. We contributed to the definition of requirements in the early stages and have
continued to assess their fulfillment.
In the last year of the project we will lead the preparation of demonstrations and will also be involved in training activities, as a recent post about the Kiberpipa presentation shows.
Authors: John Mehnert-Spahn, Thomas Ropars, Michael Schoettner, Christine Morin
Abstract - The EU-funded XtreemOS project implements a grid operating system (OS) transparently exploiting distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) for implementing migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in distributed heterogeneous environments. In this paper we present the architecture of the XtreemGCP service integrating existing system-specific checkpointer solutions. We propose to bridge the gap between grid semantics and system-specific
checkpointers by introducing a common kernel checkpointer API that allows using different checkpointers in a uniform way. Our architecture is open to support different checkpointing strategies that can be adapted according to evolving failure situations or changing application requirements. We also present how to avoid resource conflicts during restart. Finally, we discuss measurements showing that the XtreemGCP architecture introduces only minimal overhead.
Authors: Thomas Ropars, Christine Morin
Abstract - To execute MPI applications reliably, fault tolerance mechanisms are needed. Message logging is a well-known solution to provide fault tolerance for MPI applications. It has been proved that it can tolerate a higher failure rate than coordinated checkpointing. However, pessimistic and causal message logging can induce high overhead on failure-free execution. In this paper, we present O2P, a new optimistic message logging protocol, based on active optimistic message logging. Contrary to existing optimistic message logging protocols that save dependency information on reliable storage periodically, O2P logs dependency information as soon as possible to reduce the amount of data piggybacked on application messages. Thus, it reduces the overhead of the protocol on failure-free execution, makes it more scalable, and simplifies recovery. O2P is implemented as a module of the Open MPI library. Experiments show that active message logging can effectively improve the scalability and performance of optimistic message logging.
Euro-Par conference website: http://europar2009.ewi.tudelft.nl/