
 

4.2 Limitations

While the designers of truly distributed operating systems such as Sprite and the V Kernel have carefully defined and implemented their process models to accommodate migration, we users of UNIX are not so fortunate. Many details of a process's state are implicit, known only to the kernel, or otherwise difficult or impossible to re-create. In Condor we have taken the viewpoint that we can save and restore enough of a process's state to accommodate the needs of a wide variety of real-world user code. There is, however, no way we can save all the state necessary for every kind of process. The most glaring lack, of course, is our inability to migrate one or more members of a set of communicating processes. In fact, no attempt is made to deal with processes which execute fork() or exec(), or which communicate with other processes via signals, sockets, pipes, files, or any other means. This is not to say that some inventive users have not found ways to use Condor for communicating processes, but they have changed their code to accommodate our limitations.
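To make the restriction concrete, the following is a minimal sketch of the kind of code that falls outside what has been described here. It is an illustration only and is not taken from Condor; the function name read_from_child and its parameters are hypothetical. A parent forks a child and reads a message from it through a pipe, so that for the lifetime of the call the computation depends on a second process and on a pipe held by the kernel.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical example, not part of Condor.
 * Forks a child that writes msg into a pipe; the parent reads it back
 * into buf (always NUL-terminated). Returns the number of bytes read,
 * or -1 on error.
 */
ssize_t read_from_child(const char *msg, char *buf, size_t buflen)
{
    int fds[2];
    pid_t pid;
    ssize_t n = 0, total = 0;
    int status;

    if (buflen == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: write the message and exit without returning. */
        close(fds[0]);
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    /*
     * Parent. From here until waitpid() returns, the state of the
     * computation includes the pipe's buffered data and the
     * parent/child relationship, both maintained by the kernel
     * rather than in this process's own memory.
     */
    close(fds[1]);
    while ((size_t)total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - (size_t)total)) > 0)
        total += n;
    buf[total] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return total;
}
```

A caller would pass a message and a buffer, for example read_from_child("hello", buf, sizeof buf). Saving only the parent's memory image while it is blocked in read() would capture neither the child nor any bytes still in the pipe, which is the kind of state the paragraph above refers to.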

Another major limitation is the fact that the Condor checkpointing code must be linked in with the user's code. This is fine for folks who build and run their own software, but it doesn't work for users of third-party software who don't have access to the source. We have considered schemes to provide a ``checkpointing C library'' for third-party programs which are dynamically linked, but so far we have not implemented anything. A major obstacle to such work is the fact that shared library implementations vary widely across platforms, so such a facility would not be very portable.



condor-admin@cs.wisc.edu