Process Checkpointing and Restoration Mechanism in xv6


What is xv6?

Xv6 is a teaching operating system developed in the summer of 2006 for MIT’s operating systems course.

Abstract

This project implements a process checkpointing and restoration mechanism within the MIT xv6 operating system. The system allows the kernel to serialize the memory state and CPU context of a running process into a persistent disk file. A restoration mechanism allows the kernel to reconstruct the process from this file, resuming execution from the exact point of interruption.

Key features include support for both self-checkpointing and external process targeting, a custom binary file format with ID validation, incremental disk writing to bypass file system journal limitations, and open file detection. The implementation required modifications to the kernel’s memory management subsystem, system call interface, and file system.

What is process checkpointing?

Process fault tolerance is a critical feature of an operating system. Checkpointing is the ability to save the state of a running process to disk storage, allowing it to be resumed later, even after a system reboot.

Goal of the project

The goal of this project was to extend the minimal xv6 kernel to support this functionality. This required an understanding of:

  • Virtual Memory: Translating user virtual addresses to physical addresses for serialization.

  • Context Switching: Capturing the exact state of CPU registers (trapframe).

  • File System Internals: Writing large binary blobs from kernel space without violating transaction log limits.

(This was a project done for the Operating Systems course during my B.Tech studies.)

System Design - xv6 checkpointing
The checkpoint image format

The checkpointed process is stored as a binary image file with a specific structure. To ensure data integrity, a header containing metadata is written at the beginning of the file, followed by the raw memory dump. The structure of the file is visualized in Figure 1. The header acts as a metadata container, ensuring that the restoration process has all the context it requires (PID, memory size, and CPU registers) before it begins loading the raw memory data.
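
To make the header-then-dump layout concrete, here is a minimal sketch of what such a header and its validation check could look like. It is not the project's actual definition: the names ckpt_header, ckpt_regs, ckpt_parse_header, and the CKPT_MAGIC value are all hypothetical, and the two-register struct is only a stand-in for the full xv6 trapframe.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value used to recognise a checkpoint image. */
#define CKPT_MAGIC 0x434B5054u

/* Stand-in for the saved CPU state; the real trapframe holds many more registers. */
struct ckpt_regs {
    uint64_t pc;   /* program counter at the point of interruption */
    uint64_t sp;   /* user stack pointer */
};

/* Metadata written at the start of the image, before the raw memory dump. */
struct ckpt_header {
    uint32_t magic;        /* identifies the file as a checkpoint image */
    int32_t  pid;          /* PID of the checkpointed process */
    uint64_t mem_size;     /* number of bytes of process memory that follow */
    struct ckpt_regs regs; /* CPU context to restore */
};

/*
 * Copies the header out of buf and validates it.
 * Returns 1 if buf is long enough and carries the expected magic value,
 * 0 otherwise. On a too-short buffer, *out is left untouched.
 */
int ckpt_parse_header(const void *buf, size_t len, struct ckpt_header *out)
{
    if (len < sizeof(*out))
        return 0;
    memcpy(out, buf, sizeof(*out));
    return out->magic == CKPT_MAGIC;
}
```

Checking the identifier before trusting mem_size means a restore attempt on an unrelated or truncated file can be rejected before any memory is allocated or loaded.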
Implementations - xv6 checkpointing
This section details the specific code modifications required to implement the checkpointing feature.

System Call Registration

To expose the new functionality to user space, new system calls were registered in the standard xv6 interface files.
Usage Manual - xv6 checkpointing
This section provides instructions on how to use the implemented system calls in user-space programs.

Creating a Checkpoint

Use the checkpoint(int pid, char *filename) system call.
Challenges Faced - xv6 checkpointing
The development of this feature involved overcoming several technical hurdles specific to the xv6 kernel architecture.

File System Transaction Limits

The most significant issue was the “transaction too big” kernel panic. The xv6 journaling system has a fixed log size. My initial attempt wrote the entire process memory, which can exceed tens of kilobytes, inside a single begin_op() / end_op() block, and this caused the transaction to overflow the log buffer.
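
The incremental writing mentioned in the abstract can be sketched as a loop that gives each chunk its own transaction. The code below is a host-side simulation, not kernel code: begin_op_stub, end_op_stub, and write_stub are hypothetical stand-ins for xv6's begin_op(), end_op(), and the inode write call, and the CHUNK_BYTES value is an assumption chosen for illustration rather than the limit the project actually used.

```c
#include <stddef.h>

/* Assumed per-transaction payload; the real bound depends on the xv6 log size. */
#define CHUNK_BYTES 1024

/* Bookkeeping for the simulation: how many transactions ran and how large they got. */
static int    txn_count;
static size_t txn_bytes;
static size_t max_txn_bytes;

/* Hypothetical stand-in for begin_op(): opens a new log transaction. */
static void begin_op_stub(void) { txn_count++; txn_bytes = 0; }

/* Hypothetical stand-in for end_op(): commits and records the transaction size. */
static void end_op_stub(void)
{
    if (txn_bytes > max_txn_bytes)
        max_txn_bytes = txn_bytes;
}

/* Hypothetical stand-in for the inode write; only counts the bytes it is given. */
static size_t write_stub(const char *src, size_t n)
{
    (void)src;
    txn_bytes += n;
    return n;
}

/*
 * Writes size bytes starting at mem, at most CHUNK_BYTES per transaction,
 * so that no single begin/end pair can outgrow the log.
 * Returns 0 on success, -1 if a chunk was only partially written.
 */
int write_in_chunks(const char *mem, size_t size)
{
    size_t off = 0;
    while (off < size) {
        size_t n = size - off;
        if (n > CHUNK_BYTES)
            n = CHUNK_BYTES;

        begin_op_stub();
        size_t written = write_stub(mem + off, n);
        end_op_stub();

        if (written != n)
            return -1;
        off += n;
    }
    return 0;
}
```

The trade-off in this approach is atomicity: each chunk commits independently, so a crash partway through leaves a partial image on disk, which is one reason a validated header is useful.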