Checkpointing is a feature that saves the complete state of a process to files so that its execution can be restarted later.
Early discussion on LWN
CRIU
https://lvee.org/uploads/image_upload/file/311/fedora-test-day.odp.pdf
Due to restrictions imposed by several kernel APIs that CRIU uses, the tool can only run with root privileges.
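
As a rough illustration of the root requirement, the sketch below builds a `criu dump` command line and checks the effective UID before suggesting how to run it. The helper names (`build_dump_command`, `running_as_root`, `describe_dump`) are hypothetical and exist only for this note. The sketch never launches CRIU itself; it only shows the shape of the call.

```python
import os


def build_dump_command(pid, images_dir):
    """Build an argv list for `criu dump` (hypothetical helper for this note)."""
    # -t names the root of the process tree to dump,
    # -D names the directory that receives the image files.
    return ["criu", "dump", "-t", str(pid), "-D", str(images_dir)]


def running_as_root():
    """Return True when the effective UID is 0."""
    return os.geteuid() == 0


def describe_dump(pid, images_dir):
    """Return the command line to run, prefixed with sudo when not root."""
    cmd = build_dump_command(pid, images_dir)
    if not running_as_root():
        cmd = ["sudo"] + cmd
    return " ".join(cmd)
```

The list returned by `build_dump_command` could be passed to `subprocess.run` by a caller that really wants to take the dump.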
Comparison to other CR projects
DMTCP
https://dmtcp.sourceforge.io/dmtcp-mug-17.pdf
DMTCP vs CRIU
CRIU can dump a task without any preparation. DMTCP can dump only tasks that were prepared in advance:

""" A DMTCP coordinator process is created on a host (default: localhost). As new processes are created (via fork or ssh), the LD_PRELOAD environment variable (supported by the Linux loader) is used to preload the DMTCP library (dmtcphijack.so). That library runs before the routine main(). It creates a second thread (DMTCP checkpoint thread). The checkpoint thread then creates a socket to the DMTCP coordinator and registers itself. The checkpoint thread also creates a signal handler (SIGUSR2 by default) """

CRIU doesn’t affect the behavior of applications before or after checkpoint/restore. CRIU is independent of GLIBC and other libraries. DMTCP installs wrappers around a few system calls, so it can change the behavior of applications. DMTCP probably can’t dump statically linked programs or programs that invoke system calls directly, because neither goes through the preloaded wrappers.

""" The run-time overhead of DMTCP is essentially zero. When there is no checkpoint or restart in process, DMTCP code will run only within DMTCP wrappers around certain less frequently used system calls. Examples of such wrappers are wrappers for open(), getpid(), socketpair(), etc. """

DMTCP doesn’t support namespaces, so it cannot dump Linux Containers. DMTCP virtualizes PIDs in user space, so a task is actually restored with a different real PID. This may be preferred in some cases.
I’m not sure that DMTCP can restore anonymous shared memory correctly.
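
The idea of user-space PID virtualization can be sketched with a small translation table. This is a toy model under stated assumptions, not DMTCP's actual data structure. The class `PidTable` and the function `wrapped_getpid` are hypothetical names invented for this note. The application keeps seeing the same virtual PID, while the real PID behind it is rebound after a restart.

```python
import os


class PidTable:
    """Toy virtual-to-real PID map (hypothetical; not DMTCP's implementation)."""

    def __init__(self):
        self._real_by_virtual = {}
        self._virtual_by_real = {}

    def bind(self, virtual_pid, real_pid):
        """Bind a virtual PID to a real PID, replacing any earlier binding."""
        old_real = self._real_by_virtual.get(virtual_pid)
        if old_real is not None:
            # Drop the stale reverse entry left over from before the restart.
            del self._virtual_by_real[old_real]
        self._real_by_virtual[virtual_pid] = real_pid
        self._virtual_by_real[real_pid] = virtual_pid

    def to_real(self, virtual_pid):
        """Translate the PID the application knows into the kernel's PID."""
        return self._real_by_virtual[virtual_pid]

    def to_virtual(self, real_pid):
        """Translate the kernel's PID into the PID the application knows."""
        return self._virtual_by_real[real_pid]


def wrapped_getpid(table):
    """Stand-in for a getpid() wrapper: report the virtual PID to the caller."""
    return table.to_virtual(os.getpid())
```

At first launch a process would be bound with identical virtual and real PIDs, for example `table.bind(1000, 1000)`. After a restart the kernel hands out a new PID, and `table.bind(1000, 4242)` keeps the application's view unchanged. CRIU instead restores the task with its original PID, so no such table is needed.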
Probably DMTCP can’t restore TCP connections, pending signals, zombies, signalfd, file locks, epoll, etc.