JVM Checkpoint Restore

The TODAY Framework integrates with checkpoint/restore as implemented by Project CRaC in order to allow implementing systems capable of reducing the startup and warmup times of Infra-based Java applications with the JVM.

Using this feature requires:

  • A checkpoint/restore enabled JVM (Linux only for now).

  • The presence of the org.crac:crac library (version 1.4.0 and above are supported) in the classpath.

  • Specifying the required java command-line parameters like -XX:CRaCCheckpointTo=PATH or -XX:CRaCRestoreFrom=PATH.

The files generated in the path specified by -XX:CRaCCheckpointTo=PATH when a checkpoint is requested contain a representation of the memory of the running JVM, which may contain secrets and other sensitive data. Using this feature should be done with the assumption that any value "seen" by the JVM, such as configuration properties coming from the environment, will be stored in those CRaC files. As a consequence, the security implications of where and how those files are generated, stored, and accessed should be carefully assessed.

Conceptually, checkpoint and restore align with the Infra Lifecycle contract for individual beans.

On-demand checkpoint/restore of a running application

A checkpoint can be created on demand, for example using a command like jcmd application.jar JDK.checkpoint. Before the creation of the checkpoint, Infra stops all the running beans, giving them a chance to close resources if needed by implementing Lifecycle.stop. After restore, the same beans are restarted, with Lifecycle.start allowing beans to reopen resources when relevant. For libraries that do not depend on Infra, custom checkpoint/restore integration can be provided by implementing org.crac.Resource and registering the related instance.

Leveraging checkpoint/restore of a running application typically requires additional lifecycle management to gracefully stop and start using resources like files or sockets and stop active threads.
If the checkpoint is created on a warmed-up JVM, the restored JVM will be equally warmed-up, allowing potentially peak performance immediately. This method typically requires access to remote services, and thus requires some level of platform integration.

Automatic checkpoint/restore at startup

When the -Dspring.context.checkpoint=onRefresh JVM system property is set, a checkpoint is created automatically at startup during the LifecycleProcessor.onRefresh phase. After this phase has completed, all non-lazy initialized singletons have been instantiated, and InitializingBean#afterPropertiesSet callbacks have been invoked; but the lifecycle has not started, and the ContextRefreshedEvent has not yet been published.

For testing purposes, it is also possible to leverage the -Dinfra.context.exit=onRefresh JVM system property which triggers similar behavior, but instead of creating a checkpoint, it exits your Infra application at the same lifecycle phase without requiring the Project CraC dependency/JVM or Linux. This can be useful to check if connections to remote services are required when the beans are not started, and potentially refine the configuration to avoid that.

As mentioned above, and especially in use cases where the CRaC files are shipped as part of a deployable artifact (a container image for example), operate with the assumption that any sensitive data "seen" by the JVM ends up in the CRaC files, and assess carefully the related security implications.
Automatic checkpoint/restore is a way to "fast-forward" the startup of the application to a phase where the application context is about to start, but it does not allow to have a fully warmed-up JVM.