Checkpointing and Restarting
Checkpoint Frequency
Checkpointing in Folding@Clusters has two distinct parts:
- mdrun writes a checkpoint every nstxout steps, where nstxout is the parameter set in grompp.mdp.
- The nanny checks the size of the checkpoint file. If the file has grown since the previous check, the nanny transfers it to the mother. This check runs about every two seconds (as of 1 June 2005).
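The nanny's side of this scheme can be sketched as a simple polling loop. This is a minimal Python sketch under stated assumptions, not the actual nanny code: `watch_checkpoint` and the `transfer` callback (standing in for "send the file to the mother") are hypothetical names, and a missing file is treated as size zero because mdrun may not have written its first checkpoint yet. The two-second default mirrors the interval quoted above.

```python
import os
import time
from typing import Callable, Optional


def watch_checkpoint(path: str,
                     transfer: Callable[[str], None],
                     poll_interval: float = 2.0,
                     max_polls: Optional[int] = None,
                     get_size: Callable[[str], int] = os.path.getsize) -> int:
    """Hypothetical nanny loop: poll the checkpoint file and hand it to
    `transfer` whenever it has grown. Returns the number of transfers.

    `max_polls` bounds the loop (None polls forever) and `get_size` can be
    replaced for testing; both are illustration-only assumptions.
    """
    last_size = 0   # size of the last file we transferred
    transfers = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            size = get_size(path)
        except FileNotFoundError:
            size = 0  # no checkpoint written yet
        if size > last_size:
            # The file grew, so a new checkpoint was written: send it on.
            transfer(path)
            last_size = size
            transfers += 1
        polls += 1
        time.sleep(poll_interval)
    return transfers
```

Comparing sizes rather than contents keeps the check cheap enough to run every couple of seconds; the trade-off is that it relies on each new checkpoint making the file larger, which is exactly the condition described above.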
Signals to mdrun and their relevance to the checkpoint/restart process
The mdrun process accepts SIGTERM and SIGUSR1. An mdrun process of any rank can receive these signals. Their effects are as follows:
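- SIGTERM: sets nsteps to the current step plus one, so the run stops after one more step.
- SIGUSR1: sets nsteps to the next multiple of nstxout after the current step, so the run stops at a step where a checkpoint is written.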
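The effect of each signal on nsteps can be modeled in a few lines. The following is a toy Python sketch, not mdrun's implementation: `nsteps_after_signal` and `ToyRun` are hypothetical names, the loop only counts steps and checkpoints, and it follows the rules above literally (it does not, for example, keep the smaller of the old and new nsteps). SIGUSR1 is available only on POSIX systems.

```python
import signal


def nsteps_after_signal(signum: int, step: int, nstxout: int) -> int:
    """Return the new nsteps after receiving `signum` at `step`."""
    if signum == signal.SIGTERM:
        # Stop after one more step.
        return step + 1
    if signum == signal.SIGUSR1:
        # Stop at the next multiple of nstxout, i.e. the next checkpoint.
        return (step // nstxout + 1) * nstxout
    raise ValueError(f"unhandled signal {signum}")


class ToyRun:
    """Hypothetical stand-in for an mdrun loop; only nsteps is modeled."""

    def __init__(self, nsteps: int, nstxout: int) -> None:
        self.nsteps = nsteps
        self.nstxout = nstxout
        self.step = 0
        self.checkpoints = 0

    def handle(self, signum, frame) -> None:
        self.nsteps = nsteps_after_signal(signum, self.step, self.nstxout)

    def install(self) -> None:
        # Must be called from the main thread; SIGUSR1 is POSIX-only.
        signal.signal(signal.SIGTERM, self.handle)
        signal.signal(signal.SIGUSR1, self.handle)

    def run(self) -> None:
        while self.step < self.nsteps:
            self.step += 1  # one MD integration step would happen here
            if self.step % self.nstxout == 0:
                self.checkpoints += 1  # mdrun would write a checkpoint here
```

In this model, sending SIGUSR1 at step 612 with nstxout = 250 sets nsteps to 750, so the run ends on a checkpoint step and a restart loses no work, whereas SIGTERM ends the run almost immediately and any steps since the last checkpoint are lost.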