Checkpoint and Restarting
Revision as of 15:15, 1 June 2005
Checkpoint Frequency
Checkpointing in Folding@Clusters has two distinct parts:
- mdrun writes a checkpoint every nstxout steps, where nstxout is the parameter of that name in grompp.mdp.
- The nanny checks the size of the checkpoint file about every two seconds (as of 1 June 2005). If the file has grown since the previous check, it is transferred to the mother.
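
The nanny's size check can be sketched as a small polling loop. This is only an illustration of the idea described above, not the actual Folding@Clusters code: the function names `poll_once` and `watch`, the `transfer` callback, and the starting size of zero are all assumptions made for the example.

```python
import os
import time


def poll_once(path, last_size, transfer):
    """Check the checkpoint file once; call transfer(path) if it has grown.

    Returns the size to remember for the next check. All names here are
    hypothetical and chosen for illustration only.
    """
    try:
        size = os.path.getsize(path)
    except OSError:
        # File missing or unreadable: nothing to transfer, keep the old size.
        return last_size
    if size > last_size:
        transfer(path)
    return size


def watch(path, transfer, interval=2.0, polls=None):
    """Poll the checkpoint file every `interval` seconds.

    `polls` limits the number of checks (None means loop forever).
    The remembered size starts at 0, so an already existing, non-empty
    file is transferred on the first check (an assumption of this sketch).
    """
    last_size = 0
    done = 0
    while polls is None or done < polls:
        last_size = poll_once(path, last_size, transfer)
        done += 1
        time.sleep(interval)
    return last_size
```

A caller would pass whatever routine actually ships the file to the mother as `transfer`, for example `watch("checkpoint_file", send_to_mother)`, where both names are placeholders.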
Signals to mdrun and their relevance to the checkpoint/restart process
The mdrun process accepts SIGTERM and SIGUSR1. Either signal can be received by an mdrun process of any rank. The effects of the signals are as follows:
- SIGTERM
- Sets nsteps to the current step plus one, so the run ends after one more step.
- SIGUSR1
- Sets nsteps to the next multiple of nstxout past the current step.
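
The arithmetic behind the two signals can be written out as a short sketch. It is not taken from the mdrun source: the function name `nsteps_after_signal` is hypothetical, and "past the current step" is read here as strictly greater than the current step, which is an assumption.

```python
import signal


def nsteps_after_signal(signum, current_step, nstxout):
    """Return the new value of nsteps after mdrun receives a signal.

    Hypothetical helper for illustration; requires a POSIX platform,
    since signal.SIGUSR1 is not defined on Windows.
    """
    if nstxout <= 0:
        raise ValueError("nstxout must be positive")
    if signum == signal.SIGTERM:
        # Stop after one more step.
        return current_step + 1
    if signum == signal.SIGUSR1:
        # Next multiple of nstxout strictly greater than current_step
        # (assumed reading of "past the current step").
        return (current_step // nstxout + 1) * nstxout
    raise ValueError("unsupported signal")
```

With nstxout = 500 and the run at step 1234, SIGTERM would give nsteps = 1235 and SIGUSR1 would give nsteps = 1500. Under the same reading, a run sitting exactly at step 1500 that receives SIGUSR1 would continue to step 2000.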