Getting started on clusters
Revision as of 21:41, 2 March 2021
SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. SLURM requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, SLURM has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

Optional plugins can be used for accounting, advanced accounting, gang scheduling (time sharing for parallel jobs), backfill scheduling, topology optimized resource selection, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.

This document presumes zero prior knowledge of cluster computing. If instead you're an intermediate user (e.g. you have an account and have run a few jobs before but need a reminder), the table of contents is your friend.
ARCHITECTURE
SLURM has a centralized manager, slurmctld, to monitor resources and work. There may also be a backup manager to assume those responsibilities in the event of failure. Each compute server (node) runs a slurmd daemon, which can be compared to a remote shell: it waits for work, executes that work, returns status, and waits for more work. The slurmd daemons provide fault-tolerant hierarchical communications. There is an optional slurmdbd (Slurm DataBase Daemon) which can be used to record accounting information for multiple Slurm-managed clusters in a single database.

User tools include srun to initiate jobs, scancel to terminate queued or running jobs, sinfo to report system status, squeue to report the status of jobs, and sacct to get information about jobs and job steps that are running or have completed. The sview command graphically reports system and job status, including network topology. There is an administrative tool, scontrol, to monitor and/or modify configuration and state information on the cluster. The administrative tool used to manage the database is sacctmgr; it can be used to identify the clusters, valid users, valid bank accounts, etc. APIs are available for all functions.
This document gives you all the information you need to choose a system, log in to a cluster or jumbo server, write a script, submit it via sbatch to the scheduler, and find the output. As such, these notes cover hardware and software. (If you're a sysadmin, you may be interested in this page instead.)
Prerequisites
- Get a cluster account. You can email admin at cs dot earlham dot edu or a current CS faculty member to get started. Your user account will grant access to all the servers below, and you will have a home directory at ~username that you can access when you connect to any of them.
  - Note: if you have a CS account, you will use the same username and password for your cluster account.
- Connect through a terminal via ssh to username@hopper.cluster.earlham.edu. If you intend to work with these machines a lot, you should also configure your ssh keys.
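If you plan to log in often, ssh keys save retyping your password. A minimal sketch using the standard OpenSSH tools; the "hopper" alias in the config fragment is just an example name:

```
# One-time setup, run on your own machine:
#   ssh-keygen -t ed25519
#   ssh-copy-id username@hopper.cluster.earlham.edu
#
# Optional entry in ~/.ssh/config so that plain "ssh hopper" works:
Host hopper
    HostName hopper.cluster.earlham.edu
    User username
```

After this, ssh hopper should drop you at a shell on the gateway without a password prompt (unless you gave your key a passphrase).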
Cluster systems to choose from
The cluster dot earlham dot edu domain consists of clusters (a collection of physical servers linked through a switch to perform high-performance computing tasks with distributed memory) and jumbo servers (née "phat nodes": a single physical server with a high ratio of disk+RAM to CPU, good for jobs demanding shared memory).
Our current machines are:
- whedon: newest cluster; 8 compute nodes; Torque-only pending an OS upgrade
- layout: cluster; 4 compute nodes; pre-whedon; features NVIDIA GPGPUs and multiple CUDA options
- lovelace: newest jumbo server
- pollock: jumbo server, older than lovelace but well-tested and featuring the most available disk space
To get to, e.g., whedon from hopper, run ssh whedon.
If you're still not sure, see the more detailed notes page.
Cluster software bundle
The cluster dot earlham dot edu servers all run a supported CentOS version.
All these servers (unless otherwise noted) also feature the following software:
- Slurm (scheduler): submit a job with sbatch jobname.sbatch, delete it with scancel jobID. Running a job has its own doc section below.
- Environment modules: run module avail to see available software modules and module load modulename to load one; you may load modules in bash scripts and batch jobs as well.
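Modules can be loaded inside a batch script exactly as on the command line. A hedged sketch of a job script that does so; the module name python3 is an assumption here, so check module avail for the real names on each machine:

```shell
#!/bin/sh
#SBATCH --job-name module-demo
#SBATCH --nodes=1

# "python3" is an illustrative module name -- run `module avail`
# to see what is actually installed on the machine you're using.
module load python3
python3 --version
```

The loaded module only affects the job's environment, so different jobs can safely load different versions of the same software.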
The default shell on all these servers is bash.
The default Python version on all these servers is Python 2.x, but all have at least one Python 3 module with a collection of widely-used scientific computing libraries.
Using Slurm
Slurm is our batch scheduler.
You can check that it's working by running: srun -l hostname
You can submit a job in a script with the following: sbatch my_good_script.sbatch
Here's an example of a batch file:
#!/bin/sh
#SBATCH --time=1
#SBATCH --job-name hello-world
#SBATCH --nodes=1
#SBATCH -c 1  # ask for one core
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=excellent_email_user@earlham.edu

echo "queue/partition is $SLURM_JOB_PARTITION"
echo "running on $SLURM_JOB_NODELIST"
echo "work directory is $SLURM_SUBMIT_DIR"
/bin/hostname
srun -l /bin/hostname
sleep 10
srun -l /bin/pwd
Interactive and command line interfaces also exist. After submitting a job, Slurm captures anything written to stdout and stderr by your programs; when the job completes, it puts that output in a file called slurm-nnn.out (where nnn is the job number) in the directory where you ran sbatch. Use more to view it when you are looking for error messages, output file locations, etc.
If you are used to using qpeek, you can instead just run tail -f jobXYZ.out or tail -f jobXYZ.err.
There's some more CPU management information here.
Conversion from Torque to Slurm
If you have old Torque/PBS job scripts, the tables below map common qsub-era commands and environment variables to their Slurm equivalents (change the specific options to suit your job):
Torque | Slurm | Description
---|---|---
qsub | sbatch | run/submit a batch job
qstat | squeue | show jobs currently in the queue
qdel | scancel | cancel a job
pbsnodes -a | scontrol show nodes | show nodes in the cluster
Torque | Slurm | Description
---|---|---
$PBS_QUEUE | $SLURM_JOB_PARTITION | the queue/partition you are in
cat $PBS_NODEFILE | $SLURM_JOB_NODELIST | there's no equivalent of the nodes file, but this environment variable stores the node list
$PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | working directory from which the command was run
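As a worked example of the tables above, here is a minimal Slurm job script with the old Torque equivalent of each line shown as a comment (the option values are illustrative, not recommendations):

```shell
#!/bin/sh
# Torque: #PBS -N hello-world
#SBATCH --job-name hello-world
# Torque: #PBS -l nodes=1
#SBATCH --nodes=1

# Torque: cd $PBS_O_WORKDIR
cd $SLURM_SUBMIT_DIR
# Torque: echo "queue is $PBS_QUEUE"
echo "queue is $SLURM_JOB_PARTITION"
# Torque scripts often iterated over $PBS_NODEFILE; in Slurm,
# srun launches tasks on the allocated nodes directly.
srun -l hostname
```

Submit it with sbatch instead of qsub, and check on it with squeue instead of qstat.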
Example script
#!/usr/bin/bash
#SBATCH --job-name hello-world
#SBATCH --nodes=5
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=excellent_email_user@earlham.edu

echo "queue is $SLURM_JOB_PARTITION"
echo "running on $SLURM_JOB_NODELIST"
echo "work directory is $SLURM_SUBMIT_DIR"
srun -l echo "hello world!"
About qsub
Before Slurm we used Torque and its associated software, including qsub. This is now deprecated and should not be used on the Earlham CS cluster systems.