Choosing a computing resource

From Earlham CS Department
Revision as of 11:54, 4 May 2020 by Craigje

Writeup on choosing which computing resources to use for what work

Short version

For X use Y:

  • small jobs (seconds or minutes of runtime): any machine, including your local one
  • hosting a website or web app: one of our web servers (most likely web.cs)
  • controlling a complete OS: a VM (email the admins to request one)
  • anything requiring a GPGPU: the layout cluster
  • running jobs that need lots of RAM and/or storage: a jumbo server (lovelace or pollock)
  • running jobs you want to split across many machines: a cluster (layout or whedon)

Check out the rest of this document for the more detailed version.

What does your computing problem look like?

Small computing jobs

A small computing job is easy to run: just run it on the computer or server you usually use.

Every system we have, from Bowie to Whedon, will run a basic Python 3 program or build and run C code. Your local machine probably will as well. If you expect your code to complete in seconds or minutes, this is probably your best choice.

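You may use `nohup my_command &` for jobs you expect to take a few minutes. The trailing `&` puts your command in the background and returns your shell (so you can type more commands), and `nohup` keeps the command running even if you log out. If you don't redirect its output, nohup saves it to nohup.out in the current directory.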
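A minimal sketch of that workflow follows. The `sh -c '...'` command is a hypothetical stand-in for your own program, and `my_job.log` is just an example file name. Redirecting output explicitly, as here, sends it to a file you choose instead of nohup.out.

```shell
#!/bin/sh
# Sketch of running a job with nohup. The sh -c '...' command is a
# hypothetical stand-in for your program; my_job.log is an example name.

# Start the job in the background; nohup keeps it alive if you log out.
# Redirecting stdout and stderr puts the output in my_job.log instead of nohup.out.
nohup sh -c 'sleep 2; echo "job finished"' > my_job.log 2>&1 &
job_pid=$!    # $! holds the PID of the most recent background job
echo "started job $job_pid"

# Your prompt is free again. Signal 0 only tests whether the process
# still exists; it does not actually send the job anything.
if kill -0 "$job_pid" 2>/dev/null; then
    echo "job $job_pid is still running"
fi

# Later, in the same shell session, wait for the job and read its output.
wait "$job_pid"
cat my_job.log
```

`wait` only works in the shell that started the job. If you log out and come back, check progress by reading the log file instead (for example with `tail -f my_job.log`).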

Websites and web apps

A few of our machines run web servers and can host websites and web apps; web.cs is the most likely choice.

A dedicated workspace where you configure the entire OS for yourself

Speak to the admins about this. We have a hypervisor running on one of our servers, and it hosts our web server among other utilities. We can quickly spin up a VM for you, either Ubuntu- or CentOS-flavored.

Anything requiring a GPGPU

The layout cluster. This is an easy answer because our other machines do not contain GPGPUs. :)

If you're new to the term, a GPGPU (general-purpose graphics processing unit) is a GPU that performs computations in problem spaces other than rendering computer graphics.

A GPU can perform certain computations, such as vector arithmetic, extremely quickly compared to a CPU, because it applies the same operation to many data elements in parallel. This is important in, for example, rendering video game animations. Many problems in the natural and computational sciences involve the same kinds of calculations, hence the development of GPGPUs. Demand for them may increase as data science and machine learning continue to grow.

Our NVIDIA GPGPUs (and their drivers) are installed on the layout cluster. To program them, load CUDA, NVIDIA's platform for general-purpose GPU programming, on layout. Happy coding.
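A quick way to confirm that the GPUs and CUDA are visible is a check like the sketch below. It assumes CUDA is provided through environment modules (suggested by "load CUDA", but not confirmed here) under the hypothetical module name `cuda`; run `module avail` to see the real name. It otherwise relies only on `nvidia-smi`, which ships with the NVIDIA driver, and `nvcc`, the CUDA compiler. Whether the GPUs are visible from the head node or only from compute nodes isn't stated here, so you may need to run it inside a scheduled job.

```shell
#!/bin/bash
# Sketch of a GPU sanity check to run on layout.
# ASSUMPTION: CUDA is an environment module named "cuda" (hypothetical name;
# run `module avail` to see what is actually installed).

check_gpus() {
    # "module" is a shell function set up by environment modules, if present.
    if command -v module >/dev/null 2>&1; then
        module load cuda
    fi

    # nvidia-smi comes with the NVIDIA driver and lists the GPUs this node can see.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    else
        echo "nvidia-smi not found: no NVIDIA driver tools on this node"
    fi

    # nvcc is the CUDA compiler; it is on your PATH once CUDA is loaded.
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version
    else
        echo "nvcc not found: CUDA is not loaded in this shell"
    fi
}

check_gpus
```

If both tools are found, `nvidia-smi` shows a table of GPUs and `nvcc --version` reports the CUDA release you are compiling against.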

High-performance computing, including scientific and research computing

If your job is expected to take hours or days, as in the case of many scientific computing and research-oriented workflows, you will want to use a system designed to handle it.

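We have two kinds of systems for this: jumbo servers and clusters. To choose between them, consider how your problem parallelizes, as described below.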

Jumbo servers

tl;dr

  • Jumbo server: one big machine, lots of RAM and storage compared to CPU
  • Best for shared memory parallelism
  • Example problem: DNA sequence workflows
  • Hostnames: lovelace, pollock

Jumbo servers (formerly called phat nodes) are in many respects just bigger versions of the computers you're used to. They have one or two CPUs, lots of RAM, and lots of disk.

A jumbo server is the best solution for problems that require a lot of data to be loaded into memory and handled all at once, with minimal communications overhead.
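Before moving a job to lovelace or pollock, it helps to compare what the job needs with what the machine offers. The sketch below prints a machine's core count, total RAM, and free disk space for the current directory. It is Linux-specific (it reads /proc/meminfo) and is an illustration, not a tool provided on our systems.

```shell
#!/bin/sh
# Sketch: report a machine's cores, RAM, and free disk so you can judge
# whether your job's data will fit. Linux-only (reads /proc/meminfo).

# Number of CPU cores available to this process.
echo "CPU cores: $(nproc)"

# MemTotal is reported in kB; convert to whole GiB for readability.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "Total RAM: $((mem_kb / 1024 / 1024)) GiB"

# Free space on the filesystem that holds the current directory.
df -h .
```

Running it on your own machine and then on a jumbo server shows the difference in scale; if your data fits comfortably in the jumbo server's RAM, it is likely the simpler choice.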

Clusters

tl;dr

  • Cluster: several less powerful machines linked together to perform operations in parallel
  • Best for distributed memory parallelism
  • Example problem: molecular genomics simulations

Clusters consist of three pieces:

  1. a head node that hosts the scheduler, provides Internet services, manages configuration of the system, and supports user access
  2. N compute nodes, each of which hosts a pbs_mom and a pbs_client to do the computational work handed to it by the head node
  3. a network switch to link all the nodes together

Because the nodes communicate over a network, exchanging data between them costs far more than sharing memory within one machine. A cluster is therefore well-suited to problems whose data can be split into pieces that separate nodes, each with its own CPUs (and/or GPUs), can work on with little communication.
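Work reaches the compute nodes through the scheduler on the head node. The sketch below is a minimal PBS job script, assuming Torque/PBS-style resource syntax (as the mention of pbs_mom suggests); the job name, node count, cores per node (`ppn`), and walltime are illustrative values only. The `#PBS` lines are read by the scheduler and are ordinary comments to bash, so the script also runs outside the scheduler.

```shell
#!/bin/bash
#PBS -N hello_cluster
#PBS -l nodes=2:ppn=4
#PBS -l walltime=00:05:00
#PBS -j oe

# ASSUMPTIONS: Torque/PBS syntax; the values above are examples only.
# -j oe merges the job's stdout and stderr into a single output file.

# PBS starts jobs in your home directory; move to the directory qsub was
# run from (fall back to "." when running outside the scheduler).
cd "${PBS_O_WORKDIR:-.}" || exit 1

job_id="${PBS_JOBID:-not-under-pbs}"
node_name=$(uname -n)    # hostname of the node running this script
echo "Job $job_id started on $node_name"

# PBS lists the allocated nodes in $PBS_NODEFILE, one line per core.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    echo "Allocated nodes (cores per node):"
    sort "$PBS_NODEFILE" | uniq -c
fi

# A distributed-memory (e.g. MPI) program would be launched here, for
# example with mpirun ./my_mpi_program (hypothetical program name).
```

Save it as, say, hello.pbs, submit it with `qsub hello.pbs`, and check on it with `qstat -u $USER`. With `-j oe`, the output lands in a file named after the job name and job number (for example hello_cluster.o1234) in the directory you submitted from.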

Examples

On GitLab you will find a set of small examples, mostly “hello world” programs that verify you have successfully submitted and run a job on the correct resources.