A writeup on choosing which computing resources to use for which kind of work.

Short version

For X use Y:

small jobs (seconds or minutes of runtime): any machine, including your local one
hosting a website or web app: one of our web servers (most likely web.cs)
controlling a complete OS: a VM (email the admins to request one)
GPGPUs: the layout cluster
running jobs that need lots of RAM and/or storage: a jumbo server, also known as a phat node (lovelace or pollock)
running jobs you want to split across many machines: a cluster (layout or whedon)

Check out the rest of this document for the more detailed version.

What does your computing problem look like?

Small computing jobs

A small computing job is easy to run: just run it on the computer or server you usually use.

Every system we have, from Bowie to Whedon, will run a basic Python 3 program or build and run C code. Your local machine probably will as well. If you expect your code to complete in seconds or minutes, this is probably your best choice.

You may use `nohup my_command &` for jobs you expect to take a few minutes. `nohup` keeps the command running if you log out, the trailing `&` puts it in the background and returns your shell (so you can type more commands), and output is saved to nohup.out in the current directory.
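If you are launching the job from a Python script rather than from the shell, a similar effect can be sketched with the standard `subprocess` module. This is only an illustration, not a replacement for `nohup`: the helper name `run_in_background` and the log file name are hypothetical, and `start_new_session=True` only has this effect on POSIX systems.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_background(command, log_path):
    """Start `command` detached from the terminal, appending its output to `log_path`."""
    with open(log_path, "ab") as log:
        # start_new_session=True (POSIX) puts the child in its own session,
        # so closing the terminal does not send it a hangup signal.
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )


if __name__ == "__main__":
    # Hypothetical job: a one-line Python program standing in for my_command.
    log_file = Path(tempfile.gettempdir()) / "my_job.out"
    job = run_in_background([sys.executable, "-c", "print('job finished')"], log_file)
    print(f"started process {job.pid}, output goes to {log_file}")
```

The call returns immediately, so the script (or your shell) stays free while the job writes to its log file.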

Websites and web apps

A few of our machines run web servers. Websites and web apps can be hosted on these machines.

A dedicated workspace where you configure the entire OS for yourself

Speak to the admins about this. We have a hypervisor running on one of our servers, and it hosts our web server among other utilities. We can quickly spin up a VM for you, either Ubuntu- or CentOS-flavored.

Anything requiring a GPGPU

Use the layout cluster. This is an easy answer because our other machines do not contain GPGPUs. :)

If you're new to the term, a GPGPU (general-purpose graphics processing unit) is a GPU that performs computations in problem spaces other than rendering computer graphics.

A GPU can perform some specific computations, such as vector arithmetic, extremely quickly relative to a CPU. This is important in (for example) rendering video game animations. Many problems in the natural and computational sciences also require such calculations, hence the development of GPGPUs. Demand for their use may increase as data science and machine learning continue to grow.
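To make "vector arithmetic" concrete, here is a minimal sketch of one classic example, the SAXPY operation (a times x plus y, element by element). It is plain Python and runs serially on the CPU; it is not GPU code. The point is only that each output element depends on nothing but the matching input elements, which is what lets a GPU compute many of them at the same time.

```python
def saxpy(a, x, y):
    """Return a * x[i] + y[i] for every index i."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Every element is independent of the others, so the work could be
    # handed to thousands of GPU threads, one element per thread.
    return [a * xi + yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(saxpy(2.0, x, y))  # [12.0, 24.0, 36.0, 48.0]
```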

Our NVIDIA GPGPUs (and associated drivers) are installed on the Layout cluster. To program using GPGPUs, load CUDA (read more about CUDA here) on the Layout cluster. Happy coding.

High-performance computing, including scientific and research computing

If your job is expected to take hours or days, as in the case of many scientific computing and research-oriented workflows, you will want to use a system designed to handle it.

To choose one, decide which of the two kinds of system below fits your problem: a jumbo server or a cluster.

Jumbo servers

tl;dr

Jumbo server: one big machine, lots of RAM and storage compared to CPU
Best for shared memory parallelism
Example problem: DNA sequence workflows
Hostnames: lovelace, pollock

Jumbo servers (née phat nodes) are in many respects just bigger versions of the computers you’re accustomed to running. They have one or two CPUs, lots of RAM, and lots of disk.

A jumbo server is the best solution for problems that require a lot of data to be loaded into memory and handled all at once, with minimal communications overhead.
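As a rough illustration of shared memory parallelism, the sketch below loads one sequence into memory and lets several threads read different index ranges of that same string, with no copies and no messages passed between workers. The function names and the toy sequence are hypothetical. Note that with the standard CPython build the global interpreter lock keeps these threads from speeding up pure-Python counting, so treat this as a picture of the memory model, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def count_gc(sequence, start, stop):
    """Count G and C bases in sequence[start:stop] without copying the slice."""
    return sum(1 for i in range(start, stop) if sequence[i] in "GC")


def gc_fraction(sequence, workers=4):
    """Split the index range among threads that all read the same string."""
    if not sequence:
        return 0.0
    step = -(-len(sequence) // workers)  # ceiling division
    ranges = [(s, min(s + step, len(sequence))) for s in range(0, len(sequence), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda r: count_gc(sequence, r[0], r[1]), ranges)
        return sum(counts) / len(sequence)


# Toy data: 4 of every 10 bases are G or C.
sequence = "ATGCGCATTA" * 1000
print(gc_fraction(sequence))  # 0.4
```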

Clusters

tl;dr

Cluster: several less powerful machines linked together to perform operations in parallel
Best for distributed memory parallelism
Example problem: molecular genomics simulations

Clusters consist of three pieces:

1. a head node that hosts the scheduler, provides Internet services, manages configuration of the system, and supports user access
2. N compute nodes, where each compute node hosts a pbs_mom and a pbs_client to do the computational work that is handed to it by the head node
3. a network switch to link all the nodes together

Using multiple nodes adds communications overhead, because data has to travel between nodes over the network. A cluster is therefore well suited to problems where the data can be distributed across many nodes, each of which works on its own portion with its CPUs (and/or GPUs).
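The distributed memory pattern can be sketched as scatter, compute, gather. The example below is a single-process simulation: it does not use the scheduler or any message-passing library, and the function names (`scatter`, `compute_node`, `head_node`) are hypothetical. It only shows that each "node" sees its own chunk of the data and sends back one small partial result for the head node to combine.

```python
def scatter(data, node_count):
    """Split data into node_count nearly equal chunks, one per compute node."""
    base, extra = divmod(len(data), node_count)
    chunks, start = [], 0
    for node in range(node_count):
        size = base + (1 if node < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def compute_node(chunk):
    """Work done on one node: it only ever sees its own chunk."""
    return sum(value * value for value in chunk)


def head_node(data, node_count):
    """Hand out chunks, then combine the partial results that come back."""
    partials = [compute_node(chunk) for chunk in scatter(data, node_count)]
    return sum(partials)


data = list(range(1, 101))
print(head_node(data, 4))  # sum of squares 1..100 = 338350
```

On a real cluster the chunks and partial results would cross the network, which is where the communications overhead comes from.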

Detailed Specs

This section includes detailed information on each machine/cluster. This will likely be more information than you need, but it can come in handy for research or other applications where you need to provide information about how your code was run.

Whedon Specs (cluster)

Memory Details:
Total Width: 72 bits
Data Width: 64 bits
Size: 256 GB (8 x 32 GB)
Form Factor: DIMM
Set: None
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 315FBBB7
Configured Memory Speed: 1866 MT/s

CPU Details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 2599.951
CPU max MHz: 3200.0000
CPU min MHz: 1200.0000
BogoMIPS: 4799.75
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
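The counts in the Whedon listings above are related by simple arithmetic, which is handy when you report how many cores a job had available. The short check below uses only the values quoted above; the variable names are just for illustration.

```python
# Values copied from the Whedon CPU and memory details.
threads_per_core = 2
cores_per_socket = 8
sockets = 2
dimm_count = 8
dimm_size_gb = 32

physical_cores = cores_per_socket * sockets      # 16 physical cores
logical_cpus = physical_cores * threads_per_core  # matches "CPU(s): 32"
total_memory_gb = dimm_count * dimm_size_gb       # matches "Size: 256 GB"

print(physical_cores, logical_cpus, total_memory_gb)  # 16 32 256
```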

Hamilton (cluster)

Layout (cluster)

Examples

On GitLab, you will find a set of short code examples, mostly “hello world” programs that verify you have successfully submitted and run a job on the correct resources.
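For a sense of what such a check can look like, here is a minimal sketch. It is not taken from the GitLab examples; the function name `job_report` is hypothetical. It prints a greeting along with the hostname and a few details of the machine it ran on, so you can confirm the job landed where you intended.

```python
import os
import platform
import socket


def job_report():
    """Collect a few facts about the machine this code is running on."""
    return {
        "hostname": socket.gethostname(),
        "machine": platform.machine(),
        "cpus": os.cpu_count(),
        "python": platform.python_version(),
    }


report = job_report()
print("hello world from", report["hostname"])
for key, value in report.items():
    print(f"{key}: {value}")
```

Saving this output alongside your results also gives you a record of how your code was run.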
