Choosing a computing resource

Writeup on choosing which computing resources to use for what work

Computer Resources

All information related to our machines and VMs is collected at https://wiki.cs.earlham.edu/index.php/Sysadmin:Computer_Resources.

Short version

For X use Y:

  • small jobs (seconds or minutes of runtime): any machine, including your local one
  • hosting a website or web app: one of our web servers (web.cs most likely)
  • controlling a complete OS: a VM; email the admins to request one
  • GPGPUs: the Layout cluster
  • jobs that need lots of RAM and/or storage: a jumbo server, a.k.a. phat node (lovelace or pollock)
  • jobs you want to split across many machines: a cluster (layout or whedon)

Check out the rest of this document for the more detailed version.

What does your computing problem look like?

Small computing jobs

A small computing job is easy to run: just run it on the computer or server you usually use.

Every system we have, from Bowie to Whedon, will run a basic Python 3 program or build and run C code. Your local machine probably will as well. If you expect your code to complete in seconds or minutes, this is probably your best choice.
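
For example, a quick sanity check of both toolchains on any of our machines (file names here are illustrative):

    # Confirm the Python 3 interpreter works.
    python3 -c 'print("hello, world")'

    # Confirm the C toolchain works: write, build, and run a tiny program.
    echo 'int main(void){return 0;}' > hello.c
    cc hello.c -o hello && ./hello && echo "C toolchain OK"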

For jobs you expect to take a few minutes, you may use `nohup my_command &`. It runs your command in the background, returns your shell prompt (so you can keep typing commands), keeps the job alive if you log out, and saves its output to nohup.out in the current working directory.
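
A minimal session might look like this (my_analysis.py is a hypothetical script standing in for your own job):

    # Start the job in the background; it survives logout thanks to nohup.
    nohup python3 my_analysis.py &

    # Keep working, then inspect the job's output when it finishes.
    cat nohup.out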

Websites and web apps

A few of our machines run web servers, and websites and web apps can be hosted on them; web.cs is the most likely choice.

A dedicated workspace where you configure the entire OS for yourself

Speak to the admins about this. We have a hypervisor running on one of our servers, and it hosts our web server among other utilities. We can quickly spin up a VM for you, either Ubuntu- or CentOS-flavored.

Anything requiring a GPGPU

The Layout cluster. This is an easy answer because our other machines do not contain GPGPUs. :)

If you're new to the term, a GPGPU (general-purpose graphics processing unit) is a GPU that performs computations in problem spaces other than rendering computer graphics.

A GPU can perform some specific computations, such as vector arithmetic, extremely quickly relative to a CPU. This is important in (for example) rendering video game animations. Many problems in the natural and computational sciences also require such calculations, hence the development of GPGPUs. Demand for their use may increase as data science and machine learning continue to grow.

Our NVIDIA GPGPUs (and associated drivers) are installed on the Layout cluster. To program with them, load CUDA (see the CUDA page on this wiki) on Layout. Happy coding.
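
As a sketch of that workflow (the module name and the source file hello_gpu.cu are assumptions; run `module avail` on Layout to see what is actually installed):

    # Load the CUDA toolchain (the exact module name may differ on Layout).
    module load cuda

    # Compile a CUDA source file with NVIDIA's nvcc compiler, then run it.
    nvcc -o hello_gpu hello_gpu.cu
    ./hello_gpu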

High-performance computing, including scientific and research computing

If your job is expected to take hours or days, as in the case of many scientific computing and research-oriented workflows, you will want to use a system designed to handle it.

To choose between the two kinds of systems described below, consider whether your problem fits shared memory parallelism (a jumbo server) or distributed memory parallelism (a cluster).

Jumbo servers

  • Jumbo server: one big machine, lots of RAM and storage compared to CPU
  • Best for shared memory parallelism
  • Example problem: DNA sequence workflows
  • Hostnames: lovelace, pollock

Jumbo servers (née phat nodes) are in many respects just bigger versions of the computers you’re accustomed to running. They have one or two CPUs, lots of RAM, and lots of disk.

A jumbo server is the best solution for problems that require a lot of data to be loaded into memory and handled all at once, with minimal communications overhead.
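
Shared memory parallelism here usually means one multi-threaded program using all of the node's cores, for example via OpenMP. A hypothetical session on a jumbo server (analyze.c and big_dataset.fasta are illustrative names):

    # Build a C program that parallelizes with OpenMP threads.
    gcc -fopenmp -O2 -o analyze analyze.c

    # All threads share the node's single large memory; no network involved.
    export OMP_NUM_THREADS=16
    ./analyze big_dataset.fasta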

Clusters

  • Cluster: several less powerful machines linked together to perform operations in parallel
  • Best for distributed memory parallelism
  • Example problem: molecular genomics simulations

Clusters consist of three pieces:

  1. a head node that hosts the scheduler, provides Internet services, manages configuration of the system, and supports user access
  2. N compute nodes, where each compute node hosts a pbs_mom and a pbs_client to do the computational work handed to it by the head node
  3. a network switch to link all the nodes together

Communicating between nodes adds overhead. As such, a cluster is well-suited to problems whose data can be distributed across many nodes, with the CPUs (and/or GPUs) at each node working on its own share.
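
On our clusters you hand work to the scheduler on the head node rather than running programs directly on compute nodes. A minimal sketch of a PBS/TORQUE job script (the resource requests are illustrative; adjust them to your job):

    #!/bin/bash
    # hello.pbs -- minimal PBS job script.
    #PBS -N hello_world
    #PBS -l nodes=2:ppn=4
    #PBS -l walltime=00:05:00

    # PBS starts jobs in your home directory; move to where you ran qsub.
    cd "$PBS_O_WORKDIR"

    echo "Hello from $(hostname)"

Submit it with `qsub hello.pbs` and watch its progress with `qstat`.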


Examples

On GitLab (https://code.cs.earlham.edu) you will find a collection of example programs, mostly “hello world” code to verify that you have successfully submitted and run a job on the correct resources.

Tested and working as of 2022.