Sysadmin
Revision as of 10:43, 14 January 2021
This is the hub for the CS sysadmins on the wiki.
Common Tasks
- Welcoming a new sysadmin: log in here with wiki credentials to begin learning to be an admin
- General troubleshooting tips for admins
- Useful ssh information for admins
- Recurring tasks - e.g. software updates, hardware replacements
- Slack and GitLab integration
- Ticket tracking for current projects
- User Management
- Software installation
- Monitoring
- Backup
- Add a computer
- Senior projects
- Shutdown and Boot up
- Password managers
- Server safety
- Upgrading SSL Certificates
- Launch a process at startup
- Working with ITS
- Recurring spending
Services
- Cluster Overview and additional details
- Jupyterhub and NBGrader
- Apache2
- Databases
- DNS and DHCP
- VirtualBox
- Xen Server
- X Applications
- Bash startup scripts
- AWS
For old documentation, see: Old Wiki Information
Machines and Brief Descriptions of Services
If you're visually inclined, we have a colorful and easy-to-edit map of our servers here!
Compute (servers and clusters)
We have CS and cluster machines.
CS machines:
- bowie: hosts and exports user files; Jupyterhub; landing server
- smiley: VM host, not accessible to regular users
- web: website host
- net: network administration host for CS
- code: GitLab host
- auth: host of the LDAP user database
Cluster machines:
- hopper: landing server
- bronte, pollock, lovelace: large compute servers
- layout, whedon: clusters of multiple nodes linked together through a switch and managed through a headnode
- sakurai: big data storage and exports
- meier, miyamoto: backup servers
- monitor: server monitoring
We have spare nodes on the old al-salam cluster’s rack. These should be used for services that can handle minutes to hours of downtime, as they only have one power supply.
Specialized resources
Specialized computing applications are supported on the following machines:
- GPUs for AI/ML/data science: layout cluster
- virtualization: smiley
- containers: bowie
Network
We have two network fabrics linking the machines together. There are three subdomains.
10 Gb
We have a 10Gb fabric used to mount files over NFS. Machines with 10Gb support have an IP address in the class C range 10.10.10.0/24; we plan to add DNS records for these addresses.
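As a sketch of how a mount would target the 10Gb fabric, an fstab entry can address the file server by its 10.10.10.0/24 address rather than its DNS name (the server address and export path below are illustrative, not our actual configuration):

```
# Hypothetical /etc/fstab entry: mount user files over the 10Gb fabric
# by using the file server's 10.10.10.0/24 address directly.
10.10.10.12:/export/home  /home  nfs  defaults,_netdev  0 0
```

Until DNS records exist for the 10Gb addresses, mounts like this have to hard-code the IP, which is one reason to add those records.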
1 Gb (cluster, cs)
We have two class C subnets on the 1Gb fabric: 159.28.22.0/24 (CS) and 159.28.23.0/24 (cluster). This gives us twice as many IP addresses on the 1Gb fabric as on the 10Gb fabric.
Any user accessing *.cluster.earlham.edu or *.cs.earlham.edu is making calls over the 1Gb network.
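The subnet-to-fabric mapping above can be summarized in a small shell function. This is an illustrative sketch, not an existing Earlham script; the function name and output strings are made up:

```shell
# Classify an IP address by fabric, per the subnets described above:
#   10.10.10.0/24  -> 10Gb NFS fabric
#   159.28.22.0/24 -> 1Gb, cs subdomain
#   159.28.23.0/24 -> 1Gb, cluster subdomain
fabric_for_ip() {
  case "$1" in
    10.10.10.*)  echo "10Gb (NFS)" ;;
    159.28.22.*) echo "1Gb (cs)" ;;
    159.28.23.*) echo "1Gb (cluster)" ;;
    *)           echo "unknown" ;;
  esac
}

fabric_for_ip 159.28.23.40   # -> 1Gb (cluster)
```

This is handy when reading `ip -4 addr` output on a machine to check which fabric an interface is attached to.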
Intra-cluster fabrics
The layout cluster has an InfiniBand interconnect. Whedon has only a 1Gb interconnect.
Power
We have a backup power supply, with batteries last upgraded in 2019 (?). We’ve had a few outages since then and power has held up well.
HVAC
HVAC systems are static and are largely managed by Facilities.
See full topology diagrams here.
A word about what's happening between files and the drives they live on.