Sysadmin
Revision as of 10:43, 14 January 2021
This is the hub for the CS sysadmins on the wiki.
Common Tasks
- Welcoming a new sysadmin: log in here with wiki credentials to begin learning to be an admin
- General troubleshooting tips for admins
- Useful ssh information for admins
- Recurring tasks - e.g. software updates, hardware replacements
- Slack and GitLab integration
- Ticket tracking for current projects
- User Management
- Software installation
- Monitoring
- Backup
- Add a computer
- Senior projects
- Shutdown and Boot up
- Password managers
- Server safety
- Upgrading SSL Certificates
- Launch a process at startup
- Working with ITS
- Recurring spending
Services
- Cluster Overview and additional details
- Jupyterhub and NBGrader
- Apache2
- Databases
- DNS and DHCP
- VirtualBox
- Xen Server
- X Applications
- Bash startup scripts
- AWS
For old documentation, see: Old Wiki Information
Machines and Brief Descriptions of Services
If you're visually inclined, we have a colorful and easy-to-edit map of our servers here!
Compute (servers and clusters)
We have CS and cluster machines.
CS machines:
- bowie: hosts and exports user files; Jupyterhub; landing server
- smiley: VM host, not accessible to regular users
- web: website host
- net: network administration host for CS
- code: GitLab host
- auth: host of the LDAP user database
Cluster machines:
- hopper: landing server
- bronte, pollock, lovelace: large compute servers
- layout, whedon: clusters of multiple nodes linked together through a switch and managed through a headnode
- sakurai: big data storage and exports
- meier, miyamoto: backup servers
- monitor: server monitoring
We have spare nodes on the old al-salam cluster’s rack. These should be used for services that can handle minutes to hours of downtime, as they only have one power supply.
Specialized resources
Specialized computing applications are supported on the following machines:
- GPUs for AI/ML/data science: layout cluster
- virtualization: smiley
- containers: bowie
Network
We have two network fabrics linking the machines together. There are three subdomains.
10 Gb
We have a 10Gb fabric used to mount files over NFS. Machines with 10Gb support have an IP address in the class C range 10.10.10.0/24; we plan to add DNS records for these addresses.
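As a sketch of how a mount would target the 10Gb fabric, an fstab entry can address the file server by its 10.10.10.0/24 address rather than its DNS name (the server address and export path below are illustrative, not our actual configuration):

```
# Hypothetical /etc/fstab entry: mount user files over the 10Gb fabric
# by using the file server's 10.10.10.0/24 address directly.
10.10.10.12:/export/home  /home  nfs  defaults,_netdev  0 0
```

Until DNS records exist for the 10Gb addresses, mounts like this have to hard-code the IP, which is one reason to add those records.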
1 Gb (cluster, cs)
We have two class C subnets on the 1Gb fabric: 159.28.22.0/24 (CS) and 159.28.23.0/24 (cluster). This gives us twice as many IP addresses on the 1Gb fabric as on the 10Gb fabric.
Any user accessing *.cluster.earlham.edu or *.cs.earlham.edu is making calls over the 1Gb network.
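The subnet-to-fabric mapping above can be summarized in a small shell function. This is an illustrative sketch, not an existing Earlham script; the function name and output strings are made up:

```shell
# Classify an IP address by fabric, per the subnets described above:
#   10.10.10.0/24  -> 10Gb NFS fabric
#   159.28.22.0/24 -> 1Gb, cs subdomain
#   159.28.23.0/24 -> 1Gb, cluster subdomain
fabric_for_ip() {
  case "$1" in
    10.10.10.*)  echo "10Gb (NFS)" ;;
    159.28.22.*) echo "1Gb (cs)" ;;
    159.28.23.*) echo "1Gb (cluster)" ;;
    *)           echo "unknown" ;;
  esac
}

fabric_for_ip 159.28.23.40   # -> 1Gb (cluster)
```

This is handy when reading `ip -4 addr` output on a machine to check which fabric an interface is attached to.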
Intra-cluster fabrics
The layout cluster has an InfiniBand interconnect. Whedon has only a 1Gb interconnect.
Power
We have a backup power supply, with batteries last upgraded in 2019 (?). We’ve had a few outages since then and power has held up well.
HVAC
HVAC systems are static and are largely managed by Facilities.
See full topology diagrams here.
A word about what's happening between files and the drives they live on.