Difference between revisions of "Sysadmin"

From Earlham CS Department
 
= Machines and Brief Descriptions of Services =
 
  
[[Topology|See full topology diagrams here.]]
[https://docs.google.com/drawings/d/1XaULz5IxXV_BZQjrko3QJ8wV5aXsSTYcSWxxT49OyZk/edit If you're visually inclined, we have a colorful and easy-to-edit map of our servers here!]
  
[[Sysadmin:Layers of abstraction for filesystems|A word about what's happening between files and the drives they live on.]]
  
== CS Machines ==
  
{| style="float:left; margin-right:2px;"
| style="height:40px; width:150px; text-align:center; background-color:#54C571; border-left:solid 5px #54C571; border-top:solid 5px #54C571; border-bottom:solid 1px white; border-right:solid 5px #54C571; font-size:120%;" | NET <br> (vm1)
|-
| style="height:210px; width:150px; background-color:#54C571; border-left:solid 5px #54C571; border-bottom:solid 5px #54C571; border-right:solid 5px #54C571;" | [[Sysadmin:LDAP|LDAP Server]] <br> [[Sysadmin:DNS & DHCP | DNS]] <br> [[Sysadmin:DNS & DHCP | DHCP]] <br><br> Backup to Dali: etc, var
|}
  
{| style="float:left; margin-right:2px;"
| style="height:40px; width:150px; text-align:center; background-color:#E77471; border-left:solid 5px #E77471; border-top:solid 5px #E77471; border-bottom:solid 1px white; border-right:solid 5px #E77471; font-size:120%;" | WEB <br> (vm2)
|-
| style="height:210px; width:150px; background-color:#E77471; border-left:solid 5px #E77471; border-bottom:solid 5px #E77471; border-right:solid 5px #E77471;" | [[Sysadmin:Email:Mailman | Mailman]] <br> [[Sysadmin:Mail Stack | Mail Stack]]<br> Apache2 <br> PostgreSQL <br> MySQL <br> Wiki <br><br> Backup to Dali: etc, var
|}
  
{| style="float:left; margin-right:2px;"
| style="height:40px; width:150px; text-align:center; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-top:solid 5px #C38EC7; border-bottom:solid 1px white; border-right:solid 5px #C38EC7; font-size:120%;" | TOOLS <br> (vm3)
 
|-
 
| style="height:210px; width:150px; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-bottom:solid 5px #C38EC7; border-right:solid 5px #C38EC7;" | [[Sysadmin:Jupyterhub Notebook Server | Jupyterhub Server]] <br> [[Sysadmin:Software Modules | Software Modules]] <br> NginX  <br>SSH<br>Users<br><br> Backup to Dali: etc, var, mnts, sage
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#E3A869; border-left:solid 5px #E3A869; border-top:solid 5px #E3A869; border-bottom:solid 1px white; border-right:solid 5px #E3A869; font-size:120%;" | BABBAGE
 
|-
 
| style="height:210px; width:150px; background-color: #E3A869; border-left:solid 5px #E3A869; border-bottom:solid 5px #E3A869; border-right:solid 5px #E3A869;" | [[Sysadmin:Firewall | Firewall]]
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:40px; width:150px; text-align:center; background-color:#FF7E6D; border-left:solid 5px #FF7E6D; border-top:solid 5px #FF7E6D; border-bottom:solid 1px white; border-right:solid 5px      #FF7E6D; font-size:120%;" | BOWIE
 
|-
 
| style="height:210px; width:150px; background-color:#FF7E6D; border-left:solid 5px #FF7E6D; border-bottom:solid 5px #FF7E6D; border-right:solid 5px #FF7E6D;" | PostgreSQL <br> Docker <br>Weather Monitoring <br> Energy Monitoring <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:40px; width:150px; text-align:center; background-color:#54C571; border-left:solid 5px #54C571; border-top:solid 5px #54C571; border-bottom:solid 1px white; border-right:solid 5px      #54C571; font-size:120%;" | SMILEY
|-
| style="height:210px; width:150px; background-color:#54C571; border-left:solid 5px #54C571; border-bottom:solid 5px #54C571; border-right:solid 5px #54C571;" | [[XenDocs]] <br> NET <br> WEB <br>[[NFS]]<br><br> Backup to Dali: etc, var
 
|}
 
  
  
  
  
<br> <br> <br> <br> <br> <br><br> <br> <br> <br> <br> <br>
  
== Cluster Machines ==
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px      #0099cc; font-size:120%;" | HOPPER
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | Users <br> SSH <br> NFS server <br> LDAP server <br> [[Sysadmin:Software Modules | Software Modules]] <br> PostgreSQL <br> Wiki <br> Apache2 <br> [[Sysadmin:DNS & DHCP | DNS]] <br> [[Sysadmin:DNS & DHCP | DHCP]]  <br><br> Backup to Indiana: etc, var, cluster
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | INDIANA
 
|-
 
| style="height:300px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | [[Indiana Storage Server|New Storage Server]]
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | DALI
 
|-
 
| style="height:300px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | Storage Server <br>[[Sysadmin:Gitlab | Gitlab]] <br> Backups <br> NginX <br><br> Backup to Indiana (/media/r10_vol/backups/): etc, var/opt/gitlab/backups
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | AL-SALAM
 
|-
 
| style="height:300px; width:150px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;" | [[WebMO]] <br> [[Sysadmin:Software Modules | Software Modules]] <br> Apache2 <br><br> Backup to Indiana: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | WHEDON
 
|-
 
| style="height:300px; width:150px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;" | [[Sysadmin:Software Modules | Software Modules]] <br><br> Backups to Indiana: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#39ad39; border-left:solid 5px #39ad39; border-top:solid 5px #39ad39; border-bottom:solid 1px white; border-right:solid 5px #39ad39; font-size:120%;" | LAYOUT
 
|-
 
| style="height:300px; width:150px; background-color:#39ad39; border-left:solid 5px #39ad39; border-bottom:solid 5px #39ad39; border-right:solid 5px #39ad39;" | [[Sysadmin:Jupyterhub Notebook Server | Jupyterhub Server]] <br> [[Sysadmin:Software Modules | Software Modules]] <br> NginX <br> Apache2 <br> [[WebMO]] <br><br> Backup to Indiana: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | BRONTE
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]] <br><br> Backup to Indiana: etc, var, nbserver
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | POLLOCK
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]] <br> [[WebMO]] <br> NginX <br><br> Backup to Indiana: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | KAHLO
 
|-
 
| style="height:300px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | Storage Server <br>Backups <br> NginX <br><br> Backup to Indiana: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | BIGFE
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]] <br><br> Hosts BCCD-related repositories and distributions.
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | T-VOC
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]]  
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | ELWOOD
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]] <br> <br> Used by BCCD to host www.bccd.net and www.littlefe.net. Will be deprecated once the BCCD project moves its sites to cloud-based hosting.
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:150px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | KRASNER
 
|-
 
| style="height:300px; width:150px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;" | [[Docker]] platform on an old lovelace machine upgraded to have 16GB of RAM.
 
|}
 
 
 
 
 
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
 
 
 
 
 
== Switches ==
 
 
 
 
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px      #0099cc; font-size:120%;" | SG538SF02J
 
|-
 
| style="height:200px; width:175px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc; font-size:80%;" |
 
*Model: HP Procurve 3400cl
 
*Ports: 24
 
*Backplane bandwidth:
 
**88 Gbps
 
**64 million pps
 
*Memory:
 
**2MB packet buffer
 
**16 MB dual flash
 
**128 MB SDRAM
 
*Cut-through switching: No
 
*Unused as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | CN63FP762S
 
|-
 
| style="height:200px; width:175px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;font-size:80%;" |
 
*Model: HP 2530-24G
 
*Ports: 24
 
*Switching Capacity:
 
**56 Gbps
 
**41.6 million pps
 
*Memory:
 
**1.5 MB packet buffer
 
**256 MB  flash
 
**128 MB DDR3 DIMM
 
*Cut-through switching: No
 
*Connected to Al-Salam as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | SG525SG025
 
|-
 
| style="height:200px; width:175px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;font-size:80%;" |
 
*Model: HP Procurve 3400cl
 
*Ports: 24
 
*Backplane bandwidth:
 
**88 Gbps
 
**64 million pps
 
*Memory:
 
**2MB packet buffer
 
**16 MB dual flash
 
**128 MB SDRAM
 
*Cut-through switching: No
 
*Connected to layout and whedon as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#39ad39; border-left:solid 5px #39ad39; border-top:solid 5px #39ad39; border-bottom:solid 1px white; border-right:solid 5px #39ad39; font-size:120%;" | Netgear JGS524
 
|-
 
| style="height:200px; width:175px; background-color:#39ad39; border-left:solid 5px #39ad39; border-bottom:solid 5px #39ad39; border-right:solid 5px #39ad39;font-size:80%;" |
 
*Current cluster head-node
 
*Unmanaged (no console/configuration)
 
*Ports: 24
 
*Switching bandwidth:
 
**48 Gbps
 
**1.5 million pps
 
*Memory:
 
**2MB packet buffer
 
*Cut-through switching: No
 
*Connected to Al-Salam, Hopper, Pollock, Nagios, Dali, Kahlo, Bronte as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#E77471; border-left:solid 5px #E77471; border-top:solid 5px #E77471; border-bottom:solid 1px white; border-right:solid 5px #E77471; font-size:120%;" | cs-main
 
|-
 
| style="height:200px; width:175px; background-color:#E77471; border-left:solid 5px #E77471; border-bottom:solid 5px #E77471; border-right:solid 5px #E77471;font-size:80%;" |
 
*Model: HP 5920AF-24XG
 
*Ports: 24
 
*Backplane bandwidth:
 
**480 Gbps
 
**367 million pps
 
*Memory:
 
**3.6 GB packet buffer
 
**256 MB dual flash
 
**2 GB SDRAM
 
*Cut-through switching: Yes
 
*IP Address: 159.28.31.66
 
*Connected to layout, kahlo, and dali as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-top:solid 5px #ADDFFF; border-bottom:solid 1px white; border-right:solid 5px #ADDFFF; font-size:120%;" | 5500denniscs-sw1
 
|-
 
| style="height:200px; width:175px; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-bottom:solid 5px #ADDFFF; border-right:solid 5px #ADDFFF;font-size:80%;" |
 
*Model: HP 5500 JG542A
 
*Ports: 24
 
*Backplane bandwidth:
 
**224 Gbps
 
**166.6 million pps
 
*Memory:
 
**6 MB packet buffer
 
**512 MB dual flash
 
**1 GB SDRAM
 
*Cut-through switching: No
 
*IP Address: 159.28.31.67
 
*Connected to Babbage, Bowie, Nagios, and the cluster's netgear switch (via port 14) as of May 12, 2017
 
|}
 
 
 
 
 
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
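As a rough sanity check on the switch figures above: a switch that can forward traffic on every port at line rate in both directions at once (non-blocking, full duplex) needs a switching capacity of ports × port speed × 2. The sketch below assumes the Netgear JGS524 has 24 Gigabit ports and that cs-main (HP 5920AF-24XG) has 24 10-Gigabit ports. These port speeds come from the model names, not from this page. Under those assumptions the formula reproduces the 48 Gbps and 480 Gbps listed above. With the port counts given here it does not reproduce the other switches' figures, so treat it as a check for these two only.

```python
def nonblocking_capacity_gbps(ports: int, port_speed_gbps: float) -> float:
    """Switching capacity needed to run every port at line rate, full duplex.

    Each port can send and receive at port_speed_gbps simultaneously,
    so the switch fabric must carry 2 * port_speed_gbps per port.
    """
    return ports * port_speed_gbps * 2


# Port speeds are assumptions inferred from the model names.
switches = {
    "Netgear JGS524": (24, 1),  # assumed: 24 x 1 Gb ports
    "cs-main": (24, 10),        # assumed: 24 x 10 Gb ports
}

for name, (ports, speed) in switches.items():
    print(f"{name}: {nonblocking_capacity_gbps(ports, speed):g} Gbps")
```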
 

Revision as of 10:53, 11 January 2021

This is the hub for the CS sysadmins on the wiki.

Common Tasks

Services

For old documentation, see: Old Wiki Information

Machines and Brief Descriptions of Services

If you're visually inclined, we have a colorful and easy-to-edit map of our servers here!

Compute (servers and clusters)

We have CS and cluster machines.

CS machines:

  • bowie: hosts and exports user files; runs JupyterHub; serves as a landing server
  • smiley: VM host, not accessible to regular users
  • web: website host
  • net: network administration host for CS
  • code: GitLab host
  • auth: host of the LDAP user database

Cluster machines:

  • hopper: landing server
  • bronte, pollock, lovelace: large compute servers
  • layout, whedon: clusters of multiple nodes linked through a switch and managed through a head node
  • sakurai: big data storage and exports
  • meier, miyamoto: backup servers
  • monitor: server monitoring

We have spare nodes on the old Al-Salam cluster’s rack. Each has only one power supply, so use them only for services that can tolerate minutes to hours of downtime.

Specialized resources

Specialized computing applications are supported on the following machines:

  • GPUs for AI/ML/data science: layout cluster
  • virtualization: smiley
  • containers: bowie

Network

We have two network fabrics linking the machines together. There are three subdomains.

10 Gb

We use the 10 Gb fabric to mount files over NFS. Machines with 10 Gb support have an IP address in the private /24 subnet 10.10.10.0/24, and we want to add DNS records for these addresses.
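Because the plan is to add DNS for the 10 Gb addresses, here is a minimal sketch that generates BIND-style forward (A) and reverse (PTR) records from a host-to-address table. It rejects any address outside 10.10.10.0/24. The host names, addresses, and zone name `tengig.example` in the example are hypothetical placeholders, not our real ones. Substitute real values and adapt the output to however our DNS server is actually configured.

```python
import ipaddress

# The 10 Gb fabric's subnet, as given on this page.
TEN_GIG_NET = ipaddress.ip_network("10.10.10.0/24")


def dns_records(hosts, zone):
    """Return (A records, PTR records) as BIND zone-file lines.

    hosts: mapping of short hostname -> 10 Gb IPv4 address (strings).
    zone:  forward zone the hostnames belong to (no trailing dot).
    Raises ValueError if an address is not on the 10 Gb subnet.
    """
    a_records, ptr_records = [], []
    for name, addr in sorted(hosts.items()):
        ip = ipaddress.ip_address(addr)
        if ip not in TEN_GIG_NET:
            raise ValueError(f"{name}: {ip} is not in {TEN_GIG_NET}")
        fqdn = f"{name}.{zone}."
        a_records.append(f"{fqdn}\tIN\tA\t{ip}")
        # reverse_pointer yields e.g. "11.10.10.10.in-addr.arpa"
        ptr_records.append(f"{ip.reverse_pointer}.\tIN\tPTR\t{fqdn}")
    return a_records, ptr_records


if __name__ == "__main__":
    # Hypothetical hosts and zone, for illustration only.
    example_hosts = {"host-a": "10.10.10.11", "host-b": "10.10.10.12"}
    forward, reverse = dns_records(example_hosts, "tengig.example")
    print("\n".join(forward + reverse))
```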

1 Gb (cluster, cs)

We have two /24 subnets on the 1 Gb fabric: 159.28.22.0/24 (CS) and 159.28.23.0/24 (cluster). This gives the 1 Gb fabric twice as many IP addresses as the 10 Gb fabric.
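The "twice as many" arithmetic works out as follows. A /24 holds 256 addresses, or 254 usable for hosts once the network and broadcast addresses are set aside. The two 1 Gb subnets therefore give 512 addresses (508 usable), against 256 (254 usable) on the 10 Gb fabric. The sketch below confirms this with Python's standard `ipaddress` module. It does not account for gateways or other addresses we might reserve.

```python
import ipaddress

# Subnets as listed on this page.
TEN_GIG = ipaddress.ip_network("10.10.10.0/24")
ONE_GIG = {
    "cs": ipaddress.ip_network("159.28.22.0/24"),
    "cluster": ipaddress.ip_network("159.28.23.0/24"),
}


def usable_hosts(net):
    """Addresses left after excluding the network and broadcast addresses."""
    return net.num_addresses - 2


ten_gig_total = TEN_GIG.num_addresses
one_gig_total = sum(n.num_addresses for n in ONE_GIG.values())
ten_gig_hosts = usable_hosts(TEN_GIG)
one_gig_hosts = sum(usable_hosts(n) for n in ONE_GIG.values())

print(f"10 Gb: {ten_gig_total} addresses, {ten_gig_hosts} usable")
print(f"1 Gb:  {one_gig_total} addresses, {one_gig_hosts} usable")
```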

Anyone accessing hosts under *.cluster.earlham.edu or *.cs.earlham.edu is communicating over the 1 Gb network.
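To check which fabric a given machine is reached over, a minimal sketch can resolve its name and test the address against the three subnets on this page. Per the statement above, names under cluster.earlham.edu and cs.earlham.edu should land in one of the 1 Gb subnets. The sketch assumes the subnet-to-fabric mapping listed here is complete. `fabric_for_host` performs a live DNS lookup, so it only works from a machine with DNS access.

```python
import ipaddress
import socket

# Subnets from this page, keyed by a descriptive fabric label.
FABRICS = {
    "10 Gb (NFS)": ipaddress.ip_network("10.10.10.0/24"),
    "1 Gb (cs)": ipaddress.ip_network("159.28.22.0/24"),
    "1 Gb (cluster)": ipaddress.ip_network("159.28.23.0/24"),
}


def fabric_for_address(addr):
    """Return the fabric label for an IP address string, or None if unknown."""
    ip = ipaddress.ip_address(addr)
    for label, net in FABRICS.items():
        if ip in net:
            return label
    return None


def fabric_for_host(hostname):
    """Resolve hostname to an IPv4 address (live DNS lookup) and classify it."""
    return fabric_for_address(socket.gethostbyname(hostname))


if __name__ == "__main__":
    # Offline examples; fabric_for_host() needs a real name and DNS access.
    for addr in ("10.10.10.20", "159.28.22.5", "159.28.23.5", "192.0.2.1"):
        print(addr, "->", fabric_for_address(addr))
```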

Intra-cluster fabrics

The layout cluster uses an InfiniBand fabric; whedon has only a 1 Gb fabric.

Power

We have a backup power supply whose batteries were last upgraded in 2019 (year unconfirmed). We’ve had a few outages since then, and the backup power has held up well.

HVAC

HVAC systems are static and are largely managed by Facilities.

See full topology diagrams here.

A word about what's happening between files and the drives they live on.