Difference between revisions of "Sysadmin"

From Earlham CS Department
m (Last updated 16 July)
 
(164 intermediate revisions by 11 users not shown)
__NOTOC__
+
This is the hub for the CS sysadmins on the wiki.
  
= Machines and Brief Descriptions of Services =
+
= Overview =
== CS Machines ==
 
[[File:Server_layout_summer2017.jpg|thumb|200px|right|Server layout as of May 2017]]
 
  
{| style="float:left; margin-right:2px;"
+
[https://docs.google.com/drawings/d/1XaULz5IxXV_BZQjrko3QJ8wV5aXsSTYcSWxxT49OyZk/edit If you're visually inclined, we have a colorful and easy-to-edit map of our servers here!]
| style="height:40px; width:150px; text-align:center; background-color:#54C571; border-left:solid 5px #54C571; border-top:solid 5px #54C571; border-bottom:solid 1px white; border-right:solid 5px #54C571; font-size:120%;" | NET <br> (vm1)
 
|-
 
| style="height:210px; width:150px; background-color:#54C571; border-left:solid 5px #54C571; border-bottom:solid 5px #54C571; border-right:solid 5px #54C571;" | [[LDAP Server]] <br> [[Sysadmin:DNS & DHCP | DNS]] <br> [[Sysadmin:DNS & DHCP | DHCP]] <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
== Server room ==
| style="height:40px; width:150px; text-align:center; background-color:#E77471; border-left:solid 5px #E77471; border-top:solid 5px #E77471; border-bottom:solid 1px white; border-right:solid 5px #E77471; font-size:120%;" | WEB <br> (vm2)
 
|-
 
| style="height:210px; width:150px; background-color:#E77471; border-left:solid 5px #E77471; border-bottom:solid 5px #E77471; border-right:solid 5px #E77471;" | Mailman <br> [[Sysadmin:Mail Stack | Mail Stack]]<br> Apache2 <br> PostgreSQL <br> MySQL <br> Wiki <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
Our servers are in Noyes, the science building that predates the CST. For general information about the server room and how to use it, check out [[Sysadmin:Server Room|this page]].
| style="height:40px; width:150px; text-align:center; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-top:solid 5px #C38EC7; border-bottom:solid 1px white; border-right:solid 5px #C38EC7; font-size:120%;" | TOOLS <br> (vm3)
 
|-
 
| style="height:210px; width:150px; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-bottom:solid 5px #C38EC7; border-right:solid 5px #C38EC7;" | [[SageNB Server | SageNB Server]] <br> [[Jupyterhub notebook server | Jupyterhub Server]] <br> [[Sysadmin:Software Modules | Software Modules]] <br> NginX  <br>SSH<br>Users<br><br> Backup to Dali: etc, var, mnts, sage
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
Columns: machine name, IPs, type (virtual, metal), purpose, dies, cores, RAM
| style="height:55px; width:150px; text-align:center; background-color:#E3A869; border-left:solid 5px #E3A869; border-top:solid 5px #E3A869; border-bottom:solid 1px white; border-right:solid 5px #E3A869; font-size:120%;" | BABBAGE
 
|-
 
| style="height:210px; width:150px; background-color: #E3A869; border-left:solid 5px #E3A869; border-bottom:solid 5px #E3A869; border-right:solid 5px #E3A869;" | [[Sysadmin:Firewall | Firewall]]
 
|}
 
  
{|
+
== Compute Resources ==
| style="height:55px; width:150px; text-align:center; background-color:#EEDC82; border-left:solid 5px #EEDC82; border-top:solid 5px #EEDC82; border-bottom:solid 1px white; border-right:solid 5px #EEDC82; font-size:120%;" | [[Sysadmin:Servers:Proto | PROTO]]
 
|-
 
| style="height:210px; width:150px; background-color: #EEDC82; border-left:solid 5px #EEDC82; border-bottom:solid 5px #EEDC82; border-right:solid 5px #EEDC82;" | Weather Monitoring <br> GPS/NTP <br> Energy Monitoring <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
[https://wiki.cs.earlham.edu/index.php/Sysadmin:Computer_Resources Machines and VMs related information here!]
| style="height:40px; width:150px; text-align:center; background-color:#FF7E6D; border-left:solid 5px #FF7E6D; border-top:solid 5px #FF7E6D; border-bottom:solid 1px white; border-right:solid 5px      #FF7E6D; font-size:120%;" | CONTROL
 
|-
 
| style="height:210px; width:150px; background-color:#FF7E6D; border-left:solid 5px #FF7E6D; border-bottom:solid 5px #FF7E6D; border-right:solid 5px #FF7E6D;" | Users <br> SSH <br> HOME <br> TOOLS <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
== Network ==
| style="height:40px; width:150px; text-align:center; background-color:#54C571; border-left:solid 5px #54C571; border-top:solid 5px #54C571; border-bottom:solid 1px white; border-right:solid 5px      #54C571; font-size:120%;" | SMILEY
 
|-
 
| style="height:210px; width:150px; background-color:#54C571; border-left:solid 5px #54C571; border-bottom:solid 5px #54C571; border-right:solid 5px #54C571;" | [[XenDocs]] <br> NET <br> WEB <br>NFS<br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
We have two network fabrics linking the machines together. There are three subdomains.
| style="height:40px; width:150px; text-align:center; background-color:#E77471; border-left:solid 5px #E77471; border-top:solid 5px #E77471; border-bottom:solid 1px white; border-right:solid 5px      #E77471; font-size:120%;" | SHINKEN
 
|-
 
| style="height:210px; width:150px; background-color:#E77471; border-left:solid 5px #E77471; border-bottom:solid 5px #E77471; border-right:solid 5px #E77471;" | Users <br> SSH <br> Add machines
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
=== 10 Gb ===
| style="height:40px; width:150px; text-align:center; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-top:solid 5px #C38EC7; border-bottom:solid 1px white; border-right:solid 5px      #C38EC7; font-size:120%;" |MURPHY
 
|-
 
| style="height:210px; width:150px; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-bottom:solid 5px #C38EC7; border-right:solid 5px #C38EC7;" | Elderly email stack <br> Users <br> SSH
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
We have a 10Gb fabric for mounting files over NFS. Machines with 10Gb support have an IP address in the 10.10.10.0/24 range, and we want to add DNS entries for these addresses.
| style="height:40px; width:150px; text-align:center; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-top:solid 5px #ADDFFF; border-bottom:solid 1px white; border-right:solid 5px      #ADDFFF; font-size:120%;" | HOME <br> (vm0)
 
|-
 
| style="height:210px; width:150px; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-bottom:solid 5px #ADDFFF; border-right:solid 5px #ADDFFF;" | SSH <br> NFS <br><br> Backup to Dali: eccs, etc, var <br><br> deprecated 07-2018
 
|}
 
  
 +
=== 1 Gb (cluster, cs) ===
  
<br> <br> <br> <br> <br> <br><br> <br> <br> <br> <br> <br>
+
We have two /24 subnets on the 1Gb fabric: 159.28.22.0/24 (CS) and 159.28.23.0/24 (cluster). This means we have twice as many IP addresses on the 1Gb fabric as on the 10Gb fabric.
  
== Cluster Machines ==
+
Any user accessing *.cluster.earlham.edu or *.cs.earlham.edu is using the 1Gb network.
  
{| style="float:left; margin-right:2px;"
+
=== Intra-cluster fabrics ===
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px      #0099cc; font-size:120%;" | HOPPER
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | Users <br> SSH <br> NFS server <br> LDAP server <br> [[Sysadmin:Software Modules | Software Modules]] <br> PostgreSQL <br> Wiki <br> Apache2 <br> [[Sysadmin:DNS & DHCP | DNS]] <br> [[Sysadmin:DNS & DHCP | DHCP]]  <br><br> Backup to Dali: etc, var, cluster
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
The layout cluster has an InfiniBand infrastructure. Wachowski has only a 1Gb infrastructure.
| style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | INDIANA
 
|-
 
| style="height:300px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | [[Indiana Storage Server|New Storage Server]]
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
== Power ==
| style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | DALI
 
|-
 
| style="height:300px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | Storage Server <br>[[Sysadmin:Gitlab | Gitlab]] <br> Backups <br> NginX <br><br> Backup to Dali (/media/r10_vol/backups/): etc, var/opt/gitlab/backups
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
We have a backup power supply, with batteries last upgraded in 2019 (?). We’ve had a few outages since then and power has held up well.
| style="height:55px; width:150px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | AL-SALAM
 
|-
 
| style="height:300px; width:150px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;" | WebMO <br> [[Sysadmin:Software Modules | Software Modules]] <br> Apache2 <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
== HVAC ==
| style="height:55px; width:150px; text-align:center; background-color:#39ad39; border-left:solid 5px #39ad39; border-top:solid 5px #39ad39; border-bottom:solid 1px white; border-right:solid 5px #39ad39; font-size:120%;" | LAYOUT
 
|-
 
| style="height:300px; width:150px; background-color:#39ad39; border-left:solid 5px #39ad39; border-bottom:solid 5px #39ad39; border-right:solid 5px #39ad39;" | [[Sysadmin:Jupyterhub Notebook Server | Jupyterhub Server]] <br> [[Sysadmin:Software Modules | Software Modules]] <br> NginX <br> Apache2 <br> WebMO <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
HVAC systems are static and are largely managed by Facilities.
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | BRONTE
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]] <br><br> Backup to Dali: etc, var, nbserver
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
[[Topology|See full topology diagrams here.]]
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | POLLOCK
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]] <br> WebMO <br> NginX <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
[[Sysadmin:Layers of abstraction for filesystems|A word about what's happening between files and the drives they live on.]]
| style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | KAHLO
 
|-
 
| style="height:300px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | Storage Server <br>Backups <br> NginX <br><br> Backup to Dali: etc, var
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
= New sysadmins =
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | BIGFE
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]] <br><br> Hosts BCCD related repositories and distributions.
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
These pages will be helpful for you if you're just starting in the group:
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | T-VOC
 
|-
 
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]]
 
|}
 
  
{| style="float:left; margin-right:2px;"
+
* [[Sysadmin:New Sysadmins | Welcoming a new sysadmin ]]
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | ELWOOD
+
* [[Sysadmin:Troubleshooting|General troubleshooting tips for admins]]
|-
+
* [[Sandbox Notes|Sandbox Notes]]
| style="height:300px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" |  [[Sysadmin:Software Modules | Software Modules]] <br> <br> Used by BCCD to host www.bccd.net and www.littlefe.net. Will be deprecated when BCCD project offloads their sites onto cloud-based hosting platforms.
+
* [[Password managers]]
|}
+
* [[Server safety]]
 +
* [https://code.cs.earlham.edu/sysadmin/ticket-tracker Ticket tracking for current projects]
  
{| style="float:left; margin-right:2px;"
+
Note: you'll need to log in with wiki credentials to see most Sysadmin pages.
| style="height:55px; width:150px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | krasner
 
|-
 
| style="height:300px; width:150px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;" | [[Docker]] platform on an old lovelace machine upgraded to have 16GB of RAM.
 
|}
 
  
 +
= Additional information =
  
 +
These pages contain a lot of the most important information about our systems and how we operate.
  
 +
===Handy Tools===
 +
* [http://monitor.cluster.earlham.edu:8088/packages Porter's Package Explorer]
  
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
+
===Technical docs===
  
== Switches ==
+
* [https://code.cs.earlham.edu/sysadmin/ticket-tracker Ticket tracking for current projects]
 
+
* [[Server safety]]
 
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px      #0099cc; font-size:120%;" | SG538SF02J
 
|-
 
| style="height:200px; width:175px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc; font-size:80%;" |
 
*Model: HP Procurve 3400cl
 
*Ports: 24
 
*Backplane bandwidth:
 
**88 Gbps
 
**64 million pps
 
*Memory:
 
**2MB packet buffer
 
**16 MB dual flash
 
**128 MB SDRAM
 
*Cut-through switching: No
 
*Unused as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | CN63FP762S
 
|-
 
| style="height:200px; width:175px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;font-size:80%;" |
 
*Model: HP 2530-24G
 
*Ports: 24
 
*Switching Capacity:
 
**56 Gbps
 
**41.6 million pps
 
*Memory:
 
**1.5 MB packet buffer
 
**256 MB  flash
 
**128 MB DDR3 DIMM
 
*Cut-through switching: No
 
*Connected to Al-Salam as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | SG525SG025
 
|-
 
| style="height:200px; width:175px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;font-size:80%;" |
 
*Model: HP Procurve 3400cl
 
*Ports: 24
 
*Backplane bandwidth:
 
**88 Gbps
 
**64 million pps
 
*Memory:
 
**2MB packet buffer
 
**16 MB dual flash
 
**128 MB SDRAM
 
*Cut-through switching: No
 
*Connected to layout and whedon as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#39ad39; border-left:solid 5px #39ad39; border-top:solid 5px #39ad39; border-bottom:solid 1px white; border-right:solid 5px #39ad39; font-size:120%;" | Netgear JGS524
 
|-
 
| style="height:200px; width:175px; background-color:#39ad39; border-left:solid 5px #39ad39; border-bottom:solid 5px #39ad39; border-right:solid 5px #39ad39;font-size:80%;" |
 
*Current cluster head-node
 
*Unmanaged (no console/configuration)
 
*Ports: 24
 
*Switching bandwidth:
 
**48 Gbps
 
**1.5 million pps
 
*Memory:
 
**2MB packet buffer
 
*Cut-through switching: No
 
*Connected to Al-Salam, Hopper, Pollock, Nagios, Dali, Kahlo, Bronte as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#E77471; border-left:solid 5px #E77471; border-top:solid 5px #E77471; border-bottom:solid 1px white; border-right:solid 5px #E77471; font-size:120%;" | cs-main
 
|-
 
| style="height:200px; width:175px; background-color:#E77471; border-left:solid 5px #E77471; border-bottom:solid 5px #E77471; border-right:solid 5px #E77471;font-size:80%;" |
 
*Model: HP 5920AF-24XG
 
*Ports: 24
 
*Backplane bandwidth:
 
**480 Gbps
 
**367 million pps
 
*Memory:
 
**3.6 GB packet buffer
 
**256 MB dual flash
 
**2 GB SDRAM
 
*Cut-through switching: Yes
 
*IP Address: 159.28.31.66
 
*Connected to layout, kahlo, and dali as of May 12, 2017
 
|}
 
 
 
{| style="float:left; margin-right:2px;"
 
| style="height:55px; width:175px; text-align:center; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-top:solid 5px #ADDFFF; border-bottom:solid 1px white; border-right:solid 5px #ADDFFF; font-size:120%;" | 5500denniscs-sw1
 
|-
 
| style="height:200px; width:175px; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-bottom:solid 5px #ADDFFF; border-right:solid 5px #ADDFFF;font-size:80%;" |
 
*Model: HP 5500 JG542A
 
*Ports: 24
 
*Backplane bandwidth:
 
**224 Gbps
 
**166.6 million pps
 
*Memory:
 
**6 MB packet buffer
 
**512 MB dual flash
 
**1 GB SDRAM
 
*Cut-through switching: No
 
*IP Address: 159.28.31.67
 
*Connected to Babbage, Control, Nagios, and the cluster's netgear switch (via port 14) as of May 12, 2017
 
|}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
 
 
 
= Systems Administration Documentation =
 
For old documentation, see: [[Sysadmin:Old | Old Wiki Information]]
 
 
 
{|
 
|- valign="top"
 
|
 
<div style="border:10px solid #E0EAF8; padding:5px; width:230px; height:500px">
 
<div style="background-color:#CEDEF4; padding:5px;">
 
 
 
=== Admin Tasks ===
 
</div>
 
* [[Sysadmin:Nagios | Nagios Monitoring ]]
 
* [[Sysadmin:Shinken | Shinken Monitoring ]]
 
* [[Sysadmin:Upgrading SSL Certificate | Upgrading SSL Certificates ]]
 
* [[Sysadmin:User Management | User Management]]
 
* [[Newmodules | Installing software under modules ]]  
 
 
* [[Sysadmin:Backup|Backup]]
 
* [[Sysadmin:Contacting all users|Contacting all users]]
+
* [[Sysadmin:Monitoring | Monitoring ]]
* [[Sysadmin:New Sysadmins | Welcoming a new sysadmin to the fold]]
+
* [[Sysadmin:SSH|SSH info relevant to admins]]
* [[Sysadmin:AddComputer|Add a computer]]
+
* [[Sysadmin:User Management | User Management]] and [[Sysadmin:LDAP|LDAP]] generally
* [[Sysadmin:Setting up Lovelace Lab Machines | Setting up Lovelace Lab Machines]]
+
* [[Sysadmin:Jupyterhub Notebook Server|Jupyterhub]] and [[Nbgrader notes|NBGrader]]
* [[Reset password]]
+
* [[Sysadmin:MailStack|Email service]]
 
+
* [[Sysadmin:XenDocs | Xen Server]]
 
+
* [[Sysadmin:NFS|Network File System (NFS)]]
<!-- This has to stay as part of the formatting -->
+
* [[Sysadmin:Web Servers|Web Servers and Websites]]
</div>
 
| style="float:left;" |
 
|
 
<div style="border:10px solid #FFDFFF; padding:5px; width:230px; height:500px;">
 
<div style="background-color:#FFCEFF; padding:5px;">
 
 
 
=== Services ===
 
</div>
 
* [[Sysadmin:Services:ClusterOverview|Cluster Overview]]
 
* [[Sysadmin:Services:Apache2|Apache2]]
 
 
* [[Sysadmin:Services:Databases|Databases]]
 
 
* [[Sysadmin:DNS & DHCP|DNS and DHCP]]
 
* [[Sysadmin:Services:Virtualization | Virtualization]]
+
* [[Sysadmin:AWS|AWS]]
* [[Sysadmin:Services:XenServerSetup | Xen Server]]
+
* [[Bash_start_up_script|Bash startup scripts]]
 +
* [[Sysadmin:VirtualBox | VirtualBox]]
 +
* [[X Applications]]
 +
* [[Sysadmin:Services:ClusterOverview|Cluster Overview]] and [[Sysadmin:Ccg-admin|additional details]]
 +
* [[Sysadmin:Firewall|Firewall]] running on babbage.cs.e.e
 +
* [[Sysadmin:Setting_up_Lovelace_Lab_Machines|Setting up Lab Machines]]
  
<!-- This has to stay as part of the formatting -->
+
===Common tasks===
</div>
+
* [[Sysadmin:Recurring Tasks | Recurring tasks - e.g. software updates, hardware replacements]]
| style="float:left;" |
+
* [[Sysadmin:Contacting all users|Contacting all users]]
|
+
* [[Reset password]]
<div style="border:10px solid #F0DDD5; padding:5px; width:230px; height:500px;">
+
* [[Sysadmin:Software installation | Software installation]]
<div style="background-color:#E4C0B1; padding:5px;">
+
* [[Modules | Installing software under modules ]]  
 
+
* [[Sysadmin:AddComputer|Add a computer to CS or cluster domains]]
=== Miscellaneous ===
+
* [[Senior projects|Supporting senior projects]]
</div>
+
* [[ShutdownProcedure|How to do a planned shutdown and reboot of the system]]
* [[ShutdownProcedure| Shutdown and Boot up]]
+
** [[Sysadmin:TestingServices | Testing services]] (after a reboot, upgrade, change in the phase of the moon, etc.)
* [[SysadminContactInfo| Contact Information]]
+
* [[Sysadmin:Upgrading SSL Certificate | Upgrading SSL Certificates ]]
* [[Sysadmin:ImportantInfo:PhoneNumbers| Phone Numbers]]
+
* [[Sysadmin:Launch at startup|Launch a process at startup]]
* [[Sysadmin:ImportantInfo:AuthenticationInfo| Authentication Information]]
+
* [[Sysadmin:Psql-setup | Set up psql for CS430 students]]
* [[Sysadmin:ImportantInfo:UPS| UPS]]
 
* [[Sysadmin:ImportantInfo:SSLcerts| Generating SSL Certificates]]
 
* [[Sysadmin:Power draws| Power draws]]
 
* [[Sysadmin:ImportantInfo:SunHardware|Working with Sun Hardware]]
 
* [[Sysadmin:Passwords]]
 
* Patching
 
** [[LinuxKernelPatching|Linux Kernel Patching]]
 
* [[Sysadmin:SerialConsoleCableEnds|Cable Ends]]
 
* [[Sysadmin:VirtualizationComparison|NEW Virtualization Comparison]]
 
 
 
<!-- This has to stay as part of the formatting -->
 
</div>
 
| style="float:left;" |
 
|
 
<div style="border:10px solid #D6F8DE; padding:5px; width:230px; height:500px;">
 
<div style="background-color:#BDF4CB; padding:5px;">
 
 
 
=== Networking ===
 
</div>
 
* [[Sysadmin:Networking:NetworkLayout|Network Layout (as of 08/2006)]]
 
* [[Sysadmin:Networking:D224 cable plant|D224 cable plant]]
 
* [[Sysadmin:Networking:Fiber plans|Fiber plans]]
 
* [[Sysadmin:Networking:Switches|Switches]]
 
* [[Sysadmin:Networking:Rack notes|Rack notes]]
 
* [[Sysadmin:Networking:Public|Public Network]]
 
* [[Sysadmin:Networking:NetworkTopo|Old Network Topo Figures]]
 
* [[Sysadmin:Networking:NetworkDiagram|Network layout (May 2007)]]
 
* [[Sysadmin:Networking:Alternate Network Path|Alt Network path]]
 
* [[Sysadmin:UPS Setup]]
 
 
 
<!-- This has to stay as part of the formatting -->
 
</div>
 
|}
 
 
 
== Current Projects ==
 
=== Last updated 16 July ===
 
* 159.28.23.26 is a ghost machine - it responds to ping and purportedly exists, but we are unsure where it is or what it is. - ping -f
 
* Power map additions and updates
 
* nbgrader - setup continues
 
* indiana config
 
* shinken - didn't boot, and the whole two sets of processes bit
 
* [[File:Manual TeraStation5010.pdf]]
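For the ghost machine at 159.28.23.26 listed above, one low-risk first step is a reverse DNS (PTR) lookup to see whether any name is registered for the address. The following is a minimal sketch, not an existing sysadmin script: the function name <code>reverse_name</code> and its <code>resolver</code> parameter are hypothetical, and only the standard-library call <code>socket.gethostbyaddr</code> is assumed.

```python
import socket


def reverse_name(address, resolver=socket.gethostbyaddr):
    """Return the PTR hostname registered for address, or None if none is found.

    resolver is injectable so the function can be exercised without a network;
    by default it is the standard-library socket.gethostbyaddr.
    """
    try:
        hostname, _aliases, _addresses = resolver(address)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname
```

Calling <code>reverse_name("159.28.23.26")</code> from a machine on the cluster subnet would show whether DNS still has a name for the address. A result of <code>None</code> only means no PTR record was found, not that the machine is gone.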
 
 
 
=== Elderly, to be sorted through ===
 
* Adam and Ahsan will be there for Noyes room tour on Friday, April 20 at 08:00
 
* Setup and install new machines in Lovelace - Eli
 
** New machine has Debian installed, but still needs an Ethernet driver
 
* 10 Gbps for dali and kahlo, hopper, etc (nfs mounts in /etc/fstab) - Adam
 
** real close, reboot test 10a 8 April
 
* Chau & Laurence will be brought up to speed for Layout - Adam
 
** "Layout needs to be brought out of surgery" - Charlie
 
** Layout - head node swap, check disk space on /scratch - Adam
 
* APC is on Shinken now - Alek
 
* Postgres connection monitoring on Shinken - Vitalii
 
* Figure out the cs web production and test solution - Chau
 
* Upgrade t-voc - ?
 
 
 
* -------------------------
 
* Backup - max disk capacity, scripts on all machines backing-up at least /etc; babbage - Chau
 
* Ganglia on Hopper - Ahsan
 
* <s> UPS load monitoring </s> - Eli
 
** One UPS is being monitored, more can be added and stuff can still be cleared up
 
* Temperature and humidity monitoring in the machine room (new item)
 
* Fixing the user create script
 
* Password auditing script
 
 
 
* -------------------------
 
* FIFO for requests rather than ad-hoc
 
* Accounting for hours logged
 
* PBS Shinken monitoring
 
* Power layout - legend, color-code servers by type, how are servers with 2x power supplies plumbed?
 
 
 
* -------------------------
 
* Shinken - Vitalii and Aleks (documentation, monitoring webmo and pm8)
 
* Hadoop on Whedon - Vitalii and Adam (stuck on ?)
 
* Layout - Adam (stuck on ldap)
 
* Gaussian & WebMO on Whedon - Ahsan and Eli (stuck on firewall)
 
* Backup - Ch'''â'''u (moving along, setup backup.cs.e.e next)
 
** Needs to set up a machine with CentOS in Noyes basement, eventually run it over to Lilly - Laurence
 
* <s> Installing power monitor, etc. and rack cleanup - TO BE ASSIGNED (eli and charlie) </s>
 
** <s> switch switch (charlie to check inventory) </s>
 
* Mothur - Ahsan
 
* <s> password policy, force change and random initial </s>
 
** <s> for now notify people with default and then change after a couple of days; script will generate random string </s>
 
* Talk about at next meeting:
 
** Spring break and summer people (important)
 
** Jon's user and Postgres database
 
** investigate tools /clients/ directory with what looks like duplicate user directories
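The password policy item above mentions a script that generates a random string for initial passwords. A minimal sketch of that idea follows, assuming Python and the standard-library <code>secrets</code> module. The function name, the alphabet, and the default length are illustrative choices and do not describe the script that was actually written.

```python
import secrets
import string

# Letters and digits only, so the password is easy to read aloud or type.
# This alphabet is an assumption for illustration.
ALPHABET = string.ascii_letters + string.digits


def random_initial_password(length=16):
    """Return a random initial password drawn from ALPHABET.

    secrets.choice uses the operating system's cryptographic random source,
    unlike the random module, which is not suitable for passwords.
    """
    if length < 8:
        raise ValueError("length must be at least 8")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(random_initial_password())
```

The output would still need to be paired with a forced change at first login, as the item above describes.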
 
 
 
=== (list from 2017-10-26) ===
 
* <s>Finish migrating tools and home to smiley</s> migrate web and net back to control
 
** Record consistent & thorough documentation, especially concerning the startup and shutdown of the VMs
 
* Set up graceful shutdown when we detect that we are running solely on UPS power

** Additionally, set up clean shutdown and startup for VMs on smiley and control (?)
 
* <s> Fix reverse lookup error for mail.cs.earlham.edu
 
** Should consistently refer to 159.28.22.2 (web.cs.earlham.edu)
 
** It's possible that this isn't actually broken. </s>
 
* Layout infiniband subnet manager
 
* Layout disk swap, new lo0
 
** <s>Redo /scratch for mglerner group on /media/r10_vol</s> ?
 
* Migrate Elwood, BigFe, t-voc to repurposed Lovelace Machines (Eli)
 
* <s>HP Al-Salam switch enable jumboframes</s> ?
 
* Strike unused lovelace machine addresses from CS DNS file
 
** Perhaps there's a python file in root's home somewhere that checks for unused DNS/DHCP addresses?
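If the Python file mentioned in the last item cannot be found, the check itself is small. The sketch below assumes the two inputs have already been gathered by other means: a mapping of names to addresses taken from the CS DNS file, and a list of addresses known to be in use (for example from DHCP leases or a ping sweep). The function name and the sample host names and addresses in the usage note are hypothetical.

```python
import ipaddress


def stale_dns_entries(dns_entries, active_addresses):
    """Return (name, address) pairs from DNS whose address is not in use.

    dns_entries maps host names to address strings; active_addresses is any
    iterable of address strings. How each is collected is outside this sketch.
    """
    active = {ipaddress.ip_address(a) for a in active_addresses}
    stale = []
    for name, address in sorted(dns_entries.items()):
        if ipaddress.ip_address(address) not in active:
            stale.append((name, address))
    return stale
```

For example, with made-up entries <code>{"lab1": "159.28.22.101", "lab2": "159.28.22.102"}</code> and only 159.28.22.101 active, the function reports lab2 as a candidate to strike. Entries should still be confirmed by hand before they are removed from the DNS file.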
 
 
 
== Ongoing Projects (Spring 2017) ==
 
=== TODO ===
 
* EMAILING ALL THE USERS https://wiki.cs.earlham.edu/index.php/Sysadmin:Old:Contacting_All_Users
 
* SHUTDOWN SCHEDULED FOR SUNDAY (APRIL 16)
 
** Check/update instructions - one version is at https://wiki.cs.earlham.edu/index.php/Sysadmin:ImportantInfo:PowerFailure, there are others too
 
** Notify users
 
* Fix certs for gitlab, etc.
 
* Secure 1-2 admins for the summer
 
* Prep layout for May-June usage
 
* Practice shutdown-startup procedure (with Michael)
 
* Nsswitch consistency across all machines
 
* Document tools: startup / shutdown - Charlie
 
* Use Sysadmin namespace for all our pages - All
 
** Testing usefulness of documentation - Dave
 
* Al Salam: configure switch, re-rack. - Vitalii
 
** HP switch should be reset and tested.
 
* LDAP cleanup of system users / old groups - James
 
* Layout - Nirdesh
 
** Lo0 RAID (mdadm)
 
** 10Gb from Dali to lo0 (adding rules on compute node routing tables as a possible fix)
 
** BIOS reset
 
* 10Gb, perfsonar, ...
 
* Monitoring: (Ganglia, Shinken)
 
** Getting consistency among all the machines (check_nrpe regularly stops working).
 
* Whedon: configured and available
 
* Change passwords (on everything). Postgres, Shinken, ...
 
* Webcam on office whiteboard (new office location?)
 
* Learn virtual machine architecture and modules - Dave
 
** Document in a format for future admin training?
 
** Find existing introduction material
 
* Mirror ''control'' for testing, swapping, etc.
 
 
 
=== DONE (19 Jan 2017) ===
 
* Examine extra "layout" node. - Adam
 
** Differences are: Single PSU, Single GPGPU, No VGA.
 
** It has InfiniBand and 10Gb cards installed.
 
* Networking - Adam, Charlie
 
** IP over Infiniband working on layout
 
*** Resolved by resetting IB switch configuration: <code>ibwarn: [3349] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)</code>
 
 
 
=== FUTURE ===
 
* Centralized password database / manager / location
 
  
== Current Projects (updated 13 Oct 16) ==  
+
===Group and institution information===
* '''Groups and LDAP and sudo - James'''
+
* [[Sysadmin:CS-ITS Interoperability|Working with ITS]]
* <s>Amber - James</s>
+
* [[Sysadmin:Recurring spending | Recurring spending ]]
* <s>Edward's setup - Vitalii</s>
+
* [[Sysadmin:SlackAndGitLab | Slack and GitLab integration]]
* <s>WebDev access - Nirdesh</s>
 
* Puppet - James and Vitalii
 
* '''Bacula - Nirdesh'''
 
* SSL certificate upgrade and documentation - Kristin
 
* <s>Listserv merging with archives preserved - Nirdesh </s>
 
* '''Ganglia - Bret'''
 
* '''Shinken - Vitalii'''
 
** latency, UPS
 
* New Layout node - ? and ?
 
* Provision Sappho (compute) - after Puppet
 
* Provision Kahlo (storage) -
 
** replace broken drive
 
* I2 setup
 
** DTN, storage nodes, head nodes, ports in CST
 
* [[Sysadmin:WhedonProvisioning|Provision Whedon]] (compute) - after Puppet
 
* '''Shutdown and startup test - scheduled for Sunday 27 November'''
 
* Disk cleaning - Charlie
 
* <s>Password changing in the CS and cluster domains - Vitalii and James</s>
 
* Proto setup and maintenance with HIP/Green Science
 

Latest revision as of 08:32, 20 March 2024

This is the hub for the CS sysadmins on the wiki.

Overview

If you're visually inclined, we have a colorful and easy-to-edit map of our servers here!

Server room

Our servers are in Noyes, the science building that predates the CST. For general information about the server room and how to use it, check out this page.

Columns: machine name, IPs, type (virtual, metal), purpose, dies, cores, RAM

Compute Resources

Machines and VMs related information here!

Network

We have two network fabrics linking the machines together. There are three subdomains.

10 Gb

We have a 10Gb fabric for mounting files over NFS. Machines with 10Gb support have an IP address in the 10.10.10.0/24 range, and we want to add DNS entries for these addresses.
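As a rough illustration of what DNS entries for the 10Gb addresses could look like, the sketch below prints zone-file style A records. The host numbers, the <code>-10g</code> name suffix, and the function name are all assumptions made for the example. They are not the real assignments, which would have to come from the machines themselves.

```python
import ipaddress

# The 10Gb fabric range named in the text.
TEN_GB_NET = ipaddress.ip_network("10.10.10.0/24")

# Hypothetical host numbers: placeholders only, not the real addresses.
ASSIGNMENTS = {"dali": 1, "kahlo": 2, "hopper": 3}


def a_records(assignments, network, suffix="-10g"):
    """Return zone-file style A record lines, one per host, sorted by name."""
    base = int(network.network_address)
    lines = []
    for name, host_number in sorted(assignments.items()):
        address = ipaddress.ip_address(base + host_number)
        if address not in network:
            raise ValueError(f"{name}: {address} is outside {network}")
        lines.append(f"{name}{suffix}\tIN\tA\t{address}")
    return lines


if __name__ == "__main__":
    print("\n".join(a_records(ASSIGNMENTS, TEN_GB_NET)))
```

Using a separate suffix keeps the 10Gb names distinct from the existing 1Gb names for the same machines. Whether to do that or to use a separate subdomain is a design choice left open here.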

1 Gb (cluster, cs)

We have two /24 subnets on the 1Gb fabric: 159.28.22.0/24 (CS) and 159.28.23.0/24 (cluster). This means we have twice as many IP addresses on the 1Gb fabric as on the 10Gb fabric.

Any user accessing *.cluster.earlham.edu or *.cs.earlham.edu is using the 1Gb network.
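The address arithmetic for the two fabrics can be checked with Python's standard <code>ipaddress</code> module. This is a small sketch using only the three subnets named above. The dictionary layout and function names are illustrative.

```python
import ipaddress

FABRICS = {
    "10Gb": [ipaddress.ip_network("10.10.10.0/24")],
    "1Gb": [
        ipaddress.ip_network("159.28.22.0/24"),  # CS
        ipaddress.ip_network("159.28.23.0/24"),  # cluster
    ],
}


def total_addresses(networks):
    """Count every address in the given networks.

    num_addresses includes the network and broadcast addresses, so a /24
    counts as 256 even though fewer are usable for hosts.
    """
    return sum(net.num_addresses for net in networks)


def fabric_of(address):
    """Return the fabric name whose subnets contain address, or None."""
    ip = ipaddress.ip_address(address)
    for fabric, networks in FABRICS.items():
        if any(ip in net for net in networks):
            return fabric
    return None


if __name__ == "__main__":
    for fabric, networks in FABRICS.items():
        print(fabric, total_addresses(networks))
```

<code>fabric_of</code> is handy when a bare address turns up in a log: anything in 159.28.22.0/24 or 159.28.23.0/24 maps to the 1Gb fabric, and anything in 10.10.10.0/24 maps to the 10Gb fabric.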

Intra-cluster fabrics

The layout cluster has an InfiniBand infrastructure. Wachowski has only a 1Gb infrastructure.

Power

We have a backup power supply, with batteries last upgraded in 2019 (?). We’ve had a few outages since then and power has held up well.

HVAC

HVAC systems are static and are largely managed by Facilities.

See full topology diagrams here.

A word about what's happening between files and the drives they live on.

New sysadmins

These pages will be helpful for you if you're just starting in the group:

Note: you'll need to log in with wiki credentials to see most Sysadmin pages.

Additional information

These pages contain a lot of the most important information about our systems and how we operate.

Handy Tools

Technical docs

Common tasks

Group and institution information