Difference between revisions of "Sysadmin"

From Earlham CS Department
m (Systems Administration Documentation)
(Current Projects)
Line 312: Line 312:
 
* [[Web logins]]
* [[Password management]]
* [[Authentication with keys]]
* [[Docker and WebODM on Bronte]]
* Check on qsub - add to pollock and bronte
* WebMO, nginx on pollock
* [[Fix shinken server access]]
* [[Verify Lovelace DNS]] - check what machines we have vs. what's in the DNS file
* [[Software audit]]
* [[Layout Layout]]
* Backup in Lilly basement
Line 324: Line 320:
 
* Accounting for hours logged
* key-based access to whedon, pollock, bronte (not just passwords)
* Upgrades
* Power map additions and updates
* [[Ghost machine]]
* [[Backup on all machines]] - includes backup.cs.e.e (indiana?)
* [[Automated user emails]]
* [[Fix Lovelace machines]]
* [[Upgrade CentOS on Pollock and Bronte]]

Post-shutdown, here are things that need fixed, updated, etc.:
* Tools: Jupyter starts but with the wrong config file
+
* Al-salam: PDU was powered off when we got to the basement - nodes 1-4 and 9-12 are connected to the PDU so they were powered down. Unsure when 9-12 were connected to the PDU; 1-4 were the only al-salam nodes connected this summer.
* Al-salam: PDU was powered off when we got to the basement
 
** nodes 1-4 and 9-12 are connected to the PDU so they were powered down. Unsure when 9-12 were connected to the PDU; 1-4 were the only al-salam nodes connected this summer.
 
* Compute nodes
 
** AS compute nodes down: 1-4, 9-12, off
 
 
* Sudo for whedon only pwd required
* Ghost IP ping requests start timing out intermittently. On as0?
* Hard to force shutdown on hopper
* Babbage slow to shut down, had to reboot (even in the shell, shutdown -h now only rebooted it)

Revision as of 14:17, 12 December 2018


Machines and Brief Descriptions of Services

CS Machines

Server layout as of May 2017
NET (vm1)
  • LDAP Server
  • DNS
  • DHCP
  • Backup to Dali: etc, var
WEB (vm2)
  • Mailman
  • Mail Stack
  • Apache2
  • PostgreSQL
  • MySQL
  • Wiki
  • Backup to Dali: etc, var
TOOLS (vm3)
  • SageNB Server
  • Jupyterhub Server
  • Software Modules
  • NginX
  • SSH
  • Users
  • Backup to Dali: etc, var, mnts, sage
BABBAGE
  • Firewall
PROTO
  • Weather Monitoring
  • GPS/NTP
  • Energy Monitoring
  • Backup to Dali: etc, var
CONTROL
  • Users
  • SSH
  • HOME
  • TOOLS
  • Backup to Dali: etc, var
SMILEY
  • XenDocs
  • NET
  • WEB
  • NFS
  • Backup to Dali: etc, var
SHINKEN
  • Users
  • SSH
  • Add machines
MURPHY
  • Elderly email stack
  • Users
  • SSH

Cluster Machines

HOPPER
  • Users
  • SSH
  • NFS server
  • LDAP server
  • Software Modules
  • PostgreSQL
  • Wiki
  • Apache2
  • DNS
  • DHCP
  • Backup to Indiana: etc, var, cluster
INDIANA
  • New Storage Server
DALI
  • Storage Server
  • Gitlab
  • Backups
  • NginX
  • Backup to Indiana (/media/r10_vol/backups/): etc, var/opt/gitlab/backups
AL-SALAM
  • WebMO
  • Software Modules
  • Apache2
  • Backup to Indiana: etc, var
WHEDON
  • Software Modules
  • Backup to Indiana: etc, var
LAYOUT
  • Jupyterhub Server
  • Software Modules
  • NginX
  • Apache2
  • WebMO
  • Backup to Indiana: etc, var
BRONTE
  • Software Modules
  • Backup to Indiana: etc, var, nbserver
POLLOCK
  • Software Modules
  • WebMO
  • NginX
  • Backup to Indiana: etc, var
KAHLO
  • Storage Server
  • Backups
  • NginX
  • Backup to Indiana: etc, var
BIGFE
  • Software Modules
  • Hosts BCCD-related repositories and distributions.
T-VOC
  • Software Modules
ELWOOD
  • Software Modules
  • Used by BCCD to host www.bccd.net and www.littlefe.net. Will be deprecated when the BCCD project moves its sites to cloud-based hosting platforms.
KRASNER
  • Docker platform on an old Lovelace machine upgraded to 16 GB of RAM.

Switches

SG538SF02J
  • Model: HP Procurve 3400cl
  • Ports: 24
  • Backplane bandwidth:
    • 88 Gbps
    • 64 million pps
  • Memory:
    • 2 MB packet buffer
    • 16 MB dual flash
    • 128 MB SDRAM
  • Cut-through switching: No
  • Unused as of May 12, 2017
CN63FP762S
  • Model: HP 2530-24G
  • Ports: 24
  • Switching Capacity:
    • 56 Gbps
    • 41.6 million pps
  • Memory:
    • 1.5 MB packet buffer
    • 256 MB flash
    • 128 MB DDR3 DIMM
  • Cut-through switching: No
  • Connected to Al-Salam as of May 12, 2017
SG525SG025
  • Model: HP Procurve 3400cl
  • Ports: 24
  • Backplane bandwidth:
    • 88 Gbps
    • 64 million pps
  • Memory:
    • 2 MB packet buffer
    • 16 MB dual flash
    • 128 MB SDRAM
  • Cut-through switching: No
  • Connected to layout and whedon as of May 12, 2017
Netgear JGS524
  • Current cluster head-node
  • Unmanaged (no console/configuration)
  • Ports: 24
  • Switching bandwidth:
    • 48 Gbps
    • 1.5 million pps
  • Memory:
    • 2 MB packet buffer
  • Cut-through switching: No
  • Connected to Al-Salam, Hopper, Pollock, Nagios, Dali, Kahlo, Bronte as of May 12, 2017
cs-main
  • Model: HP 5920AF-24XG
  • Ports: 24
  • Backplane bandwidth:
    • 480 Gbps
    • 367 million pps
  • Memory:
    • 3.6 GB packet buffer
    • 256 MB dual flash
    • 2 GB SDRAM
  • Cut-through switching: Yes
  • IP Address: 159.28.31.66
  • Connected to layout, kahlo, and dali as of May 12, 2017
5500denniscs-sw1
  • Model: HP 5500 JG542A
  • Ports: 24
  • Backplane bandwidth:
    • 224 Gbps
    • 166.6 million pps
  • Memory:
    • 6 MB packet buffer
    • 512 MB dual flash
    • 1 GB SDRAM
  • Cut-through switching: No
  • IP Address: 159.28.31.67
  • Connected to Babbage, Control, Nagios, and the cluster's netgear switch (via port 14) as of May 12, 2017

Systems Administration Documentation

For old documentation, see: Old Wiki Information

Current Projects

This is the list we will work from in addition to service requests.

Some important procedural pages:

Please update specific projects at their own page.

Post-shutdown, here are things that need to be fixed, updated, etc.:

  • Al-salam: PDU was powered off when we got to the basement - nodes 1-4 and 9-12 are connected to the PDU so they were powered down. Unsure when 9-12 were connected to the PDU; 1-4 were the only al-salam nodes connected this summer.
  • Sudo for whedon only pwd required
  • Hard to force shutdown on hopper
  • Babbage slow to shut down, had to reboot (even in the shell, shutdown -h now only rebooted it)
  • Mounting FS in smiley, had to run: mount --source=/dev/vmdata/eccs-home-disk/ --target=/smiley-eccs-home-disk
  • Pollock needed manual ifup
  • ganglia monitoring comes back up on some nodes (definitely on head nodes) but needs to be started on compute nodes
  • Are sysadmin accounts backing up to anywhere?
  • How much power are we drawing at max from everything? (PDU, burnout, etc.)
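
The ganglia item in the list above (monitoring has to be started by hand on compute nodes) could be handled with a small loop. This is only a sketch under several assumptions that this page does not confirm: the al-salam compute nodes are reachable over ssh as as1 through as12 (hypothetical hostnames), the ganglia daemon is gmond and is managed with the service command, and the account used is allowed to start services. By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Expand ranges such as "1-4 9-12" into hostnames, one per line.
# The hostname prefix is passed in; "as" below is an assumed naming scheme.
expand_nodes() {
    prefix=$1
    shift
    for range in "$@"; do
        start=${range%-*}   # text before the dash (or the whole value)
        end=${range#*-}     # text after the dash (or the whole value)
        i=$start
        while [ "$i" -le "$end" ]; do
            echo "${prefix}${i}"
            i=$((i + 1))
        done
    done
}

# Start gmond on every node in the given ranges.
# DRY_RUN=1 (the default) prints each command instead of running it.
start_ganglia() {
    for node in $(expand_nodes as "$@"); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh $node service gmond start"
        else
            # Assumes a CentOS-style "service" wrapper on the node.
            ssh "$node" service gmond start
        fi
    done
}
```

For the nodes named in the PDU item, `start_ganglia 1-4 9-12` would list the commands, and `DRY_RUN=0 start_ganglia 1-4 9-12` would run them. Keeping the dry run as the default is a deliberate choice so the node list can be checked before anything is touched.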