Difference between revisions of "Sysadmin"
(→Current Projects (updated 15 Jan 2017)) |
(→Current Projects (updated 2017-10-26)) |
||
(45 intermediate revisions by 5 users not shown) | |||
Line 2: | Line 2: | ||
= Machines and Brief Descriptions of Services = | = Machines and Brief Descriptions of Services = | ||
+ | == CS Machines == | ||
+ | [[File:Server_layout_summer2017.jpg|thumb|200px|right|Server layout as of May 2017]] | ||
{| style="float:left; margin-right:2px;" | {| style="float:left; margin-right:2px;" | ||
| style="height:40px; width:150px; text-align:center; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-top:solid 5px #ADDFFF; border-bottom:solid 1px white; border-right:solid 5px #ADDFFF; font-size:120%;" | HOME <br> (vm0) | | style="height:40px; width:150px; text-align:center; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-top:solid 5px #ADDFFF; border-bottom:solid 1px white; border-right:solid 5px #ADDFFF; font-size:120%;" | HOME <br> (vm0) | ||
Line 37: | Line 39: | ||
| style="height:135px; width:150px; background-color: #EEDC82; border-left:solid 5px #EEDC82; border-bottom:solid 5px #EEDC82; border-right:solid 5px #EEDC82;" | Weather Monitoring <br> GPS/NTP <br> Energy Monitoring | | style="height:135px; width:150px; background-color: #EEDC82; border-left:solid 5px #EEDC82; border-bottom:solid 5px #EEDC82; border-right:solid 5px #EEDC82;" | Weather Monitoring <br> GPS/NTP <br> Energy Monitoring | ||
|} | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:40px; width:150px; text-align:center; background-color:#FF7E6D; border-left:solid 5px #FF7E6D; border-top:solid 5px #FF7E6D; border-bottom:solid 1px white; border-right:solid 5px #FF7E6D; font-size:120%;" | CONTROL | ||
+ | |- | ||
+ | | style="height:135px; width:150px; background-color:#FF7E6D; border-left:solid 5px #FF7E6D; border-bottom:solid 5px #FF7E6D; border-right:solid 5px #FF7E6D;" | Users <br> SSH <br> HOME <br> TOOLS | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:40px; width:150px; text-align:center; background-color:#54C571; border-left:solid 5px #54C571; border-top:solid 5px #54C571; border-bottom:solid 1px white; border-right:solid 5px #54C571; font-size:120%;" | SMILEY | ||
+ | |- | ||
+ | | style="height:135px; width:150px; background-color:#54C571; border-left:solid 5px #54C571; border-bottom:solid 5px #54C571; border-right:solid 5px #54C571;" | [[Sysadmin:XenDocs]] <br> NET <br> WEB | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:40px; width:150px; text-align:center; background-color:#E77471; border-left:solid 5px #E77471; border-top:solid 5px #E77471; border-bottom:solid 1px white; border-right:solid 5px #E77471; font-size:120%;" | SHINKEN | ||
+ | |- | ||
+ | | style="height:135px; width:150px; background-color:#E77471; border-left:solid 5px #E77471; border-bottom:solid 5px #E77471; border-right:solid 5px #E77471;" | Users <br> SSH <br> Add machines | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:40px; width:150px; text-align:center; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-top:solid 5px #C38EC7; border-bottom:solid 1px white; border-right:solid 5px #C38EC7; font-size:120%;" |MURPHY | ||
+ | |- | ||
+ | | style="height:135px; width:150px; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-bottom:solid 5px #C38EC7; border-right:solid 5px #C38EC7;" | Elderly email stack <br> Users <br> SSH | ||
+ | |} | ||
+ | |||
+ | <br> <br> <br> <br> <br> <br><br> <br> <br> <br> <br> <br> | ||
+ | |||
+ | == Cluster Machines == | ||
{| style="float:left; margin-right:2px;" | {| style="float:left; margin-right:2px;" | ||
| style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | HOPPER | | style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | HOPPER | ||
|- | |- | ||
− | | style="height:200px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | Users <br> SSH <br> NFS <br> [[Sysadmin:Software Modules | Software Modules]] <br> | + | | style="height:200px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | Users <br> SSH <br> NFS server <br> LDAP server <br> [[Sysadmin:Software Modules | Software Modules]] <br> PostgreSQL <br> Wiki <br> Apache2 <br> [[Sysadmin:DNS & DHCP | DNS]] <br> [[Sysadmin:DNS & DHCP | DHCP]] |
|} | |} | ||
Line 47: | Line 77: | ||
| style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | DALI | | style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | DALI | ||
|- | |- | ||
− | | style="height:200px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | [[Sysadmin:Gitlab | Gitlab]] <br> Backups <br> NginX | + | | style="height:200px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | Storage Server <br>[[Sysadmin:Gitlab | Gitlab]] <br> Backups <br> NginX |
|} | |} | ||
Line 74: | Line 104: | ||
|} | |} | ||
− | <br><br><br><br><br><br><br><br><br><br><br><br><br> | + | {| style="float:left; margin-right:2px;" |
+ | | style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | KAHLO | ||
+ | |- | ||
+ | | style="height:200px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | Storage Server <br>Backups <br> NginX | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | BIGFE | ||
+ | |- | ||
+ | | style="height:200px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | [[Sysadmin:Software Modules | Software Modules]] | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | T-VOC | ||
+ | |- | ||
+ | | style="height:200px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | [[Sysadmin:Software Modules | Software Modules]] | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | ELWOOD | ||
+ | |- | ||
+ | | style="height:200px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | [[Sysadmin:Software Modules | Software Modules]] | ||
+ | |} | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> | ||
+ | |||
+ | == Switches == | ||
+ | |||
+ | |||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:175px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | SG538SF02J | ||
+ | |- | ||
+ | | style="height:200px; width:175px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc; font-size:80%;" | | ||
+ | *Model: HP Procurve 3400cl | ||
+ | *Ports: 24 | ||
+ | *Backplane bandwidth: | ||
+ | **88 Gbps | ||
+ | **64 million pps | ||
+ | *Memory: | ||
+ | **2MB packet buffer | ||
+ | **16 MB dual flash | ||
+ | **128 MB SDRAM | ||
+ | *Cut-through switching: No | ||
+ | *Unused as of May 12, 2017 | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:175px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | CN63FP762S | ||
+ | |- | ||
+ | | style="height:200px; width:175px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;font-size:80%;" | | ||
+ | *Model: HP 2530-24G | ||
+ | *Ports: 24 | ||
+ | *Switching Capacity: | ||
+ | **56 Gbps | ||
+ | **41.6 million pps | ||
+ | *Memory: | ||
+ | **1.5 MB packet buffer | ||
+ | **256 MB flash | ||
+ | **128 MB DDR3 DIMM | ||
+ | *Cut-through switching: No | ||
+ | *Connected to Al-Salam as of May 12, 2017 | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:175px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | SG525SG025 | ||
+ | |- | ||
+ | | style="height:200px; width:175px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;font-size:80%;" | | ||
+ | *Model: HP Procurve 3400cl | ||
+ | *Ports: 24 | ||
+ | *Backplane bandwidth: | ||
+ | **88 Gbps | ||
+ | **64 million pps | ||
+ | *Memory: | ||
+ | **2MB packet buffer | ||
+ | **16 MB dual flash | ||
+ | **128 MB SDRAM | ||
+ | *Cut-through switching: No | ||
+ | *Connected to layout and whedon as of May 12, 2017 | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:175px; text-align:center; background-color:#39ad39; border-left:solid 5px #39ad39; border-top:solid 5px #39ad39; border-bottom:solid 1px white; border-right:solid 5px #39ad39; font-size:120%;" | Netgear JGS524 | ||
+ | |- | ||
+ | | style="height:200px; width:175px; background-color:#39ad39; border-left:solid 5px #39ad39; border-bottom:solid 5px #39ad39; border-right:solid 5px #39ad39;font-size:80%;" | | ||
+ | *Current cluster head-node | ||
+ | *Unmanaged (no console/configuration) | ||
+ | *Ports: 24 | ||
+ | *Switching bandwidth: | ||
+ | **48 Gbps | ||
+ | **1.5 million pps | ||
+ | *Memory: | ||
+ | **2MB packet buffer | ||
+ | *Cut-through switching: No | ||
+ | *Connected to Al-Salam, Hopper, Pollock, Nagios, Dali, Kahlo, Bronte as of May 12, 2017 | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:175px; text-align:center; background-color:#E77471; border-left:solid 5px #E77471; border-top:solid 5px #E77471; border-bottom:solid 1px white; border-right:solid 5px #E77471; font-size:120%;" | cs-main | ||
+ | |- | ||
+ | | style="height:200px; width:175px; background-color:#E77471; border-left:solid 5px #E77471; border-bottom:solid 5px #E77471; border-right:solid 5px #E77471;font-size:80%;" | | ||
+ | *Model: HP 5920AF-24XG | ||
+ | *Ports: 24 | ||
+ | *Backplane bandwidth: | ||
+ | **480 Gbps | ||
+ | **367 million pps | ||
+ | *Memory: | ||
+ | **3.6 GB packet buffer | ||
+ | **256 MB dual flash | ||
+ | **2 GB SDRAM | ||
+ | *Cut-through switching: Yes | ||
+ | *IP Address: 159.28.31.66 | ||
+ | *Connected to layout, kahlo, and dali as of May 12, 2017 | ||
+ | |} | ||
+ | |||
+ | {| style="float:left; margin-right:2px;" | ||
+ | | style="height:55px; width:175px; text-align:center; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-top:solid 5px #ADDFFF; border-bottom:solid 1px white; border-right:solid 5px #ADDFFF; font-size:120%;" | 5500denniscs-sw1 | ||
+ | |- | ||
+ | | style="height:200px; width:175px; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-bottom:solid 5px #ADDFFF; border-right:solid 5px #ADDFFF;font-size:80%;" | | ||
+ | *Model: HP 5500 JG542A | ||
+ | *Ports: 24 | ||
+ | *Backplane bandwidth: | ||
+ | **224 Gbps | ||
+ | **166.6 million pps | ||
+ | *Memory: | ||
+ | **6 MB packet buffer | ||
+ | **512 MB dual flash | ||
+ | **1 GB SDRAM | ||
+ | *Cut-through switching: No | ||
+ | *IP Address: 159.28.31.67 | ||
+ | *Connected to Babbage, Control, Nagios, and the cluster's netgear switch (via port 14) as of May 12, 2017 | ||
+ | |} | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> | ||
= Systems Administration Documentation = | = Systems Administration Documentation = | ||
Line 162: | Line 344: | ||
|} | |} | ||
− | == Current Projects (updated | + | == Current Projects (updated 2017-10-26) == |
=== TODO === | === TODO === | ||
+ | * <s>Finish migrating tools and home to smiley</s> migrate web and net back to control | ||
+ | ** Record consistent & thorough documentation, especially concerning the startup and shutdown of the VMs | ||
+ | * Setup graceful shutdown when we detect to be running solely off UPS | ||
+ | ** Additionally, setup clean shutdown and startup for VMs on <s>smiley</s> control (?) | ||
+ | * Fix reverse lookup error for mail.cs.earlham.edu | ||
+ | ** Should consistently refer to 159.28.22.2 (web.cs.earlham.edu) | ||
+ | ** It's possible that this isn't actually broken. | ||
* Layout infiniband subnet manager | * Layout infiniband subnet manager | ||
* Layout disk swap, new lo0 | * Layout disk swap, new lo0 | ||
− | * HP Al-Salam switch enable jumboframes | + | ** <s>Redo /scratch for mglerner group on /media/r10_vol</s> ? |
+ | * Migrate Elwood, BigFe, t-voc to repurposed Lovelace Machines (Eli) | ||
+ | * <s>HP Al-Salam switch enable jumboframes</s> ? | ||
+ | * Install Haskell & associated tools on Lovelace machines | ||
+ | ** Also document which 6 lovelace computers are currently set-up (e.g. l1, l2, etc) | ||
+ | ** update misc software on those machines (locally!) | ||
+ | * Strike unused lovelace machine addresses from CS DNS file | ||
+ | ** Perhaps there's a python file in root's home somewhere that checks for unused DNS/DHCP addresses? | ||
− | == | + | == Ongoing Projects (Spring 2017) == |
=== TODO === | === TODO === | ||
* EMAILING ALL THE USERS https://wiki.cs.earlham.edu/index.php/Sysadmin:Old:Contacting_All_Users | * EMAILING ALL THE USERS https://wiki.cs.earlham.edu/index.php/Sysadmin:Old:Contacting_All_Users |
Revision as of 12:37, 26 October 2017
Machines and Brief Descriptions of Services
CS Machines
Machine | Services
HOME (vm0) | Users, SSH, NFS
NET (vm1) | LDAP server, DNS, DHCP
WEB (vm2) | Mailman, Mail Stack, Apache2, PostgreSQL, MySQL, Wiki
TOOLS (vm3) | SageNB Server, Jupyterhub Server, Software Modules, NginX
BABBAGE | Firewall
PROTO | Weather Monitoring, GPS/NTP, Energy Monitoring
CONTROL | Users, SSH, HOME, TOOLS
SMILEY | Sysadmin:XenDocs, NET, WEB
SHINKEN | Users, SSH, Add machines
MURPHY | Elderly email stack, Users, SSH
Cluster Machines
Machine | Services
HOPPER | Users, SSH, NFS server, LDAP server, Software Modules, PostgreSQL, Wiki, Apache2, DNS, DHCP
DALI | Storage Server, Gitlab, Backups, NginX
AL-SALAM | WebMO, Software Modules, Apache2
LAYOUT | Jupyterhub Server, Software Modules, NginX, Apache2, WebMO
BRONTE | Software Modules
POLLOCK | Software Modules, WebMO, NginX
KAHLO | Storage Server, Backups, NginX
BIGFE | Software Modules
T-VOC | Software Modules
ELWOOD | Software Modules
Switches
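SG538SF02J
- Model: HP Procurve 3400cl
- Ports: 24
- Backplane bandwidth: 88 Gbps, 64 million pps
- Memory: 2 MB packet buffer, 16 MB dual flash, 128 MB SDRAM
- Cut-through switching: No
- Unused as of May 12, 2017
CN63FP762S
- Model: HP 2530-24G
- Ports: 24
- Switching capacity: 56 Gbps, 41.6 million pps
- Memory: 1.5 MB packet buffer, 256 MB flash, 128 MB DDR3 DIMM
- Cut-through switching: No
- Connected to Al-Salam as of May 12, 2017
SG525SG025
- Model: HP Procurve 3400cl
- Ports: 24
- Backplane bandwidth: 88 Gbps, 64 million pps
- Memory: 2 MB packet buffer, 16 MB dual flash, 128 MB SDRAM
- Cut-through switching: No
- Connected to layout and whedon as of May 12, 2017
Netgear JGS524
- Current cluster head-node
- Unmanaged (no console/configuration)
- Ports: 24
- Switching bandwidth: 48 Gbps, 1.5 million pps
- Memory: 2 MB packet buffer
- Cut-through switching: No
- Connected to Al-Salam, Hopper, Pollock, Nagios, Dali, Kahlo, Bronte as of May 12, 2017
cs-main
- Model: HP 5920AF-24XG
- Ports: 24
- Backplane bandwidth: 480 Gbps, 367 million pps
- Memory: 3.6 GB packet buffer, 256 MB dual flash, 2 GB SDRAM
- Cut-through switching: Yes
- IP Address: 159.28.31.66
- Connected to layout, kahlo, and dali as of May 12, 2017
5500denniscs-sw1
- Model: HP 5500 JG542A
- Ports: 24
- Backplane bandwidth: 224 Gbps, 166.6 million pps
- Memory: 6 MB packet buffer, 512 MB dual flash, 1 GB SDRAM
- Cut-through switching: No
- IP Address: 159.28.31.67
- Connected to Babbage, Control, Nagios, and the cluster's netgear switch (via port 14) as of May 12, 2017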
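The capacity figures listed for these switches can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it assumes full-duplex ports, 1 Gbps per port on the Netgear JGS524 and 10 Gbps per port on cs-main (port speeds are not stated on this page and are inferred from the model names), and minimum-size 64-byte Ethernet frames with 20 bytes of preamble and inter-frame gap. The function names are made up for this example.

```python
# Rough sanity check of switch capacity figures.
# Assumptions (not from this page): full-duplex ports, and the port speeds
# passed in by the caller.

MIN_FRAME_BYTES = 64          # smallest legal Ethernet frame
WIRE_OVERHEAD_BYTES = 20      # preamble + start delimiter (8) + inter-frame gap (12)


def switching_capacity_gbps(ports, port_speed_gbps):
    """Aggregate capacity when every port sends and receives at line rate."""
    return ports * port_speed_gbps * 2


def max_frames_per_second(port_speed_gbps, frame_bytes=MIN_FRAME_BYTES):
    """Highest frame rate one port can carry for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return port_speed_gbps * 1e9 / bits_on_wire


print(switching_capacity_gbps(24, 1))      # 24 gigabit ports
print(switching_capacity_gbps(24, 10))     # 24 ten-gigabit ports
print(round(max_frames_per_second(1)))     # one gigabit port, 64-byte frames
```

Under those assumptions, 24 ports give 48 Gbps at 1 Gbps and 480 Gbps at 10 Gbps, which matches the figures listed for the Netgear JGS524 and cs-main. A single gigabit port tops out near 1.49 million frames per second, so the 1.5 million pps listed for the Netgear looks like a per-port figure rather than a whole-switch total.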
Systems Administration Documentation
For old documentation, see: Old Wiki Information
Services |
Current Projects (updated 2017-10-26)
TODO
- <s>Finish migrating tools and home to smiley</s> migrate web and net back to control
  - Record consistent & thorough documentation, especially concerning the startup and shutdown of the VMs
- Set up graceful shutdown when we detect that we are running solely off UPS
  - Additionally, set up clean shutdown and startup for VMs on <s>smiley</s> control (?)
- Fix reverse lookup error for mail.cs.earlham.edu
  - Should consistently refer to 159.28.22.2 (web.cs.earlham.edu)
  - It's possible that this isn't actually broken.
- Layout infiniband subnet manager
- Layout disk swap, new lo0
  - <s>Redo /scratch for mglerner group on /media/r10_vol</s> ?
- Migrate Elwood, BigFe, t-voc to repurposed Lovelace Machines (Eli)
- <s>HP Al-Salam switch enable jumboframes</s> ?
- Install Haskell & associated tools on Lovelace machines
  - Also document which 6 lovelace computers are currently set up (e.g. l1, l2, etc.)
  - Update misc software on those machines (locally!)
- Strike unused lovelace machine addresses from CS DNS file
  - Perhaps there's a python file in root's home somewhere that checks for unused DNS/DHCP addresses?
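For the reverse lookup item above, a quick way to tell whether anything is actually broken is to compare the forward and reverse answers. The following is a minimal sketch, not an existing script: the function name and the `acceptable_names` parameter are hypothetical, and which PTR name counts as correct for 159.28.22.2 is an assumption the caller has to supply. The resolver functions are injectable so the logic can be exercised without touching the network.

```python
import socket


def check_dns_consistency(hostname, expected_ip, acceptable_names,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return a list of problems; an empty list means forward and reverse agree."""
    problems = []

    # Forward lookup: the name should resolve to the expected address.
    try:
        ip = forward(hostname)
        if ip != expected_ip:
            problems.append(f"{hostname} resolves to {ip}, expected {expected_ip}")
    except OSError as exc:
        problems.append(f"forward lookup of {hostname} failed: {exc}")

    # Reverse lookup: the address should map back to an acceptable name.
    try:
        ptr_name, aliases, _addresses = reverse(expected_ip)
        names = {ptr_name, *aliases}
        if not names & set(acceptable_names):
            problems.append(
                f"{expected_ip} reverse-resolves to {sorted(names)}, "
                f"none of which is in {sorted(acceptable_names)}"
            )
    except OSError as exc:
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")

    return problems
```

Calling `check_dns_consistency("mail.cs.earlham.edu", "159.28.22.2", {"mail.cs.earlham.edu", "web.cs.earlham.edu"})` from several machines and comparing the returned lists would show whether the answers are consistent everywhere or differ by resolver.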
Ongoing Projects (Spring 2017)
TODO
- EMAILING ALL THE USERS https://wiki.cs.earlham.edu/index.php/Sysadmin:Old:Contacting_All_Users
- SHUTDOWN SCHEDULED FOR SUNDAY (APRIL 16)
- Check/update instructions - one version is at https://wiki.cs.earlham.edu/index.php/Sysadmin:ImportantInfo:PowerFailure, there are others too
- Notify users
- Fix certs for gitlab, etc.
- Secure 1-2 admins for the summer
- Prep layout for May-June usage
- Practice shutdown-startup procedure (with Michael)
- Nsswitch consistency across all machines
- Document tools: startup / shutdown - Charlie
- Use Sysadmin namespace for all our pages - All
- Testing usefulness of documentation - Dave
- Al Salam: configure switch, re-rack. - Vitalii
- HP switch should be reset and tested.
- LDAP cleanup of system users / old groups - James
- Layout - Nirdesh
- Lo0 RAID (mdadm)
- 10GB from Dali to lo0 (adding rules on compute node routing tables as a possible fix)
- BIOS reset
- 10Gb, perfsonar, ...
- Monitoring: (Ganglia, Shinken)
- Getting consistency among all the machines (check_nrpe regularly stops working).
- Whedon: configured and available
- Change passwords (on everything). Postgres, Shinken, ...
- Webcam on office whiteboard (new office location?)
- Learn virtual machine architecture and modules - Dave
- Document in a format for future admin training?
- Find existing introduction material
- Mirror control for testing, swapping, etc.
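The "Nsswitch consistency across all machines" item in the list above could be checked mechanically once each machine's `/etc/nsswitch.conf` has been collected. This is a hedged sketch only: how the files are gathered (for example by copying them over SSH) is left out, and the function names and the host names in the usage note are illustrative, not part of any existing tooling.

```python
def parse_nsswitch(text):
    """Map each database (passwd, group, hosts, ...) to its ordered source list."""
    entries = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding space
        if not line or ":" not in line:
            continue
        database, sources = line.split(":", 1)
        entries[database.strip()] = sources.split()
    return entries


def nsswitch_differences(configs):
    """Compare nsswitch.conf contents across hosts.

    configs maps a host name to the text of its nsswitch.conf.
    Returns {database: {host: sources}} for every database whose source list
    is not identical on all hosts; a host lacking the entry maps to None.
    """
    parsed = {host: parse_nsswitch(text) for host, text in configs.items()}
    databases = sorted({db for entries in parsed.values() for db in entries})
    differences = {}
    for db in databases:
        per_host = {host: entries.get(db) for host, entries in parsed.items()}
        distinct = {None if v is None else tuple(v) for v in per_host.values()}
        if len(distinct) > 1:
            differences[db] = per_host
    return differences
```

Passing in something like `{"hopper": hopper_text, "layout": layout_text}` returns only the databases that disagree, which keeps the report short when most machines already match.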
DONE (19 Jan 2017)
- Examine extra "layout" node. - Adam
- Differences are: Single PSU, Single GPGPU, No VGA.
- It has Infiniband and 10GB cards installed.
- Networking - Adam, Charlie
- IP over Infiniband working on layout
  - Resolved by resetting IB switch configuration:
    ibwarn: [3349] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)
FUTURE
- Centralized password database / manager / location
Current Projects (updated 13 Oct 16)
- Groups and LDAP and sudo - James
- <s>Amber - James</s>
- <s>Edward's setup - Vitalii</s>
- <s>WebDev access - Nirdesh</s>
- Puppet - James and Vitalii
- Bacula - Nirdesh
- SSL certificate upgrade and documentation - Kristin
- <s>Listserv merging with archives preserved - Nirdesh</s>
- Ganglia - Bret
- Shinken - Vitalii
- latency, UPS
- New Layout node - ? and ?
- Provision Sappho (compute) - after Puppet
- Provision Kahlo (storage) -
- replace broken drive
- I2 setup
- DTN, storage nodes, head nodes, ports in CST
- Provision Whedon (compute) - after Puppet
- Shutdown and startup test - scheduled for Sunday 27 November
- Disk cleaning - Charlie
- <s>Password changing in the CS and cluster domains - Vitalii and James</s>
- Proto setup and maintenance with HIP/Green Science