(274 intermediate revisions by 16 users not shown) |
Line 1: |
Line 1: |
− | __NOTOC__
| + | This is the hub for the CS sysadmins on the wiki. |
| | | |
− | = Machines and Brief Descriptions of Services = | + | = Overview = |
− | {| style="float:left; margin-right:2px;"
| |
− | | style="height:40px; width:150px; text-align:center; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-top:solid 5px #ADDFFF; border-bottom:solid 1px white; border-right:solid 5px #ADDFFF; font-size:120%;" | HOME <br> (vm0)
| |
− | |-
| |
− | | style="height:135px; width:150px; background-color:#ADDFFF; border-left:solid 5px #ADDFFF; border-bottom:solid 5px #ADDFFF; border-right:solid 5px #ADDFFF;" | Users <br> SSH <br> NFS
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | [https://docs.google.com/drawings/d/1XaULz5IxXV_BZQjrko3QJ8wV5aXsSTYcSWxxT49OyZk/edit If you're visually inclined, we have a colorful and easy-to-edit map of our servers here!] |
− | | style="height:40px; width:150px; text-align:center; background-color:#54C571; border-left:solid 5px #54C571; border-top:solid 5px #54C571; border-bottom:solid 1px white; border-right:solid 5px #54C571; font-size:120%;" | NET <br> (vm1)
| |
− | |-
| |
− | | style="height:135px; width:150px; background-color:#54C571; border-left:solid 5px #54C571; border-bottom:solid 5px #54C571; border-right:solid 5px #54C571;" | LDAP server <br> [[Sysadmin:DNS & DHCP | DNS]] <br> [[Sysadmin:DNS & DHCP | DHCP]]
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | == Server room == |
− | | style="height:40px; width:150px; text-align:center; background-color:#E77471; border-left:solid 5px #E77471; border-top:solid 5px #E77471; border-bottom:solid 1px white; border-right:solid 5px #E77471; font-size:120%;" | WEB <br> (vm2)
| |
− | |-
| |
− | | style="height:135px; width:150px; background-color:#E77471; border-left:solid 5px #E77471; border-bottom:solid 5px #E77471; border-right:solid 5px #E77471;" | Mailman <br> [[Sysadmin:Mail Stack | Mail Stack]]<br> Apache2 <br> PostgresQL <br> MySQL <br> Wiki
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | Our servers are in Noyes, the science building that predates the CST. For general information about the server room and how to use it, check out [[Sysadmin:Server Room|this page]]. |
− | | style="height:40px; width:150px; text-align:center; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-top:solid 5px #C38EC7; border-bottom:solid 1px white; border-right:solid 5px #C38EC7; font-size:120%;" | TOOLS <br> (vm3)
| |
− | |-
| |
− | | style="height:135px; width:150px; background-color:#C38EC7; border-left:solid 5px #C38EC7; border-bottom:solid 5px #C38EC7; border-right:solid 5px #C38EC7;" | [[Sysadmin:SageNB Server | SageNB Server]] <br> [[Sysadmin:Jupyterhub Notebook Server | Jupyterhub Server]] <br> [[Sysadmin:Software Modules | Software Modules]] <br> NginX
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | Columns: machine name, IPs, type (virtual, metal), purpose, dies, cores, RAM |
− | | style="height:55px; width:150px; text-align:center; background-color:#E3A869; border-left:solid 5px #E3A869; border-top:solid 5px #E3A869; border-bottom:solid 1px white; border-right:solid 5px #E3A869; font-size:120%;" | BABBAGE
| |
− | |-
| |
− | | style="height:135px; width:150px; background-color: #E3A869; border-left:solid 5px #E3A869; border-bottom:solid 5px #E3A869; border-right:solid 5px #E3A869;" | [[Sysadmin:Firewall | Firewall]]
| |
− | |}
| |
| | | |
− | {|
| + | == Compute Resources == |
− | | style="height:55px; width:150px; text-align:center; background-color:#EEDC82; border-left:solid 5px #EEDC82; border-top:solid 5px #EEDC82; border-bottom:solid 1px white; border-right:solid 5px #EEDC82; font-size:120%;" | [[Sysadmin:Servers:Proto | PROTO]]
| |
− | |-
| |
− | | style="height:135px; width:150px; background-color: #EEDC82; border-left:solid 5px #EEDC82; border-bottom:solid 5px #EEDC82; border-right:solid 5px #EEDC82;" | Weather Monitoring <br> GPS/NTP <br> Energy Monitoring
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | [https://wiki.cs.earlham.edu/index.php/Sysadmin:Computer_Resources Information about our machines and VMs is here!] |
− | | style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | HOPPER
| |
− | |-
| |
− | | style="height:200px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | Users <br> SSH <br> NFS <br> [[Sysadmin:Software Modules | Software Modules]] <br> PostgresQL <br> Wiki <br> Apache2 <br> [[Sysadmin:DNS & DHCP | DNS]] <br> [[Sysadmin:DNS & DHCP | DHCP]]
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | == Network == |
− | | style="height:55px; width:150px; text-align:center; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-top:solid 5px #ffdb4d; border-bottom:solid 1px white; border-right:solid 5px #ffdb4d; font-size:120%;" | DALI
| |
− | |-
| |
− | | style="height:200px; width:150px; background-color:#ffdb4d; border-left:solid 5px #ffdb4d; border-bottom:solid 5px #ffdb4d; border-right:solid 5px #ffdb4d;" | [[Sysadmin:Gitlab | Gitlab]] <br> Backups <br> NginX
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | We have two network fabrics linking the machines together. There are three subdomains. |
− | | style="height:55px; width:150px; text-align:center; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-top:solid 5px #ff4d94; border-bottom:solid 1px white; border-right:solid 5px #ff4d94; font-size:120%;" | AL-SALAM
| |
− | |-
| |
− | | style="height:200px; width:150px; background-color:#ff4d94; border-left:solid 5px #ff4d94; border-bottom:solid 5px #ff4d94; border-right:solid 5px #ff4d94;" | WebMO <br> [[Sysadmin:Software Modules | Software Modules]] <br> Apache2
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | === 10 Gb === |
− | | style="height:55px; width:150px; text-align:center; background-color:#39ad39; border-left:solid 5px #39ad39; border-top:solid 5px #39ad39; border-bottom:solid 1px white; border-right:solid 5px #39ad39; font-size:120%;" | LAYOUT
| |
− | |-
| |
− | | style="height:200px; width:150px; background-color:#39ad39; border-left:solid 5px #39ad39; border-bottom:solid 5px #39ad39; border-right:solid 5px #39ad39;" | [[Sysadmin:Jupyterhub Notebook Server | Jupyterhub Server]] <br> [[Sysadmin:Software Modules | Software Modules]] <br> NginX <br> Apache2 <br> WebMO
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | We have a 10Gb fabric used to mount files over NFS. Machines with 10Gb support have an IP address in the 10.10.10.0/24 subnet (a class C-sized block in private address space), and we want to add DNS records for these addresses. |
− | | style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | BRONTE
| |
− | |-
| |
− | | style="height:200px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | [[Sysadmin:Software Modules | Software Modules]]
| |
− | |}
| |
| | | |
− | {| style="float:left; margin-right:2px;"
| + | === 1 Gb (cluster, cs) === |
− | | style="height:55px; width:150px; text-align:center; background-color:#0099cc; border-left:solid 5px #0099cc; border-top:solid 5px #0099cc; border-bottom:solid 1px white; border-right:solid 5px #0099cc; font-size:120%;" | POLLOCK
| |
− | |-
| |
− | | style="height:200px; width:150px; background-color:#0099cc; border-left:solid 5px #0099cc; border-bottom:solid 5px #0099cc; border-right:solid 5px #0099cc;" | [[Sysadmin:Software Modules | Software Modules]] <br> WebMO <br> NginX
| |
− | |}
| |
| | | |
− | <br><br><br><br><br><br><br><br><br><br><br><br><br>
| + | We have two /24 (class C-sized) subnets on the 1Gb fabric: 159.28.22.0/24 (CS) and 159.28.23.0/24 (cluster). This means we have twice as many IP addresses on the 1Gb fabric as on the 10Gb fabric. |
| | | |
− | = Systems Administration Documentation =
| + | Any user accessing *.cluster.earlham.edu or *.cs.earlham.edu is making calls on the 1Gb network. |
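The address arithmetic above can be checked with a short sketch. It uses only Python's standard `ipaddress` module and the three subnets named on this page. The fabric labels, the helper names `address_count` and `fabric_for`, and the sample host address are assumptions made for illustration, not part of our tooling.

```python
import ipaddress

# Subnets as listed on this page; the dictionary keys are illustrative labels.
SUBNETS = {
    "10Gb": [ipaddress.ip_network("10.10.10.0/24")],
    "1Gb": [
        ipaddress.ip_network("159.28.22.0/24"),  # CS
        ipaddress.ip_network("159.28.23.0/24"),  # cluster
    ],
}


def address_count(fabric):
    """Total number of addresses across all subnets of a fabric."""
    return sum(net.num_addresses for net in SUBNETS[fabric])


def fabric_for(ip):
    """Return the fabric label whose subnet contains ip, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    for fabric, nets in SUBNETS.items():
        if any(addr in net for net in nets):
            return fabric
    return None


if __name__ == "__main__":
    print("10Gb addresses:", address_count("10Gb"))
    print("1Gb addresses:", address_count("1Gb"))
    # 159.28.23.7 is a hypothetical host address used only as an example.
    print("159.28.23.7 is on:", fabric_for("159.28.23.7"))
```

Note that `num_addresses` counts every address in a block, including the network and broadcast addresses, so the number of usable host addresses per /24 is slightly lower.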
− | For old documentation, see: [[Sysadmin:Old | Old Wiki Information]]
| |
| | | |
− | {|
| + | === Intra-cluster fabrics === |
− | |- valign:"top"
| |
− | |
| |
− | <div style="border:10px solid #E0EAF8; padding:5px; width:230px; height:500px">
| |
− | <div style="background-color:#CEDEF4; padding:5px;">
| |
| | | |
− | === Admin Tasks === | + | The layout cluster has an InfiniBand infrastructure. Wachowski has only a 1Gb infrastructure.
− | </div>
| + | |
− | * [[Sysadmin:Nagios | Nagios Monitoring ]]
| + | == Power == |
− | * [[Sysadmin:Shinken | Shinken Monitoring ]]
| + | |
− | * [[Sysadmin:Upgrading SSL Certificate | Upgrading SSL Certificates ]]
| + | We have a backup power supply, with batteries last upgraded in 2019 (?). We’ve had a few outages since then and power has held up well. |
− | * [[Sysadmin:User Management | User Management]]
| + | |
− | * [[Newmodules | Installing software under modules ]]
| + | == HVAC == |
− | * [[Sysadmin:Backup|Backup]]
| + | |
− | * [[Sysadmin:Contacting all users|Contacting all users]]
| + | HVAC systems are static and are largely managed by Facilities. |
− | * [[Sysadmin:New Sysadmins | Welcoming a new sysadmin to the fold]]
| |
− | * [[Sysadmin:AddComputer|Add a computer]]
| |
− | * [[Sysadmin:Setting up Lovelace Lab Machines | Setting up Lovelace Lab Machines]]
| |
− | * [[Reset password]]
| |
| | | |
| + | [[Topology|See full topology diagrams here.]] |
| | | |
− | <!-- This has to stay as part of the formatting -->
| + | [[Sysadmin:Layers of abstraction for filesystems|A word about what's happening between files and the drives they live on.]] |
− | </div>
| |
− | | style="float:left;" |
| |
− | |
| |
− | <div style="border:10px solid #FFDFFF; padding:5px; width:230px; height:500px;">
| |
− | <div style="background-color:#FFCEFF; padding:5px;">
| |
| | | |
− | === Services === | + | = New sysadmins = |
− | </div>
| |
− | * [[Sysadmin:Services:Apache2|Apache2]]
| |
− | * [[Sysadmin:Services:Databases|Databases]]
| |
− | * [[Sysadmin:DNS & DHCP|DNS and DHCP]]
| |
− | * [[Sysadmin:Services:Virtualization | Virtualization]]
| |
− | * [[Sysadmin:Services:XenServerSetup | Xen Server]]
| |
| | | |
− | <!-- This has to stay as part of the formatting -->
| + | These pages will be helpful for you if you're just starting in the group: |
− | </div>
| |
− | | style="float:left;" |
| |
− | |
| |
− | <div style="border:10px solid #F0DDD5; padding:5px; width:230px; height:500px;">
| |
− | <div style="background-color:#E4C0B1; padding:5px;">
| |
| | | |
− | === Miscellaneous ===
| + | * [[Sysadmin:New Sysadmins | Welcoming a new sysadmin ]] |
− | </div>
| + | * [[Sysadmin:Troubleshooting|General troubleshooting tips for admins]] |
− | * [[ShutdownProcedure| Shutdown and Boot up]]
| + | * [[Sandbox Notes|Sandbox Notes]] |
− | * [[SysadminContactInfo| Contact Information]]
| + | * [[Password managers]] |
− | * [[Sysadmin:ImportantInfo:PhoneNumbers| Phone Numbers]] | + | * [[Server safety]] |
− | * [[Sysadmin:ImportantInfo:AuthenticationInfo| Authentication Information]] | + | * [https://code.cs.earlham.edu/sysadmin/ticket-tracker Ticket tracking for current projects] |
− | * [[Sysadmin:ImportantInfo:UPS| UPS]] | |
− | * [[Sysadmin:ImportantInfo:SSLcerts| Generating SSL Certificates]] | |
− | * [[Sysadmin:Power draws| Power draws]] | |
− | * [[Sysadmin:ImportantInfo:SunHardware|Working with Sun Hardware]] | |
− | * [[Sysadmin:Passwords]]
| |
− | * Patching
| |
− | ** [[LinuxKernelPatching|Linux Kernel Patching]]
| |
− | * [[Sysadmin:SerialConsoleCableEnds|Cable Ends]]
| |
− | * [[Sysadmin:VirtualizationComparison|NEW Virtualization Comparison]]
| |
| | | |
− | <!-- This has to stay as part of the formatting -->
| + | Note: you'll need to log in with wiki credentials to see most Sysadmin pages. |
− | </div>
| |
− | | style="float:left;" |
| |
− | |
| |
− | <div style="border:10px solid #D6F8DE; padding:5px; width:230px; height:500px;">
| |
− | <div style="background-color:#BDF4CB; padding:5px;">
| |
| | | |
− | === Networking === | + | = Additional information = |
− | </div>
| |
− | * [[Sysadmin:Networking:NetworkLayout|Network Layout (as of 08/2006)]]
| |
− | * [[Sysadmin:Networking:D224 cable plant|D224 cable plant]]
| |
− | * [[Sysadmin:Networking:Fiber plans|Fiber plans]]
| |
− | * [[Sysadmin:Networking:Switches|Switches]]
| |
− | * [[Sysadmin:Networking:Rack notes|Rack notes]]
| |
− | * [[Sysadmin:Networking:Public|Public Network]]
| |
− | * [[Sysadmin:Networking:NetworkTopo|Old Network Topo Figures]]
| |
− | * [[Sysadmin:Networking:NetworkDiagram|Network layout (May 2007)]]
| |
− | * [[Sysadmin:Networking:Alternate Network Path|Alt Network path]]
| |
− | * [[Sysadmin:UPS Setup]]
| |
| | | |
− | <!-- This has to stay as part of the formatting -->
| + | These pages contain a lot of the most important information about our systems and how we operate. |
− | </div>
| |
− | |}
| |
| | | |
− | == Current Projects (updated 2017-04-27) == | + | ===Handy Tools=== |
− | === TODO === | + | * [http://monitor.cluster.earlham.edu:8088/packages Porter's Package Explorer] |
− | * Layout infiniband subnet manager
| |
− | * Layout disk swap, new lo0
| |
− | * HP Al-Salam switch enable jumboframes | |
| | | |
− | == On Going Projects (updated 15 Jan 2017) == | + | ===Technical docs=== |
− | === TODO === | |
− | * EMAILING ALL THE USERS https://wiki.cs.earlham.edu/index.php/Sysadmin:Old:Contacting_All_Users
| |
− | * SHUTDOWN SCHEDULED FOR SUNDAY (APRIL 16)
| |
− | ** Check/update instructions - one version is at https://wiki.cs.earlham.edu/index.php/Sysadmin:ImportantInfo:PowerFailure, there are others too
| |
− | ** Notify users
| |
− | * Fix certs for gitlab, etc.
| |
− | * Secure 1-2 admins for the summer
| |
− | * Prep layout for May-June usage
| |
− | * Practice shutdown-startup procedure (with Michael)
| |
− | * Nsswitch consistency across all machines
| |
− | * Document tools: startup / shutdown - Charlie
| |
− | * Use Sysadmin namespace for all our pages - All
| |
− | ** Testing usefulness of documentation - Dave
| |
− | * Al Salam: configure switch, re-rack. - Vitalii
| |
− | ** HP switch should be reset and tested.
| |
− | * LDAP cleanup of system users / old groups - James
| |
− | * Layout - Nirdesh
| |
− | ** Lo0 RAID (mdadm)
| |
− | ** 10GB from Dali to lo0 (adding rules on compute node routing tables as a possible fix)
| |
− | ** BIOS reset
| |
− | * 10Gb, perfsonar, ...
| |
− | * Monitoring: (Ganglia, Shinken)
| |
− | ** Getting consistency among all the machines(check_nrpe regularly stops working).
| |
− | * Whedon: configured and available
| |
− | * Change passwords (on everything). Postgres, shenken, ...
| |
− | * Webcam on office whiteboard (new office location?)
| |
− | * Learn virtual machine architecture and modules - Dave
| |
− | ** Document in a format for future admin training?
| |
− | ** Find existing introduction material
| |
− | * Mirror ''control'' for testing, swapping, etc.
| |
| | | |
− | === DONE (19 Jan 2017) ===
| + | * [https://code.cs.earlham.edu/sysadmin/ticket-tracker Ticket tracking for current projects] |
− | * Examine extra "layout" node. - Adam | + | * [[Server safety]] |
− | ** Differences are: Single PSU, Single GPGPU, No VGA. | + | * [[Sysadmin:Backup|Backup]] |
− | ** It has Infiniband and 10GB cards installed. | + | * [[Sysadmin:Monitoring | Monitoring ]] |
− | * Networking - Adam, Charlie | + | * [[Sysadmin:SSH|SSH info relevant to admins]] |
− | ** IP over Infiniband working on layout | + | * [[Sysadmin:User Management | User Management]] and [[Sysadmin:LDAP|LDAP]] generally |
− | *** Resolved by resetting IB switch configuration: <code>ibwarn: [3349] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)</code> | + | * [[Sysadmin:Jupyterhub Notebook Server|Jupyterhub]] and [[Nbgrader notes|NBGrader]] |
| + | * [[Sysadmin:MailStack|Email service]] |
| + | * [[Sysadmin:XenDocs | Xen Server]] |
| + | * [[Sysadmin:NFS|Network File System (NFS)]] |
| + | * [[Sysadmin:Web Servers|Web Servers and Websites]] |
| + | * [[Sysadmin:Services:Databases|Databases]] |
| + | * [[Sysadmin:DNS & DHCP|DNS and DHCP]] |
| + | * [[Sysadmin:AWS|AWS]] |
| + | * [[Bash_start_up_script|Bash startup scripts]] |
| + | * [[Sysadmin:VirtualBox | VirtualBox]] |
| + | * [[X Applications]] |
| + | * [[Sysadmin:Services:ClusterOverview|Cluster Overview]] and [[Sysadmin:Ccg-admin|additional details]] |
| + | * [[Sysadmin:Firewall|Firewall]] running on babbage.cs.e.e |
| + | * [[Sysadmin:Setting_up_Lovelace_Lab_Machines|Setting up Lab Machines]] |
| | | |
− | === FUTURE === | + | ===Common tasks=== |
− | * Centralized password database / manager / location | + | * [[Sysadmin:Recurring Tasks | Recurring tasks - e.g. software updates, hardware replacements]] |
| + | * [[Sysadmin:Contacting all users|Contacting all users]] |
| + | * [[Reset password]] |
| + | * [[Sysadmin:Software installation | Software installation]] |
| + | * [[Modules | Installing software under modules ]] |
| + | * [[Sysadmin:AddComputer|Add a computer to CS or cluster domains]] |
| + | * [[Senior projects|Supporting senior projects]] |
| + | * [[ShutdownProcedure|How to do a planned shutdown and reboot of the system]] |
| + | ** [[Sysadmin:TestingServices | Testing services]] (after a reboot, upgrade, change in the phase of the moon, etc.) |
| + | * [[Sysadmin:Upgrading SSL Certificate | Upgrading SSL Certificates ]] |
| + | * [[Sysadmin:Launch at startup|Launch a process at startup]] |
| + | * [[Sysadmin:Psql-setup | Set up psql for CS430 students]] |
| | | |
− | == Current Projects (updated 13 Oct 16) == | + | ===Group and institution information=== |
− | * '''Groups and LDAP and sudo - James''' | + | * [[Sysadmin:CS-ITS Interoperability|Working with ITS]] |
− | * <s>Amber - James</s>
| + | * [[Sysadmin:Recurring spending | Recurring spending ]] |
− | * <s>Edward's setup - Vitalli</s>
| + | * [[Sysadmin:SlackAndGitLab | Slack and GitLab integration]] |
− | * <s>WebDev access - Nirdesh<s>
| |
− | * Puppet - James and Vitalii
| |
− | * '''Bacula - Nirdesh'''
| |
− | * SSL certificate upgrade and documentation - Kristin
| |
− | * <s>Listserv merging with archives preserved - Nirdesh </s>
| |
− | * '''Ganglia - Bret'''
| |
− | * '''Shenken - Vitalii'''
| |
− | ** latency, UPS
| |
− | * New Layout node - ? and ?
| |
− | * Provision Sappho (compute) - after Puppet
| |
− | * Provision Kahlo (storage) -
| |
− | ** replace broken drive
| |
− | * I2 setup
| |
− | ** DTN, storage nodes, head nodes, ports in CST
| |
− | * [[Sysadmin:WhedonProvisioning|Provision Whedon]] (compute) - after Puppet | |
− | * '''Shutdown and startup test - scheduled for Sunday 27 November''' | |
− | * Disk cleaning - Charlie
| |
− | * <s>Password changing in the CS and cluster domains - Vitalii and James</s>
| |
− | * Proto setup and maintenance with HIP/Green Science
| |