Sysadmin
Machines and Brief Descriptions of Services
CS Machines
HOME (vm0) |
Users SSH NFS Backup to Dali: eccs, etc, var |
NET (vm1) |
LDAP server DNS DHCP Backup to Dali: etc, var |
WEB (vm2) |
Mailman Mail Stack Apache2 PostgreSQL MySQL Wiki Backup to Dali: etc, var |
TOOLS (vm3) |
SageNB Server Jupyterhub Server Software Modules NginX Backup to Dali: etc, var, mnts, sage |
BABBAGE |
Firewall |
PROTO |
Weather Monitoring GPS/NTP Energy Monitoring |
CONTROL |
Users SSH HOME TOOLS |
SMILEY |
Sysadmin:XenDocs NET WEB |
SHINKEN |
Users SSH Add machines |
MURPHY |
Elderly email stack Users SSH |
Cluster Machines
HOPPER |
Users SSH NFS server LDAP server Software Modules PostgreSQL Wiki Apache2 DNS DHCP Backup to Dali: etc, var, cluster |
DALI |
Storage Server Gitlab Backups NginX No backup (storage) |
AL-SALAM |
WebMO Software Modules Apache2 No backup |
LAYOUT |
Jupyterhub Server Software Modules NginX Apache2 WebMO Backup to Dali: etc, var |
BRONTE |
Software Modules Backup to Dali: etc, var, nbserver |
POLLOCK |
Software Modules WebMO NginX No backup |
KAHLO |
Storage Server Backups NginX No backup |
BIGFE |
Software Modules |
T-VOC |
Software Modules |
ELWOOD |
Software Modules |
Switches
SG538SF02J |
|
CN63FP762S |
|
SG525SG025 |
|
Netgear JGS524 |
|
cs-main |
|
5500denniscs-sw1 |
|
Systems Administration Documentation
For old documentation, see: Old Wiki Information
Current Projects
To do - updated 2018-03-01
- UPS Shinken monitoring - Aleks and Vitalii
- UPS load monitoring - Eli
- FIFO for requests rather than ad-hoc
- Accounting for hours logged
- PBS Shinken monitoring - Aleks and Vitalii
- Power layout - legend, color-code servers by type, how are servers with 2x power supplies plumbed?
- Bringing the new people on-board - Aleks (Vitalii to add them to the listserv)
- -------------------------
- Shinken - Vitalii and Aleks (documentation, monitoring webmo and pm8)
- Hadoop on Whedon - Vitalii and Adam (stuck on ?)
- Layout - Adam (stuck on ldap)
- Gaussian & WebMO on Whedon - Ahsan and Eli (stuck on firewall)
- Backup - Châu (moving along, set up backup.cs.e.e next)
- Installing power monitor, etc. and rack cleanup - TO BE ASSIGNED (eli and charlie)
- switch switch (charlie to check inventory)
- Mothur - Ahsan
- password policy, force change and random initial
- for now notify people with default and then change after a couple of days; script will generate random string
- Talk about at next meeting:
- Spring break and summer people (important)
- Jon's user and Postgres database
- investigate tools /clients/ directory with what looks like duplicate user directories
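The password-policy item above mentions a script that generates a random string for initial passwords. A minimal sketch of that step, assuming Python and a letters-and-digits alphabet of length 16 (the alphabet, the length, and the function name `random_initial_password` are illustrative choices, not something the list specifies):

```python
import secrets
import string

# Hypothetical alphabet and default length; the to-do item only says "random string".
ALPHABET = string.ascii_letters + string.digits


def random_initial_password(length: int = 16) -> str:
    """Return a random string drawn from ALPHABET using the OS CSPRNG."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Print one candidate initial password.
    print(random_initial_password())
```

`secrets` is used rather than `random` because the output is a credential. Forcing the change on first login would still have to be handled separately (for example in LDAP), which this sketch does not attempt.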
(list from 2017-10-26)
- Finish migrating tools and home to smiley
- Migrate web and net back to control
- Record consistent & thorough documentation, especially concerning the startup and shutdown of the VMs
- Set up graceful shutdown when we detect that we are running solely off UPS
- Additionally, set up clean shutdown and startup for VMs on smiley, control (?)
- Fix reverse lookup error for mail.cs.earlham.edu
- Should consistently refer to 159.28.22.2 (web.cs.earlham.edu)
- It's possible that this isn't actually broken.
- Layout infiniband subnet manager
- Layout disk swap, new lo0
- Redo /scratch for mglerner group on /media/r10_vol?
- Migrate Elwood, BigFe, t-voc to repurposed Lovelace Machines (Eli)
- HP Al-Salam switch enable jumboframes?
- Strike unused lovelace machine addresses from CS DNS file
- Perhaps there's a python file in root's home somewhere that checks for unused DNS/DHCP addresses?
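For the reverse lookup item above, a small check can show whether the lookup is actually broken. The sketch below assumes the intended state is that mail.cs.earlham.edu resolves to 159.28.22.2 and that this address reverses to web.cs.earlham.edu; that reading of the list item is an assumption, and `check_reverse` is a hypothetical helper, not an existing script on any of the machines.

```python
import socket


def check_reverse(hostname, expected_ip, expected_ptr,
                  resolve=socket.gethostbyname,
                  reverse=socket.gethostbyaddr):
    """Return a list of problems found (empty if forward and reverse agree).

    `resolve` and `reverse` are injectable so the check can be exercised
    without touching real DNS.
    """
    problems = []
    try:
        ip = resolve(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return [f"forward lookup of {hostname} failed: {exc}"]
    if ip != expected_ip:
        problems.append(f"{hostname} resolves to {ip}, expected {expected_ip}")
    try:
        ptr_name = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup of {ip} failed: {exc}")
        return problems
    if ptr_name.rstrip(".").lower() != expected_ptr.lower():
        problems.append(f"{ip} reverses to {ptr_name}, expected {expected_ptr}")
    return problems


if __name__ == "__main__":
    # Assumed expectations, taken from the list item above.
    for line in check_reverse("mail.cs.earlham.edu", "159.28.22.2",
                              "web.cs.earlham.edu") or ["consistent"]:
        print(line)
```

Run from a machine that uses the CS DNS server, an empty result would support the note that the lookup may not actually be broken.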
Ongoing Projects (Spring 2017)
TODO
- EMAILING ALL THE USERS https://wiki.cs.earlham.edu/index.php/Sysadmin:Old:Contacting_All_Users
- SHUTDOWN SCHEDULED FOR SUNDAY (APRIL 16)
- Check/update instructions - one version is at https://wiki.cs.earlham.edu/index.php/Sysadmin:ImportantInfo:PowerFailure, there are others too
- Notify users
- Fix certs for gitlab, etc.
- Secure 1-2 admins for the summer
- Prep layout for May-June usage
- Practice shutdown-startup procedure (with Michael)
- Nsswitch consistency across all machines
- Document tools: startup / shutdown - Charlie
- Use Sysadmin namespace for all our pages - All
- Testing usefulness of documentation - Dave
- Al Salam: configure switch, re-rack. - Vitalii
- HP switch should be reset and tested.
- LDAP cleanup of system users / old groups - James
- Layout - Nirdesh
- Lo0 RAID (mdadm)
- 10Gb from Dali to lo0 (adding rules on compute node routing tables as a possible fix)
- BIOS reset
- 10Gb, perfsonar, ...
- Monitoring: (Ganglia, Shinken)
- Getting consistency among all the machines (check_nrpe regularly stops working).
- Whedon: configured and available
- Change passwords (on everything). Postgres, Shinken, ...
- Webcam on office whiteboard (new office location?)
- Learn virtual machine architecture and modules - Dave
- Document in a format for future admin training?
- Find existing introduction material
- Mirror control for testing, swapping, etc.
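The nsswitch consistency item in the list above could be checked mechanically once each machine's /etc/nsswitch.conf has been collected. A rough sketch, assuming the file contents are already gathered into a dictionary keyed by machine name (how they are collected, e.g. over SSH, is left out, and both function names are hypothetical):

```python
def parse_nsswitch(text):
    """Map each database (e.g. 'passwd') to its ordered list of sources."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        database, sources = line.split(":", 1)
        entries[database.strip()] = sources.split()
    return entries


def nsswitch_differences(configs):
    """Given {machine: nsswitch.conf text}, return {database: {machine: sources}}
    for every database whose sources are not identical on all machines.
    A machine missing a database entirely is reported with None."""
    parsed = {machine: parse_nsswitch(text) for machine, text in configs.items()}
    databases = set().union(*(p.keys() for p in parsed.values()))
    diffs = {}
    for db in sorted(databases):
        per_machine = {machine: p.get(db) for machine, p in parsed.items()}
        values = list(per_machine.values())
        if any(v != values[0] for v in values[1:]):
            diffs[db] = per_machine
    return diffs


if __name__ == "__main__":
    # Illustrative contents only; not the real configuration of any machine.
    sample = {
        "home": "passwd: files ldap\ngroup: files ldap\n",
        "net": "passwd: files\ngroup: files ldap\n",
    }
    for db, per_machine in nsswitch_differences(sample).items():
        print(db, per_machine)
```

Source order matters in nsswitch.conf, so the comparison keeps each source list in order rather than comparing sets.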
DONE (19 Jan 2017)
- Examine extra "layout" node. - Adam
- Differences are: Single PSU, Single GPGPU, No VGA.
- It has Infiniband and 10Gb cards installed.
- Networking - Adam, Charlie
- IP over Infiniband working on layout
- Resolved by resetting IB switch configuration:
ibwarn: [3349] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)
FUTURE
- Centralized password database / manager / location
Current Projects (updated 13 Oct 16)
- Groups and LDAP and sudo - James
- Amber - James
- Edward's setup - Vitalii
- WebDev access - Nirdesh
- Puppet - James and Vitalii
- Bacula - Nirdesh
- SSL certificate upgrade and documentation - Kristin
- Listserv merging with archives preserved - Nirdesh
- Ganglia - Bret
- Shinken - Vitalii
- latency, UPS
- New Layout node - ? and ?
- Provision Sappho (compute) - after Puppet
- Provision Kahlo (storage) -
- replace broken drive
- I2 setup
- DTN, storage nodes, head nodes, ports in CST
- Provision Whedon (compute) - after Puppet
- Shutdown and startup test - scheduled for Sunday 27 November
- Disk cleaning - Charlie
- Password changing in the CS and cluster domains - Vitalii and James
- Proto setup and maintenance with HIP/Green Science