Sysadmin:Recurring Tasks
We do a number of system tasks regularly, and these are the intervals that seem to be best practice. This schedule is open to revision. What matters isn't the exact schedule but being conscious of it and carrying it out.
By request
If there's a problem (e.g. "Jupyter is down!"), it is the job of the sysadmin students to respond in a timely fashion.
Automatic, but check occasionally
- monitoring
- backups (including verification of backups)
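Backups run automatically, but "check occasionally" is easy to forget. Below is a minimal sketch of one such check. It assumes backups land as files in a local directory; the path in the example comment is hypothetical, and so is the helper name backup_is_fresh. It only confirms that backups are still arriving. It does not replace test-restoring one.

```shell
#!/usr/bin/env bash
# Sketch: check that a backup directory has received a file recently.
# This confirms backups are still arriving; it does NOT verify that they restore.

# backup_is_fresh DIR MAX_AGE_DAYS   (hypothetical helper)
# Returns 0 if DIR (searched recursively) contains at least one regular file
# modified less than MAX_AGE_DAYS days ago, and 1 otherwise (including when
# DIR does not exist).
backup_is_fresh() {
  local dir="$1" max_age_days="$2"
  # find -mtime -N matches files whose age in whole days is less than N;
  # -quit stops at the first match, so large backup trees stay cheap to scan.
  [ -n "$(find "$dir" -type f -mtime "-$max_age_days" -print -quit 2>/dev/null)" ]
}

# Example (the path is hypothetical; point it at the real backup target):
#   backup_is_fresh /srv/backups 2 || echo "No new backup in the last 2 days"
```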
Per semester
Send an announcement when the time comes for these. They need to be done during the semester when all admins are available.
- reboot the server system
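Before the announced reboot, it can help to know whether updates have already left a reboot pending. The following minimal sketch uses two existing signals. On Ubuntu, and on Debian hosts with the relevant packages, package scripts create /var/run/reboot-required. On CentOS, needs-restarting -r from yum-utils exits with status 1 when a reboot is needed. The function name reboot_required is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether a host already has a reboot pending.
# Ubuntu (and some Debian setups): package scripts create /var/run/reboot-required.
# CentOS/RHEL: needs-restarting -r (from yum-utils) exits 1 if a reboot is needed.

# reboot_required [FLAG_FILE]   (hypothetical helper)
# Returns 0 if a reboot appears to be pending, 1 otherwise.
reboot_required() {
  local flag="${1:-/var/run/reboot-required}"
  # Debian-family flag file.
  if [ -e "$flag" ]; then
    return 0
  fi
  # RHEL-family check, only if the tool is installed.
  if command -v needs-restarting >/dev/null 2>&1; then
    needs-restarting -r >/dev/null 2>&1
    if [ $? -eq 1 ]; then
      return 0
    fi
  fi
  return 1
}

# Example:
#   reboot_required && echo "A reboot is already pending on $(hostname)"
```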
Per-semester or per-year, off-hours
These may be disruptive, so do them during non-business windows (e.g. right after exams end each semester).
- package updates: yum clean all && yum makecache && yum update (CentOS) OR apt update && apt upgrade (Debian/Ubuntu)
- in-place OS upgrades
  - Be careful with these. Verify backups first, and reserve a block of time for fixing problems if necessary.
  - Remember that they are only possible on servers running Debian or Ubuntu; CentOS requires a clean reinstall.
- GitLab upgrades (on Dali): after running the usual yum update, you must run gitlab-ctl reconfigure. Otherwise you will get a stream of error notices when you try to navigate the site.
- WebMO upgrades: check with Lori Watson about whether WebMO is currently in use, and get the password for the admin account from CP.
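The update commands listed above differ by distribution, and GitLab needs an extra step after updating. The following minimal sketch assumes the host has yum or apt on its PATH. It prints the matching command line instead of running it, so you can review it before running it as root. The function names update_commands, detect_pkg_manager and gitlab_followup are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the update commands this page lists for each
# package manager, so they can be reviewed before running as root.

# update_commands yum|apt   (hypothetical helper)
update_commands() {
  case "$1" in
    yum) echo "yum clean all && yum makecache && yum update" ;;
    apt) echo "apt update && apt upgrade" ;;
    *)   echo "unsupported package manager: '$1'" >&2; return 1 ;;
  esac
}

# detect_pkg_manager: prints "yum" or "apt" depending on what is installed.
detect_pkg_manager() {
  if command -v yum >/dev/null 2>&1; then
    echo yum
  elif command -v apt >/dev/null 2>&1; then
    echo apt
  else
    return 1
  fi
}

# gitlab_followup: prints the reconfigure step when gitlab-ctl is installed,
# because skipping it after an update leaves the site throwing errors.
gitlab_followup() {
  if command -v gitlab-ctl >/dev/null 2>&1; then
    echo "gitlab-ctl reconfigure"
  fi
}

# Example (on the server, after reviewing the printed commands):
#   sudo sh -c "$(update_commands "$(detect_pkg_manager)")"
#   [ -n "$(gitlab_followup)" ] && sudo gitlab-ctl reconfigure
```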
Every few years
Replace the batteries in the UPS (last done during the 2018-19 academic year). We have a few power outages or failures each year, and the UPS batteries are what keep those from turning into hard crashes. Their life cycle is roughly 3 to 4 years, depending on who you ask. The hardware audit Drive doc in the CS Admins folder has the most current records we know of for the age of the batteries (and other hardware).
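The 3 to 4 year estimate can be turned into a simple date check. This minimal sketch assumes GNU date and takes the exact replacement date from the hardware audit doc. The date in the example comment is hypothetical, not the recorded one, and battery_due is a hypothetical helper name. Years are approximated as 365 days, which is close enough for a reminder.

```shell
#!/usr/bin/env bash
# Sketch: is a UPS battery past a chosen replacement age?
# Defaults to 3 years, the conservative end of the 3-4 year estimate.

# battery_due REPLACED_DATE [MAX_YEARS] [NOW_DATE]   (hypothetical helper)
# Returns 0 (due) when at least MAX_YEARS*365 days have passed since
# REPLACED_DATE, 1 if not yet due, 2 if a date cannot be parsed.
battery_due() {
  local replaced="$1" max_years="${2:-3}" now="${3:-today}"
  local then_s now_s age_days
  then_s=$(date -d "$replaced" +%s) || return 2
  now_s=$(date -d "$now" +%s) || return 2
  age_days=$(( (now_s - then_s) / 86400 ))
  [ "$age_days" -ge $(( max_years * 365 )) ]
}

# Example (hypothetical date; use the one recorded in the audit doc):
#   battery_due 2019-06-01 && echo "UPS batteries are due for replacement"
```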