Sysadmin:Recurring Tasks
Latest revision as of 14:31, 21 August 2019
We perform a number of system tasks regularly, and these are the intervals that seem to be best practice. The schedule is open to revision. What matters isn't the exact schedule but being conscious of it and carrying it out.
By request
If there's a problem (e.g. "Jupyter is down!"), it is the job of the sysadmin students to respond in a timely fashion.
Automatic, but check occasionally
- monitoring
- backups (including verifications of backups)
Per semester
Send an announcement when the time comes for these. They need to be done during the semester when all admins are available.
- reboot the server system
Per-semester or per-year, off-hours
These may be disruptive, so do them during non-business windows (e.g. right after exams end each semester).
- package updates: yum clean all && yum makecache && yum update (on CentOS) OR apt update && apt upgrade (on Debian or Ubuntu)
- in-place OS upgrades
  - Be careful with these. Verify backups first, and reserve a block of time for fixing problems if necessary.
  - Remember that they are only possible on a server running Debian or Ubuntu; CentOS requires a clean reinstall.
- GitLab upgrades: [on Dali] after you run the usual yum update, be sure to run gitlab-ctl reconfigure; otherwise you will get a series of error notices when you try to navigate the site
- WebMO upgrades: check with Lori Watson about whether it's currently in use, and get the password for the admin account from CP
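The off-hours update steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names update_commands and maintenance_plan and the yes/no GitLab argument are hypothetical, not part of any existing script, and the helper only prints the commands from this page rather than running them, so an admin can review the plan first.

```shell
#!/bin/sh
# Sketch only: prints the maintenance commands for a host; it runs nothing.
# Function names and the yes/no argument are hypothetical, for illustration.

# Print the package update chain for a package manager ("yum" or "apt").
update_commands() {
    case "$1" in
        yum) echo "yum clean all && yum makecache && yum update" ;;
        apt) echo "apt update && apt upgrade" ;;
        *)   echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Print the full off-hours plan: the update chain first, then the
# GitLab reconfigure step when the second argument is "yes".
maintenance_plan() {
    update_commands "$1" || return 1
    if [ "${2:-no}" = "yes" ]; then
        echo "gitlab-ctl reconfigure"
    fi
}
```

For example, maintenance_plan yum yes prints the yum chain followed by gitlab-ctl reconfigure, which matches the order required for the GitLab host. Printing instead of executing is a deliberate choice here, since these steps may be disruptive and should only be run by hand during a planned window.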
Every few years
Replace batteries in the UPS (last done during the 2018-19 academic year). We have a few power outages or failures each year, and the batteries that keep us from having a hard crash last roughly 3 or 4 years, depending on who you ask. The hardware audit Drive Doc in the CS Admins folder contains the most current records we know of for the age of batteries (and other hardware).