Sysadmin:Old:ACL:Update

From Earlham CS Department

Proposed ACL update Policy

We need to start a discussion to settle on a rigorous, systematic policy for updating the software on the ACLs. With a solid policy in place, all of our users will know exactly when to expect changes to the systems and how large those changes are likely to be. Below is an initial proposal for one such policy.

Proposed Implementation Details

Classifications

There are three possible classifications of updates, each causing more impact and work than the previous:

  1. Installing new packages based on the requests of users and fixing critical problems
    1. New packages are not requested all that often, but it does happen. Upper-level CS students often need additional software to get class projects done, and we need to be able to install what they request in a timely manner.
  2. Updating installed packages and fixing problems that aren't time-sensitive
    1. Even in the 7 months that we've been running this distribution, a large amount of the software on the ACLs has gone out of date, and in most cases the newer versions include numerous bug fixes, performance improvements, and patches for security holes. apt marks 251 packages on the ACLs as out-of-date. Some of these updates include feature additions that mid- to upper-level students require (perl5.10, gcc4.3, Oo3, etc.). These updates will need to happen less frequently than new package installs, as they have a high possibility of breaking existing software.
  3. Full system upgrade (such as major OS version upgrades or switching OS)
    1. Every so often OSs get updated for performance, features, security, and more. At some point in the future, we may also want to have a lengthy discussion about which OS we want to use, weighing the pros and cons of the available options. This would happen least frequently, since it would take the most work to make sure all of our software works properly under whatever new system we choose.
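
As a rough way to track the size of the second class of update, the count of out-of-date packages can be pulled from a simulated upgrade. The sketch below is only an illustration: it assumes the `Inst name [installed] (candidate ...)` line format that `apt-get -s upgrade` prints, and the function names `pending_upgrades` and `simulate_upgrade` are hypothetical, not part of any existing ACL tooling.

```python
import re
import subprocess

# Assumed line format from a simulated run, for example:
#   Inst perl [5.8.8-7] (5.10.0-19 Debian:testing)
# The bracketed installed version is absent when a package is newly installed.
INST_LINE = re.compile(r"^Inst (\S+)(?: \[([^\]]*)\])? \(([^ )]+)")


def pending_upgrades(simulated_output):
    """Return (package, installed_version, candidate_version) tuples."""
    pending = []
    for line in simulated_output.splitlines():
        match = INST_LINE.match(line)
        if match:
            pending.append(match.groups())
    return pending


def simulate_upgrade():
    """Run apt-get in simulation mode; nothing is installed or changed."""
    result = subprocess.run(
        ["apt-get", "-s", "upgrade"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On an ACL, `len(pending_upgrades(simulate_upgrade()))` would give a number comparable to the count quoted above, and the tuples themselves could be posted for the groups that test the monthly image.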


Schedule

Below is a potential schedule for the updates above. It is only a rough sketch, so it is up for debate and can be accepted, modified, or rejected.

  1. New packages should be installed ASAP. If installing the package requires other packages to be updated or removed, some level of QA testing should be done before rolling out a new image. Otherwise, a single new package isn't going to have much effect on the overall system. This sort of update should happen during the lowest usage period of the week, say Sunday morning around 3am.
  2. Updating all existing packages needs to happen less frequently. One suggestion is to re-image an ACL or two on the first of every month with an ACL-Devel image containing all the latest packages (as installed by "apt-get upgrade"). One of these ACL-Devel machines could be in the Admin office, and another could be in the Pedagogical/Applied-groups office (and perhaps one more could be a "public access" devel machine). We'd give these people one or two weeks to run tests and comment on issues with the updates before officially rolling out the update on, e.g., the 15th of the month. Groups that rely on software on the ACLs would ideally have scripts that verify the integrity of their software stack (specifically thinking of the Pedagogical group here).
  3. Once a year or so we need to have a discussion about the OS we're using, including the version and distribution. Personal opinions aside, we'd talk about the advantages and disadvantages of upgrading, as well as what work would be needed to complete the upgrade and verify its stability. This discussion should take place at the beginning of the summer, with actual upgrades (if they happen) taking place mid-summer so that there is plenty of time before the semester to verify or roll back.
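
To make the proposed dates concrete, here is a minimal sketch that computes the next low-usage install window and the next monthly devel/rollout pair. It assumes the example values from the list above (Sunday at 3am, the 1st, and the 15th); the function names are hypothetical and the dates would change if the schedule is modified.

```python
from datetime import date, datetime, time, timedelta


def next_sunday_3am(now):
    """Return the next Sunday 03:00 strictly after `now`."""
    days_ahead = (6 - now.weekday()) % 7  # Monday is 0, Sunday is 6
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), time(3, 0))
    if candidate <= now:
        # Already at or past this week's window; use next week's.
        candidate += timedelta(days=7)
    return candidate


def monthly_cycle(today):
    """Return (devel_image_date, rollout_date) for the next cycle on or after `today`."""
    if today.day == 1:
        devel = today
    elif today.month == 12:
        devel = date(today.year + 1, 1, 1)
    else:
        devel = date(today.year, today.month + 1, 1)
    # Rollout follows the testing period, on the 15th of the same month.
    return devel, devel.replace(day=15)
```

For example, `monthly_cycle(date(2024, 12, 20))` returns January 1 and January 15 of 2025, which leaves the two-week testing period described in item 2.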


Ideally this process will be mostly automated. With SystemImager and a proper set of test scripts, we could make sure new images only get rolled out when they are verified to be working properly. This requires some work on the part of a couple of the applied groups (Admin and Pedagogical specifically) in order to have those test scripts for every piece of critical software and have them properly report back to the image server and the admins if something failed the test.
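
The gating idea could look something like the sketch below: run every group's check script against the devel image and only call the image releasable if all of them exit with status 0. This is an assumption about how the test scripts would behave (exit code 0 for success, an error message on stderr), and the names `run_checks` and `image_is_releasable` are hypothetical. Reporting back to the image server and the actual SystemImager rollout are left out.

```python
import subprocess
import sys


def run_checks(commands, timeout=300):
    """Run each check command; return a list of (command, reason) failures."""
    failures = []
    for command in commands:
        try:
            result = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing script, permission problem, or a check that hung.
            failures.append((command, str(exc)))
            continue
        if result.returncode != 0:
            reason = result.stderr.strip() or f"exit {result.returncode}"
            failures.append((command, reason))
    return failures


def image_is_releasable(commands):
    """Print any failures for the admins and return True only if all checks passed."""
    failures = run_checks(commands)
    for command, reason in failures:
        print(f"FAILED {' '.join(command)}: {reason}", file=sys.stderr)
    return not failures
```

Each entry in `commands` is an argument list such as `["/path/to/check-script"]` (a placeholder path). A rollout job would call `image_is_releasable` first and skip the rollout when it returns False.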