Sysadmin:ImportantInfo:PowerFailure
This page has been deprecated.
Before a planned shutdown
- Send an email to content at least a week in advance and ask them to update the news section of the web page with the pertinent information (where, when, and why).
- Send an email to clients informing them of the shutdown at least two days in advance.
- Schedule a shutdown of the ACL machines, preferably a day in advance, and never with less than 15 minutes of warning. Have the ACLs shut down between 15 and 30 minutes before the planned shutdown. You can use the c3 tools on image to do this for the entire cluster.
- Schedule a shutdown of the servers. This includes:
- quark
- quarkprime
- image
- backup
- logger
- millie
Again, for quark, schedule this at least a day in advance if possible, and never with less than 15 minutes of warning. Set the servers to shut down as close to the planned shutdown time as you safely can, without cutting it too close: 5 to 10 minutes beforehand should be good.
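The timing above can be worked out with a small dry-run helper. This is only a sketch: it assumes the outage time is given as HH:MM on the current day, it picks 30 minutes for the ACLs and 10 minutes for the servers from the ranges above, and the function names are made up for illustration. It prints shutdown commands rather than running them, and it assumes the local shutdown command accepts an HH:MM time, so check the man page on each machine first.

```shell
#!/bin/sh
# Dry run: print (never execute) the shutdown commands for a planned outage.

# Print the HH:MM time that falls $2 minutes before $1 (wraps around midnight).
minus_minutes() {
    h=${1%%:*}
    m=${1##*:}
    # Strip one leading zero so 08 and 09 are not read as invalid octal numbers.
    h=${h#0}; h=${h:-0}
    m=${m#0}; m=${m:-0}
    total=$(( (h * 60 + m - $2 + 1440) % 1440 ))
    printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# Print the suggested commands for an outage at time $1.
# Assumption: 30 minutes of lead for the ACLs, 10 minutes for the servers.
print_plan() {
    echo "ACLs:    shutdown -h $(minus_minutes "$1" 30)"
    echo "servers: shutdown -h $(minus_minutes "$1" 10)"
}
```

For example, print_plan 17:00 suggests taking the ACLs down at 16:30 and the servers at 16:50. Distributing the ACL command to the whole cluster is left to the c3 tools on image.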
During an unplanned shutdown
Checklist
- Shut off the breakers by the CS rack to control the behavior of the network when power returns.
Notes
- The UPS in the bottom of the CS rack provides the only backup power in the room. There are two rack-mounted PDUs backed by the UPS, one in the 4-post rack and one in the 2-post rack. These should not be shut off except in an emergency. These are labeled on the front to distinguish them from the mains-backed PDUs.
- To conserve battery power, backup, image, and quarkprime shut themselves off automatically a few minutes after a loss of mains power. You should not turn these back on except in an emergency.
After power comes back on
- Power up these servers in this order:
- quark
- quarkprime
- backup
- image
You can come back to the other servers later. These ones run services (NIS and NFS in particular) that the other machines require.
- Power up the ACLs.
- Go back around and power up the other servers:
- millie
- bagend
- proto
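Because the ACLs and the remaining servers depend on the first four machines, it can help to confirm that those four answer before moving on. The following is a minimal sketch, not an established procedure: it assumes a single ping is a good enough sign of life (it says nothing about whether NIS or NFS are actually serving), and the PROBE variable and first_down function are names invented here.

```shell
#!/bin/sh
# Core servers in the power-up order given above.
CORE_SERVERS="quark quarkprime backup image"

# Command used to test a host; the host name is appended as the last argument.
# Assumption: one ICMP ping is an adequate check.
PROBE=${PROBE:-"ping -c 1"}

# Walk the hosts in order. Print the first one that fails the probe and
# return 1, or print nothing and return 0 if every host answers.
first_down() {
    for host in "$@"; do
        if ! $PROBE "$host" >/dev/null 2>&1; then
            echo "$host"
            return 1
        fi
    done
    return 0
}
```

Running first_down $CORE_SERVERS before powering up the ACLs names the first core server that is still unreachable, if there is one.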
Things to check
- Sendmail: Make sure sendmail and MailScanner on quark are both running. If they need prodding, run /usr/local/etc/rc.d/mta.sh restart and /usr/local/etc/rc.d/mailscanner.sh restart.
- Apache: Make sure Apache is running on quark. If it needs restarting, stop it with apachectl stop and restart it with apachectl startssl. Make sure to restart it with SSL enabled!
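The checks above could be scripted roughly as follows. This is a sketch under assumptions: the process names sendmail, MailScanner, and httpd are guesses at how these services appear in the process listing on quark, and missing_services is a hypothetical helper. It only reports what seems to be missing; restarting is still done by hand with the commands listed above.

```shell
#!/bin/sh
# Assumption: these are the names as they appear in `ps` output on quark.
EXPECTED="sendmail MailScanner httpd"

# Read a process listing on stdin and print each expected name that is absent.
missing_services() {
    listing=$(cat)
    for name in $EXPECTED; do
        printf '%s\n' "$listing" | grep -q "$name" || echo "$name"
    done
}
```

On quark this would be run as ps ax | missing_services. No output means all three names were found in the listing.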
Problems you will run into
- backup will need you to press Enter when it comes up to clear an "error" on one of its SCSI drives.
Checklists (new!)
Shutdown
- Shut off the breakers by the CS rack to control the behavior of the network when power returns.
- Shut off the switches on the mains-backed (not UPS-backed!) rack-mounted PDUs in all four racks.
- Shut off the HVAC system.
Startup
- Turn on all breakers.
- Turn on HVAC system.
- Turn on mains-backed CS PDUs.
- Turn on image if it's not on.
- Turn on the PDU labeled "cluster" in the 4-post cluster rack.
- Turn on hopper and wait for it to settle.
- Turn on admin.
- Turn on cairo PDUs.
- Turn on bazaar PDUs.