Power down before outages

From Earlham CS Department

Name of Task: Power down before outages

People Involved: Craig Earley, Davit Kvartskhava

Date Started: June 5, 2018 (or thereabouts)

Deadline:

Description:

After a hard crash in June 2018, we want a program that shuts our servers down gracefully once the UPS switches to battery power. The program APCUPSD should be helpful.

Phase 1

APCUPSD is installed and works. It does not yet interface with our hardware.

We also have some useful SNMP codes, which can help in getting information from the UPS.

Some notes scraped from a Google Drive Doc:

  • Decided we’ll use the current UPS info and adapt later if needed
  • Send message to ask machines to initiate shutdown
  • Apcupsd: need some more info to actually connect to the UPS, but the software runs on shinken


Phase 2

We've decided to take a different approach to this project.

  • Decided not to use APCUPSD, even though it is great software for controlling the UPS. We simply do not need the wide range of functionality it offers; we only need the current battery level.
  • The current battery level can be extracted using SNMP.
  • Another important aspect: for the sake of good network design, and robustness in particular, we determined that it is better not to have a centralized system that asks other machines to initiate shutdown. Instead, each machine will decide on its own, based on the battery level of the UPS, which is the same for all machines.
  • Each machine will run a Python script periodically using the Linux cron daemon[1].
  • Things that need to be discussed:
  1. What should be the time interval for checking the battery power?
  2. What's the critical battery level?
  • We don't need to shut down the head nodes!
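
As a rough illustration of the SNMP lookup mentioned above, the sketch below shells out to the net-snmp `snmpget` tool and parses its value-only output. It is not the script we deployed. The OID shown is `upsEstimatedChargeRemaining` from the standard UPS-MIB (RFC 1628), which is an assumption; our UPS may expose the battery level under a different OID. The host name `ups.example.org`, the `public` community string, and all function names are hypothetical.

```python
import subprocess

# Assumption: upsEstimatedChargeRemaining from the standard UPS-MIB (RFC 1628).
# The OID our UPS actually exposes may differ.
BATTERY_OID = "1.3.6.1.2.1.33.1.2.4.0"


def build_snmpget_command(host, community="public", oid=BATTERY_OID):
    """Build the net-snmp snmpget command line for a single value."""
    # -v 2c: SNMP version 2c; -c: community string; -Oqv: print the value only.
    return ["snmpget", "-v", "2c", "-c", community, "-Oqv", host, oid]


def parse_battery_level(output):
    """Turn snmpget's value-only output into an integer percentage."""
    level = int(output.strip())
    if not 0 <= level <= 100:
        raise ValueError("battery level out of range: %r" % level)
    return level


def read_battery_level(host, community="public", oid=BATTERY_OID):
    """Query the UPS over SNMP and return its battery level in percent."""
    output = subprocess.check_output(build_snmpget_command(host, community, oid))
    return parse_battery_level(output.decode("ascii"))
```

Calling `read_battery_level("ups.example.org")` would need `snmpget` installed and a reachable UPS. Keeping the parsing separate from the network call makes it easy to check without hardware.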

Phase 3 - current status

  • The Python script that can be found here[2] is running on most of the machines as a daemon. The script and its log file can be found in the /etc/cron.d directory on each machine.
  • The script checks the battery status every minute. If the battery level is below the critical level (80%), the script waits 10 minutes and checks again. If the level has declined further, the machine initiates its own shutdown.
  • All the scripts are run with Python 2.
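
The check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script linked above: the function names are hypothetical, and the battery reader, the sleep function, and the shutdown action are passed in as callables so the logic can be tried without a UPS. The 80% threshold and the 10-minute wait come from the notes above.

```python
import time

CRITICAL_LEVEL = 80       # percent, as stated in the notes above
RECHECK_DELAY = 10 * 60   # seconds to wait before the second reading


def should_shut_down(first, second, critical=CRITICAL_LEVEL):
    """True if the first reading is below critical and the second is lower still."""
    return first < critical and second < first


def check_once(read_level, sleep=time.sleep, shutdown=None,
               critical=CRITICAL_LEVEL, delay=RECHECK_DELAY):
    """Run one check; call shutdown() and return True if the battery keeps draining."""
    first = read_level()
    if first >= critical:
        return False          # battery is fine, nothing to do
    sleep(delay)              # wait, then see whether the level is still falling
    second = read_level()
    if should_shut_down(first, second, critical):
        if shutdown is not None:
            shutdown()        # hypothetical hook, e.g. a wrapper around `shutdown -h now`
        return True
    return False              # level held steady or recovered
```

With readings of 75 and then 70, `check_once` waits once and triggers the shutdown hook. With 75 and then 75 it does nothing, since the level has not declined further.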

Helpful sources

Date Last Updated: August 1, 2018