Difference between revisions of "Power down before outages"

From Earlham CS Department
[[Category:Closed Tasks]]

Revision as of 15:07, 12 December 2018

Name of Task: Power down before outages

People Involved: Craig Earley, Davit Kvartskhava

Date Started: June 5, 2018 (or thereabouts)

Deadline:

Description:

After a hard crash in June 2018, we want a program that powers down our servers gracefully once we switch to battery power. The program APCUPSD should be helpful.

Phase 1

APCUPSD is installed and works. It does not yet interface with our hardware.

We also have useful SNMP codes, which can help us get information from the UPS.

Some notes scraped from a Google Drive Doc:

  • Decided we’ll use the current UPS info and adapt later if needed
  • Send message to ask machines to initiate shutdown
  • Apcupsd: need some more info to actually connect to the UPS, but the software runs on shinken


Phase 2

We've decided to take a different approach to complete this project.

  • Decided not to use APCUPSD, even though it is great software for controlling the UPS. The wide range of functionality it offers is simply not needed in our case. We just need the current battery level.
  • The current battery level can be extracted using SNMP.
  • Another important aspect: for the sake of good network design, and robustness in particular, we determined that it is better not to have a centralized system that asks other machines to initiate shutdown. Instead, each machine will decide on its own, based on the UPS battery level, which is the same for all machines.
  • Each machine will run a Python script periodically using the Linux daemon cron[1].
  • Things that need to be discussed:
  1. What should be the time interval for checking the battery power?
  2. What's the critical battery level?
  • We don't need to shut down the head nodes!
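
The SNMP lookup described above could look roughly like the following. This is a minimal sketch, not the script we actually run. It assumes the Net-SNMP `snmpget` command is installed, that the UPS answers SNMP v2c queries, and that it exposes `upsEstimatedChargeRemaining` from the standard UPS-MIB (RFC 1628); our UPS may need a vendor-specific OID instead. The host name, community string, and function names are hypothetical.

```python
import re
import subprocess

# All three values below are assumptions for illustration only.
UPS_HOST = "ups.example.org"            # hypothetical UPS address
COMMUNITY = "public"                    # hypothetical SNMP community string
BATTERY_OID = "1.3.6.1.2.1.33.1.2.4.0"  # upsEstimatedChargeRemaining (UPS-MIB)


def parse_battery_percent(snmp_output):
    """Extract the battery percentage from snmpget output.

    Accepts both the bare value printed with -Oqv ("87") and the
    default typed form ("... = Gauge32: 87").
    """
    match = re.search(r"(\d+)\s*$", snmp_output.strip())
    if match is None:
        raise ValueError("no numeric value in SNMP output: %r" % snmp_output)
    percent = int(match.group(1))
    if not 0 <= percent <= 100:
        raise ValueError("battery level out of range: %d" % percent)
    return percent


def read_battery_percent(host=UPS_HOST, community=COMMUNITY, oid=BATTERY_OID):
    """Query the UPS once and return the battery level as an integer percent."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, timeout=10, check=True,
    )
    return parse_battery_percent(result.stdout)
```

Keeping the parsing separate from the network call means the parsing can be tested on any machine, even one that cannot reach the UPS.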

Phase 3 - current status

  • The Python script, which can be found here[2], is running on most of the machines as a daemon. The script and its associated log file are in the /etc/cron.d directory on each machine.
  • The script checks the battery level every minute. If the level is below the critical threshold (80%), it waits 2 minutes and checks again. If the level is still low, the machine initiates its own shutdown.
  • The script behaves differently under different versions of Python. With Python 3 or later, we need to add the extra line mentioned in the file.
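
The check, wait, and re-check behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not a copy of the script linked in [2]: the function names are hypothetical, the battery reader is passed in as a parameter, and the `shutdown -h now` command is an assumption about how a machine would power itself down.

```python
import subprocess
import time

CRITICAL_PERCENT = 80        # critical battery level from the notes above
RECHECK_DELAY_SECONDS = 120  # wait 2 minutes before the second check


def should_shut_down(read_battery, sleep=time.sleep,
                     critical=CRITICAL_PERCENT, delay=RECHECK_DELAY_SECONDS):
    """Return True only if the battery is below the critical level on
    two readings taken `delay` seconds apart.

    `read_battery` is any zero-argument function returning an integer
    percentage (hypothetical; for example an SNMP query to the UPS).
    """
    if read_battery() >= critical:
        return False          # battery is fine, nothing to do
    sleep(delay)              # give mains power a chance to come back
    return read_battery() < critical


def shut_down():
    """Power off this machine (assumed command; requires root)."""
    subprocess.run(["shutdown", "-h", "now"], check=True)
```

A once-a-minute cron job would then call `should_shut_down(...)` with a real battery reader and run `shut_down()` only when it returns True. The second reading keeps a brief dip or a single bad reading from taking a machine down.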

Helpful sources

Date Last Updated: August 1, 2018