Difference between revisions of "Power"

From Earlham CS Department

Latest revision as of 08:51, 25 October 2021

This is where we should keep track of info about the power supplies, power usage, etc.

Hardware

Server configuration

Most of our servers have two power supplies. If there is a steady beeping noise in the server room, it's likely that, on at least one server, one of the two power supplies is no longer drawing current (but the other is).

Power supply units

We have several layers of power supply hardware, some with software interfaces and others not.

We get power from the usual Earlham electric grid (hereafter called "wall"), which we largely don't manage. Reach out to Facilities about any issues.

We also have an internal backup power supply system in case wall power goes down. It's a large power supply unit with several batteries of cells, which we replace every 3-4 years (most recently, as of this writing, during the 2018-19 year). This unit is mounted at the bottom of the Arctic rack.

That larger system feeds a satellite power distribution unit at the very bottom of the Antarctic rack, which in turn powers the cluster machines and others on that rack. Each set of four outlets has its own circuit breaker. If power goes down for a subset of the Antarctic rack (i.e., some outlets are energized and others aren't), check these circuit breakers first.

Each of those power units feeds a series of power distribution units that act as industrial-scale power strips, carrying power to devices and in some cases providing surge protection.

apcupsd

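The daemon apcupsd looks useful for controlling the power supply.
* User manual: http://www.apcupsd.org/manual/manual.html
* Install guide (CentOS 7): https://www.svennd.be/install-apcupsd-on-centos-7/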

The UPS web interface is accessible only with lynx, at the IP address 159.28.23.15, directly from the console in the server room. The firewall blocks connection attempts from all other machines.

Our installation from source (http://www.apcupsd.org/manual/manual.html#installation-from-source) is incomplete. We completed the steps up to the point where the conf file gets modified.

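SNMP tools are enabled. For details, see the SNMP section of the apcupsd manual: http://www.apcupsd.org/manual/manual.html#support-for-snmp-upses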

Some snmpget commands

Some common OIDs are listed at https://www.opsview.com/resources/monitoring/blog/monitoring-apc-ups-useful-oids. SNMP uses these OIDs to get very specific, discrete pieces of information from a device.
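
To make the idea of an OID a bit more concrete, here is a minimal Python sketch that splits a dotted OID into its numeric parts and checks whether it falls under APC's private enterprise subtree (1.3.6.1.4.1.318, the prefix shared by most of the OIDs on this page). The helper names parse_oid and is_apc_oid are made up for illustration and are not part of any SNMP library.

```python
# Illustrative only: these helper names are hypothetical.
ENTERPRISES = (1, 3, 6, 1, 4, 1)   # iso.org.dod.internet.private.enterprises
APC = ENTERPRISES + (318,)         # 318 is APC's enterprise number


def parse_oid(text):
    """Turn '.1.3.6.1' or '1.3.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def is_apc_oid(text):
    """Return True if the OID sits under the APC enterprise subtree."""
    oid = parse_oid(text)
    return oid[:len(APC)] == APC


print(is_apc_oid(".1.3.6.1.4.1.318.1.1.1.2.2.1.0"))  # True
print(is_apc_oid(".1.3.6.1.2.1.1.6.0"))              # False (standard MIB-2 sysLocation)
```

Everything after the 318 selects a specific value inside APC's own MIB, which is why each OID returns exactly one piece of information.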

As an example, we've run the following commands to get info about the power supply for cluster:

> /usr/bin/snmpget -v 1 -c public ups.cluster.earlham.edu .1.3.6.1.4.1.318.1.1.1.1.1.1.0 
SNMPv2-SMI::enterprises.318.1.1.1.1.1.1.0 = STRING: "Smart-UPS RT 10000 XL" 
> /usr/bin/snmpget -v 1 -c public ups.cluster.earlham.edu .1.3.6.1.4.1.318.1.1.1.3.2.5.0 
SNMPv2-SMI::enterprises.318.1.1.1.3.2.5.0 = INTEGER: 9 
> /usr/bin/snmpget -v 1 -c public ups.cluster.earlham.edu .1.3.6.1.4.1.318.1.1.1.2.2.3.0 
SNMPv2-SMI::enterprises.318.1.1.1.2.2.3.0 = Timeticks: (132000) 0:22:00.00 
> /usr/bin/snmpget -v 1 -c public ups.cluster.earlham.edu .1.3.6.1.4.1.318.1.1.1.2.2.1.0 
SNMPv2-SMI::enterprises.318.1.1.1.2.2.1.0 = Gauge32: 100 
> /usr/bin/snmpget -v 1 -c public ups.cluster.earlham.edu .1.3.6.1.2.1.1.6.0 
SNMPv2-MIB::sysLocation.0 = STRING: Noyes Machine Room, Arctic Rack 
> /usr/bin/snmpget -v 1 -c public ups.cluster.earlham.edu .1.3.6.1.4.1.318.1.1.1.2.2.2.0 
SNMPv2-SMI::enterprises.318.1.1.1.2.2.2.0 = Gauge32: 26 
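
The raw output above is hard to read at a glance, so here is a rough Python sketch of how those lines could be parsed and labeled. The labels are our best reading of APC's PowerNet MIB and are an assumption; check them against the MIB before relying on them. The names LABELS and parse_line are hypothetical, and the parser only handles the value types that appear in the output above.

```python
import re

# Assumed labels (from memory of the APC PowerNet MIB); verify before use.
LABELS = {
    "enterprises.318.1.1.1.1.1.1.0": "UPS model",
    "enterprises.318.1.1.1.3.2.5.0": "last input line-fail cause (code)",
    "enterprises.318.1.1.1.2.2.3.0": "battery runtime remaining (minutes)",
    "enterprises.318.1.1.1.2.2.1.0": "battery capacity (%)",
    "enterprises.318.1.1.1.2.2.2.0": "battery temperature (C)",
    "sysLocation.0": "location",
}

# Matches lines such as: SNMPv2-SMI::enterprises.318... = Gauge32: 100
LINE = re.compile(r"^\S+::(?P<name>\S+) = (?P<type>[\w-]+): (?P<value>.*)$")


def parse_line(line):
    """Parse one line of snmpget output into a (label, value) pair."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized snmpget output: %r" % line)
    name, kind, value = match.group("name", "type", "value")
    if kind == "Timeticks":
        # Timeticks count hundredths of a second, e.g. "(132000) 0:22:00.00".
        ticks = int(value[value.index("(") + 1:value.index(")")])
        value = ticks / 100 / 60  # convert to minutes
    elif kind in ("INTEGER", "Gauge32"):
        value = int(value)
    else:
        value = value.strip('"')
    # Fall back to the raw name when we have no label for it.
    return LABELS.get(name, name), value


sample = "SNMPv2-SMI::enterprises.318.1.1.1.2.2.3.0 = Timeticks: (132000) 0:22:00.00"
print(parse_line(sample))  # ('battery runtime remaining (minutes)', 22.0)
```

Read this way, the sample output would describe a Smart-UPS RT 10000 XL at full battery capacity with about 22 minutes of runtime remaining, assuming the labels are right.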

Some other useful pages

Most recent cluster power diagram

Old UPS page

Deprecated power failure info