Sysadmin:Old:Start/Shutdown

The machine room is located behind the NoYes lab. It houses two main infrastructures, the CS systems and the cluster, spread over three racks: the CS servers and systems are on the Arctic rack, and the cluster servers and systems are on the Equatorial and Antarctica racks.

Here is the list of all servers and nodes. A server shown in green is on UPS backup power.

CS Subnet	murphy	41 (quark)	elwood	elwood'	babbage	hopper	sage	bestey
Cluster Subnet	as0 - as12	bigFe	dali	lo0 - lo4	fatboy	t-voc	bs0 - bs11

Shutting down / start up of the CS servers

Shutting Down

The order of shutdown and startup is very important. When shutting down the CS servers, make sure that murphy goes down last and babbage goes down just before murphy; everything else goes down before those two, in any order.

Starting up

When bringing the CS servers up, make sure that murphy comes up first and babbage second; all other servers can follow in no particular order.
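
The ordering rules above can be sketched as two small shell functions. This is only an illustration: the function names cs_shutdown_order and cs_startup_order are made up for this page, and the functions only print host names in a safe order; they do not shut down or start anything.

```shell
# Print the given CS hosts in a safe shutdown order: everything else first
# (order unimportant), then babbage, then murphy last.
cs_shutdown_order() {
    for host in "$@"; do
        case "$host" in
            murphy|babbage) ;;              # held back until the end
            *) printf '%s\n' "$host" ;;
        esac
    done
    printf '%s\n' babbage murphy
}

# Startup is the mirror image: murphy, then babbage, then everything else.
cs_startup_order() {
    printf '%s\n' murphy babbage
    for host in "$@"; do
        case "$host" in
            murphy|babbage) ;;              # already printed first
            *) printf '%s\n' "$host" ;;
        esac
    done
}
```

For example, "cs_shutdown_order hopper sage elwood" lists hopper, sage and elwood, then babbage, then murphy.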

Other notes:

If murphy fails to boot, repair its filesystems with the fsck command:
- go to single user mode
- df -h
- cat /etc/fstab (look for the mount names that match the partitions; fsck must be run in the right order: first on the '/' partition, then '/var', and then the rest)
- fsck -y /dev/mfid0s1d (example for /var)
- after all fsck runs are done, press Control-D
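
The fsck ordering in the steps above ('/' first, then '/var', then the rest) can be sketched as a filter over /etc/fstab. This is a rough sketch, not a tested recovery procedure: fsck_plan is a hypothetical helper name, it assumes the usual fstab column layout (device, mount point, filesystem type, ...), and it only prints the fsck commands so they can be reviewed before anything is run.

```shell
# Read fstab-format text on stdin and print "fsck -y <device>" lines in the
# order described above: '/' first, then '/var', then the remaining
# filesystems. Comments, blank lines, swap and non-/dev entries are skipped.
fsck_plan() {
    awk '
        /^[[:space:]]*#/ || NF < 3 { next }   # comments, blank or short lines
        $1 !~ /^\/dev\//           { next }   # only real block devices
        $3 == "swap"               { next }   # swap is not checked with fsck
        $2 == "/"                  { root = $1; next }
        $2 == "/var"               { var = $1; next }
                                   { rest[++n] = $1 }
        END {
            if (root != "") print "fsck -y " root
            if (var != "")  print "fsck -y " var
            for (i = 1; i <= n; i++) print "fsck -y " rest[i]
        }
    '
}
```

In single user mode this would be used as "fsck_plan < /etc/fstab", and the printed commands then typed or pasted one at a time.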

When 41 is up, log in and start the quark VM:
- open localhost:8080 in the browser
- log in as root with the root password
- in the VMware console, press power on

Shutting down / start up of the cluster servers

Shutting Down

The first step is to shut down as1 - as12 (the working nodes). To do so, ssh from hopper to as0 and then become root on as0:

as0$ sudo su - root

Now send a message to all users that the system is going down (good sysadmin practice):

# wall this system is going down in 5 minutes because of...

To see who is on the system, just type "who" in the shell.

To shut down all working nodes (as1 - as12), use the cexecs command:

# cexecs shutdown -h now

To check whether the working nodes are still up:

# cexecs uptime

Once all working nodes are down, you can shut down as0 (the server).

Now you can shut down the rest of the cluster servers.
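
Waiting for the working nodes to go down can be sketched as a small polling loop. This is an assumption-laden illustration: wait_until_down is a hypothetical helper, and using ping as the check is a guess at a reasonable probe (a host can stop answering ping slightly before or after it has fully halted); "cexecs uptime" as shown above remains the documented check.

```shell
# Re-run a check command until it fails (the host no longer answers) or the
# attempts run out. Usage: wait_until_down TRIES INTERVAL_SECONDS COMMAND...
wait_until_down() {
    tries=$1; interval=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        if ! "$@" >/dev/null 2>&1; then
            return 0            # check failed: treat the host as down
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1                    # still answering after all attempts
}
```

For example, "wait_until_down 30 10 ping -c 1 as1 && echo as1 is down" probes as1 every 10 seconds, up to 30 times.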

Starting up

When bringing the cluster servers up, it is important to bring as0 up first. Once as0 is up, bring up the working nodes and the other cluster servers.

Planned Building/Campus Power Outages

This is easy-peasy now; most of the work is setting up lights so you can work in the lab while the power is out. Run an extension cord from one of the free wall outlets in the machine room into the lab, put a power strip on it, and plug in the desk lamp, floor lamps, power bricks, etc.

Power Down

hopper# ssh al-salam.cluster.earlham.edu "cexecs shutdown -h now"
hopper# ssh bobsced.cluster.earlham.edu "cexecs shutdown -h now"
hopper# ssh layout.cluster.earlham.edu "cexecs shutdown -h now" 
in the future - positron# cexecs acls: shutdown -h now
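
The three commands above follow one pattern, which can be sketched as a loop over the cluster head nodes. outage_shutdown_plan is a hypothetical helper name; it only prints the ssh commands (a dry run) so the list can be checked before anything is shut down.

```shell
# Print (not run) the node shutdown command for each cluster head node
# given as an argument. Review the output before executing any of it.
outage_shutdown_plan() {
    for head in "$@"; do
        printf 'ssh %s "cexecs shutdown -h now"\n' "$head"
    done
}
```

Run from hopper as, for example, "outage_shutdown_plan al-salam.cluster.earlham.edu bobsced.cluster.earlham.edu layout.cluster.earlham.edu".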

Power Up

Press the power buttons on the machines shut down above, wait 5 minutes, then check them with cexec.
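
As a complement to the cexec check, a quick reachability report could look like the sketch below. report_hosts and the PROBE variable are hypothetical names, and defaulting to a single ping is an assumption; a host answering ping is not necessarily ready for logins or jobs.

```shell
# Probe each host with the command named in PROBE (default: one ping) and
# print "up" or "DOWN" next to its name.
report_hosts() {
    for host in "$@"; do
        # PROBE is left unquoted on purpose so "ping -c 1" splits into words
        if ${PROBE:-ping -c 1} "$host" >/dev/null 2>&1; then
            printf '%s up\n' "$host"
        else
            printf '%s DOWN\n' "$host"
        fi
    done
}
```

For example, "report_hosts as0 bigFe dali" prints one line per host.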
