Difference between revisions of "ShutdownProcedure"
Revision as of 09:20, 26 July 2018
These are the shutdown and boot up instructions for CS and Cluster servers. This page also covers the reboot procedure.
Backup Critical wiki pages
There is a script called send_wiki.sh in sysadmin@home.cs.earlham.edu:~/wiki_critical/ that specifies which pages to pull down and send out via email. It is important to run this before anyone starts shutting down the machines, because the wiki will go offline.
General Info
- babbage should be the very last machine brought down
- to get to sysadmin@control, first ssh into home or hopper
- Make sure all virtual machines are shut down before restarting the bare metal hardware
CS virtual machines
The different VMs mount from each other, so be patient while they come up; mounts may take a while to settle.
Tools
We may have to restart nginx, jupyter, and sage by hand. Using history | grep <command> is helpful here (make sure to grab the entire command, including the ampersand).
Jupyter
eccs-tools# nohup su -c "/mnt/lovelace/software/anaconda/envs/py35/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py --no-ssl" &
Sage
eccs-tools# nohup /home/sage/sage-6.8/sage --notebook=sagenb accounts=False automatic_login=False interface= port=8080 &
Hadoop
Hadoop runs on whedon and might also need to be restarted manually.
sysadmin@hopper$ ssh w0
sysadmin@w0$ sudo su hadoop
hadoop@w0$ cd $HADOOP_HOME
hadoop@w0$ ./sbin/start-all.sh
Cluster
Order of shutdowns
- all compute nodes: (layout, alsalam, whedon) and t-voc, bigfe, elwood
- all head nodes: (layout, alsalam, whedon)
- pollock
- bronte + disk array
- wait until everything up to this point has shut down
- dali
- kahlo
- wait until everything up to this point has shut down
- hopper
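The order above, including its two wait points, can be sketched as a printable checklist. This is only an illustration: the function names are made up for this page, the script shuts nothing down, and the startup list is simply the shutdown list reversed.

```shell
#!/bin/sh
# Sketch only: prints the cluster shutdown stages in order and marks the
# points where the operator must wait. It does not shut anything down.

# One stage per line; WAIT marks "wait until everything up to this point
# has shut down".
cluster_shutdown_stages() {
  cat <<'EOF'
compute nodes (layout, alsalam, whedon), t-voc, bigfe, elwood
head nodes (layout, alsalam, whedon)
pollock
bronte + disk array
WAIT
dali
kahlo
WAIT
hopper
EOF
}

# Bring-up is the reverse of shutdown, so reverse the line order.
cluster_startup_stages() {
  cluster_shutdown_stages |
    awk '{ lines[NR] = $0 } END { for (i = NR; i >= 1; i--) print lines[i] }'
}

# Reads stages on stdin and prints a numbered checklist with wait markers.
print_checklist() {
  n=0
  while IFS= read -r stage; do
    if [ "$stage" = "WAIT" ]; then
      echo "    -- wait until everything above has finished --"
    else
      n=$((n + 1))
      echo "$n. $stage"
    fi
  done
}
```

For example, `cluster_shutdown_stages | print_checklist` prints the shutdown checklist, and `cluster_startup_stages | print_checklist` prints the bring-up checklist.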
Order to bring up
Reverse the shutdown order, again making sure to wait before proceeding at the appropriate steps.
CS
Shutdown process
If hopper is back online, ssh sysadmin@cluster.cs.earlham.edu and then ssh sysadmin@control.cs.earlham.edu. This way we can shut down all the VMs directly without being knocked offline or being in the machine room.
Recipe for shutting down a machine on smiley:
ssh sysadmin@tools.cs.earlham.edu
ssh sysadmin@smiley.cs.earlham.edu
sudo su -
smiley-# xl destroy <hostname>.cs.earlham.edu
List running VMs
smiley-# xl list
Order of shutdowns
- proto (lives separately, ssh admin@proto.cs.earlham.edu)
- tools
- web
- net
- smiley (tools, web, net are VMs run on smiley's hardware)
- babbage (firewall)
Make sure all virtual machines are shut down before restarting the bare metal hardware.
Ideally the VMs should be shut down from inside (by ssh'ing into them and running shutdown). After that, run xl list to see if they're still listed as domains, then run the xl destroy commands as needed.
# xl destroy tools.cs.earlham.edu
# xl destroy web.cs.earlham.edu
# xl destroy net.cs.earlham.edu
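The "check xl list, then destroy as needed" step could be scripted roughly as follows. This is a sketch, not something that exists on smiley: the helper names (run, domain_listed, destroy_leftover_vms) and the DRY_RUN switch are assumptions made for illustration, and by default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. Defaults to a dry run: commands are printed, not executed.
# Set DRY_RUN=0 (as root on smiley) to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Succeeds if domain $1 appears in the Name column of `xl list` output
# supplied on stdin. The first line of that output is the column header.
domain_listed() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# $1 is the captured output of `xl list`; remaining arguments are domain names.
# Only domains that are still listed get an `xl destroy`.
destroy_leftover_vms() {
  listing="$1"
  shift
  for vm in "$@"; do
    if printf '%s\n' "$listing" | domain_listed "$vm"; then
      run xl destroy "$vm"
    else
      echo "$vm is already down"
    fi
  done
}
```

A possible invocation on smiley would be `destroy_leftover_vms "$(xl list)" tools.cs.earlham.edu web.cs.earlham.edu net.cs.earlham.edu`. Passing the listing in as an argument keeps the check against a single snapshot of the running domains.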
Start up again
Mounting Logical Volumes
When you reboot, the LVM volume groups and logical volumes may not be automatically enabled. To bring them back, run:
console-# lvscan
console-# vgscan
console-# vgchange -a y
This should be done at boot using /etc/init.d/rc.sysinit but there still might be some subtleties there.
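Since activation at boot is not guaranteed, a quick check of the lvscan output can tell us whether vgchange is needed at all. The sketch below assumes the usual lvscan line format (a status word such as ACTIVE or inactive, followed by the quoted device path); the function names are made up for this page, and activate_if_needed must run as root.

```shell
#!/bin/sh
# Sketch only. Reads `lvscan` output on stdin and prints the device path of
# every logical volume whose status column says "inactive".
inactive_lvs() {
  awk '$1 == "inactive" { print $2 }' | tr -d "'"
}

# Activates all volume groups only if at least one logical volume is inactive.
# Runs the real lvscan, vgscan and vgchange, so call it as root on the console.
activate_if_needed() {
  missing="$(lvscan | inactive_lvs)"
  if [ -n "$missing" ]; then
    echo "inactive logical volumes:"
    echo "$missing"
    vgscan
    vgchange -a y
  else
    echo "all logical volumes are active"
  fi
}
```

Running `lvscan | inactive_lvs` alone is harmless and only reports; `activate_if_needed` performs the same steps as the manual recipe when something is inactive.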
Starting VMs
The VMs on smiley should be brought up in the reverse order they were shut down.
It is important to bring up net first because it runs DNS, DHCP, and LDAP.
smiley-# xl create -c ~sysadmin/xen-configs/eccs-<hostname>.cfg
# To exit to the hypervisor shell you can press Ctrl + ]
To start them up without going into the console:
# xl create ~sysadmin/xen-configs/eccs-<hostname>.cfg
Connect to VM console after the VM is running:
smiley-# xl console <hostname>.cs.earlham.edu
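Starting the three VMs without the console, with net first, could be wrapped in a small loop like the one below. It is a sketch under assumptions: CONFIG_DIR assumes that ~sysadmin resolves to /home/sysadmin, the helper names and the DRY_RUN switch are invented for illustration, and by default nothing is executed.

```shell
#!/bin/sh
# Sketch only. Defaults to a dry run: commands are printed, not executed.
# Set DRY_RUN=0 (as root on smiley) to actually start the VMs.
DRY_RUN="${DRY_RUN:-1}"

# Assumption: ~sysadmin is /home/sysadmin. Override CONFIG_DIR if it is not.
CONFIG_DIR="${CONFIG_DIR:-/home/sysadmin/xen-configs}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Reverse of the shutdown order (tools, web, net). net comes first because
# it runs DNS, DHCP, and LDAP.
start_order() {
  echo "net web tools"
}

start_vms() {
  for vm in $(start_order); do
    run xl create "$CONFIG_DIR/eccs-$vm.cfg"
  done
}
```

Calling `start_vms` prints the three xl create commands in order. After a real run, `xl console <hostname>.cs.earlham.edu` attaches to any VM that needs a closer look.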
Troubleshooting
If things aren't going well, it's possible to start the VMs in a pseudo single-user mode:
xm create -c eccs-home.cfg extra="init=/bin/bash" # start and leave it in single user mode with the console
(from within the vm)
mount -o remount,rw /
service networking start # ignore the upstart errors
mount /eccs/users
mount /eccs/clients
mount /mnt/lovelace/software
If you exit that shell the kernel will panic; if you leave it with ^] it seems to stay stable.