Bobsced Cluster
Revision as of 10:07, 1 June 2007
Todo
411 tools
- fix ganglia to recognize broadcasts & update
Naming scheme
- bs* vs compute-*-* vs c*-*
- This is terrible; it needs work
Updating bobsced0's RPM repo
- yum: free
- up2date: RHEL
- "Aborting the rocks-update tool while the tool is downloading RPMs might produce corrupted RPM packages (SDSC Toolkit)" from pr_troubleshooting.doc
NIS map
- /etc/passwd & /etc/group permissions
What broke cluster-fork?
Howtos
Updating nodes to be kickstarted & adding new packages
- bobsced0 can be updated by just installing rpms
- Check for an RPM in:
/state/partition1/home/install/rocks-dist/lan/x86_64/RedHat/RPMS/
- Edit
/home/install/site-profiles/4.1.1/nodes/extend-compute.xml
- Add a package, e.g.:
<package arch="x86_64">libgfortran</package>
- Update the files that get loaded on kickstart:
cd /home/install
rocks-dist dist
- Check the kickstart file
dbreport kickstart c0-0
- If there were no errors, kickstart the node, e.g.:
shoot-node c0-0
- Check the progress of a kickstart:
ssh -p 2200 compute-x-x
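The steps above can be sketched as a single shell function. This is a minimal sketch, not a tested procedure: the function name rekick_node and the DRY_RUN switch are hypothetical, and it assumes dbreport exits non-zero when the kickstart file cannot be generated, which has not been verified here. The commands themselves are the ones listed above.

```shell
#!/bin/sh
# Sketch: rebuild the distribution, check the kickstart file, reinstall a node.
# rekick_node and DRY_RUN are illustrative names, not part of Rocks.

rekick_node() {
    node="$1"

    # With DRY_RUN=1, only print the commands that would be run.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cd /home/install && rocks-dist dist"
        echo "dbreport kickstart $node"
        echo "shoot-node $node"
        return 0
    fi

    # Update the files that get loaded on kickstart.
    ( cd /home/install && rocks-dist dist ) || return 1

    # Generate the kickstart file and discard it; stop on failure.
    # Assumption: dbreport signals errors through its exit status.
    dbreport kickstart "$node" > /dev/null || return 1

    # Kickstart the node.
    shoot-node "$node"
}
```

Setting DRY_RUN=1 and then calling rekick_node c0-0 prints the three commands without touching the cluster, which is a cheap way to check the node name before a real reinstall.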
Adding post install scripts to kickstart
- Edit
/home/install/site-profiles/4.1.1/nodes/extend-compute.xml
- Add a <post arch="x86_64"> entry, e.g.:
<post arch="x86_64">cp /cluster/ganglia/gmond.conf /etc/gmond.conf</post>
Using 411 tools
- Run make -C /var/411 on bobsced0
- Copies the files to /etc/411.d/ using 411put
- Notifies client nodes to run 411get using ganglia
- The files that are watched can be updated by changing the makefiles in /var/411/
cluster-fork /opt/rocks/bin/411get --all
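The two commands above could be combined into one helper that publishes the files and then asks every node to fetch them, which may be useful while the ganglia notification is unreliable (see Todo). This is only a sketch: sync_411 and DRY_RUN are hypothetical names, and the helper inherits whatever problems cluster-fork currently has.

```shell
#!/bin/sh
# Sketch: publish 411 files on bobsced0, then have all nodes fetch them.
# sync_411 and DRY_RUN are illustrative names, not part of Rocks.

sync_411() {
    # With DRY_RUN=1, only print the commands that would be run.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "make -C /var/411"
        echo "cluster-fork /opt/rocks/bin/411get --all"
        return 0
    fi

    # Publish changed files (the makefiles in /var/411 decide which ones).
    make -C /var/411 || return 1

    # Ask every node to fetch all 411 files explicitly.
    cluster-fork /opt/rocks/bin/411get --all
}
```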
cluster-fork
- Runs a command on all cluster nodes, similar to the c3tools
- Currently broken; see Todo
- Temporary fix:
cluster-fork --nodes="compute-0-%d:0-14" <command>
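If cluster-fork itself is unusable, the same node range can be walked with a plain ssh loop. The sketch below is a stand-in, not how cluster-fork works internally: expand_nodes and fork_over_ssh are hypothetical helpers, the range 0 to 14 is taken from the temporary fix above, and it assumes ssh access to the compute nodes without a password prompt.

```shell
#!/bin/sh
# Sketch: expand a node name pattern and run a command on each node over ssh.
# expand_nodes and fork_over_ssh are illustrative names, not part of Rocks.

# Print one node name per index, e.g. expand_nodes "compute-0-%d" 0 14
expand_nodes() {
    pattern="$1"
    first="$2"
    last="$3"
    i="$first"
    while [ "$i" -le "$last" ]; do
        # The pattern is deliberately used as the printf format string.
        printf "$pattern\n" "$i"
        i=$((i + 1))
    done
}

# Run the given command on compute-0-0 .. compute-0-14, one node at a time.
# Assumption: ssh to each node works without prompting.
fork_over_ssh() {
    for node in $(expand_nodes "compute-0-%d" 0 14); do
        echo "== $node =="
        ssh "$node" "$@"
    done
}
```

For example, fork_over_ssh uptime would run uptime on each node in sequence. Unlike cluster-fork, this loop does not consult the cluster database, so the range has to be kept in step with the real node list by hand.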
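Disabling reinstall (kickstart) after hard reset
- Official documentation: http://www.rocksclusters.org/rocks-documentation/4.2.1/faq-configuration.html#DISABLE-REINSTALL
- From the mailing list: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-December/022969.html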
General Info
NIS Importing
/etc/cron.hourly/importNIS.sh
- This script comes from the Rocks users guide and a mailing list thread.
http
/cluster/www/bobsced/
/cluster
- Mounted using
/etc/rc.local
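Because the mount is done from /etc/rc.local rather than by the usual boot-time mounting, a failed mount can go unnoticed. A quick check might look like the sketch below; is_mounted is a hypothetical helper, and it reads /proc/mounts by default, which assumes a Linux node.

```shell
#!/bin/sh
# Sketch: report whether a path is currently a mount point.
# is_mounted is an illustrative name; the second argument exists for testing.

is_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    # Field 2 of each line in /proc/mounts is the mount point.
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}
```

A call such as is_mounted /cluster || echo "/cluster is not mounted" could be run on a node after boot.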
/cluster/bobsced/etc/
- What's in here? Files for the client nodes or for bobsced0?
References
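Rocks Documentation
- Rocks users guide (PDF): http://www.rocksclusters.org/rocks-documentation/4.1/rocks-usersguide-4.1.pdf
- Online version: http://www.rocksclusters.org/rocks-documentation/4.1/
- Platform Rocks: http://www.dell.com/downloads/global/power/ps4q05-20050227-Ali.pdf
Troubleshooting Platform Open Cluster Stack (OCS) and Platform Lava
- pr_troubleshooting.doc
411 Tools
- 411tools: http://www.rocksclusters.org/rocks-doc/papers/hpdc2005/hpdc2005-411.pdf
RHEL
- RHEL: http://www.centos.org/docs/4/pdf/rhel-ig-x8664-multi-en.pdf
- More RHEL: http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/pdf/rhel-isa-en.pdf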