Difference between revisions of "Bobsced Cluster"

From Earlham CS Department
(added info on disabling kickstart after hard reset)
m (cleanup)
Line 1: Line 1:
=Bobsced=
+
=Todo=
 +
==411 tools==
 +
* fix ganglia to recognize broadcasts & update
 +
==Naming scheme==
 +
* bs* vs compute-*-* vs c*-*
 +
* This is terrible it needs work
 +
==Updating bobsced0's RPM repo==
 +
* yum-- free
 +
* up2date-- RHEL
 +
* "Aborting the rocks-update tool while the tool is downloading RPMs might produce corrupted RPM packages (SDSC Toolkit)" from pr_troubleshooting.doc
 +
==NIS map==
 +
*<code>/etc/passwd & /etc/group</code> permissions
 +
==What broke <code>cluster-fork</code>?==
  
==Todo==
+
=Howtos=
* 411 tools-- fix ganglia to recognize broadcasts & update
 
* Naming scheme bs* vs compute-*-* vs c*-*
 
** This is terrible it needs work
 
* Updating bobsced0's RPM repo:
 
** yum-- free
 
** up2date-- RHEL
 
** "Aborting the rocks-update tool while the tool is downloading RPMs might produce corrupted RPM packages (SDSC Toolkit)" from pr_troubleshooting.doc
 
* NIS map & <code>/etc/passwd & /etc/group</code> permissions
 
* What broke <code>cluster-fork</code>?
 
  
==Howtos==
+
==Updating nodes to be kickstarted & adding new packages==
 
 
===Updating nodes to be kickstarted & adding new packages===
 
 
* bobsced0 can be updated by just installing rpms
 
* bobsced0 can be updated by just installing rpms
 
* Check for an RPM in: <code>/state/partition1/home/install/rocks-dist/lan/x86_64/RedHat/RPMS/</code>
 
* Check for an RPM in: <code>/state/partition1/home/install/rocks-dist/lan/x86_64/RedHat/RPMS/</code>
Line 29: Line 30:
 
** <code>ssh -p 2200 compute-x-x</code>
 
** <code>ssh -p 2200 compute-x-x</code>
  
=== Adding post install scripts to kickstart ===
+
== Adding post install scripts to kickstart ==
 
* Edit <code>/home/install/site-profiles/4.1.1/nodes/extend-compute.xml</code>
 
* Edit <code>/home/install/site-profiles/4.1.1/nodes/extend-compute.xml</code>
 
* Add a <code><post arch="x86_64"></code> entry i.e.:
 
* Add a <code><post arch="x86_64"></code> entry i.e.:
 
** <code><post arch="x86_64">cp /cluster/ganglia/gmond.conf /etc/gmond.conf</post></code>
 
** <code><post arch="x86_64">cp /cluster/ganglia/gmond.conf /etc/gmond.conf</post></code>
=== Using 411 tools ===
+
 
 +
== Using 411 tools ==
 
* make -C /var/411 on bobsced0
 
* make -C /var/411 on bobsced0
 
** Copies the files to /etc/411.d/ using 411put
 
** Copies the files to /etc/411.d/ using 411put
Line 39: Line 41:
 
** The files that are watched can be updated by changing the makefiles in /var/411/
 
** The files that are watched can be updated by changing the makefiles in /var/411/
 
* <code>cluster-fork /opt/rocks/bin/411get --all</code>
 
* <code>cluster-fork /opt/rocks/bin/411get --all</code>
=== cluster-fork ===
+
 
 +
== cluster-fork ==
 
* Used to run commands on all cluster nodes like the c3tools
 
* Used to run commands on all cluster nodes like the c3tools
 
** Broken, see todo
 
** Broken, see todo
 
** Temporary fix: <code>cluster-fork --nodes="compute-0-%d:0-14" <command> </code>
 
** Temporary fix: <code>cluster-fork --nodes="compute-0-%d:0-14" <command> </code>
  
=== disabling reinstall (kickstart) after hard reset ===
+
== disabling reinstall (kickstart) after hard reset ==
 
* [http://www.rocksclusters.org/rocks-documentation/4.2.1/faq-configuration.html#DISABLE-REINSTALL Official documentation]
 
* [http://www.rocksclusters.org/rocks-documentation/4.2.1/faq-configuration.html#DISABLE-REINSTALL Official documentation]
 
* [https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-December/022969.html From the mailing list]
 
* [https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-December/022969.html From the mailing list]
  
==General Info==
+
=General Info=
* NIS Importing
+
==NIS Importing==
** <code>/etc/cron.hourly/importNIS.sh</code>
+
* <code>/etc/cron.hourly/importNIS.sh</code>
** This comes from the rocks users guide & a mailing list thread.
+
* This comes from the rocks users guide & a mailing list thread.
* http
+
==http==
** <code>/cluster/www/bobsced/</code>
+
* <code>/cluster/www/bobsced/</code>
* <code>/cluster</code>
+
==<code>/cluster</code>==
** Mounted using <code>/etc/rc.local</code>
+
* Mounted using <code>/etc/rc.local</code>
* <code>/cluster/bobsced/etc/</code>
+
==<code>/cluster/bobsced/etc/</code>==
** What's in here? Things for client or bobsced0?
+
* What's in here? Things for client or bobsced0?
  
  
==References==
+
=References=
* Rocks user's guide
+
==Rocks Documentation==
** [http://www.rocksclusters.org/rocks-documentation/4.1/rocks-usersguide-4.1.pdf Rocks users guide pdf]
+
* [http://www.rocksclusters.org/rocks-documentation/4.1/rocks-usersguide-4.1.pdf Rocks users guide pdf]
** [http://www.rocksclusters.org/rocks-documentation/4.1/ Online version]
+
* [http://www.rocksclusters.org/rocks-documentation/4.1/ Online version]
 
* [http://www.dell.com/downloads/global/power/ps4q05-20050227-Ali.pdf Platform rocks]
 
* [http://www.dell.com/downloads/global/power/ps4q05-20050227-Ali.pdf Platform rocks]
 +
==Troubleshooting Platform Open Cluster Stack (OCS) and Platform Lava==
 
* pr_troubleshooting.doc
 
* pr_troubleshooting.doc
 +
==411 Tools==
 
* [http://www.rocksclusters.org/rocks-doc/papers/hpdc2005/hpdc2005-411.pdf 411tools]
 
* [http://www.rocksclusters.org/rocks-doc/papers/hpdc2005/hpdc2005-411.pdf 411tools]
* [http://www.centos.org/docs/4/pdf/rhel-ig-x8664-multi-en.pdf RHEL] & [http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/pdf/rhel-isa-en.pdf More RHEL]
+
==RHEL==
 +
* [http://www.centos.org/docs/4/pdf/rhel-ig-x8664-multi-en.pdf RHEL]  
 +
* [http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/pdf/rhel-isa-en.pdf More RHEL]

Revision as of 11:07, 1 June 2007

Todo

411 tools

  • fix ganglia to recognize broadcasts & update

Naming scheme

  • bs* vs compute-*-* vs c*-*
  • This is terrible; it needs work

Updating bobsced0's RPM repo

  • yum-- free
  • up2date-- RHEL
  • "Aborting the rocks-update tool while the tool is downloading RPMs might produce corrupted RPM packages (SDSC Toolkit)" from pr_troubleshooting.doc

NIS map

  • /etc/passwd & /etc/group permissions

What broke cluster-fork?

Howtos

Updating nodes to be kickstarted & adding new packages

  • bobsced0 itself can be updated by just installing RPMs
  • Check for an RPM in: /state/partition1/home/install/rocks-dist/lan/x86_64/RedHat/RPMS/
  • Edit /home/install/site-profiles/4.1.1/nodes/extend-compute.xml
    • Add a package, e.g. <package arch="x86_64">libgfortran</package>
  • Update the files that get loaded on kickstart:
    • cd /home/install
    • rocks-dist dist
  • Check the kickstart file
    • dbreport kickstart c0-0
  • If there were no errors, kickstart the node, e.g.:
    • shoot-node c0-0
  • Check the progress of a kickstart
    • ssh -p 2200 compute-x-x
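
The steps above can be strung together in a small helper. This is only a sketch, not a tested procedure for bobsced0: the function names run and rekickstart and the DRY_RUN variable are made up for illustration, and it assumes dbreport exits non-zero when the kickstart file has errors, which has not been verified. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on the frontend (bobsced0) to really run them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# rekickstart NODE: rebuild the distribution, check the generated
# kickstart file, then reinstall the node (hypothetical helper).
rekickstart() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: rekickstart NODE" >&2
        return 2
    fi
    # Update the files that get loaded on kickstart.
    # The subshell keeps the caller's working directory unchanged.
    ( run cd /home/install && run rocks-dist dist ) || return 1
    # Check the kickstart file; stop here if generating it fails.
    run dbreport kickstart "$node" || return 1
    # Reinstall the node.
    run shoot-node "$node" || return 1
    echo "# check progress with: ssh -p 2200 $node"
}
```

For example, rekickstart c0-0 prints the four commands for node c0-0 without touching the cluster.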

Adding post install scripts to kickstart

  • Edit /home/install/site-profiles/4.1.1/nodes/extend-compute.xml
  • Add a <post arch="x86_64"> entry, e.g.:
    • <post arch="x86_64">cp /cluster/ganglia/gmond.conf /etc/gmond.conf</post>

Using 411 tools

  • Run make -C /var/411 on bobsced0
    • Copies the files to /etc/411.d/ using 411put
    • Notifies client nodes to run 411get using ganglia
    • The files that are watched can be updated by changing the makefiles in /var/411/
  • To make every node fetch the files right away: cluster-fork /opt/rocks/bin/411get --all
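
The two 411 steps above can be wrapped in one helper. This is a sketch under assumptions: the names run, push_411 and DRY_RUN are invented here, and since cluster-fork is listed as broken on this cluster the second step may need the --nodes workaround from the cluster-fork section. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# push_411: republish the watched files on the frontend, then ask every
# node to fetch them now instead of waiting for the ganglia notification.
push_411() {
    # The makefiles in /var/411 decide which files are published.
    run make -C /var/411 || return 1
    # Fetch all 411 files on every node.
    run cluster-fork /opt/rocks/bin/411get --all
}
```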

cluster-fork

  • Used to run a command on all cluster nodes, like the C3 tools
    • Broken, see todo
    • Temporary fix: cluster-fork --nodes="compute-0-%d:0-14" <command>
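
Until the underlying problem is found, the temporary fix can be hidden behind a wrapper so the node list does not have to be retyped. A minimal sketch: cfork, run, NODE_SPEC and DRY_RUN are hypothetical names, and the default node range is simply the one quoted above. It prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
# Node range from the temporary fix; override if the cluster changes.
NODE_SPEC="${NODE_SPEC:-compute-0-%d:0-14}"

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# cfork COMMAND [ARGS...]: cluster-fork with an explicit node list.
cfork() {
    if [ "$#" -eq 0 ]; then
        echo "usage: cfork COMMAND [ARGS...]" >&2
        return 2
    fi
    run cluster-fork --nodes="$NODE_SPEC" "$@"
}
```

For example, cfork uptime prints the cluster-fork call that would run uptime on compute-0-0 through compute-0-14.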

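Disabling reinstall (kickstart) after hard reset

  • Official documentation: http://www.rocksclusters.org/rocks-documentation/4.2.1/faq-configuration.html#DISABLE-REINSTALL
  • From the mailing list: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-December/022969.html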

General Info

NIS Importing

  • /etc/cron.hourly/importNIS.sh
  • This comes from the Rocks user's guide and a mailing list thread.

http

  • /cluster/www/bobsced/

/cluster

  • Mounted using /etc/rc.local

/cluster/bobsced/etc/

  • What's in here? Things for the clients or for bobsced0?


References

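Rocks Documentation

  • Rocks users guide pdf: http://www.rocksclusters.org/rocks-documentation/4.1/rocks-usersguide-4.1.pdf
  • Online version: http://www.rocksclusters.org/rocks-documentation/4.1/
  • Platform rocks: http://www.dell.com/downloads/global/power/ps4q05-20050227-Ali.pdf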

Troubleshooting Platform Open Cluster Stack (OCS) and Platform Lava

  • pr_troubleshooting.doc

411 Tools

  • 411tools: http://www.rocksclusters.org/rocks-doc/papers/hpdc2005/hpdc2005-411.pdf

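RHEL

  • RHEL: http://www.centos.org/docs/4/pdf/rhel-ig-x8664-multi-en.pdf
  • More RHEL: http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/pdf/rhel-isa-en.pdf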