Difference between revisions of "Bobsced Cluster"

From Earlham CS Department

Revision as of 12:14, 1 June 2007

Todo

  • 411 tools
    • fix ganglia to recognize broadcasts & update
  • Naming scheme
    • bs* vs compute-*-* vs c*-*
    • This is terrible; it needs work
  • Updating bobsced0's RPM repo
    • yum: free
    • up2date: RHEL
    • "Aborting the rocks-update tool while the tool is downloading RPMs might produce corrupted RPM packages (SDSC Toolkit)" from pr_troubleshooting.doc
  • NIS map
    • /etc/passwd & /etc/group permissions
    • An architecture without a variable amount of delay before BobSCEd is updated would be nice.
  • What broke cluster-fork?
  • NAT & NFS
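    • Mailing list: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-October/021958.html
    • Mailing list: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-May/018503.html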
    • We should consider flattening the network, that is, moving everything into the 159.28.234/24 subnet.
  • Stop the hourly cron from producing output on stdout unless there is an error
  • Set up and test InfiniBand fabric
  • Why does hopper see an interface flap from bobsced0?

Howtos

Updating nodes to be kickstarted & adding new packages

  • bobsced0 itself can be updated by just installing RPMs
  • Check for an RPM in: /state/partition1/home/install/rocks-dist/lan/x86_64/RedHat/RPMS/
  • Edit /home/install/site-profiles/4.1.1/nodes/extend-compute.xml
    • Add a package, e.g. <package arch="x86_64">libgfortran</package>
  • Update the files that get loaded on kickstart:
    • cd /home/install
    • rocks-dist dist
  • Check the kickstart file
    • dbreport kickstart c0-0
  • If there were no errors, kickstart the node, e.g.:
    • shoot-node c0-0
  • Check the progress of a kickstart
    • ssh -p 2200 compute-x-x
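
The steps above can be strung together in a small script. This is a minimal sketch, not a tested procedure: the `run` helper, the `DRY_RUN` variable, and the `update_and_kickstart` function are hypothetical names introduced here, and treating a non-zero exit status from `dbreport` as "there were errors" is an assumption. By default it only prints the commands.

```shell
# Sketch of the kickstart update sequence described above.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 on
# bobsced0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

update_and_kickstart() {
    node="$1"    # e.g. c0-0, as in the steps above

    # Rebuild the distribution so the edited extend-compute.xml is picked up.
    # The subshell keeps the cd from leaking into the caller.
    ( run cd /home/install && run rocks-dist dist ) || return 1

    # Generate the kickstart file for the node. Assumption: a non-zero exit
    # status means the file could not be generated.
    run dbreport kickstart "$node" || return 1

    # Reinstall the node only if the previous steps succeeded.
    run shoot-node "$node"
}
```

Calling `update_and_kickstart c0-0` with the default settings prints the four commands in order, which is a cheap way to review them before running with `DRY_RUN=0`.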

Adding post install scripts to kickstart

  • Edit /home/install/site-profiles/4.1.1/nodes/extend-compute.xml
  • Add a <post arch="x86_64"> entry, e.g.:
    • <post arch="x86_64">cp /cluster/ganglia/gmond.conf /etc/gmond.conf</post>
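
To show how a package entry and a post entry might sit together in one file, the sketch below prints a sample extend-compute.xml. The `<kickstart>` wrapper and the `<description>` element are assumptions based on the usual Rocks skeleton file, and `sample_extend_compute` is a hypothetical helper name; compare the output with the real file on bobsced0 before copying anything.

```shell
# Print a sample extend-compute.xml combining the two kinds of entries
# described above. Only the <package> and <post> lines come from these notes;
# the surrounding elements are assumed.
sample_extend_compute() {
    cat <<'EOF'
<?xml version="1.0" standalone="no"?>
<kickstart>
  <description>Site-specific additions for compute nodes (sample)</description>

  <!-- Extra package to install at kickstart time -->
  <package arch="x86_64">libgfortran</package>

  <!-- Commands run after the packages are installed -->
  <post arch="x86_64">
cp /cluster/ganglia/gmond.conf /etc/gmond.conf
  </post>
</kickstart>
EOF
}
```

Redirecting the output to a scratch file and running `xmllint --noout` on it, where xmllint is installed, is one way to catch unbalanced tags before running rocks-dist dist.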

Using 411 tools

  • make -C /var/411 on bobsced0
    • Copies the files to /etc/411.d/ using 411put
    • Notifies client nodes to run 411get using ganglia
    • The files that are watched can be updated by changing the makefiles in /var/411/
  • cluster-fork /opt/rocks/bin/411get --all
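
The two commands above can be combined so that a push from bobsced0 is always followed by an explicit pull on the nodes, which may help while the ganglia notification is unreliable (see Todo). This is a sketch only: `sync_411`, `run`, and `DRY_RUN` are hypothetical names, and by default the commands are printed rather than executed.

```shell
# Sketch: publish 411-managed files, then force every node to fetch them.
# DRY_RUN=1 (default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

sync_411() {
    # Publish changed files from bobsced0; stop if make fails.
    run make -C /var/411 || return 1
    # Ask every node to fetch all 411 files right away.
    run cluster-fork /opt/rocks/bin/411get --all
}
```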

cluster-fork

  • Used to run commands on all cluster nodes, like the C3 tools
    • Broken, see todo
    • Temporary fix: cluster-fork --nodes="compute-0-%d:0-14" <command>
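
The --nodes pattern in the temporary fix is a printf-style template plus an index range. The sketch below shows the expansion it is assumed to perform, with the range taken as inclusive (0 through 14, i.e. 15 nodes); `expand_nodes` is a hypothetical helper, not part of Rocks.

```shell
# expand_nodes PATTERN FIRST LAST
# Prints one hostname per line by substituting each index from FIRST to LAST
# (inclusive) into the printf-style PATTERN, e.g. compute-0-%d with 0..14.
expand_nodes() {
    pattern="$1"
    first="$2"
    last="$3"
    i="$first"
    while [ "$i" -le "$last" ]; do
        # The pattern is deliberately used as the printf format string.
        printf "$pattern\n" "$i"
        i=$((i + 1))
    done
}
```

For example, `expand_nodes 'compute-0-%d' 0 14` lists compute-0-0 through compute-0-14. The same list can drive a plain ssh loop as a fallback while cluster-fork is broken.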

Disabling reinstall (kickstart) after hard reset

General Info

NIS Importing

  • /etc/cron.hourly/importNIS.sh
  • This comes from the Rocks Users Guide and a mailing list thread.
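
The Todo list asks for the hourly cron job to stay silent unless there is an error. One common way to get that behaviour is to wrap the job so its output is captured and replayed only on failure. The sketch below assumes the noisy job is a script such as importNIS.sh and that it signals failure with a non-zero exit status; `quiet_unless_error` is a hypothetical helper name.

```shell
# quiet_unless_error COMMAND [ARGS...]
# Runs the command with stdout and stderr captured in a temporary file.
# On success nothing is printed. On failure the captured output is replayed
# on stderr and the command's exit status is returned, so cron only sends
# mail when something went wrong.
quiet_unless_error() {
    log="$(mktemp)" || return 1
    "$@" >"$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        cat "$log" >&2
    fi
    rm -f "$log"
    return "$status"
}
```

A small wrapper in /etc/cron.hourly could then call `quiet_unless_error` with the real script, moved somewhere cron does not scan, as its argument.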

http

  • /cluster/www/bobsced/

/cluster

  • Mounted using /etc/rc.local

/cluster/bobsced/etc/

  • What's in here? Things for the clients or for bobsced0?


References

Rocks Documentation

Troubleshooting Platform Open Cluster Stack (OCS) and Platform Lava

  • pr_troubleshooting.doc

411 Tools

RHEL