Difference between revisions of "Bobsced Cluster"

From Earlham CS Department
(Generating an up-to-date machinefile for BobSCEd over Ethernet and Infiniband.)
(Generating an up-to-date machinefile)
Line 62:

  == Generating an up-to-date machinefile ==

  Ethernet:
-   cluster-fork /sbin/ifconfig -a | grep -1 Ethernet | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bobsced.eth_hosts
+   cluster-fork /sbin/ifconfig -a | grep -1 Ethernet | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bs-eth-hosts

  Infiniband:
-   cluster-fork /sbin/ifconfig -a | grep -1 UNSPEC | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bobsced.eth_hosts
+   cluster-fork /sbin/ifconfig -a | grep -1 UNSPEC | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bs-ib-hosts

  Notice that the only difference is the search field in the first grep command. UNSPEC here refers to Infiniband.

Revision as of 14:28, 3 January 2010

Todo

  • 411 tools
    • fix ganglia to recognize broadcasts & update
  • Naming scheme
    • bs* vs compute-*-* vs c*-*
    • This is terrible; it needs work.
  • Updating bobsced0's RPM repo
    • yum: free
    • up2date: RHEL
    • "Aborting the rocks-update tool while the tool is downloading RPMs might produce corrupted RPM packages (SDSC Toolkit)" from pr_troubleshooting.doc
  • NIS map
    • /etc/passwd & /etc/group permissions
    • An architecture without a variable amount of delay before BobSCEd is updated would be nice.
  • What broke cluster-fork?
  • NAT & NFS
    • Mailing list
    • We should consider flattening the network, that is, moving everything into the 159.28.234/24 subnet.
  • Stop the hourly cron from producing output on stdout unless there is an error
  • Set up and test the Infiniband fabric
  • Why does hopper see an interface flap from bobsced0?
  • bobsced0 wants to be the DNS server for the compute nodes
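
For the hourly cron item above, one common approach is a wrapper that buffers a job's output and replays it only when the job fails, so cron has nothing to mail on success. This is a minimal sketch: the name quiet_unless_error is hypothetical, and nothing on this page says which cron job is the noisy one.

```shell
# Hypothetical wrapper: run a command, stay silent on success, and replay
# its captured stdout and stderr only when it exits non-zero.
quiet_unless_error() {
    log=$(mktemp) || return 1
    "$@" > "$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        cat "$log"
    fi
    rm -f "$log"
    # Pass the original exit status back to the caller.
    return "$status"
}
```

A cron script would then call its real work through the wrapper, for example quiet_unless_error /path/to/job, where the path is a placeholder.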

Howtos

Updating nodes to be kickstarted & adding new packages

  • bobsced0 itself can be updated by just installing RPMs
  • Check for an RPM in: /state/partition1/home/install/rocks-dist/lan/x86_64/RedHat/RPMS/
  • Edit /home/install/site-profiles/4.1.1/nodes/extend-compute.xml
    • Add a package, e.g. <package arch="x86_64">libgfortran</package>
  • Update the files that get loaded on kickstart:
    • cd /home/install
    • rocks-dist dist
  • Check the kickstart file
    • dbreport kickstart c0-0
  • If there were no errors, kickstart the node, e.g.:
    • shoot-node c0-0
  • Check the progress of a kickstart
    • ssh -p 2200 compute-x-x
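
The steps above can be strung together for one node. The sketch below uses only the commands listed in this howto; the function names run and rekickstart_node are hypothetical, and it assumes that dbreport exits non-zero when the kickstart file has errors, which this page does not confirm. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: rebuild the distribution, check the kickstart file,
# then reinstall the named node. Changes the current directory.
rekickstart_node() {
    node=$1
    # Rebuild the distribution so the edited extend-compute.xml is picked up.
    run cd /home/install || return 1
    run rocks-dist dist || return 1
    # Generate the kickstart file (assumption: non-zero exit means an error).
    run dbreport kickstart "$node" || return 1
    # Reinstall the node.
    run shoot-node "$node"
}
```

For example, DRY_RUN=1 rekickstart_node c0-0 lists the four commands without touching the node. The dbreport output is still worth reading by eye before the node is shot.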

Adding post install scripts to kickstart

  • Edit /home/install/site-profiles/4.1.1/nodes/extend-compute.xml
  • Add a <post arch="x86_64"> entry, e.g.:
    • <post arch="x86_64">cp /cluster/ganglia/gmond.conf /etc/gmond.conf</post>

Using 411 tools

  • make -C /var/411 on bobsced0
    • Copies the files to /etc/411.d/ using 411put
    • Notifies client nodes to run 411get using ganglia
    • The files that are watched can be updated by changing the makefiles in /var/411/
  • cluster-fork /opt/rocks/bin/411get --all

cluster-fork

  • Runs a command on all cluster nodes, like the C3 tools
    • Broken; see Todo
    • Temporary fix: cluster-fork --nodes="compute-0-%d:0-14" <command>
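
While cluster-fork is broken, a plain ssh loop over the same node range is a possible fallback. This is a minimal sketch: the function names expand_nodes and run_on_nodes are hypothetical, and it assumes passwordless ssh from bobsced0 to nodes named compute-0-0 through compute-0-14, the range used in the temporary fix.

```shell
# Hypothetical helper: print one node name per line from a printf-style
# pattern and an inclusive numeric range, e.g. "compute-0-%d" 0 14.
expand_nodes() {
    fmt=$1
    first=$2
    last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf "$fmt\n" "$i"
        i=$((i + 1))
    done
}

# Run one command on every node in turn; report nodes where it fails.
# ssh -n keeps ssh from reading stdin.
run_on_nodes() {
    for node in $(expand_nodes "compute-0-%d" 0 14); do
        ssh -n "$node" "$@" || echo "FAILED: $node" >&2
    done
}
```

For example, run_on_nodes uptime runs uptime on each node one after another.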

Disabling reinstall (kickstart) after hard reset

Generating an up-to-date machinefile

Ethernet:

cluster-fork /sbin/ifconfig -a | grep -1 Ethernet | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bs-eth-hosts

Infiniband:

cluster-fork /sbin/ifconfig -a | grep -1 UNSPEC | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bs-ib-hosts 

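Notice that the two commands differ only in the pattern passed to grep and in the output file name. UNSPEC here refers to Infiniband: ifconfig reports the Infiniband interfaces with a link encapsulation of UNSPEC.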
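
The pipelines above depend on grep context lines, and cut passes through any line that has no colon, so stray lines may end up in the hosts file depending on the exact ifconfig output. Below is a sketch of a stricter filter. It assumes the older net-tools layout, where each interface starts with "Link encap:<type>" and the IPv4 address appears as "inet addr:<ip>". The function name machinefile_from_ifconfig is hypothetical.

```shell
# Hypothetical helper: read ifconfig-style text on stdin and print one
# "IP slots=N" line per interface whose link type matches $1.
# $2 is the slot count and defaults to 4.
machinefile_from_ifconfig() {
    awk -v want="$1" -v slots="${2:-4}" '
        /Link encap:/ {
            # New interface stanza: remember whether its type matches.
            match_type = (index($0, "Link encap:" want) > 0)
            next
        }
        match_type && /inet addr:/ {
            # The field looks like "addr:10.1.255.254"; keep the part
            # after the colon.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^addr:/) {
                    split($i, parts, ":")
                    print parts[2] " slots=" slots
                }
            }
            match_type = 0
        }
    '
}
```

It would be used in place of the grep, awk and cut stages, for example: cluster-fork /sbin/ifconfig -a | machinefile_from_ifconfig UNSPEC > bs-ib-hosts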

General Info

NIS Importing

  • /etc/cron.hourly/importNIS.sh
  • This comes from the Rocks Users Guide and a mailing list thread.

http

  • /cluster/www/bobsced/

/cluster

  • Mounted using /etc/rc.local

/cluster/bobsced/etc/

  • What's in here? Things for client or bobsced0?


References

Rocks Documentation

Troubleshooting Platform Open Cluster Stack (OCS) and Platform Lava

  • pr_troubleshooting.doc

411 Tools

RHEL