Difference between revisions of "Bobsced Cluster"
m (bobsced0 as dns server)
* [http://www.rocksclusters.org/rocks-documentation/4.2.1/faq-configuration.html#DISABLE-REINSTALL Official documentation]
* [https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-December/022969.html From the mailing list]
Revision as of 14:24, 3 January 2010
Todo
- 411 tools
- fix ganglia to recognize broadcasts & update
- Naming scheme
- bs* vs compute-*-* vs c*-*
- This is terrible; it needs work
- Updating bobsced0's RPM repo
- yum: free
- up2date: RHEL
- "Aborting the rocks-update tool while the tool is downloading RPMs might produce corrupted RPM packages (SDSC Toolkit)" from pr_troubleshooting.doc
- NIS map /etc/passwd & /etc/group permissions
- An architecture without a variable amount of delay before BobSCEd is updated would be nice.
- What broke cluster-fork?
- NAT & NFS
- Mailing list
- Mailing list
- We should consider flattening the network, that is moving everything into the 159.28.234/24 subnet.
- Stop the hourly cron from producing output on stdout unless there is an error
- Setup and test Infiniband fabric
- Why does hopper see an interface flap from bobsced0?
- bobsced0 wants to be the DNS server for the compute nodes
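For the hourly cron item in the list above, one possible approach is to wrap the job so that its output is shown only when it fails. This is a minimal sketch, not something taken from the current setup; quiet_unless_error is a hypothetical helper name.

```shell
# Sketch: run a command, stay silent on success, replay its output on failure.
# quiet_unless_error is a hypothetical name, not an existing tool on bobsced0.
quiet_unless_error() {
    local log status
    log=$(mktemp) || return 1
    "$@" >"$log" 2>&1          # capture stdout and stderr of the wrapped command
    status=$?
    if [ "$status" -ne 0 ]; then
        cat "$log"             # cron mails whatever is printed here
    fi
    rm -f "$log"
    return "$status"
}
```

A cron script could then call, for example, quiet_unless_error /some/job instead of running the job directly, so mail is only generated on a non-zero exit status.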
Howtos
Updating nodes to be kickstarted & adding new packages
- bobsced0 can be updated by just installing rpms
- Check for an RPM in:
/state/partition1/home/install/rocks-dist/lan/x86_64/RedHat/RPMS/
- Edit
/home/install/site-profiles/4.1.1/nodes/extend-compute.xml
- Add a package, e.g.
<package arch="x86_64">libgfortran</package>
- Update the files that get loaded on kickstart:
cd /home/install
rocks-dist dist
- Check the kickstart file
dbreport kickstart c0-0
- If there were no errors, kickstart the node, e.g.:
shoot-node c0-0
- Check the progress of a kickstart
ssh -p 2200 compute-x-x
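The rebuild and reinstall steps above can be strung together in a small script. This is only a sketch: run and rekickstart_node are hypothetical helper names, and it assumes that rocks-dist and dbreport return a non-zero exit status when something goes wrong, which has not been verified here. With DRY_RUN=1 the commands are printed instead of executed.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rekickstart_node: rebuild the distribution, check the kickstart file,
# then reinstall one node. $1 is the node name, e.g. c0-0.
rekickstart_node() {
    run cd /home/install || return 1
    run rocks-dist dist || return 1
    # Assumption: a broken kickstart file makes dbreport exit non-zero.
    run dbreport kickstart "$1" || return 1
    run shoot-node "$1"
}
```

Running DRY_RUN=1 rekickstart_node c0-0 first shows the four commands that would be issued without touching the node.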
Adding post install scripts to kickstart
- Edit
/home/install/site-profiles/4.1.1/nodes/extend-compute.xml
- Add a <post arch="x86_64"> entry, e.g.:
<post arch="x86_64">cp /cluster/ganglia/gmond.conf /etc/gmond.conf</post>
Using 411 tools
- Run make -C /var/411 on bobsced0
- Copies the files to /etc/411.d/ using 411put
- Notifies client nodes to run 411get using ganglia
- The files that are watched can be updated by changing the makefiles in /var/411/
cluster-fork /opt/rocks/bin/411get --all
cluster-fork
- Runs a command on all cluster nodes, similar to the C3 tools
- Broken; see Todo
- Temporary fix:
cluster-fork --nodes="compute-0-%d:0-14" <command>
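To see which hosts the temporary fix is expected to cover, the node pattern can be expanded by hand. The sketch below assumes that the --nodes value above substitutes %d with every index from 0 to 14; expand_nodes is a hypothetical helper, not part of Rocks.

```shell
# Sketch: expand a printf-style node pattern over an index range.
# expand_nodes is a hypothetical name used only for illustration.
expand_nodes() {
    # $1: name pattern containing %d, $2: first index, $3: last index
    local i
    for i in $(seq "$2" "$3"); do
        printf "$1\n" "$i"
    done
}
```

For example, expand_nodes 'compute-0-%d' 0 14 prints compute-0-0 through compute-0-14, one per line, which can also feed a plain ssh loop while cluster-fork is broken.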
Disabling reinstall (kickstart) after hard reset
Generating an up-to-date machinefile for BobSCEd over Ethernet and Infiniband
Ethernet:
cluster-fork /sbin/ifconfig -a | grep -1 Ethernet | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bobsced.eth_hosts
Infiniband:
cluster-fork /sbin/ifconfig -a | grep -1 UNSPEC | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bobsced.ib_hosts
The two commands differ only in the grep pattern and the output file name. UNSPEC is the link type that ifconfig reports for the Infiniband interfaces.
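The grep context in the pipelines above may also pass the interface header line through awk, which would leave non-address lines in the machinefile. A stricter variant is sketched below under the assumption that ifconfig prints the classic net-tools layout (a "Link encap:..." header followed directly by an "inet addr:..." line); machinefile_from_ifconfig is a hypothetical helper name.

```shell
# Sketch: turn ifconfig-style text on stdin into "IP slots=N" lines.
# machinefile_from_ifconfig is a hypothetical name, not a Rocks tool.
machinefile_from_ifconfig() {
    # $1: link type after "encap:" (Ethernet or UNSPEC)
    # $2: slots per host (defaults to 4, as in the commands above)
    awk -v encap="encap:$1" -v slots="${2:-4}" '
        index($0, encap) { want = 1; next }   # header line of the requested link type
        want && /inet addr:/ {
            split($2, parts, ":")             # $2 looks like "addr:10.1.255.254"
            printf("%s slots=%s\n", parts[2], slots)
        }
        { want = 0 }                          # only the line right after the header counts
    '
}
```

It would be used in place of the grep, awk and cut stages, e.g. cluster-fork /sbin/ifconfig -a | machinefile_from_ifconfig Ethernet > bobsced.eth_hosts.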
General Info
NIS Importing
/etc/cron.hourly/importNIS.sh
- This comes from the Rocks Users Guide and a mailing list thread.
http
/cluster/www/bobsced/
/cluster
- Mounted using
/etc/rc.local
/cluster/bobsced/etc/
- What's in here? Things for the clients or for bobsced0?
References
Rocks Documentation
Troubleshooting Platform Open Cluster Stack (OCS) and Platform Lava
- pr_troubleshooting.doc