Difference between revisions of "Cluster: New BobSCEd Install Log"

From Earlham CS Department
Jump to navigation Jump to search
(Modules Software)
(Modules Software)
Line 3: Line 3:
== Modules Software ==
== Modules Software ==
'''Intel Compilers'''
'''Intel Compilers'''
* [[Image:Release NotesC en US.pdf | C/C++ Compiler Release Notes (PDF)]]
* [https://wiki.cs.earlham.edu/images/8/83/Release_NotesC_en_US.pdf C/C++ Compiler Release Notes (PDF)]

Revision as of 15:33, 19 October 2009

The source code for *anything* installed locally is in /usr/local/src. The source for *anything* installed on NFS is in /mounts/bobsced/usr/local/src.

Modules Software

Intel Compilers


  • 1.3.1 - configured with ./configure --prefix=/mounts/bobsced/usr/local/modules-sw/openmpi/1.3.1/ --enable-mpi-threads --with-openib
  • 1.3.3 same


  • MPICH1 installed with ./configure --prefix=/mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1
    • Make note: to uninstall - /mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1/sbin/mpiuninstall
  • MPICH2 installed with
./configure --prefix=/mounts/bobsced/usr/local/modules-sw/mpich2/2.1.2 --enable-cxx --enable-f90 --enable-f77 --enable-threads=multiple --with-thread-package=posix


Green color indicates something that still needs to be done.


  • Download the udpcast rpm from http://udpcast.linux.lu/source.html
    • Install with yum --nogpgcheck localinstall udpcast-20081213-1.i386.rpm
    • On hopper, installed the syslinux and tftpd-hpa ports
      • Enable tftpd in /etc/inetd.conf by removing the comments and restart inetd with /etc/rc.d/inetd restart, and then also run the command listed on that line to start tftpd
      • The following lines were already in /usr/local/etc/dhcpd.conf: allow booting; allow bootp;, put the filename in the particular group (see Debian Clusters)
      • cp /usr/local/share/syslinux/pxelinux.0 /tftpboot/ (the /tftpboot directory needs to be created)
      • Download linux, initrd, and default from the udpcast site into /tftpboot
      • Move default into /tftpboot/pxelinux.cfg
      • Restart dhcpd (killall -KILL dhcpd and /usr/local/sbin/dhcpd -q -cf /usr/local/etc/dhcpd.conf -lf /var/db/dhcpd/dhcpd.leases -pf /var/run/dhcpd/dhcpd.pid -user dhcpd -group dhcpd

Head Node

Yum installed:

  • gcc.x86_64, gcc-c++.x86_64
  • for Ganglia:
    • apr.x86_64 and apr-devel.x86_64
    • libconfuse-2.5-4.el5.x86_64.rpm, libconfuse-devel-2.5-4.el5.x86_64.rpm (from Fedora repositories)
    • expat-devel.x86_64
  • for Intel updates:
    • compat-libstdc++-33.i386
  • blas.x86_64 (on all nodes)

Install C3 tools from http://www.csm.ornl.gov/torc/C3/C3softwarepage.shtml

  • Downloaded full install rpm on bs0, installed with yum --nogpgcheck localinstall c3-4.0.1-1.noarch.rpm


  • On hopper, added the data_source line for bs0 to /usr/local/etc/gmetad.conf and restarted it with /usr/local/etc/rc.d/gmetad restart
  • Downloaded tar ball from http://sourceforge.net/projects/ganglia/
    • See Ganglia README
    • ./configure --prefix=/cluster
    • The head node uses a different Ganglia gmond.conf in /etc/ganglia/gmond.conf and the workers just have theirs symlinked to /cluster/etc/gmond.conf
    • By default, iptables is running on the CentOS install and blocks hopper's Ganglia requests
      • Turned off by clearing it and then running /sbin/service iptables save


  • Shorewall, see /etc/shorewall/params for almost all of the important definitions
    • Natting is done through /etc/shorewall/masq
  • DHCP relay, added to boot with chkconfig on, set for hopper (installed as part of dhcp yum package)
    • See /etc/sysconfig/dhcrelay
    • This means that a dhcp server is also installed, but it is not set to run and is not configured, either
    • Hopper needs to have a static route added in order to have the responses return, these are in /etc/rc.conf:



  • Installed from source with ./configure --with-default-server=bs0.bobsced.loc --with-rc=scp --disable-mom --with-server-home=/var/spool/pbs ("clients" is what installs qmgr)
  • Installs to /usr/local/
  • Set up according to Debian Clusters setup
  • Reran the ./configure but without --disable-moms, then ran make packages, copied this to worker node


  • installed Maui according to same link as above

Intel Firmware Updates


  • Configured sendmail by adding bs0.bobsced.loc to /etc/mail/local-host-names


  • The actual filesystem on bs0-new is at /mounts/bobsced. The nodes all mount this in the same place.
  • It's mounted on hopper at /mounts/bobsced, with the symlink (currently) at /cluster/bobscednew


  • yum installed httpd
  • Installed on bs0 with the following params:
Path to perl:         /usr/bin/perl
Webserver name:       bs0-new.cluster.earlham.edu
HTML directory:       /var/www/webmo
HTML URL:             /webmo
CGI script directory: /var/www/cgi-bin
CGI script URL:       /cgi-bin
User files directory: /mounts/bobsced/WebMO
  • Get this error when authing with LDAP: Can't locate Authen/Simple/LDAP.pm
  • yum installed perl-LDAP.noarch, didn't work, so used CPAN to install Authen::Simple::LDAP
  • edited /var/www/cgi-bin/interfaces/authen.conf for our LDAP settings
  • Before externally authenticated users can use it, you have to go in as administrator and check the box to allow them in the Webmo group (or whatever other group)
  • Gamess:
    • yum install compat-gcc-34-g77.x86_64 and gfortran
    • Followed directions from Webmo site
  • Added the following line to httpd.conf:
SuexecUserGroup bob users
  • Gaussian 09 not supported, though it's installed in /mounts/bobsced/usr/local/g09
  • Installed g03, except get errors:
Erroneous write during file extend. write 160 instead of 4096
Probably out of disk space.
Write error in NtrExt1: No such file or directory


Write error in NtrExt1: Bad address
    • To fix this, do echo 0 > /proc/sys/kernel/randomize_va_space
    • This needs to be set to happen all the time on boot


  • Drivers downloaded from here - the Red Hat 5.3 ones
    • mount the ISO as a loopback somewhere (ie mount -o loop /mounts/bobsced/usr/src/MLNX_OFED_LINUX-1.4-rhel5.3.iso /media/)
    • run with -msm (ie /media/mlnxofedinstall --msm
  • Need to boot into the kernel that came in the original install (2.6.18-128.el5), otherwise get a message like this:
The 2.6.18-164.el5 kernel is installed, but do not have drivers available. 
Cannot continue.
  • Then (before rebooting back to old kernel), run mst start
  • Then, sym link to current kernel version, like this: (see Rocks discussion here about it)
    • You should get the version number in the error mst start will give you
ln -s /usr/mst/lib/2.6.18-128.el5/ /usr/mst/lib/2.6.18-164.2.1.e15.plus
  • Success looks like this:
[root@bs0-new ~]# mst start
    Starting MST (Mellanox Software Tools) driver set: 
Loading MST PCI module                                     [  OK  ]
Loading MST PCI configuration module                       [  OK  ]
Saving configuration for PCI device 01:00.0                [  OK  ]
Create devices
  • Does this need to be done every time? It looks like yes.