Archive:Cluster
2025 Note (Porter): I've condensed many of the pages below, which were originally links, into this page. Most are from 2004-2011 and are no longer actively used. A few have some updated content (as recent as 2021) from before we shifted to the Sysadmin Collection for all cluster-related pages. While they aren't active, they are an important part of what built CS at Earlham and will be preserved on this page.
Contents
- 1 Summer 2011 (draft)
- 2 Al-Salam/CCG Downtime tasks
- 3 How to use the PetaKit
- 4 Current To-Do
- 5 Generalized, Modular Parallel Framework
- 6 Summer of Fun (2009)
- 7 BobSCEd Upgrade
- 8 (Old) To Do
- 9 Current Projects
- 10 Past Projects
- 11 General Stuff
- 12 Items Particular to a Specific Cluster
- 13 Curriculum Modules
- 14 Possible Future Projects
- 15 Archive
Summer 2011 (draft)
Calendar: May 9th through August 19th
Events:
- Undergraduate Petascale Institute @ NCSA - 29 May through 11 June (tentative)
- Intermediate Parallel Programming and Distributed Computing workshop @ OU - 30 July through 6 August (tentative)
Projects:
- LittleFe Build-out @ Earlham - n weeks of x people's time
- New cluster assembly - n weeks of x people's time
- Petascale project work - n weeks of x people's time
Personal Schedules:
- Charlie: Out 3-12 June (beach)
- Ivan:
- Aaron:
- Fitz:
- Mobeen:
- Brad:
Al-Salam/CCG Downtime tasks
- LDAP Server migration from bs-new -> hopper
- yum update over all nodes
- Turn HT off
- PVFS
- PBS server on Hopper
How to use the PetaKit
- If you do not already have it, obtain the source for the PetaKit from the CVS repository on hopper (curriculum-modules/PetaKit).
- cd to the Subkits directory of PetaKit and run the area-subkit.sh to make an area subkit tarball or GalaxSee-subkit.sh to make a GalaxSee subkit tarball.
- scp the tarball to the target resource and unpack it.
- cd into the directory and run ./configure --with-mpi --with-openmp
- Use stat.pl and args_man to make an appropriate statistics run. See args_man for a description of predicates. Example:
perl -w stat.pl --program area --style serial,mpi,openmp,hybrid --scheduler lsf --user leemasa --problem_size 200000000000 --processes 1,2,3,4,5,6,7,8-16-64 --repetitions 10 -m -tag Sooner-strongest-newest --mpirun mpirun.lsf --ppn 8
Modifying Programs for Use with PetaKit
Current To-Do
Date represents last meeting where we discussed the item
- Brad's Graphing Tool (28/Feb/10)
  * Nice new functionality, see
  * Waiting on clean data to finish multiple resource displays
  * Error bars for left and right y-axes with checkboxes for each
- TeraGrid Runs (28/Feb/10)
In first box: Put initial of who is doing run
In second box: B = builds, R = runs, D = reports back to database, S = there is a good set of runs (10 per data point) for strong scaling in the database that appear on a graph, W = there is a good set of runs (10 per data point) for weak scaling in the database that appear on a graph
area under curve | GalaxSee | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Serial | MPI | OpenMP | Hybrid | Serial | MPI | OpenMP | Hybrid | |||||||||
ACLs | Sam | Sam | Sam | Sam | AW | AW | AW | AW | ||||||||
BobSCEd | ||||||||||||||||
BigRed | Sam | Sam | Sam | Sam | ||||||||||||
Sooner | Sam | Sam | Sam | Sam | ||||||||||||
pople | AW/CP | AW/CP | AW/CP | AW/CP |
Problem-sizes
- Big Red: 750000000000
- Sooner: 200000000000
- BobSCEd: 18000000000
- ACL: 18000000000
- New cluster (5/Feb/10)
  * wiki page
  * Decommission Cairo
  * Figure out how to mount on Telco Rack
  * Get pdfs of all materials -- post them on wiki
- BCCD Testing (5/Feb/10)
  * Get Fitz's liberation instructions into wiki
  * Get Kevin's VirtualBox instructions into wiki
  * pxe booting -- see if they booted, if you can ssh to them, if the run matrix works
  * Send /etc/bccd-revision with each email
  * Send output of netstat -rn and /sbin/ifconfig -a with each email
  * Run Matrix
  * For the future: scripts to boot & change bios, watchdog timer, 'test' mode in bccd, send emails about errors
  * USB scripts -- we don't need the "copy" script
- SIGCSE Conference -- March 10-13 (28/Feb/10)
  * Leaving 8:00 Wednesday
  * Brad, Sam, or Gus pick up the van around 7, bring it by loading dock outside Noyes
  * Posters -- new area runs for graphs, start implementing stats collection and OpenMP, print at small size (what is that?)
  * Take 2 LittleFes, small switch, monitor/kyb/mouse (wireless), printed matter
- Spring Cleaning (Noyes Basement) (5/Feb/10)
* Next meeting: Saturday 6/Feb @ 3 pm
Generalized, Modular Parallel Framework
10,000 foot view of problems
Problem | Parent Process Sends Out | Children Send Back | Results Compiled By |
---|---|---|---|
Area | function, bounds, segment size or count | sum of area for specified bounds | sum |
GalaxSee | complete array of stars, bounds (which stars to compute) | an array containing the computed stars | construct a new array of stars and repeat for next time step |
Matrix x Matrix | n rows from Matrix A and n columns from Matrix B, location of rows and cols | n resulting matrix position values, their location in results matrix | construct new result array |
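A minimal sketch of the Area row in the table above, written in Python with a thread pool standing in for the parent and child processes. The names `child_area` and `parent_area`, the midpoint rule, and the even split of the bounds are assumptions made for illustration; they are not taken from the PetaKit source.

```python
from concurrent.futures import ThreadPoolExecutor


def child_area(func, lower, upper, segments):
    """Child: return the midpoint-rule area of func over [lower, upper]."""
    width = (upper - lower) / segments
    heights = (func(lower + (i + 0.5) * width) for i in range(segments))
    return sum(heights) * width


def parent_area(func, lower, upper, segments, workers=4):
    """Parent: send out (function, bounds, segment count), then sum the replies."""
    chunk = (upper - lower) / workers
    # One contiguous sub-interval of the bounds per worker.
    bounds = [(lower + k * chunk, lower + (k + 1) * chunk) for k in range(workers)]
    per_worker = max(1, segments // workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda b: child_area(func, b[0], b[1], per_worker), bounds
        )
        # "Results compiled by": a plain sum of the partial areas.
        return sum(partials)


if __name__ == "__main__":
    # Area under x^2 on [0, 3]; the exact answer is 9.
    print(parent_area(lambda x: x * x, 0.0, 3.0, 1200))
```

The same three-step shape (send out, send back, compile) would apply to the GalaxSee and Matrix x Matrix rows; only the payloads and the final compile step differ.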
Visualizing Parallel Framework
http://cs.earlham.edu/~carrick/parallel/parallelism-approaches.png
Parallel Problem Space
- Dwarf (algorithm family)
- Style of parallelism (shared, distributed, GPGPU, hybrid)
- Tiling (mapping problem to work units to workers)
- Distribution algorithm (getting work units to workers)
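As a rough illustration of the tiling item above, the sketch below maps numbered work units onto workers in two common ways. The terms "block" and "cyclic" and both function names are assumptions introduced here, not terminology from the original notes.

```python
def block_tiling(n_units, n_workers):
    """Give each worker one contiguous run of work units."""
    base, extra = divmod(n_units, n_workers)
    tiles, start = [], 0
    for k in range(n_workers):
        # The first `extra` workers take one additional unit each.
        size = base + (1 if k < extra else 0)
        tiles.append(list(range(start, start + size)))
        start += size
    return tiles


def cyclic_tiling(n_units, n_workers):
    """Deal work units out round-robin: unit i goes to worker i % n_workers."""
    return [list(range(k, n_units, n_workers)) for k in range(n_workers)]


if __name__ == "__main__":
    print(block_tiling(10, 3))
    print(cyclic_tiling(10, 3))
```

Block tiling keeps neighbouring units together, which may suit problems such as Matrix x Matrix where adjacent rows share data; cyclic tiling tends to even out load when the cost per unit varies.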
Summer of Fun (2009)
An external doc for GalaxSee
Documentation for OpenSim GalaxSee
What's in the database?
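Program | GalaxSee (MPI) | GalaxSee (MPI) | GalaxSee (MPI) | area-under-curve (MPI, openmpi) | area-under-curve (MPI, openmpi) | area-under-curve (MPI, openmpi) | area-under-curve (Hybrid, openmpi) | area-under-curve (Hybrid, openmpi) | area-under-curve (Hybrid, openmpi) |
---|---|---|---|---|---|---|---|---|---|
Resource | acl0-5 | bs0-5 GigE | bs0-5 IB | acl0-5 | bs0-5 GigE | bs0-5 IB | acl0-5 | bs0-5 GigE | bs0-5 IB |
np X-XX | 2-20 | 2-48 | 2-48 | 2-12 | 2-48 | 2-48 | 2-20 | 2-48 | 2-48 |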
What works so far? B = builds, R = runs, W = works
area under curve | GalaxSee (standalone) | |||||||
---|---|---|---|---|---|---|---|---|
Serial | MPI | OpenMP | Hybrid | Serial | MPI | OpenMP | Hybrid | |
acls | BRW | BRW | BRW | BRW | BR | |||
bobsced0 | BRW | BRW | BRW | BRW | BR | |||
c13 | BR | |||||||
BigRed | BRW | BRW | BRW | BRW | ||||
Sooner | BRW | BRW | BRW | BRW | ||||
pople | ||||||||
Charlie's laptop | BR |
To Do
- Fitz/Charlie's message
- Petascale review
- BobSCEd stress test
Implementations of area under the curve
- Serial
- OpenMP (shared)
- MPI (message passing)
- MPI (hybrid mp and shared)
- OpenMP + MPI (hybrid)
GalaxSee Goals
- Good piece of code, serves as teaching example for n-body problems in petascale.
- Dials, knobs, etc. in place to easily control how work is distributed when running in parallel.
- Architecture generally supports hybrid model running on large-scale constellations.
- Produces runtime data that enables nice comparisons across multiple resources (scaling, speedup, efficiency).
- Render in BCCD, metaverse, and /dev/null environments.
- Serial version
- Improve performance on math?
- GalaxSee - scale to petascale with MPI and OpenMP hybrid.
- GalaxSee - render in-world and steer from in-world.
- Area under a curve - serial, MPI, and OpenMP implementations.
- OpenMPI - testing, performance.
- Start May 11th
LittleFe
- Testing
- Documentation
- Touch screen interface
Notes from May 21, 2009 Review
- Combined Makefiles with defines to build on a particular platform
- Write a driver script for GalaxSee ala the area under the curve script, consider combining
- Schema
- date, program_name, program_version, style, command line, compute_resource, NP, wall_time
- Document the process from start to finish
- Consider how we might iterate over e.g. number of stars, number of segments, etc.
- Command line option to stat.pl that provides a Torque wrapper for the scripts.
- Lint all code, consistent formatting
- Install latest and greatest Intel compiler in /cluster/bobsced
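The schema fields listed in the review notes above could be held in a single table. The sketch below uses Python's built-in sqlite3 module; the table name `runs`, the column name `command_line`, the resource name, and the sample rows are all placeholders invented for illustration (the wall times are not real measurements). Speedup is taken as T(1)/T(NP) and efficiency as speedup/NP, which are the usual definitions rather than anything specified in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE runs (
        date TEXT, program_name TEXT, program_version TEXT, style TEXT,
        command_line TEXT, compute_resource TEXT, np INTEGER, wall_time REAL
    )
    """
)

# Placeholder rows only; these numbers are made up for the example.
sample = [
    ("2009-05-21", "area", "0.1", "mpi", "./area", "example-cluster", 1, 100.0),
    ("2009-05-21", "area", "0.1", "mpi", "./area", "example-cluster", 2, 50.0),
    ("2009-05-21", "area", "0.1", "mpi", "./area", "example-cluster", 4, 40.0),
]
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sample)


def scaling_table(conn, program, style, resource):
    """Return (np, speedup, efficiency) rows from mean wall times per np."""
    rows = conn.execute(
        "SELECT np, AVG(wall_time) FROM runs "
        "WHERE program_name = ? AND style = ? AND compute_resource = ? "
        "GROUP BY np ORDER BY np",
        (program, style, resource),
    ).fetchall()
    base = dict(rows)[1]  # assumes a single-process (np = 1) run exists
    return [(np, base / t, base / t / np) for np, t in rows]


if __name__ == "__main__":
    for row in scaling_table(conn, "area", "mpi", "example-cluster"):
        print(row)
```

Averaging in the query means repeated runs of the same data point collapse to one mean wall time before speedup is computed.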
BobSCEd Upgrade
Build a new image for BobSCEd:
- One of the Suse versions supported for Gaussian09 on EM64T [v11.1] - Red Hat Enterprise Linux 5.3; SuSE Linux 9.3, 10.3, 11.1; or SuSE Linux Enterprise 10 (see G09 platform list) <-- CentOS 5.3 runs Gaussian binaries for RHEL ok
- Firmware update?
- C3 tools and configuration [v4.0.1]
- Ganglia and configuration [v3.1.2]
- PBS and configuration [v2.3.16]
- /cluster/bobsced local to bs0
- /cluster/... passed-through to compute nodes
- Large local scratch space on each node
- Gaussian09
- WebMO and configuration [v9.1] - Gamess, Gaussian, Mopac, Tinker
- Infiniband and configuration
- GNU toolchain with OpenMPI and MPICH [GCC v4.4.0], [OpenMPI v1.3.2] [MPICH v1.2.7p1]
- Intel toolchain with OpenMPI and native libraries
- Sage with do-dads (see Charlie)
- Systemimager for the client nodes?
Installed:
Fix the broken nodes.
(Old) To Do
BCCD Liberation
- v1.1 release - upgrade procedures
Curriculum Modules
- POVRay
- GROMACS
- Energy and Weather
- Dave's math modules
- Standard format, templates, how-to for V and V
LittleFe
- Explore machines from first Intel donation (notes and pictures)
- Build 4 SCED units
Infrastructure
- Masa's GROMACS interface on Cairo
- gridgate configuration, Open Science Grid peering
- hopper'
SC Education
- Scott's homework (see the message)
- SC10 brainstorming
Current Projects
Past Projects
General Stuff
- Todo
- General
- Hopper
- Howto's
- Networking
- 2005-11-30 Meeting
- 2006-12-12 Meeting
- 2006-02-02 Meeting
- 2006-03-16 Meeting
- 2006-04-06 Meeting
- Node usage
Numbers for Netgear switches
Model: GSM712
- Main:
- SN: GM72B28DB005149
- RMA: 442348
- Cairo0:
- SN: GM72B28DB005120
- Cairo1:
- SN: GM72B28DB005119
Latex poster creation
Plumbing
Two latex style files and one class file are needed:
- textpos.sty for the text positioning package
- a0size.sty for the poster format package
- a0poster.cls for the poster class
These have to be either in the latex path (usually /usr/local/share/texmf/) or in the directory in which you invoke the latex command. To add a style or class file that is in neither the standard nor the working directory, either put its path in the TEXINPUTS environment variable (if TEXINPUTS does not also include the standard tex include directory, none of the standard class or style files will be found) or put the full path in the \include in the latex source (I haven't tried this --Josh).
I just copied the files to my working directory while confirming the whole poster creation process works. While refining the poster, I just made some links to the textpos and a0size style and class files in a standard place in my home directory. --Josh
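One way to handle the TEXINPUTS caveat above is to build the value programmatically. The sketch below is an assumption-laden helper, not part of the original workflow: `texinputs_for` is a hypothetical name, and it relies on the behaviour of kpathsea-based TeX distributions, where an empty trailing entry in the path is replaced by the default search path.

```python
import os


def texinputs_for(extra_dirs):
    """Build a TEXINPUTS value that searches extra_dirs, then TeX's defaults."""
    parts = [d for d in extra_dirs if d]
    # The trailing separator leaves an empty final entry, which kpathsea
    # expands to the standard search path so stock classes are still found.
    return os.pathsep.join(parts) + os.pathsep


if __name__ == "__main__":
    # Hypothetical directory holding textpos.sty and the a0 poster files.
    env = dict(os.environ, TEXINPUTS=texinputs_for(["/home/josh/tex/poster"]))
    print(env["TEXINPUTS"])
    # The environment could then be passed to latex, for example with
    # subprocess.run(["latex", "poster.tex"], env=env).
```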
Setting up the poster format
\documentclass[landscape,a0,draft]{a0poster}
- The draft option is supposed to produce an A4 version of the poster. This does not work; the draft option has no noticeable effect.
- This is the line that requires a0size.sty and a0poster.cls.
Setting up and using textblocks
The textpos package sets up a grid over the document with spacing of about two inches per one grid spacing (the exact dimensions are unspecified; some eyeballing is needed) over which arbitrary latex can be placed. To set up the textpos package, use the following command:
\usepackage[absolute,overlay]{textpos}
- The absolute option makes the origin of the grid on which textblocks are positioned the upper lefthand corner. Using relative instead of absolute makes the origin the upper lefthand corner of the last textblock positioned.
- Overlay gives the textblocks opaque backgrounds. Without the overlay option, the backgrounds of the textblocks are transparent (there is no option for translucency).
\textblockcolour{white}
- Sets the background color of the subsequent textblocks.
\begin{textblock}{36.0}(3.0, 0.25) {\scriptsize \include{problem} } \end{textblock}
- This is the basic usage of the textblock environment. Its arguments look like:
\begin{textblock}{block_width}(x_loc_of_top_left, y_loc_of_top_left)
Turning images into PS/Latex
The easiest way I know of turning an image into eps is through xfig:
- Open xfig with a new project.
- Import an image (camera looking icon on the left icon array). Make sure to choose the original size and aspect ratio options.
- Export the project. Choose the "combined PS/Latex; both parts" option.
Your chosen export directory should hold both a .pstex file and a .pstex_t file.
Including sized images
Add the following macros to the top of your poster latex file (we should probably make our own .sty file at some point; I haven't done this yet because there are only two macros --Josh).
% pstex xfig export
\newcommand{\location}{.}
\newcommand{\pstex}[1]{\input{\location/#1.pstex_t}}
% \pstexsized{w}{h}{file} \width \height---orig, !---keep aspect ratio
\newcommand{\pstexsized}[3]{%
  \resizebox*{#1}{#2}{\input{\location/#3.pstex_t}}}
Example Usage:
\pstexsized{5.1in}{3.9in}{cairo-dppc-rate}
- The arguments are the width of the image, the height of the image, and the name of the pstex files without the .pstex or .pstex_t extensions.
Including a background image
The following latex takes a pstex image and maps it to the size of the poster via tex's makebox function.
\AddToShipoutPicture{%
  \AtTextCenter{%
    \makebox(0,0)[c]{\resizebox{\textwidth}{!}{%
      \rotatebox{0}{\pstex{fatc-poster-background-no-title}}}}%
  }
}
Printing the poster in A4
Kludge alert: The only way I have been able to print the poster in A4 is to open the poster pdf on a mac using the default preview app. I use that app to export the image as an A4 jpeg (which prints with no problems) -- Josh.
Bugzilla
Quick Links
All Open LLK Bugs | Enter a Bug
Getting Started with Bugzilla
For help getting started with Bugzilla, see chapter 6 in the official documentation, Using Bugzilla.
General Bugzilla Etiquette
- Make certain that bugs are always assigned to or CC'ed to clustcomp att cs dott earlham dott edu. So, for example, when you accept a bug or assign it to someone else (thereby changing the default assignee from clustcomp att cs dott earlham dott edu), add a CC to clustcomp att cs dott earlham dott edu.
- Prioritize bug reports (especially new ones) accordingly:
- P5: of immediate concern, only the most pressing items (do these now!)
- P4: also highly urgent items, only slightly less so
- P3: slightly elevated priority
- P2: default/normal/average priority
- P1: Feature request
Modules
Software build options
ARCHPATH=`uname -s`/`/cluster/software/os_release`/`uname -p`
- Tcl:
./configure --prefix=/cluster/software/modules-sw/tcl/8.5.7/$ARCHPATH --enable-shared && make
- Modules:
./configure --prefix=/cluster/software/modules-sw/modules/3.2.7/$ARCHPATH --with-tcl=/cluster/software/modules-sw/tcl/8.5.7/$ARCHPATH/lib --with-static=yes && make
- Had to remove bash_completion from init/Makefile.
- After installation, changed the version from 3.2.6->3.2.7 in the init directory, and /usr/share/Modules to /cluster/software/modules-sw/modules/3.2.7/$ARCHPATH/Modules
- OpenMPI:
./configure --prefix=/cluster/software/modules-sw/openmpi/1.3.1/$ARCHPATH && make
E-mails from Skylar
That's still doable with multiple modules repositories. We can set up meta-modules that don't actually point you at software but alter how modules itself behaves. The first meta-module would prepend MODULEPATH with (say) "/cluster/software/modules" for software that's the same across all the clusters, and modules-bobsced would prepend MODULEPATH with "/cluster/bobsced/software/modules". Just like with PATH, the first hit within a module repository is used. If Perl is only in the main software repository it'll be grabbed from there, but if OpenMPI is in both it'll be grabbed from the bobsced repository since it comes before the main repository.
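The first-hit lookup described in the e-mail can be modelled in a few lines. This is only a toy model of the search order, not how the Modules package is implemented: `resolve_module` is a hypothetical helper, and the repository contents are taken from the Perl and OpenMPI example in the e-mail.

```python
MAIN = "/cluster/software/modules"
BOBSCED = "/cluster/bobsced/software/modules"

# Which modules each repository provides, per the example in the e-mail.
repositories = {
    MAIN: {"perl", "openmpi"},
    BOBSCED: {"openmpi"},
}


def resolve_module(name, modulepath, repositories):
    """Return the first repository on modulepath that provides name, else None."""
    for repo in modulepath:
        if name in repositories.get(repo, ()):
            return repo
    return None


modulepath = []
modulepath.insert(0, MAIN)     # the first meta-module prepends the shared repository
modulepath.insert(0, BOBSCED)  # modules-bobsced then prepends the cluster repository

if __name__ == "__main__":
    print(resolve_module("perl", modulepath, repositories))
    print(resolve_module("openmpi", modulepath, repositories))
```

Because the bobsced repository is prepended last, it is searched first, so a cluster-specific OpenMPI shadows the shared one while Perl still resolves from the shared repository.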
Items Particular to a Specific Cluster
Curriculum Modules
- gprof - statistical source code profiler
- Curriculum
- Fluid Dynamics
- Population Ecology
- GROMACS Web Interface
- Wiki Life for Academics
- PetaKit
Possible Future Projects
Archive
- TeraGrid '06 (Indianapolis, June 12-15, 2006)
- SIAM Parallel Processing 2006 (San Francisco, February 22-24, 2006)
- Conference webpage
- Little-Fe abstract
- Low Latency Kernel abstract
- Folding@Clusters
- Best practices for teaching parallel programming to science faculty (Charlie only)