Difference between revisions of "Cluster Information"
Revision as of 08:21, 12 August 2010
- 1 Al-Salam/CCG Downtime tasks
- 2 How to use the PetaKit
- 3 Current To-Do
- 4 Generalized, Modular Parallel Framework
- 5 Summer of Fun (2009)
- 6 BobSCEd Upgrade
- 7 (Old) To Do
- 8 Current Projects
- 9 Past Projects
- 10 General Stuff
- 11 Items Particular to a Specific Cluster
- 12 Curriculum Modules
- 13 Possible Future Projects
- 14 Archive
Al-Salam/CCG Downtime tasks
- LDAP Server migration from bs-new -> hopper
- yum update over all nodes
- Turn HT off
- PBS server on Hopper
How to use the PetaKit
- If you do not already have it, obtain the source for the PetaKit from the CVS repository on hopper (curriculum-modules/PetaKit).
- cd to the Subkits directory of PetaKit and run the area-subkit.sh to make an area subkit tarball or GalaxSee-subkit.sh to make a GalaxSee subkit tarball.
- scp the tarball to the target resource and unpack it.
- cd into the directory and run ./configure --with-mpi --with-openmp
- Use stat.pl and args_man to make an appropriate statistics run. See args_man for a description of predicates. Example:
perl -w stat.pl --program area --style serial,mpi,openmp,hybrid --scheduler lsf --user leemasa --problem_size 200000000000 --processes 1,2,3,4,5,6,7,8-16-64 --repetitions 10 -m -tag Sooner-strongest-newest --mpirun mpirun.lsf --ppn 8
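When several statistics runs are needed, the command line can be assembled programmatically. The following is a minimal Python sketch, not part of the PetaKit: the helper name `build_stat_command` is hypothetical, and it uses only the flags that appear in the example above, passing their values through without interpreting them (see args_man for what `-m` and the `--processes` range syntax mean).

```python
import shlex


def build_stat_command(program, styles, scheduler, user, problem_size,
                       processes, repetitions, tag, mpirun, ppn):
    """Hypothetical helper: build a stat.pl argument list.

    Only flags shown in the wiki example are used; values are passed
    through unchanged and are not validated here.
    """
    return [
        "perl", "-w", "stat.pl",
        "--program", program,
        "--style", ",".join(styles),
        "--scheduler", scheduler,
        "--user", user,
        "--problem_size", str(problem_size),
        "--processes", processes,
        "--repetitions", str(repetitions),
        "-m",
        "-tag", tag,  # single dash, exactly as written in the example
        "--mpirun", mpirun,
        "--ppn", str(ppn),
    ]


# Rebuild the example command from its parts.
cmd = build_stat_command(
    program="area",
    styles=["serial", "mpi", "openmp", "hybrid"],
    scheduler="lsf",
    user="leemasa",
    problem_size=200000000000,
    processes="1,2,3,4,5,6,7,8-16-64",
    repetitions=10,
    tag="Sooner-strongest-newest",
    mpirun="mpirun.lsf",
    ppn=8,
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` from the directory that contains stat.pl, or the helper can be called in a loop to vary the problem size.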
Date represents last meeting where we discussed the item
- Brad's Graphing Tool (28/Feb/10)
* Nice new functionality, see
* Waiting on clean data to finish multiple resource displays
* Error bars for left and right y-axes with checkboxes for each
- TeraGrid Runs (28/Feb/10)
In first box: Put the initials of whoever is doing the run
In second box: B = builds, R = runs, D = reports back to database, S = there is a good set of runs (10 per data point) for strong scaling in the database that appear on a graph, W = there is a good set of runs (10 per data point) for weak scaling in the database that appear on a graph
|area under curve||GalaxSee|
- Big Red: 750000000000
- Sooner: 200000000000
- BobSCEd: 18000000000
- ACL: 18000000000
- New cluster (5/Feb/10)
* wiki page
* Decommission Cairo
* Figure out how to mount on Telco Rack
* Get pdfs of all materials -- post them on wiki
- BCCD Testing (5/Feb/10)
* Get Fitz's liberation instructions into wiki
* Get Kevin's VirtualBox instructions into wiki
* pxe booting -- see if they booted, if you can ssh to them, if the run matrix works
* Send /etc/bccd-revision with each email
* Send output of netstat -rn and /sbin/ifconfig -a with each email
* Run Matrix
* For the future: scripts to boot & change bios, watchdog timer, 'test' mode in bccd, send emails about errors
* USB scripts -- we don't need the "copy" script
- SIGCSE Conference -- March 10-13 (28/Feb/10)
* Leaving 8:00 Wednesday
* Brad, Sam, or Gus pick up the van around 7, bring it by loading dock outside Noyes
* Posters -- new area runs for graphs, start implementing stats collection and OpenMP, print at small size (what is that?)
* Take 2 LittleFes, small switch, monitor/kyb/mouse (wireless), printed matter
- Spring Cleaning (Noyes Basement) (5/Feb/10)
* Next meeting: Saturday 6/Feb @ 3 pm
Generalized, Modular Parallel Framework
10,000 foot view of problems
|Problem||Parent Process Sends Out||Children Send Back||Results Compiled By|
|Area||function, bounds, segment size or count||sum of area for specified bounds||sum|
|GalaxSee||complete array of stars, bounds (which stars to compute)||an array containing the computed stars||construct a new array of stars and repeat for next time step|
|Matrix x Matrix||n rows from Matrix A and n columns from Matrix B, location of rows and cols||n resulting matrix position values, their location in results matrix||construct new result array|
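The Area row of the table can be sketched in plain Python. This is an illustrative sketch only, not the PetaKit implementation: the function names are hypothetical, the midpoint rule is an assumed integration method, and the "children" run one after another here, whereas under MPI or OpenMP each call to `child_area` would run on a separate rank or thread.

```python
import math


def split_bounds(lower, upper, workers):
    """Parent: divide [lower, upper] into one contiguous sub-interval per worker."""
    width = (upper - lower) / workers
    return [(lower + i * width, lower + (i + 1) * width) for i in range(workers)]


def child_area(func, lower, upper, segments):
    """Child: midpoint-rule area of func over the bounds it was given."""
    dx = (upper - lower) / segments
    return sum(func(lower + (i + 0.5) * dx) for i in range(segments)) * dx


def parallel_area(func, lower, upper, workers, segments_per_worker):
    """Parent sends (function, bounds, segment count); children return partial
    areas; the parent compiles the result by summing them."""
    pieces = split_bounds(lower, upper, workers)
    partials = [child_area(func, lo, hi, segments_per_worker) for lo, hi in pieces]
    return sum(partials)


# Area under sin(x) on [0, pi]; the exact value is 2.
estimate = parallel_area(math.sin, 0.0, math.pi, workers=4, segments_per_worker=1000)
print(estimate)
```

The GalaxSee and Matrix x Matrix rows follow the same send, compute, compile shape; only the payload and the final compile step differ.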
Visualizing Parallel Framework
Parallel Problem Space
- Dwarf (algorithm family)
- Style of parallelism (shared, distributed, GPGPU, hybrid)
- Tiling (mapping problem to work units to workers)
- Distribution algorithm (getting work units to workers)
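As one illustration of the tiling axis, the sketch below maps numbered work units to workers in two common ways: contiguous blocks and round-robin (cyclic) assignment. Both function names are hypothetical and the choice of these two schemes is an assumption for illustration; the notes above do not say which tilings the framework is meant to support.

```python
def block_tiling(num_units, num_workers):
    """Contiguous blocks; the first (num_units % num_workers) workers get one extra unit."""
    base, extra = divmod(num_units, num_workers)
    tiles, start = [], 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        tiles.append(list(range(start, start + size)))
        start += size
    return tiles


def cyclic_tiling(num_units, num_workers):
    """Round-robin: unit i goes to worker i % num_workers."""
    return [list(range(worker, num_units, num_workers)) for worker in range(num_workers)]


# Ten work units spread over three workers, both ways.
print(block_tiling(10, 3))
print(cyclic_tiling(10, 3))
```

Block tiling keeps neighbouring units together, which suits problems with locality; cyclic tiling tends to balance load better when the cost per unit varies along the index.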
Summer of Fun (2009)
What's in the database?
|GalaxSee (MPI)||area-under-curve (MPI, openmpi)||area-under-curve (Hybrid, openmpi)|
|acl0-5||bs0-5 GigE||bs0-5 IB||acl0-5||bs0-5 GigE||bs0-5 IB||acl0-5||bs0-5 GigE||bs0-5 IB|
What works so far? B = builds, R = runs, W = works
|area under curve||GalaxSee (standalone)|
- Fitz/Charlie's message
- Petascale review
- BobSCEd stress test
Implementations of area under the curve
- OpenMP (shared)
- MPI (message passing)
- MPI (hybrid mp and shared)
- OpenMP + MPI (hybrid)
- Good piece of code, serves as teaching example for n-body problems in petascale.
- Dials, knobs, etc. in place to easily control how work is distributed when running in parallel.
- Architecture generally supports hybrid model running on large-scale constellations.
- Produces runtime data that enables nice comparisons across multiple resources (scaling, speedup, efficiency).
- Render in BCCD, metaverse, and /dev/null environments.
- Serial version
- Improve performance on math?
- GalaxSee - scale to petascale with MPI and OpenMP hybrid.
- GalaxSee - render in-world and steer from in-world.
- Area under a curve - serial, MPI, and OpenMP implementations.
- OpenMPI - testing, performance.
- Start May 11th
- Touch screen interface
Notes from May 21, 2009 Review
- Combined Makefiles with defines to build on a particular platform
- Write a driver script for GalaxSee à la the area under the curve script; consider combining the two
- date, program_name, program_version, style, command line, compute_resource, NP, wall_time
- Document the process from start to finish
- Consider how we might iterate over e.g. number of stars, number of segments, etc.
- Command line option to stat.pl that provides a Torque wrapper for the scripts.
- Lint all code, consistent formatting
- Install latest and greatest Intel compiler in /cluster/bobsced
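The run-record fields listed in the review notes above (date, program_name, program_version, style, command line, compute_resource, NP, wall_time) are enough to derive speedup and efficiency. The sketch below is a hypothetical Python model of such a record, not the actual database schema, and every value in it (wall times, date, version, command line) is an invented placeholder for illustration only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One statistics run; field names follow the list in the review notes."""
    date: str
    program_name: str
    program_version: str
    style: str
    command_line: str
    compute_resource: str
    np: int            # number of processes
    wall_time: float   # seconds


def mean_wall_time(records, np):
    """Average wall time over all repetitions at a given process count."""
    return mean(r.wall_time for r in records if r.np == np)


def speedup_and_efficiency(records, np):
    """Speedup = T(1) / T(np); efficiency = speedup / np."""
    speedup = mean_wall_time(records, 1) / mean_wall_time(records, np)
    return speedup, speedup / np


def make_run(np, wall_time):
    # Placeholder metadata; only np and wall_time matter for this example.
    return RunRecord("(placeholder date)", "area", "(placeholder version)", "mpi",
                     "(placeholder command line)", "BobSCEd", np, wall_time)


# Invented wall times: two repetitions each at 1 and 4 processes.
runs = [make_run(1, 100.0), make_run(1, 102.0), make_run(4, 25.0), make_run(4, 25.5)]
speedup, efficiency = speedup_and_efficiency(runs, 4)
print(speedup, efficiency)
```

Averaging over repetitions before dividing mirrors the "10 per data point" convention used for the scaling runs.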
Build a new image for BobSCEd:
- One of the SuSE versions supported for Gaussian09 on EM64T [v11.1]. Supported platforms: Red Hat Enterprise Linux 5.3; SuSE Linux 9.3, 10.3, 11.1; or SuSE Linux Enterprise 10 (see G09 platform list). Note: CentOS 5.3 runs Gaussian binaries for RHEL OK.
- Firmware update?
- C3 tools and configuration [v4.0.1]
- Ganglia and configuration [v3.1.2]
- PBS and configuration [v2.3.16]
- /cluster/bobsced local to bs0
- /cluster/... passed-through to compute nodes
- Large local scratch space on each node
- WebMO and configuration [v9.1] - Gamess, Gaussian, Mopac, Tinker
- InfiniBand and configuration
- GNU toolchain with OpenMPI and MPICH [GCC v4.4.0], [OpenMPI v1.3.2] [MPICH v1.2.7p1]
- Intel toolchain with OpenMPI and native libraries
- Sage with do-dads (see Charlie)
- SystemImager for the client nodes?
Fix the broken nodes.
(Old) To Do
- v1.1 release - upgrade procedures
- Energy and Weather
- Dave's math modules
- Standard format, templates, how-to for V and V
- Explore machines from first Intel donation (notes and pictures)
- Build 4 SCED units
- Masa's GROMACS interface on Cairo
- gridgate configuration, Open Science Grid peering
- 2005-11-30 Meeting
- 2006-12-12 Meeting
- 2006-02-02 Meeting
- 2006-03-16 Meeting
- 2006-04-06 Meeting
- Node usage
- Numbers for Netgear switches
- LaTeX Poster Creation
- Bugzilla Etiquette
Items Particular to a Specific Cluster
- gprof - statistical source code profiler
- Fluid Dynamics
- Population Ecology
- GROMACS Web Interface
- Wiki Life for Academics
Possible Future Projects
- TeraGrid '06 (Indianapolis, June 12-15, 2006)
- SIAM Parallel Processing 2006 (San Francisco, February 22-24, 2006)