Cluster Information
Revision as of 13:49, 19 October 2009 (edited by Amweeden06)
Current To-Do
The date indicates the last meeting at which we discussed the item.
- Writeup for Jeff Krause -- see email -- Aaron, Sam, Fitz, Gus?, Charlie?
- Brad's Graphing Tool (19/Oct/09) -- Brad
* 2 y-axis scales (runtime and problem size)
* error bars for left and right y-axes with checkboxes for each
* eps file for the poster
- TeraGrid Runs (19/Oct/09) -- Aaron, Sam, Gus
* Update configure.ac based on Fitz's notes from Kraken
- New LittleFe Boards (19/Oct/09) -- Gus
* Order the boards -- Charlie
Board Criteria:
* Mini-ITX form factor
* at least 2 cores (probably at most 2 cores)
* 2 GB RAM
* CUDA-enabled (with chip on board); OpenCL O.K. too
- BCCD Testing (19/Oct/09) -- Gus, Sam, and Aaron
* How many nodes boot up as DHCP servers? (answer: just the first one, confirmed 10/13)
* Liberation testing -- Charlie, Gus, Sam, Aaron -- moved to Tuesday, 10/20 @ 2:30p
* Verify no I/O errors in dmesg
* Test all boot options, including linux 5
* Send /etc/bccd-revision to Skylar with each email
* Test MPICH2 on Life, paramspace, GalaxSee, etc.
* module unload openmpi && module load mpich2 && make clean && make
* for MPICH2, the machines file must be in the current directory
* do -np 2, 4, etc.
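The MPICH2 checklist in the BCCD testing item (rebuild against the mpich2 module, keep the machines file in the current directory, run at -np 2, 4, etc.) and the dmesg check can be scripted. This is a hypothetical sketch: the helper names, the `./life` binary path, and the `mpirun -np N -machinefile machines` launcher syntax are assumptions, not the BCCD's documented interface. It only builds and checks command lines; it does not launch any jobs.

```python
import os
from typing import List

# Shell line from the checklist: switch from OpenMPI to MPICH2 and rebuild.
REBUILD = "module unload openmpi && module load mpich2 && make clean && make"


def machines_file_present(directory: str = ".") -> bool:
    """MPICH2 runs need the machines file in the current directory."""
    return os.path.isfile(os.path.join(directory, "machines"))


def mpich2_commands(program: str, np_values: List[int]) -> List[str]:
    """Build one launch line per process count.

    ASSUMPTION: `mpirun -np N -machinefile machines` is the launcher
    syntax; adjust to whatever the BCCD image actually provides.
    """
    return [f"mpirun -np {n} -machinefile machines {program}" for n in np_values]


def io_errors(dmesg_text: str) -> List[str]:
    """Return dmesg lines that mention an I/O error (case-insensitive)."""
    return [line for line in dmesg_text.splitlines() if "i/o error" in line.lower()]


if __name__ == "__main__":
    if not machines_file_present():
        print("warning: no ./machines file; MPICH2 runs will not find the nodes")
    print(REBUILD)
    # Hypothetical program path; substitute Life, paramspace, GalaxSee, etc.
    for cmd in mpich2_commands("./life", [2, 4, 8]):
        print(cmd)
```

Keeping command generation separate from execution makes it easy to paste the lines into a shell or a Torque script once the real launcher flags are confirmed.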
- Travel Plans (12/Oct/09)
* Send flight info to Jeff Krause -- Charlie
* Talk to Travel-On -- Brad and Charlie
- Code Sanity Check (12/Oct/09) -- Aaron and Sam
* Take exact same set of runs on Sooner
- EC/SCED Poster Sessions (19/Oct/09) -- Aaron, Sam, Fitz, and Brad
* Describe the environment we've created -- how we obtain results is just as interesting as the results themselves
* Pick good graphs
* Add text of TeraGrid poster into CVS -- Sam
* Update Fitz's abstract -- Fitz
* EC -- Wednesday (10/21) @ 6 pm
* SCEd -- Combine TeraGrid and Fitz posters
Summer of Fun (2009)
An external doc for GalaxSee
Documentation for OpenSim GalaxSee
What's in the database?
| Program (np range) | acl0-5 | bs0-5 GigE | bs0-5 IB |
|---|---|---|---|
| GalaxSee (MPI) | 2-20 | 2-48 | 2-48 |
| area-under-curve (MPI, openmpi) | 2-12 | 2-48 | 2-48 |
| area-under-curve (Hybrid, openmpi) | 2-20 | 2-48 | 2-48 |
What works so far? B = builds, R = runs, W = works
| Resource | Area under curve: Serial | Area under curve: MPI | Area under curve: OpenMP | Area under curve: Hybrid | GalaxSee (standalone): Serial | GalaxSee (standalone): MPI | GalaxSee (standalone): OpenMP | GalaxSee (standalone): Hybrid |
|---|---|---|---|---|---|---|---|---|
| acls | BRW | BRW | BRW | BRW | BRW | | | |
| bobsced0 | BRW | BRW | BRW | BRW | BRW | | | |
| c13 | BRW | | | | | | | |
| pople | | | | | | | | |
| Charlie's laptop | BRW | | | | | | | |
To Do
- Fitz/Charlie's message
- Petascale review
- BobSCEd stress test
Implementations of area under the curve
- Serial
- OpenMP (shared)
- MPI (message passing)
- MPI (hybrid mp and shared)
- OpenMP + MPI (hybrid)
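As a reference point for the implementations listed above, here is a minimal serial sketch of area under a curve using the midpoint rule, plus a contiguous block decomposition of segments across ranks of the kind the MPI and hybrid versions need. The function names, the midpoint rule, and the block split are assumptions for illustration; the group's actual code may use a different rule or distribution. Each "rank" here is only a loop index, so the partial sums can be checked against the serial result without MPI installed.

```python
import math
from typing import Callable, Tuple


def area_serial(f: Callable[[float], float], a: float, b: float, segments: int) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / segments
    return sum(f(a + (i + 0.5) * width) for i in range(segments)) * width


def block_range(rank: int, nprocs: int, segments: int) -> Tuple[int, int]:
    """Contiguous [start, stop) slice of segment indices owned by one rank.

    The first `segments % nprocs` ranks get one extra segment, so the
    work differs by at most one segment between ranks.
    """
    base, extra = divmod(segments, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def area_partial(f: Callable[[float], float], a: float, b: float,
                 segments: int, rank: int, nprocs: int) -> float:
    """The piece of the sum one rank would compute before a reduction."""
    width = (b - a) / segments
    start, stop = block_range(rank, nprocs, segments)
    return sum(f(a + (i + 0.5) * width) for i in range(start, stop)) * width


if __name__ == "__main__":
    serial = area_serial(math.sin, 0.0, math.pi, 1000)
    # Summing the partials plays the role of MPI_Reduce with MPI_SUM.
    parallel = sum(area_partial(math.sin, 0.0, math.pi, 1000, r, 4) for r in range(4))
    print(serial, parallel)  # both close to 2.0
```

In an OpenMP version the same per-segment loop would be split across threads; in the hybrid versions each rank's block would be further divided among threads.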
GalaxSee Goals
- A good piece of code that serves as a teaching example for n-body problems at petascale.
- Dials, knobs, etc. in place to easily control how work is distributed when running in parallel.
- Architecture generally supports hybrid model running on large-scale constellations.
- Produces runtime data that enables nice comparisons across multiple resources (scaling, speedup, efficiency).
- Render in BCCD, metaverse, and /dev/null environments.
- Serial version
- Improve performance on math?
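The scaling comparisons named in the goals above usually reduce to two standard quantities: speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p, where T(p) is the wall time on p processes. A minimal sketch, assuming runtimes are keyed by process count and that a p = 1 run exists as the baseline (the function name and input format are hypothetical):

```python
from typing import Dict, Tuple


def speedup_efficiency(wall_times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each process count p to (speedup, efficiency).

    speedup    S(p) = T(1) / T(p)
    efficiency E(p) = S(p) / p
    Assumes wall_times contains a single-process baseline at key 1.
    """
    if 1 not in wall_times:
        raise ValueError("need a p = 1 baseline run")
    t1 = wall_times[1]
    result = {}
    for p, tp in sorted(wall_times.items()):
        s = t1 / tp
        result[p] = (s, s / p)
    return result


if __name__ == "__main__":
    # Illustrative numbers only, not measured GalaxSee data.
    for p, (s, e) in speedup_efficiency({1: 100.0, 2: 52.0, 4: 27.0}).items():
        print(f"np={p:3d}  speedup={s:5.2f}  efficiency={e:5.2f}")
```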
- GalaxSee - scale to petascale with an MPI and OpenMP hybrid.
- GalaxSee - render in-world and steer from in-world.
- Area under a curve - serial, MPI, and OpenMP implementations.
- OpenMPI - testing, performance.
- Start May 11th
LittleFe
- Testing
- Documentation
- Touch screen interface
Notes from May 21, 2009 Review
- Combined Makefiles with defines to build on a particular platform
- Write a driver script for GalaxSee along the lines of the area under the curve script; consider combining the two
- Schema
- date, program_name, program_version, style, command line, compute_resource, NP, wall_time
- Document the process from start to finish
- Consider how we might iterate over e.g. number of stars, number of segments, etc.
- Command line option to stat.pl that provides a Torque wrapper for the scripts.
- Lint all code and apply consistent formatting
- Install latest and greatest Intel compiler in /cluster/bobsced
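The schema in the review notes maps directly onto a single table. A minimal sketch using Python's built-in sqlite3 module, assuming SQLite as the backend (the notes don't name one), a table called `runs`, and that the "command line" field is stored as `command_line`. The `sweep` helper illustrates iterating over parameters such as number of stars and NP with `itertools.product`. All names beyond the listed schema fields are hypothetical, and the inserted values are placeholders rather than real results.

```python
import itertools
import sqlite3
from typing import Dict, Iterable, Iterator

# Columns follow the schema from the May 21 review notes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    date             TEXT,
    program_name     TEXT,
    program_version  TEXT,
    style            TEXT,     -- e.g. serial, mpi, openmp, hybrid
    command_line     TEXT,
    compute_resource TEXT,
    np               INTEGER,
    wall_time        REAL      -- seconds
)
"""


def record_run(conn: sqlite3.Connection, row: Dict[str, object]) -> None:
    """Insert one run; `row` must supply every schema column by name."""
    conn.execute(
        "INSERT INTO runs VALUES (:date, :program_name, :program_version, :style,"
        " :command_line, :compute_resource, :np, :wall_time)",
        row,
    )


def sweep(**params: Iterable) -> Iterator[Dict[str, object]]:
    """Yield every combination of the given parameter lists as a dict."""
    names = list(params)
    for values in itertools.product(*(list(params[n]) for n in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    for combo in sweep(stars=[1000, 2000], np=[2, 4]):
        # Placeholder values; a real driver would launch the job and time it.
        record_run(conn, {
            "date": "2009-05-21", "program_name": "GalaxSee", "program_version": "dev",
            "style": "mpi", "command_line": f"<hypothetical run with {combo['stars']} stars>",
            "compute_resource": "bobsced0", "np": combo["np"], "wall_time": 0.0,
        })
    print(conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0])  # 4
```

Storing one row per run keeps the scaling queries (e.g. wall_time grouped by compute_resource and np) simple.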
BobSCEd Upgrade
Build a new image for BobSCEd:
- One of the SuSE versions supported for Gaussian09 on EM64T [v11.1]. Supported platforms: Red Hat Enterprise Linux 5.3; SuSE Linux 9.3, 10.3, 11.1; or SuSE Linux Enterprise 10 (see G09 platform list). Note: CentOS 5.3 runs the Gaussian binaries for RHEL OK.
- Firmware update?
- C3 tools and configuration [v4.0.1]
- Ganglia and configuration [v3.1.2]
- PBS and configuration [v2.3.16]
- /cluster/bobsced local to bs0
- /cluster/... passed-through to compute nodes
- Large local scratch space on each node
- Gaussian09
- WebMO and configuration [v9.1] - GAMESS, Gaussian, MOPAC, Tinker
- InfiniBand and configuration
- GNU toolchain with OpenMPI and MPICH [GCC v4.4.0], [OpenMPI v1.3.2] [MPICH v1.2.7p1]
- Intel toolchain with OpenMPI and native libraries
- Sage with do-dads (see Charlie)
- Systemimager for the client nodes?
Installed:
Fix the broken nodes.
(Old) To Do
BCCD Liberation
- v1.1 release - upgrade procedures
Curriculum Modules
- POVRay
- GROMACS
- Energy and Weather
- Dave's math modules
- Standard format, templates, how-to for V and V
LittleFe
- Explore machines from first Intel donation (notes and pictures)
- Build 4 SCED units
Infrastructure
- Masa's GROMACS interface on Cairo
- gridgate configuration, Open Science Grid peering
- hopper
SC Education
- Scott's homework (see the message)
Current Projects
Past Projects
General Stuff
- Todo
- General
- Hopper
- Howto's
- Networking
- 2005-11-30 Meeting
- 2006-12-12 Meeting
- 2006-02-02 Meeting
- 2006-03-16 Meeting
- 2006-04-06 Meeting
- Node usage
- Numbers for Netgear switches
- Latex Poster Creation
- Bugzilla Etiquette
- Modules
Items Particular to a Specific Cluster
Curriculum Modules
Possible Future Projects
Archive
- TeraGrid '06 (Indianapolis, June 12-15, 2006)
- SIAM Parallel Processing 2006 (San Francisco, February 22-24, 2006)
- Conference webpage
- Little-Fe abstract
- Low Latency Kernel abstract
- Folding@Clusters
- Best practices for teaching parallel programming to science faculty (Charlie only)