Difference between revisions of "Cluster Information"
(134 intermediate revisions by 12 users not shown) | |||
Line 1: | Line 1: | ||
− | == | + | == Summer 2011 (draft) == |
− | * | + | Calendar: May 9th through August 19th |
− | ** [http://www. | + | |
− | ** [[Cluster: | + | Events: |
− | ** [[Cluster: | + | * Undergraduate Petascale Institute @ NCSA - 29 May through 11 June (tentative) |
+ | * Intermediate Parallel Programming and Distributed Computing workshop @ OU - 30 July through 6 August (tentative) | ||
+ | |||
+ | Projects: | ||
+ | * LittleFe Build-out @ Earlham - n weeks of x people's time | ||
+ | * New cluster assembly - n weeks of x people's time | ||
+ | * Petascale project work - n weeks of x people's time | ||
+ | |||
+ | Personal Schedules: | ||
+ | * Charlie: Out 3-12 June (beach), | ||
+ | * Ivan: | ||
+ | * Aaron: | ||
+ | * Fitz: | ||
+ | * Mobeen: | ||
+ | * Brad: | ||
+ | |||
+ | == Al-Salam/CCG Downtime tasks == | ||
+ | * LDAP Server migration from bs-new -> hopper | ||
+ | * yum update over all nodes | ||
+ | * Turn HT off | ||
+ | * PVFS | ||
+ | * PBS server on Hopper | ||
+ | |||
+ | == How to use the PetaKit == | ||
+ | # If you do not already have it, obtain the source for the PetaKit from the CVS repository on hopper (curriculum-modules/PetaKit). | ||
+ | # cd to the Subkits directory of PetaKit and run the area-subkit.sh to make an area subkit tarball or GalaxSee-subkit.sh to make a GalaxSee subkit tarball. | ||
+ | # scp the tarball to the target resource and unpack it. | ||
+ | # cd into the directory and run ./configure --with-mpi --with-openmp | ||
+ | # Use stat.pl and args_man to make an appropriate statistics run. See args_man for a description of predicates. Example: | ||
+ | perl -w stat.pl --program area --style serial,mpi,openmp,hybrid --scheduler lsf --user leemasa --problem_size 200000000000 | ||
+ | --processes 1,2,3,4,5,6,7,8-16-64 --repetitions 10 -m -tag Sooner-strongest-newest --mpirun mpirun.lsf --ppn 8 | ||
+ | [[Modifying Programs for Use with PetaKit]] | ||
+ | |||
+ | == Current To-Do == | ||
+ | Date represents last meeting where we discussed the item | ||
+ | |||
+ | * Brad's Graphing Tool (28/Feb/10) | ||
+ | * Nice new functionality, see | ||
+ | * Waiting on clean data to finish multiple resource displays | ||
+ | * Error bars for left and right y-axes with checkboxes for each | ||
+ | * TeraGrid Runs (28/Feb/10) | ||
+ | |||
+ | In first box: Put initial of who is doing run | ||
+ | |||
+ | In second box: B = builds, R = runs, D = reports back to database, S = there is a good set of runs (10 per data point) for strong scaling in the database that appear on a graph, W = there is a good set of runs (10 per data point) for weak scaling in the database that appear on a graph | ||
+ | {| class="wikitable" border="1" | ||
+ | ! rowspan="2" | | ||
+ | ! colspan="8" | area under curve | ||
+ | ! colspan="8" | GalaxSee | ||
+ | |- | ||
+ | ! colspan="2" | Serial | ||
+ | ! colspan="2" | MPI | ||
+ | ! colspan="2" | OpenMP | ||
+ | ! colspan="2" | Hybrid | ||
+ | ! colspan="2" | Serial | ||
+ | ! colspan="2" | MPI | ||
+ | ! colspan="2" | OpenMP | ||
+ | ! colspan="2" | Hybrid | ||
+ | |- | ||
+ | ! [http://cs.earlham.edu/~amweeden06/memory-heap-summary.pdf ACLs] | ||
+ | | Sam | ||
+ | | | ||
+ | | Sam | ||
+ | | | ||
+ | | Sam | ||
+ | | | ||
+ | | Sam | ||
+ | | | ||
+ | | AW | ||
+ | | | ||
+ | | AW | ||
+ | | | ||
+ | | AW | ||
+ | | | ||
+ | | AW | ||
+ | | | ||
+ | |- | ||
+ | ! [http://cs.earlham.edu/~amweeden06/memory-heap-summary.pdf BobSCEd] | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! [http://rc.uits.iu.edu/kb/index.php?kbID=aueo BigRed] | ||
+ | | Sam | ||
+ | | | ||
+ | | Sam | ||
+ | | | ||
+ | | Sam | ||
+ | | | ||
+ | | Sam | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! [http://www.oscer.ou.edu/resources.php#sooner Sooner] | ||
+ | | Sam | ||
+ | | | ||
+ | | Sam | ||
+ | | | ||
+ | |Sam | ||
+ | | | ||
+ | | Sam | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! [http://www.psc.edu/machines/sgi/altix/pople.php#hardware pople] | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | AW/CP | ||
+ | | | ||
+ | | AW/CP | ||
+ | | | ||
+ | | AW/CP | ||
+ | | | ||
+ | | AW/CP | ||
+ | | | ||
+ | |} | ||
+ | |||
+ | Problem-sizes | ||
+ | *Big Red: 750000000000 | ||
+ | *Sooner: 200000000000 | ||
+ | *BobSCEd: 18000000000 | ||
+ | *ACL: 18000000000 | ||
+ | |||
+ | |||
+ | * New cluster (5/Feb/10) | ||
+ | * [https://wiki.cs.earlham.edu/index.php/Al-salam wiki page] | ||
+ | * Decommission Cairo | ||
+ | * Figure out how to mount on Telco Rack | ||
+ | * Get pdfs of all materials -- post them on wiki | ||
+ | * BCCD Testing (5/Feb/10) | ||
+ | * Get Fitz's liberation instructions into wiki | ||
+ | * Get Kevin's VirtualBox instructions into wiki | ||
+ | * pxe booting -- see if they booted, if you can ssh to them, if the run matrix works | ||
+ | * Send /etc/bccd-revision with each email | ||
+ | * Send output of netstat -rn and /sbin/ifconfig -a with each email | ||
+ | * [http://bccd.net/ver3/wiki/index.php/Tests Run Matrix] | ||
+ | * For the future: scripts to boot & change bios, watchdog timer, 'test' mode in bccd, send emails about errors | ||
+ | * USB scripts -- we don't need the "copy" script | ||
+ | * SIGCSE Conference -- March 10-13 (28/Feb/10) | ||
+ | * Leaving 8:00 Wednesday | ||
+ | * Brad, Sam, or Gus pick up the van around 7, bring it by loading dock outside Noyes | ||
+ | * Posters -- new area runs for graphs, start implementing stats collection and OpenMP, print at small size (what is that?) | ||
+ | * Take 2 LittleFes, small switch, monitor/kyb/mouse (wireless), printed matter | ||
+ | * Spring Cleaning (Noyes Basement) (5/Feb/10) | ||
+ | * Next meeting: Saturday 6/Feb @ 3 pm | ||
+ | |||
+ | == Generalized, Modular Parallel Framework == | ||
+ | |||
+ | === 10,000 foot view of problems === | ||
+ | {| class="wikitable" border="1" | ||
+ | |+ this conceptual view may not reflect current code | ||
+ | |- | ||
+ | ! || Parent Process Sends Out || Children Send Back || Results Compiled By | ||
+ | |- | ||
+ | ! Area | ||
+ | |function, bounds, segment size or count || sum of area for specified bounds || sum | ||
+ | |- | ||
+ | ! GalaxSee | ||
+ | | complete array of stars, bounds (which stars to compute) || an array containing the computed stars|| construct a new array of stars and repeat for next time step | ||
+ | |- | ||
+ | ! Matrix x Matrix | ||
+ | | n rows from Matrix A and n columns from Matrix B, location of rows and cols || n resulting matrix position values, their location in results matrix || construct new result array | ||
+ | |} | ||
+ | |||
+ | === Visualizing Parallel Framework === | ||
+ | http://cs.earlham.edu/~carrick/parallel/parallelism-approaches.png | ||
+ | |||
+ | === Parallel Problem Space === | ||
+ | * Dwarf (algorithm family) | ||
+ | * Style of parallelism (shared, distributed, GPGPU, hybrid) | ||
+ | * Tiling (mapping problem to work units to workers) | ||
+ | * Distribution algorithm (getting work units to workers) | ||
+ | |||
+ | == Summer of Fun (2009) == | ||
+ | [[GalaxSee|An external doc for GalaxSee]]<br /> | ||
+ | [[Cluster:OSGal|Documentation for OpenSim GalaxSee]] | ||
+ | |||
+ | What's in the database? | ||
+ | {| class="wikitable" border="1" | ||
+ | ! rowspan ="2" | | ||
+ | ! colspan ="3" | GalaxSee (MPI) | ||
+ | ! colspan ="3" | area-under-curve (MPI, openmpi) | ||
+ | ! colspan ="3" | area-under-curve (Hybrid, openmpi) | ||
+ | |- | ||
+ | ! acl0-5 | ||
+ | ! bs0-5 GigE | ||
+ | ! bs0-5 IB | ||
+ | ! acl0-5 | ||
+ | ! bs0-5 GigE | ||
+ | ! bs0-5 IB | ||
+ | ! acl0-5 | ||
+ | ! bs0-5 GigE | ||
+ | ! bs0-5 IB | ||
+ | |- | ||
+ | | np X-XX | ||
+ | | 2-20 | ||
+ | | 2-48 | ||
+ | | 2-48 | ||
+ | | 2-12 | ||
+ | | 2-48 | ||
+ | | 2-48 | ||
+ | | 2-20 | ||
+ | | 2-48 | ||
+ | | 2-48 | ||
+ | |} | ||
+ | |||
+ | What works so far? B = builds, R = runs, W = works | ||
+ | {| class="wikitable" border="1" | ||
+ | ! rowspan="2" | | ||
+ | ! colspan="4" | area under curve | ||
+ | ! colspan="4" | GalaxSee (standalone) | ||
+ | |- | ||
+ | ! Serial | ||
+ | ! MPI | ||
+ | ! OpenMP | ||
+ | ! Hybrid | ||
+ | ! Serial | ||
+ | ! MPI | ||
+ | ! OpenMP | ||
+ | ! Hybrid | ||
+ | |- | ||
+ | ! acls | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | | ||
+ | | BR | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! bobsced0 | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | | ||
+ | | BR | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! c13 | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | BR | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! BigRed | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! Sooner | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | BRW | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! pople | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | ! Charlie's laptop | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | BR | ||
+ | | | ||
+ | | | ||
+ | |} | ||
+ | |||
+ | To Do | ||
+ | * Fitz/Charlie's message | ||
+ | * Petascale review | ||
+ | * BobSCEd stress test | ||
+ | |||
+ | Implementations of area under the curve | ||
+ | * Serial | ||
+ | * OpenMP (shared) | ||
+ | * MPI (message passing) | ||
+ | * MPI (hybrid mp and shared) | ||
+ | * OpenMP + MPI (hybrid) | ||
+ | |||
+ | GalaxSee Goals | ||
+ | * Good piece of code, serves as teaching example for n-body problems in petascale. | ||
+ | * Dials, knobs, etc. in place to easily control how work is distributed when running in parallel. | ||
+ | * Architecture generally supports hybrid model running on large-scale constellations. | ||
+ | * Produces runtime data that enables nice comparisons across multiple resources (scaling, speedup, efficiency). | ||
+ | * Render in BCCD, metaverse, and /dev/null environments. | ||
+ | * Serial version | ||
+ | * Improve performance on math? | ||
+ | |||
+ | GalaxSee - scale to petascale with MPI and OpenMP hybrid. | ||
+ | * GalaxSee - render in-world and steer from in-world. | ||
+ | * Area under a curve - serial, MPI, and OpenMP implementations. | ||
+ | * OpenMPI - testing, performance. | ||
+ | * Start May 11th | ||
+ | |||
+ | LittleFe | ||
+ | * Testing | ||
+ | * Documentation | ||
+ | * Touch screen interface | ||
+ | |||
+ | Notes from May 21, 2009 Review | ||
+ | * Combined Makefiles with defines to build on a particular platform | ||
+ | * Write a driver script for GalaxSee ala the area under the curve script, consider combining | ||
+ | * Schema | ||
+ | ** date, program_name, program_version, style, command line, compute_resource, NP, wall_time | ||
+ | * Document the process from start to finish | ||
+ | * Consider how we might iterate over e.g. number of stars, number of segments, etc. | ||
+ | * Command line option to stat.pl that provides a Torque wrapper for the scripts. | ||
+ | * Lint all code, consistent formatting | ||
+ | * Install latest and greatest Intel compiler in /cluster/bobsced | ||
+ | |||
+ | == BobSCEd Upgrade == | ||
+ | Build a new image for BobSCEd: | ||
+ | # One of the Suse versions supported for Gaussian09 on EM64T [v11.1] - Red Hat Enterprise Linux 5.3; SuSE Linux 9.3, 10.3, 11.1; or SuSE Linux Enterprise 10 (see [http://www.gaussian.com/g09_plat.htm G09 platform list]) <-- CentOS 5.3 runs Gaussian binaries for RHEL ok | ||
+ | # Firmware update? | ||
+ | # C3 tools and configuration [v4.0.1] | ||
+ | # Ganglia and configuration [v3.1.2] | ||
+ | # PBS and configuration [v2.3.16] | ||
+ | # /cluster/bobsced local to bs0 | ||
+ | # /cluster/... passed-through to compute nodes | ||
+ | # Large local scratch space on each node | ||
+ | # Gaussian09 | ||
+ | # WebMO and configuration [v9.1] - Gamess, Gaussian, Mopac, Tinker | ||
+ | # Infiniband and configuration | ||
+ | # GNU toolchain with OpenMPI and MPICH [GCC v4.4.0], [OpenMPI v1.3.2] [MPICH v1.2.7p1] | ||
+ | # Intel toolchain with OpenMPI and native libraries | ||
+ | # Sage with do-dads (see Charlie) | ||
+ | # Systemimager for the client nodes? | ||
+ | |||
+ | Installed: | ||
+ | * [[Cluster: New BobSCEd Install Log | New BobSCEd Install Log]] | ||
+ | * [[Cluster: New BobSCEd LDAP Log | New BobSCEd LDAP Log]] | ||
+ | * [[Cluster: Sage Chroot | Sage Chroot]] | ||
+ | |||
+ | Fix the broken nodes. | ||
+ | |||
+ | == (Old) To Do == | ||
+ | BCCD Liberation | ||
+ | * v1.1 release - upgrade procedures | ||
+ | |||
+ | Curriculum Modules | ||
+ | * POVRay | ||
+ | * GROMACS | ||
+ | * Energy and Weather | ||
+ | * Dave's math modules | ||
+ | * Standard format, templates, how-to for V and V | ||
+ | |||
+ | LittleFe | ||
+ | * Explore machines from first Intel donation ([[intel-lf-server|notes and pictures]]) | ||
+ | * Build 4 SCED units | ||
+ | |||
+ | Infrastructure | ||
+ | * Masa's GROMACS interface on Cairo | ||
+ | * gridgate configuration, Open Science Grid peering | ||
+ | * [[hopperprime|hopper']] | ||
+ | |||
+ | SC Education | ||
+ | * Scott's homework (see [[sc-education-homework-1|the message]]) | ||
+ | * SC10 [[sc10-brainstorming|brainstorming]] | ||
+ | |||
+ | == Current Projects == | ||
+ | * [[BCCD]] | ||
+ | * [[LittleFe Cluster|LittleFe]] | ||
+ | * [[Folding@Clusters|Folding@Clusters]] | ||
+ | * [[OpenMPI|Benchmarking OpenMPI]] | ||
− | == | + | == Past Projects == |
− | * [[ | + | * [[Cluster:Big-Fe|Big-FE]] |
− | + | * [[Cluster:LowLatency|Low Latency Linux Kernel]] | |
− | |||
− | * [[ | ||
== General Stuff == | == General Stuff == | ||
Line 17: | Line 436: | ||
* [[Cluster Howto's|Howto's]] | * [[Cluster Howto's|Howto's]] | ||
* [[Cluster:Networking|Networking]] | * [[Cluster:Networking|Networking]] | ||
+ | * [[Cluster:2005-11-30 Meeting|2005-11-30 Meeting]] | ||
+ | * [[Cluster:2006-12-12 Meeting|2006-12-12 Meeting]] | ||
+ | * [[Cluster:2006-02-02 Meeting|2006-02-02 Meeting]] | ||
+ | * [[Cluster:2006-03-16 Meeting|2006-03-16 Meeting]] | ||
+ | * [[Cluster:2006-04-06 Meeting|2006-04-06 Meeting]] | ||
+ | * [[Cluster:Node usage|Node usage]] | ||
+ | * [[Cluster:Netgear numbers|Numbers for Netgear switches]] | ||
+ | * [[Cluster:Latex poster creation|Latex Poster Creation]] | ||
+ | * [[Cluster:Bugzilla|Bugzilla Etiquette]] | ||
+ | * [[Cluster:Modules|Modules]] | ||
− | == | + | == Items Particular to a Specific Cluster == |
− | * [[ | + | * [[ACL Cluster|ACL]] |
− | * [[Cluster | + | * [[Al-salam|Al-Salam]] |
− | * [[ | + | * [[Athena Cluster|Athena]] |
+ | * [[Bazaar Cluster|Bazaar]] | ||
+ | * [[Cairo Cluster|Cairo]] | ||
+ | * [[Bobsced Cluster|Bobsced]] | ||
== Curriculum Modules == | == Curriculum Modules == | ||
+ | * [[Cluster:Gprof|gprof]] - statistical source code profiler | ||
* [[Cluster:Curriculum|Curriculum]] | * [[Cluster:Curriculum|Curriculum]] | ||
* [[Cluster:Fluid Dynamics|Fluid Dynamics]] | * [[Cluster:Fluid Dynamics|Fluid Dynamics]] | ||
* [[Cluster:Population Ecology|Population Ecology]] | * [[Cluster:Population Ecology|Population Ecology]] | ||
+ | * [[Cluster:GROMACS Web Interface|GROMACS Web Interface]] | ||
+ | * [[Cluster:Wiki|Wiki Life for Academics]] | ||
+ | * [[Cluster:PetaKit|PetaKit]] | ||
== Possible Future Projects == | == Possible Future Projects == | ||
* [[Cluster:Realtime Parallel Visualization|Realtime Parallel Visualization]] | * [[Cluster:Realtime Parallel Visualization|Realtime Parallel Visualization]] | ||
+ | |||
+ | == Archive == | ||
+ | * TeraGrid '06 (Indianapolis, June 12-15, 2006) | ||
+ | ** [http://www.teragrid.org Conference webpage] | ||
+ | ** [http://www.teragrid.org/events/2006conference/contest_poster.html Student poster guidelines] | ||
+ | ** [[Big-FE-teragrid-abstract|Big-FE abstract]] | ||
+ | |||
+ | * SIAM Parallel Processing 2006 (San Francisco, February 22-24, 2006) | ||
+ | ** [http://www.siam.org/meetings/pp06 Conference webpage] | ||
+ | ** [[Cluster:little-fe-siam-pp06-abstract|Little-Fe abstract]] | ||
+ | ** [[Cluster:llk-siam-pp06-abstract|Low Latency Kernel abstract]] | ||
+ | ** Folding@Clusters | ||
+ | ** Best practices for teaching parallel programming to science faculty (Charlie only) | ||
+ | |||
+ | * [[College Avenue]] |
Latest revision as of 07:29, 11 February 2011
Summer 2011 (draft)
Calendar: May 9th through August 19th
Events:
- Undergraduate Petascale Institute @ NCSA - 29 May through 11 June (tentative)
- Intermediate Parallel Programming and Distributed Computing workshop @ OU - 30 July through 6 August (tentative)
Projects:
- LittleFe Build-out @ Earlham - n weeks of x people's time
- New cluster assembly - n weeks of x people's time
- Petascale project work - n weeks of x people's time
Personal Schedules:
- Charlie: Out 3-12 June (beach),
- Ivan:
- Aaron:
- Fitz:
- Mobeen:
- Brad:
Al-Salam/CCG Downtime tasks
- LDAP Server migration from bs-new -> hopper
- yum update over all nodes
- Turn HT off
- PVFS
- PBS server on Hopper
How to use the PetaKit
- If you do not already have it, obtain the source for the PetaKit from the CVS repository on hopper (curriculum-modules/PetaKit).
- cd to the Subkits directory of PetaKit and run the area-subkit.sh to make an area subkit tarball or GalaxSee-subkit.sh to make a GalaxSee subkit tarball.
- scp the tarball to the target resource and unpack it.
- cd into the directory and run ./configure --with-mpi --with-openmp
- Use stat.pl and args_man to make an appropriate statistics run. See args_man for a description of predicates. Example:
perl -w stat.pl --program area --style serial,mpi,openmp,hybrid --scheduler lsf --user leemasa --problem_size 200000000000 --processes 1,2,3,4,5,6,7,8-16-64 --repetitions 10 -m -tag Sooner-strongest-newest --mpirun mpirun.lsf --ppn 8
Modifying Programs for Use with PetaKit
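The example invocation above can also be assembled programmatically, which helps when the same run is repeated across resources. The following is a minimal sketch, not part of PetaKit: the helper name `build_stat_command` is hypothetical, and it only reproduces the flags that appear in the example command without interpreting them (in particular, the `--processes` value is passed through as an opaque string).

```python
import shlex


def build_stat_command(program, styles, scheduler, user, problem_size,
                       processes, repetitions, tag, mpirun, ppn):
    """Assemble a stat.pl argument list (hypothetical helper).

    Only flags seen in the example command are used; their meaning is
    documented by args_man, not by this sketch.
    """
    return [
        "perl", "-w", "stat.pl",
        "--program", program,
        "--style", ",".join(styles),
        "--scheduler", scheduler,
        "--user", user,
        "--problem_size", str(problem_size),
        "--processes", processes,  # passed through unchanged
        "--repetitions", str(repetitions),
        "-m",
        "-tag", tag,
        "--mpirun", mpirun,
        "--ppn", str(ppn),
    ]


cmd = build_stat_command(
    program="area",
    styles=["serial", "mpi", "openmp", "hybrid"],
    scheduler="lsf",
    user="leemasa",
    problem_size=200000000000,
    processes="1,2,3,4,5,6,7,8-16-64",
    repetitions=10,
    tag="Sooner-strongest-newest",
    mpirun="mpirun.lsf",
    ppn=8,
)

# Print the command for review instead of executing it.
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` later, once the arguments have been checked against args_man.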
Current To-Do
Date represents last meeting where we discussed the item
- Brad's Graphing Tool (28/Feb/10)
  - Nice new functionality, see
  - Waiting on clean data to finish multiple resource displays
  - Error bars for left and right y-axes with checkboxes for each
- TeraGrid Runs (28/Feb/10)
In first box: Put initial of who is doing run
In second box: B = builds, R = runs, D = reports back to database, S = there is a good set of runs (10 per data point) for strong scaling in the database that appear on a graph, W = there is a good set of runs (10 per data point) for weak scaling in the database that appear on a graph
| | area under curve | | | | | | | | GalaxSee | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Serial | | MPI | | OpenMP | | Hybrid | | Serial | | MPI | | OpenMP | | Hybrid | |
| ACLs | Sam | | Sam | | Sam | | Sam | | AW | | AW | | AW | | AW | |
| BobSCEd | | | | | | | | | | | | | | | | |
| BigRed | Sam | | Sam | | Sam | | Sam | | | | | | | | | |
| Sooner | Sam | | Sam | | Sam | | Sam | | | | | | | | | |
| pople | | | | | | | | | AW/CP | | AW/CP | | AW/CP | | AW/CP | |
Problem-sizes
- Big Red: 750000000000
- Sooner: 200000000000
- BobSCEd: 18000000000
- ACL: 18000000000
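The S and W codes in the legend distinguish strong scaling (the total problem size stays fixed while the process count grows) from weak scaling (the problem size grows in proportion to the process count). A minimal sketch of how the two run plans differ follows. The process counts and the per-process size are hypothetical, and treating the Sooner figure as a fixed strong-scaling size is an assumption about how that number is used.

```python
REPETITIONS = 10  # "10 per data point", as in the legend


def strong_scaling_runs(problem_size, process_counts, repetitions=REPETITIONS):
    """Fixed total problem size for every process count."""
    return [(nprocs, problem_size)
            for nprocs in process_counts
            for _ in range(repetitions)]


def weak_scaling_runs(size_per_process, process_counts, repetitions=REPETITIONS):
    """Total problem size grows linearly with the process count."""
    return [(nprocs, size_per_process * nprocs)
            for nprocs in process_counts
            for _ in range(repetitions)]


SOONER_SIZE = 200000000000   # from the problem-size list
process_counts = [1, 2, 4]   # hypothetical process counts

strong = strong_scaling_runs(SOONER_SIZE, process_counts)
weak = weak_scaling_runs(1000000, process_counts)  # hypothetical per-process size

print(len(strong), "strong-scaling runs,", len(weak), "weak-scaling runs")
```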
- New cluster (5/Feb/10)
  - wiki page
  - Decommission Cairo
  - Figure out how to mount on Telco Rack
  - Get pdfs of all materials -- post them on wiki
- BCCD Testing (5/Feb/10)
  - Get Fitz's liberation instructions into wiki
  - Get Kevin's VirtualBox instructions into wiki
  - pxe booting -- see if they booted, if you can ssh to them, if the run matrix works
  - Send /etc/bccd-revision with each email
  - Send output of netstat -rn and /sbin/ifconfig -a with each email
  - Run Matrix
  - For the future: scripts to boot & change bios, watchdog timer, 'test' mode in bccd, send emails about errors
  - USB scripts -- we don't need the "copy" script
- SIGCSE Conference -- March 10-13 (28/Feb/10)
  - Leaving 8:00 Wednesday
  - Brad, Sam, or Gus pick up the van around 7, bring it by loading dock outside Noyes
  - Posters -- new area runs for graphs, start implementing stats collection and OpenMP, print at small size (what is that?)
  - Take 2 LittleFes, small switch, monitor/kyb/mouse (wireless), printed matter
- Spring Cleaning (Noyes Basement) (5/Feb/10)
  - Next meeting: Saturday 6/Feb @ 3 pm
Generalized, Modular Parallel Framework
10,000 foot view of problems
(This conceptual view may not reflect current code.)
| | Parent Process Sends Out | Children Send Back | Results Compiled By |
|---|---|---|---|
| Area | function, bounds, segment size or count | sum of area for specified bounds | sum |
| GalaxSee | complete array of stars, bounds (which stars to compute) | an array containing the computed stars | construct a new array of stars and repeat for next time step |
| Matrix x Matrix | n rows from Matrix A and n columns from Matrix B, location of rows and cols | n resulting matrix position values, their location in results matrix | construct new result array |
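The Area row of the table can be sketched as follows. This is an illustration of the parent/child pattern only, not the PetaKit code: threads from Python's standard library stand in for MPI ranks, the midpoint rule is an assumed integration method, and the names `child_area` and `parent_area` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def child_area(func, lower, upper, segments):
    """Child: receive function, bounds, and segment count; return the
    midpoint-rule area over [lower, upper]."""
    width = (upper - lower) / segments
    return sum(func(lower + (i + 0.5) * width) for i in range(segments)) * width


def parent_area(func, lower, upper, workers, segments_per_worker):
    """Parent: split the bounds, send one piece to each child, then
    compile the results by summing the partial areas."""
    span = (upper - lower) / workers
    bounds = [(lower + w * span, lower + (w + 1) * span) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda b: child_area(func, b[0], b[1], segments_per_worker),
            bounds,
        )
        return sum(partials)


# Area under x^2 on [0, 1]; the exact value is 1/3.
total = parent_area(lambda x: x * x, 0.0, 1.0, workers=4, segments_per_worker=1000)
print(total)
```

The GalaxSee and Matrix x Matrix rows follow the same shape; only the payload sent out, the payload returned, and the compile step change.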
Visualizing Parallel Framework
http://cs.earlham.edu/~carrick/parallel/parallelism-approaches.png
Parallel Problem Space
- Dwarf (algorithm family)
- Style of parallelism (shared, distributed, GPGPU, hybrid)
- Tiling (mapping problem to work units to workers)
- Distribution algorithm (getting work units to workers)
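As an illustration of the tiling dimension, the sketch below maps a problem of `n_items` work items onto contiguous work units, one per worker. The function name `tile` and the choice of contiguous, near-equal blocks are assumptions made for the example; other tilings (cyclic, 2D blocks) fit the same problem space.

```python
def tile(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) bounds.

    The first (n_items % n_workers) workers receive one extra item so that
    sizes differ by at most one.
    """
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for worker in range(n_workers):
        stop = start + base + (1 if worker < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


# Ten items (for example, stars) across three workers.
print(tile(10, 3))
```

A static distribution algorithm would simply hand bound number `w` to worker `w`; a dynamic one would hand out bounds on request as workers finish.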
Summer of Fun (2009)
An external doc for GalaxSee
Documentation for OpenSim GalaxSee
What's in the database?
| | GalaxSee (MPI) | | | area-under-curve (MPI, openmpi) | | | area-under-curve (Hybrid, openmpi) | | |
|---|---|---|---|---|---|---|---|---|---|
| | acl0-5 | bs0-5 GigE | bs0-5 IB | acl0-5 | bs0-5 GigE | bs0-5 IB | acl0-5 | bs0-5 GigE | bs0-5 IB |
| np X-XX | 2-20 | 2-48 | 2-48 | 2-12 | 2-48 | 2-48 | 2-20 | 2-48 | 2-48 |
What works so far? B = builds, R = runs, W = works
| | area under curve | | | | GalaxSee (standalone) | | | |
|---|---|---|---|---|---|---|---|---|
| | Serial | MPI | OpenMP | Hybrid | Serial | MPI | OpenMP | Hybrid |
| acls | BRW | BRW | BRW | BRW | | BR | | |
| bobsced0 | BRW | BRW | BRW | BRW | | BR | | |
| c13 | | | | | | BR | | |
| BigRed | BRW | BRW | BRW | BRW | | | | |
| Sooner | BRW | BRW | BRW | BRW | | | | |
| pople | | | | | | | | |
| Charlie's laptop | | | | | | BR | | |
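The status codes in this table are compact enough to process mechanically. The sketch below decodes them and lists the resources where all four area-under-curve styles are marked as working. The dictionary simply restates the area-under-curve half of the table; the names `STATUS_CODES`, `AREA_STATUS`, and `decode_status` are hypothetical.

```python
# Legend for this table: B = builds, R = runs, W = works.
STATUS_CODES = {"B": "builds", "R": "runs", "W": "works"}

# Area-under-curve cells (Serial, MPI, OpenMP, Hybrid), copied from the table.
AREA_STATUS = {
    "acls": ["BRW", "BRW", "BRW", "BRW"],
    "bobsced0": ["BRW", "BRW", "BRW", "BRW"],
    "c13": ["", "", "", ""],
    "BigRed": ["BRW", "BRW", "BRW", "BRW"],
    "Sooner": ["BRW", "BRW", "BRW", "BRW"],
    "pople": ["", "", "", ""],
    "Charlie's laptop": ["", "", "", ""],
}


def decode_status(cell):
    """Expand a cell such as 'BR' into ['builds', 'runs']."""
    return [STATUS_CODES[letter] for letter in cell.strip()]


fully_working = [
    resource
    for resource, cells in AREA_STATUS.items()
    if all("W" in cell for cell in cells)
]

print(decode_status("BR"))
print(fully_working)
```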
To Do
- Fitz/Charlie's message
- Petascale review
- BobSCEd stress test
Implementations of area under the curve
- Serial
- OpenMP (shared)
- MPI (message passing)
- MPI (hybrid mp and shared)
- OpenMP + MPI (hybrid)
GalaxSee Goals
- Good piece of code, serves as teaching example for n-body problems in petascale.
- Dials, knobs, etc. in place to easily control how work is distributed when running in parallel.
- Architecture generally supports hybrid model running on large-scale constellations.
- Produces runtime data that enables nice comparisons across multiple resources (scaling, speedup, efficiency).
- Render in BCCD, metaverse, and /dev/null environments.
- Serial version
- Improve performance on math?
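The comparisons named in the goals above (speedup and efficiency) use the standard definitions: speedup is serial wall time divided by parallel wall time, and efficiency is speedup divided by the number of processes. A minimal sketch follows; the wall times are made-up numbers for illustration, not measurements from any of the resources.

```python
def speedup(serial_time, parallel_time):
    """Speedup S = T_serial / T_parallel."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, processes):
    """Efficiency E = S / p, where p is the number of processes."""
    return speedup(serial_time, parallel_time) / processes


# Hypothetical wall times in seconds, for illustration only.
t_serial = 120.0
t_parallel = 20.0
nprocs = 8

print("speedup:", speedup(t_serial, t_parallel))
print("efficiency:", efficiency(t_serial, t_parallel, nprocs))
```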
- GalaxSee - scale to petascale with MPI and OpenMP hybrid.
- GalaxSee - render in-world and steer from in-world.
- Area under a curve - serial, MPI, and OpenMP implementations.
- OpenMPI - testing, performance.
- Start May 11th
LittleFe
- Testing
- Documentation
- Touch screen interface
Notes from May 21, 2009 Review
- Combined Makefiles with defines to build on a particular platform
- Write a driver script for GalaxSee à la the area-under-the-curve script; consider combining the two
- Schema
  - date, program_name, program_version, style, command line, compute_resource, NP, wall_time
- Document the process from start to finish
- Consider how we might iterate over e.g. number of stars, number of segments, etc.
- Command line option to stat.pl that provides a Torque wrapper for the scripts.
- Lint all code, consistent formatting
- Install latest and greatest Intel compiler in /cluster/bobsced
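The schema item in the notes above could be realized as a single table. The sketch below uses an in-memory SQLite database purely for illustration; the actual database engine is not stated in these notes, the column types are assumptions, `command line` is renamed `command_line` to make a valid identifier, and the two inserted rows are invented sample data.

```python
import sqlite3

# Columns follow the schema listed in the review notes; types are assumed.
SCHEMA = """
CREATE TABLE runs (
    date             TEXT    NOT NULL,
    program_name     TEXT    NOT NULL,
    program_version  TEXT,
    style            TEXT    NOT NULL,
    command_line     TEXT,
    compute_resource TEXT    NOT NULL,
    np               INTEGER NOT NULL,
    wall_time        REAL    NOT NULL
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Invented sample rows, not real measurements.
sample_rows = [
    ("2009-05-21", "area", "0.1", "mpi", "mpirun -np 2 ./area", "bobsced", 2, 10.0),
    ("2009-05-21", "area", "0.1", "mpi", "mpirun -np 2 ./area", "bobsced", 2, 12.0),
]
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sample_rows)

# Average wall time per (resource, process count): one data point on a graph.
rows = conn.execute(
    "SELECT compute_resource, np, AVG(wall_time) "
    "FROM runs GROUP BY compute_resource, np"
).fetchall()
print(rows)
```

Storing one row per run, rather than a precomputed average, keeps the repetitions available for the error bars mentioned under the graphing tool.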
BobSCEd Upgrade
Build a new image for BobSCEd:
- One of the SuSE versions supported for Gaussian09 on EM64T [v11.1]. Supported platforms: Red Hat Enterprise Linux 5.3; SuSE Linux 9.3, 10.3, 11.1; or SuSE Linux Enterprise 10 (see G09 platform list). Note: CentOS 5.3 runs the Gaussian binaries for RHEL without problems.
- Firmware update?
- C3 tools and configuration [v4.0.1]
- Ganglia and configuration [v3.1.2]
- PBS and configuration [v2.3.16]
- /cluster/bobsced local to bs0
- /cluster/... passed-through to compute nodes
- Large local scratch space on each node
- Gaussian09
- WebMO and configuration [v9.1] - Gamess, Gaussian, Mopac, Tinker
- Infiniband and configuration
- GNU toolchain with OpenMPI and MPICH [GCC v4.4.0], [OpenMPI v1.3.2] [MPICH v1.2.7p1]
- Intel toolchain with OpenMPI and native libraries
- Sage with do-dads (see Charlie)
- Systemimager for the client nodes?
Installed:
- New BobSCEd Install Log
- New BobSCEd LDAP Log
- Sage Chroot
Fix the broken nodes.
(Old) To Do
BCCD Liberation
- v1.1 release - upgrade procedures
Curriculum Modules
- POVRay
- GROMACS
- Energy and Weather
- Dave's math modules
- Standard format, templates, how-to for V and V
LittleFe
- Explore machines from first Intel donation (notes and pictures)
- Build 4 SCED units
Infrastructure
- Masa's GROMACS interface on Cairo
- gridgate configuration, Open Science Grid peering
- hopper'
SC Education
- Scott's homework (see the message)
- SC10 brainstorming
Current Projects
- BCCD
- LittleFe
- Folding@Clusters
- Benchmarking OpenMPI
Past Projects
- Big-FE
- Low Latency Linux Kernel
General Stuff
- Todo
- General
- Hopper
- Howto's
- Networking
- 2005-11-30 Meeting
- 2006-12-12 Meeting
- 2006-02-02 Meeting
- 2006-03-16 Meeting
- 2006-04-06 Meeting
- Node usage
- Numbers for Netgear switches
- Latex Poster Creation
- Bugzilla Etiquette
- Modules
Items Particular to a Specific Cluster
- ACL
- Al-Salam
- Athena
- Bazaar
- Cairo
- Bobsced
Curriculum Modules
- gprof - statistical source code profiler
- Curriculum
- Fluid Dynamics
- Population Ecology
- GROMACS Web Interface
- Wiki Life for Academics
- PetaKit
Possible Future Projects
- Realtime Parallel Visualization
Archive
- TeraGrid '06 (Indianapolis, June 12-15, 2006)
  - Conference webpage
  - Student poster guidelines
  - Big-FE abstract
- SIAM Parallel Processing 2006 (San Francisco, February 22-24, 2006)
  - Conference webpage
  - Little-Fe abstract
  - Low Latency Kernel abstract
  - Folding@Clusters
  - Best practices for teaching parallel programming to science faculty (Charlie only)
- College Avenue