Difference between revisions of "Cluster Information"

From Earlham CS Department
(Current To-Do)
Line 7:
     * error bars for left and right y-axes with checkboxes for each
     * eps file for the poster
- * TeraGrid Runs  (9/Nov/09)
+ * TeraGrid Runs  (1/Dec/09)
     * Check new GalaxSee on BobSCEd
     * Update configure.ac based on Fitz's notes from Kraken
Line 15:
       * Double check on BobSCEd, find the algorithm problem
  * New LittleFe Boards  (19/Oct/09)
- * BCCD Testing  (9/Nov/09)
+ * BCCD Testing  (1/Dec/09)
     * Liberation testing
     * pxe booting
Line 26:
       * for MPICH2, machines file must be in current directory
       * do -np 2, 4, etc.
- Most important -- Runs, BCCD testing,
  == Summer of Fun (2009) ==

Revision as of 15:58, 1 December 2009

Current To-Do

The date on each item is the last meeting at which we discussed it.

  • Writeups for Jeff Krause (3/Nov/09)
  • Brad's Graphing Tool (19/Oct/09)
 * 2 y-axis scales (runtime and problem size)
 * error bars for left and right y-axes with checkboxes for each
 * eps file for the poster
  • TeraGrid Runs (1/Dec/09)
 * Check new GalaxSee on BobSCEd
 * Update configure.ac based on Fitz's notes from Kraken
 * Status on pople?
 * Remember Big Red has a debug queue
 * GalaxSee: after the stats changes, GalaxSee slowed down considerably.
   * Double check on BobSCEd, find the algorithm problem
  • New LittleFe Boards (19/Oct/09)
  • BCCD Testing (1/Dec/09)
 * Liberation testing
 * PXE booting
 * Test all boot options
 * Test CUDA boot options
 * Send /etc/bccd-revision with each email
 * Send output of netstat -rn and /sbin/ifconfig -a with each email
 * Test MPICH2 on Life, paramspace, GalaxSee, etc.
   * module unload openmpi && module load mpich2 && make clean && make
   * for MPICH2, machines file must be in current directory
   * do -np 2, 4, etc.
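
The MPICH2 test steps above can be sketched as a small helper that refuses to proceed unless the machines file is in the current directory and then builds one launch command per process count. This is only a sketch: the helper name, the `./Life` binary path, and the use of `mpirun -np N -machinefile machines` are assumptions for illustration, not something these notes prescribe. The helper only builds the argument lists; it does not launch anything.

```python
from pathlib import Path


def mpich2_commands(program, np_values=(2, 4), machines="machines", cwd="."):
    """Build one launch command per process count (hypothetical helper).

    Raises FileNotFoundError when the machines file is not in `cwd`,
    mirroring the note that MPICH2 needs it in the current directory.
    """
    machines_path = Path(cwd) / machines
    if not machines_path.is_file():
        raise FileNotFoundError(f"no machines file at {machines_path}")
    # Assumed command shape: mpirun -np N -machinefile machines PROGRAM
    return [
        ["mpirun", "-np", str(n), "-machinefile", machines, program]
        for n in np_values
    ]
```

Each returned list could be handed to `subprocess.run(cmd, cwd=...)` after the `module unload openmpi && module load mpich2 && make clean && make` step has been done in the shell.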

Summer of Fun (2009)

An external doc for GalaxSee
Documentation for OpenSim GalaxSee

What's in the database?

np ranges (X-XX) by program and resource:

Program                              acl0-5   bs0-5 GigE   bs0-5 IB
GalaxSee (MPI)                       2-20     2-48         2-48
area-under-curve (MPI, openmpi)      2-12     2-48         2-48
area-under-curve (Hybrid, openmpi)   2-20     2-48         2-48

What works so far? B = builds, R = runs, W = works

Column groups: area under curve (Serial, MPI, OpenMP, Hybrid) and GalaxSee standalone (Serial, MPI, OpenMP, Hybrid)
acls BRW BRW BRW BRW BRW
bobsced0 BRW BRW BRW BRW BRW
c13 BRW
pople
Charlie's laptop BRW

To Do

  • Fitz/Charlie's message
  • Petascale review
  • BobSCEd stress test

Implementations of area under the curve

  • Serial
  • OpenMP (shared)
  • MPI (message passing)
  • MPI (hybrid mp and shared)
  • OpenMP + MPI (hybrid)
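
As a rough illustration of what the implementations listed above have in common, here is a minimal serial sketch in Python, followed by the same sum split into contiguous blocks of segments, one block per worker. The midpoint rule, the function names, and the block decomposition are assumptions chosen for illustration; these notes do not say which rule or decomposition the actual programs use. The partitioned version still runs serially: the loop over `rank` stands in for what separate MPI ranks or OpenMP threads would do, and the running total stands in for a reduction.

```python
def area_serial(f, a, b, segments):
    """Midpoint-rule estimate of the area under f on [a, b]."""
    width = (b - a) / segments
    return sum(f(a + (i + 0.5) * width) for i in range(segments)) * width


def area_partitioned(f, a, b, segments, workers):
    """Same estimate, with the segments dealt out in contiguous blocks."""
    width = (b - a) / segments
    total = 0.0
    for rank in range(workers):
        # Worker `rank` owns segments [start, stop).
        start = rank * segments // workers
        stop = (rank + 1) * segments // workers
        partial = sum(f(a + (i + 0.5) * width) for i in range(start, stop)) * width
        total += partial  # stands in for a reduction across workers
    return total
```

For example, `area_serial(lambda x: x * x, 0.0, 1.0, 1000)` approximates 1/3, and `area_partitioned` with any worker count should agree with it up to floating-point rounding.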

GalaxSee Goals

  • A good piece of code that serves as a teaching example for n-body problems at petascale.
  • Dials, knobs, etc. in place to easily control how work is distributed when running in parallel.
  • Architecture generally supports hybrid model running on large-scale constellations.
  • Produces runtime data that enables nice comparisons across multiple resources (scaling, speedup, efficiency).
  • Render in BCCD, metaverse, and /dev/null environments.
  • Serial version
  • Improve performance on math?
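
The scaling, speedup, and efficiency comparisons named in the goals above can be derived from wall times alone. The sketch below assumes the usual definitions (speedup = time on 1 process / time on NP processes, efficiency = speedup / NP) and a hypothetical input shape, a dict from NP to wall time in seconds; neither is taken from the actual stats scripts.

```python
def scaling_table(wall_times):
    """Return (np, speedup, efficiency) rows from a dict of NP -> wall time.

    The dict must contain an entry for NP = 1, which serves as the baseline.
    """
    base = wall_times[1]
    rows = []
    for np_count in sorted(wall_times):
        speedup = base / wall_times[np_count]
        efficiency = speedup / np_count
        rows.append((np_count, speedup, efficiency))
    return rows
```

With made-up placeholder timings such as `{1: 100.0, 2: 50.0, 4: 40.0}`, the rows come out as `(1, 1.0, 1.0)`, `(2, 2.0, 1.0)`, and `(4, 2.5, 0.625)`. Running the same function over timings from different resources would give directly comparable columns.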

  • GalaxSee - scale to petascale with an MPI and OpenMP hybrid.
  • GalaxSee - render in-world and steer from in-world.
  • Area under a curve - serial, MPI, and OpenMP implementations.
  • OpenMPI - testing, performance.
  • Start May 11th

LittleFe

  • Testing
  • Documentation
  • Touch screen interface

Notes from May 21, 2009 Review

  • Combined Makefiles with defines to build on a particular platform
  • Write a driver script for GalaxSee à la the area under the curve script; consider combining the two
  • Schema
    • date, program_name, program_version, style, command line, compute_resource, NP, wall_time
  • Document the process from start to finish
  • Consider how we might iterate over e.g. number of stars, number of segments, etc.
  • Command line option to stat.pl that provides a Torque wrapper for the scripts.
  • Lint all code, consistent formatting
  • Install latest and greatest Intel compiler in /cluster/bobsced
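
To make the schema and the parameter-iteration ideas from the review notes above concrete, here is a minimal Python sketch. The field names follow the schema bullet, with `command line` and `NP` adapted to the identifiers `command_line` and `np`; the types, the CSV output, and the `sweep` helper are assumptions for illustration and not how stat.pl or the database actually work.

```python
import csv
import io
import itertools
from dataclasses import astuple, dataclass, fields


@dataclass
class RunRecord:
    """One timing result, with fields taken from the schema bullet."""
    date: str
    program_name: str
    program_version: str
    style: str
    command_line: str
    compute_resource: str
    np: int
    wall_time: float


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f.name for f in fields(RunRecord)])
    for record in records:
        writer.writerow(astuple(record))
    return buf.getvalue()


def sweep(**axes):
    """Yield one dict per combination of the given parameter values."""
    names = list(axes)
    for combo in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, combo))
```

A driver script could loop over something like `sweep(stars=[100, 1000], np=[2, 4])` (hypothetical parameter names), run the program once per combination, and append one `RunRecord` per run.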

BobSCEd Upgrade

Build a new image for BobSCEd:

  1. One of the SuSE versions supported for Gaussian09 on EM64T [v11.1]. Supported platforms: Red Hat Enterprise Linux 5.3; SuSE Linux 9.3, 10.3, 11.1; or SuSE Linux Enterprise 10 (see G09 platform list). Note: CentOS 5.3 runs the Gaussian binaries for RHEL OK.
  2. Firmware update?
  3. C3 tools and configuration [v4.0.1]
  4. Ganglia and configuration [v3.1.2]
  5. PBS and configuration [v2.3.16]
  6. /cluster/bobsced local to bs0
  7. /cluster/... passed-through to compute nodes
  8. Large local scratch space on each node
  9. Gaussian09
  10. WebMO and configuration [v9.1] - Gamess, Gaussian, Mopac, Tinker
  11. Infiniband and configuration
  12. GNU toolchain with OpenMPI and MPICH [GCC v4.4.0], [OpenMPI v1.3.2] [MPICH v1.2.7p1]
  13. Intel toolchain with OpenMPI and native libraries
  14. Sage with do-dads (see Charlie)
  15. Systemimager for the client nodes?

Installed:

Fix the broken nodes.

(Old) To Do

BCCD Liberation

  • v1.1 release - upgrade procedures

Curriculum Modules

  • POVRay
  • GROMACS
  • Energy and Weather
  • Dave's math modules
  • Standard format, templates, how-to for V and V

LittleFe

Infrastructure

  • Masa's GROMACS interface on Cairo
  • gridgate configuration, Open Science Grid peering
  • hopper

SC Education

Current Projects

Past Projects

General Stuff

Items Particular to a Specific Cluster

Curriculum Modules

Possible Future Projects

Archive