Difference between revisions of "Cluster:Todo"

From Earlham CS Department
Latest revision as of 13:02, 20 August 2008

(Need a notation for relative priority. Please don't delete anything unless we're updating this during a meeting.)


Quick Test

Current Items (updated November 9, 2005)

Little-Fe

  • Liberate BCCD onto Little-Fe; in progress, see the BCCD/PPC wiki for details (Kevin and Toby)
    • Use singularity
  • Send email to Paul about BCCD changes (Toby)
  • Set up archive for list-packages
  • Get test clusters liberated
  • Get qemu or UML working for test environment (Skylar/Kevin)
  • Talk about Bugzilla setup for liberation

LLK (see Cluster:LowLatency for the details)

  • Note: migration of this list to Bugzilla is in progress; here's the current llk bug list: http://cluster.earlham.edu/bugzilla/buglist.cgi?query_format=specific&order=relevance+desc&bug_status=__open__&product=llk
  • Check Alteon drivers for STP: do they support cairo?
  • Find statistics information (Skylar, Alex)
  • Read the STP paper; emulate its test methodology/program? (Everyone)
  • Is there a 2.6 version of STP? (Only from us.) SGI? (Not likely.) (Skylar)
  • Look at separate socket implementations (Skylar)
  • Look at NetPIPE calls for STP help (Skylar)
  • Measure latency in kernel and on wire using either kperf or tp_timer (Alex and Toby)
  • Investigate tp_timer instabilities (Toby, Alex)
    • Test accuracy by loading one of the nodes with CPU and disk traffic (lots)
    • Set up and document this, along with kernel building/loading/starting, so that any of us can make a change and a measurement.
    • Measure packet loss rate at each node and the switches and hopper using SNMP/Cricket (Skylar)
    • Measure bit error rate (Skylar)
    • Use a structure/array for tp_ routines (Alex and Toby)
  • Figure out linking (AlexL)
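
For the packet loss item above, one possible way to turn two counter readings into a loss rate is sketched below. It assumes 32-bit wrapping packet counters (as SNMP Counter32 values are) read at the start and end of a test interval; which counters to poll on the nodes, switches, and hopper is not specified here, and the function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: counters are 32-bit and wrap to zero


def counter_delta(before, after, modulus=COUNTER32_MODULUS):
    """Difference between two readings of a wrapping counter."""
    return (after - before) % modulus


def loss_rate(sent_before, sent_after, recv_before, recv_after):
    """Fraction of packets sent during the interval that were not received."""
    sent = counter_delta(sent_before, sent_after)
    received = counter_delta(recv_before, recv_after)
    if sent == 0:
        return 0.0  # nothing was sent, so nothing can have been lost
    lost = max(sent - received, 0)  # clamp in case the readings are skewed
    return lost / sent
```

The modulo handles a single counter wrap between readings; if the polling interval is long enough for a counter to wrap more than once, the result is not meaningful.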

Folding@Clusters

  • Build a 3.1.4 CVS export with instructions and park it in ~pande for Guha (JoshM)
  • Test a range of molecules, clusters, and sizes with Alex's scripts and PBS/Maui (Skylar)
  • Plumbing
  • Figure out how to get the poster in one go. LaTeX? (Skylar)
  • On bazaar: ccache & distcc with wiki howto (Skylar, done)
  • On cairo: distcc (currently installed but not running) with wiki howto (Skylar, done). Pull image too (Skylar)
  • Investigate cairo's network delays. Switches? sshd? Timer reset? IPF? (Skylar, Toby)
  • Set up test flows (UDP, ICMP) between hopper and cairo to measure latency. (Skylar)
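
As a rough illustration of the UDP latency test in the last item, here is a minimal sketch that times UDP round trips against an echo server. It is not the group's test setup: the echo server, function names, payload size, and probe count are all assumptions, and the demo runs over loopback rather than between hopper and cairo.

```python
import socket
import threading
import time


def start_echo_server(host="127.0.0.1"):
    """Start a UDP echo server on an ephemeral port; return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))

    def serve():
        while True:
            try:
                data, addr = sock.recvfrom(2048)
            except OSError:
                break  # socket was closed
            sock.sendto(data, addr)

    threading.Thread(target=serve, daemon=True).start()
    return sock, sock.getsockname()[1]


def measure_rtt(host, port, count=20, payload=b"x" * 64, timeout=1.0):
    """Return round-trip times in seconds; lost or mismatched probes are skipped."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for seq in range(count):
            message = seq.to_bytes(4, "big") + payload  # sequence number + padding
            start = time.perf_counter()
            client.sendto(message, (host, port))
            try:
                reply, _ = client.recvfrom(2048)
            except socket.timeout:
                continue  # treat as a lost probe
            if reply == message:
                rtts.append(time.perf_counter() - start)
    return rtts


if __name__ == "__main__":
    server, port = start_echo_server()
    samples = measure_rtt("127.0.0.1", port)
    if samples:
        print(f"{len(samples)} replies, min RTT {min(samples) * 1e6:.1f} us")
    server.close()
```

For a test between two machines, the echo server would run on one host and measure_rtt on the other. User-space timing like this includes scheduler and socket overhead, so it only bounds the on-wire latency from above.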

Aug 26, 2005 Meeting Minutes

  • Charlie, Toby, Alex (3 credits), Kevin (2 credits), Josh, Skylar (3 credits)
  • SIAM PP06
    • Abstracts due September 30, 2005; conference is February 22-24.
    • Low Latency Kernel
      • Collect papers, read, and discuss next Wednesday. Keep the wiki entry, papers, Beowulf posts, and kernel source in a central place. Literature search: scholar.google.com, kernel development How-Tos, kernel mailing list, Beowulf mailing list; Toby to ask.
    • Little-Fe
      • BCCD with scripts to do mods for diskless booting
      • Write-up with pedagogical stuff and curriculum modules (list-packages)
  • Other
    • Clean and organize Wiki
    • Clean and organize Recompute/CCG lab

Plumbing

  • Whiteboard(s) for 4th (Charlie)
  • Change Vijay's password (JoshM)
  • Fiber uplink for bazaar (Charlie)
  • Return GBIC module (Alex)
  • Cool Athena in the display cabinet
  • Construct shelving for Athena
  • Figure out why CVS commit emails don't always appear
  • Set up POV-Ray on Athena (Skylar)
  • Updates to http://cluster.earlham.edu
    • Add link to cluster wiki page
    • Lose the RSS feed; leave a single link to MT, the last MT entry
    • News: SIAM posters, bccd.cs.uni.edu, others?
    • Make the General and Resources link sets horizontal instead of vertical.
    • Add link to Resources called documents (add static link to last MT entry and link to wiki)
    • Update WeatherDuck link
    • Add link to wiki doc under tools
    • Update preset query link
    • Overview and Press prose update (Charlie)
  • Update speedup and speedup/efficiency within DVT for endnodes (Alex)
  • Talk about backup strategies/repartitioning
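
For the speedup and speedup/efficiency item above, the standard definitions can be sketched as follows. How DVT computes or displays these values is not described here; the function names and the walltimes in the usage note are hypothetical.

```python
def speedup(serial_time, parallel_time):
    """Speedup S = T_1 / T_n: single-node walltime over n-node walltime."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, nodes):
    """Efficiency E = S / n: the fraction of ideal linear speedup achieved."""
    return speedup(serial_time, parallel_time) / nodes
```

With a made-up run that takes 120 s on one node and 40 s on four nodes, speedup(120.0, 40.0) gives 3.0 and efficiency(120.0, 40.0, 4) gives 0.75.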

LittleFe

  • LittleFe's Todo (see LittleFe:Todo)

Folding@Clusters

  • Checkpointing hook in GROMACS: change mdrun data structures (checkpoint frequency variable) instead of using SIGUSR1, and determine which files we are using. Figure out how we can start, checkpoint, and recover using TPR and TOP files as input (JoshM)
  • Develop test canon (Alex and Charlie)
  • Document pval_report.pl and compare_walltime.pl (done) in Wiki (headings for each are already under HowTos) (Alex)
  • Supervise test runs: non-NFS, a2.7, all molecules, 1-4 nodes, bazaar and cairo, separate table (Alex)
  • Rerun the following configurations and compare NFS/non-NFS (Alex)
    • bazaar proteasome
    • bazaar villin-urea
    • cairo methanol 1-8 nodes
    • cairo mixed
    • cairo proteasome
    • cairo water 1-8 nodes
    • bazaar water
  • Check copyright headers, see Adam's message of March 11, 2005
  • Test with PBS/Maui (Skylar)
  • Rewrite run-fatc to take PBS (no LAM) into account. (Skylar)
  • Get PBS to allow unlimited walltime. (Skylar)

Curriculum Modules

  • Producing a cluster/distro specific set of modules out of one base unit
  • Generating a wiki entry and repository entry from one base unit
  • Population ecology module: start by finding which packages are available and making a list. (Skylar)

Recompute

  • Set up room in permanent configuration.
  • Accept next shipment from ECS.

Green Science

  • Track down current and archival weather data (wind, temperature, others) for this area: Muncie airport, RP&L, other sources? (Mary)