Cluster:Todo
Latest revision as of 13:02, 20 August 2008
(Need a notation for relative priority. Please don't delete anything unless we're updating this during a meeting.)
Quick Test
Current Items (updated November 9, 2005)
Little-Fe
- Liberate BCCD onto Little-Fe, making progress, see BCCD/PPC wiki for details (Kevin and Toby)
  - Use singularity
- Send email to Paul about BCCD changes (Toby)
- Set up archive for list-packages
- Get test clusters liberated
- Get qemu or UML working for test environment (Skylar/Kevin)
- Talk about Bugzilla setup for liberation
LLK (see Cluster:LowLatency for the details)
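- Note: migration of this list to Bugzilla is in progress; here's the current llk bug list: http://cluster.earlham.edu/bugzilla/buglist.cgi?query_format=specific&order=relevance+desc&bug_status=__open__&product=llk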
- Check Alteon drivers for STP, do they support cairo?
- Find statistics information (Skylar,Alex)
- Read the STP paper, emulate his test methodology/program? (Everyone)
- Is there a 2.6 version of STP? (Only from us.) SGI? (Not likely.) (Skylar)
- Look at separate socket implementations (Skylar)
- Look at Netpipe calls for STP help (Skylar)
- Measure latency in kernel and on wire using either kperf or tp_timer - (Alex and Toby)
- Investigate tp_timer instabilities (Toby, Alex)
  - Test accuracy by loading one of the nodes with CPU and disk traffic (lots)
  - Set up and document this, along with kernel building/loading/starting, so that any of us can make a change and a measurement.
  - Measure packet loss rate at each node and the switches and hopper using SNMP/Cricket (Skylar)
  - Measure bit error rate (Skylar)
  - Use a structure/array for tp_ routines (Alex and Toby)
- Figure out linking (AlexL)
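For the packet loss and bit error rate items above, a minimal sketch of the arithmetic, assuming the counters are read as two successive samples of 32-bit SNMP counters (which wrap at 2^32). The function names are hypothetical and the sketch does not talk to SNMP or Cricket itself; it only turns counter readings into rates.

```python
def counter_delta(old, new, bits=32):
    """Difference between two samples of a wrapping counter (assumes at most one wrap)."""
    return (new - old) % (2 ** bits)


def packet_loss_rate(sent, received):
    """Fraction of packets sent that never arrived; 0.0 when nothing was sent."""
    if sent == 0:
        return 0.0
    return (sent - received) / sent


def bit_error_rate(errors, bits):
    """Fraction of transferred bits that were received in error."""
    if bits == 0:
        return 0.0
    return errors / bits
```

Usage: take `counter_delta` of the sender's and receiver's packet counters over the same interval, then pass the two deltas to `packet_loss_rate`.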
Folding@Clusters
- Build a 3.1.4 CVS export with instructions and park it in ~pande for Guha (JoshM)
- Test a range of molecules, clusters, and sizes with Alex's scripts and PBS/Maui (Skylar)
- Plumbing
- Figure out how to get poster in one go (Skylar) LaTeX?
- on bazaar: ccache & distcc with wiki howto (Skylar,done)
- on cairo: distcc (currently installed but not running) with wiki howto (Skylar,done). Pull image too (Skylar)
- Investigate cairo's network delays. Switches? sshd? Timer reset? IPF? (Skylar, Toby)
- Set up test flows (UDP, ICMP) between hopper and cairo to measure latency. (Skylar)
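For the UDP test flow item above, a minimal sketch of a round-trip latency probe using only the Python standard library. Everything here is an assumption for illustration: the function names are hypothetical, the echo server and client both run on the loopback interface as a stand-in for two real hosts, and ICMP is not covered because raw sockets need elevated privileges.

```python
import socket
import threading
import time


def run_echo_server(sock, count):
    """Echo `count` datagrams back to their sender, then return."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_udp_rtt(host, port, count=20, payload=b"x" * 64, timeout=1.0):
    """Return a list of round-trip times in seconds; probes that time out are skipped."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for _ in range(count):
            start = time.perf_counter()
            client.sendto(payload, (host, port))
            try:
                client.recvfrom(2048)
            except socket.timeout:
                continue
            rtts.append(time.perf_counter() - start)
    return rtts


def demo(count=20):
    """Run the echo server and the probe on loopback and return the measured RTTs."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(
        target=run_echo_server, args=(server, count), daemon=True
    )
    worker.start()
    rtts = measure_udp_rtt("127.0.0.1", port, count=count)
    worker.join(timeout=2.0)
    server.close()
    return rtts
```

To measure between two machines, the echo server would run on one host and `measure_udp_rtt` on the other, with the real address and an agreed port in place of the loopback values.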
Aug 26, 2005 Meeting Minutes
- Charlie, Toby, Alex (3 credits), Kevin (2 credits), Josh, Skylar (3 credits)
- SIAM PP06
  - September 30, 2005 abstracts due, conference is February 22-24.
  - Low Latency Kernel
    - Collect papers, read, discuss next Wednesday. Wiki entry, papers, Beowulf posts, kernel source, in central place. Literature search, scholar.google.com, kernel development How-Tos, kernel mailing list, Beowulf mailing list, Toby to ask.
  - Little-Fe
    - BCCD with scripts to do mods for diskless booting
    - Write-up with pedagogical stuff and curriculum modules (list-packages)
- Other
  - Clean and organize Wiki
  - Clean and organize Recompute/CCG lab
Plumbing
- Whiteboard(s) for 4th (Charlie)
- Change Vijay's password (JoshM)
- Fiber uplink for bazaar (Charlie)
- Return GBIC module (Alex)
- Cool Athena in the display cabinet
- Construct shelving for Athena
- Figure out why CVS commit emails don't always appear
- Set up POVray on Athena (Skylar)
- Updates to http://cluster.earlham.edu
  - add link to cluster wiki page
  - lose rss feed, leave a single link to mt, last mt entry
  - news - siam posters, bccd.cs.uni.edu, others?
  - General and Resources link sets horizontal instead of vertical.
  - Add link to Resources called documents (add static link to last MT entry and link to wiki)
  - Update weatherduck link
  - Add link to wiki doc under tools
  - Update preset query link
  - Overview and Press prose update (Charlie)
- Update speedup and speedup/efficiency within DVT for endnodes (Alex)
- Talk about backup strategies/repartitioning
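For the speedup and speedup/efficiency item above, a minimal sketch of the usual definitions: speedup is the single-node walltime divided by the n-node walltime, and efficiency is speedup divided by the node count. How DVT computes or displays these is not stated here, so the function names and signatures are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Walltime on one node divided by walltime on n nodes."""
    if t_parallel <= 0:
        raise ValueError("parallel walltime must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nodes):
    """Speedup per node; 1.0 means perfect linear scaling."""
    if nodes <= 0:
        raise ValueError("node count must be positive")
    return speedup(t_serial, t_parallel) / nodes
```

For example, a run that takes 100 s on one node and 25 s on eight nodes has a speedup of 4 and an efficiency of 0.5.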
LittleFe
Folding@Clusters
- Checkpointing hook in GROMACS - change mdrun data structures (checkpoint frequency variable) instead of using SIGUSR1, and determine which files we are using. Figure out how we can start, checkpoint, and recover using TPR and TOP files as input (JoshM)
- Develop test canon (Alex and Charlie)
- Document pval_report.pl and compare_walltime.pl (done) in Wiki (headings for each are already under HowTos) (Alex)
- Supervise test runs, non-nfs, a2.7, all molecules, 1-4 nodes, bazaar and cairo, separate table (Alex)
- Rerun the following configurations and compare nfs/nonnfs (Alex)
  - bazaar proteasome
  - bazaar villin-urea
  - cairo methanol 1-8 nodes
  - cairo mixed
  - cairo proteasome
  - cairo water 1-8 nodes
  - bazaar water
- Check copyright headers, see Adam's message of March 11, 2005
- Test with PBS/Maui (Skylar)
- Re-write run-fatc to take PBS (no LAM) into account. (Skylar)
- Get PBS to allow unlimited walltime. (Skylar)
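For the checkpointing item above, a minimal sketch of the general idea of driving checkpoints from a frequency variable in the main loop rather than from a signal such as SIGUSR1. This is a toy loop, not GROMACS or mdrun code: the state layout, file format, and function names are all hypothetical, and the `stop_after` parameter exists only to simulate an interrupted run.

```python
import json
import os


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(total_steps, checkpoint_every, path, stop_after=None):
    """Advance a toy simulation, checkpointing every `checkpoint_every` steps.

    Resumes from the checkpoint at `path` when one exists.
    """
    state = load_checkpoint(path) or {"step": 0, "value": 0.0}
    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            break  # simulate the job being killed mid-run
        state["value"] += 0.5  # stand-in for one step of real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

A run interrupted between checkpoints loses only the steps since the last write; calling `run` again with the same path picks up from the saved step.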
Curriculum Modules
- Producing a cluster/distro specific set of modules out of one base unit
- Generating a wiki entry and repository entry from one base unit
- Population ecology module, start by finding what packages are available and making a list. (Skylar)
Recompute
- Set up room in permanent configuration.
- Accept next shipment from ECS.
Green Science
- Track down current and archival weather data (wind, temperature, others) for this area, Muncie airport, RP&L, other sources? (Mary)