Difference between revisions of "Cluster:Todo"

From Earlham CS Department

Revision as of 12:39, 22 June 2005

(Need a notation for relative priority. Please don't delete anything unless we're updating this during a meeting.)


Plumbing

  • Get PBS working on bazaar, cairo, athena (done), and ACL (done); wiki for usage description, wiki for setup description; node0 exclude lists for startup files, or figure out particular files per machine with SystemImager (Skylar)
  • Set up Amanda, follow up with Dan (Skylar)
  • Copy /cluster/old-hopper to tape, give it to Charlie (Skylar)
  • Fiber uplink for bazaar (Charlie)
  • Network lag, monitoring?
  • Put Athena in the display cabinet
  • Set up POV-Ray on Athena
  • Protect F@C source and molecular systems, open http, ftp?, ssh? at cluster.earlham.edu (JoshM and Skylar)
    • Protect folding-at-clusters/articles/dr-dobbs
    • Can the script find out (environment variable?) where the checkout went so that it can protect those files?
  • Updates to http://cluster.earlham.edu
    • add link to cluster wiki page
    • lose the RSS feed, leave a single link to MT, last MT entry
    • news - SIAM posters, bccd.cs.uni.edu, others?
    • General and Resources link sets horizontal instead of vertical.
    • Add link to Resources called documents (add static link to last MT entry and link to wiki)
    • Update weatherduck link
    • Add link to wiki doc under tools
    • Update preset query link
    • Overview and Press prose update (Charlie)
  • Update speedup and speedup/efficiency within DVT for endnodes (Alex)

LittleFe

  • Test CD/DVD (Alex)
  • Before anything else copy node0's drive as it is now and put on Charlie's desk (Toby)
  • Set up second board (Alex)
  • Set up PXE environment, try run-in-RAM first (Alex and Toby)
  • Second disk drive in node1 for redundancy (rsync from node0) (Alex and Toby)
  • Respect Paul's philosophy about setup and usage models, see the other wiki entry (All)
  • node0 has script that uses netboot and sequences bootup. (Alex)
  • Use a faster drive, 7200RPM, ATA or SATA (Charlie)

Folding@Clusters

  • Work with Betsy Ward to get the plumbing for F@C set up on the D224 OSX machines. Local user, document the setup with a Wiki entry. Modify run-fatc.pl, command line option to produce raw SQL. (JoshM and Alex)
  • Console (JoshM)
    • Environment variable called $FATCHOME
    • Command line
    • Sockets to communicate with mother
    • Variable in mother.conf for console port
    • Mother listens and responds to commands on the console port
    • Command list: status [(running|paused|stopped), molecular system, x out of y steps completed, estimated time remaining, # nodes started, # of nodes current], checkpoint, pause, resume, stop.
    • Command line option -nn interval for compact, refreshed display.
    • First version of console has to be supplied with a hostname and port number.
    • Future versions (possibly when we introduce the grandmother) can take a $FATCHOME environment variable that points to a mother.conf file (to get a port number) as a discovery mechanism.
      • Can happen now: status(x out of y, stopped or running), stop
      • After mods to mdrun and F@C updates: status(paused, molecular system, estimated time remaining, nodes started, nodes current), pause, resume, checkpoint
    • Give the console the ability to trigger the checkpoint and quit mechanism.
  • Checkpointing hook in GROMACS - change mdrun data structures (checkpoint frequency variable) instead of using SIGUSR1, and which files we are using (Charlie needs to find Vijay's notes)
  • Develop test canon (Alex and Charlie)
  • Document pval_report.pl and compare_walltime.pl (done) in Wiki (headings for each are already under HowTos) (Alex)
  • Supervise test runs, non-NFS, a2.7, all molecules, 1-4 nodes, bazaar and cairo, separate table (Alex)
  • Rerun the following configurations and compare NFS/non-NFS (Alex)
    • bazaar proteasome
    • bazaar villin-urea
    • cairo methanol 1-8 nodes
    • cairo mixed
    • cairo proteasome
    • cairo water 1-8 nodes
    • bazaar water
  • Check copyright headers, see Adam's message of March 11, 2005
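
A possible starting point for the first console version described in the Console items above (hostname and port supplied by hand, one command per connection). This is only a sketch: the newline-terminated text protocol, the reply strings, and the names send_command, StubMother and start_stub_mother are all assumptions made for illustration, and none of it reflects how mother actually behaves. The stub exists only so the client can be tried without a real run.

```python
import socket
import socketserver
import threading

# Commands taken from the console command list; the wire format is assumed.
COMMANDS = {"status", "checkpoint", "pause", "resume", "stop"}


def send_command(host, port, command, timeout=5.0):
    """Send one console command to mother and return her one-line reply."""
    if command not in COMMANDS:
        raise ValueError(f"unknown console command: {command!r}")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        # Assumed framing: the reply is a single newline-terminated line.
        with sock.makefile("r", encoding="ascii", newline="\n") as reader:
            return reader.readline().rstrip("\n")


class StubMother(socketserver.StreamRequestHandler):
    """Hypothetical stand-in for mother, answering one command per connection."""

    def handle(self):
        command = self.rfile.readline().decode("ascii").strip()
        if command == "status":
            # Made-up values: state, then x out of y steps completed.
            reply = "running 40/100"
        elif command in COMMANDS:
            reply = "ok " + command
        else:
            reply = "error unknown command"
        self.wfile.write(reply.encode("ascii") + b"\n")


def start_stub_mother():
    """Start StubMother on a free local port; return (server, port)."""
    server = socketserver.TCPServer(("127.0.0.1", 0), StubMother)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

To try it, call start_stub_mother(), pass the returned port to send_command("127.0.0.1", port, "status"), and call server.shutdown() when finished. A later version could read the port from the mother.conf that $FATCHOME points to instead of taking it as an argument.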

Curriculum Modules

  • Producing a cluster/distro specific set of modules out of one base unit
  • Generating a wiki entry and repository entry from one base unit

Recompute

Whiteboard

  • Aug 26?
  • DBI::Proxy
  • BOINC
  • RT Par Graphics: JoshM & CharlieP
  • DCF: Skylar & Alex
  • LittleFe
  • Computation Sci Curr
  • Low Lat Kernel: Toby & Alex
  • F@C: JoshH