CiSE-EOT-Article

From Earlham CS Department

Overview

CiSE Special Issue on SC/HPC Education

  • Computing in Science and Engineering (CiSE) is publishing a special issue in 2008 to highlight the role of education in developing a skilled and knowledgeable workforce capable of harnessing the power of tomorrow's high performance computing (HPC) infrastructure to solve global scale, multi-disciplinary problems critical to society and to the world.
  • The 2008 CiSE special issue on HPC will incorporate successful and innovative strategies from high school through graduate school, from all fields including traditionally under-represented fields of study, and from all institutions. Addressing society's most pressing and complex problems will only be realized with the next generation of scientists, technologists, engineers and mathematicians being well educated and experienced in adapting and using the ever-advancing HPC environments. We are interested in HPC educational problems and solutions facing all people, in particular minorities, women, and people with disabilities.

Background Information

Old Notes

The Paper

N.B - this has been moved to LaTeX. charliep, January 18, 2008

Title

Essential attributes of delivering successful HPC EOT

Authors

Dave Joiner, Charlie Peck, Tom Murphy, Paul Gray

Introduction

The current models for delivering education, outreach, and training (EOT) in the high performance computing (HPC) realm strive to serve their community well, but they often lack key attributes necessary to support the next generation of HPC users. The next generation of HPC users is going to continue the trend towards more diversity: diversity of applications, disciplines, and participants. HPC applications continue to cross and integrate knowledge boundaries. Computational science is no longer even a correct designation, as HPC disciplines continue to spread through the humanities, arts, and social sciences. In a few years, when an 80-core processor is the heart of everyone's laptop, everyone will at least be familiar with running parallel applications, which sets the stage for everyone to contribute to the generation of those applications.

In the August 26, 2005 edition of HPCwire, the authors of this paper described HPC education as broken. Is it still? The definitive answer is "It depends". There have been tremendous advancements with the various science portals, while new computational science programs continue to burst forth at a snail's pace. The SC educational outreach, via its pathways program, is doing an outstanding job of reaching new groups, and with 100% private sector funding, but it is still just introducing a trickle of new thinking and new capabilities into the broad river of the educational status quo.

The challenges facing the country in science education were clearly laid out in the June 2005 report to the president, "Computational Science: Assuring America's Competitiveness" (http://www.nitrd.gov/pitac/reports/20050609_computational/computational.pdf). Two years later, these challenges were largely reaffirmed by the National Academies report "Is America Falling Off the Flat Earth?" (http://www.nap.edu/catalog.php?record_id=12021), which echoes the same concerns, compounded by current trends in globalization.

We predicted wonderful things coming down the pipe. What has come down the pipe, and how has it evolved from our original ideas? We are part of the new thinking and we are part of the status quo, and we are in the thick of things. We have done a good job coming up with a low cost cluster platform, with a no cost operating environment, and with some integrated curricular models. The no cost operating environment at times also has no cost of system administration, but we are still actively working to make it significantly easier to maintain that condition of no cost system administration as we add new applications and as the underlying operating system continues to evolve. We have been wildly unsuccessful at having the breadth of curricular modules we'd like, spanning disciplines and the K-grad academic range, though there is a broad base of "good stuff" that is being heavily used. We suppose where we all are is clutching the tail of a bucking bronco. We are making good progress in moving towards the more stable front of the horse, but we still find ourselves at the wrong end.

In this paper, we focus on the mechanisms and attributes of the delivery of good HPC EOT, not on the content.

In order to broaden participation in HPC and computational science generally, more of our EOT efforts need to be directed towards faculty and students at smaller institutions, faculty who are primarily teachers, working with the next generation of scientists. These users require materials designed for learning how to use HPC and computational methods at a variety of levels, not just as the tools of a research scientist.

The challenges of HPC Education

Computational Science and Engineering is by its very nature multi-, inter-, and cross-disciplinary, with practitioners in the fields of mathematics, computer science, and domain specific sciences having to work together (or at least use the product of each other's work) in a tight-knit collaboration.

For years the traditional training of computational scientists has been left largely to graduate advisers and a few specialized courses, if that. However, there is a growing trend towards teaching computation at the undergraduate level. For over a decade, the Krell Institute has strived to survey and list every computational science program; their current list shows 16 undergraduate degree programs specifically in computational science, and tens of schools with significant coursework available for undergraduate students. However, new programs are being developed and offered every year, and the current list often does not include these new programs or programs outside of traditional departments, such as multi-disciplinary centers. In New Jersey alone, there are two undergraduate degree programs, with a third in the pipeline, that do not show up on the current list.

Possibly the biggest challenge to getting computation into the undergraduate curriculum is the politics. If you are planning on developing a computational physics degree, for example, with one third of the coursework in CS, one third in physics, and one third in mathematics, which department gets credit for the degree? Where is the incentive for the other two departments to modify their curriculum for what will likely be a low enrollment new program in another department? No one gets tenure, after all, for making another department's classes better. On top of this, many departments don't have direct or current experience with the use of computation in their disciplines.

Creating a separate department has its own hurdles, jumping from the frying pan of courses with few students to courses with even fewer, as the bulk of the students' coursework is still going to come from courses in existing established departments in CS, math, and <name your science here>.

In addition to traditional graduate and undergraduate students, we are concerned with the ongoing process of retraining existing professionals, for whom the advance of computing was largely unforeseen. As computing continues to encroach into new fields and into old processes, the existing workforce faces the challenge of catching up. This training largely falls on professional organizations.

Finally, there is a large body of non-scientists who have a fundamental lack of computational literacy. This is not exactly new; we've been dealing with a mathematically illiterate public for some time. However, computational literacy in some ways has lower barriers than mathematical literacy. The "eye-candy" factor cannot be ignored, both of technology with blinking lights and of computational products that are animated and less abstract than mathematical products. A 3D animation of the formation and motion of a cloud system, for example, can convey information the novice can immediately recognize as valuable, with little recognition going to an isocontour plot of pressures across a flattened map or, perish the thought, a table of raw data reflecting the same situation.


For the purposes of EOT, the current working definition of "underserved group" is too narrow. For the next generation of EOT, it needs to cover not only the traditional areas of gender and race but also geography, specifically including rural areas, which generally have not been well served by either technology build-out or educational efforts. These communities are full of smaller colleges and universities which could make very good use of both EOT offerings and computational resources.

If we are to broaden access to computational methods for a wide range of traditionally underserved groups (TUGs), the next generation of HPC EOT will need to do a better job of supporting first generation HPC consumers. This audience has much more modest needs for computational power but requires more human capital in the form of support for workshops, virtual rounds, curriculum materials, and software interfaces optimized for HPC pedagogy.

The disciplinary focus of HPC EOT activities has largely been in the natural sciences. Going forward we should be looking to engage a much wider audience, particularly building on efforts in the humanities, arts, and social science (HASS) communities. This will require changes at both the high level, making the language and processes of Grid computing more accessible to this wider audience, and at lower levels where software interfaces, curriculum materials, and support structure will need to be provided for those disciplines.

Elements

Built on a nationally recognized curriculum, e.g. CSERD

As is often the case in the era of the web, the problem is not so much a lack of lessons and activities as an inability to find them. They often are out there if you know where to look. However, the quest is time-consuming and the quality varies. Teachers need a place where they can go to find the good stuff.

This is a big challenge, however, as there is little agreement on what the good stuff is. Not only is there no agreement on what the good stuff is, there is no agreement on the criteria by which to judge it, and there is a debate within the library community as to whether judging quality is even an achievable goal.

There has been some progress on this front, however. A number of efforts have been underway to define standards of quality for a nationally recognized curriculum.

(cite educational technology standards, cite Ralph Regular standards. Describe)

Additionally, efforts are underway in the digital library community to apply a recognized set of standards for computational modeling (Verification and Validation) to computational lessons (Verification, Validation, and Accreditation). The Computational Science Education Reference Desk (http://cserd.nsdl.org) is collecting computational science activities, organizing and meta-tagging them, and sharing them through the National Science Digital Library. In addition, CSERD is applying the principles of Verification, Validation, and Accreditation to provide a comprehensive list of reviewed computational science activities for classroom use.

Ubiquitous access to supercomputing resources

Access to activities, however, is meaningless without equal access to computational resources. Giving students access to high performance resources not only allows them to practice HPC skills, but also gives them an opportunity to see the computing power that will be on the desktop when they are young professionals. This trend has held true recently not only in raw computing power but in architectural design, as the move towards commodity clusters in the HPC market has been mirrored by the move towards multi-core architectures in the desktop market.

HPC hardware in most institutions, however, is limited to a small group of faculty and graduate students performing research, if it exists on campus at all. What campuses do have in spades, however, is internet-connected PCs. Students can make use of those internet-connected PCs in three ways.

First, the most destructive use: they could reformat the hard drives on the PCs and install open source Linux operating systems and parallel computing tools. Though a bit extreme, this has often been done with machines being down-sized from the campus computing pool. A deviation from the ubiquitous leftover cluster is the portable cluster, possibly made out of parts stripped from old machines, possibly built from newly ordered parts. The design we have been using is called "LittleFe," a play on words based on the common nickname "Big Iron" given to supercomputers. Essentially, you remove the most volume-consuming and weight-consuming component of the commodity cluster, the cases. Using plywood-mounted, micro-sized motherboards on aluminum frames, we have created portable clusters of 4-8 nodes for the cost of a single laptop that are travel ruggedized and can be checked as airline luggage.

Second, a less destructive option: students could use live-CD operating systems or network-booted systems to temporarily boot lab PCs as diskless nodes. The Bootable Cluster CD is a live-CD clustering solution designed for campus PC labs. A variation of the live-CD solution is the virtualized cluster, where students run a virtual computer node on lab PCs. The virtualization solution has benefits and drawbacks. It can be always-on and doesn't interfere with normal lab use, but it often masks the hardware underneath and does not allow students to make full use of the machines.

Finally, least destructive of all, students could use the internet-connected PCs to log on to resources elsewhere in the world and do their work there, through the use of grid middleware.

Including undergraduate and graduate students as assistant instructors

  1. Cost-effectively supports a lower instructor-participant ratio
  2. Broadens our reach in terms of age, gender, racial, and discipline diversity

One of our biggest successes has been involving undergraduate and graduate students as assistant instructors and as collaborators in our HPC projects. It started as a natural outgrowth of our own collaboration across academic focus (PhD granting institution, masters granting institution, liberal arts college, community college) as well as distance (California, Iowa, Illinois, New Jersey). With the four of us collaborating across distance, each of our students has (unofficially) become a student of us all. For instance, Jessie, a grad student at the University of Northern Iowa, is the lead investigator of the metaverse project in which Charlie from Earlham and Tom from Contra Costa are participating. Students from all our campuses have been working with us and with each other on our LittleFe portable education project. We are even starting to incorporate students from other campuses we encounter through our workshops. The most gratifying part is that our students are continuing to collaborate years after they have graduated.

There are also very practical aspects to this arrangement. We don't have to have as many faculty leaders at the workshop, which is cost effective. The students have brought improvements to presentation material and have instituted new sessions as well. The students have also broadened our age, gender, racial, and discipline diversity.

This is not a guaranteed way to create a vast quantity of HPC-ready graduates, but it provides a wonderful, unique learning environment where students can wield their academic knowledge as they gain real world experience.

Authenticity of experience

For the student, authenticity in the problems they face and the experience of their education is important. We use problem based learning as one approach to increase the authenticity of the student experience, through the creation of "supercomputer based labs" in which students make use of professional computing tools and libraries to solve problems related to course content and learning goals.

The use of problems as a teaching tool in HPC is particularly interesting given the cross-disciplinary nature of the field. Students solving the same problem need to be able to master at least some aspect of another field, but different students will still take away and give to the problem from their own perspective.

We have a series of lessons based around the study of protein structure using GROMACS. Depending on the student's perspective, many different questions could arise from the same problem. What is the best way to optimize compiler directives for a given hardware? What hardware configuration results in the greatest performance for a given cluster? How does that relate to computing power per dollar or per megawatt?

Broad range of materials aimed at a variety of starting points

There is often an "if you build it, they will come" approach to developing educational materials and training opportunities, and this approach is largely to blame for the failure of many otherwise valuable training opportunities. No matter how nice the graphical user interface, sometimes you have to take the interface to the user instead of waiting for the user to come to the interface. This is especially true in the case of broadening participation, as participants may have any number of reasons for avoiding mainstream training, if they know it exists at all.

A current effort using these tools is a workshop with middle and high school teachers from the Office of Diné Science, Math and Technology, who are responsible for education within the Navajo Nation. The workshop is focusing on developing computational awareness using the Shodor Foundation's Interactivate tools, which have been an ongoing fertile entry point into computational science for teachers and students for over a decade. The workshop is also training teachers to use existing Vensim and NetLogo system dynamics models with their students. Using the tools to run existing models is a wonderful way to ease students towards being able to develop their own models. The system dynamics tools are complex and sophisticated, but students are quite capable of following narrow cookbook usage instructions, on which they can expand depending on their interest and ability.

An intro to computational science course was taught at Contra Costa College in Fall 2007 to a class of predominantly high school sophomores. Middle College High School is collocated at the community college, where its students are able to take college courses. A handful of the students were ultimately able to become competent at developing Vensim models, after getting over the mental barrier that they were doing integral calculus without the benefit of a calculus or even a pre-calculus foundation. System dynamics codes rely on understanding difference equations. If we had the ability to turn mathematics education inside out, we would precede a course in calculus with one on system dynamics modeling. Integral calculus would then be more easily understood as taking the size of the time-step to zero, rather than just taking it as close to zero as necessary for the model to produce accurate results relative to the acceptable amount of error. The biggest difficulty with the class was that it hinged on a student being able to reason their way to a solution, rather than regurgitating desired data. This was something of an epiphany for the instructor, which will facilitate the next teaching of the course. It is also a strong argument for formally including system dynamics modeling at the high school level to foster higher reasoning skills.
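The relationship between difference equations and integral calculus described above can be illustrated with a minimal sketch. It is not taken from the course materials and does not reproduce how Vensim works internally; it is a plain forward-stepping loop for one stock with a proportional inflow, and the function names, starting value, and growth rate are all hypothetical choices made for illustration. Shrinking the time-step moves the stepped result towards the exact answer from calculus.

```python
import math


def simulate_stock(initial, rate, t_end, dt):
    """Step one stock forward with the difference equation
    stock(t + dt) = stock(t) + dt * rate * stock(t).

    All parameter values used with this function are illustrative
    assumptions, not data from the course described in the text.
    """
    steps = round(t_end / dt)
    stock = initial
    for _ in range(steps):
        inflow = rate * stock      # flow depends on the current stock
        stock += dt * inflow       # accumulate the flow over one time-step
    return stock


def exact_stock(initial, rate, t_end):
    """Closed-form solution of d(stock)/dt = rate * stock from calculus."""
    return initial * math.exp(rate * t_end)


if __name__ == "__main__":
    initial, rate, t_end = 100.0, 0.05, 10.0   # hypothetical model settings
    exact = exact_stock(initial, rate, t_end)
    for dt in (1.0, 0.5, 0.1, 0.01, 0.001):
        approx = simulate_stock(initial, rate, t_end, dt)
        print(f"dt={dt:<6} stock={approx:.4f} error={abs(approx - exact):.4f}")
```

Running the script prints one line per time-step size, so students can watch the error shrink as the step gets smaller and decide for themselves how small is small enough for an acceptable amount of error.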

Long-term engagement through repeated contacts at workshops and other events

Another important factor in authenticity is in relation to professional activity, and a key factor in professional activity is engagement in professional organizations. Through workshops hosted by the National Computational Science Institute, the SCXY conference, and other organizations, we have worked with over XXXXX faculty in fields including XXXX, XXXXX, XXXX, and XXXX.


Through the NCSI/SC workshops, we have seen a number of repeat participants, with our most successful participants coming to multiple workshops and progressing in their level of participation throughout. Typically our participants will show up the first time with a desire to see a lot of things and get a lot of hands-on instruction. We know, however, that we have made a real impact when our participants come back to the workshop and tell us that they would really just like a machine in the corner, our help when they need it, and one week out of their office in which to get work done.

Year-round support via Virtual Rounds

Henry Neeman has done an extraordinary job applying the notion of a doctor's rounds to helping support faculty and scientists learning to apply parallel methods to the discipline problems they want to solve computationally. Instead of dragging interns from bed to bed to learn the practice of being a doctor, Henry travels from office to office to help folk where they work.

We are working to extend this to the notion of virtual rounds. Once we have established a relationship with a person or group, it is possible to productively interact from a distance. A hard part is assembling the right tools. There is a certain utility to a conference call, but it is hard to see things. The people we are serving don't have Access Grid nodes, so we have to find other ways to see. Through SC funding, we are exploring using a metaverse as a content-rich environment for collaboration. Second Life is where the action is; unfortunately, it is not particularly educational. The Open Croquet Consortium is promising because it more easily supports content, but there is a ways to go before we can explore the visualization of a computational model in-world.

Related to rounds is the goal of re-gathering workshop attendees to learn the status of their successes and their current struggles, so we can create mini-sessions to address those needs, much as we do during our regular workshops. Re-gathering in person is expensive; the same metaverse tools allowing rounds will allow us to create a virtual conference room supporting an environment similar to the original workshop.

Participation of all stakeholder disciplines

We can report on an approach to the challenge of bringing different departments together that has had some success at Kean, and that is to focus not just on the undergraduate, but on the introductory undergraduate. The New Jersey Center for Science, Technology, and Math Education is a multi-disciplinary home for a series of science and math based degree programs which have as their core an integrated approach to math, computation, and science. By combining a number of new targeted degree programs, the challenge of low enrollment that a new degree program traditionally faces is mitigated, allowing us at the introductory level to offer special sections of math, CS, physics, biology, and chemistry--allowing the faculty who have long said that they wanted <name your course in another department here> taught differently to live and breathe in the same department as the faculty member actually teaching that course, with an expectation that people in different disciplines would (*gasp*) talk to each other.

One outcome of this is that we can attest to the value of working with faculty in multiple departments in any political process aimed at incorporating computation into undergraduate education. It's a lot easier to go into a <physics, chemistry, biology> class planning on teaching a lesson requiring skills in numerical <calculations of cross products, integration of the changing pressure in a compressing cylinder, correlation between presence of a specific genotype with phenotype> if the students have seen computational environments and used them when learning <linear algebra, calculus, statistics>.

Another component of broadening engagement is the consideration of novel delivery mechanisms. To this end we have begun to work with 3D Internet technologies, specifically metaverses, as a tool for EOT. There are three specific models we are considering in this space: the attractor model, the rounds model, and the science based interactive simulation model.

Attractor -

Rounds -

Science Simulation -

Big audience (Second Life) vs open toolset with more appropriate primitives and controls (Qwaq/Croquet).

Support

Why should you believe what we say? You shouldn't. Ultimately we are just reflecting our experience, albeit an experience of a fairly large group of collaborators and workshop leaders.

If what we say resonates, then let's join efforts. The National Computational Science Institute, the Shodor Foundation, and the SC Education program have been our ongoing hangouts.

If what we say is discordant, then let's join efforts. Help us see the light so we can get these efforts on track.

We know a couple of things quite clearly. One is that there is a lot of dedicated, quality work needed, and the other is that the further we can get from the old style of academic, self-serving posturing, the easier and more pleasant the journey of getting the job done will be.

Conclusion

In the end, like success in many areas, it is going to be impossible to state what one thing was seminal to the success of HPC education, outreach, and training. We could say the seminal event will be getting enough critical mass in terms of people and programs, but that is really a circular definition. Success is coming from all of our chipping away at it. We do know helping each person, course, and program is a step towards forming that critical mass; we know these advancements are often hard fought. Unless there are a few trees we are missing, there is not a lot of low hanging fruit.

Involving faculty across many institutions and academic levels fosters HPC; involving graduates and undergraduates with interesting real life HPC problems fosters their experience and helps us achieve more; struggling to find and understand new applications of HPC helps; instituting rounds where we can support the growth of others in person or remotely is a win; and most of all, growing a community has been the best way both to shoulder the load of HPC EOT and to consistently stride towards our goals. The journey is much more pleasant walking with jovial companions.

  1. Unanswered questions
  2. Directives to readers

Possible References

References - check these to make sure what we're saying is either new or emphasized