Power7 and Parallel Programming Education

August 26, 2009

The computer science curriculum at UTM does not include a course on parallel programming, and that makes me grumpy. What brought this on? The Pervasive Datacenter has a good write-up about IBM’s Power7 — an 8-core (32-thread) out-of-order processor.

To me, the Power7 confirms that, “It’s all about the memory.” A few years ago with the Power6, IBM released a machine that grabbed for every Hz of speed and beefed up every core to run your single-threaded applications faster. Granted, the 7 features advances in processing power over the 6, but those advances are largely obtained by masking memory latency, and the interesting advances in the Power7 are found in the memory system.

For example, the Power7 supports out-of-order execution, which masks latency by allowing instructions later in the stream to be executed earlier if their data is available. This, in turn, increases the number of memory requests that may be ready at the same time. If the memory system can support multiple in-flight requests on the same thread (and it does), each thread can complete multiple memory requests in the time required to service just one.
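A rough way to see why overlapping requests matters is a toy model, which is not a description of the actual Power7 pipeline. Assume every memory request takes a fixed latency and that a thread can keep only a limited number of independent requests outstanding. The function name, the parameters `latency_cycles` and `max_in_flight`, and the numbers are all made up for illustration.

```python
import math

def cycles_to_complete(num_requests, latency_cycles, max_in_flight):
    """Toy model (an assumption, not measured hardware behavior):
    requests are issued in batches of up to max_in_flight, and each
    batch completes after one full memory latency."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    batches = math.ceil(num_requests / max_in_flight)
    return batches * latency_cycles

# Hypothetical numbers: 8 independent loads, 200-cycle memory latency.
serial = cycles_to_complete(8, 200, max_in_flight=1)      # one request at a time
overlapped = cycles_to_complete(8, 200, max_in_flight=4)  # four requests in flight
print(serial, overlapped)  # prints "1600 400"
```

In this model, four overlapping requests finish in the time one request would take alone, which is the effect the paragraph above describes. Real hardware is messier, since loads that depend on each other cannot overlap.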

The Power7 also has significantly more cores, which increases the stress on the memory system, and each core supports more simultaneous threads, which keeps the hardware busy while some of those threads wait on memory requests. Multithreading also increases the number of memory operations that can be in flight at once, so the memory system for the Power7 needed a major overhaul. IBM delivered by significantly increasing memory bandwidth, tuning the cache system to link subsets of cores on the chip, and, most interestingly, moving one more level of the cache hierarchy on chip (a 32 MB shared L3). That L3 is large, so moving it on-chip signals a real commitment to the memory system.
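The latency-hiding argument for hardware threads can be sketched with a common back-of-the-envelope utilization estimate. The assumptions here are mine, not IBM's: each thread alternates between a burst of computation and a memory stall, switching between threads is free, and the threads do not compete for anything else. The names `compute_cycles` and `stall_cycles` and the workload numbers are hypothetical.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Idealized fraction of time a core does useful work when `threads`
    hardware threads each alternate compute and memory-stall phases."""
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("invalid parameters")
    # Share of its own time a single thread spends computing.
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    # Extra threads fill the stalls until the core is saturated.
    return min(1.0, threads * per_thread)

# Hypothetical workload: 50 cycles of work, then a 150-cycle memory stall.
for t in (1, 2, 4):
    print(t, core_utilization(t, 50, 150))
```

With these made-up numbers, one thread keeps the core busy a quarter of the time, and four threads are enough to cover the stall completely. Every one of those threads is also issuing memory requests, which is why the thread count and the memory system have to grow together.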

Back to my first point: industry has been pumping out significant multi-core designs with exceedingly complex memory systems for the past five years, and all factors except one indicate that we’ll see larger and more capable multi-core designs in the future. That one factor is, unfortunately, a supply of programmers who understand how to take advantage of parallelism.

We should meet that need in the universities, but higher education is behind the curve. Only a short list of universities offers parallel programming courses in the undergraduate curriculum (Waterloo and MIT do, for example), and even fewer require one for graduation or place it early in the curriculum. My campus shoehorns a short treatment of parallelism and threading into its systems programming and operating systems courses.

But … we’re working on it! What’s needed is more experience and more sharing in the community. Industry recognizes the need and has been sponsoring initiatives (like Intel’s academic community) to bring educators together and to provide them with materials, and the number of universities offering parallel programming courses is increasing. At UTM, I’m hoping to offer a topics course on the subject next year, with the aim of transferring some of the material from that course into required courses in the curriculum (like CSC207, the third course in our program). If you have experience teaching parallel programming in the undergraduate curriculum, I’d love to hear from you!

Comments

  2. September 22, 2020 3:51 pm

    Thank you Bogdan and anyone involved for bringing over CSC367: Parallel Programming to UTM. It is one of the best courses I have ever taken at UTM and I don’t know anyone in my class who regretted taking the course.
