
Real-Time Static Analysis in Eclipse

December 22, 2009

This term, I had the pleasure of working with Elliott Baron on an independent study that focused on implementing static analyses with real-time and real-world constraints. Elliott was an intern at Red Hat during his professional experience year and became familiar with Eclipse and the CDT, so he developed a proof-of-concept static analysis for the Eclipse IDE. He has generated several blog posts about his work and just submitted a proposal for an EclipseCon presentation to share his efforts with the community.

Elliott implemented a variant of property simulation [1] to demonstrate that a static analysis could be run within an IDE in near real-time (a few seconds) on C projects as large as Eclipse could handle. Eclipse provided an AST and control-flow graph. However, he needed to add support for control-flow order traversal of the code (to verify temporal properties) and support for merging execution states. The original paper used a theorem prover, but Elliott found that a straightforward boolean minimization and substitution of known values is enough for most real-world code.
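To make the idea of merging execution states concrete, here is a minimal sketch in Python. It is not Elliott's implementation: the state representation (a property state paired with a dictionary of known boolean values), the function names, and the file-handle example are all assumptions made for illustration. States that reach a join point in the same property state are merged by keeping only the facts they agree on, which is the simplification (A and c) or (A and not c) = A, and known values are then substituted to decide later branches.

```python
from collections import defaultdict


def merge_states(states):
    """Merge symbolic states at a control-flow join.

    Each state is (property_state, facts), where facts maps a variable
    name to its known boolean value. States with the same property state
    are merged; states with different property states stay separate.
    """
    by_property = defaultdict(list)
    for prop, facts in states:
        by_property[prop].append(facts)

    merged = []
    for prop, fact_list in by_property.items():
        # Keep only facts on which every incoming state agrees.
        common = dict(fact_list[0])
        for facts in fact_list[1:]:
            common = {v: b for v, b in common.items() if facts.get(v) == b}
        merged.append((prop, common))
    return merged


def substitute(condition_var, facts):
    """Return True or False if known facts decide the branch, else None."""
    return facts.get(condition_var)


# Hypothetical example: tracking whether a file handle is opened or closed.
incoming = [
    ("opened", {"dump": True, "verbose": True}),
    ("opened", {"dump": True, "verbose": False}),
    ("closed", {"dump": False, "verbose": True}),
]
merged = merge_states(incoming)
```

In this example the two "opened" states collapse into one that remembers only `dump == True`, so a later `if (dump)` can be decided by substitution, while a later `if (verbose)` must explore both branches.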

Along the way, Elliott ran into several problems that point to potential future work:

  1. The control flow generated by multiple returns or statements like break/continue/goto is difficult to support. Eclipse could use a standard set of traversals (control-flow and data-flow at a minimum) that handle these cases cleanly.
  2. Pointers are, of course, a problem. The analysis framework could use a fast pointer analysis algorithm that could generate (incrementally) rough points-to sets to be shared by all of the analyses in the framework.
  3. Elliott’s solution of merging execution states could be expanded to handle a wider set of conditions and then extracted to make it available to other analyses.
  4. Whole-program analysis isn’t always possible, but since Eclipse tracks whole projects, the analysis framework may be able to compute and store function summaries to provide support for context-insensitive whole-program analyses.
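As an illustration of the first item, a control-flow-order traversal is commonly obtained by visiting basic blocks in reverse postorder, which places each block before its successors except along loop back edges. The sketch below is a generic version of that technique, not code from Eclipse or the CDT; the dictionary-based CFG and the block names (including the `break` and early-return blocks) are hypothetical.

```python
def reverse_postorder(cfg, entry):
    """Return the blocks reachable from entry in reverse postorder.

    cfg maps a block name to the list of its successor block names.
    An explicit stack is used so deep graphs do not hit recursion limits.
    """
    visited = {entry}
    postorder = []
    stack = [(entry, iter(cfg.get(entry, ())))]
    while stack:
        node, successors = stack[-1]
        for succ in successors:
            if succ not in visited:
                visited.add(succ)
                stack.append((succ, iter(cfg.get(succ, ()))))
                break
        else:
            # All successors handled: the block is finished.
            postorder.append(node)
            stack.pop()
    return postorder[::-1]


# Hypothetical CFG: a loop containing a break, followed by an early return.
cfg = {
    "entry": ["loop_head"],
    "loop_head": ["body", "after_loop"],
    "body": ["break_stmt", "loop_head"],  # break, or continue via the back edge
    "break_stmt": ["after_loop"],
    "after_loop": ["early_return", "tail"],
    "early_return": ["exit"],
    "tail": ["exit"],
    "exit": [],
}
order = reverse_postorder(cfg, "entry")
```

Because `break`, `continue`, `goto`, and multiple returns are just extra edges in the graph, a traversal defined over the CFG handles them uniformly, which is the appeal of providing it once in the framework.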

That’s a lot of work, but I think it would be well worth it. IDEs are in a fantastic position to provide real-time (or near real-time) support to developers. They already provide suggestions and make local corrections in real time. Now, with a few extra cores available to maintain the necessary structures (AST, CFG, points-to sets) in the background, static verifiers that aim to identify common errors (like MSR’s PREfix and PREfast) could be run quickly on demand.


[1] Das, M., Lerner, S., and Seigle, M. 2002. ESP: path-sensitive program verification in polynomial time. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (Berlin, Germany, June 17 – 19, 2002). PLDI ’02. ACM, New York, NY, 57-68.


Should I attend grad school in computer science?

August 29, 2009

Lately, I’ve had a series of conversations with students who are considering attending graduate school in computer science next year. I thought it might be useful to condense and summarize those conversations from my perspective. I’d be very interested in seeing a student’s perspective on a conversation like this one.

Why (or why not) grad school?

Let’s consider “why not” first.

Don’t go if you think grad school will make you rich. A graduate degree will increase the size of your paycheck, but you have to consider the time that you spent in school not earning a paycheck. Will a two-year master’s degree increase your paycheck as much as working for two years and then using the experience to get yourself a promotion? And will that slight increase ever overcome the two years of income you didn’t earn? As for a Ph.D., forget it: you’ll never catch up.

Don’t go to grad school because you think you’ll hit a ceiling in your career. You may, but it will take a few years. In the meantime, many workplaces will help pay for additional education, and you can use the time to figure out exactly which degree you need and what you should study.

Don’t go to graduate school if you’d just rather not enter the workforce yet. Honestly, grad school is a job — and it’s a low-paying job. Take a job that will get you some experience and will let you build up your savings, and spend some time figuring out what you want to do.

Finally, do consider grad school if you really want a job that requires a graduate degree and you are searching for an intellectual challenge.

Do I really need to know what I want to do?

Yes and no. Mainly yes. It’s common for new grad students to not know exactly what they want to study — or for them to have two or three broad disciplines that interest them. Even if you think you know what you want to study, you’re almost certain to change your mind after you begin.

Nevertheless, you should think about what you want to study and why before you apply to grad school. First, your application will be much stronger if you do. Second, and more importantly, you need to know that you could be passionate about answering some question. Wanting to “learn more about computer science” is a good start, but think hard about whether or not you can satisfy that urge while you work. It’s not like you’ll stop learning if you go into industry! On the other hand, if you want to start creating knowledge — if you start asking questions that don’t appear to have been answered yet — then grad school is for you.

What is grad school like?

You read, think, and argue — a lot. Graduate school is about learning how to think and develop ideas — not about learning material. You’ll certainly learn a lot of material along the way, but it’s important that you learn how to identify an interesting idea, how to develop it into a series of questions that you can answer, and how to present your answers within the context of existing knowledge and motivations.

To develop those skills, you’ll need to read a lot of arguments (papers), hear and watch many presentations in different settings (reading groups, classes, defenses, and conferences or colloquia), and present your own findings to critical, intelligent audiences.

When do I need to decide grad school is for me?

This topic deserves its own post, so I’ll follow up in a couple days. Briefly, you need to keep your options open, but you don’t actually need to decide until the end of your fourth year.

If you are considering graduate school, you should start preparing early — in your first or second year — by making sure you get to know your professors. You need them to remember you, so that you can get three or four good recommendations in your fourth year, and you want to develop a strong relationship with at least one professor so that you can get involved in a research project with him or her by your fourth year.

However, you can defer the final decision until your fourth year. You aren’t actually committed to grad school until you sign on the dotted line. The due date for committing to a grad school is around May 15, though some schools will accept a commitment in the summer.

How much will it cost me?

We’re lucky that we’re in a discipline with a shortage of students interested in graduate studies and significant industry and research agency support. Most Ph.D. students will be paid to study, usually by working as a teaching assistant (TA) or a research assistant (RA). Some will get fellowships, and you should spend time applying to these. Having your own, independent source of funding can really reduce the time required to obtain your degree.

Unfortunately, if you’re a master’s student, grad school can be quite expensive. Some schools will offer TA or RA positions to master’s students, but most will prioritize these positions and offer them to Ph.D. students first. If you’re certain you wish to obtain a master’s degree and stop, consider entering industry and getting assistance paying for it through your workplace or by saving from your salary.

Where should I go?

This question requires some contemplation on your part. The graduate school you attend should reflect what you want to do with the degree afterward. If you want to work in industry in some region, then a degree from the regional school is ideal. On the other hand, if you want to work in a specific field, you should consider schools which are strong in that specific discipline. Finally, if you are considering becoming faculty, school ranking becomes important. Larger, more research-focused universities will only hire from the best universities in the world.

Don’t forget to consider life issues, as well. If you are working on a Ph.D., the grad school you choose will be home for at least four years — and potentially six or more. That’s a long time, so don’t go some place that you — or your significant other — will hate. At the same time, don’t restrict yourself too much. Going somewhere new and expanding your horizons is an important aspect of the experience.

Can I do an undergrad degree and then go to the same place for a grad degree?

Yes, but don’t. In my opinion, the most important aspect of graduate school is being thrown into an unfamiliar environment and being exposed to new ideas. If you continue at the same school, you’ll be comfortable — not in a position to change your habits — and even if you take classes from different professors, you’ll be exposed to variations on the same ideas. The professors at your school hired each other after all; to some degree, they agree with each other.

Is there a difference between Canadian and U.S. graduate schools?

Yes and no. The top Canadian universities don’t rank as highly as the top US universities, so if you’re interested in being faculty and want the option to teach in the US, you should probably attend a US school. Exceptions exist, but don’t count on being one.

However, expectations and procedures at universities vary widely regardless of where that university is located. For example, Stanford and the University of Washington have very different cultures, despite both being west-coast US universities. The only way to get a good feeling about a university’s culture is to visit, so make sure you take advantage of visit day. Lots of my friends from grad school will tell you that they had their hearts set on going to University X but ended up choosing University Y after actually visiting.

Summary, plz?

Going to grad school is a big choice, and it’s not for everyone. Completing a graduate degree opens a lot of doors, but it closes some, too. Furthermore, the time required to earn a degree is significant, so you should spend a lot of time thinking about whether grad school is for you, and if it is, where you should go and what you should study. Don’t go to grad school because it’s the “next thing to do” or because you haven’t considered other options. That won’t lead to success. However, if you know why you want to go and have thought about where, then by all means do! I did, and I don’t regret it.

Power7 and Parallel Programming Education

August 26, 2009

The computer science curriculum at UTM does not include a course on parallel programming, and that makes me grumpy. What brought this on? The Pervasive Datacenter has a good write-up about IBM’s Power7 — an 8-core (32-thread) out-of-order processor.

To me, the Power7 confirms that, “It’s all about the memory.” A few years ago with the Power6, IBM released a machine that grabbed for every Hz of speed and beefed up every core to run your single-threaded applications faster. Granted, the 7 features advances in processing power over the 6, but those advances are largely obtained by masking memory latency, and the interesting advances in the Power7 are found in the memory system.

For example, the Power7 supports out-of-order execution, which masks latency by allowing instructions later in the stream to be executed earlier if their data is available. This, in turn, increases the number of memory requests that may be ready at the same time. If the memory system can support multiple in-flight requests on the same thread (and it does), each thread can complete multiple memory requests in the time required to service just one.
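A back-of-the-envelope model shows why overlapping requests matters. The sketch below assumes independent requests, a fixed latency, and no bandwidth limit; the numbers are illustrative only and are not Power7 measurements.

```python
def total_stall_cycles(num_requests, latency, max_in_flight):
    """Cycles spent waiting on memory when requests overlap in batches.

    Simplified model: up to max_in_flight independent requests are
    serviced together, and each batch costs one full latency.
    """
    batches = -(-num_requests // max_in_flight)  # ceiling division
    return batches * latency


# Hypothetical figures: 8 independent loads, 200-cycle memory latency.
serial = total_stall_cycles(8, 200, 1)      # one request at a time
overlapped = total_stall_cycles(8, 200, 4)  # four requests in flight
```

Under these assumptions, allowing four requests in flight cuts the waiting time from 1600 cycles to 400, which is why a memory system that supports many outstanding requests matters as much as the core itself.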

The Power7 also has significantly more cores, increasing the stress on the memory system, and each core has support for many more simultaneous threads, which allows the hardware to remain in use while waiting for memory requests on other threads it is executing. Multithreading also increases the potential number of memory operations in-flight simultaneously, which means the memory system for the Power7 needed a major overhaul. They delivered by significantly increasing memory bandwidth, tuning the cache system to link subsets of cores on the chip, and, most interestingly, moving one more level of the cache hierarchy on chip (a 32 MB shared L3). That L3 is large, so moving it on-chip signals a real commitment to the memory system.

Back to my first point: industry has been pumping out significant multi-core designs with exceedingly complex memory systems for the past five years, and all factors except one indicate that we’ll see larger and more capable multi-core designs in the future. That one factor is, unfortunately, a supply of programmers who understand how to take advantage of parallelism.

We should remedy that need in the universities, but higher education is behind the curve. A short list of universities offer parallel programming courses in the undergraduate curriculum (Waterloo and MIT do, for example), and even fewer require it for graduation or place it early in the curriculum. My campus shoehorns a short course on parallelism and threading into its systems programming and operating systems courses.

But … we’re working on it! What’s needed is more experience and more sharing in the community. Industry recognizes the need and has been sponsoring initiatives (like Intel’s academic community) to bring educators together and to provide them with materials, and the number of universities offering parallel programming courses is increasing. At UTM, I’m hoping to offer a topics course on the subject next year, with the aim of transferring some of the material from that course into required courses in the curriculum (like CSC207, the third course in our program). If you have experience teaching parallel programming in the undergraduate curriculum, I’d love to hear from you!

Rigor in Academic Publishing

August 16, 2009

I opened this blog with a few paragraphs explaining why I believe it’s important for us to encourage our students to communicate and network in modern modes. Following up on that, I was sent a link to this recent journal article by Whitworth and Friedman. The somewhat bitter tone of the article aside, it’s a worthwhile read if you’re interested in the future of academic collaboration and publishing.

The authors’ most controversial conclusion is that we may need to sacrifice some rigor for progress. They argue that errors of omission (not publishing that which is later shown to be true) are as damning as printing that which is false, so we must accept the risk of some inaccuracies in order to publish articles that are relevant (timely) or which challenge accepted practice (controversial). I have some personal experience on the subject, and while that conclusion is difficult to accept (when I wrote my first draft of this entry, I opposed it), I think it’s correct.

My research group in graduate school went through a long publication drought (more than two years). We were building a computing system based on a non-traditional computing model, so our publications were judged irrelevant and/or difficult to compare to common benchmarks. We spent far longer gathering evidence to support our results than similar projects based on more traditional ideas. More importantly, during that time, we had difficulty finding venues to share our preliminary results, so we were unable to get critical feedback that would have helped us progress, and researchers on related projects were unable to learn from our mistakes and successes. Furthermore, I believe that in a less supportive environment, the project would not have been carried to completion. A multi-year drought can kill an academic career.

Rigor rightfully has a central position in science. We must demand accuracy and correctness. However, I agree that as a community, we must also provide opportunities, and credit, for publishing controversial and preliminary results, and we should balance rigor with innovation and timeliness. To do otherwise will decrease the rate at which we generate knowledge.

Learning about Learning Styles at Head Start

August 16, 2009

I’m not ready for the end of summer, but I’m definitely ready to get back into the classroom. On Wednesday, I had the pleasure of presenting at one of the Head Start sessions offered at UTM. Presented by the Academic Skills Centre, Head Start is an optional part of orientation — a series of 2-hour sessions to ease the first-year transition by introducing incoming students to a variety of university resources, study skills, and lecture styles.

I was impressed by the inclusion of the meta-commentary. University orientations often stress resources and opportunities — usually via a barrage of 10 minute pitches by various offices on campus — but Head Start takes a longer approach. The series includes 9 sessions spread over three weeks, with some students attending only one or two and others registering for all 9. Each session features 1-3 instructors who present one or two small ideas from their home discipline in whatever style they wish, and the session’s moderator steps in between talks to encourage the students to think not only about the topic being discussed but also how it was delivered (“Why did the instructor use a lecture format?” “What did you think of the ‘icebreaker’?”) and what the instructor’s expectations are (“How do you recommend students prepare for your lectures?” “What online resources do you provide your classes?”).

The students were (by and large) engaged and active in the discussion. A few left between speakers, but the majority stayed through the entire lecture, and I saw better participation in the in-class activities than I typically get from first-year classes. Of course, the group is self-selected, but I believe the focus on learning and teaching styles made them more willing to try and engage.

The session reminded me of two things. First, the students will be stronger, and happier, if they know why the course is structured as it is, so I should incorporate Head Start-like commentary into my first-year lecture. Nothing will help students succeed in their first year more than being aware of how they learn and what they can do to adapt to learning environments that don’t naturally match their learning styles. Second, I should include more active components in my large lecture classes. On Wednesday, we played 20 questions to illustrate mental modeling, and the students and I had a blast!

Anonymous Aggregation of Student Data

August 9, 2009

Working with strong, independent students is a pleasure. Many thanks to MG for his hard work this term. He’s been working on a web app for us for the past nine weeks.

Performing program evaluation, let alone education research, can be a bureaucratic mess. The registrar’s servers with course grades and program registration are (rightfully) closed to almost everyone, and only the instructor of a course has grade, survey, or attendance information other than the final mark. I’ve spent days gathering the data to answer a question like, “Is first year calculus a good indicator for success in computer science?”, and we haven’t yet found the time to ask, “Is success in the assignments in first year calculus a good indicator?” If you’re interested in turning these questions into a research study and publishing the results, add an ethics review and be prepared to explain how the students’ data will be kept confidential and secure.

To simplify the data collection process and improve the security of the collected data, we hired MG (a summer student) to build a data aggregation and query system that allows users to publish potentially interesting data to a central, secure server and to perform joins on the data on the server without ever seeing identifying information. The system doesn’t aggregate anonymized data; that approach would limit us to queries that do not correlate data between individuals across courses or data sets. Instead, it performs queries on data with identifying information and then releases sanitized results.
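The "query on identified data, release sanitized results" idea can be sketched as follows. This is not MG's system: the function name, the synthetic marks, the 70% cut-off, and the minimum group size used to suppress small groups are all assumptions for illustration. The join uses student IDs on the server side, but only group counts and means leave the function.

```python
from statistics import mean

MIN_GROUP_SIZE = 5  # hypothetical suppression threshold, not from the real system


def mean_cs_mark_by_calculus_band(calculus, cs, min_group_size=MIN_GROUP_SIZE):
    """Join two identified data sets on student ID; release only group means."""
    groups = {}
    for student_id, calc_mark in calculus.items():
        if student_id in cs:  # inner join on the identifying key
            band = "calc >= 70" if calc_mark >= 70 else "calc < 70"
            groups.setdefault(band, []).append(cs[student_id])
    # Sanitized output: no IDs, and groups that are too small are withheld.
    return {
        band: {"n": len(marks), "mean_cs_mark": round(mean(marks), 1)}
        for band, marks in groups.items()
        if len(marks) >= min_group_size
    }


# Synthetic data keyed by made-up student IDs.
calculus = {"s01": 82, "s02": 75, "s03": 91, "s04": 70,
            "s05": 88, "s06": 55, "s07": 64, "s08": 49}
cs = {"s01": 80, "s02": 72, "s03": 95, "s04": 68,
      "s05": 85, "s06": 60, "s07": 58, "s09": 77}
released = mean_cs_mark_by_calculus_band(calculus, cs)
```

Joining on pre-anonymized data would not work here, because the two data sets would no longer share a key; performing the join before sanitizing is what makes cross-course questions answerable.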

Eventually, we hope the system will lead to simpler privacy and ethics board reviews for individual projects. Once the board is familiar with the data that the system produces and weaknesses of the system, we believe that ethics proposals can be simplified to focus on the method of data collection and which questions will be asked, eliminating questions about how the data will be stored or processed without danger to the students. Unfortunately, review of the questions being asked is still required since this system does not provide guaranteed anonymity. It’s possible to perform a series of queries that will uniquely identify a student, although the querier will also need a way to relate the output from the query engine to personally identifying information.
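The weakness described above is often called a differencing attack. The sketch below uses made-up records and a hypothetical `aggregate` query to show how two individually harmless aggregate results can isolate one student's mark.

```python
def aggregate(records, predicate):
    """Sanitized query: return only the count and total mark of matching rows."""
    marks = [r["mark"] for r in records if predicate(r)]
    return len(marks), sum(marks)


# Synthetic records; the attribute names are hypothetical.
records = [
    {"program": "CS", "attended_review": True, "mark": 81},
    {"program": "CS", "attended_review": True, "mark": 67},
    {"program": "CS", "attended_review": True, "mark": 74},
    {"program": "CS", "attended_review": False, "mark": 52},
    {"program": "Math", "attended_review": True, "mark": 90},
]

n_all, total_all = aggregate(records, lambda r: r["program"] == "CS")
n_att, total_att = aggregate(
    records, lambda r: r["program"] == "CS" and r["attended_review"]
)

isolated_count = n_all - n_att        # how many students the difference covers
isolated_mark = total_all - total_att  # their combined mark
```

Here the difference covers exactly one student, so the second value is that student's mark. The querier still needs outside knowledge (who skipped the review session) to attach a name to it, which matches the caveat in the paragraph above.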

I’m excited that this tool will be available to UTM instructors for the fall. It’s a step in the right direction, with the eventual goal being a content management system for academia that supports true anonymity and analysis of anonymous data. (For more on the future of content management systems, take a look at some of David Jones’s work.) Such a system should also support the use of anonymity within the classroom to encourage feedback and participation for novice learners; more on that when we release an anonymous feedback module this fall.

Science 2.0 and Collaborative Communication

August 6, 2009

Last week, I attended Science 2.0 — an afternoon of presentations associated with the Software Carpentry course. The premise of the event is that “doing science” is — and should be — changed by the communication modes available. With the availability of free, instantaneous communication across communities, why shouldn’t science become more open and collaborative?

Despite some evidence of success — the popularity of blogs run by leading researchers in various communities, the success of collaborative math proofs, and the development of open lab notebooks — I expect that vision will be a long time coming. Technology cannot solve social problems; the people who use the technology have to do that. Here, the main problem is determining attribution of ideas. Copyright and intellectual property is a hot topic with no clear answer in the best case, let alone when ownership of an idea (if that’s possible) is shared or in dispute.

On the academic side, attribution is important for career advancement. Until we find a measure of contribution to group efforts, tenure and promotion committees will continue to rely heavily on individual publication count. One obvious answer is to rely on reputation in the community by soliciting references from members of the field, but this feels like passing the buck, since referees also need a way to measure contribution.

We can’t wait for the social process to evolve. We have to start by contributing ideas and being part of the discussion. We need to encourage our students, and ourselves, to find connections within this framework just as we encourage attendance at job fairs and networking events. Since I can’t ask my students to do what I can’t or won’t, I’ll devote a few minutes each day this year to writing. And, just in case you’re a student of mine and I have just bullied you into starting an online portfolio for your project, here’s a link to a great post about starting to blog.