
Anonymous Aggregation of Student Data

August 9, 2009

Working with strong, independent students is a pleasure. Many thanks to MG for his hard work this term. He’s been working on a web app for us for the past nine weeks.

Performing program evaluation, let alone education research, can be a bureaucratic mess. The registrar’s servers with course grades and program registration are (rightfully) closed to almost everyone, and only the instructor of a course has grade, survey, or attendance information other than the final mark. I’ve spent days gathering the data to answer a question like, “Is first-year calculus a good indicator of success in computer science?”, and we haven’t yet found the time to ask, “Is success in the assignments in first-year calculus a good indicator?” If you’re interested in turning these questions into a research study and publishing the results, add an ethics review and be prepared to explain how the students’ data will be kept confidential and secure.

To simplify the data collection process and improve the security of the collected data, we hired MG (a summer student) to build a data aggregation and query system. It allows users to publish potentially interesting data to a central, secure server and to perform joins on that data on the server without ever seeing identifying information. The system doesn’t aggregate anonymized data. Instead, it performs queries on data with identifying information and then releases sanitized results. Aggregating data that had already been anonymized would limit us to queries that do not correlate data between individuals across courses or data sets.
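For readers who want something concrete, here is a minimal sketch of the "join on identified data, release only sanitized results" idea. It is not MG's implementation: the record layout, the `band` and `sanitized_join` names, and the minimum group size rule are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-course records keyed by student ID (made-up data).
calculus = {"s1": 82, "s2": 55, "s3": 91, "s4": 67, "s5": 74, "s6": 48}
intro_cs = {"s1": 88, "s2": 60, "s3": 85, "s4": 70, "s5": 79, "s7": 90}


def band(mark):
    """Bucket a mark into a coarse band so exact marks are never released."""
    return "70+" if mark >= 70 else "<70"


def sanitized_join(left, right, min_group=2):
    """Join two data sets on student ID, then release only per-band aggregates.

    The join itself needs the identifying key, which is why it runs on the
    server. The returned report contains no IDs, and any band with fewer than
    min_group students is suppressed (an assumed rule, not the real system's).
    """
    groups = defaultdict(list)
    for student_id in left.keys() & right.keys():
        groups[band(left[student_id])].append(right[student_id])
    return {
        label: {"n": len(marks), "mean_cs": round(mean(marks), 1)}
        for label, marks in groups.items()
        if len(marks) >= min_group
    }


report = sanitized_join(calculus, intro_cs)
```

The point of the sketch is the ordering: the correlation across courses happens while the student IDs are still attached, and the IDs are dropped only when the report is built.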

Eventually, we hope the system will lead to simpler privacy and ethics board reviews for individual projects. Once the board is familiar with the data that the system produces and with the system’s weaknesses, we believe that ethics proposals can be simplified to focus on the method of data collection and which questions will be asked, eliminating questions about how the data will be stored or processed without danger to the students. Unfortunately, review of the questions being asked is still required since this system does not provide guaranteed anonymity. It’s possible to perform a series of queries that will uniquely identify a student, although the querier will also need a way to relate the output from the query engine to personally identifying information.
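To illustrate the kind of query series that worries us, here is a small sketch of a differencing attack. The data, the `count_and_total` helper, and the minimum group size check are hypothetical and are not taken from our system. Each query on its own looks harmless, but subtracting one result from the other isolates a single student.

```python
# Hypothetical marks keyed by student ID (made-up data).
marks = {"s1": 82, "s2": 55, "s3": 91, "s4": 67}

# Hypothetical second data set: who attended an optional tutorial.
attended_tutorial = {"s1", "s2", "s3"}


def count_and_total(student_ids, min_group=2):
    """Release only a count and a total, refusing groups that are too small."""
    selected = [s for s in student_ids if s in marks]
    if len(selected) < min_group:
        return None  # suppressed: the group is small enough to identify someone
    return len(selected), sum(marks[s] for s in selected)


everyone = count_and_total(marks.keys())       # whole class
tutorial = count_and_total(attended_tutorial)  # tutorial attendees only

# Both queries pass the group size check, yet their difference is exactly
# the mark of the one student who skipped the tutorial.
leaked_mark = everyone[1] - tutorial[1]
```

As noted above, the attacker still needs outside knowledge (here, who skipped the tutorial) to attach a name to the leaked mark, which is why the questions being asked still need review.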

I’m excited that this tool will be available to UTM instructors for the fall. It’s a step in the right direction, with the eventual goal being a content management system for academia that supports true anonymity and analysis of anonymous data. (For more on the future of content management systems, take a look at some of David Jones’s work.) Such a system should also support the use of anonymity within the classroom to encourage feedback and participation for novice learners; more on that when we release an anonymous feedback module this fall.

5 Comments
  1. August 15, 2009 3:34 pm

    Have you looked at Wayner’s “translucent databases”?

  2. August 15, 2009 3:39 pm

    Also: is this system going to be open sourced?

  3. Andrew Petersen
    August 15, 2009 4:59 pm

    Greg: I should have known you would find me in less than a week! Thanks for the link to Wayner’s work.

    Two thoughts about systems that attempt to secure subsets of records:
    1. Whenever we allow a user to take data away from a secured collection — to get a report which can be saved — we are increasing the risk of that data getting into the wrong hands. Even users we trust to see the private data should use an anonymizing system, since we cannot trust that any data they take away will remain secure.
    2. We have to be able to join and filter records using private data. The key problem is determining when operations on private data generate reports that indirectly reveal it.

    We’re in the process of open sourcing all of our software for this summer: this system, the Python memory visualizer, the curriculum developer, etc. The university has a system in place for open sourcing projects developed using research funds, but not one that we could find for projects that use only teaching or administrative resources. To be safe, we’ve requested a review from the IP office.


