Security and Privacy of University Grades - Part 1: Privacy

I have wanted to write this article for half a year or so, but I never found the time to do it properly. So here is a quick and dirty sketch of what I was able to do within two days or so back in mid 2013.

The topic is split up into two blog posts: Part 1: Privacy (this one) and Part 2: Security.

Privacy of University Grades

Almost every professor publishes exam grades on his or her website together with the unique student ids, completely open to the public. Furthermore, at the beginning of the course they publish the tutorial groups together with the ids of the attending students.

The student id provides some anonymity because it cannot be directly linked to a person, but it is a unique identifier that stays with a student throughout his or her studies.

I have downloaded the exam results as well as the tutorial group listings from the last 4-8 years or so. Of course, this data is not complete, as some course websites were already offline or professors did not publish the grades. Furthermore, the data is not fully reliable, as grades may change if students discover flaws in the marking. But for the moment I am going to ignore that.

With this data one can already do some interesting social network analysis and generate detailed statistics on each student's performance. By assigning each student a score that describes his or her aptitude, one can also evaluate how difficult specific exams were and answer questions like "In which course, or with which professor, am I most likely to get the best grade?" or "In which tutorial group are the brightest students?"
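To make the idea of an aptitude score a bit more concrete, here is a minimal sketch in Python. It is not the analysis I actually ran, just one simple way such a score could be built. The records are made up, the function names are hypothetical, and I assume a numeric grading scale where a lower grade is better. A student's score is simply his or her mean grade, and an exam's difficulty is how much worse (or better) its participants did compared to their own average.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: (student_id, exam, grade).
# Assumption: numeric grades where a LOWER value is a BETTER result.
RECORDS = [
    ("1001", "Algebra", 1.7),
    ("1001", "Databases", 1.3),
    ("1002", "Algebra", 3.0),
    ("1002", "Databases", 2.0),
    ("1003", "Algebra", 4.0),
    ("1003", "Databases", 3.0),
]


def student_scores(records):
    """Return the mean grade per student id as a crude aptitude score."""
    grades = defaultdict(list)
    for student_id, _exam, grade in records:
        grades[student_id].append(grade)
    return {sid: mean(values) for sid, values in grades.items()}


def exam_difficulty(records):
    """Return, per exam, the mean offset of each grade from the student's own average.

    A positive value means participants did worse than they usually do,
    which hints at a harder exam (under the lower-is-better assumption).
    """
    scores = student_scores(records)
    offsets = defaultdict(list)
    for student_id, exam, grade in records:
        offsets[exam].append(grade - scores[student_id])
    return {exam: mean(values) for exam, values in offsets.items()}


if __name__ == "__main__":
    # Print exams from hardest to easiest according to this crude measure.
    ranked = sorted(exam_difficulty(RECORDS).items(), key=lambda kv: kv[1], reverse=True)
    for exam, offset in ranked:
        print(f"{exam}: {offset:+.2f}")
```

Comparing each grade to the student's own average, instead of just averaging the grades per exam, keeps an exam from looking easy merely because strong students happened to take it. With incomplete data, as in my case, such numbers are of course only rough indicators.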

To summarize the above: with the given data, one can do interesting analysis, but the students' privacy is not seriously threatened. However, it turns out that linking the student ids to real people is not as difficult as it should be. When a student submits a homework assignment or signs up for an exam, he or she usually has to provide his or her full name and student id. Thus the tutors, who conduct the tutorial groups and are themselves ordinary students, usually have access to datasets containing the student id, full name, e-mail address, subject of study, and semester. Sometimes professors also print out these datasets and pin them on the wall, e.g. to organize the seating arrangements at the exam. As a result, it is not difficult to deanonymize roughly 85% of the students and get a detailed view of their courses, their grades, their friends, etc.