Getting to grips with messy data at the Scottish Funding Council

Posted on September 18, 2018

This summer, three students undertook work at the Scottish Funding Council (SFC) on datasets covering students at Scottish colleges and universities. This work was completed with the ‘Performance Measurement and Analysis’ team, which looks at many aspects of university and college performance.

All three students were from different academic backgrounds and had very different summer projects, but were linked by the common theme of getting further use and benefit out of SFC-held data. These projects are great examples of the benefits of placements for students and employers.

Two of these students were on placements based at SFC. This post focuses on the placement of Laura, the DataLab MSc student we had with us, including details of her project and the resulting successes.

Laura Hepburn: MSc Data Science for Business at The University of Stirling

Laura's Project - Increase the usage and visibility of HEIPR

The Higher Education Initial Participation Rate (HEIPR) is a figure used by the Department for Education in England and the Scottish Government to show levels of participation in higher education in the population. One of the key benefits of the statistic is its comparability across countries. The figure has been produced for over a decade by SFC but hasn’t been a main priority in recent years.

The aim of Laura’s project was to increase the usage and visibility of the HEIPR by understanding, improving and simplifying the methodology. This was a technical system improvement project: usage could only be increased once the current system could be understood and changed.

Dealing with messy data

One reason that the methodology for this statistic is complex is the data matching element. Sixteen years of datasets for the distinct and separately collected college and university records have to be searched to find those who are first time entrants to higher education in the most recent academic year, without the luxury of a unique identifier number for students. These ‘initial entrants’ are then calculated as a rate of an adjusted population figure by age.
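To give a flavour of what this kind of matching involves, here is a minimal Python sketch. It is not SFC's actual method (the real programmes are written in SAS), and everything in it is an assumption made for illustration: the field names, the choice of surname, first initial and date of birth as a matching key, the toy records and the population figures. Adding the age-specific rates together into one headline figure is also an assumption about how the final number is formed.

```python
from collections import Counter


def match_key(record):
    """Build a hypothetical matching key in place of a unique student ID.

    Assumption: a tidied surname, the first initial and the date of birth
    are enough to treat two records as the same person.
    """
    surname = record["surname"].strip().lower().replace("-", " ")
    initial = record["forename"].strip().lower()[:1]
    return (surname, initial, record["dob"])


def initial_entrants(latest_year, earlier_years):
    """Return latest-year records whose key never appears in earlier years.

    Duplicates within the latest year (for example the same person in both
    the college and university records) are counted once.
    """
    seen_before = {match_key(r) for r in earlier_years}
    entrants = {}
    for record in latest_year:
        key = match_key(record)
        if key not in seen_before and key not in entrants:
            entrants[key] = record
    return list(entrants.values())


def participation_rates(entrants, population_by_age):
    """Divide the number of initial entrants at each age by the population of that age."""
    counts = Counter(r["age"] for r in entrants)
    return {
        age: counts.get(age, 0) / population
        for age, population in population_by_age.items()
        if population > 0
    }


# Toy records, invented purely for illustration.
earlier = [
    {"surname": "Smith ", "forename": "Anna", "dob": "1999-03-01", "age": 17},
]
latest = [
    # Same person as above with messier formatting: not a first-time entrant.
    {"surname": "smith", "forename": "anna", "dob": "1999-03-01", "age": 18},
    # The same new entrant recorded twice with different surname formatting.
    {"surname": "Mac-Donald", "forename": "Iain", "dob": "2000-05-12", "age": 18},
    {"surname": "Mac Donald", "forename": "Iain", "dob": "2000-05-12", "age": 18},
    {"surname": "Brown", "forename": "Eilidh", "dob": "1999-11-30", "age": 19},
]
population = {18: 100, 19: 200}  # made-up adjusted population figures

entrants = initial_entrants(latest, earlier)
rates = participation_rates(entrants, population)
headline = sum(rates.values())  # assumed form of the headline figure
```

Even in this toy version, most of the effort goes into tidying the fields so that records can be compared at all, which is a small-scale version of the reformatting work described below.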

After an initial period of learning SAS and getting to grips with the numerous large university, college and Scottish population datasets, Laura set about decoding the existing HEIPR SAS programmes. These included overlapping macros and many examples of messy data which had to be reformatted and manipulated to allow merging. It was great working with Laura in these initial stages as she picked things up really quickly and was incredibly motivated to figure everything out, an essential quality for this project!

Laura says that: “having direct access to the data meant that every decision I made was driven and justified by the data.”

“I would recommend taking part in a consultancy project to any student. You gain hands on experience in the type of work place you are likely to join after graduating, there’s an opportunity to showcase the soft skills you have developed throughout the course, as well as the chance to develop new and existing technical skills in a supportive environment.”

Results - a HEIPR methodology that is more efficient and a lot easier to use

Laura’s main finding was that the process could be refined substantially without losing any of the key steps that make the Scottish HEIPR a comparable statistic with other UK nations, which was a great success. We now have a HEIPR methodology that is more efficient and a lot easier to use.

This success was recognised at the “Data Driven Decision Engineering… the next chapter in Data Science” closing event for the DataLab MSc programme, where Laura was shortlisted for the DataLab Project Award. Although she did not win, Laura had the opportunity to showcase her skills and her impact at SFC to a busy audience in a five-minute presentation.

Next steps in the project

The simplified methodology will be incorporated early next year when the HEIPR for 2017-18 is published in SFC’s HE Students and Qualifiers National Statistics publication.

Brilliantly, Laura herself is likely to be working on the HEIPR for next year’s publication, as she has recently taken up a permanent post in the Performance Measurement and Analysis team at SFC. We’re absolutely delighted to have her continue at SFC after a great summer.
