Joe Heffer

Senior Research Data Engineer

University of Sheffield

I studied physics at the University of Manchester and did my postgraduate research at the Physics Department at the University of Liverpool. I have work experience in business analytics and software development in the financial services sector. I have worked in the Research & Innovation IT team at the University of Sheffield since 2020.

Data Standardisation Course, HDR UK

Research Project
CUREd+ Research Database

This research database involves using NHS patient records and linking them with emergency services data to facilitate further research projects to help understand how patients access healthcare services and what happens to them afterward. This unprecedented access to large amounts of data on emergency services, hospital admissions, and mental health services enables new insights to be gained into the flow of patients through the healthcare system.

Find out more on the Data Connect site at the University of Sheffield

How has the project supported good research and DMP practice?
The sensitive nature of the health data required adherence to specific governance processes and the use of secure systems, including a clear process and plan for authorising access to the data. We used the Secure Data Service at the University of Sheffield, which provides training and infrastructure as part of its Secure Data Environment (SDE) service. The project involves handling over a dozen datasets, each of which comprises many tables and hundreds of columns. This required a clear structure to organise the data and describe it with metadata. The terms of the data sharing agreement meant that some datasets could not be merged with others. To prevent this, we stored those files in a separate, password-protected storage area to reduce the probability of accidental access. We created reproducible data processing workflows by using Linux shell scripts to chain a series of data processing steps, from raw data through to clean, research-ready outputs that were securely stored. These data transformations were implemented in the R statistical programming language and Structured Query Language (SQL), with the code managed using a collaborative version control system (Git).
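As an illustration of the kind of scripted workflow described above, the following is a minimal sketch and not the project's actual code. The directory layout, the file name `visits.csv`, the step names and the synthetic rows are all hypothetical. Standard Unix tools stand in for the R and SQL transformations so that the example runs on its own.

```shell
#!/usr/bin/env bash
# Minimal sketch of a staged pipeline: raw -> staged -> clean.
# All paths, file names and step bodies are hypothetical placeholders.
set -euo pipefail

WORK_DIR="$(mktemp -d)"
RAW_DIR="$WORK_DIR/raw"
STAGED_DIR="$WORK_DIR/staged"
CLEAN_DIR="$WORK_DIR/clean"
mkdir -p "$RAW_DIR" "$STAGED_DIR" "$CLEAN_DIR"

# Stand-in for a raw extract (synthetic rows, not real patient data).
printf 'id,site\n2,B\n1,a\n2,B\n' > "$RAW_DIR/visits.csv"

# Step 1: normalise text to upper case.
# In a real workflow this step might call an R script instead.
step_normalise() {
  tr '[:lower:]' '[:upper:]' < "$RAW_DIR/visits.csv" > "$STAGED_DIR/visits.csv"
}

# Step 2: drop duplicate rows and sort them, keeping the header first.
# In a real workflow this step might run an SQL query instead.
step_deduplicate() {
  {
    head -n 1 "$STAGED_DIR/visits.csv"
    tail -n +2 "$STAGED_DIR/visits.csv" | sort -u
  } > "$CLEAN_DIR/visits.csv"
}

# Run the steps in a fixed order; 'set -e' stops at the first failure,
# so a broken step cannot silently produce a partial output.
for step in step_normalise step_deduplicate; do
  echo "Running $step" >&2
  "$step"
done

cat "$CLEAN_DIR/visits.csv"
```

Because every step reads from one directory and writes to the next, rerunning the script from the same raw input reproduces the same clean output, and keeping the script in Git records how each output was made.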

How do you think this project will feed into future work?
This project has increased my understanding of the technical and organisational challenges of large-scale research data management projects, especially those involving sensitive data that must be handled in Secure Data Environments (SDEs). I have a greater appreciation of the value of well-written and maintained documentation, which ensures that all collaborators can navigate and understand a large database and browse its metadata.

What's next?
The team members involved in the CUREd+ project have built up a wealth of technical skills and health data knowledge, which puts the team on a firm footing to provide high-quality research data to projects that need it.
