Alex Casson
University of Manchester
Analysing the 100,000+ accelerometer datasets in the UK Biobank using the University of Manchester computational shared facilities
The UK Biobank contains over 100,000 records from participants who wore a wrist-worn device for a week, allowing investigations into their activity patterns, sleep patterns, and more. The raw accelerometry data is more than 20 TB in size, with a large amount of metadata also available.
This presentation will give an overview of the computational approaches we took to analyse the entire dataset using the University of Manchester computational shared facilities, including optimizations to the storage and processing approach that accelerate the analysis. We will detail these optimizations, along with our open-source code for processing the data on high performance clusters and on Windows PCs.
We will also highlight some of the caveats and limitations of the dataset that users should be aware of when designing studies with it.
You can see the video of the talk on our YouTube channel at: https://youtu.be/PGUueSz6e9o
Emma Drummond
Lancaster University
Extending the Reach of UK Biobank Data into the Mitochondrial Genome
Emma Drummond has developed a novel interpolation route which links microarray data to open-source libraries of full mtDNA sequence. Mitochondria are the organelles relied upon to provide ATP in eukaryotic cells, which they must do efficiently and responsively.
The UK Biobank has collected a volume of data which enables an exploration of the human mitochondrial genome (mtDNA) for variants influencing mitochondrial performance. However, the density of the genetic data means that imputation is required, that is, interpolating from known data points. Current methods fail to fully exploit what is known about mtDNA and its inheritance pattern, and Drummond's approach improves upon them.
You can see the video of the talk on our YouTube channel at: https://www.youtube.com/watch?v=3-9C9_7c8ag
Richard Williams
University of Manchester
Code Set Selection Methods for Primary Care Data
Most primary care health data is ‘coded’, meaning that instead of a patient’s record containing the term “type 2 diabetes”, it would contain the clinical code “C10F”. When we want to analyse patient data, we must first assemble sets of these clinical codes, called ‘code sets’, that describe what we want to find out.
However, caution must be exercised as the process of creating these code sets is non-trivial, and mistakes at this early stage of the analysis pipeline have been shown to lead different research teams to reach different results and conclusions from the same data source. In this talk, Richard will provide an introduction to the various coding systems in use in UK primary care, talk about the right ways to create clinical code sets, and signpost attendees to various online resources that can help.
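As a rough illustration of why the choice of code set matters, the short Python sketch below treats a code set as a plain set of code strings and looks up which patients have a matching entry. It is not taken from the talk: the `Event` record, the `patients_matching` helper, and the sample data are all hypothetical, and apart from “C10F” the codes are placeholders rather than real clinical codes.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """One coded entry in a patient's record (hypothetical structure)."""
    patient_id: str
    code: str


def patients_matching(events: Iterable[Event], code_set: frozenset[str]) -> set[str]:
    """Return the IDs of patients with at least one event whose code is in code_set."""
    return {event.patient_id for event in events if event.code in code_set}


# "C10F" is the example code from the text above.
# "ZZZ01" and "ZZZ02" are placeholders, not real clinical codes.
NARROW_CODE_SET = frozenset({"C10F"})
BROAD_CODE_SET = frozenset({"C10F", "ZZZ01"})

events = [
    Event("p1", "C10F"),
    Event("p2", "ZZZ02"),
    Event("p3", "ZZZ01"),
    Event("p1", "ZZZ02"),
]

# The same records give different cohorts depending on the code set chosen.
narrow_cohort = patients_matching(events, NARROW_CODE_SET)
broad_cohort = patients_matching(events, BROAD_CODE_SET)
print(sorted(narrow_cohort))
print(sorted(broad_cohort))
```

Here the two code sets select different groups of patients from identical data, which is the kind of divergence between research teams that the talk warns about.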
You can see the video of the talk on our YouTube channel at: https://youtu.be/hg5t9inEE4w