High Performance Computing for Healthcare

Dates

This workshop takes place over two days, 12 July and 14 July 2021. Please register only if you will be able to attend both sessions should your application be successful.

Overview

High Performance Computing (HPC) is an increasingly important tool for researchers. It helps them overcome the limits on processing parallelism and memory found in even the most powerful desktop systems. With more computing power available, researchers can run more and larger experiments, leading to improved accuracy and better research outcomes. A further benefit is that HPC systems require researchers to define their workflows as jobs, which aids reproducibility.

However, making use of an HPC system, whether on-premises or in the cloud, isn’t as simple as logging on and firing up the software. HPC users must be able to log in to remote computers using the bash shell, upload data and install software before they can begin to conduct experiments. Their code may also need to be adapted to run in parallel to get the full benefit of HPC platforms.

For these reasons, many researchers shy away from adding HPC to their research toolkit.

This two-day workshop is intended to remove some of the mystery from HPC systems, making it easier for researchers to access them and accelerate their research. The course material is healthcare-related and is based on an omics data analysis. Participants will be given a set of sequencing reads from a next-generation sequencing experiment and will perform the steps involved in identifying DNA mutations.

Although the example dataset comes from genomics, many of the tools and techniques will be applicable to other healthcare use cases.

Learning Outcomes

Improved familiarity with the bash shell

Install the local software needed to access HPC (e.g. MobaXterm)

Transfer files to and from HPC

Submit, manage and evaluate the resource utilisation of HPC batch jobs

Be aware of the importance of parallelisation

Ability to run “embarrassingly parallel” bioinformatics pipelines (e.g. quality control and alignment of next-generation sequencing data from a set of individuals)

Prerequisites

Previous experience with the bash shell would be useful, but the workshop does include an introduction to using the shell on HPC systems.

Some familiarity with biological concepts, including the structure of DNA, nucleotide abbreviations, and the concept of genomic variation within a population.

Registration

As part of the application process, you will be asked to provide a brief explanation of how attending this workshop will benefit your research. You may find it useful to write this piece before attempting to register for the event.

After the application deadline has passed, submissions will be considered and successful applicants will be offered a place by e-mail. This process will help to ensure that each of the N8 universities is represented at, and benefits from, the course.

This event is only open to those working or studying at one of the N8 Research Partnership universities. Please register using your academic (.ac.uk) e-mail address to help verify your eligibility.

About the Instructor - Will Furnass

Will Furnass is a Senior Research Engineer in the University of Sheffield's Research Software Engineering team and is also a member of the University's Research Platforms team that develops and maintains digital research infrastructure such as the University's high-performance computing (HPC) clusters.

In a previous life he worked in the film industry, building and maintaining HPC clusters used for applying special effects. He's definitely not a bioinformatician but did once conduct a PCR and a temperature-gradient gel electrophoresis!

About the Instructor - Mark Dunning

Mark obtained his PhD in the Statistics and Computational Biology group of Simon Tavaré at the University of Cambridge. As part of his thesis he developed open-source software for the analysis of Illumina microarray data, which is available through the Bioconductor project. He joined the Bioinformatics Core at the Cancer Research UK Cambridge Institute and played a key role in the analysis of gene expression profiles as part of the METABRIC project, which identified and described new subtypes of breast cancer. Mark also participated in the pilot phases of the International Cancer Genome Consortium (ICGC) project by developing computational pipelines to process whole-genome sequencing data from oesophageal cancer patients. During his time in the Bioinformatics Core he also developed a passion for teaching and took on a role dedicated to organising and delivering bioinformatics training courses, with the aim of empowering wet-lab scientists to explore data for themselves and to foster more productive collaborations with bioinformaticians.

Mark has a strong commitment to reproducible research and to making his research outputs available to other researchers, and indeed to members of the public who may have funded the research in the first place. For instance, he recently developed and deployed a Shiny application that allows interested parties to query various prostate cancer datasets. In keeping with his open-access principles, the code underlying the application is available via GitHub and uses datasets that can be downloaded from Bioconductor. Mark has also recently investigated technologies such as Galaxy and Docker to ease the deployment of software and facilitate reproducible research.

Mark is a certified instructor for Data Carpentry and Software Carpentry.

12–14 July 2021