
HPC for Healthcare

This page contains links to resources used as part of the recent HPC for Healthcare workshop, facilitated by Will Furnass and Mark Dunning of the University of Sheffield.

Overview

High-performance computing (HPC) is an increasingly important tool for researchers, helping them overcome the limits on processing power, parallelism, and memory found in even the most powerful desktop systems. With more computing power available, researchers can run more, and larger, experiments, leading to improved accuracy and better research outcomes. A further benefit is that HPC systems require researchers to define their workflows as jobs, which aids reproducibility.

However, using an HPC system, whether on-premise or in the cloud, isn’t as simple as logging on and firing up the software. HPC users must be able to log in to remote computers using the bash shell, upload data, and install software before they can begin to conduct experiments. Their code may even need to be adapted to run in parallel to take full advantage of an HPC platform.
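
For example, even the first step of connecting to a cluster and copying data onto it happens at the command line. A minimal sketch (the username, hostname, and paths here are placeholders, not a real system):

```bash
# Log in to the cluster's login node over SSH
ssh user@cluster.example.ac.uk

# From your own machine, upload a directory of data to the cluster
scp -r ./my_data user@cluster.example.ac.uk:/home/user/my_data
```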

For these reasons, many researchers shy away from adding HPC to their research toolkit.


Resources

Using the Shell

This set of materials introduces the UNIX command line, also known as the terminal or shell, and shows how to use it to operate a computer, connect to a cluster, and write simple shell scripts (a small example script is sketched after the links below). You can find the course materials at:

Introduction to Using the Shell in a High-Performance Computing Context
Repository for this part of the workshop
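
As a flavour of what this lesson covers, here is a small shell script of the kind you will learn to write; the file names are illustrative:

```bash
#!/bin/bash
# Report the number of lines in every .csv file
# in the current directory, one file at a time.
for file in *.csv
do
    lines=$(wc -l < "$file")
    echo "$file has $lines lines"
done
```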

Introducing High Performance Computing

This set of materials develops your understanding of the sorts of work you can undertake on a high-performance computing platform. You will learn to identify problems suitable for HPC, develop further skills with the terminal or command line, and submit and manage jobs on a cluster using a scheduler. You will also learn how to transfer files and access software through environment modules. A sample job script is sketched after the links below.

Introduction to High-Performance Computing
Repository for this part of the workshop
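
To give a sense of what job submission involves, here is a sketch of a batch job script, assuming a Slurm scheduler; other clusters use different schedulers, and the module name is illustrative:

```bash
#!/bin/bash
#SBATCH --job-name=my_analysis       # name shown in the queue
#SBATCH --time=01:00:00              # wall-clock time limit (1 hour)
#SBATCH --mem=4G                     # memory requested for the job
#SBATCH --output=my_analysis.%j.out  # file to capture the job's output

# Load software via an environment module (module name is illustrative)
module load R

# Run the analysis itself
Rscript analysis.R
```

Such a script would typically be submitted with sbatch and its progress checked with squeue.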

Wrangling Genomics

Building on the lessons of the previous two sections, participants will learn how to carry out a common genomics workflow: identifying variants among sequencing samples taken from multiple individuals within a population.

We will be starting with a set of sequenced reads (.fastq files), performing some quality control steps, aligning those reads to a reference genome, and ending by identifying and visualizing variations among these samples.
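
In condensed form, that kind of pipeline looks roughly like the sketch below; the tools shown (FastQC, BWA, SAMtools, bcftools) are widely used for these steps, but the options and file names are illustrative rather than the workshop's exact commands:

```bash
# Quality control on the raw sequencing reads
fastqc sample_R1.fastq sample_R2.fastq

# Index the reference genome, then align the reads with BWA-MEM
bwa index ref_genome.fasta
bwa mem ref_genome.fasta sample_R1.fastq sample_R2.fastq > sample.sam

# Convert to BAM, sort, and index the alignments
samtools view -b sample.sam | samtools sort -o sample.sorted.bam
samtools index sample.sorted.bam

# Call variants against the reference and write them to a VCF file
bcftools mpileup -f ref_genome.fasta sample.sorted.bam | bcftools call -mv -o sample_variants.vcf
```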

Whilst the workflow may differ slightly from the one you plan to use in your own work, it provides a useful opportunity to apply the lessons learned so far to real-world data. This will enable you to use a variety of bioinformatics tools with confidence and greatly enhance your research efficiency and productivity.

Data Wrangling and Processing for Genomics
Repository for this part of the workshop

