Overview
High Performance Computing (HPC) is an increasingly important tool for researchers, helping to overcome the limits on processing parallelism and memory found in even the most powerful desktop systems. With more computing power available, researchers can perform more and bigger experiments, leading to improved accuracy and better research outcomes. A further benefit is that HPC systems require researchers to define their workflow as jobs, which aids reproducibility.
However, making use of an HPC system, whether on-premise or in the cloud, isn’t as simple as just logging on and firing up the software. HPC users must be able to log in to remote computers using the bash shell, upload data and install software before they can begin to conduct experiments. Even the code may need to be adapted to run in parallel and maximise the benefits of HPC platforms.
For these reasons, many researchers shy away from adding HPC to their research toolkit.
Resources
Using the Shell
This set of materials introduces the UNIX command line, also known as the terminal or shell, and shows how to use it to operate a computer, connect to a cluster and write simple shell scripts.
You can find the course materials at: https://rse.shef.ac.uk/hpc-shell-tuos-citc/
You can find the GitHub repository for this section of the workshop at: https://github.com/RSE-Sheffield/hpc-shell-tuos-citc
Introducing High Performance Computing
This set of materials will further develop your understanding of the sorts of work you can undertake with a high-performance computing platform. You should be able to identify the problems suitable for HPC, develop further skills with the terminal or command line and learn how to submit and manage jobs on a cluster using a scheduler. You will also learn how to transfer files and use software through environment modules.
You can find the course materials at: https://rse.shef.ac.uk/hpc-intro-tuos-citc/
You can find the GitHub repository for this section of the workshop at: https://github.com/RSE-Sheffield/hpc-intro-tuos-citc
Wrangling Genomics
Building on the lessons of the previous two sections, participants will learn how to carry out a common genomics workflow: identifying variants among sequencing samples taken from multiple individuals within a population.
We will start with a set of sequenced reads (.fastq files), perform some quality control steps, align those reads to a reference genome, and end by identifying and visualising variations among these samples.
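To give a feel for what a quality control step works with, here is a minimal Python sketch that reads FASTQ records and keeps only the reads whose mean quality score passes a threshold. It is an illustration only, not the method used in the course, which relies on dedicated bioinformatic tools. The function names (`read_fastq`, `mean_quality`, `passing_reads`), the example reads and the threshold of 20 are all hypothetical, and the sketch assumes the common four-lines-per-record FASTQ layout with Phred+33 quality encoding.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality


def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read (assumes Phred+33 encoding by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passing_reads(handle: TextIO, min_mean_quality: float = 20.0) -> List[str]:
    """Return the IDs of reads whose mean quality meets the (hypothetical) threshold."""
    return [
        read_id
        for read_id, _sequence, quality in read_fastq(handle)
        if mean_quality(quality) >= min_mean_quality
    ]


# Two made-up reads: 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"

passing = passing_reads(io.StringIO(EXAMPLE))
print(passing)
```

On a real dataset the same functions could be given an open file handle instead of the in-memory example, although purpose-built quality control tools report far more than a single mean score per read.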
Whilst the workflow may be slightly different from the one you plan to use in your own work, it will be useful to apply the lessons learnt so far to real-world data. This will enable you to use a variety of bioinformatic tools with confidence and greatly enhance your research efficiency and productivity.
You can find the course materials at: https://sbc.shef.ac.uk/wrangling-genomics/
You can find the GitHub repository for this section of the workshop at: https://github.com/sheffield-bioinformatics-core/wrangling-genomics/