Managing complex analysis pipelines with workflows
Research data and its analysis are becoming more complex. In domains such as health sciences, bioinformatics, materials science and astronomy, data often has to be analysed in many steps, using different applications, to produce a meaningful result. The difficulty lies in managing all of these steps: done manually, this is time-consuming, tedious and error-prone. It also does not scale easily, because each analysis must be carried out separately.
Workflow tools allow an analysis to be described in a single place, with all of its steps defined up front. Workflows remain flexible, but they can also be scaled up so that many analysis runs are orchestrated and carried out simultaneously.
The Common Workflow Language (CWL) is a standard for connecting command line tools into larger orchestrated workflows. It is compatible with analyses on all scales, from running tests on your laptop to running thousands of simultaneous analyses on an HPC cluster. It is free, open source and widely used, making it a good choice for academics wanting to scale up their research.
The aim of the session is to introduce participants to the basic concepts behind CWL and to show how to use it for building workflows. After the session, participants will be able to write and run a basic CWL workflow, and will know what resources are available for building and publishing their own workflows.
The session will be based on a CWL tutorial that is being developed following Software Carpentry principles (https://carpentries-incubator.github.io/cwl-novice-tutorial/). We will take participants through selected lessons from this tutorial, asking them to follow the development of a CWL workflow on their own computer. The session collaborators will be on hand to help participants with this, and to advise them on how they might use CWL to construct their own workflows.
To take part in this session, participants will need Docker and Python installed on their computer, as well as a Unix command line interface. Participants will be asked to follow the setup instructions (https://carpentries-incubator.github.io/cwl-novice-tutorial/setup.html) before the session.
The session will feature around four hours of taught content. After this, participants will be able to work with the facilitators on their own code, for example by writing tool descriptors, prototyping a workflow, or carrying out any other useful activity.
Attendees should have a basic knowledge of the Unix shell; having completed the Software Carpentry shell course (https://swcarpentry.github.io/shell-novice/) will be sufficient. During the course attendees will use VS Code, Docker, cwltool and Graphviz, and will be expected to have these installed on their own computer. Installation instructions are available here: https://carpentries-incubator.github.io/cwl-novice-tutorial/setup.html. We will organise a virtual drop-in session before the course to help attendees make sure they are ready.
About the Instructors
Doug Lowe is a Research Software Engineer in Research IT at the University of Manchester. His research background is in atmospheric sciences, working with large-scale atmospheric chemical models. Since joining the RSE team he has focused on supporting the development of HPC computational workflows, first for atmospheric chemistry research, and now for bioinformatics.
He has a strong interest in sustainable research, both through supporting tool development and through teaching researchers how to use research software tools. He is a qualified Software Carpentry instructor, and teaches Python, shell scripting, and version control using Git. He has experience in using Fortran, Python, shell scripting, and CWL.
Michael R. Crusoe is the CWL Project Leader and one of the CWL project co-founders. He works part-time for ELIXIR Netherlands and ELIXIR Germany on topics related to scientific workflows, standards, and community building. He is also a part-time PhD student at VU Amsterdam on the topic of workflow standards in practice. Originally from Phoenix, Arizona, USA, Michael is happy to call Berlin, Germany his home.