29 JUN

2022

Building Workflows in Digital Health

University Place, Oxford Road, Manchester
29 Jun 2022 10 a.m. — 5 p.m.

Overview

Workflows are widely used in computational data analysis, enabling innovation and decision-making. Often the analysis components are numerous and written by third parties, with little attention to interoperability. In addition, many competing workflow systems exist, so a workflow written for one system is often hard to transfer to another system or project, which limits its reusability. The Common Workflow Language (CWL) project (https://www.commonwl.org/) was established to produce free and open standards for describing workflows built from command-line tools. CWL is declarative and provides a focused set of common abstractions for expressing computational workflows constructed from diverse software tools. Explicit declaration of requirements for runtime environments and software containers enables portability and reuse. A workflow written according to the CWL standards is a reusable description of an analysis, runnable in a diverse set of computing environments.

A number of workflow engines, listed on the project webpage above, have implemented support for CWL, so CWL workflows can be run on a wide range of platforms. Libraries of CWL tool descriptions are available (e.g. https://github.com/common-workflow-library), and CWL workflows can be published on WorkflowHub (https://workflowhub.eu/), facilitating the sharing and reuse of these tools and workflows. CWL is already used in the life sciences to manage genomic analysis pipelines, for example in the VIRify (https://workflowhub.eu/workflows/26) and VariantCaller_GATK3.6 (https://doi.org/10.48546/workflowhub.workflow.107.1) workflows.

Outcomes

The aim of the session is to introduce participants to the basic concepts behind CWL and to show how to use it to build workflows. After the session, participants will be able to write and run a basic CWL workflow, and will know what resources are available for building and publishing their own workflows.

The session is based on a CWL tutorial that is being developed following Software Carpentry principles (https://carpentries-incubator.github.io/cwl-novice-tutorial/). We will take participants through selected lessons in this tutorial, asking them to follow the development of a CWL workflow on their own computer. The session collaborators will be on hand to help participants with this, and to advise them on how they might use CWL to construct their own workflows.

To take part in this session, participants will need Docker and Python installed on their computer, as well as a Unix command-line interface. Participants will be asked to follow the setup instructions (https://carpentries-incubator.github.io/cwl-novice-tutorial/setup.html) before the session.

The session will feature around four hours of taught content. After this, participants will be able to work with the facilitators to improve their own code by writing tool descriptors, prototyping a workflow or any other useful activity.

Prerequisites

Attendees should have a basic knowledge of the Unix shell. Having completed the Software Carpentry shell course (https://swcarpentry.github.io/shell-novice/) is sufficient. During the course attendees will use VS Code, Docker, cwltool and Graphviz, and will be expected to have these installed on their own computer. Installation instructions are available here: https://carpentries-incubator.github.io/cwl-novice-tutorial/setup.html. We will organise a virtual drop-in session before the course to help attendees make sure they are ready.
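As a quick self-check before the drop-in session, attendees could run something like the following Python sketch, which reports which of the tools listed above cannot be found on the PATH. It is not part of the official setup instructions. The command names are assumptions (for example, "code" as the VS Code launcher and "dot" as the Graphviz program) and may differ on some systems, and the function name find_missing is purely illustrative.

```python
import shutil

# Assumed command names for each tool; adjust if your system differs.
# "code" is the usual VS Code launcher and "dot" ships with Graphviz.
REQUIRED_COMMANDS = {
    "Docker": "docker",
    "cwltool": "cwltool",
    "Graphviz": "dot",
    "VS Code": "code",
}


def find_missing(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the names of tools whose command is not found on PATH."""
    return [name for name, cmd in commands.items() if which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All tools found on PATH.")
```

A tool reported as missing may still be installed but not on the PATH, and a tool that is found may still need configuring (for instance, Docker must also be running), so the setup instructions remain the reference.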

About the Instructors

Doug Lowe is a Research Software Engineer in Research IT at the University of Manchester. His research background is in atmospheric sciences, working with large-scale atmospheric chemical models. Since joining the RSE team he has focused on supporting the development of HPC computational workflows, first for atmospheric chemistry research, and now for bioinformatics.

He has a strong interest in sustainable research, both through supporting tool development and through teaching researchers how to use research software tools. He is a qualified Software Carpentry instructor, and teaches Python, shell scripting, and version control using Git. He has experience in using Fortran, Python, shell scripting, and CWL.

Michael R. Crusoe is the CWL Project Leader and one of the CWL project co-founders. He works part-time for ELIXIR Netherlands and ELIXIR Germany on topics related to scientific workflows, standards, and community building. He is also a part-time PhD student at VU Amsterdam on the topic of workflow standards in practice. Originally from Phoenix, Arizona, USA, Michael is happy to call Berlin, Germany his home.
