
Reproducible Analyses in R

In July 2020, Emma Rand of the University of York ran a course introducing R and RStudio to help researchers produce reproducible analyses of their data. This resource page includes the slides and videos from that workshop to help you get started in R.

Overview

The importance and scale of data in the health sciences mean that researchers are increasingly required to develop the data skills needed to design reproducible workflows for the collection, organisation, processing, analysis and presentation of data. Developing such skills requires at least some coding, also known as scripting. Scripting makes your work (everything you do with your raw data) explicitly described, transparent and reproducible. However, learning to code can be a daunting prospect for many health scientists. That is where this introduction to reproducible analyses in R comes in.

R is a free and open source language especially well-suited to data analysis and visualisation and has a relatively inclusive and newbie-friendly community. R caters to users who do not see themselves as programmers, but then allows them to slide gradually into programming.

To complete this course you will need to install:

  • R version 3.6 or higher
  • RStudio 1.2 or higher
  • The tidyverse package
Alternatively, you can use RStudio Cloud with tidyverse installed.
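
As a rough sketch of how you might check this setup from the R console (this is not taken from the workshop materials, and the variable name r_ok is just an illustration):

```r
# Check that the installed version of R meets the course minimum (3.6)
r_ok <- getRversion() >= "3.6.0"
print(r_ok)

# Install the tidyverse only if it is not already installed.
# Installation is needed once per machine and requires an internet connection.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Loading the package is needed once in every new R session
library(tidyverse)
```

The RStudio version is not checked here. You can find it in the RStudio menus under Help > About RStudio.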

Learning outcomes

After this workshop the successful learner will be able to:

  • Find their way around the RStudio windows
  • Create and plot data using the base package and ggplot
  • Explain the rationale for scripting analysis
  • Use the help pages
  • Know how to make additional packages available in an R session
  • Reproducibly import data in a variety of formats
  • Understand what is meant by the working directory, absolute and relative paths and be able to apply these concepts to data import
  • Summarise data in a single group or in multiple groups
  • Recognise tidy data format and carry out some typical data tidying tasks
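
To give a flavour of several of these outcomes together, here is a minimal sketch that assumes the tidyverse is installed. The data, the data folder, the file name heights.csv and the object names are all made up for illustration and are not the workshop's sample data. Note that running it creates a folder called data inside your current working directory.

```r
library(tidyverse)

# Hypothetical tidy data: one row per observation, one column per variable
heights <- tibble(
  group  = c("a", "a", "a", "b", "b", "b"),
  height = c(1.2, 1.4, 1.6, 2.0, 2.2, 2.4)
)

# The working directory is where R resolves relative paths from
getwd()

# Write the data out so there is a file to import (relative path)
dir.create("data", showWarnings = FALSE)
write_csv(heights, "data/heights.csv")

# Import using a relative path, so the script still works if the
# whole project folder is moved to another computer
imported <- read_csv("data/heights.csv")

# The equivalent absolute path on this machine, for comparison
normalizePath("data/heights.csv")

# Summarise the data in multiple groups
summary_tbl <- imported %>%
  group_by(group) %>%
  summarise(mean_height = mean(height), n = n())
print(summary_tbl)

# Plot the data with ggplot
p <- ggplot(imported, aes(x = group, y = height)) +
  geom_boxplot()
print(p)
```

Because every step is written in the script, anyone with the same file can rerun the whole analysis and get the same summary and figure.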

The Slides

The slides from this workshop are available on GitHub at https://github.com/3mmaRand/N8-CIR-intro-repro and include hyperlinks to the sample data used in the workshop.

Videos

You will need to set aside a total of around 2 hours to watch all of the videos. However, they have been split into smaller tutorials that you can view and revisit at your own pace. The videos are all hosted on YouTube.

Part 1 - Introduction and overview

Part 2 - Introduction to R and RStudio

Part 3 - Adding data and basic calculations

Part 4 - Basic plotting with ggplot

Part 5 - Using and understanding the manual

Part 6 - Importing, tidying and working with data

Part 7 - Course summary
