Chapter 1 Why do we need another book?

R is a very flexible programming language, which inevitably means there are lots of ways to achieve the same result. This is true of all programming languages, but it is particularly pronounced in R, which makes use of ‘meta-programming’.

For example, here is how to calculate a new variable and then filter on a variable using standard R:

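# Assumption: the car names, held as row names in the built-in mtcars, are moved into a "car" column
mtcars$car <- rownames(mtcars)
rownames(mtcars) <- NULL

# Calculate kilometers per litre from miles per gallon
mtcars$kpl <- mtcars$mpg * 0.425144

# Select cars with a horsepower greater than 250 & show only the car, mpg and kpl columns
mtcars[mtcars$hp > 250, c("car", "mpg", "kpl")]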
              car  mpg      kpl
29 Ford Pantera L 15.8 6.717275
31  Maserati Bora 15.0 6.377160

Here’s the same thing using {tidyverse} style R:

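# Attach dplyr, which also makes the %>% pipe available
library(dplyr)

# Assumes mtcars already has the car column used in the standard R example
mtcars %>%
  # Calculate kilometers per litre
  dplyr::mutate(
    kpl = mpg * 0.425144
  ) %>%
  # Filter cars with a horsepower greater than 250
  dplyr::filter(
    hp > 250
  ) %>%
  # Take only the car, mpg, and newly created kpl columns
  dplyr::select(car, mpg, kpl)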
              car  mpg      kpl
29 Ford Pantera L 15.8 6.717275
31  Maserati Bora 15.0 6.377160
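The point about there being many routes to the same result holds even within standard R. The sketch below is illustrative only: it repeats the calculation with `$` and `[`, then again with `transform()` and `subset()`, and checks that the two results agree. The names `cars_df`, `style_1` and `style_2` are hypothetical, and the `car` column is an assumption, built here from the row names of the built-in `mtcars` data.

```r
# Work on a copy so the built-in mtcars data set is left untouched
cars_df <- datasets::mtcars

# Assumption: the car names are moved from the row names into a "car" column
cars_df$car <- rownames(cars_df)
rownames(cars_df) <- NULL

# Style 1: `$` assignment followed by `[` indexing
style_1 <- cars_df
style_1$kpl <- style_1$mpg * 0.425144
style_1 <- style_1[style_1$hp > 250, c("car", "mpg", "kpl")]

# Style 2: transform() to add the column, subset() to filter and select
style_2 <- subset(
  transform(cars_df, kpl = mpg * 0.425144),
  hp > 250,
  select = c(car, mpg, kpl)
)

# Compare the two results; all.equal() reports any differences it finds
all.equal(style_1, style_2)
```

Neither version is wrong, but a reader who has only seen one style has to stop and decode the other, which is the hand-over cost that a shared default style is meant to reduce.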

These coding styles are quite different. As people write more code across the Department, it will become increasingly important that code can be handed over to other R users. It is much easier to pick up code written by others if it uses the same coding style you are familiar with.

This is the main motivation for this book: to establish a way of coding that is a sensible default for those who are new to R and that is readily transferable across DfT.

1.1 Coding standards

Related to this, the Data Science team maintain a coding standards document that outlines some best practices for writing R code. It is not prescriptive and goes beyond the scope of this document, but it might be useful for managing your R projects.

1.2 Data

The data used in this book is all derived from open source data. As well as being available from the data folder on the GitHub site here, you can also find the larger data sets at the links below.

  • Road Safety Data
  • Search and Rescue Helicopter data; SARH0112 record level data download available under All data at the bottom of this webpage.
  • Pokemon data; the original source is unknown, but the data was borrowed from Matt Dray here
  • Port data; the port data can be downloaded by clicking on the table named PORT0499

1.3 Work in progress

This book is not static - new chapters can be added and current chapters can be amended. Please let us know if you have suggestions by raising an issue here.