The data

The data we’re using today is the daily transport usage statistics, originally produced for Covid reporting. It gives a daily usage figure for each transport mode, indexed to a pre-Covid baseline, so a value of 1.03 indicates usage about 3% above that baseline.

The data is available in a tidy, machine-readable CSV format (https://assets.publishing.service.gov.uk/media/65257d612548ca0014ddf09b/full_data_clean.csv), and the first 10 rows are shown below:

date        transport_type             value
2020-03-01  cars                       1.03
2020-03-01  light_commercial_vehicles  1.11
2020-03-01  heavy_goods_vehicles       1.08
2020-03-01  all_motor_vehicles         1.04
2020-03-01  tfl_tube                   1.03
2020-03-01  tfl_bus                    1.02
2020-03-01  national_rail              0.95
2020-03-01  national_rail_noCR         0.95
2020-03-02  cars                       1.02
2020-03-02  light_commercial_vehicles  1.06
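
To reproduce a preview like the one above, here is a minimal sketch. It assumes the readr package is installed and that the file at the web link given in task 1 is still available; the helper name read_transport is hypothetical and chosen only for illustration.

```r
library(readr)

# URL as given in the task description
data_url <- "https://assets.publishing.service.gov.uk/media/65257d612548ca0014ddf09b/full_data_clean.csv"

# Hypothetical helper: read_csv() accepts a URL as well as a local path,
# so nothing needs to be saved to disk first
read_transport <- function(src = data_url) {
  read_csv(src, show_col_types = FALSE)
}
```

Calling `transport <- read_transport()` and then `head(transport, 10)` should print the first 10 rows, provided the link is still live.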

The data covers several transport modes. The sample rows above show motor vehicles, TfL tube, TfL bus and national rail.

Motor vehicles (all_motor_vehicles) are further broken down by vehicle type: cars, light_commercial_vehicles and heavy_goods_vehicles.

There are two series available for rail: national_rail, which includes Crossrail, and national_rail_noCR, which excludes it.

Where no data is available for a specific date, you can treat that as an NA value.
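
One way to make those missing dates explicit is sketched below, assuming the dplyr and tidyr packages. The small sample_data table reuses values from the rows shown earlier, with the 2020-03-02 tfl_tube row deliberately left out for illustration; it is not a claim about which rows are missing from the real file.

```r
library(dplyr)
library(tidyr)

# Illustrative sample only: the 2020-03-02 tfl_tube row is omitted on purpose
sample_data <- tibble(
  date = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02")),
  transport_type = c("cars", "tfl_tube", "cars"),
  value = c(1.03, 1.03, 1.02)
)

# complete() adds a row for every date/mode combination that is absent,
# filling `value` with NA; the date sequence covers every day in the range
completed <- sample_data %>%
  complete(
    date = seq(min(date), max(date), by = "day"),
    transport_type
  )
```

With explicit NA rows, geom_line() breaks the line at the missing date rather than drawing straight across the gap.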

The task

  1. Read the data into R. You don’t need to save the file locally; you can read it directly from the web link (https://assets.publishing.service.gov.uk/media/65257d612548ca0014ddf09b/full_data_clean.csv).

  2. Check that the data is clean and that the names of the different modes are in a publication-ready format. You can use a combination of mutate and gsub/str_replace_all to swap underscores for spaces (str_replace on its own changes only the first underscore in each name).

  3. Create a ggplot line chart of the data, with dates on the x axis and value on the y axis, with one line per transport mode.

  4. Make your chart publication-worthy! Aspects you may want to consider include:
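
As a starting point for steps 2 and 3 above, here is a minimal sketch, assuming the dplyr, stringr and ggplot2 packages. The transport table is built from a few of the sample rows shown earlier so the example runs on its own; in practice you would use the full data frame from step 1. The new column name mode and the axis labels are illustrative choices, not part of the task.

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Illustrative rows copied from the sample shown earlier
transport <- tibble(
  date = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02")),
  transport_type = c("cars", "light_commercial_vehicles",
                     "cars", "light_commercial_vehicles"),
  value = c(1.03, 1.11, 1.02, 1.06)
)

# Step 2: swap every underscore for a space
# (str_replace_all, because str_replace would change only the first match)
transport_clean <- transport %>%
  mutate(mode = str_replace_all(transport_type, "_", " "))

# Step 3: dates on the x axis, value on the y axis, one line per mode
usage_plot <- ggplot(transport_clean, aes(x = date, y = value, colour = mode)) +
  geom_line() +
  labs(
    x = "Date",
    y = "Usage relative to pre-Covid baseline",
    colour = "Transport mode"
  )
```

Printing `usage_plot` draws the chart. Names such as tfl tube or national rail noCR would still need manual recoding to be publication-ready, which this sketch does not attempt.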