The data
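The data we’re using today is the now-defunct Google Mobility Data Series (published by Google as its COVID-19 Community Mobility Reports), produced so that people could monitor how travel changed in different countries across the world, based on Google Maps data. It provides a daily figure for the percentage change in mobility compared with a baseline, for countries, regions and sub-regions around the world. The baseline is the median value for the same day of the week over the five weeks from 3 January to 6 February 2020.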
The data is available as a semi-tidy, machine-readable CSV file at https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv, and you can see the first few rows of the data below:
country_region_code | country_region | sub_region_1 | sub_region_2 | metro_area | iso_3166_2_code | census_fips_code | place_id | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
AE | United Arab Emirates | NA | NA | NA | NA | NA | ChIJvRKrsd9IXj4RpwoIwFYv0zM | 2020-02-15 | 0 | 4 | 5 | 0 | 2 | 1
AE | United Arab Emirates | NA | NA | NA | NA | NA | ChIJvRKrsd9IXj4RpwoIwFYv0zM | 2020-02-16 | 1 | 4 | 4 | 1 | 2 | 1
AE | United Arab Emirates | NA | NA | NA | NA | NA | ChIJvRKrsd9IXj4RpwoIwFYv0zM | 2020-02-17 | -1 | 1 | 5 | 1 | 2 | 1
AE | United Arab Emirates | NA | NA | NA | NA | NA | ChIJvRKrsd9IXj4RpwoIwFYv0zM | 2020-02-18 | -2 | 1 | 5 | 0 | 2 | 1
AE | United Arab Emirates | NA | NA | NA | NA | NA | ChIJvRKrsd9IXj4RpwoIwFYv0zM | 2020-02-19 | -2 | 0 | 4 | -1 | 2 | 1
AE | United Arab Emirates | NA | NA | NA | NA | NA | ChIJvRKrsd9IXj4RpwoIwFYv0zM | 2020-02-20 | -2 | 1 | 6 | 1 | 1 | 1
The six value columns (each named after a type of area, followed by _percent_change_from_baseline) cover travel in the following types of area:
- retail_and_recreation
- grocery_and_pharmacy
- parks
- transit_stations
- workplaces
- residential
Part of the challenge of this data is its size: it runs to millions of rows, more than fit on a single Excel worksheet. You will need to get used to handling it solely in R rather than looking at the CSV in Excel, and to working efficiently so your code doesn’t take ages to run!
The task
Read the data into R. You don’t need to save the file locally; you can read it directly from the web link (https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv).
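A minimal sketch of this step, assuming the readr package (part of the tidyverse) is installed. Specifying the column types up front means readr does not have to guess them from the first rows of such a large file, where columns such as metro_area can be empty and may be guessed wrongly; the column names come from the table above. The helper name read_mobility() and the choice of na = "" are illustrative assumptions: na = "" keeps the literal string "NA" (Namibia's country code) from being read as missing, while empty cells still become NA.

```r
library(readr)

# Web link from the task; readr can read a CSV straight from a URL
mobility_url <- "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv"

# Explicit column types, so readr does not guess them from the first rows
mobility_col_types <- cols(
  country_region_code = col_character(),
  country_region = col_character(),
  sub_region_1 = col_character(),
  sub_region_2 = col_character(),
  metro_area = col_character(),
  iso_3166_2_code = col_character(),
  census_fips_code = col_character(),
  place_id = col_character(),
  date = col_date(format = "%Y-%m-%d"),
  # Every remaining column is one of the six *_percent_change_from_baseline values
  .default = col_double()
)

# Hypothetical helper: reads the full file (or a local copy passed as `file`)
read_mobility <- function(file = mobility_url) {
  read_csv(
    file,
    col_types = mobility_col_types,
    # Only empty cells count as missing, so "NA" (Namibia) stays a country code
    na = "",
    progress = FALSE
  )
}

# mobility <- read_mobility()  # downloads the whole file, so it can take a while
```

Passing a local path to read_mobility() works the same way, which can be handy if you end up rerunning your script many times.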
Filter the data down to country-level data for the United Kingdom only. The country_region_code for the United Kingdom is GB, and country-level rows are those with an empty sub_region_1.
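One way to express this filter with dplyr, as a sketch. It assumes the empty sub_region_1 cells were read in as NA (readr's read_csv() does this by default); if yours came through as "" instead, compare against "" rather than using is.na(). The extra metro_area check is a defensive assumption, in case any metro-area rows also leave sub_region_1 empty, and filter_uk_country_level() is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: keep United Kingdom rows at country level only
filter_uk_country_level <- function(mobility) {
  mobility %>%
    filter(
      country_region_code == "GB",  # United Kingdom
      is.na(sub_region_1),          # country-level rows have no sub_region_1
      is.na(metro_area)             # defensive: also exclude any metro-area rows
    )
}
```

For example, uk <- filter_uk_country_level(mobility), where mobility is the data frame you read in during the previous step. Filtering this early also means every later step works on a much smaller data frame.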
Bring the data into a tidy (long) format, so there is only one column for the value and one for the type of area. You will want to use the tidyr pivot_longer() function for this.
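A sketch of the reshaping step using tidyr's pivot_longer(). It selects the six value columns by their shared _percent_change_from_baseline suffix and uses names_pattern so that only the area name (for example parks) ends up in the new column. The helper name tidy_mobility() and the output column names area and value are illustrative choices.

```r
library(tidyr)

# Hypothetical helper: one row per date and type of area
tidy_mobility <- function(mobility) {
  pivot_longer(
    mobility,
    # The six value columns all share this suffix
    cols = ends_with("_percent_change_from_baseline"),
    names_to = "area",
    # Keep only the part of each column name before the shared suffix
    names_pattern = "(.*)_percent_change_from_baseline",
    values_to = "value"
  )
}
```

After this step each original row becomes six rows, one per type of area, which is the shape ggplot2 expects when mapping a column to line colour.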
Create a ggplot2 line chart of the data, with the date on the x axis, the value on the y axis, and one line per type of area.
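The basic chart needs only a single geom_line() layer, with colour mapped to the type of area so that ggplot2 draws one line per area. This sketch assumes the tidy data has the columns date, area and value; plot_mobility() is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: dates on x, percentage change on y, one line per area
plot_mobility <- function(long_mobility) {
  ggplot(long_mobility, aes(x = date, y = value, colour = area)) +
    geom_line()
}
```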
Make your chart publication-worthy! Aspects you may want to consider include:
- The theme and colours used in your chart
- The formatting and labelling of your chart axes
- What should the date range of your data be?
- Do you want to include weekends and bank holidays?
- Do you want to include every type of area? Or would you like to show more than one country, or a breakdown by region?
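As one illustration of these choices, not the only reasonable one, the sketch below drops weekends, labels the y axis as percentages, adds a zero reference line, and uses a minimal theme with the legend underneath. It assumes the tidy UK data from the earlier steps (columns date, area and value) and the dplyr, ggplot2 and scales packages. Bank holidays are left in, because excluding them would need a separate list of dates that the mobility data does not provide; the helper name plot_mobility_publication() and the title text are illustrative.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Hypothetical helper: one possible "publication" version of the chart
plot_mobility_publication <- function(long_mobility) {
  weekdays_only <- long_mobility %>%
    # as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday, regardless of locale
    filter(!(as.POSIXlt(date)$wday %in% c(0, 6))) %>%
    # Friendlier legend labels, e.g. "transit stations"
    mutate(area = gsub("_", " ", area))

  ggplot(weekdays_only, aes(x = date, y = value, colour = area)) +
    geom_hline(yintercept = 0, colour = "grey50") +  # baseline reference line
    geom_line() +
    scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
    # Values are already percentages, so scale = 1 (not the default 100)
    scale_y_continuous(labels = label_percent(scale = 1)) +
    labs(
      title = "Change in mobility in the United Kingdom",
      subtitle = "Percentage change from baseline, weekdays only",
      x = NULL, y = NULL, colour = "Type of area",
      caption = "Source: Google COVID-19 Community Mobility Reports"
    ) +
    theme_minimal() +
    theme(legend.position = "bottom")
}
```

Dropping weekends removes the regular weekly zig-zag in the lines; an alternative with the same aim would be to keep every day and plot a rolling average instead.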