The data
The data we’re using today are new experimental statistics looking at the percentage of taxis and PHVs (private hire vehicles) of different ages by region.
The data is available in the data folder associated with this project (“Data/taxi0116.xlsx”) in a non-tidy, human-readable format. You can see the first rows of the data below:
Region | Up to 1 year | 1 up to 2 years | 2 up to 3 years | 3 up to 4 years | 4 up to 5 years | 5 up to 6 years | 6 up to 10 years | 10 up to 13 years | 13 years and over | Unknown [note 2] | Average age (years) |
---|---|---|---|---|---|---|---|---|---|---|---|
East Midlands | 0.3 | 2.0 | 1.4 | 2.5 | 7.6 | 10.4 | 49.2 | 16.1 | 8.6 | 1.8 | 7.3 |
East of England | 0.4 | 3.4 | 2.2 | 2.5 | 8.2 | 10.8 | 44.7 | 14.6 | 12.1 | 1.1 | 7.5 |
London | 2.1 | 10.3 | 6.8 | 5.8 | 12.4 | 6.5 | 32.3 | 20.0 | 3.9 | 0.0 | 6.1 |
North East | 0.3 | 2.2 | 1.8 | 3.0 | 6.9 | 10.6 | 51.9 | 16.0 | 6.3 | 1.1 | 7.1 |
North West | 0.1 | 1.9 | 1.4 | 1.3 | 5.1 | 6.5 | 32.2 | 22.1 | 28.3 | 1.1 | 9.4 |
South East | 0.4 | 2.1 | 1.7 | 2.8 | 7.5 | 11.2 | 47.1 | 16.3 | 8.3 | 2.6 | 7.2 |
The data covers both taxis and PHVs in separate sheets (called taxi and phv respectively). Each different age range of the vehicles is recorded in a separate column, and each row is for a different region.
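Each cell gives the percentage of vehicles in that region or country that fall in that age bracket. The age-bracket percentages for each region or country, including the Unknown column, sum to 100% (allowing for rounding); the average age column is not part of that total.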
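As a quick sanity check on that structure, the sketch below types in the first three rows of the preview table by hand and sums the age-bracket columns for each region. The shortened column names (`y0_1`, `y13_plus`, and so on) are made up for this example and are not the names in the spreadsheet; the average age column is left out on purpose because it is not a percentage.

```r
library(tibble)

# First three rows of the preview table, entered by hand.
# Column names are shortened for this example only (hypothetical names).
taxi_preview <- tribble(
  ~region,           ~y0_1, ~y1_2, ~y2_3, ~y3_4, ~y4_5, ~y5_6, ~y6_10, ~y10_13, ~y13_plus, ~unknown,
  "East Midlands",     0.3,   2.0,   1.4,   2.5,   7.6,  10.4,   49.2,    16.1,       8.6,      1.8,
  "East of England",   0.4,   3.4,   2.2,   2.5,   8.2,  10.8,   44.7,    14.6,      12.1,      1.1,
  "London",            2.1,  10.3,   6.8,   5.8,  12.4,   6.5,   32.3,    20.0,       3.9,      0.0
)

# Sum every column except the region name, one total per row.
row_totals <- rowSums(taxi_preview[, -1])
names(row_totals) <- taxi_preview$region
round(row_totals, 1)
```

Each total should land within a rounding error of 100, which is a useful thing to re-check after pivoting and filtering the full data.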
The task
Read the data in to R using the readxl function read_excel(). It is saved in the Data folder of this repository, and is called taxi0116.xlsx. You will want to read in sheet 1 for taxi data. You will want to include the folder name in the code when reading the file in, e.g. “Data/taxi0116.xlsx”.
Pivot the data longer into a tidy data format, so you have the taxi ages in one column, and the percentage of the total in another. You will want to use the tidyr function pivot_longer() to do this.
Filter the data to remove the two total rows: the one for England and Wales and the one for England.
Create a stacked bar chart in ggplot of the data, with region on the x axis and percentages on the y axis, splitting the data by taxi age.
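The steps so far could be sketched roughly as below. This is one possible approach, not the only one, and it rests on several assumptions: the sheet's header row matches the preview table (so the columns `Region` and `Average age (years)` exist under those exact names), all age columns are read as numeric, and the total rows are labelled exactly "England" and "England and Wales". The helper name `tidy_taxi()` is hypothetical. If the sheet has title rows above the header, `read_excel()` may also need a `skip` argument.

```r
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: pivot the age columns longer and drop the total rows.
# Column names and total-row labels are assumed from the preview table and
# the task description; check them against the actual sheet.
tidy_taxi <- function(raw) {
  raw %>%
    pivot_longer(
      # Keep the average age out of the pivot: it is not a percentage.
      cols = -c(Region, `Average age (years)`),
      names_to = "age_bracket",
      values_to = "percentage"
    ) %>%
    filter(!Region %in% c("England", "England and Wales"))
}

path <- "Data/taxi0116.xlsx"
if (file.exists(path)) {
  # Sheet 1 holds the taxi data.
  taxi_tidy <- tidy_taxi(read_excel(path, sheet = 1))

  # geom_col() stacks bars by default when a fill aesthetic is mapped.
  print(
    ggplot(taxi_tidy, aes(x = Region, y = percentage, fill = age_bracket)) +
      geom_col()
  )
}
```

Leaving the average age as its own column keeps it available for labelling or ordering later without mixing it into the percentages.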
Make your chart publication-worthy! Aspects you may want to consider include:
- The theme and colours used in your charts
- The formatting and labelling of your chart axes
- The order of the regions, and the ages of the taxis
- Do you want to include the average age of taxis on the chart somehow?
- Can you duplicate the same chart for the PHV data?
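A few of these ideas could be combined along the lines of the sketch below. The helper names `read_vehicle_ages()` and `plot_vehicle_ages()` are hypothetical, as are the chart titles and axis labels. It assumes the column names shown in the preview table, numeric age columns, total rows labelled exactly "England" and "England and Wales", and the sheet names taxi and phv mentioned above. The colour scale and theme are just one choice among many.

```r
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: read one sheet and return tidy data without totals.
read_vehicle_ages <- function(path, sheet) {
  read_excel(path, sheet = sheet) %>%
    rename(avg_age = `Average age (years)`) %>%
    pivot_longer(
      cols = -c(Region, avg_age),
      names_to = "age_bracket",
      values_to = "percentage"
    ) %>%
    filter(!Region %in% c("England", "England and Wales"))
}

# Hypothetical helper: stacked bar chart with ordered regions and brackets.
plot_vehicle_ages <- function(tidy, title) {
  tidy %>%
    mutate(
      # Keep age brackets in spreadsheet column order, not alphabetical order.
      age_bracket = factor(age_bracket, levels = unique(age_bracket)),
      # Order regions from lowest to highest average vehicle age.
      Region = reorder(Region, avg_age)
    ) %>%
    ggplot(aes(x = Region, y = percentage, fill = age_bracket)) +
    geom_col() +
    scale_fill_viridis_d() +
    scale_y_continuous(labels = function(x) paste0(x, "%")) +
    labs(title = title, x = NULL, y = "Share of vehicles", fill = "Vehicle age") +
    theme_minimal()
}

path <- "Data/taxi0116.xlsx"
if (file.exists(path)) {
  print(plot_vehicle_ages(read_vehicle_ages(path, "taxi"), "Taxis by age and region"))
  print(plot_vehicle_ages(read_vehicle_ages(path, "phv"), "PHVs by age and region"))
}
```

Wrapping the steps in functions means the PHV chart reuses exactly the same code as the taxi chart, with only the sheet name and title changing. Ordering the regions by average age is one way of bringing that column into the chart without plotting it directly.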