The (spooky) data
The data we’re looking at today comes from the Office of Rail and Road (ORR) figures for passenger footfall at train stations over time. Some UK stations see very low footfall over the course of a year and are often termed “ghost stations”. We’re going to look at how footfall at these stations has changed over time.
station_name | NLC | TLC | region | local_authority | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abbey Wood | 5131 | ABW | London | Greenwich | 2060584 | 2284585 | 2355943 | 2443651 | 2400216 | 2425400 | NA | 2201541 | 2089975 | 2804493 | 3096498 | 3029176 | 2882868 | 3030212 | 3134250 | 3175430 | 3282240 | 3319408 | 2929472 | 2988802 | 3124850 | 3769402 | 3825206 | 1412638 | 2638456 |
Aber | 3813 | ABE | Wales | Caerphilly - Caerffili | 88714 | 87910 | 112812 | 115079 | 115667 | 134397 | NA | 134191 | 136549 | 169463 | 183136 | 192180 | 192788 | 202486 | 203432 | 209622 | 219868 | 212546 | 214996 | 227270 | 251108 | 245218 | 228480 | 15712 | 73642 |
Abercynon | 3801 | ACY | Wales | Rhondda Cynon Taf - Rhondda Cynon Taf | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 194164 | 195702 | 214492 | 240070 | 243948 | 251688 | 265458 | 275404 | 293638 | 298358 | 289008 | 282886 | 33006 | 105822 |
Abercynon North | NA | NA | NA | NA | 43073 | 42890 | 47417 | 52014 | 62184 | 82961 | NA | 112811 | 114833 | 123455 | 127598 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Abercynon South | NA | NA | NA | NA | 51191 | 49553 | 56520 | 67107 | 72290 | 68360 | NA | 92256 | 82208 | 70294 | 64660 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Aberdare | 3982 | ABA | Wales | Rhondda Cynon Taf - Rhondda Cynon Taf | 158945 | 178680 | 219986 | 260244 | 302859 | 331006 | NA | 464026 | 465675 | 469086 | 471586 | 506004 | 507820 | 524592 | 537542 | 552436 | 557992 | 551928 | 566904 | 569364 | 571746 | 555780 | 504622 | 53668 | 177350 |
Aberdeen | 8976 | ABD | Scotland | Aberdeen City | 1550570 | 1590804 | 1609251 | 1607702 | 1762708 | 1761041 | NA | 1931973 | 2107855 | 2278872 | 2470281 | 2568810 | 2657014 | 2964302 | 3170464 | 3338072 | 3599431 | 3742646 | 3459944 | 3058268 | 2948150 | 2616142 | 2497108 | 393982 | 1536720 |
Aberdour | 9090 | AUR | Scotland | Fife | 78964 | 87940 | 95220 | 103545 | 95850 | 96376 | NA | 112941 | 109580 | 120420 | 121724 | 128074 | 131874 | 129786 | 126000 | 124298 | 127470 | 129474 | 127312 | 125208 | 135240 | 140048 | 126340 | 14726 | 62990 |
Aberdovey | 4435 | AVY | Wales | Gwynedd - Gwynedd | 20938 | 19424 | 21204 | 21030 | 21040 | 21302 | NA | 23299 | 20461 | 23365 | 25093 | 25418 | 27996 | 32190 | 36696 | 33612 | 34450 | 36684 | 38094 | 35960 | 37706 | 40390 | 36560 | 5396 | 22886 |
Abererch | 4440 | ABH | Wales | Gwynedd - Gwynedd | 513 | 426 | 516 | 735 | 473 | 498 | NA | 1038 | 1027 | 1095 | 1261 | 1258 | 1326 | 1620 | 1786 | 1214 | 1380 | 326 | 1984 | 2140 | 2506 | 2228 | 2148 | 0 | 396 |
The data provides details for each station, including its name, station codes (NLC and TLC), region and local authority, followed by one footfall figure per year from 1998 to 2022. The footfall figures are calculated from tickets purchased, and years are financial years. Missing values are recorded as NA; for example, none of the stations shown has a figure for 2004. The data is in a human-readable (wide) format, with one column per year, and is provided as a CSV file.
The task
Read the data into R. It is saved in the Data folder of this repository as `ghost_stations.csv`. If you haven’t cloned this repository, it’s also available to read in directly from: https://raw.githubusercontent.com/department-for-transport/learn_r_by_doing/main/Data/ghost_stations.csv

Order the data by footfall in 2020 (the last normal year before Covid!), with the lowest-footfall stations at the top of the table.
Keep only the 10 stations with the lowest footfall in 2020. You will want to use the `head()` function with `n = 10` to do this.
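A minimal base-R sketch of reading, ordering and trimming the data follows. So that it runs offline, it reads a small excerpt of the table above (six stations, four years) from a string; the commented-out call shows the real URL. It assumes the CSV headers are the bare years shown in the table, which is why `check.names = FALSE` is set: without it, `read.csv()` renames `2020` to `X2020`.

```r
# Reading the full file (needs an internet connection):
# stations <- read.csv(
#   "https://raw.githubusercontent.com/department-for-transport/learn_r_by_doing/main/Data/ghost_stations.csv",
#   check.names = FALSE  # keep year headers as "2020" rather than "X2020"
# )

# Offline stand-in: a few rows and years copied from the table above.
stations <- read.csv(text = "station_name,region,2019,2020,2021,2022
Abbey Wood,London,3769402,3825206,1412638,2638456
Aber,Wales,245218,228480,15712,73642
Abercynon North,NA,NA,NA,NA,NA
Aberdour,Scotland,140048,126340,14726,62990
Aberdovey,Wales,40390,36560,5396,22886
Abererch,Wales,2228,2148,0,396", check.names = FALSE)

# Sort ascending by the 2020 column. order() puts NA values last by default,
# so stations with no 2020 figure (such as Abercynon North) sink to the bottom.
stations_sorted <- stations[order(stations[["2020"]]), ]

# Keep the quietest stations; the excerpt has only six rows, so all are kept here.
ghost_stations <- head(stations_sorted, n = 10)
ghost_stations
```

If you prefer the tidyverse, `readr::read_csv()` keeps the year headers unchanged by default, and dplyr's `arrange()` sorts the same way, also placing NA values last. Either way, non-syntactic names like `2020` need backticks when written bare in code.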
Pivot the data longer into a tidy format, so that you have the year in one column and the footfall in another. You will want to use the tidyr function `pivot_longer()` to do this.
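The pivot step might look like this with tidyr. The input is a hand-typed, two-station excerpt of the ordered table in the same wide layout. Selecting the year columns with `matches("^[0-9]{4}$")` is one choice among several (excluding the identifier columns would also work), and `names_transform` converts the year names to integers so they plot on a continuous axis.

```r
library(tidyr)

# Wide excerpt in the source layout (one column per year),
# copied from the table above: two stations, three years.
ghost_stations <- data.frame(
  station_name = c("Abererch", "Aberdovey"),
  region = c("Wales", "Wales"),
  `2019` = c(2228, 40390),
  `2020` = c(2148, 36560),
  `2021` = c(0, 5396),
  check.names = FALSE  # keep the year names exactly as "2019", "2020", ...
)

ghost_long <- pivot_longer(
  ghost_stations,
  cols = matches("^[0-9]{4}$"),              # every column named with a four-digit year
  names_to = "year",                         # old column names go here
  values_to = "footfall",                    # old cell values go here
  names_transform = list(year = as.integer)  # "2020" becomes 2020 for a numeric x axis
)

ghost_long
```

Each station now has one row per year, which is the shape ggplot2 expects for a line chart.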
Create a line chart of the data in ggplot, with year on the x axis and footfall on the y axis.
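A bare-bones version of the chart is sketched below, using a hand-typed long-format excerpt (two stations, 2019 to 2022, copied from the table above) in place of the pivoted data. Mapping `station_name` to `colour` both colours the lines and tells ggplot2 which points to join, so each station gets its own line.

```r
library(ggplot2)

# Long-format excerpt from the table above: two stations, 2019 to 2022.
ghost_long <- data.frame(
  station_name = rep(c("Abererch", "Aberdovey"), each = 4),
  year = rep(2019:2022, times = 2),
  footfall = c(2228, 2148, 0, 396,         # Abererch
               40390, 36560, 5396, 22886)  # Aberdovey
)

# colour = station_name gives each station its own line and legend entry.
p <- ggplot(ghost_long, aes(x = year, y = footfall, colour = station_name)) +
  geom_line()

p
```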
Make your chart publication-worthy! Aspects you may want to consider include:
- The theme and colours used in your chart; can you create an autumnal/Halloween-themed colour palette and apply it to your chart?
- The formatting and labelling of your chart axes
- Can you annotate the chart to highlight the Covid-related drop in 2021?
- Can you split different regions onto different charts to make the chart easier to read?
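One way to tackle this checklist is sketched below on a three-station excerpt from the table. The hex codes in `autumn_palette` are illustrative picks, not an established Halloween palette; `scales::label_comma()` adds thousands separators to the y axis; a shaded `annotate("rect")` band plus a text label marks the 2021 Covid drop; and `facet_wrap()` gives each region its own panel. The `linewidth` argument assumes ggplot2 3.4.0 or later (older versions use `size`).

```r
library(ggplot2)

# Long-format excerpt copied from the table above (2019 to 2022 only).
ghost_long <- data.frame(
  station_name = rep(c("Abererch", "Aberdovey", "Aberdour"), each = 4),
  region = rep(c("Wales", "Wales", "Scotland"), each = 4),
  year = rep(2019:2022, times = 3),
  footfall = c(2228, 2148, 0, 396,              # Abererch
               40390, 36560, 5396, 22886,       # Aberdovey
               140048, 126340, 14726, 62990)    # Aberdour
)

# Illustrative autumn colours (an assumption, not an official palette): pumpkin, bark, plum.
autumn_palette <- c("#D35400", "#6E2C00", "#7D3C98")

p <- ggplot(ghost_long, aes(x = year, y = footfall, colour = station_name)) +
  # Shaded band drawn first, so it sits behind the lines, marking the Covid year.
  annotate("rect", xmin = 2020.5, xmax = 2021.5, ymin = -Inf, ymax = Inf,
           fill = "grey85", alpha = 0.6) +
  geom_line(linewidth = 1) +
  geom_point() +
  annotate("text", x = 2021, y = Inf, label = "Covid", vjust = 1.5,
           colour = "grey30") +
  scale_colour_manual(values = autumn_palette) +
  scale_x_continuous(breaks = 2019:2022) +
  scale_y_continuous(labels = scales::label_comma()) +  # 126340 -> "126,340"
  facet_wrap(~ region, scales = "free_y") +             # one panel per region
  labs(
    title = "Footfall at three quiet stations",
    subtitle = "Shaded band marks 2021, when Covid hit passenger numbers",
    x = "Financial year",
    y = "Passenger footfall",
    colour = "Station"
  ) +
  theme_minimal()

p
```

With all ten stations, `scale_colour_manual()` needs at least ten colours or it raises an error, so either extend the palette or colour by a coarser variable. `scales = "free_y"` lets each region's panel use its own y range, which keeps very quiet stations from being flattened against busier ones in other regions.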