Chapter 2 Sample Selection
2.1 Sample size and structure
The NTS 2023 was designed to provide a representative sample of households in England and was based on a stratified 2-stage random probability sample of private households. The sampling frame was the ‘small user’ Postcode Address File (PAF), a list of all addresses in the country (also known as delivery points).
The sample for the 2023 survey was drawn firstly by selecting the Primary Sampling Units (PSUs), and then by selecting addresses within PSUs. The sample design employs postcode sectors as PSUs. Each PSU represents 1 sample point (also known as an assignment), and for fieldwork purposes each point is issued to an individual interviewer.
For NTS 2023, the sampling design was updated in order to achieve an increase in the responding sample. This was to facilitate more granular analysis of the data both geographically and demographically, such as allowing more analysis by specific protected characteristics. The intention was to increase the number of PSUs selected from 756 to 1,164 and to increase the number of addresses selected per PSU from 17 to 22, equating to a total of 25,608 selected addresses (an increase from 12,852 in previous years).
Due to continuing fieldwork delivery challenges following the COVID-19 pandemic, it was not possible to issue the full sample that had initially been selected for the first 2 quarters of the survey year, meaning that a reduced sample was issued for those 2 quarters. To avoid bias in the responding sample, 30% of PSUs were randomly selected for removal from each of quarter 1 and quarter 2 using the same stratification approach as the initial sampling. As a result, 175 PSUs (that is, 88 in quarter 1 and 87 in quarter 2) were dropped from the initially drawn sample. The remaining 70% of PSUs were issued in quarter 1 and quarter 2 (that is, 203 and 204 PSUs respectively). In quarters 3 and 4, field capacity improvements allowed the full sample (100%) to be issued, consisting of 291 PSUs in each quarter.
In total, 989 of the initially selected 1,164 PSUs were issued, resulting in a final issued sample of 21,758 addresses.
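The sample size figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted in this section; the variable names are illustrative and are not taken from any NTS processing code.

```python
# Figures quoted in section 2.1; variable names are illustrative only.
ADDRESSES_PER_PSU = 22
PSUS_SELECTED_PER_QUARTER = 291
PSUS_DROPPED = {"Q1": 88, "Q2": 87, "Q3": 0, "Q4": 0}

# PSUs actually issued in each quarter after the reduction.
psus_issued = {q: PSUS_SELECTED_PER_QUARTER - d for q, d in PSUS_DROPPED.items()}

total_selected_psus = PSUS_SELECTED_PER_QUARTER * len(PSUS_DROPPED)
total_issued_psus = sum(psus_issued.values())

print(psus_issued)
print(total_selected_psus, total_selected_psus * ADDRESSES_PER_PSU)
print(total_issued_psus, total_issued_psus * ADDRESSES_PER_PSU)
```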
2.2 Quasi-panel design
Following a review of the NTS methodology in 2000, it was decided that the NTS should introduce a quasi-panel design from 2002 onwards. According to this design, half the PSUs in a given year’s sample are retained for the next year’s sample and the other half are replaced. This has the effect of reducing the variance of estimates of year-on-year change.
Therefore 378 of the PSUs selected for the 2022 sample were retained for the 2023 core sample. As the overall sample size was increased for 2023, these 378 retained PSUs were supplemented with 786 new PSUs. The PSUs carried over from the 2022 sample for inclusion in 2023 were excluded from the 2023 sample frame, so they could not appear twice in the sample. However, the PSUs dropped from the 2022 sample were included in the frame.
As mentioned above, the number of PSUs was reduced by 30% in quarters 1 and 2 of NTS 2023. In order not to bias the geographical stratification, both retained PSUs and new PSUs were included in the process of selecting the 30% of PSUs to be removed, meaning that not all of the 378 PSUs that were intended to be carried over from 2022 into 2023 were included in the final issued sample. Of the 989 issued PSUs, 324 were retained from 2022.
Whilst the same PSUs (postcode sectors) might appear in different survey years, addresses selected in any of the previous 3 survey years were not allowed to be selected again. Each year, NatCen provides the sampling company with a list of the addresses selected for the previous 3 survey years. These addresses were excluded from the sampling frame before the addresses for 2023 were selected. This means respondents to the previous 3 years’ surveys in the carried-over PSUs could not be contacted again.
For further information about the methodological review, see Elliott, D. (2000) ONS Quality Review of the National Travel Survey: Some Aspects of Design and Estimation Methods.
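The variance reduction attributed to the quasi-panel design can be seen from the standard identity for the variance of a difference. The notation below is an assumption introduced for illustration: the two estimators stand for the same survey estimate in 2 consecutive years.

```latex
\operatorname{Var}\!\left(\hat{\theta}_{2023} - \hat{\theta}_{2022}\right)
  = \operatorname{Var}\!\left(\hat{\theta}_{2023}\right)
  + \operatorname{Var}\!\left(\hat{\theta}_{2022}\right)
  - 2\,\operatorname{Cov}\!\left(\hat{\theta}_{2023}, \hat{\theta}_{2022}\right)
```

With 2 fully independent samples the covariance term is zero. Where half the PSUs are shared between years, estimates from the same areas would be expected to be positively correlated, so the covariance term is positive and the variance of the estimated change is smaller.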
2.3 Selection of sample points
Sample points were selected firstly by generating a list of all postcode sectors in England (excluding those in the Isles of Scilly due to cost of interviewing). Sectors carried over from the previous year were also excluded, as described in section 2.2 above. Sectors with fewer than 500 delivery points were grouped with an adjacent sector. Grouped sectors were then treated as 1 PSU. On average each PSU contained about 3,250 delivery points.
This list of grouped postcode sectors in England was then stratified using a regional variable, an urban or rural indicator, car ownership and a working from home indicator (note that this stratification approach was first implemented in NTS 2015 following a stratification review that NatCen carried out in 2014). This was done to increase the precision of the sample and to ensure that the different strata in the population are correctly represented. Random samples of PSUs were then selected within each stratum.
The regional strata for England are based on the International Territorial Level 2 (ITL2) areas, formerly NUTS 2, grouped in a few cases where single areas are too small. International Territorial Levels replaced the Nomenclature of Units for Territorial Statistics (NUTS), the European-wide geographical classification developed by the European Office for Statistics (Eurostat), following the UK withdrawal from the EU. The 2 classifications are equivalent, and the categories are unchanged from previous years. ITL2 roughly relates to counties or groups of counties in England. The 30 regional strata for the survey are shown in Table 2.1, along with the region that each stratum belongs to.
Table 2.1: NTS regional stratification variable
Stratification number | England | Region code |
---|---|---|
1 | Inner London – East | 7 Greater London |
2 | Inner London – West | 7 Greater London |
3 | Outer London – East and North East | 7 Greater London |
4 | Outer London – South | 7 Greater London |
5 | Outer London – West and North West | 7 Greater London |
6 | Devon and Cornwall | 9 South West |
7 | North Somerset, North East Somerset, Bath, Somerset and Dorset | 9 South West |
8 | Bristol, South Gloucestershire, Gloucestershire and Wiltshire | 9 South West |
9 | Oxfordshire, Buckinghamshire and Berkshire | 8 South East |
10 | Hampshire and Isle of Wight | 8 South East |
11 | Kent | 8 South East |
12 | West Sussex and East Sussex | 8 South East |
13 | Surrey | 8 South East |
14 | Essex | 6 Eastern |
15 | Cambridgeshire, Suffolk and Norfolk | 6 Eastern |
16 | Hertfordshire and Bedfordshire | 6 Eastern |
17 | Leicestershire, Lincolnshire and Northamptonshire | 4 East Midlands |
18 | Warwickshire and Hereford and Worcester | 5 West Midlands |
19 | West Midlands | 5 West Midlands |
20 | Shropshire and Staffordshire | 5 West Midlands |
21 | Nottinghamshire and Derbyshire | 4 East Midlands |
22 | Cheshire | 2 North West and Merseyside |
23 | Merseyside | 2 North West and Merseyside |
24 | Greater Manchester | 2 North West and Merseyside |
25 | Lancashire and Cumbria | 2 North West and Merseyside |
26 | South Yorkshire | 3 Yorkshire and Humberside |
27 | West Yorkshire | 3 Yorkshire and Humberside |
28 | North Yorkshire and Humberside | 3 Yorkshire and Humberside |
29 | Cleveland, County Durham and Northumberland | 1 North East |
30 | Tyne and Wear | 1 North East |
Within each region, postcode sectors were allocated to “urban” or “rural” based on the urban or rural indicator, creating 51 “expanded” regions. The urban or rural indicator itself was based on the 2011 Census and derived from the 10-category Rural Urban Classification. Within each “expanded” region, postcode sectors were listed in increasing order of the proportion of households with no car (according to the 2011 Census). Cut-off points were then drawn approximately 1 third and 2 thirds (in terms of delivery points) down the ordered list, to create 3 roughly equal-sized bands. Within each of the 153 bands thus created (51 times 3), sectors were listed in order of the percentage of people working from home (based on the 2011 Census).
In the next step of the process, 786 postcode sectors were then systematically selected for the core sample with probability proportional to delivery point count. Differential sampling fractions were used in Inner London, Outer London and the rest of England in order to oversample London (see section 2.4 for further details). These sectors were then added to the 378 sectors carried over from the previous year’s survey to produce the initial core sample of 1,164 sectors, before the total sample was eventually decreased to 989 PSUs as a result of sample reductions for quarters 1 and 2 (as outlined in section 2.1 above).
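The ordering and selection steps described in this section can be illustrated with a minimal sketch of systematic selection with probability proportional to size from a sorted list. This is not the procedure used by the sampling company: the frame is randomly generated, the function name `systematic_pps` and the tuple layout are hypothetical, and the differential sampling fractions for London are omitted.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Pick n positions from an ordered list with probability proportional to size.

    A unit larger than the sampling interval can be picked more than once.
    """
    interval = sum(sizes) / n
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(sizes))     # running total of delivery points
    picks = []
    for k in range(n):
        point = start + k * interval
        # First unit whose cumulative size exceeds the selection point
        # (the min() guards against floating-point overshoot at the end).
        index = min(bisect_right(cumulative, point), len(sizes) - 1)
        picks.append(index)
    return picks


rng = random.Random(2023)  # fixed seed so the sketch is repeatable

# Hypothetical frame: (expanded region, no-car band, % working from home, delivery points)
frame = [
    (rng.randrange(3), rng.randrange(3), rng.uniform(0, 20), rng.randint(2000, 4500))
    for _ in range(60)
]

# Sorting by the stratifiers before selecting spreads the sample across strata.
frame.sort(key=lambda s: (s[0], s[1], s[2]))

selected = systematic_pps([s[3] for s in frame], n=10, rng=rng)
print(selected)
```

Because the selection points are evenly spaced down the sorted list, each combination of stratifiers receives a share of the sample roughly in proportion to its delivery point count.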
2.4 Oversampling of London
Each year, London PSUs are oversampled. Response rates tend to be much lower in London compared with the rest of England, with rates being lowest in Inner London. The NTS oversamples Inner and Outer London with the aim of achieving responding sample sizes in London and elsewhere which are proportional to their population. The degree of oversampling in Inner and Outer London was based on response rates estimated from recent years of the NTS. Of the 1,164 PSUs in the initially drawn sample, 112 were in Outer London and 83 in Inner London. Of the 989 issued PSUs (that is, after the 30% sample reduction in quarters 1 and 2), 96 were in Outer London and 70 in Inner London.
2.5 Selection of addresses
The number of addresses drawn per PSU increased from 17 to 22 for NTS 2023. The aim of this increase in point size was to achieve a larger responding sample without a proportionate increase in fieldwork costs. The clustering effect of this increased point size was tested in a 2013 split sample experiment and found to be acceptable. As a result, 22 addresses were systematically selected from each of the 1,164 PSUs that were initially drawn for 2023, a total of 25,608 selected addresses.
Due to the field capacity issues in quarters 1 and 2 of 2023 (described in section 2.1 above), only 21,758 of the 25,608 sampled addresses were issued from a reduced sample of 989 PSUs.
2.6 Self-completion section
Starting in NTS 2017, a Computer Assisted Self Interviewing (CASI) module for transport satisfaction questions was added, where 1 adult from those present during the household interview is asked to complete the satisfaction questions.
Introduction of the CASI module added a new element to the sample design, requiring 1 individual to be randomly selected per household. The methodology for incorporating the CASI module into the NTS sample was based on the methodological development work that NatCen carried out in 2016. This methodology is detailed in Appendix Q1 of the NTS 2017 Technical Report.
This development work showed that inclusion of the satisfaction questions in this way requires the selection of 1 adult per household among those present during the interview. Selecting only from those present, however, introduces a non-random element in the sampling process, as some individuals (those who are absent) would have a zero probability of selection, thus introducing bias to the selected sample.
The development work also showed that younger men and women are under-represented in the sub-sample of NTS household members who are present during the interview. Given that younger people are less likely to live alone, this under-representation is likely to increase if 1 person per household is selected at random amongst those who are present. Consequently the development work recommended varying the probabilities of selection so that the number of young men and women selected is increased. The CASI sample for NTS 2023 was therefore recruited using an equal probability of selection, except in households where both people aged 16 to 29 and 30 or over were present. In such households, those aged 16 to 29 were selected with an 80% probability. This differential selection probability was then adjusted for in the weighting of the CASI responding sample.
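The differential selection described above can be sketched as follows. The 80% figure comes from the text, but the mechanism is an assumption: in a mixed household the sketch picks the 16 to 29 age group with probability 0.8 and then picks 1 person at random within the chosen group. The function names are hypothetical, and the list of ages is assumed to be non-empty and to contain only adults (aged 16 or over) who are present.

```python
import random

YOUNG_GROUP_PROBABILITY = 0.8  # quoted in the text; how it is applied below is an assumption


def is_young(age):
    return 16 <= age <= 29


def selection_probability(index, ages_present):
    """Probability that the adult at `index` is selected under the assumed mechanism."""
    young = [a for a in ages_present if is_young(a)]
    older = [a for a in ages_present if a >= 30]
    if young and older:
        if is_young(ages_present[index]):
            return YOUNG_GROUP_PROBABILITY / len(young)
        return (1 - YOUNG_GROUP_PROBABILITY) / len(older)
    # Only one age group present: equal probability of selection.
    return 1 / len(ages_present)


def select_casi_respondent(ages_present, rng):
    """Return the index of one selected adult among those present."""
    positions = range(len(ages_present))
    weights = [selection_probability(i, ages_present) for i in positions]
    return rng.choices(positions, weights=weights, k=1)[0]


rng = random.Random(0)
ages = [24, 47, 52]
chosen = select_casi_respondent(ages, rng)
# The inverse of the selection probability is the kind of adjustment
# that weighting would need to apply for this differential selection.
design_weight = 1 / selection_probability(chosen, ages)
print(ages[chosen], design_weight)
```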
2.7 Allocation of PSUs to months
To allocate PSUs evenly across NTS 2023, the survey year was divided into 12 quota (fieldwork) months and equal numbers of PSUs (291) were initially assigned to each quarter, resulting in an average of 97 points being issued each month. Due to the 30% reduction of the sample for quarters 1 and 2, these first 2 quarters of the survey year were issued with 203 and 204 PSUs respectively, an average of around 68 points per month.
Allocating PSUs evenly across a quarter (rather than a month) results in a more even spread of the average number of points and hence interviews and travel diaries per day across months. This approach makes it easier to control for variation across seasons. Furthermore, PSUs were allocated to quota months such that a nationally representative sample would be obtained for each quarter. Until 2016, an equal number of PSUs were issued each month, which meant that shorter months, particularly February, were slightly overrepresented in the data.
As noted in section 2.3 above, random samples of PSUs were selected within each stratum, as well as being evenly spread across each quarter. The distribution of sample points for each quota month across the major regional strata is shown in Appendix K.
2.8 Fieldwork start dates
Since 2014, an additional process has followed the selection of sample points. As part of this process, start dates are spread evenly across each month and then assigned at random to that month’s points, to provide an even spread of responses across the year.
Prior to 2014, interviewers were instructed to begin fieldwork at the start of the quota month. Additionally, travel week start dates were allocated within quota months, which ran mid-month to mid-month. However, analysis using 2012 data showed that this design led to an uneven spread of travel week start dates across the month due to interviewers following similar fieldwork patterns. In 2014 a new design was implemented to address this issue, whereby interviewers were assigned to start fieldwork on different dates across the month to ensure that the interviewing dates were more evenly spread.
2.9 Selection of households at sampled addresses
Interviewers should interview only 1 household per address given to them in their sample point. At some addresses, interviewers may find that more than 1 household is present. A household is defined as 1 person or a group of people living in a dwelling unit, who (a) share cooking facilities and (b) share a living room, sitting room or a dining area.
A single address may also contain more than 1 dwelling unit, for example a house which has been split into 2 flats. A dwelling unit is a living space with its own front door, which can be either a street door or a door within a house or block of flats. Moreover, a single dwelling unit may include just 1 household or multiple resident households, for example 2 families living as 2 separate households in 1 house.
In England, addresses containing multiple dwelling units are not identified in the PAF and will not be detected until the interviewer has visited the address. For example, most apartments, whether in a block of flats or within a house, will be listed in their own right in the PAF. That is, these apartments are listed with their own address in the PAF, and assuming they meet the criteria of a single address (as defined above) they would be considered as 1 dwelling unit only. However, for some apartment blocks or houses that contain multiple dwelling units, the PAF will not list the individual addresses for each dwelling unit. Where this is the case, the interviewer will need to establish the different dwelling units that are part of the address that was given to them in their sample point. Furthermore, the PAF does not provide information on the number of households at a given address, and so the presence of other dwelling units is only detected when the interviewer visits the address.
Households residing at PAF-sampled addresses with multiple dwelling units or households, or both, will have had a lower chance of selection than others. While there are relatively few such addresses (1%), they account for a larger proportion of households, and these households tend to be rather different to others (poorer, younger, and smaller), so consequent biases may not be entirely trivial.
Interviewers must select 1 household to approach to take part at each sampled address. Interviewers are instructed to first establish the number of dwelling units at each sampled address. If there is more than 1 dwelling unit at the address, interviewers list these dwelling units in the electronic Address Record Form system (eARF) on their laptops so that the computer can randomly sample 1 of them. They then establish the number of households residing within the dwelling unit (whether it is the only dwelling unit at the address or the selected dwelling unit at an address with multiple dwelling units). Similarly, if there is more than 1 household, interviewers list them out in the eARF so that the computer can randomly select 1 of them.
Corrective weighting is then used to remove any bias arising from the lower chance of selection among dwelling units or households residing at multi-household addresses.
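The two-stage random selection at an address, and the weight needed to offset the lower chance of selection, can be sketched as below. This is an illustration under stated assumptions, not the eARF or NTS weighting code: the function name and data layout are hypothetical, and any trimming or capping of weights that might be applied in practice is not shown.

```python
import random


def select_household(dwelling_units, rng):
    """Select 1 dwelling unit, then 1 household within it, each with equal probability.

    `dwelling_units` is a list of dwelling units, each a non-empty list of household labels.
    Returns (dwelling unit index, household label, selection weight).
    """
    du_index = rng.randrange(len(dwelling_units))
    households = dwelling_units[du_index]
    household = rng.choice(households)
    # Chance of selection within the address is 1/(units) * 1/(households in the unit),
    # so the compensating weight is the product of the two counts.
    weight = len(dwelling_units) * len(households)
    return du_index, household, weight


rng = random.Random(7)
# Hypothetical address: a house split into 2 flats, the second shared by 2 households.
address = [["flat A"], ["flat B, family 1", "flat B, family 2"]]
print(select_household(address, rng))
```

A household at an address with a single dwelling unit and a single household receives a weight of 1 under this sketch, so only multi-unit or multi-household addresses are affected.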
2.10 Ineligible (deadwood) addresses
The following types of address were classified as ineligible in 2023:
houses not yet built or under construction
demolished or derelict buildings or buildings where the address has “disappeared” when 2 addresses were combined into 1
vacant or empty housing unit: housing units known not to contain any resident household on the date of the first contact attempt
a non-residential address: an address occupied solely by a business, school, government office or other organisation with no resident persons
residential accommodation not used as the main residence of any of the residents. This is likely to apply to second homes, seasonal, vacation or temporary residences, and these were excluded to avoid double counting
a communal establishment or institution: that is, an address at which 4 or more unrelated people sleep; while they may or may not eat communally, the establishment must be run or managed by the owner or a person (or persons) employed for this purpose
an address that is residential and occupied by 1 or more private households, but does not contain any household eligible for the survey; it is very rare for a residential household not to be eligible for the NTS interview, but exceptions include ‘Household of foreign diplomat or foreign serviceman living on a base’, addresses which are not the ‘Main residence’ of any of the residents and addresses where there are no residents aged 16 or over
an address out of sample: that is, cases where interviewers were directed not to approach a particular address; this is very rare and usually only occurs where an address should not have been listed on the original sampling frame
For further information about outcome coding, see section 3.14.
2.11 PSU-level variables
In addition to the information provided by members of the sampled households, the NTS also collects information measured at the PSU-level. The value of a PSU-level variable applies to all households living within that PSU. The PSU-level is therefore the highest level at which the data may be analysed, coming just above the Household level in the analysis hierarchy.