Chapter 14 The Reproducible Analysis Project Checklist
Not sure if your project is a good candidate for a reproducible analysis approach? Complete the following checklist, ticking all of the statements that apply to your work:
Repeatability
The analysis will need to be rerun with similar data on an annual or more frequent basis (we’d like to get an update of this slidepack every quarter)
You are likely to be asked to run the analysis again to look at different groupings or breakdowns (can you show me what this looks like when broken down by age instead of gender?)
The analysis includes similar manipulation or visualisation of slightly different datasets (we need this same report for the matching data for all our ALBs)
Your report includes similar outputs for different cuts of the data (I’d like to see the same charts for every local authority)
Someone else will need to be able to easily run your analysis
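If several of the statements above apply, the analysis usually benefits from being written as a function that takes the breakdown as a parameter, so that a new cut of the data is a one-line change rather than a rebuild. The following is a minimal sketch in Python using only the standard library; the `summarise` function, the column names and the figures are hypothetical and purely for illustration.

```python
from collections import defaultdict


def summarise(records, group_by, value="count"):
    """Total the `value` column for each distinct entry in the `group_by` column."""
    totals = defaultdict(int)
    for row in records:
        totals[row[group_by]] += row[value]
    return dict(totals)


# Hypothetical example data: one row per age band and gender.
records = [
    {"age_band": "16-24", "gender": "F", "count": 120},
    {"age_band": "16-24", "gender": "M", "count": 100},
    {"age_band": "25-34", "gender": "F", "count": 80},
    {"age_band": "25-34", "gender": "M", "count": 90},
]

# The same code answers both questions; only the grouping column changes.
by_gender = summarise(records, "gender")
by_age = summarise(records, "age_band")

print(by_gender)
print(by_age)
```

The same pattern extends to producing one set of outputs per local authority or per organisation: loop over the distinct values and call the same function each time.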
Scalability
Your data is large: more than 100,000 rows or more than 50 columns
Your data is growing rapidly: you are gaining more than 50 new rows per month
Your data refreshes daily or more frequently
Your data comes from a large number of sources (5 or more)
Analysis of your data comprises multiple (3 or more) stages
Quality
Your analysis is likely to be compared to other analysis in a similar area (can you explain why your figure here doesn’t match this one produced 3 months earlier?)
There is likely to be a large amount of public or media interest in your work, and a methodological or copy/paste error, or a later correction, would be embarrassing for the department
The output of your analysis feeds into high-impact decision making or policy, and needs to be error-free
Auditability of your work in future is a key consideration
Automation
The content you are producing doesn’t change format often (commentary such as “category X increased by 10,000, up 12% on the previous quarter”)
Your data is provided in a stable format that doesn’t change often (the number and names of columns remain consistent every time the data refreshes)
The process relies on a number of manual steps carried out in order (processes such as saving files with specific names, copying formulae in Excel, or copy-pasting data into the right location)
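As an illustration of what automating these items might look like, the sketch below checks that incoming column names match what is expected and then generates the kind of commentary sentence quoted above. It is a minimal example under stated assumptions, not a prescribed method: the function names `check_columns` and `commentary`, the column names and the figures are all hypothetical.

```python
def check_columns(header, expected):
    """Raise an error if the incoming column names differ from the expected ones."""
    if list(header) != list(expected):
        raise ValueError(
            f"Unexpected columns: got {list(header)}, expected {list(expected)}"
        )


def commentary(category, previous, current):
    """Build a one-sentence comparison of this quarter against the previous one."""
    if previous <= 0:
        raise ValueError("previous must be positive to calculate a percentage change")
    change = current - previous
    if change == 0:
        return f"{category} was unchanged on the previous quarter"
    pct = abs(change) / previous * 100
    verb, direction = ("increased", "up") if change > 0 else ("decreased", "down")
    return (
        f"{category} {verb} by {abs(change):,}, "
        f"{direction} {pct:.0f}% on the previous quarter"
    )


# Hypothetical column names for a quarterly extract.
expected_columns = ["category", "quarter", "count"]
check_columns(["category", "quarter", "count"], expected_columns)  # passes silently

# Hypothetical figures chosen for illustration only.
sentence = commentary("category X", previous=83_333, current=93_333)
print(sentence)
```

Because the sentence is built from the data, it updates itself whenever the figures refresh, and the column check fails loudly if the input format changes instead of silently producing wrong numbers.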