Chapter 8 Quality assurance in coding
When developing your code, you will eventually want to ensure that your code is correct and that you have adopted the right approach. This is called quality assurance, or QA.
8.1 Definitions: Code review vs code QA
Often, in our day-to-day jobs, we use the words “code QA” and “code review” interchangeably. However, within DfT, and the Code Review Network in particular, these have distinct meanings. Here are the key differences:
| | Code review | Code QA |
|---|---|---|
| What | A code review is a deep and critical look at someone else’s code. The aim is to review whether the code developed is of good quality and future-proof. Does it follow coding best practice and is it as efficient as possible? For example, a code reviewer might spot opportunities to functionalise some chunks of code that are repeated. | A code QA is performed to ensure the code is fit for purpose: for example, that the right data and methodology have been used and that the outputs are as expected. |
| Who | A code review is generally conducted by a relatively experienced coder, who has some knowledge of coding best practice. But you don’t have to be a coding wizard, as there is guidance available to help you. The code reviewer does not need to have expert background knowledge about the data or the policy area. The reviewer should be able to get a good enough idea of the topic simply by reading the documentation. | A code QAer is generally someone in the same team as the code developer, or at least in the same area. They need to have a good understanding of the data and have the necessary background knowledge of the policy area. If the QAer does not have this expertise, it will be difficult for them to know whether the code is fit for purpose, or whether the data is used the right way. |
| When | Anytime, ideally in chunks. | Generally, as soon as a new feature is developed, or a bug is fixed. |
This chapter focuses on code QA. For information on code review, visit the Code Review Network Github repo.
8.2 General QA principles
Before getting into the details of code QA best practices, here is a reminder about what QA is, more generally.
8.2.1 WHY should we care about QA?
When conducting analysis, it is essential that it is quality assured. If it is not, then, as a government department, we risk making misinformed decisions which can have lasting impacts. Incorrect analysis can lead to a range of consequences and, in some extreme cases, can result in harm to life or the misallocation of public funds. Two examples of analysis gone wrong are the West Coast Main Line incident and the Covid-19 testing error; both could have been avoided with more robust QA.
In 2012, a numerical mistake in the analysis of calculated risk resulted in the government retracting the decision to award the £5 billion West Coast Main Line franchise to FirstGroup. The assumptions on passenger growth, inflation and other risks were inconsistent throughout the reports provided to ministers to determine the winning bidder taking over the franchise.
The department was hit with backlash over the decision, and the key cause of the error was identified as rushing into a complex programme of railway refranchising with depleted staff and budgets. All of this was caused by simple human error surrounding some of the key figures in the analysis.
Along with the many public criticisms of the department, three civil servants were suspended from their roles, and legal action was launched against DfT. In total, nearly £50 million of taxpayers’ money was wasted: roughly £8.9 million covering costs for staff, advisers, lawyers and two reviews, and £40 million reimbursed to the firms for the costs of their bids.
In 2020, due to a ‘technical glitch’, lab test results were transferred to an old Excel file format (XLS), which led to roughly 16,000 Covid-19 cases being missed from the daily figures between 25th September and 2nd October. This resulted in a delay in Test and Trace contacting close contacts of patients, further spreading the virus, impacting more lives and damaging the government’s reputation. The linked article goes into further depth on the consequences of the error and the impact the change had on analytical results.
8.2.2 WHAT should we do about it?
Following the InterCity West Coast franchise competition incident in 2012, the importance of analysis in major government projects, and the consequences when it is done incorrectly, were put in the spotlight. This led to the commissioning of the Review of quality assurance of government analytical models, which identified significant variation in the type and nature of quality assurance used within and between departments. The review further details the different components of best practice in quality assurance.
From the review, the cross-government working group on analytical quality assurance was established to identify and share best practice across government. The key product of this group is the Aqua Book, which draws together existing practices from departments and best practice from analysts across a variety of analytical professions within government. The purpose of the book is to provide advice rather than to make specific or binding recommendations.
Alongside the Aqua book, there is a wide selection of resources within the department and across government.
This guidance will focus primarily on best practices for conducting quality assurance for code.
Before understanding how to QA analysis and assess if it’s of good quality, we need to first understand what good analysis is. Analysis is considered high quality if it applies the principles of ‘RIGOUR’, which address the key aspects of verification and validation, throughout the duration of the project.
To do this, the analyst (and later, the quality assurer) should be asking questions about the analysis that satisfy:
- R (Reproducible): If an approach is repeated with the same inputs, the results should remain unchanged
- I (Independent): Analysis should be free of any prejudice or bias, with an appropriate balance of views across stakeholders and experts
- G (Grounded in reality): Connections should be made between the analysis and its consequences
- O (Objective): Effective engagement and suitable challenge reduce potential bias and enable the commissioner and the analyst to be clear about the interpretation of the analytical results
- U (Uncertainty is managed and understood): Uncertainty should be identified, managed and communicated throughout the analytical process
- R (Robust): Analytical results should be provided in the context of the uncertainty and limitations to ensure they are used appropriately
It is important to accept that uncertainty is inherent within the inputs and outputs of any piece of analysis, therefore it is just as important to establish how much we can rely upon the analysis for a given problem. Quality assurance ensures that analysis addresses each part of the above mnemonic to a satisfactory level.
8.2.3 HOW should we carry out QA?
8.2.3.1 Proportionality
Analysis can range from reading key figures from a table to building highly technical, business-critical models, so it is logical that there is no ‘one size fits all’ approach to quality assurance. The extent of quality assurance effort should be proportionate to the risks associated with the intended use of the analysis. These risks include financial, legal, operational, and reputational impacts.
Analysis that is frequently used to support a decision-making process may require a more comprehensive analytical quality assurance response. Proportionality still applies when dealing with ad hoc quality assurance, as well as to the amount of documentation required throughout the process.
The likelihood of an error occurring is dependent on the complexity of the analysis and level of precision expected of the model’s outputs, whereas the impact depends on the importance of the decision (risks in relation to people, operations, reputation, legal, financial, and political) and how central the analysis is to the decision-making process.
While there is no such thing as too much QA, the amount completed should be reasonable given the resources available and the level of risk associated with the decision the analysis is supporting. Too little quality assurance will result in more instances like the two examples mentioned above.
Despite QA commonly being conducted throughout the lifespan of a project, there will be instances when urgent ad-hoc analysis is requested. These requests include:
- Parliamentary Questions (PQs)
- Briefing for meetings using existing data
- Operational crisis response
- Additional analysis for Agencies
- Work delivered by our partners to review policies as they are implemented
- Press office requests
- To support Govt Budget announcements / spending review decisions.
The quick turnaround of these ad hoc requests makes it difficult for thorough QA to be applied, but it is even more essential for QA to be completed for these requests because of the impact of the analysis. It is easy to get overwhelmed by the urgency of the request and take immediate action, but taking the time to properly understand the ask can improve the efficiency of the method used to tackle the task. The four questions below should be answered to get an overview of the situation:
- What is being asked for?
- Why is it needed?
- How hard is the deadline?
- Is there anyone around to help, advise or support?
Answering these questions can streamline and highlight the key next steps, but consideration needs to be given to:
- The areas of the analysis that are likely to have the largest impact on the output and that are associated with the greatest risk. These areas should be the focus of the verification and validation efforts. Building verification checks into the code will make the QA considerably quicker.
- Communication of appropriate caveats outlining what has and has not been verified or validated, along with a practical interpretation of the associated risk
- If time allows, further assurance activities should be performed after the event to capture lessons learnt and to identify whether the code is worth saving (if it is requested frequently)
8.2.3.2 Continuity
Quality assurance should be considered throughout the life cycle of the analysis and not just at the end. Effective communication is crucial when understanding the problem, designing the analytical approach, conducting the analysis, and relaying the outputs.
8.2.3.3 Documentation
Documentation on the quality assurance of a project needs to be created and maintained throughout the life cycle of the project. The amount of documentation completed should be proportional to the level of risk in the project, much like the quantity of quality assurance required: the greater the risk, the greater the amount of documentation that will be required.
8.2.3.4 Verification and Validation
Analytical quality assurance is checking that the analysis is error-free, appropriate and satisfies its specifications.
Verification and validation are often mistaken for each other but have very distinct meanings and implications relating to QA. Analysis must satisfy both checks, so it is essential that quality assurance contains a combination of questions which relate to both verification and validation checks.
| Validation | Verification |
|---|---|
| Is the right thing being done? | Is the thing being done right? |
| Example: Is the mean average the right measure to rely on for this specific question? | Example: Has the mean average been calculated correctly? |
| Key output: a judgement about whether something is fit for purpose | Key output: a judgement about whether the work has been done correctly |
The majority of QA checks fall under verification, with validation checks usually occurring near the beginning of a project’s life cycle.
Verification and validation can be carried out by different people. Someone might have the skills and expertise to validate the approach (ensuring that the right approach is used) and someone else might have more expertise in checking that the code/analysis is actually doing what it is supposed to. The remainder of this chapter will expand on the best practices when quality assuring code, with a greater focus on verification than validation.
8.3 How to write QA-able code?
Contrary to common belief, quality assurance responsibilities do not exclusively sit with the person conducting the QA. The code developer is equally responsible for QA.
Before asking for a piece of code to be QAed by someone else, the code developer must ensure that:
- The code is written in a way that is easy to QA (coding best practices)
- The code runs smoothly with no errors (debugging and automated checks)
- The code is fit for purpose (validation)
- The QAer has access to appropriate and up to date documentation
- The QAer can understand what they are trying to do, and why they are doing it (verification)
- The QAer knows how to get started to run the code (structure)
- The QAer has been given access to the data required to run the code (access management)
- The QAer has the latest version of the code (version control)
Imagine having to review a report. Have a look at the reports below: which one would you rather review? It’s the same for code! If your code is clean and follows coding best practice, your QAer will have a better chance of spotting issues and actually performing good QA.
The remainder of this section provides tips about what, as a code developer, you need to do to ensure that your code can be quality assured.
8.3.1 Applying coding best practice
Chapter 5 of this book covers coding best practice. The necessity of applying coding best practice becomes even more apparent when it comes to the QA stage, because another person comes into the equation. If the QAer sees a structure and style which is familiar to them, the QA will be easier, and more importantly, more efficient and useful.
8.3.1.1 Writing your code
You can check out the DfT R cookbook for more tips on how to write good quality code.
Here are a few key bits of good practice that are important to make QA easier:
- Write modular code (functions) instead of repeating chunks of code. This means that the chunk of code only needs to be QAed once. Often, when code is repeated, small errors or typos can creep in and could be missed by the QAer because they assume the chunk of code is the same as the previous one.
- Use informative and consistent names for objects. This means it will be much easier for the QAer to understand the flow of the code and the purpose of each object. If very generic names such as ‘data’ or ‘df’ are used then it will take the QAer a lot more effort to work out what the object is, which can be a risk for the quality of the QA itself.
- Include project documentation, like a readme file and comments throughout your code. Similarly to the above, good documentation helps avoid QA fatigue: the QAer will spend less time working out what the code is about and more time actually checking that things have been done right.
8.3.1.2 Structuring your code
The QAer should be able to know where to start and where to find inputs, outputs and source code. Refer to this book’s section on how to structure your code well.
- Separate inputs, outputs, process into different folders/scripts. Separate your code from your data. Do not store data on GitHub.
- Consider numbering your scripts, so the QAer knows which one to look at first, or make sure it is obvious (from a readme file for example)
- Break up your code into distinct tasks to make it easy to navigate. Use functions for tasks, especially where you repeat them (e.g. graphs)
- Keep all your functions in a separate script
- Set hard-coded values as parameters at the top of a script or inside a configuration script; do not hardcode parameters in the depths of your code. For example, define chosen_start_year <- 2022 at the beginning of your code, then reference chosen_start_year when you need to filter based on this year (see the sketch after this list)
- Reference the package for your function: dplyr::filter()
- Remove unused code
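As a minimal sketch of the points above (the file path and column name are hypothetical), parameters can be defined once at the top of a script and functions referenced with their package:

```r
library(dplyr)

# Parameters: change these here, not deep inside the code (values are hypothetical)
chosen_start_year <- 2022
input_path <- "data/traffic_counts.csv"

# Processing
traffic <- readr::read_csv(input_path)

traffic_recent <- traffic %>%
  dplyr::filter(year >= chosen_start_year)   # package referenced explicitly
```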
8.3.1.3 Formatting your code
Tidy code is much easier and quicker to understand than messy code. Ensuring that your code is visually formatted helps your QAer (and yourself).
Following known style guidelines ensures consistency and thus facilitates collaboration.
- R style guide: The tidyverse style guide
- Python style guide: Style Guide for Python Code
- Separate out different chunks of code with blank spaces
- Make use of sections to help split up your code
- Use indentation to structure a chunk of code
- Stay within the reading frame/split over several lines, otherwise the QAer might miss something (break lines at c.80 characters to ensure no left-right scrolling is required)
- Keep to one meaningful task per line
- Keep linked manipulations together but add a blank line when starting on the next distinct manipulation.
8.3.1.4 Documenting your code
Comments are a QAer’s best friend. Useful comments explain the why rather than the what or how. This allows the QAer to check that the code you wrote matches your intention.
Refer to this book’s section on how to document your code well.
‘Bad’ example
This comment (“Only keep the 2015 data for the first projection year”) is unhelpful for a range of reasons:
- Why do we want to remove data?
- The filtering does not happen on the year 2015 but on the scenario ‘Base_Year’. The comment does not describe how these are linked.
- The comment does not explain that 2015 is the base year from which all projection years pivot off. By keeping 2015 in every projection year we artificially increase values for 2015 once we perform mathematical operations on the data.
‘Better’ example
This comment (“Every run contains the base year data (2015). We only want to keep one instance. Therefore, remove Base_Year for all years but the first projection year (i==1).”) is more helpful:
- States 2015 data relates to the baseline from which the model pivots off and that every projection year contains the baseline data.
- States that we want to keep one instance of the base year data
- Explain the purpose of your code. Paraphrasing your code isn’t helpful. What’s important is to state the intention of your code, rather than name the function you are using.
- Ensure the code can be read by someone else (e.g. write acronyms out in full)
- Write your comments in plain English, not in coding language. This allows the QAer to check the code is a good reflection of your intentions.
- Make use of errors, warnings, and messages inside of functions, which provide the user information when something goes wrong. (This is particularly useful when complex data manipulation is performed inside a function. It can often be difficult to see when or how errors arise.)
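To make the ‘better’ comment concrete, here is a minimal, hypothetical sketch of the kind of loop being described (the objects and values are illustrative, not the original code):

```r
library(dplyr)

# Hypothetical inputs: each projection run also contains the 2015 base year scenario
runs <- list(
  data.frame(scenario = c("Base_Year", "Central"), year = c(2015, 2030), value = c(10, 12)),
  data.frame(scenario = c("Base_Year", "Central"), year = c(2015, 2040), value = c(10, 14))
)

combined <- list()
for (i in seq_along(runs)) {
  run <- runs[[i]]
  # Every run contains the base year data (2015). We only want to keep one
  # instance. Therefore, remove Base_Year for all years but the first
  # projection year (i == 1).
  if (i != 1) {
    run <- filter(run, scenario != "Base_Year")
  }
  combined[[i]] <- run
}
combined_df <- bind_rows(combined)
```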
8.3.1.5 Using version control
Using version control, such as GitHub, will ensure that the QAer is looking at the latest version of the code and is also able to document any comments they may have. See section 8.5 on how to use GitHub for QA.
8.3.2 Checking your code
The QAer is not the only person who should be testing your code. When developing your code, you have the opportunity to create your own automated tests and checks. This will both increase your own trust in your code and outputs, and it will also reduce the burden on QAers.
There are two types of checks you can perform:
- Hard checks – does the code run? Do the numbers match an exact value? These will be pass or fail checks
- Soft checks – do the outputs make sense? This will be more of a sense-checking exercise, in light of the context around the analysis/data, and requires background knowledge
8.3.2.1 Debugging your code
Debugging is the process of removing errors from code to ensure it runs successfully.
The simplest method to debug code is to use print statements to view the values of variables which are potentially causing the errors/bugs. This, combined with commenting out lines of code, can enable the coder to pinpoint where within the code the bug is appearing. The advantage of this method is that it is compatible with all programming languages; however, it can be a slow process if you are unsure where within the code the bug is (especially for longer scripts).
A more sophisticated way to debug is to make use of built in debugging tools. Debugging tools are built into most code developing environments and are programmed to assist in the detection and correction of syntax errors within the code. The most common tool is to create breakpoints in the code.
Breakpoints are places where the code will pause and allow the developer to then step through the code line by line. Stepping through the code enables the developer to examine the variables at each step and follow the changes made to determine what went wrong. Breakpoints can be placed anywhere within the code but are a more advanced method of debugging.
In R, there are two ways to launch the debugger:
- Breakpoints – click to the left of the line number, this should result in a red circle appearing to the left of the number. It can be removed by pressing the red circle. These are not part of the code so there is no worry about accidentally checking them into a version control system.
- Add a call to “browser” in the source code – this is the standard R way of launching the interactive debugger. As it is part of the code, it can be made conditional by combining it with an if statement.
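For example, a browser() call can be made conditional so the debugger only launches in the situation you care about (the function and condition below are purely illustrative):

```r
summarise_totals <- function(df) {
  # Hypothetical condition: only drop into the debugger if missing values appear
  if (any(is.na(df$total))) {
    browser()
  }
  sum(df$total)
}
```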
The debugger has been successfully launched when a toolbar appears at the top of the console and should look like below:
The toolbar is a user-friendly way of recalling the debugging commands that are now available. These commands are also available outside of RStudio; they are activated with one-letter commands. The most useful commands include:
- Next (press n): executes the next step in the function. Note: if there is a variable named n in the code, print(n) will need to be used to display the variable’s value.
- Step into (press s): works like Next, but if the next step is a function, it will step into that function so it can be explored interactively
- Finish (press f): finishes execution of the current loop or function
- Continue (press c): leaves interactive debugging and continues regular execution of the function. (This is useful if you’ve fixed the bad state and want to check that the function proceeds correctly).
- Stop (press q): stops debugging, terminates the function and returns to the global workspace. This would be used once the location of the problem has been identified and can be fixed before reloading the code.
Additional helpful functions to use alongside the debugger tool when writing/testing code include:
- traceback() – displays the code leading up to any error, along with a brief description of what is causing the error to occur
- print() – displays the value of the variable or string entered. It can be inserted within code so that a message is printed to show when a section of the code is run
- str() – prints the detailed structure of any object. This is useful if you need to double-check you have the type of object that you expect
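As a quick, illustrative sketch of these helpers run interactively (the small functions here are hypothetical):

```r
divide <- function(x, y) x / y
safe_ratio <- function(a, b) divide(log(a), log(b))

str(mtcars)          # inspect the structure and column types of an object
print(nrow(mtcars))  # confirm how many rows you have at this point

safe_ratio("10", 2)  # errors: non-numeric argument to mathematical function
traceback()          # shows the calls leading up to that error
```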
8.3.2.2 Automated QA checks and reports
Within your scripts you can build in automated QA checks, for both hard checks and soft checks. For example, you can write code to automatically check (hard checks):
- how your outputs compare to the raw/historical/published data
- if totals match up to expected value
- the number of rows is as expected
- if any missing values have crept in
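A minimal sketch of what such hard checks could look like in R (the data frame, column names and expected values are hypothetical):

```r
# Hypothetical output data frame produced earlier in the pipeline
output <- data.frame(region = c("North", "South"), passengers = c(1200, 1800))

# Hard checks: stop the process with a clear message if any of these fail
stopifnot(
  "Row count is not as expected"        = nrow(output) == 2,
  "Total does not match expected value" = sum(output$passengers) == 3000,
  "Missing values have crept in"        = !anyNA(output$passengers)
)
```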
You can create validation charts (soft checks). These can help check for outliers or changes in your input data, such as unit changes. However, just because something looks right does not necessarily mean it is correct. Here is an example chart that compares the current month’s data with last month’s data.
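The chart itself is not reproduced here, but a comparison of this kind could be sketched with ggplot2 along the following lines (the data frame and column names are hypothetical):

```r
library(ggplot2)

# Hypothetical data: the same measure for the current and previous month
monthly <- data.frame(
  day = rep(1:28, times = 2),
  usage = c(rnorm(28, mean = 100, sd = 5), rnorm(28, mean = 102, sd = 5)),
  month = rep(c("Last month", "Current month"), each = 28)
)

# Soft check: do the two months follow a broadly similar pattern?
ggplot(monthly, aes(x = day, y = usage, colour = month)) +
  geom_line() +
  labs(title = "Current vs previous month: quick visual validation")
```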
In an ideal world you would have a check for every manipulation you do, but that is quite unrealistic. Instead try to have a series of checks after each step of your process. Having automated checks is great, but making sure you interpret the results of these checks is key! You don’t want to have to scroll through all your code outputs to see if a check has passed. You want it to be obvious if tests don’t pass.
You can create automated QA reports which summarise the results of all your checks.
A clear and auditable way to carry out QA checks on data is to produce simple QA reports in RMarkdown. Checks on the data are written in the form of tables, charts, etc., and these are built into a report that you can look at and store for future reference.
The reports produced can be rendered in HTML format which is easy to read in multiple formats and can’t be modified. These reports can include:
- Simple yes/no checks with verbal responses
- Returning only specific (i.e. unexpected or unusual) values in a dataset for further checking
- Returning charts of data to visually inspect for anomalies
While some hard checks can be fully automated, some other, ‘softer’ checks will require human input to assess whether the trends displayed are logical (i.e. what was the trend over the past 5 years and does this seem sensible).
Reports can be rendered as part of a larger automated project by using the rmarkdown::render() function. Using the output_file argument in the render allows you to specify a dynamic file name for the output report. This allows you to produce a unique report named for the date or time it was run, producing an auditable trail of QA reporting.
Additional arguments within the render call also allow you to specify:
- The output directory, allowing you to output to a different folder
- Output format: can specify different and multiple output types
- Pass parameters to the knitted rmarkdown; this is ideal to run the same report multiple times with only small changes (i.e. filename of data to check)
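A minimal sketch of such a render call (the file, folder and parameter names are hypothetical):

```r
# Render the QA report with a date-stamped file name for an auditable trail
rmarkdown::render(
  input       = "qa_report.Rmd",
  output_file = paste0("qa_report_", Sys.Date(), ".html"),
  output_dir  = "outputs/qa",
  params      = list(data_file = "inputs/latest_data.csv")  # passed to the Rmd
)
```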
Examples of QA reports generated in R:
- Transport usage monthly statistics: Input data QA report (click to download) and the Post processing QA report (click to download)
- Maritime fleet quarterly publication: Fleet data checks (click to download)
8.3.2.3 Unit testing
As a code developer, you are frequently testing your code, whether running data through it or debugging it to find out where it is not working. It’s important to document these checks, and writing unit tests for your code is a good way to do this, providing a structured approach for your testing. A unit test takes a chunk of code (often a function) and tests it using example data, to ensure it behaves as expected and doesn’t produce unexpected results now or in the future. Unit tests will generally be written as functions themselves, where the output is a pass or fail message.
Often, code developers, especially beginners, write unit tests without realising it, simply by checking that the outputs and data structures are as expected when writing new code. Writing unit tests allows you to automate that process.
In an ideal world, as a code developer, you would want to test every single line of code or manipulation you do. But this is quite unrealistic and time consuming.
Instead, a good rule of thumb is to write unit tests when you think the risk of a chunk of code failing is high – either because it’s likely to fail (or has failed before), or because if it fails, it would have a big impact on the overall process.
Often, unit tests are written after the developer has noticed errors in the code/outputs. It’s best to try to anticipate where the code might fail, but it’s not always possible. We are human after all!
Writing unit tests is also a way to break up your code, so that when the code fails, you are able to isolate the issue and track it back to its exact source.
Some questions to consider when developing unit tests for your code include:
- How complex is this section of code?
- Is this section of code/function regularly used?
- Are you often making changes to this section of code?
- Are there regular errors from the code?
And examples of unit tests include:
- Checking numbers are within a range (e.g. positive, negative)
- Checking the class of input parameters (e.g. real number, integer)
- Checking the size/shape of the data returned
- Checking an output matches the expected value
In R, you can write unit tests with the testthat package, which formalises the process and allows those tests to be run repeatedly. It’s essentially a library of existing tests, which can be applied to a wide range of cases.
The testthat package enables you to create tests for each function present within your code. Each function within the code will get its own test file, which will contain all the tests to be conducted on the function it’s linked to.
Tests for each function can be run individually while the code is still in development and can be run altogether at the end of the development of the code as well. The package allows testing for:
- Valid inputs
- Invalid inputs
- Errors, exceptions, and events
- Everything that has the potential to break
Once the tests have been run, a summary table will be provided which informs the user of how many tests failed, were skipped, or passed. Test failures also provide a brief description of the test and the reason for failure. The overarching goal is to develop the code until all the unit tests pass.
Here is an example of some unit tests to check the function is working as expected using the testthat package:
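As a minimal sketch, assuming a hypothetical add_two() function, such tests might look like this:

```r
library(testthat)

# Hypothetical function to be tested
add_two <- function(x) x + 2

test_that("add_two returns the expected values", {
  expect_equal(add_two(1), 3)
  expect_equal(add_two(c(0, -2)), c(2, 0))
})

test_that("add_two prints the expected output", {
  expect_output(print(add_two(1)), "3")
})
```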
The expect_equal function allows you to compare two objects for equality, to check that your results are as expected, and expect_output allows you to check outputs are as expected.
If you think you might need to create a large number of unit tests or have many files to test, you can also use testthat with the usethis package to help you organise your unit tests into folders. This allows you to create a clear set-up for your unit tests.
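As a sketch, assuming your code lives in an R project or package and reusing the hypothetical add_two() function from above, the set-up might look like this:

```r
usethis::use_testthat()       # one-off set-up of the testthat infrastructure
usethis::use_test("add_two")  # creates a test file for the add_two() function
```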
This creates a tests/ folder containing a testthat.R runner script and a testthat/ sub-folder in which your unit test files are stored. Running this code automatically creates the test files for your unit testing; you can then write your unit tests in the relevant files, along with your own descriptive test messages. For the hypothetical example above, the file created would be tests/testthat/test-add_two.R.
8.3.2.4 Actions following a failed check
What should I do if my checks and tests fail? When you write a check or a test, which will output a fail or pass outcome, you may want your code to react to this outcome. Otherwise, you may have to wait for the whole process to run before noticing a failed check. For example, you can write your own error/warning messages, to either appear in the console and/or QA report. There are different types of messages:
- message(): This is purely for information purposes. You can compare this to a print() statement in your script-based coding.
- warning(): Something unusual is happening that you (might) want to know about, but you want the rest of your code to continue running. This is a common warning message you have likely seen:
```r
x <- as.numeric(c("1", "2", "X"))
Warning message:
NAs introduced by coercion
```
- stop(): Your code is unable to continue. This is referred to as an error message.
In this example the information is printed to the console. You may also want to add these to a QA report to save alongside your data.
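As an illustration only (not the original function described below), stop(), warning() and message() might be combined in a data-loading function like this:

```r
load_scenarios <- function(scenario_names, file_paths) {
  # Hard check: inconsistent inputs mean we cannot safely continue
  if (length(scenario_names) != length(file_paths)) {
    stop("The number of scenario names does not match the number of file paths.")
  }

  message("Reading in ", length(file_paths), " scenarios")

  results <- list()
  for (i in seq_along(file_paths)) {
    # Soft failure: warn about a missing file and skip it, rather than stopping
    if (!file.exists(file_paths[i])) {
      warning("File not found for scenario '", scenario_names[i], "', skipping")
      next
    }
    results[[scenario_names[i]]] <- readr::read_csv(file_paths[i])
    message("Finished scenario ", i, " of ", length(file_paths))
  }
  results
}
```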
Messages: lines 22 and 26 output information telling the analyst how many times the function will be executed and how far along the code is. This is an example output, written as the function progresses:
The first check on line 18 looks at whether the input vectors are consistent. If this is not the case, e.g. you have provided more scenario names than file paths, the whole execution stops. If your inputs are wrong, this is what appears in your console:
The function in line 32 checks whether a file exists before attempting to read it in. If your scenario points to a file that does not exist (e.g. there could be a typo in the file path, the file was moved, etc.), you do not want the whole function to stop but you want to move onto the next loop. Therefore, we write a warning message telling us that the function is skipping that scenario. If there is an error with the provided file path, this is what happens:
The function called on line 32 is shown below:
8.3.2.5 CI/CD
CI/CD, which stands for Continuous Integration/Continuous Delivery (or Continuous Deployment), is a more advanced concept primarily used in software development to ensure that code changes are tested, integrated, and deployed smoothly and efficiently.
- Continuous Integration (CI):
In data analysis, CI involves automatically integrating new code or changes made to analysis scripts into a shared repository on a regular basis, such as multiple times a day. This integration process triggers automated tests to ensure that the data analysis code still functions correctly and produces accurate results. The goal is to detect and fix integration issues early, preventing them from causing problems later in the analysis process.
- Continuous Delivery (CD):
Continuous Delivery in the context of data analysis involves automating the process of deploying analysis results or artifacts to relevant stakeholders or systems. Once the analysis code passes all tests and integration checks, it is automatically deployed to a production-like environment where it can be further validated or used for decision-making. Continuous Delivery ensures that the results of data analysis are readily available, up-to-date, and reliable for decision-makers.
- Quality Assurance (QA):
QA in the context of CI/CD for data analysis involves implementing automated tests to validate the correctness, completeness, and consistency of data analysis code and results. These tests may include checks for data accuracy, performance, data quality, and adherence to analysis requirements.
By incorporating QA into the CI/CD pipeline, data analysts can identify and address issues early in the development process, improving the overall quality and reliability of the analysis.
Implementing CI/CD
CI/CD is outside the scope of this guidance, as it requires a certain level of technical expertise and experience. If you want to know more about how to set up CI/CD in GitHub with Actions, for example, you can read:
8.3.3 Choosing the right time for QA
It is the code developer’s responsibility to submit their code for quality assurance at the right time in the development process.
Planning when your code will be quality assured and allowing enough time and resources for QA are almost as crucial as the code development itself.
Generally, it is recommended that you submit your code for QA after developing a new feature or fixing a bug (if using git, before your merge into your base branch).
If you are developing a new pipeline altogether, try to plan and schedule QA as part of the development plan, to avoid a very large amount of QA needing to be done at the end of the pipeline development.
It is the code developer’s responsibility to submit their code in a state which is easy to QA. Don’t wait for multiple features or bug fixes to get it QAed. Keep the QA quick and achievable for the QAer. QAing increments is easier than QAing a whole script or process in one go, and you are more likely to get the QAer’s full attention and decrease the risk of errors being missed.
If the changes you are developing are substantial, try to divide them up into multiple self-contained parts. Breaking up code into separate sections means you have a specific request for the QAer, which reduces their time burden. If using git, either split them up into multiple pull requests, or into separate commits within the same pull request. See section 8.5 for more about how to use GitHub for QA.
Self-review your code first and ensure it runs smoothly, without errors, and passes any tests or automated checks you may have before passing it on to your QAer.
8.3.4 Choosing the right person for QA
Another key step before code is ready for QA is for the developer to find the right person for the job. There are a few things to consider and balance out when deciding who should QA code:
- Knowledge and understanding of the topic area
- Technical coding skills
- Independence
Knowledge and understanding of the topic area
For the QA to be efficient and productive, the QAer should have a good understanding of the topic area and have some knowledge of the data and typical challenges the data poses and their consequences. QAers who possess knowledge of the topic area can provide valuable insights into the context, nuances, and intricacies of the data being analysed. Their familiarity with relevant theories, methodologies, and standards enables them to assess the validity of assumptions, the appropriateness of statistical techniques, and the interpretation of results more accurately. Additionally, domain experts can identify potential pitfalls, or biases that may affect the analysis outcomes, thus contributing to the overall reliability and trustworthiness of the findings.
Technical coding skills
Alongside domain expertise, possessing strong technical skills is equally important for reviewers in data analysis. Technical proficiency enables reviewers to thoroughly evaluate the code and analysis techniques employed, ensuring that they adhere to best practices and standards in data manipulation, modelling, and programming languages. Reviewers with robust technical skills can identify coding errors, inefficiencies, or suboptimal implementations that might compromise the accuracy or efficiency of the analysis. But remember, everyone’s code needs to be QAed; seniority is no exception. It doesn’t matter how senior or skilled you are, errors can and will always creep in.
Independence
For the QA to be reliable and robust, getting the code quality assured by someone who was not involved in developing the code (independent QA) is critical for ensuring accuracy, validity, and reliability of findings. It allows for a fresh set of eyes to examine the data processing steps, algorithms, methods used, thereby reducing the likelihood of errors or biases going unnoticed. Independent QAers can challenge assumptions, verify calculations, and suggest alternative approaches, which ultimately leads to more robust analyses and trustworthy conclusions. Moreover, this process promotes transparency and reproducibility, as well as encourages best practices in data management and analysis.
Finding the right person for QA is about finding the right balance between expertise and independence. In some cases, involving more than one person in the QA process helps strike that balance.
8.4 How to QA someone else’s code?
When you are asked to QA someone else’s code, you are taking on a level of responsibility regarding the impact of the analysis. This does not take the responsibility off the code developer, however. It’s important to work with the code developer and collaborate to ensure the right level of QA is undertaken.
When you QA code, you will essentially be questioning the code you are seeing. In this section, we set out a series of questions you may want to ask yourself when QAing code.
8.4.1 What questions should I ask myself as a QAer?
8.4.1.1 Getting ready for QA
Am I the right person for QA?
- Have I previously coded in this language?
- Do I have some experience with the packages being used?
- Do I have a good understanding of the topic area and some background knowledge in the data?
- Do I have some experience with the methodology being used or with something similar?
- You don’t need to be overly familiar for verification (i.e. has the thing been done right).
- For validation (i.e. has the right thing been done) you will need to be able to reflect on the methodology.
- Am I interested and do I have capacity?
Am I ready to QA?
- Am I clear about the objectives?
- What is the purpose of this project/analysis (impact assessment, publication, PQ etc)?
- What are the expected inputs?
- What are the expected outputs?
- Do I know what part of the code I need to check?
- Do I have access to the code/repository?
- Do I know how I should record the outputs of my QA? (QA log, GitHub, etc.)
- Do I have the right permissions to get access to the data?
- Do I have the correct version of the code?
- If working on Github, make sure you are on the right branch and have pulled the latest version of the code
- Is the code sufficiently documented for me to understand it?
- Is my environment set up to run the code and step through it myself?
- If you are intending to run the code yourself, make sure that your environment is correctly setup and that any pre-requisites (packages) are installed.
- Do I have access to the correct data?
- Do you have suitable input data to check the code? This may be a sample rather than a full dataset, but you should check that you have something in the correct format.
Here are a few things to consider to help you QA code and feel confident doing so.
Don’t be afraid to ask questions. Even if you think the question is basic, it is crucial you understand the code and what it is trying to do to QA the code effectively. There are no silly questions when it comes to QA.
Don’t be afraid to challenge the approach and suggest an alternative. Even if you are QAing code which was developed by someone who is more senior than you or has more coding experience. Seniority does not mean error proof.
Ask yourself if you are the right person to do the QA.
- Be confident to tell someone if you are not the best person for this, don’t have the right skills or understanding of the topic area.
- You may be able to do one of code validation or code verification if you don’t feel you have the skill set for both:
- I may be new to the area and cannot challenge the methodology, but I am an R wizard and thus a great verifier (checking the code is right)
- I am very familiar with the topic area and can review the documentation and rationale behind the code (checking the code is doing the right thing/right approach) but I cannot comment on whether the implemented code matches the comments.
8.4.1.2 Checking the code
This section of the guidance provides some prompts and questions to think about what you should be checking and how.
Risk and proportionality
Before getting into the detail of the code, to help you understand how many checks you should perform, you can ask the following questions:
- What are the risks associated with a potential error in the code? Example: Will this affect the outcome of an impact assessment? Will this be published as Accredited Official Statistics?
- What impact do the changes or updates have on the overall process? Does this change the final numbers by a substantial amount?
- Is the amount of embedded testing (automated checks, unit tests) proportionate given the risk to the analysis if the code does not work?
- What is the extent of the changes you need to QA? Has much changed since you last QAed the code?
Best practice
By having a quick overview of the code, you can easily assess how best practices have been implemented. This will allow you to assess how easy or hard it will be to understand the code and do a QA.
- Does the code follow coding best practice? For example, names used in the code are informative and concise.
- Can I easily understand what the code does?
- Is it properly laid out and formatted?
- Is the code sufficiently documented for me to understand it?
- Is it well structured?
- Are the functions well documented? (inputs, outputs etc)
Reproducibility
One of the key advantages of choosing a code based approach for a project is to ensure reproducibility. This means the first thing to check is whether, as a QAer, you are able to reproduce outputs.
- Can I generate the same outputs that the analysis claims to produce?
- Are high level parameters kept in dedicated configuration files? Or would somebody need to work their way through the code with lots of manual edits to reconfigure for a new run?
- Have dependencies been sufficiently documented?
- Is the environment reproducible? Can I quickly get to a state where I have all the libraries, data etc to run the code as intended?
Methodology – validation
- Does the methodology make logical sense?
- How would you have written the code to achieve the objectives? Would you obtain the same results?
- Have all possible scenarios been taken into account?
Error proofing and future proofing
A key part of QAing code is not only to check whether the code works now, but also whether it will still run smoothly in the future, with a different set of parameters and circumstances. Trying to ‘break’ the code is part of QAing code.
- Have all automated QA checks and unit tests passed?
- Are there any warnings or errors?
- How does the code respond to unexpected and erroneous input data?
- How are errors handled?
- Do you get clear warnings and error messages when something is incorrect?
- How easy will it be to alter this code when requirements change? Is the code flexible enough?
Efficiency
Another advantage of coding based approaches is reducing manual intervention and having quick and efficient processes.
- Is there duplication in the code that could be simplified by refactoring into functions?
- Does the whole process run relatively quickly? Can anything be done to improve performance?
- Are functions simple, using few parameters?
- Does the code use the most efficient functions and packages?
Version control
- Is the code version controlled using Git?
- Is any sensitive data, or input/output data saved in the repository?
- Has code been committed regularly?
- Are commit messages helpful and informative?
The table below summarises common data manipulation functions and a QA tip for each.

| Common functions | QA tip |
|---|---|
| Choose specific columns from a data frame. R: dplyr::select(); SQL: select … from … | Verify that the resulting data frame contains only the selected columns and retains the same number of rows as the original data frame. |
| Subset rows based on specified conditions. R: dplyr::filter(); SQL: select … from … where | Check that the number of rows in the resulting data frame matches your expectations based on the filtering criteria applied. In R, check carefully when there are things like multiple ! (not) statements, or when == is used instead of %in%. |
| Create new columns or modify existing ones. R: dplyr::mutate(); SQL: alter table … (add, drop, rename, alter) column | Verify that the new columns are created or existing columns are modified as intended. Check a few rows to ensure the calculation or transformation is correct. In R, check that data is grouped/ungrouped as expected, or mutate can cause unexpected results! |
| Reorder rows based on column values. R: dplyr::arrange(); SQL: select … from … order by … | Confirm that the rows are sorted in the desired order. Check a few rows at the beginning and end of the data frame. In R, NA values will always appear at the end of the data frame, so check that NA is a value you are expecting. |
| Group rows based on one or more variables. R: dplyr::group_by(); SQL: select … from … group by … | Verify that subsequent summarisation or aggregation functions are applied correctly within each group. |
| Compute summary statistics within groups. R: dplyr::summarise(); SQL: select (summary function(…)) from … | Check that the summary statistics (e.g. mean, median, count) are calculated accurately for a few groups. In R, any columns not grouped or summarised on will disappear; check that these aren’t needed later on! |
| Merge data frames based on common keys. R: dplyr::left_join(), dplyr::inner_join(), dplyr::right_join(), dplyr::full_join(); SQL: select … from … (inner, left, right, full, self) join … on | For any join operation, ensure that the resulting data frame has the expected number of rows and columns. For example, after a left join, confirm that all rows from the left data frame are retained. |
| Remove duplicate rows from a data frame. R: dplyr::distinct(); SQL: select distinct … from … | This only removes exact duplicates, so check there are no near-duplicates (e.g. whitespace or capitalisation differences). |
| Change the names of variables in a data frame. R: dplyr::rename(); SQL: alter table … rename column … to … | Check that the variable names are updated as expected after using rename(). Ensure that there are no duplicate variable names. |
| Extract a single column as a vector. R: dplyr::pull(); SQL: select … from … | Confirm that the extracted vector contains the values from the specified column and matches your expectations. Check it’s done by name and not column number to prevent unexpected columns being pulled. |
| Reshape a data frame from long to wide or wide to long. R: tidyr::pivot_wider() or tidyr::pivot_longer(); SQL: select … from … pivot(… for … in …) as … / unpivot | Check that it hasn’t created a large number of unexpected NA values. Check that the right columns are included/excluded; if you miss a unique variable you can end up with a very strangely shaped data set! |
| Read data from CSV files into data frames. R: read_csv(); SQL: load data into … from files (…) | Check that the data is imported correctly, including verifying the column names, data types, and any special characters. |
| Write data frames to CSV files. R: write_csv(); SQL: export data options (…) as … | Confirm that the resulting file contains the expected information and can be read back into R without errors. Check whether it is overwriting other data produced earlier, and whether that is the intended effect. |
| Combine data frames by row. R: dplyr::bind_rows(); SQL: insert into … values … | Ensure that the dimensions of the resulting combined data frame match your expectations and that the data is combined correctly. |
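For example, a quick check after a left join might compare row counts before and after (the data frames here are hypothetical):

```r
library(dplyr)

# Hypothetical data frames
stations <- data.frame(station_id = 1:3, name = c("A", "B", "C"))
usage    <- data.frame(station_id = c(1, 2, 2), entries = c(100, 80, 90))

joined <- left_join(stations, usage, by = "station_id")

# After a left join, every row of the left-hand table should still be present;
# extra rows usually indicate duplicate keys on the right-hand side
if (nrow(joined) != nrow(stations)) {
  warning("Left join changed the number of rows: check for duplicate keys")
}
```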
Tidylog is a package that provides feedback on dplyr and tidyr operations, reporting detailed breakdowns of what each section of a pipeline is doing and how it impacts the data frame it is being applied to. This can be very helpful when QAing someone’s code.
The tidylog outputs can be displayed as a message onscreen or be saved to a file. All the functions that tidylog can provide feedback about are listed at the bottom of the documentation: tidylog package - RDocumentation.
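As a sketch of how it works, loading tidylog after dplyr makes the usual verbs report what they did (the exact messages will depend on your data):

```r
library(dplyr)
library(tidylog)   # load after dplyr so the dplyr verbs are wrapped

mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, cyl, wt)
# Console messages similar to:
# filter: removed 21 rows (66%), 11 rows remaining
# select: dropped 8 variables (disp, hp, drat, qsec, vs, ...)
```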
If you are QAing some code which contains functions but no unit tests, here are some tips on how to verify the function is doing what it should.
Test with Small Inputs: Start by testing the function with small, simple inputs that you can manually verify the correctness of. This allows you to quickly spot any obvious errors or unexpected behaviour.
Test with wrong inputs: for example, provide negative numbers when the function is supposed to only deal with positive numbers, or text instead of numbers, and see how the function reacts and deals with the wrong input.
Test with Edge Cases: Test the function with edge cases and boundary conditions to ensure it handles unusual or extreme inputs correctly. For example, test with empty vectors, single-element vectors, or the minimum and maximum allowed values.
Compare Output with Expected Results: After calling the function, compare the output with the expected results for a variety of input cases.
Visualise Results: If the function produces graphical output, visually inspect the plots or graphs to ensure they look as expected.
Check Intermediate Results: If the function produces intermediate results or modifies data structures, check these intermediate steps to ensure they are correct. You can use print() or str() to examine intermediate objects.
Debugging: Use debugging tools like browser(), debug(), or trace() to step through the function’s execution and inspect variable values at different points. This can help identify the source of errors or unexpected behaviour.
Corner Case Testing: Test the function with corner cases where multiple edge conditions overlap or interact. This helps ensure that your function handles complex scenarios correctly.
Randomised Testing: If applicable, use randomised testing techniques to test the function with a large number of randomly generated inputs. This can help uncover corner cases and edge conditions that you may not have considered.
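Putting a few of these tips together, an interactive check of a hypothetical function might look like this:

```r
# Hypothetical function under QA
calculate_growth <- function(values) {
  (values[length(values)] - values[1]) / values[1]
}

# Small input we can verify by hand: (110 - 100) / 100 = 0.1
calculate_growth(c(100, 105, 110))

# Wrong input: does the error message make the problem clear?
try(calculate_growth(c("a", "b")))

# Edge cases: a single value and an empty vector
calculate_growth(100)         # returns 0: is that the behaviour we want?
calculate_growth(numeric(0))  # returns a zero-length result rather than an error
```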
8.4.1.3 Checking the outputs
Once you have checked the code itself, you also need to check the outputs created through the process, such as charts, tables, reports etc.
Examples of things to look out for include:
- Have all the outputs been produced (all the tables, html reports etc)? Specifically, new outputs and not outputs from previous runs
- Do the charts and tables show all the values needed?
- Has the Excel file outputted properly and been saved in the right place?
- Does the table include the latest data available?
- Does the data in the Excel file match the data in the chart?
- Are the right values showing up in an Rmarkdown/html document?
- Is any automated commentary still relevant, based on the update?
8.5 Using Git and GitHub for QA
All you need to know about Git and GitHub can be found in The Big Book of Git. The section below focuses on how to best use Git and GitHub for QA purposes. Using GitHub for quality assurance offers numerous benefits that streamline the development process and enhance the overall quality of your analysis/outputs.
8.5.1 Why is GitHub good for QA?
GitHub is the best way to collaborate on a coding project. Hosting your code on GitHub means that you can easily ask for someone else to QA your code (by just tagging them!). You just need to make sure they can access the repository; they will then be able to see the code, see what you have changed/updated/added, and run the code themselves in order to carry out a QA exercise. GitHub features also mean that you can keep a good audit trail of the checks, comments, and approvals provided throughout the QA process.
8.5.2 How to use GitHub for QA?
GitHub allows you to implement good workflow processes to develop and review code, otherwise known as Gitflow. Information on branching and workflow is detailed in Chapter 4 Gitflow and improving your use of Github | The Big Book of Git.
8.5.2.1 GitHub good practice
Under Gitflow good practices, a repository would typically have a main and a dev branch. A coder would typically develop new code or fix existing code in a feature or hotfix branch and then want to merge their update or change to the dev branch.
It is best practice to narrow the scope of a feature branch. For example, you may want to add a new chart to a report and also add some extra analysis using another dataset. You should do these updates in separate branches. This is best practice in terms of version control, but it will also facilitate the QA process. The QAer can then focus on one branch and feature at a time, and will be able to understand your change more easily.
Committing your changes often and writing good commit messages also facilitates QA. By committing regularly, after small changes, you can keep a good record of the steps you took to update your code and it will help the QAer understand the reason for each change you made. A commit message is used to describe the type of change you are committing. The information in commit messages can be as important as the comments in the code itself to help with QA.
Keep it short, informative and standardised in structure. A commit message has one purpose: to answer the question “why on earth did you make this change?”. One or two sentences is plenty, and there’s no need to paraphrase the code you wrote. Aim for the why, not the how.
Example of a bad commit message
Fixed some stuff
Example of a good commit message
Add user authentication feature
- Implemented user login and registration functionality
- Used bcrypt for password hashing
- Added authentication middleware for protected routes
- Updated README.md with setup instructions
8.5.2.2 Pull requests
In order to merge their new feature to the dev branch, the developer would create a pull request, and assign a reviewer. This is when the opportunity for QA appears.
Pull requests are a very user-friendly way to QA code, because it tracks and flags exactly what lines of code have been changed. You can also look at changes commit by commit, rather than looking at all the changes in that branch. That is why committing regularly is important, as a developer.
Within a pull request you are also able, as a QAer, to add comments on lines of code (without editing the code directly) and keep a record of the conversation you are having with the code developer.
When assigning a QAer (a reviewer) in your pull request, it’s best practice to only assign one person at a time. If you assign more than one person, there might be some confusion about who QAs what, and some code might slip through the cracks. All you need to know about pull requests can be found in Chapter 3 Introduction to Github | The Big Book of Git.
8.5.2.3 Protecting your repository
You can protect your repository by adding rules and policies to branches. For example, you can add a rule which means that a branch cannot be merged into main or dev without being approved through a pull request. This means that a code developer will always have to get their code QAed before merging. Find more information about how to manage a branch protection rule.
8.6 Other resources
- The Aqua Book: guidance on producing quality analysis
- Government Analysis Function guidance
- Quality assurance of code for analysis and research (Duck Book)
- QA month recordings and material - Coming soon!
- Session 1: Principles of QA
- Session 2: QA Practices at Google
- Session 3: QA at MoJ
- Session 4: Code QA Guidance launch
- Session 5: Using testthat in R
- Session 6: Quiak! doing QA quick
- Session 7: QA of a fast indicator publication
- Session 8: Bus data QA - no recording due to sensitive data
- Session 9: Using AI for QA
- Session 10: Continuous Quality with Google
- Session 11: Github, with the NHS Strategy Unit