Chapter 1 Introduction

Since reproducible analysis through coding was launched in the Civil Service in 2018, there has been extensive sharing of guidance on how to benchmark the completeness of reproducible analytical pipelines. There has also been documentation for established technical users on which features they need to add to their existing code.

However, this guidance often focuses on experienced coders and on those who will be doing the coding themselves. Within DfT, a gap in this guidance material has been highlighted. The gap is particularly noticeable for analysts who are not coding experts themselves and who often manage a coding team. These analysts can struggle to use existing guidance when making strategic prioritisation decisions. That guidance can assume a high level of technical knowledge. It can also assume that both the process to follow and the rationale behind it are self-evident.

This book aims to fill this gap. It provides information on the principles and processes of reproducible analysis without assuming that the reader is a regular coder. It aims to cover:

  • What coding tools are available to analysts within DfT, and when coding approaches and specific tools are most appropriate
  • What platform-agnostic best practice looks like when coding reproducible analysis
  • How to maintain existing code bases effectively while ensuring new code is robust and future-proof
  • When and how it is appropriate to open source code
  • How to make strategic decisions about the benefits and risks of reproducible analysis approaches