Chapter 1 Introduction


Welcome to the Google Cloud Platform (GCP) to R Playbook. This playbook is designed to guide you through the process of working with Google BigQuery and integrating it with R, particularly in a Cloud R environment. Whether you are analysing large datasets, building Shiny apps, or simply looking to streamline how you access and prepare data, this guide provides practical steps, best practices, and real-world advice. It covers how to move data from Google Cloud Storage (GCS) to BigQuery, query and manage data using SQL, and bring the data into R in a safe, memory-efficient way. You will also find troubleshooting tips and techniques for using BigQuery effectively in your workflows. This playbook is for analysts, data scientists and R users who want to make the most of cloud-based tools in their data projects.

While much of the playbook focuses on connecting with and using R, it also includes guidance on interacting with other parts of the GCP, such as Cloud Storage. Whether uploading raw data, managing cloud buckets, or preparing files for BigQuery, this playbook offers practical tips for using GCS and related tools within the department’s infrastructure.

How to use the Playbook

Searching the Playbook: To search specific key words within the Playbook, you can use the ‘Search’ function by clicking on the magnifying glass located at the top left of the page.

1.1 What does this playbook cover?

  • Get started with:
    • Google Cloud Platform
    • Google Cloud Storage
    • Google BigQuery
  • Connect RStudio to:
    • Cloud Storage
    • BigQuery
  • Use SQL to manipulate data in BigQuery
  • Use R and SQL to read and write files to and from BigQuery
  • Learn about best practices when connecting RStudio to the Google Cloud

1.2 Presumed knowledge

To get the most out of this playbook, you should have a working knowledge of R. Some experience of SQL is helpful but not essential.

In addition, you must have an R Workstation and a Google Cloud Platform project.

1.3 Latest updates

Latest changes to this document will be clearly signposted in this section. Throughout the playbook, any new content is flagged by a yellow banner: New!

Updates to this document will also be communicated on the GCP Channel MS Teams area.


September 2025 updates
  • Request “Requester Access” for existing GCP projects. This is now done by submitting a request via your IT Focal Point; you can see the full list of access packages for that project and GCP group in the Access packages section on my.access: Chapter 2.2.1 An existing project

  • Using the {gargle} package to connect R to BigQuery without needing to use a Service Account key: Chapter 6.4.1

  • SAMPLE CODE - Reading multiple files from a bucket into R using the {purrr} package: Chapter 3.4.5.3

  • SAMPLE CODE - Uploading multiple files to BigQuery using the {purrr} package: Chapter 4.6.3.1

  • Top tips for bringing data from BigQuery into a Shiny application: Chapter 4.7.1 Bringing data into R Shiny

Links have been updated to reflect the October 2025 updates.


October 2025 updates

Written instructions on how to connect from R without needing a Service Account (JSON) key.


November and December 2025 updates


January and February 2026 updates


1.4 GCP Project Transition

You may have heard that we are changing the infrastructure of GCP (Google Cloud Platform) projects at DfT. All existing GCP projects will be migrated to a new setup designed to enhance cybersecurity across the department by replacing outdated authentication and login methods.

This new setup is called Project Factory — an Infrastructure-as-Code (IaC) solution that utilises Terraform to automatically build GCP projects and provision the necessary tools and permissions within minutes.

1.4.1 What is changing?

1. Service Account (JSON) keys will no longer be allowed

Using a Service Account key (JSON file) to connect from R to Google Cloud Storage (GCS) or BigQuery will no longer be an option once your project moves into the new infrastructure.

In this playbook, we have included:

  • The Old method (using a Service Account key) – valid only for existing projects before migration.
  • The New method, which employs the gargle R package for connection and authentication – this is the long-term approach for all projects (before and after migration).
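As a taste of the new method (covered in full in Chapter 6.4.1), the sketch below shows browser-based authentication with the {bigrquery} package, which uses {gargle} under the hood. This is illustrative only: the email address, project name and dataset name are placeholders, and it assumes {bigrquery} and {DBI} are installed.

```r
# Illustrative sketch: authenticating to BigQuery without a Service Account key.
# {bigrquery} uses {gargle} internally, so bq_auth() opens a browser-based
# Google sign-in (or reuses a cached token).
library(bigrquery)

# Authenticate interactively as yourself - no JSON key file involved.
# The email argument is optional and shown here with a placeholder address.
bq_auth(email = "your.name@dft.gov.uk")

# Once authenticated, connect to a project and dataset as usual.
# "my-gcp-project" and "my_dataset" are placeholder names.
con <- DBI::dbConnect(
  bigrquery::bigquery(),
  project = "my-gcp-project",
  dataset = "my_dataset"
)
```

Because the token is cached locally, subsequent sessions can usually re-authenticate without opening the browser again.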

2. Creating GCS buckets manually (“ClickOps”) will no longer be supported

In the new infrastructure, GCS buckets will not be created via the GCP console by navigating through menus. Instead, buckets will be generated using Terraform through a new automated process called Bucket Factory.

In this playbook, we have included:

  • The Old method of creating a bucket manually – again, only relevant before migration

For those who have transitioned or are transitioning to the new infrastructure, speak to the Data Engineering team about creating buckets in your GCP projects.

3. Permissions

Permissions will be managed through Microsoft Access Packages. The presentations and FAQ documents provide further details on this. If you feel that you and your team require specific permissions in your existing GCP projects, please email the Data Engineering team for further clarification. There will be more communication and documentation on how permissions will work in the new infrastructure during the transition period.

1.4.2 Do I need to learn Terraform?

No, you will not need to write or run Terraform yourself. Project Factory handles this for you. For example, you can request a GCS bucket, and it will be created automatically. You will still be able to view the Terraform code in your GCP project if you wish to do so.

1.4.3 Resources

We have delivered two presentations on this project transition, covering the transition timelines. The recordings, PowerPoint slides and an FAQ document are linked below:

1.4.3.1 High level

Slides to accompany the recording: Post 3-org and what it means for analysts-high level.ppx

1.4.3.2 Technical deep-dive

Slides to accompany the recording: Post 3-org and what it means for analysts-deep dive.ppx

1.4.3.3 FAQ Document

GCP project transition.docx

Unless otherwise specified, everything in this playbook should be followed as applying to the new infrastructure.

1.4.3.4 Data Engineering Handbook: GCP project transition

Chapter 7 - GCP project transition

1.5 We want to hear from you!

  • do you find this new format useful?
  • do you have suggestions for new content for the GCP to R Playbook?
  • do you have concerns about the clarity of some elements of this guidance?
  • do you have suggestions to improve this document?

We encourage you to provide feedback and comment on this new product. Let us know your thoughts by filling in a feedback form.

You can also join the GCP Channel to ask questions, share knowledge and get updates on GCP-related news in the department.

1.6 Working with the Data Engineering team

The Data Engineering (DE) Team, led by Francesca Bryden, is part of the Analysis Directorate (AD) and provides various services to support data management and digital transformation within the Department for Transport (DfT). Some of these services, relevant for this playbook, include:

  • supporting complex Google Cloud Platform (GCP) queries, including permissions, access and data structure
  • building automated data pipelines within GCP for data cleaning, processing and storage

Contact the DE team: data.engineering@dft.gov.uk

Read more details about how to work with DE in our handbook.

1.7 Key terminology

Google Cloud Platform (GCP) is a comprehensive suite of cloud computing services developed by Google, designed to help businesses and researchers build, deploy, and manage applications and data at scale. It provides a flexible, secure, and high-performance infrastructure that supports a variety of workloads, including data processing, artificial intelligence, and cloud storage. With GCP, users can leverage Google’s powerful infrastructure to streamline operations and drive innovation.

In this book, we will explore two essential GCP Services:

Google Cloud Storage (GCS) is a highly scalable and durable object storage service that allows users to store and access data of any type, from simple text files to massive datasets. It ensures data security, redundancy, and easy integration with other GCP services, making it a reliable solution for archiving, backup, and data sharing across distributed teams.

Google BigQuery is Google’s fully managed, serverless data warehouse that enables ultra-fast SQL queries on large datasets. It eliminates the need for infrastructure management and offers built-in machine learning capabilities, making it an excellent choice for big data analytics. With seamless integration with tools like R and Python, analysts can efficiently explore and visualise data, uncovering valuable insights to support decision-making.

GCP is organised as a set of projects within the Department for Transport. A project is the basic container for all activities you perform in GCP. This means your GCS buckets, your BigQuery datasets and tables, and any tools or permissions you use are all created within a specific project. If you use GCS or BigQuery, you are already using a project — even if you didn’t create it yourself.

1.8 Why move data into GCP?

Storing data in GCP instead of SharePoint, G:/V: drives, or directly in your coding environment (like Cloud R) has a number of advantages. GCP tools like Google Cloud Storage (GCS) and BigQuery are built for working with large datasets and are better suited to analysis. Here are some key reasons to make the switch:

Built for data, not just storage

SharePoint and shared drives are designed mainly for storing documents and sharing files. They are not ideal for working with big or complex datasets. GCP, on the other hand, is built for data analysis:

  • it is faster and more reliable when handling large amounts of data
  • it integrates well with tools like R and BigQuery
  • it helps you keep things organised and easier to manage


Better performance and efficiency

When you load large datasets straight into R or Python, it can quickly use up your computer’s memory, which slows things down or causes errors. With GCP, you can:

  • pull in just the bit of data you need
  • avoid using too much memory in the Cloud R environment, reducing server crashes
  • make your code faster and more reliable
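For example, rather than downloading an entire table into R, you can push the filtering to BigQuery and download only the query result. The sketch below illustrates this with {bigrquery}; the project, dataset, table and column names are all made up for illustration.

```r
# Illustrative sketch: pull in just the slice of data you need.
# Project, dataset, table and column names below are placeholders.
library(bigrquery)

sql <- "
  SELECT route_id, passenger_count
  FROM `my-gcp-project.my_dataset.bus_journeys`
  WHERE journey_date >= '2025-01-01'
"

# BigQuery does the filtering; only the (much smaller) query result
# is downloaded into R's memory, not the full table.
result <- bq_project_query("my-gcp-project", sql)
df <- bq_table_download(result)
```

Selecting only the columns and rows you need in SQL keeps memory use in R low and makes your code faster and more reliable.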


Easier file management and better security

Using GCP makes data handling simpler:

  • no need to download and upload files manually
  • avoids making lots of copies of the same file
  • gives you better control over who can see or edit the data


Works better with Shiny apps

At the moment, Shiny apps hosted on our platforms must get their data from GCP or from within the R environment. While you can load data directly into the app itself, this isn’t good practice because:

  • it makes the app bigger and slower
  • it is harder to update or maintain
  • it will not scale well if more people use it

Using GCP helps keep your Shiny apps quicker, cleaner, and easier to manage in the long run.
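As a minimal illustration of this pattern, the sketch below queries BigQuery when a Shiny session starts, rather than bundling a data file with the app. All project, dataset and table names are hypothetical, and it assumes authentication (for example, the {gargle} approach described above) is already set up.

```r
# Illustrative sketch: a Shiny app that fetches its data from BigQuery
# at session start instead of shipping a data file with the app.
# "my-gcp-project", "my_dataset" and "trips" are placeholder names.
library(shiny)
library(bigrquery)

ui <- fluidPage(
  tableOutput("summary")
)

server <- function(input, output, session) {
  # Query only the aggregated data the app actually needs, once per session.
  data <- bq_table_download(
    bq_project_query(
      "my-gcp-project",
      "SELECT region, COUNT(*) AS n
       FROM `my-gcp-project.my_dataset.trips`
       GROUP BY region"
    )
  )
  output$summary <- renderTable(data)
}

shinyApp(ui, server)
```

Aggregating in BigQuery before the data reaches the app keeps the app small and responsive, and means the data can be updated without redeploying the app.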