Chapter 3 R Packages

This chapter discusses best practice for using packages in your R code, and gives recommended packages for a range of applications.

3.1 What are packages?

While base R can be used to achieve some impressive results, the vast majority of coding in R is done using one or more packages. Packages (often loosely called libraries, although strictly a library is the directory in which packages are installed) are shareable units of code which are written, published and maintained by other R coders. The scope of each individual package varies massively; some perform a single function, whereas others are large and underpin many core aspects of everyday R usage. The huge variety of packages is one of the reasons that R is so successful: it allows you to benefit from the work of others by downloading their packages.

The most common way to install R packages is from the Comprehensive R Archive Network (CRAN), using an install.packages() call, e.g.

install.packages("data.table")
install.packages("dplyr")

3.2 Best practice in using packages

While it’s very easy to get started with packages, there are a few best practice tips which will make it easier to manage how you use packages in a consistent, replicable and transparent way.

3.2.1 Calling packages

To make use of the contents of a package, attach it with a library() call in your code. This needs to be done before you run any code which references that package, e.g.

## This code will produce an error
toTitleCase("hello")

library(tools)

## This code will run fine
library(tools)

toTitleCase("hello")

Your code should always call all libraries it uses at the very beginning of the code, even if the package won’t be used until much later. This allows users to know what packages they need in any piece of code.

Your code also shouldn’t include any install.packages() calls. Forcing users of your code to install packages can make significant changes to their coding environment, and may even break their other code. If you need to ensure they have the right packages installed, you should use a package manager such as renv.
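
One way to follow this advice while still giving users a clear message is to check for packages without installing them. The sketch below is illustrative only: the required_packages vector is a hypothetical list, and the script simply stops with an informative error if anything is missing.

```r
# Hypothetical list of the packages this script relies on.
required_packages <- c("tools", "stats")

# requireNamespace() checks availability without attaching or installing anything.
is_available <- vapply(
  required_packages,
  requireNamespace,
  FUN.VALUE = logical(1),
  quietly = TRUE
)
missing_packages <- required_packages[!is_available]

# Stop early with a clear message instead of installing on the user's behalf.
if (length(missing_packages) > 0) {
  stop(
    "Please install the following packages before running this script: ",
    paste(missing_packages, collapse = ", "),
    call. = FALSE
  )
}
```

This leaves the decision about how and where to install packages with the user.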

To ensure you’ve called all necessary packages in your code, it is helpful to restart R so that you have a fresh session (CTRL+SHIFT+F10 in RStudio) and then rerun your code, to check it isn’t dependent on any libraries already loaded in your environment.
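
If you want to see what is attached before restarting, a short check along these lines may help. It is a sketch that assumes the session's default packages are those listed in the defaultPackages option, plus base.

```r
# Packages that R attaches by default at startup, plus base itself.
default_packages <- c(getOption("defaultPackages"), "base")

# .packages() returns the names of all packages currently attached.
attached_packages <- .packages()

# Anything left over was attached by earlier code in this session.
extra_packages <- setdiff(attached_packages, default_packages)
print(extra_packages)
```

An empty result suggests the session is close to a fresh one; any names listed are packages your code may be relying on without calling them itself.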

3.2.2 Core package usage

The large number of R packages available means that it’s common for there to be multiple packages which perform essentially the same functions. Mixing several similar packages in the same code makes it harder for beginners to follow, and is more likely to cause bugs and other errors.

To minimise this, your code should start from a core group of common packages, and only add additional or different packages where they fulfil functions which can’t be met by the core packages. This chapter includes a number of suggestions of core packages which are well-supported and easy to use, and that you should aim to use where possible.

3.2.3 Using GitHub packages

While a large number of packages are available to download from CRAN, many more have been released on other platforms, particularly GitHub. This is usually because the authors don’t want to go through the stringent submission process which CRAN requires, or because the package is still under development.

Packages can be easily downloaded and used from GitHub using the remotes::install_github() function, and this allows you to make use of a wider code base in your work. However, GitHub packages should be used with caution:

  • They have not been through CRAN’s submission checks, so are more likely to contain problems, errors or even malicious code.
  • Bugs may not be regularly (or ever) corrected.
  • There is no guarantee the package will be maintained.

You should always use a CRAN package in preference to a GitHub alternative where one is available. You should also be cautious of downloading and using GitHub packages where the author is not known to you, and you should ensure you have a good understanding of the code in the package and what it does before installing or using it.
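
If you do need a GitHub package, one way to reduce the risk of unreviewed changes is to install a specific commit that you have already read through, using the "owner/repo@ref" form accepted by remotes::install_github(). The repository and commit below are hypothetical placeholders, and the install call is left commented out because it belongs in an interactive session rather than in shared code.

```r
# Hypothetical repository and commit; replace with ones you have reviewed.
repository <- "some-owner/somepkg"
reviewed_commit <- "0123abc"

# Build the "owner/repo@ref" specification understood by install_github().
package_spec <- paste0(repository, "@", reviewed_commit)
print(package_spec)

# Run interactively, not as part of project code:
# remotes::install_github(package_spec)
```

Pinning to a commit means a later change to the repository will not silently alter the code you are running.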

3.2.4 Using packages instead of stand-alone functions

Packages offer a number of benefits over just including stand-alone functions in your code. They increase the consistency of code usage, preventing you from using different iterations of the same function in different situations. Widely used packages are also regularly tested and checked, so you can be more confident they are performing their function as intended.

Where possible, you should aim to use a function from a common package rather than writing your own.
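
As a small illustration, the sketch below compares a hand-written helper (hypothetical, named my_file_ext here) with tools::file_ext(), which ships with R. The quick regular expression looks fine for ordinary file names, but it misbehaves when a file has no extension at all.

```r
# Hypothetical hand-written helper: strip everything up to the last dot.
my_file_ext <- function(path) {
  sub(".*\\.", "", path)
}

files <- c("results.csv", "README")

# With no dot to match, sub() returns the input unchanged,
# so "README" is wrongly reported as its own extension.
hand_written <- my_file_ext(files)

# The packaged function returns an empty string when there is no extension.
packaged <- tools::file_ext(files)

print(hand_written)
print(packaged)
```

Edge cases like this are typically where an established function has already been corrected and a stand-alone one has not.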

3.2.5 Heavy dependency packages

Sometimes when calling libraries, it is easy to accidentally create a lot of heavy dependencies for your code. These are packages which in turn are dependent on a large number of other packages, requiring you to install many packages before running the code and increasing the points of failure.

When introducing new packages into your code, it is always worth checking if the function you are calling can be sourced in a more lightweight way, e.g.

## The tidyverse is an example of a heavy dependency which is not really required just to use the pipe operator

library(tidyverse)

mtcars %>%
  head()

## If no other code from the tidyverse is required, a much better alternative is to call the package the pipe originates from (magrittr), which has few dependencies

library(magrittr)

mtcars %>%
  head()
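
To judge how heavy a package is before adopting it, you can count its recursive dependencies. The sketch below is one possible approach: it uses tools::package_dependencies() with installed.packages() as the database, so it works offline but can only inspect packages already installed on your machine. The candidates vector is an assumption for illustration, and stats is included only so the example returns something on any machine.

```r
# Use the locally installed packages as the dependency database.
installed_db <- installed.packages()

# Keep only the candidates that are actually installed here.
candidates <- intersect(
  c("tidyverse", "magrittr", "stats"),
  rownames(installed_db)
)

# Collect everything each candidate needs, directly or indirectly.
dependencies <- tools::package_dependencies(
  candidates,
  db = installed_db,
  which = c("Depends", "Imports", "LinkingTo"),
  recursive = TRUE
)

# Number of packages pulled in by each candidate, heaviest first.
dependency_counts <- lengths(dependencies)
print(sort(dependency_counts, decreasing = TRUE))
```

Note also that R 4.1.0 and later include a native pipe operator, |>, which needs no package at all.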

3.2.6 Specifying packages when using functions

As well as calling libraries at the start of your code, you can also specify which package a function comes from each time you use it, with “double colon” notation:

##Calling a function without referencing the package
str_trim(" test ")

##Calling a function including the package using "double colon" notation.
stringr::str_trim(" test ")

While this format is slightly more time-consuming to write, it has a number of advantages:

  • Easy to understand which package each function comes from, particularly useful for unusual functions
  • Greater specificity reduces the risk of function conflicts across packages
  • Individual lines of code will run without needing to make library calls
  • Easy setup of package dependencies by packages such as renv
  • Allows code to be converted to a package easily
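
The conflict point can be seen in a short sketch. Here a function named sd is defined locally purely to simulate a naming clash (it is hypothetical and not something you would normally write); the bare name picks up the local version, while the qualified call still reaches the function in stats.

```r
# Hypothetical local function that happens to share a name with stats::sd().
sd <- function(x) {
  "not the standard deviation"
}

values <- c(1, 2, 3)

# The bare name now resolves to the local definition...
bare_result <- sd(values)

# ...whereas the qualified call always reaches the function in stats.
qualified_result <- stats::sd(values)

# A qualified call also works without a prior library() call.
title <- tools::toTitleCase("hello")

print(bare_result)
print(qualified_result)
print(title)
```

The same applies when two attached packages export functions with the same name: the qualified form removes any doubt about which one is called.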

3.4 Package management

As previously mentioned, it is bad practice to include install.packages() calls in your code. However, it is difficult to ensure that code will be functional for multiple users without controlling the packages and package versions which users have.

The solution to this is to use a package manager. These add-ons to R keep a record of the exact version of each package used in the code on a per-project basis, so that the same collection of packages can be easily installed by any new user of the project. The best available package manager for R is renv, a marked improvement on packrat.

To get started with renv on any project, you just need to run:

renv::init()

This will create all of the files you need to get started with renv, and will also make note of any existing dependencies your project has. Once your renv project is initialised, you can use libraries in it as normal, installing and using them as needed.

To make a record of the package versions you have used, you run:

renv::snapshot()

And to retrieve this list of packages and install them in a new instance of the project, use:

renv::restore()
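
Put together, a collaborator's first session in a project that already uses renv might look like the sketch below. It assumes the working directory is the project root and that the lockfile has the default name renv.lock; the can_restore variable is just an illustrative guard so that nothing is attempted when those assumptions do not hold.

```r
# Only proceed if a lockfile is present and renv itself is installed.
can_restore <- file.exists("renv.lock") &&
  requireNamespace("renv", quietly = TRUE)

if (can_restore) {
  # Install the package versions recorded in renv.lock without prompting.
  renv::restore(prompt = FALSE)
  # Report any remaining differences between the library and the lockfile.
  renv::status()
} else {
  message("No renv.lock found, or renv is not installed; nothing to restore.")
}
```

Like the other renv calls, this is intended to be run interactively when setting up the project rather than included in analysis scripts.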

Further details on using renv can be found in its very comprehensive documentation.