Chapter 2 The R basics

2.1 R family

A few of the common members of the R family are:

  • R is the programming language, first released in 1993 and based on S, honest.
  • RStudio is a useful integrated development environment (IDE) that makes it cleaner to write, run and organise R code.
  • .Rproj is the file extension for an R project file. It is essentially a working directory marker: it shortens file paths and keeps everything relevant to your project together and easy to reference.
  • packages are collections of functions written to make specific tasks easier, e.g. the {stringr} package contains functions to work with strings. Packages are often referred to as libraries in other programming languages.
  • .R is the file extension for a basic R script in which anything not commented out with # is code that is run.
  • .Rmd is the file extension for R Markdown, a format (supported by the {rmarkdown} package) that is useful for producing reports. A .Rmd script differs from a .R script in that the default is text rather than code. Code is placed in code chunks, similar to how a Jupyter Notebook looks.

2.2 DfT R/RStudio - subject to change

Which version of R/RStudio should I use at DfT? A good question. Currently the ‘best’ version of R we have available on network is linked to RStudio version 11453. This can be accessed via the Citrix app on the Windows 10 devices, or via Citrix desktop. The local version of RStudio on the Windows 10 devices is currently unusable (user testing is ongoing to change this). There is also a 11423 version of RStudio available which uses slightly older versions of packages.

2.3 RStudio IDE

The RStudio integrated development environment has some very useful features which make writing and organising code a lot easier. It is divided into 3 panes:

2.3.1 Left (bottom left if you have scripts open)

  • This is the Console. It shows the code that has been run and its output.

2.3.2 Top right; Environment, and other tabs

  • Environment tab shows what objects have been created in the global environment in the current session.
  • Connections tab will show any connections you have set up this session, for example, to an SQL server.

2.3.3 Bottom right

  • Files tab shows what directory you are in and the files there.
  • Plots tab shows all the plot outputs created this session. You can navigate through them using the left and right arrows.
  • Packages tab shows a list of installed packages. If the box in front of a package name is checked then that package has been loaded this session.
  • Help tab can be used to search for help on a topic or package function. It also holds the output of any ?function_name help command run in the console. Again, you can navigate through help topics using the left and right arrows.
  • Viewer tab can be used to view local web content.

For some pictures have a look at DfE’s R Training Course getting started with rstudio

Or Matt Dray’s Beginner R Featuring Pokemon: the RStudio interface

2.3.4 Other handy buttons in RStudio IDE

  • top left new script icon; blank page with green circle and white cross.
  • top right project icon; 3D transparent light blue cube with R. Use this to create and open projects.

2.3.5 RStudio menu bar a few pointers

  • View contains Zoom In and Zoom Out.
  • Tools -> Global Options contains many useful setting tabs such as Appearance where you can change the RStudio theme, and Code -> Display where you can set a margin vertical guideline (default is 80 characters).

2.4 Projects

For why you should work in an R project, how to set one up, and project happiness, see this section of Beginner R Featuring Pokemon by Matt Dray.

2.4.1 Folders

When you set up a project it is good practice to include separate folders for different types of files, such as:

  • data; for the data your R code is using
  • output; for files created by your code
  • R; all your code files, e.g. .R, .Rmd
  • images
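
As a minimal sketch, the folders above could be created from the console with base R. The helper name create_project_folders and the throwaway directory used in the example are assumptions for illustration only; in a real project you would point the helper at your project root.

```r
# Hypothetical helper: create the standard sub-folders under a project root.
create_project_folders <- function(root = ".",
                                   folders = c("data", "output", "R", "images")) {
  paths <- file.path(root, folders)
  for (p in paths) {
    # showWarnings = FALSE keeps this quiet if the folder already exists
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Example: try it out in a throwaway directory
demo_root <- file.path(tempdir(), "my_project")
created <- create_project_folders(demo_root)
list.files(demo_root)
```

Building paths with file.path() rather than pasting strings keeps the code independent of the operating system's path separator.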

2.4.2 sessionInfo()

Include a saved file of the sessionInfo() output. This command prints the R version along with the versions of all the packages currently loaded. This information is essential when passing on code, as packages can be updated and code-breaking changes made.
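
One way to save the output, sketched with base R only. The file name session_info.txt and the use of a temporary directory are assumptions for the example; in practice you would write the file somewhere inside your project.

```r
# Capture the printed sessionInfo() output and save it as a plain text file.
# "session_info.txt" is an illustrative name, not a required convention.
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# Check the first few lines of what was written
head(readLines(info_file), 3)
```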

2.5 R memory

R holds the objects it works with in RAM, so the amount of data it can handle is limited by the RAM you have available. However, this should be sufficient for most tasks. More info in the Memory chapter of Advanced R by Hadley Wickham.
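
To get a feel for how much memory an object takes up, base R provides object.size(). The snippet below is a small sketch; the vector length is arbitrary.

```r
# A double vector uses 8 bytes per element plus a small header
x <- numeric(1e6)
size_bytes <- object.size(x)
format(size_bytes, units = "MB")

# Remove the object when it is no longer needed; gc() triggers a garbage
# collection (R also does this automatically when it needs to)
rm(x)
invisible(gc())
```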

2.6 A note on rounding

For rounding numerical values we have the base function round(x, digits = 0). This rounds the value of the first argument to the specified number of decimal places (default 0).

round(c(-1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5))
## [1] -2  0  0  2  2  4  4

For example, note that 1.5 and 2.5 both round to 2, which is probably not what you were expecting. This is generally referred to as ‘round half to even’. The round() documentation (?round) explains all:

Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE 754’) is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2. However, this is dependent on OS services and on representation error (since e.g. 0.15 is not represented exactly, the rounding rule applies to the represented number and not to the printed number, and so round(0.15, 1) could be either 0.1 or 0.2).

To implement what we consider normal rounding we can use the round_half_up function from the {janitor} package:

library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
janitor::round_half_up(c(-1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5))
## [1] -2 -1  1  2  3  4  5

If we do not have access to the package (or do not want to depend on it) then we can implement the same behaviour ourselves (see Stack Overflow):

round_half_up_v2 <- function(x, digits = 0) {
  posneg <- sign(x)
  z <- abs(x) * 10 ^ digits
  z <- z + 0.5
  z <- trunc(z)
  z <- z / 10 ^ digits
  z * posneg
}

round_half_up_v2(c(-1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5))
## [1] -2 -1  1  2  3  4  5

2.7 Assignment operators <- vs =

To assign or to equal? These are not always the same thing. In R to assign a value to a variable it is advised to use <- rather than =. The latter is generally used for setting parameters inside functions, e.g., my_string <- stringr::str_match(string = "abc", pattern = "a"). More on assignment operators here.
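
A small sketch of the difference when the two operators are used inside a function call. The wrapper function demo_assignment is hypothetical and only exists so the example does not touch your global environment.

```r
# Hypothetical wrapper so the demonstration runs in its own environment
demo_assignment <- function() {
  # `=` inside a call matches the argument named x; no variable x is created
  m1 <- mean(x = c(1, 2, 3))
  after_equals <- exists("x", inherits = FALSE)

  # `<-` inside a call creates x in the calling environment as a side effect
  m2 <- mean(x <- c(1, 2, 3))
  after_arrow <- exists("x", inherits = FALSE)

  c(after_equals = after_equals, after_arrow = after_arrow)
}

result <- demo_assignment()
result
```

Here after_equals is FALSE and after_arrow is TRUE, which is why = is kept for naming arguments and <- for assignment.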

2.8 Arithmetic operators

  • addition
1 + 2
## [1] 3
  • subtraction
5 - 4
## [1] 1
  • multiplication
2 * 2
## [1] 4
  • division
3 / 2
## [1] 1.5
  • exponent
3 ^ 2
## [1] 9
  • modulus (remainder on division)
14 %% 6
## [1] 2
  • integer division
50 %/% 8
## [1] 6
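
Integer division and modulus fit together: the quotient times the divisor plus the remainder gives back the original number. The sketch below checks this and also shows that the operators are vectorised. The variable names are arbitrary.

```r
x <- 14
y <- 6
quotient  <- x %/% y   # integer division
remainder <- x %% y    # modulus

# Recombining the two parts returns the original value
quotient * y + remainder == x

# With a negative left operand the remainder takes the sign of the divisor
(-7) %/% 3
(-7) %% 3

# Arithmetic operators work element by element on vectors
doubled <- c(1, 2, 3) * 2
doubled
```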

2.9 Relational operators

  • less than
3.14 < 3.142
## [1] TRUE
  • greater than
3.14159 > 3
## [1] TRUE
  • less than or equal to
3 <= 3.14
## [1] TRUE
3.14 <= 3.14
## [1] TRUE
  • greater than or equal to
3 >= 3.14
## [1] FALSE
3.14 >= 3.14
## [1] TRUE
  • equal to
3 == 3.14159
## [1] FALSE
  • not equal to
3 != 3.14159
## [1] TRUE
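
A word of caution when using == on doubles: because of representation error, two values that look equal may not be exactly equal. A hedged sketch of the usual workaround, all.equal(), which compares within a small tolerance:

```r
# Exact comparison fails because 0.1 + 0.2 is not stored as exactly 0.3
exact <- 0.1 + 0.2 == 0.3
exact

# all.equal() allows for a small tolerance; wrap it in isTRUE() to get
# a single TRUE or FALSE
approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))
approx

# Relational operators are vectorised too
above_four <- c(1, 5, 10) > 4
above_four
```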

2.10 Logical operators

Logical operations are possible only for numeric, logical or complex types. Note that 0 (or complex version 0 + 0i) is equivalent to FALSE, and all other numbers (numeric or complex) are equivalent to TRUE.

  • not !
x <- c(TRUE, 0, FALSE, -4)
!x
## [1] FALSE  TRUE  TRUE FALSE
  • element-wise and &
y <- c(3.14, FALSE, TRUE, 0)
x & y
## [1]  TRUE FALSE FALSE FALSE
  • first element and &&
x && y
## Warning in x && y: 'length(x) = 4 > 1' in coercion to 'logical(1)'

## Warning in x && y: 'length(x) = 4 > 1' in coercion to 'logical(1)'
## [1] TRUE
  • element-wise or |
x | y
## [1]  TRUE FALSE  TRUE  TRUE
  • first element or ||
z <- c(0, FALSE, 8)
y || z
## Warning in y || z: 'length(x) = 4 > 1' in coercion to 'logical(1)'
## [1] TRUE
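
The warnings above are a hint that && and || are meant for single values, for example inside if (). As far as we are aware, from R 4.3.0 onwards passing a vector longer than one to && or || is an error rather than a warning. A minimal sketch of safer alternatives, reusing the x and y vectors from above:

```r
x <- c(TRUE, 0, FALSE, -4)
y <- c(3.14, FALSE, TRUE, 0)

# Be explicit about which elements are compared
first_both <- x[1] && y[1]
first_both

# To reduce a whole vector to one TRUE/FALSE use all() or any()
all_both <- all(x & y)
any_both <- any(x & y)

# || short-circuits: the right-hand side is not evaluated when the left is TRUE
short_circuit <- TRUE || stop("never evaluated")
```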

2.11 Vectors

2.11.1 Types

There are four main atomic vector types that you are likely to come across when using R (see note 1): logical (TRUE or FALSE), double (3.142), integer (2L) and character ("Awesome").

v1 <- TRUE
typeof(v1)
## [1] "logical"
v1 <- FALSE
typeof(v1)
## [1] "logical"
v2 <- 1.5
typeof(v2)
## [1] "double"
v2 <- 1
typeof(v2)
## [1] "double"
# integer values must be followed by an L to be stored as integers
v3 <- 2
typeof(v3)
## [1] "double"
v3 <- 2L
typeof(v3)
## [1] "integer"
v4 <- "Awesome"
typeof(v4)
## [1] "character"
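
As well as typeof(), base R has is.* functions for testing a type. A short sketch showing that an integer and a double can be equal in value but are not identical objects:

```r
is.integer(2L)   # integer stored with the L suffix
is.double(2)     # plain numbers are doubles
is.numeric(2L)   # both integers and doubles count as numeric

# == compares values, identical() also compares the type
same_value  <- 2L == 2
same_object <- identical(2L, 2)

# The colon operator creates integer sequences
seq_type <- typeof(1:3)
```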

As well as the atomic vector types you will often encounter two other vector types: Date and factor. There are some notes on them here, and this book also contains fuller sections on both.

Factor vectors are used to represent categorical data. They are actually integer vectors with two additional attributes, levels and class. At this stage it is not worth worrying too much about what attributes are, but it is sufficient to understand that, for factors, the levels attribute gives the possible categories, and combined with the integer values works much like a lookup table. The class attribute is just “factor”.

ratings <- factor(c("good", "bad", "bad", "amazing"))
typeof(ratings)
## [1] "integer"
attributes(ratings)
## $levels
## [1] "amazing" "bad"     "good"   
## 
## $class
## [1] "factor"
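
To make the lookup-table idea concrete, the sketch below pulls out the underlying integer codes and uses them to index the levels. The variable name codes is just for illustration.

```r
ratings <- factor(c("good", "bad", "bad", "amazing"))

# The underlying integers: each one is a position in levels(ratings)
codes <- as.integer(ratings)
codes

# Looking the codes up in the levels recovers the original labels
recovered <- levels(ratings)[codes]
recovered
```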

Date vectors are just vectors of type double with an additional class attribute set as “Date”.

DfT_birthday <- lubridate::as_date("1919-08-14")

typeof(DfT_birthday)
## [1] "double"
attributes(DfT_birthday)
## $class
## [1] "Date"

If we remove the class using unclass() we can reveal the value of the double, which is the number of days since “1970-01-01” (see note 2). Since DfT’s birthday is before this date, the double is negative.

unclass(DfT_birthday)
## [1] -18403
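
The same relationship can be checked in both directions. This sketch uses base as.Date() rather than lubridate so that it runs without extra packages; that swap is our choice for the example, not a recommendation.

```r
# The epoch itself is stored as zero
epoch <- as.Date("1970-01-01")
unclass(epoch)

# Subtracting dates gives the gap in days
birthday <- as.Date("1919-08-14")
days_before_epoch <- as.numeric(epoch - birthday)
days_before_epoch

# Going the other way: turn a day count back into a Date
rebuilt <- as.Date(-18403, origin = "1970-01-01")
rebuilt
```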

2.11.2 Conversion between atomic vector types

Converting between the atomic vector types is done using the as.character, as.integer, as.logical and as.double functions.

value <- 1.5
as.integer(value)
## [1] 1
as.character(value)
## [1] "1.5"
as.logical(value)
## [1] TRUE

Where it is not possible to convert a value, the result is NA and you will get a warning message:

value <- "z"
as.integer(value)
## Warning: NAs introduced by coercion
## [1] NA
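
When converting a whole vector it can be useful to find out which values failed rather than just seeing the warning. A minimal sketch, with made-up input values:

```r
raw_values <- c("1", "2", "z")

# Suppress the warning here because we check for failures explicitly below
converted <- suppressWarnings(as.integer(raw_values))

# A failure is an NA in the result that was not already NA in the input
failed <- is.na(converted) & !is.na(raw_values)
raw_values[failed]
```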

When combining different vector types, coercion will obey the following hierarchy: character, double, integer, logical.

typeof(c(9.9, 3L, "pop", TRUE))
## [1] "character"
typeof(c(9.9, 3L, TRUE))
## [1] "double"
typeof(c(3L, TRUE))
## [1] "integer"
typeof(TRUE)
## [1] "logical"
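
Two practical consequences of this hierarchy, sketched with made-up values: logicals can be summed because they are coerced to numbers, and a single character value turns a whole vector into character.

```r
# TRUE is coerced to 1 and FALSE to 0, so sum() counts the TRUE values
flags <- c(TRUE, FALSE, TRUE, TRUE)
n_true <- sum(flags)
prop_true <- mean(flags)

# One character element makes the whole vector character
mixed <- c(1, "2", 3)
mixed_type <- typeof(mixed)

# Convert back explicitly before doing arithmetic
fixed <- as.double(mixed) + 1
fixed
```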

  1. technically there are more, see https://adv-r.hadley.nz/vectors-chap.html#atomic-vectors

  2. a special date known as the Unix Epoch