Chapter 2 The R basics
2.1 R family
A few of the common R relations are
- R is the programming language. It first appeared in 1993 and is based on S, honest.
- RStudio is a useful integrated development environment (IDE) that makes it cleaner to write, run and organise R code.
- .Rproj is the file extension for an R project file. It is essentially a working directory marker: it shortens file paths and keeps everything relevant to your project together and easy to reference.
- packages are collections of functions written to make specific tasks easier, e.g. the {stringr} package contains functions to work with strings. Packages are often referred to as libraries in other programming languages.
- .R is the file extension for a basic R script, in which anything not commented out with # is code that is run.
- .Rmd is the file extension for R Markdown, an R package useful for producing reports. A .Rmd script differs from a .R script in that the default is text rather than code. Code is placed in code chunks, similar to how a Jupyter Notebook looks.
2.2 DfT R/RStudio - subject to change
Which version of R/RStudio should I use at DfT? A good question. Currently the ‘best’ version of R available on the network is linked to RStudio version 11453. This can be accessed via the Citrix app on the Windows 10 devices, or via Citrix desktop. The local version of RStudio on the Windows 10 devices is currently unusable (user testing is ongoing to change this). There is also a 11423 version of RStudio available, which uses slightly older versions of packages.
2.3 RStudio IDE
The RStudio integrated development environment has some very useful features which make writing and organising code a lot easier. It’s divided into 3 panes:
2.3.1 Left (bottom left if you have scripts open)
- This is the Console. It shows the code that has been run and its output.
2.3.2 Top right; Environment, and other tabs
- Environment tab shows what objects have been created in the global environment in the current session.
- Connections tab will show any connections you have set up this session, for example, to an SQL server.
2.3.3 Bottom right
- Files tab shows what directory you are in and the files there.
- Plots tab shows all the plot outputs created this session; you can navigate through them.
- Packages tab shows a list of installed packages. If the box in front of the package name is checked, the package has been loaded this session.
- Help tab can be used to search for help on a topic or package function. It also holds the output of any ?function_name help command that has been run in the console. Again, you can navigate through help topics using the left and right arrows.
- Viewer tab can be used to view local web content.
For some pictures have a look at DfE’s R Training Course getting started with rstudio
Or Matt Dray’s Beginner R Featuring Pokemon: the RStudio interface
2.4 Projects
Why you should work in an R project, how to set up and project happiness. See this section of Beginner R Featuring Pokemon by Matt Dray.
2.5 R memory
R works in RAM, so the amount of data it can hold is limited by the amount of RAM you have. However, this should be sufficient for most tasks. More information is in the Memory chapter of Advanced R by Hadley Wickham.
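As a rough illustration of how much RAM an object takes up, the sketch below uses the base function object.size(). The vector x and its length are arbitrary choices made for this example, and the reported size is an estimate rather than an exact measurement.

```r
# Allocate one million doubles; each double takes 8 bytes.
x <- numeric(1e6)

# object.size() estimates the memory used by a single object.
size_bytes <- as.numeric(object.size(x))
print(object.size(x), units = "MB")

# Removing the object lets R reclaim the memory at the next garbage collection.
rm(x)
invisible(gc())
```

Checking the size of large objects in this way can help decide whether to remove intermediate results during a long script.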
2.6 A note on rounding
For rounding numerical values we have the base function round(x, digits = 0). This rounds the value of the first argument to the specified number of decimal places (default 0).
round(c(-1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5))
## [1] -2 0 0 2 2 4 4
Note that 1.5 and 2.5 both round to 2, which is probably not what you were expecting. This behaviour is generally referred to as ‘round half to even’. The round() documentation (?round) explains all:
Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE 754’) is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2. However, this is dependent on OS services and on representation error (since e.g. 0.15 is not represented exactly, the rounding rule applies to the represented number and not to the printed number, and so round(0.15, 1) could be either 0.1 or 0.2).
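The representation error mentioned in the documentation can be seen directly. The following is a minimal sketch using only base R; the variable names are made up for this example, and the exact digits printed by sprintf() may vary slightly by platform.

```r
# Print the stored values with 20 decimal places.
stored_015 <- sprintf("%.20f", 0.15)  # not exactly 0.15
stored_05  <- sprintf("%.20f", 0.5)   # 0.5 is stored exactly
stored_015
stored_05

# The same effect makes exact comparison of doubles unreliable.
sum_is_exact <- (0.1 + 0.2) == 0.3

# all.equal() compares with a small tolerance instead.
sum_is_close <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```

Because 0.15 is stored as a number very slightly different from 0.15, it is not a true tie, which is why the result of rounding it to one decimal place depends on the stored value.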
To implement what we consider normal rounding, we can use the round_half_up function from the {janitor} package:
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
janitor::round_half_up(c(-1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5))
## [1] -2 -1 1 2 3 4 5
If we do not have access to the package (or do not want to depend on it) then we can implement the function ourselves (see Stack Overflow):
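round_half_up_v2 <- function(x, digits = 0) {
  posneg <- sign(x)          # remember the sign of each value
  z <- abs(x) * 10 ^ digits  # shift the target digit to the units place
  z <- z + 0.5               # push halves up to the next whole number
  z <- trunc(z)              # drop the fractional part
  z <- z / 10 ^ digits       # shift back to the original scale
  z * posneg                 # restore the sign
}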
round_half_up_v2(c(-1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5))
## [1] -2 -1 1 2 3 4 5
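The same approach also works for a non-zero digits argument. The sketch below is a condensed restatement of the idea under a hypothetical name, round_half_up_sketch, and uses input values chosen because they are stored exactly in binary, so each one is a genuine tie.

```r
# Condensed version of the round-half-up idea (hypothetical helper name).
round_half_up_sketch <- function(x, digits = 0) {
  scale <- 10^digits
  # Work on absolute values, add 0.5, truncate, then restore scale and sign.
  sign(x) * trunc(abs(x) * scale + 0.5) / scale
}

# 0.125 and 0.375 are exact in binary, so these are true ties at 2 decimals.
rounded <- round_half_up_sketch(c(0.125, -0.125, 0.375), digits = 2)
rounded
```

Values that cannot be stored exactly may still fall just below a tie and round down, so this simple version is best treated as an approximation of the behaviour.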
2.7 Assignment operators <- vs =
To assign or to equal? These are not always the same thing. In R, to assign a value to a variable it is advised to use <- rather than =. The latter is generally used for setting parameters inside functions, e.g., my_string <- stringr::str_match(string = "abc", pattern = "a"). More on assignment operators here.
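One practical difference shows up inside a function call. In the sketch below (the function assignment_demo and its variable names are invented for this illustration), = only names an argument, whereas <- creates a variable as a side effect.

```r
assignment_demo <- function() {
  # '=' inside a call matches the argument named x; no variable is created.
  mean_with_equals <- mean(x = c(2, 4, 6))
  x_after_equals <- exists("x", envir = environment(), inherits = FALSE)

  # '<-' inside a call assigns to x in this function, then passes the value.
  mean_with_arrow <- mean(x <- c(2, 4, 6))
  x_after_arrow <- exists("x", envir = environment(), inherits = FALSE)

  list(
    mean_with_equals = mean_with_equals,
    mean_with_arrow = mean_with_arrow,
    x_after_equals = x_after_equals,
    x_after_arrow = x_after_arrow
  )
}

demo <- assignment_demo()
demo
```

Both calls return the same mean, but only the second leaves a variable called x behind, which is one reason to keep <- for assignment and = for arguments.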
2.8 Arithmetic operators
- addition
1 + 2
## [1] 3
- subtraction
5 - 4
## [1] 1
- multiplication
2 * 2
## [1] 4
- division
3 / 2
## [1] 1.5
- exponent
3 ^ 2
## [1] 9
- modulus (remainder on division)
14 %% 6
## [1] 2
- integer division
50 %/% 8
## [1] 6
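Integer division and modulus are often used together. As a small sketch, with a made-up value of 135 minutes, the two operators split a duration into whole hours and leftover minutes:

```r
total_minutes <- 135

hours <- total_minutes %/% 60   # whole hours
minutes <- total_minutes %% 60  # minutes left over

# The two parts recombine to the original value.
recombined <- hours * 60 + minutes

c(hours = hours, minutes = minutes)
```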
2.9 Relational operators
- less than
3.14 < 3.142
## [1] TRUE
- greater than
3.14159 > 3
## [1] TRUE
- less than or equal to
3 <= 3.14
## [1] TRUE
3.14 <= 3.14
## [1] TRUE
- greater than or equal to
3 >= 3.14
## [1] FALSE
3.14 >= 3.14
## [1] TRUE
- equal to
3 == 3.14159
## [1] FALSE
- not equal to
3 != 3.14159
## [1] TRUE
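Relational operators work element by element on vectors, which makes them useful for filtering and counting. The sketch below uses an invented vector of speeds and an arbitrary threshold of 30.

```r
speeds <- c(28, 35, 42, 30)

# Comparing a vector with a single value gives one logical per element.
over_limit <- speeds > 30
over_limit

# A logical vector can be used to keep only the matching elements.
fast_speeds <- speeds[over_limit]

# TRUE counts as 1 when summed, so this counts the matches.
n_over <- sum(over_limit)
```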
2.10 Logical operators
Logical operations are possible only for numeric, logical or complex types. Note that 0 (or the complex version 0 + 0i) is equivalent to FALSE, and all other numbers (numeric or complex) are equivalent to TRUE.
- not !
x <- c(TRUE, 0, FALSE, -4)
!x
## [1] FALSE TRUE TRUE FALSE
- element-wise and &
y <- c(3.14, FALSE, TRUE, 0)
x & y
## [1] TRUE FALSE FALSE FALSE
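- first element and && (from R 4.3.0, && and || raise an error instead of a warning when an operand has length greater than one)
x && y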
## Warning in x && y: 'length(x) = 4 > 1' in coercion to 'logical(1)'
## Warning in x && y: 'length(x) = 4 > 1' in coercion to 'logical(1)'
## [1] TRUE
- element-wise or |
x | y
## [1] TRUE FALSE TRUE TRUE
- first element or ||
z <- c(0, FALSE, 8)
y || z
## Warning in y || z: 'length(x) = 4 > 1' in coercion to 'logical(1)'
## [1] TRUE
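Since && and || are intended for single values, a common pattern is to reduce a vector to one logical with all() or any() first. The sketch below uses invented names (scores, maybe_missing) to show this, along with the short-circuit behaviour of ||.

```r
scores <- c(4, 7, 9)

# Reduce element-wise comparisons to a single TRUE or FALSE.
all_positive <- all(scores > 0)
any_above_eight <- any(scores > 8)

# Each side is now length one, so && is safe to use in a condition.
scores_ok <- length(scores) > 0 && all_positive

# || stops as soon as the left side is TRUE, so the right side,
# which would not give a single logical for NULL, is never evaluated.
maybe_missing <- NULL
empty_or_small <- is.null(maybe_missing) || maybe_missing < 5
```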
2.11 Vectors
2.11.1 Types
There are four main atomic vector types that you are likely to come across when using R¹: logical (TRUE or FALSE), double (3.142), integer (2L) and character ("Awesome").
v1 <- TRUE
typeof(v1)
## [1] "logical"
v1 <- FALSE
typeof(v1)
## [1] "logical"
v2 <- 1.5
typeof(v2)
## [1] "double"
v2 <- 1
typeof(v2)
## [1] "double"
# integer values must be followed by an L to be stored as integers
v3 <- 2
typeof(v3)
## [1] "double"
v3 <- 2L
typeof(v3)
## [1] "integer"
v4 <- "Awesome"
typeof(v4)
## [1] "character"
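The distinction between integer and double also affects the results of arithmetic. The following sketch collects a few cases into a named vector (the names are only labels for this example):

```r
types <- c(
  sequence = typeof(1:3),        # the colon operator gives integers
  product  = typeof(2L * 3L),    # integer times integer stays integer
  quotient = typeof(2L / 1L),    # division always gives a double
  mixed    = typeof(2L + 0.5)    # integer plus double gives a double
)
types

# Both integers and doubles count as numeric.
both_numeric <- is.numeric(2L) && is.numeric(2)
```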
As well as the atomic vector types, you will often encounter two other vector types: Date and factor. There are some notes on them here, and this book also contains fuller sections on both:
- Chapter 5 Working with dates and times
- Chapter 6 Working with factors
Factor vectors are used to represent categorical data. They are actually integer vectors with two additional attributes, levels and class. At this stage it is not worth worrying too much about what attributes are, but it is sufficient to understand that, for factors, the levels attribute gives the possible categories and, combined with the integer values, works much like a lookup table. The class attribute is just “factor”.
ratings <- factor(c("good", "bad", "bad", "amazing"))
typeof(ratings)
## [1] "integer"
attributes(ratings)
## $levels
## [1] "amazing" "bad" "good"
##
## $class
## [1] "factor"
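The lookup-table idea can be made concrete by pulling the integer codes and the levels apart and recombining them. This is a small sketch using base R only; codes, lookup and recovered are names chosen for the illustration.

```r
ratings <- factor(c("good", "bad", "bad", "amazing"))

# The underlying integer codes, one per element.
codes <- as.integer(ratings)

# The levels act as the lookup table.
lookup <- levels(ratings)

# Indexing the levels by the codes recovers the original labels.
recovered <- lookup[codes]
recovered
```

Here the levels are in alphabetical order, so "amazing" is code 1, "bad" is code 2 and "good" is code 3.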
Date vectors are just vectors of type double with an additional class attribute set as “Date”.
DfT_birthday <- lubridate::as_date("1919-08-14")
typeof(DfT_birthday)
## [1] "double"
attributes(DfT_birthday)
## $class
## [1] "Date"
If we remove the class using unclass() we can reveal the value of the double, which is the number of days since “1970-01-01”². Since DfT’s birthday is before this date, the double is negative.
unclass(DfT_birthday)
## [1] -18403
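The same relationship can be checked in the other direction with base R alone. This sketch assumes nothing beyond as.Date(); the variable names are made up for the example.

```r
# The epoch itself is stored as zero days.
epoch <- as.Date("1970-01-01")
epoch_value <- unclass(epoch)

# Counting 18403 days back from the epoch recovers the original date.
rebuilt_date <- as.Date(-18403, origin = "1970-01-01")
format(rebuilt_date)

# Subtracting two dates gives the gap between them in days.
days_between <- as.numeric(epoch - rebuilt_date)
```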
2.11.2 Conversion between atomic vector types
Converting between the atomic vector types is done using the as.character, as.integer, as.logical and as.double functions.
value <- 1.5
as.integer(value)
## [1] 1
as.character(value)
## [1] "1.5"
as.logical(value)
## [1] TRUE
Where it is not possible to convert a value, you will get a warning message and the result will be NA:
value <- "z"
as.integer(value)
## Warning: NAs introduced by coercion
## [1] NA
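When converting a whole vector, it can be useful to find which elements failed. The sketch below uses an invented vector raw_values and silences the coercion warning deliberately; note that suppressWarnings() hides every warning from the wrapped call, so it is best used narrowly.

```r
raw_values <- c("12", "7", "z")

# Convert, deliberately hiding the "NAs introduced by coercion" warning.
converted <- suppressWarnings(as.integer(raw_values))
converted

# Elements that became NA are the ones that could not be converted.
failed <- raw_values[is.na(converted)]
failed
```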
When combining different vector types, coercion will obey the following hierarchy: character, double, integer, logical. The result takes the type of whichever element comes first in that list.
typeof(c(9.9, 3L, "pop", TRUE))
## [1] "character"
typeof(c(9.9, 3L, TRUE))
## [1] "double"
typeof(c(3L, TRUE))
## [1] "integer"
typeof(TRUE)
## [1] "logical"
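This coercion is what allows logical vectors to be counted and averaged. A minimal sketch, using an invented vector passed:

```r
passed <- c(TRUE, FALSE, TRUE, TRUE)

# Logicals are coerced to numbers (TRUE is 1, FALSE is 0) in arithmetic.
n_passed <- sum(passed)        # number of TRUE values
share_passed <- mean(passed)   # proportion of TRUE values

# Mixing in a character value turns everything into character.
mixed <- c(1.5, TRUE, "a")
mixed
```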
¹ Technically there are more, see https://adv-r.hadley.nz/vectors-chap.html#atomic-vectors
² A special date known as the Unix Epoch.