Chapter 2 Introduction to Git

2.1 Session aims

how Git works
the main Git commands
cloning a repository
creating and checking out branches
adding and committing files
pushing and pulling to the remote repository

Definitions you’ll need in this chapter

Git: Git is a version control system that helps you manage changes to your code or files. It tracks modifications, keeps a history of changes, and allows collaboration among multiple people working on the same project.
GitHub: GitHub is a platform that hosts Git repositories in a remote location on the internet.
Repository: A repository (or “repo”) is a place where you can store and manage your code or any other files. It is a folder that holds all the files relevant to your project, along with the project’s version history and related information. There are two types of repository: local repositories, which are personal to you and stored on your own computer, and remote repositories, which are hosted on GitHub and shared with others.

2.2 How git works

Fundamentally, git works by storing your files inside an individual repository. While a repository looks like just a folder that contains your code files, it actually performs a number of functions all at once:

Storage of files

A repository contains all the files and folders that make up your project, from source code, to text documents, images, and more.

Version history

Git continuously tracks changes made to the files within the repository. Every time you make a modification, git creates a snapshot of the entire project at that moment. Known as “commits”, these store the state of your project at different points in time, allowing you to revisit or revert to earlier versions as required.

Metadata and tracking

On top of this, git stores metadata, such as who made specific changes, when they were made, and why. This helps you understand the progress made throughout the project and facilitates collaboration among team members.

All of this is done through a set of simple git commands, and the version history is stored alongside the files. In this way, git tracks the changes you make locally; to share the changes with others you would make use of GitHub as well.
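As a rough illustration of the snapshot and metadata ideas above, the sketch below builds a throwaway repository, makes two commits and prints the history. The temporary folder, the file name notes.txt and the user details are all made up for the example.

```shell
# Create a throwaway repository in a temporary folder (hypothetical sandbox).
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Placeholder identity, stored only in this repository's own configuration.
git config user.name "Example User"
git config user.email "example.user@example.com"

# First snapshot of the project.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

# Second snapshot, after a change.
echo "second draft" > notes.txt
git add notes.txt
git commit -q -m "Revise notes after review"

# Show who made each commit, when, and why (newest first).
git log --format="%h | %an | %ad | %s"
```

Each line of the log is one commit: a short identifier, the author, the date and the commit message.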

2.2.1 Branches

Git also allows you to use branches to swap between different versions of your code. A branch is effectively a separate line of development. When you want to add a new feature or fix a bug, no matter how big or small, you create a new branch for your changes. This makes it harder for unstable code to get merged into the main code base, and gives you the chance to clean up your work before merging it into the main branch. At the point you create a new branch, it is a duplicate of the branch you started from; it then diverges from that branch as you make new changes, and eventually you merge it back into the branch it came from.

Importantly, branches are for new features and content, not individual people! Think about them like individual folders inside a shared drive; when producing a new chart you’d work inside a folder (or branch) called “charts” and not one called “Dave’s code”.

(And don’t worry if branches seem a bit abstract right now; they’ll make a lot more sense in the next chapter!)

Command line basics

If you would like to make use of a command line tool (git bash or the R terminal), try out the basics in this mini-exercise:

Open Git Bash, type “ls” and press enter: What do you think this command does?
Now type “pwd”
Now type “cd” followed by the name of a folder (e.g. g). What has changed in the terminal, and what do you think the command has done?
Try typing “ls” again and see what is being displayed now
Try using the up and down arrows – what do you think this is doing?

And the answers to this exercise:

The command ‘ls’ is used to “list” contents of the current working directory.
The command ‘pwd’ is used to print the present working directory.
cd is short for “change directory”. It is invoked with a directory name appended, and changes the terminal session’s current working directory to the directory specified. The text at the start of your command line will show that the working directory has changed. You can use “cd ..” to go up a level.
As your working directory has changed, ls will now show a list of files in the new working directory.
Up and down arrows can be used to navigate through the last commands you provided, press enter to run any of these again.
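If you’d like a safe place to try these commands, the sketch below sets up a small practice folder first. The folder name projects and the file name report.txt are invented for the example.

```shell
# Build a small practice folder in a temporary location (names are made up).
practice="$(mktemp -d)"
mkdir "$practice/projects"
touch "$practice/projects/report.txt"

cd "$practice"   # move into the practice folder
pwd              # print the folder you are currently in
ls               # list its contents: shows the projects folder
cd projects      # move down into the projects folder
ls               # now lists report.txt instead
cd ..            # move back up one level
```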

2.2.2 The git workflow

Using git in practice follows a few basic steps. This section talks you through the purpose of each step, as well as the RStudio point-and-click process required to carry it out. Each section also has a drop-down box showing the git command line equivalent; click these if you’d like to see the git commands happening behind the scenes (or use them for yourself!).

2.3 Cloning

The first time you work with a git repository which has already been set up, you need to clone the repository from GitHub into your local Git workspace. This makes a copy of the remote repository locally, copying down all of the files and the version control history.

There are two important things to note about cloning:

You only do it once! Cloning a repository only happens the first time you use it. After that, you make use of the pull command to bring new changes down from the remote repository to your local one.
You always do it to a personal file location! Never clone a repository to a shared drive; many of the features will not work properly, and you’ll also re-introduce the risk of overwriting other people’s changes when you collaborate. With GitHub, the remote repository is the shared version, and your local repository goes in a personal location such as your home directory or C drive.

Depending on whether you have set up an SSH key or a Personal Access Token (PAT), the method for cloning a repository will differ. SSH keys provide a secure, key-based authentication method, while PATs act as passwords for HTTPS connections. If you’ve configured an SSH key, you can clone using the repository’s SSH URL. If you’re using a PAT, you’ll need to clone via the HTTPS URL and provide your token during authentication.

How to clone a repository with a PAT:

Full instructions on the CRAN Wiki.

How to clone a repository using an SSH key in Cloud R:

2.3.1 How to clone

Using an SSH Key

Step 1: Get the SSH URL of the Repository

Go to the GitHub page for the repository you want to clone. You can find the URL on the repository’s main page, typically under the green “Code” button
Choose the SSH tab
Copy the SSH URL, e.g.: git@github.com:your-username/your-repo.git

Step 2: Open RStudio

Go to RStudio
Go to the menu: File > New Project > Version Control > Git

Step 3: Paste the SSH Repository URL

You will see a pop-up window which looks like this:

In each field:

Repository URL: Paste the SSH URL
Project directory name: Will be auto-filled so you can leave it blank
Create project as subdirectory of: Browse to select a location where the repository will be cloned. In Cloud R, your home directory (~) is a suitable choice

Step 4: Create Project

Click Create project! Git will clone the repository from GitHub and open the project, so you’re ready to begin working

After this step, all the remaining stages take place in the Git window panel in the top right of your screen.

Command line equivalent

What’s happening in the command line when you do this

Navigate to the correct working directory using the “cd” command followed by the location
Use the command “git clone” followed by the repository URL to clone the repository. Again, you may need to enter your authentication details at this point.
Navigate into the new git repository that’s been set up using the cd command followed by the name of the repository
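The three steps above can be sketched as follows. So that the example runs without a GitHub account, it assumes a local bare repository standing in for the remote; with a real project you would pass the SSH or HTTPS URL copied from GitHub to “git clone” instead. The folder names (seed, remote.git, home, my-project) are hypothetical.

```shell
# Stand-in for GitHub: a bare repository in a temporary folder.
# With a real project you would use the URL copied from GitHub instead.
sandbox="$(mktemp -d)"
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git config user.name "Example User"
git config user.email "example.user@example.com"
echo "hello" > text_file.txt
git add text_file.txt
git commit -q -m "Add text file"
git clone -q --bare "$sandbox/seed" "$sandbox/remote.git"

# 1. Navigate to the folder that should contain the clone.
mkdir "$sandbox/home"
cd "$sandbox/home"

# 2. Clone: copies down the files and the full version history.
git clone -q "$sandbox/remote.git" my-project

# 3. Move into the new repository and look at its history.
cd my-project
git log --oneline
```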

2.3.2 Exercise

Time allowed: 10 minutes

Clone a copy of the git-and-github-training repository; the URL you need for this is: git@github.com:department-for-transport/git-and-github-training.git

2.4 Branching

As mentioned earlier, when making changes to code you will likely want to create a new branch to make those changes in. You can then make those changes safely when collaborating with other people, and reduce the risk of making changes which conflict with someone else’s, or overwrite important code.

Creating a branch makes a new, clean duplicate of the code in the branch you are branching from (often the default branch), and allows you to make new edits to that code in a separate working environment.

2.4.1 How to branch

For branching, you’ll need to use the two branching buttons circled in red below.

Select the branch you want to branch from first; this will usually be “dev”. You can do this by clicking the arrow next to the name of the current branch (for example “main”) and selecting the branch you want from the list. If the name showing is already the one you want to branch from, you don’t need to do anything at this step.
Now click the purple boxes next to this. A pop-up will appear, and you can type the name of your branch into here.

Good practice for naming branches:

use descriptive and concise names that reflect the purpose of the branch (e.g. feature-new-dashboard or bugfix-login-error)
avoid using spaces or special characters; use hyphens or underscores instead
follow a consistent naming convention agreed upon by your team (e.g. feature/ or bugfix/ prefixes)
keep branch names lowercase for simplicity and compatibility

Click “Create” to produce the branch. A pop-up will appear at this point, letting you know that the branch has been set up, linked to the remote repository, and you have automatically been moved over to that branch.

You are all set to start making changes!

Command line equivalent

What’s happening in the command line when you do this

When inside the git repository, type “git branch” to see what branches are available in your repository. There will be a star (*) at the start of the name of the one that is currently active.
If you need to change which branch you’re currently on, use “git checkout” followed by the name of the branch you want to move to.
Run “git branch” again to check you’ve swapped to the correct branch.
Now, to create a new branch, use “git branch” followed by the name of the new branch, e.g. “git branch feature/new_content”. Git prints nothing when this succeeds; run “git branch” again to see the new branch in the list.
To move to the new branch, use the “git checkout” command again.
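Put together, the branching commands above look something like the sketch below. It runs in a throwaway repository with a made-up file and user so that it is safe to try; in a real project you would start from inside your cloned repository.

```shell
# Throwaway repository with one commit, so there is something to branch from.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example.user@example.com"
echo "start" > text_file.txt
git add text_file.txt
git commit -q -m "Add text file"

git branch                            # list branches; * marks the active one
git branch feature/new_content        # create the new branch
git checkout -q feature/new_content   # move onto the new branch
git branch                            # confirm the * has moved
```

As a shortcut, “git checkout -b feature/new_content” creates the branch and moves you onto it in a single step.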

2.4.2 Exercise

Time allowed: 5 minutes

Create a branch. For now just call the branch your name e.g. john-smith (although this is bad practice in real life!)

2.5 Making a change

This bit happens as normal! You can now open and edit code files, text files and so on, just as you usually would. The important thing to remember is that using git doesn’t replace any of the normal coding process: you still open and edit files in the same way, and save them once you’ve finished editing.

2.6 Staging

Staging is the most complicated step for most people to wrap their heads around!

Essentially at this point, you have files which contain a variety of changes. These will show up in the git window in the right hand side of your screen looking like this:

Each line in this window represents a file with changes that have been made but are not yet staged for a commit. To include these changes in your next commit, you need to add them to the staging area. The staging area works like an online shopping cart: the changes are like items you have viewed online, but they will not move to the next step until you add them to the cart by staging them. You can choose to stage all files at once or only a specific subset.

2.6.1 How to stage

On the left-hand side of each file is a column titled Staged. By default, all checkboxes in this column are unchecked.
To stage a file, click the checkbox next to it. When you do this, the Status boxes will move horizontally and may change their color or symbol. These symbols indicate the file’s current state:

Symbol	Colour	Meaning	Description
M	Blue	Modified	A file that contains changes.
?	Yellow	Untracked	A new, moved or renamed file not yet tracked by Git.
A	Green	Added	A brand new file that has been staged.
R	Purple	Renamed or moved	A file that has been renamed or moved.
D	Red	Deleted	A file that has been deleted.

If you are not sure if you’d like to stage a file or not, click the “Diff” button. This will show you a line-by-line breakdown of the changes in each file, so you can understand more about what has changed.

That’s it! You’re now ready to move on to the most important stage; committing.

Command line equivalent

What’s happening in the command line when you do this

When inside the git repository, use the “git status” command to see what files are currently staged or unstaged.
Use the “git add” command to stage files. Use git add followed by the name of the file (e.g. “git add new_file.R”) to stage a single file, or “git add .” to stage all files at once.
Use “git status” again to see which files are now staged or unstaged.
To check the line-by-line changes, use the “git diff” command for changes you have not yet staged, or “git diff --staged” for changes you have already staged. This can print out a lot of content, so use the ENTER key to scroll through it. To leave this view, press “q”.
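A minimal sketch of the staging commands above, run in a throwaway repository. The file names follow the examples in this chapter, but the contents and user details are invented.

```shell
# Throwaway repository with one committed file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example.user@example.com"
echo "original text" > text_file.txt
git add text_file.txt
git commit -q -m "Add text file"

# Make two changes: edit an existing file and create a new one.
echo "a new line" >> text_file.txt
echo "x <- 1" > new_file.R

git status --short    # text_file.txt is modified, new_file.R is untracked (??)
git diff              # line-by-line view of changes not yet staged
git add new_file.R    # stage a single file
git add .             # stage everything that is left
git status --short    # both files now appear as staged
git diff --staged     # line-by-line view of what the next commit will contain
```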

2.6.2 Exercise

Time allowed: 5 minutes

Open the file called text_file.txt and make a change to it. You can also make a change to one or more of the code files in this repository if you’d like. Save these changes.
Find and stage all the changed files.

2.7 Committing

Committing is probably the most important step of the git process. This step saves a snapshot of your staged files to your repository version history as a commit. This saves the changes that were made, alongside the time they were made, who made the change, and gives you the opportunity to add a commit message. This is a free text box that allows you to explain the purpose of the commit, so when you look back at the history of the repo you can see why you made this change.

2.7.1 How to commit

Above your staged files in the Git window, click the “Commit” button.
This opens the commit window. In here, you can see the files that have been staged and the individual changes in each of them.

Add a commit message in the box in the top right of the commit window. This should be a short but sufficiently detailed message which explains in a few words the purpose of the changes you’re committing.

Example: “Fix typo in course introduction section.” This message is concise but descriptive, highlighting the purpose of the change (fixing a typo) and specifying the context (course introduction section).

Click Commit! A window will pop up to let you know it’s happened successfully; you can close both this and the commit window.
When you close the commit window, you’ll notice that above the staging window there’s an information message which now says “Your branch is ahead of 'origin/branch_name' by 1 commit”. This message tells you that there are committed changes in your local repository which you have not yet shared with your remote repository.

You are now ready to share your changes with others!

Command line equivalent

What’s happening in the command line when you do this

Use “git status” to see which files are staged ready for commit.
The base command to commit your staged changes is “git commit”. However… Warning! Using git commit on its own opens a text editor (often Vim by default) for you to write your commit message. Vim is notoriously difficult to use, so I recommend you avoid it!
Instead, use “git commit -m "your message here"” to provide your commit and your message at the same time. Git does not enforce a length limit on the message, but keep it to a single short line; a common convention is a summary of around 50 characters.
Use “git status” again to check your changes have been committed.
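The commit steps above can be sketched like this, again in a throwaway repository with a made-up file and user.

```shell
# Throwaway repository with one committed file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example.user@example.com"
echo "Welcome to the coarse" > text_file.txt
git add text_file.txt
git commit -q -m "Add course introduction"

# Make a change and stage it.
echo "Welcome to the course" > text_file.txt
git add text_file.txt

git status --short    # shows the staged file, ready to commit
git commit -q -m "Fix typo in course introduction section"
git status --short    # prints nothing: everything has been committed
git log --oneline -1  # the newest commit and its message
```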

2.7.2 Exercise

Time allowed: 5 minutes

Commit the staged files with a clear and sensible commit message.

2.8 Pushing

At this point, your changes are committed locally. But if you’re using GitHub to collaborate and share code, or as a backup to your local code, there’s one final step.

Pushing code describes sharing the files and version history stored in your local repository up to your remote repository. As well as allowing for collaboration and backups, GitHub also provides a much easier way to view the history of commits (which we’ll cover more in the next chapter!).

2.8.1 How to push

Once you have one or more committed changes, hit the green “push” arrow. Depending on how you’ve authenticated, you may need to provide your username (email address) and password (your PAT) at this point.
A pop-up window will appear confirming that your push has been successful.

That’s it! Your changes will now appear on the remote repository.

Command line equivalent

What’s happening in the command line when you do this

Once you have your changes committed, use the “git push” command to push those changes up. As above, you may need to authenticate at this stage.
If this is the first time you are pushing on this branch, you may instead need to run “git push --set-upstream origin YOUR-BRANCH-NAME”, which both creates a remote branch equivalent to your local branch and pushes the content up. This is done for you automatically if you are using the RStudio point-and-click interface.
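Here is a sketch of both kinds of push. To keep it runnable without a GitHub account, it assumes a local bare repository standing in for the remote, so no authentication is needed; the folder, file and branch names are hypothetical.

```shell
# Stand-in for GitHub: a bare repository, plus a local clone of it.
sandbox="$(mktemp -d)"
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git config user.name "Example User"
git config user.email "example.user@example.com"
echo "hello" > text_file.txt
git add text_file.txt
git commit -q -m "Add text file"
git clone -q --bare "$sandbox/seed" "$sandbox/remote.git"
git clone -q "$sandbox/remote.git" "$sandbox/local"
cd "$sandbox/local"
git config user.name "Example User"
git config user.email "example.user@example.com"

# Commit a change on a new branch.
git checkout -q -b feature/new_content
echo "more text" >> text_file.txt
git add text_file.txt
git commit -q -m "Add more text to text file"

# First push on a new branch: create the remote branch and link to it.
git push -q --set-upstream origin feature/new_content

# Later pushes on the same branch only need "git push".
echo "even more text" >> text_file.txt
git add text_file.txt
git commit -q -m "Extend text file again"
git push -q
```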

CRAN Shorts - How to commit and push changes with the terminal

2.8.2 Exercise

Time allowed: 5 minutes

Push your changes up to the remote repository.

HINT: Check out one of our CRAN Short videos on how to commit and push changes.

2.9 Pulling

Pulling represents both the end of one git cycle and the start of the next one. As the name suggests, it’s the opposite of pushing, and it allows you to take changes other people have made in the remote repository, and bring them into your local repository.

As mentioned above, when you’re using a repository after the first time, you will pull the changes down rather than cloning the repository in the steps above.

2.9.1 How to pull

Press the blue arrow to pull down changes from the remote repository.
A pop-up window will appear confirming that your pull has been successful, and you will also see that the content of your local files changes directly.

You can pull down at any time, but you should do so at least once a day when you’re coding, or when you know someone else has made a change.

Command line equivalent

What’s happening in the command line when you do this

You can replicate the process above using the “git pull” command.
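To see a pull doing something, you need a change that someone else has pushed. The sketch below fakes this with two clones of a local stand-in remote: one plays a colleague, the other is yours. All folder names and file contents are invented for the example.

```shell
# Stand-in for GitHub, with two clones: yours and a colleague's.
sandbox="$(mktemp -d)"
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git config user.name "Example User"
git config user.email "example.user@example.com"
echo "hello" > text_file.txt
git add text_file.txt
git commit -q -m "Add text file"
git clone -q --bare "$sandbox/seed" "$sandbox/remote.git"
git clone -q "$sandbox/remote.git" "$sandbox/mine"
git clone -q "$sandbox/remote.git" "$sandbox/colleague"

# The colleague makes a change and pushes it to the remote.
cd "$sandbox/colleague"
git config user.name "A Colleague"
git config user.email "a.colleague@example.com"
echo "a change from a colleague" >> text_file.txt
git add text_file.txt
git commit -q -m "Add a line to text file"
git push -q

# You pull the change down into your own local repository.
cd "$sandbox/mine"
git pull -q
cat text_file.txt     # your copy now includes the colleague's line
```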

2.9.2 Exercise

Time allowed: 15 minutes

You’re now going to practice collaborating with someone else on GitHub! This will include pulling down someone else’s changes, making a change to their work, and sharing it with them.

If you’re working through this book on your own, please only complete the first step of this exercise:

Pull down to get the latest changes made to the remote repository.
Use the dropdown branch list to identify the branch of the person you have been paired with. Click on it to move to their branch.
Note how your files have now changed to their files; you should only have their changes in your files now.
Add a change to one of their files.
Stage the change you have made.
Commit the changes to the git repo.
Push it back up to GitHub.

You have successfully collaborated on a branch created by somebody else!
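For reference, the same collaboration can be sketched on the command line. This assumes a local stand-in remote with two clones, one playing your partner and one playing you; the branch name john-smith follows the earlier exercise, and everything else is made up.

```shell
# Stand-in for GitHub, with two clones: yours and your partner's.
sandbox="$(mktemp -d)"
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git config user.name "Example User"
git config user.email "example.user@example.com"
echo "hello" > text_file.txt
git add text_file.txt
git commit -q -m "Add text file"
git clone -q --bare "$sandbox/seed" "$sandbox/remote.git"
git clone -q "$sandbox/remote.git" "$sandbox/mine"
git clone -q "$sandbox/remote.git" "$sandbox/partner"

# Your partner creates a branch, commits a change and pushes it.
cd "$sandbox/partner"
git config user.name "John Smith"
git config user.email "john.smith@example.com"
git checkout -q -b john-smith
echo "John's change" >> text_file.txt
git add text_file.txt
git commit -q -m "Add John's change to text file"
git push -q --set-upstream origin john-smith

# You pull, move to their branch, add a change of your own and push it.
cd "$sandbox/mine"
git config user.name "Example User"
git config user.email "example.user@example.com"
git pull -q                   # also downloads any new remote branches
git checkout -q john-smith    # local copy of their branch, linked to the remote
echo "My change" >> text_file.txt
git add text_file.txt
git commit -q -m "Add my change to John's text file"
git push -q
```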

2.10 Best practices

Git is a powerful tool for version control, but adopting the right workflow can help you manage your code efficiently, especially when working in teams or on larger projects. Below are some best practices to follow when using Git, whether through point-and-click interfaces, or by using Git commands in the terminal.

Commit early, commit often

make small, frequent commits that represent logical chunks of work. This makes it easier to track changes and roll back to earlier states if needed
avoid making huge commits that include multiple unrelated changes. Instead focus each commit on a specific task or bug fix
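One way to keep commits focused, sketched below, is to stage and commit unrelated changes separately even if you made them at the same time. The file names chart.R and README.txt and their contents are invented for the example.

```shell
# Throwaway repository with two committed files.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example.user@example.com"
echo "plot(x, y)" > chart.R
echo "My project" > README.txt
git add .
git commit -q -m "Add chart script and README"

# Two unrelated edits made in the same sitting.
echo "title('Sales by year')" >> chart.R
echo "Run chart.R to produce the chart." >> README.txt

# Commit them one at a time, each with its own message.
git add chart.R
git commit -q -m "Add title to sales chart"
git add README.txt
git commit -q -m "Document how to produce the chart"

git log --oneline     # three small commits, each with a single purpose
```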

Write descriptive commit messages

a good commit message is clear and descriptive. It should explain the “why” behind changes, not just the “what”
Examples:
- “Fix bug in user authentication logic”
- “Add unit tests for data processing functions”
- “Refactor code for better readability”

Use branches for new features or bug fixes

always create a new branch when working on a feature or bug fix. This keeps your main branch clean and free of unfinished work
name branches descriptively to indicate their purpose (e.g., feature/user-authentication or bugfix/login)

Pull changes frequently

before starting your work, always pull the latest changes from the remote repository to ensure you’re working with the most recent code
pull regularly while working to avoid conflicts later. It’s easier to resolve smaller conflicts incrementally than all at once at the end

Avoid large push conflicts

avoid working on the same files as others. Ensure you are regularly pulling changes to stay up to date with the latest updates in the remote repository
use feature branches, and push small frequent commits to isolate your work and reduce the chances of pushing conflicting changes

Push changes to remote repositories regularly

push your changes to the remote repository frequently to back up your work and share your progress with collaborators
avoid pushing incomplete work, but don’t wait too long between pushes, especially when working in a team