The Big Book of Git
2024-07-30
Chapter 1 Introduction
1.2 How to use this book
This book is primarily designed to accompany the Git and GitHub training that we run at DfT, which includes:
- Introduction to Git and GitHub (chapters 2 and 3)
- Introduction to Gitflow (chapter 4)
- Technical Lead Training (chapter 7)
If you’re attending one of these courses you’ll have all of the sections explained with examples and live demonstrations of Git and GitHub, as well as time to complete the exercises.
If you’re working through this book solo, we recommend reading it in order and trying out all of the exercises as you go.
The book is also designed for people who will mostly be using Git through the RStudio interface, so the examples will mostly show how to do things using the RStudio graphical interface. However, it’s useful to understand that these buttons all correspond to underlying git commands (and that some odd people prefer using the written git commands!). Each section therefore includes an expandable box (black and marked with the Git Bash logo) talking you through the corresponding git text commands. These are designed to aid understanding of the underlying processes, but if you’d prefer to use them in Git Bash or the terminal, you’re very welcome to!
1.3 What are Git and GitHub?
Before diving into how to use Git and GitHub, it’s helpful to look at what these tools are, how you might use them, and the terminology that surrounds them. People often use terms like Git, Git Bash, and GitHub interchangeably, although as you’ll see these are all subtly different!
1.3.1 What is Git?
Git is an open source version control system, meaning:
- it is freely distributed software, with source code that anyone can inspect
- it is designed to track versions of, and changes to, a wide range of code and file types.
It is the most commonly used version control system, and is used extensively to track changes to a wide range of code files (not just R). Git tracks the changes you make to files, and is useful for doing a few different things, whether you work alone or collaborate with others:
Version tracking: allowing you to revert to specific versions of your work should you ever need to.
Auditability: keeping a record of the who, what, when and why of different code changes.
Facilitating collaboration: by default, two or more people working on the same code file can easily overwrite each other’s changes. Git allows changes by multiple people to all be merged into one source, making collaboration easier and preventing loss of work.
Git is a piece of software which is installed and used locally on each analyst’s or developer’s computer. Your files and their history are stored on your computer in something known as a local repository. You can link this with online hosts (such as GitHub) to store a copy of the files and their revision history in a remote repository that is accessible to colleagues.
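To make the idea of a local repository concrete, here is a minimal sketch using the written git commands. It assumes Git is installed and that you are in a Bash-style terminal (such as Git Bash). The folder, the file name `analysis.R`, and the name and email address are all made up for illustration.

```shell
set -e

# Hypothetical scratch folder so the example does not touch real work.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Create a local repository: the history lives in a hidden .git folder.
git init -q

# Placeholder identity, stored for this repository only.
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

# Save a first version of a file.
echo 'print("hello")' > analysis.R
git add analysis.R
git commit -q -m "Add first draft of analysis script"

# Change the file and save a second version.
echo 'print("hello again")' >> analysis.R
git commit -q -am "Extend analysis script"

# Auditability: who changed what, when and why (newest first).
git log --format='%h | %an | %ad | %s' --date=short

# Version tracking: look at the file as it was one commit ago.
git show HEAD~1:analysis.R
```

Everything here happens on your own computer; no online host is involved at any point.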
1.3.2 What is GitHub?
GitHub is a platform that hosts Git repositories in a centralised cloud location on the internet. It is not the only option available (other departments may use GitLab or DevOps), but it is one of the most popular. As well as providing a web-based graphical interface to view your repository, GitHub also provides access control and several collaboration features, such as wikis and basic task management tools for every project.
GitHub gives you a centrally located place where you can upload your changes, download changes from others, peer review code before it becomes the default code base and much more. This enables you to collaborate more easily with other developers.
You do not need GitHub to use git, but you cannot use GitHub without using git.
1.3.3 How do Git and GitHub work together?
Imagine a workflow where multiple people all want to make use of and edit the same code files, using Git and Github for version control:
In this image, every user has Git installed on their individual computer, which is where they make changes to their own repository, containing their own copy of the code. When they are happy with those changes, they can share that version-controlled code up to GitHub, which holds a cloud-based, shared version of the code. Other users can use that code, make their own changes to it, or combine different changes into the shared version.
This is similar to many other hybrid software and cloud storage based systems you may be familiar with. For example, a shared text document is something you would edit in a local version of Word on your computer. This is the software you use to make changes, similar to Git. When you are happy with those changes, you would then upload the finished document to SharePoint for other people in your team to access. This is the cloud-based storage and collaboration platform, similar to GitHub.
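The workflow described above can be sketched with the written git commands. So that it runs without a GitHub account, this sketch uses a bare repository in a temporary folder as a stand-in for the shared GitHub copy; that stand-in, the two analysts, and the file `model.R` are all assumptions made for illustration.

```shell
set -e

# Hypothetical scratch folder holding the "cloud" copy and two local copies.
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the shared repository on GitHub (assumption for this demo).
git init -q --bare shared.git

# Analyst A takes a local copy (Git warns that it is empty, which is fine),
# makes a change, and shares it.
git clone -q shared.git analyst_a
cd analyst_a
git config user.name "Analyst A"
git config user.email "a@example.com"
echo "x <- 1" > model.R
git add model.R
git commit -q -m "Add model script"
git push -q -u origin HEAD
cd ..

# Analyst B takes their own local copy, adds to the code, and shares it.
git clone -q shared.git analyst_b
cd analyst_b
git config user.name "Analyst B"
git config user.email "b@example.com"
echo "y <- 2" >> model.R
git commit -q -am "Add second variable"
git push -q origin HEAD
cd ..

# Analyst A downloads Analyst B's change into their local copy.
cd analyst_a
git pull -q --ff-only
cat model.R
```

With a real GitHub repository, the only difference would be that `git clone` is given the repository's GitHub address rather than a local folder.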
1.4 Interfaces you may use
Git is a piece of software you make use of on your computer, but there are actually many different ways you can interface with it. In the same way that you may write and run Python code using one of many programs such as PyCharm, VS Code, RStudio, etc., you can write and run git commands in many different programs. These generally fall into one of two categories: point and click, or command line.
1.4.1 Point and click interfaces
Point-and-click git interfaces do exactly what they say on the tin; they include a graphical interface that allows you to interact with git by clicking on buttons. This allows you to use all of the common git commands, and many people find it simpler than typing written git commands. Under the hood, these interfaces still run the written git commands, and there are some git features you cannot access through them.
RStudio Git graphical interface: RStudio has a built-in point-and-click graphical interface for Git which appears in the top right corner of the screen when working in Git projects. We will assume throughout this course that this is the primary way people are interacting with Git.
Other graphical interfaces: These include other embedded point-and-click interfaces in Jupyter notebooks, PyCharm, other IDEs, or Git GUI.
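As a rough illustration of what "under the hood" means, the sketch below pairs some common actions in a graphical interface with the written git command that approximately corresponds to each. The pairing is a simplification rather than an exact account of what any particular interface runs, and the scratch project, file name `sums.R`, and identity are made up.

```shell
set -e

# Hypothetical scratch project so nothing real is changed.
project_dir="$(mktemp -d)"
cd "$project_dir"
git init -q
git config user.name "Example Analyst"       # placeholder identity
git config user.email "analyst@example.com"

echo "total <- 1 + 1" > sums.R

# Seeing the list of changed files: git status
git status --short

# Ticking a box to stage a file: git add
git add sums.R

# Viewing the staged changes before committing: git diff --staged
git diff --staged

# Writing a message and clicking a commit button: git commit
git commit -q -m "Add sums script"

# Pull and push buttons correspond to `git pull` and `git push`.
# They need a remote repository, so they are not run here.
```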
1.4.2 Command line
Command line tools are the most direct way to interact with git; these are simple text-based tools that allow you to type written commands directly to git. People often find that the learning curve for these tools is steeper than for point-and-click tools, but they are more universal and allow you to run every git command.
Git Bash: Git Bash is a command line tool that allows users to interact with Git on Windows computers. Like all command line tools, it has a very minimal interface that you simply type git commands into. You will use this if you want to use a command line tool locally on a Windows computer.
RStudio terminal: Unlike Windows computers, Linux computers do not need Git Bash, and can use the Linux terminal as a command line tool instead. In Cloud R, you can access this through the RStudio terminal (bottom left) without installing any other software.
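Whichever command line tool you use, a quick first check is to confirm that git responds. The short sketch below should behave the same in Git Bash and in a Linux terminal, assuming Git is installed; the message printed when you are outside a repository is our own wording, not something Git produces.

```shell
# Confirm Git is installed and report its version.
git --version

# If the current folder is inside a Git repository, summarise its state;
# otherwise say so rather than showing an error.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git status --short --branch
else
  echo "Not inside a Git repository"
fi
```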