Chapter 5 Getting your code onto Github
So far, all of the previous chapters in this book have assumed that you are working on an existing repository. This is the normal starting place for most beginners: working on a code base that someone else has created and that is already on Github. However, many people also have to work with code which may already be written but is not yet on Github.
This chapter talks you through the two methods of moving code onto Github, either starting with Git or starting with Github, and explains the pros and cons of each.
If you have a new project you want to start directly on Github, I would recommend you follow method 2 and simply start writing your code at step 3 rather than adding existing code files.
5.1 Why move to Github?
If you’re not already convinced that you should be making use of Github for all of your projects, it offers:
Easy version control for your code: much better than 20 versions of the same code with slightly different names on a shared drive!
Enables collaborative coding, development, and bug fixing: No risk of overwriting someone else’s changes with your own, or not understanding why something was done.
Makes it easy to move to Cloud R: When swapping from one platform to another, or between Cloud R servers, Github makes it seamless.
Store documentation alongside code: Rather than keeping track of documentation distinct from code, or relying on code comments throughout the code file itself, Github makes it easy to store both together.
Common, reproducible project structure: You always know where to look for your code, your documentation, your issues log, etc.
However, when moving code that you already rely on for one or more projects, it’s important to make sure that you’re moving that code at the right time! There are three major points to consider when moving projects:
Your collaborators are ready to start using Github too: Make sure everyone in the project has a Github account and sufficient training, otherwise it will become a barrier to them contributing.
You’re not sharing anything secret on Github: In the Gitflow chapter we already discussed why storing secret information in code is a bad idea, regardless of where that code is stored. Use the opportunity of moving to Github to ensure you’ve removed any passwords, keys, or other secret details before setting up the repo.
You’re ready to update your code to read/write data to shared locations: Moving code to Github requires few alterations to the code itself; the only exception is if you currently store data and code inside the same project. This is bad practice and not possible when working with Github, so you’ll need to have the time to update your code to read from a shared location away from your code before making the move.
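As a rough first pass at the second point, you can search your project for words that often sit next to hard-coded secrets. The sketch below is only a starting point: the folder, the file connect.R and the search pattern are all invented for illustration, and a clean result does not prove there are no secrets in your code.

```shell
# Hypothetical project folder with one script, created so the sketch can run on its own
cd "$(mktemp -d)"
printf 'db_password <- "hunter2"\nx <- 1\n' > connect.R

# Search every file (recursively, ignoring case) for words that often
# accompany hard-coded secrets. The pattern is an assumed starting point,
# not an exhaustive list.
grep -rniE 'password|passwd|secret|api[_-]?key|token' . || echo "no matches found"
```

Each match is reported with its file name and line number, so you can open the file and replace the value before the code goes anywhere near Github.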
5.2 Method 1: from Git to Github
The first of the two methods starts by creating a git repository locally, and then combining that with a blank Github repository. There are a number of steps, but with practice it’s relatively straightforward! Note that for most of the steps you will need to type commands into a Git command line interface; sadly, this is an example of something the RStudio GUI can’t do!
5.2.1 Step 1: Git track your project
Open up your command line interface and use the cd command to navigate to an existing local project (e.g. cd "C:/my R files/my R project").
Note that this needs to be a local project, e.g. on a personal drive or Cloud R home directory. If your project is currently on a shared drive, this method won’t work, so stop now and try the second method!
Create a git repository inside your project folder using git init. You should get a success message which says: Initialized empty Git repository in <the folder you specified>.
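Put together, step 1 might look something like the sketch below. The project folder here is a throwaway one created purely so the example can run anywhere; in practice you would cd to your own project instead.

```shell
# Hypothetical project folder, created here only so the sketch can run end to end
project_dir="$(mktemp -d)/my R project"
mkdir -p "$project_dir"

# Quote the path so that the spaces in it are handled correctly
cd "$project_dir"

# Create an empty git repository inside the project folder
git init

# Confirm that git is now tracking this folder (prints "true")
git rev-parse --is-inside-work-tree
```

The last command is optional, but it is a quick way to check you are in the right place before moving on to step 2.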
5.2.2 Step 2: Set up your gitignore
This is one step you can do in the RStudio GUI (although, if you prefer, the CLI commands are in the expandable box below).
Open up your project inside RStudio in the normal way (File -> Open project). Create a new plain text file (File -> New File -> Text file) and call it .gitignore (note the preceding dot and no file extension).
The file will open automatically in RStudio. You can add the names or partial names of files you don’t want to upload to Github to this file. If you’re not sure what should go here, you can use the analyst template to get started. Save the file and close it when you’re done.
Using the GUI, stage your .gitignore file, and commit it with the commit message ‘first commit’. Check out chapter 1 if you’ve forgotten how to do this.
What’s happening in the command line when you do this
Make sure you are still inside your project in the command line
Use the command “touch .gitignore” to create the gitignore file
Open up the newly created .gitignore file in notepad or RStudio and complete it as above
Add your gitignore file to your git repository using “git add .gitignore”.
Commit the gitignore to your local git repo using “git commit -m 'first commit'” (note that the flag uses a plain hyphen)
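As a minimal sketch, the command line version of this step could look like the following. The repository is a temporary one, the name and email are placeholders, and the ignore patterns are an assumed illustration rather than the contents of the analyst template.

```shell
# Hypothetical repository, created here only so the sketch can run on its own
cd "$(mktemp -d)"
git init

# Placeholder identity, only needed if git has not been configured on this machine
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

# Write a few illustrative ignore patterns (assumed, not the DfT analyst template)
cat > .gitignore <<'EOF'
.Rproj.user
.Rhistory
.RData
*.csv
*.xlsx
EOF

# Stage and commit the .gitignore
git add .gitignore
git commit -m "first commit"

# Show the commit that was just created
git log --oneline
```

Writing the file with a heredoc replaces the touch-then-edit steps above; either route ends with the same committed .gitignore.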
5.2.3 Step 3: Create a blank Github repo
These next steps happen in Github! Start by navigating to the DfT Github and select the green “new repository” button.
At this point, you need to ensure you’re creating a completely blank repository, otherwise the later steps won’t work. To do this, make sure you leave the repository with all of the standard settings, including:
- No template
- No README file
- Gitignore template: None
- Licence: None
Give the repo the same name as your local project (this isn’t necessary but makes it much easier!), and add a short description of the project to the repository.
Click “create repository”. The resulting repository should be completely blank; if it includes any files (a gitignore or README are the usual culprits), this won’t work, so you’ll need to go back and start again!
5.2.4 Step 4: Join the two repositories together
To link your local repository to your remote one, you need to return to your git command line. Depending on whether you’re using HTTPS or SSH cloning, the command is either:
git remote add origin https://github.com/department-for-transport/<YOUR REPO NAME>.git
(for HTTPS cloning)
OR
git remote add origin git@github.com:department-for-transport/<YOUR REPO NAME>.git
(for SSH cloning)
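You can now push your .gitignore up to the linked remote repository using git push -u origin main. If your local branch is called master rather than main (git may name it master by default, depending on your version and settings), rename it first using git branch -M main.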
Your repositories are now linked! From now on, you can use the normal staging/committing/pushing process to commit and push the rest of your files to the repository.
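The linking step can be sketched as below. So that the example runs without a network connection or a Github account, a local bare repository stands in for the blank Github repo; that stand-in, the folder names and the placeholder identity are all assumptions for illustration. With a real repo, the argument to git remote add would be the HTTPS or SSH URL shown above.

```shell
workdir="$(mktemp -d)"

# Stand-in for the blank Github repository: a local bare repo, so the sketch runs offline.
# In practice you would use the HTTPS or SSH URL of your Github repo instead.
git init --bare "$workdir/remote.git"

# Hypothetical local project with one commit, as left by step 2
mkdir "$workdir/project"
cd "$workdir/project"
git init
git config user.name "Example Analyst"      # placeholder identity
git config user.email "analyst@example.com"
touch .gitignore
git add .gitignore
git commit -m "first commit"

# Make sure the local branch is called main (git may have named it master)
git branch -M main

# Link the local repository to the remote, then list remotes to check the link
git remote add origin "$workdir/remote.git"
git remote -v

# Push, and set origin/main as the upstream for future pushes
git push -u origin main
```

The -u flag records origin/main as the upstream branch, which is what lets later pushes and pulls work without naming the remote each time.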
5.2.5 When should you use this method?
This is an ideal method for projects which are already stored in local folders (e.g. data exploration you’re doing on your local computer) rather than on a shared drive.
It’s also very straightforward for projects that already keep data and code separate, as you can stage and commit all files in one go, rather than sorting through and separating them.
This method also allows you to build a repo around established features within a project; these might be things like custom git files if it’s a pre-existing git repo, or a standardised structure such as a package or bookdown project.
Finally, this is a good approach if you’re confident with using the git command line; some stages can only be done with git commands, so it can be a bit more daunting if you’re unfamiliar with these.
5.3 Method 2: from Github to Git
The second of the two methods starts the opposite way round: by creating a remote Github repository, turning that into a local git repository, and then populating that with your code files. There are far fewer steps to this process, but the trade-off is a slightly more manual process of picking and choosing the files to add to the local repository at the end. As a bonus, you don’t need to use a command line interface for this process at all, but you can if you’d prefer!
5.3.1 Step 1: Create a Github repo
Start by navigating to the DfT Github and select the green “new repository” button. Unlike the previous method, you can set up a repository as normal at this point.
If you’re unfamiliar with creating Github repositories, the best option is to use the department-for-transport/analyst_template; this sets up the repository with all the files you need and a useful gitignore file.
If you are an experienced Github user, you may want to use another template or set up your own repo with custom settings.
Make sure the repository owner is set to the DfT organisation, and you can leave all the other settings as default. Give the repository a name and a description, and click “create repository”.
5.3.2 Step 2: Clone the new repo
You can now clone your repository to create a local copy, in the normal way!
- Get the Github link for the repository you’ve just set up, making sure to select either the HTTPS or SSH url as necessary.
- Go to File -> New Project -> Version Control -> Git
- In the window that pops up, paste the repository url into the first line (repository URL)
- The second line (repository name) should auto-fill
- In the final box, browse to select a location to clone to. In Cloud R, your home directory (~) is a good choice; never use a shared drive!
- Click create project! Git will clone the repository down from Github, and open the project so you’re ready to start working in it. If you’re using HTTPS authentication, you might have to provide your username (email address) and password (your personal access token, or PAT) at this point; for SSH you won’t have to.
What’s happening in the command line when you do this
Navigate to the correct working directory using the “cd” command followed by the location
Use the command “git clone” followed by the repository url to clone the repository. Again, you may need to enter your authentication details at this point.
Navigate into the new git repository that’s been set up using the cd command followed by the name of the repository
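These three commands can be sketched as follows. To keep the example runnable offline, a small local repository stands in for the one on Github; that stand-in, the folder names and the project name my-project are all made up for illustration. With a real repo, the argument to git clone would be the HTTPS or SSH URL copied from Github.

```shell
workdir="$(mktemp -d)"

# Stand-in for the Github repository (so the sketch runs offline), with one commit in it
git init "$workdir/remote-repo"
git -C "$workdir/remote-repo" \
    -c user.name="Example Analyst" -c user.email="analyst@example.com" \
    commit --allow-empty -m "initial commit"

# Navigate to the folder that should hold the clone (hypothetical location)
mkdir "$workdir/home"
cd "$workdir/home"

# Clone the repository; the last argument is the name of the folder to create
git clone "$workdir/remote-repo" my-project

# Move into the new local repository and confirm it is linked to the remote
cd my-project
git remote -v
```

Because the repository was cloned rather than created with git init, the origin remote is set up for you, so there is no separate linking step as in method 1.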
5.3.3 Step 3: Add your files to the repo
After the previous step, you’ve now successfully cloned your template repo locally, and it’s linked to the remote repository.
You can now add any code, template files, etc to your local repository. You can move these across one-by-one, or do a bulk upload in a zip file using the Cloud R interface. If you’ve previously been storing data or outputs in the same folder as your code, this is the point at which you’d separate the two so your data remains on the shared drive and your code moves into the repo.
The template .gitignore (if you used the template in step 1 as instructed!) will stop files such as xlsx, csv, html, etc. from being uploaded, to prevent the inclusion of any sensitive data.
Once you’ve added all the files into the project, you can use the standard stage/commit/push process to push these files up to the remote Github repository.
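If you prefer the command line, the sketch below shows one way to check what the .gitignore is holding back before you push. A local bare repository stands in for Github so the example runs offline, and the ignore patterns, file names and placeholder identity are all assumptions made for illustration, not the real contents of the template.

```shell
workdir="$(mktemp -d)"

# Offline stand-in for the cloned template repo and its Github remote (hypothetical paths)
git init --bare "$workdir/remote.git"
git init "$workdir/project"
cd "$workdir/project"
git config user.name "Example Analyst"      # placeholder identity
git config user.email "analyst@example.com"
git remote add origin "$workdir/remote.git"
printf '*.csv\n*.xlsx\n*.html\n' > .gitignore   # assumed subset of the template's patterns
git add .gitignore
git commit -m "add gitignore"
git branch -M main

# Copy in a code file and a data file (both invented for this example)
echo 'print("hello")' > analysis.R
echo 'id,value' > raw_data.csv

# Ask git which .gitignore rule, if any, matches the data file
git check-ignore -v raw_data.csv

# Only the code file shows up as untracked
git status --short

# Standard stage / commit / push; "git add ." skips anything that is ignored
git add .
git commit -m "add analysis script"
git push -u origin main
```

git check-ignore is handy when a file you expected to upload never appears in the staging list, because it reports the exact rule responsible.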
And that’s it!
5.3.4 When should you use this method?
This method is ideal for projects which are currently stored in a shared drive location, and/or are currently storing data and code together in the same file structure. It gives you the opportunity to move the code to a personal drive, and also to separate out the data and code as part of the moving process, removing the risk of accidentally sharing data.
It’s also a great option if you’d like to standardise your Github repo structure and take the stress out of building your own repo settings. The provided templates on the DfT Github include a gitignore which automatically ignores common filetypes, as well as pull request and issue templates, and are a really useful starting point for any analytical project.
This is a great option if you’re unfamiliar or not confident with command line tools, as all of the git steps are simple and familiar, and can be done in the RStudio point-and-click interface.
Finally, this is a great option if you are starting a new project entirely from scratch; when you get to step 3 you just start writing your brand new code rather than uploading existing files.