Python for Scientists – Version Control

Do you have 15+ versions of your manuscript/thesis/report? Maybe they have names such as report.doc, report_pd_edits.doc, report_final.doc, report_final_v3.doc, etc.? At some point there are so many files you have to filter by modification date to know which one you were working on last. Or maybe you accidentally saved over a document and lost a week or two worth of struggle? This is where version control can help!

Different types of version control exist, but I’m going to discuss Git. There are two main reasons to use Git. The first is version control itself, which keeps your project directories from becoming bloated with multiple copies of nearly identical files (like the report examples in the previous paragraph). The second is collaboration. Other people can be invited to work on a Git project, which gives every user access to each other’s latest changes (assuming timely pushes to the central repository). The feature image above from XKCD wraps Git up in a nutshell. You will memorize a few commands (or create a cheat sheet of commands for yourself), and if a problem pops up then it’s off to Stack Overflow!

I know that this post doesn’t specifically contain Python information. However, version control is a necessity when it comes to keeping track of code changes!

Topics covered in this post

  • Git setup
  • Branches
  • Create remote repository from local project
  • Clone a repository

Updated: 2019-10-05

Initial Git Setup

While the command line is slightly more difficult to use than a GUI, it’s easier to recover from problems if you are already comfortable with basic commands. Download Git-SCM for Windows, which installs both a Git command line prompt (Git BASH) and a Git GUI. Other options exist for installing Git, and other GUIs are also available.

The first step after installing Git-SCM is to set up a global username and email, which credit each of your commits to you. Open a Git BASH command line prompt and type the following commands, pressing “enter” after each line:

git config --global user.name "<userName>"
git config --global user.email "<email>"

Replace <userName> and <email> with your own information. The quotes are needed when the name contains spaces.
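To check that the settings were stored, the values can be read back with “git config --get”. The sketch below is only an illustration: the name and email are hypothetical, and it points HOME at a throwaway directory so that running it does not touch a real ~/.gitconfig.

```shell
# Sandbox: use a temporary HOME so this demo leaves your real settings alone.
# (Assumption for the demo only; skip these two lines to configure your actual account.)
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

# Hypothetical identity; substitute your own name and email.
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.com"

# Read the values back to confirm they were saved.
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "Name:  $configured_name"
echo "Email: $configured_email"
```

If either line prints an empty value, the matching “git config” command was probably mistyped.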

Create Remote Git Repository from an Existing Local Project

To create a Git repository with a project folder that already exists (most common method unless working in a collaboration), open a Git BASH session (if not already open) and navigate to the top-level directory of the project (e.g. ~/Documents/Projects/myProject/). Then, type the following command into Git BASH and press the ‘enter’ key:

git init .

This will initialize a Git project in the top-level directory of the project (‘myProject’ in this example). The next step is to generate a “.gitignore” file (don’t forget the leading period on the filename). This file tells Git which files/directories to leave untracked, so they are never sent to an online repository. To create the file in Git BASH, type “nano .gitignore” and press the “enter” key. An editor will appear in the command line. To save edits, press “control + x”, then “y”, and then the “enter” key. This combination saves the edits and exits Nano. If edits shouldn’t be saved, press “n” instead of “y”. Below is an example of a “.gitignore” file.

data/*.nc
*.png
personalData.txt

Git repositories are not the correct place to store large files. If data files are large (roughly greater than 50 MB), add their file extension to the “.gitignore” file. Usually, plot file extensions should also be ignored.
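One way to confirm that the patterns behave as intended is “git check-ignore”, which reports whether a path matches an ignore rule. The following is a minimal sketch in a throwaway directory; the file names other than personalData.txt are made up for the demonstration.

```shell
# Build a throwaway project so no real repository is touched.
project_dir="$(mktemp -d)"
cd "$project_dir"
git init -q .

# Same three patterns as the example .gitignore file.
printf '%s\n' 'data/*.nc' '*.png' 'personalData.txt' > .gitignore

# Hypothetical files: three that should be ignored and one that should not.
mkdir data
touch data/temps.nc figure1.png personalData.txt analysis.py

# check-ignore exits with 0 when a path matches an ignore pattern, 1 otherwise.
for f in data/temps.nc figure1.png personalData.txt analysis.py; do
    if git check-ignore -q "$f"; then
        echo "ignored:     $f"
    else
        echo "not ignored: $f"
    fi
done
```

Only analysis.py should be reported as not ignored, so it is the only one of these files that “git add .” would pick up.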

Next, go to an online repository (such as GitLab) and create a user account if you don’t already have one. Then, create a project with the same name as the top-level directory on the local machine. For example, if the top-level project path is “~/Documents/Projects/myProject”, the GitLab repository should be named “myProject”. Make sure to mark the repository as “private” if you don’t want other people to see/access the repository. A repository type can be changed to “public” at a later date.

Now it’s time to tell Git (running on your local machine) where the remote repository (on GitLab) is located. Open a Git BASH command prompt and navigate to the top-level directory, if not already there. Then type the following command and press the “enter” key:

git remote add origin <url>

Here, <url> is the URL of the online repository that was created on GitLab (or another online repository website). An example URL is https://gitlab.com/userName/myProject, where “userName” is the username associated with your GitLab account and “myProject” is the top-level directory name on the local machine.

The order of operations for a first commit is as follows:

git pull origin master
git add .
git commit -m "commit message here"
git push -u origin master

The first line grabs anything that might already be in the online repository. For a brand-new, empty repository there is nothing to grab, and Git will report that it cannot find the remote branch; that error is safe to ignore. The second line stages all files/folders within the project directory that are not listed in the .gitignore file. Line three commits the changes staged by line two and attaches a message. Messages are encouraged for all commits as a reminder of what was changed. The last line pushes the commit to the online repository and links the local branch to the remote one (branches are explained later).
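The whole sequence can be rehearsed without a GitLab account by letting a local bare repository stand in for the remote. This is only a sketch under that assumption: the identity, the file name, and the commit message are hypothetical, and the initial pull is left out because the stand-in remote is empty.

```shell
# Sandbox holding both a stand-in "remote" and the local project.
sandbox="$(mktemp -d)"
cd "$sandbox"

# A bare repository plays the role of the empty online project.
git init -q --bare remote.git

# The existing local project (hypothetical file).
mkdir myProject
cd myProject
echo 'print("hello")' > analysis.py
git init -q .

# Identity stored in this repository only, so the demo does not rely on global settings.
git config user.name "Jane Doe"
git config user.email "jane.doe@example.com"

# Tell Git where the remote lives (a local path here instead of a URL).
git remote add origin "$sandbox/remote.git"

# Stage and commit everything that is not ignored.
git add .
git commit -q -m "Add first analysis script"

# Name the branch master to match this post; some Git setups default to another name.
git branch -M master
git push -q -u origin master

# The same commit should now be listed locally and on the remote.
git log --oneline master
git log --oneline origin/master
```

Against a real GitLab project, only the argument to “git remote add origin” changes.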

Clone a Remote Git Repository to a Local Computer

To clone a project that already exists as a remote Git Repository, open a Git BASH prompt and navigate to the location where the project will be saved (e.g. ~/Documents/Projects/). Then type the following line and press the “enter” key:

git clone <url>

Here, <url> is the full URL of the remote Git repository (e.g. https://gitlab.com/userName/myProject).
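Cloning can also be tried locally, because Git accepts a path to another repository in the same position as a URL. The sketch below assumes a small hypothetical repository named “upstream” as the thing being cloned.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for a remote repository: a local repository with one commit.
git init -q upstream
git -C upstream config user.name "Jane Doe"
git -C upstream config user.email "jane.doe@example.com"
echo "Example project" > upstream/README.txt
git -C upstream add README.txt
git -C upstream commit -q -m "Add README"

# Clone it into a new folder called myProject.
git clone -q "$sandbox/upstream" myProject

# The clone holds the files, the history, and a remote named origin.
ls myProject
git -C myProject log --oneline
git -C myProject remote -v
```

Because the clone already has “origin” configured, no separate “git remote add” step is needed afterward.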

Git Branches

Git branches are a way to separate a working copy of the project from a development copy. The main branch, or trunk of the tree, is the master branch. This branch can be renamed (many hosting services now call it “main” by default), but it remains the default branch. Perhaps a milestone is reached while working on a project. Maybe the first milestone is to read the first data file into memory. At this point, the master branch works. Let’s say the next step in the project is to perform a statistical analysis of the data. To ensure nothing breaks in the code that already works, a second branch can be created. Branches are created with the following command (in a Git BASH prompt):

git branch <branchName>

Where <branchName> is a descriptive name. I tend to name the second branch “develop”. To switch between branches, type:

git checkout <branchName>

To push the new branch to a remote repository for the first time, type:

git push -u origin <branchName>

To view all available branches type:

git branch

Branching can be as simple as the two branches described above, or it can involve numerous branches. To keep things easy, stick with two branches initially: a master branch for stable/release-ready code, and a develop branch for new/enhanced code.
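The branch commands above can be tried in a throwaway repository. This is a minimal sketch with a hypothetical file and identity; it has no remote, so the push step is omitted.

```shell
project_dir="$(mktemp -d)"
cd "$project_dir"
git init -q .
git config user.name "Jane Doe"
git config user.email "jane.doe@example.com"

# A branch needs at least one commit to start from.
echo "read data" > read_data.py
git add read_data.py
git commit -q -m "Read first data file"
git branch -M master   # match the branch name used in this post

# Create the develop branch, then switch to it.
git branch develop
git checkout -q develop

# List the branches; the asterisk marks the one currently checked out.
git --no-pager branch
current_branch="$(git rev-parse --abbrev-ref HEAD)"
echo "Now on: $current_branch"
```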

Eventually, the newly developed code (in this example, a statistical analysis) is vetted and works properly. At this point, the development branch (develop) can be merged into the master branch. Merging updates the master branch with all changes made in the develop branch. To do so, type the following commands, pressing the “enter” key after each line:

git checkout master
git merge develop
git push
git checkout develop

The first line ensures the current branch is set to master. The second line will merge all changes from the develop branch to the master branch. The local repository must then be pushed to the online repository with line three. The last line switches back to the develop branch so future changes are not performed on the master branch.
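To see what the merge does to the master branch, here is a small local rehearsal. It is a sketch with hypothetical files and commit messages, and it skips “git push” because the sandbox has no remote.

```shell
project_dir="$(mktemp -d)"
cd "$project_dir"
git init -q .
git config user.name "Jane Doe"
git config user.email "jane.doe@example.com"

# Milestone one, committed on master.
echo "read data" > read_data.py
git add read_data.py
git commit -q -m "Read first data file"
git branch -M master

# Milestone two, developed on the develop branch.
git branch develop
git checkout -q develop
echo "compute statistics" > stats.py
git add stats.py
git commit -q -m "Add statistical analysis"

# Back on master, stats.py is not there yet.
git checkout -q master
ls

# Merge develop into master; stats.py now appears on master too.
git merge -q develop
ls

# Return to develop so new work does not land on master.
git checkout -q develop
```

The first “ls” lists only read_data.py, and the second lists both files, which is the merge bringing master up to date with develop.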

BONUS

Congratulations on making it through the entire post. As a reward, here’s a bonus for your attention.

If you decide to stop tracking a file within the local Git repository that was previously pushed to the online repository, add the filename to .gitignore and then type the following command into a Git BASH prompt:

git rm --cached <filename>

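Here, <filename> is the name of the file you no longer want to track with Git. The file itself stays on the local disk; commit and push afterward so the online repository stops tracking it as well.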
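The effect can be checked locally with “git ls-files”, which lists the files Git is tracking. The sketch below uses a throwaway repository and a hypothetical commit history.

```shell
project_dir="$(mktemp -d)"
cd "$project_dir"
git init -q .
git config user.name "Jane Doe"
git config user.email "jane.doe@example.com"

# A file that was committed by mistake.
echo "private notes" > personalData.txt
git add personalData.txt
git commit -q -m "Add data file"

# Ignore it from now on, then remove it from Git's index only.
echo "personalData.txt" > .gitignore
git rm -q --cached personalData.txt
git add .gitignore
git commit -q -m "Stop tracking personalData.txt"

# The file is still on disk, but Git no longer tracks it.
ls
git ls-files
```

Note that earlier commits still contain the file, so this stops future tracking but does not erase it from the project history.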

