Created by Pieter Moris and contributors
All in all, git is pretty
Keep calm and git gud
Anytime you experience an insightful "AHA!"-moment, you should probably be praising these people, rather than me. Idem ditto whenever you chuckle.
TL;DR
Come back to this slide at the end of the presentation.
… .git directory. Can be local or remote. Contains references to branches (heads), tags and remote branches.
… .git folder.
… GitHub is just one example.
The following commands are referred to as git porcelain commands: the ones the user normally interacts with.
Later on, we'll encounter the so-called git plumbing commands: low-level commands to manipulate and inspect objects.
You might have heard that git is represented by a directed acyclic graph (DAG).
But what are the nodes and edges of this graph?
And how does it track files?
That file in cf/bc74af89ccf0d0ddbe488b4a2df7318786759d is what git refers to as a blob.
Blobs are created when you git add files. They're the most basic building block of git.
But both its name and contents look like nonsense, right?
Any kind of content you insert into the repository can be retrieved using a unique key or fingerprint.
That key is a SHA-1 checksum of the contents of the file (and a short header).
You've probably encountered these before as the names of your commits.
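The contents look like nonsense because git compresses every object with zlib before writing it to disk. The following is a minimal sketch of reading such a file back, assuming the usual loose-object layout (a "type size" header, a NUL byte, then the raw content). The helper name parse_loose_object is made up for illustration, and the bytes are built in memory as a stand-in for the real file so that the sketch runs without a repository.

```python
import zlib
from typing import Tuple


def parse_loose_object(compressed: bytes) -> Tuple[str, int, bytes]:
    """Decompress a loose object and split it into (type, size, content)."""
    store = zlib.decompress(compressed)
    # The header and the content are separated by a single NUL byte.
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    return obj_type, int(size), content


# Stand-in for the bytes of .git/objects/cf/bc74af89... (assumption: built
# here rather than read from disk; real files may use another zlib level).
on_disk = zlib.compress(b"blob 5\x00woof\n")

print(parse_loose_object(on_disk))
```

In a real repository you would pass the bytes of the object file instead, or simply use git cat-file -p.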
What does git add dog achieve? It creates a new object in .git/objects/.
$ tree .git/objects
.git/objects/
├── cf
│ └── bc74af89ccf0d0ddbe488b4a2df7318786759d
├── info
└── pack
# Compute the hash of the file in your directory
$ git hash-object dog
cfbc74af89ccf0d0ddbe488b4a2df7318786759d
# ONLY file contents (and size) determine the hash
$ echo "woof" | git hash-object --stdin
cfbc74af89ccf0d0ddbe488b4a2df7318786759d
$ printf "blob 5\000woof\n" | openssl sha1
(stdin)= cfbc74af89ccf0d0ddbe488b4a2df7318786759d
These blob objects form the beginning of our graph.
Every text file with the contents "woof" that you'll ever create in your project (or any future projects) will point to the same blob, regardless of its filename.
This means that if you'd been following along, you will have ended up with the exact same blob.
This is part of what makes git's storage method so efficient. But more on that later.
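As a rough illustration of why identical content always ends up in the same blob, here is a toy content-addressed store in Python. It hashes the same "blob size NUL content" bytes as the openssl example above. The extra file names and the cat content ("meow") are invented for the example and do not come from the repository in these slides.

```python
import hashlib
from typing import Dict


def blob_id(content: bytes) -> str:
    """SHA-1 over a 'blob <size>' header, a NUL byte and the file content."""
    header = f"blob {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Toy object store: the key depends on the content only, never on the name.
object_store: Dict[str, bytes] = {}

# Hypothetical working directory: two files share the same content.
files = {"dog": b"woof\n", "another-dog.txt": b"woof\n", "cat": b"meow\n"}
for name, content in files.items():
    object_store[blob_id(content)] = content

print(blob_id(b"woof\n"))
print(len(files), "files,", len(object_store), "blobs")
```

Three files go in, but only two blobs are stored, because both "woof" files map to the same key.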
One of the biggest hurdles to wrapping your head around git is all of its jargon.
Whenever you see the term tree, just think directory.
Note: you might see some people refer to the git graph as a tree as well.
$ tree .git/objects
.git/objects
├── 35
│ └── eaf2cfe26d5a30558c7aceaad5fadc72a09164
├── 99
│ └── 809ef5ef2a4458e883f53c0ce55fc9f7061844
├── cf
│ └── bc74af89ccf0d0ddbe488b4a2df7318786759d
├── fa
│ └── b9dd251f10ff00622bfd0f069e98b492d433c8
├── info
└── pack
# view contents of tree object
$ git cat-file -p 35eaf2c
100644 blob cfbc74af89ccf0d0ddbe488b4a2df7318786759d dog
Two new files showed up after committing: 35eaf2c and fab9dd2.
The first of these is a so-called tree object.
Tree objects are stored in .git/objects under a SHA-1-derived name and are created by git commit.
Blobs record the content of files.
Trees record the directory structure by pointing to blobs.
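To make "pointing to blobs" a bit more concrete, the sketch below serialises a tree the way git stores it internally: one "mode name NUL" entry per file, followed by the 20 raw bytes of the blob's hash. This is a simplified illustration that assumes a flat directory with regular files only (real trees also hold sub-trees and use a slightly different sort order for directories). The function names are made up.

```python
import hashlib
from typing import List, Tuple


def object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>', a NUL byte and the object body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: List[Tuple[str, str, str]]) -> bytes:
    """Serialise (mode, name, hex blob id) entries, sorted by file name."""
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda entry: entry[1]):
        # The blob id is stored as 20 raw bytes, not as 40 hex characters.
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


dog_blob = object_id("blob", b"woof\n")
tree = tree_body([("100644", "dog", dog_blob)])

print(dog_blob)
print(object_id("tree", tree))
```

Note that the tree never contains file contents, only names, modes and blob hashes.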
We now have a bunch of tree and blob objects, but we haven't yet found a way to:
The final type of git object, the commit, takes care of this.
$ tree .git/objects
.git/objects
├── 35
│ └── eaf2cfe26d5a30558c7aceaad5fadc72a09164
//
├── fa
│ └── b9dd251f10ff00622bfd0f069e98b492d433c8
├── info
└── pack
# view contents of commit object
$ git cat-file -p fab9dd2
tree 35eaf2cfe26d5a30558c7aceaad5fadc72a09164
author Pieter <13552343+pmoris@users.noreply.github.com> 1535631839 +0200
committer Pieter <13552343+pmoris@users.noreply.github.com> 1535631839 +0200
Initial commit
Commit objects are created by git commit.
This gives us three "levels" of git objects: blobs, trees and commits.
It's possible, using only
It's also possible to manually create a commit object that points to this tree and to notify our repo that this commit exists in our branch, in order to make it reachable.
1. Add the file to the staging area (or index).
# add file to staging area
# - normally handled by `git add`
$ master 1 git update-index --add \
--cacheinfo 100644 \
99809ef5ef2a4458e883f53c0ce55fc9f7061844 cat
# the index references all files and directories that
# will be recorded by the tree
$ master 1 git ls-files --stage
100644 99809ef5ef2a4458e883f53c0ce55fc9f7061844 0	cat
100644 cfbc74af89ccf0d0ddbe488b4a2df7318786759d 0	dog
2. Write the index contents to a tree object.
# create tree object from index
# - normally handled by `git commit`
$ master ● 1 ✚ 1 git write-tree
a8fd4f7e27f7b943a6b1ae5e84430d56d234526c
# view its contents
$ master ● 1 ✚ 1 git cat-file -p a8fd
100644 blob 99809ef5ef2a4458e883f53c0ce55fc9f7061844 cat
100644 blob cfbc74af89ccf0d0ddbe488b4a2df7318786759d dog
3. Create a new commit object and chain it to the previous commit (parent).
# create commit object - normally handled by `git commit`
$ master ● 1 ✚ 1 echo "2nd commit manual" | \
git commit-tree a8fd4f7 -p fab9dd2
0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
# view its contents
$ master ● 1 ✚ 1 git cat-file -p \
0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
tree a8fd4f7e27f7b943a6b1ae5e84430d56d234526c
parent fab9dd251f10ff00622bfd0f069e98b492d433c8
author Pieter <13552343+pmoris@users.noreply.github.com> 1535829462 +0200
committer Pieter <13552343+pmoris@users.noreply.github.com> 1535829462 +0200
2nd commit manual
This second commit has an extra line, parent, that points to the previous commit object!
We're now ready to see how git keeps track of a project's history.
By chaining commits, each one pointing to its parent(s), a history is formed.
At this point, we can recall the history by invoking the log command directly on the new commit.
$ master ● 1 ✚ 1 git log --stat 0ed44be
commit 0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
Author: Pieter <13552343+pmoris@users.noreply.github.com>
Date: Sat Sep 1 21:17:42 2018 +0200
2nd commit manual
cat | 1 +
1 file changed, 1 insertion(+)
commit fab9dd251f10ff00622bfd0f069e98b492d433c8 (HEAD -> master)
Author: Pieter <13552343+pmoris@users.noreply.github.com>
Date: Sat Sep 1 21:14:04 2018 +0200
Initial commit
dog | 1 +
1 file changed, 1 insertion(+)
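How a log can be reconstructed purely from parent pointers can be sketched in a few lines of Python. This is a toy model, not git's actual implementation: the graph is a plain dictionary keyed by the abbreviated hashes from the transcript above, and the traversal is a simple breadth-first walk without any date ordering.

```python
from typing import Dict, List

# Toy commit graph: every commit id maps to the ids of its parents.
parents: Dict[str, List[str]] = {
    "fab9dd2": [],           # Initial commit
    "0ed44be": ["fab9dd2"],  # 2nd commit manual
}


def history(start: str) -> List[str]:
    """Return every commit reachable from start, in breadth-first order."""
    seen: List[str] = []
    queue = [start]
    while queue:
        commit = queue.pop(0)
        if commit in seen:
            continue
        seen.append(commit)
        # Edges only point backwards, from a commit to its parents.
        queue.extend(parents[commit])
    return seen


print(history("0ed44be"))
print(history("fab9dd2"))
```

Starting from the new commit reaches both commits. Starting from the initial commit reaches only itself, because no arrow points forward in time.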
So far so good, we've (almost) successfully reproduced the porcelain command git commit.
But why wouldn't the regular git log command show the new commit at this point? And where do branches come into play?
So how does git keep track of what commit we're currently on?
The answer is branches!
Their nature and behaviour will also become clearer when we look at them from a DAG perspective.
They provide names for certain points in the graph and allow us to easily access them, instead of having to remember the hash digests of specific commits.
Branches reside in...
$ master ● 1 ✚ 1 ls .git/refs/heads/
master
And are simply plain text files pointing to a commit.
$ master ● 1 ✚ 1 cat .git/refs/heads/master
fab9dd251f10ff00622bfd0f069e98b492d433c8
The master branch is pointing to the initial commit that we created using the regular porcelain command git commit.
We can update the master reference to point to the new commit.
# change ref file for master branch to a new commit object
$ master ● 1 ✚ 1 git update-ref refs/heads/master 0ed44be
$ master ✚ 1 cat .git/refs/heads/master
0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
$ master ✚ 1 git log
commit 0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1 (HEAD -> master)
Author: Pieter <13552343+pmoris@users.noreply.github.com>
Date: Sat Sep 1 21:17:42 2018 +0200
2nd commit manual
commit fab9dd251f10ff00622bfd0f069e98b492d433c8
Author: Pieter <13552343+pmoris@users.noreply.github.com>
Date: Sat Sep 1 21:14:04 2018 +0200
Initial commit
Normally, this moving of the branch reference happens behind the scenes when we use the porcelain command git commit.
Let's create a new branch now to see how it affects the graph.
# Retrieve the cat object from our repository
# and place it in the working directory
# Why? the file never existed except as a blob
$ master ✚ 1 git cat-file -p 99809ef5 > cat
# Create new branch
$ master git checkout -b "no-cats-allowed"
Switched to a new branch 'no-cats-allowed'
# Check references
$ no-cats-allowed ✚ 1 cat .git/refs/heads/no-cats-allowed
0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
# Check current commit
$ no-cats-allowed ✚ 1 git rev-parse HEAD
0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
# or just use git log
Creating a new branch merely adds a reference to the same commit that we were on before. It is literally just a text file.
$ no-cats-allowed git rm cat
rm 'cat'
$ no-cats-allowed ● 1 mkdir doge
$ no-cats-allowed ● 1 echo "bark" > doge/much
$ no-cats-allowed ● 1 … 1 echo "floof" > doge/very
$ no-cats-allowed ● 1 … 1 git add doge/
$ no-cats-allowed ● 3 git status
On branch no-cats-allowed
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
deleted: cat
new file: doge/much
new file: doge/very
$ no-cats-allowed ● 3 git commit -m "doggies"
[no-cats-allowed 40da2ef] doggies
3 files changed, 2 insertions(+), 1 deletion(-)
delete mode 100644 cat
create mode 100644 doge/much
create mode 100644 doge/very
$ no-cats-allowed git rev-parse HEAD
40da2ef175b6693c785251467946f1cdaf5e6552
$ no-cats-allowed git cat-file -p HEAD
tree 60c51895339b14261e18cb4555b4a43d1cfdc397
parent 0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
author Pieter <13552343+pmoris@users.noreply.github.com> 1535872422 +0200
committer Pieter <13552343+pmoris@users.noreply.github.com> 1535872422 +0200
doggies
How does git know which working directory to show us and which branch to operate on when we use porcelain commands like git commit?
That's where HEAD comes into play.
# HEAD points to a ref or branch name
$ no-cats-allowed cat .git/HEAD
ref: refs/heads/no-cats-allowed
# the branch references a commit
$ no-cats-allowed cat .git/refs/heads/no-cats-allowed
40da2ef175b6693c785251467946f1cdaf5e6552
# shorthand for commit id
$ no-cats-allowed git rev-parse HEAD
40da2ef175b6693c785251467946f1cdaf5e6552
# shorthand for commit contents
$ no-cats-allowed git cat-file -p HEAD
tree 60c51895339b14261e18cb4555b4a43d1cfdc397
parent 0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
author Pieter <13552343+pmoris@users.noreply.github.com> 1535872422 +0200
committer Pieter <13552343+pmoris@users.noreply.github.com> 1535872422 +0200
doggies
This is the final aspect of what happens behind the scenes when we use the porcelain command git commit.
… .git directory.
… commit or checkout a branch (or commit).
Detached HEAD: a (rather odd) way of saying that we are checking out a commit that no branch points to directly.
When you check out a commit instead of a branch, HEAD will point directly to this commit, instead of to a ref/branch.
This means that any new commits you create here can't be reached again once you switch away (directed arrows!), unless you remember their hashes, and git's garbage collection will remove them at some point.
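The two situations (HEAD pointing to a branch versus HEAD pointing straight to a commit) can be told apart by reading the files directly. The sketch below is an assumption-laden illustration: it builds a throwaway directory that mimics the layout shown earlier, the function name resolve_head is made up, and it ignores refs that git may have moved into .git/packed-refs.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_head(git_dir: Path) -> Tuple[Optional[str], str]:
    """Return (ref name, or None when detached, and the commit id)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        # Attached: HEAD names a branch file, which holds the commit id.
        return ref, (git_dir / ref).read_text().strip()
    # Detached: HEAD holds a commit id directly.
    return None, head


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "no-cats-allowed").write_text(
        "40da2ef175b6693c785251467946f1cdaf5e6552\n"
    )

    (git_dir / "HEAD").write_text("ref: refs/heads/no-cats-allowed\n")
    attached = resolve_head(git_dir)

    (git_dir / "HEAD").write_text("0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1\n")
    detached = resolve_head(git_dir)

print(attached)
print(detached)
```

In everyday use, git rev-parse HEAD and git symbolic-ref HEAD give the same information without touching the files by hand.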
We briefly mentioned this during the manual commit section, but for the sake of completeness...
For a good write-up, see: https://stackoverflow.com/questions/25351450/what-does-adding-to-the-index-really-mean-in-git
Conceptually, it's an ever-changing tree object, stored as a single large binary file: .git/index.
Whenever you use git commit, the tree object that is created will be based on the contents of the index.
It can be viewed as follows (see previous slides):
$ master 1 git ls-files --stage
100644 99809ef5ef2a4458e883f53c0ce55fc9f7061844 0	cat
100644 cfbc74af89ccf0d0ddbe488b4a2df7318786759d 0	dog
The SHA-1 hash names used for git objects provide an elegant way to verify the integrity of all the data in the repository.
The hash of any one commit depends on the hashes (and thus the contents) of its tree, its blobs and every commit that came before it. A similar technique is used by Bitcoin and BitTorrent. (But don't call git a blockchain unless you want to upset some people.)
This is also the reason why re-writing history can be dangerous in git. Everything downstream will be affected. (Although this generally only becomes a problem when working with collaborators.)
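The "everything downstream will be affected" effect can be demonstrated with a small Python sketch that hashes a one-file project from blob to tree to commit. It is a simplified model under stated assumptions: the author, timestamp and message are fixed placeholders invented for the example, the tree always holds a single file named dog, and the function names are hypothetical.

```python
import hashlib
from typing import Optional


def object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>', a NUL byte and the object body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def snapshot(file_content: bytes, parent: Optional[str]) -> str:
    """Hash blob -> tree -> commit for a one-file project; return commit id."""
    blob = object_id("blob", file_content)
    tree = object_id("tree", b"100644 dog\x00" + bytes.fromhex(blob))
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    # Placeholder metadata, kept constant so that only the content varies.
    lines += [
        "author A <a@example.com> 0 +0000",
        "committer A <a@example.com> 0 +0000",
        "",
        "snapshot",
        "",
    ]
    return object_id("commit", "\n".join(lines).encode())


first = snapshot(b"woof\n", parent=None)
second = snapshot(b"woof woof\n", parent=first)

# Rewrite history: change the first commit's file, keep the second as it was.
rewritten_first = snapshot(b"w00f\n", parent=None)
rewritten_second = snapshot(b"woof woof\n", parent=rewritten_first)

print(first != rewritten_first)
print(second != rewritten_second)
```

The second commit's own content is identical in both histories, yet its id changes because the id of its parent changed.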
When merging branches, each commit is interpreted as a set of changes.
Each merge tries to consolidate (at least) three different snapshots: the two you provide in the command and their most recent common ancestor.
The following command finds the common ancestor commit of two branches (better: of the commits at their tips).
$ git merge-base master no-cats-allowed
0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
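A simplified way to picture what merge-base does: collect everything reachable from one tip, then walk back from the other tip until you hit a commit in that set. The Python sketch below uses the abbreviated hashes from this repository as a toy graph. It is an illustration only and does not handle the trickier cases (such as criss-cross merges) that the real command deals with.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Toy graph of the repository so far: commit id -> parent ids.
parents: Dict[str, List[str]] = {
    "fab9dd2": [],
    "0ed44be": ["fab9dd2"],
    "40da2ef": ["0ed44be"],
}
branches = {"master": "0ed44be", "no-cats-allowed": "40da2ef"}


def ancestors(commit: str) -> Set[str]:
    """All commits reachable from commit, including commit itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(a: str, b: str) -> Optional[str]:
    """First commit found walking back from b that is reachable from a."""
    reachable_from_a = ancestors(a)
    queue = deque([b])
    visited: Set[str] = set()
    while queue:
        current = queue.popleft()
        if current in reachable_from_a:
            return current
        if current not in visited:
            visited.add(current)
            queue.extend(parents[current])
    return None


print(merge_base(branches["master"], branches["no-cats-allowed"]))
```

Here the common ancestor is the tip of master itself, which is the situation in which git can simply move the branch reference forward.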
One of the following scenarios can occur:
Source: https://codewords.recurse.com/issues/two/git-from-the-inside-out
Source: https://stackoverflow.com/a/3639387
More info: https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified
git reset [<mode>] [<commit>]
--soft: leaves index and working dir untouched.
--mixed (default): resets index and leaves working dir untouched.
--hard: resets index and working dir. One of the few ways to lose progress in git!
A common point of confusion is the relation between local and remote branches.
As we've seen, branches are just references to specific commits.
Ergo, remote branches are pointers to specific commits in the remote repository.
You can't move these references yourself.
More info: https://git-scm.com/book/en/v2/Git-Branching-Remote-Branches
This means that you're up-to-date with the ref called origin/master, which is a local reference on your local repo stored in .git/refs/remotes/origin/master.
This ref is tracking the remote master branch, but your local repository does not know the remote state until you perform a fetch or pull.
This also explains why you must sometimes do a merge with origin/master: the local and remote history can diverge and must be reconciled.
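Oh, and the following two approaches are equivalent.
# one step (assuming the current branch tracks origin/master)
$ git pull
# the same thing in two steps
$ git fetch
$ git merge origin/master
# delete a branch on the remote
$ git push --delete remote_name branch_name
# delete the local branch
$ git branch -d branch_name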
There's a bunch of stuff to know outside of regular usage. Fortunately, most of these things can be found in one of the many excellent resources online. The Pro Git book is usually a good starting point.
git log --oneline --abbrev-commit --all \
--graph --decorate --color
See this SO post for even more elaborate options (hint: create an alias...).
.gitignore
Adding a file to the ignore file will prevent it from being added to the index, but what if it's already there?
git rm --cached <file>
or
git rm -r --cached .
git add -A
I mentioned that git doesn't store diffs, but instead stores every new file (based on its contents) in its own unique blob.
But this is not the whole story.
Git does regularly compress its internal objects into packfiles, in which similar objects are stored as binary deltas. You can manually trigger this by calling git gc.
More info: https://git-scm.com/book/en/v2/Git-Internals-Packfiles
There's an extensive amount of syntax to refer to specific commits.
HEAD
HEAD^ # parent
HEAD^^ # parent's parent
HEAD^2 # second parent if commit has >1 parent
HEAD~n # n'th generation ancestor
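The difference between ^ and ~ is easiest to see on a graph with a merge commit. The following Python sketch resolves the expressions listed above against a toy graph. The single-letter commit ids and the resolve function are invented for illustration, and only the HEAD-based forms shown here are supported.

```python
import re
from typing import Dict, List

# Toy graph with one merge commit; ids are invented for illustration.
parents: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "M": ["B", "C"],  # merge commit: first parent B, second parent C
}


def resolve(expr: str, head: str) -> str:
    """Resolve expressions such as HEAD^, HEAD^^, HEAD^2 or HEAD~3."""
    commit = head
    suffix = expr[len("HEAD"):]
    for op, number in re.findall(r"([\^~])(\d*)", suffix):
        n = int(number) if number else 1
        if op == "^":
            if n:  # ^0 refers to the commit itself
                commit = parents[commit][n - 1]  # n-th parent
        else:
            for _ in range(n):  # n generations back along first parents
                commit = parents[commit][0]
    return commit


for expression in ["HEAD", "HEAD^", "HEAD^^", "HEAD^2", "HEAD~2"]:
    print(expression, "->", resolve(expression, head="M"))
```

In short, ^n picks between the parents of one commit, while ~n keeps following first parents for n steps.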
git push -u origin master
Or equivalently
git push origin master
git branch --set-upstream-to=origin/master master
Makes current branch track a remote one, i.e. allows pushing/pulling without specifying the remote ref name.
$ mkdir kakapo-repo && cd kakapo-repo/ && git init
Initialized empty Git repository in /media/pieter/DATA/Wetenschap/Doctoraat/biodm/reboot2018/kakapo-repo/.git/
$ No commits yet on master touch cultofthepartyparrot
$ No commits yet on master … 1 git add -A
$ No commits yet on master ● 1 git commit -m "Initial kakapo commit"
[master (root-commit) 0e7e785] Initial kakapo commit
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 cultofthepartyparrot
$ master echo 'https://www.youtube.com/watch?v=9T1vfsHYiKY' \
> secretsofthekakapo
$ master … 1 git add -A
$ master ● 1 git commit -m "Maybe this is better kept private..."
[master 920a49e] Maybe this is better kept private...
1 file changed, 1 insertion(+)
create mode 100644 secretsofthekakapo
$ master git hash-object secretsofthekakapo
29d7910d2fcf6bdb5dfbbbd1dc725d9a5648f7d7
$ master touch sirocco
$ master … 1 rm secretsofthekakapo
$ master ✚ 1 … 1 git add -A
$ master ● 2 git commit -m "More kakapos and removed compromising material"
[master 7b8164b] More kakapos and removed compromising material
2 files changed, 1 deletion(-)
delete mode 100644 secretsofthekakapo
create mode 100644 sirocco
$ master git checkout HEAD^
$ 920a49e ls
cultofthepartyparrot secretsofthekakapo
$ 920a49e git checkout master
Previous HEAD position was 920a49e Maybe this is better kept private...
Switched to branch 'master'
$ master git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch secretsofthekakapo' \
--prune-empty --tag-name-filter cat -- --all
Rewrite 920a49ef4d3e4558c42dd50e5700697b6e182900 (2/3) (0 seconds passed, remaining 0 predicted) rm 'secretsofthekakapo'
Rewrite 7b8164b0ede12d9486e5de0d3aea371185da1b0f (3/3) (0 seconds passed, remaining 0 predicted)
Ref 'refs/heads/master' was rewritten
$ master git push origin --force --all
This is one of the few occasions where you should use push --force.
More info: https://help.github.com/articles/removing-sensitive-data-from-a-repository/
https://stackoverflow.com/questions/36255221/what-is-the-difference-between-tree-filter-and-index-filter-in-the-git
$ master git cat-file -p 29d7
https://www.youtube.com/watch?v=9T1vfsHYiKY
$ master git for-each-ref --format='delete %(refname)' \
refs/original | git update-ref --stdin
$ master git reflog expire --expire=now --all
$ master git gc --prune=now
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (5/5), done.
Total 5 (delta 0), reused 0 (delta 0)
$ master git cat-file -p 29d7
fatal: Not a valid object name 29d7
More info about final clean up: https://help.github.com/articles/removing-sensitive-data-from-a-repository/
https://stackoverflow.com/questions/16584256/after-filter-branch-ing-to-remove-file-from-git-repo-file-remains-in-pack-file
This should set you up to make your git experience smooth sailing from here on out.
Have a look at this excellent interactive visualisation to reinforce some of the concepts we've gone over: https://onlywei.github.io/explain-git-with-d3/.
… .git/objects directory as you do so and using the git cat-file -p/t commands to inspect new objects.
… .git/index file after staging content? Think about it, then try it!
Does git commit --amend change the name (i.e. SHA-1 digest) of the commit that is updated? Think about it, then try it! Read up on the amend command here if you're not familiar with this command.
… git status and git checkout on a commit's hash directly (instead of on a ref). What will happen if you create a new commit here? Inspect the graph and try to find a route that can reach this new commit. Look up the git prune command.
For all of these activities, try to keep the DAG structure of git in the back of your mind.