git gud

or "How I Learned to Stop Worrying and Love the blob"

Created by Pieter Moris and contributors

All in all, git is pretty

amazing!


At least, that's what everyone keeps telling me.

Does this situation sound familiar?

xkcd-git
they-said

Keep calm

and

git gud

DISCLAIMER
Resources created by people more knowledgeable and witty than I

Anytime you experience an insightful "AHA!" moment, you should probably be praising these people rather than me. Ditto whenever you chuckle.

A brief recap of git

TL;DR

https://rogerdudler.github.io/git-guide/

A few more resources for the absolute beginner

Glossary of terms

Come back to this slide at the end of the presentation.

  • Repository: Archive with snapshots of working directory (commits). Defined by .git directory. Can be local or remote. Contains references to branches (heads), tags and remote branches.
  • Working tree or directory: the checked-out files you actually edit; the directory that contains the .git folder.
  • Index or staging area: a (conceptual) area in between your working directory and repository.
  • Commit: a snapshot of your working directory at a specific point in time.
  • Branch: a reference/name/pointer/alias for a specific commit.
  • master: the default name of the initial local branch (newer setups often use main instead).
  • origin: the default name for your main remote repository.
  • HEAD: a file pointing to the currently checked-out branch or commit.

Source: https://jwiegley.github.io/git-from-the-bottom-up/

git

=

distributed version control system

  • Snapshots: keep track of previous versions of files (no more FINAL_final_revised_project3.code files cluttering up your folders).
  • Distributed: mirror your project (and its history) on different machines.
    • There can be many remote repositories, GitHub is just one example.
    • Allows you to work offline.

The following commands are referred to as
git porcelain commands;
the ones the user normally interacts with.


Later on, we'll encounter the so-called
git plumbing commands;
low-level commands to manipulate and inspect objects.

they-said

Source: https://blog.osteele.com/2008/05/my-git-workflow/

You might have heard that git is represented by a

Directed Acyclic Graph (DAG)


But what are the nodes and edges of this graph?

And how does it track files?

Enter the .git directory

That file in

cf/bc74af89ccf0d0ddbe488b4a2df7318786759d
is what git refers to as a blob.


Blobs are created when you git add files. They're the most basic building block of git.


But both its name and contents look like nonsense, right?

two, three,

cha, cha, SHA-1?

Git is a content-addressable filesystem

Any kind of content you insert into the repository can be retrieved using a unique key or fingerprint

That key is a SHA-1 checksum of the contents of the file (and a short header).

You've probably encountered these before as the names of your commits.

Git objects

Blobs 🍮

So what did git add dog achieve?


  • New blob file in .git/objects/.
  • Uniquely named by hashing the file content (and a header).
  • The first two characters of the 40 character SHA-1 digest form a sub-directory.
  • Remainder forms the file name.
Note: the index or stage is also prepped in this step. More on that later.
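
As a rough sketch of the bullets above (not git's actual implementation), the naming scheme can be reproduced in a few lines of Python. The function name blob_object is made up for illustration, and the zlib step reflects the assumption that we are looking at a "loose" object rather than a packfile.

```python
import hashlib
import zlib

def blob_object(content: bytes):
    """Return (object id, path inside the repo, compressed bytes) for a blob."""
    # Header: the object type, the content size in bytes, then a NUL byte.
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex characters name the sub-directory, the other 38 the file.
    path = f".git/objects/{oid[:2]}/{oid[2:]}"
    # Loose objects are zlib-compressed, which is why their contents look like nonsense.
    return oid, path, zlib.compress(store)

oid, path, data = blob_object(b"woof\n")
print(oid)
print(path)
print(zlib.decompress(data))
```

The file name (dog) never enters the calculation, only the contents and their size do.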

Git provides us with a bunch of plumbing commands to inspect and even create blobs ourselves.


							$ tree .git/objects
							.git/objects/
							├── cf
							│   └── bc74af89ccf0d0ddbe488b4a2df7318786759d
							├── info
							└── pack

							# Compute the hash of the file in your directory
							$ git hash-object dog
							cfbc74af89ccf0d0ddbe488b4a2df7318786759d

							# ONLY file contents (and size) determine the hash
							$ echo "woof" | git hash-object --stdin
							cfbc74af89ccf0d0ddbe488b4a2df7318786759d

							$ printf "blob 5\000woof\n" | openssl sha1
							(stdin)= cfbc74af89ccf0d0ddbe488b4a2df7318786759d
						

directed acyclic graph structure

blob

These blob objects form the beginning of our graph.

Every text file with the contents "woof" that you'll ever create in your project (or any future projects) will point to the same blob, regardless of its filename.

This means that if you'd been following along, you will have ended up with the exact same blob.

This is part of what makes git's storage method so efficient. But more on that later.

Git objects

Trees 🌳

One of the biggest hurdles to wrapping your head around git is all of its jargon.


Whenever you see the term tree, just think directory.

Note: you might see some people refer to the git graph as a tree as well.

Tree objects record the directory structure.


							$ tree .git/objects
							.git/objects
							├── 35
							│   └── eaf2cfe26d5a30558c7aceaad5fadc72a09164
							├── 99
							│   └── 809ef5ef2a4458e883f53c0ce55fc9f7061844
							├── cf
							│   └── bc74af89ccf0d0ddbe488b4a2df7318786759d
							├── fa
							│   └── b9dd251f10ff00622bfd0f069e98b492d433c8
							├── info
							└── pack

							# view contents of tree object
							$ git cat-file -p 35eaf2c
							100644 blob cfbc74af89ccf0d0ddbe488b4a2df7318786759d	dog
						

Two new files showed up after committing: 35eaf2c and fab9dd2.
The first of these is a so-called tree object.

Tree objects record the directory structure.


							$ git cat-file -p 35eaf2c
							100644 blob cfbc74af89ccf0d0ddbe488b4a2df7318786759d	dog
						
  • Stored in .git/objects with SHA-1-derived name.
  • Contents: 1 line with associated mode, type, filename and SHA-1 hash per target node => this is enough to restore any file.
  • DAG-view: nodes that point to blobs or other trees (i.e. sub-directories).
  • Trees are created from the current state of the index (more on that later).
  • Trees are stored whenever a commit is made via git commit.
  • Trees made up of the same blobs/trees always hash to the same digest (type, size, contents).
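
To make the last bullet concrete, here is a minimal Python sketch of how a tree's name can be derived. It assumes a flat directory containing only regular files (git's real sorting rule treats sub-directories slightly differently), and tree_id is a hypothetical helper name. Under those assumptions it should reproduce the 35eaf2c tree shown above.

```python
import hashlib

def tree_id(entries):
    """entries: iterable of (mode, file name, hex object id) tuples."""
    body = b""
    # Entries are stored sorted by name, so insertion order does not matter.
    for mode, name, oid in sorted(entries, key=lambda entry: entry[1]):
        # Each entry is "<mode> <name>", a NUL byte, then the 20 raw bytes of the id.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    store = b"tree " + str(len(body)).encode() + b"\0" + body
    return hashlib.sha1(store).hexdigest()

DOG = "cfbc74af89ccf0d0ddbe488b4a2df7318786759d"
CAT = "99809ef5ef2a4458e883f53c0ce55fc9f7061844"

print(tree_id([("100644", "dog", DOG)]))
# Same entries in a different order give the same digest.
print(tree_id([("100644", "dog", DOG), ("100644", "cat", CAT)])
      == tree_id([("100644", "cat", CAT), ("100644", "dog", DOG)]))
```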

directed acyclic graph structure

Blobs record the content of files.

Trees record the directory structure by pointing to blobs.

graph-tree

Git objects

Commits 💾

We now have a bunch of tree and blob objects, but we haven't yet found a way to:


  • Record meta-data about the snapshots (who, what, when?).
  • Traverse these snapshots.


The final type of git object, the commit, takes care of this.


							$ tree .git/objects
							.git/objects
							├── 35
							│   └── eaf2cfe26d5a30558c7aceaad5fadc72a09164
							//
							├── fa
							│   └── b9dd251f10ff00622bfd0f069e98b492d433c8
							├── info
							└── pack

							# view contents of commit object
							$ git cat-file -p fab9dd2
							tree 35eaf2cfe26d5a30558c7aceaad5fadc72a09164
							author Pieter <13552343+pmoris@users.noreply.github.com> 1535631839 +0200
							committer Pieter <13552343+pmoris@users.noreply.github.com> 1535631839 +0200

							Initial commit
						
  • Created alongside the tree object via git commit.
  • DAG-view: a commit points to a specific tree (i.e. a directory state).
  • Contains author, committer, time stamp and commit message.
  • Name derived from SHA-1 digest of contents (and will thus always be unique).
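
A commit is just a small text document that gets hashed like any other object. The sketch below assumes the layout printed by git cat-file -p above; the author string is a placeholder and commit_id is a made-up helper, so the resulting digests will not match the ones on the slides.

```python
import hashlib

def commit_id(tree, author, when, message, parents=()):
    """Serialise a commit and return (object id, serialised body)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]      # absent for a root commit
    lines += [f"author {author} {when}",
              f"committer {author} {when}",
              "",                                   # blank line before the message
              message]
    body = ("\n".join(lines) + "\n").encode()
    store = b"commit " + str(len(body)).encode() + b"\0" + body
    return hashlib.sha1(store).hexdigest(), body

# Placeholder identity, purely for illustration.
WHO = "A U Thor <author@example.com>"
TREE = "35eaf2cfe26d5a30558c7aceaad5fadc72a09164"

first, body = commit_id(TREE, WHO, "1535631839 +0200", "Initial commit")
later, _ = commit_id(TREE, WHO, "1535631840 +0200", "Initial commit")
print(body.decode())
print(first != later)  # one second later already yields a different name
```

Because the time stamp and author are part of the hashed text, two commits of the same tree still get different names.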

directed acyclic graph structure

graph-tree

This gives us three "levels" of git objects:


  • Level 1: Blobs record the content of files. Each new file or change gets its own blob.
  • Level 2: Trees record a snapshot of the staged(!) directory structure; a collection of folders and file revisions, by pointing to blobs.
  • Level 3: Commits point to trees and provide meta-data.
Do you recall that cat blob we created manually?

It's possible, using only git plumbing commands, to create a tree object that references it.

It's also possible to manually create a commit object that points to this tree and to notify our repo that this commit exists in our branch, in order to make it reachable.


You can refer to this and this guide for a full overview.

1. Add the file to the staging area (or index).


							# add file to staging area
							# - normally handled by `git add`
							$  master 1  git update-index --add \
							  --cacheinfo 100644 \
							  99809ef5ef2a4458e883f53c0ce55fc9f7061844 cat

							# the index references all files and directories that
							# will be recorded by the tree
							$  master 1  git ls-files --stage
							100644 99809ef5ef2a4458e883f53c0ce55fc9f7061844 0	cat
							100644 cfbc74af89ccf0d0ddbe488b4a2df7318786759d 0	dog
						

2. Write the index contents to a tree object.


							# create tree object from index
							# - normally handled by `git commit`
							$  master ● 1 ✚ 1  git write-tree
							a8fd4f7e27f7b943a6b1ae5e84430d56d234526c

							# view its contents
							$  master ● 1 ✚ 1  git cat-file -p a8fd
							100644 blob 99809ef5ef2a4458e883f53c0ce55fc9f7061844	cat
							100644 blob cfbc74af89ccf0d0ddbe488b4a2df7318786759d	dog
						

3. Create a new commit object and chain it to the previous commit (parent).


							# create commit object - normally handled by `git commit`
							$  master ● 1 ✚ 1  echo "2nd commit manual" | \
							  git commit-tree a8fd4f7 -p fab9dd2
							0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1

							# view its contents
							$  master ● 1 ✚ 1  git cat-file -p \
							  0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
							tree a8fd4f7e27f7b943a6b1ae5e84430d56d234526c
							parent fab9dd251f10ff00622bfd0f069e98b492d433c8
							author Pieter <13552343+pmoris@users.noreply.github.com> 1535829462 +0200
							committer Pieter <13552343+pmoris@users.noreply.github.com> 1535829462 +0200

							2nd commit manual
						

This second commit has an extra "parent" line that points to the previous commit object!

We're now ready to see how git keeps track of a project's history.

directed acyclic graph structure

graph-manual-noref

By chaining commits, each one pointing to its parent(s), a history is formed.

  • Level 1: Blobs record content of files. Each new file or change gets its own blob. Identical blobs are re-used by trees.
  • Level 2: Trees record a snapshot of the directory structure; a collection of folders and file revisions, by pointing to blobs.
  • Level 3: Commits point to trees, provide meta-data and form a history or timeline by pointing to parent commits.


New commits always point to their parents (i.e. backwards in time), never the other way around!
(cf. directed acyclic graph)
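
Since the arrows only point backwards, what you can see of the history depends entirely on where you start walking. A toy model in Python, using the abbreviated commit ids from the slides (the PARENTS table and the history function are illustrative, not anything git exposes):

```python
# Toy commit table: each commit id maps to the list of its parents.
PARENTS = {
    "fab9dd2": [],            # Initial commit (root, no parent)
    "0ed44be": ["fab9dd2"],   # 2nd commit manual
}

def history(start):
    """Return every commit reachable from `start` by following parent pointers."""
    reachable, todo = [], [start]
    while todo:
        commit = todo.pop()
        if commit not in reachable:
            reachable.append(commit)
            todo.extend(PARENTS[commit])
    return reachable

print(history("0ed44be"))  # ['0ed44be', 'fab9dd2']
print(history("fab9dd2"))  # ['fab9dd2']
```

Starting from the older commit, the newer one is invisible: nothing points forward in time.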

At this point, we can recall the history by invoking the log command directly on the new commit.


							$   master ● 1 ✚ 1  git log --stat 0ed44be
							commit 0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
							Author: Pieter <13552343+pmoris@users.noreply.github.com>
							Date:   Sat Sep 1 21:17:42 2018 +0200

								2nd commit manual

								cat | 1 +
								1 file changed, 1 insertion(+)

							commit fab9dd251f10ff00622bfd0f069e98b492d433c8 (HEAD -> master)
							Author: Pieter <13552343+pmoris@users.noreply.github.com>
							Date:   Sat Sep 1 21:14:04 2018 +0200

								Initial commit

								dog | 1 +
								1 file changed, 1 insertion(+)
						

So far so good, we've (almost) successfully reproduced the porcelain command git commit.


But why wouldn't the regular git log command work at this point? And where do branches come into play?

References and branches ⎇

So how does git keep track of what commit we're currently on?

The answer is branches!

Their nature and behaviour will also become clearer when we look at them from a DAG perspective.

Branches are nothing more than references or pointers or aliases to specific commit objects.


They provide names for certain points in the graph and allow us to easily access them, instead of having to remember the hash digests of specific commits.

Branches reside in...


							$  master ● 1 ✚ 1  ls .git/refs/heads/
							master
						

And are simply plain text files pointing to a commit.


							$  master ● 1 ✚ 1 cat .git/refs/heads/master
							fab9dd251f10ff00622bfd0f069e98b492d433c8
						

directed acyclic graph structure

graph-master

The master branch is pointing to the initial commit that we created using the regular porcelain command git commit.

We can update the master reference to point to the new commit.


							# change ref file for master branch to a new commit object
							$  master ● 1 ✚ 1  git update-ref refs/heads/master 0ed44be

							$  master ✚ 1  cat .git/refs/heads/master
							0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1

							$  master ✚ 1  git log
							commit 0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1 (HEAD -> master)
							Author: Pieter <13552343+pmoris@users.noreply.github.com>
							Date:   Sat Sep 1 21:17:42 2018 +0200

								2nd commit manual

							commit fab9dd251f10ff00622bfd0f069e98b492d433c8
							Author: Pieter <13552343+pmoris@users.noreply.github.com>
							Date:   Sat Sep 1 21:14:04 2018 +0200

								Initial commit
						

Normally, this moving of the branch reference happens behind the scenes when we use the porcelain command git commit.

directed acyclic graph structure

graph-manual-updated

Let's create a new branch now to see how it affects the graph.


							# Retrieve the cat object from our repository
							# and place it in the working directory
							# Why? the file never existed except as a blob
							$  master ✚ 1  git cat-file -p 99809ef5 > cat

							# Create new branch
							$  master  git checkout -b "no-cats-allowed"
							Switched to a new branch 'no-cats-allowed'

							# Check references
							$  no-cats-allowed ✚ 1  cat .git/refs/heads/no-cats-allowed
							0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1

							# Check current commit
							$  no-cats-allowed ✚ 1  git rev-parse HEAD
							0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
							# or just use git log
						

Creating a new branch merely adds a reference to the same commit that we were on before. It is literally just a text file.

directed acyclic graph structure

graph-branch

Let's diverge the branch


							$  no-cats-allowed  git rm cat
							rm 'cat'

							$  no-cats-allowed ● 1  mkdir doge

							$  no-cats-allowed ● 1  echo "bark" > doge/much

							$  no-cats-allowed ● 1 … 1  echo "floof" > doge/very

							$  no-cats-allowed ● 1 … 1  git add doge/

							$  no-cats-allowed ● 3  git status
							On branch no-cats-allowed
							Changes to be committed:
								(use "git reset HEAD <file>..." to unstage)

								deleted:    cat
								new file:   doge/much
								new file:   doge/very
						

New commit


								$  no-cats-allowed ● 3  git commit -m "doggies"
								[no-cats-allowed 40da2ef] doggies
									3 files changed, 2 insertions(+), 1 deletion(-)
									delete mode 100644 cat
									create mode 100644 doge/much
									create mode 100644 doge/very

								$  no-cats-allowed  git rev-parse HEAD
								40da2ef175b6693c785251467946f1cdaf5e6552

								$  no-cats-allowed  git cat-file -p HEAD
								tree 60c51895339b14261e18cb4555b4a43d1cfdc397
								parent 0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
								author Pieter <13552343+pmoris@users.noreply.github.com> 1535872422 +0200
								committer Pieter <13552343+pmoris@users.noreply.github.com> 1535872422 +0200

								doggies
							

directed acyclic graph structure

graph-branch2

Don't lose your HEAD

How does git know which snapshot to show us in the working directory and which branch to operate on when we use porcelain commands like git commit?


That's where HEAD comes into play.


							# HEAD points to a ref or branch name
							$  no-cats-allowed  cat .git/HEAD
							ref: refs/heads/no-cats-allowed

							# the branch references a commit
							$  no-cats-allowed  cat .git/refs/heads/no-cats-allowed
							40da2ef175b6693c785251467946f1cdaf5e6552

							# shorthand for commit id
							$  no-cats-allowed  git rev-parse HEAD
							40da2ef175b6693c785251467946f1cdaf5e6552

							# shorthand for commit contents
							$  no-cats-allowed  git cat-file -p HEAD
							tree 60c51895339b14261e18cb4555b4a43d1cfdc397
							parent 0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
							author Pieter <13552343+pmoris@users.noreply.github.com> 1535872422 +0200
							committer Pieter <13552343+pmoris@users.noreply.github.com> 1535872422 +0200

							doggies
						

This is the final aspect of what happens behind the scenes when we use the porcelain command git commit.

HEAD...

  • Is a flat text file in the .git directory.
  • Points to the current branch (ref), i.e. it's a label for a specific commit.
  • Is updated whenever we commit or check out a branch (or commit).

directed acyclic graph structure

graph-branch-head

Detached HEAD?!1

A (rather odd) way of saying that HEAD points directly to a commit instead of to a branch.

graph-det-head

What's the problem of having a detached HEAD?

When you check out a commit instead of a branch, HEAD will point directly to this commit, instead of to a ref/branch.

This means that any new commits you create here won't be reachable from a branch once you switch away (directed arrows!). Unless you remember their hashes (or dig them up in the reflog), git's garbage collection will remove them at some point.
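
The two states of HEAD can be told apart by simply reading the file. The following Python sketch builds a throw-away directory that mimics the relevant part of .git and resolves HEAD in both situations. resolve_head is a hypothetical helper, and it assumes the branch exists as a plain file under refs/heads (real repositories may also keep refs in a packed-refs file).

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

def resolve_head(git_dir: Path) -> Tuple[str, Optional[str]]:
    """Return (commit id, branch name), with branch None when HEAD is detached."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                      # e.g. refs/heads/master
        commit = (git_dir / ref).read_text().strip()   # the branch file holds the id
        return commit, ref[len("refs/heads/"):]
    return head, None                                  # HEAD holds a bare commit id

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "no-cats-allowed").write_text(
        "40da2ef175b6693c785251467946f1cdaf5e6552\n")

    # Attached: HEAD names a branch, the branch names a commit.
    (git_dir / "HEAD").write_text("ref: refs/heads/no-cats-allowed\n")
    attached = resolve_head(git_dir)

    # Detached: HEAD names a commit directly.
    (git_dir / "HEAD").write_text("0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1\n")
    detached = resolve_head(git_dir)

print(attached)
print(detached)
```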

graph-det-head

Index or staging area

We briefly mentioned this during the manual commit section, but for the sake of completeness...


The index is the staging area in between the working directory and a commit.

staging-area

For a good write-up, see: https://stackoverflow.com/questions/25351450/what-does-adding-to-the-index-really-mean-in-git

Conceptually, it's an ever-changing tree object, stored as a single binary file: .git/index.


Whenever you use git commit, the tree object that is created will be based on the contents of the index.


It can be viewed via (see previous slides)


							$  master 1  git ls-files --stage
								100644 99809ef5ef2a4458e883f53c0ce55fc9f7061844 0	cat
								100644 cfbc74af89ccf0d0ddbe488b4a2df7318786759d 0	dog
						

Why do we need staging/an index?


  • Easy way to try out and reset changes prior to committing fully.
  • Allows splitting changes over multiple commits.
  • Simplifies merging.

A little something about graph properties

Advantages of a DAG

  • Only 1 copy of each file (better yet: its contents) is stored regardless of the number of snapshots. Blobs are re-used.
  • History can be rewound from any point in the graph.
  • Refs provide entrypoints through meaningful names (i.e. bookmarks).

Git is also a Merkle Graph

The SHA-1 hash names used for git objects provide an elegant way to verify the integrity of all the data in the repository.

The hash of any one object depends on the hashes (and thus the contents) of all the objects it points to, directly or indirectly: a commit's name depends on its tree and its parents, a tree's name on its blobs and sub-trees. A similar technique is used by Bitcoin and BitTorrent. (But don't call git a blockchain unless you want to upset some people.)

This is also the reason why re-writing history can be dangerous in git. Everything downstream will be affected. (Although this generally only becomes a problem when working with collaborators.)
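
A stripped-down illustration of this ripple effect in Python. It hashes a single file into a blob, a one-entry tree and a commit without author information (a deliberate simplification, so the commit ids are not real ones); object_id and snapshot are made-up names.

```python
import hashlib

def object_id(kind: str, body: bytes) -> str:
    """Hash "<kind> <size>\\0<body>", the scheme shared by all git object types."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def snapshot(dog_content: bytes):
    blob = object_id("blob", dog_content)
    # The tree embeds the blob's id...
    tree = object_id("tree", b"100644 dog\0" + bytes.fromhex(blob))
    # ...and the (simplified) commit embeds the tree's id.
    commit = object_id("commit", f"tree {tree}\n\nsnapshot\n".encode())
    return blob, tree, commit

before = snapshot(b"woof\n")
after = snapshot(b"bark\n")
for kind, old, new in zip(("blob", "tree", "commit"), before, after):
    print(kind, old[:7], "->", new[:7])
```

Changing one file changes its blob id, which changes the tree, which changes the commit, and so on for every descendant commit.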

Merging branches

When merging branches, each commit is interpreted as a set of changes.

Each merge tries to consolidate (at least) three different snapshots: the two you provide in the command and their most recent common ancestor.

The following command finds the common ancestor commit of two branches (better: of the commits at their tips).


							$ git merge-base master no-cats-allowed
							0ed44be76b3ae96cb3fa7e3501f8ba56e488f7f1
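
Conceptually, finding the merge base boils down to intersecting two sets of ancestors. A small Python sketch on the three commits from this presentation (abbreviated ids; the tables and function names are illustrative and say nothing about how git implements this efficiently):

```python
PARENTS = {
    "fab9dd2": [],
    "0ed44be": ["fab9dd2"],
    "40da2ef": ["0ed44be"],
}
BRANCHES = {"master": "0ed44be", "no-cats-allowed": "40da2ef"}

def ancestors(commit):
    """All commits reachable from `commit`, including the commit itself."""
    found = {commit}
    for parent in PARENTS[commit]:
        found |= ancestors(parent)
    return found

def merge_bases(branch_a, branch_b):
    common = ancestors(BRANCHES[branch_a]) & ancestors(BRANCHES[branch_b])
    # Keep only the most recent ones: common ancestors that are not
    # themselves an ancestor of another common ancestor.
    return sorted(c for c in common
                  if not any(c != other and c in ancestors(other) for other in common))

print(merge_bases("master", "no-cats-allowed"))  # ['0ed44be']
```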
						

One of the following scenarios can occur:

  • Receiver is an ancestor of giver: fast-forward (the receiving branch simply moves forward along the sequence of revisions to the giver's commit).
  • Two different lineages are merged: start from common ancestor and play out all changes in sequence. The new commit will have two parents and potential conflicts must be resolved.
    • File content base = giver != receiver: modification.
    • File content base = receiver != giver: modification.
    • File content base != receiver != giver: conflict.
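
The per-file rules above fit in a handful of lines. This Python sketch compares whole file contents only (real merges work on smaller hunks within a file), and merge_file is a hypothetical name:

```python
def merge_file(base, receiver, giver):
    """Decide the merged content of one file; returns (content, verdict)."""
    if receiver == giver:
        return receiver, "identical on both sides"
    if base == giver:
        return receiver, "only the receiver changed it"
    if base == receiver:
        return giver, "only the giver changed it"
    return None, "conflict"   # both sides changed it differently

print(merge_file("woof", "woof", "bark"))    # ('bark', 'only the giver changed it')
print(merge_file("woof", "meow", "woof"))    # ('meow', 'only the receiver changed it')
print(merge_file("woof", "meow", "bark"))    # (None, 'conflict')
```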

Source: https://codewords.recurse.com/issues/two/git-from-the-inside-out

A few more useful tidbits

Reset vs checkout

  • Reset moves the thing that HEAD is pointing to, i.e. the branch.
  • Checkout moves HEAD itself (and can lead to a detached HEAD).
reset-checkout

Source: https://stackoverflow.com/a/3639387

More info: https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified

Reset does different things depending on the mode you pass:


							git reset [<mode>] [<commit>]
						
  • --soft: leaves index and working dir untouched.
  • --mixed (default): resets index and leaves working dir untouched.
  • --hard: resets index and working dir. One of the few ways to lose progress in git!
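
One way to remember the three modes is to picture three slots (branch tip, index, working directory) and ask how many of them a reset overwrites. The Python model below is a deliberately crude sketch: each slot holds just a snapshot label, and the Repo class and reset function are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Repo:
    branch_tip: str   # commit the current branch points to
    index: str        # snapshot currently staged
    worktree: str     # snapshot currently on disk

def reset(repo: Repo, target: str, mode: str = "mixed") -> Repo:
    repo.branch_tip = target              # every mode moves the branch
    if mode in ("mixed", "hard"):
        repo.index = target               # --mixed and --hard also reset the index
    if mode == "hard":
        repo.worktree = target            # only --hard touches the files on disk
    return repo

for mode in ("soft", "mixed", "hard"):
    print(mode, reset(Repo("new", "new", "new"), "old", mode))
```

With --hard nothing still holds the "new" snapshot afterwards, which is why uncommitted work can be lost.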

Local vs remote branches

A common point of confusion is the relation between local and remote branches.

As we've seen, branches are just references to specific commits.

Ergo, remote branches are pointers to specific commits in the remote repository.

You can't move these references yourself.

More info: https://git-scm.com/book/en/v2/Git-Branching-Remote-Branches

Your branch is up-to-date with 'origin/master'...

...except it's not?

This means that you're up-to-date with the ref called origin/master, which is a local reference on your local repo stored in .git/refs/remotes/origin/master.

This ref is tracking the remote master branch, but your local repository does not know the remote state until you perform a fetch or pull.

This also explains why you must sometimes do a merge with origin/master:
the local and remote history can diverge and must be reconciled.

remote-branches

Oh, and the following two steps are roughly equivalent (assuming your current branch tracks origin/master).


							$ git pull
						

							$ git fetch
							$ git merge origin/master
						

Removing local and remote branches


							$ git push --delete remote_name branch_name
							$ git branch -d branch_name
						

Initial setup, managing SSH keys, creating your own remote repositories, etc.


There's a bunch of stuff to know outside of regular usage. Fortunately, most of these things can be found in one of the many excellent resources online. The Pro Git book is usually a good starting point.

Pretty git graphs in the CLI



							git log --oneline --abbrev-commit --all \
							        --graph --decorate --color

See this SO post for even more elaborate options (hint: create an alias...).

Removing tracked files after updating .gitignore


Adding a file to .gitignore will prevent it from being added to the index, but what if it's already tracked?


							git rm --cached <file>
						

or


							git rm -r --cached .
							git add -A
						

Packfiles, garbage collection and deltas


I mentioned that git doesn't store diffs, but that every new file (based on its contents) is stored in its own blob.

But this is not the whole story.

Git regularly bundles its internal objects into compressed packfiles, in which similar objects are stored as binary deltas. You can manually trigger this by calling git gc.

More info: https://git-scm.com/book/en/v2/Git-Internals-Packfiles

Revision names and syntax

Tree-ish and commit-ish

There's an extensive amount of syntax to refer to specific commits.


							HEAD
							HEAD^ # parent
							HEAD^^ # parent's parent
							HEAD^2 # second parent if commit has >1 parent
							HEAD~n # n'th generation ancestor
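
The difference between ^ and ~ only shows up around merge commits. Below is a small Python resolver for a subset of this syntax on a made-up four-commit graph (A, B, C and a merge M are hypothetical names). It is a sketch of the rules above, not of git's own parser, and it ignores everything else the revision syntax offers.

```python
import re

# Hypothetical graph: M is a merge commit with first parent B and second parent C.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"]}
REFS = {"HEAD": "M"}

def resolve(rev: str) -> str:
    name = re.match(r"[^~^]+", rev).group(0)
    commit = REFS.get(name, name)
    for op, digits in re.findall(r"([~^])(\d*)", rev[len(name):]):
        n = int(digits) if digits else 1
        if op == "^":
            if n > 0:                            # ^0 is the commit itself
                commit = PARENTS[commit][n - 1]  # pick the n-th parent
        else:
            for _ in range(n):                   # ~n: follow first parents n times
                commit = PARENTS[commit][0]
    return commit

for rev in ("HEAD", "HEAD^", "HEAD^2", "HEAD^^", "HEAD~2", "HEAD^2~1"):
    print(rev, "->", resolve(rev))
```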
						

Setting upstream branches


							git push -u origin test
						

Or equivalently


							git push origin test
							git branch --set-upstream-to=origin/test test
						

Makes current branch track a remote one, i.e. allows pushing/pulling without specifying the remote ref name.

Further aspects to explore:


  • Stashing: temporarily storing the changes in your working tree without needing to commit them. Allows you to move around the revision history.
  • Reverting: adding a new commit that "undoes" a previous one. Better than re-writing history in many cases (e.g. collaborators).

So you accidentally committed sensitive data to your repo...

kermit

Sensitive data in second commit


							$ mkdir kakapo-repo && cd kakapo-repo/ && git init
							Initialized empty Git repository in /media/pieter/DATA/Wetenschap/Doctoraat/biodm/reboot2018/kakapo-repo/.git/
							$  No commits yet on master  touch cultofthepartyparrot
							$  No commits yet on master … 1  git add -A
							$  No commits yet on master ● 1  git commit -m "Initial kakapo commit"
							[master (root-commit) 0e7e785] Initial kakapo commit
								1 file changed, 0 insertions(+), 0 deletions(-)
								create mode 100644 cultofthepartyparrot
							$  master  echo 'https://www.youtube.com/watch?v=9T1vfsHYiKY' \
									> secretsofthekakapo
							$  master … 1  git add -A
							$  master ● 1  git commit -m "Maybe this is better kept private..."
							[master 920a49e] Maybe this is better kept private...
								1 file changed, 1 insertion(+)
								create mode 100644 secretsofthekakapo
							$  master  git hash-object secretsofthekakapo
							29d7910d2fcf6bdb5dfbbbd1dc725d9a5648f7d7
						

Removing it from subsequent commits doesn't achieve anything


							$  master  touch sirocco
							$  master … 1  rm secretsofthekakapo
							$  master ✚ 1 … 1  git add -A
							$  master ● 2  git commit -m "More kakapos and removed compromising material"
							[master 7b8164b] More kakapos and removed compromising material
								2 files changed, 1 deletion(-)
								delete mode 100644 secretsofthekakapo
								create mode 100644 sirocco
							$  master  git checkout HEAD^
							$  920a49e  ls
							cultofthepartyparrot  secretsofthekakapo
						

Use the filter-branch command


							$  920a49e  git checkout master
							Previous HEAD position was 920a49e Maybe this is better kept private...
							Switched to branch 'master'
							$  master  git filter-branch --force --index-filter \
							'git rm --cached --ignore-unmatch secretsofthekakapo' \
							--prune-empty --tag-name-filter cat -- --all
							Rewrite 920a49ef4d3e4558c42dd50e5700697b6e182900 (2/3) (0 seconds passed, remaining 0 predicted)    rm 'secretsofthekakapo'
							Rewrite 7b8164b0ede12d9486e5de0d3aea371185da1b0f (3/3) (0 seconds passed, remaining 0 predicted)
							Ref 'refs/heads/master' was rewritten
							$  master  git push origin --force --all
						

This is one of the few occasions where you should use push --force.

More info: https://help.github.com/articles/removing-sensitive-data-from-a-repository/
https://stackoverflow.com/questions/36255221/what-is-the-difference-between-tree-filter-and-index-filter-in-the-git

Clean up leftovers


							$  master  git cat-file -p  29d7
							https://www.youtube.com/watch?v=9T1vfsHYiKY
							$  master  git for-each-ref --format='delete %(refname)' \
								refs/original | git update-ref --stdin
							$  master  git reflog expire --expire=now --all
							$  master  git gc --prune=now
							Counting objects: 5, done.
							Delta compression using up to 8 threads.
							Compressing objects: 100% (3/3), done.
							Writing objects: 100% (5/5), done.
							Total 5 (delta 0), reused 0 (delta 0)
							$  master  git cat-file -p  29d7
							fatal: Not a valid object name 29d7

						

More info about final clean up: https://help.github.com/articles/removing-sensitive-data-from-a-repository/
https://stackoverflow.com/questions/16584256/after-filter-branch-ing-to-remove-file-from-git-repo-file-remains-in-pack-file

That's all folks!

This should set you up for a smooth-sailing git experience from here on out.

cruising

Challenges

Have a look at this excellent interactive visualisation to reinforce some of the concepts we've gone over: https://onlywei.github.io/explain-git-with-d3/.

Challenges

  1. Create a simple "Hello World" repository with a few commits and/or branches, keeping an eye on the .git/objects directory as you do so and using the git cat-file -p/t commands to inspect new objects.
  2. What would happen if you delete the .git/index file after staging content? Think about it, then try it!
  3. Does a git commit --amend change the name (i.e. SHA-1 digest) of the commit that is updated? Think about it, then try it! Read up on the amend command here if you're not familiar with this command.
  4. Create a copy of your repository and append a single character to 1 of the objects in .git/objects. Use git status and git fsck to see how git deals with the damage you've caused.
  5. Use git checkout on a commit's hash directly (instead of on a ref). What will happen if you create a new commit here? Inspect the graph and try to find a route that can reach this new commit. Look up the git prune command.
  6. Suppose that, after committing a few times, we modify an existing file, add it to the index and then modify it again. What will happen when we try to checkout a different branch? Think about how the working tree, index and commit you're trying to checkout differ. You can look up the answer in "Git from the inside out".

Challenges

  1. Explore the git game: https://www.git-game.com/.
  2. Or try out this interactive guide to several (basic and advanced) topics: https://learngitbranching.js.org/.


For all of these activities, try to keep the DAG structure of git in the back of your mind.