Last modified: September 13, 2025
Git stores your project as a graph of immutable objects. At the leaves are blobs: raw file contents with no filenames attached. Trees sit above blobs and act like directories; a tree is just a list that maps a filename and a mode to either another tree (subfolder) or a blob (file). Commits point to exactly one top-level tree (the snapshot of the entire project at that moment), plus zero or more parents (none for the first commit, one for an ordinary commit, two or more for a merge) and some who/when/why metadata. Refs like main or v1.2.0 are just human-friendly names that point at a commit’s ID. HEAD is a tiny text file telling Git which ref you have checked out. The key idea is that object IDs are hashes of content, so if two files have identical bytes, they share the same blob; if a folder hasn’t changed, its tree is reused; if nothing changed, the new commit points at the same tree as before. That’s how Git is both space-efficient and fast: reuse everywhere, append-only history, and lookups by hash.
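To make "IDs are hashes of content" concrete, here is a minimal Python sketch of how an object ID is derived in a repository that uses SHA-1 (the long-standing default; repositories created with SHA-256 produce different IDs). The function name git_object_id is made up for this illustration. Git hashes a small header (the type, a space, the payload size, a NUL byte) followed by the raw bytes.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Hash an object the way Git does with SHA-1: header, NUL byte, payload."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


# `echo "hello world"` writes these 12 bytes, trailing newline included.
hello_id = git_object_id("blob", b"hello world\n")
copy_id = git_object_id("blob", b"hello world\n")    # same bytes, different filename
other_id = git_object_id("blob", b"hello world!\n")  # one extra character

print(hello_id)              # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(hello_id == copy_id)   # True
print(hello_id == other_id)  # False
```

The filename never enters the hash, which is why identical files collapse into one blob no matter where they live in the project.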
Inside .git/, everything supports that model. You’ll see objects/ holding the database (loose files at first, later packed into .pack files), refs/ holding branch and tag pointers, HEAD pointing to the current branch, an index file acting as a staging area, and logs that remember where refs used to point (your safety net). Git never edits a blob or tree in place; it writes a brand-new object, then flips a pointer. That’s why operations feel atomic and why you can always walk backward. Think of it like Lego: blobs are bricks, trees are plates arranging bricks by name, and commits are photos of the model with notes about who built it. Git keeps every photo and never scribbles on an old one.
Objects (content-addressed)

           ┌────────────┐
  file   → │    blob    │  (bytes only)
           └─────┬──────┘
                 │ referenced by name+mode
           ┌─────▼──────┐
  folder → │    tree    │  (entries: mode, type, hash, name)
           └─────┬──────┘
                 │ root tree of snapshot
           ┌─────▼──────┐
  history  │   commit   │  (tree, parents, author, message)
           └────────────┘

names:
  HEAD → refs/heads/main → <commit>

storage:
  .git/
    HEAD
    index
    objects/
      12/34abcd...                (loose objects)
      pack/pack-xxxx.{pack,idx}
    refs/
      heads/main
      tags/v1.2.0
    logs/refs/heads/main          (reflog)
# start clean
mkdir toy && cd toy
git init
# output:
# Initialized empty Git repository in .../toy/.git/
Create a file and inspect what gets stored.
echo "hello world" > hello.txt
git status
# output:
# Untracked files:
# hello.txt
Stage it; staging writes a blob object and records its hash in the index.
git add hello.txt
# See what’s staged and the blob ID behind it
git ls-files -s
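# output (mode | blob-hash | stage | path):
# 100644 <blob-hash> 0 hello.txt
# (identical content always yields the same hash; with SHA-1 and a Unix shell
#  this file gives 3b18e512dba79e4c8300dd08aeb37f8e728b8dad)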
Peek at the blob (raw file content, no filename inside):
git cat-file -t <blob-hash>
# output:
# blob
git cat-file -p <blob-hash>
# output:
# hello world
Write the first tree from the index and look inside it.
git write-tree
# output:
# <tree-hash>
git cat-file -p <tree-hash>
# output (mode type hash name):
# 100644 blob <blob-hash> hello.txt
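The listing printed by cat-file is a decoded view; on disk a tree body is binary. As a rough sketch (assuming SHA-1, so every entry carries a 20-byte raw ID), each entry is the mode, a space, the name, a NUL byte, and then the raw hash. The helper name parse_tree below is invented for this example.

```python
import hashlib


def parse_tree(payload: bytes):
    """Decode a raw tree body into (mode, type, hex_id, name) tuples."""
    entries = []
    pos = 0
    while pos < len(payload):
        space = payload.index(b" ", pos)
        nul = payload.index(b"\x00", space)
        mode = payload[pos:space].decode("ascii")
        name = payload[space + 1:nul].decode("utf-8")
        hex_id = payload[nul + 1:nul + 21].hex()  # 20 raw SHA-1 bytes
        # Subtrees are stored with mode 40000, submodule links with 160000.
        kind = "tree" if mode == "40000" else "commit" if mode == "160000" else "blob"
        entries.append((mode.zfill(6), kind, hex_id, name))
        pos = nul + 21
    return entries


# Hand-built body with one entry: "<mode> <name>\0" followed by the raw blob ID.
blob_id = hashlib.sha1(b"blob 12\x00hello world\n").hexdigest()
payload = b"100644 hello.txt\x00" + bytes.fromhex(blob_id)

for mode, kind, hex_id, name in parse_tree(payload):
    print(mode, kind, hex_id, name)
# 100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad hello.txt
```

Note that the type column is not stored at all; it is inferred from the mode when the tree is printed.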
Make a commit that points at that tree.
git commit-tree <tree-hash> -m "initial snapshot"
# output:
# <commit-hash>
# Move the branch name to that commit
git update-ref refs/heads/main <commit-hash>
# Point HEAD at that branch (safer than writing .git/HEAD by hand)
git symbolic-ref HEAD refs/heads/main
Read the commit object:
git cat-file -p <commit-hash>
# output (example):
# tree <tree-hash>
# author You <you@example.com> 1699999999 +0000
# committer You <you@example.com> 1699999999 +0000
#
# initial snapshot
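A commit body is plain text, so it is easy to assemble by hand. The sketch below is an approximation under stated assumptions: SHA-1 IDs, no signature or extra headers, and made-up helper names (commit_payload, object_id). It shows that a root commit simply has no parent line, and that any change to the tree, parents, timestamps, or message yields a different commit ID.

```python
import hashlib


def commit_payload(tree, parents, author, committer, message):
    """Assemble the text body of a commit object (no signature, no extra headers)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit, 2+ for a merge
    lines += [f"author {author}", f"committer {committer}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


def object_id(kind, payload):
    header = f"{kind} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


who = "You <you@example.com> 1699999999 +0000"
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries

root = commit_payload(empty_tree, [], who, who, "initial snapshot")
root_id = object_id("commit", root)
child = commit_payload(empty_tree, [root_id], who, who, "add bye")

print(root.decode())
print(object_id("commit", child) != root_id)  # True: different bytes, different ID
```

Because the parent ID is part of the hashed bytes, rewriting an old commit changes the ID of every commit that descends from it.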
Now use the regular porcelain to keep going.
# Add another file and commit it the normal way
echo "bye" > bye.txt
git add bye.txt
git commit -m "add bye"
# output:
# [main 9f3a1c2] add bye
# 1 file changed, 1 insertion(+)
# create mode 100644 bye.txt
See the new snapshot’s tree and all files’ blob hashes.
git ls-tree -r HEAD
# output:
# 100644 blob <hash1> bye.txt
# 100644 blob <hash2> hello.txt
git cat-file -p HEAD^{tree}
# output:
# 100644 blob <hash1> bye.txt
# 100644 blob <hash2> hello.txt
Show how commits stitch together.
git log --oneline --graph --decorate
# output:
# * 9f3a1c2 (HEAD -> main) add bye
# * a1b2c3d initial snapshot
Peek at pointers on disk.
cat .git/HEAD
# output:
# ref: refs/heads/main
cat .git/refs/heads/main
# output:
# 9f3a1c2... (the commit ID for main)
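Those two cat commands are essentially the whole lookup. As a hedged sketch, the Python below follows the same chain against a throwaway directory shaped like .git/. The function resolve_ref is hypothetical and deliberately simplified: real Git also consults a packed-refs file, which this ignores.

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str = "HEAD", max_depth: int = 5) -> str:
    """Follow 'ref: ...' indirections until a file holding an object ID is found."""
    for _ in range(max_depth):
        with open(os.path.join(git_dir, name), encoding="ascii") as fh:
            content = fh.read().strip()
        if not content.startswith("ref: "):
            return content             # a branch file or a detached HEAD: just the ID
        name = content[len("ref: "):]  # symbolic ref: keep following
    raise ValueError("too many levels of symbolic refs")


# Build a throwaway directory shaped like .git/ so the sketch runs anywhere.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "HEAD"), "w") as fh:
    fh.write("ref: refs/heads/main\n")
with open(os.path.join(git_dir, "refs", "heads", "main"), "w") as fh:
    fh.write("9f3a1c2" + "0" * 33 + "\n")  # placeholder ID padded to 40 hex digits

print(resolve_ref(git_dir))
```

Moving a branch amounts to rewriting the one line in refs/heads/main; no object is touched.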
A picture of what you just built:
refs/heads/main ──► (commit 9f3a1c2)
                      │
                      ├─ tree T2 (root)
                      │    ├─ 100644 blob <hash1> bye.txt
                      │    └─ 100644 blob <hash2> hello.txt
                      │
                      └─ parent (commit a1b2c3d)
                           └─ tree T1 with only hello.txt
The index is a compact table: for each path it stores the mode, the blob hash, and cached file metadata such as size and timestamps. That is why status checks are fast: Git finds unstaged changes by comparing the working tree to the index and staged changes by comparing the index to HEAD, and it can build tree objects directly from the index at commit time.
git ls-files --stage
# output (mode hash stage path):
# 100644 <hash1> 0 bye.txt
# 100644 <hash2> 0 hello.txt
Stages other than 0 appear during a conflicted merge, when the index holds several versions of the same path: stage 1 is the common ancestor, stage 2 is your side, and stage 3 is the side being merged in.
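The index is flat (one row per path) while trees are nested, so turning one into the other means grouping paths by directory. The following is a simplified sketch of that step, not Git's implementation: write_tree here is a made-up Python function, the index is modeled as a dict of stage-0 entries, and IDs assume SHA-1.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    header = f"{kind} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def write_tree(index, store, prefix=""):
    """Build tree objects for the directory at `prefix`; return that tree's ID.

    index: {path: (mode, blob_hex)} holding stage-0 entries only.
    store: {tree_hex: payload}, filled in as trees are created.
    """
    entries = {}  # name -> (mode, hex_id) for this directory level
    for path, (mode, blob_hex) in index.items():
        if not path.startswith(prefix):
            continue
        name, slash, _rest = path[len(prefix):].partition("/")
        if slash:  # the path continues below a subdirectory
            if name not in entries:
                entries[name] = ("40000", write_tree(index, store, prefix + name + "/"))
        else:
            entries[name] = (mode, blob_hex)

    def sort_key(item):
        name, (mode, _hex_id) = item
        # Git sorts entries by name, treating subtrees as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    payload = b""
    for name, (mode, hex_id) in sorted(entries.items(), key=sort_key):
        payload += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(hex_id)
    tree_hex = object_id("tree", payload)
    store[tree_hex] = payload
    return tree_hex


blob = object_id("blob", b"hello world\n")
index = {
    "hello.txt": ("100644", blob),
    "docs/a/readme.txt": ("100644", blob),
    "docs/b/readme.txt": ("100644", blob),
}
store = {}
root = write_tree(index, store)
print(len(store))  # 3 trees: root, docs, and one shared by docs/a and docs/b
```

The two identical subfolders hash to the same tree, which is the folder-level version of the reuse described earlier.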
New objects start life as compressed files under .git/objects/aa/bb..., where aa is the first two hex digits of the ID. Over time Git packs many objects into a single .pack file for space and speed; an accompanying .idx lets it locate an object quickly.
# Count loose objects
find .git/objects -type f | wc -l
# output (small number initially)
# Pack them
git gc
# output:
# Counting objects: ...
# Writing objects: ...
# Total ..., reused ...
# See the packs
ls .git/objects/pack
# output:
# pack-1234abcd.pack
# pack-1234abcd.idx
ASCII of that progression:
loose:
  .git/objects/
    ab/cdef...        (one file per object)
    12/3456...

packed:
  .git/objects/pack/
    pack-xxxx.pack    (many objects)
    pack-xxxx.idx     (index for lookups)
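A loose object is small enough to reproduce by hand. This sketch assumes SHA-1 and uses two hypothetical helpers, write_loose and read_loose, working in a temporary directory rather than a real repository. The file content is the zlib-compressed header plus payload, and the path is derived from the ID. Real Git additionally writes to a temporary file and renames it into place, which is omitted here.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose(objects_dir: str, kind: str, payload: bytes) -> str:
    raw = f"{kind} {len(payload)}".encode("ascii") + b"\x00" + payload
    oid = hashlib.sha1(raw).hexdigest()
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    if not os.path.exists(path):  # same content, same ID: nothing to rewrite
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(zlib.compress(raw))
    return oid


def read_loose(objects_dir: str, oid: str):
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as fh:
        raw = zlib.decompress(fh.read())
    header, _, payload = raw.partition(b"\x00")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(payload):
        raise ValueError("size in header does not match payload length")
    return kind, payload


objects_dir = os.path.join(tempfile.mkdtemp(), "objects")
oid = write_loose(objects_dir, "blob", b"hello world\n")
print(oid[:2], oid[2:])              # 3b 18e512dba79e4c8300dd08aeb37f8e728b8dad
print(read_loose(objects_dir, oid))  # ('blob', b'hello world\n')
```

Writing the same content a second time is a no-op, since the target path already exists.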
Two files with identical bytes anywhere in history share the same blob ID. Filenames and execute bits live in trees, so renames don’t rewrite blobs; Git just updates the tree entry to point to the same blob under a new name.
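cp hello.txt copy.txt
git add copy.txt
# copy.txt is staged but not committed yet, so inspect the index rather than HEAD
git ls-files -s
# output (copy.txt and hello.txt share one blob hash):
# 100644 <hash1> 0 bye.txt
# 100644 <hash2> 0 copy.txt
# 100644 <hash2> 0 hello.txt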
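Since a rename leaves the blob untouched, an exact rename can be recognized after the fact by matching blob IDs between two snapshots. The sketch below illustrates the idea only: exact_renames is an invented name, each tree is modeled as a flat dict of path to blob ID, and Git's own rename detection also scores similar but not identical content, which is not attempted here.

```python
def exact_renames(old_tree: dict, new_tree: dict):
    """Pair paths that vanished with paths that appeared carrying the same blob ID."""
    removed = {p: h for p, h in old_tree.items() if p not in new_tree}
    added = {p: h for p, h in new_tree.items() if p not in old_tree}

    by_hash = {}
    for path, blob in sorted(removed.items()):
        by_hash.setdefault(blob, []).append(path)

    pairs = []
    for path, blob in sorted(added.items()):
        if by_hash.get(blob):
            pairs.append((by_hash[blob].pop(0), path))
    return pairs


HELLO = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # blob ID of "hello world\n"
BYE = "b" * 40                                      # placeholder ID for bye.txt

before = {"hello.txt": HELLO, "bye.txt": BYE}
after = {"greeting.txt": HELLO, "bye.txt": BYE}
print(exact_renames(before, after))  # [('hello.txt', 'greeting.txt')]
```

Nothing in the object store records that a rename happened; it is inferred by comparing trees.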
Tags are refs too; annotated tags are objects with their own metadata and a pointer to another object (usually a commit). Lightweight tags are just a name pointing straight to a commit.
git tag -a v1.0 -m "first release"
git cat-file -p refs/tags/v1.0
# output:
# object <commit-hash>
# type commit
# tag v1.0
# tagger You <you@example.com> ...
#
# first release
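Tag and commit bodies share one layout: header lines, a blank line, then the message. A small parser makes that visible. This is a minimal sketch with a hypothetical function name (parse_headers); it does not handle multi-line headers such as embedded signatures, and the object ID in the sample is a placeholder.

```python
def parse_headers(text: str):
    """Split a commit or tag body into (headers, message)."""
    head, _, message = text.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        headers.append((key, value))
    return headers, message


tag_text = (
    "object " + "9f3a1c2" + "0" * 33 + "\n"  # placeholder commit ID
    "type commit\n"
    "tag v1.0\n"
    "tagger You <you@example.com> 1699999999 +0000\n"
    "\n"
    "first release\n"
)
headers, message = parse_headers(tag_text)
print(dict(headers)["type"])  # commit
print(message.strip())        # first release
```

Headers are kept as a list of pairs rather than a dict because a merge commit repeats the parent key.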
# Show the type/size/pretty of any object by hash
git cat-file -t <id>
git cat-file -s <id>
git cat-file -p <id>
# Show exactly which tree/parent(s) a commit has
git show --no-patch --pretty=raw HEAD
# Map paths to blob hashes at a commit (great for checksums)
git ls-tree -r --long HEAD
# Compare two trees without touching the working dir
git diff --name-status <commita>^{tree} <commitb>^{tree}
Blobs are bytes. Trees give names to blobs (and trees). Commits name a tree and connect it to parents. Refs name commits so humans don’t have to memorize hashes. The .git/ folder is the whole universe: a content-addressed object store, a few tiny pointers, and logs. Everything else is a view on those pieces.