Last modified: June 06, 2026

Git Internals

Git stores your project as a graph of immutable objects. Instead of storing changes as a sequence of file diffs, Git stores snapshots of your project. Each snapshot is built from content-addressed objects, meaning each object is identified by a hash of its contents.

At the bottom are blobs. A blob stores raw file contents only. It does not store the file name, path, permissions, or history.

Above blobs are trees. A tree is like a directory. It maps names to objects. Each tree entry contains a file mode, object type, object hash, and name. A tree can point to blobs, which represent files, or to other trees, which represent subdirectories.

Above trees are commits. A commit points to one top-level tree, which represents the full project snapshot at that moment. A commit also stores parent commit IDs, author and committer information, timestamps, and the commit message.

Refs such as main, feature/login, or v1.2.0 are human-friendly names that point to object IDs, usually commit IDs. HEAD tells Git what you currently have checked out. Most of the time, HEAD points to a branch ref such as refs/heads/main.

The important idea is this:

file bytes → blob
directory listing → tree
snapshot + history → commit
human name → ref
current checkout → HEAD

Git objects are immutable. Git does not edit an existing blob, tree, or commit in place. When something changes, Git writes new objects and then moves a pointer, such as a branch ref, to the new commit.

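This is one reason Git is efficient: every file and directory that did not change keeps its existing object ID, so a new snapshot reuses those objects instead of storing them again.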

A simple mental model:

Blobs are file contents.
Trees are directories.
Commits are snapshots with history.
Refs are names for commits.
HEAD is where you are now.

The .git/ Directory

The .git/ directory is the database and control center of a Git repository. Your working directory contains the checked-out files, but .git/ contains the history, references, staging area, and recovery logs.

The most important files and directories:

.git/
  HEAD
  index
  objects/
  refs/
  logs/
  config

Important Parts

Path            Purpose
.git/objects/   Stores Git objects: blobs, trees, commits, and tags
.git/refs/      Stores branch and tag references
.git/HEAD       Points to the current branch or directly to a commit
.git/index      The staging area
.git/logs/      Reflogs that remember where refs used to point
.git/config     Repository-specific configuration

New objects usually start as loose objects under .git/objects/. Later, Git may compress many objects into packfiles for better storage efficiency and faster transfer.

Example layout:

.git/
  HEAD
  index
  objects/
    12/34abcd...
    pack/
      pack-xxxx.pack
      pack-xxxx.idx
  refs/
    heads/main
    tags/v1.2.0
  logs/
    HEAD
    refs/heads/main

The design is simple: Git stores immutable objects, then moves small pointers around.

That is why many Git operations feel atomic. Creating a commit writes new objects, then updates a ref. The old commit is still there. If a branch moves unexpectedly, the reflog often lets you recover it.
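The "write new objects, then move a pointer" pattern can be sketched with two dictionaries. This is a toy model rather than Git's storage code: the names objects, refs, write_object, and commit are made up for the illustration, the commits omit author and committer lines, and SHA-1 is assumed as the hash.

```python
import hashlib

objects = {}   # object ID -> stored bytes; entries are added, never changed
refs = {}      # ref name -> object ID; entries are overwritten as branches move


def write_object(kind: str, body: bytes) -> str:
    """Store an object under the hash of its header and body; return its ID."""
    data = kind.encode() + b" " + str(len(body)).encode() + b"\0" + body
    object_id = hashlib.sha1(data).hexdigest()
    objects[object_id] = data      # identical content always lands on the same key
    return object_id


def commit(tree_id: str, message: str, branch: str) -> str:
    """Write a simplified commit (no author lines), then move the branch ref."""
    parent = refs.get(branch)
    lines = ["tree " + tree_id]
    if parent is not None:
        lines.append("parent " + parent)
    body = "\n".join(lines + ["", message, ""]).encode()
    commit_id = write_object("commit", body)
    refs[branch] = commit_id       # the only in-place update is this pointer
    return commit_id


empty_tree = write_object("tree", b"")
c1 = commit(empty_tree, "first", "refs/heads/main")
c2 = commit(empty_tree, "second", "refs/heads/main")

print(refs["refs/heads/main"] == c2)   # True: the branch moved
print(c1 in objects)                   # True: the old commit is still stored
```

Nothing in the objects dictionary is ever modified or removed by a commit, which is the property that makes recovery through the reflog possible.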

Object Model

Git has four main object types:

blob
tree
commit
tag

Blob

A blob stores file contents.

It does not know its file name, path, permissions, or history.

Example:

blob = "hello world\n"

If two files have exactly the same bytes, they point to the same blob.
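The blob ID is a hash of the content alone, which is why identical bytes collapse into one object. A minimal sketch, assuming the default SHA-1 object format (a repository created with SHA-256 object IDs would hash differently); the function name git_blob_id is made up for this illustration:

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with these bytes."""
    # Git hashes a short header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Two "files" with identical bytes get the same ID, whatever they are named.
first_copy = git_blob_id(b"hello world\n")
second_copy = git_blob_id(b"hello world\n")
print(first_copy)                 # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(first_copy == second_copy)  # True
```

The file name never enters the hash, so renaming or copying a file cannot change its blob ID.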

Tree

A tree represents a directory. It stores entries that map names to object IDs.

Each tree entry contains:

mode type hash name

Example:

100644 blob  <hash>  hello.txt
040000 tree  <hash>  src

The tree is where file names and modes live. That is why a blob can be reused under multiple names.
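To make the "names live in the tree" point concrete, here is a rough sketch of how a tree body is laid out on disk: each entry is the mode, a space, the name, a NUL byte, and the raw 20-byte object ID. The type column shown by git cat-file -p is derived from the mode rather than stored. The helper names tree_entry and tree_object_id are invented for this example, and SHA-1 object IDs are assumed.

```python
import hashlib


def tree_entry(mode: str, name: str, object_id_hex: str) -> bytes:
    """Encode one tree entry: mode, space, name, NUL, then the raw 20-byte ID."""
    return mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(object_id_hex)


def tree_object_id(entries: list) -> str:
    """Hash a tree body the way Git hashes any object: header plus body."""
    body = b"".join(entries)
    header = b"tree " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


blob_id = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # blob for b"hello world\n"

# The same blob ID appears under two names; the blob itself is untouched.
one_name = tree_object_id([tree_entry("100644", "hello.txt", blob_id)])
two_names = tree_object_id([
    tree_entry("100644", "copy.txt", blob_id),   # entries are stored sorted by name
    tree_entry("100644", "hello.txt", blob_id),
])
print(one_name != two_names)  # True: the trees differ, the blob does not
```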

Commit

A commit represents a project snapshot plus history metadata.

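A commit stores the ID of one top-level tree, the IDs of its parent commits, author and committer identities with timestamps, and the commit message.

A normal commit has one parent. The first commit has no parent. A merge commit has two or more parents, usually exactly two.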

Example commit content:

tree <tree-hash>
parent <parent-commit-hash>
author You <you@example.com> 1699999999 +0000
committer You <you@example.com> 1699999999 +0000

Add hello.txt

The commit does not store a full copy of every file directly. It points to a tree, and that tree points to blobs and subtrees.
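Because the tree ID and parent IDs are part of the commit's own content, they also feed into the commit's ID. The sketch below assembles commit text in the layout shown above and hashes it; build_commit is a made-up helper, SHA-1 is assumed, and the identity string is the placeholder from the example.

```python
import hashlib


def build_commit(tree_id, parent_ids, author, committer, message):
    """Return (object ID, body text) for a commit with the given fields."""
    lines = ["tree " + tree_id]
    lines += ["parent " + parent for parent in parent_ids]  # none, one, or several
    lines += ["author " + author, "committer " + committer, "", message]
    body = "\n".join(lines) + "\n"
    raw = body.encode()
    header = b"commit " + str(len(raw)).encode() + b"\0"
    return hashlib.sha1(header + raw).hexdigest(), body


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's well-known empty tree
who = "You <you@example.com> 1699999999 +0000"

root_id, root_text = build_commit(EMPTY_TREE, [], who, who, "Add hello.txt")
child_id, child_text = build_commit(EMPTY_TREE, [root_id], who, who, "Add hello.txt")

print(child_text)
print(root_id != child_id)   # True: the parent line is part of the hashed content
```

Changing any ancestor therefore changes every descendant's ID, which is why rewriting history produces new commits rather than edited ones.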

Tag Object

Git has two common types of tags: lightweight and annotated.

A lightweight tag is just a ref that points directly to an object, usually a commit.

An annotated tag is a real Git object. It has its own metadata, message, tagger, and a pointer to another object.

Annotated tags are useful for releases because they can be signed, inspected, and treated as first-class objects.

High-Level Diagram

Objects

        ┌────────────┐
file →  │    blob    │   raw file bytes only
        └─────┬──────┘
              │ referenced by name and mode
              ▼
        ┌────────────┐
dir  →  │    tree    │   entries: mode, type, hash, name
        └─────┬──────┘
              │ root tree of snapshot
              ▼
        ┌────────────┐
hist →  │   commit   │   tree, parent(s), author, message
        └────────────┘

Names

HEAD → refs/heads/main → commit

Another view:

refs/heads/main
      │
      ▼
  commit C2
      │
      ├── root tree T2
      │     ├── blob B1  hello.txt
      │     └── blob B2  bye.txt
      │
      └── parent commit C1
            │
            └── root tree T1
                  └── blob B1  hello.txt

Walk the Objects by Hand

This lab builds a tiny repository and inspects the objects Git creates.

1. Create a Repository

mkdir toy
cd toy
git init

Example output:

Initialized empty Git repository in .../toy/.git/

At this point, Git has created a .git/ directory, but there are no commits yet.

2. Create a File

echo "hello world" > hello.txt
git status

Example output:

Untracked files:
  hello.txt

The file exists in your working directory, but Git has not stored it as part of a snapshot yet.

3. Stage the File

git add hello.txt

Staging does two important things:

  1. Git writes a blob object for the file contents.
  2. Git records the file path, mode, and blob hash in the index.

Now inspect the index:

git ls-files -s

Example output:

100644 <blob-hash> 0	hello.txt

Meaning:

100644       file mode
<blob-hash>  blob object ID
0            stage number
hello.txt    path

Stage 0 means the normal, resolved version of the file. Other stage numbers appear during merge conflicts.

4. Inspect the Blob

Use git cat-file to inspect the object.

git cat-file -t <blob-hash>

Output:

blob

Print the blob contents:

git cat-file -p <blob-hash>

Output:

hello world

Notice that the blob contains only the file content. It does not contain the name hello.txt.

5. Write a Tree from the Index

The index can be turned into a tree object.

git write-tree

Example output:

<tree-hash>

Inspect the tree:

git cat-file -p <tree-hash>

Example output:

100644 blob <blob-hash>	hello.txt

Now the name hello.txt appears. That name is stored in the tree, not in the blob.

6. Create a Commit Manually

Use git commit-tree to create a commit object that points to the tree.

git commit-tree <tree-hash> -m "initial snapshot"

Example output:

<commit-hash>

At this point, the commit object exists, but no branch necessarily points to it yet. A commit without a ref can become hard to find later.

Move main to the commit:

git update-ref refs/heads/main <commit-hash>

Make HEAD point to main. The plumbing command git symbolic-ref does this without editing .git/HEAD by hand:

git symbolic-ref HEAD refs/heads/main

Now inspect the commit:

git cat-file -p <commit-hash>

Example output:

tree <tree-hash>
author You <you@example.com> 1699999999 +0000
committer You <you@example.com> 1699999999 +0000

initial snapshot

The first commit has no parent.

7. Continue with Normal Git Commands

Create another file:

echo "bye" > bye.txt
git add bye.txt
git commit -m "add bye"

Example output:

[main 9f3a1c2] add bye
 1 file changed, 1 insertion(+)
 create mode 100644 bye.txt

Now inspect the tree for HEAD:

git ls-tree -r HEAD

Example output:

100644 blob <hash1>	bye.txt
100644 blob <hash2>	hello.txt

You can also inspect the root tree directly:

git cat-file -p HEAD^{tree}

Example output:

100644 blob <hash1>	bye.txt
100644 blob <hash2>	hello.txt

8. View the History

git log --oneline --graph --decorate

Example output:

* 9f3a1c2 (HEAD -> main) add bye
* a1b2c3d initial snapshot

The second commit points to the first commit as its parent.

Diagram:

refs/heads/main ──► commit C2
                     │
                     ├── tree T2
                     │    ├── blob B1  bye.txt
                     │    └── blob B2  hello.txt
                     │
                     └── parent commit C1
                          │
                          └── tree T1
                               └── blob B2  hello.txt

9. Inspect Pointers on Disk

Look at HEAD:

cat .git/HEAD

Output:

ref: refs/heads/main

Look at the branch ref:

cat .git/refs/heads/main

Example output:

9f3a1c2...

This file contains the commit ID that main currently points to.

In some repositories, refs may be packed into .git/packed-refs, so you may not always see every ref as a loose file under .git/refs/.
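A rough sketch of how a lookup could combine both locations. The packed-refs file is plain text with one "<id> <refname>" pair per line, a comment header, and optional "^<id>" lines that record the commit an annotated tag points to. The function names and the sample IDs are invented for this illustration.

```python
def parse_packed_refs(text: str) -> dict:
    """Map ref names to object IDs from the text of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        # Skip the header comment, blank lines, and "^<id>" lines, which record
        # the commit that the annotated tag on the previous line points to.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        object_id, ref_name = line.split(" ", 1)
        refs[ref_name] = object_id
    return refs


def lookup_ref(name: str, loose: dict, packed: dict):
    """A loose ref file, when present, takes precedence over the packed entry."""
    return loose.get(name, packed.get(name))


TAG_ID, COMMIT_ID = "7a" * 20, "9f" * 20   # made-up object IDs
packed_text = (
    "# pack-refs with: peeled fully-peeled sorted\n"
    f"{COMMIT_ID} refs/heads/main\n"
    f"{TAG_ID} refs/tags/v1.2.0\n"
    f"^{COMMIT_ID}\n"
)
packed = parse_packed_refs(packed_text)
loose = {"refs/heads/main": "c3" * 20}     # main moved after the refs were packed

print(sorted(packed))   # ['refs/heads/main', 'refs/tags/v1.2.0']
print(lookup_ref("refs/heads/main", loose, packed) == "c3" * 20)   # True
```

In practice git rev-parse or git show-ref is the reliable way to read a ref, since those commands consult both places.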

The Index / Staging Area

The index is Git’s staging area. It sits between the working directory and the next commit.

A useful model:

working directory  →  index  →  commit
     files            staged     snapshot

The index stores a compact table of paths and blob IDs, plus file mode and metadata. It does not store file contents directly. The contents are stored as blob objects.

Inspect it with:

git ls-files --stage

Example output:

100644 <hash1> 0	bye.txt
100644 <hash2> 0	hello.txt

The index is why the following commands are fast. Git can compare cached index entries instead of re-hashing every file:

git status
git diff
git diff --staged
git commit

Git compares:

working directory vs index       → unstaged changes
index vs HEAD                    → staged changes
HEAD vs another commit/tree      → committed differences
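The three comparisons above are the same operation applied to different pairs of snapshots. As a simplified sketch, each snapshot can be modeled as a mapping from path to blob ID; the function name changed_paths and the short stand-in IDs are assumptions made for the example, and untracked files are ignored.

```python
def changed_paths(old: dict, new: dict) -> dict:
    """Compare two path -> blob ID mappings and label each differing path."""
    result = {}
    for path in sorted(set(old) | set(new)):
        if path not in old:
            result[path] = "added"
        elif path not in new:
            result[path] = "deleted"
        elif old[path] != new[path]:
            result[path] = "modified"
    return result


# Hypothetical snapshots; short strings stand in for real blob IDs.
head = {"hello.txt": "b1"}
index = {"hello.txt": "b1", "bye.txt": "b2"}      # bye.txt was staged with git add
worktree = {"hello.txt": "b3", "bye.txt": "b2"}   # hello.txt was edited, not staged

print(changed_paths(head, index))      # {'bye.txt': 'added'}      (staged)
print(changed_paths(index, worktree))  # {'hello.txt': 'modified'} (unstaged)
```

Comparing IDs is enough to tell whether a path changed; the blob contents only need to be read when an actual diff is printed.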

During a merge conflict, the index can store multiple versions of the same path:

stage 1 = common ancestor
stage 2 = ours
stage 3 = theirs
stage 0 = resolved version

That is how Git keeps track of conflict information before you resolve it.
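One way to picture this is an index keyed by path and stage number. The sketch below is a toy model, not the binary index format; the helper names and the stand-in blob IDs are invented for the illustration.

```python
# Index entries keyed by (path, stage); values stand in for blob IDs.
index = {
    ("README.md", 0): "r1",      # unaffected file: a single stage-0 entry
    ("app.py", 1): "base",       # common ancestor
    ("app.py", 2): "ours",
    ("app.py", 3): "theirs",
}


def conflicted_paths(index: dict) -> list:
    """A path is in conflict while it has any entry with a nonzero stage."""
    return sorted({path for (path, stage) in index if stage != 0})


def resolve(index: dict, path: str, blob_id: str) -> None:
    """Mimic git add after a conflict: drop stages 1-3, record stage 0."""
    for stage in (1, 2, 3):
        index.pop((path, stage), None)
    index[(path, 0)] = blob_id


print(conflicted_paths(index))   # ['app.py']
resolve(index, "app.py", "merged")
print(conflicted_paths(index))   # []
```

On a real repository, git ls-files -u lists the unmerged entries with their stage numbers.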

Loose Objects and Packfiles

New objects usually begin as loose objects.

A loose object is stored under .git/objects/ using the first two hex characters of its object ID as a directory name.

Example:

.git/objects/ab/cdef1234...

Here:

ab        first two hex characters
cdef...   rest of the object ID

Loose objects are individually compressed. This is simple and fast for creating new objects.
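The path and the file contents both follow from the object's bytes. A minimal sketch, assuming SHA-1 object IDs and zlib compression of the header plus content; loose_object is a made-up helper name, and the compressed bytes may differ from what Git writes because compression levels can vary.

```python
import hashlib
import zlib


def loose_object(kind: str, body: bytes):
    """Return (relative path, compressed bytes) for a loose object."""
    data = kind.encode() + b" " + str(len(body)).encode() + b"\0" + body
    object_id = hashlib.sha1(data).hexdigest()
    # First two hex characters name the directory, the remaining 38 the file.
    path = ".git/objects/" + object_id[:2] + "/" + object_id[2:]
    return path, zlib.compress(data)


path, compressed = loose_object("blob", b"hello world\n")
print(path)                         # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
print(zlib.decompress(compressed))  # b'blob 12\x00hello world\n'
```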

Over time, Git may pack many loose objects into a packfile. A packfile stores many objects together and can use delta compression to reduce space.

Packfiles live here:

.git/objects/pack/

Example:

pack-1234abcd.pack
pack-1234abcd.idx

The .pack file stores the objects. The .idx file lets Git quickly find objects inside the pack.

Commands:

find .git/objects -type f | wc -l
git gc
ls .git/objects/pack

Example progression:

Loose objects:

.git/objects/
  ab/cdef...
  12/3456...

Packed objects:

.git/objects/pack/
  pack-xxxx.pack
  pack-xxxx.idx

git gc means garbage collection. It cleans up and optimizes the repository by packing objects, pruning unreachable objects when safe, and improving storage efficiency.

Names Are in Trees, Not Blobs

A blob only stores bytes. File names live in tree objects.

This explains Git’s deduplication behavior.

If two files have identical contents, they use the same blob:

cp hello.txt copy.txt
git add copy.txt
git ls-files -s

Example output:

100644 <same-blob-hash> 0	copy.txt
100644 <same-blob-hash> 0	hello.txt

After committing, the tree has two names pointing to the same blob.

tree
 ├── hello.txt → blob B1
 └── copy.txt  → blob B1

This is also why a rename does not rewrite the file contents. Git can represent the new name in a new tree while reusing the same blob.

Important detail: Git does not store “rename objects.” Rename detection is usually computed later by commands like git diff or git log --follow, based on similarity between deleted and added paths.
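The simplest case of that computation, an exact rename, can be sketched by matching blob IDs between paths that disappeared and paths that appeared. This is a deliberately reduced illustration: Git's real detection also scores partially similar content, which this sketch does not attempt, and exact_renames is a hypothetical name.

```python
def exact_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair deleted and added paths that share a blob ID (exact renames only)."""
    deleted = {p: b for p, b in old_tree.items() if p not in new_tree}
    added = {p: b for p, b in new_tree.items() if p not in old_tree}
    renames = []
    for new_path, blob_id in sorted(added.items()):
        for old_path in sorted(deleted):
            if deleted[old_path] == blob_id:
                renames.append((old_path, new_path))
                del deleted[old_path]      # each deleted path is matched at most once
                break
    return renames


# Flattened snapshots as path -> blob ID; short strings stand in for real IDs.
old_tree = {"app.py": "b1", "README.md": "b2"}
new_tree = {"src/app.py": "b1", "README.md": "b2"}
print(exact_renames(old_tree, new_tree))   # [('app.py', 'src/app.py')]
```

Nothing about the rename was recorded at commit time; the pairing is inferred entirely from the two snapshots.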

Refs, Branches, and HEAD

A ref is a name that points to an object ID.

Common refs:

refs/heads/main
refs/heads/feature/login
refs/tags/v1.0
refs/remotes/origin/main

A branch is simply a movable ref that points to a commit.

Example:

refs/heads/main → commit C3

When you create a new commit on main, Git writes the commit object and then moves refs/heads/main to the new commit.

before:

main → C2

after commit:

main → C3 → C2

HEAD usually points to the current branch:

HEAD → refs/heads/main → commit C3

If you check out a specific commit instead of a branch, Git enters a detached HEAD state:

HEAD → commit C2

Detached HEAD is not dangerous by itself, but new commits made there are not attached to a branch unless you create one.
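The difference between the two states is visible in the text of .git/HEAD: either a "ref: " line naming a branch or a bare commit ID. A small sketch of that distinction, using a made-up resolve_head function and invented IDs, with refs modeled as a dictionary rather than files:

```python
def resolve_head(head_content: str, refs: dict):
    """Return (branch name or None, commit ID) for the text stored in .git/HEAD."""
    head_content = head_content.strip()
    if head_content.startswith("ref: "):
        ref_name = head_content[len("ref: "):]
        return ref_name, refs.get(ref_name)   # ID is None before the first commit
    return None, head_content                 # detached: HEAD holds the ID itself


refs = {"refs/heads/main": "c3" * 20}         # made-up 40-character commit ID

attached = resolve_head("ref: refs/heads/main\n", refs)
detached = resolve_head("c2" * 20 + "\n", refs)

print(attached[0])   # refs/heads/main
print(detached[0])   # None
```

When the first element is None, a new commit has no branch to move, which is why commits made on a detached HEAD need a branch created for them to stay easy to find.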

Reflog: Git’s Safety Net

The reflog records where refs used to point. It is local to your repository and is extremely useful for recovery.

For example, when a branch moves due to commit, reset, rebase, or merge, Git records the previous position.

Inspect the reflog:

git reflog

Example output:

9f3a1c2 HEAD@{0}: commit: add bye
a1b2c3d HEAD@{1}: commit: initial snapshot

You can use reflog entries to recover lost commits:

git checkout -b recovered HEAD@{1}

or:

git reset --hard HEAD@{1}

Use reset --hard carefully because it overwrites the index and the working directory, discarding uncommitted changes to tracked files.

The reflog is local. It is not normally pushed to remotes.
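On disk, a reflog such as .git/logs/HEAD is an append-only text file. Each line holds the old ID, the new ID, the identity with a timestamp, then a tab and the message. The following sketch parses that layout and indexes it newest first; parse_reflog and the sample IDs are invented for this example.

```python
def parse_reflog(text: str) -> list:
    """Return reflog entries newest first, so entries[n] corresponds to HEAD@{n}."""
    entries = []
    for line in text.splitlines():
        meta, _, message = line.partition("\t")
        old_id, new_id = meta.split(" ")[:2]
        entries.append({"old": old_id, "new": new_id, "message": message})
    entries.reverse()          # the file is append-only, so the newest line is last
    return entries


ZERO = "0" * 40                # all-zero ID: the ref did not exist before
C1, C2 = "a1" * 20, "9f" * 20  # made-up commit IDs
log_text = (
    f"{ZERO} {C1} You <you@example.com> 1699999999 +0000\tcommit (initial): initial snapshot\n"
    f"{C1} {C2} You <you@example.com> 1699999999 +0000\tcommit: add bye\n"
)

entries = parse_reflog(log_text)
print(entries[0]["message"])      # commit: add bye
print(entries[1]["new"] == C1)    # True: HEAD@{1} is the earlier commit
```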

Tags

Tags give names to important points in history, often releases.

There are two main tag types.

Lightweight Tag

A lightweight tag is just a ref pointing directly to a commit.

git tag v1.0

Conceptually:

refs/tags/v1.0 → commit C3

Annotated Tag

An annotated tag creates a tag object with metadata and a message.

git tag -a v1.0 -m "first release"

Inspect it:

git cat-file -p refs/tags/v1.0

Example output:

object <commit-hash>
type commit
tag v1.0
tagger You <you@example.com> ...

first release

Annotated tags are generally better for releases because they preserve tagger information, can be signed, and have their own object ID.
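Since a tag object is plain text in the layout shown above, following it to the tagged commit is a matter of reading the object and type headers. A small sketch with a hypothetical parse_tag_object helper and a made-up commit ID; signed tags carry extra signature text in the message part, which this sketch leaves untouched.

```python
def parse_tag_object(text: str) -> dict:
    """Split an annotated tag's text into header fields and the message."""
    headers, _, message = text.partition("\n\n")
    fields = dict(line.split(" ", 1) for line in headers.splitlines())
    fields["message"] = message
    return fields


COMMIT_ID = "9f" * 20   # made-up 40-character commit ID
tag_text = (
    f"object {COMMIT_ID}\n"
    "type commit\n"
    "tag v1.0\n"
    "tagger You <you@example.com> 1699999999 +0000\n"
    "\n"
    "first release\n"
)

tag = parse_tag_object(tag_text)
print(tag["type"], tag["tag"])        # commit v1.0
print(tag["object"] == COMMIT_ID)     # True: the tag object points at the commit
```

On a real repository, git rev-parse v1.0^{commit} performs this step and prints the commit ID behind the tag.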

Quick Commands

Use these commands to look beneath Git’s user-facing commands and inspect the objects it stores: commits, trees, blobs, and tags. They identify object types, show commit metadata, map file names to blob hashes, compare stored snapshots, and report how much object storage the repository uses.

Inspect Any Object

git cat-file -t <id>   # object type
git cat-file -s <id>   # object size
git cat-file -p <id>   # pretty-print object

The git cat-file command lets you inspect Git objects directly by object ID, branch name, tag name, or other revision expressions. It is useful when you already have a hash and want to know what it represents.

Example output:

$ git cat-file -t HEAD
commit

$ git cat-file -s HEAD
245

$ git cat-file -p HEAD
tree 7b3f9a1c5d3e8f6a2b0c9d4e1f8a6b7c9d0e1f2a
parent 2a4c6e8f1b3d5a7c9e0f2a4b6d8e1c3f5a7b9d0e
author Alex Example <alex@example.com> 1777808400 +0200
committer Alex Example <alex@example.com> 1777808400 +0200

Add hello.txt

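Explanation:

-t   prints the object type; HEAD resolves to a commit
-s   prints the size of the object’s content in bytes
-p   prints the content in readable form; for a commit, that is its tree, parents, author, committer, and message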

Inspect a Commit

git show --no-patch --pretty=raw HEAD

This command prints the raw metadata of the commit that HEAD points to (tree, parent commits, author, committer, and message) without the file diff. It is useful when you want to see exactly what Git stored and which tree snapshot the commit references.

Example output:

commit 9fceb02d0ae598e95dc970b74767f19372d61af8
tree 7b3f9a1c5d3e8f6a2b0c9d4e1f8a6b7c9d0e1f2a
parent 2a4c6e8f1b3d5a7c9e0f2a4b6d8e1c3f5a7b9d0e
author Alex Example <alex@example.com> 1777808400 +0200
committer Alex Example <alex@example.com> 1777808400 +0200

    Add hello.txt

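Explanation:

commit     ID of the commit being shown
tree       ID of the root tree for this snapshot
parent     ID of the previous commit; a merge commit has one parent line per parent
author     who wrote the change, with a Unix timestamp and time zone offset
committer  who created the commit object, in the same format

The message is indented by four spaces in this output format.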

Inspect a Tree

git ls-tree HEAD
git ls-tree -r HEAD
git ls-tree -r --long HEAD

A tree object represents a directory snapshot. It stores file names, file modes, object types, and object IDs. The git ls-tree command shows the contents of a tree, commit, branch, or tag without checking anything out, which makes it useful for mapping file paths to blob hashes.

Example output:

$ git ls-tree HEAD
100644 blob e965047ad7c57865823c7d992b1d046ea66edf78    hello.txt
040000 tree 3b18e512dba79e4c8300dd08aeb37f8e728b8dad    src

$ git ls-tree -r HEAD
100644 blob e965047ad7c57865823c7d992b1d046ea66edf78    hello.txt
100644 blob a3f5c7d9e1b2a4c6e8f0d3b5a7c9e1f2d4b6a8c0    src/main.py

$ git ls-tree -r --long HEAD
100644 blob e965047ad7c57865823c7d992b1d046ea66edf78       14    hello.txt
100644 blob a3f5c7d9e1b2a4c6e8f0d3b5a7c9e1f2d4b6a8c0      128    src/main.py

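Explanation:

git ls-tree HEAD            lists only the top-level entries; src appears as a tree
git ls-tree -r HEAD         recurses into subtrees and lists blobs with their full paths
git ls-tree -r --long HEAD  adds each blob’s size in bytes before the path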

Compare Two Trees

git diff --name-status <commitA>^{tree} <commitB>^{tree}

This command compares the tree snapshots of two commits without checking either one out. It looks only at stored project content, not the working directory, so the two repository states are compared directly at the object level.

Example output:

M       README.md
A       hello.txt
D       old-config.yml
R100    app.py   src/app.py

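Explanation:

M     modified between the two trees
A     added in the second tree
D     deleted from the second tree
R100  renamed; 100 is the similarity score in percent, so the content is identical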

Find Which Object a Path Uses

git rev-parse HEAD:hello.txt

This command resolves a path inside a commit and prints the ID of the blob that stores hello.txt at HEAD. It is useful when you want to know exactly which blob holds a file’s content at a particular commit.

Example output:

e965047ad7c57865823c7d992b1d046ea66edf78

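Explanation:

The output is the ID of the blob that the tree of HEAD records for hello.txt. Passing that ID to git cat-file -p prints the file content.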

Read a File from a Commit

git show HEAD:hello.txt

This command prints the version of hello.txt stored in HEAD without checking out that commit. It is useful for inspecting a historical version of a file or recovering its contents from another revision.

Example output:

Hello, Git!

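Explanation:

The output is the content of the blob that the tree of HEAD records for hello.txt. Nothing in the working directory or the index changes.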

Show Object Storage Statistics

git count-objects -v

This command reports statistics about Git’s object database: how many loose objects exist, how many packfiles are present, and how much disk space they use.

Example output:

count: 24
size: 96
in-pack: 1520
packs: 2
size-pack: 384
prune-packable: 0
garbage: 0
size-garbage: 0

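Explanation:

count           number of loose objects
size            disk space used by loose objects, in KiB
in-pack         number of objects stored in packfiles
packs           number of packfiles
size-pack       disk space used by packfiles, in KiB
prune-packable  loose objects that also exist in a pack and could be removed with git prune-packed
garbage         files in the object database that are neither valid loose objects nor valid packs
size-garbage    disk space used by garbage files, in KiB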

A More Complete Mental Model

Git is made of three main layers:

1. Object database
   blobs, trees, commits, tags

2. References
   branches, tags, remote-tracking refs, HEAD

3. Working state
   working directory, index, current checkout

The working directory is what you edit.

The index is what you plan to commit.

The commit is the saved snapshot.

working directory
      β”‚ git add
      β–Ό
index
      β”‚ git commit
      β–Ό
commit
      β”‚ branch ref moves
      β–Ό
history

When you run:

git add hello.txt

Git writes or reuses a blob and updates the index.

When you run:

git commit

Git writes a tree from the index, writes a commit pointing to that tree, and moves the current branch ref.

When you run:

git checkout main

Git updates HEAD, updates the index, and writes files into the working directory to match the commit pointed to by main.
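The "match the commit" step amounts to walking the commit's root tree and collecting every path with its blob ID, which is also what git ls-tree -r prints. A toy sketch of that walk, with trees modeled as nested dictionaries and placeholder IDs; flatten is a made-up name, and a real checkout would additionally write each blob's bytes to disk and refresh the index.

```python
def flatten(tree_id: str, trees: dict, prefix: str = "") -> dict:
    """Walk a tree recursively and return {path: blob ID}."""
    files = {}
    for name, (kind, object_id) in sorted(trees[tree_id].items()):
        path = prefix + name
        if kind == "tree":
            files.update(flatten(object_id, trees, path + "/"))   # descend into subdirectory
        else:
            files[path] = object_id
    return files


# Toy object store: tree ID -> {name: (type, object ID)}; all IDs are placeholders.
trees = {
    "T_root": {"hello.txt": ("blob", "B1"), "src": ("tree", "T_src")},
    "T_src": {"main.py": ("blob", "B2")},
}

print(flatten("T_root", trees))   # {'hello.txt': 'B1', 'src/main.py': 'B2'}
```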