Inside Git: How It Works and the Role of the .git Folder

If you've been using Git for a while, you're probably comfortable with commands like git add, git commit, and git push. But have you ever wondered what actually happens behind the scenes when you run these commands? What makes Git so fast and reliable? And what's hiding inside that mysterious .git folder?

Understanding how Git works internally transforms it from a black box of commands to memorize into a logical system that makes perfect sense. In this deep dive, we'll peek under the hood to explore Git's architecture, the structure of the .git folder, and how Git tracks and stores your project's history.

How Git Works Internally

At its core, Git is a content-addressable filesystem with a version control system built on top of it. This might sound complex, but the fundamental concept is elegant and simple.

Git is a Snapshot System, Not a Delta System

Unlike many version control systems that store changes as a series of differences (deltas) between versions, Git takes a different approach. When you commit, Git doesn't record what changed—it takes a complete snapshot of your entire project at that moment.

However, Git is smart about storage. If a file hasn't changed between commits, Git doesn't store it again; the new snapshot simply points to the identical object it already saved. Git may later delta-compress objects inside packfiles to save disk space, but that is a storage optimization: conceptually, every commit is still a complete snapshot of your project.

Think of it like a photo album where each page represents a commit. If the same object appears in multiple photos, you don't need multiple copies—you just reference the same object across different photos.

Content-Addressable Storage

Every piece of content in Git, whether it's a file, a directory structure, or a commit, is stored as an object identified by a SHA-1 hash (newer Git versions can optionally create SHA-256 repositories). This 40-character hexadecimal string is computed from the object's type, size, and content.

The beauty of this system is that the same content always produces the same hash. This provides several benefits:

Integrity: Git can detect if any content has been corrupted or tampered with
Deduplication: Identical content is stored only once
Efficiency: Git can quickly determine if it already has certain content

The Three States of Git

To understand Git's workflow, you need to know about the three states your files can be in:

Modified: You've changed a file in your working directory, but haven't staged it yet.

Staged: You've marked a modified file to go into your next commit snapshot.

Committed: The data is safely stored in your local Git database.

This three-state system is what makes Git so powerful and flexible. You have complete control over what gets committed and when.

Understanding the `.git` Folder

When you run git init in a directory, Git creates a hidden .git folder. This folder is the brain of your repository—it contains everything Git needs to manage your project's version history.

Let's explore the structure and purpose of this critical folder:

.git/
├── HEAD
├── config
├── description
├── hooks/
├── info/
├── objects/
├── refs/
└── index

The HEAD File

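The HEAD file records what you currently have checked out. Usually that is a branch: when you switch branches, Git rewrites this file to name the new one. If you check out a specific commit instead (a "detached HEAD"), the file holds that commit's hash directly. Typically, it contains something like: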

ref: refs/heads/main

This tells Git that you're currently on the main branch. The HEAD is what determines which commit you're working from.
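As a rough illustration, the contents of HEAD can be classified with a few lines of Python. The function name `parse_head` is made up for this sketch, and the hash in the second call is only a placeholder value.

```python
def parse_head(text: str) -> tuple[str, str]:
    """Classify the contents of .git/HEAD.

    Returns ("branch", ref_name) for a symbolic ref, or
    ("detached", commit_hash) when HEAD holds a raw hash.
    """
    value = text.strip()
    if value.startswith("ref: "):
        return ("branch", value[len("ref: "):])
    return ("detached", value)

print(parse_head("ref: refs/heads/main\n"))
# ('branch', 'refs/heads/main')

# Placeholder hash, standing in for whatever commit is checked out directly.
print(parse_head("3b18e512dba79e4c8300dd08aeb37f8e728b8dad\n"))
# ('detached', '3b18e512dba79e4c8300dd08aeb37f8e728b8dad')
```

In a real repository you could feed it the file directly, for example `parse_head(open(".git/HEAD").read())`.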

The config File

This file contains repository-specific configuration settings. When you set up your username and email for a specific repository, or configure remote repositories, that information is stored here.

[core]
    repositoryformatversion = 0
    filemode = true
[remote "origin"]
    url = https://github.com/ashishsingodiya/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
    remote = origin
    merge = refs/heads/main
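Because the file is INI-like, a simple one such as the example above can be read with Python's standard configparser. Treat this as an approximation only: Git's real format has features (multi-valued keys, includes, its own escaping rules) that configparser does not model, so `git config --get` remains the reliable way to query settings. The helper name `read_git_config` is an assumption of this sketch.

```python
import configparser

SAMPLE = '''\
[core]
    repositoryformatversion = 0
    filemode = true
[remote "origin"]
    url = https://github.com/ashishsingodiya/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
    remote = origin
    merge = refs/heads/main
'''

def read_git_config(text: str) -> configparser.ConfigParser:
    # strict=False tolerates repeated keys (the last one wins here);
    # interpolation=None keeps "%" literal; only "=" separates key and value.
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.read_string(text)
    return parser

cfg = read_git_config(SAMPLE)

# Section names keep the quoted subsection exactly as written in the file.
print(cfg['remote "origin"']["url"])
print(cfg['branch "main"']["merge"])       # refs/heads/main
print(cfg["core"].getboolean("filemode"))  # True
```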

The hooks Directory

The hooks/ directory contains scripts that Git can execute at specific points in its workflow, such as before a commit is created (pre-commit), when a commit message is written (commit-msg), or before a push (pre-push). By default, Git includes sample hook scripts with a .sample suffix. A hook only runs once the file carries the exact hook name, without the suffix, and is executable.
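A hook can be any executable, so here is a minimal sketch of a commit-msg hook written in Python. Git passes the path of the file holding the proposed message as the hook's only argument, and a non-zero exit status aborts the commit. The 72-character limit and the function names are assumptions chosen for illustration, not Git rules.

```python
#!/usr/bin/env python3
import sys

MAX_SUBJECT = 72  # assumed team convention, not something Git enforces

def check_message(message: str) -> list[str]:
    """Return a list of problems found in a commit message."""
    # Lines starting with "#" are comments in the default message template.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    subject = lines[0].strip() if lines else ""
    problems = []
    if not subject:
        problems.append("subject line is empty")
    elif len(subject) > MAX_SUBJECT:
        problems.append(f"subject line exceeds {MAX_SUBJECT} characters")
    return problems

def main(argv: list[str]) -> int:
    # argv[1] is the path to the message file supplied by Git.
    with open(argv[1], encoding="utf-8") as fh:
        problems = check_message(fh.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # non-zero exit aborts the commit

# Git always supplies the path; the length check keeps the file importable.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

To try it, save the script as .git/hooks/commit-msg and make it executable. Hooks live only in your local .git folder and are not shared when someone clones the repository.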

The objects Directory

This is where Git stores all the content for your repository. Every file you commit, every directory structure, and every commit itself is stored here as a zlib-compressed object. Loose objects are filed under a subdirectory named after the first two characters of their SHA-1 hash, and the remaining 38 characters become the filename. Objects that Git has packed together for efficiency live under pack/ instead.

objects/
├── 0a/
│   └── 3b5c8d9e7f1a2b3c4d5e6f7a8b9c0d1e2f3a4b
├── 1f/
│   └── 2e3d4c5b6a7f8e9d0c1b2a3f4e5d6c7b8a9f0e
├── info/
└── pack/
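The layout above can be sketched in Python. This example assumes loose (unpacked) objects only: the first two hex characters of the hash name the directory, the other 38 name the file, and the file holds zlib-compressed data. It runs against a throwaway directory that mimics the layout rather than a real repository, and the helper names are made up for illustration.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def loose_object_path(git_dir: Path, oid: str) -> Path:
    # First two hex characters name the directory, the remaining 38 the file.
    return git_dir / "objects" / oid[:2] / oid[2:]

def read_loose_object(git_dir: Path, oid: str) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    raw = zlib.decompress(loose_object_path(git_dir, oid).read_bytes())
    header, _, content = raw.partition(b"\x00")
    obj_type, _size = header.decode().split(" ")
    return obj_type, content

# Demo: write one object by hand into a temporary, fake .git directory.
demo_dir = Path(tempfile.mkdtemp()) / ".git"
payload = b"blob 12\x00hello world\n"  # header + NUL + 12 bytes of content
demo_oid = hashlib.sha1(payload).hexdigest()
target = loose_object_path(demo_dir, demo_oid)
target.parent.mkdir(parents=True)
target.write_bytes(zlib.compress(payload))

print(target.relative_to(demo_dir).as_posix())
# objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
print(read_loose_object(demo_dir, demo_oid))
# ('blob', b'hello world\n')
```

Pointing `read_loose_object` at a real .git directory works for loose objects only; anything Git has moved into a packfile needs `git cat-file` instead.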

The refs Directory

The refs/ directory stores pointers to commits. It contains subdirectories for branches, tags, and remote-tracking branches:

refs/
├── heads/        # Local branches
│   ├── main
│   └── feature-branch
├── remotes/      # Remote-tracking branches
│   └── origin/
│       └── main
└── tags/         # Tags
    └── v1.0.0

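Each file in these directories contains the SHA-1 hash of the object it points to: a commit for branches, and either a commit or an annotated tag object for tags. In repositories where Git has packed its refs, some of them are listed in a single .git/packed-refs file instead of having individual files.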
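Looking up a ref can be sketched as follows. This is a simplified model that checks for a loose file first and then falls back to .git/packed-refs, where Git may store refs as "<hash> <name>" lines. It ignores symbolic refs and other details, the function name is hypothetical, and the hashes in the demo are placeholders.

```python
import tempfile
from pathlib import Path
from typing import Optional

def resolve_ref(git_dir: Path, ref: str) -> Optional[str]:
    """Return the hash a ref such as 'refs/heads/main' points to, if any."""
    loose = git_dir / ref
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip the header comment, blank lines, and "^" (peeled tag) lines.
            if not line.strip() or line.startswith(("#", "^")):
                continue
            oid, name = line.split(" ", 1)
            if name == ref:
                return oid
    return None

# Demo against a throwaway directory; "a" * 40 and "b" * 40 are placeholder hashes.
demo = Path(tempfile.mkdtemp())
(demo / "refs" / "heads").mkdir(parents=True)
(demo / "refs" / "heads" / "main").write_text("a" * 40 + "\n")
(demo / "packed-refs").write_text(
    "# pack-refs with: peeled fully-peeled sorted\n"
    + "b" * 40 + " refs/tags/v1.0.0\n"
)

print(resolve_ref(demo, "refs/heads/main") == "a" * 40)    # True (loose file)
print(resolve_ref(demo, "refs/tags/v1.0.0") == "b" * 40)   # True (packed-refs)
print(resolve_ref(demo, "refs/heads/missing"))             # None
```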

The index File

The index file (also called the staging area) is a binary file that describes what your next commit will contain. It lists every tracked path along with the hash of its blob, its file mode, and cached file metadata. When you run git add, Git updates the entry for that file to point to the newly staged content.
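Although the index is binary, its first 12 bytes are easy to decode: a 4-byte signature "DIRC", a 4-byte version number, and a 4-byte entry count, all big-endian. The sketch below reads only that header. The sample bytes are synthetic, and `parse_index_header` is a name invented for this example.

```python
import struct

def parse_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":  # short for "dircache"
        raise ValueError("not a Git index file")
    return version, count

# Synthetic header for illustration: version 2, three entries.
sample = struct.pack(">4sII", b"DIRC", 2, 3)
print(parse_index_header(sample))  # (2, 3)
```

On a real repository, `parse_index_header(open(".git/index", "rb").read())` should report a count matching the number of paths printed by `git ls-files --stage`, unless some paths are mid-merge and have several entries.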

Git Objects: Blob, Tree, Commit

Git uses three primary types of objects to represent your project and its history. (A fourth type, the tag object, backs annotated tags and is not covered here.) Understanding these objects is key to understanding how Git works.

Blob Objects

A blob (binary large object) stores the contents of a file. When you add a file to Git, it creates a blob object containing the file's data. The blob doesn't include the filename or any other metadata—just the raw content.

For example, if you have a file called hello.txt containing "Hello, World!", Git creates a blob with that exact content. If you rename the file to greeting.txt but keep the content the same, Git uses the same blob—there's no need to store duplicate content.

Tree Objects

A tree object represents a directory. It contains pointers to blobs (files) and other trees (subdirectories), along with the filenames and permissions. This is how Git reconstructs your directory structure.

Think of a tree object as a directory listing. It might contain entries like:

100644 blob a906cb2a4a90... README.md
100644 blob 8f94139338f9... index.js
040000 tree 99f1a6d12f85... src/

This tree contains two files (README.md and index.js) and one subdirectory (src/).
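On disk, a tree is stored in a compact binary form rather than the readable listing above: each entry is "<mode> <name>", a NUL byte, and then the 20 raw bytes of the hash. The Python sketch below serializes and hashes a tree under that assumption. Note that the stored directory mode is 40000, which `git cat-file -p` displays as 040000. The entry hashes used here are well-known IDs reused purely as placeholders.

```python
import hashlib

def tree_object_id(entries: list[tuple[str, str, str]]) -> str:
    """Hash a tree built from (mode, name, hex_oid) entries."""
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git orders directories as if their name ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    header = f"tree {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# Placeholder object IDs: a "hello world" blob, the empty blob, the empty tree.
BLOB_A = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"
BLOB_B = "e69de29bb2d1d6434b8b29ae775ad8d2e7288efb"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

print(tree_object_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

listing = [
    ("100644", "README.md", BLOB_A),
    ("100644", "index.js", BLOB_B),
    ("40000", "src", EMPTY_TREE),
]
# Entry order in the input does not matter; the stored form is always sorted.
print(tree_object_id(listing) == tree_object_id(listing[::-1]))  # True
```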

Commit Objects

A commit object ties everything together. It contains:

A pointer to a tree object (the root directory of your project at that commit)
Pointers to parent commit(s)
Author and committer information
Timestamp
Commit message

tree 99f1a6d12f85...
parent 0d1521e4b3bf...
author Ashish Singodiya <hello@ashish.pro> 1704240000 +0530
committer Ashish Singodiya <hello@ashish.pro> 1704240000 +0530

Add user authentication feature

This creates a complete snapshot of your project at a specific point in time, linked to the previous state through the parent pointer.
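A commit object is plain text, so it is straightforward to assemble one by hand. The following Python sketch builds commit bodies in the layout shown above and hashes them the way Git hashes any object. The empty-tree ID is used as a stand-in tree, and the "Initial commit" message is made up for the example.

```python
import hashlib

def build_commit(tree: str, parents: list[str], identity: str,
                 timestamp: int, tz: str, message: str) -> bytes:
    """Assemble the text of a commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines.append(f"author {identity} {timestamp} {tz}")
    lines.append(f"committer {identity} {timestamp} {tz}")
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

def commit_id(body: bytes) -> str:
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # stand-in tree ID
who = "Ashish Singodiya <hello@ashish.pro>"

root = build_commit(EMPTY_TREE, [], who, 1704240000, "+0530", "Initial commit")
child = build_commit(EMPTY_TREE, [commit_id(root)], who, 1704240000, "+0530",
                     "Add user authentication feature")

print(child.decode())
print(commit_id(child))
```

Because the parent's hash is part of the child's text, the child's own hash depends on its entire ancestry.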

How Git Tracks Changes

Now that we understand Git's object model, let's see how these pieces work together when you use Git commands.

What Happens During `git add`

When you run git add filename.txt, several things happen behind the scenes:

Git calculates the SHA-1 hash: Git reads the file content, prepends a short header, and hashes the result to get the object's identifier
Git creates a blob: If no object with that hash exists yet, Git compresses the content and writes it to the .git/objects/ directory
Git updates the index: The staging area (.git/index) is updated to include this file and its blob hash

At this point, your file's content is safely stored in Git's database, even though you haven't committed yet. The index now knows that this file should be included in the next commit.
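The steps above can be modeled in a few lines of Python. This is a toy version only: the real index is a binary file that also records file modes and metadata, whereas here a plain dictionary stands in for it. The `stage_file` function and the temporary directory are inventions of this sketch.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def stage_file(git_dir: Path, index: dict[str, str],
               path: str, content: bytes) -> str:
    """Toy `git add`: store the content as a blob, then record it in the index."""
    data = f"blob {len(content)}".encode() + b"\x00" + content
    oid = hashlib.sha1(data).hexdigest()           # hash header + content
    obj_path = git_dir / "objects" / oid[:2] / oid[2:]
    if not obj_path.exists():                      # write each blob only once
        obj_path.parent.mkdir(parents=True, exist_ok=True)
        obj_path.write_bytes(zlib.compress(data))
    index[path] = oid                              # point the path at the blob
    return oid

git_dir = Path(tempfile.mkdtemp()) / ".git"  # throwaway directory, not a real repo
index: dict[str, str] = {}

v1 = stage_file(git_dir, index, "filename.txt", b"first draft\n")
v2 = stage_file(git_dir, index, "filename.txt", b"second draft\n")

stored = sorted(p.parent.name + p.name for p in (git_dir / "objects").glob("*/*"))
print(index["filename.txt"] == v2)  # True: the index tracks the latest staged content
print(len(stored))                  # 2: the first blob is still in the object store
```

Staging twice leaves both blobs in the object store, which is why content that was staged but never committed can sometimes still be recovered.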

What Happens During `git commit`

When you run git commit -m "Your message", Git performs these steps:

Git creates tree objects: Git creates tree objects representing your project's directory structure, based on the staged files in the index
Git creates a commit object: Git creates a commit object that points to the root tree, includes metadata, and links to the parent commit
Git updates the branch reference: The current branch pointer (like refs/heads/main) is updated to point to this new commit
Git leaves the index in sync: The index is not emptied. It still lists every tracked file and now matches the new commit, so nothing shows as staged

Here's a visual representation of what happens:

Working Directory  →  Staging Area (Index)  →  Repository (.git)
------------------------------------------------------------------
You edit files     →  git add stages files  →  git commit saves
                                                 snapshot

The Relationship Between Commits, Trees, and Blobs

Let's trace through a concrete example. Suppose you have this project structure:

my-project/
├── README.md
└── src/
    └── app.js

When you commit this, Git creates:

Blob for README.md: Stores the content of README.md
Blob for app.js: Stores the content of app.js
Tree for src/: Points to the app.js blob
Tree for root directory: Points to the README.md blob and the src/ tree
Commit object: Points to the root tree and contains metadata

Commit A
    |
    v
Root Tree
    |
    ├──> README.md (blob)
    |
    └──> src/ (tree)
           |
           └──> app.js (blob)

If you modify app.js and commit again, Git creates:

New blob for app.js: The old blob is kept unchanged
New tree for src/: Points to the new app.js blob
New root tree: Points to the unchanged README.md blob and the new src/ tree
New commit object: Points to the new root tree and to Commit A as its parent

Commit B
    |
    v
Root Tree (new)
    |
    ├──> README.md (blob) [same as before]
    |
    └──> src/ (tree - new)
           |
           └──> app.js (blob - new)

Notice how Git reuses the unchanged README.md blob. This is what makes Git so storage-efficient.
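The two commits above can be reproduced with a small in-memory model. The sketch below is not how Git is implemented; it is a Python approximation that uses the same hashing scheme, with a dictionary as the object store and a fixed, made-up author identity so the result is deterministic. The file contents are invented for the example.

```python
import hashlib
from typing import Optional

objects: dict[str, bytes] = {}  # toy object store: ID -> stored bytes

def write_object(obj_type: str, body: bytes) -> str:
    data = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    oid = hashlib.sha1(data).hexdigest()
    objects[oid] = data  # storing identical content again changes nothing
    return oid

def write_tree(entries: dict[str, tuple[str, str]]) -> str:
    """entries maps name -> (mode, oid); these names need no special ordering."""
    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
        for name, (mode, oid) in sorted(entries.items())
    )
    return write_object("tree", body)

def write_commit(tree: str, parent: Optional[str], message: str) -> str:
    who = "Example Author <author@example.com> 0 +0000"  # made-up identity
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}"]
    return write_object("commit", ("\n".join(lines) + "\n\n" + message + "\n").encode())

def snapshot(readme: bytes, app_js: bytes,
             parent: Optional[str], message: str) -> tuple[str, str]:
    readme_blob = write_object("blob", readme)
    app_blob = write_object("blob", app_js)
    src_tree = write_tree({"app.js": ("100644", app_blob)})
    root_tree = write_tree({
        "README.md": ("100644", readme_blob),
        "src": ("40000", src_tree),
    })
    return write_commit(root_tree, parent, message), readme_blob

commit_a, readme_a = snapshot(b"# My Project\n", b"console.log('v1');\n", None, "Commit A")
count_after_a = len(objects)  # 2 blobs + 2 trees + 1 commit

commit_b, readme_b = snapshot(b"# My Project\n", b"console.log('v2');\n", commit_a, "Commit B")

print(count_after_a)                 # 5
print(readme_a == readme_b)          # True: the README.md blob is reused
print(len(objects) - count_after_a)  # 4: new blob, two new trees, new commit
```

Only the objects on the path from the changed file up to the root are rewritten; everything else is shared between the two snapshots.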

How Git Uses Hashes to Ensure Integrity

Every object in Git is identified by a SHA-1 hash of its content. This cryptographic hash function ensures data integrity in several ways:

Content Verification

Since the hash is derived from the content, any modification to a file, tree, or commit would produce a different hash. If data gets corrupted, the recomputed hash no longer matches the object's name, which is exactly what integrity checks such as git fsck look for.
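The check itself is simple to sketch in Python: decompress the stored bytes, hash them again, and compare the result with the name the object is filed under. This is a simplified illustration of the principle, and `verify_object` is a hypothetical helper rather than a Git function.

```python
import hashlib
import zlib

def verify_object(oid: str, compressed: bytes) -> bool:
    """Recompute the hash of a stored object and compare it with its name."""
    try:
        data = zlib.decompress(compressed)
    except zlib.error:
        return False  # damaged beyond decompression
    return hashlib.sha1(data).hexdigest() == oid

good = b"blob 12\x00hello world\n"
oid = hashlib.sha1(good).hexdigest()
tampered = b"blob 12\x00hello w0rld\n"  # a single character changed

print(verify_object(oid, zlib.compress(good)))      # True
print(verify_object(oid, zlib.compress(tampered)))  # False
print(verify_object(oid, b"not zlib data"))         # False
```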

Immutability

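Once an object is created with a specific hash, it cannot be changed without changing its hash. This makes Git's history tamper-evident: because each commit records the hashes of its tree and its parents, modifying an old commit changes its hash, which in turn changes the hash of every commit that descends from it. You can't quietly rewrite an old commit, because every reference to it and every later hash would no longer match.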

Efficient Storage and Comparison

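Git can quickly determine if two files or directories are identical by comparing their hashes instead of comparing their entire contents. If two tree hashes match, for example, Git knows that everything beneath them is identical and can skip the whole subtree. This helps keep operations like git status and git diff fast, even on large repositories.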

Deduplication

If the same file content exists in multiple places or at multiple points in history, Git stores it only once. The hash acts as a unique identifier that lets Git recognize duplicate content instantly.

Building a Mental Model of Git

Rather than memorizing Git commands, it's far more powerful to understand the underlying model. Once you grasp how Git works internally, commands become intuitive:

git add creates blobs and updates the index
git commit creates tree and commit objects, then updates branch references
git branch just creates a new reference pointing to a commit
git checkout updates HEAD and your working directory to match a commit
git merge creates a new commit with multiple parents (or, when a fast-forward is possible, simply moves the branch pointer)

When you understand that branches are just lightweight pointers to commits, branching becomes trivial. When you realize that commits form a directed acyclic graph, operations like merge and rebase make perfect sense.
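This mental model fits in a short Python sketch. The commit names (A, B, C, D, M) and branch names are hypothetical. The history is a dictionary mapping each commit to its parents, and each branch is simply a name bound to one commit.

```python
# Toy history: each commit maps to its list of parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],       # made on main
    "D": ["B"],       # made on feature
    "M": ["C", "D"],  # merge commit with two parents
}

# A branch is nothing more than a name bound to one commit.
branches = {"main": "C", "feature": "D"}

def ancestors(commit: str) -> set[str]:
    """All commits reachable from `commit` by following parent pointers."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

branches["experiment"] = branches["main"]  # like `git branch experiment`: one new pointer
branches["main"] = "M"                     # merging feature moves main to the merge commit

print(sorted(ancestors(branches["main"])))        # ['A', 'B', 'C', 'D', 'M']
print(sorted(ancestors(branches["experiment"])))  # ['A', 'B', 'C']
print(sorted(ancestors(branches["feature"])))     # ['A', 'B', 'D']
```

Creating the experiment branch copied no history at all; it only added one entry to the dictionary, which is why branching in Git is so cheap.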

Think of your Git repository as a graph of snapshots, where each commit is a complete picture of your project at a moment in time, connected to its predecessors through parent pointers. The .git folder is the filing system that stores all these snapshots efficiently.

Practical Insights

Understanding Git's internals has practical benefits:

Faster Recovery: When things go wrong, you can navigate the object database directly to recover lost commits or understand what happened.

Better Performance: Knowing how Git stores data helps you structure your repository efficiently. For example, every modified version of a large binary file is stored as a new blob, and such files often compress poorly, which is why committing them repeatedly bloats the repository.

Smarter Workflows: Understanding that branches are just pointers makes you more confident about creating and experimenting with branches.

Debugging Mysteries: When Git behaves unexpectedly, knowledge of internals helps you diagnose and fix issues rather than blindly following Stack Overflow answers.

Exploring Your Own .git Folder

Want to see this in action? Try these commands in one of your Git repositories:

# View the contents of HEAD
cat .git/HEAD

# List local branches stored as loose files (packed refs live in .git/packed-refs)
ls .git/refs/heads/

# View a commit object (replace hash with one from your repo)
git cat-file -p <commit-hash>

# View a tree object
git cat-file -p <tree-hash>

# View a blob object
git cat-file -p <blob-hash>

# See every file in the object store (loose objects and packfiles)
find .git/objects -type f

These commands let you peek into Git's database and see exactly how your project is stored.

Conclusion

Git's internal architecture is remarkably elegant. By storing content as immutable objects identified by cryptographic hashes, Git achieves integrity, efficiency, and distributed collaboration. The .git folder contains everything needed to reconstruct your entire project history.

Understanding these internals transforms Git from a mysterious tool into a logical system. You'll make better decisions about branching, merging, and repository organization. You'll troubleshoot problems more effectively. And most importantly, you'll feel confident using Git rather than intimidated by it.

The next time you run git commit, you'll know exactly what's happening: Git is taking the blobs it created when you staged your files, assembling them into trees, wrapping everything in a commit object, and updating a branch pointer. It's not magic, just beautifully simple engineering.

Now that you understand what's under the hood, you're ready to use Git like a pro.

Have questions about Git internals or want to share something interesting you discovered in your .git folder? Drop a comment below!
