
Repository Structure

Understanding Git's repository organization, directory structure, and the relationship between working directory, staging area, and Git database.

Repository Overview

The Three Areas

Git manages three distinct areas:

  1. Working Directory - Your current files and changes
  2. Staging Area (Index) - Prepared changes for next commit
  3. Git Directory - Object database and repository metadata

# Visual representation:
Working Directory --(git add)--> Staging Area --(git commit)--> Git Directory
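The flow above can be observed with `git status --porcelain`. This is a minimal sketch that assumes a throwaway repository created with `mktemp`; the file name `notes.txt` and the user identity are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "hello" > notes.txt
untracked=$(git status --porcelain)   # file exists only in the working directory
git add notes.txt
staged=$(git status --porcelain)      # file content is now recorded in the index
git commit -q -m "Add notes"
committed=$(git status --porcelain)   # empty: working directory, index and HEAD agree

printf '%s\n' "before add: $untracked" "after add: $staged" "after commit: ${committed:-clean}"
```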

Repository Types

Local Repository:

  • Complete Git repository on your machine
  • Contains full history and all objects
  • Can work offline

Remote Repository:

  • Repository hosted on server (GitHub, GitLab, etc.)
  • Shared among team members
  • Often treated as the central source of truth by convention (Git itself gives no repository special status)

Bare Repository:

  • Repository without working directory
  • Only contains Git database
  • Used for server repositories
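The difference between a bare and a regular repository can be checked directly. A small sketch, assuming temporary directories; the names `server.git` and `work` are hypothetical.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/server.git"   # bare: no working directory
git init -q "$base/work"                # regular: working directory plus .git/

bare_flag=$(git -C "$base/server.git" rev-parse --is-bare-repository)
work_flag=$(git -C "$base/work" rev-parse --is-bare-repository)
echo "server.git bare: $bare_flag, work bare: $work_flag"

# In the bare repository, HEAD, config, objects/ and refs/ sit at the top level
ls "$base/server.git"
```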

Git Directory Structure

Core Components

# .git directory structure
.git/
├── HEAD # Current branch pointer
├── config # Repository configuration
├── description # Repository description
├── hooks/ # Hook scripts
├── info/ # Repository-local excludes (not shared with clones)
├── objects/ # Object database
├── refs/ # Reference storage
├── index # Staging area
├── logs/ # Reference logs
└── packed-refs # Packed references

HEAD Reference

# HEAD points to current branch
cat .git/HEAD
# Output: ref: refs/heads/main

# Or directly to commit (detached HEAD)
# Output: a1b2c3d4e5f6789012345678901234567890abcd

# View current HEAD commit
git rev-parse HEAD
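Both forms of HEAD can be reproduced in a scratch repository. This sketch assumes the default files-based reference storage, where `.git/HEAD` is a plain text file; the branch name `main` is set explicitly so the result does not depend on local configuration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the unborn branch "main"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

attached=$(cat .git/HEAD)     # symbolic reference to the current branch
git checkout -q --detach      # detach HEAD at the current commit
detached=$(cat .git/HEAD)     # now a raw commit ID

echo "attached: $attached"
echo "detached: $detached"
```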

Configuration Files

# Repository-specific config
cat .git/config

# Example content:
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = https://github.com/user/repo.git
fetch = +refs/heads/*:refs/remotes/origin/*

Object Database

Objects Directory

# Object storage structure
.git/objects/
├── 01/ # First 2 chars of hash
│ └── 23456789... # Remaining 38 chars
├── ab/
│ └── cdef1234...
├── info/ # Object metadata
├── pack/ # Pack files
└── tmp_obj_* # Temporary objects

Object Types and Storage

# View object type
git cat-file -t a1b2c3d4

# View object size
git cat-file -s a1b2c3d4

# View object content
git cat-file -p a1b2c3d4

# List all objects, loose and packed, with type and size
git cat-file --batch-all-objects --batch-check
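To see how an object ID maps to a path under `.git/objects/`, one can write a blob by hand. A minimal sketch in a temporary repository; the content `hello` is arbitrary.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

oid=$(echo "hello" | git hash-object -w --stdin)   # -w stores the blob as a loose object
dir=$(echo "$oid" | cut -c1-2)                     # first two hex characters: directory
file=$(echo "$oid" | cut -c3-)                     # remaining characters: file name

type=$(git cat-file -t "$oid")
content=$(git cat-file -p "$oid")
echo "$oid is a $type stored at .git/objects/$dir/$file"
```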

Pack Files

# Pack file structure
.git/objects/pack/
├── pack-abc123.idx # Pack index
├── pack-abc123.pack # Pack data
└── pack-abc123.rev # Reverse index (Git 2.31+)

# View pack contents
git verify-pack -v .git/objects/pack/pack-*.idx | head -20
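The move from loose objects into a pack file can be watched with `git count-objects -v`. This is a sketch that assumes a scratch repository with a single commit; `README.md` and the identity are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# One blob, one tree and one commit exist as loose objects at this point
before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
after=$(git count-objects -v | awk '/^count:/ {print $2}')
in_pack=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before gc: $before, loose after gc: $after, packed: $in_pack"
ls .git/objects/pack/
```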

References System

References Directory

# References structure
.git/refs/
├── heads/ # Local branches
│ ├── main
│ └── feature-branch
├── remotes/ # Remote-tracking branches
│ └── origin/
│   ├── main
│   └── feature-branch
└── tags/ # Tags
  └── v1.0.0

Branch References

# View branch reference (the file is absent if the ref has been packed)
cat .git/refs/heads/main
# Output: commit-hash

# View all references
git show-ref

# Create reference manually
git update-ref refs/heads/new-branch commit-hash
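A branch is nothing more than a reference holding a commit ID, which a short experiment makes visible. This sketch assumes a scratch repository using the default files-based reference storage; `new-branch` is the placeholder name used above.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

head=$(git rev-parse HEAD)
git update-ref refs/heads/new-branch "$head"   # create the branch without git branch
raw=$(cat .git/refs/heads/new-branch)          # the loose ref file holds the commit ID
resolved=$(git rev-parse new-branch)

git show-ref    # lists every reference with the commit it points to
```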

Remote References

# Remote tracking branches
ls .git/refs/remotes/origin/

# View remote reference
cat .git/refs/remotes/origin/main

# Update remote references
git fetch origin

Index (Staging Area)

Index Structure

# Index file location
ls -la .git/index

# View index contents
git ls-files --stage

# Example output:
# 100644 a1b2c3d4... 0 README.md
# 100644 e5f6789... 0 src/main.js

Index Entries

Each index entry contains:

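  • File mode - File type and permission bits (100644, 100755, etc.)
  • Object hash - ID of the blob object (SHA-1 by default), computed over a short header plus the file contents
  • Stage number - 0 for normal, 1-3 for conflicts (1 = common ancestor, 2 = ours, 3 = theirs)
  • Filename - Relative path from repository root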
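The fields listed above can be read back from `git ls-files --stage`. A minimal sketch, assuming a scratch repository and a hypothetical `README.md`.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
echo "hello" > README.md
git add README.md

# Each line has the form: <mode> <object id> <stage><TAB><path>
entry=$(git ls-files --stage README.md)
read -r mode oid stage path <<EOF
$entry
EOF

blob_id=$(git hash-object README.md)   # ID Git computes for the file's blob
echo "mode=$mode stage=$stage path=$path"
echo "index entry ID: $oid"
echo "blob ID:        $blob_id"
```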

Index Operations

# Add file to index
git add filename.txt

# Stage modifications and deletions of already-tracked files
git add -u

# Remove file from index
git rm --cached filename.txt

# Write the index as a tree object and print its ID
git write-tree

Working Directory

Working Directory Structure

# Working directory is your project files
project/
├── .git/ # Git repository (hidden)
├── README.md
├── src/
│ ├── main.js
│ └── utils.js
└── package.json

File States

Files in working directory have different states:

  • Tracked - Files Git knows about
  • Untracked - Files Git doesn't know about
  • Modified - Tracked files with changes
  • Staged - Changes added to the index, ready for the next commit

# Check file states
git status

# View changes in working directory
git diff

# View staged changes
git diff --staged
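The following sketch puts one file into each state and shows which command reports it. It assumes a scratch repository; the file names `a.txt`, `b.txt` and `c.txt` are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

echo "two" >> a.txt                    # modified, not staged
echo "two" >> b.txt && git add b.txt   # modified and staged
echo "new" > c.txt                     # untracked

unstaged=$(git diff --name-only)                        # working directory vs index
staged=$(git diff --staged --name-only)                 # index vs HEAD
untracked=$(git ls-files --others --exclude-standard)   # files Git does not track
echo "unstaged: $unstaged, staged: $staged, untracked: $untracked"
```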

Hooks Directory

Hook Scripts

# Hooks directory
.git/hooks/
├── applypatch-msg.sample
├── commit-msg.sample
├── pre-commit.sample
├── pre-push.sample
├── pre-rebase.sample
└── prepare-commit-msg.sample

# Activate hook by removing .sample extension
mv .git/hooks/pre-commit.sample .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

Hook Types

Client-side hooks:

  • pre-commit - Before commit is created
  • prepare-commit-msg - Before commit message editor
  • commit-msg - After commit message is entered
  • post-commit - After commit is created
  • pre-push - Before push to remote

Server-side hooks:

  • pre-receive - Runs once after objects are transferred, before any reference is updated
  • update - Runs once per reference, before that reference is updated
  • post-receive - Runs once after all references are updated
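As an illustration of a client-side hook, the sketch below installs a pre-commit script that rejects a commit. The policy (refusing staged lines that contain `TODO`) and the file name `draft.txt` are hypothetical, and the example assumes hooks are read from `.git/hooks/` (no `core.hooksPath` override).

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Hypothetical policy: refuse the commit while an added line contains TODO
if git diff --cached | grep -q '^+.*TODO'; then
    echo "pre-commit: staged changes contain TODO" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo "TODO: finish this" > draft.txt
git add draft.txt
# A non-zero exit status from the hook aborts the commit
if git commit -q -m "Add draft" 2>/dev/null; then result=committed; else result=rejected; fi
echo "commit was $result"
```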

Logs Directory

Reference Logs

# Logs directory
.git/logs/
├── HEAD # HEAD movement log
└── refs/
  ├── heads/
  │ └── main # Branch-specific logs
  └── remotes/
    └── origin/
      └── main # Remote branch logs

Reflog Operations

# View reflog
git reflog

# View specific branch reflog
git reflog show main

# View the reflog starting from where HEAD was two days ago
git reflog show HEAD@{2.days.ago}
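A common use of the reflog is recovering a commit after a hard reset. A minimal sketch, assuming a scratch repository with two empty commits.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "First"
git commit -q --allow-empty -m "Second"
second=$(git rev-parse HEAD)

git reset -q --hard HEAD~1             # the branch no longer points at "Second"
previous=$(git rev-parse 'HEAD@{1}')   # where HEAD pointed before the reset
git reset -q --hard "$previous"        # move the branch back
restored=$(git rev-parse HEAD)

git reflog    # shows both commits and both resets
```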

Info Directory

Local Excludes

# Repository-local excludes (same syntax as .gitignore, but never committed or shared)
.git/info/exclude

# Example content:
*.tmp
*.log
*~
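Whether a pattern in `.git/info/exclude` takes effect can be confirmed with `git check-ignore`. This sketch assumes a scratch repository; the file names `debug.log` and `notes.txt` are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
mkdir -p .git/info
echo "*.log" >> .git/info/exclude   # applies to this clone only
touch debug.log notes.txt

git check-ignore -v debug.log       # prints the source file, line and pattern that matched
visible=$(git status --porcelain)   # only the file that is not excluded is reported
echo "$visible"
```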

Sparse Checkout

# Sparse checkout patterns
.git/info/sparse-checkout

# Example content:
src/
docs/
!docs/temp/

Repository Initialization

New Repository

# Initialize new repository
git init

# Initialize bare repository
git init --bare

# Initialize with a custom initial branch name (Git 2.28+)
git init --initial-branch=main

Repository Format

# Repository format version
git config --get core.repositoryformatversion
# Output: 0

# Version 0: Basic Git repository
# Version 1: Required when extensions (such as extensions.objectFormat) are in use

Repository Maintenance

Garbage Collection

# Pack loose objects and prune unreachable ones past the grace period (2 weeks by default)
git gc

# Aggressive garbage collection
git gc --aggressive

# Prune unreachable loose objects older than 2 weeks
git prune --expire=2.weeks.ago

Repository Verification

# Check repository integrity
git fsck --full

# Check object connectivity
git fsck --connectivity-only

# Verify pack files
git verify-pack -v .git/objects/pack/pack-*.idx

Advanced Structure

Alternate Object Directories

# Share objects with other repositories
echo "/path/to/shared/objects" > .git/objects/info/alternates

# View alternate paths
cat .git/objects/info/alternates

Worktrees

# Multiple working directories
git worktree add ../feature-branch feature-branch

# Worktree administrative files
.git/worktrees/
└── feature-branch/
  ├── HEAD
  ├── commondir
  └── gitdir
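The link between a linked worktree and the main repository can be inspected after adding one. A sketch under the assumption of a scratch repository in a temporary directory; the directory name `main` is hypothetical, and `feature-branch` follows the example above.

```shell
set -e
base=$(mktemp -d)
git init -q "$base/main" && cd "$base/main"
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"
git branch feature-branch

git worktree add ../feature-branch feature-branch >/dev/null 2>&1

# In the linked worktree, .git is a file pointing back at the administrative directory
cat ../feature-branch/.git
ls .git/worktrees/feature-branch/
branch=$(git -C ../feature-branch rev-parse --abbrev-ref HEAD)
git worktree list
```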

Repository Cloning

Clone Process

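# What happens during clone (simplified):
# 1. Create the target directory and its .git directory
# 2. Initialize the object database
# 3. Add the remote "origin"
# 4. Fetch the objects reachable from the remote's branches and tags
# 5. Create remote-tracking branches under refs/remotes/origin/
# 6. Check out the default branch into the working directory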

Clone Variations

# Shallow clone (limited history)
git clone --depth 1 repo-url

# Bare clone (no working directory)
git clone --bare repo-url

# Check out a specific branch after cloning (add --single-branch to fetch only that branch)
git clone --branch feature-branch repo-url
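The effect of `--depth` on history can be compared against a full clone. This sketch builds a local source repository with three commits and clones it through a `file://` URL, since Git ignores `--depth` for plain local paths; all paths are temporary and hypothetical.

```shell
set -e
src=$(mktemp -d) && cd "$src"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do git commit -q --allow-empty -m "Commit $n"; done

dst=$(mktemp -d)
git clone -q --depth 1 "file://$src" "$dst/shallow"   # only the most recent commit
git clone -q "file://$src" "$dst/full"                # complete history

shallow_count=$(git -C "$dst/shallow" rev-list --count HEAD)
full_count=$(git -C "$dst/full" rev-list --count HEAD)
echo "shallow clone: $shallow_count commit(s), full clone: $full_count commit(s)"
```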

Performance Considerations

Repository Size

# Check repository size
du -sh .git/

# Check object count
git count-objects -v

# Find large objects
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
awk '/^blob/ {print substr($0,6)}' | \
sort --numeric-sort --key=2 | \
tail -10

Optimization

# Repack objects
git repack -adf

# Create multi-pack index
git multi-pack-index write

# Pack references into .git/packed-refs
git pack-refs --all
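What `git pack-refs --all` does to the references directory can be checked in a scratch repository. The sketch assumes the default files-based reference storage; the tag `v1.0.0` is hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the unborn branch "main"
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"
git tag v1.0.0
head=$(git rev-parse HEAD)

git pack-refs --all        # move loose refs into the single packed-refs file
cat .git/packed-refs

# The loose file is gone, but the reference still resolves
resolved=$(git rev-parse main)
```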

Security Considerations

File Permissions

# Git directory permissions
chmod 755 .git/
chmod 644 .git/config
chmod 755 .git/hooks/*

# Protect sensitive files
chmod 600 .git/config # If it contains credentials (a credential helper is safer)

Repository Validation

# Verify repository integrity
git fsck --strict

# List objects not reachable from any reference
git fsck --unreachable

# Validate pack files
git verify-pack -v .git/objects/pack/pack-*.idx

Troubleshooting

Common Issues

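  1. Corrupted index

    rm .git/index
    git reset
  2. Missing objects

    git fsck --full # Identify which objects are missing
    git fetch --refetch origin # Git 2.36+: re-download all objects from the remote
  3. Broken references

    git update-ref -d refs/heads/broken-branch
    git branch broken-branch commit-hash # Recreate it from a known-good commit (see git reflog)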
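The index rebuild from item 1 can be rehearsed safely in a scratch repository. In this sketch the corruption is simulated by overwriting `.git/index` with arbitrary bytes; `README.md` and the identity are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

printf 'garbage' > .git/index     # simulate a corrupted index
if git status >/dev/null 2>&1; then state=ok; else state=broken; fi

rm .git/index
git reset -q                      # rebuild the index from HEAD; the working directory is untouched
tracked=$(git ls-files)
remaining=$(git status --porcelain)
echo "index was $state; tracked after rebuild: $tracked"
```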

Recovery Procedures

# Backup repository
cp -r .git .git.backup

# Restore from backup
rm -rf .git
mv .git.backup .git

# Reset index and working directory to HEAD (discards uncommitted changes)
git reset --hard HEAD

Best Practices

Repository Organization

  1. Keep .git directory clean - Don't manually edit files
  2. Regular maintenance - Run git gc periodically
  3. Monitor size - Track repository growth
  4. Backup important repositories - Protect against corruption
  5. Use appropriate clone types - Shallow for CI, full for development

Security Practices

  1. Protect .git directory - Never expose via web server
  2. Validate repository integrity - Regular fsck checks
  3. Use signed commits - For critical repositories
  4. Monitor access - Track who accesses repositories
  5. Backup and recovery - Plan for disaster recovery

Summary

Git's repository structure provides:

  • Separation of concerns - Working directory, staging, and database
  • Integrity - Every object is named by a hash of its content, so corruption is detectable
  • Efficiency - Optimized storage and access patterns
  • Flexibility - Multiple workflow support
  • Robustness - Built-in corruption detection and recovery

Understanding this structure is essential for:

  • Effective Git usage
  • Repository maintenance
  • Performance optimization
  • Troubleshooting issues
  • Advanced Git operations

See Git Data Model for object details and the Git documentation for how commits build on this structure.