Repository Structure
This section covers how a Git repository is organized: the layout of the .git directory and the relationship between the working directory, the staging area, and the Git database.
Repository Overview
The Three Areas
Git manages three distinct areas:
- Working Directory - Your current files and changes
- Staging Area (Index) - Prepared changes for next commit
- Git Directory - Object database and repository metadata
# Visual representation:
Working Directory → (git add) → Staging Area → (git commit) → Git Directory
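As a rough illustration of the flow above, the sketch below follows one file through the three areas in a throwaway repository. The file name `notes.txt`, the temporary directory, and the `Demo` identity are placeholders chosen for this example.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository (placeholder location)
cd "$repo"
git init -q

echo "hello" > notes.txt
in_index_before=$(git ls-files)              # empty: the file exists only in the working directory

git add notes.txt
in_index_after=$(git ls-files)               # notes.txt: now recorded in the staging area

# Placeholder identity so the commit works without global configuration
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add notes"
in_commit=$(git ls-tree --name-only HEAD)    # notes.txt: now stored in the Git directory

echo "index before add: '$in_index_before'"
echo "index after add:  '$in_index_after'"
echo "tree of HEAD:     '$in_commit'"
```

`git ls-files` reads the index and `git ls-tree HEAD` reads the committed tree, so the two commands show which area currently knows about the file.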
Repository Types
Local Repository:
- Complete Git repository on your machine
- Contains full history and all objects
- Can work offline
Remote Repository:
- Repository hosted on server (GitHub, GitLab, etc.)
- Shared among team members
- Usually treated as the team's source of truth (by convention; every clone is a full repository)
Bare Repository:
- Repository without working directory
- Only contains Git database
- Used for server repositories
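The difference between a regular and a bare repository can be checked directly. This is a small sketch; the directory names `work` and `server.git` are placeholders.

```shell
set -e
base=$(mktemp -d)                       # scratch area (placeholder location)
git init -q "$base/work"                # regular repository with a working directory
git init -q --bare "$base/server.git"   # bare repository

work_is_bare=$(git -C "$base/work" rev-parse --is-bare-repository)
server_is_bare=$(git -C "$base/server.git" rev-parse --is-bare-repository)
echo "work: $work_is_bare, server.git: $server_is_bare"

# In a bare repository HEAD, objects/ and refs/ sit at the top level;
# there is no .git subdirectory and no checked-out files.
ls "$base/server.git"
```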
Git Directory Structure
Core Components
# .git directory structure
.git/
├── HEAD # Current branch pointer
├── config # Repository configuration
├── description # Repository description
├── hooks/ # Hook scripts
├── info/ # Per-repository excludes (info/exclude)
├── objects/ # Object database
├── refs/ # Reference storage
├── index # Staging area
├── logs/ # Reference logs
└── packed-refs # Packed references
HEAD Reference
# HEAD points to current branch
cat .git/HEAD
# Output: ref: refs/heads/main
# Or directly to commit (detached HEAD)
# Output: a1b2c3d4e5f6789012345678901234567890abcd
# View current HEAD commit
git rev-parse HEAD
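The two forms of HEAD can be observed in a scratch repository. This sketch assumes the default file-based reference storage; the branch name `main` and the `Demo` identity are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # make the branch name predictable
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"

attached=$(git symbolic-ref HEAD)         # HEAD is a symbolic ref to the branch
git checkout -q --detach                  # detach HEAD at the current commit
detached=$(cat .git/HEAD)                 # HEAD now holds a raw commit hash
commit=$(git rev-parse HEAD)

echo "attached: $attached"
echo "detached: $detached"
```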
Configuration Files
# Repository-specific config
cat .git/config
# Example content:
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = https://github.com/user/repo.git
fetch = +refs/heads/*:refs/remotes/origin/*
Object Database
Objects Directory
# Object storage structure
.git/objects/
├── 01/ # First 2 chars of hash
│ └── 23456789... # Remaining 38 chars
├── ab/
│ └── cdef1234...
├── info/ # Object metadata
├── pack/ # Pack files
└── tmp_obj_* # Temporary objects
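To see the two-character fan-out in practice, the sketch below writes one blob and derives its on-disk path from the hash. It assumes the object is stored loose (not yet packed), which is the case right after `git hash-object -w`.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store the text "hello" as a blob and capture its hash
hash=$(echo "hello" | git hash-object -w --stdin)

dir=$(printf '%s' "$hash" | cut -c1-2)    # first 2 characters: directory name
file=$(printf '%s' "$hash" | cut -c3-)    # remaining characters: file name
ls -l ".git/objects/$dir/$file"

obj_type=$(git cat-file -t "$hash")       # object type
obj_body=$(git cat-file -p "$hash")       # object content
echo "$obj_type: $obj_body"
```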
Object Types and Storage
# View object type
git cat-file -t a1b2c3d4
# View object size
git cat-file -s a1b2c3d4
# View object content
git cat-file -p a1b2c3d4
# Count files under objects/ (loose objects plus pack and info files)
find .git/objects -type f | wc -l
Pack Files
# Pack file structure
.git/objects/pack/
├── pack-abc123.idx # Pack index
├── pack-abc123.pack # Pack data
└── pack-abc123.rev # Reverse index (Git 2.31+)
# View pack contents
git verify-pack -v .git/objects/pack/pack-*.idx | head -20
References System
References Directory
# References structure
.git/refs/
├── heads/ # Local branches
│ ├── main
│ └── feature-branch
├── remotes/ # Remote-tracking branches
│ └── origin/
│ ├── main
│ └── feature-branch
└── tags/ # Tags
└── v1.0.0
Branch References
# View branch reference (the file is absent if the ref has been packed into packed-refs)
cat .git/refs/heads/main
# Output: commit-hash
# View all references
git show-ref
# Create reference manually
git update-ref refs/heads/new-branch commit-hash
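A branch is little more than a file holding a commit hash, which the following sketch tries to show. It assumes the default file-based reference storage; `new-branch` and the `Demo` identity are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"

tip=$(git rev-parse HEAD)
git update-ref refs/heads/new-branch "$tip"     # create the reference directly

stored=$(cat .git/refs/heads/new-branch)        # the file contains just the hash
listed=$(git for-each-ref --format='%(refname:short)' refs/heads/new-branch)
echo "$listed -> $stored"
```

Using `git update-ref` rather than writing the file by hand lets Git validate the value and record the change in the reflog.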
Remote References
# Remote tracking branches
ls .git/refs/remotes/origin/
# View remote reference
cat .git/refs/remotes/origin/main
# Update remote references
git fetch origin
Index (Staging Area)
Index Structure
# Index file location
ls -la .git/index
# View index contents
git ls-files --stage
# Example output:
# 100644 a1b2c3d4... 0 README.md
# 100644 e5f6789... 0 src/main.js
Index Entries
Each index entry contains:
- File mode - Permissions (100644, 100755, etc.)
- Object hash - Hash of the blob object that stores the file contents (SHA-1 by default)
- Stage number - 0 for normal, 1-3 for conflicts
- Filename - Relative path from repository root
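The fields listed above can be read back from `git ls-files --stage`. This is a minimal sketch that assumes a path without spaces; `README.md` is a placeholder file.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "hello" > README.md
git add README.md

entry=$(git ls-files --stage README.md)
# Output format: <mode> <object hash> <stage><TAB><path>
read -r mode hash stage path <<EOF
$entry
EOF

blob=$(git hash-object README.md)   # hash Git computes for the same content
echo "mode=$mode stage=$stage path=$path"
echo "index hash:  $hash"
echo "hash-object: $blob"
```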
Index Operations
# Add file to index
git add filename.txt
# Stage modifications and deletions of already-tracked files
git add -u
# Remove file from index
git rm --cached filename.txt
# Write the index out as a tree object and print its hash
git write-tree
Working Directory
Working Directory Structure
# Working directory is your project files
project/
├── .git/ # Git repository (hidden)
├── README.md
├── src/
│ ├── main.js
│ └── utils.js
└── package.json
File States
Files in working directory have different states:
- Tracked - Files Git knows about
- Untracked - Files Git doesn't know about
- Modified - Tracked files with changes
- Staged - Files ready for commit
# Check file states
git status
# View changes in working directory
git diff
# View staged changes
git diff --staged
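The states can be told apart with the two-column codes of `git status --porcelain`. The sketch below sets up one file in each state; all file names and the `Demo` identity are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "v1" > tracked.txt
git add tracked.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add tracked.txt"

echo "v2" >> tracked.txt        # modified: tracked, change not staged
echo "new" > staged.txt
git add staged.txt              # staged: ready for the next commit
echo "tmp" > untracked.txt      # untracked: Git has never been told about it

# First column = staging area, second column = working directory
states=$(git status --porcelain)
echo "$states"
```

In this format ` M` marks an unstaged modification, `A ` a newly staged file, and `??` an untracked file.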
Hooks Directory
Hook Scripts
# Hooks directory
.git/hooks/
├── applypatch-msg.sample
├── commit-msg.sample
├── pre-commit.sample
├── pre-push.sample
├── pre-rebase.sample
└── prepare-commit-msg.sample
# Activate hook by removing .sample extension
mv .git/hooks/pre-commit.sample .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
Hook Types
Client-side hooks:
- pre-commit - Before commit is created
- prepare-commit-msg - Before commit message editor opens
- commit-msg - After commit message is entered
- post-commit - After commit is created
- pre-push - Before push to remote
Server-side hooks:
- pre-receive - After objects are received, before any reference is updated
- update - Before each individual reference is updated
- post-receive - After all references are updated
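As an illustration of a client-side hook, here is a minimal pre-commit sketch. The policy it enforces (rejecting staged lines that contain the marker `DO NOT COMMIT`) is invented for this example and is not something Git ships.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p .git/hooks

# Hypothetical policy: block commits that add a "DO NOT COMMIT" marker
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
if git diff --cached | grep -q '^+.*DO NOT COMMIT'; then
    echo "pre-commit: remove the DO NOT COMMIT marker first" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo "DO NOT COMMIT: debug code" > scratch.txt
git add scratch.txt

# A non-zero exit status from the hook aborts the commit
if git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Should be blocked"; then
    result=committed
else
    result=blocked
fi
echo "result: $result"
```

Hooks live outside the tracked content, so a script like this would have to be installed in each clone separately.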
Logs Directory
Reference Logs
# Logs directory
.git/logs/
├── HEAD # HEAD movement log
└── refs/
├── heads/
│ └── main # Branch-specific logs
└── remotes/
└── origin/
└── main # Remote branch logs
Reflog Operations
# View reflog
git reflog
# View specific branch reflog
git reflog show main
# View reflog entries starting from where HEAD was two days ago
git reflog show 'HEAD@{2.days.ago}'
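One common use of the reflog is getting back a commit after a hard reset. The sketch below assumes reflogs are enabled, which is the default for non-bare repositories; the commit messages and the `Demo` identity are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }

commit "first"
commit "second"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1               # "second" is no longer on the branch

recovered=$(git rev-parse 'HEAD@{1}')    # where HEAD pointed before the reset
git reset -q --hard "$recovered"         # move the branch back
echo "restored: $(git log -1 --format=%s)"
```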
Info Directory
Local Excludes
# Per-repository excludes (same syntax as .gitignore, but never committed or shared)
.git/info/exclude
# Example content:
*.tmp
*.log
*~
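Patterns in `.git/info/exclude` apply only to the repository that contains the file. A quick way to confirm which file caused a path to be ignored is `git check-ignore -v`, sketched here with placeholder file names.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir -p .git/info                       # normally created by git init already
echo '*.tmp' >> .git/info/exclude

touch build.tmp keep.txt
reported=$(git status --porcelain)       # build.tmp does not show up
matched=$(git check-ignore -v build.tmp) # names the file and pattern that matched

echo "status:  $reported"
echo "ignored: $matched"
```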
Sparse Checkout
# Sparse checkout patterns
.git/info/sparse-checkout
# Example content:
src/
docs/
!docs/temp/
Repository Initialization
New Repository
# Initialize new repository
git init
# Initialize bare repository
git init --bare
# Initialize with a custom initial branch name (Git 2.28+)
git init --initial-branch=main
Repository Format
# Repository format version
git config core.repositoryformatversion
# Output: 0
# Version 0: Basic Git repository
# Version 1: Extensions supported
Repository Maintenance
Garbage Collection
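# Pack loose objects and prune unreachable ones older than the grace period (2 weeks by default)
git gc
# Aggressive garbage collection (slower, recomputes deltas)
git gc --aggressive
# Prune unreachable loose objects older than 2 weeks
git prune --expire=2.weeks.ago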
Repository Verification
# Check repository integrity
git fsck --full
# Check object connectivity
git fsck --connectivity-only
# Verify pack files
git verify-pack -v .git/objects/pack/pack-*.idx
Advanced Structure
Alternate Object Directories
# Share objects with other repositories
echo "/path/to/shared/objects" > .git/objects/info/alternates
# View alternate paths
cat .git/objects/info/alternates
Worktrees
# Multiple working directories
git worktree add ../feature-branch feature-branch
# Worktree administrative files
.git/worktrees/
└── feature-branch/
├── HEAD
├── commondir
└── gitdir
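The link between a linked worktree and the main repository can be inspected as follows. This is a sketch; the directory names `main` and `feature-wt` and the `Demo` identity are placeholders.

```shell
set -e
base=$(mktemp -d)
git init -q "$base/main"
cd "$base/main"
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"
git branch feature-branch

# Create a second working directory for feature-branch
git worktree add ../feature-wt feature-branch >/dev/null 2>&1

# In the linked worktree, .git is a plain file pointing back to the main repository
pointer=$(cat "$base/feature-wt/.git")
echo "$pointer"
ls .git/worktrees/feature-wt
```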
Repository Cloning
Clone Process
# What happens during clone:
# 1. Create .git directory
# 2. Initialize object database
# 3. Add remote origin
# 4. Fetch all objects
# 5. Create working directory
# 6. Checkout default branch
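The steps above can be approximated by hand with ordinary commands. This is only a sketch of the idea, not what `git clone` executes internally; the `source` and `copy` directories, the branch name `main`, and the `Demo` identity are placeholders.

```shell
set -e
base=$(mktemp -d)

# A small source repository to copy from
git init -q "$base/source"
cd "$base/source"
git symbolic-ref HEAD refs/heads/main
echo "data" > file.txt
git add file.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add file"

# Manual approximation of a clone
mkdir "$base/copy"
cd "$base/copy"
git init -q                               # steps 1-2: .git directory and empty object database
git remote add origin "$base/source"      # step 3: add remote origin
git fetch -q origin                       # step 4: fetch objects and remote-tracking refs
git checkout -q -b main origin/main       # steps 5-6: populate the working directory

cat file.txt
```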
Clone Variations
# Shallow clone (limited history)
git clone --depth 1 repo-url
# Bare clone (no working directory)
git clone --bare repo-url
# Clone and check out a specific branch (add --single-branch to fetch only that branch)
git clone --branch feature-branch repo-url
Performance Considerations
Repository Size
# Check repository size
du -sh .git/
# Check object count
git count-objects -v
# Find large objects
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
awk '/^blob/ {print substr($0,6)}' | \
sort --numeric-sort --key=2 | \
tail -10
Optimization
# Repack objects
git repack -adf
# Create multi-pack index
git multi-pack-index write
# Compress references
git pack-refs --all
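The effect of packing references can be seen on disk. The sketch below assumes the default file-based reference storage; the branch name `main` and the `Demo` identity are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"
tip=$(git rev-parse main)

loose_before=no; [ -f .git/refs/heads/main ] && loose_before=yes

git pack-refs --all       # move loose refs into the single packed-refs file

loose_after=no; [ -f .git/refs/heads/main ] && loose_after=yes
packed_line=$(grep ' refs/heads/main$' .git/packed-refs)

echo "loose before: $loose_before, loose after: $loose_after"
echo "packed-refs entry: $packed_line"
```

The branch still resolves normally afterwards; Git falls back to `packed-refs` when no loose file exists for a reference.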
Security Considerations
File Permissions
# Git directory permissions
chmod 755 .git/
chmod 644 .git/config
chmod 755 .git/hooks/*
# Protect sensitive files
chmod 600 .git/config # If contains credentials
Repository Validation
# Verify repository integrity
git fsck --strict
# List objects that are not reachable from any reference
git fsck --unreachable
# Validate pack files
git verify-pack -v .git/objects/pack/pack-*.idx
Troubleshooting
Common Issues
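- Corrupted index
# Delete the index and rebuild it from HEAD (staged changes are lost; working files are kept)
rm .git/index
git reset
- Missing objects
# fsck reports what is missing; prune and gc only clean up and cannot restore lost objects
git fsck --full
git prune
git gc
- Broken references
# Delete the broken reference, then expire reflog entries (this discards all reflog history)
git update-ref -d refs/heads/broken-branch
git reflog expire --expire=now --all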
Recovery Procedures
# Backup repository
cp -r .git .git.backup
# Restore from backup
rm -rf .git
mv .git.backup .git
# Rebuild index and working directory from HEAD (discards uncommitted changes)
git reset --hard HEAD
Best Practices
Repository Organization
- Keep .git directory clean - Don't manually edit files
- Regular maintenance - Run git gc periodically
- Monitor size - Track repository growth
- Backup important repositories - Protect against corruption
- Use appropriate clone types - Shallow for CI, full for development
Security Practices
- Protect .git directory - Never expose via web server
- Validate repository integrity - Regular fsck checks
- Use signed commits - For critical repositories
- Monitor access - Track who accesses repositories
- Backup and recovery - Plan for disaster recovery
Summary
Git's repository structure provides:
- Separation of concerns - Working directory, staging, and database
- Integrity - Cryptographic verification of all data
- Efficiency - Optimized storage and access patterns
- Flexibility - Multiple workflow support
- Robustness - Built-in corruption detection and recovery
Understanding this structure is essential for:
- Effective Git usage
- Repository maintenance
- Performance optimization
- Troubleshooting issues
- Advanced Git operations
See Git Data Model for object details and the Git documentation for how commits build on this structure.