heapq — Heap Queue Algorithm
📚 Official Documentation & Resources
- Python Official Documentation - Complete API reference and examples
- PEP 456 - Adding heap operations to Python
- Real Python Tutorial - In-depth tutorial with practical examples
- Python Module of the Week - Comprehensive examples and use cases
- GeeksforGeeks Guide - Beginner-friendly tutorial
- Python Tips Blog - Quick reference and tips
- Stack Overflow: heapq - Common questions and solutions
Overview
The heapq module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. Heaps are binary trees where every parent node has a value less than or equal to any of its children (min-heap). This module transforms regular Python lists into heaps and provides functions for heap operations.
The module was introduced in Python 2.3 and implements a min-heap using a list data structure. It's particularly useful for algorithms that need to repeatedly access the smallest element, such as Dijkstra's algorithm, event simulation, and task scheduling.
🎯 Key Characteristics
- Min-Heap Implementation - Always maintains the smallest element at index 0
- In-Place Operations - Transforms regular lists into heaps without extra memory
- O(log n) Insertion/Deletion - Efficient heap operations with logarithmic complexity
- O(1) Minimum Access - Constant time access to the smallest element
- Pure Python Functions - No class instantiation required, works with standard lists
- Stable Sorting - Maintains relative order of equal elements in heap operations
🔧 Prerequisites and Setup
Python Version Compatibility
- Available in Python 2.3+
- No version-specific differences in core functionality
Installation and Imports
# Standard library (no installation needed)
import heapq
# Or import specific functions
from heapq import heappush, heappop, heapify
📚 Basic Usage
Simple Example
import heapq
# Create a heap from a list
numbers = [4, 1, 7, 3, 8, 5]
heapq.heapify(numbers) # Transform list into heap in-place
print(numbers) # [1, 3, 5, 4, 8, 7] - heap property maintained
# Add elements to heap
heapq.heappush(numbers, 2)
print(numbers) # [1, 3, 2, 4, 8, 7, 5]
# Remove and return smallest element
smallest = heapq.heappop(numbers)
print(f"Smallest: {smallest}") # Smallest: 1
print(numbers) # [2, 3, 5, 4, 8, 7]
# Peek at smallest without removing
print(f"Next smallest: {numbers[0]}") # Next smallest: 2
Core Functions
import heapq
# Initialize empty heap
heap = []
# Push elements
for value in [3, 1, 4, 1, 5, 9, 2, 6]:
heapq.heappush(heap, value)
print(f"After adding {value}: {heap}")
# Pop elements in sorted order
sorted_values = []
while heap:
smallest = heapq.heappop(heap)
sorted_values.append(smallest)
print(f"Popped {smallest}, remaining: {heap}")
print(f"Sorted order: {sorted_values}") # [1, 1, 2, 3, 4, 5, 6, 9]
Common Patterns
import heapq
# Pattern 1: Top-K smallest elements
def get_k_smallest(data, k):
return heapq.nsmallest(k, data)
# Pattern 2: Top-K largest elements
def get_k_largest(data, k):
return heapq.nlargest(k, data)
# Pattern 3: Merge sorted iterables
def merge_sorted_lists(*lists):
return list(heapq.merge(*lists))
# Pattern 4: Priority queue with tuples
def priority_queue_example():
pq = []
heapq.heappush(pq, (2, 'medium priority'))
heapq.heappush(pq, (1, 'high priority'))
heapq.heappush(pq, (3, 'low priority'))
while pq:
priority, task = heapq.heappop(pq)
print(f"Processing: {task} (priority {priority})")
🔧 heapq API Reference
Core Functions
| Function | Description | Time Complexity | Return Type | Example |
|---|---|---|---|---|
heappush(heap, item) | Push item onto heap | O(log n) | None | heappush(h, 5) |
heappop(heap) | Pop and return smallest item | O(log n) | Any | x = heappop(h) |
heappushpop(heap, item) | Push item then pop smallest | O(log n) | Any | x = heappushpop(h, 5) |
heapreplace(heap, item) | Pop smallest then push item | O(log n) | Any | x = heapreplace(h, 5) |
heapify(x) | Transform list into heap in-place | O(n) | None | heapify(mylist) |
nlargest(k, iterable, key=None) | Find k largest elements | O(n log k) | list | nlargest(3, data) |
nsmallest(k, iterable, key=None) | Find k smallest elements | O(n log k) | list | nsmallest(3, data) |
merge(*iterables, key=None, reverse=False) | Merge sorted iterables | O(n log k) | iterator | merge(list1, list2) |
Function Parameters
heappush(heap, item)
- heap: List representing the heap
- item: Element to add to the heap
- Returns: None (modifies heap in-place)
heappop(heap)
- heap: Non-empty list representing the heap
- Returns: The smallest element from the heap
- Raises: IndexError if heap is empty
heappushpop(heap, item)
- heap: List representing the heap
- item: Element to push before popping
- Returns: The smallest element (either the pushed item or existing smallest)
- Note: More efficient than separate push/pop operations
heapreplace(heap, item)
- heap: Non-empty list representing the heap
- item: Element to push after popping
- Returns: The popped smallest element
- Raises: IndexError if heap is empty
nlargest(k, iterable, key=None) / nsmallest(k, iterable, key=None)
- k: Number of elements to return
- iterable: Source data
- key: Function to extract comparison key (optional)
- Returns: List of k largest/smallest elements
Detailed Function Examples
heappush and heappop
import heapq
# Build a heap step by step
heap = []
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
for item in data:
heapq.heappush(heap, item)
print(f"Heap after adding {item}: {heap}")
# Extract all elements in sorted order
sorted_result = []
while heap:
smallest = heapq.heappop(heap)
sorted_result.append(smallest)
print(f"Sorted: {sorted_result}") # [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9]
heappushpop vs heapreplace
import heapq
# heappushpop: push then pop
heap1 = [1, 3, 5, 7, 9]
heapq.heapify(heap1)
result1 = heapq.heappushpop(heap1, 4) # Push 4, then pop smallest
print(f"heappushpop result: {result1}") # 1 (smallest overall)
print(f"Heap after: {heap1}") # [3, 7, 5, 4, 9]
# heapreplace: pop then push
heap2 = [1, 3, 5, 7, 9]
heapq.heapify(heap2)
result2 = heapq.heapreplace(heap2, 4) # Pop smallest, then push 4
print(f"heapreplace result: {result2}") # 1 (original smallest)
print(f"Heap after: {heap2}") # [3, 4, 5, 7, 9]
nlargest and nsmallest with key function
import heapq
# Working with complex data
students = [
('Alice', 85),
('Bob', 90),
('Charlie', 78),
('Diana', 92),
('Eve', 88)
]
# Top 3 students by grade
top_students = heapq.nlargest(3, students, key=lambda x: x[1])
print("Top 3 students:", top_students)
# [('Diana', 92), ('Bob', 90), ('Eve', 88)]
# Bottom 2 students by grade
bottom_students = heapq.nsmallest(2, students, key=lambda x: x[1])
print("Bottom 2 students:", bottom_students)
# [('Charlie', 78), ('Alice', 85)]
merge function
import heapq
# Merge multiple sorted iterables
list1 = [1, 4, 7, 10]
list2 = [2, 5, 8, 11]
list3 = [3, 6, 9, 12]
# Merge maintains sorted order
merged = list(heapq.merge(list1, list2, list3))
print("Merged:", merged) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
# With custom key function
words1 = ['a', 'cat', 'dog']
words2 = ['an', 'bat', 'fox']
merged_by_length = list(heapq.merge(words1, words2, key=len))
print("Merged by length:", merged_by_length) # ['a', 'an', 'cat', 'bat', 'dog', 'fox']
Important Notes
Heap Property
- Index 0 always contains the smallest element
- For any index k:
heap[k] <= heap[2*k+1]andheap[k] <= heap[2*k+2](if children exist) - No guaranteed order between siblings or across levels
Max-Heap Implementation
import heapq
# Use negative values for max-heap behavior
def max_heap_example():
max_heap = []
data = [3, 1, 4, 1, 5, 9, 2]
# Push negative values
for item in data:
heapq.heappush(max_heap, -item)
# Pop and negate to get max values
largest = -heapq.heappop(max_heap)
print(f"Largest value: {largest}") # 9
Thread Safety
- heapq operations are not thread-safe
- Use locks for concurrent access
- Consider
queue.PriorityQueuefor thread-safe priority queues
🎯 When to Use heapq
✅ Ideal Use Cases
- Priority Queue Systems - Task scheduling, event processing, job queues
- Top-K Problems - Finding largest/smallest elements without full sorting
- Streaming Data Analysis - Processing large datasets with limited memory
- Graph Algorithms - Dijkstra's shortest path, Prim's MST, A* search
- Event-Driven Simulation - Discrete event systems, scheduling algorithms
- Merge Operations - Combining sorted data streams or files
- Real-Time Systems - Maintaining running statistics, sliding window calculations
- Cache Implementation - LRU cache with priority-based eviction
❌ When NOT to Use heapq
- Need Random Access - Heap doesn't provide efficient access to arbitrary elements
- Frequent Deletion of Non-Minimum - O(n) time to remove arbitrary elements
- Small Datasets - Overhead may not be worth it for very small collections
- Need Stable Ordering - Heap doesn't guarantee stable sort without extra work
- Maximum Heap Required - Python's heapq only implements min-heap natively
- Thread-Safe Operations - heapq operations are not thread-safe
- Need Duplicate Handling - No built-in support for counting or handling duplicates
Alternative Solutions
Built-in Alternatives
# For finding min/max of small datasets
min_val = min(data)
max_val = max(data)
# For sorting small datasets completely
sorted_data = sorted(data)
# For top-k when you need full dataset sorted anyway
top_k = sorted(data, reverse=True)[:k]
Third-party Alternatives
# For thread-safe priority queues
import queue
pq = queue.PriorityQueue()
# For more advanced heap operations
# Consider: sortedcontainers, blist, or custom implementations
Performance Considerations
Time Complexity Summary
| Operation | Time Complexity | Notes |
|---|---|---|
heappush | O(log n) | Insert element maintaining heap property |
heappop | O(log n) | Remove minimum element and restore heap |
heappushpop | O(log n) | Combined push/pop, more efficient than separate |
heapreplace | O(log n) | Combined pop/push, more efficient than separate |
heapify | O(n) | Build heap from arbitrary list |
nlargest/nsmallest | O(n log k) | Where k is the number of elements requested |
merge | O(n log k) | Where k is the number of iterables |
| Access minimum | O(1) | Direct access to heap[0] |
Memory Usage Tips
import heapq
import sys
# Memory-efficient patterns
def memory_efficient_top_k(iterable, k):
"""More memory efficient than storing entire dataset"""
heap = []
for item in iterable:
if len(heap) < k:
heapq.heappush(heap, item)
elif item > heap[0]:
heapq.heapreplace(heap, item)
return sorted(heap, reverse=True)
# Compare memory usage
large_dataset = range(1000000)
# Memory inefficient: loads everything into memory
# all_data = list(large_dataset)
# top_100 = heapq.nlargest(100, all_data)
# Memory efficient: processes stream
top_100_efficient = memory_efficient_top_k(large_dataset, 100)
# Heap size comparison
heap1 = list(range(1000))
heapq.heapify(heap1)
heap2 = []
for i in range(1000):
heapq.heappush(heap2, i)
print(f"Heapify memory: {sys.getsizeof(heap1)} bytes")
print(f"Individual push memory: {sys.getsizeof(heap2)} bytes")
💡 Best Practices
-
Use heapify for Existing Data - When converting a list to heap, use
heapify()(O(n)) instead of individual pushes (O(n log n)) -
Implement Proper Comparison - For custom objects, implement
__lt__method or use tuples with priority values -
Handle Empty Heaps Gracefully - Always check heap size before popping or accessing elements
-
Choose Right Function for Top-K - Use
nlargest/nsmallestfor small k, heap operations for dynamic/streaming scenarios -
Use Tuples for Complex Priority - Combine priority, timestamp, and data in tuples for sophisticated ordering
-
Consider Memory Efficiency - For large datasets, prefer streaming approaches over loading everything into memory
-
Document Heap Invariants - Clearly document that your code maintains heap property and what constitutes valid heap state