Skip to main content

heapq — Heap Queue Algorithm

📚 Official Documentation & Resources

Overview

The heapq module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. Heaps are binary trees where every parent node has a value less than or equal to any of its children (min-heap). This module transforms regular Python lists into heaps and provides functions for heap operations.

The module was introduced in Python 2.3 and implements a min-heap using a list data structure. It's particularly useful for algorithms that need to repeatedly access the smallest element, such as Dijkstra's algorithm, event simulation, and task scheduling.

🎯 Key Characteristics

  • Min-Heap Implementation - Always maintains the smallest element at index 0
  • In-Place Operations - Transforms regular lists into heaps without extra memory
  • O(log n) Insertion/Deletion - Efficient heap operations with logarithmic complexity
  • O(1) Minimum Access - Constant time access to the smallest element
  • Pure Python Functions - No class instantiation required, works with standard lists
  • Stable Sorting - Maintains relative order of equal elements in heap operations

🔧 Prerequisites and Setup

Python Version Compatibility

  • Available in Python 2.3+
  • No version-specific differences in core functionality

Installation and Imports

# Standard library (no installation needed)
import heapq

# Or import specific functions
from heapq import heappush, heappop, heapify

📚 Basic Usage

Simple Example

import heapq

# Create a heap from a list
numbers = [4, 1, 7, 3, 8, 5]
heapq.heapify(numbers) # Transform list into heap in-place
print(numbers) # [1, 3, 5, 4, 8, 7] - heap property maintained

# Add elements to heap
heapq.heappush(numbers, 2)
print(numbers) # [1, 3, 2, 4, 8, 7, 5]

# Remove and return smallest element
smallest = heapq.heappop(numbers)
print(f"Smallest: {smallest}") # Smallest: 1
print(numbers) # [2, 3, 5, 4, 8, 7]

# Peek at smallest without removing
print(f"Next smallest: {numbers[0]}") # Next smallest: 2

Core Functions

import heapq

# Initialize empty heap
heap = []

# Push elements
for value in [3, 1, 4, 1, 5, 9, 2, 6]:
heapq.heappush(heap, value)
print(f"After adding {value}: {heap}")

# Pop elements in sorted order
sorted_values = []
while heap:
smallest = heapq.heappop(heap)
sorted_values.append(smallest)
print(f"Popped {smallest}, remaining: {heap}")

print(f"Sorted order: {sorted_values}") # [1, 1, 2, 3, 4, 5, 6, 9]

Common Patterns

import heapq

# Pattern 1: Top-K smallest elements
def get_k_smallest(data, k):
return heapq.nsmallest(k, data)

# Pattern 2: Top-K largest elements
def get_k_largest(data, k):
return heapq.nlargest(k, data)

# Pattern 3: Merge sorted iterables
def merge_sorted_lists(*lists):
return list(heapq.merge(*lists))

# Pattern 4: Priority queue with tuples
def priority_queue_example():
pq = []
heapq.heappush(pq, (2, 'medium priority'))
heapq.heappush(pq, (1, 'high priority'))
heapq.heappush(pq, (3, 'low priority'))

while pq:
priority, task = heapq.heappop(pq)
print(f"Processing: {task} (priority {priority})")

🔧 heapq API Reference

Core Functions

FunctionDescriptionTime ComplexityReturn TypeExample
heappush(heap, item)Push item onto heapO(log n)Noneheappush(h, 5)
heappop(heap)Pop and return smallest itemO(log n)Anyx = heappop(h)
heappushpop(heap, item)Push item then pop smallestO(log n)Anyx = heappushpop(h, 5)
heapreplace(heap, item)Pop smallest then push itemO(log n)Anyx = heapreplace(h, 5)
heapify(x)Transform list into heap in-placeO(n)Noneheapify(mylist)
nlargest(k, iterable, key=None)Find k largest elementsO(n log k)listnlargest(3, data)
nsmallest(k, iterable, key=None)Find k smallest elementsO(n log k)listnsmallest(3, data)
merge(*iterables, key=None, reverse=False)Merge sorted iterablesO(n log k)iteratormerge(list1, list2)

Function Parameters

heappush(heap, item)

  • heap: List representing the heap
  • item: Element to add to the heap
  • Returns: None (modifies heap in-place)

heappop(heap)

  • heap: Non-empty list representing the heap
  • Returns: The smallest element from the heap
  • Raises: IndexError if heap is empty

heappushpop(heap, item)

  • heap: List representing the heap
  • item: Element to push before popping
  • Returns: The smallest element (either the pushed item or existing smallest)
  • Note: More efficient than separate push/pop operations

heapreplace(heap, item)

  • heap: Non-empty list representing the heap
  • item: Element to push after popping
  • Returns: The popped smallest element
  • Raises: IndexError if heap is empty

nlargest(k, iterable, key=None) / nsmallest(k, iterable, key=None)

  • k: Number of elements to return
  • iterable: Source data
  • key: Function to extract comparison key (optional)
  • Returns: List of k largest/smallest elements

Detailed Function Examples

heappush and heappop

import heapq

# Build a heap step by step
heap = []
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

for item in data:
heapq.heappush(heap, item)
print(f"Heap after adding {item}: {heap}")

# Extract all elements in sorted order
sorted_result = []
while heap:
smallest = heapq.heappop(heap)
sorted_result.append(smallest)

print(f"Sorted: {sorted_result}") # [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9]

heappushpop vs heapreplace

import heapq

# heappushpop: push then pop
heap1 = [1, 3, 5, 7, 9]
heapq.heapify(heap1)
result1 = heapq.heappushpop(heap1, 4) # Push 4, then pop smallest
print(f"heappushpop result: {result1}") # 1 (smallest overall)
print(f"Heap after: {heap1}") # [3, 7, 5, 4, 9]

# heapreplace: pop then push
heap2 = [1, 3, 5, 7, 9]
heapq.heapify(heap2)
result2 = heapq.heapreplace(heap2, 4) # Pop smallest, then push 4
print(f"heapreplace result: {result2}") # 1 (original smallest)
print(f"Heap after: {heap2}") # [3, 4, 5, 7, 9]

nlargest and nsmallest with key function

import heapq

# Working with complex data
students = [
('Alice', 85),
('Bob', 90),
('Charlie', 78),
('Diana', 92),
('Eve', 88)
]

# Top 3 students by grade
top_students = heapq.nlargest(3, students, key=lambda x: x[1])
print("Top 3 students:", top_students)
# [('Diana', 92), ('Bob', 90), ('Eve', 88)]

# Bottom 2 students by grade
bottom_students = heapq.nsmallest(2, students, key=lambda x: x[1])
print("Bottom 2 students:", bottom_students)
# [('Charlie', 78), ('Alice', 85)]

merge function

import heapq

# Merge multiple sorted iterables
list1 = [1, 4, 7, 10]
list2 = [2, 5, 8, 11]
list3 = [3, 6, 9, 12]

# Merge maintains sorted order
merged = list(heapq.merge(list1, list2, list3))
print("Merged:", merged) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# With custom key function
words1 = ['a', 'cat', 'dog']
words2 = ['an', 'bat', 'fox']
merged_by_length = list(heapq.merge(words1, words2, key=len))
print("Merged by length:", merged_by_length) # ['a', 'an', 'cat', 'bat', 'dog', 'fox']

Important Notes

Heap Property

  • Index 0 always contains the smallest element
  • For any index k: heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] (if children exist)
  • No guaranteed order between siblings or across levels

Max-Heap Implementation

import heapq

# Use negative values for max-heap behavior
def max_heap_example():
max_heap = []
data = [3, 1, 4, 1, 5, 9, 2]

# Push negative values
for item in data:
heapq.heappush(max_heap, -item)

# Pop and negate to get max values
largest = -heapq.heappop(max_heap)
print(f"Largest value: {largest}") # 9

Thread Safety

  • heapq operations are not thread-safe
  • Use locks for concurrent access
  • Consider queue.PriorityQueue for thread-safe priority queues

🎯 When to Use heapq

✅ Ideal Use Cases

  • Priority Queue Systems - Task scheduling, event processing, job queues
  • Top-K Problems - Finding largest/smallest elements without full sorting
  • Streaming Data Analysis - Processing large datasets with limited memory
  • Graph Algorithms - Dijkstra's shortest path, Prim's MST, A* search
  • Event-Driven Simulation - Discrete event systems, scheduling algorithms
  • Merge Operations - Combining sorted data streams or files
  • Real-Time Systems - Maintaining running statistics, sliding window calculations
  • Cache Implementation - LRU cache with priority-based eviction

❌ When NOT to Use heapq

  • Need Random Access - Heap doesn't provide efficient access to arbitrary elements
  • Frequent Deletion of Non-Minimum - O(n) time to remove arbitrary elements
  • Small Datasets - Overhead may not be worth it for very small collections
  • Need Stable Ordering - Heap doesn't guarantee stable sort without extra work
  • Maximum Heap Required - Python's heapq only implements min-heap natively
  • Thread-Safe Operations - heapq operations are not thread-safe
  • Need Duplicate Handling - No built-in support for counting or handling duplicates

Alternative Solutions

Built-in Alternatives

# For finding min/max of small datasets
min_val = min(data)
max_val = max(data)

# For sorting small datasets completely
sorted_data = sorted(data)

# For top-k when you need full dataset sorted anyway
top_k = sorted(data, reverse=True)[:k]

Third-party Alternatives

# For thread-safe priority queues
import queue
pq = queue.PriorityQueue()

# For more advanced heap operations
# Consider: sortedcontainers, blist, or custom implementations

Performance Considerations

Time Complexity Summary

OperationTime ComplexityNotes
heappushO(log n)Insert element maintaining heap property
heappopO(log n)Remove minimum element and restore heap
heappushpopO(log n)Combined push/pop, more efficient than separate
heapreplaceO(log n)Combined pop/push, more efficient than separate
heapifyO(n)Build heap from arbitrary list
nlargest/nsmallestO(n log k)Where k is the number of elements requested
mergeO(n log k)Where k is the number of iterables
Access minimumO(1)Direct access to heap[0]

Memory Usage Tips

import heapq
import sys

# Memory-efficient patterns
def memory_efficient_top_k(iterable, k):
"""More memory efficient than storing entire dataset"""
heap = []
for item in iterable:
if len(heap) < k:
heapq.heappush(heap, item)
elif item > heap[0]:
heapq.heapreplace(heap, item)
return sorted(heap, reverse=True)

# Compare memory usage
large_dataset = range(1000000)

# Memory inefficient: loads everything into memory
# all_data = list(large_dataset)
# top_100 = heapq.nlargest(100, all_data)

# Memory efficient: processes stream
top_100_efficient = memory_efficient_top_k(large_dataset, 100)

# Heap size comparison
heap1 = list(range(1000))
heapq.heapify(heap1)
heap2 = []
for i in range(1000):
heapq.heappush(heap2, i)

print(f"Heapify memory: {sys.getsizeof(heap1)} bytes")
print(f"Individual push memory: {sys.getsizeof(heap2)} bytes")

💡 Best Practices

  1. Use heapify for Existing Data - When converting a list to heap, use heapify() (O(n)) instead of individual pushes (O(n log n))

  2. Implement Proper Comparison - For custom objects, implement __lt__ method or use tuples with priority values

  3. Handle Empty Heaps Gracefully - Always check heap size before popping or accessing elements

  4. Choose Right Function for Top-K - Use nlargest/nsmallest for small k, heap operations for dynamic/streaming scenarios

  5. Use Tuples for Complex Priority - Combine priority, timestamp, and data in tuples for sophisticated ordering

  6. Consider Memory Efficiency - For large datasets, prefer streaming approaches over loading everything into memory

  7. Document Heap Invariants - Clearly document that your code maintains heap property and what constitutes valid heap state