heapq – In-place heap sort algorithm

Purpose:The heapq implements a min-heap sort algorithm suitable for use with Python’s lists.
Available In:New in 2.3 with additions in 2.5

A heap is a tree-like data structure where the child nodes have a sort-order relationship with the parents. Binary heaps can be represented using a list or array organized so that the children of element N are at positions 2*N+1 and 2*N+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.

A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python’s heapq module implements a min-heap.

Example Data

The examples below use this sample data:

# This data was generated with the random module.

data = [19, 9, 4, 10, 11, 8, 2]

The heap output is printed using heapq_showtree.py:

import math
from cStringIO import StringIO

def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i+1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2**row
        col_width = int(math.floor((total_width * 1.0) / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print output.getvalue()
    print '-' * total_width
    print
    return

Creating a Heap

There are 2 basic ways to create a heap, heappush() and heapify().

Using heappush(), the heap sort order of the elements is maintained as new items are added from a data source.

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heap = []
print 'random :', data
print

for n in data:
    print 'add %3d:' % n
    heapq.heappush(heap, n)
    show_tree(heap)
$ python heapq_heappush.py

random : [19, 9, 4, 10, 11, 8, 2]

add  19:

                 19
------------------------------------

add   9:

                 9
        19
------------------------------------

add   4:

                 4
        19                9
------------------------------------

add  10:

                 4
        10                9
    19
------------------------------------

add  11:

                 4
        10                9
    19       11
------------------------------------

add   8:

                 4
        10                8
    19       11       9
------------------------------------

add   2:

                 2
        10                4
    19       11       9        8
------------------------------------

If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place.

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print 'random    :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)
$ python heapq_heapify.py

random    : [19, 9, 4, 10, 11, 8, 2]
heapified :

                 2
        9                 4
    10       11       8        19
------------------------------------

Accessing Contents of a Heap

Once the heap is organized correctly, use heappop() to remove the element with the lowest value. In this example, adapted from the stdlib documentation, heapify() and heappop() are used to sort a list of numbers.

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print 'random    :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)
print

inorder = []
while data:
    smallest = heapq.heappop(data)
    print 'pop    %3d:' % smallest
    show_tree(data)
    inorder.append(smallest)
print 'inorder   :', inorder
$ python heapq_heappop.py

random    : [19, 9, 4, 10, 11, 8, 2]
heapified :

                 2
        9                 4
    10       11       8        19
------------------------------------


pop      2:

                 4
        9                 8
    10       11       19
------------------------------------

pop      4:

                 8
        9                 19
    10       11
------------------------------------

pop      8:

                 9
        10                19
    11
------------------------------------

pop      9:

                 10
        11                19
------------------------------------

pop     10:

                 11
        19
------------------------------------

pop     11:

                 19
------------------------------------

pop     19:

------------------------------------

inorder   : [2, 4, 8, 9, 10, 11, 19]

To remove existing elements and replace them with new values in a single operation, use heapreplace().

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heapq.heapify(data)
print 'start:'
show_tree(data)

for n in [0, 7, 13, 9, 5]:
    smallest = heapq.heapreplace(data, n)
    print 'replace %2d with %2d:' % (smallest, n)
    show_tree(data)

Replacing elements in place lets you maintain a fixed size heap, such as a queue of jobs ordered by priority.

$ python heapq_heapreplace.py

start:

                 2
        9                 4
    10       11       8        19
------------------------------------

replace  2 with  0:

                 0
        9                 4
    10       11       8        19
------------------------------------

replace  0 with  7:

                 4
        9                 7
    10       11       8        19
------------------------------------

replace  4 with 13:

                 7
        9                 8
    10       11       13       19
------------------------------------

replace  7 with  9:

                 8
        9                 9
    10       11       13       19
------------------------------------

replace  8 with  5:

                 5
        9                 9
    10       11       13       19
------------------------------------

Data Extremes

heapq also includes 2 functions to examine an iterable to find a range of the largest or smallest values it contains. Using nlargest() and nsmallest() are really only efficient for relatively small values of n > 1, but can still come in handy in a few cases.

import heapq
from heapq_heapdata import data

print 'all       :', data
print '3 largest :', heapq.nlargest(3, data)
print 'from sort :', list(reversed(sorted(data)[-3:]))
print '3 smallest:', heapq.nsmallest(3, data)
print 'from sort :', sorted(data)[:3]
$ python heapq_extremes.py

all       : [19, 9, 4, 10, 11, 8, 2]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [2, 4, 8]
from sort : [2, 4, 8]

See also

heapq
The standard library documentation for this module.
WikiPedia: Heap (data structure)
A general description of heap data structures.
In-Memory Data Structures
Other Python data structures.
Priority Queue
A priority queue implementation from Queue in the standard library.