linecache – Read text files efficiently

Purpose: Retrieve lines of text from files or imported Python modules, holding a cache of the results to make reading many lines from the same file more efficient.
Available In: 1.4

The linecache module is used extensively throughout the Python standard library when dealing with Python source files. The implementation of the cache simply holds the contents of files, parsed into separate lines, in a dictionary in memory. The API returns the requested line(s) by indexing into a list. The time savings come from not having to reread the file and parse it into lines each time a line is requested. This is especially useful when looking for multiple lines from the same file, such as when producing a traceback for an error report.
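
To make the idea concrete, here is a minimal sketch of a dictionary-backed line cache. It is not the code inside linecache; the SimpleLineCache class, its loads counter, and the demo_simple_cache() function are hypothetical names invented for illustration. The sketch is written so it should run under Python 3 as well as recent Python 2 releases.

```python
import os
import tempfile


class SimpleLineCache(object):
    """Hypothetical, simplified illustration of a per-file line cache."""

    def __init__(self):
        # Maps a filename to the list of lines read from that file.
        self._cache = {}
        # Counts how many times a file had to be loaded from disk.
        self.loads = 0

    def getline(self, filename, lineno):
        if filename not in self._cache:
            self.loads += 1
            try:
                with open(filename) as f:
                    self._cache[filename] = f.readlines()
            except (IOError, OSError):
                # In this sketch an unreadable file is treated as empty.
                self._cache[filename] = []
        lines = self._cache[filename]
        # Line numbers start at 1, list indexes start at 0.
        if 1 <= lineno <= len(lines):
            return lines[lineno - 1]
        return ''


def demo_simple_cache():
    """Fetch several lines from one file and report how often it was read."""
    fd, name = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(name, 'w') as f:
            f.write('first\nsecond\nthird\n')
        cache = SimpleLineCache()
        results = [cache.getline(name, n) for n in (2, 3, 2, 99)]
        return results, cache.loads
    finally:
        os.unlink(name)


print(demo_simple_cache())
```

Four lookups against the same file trigger only one load from disk; every later request is a list index into the cached lines.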

Test Data

We will use some text produced by the Lorem Ipsum generator as sample input.

import os
import tempfile

lorem = '''Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Vivamus eget elit. In posuere mi non risus. Mauris id quam posuere
lectus sollicitudin varius. Praesent at mi. Nunc eu velit. Sed augue
massa, fermentum id, nonummy a, nonummy sit amet, ligula. Curabitur
eros pede, egestas at, ultricies ac, pellentesque eu, tellus. 

Sed sed odio sed mi luctus mollis. Integer et nulla ac augue convallis
accumsan. Ut felis. Donec lectus sapien, elementum nec, condimentum ac,
interdum non, tellus. Aenean viverra, mauris vehicula semper porttitor,
ipsum odio consectetuer lorem, ac imperdiet eros odio a sapien. Nulla
mauris tellus, aliquam non, egestas a, nonummy et, erat. Vivamus
sagittis porttitor eros.'''

def make_tempfile():
    fd, temp_file_name = tempfile.mkstemp()
    os.close(fd)
    f = open(temp_file_name, 'wt')
    try:
        f.write(lorem)
    finally:
        f.close()
    return temp_file_name

def cleanup(filename):
    os.unlink(filename)

Reading Specific Lines

Reading the 5th line from the file is a simple one-liner. Notice that line numbers in the linecache module start with 1, but if we split the string ourselves, list indexes start from 0. We also need to strip the trailing newline from the value returned from the cache.

import linecache
from linecache_data import *

filename = make_tempfile()

# Pick out the same line from source and cache.
# (Notice that linecache counts from 1)
print 'SOURCE: ', lorem.split('\n')[4]
print 'CACHE : ', linecache.getline(filename, 5).rstrip()

cleanup(filename)

$ python linecache_getline.py

SOURCE:  eros pede, egestas at, ultricies ac, pellentesque eu, tellus.
CACHE :  eros pede, egestas at, ultricies ac, pellentesque eu, tellus.
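
The same off-by-one relationship holds for every line, not just the 5th. The following sketch compares several lines at once; the sample text and the variable names (text, path, pairs) are assumptions made for this example, and it does not depend on the linecache_data helper module.

```python
import linecache
import os
import tempfile

text = 'alpha\nbeta\ngamma\ndelta\n'

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'w') as f:
        f.write(text)

    source_lines = text.split('\n')
    pairs = []
    for lineno in (1, 3, 4):
        # The list built by split() is indexed from 0 ...
        from_source = source_lines[lineno - 1]
        # ... while linecache counts lines from 1 and keeps the newline.
        from_cache = linecache.getline(path, lineno).rstrip('\n')
        pairs.append((lineno, from_source, from_cache))
finally:
    os.unlink(path)
    # Drop the cached copy of the deleted temporary file.
    linecache.clearcache()

for lineno, from_source, from_cache in pairs:
    print('%d: %r == %r' % (lineno, from_source, from_cache))
```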

Handling Blank Lines

Next, let's see what happens if the line we want is empty:

import linecache
from linecache_data import *

filename = make_tempfile()

# Blank lines include the newline
print '\nBLANK : "%s"' % linecache.getline(filename, 6)

cleanup(filename)

$ python linecache_empty_line.py


BLANK : "
"

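Because the newline is preserved, a blank line comes back as a one-character string rather than an empty one. A small sketch, using a throwaway file and hypothetical variable names (blank_path, blank, missing), shows how repr() makes that visible:

```python
import linecache
import os
import tempfile

fd, blank_path = tempfile.mkstemp()
os.close(fd)
try:
    with open(blank_path, 'w') as f:
        f.write('first paragraph\n\nsecond paragraph\n')
    # Line 2 exists but is blank, so only the newline is returned.
    blank = linecache.getline(blank_path, 2)
    # Line 10 does not exist at all.
    missing = linecache.getline(blank_path, 10)
finally:
    os.unlink(blank_path)
    linecache.clearcache()

print(repr(blank))
print(repr(missing))
```

Testing the length of the result, or whether it is an empty string, is therefore enough to tell a blank line apart from a line number that is out of range.
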
Error Handling

If the requested line number falls out of the range of valid lines in the file, linecache returns an empty string.

import linecache
from linecache_data import *

filename = make_tempfile()

# The cache always returns a string, and uses
# an empty string to indicate a line which does
# not exist.
not_there = linecache.getline(filename, 500)
print '\nNOT THERE: "%s" includes %d characters' % (not_there, len(not_there))

cleanup(filename)

$ python linecache_out_of_range.py


NOT THERE: "" includes 0 characters

The getline() function does not raise an exception, even if the file does not exist. It simply returns an empty string:

import linecache

# Errors are even hidden if linecache cannot find the file
no_such_file = linecache.getline('this_file_does_not_exist.txt', 1)
print '\nNO FILE: ', no_such_file

$ python linecache_missing_file.py


NO FILE:
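
When silent failure is not what we want, the empty-string convention can be turned back into exceptions by a thin wrapper. The getline_strict() function below is a hypothetical helper, not part of linecache. As a simplifying assumption it checks only the literal path with os.path.exists(), so it would misreport a file that linecache could locate only through its sys.path search.

```python
import linecache
import os


def getline_strict(filename, lineno):
    """Hypothetical wrapper that raises instead of returning ''."""
    line = linecache.getline(filename, lineno)
    if line:
        # Existing lines, including blank ones, are never empty
        # because they keep their trailing newline.
        return line
    if not os.path.exists(filename):
        raise IOError('no such file: %r' % filename)
    raise IndexError('%r has no line %d' % (filename, lineno))
```

A caller can then catch IOError for a missing file and IndexError for a line number that is out of range, instead of testing for an empty string.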

Python Source

Since linecache is used so heavily when producing tracebacks, one of its key features is the ability to find Python source modules in the import path by specifying the base name of the module. The cache population code in linecache searches sys.path for the module if it cannot find the file directly.

import linecache

# Look for the linecache module, using
# the built-in sys.path search.
module_line = linecache.getline('linecache.py', 3)
print '\nMODULE : ', module_line

$ python linecache_path_search.py


MODULE :  This is intended to read lines from modules imported -- hence if a filename
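
The text on line 3 of linecache.py varies between Python versions, so the output above may differ on another installation. The sketch below shows the same sys.path search against a file we control. The module name linecache_demo_module.py and its contents are invented for the example, and it assumes no file of that name already exists in the current directory or elsewhere on sys.path.

```python
import linecache
import os
import shutil
import sys
import tempfile

module_dir = tempfile.mkdtemp()
module_name = 'linecache_demo_module.py'  # hypothetical module name
try:
    with open(os.path.join(module_dir, module_name), 'w') as f:
        f.write('# first line of the demo module\nVALUE = 1\n')

    # The directory is not on sys.path yet, so the base name
    # alone cannot be resolved.
    before = linecache.getline(module_name, 1)

    # Once the directory is on sys.path, the same call succeeds.
    sys.path.append(module_dir)
    after = linecache.getline(module_name, 1)
finally:
    if module_dir in sys.path:
        sys.path.remove(module_dir)
    shutil.rmtree(module_dir)
    linecache.clearcache()

print(repr(before))
print(repr(after))
```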

See also

linecache
The standard library documentation for this module.
http://www.ipsum.com/
Lorem Ipsum generator.
File Access
Other tools for working with files.