glob – Filename pattern matching

Purpose: Use Unix shell rules to find filenames matching a pattern.
Available In: 1.4

Even though the glob API is very simple, the module packs a lot of power. It is useful in any situation where your program needs to look for a list of files on the filesystem with names matching a pattern. If you need a list of filenames that all have a certain extension, prefix, or any common string in the middle, use glob instead of writing code to scan the directory contents yourself.
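As a rough illustration of the work glob saves, the sketch below compares a hand-written scan of one directory with a single glob.glob() call. The helper scan_by_hand() is hypothetical. It only approximates glob for a single directory level. The example builds its own scratch directory with tempfile, so it does not depend on the example data.

```python
import fnmatch
import glob
import os
import tempfile


def scan_by_hand(directory, pattern):
    """Hypothetical helper: sorted paths in directory whose names match pattern."""
    matches = []
    for name in os.listdir(directory):
        # Like glob, only let a pattern match dot-files if it starts with '.'.
        if name.startswith('.') and not pattern.startswith('.'):
            continue
        if fnmatch.fnmatch(name, pattern):
            matches.append(os.path.join(directory, name))
    return sorted(matches)


# Scratch directory with two matching files and one that should be ignored.
root = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt', 'c.log'):
    open(os.path.join(root, name), 'w').close()

by_hand = scan_by_hand(root, '*.txt')
with_glob = sorted(glob.glob(os.path.join(root, '*.txt')))
print(by_hand == with_glob)  # True
```

The glob version is one line and needs no filtering logic of its own.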

The pattern rules for glob are not regular expressions. Instead, they follow standard Unix path expansion rules. There are only a few special characters: two different wildcards, plus character ranges. The pattern rules are applied to each segment of the filename separately, so a match never extends past the path separator (/). Paths in the pattern can be relative or absolute. Shell variable names and tilde (~) are not expanded.
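To see why these are not regular expressions, compare the shell-style reading of *.txt with the regular-expression reading of the same characters. The sketch below uses the fnmatch module, which CPython's glob uses to match each path segment. The file names are made up for the illustration.

```python
import fnmatch
import re

names = ['notes.txt', 'notes_txt', 'report.txt.bak']

# Shell-style pattern: '*' is "any run of characters" and '.' is a literal dot.
glob_style = [n for n in names if fnmatch.fnmatch(n, '*.txt')]

# The same characters read as a regular expression mean something else:
# '.' matches any character, so 'notes_txt' slips through.
regex = re.compile(r'.*.txt$')
regex_style = [n for n in names if regex.match(n)]

print(glob_style)   # ['notes.txt']
print(regex_style)  # ['notes.txt', 'notes_txt']
```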

Example Data

The examples below assume the following test files are present in the current working directory:

$ python glob_maketestdata.py

dir
dir/file.txt
dir/file1.txt
dir/file2.txt
dir/filea.txt
dir/fileb.txt
dir/subdir
dir/subdir/subfile.txt

Note

Use glob_maketestdata.py in the sample code to create these files if you want to run the examples.
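The glob_maketestdata.py script itself is not reproduced here. If you do not have it, a minimal stand-in along these lines should create the same tree. The function make_test_data() and its base parameter are hypothetical names, not part of the original sample code.

```python
import os

# Hypothetical stand-in for glob_maketestdata.py: the directories and empty
# files from the listing above.
DIRS = [
    'dir',
    os.path.join('dir', 'subdir'),
]
FILES = [
    os.path.join('dir', 'file.txt'),
    os.path.join('dir', 'file1.txt'),
    os.path.join('dir', 'file2.txt'),
    os.path.join('dir', 'filea.txt'),
    os.path.join('dir', 'fileb.txt'),
    os.path.join('dir', 'subdir', 'subfile.txt'),
]


def make_test_data(base='.'):
    """Create the example tree under base (default: the current directory)."""
    for d in DIRS:
        path = os.path.join(base, d)
        # Skip directories that already exist so the function can be rerun.
        if not os.path.isdir(path):
            os.makedirs(path)
    for f in FILES:
        # Only the names matter to glob, so the files are left empty.
        open(os.path.join(base, f), 'w').close()
```

Call make_test_data() from the directory where you plan to run the examples.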

Wildcards

An asterisk (*) matches zero or more characters in a segment of a name. For example, consider the pattern dir/*:

import glob
for name in glob.glob('dir/*'):
    print(name)

The pattern matches every pathname (file or directory) in the directory dir, without recursing further into subdirectories.

$ python glob_asterisk.py

dir/file.txt
dir/file1.txt
dir/file2.txt
dir/filea.txt
dir/fileb.txt
dir/subdir
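Two details of * are easy to miss. First, glob.glob() returns names in whatever order the filesystem reports them, which is not necessarily sorted. Second, * does not match names beginning with a dot. The sketch below builds a throwaway directory to show both; the file names are arbitrary. Sort the results yourself when order matters.

```python
import glob
import os
import tempfile

# Scratch directory with a hidden file plus two ordinary files.
root = tempfile.mkdtemp()
for name in ('b.txt', 'a.txt', '.hidden'):
    open(os.path.join(root, name), 'w').close()

# Sort explicitly; glob() itself makes no ordering promise.
visible = sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, '*')))

# '*' skips dot-files; the pattern must start with '.' to pick them up.
hidden = [os.path.basename(p) for p in glob.glob(os.path.join(root, '.*'))]

print(visible)  # ['a.txt', 'b.txt']
print(hidden)   # ['.hidden']
```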

To list files in a subdirectory, you must include the subdirectory in the pattern:

import glob

print('Named explicitly:')
for name in glob.glob('dir/subdir/*'):
    print('\t' + name)

print('Named with wildcard:')
for name in glob.glob('dir/*/*'):
    print('\t' + name)

The first case above lists the subdirectory name explicitly, while the second case depends on a wildcard to find the directory.

$ python glob_subdir.py

Named explicitly:
        dir/subdir/subfile.txt
Named with wildcard:
        dir/subdir/subfile.txt

The results, in this case, are the same. If there were another subdirectory, the wildcard would match both subdirectories and include the filenames from both.
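One quick way to check this is to add a second subdirectory. This sketch rebuilds part of the dir tree in a scratch directory. It adds a hypothetical dir/other/otherfile.txt next to dir/subdir/subfile.txt.

```python
import glob
import os
import tempfile

# Scratch tree with two subdirectories under dir.
root = tempfile.mkdtemp()
for sub, fname in (('subdir', 'subfile.txt'), ('other', 'otherfile.txt')):
    os.makedirs(os.path.join(root, 'dir', sub))
    open(os.path.join(root, 'dir', sub, fname), 'w').close()

# The explicit pattern only reaches one subdirectory ...
explicit = glob.glob(os.path.join(root, 'dir', 'subdir', '*'))
# ... while the wildcard segment matches both of them.
wildcard = sorted(glob.glob(os.path.join(root, 'dir', '*', '*')))

for path in wildcard:
    print(os.path.relpath(path, root))
```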

Single Character Wildcard

The other wildcard character supported is the question mark (?). It matches any single character in that position in the name. For example:

import glob

for name in glob.glob('dir/file?.txt'):
    print(name)

The pattern matches all of the filenames that begin with “file”, have exactly one more character of any type, and then end with “.txt”.

$ python glob_question.py

dir/file1.txt
dir/file2.txt
dir/filea.txt
dir/fileb.txt
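dir/file.txt is absent from this output because ? must match exactly one character, while * also accepts zero. The comparison below applies fnmatch.filter() to the bare file names. It assumes the same per-segment rules that glob applies.

```python
import fnmatch

# The file names from the example tree, without the directory prefix.
names = ['file.txt', 'file1.txt', 'file2.txt', 'filea.txt', 'fileb.txt']

# '?' must consume exactly one character, so 'file.txt' is left out ...
one_char = fnmatch.filter(names, 'file?.txt')
# ... while '*' also accepts zero characters and so keeps it.
any_chars = fnmatch.filter(names, 'file*.txt')

print(one_char)   # ['file1.txt', 'file2.txt', 'filea.txt', 'fileb.txt']
print(any_chars)  # all five names
```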

Character Ranges

When you need to restrict a position to specific characters, use a character range instead of a question mark. For example, to find all of the files that have a digit in the name just before the extension:

import glob
for name in glob.glob('dir/*[0-9].*'):
    print(name)

The character range [0-9] matches any single digit. The range is ordered based on the character code for each letter/digit, and the dash indicates an unbroken range of sequential characters. The same range value could be written [0123456789].

$ python glob_charrange.py

dir/file1.txt
dir/file2.txt
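This sketch uses the same bare file names and fnmatch.filter(). It checks that the dash form and the spelled-out form of a range select the same files. It also shows that letter ranges work the same way.

```python
import fnmatch

names = ['file.txt', 'file1.txt', 'file2.txt', 'filea.txt', 'fileb.txt']

# A range written with a dash ...
dash_range = fnmatch.filter(names, '*[0-9].*')
# ... selects the same names as listing every character in it.
listed = fnmatch.filter(names, '*[0123456789].*')
# Letter ranges are also ordered by character code.
letters = fnmatch.filter(names, 'file[a-z].txt')

print(dash_range)  # ['file1.txt', 'file2.txt']
print(letters)     # ['filea.txt', 'fileb.txt']
```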

See also

glob
The standard library documentation for this module.
Pattern Matching Notation
An explanation of globbing from The Open Group’s Shell Command Language specification.
fnmatch
Filename matching implementation.
File Access
Other tools for working with files.