The File System¶
Python’s standard library includes a large range of tools for working with files on the file system, building and parsing filenames, and examining file contents.
The first step in working with files is to determine the name of the
file to work on. Python represents filenames as simple strings, but
provides tools for building them from standard, platform-independent,
After the name of the file is identified, other characteristics, such
as permissions or the file size, can be checked using
os.stat() and the constants in
When an application needs random access to files,
makes it easy to read lines by their line number. The contents of the
file are maintained in a cache, so be careful of memory consumption.
tempfile is useful for cases that need to create scratch files
to hold data temporarily, or before moving it to a permanent location.
It provides classes to create temporary files and directories safely
and securely. Names are guaranteed to be unique, and include random
components so they are not easily guessable.
Frequently, programs need to work on files as a whole, without regard
to their content. The
shutil module includes high-level file
operations such as copying files and directories, and creating or
extracting archives of files.
filecmp module compares files and directories by looking at
the bytes they contain, but without any special knowledge about their
file class can be used to read and write files
visible on local file systems. A program’s performance can suffer when
it accesses large files through the
interfaces, though, since they both involve copying the data
multiple times as it is moved from the disk to memory the application
can see. Using
mmap tells the operating system to use its virtual
memory subsystem to map a file’s contents directly into memory accessible
by a program, avoiding a copy step between the operating system and the
internal buffer for the
Text data using characters not available in ASCII is usually saved in
a Unicode data format. Since the standard
assumes each byte of a text file represents one character, reading
Unicode text with multi-byte encodings requires extra processing. The
codecs module handles the encoding and decoding automatically,
so that in many cases a non-ASCII file can be used without any other
changes to the program.
io module provides access to the classes used to implement
Python’s file-based input and output. For testing code that depends
on reading or writing data from files,
io provides an in-memory
stream object that behaves like a file, but does not reside on disk.
- os.path — Platform-independent Manipulation of Filenames
- pathlib — Filesystem Paths as Objects
- glob — Filename Pattern Matching
- fnmatch — Unix-style Glob Pattern Matching
- linecache — Read Text Files Efficiently
- tempfile — Temporary File System Objects
- shutil — High-level File Operations
- filecmp — Compare Files
- mmap — Memory-map Files
- codecs — String Encoding and Decoding
- io — Text, Binary, and Raw Stream I/O Tools