Data Compression and Archiving¶
Although modern computer systems have an ever increasing storage capacity, the growth of data being produced is unrelenting. Lossless compression algorithms make up for some of the shortfall in capacity by trading time spent compressing or decompressing data for the space needed to store it. Python includes interfaces to the most popular compression libraries so it can read and write files interchangeably.
zlib
and gzip
expose the GNU zip library, and bz2
provides access to the more recent bzip2 format. Both formats work on
streams of data, without regard to input format, and provide
interfaces for reading and writing compressed files transparently.
Use these modules for compressing a single file or data source.
The standard library also includes modules to manage archive
formats, for combining several files into a single file that can be
managed as a unit. tarfile
reads and writes the Unix tape
archive format, an old standard still widely used today because of its
flexibility. zipfile
works with archives based on the format
popularized by the PC program PKZIP, originally used under MS-DOS and
Windows, but now also used on other platforms because of the
simplicity of its API and portability of the format.