tarfile – Tar archive access
| Purpose: | Tar archive access. |
|---|---|
| Available In: | 2.3 and later |
The tarfile module provides read and write access to UNIX tar archives, including compressed files. In addition to the POSIX standards, several GNU tar extensions are supported. Various UNIX special file types (hard and soft links, device nodes, etc.) are also handled.
Testing Tar Files
The is_tarfile() function returns a boolean indicating whether the file named by its argument is a valid tar archive.
import tarfile

for filename in [ 'README.txt', 'example.tar',
                  'bad_example.tar', 'notthere.tar' ]:
    try:
        print '%20s %s' % (filename, tarfile.is_tarfile(filename))
    except IOError, err:
        print '%20s %s' % (filename, err)
If the file does not exist, is_tarfile() raises an IOError.
$ python tarfile_is_tarfile.py
          README.txt False
         example.tar True
     bad_example.tar False
        notthere.tar [Errno 2] No such file or directory: 'notthere.tar'
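The listing above depends on files that already exist on disk. As a self-contained variation, the sketch below creates its own test files in a temporary directory before checking them. It uses Python 3 syntax, unlike the Python 2 listings in this article, and the helper name check() and the file contents are made up for illustration.

```python
import os
import tarfile
import tempfile


def check(path):
    """Return the is_tarfile() result, or the error text if the file is missing."""
    try:
        return tarfile.is_tarfile(path)
    except OSError as err:  # IOError is an alias of OSError in Python 3
        return str(err)


workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, 'README.txt')
tar_path = os.path.join(workdir, 'example.tar')
missing_path = os.path.join(workdir, 'notthere.tar')

# A plain text file that is not a tar archive.
with open(text_path, 'w') as f:
    f.write('This is not an archive.\n')

# A real archive holding that one file.
with tarfile.open(tar_path, 'w') as tar:
    tar.add(text_path, arcname='README.txt')

results = {path: check(path) for path in (text_path, tar_path, missing_path)}
for path, result in results.items():
    print('%20s %s' % (os.path.basename(path), result))
```

Returning the error text instead of re-raising keeps the loop simple; a real program would probably handle the missing file separately.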
Reading Meta-data from an Archive
Use the TarFile class to work directly with a tar archive. It supports methods for reading data about existing archives as well as modifying the archives by adding additional files.
To read the names of the files in an existing archive, use getnames():
import tarfile
t = tarfile.open('example.tar', 'r')
print t.getnames()
The return value is a list of strings with the names of the archive contents:
$ python tarfile_getnames.py
['README.txt']
In addition to names, meta-data about the archive members is available as TarInfo instances. Use getmembers() to load the meta-data for every member of the archive.
import tarfile
import time

t = tarfile.open('example.tar', 'r')
for member_info in t.getmembers():
    print member_info.name
    print '\tModified:\t', time.ctime(member_info.mtime)
    print '\tMode :\t', oct(member_info.mode)
    print '\tType :\t', member_info.type
    print '\tSize :\t', member_info.size, 'bytes'
    print
$ python tarfile_getmembers.py
README.txt
Modified: Sun Feb 22 11:13:55 2009
Mode : 0644
Type : 0
Size : 75 bytes
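The raw type value printed above is not very readable. TarInfo also offers predicate methods such as isfile() and isdir(), which the following sketch uses to label each member. It is written for Python 3 and builds a small archive in memory so it does not need example.tar; the member names and the summary list are invented for the example.

```python
import io
import tarfile
import time

# Build a small archive in memory: one directory and one regular file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    folder = tarfile.TarInfo('docs')
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    tar.addfile(folder)

    payload = b'hello'
    entry = tarfile.TarInfo('docs/hello.txt')
    entry.size = len(payload)
    entry.mode = 0o644
    entry.mtime = 1235319235
    tar.addfile(entry, io.BytesIO(payload))

# Read the meta-data back and describe each member.
buf.seek(0)
summary = []
with tarfile.open(fileobj=buf, mode='r') as tar:
    for info in tar.getmembers():
        kind = 'dir' if info.isdir() else 'file' if info.isfile() else 'other'
        summary.append((info.name, kind, info.size, oct(info.mode)))
        print(info.name, kind, info.size, oct(info.mode), time.ctime(info.mtime))
```

Note that oct() in Python 3 produces strings such as '0o644' rather than the '0644' form shown in the Python 2 output above.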
If you know in advance the name of the archive member, you can retrieve its TarInfo object with getmember().
import tarfile
import time

t = tarfile.open('example.tar', 'r')
for filename in [ 'README.txt', 'notthere.txt' ]:
    try:
        info = t.getmember(filename)
    except KeyError:
        print 'ERROR: Did not find %s in tar archive' % filename
    else:
        print '%s is %d bytes' % (info.name, info.size)
If the archive member is not present, getmember() raises a KeyError.
$ python tarfile_getmember.py
README.txt is 75 bytes
ERROR: Did not find notthere.txt in tar archive
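When a missing member is an expected case rather than an error, the KeyError can be wrapped in a small helper. The following is a minimal Python 3 sketch; member_size() is a hypothetical name, and the archive is created in memory purely for the demonstration.

```python
import io
import tarfile


def member_size(tar, name):
    """Return the size in bytes of the named member, or None if it is absent."""
    try:
        return tar.getmember(name).size
    except KeyError:
        return None


# Hypothetical in-memory archive with a single 75-byte member.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    info = tarfile.TarInfo('README.txt')
    info.size = 75
    tar.addfile(info, io.BytesIO(b'x' * 75))

buf.seek(0)
with tarfile.open(fileobj=buf, mode='r') as tar:
    found = member_size(tar, 'README.txt')
    missing = member_size(tar, 'notthere.txt')

print(found, missing)
```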
Extracting Files From an Archive
To access the data from an archive member within your program, use the extractfile() method, passing the member’s name.
import tarfile

t = tarfile.open('example.tar', 'r')
for filename in [ 'README.txt', 'notthere.txt' ]:
    try:
        f = t.extractfile(filename)
    except KeyError:
        print 'ERROR: Did not find %s in tar archive' % filename
    else:
        print filename, ':', f.read()
$ python tarfile_extractfile.py
README.txt : The examples for the tarfile module use this file and example.tar as data.
ERROR: Did not find notthere.txt in tar archive
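Besides raising KeyError for a missing name, extractfile() returns None for members that carry no data, such as directories. The sketch below, in Python 3 (where the data comes back as bytes), handles both cases in one place. The helper read_member() and the member names are assumptions made for the example.

```python
import io
import tarfile


def read_member(tar, name):
    """Return the member's contents as bytes, or None if it cannot be read."""
    try:
        handle = tar.extractfile(name)
    except KeyError:
        return None  # no such member
    if handle is None:
        return None  # member exists but holds no data (for example a directory)
    with handle:
        return handle.read()


# In-memory archive with a directory and a regular file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    folder = tarfile.TarInfo('docs')
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    data = b'hello from the archive'
    info = tarfile.TarInfo('docs/hello.txt')
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode='r') as tar:
    results = {name: read_member(tar, name)
               for name in ('docs/hello.txt', 'docs', 'notthere.txt')}

print(results)
```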
If you just want to unpack the archive and write the files to the filesystem, use extract() or extractall() instead.
import tarfile
import os
os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
t.extract('README.txt', 'outdir')
print os.listdir('outdir')
$ python tarfile_extract.py
['README.txt']
Note
The standard library documentation notes that extract() does not take care of several extraction issues, and recommends using extractall() in most cases.
import tarfile
import os
os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
t.extractall('outdir')
print os.listdir('outdir')
$ python tarfile_extractall.py
['README.txt']
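Neither method should be pointed at an archive from an untrusted source without inspecting it first, since member names can contain ".." or absolute paths. One possible precaution, sketched here in Python 3, is to pass extractall() only those members whose target path resolves inside the destination directory. The function safe_members() and the demo member names are hypothetical, and the check is deliberately simple: it does not examine link members, for instance. Recent Python 3 releases also accept a filter argument to extractall() for this purpose.

```python
import io
import os
import tarfile
import tempfile


def safe_members(tar, dest):
    """Yield members whose extraction path resolves to a location inside dest."""
    root = os.path.realpath(dest)
    for info in tar.getmembers():
        target = os.path.realpath(os.path.join(root, info.name))
        if target.startswith(root + os.sep):
            yield info
        else:
            print('skipping suspicious member:', info.name)


def add_bytes(tar, name, data):
    """Demo helper: store data (bytes) in the archive under name."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Demo archive: one ordinary member and one that tries to climb out
# of the destination directory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    add_bytes(tar, 'README.txt', b'harmless')
    add_bytes(tar, '../escape.txt', b'should not be extracted')

dest = os.path.join(tempfile.mkdtemp(), 'outdir')
os.mkdir(dest)

buf.seek(0)
with tarfile.open(fileobj=buf, mode='r') as tar:
    allowed = list(safe_members(tar, dest))
    tar.extractall(dest, members=allowed)

extracted = os.listdir(dest)
print(extracted)
```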
If you only want to extract certain files from the archive, pass their TarInfo objects to extractall() in the members argument.
import tarfile
import os
os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
t.extractall('outdir', members=[t.getmember('README.txt')])
print os.listdir('outdir')
$ python tarfile_extractall_members.py
['README.txt']
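The members list does not have to be built one name at a time. As an illustration, the Python 3 sketch below selects every regular file whose name ends in ".txt" with a list comprehension over getmembers(). The archive contents are invented for the example.

```python
import io
import os
import tarfile
import tempfile

# In-memory archive with a mix of member names.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    for name, data in [('README.txt', b'read me'),
                       ('logo.png', b'not really an image'),
                       ('notes.txt', b'notes')]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

outdir = tempfile.mkdtemp()

buf.seek(0)
with tarfile.open(fileobj=buf, mode='r') as tar:
    # Keep only regular files with a .txt suffix.
    wanted = [m for m in tar.getmembers()
              if m.isfile() and m.name.endswith('.txt')]
    tar.extractall(outdir, members=wanted)

extracted = sorted(os.listdir(outdir))
print(extracted)
```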
Creating New Archives
To create a new archive, simply open the TarFile with a mode of 'w'. Any existing file is truncated and a new archive is started. To add files, use the add() method.
import tarfile

print 'creating archive'
out = tarfile.open('tarfile_add.tar', mode='w')
try:
    print 'adding README.txt'
    out.add('README.txt')
finally:
    print 'closing'
    out.close()

print
print 'Contents:'
t = tarfile.open('tarfile_add.tar', 'r')
for member_info in t.getmembers():
    print member_info.name
$ python tarfile_add.py
creating archive
adding README.txt
closing
Contents:
README.txt
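In newer Python versions TarFile objects can be used as context managers, which replaces the try/finally pattern, and add() includes the contents of a directory recursively by default. The following Python 3 sketch shows both; the directory layout under a temporary folder is made up for the demonstration.

```python
import os
import tarfile
import tempfile

# Create a small directory tree to archive.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'project')
os.makedirs(os.path.join(src, 'sub'))
for rel in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(src, rel), 'w') as f:
        f.write('contents of %s\n' % rel)

tar_path = os.path.join(workdir, 'project.tar')

# The with statement closes the archive even if add() raises.
with tarfile.open(tar_path, mode='w') as out:
    # arcname keeps the temporary directory path out of the archive.
    out.add(src, arcname='project')

with tarfile.open(tar_path, mode='r') as t:
    names = sorted(t.getnames())

print(names)
```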
Using Alternate Archive Member Names
It is possible to add a file to an archive under a name other than its original file name. Build a TarInfo object with gettarinfo(), giving the alternate name as arcname, and pass it to addfile() together with an open file handle that supplies the data.
import tarfile

print 'creating archive'
out = tarfile.open('tarfile_addfile.tar', mode='w')
try:
    print 'adding README.txt as RENAMED.txt'
    info = out.gettarinfo('README.txt', arcname='RENAMED.txt')
    # addfile() copies info.size bytes from the file object into the archive
    f = open('README.txt', 'rb')
    try:
        out.addfile(info, f)
    finally:
        f.close()
finally:
    print 'closing'
    out.close()

print
print 'Contents:'
t = tarfile.open('tarfile_addfile.tar', 'r')
for member_info in t.getmembers():
    print member_info.name
The archive includes only the changed filename:
$ python tarfile_addfile.py
creating archive
adding README.txt as RENAMED.txt
closing
Contents:
RENAMED.txt
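When the data already lives in a file, a shorter route to the same result is the arcname argument of add(), which handles both the header and the data. This is a minimal Python 3 sketch using a throwaway file in a temporary directory; the file contents are placeholders.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'README.txt')
original = b'placeholder contents for the rename demo\n'
with open(src, 'wb') as f:
    f.write(original)

tar_path = os.path.join(workdir, 'renamed.tar')

# add() stores the file's data under the name given by arcname.
with tarfile.open(tar_path, mode='w') as out:
    out.add(src, arcname='RENAMED.txt')

with tarfile.open(tar_path, mode='r') as t:
    names = t.getnames()
    content = t.extractfile('RENAMED.txt').read()

print(names)
```

The gettarinfo() and addfile() combination remains useful when other header fields need to be changed before the member is written.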
Writing Data from Sources Other Than Files
Sometimes you want to write data to an archive but the data is not in a file on the filesystem. Rather than writing the data to a file, then adding that file to the archive, you can use addfile() to add data from an open file-like handle.
import tarfile
from cStringIO import StringIO

data = 'This is the data to write to the archive.'

out = tarfile.open('tarfile_addfile_string.tar', mode='w')
try:
    info = tarfile.TarInfo('made_up_file.txt')
    info.size = len(data)
    out.addfile(info, StringIO(data))
finally:
    out.close()

print
print 'Contents:'
t = tarfile.open('tarfile_addfile_string.tar', 'r')
for member_info in t.getmembers():
    print member_info.name
    f = t.extractfile(member_info)
    print f.read()
By first constructing a TarInfo object ourselves, we can give the archive member any name we wish. After setting the size, we can write the data to the archive using addfile() and passing a StringIO buffer as a source of the data.
$ python tarfile_addfile_string.py
Contents:
made_up_file.txt
This is the data to write to the archive.
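In Python 3 the cStringIO module no longer exists and archive members hold bytes, so the same idea is usually written with io.BytesIO. The sketch below is one way to do it; add_text() is a hypothetical helper, and it also sets mtime because a hand-built TarInfo otherwise defaults to 0.

```python
import io
import tarfile
import time


def add_text(tar, name, text):
    """Store text in the archive under name without touching the filesystem."""
    data = text.encode('utf-8')       # tar members hold bytes, not str
    info = tarfile.TarInfo(name)
    info.size = len(data)             # must match the number of bytes supplied
    info.mtime = int(time.time())     # a new TarInfo has mtime 0 otherwise
    tar.addfile(info, io.BytesIO(data))
    return info


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as out:
    add_text(out, 'made_up_file.txt', 'This is the data to write to the archive.')

buf.seek(0)
with tarfile.open(fileobj=buf, mode='r') as t:
    names = t.getnames()
    text = t.extractfile('made_up_file.txt').read().decode('utf-8')

print(names, text)
```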
Appending to Archives
In addition to creating new archives, it is possible to append to an existing file. To open a file to append to it, use mode 'a'.
import tarfile

print 'creating archive'
out = tarfile.open('tarfile_append.tar', mode='w')
try:
    out.add('README.txt')
finally:
    out.close()

print 'contents:', [m.name
                    for m in tarfile.open('tarfile_append.tar', 'r').getmembers()]

print 'adding index.rst'
out = tarfile.open('tarfile_append.tar', mode='a')
try:
    out.add('index.rst')
finally:
    out.close()

print 'contents:', [m.name
                    for m in tarfile.open('tarfile_append.tar', 'r').getmembers()]
The resulting archive ends up with two members:
$ python tarfile_append.py
creating archive
contents: ['README.txt']
adding index.rst
contents: ['README.txt', 'index.rst']
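The same sequence can be sketched in Python 3 with temporary files so that it runs anywhere. The helper names_in() and the placeholder file contents are assumptions for the example. According to the standard library documentation, append mode cannot be combined with compression, so the archive here is a plain tar file.

```python
import os
import tarfile
import tempfile

# Create two placeholder files to archive.
workdir = tempfile.mkdtemp()
paths = {}
for name in ('README.txt', 'index.rst'):
    paths[name] = os.path.join(workdir, name)
    with open(paths[name], 'w') as f:
        f.write('placeholder for %s\n' % name)

tar_path = os.path.join(workdir, 'append_demo.tar')


def names_in(path):
    """Return the member names of the archive at path."""
    with tarfile.open(path, 'r') as t:
        return t.getnames()


# Start a new archive with one member.
with tarfile.open(tar_path, mode='w') as out:
    out.add(paths['README.txt'], arcname='README.txt')
before = names_in(tar_path)

# Reopen in append mode and add a second member.
with tarfile.open(tar_path, mode='a') as out:
    out.add(paths['index.rst'], arcname='index.rst')
after = names_in(tar_path)

print(before, after)
```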
Working with Compressed Archives
Besides regular tar archive files, the tarfile module can work with archives compressed with gzip or bzip2. To open a compressed archive, modify the mode string passed to open() to include ":gz" or ":bz2", depending on the compression method you want to use.
import tarfile
import os

fmt = '%-30s %-10s'
print fmt % ('FILENAME', 'SIZE')
print fmt % ('README.txt', os.stat('README.txt').st_size)

for filename, write_mode in [
    ('tarfile_compression.tar', 'w'),
    ('tarfile_compression.tar.gz', 'w:gz'),
    ('tarfile_compression.tar.bz2', 'w:bz2'),
    ]:
    out = tarfile.open(filename, mode=write_mode)
    try:
        out.add('README.txt')
    finally:
        out.close()

    print fmt % (filename, os.stat(filename).st_size),
    print [m.name for m in tarfile.open(filename, 'r:*').getmembers()]
When opening an existing archive for reading, you can specify "r:*" to have tarfile determine the compression method to use automatically.
$ python tarfile_compression.py
FILENAME                       SIZE
README.txt                     75
tarfile_compression.tar        10240      ['README.txt']
tarfile_compression.tar.gz     211        ['README.txt']
tarfile_compression.tar.bz2    188        ['README.txt']
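The same comparison can be run entirely in memory by passing a file-like object through the fileobj argument of open(). The Python 3 sketch below writes one payload with each mode and reads every result back with "r:*". The payload, the build() helper, and the member name are invented for the example, and the sizes it prints will differ from those shown above.

```python
import io
import tarfile

# A repetitive payload, so that compression has something to work with.
payload = b'The same line repeated many times.\n' * 200


def build(mode):
    """Return the bytes of an archive holding payload, written with mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as out:
        info = tarfile.TarInfo('data.txt')
        info.size = len(payload)
        out.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


sizes = {}
listings = {}
for mode in ('w', 'w:gz', 'w:bz2'):
    raw = build(mode)
    sizes[mode] = len(raw)
    # "r:*" lets tarfile detect the compression method on its own.
    with tarfile.open(fileobj=io.BytesIO(raw), mode='r:*') as t:
        listings[mode] = t.getnames()
    print('%-6s %6d %s' % (mode, sizes[mode], listings[mode]))
```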
See also
- tarfile: The standard library documentation for this module.
- GNU tar manual: Documentation of the tar format, including extensions.
- zipfile: Similar access for ZIP archives.
- gzip: GNU zip compression.
- bz2: bzip2 compression.