hmac – Cryptographic signature and verification of messages.

Purpose: The hmac module implements keyed-hashing for message authentication, as described in RFC 2104.
Available In: 2.2

The HMAC algorithm can be used to verify the integrity of information passed between applications or stored in a potentially vulnerable location. The basic idea is to generate a cryptographic hash of the actual data combined with a shared secret key. The resulting hash can then be used to check the transmitted or stored message to determine a level of trust, without transmitting the secret key.
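As a rough illustration of that idea, the sketch below signs a message with a shared key and shows that a tampered message or a different key yields a different signature. It is written for Python 3, where keys and messages must be bytes. The message contents and the sign() helper are made up for this illustration.

```python
import hashlib
import hmac

# Hypothetical key and message, chosen only for illustration.
SHARED_KEY = b'secret-shared-key-goes-here'
message = b'amount=100&to=alice'


def sign(key, data):
    "Return the hex HMAC-SHA1 of data under key."
    return hmac.new(key, data, hashlib.sha1).hexdigest()


# The sender transmits the message and its signature, never the key.
sent_signature = sign(SHARED_KEY, message)

# The receiver, holding the same key, recomputes and compares.
untampered_ok = sign(SHARED_KEY, message) == sent_signature
tampered_ok = sign(SHARED_KEY, b'amount=900&to=alice') == sent_signature
wrong_key_ok = sign(b'some-other-key', message) == sent_signature

print(untampered_ok, tampered_ok, wrong_key_ok)  # True False False
```

Only the check with the original message and the original key succeeds.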

Disclaimer: I’m not a security expert. For the full details on HMAC, check out RFC 2104.

Example

Creating the hash is not complex. Here’s a simple example which uses the default MD5 hash algorithm:

import hmac

digest_maker = hmac.new('secret-shared-key-goes-here')

f = open('lorem.txt', 'rb')
try:
    while True:
        block = f.read(1024)
        if not block:
            break
        digest_maker.update(block)
finally:
    f.close()

digest = digest_maker.hexdigest()
print digest

When run, the code reads the lorem.txt data file and computes an HMAC signature for it:

$ python hmac_simple.py

4bcb287e284f8c21e87e14ba2dc40b16

Note

If I haven’t changed the file by the time I release the example source for this week, the copy you download should produce the same hash.

SHA vs. MD5

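Although the default cryptographic algorithm for hmac is MD5, that is not the most secure method to use. MD5 hashes have known weaknesses, such as collisions (where two different messages produce the same hash). The SHA-1 algorithm is considered to be stronger than MD5 and is used in the next example. Collisions have since been demonstrated for SHA-1 as well, so the SHA-2 functions in hashlib (such as hashlib.sha256) are a more conservative choice for new code.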

import hmac
import hashlib

digest_maker = hmac.new('secret-shared-key-goes-here', '', hashlib.sha1)

f = open('hmac_sha.py', 'rb')
try:
    while True:
        block = f.read(1024)
        if not block:
            break
        digest_maker.update(block)
finally:
    f.close()

digest = digest_maker.hexdigest()
print digest

hmac.new() takes three arguments. The first is the secret key, which should be shared between the two communicating endpoints so both ends can use the same value. The second is an initial message. If the message content that needs to be authenticated is small, such as a timestamp or HTTP POST, the entire body of the message can be passed to new() instead of using the update() method. The last argument is the digest module to be used. The default is hashlib.md5. The previous example substitutes hashlib.sha1.

$ python hmac_sha.py

69b26d1731a0a5f0fc7a92fc6c540823ec210759
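The two ways of supplying the message described above, all at once through new() or piece by piece through update(), should produce the same digest. Here is a minimal sketch of that equivalence, assuming Python 3 (bytes keys and messages, and an explicit digestmod). The message text and the 16-byte chunk size are arbitrary choices for the illustration.

```python
import hashlib
import hmac

key = b'secret-shared-key-goes-here'
body = b'The quick brown fox jumps over the lazy dog'  # hypothetical message

# Pass the whole message to new() in one call.
one_shot = hmac.new(key, body, hashlib.sha1).hexdigest()

# Or start with no message and feed it in pieces with update().
digest_maker = hmac.new(key, digestmod=hashlib.sha1)
for start in range(0, len(body), 16):
    digest_maker.update(body[start:start + 16])
incremental = digest_maker.hexdigest()

print(one_shot == incremental)  # True
```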

Binary Digests

The first few examples used the hexdigest() method to produce printable digests. The hexdigest is a different representation of the value calculated by the digest() method, which is a binary value that may include unprintable or non-ASCII characters, including NULs. Some web services (Google Checkout, Amazon S3) use the base64-encoded version of the binary digest instead of the hexdigest.

import base64
import hmac
import hashlib

f = open('lorem.txt', 'rb')
try:
    body = f.read()
finally:
    f.close()

digest = hmac.new('secret-shared-key-goes-here', body, hashlib.sha1).digest()
print base64.encodestring(digest)

The base64 encoded string ends in a newline, which frequently needs to be stripped off when embedding the string in HTTP headers or other formatting-sensitive contexts.

$ python hmac_base64.py

olW2DoXHGJEKGU0aE9fOwSVE/o4=
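To make the relationship between the representations concrete, the sketch below (Python 3, with a made-up message body) compares digest(), hexdigest(), and a base64 encoding of the same value. It uses base64.b64encode(), which does not append a newline, as one way to avoid the stripping step mentioned above.

```python
import base64
import binascii
import hashlib
import hmac

key = b'secret-shared-key-goes-here'
body = b'example message body'  # hypothetical data

mac = hmac.new(key, body, hashlib.sha1)
raw = mac.digest()          # 20 raw bytes for SHA-1
hex_form = mac.hexdigest()  # 40 hexadecimal characters

# hexdigest() is the hex encoding of the bytes returned by digest().
same_value = binascii.hexlify(raw).decode('ascii') == hex_form

# b64encode() returns the encoded bytes without a trailing newline.
b64_form = base64.b64encode(raw).decode('ascii')

print(len(raw), len(hex_form), same_value)     # 20 40 True
print(b64_form.endswith('\n'), len(b64_form))  # False 28
```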

Applications

HMAC authentication should be used for any public network service, and any time data is stored where security is important. For example, when sending data through a pipe or socket, that data should be signed and then the signature should be tested before the data is used. The extended example below is available in the hmac_pickle.py file as part of the PyMOTW source package.

First, let’s establish a function to calculate a digest for a string, and a simple class to be instantiated and passed through a communication channel.

import hashlib
import hmac
try:
    import cPickle as pickle
except ImportError:
    # Fall back to the pure-Python implementation
    import pickle
import pprint
from StringIO import StringIO


def make_digest(message):
    "Return a digest for the message."
    return hmac.new('secret-shared-key-goes-here', message, hashlib.sha1).hexdigest()


class SimpleObject(object):
    "A very simple class to demonstrate checking digests before unpickling."
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name

Next, create a StringIO buffer to represent the socket or pipe. We will use a naive, but easy to parse, format for the data stream. The digest and length of the data are written, followed by a new line. The serialized representation of the object, generated by pickle, follows. In a real system, we would not want to depend on a length value, since if the digest is wrong the length is probably wrong as well. Some sort of terminator sequence not likely to appear in the real data would be more appropriate.

For this example, we will write two objects to the stream. The first is written using the correct digest value.

# Simulate a writable socket or pipe with StringIO
out_s = StringIO()

# Write a valid object to the stream:
#  digest length\npickle
o = SimpleObject('digest matches')
pickled_data = pickle.dumps(o)
digest = make_digest(pickled_data)
header = '%s %s' % (digest, len(pickled_data))
print '\nWRITING:', header
out_s.write(header + '\n')
out_s.write(pickled_data)

The second object is written to the stream with an invalid digest, produced by calculating the digest for some other data instead of the pickle.

# Write an invalid object to the stream
o = SimpleObject('digest does not match')
pickled_data = pickle.dumps(o)
digest = make_digest('not the pickled data at all')
header = '%s %s' % (digest, len(pickled_data))
print '\nWRITING:', header
out_s.write(header + '\n')
out_s.write(pickled_data)

out_s.flush()

Now that the data is in the StringIO buffer, we can read it back out again. The first step is to read the line of data with the digest and data length. Then the remaining data is read (using the length value). We could use pickle.load() to read directly from the stream, but that assumes a trusted data stream and we do not yet trust the data enough to unpickle it. Reading the pickle as a string collects the data from the stream without actually unpickling the object.

# Simulate a readable socket or pipe with StringIO
in_s = StringIO(out_s.getvalue())

# Read the data
while True:
    first_line = in_s.readline()
    if not first_line:
        break
    incoming_digest, incoming_length = first_line.split(' ')
    incoming_length = int(incoming_length)
    print '\nREAD:', incoming_digest, incoming_length
    incoming_pickled_data = in_s.read(incoming_length)

Once we have the pickled data, we can recalculate the digest value and compare it against what we read. If the digests match, we know it is safe to trust the data and unpickle it.

    # Still inside the while loop started above
    actual_digest = make_digest(incoming_pickled_data)
    print 'ACTUAL:', actual_digest

    if incoming_digest != actual_digest:
        print 'WARNING: Data corruption'
    else:
        obj = pickle.loads(incoming_pickled_data)
        print 'OK:', obj

The output shows that the first object is verified and the second is deemed “corrupted”, as expected:

$ python hmac_pickle.py


WRITING: 387632cfa3d18cd19bdfe72b61ac395dfcdc87c9 124

WRITING: b01b209e28d7e053408ebe23b90fe5c33bc6a0ec 131

READ: 387632cfa3d18cd19bdfe72b61ac395dfcdc87c9 124
ACTUAL: 387632cfa3d18cd19bdfe72b61ac395dfcdc87c9
OK: digest matches

READ: b01b209e28d7e053408ebe23b90fe5c33bc6a0ec 131
ACTUAL: dec53ca1ad3f4b657dd81d514f17f735628b6828
WARNING: Data corruption
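For readers on Python 3, the same round trip can be sketched roughly as follows. This is not the hmac_pickle.py file from the source package, only an approximation of it under a few assumptions: io.BytesIO stands in for StringIO, plain strings are pickled instead of SimpleObject, and the write_record() and read_records() helpers are names invented here. The digests are compared with hmac.compare_digest(), which the standard library provides as a comparison designed to resist timing attacks, instead of the != operator.

```python
import hashlib
import hmac
import io
import pickle

KEY = b'secret-shared-key-goes-here'


def make_digest(message):
    "Return the hex HMAC-SHA1 digest of a bytes message."
    return hmac.new(KEY, message, hashlib.sha1).hexdigest()


def write_record(stream, payload, digest=None):
    "Write a 'digest length' header line, then the raw payload bytes."
    if digest is None:
        digest = make_digest(payload)
    header = '%s %d\n' % (digest, len(payload))
    stream.write(header.encode('ascii'))
    stream.write(payload)


def read_records(stream):
    "Yield (trusted, payload) pairs; nothing is unpickled here."
    while True:
        first_line = stream.readline()
        if not first_line:
            break
        incoming_digest, incoming_length = first_line.decode('ascii').split(' ')
        payload = stream.read(int(incoming_length))
        trusted = hmac.compare_digest(incoming_digest, make_digest(payload))
        yield trusted, payload


# Simulate the writable end: one valid record, one with a bad digest.
out_s = io.BytesIO()
write_record(out_s, pickle.dumps('digest matches'))
write_record(out_s, pickle.dumps('digest does not match'),
             digest=make_digest(b'not the pickled data at all'))

results = []
for trusted, payload in read_records(io.BytesIO(out_s.getvalue())):
    # Only unpickle data whose digest was verified.
    results.append(pickle.loads(payload) if trusted else None)

print(results)  # ['digest matches', None]
```

Keeping the verification in read_records() and the unpickling in the caller means an unverified payload never reaches pickle.loads().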

See also

hmac
The standard library documentation for this module.
RFC 2104
HMAC: Keyed-Hashing for Message Authentication
hashlib
The hashlib module.
pickle
Serialization library.
Wikipedia: MD5
Description of the MD5 hashing algorithm.
Authenticating to Amazon S3 Web Service
Instructions for authenticating to S3 using HMAC-SHA1 signed credentials.