Creating XML Documents

In addition to its parsing capabilities, xml.etree.ElementTree also supports creating well-formed XML documents from Element objects constructed in an application. The Element class used when a document is parsed also knows how to generate a serialized form of its contents, which can then be written to a file or other data stream.

Building Element Nodes

There are three helper functions useful for creating a hierarchy of Element nodes. Element() creates a standard node, SubElement() attaches a new node to a parent, and Comment() creates a node that serializes using XML’s comment syntax.

from xml.etree.ElementTree import Element, SubElement, Comment, tostring

top = Element('top')

comment = Comment('Generated for PyMOTW')
top.append(comment)

child = SubElement(top, 'child')
child.text = 'This child contains text.'

child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has regular text.'
child_with_tail.tail = 'And "tail" text.'

child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'

print tostring(top)

The output contains only the XML nodes in the tree, not the XML declaration with version and encoding.

$ python ElementTree_create.py

<top><!--Generated for PyMOTW--><child>This child contains text.</child><child_with_tail>This child has regular text.</child_with_tail>And "tail" text.<child_with_entity_ref>This &amp; that</child_with_entity_ref></top>

The & character in the text of child_with_entity_ref is converted to the entity reference &amp; automatically.
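The same automatic escaping covers < and > in text, and quotes in attribute values. The following minimal sketch assumes Python 3, where tostring() returns bytes unless encoding='unicode' is passed. The note element and its title attribute are made up for illustration.

```python
from xml.etree.ElementTree import Element, tostring

# Hypothetical node whose attribute value and text both need escaping.
node = Element('note', title='Tom & "Jerry"')
node.text = 'a < b & c > d'

# encoding='unicode' makes tostring() return str instead of bytes (Python 3).
serialized = tostring(node, encoding='unicode')
print(serialized)
```

Text content gets &amp;, &lt;, and &gt;, while the attribute value additionally has its double quotes written as &quot; so it cannot terminate the attribute early.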

Pretty-Printing XML

ElementTree makes no effort to “pretty print” the output produced by tostring(), since adding extra whitespace changes the contents of the document. To make the output easier for human readers to follow, the rest of the examples below re-parse the XML with xml.dom.minidom and then use its toprettyxml() method.

from xml.etree import ElementTree
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ")

With prettify() saved in ElementTree_pretty.py, the updated example looks like:

from xml.etree.ElementTree import Element, SubElement, Comment
from ElementTree_pretty import prettify

top = Element('top')

comment = Comment('Generated for PyMOTW')
top.append(comment)

child = SubElement(top, 'child')
child.text = 'This child contains text.'

child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has regular text.'
child_with_tail.tail = 'And "tail" text.'

child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'

print prettify(top)

and the output is easier to read:

$ python ElementTree_create_pretty.py

<?xml version="1.0" ?>
<top>
  <!--Generated for PyMOTW-->
  <child>
    This child contains text.
  </child>
  <child_with_tail>
    This child has regular text.
  </child_with_tail>
  And &quot;tail&quot; text.
  <child_with_entity_ref>
    This &amp; that
  </child_with_entity_ref>
</top>

In addition to the extra whitespace for formatting, which ends up inside the text nodes, the xml.dom.minidom pretty-printer also adds an XML declaration to the output and writes the double quotes in the tail text as &quot;.
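Python 3.9 added xml.etree.ElementTree.indent(), which inserts indentation whitespace directly into the tree, so no round trip through minidom is needed. This is a minimal sketch assuming Python 3.9 or later; like any pretty-printer, indent() modifies the text and tail values of the nodes it touches.

```python
from xml.etree.ElementTree import Element, SubElement, indent, tostring

top = Element('top')
child = SubElement(top, 'child')
child.text = 'This child contains text.'
SubElement(top, 'empty_child')

# indent() (Python 3.9+) fills in empty or whitespace-only text/tail
# values in place; existing non-whitespace text is left alone.
indent(top, space='  ')
pretty = tostring(top, encoding='unicode')
print(pretty)
```

Unlike the minidom output, the text of child stays on the same line as its tags, because indent() only replaces text and tail values that are empty or contain only whitespace.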

Setting Element Properties

The previous example created nodes with tags and text content, but did not set any attributes of the nodes. Many of the examples from Parsing XML Documents worked with an OPML file listing podcasts and their feeds. The outline nodes in the tree used attributes for the group names and podcast properties. ElementTree can be used to construct a similar XML file from a CSV input file, setting all of the element attributes as the tree is constructed.

import csv
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
import datetime
from ElementTree_pretty import prettify

generated_on = str(datetime.datetime.now())

# Configure one attribute with set()
root = Element('opml')
root.set('version', '1.0')

root.append(Comment('Generated by ElementTree_csv_to_xml.py for PyMOTW'))

head = SubElement(root, 'head')
title = SubElement(head, 'title')
title.text = 'My Podcasts'
dc = SubElement(head, 'dateCreated')
dc.text = generated_on
dm = SubElement(head, 'dateModified')
dm.text = generated_on

body = SubElement(root, 'body')

with open('podcasts.csv', 'rt') as f:
    current_group = None
    reader = csv.reader(f)
    for row in reader:
        group_name, podcast_name, xml_url, html_url = row
        # The group name is stored in the "text" attribute, not in the
        # node's .text content, so read it back with get().
        if current_group is None or group_name != current_group.get('text'):
            # Start a new group
            current_group = SubElement(body, 'outline', {'text': group_name})
        # Add this podcast to the group,
        # setting all of its attributes at
        # once.
        podcast = SubElement(current_group, 'outline',
                             {'text': podcast_name,
                              'xmlUrl': xml_url,
                              'htmlUrl': html_url,
                              })

print prettify(root)

The attribute values can be configured one at a time with set() (as with the root node), or all at once by passing a dictionary to the node factory (as with each group and podcast node). Because the group name is an attribute rather than text content, the loop compares against current_group.get('text'); comparing against current_group.text would always see None and start a new group for every row.

$ python ElementTree_csv_to_xml.py

<?xml version="1.0" ?>
<opml version="1.0">
  <!--Generated by ElementTree_csv_to_xml.py for PyMOTW-->
  <head>
    <title>
      My Podcasts
    </title>
    <dateCreated>
      2013-02-21 06:38:01.494066
    </dateCreated>
    <dateModified>
      2013-02-21 06:38:01.494066
    </dateModified>
  </head>
  <body>
    <outline text="Science and Tech">
      <outline htmlUrl="http://www.publicradio.org/columns/futuretense/" text="APM: Future Tense" xmlUrl="http://www.publicradio.org/columns/futuretense/podcast.xml"/>
      <outline htmlUrl="http://www.uh.edu/engines/engines.htm" text="Engines Of Our Ingenuity Podcast" xmlUrl="http://www.npr.org/rss/podcast.php?id=510030"/>
      <outline htmlUrl="http://www.nyas.org/WhatWeDo/SciencetheCity.aspx" text="Science &amp; the City" xmlUrl="http://www.nyas.org/Podcasts/Atom.axd"/>
    </outline>
    <outline text="Books and Fiction">
      <outline htmlUrl="http://www.podiobooks.com/blog" text="Podiobooker" xmlUrl="http://feeds.feedburner.com/podiobooks"/>
      <outline htmlUrl="http://web.me.com/normsherman/Site/Podcast/Podcast.html" text="The Drabblecast" xmlUrl="http://web.me.com/normsherman/Site/Podcast/rss.xml"/>
      <outline htmlUrl="http://www.tor.com/" text="tor.com / category / tordotstories" xmlUrl="http://www.tor.com/rss/category/TorDotStories"/>
    </outline>
    <outline text="Computers and Programming">
      <outline htmlUrl="http://twit.tv/mbw" text="MacBreak Weekly" xmlUrl="http://leo.am/podcasts/mbw"/>
      <outline htmlUrl="http://twit.tv" text="FLOSS Weekly" xmlUrl="http://leo.am/podcasts/floss"/>
      <outline htmlUrl="http://www.coreint.org/" text="Core Intuition" xmlUrl="http://www.coreint.org/podcast.xml"/>
    </outline>
    <outline text="Python">
      <outline htmlUrl="http://advocacy.python.org/podcasts/" text="PyCon Podcast" xmlUrl="http://advocacy.python.org/podcasts/pycon.rss"/>
      <outline htmlUrl="http://advocacy.python.org/podcasts/" text="A Little Bit of Python" xmlUrl="http://advocacy.python.org/podcasts/littlebit.rss"/>
      <outline htmlUrl="" text="Django Dose Everything Feed" xmlUrl="http://djangodose.com/everything/feed/"/>
    </outline>
    <outline text="Miscelaneous">
      <outline htmlUrl="http://www.castsampler.com/users/dhellmann/" text="dhellmann's CastSampler Feed" xmlUrl="http://www.castsampler.com/cast/feed/rss/dhellmann/"/>
    </outline>
  </body>
</opml>
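Attributes and text content are stored in different places on an Element: attributes live in the attrib dictionary and are read with get(), while character data lives in .text. The following minimal sketch assumes Python 3 and shows the three ways to set attributes (set(), a dictionary, and keyword arguments). The podcast name is only sample data.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

root = Element('opml')
root.set('version', '1.0')                                  # one at a time

body = SubElement(root, 'body')
group = SubElement(body, 'outline', {'text': 'Python'})     # attribute dict
podcast = SubElement(group, 'outline', text='PyCon Podcast')  # keyword args

# The group name is an attribute, so the node's .text stays None.
print(group.get('text'), group.text)
serialized = tostring(root, encoding='unicode')
print(serialized)
```

Keyword arguments are convenient for attribute names that are valid Python identifiers; the dictionary form also works for names that are not, such as ones containing a hyphen.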

Building Trees from Lists of Nodes

Multiple children can be added to an Element instance with the extend() method. The argument to extend() is any iterable, including a list or another Element instance.

from xml.etree.ElementTree import Element, tostring
from ElementTree_pretty import prettify

top = Element('top')

children = [
    Element('child', num=str(i))
    for i in xrange(3)
    ]

top.extend(children)

print prettify(top)

When a list is given, the nodes in the list are added directly to the new parent.

$ python ElementTree_extend.py

<?xml version="1.0" ?>
<top>
  <child num="0"/>
  <child num="1"/>
  <child num="2"/>
</top>

When another Element instance is given, the children of that node are added to the new parent.

from xml.etree.ElementTree import Element, SubElement, tostring, XML
from ElementTree_pretty import prettify

top = Element('top')

parent = SubElement(top, 'parent')

children = XML('''<root><child num="0" /><child num="1" /><child num="2" /></root> ''')
parent.extend(children)

print prettify(top)

In this case, the node with tag root created by parsing the XML string has three children, which are added to the parent node. The root node is not part of the output tree.

$ python ElementTree_extend_node.py

<?xml version="1.0" ?>
<top>
  <parent>
    <child num="0"/>
    <child num="1"/>
    <child num="2"/>
  </parent>
</top>

It is important to understand that extend() does not modify any existing parent-child relationships with the nodes. ElementTree nodes do not keep a reference to their parent, so if the nodes passed to extend() already exist somewhere in the tree, they stay where they are and are repeated in the output.

from xml.etree.ElementTree import Element, SubElement, tostring, XML
from ElementTree_pretty import prettify

top = Element('top')

parent_a = SubElement(top, 'parent', id='A')
parent_b = SubElement(top, 'parent', id='B')

# Create children
children = XML('''<root><child num="0" /><child num="1" /><child num="2" /></root> ''')

# Set the id to the Python object id of the node to make duplicates
# easier to spot.
for c in children:
    c.set('id', str(id(c)))

# Add to first parent
parent_a.extend(children)

print 'A:'
print prettify(top)
print

# Add the same node objects to the second parent (no copies are made)
parent_b.extend(children)

print 'B:'
print prettify(top)
print

Setting the id attribute of these children to the Python unique object identifier exposes the fact that the same node objects appear in the output tree more than once.

$ python ElementTree_extend_node_copy.py

A:
<?xml version="1.0" ?>
<top>
  <parent id="A">
    <child id="4300110224" num="0"/>
    <child id="4300110288" num="1"/>
    <child id="4300110480" num="2"/>
  </parent>
  <parent id="B"/>
</top>


B:
<?xml version="1.0" ?>
<top>
  <parent id="A">
    <child id="4300110224" num="0"/>
    <child id="4300110288" num="1"/>
    <child id="4300110480" num="2"/>
  </parent>
  <parent id="B">
    <child id="4300110224" num="0"/>
    <child id="4300110288" num="1"/>
    <child id="4300110480" num="2"/>
  </parent>
</top>
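When the second parent needs independent nodes rather than shared ones, copy.deepcopy() can duplicate each child before it is added. This is a minimal sketch assuming Python 3; it checks object identity instead of printing id() values.

```python
import copy
from xml.etree.ElementTree import Element, SubElement, XML

top = Element('top')
parent_a = SubElement(top, 'parent', id='A')
parent_b = SubElement(top, 'parent', id='B')

children = XML('<root><child num="0" /><child num="1" /><child num="2" /></root>')

# parent_a shares the original node objects.
parent_a.extend(children)
# parent_b gets its own deep copies of each child.
parent_b.extend([copy.deepcopy(c) for c in children])

# Changing a copy leaves the original under parent_a untouched.
parent_b[0].set('num', '99')
print(parent_a[0].get('num'), parent_b[0].get('num'))
```

Because the copies are separate objects, later changes to one parent's children no longer show up under the other.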

Serializing XML to a Stream

tostring() is implemented to write to an in-memory file-like object and then return a string representing the entire element tree. When working with large amounts of data, it will take less memory and make more efficient use of the I/O libraries to write directly to a file handle using the write() method of ElementTree.

import sys
from xml.etree.ElementTree import Element, SubElement, Comment, ElementTree

top = Element('top')

comment = Comment('Generated for PyMOTW')
top.append(comment)

child = SubElement(top, 'child')
child.text = 'This child contains text.'

child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has regular text.'
child_with_tail.tail = 'And "tail" text.'

child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'

empty_child = SubElement(top, 'empty_child')

ElementTree(top).write(sys.stdout)

The example uses sys.stdout to write to the console, but it could also write to an open file or socket.

$ python ElementTree_write.py

<top><!--Generated for PyMOTW--><child>This child contains text.</child><child_with_tail>This child has regular text.</child_with_tail>And "tail" text.<child_with_entity_ref>This &amp; that</child_with_entity_ref><empty_child /></top>
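Passing encoding and xml_declaration to write() also produces the XML declaration that tostring() left out earlier. This is a minimal sketch assuming Python 3. The file name output.xml is a placeholder, and a temporary directory keeps the example self-contained.

```python
import os
import tempfile
from xml.etree.ElementTree import Element, SubElement, ElementTree

top = Element('top')
child = SubElement(top, 'child')
child.text = 'This & that'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'output.xml')  # placeholder file name
    # With a byte encoding such as utf-8, write() expects a binary file handle.
    with open(path, 'wb') as f:
        ElementTree(top).write(f, encoding='utf-8', xml_declaration=True)
    with open(path, 'rb') as f:
        data = f.read()

print(data.decode('utf-8'))
```

Writing straight to the open file handle avoids building the whole serialized document as one string first.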

The last node in the tree, empty_child, contains no text or sub-nodes, so it is written as an empty tag, <empty_child />. The method argument to write() selects the output format, which among other things controls how empty nodes are handled.

import sys
from xml.etree.ElementTree import Element, SubElement, ElementTree

top = Element('top')

child = SubElement(top, 'child')
child.text = 'This child contains text.'

empty_child = SubElement(top, 'empty_child')

for method in ['xml', 'html', 'text']:
    print method
    ElementTree(top).write(sys.stdout, method=method)
    print '\n'

Three methods are supported:

xml
The default method, produces <empty_child />.
html
Produces the tag pair, as is required in HTML documents (<empty_child></empty_child>).
text
Writes only the text of the nodes and skips all of the tags.

$ python ElementTree_write_method.py

xml
<top><child>This child contains text.</child><empty_child /></top>

html
<top><child>This child contains text.</child><empty_child></empty_child></top>

text
This child contains text.
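Since Python 3.4, write() and tostring() also accept a short_empty_elements keyword, which keeps the xml method but writes empty elements as a tag pair. This minimal sketch assumes Python 3.4 or later and compares it with the three methods on a tree containing only an empty node.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

top = Element('top')
SubElement(top, 'empty_child')

results = {
    'xml': tostring(top, encoding='unicode'),
    'xml, short_empty_elements=False': tostring(
        top, encoding='unicode', short_empty_elements=False),
    'html': tostring(top, encoding='unicode', method='html'),
    # No node has any text, so the text method produces an empty string.
    'text': tostring(top, encoding='unicode', method='text'),
}

for label, value in results.items():
    print(label, repr(value))
```

The short_empty_elements option is useful when the output must stay XML (for example, keeping a declaration or namespaces) but a consumer does not accept self-closing tags.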

See also

Outline Processor Markup Language, OPML
Dave Winer’s OPML specification and documentation.