Joining multiple XML files with Python and libxml2

I found myself with the job of joining two XML documents. It should have been trivial, but I don’t live and breathe XML, so after false starts with XSLT and XInclude I tried Python.

It was pretty straightforward, apart from finding documentation for the libxml2 Python bindings.

 

# Take a bunch of XML files on the command line and merge them into
# one big XML document.
#
# The root element will come from the first document; the root elements of
# subsequent documents will be lost, as will anything outside the root
# (comments and whatnot).

import sys
import libxml2

doc = None
root = None

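for filename in sys.argv[1:]:

    newdoc = libxml2.parseFile(filename)
    newroot = newdoc.getRootElement()

    if root is None and newroot is not None:
        # first document with a root element; keep it as the merge target
        doc = newdoc
        root = newroot
        continue

    if newroot is not None and newroot.children is not None:
        # merge a copy of this root's children into the first document
        # (an empty root has no children, so there is nothing to copy)
        root.addChildList(newroot.children.copyNodeList())

    # the nodes were copied, so the source document is no longer needed
    newdoc.freeDoc()

if doc is not None:
    # print() with a single argument works on both Python 2 and Python 3
    print(doc)
    doc.freeDoc()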
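
For anyone who cannot get the libxml2 bindings installed, roughly the same merge can be sketched with the standard library's xml.etree.ElementTree instead. This is a different library from the one used above, not a drop-in equivalent: the function name merge_xml is made up for illustration, ElementTree drops comments by default, and any text sitting directly inside a later root element (before its first child) is not carried over.

```python
import sys
import xml.etree.ElementTree as ET


def merge_xml(sources):
    """Merge XML documents given as file names or file objects.

    The first document's root element is kept; later documents
    contribute only the child elements of their roots.
    Returns the merged root element, or None if no sources were given.
    """
    root = None
    for source in sources:
        newroot = ET.parse(source).getroot()
        if root is None:
            # first document: its root becomes the merge target
            root = newroot
        else:
            # later documents: move their root's child elements across
            root.extend(list(newroot))
    return root


if __name__ == "__main__":
    merged = merge_xml(sys.argv[1:])
    if merged is not None:
        sys.stdout.write(ET.tostring(merged, encoding="unicode") + "\n")
```

Unlike the libxml2 version there is nothing to free by hand, since ElementTree objects are ordinary garbage-collected Python objects.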

2 comments
  1. dung le said:

    Hi there,

    I am very new to Python and the PyDev IDE. How can I add the libxml2 package to my project so that I can successfully do “import libxml2” as in your code?

    Thanks a lot.

    Dunn.

    • Sorry, I’m not familiar with pydev. On my system (Fedora Linux), libxml2 is available without me having installed it. Perhaps the availability of the python bindings is a side effect of installing the Fedora libxml2 packages?

      The homepage for libxml2 is here: http://xmlsoft.org/
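
A quick way to see whether a given Python interpreter can find the bindings at all is to ask it directly. The sketch below is only a diagnostic, and the helper name has_module is made up for illustration. It prints which interpreter is running and whether that interpreter can locate a module called libxml2. Running it with the interpreter that the IDE project is configured to use should show whether the problem is a missing package or a different interpreter.

```python
import importlib.util
import sys


def has_module(name):
    """Return True if this interpreter can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


# Show which interpreter is running and whether it can see the bindings
print(sys.executable)
print("libxml2 bindings found:", has_module("libxml2"))
```

If this reports False, the bindings usually have to come from the operating system's package manager or a libxml2 build with Python support, since they ship as part of libxml2 rather than as a separate pure-Python package.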
