Data stored in XML files often has to be massaged into other formats (e.g. DocBook to roff). There is a well-developed language for defining such transformations (XSLT) and a number of tools to apply them (e.g. xsltproc).
Besides the W3 tutorial, there is also a nice introduction by Paul Grosso and Norman Walsh. I've copied a simple example from this intro and also included a slightly more complicated setup for generating online help for a list of macros.
XSLT is also useful for standardizing XML content. For example, I was
recently trying to compare two Gramps XML files to see what had
changed between two database backups. Unfortunately, the backup XML
was not sorted by id, so there were many diff chunks due to node
shuffling that didn't represent any useful information. The following
XSLT sorts the children of every node by their id attributes:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- sort node children by their `id` attributes -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="node()">
        <xsl:sort select="@id" order="ascending"/>
        <xsl:apply-templates select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
With the above saved as sort-by-id.xsl, you can sort some.xml using:
$ xsltproc --nonet --novalid sort-by-id.xsl some.xml
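To see the effect on a small case, here is a minimal sketch that runs the stylesheet on a toy document. The input file sample.xml and its people/person elements are made up for illustration (they are not real Gramps markup), and the stylesheet is written out again so that the snippet runs on its own in a scratch directory.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Copy of the sort-by-id stylesheet shown earlier.
cat > "$workdir/sort-by-id.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="node()">
        <xsl:sort select="@id" order="ascending"/>
        <xsl:apply-templates select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
EOF

# Hypothetical input with its children deliberately out of order.
cat > "$workdir/sample.xml" <<'EOF'
<people>
  <person id="I0003"/>
  <person id="I0001"/>
  <person id="I0002"/>
</people>
EOF

# Apply the stylesheet and print the sorted document.
sorted=$(xsltproc --nonet --novalid "$workdir/sort-by-id.xsl" "$workdir/sample.xml")
printf '%s\n' "$sorted"
```

The three person elements should come out in id order (I0001, I0002, I0003), whatever their order in the input.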
You can compare two Gramps XML files with
$ diff -u <(zcat a.gramps | xsltproc --nonet --novalid sort-by-id.xsl -) \
    <(zcat b.gramps | xsltproc --nonet --novalid sort-by-id.xsl -) | less
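If you make this comparison often, it can be wrapped in a small shell function. The sketch below is one possible shape, not something shipped with Gramps or xsltproc: the name gramps_diff and its argument order are hypothetical, it uses temporary files instead of <(...) so that it does not depend on bash, and it calls gzip -dc, which does the same job as zcat.

```shell
# gramps_diff STYLESHEET OLD NEW
# Hypothetical helper: unpack both files, sort each one with the
# stylesheet, and print a unified diff of the sorted results.
# Returns diff's exit status (0 = same, 1 = different).
gramps_diff() {
    tmp=$(mktemp -d) || return 1
    gzip -dc "$2" | xsltproc --nonet --novalid "$1" - > "$tmp/old.xml"
    gzip -dc "$3" | xsltproc --nonet --novalid "$1" - > "$tmp/new.xml"
    diff -u "$tmp/old.xml" "$tmp/new.xml"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

With the function defined, the comparison above becomes

$ gramps_diff sort-by-id.xsl a.gramps b.gramps | less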
Jesper Tverskov has a nice page about the identity template and related tricks if you want more examples of quasi-copy transforms.