Selecting specific XML elements

I have a pretty big XML file structured according to the TEI specification, and I've been given the job of migrating it into a Drupal installation. To do that, I need to extract only specific elements from the XML and write them out to a new XML file for eventual use within Drupal.

Drupal aside for a second, the basic XML structure is as follows:

<TEI.2>
	<teiHeader></teiHeader>
	<text>
		<front>
			<titlePage></titlePage>
			<pb/>
			<div1 type="section" n="1" org="uniform" sample="complete">
				<head></head>
				<pb n="xxi"/>
				<p></p>
				<p></p>
				<p></p>
				<p></p>
				<p></p>
				<closer></closer>
			</div1>
			<div1>
			...
			</div1>
		</front>
		<body>
			<div1>
				<head></head>
				<div2>
				...

Given the structure above, how can I extract only the head and the paragraphs inside the div1 elements and write them all into another XML file?

I tried to select what I wanted using the following stylesheet…

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

	<xsl:template match="/">
		<head><xsl:apply-templates select="TEI.2/text/front/head"/></head>
		<p><xsl:apply-templates select="TEI.2/text/front/div1/p"/></p>
	</xsl:template>

</xsl:stylesheet>

…and viewing the original XML document in Firefox (after including and applying the XSL above) so I could save the result as another XML file. This didn't work the way I expected. Firefox seems to get confused about how I'm trying to save the file, the markup includes a transformiix:result element that I'm unfamiliar with, and the head elements are missing from the output. The overall point is that saving from inside Firefox doesn't seem to be the correct approach, and I'm just not that sharp with XML yet. I can do PHP, JavaScript, and CSS, but XML is still a bit of a learning curve for moi.

So any help from you guys is appreciated, as always. How can I extract only certain elements and save the output as a plain XML file?
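The head elements are missing because the stylesheet selects TEI.2/text/front/head, but in this structure head sits one level lower, inside div1. The transformiix:result wrapper is most likely Firefox's XSLT engine (Transformiix) wrapping output that has no single root element, since the template emits a head and a p side by side. Note also that xsl:apply-templates with only the built-in rules outputs just the text of the selected elements, so all the paragraph text ends up inside one p. Running the selection outside the browser avoids these problems. Below is a minimal sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath) with the corrected path; the inline sample document and the file name in the comment are hypothetical stand-ins for your TEI file.

```php
<?php
// Minimal sketch: run the stylesheet's selection as XPath with PHP's DOM
// extension. The inline sample is a hypothetical stand-in for the real file;
// with a real file you would call $doc->load('your-file.xml') (hypothetical name).
$tei = <<<XML
<TEI.2>
  <teiHeader/>
  <text>
    <front>
      <titlePage>Title page</titlePage>
      <div1 type="section" n="1">
        <head>Preface</head>
        <pb n="xxi"/>
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
        <closer>Closing line.</closer>
      </div1>
    </front>
    <body>
      <div1>
        <head>Chapter One</head>
        <p>Body paragraph.</p>
      </div1>
    </body>
  </text>
</TEI.2>
XML;

$doc = new DOMDocument();
$doc->loadXML($tei);
$xpath = new DOMXPath($doc);

// head sits inside div1, so the path must go through div1 (the original
// TEI.2/text/front/head matches nothing). "*" covers front and body.
$nodes = $xpath->query('/TEI.2/text/*/div1/head | /TEI.2/text/*/div1/p');

// A union comes back in document order, so each head precedes its paragraphs.
$selected = [];
foreach ($nodes as $node) {
    $selected[] = $node->nodeName . ': ' . $node->textContent;
}
print_r($selected);
```

The pb and closer elements are skipped because the expression only names head and p.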

You can have a look at SimpleXML since you know PHP.

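To get the head element inside the first div1 of front, you would do something like this:

$xml->text->front->div1->head

Here $xml is what simplexml_load_file() returns, which is the root element (TEI.2) itself, so the path starts at its children, and head has to be reached through div1. An element name containing a dot can't be written as a plain PHP property anyway; if you ever need one, use the $xml->{'TEI.2'} syntax.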
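Expanding on that, here is a sketch that collects the head and paragraph text of every div1 in front and body. It assumes TEI.2 is the root with text as its direct child, and it uses simplexml_load_string() on a hypothetical inline sample; for the real file, simplexml_load_file() works the same way.

```php
<?php
// Sketch: walk the TEI tree with SimpleXML. The inline sample is a
// hypothetical stand-in for the real file.
$tei = <<<XML
<TEI.2>
  <teiHeader/>
  <text>
    <front>
      <div1 type="section" n="1">
        <head>Preface</head>
        <pb n="xxi"/>
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
        <closer>Closing line.</closer>
      </div1>
    </front>
    <body>
      <div1>
        <head>Chapter One</head>
        <p>Body paragraph.</p>
      </div1>
    </body>
  </text>
</TEI.2>
XML;

// The returned object represents the root element (TEI.2) itself,
// so navigation starts at its children; no ->TEI.2 step is needed.
$xml = simplexml_load_string($tei);

$sections = [];
foreach ([$xml->text->front, $xml->text->body] as $part) {
    // Iterating over ->div1 visits every div1 child, not just the first one.
    foreach ($part->div1 as $div) {
        $paragraphs = [];
        foreach ($div->p as $p) {
            $paragraphs[] = (string) $p; // text content only
        }
        $sections[] = ['head' => (string) $div->head, 'p' => $paragraphs];
    }
}
print_r($sections);
```

Casting to string keeps only the text. If the paragraphs contain inline markup you want to keep, calling asXML() on each element returns its markup instead.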

At this point you can write everything out to a file with PHP functions like fwrite() or file_put_contents(). SimpleXMLElement::asXML() also accepts a filename and writes the XML to it directly.
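Rather than assembling the output string by hand, a DOM-based sketch can copy each head and p element (including any inline markup) into a new document with a single root and save it in one call. The sections/section wrapper names and the drupal-import.xml file name are hypothetical; use whatever structure your Drupal import expects.

```php
<?php
// Sketch: copy head and p from every div1 into a new XML file.
// Wrapper element names and the output file name are hypothetical.
$tei = <<<XML
<TEI.2>
  <teiHeader/>
  <text>
    <front>
      <div1 type="section" n="1">
        <head>Preface</head>
        <pb n="xxi"/>
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
        <closer>Closing line.</closer>
      </div1>
    </front>
    <body>
      <div1>
        <head>Chapter One</head>
        <p>Body paragraph.</p>
      </div1>
    </body>
  </text>
</TEI.2>
XML;

$source = new DOMDocument();
$source->loadXML($tei);
$xpath = new DOMXPath($source);

// One root element keeps the output well-formed.
$out = new DOMDocument('1.0', 'UTF-8');
$out->formatOutput = true;
$root = $out->appendChild($out->createElement('sections'));

foreach ($xpath->query('/TEI.2/text/*/div1') as $div) {
    $section = $root->appendChild($out->createElement('section'));
    // importNode(..., true) deep-copies a node from the source document
    // so it can be appended to the new one.
    foreach ($xpath->query('head | p', $div) as $node) {
        $section->appendChild($out->importNode($node, true));
    }
}

// save() writes the file; saveXML() returns the same markup as a string.
$outputFile = sys_get_temp_dir() . '/drupal-import.xml';
$out->save($outputFile);
echo $out->saveXML();
```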