PHP5 DOM, XML, XPATH, SimpleXML Examples

hi all, i’m having a bear of a time working with the dom in order to manipulate xml files. the php5 functionality is relatively new and online tutorials are sorely lacking. the php5 documentation and examples are valuable, but often limited in scope.

so, i want to create a thread to share what i’ve learned (through blood sweat and tears - at least for the DOM stuff) and ask others to contribute their knowledge to help me, and others, learn more about DOM, XML, etc…

a one stop shop for php5 simplexml, dom, xpath, if you will. i’ll probably have to learn xslt eventually, so feel free to chime in with your expertise there.

i’m going to use a simplified version of php5’s simplexml sample xml here:

http://us2.php.net/manual/en/ref.simplexml.php

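in this first post, i’ll show some examples of using simplexml, the simplest way to get data out of an xml file (the downside is that its writing abilities are limited: it can change existing values and save them with asXML(), but the DOM is the better tool for real editing). here’s an example of the syntax: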


<?PHP

$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

$xml = simplexml_load_string($xmlstr);

// pulls the first movie's title.
$title = $xml->movie[0]->title;
// $xml->movie->title (no index) also returns the first movie's title.

echo 'First Title: '.$title.'<br />';

// the second movie; each <movie> has only one <title>, so ->title needs no index.
$title = $xml->movie[1]->title;

echo 'Second Title: '.$title.'<br />';
echo '<br />';

//you can loop through all movie values and pull the title:

echo 'The Loop Method: <br />';

foreach ($xml->movie as $movie) {
   echo $movie->title, '<br />';
}

?>

i use a forms class and pass $title to it as the value of a form field. the catch is that $xml->movie[0]->title returns a SimpleXMLElement object, not a string: echo converts it to a string automatically, but my forms class doesn't. in order to get it to work right, i had to add the following after getting the value of $title using simplexml:



function to_string_trim($var){
// forms class inconsistency requires string conversion
// trim white space off beginning and end of xml data.  simplexml
// apparently doesn't have a method to eliminate this white space.
    $var = trim((string) $var);
    return $var;

}

$title = to_string_trim($title);


the function casts to string (required for display in the forms class, or to reliably compare the value to another string) and trims the white space (which can cause problems, especially when comparing user input to the xml file data).
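to see the object-versus-string issue directly, here is a minimal sketch that checks the type simplexml hands back before and after the cast. it repeats to_string_trim() so it runs on its own, and the extra white space around the title is there on purpose to show what trim() removes:

```php
<?php
// the white space around the title text is deliberate, to show what trim() removes
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>
      PHP: Behind the Parser
    </title>
  </movie>
</movies>
XML;

function to_string_trim($var){
    // cast to string, then strip white space from the beginning and end
    return trim((string) $var);
}

$xml = simplexml_load_string($xmlstr);

$raw = $xml->movie[0]->title;    // a SimpleXMLElement object
$clean = to_string_trim($raw);   // a plain PHP string

echo get_class($raw), '<br />';      // SimpleXMLElement
echo gettype($clean), '<br />';      // string
echo '[' . $clean . ']', '<br />';   // [PHP: Behind the Parser]
```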

this covers the basics of pulling data out of an xml file using simplexml.

in the next post, we’ll take a look at incorporating xpath with simplexml.

the functionality we went over in the first post is limited. to improve the ability to access data, simplexml works with xpath. for a detailed description of how xpath queries work, check out this link:

http://www.zvon.org/xxl/XPathTutorial/General/examples.html

there is a lot of detailed xpath information here - so don’t just skim the website. study it.

xpath adds fine-grained querying abilities to simplexml (and also to the DOM, but we aren’t there yet).

something to keep in mind as you work through this example code:

// - selects matching elements anywhere in the document, regardless of path.
/ - starts at the root and follows a specific path.
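a quick way to see the difference between // and / is to count matches for the same <name> element using different paths. a minimal sketch against the sample xml:

```php
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

$xml = simplexml_load_string($xmlstr);

// '//' finds <name> elements at any depth
$anywhere = count($xml->xpath('//name'));
// a leading '/' has to spell out every step from the root
$full_path = count($xml->xpath('/movies/movie/characters/character/name'));
// skipping steps after a single '/' matches nothing
$wrong_path = count($xml->xpath('/movies/name'));
// '//' also works mid-path: any <name> somewhere under <movies>
$mixed = count($xml->xpath('/movies//name'));

echo "//name: $anywhere<br />";                                   // 1
echo "/movies/movie/characters/character/name: $full_path<br />"; // 1
echo "/movies/name: $wrong_path<br />";                           // 0
echo "/movies//name: $mixed<br />";                               // 1
```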

here is sample code to get the two titles using simplexml and xpath:



<?PHP

$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

$xml = simplexml_load_string($xmlstr);

echo '--List All Titles--<br /><br />';

// get title values

$title_query = '//movies/movie/title';

$title = $xml->xpath($title_query);

// $title is an array of SimpleXMLElement objects, one per match
// cast object to string to prevent future type problems
$title_1 = (string) $title[0];
echo 'Title 1: '.$title_1;

echo '<br />';

$title_2 = (string) $title[1];
echo 'Title 2: '.$title_2;

// ------------- List All Ratings in movies/movie/ ---------------

echo '<br /><br />--List All Ratings in movies/movie/ Path--<br /><br />';

// get rating

$rating_query = '//movies/movie/rating';

$rating = $xml->xpath($rating_query);

$rating_1 = (string) $rating[0];
echo 'Rating 1: '.$rating_1;

echo '<br />';

$rating_2 = (string) $rating[1];
echo 'Rating 2: '.$rating_2;

// ------------------------ List All Types ---------------------------

echo '<br /><br />--List All Types--<br /><br />';

// get every rating element that has a type attribute

$rating_query = '//rating[@type]';

$rating = $xml->xpath($rating_query);

$rating_1 = (string) $rating[0];
echo 'Rating 1: '.$rating_1;

echo '<br />';

$rating_2 = (string) $rating[1];
echo 'Rating 2: '.$rating_2;

// ------------------ Specific Rating By Type Value --------------------

echo '<br /><br />--Specific Rating By Type Value--<br /><br />';

// get a rating by the value of its type attribute

$rating_query = "//rating[@type='thumbs']";

$rating = $xml->xpath($rating_query);

$rating_1 = (string) $rating[0];
echo 'Rating (thumbs): '.$rating_1;

$rating_query = "//rating[@type='stars']";

echo '<br />';

$rating = $xml->xpath($rating_query);

$rating_1 = (string) $rating[0];
echo 'Rating (stars): '.$rating_1;

// ------------------ List Titles By Looping --------------------

echo '<br /><br />--List Titles By Looping--<br /><br />';

// xpath() returns an array, so the matches can be looped over with foreach

$title_query = '//movies/movie/title';

foreach ($xml->xpath($title_query) as $title) {
   echo $title, '<br />';
}

?>
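the examples above read element text, but attributes such as type can be read too. in simplexml, array syntax on an element returns one of its attributes, and an xpath ending in @type returns the attribute nodes themselves. a minimal sketch:

```php
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

$xml = simplexml_load_string($xmlstr);

$ratings = array();
// the second movie has two <rating> elements; foreach visits both
foreach ($xml->movie[1]->rating as $rating) {
    // array syntax reads an attribute; cast both values to string
    $type = (string) $rating['type'];
    $ratings[$type] = (string) $rating;
    echo $type, ': ', $ratings[$type], '<br />';
}

// an xpath ending in '@type' returns the attribute nodes themselves
$types = array();
foreach ($xml->xpath('//rating/@type') as $attr) {
    $types[] = (string) $attr;
}
echo implode(', ', $types), '<br />';   // thumbs, stars
```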


in the next post, we will use the DOM to get the same data.

the following is code to do similar tasks using DOM and XPath.

again, i cast to string immediately to avoid any future compatibility issues (with the DOM, nodeValue is already a string, so here the cast is just for consistency).



<?PHP

$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

// --------------------- Using DOM and XPath ------------------------

$doc = new DOMDocument;
//$doc->preserveWhiteSpace = false;

// call loadXML() on the object; calling it statically fails in newer PHP versions
$doc->loadXML($xmlstr);

// $doc->load('external_file_name.xml'); // use if loading a separate xml file

$xpath = new DOMXPath($doc);

// get title values

echo '--List All Titles, One By One--<br /><br />';

$query = '//movies/movie/title';
$query = $xpath->query($query);
// find first item
$title_1 = $query->item(0)->nodeValue;
$title_2 = $query->item(1)->nodeValue;
$title_1 = (string) $title_1;
$title_2 = (string) $title_2;
echo 'Title 1: '.$title_1.'<br />';
echo 'Title 2: '.$title_2;

// get title values using a loop

echo '<br /><br />';

echo '--Loop Through All Titles--<br /><br />';

$titles = $doc->getElementsByTagName('title');
foreach ($titles as $title) {
    print (string) $title->firstChild->nodeValue . '<br />';
}

?>
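instead of calling item(0), item(1) and so on, the DOMNodeList returned by DOMXPath::query() can be looped over with foreach. query() also takes an optional context node as its second argument, which lets a relative path like 'rating' run inside each <movie>. a minimal sketch:

```php
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

$doc = new DOMDocument;
$doc->loadXML($xmlstr);
$xpath = new DOMXPath($doc);

// foreach walks every node in the DOMNodeList returned by query()
$titles = array();
foreach ($xpath->query('/movies/movie/title') as $node) {
    $titles[] = $node->nodeValue;   // nodeValue is already a string
}
echo implode('<br />', $titles), '<br />';

// with a context node as the second argument, 'rating' is relative to each <movie>
$rating_counts = array();
foreach ($xpath->query('/movies/movie') as $movie) {
    $rating_counts[] = $xpath->query('rating', $movie)->length;
}
echo implode(', ', $rating_counts), '<br />';   // 0, 2
```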


i’m trying to work through how to use the DOM to count elements.

using simplexml, it is fairly simple. see the code below for some examples:



<?PHP

$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

$xml = simplexml_load_string($xmlstr);

$nodes = count($xml->xpath("//rating[@type = 'thumbs']"));

echo "number 'thumbs' nodes: ".$nodes.'<br />';

$nodes = count($xml->xpath("//rating[@type = 'stars']"));

echo "number 'stars' nodes: ".$nodes.'<br />';

$nodes = count($xml->xpath("//rating"));

echo 'number rating nodes: '.$nodes.'<br />';

$nodes = count($xml->xpath("//title"));

echo 'number title nodes: '.$nodes.'<br />';

// notice the absolute xpath

$nodes = count($xml->xpath("/movies/movie/title"));

echo 'number title nodes (follow path): '.$nodes.'<br />';

?>


hopefully, i’ve been able to help some newbies learn a little about PHP5’s XML, DOM, XPath and SimpleXML.

however, there is a lot i don’t know.

  1. i don’t understand how to count elements using the DOM.
  2. i don’t know how to use the DOM to update existing node text in the xml file (this may require loading a separate file instead of a string. simplexml can change the text of an existing element, but the DOM gives much finer control).
  3. i don’t know how to delete nodes or node text, or how to insert nodes and node text, in an xml file.
  4. i don’t understand xslt.
  5. someone also asked for xquery to be covered (i’m not familiar with it and i don’t think i need it just yet).

i would appreciate it, and i’m sure many other people would too, if people with these skills could use the sample xml below to show us how to do these routine tasks.



<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>


you may have to load an external file in order to do some of the tasks listed above.

i will keep working through these issues and post the successes here if they haven’t been addressed yet.

tia…
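for item 3, here is a minimal sketch of deleting and inserting with the DOM, run on the sample xml in memory. the <year> element and its value are made up purely for illustration. removeChild() has to be called on the parent of the node being removed, and insertBefore() places a new node ahead of an existing child:

```php
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;   // drop indentation-only text nodes
$doc->loadXML($xmlstr);
$doc->formatOutput = true;          // re-indent when saving
$xpath = new DOMXPath($doc);

// delete a node: ask its parent to remove it
$stars = $xpath->query("//rating[@type='stars']")->item(0);
$stars->parentNode->removeChild($stars);

// delete node text only: the <actor> element stays, its text goes
$actor = $xpath->query('//actor')->item(0);
$actor->nodeValue = '';

// insert a node: a hypothetical <year> element, placed first in the second movie
$movie_2 = $xpath->query('/movies/movie')->item(1);
$year = $doc->createElement('year');
// insert node text: give the new element a text node
$year->appendChild($doc->createTextNode('2005'));
$movie_2->insertBefore($year, $movie_2->firstChild);

echo $doc->saveXML();
```

to write the changes back to disk, $doc->save('movies.xml') can be called at the end.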

auricle was nice enough to share some code to count elements using the DOM and getElementsByTagName. using his example, i was able to count using XPath, too.

i created a movies.xml and put it in the same directory as my test.php file. the xml in movies.xml looks just as before:



<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>


the php5 code to count elements using getElementsByTagName and XPath is as follows:



<?PHP

$doc = new DOMDocument('1.0', 'utf-8');
$doc->load('movies.xml');

// count using getElementsByTagName
echo 'Count Title (getElementsByTagName)<br /><br />';
$title = $doc->getElementsByTagName('title');
$count_elements = $title->length;
echo 'number of title elements: '.$count_elements;

// count using xpath

echo '<br /><br />Count Title (XPath)<br /><br />';
$xpath = new DOMXPath($doc);
$query = '//movies/movie/title';
$title = $xpath->query($query);
$count_elements = $title->length;
echo 'number of title elements: '.$count_elements;

echo '<br /><br />Count Rating (XPath)<br /><br />';
//$xpath = new DOMXPath($doc);
$query = '//movies/movie/rating';
$rating = $xpath->query($query);
$count_elements = $rating->length;
echo 'number of rating elements: '.$count_elements;

echo '<br /><br />Count Character (XPath)<br /><br />';
//$xpath = new DOMXPath($doc);
$query = '//movies/movie/characters/character';
$character = $xpath->query($query);
$count_elements = $character->length;
echo 'number of character elements: '.$count_elements;

?>
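XPath also has its own count() function. DOMXPath::evaluate() (unlike query()) returns typed results, so an expression like count(//rating) comes back as a number (a PHP float) rather than a node list. a minimal sketch, loading the sample xml from a string instead of movies.xml so it runs on its own:

```php
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

$doc = new DOMDocument;
$doc->loadXML($xmlstr);
$xpath = new DOMXPath($doc);

// count() inside the expression makes evaluate() return a number
$rating_count = (int) $xpath->evaluate('count(//rating)');
$thumbs_count = (int) $xpath->evaluate("count(//rating[@type='thumbs'])");
$title_count = (int) $xpath->evaluate('count(/movies/movie/title)');

echo 'number of rating elements: '.$rating_count.'<br />';   // 2
echo 'number of thumbs ratings: '.$thumbs_count.'<br />';    // 1
echo 'number of title elements: '.$title_count.'<br />';     // 2
```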


i found a way, by reading XML By Example, to create a function that displays an element’s text. i think it will display other nodes’ text, too; for example, if you query an attribute, it may well return the attribute’s value.

see the new code at the bottom of the following code example:



<?PHP

$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>
XML;

// --------------------- Using DOM and XPath ------------------------

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;

// call loadXML() on the object so the preserveWhiteSpace setting above is kept
$doc->loadXML($xmlstr);

// $doc->load('external_file_name.xml'); // use if loading a separate xml file

$xpath = new DOMXPath($doc);

// get title values

echo '--List All Titles, One By One--<br /><br />';

$query = '//movies/movie/title';
$query = $xpath->query($query);
// find first item
$title_1 = $query->item(0)->nodeValue;
$title_2 = $query->item(1)->nodeValue;
$title_1 = (string) $title_1;
$title_2 = (string) $title_2;
echo 'Title 1: '.$title_1.'<br />';
echo 'Title 2: '.$title_2;

// get title values using a loop

echo '<br /><br />';

echo '--Loop Through All Titles--<br /><br />';

$titles = $doc->getElementsByTagName('title');
foreach ($titles as $title) {
    print (string) $title->firstChild->nodeValue . '<br />';
}

echo '<br />';

// ------------------- this is new --------------------------

// use very basic function to get text value from node

//"$query" below references back up to "$query = $xpath->query($query);"

echo 'get_text: '. get_text($query->item(0));

function get_text($node){

    return $node->firstChild->data;

}

echo '<br /><br />';

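echo 'get_text_improved: '. get_text_improved($query->item(0));

// see nodeType Codes list, below

function get_text_improved($node){

    // only element nodes (nodeType 1) have child text nodes to collect
    if ($node->nodeType != 1) {
        return '';
    }

    $text = '';
    $children = $node->childNodes;
    // child indexes start at 0
    for ($i = 0; $i < $children->length; $i++) {

        // join the text of every direct text child (nodeType 3)
        if ($children->item($i)->nodeType == 3) {
            $text .= $children->item($i)->data;
        }
    }

    return $text;
}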

/*
nodeType Codes:

1 - Element
2 - Attribute
3 - Text
4 - CDATA Section
5 - Entity reference
6 - Entity
7 - Processing instruction
8 - Comment
9 - Document
10 - Document type
11 - Document fragment
12 - Notation
*/

?>
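the DOM also has a built-in property, textContent, that returns the text of a node. for an attribute it returns the value, and for an element it joins the text of all descendants, whereas get_text_improved() above only collects direct text children. a minimal sketch:

```php
<?php
$doc = new DOMDocument;
$doc->loadXML("<movies><movie><title>PHP and the DOM</title><rating type='stars'>5</rating></movie></movies>");
$xpath = new DOMXPath($doc);

$title = $xpath->query('//title')->item(0);
$type = $xpath->query('//rating/@type')->item(0);   // an attribute node (nodeType 2)
$movie = $xpath->query('//movie')->item(0);

echo $title->textContent, '<br />';   // PHP and the DOM
echo $type->textContent, '<br />';    // stars
echo $type->nodeValue, '<br />';      // stars (nodeValue works for attributes too)
echo $movie->textContent, '<br />';   // PHP and the DOM5 (all descendant text, joined)
```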


with guidance from auricle over on the php forums (auricle showed me how to do this), i was able to get the following DOM and XPath code to update an xml file.



<?php

$doc = new DOMDocument;
//$doc->preserveWhiteSpace = false;

//$doc->loadXML($xmlstr);
$doc->load('movies.xml'); // use if loading a separate xml file
$xpath = new DOMXPath($doc);

$query = '//movies/movie/title';
$query = $xpath->query($query);

// setting nodeValue replaces the element's text
$title_1 = $query->item(0);
$title_1->nodeValue = 'New Title 1';
file_put_contents('movies.xml', $doc->saveXML());

// query again to confirm the change
$query_1 = '//movies/movie/title';
$query_1 = $xpath->query($query_1);
$new_title = $query_1->item(0)->nodeValue;
echo $new_title;

?>


the starting movies.xml file, located in the same directory as my php file, looks like this:



<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
    <characters>
      <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
      </character>
    </characters>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
    <rating type='thumbs'>7</rating>
    <rating type='stars'>5</rating>
  </movie>
</movies>


“New Title 1” will replace “PHP: Behind the Parser”. if you uncomment preserveWhiteSpace, the indent formatting of movies.xml will be lost.
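for comparison, simplexml can make this particular change too: assigning to an existing element replaces its text, and asXML() with a file name saves the document. a minimal sketch, using a temporary file in place of movies.xml so it runs anywhere:

```php
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
  <movie>
    <title>PHP: Behind the Parser</title>
  </movie>
  <movie>
    <title>PHP and the DOM</title>
  </movie>
</movies>
XML;

// a temporary file stands in for movies.xml
$file = tempnam(sys_get_temp_dir(), 'movies');
file_put_contents($file, $xmlstr);

$xml = simplexml_load_file($file);
// assigning to an existing child element replaces its text
$xml->movie[0]->title = 'New Title 1';
// asXML() with a file name writes the document and returns true on success
$saved = $xml->asXML($file);

// read the file back to confirm the new title was written
$check = simplexml_load_file($file);
$new_title = (string) $check->movie[0]->title;
echo $new_title;   // New Title 1
unlink($file);
```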

the following code will create XML:


<?php
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;

$html = $dom->createElement ('html');
$html = $dom->appendChild ($html);

$head = $dom->createElement ('head');
$head = $html->appendChild ($head);

$title = $dom->createElement ('title');
$title = $head->appendChild ($title);

$title_text = $dom->createTextNode ('Web Page Title');
$title_text = $title->appendChild ($title_text);

$body = $dom->createElement ('body');
$body = $html->appendChild ($body);

$p = $dom->createElement('p');
$p = $body->appendChild ($p);

$text = 'This is text that can go into your paragraph';

$p_text = $dom->createTextNode ($text);
$p_text = $p->appendChild ($p_text);

echo $dom->saveXML();

?>

i’m currently trying to figure out how to create a file with this xml as its text.

i think it is related to:

DOMImplementation->createDocument()

but i haven’t found a working example yet.

How about…

/* PHP 5 and later */
$dom->save('/path/to/your/new/xml/file');
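for the DOMImplementation->createDocument() route mentioned above, a minimal sketch: createDocument() builds a document with its root element (and an optional doctype) already in place, and save() writes it out, returning the number of bytes written or false on failure. the file name in the temp directory is hypothetical:

```php
<?php
$imp = new DOMImplementation();
// a doctype with only a name; public and system ids are left out
$doctype = $imp->createDocumentType('html');
// createDocument() returns a DOMDocument whose <html> root already exists
$dom = $imp->createDocument(null, 'html', $doctype);
$dom->formatOutput = true;

$html = $dom->documentElement;
$body = $html->appendChild($dom->createElement('body'));
$body->appendChild($dom->createElement('p', 'saved to disk'));

// hypothetical location in the system temp directory
$path = sys_get_temp_dir() . '/dom_save_example.xml';
// save() returns the number of bytes written, or false on failure
$bytes = $dom->save($path);

$saved = file_get_contents($path);
unlink($path);
echo $saved;
```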

here is an example how to add attributes and attribute values:



<?PHP

$dom = new DOMDocument('1.0');
$dom->formatOutput = true;

$html = $dom->createElement ('html');
$html = $dom->appendChild ($html);

$head = $dom->createElement ('head');
$head = $html->appendChild ($head);

$title = $dom->createElement ('title');
$title = $head->appendChild ($title);

$title_text = $dom->createTextNode ('Web Page Title');
$title_text = $title->appendChild ($title_text);

$body = $dom->createElement ('body');
$body = $html->appendChild ($body);

$p = $dom->createElement('p', 'This is a new way to insert text into an element');
$p = $body->appendChild ($p);

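// create the attribute node, give it a value, and attach it to <p>
$attr_p = $dom->createAttribute('class');
$attr_p->value = 'className';
$p->setAttributeNode($attr_p);
// shortcut that does the same thing: $p->setAttribute('class', 'className');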

echo $dom->saveXML();

$dom->save('test_file.xml');

?>


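notice that i used a different method to put the text into the <p> tag (the second argument of createElement()). also notice that, with a relative name like 'test_file.xml', the file is saved relative to the current working directory, which is usually the directory of the running script.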
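once an attribute exists, it can be read, overwritten and removed with getAttribute(), setAttribute(), hasAttribute() and removeAttribute(). a minimal sketch:

```php
<?php
$dom = new DOMDocument('1.0');
$p = $dom->appendChild($dom->createElement('p', 'styled paragraph'));

// setAttribute() creates the attribute, or overwrites it if it already exists
$p->setAttribute('class', 'className');
$p->setAttribute('id', 'intro');
$p->setAttribute('class', 'otherClass');

$has_id = $p->hasAttribute('id');     // true
$class = $p->getAttribute('class');   // otherClass

// removeAttribute() deletes it; getAttribute() then returns an empty string
$p->removeAttribute('id');
$id_after = $p->getAttribute('id');

// $markup is <p class="otherClass">styled paragraph</p>
$markup = $dom->saveXML($p);
echo htmlspecialchars($markup);
```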

as auricle showed in the previous post, you can define a path and store the file in a different directory.

i’m facing a situation where i need to read in an xml file, determine which xml file it is (out of six different possible xml file structures), load the values into an array (including any blank values) and then take that array and put the values (including empty - “”) into a form.

is there a way to read an xml file and have all the values automatically put into an array?

does anyone have any suggestions that i may have missed?
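one possible approach, sketched here with a hypothetical helper named xml_to_array() (not a built-in): walk the simplexml tree recursively, turn each leaf element into its trimmed text ('' when empty), and map each child name to a list so repeated elements aren't lost. attributes are ignored, and the <form> xml is made up for illustration:

```php
<?php
// hypothetical form-style xml with blank values
$xmlstr = <<<XML
<?xml version='1.0'?>
<form>
  <name>Ms. Coder</name>
  <email></email>
  <phone/>
  <tag>php</tag>
  <tag>xml</tag>
</form>
XML;

// hypothetical helper: converts a SimpleXMLElement into nested arrays
function xml_to_array(SimpleXMLElement $node) {
    // a leaf element (no child elements) becomes its trimmed text, '' when empty
    if ($node->count() === 0) {
        return trim((string) $node);
    }
    $result = array();
    foreach ($node->children() as $name => $child) {
        // each child name maps to a list, so repeated elements (<tag>) are kept
        $result[$name][] = xml_to_array($child);
    }
    return $result;
}

$data = xml_to_array(simplexml_load_string($xmlstr));
print_r($data);
```

with this shape, a form field can be filled from something like $data['email'][0], which is '' rather than missing. the often-quoted json_decode(json_encode($xml), true) shortcut is shorter, but empty elements come back as empty arrays instead of '', which matters when blank values have to reach the form.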

also, how do i write a comment into the xml?

tia…

Wouldn’t you know that just from the file name?

On the subject of arrays, look at the user comments at http://www.php.net/DOM. I remember there were a couple about arrays there.

To add a comment:

$comment = $doc->createComment('your comment here');
$node->appendChild($comment);
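a fuller sketch of the comment instruction above: createComment() only builds the node, so it still has to be attached with appendChild() or insertBefore(). a comment can go inside an element or at the document level, outside the root:

```php
<?php
$doc = new DOMDocument;
$doc->loadXML('<movies><movie><title>PHP and the DOM</title></movie></movies>');
$root = $doc->documentElement;

// createComment() only builds the node; it still has to be attached
$comment = $doc->createComment(' last edited by hand ');
// put it first inside <movies>, ahead of the first <movie>
$root->insertBefore($comment, $root->firstChild);

// a comment can also sit at document level, before the root element
$doc->insertBefore($doc->createComment(' generated file '), $root);

$out = $doc->saveXML();
echo htmlspecialchars($out);
```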

i wasn’t clear: there is a single file that can take on six different formats. each format contains some data unique to it, so i check for that data and then display the form that matches the format.

i’ll check out the php.net notes for arrays.

thanks for the comment instruction.

once you’ve run the attribute example above, you should have an xml file named test_file.xml.

the following code will use the DOM and XPath to append a new <p> element to the <body> element. appendChild() adds the new element to the end of the body’s child nodes, so it will appear last in the list of <p> elements.



<?php

$xml_file = 'test_file.xml';

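$dom = new DOMDocument;
// drop the old indentation so formatOutput can re-indent the whole file
$dom->preserveWhiteSpace = false;

if (!$dom->load($xml_file)){
    exit('file not loaded');
}

$xpath = new DOMXPath($dom);

$q_body = '/html/body';
$body = $xpath->query($q_body);

$new_p = new DOMElement('p', 'New Paragraph');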
$add_p = $body->item(0)->appendChild($new_p);

$dom->formatOutput = true;
echo $dom->saveXML();

$dom->save('test_file.xml');

?>
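if the new paragraph should come first rather than last, insertBefore() does that, and inserting before a node's nextSibling places the new node right after it. a minimal sketch that builds a small page in memory instead of loading test_file.xml:

```php
<?php
$dom = new DOMDocument;
$dom->loadXML('<html><body><p>First</p><p>Second</p></body></html>');
$xpath = new DOMXPath($dom);

$body = $xpath->query('/html/body')->item(0);

// insertBefore() puts the new node ahead of the reference child, so it comes first
$new_p = $dom->createElement('p', 'New First Paragraph');
$body->insertBefore($new_p, $body->firstChild);

// to place a node right after another, insert it before that node's nextSibling;
// when nextSibling is null, insertBefore() appends, just like appendChild()
$after = $xpath->query('/html/body/p[2]')->item(0);
$body->insertBefore($dom->createElement('p', 'After First'), $after->nextSibling);

$texts = array();
foreach ($xpath->query('/html/body/p') as $p) {
    $texts[] = $p->nodeValue;
}
echo implode('<br />', $texts);
```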


bump… thread is useful for folks learning PHP5.1.x and DOM, XML.

anyone with XSLT knowledge is encouraged to post short tutorials similar to the ones already in this thread.

good luck!