PHP 5: need something like innerHTML instead of nodeValue

I am using cURL and XPath to get data from an HTML page, and everything goes well until I encounter a node like this:


<td id="foo">
The first bit of Data I want
<br>The second bit of Data I want
<br>The third bit of Data I want</td>

So I cURL the page and set up my XPath like this:


$fooNode = $xpath->evaluate("/html/body//td[@id='foo']");
$fooString = $fooNode->item(0)->nodeValue;
echo $fooString;

which gives me something like:
“The first bit of Data I wantThe second bit of Data I wantThe third bit of Data I want”

as a result, with no way to separate the data. (Before you ask: the data above is just an example, so I can’t explode the string on “The”.)

What I would like instead is some way to return the node with its markup intact, sort of like innerHTML in JS, so I can explode it into an array on the “<br>” tag, like:

“The first bit of Data I want<br>The second bit of Data I want<br>The third bit of Data I want”

Is there any way to save a node as a string with the <br> tags intact?

Forgive me if this has been answered before, I did look around the site and Googled all last night to no avail.

$fooBits = explode("<br>", $fooString);
print_r($fooBits);

As I stated, that is exactly what I would do if nodeValue returned a string with the <br>s intact, but it doesn’t: nodeValue strips out all the tags.
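
For what it's worth, the difference is easy to see side by side. The following is only a minimal sketch: the sample markup is made up to mirror the question, and it assumes a well-formed snippet loaded with loadXML(). DOMDocument::saveXML() accepts a node argument and serializes just that node, tags included, whereas nodeValue concatenates the descendant text.

```php
<?php
// Hypothetical sample markup mirroring the <td id="foo"> from the question.
$dom = new DOMDocument();
$dom->loadXML('<table><tr><td id="foo">one<br/>two<br/>three</td></tr></table>');

$xpath = new DOMXPath($dom);
$td = $xpath->query("//td[@id='foo']")->item(0);

// nodeValue joins the text of all descendants and drops the tags.
$textOnly = $td->nodeValue;

// saveXML() with a node argument serializes that node with its markup intact.
$outer = $dom->saveXML($td);

echo $textOnly, "\n", $outer, "\n";
```

Note that this gives the outer markup (the td element itself is included), so it is closer to outerHTML than innerHTML.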

I’m not familiar with the “evaluate” method, but if it returns SimpleXMLElement objects, the same as the SimpleXML object’s xpath method does, you can use the “asXML()” method of the SimpleXMLElement object.

http://www.php.net/manual/function.simplexml-element-asXML.php

echo $fooNode[0]->asXML();

The following isn’t concrete (more a basic proof of concept; in other words, untested), but you could probably get at the “innerXML” (or innerHTML, if you adapted the following function slightly) like this:

innerXML function


function innerXML($node)
{
	$doc  = $node->ownerDocument;
	$frag = $doc->createDocumentFragment();
	foreach ($node->childNodes as $child)
	{
		$frag->appendChild($child->cloneNode(TRUE));
	}
	return $doc->saveXML($frag);
}

Quick example


$dom = new DOMDocument();
$dom->loadXML('
<table>
<tr>
	<td id="foo"> 
		The first bit of Data I want
		<br/>The second bit of Data I want
		<br/>The third bit of Data I want
	</td>
</tr>
</table>
');

$node = $dom->getElementsByTagName('td')->item(0);
echo innerXML($node);

I tried both asXML() and the innerXML($node) function, but neither worked. I had tried asXML() before but abandoned it, since it seems to be just for SimpleXML objects and not DOMDocument objects. Would love to be proven wrong though!

I think the createDocumentFragment() path looks the most promising, but I am having trouble finding documentation for it or useful examples of it in a tutorial. Any suggestions?
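
In case a small example helps: a document fragment is a lightweight container that belongs to a document without being part of its tree. The sketch below is illustrative only (the root/item markup is invented for the example); it fills a fragment with DOMDocumentFragment::appendXML() and then appends it, at which point the fragment's children move into the tree.

```php
<?php
// Invented sample document, just to have something to append to.
$dom = new DOMDocument();
$dom->loadXML('<root><item>a</item><item>b</item></root>');

// The fragment is owned by $dom but is not attached anywhere yet.
$frag = $dom->createDocumentFragment();

// appendXML() parses a string of well-formed XML into the fragment.
$frag->appendXML('<item>c</item>');

// Appending the fragment moves its children into the target node.
$dom->documentElement->appendChild($frag);

$xml = $dom->saveXML($dom->documentElement);
echo $xml, "\n";
```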

Thanks again for all your help. I promise to post the solution once I find it, as I think many people would like to know how to do this.

Jumped the gun a bit: your function, Salathe, seems to be working.

I used this code instead:


$dom = new DOMDocument();
$dom->loadXML('
<html>
<body>
<table>
<tr>
	<td id="foo">
		The first bit of Data I want
		<br/>The second bit of Data I want
		<br/>The third bit of Data I want
	</td>
</tr>
</table>
</body>
</html>

');
$xpath = new DOMXPath($dom);
$node = $xpath->evaluate("/html/body//td[@id='foo' ]");
$nameObject = innerXML($node->item(0));
echo $nameObject;

gives this result:


<>The first bit of Data I want
The second bit of Data I want
The third bit of Data I want

My goal is to end up with three distinct strings:

$firstString = "The first bit of Data I want";
$secondString = "The second bit of Data I want";
$thirdString = "The third bit of Data I want";

What type of data does innerXML() return, and how do I get at the three bits of data in it? Thanks again, I think I am really close here!
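
One way to get at the three strings without any string splitting is to walk the node's children and keep only the text nodes. This is just a sketch under a few assumptions: childTexts() is a hypothetical helper name, the sample markup is loaded with loadXML(), and the pieces of data are direct text children of the td separated by br elements.

```php
<?php
// Hypothetical helper: collect the trimmed text of each direct text-node child.
function childTexts(DOMNode $node)
{
    $texts = array();
    foreach ($node->childNodes as $child) {
        // Skip element children such as <br/>; keep only text nodes.
        if ($child->nodeType === XML_TEXT_NODE) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $texts[] = $text;
            }
        }
    }
    return $texts;
}

$dom = new DOMDocument();
$dom->loadXML('<td id="foo">
    The first bit of Data I want
    <br/>The second bit of Data I want
    <br/>The third bit of Data I want
</td>');

$parts = childTexts($dom->documentElement);
list($firstString, $secondString, $thirdString) = $parts;
print_r($parts);
```

Because it never looks at the serialized markup, this does not care whether the tags are written as <br>, <br/> or <br />.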

Got it working. I did some more reading and realized saveXML() outputs a string, so I viewed the source and saw that it was using <br /> instead of <br> tags. I exploded on that and was home free. Thanks for all your help! Here is the generalized code I used to get the results.


function innerXML($node)
{
    // Clone each child into a fragment so the original tree is left untouched.
    $doc  = $node->ownerDocument;
    $frag = $doc->createDocumentFragment();
    foreach ($node->childNodes as $child)
    {
        $frag->appendChild($child->cloneNode(TRUE));
    }
    // saveXML() returns the fragment's markup as a string.
    return $doc->saveXML($frag);
}

$dom = new DOMDocument();
$dom->loadXML('
<html>
<body>
<table>
<tr>
    <td id="foo">
        The first bit of Data I want
        <br />The second bit of Data I want
        <br />The third bit of Data I want
    </td>
</tr>
</table>
</body>
</html>
');

$xpath = new DOMXPath($dom);
$node = $xpath->evaluate("/html/body//td[@id='foo' ]");

$dataString = innerXML($node->item(0));
// saveXML() may write the tag as <br/> or <br />, so normalize before exploding.
$dataArr = explode("<br />", str_replace("<br/>", "<br />", $dataString));

$dataUno = $dataArr[0];
$dataDos = $dataArr[1];
$dataTres = $dataArr[2];

echo "firstdata = $dataUno<br />seconddata = $dataDos<br />thirddata = $dataTres<br />";

which yields:


firstdata = The first bit of Data I want
seconddata = The second bit of Data I want
thirddata = The third bit of Data I want


Thanks again and hope this helps someone else!
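
One caveat with the solution above: loadXML() needs well-formed markup, and pages fetched with cURL are often plain HTML with bare <br> tags. A rough sketch of the same idea for that case follows. It assumes PHP 5.3.6 or later (where saveHTML() accepts a node argument), innerHtmlOf() is a hypothetical helper name, and the sample page is invented.

```php
<?php
// Hypothetical helper: serialize each child of $node as HTML and join the pieces.
function innerHtmlOf(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument requires PHP 5.3.6+.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Invented sample page using bare <br> tags, as real-world HTML often does.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td id="foo">one<br>two<br>three</td></tr></table></body></html>');

$xpath = new DOMXPath($dom);
$td = $xpath->query("//td[@id='foo']")->item(0);

// The HTML serializer writes the tag as <br>, so explode on that form.
$parts = array_map('trim', explode('<br>', innerHtmlOf($td)));
print_r($parts);
```

For messy real pages, loadHTML() may emit parser warnings; calling libxml_use_internal_errors(true) beforehand is one way to keep them quiet.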

This would work also…

<?php
function getNodeInnerHTML(DOMNode $oNode)
{
    // Copy each child into a fresh document and serialize that as HTML.
    $oDom = new DOMDocument();
    foreach($oNode->childNodes as $oChild)
    {
        $oDom->appendChild($oDom->importNode($oChild, true));
    }
    return $oDom->saveHTML();
}
?>