Parse XML that Contains CDATA XML

I’m attempting to parse some XML using PHP and normally would just use simplexml_load_string() to read the data. However, the XML that I need to parse includes CDATA and another embedded XML document.

<?xml version="1.0" encoding="utf-8"?>
<webRequest>
    <id>160810</id>
    <request>
        <merchantShortName>ReServe</merchantShortName>
        <serviceName>reservationManagementServices</serviceName>
        <actionName>diningResListGet</actionName>
    </request>
    <authentication>
        <username>lynn</username>
        <password>lynn</password>
    </authentication>
    <content>
    <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
        <diningResListGetRequest>
            <dateRangeFilter>
                <fromDate>2015-10-15</fromDate>
                <toDate>2015-10-15</toDate>
            </dateRangeFilter>
            <siteNameFilter>
                <matchCriterion>EqualTo</matchCriterion>
                <stringValue>Frontier Vineyards</stringValue>
            </siteNameFilter>
            <maxReturned>50</maxReturned>
            <servicePeriodFilter>Dinner</servicePeriodFilter>
            <modifiedDateTimeFilter>
                <fromDateTime>2012-04-01T12:00:00</fromDateTime>
            </modifiedDateTimeFilter>
        </diningResListGetRequest>
    ]]></content>
</webRequest>

When I attempt to parse this XML using simplexml_load_string() I receive the error: Warning: simplexml_load_string(): Entity: line 2: parser error : XML declaration allowed only at the start of the document.

What’s the trick to getting at the data contained in the embedded XML document using PHP? I’m likely not fully understanding what I’m looking at so my Google searches haven’t turned up anything useful. Not even sure what that second embedded XML document is called. Any help is appreciated!

I’m no expert on this, but according to this page (http://php.chinaunix.net/manual/sl/function.simplexml-load-string.php, look in the comments from around June 2008) it seems the contents of the file may need to be escaped before calling simplexml_load_string() if CDATA is present. There is also talk of a LIBXML_NOCDATA option for that function, though from what I read it might make things worse rather than better.
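
For what it’s worth, here is a small sketch of what that libxml option does (its full name in PHP is LIBXML_NOCDATA). The sample document and variable names are made up for illustration. As far as I can tell, the flag only merges CDATA sections into ordinary text nodes, so reading the element as a string should give the same text either way; the difference shows up when the document is serialized again.

```php
<?php
// Hypothetical, shortened sample document for illustration only.
$sample = '<?xml version="1.0" encoding="utf-8"?>'
    . '<webRequest><content><![CDATA[<inner><value>42</value></inner>]]></content></webRequest>';

// Without the flag: the CDATA section stays a CDATA node in the tree,
// but casting the element to a string still returns its text.
$plain = simplexml_load_string($sample);
$withoutFlag = (string) $plain->content;

// With LIBXML_NOCDATA: the CDATA section is merged into a plain text node.
$merged = simplexml_load_string($sample, 'SimpleXMLElement', LIBXML_NOCDATA);
$withFlag = (string) $merged->content;

// Compare the text read both ways.
var_dump($withoutFlag === $withFlag);

// Serializing shows the difference: CDATA markers versus escaped text.
echo $plain->asXML(), "\n";
echo $merged->asXML(), "\n";
```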

SimpleXML has always been the quicker, less fully featured parser; use DOM instead. Here’s an example of how to use it with CDATA: http://stackoverflow.com/questions/6674322/how-to-get-values-inside-cdatavalues-using-php-dom

Thanks guys, that helps a bit. The issue I’m running into now regardless of which parsing tool I use is the second XML declaration within the CDATA element. The parser is puking on that section.

Warning: simplexml_load_string(): Entity: line 2: parser error : XML declaration allowed only at the start of the document in /xml-parse.php on line 19
Warning: simplexml_load_string(): <?xml version="1.0" encoding="utf-8"?> in /xml-parse.php on line 19

This is continuing to use simplexml_load_string(). I received the same error when using DOMDocument().

My next thought was to simply strip out the CDATA and additional XML declaration from the XML given that it’s useless to me in terms of parsing the data:

$rawXML = str_replace('<![CDATA[<?xml version="1.0" encoding="UTF-8" ?>', "", $rawXML);
$rawXML = str_replace(']]>', "", $rawXML);

My resulting XML document looks as such:

<?xml version="1.0" encoding="utf-8"?>
<webRequest>
    <id>160810</id>
    <request>
        <merchantShortName>ReServe</merchantShortName>
        <serviceName>reservationManagementServices</serviceName>
        <actionName>diningResListGet</actionName>
    </request>
    <authentication>
        <username>lynn</username>
        <password>lynn</password>
    </authentication>
    <content>
    
        <diningResListGetRequest>
            <dateRangeFilter>
                <fromDate>2015-10-15</fromDate>
                <toDate>2015-10-15</toDate>
            </dateRangeFilter>
            <siteNameFilter>
                <matchCriterion>EqualTo</matchCriterion>
                <stringValue>Frontier Vineyards</stringValue>
            </siteNameFilter>
            <maxReturned>50</maxReturned>
            <servicePeriodFilter>Dinner</servicePeriodFilter>
            <modifiedDateTimeFilter>
                <fromDateTime>2012-04-01T12:00:00</fromDateTime>
            </modifiedDateTimeFilter>
        </diningResListGetRequest>
    </content>
</webRequest>

Everything looks okay to me here, but I’m still receiving that error about the second declaration. I double and triple checked that I’m not feeding the original XML into the simplexml_load_string() but rather am feeding the replaced variable. Full code is below in case I’m missing something.

$rawXML = file_get_contents("php://input");
$rawXML = str_replace('<![CDATA[<?xml version="1.0" encoding="UTF-8" ?>', "", $rawXML);
$rawXML = str_replace(']]>', "", $rawXML);
$xml = simplexml_load_string($rawXML);

Any additional thoughts on this error?

Will any of these suffice?


<?php

$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<webRequest>
    <id>160810</id>
    <request>
        <merchantShortName>ReServe</merchantShortName>
        <serviceName>reservationManagementServices</serviceName>
        <actionName>diningResListGet</actionName>
    </request>
    <authentication>
        <username>lynn</username>
        <password>lynn</password>
    </authentication>
    <content>
    <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
        <diningResListGetRequest>
            <dateRangeFilter>
                <fromDate>2015-10-15</fromDate>
                <toDate>2015-10-15</toDate>
            </dateRangeFilter>
            <siteNameFilter>
                <matchCriterion>EqualTo</matchCriterion>
                <stringValue>Frontier Vineyards</stringValue>
            </siteNameFilter>
            <maxReturned>50</maxReturned>
            <servicePeriodFilter>Dinner</servicePeriodFilter>
            <modifiedDateTimeFilter>
                <fromDateTime>2012-04-01T12:00:00</fromDateTime>
            </modifiedDateTimeFilter>
        </diningResListGetRequest>
    ]]></content>
</webRequest>
XML;


$DOMDocument = new DOMDocument();
$DOMDocument->loadXML( $xml );

$DOMNodeList = $DOMDocument->getElementsByTagName( 'content' );

echo $DOMNodeList->item( 0 )->textContent;
echo "\n------------------------------------\n";

$DOMXPath         = new DOMXPath( $DOMDocument );
$xpathDomNodeList = $DOMXPath->query( '//webRequest/content' );
echo $xpathDomNodeList->item( 0 )->textContent;

echo "\n-------------end of dom-----------------------\n";

$simpleXMLElement = new SimpleXMLElement( $xml );
$simpleXMLXPathedElements = $simpleXMLElement->xpath( '//webRequest/content' );
echo (string)$simpleXMLXPathedElements[ 0 ];

// Actually, this just works with no special treatment (string replacements, etc.)
echo $simpleXMLElement->content;
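
If the goal is to read values out of the embedded document rather than just print it, one option is to treat the text of content as a second document and parse it on its own. This is only a sketch, assuming the payload inside the CDATA is well-formed XML; the shortened sample request and the variable names ($outerXml, $innerXml, $inner) are made up for illustration.

```php
<?php
// Shortened, hypothetical request for illustration; the real payload has more fields.
$outerXml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<webRequest>
    <id>160810</id>
    <content>
    <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
        <diningResListGetRequest>
            <maxReturned>50</maxReturned>
            <servicePeriodFilter>Dinner</servicePeriodFilter>
        </diningResListGetRequest>
    ]]></content>
</webRequest>
XML;

// Step 1: parse the outer envelope. To this parser the CDATA payload is plain text.
$outer = simplexml_load_string(trim($outerXml));

// Step 2: pull the payload out as a string and trim it, because the text
// begins with whitespace and an XML declaration must be the very first thing.
$innerXml = trim((string) $outer->content);

// Step 3: parse the payload as a document of its own.
$inner = simplexml_load_string($innerXml);

echo (string) $inner->maxReturned, "\n";
echo (string) $inner->servicePeriodFilter, "\n";
```

That way nothing has to be stripped out with str_replace(), and the outer and inner documents stay separate.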

Quick question: are you trying to merge the two documents into one so that everything is parsed in a single pass, given that the content document is a separate one? They can be merged, and your code works fine for that; here is your example in isolation.


$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<webRequest>
    <id>160810</id>
    <request>
        <merchantShortName>ReServe</merchantShortName>
        <serviceName>reservationManagementServices</serviceName>
        <actionName>diningResListGet</actionName>
    </request>
    <authentication>
        <username>lynn</username>
        <password>lynn</password>
    </authentication>
    <content>
    <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
        <diningResListGetRequest>
            <dateRangeFilter>
                <fromDate>2015-10-15</fromDate>
                <toDate>2015-10-15</toDate>
            </dateRangeFilter>
            <siteNameFilter>
                <matchCriterion>EqualTo</matchCriterion>
                <stringValue>Frontier Vineyards</stringValue>
            </siteNameFilter>
            <maxReturned>50</maxReturned>
            <servicePeriodFilter>Dinner</servicePeriodFilter>
            <modifiedDateTimeFilter>
                <fromDateTime>2012-04-01T12:00:00</fromDateTime>
            </modifiedDateTimeFilter>
        </diningResListGetRequest>
    ]]></content>
</webRequest>
XML;



$rawXML = str_replace('<![CDATA[<?xml version="1.0" encoding="UTF-8" ?>', "", $xml);
$rawXML = str_replace(']]>', "", $rawXML);
$xml = simplexml_load_string($rawXML);

echo( $xml->saveXML() );

If I add a blank line at the start of the XML, I get the same error, “simplexml_load_string(): Entity: line 2: parser error : XML declaration allowed only at the”, so it may be worth trimming the XML.


$xml = <<<XML

<?xml version="1.0" encoding="utf-8"?>
<webRequest>
    <id>160810</id>
    <request>
        <merchantShortName>ReServe</merchantShortName>
        <serviceName>reservationManagementServices</serviceName>
        <actionName>diningResListGet</actionName>
    </request>
    <authentication>
        <username>lynn</username>
        <password>lynn</password>
    </authentication>
    <content>
    <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
        <diningResListGetRequest>
            <dateRangeFilter>
                <fromDate>2015-10-15</fromDate>
                <toDate>2015-10-15</toDate>
            </dateRangeFilter>
            <siteNameFilter>
                <matchCriterion>EqualTo</matchCriterion>
                <stringValue>Frontier Vineyards</stringValue>
            </siteNameFilter>
            <maxReturned>50</maxReturned>
            <servicePeriodFilter>Dinner</servicePeriodFilter>
            <modifiedDateTimeFilter>
                <fromDateTime>2012-04-01T12:00:00</fromDateTime>
            </modifiedDateTimeFilter>
        </diningResListGetRequest>
    ]]></content>
</webRequest>
XML;



$rawXML = str_replace('<![CDATA[<?xml version="1.0" encoding="UTF-8" ?>', "", $xml);
$rawXML = str_replace(']]>', "", $rawXML);
$xml = simplexml_load_string($rawXML);

echo( $xml->saveXML() );

Seriously? Doing a trim() on $rawXML solved the issue. Thanks for the help!

::stops banging head against wall::
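
For anyone landing here later, a minimal sketch of the fix, assuming the request body arrives with stray whitespace before the XML declaration. The sample string stands in for what would normally come from php://input, and the variable names are made up for illustration. Collecting the libxml errors instead of letting them surface as warnings makes the cause easier to spot.

```php
<?php
// Hypothetical input: a stray newline before the XML declaration,
// standing in for a request body read from php://input.
$rawXML = "\n" . '<?xml version="1.0" encoding="utf-8"?><webRequest><id>160810</id></webRequest>';

// Collect libxml errors in memory instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// First attempt: parse the string exactly as received.
$untrimmed = simplexml_load_string($rawXML);

// Gather whatever messages libxml recorded for the failed attempt.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();

// Second attempt: strip leading and trailing whitespace first.
$trimmed = simplexml_load_string(trim($rawXML));

var_dump($untrimmed === false);
print_r($messages);
echo (string) $trimmed->id, "\n";
```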