DOM vs. SimpleXML - performance

PHP offers two good tools for XML parsing: DOM and SimpleXML. The second needs less code, and accessing element properties is easier.

But how do these extensions compare in performance and memory usage? Has anyone made a good benchmark of them?

If you search Google for “simplexml benchmark”, the first result is:

Processing Large XML Documents with PHP 5

That should answer your question.

It’d be easier to tell if the documentation for XML reading/parsing/writing in PHP didn’t suck. Right now people don’t even know how to pull a single, specific entry from an XML file without it throwing some kind of error because there are no explanations for what any of it means. Just one of several things the PHP developers get completely wrong. This is not how you make a scripting language, and this is a terrible example of how one might document something.

It’s not performance per se, but: In a previous project, we found that SimpleXML had a tendency to fail unpredictably, and decided that we needed to use DOM for reliability. That was a couple of years ago. It may have improved since then.

Personally, I don’t understand the need for SimpleXML. DOM may be verbose, but it’s a standard interface, so it’s the same thing you have in JavaScript and other platforms you might interact with. That makes it worth using for consistency, even if it’s a bit long-winded.

Oh, and if you are worried about performance, then you probably need neither DOM nor SimpleXML. They both work by parsing the whole document into memory, which is quite inefficient for large documents (large as in several megabytes). If you need to process large XML documents, you will get much better performance with the event-based XML parser: http://docs.php.net/manual/en/ref.xml.php
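
To make the event-based approach concrete, here is a minimal sketch using the xml_* functions from that extension. It counts elements as they stream past instead of building a tree. The `<item>` element name and the hard-coded chunks are assumptions for illustration only; real code would read the chunks from a file with fread().

```php
<?php
// Count <item> elements without building a document tree in memory.
$count = 0;

$parser = xml_parser_create();
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // Called for every opening tag.
    function ($parser, $name, $attrs) use (&$count) {
        if ($name === 'item') {
            $count++;
        }
    },
    // Called for every closing tag; nothing to do here.
    function ($parser, $name) {
    }
);

// Illustrative input, split at arbitrary points the way file reads would be.
$chunks = ['<items><item id="1"/>', '<item id="2"/><it', 'em id="3"/></items>'];

foreach ($chunks as $i => $chunk) {
    $isFinal = ($i === count($chunks) - 1);
    if (!xml_parse($parser, $chunk, $isFinal)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
}

echo $count, "\n";
```

Memory use stays roughly constant with this style because only the current chunk is held at a time. The trade-off is that you have to track any state you need (current element, nesting depth) yourself.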

There are lots of problems with parsing XHTML using PHP’s XML functions. I’ve just solved most of them. However, how do I get the value of an attribute with a namespace?

<input type="checkbox" id="something" f3:var="config.item" />
<?php
foreach(xpath('//input[@id="ms"]') as $item) echo $item['f3:var'];
?>

It doesn’t work. Neither does $f3->item['var'].

I wonder whether parsing the XML with preg_* and str_* wouldn’t be better, since I’m only creating a template system. XML parsing is only used to generate forms with IF conditions. Example output:

<input type="checkbox" id="something" <?php if($config['item']) echo 'checked="checked" ';?>/>
OR:
<?php if($config['item']) echo '<input type="checkbox" id="something" checked="checked" />'; else echo '<input type="checkbox" id="something" />'; ?>

There is only one new attribute for now (maybe more later): f3:var, which can be placed on <form> (applying to all checkboxes, radio buttons and selects) or on individual input / select elements.

Any advice? What is better for this purpose: XML in PHP, or preg_* / str_*?
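
On the namespaced attribute: in SimpleXML, plain array access like $item['f3:var'] only looks at attributes without a namespace. One approach that should work is to ask for the attributes of that namespace explicitly with attributes(). The sketch below assumes the f3 prefix is declared somewhere in the document; the URI http://example.com/f3 is made up for the example, since the snippet above doesn’t show one.

```php
<?php
// Hypothetical namespace URI; substitute whatever the template really declares.
const F3_NS = 'http://example.com/f3';

$xml = <<<XML
<form xmlns:f3="http://example.com/f3">
  <input type="checkbox" id="something" f3:var="config.item" />
</form>
XML;

$form = simplexml_load_string($xml);

$vars = [];
foreach ($form->xpath('//input[@id="something"]') as $item) {
    // attributes() with a namespace URI returns only attributes in that namespace.
    $attrs = $item->attributes(F3_NS);
    $vars[] = (string) $attrs['var'];
}

print_r($vars);
```

If you would rather look the attribute up by prefix, attributes('f3', true) does the same thing with the prefix instead of the URI.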

What problems? That’s not standard XHTML.

It’s not standard XHTML, but it’s part of a template file! And I wonder whether I should use DOM / SimpleXML or preg_* with str_*. Regular expressions are unavoidable anyway: I have to parse {var}, <!-- VAR -->, etc., and I can’t do that with XML functions. But maybe XML functions are easier and faster? I don’t know!
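
For the {var} placeholders specifically, a regex pass is a reasonable fit. Here is a rough sketch with preg_replace_callback(); the variable names and the template string are invented for the example, and the choice to leave unknown placeholders untouched is just one possible policy.

```php
<?php
// Hypothetical template data; names are illustrative only.
$vars = ['title' => 'Settings', 'user' => 'Anna'];
$template = '<h1>{title}</h1><p>Hello, {user}! {unknown}</p>';

$output = preg_replace_callback(
    '/\{(\w+)\}/',
    function (array $m) use ($vars) {
        // Escape known values for HTML; leave unknown placeholders as they are.
        return array_key_exists($m[1], $vars)
            ? htmlspecialchars($vars[$m[1]], ENT_QUOTES, 'UTF-8')
            : $m[0];
    },
    $template
);

echo $output, "\n";
```

This only covers flat substitution. Anything that depends on element structure, like the f3:var attribute on forms, is where an XML parser starts to pay off over regular expressions.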