xml_parser, attributes and nested [third level] elements

Hey SPF Developers -

We’re migrating from SimpleXML to the SAX parser to handle larger files, but we’re running into issues retrieving attributes from nested elements within our XML feed.

I’m able to access attributes using the code below, but rather than reading the attribute from one specific tag, my script seems to try on every tag. As a result, warnings are thrown, and if the same attribute name exists on multiple tags, it’s duplicated in our array.

Can anyone suggest a better way to get an element like Product->merchantListing->in_stock->type ?

function startElement($parser, $tag, $attributes) {
	// Tag names are matched as written in the feed (this assumes case folding is turned off).
	switch ($tag) {
		case 'product':
			$this->product = array('id' => $attributes['id'], 'manufacturerName' => '', 'merchantProduct' => '', 'price' => '');
			break;
		case 'manufacturerName':
		case 'merchantProduct':
			if (isset($attributes['mid'])) { echo $attributes['mid']; }
			// no break: falls through to 'price'
		case 'price':
			if ($this->product) { $this->product_elem = $tag; }
			break;
	}
}

XML Sample:

<product category_id="5" id="123455">
	<name>Some Amazing Product</name>
	<manufacturerPartNumber>ABCD</manufacturerPartNumber>
	<manufacturerName>Guys Who Make Stuff</manufacturerName>
	<merchantListing included="1"> 
		<merchantProduct mid="555555">
			<in_stock type="stock-1"></in_stock>
			<condition type="cond-0"></condition> 
			<price>19.89</price> 
		</merchantProduct>
	</merchantListing> 
</product> 

Thanks
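
If you do want to stay with the SAX parser, one common way to read an attribute from one specific nested tag is to keep a stack of the currently open elements and match on the full path instead of the bare tag name. The following is only a rough sketch, not tested against the real feed: the ProductHandler class, its property names, the <products> root element and the path strings are all made up for illustration.

```php
<?php
// Hypothetical handler class: tracks the open-element path relative to <product>.
class ProductHandler
{
    public $products = [];   // collected rows (a real run would process each row instead)
    private $path = [];      // stack of open element names
    private $product = null; // row currently being built
    private $text = '';      // character data of the current element

    public function startElement($parser, $tag, $attributes)
    {
        if ($tag === 'product') {
            $this->path = []; // restart the path at every <product>
            $this->product = ['id' => $attributes['id'] ?? null,
                              'mid' => null, 'in_stock' => null, 'price' => null];
        }
        $this->path[] = $tag;
        $this->text = '';
        // Match on the whole path, so only the intended tag is inspected.
        switch (implode('/', $this->path)) {
            case 'product/merchantListing/merchantProduct':
                $this->product['mid'] = $attributes['mid'] ?? null;
                break;
            case 'product/merchantListing/merchantProduct/in_stock':
                $this->product['in_stock'] = $attributes['type'] ?? null;
                break;
        }
    }

    public function characterData($parser, $data)
    {
        $this->text .= $data; // text may arrive in several pieces
    }

    public function endElement($parser, $tag)
    {
        $path = implode('/', $this->path);
        if ($path === 'product/merchantListing/merchantProduct/price') {
            $this->product['price'] = trim($this->text);
        } elseif ($path === 'product') {
            $this->products[] = $this->product;
            $this->product = null;
        }
        array_pop($this->path);
    }
}

// Small demo feed; the <products> root element is an assumption.
$xml = <<<XML
<products>
  <product category_id="5" id="123455">
    <manufacturerName>Guys Who Make Stuff</manufacturerName>
    <merchantListing included="1">
      <merchantProduct mid="555555">
        <in_stock type="stock-1"></in_stock>
        <condition type="cond-0"></condition>
        <price>19.89</price>
      </merchantProduct>
    </merchantListing>
  </product>
</products>
XML;

$handler = new ProductHandler();
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag names as written
xml_set_element_handler($parser, [$handler, 'startElement'], [$handler, 'endElement']);
xml_set_character_data_handler($parser, [$handler, 'characterData']);
// For a large file, call xml_parse() once per fread() chunk and pass true only on the last call.
xml_parse($parser, $xml, true);

print_r($handler->products);
```

Because each case names the full path, an attribute called type on <condition> is never confused with the one on <in_stock>, and the isset-style warnings go away since only the matching tag is inspected.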

How big is each of the <product> elements? If they’re similar to the sample that you gave, my suggestion would be to use XMLReader (a more up-to-date alternative to the parser that you’re using) to find each <product>, then use SimpleXML (or DOM) to work easily with that chunk of XML.

This approach keeps you working with what you know (SimpleXML) but only loads [hopefully] small chunks of XML into memory at any given time. Of course, this is all based on your <product> elements not being too large to work with.
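
Roughly, the idea looks like the sketch below. Treat it as a starting point under a few assumptions rather than a drop-in solution: the readProducts() helper name, the <products> root element and the temp-file demo are invented here, and the feed is assumed to hold all <product> elements as siblings.

```php
<?php
// Hypothetical helper: streams the file and yields one <product> at a time.
function readProducts(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Unable to open $path");
    }

    // Advance to the first <product> element.
    while ($reader->read() && $reader->name !== 'product') {
        // keep reading
    }

    while ($reader->name === 'product') {
        // Only this one chunk is turned into a SimpleXML object.
        yield new SimpleXMLElement($reader->readOuterXml());
        // Jump to the next sibling <product>, skipping the current subtree.
        $reader->next('product');
    }

    $reader->close();
}

// Demo feed written to a temp file; the <products> root element is an assumption.
$xml = <<<XML
<products>
  <product category_id="5" id="123455">
    <name>Some Amazing Product</name>
    <merchantListing included="1">
      <merchantProduct mid="555555">
        <in_stock type="stock-1"></in_stock>
        <condition type="cond-0"></condition>
        <price>19.89</price>
      </merchantProduct>
    </merchantListing>
  </product>
</products>
XML;
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, $xml);

$rows = [];
foreach (readProducts($path) as $product) {
    $listing = $product->merchantListing->merchantProduct;
    $rows[] = [
        'id'       => (string) $product['id'],
        'mid'      => (string) $listing['mid'],
        'in_stock' => (string) $listing->in_stock['type'],
        'price'    => (string) $listing->price,
    ];
}
unlink($path);

print_r($rows);
```

Inside the loop each $product is an ordinary SimpleXMLElement, so nested attributes such as the type on <in_stock> are read with the usual arrow and array syntax.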

The product chunks are tiny [less than 50 lines each]; it’s the number of them per file that causes issues, as some of the raw files are over a GB in size.

I’ll take a look at XMLReader and go from there. Sticking with SimpleXML would be… fantastic.

Thanks!

Well for anyone else stuck on a similar issue in the future…

I found a nice code example at http://stackoverflow.com/questions/1835177/how-to-use-xmlreader-in-php which, as Salathe pointed out, lets you get the benefit of reading the file progressively while still keeping the simplicity of SimpleXML. It took about 90 seconds to modify our script to make it all work.