Trying to optimize large query

I’ve got this large query that’s bogging things down on my site and I think I’ve exhausted my optimization abilities on it!

What I was trying to achieve was to fetch the products for a category, along with ALL of the information that is required for the on-page display, so as to get it all in one and avoid any iterative queries. One-to-many relationships were resolved by liberal use of GROUP_CONCAT. The goal was for overall speed but I’m not sure I’ve managed that!

The query is below, with the EXPLAIN statement below that. Do you think there’s any improvements I can make? Or am I trying to do too much in this one query? It takes about 4 seconds usually.

All the tables are MyISAM (a couple of of them have FULLTEXT indexes). I’ve tried changing them to INNODB but didn’t see any performance increase. In terms of table size the products table has about 100,000 rows, a similar amount for the images table. They’re the main big ones.

Not sure if the query and the explain are sufficient to allow understanding of what I’m doing. Let me know if I need to post more info.


SELECT SQL_CALC_FOUND_ROWS
			p.productID, p.prodName, p.price,
			GROUP_CONCAT(DISTINCT pic.fileName ORDER BY pic.index) pictureList,
			COUNT(DISTINCT pic.picID) pic_count,
			GROUP_CONCAT(DISTINCT 'caption_',pic.index,':',pic.caption ORDER BY pic.index) captionList,
			s.deliveryTime,
			tax.taxRate,
			COUNT(v.videoID) videoCount,
			COUNT(DISTINCT review.reviewId) AS review_count, AVG(review.rating) AS avg_review,
			
			GROUP_CONCAT(DISTINCT 'pid:',freeProd.productID,';name:',freeProd.name,';price:',freeProd.price, ';image:',freeProductPic.fileName,';' SEPARATOR '#') freeProds, 	
			GROUP_CONCAT(DISTINCT doc.docFile) docFiles
			
			FROM
			products AS p
			JOIN product_category AS pcat ON p.productID=pcat.productID
			LEFT JOIN tax ON p.taxCode=tax.taxCode
			LEFT JOIN productSupplier AS s ON p.supplierID=s.supplierID
			LEFT JOIN images AS pic ON p.productID=pic.productID
			LEFT JOIN videos AS v ON p.productID=v.productID
			
			LEFT JOIN productReviews AS rev ON p.productID=rev.productID AND review.status=1

			LEFT JOIN freeProduct AS fp ON p.productID = fp.productID
			LEFT JOIN tProducts AS freeProd ON fp.freeProductID=freeProd.productID
			LEFT JOIN images AS freeProductPic ON freeProd.productID=freeProductPic.productID AND freeProductPic.index=1

			LEFT JOIN product_productDocuments AS pdoc ON p.productID=pdoc.productID
			LEFT JOIN productDocuments AS doc ON pdoc.docID=doc.docID
			
			WHERE
				p.isForSale=1
				AND
				pcat.catId=6615
			GROUP BY p.productID
		
			ORDER BY price ASC
			LIMIT 0, 100



id  select_type         table      type    possible_keys                           key            key_len  ref                          rows  Extra
1   PRIMARY             pcat       ref     unique_index,catId,productID            unique_index   4        const                        35    Using index; Using temporary; Using filesort
1   PRIMARY             p          eq_ref  PRIMARY,isForSale,prodID_isForSale      PRIMARY        4        mydb.pcat.productID       1     Using where
1   PRIMARY             tax        eq_ref  PRIMARY                                 PRIMARY        7        mydb.p.taxCode            1
1   PRIMARY             s          ref     supplierId                              supplierId     13       mydb.p.supplierID         1
1   PRIMARY             pic        ref     productID,prodID_index                  productID      4        mydb.p.productID          1
1   PRIMARY             v          ref     productID                               productID      5        mydb.p.productID          3
1   PRIMARY             review     ref     status,productID,prodID_status          prodID_status  7        mydb.pcat.productID,const 5
1   PRIMARY             fp         ref     unique_index,productID                  unique_index   8        mydb.p.productID          1
1   PRIMARY             freeProd   eq_ref  PRIMARY,prodID_isForSale                PRIMARY        4        mydb.fp.freeProductID     1
1   PRIMARY             freePic    ref     relProdId,prodID_index                  productID      4        mydb.freeProd.productID   1
1   PRIMARY             pdoc       ref     unique_index,productID                  unique_index   9        mydb.p.productID          1     Using index
1   PRIMARY             doc        eq_ref  PRIMARY                                 PRIMARY        4        mydb.pdoc.docID           1

you realize, of course, why you have to use DISTINCT in all your GROUP_CONCATs, right?

it’s because you have massive cross join effects produced by the various one-to-many relationships compounding each other

to see what i mean, just remove the GROUP BY clause (and of course take off the GROUP_CONCAT functions, but leave a column in the SELECT clause so that table is still represented) to see how many intermediate rows your joins produce

then i’ll explain how to change the query

Yup, there are 54436 rows in the intermediate set. The final result is 37 rows. That’s a lot of cross-joining.

and that’s where the processing time goes, to GROUP_CONCAT all the distinct values, since each DISTINCT requires a sort

okay, let’s reconstruct the query

i always start the FROM clause with the “driving” table, the table that drives the logic of the query, i.e. the table that has the important WHERE criterion

in this case you want all products for a specific category, so product_category is the driving table


  FROM product_category AS pcat
INNER
  JOIN products AS p 
    ON p.productID = pcat.productID
   AND p.isForSale = 1
 WHERE pcat.catId = 6615

note how the only condition left in the WHERE clause is the driving condition, the other one has been moved up to the join’s ON clause

now i start joining all the many-to-one tables using inner joins


  FROM product_category AS pcat
INNER
  JOIN products AS p 
    ON p.productID = pcat.productID
   AND p.isForSale = 1
[COLOR="#0000FF"]INNER
  JOIN tax 
    ON tax.taxCode = p.taxCode[/COLOR]
 WHERE pcat.catId = 6615

note it’s an INNER JOIN because you wouldn’t have a tax.taxcode for a tax.taxcode that doesn’t exist (except if p.taxcode can be NULL, in which case, yes, it would have to be a LEFT OUTER JOIN)

i’m going to skip a couple of tables and do the videos and reviews

these are one-to-many relationships (as opposed to the many-to-one relationship we saw with the tax table) so these get “pushed down” into subqueries


  FROM product_category AS pcat
INNER
  JOIN products AS p 
    ON p.productID = pcat.productID
   AND p.isForSale = 1
INNER
  JOIN tax 
    ON tax.taxCode = p.taxCode
LEFT OUTER 
  JOIN [COLOR="#0000FF"]( SELECT productID
              , COUNT(*) AS videoCount
           FROM videos 
         GROUP
             BY productID ) AS v 
    ON v.productID = p.productID[/COLOR]
LEFT OUTER 
  JOIN [COLOR="#FF0000"]( SELECT productID
              , COUNT(*) AS review_count
              , AVG(rating) AS avg_review
           FROM productReviews
          WHERE status = 1
         GROUP
             BY productID ) AS rev 
    ON rev.productID = p.productID[/COLOR]
 WHERE pcat.catId = 6615

so the strategy is to do the groupings in the subqueries

the videos subquery was easy, but the reviews subquery was a bit confusing, as you used “rev” as the table alias but “review” elsewhere (this would’ve been a syntax error in the query you posted), but i wanted to show how more than one aggregate function is done, and also the use of the WHERE clause in the subquery

make sense so far?

so the final main query does not have a GROUP BY clause, all grouping is done in subqueries

think you can finish the rest of the query?

if you have trouble, post how far you got…

Thanks for your help. As always, I realise there’s so much more to learn!

I get what you’re saying about keeping the GROUP BYs in the subqueries. However I have re-written half of the query and it’s now running slower.

Below is what I’ve written so far, and I’ve left out the remaining parts until I get my head around this bit. Is the problem that the subqueries are returning too large amounts of data, and matching it against the “driving” table in an un-indexed way?


SELECT SQL_CALC_FOUND_ROWS
	p.productID, p.prodName, p.price,
	s.deliveryTime,
	tax.taxRate,

	pic.pic_count, pic.picture_list, pic.captionList			
	, v.videoCount
	, review.review_count, review.avg_review

			
	FROM
	product_category AS pcat
	INNER JOIN products AS p ON p.productID=pcat.productID AND p.isForSale=1
	INNER JOIN tax ON p.taxCode=tax.taxCode
	INNER JOIN productSupplier AS s ON p.supplierID=s.supplierID
	
	LEFT OUTER JOIN
	(
		SELECT productID,
			COUNT(DISTINCT pic.picID) pic_count,
			GROUP_CONCAT(DISTINCT pic.fileName ORDER BY pic.ordering) picture_list,
			GROUP_CONCAT(DISTINCT 'caption_',pic.index,':',pic.caption ORDER BY pic.ordering) captionList
		FROM images pic
		GROUP
			BY pic.relProdID
		#THIS SUBQUERY RETURNS 84000 ROWS
	) AS pic
		ON pic.productID=p.productID
		
	LEFT OUTER JOIN
	(
		SELECT vid.productID
		, COUNT(*) AS videoCount
		FROM videos vid
		GROUP BY vid.productID
		#THIS SUBQUERY RETURNS 400 ROWS
	) AS v
		ON v.productID=p.productID
		
	LEFT OUTER JOIN
	(
		SELECT review.productID
		, COUNT(DISTINCT review.reviewId) AS review_count
		, AVG(review.rating) AS avg_review
		FROM productReviews review
		WHERE review.status=1
		GROUP BY review.productID
		#THIS SUBQUERY RETURNS 2500 ROWS
	) AS review
		ON review.productID=p.productID
	
	WHERE
		pcat.catId=6615
	
	ORDER BY price ASC



id  select_type  table       type    possible_keys                              key           key_len  ref                    rows   Extra
1   PRIMARY      pth         ref     unique_index,catId,productID               unique_index  4        const                  35     Using index; Using temporary; Using filesort
1   PRIMARY      p           eq_ref  PRIMARY,suppid,isForSake,prodID_isForSale  PRIMARY       4        mydb.pth.productID     1      Using where
1   PRIMARY      s           ref     supplierId                                 supplierId    13       mydb.p.supplierID      1      Using where; Using index
1   PRIMARY      tax         eq_ref  PRIMARY                                    PRIMARY       7        mydb.p.taxCode         1      Using index
1   PRIMARY      <derived2>  ALL                                                                                              83498
1   PRIMARY      <derived3>  ALL                                                                                              372
1   PRIMARY      <derived4>  ALL                                                                                              2243
4   DERIVED      review      ALL     status                                                                                   14033  Using filesort
3   DERIVED      vid         index                                              productID     5                               1166   Using index
2   DERIVED      pic         ALL                                                                                              89994  Using filesort

i’m sorry, that EXPLAIN is just a little bit beyond my skill set

if you’re really only after (approx) 37 products at a time, perhaps you could try breaking up the query and running each of the one-to-many relationship queries separately, using WHERE product_id IN ( list of ids ) where the list of product id values is obtained by the first query

I suspected that I’d have to re-do the process using separate queries, but was trying to avoid it! But it’s probably the best way. There was a time I thought that 1 query would be the best way, but you live and learn.
Thanks for your help, it’s appreciated, as always.