Struggling With Dynamic Grouping

I have a list of product SKUs in the following format:
Brand Code (hyphen) Model Number (hyphen) Variant

So for example:
NIKE-SHOEA-WHITE
etc.

I have a query that attempts to get 5 of the most recent Model Numbers for a given brand.

So here’s some sample data:

sku                 date_added
NIKE-SHOEA-WHITE    2009-08-23
NIKE-SHOEA-BLACK    2009-08-23
NIKE-SHOEA-GREEN    2010-09-23
NIKE-SHOEA-BLUE     2010-09-23
NIKE-SHOEB-WHITE    2010-09-23
NIKE-SHOEB-BLACK    2010-09-23
NIKE-SHOEB-RED      2010-09-23
NIKE-SHOEB-PURPLE   2010-09-23

Here’s my current query (overly simplified):

SELECT p.sku AS product_sku
  FROM Products p
 WHERE p.active = 1
 GROUP
    BY SUBSTRING_INDEX(p.sku, '-', 2)
 ORDER
    BY p.date_added DESC
 LIMIT 5

My expected result would be:
NIKE-SHOEA-GREEN
NIKE-SHOEB-WHITE

My actual result is:
NIKE-SHOEA-BLACK
NIKE-SHOEB-BLACK

(notice how NIKE-SHOEA-BLACK is actually over a year older than NIKE-SHOEA-BLUE and NIKE-SHOEA-GREEN)

What appears to be happening is that the order by is sorting the results by date, and then the group by is applied. When the group by is applied (presumably for speed), each sub-group (meaning all NIKE-SHOEA) are then sorted by code (so since -BLACK comes first alphabetically, it is aways returned first regardless of the date_added field).

So in technically if I say the query returns the “5 most recent models”, the query is working, but what I am truly after is the “5 of the newest products, while showing no more than 1 of each model”.

I don’t want to select a bunch of products and filter the results via the application if I don’t have to. Can anyone think of a clever solution?

what is actually happening is completely the opposite

first, the GROUP BY takes all rows with the same grouping column, and extracts from them a single result row – one result row per group

then the ORDER BY sorts the result rows

your problem lies in the fact that you have the sku (which is ~not~ a grouping column) in the SELECT clause

so when the GROUP BY is collapsing all the rows with the same grouping column (e.g. NIKE-SHOEA), which value of sku should it give you for that group? well, they are all different, so it chooses one of them arbitrarily

read this for a full explanation: http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html

you might have to

that “showing no more than 1 of each model” requirement is the deal breaker

my advice is fetch twenty of the latest, loop through them, and display the first 5 distinct models

Makes sense (as always). Tweaked the application and everything seems good. Looking back at it, what my application was displaying before was (in some cases) completely wrong :frowning: