SQL query: using MAX and GROUP BY with other fields

ajmirsky · March 21, 2002, 11:34pm

Given these entries in my table:

id date value
4 2002-1-1 zero
4 2002-2-26 first
4 2002-2-28 second

I am trying to write a SQL query to get the value at the maximum date. So I first tried this query and got:

select MAX(date),value from MY_TBL group by id

2002-2-28 first

When it does the aggregate function of group by and then chooses a maximum date, it disreguards other fields. It then selects the first entry to show
the value of.

So it is even worse than that, because what I would really like to do is, get the latest value on a given day.

So here is an example:

id date value
6 2002-2-2 zero
6 2002-2-16 first
6 2002-2-18 second
6 2002-3-2 third

select MAX(date),value from MY_TBL where date < ‘2002-3-2’ group by id

This result doesn’t return me anything! Which would lead me to the same conclusion above. mysql groups everything first, decides the maximum value
(in this case ‘2002-3-2’ and then applies the where clause and since ‘2002-3-2’ is not less than the date i was looking for it doesn’t report anything!

Is their a way to change the order of operations so that is first evaluates the where clause and then does the group/max?

A way using a single query to get the value at a maximum date?

Thanks for the help.

Sincerely,
Andrew Mirsky
ajmirsky@hotmail.com

MattR · March 22, 2002, 2:03am

According to the SQL standard you cannot include columns in your SELECT which are not in the group by. It shouldn’t (and doesn’t in this case) work.

Try grouping by DATE instead.

Adding ‘value’ to the SELECT clause will cause problems – what if you had the following data:
2002-2-2 zero
2002-2-2 first

What value would it select? Zero? First? Try this query:


SELECT MAX( date ),
       value -- non standard and probably won't work
  FROM MY_TBL
 WHERE date < 'etc'
 GROUP BY date

DR_LaRRY_PEpPeR · March 22, 2002, 4:26am

Originally posted by MattR
According to the SQL standard you cannot include columns in your SELECT which are not in the group by.

just curious, should that not say “cannot include columns in your GROUP BY which are not in the SELECT?” that seems to make more sense. otherwise you would have to GROUP BY all columns, no?

MattR · March 22, 2002, 4:43am

Nope, that is what the standard states. e.g.


SELECT col1, col2
  FROM t2
 GROUP BY col1

Non-standard SQL!

Because again in the example consider if you have:


SELECT SUM( col1 ), col2
  FROM table1
 GROUP BY col1

Logically try and think about what that will do…

If your data looks like this:


table1( col1, col2 )
--------------------
1, 'happy'
2, 'joy'
3, 'joebob'
3, 'hippy'
2, 'shut'
1, 'up'
1, 'hippy'

In relational algebra you must always have distinct tuples (rows) in your result set. No exceptions. All tables must have, in some fashion, a unique way of identifying them. Remember that a result-set is just another mathematical set.

So, when you ask the RDBMS to GROUP BY col1 it is going to whine “Well, you have this col2 thing hanging around”. So you can try it in most RDBMS’s (and it should break). However, MySQL ‘relaxed’ the standards a bit to make it more useful (Sybase did as well).

In the above example the GROUP BY and SUM wants to return a single row, e.g.

SELECT sum( col1 ), col1
  FROM table1
 GROUP BY col1

Is logical:


result set
sum, col1
----------
3, 1
4, 2
6, 3

However, since you’re not grouping by COL2 in the example, you make the engine go “Wait, you’re asking me to duplicate sum counts, right? Uhm, ok. I want to return a single tuple for sum, but you’re telling me that I can’t. So, I can do one of two things. Return a single tuple for sum and randomly pick a col2 to display, or for each col2 which is in the sum, repeat the sum.” In MySQL and Sybase, the latter is chosen:


result set
sum, col2
---------
3, 'happy'
3, 'up'
3, 'hippy'
4, 'joy'
4, 'shut'
6, 'joebob'
6, 'hippy'

Explaining why relational calculus says this is VERBOTEN is outside the scope of this post; check out Codd’s papers or get a relational calculus/algebra book.

MattR · March 22, 2002, 4:48am

The standards also say that GROUP BY can only be used with one or more aggregate functions (but not all).

For example:


SELECT sum( col1 )
  FROM table1
 GROUP BY col1

Could return duplicate values (never mind the results would be fairly meaningless ;)), so you need to include ‘col1’ in the SELECT.

Conversely, you cannot have a GROUP BY without aggregates. Again this comes to relational calculus restrictions but MySQL/Sybase say ‘Well, we’ll group them pretty for you’ even though NEIN say Codd and Date.

Sybase has a handy feature that you can enable when you run queries to let you know when you’re deviating from the SQL standard:


1> SET FIPSFLAGGER ON
2> GO
1> SELECT SUM( col1 ), col2
2>   FROM table1
3>  GROUP BY col1
4> GO
Line number 1 contains Non-ANSI text. The error is caused due to the use of
column name(s) in a select list with aggregates but column name(s) not in group
by list.

1> SELECT username
2>   FROM user
3>  GROUP BY username
4> GO
FIPS WARNING: GROUP BY clause specified when all the items in select list are
aggregate functions or none of the items in select list are aggregate
functions.

DR_LaRRY_PEpPeR · March 22, 2002, 5:26am

:weyes: :looko: :wall:

that was WAY too much information for me. thanks though. very informative, as usual. i understand that what i thought is wrong now.

MattR · March 22, 2002, 5:32am

LOL, glad I could help. Funny thing about standards – vendors don’t follow 'em too well, do they?