MAX, GROUP BY, ORDER in Postgres

markbrown4 · July 13, 2012, 5:01am

It sounds simple enough but I can’t nut out a solution

I have a rounds table (id, user_id, total_score)
I’m wanting to get the top 10 scores but group by the users so that the top ten doesn’t have any of the users more than once.

This is giving the right scores:
SELECT MAX(total_score) as max_score, user_id FROM “rounds” GROUP BY user_id ORDER BY max_score LIMIT 10

I’m wanting to get the round id’s though and including that in the group by means that the users aren’t unique anymore.
SELECT MAX(total_score) as max_score, user_id, id FROM “rounds” GROUP BY user_id, id ORDER BY max_score LIMIT 10

DarthGuido · July 13, 2012, 6:29am

In MySQL you could eliminate the id from the GROUP BY, and it would give you the data you want (with only one of the round id’s in case of a user with the same max score in more than 1 round):


SELECT 
    MAX(total_score) as max_score
  , user_id
  , id 
FROM "rounds" 
GROUP BY user_id 
ORDER BY max_score 
LIMIT 10

I don’t know if that works in postgres as well?

markbrown4 · July 13, 2012, 6:39am

Hmm, no love.

SELECT MAX(total_score) as max_score, user_id, id FROM “rounds” GROUP BY user_id ORDER BY max_score LIMIT 10;
ERROR: column “rounds.id” must appear in the GROUP BY clause or be used in an aggregate function

Edit Oh, I think this works.
SELECT MAX(total_score) as max_score, user_id, MAX(id) as id FROM “rounds” GROUP BY user_id ORDER BY max_score LIMIT 10;

Will test it out, thanks for tip off guido.

Edit: No the MAX id hack doesn’t work.

markbrown4 · July 13, 2012, 8:46am

Actually, putting MAX in the query does work. I was just confused because I wasn’t ordering by max_score DESC.
Are there any downsides to this? Should another aggregate function be used instead?

SOLVED


SELECT MAX(total_score) as max_score, user_id, MAX(id) as id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;

Thanks guido.

swampBoogie · July 13, 2012, 8:47am


select total_score,
       user_id, 
       id
  from "rounds" x
 where total_score =
      (select max(total_score)
         from "rounds"
        where "rounds".user_id = x.user_id)
 order 
    by total_score desc
 limit 10

markbrown4 · July 13, 2012, 10:18am

Boom.

I thought I had solved it but no, this is the correct answer.

So, It’s the MAX and the ORDER BY that complicates this and requires a nested select?

Thanks for your help, I appreciate it.

DarthGuido · July 13, 2012, 10:33am

If I’m not mistaken, that solution will still give you more than 1 row per user, if a user has multiple rows with the same max score.
If you only want one row per user returned, and you don’t care which one, then your solution is the way to go (returning always the latest round id of the rows with the max score for that user).

markbrown4 · July 13, 2012, 10:50am

The previous one wasn’t returning MAX(total_score) anymore, It seemed to be taking the MAX(id) in preference somehow…

Because SQL does my head in and for efficiency I’ve actually chosen to bake the best_score out and save it against the user.
Then it’s a much more straightforward SELECT * FROM User ORDER BY max_score DESC LIMIT 10

Are there good rules of thumb for baking out calculated fields like this in the db or is that frowned upon?

DarthGuido · July 13, 2012, 10:55am

Are you saying that this query didn’t return the max total_score for each user_id?

SELECT MAX(total_score) AS max_score, user_id, MAX(id) AS id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;

I find that hard to believe. All aggregate functions should give a value relative to the columns you’re grouping by, in this case user_id. Unless Postgres does things differently?

markbrown4 · July 13, 2012, 11:01am

It was returning the correct scores for the user_id’s but the id’s weren’t related anymore.


SELECT MAX(total_score) AS max_score, user_id, MAX(id) AS id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;
 max_score | user_id | id 
-----------+---------+----
    131.16 |       1 | 55
        55 |      44 | 54
      49.2 |      30 | 28
     42.33 |      33 | 49
      32.7 |       7 | 10
        32 |      34 | 35
    19.135 |      40 | 44
      18.5 |      38 | 48
      17.3 |      41 | 46
     13.04 |       3 |  4
(10 rows)


SELECT total_score from rounds WHERE id IN (55,54,28,49,10,35,44,48,46,4) ORDER BY total_score DESC;
 total_score 
-------------
      19.135
           0
           0
           0
           0
           0
           0
           0
           0
           0

markbrown4 · July 13, 2012, 11:06am

DISTINCT ON might help with the duplicates.


select DISTINCT ON (total_score)
  total_score,
  user_id, 
  id
from "rounds" x
where total_score =
     (select max(total_score)
        from "rounds"
       where "rounds".user_id = x.user_id)
order 
   by total_score desc
limit 10

DarthGuido · July 13, 2012, 12:09pm

Yes of course. Stupid mistake from my part.

system · July 13, 2012, 12:51pm

One quick possible solution, from the top of my head:

Use a bigger LIMIT for the query extracting both user_id and id, a 100 would cover the worst case scenario, where every top player has top scores, make it a subquery, then group that by user in parent query to have only one player appearing for the top 10 positions.

markbrown4 · July 14, 2012, 12:50am

Thanks, but swampBoogie’s answer is correct.
In any case, I’ve changed it now so that when a score is saved if it’s the highest score it will be saved directly against the user now.
[sql]SELECT * FROM User ORDER BY max_score DESC LIMIT 10[/sql]
I think baking out calculations like this make sense and keep things faster / simpler in sql land.

system · July 14, 2012, 6:35am

Glad you sorted it out.

For the sake of the argument, the advantages of what I’m proposing are these:

for the inner query, that gets the highest scores, you don’t have to use the MAX() function, you just use ORDER BY total_score DESC LIMIT 100
for the outer query, that gets the distinct top 10, you now have a much smaller set, 100 records, to deal with

It needs some testing but I believe it should be faster.