Which of these is better (and why)?

cydewaze · May 23, 2014, 12:33am

What’s the difference between:

SELECT ride_name, dist
FROM ridelog
ORDER BY dist DESC
LIMIT 1

and

SELECT ride_name, dist
FROM ridelog
WHERE dist = ( SELECT MAX(dist) FROM ridelog )

They both seem to give me the same thing, but I’m thinking the second one is faster because ORDER BY is slow.

r937 · May 23, 2014, 12:49am

wait, ORDER BY is slow?

and you think MAX() is faster?

what do the EXPLAINs say?

cydewaze · May 23, 2014, 2:00am

I could have sworn you told me that at one point. Although it may have been order by vs not using order by.

I don’t think anything (a common problem for me)

They don’t say anything for some reason. If I cfdump the queries, I get an execution time of “1” for each version.

The MAX version returns two records for the query (there are two SELECTs after all) and the LIMIT version returns one.

Mittineague · May 23, 2014, 2:19am

By “faster” what do you mean, are we talking 0.0000937 secs or something with potentially more impact?

cydewaze · May 23, 2014, 2:24am

I’m just wondering if there’s an advantage to using one method over another.

In a database with 22 records, I don’t think I have to much worry about performance. But someday I might have to do something similar at work, so I wanted to know the best way to do it.

felgall · May 23, 2014, 3:05am

Not using LIMIT would make the command more portable if you are thinking of switching to a different SQL database.

When it comes to the efficiency of different calls to receive the same info, it often depends on just how many records there are and what they contain as to which will be more efficient. Until such time as the two EXPLAIN statements give significantly different results there is no particular reason to switch.

r937 · May 23, 2014, 8:50am

it would also generate way too many rows

felgall · May 23, 2014, 10:03pm

Why would it? From what I can see both the alternatives being discussed produce one row of output - one using LIMIT and one using an alternative way of getting just the one row. That alternative would also work in databases that do not support the proprietary LIMIT command.

Please explain how SELECT ride_name, dist FROM ridelog WHERE dist = ( SELECT MAX(dist) FROM ridelog ) generate way too many rows.

r937 · May 23, 2014, 10:48pm

stephen, i was responding to your suggestion of leaving LIMIT off

of course the MAX subquery version will return one row…

or two, or three…

but not all

felgall · May 24, 2014, 1:41am

I never suggested leaving LIMIIT off. I suggested that a version that didn’t use LIMIT was more portable.

r937 · May 24, 2014, 2:45am

i bow to your hair-splitting acumen

felgall · May 24, 2014, 7:15am

I don’t understand this comment. We were discussing two specific alternative ways of coding an SQL call and I pointed out that using LIMIT makes the code less portable because it ties the code specifically to mySQL. I was simply pointing out that additional difference between the two versions beyond that of the comparative efficiency.

What does hair splitting have to do with that? Unless it has something to do with your misinterpretation of statements by not reading them in the context of what comes before. Does everyone have to quote everything that came before when making a comment in order for you to interpret the context of their comment properly?

Just for you - here’s the context of my earlier statement:

cydewaze:

What’s the difference between:

SELECT ride_name, dist
FROM ridelog
ORDER BY dist DESC
LIMIT 1

and

SELECT ride_name, dist
FROM ridelog
WHERE dist = ( SELECT MAX(dist) FROM ridelog )

cydewaze · May 25, 2014, 3:40am

For what it’s worth, I decided to use MAX() rather than LIMIT.

molona · May 26, 2014, 10:55am

I know I’m late and I don’t pretend to be an expert but my understanding would be that the first would be faster as using subqueries tends to be slower.

Haven’t said this, each database is a world on its own and the best thing would be to test each of the queries on both databases to know what works best because, yes, ordering is slow

r937 · May 26, 2014, 11:43am

so is scanning for max

cydewaze · May 26, 2014, 4:54pm

My problem is that I compare database functions to how I’d do the task myself.

If I had a stack of cards numbered 1 - 50, putting them in numeric order would take a certain amount of time. Picking the highest one would also require me to put them in order, then pick the top one (which is what I’d do with LIMIT 1) so from a human standpoint, they seem the same.

I don’t know how a database handles things though. It’s obviously faster at math than I am!

molona · May 27, 2014, 2:49pm

Off Topic:

So is that really faster? Or being more generic… is a subquery faster than ordering? Yes, you can call me ignorant

molona · May 27, 2014, 2:50pm

Off Topic:

But only a little bit faster

r937 · May 27, 2014, 3:24pm

suppose you have a table where some column has various values… like, say, a table of rides where each ride has a PK and also a column for the distance

how do you image MAX(dist) is found? wouldn’t the engine have to go through all the rows somehow? i can see it examining all the rows and keeping a variable which it replaces every time it finds a larger value

now imagine you had to return only one row from a result set which is sorted… wouldn’t it be easier, instead of sorting them all, simply to scan through and find the largest value, then return that row?

what i’m saying is, we can speculate all we want about what’s fast, and what’s faster, but in the end, the optimizer decides how to do what you ask it to do, and it might not actually sort if it doesn’t have to

if you want to know what’s fast and what’s faster, there is only one way – EXPLAIN