IN -or- OR (that is the question)

Hi Everybody!

I was cruising through my slow query log earlier and one of my larger queries occurs in the log file occasionally. So I’m trying to work to optimize it a bit. One part of the query that I was pretty sure was going to be bad when I wrote it is a long OR statement.

This is preceded by a couple of AND statements. There could possibly be several hundred of these 'OR’s


( sitedone_dealer.inventory.id = '0'  OR sitedone_dealer.inventory.id = '36'  OR sitedone_dealer.inventory.id = '58'  OR 
sitedone_dealer.inventory.id = '59'  OR sitedone_dealer.inventory.id = '60'  OR sitedone_dealer.inventory.id = '61'  OR 
sitedone_dealer.inventory.id = '62'  OR sitedone_dealer.inventory.id = '63'  OR sitedone_dealer.inventory.id = '64'  OR 
sitedone_dealer.inventory.id = '65'  OR sitedone_dealer.inventory.id = '66'  OR sitedone_dealer.inventory.id = '67'  OR 
sitedone_dealer.inventory.id = '68'  OR sitedone_dealer.inventory.id = '69'  OR sitedone_dealer.inventory.id = '70'  OR 
sitedone_dealer.inventory.id = '71'  OR sitedone_dealer.inventory.id = '72'  OR sitedone_dealer.inventory.id = '73'  OR 
sitedone_dealer.inventory.id = '74'  OR sitedone_dealer.inventory.id = '75'  OR sitedone_dealer.inventory.id = '76'  OR 
sitedone_dealer.inventory.id = '77'  OR sitedone_dealer.inventory.id = '78'  OR sitedone_dealer.inventory.id = '79'  OR 
sitedone_dealer.inventory.id = '80'  OR sitedone_dealer.inventory.id = '81'  OR sitedone_dealer.inventory.id = '82'  OR sitedone_dealer.inventory.id = '83'  OR sitedone_dealer.inventory.id = '84'  OR sitedone_dealer.inventory.id = '85'  OR sitedone_dealer.inventory.id = '86'  OR sitedone_dealer.inventory.id = '87'  OR sitedone_dealer.inventory.id = '88'  OR sitedone_dealer.inventory.id = '89'  OR sitedone_dealer.inventory.id = '90'  OR sitedone_dealer.inventory.id = '91'  OR sitedone_dealer.inventory.id = '92'  OR sitedone_dealer.inventory.id = '94'  OR sitedone_dealer.inventory.id = '95'  OR sitedone_dealer.inventory.id = '96'  OR sitedone_dealer.inventory.id = '97'  OR sitedone_dealer.inventory.id = '98'  OR sitedone_dealer.inventory.id = '99'  OR sitedone_dealer.inventory.id = '100'  OR sitedone_dealer.inventory.id = '102'  OR sitedone_dealer.inventory.id = '103'  OR sitedone_dealer.inventory.id = '104'  OR sitedone_dealer.inventory.id = '105'  OR sitedone_dealer.inventory.id = '106'  OR sitedone_dealer.inventory.id = '107'  OR sitedone_dealer.inventory.id = '108'  OR sitedone_dealer.inventory.id = '109'  OR sitedone_dealer.inventory.id = '110'  OR sitedone_dealer.inventory.id = '111'  OR sitedone_dealer.inventory.id = '164'  OR sitedone_dealer.inventory.id = '167'  OR sitedone_dealer.inventory.id = '198'  OR sitedone_dealer.inventory.id = '199'  OR sitedone_dealer.inventory.id = '200'  OR sitedone_dealer.inventory.id = '202'  OR sitedone_dealer.inventory.id = '203'  OR sitedone_dealer.inventory.id = '204'  OR sitedone_dealer.inventory.id = '205'  OR sitedone_dealer.inventory.id = '206'  OR sitedone_dealer.inventory.id = '207'  OR sitedone_dealer.inventory.id = '208'  OR sitedone_dealer.inventory.id = '209'  OR sitedone_dealer.inventory.id = '210'  OR sitedone_dealer.inventory.id = '211'  OR sitedone_dealer.inventory.id = '212'  OR sitedone_dealer.inventory.id = '213'  OR sitedone_dealer.inventory.id = '214'  OR sitedone_dealer.inventory.id = '215'  OR sitedone_dealer.inventory.id = '216'  OR sitedone_dealer.inventory.id = '217'  OR sitedone_dealer.inventory.id = '218'  ) 

Would this portion of the query be better served by something like this?


OR sitedone_dealer.inventory.id IN(1,4,6,87,34,23,676,9877,45,23,....etc)

I’m open to any suggestion or advice.

Thanks in advance for any guidance!

Using IN for that could be more efficient an certainly couldn’t be less efficient. The use of IN lets the database know that all those comparisons are on the one field far more clearly than using all the OR statements does and so the dtabase may be able to come up with a more efficient way of accessing the data.

It would also make the code easier to read regardless of whether it makes the text more efficient.

The query optimizer should end up producing the same query from either one. I would bet the IN() gets rewritten to a bunch of OR’s when it’s translated to the intermediate query language the individual storage engine gets passed. Some quick testing and checking EXPLAIN estimates seems to verify there’s no speed difference.

So there’s no technical difference between the two, do whichever is easier to code or more readable for you.

I agree. It would be a rather stupid optimiser that didn’t. The point I was attempting to make was that if there were to be a difference between the two that the difference would have to give the advantage to the IN version as that is more specific - you are comparing one field to a collection of values rather than just doing a collection of comparisons.

The IN version should produce shorter code which should therefore be easier to read.

Of course if you are dynamically generating the query then the OR version may be simpler to code.

I disagree. At least in PHP the IN queries are easier to code.

Consider


$ids = array(1,2,3);
$where = 'somefield=' . implode(' OR somefield=', $ids);
$query = 'SELECT something FROM some_table WHERE '.$where;

versus


$ids = array(1,2,3);
$where = 'somefield IN (' . implode(',', $ids) . ')';
$query = 'SELECT something FROM some_table WHERE '.$where;

The difference is in the $where =, where I find the second one more easy to read.

Another advantage of IN over OR is that the query is shorter, so you are less likely to run into the maximum query length.
Although this is slightly far-fetched it’s still true :slight_smile:

Another advantage of IN over OR is that it can be added to the WHERE with an AND, so you don’t have to worry about adding ( and ) around the conditions :slight_smile:

Dang, this is my third attempt at responding. The site keeps saying my security token is missing?

Anyways, the query is already coded and working with the OR statement so if there’s no performance gain I won’t change it. I guess I need to EXPLAIN the entire query and try to find out where the bottleneck is.

Many Thanks for the input!

That depends on how your code is written to generate the query - I said that using OR may fit better with the way you are generating the code in some instances.

In this case where the OP already has the code working there is no reason for changing it just for the sake of converting to use IN.

Can’t speak to MySql but for MSSQL, IN() gets rewritten as a temp table and a join in general, which is hella fast.

Could it be worth experimenting with using BETWEEN where the id are +1 from the previous? (and keep doing ORs or IN() on those that aren’t)

Instead of doing this:

id = 1 OR id = 2 OR id = 3 OR id = 4 OR id = 5 OR id = 6 OR id = 7 OR id = 8

do this:

id BETWEEN 1 AND 8

Might be worth a try, although I suspect the query optimizer will end up doing the exact same query as with ORs and IN().

that’s awesome

but what does MSSQL do for the series of OR conditions?

Are those ID’s hard coded? If they’re coming from another query, you could use a Subquery or Join to speed up to code.

It seems a bit strange to me to have written a query like that. Where do these IDs come from? If they just happen to be all the existing IDs in the database, then I think you’re doing it wrong. If they are from a known subset with a common feature, again you’re doing it wrong.

I too wonder if a join is the correct answer.

Logically, in is the same as a big list (X or Y or Z . . )

so, logically, it creates a temp table in that case too?

Can’t really say, not horribly familiar with the internals of the MySql query parsing and optimization. Or if MySql even has in-memory temp tables these days.

no, i was asking about msqql, you said mssql creates a temp table for an IN list, and i was asking whether mssql creates a temp table for a series of OR conditions as well

i realize that this is a mysql thread but the mssql example is relevant in that it would show how different sql can lead to different execution paths…

Gotcha. Answer is that I think it optimizes both clauses the same under the hood at the end of the day.

This would all depend on the internal of MySQL and that could change from version to version too in theory. That is a reason that I sometime write code a way I feel the “right way” (which can be subjective!), unless there is a real need to optimize for a specific version or a real need to optimize over the reading quality for example. The “right way” might get a good chance overtime to be optimized as well if not already.

I would personally be more interested in the comparison of performance with EXISTS and IN with a subquery on the same table with an equivalent condition (at least I have noticed difference with some DBMS in the past even if it was the same condition). But that is another topic.

However the “IN” version is also shorter in length. Therefore less data to send to the MySQL server. If it was an heavily used query, it could improve the performance for that part of the process.

Another thought, you could as well test it yourself with a lot of your OR condition. Make sure when you run your query to use SQL_NO_CACHE, else they might both run at the same speed because the result has been cached.

You can also analyze the query with DESCRIBE.

I personally tried a simple version on my side and both the “multiple OR” and the “IN” versions had the same execution plan with a “range” search type. So there is a good chance they will be optimized the same way by the MySQL server.

You might let us know what you find. :wink:

A JOIN may very well be the most proper solution. I’ll look into this further and let you guys know the result.

The OR is composed of a SELECT all from a table that holds only the id’s used in the OR statement. The id’s in this table relate to a separate database full of products. Each dealer only has access to a certain number of products in the main product database.

I’ll have another look at the query and see if a JOIN is the more proper solution.