Okay, so I’ve been reading several chapters in the book I got for a databases course at the university, but I couln’t find any reference to IN in that book whatsoever.
So I went to the MySQL documentation and found the following:
expr IN (value, …)
Returns 1 if expr is equal to any of the values in the IN list, else returns 0. If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants.
You can find this here.
I did some experimental testing on a toy table, created by the SQL below, to investigate this.
CREATE TABLE `test` (
`id` int(10) DEFAULT NULL,
`some_char` char(1) DEFAULT NULL,
KEY `id_index` (`id`),
KEY `some_char_index` (`some_char`)
) ENGINE=MyISAM
INSERT INTO test(id,some_char) VALUES
(1, 'j'),
(2, 'i'),
(3, 'h'),
(4, 'g'),
(5, 'f'),
(6, 'e'),
(7, 'd'),
(8, 'c'),
(9, 'b'),
(10, 'a')
(I first tried this table with a primary key on id
, but then MySQL seems to sort results by id
, which defies the purpose of what I’m about to show).
Now consider the following query
SELECT id,some_char FROM test WHERE some_char IN ('b','a');
The result is
id some_char
------ ---------
10 a
9 b
And the EXPLAIN for this query is:
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ --------------- --------------- ------- ------ ------ -----------
1 SIMPLE test range some_char_index some_char_index 2 (NULL) 2 Using where
So, MySQL indeed seems to have sorted the values of the IN values list (‘a’ is first in the result, ‘b’ second, in my IN values list I asked them other way around).
However, in two seperate cases MySQL was not sorting the values.
First case
The query
SELECT id,some_char FROM test WHERE some_char IN ('b','a','c');
Returns the result
id some_char
------ ---------
8 c
9 b
10 a
And the EXPLAIN of this query:
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ --------------- ------ ------- ------ ------ -----------
1 SIMPLE test ALL some_char_index (NULL) (NULL) (NULL) 10 Using where
So instead of using the index, MySQL seems have to fallen back to a full table scan.
Interestingly, when we perform the following query on the table
INSERT INTO test(id,some_char) VALUES (11,'z'), (12,'z'), (13,'z')
To give the table a 13 tuples, and perform the same query, we get the result
id some_char
------ ---------
10 a
9 b
8 c
and the EXPLAIN of this query now is:
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ --------------- --------------- ------- ------ ------ -----------
1 SIMPLE test range some_char_index some_char_index 2 (NULL) 3 Using where
So now MySQL is using the index again.
(For 11 and 12 tuples MySQL also uses a tablescan, for >= 13 tuples it using the index again.)
So MySQL appears to be considering whether sorting the values of the IN values list would be faster than just doing a tablescan.
Second case
When we drop the index some_char_index
and perform the same queries (still with 13 tuples) we get
Query:
SELECT id,some_char FROM test WHERE some_char IN ('b','a');
Result:
id some_char
------ ---------
9 b
10 a
Explain:
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ ------------- ------ ------- ------ ------ -----------
1 SIMPLE test ALL (NULL) (NULL) (NULL) (NULL) 13 Using where
And for the other query.
Query:
SELECT id,some_char FROM test WHERE some_char IN ('b','a','c');
Result:
id some_char
------ ---------
8 c
9 b
10 a
Explain:
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ ------------- ------ ------- ------ ------ -----------
1 SIMPLE test ALL (NULL) (NULL) (NULL) (NULL) 13 Using where
So, when there is no index on the some_char column, MySQL does not sort the values of the IN values list and performs a tablescan.
Now to get back to the original problem, when I rewite all IN’s to OR’s (so some_char IN (‘a’,‘b’) becomes some_field=‘a’ OR some_field=‘b’) the results are EXACTLY the same.
This would suggest (but doesn’t prove) that MySQL is rewriting the queries using IN and the queries using OR’s to the same plan.
For InnoDB the results are basically the same, but instead of 13 tuples you need 17 tuples to make MySQL use the index when looking for 3 different values.
DISCLAIMER: All these results are experimental and should be viewed as such. I’m not proving anything, just showing the results of some experiments. Everyone can draw their own conclusions.