Multiple column index

rblon · January 26, 2010, 5:38pm

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use an index if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).

Now, my question is how I should define the indexes if I want to execute the following queries efficiently:


SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col1=val1 AND col3=val3;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2 AND col3=val3;

An index on (col1, col2, col3) doesn’t work with the 2nd query. If instead I would define an index on (col1, col2) and (col1, col3) would that help the 3rd query?

r937 · January 26, 2010, 6:30pm

declare on index on (col1,col2,col3), which will be used by the first and third queries

then declare an index on (col1,col3) which will be used by the second

nope, mysql will use only one index per table

rblon · January 26, 2010, 6:57pm

ok but what if the situation is somewhat more complicated… Eg there is also a col4 and col5 which is sometimes used in the SELECT statement.

The only thing the queries have in common is that they alway start with


SELECT * FROM tbl_name WHERE col1=val1 AND ...

In the “single column case” (lets imagine col1 doesn’t exist) you would just put indexes on col2, col3, col4, and col5, respectively. However, in the multiple column case it seems you cannot extend this reasoning by putting indexes on (col1, col2), (col1, col3), (col1, col4), (col1, col5). But creating an index for all combinations doesn’t seem a good idea either.

r937 · January 26, 2010, 7:14pm

i believe you understand the situation, yes

rblon · January 26, 2010, 7:38pm

not yet actually.

I didn’t know that. But then it seems, that what I called the “single column case” can be extended.

Because in the “single column case” [-> col1 doesn’t exist, index on (col2), (col3), (col4), (col5)], the following statement only uses one index:


SELECT * FROM tbl_name WHERE col2=val2 AND col5=val5

which is the same in the “multiple column case” [-> index on (col1, col2), (col1, col3), (col1, col4), (col1, col5)]


SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2 AND col5=val5

SpacePhoenix · January 27, 2010, 6:47am

rblon, do you need all the fields of the table in the results set? If not, specify the fields in the SELECT clause. It will save resources (memory) as you want be grabbing data that is not needed.

For example
Say each field is approx 1MB (per record)and you have 20 fields in the table, by doing a SELECT star you’ll be grabbing 20MB approx (1MB x 20 fields) per record which matchs. But say you only needed 4 fields, by specifying them in the SELECT clause instead of using SELECT * you’ll only be grabbing 4MB approx (1MB x 4 fields) per record which matches.

It might also be more efficient to specify all the fields wanted, even if you want them all, which is something that I’ll have to test at some point.

rblon · January 27, 2010, 8:37am

SpacePhoenix, I am aware of that. I was just extending an example from the MySQL manual about indexing.

Personally, in that case I would also specify the field names. It makes the program more readable.

rblon · January 27, 2010, 10:12am

once more, regarding this statement:

According to the manual:

If separate single-column indexes exist on col1 and col2, the optimizer will attempt to use the Index Merge optimization (see Section 7.2.6, “Index Merge Optimization”)

That suggests MySQL will try to use more than one index per table. But is merging also attempted for multiple column indexes?

r937 · January 27, 2010, 11:52am

i think what you should do is move away from the col1, col2, col3 hypothetical scenario, and actually run some tests using EXPLAIN to see what it actually does

rblon · January 27, 2010, 4:08pm

Ok, here we go… I have used the following test program to make two tables (similar to the example we have been discussing):


<?php
// Initialize database connection
$con = ...

// Create tables
$sql = "CREATE TABLE single (
  col2 INTEGER,
  col3 INTEGER,
  rowid INTEGER,
  INDEX (col2),
  INDEX (col3)
) ENGINE=MyISAM";

mysql_query($sql, $con);

$sql = "CREATE TABLE multiple (
  col1 INTEGER,
  col2 INTEGER,
  col3 INTEGER,
  rowid INTEGER,
  INDEX (col1, col2),
  INDEX (col1, col3)
) ENGINE=MyISAM";

mysql_query($sql, $con);

/*
 * Table single will have 100,00[COLOR="Red"]1[/COLOR] rows
 * col2 is a random number between 1 and 100
 * col3 is a random number between 1 and 20
 * Table multiple is a (double) copy of single, 
 * with col1 of 1 and 2, respectively
*/

$seed = 100;
srand($seed);

for ($i = 0; $i <= 100000; $i++) {
  $col2 = rand(1, 100);
  $col3 = rand(1, 19);  [COLOR="Red"]// Edit: should have been 20, but 19 used anyway[/COLOR]

  $sql = "INSERT INTO single 
            (col2, col3, rowid) 
          VALUES 
            ($col2, $col3, $i)";

  mysql_query($sql, $con);

  $sql = "INSERT INTO multiple 
            (col1, col2, col3, rowid)
          VALUES
            (1, $col2, $col3, $i),
            (2, $col2, $col3, $i)";

  mysql_query($sql, $con);
}
?>

Now, the following statements each return (the same) 59 rows


SELECT rowid FROM single WHERE col2=71 AND col3=5;
SELECT rowid FROM multiple WHERE col1=1 AND col2=71 AND col3=5;

I attach a screenshot of the EXPLAIN for both statements.

I see that the first query uses only key. This is col2, which makes sense as that returns the smallest number of rows. However, the EXPLAIN output says 1252 rows, while “SELECT rowid FROM single WHERE col2=71” only returns 1009 rows. Also, I thought for the subquery (col3=5) some kind of index merging would happen, but that doesn’t seem to be the case. So I guess, MySQL is going through all 1252 (or 1009) rows.

The second query does use index merging but I am not sure how to further interpret the EXPLAIN output. More specificially, how to read “possible_keys: col1,col1_2”?

Any help would be much appreciated.

r937 · January 27, 2010, 5:01pm

you’re definitely on the right track to becoming a mysql internals expert, a path i myself have chosen not to go down, so i don’t think i can help you further

all i know is that “possible keys” means the indexes that were available to choose from

rblon · January 27, 2010, 10:48pm

hmm, that’s a shame, but fair enough…