Junction Tables, Indexes, and Foreign Keys

Smola · November 23, 2010, 9:10pm

Don’t worry! I know what all of the above are! I have a technical question (or two).

I am going through a client’s database and optimizing the tables as best as I can. Some junction tables exist to link certain tables together. Let’s say I have a junction table with two columns (some_id, another_id [and in that order]) that pairs primary keys of two other tables and so are, in effect, foreign keys. This junction table is obviously used in joins so both columns are being used whenever this junction table is referenced. The rows should be unique, so there is a composite unique index on both columns (which can be removed if necessary of course). I am aware that MySQL can now use this index if a WHERE clause references both columns or just the first (some_id) column. Is that enough to be optimized for tables that store large amounts of data? I know how to optimize statements and tables for WHERE clauses, but JOINs are another story. I read the docs, searched here, and searched on Google, and could not find any resources that solved my specific problem(s).

Problem 1: Creating a foreign key also creates an index on a column if it doesn’t already exist right? So after explicitly defining each column as a foreign key, I’d have a composite index on both columns, and single indexes on each individual column? That sounds like overkill.

Problem 2: I want to optimize for JOINs as I mentioned earlier. If I am wrong about the behavior described in problem 1, then do I need to create an index on the second column? That way the composite index handles queries referencing (some_id) and (some_id, another_id) and then I’ll have the single index just on the second column.

Assume that there is a fair amount of data retrieval going on and that my queries are optimized to reference (some_id) before (another_id). Also, redesigning the database structure is not an option (for this project).

r937 · November 23, 2010, 9:21pm

declaring the FK will create an index automatically (it didn’t used to) but only if a suitable index does not already exist

CREATE TABLE junction
( key1 INTEGER NOT NULL
, key2 INTEGER NOT NULL
, PRIMARY KEY (key1,key2)
);

that;s the way you normally do it

a query which supplies a value for key1 would ustilize the index, and if you were to declare a FK on key1, it could use the PK index and wouldn’t create another one

i typically also add a second alternate index on ( key2,key1 ), and thus the FK on key2 could use that one

why the two-column alternate index instead of just an index on key2 itself? because then it’s a covering index (which you could google to see why this is a good idea – queries don’t actually have to read the table rows at all!)

in any case, you would see the FK indexes that mysql creates by default if it needs to, they are not “secret” indexes or anything, so you should be able to see if the index is redundant

Smola · November 24, 2010, 12:42am

Good stuff! Let me spin this back to make sure I understand…

I was right about MySQL creating indexes automatically if one doesn’t exist, but what I didn’t know was that even though a foreign key can exist on just the first column (key1), the PK index can be used as opposed to MySQL implicitly creating one because it covers queries on key1 and key1, key2. Correct?

From the name, and knowing how composite indexes are used by MySQL, I’m guessing your second index which reverses the order of the PK is called a covering index because after it is defined, all possible combinations of columns are indexed or covered (in this example). The PK covers key1 and key1, key2 while the reversed index covers key2 and key2, key1. Is that right? That is genius! You have only created two indexes and their cardinality is the same as if I declared a singular index on key2!

I had planned on it anyway before making this post since I received a client’s payment today, but I’m going to buy your book right now. Unless a newer edition is about to be released…?

r937 · November 24, 2010, 1:09am

the term “covering” refers to the queries – a query is “coverd” by an index if all of the columns required for the query can be satisfied off the index, hence the index covers the query

the index doesn’t necessarily have to cover (all the columns in) the table in order to be a covering index for a query, it just has to cover all the columns that the query needs

of course, if an index actually contains all the columns in the table then there’s a very good chance that it will be a covering query for just about any query

and of course we need both 2-column indexes, because with only one, it would be possible for a query to require the columns in the other order, and hence a table scan would result

Smola · November 24, 2010, 1:22am

Ahhh well I was close…sort of, haha. Thanks for the responses!!

Smola · November 25, 2010, 5:46am

Ok, more on this topic…

Hypothetical question: Do you ever create a junction table with three columns? If so, what kind of index structure could you set up on that? (key1, key2, key3), (key3, key2, key1), and (key2)? Does this depend on the order of the joins? The order of the WHERE clause applies as well correct?

r937 · November 25, 2010, 10:18am

it depends on the types of queries i expect to run using that table

no sense indexing something if you don’t need to