mySQL how many Indexes to connect multiple tables with Foreign Keys to a single table

charles_i · September 12, 2012, 4:09pm

I am building a record collection database. I have a main Release table with Title, Artist, Format, Country, Date, Label, etc. - each of these will be a separate table with a Foreign Key connected back to Release. My question is, do all of the foreign keys in their separate tables connect to one Primary Key (id) in Release or do they each connect to a corresponding Index integer key in Release? And speaking of indexes - is it then necessary to create a separate index on all of the above which will be searched?

As for search it would be for example, search Artist (table will have first name, last name or group name) which should connect to that Artist’s releases. I don’t think a first name Artist search is necessary, so should the Indexes be in the Artist table, and say, the Label table rather than the Release table?

Thanks
Charles

rcashell · September 12, 2012, 4:38pm

Foreign keys allows you to have referential integrity of your data in the database. By this I mean if you were to add an entry that does not exist in a referenced table then it will fail.

If we look at your Release table (Title, Artist, Format, Country, Date, Label …), with database normalization techniques you could split this up based on duplicates that might occur in your data. For instance, a title would be unique but an artist may have one or more record releases, so we could create a separate table for Artist. Country likewise may have duplicate entries so we can create a separate table called country. The record label may also have duplicate entries and can be separated. I am not sure about format but I think could be separated.

So to your release table (artistid will have a foreign key constraint added against artist (id)). As you have mentioned above “each of these will be a separate table with a Foreign Key connected back to Release”, it is the other way around, the release table will have foreign key constraints to the Release table. The id in artist will be the primary key column of the artist table.

As for having indexes on the foreign key columns, this is particularly important, especially if updates are performed on the referenced tables i.e. artist where an id is updated. If there is no index on the Release(artist) column the check will require a full table scan. The table or all the rows will be locked for the duration of the operation which may take some time to perform. So having the indexes will help. At the same time the artist(id) field will be a primary key of the table. A primary key has a unique index automatically created where the primary key field(s) cannot have a null value.

charles_i · September 12, 2012, 5:41pm

I think your response is saying that I’m doing this correctly - a separately linked foreign key for each table. This is only a portion of my design - I’ve uploaded a jpg of the entire mySQL Workbench layout here:

http://fanoss.ca/records-db.jpg

What is still missing is a “lookup table” for many-to-many relationships, but going through it I can see only a few of those -(vinylInnerSleeve, VinylInserts, CDInsert, TapeInserts, Relevance, Instruments, Vocals, Collaborator, and Role).

As you can see this is quite a complicated plan and I just want to make sure I’m not doing anything horribly wrong before I start inputting.

Let me know if the Release table structure is correct and if there is anything else that won’t work.

Thanks!

Dr_John · September 14, 2012, 3:43pm

You have created a large number of tables where you don’t really need them other than for a look-up. And even then you are doing it the wrong way. You are forcing a vast number of joins to get at data that is actually just an ordinary attribute of another table. These tables are very easy to spot, as they all follow the same pattern.

For example, type{idType, type} Firstly, you don’t need the integer based id, as I’m willing to bet that there are not two types with the same name but a different id. Type is in fact it’s own primary key, and so the idType is not needed. But to get at the value of the type (REALLY bad name for something in a database or program!) you must do a join, based on the idType, to retrieve the type value.

You do this dozens of times. So when you write a query it will be horrendously complicated with 13 unnecessary joins to retrieve the attributes of a release! You should be storing the actual attribute’s value in the release table, not a list of numbers, each forcing a join to find out what value it represents.

You will need a query with 8 joins to read the attributes of vinyl for example.

The tables that are wrong are so easy to identify, as they all have the same structure:-
Xyz {idXyz, Xyz}

A lookup table should just have this structure:-
xyz {xyz}

Then the value (xyz) will be in the table where you currently have your lists of foreign keys - a list of numbers - and thus no joins will be needed to retrieve the values of the attributes. Your current design will be a nightmare to retrieve data from.

Basically, you are under the total wrong misapprehension that every table needs an integer as the unique identifier for each row.
Wrong, because there are such things as natural keys - go read up on that.
But for example
country{id, name}
1, Scotland
2, England
3, France
4, Germany
5, Spain
6, Columbia

the country name is unique, it should be the primary key in that table (and primary keys are automatically an index, so no need to add one), the country look-up table ONLY needs to list the name, and the name will be stored as an attribute on some other table.
There is only one country called Scotland, it has no need of a number to identify it (although we all know it is the No. 1 country of course)

r937 · September 14, 2012, 4:27pm

brace yourselves, the elite programmers will be piping up any second…

“but… but… but… if i do that, then what about all the space i’m wasting???”

that argument might’ve made sense in the 1980s, when disk space was a thousand dollars a megabyte

nowadays you can buy a terabyte for fifty bucks

the real bottleneck is compute time, and unnecessary joins are a contributing factor