Big tables versus many smaller tables

Hey Sitepoint,

Just a generic question about tables.

If I had a table that 10 million entries in it, and there was a field in the table called “category” which housed a unique number from another table called categories, would it be faster for selecting records from this table via specific categories if the table was split apart into multiple tables via category?

So say something like this is the original.

item_table[]
 - item_id, category_id, item_name

Then instead we create a bunch of these instead

item_table_category_id[]
 - item_id, item_name

I’m thinking it might be faster to split the tables apart because it has less data to sort through, but then again it might be slower if you wanted to select data from multiple tables at the same time?

Any ideas?

One table, and the right indexes

Could you please clarify for me what the “right” indexes means?

that depends on the query

for “selecting records from this table via specific categories” the right index would be on category_id

So is that something I set in the database? Or do you just mean when writing the query?

Something like

select count(*) from listings where category_id = "23"

In the database.
You can create the indexes when you create the table (see MySQL :: MySQL 5.5 Reference Manual :: 12.1.14 CREATE TABLE Syntax), or you can add them later (see [url=http://dev.mysql.com/doc/refman/5.5/en/create-index.html]MySQL :: MySQL 5.5 Reference Manual :: 12.1.11 CREATE INDEX Syntax)

Please excuse my extreme novice-ness but I have no clue what most of that page says.

Could you please explain to me what an Index in a table does and how it differs from a Primary Key?

A primary key is one sort of index - you can add others so as to make the data more quickly available by other fields. Without an index on the field you are searching on it needs to go through all the records to find the ones that match. With the index it can basically access them directly via that field because the index lists all the values in order for that field and points to the corresponding records.

I see, I had no idea that’s what an index did. Why wouldn’t I just make every field an index then lol?

Because indices slow down UPDATE and INSERT queries. This is because MySQL uses B-TREEs for the indices, and needs to make sure these trees remain balanced for optimal performance. It may well be that one UPDATE/INSERT query causes the tree to become unbalanced, in which case it needs to be re-ordered, which can take a few seconds up until a few hours, depending on the number of rows. Of course the more indices there are, the more B-TREEs MySQL needs, the slower UPDATEs and INSERTs will be.