Add indices for an advanced search option

I’ve taken over a database that lists products. There are currently around 6k records, and it grows by about 2k per year. Broadly speaking, each product has a single category field (a GUID) and then six other fields relating to other areas of products.

When you click on a page it shows a category, so I thought it prudent to index that column, although to be fair it seems to run fine without. The cardinality of the category column is about 60.

There is an advanced search where you can optionally choose to filter on the other six fields, so you always filter by category, while the other six fields are optional and could be used in any number or combination.
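In case it helps clarify the setup: a search like this is usually built by assembling the WHERE clause from whichever filters the user supplied, with the category always present. A minimal sketch in Python (the column names here are hypothetical, and real code should restrict them to a fixed whitelist):

```python
def build_search(category, filters):
    """Build a parameterized product search: category is always
    filtered, the other fields only when the caller supplies them."""
    clauses = ["category = ?"]
    params = [category]
    for column, value in filters.items():
        # column names must come from a fixed whitelist, never user input
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT * FROM products WHERE " + " AND ".join(clauses)
    return sql, params
```

So a search with no optional filters becomes `... WHERE category = ?`, and each chosen filter just appends another `AND column = ?` condition.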

The noteworthy point is that there is an int flag called status with values 0, 1, or 2, corresponding to invisible, for sale, and archived respectively.

Databases are my weak point so forgive these questions if they’re stupid:

  1. With MySQL being quite fast, is there a rule-of-thumb for when to add an index in terms of number of records? Was I right to add one at ~6,000 records?

  2. Am I right in saying the index on the category effectively separates the ~6,000 records into ~60 different sections. If we assumed the categories were evenly spread, performance-wise it’s almost as good as querying a single table of 100 records as the index means MySQL “knows” where the correct 100 are. Is that how it works? In which case, there is no need to index any of the other six fields, right?

  3. I’ve read that an index on the status field would be a bad idea since the cardinality is 3. Why is this? Again, is there any rule-of-thumb as to when to add an index?

Generally speaking, with MySQL are you okay up to 10,000 records even on a shared server?

Thanks!

my rule of thumb is to run the query without the index, make note of stats, add index and re-run for new stats, if no improvement then drop index

and, of course, EXPLAINs on the before- and after-index queries
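That before/after workflow can be sketched end to end. The thread is about MySQL (where you'd prefix the query with `EXPLAIN`), but the same idea is runnable in a self-contained way with Python's built-in `sqlite3`, whose `EXPLAIN QUERY PLAN` is the analogue; the table and data below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, status INTEGER)"
)
# ~6000 rows spread over ~60 categories, roughly matching the thread
conn.executemany(
    "INSERT INTO products (category, status) VALUES (?, ?)",
    [(f"cat{i % 60}", i % 3) for i in range(6000)],
)

query = "SELECT * FROM products WHERE category = 'cat7'"

# Before the index: the plan typically reports a full scan, e.g. "SCAN products"
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

conn.execute("CREATE INDEX idx_category ON products (category)")

# After the index: the plan reports a search using idx_category instead
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before)
print(plan_after)
```

In MySQL you would do the same comparison with `EXPLAIN SELECT ...` before and after `CREATE INDEX`, watching the `type` and `rows` columns; if they don't improve, drop the index as suggested above.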

more or less, yes, but there are index reads in addition to row reads, and those rows might be spread across the dataset, making them more expensive in terms of physical-vs-logical reads

you’re probably right that there is no need to index any of the other six fields, because presumably you want to return the entire product row, thus ruling out a covering index

because the additional number of index reads added to row reads, compared with row reads of the entire table (table scan), are prohibitive
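The arithmetic behind this is worth spelling out. With a cardinality of 3, a filter on status matches roughly a third of the table, and each match costs an index read plus a random row read; a full scan reads everything, but sequentially. The cost constants below are illustrative assumptions, not real MySQL numbers:

```python
total_rows = 6000
status_values = 3          # cardinality of the status column

# a filter on status matches roughly a third of the table
matched = total_rows // status_values

# rough cost model (an assumption for illustration): a full scan reads
# every row sequentially; the index path does one index read plus one
# random row read per match, and random reads are several times more
# expensive than sequential ones
seq_cost, rand_cost = 1, 4
scan_cost = total_rows * seq_cost
index_cost = matched * (1 + rand_cost)

print(scan_cost, index_cost)  # here the index path is the more expensive one
```

The break-even point depends on selectivity: the category index matches ~100 of 6,000 rows (well worth it), while a status index matches ~2,000 scattered rows, which is why the table scan wins.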

Ah, that makes sense, thanks for that. I am learning to use EXPLAIN so am getting there.

Very interesting and clever stuff!

[quote]
my rule of thumb is to run the query without the index, make note of stats, add index and re-run for new stats, if no improvement then drop index
[/quote]

Remember to disable the query cache and to test the index’s performance through a load test. Don’t go by one-off measurements.

[quote]
Generally speaking, with MySQL are you okay up to 10,000 records even on a shared server?
[/quote]

10,000 records is tiny, and the performance you are seeing now is probably due to the fact that the entire database fits into memory, eliminating most of the drawbacks of sequential scans. Databases are generally clever enough to know how much of their data they can expect to find in cache and can choose not to use an index even if the query logic says the index could be beneficial.