I have a table with around 300,000 rows. I am trying to count the number of items listed in a primary category (cat1level1) or second category (cat2level1), for example:
SELECT COUNT(*) as num FROM items USE INDEX (cat1level1,cat2level1) WHERE (cat1level1 = '3' OR cat2level1 = '3')
1st and 2nd categories are indexed columns. Category “3” is very popular and has around 200,000 items within it (two thirds of all the items in the table). The count query itself takes around 2-4 seconds to complete when counting items from category “3”. Whereas in less-popular categories with only 1000 items, it only takes a more reasonable 0.03 seconds to count.
Does anyone know why this is happening and is it normal? Is there a solution to speed up this count in case this popular category grows to contain a million items?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE items index_merge cat1level1,cat2level1 cat1level1,cat2level1 2,2 NULL 2 Using sort_union(cat1level1,cat2level1); Using whe…
Also, the two columns are two separate indexes, in hopes that both would be used in an “OR” statement.
What would be the best way of handling this then for the fastest count? Would you recommend having 2 separate queries, and then add the two counts?
For instance:
SELECT COUNT() as num FROM items USE INDEX (cat1level1) WHERE (cat1level1 = ‘3’)
SELECT COUNT() as num FROM items USE INDEX (cat2level1) WHERE (cat2level1 = ‘3’)
if i’m doing it, i would never tell the optimizer which index to use, except in those circumstances where it decided to use an index that wasn’t anywhere near efficient
i am of the strong opinion that the mysql optimizer engineers are a lot smarter than i am
I did a little more reading and I think newer versions MySQL are capable of using more than 1 index on a table in a single query.
Something is off with the optimizer. When I remove the ‘USE INDEX’ on the query on a category that contains 200,000 items, it does cut the time down a bit. However, if I query a count of the items in a less-used category with only 1000 items, it takes 4 seconds… whereas if I included the ‘USE INDEX’ in the query, it would only take 0.01 second.
i’m not a DBA but isn’t there something you can do like “update stats” so that the optimizer knows the true counts, and realizes that 1000 items out of 200K requires an index