I am trying to optimize a query for selecting all the feeds related to a particular tag(tagid) then ordering them by Score. My tag table is around 100 million rows and my feed table is around 3 million rows.
This is what i have:
select feeds.FeedID, feeds.FeedName, feeds.Score
inner join tags on feeds.FeedID = tags.FeedID
where tags.TagDirID = '123456'
order by feeds.Score desc
However, when i do an explain on it i am getting this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tags ref FeedID,TagDirID TagDirID 5 const 228120 Using where; Using temporary; Using filesort
1 SIMPLE feeds eq_ref PRIMARY PRIMARY 4 FeedID 1
When I remove "order by Score" i lose the filesort and the query is very quick(<1 second), with it the query is around 45 seconds. I have an index on feeds.Score but it doesn't appear to be being used when I try to sort on it. Any ideas how to speed this up? I read some stuff about leading tables and indexes but i couldn't figure out how to fix it.
Here's what the table structure is like:
CREATE TABLE `feeds` (
`FeedID` int(8) NOT NULL AUTO_INCREMENT,
`FeedName` varchar(100) DEFAULT NULL,
`Score` decimal(3,1) DEFAULT NULL,
PRIMARY KEY (`FeedID`),
KEY `Score` (`Score`)
CREATE TABLE `tags` (
`TagID` int(10) NOT NULL AUTO_INCREMENT,
`TagDirID` int(10) DEFAULT NULL,
`FeedID` int(8) DEFAULT NULL,
PRIMARY KEY (`TagID`),
KEY `FeedID` (`FeedID`),
KEY `Status` (`Status`),
KEY `TagDirID` (`TagDirID`)
what exactly is TagDirID? what's it for?
It's the foreign key for the actual tag and it's properties in a different table which I call the tag_directory. The table tags is the table that contains all the tags associated with each feed. Each feed can have the same tag applied more than once and each tag can be assigned to multiple feeds.
I could(and do) do a join on the tag_directory and just search for all feeds tagged with "car" but I thought this way would simplify things a bit.
okay, i think i understand
by searching for 12345 (the tag id) rather than 'car' (the tag) you've eliminated one table from the query
but you will still have the problem of searching by one index and sorting by another
your options are...
1) leave it as is, and accept "italait" (it takes as long as it takes)
2) don't sort
What do you mean by having the problem of searching by one index and sorting by another?
Along a different track, are there any settings in my.cnf that I can change regarding memory allocations/buffers that will improve sort times?
your 45-second slow query time is because the search results (by tag_id) need to be saved in a temporary file and then this temporary file is sorted into a different sequence (score)
if you remove the ORDER BY, you see the impressive speed improvement caused by not having to create the temporary file
consider now a query which has something like WHERE datecreated >= '2010-01-01' along with ORDER BY datecreated
this is an example of using an index for searching, and also for sorting -- this one wouldn't require a temporary file