Optimizing Tag Select/Order By

brsmu80 · January 7, 2010, 10:15pm

I am trying to optimize a query for selecting all the feeds related to a particular tag(tagid) then ordering them by Score. My tag table is around 100 million rows and my feed table is around 3 million rows.

This is what i have:

select feeds.FeedID, feeds.FeedName, feeds.Score 
from feeds        
inner join tags on feeds.FeedID = tags.FeedID
where tags.TagDirID = '123456' 
order by feeds.Score desc 
limit 500

However, when i do an explain on it i am getting this:

id     select_type     table     type     possible_keys     key     key_len     ref     rows     Extra
1     SIMPLE     tags     ref     FeedID,TagDirID     TagDirID     5     const     228120     Using where; Using temporary; Using filesort
1     SIMPLE     feeds     eq_ref     PRIMARY     PRIMARY     4     FeedID     1

When I remove “order by Score” i lose the filesort and the query is very quick(<1 second), with it the query is around 45 seconds. I have an index on feeds.Score but it doesn’t appear to be being used when I try to sort on it. Any ideas how to speed this up? I read some stuff about leading tables and indexes but i couldn’t figure out how to fix it.

Here’s what the table structure is like:

CREATE TABLE `feeds` (
  `FeedID` int(8) NOT NULL AUTO_INCREMENT,
  `FeedName` varchar(100) DEFAULT NULL,
  `Score` decimal(3,1) DEFAULT NULL,

  PRIMARY KEY (`FeedID`),
  KEY `Score` (`Score`)
)

CREATE TABLE `tags` (
  `TagID` int(10) NOT NULL AUTO_INCREMENT,
  `TagDirID` int(10) DEFAULT NULL,
  `FeedID` int(8) DEFAULT NULL,

  PRIMARY KEY (`TagID`),
  KEY `FeedID` (`FeedID`),
  KEY `Status` (`Status`),
  KEY `TagDirID` (`TagDirID`)
)

r937 · January 8, 2010, 12:47pm

what exactly is TagDirID? what’s it for?

brsmu80 · January 8, 2010, 7:20pm

It’s the foreign key for the actual tag and it’s properties in a different table which I call the tag_directory. The table tags is the table that contains all the tags associated with each feed. Each feed can have the same tag applied more than once and each tag can be assigned to multiple feeds.

I could(and do) do a join on the tag_directory and just search for all feeds tagged with “car” but I thought this way would simplify things a bit.

r937 · January 8, 2010, 7:32pm

okay, i think i understand

by searching for 12345 (the tag id) rather than ‘car’ (the tag) you’ve eliminated one table from the query

but you will still have the problem of searching by one index and sorting by another

your options are…

leave it as is, and accept “italait” (it takes as long as it takes)
don’t sort

brsmu80 · January 8, 2010, 9:33pm

What do you mean by having the problem of searching by one index and sorting by another?

Along a different track, are there any settings in my.cnf that I can change regarding memory allocations/buffers that will improve sort times?

r937 · January 8, 2010, 10:02pm

your 45-second slow query time is because the search results (by tag_id) need to be saved in a temporary file and then this temporary file is sorted into a different sequence (score)

if you remove the ORDER BY, you see the impressive speed improvement caused by not having to create the temporary file

consider now a query which has something like WHERE datecreated >= ‘2010-01-01’ along with ORDER BY datecreated

this is an example of using an index for searching, and also for sorting – this one wouldn’t require a temporary file