MySQL fulltext relevancy

I usually copy all text from appropriate columns into a single column and fulltext index that. Do you think that’s the best way to do it (for relevancy) or should each column have its own index?

For example, if I had a blog table I would copy the title and a plain text version of the post to a single indexed column. Would you do this or index the title and the plain text post separately?

You would think you’d have a single inverted index for the whole blog stuff. The index would need to know the structure of the data, yes: attributes and fields. So that way, one index can be searched and if it finds matches it will know if the match was in a title, or in a body, or both.

-fields are the names of the “cells” in your rows that have plain text that you’re indexing. When indexed by an indexer, they should have metadata stating which “row” or document they belong to.
-attributes are either “cells” that have non-text things that you may still want to search (price ranges or dates for example) or you can also label your fields with attributes (column “names” like “title”, “body”). With attributes, you can use weighted queries: if some word appears in both title and body, but is more relevant if it’s in the title, you can set more “weight” to the title attribute so those results are closer to the top of the result set.

There is an ANALYZE command you can run where your DB can see how much of what kind of data it has (note if your db is ginormous, you only want to analyze a section!), and there should be built-in search that can use that to make a single inverted index (also probably a b+ tree/search tree which it uses in queries to find ranges of things or strings of nodes next to each other). It may be treated like another “column” but you shouldn’t need to literally make a new column for FTS.

Having a single index for a small amount of data would be the most efficient: your queries have just one place to go look. If you have lots and lots of data, that’s when you’d split up your indices (usually into ranges or chunks).

Which DB are you using? Have you considered a dedicated little program for FTS?

Thanks for your reply. I am using MySQL and the options for fulltext are very limited. The techniques you describe sound more advanced than what I’m used. Are they more for enterprise level?

I suppose, still using a blog as an example, you would want to place more weight on occurrences in the title. Is this possible with MySQL fulltext?

What do you class as small amounts of data? I typically work with tables under 10k records. I only really do sites for SMEs.

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.