What is the best performance search query/method for huge number of records

Hello,

I need to know what’s the best way to search tables with million records. I have a table called <people> and this table has the column: <status>

Status might be more than one word like:

“I am happy today and the weather is nice”.

Also, it might be in different languages.

Records in table are represented by UTF-8.

I need to search for a sub-word, word, or even some words out of the whole sentence like for example (according the status above):

Search 1: keyword = “ppy”

Search 2: keyword = “am weather”

Search 3: keyword = “nice”

Search 4: keyword = “day weath”

I would greatly appreciate if you hint me to the best method to apply a robust search. As far as I know using “LIKE” is not practical for huge records. I heard about the full text index but never used that.

Thank you very much!

Cheers,

you should try it

I will go for the full text index search. I would appreciate if you clear the right equivalent statement in full text search for the below:

$sql = ’ SELECT people.status, people.id FROM people WHERE
people.status LIKE “%‘.$keySearch.’%” AND people.id != “‘.$this->myId.’” ORDER BY people.time ASC LIMIT '.$limit;

Also, what shall I do to the table in mysql? I never worked on this so your detailed answer is much appreciated.

Thanks a lot.

first thing you have to do is create a FULLTEXT index on the column(s) you want to search

then compose the query using MATCH syntax, which you will find explained in the manual

Thank you! But shall I use the Boolean search? Could you please hint me about the correct statement?

One more thing, will it search for partial segments of words?

I have changed the column “status” to full text index and the engine is MyISAM. I have tried the below sql but it is not working :frowning: I am getting the error:
“test” AND people.id != “‘.$this->myId.’” ORDER BY people.time ASC LIMIT '.$limit;

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near “test”

$keySearch=“test”;
$limit=10;
$this->myId=1;

$sql = ’ SELECT people.status, people.id FROM people WHERE MATCH (people.status) AGAINST “‘.$keySearch.’” AND people.id != “‘.$this->myId.’” ORDER BY people.time ASC LIMIT '.$limit;

mysql_query($sql);

echo mysql_error();

Would you please advise?

Thanks.

please test it in mysql first, without php

I have something strange!

In my table people, I have the status for each person as below:

id status


1 hello
2 hello
3 hello
4 hello
5 hello

When I search using the SQL below:

SELECT people.status FROM people WHERE

MATCH (people.status) AGAINST (“hello*” IN BOOLEAN MODE)

HAVING people.id!= 1

ORDER BY status_time ASC LIMIT 10

I get ZERO rows! I should get 4 rows!

BUT:

If the table people like below:

id status


1 sello
2 sello
3 sello
4 sello
5 sello

And if I change the keyword for search to “sello”, then I get 4 rows! Why!! Below is the SQL:

SELECT people.status FROM people WHERE

MATCH (people.status) AGAINST (“sello*” IN BOOLEAN MODE)

HAVING people.id!= 1

ORDER BY status_time ASC LIMIT 10

Is there any restriction on the “hello” in mysql?

Thanks.

emphasis added

Thanks but what’s the difference between “hello” and “sello”? Why I am getting 4 hits for “sello” while getting ZERO hits for “hello”?

because there are so few rows in the table

bizarre, eh?

But “sello” is same case. Please look at the two tables. When searching for “sello”, I am getting 4 hits, while when searching for “hello” I am getting 0 hits! WHY!!

please re-read the excerpt from da manual that i posted above

I have read it. You can’t convince me. I am talking about two cases now:

First case:

search keyword: “hello”, hits: 0

People table

=========

id status
-- ------
1 hello
2 hello
3 hello
4 hello
5 hello

Second case:

search keyword: “sello”, hits: 4

People table

=========

id status
-- ------
1 sello
2 sello
3 sello
4 sello
5 sello

Did you recognize that “sello” got 4 hits while “hello” got 0 hits in their respective tables?

you have too few rows in your table

For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results

Sorry & thanks. So the full text search should be applied on the huge records, and before getting huge records, we can apply the “LIKE” then we convert.

Seems that it took “hello” as random.

Cheers,