Problem with query taking too long

I have an authentication system for my programs which logs every authentication attempt.
Now over the months, the logs table have gotten 4,064,450 records.
The problem is when im executing that query to get the latest 13 records it takes about 10 seconds to do it.

“SELECT * FROM logs_table ORDER BY time DESC LIMIT 0, 13”

time = the time of the attempt.

Is there a way to optimize the query ?

Thank you.

I’ll say that going through 4+ million records to only use 13 is very inefficient!

Do you need to retain them all? If not I would periodically DELETE them, maybe after saving a backup just in case you want them later.

Otherwise I would move the older ones to some sort of “archive” table that doesn’t need to be searched so often.

IMHO it isn’t the query that needs optimizing but rather the database architecture.

Thats the problem, we do search for the logs to get information about the user if needed for alot of cases.

Could you please give me more info about the “optimizing the database arch” ?

Thanks.

Something like this should speed up your query significantly (with tweaking, should run about 100x faster):


SELECT * 
FROM (
    SELECT * 
    FROM logs_table
    WHERE `time` > NOW() - INTERVAL 3 DAY
) AS `logs`
ORDER BY `time` DESC
LIMIT 0, 13

How many entries per day? If 3 days returns too many logs, reduce it… or adjust higher if you want more padding.

Also, if you don’t have an index on the “time” column, try:


ALTER TABLE `logs_table` ADD INDEX(`time`);

Cheers!

If you need to keep and use the data that does limit your options.

Do you really need every field returned - the “*” ? Even if you do, AFAIK explicitly listing the fields allows the database to use indexes and should run faster.

For the frequent query, maybe you could make a “recent” table that points to indexes in the “logs” table. Then save the heavy grunt work for the occassion.

That’s not the culprit here… he’s combining an ORDER BY and a LIMIT… that will necessarily compare the time field on every result… even when indexed, it’s doing a multi-million-record hash prior to the LIMIT.

To increase performance, he has to to a “pre-sort pseudo-limit” by date. This will allow him to cut out a significant % of his results on an indexed column prior to the ORDER BY/LIMIT.

It’ll significantly increase his performance (assuming the time column is indexed).

What server-side language are you using? Try dropping the LIMIT CLAUSE and instead in the server-side language run the loop that you use to get each row of the result set 13 times (or for however many rows are required) then break from the loop.

Yikes!!! That will retrieve 4 MILLION records!!! I don’t even KNOW how a server will handle that… even with an iterative model, there has to be some point in the DB connection that will process all 4 million records!

Ok, I decided to run a test on this, because I’m interested to see the results…

I’ve built the following table:

CREATE TABLE `logs_table` (
    `id` INT NOT NULL AUTO_INCREMENT,
    `user_id` INT NOT NULL,
    `action_id` INT NOT NULL,
    `time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    INDEX (`user_id`),
    INDEX (`action_id`),
    INDEX (`time`)
);

I’m currently populating it with data… 4,000,000 records spread out randomly over a 12 month period. I’ve got 1.5 million records inserted so far!!!

I’m going to test some different queries on it and see what kind of results I get!

PS - This is on a 512 MB Slicehost VPS - not the best machine in the world, so should give some good results!

Ok, I actually inserted 5,000,000 records… and here are my results:

SELECT * FROM logs_table ORDER BY `time` DESC LIMIT 0, 13

0.04 seconds!!!

With no index on time, it’s 9.8 seconds (makes me think time is not indexed!)

SELECT * 
FROM (
    SELECT * 
    FROM logs_table
    WHERE `time` > NOW() - INTERVAL 3 DAY
) AS `logs`
ORDER BY `time` DESC
LIMIT 0, 13

4 seconds!!! The inner query is returning 100,000+ records, and that’s killing the index… So I reduced that timeframe and got:


SELECT * FROM (
    SELECT * 
    FROM logs_table 
    WHERE `time` > NOW() - INTERVAL 10 MINUTE
) AS `logs` 
ORDER BY `time` DESC 
LIMIT 0, 13;

0.01 seconds

Still, it seems your problem is no index on the time column… go back to my first post and run the ALTER TABLE query.