Painfully slow MySQL query please help

Zaggs · December 29, 2011, 4:39pm

Hi Guys!

I have the following MySQL query which loads painfully slow. It performs a search on a database of over 30,000 jobs. It shouldn’t be this slow (takes around 30 secs to load).


SELECT jobs.job_title, jobs.job_url, jobs.job_salary, jobs.job_ote, jobs.job_perks, jobs.job_description, jobs.date_posted, companies.company_name, companies.company_url, companies.logo_thumbnail, count( applications.id ) AS applications, count( featured_jobs.id ) AS featured, l.display_name, l.postcode, l.display_county
FROM jobs
INNER JOIN companies ON companies.id = jobs.company_id
INNER JOIN jobs_industries ON jobs_industries.job_id = jobs.id
INNER JOIN jobs_job_types ON jobs_job_types.job_id = jobs.id
INNER JOIN (

SELECT job_id, if( count( job_id ) >1, 'Locations throughout the UK', display_name ) AS display_name, postcode, display_county, location_id
FROM jobs_locations
GROUP BY job_id
) AS l ON l.job_id = jobs.id
LEFT OUTER JOIN applications ON applications.job_id = jobs.id
LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
WHERE jobs.status = 'on'
GROUP BY jobs.id
ORDER BY featured DESC , date_posted DESC

I did an explain on the query and it produced this:


id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	PRIMARY 	applications 	system 	NULL	NULL	NULL	NULL	0 	const row not found
1 	PRIMARY 	<derived2> 	ALL 	NULL	NULL	NULL	NULL	31405 	Using temporary; Using filesort
1 	PRIMARY 	jobs 	eq_ref 	PRIMARY,company_id 	PRIMARY 	4 	l.job_id 	1 	Using where
1 	PRIMARY 	companies 	eq_ref 	PRIMARY 	PRIMARY 	4 	db_name.jobs.company_id 	1 	Using where
1 	PRIMARY 	featured_jobs 	ALL 	NULL	NULL	NULL	NULL	2 	
1 	PRIMARY 	jobs_industries 	ref 	job_id 	job_id 	4 	db_name.jobs.id 	12 	Using where; Using index
1 	PRIMARY 	jobs_job_types 	ref 	job_id 	job_id 	4 	db_name.jobs_industries.job_id 	12 	Using where; Using index
2 	DERIVED 	jobs_locations 	ALL 	NULL	NULL	NULL	NULL	31405 	Using temporary; Using filesort

Any input is appreciated.

Zaggs · January 3, 2012, 11:36am

Any ideas on this guys, still having the same problem!

And happy new year to all!

r937 · January 5, 2012, 10:44am

it’s normal if the query has to read an entire table – at that point, an index won’t help and will be ignored

in your joins, the join columns need to be properly indexed as well

i remember your other thread, where you had that massive EXPLAIN, and i must confess, it’s pretty hard to pinpoint where the issue might be

have you considered hiring a performance expert? (note: i don’t qualify, so this isn’t a hint)

Zaggs · January 5, 2012, 11:16am

Hi - thanks for your reply.

I think it’s the INNER JOIN SELECT (with concat) part of the query that’s taking the time because when I remove it, it loads much quicker (in under a second). Here’s the query:


SELECT jobs.id, jobs.job_title, jobs.job_description, jobs.job_salary, jobs.job_url, companies.company_name, companies.company_url, l.display_name
FROM jobs
INNER JOIN companies ON companies.id = jobs.company_id
INNER JOIN (

SELECT job_id, if( count( job_id ) >1, 'Locations throughout the UK', concat( display_town, ', ', display_county ) ) AS display_name
FROM jobs_locations
GROUP BY job_id
) AS l ON l.job_id = jobs.id
WHERE companies.status = 'Active'
AND jobs.status = 'on'
ORDER BY jobs.date_posted DESC
LIMIT 3

I have considered hiring a MySQL performance expert but they are more than likely going to charge me a few hundred bucks just to figure out this one issue. I would rather get my head round it first and then if we are still having issues, i’ll try to hire someone.

Thanks for your help.

r937 · January 5, 2012, 11:33am

run the subquery by itself and see how long it takes, maybe that’s your problem

Zaggs · January 5, 2012, 11:57am

Hmmm that’s very strange, loads very fast:

Showing rows 0 - 29 ( 31,405 total, Query took 0.0008 sec)

Does that mean it’s the inner joins causing the issue?

r937 · January 5, 2012, 12:13pm

i think all it means is that the subquery itself isn’t your issue… i think

TheRedDevil · January 5, 2012, 4:36pm

I am by far an SQL expert, but looking at your query and hearing how few rows there is in those tables and its already kneeling, I cant stop thinking that your problem is actually in the database model.

Have you considered reviewing the database model, and perhaps de-normalize parts of it if it make sense performance wise.
I.e. when going through this, think how will I get this data, how will I use this data etc.

rpkamp · January 5, 2012, 6:35pm

Yes, this sounds like one of those situations where denormalization would really help. So instead of using the subquery there, add a field location_name to the jobs_table, and set the value of that field to the relevant value when you save the record in your application.

Before you do that though, could you please post the EXPLAIN for that query?

r937 · January 5, 2012, 6:43pm

that may not be as easy as you think

the thing is, a job can have multiple locations, so if you did create this new column, you’d have to set it to the concatenated location name for the first location, and then update it if you add another location for that job

as for the EXPLAIN, it’s in his previous thread… perhaps you could please merge the threads?

SpacePhoenix · January 5, 2012, 9:00pm

Two threads merged.

Try the query again but without the LIMIT clause

TheRedDevil · January 5, 2012, 9:58pm

But is not that why we love this job? Due to the challenges it bring

Zaggs:
I’ve reviewed the thread again after Rémon was kind enough to merge your topics.

I’ll go out on a limb and say that you are not used to work with high load or high dependency websites.

After reviewing your first query, I’m 100% sure your problem can be fixed with de-normilization and with proper use of cache techniques.

Query in main post:
-You run two LEFT OUTER queries which count how many applications the job has received and one who count featured (not sure what this is for). Both these counts should also have been a separate column on the jobs table. Which is increased/decreased when a record is added/removed in said tables. This can be handled either by server side or triggers in the database.

-Your sub query is strange, you check to see if the job is only available on one location or multiple, but also pull the postcode etc. While the query is GROUP BY the job id, meaning that if there is more than one location you have no idea which postcode etc. your pulling. Which makes it look like if its in more than one location, you only show “Locations throughout the UK” for the user. With other words, this can ALSO be added as a column to the jobs table to keep track of, similar to what is mentioned above.

-In addition the pulling of the locations the job is available in should be moved outside the initial query, even if you want to pull all locations. Lets say you want to pull all locations per job, and you pull multiple jobs at a time. What you would do is: First loop over the jobs and in addition store the job ids in an array, then after pull all locations used across the jobs. With other words everything is done with two queries.

-You have INNER JOINS which you don’t use, example: jobs_industries and jobs_job_types Make certain you remove anything from the query if you don’t need it.

-You are using ENOM’s or at least string keys like job status “on”. Stop doing that and start using integers, that will make it slightly easier to convert to other databases in the future if needed.

-I assume you do not store the thumbnail image as a blob, referring to the companies table “logo_thumbnail” column. If you do stop with that, its very ineffective. In the event you store only the image file name, stop doing that as well! Instead use the company id for the file name, i.e. “ID_thumb.jpg” etc.

Second Query:
-You have a LIMIT on three records, just first running this query first:
SELECT job_id FROM jobs INNER JOIN companies as comp ON (jobs.company_id=comp.id AND comp.status=‘Active’) WHERE jobs.status=‘on’ ORDER BY jobs.date_posted DESC LIMIT 3
And then after do WHERE jobs.job_id IN (1, 2, 3) I.e. using the results from the first query, or even using it as a sub query would most probably make that query fast again.

-Though the other points I mentioned above I would have done here as well.

What you need to understand is that there is two kinds of people that work with Databases, the ones who want everything normalized, and the ones who understand that while normalized is the ultimate goal, it just does not work to have everything normalized for a high load or high dependability system.

With other words, a normalized database come at the cost of speed and required resources.

Now, with that said keep your database model as normalized as possible, as you can apply a lot of caching on the expensive parts as well. Even separating a larger query into two or more queries might change the application from taking 3 seconds, to take 0.00000005 seconds again even with more queries.

rpkamp · January 5, 2012, 10:18pm

Excellent post from TheRedDevil, I applaud you! All very valid points and indeed all of them will improve the speed of that query.

Having just looked at the EXPLAIN from your query, the problem is with “Using temporary; Using filesort”, which means MySQL needs to create a view from what you’re selecting in the subquery, write it to disk, and sort it there; which is sloooooooow. Every time you see “Using filesort” in an EXPLAIN that’s what’s killing it.

From all solutions given above I’d personally go with

because that way you can leave the database tables structure intact (and still normalized), and still get a huge performance gain. Win-win I’d say

rpkamp · January 5, 2012, 10:21pm

You could do that quite easily in your application, or via database triggers. Sure, it takes a little bit of time to create the new value and update the field, but it probably doesn’t happen very often and it most likely doesn’t touch any indexes (so no b-tree restructure – which are sloooow – will be needed), and since it’s likely to speed up the SELECT query a lot, it should be worth doing.

Although now that TheRedDevil just stepped in, his idea is better than mine