zaggs — 2012-08-16T03:32:09-04:00 — #1
I have a MySQL query which takes quite a while to load. This is very frustrating for users when they do a search as it can take up to 10 seconds for the page to load. When I execute the function in phpMyAdmin it says the query took 1.34 seconds but in fact takes about 10 seconds for that page to load too.
I also run an EXPLAIN on the query and took a screenshot of the result. Any help to speed this query up would be greatly appreciated.
SELECT jobs.job_title, jobs.job_url, jobs.job_salary, jobs.job_ote, jobs.job_perks, jobs.job_description, jobs.date_posted, companies.company_name, companies.company_url, companies.logo_thumbnail, count( applications.id ) AS applications, count( featured_jobs.id ) AS featured, l.display_name, l.postcode, l.display_county
INNER JOIN companies ON companies.id = jobs.company_id
INNER JOIN jobs_industries ON jobs_industries.job_id = jobs.id
INNER JOIN jobs_job_types ON jobs_job_types.job_id = jobs.id
INNER JOIN (
SELECT job_id, if( count( job_id ) >1, 'Locations throughout the UK', display_name ) AS display_name, postcode, display_county, location_id
GROUP BY job_id
) AS l ON l.job_id = jobs.id
LEFT OUTER JOIN applications ON applications.job_id = jobs.id
LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
WHERE jobs.status = 'on'
GROUP BY jobs.id
ORDER BY featured DESC , date_posted DESC
antnee — 2012-08-16T05:16:12-04:00 — #2
Generically, the places to focus on an Explain result are where:
- There are no possible keys to use
- There are a large number of rows returned
- Temporary tables and filesort are used
- Derived results from subqueries
Your indexes are stored in an efficient fashion, usually in order, whereas your data could be in any random order. Imagine a large book; you would need to read the whole thing to know that you've definitely found the information that you're looking for, whereas if you use the index you will be able to go straight to the term you need and won't need to look any further than the pages that it tells you to. So, where there's no possible key available, you can consider adding one. Personally I consider it good practice to have an index on any field that is used in a join, though in reality it may depend on the size of the table etc for whether there's any real-world benefit.
When you get 100 rows returned, and you then go further down and see another 100 rows returned (especially as a result of outer joins) you will multiply the number of original rows by the number of new rows. So 100x100 = 10,000 rows. When you've got 20,000 x 6 you will end up with 120,000 rows of data for the server to work through. Looking at your result, and if I'm not mistaken, you are returning a total of 2,387,775,606 rows. This is a HUGE amount of data that the server won't be able to keep in memory (depending on your configuration) so it will write it to a temporary table on disc and perform a filesort. Disc based operations are MUCH more costly than memory based, so it becomes much slower immediately. Large numbers of results, grouping and ordering generally always result in "Using temporary; Using filesort".
Lastly, and probably the primary reason for your particular problem (apologies for going the long way around) and the reason that you have so many rows returned as above, is that subqueries are just painfully slow in MySQL (usually). I don't believe that the jobs_locations join is doing you any favours at all, because as above, every time it appears to be returning 20,000 rows again. Is there any reason why you've used a subquery? Could you optimise it into a simple join? I would be focussing my attention there, personally
zaggs — 2012-08-16T05:24:45-04:00 — #3
Thanks for your detailed response. It all makes complete sense. I did try and write a simple join for this query but it didn't seem to work (can't remember why now though) that's why I changed it to use a sub query.
I have also been told that creating a temporary table, selecting the results and then deleting the temporary table may be a better choice, but I don't see why this would speed the query up?
I am totally lost at the moment
antnee — 2012-08-16T05:36:35-04:00 — #4
A colleague was once given the same advice. I'd heard it plenty of times so suggested we both tackle his slow query; he did it through a manual temporary table creation etc, and I did it the simple way around by just optimising the query as best as I could. At the end of the day my solution was faster, because MySQL said to itself "I've got a lot of info here - too much to process in RAM - so I'm going to create a temporary table (Using temporary), put the data in there, sort/group (Using filesort) and then return the results".
I'll be honest with you; with the database actually in front of me I was always excellent at optimising queries, but just looking at the query itself and an EXPLAIN result is not so easy for me, so I can't give you an actual solution. I WOULD however 100% suggest that you focus on that join. Another solution (if you're using PHP) would be to run that subquery first, store the results in a PHP array, and then look it up as the results come out of the main query. When we switched from InnoDB to NDB Cluster we had to break up loads of queries that worked fine in to multiple queries due to the way that NDB works. I spent years thinking I needed to get EVERYTHING out with one query, when it was often counter-intuitive. Just don't nest queries inside PHP loops (ie for each row, run another query).
Alternatively, get that subquery to only return one result (implement a WHERE clause in it). It'll still be slower than it could be, but it'll be a lot faster than it is
r937 — 2012-08-16T08:11:20-04:00 — #5
no, i think you read that wrong if you simply multiply 19949 * 6 * 19949
quick disclaimer: i'm no EXPLAIN expert either
however, you're on the right track in multiplying row counts together
i call this problem cross join effects and it results when you combine several one-to-many relationships in the joins
let's analyze the joins --
ON companies.id = jobs.company_id
if you look at the join columns, you can assume that each job has exactly one company
that's because jobs.company_id is a foreign key, referencing the companies.id primary key
it's a one-to-many relationship, from companies to jobs, but seen from the point of view of the jobs table, each job has one company
ON jobs_industries.job_id = jobs.id
this relationship is reversed -- it's one-to-many from jobs to industries
so each job can have multiple industries
ON jobs_job_types.job_id = jobs.id
another one-to-many relationship -- each job can have multiple job types
so if a given job has 4 industries and 3 job types, there will be 12 intermediate rows produced by these joins
in other words, cross join effects
now look at the subquery for job locations
this is another one-to-many relationship (each job can have multiple locations), however, the subquery has GROUP BY job_id so in effect this subquery will only ever join one additional row to the jobs row
i actually helped write this subquery last december, and i notice my advice to use CASE and MIN(display_name) was not adopted... sigh
carrying on with the one-to-many analysis, you can see that applications and featured-jobs are also contributing to the cross join effects
what to do?
any relationship that has multiple rows per job should be re-written into a subquery with a GROUP BY on the job