Search is painfully slow

Hi Guys!

I’m building a search on one of my sites and I have the following MySQL query. It’s currently taking 9.5 sec in phpMyadmin to execute the code, but actually takes even longer when run on the actual site. Do I have the right database structure of tables etc.


SELECT users . * , ui.industry, jt.job_type
	FROM users
LEFT OUTER JOIN (
	SELECT user_id, job_type
		FROM users_job_types
		LEFT JOIN job_types ON job_types.id = users_job_types.job_type_id
	GROUP BY user_id
) AS jt ON jt.user_id = users.id
LEFT OUTER JOIN (
	SELECT user_id, industry
		FROM users_industries
        LEFT JOIN industries ON industries.id = users_industries.industry_id
	GROUP BY user_id
) AS ui ON ui.user_id = users.id
WHERE users.status = '1'
AND users.cv_hide = '0'
ORDER BY cv_date DESC

Below are the create statements:


CREATE TABLE IF NOT EXISTS `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(100) CHARACTER SET utf8 NOT NULL,
  `cv_date` datetime DEFAULT NULL,
  `cv_hide` enum('0','1') CHARACTER SET utf8 NOT NULL DEFAULT '0',
  `status` enum('0','1') CHARACTER SET utf8 DEFAULT '1',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1292692 ;

CREATE TABLE IF NOT EXISTS `job_types` (
  `id` smallint(6) NOT NULL AUTO_INCREMENT,
  `job_type` varchar(100) NOT NULL,
  `job_type_description` text,
  `job_type_url` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1006 ;

CREATE TABLE IF NOT EXISTS `users_job_types` (
  `user_id` int(11) NOT NULL,
  `job_type_id` int(11) NOT NULL,
  KEY `user_id` (`user_id`),
  KEY `job_type_id` (`job_type_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `users_industries` (
  `user_id` int(11) NOT NULL,
  `industry_id` int(11) NOT NULL,
  KEY `user_id` (`user_id`),
  KEY `industry_id` (`industry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `industries` (
  `id` smallint(6) NOT NULL AUTO_INCREMENT,
  `industry` varchar(100) NOT NULL,
  `industry_description` text,
  `industry_url` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1052 ;

Please let me know if you know any way this can be speeded up.

Thanks.

your junction tables should be redesigned to enable the use of covering indexes – the primary key plus another index with the same two columns in the reverse order

CREATE TABLE users_job_types 
( user_id INTEGER NOT NULL
, job_type_id INTEGER NOT NULL
, PRIMARY KEY ( user_id , job_type_id )
, INDEX job_type_user ( job_type_id , user_id )
);

CREATE TABLE users_industries 
( user_id INTEGER NOT NULL
, industry_id INTEGER NOT NULL
, PRIMARY KEY ( user_id , industry_id )
, INDEX industry_user ( industry_id , user_id )
);

this way, the actual rows of these two tables are never accessed, just the indexes

as for your query, how come all you want is a single arbitrary job type and a single arbitrary industry for each user?

Hi r937,

Firstly, thank you for your post. Your comments make absolute complete sense.

I’ve added the new indexes / primary keys to the table. Now with the query itself, the aim is to return users who match a certain job type or industry. I’ve stripped that part of the sql out for now, so this in essence is the query that will be run when a user does a blanket search (i.e. they have no search criteria). Ideally, this query should return all users. I’m joining the tables as I need to display the industry and job type on the search results page. Does this make sense?

Ps, the results should return all industries / job types for each user (not just one)


[COLOR=#000000][COLOR=#007700] [/COLOR][COLOR=#0000BB]SELECT user_id[/COLOR][COLOR=#007700], [/COLOR][COLOR=#0000BB]job_type
        FROM users_job_types
        LEFT JOIN job_types ON job_types[/COLOR][COLOR=#007700].[/COLOR][COLOR=#0000BB]id [/COLOR][COLOR=#007700]= [/COLOR][COLOR=#0000BB]users_job_types[/COLOR][COLOR=#007700].[/COLOR][COLOR=#0000BB]job_type_id
    GROUP BY user_id

This part of your query alone will return a single job type per user_id and it is arbitrary which one it picks. If there are five rows in the table for you and your id=3 and two rows in the table for me and my id=4 it will return any one of your rows and any one of my rows.
[/COLOR][/COLOR]

Yes, this is really not ideal, I guess i’ll have to do some type of sub query?

you’re already doing a subquery

in fact, you’re doing two of them, and you will get cross join efects if you return more than one row per user with them

didn’t you already use GROUP_CONCAT in this query, or another very similar?

because that’s what’s needed here

Yes, I used GROUP_CONCAT on a similar query - not this one.

I guess I could do this then, but it still works slow;

SELECT users . * , ui.industry, jt.job_type
FROM users
INNER JOIN (

SELECT user_id, group_concat( job_type
SEPARATOR ' ' ) AS job_type
FROM users_job_types
LEFT JOIN job_types ON job_types.id = users_job_types.job_type_id
GROUP BY user_id
) AS jt ON jt.user_id = users.id
INNER JOIN (

SELECT user_id, group_concat( industry
SEPARATOR ' ' ) AS industry
FROM users_industries
LEFT JOIN industries ON industries.id = users_industries.industry_id
GROUP BY user_id
) AS ui ON ui.user_id = users.id
WHERE users.status = '1'
AND users.cv_hide = '0'
ORDER BY cv_date DESC
  1. How do I speed it up using my new indexes that I added?
  2. How can I return all users even if they don’t have any industries or job types?
  1. do an EXPLAIN and let’s analyze it

  2. use LEFT OUTER JOINs instead of INNER JOINs

edit: and use INNER JOINs inside your subqueries instead of LEFT OUTER JOINs

Here is an explain of the new query

i’m not a performance expert but perhaps the ORDER BY in the outer query is contributing to the slowness

this is just a guess, but the query might speed up really nicely if you pull in all the job type and industries separately ahead of time, and do the GROUP_CONCAT on the ids only (i.e. use just the junction tables in the subqueries), expanding the ids to names in your application language – but this would be a last resort

Hi,

I tried removing the “order by” in the outer query but it didn’t speed things up at all.

When you say pull in the job type and industries seperately, what exactly do you mean? Create separate queries?

yup

then your main query would look like this –

SELECT users.* 
     , jt.job_type_ids 
     , ui.industry_ids
  FROM users 
LEFT OUTER 
  JOIN ( SELECT user_id
              , GROUP_CONCAT(job_type_id) AS job_type_ids 
           FROM users_job_types 
         GROUP 
             BY user_id ) AS jt 
    ON jt.user_id = users.id 
LEFT OUTER 
  JOIN ( SELECT user_id
              , GROUP_CONCAT(industry_id) AS industry_ids 
           FROM users_industries 
         GROUP 
             BY user_id ) AS ui 
    ON ui.user_id = users.id 
 WHERE users.status = '1' 
   AND users.cv_hide = '0' 
ORDER 
    BY cv_date DESC  

hopefully that runs faster, and you would then swap out the ids using the pre-fetched job types and industries using your application language

Hi Paul,

That query is taking a while too…Showing rows 0 - 29 (7,404 total, Query took 17.0853 sec)

who’s this paul guy?

can you please reconfirm the indexes on the junction tables by doing a SHOW CREATE TABLE for them

Ooops, sorry it’s been a long day. Think I was typing an email to a guy named Paul prior to this being posted.


CREATE TABLE IF NOT EXISTS `users_industries` (
  `user_id` int(11) NOT NULL,
  `industry_id` int(11) NOT NULL,
  PRIMARY KEY (`user_id`,`industry_id`),
  KEY `industry_user` (`industry_id`,`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `users_job_types` (
  `user_id` int(11) NOT NULL,
  `job_type_id` int(11) NOT NULL,
  PRIMARY KEY (`user_id`,`job_type_id`),
  KEY `job_type_user` (`job_type_id`,`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

let’s step back from this for a moment, maybe there’s another approach

under what circumstances do you want to produce a list of all users?

why does that list need to show each user’s job types and industries?

how often will this query need to run? and have you thought about caching?

We use PHP to build an SQL query depending on what the users search criteria is. For example, if they select a job type, then the query will narrow by job type. The query I posted just happens to be a blanket search (i.e. the user didn’t have any search criteria and therefore all results should be returned).

Because we need to see the Job Type and Industry next to the user on the search results page.

It will run every time someone does a search, so probably quite a lot.

Does this help? Thanks.

Still looking for help on this issue. Can anyone help?

Thanks.

Perhaps try eliminating the subqueries.


SELECT 
       u.id
      ,GROUP_CONCAT(DISTINCT jt.job_type SEPARATOR ' ')
      ,GROUP_CONCAT(DISTINCT i.industry SEPARATOR ' ')
  FROM
      users u
  LEFT OUTER
  JOIN  
      users_job_types ujt
    ON
      u.id = ujt.user_id
  LEFT OUTER
  JOIN
      job_types jt
    ON
      ujt.job_type_id = jt.id
  LEFT OUTER
  JOIN
      users_industries ui
    ON
      u.id = ui.user_id
  LEFT OUTER
  JOIN
      industries i
    ON
      ui.industry_id = i.id
 WHERE
      u.`status` = 1
   AND
      u.cv_hide = 0
 GROUP
    BY
      u.id
 ORDER
    BY
      u.cv_date DESC

The explain for that should be using all the indexes.