MySQL join by field with comma seperated list

Zaggs · August 25, 2011, 11:28am

Hi Guys,

Is it possible to do a join on a table where the contents of the join field are comma separated list. So for example I want to do a join on the table below:


SELECT jobs.job_title,
companies.company_name,
companies.company_url,
companies.logo_thumbnail,
jobs.job_url,
jobs.job_salary,
jobs.job_ote,
jobs.job_perks,
jobs.job_description,
jobs.location,
jobs.date_posted,
[B]industries.industry, [/B]
count(featured_jobs.id) AS featured
FROM jobs
INNER JOIN companies ON companies.id = jobs.company_id
[B]INNER JOIN industries ON industries.id = jobs.industry[/B]
LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
WHERE jobs.status = 'on'
GROUP BY jobs.id
ORDER BY featured DESC , date_posted DESC
LIMIT 0 , 30

I want to get the industry name and the industry is stored by id (but there can be more that one and the IDs are separated by comma)… For example the contents of the jobs.industry might be “1011,1016,1090”

Can I do a join in this way?

r937 · August 25, 2011, 12:32pm

two problems

yes, you can do the join on a comma-delimited column, but it will always require a table scan (and therefore, be slow as cold glue)

secondly, if there is more than one industry per job, you won’t be able to just “get the industry name” because there will be more than one industry, and how did you want to display them? you’re doing a GROUP BY on the job, so you’re only going to see one row per job in the results

Zaggs · August 25, 2011, 12:55pm

Hi r937,

I think it would probably be better for me in the longer term to recreate the database structure so that the industries are stored in a separate table. That would be much easier/faster to join then.

r937 · August 25, 2011, 1:03pm

totally agree

but you will still ahve to decide which of the industries you want to show for each job

Zaggs · August 26, 2011, 3:21pm

Hmmm yes, how would I get a list of industries associated with each job? What is the ideal scenario here?

r937 · August 26, 2011, 4:00pm

you might have to do a GROUP_CONCAT on the industry names, just to keep the results to one row per job

Zaggs · August 26, 2011, 4:28pm

At the moment the industries are stored in the jobs table in the following format:

1024, 1025, 1089

Would it be better to store them in their own table (i.e a row for each) and then do a group_concat?

r937 · August 26, 2011, 4:35pm

absolutely, yes

as i said before, performing a join on a comma-delimited list will be slow as cold glue

Zaggs · August 26, 2011, 5:10pm

Can I just ask what you would see as the best possible structure for the new table? Would I need an ID field (primary key) or would it just be best to create an index on the job_id field?

r937 · August 26, 2011, 6:33pm

CREATE TABLE job_industries
( job_id INTEGER NOT NULL
, industry_id INTEGER NOT NULL
, PRIMARY KEY ( job_id, industry_id )
, INDEX industry_jobs ( industry_id, job_id )
, FOREIGN KEY ( job_id ) REFERENCES jobs ( id )
, FOREIGN KEY ( industry_id ) REFERENCES industries ( id )
);

Zaggs · August 26, 2011, 6:44pm

r937:

CREATE TABLE job_industries 
( job_id INTEGER NOT NULL
, industry_id INTEGER NOT NULL 
, PRIMARY KEY ( job_id, industry_id )
, INDEX industry_jobs ( industry_id, job_id )
, FOREIGN KEY ( job_id ) REFERENCES jobs ( id )
, FOREIGN KEY ( industry_id ) REFERENCES industries ( id )
);

Perfect, thank you! And… on that basis how would I run the original query to include my group_concat?

r937 · August 26, 2011, 6:51pm

you’d need to join to a subquery that did the aggregation

give it a try, come on…

Zaggs · August 26, 2011, 6:53pm

Im trying this, but it returns the industry as a BLOB field and I can’t see the data within it. Any idea why?


SELECT jobs.job_title, companies.company_name, companies.company_url, companies.logo_thumbnail, jobs.job_url, jobs.job_salary, jobs.job_ote, jobs.job_perks, jobs.job_description, jobs.date_posted, 
GROUP_CONCAT( jobs_sectors.sector, ', ' ) AS industry, 
count( featured_jobs.id ) AS featured
FROM jobs
INNER JOIN companies ON companies.id = jobs.company_id
INNER JOIN jobs_sectors ON jobs_sectors.job_id = jobs.id
LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
WHERE jobs.status = 'on'
GROUP BY jobs.id
ORDER BY featured DESC , date_posted DESC

r937 · August 26, 2011, 6:55pm

no, you need to replace the join to jobs_sectors with a join to a subquery, which does the GROUP_CONCAT and has a GROUP BY jobs_sectors.job_id

you’ve never seen a join to a subquery before?

Zaggs · August 26, 2011, 7:01pm

No do you have a link where I can see how to join a subquery?

Zaggs · August 26, 2011, 7:09pm

I tried this, but no luck…Am I getting close


SELECT jobs.job_title, companies.company_name, companies.company_url, companies.logo_thumbnail, jobs.job_url, jobs.job_salary, jobs.job_ote, jobs.job_perks, jobs.job_description, jobs.date_posted, count( featured_jobs.id ) AS featured
FROM jobs
INNER JOIN companies ON companies.id = jobs.company_id
INNER JOIN (
SELECT GROUP_CONCAT( jobs_sectors.sector, ', ' )
FROM jobs_sectors
GROUP BY jobs_sectors.job_id
) AS industry
LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
WHERE jobs.status = 'on'
GROUP BY jobs.id
ORDER BY featured DESC , date_posted DESC

r937 · August 26, 2011, 11:00pm

SELECT jobs.job_title
     , companies.company_name
     , companies.company_url
     , companies.logo_thumbnail
     , jobs.job_url
     , jobs.job_salary
     , jobs.job_ote
     , jobs.job_perks
     , jobs.job_description
     , jobs.date_posted
     , [COLOR="DarkGreen"]i.sectors[/COLOR]
     , [COLOR="Blue"]f.featured[/COLOR]
  FROM jobs
INNER
  JOIN companies
    ON companies.id = jobs.company_id
INNER
  JOIN [COLOR="DarkGreen"]( SELECT job_id
              , GROUP_CONCAT(sector) AS sectors
           FROM jobs_sectors
         GROUP
             BY job_id ) AS i
    ON i.job_id = jobs.id[/COLOR]
LEFT OUTER
  JOIN [COLOR="blue"]( SELECT job_id
              , COUNT(*) AS featured
           FROM featured_jobs
         GROUP
             BY job_id ) AS f
    ON f.job_id = jobs.id[/COLOR]
 WHERE jobs.status = 'on'
ORDER
    BY featured DESC
     , date_posted DESC

Zaggs · August 27, 2011, 7:00am

Thanks, although the derived sectors field is still being shown as a BLOB field - is this right?

r937 · August 27, 2011, 10:40am

use LEFT(GROUP_CONCAT(…),255)

Zaggs · August 27, 2011, 1:10pm

Thanks r937!

That brings me to my final question…

If I wanted to perform a search on the sectors and find all rows (jobs) which are listed in more than one sector - how would I do that?

For example the jobs_sectors table might contain the following:

job_id sector
1 22
1 28
2 22
3 25

Now if I did a search for sector 22, it should return both job_id 1 and job_id 2.

Thanks in advance, you’ve been a great help!