MySQL join by field with comma seperated list

Hi Guys,

Is it possible to do a join on a table where the contents of the join field are comma separated list. So for example I want to do a join on the table below:


SELECT jobs.job_title,
companies.company_name,
companies.company_url,
companies.logo_thumbnail,
jobs.job_url,
jobs.job_salary,
jobs.job_ote,
jobs.job_perks,
jobs.job_description,
jobs.location,
jobs.date_posted,
[B]industries.industry, [/B]
count(featured_jobs.id) AS featured
FROM jobs
INNER JOIN companies ON companies.id = jobs.company_id
[B]INNER JOIN industries ON industries.id = jobs.industry[/B]
LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
WHERE jobs.status = 'on'
GROUP BY jobs.id
ORDER BY featured DESC , date_posted DESC
LIMIT 0 , 30

I want to get the industry name and the industry is stored by id (but there can be more that one and the IDs are separated by comma)… For example the contents of the jobs.industry might be “1011,1016,1090”

Can I do a join in this way?

two problems

yes, you can do the join on a comma-delimited column, but it will always require a table scan (and therefore, be slow as cold glue)

secondly, if there is more than one industry per job, you won’t be able to just “get the industry name” because there will be more than one industry, and how did you want to display them? you’re doing a GROUP BY on the job, so you’re only going to see one row per job in the results

Hi r937,

I think it would probably be better for me in the longer term to recreate the database structure so that the industries are stored in a separate table. That would be much easier/faster to join then.

totally agree

but you will still ahve to decide which of the industries you want to show for each job

Hmmm yes, how would I get a list of industries associated with each job? What is the ideal scenario here?

you might have to do a GROUP_CONCAT on the industry names, just to keep the results to one row per job

At the moment the industries are stored in the jobs table in the following format:

1024, 1025, 1089

Would it be better to store them in their own table (i.e a row for each) and then do a group_concat?

absolutely, yes

as i said before, performing a join on a comma-delimited list will be slow as cold glue

Can I just ask what you would see as the best possible structure for the new table? Would I need an ID field (primary key) or would it just be best to create an index on the job_id field?

CREATE TABLE job_industries
( job_id INTEGER NOT NULL
, industry_id INTEGER NOT NULL
, PRIMARY KEY ( job_id, industry_id )
, INDEX industry_jobs ( industry_id, job_id )
, FOREIGN KEY ( job_id ) REFERENCES jobs ( id )
, FOREIGN KEY ( industry_id ) REFERENCES industries ( id )
);

Perfect, thank you! And… on that basis how would I run the original query to include my group_concat?

you’d need to join to a subquery that did the aggregation

give it a try, come on…

Im trying this, but it returns the industry as a BLOB field and I can’t see the data within it. Any idea why?


SELECT jobs.job_title, companies.company_name, companies.company_url, companies.logo_thumbnail, jobs.job_url, jobs.job_salary, jobs.job_ote, jobs.job_perks, jobs.job_description, jobs.date_posted, 
GROUP_CONCAT( jobs_sectors.sector, ', ' ) AS industry, 
count( featured_jobs.id ) AS featured
FROM jobs
INNER JOIN companies ON companies.id = jobs.company_id
INNER JOIN jobs_sectors ON jobs_sectors.job_id = jobs.id
LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
WHERE jobs.status = 'on'
GROUP BY jobs.id
ORDER BY featured DESC , date_posted DESC

no, you need to replace the join to jobs_sectors with a join to a subquery, which does the GROUP_CONCAT and has a GROUP BY jobs_sectors.job_id

you’ve never seen a join to a subquery before?

No do you have a link where I can see how to join a subquery?

I tried this, but no luck…Am I getting close :wink:


SELECT jobs.job_title, companies.company_name, companies.company_url, companies.logo_thumbnail, jobs.job_url, jobs.job_salary, jobs.job_ote, jobs.job_perks, jobs.job_description, jobs.date_posted, count( featured_jobs.id ) AS featured
FROM jobs
INNER JOIN companies ON companies.id = jobs.company_id
INNER JOIN (
SELECT GROUP_CONCAT( jobs_sectors.sector, ', ' )
FROM jobs_sectors
GROUP BY jobs_sectors.job_id
) AS industry
LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
WHERE jobs.status = 'on'
GROUP BY jobs.id
ORDER BY featured DESC , date_posted DESC

SELECT jobs.job_title
     , companies.company_name
     , companies.company_url
     , companies.logo_thumbnail
     , jobs.job_url
     , jobs.job_salary
     , jobs.job_ote
     , jobs.job_perks
     , jobs.job_description
     , jobs.date_posted
     , [COLOR="DarkGreen"]i.sectors[/COLOR]
     , [COLOR="Blue"]f.featured[/COLOR]
  FROM jobs
INNER
  JOIN companies
    ON companies.id = jobs.company_id
INNER
  JOIN [COLOR="DarkGreen"]( SELECT job_id
              , GROUP_CONCAT(sector) AS sectors
           FROM jobs_sectors
         GROUP
             BY job_id ) AS i
    ON i.job_id = jobs.id[/COLOR]
LEFT OUTER
  JOIN [COLOR="blue"]( SELECT job_id
              , COUNT(*) AS featured
           FROM featured_jobs
         GROUP
             BY job_id ) AS f
    ON f.job_id = jobs.id[/COLOR]
 WHERE jobs.status = 'on'
ORDER
    BY featured DESC
     , date_posted DESC

Thanks, although the derived sectors field is still being shown as a BLOB field - is this right?

use LEFT(GROUP_CONCAT(…),255)

Thanks r937!

That brings me to my final question…

If I wanted to perform a search on the sectors and find all rows (jobs) which are listed in more than one sector - how would I do that?

For example the jobs_sectors table might contain the following:

job_id sector
1 22
1 28
2 22
3 25

Now if I did a search for sector 22, it should return both job_id 1 and job_id 2.

Thanks in advance, you’ve been a great help!