Why are UNIONS faster than JOINS?

Ok, I realize I’m being very general here and the statement may not be true, but in the database we are working with, UNIONS seems to be much faster than LEFT JOINs when using junction tables.

Okay, I just realized that what’s causing the difference is that when using JOINS I am using ORs which I think are what’s causing the speed difference.

Does anyone know of a better way to write the following code without using UNIONs?:

SELECT SQL_CALC_FOUND_ROWS DISTINCT dvdpedia.dogTag, dvdpedia.title, dvdpedia.starring, dvdpedia.originalTitle, dvdpedia.theatrical, dvdpedia.media, localeLanguage, MATCH(title) AGAINST('jones' IN BOOLEAN MODE) AS relevance 
					 	, director
						FROM dvdpedia 
						LEFT OUTER JOIN dvd_director2title ON dvdpedia.dogTag = dvd_director2title.dogTag
						LEFT OUTER JOIN dvd_director ON dvd_director.id = dvd_director2title.pId
						WHERE MATCH(title) AGAINST('jones' IN BOOLEAN MODE) 
						or MATCH(director) AGAINST('jones' IN BOOLEAN MODE)
						or MATCH(starring) AGAINST('jones' IN BOOLEAN MODE)    
						GROUP BY dogTag						
						ORDER BY localeLanguage = 'en' DESC, localeLanguage, relevance DESC

Thanks.

title, director, and starring and not the types of data fields that are conducive to fulltext indexing

i doubt any of these three is actually longer than 50 characters, so you should be probably be using LIKE

secondly, you write LEFT OUTER JOIN but then you also have a condition on the right table director column

i’m having trouble understanding why you have a many-to-many realtionship table for dvds to directors, and ~not~ one for dvds to “starring” (if i think this is what it looks like, the people starring in the dvd)

finally, i think your use of DISTINCT is redundant and may also be contributing to poor performance

My understanding is that regardless of the length of the field, a FULLTEXT search is usually faster than a LIKE search, is this not so? Also the advantage of a FULLTEXT search on those fields is that then we can order by Relevance.

secondly, you write LEFT OUTER JOIN but then you also have a condition on the right table director column

Sorry I don’t understand what you mean here, the LEFT OUTER JOIN is so that we can get listings of movies without directors in case they are searching for titles (this being a keyword search, we don’t know if they’re searching for titles, directors or actors).

i’m having trouble understanding why you have a many-to-many realtionship table for dvds to directors, and ~not~ one for dvds to “starring” (if i think this is what it looks like, the people starring in the dvd)

I do have one for credits as well, it’s just that the 6 main actors go into that starring field to make thing easier. Then I have another query that only searches the actors table which has the many-to-many relationship.

finally, i think your use of DISTINCT is redundant and may also be contributing to poor performance

Aha, hadn’t caught that, great, at least one thing I can optimize :slight_smile:

While we are talking about optimization, is there a way to find out if my joins are not using the indexes properly? I have tried using EXPLAIN but that doesn’t seem to help. I’ve also tried turning on the slow query logs and parsing those with mysqldumpslow but although this helps track the time the queries take, it doesn’t tell me if the joins are being used properly or where I could be optimizing the queries.

Let me clarify, EXPLAIN loses me when it reports something like:

id select_type table type possible_keys key
1 PRIMARY <derived2> All

I have no idea how to interpret or optimize this, even where in my query it’s referring to.

Thanks again.

writing a simple explanation that will help you learn how to read EXPLAIN output is not easy

it requires an understanding of the table design (use SHOW CREATE TABLE to include the indexes), the query (source), and the full EXPLAIN output

taking a single line from the EXPLAIN output isn’t enough to go on, i’m afraid

Here’s a full output of the explain, does this help?

+----+-------------+---------------------+--------+---------------+------------+---------+------------------------------------+-------+----------------------+
| id | select_type | table               | type   | possible_keys | key        | key_len | ref                                | rows  | Extra                |
+----+-------------+---------------------+--------+---------------+------------+---------+------------------------------------+-------+----------------------+
|  1 | PRIMARY     | dvdpedia            | const  | PRIMARY       | PRIMARY    | 4       | const                              |     1 |                      |
|  1 | PRIMARY     | dvd_locale          | const  | PRIMARY       | PRIMARY    | 12      | const,const                        |     0 | unique row not found |
|  1 | PRIMARY     | dvd_studio          | const  | PRIMARY       | PRIMARY    | 4       | const                              |     1 |                      |
|  1 | PRIMARY     | dvd_pictureFormat   | const  | PRIMARY       | PRIMARY    | 4       | const                              |     1 |                      |
|  1 | PRIMARY     | dvd_rated           | const  | PRIMARY       | NULL       | NULL    | NULL                               |     1 |                      |
|  1 | PRIMARY     | dvd_aspectRatio     | const  | PRIMARY       | NULL       | NULL    | NULL                               |     1 |                      |
|  1 | PRIMARY     | dvd_series          | const  | PRIMARY       | NULL       | NULL    | NULL                               |     1 |                      |
|  1 | PRIMARY     | dvd_images          | ref    | dogTag        | dogTag     | 4       | const                              |     2 |                      |
|  1 | PRIMARY     | dvd_sound           | eq_ref | PRIMARY       | PRIMARY    | 4       | const                              |     1 |                      |
|  1 | PRIMARY     | dvd_genre           | eq_ref | PRIMARY       | PRIMARY    | 4       | const                              |     1 |                      |
|  1 | PRIMARY     | <derived2>          | ALL    | NULL          | NULL       | NULL    | NULL                               |  8773 |                      |
|  1 | PRIMARY     | <derived3>          | ALL    | NULL          | NULL       | NULL    | NULL                               | 12580 |                      |
|  1 | PRIMARY     | <derived4>          | ALL    | NULL          | NULL       | NULL    | NULL                               |  7425 |                      |
|  1 | PRIMARY     | <derived5>          | ALL    | NULL          | NULL       | NULL    | NULL                               | 10537 |                      |
|  1 | PRIMARY     | <derived6>          | ALL    | NULL          | NULL       | NULL    | NULL                               |  6044 |                      |
|  1 | PRIMARY     | <derived7>          | ALL    | NULL          | NULL       | NULL    | NULL                               | 10714 |                      |
|  7 | DERIVED     | dvd_country2title   | index  | PRIMARY       | dogtag_pid | 8       | NULL                               | 12852 | Using index          |
|  7 | DERIVED     | country             | eq_ref | PRIMARY       | PRIMARY    | 4       | bruji_data.dvd_country2title.pId   |     1 |                      |
|  6 | DERIVED     | dvd_subtitles2title | index  | PRIMARY       | dogtag_pid | 8       | NULL                               | 16621 | Using index          |
|  6 | DERIVED     | languages           | eq_ref | PRIMARY       | PRIMARY    | 4       | bruji_data.dvd_subtitles2title.pId |     1 |                      |
|  5 | DERIVED     | dvd_languages2title | index  | PRIMARY       | dogtag_pid | 8       | NULL                               | 16957 | Using index          |
|  5 | DERIVED     | languages           | eq_ref | PRIMARY       | PRIMARY    | 4       | bruji_data.dvd_languages2title.pId |     1 |                      |
|  4 | DERIVED     | dvd_producer2title  | index  | PRIMARY       | dogtag_pid | 8       | NULL                               | 18071 | Using index          |
|  4 | DERIVED     | dvd_producer        | eq_ref | PRIMARY       | PRIMARY    | 4       | bruji_data.dvd_producer2title.pId  |     1 |                      |
|  3 | DERIVED     | dvd_director2title  | index  | PRIMARY       | dogtag_pid | 8       | NULL                               | 15032 | Using index          |
|  3 | DERIVED     | dvd_director        | eq_ref | PRIMARY       | PRIMARY    | 4       | bruji_data.dvd_director2title.pId  |     1 |                      |
|  2 | DERIVED     | dvd_writer2title    | index  | PRIMARY       | dogtag_pid | 8       | NULL                               | 16801 | Using index          |
|  2 | DERIVED     | dvd_writer          | eq_ref | PRIMARY       | PRIMARY    | 4       | bruji_data.dvd_writer2title.pId    |     1 |                      |
+----+-------------+---------------------+--------+---------------+------------+---------+------------------------------------+-------+----------------------+

it doesn’t help me, no, not without the entire query, plus a few days to study it…

Here’s the full query:

EXPLAIN
SELECT dvdpedia.*
  , dvd_rated.rated
  , d.director
  , w.writer
  , p.producer
  , DATE_FORMAT(dvdpedia.theatrical, '%Y-%m-%d') AS 'theatrical'
  , DATE_FORMAT(dvd_locale.releaseDate, '%Y-%m-%d') AS 'releaseDate'
  , dvd_aspectRatio.aspectRatio
  , l.languages
  , dvd_locale.region
  , dvd_locale.locale
  , dvd_locale.price
  , dvd_locale.videoFormat
  , dvd_images.upc
  , dvd_pictureFormat.pictureFormat
  , dvd_studio.studio
  , dvd_sound.sound
  , dvd_genre.genre
  , dvd_series.series
  , s.subtitles
  , c.country
  FROM dvdpedia 
  LEFT OUTER JOIN dvd_locale ON dvdpedia.dogTag = dvd_locale.dogTag	AND dvd_locale.locale = 'UK'
  LEFT OUTER JOIN dvd_studio ON dvdpedia.studioId = dvd_studio.id
  LEFT OUTER JOIN dvd_pictureFormat ON dvdpedia.pictureFormatId = dvd_pictureFormat.id
  LEFT OUTER JOIN dvd_images ON dvd_images.dogTag = dvdpedia.dogTag  
  LEFT OUTER JOIN dvd_rated ON dvd_rated.id = dvd_locale.ratedId 
  LEFT OUTER JOIN dvd_aspectRatio ON  dvdpedia.aspectRatioId = dvd_aspectRatio.id
  LEFT OUTER JOIN dvd_sound ON dvdpedia.soundId = dvd_sound.id
  LEFT OUTER JOIN dvd_genre ON dvdpedia.genreId = dvd_genre.id		
  LEFT OUTER JOIN dvd_series ON dvdpedia.seriesId = dvd_series.id
  LEFT OUTER JOIN ( SELECT dvd_writer2title.dogTag
  , GROUP_CONCAT(writer SEPARATOR ', ') AS writer
  FROM dvd_writer2title 
  INNER JOIN dvd_writer 
  ON dvd_writer.id = dvd_writer2title.pId
  GROUP BY dvd_writer2title.dogTag ) AS w
  ON w.dogTag = dvdpedia.dogTag		    
  LEFT OUTER JOIN ( SELECT dvd_director2title.dogTag,
  GROUP_CONCAT(director SEPARATOR ', ') AS director
  FROM dvd_director
  INNER JOIN dvd_director2title
  ON dvd_director.id = dvd_director2title.pid
  GROUP BY dvd_director2title.dogTag ) AS d
  ON d.dogTag = dvdpedia.dogTag
  LEFT OUTER JOIN ( SELECT dvd_producer2title.dogTag,
  GROUP_CONCAT(producer SEPARATOR ', ') AS producer
  FROM dvd_producer
  INNER JOIN dvd_producer2title
  ON dvd_producer.id = dvd_producer2title.pid
  GROUP BY dvd_producer2title.dogTag ) AS p
  ON p.dogTag = dvdpedia.dogTag
  LEFT OUTER JOIN (SELECT dvd_languages2title.dogTag, GROUP_CONCAT(languages.languages SEPARATOR ', ' ) AS languages
  FROM languages 
  INNER JOIN  dvd_languages2title 
  ON dvd_languages2title.pId = languages.id
  GROUP BY dvd_languages2title.dogTag) AS l
  ON l.dogTag = dvdpedia.dogTag
  LEFT OUTER JOIN (SELECT dvd_subtitles2title.dogTag, GROUP_CONCAT(languages.languages SEPARATOR ', ' ) AS subtitles
  FROM languages 
  INNER JOIN  dvd_subtitles2title 
  ON dvd_subtitles2title.pId = languages.id
  GROUP BY dvd_subtitles2title.dogTag) AS s
  ON s.dogTag = dvdpedia.dogTag
  LEFT OUTER JOIN (SELECT dvd_country2title.dogTag, GROUP_CONCAT(country SEPARATOR ', ') AS country
  FROM country 
  INNER JOIN dvd_country2title 
  ON country.id = dvd_country2title.pId
  GROUP BY dvd_country2title.dogTag) AS c
  ON c.dogTag = dvdpedia.dogTag
  WHERE dvdpedia.dogTag = '40';

wow, what a query

judging from a cursory glance at the EXPLAIN, it looks okay

what issues are you having with it?

i presume this isn’t the same issue as the ORs not performing as well as the UNIONs in your original post with the FULLTXT search…

Well it’s not issues as such, we are just trying to optimize as much as we can and of course when we see things in EXPLAIN that do not use indexes they seem ripe candidates for optimization. My suspicion is that it’s these queries here:

 LEFT OUTER JOIN ( SELECT dvd_writer2title.dogTag
  , GROUP_CONCAT(writer SEPARATOR ', ') AS writer
  FROM dvd_writer2title 
  INNER JOIN dvd_writer 
  ON dvd_writer.id = dvd_writer2title.pId
  GROUP BY dvd_writer2title.dogTag ) AS w
  ON w.dogTag = dvdpedia.dogTag  

causing the no-index situation. We are rewriting them like this now:

, GROUP_CONCAT(DISTINCT writer SEPARATOR ', ') AS writer

Then the simple JOIN by itself like the others. This produces a much better EXPLAIN output with all SIMPLE joins and only one empty value in “possible_keys” but the type is “index” there so I think we should be fine no?

A single SELECT will use no more than one index per table.

A UNION will use no more than one index per SELECT in the union.

Hence, the latter will make better use of indexes, as seen by the “Using index” in a lot of places in its EXPLAIN.

So from what you are saying UNIONs by their nature are truly faster than JOINs. However JOINs seem to be the preferred way of associating tables.

And I have to say that it really bugs me that a UNION, which is faster, spits out a join of type ALL which is evil according to the docs and yet the equivalent JOIN, which is slower, does not, the join types are all good:

+----+--------------+------------+----------+------------------------+------------------------+---------+------+------+-----------------------------+
| id | select_type  | table      | type     | possible_keys          | key                    | key_len | ref  | rows | Extra                       |
+----+--------------+------------+----------+------------------------+------------------------+---------+------+------+-----------------------------+
|  1 | PRIMARY      | dvdpedia   | fulltext | title_fulltext         | title_fulltext         | 0       |      |    1 | Using where; Using filesort |
|  2 | UNION        | dvdpedia   | fulltext | originalTitle_fulltext | originalTitle_fulltext | 0       |      |    1 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2> | ALL      | NULL                   | NULL                   | NULL    | NULL | NULL | Using filesort              |
+----+--------------+------------+----------+------------------------+------------------------+---------+------+------+-----------------------------+

i think you misunderstand how a UNION works

i’m assuming the ALL means that it is simply accepting all rows from the two SELECTs, which is more efficient than trying to detect duplicates

SELECT …
UNION [ ALL | DISTINCT ]
SELECT …

as for the slowness, i believe that’s down to the “using filesort” which you forced upon it by your ORDER BY clause

I probably do. Any good, succinct explanations you can point me to (aside from the Mysql manual)?

you mean the concept? it’s really rather simple – each SELECT retrieves result rows separately from all other SELECTs, and all the result rows are fed into the same result table

consequently each SELECT must return exactly the same number of columns, and the columns have to be “union compatible” i.e. similar datatypes – string with strings, numbers with numbers, dates with dates

i don’t know of any resources which explain the EXPLAIN for unions, though

Yes, that bit I know, I was talking about the magic of UNIONs, when you said I misunderstood how it worked, that’s what came to my mind.