I am trying to better optimize my queries, and this is one that runs a ton…
SELECT trailers.trailer_id, trailers.trailer_title, films.film_title, films.image
FROM films,trailers,tags
WHERE tags.snub='$var'
AND tags.trailer_id=trailers.trailer_id
AND trailers.film_id=films.film_id
ORDER BY trailers.trailer_id DESC
LIMIT 0,10
This query uses three tables to get the information needed. Is this efficient?
A JOIN would properly implement ANSI-92 SQL, whereas you’re currently using the ANSI-89 spec. It should improve perfornance a bit, as it will limit your hash by two dimensions at a time (it will first hash the first two tables and then the 3rd, rather than all 3 at once). You may be able to further control the hashing by using a V-sproc with sub-queries doing a primary hash and secondary hash in secluded scopes.
However, more than likely, if your query isn’t performing well, you need to INDEX the keys that you’re joining on (film_id and trailer_id in all tables as well as tags.snub)… that’s the #1 reason a query runs too slowly, and will add huge increases in performance.
steve, not that i doubt what you’re saying (because i can’t understand it well enough to doubt it)…
… but could you give a reference or two for this hashing business? and how to determine whether to hash two tables or three?
also, what’s a V-sproc? and how do you write a subquery to “do” a hash? and what the heck is a secluded scope (other than perhaps an observatory on a mountaintop)?
Hashing is the process of comparing multiple records from 2 tables. Say you have 10 records in each of the 3 tables… hashing them all together would result in 1000 results, which would all be compared against the where clause to filter the results. INNER JOINS could potentially get a query plan that filters the results of the first hash first, which could result in say 20 records hashed against the 2nd 10, instead of 100, which results in a total of 300 records getting hashed instead of 1000.
A V-sproc is a way of organizing joins in a stored procedure to force an optimized query execution plan. It would look like this:
CREATE PROCEDURE vsproc AS BEGIN
SELECT tr.*,
f.film_title,
f.image
FROM
(SELECT
tr.trailer_id,
tr.trailer_title
FROM tags AS t
INNER JOIN trailers as tr ON t.trailer_id = tr.trailer_id
AND t.snub='$var'
) AS tr
INNER JOIN films AS f
ON tr.film_id = f.film_id
ORDER BY tr.trailer_id DESC
LIMIT 0,10
END
Notice each “scope” only joins 2 tables. You start with the scope that will limit results the most, and increasingly expand # of comparisons on outer bands of the query. This is most effective when joining several tables… you can really optimize execution time a lot by controlling the order of filtering and not trusting the DB to optimize the plan itself (which fails without fail).
transio, nice explanation, but i think those methods went out of fashion once the database optimizer achieved some measure of competence beyond your rather pessimistic “fails without fail”
i would expect that your two-at-a-time-from-the-inside-out method will today produce the same execution path as what mysql would produce from the original query, applying the most restrictive constraint first, etc.
that’s also how i like to code my complex joins, although i simply linearize them, i don’t nest them – most restrictive table first, and that’s the table that has the WHERE clause condition
not every time, no – just when the ORDER BY sequence is different from the sequence the data was retrieved in, and it has to put the intermediate rows somewhere