Repeated values, how to best handle

Hello, I am making a movie database and wanted to ask opinions on how to separate the actors from the title tables.

I was thinking of a title table with duration, comments, etc but then another table for actors and an actorId field in the title table to refer to the actors. However this results in a many to many relationship so I need to build a junction table for that. Pretty standard stuff, but was wondering if anyone had any other ideas on how to better do this.

Thanks.

Ahhhhhhh, gotcha. Thanks a lot for the explanation.

How’s the weather in Toronto by the way? I lived there for a year, loved the city, hated the winters :slight_smile:

GROUP BY produces one row for every distinct combination of values of the GROUP BY columns

so if you write GROUP BY foo,bar, you get one row for every unique combination of values of the foo and bar columns

if you write GROUP BY titleId, you get one row for every titleId

thus if your subquery accesses a table that has multiple rows per titleId, and you GROUP BY titleId, then the subquery will produce only one row per titleId, which can then be safely joined to the other tables in the outer query

Quite so, thanks! Forgetting the GROUP BY was causing all the languages to show up in the first movie edition only. Not sure why that is, but glad it works.

So does the whole edition and current query seem ok to you?

just looking at the syntax, i think so

of course, i’m not really all that familiar with your data

If I may try your patience for a bit, could you tell me why the GROUP BY clause causes the subqueries to work properly. There’s always RTFM, but in the case of subqueries I am a bit confused, trying to make sense of all of this :slight_smile:

you forgot the GROUP BY in the third subquery

also, you don’t have any edition columns in the SELECT clause, so the results of the query will surely be confusing

Back again to the table structure because I am unsure how to handle something.

I have the main table, with a junction table for director and another table for actor which has one row for each actor and role and a foreign key for the main table to link them.

I am doing a keyword search, where the keyword can be either title (dvdpedia table), director (director table via the junction table), or actor (actor table). I don’t know what people will be looking for, could be any of these three things.

So this works for matching title and director:

SELECT DISTINCT dvdpedia.id
        , dvdpedia.title
        , locale
        , d.director
        FROM dvdpedia 
            
        LEFT OUTER JOIN ( SELECT director2title.titleId,
                            GROUP_CONCAT(director SEPARATOR ', ') AS director
                            FROM director
                            INNER JOIN director2title
                                ON director.id = director2title.pid
                                GROUP BY director2title.titleId ) AS d
            ON d.titleId = dvdpedia.id

                                    
WHERE MATCH(title) AGAINST(:title IN BOOLEAN MODE)
OR d.director LIKE :director";

:title and :director are placeholder variables

However, I’m unsure how to match also actors. Since the actor table does not work through a junction table, I can’t do the same trick. I tried doing a

WHERE dvdpedia.id IN (SELECT titleId FROM actor WHERE MATCH(title) AGAINST(:actor IN BOOLEAN MODE))

Which works, but takes an obscene amount of time to run. So I’m hoping there is a much better way to do this.

As always, thanks in advance.

Hello, I’m coming back to a fulltext issue I’m having:

If I search the actor table for actor and role (2 fields in the table). When the role field is empty, I’m getting no results with a fulltext search.

Sometimes this column WILL be empty. How can I get results for this combination? It seems odd that if the column is empty, there wouldn’t be a match no? I mean, I do get a result using a similar query with LIKE instead of fulltext search. I haven’t seen anything about this in the manual.

Ok, I’ll try it as soon as I get home.

“if you also wanted to collapse the actors into a concatenated value, you would do that in a subquery as well”

This query IS faster, but produces one row for each actor in a movie, see attached screenshot.

let’s use a portion of your database to illustrate

one movie can have multiple actors, and one movie can have multiple writers

try to combine these relationships in a single query, and boom, you get cross join effects

one thing you could do is collapse one of these relationships in a subquery, so that the subquery produces a single row per movie, with GROUP_CONCAT to collapse multiple values to one (concatenated) value


SELECT dvdpedia.id
     , dvdpedia.title
     , rated.rated
     , w.writers
     , actors.name
     , actors.lastName
  FROM dvdpedia 
INNER
  JOIN actor2title 
    on actor2title.titleId = dvdpedia.id
INNER
  JOIN actors 
    on actors.id = actor2title.pId
INNER
  JOIN rated 
    on rated.id = dvdpedia.ratedId 
INNER
  JOIN ( SELECT writer2dvd.titleId
              , GROUP_CONCAT(CONCAT(writers.name
                                  , ' '
                                  , writers.lastName) AS writers
           FROM writer2dvd 
         INNER
           JOIN writers 
             ON writers.id = writer2dvd.pId
         GROUP 
             BY writer2dvd.titleId ) AS w
    ON w.titleId = dvdpedia.id

thus the outer query has only one one-to-many relationship, along with the one-to-one relationship with the subquery

if you also wanted to collapse the actors into a concatenated value, you would do that in a subquery as well

It does, but I don’t know how to fix it. Any pointers?

it does not fix the problem, it merely papers over it

your query tries to combine multiple one-to-many relationships, with the result that you are getting cross join effects (multiplied results)

using DISTINCT within the GROUP_CONCAT merely collapses the output, you are still executing a horrendously inefficient query underneath

does this explanation make sense to you?

of course!!!

there will be as many “repeats” for a single actor as there are movies that this actor played in

they aren’t really repeats, of course, because you would not allow the same actor to play the same role in the same movie more than once

however, this brings up the interesting point that you might want to have the same actor play more than one role in the same movie

for example…

Coming To America - Eddie Murphy - Prince Akeem
Coming To America - Eddie Murphy - Clarence
Coming To America - Eddie Murphy - Randy Watson
Coming To America - Eddie Murphy - Saul

in this case you need to change the table’s primary key

CREATE TABLE actor2title
( pId INTEGER NOT NULL
, titleId INTEGER NOT NULL
, role VARCHAR(99) NOT NULL
, PRIMARY KEY ( pId , titleId , role )
);

and of course now you need to specify a role for every actor in every movie, since no portion of a primary key may be null

:slight_smile:

Yes, use a LEFT OUTER JOIN instead of an INNER JOIN.

Yet another question: The above is working beautifully, except that if the movie has a missing entry (say no actors associated yet) the query will not list it. Is there a way to list movies with NULL entries?

But doesn’t that defeat the whole point of normalization of not repeating the same value? Also, it would make searches based on actors a lot slower no?

But the actor will play different roles in different movies, so I would end up with repeated rows for the same actor inside that table???