Repeated values, how to best handle

Now I have a related question: I need to build a credits table where I would put the actor’s roles that they’ve played in different movies. Obviously, this is tied to a particular movie, but I have the actors in a different table and use a many to many relationship to get the actor for the movies.

What would be the most efficient way of dealing with credits? Any ideas? I’d need a solution that doesn’t break the query I already have going.

Of course, that’s the whole point :slight_smile:
An actor can have more than 1 role in his life, and in that case he will have more than 1 row in this table.

Ok, spoke too soon: This works (sort of) but is now giving me the wrong movie title, it’s giving me entry #1 when it should be #2. Here’s the whole query:

SELECT DISTINCT dvdpedia.id, dvdpedia.title, group_concat(DISTINCT concat(`actors`.`name`, ' ',`actors`.`lastName`) ORDER BY actors.lastName separator ', ') AS `actors`, rated.rated, group_concat(DISTINCT concat(writers.name, " ", writers.lastName) separator ', ') AS writers
from dvdpedia join `actor2title` on (`dvdpedia`.`id` = `actor2title`.`titleId`) join `actors` on (`actor2title`.`pId` = `actors`.`id`) join rated on (dvdpedia.ratedId = rated.id) join writer2dvd on (dvdpedia.id = writer2dvd.titleId) join writers ON (writer2dvd.pId = writers.id) group by `dvdpedia`.`id` ORDER BY dvdpedia.title;

Hold the phone, through the magic of Google I found the proper use of the DISTINCT keyword. Used at the beginning of the query it doesn’t make a difference, used within the group_concat statement, it makes a world of difference and fixes the problem below (phew). I’m just leaving it here in the hopes that it might help other users.

sigh :frowning:

Now I have another problem. I just realized that if I add another many to many relationship query to the mix, say the writers for example, and I use the group_concat for writers I get the writers repeated as many times as there are actors:

George Gallo, George Gallo, George Gallo, George Gallo, George Gallo…

If I don’t use group_concat, I only get the first writer. Not sure how to fix this. The Sakila demo db doesn’t seem to have a similar situation for me to get an idea how to do it properly.

put the role that an actor played in a movie into the many-to-many table

CREATE TABLE actor2title
( pId INTEGER NOT NULL
, titleId INTEGER NOT NULL
, PRIMARY KEY ( pId , titleId )
, role VARCHAR(99)
);

Off Topic:

weather in toronto

:slight_smile:

Hello, things are looking good. Now I have a question about how to handle editions for movies. There is a lot of information that might change between one edition and another, all these fields for example:

title, features, image, duration, release, rated, sound, language, subtitles, region, media, # of discs, format, aspect ratio, price, etc.

So I need to put them in another table. My question is, what would be the best way to handle this? It seems that right now, the fields that remain the same are less than the fields that change. Also, the default edition (for lack of a better term) will also need to have an entry in the “edition” for the info stored there.

So I’ll end up with a table that is the main table (dvdpedia) but has little info and an edition table that has most of the info with multiple entries for each title in the main table.

Does that sound ok? It doesn’t “feel” right to me, I can’t help but wonder if I’m missing something. :shifty:

Here’s my query so far:

SELECT dvdpedia.id
     , dvdpedia.origTitle AS 'Original title'
     , rated.rated
     , w.writers
     , a.cast
     , studios.studio
     , aspectRatio.aspect
     , videoFormat.format
     , l.languages
     , country.country
  FROM dvdpedia 
LEFT OUTER JOIN edition ON edition.titleId = dvdpedia.id  
LEFT OUTER JOIN rated ON rated.id = edition.ratedId 
LEFT OUTER JOIN videoFormat ON edition.videoFormatId = videoFormat.id
LEFT OUTER JOIN aspectRatio ON  edition.aspectRatioId = aspectRatio.id
LEFT OUTER JOIN studios ON dvdpedia.studioId = studios.id
LEFT OUTER JOIN country ON dvdpedia.countryId = country.id

LEFT OUTER JOIN ( SELECT writer2dvd.titleId
              , GROUP_CONCAT(CONCAT(writers.name, ' ', writers.lastName) SEPARATOR ', ') AS writers
           FROM writer2dvd 
         INNER JOIN writers 
             ON writers.id = writer2dvd.pId
         GROUP 
             BY writer2dvd.titleId ) AS w
    ON w.titleId = dvdpedia.id
    
    LEFT OUTER JOIN ( SELECT actor2title.titleId, 
    				GROUP_CONCAT(CONCAT(actors.name, ' ', actors.lastName) SEPARATOR ', ') AS cast 
    			FROM actors 
	    		INNER JOIN actor2title 
	    			ON actors.id = actor2title.pId GROUP BY actor2title.titleId) AS a 
    ON a.titleId = dvdpedia.id
    
LEFT OUTER JOIN (SELECT lang2title.titleId, GROUP_CONCAT(languages.lang SEPARATOR ', ' ) AS languages
				FROM languages 
				INNER JOIN  lang2title ON lang2title.pId = languages.id) AS l
		ON l.titleId = edition.id;

that’s pretty much it

da manual has several examples

:slight_smile:

Thanks. Any pointers on how to set the indexes up? Or is it simply set up the columns I want to use fulltext search for and that’s it?

nope, not possible

Just the fact that that column cannot be empty, automatically negates the id column???

Even though I’ve got this:

id INT(11) NOT NULL AUTO_INCREMENT,

but that table ~is~ the thrid table, between actors and movies

it’s the second table where the actors are all stored.

It’s 2 tables with a many-to-one relationship as far as I can tell. There’s something I’m not getting here.

I’m confused, my understanding of a junction table is a third table that manages the relationship between 2 other tables. In this table, I do have multiple rows for each actor, one for each movie they’re in, but each distinct row corresponds to one and only one title in the dvdpedia table, there is no third table managing the relationship.

Also, I don’t understand what you mean by this:

and do you really want to allow the actor’s name to be used to identify the actor?

The actors are identified by an id in their table. What am I missing?

Hello again, I had to redo the structure of my database a bit and added fulltext searches. Everything seems to work, except that now I need to query for a specific actor and still have it come back with the list of all actors for that movie. So I have a query which works for the actor as such:

SELECT dvdpedia.id, dvdpedia.title, actor, d.director,  MATCH(actor) AGAINST('$actor' IN BOOLEAN MODE) AS relevance,

							FROM actor 
								INNER JOIN actor2title ON actor.id = actor2title.pId 
								INNER JOIN dvdpedia ON actor2title.titleId = dvdpedia.id 
								
						LEFT OUTER JOIN ( SELECT director2title.`titleId`, director
									FROM director
									INNER JOIN director2title
										ON director.id = director2title.pid
										GROUP BY director2title.titleId ) AS d
						ON d.titleId = dvdpedia.id
										
						WHERE MATCH(actor) AGAINST('$actor' IN BOOLEAN MODE) 
						ORDER BY relevance DESC

So I’m getting the correct info, but just that one actor, I need to also get a list of all actors that are associated with that title. So I amended it to this:

SELECT dvdpedia.id, dvdpedia.title, d.director,  
	MATCH(actor) AGAINST('$actor' IN BOOLEAN MODE) AS relevance
	, a.cast
							FROM actor 
								INNER JOIN actor2title ON actor.id = actor2title.pId 
								INNER JOIN dvdpedia ON actor2title.titleId = dvdpedia.id 
								
						LEFT OUTER JOIN ( SELECT director2title.titleId, director
									FROM director
									INNER JOIN director2title
										ON director.id = director2title.pid
										GROUP BY director2title.titleId ) AS d
						ON d.titleId = dvdpedia.id
						
						INNER JOIN ( SELECT actor2title.titleId, 
	    				GROUP_CONCAT(actor SEPARATOR ', ') AS cast 
	    			FROM actor 
		    		INNER JOIN actor2title 
		    			ON actor.id = actor2title.pId 
		    			GROUP BY actor2title.titleId) AS a 
	    ON a.titleId = dvdpedia.id
										
							WHERE MATCH(actor) AGAINST('$actor' IN BOOLEAN MODE) 
							ORDER BY relevance DESC

Then I have a table that is joining to itself which I’m sure is not good practice and it takes about 325 ms which is long. Apart from running 2 queries, one with the IDs of the other, I’m not sure how else to do it.

Thanks.

Apologies, but I’ve had to change the table structure and the above query no longer applies.

I now have no junction table, but have added the role column to the actors table so this table will have a titleId column which corresponds to which movie the actor is in.

CREATE TABLE `actor` (
  `id` int(11) NOT NULL auto_increment,
  `titleId` int(11) default NULL,
  `actor` varchar(255) NOT NULL default '',
  `role` varchar(255) default NULL,
  PRIMARY KEY  (`id`),
  FULLTEXT KEY `actor` (`actor`)
) ENGINE=MyISAM AUTO_INCREMENT=232 DEFAULT CHARSET=utf8

So now I need the same thing, a list of actors and their roles for each movie, along with the movie’s info. I have this query:

SELECT dvdpedia.id
     , dvdpedia.title
     , a.cast
      FROM dvdpedia
LEFT OUTER JOIN ( SELECT actor.titleId, 
    				GROUP_CONCAT(actor SEPARATOR ', ') AS cast 
    			FROM actor) AS a 
    ON a.titleId = dvdpedia.id
    WHERE 0=0

;

which puts all actors for all movies into the cast column of a single entry.

What am I doing wrong?

but that table ~is~ the thrid table, between actors and movies :slight_smile:

you’re not using the id, though –

actor VARCHAR(255) NOT NULL DEFAULT ‘’,

What r937 said what I was meaning.

a “junction” table, also called an association table, or a relationship table, or a linking table, or a many-to-many table, is exactly what you already have – actor2title

imagine how we feel – we don’t know what you’re trying to accomplish with the different relationships :slight_smile:

I’m sorry Chris, I don’t know what you are referring to here.

This is the explain based on your query, I had to modify it a bit to get the other data I needed:

+----+-------------+-------------+--------+---------------+---------+---------+----------------------------+------+---------------------------------------------+
| id | select_type | table       | type   | possible_keys | key     | key_len | ref                        | rows | Extra                                       |
+----+-------------+-------------+--------+---------------+---------+---------+----------------------------+------+---------------------------------------------+
|  1 | SIMPLE      | edition     | ALL    | NULL          | NULL    | NULL    | NULL                       |    5 | Using temporary; Using filesort             |
|  1 | SIMPLE      | actor2title | index  | PRIMARY       | PRIMARY | 8       | NULL                       |   16 | Using where; Using index; Using join buffer |
|  1 | SIMPLE      | actors      | eq_ref | PRIMARY       | PRIMARY | 4       | bruji_data.actor2title.pId |    1 | Using where                                 |
+----+-------------+-------------+--------+---------------+---------+---------+----------------------------+------+---------------------------------------------+

This is the actual query:

SELECT edition.id, edition.title, edition.image, CONCAT(actors.name, ' ', actors.lastName) AS cast
			  FROM actors
			INNER
			  JOIN actor2title
			    ON actor2title.pid = actors.id
			INNER
			  JOIN edition
			    ON edition.titleId = actor2title.titleId
			 WHERE actors.name LIKE '%$cast%' 
			    OR actors.lastName LIKE '%$cast' ORDER BY title

And this is the EXPLAIN from the query I had above (not copying it here again to avoid confusion):

+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+--------------------------+
| id | select_type        | table       | type            | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY            | edition     | index           | NULL          | PRIMARY | 4       | NULL |    5 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | actor2title | index           | NULL          | PRIMARY | 8       | NULL |   16 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | actors      | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using where              |
+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+--------------------------+

actually, you do

presumably this new actor table has multiple rows for each actor, one for each movie the actor is in, yes?

that’s a junction table :slight_smile:

and do you really want to allow the actor’s name to be used to identify the actor? what happens if it is misspelled on one of the several rows? loss of data integrity

:slight_smile: