Is it a bad practice to use primary key in junction tables?

bad insofar as it’s unnecessary

also, there is the whole clustering index issue, which will store junction rows in creation sequence, rather than clustering them by one or the other of the two constituent foreign keys (assuming the pair is made PK)

Sorry - it was too early in the morning, and I wasn’t caffeinated enough. I was thinking parent/child table relationships, not junction tables.

I’ll go back and re-iterate that using a view (or stored procedure, though that would be overkill 90% of the time) would get around this entity framework issue

[quote=“DaveMaxwell, post:6, topic:197054, full:true”]… get around this entity framework issue
[/quote]
i’m afraid i don’t understand the issue

OP said “Symfony2 doesn’t allow that kind of table because I couldn’t create the entity class, it throws an error about primary keys (which doesn’t exist).”

perhaps OP did not try declaring the pair of FKs as the PK…

every table should have a PK

the PK for a junction table should/will be the pair of FKs

whichever of the two is chosen as the leading column in the composite PK, determines the storage sequence (see clustering index http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html)

More specifically, if you’re using Doctrine with Symfony2 (likely), then Doctrine will create and manage the junction table for you, automatically.

See their documentation on many-to-many mappings.

1 Like

If I’m understanding him correctly, it’s a framework/code/Symfony issue, not a db issue. I think he’s talking about Symfony requiring each database table to have a PK defined as a standard so that it can be accessed via it’s framework entity (i.e. PHP) structure. It has nothing to do with logical or unique keys, just a standard table structure component.

C# has a similar constraint in that if you want to access a table via the entity framework within the code, it by default required a primary key defined that is either a uniqueidentifier (the standard), or a GUID. You can have composite keys but it requires you to jump through a dozen hoops, and just becomes a maintenance nightmare.

It’s a bit annoying, but easily gotten around. You can either access via hard coded SQL (which people avoid like the plaque for some reason), or you can create views/stored procs which return bogus keys along with the result sets.

Having a composite primary key is certainly the ‘proper’ way to do it and then there should be no problem with the rows being represented as entity objects provided the ORM supports such configuration. However, I find I have a limit of tolerance on how many columns the PK will span - it’s OK for me to have two columns but once I had a 3-column PK and still needed to use it as a FK in another table - that seemed like too much noise and hassle to have all those columns and having to join by all of them - instead I just added a separate integer auto-increment PK and used that happily as the PK and FK.

Many ORMs and ActiveRecord libraries don’t allow composite keys. Just saying…

Why make life more difficult.

[quote=“oddz, post:12, topic:197054, full:true”]Why make life more difficult.
[/quote]
exactly my thinking!!

declare that auto_increment only when you have to… otherwise, don’t make life more difficult

before i discuss your “too much noise and hassle” as well as “having to join by all of them”…

… let me say that i’ve never seen a 3-column PK where none of the individual columns was a FK to somewhere else

if you have such an example, please do share

Did I say none of the individual columns was a FK? Sure they were FKs but that wasn’t the point I was making. These 3 columns (apart from being FKs) formed a composite PK of the table (let’s call it table A) and I had yet another table B which I needed to connect to A with a FK. If I kept the 3-column PK in table A then I’d have to use a 3-column FK in table B. When I added a single column PK in table A then I could use a single column FK in table B. Overall I didn’t see any problems with that and selects and joins became clearer because there was only one referencing column instead of 3. If there’s any problem with that I’d be happy to know!

Okay, now I’m officially lost. :smiley: I managed to solve the issue adding a primary & auto incremental ID for each table.

I still read the replies and try to understand.

[quote=“Lemon_Juice, post:15, topic:197054, full:true”]
Did I say none of the individual columns was a FK? Sure they were FKs but that wasn’t the point I was making. [/quote]i understood your point full well

in fact, you made two points – “too much noise and hassle” as well as “having to join by all of them”

i’m pleased you’ll let me try to explain my point too

however, i really wish we could use an actual example

i’d hate to lose you by referring to columns such as “column1” and “column2” because the concept is way easier to understand with actual columns and not placeholders

any chance you could make up an example?

Ha! I couldn’t find the original database where I had this particular case so I made a new example, maybe it’s slightly on the edge of practical realness but should illustrate the point well:

https://dl.dropboxusercontent.com/u/13230070/screenshots/fks-model.png

Practical usage: there are users and photos and each user can choose any number of favourite photos (user_has_photo). Each photo he adds to his favourites he can annotate with any number of labels. Each label appears on the site in shape of a rectangle and the rectangle can consist of any number of colours chosen by the user - then a rainbow-like gradient is formed from the colours the user has chosen for his label of his favourited photo.

As you can see the table user_has_photo_has_label_has_colour has a 4-column PK and a 3-column FK that connects it to user_has_photo_has_label. If user_has_photo_has_label had an additional one-column PK, then we could use it in user_has_photo_has_label_has_colour instead of the thee-column FK - and to me it would look a bit clearer.

So, what do you say? :smiley:

SQL:

CREATE TABLE `user` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

CREATE TABLE `photo` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `filename` VARCHAR(50) NOT NULL,
  `description` TEXT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

CREATE TABLE `user_has_photo` (
  `user_id` INT NOT NULL,
  `photo_id` INT NOT NULL,
  PRIMARY KEY (`user_id`, `photo_id`),
  INDEX `fk_user_has_photo_photo1_idx` (`photo_id` ASC),
  CONSTRAINT `fk_user_has_photo_user`
    FOREIGN KEY (`user_id`)
    REFERENCES `user` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_user_has_photo_photo1`
    FOREIGN KEY (`photo_id`)
    REFERENCES `photo` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

CREATE TABLE `label` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(40) NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

CREATE TABLE `user_has_photo_has_label` (
  `user_has_photo_user_id` INT NOT NULL,
  `user_has_photo_photo_id` INT NOT NULL,
  `label_id` INT NOT NULL,
  PRIMARY KEY (`user_has_photo_user_id`, `user_has_photo_photo_id`, `label_id`),
  INDEX `fk_user_has_photo_has_label_label1_idx` (`label_id` ASC),
  CONSTRAINT `fk_user_has_photo_has_label_user_has_photo1`
    FOREIGN KEY (`user_has_photo_user_id` , `user_has_photo_photo_id`)
    REFERENCES `user_has_photo` (`user_id` , `photo_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_user_has_photo_has_label_label1`
    FOREIGN KEY (`label_id`)
    REFERENCES `label` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

CREATE TABLE `colour` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `colour` VARCHAR(10) NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

CREATE TABLE `user_has_photo_has_label_has_colour` (
  `user_has_photo_has_label_user_has_photo_user_id` INT NOT NULL,
  `user_has_photo_has_label_user_has_photo_photo_id` INT NOT NULL,
  `user_has_photo_has_label_label_id` INT NOT NULL,
  `colour_id` INT NOT NULL,
  PRIMARY KEY (`user_has_photo_has_label_user_has_photo_user_id`, `user_has_photo_has_label_user_has_photo_photo_id`, `user_has_photo_has_label_label_id`, `colour_id`),
  INDEX `fk_user_has_photo_has_label_has_colour_colour1_idx` (`colour_id` ASC),
  CONSTRAINT `fk_user_has_photo_has_label_has_colour_user_has_photo_has_lab1`
    FOREIGN KEY (`user_has_photo_has_label_user_has_photo_user_id` , `user_has_photo_has_label_user_has_photo_photo_id` , `user_has_photo_has_label_label_id`)
    REFERENCES `user_has_photo_has_label` (`user_has_photo_user_id` , `user_has_photo_photo_id` , `label_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_user_has_photo_has_label_has_colour_colour1`
    FOREIGN KEY (`colour_id`)
    REFERENCES `colour` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

that’s lovely

i’m on my way out but at first glance it will do splendidly

not sure why a label would have colours, but let’s carry on because the structure is the issue

EDIT –

you would want an auto_increment on user_has_photo_has_label_has_colour

  1. less noise and hassle, check

  2. not having to join… bzzzt, wrong :slight_smile: (i’m just having fun)

sample query: which users have used lavender as a label – you need an extra join

Just assume it’s a super critical feature to have user configurable colourful labels :smiley:

actually, I wanted an AI on user_has_photo_has_label and reference it in user_has_photo_has_label_has_colour - but it looks like you meant that?

Did you mean lavender in a label? You are right on this one, this would need an extra join. And also “which photos are labelled with lavender colour” would also need an extra join.

well, I never said anything about not having to join!

Ok, let me check :slight_smile: When looking for users or photos which have a label of a colour - more hassle because of an extra join.

But those are not common queries I would want to run. A much more common query would be one used to fetch labels with their colours on a page. Let’s say there is a page with a user’s favourited photo (corresponding to user_has_photo) and I want to fetch all labels with their colours. How would you do this? Here is my take:

SELECT label.name,
    (SELECT GROUP_CONCAT(colour)
     FROM colour
     
     INNER JOIN user_has_photo_has_label_has_colour
     ON user_has_photo_has_label_has_colour.colour_id=colour.id
     
     WHERE
      user_has_photo_has_label_has_colour.user_has_photo_has_label_user_has_photo_user_id=user_has_photo_has_label.user_has_photo_user_id
      AND user_has_photo_has_label_has_colour.user_has_photo_has_label_user_has_photo_photo_id=user_has_photo_has_label.user_has_photo_photo_id
      AND user_has_photo_has_label_has_colour.user_has_photo_has_label_label_id=user_has_photo_has_label.label_id
    ) AS label_colours
     
    FROM user_has_photo_has_label
    
    INNER JOIN label
    ON label.id=user_has_photo_has_label.label_id
    
    WHERE
     user_has_photo_has_label.user_has_photo_user_id=1
     AND user_has_photo_has_label.user_has_photo_photo_id=2

If I had an AI PK on user_has_photo_has_label then this would be simpler:

SELECT label.name,
    (SELECT GROUP_CONCAT(colour)
     FROM colour
     
     INNER JOIN user_has_photo_has_label_has_colour
     ON user_has_photo_has_label_has_colour.colour_id=colour.id
     
     WHERE
      user_has_photo_has_label_has_colour.user_has_photo_has_label_id=user_has_photo_has_label._id
    ) AS label_colours
     
    FROM user_has_photo_has_label
    
    INNER JOIN label
    ON label.id=user_has_photo_has_label.label_id
    
    WHERE
     user_has_photo_has_label.user_has_photo_user_id=1
     AND user_has_photo_has_label.user_has_photo_photo_id=2

In this case the simplification is in the subquery WHERE clause and not in a join but this is what I meant. You are right that some selects will require an additional join with the added AI PK but considering those would be uncommon selects I’m not yet 100% sold - this would depend on specific requirements. The 3-column WHERE clause in the first subquery is what I mean by “noise and hassle” :slight_smile:

[quote=“Lemon_Juice, post:20, topic:197054, full:true”]
well, I never said anything about not having to join!
[/quote]quite right, you said the join would involve more columns, which is kinda just “less noise and hassle” again

it was me who mentioned you’d have to do extra joins :wink:

you’re using an unnecessarily complex query which makes it harder to write than it should be

try this –

SELECT label.name , GROUP_CONCAT(colour.colour) AS colours FROM user_has_photo_has_label AS upl INNER JOIN label ON label.id = upl.label_id INNER JOIN user_has_photo_has_label_has_colour AS uplc ON uplc.user_id = upl.user_id AND uplc.photo_id = upl.photo_id AND uplc.label_id = upl.label_id INNER JOIN colour ON colour.id = uplc.colour_id WHERE upl.user_id = 1 AND upl.photo_id = 2 GROUP BY label.name

(some unnecessarily verbose names were mercifully renamed)

you are welcome to rewrite this for your auto_increment scenario

:slight_smile:

These table names are hurting my brain. :-p Why wouldn’t you call that sucker “user_favorite_photo_label_colors”?

Overall I agree that your practical usage example looks like a situation where it’s better to have a single-column primary key in what would otherwise be a junction table. But I think you need to emphasize more clearly that the problem arises only when you need to reference the junction as a foreign key. Because otherwise, the composite key could span 10 columns and it wouldn’t be a problem. It’s only when you need to relate the junction as a foreign key that even a 2-column key would be inconvenient.

For what it’s worth, in Doctrine terminology (circling back to OP), we would probably stop thinking of these things as junction tables and start thinking of them as separaite entities, and rather than a single many-to-many relationship with an implicit junction table, instead there would be a new entity with two one-to-many relationships. In Juice’s example, for instance, there would be a UserFavoritePhoto entity with a one-to-many relationship with users and a one-to-many relationship with photos.

in any data modelling discussion as well

only techies think of “junction” tables as instantiations of many-to-many relationships

each FK is its own entity relationship

especially in grandchildren structures like userfavouritephotos_gots_labels_gots_colors

To my brain they are of equal complexity - you are doing one more join while I’m doing a subquery. But certainly yours is much better formatted and the aliases also make it more readable.

I will do it in my mind - I’m getting rid of the two conditions in the second join :smile:

This was a very quick act of designing a database and I just let Mysql Workbench leave the default table names :smiley:

I wouldn’t say it is very inconvenient - it can be a slight inconvenience having to use a composite key but as r937 showed getting rid of the composite FK can have its drawbacks, too.

Correct - but I think a good ORM would be able to make out all the relations based on the composite keys without having to replace them with AI primary keys - in other words, this would be transparent to the user (developer).

Yes, but I fear that the deeper we go down the chain of relationships the farther the translated objects would be in appearance and behaviour from the database and we would end up with two very distinct worlds. That is of course the goal of ORM but I’m not sure if this would make things easier for the developer. I’ve never really queried such complex data structures with ORM but I suppose the task of getting all user’s labels with colours would be quite an undertaking without resorting to SQL.