Is it a bad practice to use primary key in junction tables?

Let’s say I’ve got three tables: user, photos and userxphotos. I use userxphotos table as a junction table, but as far as I understand, Symfony2 doesn’t allow that kind of table because I couldn’t create the entity class, it throws an error about primary keys (which doesn’t exist).

Is it a bad practice to use primary key (Auto incremental, of course) in junction tables?

It’s not bad practice, but can be unecessary in some situations. Personally, I prefer them because it makes for easier joins on tables. And since MOST entity frameworks require a primary key, it doesn’t hurt to have it.

What you COULD do is create a view which does the joins for you and use that in your entity framework. Not the most elegant solution, but it does work if you don’t want to override the SQL in you data access layer.

Edit: Corrected statement above - Rudy’s questioning it made me re-think it and realize I was incorrect. See my response to him below…

1 Like

Oh, thank you.

please, do explain this point, because i just cannot see it

bad insofar as it’s unnecessary

also, there is the whole clustering index issue, which will store junction rows in creation sequence, rather than clustering them by one or the other of the two constituent foreign keys (assuming the pair is made PK)

Sorry - it was too early in the morning, and I wasn’t caffeinated enough. I was thinking parent/child table relationships, not junction tables.

I’ll go back and re-iterate that using a view (or stored procedure, though that would be overkill 90% of the time) would get around this entity framework issue

[quote=“DaveMaxwell, post:6, topic:197054, full:true”]… get around this entity framework issue
[/quote]
i’m afraid i don’t understand the issue

OP said “Symfony2 doesn’t allow that kind of table because I couldn’t create the entity class, it throws an error about primary keys (which doesn’t exist).”

perhaps OP did not try declaring the pair of FKs as the PK…

every table should have a PK

the PK for a junction table should/will be the pair of FKs

whichever of the two is chosen as the leading column in the composite PK, determines the storage sequence (see clustering index http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html)

More specifically, if you’re using Doctrine with Symfony2 (likely), then Doctrine will create and manage the junction table for you, automatically.

See their documentation on many-to-many mappings.

1 Like

If I’m understanding him correctly, it’s a framework/code/Symfony issue, not a db issue. I think he’s talking about Symfony requiring each database table to have a PK defined as a standard so that it can be accessed via it’s framework entity (i.e. PHP) structure. It has nothing to do with logical or unique keys, just a standard table structure component.

C# has a similar constraint in that if you want to access a table via the entity framework within the code, it by default required a primary key defined that is either a uniqueidentifier (the standard), or a GUID. You can have composite keys but it requires you to jump through a dozen hoops, and just becomes a maintenance nightmare.

It’s a bit annoying, but easily gotten around. You can either access via hard coded SQL (which people avoid like the plaque for some reason), or you can create views/stored procs which return bogus keys along with the result sets.

Having a composite primary key is certainly the ‘proper’ way to do it and then there should be no problem with the rows being represented as entity objects provided the ORM supports such configuration. However, I find I have a limit of tolerance on how many columns the PK will span - it’s OK for me to have two columns but once I had a 3-column PK and still needed to use it as a FK in another table - that seemed like too much noise and hassle to have all those columns and having to join by all of them - instead I just added a separate integer auto-increment PK and used that happily as the PK and FK.

Many ORMs and ActiveRecord libraries don’t allow composite keys. Just saying…

Why make life more difficult.

[quote=“oddz, post:12, topic:197054, full:true”]Why make life more difficult.
[/quote]
exactly my thinking!!

declare that auto_increment only when you have to… otherwise, don’t make life more difficult

before i discuss your “too much noise and hassle” as well as “having to join by all of them”…

… let me say that i’ve never seen a 3-column PK where none of the individual columns was a FK to somewhere else

if you have such an example, please do share

Did I say none of the individual columns was a FK? Sure they were FKs but that wasn’t the point I was making. These 3 columns (apart from being FKs) formed a composite PK of the table (let’s call it table A) and I had yet another table B which I needed to connect to A with a FK. If I kept the 3-column PK in table A then I’d have to use a 3-column FK in table B. When I added a single column PK in table A then I could use a single column FK in table B. Overall I didn’t see any problems with that and selects and joins became clearer because there was only one referencing column instead of 3. If there’s any problem with that I’d be happy to know!

Okay, now I’m officially lost. :smiley: I managed to solve the issue adding a primary & auto incremental ID for each table.

I still read the replies and try to understand.

[quote=“Lemon_Juice, post:15, topic:197054, full:true”]
Did I say none of the individual columns was a FK? Sure they were FKs but that wasn’t the point I was making. [/quote]i understood your point full well

in fact, you made two points – “too much noise and hassle” as well as “having to join by all of them”

i’m pleased you’ll let me try to explain my point too

however, i really wish we could use an actual example

i’d hate to lose you by referring to columns such as “column1” and “column2” because the concept is way easier to understand with actual columns and not placeholders

any chance you could make up an example?

Ha! I couldn’t find the original database where I had this particular case so I made a new example, maybe it’s slightly on the edge of practical realness but should illustrate the point well:

https://dl.dropboxusercontent.com/u/13230070/screenshots/fks-model.png

Practical usage: there are users and photos and each user can choose any number of favourite photos (user_has_photo). Each photo he adds to his favourites he can annotate with any number of labels. Each label appears on the site in shape of a rectangle and the rectangle can consist of any number of colours chosen by the user - then a rainbow-like gradient is formed from the colours the user has chosen for his label of his favourited photo.

As you can see the table user_has_photo_has_label_has_colour has a 4-column PK and a 3-column FK that connects it to user_has_photo_has_label. If user_has_photo_has_label had an additional one-column PK, then we could use it in user_has_photo_has_label_has_colour instead of the thee-column FK - and to me it would look a bit clearer.

So, what do you say? :smiley:

SQL:

CREATE TABLE `user` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

CREATE TABLE `photo` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `filename` VARCHAR(50) NOT NULL,
  `description` TEXT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

CREATE TABLE `user_has_photo` (
  `user_id` INT NOT NULL,
  `photo_id` INT NOT NULL,
  PRIMARY KEY (`user_id`, `photo_id`),
  INDEX `fk_user_has_photo_photo1_idx` (`photo_id` ASC),
  CONSTRAINT `fk_user_has_photo_user`
    FOREIGN KEY (`user_id`)
    REFERENCES `user` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_user_has_photo_photo1`
    FOREIGN KEY (`photo_id`)
    REFERENCES `photo` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

CREATE TABLE `label` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(40) NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

CREATE TABLE `user_has_photo_has_label` (
  `user_has_photo_user_id` INT NOT NULL,
  `user_has_photo_photo_id` INT NOT NULL,
  `label_id` INT NOT NULL,
  PRIMARY KEY (`user_has_photo_user_id`, `user_has_photo_photo_id`, `label_id`),
  INDEX `fk_user_has_photo_has_label_label1_idx` (`label_id` ASC),
  CONSTRAINT `fk_user_has_photo_has_label_user_has_photo1`
    FOREIGN KEY (`user_has_photo_user_id` , `user_has_photo_photo_id`)
    REFERENCES `user_has_photo` (`user_id` , `photo_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_user_has_photo_has_label_label1`
    FOREIGN KEY (`label_id`)
    REFERENCES `label` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

CREATE TABLE `colour` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `colour` VARCHAR(10) NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

CREATE TABLE `user_has_photo_has_label_has_colour` (
  `user_has_photo_has_label_user_has_photo_user_id` INT NOT NULL,
  `user_has_photo_has_label_user_has_photo_photo_id` INT NOT NULL,
  `user_has_photo_has_label_label_id` INT NOT NULL,
  `colour_id` INT NOT NULL,
  PRIMARY KEY (`user_has_photo_has_label_user_has_photo_user_id`, `user_has_photo_has_label_user_has_photo_photo_id`, `user_has_photo_has_label_label_id`, `colour_id`),
  INDEX `fk_user_has_photo_has_label_has_colour_colour1_idx` (`colour_id` ASC),
  CONSTRAINT `fk_user_has_photo_has_label_has_colour_user_has_photo_has_lab1`
    FOREIGN KEY (`user_has_photo_has_label_user_has_photo_user_id` , `user_has_photo_has_label_user_has_photo_photo_id` , `user_has_photo_has_label_label_id`)
    REFERENCES `user_has_photo_has_label` (`user_has_photo_user_id` , `user_has_photo_photo_id` , `label_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_user_has_photo_has_label_has_colour_colour1`
    FOREIGN KEY (`colour_id`)
    REFERENCES `colour` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

that’s lovely

i’m on my way out but at first glance it will do splendidly

not sure why a label would have colours, but let’s carry on because the structure is the issue

EDIT –

you would want an auto_increment on user_has_photo_has_label_has_colour

  1. less noise and hassle, check

  2. not having to join… bzzzt, wrong :slight_smile: (i’m just having fun)

sample query: which users have used lavender as a label – you need an extra join

Just assume it’s a super critical feature to have user configurable colourful labels :smiley:

actually, I wanted an AI on user_has_photo_has_label and reference it in user_has_photo_has_label_has_colour - but it looks like you meant that?

Did you mean lavender in a label? You are right on this one, this would need an extra join. And also “which photos are labelled with lavender colour” would also need an extra join.

well, I never said anything about not having to join!

Ok, let me check :slight_smile: When looking for users or photos which have a label of a colour - more hassle because of an extra join.

But those are not common queries I would want to run. A much more common query would be one used to fetch labels with their colours on a page. Let’s say there is a page with a user’s favourited photo (corresponding to user_has_photo) and I want to fetch all labels with their colours. How would you do this? Here is my take:

SELECT label.name,
    (SELECT GROUP_CONCAT(colour)
     FROM colour
     
     INNER JOIN user_has_photo_has_label_has_colour
     ON user_has_photo_has_label_has_colour.colour_id=colour.id
     
     WHERE
      user_has_photo_has_label_has_colour.user_has_photo_has_label_user_has_photo_user_id=user_has_photo_has_label.user_has_photo_user_id
      AND user_has_photo_has_label_has_colour.user_has_photo_has_label_user_has_photo_photo_id=user_has_photo_has_label.user_has_photo_photo_id
      AND user_has_photo_has_label_has_colour.user_has_photo_has_label_label_id=user_has_photo_has_label.label_id
    ) AS label_colours
     
    FROM user_has_photo_has_label
    
    INNER JOIN label
    ON label.id=user_has_photo_has_label.label_id
    
    WHERE
     user_has_photo_has_label.user_has_photo_user_id=1
     AND user_has_photo_has_label.user_has_photo_photo_id=2

If I had an AI PK on user_has_photo_has_label then this would be simpler:

SELECT label.name,
    (SELECT GROUP_CONCAT(colour)
     FROM colour
     
     INNER JOIN user_has_photo_has_label_has_colour
     ON user_has_photo_has_label_has_colour.colour_id=colour.id
     
     WHERE
      user_has_photo_has_label_has_colour.user_has_photo_has_label_id=user_has_photo_has_label._id
    ) AS label_colours
     
    FROM user_has_photo_has_label
    
    INNER JOIN label
    ON label.id=user_has_photo_has_label.label_id
    
    WHERE
     user_has_photo_has_label.user_has_photo_user_id=1
     AND user_has_photo_has_label.user_has_photo_photo_id=2

In this case the simplification is in the subquery WHERE clause and not in a join but this is what I meant. You are right that some selects will require an additional join with the added AI PK but considering those would be uncommon selects I’m not yet 100% sold - this would depend on specific requirements. The 3-column WHERE clause in the first subquery is what I mean by “noise and hassle” :slight_smile: