Unique Key and NULLs

DoubleDee · September 19, 2013, 7:18pm

From what I have read - and did not know before - a MySQL field with a Unique Key/Index can contain NULLs.

Starting a few months ago, when I create a new table, I like to make the first field be “ID”, and set it as AutoIncrement, Null = No, Index = Unique. (That serves as a pseudo Primary Key.)

Then, when possible, I look for another field which has a “natural” key in it (e.g. article_slug, username, etc), and I make that my Primary Key. (If I can’t find a “natural” key, then I would just make the “ID” the Primary Key.)

Any issues with that approach?

To me, even if a Unique Indexed field in MySQL can contain NULLs, that wouldn’t be able to happen if you have the field set as AutoIncrement, Null = No. (Unless someone tinkered with things on the backend. And since only I have access to the database, that wouldn’t apply.)

Sincerely,

Debbie

r937 · September 19, 2013, 10:41pm

none that i can think of, and it has several advantages

the only minor issue is that if you can get away with creating a table with a PK only, then you save the overhead of the additional index

DoubleDee · September 19, 2013, 11:08pm

For consistency, would it be better to make my Surrogate ID fields the Primary Key, and then make any Natural Keys the Unique Key?

For example, in MySQL, which one is better…

(Current way)


ARTICLE
- id (UK)
- slug (PK)

(Alternative way)


ARTICLE
- id (PK)
- slug (UK)

Likewise, for a Junction Table, which approach is better…

(Current way)


ARTICLE_THREAD
- id (UK)
- article_id (PK1)(FK)
- member_id (PK2)(FK)
- subscribed_on (PK3)

(Alternative way)


ARTICLE_THREAD
- id (PK)
- article_id (UK1)(FK)
- member_id (UK2)(FK)
- subscribed_on (UK3)

Sincerely,

Debbie

r937 · September 20, 2013, 12:07am

put that question into the context of whether any part of the PK is going to allow NULLs, and it practically answers itself

DoubleDee · September 20, 2013, 1:45am

I don’t follow you…

Your statement is backwards.

The question not whether a Primary Key field can hold NULLs. (No it cannot!)

The question is whether a field defined with a Unique Index and Null = No is any more likely to allow Nulls (I said “Null = No”, right??) than a Primary Key.

And I am saying, “No!! There is no difference!”

In the one of the example tables I provided above…


ARTICLE
- id
- slug

“ID” is defined as MediumInt, AutoIncrement, Null = No (**So there can never be Nulls.)

“SLUG” is defined as Varchar(100), Null = No (**So there can never be Nulls.)

Since neither of those fields allows for NULLs without any Index at all, in my mind it should not matter if I choose to make each Unique Index or a Primary Key.

Right?

That was my original belief, and then some Oracle guy I know at work came along and made it seem like it was suicide to use a field that had a Unique Index on it in a relationship.

The more I think about it, that argument not only makes little sense, since I have just shown that it is the Null = No attribute which determines if a field is NULL or not. But think about it…

In MySQL, most people use a generic INDEX for Foreign Keys. And an Indexed field can hold Nulls, just as a Unique Index can hold Nulls. And yet you don’t hear anyone saying, “Your Foreign Key fields must be set as a Primary Key to disallow Null values.”

Right?

My point being, unless you make your Foreign Key a Primary or Composite Primary Key - which a lot of people don’t - you’d have the risk of there being Nulls using either INDEX or UNIQUE. Of course, the way to prevent this issue is by defining the attribute (right word?) Null = No.

Correct?

After thinking about this over supper, as long as my key fields have a Null = No, I really don’t see where using either of the two combinations below would make any real difference…


ARTICLE
- id (PK)
- slug (UK)


ARTICLE
- id (UK)
- slug (PK)

Care to differ?

Sincerely,

Debbie

DoubleDee · September 20, 2013, 2:06am

Since you doubt that I ever test anything…


CREATE TABLE `_test` (\
  `id` int(10) unsigned NOT NULL COMMENT '(UK)',\
  `slug` varchar(100) NOT NULL COMMENT '(UK)',\
  UNIQUE KEY `idx_u_id` (`id`),\
  UNIQUE KEY `idx_u_slug` (`slug`)\
) ENGINE=InnoDB DEFAULT CHARSET=latin1


INSERT INTO _test
(id, slug)
VALUES (NULL, 'test')


Error

SQL query:

INSERT INTO _test(
id,
slug
)
VALUES (
NULL , 'test'
)

MySQL said: Documentation
#1048 - Column 'id' cannot be null


INSERT INTO _test
(id, slug)
VALUES (1, NULL)


Error

SQL query:

INSERT INTO _test(
id,
slug
)
VALUES ( 1, NULL )

MySQL said: Documentation
#1048 - Column 'slug' cannot be null

I don’t see where a Primary Key field offers a whole lot more than a Unique Index (other than you don’t have to define Null = No explicitly with a PK)…

Sincerely,

Debbie

Jeff_Mott · September 20, 2013, 2:19am

I think it will make your application less flexible to changing requirements. What if one day, for example, you decide that users should be able to change their username? It would be awfully difficult to implement that feature if you used the username as a “natural” key, because every username would be repeated throughout other tables as a foreign key.

And I don’t think you would even get any benefit for your trouble. The only benefit I can fathom is the expectation of some kind of performance boost. Are natural keys supposed to perform faster? How much faster? Are they supposed to use less disk space? How much less? If you don’t know the answer to those questions, then you’re going into this blind, and it may turn out that you’re sacrificing flexibility for just a micro-optimization – or it may turn out not to be an optimization at all.

DoubleDee · September 20, 2013, 3:29am

My member table is one exception to my new rule of making “ID” a UK and finding a natural key for the PK. For that table, I left “ID” as the PK and have “username” as a UK.

Here is where I am coming from… (Right or not?!)

1.) Records - and most things in the Universe - are easier to refer to when they have a short, unique name.

This is why God create Serial #'s, SKU’s, Tele #'s, and so.

When I am developing, and I need to visually find a record for whatever reason, it is much easier to look for record 1234 than for “The record where field1 = ‘John Doe’ and field2 = ‘1970-02-01’ and field3 = ‘Arizona’…”

And in a production database with 100,000 records, that feature is even more useful.

2.) Creating a Surrogate Key called ID using AutoIncrement creates a nice, standard, expected numbering scheme for every table.

If there is also a good Natural Key, then you have choices! And if one doesn’t exist, you always have your Surrogate Key.

3.) At the same time, just creating an ID field that is your PK is academic in that it only ensures unique ID’s for records and not unique entities for records.

What good is a PK if you have this…


id	username
---	---------
1	DoubleDee
2	DoubleDee
3	DoubleDee
4	DoubleDee

If you say, “Well, put a UK on the ‘username’ field.”

Then I say, “Why not make ‘username’ your PK and keep ‘id’ as a secondary surrogate?” (Then when you decide to allow people to change Usernames, you have a fall-back if needed.)

Off Topic:

Primary Keys should not change. But if they need to, there is ON UPDATE CASCADE. Need to change ‘AmySmith77’ to ‘AmySmithJohnson77’? Then just let ON UPDATE CASCADE do the work automatically for you…

4.) If you just use Natural Keys, then you need to be aware that MySQL - or at least phpMyAdmin - sorts records not by order entered but alphabetically by the PK.

Need t know what order things were entered into the database? Hope you have a created_on field… (Unless, of course, you have an ID!)

5.) Having an ID field which is an Integer, will perform better than a VARCHAR(100), although some say it is negligible.

Of course, if you always have an ID field plus a Natural Key, you have more options…

6.) If I have an ID plus a Natural Key, I would normally use the ID for joins, but in some cases like my (former) article_placement table, this is easier to read…


id	section		subsection	article
---	--------	-----------	--------
27	finance		accounting	why-you-need-to-hire-a-cpa

Than something cryptic like this…


id	section		subsection	article
---	--------	-----------	--------
27	1		5		3795

In the past I would just create an ID field that was my PK and leave it at that.

Then I had to work with a really complex scheme of organizing the content of my website, and I saw the benefits of having Natural Keys in Junction Tables because they were “self-identifying”.

At the same time, I realized what a PITA it was to not be able to refer to a record by a simple number like 27.

So since them, I have been trying to have it BOTH WAYS, and I thought I had the perfect approach until some people at work started shooting down my ideas.

Based on my tests posted above, I still think my new approach holds water pretty well, but who knows?!

Sincerely,

Debbie

oddz · September 21, 2013, 10:00pm

Jeff_Mott:

I think it will make your application less flexible to changing requirements. What if one day, for example, you decide that users should be able to change their username? It would be awfully difficult to implement that feature if you used the username as a “natural” key, because every username would be repeated throughout other tables as a foreign key.

And I don’t think you would even get any benefit for your trouble. The only benefit I can fathom is the expectation of some kind of performance boost. Are natural keys supposed to perform faster? How much faster? Are they supposed to use less disk space? How much less? If you don’t know the answer to those questions, then you’re going into this blind, and it may turn out that you’re sacrificing flexibility for just a micro-optimization – or it may turn out not to be an optimization at all.

That is a good answer. In more cases than not I tend to error on the side of caution and just use auto increment PK. If something needs to be unique than I just make it unique end of story. I find that dealing with integer PKs is much easier than dealing with strings anyway. Especially when it comes to passing that type of info in URLs. There are several problems that are likely to occur if user input is used to reference items in the database that I just rather not deal with and succumb to using auto increments.