Best database structure for this example

Hi!

I need to create a database of users who will be identified by the following fields: id (primary key), username, password, role.

The role field would be a number that I’m going to use to show information only to certain users (if a user has role 3 he can see information entered by users of level 1, 2 and 3 but not info entered by users of role 4).

Do you think it’s better to keep a single table or is it better to split it into 2 tables, the first one containing the fields id, username and password and the second one containing id and role?

Ok, I’m happy about that ;D

Wait a second, but if I use the username field as the key instead of using an integer id, I’m not using any extra space because the username must be there anyway. Or is there something that I’m missing here?

The way I would approach this is as follows. There are two circumstances where properties are pushed off into a new table

  1. Where there is a 1:many relationship between the object and the property
  2. Where most objects will not have a subset of properties

So, in this question, the fundamental question is if it is ever possible that a user could have more than one role. If these are mutually exclusive (eg heirachical) then there is no need for a separate table.

Irrespective, there is nothing to stop there being a separate table where the role is enumerated (eg 3 = ‘administrator’)

The argument about primary key of username or ID - apart from the humour of tit for tat *****ing - I would always go for a key since yes, this means an extra column on this table but it also means that every other table that the user touches does not need a string column for the foreign key. It also means that I can offer the option to change the username without having to visit every other table to change the key.

Wow dportas… that has to be the greatest first post ever. Welcome to the forums.

In my experience (which is admittedly not as extensive as the gurus above) I never use a field that I will display to a user as a key field. I have been caught out enough times believing that a field would always be unique then later on finding that I it isn’t.

My rule of thumb is therefore that I don’t use an actual data field as a key. That generally translates into every table having a auto-increment Id field as the key to the table.

nope :slight_smile:

I think he’s suggesting that storing the username for the FKs takes a lot more disk space than storing INTs for FKs.
Which is true of course.

i don’t understand the point you’re making

if username (string) is the primary key, then yes, it would get used whenever a row in a related table is inserted

if userid (integer) is the primary key, then yes, it would get used whenever a row in a related table is inserted

the difference is, if userid (integer) is the primary key, it gets an index, but you ~also~ have to have a unique index on username

if username (string) is the primary key, then you have only that one index, not two

If you use the username as the key, in any other tables which have a relationship to those fields you would have to put the username in them as well.

Example:
users table

  • username
  • display_name
  • password

user_posts

  • posted_by <- username used in this field
  • post
  • date

If username is your id, then you’d have to use it for every one of your posts… unless r937 knows some magic trick I don’t…

Unfortunately Third Normal Form is often misquoted in this way and that can lead to a lot of confusion. The more precise description of 3NF is that a relation R satisfies 3NF if and only if, for every functional dependency A->B satisfied by R, ONE of the following three conditions is true. Either:

(1) B is a subset of A (i.e. A->B is “trivial”)
(2) A is a superkey
(3) B is a subset of a candidate key.

This could be written very loosely and informally as “every non-key attribute is dependent on every candidate key”. The critical and often overlooked words are every candidate key. Like all normal forms, 3NF is concerned with candidate keys and is entirely agnostic about primary keys. It’s very unfortunate that there are so many examples around that only show tables with one key and that go on to explain 3NF in terms of dependency on a single key without of making it clear that all the keys are equally important.

Also notice condition (3) in the above definition. Transitive (non-key) dependencies do not violate 3NF if the dependent attriubute(s) are part of a key. This is arguably a “mistake” in 3NF that is rectified by Boyce-Codd Normal Form (BCNF). BCNF is equivalent to the above definition with the last condition removed. Many informal descriptions of 3NF miss this last condition and therefore more accurately they describe BCNF instead of 3NF! This is a mistake, but a “good” one because BCNF is “stronger” than 3NF.

For an accurate understanding of 3NF and other normal forms I can recommend David Maier’s “Theory of Relational Databases”, Chris Date’s “Relational Database Dictionary” or “Foundations of Databases” (AKA “The Alice Book”) by Abiteboul et al.

Why does any of this matter?

If you think about it, to define 3NF based on primary keys alone wouldn’t even work. If only primary keys mattered then a bijective relationship (A->B, B->A) would be impossible to represent accurately while respecting 3NF because any relation containing both A and B would always violate 3NF. Bijection is a common data modelling scenario and certainly something that needs to be accommodated in any useful database design methodology. If a single primary key was all that mattered then 3NF would become of doubtful value when dealing with relations with more than one candidate key because either you would be forced to ignore some dependencies or you would have to accept that 3NF was automatically violated.

Another point to notice is that if normalization was only concerned with primary keys then 3NF really would be the ultimate normal form for those PK dependencies. That’s because non-key dependencies and join dependencies of the kind addressed by BCNF and 5NF would not be possible (because they assume there could be multiple keys). We know that isn’t the case and we know that dependencies on all keys are important. That is why 3NF is defined in terms of candidate keys and not primary keys.

What does any of this mean for the specific example in this thread? I’m always hesitant to comment on normalization problems unless the intended set of keys and dependencies is clearly stated. They have not been, so I’ll make some assumptions. If ID and UserName are both candidate keys and if the dependencies we want to enforce include ID->UserName, UserName->Password (…etc) and if Password is not a determinant then a table containing only (ID, UserName,Password) is in 3NF (in fact it’s in BCNF and 5NF as well :slight_smile: ).

That’s a lot of assumptions but given those conditions then it seems that the table I just described would be “fully normalized”. Other interpretations might be possible - we only have column names to go on after all. It would help if anyone wishing to make points about normalization examples would state what keys and dependencies they are intending to enforce. Otherwise effective discussion on such things is virtually impossible.

Hope this helps.

David

What do you mean? o_O If you want to normalize a table you must follow very precise rules.

<snip/> how do you stuff the real people into your database in order that the reap user is there rather than the username field that represents them? <snip/>

The username is the field in the table that represents the user and ALL of the other fields except the id represent facts about the user that the username represents.

The id is either a surrogate for the username (in which case it does not belong in the normalised logical database design and is added as part of the denormalisation process of converting the logical design into a physical implementation OR if the id does belong in the logical design then it implies that username is not unique in which case two tables are required since the rest of the data relates to username and the one username can be attached to multiple ids and then two tables are required.

<snip/>

I think littlened was referring to if you give your roles names, such as Author, Manager, Administrator etc. In that case, you would need a second table to contain that data in the occurrence of a name change later. This makes more sense then using 1, 2, 3 as they have meaning.

In the ID vs. username battle, I’m not going to enter that argument! Its all to do with personal preference and what you want your system to do. The normalisation rules do have some leeway for interpretation.

totally agree :slight_smile:

totally disagree :slight_smile:

how can you “rename” a role when it doesn’t have a name?

or were you thinking that you’d somehow need to give it a new auto_increment number? if so, why?

Sometimes I need to do a similar thing, but I usually create them in seperate table…

users table
id int primary key auto increment
name
email
password
role_id

roles table
id int primary key auto increment
description

That way, if I’m having any kind of user interface, I can have a drop down that shows the description of each role, rather than asking a use to select from 1 - 4. It also means, that should I need to rename a role at a later date, it’s much easier to do so without effecting any of the logic of the site.

<snip/>

There are fields there that are not dependent on the key, the whole key and nothing but the key and so it is not normalised.

To get a table into third normal form you must “Remove non-key columns representing facts about other non-key columns into a separate table” so since all of the fields except the key and the username are dependent on the username to properly normalise that data you should split it into two tables with username as the primary key of the second table since all of the fields apart from username represent facts about that user.

<snip/>

in a word, no

<snip/>

why is this automatically considered a benefit?

username is a great example of a column where you do not want changes to happen frequently, consequently the ON UPDATE CASCADE option is most appropriate here