I’m sorry but you are giving out only half of the information, leaving many things to speculation.
What Shaydez is doing is perfectly fine from performance point of view.
I disagree with you that in MOST cases there are better choices for primary key, even if natural key seems like the best choice - most likely it isn’t.
Let’s review a few cases you’ve mentioned.
I’ll take the popular supplier_code that every supplier hands out to merchants.
Even if it seems like a perfect natural key candidate, it has the following implications:
-
Supplier code is most likely a sequence of characters, forcing you to use non-integer column with variable length. Non-integer variable column length has implications with character encoding, creating the B-tree index.
-
Non-integer key in this particular case would mean that your junction tables would have to store the variable character key as their foreign key, leading again to a larger index.
-
Having non-sequential primary key in MyISAM and InnoDB storage engines has performance implications when writing the data and when retrieving the data in terms of I/O. I won’t go into specifics, answers to that question are explained at many blogs in detail.
-
Assuming you will never have two suppliers whose codes will clash is wishful thinking. Just so we’re on the same page, on my recent project I’ve had 10 supplier codes clashing. All coming from different suppliers.
Having a constraint on uniqueness of supplier code is bad.
However this can be avoided by having a composite key in place (supplier_id, supplier_code). But then again, if supplier_id is yet another varchar - the larger the database, the worse the performance compared to having integer key.
Now someone else might point out how using composite keys is bad and we could go on all night what’s good and what isn’t.
In my experience when an auto-increment is used as the primary key on every table then it is a bad design because there is usually a better natural key to use for at least one, usually most and sometimes even all the tables. I have seen databases where all of the tables have an auto-increment primary key immediately followed by a unique identifier that would be a perfect natural key.
If the sequential column is used for internal use of the application built upon the database then it cannot be “bad design”.
Theory does imply one thing, however it is_not bad to have an integer uniquely identifying an entity in the table opposed to alternative (natural key in form of a character) if the application uses that integer key extensively.
There are a number of additional issues that you need to handle in code when you use an auto-increment key (and duplicate inserts is one of those which is why I brought up avoiding the use of auto-increment when it isn’t needed in the first place since it is only with auto-increment that you can have two or twenty billion duplicated records inserted into your database if you don’t handle it in code - since without the auto-increments duplicated records are not possible).
Actually, it’s very SIMPLE for a seasoned programmer to avoid the double entry problem.
However, if there is a malicious user and if your application is of the type that it allows public access (guestbooks, forums) - it really isn’t hard to create a script / program / bot which will flood the database, be it a duplicate or not.
If there’s a chance that user might accidentally send the same data, there are number of ways to prevent that:
- flood control in form of having X seconds before allowing next send.
- redirect after the submit was done
- use javascript to manipulate sending options (disable the form, and so on)
all of which don’t require any interaction with the database or which rely on the database to provide the error.
Another disadvantage of auto-increment is that it is mySQL proprietary and while most other relational databases do have their own way of achieving the same thing your code is less portable if you do use one.
I’ve never encountered a client that had in mind moving to another way of storing the data. Creating a bespoke application having in mind that it can (for some unclear reason) require a different storage engine (flat file, another RDBMS) is terrible approach to application design in my book.
MySQL is perfectly fine for various tasks (it serves Google’s Adwords, I think that proves it’s capable of operating under stress), if one creates application for MySQL then it should work with MySQL. However, other relational databases have triggers for auto incrementing tasks, that’s where it originates from in MySQL.
I’m not a fan of ORM’s or abstraction layers, the whole argument of avoiding a great feature in favor of someone deciding it’s better to use X than Y isn’t a good enough argument.
Using auto-increment automatically for every primary key demonstrates a complete lack of knowledge of what a relational database is. Using them for those tables where the fully normalised logical design indicates that they would be useful is what the option is there for in the first place.
I disagree with you here.
I’m not a fan of sticking auto_increment everywhere, but as I mentioned before - not having a sequential integer as primary key has performance implications.
Of course, junction tables which connect various entries from other tables in M:M relation don’t need to have one, assuming the foreign keys are integers themselves.
But imagine the case where you had users table (PK: username), user_categories table (PK: category_name) and junction table (user_id, category_id). Using natural keys - you have to specify each users’ username when trying to connect them to a corresponding category. It’s much more efficient to use an integer as primary key and put a unique constraint on the natural key to preserve some imaginary integrity, even tho it’s not right with theory.
This might seem I’m advocating for the use of auto_increment everywhere you can. I’m a fan of clean design and efficiency, I wouldn’t use something where it’s not good to use it. Based on target population of application, one should bear in mind what’s the best thing to use.
However, advocating avoiding auto_increment in favor of (seemingly) better natural key should come with a detailed warning and explanation.
Everything has their pros and cons, and while I do agree with you on some things - you cannot generalize things in IT world and label them right or wrong without having all of the facts in front of you.