Avoid double submit (Refresh / Back button)

Blue1 · September 14, 2010, 7:05pm

I’m also interested to see what “gurus” have to say about having auto_increment as bad practice. Other databases use triggers in order to have a similar feature.

Saying that a database is poorly designed because it has tables with auto_increment fields is such a terrible statement, I’m really wondering what out of the world example would there have to be that auto_increment would be deemed BAD practice.

I’ll add more oil to fire - having FK constraints is much more terrible design in my book than having an auto_increment column. Why? Good luck debugging piece of software using a database with 300+ tables, everything FKeyed to oblivion by, oh so intelligent guy who created the db and read so many books using words like “database integrity” and “foreign keys”.

As for the original topic - Anthony covered it good, Post / Redirect / Get is the way to go.
If there’s a malicious user who’s trying to insert duplicates in order to harm others - that’s another thing, and totally different preventing accidental inserts.

felgall · September 14, 2010, 7:46pm

It is a matter of whether there is another field in the table that is suitable to use as the primary key. If there is then you should use that as the key and not add an extra field just to use autoincrement.

To be in third normal form all the fields in the table must be dependent on the primary key and nothing else and so having an alternative field that could be the primary key usually means that the table is not properly normalised. Now you may have decided to undo that normalisation for efficiency reasons (if the primary key field is huge) but the original logical design doesn’t require an autoincrement in these cases (and rarely does require one if you design the database properly).

The main reason so many people use autoincrement so frequently is that they don’t know how to design databases. Each autoincrement field you add when an alternate field is an appropriate key comes at a cost of adding another way to break the integrity of the data.

I am not saying you should never use autoincrement but the situations where it should be used is a small fraction of those instances where it is used and the simplest way to fix the double insert problem in most cases is to get rid of the unnecessary autoincrement.

Blue1 · September 14, 2010, 8:25pm

The main reason so many people use autoincrement so frequently is that they don’t know how to design databases.

No, the main reason isn’t that people don’t know how to design databases.
The main reason is that an integer uniquely identifying a record within a table is perfectly suited for being a primary key.
It also removes the requirements of creating a trigger or similar application-level controlled means to create a way of creating a key that will uniquely identify a row.

Each autoincrement field you add when an alternate field is an appropriate key comes at a cost of adding another way to break the integrity of the data.

Care to provide an actual real life example which 10,000 web developers can relate to?

I simply fail to grasp why an autoincremented integer fails as a primary key candidate in tables such as the table that contains this topic’s information.
How does it break integrity of the data?

I am not saying you should never use autoincrement but the situations where it should be used is a small fraction of those instances where it is used and the simplest way to fix the double insert problem in most cases is to get rid of the unnecessary autoincrement.

The actual problem was that user was able to hit F5 and resend the data.
Whether you used autoincrement as primary key or not, whether you chose to notify the user that they’re attempting to insert yet another similar record - how does that remove the ability to actually hit F5 and that browser asks you to resend the data or cancel?
Why would you even BOTHER to perform a query in the first place when you can avoid it completely?
Isn’t that yet another overhead?

Also, since I’m purely wondering - if people who don’t know how to design databases are out there, how come that they make such popular software such as phpBB, Invision Board, vBulletin, WordPress and such that use auto_increment columns when used in combination with MySQL? Isn’t your statement a little bit far-fetched when it comes to judging who is able to design databases and who isn’t?

The problem is that someone who’s new to development with MySQL will take this advice seriously and the effect will be just the opposite - instead of teaching someone how to do things properly, the person will be under impression they’re doing things right when they aren’t.

Shaydez · September 14, 2010, 8:45pm

no freakn way… I think the opposite; its bad NOT to use auto_increment. I design databases all the time and you’re basically telling me my database design is all wrong for all the major corporations i design database for; and the teams i worked with and worked all use Auto_Increment to keep a consistency through the ERD / Database designs.

Everyone has their own ways of coding and database design but… in these case its kind of saying Well an Apple is Red… and you’re trying to convince everyone the Apple is really Blue.

Blue1 · September 14, 2010, 8:48pm

I’m not the Apple. Honestly.
(sorry for OT)

Shaydez · September 14, 2010, 9:11pm

lol I’m agreeing with your post Blue.

felgall · September 14, 2010, 10:24pm

No I am suggesting that what you have is a banana and not an apple at all. Apples are round.

Creating autoincrement fields for the sake of having all keys use autoincrement is like using doors for all the openings into your house when in most cases all you really need are windows. you really ought to consider replacing some of those doors in your house with windows as it will make it much easier to keep the burglars out.

Your statement is also like saying that it is bad to not have wheels on all vehicles - and placing wheels on a boat just makes for more places that the water can leak in and placing wheels on skis will prevent you being able to use them for skiing on snow.

Where a table has a legitimate unique natural key field already then not using that as the primary key immediately opens up a whole range of extra things that you need to test for in order to maintain your data integrity that can potentially create a huge unnecessary overhead.

There are times when using an autoincrement primary key may be appropriate but most of the time the database will be hugely more efficient if you let the unique natural key be the primary key.

Unfortunately with computers now being so powerful you can get away with a whole load of ineficiencies that woulld never have worked back when you had to write things properly in order to make efficient use of inadequate resources. There are huge numbers of people creating databases who don’t have the first clue about efficient database design.

I would go so far as to suggest that you really ought to think twice about using autoincrement for a primary key and should carefully document all of the reasons why a special exception should be made for this particular one so that someone who properly understands how to desing databases doesn’t just automatically get rid of it in their redesing without at least considering first whether there are legitimate reasons for it to be there other than to just make the integrity of the data that much harder to maintain.

Doing a proper database design using the appropriate steps in working out how to arrange your data will easily tell you when an autoincrement is and isn’t required. In almost all cases there will be a better candidate key available. Only those who don’t do proper database design end up qwith autoincrements all over the place.

Getting rid of the autoincrements removes almost all of the possible double insertion error situations to start with (which is why it came up in this thread which is after all about getting rid of double submit errors) and is worth doing where ever possible for the saving in complexity of coding that this one situation provides - and there are many other complexities that getting rid of them will also resolve. So the topic of this thread is a reason why it is bad to use autoincrement where there is an alternative key available.

Blue1 · September 14, 2010, 10:39pm

So basically, you have no real world example and your entire argument is that you know how to design databases, we don’t - hence we’re not worthy of an answer with an example that actually illustrates what you’re saying.
Oh, and there’s all this “before” and “resources” argument that makes everything so mystified.
And no, there is no “in most cases there will be a better candidate key available”, it’s totally false. You cannot assume that you know every application design, every need of a client, every possible step forward to extending or maintaining an application so that you can instantly make a judgment what is good and what isn’t.

I’m very sorry to appear rude and I apologize in advance but I think you went ahead of yourself here and have no actual example to prove what you’re saying is right. I’ll keep thinking you’re absolutely wrong, however I am always ready to take back all I said if you prove me wrong.

hash · September 16, 2010, 11:20am

Popular software is not popular because it is well designed (internally at least). You might as well just say Justin Beiber is a musical genius

I think it takes more understanding about what’s right and not right to make this claim. The theory agrees with Stephen, but in practice (particuarly on the web) short cuts are better. No harm in having a discussion, both methods have their merits, for my 2 cents I’d say: auto_inc is best for most situations (web), but you should understand normal form.

You need to quantify that somehow, as asked, provide an example in the real world. If you can’t, you might as well say C is better than PHP and have us all abandon that for the glory that is optimized code.

I think a good example of not using a natural pk is email addresses. They’re a nice unique id, but also personal info that should be hidden.

StarLion · September 16, 2010, 11:33am

The question I have in response is whether you believe a natural key is always right. For example. Simple BB. I have a table of users. Username is a unique - which means you would say i dont need a userid field, just use the username for the key. Great.

Now i need a posts table. And the posts table needs to store who made each post. Well in order to do that, I need to now foreign key in the key from the users field. Which means storing the same user’s name, over and over and over and over… surely that’s a HUGE overhead that could be (mostly) eliminated by using an INT…

Am I missing something here?

Blue1 · September 16, 2010, 4:04pm

Popular software is not popular because it is well designed (internally at least). You might as well just say Justin Beiber is a musical genius

I agree, but I didn’t mention that specific software only because it’s popular. vBulletin actually has a great database structure.
I’m totally against biased opinion that you MUST not have auto_increment column as your primary key because in MAJORITY of situations where you use a table for storing things such as forum topics, news entries etc - integer is the best solution, there’s no discussion about it. Logic implies it, trying to go against it without any solid proof is ridiculous.

I think it takes more understanding about what’s right and not right to make this claim. The theory agrees with Stephen, but in practice (particuarly on the web) short cuts are better. No harm in having a discussion, both methods have their merits, for my 2 cents I’d say: auto_inc is best for most situations (web), but you should understand normal form.

It’s easier to go into depths of theory when we actually have an example infront of us. The OP had a specific problem, it was about avoiding hitting F5 and getting “resend” dialog. It’s been solved. There hasn’t been a single mention of the OP’s database structure. On the other hand, had he avoided using auto_increment and employed a different approach - he’d perform a query each time a user hit refresh. Isn’t it worse taxing the database than not communicating with it again at all?

StarLion · September 16, 2010, 4:56pm

Off Topic:

The OP was also made in Dec 2009, but this topic has been necro’d and redirected.

felgall · September 16, 2010, 7:52pm

That wasn’t what I was saying at all - I didn’t say that using Post - Redirect - Get wasn’t appropriate to use , just that using it only prevents some double submits. Thereare still ways of doung a double submit of the same data even where that approach is used - particularly if someone is maliciously trying to do a gazillion submits of the same data to try to fill your database with garbage.

The point I was making was that the only time that multiple submits of the same data to the database can cause a problem is where the key uses autoincrement. In all other cases the database is either able to detect that the request has already been processed (attempting to insert a record that’s already there or delete one that’s already gone) or doesn’t matter (updating fields to contain the same values).

It is only when you have an autoincrement key that you end up with duplicated data.

One instance where you may use autoincrements for what are effectively natural keys (since they are the only way of uniquely referring to the records) is in a forum where you need a key for threads and posts where no other practical value exists. Another instance for autoincrement is for invoice or receipt numbers where it is easier to ensure that you don’t get gaps or duplicates. Yet another is where you have a field that is allowed to be larger than 255 characters (eg an email address can be up to 320 characters).

There are also plenty of examples of where an autoincrement makes no sense whatever - eg. a parts table where all of the parts will almost certainly have a unique part code that is used to uniquely identify the parts out in the real world and where that same code makes the most sense as the key for the data in the table as well.Another example is that the Australian Tax office use TFN as the unique key to identify a person (in the US the SSN serves the same purpose) and so there is no need for yet another field there.

There are all sorts of aspects relating to the integrity of the database that can be handled by the database itself if you use the appropriate natural key where you need to add your own logic if you add an autoincrement as well (having someone able to deliberately fill your table with duplicate garbage is one such example, ending up with multiple autoincrement keys pointing to the same natural key and then not updating the other related data to get things totally screwed up is another).

The vast majority of those creating databases for the web have not attended a proper class on correct database design and couldn’t tell you what 6th normal form is if their life depended on it (and therefore wouldn’t know where to start in order to ensure that the data in their database complied if that was a requirement for the system). This is also clear if you read the book “SQL Antipatterns” by Bill Karwin (published by Pragmatic Programmers) where a significant number of the database problems covered in the book are ones that would never occur in a properly normalised database. The whole concept of normalisation for relational databases was created for a reason and the first three normalisations at least should at least be used in the logical design of your database and good reasons documented for every case where you decide to undo a normalisation in the physical design. The other four normalisations only apply in specific instances but where those situations exist then they should be applied. Any data that isn’t normalised has its integrity under the control of the outside code since the database can only maintain internal integrity of normalised data. A significant number of the problems the book I mentioned talks about are ones caused through not normalising data and although the word normalisation isn’t used in the book all of the solutions that are presented for those problems are effectively telling you how to normalise that particular situation in the data.

Blue1 · September 17, 2010, 6:18pm

The fact I was relating to was you implying that a database using tables with auto_increment is, in fact, badly designed database.

Correct me if I’m wrong, but tables that store information such as forum topics (or any similar type of information) need a primary key, which is in 99% cases an integer. Point I’m trying to make is that a table using an auto_increment as primary key is NOT badly designed.
The application itself should be able to detect what user is trying to do and prevent it, such as post - refresh - reinsert here. Relying on the database to provide an error is bad in the entire application design.
I agree with you that there are good candidates for primary key implied by the nature of the data being stored, but in practice it is good to have a sequential integer as primary key even if it makes no theoretical sense - simply due to the way of how db storage engine works internally.
Labeling something bad in design due to use of a feature of the RDBMS is jumping in front of the wagon.

felgall · September 17, 2010, 8:03pm

You missed half of what I was saying then.

I said a database using tables with auto-increment is badly designed if there is another field in the table that is suitable for use as a natural key (which there is a lot of the time).

For example defining a primary key of part_id auto-increment and then having part_num as the next field where part_num uniquely identifies the part both inside the database and out in the real world.

In my experience when an auto-increment is used as the primary key on every table then it is a bad design because there is usually a better natural key to use for at least one, usually most and sometimes even all the tables. I have seen databases where all of the tables have an auto-increment primary key immediately followed by a unique identifier that would be a perfect natural key.

In the case of forums, numbering threads and posts is the most natural way of uniquely identifying them since the main requirement there is that they each have something that uniquely identifies them and there isn’t usually anything suitable in the thread or post itself (particularly if you want to be able to redesign the forum as required depending on the number of posts in each area). The same sort of thing can apply for things like invoice and receipt numbers where you need not only unique values but also need to be able to account for any gaps (and so would mark unwanted records as void instead of deleting them until sufficient time had past to be able to delete all the records over a certain age).

The problem is that many people creating databases don’t do a proper design of their database and use auto-increment keys where the design shows that they belong. For example Shaydez who insists on using an auto-increment key on all the tables including all of those that already have a natural non-numeric key.

There are a number of additional issues that you need to handle in code when you use an auto-increment key (and duplicate inserts is one of those which is why I brought up avoiding the use of auto-increment when it isn’t needed in the first place since it is only with auto-increment that you can have two or twenty billion duplicated records inserted into your database if you don’t handle it in code - since without the auto-increments duplicated records are not possible).

Another disadvantage of auto-increment is that it is mySQL proprietary and while most other relational databases do have their own way of achieving the same thing your code is less portable if you do use one.

The book “SQL Antipatterns” has a number of chapters covering issues that can only ever occur if you use auto-increment and in most cases the fix is to get rid of that field completely as there is another field that will do the same job without the issues.

Auto-increment is one tool available within mySQL which when used appropriately can make your code simpler and when used inappropriately can force you to maintain almost all of the relationships that the database ought to be maintaining for you in the program code instead.

Using auto-increment automatically for every primary key demonstrates a complete lack of knowledge of what a relational database is. Using them for those tables where the fully normalised logical design indicates that they would be useful is what the option is there for in the first place.

To get back to the point I was originally trying to make by bringing up auto-increment in the first place. While Post-Redirect-Get solves the double submit issue for some of the situations where the submit button has been accidentally pressed twice or the back button used it doesn’t prevent all such duplicate requests particularly deliberate ones but also some accidental ones. You therefore still need to have code on the server to handle the possibility of duplicate submits whether you use PRG or not. The only requests where a duplicate CAN be inserted into a database is where you are inserting a record into a table that has an auto-increment key. These instances SHOULD be a minority for most properly designed databases. Additional processing to check for duplicates is required in those cases where an auto-increment is used. The thread started to go off course when someone suggested that auto-increment is very common and I pointed out that it is only common in databases that have not been properly designed and that in most such cases the easiest fix is to fix the database design to get rid of unnecessary auto-increments. After all why add dozens or hundreds of extra lines of code to prevent duplicates when simply using the correct natural primary key means that the database does it for you as it is supposed to.

Blue1 · September 17, 2010, 9:08pm

I’m sorry but you are giving out only half of the information, leaving many things to speculation.
What Shaydez is doing is perfectly fine from performance point of view.
I disagree with you that in MOST cases there are better choices for primary key, even if natural key seems like the best choice - most likely it isn’t.
Let’s review a few cases you’ve mentioned.

I’ll take the popular supplier_code that every supplier hands out to merchants.
Even if it seems like a perfect natural key candidate, it has the following implications:

Supplier code is most likely a sequence of characters, forcing you to use non-integer column with variable length. Non-integer variable column length has implications with character encoding, creating the B-tree index.
Non-integer key in this particular case would mean that your junction tables would have to store the variable character key as their foreign key, leading again to a larger index.
Having non-sequential primary key in MyISAM and InnoDB storage engines has performance implications when writing the data and when retrieving the data in terms of I/O. I won’t go into specifics, answers to that question are explained at many blogs in detail.
Assuming you will never have two suppliers whose codes will clash is wishful thinking. Just so we’re on the same page, on my recent project I’ve had 10 supplier codes clashing. All coming from different suppliers.
Having a constraint on uniqueness of supplier code is bad.
However this can be avoided by having a composite key in place (supplier_id, supplier_code). But then again, if supplier_id is yet another varchar - the larger the database, the worse the performance compared to having integer key.
Now someone else might point out how using composite keys is bad and we could go on all night what’s good and what isn’t.

In my experience when an auto-increment is used as the primary key on every table then it is a bad design because there is usually a better natural key to use for at least one, usually most and sometimes even all the tables. I have seen databases where all of the tables have an auto-increment primary key immediately followed by a unique identifier that would be a perfect natural key.

If the sequential column is used for internal use of the application built upon the database then it cannot be “bad design”.
Theory does imply one thing, however it is_not bad to have an integer uniquely identifying an entity in the table opposed to alternative (natural key in form of a character) if the application uses that integer key extensively.

There are a number of additional issues that you need to handle in code when you use an auto-increment key (and duplicate inserts is one of those which is why I brought up avoiding the use of auto-increment when it isn’t needed in the first place since it is only with auto-increment that you can have two or twenty billion duplicated records inserted into your database if you don’t handle it in code - since without the auto-increments duplicated records are not possible).

Actually, it’s very SIMPLE for a seasoned programmer to avoid the double entry problem.
However, if there is a malicious user and if your application is of the type that it allows public access (guestbooks, forums) - it really isn’t hard to create a script / program / bot which will flood the database, be it a duplicate or not.
If there’s a chance that user might accidentally send the same data, there are number of ways to prevent that:

flood control in form of having X seconds before allowing next send.
redirect after the submit was done
use javascript to manipulate sending options (disable the form, and so on)

all of which don’t require any interaction with the database or which rely on the database to provide the error.

Another disadvantage of auto-increment is that it is mySQL proprietary and while most other relational databases do have their own way of achieving the same thing your code is less portable if you do use one.

I’ve never encountered a client that had in mind moving to another way of storing the data. Creating a bespoke application having in mind that it can (for some unclear reason) require a different storage engine (flat file, another RDBMS) is terrible approach to application design in my book.
MySQL is perfectly fine for various tasks (it serves Google’s Adwords, I think that proves it’s capable of operating under stress), if one creates application for MySQL then it should work with MySQL. However, other relational databases have triggers for auto incrementing tasks, that’s where it originates from in MySQL.
I’m not a fan of ORM’s or abstraction layers, the whole argument of avoiding a great feature in favor of someone deciding it’s better to use X than Y isn’t a good enough argument.

Using auto-increment automatically for every primary key demonstrates a complete lack of knowledge of what a relational database is. Using them for those tables where the fully normalised logical design indicates that they would be useful is what the option is there for in the first place.

I disagree with you here.
I’m not a fan of sticking auto_increment everywhere, but as I mentioned before - not having a sequential integer as primary key has performance implications.
Of course, junction tables which connect various entries from other tables in M:M relation don’t need to have one, assuming the foreign keys are integers themselves.
But imagine the case where you had users table (PK: username), user_categories table (PK: category_name) and junction table (user_id, category_id). Using natural keys - you have to specify each users’ username when trying to connect them to a corresponding category. It’s much more efficient to use an integer as primary key and put a unique constraint on the natural key to preserve some imaginary integrity, even tho it’s not right with theory.

This might seem I’m advocating for the use of auto_increment everywhere you can. I’m a fan of clean design and efficiency, I wouldn’t use something where it’s not good to use it. Based on target population of application, one should bear in mind what’s the best thing to use.
However, advocating avoiding auto_increment in favor of (seemingly) better natural key should come with a detailed warning and explanation.
Everything has their pros and cons, and while I do agree with you on some things - you cannot generalize things in IT world and label them right or wrong without having all of the facts in front of you.

Kakao · November 11, 2010, 8:40am

Blue.,

I was looking for this discussion¹ when I found this thread. After reading it all I felt the urge to register just to say that most of your arguments are about optimization and that, as is commonly cited, “premature optimization is the root of all evil”. Never even think about optimizing without profiling first.

¹ Is auto increment as primary key a bad practice?