Insert If Not Exists Equivalent and InnoDB autoincrement

djeyewater · January 6, 2010, 9:54am

I want to insert a value into a table if it doesn’t exist, and get the id (which is autoincremented) of the inserted row or the id of the existing row.

At the moment I am doing this like so:

INSERT INTO table SET col = 'value' ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)

Then I can just get the last insert id to get the id of the existing or new row (‘col’ has a UNIQUE constraint on it).

But the table in question is using the InnoDB storage engine, which means that it allocates the autoincrement value to the row before the INSERT operation, so if the INSERT fails, the autoincrement value has still increased. e.g. with an empty table

INSERT INTO table SET col = 'value1' ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)

would return the id of 1.
If I try and insert ‘value1’ again 500 times, each time I would get back the id 1.
Now if I try and insert a new value

INSERT INTO table SET col = 'value2' ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)

I will get back the id of 502, since InnoDB increased the auto increment value for those 500 insert operations, even though they didn’t insert anything.

So I am worried about my autoincrement value getting really high really quickly.

There are a few solutions I can think of:

Switch to MyISAM and use triggers instead of Foreign Key Constraints (I don’t need row level locking in this instance)
Check whether a record exists before trying to insert it (and only insert records that don’t exist)
Periodically run a script that ‘cleans up’ the auto increment values (I can’t do this as I may want to use the id values in URLs)
Don’t worry about it and keep everything as it is

Can anyone advise me on this issue?

Thanks

Dave

SJH · January 6, 2010, 10:57am

I can’t help but think maybe you’re overcomplicating things a little bit here. Here’s my take on how to go about doing what you want to do…

INSERT IGNORE INTO table_name (col) VALUES (‘value’);

…and get the id (which is autoincremented) of the inserted row or the id of the existing row.
I’m assuming your scripting language of choice is PHP:


$result = mysql_query("INSERT IGNORE INTO table_name (col) VALUES ('value')");
if(mysql_affected_rows($result) === 1) {
    $id = mysql_insert_id();
} else {
    $get_id = mysql_query("SELECT id FROM table_name WHERE col = 'value'");
    $row = mysql_fetch_array($get_id);
    $id = $row[0]['id'];
}

So I am worried about my autoincrement value getting really high really quickly.
How high is unacceptably high in your book? Because if the datatype of the id column is INT, this will allow for well over two billion rows. And if your table gets that large, I think you’ll have other things to worry about besides an integer overflow

Just some food for thought really. Perhaps there is a more elegant solution though.

djeyewater · January 6, 2010, 12:17pm

Thanks for the advice on not worrying about the id values getting too high.

However, I don’t see how your advice on how to go about inserting the record is better or less complicated than what I’m doing at the moment?

You suggested:


$result = mysql_query("INSERT IGNORE INTO table_name (col) VALUES ('value')");
if(mysql_affected_rows($result) === 1) {
    $id = mysql_insert_id();
} else {
    $get_id = mysql_query("SELECT id FROM table_name WHERE col = 'value'");
    $row = mysql_fetch_array($get_id);
    $id = $row[0]['id'];
}

How is that better than


mysql_query("INSERT INTO table SET col = 'value' ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)");
$id = mysql_insert_id();

Thanks

Dave

SJH · January 6, 2010, 12:52pm

Because I think in that case, mysql_insert_id wouldn’t return the right result because in some cases technically you haven’t inserted any rows, you’ll have updated one. I may be wrong here though.

djeyewater · January 6, 2010, 1:06pm

Thanks for the reply, that statement always returns the correct id because when it does an update it sets LAST_INSERT_ID() to the id of the row it is updating (i.e. the existing row).

Dave

r937 · January 6, 2010, 1:11pm

whether you’re wrong, SJH, can be verified quite quickly by testing

i ain’t gonna bother, but the OP should

in any case, the two step approach you recommended (post #3) is the way i would do it

actually, i might instead reconsider the design here, and ask myself if i need an auto_increment in the first place

djeyewater · January 6, 2010, 4:27pm

I have been using this method in my sites for quite a while, it definitely works. I got it from the MySQL Manual.

Any reason why you use the two step approach? It involves extra code, and up to two database calls as opposed to less code and always one database call. I haven’t done any benchmarking, but assume the two step method would be slower?

Do you mean have a manual auto increment on the id column, or instead use the column I’m inserting values to as the primary key, and loose the id column altogether?

Thanks

Dave

r937 · January 6, 2010, 4:54pm

oh, then it ~must~ be okay

just make sure you use it exactly as they suggested:[indent]If a table contains an AUTO_INCREMENT column and INSERT … ON DUPLICATE KEY UPDATE inserts a row, the LAST_INSERT_ID() function returns the AUTO_INCREMENT value. If the statement updates a row instead, LAST_INSERT_ID() is not meaningful prior to MySQL 5.1.12. However, you can work around this by using LAST_INSERT_ID(expr). Suppose that id is the AUTO_INCREMENT column. To make LAST_INSERT_ID() meaningful for updates, insert rows as follows:

INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), c=3;[/indent]

yes, i might consider not having an auto_increment key at all

djeyewater · January 6, 2010, 5:45pm

Thanks for the advice, do you know if this would effect memory usage much or query speed? (I’m mostly concerned with memory usage as I only have limited memory available on my hosting account)

I guess the size of other tables with columns referencing the table in question would increase, since they’d need to use the actual value that they were referencing as the key instead of just an integer. And I guess the index size would increase for tables that have a multi column index that includes a column that references the table in question. But I don’t know if this would effect memory usage or performance?

Thanks

Dave

r937 · January 6, 2010, 5:49pm

it might

i can’t comment further because all you’ve mentioned is a table called “table” and a column called “col”

if there are lotsa tables referencig this one, then yeah, i would probably use a surrogate key

djeyewater · January 6, 2010, 9:02pm

Heh, yeah, I realise I didn’t give you much to go on.

To give you a better idea of the database schema, I have one main table, which contains columns that reference values in various other tables. The majority of these ‘other tables’ just contain 2 columns, id (int, primary key), and value (varchar, unique).

Then I also have more tables with just 2 columns, id (int, primary key), and value (varchar, unique). Then I have lookup tables that link these tables to the main table. So, for example, my main table is called ‘imageData’, then I have a lookup table called ‘img_keywords’ like

CREATE TABLE `img_keywords` (
 `img_id` int(10) unsigned NOT NULL,
 `keywords_id` int(10) unsigned NOT NULL,
 PRIMARY KEY (`img_id`,`keywords_id`),
 KEY `fk_img_keywords_imageData` (`img_id`),
 KEY `fk_img_keywords_keywords` (`keywords_id`),
 CONSTRAINT `fk_img_keywords_imageData` FOREIGN KEY (`img_id`) REFERENCES `imageData` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
 CONSTRAINT `fk_img_keywords_keywords` FOREIGN KEY (`keywords_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

and the keywords table like:

CREATE TABLE `keywords` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `Subject` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `UNIQUE` (`Subject`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

So I don’t have lots of tables referencing the sort of table I was asking about, but I do have lots of the sort of table that I was asking about.

I’ve attached an image of the schema, which probably shows the setup better than I can explain it.

Dave

r937 · January 7, 2010, 5:18am

thanks for the details, the schema looks really good

i would definitely use ids here, because of the many-to-many relationships

i might not use an id for keywords, but that’s a special case

the other lookup tables would be necessary for relational integrity (i.e. to prevent the use of a lens type value, for example, that wasn’t a legitimate lens type), so some of these might not require a numeric id

but at this point i wouldn’t go back and change anything

djeyewater · January 7, 2010, 9:56pm

Okay, thanks for taking the time to look at the schema, and the advice.

Cheers

Dave