tentim — 2012-11-10T04:24:58-05:00 — #1
I need to generate a unique value for each user. Currently I'm
doing a sha1 on parameters unique to each user, e.g. email address and
time stamp, but there is a requirement that the value should be 16
characters long or less. Time stamp alone may not be unique; what do
cups — 2012-11-10T14:14:22-05:00 — #2
How about you store the (truncated) value in a database table, make the column unique, and in the case of a clash on insert ("Duplicate entry" on MySQL, error number 1062, I think), generate another key and try again?
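A minimal sketch of that retry loop, to make the idea concrete. Everything named here (`makeKey`, `generateUniqueKey`, the `$tryStore` callback) is hypothetical, and an in-memory array stands in for the table with a unique index. It also assumes PHP 7+ for `random_bytes()`. With MySQL through PDO, the callback would run the INSERT and return false when the driver error code is 1062.

```php
<?php
// Hypothetical helper: 12 random bytes encode to 16 base64 characters.
function makeKey(): string
{
    return base64_encode(random_bytes(12));
}

/**
 * Keep generating keys until $tryStore accepts one.
 * $tryStore must return true if the key was stored, false if it was a duplicate.
 */
function generateUniqueKey(callable $tryStore, int $maxAttempts = 5): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $key = makeKey();
        if ($tryStore($key)) {
            return $key;
        }
    }
    throw new RuntimeException("No unique key after $maxAttempts attempts");
}

// Stand-in for a table with a UNIQUE index on the key column.
$used = [];
$tryStore = function (string $key) use (&$used): bool {
    if (isset($used[$key])) {
        return false; // duplicate: the caller will try another key
    }
    $used[$key] = true;
    return true;
};

$key = generateUniqueKey($tryStore);
```

Letting the storage layer reject duplicates avoids a separate "does it exist?" query before every insert.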
localhost8080 — 2012-11-10T15:28:29-05:00 — #3
You should make a column in your database called 'id' or something and set it to auto-increment.
tentim — 2012-11-11T15:49:18-05:00 — #4
Thanks, but I wanted to avoid the database route if there is an algorithm that generates
16 or fewer characters. If there is none, I might have to take that route...
felgall — 2012-11-11T16:19:52-05:00 — #5
No hashing algorithm can be used to provide unique values, as all they guarantee is that a small change in the source will result in a completely different hash. There will always be a large number of completely different source values that map to the same hash. Of course, if you have a limited-length value in the first place, then the chance of two values generating the same hash will be relatively low.
lemon_juice — 2012-11-11T17:41:49-05:00 — #6
It would be easier for us if you explained what you need this value for. From your description it is not clear whether you want to generate the unique value once and store it in the db for future use, or generate the value multiple times, with a different result each time. Do you have any requirements for which characters the value should consist of? From your limited description, the advice from localhost8080 seems to satisfy your requirements perfectly if you are willing to get the id from the db.
tentim — 2012-11-12T07:37:52-05:00 — #7
The value is generated and sent to another site for use, but it has to be a different value
each time and I don't want to go through the 'troubles' of storing in db and checking each time.
Really? If that is the case, then I'll have to go the db route.
starlion — 2012-11-12T08:02:25-05:00 — #8
"Roll a dice" Okay. I rolled a 6.
"Now never roll that number again. But you can't remember what the number you just rolled was."
uh... yeah. You're gonna need some form of database/flatfile/whatever to remember what has already been used. Any 'random' system will inevitably have clashes at the rate of <used>/<available>
Consider the dice above.
I roll a dice. As I have not yet rolled a dice, I can roll any number. My chances of a collision are 0 (0/6).
I've rolled a six. The next time I roll the dice, my chances of collision are 1/6.
If I don't know I've rolled a 6, I have a blind-chance 1/6 of sending a bad result.
The more I roll the dice, the higher the chance goes, until I can't roll the dice anymore (6/6).
Your pool of available keys will be far larger than 6, of course (X^16, where X is the number of available characters), but as the system gets used more and more, the chance of a bad key will steadily grow. Which is why your system needs to remember what has already been used.
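To put rough numbers on the dice argument, here is a small sketch. The function names are made up for illustration, and the key space of 64^16 assumes 16 characters drawn from a 64-symbol alphabet. The first function is the used/available rate described above. The second is the standard birthday approximation for the chance that any two of a batch of random keys collide, which is only reasonable while the batch is small compared with the key space.

```php
<?php
// Chance that the NEXT random key matches one already issued: used / available.
function nextDrawCollisionChance(float $used, float $available): float
{
    return $used / $available;
}

// Birthday approximation: chance that ANY two of $count random keys collide.
// p is roughly 1 - exp(-count * (count - 1) / (2 * available)).
function anyCollisionChance(float $count, float $available): float
{
    $x = $count * ($count - 1) / (2 * $available);
    return -expm1(-$x); // same as 1 - exp(-x), but accurate for tiny x
}

// Assumed key space: 16 characters from a 64-symbol alphabet.
$space = pow(64, 16);

echo nextDrawCollisionChance(1, 6), "\n";     // the dice: one face already used
echo anyCollisionChance(1e6, $space), "\n";   // a million keys in the assumed space
```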
lemon_juice — 2012-11-12T08:03:43-05:00 — #9
Are you going to use these values temporarily? I assume that is the case if you are not going to store them in the database. In that case you might simply generate a 16-character random string and use it as an ID. The probability of a collision is so low that in practical terms it's insignificant, unless you are using them for systems where absolute security is critical. You could also generate a random hash for this purpose. As felgall said, you are not guaranteed that each value will be unique, but the probability of uniqueness is extremely high.
Here is some code for a fairly good random 16-character string generator:
// 12 random bytes encode to exactly 16 base64 characters, with no padding
$random_bytes = mcrypt_create_iv(12, MCRYPT_DEV_URANDOM);
$string = base64_encode($random_bytes);
The string will have alphanumeric characters and / and +. For a hexadecimal string you can use this:
// 8 random bytes encode to 16 hexadecimal characters
$random_bytes = mcrypt_create_iv(8, MCRYPT_DEV_URANDOM);
$string = bin2hex($random_bytes);
However, the first one is better because it packs 12 random bytes (96 bits) into 16 characters rather than 8 (64 bits), so there is less chance of a collision.
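For anyone on a newer PHP: the mcrypt extension was deprecated in PHP 7.1 and removed from core in 7.2, while `random_bytes()` has been built in since PHP 7.0. A sketch of the same two generators on that assumption follows. The variable names and the URL-safe substitution are additions for illustration, not part of the original suggestion.

```php
<?php
// 12 random bytes encode to exactly 16 base64 characters
// (no '=' padding, because 12 is a multiple of 3).
$base64Id = base64_encode(random_bytes(12));

// Optional: replace '+' and '/' with '-' and '_' so the ID is safe in URLs.
$urlSafeId = strtr($base64Id, '+/', '-_');

// 8 random bytes encode to 16 hexadecimal characters.
$hexId = bin2hex(random_bytes(8));
```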
Alternatively, you could use uniqid('', true), a pretty good function for unique ID generation that combines microtime and a pseudo-random number generator. However, its output is longer than 16 characters (23 with an empty prefix and more_entropy enabled), so you would have to get rid of the dot and pack the data in some more efficient way.
lemon_juice — 2012-11-12T08:10:46-05:00 — #10
This might be a problem, but it depends on how it is used (we don't have enough information from the OP). If each random value is supposed to be used temporarily, for example it is valid for a few hours and then discarded, then this would not be a problem: even with millions of users there would be just a small number of random IDs in use at any given time, so the chance of collision would be extremely low. But yes, if those IDs were to accumulate over time in a database, then the collision probability would grow with each new ID.
starlion — 2012-11-12T08:37:32-05:00 — #11
Personally, I'd rather my chance of a bad key to be 0. shrug
lemon_juice — 2012-11-12T08:48:17-05:00 — #12
Each to his own, but there are cases where the work to guarantee 0 chance is too expensive to be practical. There's no need for 100% perfection if 99.999999999% will suffice and save people a lot of work and complications. There are cases where a guarantee is even impossible, for example revision keys in distributed version control systems.
wonshikee — 2012-11-12T14:42:15-05:00 — #13
You can just use timestamp + first 6 characters of the email if you can't rely on persistent storage to double check. The chance of a collision is so tiny it's a waste of time to consider it.
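A rough sketch of that suggestion, assuming a Unix timestamp in seconds (10 digits for current dates) and PHP 7.1+ for the nullable type. The function name and the injectable `$now` parameter are hypothetical, added only so the result is repeatable.

```php
<?php
// Hypothetical helper: Unix timestamp followed by the first 6 characters of the email.
function timestampEmailId(string $email, ?int $now = null): string
{
    $now = $now ?? time();              // 10 digits for current dates
    return $now . substr($email, 0, 6); // at most 16 characters in total
}

echo timestampEmailId('user@example.com', 1352749335), "\n"; // 1352749335user@e
```

Two requests in the same second from addresses that share their first six characters would produce the same value, and an email shorter than six characters gives a shorter ID.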