Keeping Images Unique

I am building a “User Profile” feature on my website, including allowing Users to upload a Profile Picture.

The problem is that two separate Users could have a Picture with the same name (e.g. “me.jpg”).

What is the best way to handle this in my Upload Script?

One person I know suggested having a “User Folder” for every User. But what happens if my website grows to 20,000 Users? (That has got to be enough to make even a Linux Server choke?!) :eek:

I could append the “UserID” to each Image, but since it starts at “1” currently, that would look weird. Plus, you would want it fixed-width like “000001”.

I could append the “Email”, but that isn’t very reliable. (I am wondering if I should have made people create a “Username” too…)

What do you think I should do?

Thanks,

Debbie

As users aren’t generally bothered how you store or name profile pictures, I’d sha1() hash the current time() along with their user id and use that.
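Roughly like this, as a minimal sketch. The helper name `makePhotoName` and the `$userId` / `$extension` parameters are made up for illustration; use whatever your member record and upload script already provide.

```php
<?php
// Hypothetical helper: builds a file name from the user id and the upload time.
function makePhotoName(int $userId, string $extension, ?int $timestamp = null): string
{
    // Use the current Unix time unless the caller supplies one.
    $timestamp = $timestamp ?? time();

    // sha1() returns a 40-character hexadecimal string.
    return sha1($userId . ':' . $timestamp) . '.' . $extension;
}
```

Note that the same user uploading twice within the same second would get the same name, which only means their second upload replaces their first.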

Why Hash the Filename?

Doesn’t it make sense to have the Filename be “self-identifying” (e.g. “doubledee_01.jpg”, “anthonysterling_02.jpg”)?

Otherwise what happens if the Pictures ever got mixed up?!

Debbie

Maybe, but now you’re mixing things up a little. Which is good, but you need to define this behaviour first so you know what to code.

Is this what you want to happen? Do you want the username/id/original-name/time in there?

What if you put the user id at the end of the image filename when they are saved? So your example with me.jpg would be me-123.jpg and another one me-124.jpg.

Apart from name collisions, you shouldn’t allow a user-designated file name for an uploaded file. If there were any security weaknesses in your upload processing, a predictable name would make it easier for an attacker to access a malicious file.

So just create some random, yet unique, “Photo ID” and leave it at that?

These Photo IDs will be stored in the Member Record, but like I said, I just always worry about what would happen if things ever got mixed up, and then you’d have 10,000 files with names that wouldn’t let you easily sort things out.

I guess my system just needs to not mess up?!

Debbie

If the image in question has its name stored in the DB then it doesn’t matter. The DB links it to the user, so the file name of the image should be irrelevant. I personally use “sha1_file”, so when several uploaded files have the same hash, only one copy is saved.

Why not just have the system create a new directory for each user? That way it will be user1/me.jpg and user2/me.jpg. They are the same file names but different directories which will differentiate the addresses.
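A rough sketch of the per-user directory idea, assuming a base upload directory of your choosing. The function name `userPhotoPath` and the `user<ID>` folder naming are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the path for a user's photo inside that user's own folder.
function userPhotoPath(string $baseDir, int $userId, string $fileName): string
{
    $dir = rtrim($baseDir, '/') . '/user' . $userId;

    // Create the folder on first use; the second is_dir() covers a concurrent create.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException('Could not create directory: ' . $dir);
    }

    // basename() drops any directory parts from the supplied file name.
    return $dir . '/' . basename($fileName);
}
```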

Except if you have a collision, then someone loses their photo?! :eek:

If you upload your Photo, and my system assigns the hash 6510723, and then later I come along, and by pure coincidence, my system also assigns me the hash 6510723 then your Photo goes poof!!

Even if I hashed by e-mail or something like that, there could be a collision, so how can I avoid that?!

Debbie

Huh? How does their photo go poof? I don’t understand…

If a file that is run through “sha1_file” returns the same value as another file, they are identical byte for byte. So why save two? Just reference the same file in the DB for both users. To be clear, I use the return value of “sha1_file” as the name of the file. This makes sure the file name is unique but also avoids duplicate files.
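Something along these lines, as a sketch only. The function name `storeByContentHash` and its parameters are hypothetical, and it uses copy() so it can be tried outside a real upload; in an actual upload script you would use move_uploaded_file() instead.

```php
<?php
// Hypothetical helper: stores a file under a name derived from its contents.
function storeByContentHash(string $tmpPath, string $uploadDir, string $extension): string
{
    // sha1_file() hashes the file's bytes, so identical files give identical names.
    $hash = sha1_file($tmpPath);
    if ($hash === false) {
        throw new RuntimeException('Could not read the uploaded file.');
    }

    $name = $hash . '.' . $extension;
    $target = rtrim($uploadDir, '/') . '/' . $name;

    // Only write the file if an identical one is not already stored.
    if (!file_exists($target)) {
        // In a real upload handler, use move_uploaded_file() here instead of copy().
        if (!copy($tmpPath, $target)) {
            throw new RuntimeException('Could not store the file.');
        }
    }

    // Save this name in the member record; several users may share it.
    return $name;
}
```

The returned name is what goes in the DB for that user. Two users uploading the same picture simply end up pointing at the same file.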

Debbie needs Food and Sleep

I went to all of this trouble to make sure the File Some-User uploaded is a valid image, and it is. So the last thing I need to do is come up with a Unique Name for Some-User’s Image.

I thought you were saying to SHA1 something (??) and get a random Filename, right?

And I said, “What happens if you have a collision between two hashed whatevers and you get 12345 and I get 12345?!”

I can’t have two separate User Pictures called “12345.jpeg”

So 100,000 Users from now, how do I ensure that NEVER happens??

Follow me?

Debbie

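logic_earth’s solution is good, but do understand that if you have any kind of cleanup, you need to be careful not to delete any files that still have references in the database.

The chance of an accidental collision is negligible. SHA-1 produces a 160-bit hash, so two different files end up with the same hash with a probability of about 1/(2^160).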

If you are THAT worried, then simply do a file_exists() check, and if it does exist, append/prepend something.
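For example, a minimal sketch of that check. The helper name `uniqueName` and the `-1`, `-2` counter suffix are just one way to do the "append something" part, not the only one.

```php
<?php
// Hypothetical helper: returns a file name that is not yet taken in $uploadDir.
function uniqueName(string $uploadDir, string $baseName, string $extension): string
{
    $dir = rtrim($uploadDir, '/');
    $candidate = $baseName . '.' . $extension;
    $counter = 1;

    // Keep appending a counter until no file with that name exists.
    while (file_exists($dir . '/' . $candidate)) {
        $candidate = $baseName . '-' . $counter . '.' . $extension;
        $counter++;
    }

    return $candidate;
}
```

One caveat: two uploads arriving at the same instant could both pass the check before either file is written. If that matters, creating the file with fopen() in 'x' mode fails when the file already exists, which closes that gap.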

If two users upload an image that returns the same hash, then yes, you can.
If the hash is the same, it means they are identical down to the last bit.
Saving both would be a waste of storage space.

sha1_file reads the file and hashes the contents of the file; it’s not random.
The same file returns the same hash until it is altered.

The DB is what connects a user to an image; the file name is irrelevant.
Having two users use the same image is perfectly acceptable.
For example, on Sitepoint a lot of users have the same avatar.

My understanding is that while very unlikely, you could hash “DoubleDee” twice and get the same resulting hash of “6666666”, right?

Since a collision in this case would overwrite another User’s Photo, that can’t ever happen.

After thinking about it, having file names be the same as something personally identifiable like e-mail or username seems like a bad idea!

So I am okay if a User’s Photo is given a Filename that is some random string of letters and/or numbers, but again, it must be unique!

Debbie