I have a PHP script which uploads a file to a CDN using cURL.
They send me back an MD5 checksum (or I can generate one and send it to them to check).
Either way I check it, I need to generate an MD5 on my end. The problem is that MD5 is SLOW: in about an hour the file had only uploaded about 65% (100MB). I took the MD5 step out to check, and a 500MB file uploaded much quicker.
It's only one file at a time (right now), so I can't really parallelize any calls since there is nothing really to parallelize.
Any tips to make this faster?
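One option worth considering: instead of a separate md5_file() pass over the whole file, the MD5 can be computed incrementally with hash_init()/hash_update() while the file is read in chunks, so the hash rides along with whatever else touches the data. A minimal sketch (the temp file here is just a stand-in for the real upload):

```php
<?php
// Sketch: compute an MD5 incrementally instead of a separate
// md5_file() pass. The temp file is a stand-in for the real upload.
$path = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($path, str_repeat('example data ', 100000)); // ~1.3MB stand-in

$ctx = hash_init('md5');
$in  = fopen($path, 'rb');
while (!feof($in)) {
    $chunk = fread($in, 8192);   // 8KB at a time keeps memory usage flat
    hash_update($ctx, $chunk);
    // ...each $chunk could also be handed to the upload logic here...
}
fclose($in);
$md5 = hash_final($ctx);

var_dump($md5 === md5_file($path)); // bool(true)
unlink($path);
```

This won't make the hashing itself faster, but it avoids reading the file twice (once for the hash, once for the upload), which matters once files get large.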
Dumb question: can you (or they) create the checksum, store it somewhere (as text), upload the file, and check it after the file's been uploaded?
Another dumb question, does it have to be PHP?
What do you use to generate the md5 checksum?
As you mentioned, it was faster when you removed the MD5 check, which leads me to believe you create the MD5 checksum before uploading. Is that right?
I use MD5 on files myself. I have one file which is around 650MB and it takes less than a minute to perform md5_file() on it.
You're either on a shared server or you're using your own md5 function because it shouldn't take that long.
Sounds like a processor issue... beef up the machine you're using.
Are you doing an md5() on each packet you send? How big is a packet?
The first time I was generating the md5 before I sent, though I can do it before or after in this case.
I'm using md5_file() to generate the MD5.
We're on a cloud server which currently has 512MB of RAM. I can (and will, when it goes to production) beef it up, but would that make a substantial difference?
I'm doing the md5_file() on the entire file once it is uploaded (by fopening the tmp_name and giving it that).
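For what it's worth, md5_file() takes a path directly and streams the file internally, so there's no need to fopen() the tmp_name first, and PHP's memory_limit shouldn't be a factor. A minimal sketch of the post-upload check (the field name and helper are hypothetical; $expected would be the checksum the CDN reports):

```php
<?php
// Hypothetical post-upload check: md5_file() streams the file itself,
// so it can be pointed straight at the uploaded temp path.
function checksum_matches(string $tmpPath, string $expected): bool
{
    // hash_equals() does a timing-safe string comparison
    return hash_equals($expected, md5_file($tmpPath));
}

// Stand-in for $_FILES['file']['tmp_name'] in a real request:
$tmp = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($tmp, 'hello world');
var_dump(checksum_matches($tmp, md5('hello world'))); // bool(true)
unlink($tmp);
```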
Your limitation might be the cloud server itself, especially disk I/O and how much memory your script can use (PHP's memory_limit). Cloud servers are nice since they let you expand easily when required, but in some cases they can be "slower" than a normal dedicated server with the same specs due to the restrictions the cloud system imposes (whether that's the case for you would of course depend on your cloud provider).
What you might give a try is using md5sum directly; in the past we used that because we found it faster than md5_file().
md5sum directly, as in use exec() to call it?
I saw that may be a good idea. I'll test it out next week and see if it provides any substantial help.
It's not a huge deal if it takes a while because I can upload and then take my time with the checksums. However, it's likely we'll have a pretty large number of people uploading a pretty large number of files over the course of a few hours at the end of each week, so I want to make things as quick and as efficient as possible, and right now this seems to be the largest bottleneck I might have some level of control over.
I doubt it's a memory issue; more likely it's I/O and the processor. You should be on a dedicated server with a decent processor and direct, singular I/O access. Try running top and iotop to confirm this assumption.
Yes, or by using backticks; whatever floats your boat.
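For the exec() route, a minimal sketch: md5sum prints "<hash>  <filename>", so you take the first whitespace-separated token. This assumes md5sum (GNU coreutils) is on the server's PATH; the helper name is just a placeholder.

```php
<?php
// Sketch: shell out to md5sum instead of md5_file().
// md5sum output looks like "<hash>  <filename>\n".
function md5sum_file(string $path): ?string
{
    // Always escape the path before handing it to the shell
    $out = shell_exec('md5sum ' . escapeshellarg($path));
    return $out ? strtok($out, " \t") : null;
}

$path = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($path, 'some payload');
var_dump(md5sum_file($path) === md5_file($path)); // bool(true)
unlink($path);
```

Whether this beats md5_file() will depend on the box, so it's worth timing both on a representative file before committing to it.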
I can't remember how much faster it was in our case, as it's been a few years since that project, but the difference was big enough to matter when checking larger files.
This project has been put on hold until late this week/early next week, so I haven't had a chance to test this.
Once I do, I'll report back my findings.