To use file_exists() or not?

Hi guys,

We’re having a debate over the pros and cons of using file_exists() to check whether an image exists or not, prior to deciding whether to show it or a holding image. Generally we agree that it’s a good thing as it prevents broken-image placeholders from being displayed by accident, but on some pages that have a lot of images, especially busy ones, it hammers the server somewhat and introduces significant processing delays.

The way that it’s used in this case is basically to check whether a resized, cached version of an image exists or not. If not then the image will be processed and saved as necessary. It seems to be the only way to do this particular task but we still have these performance considerations, especially on large directories on busy sites.

So I’m wondering if anyone has any thoughts and real-world solutions. Is file_exists() the best solution? Is is_file() better? What about stream_resolve_include_path()? Just curious about what different people have chosen to do with these requirements.

Thanks

file_exists() is what I’ve always used. According to the PHP documentation, “Note: The results of this function are cached. See clearstatcache() for more details.” The same is true of is_file(). You would have to look at the PHP source code to see which function does more work and what order that work is done in (my guess is that both are equally performant).
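For what it’s worth, a cache of this kind only helps when the same path is checked more than once. A rough Python sketch (the logic ports directly to PHP) of a per-request existence cache makes that visible: the `misses` counter goes up once per distinct path, so a page with 50 different thumbnails still pays for 50 real filesystem checks. The class and its counter are hypothetical, for illustration only, and this is not how PHP’s own stat cache is implemented.

```python
import os


class ExistenceCache:
    """Remember file-existence answers for the lifetime of one request.

    Illustrates the general idea of caching stat results; it is not a
    model of PHP's internal stat cache.
    """

    def __init__(self):
        self._known = {}   # path -> bool
        self.misses = 0    # how many real filesystem checks were made

    def is_file(self, path):
        # Only touch the filesystem the first time we see this path.
        if path not in self._known:
            self.misses += 1
            self._known[path] = os.path.isfile(path)
        return self._known[path]

    def clear(self):
        """Forget everything, loosely like calling clearstatcache()."""
        self._known.clear()
```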

The bigger problem is how you are processing the images. You are waiting until the last possible second to create a cached, resized image. Traditional “lazy evaluation” doesn’t work well in multi-user environments under load. This is because multiple users will make the same request for the same resource at about the same time, and then the system has to work twice as hard to resolve each request (i.e. both users run the image resizer code on the same image). You will also start seeing weird scenarios that involve file locking.
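One common way to stop two requests from resizing the same image is an exclusive lock file created atomically with `O_CREAT | O_EXCL`: whoever creates it does the work, and everyone else gets the placeholder. A minimal Python sketch, assuming a hypothetical `make_resized(dst)` callback and a cache on a local filesystem. A crashed worker would leave a stale `.lock` file behind, so a real version would need a timeout or cleanup job.

```python
import os


def try_claim(lock_path):
    """Atomically create lock_path; True means this request won the race."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def get_resized(cache_path, make_resized, placeholder):
    """Return a path to serve: the cached image, or a placeholder while
    another request is already generating it."""
    if os.path.isfile(cache_path):
        return cache_path
    lock_path = cache_path + ".lock"
    if not try_claim(lock_path):
        return placeholder  # someone else is resizing this image right now
    try:
        # Re-check: another worker may have finished between our two checks.
        if not os.path.isfile(cache_path):
            tmp_path = cache_path + ".tmp"
            make_resized(tmp_path)            # hypothetical resize callback
            os.replace(tmp_path, cache_path)  # atomic rename, same filesystem
    finally:
        os.remove(lock_path)
    return cache_path
```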

What you want to do instead is schedule the resize operation by storing finished images in one directory and working images in another. Then, scan the working directory periodically and resize and move images to the final location. Your script can check for the existence of a scheduled image and show a different image (e.g. “This image isn’t available yet.”).
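A rough Python sketch of that two-directory scheme, assuming a hypothetical `resize(src, dst)` callback and that both directories sit on the same filesystem (so `os.replace` is an atomic move). The periodic job calls `process_pending()`; the page script calls `image_status()` to decide what to show.

```python
import os


def process_pending(pending_dir, final_dir, resize):
    """One pass of the periodic job: resize every file waiting in
    pending_dir and publish the result into final_dir."""
    done = []
    for name in sorted(os.listdir(pending_dir)):
        src = os.path.join(pending_dir, name)
        if not os.path.isfile(src):
            continue
        tmp = os.path.join(final_dir, name + ".part")
        resize(src, tmp)                                # hypothetical resize(src, dst)
        os.replace(tmp, os.path.join(final_dir, name))  # publish atomically
        os.remove(src)                                  # job finished
        done.append(name)
    return done


def image_status(name, pending_dir, final_dir):
    """What the page script checks before rendering an image."""
    if os.path.isfile(os.path.join(final_dir, name)):
        return "ready"
    if os.path.isfile(os.path.join(pending_dir, name)):
        return "scheduled"  # show "This image isn't available yet."
    return "missing"
```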

Ideally, you would create all the sizes of the image you will ever need during the upload process of the original image, but that isn’t always possible.

Just a few ideas to chew on.

Thanks for that. Different articles I’ve read (including comments on PHP.net) suggest that file_exists() can be as slow as half the speed of the other two, though these appear to be very simple benchmarks and don’t take into account a heavily loaded server, for example, or massive directories.

I agree that this is a lazy way of doing it. The problem with doing it at upload is that a) not all images need to be resized to all sizes and b) if we add the requirement for another size at a later date then we’ll not have the sizes that we need. Ideally we could do with a combination, I suppose, or schedule them to be resized as you say.

My boss at a previous employer wrote a caching system (for HTML really) whereby he had a status file that was checked first. It had either a 0, 1 or 2 in it. I forget exactly which meant what, but one stated that a page was already cached, one stated that a new cached version was being generated (you wouldn’t expect this to last for long) and another that the cached file had expired. If I hit an expired page then the status file would be changed to indicate that a new one was being generated but I’d still see the old file. If you then hit it while the cached file was still being generated then you’d be given the old file again but no further action would be taken. After the page was generated the status file would be updated to say that it was cached and the next person would see the new page. If you wanted to manually expire a page then you just changed the status.
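That status-file scheme is essentially “serve stale while revalidating”. A minimal in-memory Python sketch of the state machine follows; the 0/1/2 mapping is a guess (the post doesn’t remember it), and `start_regeneration` is a hypothetical hook that would queue a background job. With a real status file shared between processes, the EXPIRED to GENERATING flip would have to be atomic (e.g. done under a lock), or two requests could both start a rebuild.

```python
FRESH, GENERATING, EXPIRED = 0, 1, 2  # assumed mapping, not the original one


class CachedPage:
    def __init__(self, content):
        self.content = content
        self.status = FRESH

    def serve(self, start_regeneration):
        """Always answer immediately; trigger at most one rebuild."""
        if self.status == EXPIRED:
            self.status = GENERATING
            start_regeneration(self)  # e.g. queue a background job
        # FRESH, GENERATING and EXPIRED all serve whatever is cached now.
        return self.content

    def finish(self, new_content):
        """Called by the background job when the rebuild completes."""
        self.content = new_content
        self.status = FRESH

    def expire(self):
        """Manual expiry: just change the status."""
        self.status = EXPIRED
```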

It prevented people ever having to wait for a new page to be generated (in some cases that was a god-send as some pages took forever to generate due to poorly designed database schema and massive data-sets) but it also meant that you might be looking at data that is no longer valid. Some files we regenerated overnight to ensure that they were up to date every day (like what the best-selling products were yesterday, as that would be the same from midnight to midnight). Not sure if you can implement something like this for images though. In fact, it’s going to be significantly more intensive than just checking if a file exists or not.

Hi Antnee,

Have you looked at page file caching for your site? I believe there are lots of PHP options available; search and see if any are suitable.

file_exists() should only need to be called once, with the results used to build the HTML page. The resulting HTML page should then be cached and used for subsequent requests.

Cached pages can be deleted at any time and the PHP script run again to generate a new page, which is cached once again…
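A minimal Python sketch of caching a rendered page or section to disk, assuming a hypothetical `render()` callable that does the expensive work (including the file_exists() checks). Reading the cache file directly and catching the “not found” error avoids a separate existence check on the hit path; deleting the file forces a rebuild on the next request.

```python
import os


def cached_fragment(cache_path, render):
    """Return HTML from cache_path, rendering and storing it on a miss."""
    try:
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        pass
    html = render()  # expensive: runs the per-image checks once
    tmp = cache_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp, cache_path)  # readers never see a half-written file
    return html
```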

edit:
Looks like you posted just as I was creating my response to your original post :slight_smile:

We can’t cache whole pages, just sections of them, unfortunately. But yes, I agree and have done so in the past.

For the first issue: With hard drive space cheap as dirt, is this really an issue?

For the second issue: When that happens, write a script that walks the directory and converts the originals to the new size. Only has to be done one time and can be done all in one go.
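A one-off backfill along those lines, sketched in Python. The `name_WxH.ext` naming scheme and the `resize(src, dst, size)` callback are hypothetical; `out_dir` should live outside `originals_dir` so the walk doesn’t pick up its own output. Re-running it skips anything already generated.

```python
import os


def backfill_size(originals_dir, out_dir, size, resize):
    """Walk originals_dir once and create any missing WxH variants."""
    width, height = size
    created = []
    for dirpath, _dirnames, filenames in os.walk(originals_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, originals_dir)
            stem, ext = os.path.splitext(rel)
            dst = os.path.join(out_dir, f"{stem}_{width}x{height}{ext}")
            if os.path.exists(dst):
                continue  # already done on a previous run
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            resize(src, dst, size)  # hypothetical: GD, GraphicsMagick, ...
            created.append(dst)
    return created
```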

Also, when working with large numbers of images, prefer GraphicsMagick over GD and ImageMagick. It takes advantage of multi-core hardware to pipeline the conversion and also has built-in features like “I’m going to be sizing this to WxH, so don’t bother loading the whole image into RAM.”

Hard drive space is cheap, managed backup space is not :wink: I’m currently on a mission to find as many directories as possible to exclude from some of our backups as they already cost a lot and we’re way over quota on some servers. We’re also running a lot of sites on a small number of servers. Some sites get a lot of traffic, some get the odd person every few weeks. I’m reluctant to schedule anything as it’ll probably end up doing a load of unnecessary work. Lots of considerations to make before making a start on anything… like how am I going to update 1200+ sites with no common codebase? :open_mouth:

You could also consider using JS to check if an image exists, this would take the burden off your server.

Really? Would doing it via JS not still be making additional checks to the filesystem?

No, you would output the image tag in PHP without checking. In JS, you would then check whether the image loaded: images provide an onerror event handler, from which you could swap in the placeholder image.
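On the server side this just means emitting the tag unconditionally. A small Python sketch of the markup generation (the function name is hypothetical). Note that the fallback only kicks in after the browser has made a failed request, so the web server still handles the 404; it just no longer needs a filesystem check while building the page.

```python
import json
from html import escape


def img_tag(src, placeholder, alt=""):
    """Build an <img> tag without checking the filesystem.

    If the browser fails to load src, the inline onerror handler swaps in
    the placeholder. Setting onerror to null first avoids an endless loop
    if the placeholder itself is missing.
    """
    js = "this.onerror=null;this.src=" + json.dumps(placeholder)
    return '<img src="{}" alt="{}" onerror="{}">'.format(
        escape(src), escape(alt), escape(js)
    )
```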

Is file_exists() really the source of “hammering” the server? I only ask because it sounds like the machine will have a lot of other things happening, including the actual image processing too, which would commonly be more of a problem.

Even when the cached images are found, it’s still a lot slower than with file_exists() disabled.

What about using .htaccess to detect missing image files and replace them with a placeholder? Not sure if this is viable, but you would then be able to drop the file_exists() calls and let Apache do the work.

@Antnee

We’re having a debate over the pros and cons of using file_exists() to check whether an image exists or not, prior to deciding whether to show it or a holding image. Generally we agree that it’s a good thing as it prevents image placeholders being displayed by accident, but on some pages that have a lot of images, especially ones with a lot of traffic, it hammers the server somewhat and introduces significant processing delays.

Can you supply a link to your site?

It’s still a lot slower if the cached images are found than with file_exists() disabled

Have you tried using file_exists() but not displaying the images? Does not loading the images “still hammer the server” and slow the page rendering?

Have you tried moving the images to a subdomain? There are many helpful articles explaining this; I like this one:

http://www.thehobbyblogger.com/don’t-let-images-bog-down-your-blog/

  1. Host images on a subdomain

Did you ever have a kiddie pool? Do you remember how long it took to fill it up with your one hose? What if you could’ve used your neighbor’s hose at the same time and filled up the pool twice as fast?

Most web browsers allow only two to four “hoses” or connections from a single domain (like TheHobbyBlogger.com) to download content. So if you have a lot of images on your blog, your readers might have to wait for one or more images to load before they can begin reading your content.

By creating a subdomain such as img.yourdomain.com to host your images, you effectively add a second set of hoses to fill your readers’ web browsers. This allows your blog content to load simultaneously with your images and speed up your page loads.

@cpradio - that might not be a bad idea and would be worth investigating. All requests already go through an .htaccess to direct the requests to the right place so I suppose it should theoretically be possible. Have never done such a thing though so would require some additional learning. Never a bad thing though!

@John_Betong - As I mentioned earlier, there are a huge number of sites on a small number of servers and it’s the cumulative effect that is causing the problem. It’s not a massive problem (the servers get by fine), but I noticed that removing the file_exists() checks improves performance even further.

Ultimately this was a hypothetical question, just to back up a debate we were having, though I really appreciate the practical solutions, so thanks guys.

@Antnee,

A quick search shows it should be possible: Example 1 (http://stackoverflow.com/questions/7744869/404-image-placeholder) and Example 2 (http://www.webmasterworld.com/forum10/2915.htm).

Thanks :slight_smile: Saved me some effort later :smiley:

@Antnee, you could use it two ways (maybe even more).

  1. You could just always try to output the requested file, let Apache pick up on it being broken, and send the REQUEST_URI to a new PHP file that outputs a placeholder and kicks off a background task that generates the necessary image (i.e. placeholder.php?requestedFile=%{REQUEST_URI}).

  2. You can continue resizing images the way you do today, and let Apache pick up on any broken images after your request has been served to the user (i.e. using placeholder.jpg as your RewriteRule target).
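Option 1’s placeholder script could look roughly like this, sketched in Python rather than PHP. It assumes the rewrite passes the original URI as `requestedFile`, and `queue` is a hypothetical job queue (a set here, so repeated misses for the same image schedule only one job); a separate worker would generate the real file. Rejecting `..` segments matters because the path comes straight from the request.

```python
from urllib.parse import parse_qs, urlsplit


def handle_missing(query_string, placeholder_path, queue):
    """Serve the placeholder and schedule generation of the missing image.

    Apache has already decided the requested file does not exist and
    rewritten the request to ?requestedFile=<REQUEST_URI>.
    """
    params = parse_qs(query_string)
    requested = params.get("requestedFile", [""])[0]
    path = urlsplit(requested).path
    # The path is user-controlled: refuse anything escaping the image tree.
    if ".." in path.split("/"):
        raise ValueError("bad path")
    queue.add(path)  # a background worker generates the real file later
    with open(placeholder_path, "rb") as f:
        return f.read()
```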

I’d personally go with option #1, but either gives your team something to discuss.

Indeed. I like option #1 too. Thanks