Within each page there is one product image with an ambiguous name, such as random_product.png; the only thing distinguishing it from the other images on the page is that its location is catalog/random_product.png.
What I would like to do is have a script scan all the product pages (1-6000) and save the image under the product ID, e.g. if random_product.png belonged to product ID 1234, the script would save the file as 1234.png.
Are there any scripts available that would handle this?
Who owns these images? Do they know you are using them? Have they given you permission? Images and video are protected works; you can’t just grab them and use them for your own convenience.
<?php
function save_image($pageID) {
    $base = 'http://example.com/';

    // Use cURL functions to "open" the page and load its
    // source code into $page (see note below)

    // Find catalog/ images on this page
    preg_match_all('~catalog/([a-z0-9._-]+(\.gif|\.png|\.jpe?g))~i', $page, $matches);
    /*
    $matches[0] => array of image paths (as in source code)
    $matches[1] => array of file names
    $matches[2] => array of extensions (including the dot)
    */
    $success = false; // stays false if no catalog/ images were found
    for ($i = 0; $i < count($matches[0]); $i++) {
        $source  = $base . $matches[0][$i];
        $tgt     = $pageID . $matches[2][$i]; // NEW file name: ID + extension
        $success = copy($source, $tgt);
    }
    return $success; // Rough validation: only reports the last image on the page
}

// Download the image(s) from each page
for ($i = 1; $i <= 6000; $i++) {
    if (!save_image($i)) echo "Error with page $i<br>";
}
?>
You’ll have to add your own cURL code to load the HTML source of each page into the $page variable.
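As a rough sketch, that cURL piece might look like the following. The product-page URL pattern ('product.php?id=') is only a guess here, so swap it for whatever your site actually uses:

```php
<?php
// Sketch of the missing cURL step. The URL pattern 'product.php?id='
// is a placeholder -- replace it with your site's real product-page URL.
function fetch_page($base, $pageID) {
    $ch = curl_init($base . 'product.php?id=' . $pageID);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow any redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // don't hang on a dead host
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up on slow pages
    $page = curl_exec($ch);
    curl_close($ch);
    return ($page === false) ? '' : $page; // empty string on failure
}
```

Then in save_image() you'd just do $page = fetch_page($base, $pageID); before the preg_match_all call.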
It’d be kinder to the hosting web server not to do all 6000 pages in one go, and even for smaller runs you may need to increase PHP’s max execution time.
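If you do break it into batches, a small wrapper like this keeps the load down; the pause length and batch ranges are just example values:

```php
<?php
set_time_limit(0); // lift PHP's default 30-second execution limit

// Run a per-page worker (e.g. the save_image() function above) over one
// slice of the 6000 pages, pausing briefly between requests.
function run_batch($start, $end, $worker) {
    $errors = array();
    for ($i = $start; $i <= $end; $i++) {
        if (!$worker($i)) $errors[] = $i;
        usleep(250000); // 0.25s pause between pages, just an example value
    }
    return $errors; // page IDs that failed
}

// e.g. run_batch(1, 1000, 'save_image'); then 1001-2000 on the next run, etc.
```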
And remember that any copyright restrictions will still apply regardless of how you get the images.
Thanks for this, I’ll give it a shot and let you know how I get on, much appreciated. The web server is our own dedicated physical box with only a few websites on it at present, so no problems with using resources.