Does anyone know of a PHP script that will find orphan files on a web site, i.e. files that are not referenced in any other files, nor in a database?
I’ve just inherited a site that has not been maintained very well. There appear to be hundreds of files scattered about which are not referenced anywhere - I need to be sure that they are true orphans before I remove them.
Searching “PHP find orphaned files” turns up various solutions; some involve a good IDE, IDE add-ons and so on.
If it’s HTML/JS/image files then I think it would be easy enough, but when it comes to include files outside of the document root (which is what I originally imagined you meant), then I think you’d be pushed.
For that you’d maybe want something which runs for a while and “touches” the included file somehow. I’m not even sure if that would be doable, unless all include classes were brought in via an autoloader.
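One way the “touch” idea might be approximated, offered only as a rough sketch: PHP’s built-in get_included_files() reports every file pulled in during a request, so a shutdown function could append that list to a log while the site runs normally for a while. The helper names log_included_files and read_used_files, and the log path, are made up for illustration; nothing here comes from a specific tool.

```php
<?php
// Hypothetical helper: append every file PHP included during this request to a log.
function log_included_files(string $logFile): void
{
    $lines = implode(PHP_EOL, get_included_files()) . PHP_EOL;
    file_put_contents($logFile, $lines, FILE_APPEND | LOCK_EX);
}

// Hypothetical helper: read the log back and return the unique paths seen so far.
function read_used_files(string $logFile): array
{
    if (!is_file($logFile)) {
        return [];
    }
    $paths = file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return array_values(array_unique($paths));
}

// On a real site this would live in a file loaded on every request
// (for example via the auto_prepend_file ini setting). The path is an assumption:
// register_shutdown_function('log_included_files', '/path/outside/docroot/included.log');
```

Any include file that never shows up in the log after a representative period is only a candidate orphan; rarely used pages may simply not have been visited yet.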
I have scoured the net for a script but failed miserably.
I’m looking at attempting to code the following:
1. Build a repository file containing the contents of every text-based file on the site. I already have a function that will perform this, and it doesn’t take too long: e.g. a site with 2000+ files takes around 10 seconds, creating a file of around 24 MB. It depends on how big the files are, of course.
2. Do a recursive scan of the site, listing all file names on the site and their paths.
3. Then scan the repository file for each file name (not path) found in step 2. This obviously wouldn’t be 100% accurate, as files with the same file name might be in various locations. However, it would be a reasonably good indication. I don’t think I would be able to search for the path, as links could contain relative or absolute paths and might be referenced from various locations. I stand to be corrected here.
4. Produce a list of all the files in the web, indicating whether or not a reference to the file was found.
5. If no reference is found in step 4, provide the ability to search an entire database for a reference to that file name, if the site is db driven that is.
The above would only cover internal references to the files and possibly include some false positives, but that would be enough for my purposes.
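The first four steps could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a finished tool: the function names list_site_files and find_references are invented here, the repository is kept in memory as one string rather than written to a file, and the database search of the last step is left out. Matching is by plain substring on the file name, so a.php would also match inside data.php, and a file that mentions its own name counts as referenced.

```php
<?php
// Hypothetical helper: list every regular file under $root (recursively).
function list_site_files(string $root): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $info) {
        if ($info->isFile()) {
            $files[] = $info->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Hypothetical helper: map each file path to true (its file name appears in
// some text-based file) or false (no reference found, so a candidate orphan).
function find_references(string $root, array $textExtensions): array
{
    $files = list_site_files($root);

    // Build the repository: the contents of every text-based file in one string.
    $repository = '';
    foreach ($files as $path) {
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $textExtensions, true)) {
            $repository .= file_get_contents($path) . "\n";
        }
    }

    // Look each file name (not its path) up in the repository.
    $report = [];
    foreach ($files as $path) {
        $report[$path] = strpos($repository, basename($path)) !== false;
    }
    return $report;
}

// Example call (the extension list is an assumption):
// $report = find_references('.', ['php', 'html', 'htm', 'css', 'js']);
```

Holding everything in one string keeps the sketch short; for a very large site, writing the repository to disk as described above may be kinder to memory.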
If you or anyone can throw in some more ideas, it would be appreciated.
Search all of the .php files for a reference to oddfile.php, be it in an include or a require.
You may have some files that end in .inc so check them also.
If nothing is found, you can mark the file as unused or delete it.
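The exact command for that search is not shown in this thread, so the following is only a stand-in written in PHP, not the poster’s original. The function name files_mentioning is hypothetical; it returns every .php or .inc file under a directory whose contents mention a given file name, skipping the file of that name itself.

```php
<?php
// Hypothetical helper: return the paths of all files under $root whose
// extension is in $extensions and whose contents mention $needle.
function files_mentioning(string $root, string $needle, array $extensions = ['php', 'inc']): array
{
    $hits = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $info) {
        if (!$info->isFile()) {
            continue;
        }
        // A file mentioning its own name is not a reference to it.
        if ($info->getFilename() === basename($needle)) {
            continue;
        }
        if (!in_array(strtolower($info->getExtension()), $extensions, true)) {
            continue;
        }
        $contents = file_get_contents($info->getPathname());
        if ($contents !== false && strpos($contents, $needle) !== false) {
            $hits[] = $info->getPathname();
        }
    }
    sort($hits);
    return $hits;
}

// Example call: an empty result suggests oddfile.php is unused.
// $hits = files_mentioning('.', 'oddfile.php');
```

A plain text match also catches mentions in comments or strings, so it errs on the side of keeping a file, which is the safer direction before deleting anything.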
The site belongs to a glazing company and it’s running on Joomla. There are hundreds, possibly thousands of image files in various directories. Not sure how they want to proceed yet, but my first task is to tidy up the damn thing.
Before you proceed any further you should make sure that the Joomla installation is up to date; the same goes for any plugins that may be in use. Also, before attempting to identify which files are in use and which are “orphan” files, make a complete backup of all the files and of the complete database in case anything goes wrong.
<?php
/*
Author: Fernando Gámbaro fgambaro - hotmail . com
Date: 2013/09/20
The main objective of this program is to create a list of the files that are
not referenced anywhere within a website.
To do this we build an array of all the site's files (including subfolders),
select which files to search for, and then look for references to them inside
every php, html, css or js file.
Some of the functions were taken from the internet.
*/
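// Return a flat list of every file under $dir (recursively), or false if $dir is not a directory.
function listdir($dir='.') {
    if (!is_dir($dir)) {
        return false;
    }
    $files = array();
    listdiraux($dir, $files);
    return $files;
}
// Recursive helper: append the path of each regular file under $dir to $files.
function listdiraux($dir, &$files) {
    $handle = opendir($dir);
    if ($handle === false) {
        return; // unreadable directory: skip it
    }
    while (($file = readdir($handle)) !== false) {
        if ($file == '.' || $file == '..') {
            continue;
        }
        $filepath = $dir == '.' ? $file : $dir . '/' . $file;
        if (is_link($filepath))
            continue; // do not follow symbolic links
        if (is_file($filepath))
            $files[] = $filepath;
        else if (is_dir($filepath))
            listdiraux($filepath, $files);
    }
    closedir($handle);
}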
$files = listdir('.');
/*
From here on the code is my own.
*/
$files = array_unique($files, SORT_REGULAR);
sort($files, SORT_LOCALE_STRING);
// $tipo holds the files we want to find references to.
$tipo = array();
/*
For each of the files found in the directories, look for references to
files of type png, jpg, php and html inside the php and html files.
*/
foreach ($files as $f) {
    // strpos() returns a position or false, so compare strictly against false.
    if ( strpos($f, ".html") !== false or strpos($f, ".png") !== false or strpos($f, ".jpg") !== false
        or ( strpos($f, ".php") !== false and strpos($f, ".php~") === false ) ) {
        $tipo[] = $f;
    }
}
$tipo = array_unique($tipo, SORT_REGULAR);
sort($tipo, SORT_LOCALE_STRING);
foreach ($files as $ff) {
    if ( strpos($ff, ".html") !== false or (strpos($ff, ".php") !== false and strpos($ff, ".php~") === false)
        or strpos($ff, ".css") !== false or strpos($ff, ".js") !== false ){
        /*
        Show which file is being searched and which of the wanted files are found in it.
        */
        echo "Scanning file -> ".$ff."<br>";
        $lines = file($ff);
        if ($lines === false) {
            continue; // unreadable file: skip it
        }
        foreach($lines as $line) {
            $line=str_replace('"','',$line);
            foreach ($tipo as $key => $t) {
                // Search for the bare file name, because links may use relative or absolute paths.
                $name = str_replace('"','',basename($t));
                if ( strpos($line, $name) !== false ) {
                    echo " ".$t." <- found<br>";
                    unset($tipo[$key]);
                }
            }
        }
    }
}
echo "Result<p>";
/* Show the files for which no reference was found. */
foreach ($tipo as $key => $tip) {
    echo $key." => ".$tip."<br>";
}
?>