Detecting orphan files

Does anyone know of a PHP script that will find orphan files on a website, i.e. files that are not referenced in any other file or in a database?

I’ve just inherited a site that has not been maintained very well. There appear to be hundreds of files scattered about that are not referenced anywhere, and I need to be sure they are true orphans before I remove them.

Regards

OK, I guess such a script doesn’t exist. Looks like I’ll have to write one myself.

If anyone is interested in the end result, let me know.

Searching for “PHP find orphaned files” turns up various solutions, some of which involve a good IDE, IDE add-ons and so on.

If it’s HTML/JS/image files, then I think it would be easy enough, but when it comes to include files outside the document root (which is what I originally imagined you meant), I think you’d be pushed.

For that you’d maybe want something that runs for a while and records when each included file is “touched”. I’m not even sure that would be doable unless all included classes were loaded via an autoloader.
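A minimal sketch of that runtime approach, assuming PHP 7+ and that every request can be made to run a small prepend script (for example via the `auto_prepend_file` php.ini directive): `get_included_files()` returns every file the request actually loaded, whether through include/require or an autoloader, so logging it at shutdown gradually builds up a list of files in use. The function name `logIncludedFiles` is hypothetical.

```php
<?php
/*
 * Hypothetical helper: append the PHP files included during this request to
 * a log file. Files that never appear in the log after a long period of
 * normal traffic become orphan candidates.
 */
function logIncludedFiles(string $logFile): array
{
    // Absolute paths of every file loaded so far, in include order.
    $included = get_included_files();
    if ($included !== []) {
        // FILE_APPEND + LOCK_EX so concurrent requests don't interleave lines.
        file_put_contents($logFile, implode(PHP_EOL, $included) . PHP_EOL, FILE_APPEND | LOCK_EX);
    }
    return $included;
}
```

In the prepend script you would then call something like `register_shutdown_function('logIncludedFiles', '/path/outside/webroot/included-files.log');` (the path is a placeholder). Rarely visited pages can be missed, so treat the result as a hint rather than proof.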

Thanks, cups.

I have scoured the net for a script but failed miserably.

I’m looking at attempting to code the following:

  1. Build a repository file containing the contents of every text-based file on the site. I already have a function that does this, and it doesn’t take long: a site with 2000+ files takes around 10 seconds and produces a file of around 24 MB, depending, of course, on how big the files are.

  2. Do a recursive scan of the site, listing all file names on the site and their paths.

  3. Then scan the repository file for each file name (not path) found in step 2. This obviously wouldn’t be 100% accurate, as files with the same name might exist in several locations, but it would be a reasonably good indication. I don’t think I could search for the full path, since links may use relative or absolute paths and may be referenced from various locations. I stand to be corrected here.

  4. Produce a list of all the files in the web, indicating whether or not a reference to the file was found.

  5. If no reference is found in step 4, provide the ability to search the entire database for references to that file name, if the site is database-driven.
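Steps 1 to 4 could be sketched roughly as follows. This is a hypothetical implementation, not the prototype posted later in the thread: it keeps the “repository” in memory as one string instead of writing it to a file (assuming it fits, which 24 MB should), and the function name and the default list of text extensions are assumptions.

```php
<?php
/*
 * Hypothetical sketch of steps 1-4: concatenate every text file into one
 * string, then check whether each file's bare name occurs in it.
 */
function findUnreferenced(string $root, array $textExts = ['php', 'html', 'htm', 'css', 'js', 'xml', 'ini']): array
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    $allFiles = [];
    $corpus = '';
    foreach ($iterator as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $path = $file->getPathname();
        $allFiles[] = $path;                                  // step 2: list every file
        if (in_array(strtolower($file->getExtension()), $textExts, true)) {
            $corpus .= file_get_contents($path) . "\n";      // step 1: build the repository
        }
    }

    $report = [];
    foreach ($allFiles as $path) {
        // Steps 3 and 4: search for the file name only, not the path.
        $report[$path] = strpos($corpus, basename($path)) !== false;
    }
    ksort($report);
    return $report;   // path => true if the name appears somewhere
}
```

Calling `findUnreferenced('.')` from the site root returns path => true/false; the `false` entries are the ones to look at more closely. As with the plan itself, same-named files mask each other, and a text file that mentions its own name counts as referenced.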

The above would only cover internal references to the files and possibly include some false positives, but that would be enough for my purposes.
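For step 5 on a MySQL-backed site, one approach is to ask `information_schema` for every text-type column and run a `LIKE` search on each one. A sketch, assuming PDO with the MySQL driver and read access to `information_schema`; both function names are hypothetical. It issues one query per text column, so it will be slow on a large database.

```php
<?php
/*
 * Hypothetical helpers for step 5: find which table columns mention a file name.
 */

// Map each table in $schema to its text-type columns.
function textColumnsMySql(PDO $pdo, string $schema): array
{
    $stmt = $pdo->prepare(
        "SELECT TABLE_NAME, COLUMN_NAME FROM information_schema.COLUMNS
         WHERE TABLE_SCHEMA = ?
           AND DATA_TYPE IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext')"
    );
    $stmt->execute([$schema]);
    $map = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $map[$row['TABLE_NAME']][] = $row['COLUMN_NAME'];
    }
    return $map;
}

// Return "table.column" => number of rows whose value contains $fileName.
function findInDatabase(PDO $pdo, array $columnsByTable, string $fileName): array
{
    // Escape LIKE wildcards with '!' so a name like my_file.png matches literally.
    $pattern = '%' . str_replace(['!', '%', '_'], ['!!', '!%', '!_'], $fileName) . '%';
    $hits = [];
    foreach ($columnsByTable as $table => $columns) {
        foreach ($columns as $column) {
            // Identifiers come from information_schema, but quote them anyway.
            $sql = sprintf(
                "SELECT COUNT(*) FROM `%s` WHERE `%s` LIKE ? ESCAPE '!'",
                str_replace('`', '``', $table),
                str_replace('`', '``', $column)
            );
            $stmt = $pdo->prepare($sql);
            $stmt->execute([$pattern]);
            $count = (int) $stmt->fetchColumn();
            if ($count > 0) {
                $hits[$table . '.' . $column] = $count;
            }
        }
    }
    return $hits;
}
```

Typical use would be `findInDatabase($pdo, textColumnsMySql($pdo, 'site_db'), 'logo.png')`, where `site_db` is a placeholder schema name; an empty array means no reference was found in any text column.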

If you or anyone else can throw in some more ideas, it would be appreciated.

Regards to all

It may be time-consuming, but try opening PuTTY and running this from the command line:


find /var/www/ -type f -name "*.php" -print0 | xargs -0 grep -lF "oddfile.php"

This searches every .php file under /var/www/ and lists those that contain a reference to oddfile.php, whether through an include or a require.
You may also have files ending in .inc, so check those too.
If nothing is found, you can mark the file as unused or delete it.
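Without shell access, roughly the same search can be run from a PHP page uploaded over FTP. A sketch, assuming PHP 7+; the function name `filesReferencing` and its default extensions (`.php` plus the `.inc` files mentioned above) are assumptions.

```php
<?php
/*
 * Hypothetical PHP stand-in for `find ... | xargs grep -lF`: list every file
 * with one of the given extensions whose contents mention $needle.
 */
function filesReferencing(string $root, string $needle, array $exts = ['php', 'inc']): array
{
    $matches = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if (!$file->isFile() || !in_array(strtolower($file->getExtension()), $exts, true)) {
            continue;
        }
        // Plain substring test, like grep -F: the dot in "oddfile.php" is literal.
        if (strpos(file_get_contents($file->getPathname()), $needle) !== false) {
            $matches[] = $file->getPathname();
        }
    }
    sort($matches);
    return $matches;
}
```

For example, `print_r(filesReferencing(__DIR__, 'oddfile.php'));` placed at the site root lists the matching files; an empty array means no .php or .inc file mentions it. Delete the script from the server when you are done.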

Thanks for the suggestion, lorenw.

Unfortunately, PuTTY is not an option. I have FTP access and a control panel, and that’s it. As you say, it would be kind of time-consuming too.

Regards

Is there any sort of CMS in use or is the site currently a collection of static pages?

How many pages are there at the moment?

Do you expect many pages to be added to the site in the future?

Hi SpacePhoenix

The site belongs to a glazing company and it’s running on Joomla. There are hundreds, possibly thousands, of image files in various directories. I’m not sure how they want to proceed yet, but my first task is to tidy the damn thing up.

Happy New Year, by the way.

Before you proceed any further, you should make sure that the Joomla installation is up to date, and the same goes for any plugins in use. Also, before attempting to identify which files are in use and which are “orphans”, make a complete backup of all the files and of the whole database in case anything goes wrong.

I have limited experience with Joomla, but it appears to be up to date, and everything is already backed up.

At last, I have a working prototype, with the front end up and running, and it seems to do the trick.

Anyone care to test it for me?

It’s best to put it in a password-protected directory, as form validation is minimal.

I use this code of my own:

<?php
/*
Author: Fernando Gámbaro fgambaro - hotmail . com
Date: 20/09/2013

The main goal of this program is to list the files that are not referenced
anywhere in a website. It builds an array of every file on the site
(including subfolders), selects which files to look for, and searches every
php, html, css and js file for references to them.
Some of the functions were taken from the internet.
*/
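// Return the paths of all regular files under $dir (recursively),
// or false if $dir is not a directory.
function listdir($dir='.') {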
    if (!is_dir($dir)) {
        return false;
    }

    $files = array();
    listdiraux($dir, $files);

    return $files;
}

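// Recursive helper: append every regular file under $dir to $files.
// Symbolic links are skipped so link loops cannot cause endless recursion.
function listdiraux($dir, &$files) {
    $handle = opendir($dir);
    if ($handle === false) {
        return; // unreadable directory: skip it
    }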
    while (($file = readdir($handle)) !== false) {
        if ($file == '.' || $file == '..') {
            continue;
        }
        $filepath = $dir == '.' ? $file : $dir . '/' . $file;
        if (is_link($filepath))
            continue;
        if (is_file($filepath))
            $files[] = $filepath;
        else if (is_dir($filepath))
            listdiraux($filepath, $files);
    }
    closedir($handle);
}

$files = listdir('.');
/* Everything from here on is my own code. */

$files = array_unique($files, SORT_REGULAR);
sort($files, SORT_LOCALE_STRING);

$tipo = array();

/*
 Collect the candidate files to look for: every .html, .png, .jpg and .php
 file found in the directory tree. Editor backups such as file.php~ are
 skipped because their extension is "php~".
*/
foreach ($files as $f) {
	$ext = strtolower(pathinfo($f, PATHINFO_EXTENSION));
	if (in_array($ext, array('html', 'png', 'jpg', 'php'), true)) {
		$tipo[] = $f;
	}
}

$tipo = array_unique($tipo, SORT_REGULAR);
sort($tipo, SORT_LOCALE_STRING);


/*
 Search every .html, .php, .css and .js file for references to the candidates,
 showing which file is being scanned and which candidates it references.
 A candidate that is found is removed from $tipo, so whatever is left at the
 end has no references.
*/
foreach ($files as $ff) {
	$ext = strtolower(pathinfo($ff, PATHINFO_EXTENSION));
	if (!in_array($ext, array('html', 'php', 'css', 'js'), true)) {
		continue;
	}
	echo "Scanning file -> " . htmlspecialchars($ff) . "<br>";
	$lines = file($ff);
	if ($lines === false) {
		continue; // unreadable file
	}
	foreach ($lines as $line) {
		foreach ($tipo as $key => $t) {
			// A file does not count as referencing itself.
			if ($t === $ff) {
				continue;
			}
			// Match on the bare file name, since links may use relative or absolute paths.
			if (strpos($line, basename($t)) !== false) {
				echo "&nbsp;&nbsp;&nbsp;&nbsp;" . htmlspecialchars($t) . " <- found<br>";
				unset($tipo[$key]);
			}
		}
	}
}

echo "<p>Files with no references found:</p>";
	/* List the candidate files that no scanned file references. */
	foreach ($tipo as $key => $tip) {
		echo $key . " => " . htmlspecialchars($tip) . "<br>";
	}
?>

This is an old thread that I see no need to reopen, so I’m closing it.