in_array problem

Hi,

Working on a small script that reads in a file containing filenames and checks whether each file should be deleted or not. My in_array() check only seems to match the last entry in the array. Here is the code:


<?php

$files = array(
	      '403.php',
	      '404.php',
	      '406.php',
	      '414.php',
	      '500.php',
	      '501.php',
	      'favicon.ico',
	      '.htaccess',
	      'robots.txt');

print_r($files);

$handle = @fopen('WP_ro_files.txt', "r");
if ($handle) {
   while (!feof($handle)) {
       $filenames[] = fgets($handle, 4096);
   }
   fclose($handle);
}

//print_r($filenames);

foreach ($filenames as $filename) {
	$delete_ok = 1;

    if (in_array($filename, $files)) {
	echo "Filename found - $filename\n";
	$delete_ok = 0;
    } else {
	$pieces = explode("/", $filename);
	$result = count($pieces);
	$path = $pieces[1];   //get the second element, which is the path or filename
	
	if ($path == "plugins"){	//no files to be deleted in the plugins path
	    $delete_ok = 0;
	}
	if ($path == "themes"){	//no files to be deleted in the themes path
	    $delete_ok = 0;
	}
    }

    if ($delete_ok){
	echo "File to be deleted - $filename\n";
	//put code here to unlink the file (path name + filename)
    } else {
	echo "File will not be deleted - $filename\n";
    }
}

?>

When a filename from the file is also in the array $files, I expect in_array() to find it, but it only ever finds the last one??

J

Sorry, but I’m not entirely sure what you are doing, but if the found files ($filenames) you want to compare look like this:


$filenames = array(
          'favicon.ico', 
          '.htaccess', 
          'robots.txt'); 

and the white-list you want to check against looks like this:


$files = array( 
          '403.php', 
          '404.php', 
          '406.php', 
          '414.php', 
          '500.php', 
          '501.php', 
          'favicon.ico', 
          '.htaccess', 
          'robots.txt'); 

Then you can grab the intersections of those 2 arrays with:


$crosses = array_intersect($files, $filenames); 
var_dump($crosses);

This gives an array of the matches that you can loop through:

array
  6 => string 'favicon.ico' (length=11)
  7 => string '.htaccess' (length=9)
  8 => string 'robots.txt' (length=10)
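array_intersect() keeps the keys of its first argument, which is why the dump above starts at 6 rather than 0. A minimal sketch reusing the two example arrays from this reply; array_values() is only needed if you want the matches renumbered from 0.

```php
<?php
// White-list and found files, as in the example above.
$files = array('403.php', '404.php', '406.php', '414.php', '500.php',
               '501.php', 'favicon.ico', '.htaccess', 'robots.txt');
$filenames = array('favicon.ico', '.htaccess', 'robots.txt');

// Values of $files that also appear in $filenames; keys come from $files.
$crosses = array_intersect($files, $filenames);

// Renumber the matches 0, 1, 2 if the original keys are not needed.
$matches = array_values($crosses);

foreach ($matches as $name) {
    echo "Keep: $name\n";
}
```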

Thanks Cups. :slight_smile:

Using my test data, I still only get one match, but using your test data, I get all 3 matches okay. So, there must be something wrong with this code


$handle = @fopen('WP_ro_files.txt', "r");
if ($handle) {
   while (!feof($handle)) {
       $filenames[] = fgets($handle, 4096);
   }
   fclose($handle);
} 

which loads the $filenames array. Is that the correct way to load an array from a file?

J

I noticed that no matter which method I used to load the file into an array, the result was the same: only one match. That match was the last value in the array, and then I noticed that all the other lines had spaces to the right of the filename. So I added this:


foreach ($filenames as &$filename) {
    $filename = rtrim($filename);
}
unset($filename); // break the reference before $filename is reused in a later foreach

and got all the expected matches. Is there a more elegant way to load a file into an array and rtrim() each line at the same time?

The in_array() check works okay now; the problem was the method I was using to load the file. I should be doing a right trim on each line in the file.

J
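The symptom in this thread can be reproduced without the real WP_ro_files.txt. fgets() returns each line including its line ending (and any trailing spaces), so in_array() only finds the final line, which has no newline after it. A self-contained sketch; the temporary file and its three names are made-up test data:

```php
<?php
// Hypothetical test data: trailing spaces on line 2, no newline after the last line.
$path = tempnam(sys_get_temp_dir(), 'wpro');
file_put_contents($path, "favicon.ico\n.htaccess   \nrobots.txt");

$raw = array();
$handle = fopen($path, 'r');
while (($line = fgets($handle, 4096)) !== false) {
    $raw[] = $line;                 // the line ending is kept by fgets()
}
fclose($handle);
unlink($path);

$files = array('favicon.ico', '.htaccess', 'robots.txt');

$rawHits = 0;
$trimmedHits = 0;
foreach ($raw as $name) {
    if (in_array($name, $files)) {
        $rawHits++;                 // only "robots.txt" matches as-is
    }
    if (in_array(rtrim($name), $files)) {
        $trimmedHits++;             // rtrim() removes "\n", "\r", spaces and tabs
    }
}
echo "untrimmed: $rawHits, trimmed: $trimmedHits\n";
```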


while (($line = fgets($handle, 4096)) !== false) {
    $filenames[] = rtrim($line); // strip the newline and any trailing spaces
}

Since fgets() stops reading when it reaches a newline (and includes that newline in the string it returns), you might also want to look at the script that creates the input file (if it’s one of your scripts) and make sure a newline character is placed directly after each filename, leaving no trailing spaces.
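If the list is generated by one of your own scripts, one way to follow that advice is to trim each name and write exactly one "\n" after it. This is only a sketch: $names and the temporary output path are hypothetical stand-ins for whatever the generating script actually uses.

```php
<?php
// Hypothetical input: names as some other script might have collected them.
$names = array('favicon.ico  ', ' .htaccess', "robots.txt\r\n");
$out = tempnam(sys_get_temp_dir(), 'wpro');   // stand-in for WP_ro_files.txt

// Strip stray whitespace, then put a single newline directly after each name.
$clean = array_map('trim', $names);
file_put_contents($out, implode("\n", $clean) . "\n");

$written = file_get_contents($out);
unlink($out);
echo $written;
```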

$filenames = file('WP_ro_files.txt',FILE_IGNORE_NEW_LINES| FILE_SKIP_EMPTY_LINES);

or if you want to be entirely sure,

$filenames = array_map('trim',file('WP_ro_files.txt',FILE_IGNORE_NEW_LINES| FILE_SKIP_EMPTY_LINES));
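A small self-contained check of the two file() variants, using a temporary file with made-up contents (one trailing space, one blank line). FILE_IGNORE_NEW_LINES drops the line endings and FILE_SKIP_EMPTY_LINES drops the blank line, but trailing spaces survive until trim() is mapped over the result:

```php
<?php
// Hypothetical file contents: a trailing space and an empty second line.
$path = tempnam(sys_get_temp_dir(), 'wpro');
file_put_contents($path, "favicon.ico \n\n.htaccess\nrobots.txt\n");

$flags = FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES;

// Line endings and the empty line are gone; the trailing space is not.
$plain = file($path, $flags);

// trim() applied to every element removes the remaining whitespace.
$trimmed = array_map('trim', file($path, $flags));

unlink($path);
print_r($trimmed);
```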

Tweak your file searches for PCRE…


$files = array(
          '~403\\.php~',
          '~404\\.php~',
          '~406\\.php~',
          '~414\\.php~',
          '~500\\.php~',
          '~501\\.php~',
          '~favicon\\.ico~',
          '~\\.htaccess~',
          '~robots\\.txt~',
          '~/plugins~',
          '~/themes~'
);

Now filter your array.


$todelete = array_diff($filenames,preg_filter($files,$files,$filenames));

So what am I doing here?
file() returns the file, split into an array.
array_map(), in the optional step, applies the trim() function to every element of the array.
Changing the search list is plain enough: each entry is put into regex form so that it can be passed to a preg function.
Filtering the array:
preg_filter() goes through the array and drops any files that do not match one of the patterns (applying the replacement to the ones that do). This leaves every file in the array that is in your ‘skip list’.
array_diff() then takes that result and subtracts those entries from the original array, resulting in the files that were NOT in your ‘skip list’.
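To see what the preg_filter() and array_diff() pair returns, here is a sketch on made-up paths. One detail worth knowing: preg_filter() also performs a replacement on every subject it keeps, and array_diff() then compares those returned strings with the originals. Passing '$0' (the whole match) as the replacement leaves the kept values unchanged; this is a suggested variation, since the one-liner above passes $files itself as the replacement.

```php
<?php
// Hypothetical paths, as they might appear in WP_ro_files.txt.
$filenames = array(
    'wp-config.php',
    '.htaccess',
    'robots.txt',
    'wp-content/themes/twentyten/loop.php',
    'readme.html',
);

// Skip-list patterns; '~' is the regex delimiter.
$files = array('~\.htaccess~', '~robots\.txt~', '~/themes~');

// Keeps only subjects matching at least one pattern (keys preserved);
// '$0' re-inserts the matched text, so the kept values are unchanged.
$skip = preg_filter($files, '$0', $filenames);

// Original list minus the skip list = candidates for deletion.
$todelete = array_diff($filenames, $skip);
print_r($todelete);
```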

I first wanted to suggest file(), then decided against it because it loads the entire file at once, which could create a problem with a huge file. Which of course is no reason at all, because in the code the entire file is loaded into an array anyway… duh.

Okay, thanks for the code and tips on new lines.

Thanks for the code tips, and for explaining what each function does. I’m not familiar with regex, but I can see what the patterns will do. The script might get down to just a few lines now. :smiley:

When I try this test script …


 $filenames = array_map('trim',file('WP_ro_files.txt',FILE_IGNORE_NEW_LINES| FILE_SKIP_EMPTY_LINES)); 
//Tweak your file searches for PCRE...
$files = array( 
          '~403\\.php~', 
          '~404\\.php~', 
          '~406\\.php~', 
          '~414\\.php~', 
          '~500\\.php~', 
          '~501\\.php~', 
          '~favicon\\.ico~', 
          '~\\.htaccess~', 
          '~robots\\.txt~',
          '~/plugins~',
          '~/themes~'
); 
//Now filter your array.
$todelete = array_diff($filenames,preg_filter($files,$files,$filenames));
var_dump($todelete);

The dump returns all the filenames. The preg_filter() call passes the array $files as both parameters, the “pattern” and the “replacement”. Is that correct?

I think the contents of $filenames are causing unexpected results. There are filenames like

/wp-includes/js/tinymce/plugins/wpeditimage/js/editimage.dev.js
wp-includes/js/tinymce/themes/advanced/skins/wp_theme/ui.css

whereas I only wanted to look for “plugins” or “themes” in the second part of the pathname, not anywhere in the path.

Edit: also, preg_filter() has changed the values, such as

wp-content~/themes~/twentyten/loop.php
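That changed value also explains why the dump lists every file: with $files passed as the replacement too, each matched piece of the path is overwritten by the literal pattern text, so the values preg_filter() returns no longer equal anything in $filenames and array_diff() subtracts nothing. A reduced reproduction with a hypothetical two-entry list:

```php
<?php
// Hypothetical list and a cut-down skip list.
$filenames = array('wp-content/themes/twentyten/loop.php', 'readme.html');
$files = array('~/plugins~', '~/themes~');

// Same array as pattern AND replacement: "/themes" becomes "~/themes~".
$changed = preg_filter($files, $files, $filenames);

// Nothing in $changed equals an original value, so nothing is subtracted.
$todelete = array_diff($filenames, $changed);

var_dump($changed, $todelete);
```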

Re: the changing values: bleh. Yeah, that's what I get for being lazy.
Re: the unexpected results: ah, okay, I missed that caveat.


 $filenames = array_map('trim',file('WP_ro_files.txt',FILE_IGNORE_NEW_LINES| FILE_SKIP_EMPTY_LINES));

$files = array('~403\\.php~','~404\\.php~','~406\\.php~','~414\\.php~','~500\\.php~','~501\\.php~','~favicon\\.ico~','~\\.htaccess~','~robots\\.txt~','~^(/?[^/]+)/plugins~','~^(/?[^/]+)/themes~');
$filesrep = array('403.php','404.php','406.php','414.php','500.php','501.php','favicon.ico','.htaccess','robots.txt','$1/plugins','$1/themes');

$todelete = array_diff($filenames,preg_filter($files,$filesrep,$filenames));
var_dump($todelete);  
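The anchored patterns only match plugins or themes as the second path segment: ^(/?[^/]+) consumes an optional leading slash plus the first directory, and /plugins or /themes must follow immediately. The replacement '$1/themes' rebuilds exactly the text that was matched, so the kept values stay unchanged. A quick check against the two paths quoted earlier plus two hypothetical wp-content paths:

```php
<?php
$filenames = array(
    '/wp-includes/js/tinymce/plugins/wpeditimage/js/editimage.dev.js',
    'wp-includes/js/tinymce/themes/advanced/skins/wp_theme/ui.css',
    'wp-content/plugins/hello.php',           // hypothetical
    'wp-content/themes/twentyten/loop.php',   // hypothetical
);

$files    = array('~^(/?[^/]+)/plugins~', '~^(/?[^/]+)/themes~');
$filesrep = array('$1/plugins', '$1/themes');

// Only the two wp-content paths match; $1 puts the first segment back.
$skip = preg_filter($files, $filesrep, $filenames);
$todelete = array_diff($filenames, $skip);
print_r($todelete);
```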

Thanks StarLion, that worked perfectly. :slight_smile:

Hmm, went to use the script on a site today, and got …

PHP Fatal error: Call to undefined function preg_filter()

I see you have to be running PHP 5.3 or greater to use this function; the site uses 5.2.10.

What did people use before preg_filter() was available?

The comment on the preg_filter() manual page provides a workaround.
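For PHP 5.2, one alternative (not necessarily the workaround from that manual comment) is preg_grep() with the PREG_GREP_INVERT flag, which was available well before 5.3. Removing every entry that matches any skip pattern directly yields the files to delete, with no replacement strings and no array_diff() step. The sketch avoids closures and short array syntax so it should still parse on 5.2; the sample paths are hypothetical:

```php
<?php
// Hypothetical sample paths.
$filenames = array(
    '.htaccess',
    'readme.html',
    'wp-content/themes/twentyten/loop.php',
    'wp-admin/about.php',
);

$files = array('~\.htaccess~', '~robots\.txt~',
               '~^(/?[^/]+)/plugins~', '~^(/?[^/]+)/themes~');

// Drop every entry matching a skip pattern: PREG_GREP_INVERT returns the
// entries that do NOT match, and keys from the input array are preserved.
$todelete = $filenames;
foreach ($files as $pattern) {
    $todelete = preg_grep($pattern, $todelete, PREG_GREP_INVERT);
}
print_r($todelete);
```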

Thanks Cups. In the end I shifted to another server (same hosting company), and that solved the problem.