Link Checker tool

How can I automatically check a file of report links for invalid ones?

I run a data entry company where people submit data entry reports in which they mention the links where they posted data.

How can I check whether the links they submit in their reports are valid, i.e. whether they really submitted the data or not?
How can I check the links automatically?

Thanks

Hi realcoder,

You should write a validation function that parses input when it is submitted by the user. Inside the validation function you would do something like:

 
$hostname = 'somebadmalformednonexistentdomain.com';
if (validateDomainName($hostname) == 1) {
    /* write $hostname to db */
} else {
    /* return an error code and display it; don't write to db */
}

function validateDomainName($hostname = '')
{
    // gethostbyname() returns the hostname unchanged on failure,
    // so only a successful lookup yields an IP address here
    $ip = gethostbyname($hostname);
    if (preg_match('/^(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/', $ip)) {
        return 1;
    } else {
        return 0;
    }
}
 

Quickly thrown together, but something along those lines.
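For what it's worth, if your PHP version has filter_var() (an assumption on my part: PHP 5.2 or later), you can drop the hand-written IP regex entirely. A minimal sketch of the same check:

```php
<?php
// Sketch: same idea as validateDomainName() above, but the resolved
// address is checked with FILTER_VALIDATE_IP instead of a regex.
// gethostbyname() returns its argument unchanged when the lookup
// fails, so a failed lookup will not pass the IP filter.
function validateDomainName($hostname = '')
{
    $ip = gethostbyname($hostname);
    return (filter_var($ip, FILTER_VALIDATE_IP) !== false) ? 1 : 0;
}
```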

Regards,
Steve

Well, users provide lists of links where they have submitted data.
I want to just put those links in a file and automatically see which links respond OK and which respond 404 … ?

Hi realcoder,

You asked

> How can I automatically check a file of report links for invalid ones?

The steps might be:

  1. Read the file:
    $filename = "list.txt";
    $list = getListFile($filename);

    function getListFile($filename)
    {
        // open the list file and return its whole contents
        $fh = fopen($filename, 'r');
        $theData = fread($fh, filesize($filename));
        fclose($fh);
        return $theData;
    }
  2. For the sake of this example:
    $list = 'This is a file that has a couple of domains, first one is "http://www.liviam.ca" the next is "https://sitepoint.com", "http://example.com?id=4["';
  3. So you need to parse $list for clean domains whose hosts resolve in DNS, so you do:
    $unclean_domains = doReg($list);
    $clean = array();
    foreach ($unclean_domains as $domain) {
        $domain = parse_url($domain);
        if (validateDomainName($domain['host']) == 1) {
            $clean[] = $domain['host'];
        }
    }
    var_dump($clean);

    /* Functions */
    function doReg($string)
    {
        $regex = '~(?:https?|irc|ftp|file)://(?:www)?.*?\.(?:com|net|info|org|ca)~i';
        preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER);
        return $result[0];
    }

    function validateDomainName($hostname = '')
    {
        $ip = gethostbyname($hostname);
        if (preg_match('/^(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/', $ip)) {
            return 1;
        } else {
            return 0;
        }
    }
  4. var_dump-ing $clean outputs
    > array(3) { [0]=> string(13) "www.liviam.ca" [1]=> string(13) "sitepoint.com" [2]=> string(11) "example.com" }
    You can then loop through this array to grab each of the valid links.
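The extraction step can be exercised on its own, without any DNS lookups. A minimal sketch using the doReg() function from step 3 and the sample string from step 2:

```php
<?php
// Sketch: extract URL portions from a free-text report string.
// The lazy match stops at the first recognised TLD, so query
// strings and trailing junk are dropped automatically.
function doReg($string)
{
    $regex = '~(?:https?|irc|ftp|file)://(?:www)?.*?\.(?:com|net|info|org|ca)~i';
    preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER);
    return $result[0];
}

$list = 'This is a file that has a couple of domains, first one is '
      . '"http://www.liviam.ca" the next is "https://sitepoint.com", '
      . '"http://example.com?id=4["';
var_dump(doReg($list));
```

Note that the three extracted URLs still carry their schemes; parse_url() in step 3 is what reduces each one to a bare host.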

You don't want to rely on returned 404 codes alone, as many invalid URLs are served by pages that never return a 404 status. What is demonstrated above takes a string, extracts the URLs from it, tests that each URL's host resolves to a valid IP address in DNS, strips any parameters, and returns a clean array of domains that you can put into a database, a file, or an object … whatever you want.
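That said, if you do also want to see each link's HTTP response code, a minimal sketch using PHP's cURL extension is below. The 10-second timeout and the URL list are illustrative choices, not requirements:

```php
<?php
// Sketch: report the HTTP status code for each URL in a list using
// PHP's cURL extension. A HEAD request is sent so no body is fetched.
function checkLinkStatus($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 if the request never connected
    curl_close($ch);
    return $code;
}

foreach (array('http://www.liviam.ca', 'https://sitepoint.com') as $url) {
    printf("%s => %d\n", $url, checkLinkStatus($url));
}
```

A code of 200 means the link answered normally, 404 that the page is missing, and 0 that the host could not be reached at all. Be aware that some servers reject HEAD requests, so you may need to fall back to a normal GET for those.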

Regards,
Steve