How can I do this with fgetcsv?

Hello,
I need to read csv file and analyze it to see if it contains error, and if errors occur then display error(s), and if not, then insert the values to the db.

The sample csv looks like

id,first_name,last_name,email
"12345","Jane","test@test.com" // incorrect row
"99999","John","Smith","test@test.com" // correct row
"232323",Sarah,"Smith","test@test.com"  // incorrect row

I just put the comment beside each row for you guys understanding.
As you see, each row should contain 4 columns, each separated with comma(,), enclosed with “”
First row is not correct because it only has 3 columns, and in this case, it should display error like this

Line 2   "12345","Jane","test@test.com"     Only 3 fields were found 

and last row is not correct because Sarah is not enclosed with “” , and in this case, it should display error like this

Line 4   "232323",Sarah,"Smith","test@test.com"    Firstname is not enclosed.

What I tried to do is

$handle = fopen($importFile, "r");
while (($line = fgetcsv($handle, 1000, ",",'"')) !== FALSE)
{
    $num = count($line);
    if ($num != 4)
    {
       do something;
    }
:
:
}

Q1: How can I display the error causing line as it is (like

"12345","Jane","test@test.com"

) from csv file in the middle as I read it with fgetcsv?
I tried

   for ($c=0; $c < $num; $c++)
    {
        echo $line[$c] . " ";
    }

in the do something; place, but it prints

12345 Jane test@test.com 
  • no comma, no “”

Q2: my fgetcsv - fgetcsv($handle, 1000, “,”,‘"’) - does not catch the error for line 4, even though Sarah is not enclosed with “” , it considers the row not error and proceed as normal.
Why is it? I need to force the each value in csv to have enclosure “”

Thanks for your help in advance.

I suppose you could use something other than fgetcsv()

But I’m wondering if it might work if you passed the fourth argument as NULL? eg.

while (($line = fgetcsv($handle, 1000, ",",NULL)) !== FALSE) 

It’s late for me so I don’t have time to try it right now, but if it works, it should over-ride the default value for the enclosure, and if you’re lucky it will pass everythiing between the , delimiter to the returned array.

Umm

while (($line = fgetcsv($handle, 1000, ",",NULL)) !== FALSE) 

does not work, it says

fgetcsv(): enclosure must be a character

I tried to use fgets() and str_getcsv() so I can display the row as it is with fgets() and still do csv with str_getcsv() like

$handle = fopen($importFile, "r");

while(! feof($handle))
{
    $row = fgets($handle);

    print_r(str_getcsv($row,",",'"'));
}

but str_getcsv just dies,no output. What did I do wrong?

I’ll appreciate any help

I’m busy right now, but as soon as I get a chance I’ll do some testing to see if I can come up with something.

Hi,

Your orignal problem with not detecting the missing " is because a csv doesn’t need them to be correct. They are there to make sure that you can use the delimiter (,) in the values itself ect…

If you want to have this functionality you need to write your own function.
Something like this:


function csvParser($handle, $delimiter = ',', $enclosure = '"', $escape = '\\\\')
{
    $headers = str_getcsv(fgets($handle), $delimiter, $enclosure, $escape);
    $need    = count($headers);
    $array   = array();

    while(!feof($handle)) {

        $row = fgets($handle);
        if(preg_match("/(?(?<!$enclosure),|,(?!$enclosure))/", $row))
            die("enclosing error: $row");

        $row = str_getcsv($row, $delimiter, $enclosure, $escape);
        if(count($row) != $need)
            die("count error: count($row) != $need");

        foreach ($headers as $pos => $key) {
            $named[$key] = $row[$pos];
        }

        $array[] = $named;
    }

    return $array;
}

$importFile = 'test.csv';
$handle = fopen($importFile, "r");

print_r(csvParser($handle));

this will die on the failures you want.
you can change the function to returning false or what ever. This is just an example that will give you an array filled with assoc arrays for easy access.

Just something to show you the right way.

it is only tested on PHP 5.3

Thanks, yes fgetcsv is only available on php 5.3 is the problem. Our server is 5.2 and there is no plan for upgrading to 5.3 Probably I need to write something my own… sigh

I do think something like fristi posted where testing the validity of the CSV file’s structure is better. That is, I think testing the file and using the file would be better done separately. But if you’re limited I’m sure you can hack something together even if it’s not robust, and it should work for your needs if not others when things change.

I’ll take your word for it that passing NULL doesn’t work. But hopefully you can use another character instead of the default enclosures.

I don’t know what kind of structural errors you need to worry about, but I threw together this

lmsook.csv

"valid row","12345","Jane","Doe","jd@test.com"
"missing enclosures","23232",Sarah,"Smith","ss@test.com"
"missing field","99999","John","js@test.com"
"empty field","98765","Dave",,"ds@test.com"
"missing delimiter","56745","Chip""Whitley","cw@test.com"
"wrong enclosures",'34985',"Santa","Claus","sc@test.com"
"wrong delimiter","39581","Luke","Skywalker";"ls@test.com"

and this using “^”
lmsook.php

<?php
error_reporting(E_ALL);
ini_set('display_errors', true);

$handle = fopen('lmsook.csv', "r");
while (($line = fgetcsv($handle, 1000, ",", "^")) !== FALSE) 
{
    $num = count($line);

    for ($c=0; $c < $num; $c++) 
    {
        echo $c . ': ';
        var_dump($line[$c]);
        echo '<br />';
    }
    echo '<hr />';
} 
?>

You could use count() to catch most of the errors, and for those that pass you could trim() the "s and then use the values.

I was sitting in the car and I realized that my function is not working in some cases…
if the comma is in the beginning or at the end of the string between the enclosures it will mark that that as an error, which is not good… the cases are not happening alot, but as a programmer you can not take that risk.
In the end I think it needs more complex processing to detect the places and numbers of $enclosers and $delimiters.

I haven’t had time to come up with something, if I do, I’ll let you know.

Just a heads up that the function I posted has bugs.

My last attempt at this :slight_smile:


class CsvReader
{
    private $_handle;
    private $_delimiter;
    private $_enclosure;
    private $_escape;
    private $_pattern;

    /*
     * @method Constructor
     * @param  resource-handle  $handle  a file handle opend with fopen
     * @param  string  $delimiter  optional delimiter of the csv fields
     * @param  string  $enclosure  optional enclosure of the csv fields
     * @param  string  $escape  optional escape char for the $enclosure
     */
    public function __construct($handle, $delimiter = ',', $enclosure = '"', $escape = '\\\\')
    {
        $this->_handle    = $handle;
        $this->_delimiter = $delimiter;
        $this->_enclosure = $enclosure;
        $this->_escape    = $escape;
        $this->_pattern   = '/(?<!' . preg_quote($escape) .')'.
                            preg_quote($enclosure) .
                            preg_quote($delimiter) .
                            preg_quote($enclosure) . './';
    }

    /*
     * @method getHeader
     * @param  bool  $rewind  after getting header rewind the handle or not
     * @return array returns an array containing the column names
     */
    public function getHeader($rewind = false)
    {
        rewind($this->_handle);
        $first = rtrim(fgets($this->_handle));

        $headers = (strpos($this->_enclosure, $first) === false) ?
                   explode($this->_delimiter, $first) :
                   preg_split($this->_pattern,$first) ;

        if ($rewind !== false) rewind($this->_handle);
        return $headers;
    }

    /*
     * @method checkSyntax
     * @return NULL|array returns NULL on success and an array containing the error rows on failure
     */
    public function checkSyntax()
    {
        $headers = $this->getHeader();
        $need    = count($headers);
        $err     = array();

        $nr = 1;
        while(!feof($this->_handle)) {

            $row    = rtrim(fgets($this->_handle));
            $pieces = preg_split(
                $this->_pattern,
                $row,
                -1,
                PREG_SPLIT_NO_EMPTY
            );

            if (count($pieces) != $need)
                $err[$nr] = $row;
            $nr++;
        }
        return (count($err) === 0) ?
                NULL : $err;
    }


    /*
     * @method checkSyntax
     * @param bool $check requires a syntaxcheck before reading?
     * @return array returns an array in thr form of $arr[$rowNr][$fieldname]
     */
    function csvToArray($check = true)
    {
        $errs = ($check == true) ?
                $this->checkSyntax() :
                NULL;

        if (!is_null($errs)) {
            $errStr = '';
            foreach ($errs as $nr => $err) {
                $errStr .= "$nr: $err<br/>\
";
            }
            throw new Exception("CSV file error:<br/>\
$errStr");
        }

        $headers = $this->getHeader();
        $amount  = count($headers);

        while ($data = fgetcsv($this->_handle, 0, $this->_delimiter, $this->_enclosure)) {
            $named = array();
            foreach ($headers as $pos => $key) {
                $named[$key] = $data[$pos];
            }
            $arr[] = $named;
        }

        return $arr;
    }
}

example:


$importFile = 'test.csv';
$handle = fopen($importFile, "r");
$reader = new CsvReader($handle);
try {
    $fetched = $reader->csvToArray();
} catch(Exception $e) {
  echo 'Message: ' . $e->getMessage();
  exit;
}
print_r($fetched);

I like that class a lot. You have good coding style.

I found only one problem. CSV files typically use newlines (\r,
, or \r
) as line terminators. The class finds the last empty newline as an error because it doesn’t equal $need as determined from the count($headers).

I don’t know how you would choose to address this but something like

public function checkSyntax()
.....

            if ( (count($pieces) != $need) && ($row != NULL) )
                $err[$nr] = $row;
            $nr++;

seems to work OK.

And the only thing that could be improved (for lazy people like me) is for the script to tell me what the error is so I don’t need to figure it out for myself :wink:

Thanks :slight_smile: Most of that style I picked up while coding Perl. I have come a long way to get this, and still looking for improvements. At this point I started to read Perl Best Practices to do so. With Perl you can do things in so many ways that if you don’t learn yourself a good style, the code just becomes unreadable…

Great! I didn’t think of that, in my example CSV I didn’t have a new line at the end of the last line so I didn’t encounter that problem. You’re fix seems ok to me, it’s just that I’m surprised that NULL seems to work because $row is the result of an rtrim function which should always return a string, so in case of the last line it should be an ampty string. Maybe it works because it doesn’t use the strict evaluation (!==)

I wanted to edit my class in my previous post to add your fix, but it seems I can not anymore… meh the OP just has to do it himself :wink:

The problem in there lies that if you want more info, then the checking needs to be done on a much more complex level, walking through the string and evaluating the order of chars or using some very complex regexes…
I thought it was overkill for something simple like this.

Maybe when I’m bored I will give it a shot. Or you are always welcome to try :wink:
didn’t put a copywrite on it so it’s free to be modified!

anyway thanks for your response, I appriciate it alot! it doesn’t seem like I wasted my evening then :wink:

:d’oh: I didn’t see the rtrim. I looked in the docs and it said “returns a NULL array” so I tried a strict equal, which failed, var_dump()ed and saw the empty string but instead of tracking it to the source tried the loose comparison which worked.

Off Topic:

I confess I need some sleep pretty badly, I’ve had a rough couple of days/nights, but I hate to admit it, and am good at fooling myself until such blunders pop up.

Anyway, as that’s the case than a not empty would be better

.....
            if ( (count($pieces) != $need) && !empty($row) )
                $err[$nr] = $row;
            $nr++;

And yes, returning the error type would add considerably to the complexity. Probably not worth it unless this is going to be used on an awful lot of files and very often. Even then it’s not so bad if someone needs to look at the line to determine the error. If someone is working with that many files the correct line format would be so ingrained as to almost ensure spontaneous recognition of errors.

Off Topic:

Yes, AFAIK there’s a 10 minute to edit window. This is to prevent someone from making a good post, getting by the moderators, and then coming back later adding in SPAM with less likelihood of getting caught. Frustrating for honest members and just another case of SPAMmers ruining it for the rest of us.