Reading csv. How to convert array from numeric keys?


$keys = array('name', 'address');

$handle = fopen("data.csv", "r");

    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {

// ?

   }

What is the most efficient way of turning a sample data.csv file such as this:


name, address
"C Stunt", "Letsby Avenue"
"Ishood Coco", "Arfway house"

into an associative array such as


$arr[0]['name'] = "C Stunt" ;
$arr[0]['address'] = "Letsby Avenue" ;
$arr[1]['name'] = "Ishood Coco" ;
$arr[1]['address'] = "Arfway house" ;

Naturally the real csv file has many more columns than this, I just wondered if there was an easy way of doing it, rather than re-assigning the values by hand?

… and when it bites you on the *rse don’t come running to me… :slight_smile:

Y’see - that’s why I know I will never, ever be a ninja like you - just as I know I will never play for England nor become a famous Hollywood actor.

Thank the lord for ninjas, soccer players and actors though.

I admire your ‘JFDI fu’ Mr.Geraghty, good luck and godspeed. :wink:

Off Topic:

Given the location of the file you mentioned off-board, would YQL be an option to filter the results?

Sometimes there is a clash between SRP, JFDI and the clock and sometimes the wrong one wins, and I know it and I feel bad but at least I know why and I don’t care - so shoot me.

I have never needed to do this to a csv file before; in the unlikely event that I ever do, I will go back and refactor.

(just seeing that makes me more inclined to go back NOW… the force … is strong … I feel weak …I can resist…)

I mean I just know, know, know in my bones that I should be unit testing this too, but it has to get out the door and onto a website like, yesterday.

ps Thanks to all for your input on this one by the way, really interesting list of solutions.

I still think additional filtering should be separate from the object that composes the associative entry.

This particular method (or rather any method) should ‘do one thing and one thing only’ to quote Uncle Bob. In this case, it’s determining whether or not the entry is acceptable for the associative array.

It’s a code smell, definitely.

How would you change that filter? You have to alter the object internals, and well, you know how that goes.

Create a filter for each filter, which sounds kinda obvious when you say it like that, from an OO POV.
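As a rough sketch of "a filter for each filter": each rule gets its own small filter, and the filters are stacked by wrapping one around the other. CallbackFilterIterator (PHP 5.4+) is used here only to keep the example short; a FilterIterator subclass per rule would do the same job. The rows and the 'code' column are made up for illustration.

```php
<?php
// Made-up associative rows standing in for the iterator's output
$rows = new ArrayIterator(array(
    array('name' => 'C Stunt',     'address' => 'Letsby Avenue', 'code' => 'ABC'),
    array('name' => 'Ishood Coco', 'address' => 'Arfway house',  'code' => 'XYZ'),
    array('name' => '',            'address' => 'Nowhere',       'code' => 'ABC'),
));

// Rule 1: keep rows whose (hypothetical) 'code' column is ABC
$withCode = new CallbackFilterIterator($rows, function ($row) {
    return $row['code'] === 'ABC';
});

// Rule 2: keep rows with a non-empty name; wraps the first filter
$withName = new CallbackFilterIterator($withCode, function ($row) {
    return $row['name'] !== '';
});

// false = do not preserve keys, so the result is re-indexed from 0
$result = iterator_to_array($withName, false);
```

Changing or dropping a rule then means touching one small filter, not the object that builds the associative rows.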

Don’t make me crack open the SRP wiki link. :smiley:

In order to filter the original csv file, I used Salathe’s idea and am mucking about in the accept() method.

This works as expected, but looks wrong;


    public function accept()
    {

    $item = $this->getInnerIterator();
    $i = $item->current() ;

    if( $this->_count === count(parent::current()) && $i[14] === "ABC" )
        return true;


    }

I’ll probably abstract away the hard coded match between item 14 and ABC, but am I accessing the 15th array item in the best way, or is there something obvious I missed?
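One possible way to abstract the hard-coded match, as a sketch only: pass the column index and the expected value to the filter's constructor. ColumnEqualsFilter and its parameter names are made up for illustration, and the ArrayIterator stands in for the parsed CSV rows. Inside accept(), parent::current() already holds the inner iterator's current row, so the detour through getInnerIterator() should not be needed.

```php
<?php
// Hypothetical filter: accepts rows whose field at $index equals $expected.
class ColumnEqualsFilter extends FilterIterator
{
    private $index;
    private $expected;

    public function __construct(Iterator $rows, $index, $expected)
    {
        parent::__construct($rows);
        $this->index    = $index;
        $this->expected = $expected;
    }

    public function accept(): bool
    {
        // The row currently held by the wrapped iterator
        $row = parent::current();

        // Rows that are too short to have the field are rejected
        return is_array($row)
            && isset($row[$this->index])
            && $row[$this->index] === $this->expected;
    }
}

// Made-up rows standing in for the parsed CSV lines
$rows = new ArrayIterator(array(
    array('C Stunt', 'Letsby Avenue', 'ABC'),
    array('Ishood Coco', 'Arfway house', 'XYZ'),
    array('short row'),
));

$matches = iterator_to_array(new ColumnEqualsFilter($rows, 2, 'ABC'), false);
```

For the real file the call would presumably look like new ColumnEqualsFilter($rows, 14, 'ABC').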

There’s nothing stopping the class from being rewritten to accept an SplFileObject in the constructor, but then you’d have to keep track of whether we’re looking at a CSV file and behave accordingly since pretty much everything else in there is tailored towards this particular, individual use case. That’s a detail for Cups to ponder, to see which he prefers.

I can be less awesome if you’d prefer. (:

Thanks Salathe!

I was thinking of doing something similar, but didn’t like the idea of creating the CSV and flags internally. However, the combination of the LimitIterator and FilterIterator is something I hadn’t considered.

I like that a lot.

Thanks again.

Anthony.

ps. If you weren’t so awesome, I think you’d be my current nemesis.

For what it’s worth, a minimalist version of Anthony’s AssociativeSplCsvFileObjectIterator might be of use. It uses a FilterIterator (to only accept rows with count(headers) fields), which extends IteratorIterator (which handles the iteration of our SplFileObject without us having to redefine the basic iteration methods).

(The class name is just poking fun, probably best not to keep it for yourself.)


class MinimalistFilterIteratorVersionOfAnthonysAssociativeSplCsvFileObjectIterator extends FilterIterator
{
    protected $_headers, $_count;
    public function __construct($path)
    {
        // Build CSV file iterator
        $csv = new SplFileObject($path, 'r');
        $csv->setFlags(SplFileObject::READ_CSV);
        // Remember column names and count
        $this->_headers = $csv->current();
        $this->_count   = count($this->_headers);
        // LimitIterator allows us to always skip the headers
        parent::__construct(new LimitIterator($csv, 1));
    }
    // Returns a nice assoc. array
    public function current()
    {
        return array_combine($this->_headers, parent::current());
    }
    // Skip this line if it does not have $_count number of fields
    public function accept()
    {
        return $this->_count === count(parent::current());
    }
}

$csv = new MinimalistFilterIteratorVersionOfAnthonysAssociativeSplCsvFileObjectIterator('data.csv');
foreach ($csv as $row) {
    print_r($row);
}


When I first looked in the fgetcsv manual I found this old project

http://code.google.com/p/parsecsv-for-php/

And did not really read down the page any further, but revisiting it I just noticed that there are some refs to the SPL, but they just use fgetcsv rather than an SplFileObject.

It would have been interesting if there were some test cases for all this stuff.

That parsecsv class seems to want to emulate really simple sql statements, and may have been born from frustrations with formatting and encoding - and just seemed overkill to me.

I thought I’d mention it in case anyone else was searching on this subject.

Whoa, 3 very different answers there, thanks people.

@Anthony I was hoping there was an SPL-based solution after seeing your pastebin-housed SPL experiment a few weeks ago.

The initial target block of code you set works fine and seems to handle empty values as expected.

As I have a series of operations to do, that target code will allow me to quickly extract the initial values from a large csv file, to create the smaller one.

I might as well explain the whole thing.

1 I wget a largish csv file nightly ( ~4k rows) and cache it.

2 I extract only those rows of interest to me, around 100 rows.

3 Some of those text fields will then be checked for “transformations”, i.e. turning email into <a></a> links etc, and squirted into a template.

4 The result is then duly cached for 24 hrs.

So you see, between steps 2 and 3 it’ll make much more sense for me, or anyone else using the data to be able to reference the array-elements by name.
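To illustrate why named keys help at step 3, here is a minimal sketch of one such transformation. linkify_email() is a hypothetical helper and the 'email' column is an assumption about the real file; the actual transformations may well differ.

```php
<?php
// Hypothetical helper for step 3: wrap a valid email address in a mailto link.
function linkify_email(string $value): string
{
    // Escape first so the value is safe to drop into an HTML template
    $escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    // Anything that is not a valid address is returned escaped but unlinked
    if (filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
        return $escaped;
    }

    return '<a href="mailto:' . $escaped . '">' . $escaped . '</a>';
}

// Made-up row; the 'email' column is an assumption about the real file
$row = array('name' => 'C Stunt', 'email' => 'cstunt@example.com');
$row['email'] = linkify_email($row['email']);
```

With numeric keys the same line would have to read something like $row[7] = linkify_email($row[7]), which is much harder to follow.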

Regarding the filtering, see FilterIterator, use it to wrap AssociativeSplCsvFileObjectIterator and only return the rows meeting the criteria you set within FilterIterator::accept().

Other than that, AssociativeSplCsvFileObjectIterator still needs some refactoring but the premise is there.

So far, I have.


$csv = new SplFileObject('data.csv');
$csv->setFlags(SplFileObject::READ_CSV);
foreach(new AssociativeSplCsvFileObjectIterator($csv) as $line => $entry){
  foreach($entry as $key => $value){
    printf('(%d) %s => %s<br />', $line, $key, $value);
  }
}

/*
  (0) name => C Stunt
  (0) address => Letsby Avenue
  (1) name => Ishood Coco
  (1) address => Arfway house
*/


class AssociativeSplCsvFileObjectIterator implements Iterator
{
  protected
    $csv      = null,
    $keys     = array(),
    $position = 0;
  
  public function __construct(SplFileObject $csv){
    $this->csv = $csv;
    $this->setKeys();
  }
  
  public function rewind(){
    $this->setKeys();
    $this->position = 0;
  }
  
  public function current(){
    return array_combine($this->keys, $this->csv->current());
  }
  
  public function key(){
    return $this->position;
  }
  
  public function next(){
    $this->csv->next();
    $this->position++;
  }
  
  public function valid(){
    return $this->csv->valid();
  }
  
  protected function setKeys(){
    $this->csv->rewind();
    $this->keys = array_values($this->csv->current());
    $this->csv->next();
  }
}

I can’t say I’m happy with the Iterator yet though.

Right, I’m on a mission to SPL’ize this, for reference, I need to recreate the following.


$file = new SplFileObject('data.csv');
$file->setFlags(SplFileObject::READ_CSV);

$keys = null;

foreach($file as $line => $entry){
  if(null === $keys){
    $keys = array_values($entry);
    continue;
  }
  foreach($entry as $key => $value){
    printf('(%d) %s => %s<br />', $line, $keys[$key], $value);
  }
}

/*
  (1) name => C Stunt
  (1) address => Letsby Avenue
  (2) name => Ishood Coco
  (2) address => Arfway house
*/

I’d extend or implement the SPL Iterator interface, assign the keys on the first iteration and return the set key from Iterator::key() on subsequent requests.

You can then use this with the standard SPL File object (which reads CSVs) and read the data as you go.

Very memory efficient.
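A lighter-weight alternative to a hand-written Iterator, assuming PHP 7.0+ for the return type: a generator that takes the keys from the first row and yields one associative row at a time, so only the current line is held in memory. This is a different technique from implementing the Iterator interface, shown only as a sketch. assoc_rows() is a made-up name, and SplTempFileObject stands in for new SplFileObject('data.csv').

```php
<?php
// Hypothetical generator: yields one associative row at a time.
function assoc_rows(SplFileObject $file): Generator
{
    $file->setFlags(SplFileObject::READ_CSV);
    $file->setCsvControl(',', '"', '\\');

    $keys = null;
    foreach ($file as $entry) {
        // Guard against anything that is not a parsed row
        if (!is_array($entry)) {
            continue;
        }
        if ($keys === null) {
            $keys = $entry; // the first row supplies the keys
            continue;
        }
        // A blank line may come back as a one-field row; skip any row
        // whose field count does not match the header count
        if (count($entry) === count($keys)) {
            yield array_combine($keys, $entry);
        }
    }
}

// In-memory file standing in for data.csv
$file = new SplTempFileObject();
$file->fwrite("name,address\n\"C Stunt\",\"Letsby Avenue\"\n\"Ishood Coco\",\"Arfway house\"\n");

$rows = iterator_to_array(assoc_rows($file), false);
```

In real use you would foreach over assoc_rows($file) directly instead of collecting everything with iterator_to_array(), which is only here to make the result easy to inspect.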

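$handle   = fopen("data.csv", "r");
$count    = 0;
$headers  = array();
$arr      = array();

while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)
{
	// The first row holds the column names. Testing $count == 0 here would
	// never move past the header, because continue skips the increment.
	if (empty($headers))
	{
		$headers = $data;

		continue;
	}

	for ($i = 0; $i < count($headers); ++$i)
	{
		$arr[$count][$headers[$i]] = $data[$i];
	}

	++$count;
}

fclose($handle);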

?

$row = array_combine($headers, $data);

P.S. Unless you really know and trust your CSV file, it’s probably worth checking that $data has the same number of fields as there are headers before combining.
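Putting the array_combine() suggestion and the length check together, a minimal sketch might look like this. csv_to_assoc() is a made-up helper name, and the in-memory stream stands in for data.csv so the example runs on its own.

```php
<?php
// Hypothetical helper: reads CSV rows from an open handle into a list of
// associative arrays keyed by the header row.
function csv_to_assoc($handle): array
{
    $headers = null;
    $rows    = array();

    // Length 0 means no line-length limit; delimiter, enclosure and escape
    // are passed explicitly
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($headers === null) {
            // First row: trim in case the header has spaces after the commas
            $headers = array_map('trim', $data);
            continue;
        }
        // Skip blank or malformed rows rather than let array_combine() fail
        if (count($data) !== count($headers)) {
            continue;
        }
        $rows[] = array_combine($headers, $data);
    }

    return $rows;
}

// In-memory stream standing in for fopen("data.csv", "r")
$handle = fopen('php://memory', 'w+');
fwrite($handle, "name,address\n\"C Stunt\",\"Letsby Avenue\"\n\"Ishood Coco\",\"Arfway house\"\nbroken row\n");
rewind($handle);

$arr = csv_to_assoc($handle);
fclose($handle);
```

Silently skipping a bad row is just one choice; logging it or throwing an exception may suit the nightly job better.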