Troubleshooting LOAD DATA INFILE

Hi,

I post this to the Database forum as I suspect that it has something to do with MySQL or the SQL written?

I am using PDO as a driver and have been trying to troubleshoot why I can´t get “LOAD DATA INFILE” working.

I have paired it down into a very simple example and can even get that working. I try output errors using

echo "\
PDOStatement::errorCode(): ";
print $this->o_Db->errorCode();

which generates a rather generic

PDOStatement::errorCode(): 42000
Error 42000 seems most often to be associated with invalid SQL but I have been very careful with the paths and the sql syntax, plus I escape (using ticks) all db field names.

It is setup like this:


$csv_path_and_filename = '/var/www/eprs/libs/page_queries/test_data.csv';
$temp_table_name = $o_RandomString->getRandomString('__temp_'); // Generates a temporary table name like __temp_rjxrrhh

$sql=<<<EOL
         LOAD DATA INFILE "$csv_path_and_filename"
         REPLACE INTO TABLE $temp_table_name
         FIELDS TERMINATED BY ';'
         IGNORE LINES 1
         ( `date`, `name`)
EOL;
$result = $this->o_Db->exec($sql);
echo "\
PDOStatement::errorCode(): ";
print $o_Db->errorCode();

You may notice that I escaped the database field names using ticks as ‘date’ is a reserved word in SQL. Also I did try the SQL using OPTIONALLY ENCLOSED BY “\”" LINES TERMINATED BY “\\r\
" and ENCLOSED BY “\”” LINES TERMINATED BY "\
" as well as the absence of these lines as shown above.

The database create function:


function createTempTable($temp_table_name){ 
        $sql = "
            CREATE TABLE $temp_table_name (
              date DATE
              , name VARCHAR(150)
             ) TYPE=innodb;
      $stmt = prepare($sql);
      $results = $o_Db->exec(($sql);
      return $results;
}

My CSV contains two data rows plus a ‘title’ row

Date;Name
05/05/07;Bill
05/05/09;Mike

I have verified that the ‘/var/www/eprs/libs/page_queries/test_data.csv’ file is located in that specified path on the server. I have also set the file and folder permissions to 0755 so that permissions should not be causing the problem

It is a MySQL 5.0.51a-24 version and you can see in the create statement that it is a INNODB table. The server is Centos Linux

Do you have any suggestions on what else to try?

I would appreciate your help on this.

Regards,
Steve

i think you will want to debug the LOAD DATA statement in isolation first, before placing it inside your php code

you will want to ensure that all the options and features, such as LINES TERMINATED BY, are properly set up

one thing i noticed is that you might have trouble with your date values if they are not in year-month-day sequence

LOAD DATA INFILE should have IGNORE 1 LINES not IGNORE LINES 1.

When inside Excel the date was set as 2012-11-31 but on save as it must of switched this; thank you for noticing :slight_smile: I can force it back using date() in php. I did try to run this using mysql workbench and via the command line; however I am not sure if I was specifying the paths correctly in each of these as I was using and absolute path?

Thanks!

I’ll post all working code once I get there.

Regards,
Steve

Ok,

Both suggestions were needed to get the files importing. Interestingly in MySQL LOAD DATA INFILE documentation they sow examples using ‘IGNORE LINES 1’ but it does not work when I use that. The dates however, are still not working correctly.

Rudy, I did as you suggested and isolated the SQL out of PHP, here is what it is now:

  LOAD DATA INFILE "/var/www/ex/libs/page_queries/test_data2.csv"
  REPLACE INTO TABLE `__temp_ovgoptjint`
  FIELDS TERMINATED BY ';'
  IGNORE 1 LINES
  (@date, `name`)
  SET Date = date_format(STR_TO_DATE(@date, '%b %m, %Y'), '%Y-%m-%-d');

The actual dates that need to be working are formatted as in the CSV file contents below:


Date;Name
Sep 14, 2009;Bill
Sep 18, 2009;Mike

When I do this it inserts the dates correctly:

  LOAD DATA INFILE "/var/www/ex/libs/page_queries/test_data2.csv"
  REPLACE INTO TABLE `__temp_ovgoptjint`
  FIELDS TERMINATED BY ';'
  IGNORE 1 LINES
  (`date`, `name`)

While having this data


Date;Name
2009-09-14;Bill
2009-09-18;Mike

When using the SET command my date column contains only NULL


SET Date = date_format(STR_TO_DATE(@date, '%b %m, %Y'), '%Y-%m-%-d');

The way I read the line above is:

  1. We are setting the Date column with the variable @date as the variable value being operated upon
  2. The str_to_date() method should use the value of the date variable that is formatted the same way as the CSV files 'abbreviated month + space + day (as digits) + full month (as digits)
  3. the date_format() method should return a valid date value formatted in 2012-05-30.

I also tried just str_to_date() like:


SET Date = STR_TO_DATE(@date, %Y-%m-%-d');

This, however would not allow the statement to execute.

Any further ideas on how I might get this date formatted correctly?

Thanks,
Steve

Are you sure about this:

SET Date = STR_TO_DATE(@date, %Y-%m-%-d’);

You have a - before d and you left out the starting ’ for the format string.

SET Date = STR_TO_DATE(@date, ‘%Y-%m-%d’);

Yes, sorry that was a typo… I know longer had it in the code to copy so I re-wrote it :frowning: but I am sure that it does not work.

I can’t mess around any longer with this so I decided to set the DATE field to VARCHAR instead. Before you have a Kiniption Fit, I’m importing this data into a temporary db. When I need to insert the DATE into permanent tables I will process the string date using PHP’s date formatting and then insert it into a date field. This has the added benefit; if a CSV file that is being imported contains different date formats it would be very difficult to get the SQL to adapt, so In this case I can have more control on how dates are processed and still import a lot of records very quickly.

Regards,
Steve

that’s an awesome post, steve

that’s a strategy i’ve advocated before, a “landing” table for uploads using all VARCHARs as necessary, and reformating to permanent tables once the data is completely analyzed

it’s step 2 in ETL (extract, transform, load), assuming that “load” means loading cleansed data into permanent tables

no reason why the transform step shouldn’t happen within the database

Thanks Rudy :slight_smile:

I can see the value of this strategy, although this is the first time I’ve used such a thing.

I am however a little disappointed that I couldn’t get it working with the date filtering in the SQL, but the shear nature that I have to be so specific means that it can break easily if a different date format is thrown at it. So, your advocation for this strategy makes a lot of sense… just wish I had thought of it earlier :rolleyes:

Hi,

not sure why this is not working for you. I have run a test here and I post the code below:

$ cat /tmp/demo.txt
Date;Name
2009-09-14;Bill
2009-09-18;Mike

$ cat demo.sql
LOAD DATA INFILE “/tmp/demo.txt”
REPLACE INTO TABLE datetest
FIELDS TERMINATED BY ‘;’
IGNORE 1 LINES
(@date, name)
SET Date = STR_TO_DATE(@date, ‘%Y-%m-%d’);

mysql> create table datetest (date date, name varchar(45));
Query OK, 0 rows affected (0.03 sec)

mysql> source demo.sql
Query OK, 2 rows affected (0.00 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0

mysql> select * from datetest;
±-----------±-----+
| date | name |
±-----------±-----+
| 2009-09-14 | Bill |
| 2009-09-18 | Mike |
±-----------±-----+
2 rows in set (0.00 sec)

Hi,

Try this CSV instead. I could import the csv/demo.txt you have it is when I tried this:



Date;Name
Sep 14, 2009;Bill 
[COLOR=#464646][/COLOR]Sep 18, 2009;Mike[COLOR=#464646]
[/COLOR]

that it didn’t work. The 2009-09-14 in your code is already in a format MySQL will accept so there is no need to convert it.

Thanks for trying though :slight_smile:

Regards,
Steve

Hi Steve,

this should now work. I thought you were looking at the 2009-09-14 date format.

$ cat /tmp/demo.txt
Date;Name
Sep 14, 2009;Bill
Sep 18, 2009;Mike

$ cat demo.sql
LOAD DATA INFILE “/tmp/demo.txt”
REPLACE INTO TABLE datetest
FIELDS TERMINATED BY ‘;’
IGNORE 1 LINES
(@date, name)
SET Date = STR_TO_DATE(@date, ‘%b %d, %Y’);

mysql> truncate table datetest;
Query OK, 0 rows affected (0.00 sec)

mysql> source demo.sql
Query OK, 2 rows affected (0.00 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0

mysql> select * from datetest;
±-----------±------+
| date | name |
±-----------±------+
| 2009-09-14 | Bill |
| 2009-09-18 | Mike |
±-----------±------+
2 rows in set (0.00 sec)