Using PHP for parsing

Dear all,

I have a text file, opened in Notepad, which is a log downloaded from a mobile application. There are two important kinds of data I wish to extract and separate from the log files. Could someone please give me a clue on how PHP can help parse these log files so I can separate the various data into a table? I have attached a sample file for you to look at.
The two kinds of data that need to be separated are the $GPGGA and $GPRMC entries. The real file is about 300 KB, but I had to reduce it, so the sample is for explanatory purposes only. I have done a similar thing in Excel and VBScript, but Excel is slow and cannot take more than about 30 thousand rows.

Thanks for your assistance in advance.

Paul_paisley

After extracting, I would like to compare the rows of data extracted between files. I do not know the best and easiest way to handle this; I mean automating the whole process.

Up to this point - no problem.

You can upload as many files as you like (within the fixed limitations of server disk space etc.) using an HTML form to select the files you want to upload.

You can then extract lines of data from the uploaded files as described earlier in this thread.

Now I am getting a little confused about the above quote.

It would help if you could provide a more detailed description of how you want to compare “rows of data extracted between files”.

Unless you want to do something really unusual I’m pretty sure it could be done in PHP.

After the data is compared/filtered, it should be straightforward to put it into a database.

Just a quick pointer (no pun intended)


<?php
error_reporting(-1);
ini_set('display_errors', true);

$file = fopen('sample.txt', 'r');

for($line = 1; false === feof($file); $line++){
  
  $entry = fgets($file);
  
  if(in_array(substr($entry, 0, 6), array('$GPGGA', '$GPRMC'))){
    #the entry begins with $GPGGA or $GPRMC
  }
  
}

fclose($file);
?>

Below is an example of how the data is formatted:

$GPRMC,040142,A,6009.3358,N,02444.6373,E,1.62,325,190810,
$GPRMC,040154,A,6009.3333,N,02444.6380,E,0.00,0,190810,
$GPRMC,040206,A,6009.3166,N,02444.6459,E,0.00,0,190810,

The entries in the raw text file beginning with $GPRMC and $GPGGA are the ones I am interested in, i.e. the highlighted ones. I want to separate the $GPRMC entries from the $GPGGA entries in the raw text file. Once they are separated, I can move on to compare the two sets of entries, which would now be in separate text files, with the same entries in another text file whose data collection mode is different. The relationship is this: the format given above is an example of the entries collected with one data collection mode, while the raw text file contains two kinds of entries collected with two data collection modes. If you don’t understand me, please ask me to clarify.

Thanks for your assistance.

paul
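
For what it's worth, the separation described in the post above could be sketched roughly like this. It is only a minimal sketch: the function name `splitNmeaSentences()` and the `_gpgga.txt` / `_gprmc.txt` output names are made up for illustration and do not come from anyone's code in this thread.

```php
<?php
// Hypothetical helper (not from the thread): copy every $GPGGA line and every
// $GPRMC line of $inPath into its own output file inside $outDir.
// The "<name>_gpgga.txt" / "<name>_gprmc.txt" naming scheme is an assumption.
function splitNmeaSentences(string $inPath, string $outDir): array
{
    $suffixes = ['$GPGGA' => 'gpgga', '$GPRMC' => 'gprmc'];
    $base = basename($inPath, '.txt');

    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open input file: $inPath");
    }

    // One output handle and one line counter per sentence type
    $handles = [];
    $counts = [];
    foreach ($suffixes as $type => $suffix) {
        $handles[$type] = fopen($outDir . '/' . $base . '_' . $suffix . '.txt', 'w');
        $counts[$type] = 0;
    }

    while (($line = fgets($in)) !== false) {
        // The sentence type is everything before the first comma
        $type = trim(explode(',', $line, 2)[0]);
        if (isset($handles[$type])) {
            fwrite($handles[$type], rtrim($line, "\r\n") . PHP_EOL);
            $counts[$type]++;
        }
    }

    fclose($in);
    foreach ($handles as $handle) {
        fclose($handle);
    }

    return $counts; // e.g. how many lines of each type were written
}
```

Calling `splitNmeaSentences('sample.txt', '.')` would then leave `sample_gpgga.txt` and `sample_gprmc.txt` next to the script, assuming the directory is writable.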

…and it’s good to have you back. :wink:

An alternative to Anthony’s quick sample, which abstracts away manually opening and looping over the file (checking for EOF) and handily separates out the CSV fields into an array for each line, might look something like the following.

<?php

$file = new SplFileObject('sample.txt');

// Read the file as a CSV-formatted file
$file->setFlags(SplFileObject::READ_CSV);

// Scan over each line for the ones that we want
foreach ($file as $line) {
    if (in_array($line[0], array('$GPGGA', '$GPRMC'))) {
        // Do whatever you want here
        var_dump($line);
    }
}

P.S. It’s nice to be back, I missed Sitepoint. (:

Many thanks, and sorry for the late reply. It is the same issue as in the other thread. I have been struggling with this and I am glad you have come to my aid.


Yes - I think we are essentially doing the same thing in 2 different ways. The demo code I posted is the way I am used to handling files.

I am looping through my $inFiles array and for each input file I am creating an output file, e.g. for sample_1.txt I am creating an output file called sample_1_out.txt.

Then I read one line at a time from sample_1.txt and tokenize it using explode(). If the first element in the $lineTokens array matches any of the elements in the $searchCriteria array, I write the whole current line read from sample_1.txt to sample_1_out.txt.

The foreach loop repeats the above for each file in the $inFiles array.


Your code should be modifiable to output the same thing as my code does using your OOP structure. Essentially you just need to wrap your code that handles 1 file in a foreach loop to handle multiple files, as in my demo.
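
As a rough illustration of that wrapping, the SplFileObject version might look something like the sketch below. The function name `extractWithSpl()` is hypothetical, and writing the fields back with implode() is my own assumption about how to reproduce the original line; it has not been run against the real log files.

```php
<?php
// Hypothetical wrapper (not from the thread): run the SplFileObject/READ_CSV
// approach over several input files, writing matches to "<name>_out.txt".
function extractWithSpl(array $inFiles, array $searchCriteria): array
{
    $outFiles = [];

    foreach ($inFiles as $inFileName) {
        $file = new SplFileObject($inFileName);
        $file->setFlags(SplFileObject::READ_CSV);
        // Pass the CSV settings explicitly rather than relying on defaults
        $file->setCsvControl(',', '"', '\\');

        $outName = dirname($inFileName) . '/' . basename($inFileName, '.txt') . '_out.txt';
        $out = new SplFileObject($outName, 'w');

        foreach ($file as $fields) {
            // Blank lines do not produce a usable first field, so skip them
            if (is_array($fields) && isset($fields[0])
                && in_array($fields[0], $searchCriteria, true)) {
                // Re-join the fields to get the comma-separated line back
                $out->fwrite(implode(',', $fields) . PHP_EOL);
            }
        }

        $outFiles[] = $outName;
    }

    return $outFiles;
}
```

Usage would be along the lines of `extractWithSpl(array('sample_1.txt', 'sample_2.txt'), array('$GPGGA', '$GPRMC'));`.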


Thanks Kalon. I will try the code and get back to you. Does the line fwrite($outFile,$line); mean I could write the output straight into another text file? Also, is there no way I can modify the previous code I showed in my last reply, maybe by just adding the lines that handle multiple text files? Hope to hear from you soon.

Regards
Paul

Sort of ??

I’m not sure what you mean here though -

…how to collect these uploaded files and pass them to my php script without storing them to the server

I assume you mean the source files will at some stage be uploaded onto the server (otherwise I don’t know how you can extract data from them) and then they will be automatically deleted after the data has been extracted and written to output files.

If that is what you mean then no problem - the source files can be deleted using unlink().

If the source files will have the same name every time you run the script, you can hard-code the source file names into the extraction script as per earlier demo or you can use a combination of opendir() and readdir() to get the source file names into the extraction script.
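
To make the opendir()/readdir() option a bit more concrete, here is a small sketch. The function name `listSourceFiles()` and the `uploads` directory in the usage note are assumptions for illustration only, and it assumes the output files are written to a different directory so they are not picked up as sources.

```php
<?php
// Hypothetical helper (not from the thread): collect the paths of all .txt
// files directly inside $dir using opendir()/readdir().
function listSourceFiles(string $dir): array
{
    $handle = opendir($dir);
    if ($handle === false) {
        throw new RuntimeException("Cannot open directory: $dir");
    }

    $paths = [];
    while (($entry = readdir($handle)) !== false) {
        $path = $dir . '/' . $entry;
        // is_file() also filters out the "." and ".." entries
        if (is_file($path) && strtolower(pathinfo($entry, PATHINFO_EXTENSION)) === 'txt') {
            $paths[] = $path;
        }
    }
    closedir($handle);

    sort($paths); // readdir() returns entries in no particular order
    return $paths;
}
```

The extraction script could then loop with `foreach (listSourceFiles('uploads') as $path) { ... }` and call `unlink($path);` once the data from that file has been written out.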

hi there,
The quote was not intentional. It was a mistake, please ignore it. I reply with quotes to keep the last post fresh in people's heads.

I will tell you about the data I am comparing shortly, but for now: I do not want to upload the files I am extracting data from only to store them on the server, as the files originate from me. I want to be able to upload a file and do the extraction straight away. I need an idea of how to collect these uploaded files and pass them to my php script without storing them to the server. All I want on the server is the extracted data, which the PHP script would write to text files created on the server. Is this any clearer?
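
One way this could work: PHP always puts an upload into a temporary file first, and that temporary file is removed automatically at the end of the request unless it is moved with move_uploaded_file(). So the script can read straight from the temporary file and keep only the extracted lines. The sketch below assumes a form field named `logs[]` and an output directory called `extracted`; both names, and the function name `extractFromUploads()`, are made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): read each uploaded temp file and
// write only the matching lines to "<original name>_out.txt" in $outDir.
// $files is expected in the shape of $_FILES['logs'] for a multi-file field.
function extractFromUploads(array $files, string $outDir, array $searchCriteria, bool $verifyUpload = true): array
{
    $written = [];

    foreach ($files['tmp_name'] as $i => $tmpPath) {
        if ($files['error'][$i] !== UPLOAD_ERR_OK) {
            continue; // upload failed or no file was chosen for this slot
        }
        // In a real request, make sure the path really is an uploaded file.
        // $verifyUpload exists only so the function can be tried offline.
        if ($verifyUpload && !is_uploaded_file($tmpPath)) {
            continue;
        }

        // basename() drops any directory part from the client-supplied name
        $outPath = $outDir . '/' . basename($files['name'][$i], '.txt') . '_out.txt';
        $in = fopen($tmpPath, 'r');
        $out = fopen($outPath, 'w');

        while (($line = fgets($in)) !== false) {
            $lineTokens = explode(',', $line);
            if (in_array($lineTokens[0], $searchCriteria, true)) {
                fwrite($out, $line);
            }
        }

        fclose($in);
        fclose($out);
        $written[] = $outPath;
    }

    return $written;
}
```

With a form containing `<input type="file" name="logs[]" multiple>` (and `enctype="multipart/form-data"`), the receiving script would call `extractFromUploads($_FILES['logs'], 'extracted', array('$GPGGA', '$GPRMC'));`. The source files are never copied anywhere permanent; only the `_out.txt` files remain.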

Thanks guys, I am busy trying it out and getting some errors. I will get back to you soon. Hope it works out.

Hi there, the result is stunning. There are no errors now, but the output is blank. I wonder what could be wrong. What do you think, Kalon?

Hello Guys,

Now I have a PHP script that is able to accept multiple text files, extract what I specify and write all of it to one text file containing only $GPGGA or $GPRMC entries. I now need some suggestions from you all on my next step, which is this.

Since you have followed me closely, I hope you will understand me, and I beg you to reason with me.

The background of my question is this: I have a WampServer dev. environment with a MySQL database that will hold the data extracted from the text files. This extraction from the text files seems to be an intermediary process, since it may not be wise to store raw or mixed data in the database. The extracted data is still incomplete, as I need to compare certain rows in one text file with rows in another text file.
It is my wish to automate this process so that a user can upload files and the PHP extraction script then extracts the lines and writes them to a file. I am thinking about designing a web interface to upload the files. When these files are uploaded, I aim to use the PHP script to extract what I want from them. After extracting, I would like to compare the rows of data extracted between files. I do not know the best and easiest way to handle this; I mean automating the whole process. This is where I need some advice. I was handling this whole thing in Excel, but Excel is slow, performs badly with text files of more than 30k rows, and still requires some manual steps intermittently. Is this a data mining problem that requires data mining tools, or can I achieve what I want with my WampServer dev. environment? I need your expert opinion. I will gladly clarify anything that is unclear if you get back to me.

Thank you very much.
Best Regards
Paul
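
The thread never spells out how rows in two files are supposed to correspond, so the following is only a guess at what the comparison step might look like. It ASSUMES that two $GPRMC rows belong together when they carry the same date and UTC time (fields 9 and 1 of the sentence); if the real matching rule is different, only the key needs to change. The function names are hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): index the $GPRMC rows of an
// extracted file by "ddmmyy hhmmss". ASSUMPTION: date + time identifies a row.
function indexRmcByTimestamp(string $path): array
{
    $rows = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $fields = explode(',', $line);
        if ($fields[0] !== '$GPRMC' || count($fields) < 10) {
            continue; // not a complete $GPRMC sentence
        }
        $rows[$fields[9] . ' ' . $fields[1]] = $fields;
    }
    return $rows;
}

// Pair up rows that occur in both files under the same timestamp key.
function pairRmcRows(string $pathA, string $pathB): array
{
    $a = indexRmcByTimestamp($pathA);
    $b = indexRmcByTimestamp($pathB);

    $pairs = [];
    foreach (array_intersect_key($a, $b) as $key => $rowA) {
        $pairs[$key] = ['a' => $rowA, 'b' => $b[$key]];
    }
    return $pairs;
}
```

Each pair then holds both rows as arrays, so for example `$pair['a'][3]` and `$pair['b'][3]` are the two latitude fields to compare. Since file() reads the whole file into memory, this suits files of the size mentioned here (tens of thousands of rows), and the paired rows could then be inserted into MySQL.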

Sorry about that Anthony. I am moving on gradually and have passed the stage of handling multiple files with Kalon’s suggestions. Thank you Tony

Hi Paul.

What code are you using now? Please can you make sure you pop it in code tags too, it hurts my eyes! :)

Sorry Kalon, I was silly. I have the result in the file created. You are great. Many thanks. Let me give you a break for now. I remain grateful to this forum.
Regards

Paul

my mistake - sorry :frowning:

 
$outFile = fopen(dirname($inFileName).basename($inFiles,'.txt').'_out.txt',"w");

should be

 
$outFile = fopen(dirname($inFileName).'/'.basename($inFileName,'.txt').'_out.txt',"w");

Hopefully there are no other errors.

As I mentioned before, I haven’t tested the code.

Hello, I tried your suggestion and this line:
$outFile = fopen(dirname($inFileName).basename($inFiles,'.txt').'_out.txt',"w");

gave the following error:
Warning: basename() expects parameter 1 to be string, resource given in C:\wamp\www\saterisk\index.php on line 14

Warning: basename() expects parameter 1 to be string, resource given in C:\wamp\www\saterisk\index.php on line 14

I am trying to figure it out. Any suggestions? Thanks.

Paul

If you want to process multiple input files and be flexible with your search criteria and not be limited to the first 6 chars. of each line, then this is one way of doing it.

I haven’t tested the code so hopefully there are no bugs, but you should be able to see the gist of what I am trying to do.

 
<?php
$searchCriteria = array('$GPGGA','$GPRMC');
$inFiles = array('sample_1.txt','sample_2.txt','sample_3.txt','sample_4.txt');

//process each file in $inFiles
foreach($inFiles as $inFileName) {
 //open the input file; skip it if it cannot be read
 $inFile = fopen($inFileName,"r");
 if($inFile === false) {
  continue;
 }
 //create the output file next to the input file, e.g. sample_1_out.txt
 $outFile = fopen(dirname($inFileName).'/'.basename($inFileName,'.txt').'_out.txt',"w");

 //read the inFile line by line and output the line if searchCriteria is met
 while(($line = fgets($inFile)) !== false) {
  //the first comma-separated token identifies the type of entry
  $lineTokens = explode(',',$line);
  if(in_array($lineTokens[0],$searchCriteria)) {
   fwrite($outFile,$line);
  }
 }

 //close the 2 files
 fclose($inFile);
 fclose($outFile);
}
?>