Parsing tab delimited txt file PHP fgetcsv hangs / not ending / exiting function

This function is pretty generic; it's intended to parse a tab-delimited text file. The files I'm trying to parse are from the GeoNames database. It tops out at 1,953,146 results every time, and from that point on nothing happens at all: no more queries, and the function doesn't exit. Counting the lines, I can see there are 8,000,000 lines in the file, so I'm guessing it has stalled. Errors are enabled and no error is returned; the PHP memory limit is set to 2048M and the execution time is set to unlimited.
<?php
function table_populate($table, $file, $columns) {
    $handle = fopen($file, "r");
    $lines = count(explode("\n", file_get_contents($file)));
    $i = 0;
    while (($line = fgetcsv($handle, 10000, "\t")) !== false && $i < $lines) {
        if (preg_match('[#]', $line['0'])) {
            // do nothing, row is commented out
        } else {
            $row = '';
            $comma = '';
            for ($z = 0; $z < count($line); $z++) {
                $row .= $comma . "'" . $line[$z] . "'";
                $comma = ', ';
            }
            $sql = "INSERT INTO " . $table . " (" . $columns . ") VALUES (" . $row . ")";
        }
        $i++;
    }
    fclose($handle);
    return;
}
?>
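A possible restructuring, sketched under assumptions rather than offered as a confirmed diagnosis of the hang: the function above reads the whole file with file_get_contents() just to count its lines, and fgetcsv() treats a field that begins with a double quote as an enclosure, which can make it read several physical lines as one record. The hypothetical stream_tsv() below reads one line at a time with fgets(), splits on tabs with explode() (so quotes are kept verbatim), and skips lines that start with '#' (assuming comments sit at the start of a line, as in the GeoNames dumps). The $onRow callback is also an assumption; in practice it might execute a PDO prepared INSERT instead of concatenating values into the SQL string.

```php
<?php
// Hypothetical helper: stream a tab-separated file line by line and pass each
// non-comment row to $onRow. The file is never loaded into memory in one go.
function stream_tsv(string $file, callable $onRow): int
{
    $handle = fopen($file, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }
    $count = 0;
    while (($raw = fgets($handle)) !== false) {
        $raw = rtrim($raw, "\r\n");
        // Skip blank lines and lines commented out with a leading '#'.
        if ($raw === '' || $raw[0] === '#') {
            continue;
        }
        // Split on tabs only; a '"' inside a field is kept as ordinary text.
        $onRow(explode("\t", $raw));
        $count++;
    }
    fclose($handle);
    return $count;
}

// Usage with a throwaway file containing a comment line and a stray quote.
$tmp = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents($tmp, "# header\n1\tFoo \"Bar\n2\tBaz\n");
$rows = [];
$n = stream_tsv($tmp, function (array $row) use (&$rows) {
    $rows[] = $row;
});
unlink($tmp);
```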

Related

PHP and csv report calculations

I have a CSV file that I would like to generate a summary report from. The CSV looks like this:
Each row of the CSV has an activity and the corresponding time when it starts.
The summary I'm trying to generate has to look like this:
Basically, I need to show each activity and the times when it starts and when it ends.
I did the following in PHP. I'm almost done, but the result I get is not quite what I want:
$csvFileName = "The csv path";
$report = array();
$file = fopen($csvFileName, "r");
while (($data = fgetcsv($file, 8000, "\n")) !== FALSE) {
    $num = count($data);
    for ($c = 0; $c < $num; $c++) {
        $t = explode(',', $data[$c]);
        $time = $t[0];
        $activity = $t[1];
        $report[] = array($activity, $time);
    }
}
fclose($file);
//I'm reading the whole file content and copying it into an array.
$summaryReport = array();
$j = 1;
for ($i = 0; $i < sizeof($report); $i++) {
    if ($report[$i][0] !== $report[$j][0]) {
        array_push($summaryReport, array($report[$i][0], $report[$i][1], $report[$j][1]));
    }
    $j++;
}
echo json_encode($summaryReport);
The output JSON looks like this:
[["Start","10:42","10:59"],["Driving route","11:10","11:50"],["Lunch-Rest Break","11:50","11:57"],["Driving route","11:57","12:03"],["Break","12:11","12:41"],["Driving route","13:05","14:09"],["Waiting","14:14","14:28"]]
What I'm looking for as a result is something like this:
[["Start","10:42","10:59"],["Driving route","10:59","11:50"],["Lunch-Rest Break","11:50","11:57"],["Driving route","11:57","12:03"],["Break","12:03","12:41"],["Driving route","12:41","14:09"],["Waiting","14:09","14:28"],["End","14:28"]]
My logic isn't really working. Does anyone see how I can do this with a simple loop?
Thank you in advance.

The result can be achieved much more easily. In my code I got rid of all your inner loops and fixed the syntax errors, and there is no need to store the whole CSV file in memory:
PHP code
<?php
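$csvFileName = "./test.csv";
$file = fopen($csvFileName, "r");
$summaryReport = array();
$i = 0;
$previous_name = null;
// compare with false explicitly so the loop only stops at end of file
while (($data = fgetcsv($file, 8000)) !== false) {
    if ($previous_name !== $data[1]) {
        // a new activity starts: record its name and start time
        $summaryReport[$i] = array($data[1], $data[0]);
        if ($i > 0) {
            // the previous activity ends when this one starts
            $summaryReport[$i - 1][2] = $data[0];
        }
        $previous_name = $data[1];
        ++$i;
    }
}
fclose($file);
echo json_encode($summaryReport);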
Test csv file
10:41,Start
10:59,Driving
11:29,Driving
11:11,End
Output
[["Start","10:41","10:59"],["Driving","10:59","11:11"],["End","11:11"]]
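The same grouping can be wrapped in a function so it is easy to test. This is a sketch: the hypothetical summarize() takes any iterable of [time, activity] pairs, so it could be fed a generator that wraps fgetcsv() and still avoid holding the whole file in memory. The sample rows are the test CSV from this answer.

```php
<?php
// Hypothetical helper: collapse consecutive rows with the same activity into
// [activity, start, end]; the last activity has no end time.
function summarize(iterable $rows): array
{
    $summary = [];
    $previous = null;
    foreach ($rows as [$time, $activity]) {
        if ($activity === $previous) {
            continue; // the same activity is still running
        }
        $last = count($summary) - 1;
        if ($last >= 0) {
            $summary[$last][2] = $time; // the previous activity ends here
        }
        $summary[] = [$activity, $time];
        $previous = $activity;
    }
    return $summary;
}

// The test CSV from the answer, as [time, activity] pairs.
$rows = [['10:41', 'Start'], ['10:59', 'Driving'], ['11:29', 'Driving'], ['11:11', 'End']];
// prints [["Start","10:41","10:59"],["Driving","10:59","11:11"],["End","11:11"]]
echo json_encode(summarize($rows)), "\n";
```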

Read file lines backwards (fgets) with php

I have a text file that I want to read backwards. Currently I'm using this:
$fh = fopen('myfile.txt', 'r');
while ($line = fgets($fh)) {
    echo $line . "<br />";
}
This outputs all the lines in my file.
I want to read the lines from bottom to top.
Is there a way to do it?

First way:
$file = file("test.txt");
$file = array_reverse($file);
foreach ($file as $f) {
    echo $f . "<br />";
}
Second Way (a):
To completely reverse a file:
$fl = fopen("\some_file.txt", "r");
for ($x_pos = 0, $output = ''; fseek($fl, $x_pos, SEEK_END) !== -1; $x_pos--) {
    $output .= fgetc($fl);
}
fclose($fl);
print_r($output);
Second Way (b):
Of course, you wanted line-by-line reversal...
$fl = fopen("\some_file.txt", "r");
for ($x_pos = 0, $ln = 0, $output = array(); fseek($fl, $x_pos, SEEK_END) !== -1; $x_pos--) {
    $char = fgetc($fl);
    if ($char === "\n") {
        // analyse completed line $output[$ln] if need be
        $ln++;
        continue;
    }
    $output[$ln] = $char . ((array_key_exists($ln, $output)) ? $output[$ln] : '');
}
fclose($fl);
print_r($output);

Try something simpler, like this:
print_r(array_reverse(file('myfile.txt')));

Here is my solution for simply printing the file backwards. It is quite memory-friendly and, in my opinion, more readable.
It goes through the file backwards, counting the characters until the start of a line or the start of the file, then reads and prints that many characters as a line, then moves the cursor back and reads the previous line the same way...
if ($v = @fopen("PATH_TO_YOUR_FILE", 'r')) { // open the file
    fseek($v, 0, SEEK_END); // move the cursor to the end of the file
    /* helper functions: */
    // moves the cursor one step back if it can and returns true; otherwise returns false
    function moveOneStepBack(&$f) {
        if (ftell($f) > 0) { fseek($f, -1, SEEK_CUR); return true; }
        else return false;
    }
    // reads $length chars but moves the cursor back to where it was before reading
    function readNotSeek(&$f, $length) {
        $r = fread($f, $length);
        fseek($f, -$length, SEEK_CUR);
        return $r;
    }
    /* THE READING + PRINTING ITSELF: */
    while (ftell($v) > 0) { // while there is at least 1 character to read
        $newLine = false;
        $charCounter = 0;
        // line counting
        while (!$newLine && moveOneStepBack($v)) { // not the start of a line / the file
            if (readNotSeek($v, 1) == "\n") $newLine = true;
            $charCounter++;
        }
        // line reading / printing
        if ($charCounter > 1) { // if there was anything on the line
            if (!$newLine) echo "\n"; // prints the missing "\n" before the last *printed* line
            echo readNotSeek($v, $charCounter); // prints the current line
        }
    }
    fclose($v); // close the file, because we are well-behaved
}
Replace PATH_TO_YOUR_FILE with the path to your file. The @ is used when opening the file because a warning is raised when the file is not found or can't be opened; if you want to see this warning, just remove the error suppressor '@'.

If the file is not too big, you can use file():
$lines = file($file);
for ($i = count($lines) - 1; $i >= 0; $i--) {
    echo $lines[$i] . '<br/>';
}
However, this requires the whole file to be in memory, that's why it is not suited for really large files.
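For really large files, a middle ground between file() and the one-character-at-a-time loops is to read fixed-size blocks from the end. The generator below is a sketch; the name lines_backwards() and the 4096-byte default block size are assumptions. It assumes "\n" line endings (a "\r\n" file would leave a trailing "\r" on each line) and treats a final newline at the end of the file as a line terminator rather than an extra empty line.

```php
<?php
// Hypothetical helper: yield the lines of $path from last to first, reading
// $blockSize bytes at a time from the end, so memory use stays around one block
// plus the longest line. Assumes "\n" line endings.
function lines_backwards(string $path, int $blockSize = 4096): Generator
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fh, 0, SEEK_END);
    $pos = ftell($fh);
    if ($pos === 0) { // empty file: nothing to yield
        fclose($fh);
        return;
    }
    $buffer = '';
    $atEnd = true;
    while ($pos > 0) {
        $read = min($blockSize, $pos);
        $pos -= $read;
        fseek($fh, $pos);
        $buffer = fread($fh, $read) . $buffer;
        if ($atEnd) {
            // A final "\n" ends the last line; it does not start an empty one.
            if (substr($buffer, -1) === "\n") {
                $buffer = substr($buffer, 0, -1);
            }
            $atEnd = false;
        }
        $parts = explode("\n", $buffer);
        // The first piece may continue in the previous block, so keep it for later.
        $buffer = array_shift($parts);
        for ($i = count($parts) - 1; $i >= 0; $i--) {
            yield $parts[$i];
        }
    }
    fclose($fh);
    yield $buffer; // the file's first line
}
```

Usage mirrors the question's loop: foreach (lines_backwards('myfile.txt') as $line) { echo $line . "<br />"; }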

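Here's my simple solution, without messing anything up or adding more complex code:
$fh = fopen('myfile.txt', 'r');
$result = ''; // initialise, otherwise the first prepend reads an undefined variable
while (($line = fgets($fh)) !== false) {
    // prepend each line so the last line of the file ends up first
    $result = $line . "<br>" . $result;
}
fclose($fh);
echo $result; // or return $result if you are using it as a function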

Remove all lines except the first 20 using PHP

How do I remove every line except the first 20 from a text file using PHP?

If loading the entire file in memory is feasible you can do:
// read the file in an array.
$file = file($filename);
// slice first 20 elements.
$file = array_slice($file,0,20);
// write back to file after joining.
file_put_contents($filename,implode("",$file));
A better solution is to use ftruncate(), which takes the file handle and the new size of the file in bytes, as follows:
// open the file in read-write mode.
$handle = fopen($filename, 'r+');
if (!$handle) {
    die("Unable to open $filename");
}
// new length of the file.
$length = 0;
// line count.
$count = 0;
// read line by line.
while (($buffer = fgets($handle)) !== false) {
    // increment line count.
    ++$count;
    // if count exceeds limit..break.
    if ($count > 20) {
        break;
    }
    // add the current line length to final length.
    $length += strlen($buffer);
}
// truncate the file to new file length.
ftruncate($handle, $length);
// close the file.
fclose($handle);
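The ftruncate() approach can be wrapped in a reusable function. In this sketch keep_first_lines() is a hypothetical name; it stops reading after $keep lines, and when the file has fewer lines the computed length equals the file size, so ftruncate() leaves the file unchanged.

```php
<?php
// Hypothetical helper: keep only the first $keep lines of $path, truncating the
// file in place. Returns the new size in bytes.
function keep_first_lines(string $path, int $keep): int
{
    $handle = fopen($path, 'r+');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $length = 0;
    for ($i = 0; $i < $keep && ($line = fgets($handle)) !== false; $i++) {
        $length += strlen($line); // strlen() counts bytes, which ftruncate() expects
    }
    ftruncate($handle, $length);
    fclose($handle);
    return $length;
}
```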

For a memory-efficient solution, you can use:
$file = new SplFileObject('/path/to/file.txt', 'a+');
$file->seek(19); // zero-based, hence 19 is line 20
$file->ftruncate($file->ftell());

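Apologies, I misread the question...
$filename = "blah.txt";
$lines = file($filename);
$data = "";
// file() keeps each line's trailing newline, so PHP_EOL must not be appended again;
// min() avoids reading past the end of files with fewer than 20 lines
for ($i = 0; $i < min(20, count($lines)); $i++) {
    $data .= $lines[$i];
}
file_put_contents($filename, $data);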

Something like:
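$lines_array = file("yourFile.txt");
$new_output = "";
// min() avoids undefined offsets when the file has fewer than 20 lines
for ($i = 0; $i < min(20, count($lines_array)); $i++) {
    $new_output .= $lines_array[$i];
}
file_put_contents("yourFile.txt", $new_output);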

This should also work, without huge memory usage:
$result = '';
$file = fopen('/path/to/file.txt', 'r');
for ($i = 0; $i < 20; $i++) {
    $result .= fgets($file);
}
fclose($file);
file_put_contents('/path/to/file.txt', $result);

How to detect a delimiter in a string in PHP?

I am curious: if you have a string, how would you detect the delimiter?
We know PHP can split a string up with explode(), which requires a delimiter parameter.
But what about a method to detect the delimiter before passing it to the explode function?
Right now I am just outputting the string to the user and they enter the delimiter. That's fine, but I would like the application to recognize the pattern for me.
Should I look to regular expressions for this type of pattern recognition in a string?
EDIT: I failed to specify initially that there is a likely set of expected delimiters: any delimiter that is commonly used in a CSV. Technically anyone could use any character to delimit a CSV file, but one of the following characters is more probable: comma, semicolon, vertical bar and space.
EDIT 2: Here is the workable solution I came up with for a "determined delimiter".
$get_images = "86236058.jpg 86236134.jpg 86236134.jpg";
//Detection of delimiter of image filenames.
$probable_delimiters = array(",", " ", "|", ";");
$delimiter_count_array = array();
foreach ($probable_delimiters as $probable_delimiter) {
    $probable_delimiter_count = substr_count($get_images, $probable_delimiter);
    $delimiter_count_array[$probable_delimiter] = $probable_delimiter_count;
}
$max_value = max($delimiter_count_array);
$determined_delimiter_array = array_keys($delimiter_count_array, $max_value);
// each() was removed in PHP 8; end() picks the last tied delimiter, as the old while/each loop did
$determined_delimiter = end($determined_delimiter_array);
$images = explode($determined_delimiter, $get_images);

Determine which delimiters you consider probable (such as ',', ';' and '|') and, for each one, count how often it occurs in the string with substr_count(). Then choose the one with the most occurrences as the delimiter and explode on it.
Even though it might not be fail-safe, it should work in most cases ;)
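A minimal sketch of that approach as a function; the name detect_delimiter() and the default candidate list (taken from the question's edit) are assumptions. Ties go to whichever candidate appears first in the list, and null is returned when none of the candidates occurs.

```php
<?php
// Hypothetical helper: return the candidate that occurs most often in $sample,
// counted with substr_count(); ties go to the earlier candidate.
function detect_delimiter(string $sample, array $candidates = [',', ';', '|', ' ']): ?string
{
    $best = null;
    $bestCount = 0;
    foreach ($candidates as $candidate) {
        $count = substr_count($sample, $candidate);
        if ($count > $bestCount) {
            $best = $candidate;
            $bestCount = $count;
        }
    }
    return $best; // null when no candidate occurs at all
}

$images = explode(detect_delimiter("86236058.jpg 86236134.jpg 86236134.jpg"), "86236058.jpg 86236134.jpg 86236134.jpg");
```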

I would say this works in the vast majority of cases :)
The basic idea is that the number of valid delimiters should be the same on every line.
This script calculates the delimiter-count discrepancies between all lines.
A smaller discrepancy means a more likely valid delimiter.
Putting it all together, this function reads the rows and returns them as an array:
function readCSV($fileName)
{
    // detect these delimiters
    $delA = array(";", ",", "|", "\t");
    $linesA = array();
    $resultA = array();
    $maxLines = 20; // maximum lines to parse for detection; this can be higher for more precision
    // record the column count of each of the first $maxLines lines for every candidate
    // (the while loop stops at EOF, so shorter files need no special handling)
    foreach ($delA as $key => $del) {
        $rowNum = 0;
        $linesA[$key] = array();
        if (($handle = fopen($fileName, "r")) !== false) {
            while (($rowNum < $maxLines) && (($data = fgetcsv($handle, 1000, $del)) !== false)) {
                $linesA[$key][] = count($data);
                $rowNum++;
            }
            fclose($handle);
        }
    }
    // sum the column-count discrepancies between all rows for each delimiter
    foreach ($delA as $key => $del) {
        $discr = 0;
        $resultA[$key] = 65535; // default when no rows were read with this delimiter
        foreach ($linesA[$key] as $actNum) {
            if ($actNum == 1) {
                // only one column on this line, so this is not our delimiter; set the discrepancy high
                $resultA[$key] = 65535;
                break;
            }
            foreach ($linesA[$key] as $actNum2) {
                $discr += abs($actNum - $actNum2);
            }
            // for the real delimiter this result should be nearest to 0,
            // because in the ideal (error-free) case all lines have the same number of columns
            $resultA[$key] = $discr;
        }
    }
    // select the discrepancy nearest to 0; this is our delimiter
    $delRes = 65535;
    $delKey = 0; // fall back to the first candidate if none qualifies
    foreach ($resultA as $key => $res) {
        if ($res < $delRes) {
            $delRes = $res;
            $delKey = $key;
        }
    }
    $delimiter = $delA[$delKey];
    // get rows (debug echo/var_dump output removed so the function only returns data)
    $rowsA = array();
    if (($handle = fopen($fileName, "r")) !== false) {
        while (($data = fgetcsv($handle, 1000, $delimiter)) !== false) {
            $rowsA[] = array_map('trim', $data);
        }
        fclose($handle);
    }
    return $rowsA;
}
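A shorter sketch of the same "same number of columns on every line" idea, working on lines already in memory (for example the first 20 read with fgets()). It is not the answer's exact scoring: instead of summing discrepancies, the hypothetical detect_delimiter_by_consistency() accepts only candidates that give an identical column count greater than one on every sample line, and prefers the candidate with more columns. str_getcsv() is used so that delimiters inside quoted fields are not counted.

```php
<?php
// Hypothetical helper: choose the candidate that splits every sample line into
// the same number (> 1) of fields; prefer the candidate giving the most fields.
function detect_delimiter_by_consistency(array $lines, array $candidates = [';', ',', '|', "\t"]): ?string
{
    $best = null;
    $bestFields = 1;
    foreach ($candidates as $candidate) {
        $counts = [];
        foreach ($lines as $line) {
            // Enclosure and escape passed explicitly; quoted delimiters stay in one field.
            $counts[] = count(str_getcsv($line, $candidate, '"', '\\'));
        }
        $distinct = array_unique($counts);
        if (count($distinct) === 1 && reset($distinct) > $bestFields) {
            $best = $candidate;
            $bestFields = reset($distinct);
        }
    }
    return $best; // null if no candidate is consistent across all lines
}
```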

I have the same problem. I am dealing with a lot of CSVs from various databases, which different people export to CSV in different ways, sometimes differently each time for the same dataset... I simply implemented a function like this in my conversion base class:
protected function detectDelimiter() {
    // @ suppresses the warning if the file can't be opened
    $handle = @fopen($this->CSVFile, "r");
    if ($handle) {
        $line = fgets($handle, 4096);
        fclose($handle);
        $test = explode(',', $line);
        if (count($test) > 1) return ',';
        $test = explode(';', $line);
        if (count($test) > 1) return ';';
        //.. and so on
    }
    //return default delimiter
    return $this->delimiter;
}

I made something like this:
$line = fgetcsv($handle, 1000, "|");
if (isset($line[1])) {
    echo "delimiter is: |";
    $delimiter = "|";
} else {
    // go back to the first line so both delimiters are tested on the same line
    rewind($handle);
    $line1 = fgetcsv($handle, 1000, ";");
    if (isset($line1[1])) {
        echo "delimiter is: ;";
        $delimiter = ";";
    } else {
        echo "delimiter is: ,";
        $delimiter = ",";
    }
}
This simply checks whether there is a second column after a line is read.

I am having the same issue. My system will receive CSV files from clients, but they could use ";", "," or " " as the delimiter, and I want to improve the system so the client doesn't have to know which one it is (they never do).
I searched and found this library:
https://github.com/parsecsv/parsecsv-for-php
Very good and easy to use.

Using file() incrementally?

I'm not sure if this is possible; I've been googling for a solution... Essentially, I have a very large file whose lines I want to store in an array. I'm using file(), but is there a way to do that in batches, so that every, say, 100 lines it "pauses"?
I think there's likely to be something I can do with a foreach loop or something, but I'm not sure that I'm thinking about it the right way...
Something like:
$i = 0;
$j = 0;
$throttle = 100;
foreach ($files as $k => $v) {
    if ($i < $j + $throttle && $i > $j) {
        $lines[] = file($v);
        //Do some other stuff, like importing into a db
    }
    $i++;
    $j++;
}
But, I think that won't really work because $i & $j will always be equal... Anyway, feeling muddled... Can someone help me think a lil' clearer?

Read the file in line by line for however many lines you need, appending each line to an array. When the array gets to the desired length, process it, and empty the array. E.g.:
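$handle = @fopen("/tmp/inputfile.txt", "r");
$throttle = 100;
$data = array();
if ($handle) {
    // looping on fgets() avoids appending a spurious false at end of file
    while (($buffer = fgets($handle, 4096)) !== false) {
        $data[] = $buffer;
        if (count($data) == $throttle) {
            doSomething($data);
            $data = array();
        }
    }
    if (count($data) > 0) {
        doSomething($data); // process the last, partial batch
    }
    fclose($handle);
}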

As you suspected, $i and $j are always incremented together, so $i > $j is never true and the body of the if never runs. What you can do is something like:
$data = array();
$chunk = 100;
$f = fopen($file, 'r');
while (!feof($f)) {
    for ($i = 0; $i < $chunk; $i++) {
        $tmp = fgets($f);
        if ($tmp !== false) {
            $data[] = $tmp;
        } else {
            //No more data, break out of the inner loop
            break;
        }
    }
    //Process your data
    $data = array();
}
fclose($f);
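The batching loops can also be packaged as a generator, so the calling code reads like a plain foreach over batches. This is a sketch with a hypothetical name, read_in_batches(); unlike the loops in these answers it strips the trailing newline from each line, and it also yields the final batch when that batch has fewer than $size lines.

```php
<?php
// Hypothetical helper: yield the lines of $path in arrays of at most $size
// lines, so only one batch is held in memory at a time.
function read_in_batches(string $path, int $size = 100): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $batch = [];
    while (($line = fgets($handle)) !== false) {
        $batch[] = rtrim($line, "\r\n");
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    fclose($handle);
    if ($batch !== []) {
        yield $batch; // leftover lines that did not fill a whole batch
    }
}
```

Usage: foreach (read_in_batches('/tmp/inputfile.txt', 100) as $batch) { /* e.g. import $batch into a db */ }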

If by "pause", you mean that you really want to pause execution of your script, use sleep or some of its variants: http://php.net/manual/en/function.sleep.php
