Comparing two CSV files in PHP out of memory - php

I am comparing two CSV files in PHP. I am only interested in knowing that the column names are the same and not interested in the data (at least for now).
This is a generic script that handles any CSV file I choose to upload. The file I am uploading is compared against a set sample file (so I can only upload the file if a sample has been provided). This sample file only contains a few lines of data so is not large by any stretch. The file I am uploading can range from 500kb to about 10mb (the one I am uploading is 7,827,180 bytes).
Everything has been fine until today, when I started getting this message:
"Fatal error: Out of memory (allocated 524288) (tried to allocate 7835372 bytes) in C:\xampp\htdocs\errcoaching\app\handlers\file_upload_parse.php on line 8" (line 8 refers to the second line of the code below, the first line inside my function).
function check_csv($f_a, $f_b){
    $csv_upload = array_map("str_getcsv", file($f_a, FILE_SKIP_EMPTY_LINES))[0]; // This is line 8
    $csv_sample = array_map("str_getcsv", file($f_b, FILE_SKIP_EMPTY_LINES))[0];
    $match = 'true';
    foreach ($csv_sample as $key => $value) {
        if($value != $csv_upload[$key]){
            $match = 'false';
            break;
        }
    }
    return $match;
}

You are reading the whole file into an array, then discarding all but the first line.
Instead, you can use fgetcsv() (http://php.net/manual/en/function.fgetcsv.php) to read only the first line:
$handle_a = fopen($f_a, "r");
$handle_b = fopen($f_b, "r");
$csv_upload = fgetcsv($handle_a);
$csv_sample = fgetcsv($handle_b);
//the rest of your code
fclose($handle_a);
fclose($handle_b);
You should probably also handle file read errors (fopen() and fgetcsv() both return false on failure).
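Put together, a minimal sketch of the header check with basic error handling might look like this (the 'error' return value is an assumption, not part of the original code):
function check_csv($f_a, $f_b){
    $handle_a = fopen($f_a, "r");
    $handle_b = fopen($f_b, "r");
    if ($handle_a === false || $handle_b === false) {
        return 'error'; // could not open one of the files
    }
    // read only the first line (the header row) of each file
    $csv_upload = fgetcsv($handle_a);
    $csv_sample = fgetcsv($handle_b);
    fclose($handle_a);
    fclose($handle_b);
    if ($csv_upload === false || $csv_sample === false) {
        return 'error'; // a file was empty or could not be parsed
    }
    $match = 'true';
    foreach ($csv_sample as $key => $value) {
        if (!isset($csv_upload[$key]) || $value != $csv_upload[$key]) {
            $match = 'false';
            break;
        }
    }
    return $match;
}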

Related

How to read a range of rows from CSV file to JSON array using PHP to handle large CSV file?

The goal is to read a range of rows/lines from a large CSV file into a JSON array so the data can be read in a paginated way, where each page fetches a range of lines (e.g. page 1 fetches lines 1 to 10, page 2 fetches lines 11 to 20, and so on).
The PHP script below reads from the beginning of the CSV file up to the desired line ($desired_line). My question is how to determine the starting line so reading begins from a specific line ($starting_line).
<?php
// php function to convert csv to json format
function csvToJson($fname, $starting_line, $desired_line) {
    // open csv file
    if (!($fp = fopen($fname, 'r'))) {
        die("Can't open file...");
    }
    // read csv headers
    $key = fgetcsv($fp, 1024, "\t");
    $line_counter = 0;
    // parse csv rows into array
    $json = array();
    while (($row = fgetcsv($fp, 1024, "\t")) && ($line_counter < $desired_line)) {
        $json[] = array_combine($key, $row);
        $line_counter++;
    }
    // release file handle
    fclose($fp);
    // encode array to json
    return json_encode($json);
}

// Define the path to CSV file
$csv = 'file.csv';
print_r(csvToJson($csv, 20, 30));
?>
You should use functions like:
fgets() to read the file line by line
fseek() to move to the position of the last fgets() of the chunk
ftell() to read the position for fseek()
Something like this (it's only a schema):
<?php
...
$line_counter = 0;
$last_pos = ... // the position saved from the previous reading cycle (0 for the first)
fseek($fp, $last_pos);
while($line = fgets($fp)){ // read a line of the file
    $line_counter++;
    (...) // parse line of csv here
    if($line_counter == 100){
        $last_pos = ftell($fp);
        (...) // save the $last_pos for next reading cycle
        break;
    }
}
...
?>
You can also skip the fseek() and ftell() part and just count lines from the beginning every time, but then every request generally has to read through the whole file from the start up to the desired lines.
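For a CSV file specifically, a self-contained sketch of that idea might look like the following (the page size of 10 and the way the offset is passed back in are assumptions; in a real script the offset would come from the session, a cookie, or a request parameter):
<?php
// Read one "page" of CSV rows starting at a saved byte offset.
// Returns the rows plus the offset where the next page begins.
function read_csv_page($fname, $offset, $page_size = 10) {
    if (!($fp = fopen($fname, 'r'))) {
        die("Can't open file...");
    }
    $key = fgetcsv($fp, 1024, "\t");   // header row is always at the top
    if ($offset > 0) {
        fseek($fp, $offset);           // jump straight to where the last page ended
    }
    $rows = array();
    while (count($rows) < $page_size && ($row = fgetcsv($fp, 1024, "\t"))) {
        $rows[] = array_combine($key, $row);
    }
    $next_offset = ftell($fp);         // save this for the next reading cycle
    fclose($fp);
    return array('rows' => $rows, 'next_offset' => $next_offset);
}

// first page: start right after the header
$page1 = read_csv_page('file.csv', 0);
// second page: continue from where the first page stopped
$page2 = read_csv_page('file.csv', $page1['next_offset']);
echo json_encode($page2['rows']);
?>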

Read large file in php and navigate line by line

I am developing a log file viewer in PHP that should read 10 lines from the file (say 2 GB), and when the user clicks "next" the next 10 lines have to be read.
When the back button is pressed, the previous 10 lines have to be printed.
So far I have implemented reading the file with fgets() (because of the file size) and I am trying to figure out how to seek to the next 10 and previous 10 lines.
if($handle)
{
    $cnt=1;
    while(($buffer=fgets($handle))!==false and $cnt<=10) {
        echo $buffer;
        $cnt++;
    }
    if(feof($handle)) {
        echo "error";
    }
}
The SplFileObject class in PHP does what you want to do. See:
http://php.net/manual/en/splfileobject.seek.php
Example code:
<?php
// Set $lineNumber to the line that you want to start at
// Remember that the first line in the file is line 0
$lineNumber = 43;
// This sets how many lines you want to grab
$lineCount = 10;
// Open the file
$file = new SplFileObject("logfile.log");
// This seeks to the line that you want to start at
$file->seek($lineNumber);
for($currentLine = 0; $currentLine < $lineCount; $currentLine++) {
    echo $file->current();
    $file->next();
}
?>
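To get the next/back behaviour described in the question, one possible approach (assuming a ?page=N request parameter, which is not in the original code) is to derive $lineNumber from the page number:
<?php
// page 0 shows lines 0-9, page 1 shows lines 10-19, and so on
$page = isset($_GET['page']) ? max(0, (int)$_GET['page']) : 0;
$lineCount  = 10;
$lineNumber = $page * $lineCount;

$file = new SplFileObject("logfile.log");
$file->seek($lineNumber);
for ($currentLine = 0; $currentLine < $lineCount && !$file->eof(); $currentLine++) {
    echo $file->current();
    $file->next();
}
// "Next" links to ?page=N+1, "Back" links to ?page=N-1
?>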

filesize() always reads 0 bytes even though file size isn't 0 bytes

I wrote some code below; at the moment I'm testing, so there are no database queries in the code.
The code below, where it says if(filesize($filename) != 0), always goes to the else branch even though the file is not 0 bytes and contains 16 bytes of data. I am getting nowhere; it always seems to think the file is 0 bytes.
I think it's easier to show my code (there could be other errors in there, but I'm checking each error as I go along, dealing with them one by one). I get no PHP errors or anything.
$filename = 'memberlist.txt';
$file_directory = dirname($filename);
$fopen = fopen($filename, 'w+');
// check if file exists and is writable
if(file_exists($filename) && is_writable($file_directory)){
    // clear statcache else filesize could be incorrect
    clearstatcache();
    // for testing, shows 0 bytes even though file is 16 bytes
    // file has inside without quotes: '1487071595 ; 582'
    echo "The file size is actually ".filesize($filename)." bytes.\n";
    // check if file contains any data, also tried !==
    // always goes to else even though not 0 bytes in size
    if(filesize($filename) != 0){
        // read file into an array
        $fread = file($filename);
        // get current time
        $current_time = time();
        foreach($fread as $read){
            $var = explode(';', $read);
            $oldtime = $var[0];
            $member_count = $var[1];
        }
        if($current_time - $oldtime >= 86400){
            // 24 hours or more so we query db and write new member count to file
            echo 'more than 24 hours has passed'; // for testing
        } else {
            // less than 24 hours so don't query db just read member count from file
            echo 'less than 24 hours has passed'; // for testing
        }
    } else { // WE ALWAYS END UP HERE
        // else file is empty so we add data
        $current_time = time().' ; ';
        $member_count = 582; // this value will come from a database
        fwrite($fopen, $current_time.$member_count);
        fclose($fopen);
        //echo "The file is empty so write new data to file. File size is actually ".filesize($filename)." bytes.\n";
    }
} else {
    // file either does not exist or cant be written to
    echo 'file does not exist or is not writeable'; // for testing
}
Basically the code will be on a memberlist page which currently retrieves all members and counts how many are registered. The point of the script is: if less than 24 hours have passed, we read member_count from the file; if 24 hours or more have elapsed, we query the database, get the member count and write the new figure to the file. It's there to reduce queries on the memberlist page.
Update 1:
This code:
echo "The file size is actually ".filesize($filename)." bytes.\n";
always outputs the below even though it's not 0 bytes.
The file size is actually 0 bytes.
also tried
var_dump (filesize($filename));
Outputs:
int(0)
You are using:
fopen($filename, "w+")
According to the manual w+ means:
Open for reading and writing; place the file pointer at the beginning of the file and truncate the file to zero length. If the file does not exist, attempt to create it.
So the file size being 0 is correct.
You probably need r+
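A minimal sketch of the fix, assuming the same memberlist.txt layout as in the question (note that r+ fails if the file does not exist yet, whereas c+ would create it):
$filename = 'memberlist.txt';
$fopen = fopen($filename, 'r+');   // read/write, pointer at the start, no truncation
clearstatcache();                  // make sure filesize() is not a stale cached value
if(filesize($filename) != 0){
    // existing data: read and use it
    $fread = file($filename);
    // ... compare the stored timestamp with time() as in the question ...
} else {
    // empty file: write the initial data
    fwrite($fopen, time().' ; '.582);
}
fclose($fopen);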
Sorry, I know this question is already answered, but I am writing my own answer so it might be useful for someone else.
If you use c+ in the fopen() function,
fopen($filePath , "c+");
then the filesize() function returns the real size of the file,
and you can use clearstatcache(true, $filePath) to clear the cached stat information for this file.
Note: when we use c+ in fopen(), the existing file content is preserved (the file is not truncated) and the file pointer is placed at the beginning of the file.

Caching large Array causes memory exhaustion

So I'm trying to cache an array in a file and use it somewhere else.
import.php
// The code above this point reads each line of the CSV and puts it in an array
// (each line becomes one element of the multidimensional array $csv)
$export = var_export($csv, true);
$content = "<?php \$data=" . $export . ";?>";
$target_path1 = "/var/www/html/Samples/test";
file_put_contents($target_path1 . "recordset.php", $content);
somewhere.php
ini_set('memory_limit','-1');
include_once("/var/www/html/Samples/test/recordset.php");
print_r($data);
Now, I've included recordset.php in somewhere.php to use the array stored in it. It works fine when the uploaded CSV file has 5000 lines; if I try to upload a CSV with 50000 lines, for example, I get a fatal error:
Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 79691776 bytes)
How can I fix this, or is there a more convenient way to achieve what I want? Speaking of performance, should I consider the CPU of the server? I've overridden the memory limit and set it to -1 in somewhere.php.
There are two ways to fix this:
You can increase the memory (RAM) on the server, since memory_limit can only use memory that is actually available on the server, and it seems you have very little RAM available for PHP.
To check the total RAM on a Linux server:
<?php
$fh = fopen('/proc/meminfo','r');
$mem = 0;
while ($line = fgets($fh)) {
    $pieces = array();
    if (preg_match('/^MemTotal:\s+(\d+)\skB$/', $line, $pieces)) {
        $mem = $pieces[1];
        break;
    }
}
fclose($fh);
echo "$mem kB RAM found";
?>
Source: get server ram with php
Alternatively, you can parse your CSV file in chunks and release the occupied memory each time with unset().
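A hedged sketch of that chunked approach (the 1000-row chunk size, the upload.csv path and the process_chunk() helper are assumptions; replace them with whatever your import actually does per batch):
$handle = fopen('/var/www/html/Samples/test/upload.csv', 'r');
$chunk = array();
while (($row = fgetcsv($handle)) !== false) {
    $chunk[] = $row;
    if (count($chunk) >= 1000) {   // process 1000 rows at a time
        process_chunk($chunk);     // hypothetical helper: write this batch to cache/db
        unset($chunk);             // release the memory used by this chunk
        $chunk = array();
    }
}
if (!empty($chunk)) {
    process_chunk($chunk);         // remaining rows
}
fclose($handle);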

PHP Fatal error: Allowed memory size of xxx bytes exhausted (tried to allocate XX bytes)

I have a PHP script that splits a large file and inserts it into PostgreSQL. This import has worked before on PHP 5.3, PostgreSQL 8.3 and Mac OS X 10.5.8. I have now moved everything over to a new Mac Pro. This has plenty of RAM (16 GB), Mac OS X 10.9.2, PHP 5.5.8, PostgreSQL 9.3.
The problem is when reading the large import file. It is a tab-separated file over 181 MB. I have tried increasing PHP memory up to 2 GB (!) without success. So I guess the problem must be in the code that reads the text file and splits it. I get this error:
PHP Fatal error: Allowed memory size of 2097152000 bytes exhausted (tried to allocate 72 bytes) in /Library/FileMaker Server/Data/Scripts/getGBIFdata.php on line 20
Is there a better way to do this? I read the file and split the lines, then again split each line by \t (tab). The error I get on this line:
$arr = explode("\t", $line);
Here is my code:
<?php
## I have tried everything here, memory_limit in php.ini is 256M
ini_set("memory_limit","1000M");
$db= pg_connect('host=127.0.0.1 dbname=My_DB_Name user=Username password=Pass');
### SET ERROR STATE:
pg_set_error_verbosity($db, PGSQL_ERRORS_VERBOSE);
### Empty DB
$result = pg_query("TRUNCATE TABLE My_DB_Name");
$fcontents = file ('///Library/FileMaker\ Server/Data/Documents/EXPORT/export_file.tab');
for($i=0; $i<sizeof($fcontents); $i++) {
    $line = trim($fcontents[$i]);
    $arr = explode("\t", $line);
    $query = "insert into My_DB_Name(
        field1, field2 etc.... )
        values (
        '{$arr[0]}','{$arr[1]}','{$arr[2]}','{$arr[3]}', etc........
    )";
    $result = pg_query($query); echo "\n Lines:".$i;
    pg_send_query($db, $query);
    $res1 = pg_get_result($db);
}
## Update geometry column
$sql = pg_query("
update darwincore2 set punkt_geom=
ST_SetSRID(ST_MakePoint(My_DB_Name.longitude, darwincore2.latitude),4326);
");
?>
I think the problem is that you're using the file() function, which reads the whole file into memory at once. Try reading it line by line using fopen() and fgets().
$fp = fopen($filename, "r");
while (($line = fgets($fp)) !== false) {
    // ... insert $line into the db ...
}
fclose($fp);
You can also import a file directly with the COPY command (http://www.postgresql.org/docs/9.2/static/sql-copy.html)
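A hedged sketch of the COPY approach (the table and column names are placeholders, and the PostgreSQL server process needs read access to the file, since a server-side COPY reads it directly):
$db = pg_connect('host=127.0.0.1 dbname=My_DB_Name user=Username password=Pass');
// FORMAT text uses tab as its default delimiter, which matches the export file
pg_query($db, "COPY my_table (field1, field2, field3)
    FROM '/Library/FileMaker Server/Data/Documents/EXPORT/export_file.tab'
    WITH (FORMAT text)");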
This error can also be caused by the code itself, e.g. an infinite loop, processing a large amount of data, or even database queries.
You should check your code; there might be an infinite loop or a similar case.
