Problem getting last chunk in a while loop - php

So it's been a while since I did any PHP and, to be honest, this question feels kind of dumb. But my head is just stuck on how to get the last chunk of a file.
My while loop reads a file line by line, and after every 10 lines it should execute some code. The problem occurs when there are, say, 51 lines: the last chunk only has 1 line, so it never reaches 10 and the code never runs for it. How do I process that last partial chunk?
The file is over 300 MB, so I cannot load it into memory (as an array).
while ($row = fgets($handle))
{
    $chunk[] = array_combine($feed_product_arraykeys, explode("\t", $row));
    if (count($chunk) == 10)
    {
        echo count($chunk) . '<br>';
        // Initiate code
        unset($chunk);
    }
}
Best Regards

Here's an alternate way. Just read the file into an array and chunk it into chunks of 10. The remainder will be in the last chunk:
foreach (array_chunk(file('/path/to/file'), 10) as $rows) {
    $chunk = array();
    foreach ($rows as $row) {
        $chunk[] = array_combine($feed_product_arraykeys, explode("\t", $row));
    }
    echo count($chunk) . '<br>';
}
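Since the question says the 300 MB file cannot be loaded into memory, a streaming variant (just a sketch, not part of the answer above) is to keep the original fgets() loop and flush whatever is left in $chunk once the loop finishes:
// Sketch: same fgets() loop as in the question, plus a final flush of the
// leftover rows (fewer than 10) after the loop ends.
while ($row = fgets($handle))
{
    $chunk[] = array_combine($feed_product_arraykeys, explode("\t", $row));
    if (count($chunk) == 10)
    {
        // Initiate code for a full chunk
        $chunk = array();
    }
}
if (!empty($chunk))
{
    // Initiate the same code for the final, partial chunk
    $chunk = array();
}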

So I actually fixed it by counting the number of rows in the file first. I thought it would be slow, but it's actually fast, even on a 300 MB file with 130k rows.
// Count number of lines in feed
$feed_row_count = count_lines_in_file("tmp/56.csv");
$row_counter = 0;
$feed_handle = fopen("tmp/56.csv", "r");
while ($row = fgets($feed_handle))
{
    $row_counter++;
    $chunk[] = array_combine($feed_product_arraykeys, explode("\t", $row));
    if (count($chunk) == 25 || $feed_row_count == $row_counter)
    {
        echo count($chunk) . '<br>';
        // Initiate SQL
        unset($chunk);
    }
}
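For completeness, count_lines_in_file() is the asker's own helper and is not shown above; a minimal sketch of such a helper (this exact implementation is an assumption) could be:
// Hypothetical line-counting helper: streams the file so the 300 MB feed
// is never loaded into memory at once.
function count_lines_in_file($path)
{
    $lines = 0;
    $handle = fopen($path, 'r');
    while (fgets($handle) !== false)
    {
        $lines++;
    }
    fclose($handle);
    return $lines;
}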

Related

Checking for partial duplications in a CSV in PHP

I'm having an issue with a memory leak in this code. What I'm attempting to do is to temporarily upload a rather large CSV file (at least 12k records), and check each record for a partial duplication against other records in the CSV file. The reason why I say "partial duplication" is because basically if most of the record matches (at least 30 fields), it is going to be a duplicate record. The code I've written should, in theory, work as intended, but of course, it's a rather large loop and is exhausting memory. This is happening on the line that contains "array_intersect".
This is not for something I'm getting paid to do, but it is with the purpose of helping make life at work easier. I'm a data entry employee, and we are having to look at duplicate entries manually right now, which is asinine, so I'm trying to help out by making a small program for this.
Thank you so much in advance!
if (isset($_POST["submit"])) {
if (isset($_FILES["sheetupload"])) {
$fh = fopen(basename($_FILES["sheetupload"]["name"]), "r+");
$lines = array();
$records = array();
$counter = 0;
while(($row = fgetcsv($fh, 8192)) !== FALSE ) {
$lines[] = $row;
}
foreach ($lines as $line) {
if(!in_array($line, $records)){
if (count($records) > 0) {
//check array against records for dupes
foreach ($records as $record) {
if (count(array_intersect($line, $record)) > 30) {
$dupes[] = $line;
$counter++;
}
else {
$records[] = $line;
}
}
}
else {
$records[] = $line;
}
}
else {
$counter++;
}
}
if ($counter < 1) {
echo $counter." duplicate records found. New file not created.";
}
else {
echo $counter." duplicate records found. New file created as NEWSHEET.csv.";
$fp = fopen('NEWSHEET.csv', 'w');
foreach ($records as $line) {
fputcsv($fp, $line);
}
}
}
}
A couple of possibilities, assuming the script is reaching the memory limit or timing out. If you can access the php.ini file, try increasing the memory_limit and the max_execution_time.
If you can't access the server settings, try adding these to the top of your script:
ini_set('memory_limit','256M'); // change this number as necessary
set_time_limit(0); // so script does not time out
If altering these settings in the script is not possible, you might try using unset() in a few spots to free up memory:
// after the first while loop
unset($fh, $row);
and
//at end of each foreach loop
unset($line);
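Beyond raising the limits, a further memory saving (a sketch of an alternative, not something the answer above spells out) is to drop the $lines buffer entirely and check each row as it is read:
// Sketch: stream rows straight from the file instead of buffering them all
// in $lines first; $fh, $records, $dupes and $counter are the same variables
// used in the question.
$records = array();
$dupes = array();
$counter = 0;
while (($line = fgetcsv($fh, 8192)) !== FALSE) {
    $isDupe = false;
    foreach ($records as $record) {
        if (count(array_intersect($line, $record)) > 30) {
            $isDupe = true;
            break;
        }
    }
    if ($isDupe) {
        $dupes[] = $line;
        $counter++;
    } else {
        $records[] = $line;
    }
}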

Trouble reading huge CSV file with php fgetcsv - understanding memory consumption

Good morning,
I'm going through some hard lessons while trying to handle huge CSV files of up to 4 GB.
The goal is to search for some items in a CSV file (an Amazon datafeed) by a given browse node and also by some given item IDs (ASINs), to get a mix of existing items (already in my database) plus some additional new items, since from time to time items disappear from the marketplace. I also filter the titles of the items because many items use the same title.
I have been reading lots of tips here and finally decided to use PHP's fgetcsv(), thinking this function would not exhaust memory since it reads the file line by line.
But no matter what I try, I'm always running out of memory.
I cannot understand why my code uses so much memory.
I set the memory limit to 4096 MB, and the time limit is 0. The server has 64 GB of RAM and two SSD hard disks.
Can someone please check out my piece of code and explain how it is possible that I'm running out of memory and, more importantly, how the memory is being used?
private function performSearchByASINs()
{
    $found = 0;
    $needed = 0;
    $minimum = 84;
    if(is_array($this->searchASINs) && !empty($this->searchASINs))
    {
        $needed = count($this->searchASINs);
    }
    if($this->searchFeed == NULL || $this->searchFeed == '')
    {
        return false;
    }
    $csv = fopen($this->searchFeed, 'r');
    if($csv)
    {
        $l = 0;
        $title_array = array();
        while(($line = fgetcsv($csv, 0, ',', '"')) !== false)
        {
            $header = array();
            if(trim($line[6]) != '')
            {
                if($l == 0)
                {
                    $header = $line;
                }
                else
                {
                    $asin = $line[0];
                    $title = $this->prepTitleDesc($line[6]);
                    if(is_array($this->searchASINs)
                        && !empty($this->searchASINs)
                        && in_array($asin, $this->searchASINs)) // search for existing items to get them updated
                    {
                        $add = true;
                        if(in_array($title, $title_array))
                        {
                            $add = false;
                        }
                        if($add === true)
                        {
                            $this->itemsByASIN[$asin] = new stdClass();
                            foreach($header as $k => $key)
                            {
                                if(isset($line[$k]))
                                {
                                    $this->itemsByASIN[$asin]->$key = trim(strip_tags($line[$k], '<br><br/><ul><li>'));
                                }
                            }
                            $title_array[] = $title;
                            $found++;
                        }
                    }
                    if(($line[20] == $this->bnid || $line[21] == $this->bnid)
                        && count($this->itemsByKey) < $minimum
                        && !isset($this->itemsByASIN[$asin])) // searching for new items
                    {
                        $add = true;
                        if(in_array($title, $title_array))
                        {
                            $add = false;
                        }
                        if($add === true)
                        {
                            $this->itemsByKey[$asin] = new stdClass();
                            foreach($header as $k => $key)
                            {
                                if(isset($line[$k]))
                                {
                                    $this->itemsByKey[$asin]->$key = trim(strip_tags($line[$k], '<br><br/><ul><li>'));
                                }
                            }
                            $title_array[] = $title;
                            $found++;
                        }
                    }
                }
                $l++;
                if($l > 200000 || $found == $minimum)
                {
                    break;
                }
            }
        }
        fclose($csv);
    }
}
I know my answer is a bit late, but I had a similar problem with fgets() and things based on fgets(), like the SplFileObject->current() function. In my case it was on a Windows system when trying to read a 800+ MB file. I think fgets() doesn't free the memory of the previous line in a loop, so every line that was read stayed in memory and led to a fatal out-of-memory error. I fixed it by using fread($lineLength) instead, but that is a bit trickier since you must supply the length.
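A rough sketch of that fread() approach (an assumption of how it could look, not the answerer's exact code) is to read fixed-size blocks and split them into lines manually:
// Read the file in 8 KB blocks and carry the trailing partial line over to
// the next iteration; the file name is just a placeholder.
$handle = fopen('big.csv', 'r');
$buffer = '';
while (!feof($handle)) {
    $buffer .= fread($handle, 8192);
    $lines = explode("\n", $buffer);
    $buffer = array_pop($lines); // keep the incomplete last line for later
    foreach ($lines as $line) {
        // process $line here
    }
}
if ($buffer !== '') {
    // process the final line (no trailing newline)
}
fclose($handle);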
It is very hard to manage large data using arrays without hitting timeout issues. Instead, why not parse this datafeed into a database table and do the heavy lifting from there?
Have you tried this? SplFileObject::fgetcsv
<?php
$file = new SplFileObject("data.csv");
while (!$file->eof()) {
    $row = $file->fgetcsv(); // read one row as an array of fields
    // your code here
}
?>
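A slightly fuller sketch (the flags and the blank-line check are assumptions) lets SplFileObject parse the CSV itself, so only one row is held in memory at a time:
// Iterate the file as CSV rows; READ_CSV makes each iteration return an
// array of fields, SKIP_EMPTY skips empty lines.
$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::SKIP_EMPTY);
foreach ($file as $row) {
    if ($row === array(null)) {
        continue; // blank line parsed as a single null field
    }
    // filter/collect the fields you need from $row here
}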
You are running out of memory because you use a lot of variables and never call unset(), and because of the deeply nested foreach loops. You could split that code into smaller functions.
A better solution would be to use a real database instead.

PHP - Why is reading this csv file using so much memory, how can I improve my code?

The situation is that I need to import a fairly large CSV file (approx. half a million records, 80 MB) into a MySQL database. I know I could do this from the command line, but I need a UI so the client can do it.
Here is what I have so far:
ini_set('max_execution_time', 0);
ini_set('memory_limit', '1024M');
$field_maps = array();
foreach (Input::get() as $field => $value){
    if ('fieldmap_' == substr($field, 0, 9) && $value != 'unassigned'){
        $field_maps[str_replace('fieldmap_', null, $field)] = $value;
    }
}
$file = app_path().'/../uploads/'.$client.'_'.$job_number.'/'.Input::get('file');
$result_array = array();
$rows = 0;
$bulk_insert_count = 1000;
if (($handle = fopen($file, "r")) !== FALSE)
{
    $header = fgetcsv($handle);
    $data_map = array();
    foreach ($header as $k => $th){
        if (array_key_exists($th, $field_maps)){
            $data_map[$field_maps[$th]] = $k;
        }
    }
    $tmp_rows_count = 0;
    while (($data = fgetcsv($handle, 1000)) !== FALSE) {
        $row_array = array();
        foreach ($data_map as $column => $data_index){
            $row_array[$column] = $data[$data_index];
        }
        $result_array[] = $row_array;
        $rows++;
        $tmp_rows_count++;
        if ($tmp_rows_count == $bulk_insert_count){
            Inputs::insert($result_array);
            $result_array = array();
            if (empty($result_array)){
                echo '*************** array cleared *************';
            }
            $tmp_rows_count = 0;
        }
    }
    fclose($handle);
}
print('done');
I am currently working on a local Vagrant box; when I try to run the above locally, it processes almost all the rows of the CSV file and then dies shortly before the end (no error) once it reaches the box's memory limit of 1.5 GB.
I suspect some of what I have done in the above code is unnecessary; I thought that building up and inserting a limited number of rows at a time would reduce memory use, but it hasn't done enough.
I suspect this would probably work on the live server, which has more memory available, but I cannot believe that it has to take 1.5 GB of memory to process an 80 MB file; there must be a better approach. Any help much appreciated.
Had this problem once, this solved it for me:
DB::connection()->disableQueryLog();
Info in the docs about it: http://laravel.com/docs/database#query-logging
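For reference, a minimal sketch of where that call could sit in the import script above (the placement is an assumption, not something stated in the answer):
ini_set('max_execution_time', 0);
ini_set('memory_limit', '1024M');
// Stop Laravel from buffering every executed query in memory during the import
DB::connection()->disableQueryLog();
// ... then run the fgetcsv() / Inputs::insert() loop exactly as in the question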

Read large data from csv file in php [duplicate]

This question already has answers here:
file_get_contents => PHP Fatal error: Allowed memory exhausted
(4 answers)
Closed 3 years ago.
I am reading a CSV and checking against MySQL whether each record is present in my table or not, in PHP.
The CSV has about 25,000 records, and when I run my code it displays a "Service Unavailable" error after 2m 10s (onload: 2m 10s).
Here is the code I have added:
// for set memory limit & execution time
ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');
//function to read csv file
function readCSV($csvFile)
{
    $file_handle = fopen($csvFile, 'r');
    while (!feof($file_handle)) {
        set_time_limit(60); // you can enable this if you have lot of data
        $line_of_text[] = fgetcsv($file_handle, 1024);
    }
    fclose($file_handle);
    return $line_of_text;
}
// Set path to CSV file
$csvFile = 'my_records.csv';
$csv = readCSV($csvFile);
for ($i = 1; $i < count($csv); $i++)
{
    $user_email = $csv[$i][1];
    $qry = "SELECT u.user_id, u.user_email_id FROM tbl_user as u WHERE u.user_email_id = '".$user_email."'";
    $result = @mysql_query($qry) or die("Couldn't execute query:".mysql_error().''.mysql_errno());
    $rec = @mysql_fetch_row($result);
    if ($rec)
    {
        echo "Record exist";
    }
    else
    {
        echo "Record not exist";
    }
}
Note: I just want to list the records that do not exist in my table.
Please suggest a solution for this...
An excellent method to deal with large files is located at: https://stackoverflow.com/a/5249971/797620
This method is used at http://www.cuddlycactus.com/knownpasswords/ (page has been taken down) to search through 170+ million passwords in just a few milliseconds.
After struggling a lot, I finally found a good solution; maybe it will help others too.
When I tried a 2,367 KB CSV file containing 18,226 rows, the least time taken by different PHP scripts was:
(1) the script from the php.net fgetcsv documentation named CsvImporter, and
(2) file_get_contents => PHP Fatal error: Allowed memory exhausted
(1) took 0.92574405670166 seconds
(2) took 0.12543702125549 seconds (as a string) & 0.52903485298157 seconds (split into an array)
Note: this calculation does not include adding to MySQL.
The best solution I found takes 3.0644409656525 seconds in total, including adding to the database and some conditional checks as well.
It took 11 seconds to process an 8 MB file.
The solution is:
$csvInfo = analyse_file($file, 5);
$lineSeperator = $csvInfo['line_ending']['value'];
$fieldSeperator = $csvInfo['delimiter']['value'];
$columns = getColumns($file);
echo '<br>========Details========<br>';
echo 'Line Sep: \t '.$lineSeperator;
echo '<br>Field Sep:\t '.$fieldSeperator;
echo '<br>Columns: ';print_r($columns);
echo '<br>========Details========<br>';
$ext = pathinfo($file, PATHINFO_EXTENSION);
$table = str_replace(' ', '_', basename($file, "." . $ext));
$rslt = table_insert($table, $columns);
if($rslt){
$query = "LOAD DATA LOCAL INFILE '".$file."' INTO TABLE $table FIELDS TERMINATED BY '$fieldSeperator' ";
var_dump(addToDb($query, false));
}
function addToDb($query, $getRec = true){
    //echo '<br>Query : '.$query;
    $con = @mysql_connect('localhost', 'root', '');
    @mysql_select_db('rtest', $con);
    $result = mysql_query($query, $con);
    if($result){
        if($getRec){
            $data = array();
            while ($row = mysql_fetch_assoc($result)) {
                $data[] = $row;
            }
            return $data;
        }else return true;
    }else{
        var_dump(mysql_error());
        return false;
    }
}
function table_insert($table_name, $table_columns) {
    $queryString = "CREATE TABLE " . $table_name . " (";
    $columns = '';
    $values = '';
    foreach ($table_columns as $column) {
        $values .= (strtolower(str_replace(' ', '_', $column))) . " VARCHAR(2048), ";
    }
    $values = substr($values, 0, strlen($values) - 2);
    $queryString .= $values . ") ";
    //// echo $queryString;
    return addToDb($queryString, false);
}
function getColumns($file){
    $cols = array();
    if (($handle = fopen($file, 'r')) !== FALSE)
    {
        while (($row = fgetcsv($handle)) !== FALSE)
        {
            $cols = $row;
            if(count($cols)>0){
                break;
            }
        }
        return $cols;
    }else return false;
}
function analyse_file($file, $capture_limit_in_kb = 10) {
    // capture starting memory usage
    $output['peak_mem']['start'] = memory_get_peak_usage(true);
    // log the limit how much of the file was sampled (in Kb)
    $output['read_kb'] = $capture_limit_in_kb;
    // read in file
    $fh = fopen($file, 'r');
    $contents = fread($fh, ($capture_limit_in_kb * 1024)); // in KB
    fclose($fh);
    // specify allowed field delimiters
    $delimiters = array(
        'comma'     => ',',
        'semicolon' => ';',
        'tab'       => "\t",
        'pipe'      => '|',
        'colon'     => ':'
    );
    // specify allowed line endings
    $line_endings = array(
        'rn' => "\r\n",
        'n'  => "\n",
        'r'  => "\r",
        'nr' => "\n\r"
    );
    // loop and count each line ending instance
    foreach ($line_endings as $key => $value) {
        $line_result[$key] = substr_count($contents, $value);
    }
    // sort by largest array value
    asort($line_result);
    // log to output array
    $output['line_ending']['results'] = $line_result;
    $output['line_ending']['count'] = end($line_result);
    $output['line_ending']['key'] = key($line_result);
    $output['line_ending']['value'] = $line_endings[$output['line_ending']['key']];
    $lines = explode($output['line_ending']['value'], $contents);
    // remove last line of array, as this maybe incomplete?
    array_pop($lines);
    // create a string from the legal lines
    $complete_lines = implode(' ', $lines);
    // log statistics to output array
    $output['lines']['count'] = count($lines);
    $output['lines']['length'] = strlen($complete_lines);
    // loop and count each delimiter instance
    foreach ($delimiters as $delimiter_key => $delimiter) {
        $delimiter_result[$delimiter_key] = substr_count($complete_lines, $delimiter);
    }
    // sort by largest array value
    asort($delimiter_result);
    // log statistics to output array with largest counts as the value
    $output['delimiter']['results'] = $delimiter_result;
    $output['delimiter']['count'] = end($delimiter_result);
    $output['delimiter']['key'] = key($delimiter_result);
    $output['delimiter']['value'] = $delimiters[$output['delimiter']['key']];
    // capture ending memory usage
    $output['peak_mem']['end'] = memory_get_peak_usage(true);
    return $output;
}
Normally, "Service Unavailable" error will come when 500 error occurs.
I think this is coming because of insufficient execution time. Please check your log/browser console, may be you can see 500 error.
First of all,
Keep set_time_limit(60) out of loop.
Do some changes like,
Apply INDEX on user_email_id column, so you can get the rows faster with your select query.
Do not echo message, Keep the output buffer free.
And
I have done these kind of take using Open source program. You can get it here http://sourceforge.net/projects/phpexcelreader/
Try this.
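For the index suggestion, a one-off statement along these lines could be run once (the index name is hypothetical), using the same mysql_* functions as the question:
// Add an index on the column used in the WHERE clause so each lookup is fast;
// run this once, not on every request.
$qry = "ALTER TABLE tbl_user ADD INDEX idx_user_email_id (user_email_id)";
@mysql_query($qry) or die("Couldn't add index: " . mysql_error());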

PHP counter with flock

I have a problem with a counter. I need to count two values, separated by a |, but sometimes the counter doesn't increment one of them.
numeri.txt (the counter):
6122|742610
This is the PHP script:
$filename="numeri.txt";
while(!$fp=fopen($filename,'c+'))
{
usleep(100000);
}
while(!flock($fp,LOCK_EX))
{
usleep(100000);
}
$contents=fread($fp,filesize($filename));
ftruncate($fp,0);
rewind($fp);
$contents=explode("|",$contents);
$clicks=$contents[0];
$impressions=$contents[1]+1;
fwrite($fp,$clicks."|".$impressions);
flock($fp,LOCK_UN);
fclose($fp);
I have another counter that is a lot slower but counts both values (clicks and impressions) exactly. Sometimes the counter numeri.txt counts more impressions than the other counter. Why? How can I fix this?
We're using the following at our high-traffic site to count impressions:
<?php
$countfile = "counter.txt"; // SET THIS
$yearmonthday = date("Y.m.d");
$yearmonth = date("Y.m");
// Read the current counts
$countFileHandler = fopen($countfile, "r+");
if (!$countFileHandler) {
    die("Can't open count file");
}
if (flock($countFileHandler, LOCK_EX)) {
    while (($line = fgets($countFileHandler)) !== false) {
        list($date, $count) = explode(":", trim($line));
        $counts[$date] = $count;
    }
    $counts[$yearmonthday]++;
    $counts[$yearmonth]++;
    fseek($countFileHandler, 0);
    // Write the counts back to the file
    krsort($counts);
    foreach ($counts as $date => $count) {
        fwrite($countFileHandler, "$date:$count\n");
        fflush($countFileHandler);
    }
    flock($countFileHandler, LOCK_UN);
} else {
    echo "Couldn't acquire file lock!";
}
fclose($countFileHandler);
?>
The results are both daily and monthly totals:
2015.10.02:40513
2015.10.01:48396
2015.10:88909
Try performing a flush before unlocking. You're unlocking before the data may even have been written, allowing another execution to clobber it.
http://php.net/manual/en/function.fflush.php
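Applied to the script in the question (a sketch assuming the same variables), the end of the write section would become:
fwrite($fp, $clicks."|".$impressions);
fflush($fp);            // push the buffered write out to numeri.txt
flock($fp, LOCK_UN);    // only release the exclusive lock after the flush
fclose($fp);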
