Doctrine2 batch processing - PHP

I'm trying to import a text file with 10,000+ lines into the DB. I've found a manual here. But my script ends just before reaching the final flush, without any error. The boolean parameter in my custom flush method indicates whether the clear method should be called after flushing.
Code:
$handle = fopen($file, 'r');
if ($handle != FALSE) {
    // Clear entities
    $clear = 1;
    // Read line
    while (($data = fgets($handle)) !== FALSE) {
        $entity = $this->_createEntity($data);
        echo $clear . '<br>';
        $this->getPresenter()->getService('mapService')->persist($entity);
        if ($clear % 100 == 0) {
            echo 'saving...<br>';
            $this->getPresenter()->getService('mapService')->flush(TRUE); // Flush and clear
        }
        $clear++;
    }
    echo 'end...'; // Script ends before reaching this line
    $this->getPresenter()->getService('mapService')->flush(); // Final flush
    echo '...ed';
}
fclose($handle);
Custom Flush method:
public function flush($clear = FALSE) {
    $this->db->flush();
    if ($clear) {
        $this->db->clear();
    }
}
Echo output:
1
...
9998
9999
10000
saving...
But no 'end...' and no '...ed'.
Thanks a lot in advance.
EDIT
I've changed the number of lines in the file processed in one run from 10k to 5,000. It's OK now. But I still wonder why 10k is "too much" for PHP or Doctrine.
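For reference, a hedged way to check whether the silent stop is the memory limit or the execution time limit being hit: surface PHP errors and log memory usage at each batch boundary. This is an untested sketch; ini_set(), error_reporting(), set_time_limit() and memory_get_usage() are core PHP functions, and the service call just mirrors the one in the code above.
// Untested sketch: surface fatal errors and watch memory per batch.
ini_set('display_errors', '1');
error_reporting(E_ALL);
set_time_limit(0); // long imports can hit max_execution_time

if ($clear % 100 == 0) {
    echo 'saving... (' . round(memory_get_usage(true) / 1048576, 1) . ' MB used)<br>';
    $this->getPresenter()->getService('mapService')->flush(TRUE); // Flush and clear
}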

Try using feof:
$handle = fopen($file, 'r');
if ($handle != FALSE) {
    // Clear entities
    $clear = 1;
    // Read line
    //while (($data = fgets($handle)) !== FALSE) {
    while (!feof($handle)) {
        $data = fgets($handle);
        $entity = $this->_createEntity($data);
        echo $clear . '<br>';
        $this->getPresenter()->getService('mapService')->persist($entity);
        if ($clear % 100 == 0) {
            echo 'saving...<br>';
            $this->getPresenter()->getService('mapService')->flush(TRUE); // Flush and clear
        }
        $clear++;
    }
    echo 'end...'; // Script ends before reaching this line
    $this->getPresenter()->getService('mapService')->flush(); // Final flush
    echo '...ed';
}
fclose($handle);
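One caveat with the feof() version (my note, not part of the original answer): fgets() can still return FALSE on the very last iteration, so _createEntity() may receive FALSE once. A minimal guard, assuming the rest of the loop stays as above:
while (!feof($handle)) {
    $data = fgets($handle);
    if ($data === FALSE) {
        break; // fgets() can fail on the final iteration even before feof() reports TRUE
    }
    $entity = $this->_createEntity($data);
    // ... persist and batch-flush as in the snippet above
}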

Related

PHP - while loop (!feof()) isn't outputting/showing everything

I am trying to read (and echo) the entire contents of a .txt file.
This is my code:
$handle = @fopen("item_sets.txt", "r");
while (!feof($handle))
{
    $buffer = fgets($handle, 4096);
    $trimmed = trim($buffer);
    echo $trimmed;
}
This is my "item_sets.txt": http://pastebin.com/sxapZGuW
But it doesn't echo everything (and it changes how much it shows depending on whether and how many characters I echo after it). var_dump() shows me that the last string is never finished printing out. That looks like this:
" string(45) ""[cu_well_tra. But if I put an
echo "whateverthisisjustarandomstringwithseveralcharacters";,
my last output lines look like this:
" string(45) ""[cu_well_traveled_ak47]weapon_ak47" "1"
" string(5) "}
"
Basically my code isn't printing/echoing all of what it should or at least not showing it.
Thanks in advance :)
That's because your test for EOF comes before you output your last read.
Try this, with the test for EOF as part of the reading process:
<?php
$line_count = 0;
$handle = fopen("item_sets.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        $trimmed = trim($buffer);
        echo $trimmed;
        $line_count++;
    }
} else {
    echo 'Unexpected error opening file';
}
fclose($handle);
echo PHP_EOL.PHP_EOL.PHP_EOL.'Lines read from file = ' . $line_count;
?>
Also, I removed the @ in front of the fopen: it's bad practice to ignore errors, and much better practice to look for them and deal with them.
I copied your data into a file called tst.txt and ran this exact code
<?php
$handle = fopen('tst.txt', 'r');
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        $trimmed = trim($buffer);
        echo $trimmed;
    }
} else {
    echo 'Unexpected error opening file';
}
fclose($handle);
And it generated this output ( just a small portion shown here )
"item_sets"{"set_community_3"{"name" "#CSGO_set_community_3""set_description" "#CSGO_set_community_3_desc""is_collection"
And the last output is
[aa_fade_revolver]weapon_revolver" "1"
Which is the last entry in the data file

Reading large text files efficiently

I have a couple of huge (11 MB and 54 MB) files that I need to read to process the rest of the script. Currently I'm reading the files and storing them in an array like so:
$pricelist = array();
$fp = fopen($DIR.'datafeeds/pricelist.csv','r');
while (($line = fgetcsv($fp, 0, ",")) !== FALSE) {
    if ($line) {
        $pricelist[$line[2]] = $line;
    }
}
fclose($fp);
.. but I'm constantly getting memory overload messages from my webhost. How do I read it more efficiently?
I don't need to store everything, I already have the keyword which exactly matches the array key $line[2] and I need to read just that one array/line.
If you know the key, why don't you filter by it? And you can check memory usage with the memory_get_usage() function to see how much memory is allocated after you fill your $pricelist array.
echo memory_get_usage() . "\n";
$yourKey = 'some_key';
$pricelist = array();
$fp = fopen($DIR.'datafeeds/pricelist.csv','r');
while (($line = fgetcsv($fp, 0, ",")) !== FALSE) {
    if (isset($line[2]) && $line[2] == $yourKey) {
        $pricelist[$line[2]] = $line;
        break;
        /* If there is a possibility to have multiple lines
           we can store each line in a separate array element
           $pricelist[$line[2]][] = $line;
        */
    }
}
fclose($fp);
echo memory_get_usage() . "\n";
You can try this (I have not checked if it works properly)
$data = explode("\n", shell_exec('cat filename.csv | grep KEYWORD'));
You will get all the lines containing the keyword, each line as an element of array.
Let me know if it helps.
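If the keyword comes from user input, it is safer to escape it before building the shell command. A small variant of the same idea ($yourKey here just stands for whatever keyword you are searching for; escapeshellarg() and grep's -F and -- options are standard):
// Escape the keyword and match it as a fixed string.
$escapedKey = escapeshellarg($yourKey);
$data = explode("\n", shell_exec('grep -F -- ' . $escapedKey . ' filename.csv'));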
I agree with what user2864740 said: "The problem is the in-memory usage caused by the array itself and is not about "reading" the file"
My solution is:
Split your `$priceList` array
Load only one split array into memory at a time
Keep the other split arrays in an intermediate file
N.B.: I did not test what I've written
<?php
define ("MAX_LINE", 10000) ;
define ("CSV_SEPERATOR", ',') ;

function intermediateBuilder ($csvFile, $intermediateCsvFile) {
    $pricelist = array ();
    $currentLine = 0;
    $totalSerializedArray = 0;
    if (!is_file($csvFile)) {
        throw new Exception ("this is not a regular file: " . $csvFile);
    }
    $fp = fopen ($csvFile, 'r');
    if (!$fp) {
        throw new Exception ("can not read this file: " . $csvFile);
    }
    while (($line = fgetcsv($fp, 0, CSV_SEPERATOR)) !== FALSE) {
        if ($line) {
            $pricelist[$line[2]] = $line;
        }
        if (++$currentLine == MAX_LINE) {
            $fp2 = fopen ($intermediateCsvFile, 'a');
            if (!$fp2) throw new Exception ("can not write in this intermediate csv file: " . $intermediateCsvFile);
            fputs ($fp2, serialize ($pricelist) . "\n");
            fclose ($fp2);
            unset ($pricelist);
            $pricelist = array ();
            $currentLine = 0;
            $totalSerializedArray++;
        }
    }
    fclose($fp);
    return $totalSerializedArray;
}
/**
 * @param array : by reference unserialized array
 * @param integer : the array number to read from the intermediate csv file; starts from index 1
 * @param string : the (relative|absolute) path/name of the intermediate csv file
 * @throws Exception
 */
function loadArray (&$array, $arrayNumber, $intermediateCsvFile) {
    $currentLine = 0;
    $fp = fopen ($intermediateCsvFile, 'r');
    if (!$fp) {
        throw new Exception ("can not read this intermediate csv file: " . $intermediateCsvFile);
    }
    // the intermediate file holds one serialized array per line, so read whole lines
    while (($line = fgets($fp)) !== FALSE) {
        if (++$currentLine == $arrayNumber) {
            fclose ($fp);
            $array = unserialize ($line);
            return;
        }
    }
    fclose ($fp);
    throw new Exception ("the array number argument [" . $arrayNumber . "] is invalid (out of bounds)");
}
Usage example
try {
    $totalSerializedArray = intermediateBuilder ($DIR . 'datafeeds/pricelist.csv',
                                                 $DIR . 'datafeeds/intermediatePricelist.csv');
    $priceList = array () ;
    $arrayNumber = 1;
    loadArray ($priceList,
               $arrayNumber,
               $DIR . 'datafeeds/intermediatePricelist.csv');
    if (!array_key_exists ($key, $priceList)) {
        if (++$arrayNumber > $totalSerializedArray) $arrayNumber = 1;
        loadArray ($priceList,
                   $arrayNumber,
                   $DIR . 'datafeeds/intermediatePricelist.csv');
    }
} catch (Exception $e) {
    // TODO : log the error ...
}
You can drop the
if ($line) {
That only repeats the check from the loop condition. If your file is 54 MB, and you are going to retain every line from the file, as an array, plus the key from column 3 (which is hashed for lookup)... I could see that requiring 75-85 MB to store it all in memory. That isn't much. Most WordPress or Magento pages using widgets run 150-200 MB. But if your host is set low it could be a problem.
You can try filtering out some rows by changing the if($line) to an if($line[1] == 'book') to reduce how much you store. But the only sure way to handle storing that much content in memory is to have that much memory available to the script.
You can try setting a bigger memory limit using this. You can change the limit as you want.
ini_set('memory_limit', '2048M');
But it also depends on how you want to use that script.

How can I get the total number of rows in a CSV file with PHP?

Using PHP, how can I get the total number of rows that are in a CSV file? I'm using this method but cannot get it to work properly.
if (($fp = fopen("test.csv", "r")) !== FALSE) {
    while (($record = fgetcsv($fp)) !== FALSE) {
        $row++;
    }
    echo $row;
}
Create a new file reference using SplFileObject:
$file = new SplFileObject('test.csv', 'r');
Try to seek to the highest Int PHP can handle:
$file->seek(PHP_INT_MAX);
It will then actually seek to the highest line it can in the file; there is your last line, and the last line + 1 equals your total number of lines:
echo $file->key() + 1;
Tricky, but this avoids loading the file contents into memory, which is a very cool thing when dealing with really large files.
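Putting those three snippets together, a minimal runnable version of the trick would look like this:
$file = new SplFileObject('test.csv', 'r');
$file->seek(PHP_INT_MAX);   // seeks as far as it can, i.e. to the last line
echo $file->key() + 1;      // key() is the zero-based line index, so +1 gives the total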
Here's another option using file() to read the entire file into an array, automatically parsing new lines etc:
$fp = file('test.csv');
echo count($fp);
Also, since PHP5, you can pass in the FILE_SKIP_EMPTY_LINES... to skip empty lines, if you want to:
$fp = file('test.csv', FILE_SKIP_EMPTY_LINES);
Manual: http://php.net/manual/en/function.file.php
Try
$c = 0;
$fp = fopen("test.csv","r");
if($fp){
    while(!feof($fp)){
        $content = fgets($fp);
        if($content) $c++;
    }
}
fclose($fp);
echo $c;
I know that this is pretty old, but actually I ran into the same question.
As a solution I would suggest using Linux-specific logic:
$rows = shell_exec('$(/bin/which cat) file.csv | $(/bin/which tr) "\r" "\n" | $(which wc) -l');
NOTE: this works on Linux only, and should only be used if you are 100% certain that your file has no multiline cells.
CSV rows are separated by line breaks. Therefore, split the rows by line breaks, and you will get an array of rows, which is countable.
if (($contents = file_get_contents("test.csv")) !== FALSE) {
    $rows = explode("\n", $contents);
    $length = count($rows);
    echo $length;
}
Note: none of the higher-upvoted solutions that count lines in the file are reliable, as they are only counting the lines, not the CSV entries (which can contain newline characters).
I'm using a similar solution to the OP, and it works perfectly, but with the OP's code the while part can break on empty lines, which is potentially his problem.
So it looks like this (edited OP's code):
$rowCount = 0;
if (($fp = fopen("test.csv", "r")) !== FALSE) {
    while(!feof($fp)) {
        $data = fgetcsv($fp , 0 , ',' , '"', '"' );
        if(empty($data)) continue; //empty row
        $rowCount++;
    }
    fclose($fp);
}
echo $rowCount;
I find this the most reliable:
$file = new SplFileObject('file.csv', 'r');
$file->setFlags(
    SplFileObject::READ_CSV |
    SplFileObject::READ_AHEAD |
    SplFileObject::SKIP_EMPTY |
    SplFileObject::DROP_NEW_LINE
);
$file->seek(PHP_INT_MAX);
$lineCount = $file->key() + 1;
I know this is an old post, but I've been googling this issue, and found that the only problem with the original code was that you need to define $row outside the while loop, like this:
if (($fp = fopen("test.csv", "r")) !== FALSE) {
    $row = 0;
    while (($record = fgetcsv($fp)) !== FALSE) {
        $row++;
    }
    echo $row;
}
Just in case it helps someone :)
In case you are getting the file from a form
$file = $_FILES['csv']['tmp_name'];
$fp = new SplFileObject($file, 'r');
$fp->seek(PHP_INT_MAX);
echo $fp->key() + 1;
$fp->rewind();
Works like a charm!
$filename = $_FILES['sel_file']['tmp_name'];
$file = fopen($filename, "r");
$RowCount = 0;
while ((fgetcsv($file)) !== FALSE)
{
    $RowCount++;
}
echo $RowCount;
fclose($file);

Tailing Log File and Write results to new file

I'm not sure how to word this so I'll type it out and then edit and answer any questions that come up..
Currently on my local network device (PHP4 based) I'm using this to tail a live system log file: http://commavee.com/2007/04/13/ajax-logfile-tailer-viewer/
This works well, and every second it loads an external page (logfile.php) that does a tail -n 100 logfile.log. The script doesn't do any buffering, so the results it displays on screen are the last 100 lines from the log file.
The logfile.php contains :
<?
// logtail.php
$cmd = "tail -10 /path/to/your/logs/some.log";
exec("$cmd 2>&1", $output);
foreach($output as $outputline) {
    echo ("$outputline\n");
}
?>
This part is working well.
I have adapted the logfile.php page to write the $outputline to a new text file, simply using fwrite($fp,$outputline."\n");
Whilst this works I am having issues with duplication in the new file that is created.
Obviously, each time tail -n 100 runs it produces results, and the next time it runs it could produce some of the same lines; as this repeats I can end up with multiple duplicated lines in the new text file.
I can't directly compare the line I'm about to write to previous lines as there could be identical matches.
Is there any way I can compare this current block of 100 lines with the previous block and then only write the lines that are not matching.. Again possible issue that block A & B will contain identical lines that are needed...
Is it possible to update logfile.php to note the position it last looked at in my logfile and then only read the next 100 lines from there and write those to the new file?
The log file could be up to 500 MB so I don't want to read it all in each time.
Any advice or suggestions welcome..
Thanks
UPDATE # 16:30
I've sort of got this working using :
$file = "/logs/syst.log";
$handle = fopen($file, "r");
if(isset($_SESSION['ftell'])) {
clearstatcache();
fseek($handle, $_SESSION['ftell']);
while ($buffer = fgets($handle)) {
echo $buffer."<br/>";
#ob_flush(); #flush();
}
fclose($handle);
#$_SESSION['ftell'] = ftell($handle);
} else {
fseek($handle, -1024, SEEK_END);
fclose($handle);
#$_SESSION['ftell'] = ftell($handle);
}
This seems to work, but it loads the entire file first and then just the updates.
How would I get it start with the last 50 lines and then just the updates ?
Thanks :)
UPDATE 04/06/2013
Whilst this works it's very slow with large files.
I've tried this code and it seems faster, but it doesn't just read from where it left off.
function last_lines($path, $line_count, $block_size = 512){
    $lines = array();
    // we will always have a fragment of a non-complete line
    // keep this in here till we have our next entire line.
    $leftover = "";

    $fh = fopen($path, 'r');
    // go to the end of the file
    fseek($fh, 0, SEEK_END);
    do{
        // need to know whether we can actually go back
        // $block_size bytes
        $can_read = $block_size;
        if(ftell($fh) < $block_size){
            $can_read = ftell($fh);
        }

        // go back as many bytes as we can
        // read them to $data and then move the file pointer
        // back to where we were.
        fseek($fh, -$can_read, SEEK_CUR);
        $data = fread($fh, $can_read);
        $data .= $leftover;
        fseek($fh, -$can_read, SEEK_CUR);

        // split lines by \n. Then reverse them,
        // now the last line is most likely not a complete
        // line which is why we do not directly add it, but
        // append it to the data read the next time.
        $split_data = array_reverse(explode("\n", $data));
        $new_lines = array_slice($split_data, 0, -1);
        $lines = array_merge($lines, $new_lines);
        $leftover = $split_data[count($split_data) - 1];
    }
    while(count($lines) < $line_count && ftell($fh) != 0);

    if(ftell($fh) == 0){
        $lines[] = $leftover;
    }

    fclose($fh);
    // Usually, we will read too many lines, correct that here.
    return array_slice($lines, 0, $line_count);
}
Is there any way this can be amended so it will read from the last known position?
Thanks
Introduction
You can tail a file by tracking the last position;
Example
$file = __DIR__ . "/a.log";
$tail = new TailLog($file);
$data = $tail->tail(100) ;
// Save $data to new file
TailLog is a simple class I wrote for this task. Here is a simple example to show it's actually tailing the file.
Simple Test
$file = __DIR__ . "/a.log";
$tail = new TailLog($file);
// Some Random Data
$data = array_chunk(range("a", "z"), 3);
// Write Log
file_put_contents($file, implode("\n", array_shift($data)));
// First Tail (2) Run
print_r($tail->tail(2));
// Run Tail (2) Again
print_r($tail->tail(2));
// Write Another data to Log
file_put_contents($file, "\n" . implode("\n", array_shift($data)), FILE_APPEND);
// Call Tail Again after writing Data
print_r($tail->tail(2));
// See the full content
print_r(file_get_contents($file));
Output
// First Tail (2) Run
Array
(
[0] => c
[1] => b
)
// Run Tail (2) Again
Array
(
)
// Call Tail Again after writing Data
Array
(
[0] => f
[1] => e
)
// See the full content
a
b
c
d
e
f
Real Time Tailing
while(true) {
$data = $tail->tail(100);
// write data to another file
sleep(5);
}
Note: tailing 100 lines does not mean it will always return 100 lines. It returns the new lines that were added; 100 is just the maximum number of lines to return. This might not be efficient if you have heavy logging of more than 100 lines per second.
Tail Class
class TailLog {
    private $file;
    private $data;
    private $timeout = 5;
    private $lock;

    function __construct($file) {
        $this->file = $file;
        $this->lock = new TailLock($file);
    }

    public function tail($lines) {
        $pos = - 2;
        $t = $lines;
        $fp = fopen($this->file, "r");
        $break = false;
        $line = "";
        $text = array();

        while($t > 0) {
            $c = "";

            // Search for end of line
            while($c != "\n" && $c != PHP_EOL) {
                if (fseek($fp, $pos, SEEK_END) == - 1) {
                    $break = true;
                    break;
                }
                if (ftell($fp) < $this->lock->getPosition()) {
                    break;
                }
                $c = fgetc($fp);
                $pos --;
            }

            if (ftell($fp) < $this->lock->getPosition()) {
                break;
            }

            $t --;
            $break && rewind($fp);
            $text[$lines - $t - 1] = fgets($fp);

            if ($break) {
                break;
            }
        }

        // Move to end
        fseek($fp, 0, SEEK_END);
        // Save Position
        $this->lock->save(ftell($fp));
        // Close File
        fclose($fp);
        return array_map("trim", $text);
    }
}
Tail Lock
class TailLock {
    private $file;
    private $lock;
    private $data;

    function __construct($file) {
        $this->file = $file;
        $this->lock = $file . ".tail";
        touch($this->lock);
        if (! is_file($this->lock))
            throw new Exception("can't Create Lock File");

        $this->data = json_decode(file_get_contents($this->lock));

        // Check if the lock file contains valid JSON
        // Check that data in the original file has not been deleted
        // You expect the data to increase, not decrease
        if (! $this->data || $this->data->size > filesize($this->file)) {
            $this->reset($file);
        }
    }

    function getPosition() {
        return $this->data->position;
    }

    function reset() {
        $this->data = new stdClass();
        $this->data->size = filesize($this->file);
        $this->data->modification = filemtime($this->file);
        $this->data->position = 0;
        $this->update();
    }

    function save($pos) {
        $this->data = new stdClass();
        $this->data->size = filesize($this->file);
        $this->data->modification = filemtime($this->file);
        $this->data->position = $pos;
        $this->update();
    }

    function update() {
        return file_put_contents($this->lock, json_encode($this->data, 128));
    }
}
Not really clear on how you want to use the output, but would something like this work?
$dat = file_get_contents("tracker.dat");
$fp = fopen("/logs/syst.log", "r");
fseek($fp, $dat, SEEK_SET);
ob_start();
// alternatively you can do a while fgets if you want to interpret the file or do something
fpassthru($fp);
$pos = ftell($fp);
fclose($fp);
echo nl2br(ob_get_clean());
file_put_contents("tracker.dat", $pos);
tracker.dat is just a text file that contains where the read position was from the previous run. I'm just seeking to that position and piping the rest to the output buffer.
Use tail -c <number of bytes> instead of a number of lines, and then check the file size. The rough idea is:
$old_file_size = 0;
$max_bytes = 512;

function last_lines($path) {
    global $old_file_size, $max_bytes;
    $new_file_size = filesize($path);
    $pending_bytes = $new_file_size - $old_file_size;
    if ($pending_bytes > $max_bytes) $pending_bytes = $max_bytes;
    exec("tail -c " . $pending_bytes . " " . $path, $output);
    $old_file_size = $new_file_size;
    return $output;
}
The advantage is that you can do away with all the special processing stuff, and get good performance. The disadvantage is that you have to manually split the output into lines, and probably you could end up with unfinished lines. But this isn't a big deal, you can easily work around by omitting the last line alone from the output (and appropriately subtracting the last line number of bytes from old_file_size).
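A rough, untested sketch of that workaround, meant to sit just before the return $output; line in the function above ($output is the array filled by exec(); the newline accounting is approximate since exec() strips trailing newlines):
// Drop the possibly unfinished last line and re-read it on the next run.
$fragment = array_pop($output);
$old_file_size -= strlen($fragment); // approximate: the stripped newline is not counted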

PHP: How to read a file live that is constantly being written to

I want to read a log file that is constantly being written to. It resides on the same server as the application. The catch is the file gets written to every few seconds, and I basically want to tail the file on the application in real-time.
Is this possible?
You need to loop with sleep:
$file = '/home/user/youfile.txt';
$lastpos = 0;
while (true) {
    usleep(300000); //0.3 s
    clearstatcache(false, $file);
    $len = filesize($file);
    if ($len < $lastpos) {
        //file deleted or reset
        $lastpos = $len;
    }
    elseif ($len > $lastpos) {
        $f = fopen($file, "rb");
        if ($f === false)
            die();
        fseek($f, $lastpos);
        while (!feof($f)) {
            $buffer = fread($f, 4096);
            echo $buffer;
            flush();
        }
        $lastpos = ftell($f);
        fclose($f);
    }
}
(tested.. it works)
Yes, you need to sleep some time in the loop but you don't have to reopen the file. I was just looking for a similar problem. I wanted to read a file that might have been changed since last read.
The problem is that the resource has reached end of file (EOF) and does not continue to read. The solution is to reset the pointer with fseek($fh, ftell($fh)).
A complete program that waits for input in a text file might look like this one:
<?php
$fh = fopen('/var/log/system', 'r');
while (true) {
    $line = fgets($fh);
    if ($line !== false) {
        // show the line or send it via email or to a websocket..
    } else {
        // sleep for 0.1 seconds (or more?)
        usleep(0.1 * 1000000);
        fseek($fh, ftell($fh));
    }
}
For example :
$log_file = '/tmp/test/log_file.log';
$f = fopen($log_file, 'a+');
$fr = fopen($log_file, 'r' );
for ( $i = 1; $i < 10; $i++ )
{
    fprintf($f, "Line: %u\n", $i);
    sleep(2);
    echo fread($fr, 1024) . "\n";
}
fclose($fr);
fclose($f);

// Or if you want to use tail
$f = fopen($log_file, 'a+');
for ( $i = 1; $i < 10; $i++ )
{
    fprintf($f, "Line: %u\n", $i);
    sleep(2);
    $result = array();
    exec( 'tail -n 1 ' . $log_file, $result );
    echo "\n".$result[0];
}
fclose($f);
You can close the file handle when it is not used (once a portion of data has been written), or you can use a buffer to store the data and write it to the file only when it's full. This way you won't have the file open all the time.
If you want to get everything that is written to the file as soon as it is written, you might need to extend the code that writes the data so that it outputs to other places too (screen, some variable, another file...), as in the sketch below.
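As a rough illustration of "output to other places too" (my own sketch, not from the answer; the helper name writeLog is made up): every write is appended to the log file and mirrored to the screen at the same time.
// Hypothetical helper on the writing side: append to the log and echo it live.
function writeLog($logFile, $chunk) {
    file_put_contents($logFile, $chunk, FILE_APPEND | LOCK_EX);
    echo $chunk;
    flush();
}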
<?php
$fp = fopen('/var/log/syslog', 'r'); // Read only
while (true) {
    $line = stream_get_line($fp, 1024 * 1024, "\n"); // Full line found? (searches for a line break)
    if ($line === false) {
        usleep(100000); // 100ms
        continue;
    }
    echo 'line:' . $line . PHP_EOL;
}
// -- Code impossible to reach --
// fclose($fp);
Just an idea..
Did you think of using the *nix tail command? Execute the command from PHP (with a param that returns a certain number of lines) and process the results in your PHP script.
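A minimal sketch of that idea (assumptions on my part: a Unix-like host, tail on the PATH, and /var/log/syslog as the example file): start tail -f as a process with popen() and read its output line by line.
// Untested sketch: stream `tail -f` output from PHP.
$proc = popen('tail -n 50 -f /var/log/syslog 2>&1', 'r');
if ($proc === false) {
    die('could not start tail');
}
while (($line = fgets($proc)) !== false) {
    echo $line; // or parse/forward the line here
    flush();
}
pclose($proc);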
