I'm using a custom read filter to read files in chunks:
class chunkReadFilter implements PHPExcel_Reader_IReadFilter {
    private $start_row, $end_row, $chunk_size;

    public function __construct($chunk_size, $start_row = 1) {
        $this->chunk_size = $chunk_size;
        $this->start_row = $start_row;
        $this->end_row = $start_row + $chunk_size - 1;
    }

    public function moveCursor() {
        $this->start_row += $this->chunk_size;
        $this->end_row += $this->chunk_size;
    }

    public function readCell($column, $row, $worksheetName = '') {
        return $row >= $this->start_row && $row <= $this->end_row;
    }
}
My problem is that I'm not sure how to detect when I've finished. Examples and documentation always hard-code a maximum row:
for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) {
}
The PHPExcel_Worksheet::getHighestRow() and PHPExcel_Worksheet::getHighestDataRow() methods seem to work on the filtered data (kind of). For instance, in a 200-row file:
If I read rows 100 to 120, I get 120.
If I attempt to read rows 300 to 320, I get 1 :-?
What's the best way to stop the loop?
The best way to stop the loop is to know how many rows you should be reading in the first place.
There is a helper method in every Reader that will provide some basic meta data about the file without needing to load it all.
Before starting your loop:
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetData = $objReader->listWorksheetInfo($inputFileName);

echo '<h3>Worksheet Information</h3>';
echo '<ol>';
foreach ($worksheetData as $worksheet) {
    echo '<li>', $worksheet['worksheetName'], '<br />';
    echo 'Rows: ', $worksheet['totalRows'],
         ' Columns: ', $worksheet['totalColumns'], '<br />';
    echo 'Cell Range: A1:',
         $worksheet['lastColumnLetter'], $worksheet['totalRows'];
    echo '</li>';
}
echo '</ol>';
This is documented in section 7 of the User documentation for Reading Spreadsheet files, and in Examples/Reader/exampleReader19.php
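Putting the two pieces together, a minimal sketch of how the row count from listWorksheetInfo() could drive the chunked loop might look like this (variable names such as $inputFileType, $inputFileName and $chunkSize are assumptions, and chunkReadFilter is the class from the question):

```php
// Sketch: use the row count from the worksheet metadata as the loop bound
// instead of a hard-coded 65536.
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetData = $objReader->listWorksheetInfo($inputFileName);
$totalRows = $worksheetData[0]['totalRows'];

$chunkFilter = new chunkReadFilter($chunkSize);
$objReader->setReadFilter($chunkFilter);

for ($startRow = 1; $startRow <= $totalRows; $startRow += $chunkSize) {
    $objPHPExcel = $objReader->load($inputFileName);
    // ... process rows $startRow to min($startRow + $chunkSize - 1, $totalRows) ...
    $chunkFilter->moveCursor();
}
```

The loop terminates naturally once $startRow passes the total row count, so no guessing about the file size is needed.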
The best way to loop through the cells is to use getRowIterator() and getCellIterator():
$rows = $sheet->getRowIterator();
foreach ($rows as $r => $row) {
    $cells = $row->getCellIterator();
    foreach ($cells as $c => $cell) {
        $value = $cell->getValue();
    }
}
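One caveat worth knowing (an addition on my part, not from the original answer): by default the cell iterator only visits cells that have actually been set. If you want every cell in the row, including empty ones, PHPExcel lets you relax that:

```php
foreach ($sheet->getRowIterator() as $row) {
    $cellIterator = $row->getCellIterator();
    // Visit all cells in the row, even those that were never set
    $cellIterator->setIterateOnlyExistingCells(false);
    foreach ($cellIterator as $cell) {
        $value = $cell->getValue();
    }
}
```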
Related
I am trying to display, for each file, the filename and the total of the numeric lines it contains. If a line holds a path to another file, I need to loop through that file and sum all of its numbers too, and the recursion should continue for as long as the current file contains a path to another file.
Here is my code:
function fileProcessor($value){
    if (file_exists(trim($value))) {
        $total = 0;
        $files = file($value, FILE_SKIP_EMPTY_LINES);
        foreach ($files as $data) {
            if (!preg_match("/.txt/i", $data)) {
                $num = floatval($data);
                $total += $num;
            } else {
                fileProcessor(trim($data));
            }
        }
        echo $value . ' - ' . ($total);
    } else {
        echo 'File does not exist';
    }
}

fileProcessor('text_files/first.txt');
I have 3 .txt files I'm working with, inside those files I have something like this
first.txt
1
3
3
second.txt
second.txt
2
3
third.txt
third.txt
1
2
The output I am looking for
first.txt - 15
second.txt - 8
third.txt - 3
I will really appreciate it if someone can point me in the right direction, I don't know if I'm doing it right.
There are two problems with your code:
You're not including the directory of the source file in the path to subsidiary files, so those files are never found.
You're not returning the total from the function, so higher-level invocations can't add in the totals for subsidiary files.
Correcting those issues, and renaming the variables to something meaningful, gives this code:
function fileProcessor($filename){
    $total = 0;
    if (file_exists(trim($filename))) {
        $rows = file($filename, FILE_SKIP_EMPTY_LINES);
        foreach ($rows as $data) {
            if (!preg_match("/.txt/i", $data)) {
                $num = floatval($data);
                $total += $num;
            } else {
                $total += fileProcessor(dirname($filename) . "/" . trim($data));
            }
        }
        echo $filename . ' - ' . ($total) . "<br>\n";
    } else {
        echo "File does not exist<br>\n";
    }
    return $total;
}

fileProcessor('text_files/first.txt');
Output:
text_files/third.txt - 3
text_files/second.txt - 8
text_files/first.txt - 15
This lists the files in the order in which the totals are finally accumulated, so the lowest levels appear first.
[Edit]
I spotted a problem with the order of results if two or more filenames appear in a file. Here's a reworked version that deals with that.
To list the files in the order in which they are encountered requires reversing the natural order. In the new version below I've placed the filenames in a $fileList array which is passed down by reference. Each new invocation of the function adds its results to the end of that array. Once processing is complete the array is displayed.
function fileProcessor(&$fileList){
    // Get the last filename in the array
    end($fileList);
    $filename = key($fileList);

    if (file_exists(trim($filename))) {
        // Accumulate the values
        $fileList[$filename] = 0;
        $rows = file($filename, FILE_SKIP_EMPTY_LINES);
        foreach ($rows as $data) {
            if (!preg_match("/.txt/i", $data)) {
                $num = floatval($data);
                $fileList[$filename] += $num;
            } else {
                // Recursive call. Add the next filename to the array
                $fileList[dirname($filename) . "/" . trim($data)] = 0;
                $fileList[$filename] += fileProcessor($fileList);
            }
        }
    } else {
        $fileList[$filename] = "File does not exist: $filename";
    }

    // Return the total for the file to add to the accumulator for the file above
    return $fileList[$filename];
}

// Place the initial file in the array
$results = ['text_files/first.txt' => 0];
// Run the function
fileProcessor($results);
// Display the results
foreach ($results as $filename => $total) {
    echo $filename . ' - ' . $total . "<br>\n";
}
Output:
text_files/first.txt - 15
text_files/second.txt - 8
text_files/third.txt - 3
You could use a static variable:
<?php
function fileProcessor($value) {
    static $results = [];

    if (file_exists(trim($value))) {
        $results[$value] = 0;
        $files = file($value, FILE_SKIP_EMPTY_LINES);
        foreach ($files as $data) {
            if (!preg_match("/.txt/i", $data)) {
                $num = floatval($data);
                $results[$value] += $num;
            } else {
                fileProcessor(trim($data));
            }
        }
    } else {
        echo 'File does not exist';
    }

    reset($results);
    if (key($results) != $value) {
        return;
    }
    foreach ($results as $key => $value) {
        echo $key . ' - ' . $value . "\n";
    }
}
fileProcessor('text_files/first.txt');
Output:
text_files/first.txt - 7
text_files/second.txt - 5
text_files/third.txt - 3
I have this code:
<?php
function generator() {
    yield 'First value';
    for ($i = 1; $i <= 3; $i++) {
        yield $i;
    }
}

$gen = generator();
$first = $gen->current();
echo $first . '<br/>';
//$gen->next();
foreach ($gen as $value) {
    echo $value . '<br/>';
}
This outputs:
First value
First value
1
2
3
I need 'First value' to be yielded only once. If I uncomment the $gen->next() line, a fatal error occurs:
Fatal error: Uncaught exception 'Exception' with message 'Cannot rewind a generator that was already run'
How can I solve this?
The problem is that the foreach tries to reset (rewind) the generator, but rewind() throws an exception if the generator is already past the first yield.
So you should avoid the foreach and use a while loop instead:
$gen = generator();
$first = $gen->current();
echo $first . '<br/>';
$gen->next();
while ($gen->valid()) {
    echo $gen->current() . '<br/>';
    $gen->next();
}
chumkiu's answer is correct. Some additional ideas.
Proposal 0: remaining() decorator.
(This is the latest version I am adding here, but possibly the best)
PHP 7+:
function remaining(\Generator $generator) {
    yield from $generator;
}
PHP 5.5+ < 7:
function remaining(\Generator $generator) {
    for (; $generator->valid(); $generator->next()) {
        yield $generator->current();
    }
}
Usage (all PHP versions):
function foo() {
    for ($i = 0; $i < 5; ++$i) {
        yield $i;
    }
}

$gen = foo();
if (!$gen->valid()) {
    // Not even the first item exists.
    return;
}
$first = $gen->current();
$gen->next();
$values = [];
foreach (remaining($gen) as $value) {
    $values[] = $value;
}
There might be some indirection overhead. But semantically this is quite elegant I think.
Proposal 1: for() instead of while().
As a nice syntactic alternative, I propose using for() instead of while() to reduce clutter from the ->next() call and the initialization.
Simple version, without your initial value:
for ($gen = generator(); $gen->valid(); $gen->next()) {
    echo $gen->current();
}
With the initial value:
$gen = generator();
if (!$gen->valid()) {
    echo "Not even the first value exists.<br/>";
    return;
}
$first = $gen->current();
echo $first . '<br/>';
$gen->next();
for (; $gen->valid(); $gen->next()) {
    echo $gen->current() . '<br/>';
}
You could put the first $gen->next() into the for() statement, but I don't think this would add much readability.
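For completeness, that variant (the first ->next() folded into the for() initializer, using the generator() function from the question) would look like this:

```php
// generator() as defined in the question
function generator() {
    yield 'First value';
    for ($i = 1; $i <= 3; $i++) {
        yield $i;
    }
}

$gen = generator();
$first = $gen->current();
echo $first . '<br/>';
// The skip-past-the-first-value next() moves into the for() initializer
for ($gen->next(); $gen->valid(); $gen->next()) {
    echo $gen->current() . '<br/>';
}
```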
A little benchmark I ran locally (with PHP 5.6) showed that these versions with for() or while() and explicit calls to ->next(), ->current() etc. are slower than the implicit version with foreach (generator() as $value).
Proposal 2: Offset parameter in the generator() function
This only works if you have control over the generator function.
function generator($offset = 0) {
    if ($offset <= 0) {
        yield 'First value';
        $offset = 1;
    }
    for ($i = $offset; $i <= 3; $i++) {
        yield $i;
    }
}

foreach (generator() as $firstValue) {
    print "First: " . $firstValue . "\n";
    break;
}

foreach (generator(1) as $value) {
    print $value . "\n";
}
This would mean that any initialization would run twice. Maybe not desirable.
Also it allows calls like generator(9999) with really high skip numbers. E.g. someone could use this to process the generator sequence in chunks. But starting from 0 each time and then skipping a huge number of items seems really a bad idea performance-wise. E.g. if the data is coming from a file, and skipping means to read + ignore the first 9999 lines of the file.
The solutions provided here do not work if you need to iterate more than once, so I used the iterator_to_array() function to convert the generator to an array:
$items = iterator_to_array($items);
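One thing to watch out for with this approach (my addition, not part of the original answer): iterator_to_array() preserves keys by default, so a generator that yields duplicate keys will silently lose values unless you pass false as the second argument:

```php
function gen() {
    yield 'a' => 1;
    yield 'a' => 2; // duplicate key
}

$withKeys = iterator_to_array(gen());        // ['a' => 2] — the first value is overwritten
$noKeys   = iterator_to_array(gen(), false); // [1, 2]     — both values kept
```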
I have a data file with two lines (two lines just for my example; in real life the file can contain millions of lines), and I use SplFileObject and LimitIterator with an offset. But this combination has strange behaviour in some cases:
$offset = 0;
$file = new \SplFileObject($filePath);
$fileIterator = new \LimitIterator($file, $offset, 100);
foreach ($fileIterator as $key => $line) {
    echo $key;
}
Output is: 01
But with $offset set to 1, the output is blank (the foreach doesn't iterate over any line).
My data file contains this:
{"generatedAt":1434665322,"numRecords":"1}
{"id":"215255","code":"NB000110"}
What I'm doing wrong?
Thanks
Required:
Use SplFileObject to process a number of records from:
a given start record number
for a given number of records or until EOF.
The issue is that SplFileObject gets confused about the last record in the file, which prevents it from working correctly in foreach loops.
This code uses SplFileObject to 'skip records' and then 'process records'. Alas, it cannot use foreach loops. It will:
Skip a number of records from the start of the file ($offset).
Process a given number of records or until the end of file ($recordsToProcess).
The code:
<?php
$filePath = __DIR__ . '/Q30932555.txt';
// $filePath = __DIR__ . '/Q30932555_1.txt';

$offset = 1;
$recordsToProcess = 100;

$file = new \SplFileObject($filePath);

// skip the records
$file->seek($offset);

$recordsProcessed = 0;
while (   ($file->valid() || strlen($file->current()) > 0)
       && $recordsProcessed < $recordsToProcess
) {
    $recordsProcessed++;
    echo '<br />', 'current: ', $file->key(), ' ', $file->current();
    $file->next();
}
Reading the related PHP bug 65601 suggests that adding the READ_AHEAD flag will fix this. Tested, and it works as you expected it to:
$offset = 0;
$file = new \SplFileObject($filePath);
$file->setFlags(SplFileObject::READ_AHEAD);
$fileIterator = new \LimitIterator($file, $offset, 100);
foreach ($fileIterator as $key => $line) {
    echo $key;
}
I have a simple object thing that is able to have children of the same type.
This object has a toHTML method, which does something like:
$html = '<div>' . $this->name . '</div>';
$html .= '<ul>';
foreach ($this->children as $child) {
    $html .= '<li>' . $child->toHTML() . '</li>';
}
$html .= '</ul>';
return $html;
The problem is that when the object is complex, like lots of children with children with children etc, memory usage skyrockets.
If I simply print_r the multidimensional array that feeds this object I get like 1 MB memory usage, but after I convert the array to my object and do print $root->toHtml() it takes 10 MB !!
How can I fix this?
====================================
Made a simple class that is similar to my real code (but smaller):
class obj {
    protected $name;
    protected $children = array();

    public function __construct($name){
        $this->name = $name;
    }

    public static function build($name, $array = array()){
        $obj = new self($name);
        if (is_array($array)) {
            foreach ($array as $k => $v) {
                $obj->addChild(self::build($k, $v));
            }
        }
        return $obj;
    }

    public function addChild(self $child){
        $this->children[] = $child;
    }

    public function toHTML(){
        $html = '<div>' . $this->name . '</div>';
        $html .= '<ul>';
        foreach ($this->children as $child) {
            $html .= '<li>' . $child->toHTML() . '</li>';
        }
        $html .= '</ul>';
        return $html;
    }
}
And tests:
$big = array_fill(0, 500, true);
$big[5] = array_fill(0, 200, $big);
print_r($big);
// memory_get_peak_usage() shows 0.61 MB
$root = obj::build('root', $big);
// memory_get_peak_usage() shows 18.5 MB wtf lol
print $root->toHTML();
// memory_get_peak_usage() shows 24.6 MB
The problem is that you're buffering all the data in memory, which you don't actually need to do, as you're just outputting the data, rather than actually processing it.
Rather than buffering everything in memory, if all you want to do is output it you should just output it to wherever it's going to:
public function toHTMLOutput($outputStream){
    fwrite($outputStream, '<div>' . $this->name . '</div>');
    fwrite($outputStream, '<ul>');
    foreach ($this->children as $child) {
        fwrite($outputStream, '<li>');
        $child->toHTMLOutput($outputStream);
        fwrite($outputStream, '</li>');
    }
    fwrite($outputStream, '</ul>');
}

$stdout = fopen('php://stdout', 'w');
$root->toHTMLOutput($stdout);

or, if you want to save the output to a file:

$out = fopen('htmloutput.html', 'w');
$root->toHTMLOutput($out);
Obviously I've only implemented it for the toHTML() function, but the same principle can be applied to the build function, which could let you skip a separate toHTML method altogether.
Introduction
Since you are still going to output the HTML, there is no need to buffer it, indirectly consuming memory.
Here is a simple class that:
Builds a menu from a multidimensional array
Is memory efficient: it uses an Iterator
Can write to a socket, stream, file, array, Iterator, etc.
Example
$it = new ListBuilder(new RecursiveArrayIterator($big));
// Use Echo
$m = memory_get_peak_usage();
$it->display();
printf("%0.5fMB\n", (memory_get_peak_usage() - $m) / (1024 * 1024));
Output
0.03674MB
Other Output Interfaces
$big = array_fill(0, 500, true);
$big[5] = array_fill(0, 200, $big);
Simple Compare
// Use Echo
$m = memory_get_peak_usage();
$it->display();
$responce['echo'] = sprintf("%0.5fMB\n", (memory_get_peak_usage() - $m) / (1024 * 1024));
// Output to Stream or File eg ( Socket or HTML file)
$m = memory_get_peak_usage();
$it->display(fopen("php://output", "w"));
$responce['stream'] = sprintf("%0.5fMB\n", (memory_get_peak_usage() - $m) / (1024 * 1024));
// Output to ArrayIterator
$m = memory_get_peak_usage();
$it->display($array = new ArrayIterator());
$responce['iterator'] = sprintf("%0.5fMB\n", (memory_get_peak_usage() - $m) / (1024 * 1024));
// Output to Array
$m = memory_get_peak_usage();
$it->display($array = []);
$responce['array'] = sprintf("%0.5fMB\n", (memory_get_peak_usage() - $m) / (1024 * 1024));
echo "\n\nResults \n";
echo json_encode($responce, 128);
Output
Results
{
"echo": "0.03684MB\n",
"stream": "0.00081MB\n",
"iterator": "32.04364MB\n",
"array": "0.00253MB\n"
}
Class Used
class ListBuilder extends RecursiveIteratorIterator {
    protected $pad = "\t";
    protected $o;

    public function beginChildren() {
        $this->output("%s<ul>\n", $this->getPad());
    }

    public function endChildren() {
        $this->output("%s</ul>\n", $this->getPad());
    }

    public function current() {
        $this->output("%s<li>%s</li>\n", $this->getPad(1), parent::current());
        return parent::current();
    }

    public function getPad($n = 0) {
        return str_repeat($this->pad, $this->getDepth() + $n);
    }

    function output() {
        $args = func_get_args();
        $format = array_shift($args);
        $var = vsprintf($format, $args);
        switch (true) {
            case $this->o instanceof ArrayIterator :
                $this->o->append($var);
                break;
            case is_array($this->o) || $this->o instanceof ArrayObject :
                $this->o[] = $var;
                break;
            case is_resource($this->o) && (get_resource_type($this->o) === "file" || get_resource_type($this->o) === "stream") :
                fwrite($this->o, $var);
                break;
            default :
                echo $var;
                break;
        }
    }

    function display($output = null) {
        $this->o = $output;
        $this->output("%s<ul>\n", $this->getPad());
        foreach ($this as $v) {
            // iterating triggers output via current(), beginChildren() and endChildren()
        }
        $this->output("%s</ul>\n", $this->getPad());
    }
}
Conclusion
As you can see, looping with an iterator is fast, but storing the values in an iterator or object might not be that memory efficient.
The total number of elements in your array is a little over 100,000.
Each element of your array is just one byte (a boolean), so 100,000 elements take about 100,000 bytes ~ 0.1 MB.
Each of your objects is ~100 bytes: 100 * 100000 = 10000000 bytes ~ 10 MB.
But you measured ~18 MB, so where do the extra ~8 MB come from?
If you run this code:
<?php
$c = 0; // we use this to count object instances

class obj {
    protected $name;
    protected $children = array();
    public static $c = 0;

    public function __construct($name){
        global $c;
        $c++;
        $this->name = $name;
    }

    public static function build($name, $array = array()){
        global $c;
        $b = memory_get_usage();
        $obj = new self($name);
        $diff = memory_get_usage() - $b;
        echo $c . ' diff ' . $diff . '<br />'; // display change in allocated size
        if (is_array($array)) {
            foreach ($array as $k => $v) {
                $obj->addChild(self::build($k, $v));
            }
        }
        return $obj;
    }

    public function addChild(self $child){
        $this->children[] = $child;
    }

    public function toHTML(){
        $html = '<div>' . $this->name . '</div>';
        $html .= '<ul>';
        foreach ($this->children as $child) {
            $html .= '<li>' . $child->toHTML() . '</li>';
        }
        $html .= '</ul>';
        return $html;
    }
}

$big = array_fill(0, 500, true);
$big[5] = array_fill(0, 200, $big);
$root = obj::build('root', $big);
You will notice the change is constant, with exceptions for the objects created as the 1024th, 2048th, 4096th, and so on.
I don't have a link to any article or manual page about it, but my guess is that PHP holds references to each created object in an array with an initial size of 1024. When you fill this array, its size gets doubled to make space for new objects.
If you take the difference for, say, the 2048th object, subtract the size of an object (the constant value you see on the other lines) and divide by 2048, you will always get 32: the standard size of a pointer in C.
So for 100,000 objects this array grows to a size of 131,072 elements:
131072 * 32 = 4194304 B = 4 MB
These calculations are only approximate, but I think they answer your question about what takes so much memory.
To answer how to keep memory low: avoid using objects for large sets of data.
Obviously objects are nice and all, but primitive data types are faster and smaller.
Maybe you can make it work with one object containing an array with the data. It's hard to propose an alternative without more info about these objects and what methods/interface they require.
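As an illustration of the "skip the objects" advice, here is a sketch (the helper name renderArray is my own) that renders the same nested-list HTML straight from the source array, so the 100,000-object tree is never allocated:

```php
// Hypothetical helper: produces the same markup as obj::build(...)->toHTML(),
// but works directly on the array, so no object tree is ever built.
function renderArray($name, $array) {
    $html = '<div>' . $name . '</div><ul>';
    foreach ($array as $k => $v) {
        $html .= '<li>';
        $html .= is_array($v)
            ? renderArray($k, $v)                 // nested branch: recurse
            : '<div>' . $k . '</div><ul></ul>';   // leaf, as toHTML() renders it
        $html .= '</li>';
    }
    return $html . '</ul>';
}

$big = array_fill(0, 500, true);
$big[5] = array_fill(0, 200, $big);
$html = renderArray('root', $big); // same output, none of the per-object overhead
```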
One thing that might be catching you out is that you might be getting close to blowing your stack because of the recursion. It might make sense in this case to create a rendering function that deals with the tree as a whole, instead of relying on recursion to do the rendering for you. For informative topics on this, see tail call recursion and tail call optimization.
To stick with your code's current structure and dodge a lot of the resource problems you are likely facing, the simplest solution may be to pass the HTML string in as a reference:
class obj {
    protected $name;
    protected $children = array();

    public function __construct($name){
        $this->name = $name;
    }

    public static function build($name, $array = array()){
        $obj = new self($name);
        if (is_array($array)) {
            foreach ($array as $k => $v) {
                $obj->addChild(self::build($k, $v));
            }
        }
        return $obj;
    }

    public function addChild(self $child){
        $this->children[] = $child;
    }

    public function toHTML(&$html = ""){
        $html .= '<div>' . $this->name . '</div>';
        $html .= '<ul>';
        foreach ($this->children as $child) {
            $html .= '<li>';
            $child->toHTML($html); // appends directly via the reference
            $html .= '</li>';
        }
        $html .= '</ul>';
    }
}
This will keep you from hauling around a bunch of duplicate partial tree renders while the recursive calls are resolving.
As for the actual build of the tree, I think a lot of the memory usage is just the price of playing with data that big. Your options are either to render instead of building up a hierarchical model just to render (i.e. emit output directly rather than building a tree), or to employ some caching strategy: cache copies of the object tree or copies of the rendered HTML, depending on how the data is used within your site. If you have control of the inbound data, invalidating the relevant cache keys can be added to that workflow to keep the cache from getting stale.
Here's something simple for someone to answer for me. I've tried searching but I don't know what I'm looking for really.
I have an array from a JSON string, in PHP, of cast and crew members for a movie.
Here I am pulling out only the people with the job name 'Actor'
foreach ($movies[0]->cast as $cast) {
    if ($cast->job == 'Actor') {
        echo '<p>' . $cast->name . ' - ' . $cast->character . '</p>';
    }
}
The problem is, I would like to be able to limit how many people with the job name 'Actor' are pulled out. Say, the first 3.
So how would I pick only the first 3 of these people from this array?
OK - this is a bit of overkill for this problem, but perhaps it serves some educational purpose. PHP comes with a set of iterators that may be used to abstract iteration over a given set of items.
class ActorIterator extends FilterIterator {
    public function accept() {
        return $this->current()->job == 'Actor';
    }
}

$maxCount = 3;
$actors = new LimitIterator(
    new ActorIterator(
        new ArrayIterator($movies[0]->cast)
    ),
    0,
    $maxCount
);

foreach ($actors as $actor) {
    echo /* ... */;
}
By extending the abstract class FilterIterator we are able to define a filter that returns only the actors from the given list. LimitIterator allows you to limit the iteration to a given set and the ArrayIterator is a simple helper to make native arrays compatible with the Iterator interface. Iterators allow the developer to build chains that define the iteration process which makes them extremely flexible and powerful.
As I said in the introduction: the given problem can be solved easily without this Iterator stuff, but it provides the developer with some extended options and enables code-reuse. You could, for example, enhance the ActorIterator to some CastIterator that allows you to pass the cast type to filter for in the constructor.
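The CastIterator generalisation suggested above might look something like this (a sketch; the class name and constructor signature are my own):

```php
// Hypothetical generalisation of ActorIterator: the job to filter for
// is passed to the constructor instead of being hard-coded.
class CastIterator extends FilterIterator {
    private $job;

    public function __construct(Iterator $iterator, $job) {
        parent::__construct($iterator);
        $this->job = $job;
    }

    public function accept() {
        return $this->current()->job == $this->job;
    }
}

// e.g. the first three directors instead of actors:
// $directors = new LimitIterator(
//     new CastIterator(new ArrayIterator($movies[0]->cast), 'Director'), 0, 3);
```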
Use a variable called $num_actors to track how many you've already counted, and break out of the loop once you get to 3.
$num_actors = 0;
foreach ($movies[0]->cast as $cast) {
    if ($cast->job == 'Actor') {
        echo '...';
        $num_actors += 1;
        if ($num_actors == 3) {
            break;
        }
    }
}
$actors = array_filter($movies[0]->cast, function ($v) {
    return $v->job == 'Actor';
});
$first3 = array_slice($actors, 0, 3);
or even
$limit = 3;
$actors = array_filter($movies[0]->cast, function ($v) use (&$limit) {
    if ($limit > 0 && $v->job == 'Actor') {
        $limit--;
        return true;
    }
    return false;
});
Add a counter and an if statement.
$count = 0;
foreach ($movies[0]->cast as $cast) {
    if ($cast->job == 'Actor') {
        echo '<p>' . $cast->name . ' - ' . $cast->character . '</p>';
        if (++$count >= 3) {
            break;
        }
    }
}
$limit = 3;
$count = 0;
foreach ($movies[0]->cast as $cast) {
    // You can move the code up here if all you're getting is Actors
    if ($cast->job == 'Actor') {
        if ($count == $limit) break; // stop the loop
        // ...or, instead of break, use: if ($count == $limit) continue; to skip the remaining items
        $count++;
        echo '<p><a href="people.php?id='
            . $cast->id
            . '">'
            . $cast->name
            . ' - '
            . $cast->character
            . '</a></p>';
    }
}