I'm just getting started with PHPExcel. My very large spreadsheets cannot be loaded whole into memory (the load fails with a memory error). To load only the parts of the worksheet I need, I'm trying to use the MyReadFilter code that was provided in the documentation, but the code is a bit above me and I'm hoping someone can help me understand it.
From the PHPExcel documentation, here's the function:
class ReadFilter implements PHPExcel_Reader_IReadFilter
{
private $_startRow = 0;
private $_endRow = 0;
private $_columns = array();
/** Get the list of rows and columns to read */
public function __construct($startRow, $endRow, $columns) {
$this->_startRow = $startRow;
$this->_endRow = $endRow;
$this->_columns = $columns;
}
public function readCell($column, $row, $worksheetName = '') {
// Only read the rows and columns that were configured
if ($row >= $this->_startRow && $row <= $this->_endRow) {
if (in_array($column,$this->_columns)) {
return true;
}
}
return false;
}
}
I'm using the following lines to invoke PHPExcel
// Get the selected Excel file, passed from form
$testFile = $_FILES['upload_test']['tmp_name'];
// Identify the file type of the selected file
$inputFileType = PHPExcel_IOFactory::identify($testFile);
// Create a reader object of the correct file type
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
// Instantiate the filter class and apply it to the reader object
$filterSubset = new ReadFilter(1,1000,range('A','Z'));
$objReader->setReadFilter($filterSubset);
// Load the selected file into the reader
$objWorkbook = $objReader->load($testFile);
I am retrieving data from the resulting worksheet object using this syntax:
$someValue= $objWorkbook->getSheet($idx)->getCell('B11')->getCalculatedValue();
I'm sure I'll have other questions as I go, but my initial one is about invoking the function. If I change the above line from:
$filterSubset = new ReadFilter(1,1000,range('A','Z'));
to:
$filterSubset = new ReadFilter(1,1000,range('A','AA')); //Last column changed
the entire read fails. I actually only need calculated values from column B, but that column has references as far over as column AS, so I need to read them as well. Can someone please tell me how to use this function to read past column Z, or to modify it? Ideally, what I'd like is to just read the contents of about a dozen columns spread out from B to AS, but I can't figure that out either.
Thanks much for any help.
range('A','AA') is not valid; try creating your own custom range.
Example
echo "<pre>";
print_r(xrange('AA', 'ZZ'));
Function Used
function xrange($start, $end, $limit = 1000) {
$l = array();
while ($start !== $end && count($l) < $limit) {
$l[] = $start;
$start ++;
}
$l[] = $end;
return $l;
}
range('A','AA') isn't a valid range.... PHP's range function doesn't assume that AA follows Z. Try using column numbers instead, using PHPExcel's columnIndexFromString() and stringFromColumnIndex() static methods in the PHPExcel_Cell class to convert 27 to AA and vice versa (watch out for the base value 0 or 1 though).
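For illustration, here is a minimal sketch of both approaches, assuming PHPExcel's PHPExcel_Cell helpers (columnIndexFromString() is 1-based while stringFromColumnIndex() is 0-based); the explicit column letters in the second example are made up:
// Build the full 'A'..'AS' list with PHPExcel's column helpers
$columns = array();
$first = PHPExcel_Cell::columnIndexFromString('A');   // 1
$last  = PHPExcel_Cell::columnIndexFromString('AS');  // 45
for ($i = $first; $i <= $last; $i++) {
    $columns[] = PHPExcel_Cell::stringFromColumnIndex($i - 1); // 0-based
}
$filterSubset = new ReadFilter(1, 1000, $columns);
// Or, since readCell() only does an in_array() check, pass an explicit list of columns
$filterSubset = new ReadFilter(1, 1000, array('B', 'F', 'K', 'AD', 'AS'));
$objReader->setReadFilter($filterSubset);
Keep in mind that getCalculatedValue() on column B can only resolve formulas whose referenced cells were actually read, so any column that B's formulas reference must be included in the filter as well.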
I have a file-uploading processor with array output. How can I add this array to the database?
Here's my processor's code:
class ModCLJsonUploadProcessor extends modProcessor {
public $languageTopics = ['modcl'];
public function process() {
$file = fopen($_FILES['json-file']['tmp_name'], 'r');
$json = fread($file, $_FILES['json-file']['size']);
$objs = json_decode($json);
$english = array();
for ($i = 0; $i < count($objs); $i++) {
$english[$i] = $objs[$i]->{'name'};
}
return $this->success($english);
}
}
return 'ModCLJsonUploadProcessor';
I tried to use the native modObjectCreateProcessor, but it does not support arrays.
In order to save items into the database using a processor, the best practice would be to create a schema and save the given data as objects in the database. This allows you to then later retrieve the items (objects) in other processors or snippets using xPDO.
For a basic explanation/how-to on creating a schema and interacting with the custom objects, have a look at the "Developing an extra in MODX Revolution" doc here
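As a rough sketch only: assuming you have declared a custom class in your schema (hypothetically called modclItem, with a name field) and generated its model, the processor could create and save one object per decoded entry:
class ModCLJsonUploadProcessor extends modProcessor {
    public $languageTopics = ['modcl'];
    public function process() {
        $json = file_get_contents($_FILES['json-file']['tmp_name']);
        $objs = json_decode($json);
        $english = array();
        foreach ($objs as $i => $obj) {
            // 'modclItem' and its 'name' field are placeholders for whatever your own schema defines
            $item = $this->modx->newObject('modclItem');
            $item->set('name', $obj->name);
            $item->save();
            $english[$i] = $obj->name;
        }
        return $this->success($english);
    }
}
return 'ModCLJsonUploadProcessor';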
I am trying to do a very simple but iteration-heavy task. I choose 7 random serial numbers from an array of 324,000 serial numbers, place them in another array, then search that array to see if a particular number is within it, execute another script, and fwrite out how many times the looked-for number is in the array.
This goes fairly fast in a single thread. But when I put it in pthreads, even a single pthread running is 100x slower than the single-threaded version. The workers are not sharing any resources (i.e. they grab all their info from their own folders and write their info to their own folders), so fwrite bottlenecks are not the problem. The problem is with the arrays, which I note below. Am I running into a cache line problem, where the arrays, although they are separate variables, are still sharing the same cache line? Sigh... I'd much appreciate your help in figuring out why the arrays are slowing it to a crawl.
<?php
class WorkerThreads extends Thread
{
private $workerId;
private $linesId;
private $linesId2;
private $c2_result;
private $traceId;
public function __construct($id,$newlines,$newlines2,$xxtrace)
{
$this->workerId = $id;
$this->linesId = (array) $newlines;
$this->linesId2 = (array) $newlines2;
$this->traceId = $xxtrace;
$this->c2_result= (array) array();
}
public function run()
{
for($h=0; $h<90; $h++) {
$fp42=fopen("/folder/".$this->workerId."/count.txt","w");
for($master=0; $master <200; $master++) {
// *******PROBLEM IS IN THE <3000 loop -very slow***********
$b=0;
for($a=0; $a<3000; $a++) {
$zex=0;
while($zex != 1) {
$this->c2_result[0]=$this->linesId[rand(0,324631)];
$this->c2_result[1]=$this->linesId[rand(0,324631)];
$this->c2_result[2]=$this->linesId[rand(0,324631)];
$this->c2_result[3]=$this->linesId[rand(0,324631)];
$this->c2_result[4]=$this->linesId[rand(0,324631)];
$this->c2_result[5]=$this->linesId[rand(0,324631)];
$this->c2_result[6]=$this->linesId[rand(0,324631)];
if(count(array_flip($this->c2_result)) != count($this->c2_result)) { //echo "duplicates\n";
$zex=0;
} else { //echo "no duplicates\n";
$zex=1;
//exit;
}
}
// *********PROBLEM here too !in_array statement, slowing down******
if(!in_array($this->linesId2[$this->traceId],$this->c2_result)) {
//fwrite($fp4,"nothere\n");
$b++;
}
}
fwrite($fp42,$b."\n");
}
fclose($fp42);
$mainfile3="/folder/".$this->workerId."/count_pthread.php";
$command="php $mainfile3 $this->workerId";
exec($command);
}
}
}
$xxTrack=0;
$lines = range(0, 324631);
for($x=0; $x<56; $x++) {
$workers = [];
// Initialize and start the threads
foreach (range(0, 8) as $i) {
$workers[$i] = new WorkerThreads($i,$lines,$lines2,$xxTrack);
$workers[$i]->start();
$xxTrack++;
}
// Let the threads come back
foreach (range(0, 8) as $i) {
$workers[$i]->join();
}
unset($workers);
}
UPDATED CODE
I was able to speed up the original code by 6x with help from @tpunt's suggestions. Most importantly, what I learned is that the code is being slowed down by the calls to rand(). If I could get rid of those, it would be 100x faster. array_rand(), mt_rand() and shuffle() are even slower. Here is the new code:
class WorkerThreads extends Thread
{
private $workerId;
private $c2_result;
private $traceId;
private $myArray;
private $myArray2;
public function __construct($id,$xxtrace)
{
$this->workerId = $id;
$this->traceId = $xxtrace;
$c2_result=array();
}
public function run()
{
////////////////////THE WORK TO BE DONE/////////////////////////
$lines = file("/fold/considers.txt",FILE_IGNORE_NEW_LINES);
$lines2= file("/fold/considers.txt",FILE_IGNORE_NEW_LINES);
shuffle($lines2);
$fp42=fopen("/fold/".$this->workerId."/count.txt","w");
for($h=0; $h<90; $h++) {
fseek($fp42, 0);
for($master=0; $master <200; $master++) {
$b=0;
for($a=0; $a<3000; $a++) {
$zex=0;
$myArray = [];
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
while (count($myArray) !== 7) {
$myArray[rand(0,324631)] = true;
}
if (!isset($myArray[$lines2[$this->traceId]])) {
$b++;
}
}
fwrite($fp42,$b."\n");
}
$mainfile3="/newfolder/".$this->workerId."/pthread.php";
$command="php $mainfile3 $this->workerId";
exec($command);
}//END OF H LOOP
fclose($fp42);
}
}
$xxTrack=0;
$p = new Pool(5);
for($b=0; $b<56; $b++) {
$tasks[$b]= new WorkerThreads($b,$xxTrack);
$xxTrack++;
}
// Add tasks to pool queue
foreach ($tasks as $task) {
$p->submit($task);
}
// shutdown will wait for current queue to be completed
$p->shutdown();
Your code is just incredibly inefficient. There are also a number of problems with it - I've made a quick breakdown of some of these things below.
Firstly, you are spinning up over 500 threads (9 * 56 = 504). This is going to be very slow because threading in PHP requires a shared-nothing architecture. This means that a new instance of PHP's interpreter will need to be created for each thread you create, where all classes, interfaces, traits, functions, etc, will need to be copied over to the new interpreter instance.
Perhaps more to the point, though, is that your 3 nested for loops are performing 54 million iterations (90 * 200 * 3000). Multiply this by the 504 threads being created, and you can soon see why things are becoming sluggish. Instead, use a thread pool (see pthreads' Pool class) with a more modest amount of threads (try 8, and go from there), and cut down on the iterations being performed per thread.
Secondly, you are opening up a file 90 times per thread (so a total of 90 * 504 = 45360). You only need one file handler per thread.
Thirdly, utilising actual PHP arrays inside of Threaded objects makes them read-only. So with respect to the $this->c2_result property, the code inside of your nested while loop should not even work. Not to mention that the following check does not look for duplicates:
if(count(array_flip($this->c2_result)) != count($this->c2_result))
If you avoid casting the $this->c2_result property to an array (therefore making it a Volatile object), then the following code could instead replace your while loop:
$keys = array_rand($this->linesId, 7);
for ($i = 0; $i < 7; ++$i) {
$this->c2_result[$this->linesId[$keys[$i]]] = true;
}
By setting the values as the keys in $this->c2_result we can remove the subsequent in_array function call to search through the $this->c2_result. This is done by utilising a PHP array as a hash table, where the lookup time for a key is constant time (O(1)), rather than linear time required when searching for values (with in_array). This enables us to replace the following slow check:
if(!in_array($this->linesId2[$this->traceId],$this->c2_result))
with the following fast check:
if (!isset($this->c2_result[$this->linesId2[$this->traceId]]))
But with that said, you don't seem to be using the $this->c2_result property anywhere else. So (assuming you haven't purposefully redacted code that uses it), you could remove it altogether and simply replace the while loop and the check after it with the following:
$found = false;
foreach (array_rand($this->linesId, 7) as $key) {
if ($this->linesId[$key] === $this->linesId2[$this->traceId]) {
$found = true;
break;
}
}
if (!$found) {
++$b;
}
Beyond the above, you could also look at storing the data you're collecting in-memory (as some property on the Threaded object), to prevent expensive disk writes. The results could be aggregated at the end, before shutting down the pool.
Update based on your update
You've said that the rand function is causing major slowdown. Whilst it may be part of the problem, I believe it is actually all of the code inside of your third nested for loop. The code inside there is very hot code, because it gets executed 54 million times. I suggested above that you replace the following code:
$zex=0;
while($zex != 1) {
$c2_result[0]=$lines[rand(0,324631)];
$c2_result[1]=$lines[rand(0,324631)];
$c2_result[2]=$lines[rand(0,324631)];
$c2_result[3]=$lines[rand(0,324631)];
$c2_result[4]=$lines[rand(0,324631)];
$c2_result[5]=$lines[rand(0,324631)];
$c2_result[6]=$lines[rand(0,324631)];
$myArray = (array) $c2_result;
$myArray2 = (array) $c2_result;
$myArray=array_flip($myArray);
if(count($myArray) != count($c2_result)) {//echo "duplicates\n";
$zex=0;
} else {//echo "no duplicates\n";
$zex=1;
//exit;
}
}
if(!in_array($lines2[$this->traceId],$myArray2)) {
$b++;
}
with a combination of array_rand and foreach. Upon some initial tests, it turns out that array_rand really is outstandingly slow. But my hash table solution to replace the in_array invocation still holds true. By leveraging a PHP array as a hash table (basically, store values as keys), we get a constant time lookup performance (O(1)), as opposed to a linear time lookup (O(n)).
Try replacing the above code with the following:
$myArray = [];
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
while (count($myArray) !== 7) {
$myArray[rand(0,324631)] = true;
}
if (!isset($myArray[$lines2[$this->traceId]])) {
$b++;
}
For me, this resulted in a 120% speedup.
As for further performance, you can (as mentioned above, again) store the results in-memory (as a simple property) and perform a write of all results at the end of the run method.
Also, the garbage collector for pthreads is not deterministic. It should therefore not be used to retrieve data. Instead, a Threaded object should be injected into the worker thread, where data to be collected should be saved to this object. Lastly, you should shut down the pool after garbage collection (which, again, should not be used in your case).
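A minimal sketch of that pattern, with a made-up Collector class, a placeholder result and output path, assuming the same pthreads Pool/Threaded API used above:
// a shared, thread-safe container injected into each task
class Collector extends Threaded {}
class WorkerTask extends Threaded {
    private $workerId;
    private $collector;
    public function __construct($id, Collector $collector) {
        $this->workerId  = $id;
        $this->collector = $collector;
    }
    public function run() {
        // ... do the counting work here ...
        $b = 42; // placeholder result
        // store the result on the shared object instead of writing to disk
        $this->collector[$this->workerId] = $b;
    }
}
$collector = new Collector();
$pool = new Pool(8);
for ($i = 0; $i < 56; $i++) {
    $pool->submit(new WorkerTask($i, $collector));
}
$pool->shutdown(); // wait for the queued tasks to finish
$out = '';
foreach ($collector as $workerId => $count) {
    $out .= $workerId . ' ' . $count . "\n";
}
file_put_contents('/folder/counts.txt', $out); // placeholder path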
It is unclear from your code what $newlines and $newlines2 are, so I am just guessing here...
Something like this?
The idea is to avoid calling fopen and fwrite in your loop as much as possible.
1 - open the file only once, in the constructor.
2 - concatenate your output string in the loop.
3 - write it only once after the loop.
class WorkerThreads extends Thread {
private $workerId;
private $linesId;
private $linesId2;
private $c2_result;
private $traceId;
private $fp42;
private $mainfile3;
public function __construct($id, $newlines, $newlines2, $xxtrace) {
$this->workerId = $id;
$this->linesId = (array) $newlines;
$this->linesId2 = (array) $newlines2;
$this->traceId = $xxtrace;
$this->c2_result = array();
$this->fp42 = fopen("/folder/" . $id . "/count.txt", "w");
$this->mainfile3 = "/folder/" . $id . "/count_pthread.php";
}
public function run() {
for ($h = 0; $h < 90; $h++) {
$globalf42='';
for ($master = 0; $master < 200; $master++) {//<200
$b = 0;
for ($a = 0; $a < 3000; $a++) {
$zex = 0;
if ($zex != 1) {
for ($ii = 0; $ii < 7; $ii++) { // fill all 7 picks (indexes 0-6), as in the original code
$this->c2_result[$ii] = $this->linesId[rand(0, 324631)];
}
$zex = (count(array_flip($this->c2_result)) != count($this->c2_result)) ? 0 : 1;
}
if (!in_array($this->linesId2[$this->traceId], $this->c2_result)) {
$b++;
}
}
$globalf42 .= $b . "\n";
}
fwrite($this->fp42, $globalf42);
fclose($this->fp42);
$command = "php $this->mainfile3 $this->workerId";
exec($command);
}
}
}
I am getting "Fatal error: Allowed memory size of XXXX bytes exhausted...". I need to iterate through big amount of records, and execute a function to verify of the record fit the criteria which declare many class variables.
foreach ($results as $row)
{
$location = Location::parseDatabaseRow($row);
if ($location->contains($lat, $lon))
{
$found = true;
$locations[] = $location;
break;
}
}
Implementation of the Location class:
public function contains($lat, $lon)
{
$polygon =& new polygon();
.... // Add points to the polygon based on the location's polygons
$vertex =& new vertex($lat, $lon);
$isContain = $polygon->isInside($vertex);
$polygon->res(); //Reset all variable inside polygons
$polygon = null; //Let Garbage Collector clear it whenever
return ($isContain);
}
Shouldn't $polygon be cleared when the contains() method returns? What can I do to reduce memory usage?
I am a Java developer who has just started to learn PHP. Please help me understand how to manage stack size and memory allocation and deallocation. Thanks in advance.
Here are some corrections which may allow your code not to exhaust the memory limit.
Use a while loop. Since your result comes from a database query, you should be able to use fetch() instead of fetchAll(), which I assume you are using since you are applying foreach() to it.
while ($row = $result->fetch()) { // here $result is supposed to be a PDOStatement.
$location = Location::parseDatabaseRow($row);
if ($location->contains($lat, $lon)) {
$found = true; // where is this used?
$locations[] = $location;
break;
}
}
A while loop uses less memory because not all results are fetched at the same time.
Use the ampersand the right way. You are doing a new in each loop. The ampersand is used when you want to pass a value by reference to a function, so that it is affected outside that function's scope without the need to return it.
Here, you are using objects, which are by design effectively passed by reference.
public function contains($lat, $lon) {
$polygon = new polygon();
$vertex = new vertex($lat, $lon);
return $polygon->isInside($vertex);
// no need to reset the values of your polygon, you will be creating a new one on the next loop.
}
For completeness' sake, here is a version using the same polygon object. Notice how I do not use an ampersand, because we are passing an object.
$polygon = new polygon();
while ($row = $result->fetch()) { // here $result is supposed to be a PDOStatement.
$location = Location::parseDatabaseRow($row);
if ($location->contains($lat, $lon, $polygon)) {
$found = true; // where is this used?
$locations[] = $location;
break;
}
}
public function contains($lat, $lon, $polygon) {
//Add points to the passed polygon
$vertex = new vertex($lat, $lon);
$isContain = $polygon->isInside($vertex);
$polygon->res();
// since we will be using the same $polygon, we now need to reset it
return $isContain;
}
So I'm trying to write a function that does the following: I have about 20 or so XML files (someday I will have over a hundred) and in the header of each file is the name of a person who was a peer review editor <editor role="PeerReviewEditor">John Doe</editor>. I want to run through the directory where these files are stored and capture the name of the peer review editor for each file. I want to end up with a variable $reviewEditorNames that contains all of the different names. (I will then use this to display a list of editors, etc.)
Here's what I've got so far. I'm worried about the last part. I feel like the attempt to turn $reviewEditorName into $reviewEditorNames is not going to combine the individual names from each file, but will instead just be the array found within a given file (even if there is only one name in a given file, and thus it is an array of 1).
I'm grateful for your help.
function editorlist()
{
$filename = readDirectory('../editedtranscriptions');
foreach($filename as $file)
{
$xmldoc = simplexml_load_file("../editedtranscriptions/$file");
$xmldoc->registerXPathNamespace("tei", "http://www.tei-c.org/ns/1.0");
$reviewEditorName = $xmldoc->xpath("//tei:editor[@role='PeerReviewEditor']");
return $reviewEditorNames[] = $reviewEditorName;
}
}
I would split things up more; that also helps when you need to change your code later on.
Besides that, you need to check the return value of the xpath; most likely you want to process only the first match (is there one editor per file?), and you want to return it as a string.
If you put things into functions of their own, it's easier to make each function do only one thing, and so it's easier to debug and improve things. E.g. you can first test whether an editorFromFile function does what it should and then run it on multiple files:
/**
* get PeerReviewEditor from file
*
* @param string $file
* @return string
*/
function editorFromFile($file)
{
$xmldoc = simplexml_load_file($file);
$xmldoc->registerXPathNamespace("tei", "http://www.tei-c.org/ns/1.0");
$node = $xmldoc->xpath("//tei:editor[@role='PeerReviewEditor'][1]");
return (string) $node[0];
}
/**
* get editors from a path
*
* @param string $path
* @return array
*/
function editorlist($path)
{
$editors = array();
$files = glob(sprintf('%s/*.xml', $path), GLOB_NOSORT);
foreach($files as $file)
{
$editors[] = editorFromFile($file);
}
return $editors;
}
Just a little update:
function editorlist() {
$reviewEditorNames = array(); // init the array
$filename = readDirectory('../editedtranscriptions');
foreach($filename as $file) {
$xmldoc = simplexml_load_file("../editedtranscriptions/$file");
$xmldoc->registerXPathNamespace("tei", "http://www.tei-c.org/ns/1.0");
// add to the array
$result = $xmldoc->xpath("//tei:editor[@role='PeerReviewEditor']");
if (sizeof($result) > 0) {
$reviewEditorNames[] = (string)$result[0];
}
}
// return the array
return $reviewEditorNames;
}
Is there a better/simpler way to find the number of images in a directory and store that count in a variable?
function dirCount($dir) {
$x = 0;
while (($file = readdir($dir)) !== false) {
if (isImage($file)) { $x = $x + 1; }
}
return $x;
}
This seems like such a long way of doing this, is there no simpler way?
Note: The isImage() function returns true if the file is an image.
Check out the Standard PHP Library (aka SPL) for DirectoryIterator:
$dir = new DirectoryIterator('/path/to/dir');
$x = 0;
foreach ($dir as $file) {
    $x += (isImage($file)) ? 1 : 0;
}
(FYI there is an undocumented function called iterator_count() but probably best not to rely on it for now I would imagine. And you'd need to filter out unseen stuff like . and .. anyway.)
This will give you the count of everything in your dir. I'll leave the part about counting only images to you, as I am about to fall asleep.
iterator_count(new DirectoryIterator('path/to/dir/'));
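If the dot entries matter, a rough sketch using FilesystemIterator (which skips . and .. by default) together with your isImage() function, assuming it accepts a file name:
$count = 0;
foreach (new FilesystemIterator('/path/to/dir') as $fileInfo) {
    if (isImage($fileInfo->getFilename())) {
        $count++;
    }
}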
I do it like this:
$files = scandir($dir);
$x = count($files);
echo $x;
but it also counts the . and ..
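One way around that (just a sketch) is to filter the dot entries out before counting:
$x = count(array_diff(scandir($dir), array('.', '..')));
echo $x;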
The aforementioned code
$count = count(glob("*.{jpg,png,gif,bmp}"));
is your best bet, but the {jpg,png,gif} bit will only work if you append the GLOB_BRACE flag on the end:
$count = count(glob("*.{jpg,png,gif,bmp}", GLOB_BRACE));
you could use glob...
$count = 0;
foreach (glob("*.*") as $file) {
if (isImage($file)) ++$count;
}
or, I'm not sure how well this would suit your needs, but you could do this:
$count = count(glob("*.{jpg,png,gif,bmp}"));
You could also make use of the SPL to filter the contents of a DirectoryIterator using your isImage function by extending the abstract FilterIterator class.
class ImageIterator extends FilterIterator {
public function __construct($path)
{
parent::__construct(new DirectoryIterator($path));
}
public function accept()
{
return isImage($this->getInnerIterator());
}
}
You could then use iterator_count (or implement the Countable interface and use the native count function) to determine the number of images. For example:
$images = new ImageIterator('/path/to/images');
printf('Found %d images!', iterator_count($images));
Using this approach, depending on how you need to use this code, it might make more sense to move the isImage function into the ImageIterator class to have everything neatly wrapped up in one place.
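As a sketch of the Countable option mentioned above (count() simply walks the filtered entries via iterator_count()):
class CountableImageIterator extends ImageIterator implements Countable {
    public function count()
    {
        return iterator_count($this);
    }
}
$images = new CountableImageIterator('/path/to/images');
printf('Found %d images!', count($images));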
I use the following to get the count for all types of files in one directory in Laravel
$dir = public_path('img/');
$files = glob($dir . '*.*');
if ( $files !== false )
{
$total_count = count($files);
return $total_count;
}
else
{
return 0;
}
Your answer seems about as simple as you can get it. I can't think of a shorter way to it in either PHP or Perl.
You might be able to use a system / exec command involving ls, wc, and grep if you are using Linux, depending on how complex isImage() is.
Regardless, I think what you have is quite sufficient. You only have to write the function once.
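For instance, a rough sketch (assuming the images can be picked out by extension rather than by isImage(), and that the path is just a placeholder):
$dir = escapeshellarg('/path/to/dir');
// ls lists one entry per line; grep -c counts the lines matching the image extensions
$count = (int) exec("ls -1 $dir | grep -ciE '\\.(jpe?g|png|gif|bmp)$'");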
I use this to return a count of ALL files in a directory except . and ..
return count(glob("/path/to/file/[!\.]*"));
Here is a good list of glob filters for file matching purposes.
$nfiles = glob("/path/to/file/[!\\.]*");
if ($nfiles !== FALSE){
return count($nfiles);
} else {
return 0;
}