PHP pthreads: multithreaded code slower than single thread

I am trying to do a very simple but numerous-iterations task. I choose 7 random serial numbers from an array of 324,000 serial numbers and place them in another array, then search that array to see if a particular number is within it, execute another script, and fwrite out how many times the looked-for number is in the array.
This runs fairly fast in a single thread. But when I put it in pthreads, even a single pthread running is 100x slower than the single-threaded version. The workers are not sharing any resources (i.e. they grab all info from their own folders and write info to their own folders), so fwrite bottlenecking is not the problem. The problem is with the arrays, which I note below. Am I running into a cache line problem, where the arrays, although they are separate variables, are still sharing the same cache line? Sigh... much appreciate your help in figuring out why the arrays are slowing it to a crawl.
<?php
class WorkerThreads extends Thread
{
    private $workerId;
    private $linesId;
    private $linesId2;
    private $c2_result;
    private $traceId;

    public function __construct($id, $newlines, $newlines2, $xxtrace)
    {
        $this->workerId = $id;
        $this->linesId = (array) $newlines;
        $this->linesId2 = (array) $newlines2;
        $this->traceId = $xxtrace;
        $this->c2_result = (array) array();
    }

    public function run()
    {
        for ($h = 0; $h < 90; $h++) {
            $fp42 = fopen("/folder/".$this->workerId."/count.txt", "w");
            for ($master = 0; $master < 200; $master++) {
                // *******PROBLEM IS IN THE <3000 loop - very slow***********
                $b = 0;
                for ($a = 0; $a < 3000; $a++) {
                    $zex = 0;
                    while ($zex != 1) {
                        $this->c2_result[0] = $this->linesId[rand(0, 324631)];
                        $this->c2_result[1] = $this->linesId[rand(0, 324631)];
                        $this->c2_result[2] = $this->linesId[rand(0, 324631)];
                        $this->c2_result[3] = $this->linesId[rand(0, 324631)];
                        $this->c2_result[4] = $this->linesId[rand(0, 324631)];
                        $this->c2_result[5] = $this->linesId[rand(0, 324631)];
                        $this->c2_result[6] = $this->linesId[rand(0, 324631)];
                        if (count(array_flip($this->c2_result)) != count($this->c2_result)) { //echo "duplicates\n";
                            $zex = 0;
                        } else { //echo "no duplicates\n";
                            $zex = 1;
                            //exit;
                        }
                    }
                    // *********PROBLEM here too! in_array statement, slowing down******
                    if (!in_array($this->linesId2[$this->traceId], $this->c2_result)) {
                        //fwrite($fp4,"nothere\n");
                        $b++;
                    }
                }
                fwrite($fp42, $b."\n");
            }
            fclose($fp42);
            $mainfile3 = "/folder/".$this->workerId."/count_pthread.php";
            $command = "php $mainfile3 $this->workerId";
            exec($command);
        }
    }
}

$xxTrack = 0;
$lines = range(0, 324631);

for ($x = 0; $x < 56; $x++) {
    $workers = [];
    // Initialize and start the threads
    foreach (range(0, 8) as $i) {
        $workers[$i] = new WorkerThreads($i, $lines, $lines2, $xxTrack);
        $workers[$i]->start();
        $xxTrack++;
    }
    // Let the threads come back
    foreach (range(0, 8) as $i) {
        $workers[$i]->join();
    }
    unset($workers);
}
UPDATED CODE
I was able to speed up the original code 6x with help from @tpunt's suggestions. Most importantly, what I learned is that the code is being slowed down by the calls to rand(). If I could get rid of those calls, it would be 100x faster. array_rand(), mt_rand() and shuffle() are all even slower. Here is the new code:
class WorkerThreads extends Thread
{
    private $workerId;
    private $c2_result;
    private $traceId;
    private $myArray;
    private $myArray2;

    public function __construct($id, $xxtrace)
    {
        $this->workerId = $id;
        $this->traceId = $xxtrace;
        $this->c2_result = array();
    }

    public function run()
    {
        ////////////////////THE WORK TO BE DONE/////////////////////////
        $lines = file("/fold/considers.txt", FILE_IGNORE_NEW_LINES);
        $lines2 = file("/fold/considers.txt", FILE_IGNORE_NEW_LINES);
        shuffle($lines2);

        $fp42 = fopen("/fold/".$this->workerId."/count.txt", "w");
        for ($h = 0; $h < 90; $h++) {
            fseek($fp42, 0);
            for ($master = 0; $master < 200; $master++) {
                $b = 0;
                for ($a = 0; $a < 3000; $a++) {
                    $zex = 0;
                    $myArray = [];
                    $myArray[rand(0, 324631)] = true;
                    $myArray[rand(0, 324631)] = true;
                    $myArray[rand(0, 324631)] = true;
                    $myArray[rand(0, 324631)] = true;
                    $myArray[rand(0, 324631)] = true;
                    $myArray[rand(0, 324631)] = true;
                    $myArray[rand(0, 324631)] = true;
                    while (count($myArray) !== 7) {
                        $myArray[rand(0, 324631)] = true;
                    }
                    if (!isset($myArray[$lines2[$this->traceId]])) {
                        $b++;
                    }
                }
                fwrite($fp42, $b."\n");
            }
            $mainfile3 = "/newfolder/".$this->workerId."/pthread.php";
            $command = "php $mainfile3 $this->workerId";
            exec($command);
        } // END OF H LOOP
        fclose($fp42);
    }
}

$xxTrack = 0;
$p = new Pool(5);

for ($b = 0; $b < 56; $b++) {
    $tasks[$b] = new WorkerThreads($b, $xxTrack);
    $xxTrack++;
}

// Add tasks to pool queue
foreach ($tasks as $task) {
    $p->submit($task);
}

// shutdown will wait for current queue to be completed
$p->shutdown();

Your code is just incredibly inefficient. There are also a number of problems with it - I've made a quick breakdown of some of these things below.
Firstly, you are spinning up over 500 threads (9 * 56 = 504). This is going to be very slow because threading in PHP requires a shared-nothing architecture. This means that a new instance of PHP's interpreter will need to be created for each thread you create, where all classes, interfaces, traits, functions, etc, will need to be copied over to the new interpreter instance.
Perhaps more to the point, though, is that your 3 nested for loops are performing 54 million iterations (90 * 200 * 3000). Multiply this by the 504 threads being created, and you can soon see why things are becoming sluggish. Instead, use a thread pool (see pthreads' Pool class) with a more modest amount of threads (try 8, and go from there), and cut down on the iterations being performed per thread.
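As a rough sketch of that shape (SimulationTask here is a hypothetical Threaded task standing in for your worker logic, not part of your code):
$pool = new Pool(8); // a modest, fixed number of reusable threads

for ($id = 0; $id < 504; $id++) {
    // tasks queue up and are executed by the 8 workers, so only 8
    // interpreter contexts are ever created instead of 504
    $pool->submit(new SimulationTask($id));
}

$pool->shutdown(); // waits for the queued tasks to complete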
Secondly, you are opening up a file 90 times per thread (so a total of 90 * 504 = 45360 fopen calls). You only need one file handle per thread.
Thirdly, utilising actual PHP arrays inside of Threaded objects makes them read-only. So with respect to the $this->c2_result property, the code inside of your nested while loop should not even work. Not to mention that the following check does not look for duplicates:
if (count(array_flip($this->c2_result)) != count($this->c2_result))
If you avoid casting the $this->c2_result property to an array (therefore making it a Volatile object), then the following code could instead replace your while loop:
$keys = array_rand($this->linesId, 7);

for ($i = 0; $i < 7; ++$i) {
    $this->c2_result[$this->linesId[$keys[$i]]] = true;
}
By setting the values as the keys in $this->c2_result we can remove the subsequent in_array function call to search through the $this->c2_result. This is done by utilising a PHP array as a hash table, where the lookup time for a key is constant time (O(1)), rather than linear time required when searching for values (with in_array). This enables us to replace the following slow check:
if (!in_array($this->linesId2[$this->traceId], $this->c2_result))
with the following fast check:
if (!isset($this->c2_result[$this->linesId2[$this->traceId]]))
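The difference is easy to see in a quick micro-benchmark sketch (absolute numbers will vary by machine; the array size matches the one in the question):
$haystack = range(0, 324631);
$set = array_flip($haystack); // values become keys

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    in_array(324631, $haystack); // worst case: scans every element
}
printf("in_array: %.4fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    isset($set[324631]); // hash lookup, independent of array size
}
printf("isset:    %.4fs\n", microtime(true) - $start);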
But with that said, you don't seem to be using the $this->c2_result property anywhere else. So (assuming you haven't purposefully redacted code that uses it), you could remove it altogether and simply replace the while loop and the check after it with the following:
$found = false;

foreach (array_rand($this->linesId, 7) as $key) {
    if ($this->linesId[$key] === $this->linesId2[$this->traceId]) {
        $found = true;
        break;
    }
}

if (!$found) {
    ++$b;
}
Beyond the above, you could also look at storing the data you're collecting in-memory (as some property on the Threaded object), to prevent expensive disk writes. The results could be aggregated at the end, before shutting down the pool.
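One minimal sketch of that single-write idea, here using a plain local buffer inside run() (the computeCount() helper is hypothetical and stands in for the inner counting loop):
class BufferedWorker extends Thread
{
    private $workerId;

    public function __construct($id)
    {
        $this->workerId = $id;
    }

    public function run()
    {
        $buffer = '';
        for ($h = 0; $h < 90; $h++) {
            for ($master = 0; $master < 200; $master++) {
                $buffer .= $this->computeCount() . "\n"; // accumulate in memory
            }
        }
        // one disk write per thread instead of 90 * 200 of them
        file_put_contents("/folder/" . $this->workerId . "/count.txt", $buffer);
    }

    private function computeCount()
    {
        return 0; // placeholder for the 3000-iteration counting loop
    }
}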
Update based upon your update
You've said that the rand function is causing major slowdown. Whilst it may be part of the problem, I believe it is actually all of the code inside of your third nested for loop. The code inside there is very hot code, because it gets executed 54 million times. I suggested above that you replace the following code:
$zex = 0;

while ($zex != 1) {
    $c2_result[0] = $lines[rand(0, 324631)];
    $c2_result[1] = $lines[rand(0, 324631)];
    $c2_result[2] = $lines[rand(0, 324631)];
    $c2_result[3] = $lines[rand(0, 324631)];
    $c2_result[4] = $lines[rand(0, 324631)];
    $c2_result[5] = $lines[rand(0, 324631)];
    $c2_result[6] = $lines[rand(0, 324631)];

    $myArray = (array) $c2_result;
    $myArray2 = (array) $c2_result;
    $myArray = array_flip($myArray);

    if (count($myArray) != count($c2_result)) { //echo "duplicates\n";
        $zex = 0;
    } else { //echo "no duplicates\n";
        $zex = 1;
        //exit;
    }
}

if (!in_array($lines2[$this->traceId], $myArray2)) {
    $b++;
}
with a combination of array_rand and foreach. Upon some initial tests, it turns out that array_rand really is outstandingly slow. But my hash table solution to replace the in_array invocation still holds true. By leveraging a PHP array as a hash table (basically, storing values as keys), we get constant time lookup performance (O(1)), as opposed to a linear time lookup (O(n)).
Try replacing the above code with the following:
$myArray = [];

$myArray[rand(0, 324631)] = true;
$myArray[rand(0, 324631)] = true;
$myArray[rand(0, 324631)] = true;
$myArray[rand(0, 324631)] = true;
$myArray[rand(0, 324631)] = true;
$myArray[rand(0, 324631)] = true;
$myArray[rand(0, 324631)] = true;

while (count($myArray) !== 7) {
    $myArray[rand(0, 324631)] = true;
}

if (!isset($myArray[$lines2[$this->traceId]])) {
    $b++;
}
For me, this resulted in a 120% speedup.
As for further performance, you can (as mentioned above, again) store the results in-memory (as a simple property) and perform a write of all results at the end of the run method.
Also, the garbage collector for pthreads is not deterministic, so it should not be used to retrieve data. Instead, a Threaded object should be injected into the worker thread, and the data to be collected should be saved to this object. Lastly, you should shut down the pool after garbage collection (which, again, should not be used to retrieve data in your case).
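A condensed sketch of that injection pattern (assuming the pthreads v3 API with Pool, Threaded and Volatile; the doubling computation is just a stand-in for real work):
$results = new Volatile();
$pool = new Pool(5);

for ($id = 0; $id < 56; $id++) {
    $pool->submit(new class($id, $results) extends Threaded {
        private $id;
        private $results;

        public function __construct($id, Volatile $results)
        {
            $this->id = $id;
            $this->results = $results;
        }

        public function run()
        {
            $count = $this->id * 2; // stands in for the real counting work
            // write into the injected, shared object under synchronization
            $this->results->synchronized(function ($results, $count) {
                $results[$this->id] = $count;
            }, $this->results, $count);
        }
    });
}

while ($pool->collect()) continue; // garbage collection only, then shutdown
$pool->shutdown();
var_dump(count($results)); // all 56 results are now available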

Although it is unclear what $newlines and $newlines2 are in your code, I am just guessing here...
Something like this?
The idea is to avoid, as much as possible, fopen and fwrite calls inside your loop:
1 - open the file only once, in the constructor.
2 - concatenate your output string in the loop.
3 - write it only once, after the loop.
class WorkerThreads extends Thread
{
    private $workerId;
    private $linesId;
    private $linesId2;
    private $c2_result;
    private $traceId;
    private $fp42;
    private $mainfile3;

    public function __construct($id, $newlines, $newlines2, $xxtrace)
    {
        $this->workerId = $id;
        $this->linesId = (array) $newlines;
        $this->linesId2 = (array) $newlines2;
        $this->traceId = $xxtrace;
        $this->c2_result = array();
        $this->fp42 = fopen("/folder/" . $id . "/count.txt", "w");
        $this->mainfile3 = "/folder/" . $id . "/count_pthread.php";
    }

    public function run()
    {
        for ($h = 0; $h < 90; $h++) {
            $globalf42 = '';
            for ($master = 0; $master < 200; $master++) { //<200
                $b = 0;
                for ($a = 0; $a < 3000; $a++) {
                    $zex = 0;
                    while ($zex != 1) { // retry until the 7 picks are distinct
                        for ($ii = 0; $ii < 7; $ii++) {
                            $this->c2_result[$ii] = $this->linesId[rand(0, 324631)];
                        }
                        $zex = (count(array_flip($this->c2_result)) != count($this->c2_result)) ? 0 : 1;
                    }
                    if (!in_array($this->linesId2[$this->traceId], $this->c2_result)) {
                        $b++;
                    }
                }
                $globalf42 .= $b . "\n";
            }
            fwrite($this->fp42, $globalf42);
            $command = "php $this->mainfile3 $this->workerId";
            exec($command);
        }
        fclose($this->fp42); // close once, after all iterations have been written
    }
}

Related

PHP tree traversal perfomance: recursive function vs. loop with custom array as stack

I have two functions in PHP that look like the following, both trying to do something to each end node of a tree.
function recurse(array $tree) {
    foreach ($tree as $subtree) {
        if (!is_array($subtree)) {
            do_some_thing($subtree);
        } else {
            recurse($subtree);
        }
    }
}

function dfs(array $tree) {
    $stack = [$tree];
    while (!empty($stack)) {
        $subtree = array_pop($stack);
        if (!is_array($subtree)) {
            do_some_thing($subtree);
        } else {
            for ($i = count($subtree) - 1; $i >= 0; $i--) {
                $stack[] = $subtree[$i];
            }
        }
    }
}

// TESTING DATA:
// function do_some_thing($node) {
//     echo $node;
// }

// recurse([1,2,[3,4,[5],6],7]);
// 0.000156s
// 0.000163s
// 0.000157s
// 0.000168s
// 0.000143s

// dfs([1,2,[3,4,[5],6],7]);
// 0.000293s
// 0.000201s
// 0.001716s
// 0.000335s
// 0.000169s
I think the dfs() function should be better, because the system call stack must be more expensive than a user-defined stack, but I am not 100% sure, because my test results on PHP-Sandbox seem to indicate the opposite. The recurse() method has very stable timings and is fast, while the dfs() method's timings have a high standard deviation and it is slower on average. I would like to know the reason, and whether this is always the case.
Also, I don't know how big a difference there is, theoretically, in either direction for larger trees, as I wasn't able to generate such a large dataset (see the harness sketch below).
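A sketch of a harness for larger inputs (the depth/breadth values are arbitrary, and do_some_thing() is reduced to a no-op so that only traversal cost is measured):
function do_some_thing($node) {
    // no-op: we only want to time the traversal itself
}

function buildTree(int $depth, int $breadth): array {
    $tree = [];
    for ($i = 0; $i < $breadth; $i++) {
        $tree[] = ($depth > 0) ? buildTree($depth - 1, $breadth) : $i;
    }
    return $tree;
}

$tree = buildTree(8, 5); // 5^8 = 390625 leaf nodes

$start = microtime(true);
recurse($tree);
printf("recurse: %.4fs\n", microtime(true) - $start);

$start = microtime(true);
dfs($tree);
printf("dfs:     %.4fs\n", microtime(true) - $start);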

Worst rated PHP Operations declared by Scrutinizer

I use Scrutinizer to analyse my code, and one of my functions gets flagged under:
Worst rated PHP Operations
This is the function:
/**
 * Insert Empty Fighters in an homogeneous way.
 *
 * @param Collection $fighters
 * @param Collection $byeGroup
 *
 * @return Collection
 */
private function insertByes(Collection $fighters, Collection $byeGroup)
{
    $bye = count($byeGroup) > 0 ? $byeGroup[0] : [];
    $sizeFighters = count($fighters);
    $sizeByeGroup = count($byeGroup);
    $frequency = $sizeByeGroup != 0
        ? (int) floor($sizeFighters / $sizeByeGroup)
        : -1;

    // Create Copy of $competitors
    $newFighters = new Collection();
    $count = 0;
    $byeCount = 0;

    foreach ($fighters as $fighter) {
        if ($frequency != -1 && $count % $frequency == 0 && $byeCount < $sizeByeGroup) {
            $newFighters->push($bye);
            $byeCount++;
        }
        $newFighters->push($fighter);
        $count++;
    }

    return $newFighters;
}
What this function does is try to insert empty fighters ("byes") in a regular / homogeneous way.
To me, this method seems quite OK; what am I not seeing? Is there a better way to achieve it?
Misleading name (probably not picked up by Scrutinizer): at no point is the actual $byeGroup collection necessary.
private function insertByes(Collection $fighters, Collection $byeGroup)
An if statement that is only used to pull out something that should have been a method parameter.
    $bye = count($byeGroup) > 0 ? $byeGroup[0] : [];
    $sizeFighters = count($fighters);
    $sizeByeGroup = count($byeGroup);
Another if statement that adds to complexity; it also uses a weak comparison.
    $frequency = $sizeByeGroup != 0
        ? (int) floor($sizeFighters / $sizeByeGroup)
        : -1;
    // Create Copy of $competitors
    $newFighters = new Collection();
    $count = 0;
    $byeCount = 0;
The content of this foreach should most likely go in a separate method.
    foreach ($fighters as $fighter) {
And that complex condition in yet another if statement (which also contains a weak comparison) would be better off in a well-named private method.
        if ($frequency != -1 && $count % $frequency == 0 && $byeCount < $sizeByeGroup) {
Since $bye can be an empty array, this kinda makes no sense.
            $newFighters->push($bye);
            $byeCount++;
        }
        $newFighters->push($fighter);
        $count++;
    }
    return $newFighters;
}
TBH, I have no idea what this method does, and it would also be really hard to write any unit test for it. A sketch of one possible refactoring along these lines follows.
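For illustration only, a hedged sketch of a refactoring that follows the critique above (pass the bye value in directly, extract the condition into a named method, use strict comparisons; the method and parameter names are my own invention, not a drop-in replacement):
private function interleaveByes(Collection $fighters, $bye, int $byeCount): Collection
{
    $frequency = $byeCount > 0
        ? (int) floor(count($fighters) / $byeCount)
        : 0;

    $result = new Collection();
    $inserted = 0;

    foreach ($fighters->values() as $index => $fighter) {
        if ($this->shouldInsertByeAt($index, $frequency, $inserted, $byeCount)) {
            $result->push($bye);
            $inserted++;
        }
        $result->push($fighter);
    }

    return $result;
}

private function shouldInsertByeAt(int $index, int $frequency, int $inserted, int $byeCount): bool
{
    return $frequency > 0
        && $index % $frequency === 0
        && $inserted < $byeCount;
}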

Why aren't all threads completed?

I've tried the example from this answer by Joe (https://stackoverflow.com/a/32187103/2229367) and it works great, but then I tried to edit the code a little:
$pool = new Pool(4);

while (@$i++ < 10) {
    $pool->submit(new class($i) extends Collectable {
        public function __construct($id) {
            $this->id = $id;
        }

        public function run() {
            printf("Hello World from %d\n", $this->id);

            $this->html = file_get_contents('http://google.fr?q=' . $this->query);
            $this->setGarbage();
        }

        public $id;
        public $html;
    });
}

while ($pool->collect(function(Collectable $work) {
    printf("Collecting %d\n", $work->id);
    var_dump($work->html);
    return $work->isGarbage();
})) continue;

$pool->shutdown();
Count of "Hello world" differs from count of "Collecting".
Docs are out of date.
What about this problem?
Worker::collect is not intended to enable you to reap results; it is non-deterministic.
Worker::collect is only intended to run garbage collection on objects referenced in the stack of Worker objects.
If the intention is to process each result as it becomes available, the code might look something like this:
<?php
$pool = new Pool(4);

$results = new Volatile();

$expected = 10;
$found = 0;

while (@$i++ < $expected) {
    $pool->submit(new class($i, $results) extends Threaded {
        public function __construct($id, Volatile $results) {
            $this->id = $id;
            $this->results = $results;
        }

        public function run() {
            $result = file_get_contents('http://google.fr?q=' . $this->id);

            $this->results->synchronized(function($results, $result) {
                $results[$this->id] = $result;
                $results->notify();
            }, $this->results, $result);
        }

        private $id;
        private $results;
    });
}

do {
    $next = $results->synchronized(function() use(&$found, $results) {
        while (!count($results)) {
            $results->wait();
        }

        $found++;

        return $results->shift();
    });

    var_dump($next);
} while ($found < $expected);

while ($pool->collect()) continue;

$pool->shutdown();
?>
This is obviously not very tolerant of errors, but the main difference is that I use a shared Volatile collection of results, and I synchronize properly to fetch results in the main context as they become available.
If you wanted to wait for all results to become available, and possibly avoid some contention for locks - which you should always try to avoid if you can - then the code would look simpler, something like:
<?php
$pool = new Pool(4);

$results = new Volatile();

$expected = 10;

while (@$i++ < $expected) {
    $pool->submit(new class($i, $results) extends Threaded {
        public function __construct($id, Volatile $results) {
            $this->id = $id;
            $this->results = $results;
        }

        public function run() {
            $result = file_get_contents('http://google.fr?q=' . $this->id);

            $this->results->synchronized(function($results, $result) {
                $results[$this->id] = $result;
                $results->notify();
            }, $this->results, $result);
        }

        private $id;
        private $results;
    });
}

$results->synchronized(function() use($expected, $results) {
    while (count($results) != $expected) {
        $results->wait();
    }
});

var_dump(count($results));

while ($pool->collect()) continue;

$pool->shutdown();
?>
It is noteworthy that the Collectable interface is already implemented by Threaded in the most recent versions of pthreads - which is the version you should be using ... always ...
The docs are out of date, sorry about that ... one human ...
pthreads V3 is much less forgiving than V2.
collect is a no-go in V3.
Rule n°1: I do all my queries inside the threads, avoiding passing too large an amount of data into them. This was OK with V2, but not anymore with V3. I keep the arguments passed to workers as lean as possible, which also makes for a faster process.
Rule n°2: I do not go over the number of CPU threads available per pool, and I chunk the work accordingly with a loop. This way I make sure there is no memory overhead from a ton of pools, and each time a loop finishes, I force a garbage collection. This turned out to be necessary for me due to very high RAM needs across threads; it might not be your case, but make sure your consumed RAM does not go over your PHP limit. The bigger the arguments you pass to the threads, the faster RAM usage will climb.
Rule n°3: Properly declare your object arrays in workers with an (array) cast to make sure all results are returned.
Here is a basic rewritten working example, following the 3 rules as closely as I can per your example:
uses an array of queries to be multithreaded.
a Collectable implementation to grab the results, in place of collect.
batches of pools sized according to the number of CPU threads, to avoid RAM overhead.
threaded queries, each with its own connection, not shared across workers.
pushes all the results into an array at the end.
code:
define("SQLHOST", "127.0.0.1");
define("SQLUSER", "root");
define("SQLPASS", "password");
define("SQLDBTA", "mydatabase");
$Nb_of_th=12; // (6 cpu cores in this example)
$queries = array_chunk($queries, ($Nb_of_th));// whatever list of queries you want to pass to the workers
$global_data=array();// all results from all pool cycles
// first we set the main loops
foreach ($queries as $key => $chunks) {
$pool = new Pool($Nb_of_th, Worker::class);// 12 pools max
$workCount = count($chunks);
// second we launch the submits
foreach (range(1, $workCount) as $i) {
$chunck = $chunks[$i - 1];
$pool->submit(new MyWorkers($chunck));
}
$data = [];// pool cycle result array
$collector = function (\Collectable $work) use (&$data) {
$isGarbage = $work->isGarbage();
if ($isGarbage) {
$data[] = $work->result; // thread result
}
return $isGarbage;
};
do {
$count = $pool->collect($collector);
$isComplete = count($data) === $workCount;
} while (!$isComplete);
array_push($global_data, $data);// push pool results into main
//complete purge
unset($data);
$pool->shutdown();
unset($pool);
gc_collect_cycles();// force garbage collector before new pool cycle
}
Var_dump($global_data); // results for all pool cycles
class MyWorkers extends \Threaded implements \Collectable {
private $isGarbage;
public $result;
private $process;
public function __construct($process) {
$this->process = $process;
}
public function run() {
$con = new PDO('mysql:host=' . SQLHOST . ';dbname=' . SQLDBTA . ';charset=UTF8', SQLUSER, SQLPASS);
$proc = (array) $this->process; // important ! avoid volatile destruction in V3
$stmt = $con->prepare($proc);
$stmt->execute();
$obj = $stmt1->fetchall(PDO::FETCH_ASSOC);
/* do whatever you want to do here */
$this->result = (array) $obj; // important ! avoid volatile destruction in V3
$this->isGarbage = true;
}
public function isGarbage() : bool
{
return $this->isGarbage;
}
}

What is the fastest way to convert an object to an array in PHP?

Currently there are quite a few different ways to convert a multi-layered object into a multidimensional array using PHP. Some seem pretty counterproductive but are widely used. I would really like to know which method is fastest (in general).
I have experimented with several of the most common methods and timed the results. I realize that the depth of the object will have big effects and so will the number of sub-objects at each level. I am curious if anyone has a way that they think is faster. Below is my code using from what I can tell are the two most common methodologies. I needed some sample data so I pulled it from an example XML file.
<?php
$xml = file_get_contents("http://www.w3schools.com/xml/cd_catalog.xml");

//load XML string into SimpleXML object
$xmlObj = simplexml_load_string($xml);

/*
    Method 1
    Recursive typecasting
    http://ben.lobaugh.net/blog/567/php-recursively-convert-an-object-to-an-array
*/
function objToArrayRecursiveTypecast($obj)
{
    if (is_object($obj)) $obj = (array) $obj;
    if (is_array($obj)) {
        $new = array();
        foreach ($obj as $key => $val) {
            $new[$key] = objToArrayRecursiveTypecast($val);
        }
    } else {
        $new = $obj;
    }
    return $new;
}

$method1StartTime = microtime(true);
$method1Results = objToArrayRecursiveTypecast($xmlObj);
$method1EndTime = microtime(true);
$method1ExecTime = $method1EndTime - $method1StartTime;

/*
    Method 2
    json_encode json_decode
    Appears in code everywhere
*/
$method2StartTime = microtime(true);
$method2Results = json_decode(json_encode($xmlObj), true);
$method2EndTime = microtime(true);
$method2ExecTime = $method2EndTime - $method2StartTime;

/*
    Method 3
    Recursive object read and array assignment
    Answer in http://stackoverflow.com/questions/4345554/convert-php-object-to-associative-array
*/
function object_to_array_recursive($object, $assoc = TRUE, $empty = '')
{
    $res_arr = array();
    if (!empty($object)) {
        $arrObj = is_object($object) ? get_object_vars($object) : $object;
        $i = 0;
        foreach ($arrObj as $key => $val) {
            $akey = ($assoc !== FALSE) ? $key : $i;
            if (is_array($val) || is_object($val)) {
                $res_arr[$akey] = (empty($val)) ? $empty : object_to_array_recursive($val);
            } else {
                $res_arr[$akey] = (empty($val)) ? $empty : (string) $val;
            }
            $i++;
        }
    }
    return $res_arr;
}

$method3StartTime = microtime(true);
$method3Results = object_to_array_recursive($xmlObj);
$method3EndTime = microtime(true);
$method3ExecTime = $method3EndTime - $method3StartTime;

/*
    Method 4
    Array map method
    http://stackoverflow.com/questions/2476876/how-do-i-convert-an-object-to-an-array/2476954#2476954
*/
function arrayMapObjectToArray($object)
{
    if (!is_object($object) && !is_array($object)) {
        return $object;
    }
    return array_map('arrayMapObjectToArray', (array) $object);
}

$method4StartTime = microtime(true);
$method4Results = arrayMapObjectToArray($xmlObj);
$method4EndTime = microtime(true);
$method4ExecTime = $method4EndTime - $method4StartTime;

//output results
echo "Method 1 time to execute: $method1ExecTime \n";
echo "Method 2 time to execute: $method2ExecTime \n";
echo "Method 3 time to execute: $method3ExecTime \n";
echo "Method 4 time to execute: $method4ExecTime \n";
?>
I ran the test 3 times on the same unloaded test server. The results follow:
Method 1 time to execute: 0.00066113471984863
Method 2 time to execute: 0.00059700012207031
Method 3 time to execute: 0.00090503692626953
Method 4 time to execute: 0.00050783157348633
Method 1 time to execute: 0.00066494941711426
Method 2 time to execute: 0.00057506561279297
Method 3 time to execute: 0.00089788436889648
Method 4 time to execute: 0.00052714347839355
Method 1 time to execute: 0.00067400932312012
Method 2 time to execute: 0.00057005882263184
Method 3 time to execute: 0.0009000301361084
Method 4 time to execute: 0.00051212310791016
EDIT: Added another method that involves using recursive typecasting.
EDIT: Added array map method. It is the fastest by a considerable margin.
The speediness of the json_encode + json_decode approach comes from the fact that both functions have native implementations that are already compiled, and thus run much faster than the alternatives that PHP needs to interpret. The same goes for the array cast, which also gives you speed but has the limitation of not going deeper than one level.
Also, each call to a PHP-coded function needs to set up a PHP stack frame, which is more expensive than setting up a native (C/C++) stack frame.
In conclusion, a PHP extension providing an object2array() function would be the fastest option; however, I am not sure such an extension exists. Until one is found, the json function family remains the fastest.
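The one-level limitation of the plain (array) cast is easy to demonstrate:
$inner = new stdClass();
$inner->value = 42;

$outer = new stdClass();
$outer->child = $inner;

$cast = (array) $outer;
var_dump(is_array($cast));           // bool(true)
var_dump(is_object($cast['child'])); // bool(true) -- the nested object survives,
                                     // which is why the recursive approaches above exist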

How can I get the return value of a Laravel chunk?

Here's an over-simplified example that doesn't work for me. How, using this method, can I get the total number of users? (I know there are better ways if I actually wanted this specific result.)
User::chunk(200, function($users)
{
    return count($users);
});
This returns NULL. Any idea how I can get a return value from the chunk function?
Edit:
Here might be a better example:
$processed_users = DB::table('users')->chunk(200, function($users)
{
    // Do something with this batch of users. Now I'd like to keep track of how many
    // I processed. Perhaps this is a background command that runs on a scheduled task.
    $processed_users = count($users);
    return $processed_users;
});

echo $processed_users; // returns null
I don't think you can achieve what you want in this way. The anonymous function is invoked by the chunk method, so anything you return from your closure is being swallowed by chunk. Since chunk potentially invokes this anonymous function N times, it makes no sense for it to return anything back from the closures it invokes.
However you can provide access to a method-scoped variable to the closure, and allow the closure to write to that value, which will let you indirectly return results. You do this with the use keyword, and make sure to pass the method-scoped variable in by reference, which is achieved with the & modifier.
This will work, for example:
$count = 0;

DB::table('users')->chunk(200, function($users) use (&$count)
{
    Log::debug(count($users));       // will log the current iteration's count
    $count = $count + count($users); // will write the running total to our method var
});

Log::debug($count); // will log the total count of records
$regions = array();

Regions::chunk(10, function($users) use (&$regions) {
    foreach ($users as $user) {
        $user->sababu = ($user->region_id > 1) ? $user->region_id : 0;
        $regions[] = $user;
    }
});

echo json_encode($regions);
Use this custom function to get a return value from chunked data:
function iterateRecords($qb, int $count = 15)
{
    $page = 1;

    do {
        $results = $qb->forPage($page, $count)->get();
        $countResults = $results->count();

        if ($countResults == 0) {
            break;
        }

        foreach ($results as $row) {
            yield $row;
        }

        unset($results);
        $page++;
    } while ($countResults == $count);
}
How to use it:
$qb = User::select();
$users = iterateRecords($qb, 100);

foreach ($users as $user) {
    echo $user->id;
}

Total users count:
$totalUsersCount = $qb->count();
