I have been trying to implement multi-threading in php to achieve multi-upload using pthreads php.
From my understanding of multi-threading, this is how I envisioned it working.
I would upload a file,the file will start uploading in the background; even if the file is not completed to upload, another instance( thread ) will be created to upload another file. I would make multiple upload requests using AJAXand multiple files would start uploading, I would get the response of a single request individually and I can update the status of upload likewise in my site.
But this is not how it is working. This is the code that I got from one of the pthread question on SO, but I do not have the link( sorry!! ).
I tested this code to see of this really worked like I envisioned. This is the code I tested, I changed it a little.
<?php
error_reporting(E_ALL);
class AsyncWebRequest extends Thread {
public $url;
public $data;
public function __construct ($url) {
$this->url = $url;
}
public function run () {
if ( ($url = $this->url) ){
/*
* If a large amount of data is being requested, you might want to
* fsockopen and read using usleep in between reads
*/
$this->data = file_get_contents ($url);
echo $this->getThreadId ();
} else{
printf ("Thread #%lu was not provided a URL\n", $this->getThreadId ());
}
}
}
$t = microtime (true);
foreach( ["http://www.google.com/?q=". rand () * 10, 'http://localhost', 'https://facebook.com'] as $url ){
$g = new AsyncWebRequest( $url );
/* starting synchronized */
if ( $g->start () ){
printf ( $url ." took %f seconds to start ", microtime (true) - $t);
while ($g->isRunning ()) {
echo ".";
usleep (100);
}
if ( $g->join () ){
printf (" and %f seconds to finish receiving %d bytes\n", microtime (true) - $t, strlen ($g->data));
} else{
printf (" and %f seconds to finish, request failed\n", microtime (true) - $t);
}
}
echo "<hr/>";
}
So what I expected from this code was it would hit google.com, localhost and facebook.com simultaneously and run their individual threads. But every request is waiting for another request to complete.
For this it is clearly waiting for first response to complete before it is making another request because time the request are sent are after the request from the previous request is complete.
So, This is clearly not the way to achieve what I am trying to achieve. How do I do this?
You might want to look at multi curl for such multiple external requests. Pthreads is more about internal processes.
Just for further reference, you are starting threads 1 by 1 and waiting for them to finish.
This code: while ($g->isRunning ()) doesn't stop until the thread is finished. It's like having a while (true) in a for. The for executes 1 step at a time.
You need to start the threads, add them in an array, and in another while loop check each of the threads if it stopped and remove them from the array.
Related
Ok, so lets start slow...
I have a pthreads script running and working for me, tested and working 100% of the time when I run it manually from the command line via ssh. The script is as follows with the main thread process code adjusted to simulate random process' run time.
class ProcessingPool extends Worker {
public function run(){}
}
class LongRunningProcess extends Threaded implements Collectable {
public function __construct($id,$data) {
$this->id = $id;
$this->data = $data;
}
public function run() {
$data = $this->data;
$this->garbage = true;
$this->result = 'START TIME:'.time().PHP_EOL;
// Here is our actual logic which will be handled within a single thread (obviously simulated here instead of the real functionality)
sleep(rand(1,100));
$this->result .= 'ID:'.$this->id.' RESULT: '.print_r($this->data,true).PHP_EOL;
$this->result .= 'END TIME:'.time().PHP_EOL;
$this->finished = time();
}
public function __destruct () {
$Finished = 'EXITED WITHOUT FINISHING';
if($this->finished > 0) {
$Finished = 'FINISHED';
}
if ($this->id === null) {
print_r("nullified thread $Finished!");
} else {
print_r("Thread w/ ID {$this->id} $Finished!");
}
}
public function isGarbage() : bool { return $this->garbage; }
public function getData() {
return $this->data;
}
public function getResult() {
return $this->result;
}
protected $id;
protected $data;
protected $result;
private $garbage = false;
private $finished = 0;
}
$LoopDelay = 500000; // microseconds
$MinimumRunTime = 300; // seconds (5 minutes)
// So we setup our pthreads pool which will hold our collection of threads
$pool = new Pool(4, ProcessingPool::class, []);
$Count = 0;
$StillCollecting = true;
$CountCollection = 0;
do {
// Grab all items from the conversion_queue which have not been processed
$result = $DB->prepare("SELECT * FROM `processing_queue` WHERE `processed` = 0 ORDER BY `queue_id` ASC");
$result->execute();
$rows = $result->fetchAll(PDO::FETCH_ASSOC);
if(!empty($rows)) {
// for each of the rows returned from the queue, and allow the workers to run and return
foreach($rows as $id => $row) {
$update = $DB->prepare("UPDATE `processing_queue` SET `processed` = 1 WHERE `queue_id` = ?");
$update->execute([$row['queue_id']]);
$pool->submit(new LongRunningProcess($row['fqueue_id'],$row));
$Count++;
}
} else {
// 0 Rows To Add To Pool From The Queue, Do Nothing...
}
// Before we allow the loop to move on to the next part, lets try and collect anything that finished
$pool->collect(function ($Processed) use(&$CountCollection) {
global $DB;
$data = $Processed->getData();
$result = $Processed->getResult();
$update = $DB->prepare("UPDATE `processing_queue` SET `processed` = 2 WHERE `queue_id` = ?");
$update->execute([$data['queue_id']]);
$CountCollection++;
return $Processed->isGarbage();
});
print_r('Collecting Loop...'.$CountCollection.'/'.$Count);
// If we have collected the same total amount as we have processed then we can consider ourselves done collecting everything that has been added to the database during the time this script started and was running
if($CountCollection == $Count) {
$StillCollecting = false;
print_r('Done Collecting Everything...');
}
// If we have not reached the full MinimumRunTime that this cron should run for, then lets continue to loop
$EndTime = microtime(true);
$TimeElapsed = ($EndTime - $StartTime);
if(($TimeElapsed/($LoopDelay/1000000)) < ($MinimumRunTime/($LoopDelay/1000000))) {
$StillCollecting = true;
print_r('Ended To Early, Lets Force Another Loop...');
}
usleep($LoopDelay);
} while($StillCollecting);
$pool->shutdown();
So while the above script will run via a command line (which has been adjusted to the basic example, and detailed processing code has been simulated in the above example), the below command gives a different result when run from a cron setup for every 5 minutes...
/opt/php7zts/bin/php -q /home/account/cron-entry.php file=every-5-minutes/processing-queue.php
The above script, when using the above command line call, will loop over and over during the run time of the script and collect any new items from the DB queue, and insert them into the pool, which allows 4 processes at a time to run and finish, which is then collected and the queue is updated before another loop happens, pulling any new items from the DB. This script will run until we have processed and collected all processes in the queue during the execution of the script. If the script has not run for the full 5 minute expected period of time, the loop is forced to continue checking the queue, if the script has run over the 5 minute mark it allows any current threads to finish & be collected before closing. Note that the above code also includes a code based "flock" functionality which makes future crons of this idle loop and exit or start once the lock has lifted, ensuring that the queue and threads are not bumping into each other. Again, ALL OF THIS WORKS FROM THE COMMAND LINE VIA SSH.
Once I take the above command, and put it into a cron to run for every 5 minutes, essentially giving me a never ending loop, while maintaining memory, I get a different result...
That result is described as follows... The script starts, checks the flock, and continues if the lock is not there, it creates the lock, and runs the above script. The items are taken from the queue in the DB, and inserted into the pool, the pool fires off the 4 threads at a time as expected.. But the unexpected result is that the run() command does not seem to be executed, and instead the __destruct function runs, and a "Thread w/ ID 2 FINISHED!" type of message is returned to the output. This in turn means that the collection side of things does not collect anything, and the initiating script (the cron script itself /home/account/cron-entry.php file=every-5-minutes/processing-queue.php) finishes after everything has been put into the pool, and destructed. Which prematurely "finishes" the cron job, since there is nothing else to do but loop and pull nothing new from the queue, since they are considered "being processed" when processed == 1 in the queue.
The question then finally becomes... How do I make the cron's script aware of the threads that where spawned and run() them without closing the pool out before they can do anything?
(note... if you copy / paste the provided script, note that I did not test it after removing the detailed logic, so it may need some simple fixes... please do not nit-pick said code, as the key here is that pthreads works if the script is executed FROM the Command Line, but fails to properly run when the script is executed FROM a CRON. If you plan on commenting with non-constructive criticism, please go use your fingers to do something else!)
Joe Watkins! I Need Your Brilliance! Thanks In Advance!
After all of that, it seems that the issue was with regards to user permissions. I was setting this specific cron up inside of cpanel, and when running the command manually I was logged in as root.
After setting this command up in roots crontab, I was able to get it to successfully run the threads from the pool. Only issue I have now is some threads never finish, and sometimes I am unable to close the pool. But this is a different issue, so I will open another question elsewhere.
For those running into this issue, make sure you know who the owner of the cron is as it matters with php's pthreads.
I have run into a rather strange issue with a particular part of a large PHP application. The portion of the application in question loads data from MySQL (mostly integer data) and builds a JSON string which gets output to the browser. These requests were taking sevar seconds (8 - 10 seconds each) in Chrome's developer tools as well as via curl. However the PHP shutdown handler I had reported that the requests were executing in less than 1 second.
In order to debug I added a call to fastcgi_finish_request(), and suddenly my shutdown handler reported the same time as Chrome / curl.
With some debugging, I narrowed it down to a particular function. I created the following simple test case:
<?php
$start_time = DFStdLib::exec_time();
$product = new ApparelQuoteProduct(19);
$pmatrix = $product->productMatrix();
// This function call is the problem:
$ranges = $pmatrix->ranges();
$end_time = DFStdLib::exec_time();
$duration = $end_time - $start_time;
echo "Output generation duration was: $duration sec";
fastcgi_finish_request();
$fastcgi_finish_request_duration = DFStdLib::exec_time() - $end_time;
DFSkel::log(DFSkel::LOG_INFO,"Output generation duration was: $duration sec; fastcgi_finish_request Duration was: $fastcgi_finish_request_duration sec");
If I call $pmatrix->ranges() (which is a function that executes a number of calls to mysql_query to fetch data and build an in-memory PHP object structure from that data) then I get the output:
Output generation duration was: 0.2563910484314 sec; fastcgi_finish_request Duration was: 7.3854329586029 sec
in my log file. Note that the call to $pmatrix->ranges() does not take long at all, yet somehow it causes the PHP FastCGI handler to take seven seconds to fihish the request. (This is true even if I don't call fastcgi_finish_request -- the browser takes 7-8 seconds to display the data either way)
If I comment out the call to $pmatrix->ranges() I get:
Output generation duration was: 0.0016419887542725 sec; fastcgi_finish_request Duration was: 0.00035214424133301 sec
I can post the entire source for the $pmatrix->ranges() function, but it's very long. I'd like some advice on where to even start looking.
What is it about the PHP FastCGI request process which would even cause such behavior? Does it call destructor functions / garbage collection? Does it close open resources? How can I troubleshoot this further?
EDIT: Here's a larger source sample:
<?php
class ApparelQuote_ProductPricingMatrix_TestCase
{
protected $myProductId;
protected $myQuantityRanges;
private $myProduct;
protected $myColors;
protected $mySizes;
protected $myQuantityPricing;
public function __construct($product)
{
$this->myProductId = intval($product);
}
/**
* Return an array of all ranges for this matrix.
*
* #return array
*/
public function ranges()
{
$this->myLoadPricing();
return $this->myQuantityRanges;
}
protected function myLoadPricing($force=false)
{
if($force || !$this->myQuantityPricing)
{
$this->myColors = array();
$this->mySizes = array();
$priceRec_finder = new ApparelQuote_ProductPricingRecord();
$priceRec_finder->_link = Module_ApparelQuote::dbLink();
$found_recs = $priceRec_finder->find(_ALL,"`product_id`={$this->myProductId}","`qtyrange_id`,`color_id`");
$qtyFinder = new ApparelQuote_ProductPricingQtyRange();
$qtyFinder->_link = Module_ApparelQuote::dbLink();
$this->myQuantityRanges = $qtyFinder->find(_ALL,"`product_id`=$this->myProductId");
$this->myQuantityPricing = array();
foreach ($found_recs as &$r)
{
if(false) $r = new ApparelQuote_ProductPricingRecord();
if(!isset($this->myColors[$r->color_id]))
$this->myColors[$r->color_id] = true;
if(!isset($this->mySizes[$r->size_id]))
$this->mySizes[$r->size_id] = true;
if(!is_array($this->myQuantityPricing[$r->qtyrange_id]))
$this->myQuantityPricing[$r->qtyrange_id] = array();
if(!is_array($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
$this->myQuantityPricing[$r->qtyrange_id][$r->color_id] = array();
$this->myQuantityPricing[$r->qtyrange_id][$r->color_id][$r->size_id] = &$r;
}
$this->myColors = array_keys($this->myColors);
$this->mySizes = array_keys($this->mySizes);
}
}
}
$start_time = DFStdLib::exec_time();
$pmatrix = new ApparelQuote_ProductPricingMatrix_TestCase(19);
$ranges = $pmatrix->ranges();
$end_time = DFStdLib::exec_time();
$duration = $end_time - $start_time;
echo "Output generation duration was: $duration sec";
fastcgi_finish_request();
$fastcgi_finish_request_duration = DFStdLib::exec_time() - $end_time;
DFSkel::log(DFSkel::LOG_INFO,"Output generation duration was: $duration sec; fastcgi_finish_request Duration was: $fastcgi_finish_request_duration sec");
Upon continued debugging I have narrowed down to the following lines from the above:
if(!is_array($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
$this->myQuantityPricing[$r->qtyrange_id][$r->color_id] = array();
These statements are building an in-memory array structure of all the data loaded from MySQL. If I comment these out, then fastcgi_finish_request takes roughly 0.0001 seconds to run. If I do not comment them out, then fastcgi_finish_request takes 7+ seconds to run.
It's actually the function call to is_array that's the issue here
Changing to:
if(!isset($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
Resolves the problem. Why is this?
I'm still relatively new to PHP and trying to use pthreads to solve an issue. I have 20 threads running processes that end at varying times. Most finish around < 10 seconds or so. I don't need all 20, just 10 detected. Once I get to 10, I would like to kill the threads, or to continue on to the next step.
I have tried using set_time_limit to about 20 seconds for each of the threads, but they ignore it and keep running. I am looping through the jobs looking for the join because I didn't want the rest of the program to run but I'm stuck until the slowest one has finished. While pthreads has reduced the time from around a minute to about 30 seconds, I can shave even more time since the first 10 run in about 3 seconds.
Thanks for any help and here is my code:
$count = 0;
foreach ( $array as $i ) {
$imgName = $this->smsId."_$count.jpg";
$name = "LocalCDN/".$imgName;
$stack[] = new AsyncImageModify($i['largePic'], $name);
$count++;
}
// Run the threads
foreach ( $stack as $t ) {
$t->start();
}
// Check if the threads have finished; push the coordinates into an array
foreach ( $stack as $t ) {
if($t->join()){
array_push($this->imgArray, $t->data);
}
}
class class AsyncImageModify extends \Thread{
public $data;
public function __construct($arg, $name, $container) {
$this->arg = $arg;
$this->name = $name;
}
public function run() {
//tried putting the set_time_limit() here, didn't work
if ($this->arg) {
// Get the image
$didWeGetTheImage = Image::getImage($this->arg, $this->name);
if($didWeGetTheImage){
$timestamp1 = microtime(true);
print_r("Starting face detection $this->arg" . "\n");
print_r(" ");
$j = Image::process1($this->name);
if($j){
// lets go ahead and do our image manipulation at this point
$userPic = Image::process2($this->name, $this->name, 200, 200, false, $this->name, $j);
if($userPic){
$this->data = $userPic;
print_r("Back from process2; the image returned is $userPic");
}
}
$endTime = microtime(true);
$td = $endTime-$timestamp1;
print_r("Finished face detection $this->arg in $td seconds" . "\n");
print_r($j);
}
}
}
It is difficult to guess the functionality of Image::* methods, so I can't really answer in any detail.
What I can say, is that there are very few machines I can think of that are suitable to run 20 concurrent threads in any case. A more suitable setup would be the worker/stackable model. A Worker thread is a reuseable context, and can execute task after task, implemented as Stackables; execution in a multi-threaded environment should always use the least amount of threads to get the most work done possible.
Please see pooling example and other examples that are distributed with pthreads, available on github, additionally, much information regarding usage is contained in past bug reports, if you are still struggling after that ...
Very simply, I have a program that needs to perform a large process (anywhere from 5 seconds to several minutes) and I don't want to make my page wait for the process to finish to load.
I understand that I need to run this gearman job as a background process but I'm struggling to identify the proper solution to get real-time status updates as to when the worker actually finishes the process. I've used the following code snippet from the PHP examples:
do {
sleep(3);
$stat = $gmclient->jobStatus($job_handle);
if (!$stat[0]) // the job is known so it is not done
$done = true;
echo "Running: " . ($stat[1] ? "true" : "false") . ", numerator: " . $stat[2] . ", denomintor: " . $stat[3] . "\n";
} while(!$done);
echo "done!\n";
and this works, however it appears that it just returns data to the client when the worker finished telling the job what to do. Instead I want to know when the literal process of the job finished.
My real-life example:
Pull several data feeds from an API (some feeds take longer than others)
Load a couple of the ones that always load fast, place a "Waiting/Loading" animation on the section that was sent off to a worker queue
When the work is done and the results have been completely retrieved, replace the animation with the results
This is a bit late, but I stumbled across this question looking for the same answer. I was able to get a solution together, so maybe it will help someone else.
For starters, refer to the documentation on GearmanClient::jobStatus. This will be called from the client, and the function accepts a single argument: $job_handle. You retrieve this handle when you dispatch the request:
$client = new GearmanClient( );
$client->addServer( '127.0.0.1', 4730 );
$handle = $client->doBackground( 'serviceRequest', $data );
Later on, you can retrieve the status by calling the jobStatus function on the same $client object:
$status = $client->jobStatus( $handle );
This is only meaningful, though, if you actually change the status from within your worker with the sendStatus method:
$worker = new GearmanWorker( );
$worker->addFunction( 'serviceRequest', function( $job ) {
$max = 10;
// Set initial status - numerator / denominator
$job->sendStatus( 0, $max );
for( $i = 1; $i <= $max; $i++ ) {
sleep( 2 ); // Simulate a long running task
$job->sendStatus( $i, $max );
}
return GEARMAN_SUCCESS;
} );
while( $worker->work( ) ) {
$worker->wait( );
}
In versions of Gearman prior to 0.5, you would use the GearmanJob::status method to set the status of a job. Versions 0.6 to current (1.1) use the methods above.
See also this question: Problem With Gearman Job Status
I have this loop to long poll:
$time = time(); //send a blank response after 15sec
while((time() - $time) < 15) {
$last_modif = filemtime("./logs.txt");
if($_SESSION['lasttime'] != $last_modif) {
$_SESSION['lasttime'] = $last_modif;
$logs = file_get_contents("./logs.txt");
print nl2br($logs);
ob_flush();
flush();
die();
}
usleep(10000);
}
problem is: the "if" condition is never entered in the middle of the while loop, even if logs.txt is modified. I have to wait 15sec till the next call to this file to obtain the updated content (so it becomes a regular, "setTimeout style" AJAX polling, not a long-poll). Any idea why ?
This is because of the filemtime() function : its results are cached. Thus each time your loop is executed the timestamp's the same for 15 seconds.
I have not tried it myself, but according to w3cschools :
The result of this function are cached. Use clearstatcache() to clear the cache.
Hope that helps !