After tinkering/testing stuffs for a few days with pthreads I manage to write a script from scratch that actually works for my needs. The point of my script is to run async like 500-1000 threads and keep creating new ones immediately when one is available. However because I'm not an expert in PHP my code may have some flaws or it could be the same task that I want but in another way, in a better way.
Some flaws that I see untill now are:
The script could cause the CPU to go in 100% use because if there is
no sleep used in while loop. See sleep(3) line.
There are flaws in the code or unecessary stuffs. Because I'm noob with pthreads, it may be another way to do the same thing but much more optimised'.
Any suggestion to improve my code is appreciated. The reason why I post this is because I actually didn't find nothing on search engines that is doing this. All the examples are syncronised and all async examples like, run 500 threads and then stop and I was looking to something that it makes 500 threads but keep making another 500 untill reaches end of a file for example.
Script:
<?php
// Maximum number of threads.
$max_threads = 15; // I do plan to use like 500-1000 threads.
// Initiate the threads.
$threads = [];
for($i = 0; $i <= $max_threads; $i++) {
$threads[$i] = new MultiThreads($i);
$threads[$i]->start();
}
// Keep creating threads. Here I plan to make it to read a large file of urls(targets) but for demo/testing purpose I made a while loop.
while(true) {
for($i = 0; $i <= $max_threads; $i++) {
if($threads[$i]->isRunning() != true) { // If is not true(which means thread is busy) keep creating new threads. Why not creating a thread if is available.
$threads[$i] = new MultiThreads($i);
$threads[$i]->start();
}
}
//echo 'Sleeping for a while untill some threads are free.' . "\n"; // Or comment sleep(3) to not wait at all.
//sleep(3);
}
// End of script.
echo 'END! But will never end cause of while loop.' . "\n";
//////////////////////////////////////////////////////////////////
class MultiThreads extends Thread {
private $threadId;
public function __construct(int $id) {
$this->threadId = $id;
}
public function run() {
echo 'Thread(' . $this->threadId . ') started.' . "\n";
sleep(rand(1, 10)); // Simulate some 'work'.
echo 'Thread(' . $this->threadId . ') ended.' . "\n";
}
}
?>
Related
I am running an online shop, and I have a big file with email addresses of customers, but the list is very old, like +15 years old. I bought code from codecanyon that checks the list, but it is very slow.
I tried to make it run multithreaded, but something is very wrong with my code, can you help me out?
The code works, but not the way I want it to. Now it checks the same email x %thread_nr. If I set 10 threads, it checks the same email 10x times.
What is wrong ?
<?php
require_once('email_checker.class.php');
class Task extends Threaded
{
private $value;
public function __construct(int $i)
{
$this->value = $i;
}
public function run()
{
$file_lines = file('mail.txt');
$emailChecker = new emailChecker; // Make a new instance
foreach ($file_lines as $line) {
$response = $emailChecker->check($line);
foreach($response as $result) {
echo $result['query'].'-'.$result['success']."\n";
}
}
}
}
$file_lines = file('mail.txt');
# Create a pool of 4 threads
$pool = new Pool(4);
for ($i = 0; $i < 15000; ++$i)
{
$pool->submit(new Task($i));
}
while ($pool->collect());
$pool->shutdown();
If I was in that situation I would rework the code so its cleaner, and broken into the smallest functions. Checking that the functions work individually.
I would also test that the code works without multi-threading first, with a small portion of your emails.
Is there a way to call multiple file_get_contents without waiting first one to finish. I have a few PHP scripts that do some heavy work and execution time of one is more than 1.5 min so I want to call them all at the same time to reduce execution time. Currently I have multiple file_get_contents but to run next file_get_contents the first one needs to finish first. I tried exec method but it's blocked on server (shared hosting).
Well there are ways of running tasks in parallel if thats what you're trying to achieve but bare in mind PHP is a high level scripting language and the original purpose of it was to serve stateless HTTP request.
There are extensions out there like Gearman, this extension allows applications to complete tasks in parallel.
Obviously this will be in vane for you as you're probably using a service for just hosting. Get something like a vps, a vps from OVH is cheaper than most web hosting services.
Yes, it is possible! But it is little bit tricky!
You have to play with forks.
I agree about queues and else... but just for the sake of ability, here is an example:
<?php
declare(ticks = 1);
$a = [
'https://twitter.com',
'https://facebook.com',
'https://stackoverflow.com',
'https://linkedin.com',
'https://github.com',
];
// Count of php threads (forks or processes).
$max = 3;
// Child counter.
$child = 0;
pcntl_signal(SIGCHLD, function ($signo) {
global $child;
if ($signo === SIGCLD) {
while (($pid = pcntl_wait($signo, WNOHANG)) > 0) {
$signal = pcntl_wexitstatus($signo);
$child--;
}
}
});
foreach ($a as $item) {
while ($child >= $max) {
sleep(1);
}
$child++;
$pid = pcntl_fork();
if ($pid) {
} else {
// Child fork.
sleep(1);
// HERE YOUR CODE:
echo file_get_contents($item);
exit(0);
}
}
while ($child != 0) {
sleep(1);
}
How should I multithread some php-cli code that needs a timeout?
I'm using PHP 5.6 on Centos 6.6 from the command line.
I'm not very familiar with multithreading terminology or code. I'll simplify the code here but it is 100% representative of what I want to do.
The non-threaded code currently looks something like this:
$datasets = MyLibrary::getAllRawDataFromDBasArrays();
foreach ($datasets as $dataset) {
MyLibrary::processRawDataAndStoreResultInDB($dataset);
}
exit; // just for clarity
I need to prefetch all my datasets, and each processRawDataAndStoreResultInDB() cannot fetch it's own dataset. Sometimes processRawDataAndStoreResultInDB() takes too long to process a dataset, so I want to limit the amount of time it has to process it.
So you can see that making it multithreaded would
Speed it up by allowing multiple processRawDataAndStoreResultInDB() to execute at the same time
Use set_time_limit() to limit the amount of time each one has to process each dataset
Notice that I don't need to come back to my main program. Since this is a simplification, you can trust that I don't want to collect all the processed datasets and do a single save into the DB after they are all done.
I'd like to do something like:
class MyWorkerThread extends SomeThreadType {
public function __construct($timeout, $dataset) {
$this->timeout = $timeout;
$this->dataset = $dataset;
}
public function run() {
set_time_limit($this->timeout);
MyLibrary::processRawDataAndStoreResultInDB($this->dataset);
}
}
$numberOfThreads = 4;
$pool = somePoolClass($numberOfThreads);
$pool->start();
$datasets = MyLibrary::getAllRawDataFromDBasArrays();
$timeoutForEachThread = 5; // seconds
foreach ($datasets as $dataset) {
$thread = new MyWorkerThread($timeoutForEachThread, $dataset);
$thread->addCallbackOnTerminated(function() {
if ($this->isTimeout()) {
MyLibrary::saveBadDatasetToDb($dataset);
}
}
$pool->addToQueue($thread);
}
$pool->waitUntilAllWorkersAreFinished();
exit; // for clarity
From my research online I've found the PHP extension pthreads which I can use with my thread-safe php CLI, or I could use the PCNTL extension or a wrapper library around it (say, Arara/Process)
https://github.com/krakjoe/pthreads (and the example directory)
https://github.com/Arara/Process (pcntl wrapper)
When I look at them and their examples though (especially the pthreads pool example) I get confused quickly by the terminology and which classes I should use to achieve the kind of multithreading I'm looking for.
I even wouldn't mind creating the pool class myself, if I had a isRunning(), isTerminated(), getTerminationStatus() and execute() function on a thread class, as it would be a simple queue.
Can someone with more experience please direct me to which library, classes and functions I should be using to map to my example above? Am I taking the wrong approach completely?
Thanks in advance.
Here comes an example using worker processes. I'm using the pcntl extension.
/**
* Spawns a worker process and returns it pid or -1
* if something goes wrong.
*
* #param callback function, closure or method to call
* #return integer
*/
function worker($callback) {
$pid = pcntl_fork();
if($pid === 0) {
// Child process
exit($callback());
} else {
// Main process or an error
return $pid;
}
}
$datasets = array(
array('test', '123'),
array('foo', 'bar')
);
$maxWorkers = 1;
$numWorkers = 0;
foreach($datasets as $dataset) {
$pid = worker(function () use ($dataset) {
// Do DB stuff here
var_dump($dataset);
return 0;
});
if($pid !== -1) {
$numWorkers++;
} else {
// Handle fork errors here
echo 'Failed to spawn worker';
}
// If $maxWorkers is reached we need to wait
// for at least one child to return
if($numWorkers === $maxWorkers) {
// $status is passed by reference
$pid = pcntl_wait($status);
echo "child process $pid returned $status\n";
$numWorkers--;
}
}
// (Non blocking) wait for the remaining childs
while(true) {
// $status is passed by reference
$pid = pcntl_wait($status, WNOHANG);
if(is_null($pid) || $pid === -1) {
break;
}
if($pid === 0) {
// Be patient ...
usleep(50000);
continue;
}
echo "child process $pid returned $status\n";
}
I have a function that needs to go over around 20K rows from an array, and apply an external script to each. This is a slow process, as PHP is waiting for the script to be executed before continuing with the next row.
In order to make this process faster I was thinking on running the function in different parts, at the same time. So, for example, rows 0 to 2000 as one function, 2001 to 4000 on another one, and so on. How can I do this in a neat way? I could make different cron jobs, one for each function with different params: myFunction(0, 2000), then another cron job with myFunction(2001, 4000), etc. but that doesn't seem too clean. What's a good way of doing this?
If you'd like to execute parallel tasks in PHP, I would consider using Gearman. Another approach would be to use pcntl_fork(), but I'd prefer actual workers when it's task based.
The only waiting time you suffer is between getting the data and processing the data. Processing the data is actually completely blocking anyway (you just simply have to wait for it). You will not likely gain any benefits past increasing the number of processes to the number of cores that you have. Basically I think this means the number of processes is small so scheduling the execution of 2-8 processes doesn't sound that hideous. If you are worried about not being able to process data while retrieving data, you could in theory get your data from the database in small blocks, and then distribute the processing load between a few processes, one for each core.
I think I align more with the forking child processes approach for actually running the processing threads. There is a brilliant demonstration in the comments on the pcntl_fork doc page showing an implementation of a job daemon class
http://php.net/manual/en/function.pcntl-fork.php
<?php
declare(ticks=1);
//A very basic job daemon that you can extend to your needs.
class JobDaemon{
public $maxProcesses = 25;
protected $jobsStarted = 0;
protected $currentJobs = array();
protected $signalQueue=array();
protected $parentPID;
public function __construct(){
echo "constructed \n";
$this->parentPID = getmypid();
pcntl_signal(SIGCHLD, array($this, "childSignalHandler"));
}
/**
* Run the Daemon
*/
public function run(){
echo "Running \n";
for($i=0; $i<10000; $i++){
$jobID = rand(0,10000000000000);
while(count($this->currentJobs) >= $this->maxProcesses){
echo "Maximum children allowed, waiting...\n";
sleep(1);
}
$launched = $this->launchJob($jobID);
}
//Wait for child processes to finish before exiting here
while(count($this->currentJobs)){
echo "Waiting for current jobs to finish... \n";
sleep(1);
}
}
/**
* Launch a job from the job queue
*/
protected function launchJob($jobID){
$pid = pcntl_fork();
if($pid == -1){
//Problem launching the job
error_log('Could not launch new job, exiting');
return false;
}
else if ($pid){
// Parent process
// Sometimes you can receive a signal to the childSignalHandler function before this code executes if
// the child script executes quickly enough!
//
$this->currentJobs[$pid] = $jobID;
// In the event that a signal for this pid was caught before we get here, it will be in our signalQueue array
// So let's go ahead and process it now as if we'd just received the signal
if(isset($this->signalQueue[$pid])){
echo "found $pid in the signal queue, processing it now \n";
$this->childSignalHandler(SIGCHLD, $pid, $this->signalQueue[$pid]);
unset($this->signalQueue[$pid]);
}
}
else{
//Forked child, do your deeds....
$exitStatus = 0; //Error code if you need to or whatever
echo "Doing something fun in pid ".getmypid()."\n";
exit($exitStatus);
}
return true;
}
public function childSignalHandler($signo, $pid=null, $status=null){
//If no pid is provided, that means we're getting the signal from the system. Let's figure out
//which child process ended
if(!$pid){
$pid = pcntl_waitpid(-1, $status, WNOHANG);
}
//Make sure we get all of the exited children
while($pid > 0){
if($pid && isset($this->currentJobs[$pid])){
$exitCode = pcntl_wexitstatus($status);
if($exitCode != 0){
echo "$pid exited with status ".$exitCode."\n";
}
unset($this->currentJobs[$pid]);
}
else if($pid){
//Oh no, our job has finished before this parent process could even note that it had been launched!
//Let's make note of it and handle it when the parent process is ready for it
echo "..... Adding $pid to the signal queue ..... \n";
$this->signalQueue[$pid] = $status;
}
$pid = pcntl_waitpid(-1, $status, WNOHANG);
}
return true;
}
}
you can use "PTHREADS"
very easy to install and works great on windows
download from here -> http://windows.php.net/downloads/pecl/releases/pthreads/2.0.4/
Extract the zip file and then
move the file 'php_pthreads.dll' to php\ext\ directory.
move the file 'pthreadVC2.dll' to php\ directory.
then add this line in your 'php.ini' file:
extension=php_pthreads.dll
save the file.
you just done :-)
now lets see example of how to use it:
class ChildThread extends Thread {
public $data;
public function run() {
/* Do some expensive work */
$this->data = 'result of expensive work';
}
}
$thread = new ChildThread();
if ($thread->start()) {
/*
* Do some expensive work, while already doing other
* work in the child thread.
*/
// wait until thread is finished
$thread->join();
// we can now even access $thread->data
}
for more information about PTHREADS read php docs here:
PHP DOCS PTHREADS
if you'r using WAMP like me, then you should add 'pthreadVC2.dll' into
\wamp\bin\apache\ApacheX.X.X\bin
and also edit the 'php.ini' file (same path) and add the same line as before
extension=php_pthreads.dll
GOOD LUCK!
What you are looking for is parallel which is a succinct concurrency API for PHP 7.2+
$runtime = new \parallel\Runtime();
$future = $runtime->run(function() {
for ($i = 0; $i < 500; $i++) {
echo "*";
}
return "easy";
});
for ($i = 0; $i < 500; $i++) {
echo ".";
}
printf("\nUsing \\parallel\\Runtime is %s\n", $future->value());
Output:
.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*..*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
Using \parallel\Runtime is easy
Have a look at pcntl_fork. This allows you to spawn child processes which can then do the separate work that you need.
Not sure if a solution for your situation but you can redirect the output of system calls to a file, thus PHP will not wait until the program is finished. Although this may result in overloading your server.
http://www.php.net/manual/en/function.exec.php - If a program is started with this function, in order for it to continue running in the background, the output of the program must be redirected to a file or another output stream. Failing to do so will cause PHP to hang until the execution of the program ends.
There's Guzzle with its concurrent requests
use GuzzleHttp\Client;
use GuzzleHttp\Promise;
$client = new Client(['base_uri' => 'http://httpbin.org/']);
$promises = [
'image' => $client->getAsync('/image'),
'png' => $client->getAsync('/image/png'),
'jpeg' => $client->getAsync('/image/jpeg'),
'webp' => $client->getAsync('/image/webp')
];
$responses = Promise\Utils::unwrap($promises);
There's the overhead of promises; but more importantly Guzzle only works with HTTP requests and it works with version 7+ and frameworks like Laravel.
I'm trying to create a PHP file, which wouldn't run if it's already running. Here's the code I'm using:
<?php
class Test {
private $tmpfile;
public function action_run() {
$this->die_if_running();
$this->run();
}
private function die_if_running() {
$this->tmpfile = #fopen('.refresher2.pid', "w");
$locked = #flock($this->tmpfile, LOCK_EX|LOCK_NB);
if (! $locked) {
#fclose($this->tmpfile);
die("Running 2");
}
}
private function run() {
echo "NOT RUNNNING";
sleep(100);
}
}
$test = new Test();
$test->action_run();
The problem is, when I run this from console, it works great. But when I try to run it from browser, many instances can run simultaneously. This is on Windows 7, XAMPP, PHP 5.3.2. I guess OS is thinking that it's the same process and thus the functionality falls. Is there a cross-platform way to create a PHP script of this type?
Not really anything to promising. You can't use flock for that like this.
You could use system() to start another (php) process that does the locking for you. But drawbacks:
You need to do interprocess communication. Think about a way how to tell the other program when to release the lock etc. You can use stdin for messenging und use 3 constants or something. In this case it's still rather simple
It's bad for performance because you keep creating processes which is expensive.
Another way would be to start another program that runs all the time. You connect to it using some means of IPC (probably just use a tcp channel because it's cross-platform) and allow this program to manage file acces. That program could be a php script in an endless loop as well, but it will probably be simpler to code this in Java or another language that has multithreading support.
Another way would be to leverage existing ressources. Create a dummy database table for locks, create an entry for the file and then do table-row-locking.
Another way would be not to use files, but a database.
I had a similar problem a while ago.
I needed to have a counter where the number returned was unique.
I used a lock-file and only if this instance was able to create the lock-file was it allowed to read the file with the current number.
Instead of counting up perhaps you can allow the script to run.
The trick is to let try a few times (like 5) with a small wait/sleep in between.
function GetNextNumber()
{
$lockFile = "lockFile.txt";
$lfh = #fopen($lockFile, "x");
if (!$lfh)
{
$lockOkay = false;
$count = 0;
$countMax = 5;
// Try ones every second in 5 seconds
while (!$lockOkay & $count < $countMax)
{
$lfh = #fopen($lockFile, "x");
if ($lfh)
{
$lockOkay = true;
}
else
{
$count++;
sleep(1);
}
}
}
if ($lfh)
{
$fh = fopen($myFile, 'r+') or die("Too many users. ");
flock($fh, LOCK_EX);
$O_nextNumber = fread($fh, 15);
$O_nextNumber = $O_nextNumber + 1;
rewind($fh);
fwrite($fh, $O_knr);
flock($fh, LOCK_UN);
fclose($fh);
unlink($lockFile); // Sletter lockfilen
}
return $O_nextNumber;
}