Correct way to run multithreading in a loop - php

I am running an online shop, and I have a big file with customers' email addresses, but the list is very old, more than 15 years. I bought code from CodeCanyon that checks the list, but it is very slow.
I tried to make it run multithreaded, but something is wrong with my code. Can you help me out?
The code runs, but not the way I want it to: each email is checked as many times as there are threads. If I set 10 threads, it checks the same email 10 times.
What is wrong?
<?php
require_once('email_checker.class.php');
class Task extends Threaded
{
private $value;
public function __construct(int $i)
{
$this->value = $i;
}
public function run()
{
$file_lines = file('mail.txt');
$emailChecker = new emailChecker; // Make a new instance
foreach ($file_lines as $line) {
$response = $emailChecker->check($line);
foreach($response as $result) {
echo $result['query'].'-'.$result['success']."\n";
}
}
}
}
$file_lines = file('mail.txt');
# Create a pool of 4 threads
$pool = new Pool(4);
for ($i = 0; $i < 15000; ++$i)
{
$pool->submit(new Task($i));
}
while ($pool->collect());
$pool->shutdown();

If I were in that situation, I would rework the code so it's cleaner and broken into small functions, checking that each function works on its own.
I would also test that the code works without multithreading first, on a small portion of your emails.
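One likely cause of the duplicate checks is that every Task calls file('mail.txt') inside run(), so each task walks the whole list. A minimal sketch of reading the list once and splitting it into batches follows. The function names loadEmails() and partitionEmails() are hypothetical and not part of the purchased checker. The sketch runs without pthreads, so it can be tested before any threading is added.

```php
<?php
// Hypothetical helper: read the list once, trim whitespace, drop blanks and duplicates.
function loadEmails(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    $emails = array_filter(array_map('trim', $lines), 'strlen');
    return array_values(array_unique($emails));
}

// Hypothetical helper: split the list into at most $batches roughly equal parts.
function partitionEmails(array $emails, int $batches): array
{
    if ($batches < 1) {
        throw new InvalidArgumentException('Need at least one batch.');
    }
    if ($emails === []) {
        return [];
    }
    return array_chunk($emails, (int) ceil(count($emails) / $batches));
}
```

Each Task would then receive one batch (or one address) through its constructor and loop only over that, instead of reading the file itself.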

Related

PHP pthreads async optimisation

After tinkering with pthreads for a few days, I managed to write a script from scratch that works for my needs. The point of the script is to run 500-1000 threads asynchronously and create a new one as soon as a slot becomes available. However, because I'm not an expert in PHP, my code may have flaws, or there may be a better way to do the same task.
Some flaws that I see so far:
The script could push the CPU to 100% usage because there is no sleep in the while loop. See the sleep(3) line.
There may be flaws or unnecessary parts in the code. Because I'm new to pthreads, there may be a much more optimised way to do the same thing.
Any suggestion to improve my code is appreciated. I'm posting this because I couldn't find anything on search engines that does this. All the examples are either synchronised or asynchronous only in the sense of running 500 threads and then stopping; I was looking for something that starts 500 threads and keeps starting more until it reaches the end of a file, for example.
Script:
<?php
// Maximum number of threads.
$max_threads = 15; // I do plan to use like 500-1000 threads.
// Initiate the threads.
$threads = [];
for($i = 0; $i <= $max_threads; $i++) {
$threads[$i] = new MultiThreads($i);
$threads[$i]->start();
}
// Keep creating threads. Here I plan to make it to read a large file of urls(targets) but for demo/testing purpose I made a while loop.
while(true) {
for($i = 0; $i <= $max_threads; $i++) {
if($threads[$i]->isRunning() != true) { // The thread in this slot is no longer running, so the slot is free: start a new thread in it.
$threads[$i] = new MultiThreads($i);
$threads[$i]->start();
}
}
//echo 'Sleeping for a while until some threads are free.' . "\n"; // Or leave sleep(3) commented out to not wait at all.
//sleep(3);
}
// End of script.
echo 'END! But will never end cause of while loop.' . "\n";
//////////////////////////////////////////////////////////////////
class MultiThreads extends Thread {
private $threadId;
public function __construct(int $id) {
$this->threadId = $id;
}
public function run() {
echo 'Thread(' . $this->threadId . ') started.' . "\n";
sleep(rand(1, 10)); // Simulate some 'work'.
echo 'Thread(' . $this->threadId . ') ended.' . "\n";
}
}
?>

voryx thruway multiple publish

I need to publish messages from a PHP script. Publishing a single message works fine, but now I need to publish different messages in a loop and can't find the proper way to do it. Here is what I tried:
$counter = 0;
$closure = function (\Thruway\ClientSession $session) use ($connection, &$counter) {
//$counter will be always 5
$session->publish('com.example.hello', ['Hello, world from PHP!!! '.$counter], [], ["acknowledge" => true])->then(
function () use ($connection) {
$connection->close(); //You must close the connection or this will hang
echo "Publish Acknowledged!\n";
},
function ($error) {
// publish failed
echo "Publish Error {$error}\n";
}
);
};
while($counter<5){
$connection->on('open', $closure);
$counter++;
}
$connection->open();
Here I want to publish the $counter value to subscribers, but the value is always 5.
1. Is there a way to open the connection before the loop and then publish messages inside the loop?
2. How can I access $session->publish() from the loop?
Thanks!
There are a couple different ways to accomplish this. Most simply:
$client = new \Thruway\Peer\Client('realm1');
$client->setAttemptRetry(false);
$client->addTransportProvider(new \Thruway\Transport\PawlTransportProvider('ws://127.0.0.1:9090'));
$client->on('open', function (\Thruway\ClientSession $clientSession) {
for ($i = 0; $i < 5; $i++) {
$clientSession->publish('com.example.hello', ['Hello #' . $i]);
}
$clientSession->close();
});
$client->start();
There is nothing wrong with making many short connections to the router. If you are running in a daemon process, though, it would probably make more sense to set up something that reuses the same client connection and lets the React event loop drive the work instead of while(1):
$loop = \React\EventLoop\Factory::create();
$client = new \Thruway\Peer\Client('realm1', $loop);
$client->addTransportProvider(new \Thruway\Transport\PawlTransportProvider('ws://127.0.0.1:9090'));
$loop->addPeriodicTimer(0.5, function () use ($client) {
// The other stuff you want to do every half second goes here
$session = $client->getSession();
if ($session && ($session->getState() == \Thruway\ClientSession::STATE_UP)) {
$session->publish('com.example.hello', ['Hello again']);
}
});
$client->start();
Notice that the $loop is now being passed into the client constructor and also that I got rid of the line disabling automatic reconnect (so if there are network issues, your script will reconnect).
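As for why the value in the question is always 5: the closure captures $counter by reference (use (&$counter)), and it only runs once the connection opens, after the while loop has finished incrementing. The sketch below reproduces that behaviour in plain PHP, without Thruway. The variable names are illustrative only.

```php
<?php
$handlers = [];
for ($counter = 0; $counter < 3; $counter++) {
    // Bound to the variable itself: sees whatever value it has when called.
    $byReference = function () use (&$counter) {
        return $counter;
    };
    // Bound to a copy taken at the moment the closure is created.
    $byValue = function () use ($counter) {
        return $counter;
    };
    $handlers[] = [$byReference, $byValue];
}

// The handlers run only after the loop has ended, like an 'open' callback.
$seenByReference = [];
$seenByValue = [];
foreach ($handlers as $pair) {
    $seenByReference[] = $pair[0](); // 3 every time
    $seenByValue[] = $pair[1]();     // 0, then 1, then 2
}
```

Running the loop inside the callback, as the first snippet in this answer does, avoids the problem because each publish call receives the current value directly.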

Best way to offload one-shot worker threads in PHP? pthreads? fcntl?

How should I multithread some php-cli code that needs a timeout?
I'm using PHP 5.6 on Centos 6.6 from the command line.
I'm not very familiar with multithreading terminology or code. I'll simplify the code here but it is 100% representative of what I want to do.
The non-threaded code currently looks something like this:
$datasets = MyLibrary::getAllRawDataFromDBasArrays();
foreach ($datasets as $dataset) {
MyLibrary::processRawDataAndStoreResultInDB($dataset);
}
exit; // just for clarity
I need to prefetch all my datasets, and each processRawDataAndStoreResultInDB() cannot fetch its own dataset. Sometimes processRawDataAndStoreResultInDB() takes too long to process a dataset, so I want to limit the amount of time it has to process it.
So you can see that making it multithreaded would:
Speed it up by allowing multiple processRawDataAndStoreResultInDB() calls to execute at the same time
Let me use set_time_limit() to limit the amount of time each one has to process its dataset
Notice that I don't need to come back to my main program. Since this is a simplification, you can trust that I don't want to collect all the processed datasets and do a single save into the DB after they are all done.
I'd like to do something like:
class MyWorkerThread extends SomeThreadType {
public function __construct($timeout, $dataset) {
$this->timeout = $timeout;
$this->dataset = $dataset;
}
public function run() {
set_time_limit($this->timeout);
MyLibrary::processRawDataAndStoreResultInDB($this->dataset);
}
}
$numberOfThreads = 4;
$pool = somePoolClass($numberOfThreads);
$pool->start();
$datasets = MyLibrary::getAllRawDataFromDBasArrays();
$timeoutForEachThread = 5; // seconds
foreach ($datasets as $dataset) {
$thread = new MyWorkerThread($timeoutForEachThread, $dataset);
$thread->addCallbackOnTerminated(function() use ($dataset) {
if ($this->isTimeout()) {
MyLibrary::saveBadDatasetToDb($dataset);
}
});
$pool->addToQueue($thread);
}
$pool->waitUntilAllWorkersAreFinished();
exit; // for clarity
From my research online I've found the PHP extension pthreads, which I can use with my thread-safe PHP CLI, or I could use the PCNTL extension or a wrapper library around it (say, Arara/Process):
https://github.com/krakjoe/pthreads (and the example directory)
https://github.com/Arara/Process (pcntl wrapper)
When I look at them and their examples though (especially the pthreads pool example) I get confused quickly by the terminology and which classes I should use to achieve the kind of multithreading I'm looking for.
I wouldn't even mind creating the pool class myself if I had isRunning(), isTerminated(), getTerminationStatus() and execute() functions on a thread class, as it would be a simple queue.
Can someone with more experience please direct me to which library, classes and functions I should be using to map to my example above? Am I taking the wrong approach completely?
Thanks in advance.
Here comes an example using worker processes. I'm using the pcntl extension.
/**
* Spawns a worker process and returns its PID, or -1
* if something goes wrong.
*
* @param callable $callback function, closure or method to call
* @return integer
*/
function worker($callback) {
$pid = pcntl_fork();
if($pid === 0) {
// Child process
exit($callback());
} else {
// Main process or an error
return $pid;
}
}
$datasets = array(
array('test', '123'),
array('foo', 'bar')
);
$maxWorkers = 1;
$numWorkers = 0;
foreach($datasets as $dataset) {
$pid = worker(function () use ($dataset) {
// Do DB stuff here
var_dump($dataset);
return 0;
});
if($pid !== -1) {
$numWorkers++;
} else {
// Handle fork errors here
echo 'Failed to spawn worker';
}
// If $maxWorkers is reached we need to wait
// for at least one child to return
if($numWorkers === $maxWorkers) {
// $status is passed by reference
$pid = pcntl_wait($status);
echo "child process $pid returned $status\n";
$numWorkers--;
}
}
// (Non-blocking) wait for the remaining children
while(true) {
// $status is passed by reference
$pid = pcntl_wait($status, WNOHANG);
if(is_null($pid) || $pid === -1) {
break;
}
if($pid === 0) {
// Be patient ...
usleep(50000);
continue;
}
echo "child process $pid returned $status\n";
}
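The example above does not cover the per-dataset timeout from the question. One possible approach, sketched here under the assumption that the pcntl extension is available and that no SIGALRM handler is installed in the child, is to call pcntl_alarm() in the child and let the default signal action terminate it. The helper name runWithTimeout() and its return strings are hypothetical.

```php
<?php
/**
 * Hypothetical helper: runs $callback in a forked child and reports
 * 'ok', 'timeout', 'failed' or 'fork-error'.
 */
function runWithTimeout(callable $callback, int $seconds): string
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        return 'fork-error';
    }
    if ($pid === 0) {
        // Child: SIGALRM arrives after $seconds; with no handler installed,
        // the default action terminates the process.
        pcntl_alarm($seconds);
        exit((int) $callback());
    }
    // Parent: block until this particular child has gone away.
    pcntl_waitpid($pid, $status);
    if (pcntl_wifsignaled($status) && pcntl_wtermsig($status) === SIGALRM) {
        return 'timeout';
    }
    $exitedCleanly = pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0;
    return $exitedCleanly ? 'ok' : 'failed';
}
```

This version waits for each child in turn, so it shows only the timeout detection. The same checks on $status could be applied after pcntl_wait() in the loop above to keep several workers running at once and record the datasets whose worker timed out.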

Fastest or most robust way to make 7 soap api requests in parallel

My web app needs to make 7 different SOAP (WSDL) API requests to complete one task, and users have to wait for the results of all of them. The average response time is 500 ms to 1.7 seconds per request. I need to run all these requests in parallel to speed up the process.
What's the best way to do that?
pthreads
Gearman workers
forking processes
curl multi (I would have to build the XML SOAP body myself)
Well the first thing to say is, it's never really a good idea to create threads in direct response to a web request, think about how far that will actually scale.
If you create 7 threads for everyone that comes along and 100 people turn up, you'll be asking your hardware to execute 700 threads concurrently, which is quite a lot to ask of anything really...
However, scalability is not something I can usefully help you with, so I'll just answer the question.
<?php
/* the first service I could find that worked without authorization */
define("WSDL", "http://www.webservicex.net/uklocation.asmx?WSDL");
class CountyData {
/* this works around simplexmlelements being unsafe (and shit) */
public function __construct(SimpleXMLElement $element) {
$this->town = (string)$element->Town;
$this->code = (string)$element->PostCode;
}
public function run(){}
protected $town;
protected $code;
}
class GetCountyData extends Thread {
public function __construct($county) {
$this->county = $county;
}
public function run() {
$soap = new SoapClient(WSDL);
$result = $soap->getUkLocationByCounty(array(
"County" => $this->county
));
foreach (simplexml_load_string(
$result->GetUKLocationByCountyResult) as $element) {
$this[] = new CountyData($element);
}
}
protected $county;
}
$threads = [];
$thread = 0;
$threaded = true; # change to false to test without threading
$counties = [ # will create as many threads as there are counties
"Buckinghamshire",
"Berkshire",
"Yorkshire",
"London",
"Kent",
"Sussex",
"Essex"
];
while ($thread < count($counties)) {
$threads[$thread] =
new GetCountyData($counties[$thread]);
if ($threaded) {
$threads[$thread]->start();
} else $threads[$thread]->run();
$thread++;
}
if ($threaded)
foreach ($threads as $thread)
$thread->join();
foreach ($threads as $county => $data) {
printf(
"Data for %s %d\n", $counties[$county], count($data));
}
?>
Note that the SoapClient instance is not, and cannot be, shared between threads. This may well slow you down, so you might want to enable caching of WSDLs ...
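A minimal sketch of enabling WSDL caching is shown below, assuming the soap extension is loaded. The ini settings and the cache_wsdl option are standard for that extension. The helper name wsdlCachingOptions() and the one-day TTL are assumptions made for illustration.

```php
<?php
// Hypothetical helper: turns on the soap extension's WSDL cache and
// returns the options array to pass to the SoapClient constructor.
function wsdlCachingOptions(int $ttlSeconds = 86400): array
{
    ini_set('soap.wsdl_cache_enabled', '1');
    ini_set('soap.wsdl_cache_ttl', (string) $ttlSeconds);
    // WSDL_CACHE_BOTH requests both the disk and the in-memory cache.
    return ['cache_wsdl' => WSDL_CACHE_BOTH];
}

// Inside GetCountyData::run(), each thread would still build its own client:
// $soap = new SoapClient(WSDL, wsdlCachingOptions());
```

With caching enabled, later clients should be able to reuse the cached WSDL instead of downloading and parsing it for every request.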

PHP - Multiple instances of script accessing same resources

I have to analyze a lot of information.
To speed things up, I'll be running multiple instances of the same script at the same time.
However, there is a big chance that the scripts would analyze the same piece of information (a duplicate), which I want to avoid because it would slow down the process.
When running only one instance, I solve this problem with an array in which I save what has already been analyzed.
So my question is: how could I sync that array with the other "threads"?
MySQL is an option but I guess it would be overkill?
I read also about memory sharing but not sure if this is solution I am looking for.
So if anyone has some suggestions let me know.
Regards
This is a trivial task using real multi-threading:
<?php
/* we want logs to be readable so we are creating a mutex for output */
define ("LOG", Mutex::create());
/* basically a thread safe printf */
function slog($message, $format = null) {
$format = func_get_args();
if ($format) {
$message = array_shift($format);
if ($message) {
Mutex::lock(LOG);
echo vsprintf(
$message, $format);
Mutex::unlock(LOG);
}
}
}
/* any pthreads descendant would do */
class S extends Stackable {
public function run(){}
}
/* a thread that manipulates the shared data until it's all gone */
class T extends Thread {
public function __construct($shared) {
$this->shared = $shared;
}
public function run() {
/* you could also use ::chunk if you wanted to bite off a bit more work */
while (($next = $this->shared->shift())) {
slog(
"%lu working with item #%d\n", $this->getThreadId(), $next);
}
}
}
$shared = new S();
/* fill with dummy data */
while (@$o++ < 10000) {
$shared[]=$o;
}
/* start some threads */
$threads = array();
while (@$thread++ < 5) {
$threads[$thread] = new T($shared);
$threads[$thread]->start();
}
/* join all threads */
foreach ($threads as $thread)
$thread->join();
/* important; ::destroy what you ::create */
Mutex::destroy(LOG);
?>
The slog() function isn't necessarily required for your use case, but thought it useful to show an executable example with readable output.
The main gist of it is that multiple threads need only a reference to a common set of data to manipulate that data ...
