I have a PHP script that processes data downloaded from multiple REST APIs into a standardized format and builds an array or table of this data. The script currently executes everything synchronously and therefore takes too long.
I have been trying to learn how to execute the function that fetches and processes the data, simultaneously or asynchronously so that the total time is the time of the slowest call. From my research it appears that ReactPHP or Amp are the correct tools.
However, I have been unsuccessful in creating test code that actually executes correctly. A simple example is attached, with mysquare() representing my more complex function. Due to a lack of examples on the net of exactly what I'm trying to achieve I have been forced to use a brute force method with 3 examples listed in my code.
Q1: Am I using the right tool for the job?
Q2: Can you fix my example code to execute asynchronously?
NB: I am a real beginner, so the simplest possible code example with a minimum of high level programming lingo would be appreciated.
<?php
require_once("../vendor/autoload.php");
for ($i = 0; $i <= 4; $i++) {
// Experiment 1
$deferred[$i] = new React\Promise\Deferred(function () use ($i) {
echo $x."\n";
usleep(rand(0, 3000000)); // Simulates long network call
return array($x=> $x * $x);
});
// Experiment 2
$promise[$i]=$deferred[$i]->promise(function () use ($i) {
echo $x."\n";
usleep(rand(0, 3000000)); // Simulates long network call
return array($x=> $x * $x);
});
// Experiment 3
$functioncall[$i] = function () use ($i) {
echo $x."\n";
usleep(rand(0, 3000000)); // Simulates long network call
return array($x=> $x * $x);
};
}
$promises = React\Promise\all($deferred); // Doesn't work
$promises = React\Promise\all($promise); // Doesn't work
$promises = React\Promise\all($functioncall); // Doesn't work
// print_r($promises); // Doesn't return array of results but a complex object
// This is what I would like to execute simultaneously with a variety of inputs
function mysquare($x)
{
echo $x."\n";
usleep(rand(0, 3000000)); // Simulates long network call
return array($x=> $x * $x);
}
Asynchronous doesn't mean that multiple threads execute in parallel. Two functions can only really run at the 'same time' if they are (for example) waiting on I/O such as an HTTP request.
usleep() blocks, so you gain nothing. Both ReactPHP and Amp have some kind of 'sleep' function of their own that's built right into the event loop.
For the same reason you will not be able to just use curl, because it will also block out of the box. You need to use the HTTP libraries that React and Amp provide and/or recommend.
Since your end-goal is just doing HTTP requests, you could also not use any of these frameworks and just use the curl_multi functions. They're a bit hard to use though.
I'm answering my own question in an attempt to help other users, however this solution was developed alone without the help of an experienced programmer and so I do not know if it is ultimately the best way to do this.
TL;DR
I switched from ReactPHP, which I didn't understand, to amphp/parallel-functions, which offers a simplified end-user interface. Sample code using this interface is attached.
<?php
require_once("../vendor/autoload.php");
use function Amp\ParallelFunctions\parallelMap;
use function Amp\Promise\wait;
$start = \microtime(true);
$mysquare = function ($x) {
sleep($x); // Simulates long network call
//echo $x."\n";
return $x * $x;
};
print_r(wait(parallelMap([5,4,3,2,1,6,7,8,9,10], $mysquare)));
print 'Took ' . (\microtime(true) - $start) . ' seconds.' . \PHP_EOL; // microtime(true) returns seconds
The example code executes in 10.2 seconds which is slightly longer than the longest running instance of $mysquare().
In my actual use case I was able to fetch data via HTTP from 90 separate sources in around 5 seconds.
Notes:
The amphp/parallel-functions library appears to be using threads or separate worker processes under the hood. From my preliminary experience this appears to require a lot more memory than a single-threaded PHP script, but I haven't yet ascertained the full impact. This was highlighted when I was passing a large array (65 MB) to $mysquare via the "use ($myarray)" expression. This brought the code to a standstill and increased execution time so much that it took orders of magnitude longer than synchronous execution. Memory usage also peaked at over 5 GB at one point, leading me to believe that amphp was duplicating $myarray for each instance. Reworking my code to avoid the "use ($myarray)" expression fixed that problem.
Related
I am trying to make a PHP function that updates every second using only PHP itself, no other languages.
function exp(){
//do something
}
I want it to return a value each second. Like update every second.
For an application server (not a web server), best practice is to use an event loop pattern instead of sleep. This gives you the ability to run multiple timers should the need arise (sleep is blocking so nothing else can run in the mean time). Web servers on the other hand should not really be executing any long-running scripts.
Whilst other languages give you event loops out of the box (Node / JS for example, with setInterval), PHP does not, so you have to either use a well-known library or make your own. ReactPHP is a widely used event loop for PHP.
Here is a quick-and-dirty "hello world" implementation of an event loop
define("INTERVAL", 5 ); // 5 seconds
function runIt() { // Your function to run every 5 seconds
echo "something\n";
}
function checkForStopFlag() { // completely optional
// Logic to check for a program-exit flag
// Could be via socket or file etc.
// Return TRUE to stop.
return false;
}
function start() {
$active = true;
$nextTime = microtime(true) + INTERVAL; // Set initial delay
while($active) {
usleep(1000); // optional, if you want to be considerate
if (microtime(true) >= $nextTime) {
runIt();
$nextTime = microtime(true) + INTERVAL;
}
// Do other stuff (you can have as many other timers as you want)
$active = !checkForStopFlag();
}
}
start();
In the real world you would encapsulate this nicely in a class with all the bells and whistles.
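As a rough illustration of that encapsulation, here is a minimal sketch of a timer loop class that can hold several timers at once. SimpleTimerLoop, addPeriodicTimer() and run() are names made up for this example and are not part of any library; the loop also simply stops after a fixed duration instead of checking a stop flag.

```php
<?php
// Hypothetical minimal timer loop, for illustration only.
class SimpleTimerLoop
{
    // Each entry holds: interval (seconds), callback, next due time.
    private $timers = [];

    public function addPeriodicTimer(float $interval, callable $callback): void
    {
        $this->timers[] = [
            'interval' => $interval,
            'callback' => $callback,
            'next'     => microtime(true) + $interval,
        ];
    }

    // Run all timers until $duration seconds have passed.
    public function run(float $duration): void
    {
        $end = microtime(true) + $duration;
        while (microtime(true) < $end) {
            foreach ($this->timers as &$timer) {
                if (microtime(true) >= $timer['next']) {
                    ($timer['callback'])();
                    $timer['next'] = microtime(true) + $timer['interval'];
                }
            }
            unset($timer); // drop the reference left over from foreach
            usleep(1000);  // be considerate to the CPU
        }
    }
}

$loop = new SimpleTimerLoop();
$loop->addPeriodicTimer(0.2, function () { echo "fast timer\n"; });
$loop->addPeriodicTimer(0.5, function () { echo "slow timer\n"; });
$loop->run(1.1);
```

Every callback still runs on the same thread, so a slow callback delays all the other timers.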
Word about threading:
PHP is single-threaded under the hood (any OS threading must be managed manually by the programmer, which comes with a significant learning curve). So every task in your event loop will hold up the tasks that follow. Node, for comparison, also runs your JavaScript on a single thread, but its built-in event loop hands I/O off to the operating system or a background thread pool, taking that "worry" away from the programmer (which is a topic of much debate). So when you call setInterval(), the callback is queued on that loop and the rest of your JavaScript keeps running in the meantime.
Quick final note:
It could be argued that this pattern is overkill if all you want to do is have a single function do something every 5 seconds. But in the case where you start needing concurrent timers, sleep() will not be the right tool for the job.
The sleep() function is what you are looking for:
while (true) {
my_function(); // Call your function
sleep(5);
}
A while loop whose condition is always true
Call your function inside the while loop
Wait for 5 seconds (sleep)
Return to the beginning of the loop
By the way, an endless loop is not a sensible use case in PHP if you are executing the script through a web protocol (HTTP, HTTPS, etc.), because you will get a timeout. A sensible use case could be a periodic database updater or a web crawler.
Such scripts can be executed from the command line using php myscript.php. An alternative (but not recommended) way is to use set_time_limit to extend the limit if you insist on using a web protocol to execute the script.
function exp(){
//do something
}
while(true){
exp();
sleep(5);
}
Use the sleep function to pause execution for 5 seconds.
It would be better to use setInterval and AJAX to perform your action.
$t0 = microtime(true);
$i = 0;
do{
$dt = round(microtime(true)-$t0);
if($dt!= $i){
$i = $dt;
if(($i % 5) == 0) //every 5 seconds
echo $i.PHP_EOL;
}
}while($dt<10); //max execution time
Suppose exp() is your function
function exp(){
//do something
}
Now we are starting a do-while loop
$status=TRUE;
do {
exp(); // Call your function
sleep(5); //wait for 5 sec for next function call
//you can set $status as FALSE if you want get out of this loop.
//if(somecondition){
// $status=FALSE;
//}
} while($status==TRUE); //loop will run infinite
I hope this one helps :)
It's not preferable to do this in PHP. Try to do it on the client side by calculating the difference between the time you got from the database and the current time.
You can do this in JS like this:
setInterval(function(){
// method to be executed;
},5000); // run every 5 seconds
I have a PHP script with three functions like this:
public function a($html,$text)
{
//blaa
return array();
}
public function b($html,$text){
//blaa
return array();
}
public function c($html,$text){
//blaa
return array();
}
require_once 'simple_html_dom.php';
$a=array();
$html=new simple_html_dom();
$a=$this->a($html,$text);
$b=$this->b($html,$text);
$c=$this->c($html,$text);
$html->clear();
unset($html);
$a=array_merge($a, $c);
$a=array_merge($a, $b);
a($html,$text) takes 5 seconds before giving a result
b($html,$text) takes 10 seconds before giving a result
c($html,$text) takes 12 seconds before giving a result
Thus the system takes 27 seconds before giving me a result, but I want to get my result in 12 seconds. I can't use threads because my hosting does not support threads. How can I solve this problem?
PHP does not support this out of the box. If you really want to do this, you have two basic options (yep, it's going to be dirty). If you want a serious solution depending on your actual use-case, there is another option to consider.
Option 1: Use some AJAX-trickery
Create a page with a button that triggers three AJAX-calls to the different functions that you want to call.
Option 2: Run a command
If you're on UNIX, you can trigger a command from the PHP script to run another PHP script (php xyz.php), which then runs in a separate process.
Serious option: use queues
Seriously: use a queue system like rabbitMQ or BeanstalkD to do these kind of things. Laravel supports it out of the box.
If the wait time is caused by blocking IO (waiting for server response) then curl_multi might help.
From the code you posted, though, it doesn't look like that is your problem.
It looks more like Simple HTML DOM is taking a long time to parse your HTML. That's not too surprising because it's not a very good library. If this is the case you should consider switching to DOMXPath.
You might want to look into jQuery deferred objects. $.when should handle this kind of situation.
So I have this function for making non-blocking curl requests. It works fine on what I've tested so far (small amounts of requests). But I need this to scale up to thousands of requests (maybe max 10,000). My issue is that I don't want to run into issues with too many parallel requests running at once.
What would you suggest to rate-limit the requests? Usleep? Requests in batches? The function is below:
function poly_curl($requests){
$queue = curl_multi_init();
$curl_array = array();
$count = 0;
foreach($requests as $request)
{
$curl_array[$count] = curl_init($request);
curl_setopt($curl_array[$count], CURLOPT_RETURNTRANSFER, true);
curl_multi_add_handle($queue, $curl_array[$count]);
$count++;
}
$running = NULL;
do {
curl_multi_exec($queue,$running);
} while($running > 0);
$res = array();
$count = 0;
foreach($requests as $request)
{
$res[$count] = curl_multi_getcontent($curl_array[$count]);
$count++;
}
$count = 0;
foreach($requests as $request){
curl_multi_remove_handle($queue, $curl_array[$count]);
$count++;
}
curl_multi_close($queue);
return $res;
}
I think curl_multi_exec, used in fixed batches like this, is a poor fit for this purpose: even with batches of 100, 99 requests could be finished and you would still have to wait for the last one to complete.
What you need is 100 parallel requests where, as soon as one finishes, another is started immediately. Simple batching with curl_multi_exec cannot give you that.
I would use a normal producer-consumer algorithm with a constant number of consumers, each processing only one URL at a time. For example, php-resque with COUNT=100 php resque.php
You may want to implement something that is called Exponential Backoff (wikipedia).
Basically, it is an algorithm that allows you dynamically scale your processes depending on some feedback.
You define a rate in your application, and on the first time-out, error, or anything you decide, you decrease this rate until the request finishes.
You may implement it easily using the HTTP response code for example.
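A minimal sketch of the retry side of exponential backoff, assuming the operation signals failure by throwing an exception. retry_with_backoff() and its parameters are hypothetical names for this example; in real code the callable would perform the HTTP request and throw on a time-out or an error response code.

```php
<?php
// Hypothetical helper: call $operation until it succeeds, doubling the
// wait between attempts (base, 2x base, 4x base, ...).
function retry_with_backoff(callable $operation, int $maxAttempts = 5, int $baseDelayMs = 100)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // give up and let the caller handle the failure
            }
            $delayMs = $baseDelayMs * (2 ** ($attempt - 1));
            usleep($delayMs * 1000); // usleep() takes microseconds
        }
    }
}

// Usage: an operation that fails twice and then succeeds.
$calls = 0;
$result = retry_with_backoff(function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('simulated time-out');
    }
    return 'response body';
}, 5, 10);
echo $result, ' after ', $calls, " attempts\n";
```

Production code usually also adds a random jitter to each delay so that many clients do not retry at the same instant.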
Last time I did something like this, it involved downloading and "parsing" files. I could process only 4 subpages at a time, limited by a very weak processor (2 cores with HT). I ended up with two queues: one for waiting tasks and one for tasks in progress. Every time a task left the second queue, a new one was taken from the first.
It may sound complicated, but it ended up as two nested loops and a few simple count() calls.
By the way, considering such a high rate, I would think about using Node.js, for simplicity, or anything more non-blocking and better suited to daemons than PHP. Since threads are a weak point of PHP, it just does not fit well here.
PS: nice & useful bit of code, thanks.
We used to face the same problem with C++ connection pooling code. The approach in those days involved some serious analysis.
But the essence was that we created a pool, and requests would get processed depending on the number of available requests. We also assigned a maximum number of connection pools (this was determined by testing).
What you really need is a method to determine how many requests are being processed and to put a limit on it. In your case that is $count.
Just compare $count to a maximum value (say, $max) and stop there. Define the value depending on the system the program runs on. $max could be hardcoded or dynamic.
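One way to apply such a limit with the curl_multi functions is a "rolling window": never keep more than a fixed number of transfers in flight and start a new one whenever one finishes. The sketch below is only an illustration under those assumptions; fetch_all_limited() and $maxParallel are hypothetical names, error handling is reduced to storing null, and array_key_first() needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: fetch all URLs, at most $maxParallel at a time.
// Returns the bodies keyed like the input array (null on failure).
function fetch_all_limited(array $urls, int $maxParallel = 10): array
{
    $maxParallel = max(1, $maxParallel);
    $multi    = curl_multi_init();
    $pending  = $urls; // URLs that have not been started yet
    $inFlight = 0;     // transfers currently running
    $results  = [];

    do {
        // Top up the window with waiting URLs.
        while ($inFlight < $maxParallel && $pending !== []) {
            $key = array_key_first($pending);
            $ch  = curl_init($pending[$key]);
            unset($pending[$key]);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_PRIVATE, (string) $key); // remember the key
            curl_multi_add_handle($multi, $ch);
            $inFlight++;
        }

        curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }

        // Collect every transfer that has finished.
        while ($info = curl_multi_info_read($multi)) {
            $ch  = $info['handle'];
            $key = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$key] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($multi, $ch);
            $inFlight--;
        }
    } while ($inFlight > 0 || $pending !== []);

    curl_multi_close($multi);
    return $results;
}
```

Results arrive in completion order, so sort them by key afterwards if the original order matters.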
The question sort of says it all - is there a function which does the same as the JavaScript function setTimeout() for PHP? I've searched php.net, and I can't seem to find any...
There is no way to delay execution of part of the code in the current script. It wouldn't make much sense, either, as the processing of a PHP script takes place entirely on the server side and you would just delay the overall execution of the script. There is sleep(), but that will simply halt the process for a certain time.
You can, of course, schedule a PHP script to run at a specific time using cron jobs and the like.
There's the sleep function, which pauses the script for a determined amount of time.
See also usleep, time_nanosleep and time_sleep_until.
PHP isn't event driven, so a setTimeout doesn't make much sense. You can certainly mimic it and in fact, someone has written a Timer class you could use. But I would be careful before you start programming in this way on the server side in PHP.
A few things I'd like to note about timers in PHP:
1) Timers in PHP make sense when used in long-running scripts (daemons and, maybe, in CLI scripts). So if you're not developing that kind of application, then you don't need timers.
2) Timers can be blocking and non-blocking. If you're using sleep(), then it's a blocking timer, because your script just freezes for a specified amount of time.
For many tasks blocking timers are fine. For example, sending statistics every 10 seconds. It's ok to block the script:
while (true) {
sendStat();
sleep(10);
}
3) Non-blocking timers make sense only in event driven apps, like websocket-server. In such applications an event can occur at any time (e.g incoming connection), so you must not block your app with sleep() (obviously).
For this purpose there are event-loop libraries, like reactphp/event-loop, which allow you to handle multiple streams in a non-blocking fashion and also have a timer/interval feature.
4) Non-blocking timeouts in PHP are possible.
It can be implemented by means of stream_select() function with timeout parameter (see how it's implemented in reactphp/event-loop StreamSelectLoop::run()).
5) There are PHP extensions like libevent, libev, event which allow timers implementation (if you want to go hardcore)
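To illustrate point 4, here is a small sketch of a wait with a timeout built on stream_select(). wait_readable() is a hypothetical helper name, and the socket pair only stands in for a real network connection; the example assumes a Unix-like system where STREAM_PF_UNIX socket pairs are available.

```php
<?php
// Hypothetical helper: wait until $stream has data to read or the
// timeout elapses. Returns true if the stream became readable.
function wait_readable($stream, float $timeoutSeconds): bool
{
    $read   = [$stream];
    $write  = null;
    $except = null;
    $sec    = (int) floor($timeoutSeconds);
    $usec   = (int) (($timeoutSeconds - $sec) * 1000000);

    // stream_select() returns the number of ready streams, 0 on timeout,
    // or false on error.
    $ready = stream_select($read, $write, $except, $sec, $usec);

    return $ready !== false && $ready > 0;
}

// A connected pair of sockets stands in for a real connection.
[$a, $b] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

var_dump(wait_readable($a, 0.2)); // nothing written yet, so this times out
fwrite($b, "ping");
var_dump(wait_readable($a, 0.2)); // data is waiting, so this returns at once
```

An event loop does the same thing with many streams at once and uses the time until the next due timer as the timeout.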
Not really, but you could try the tick count function.
http://php.net/manual/en/class.evtimer.php is probably what you are looking for. You can have a function called at set intervals, similar to setInterval in JavaScript. It is a PECL extension; if you have WHM/cPanel you can easily install it through the PECL software/extension installer page.
I hadn't noticed this question is from 2010, and the EvTimer class only started being developed in 2012-2013. So as an update to an old question: there is now a class that can do this, similar to JavaScript's setTimeout/setInterval.
Warning: You should note that while the sleep command can make a PHP process hang, or "sleep" for a given amount of time, you'd generally implement visual delays within the user interface.
Since PHP is a server side language, merely writing its execution output (generally in the form of HTML) to a web server response: using sleep in this fashion will generally just stall or delay the response.
With that being said, sleep does have practical purposes. Delaying execution can be used to implement back off schemes, such as when retrying a request after a failed connection. Generally speaking, if you need to use a setTimeout in PHP, you're probably doing something wrong.
Solution: If you still want to implement setTimeout in PHP, to answer your question explicitly: Consider that setTimeout possesses two parameters, one which represents the function to run, and the other which represents the amount of time (in milliseconds). The following code would actually meet the requirements in your question:
<?php
// Build the setTimeout function.
// This is the important part.
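function setTimeout($fn, $timeout){
// Sleep for $timeout milliseconds. usleep() takes microseconds, so
// fractions of a second are not truncated as they would be by sleep().
usleep((int) ($timeout * 1000));
$fn();
}
// Some example function we want to run.
$someFunctionToExecute = function() {
echo 'The function executed!';
};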
// This will run the function after a 3 second sleep.
// We're using the functional property of first-class functions
// to pass the function that we wish to execute.
setTimeout($someFunctionToExecute, 3000);
?>
The output of the above code will be three seconds of delay, followed by the following output:
The function executed!
If you need to perform an action after some PHP code has executed, you can do it with an echo:
echo "Success.... <script>setTimeout(function(){alert('Hello')}, 3000);</script>";
So after some time, in the client (browser), you can do something else, such as redirecting to another PHP script or showing an alert.
Generators, available in PHP 5.5 and later, provide the yield keyword, which lets a function pause and later continue from where it left off.
generator-example.php
<?php
function myGeneratorFunction()
{
echo "One","\n";
yield;
echo "Two","\n";
yield;
echo "Three","\n";
yield;
}
// get our Generator object (remember, all generator function return
// a generator object, and a generator function is any function that
// uses the yield keyword)
$iterator = myGeneratorFunction();
OUTPUT
(none yet: creating the generator object does not run any of its code)
To run the code up to and past the first yield, add these lines:
// run up to the first yield (prints "One") and get the yielded value
$value = $iterator->current();
// resume after the first yield and run up to the second one (prints "Two");
// next() itself does not return a value
$iterator->next();
// and resume once more to run up to the third yield
// $iterator->next();
Now you will get output
One
Two
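The same pause-and-resume behaviour can be used to interleave several tasks by hand. The following is a rough sketch only; task() and run_all() are hypothetical names, and nothing here runs in parallel, the tasks merely take turns at each yield.

```php
<?php
// A task that does its work in steps and pauses after each one.
function task(string $name, int $steps): Generator
{
    for ($i = 1; $i <= $steps; $i++) {
        echo "$name step $i\n";
        yield; // pause here and let another task run
    }
}

// Round-robin scheduler: give every unfinished task one turn per pass.
function run_all(array $tasks): void
{
    $started = [];
    while ($tasks !== []) {
        foreach ($tasks as $key => $task) {
            if (isset($started[$key])) {
                $task->next();     // resume after the last yield
            } else {
                $task->current();  // first turn: run up to the first yield
                $started[$key] = true;
            }
            if (!$task->valid()) {
                unset($tasks[$key]); // the task has finished
            }
        }
    }
}

run_all(['A' => task('A', 2), 'B' => task('B', 1)]);
// Prints:
// A step 1
// B step 1
// A step 2
```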
If you look closely, setTimeout() relies on an event loop.
In PHP there are many libraries for this; for example, amphp is a popular one that provides an event loop for executing code asynchronously.
Javascript snippet
setTimeout(function () {
console.log('After timeout');
}, 1000);
console.log('Before timeout');
Converting the above JavaScript snippet to PHP using Amp:
use Amp\Loop; // the event loop class from amphp/amp v2
Loop::run(function () {
Loop::delay(1000, function () {
echo date('H:i:s') . ' After timeout' . PHP_EOL;
});
echo date('H:i:s') . ' Before timeout' . PHP_EOL;
});
Check this Out!
<?php
set_time_limit(20);
$i = 0; // initialize the counter so the first iteration prints i=0
while ($i<=10)
{
echo "i=$i ";
sleep(100);
$i++;
}
?>
Output:
i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 i=10
Which design pattern exists for running several PHP processes and collecting the results in one PHP process?
Background:
I have many large trees (> 10000 entries) in PHP and have to run recursive checks on them. I want to reduce the elapsed execution time.
If your goal is minimal time - the solution is simple to describe, but not that simple to implement.
You need to find a pattern to divide the work (You don't provide much information in the question in this regard).
Then use one master process that forks children to do the work. As a rule the total number of processes you use should be between n and 2n, where n is the number of cores the machine has.
Assuming this data will be stored in files you might consider using non-blocking IO to maximize the throughput. Not doing so will make most of your process spend time waiting for the disk. PHP has stream_select() that might help you. Note that using it is not trivial.
If you decide not to use select - increasing the number of processes might help.
In regards to the pcntl functions: I've written a daemon with them (a proper one with forking, changing session id, the running user, etc.) and it's one of the most reliable pieces of software I've written. Because it spawns workers for every task, even if there is a bug in one of the tasks, it does not affect the others.
From your PHP script, you could launch another script (using exec) to do the processing. Save status updates in a text file, which could then be read periodically by the parent process.
Note: to avoid PHP waiting for the exec'd script to complete, redirect its output to a file and run it in the background:
exec('php /path/to/file.php > output.log 2>&1 &');
Alternatively, you can fork a script using the PCNTL functions. This uses one php script, which when forked can detect whether it is the parent or the child and operate accordingly. There are functions to send/receive signals for the purpose of communicating between parent/child, or you have the child log to a file and the parent read from that file.
From the pcntl_fork manual page:
$pid = pcntl_fork();
if ($pid == -1) {
die('could not fork');
} else if ($pid) {
// we are the parent
pcntl_wait($status); //Protect against Zombie children
} else {
// we are the child
}
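Building on the manual snippet, the sketch below shows one possible way for the parent to collect a result from each child: every child writes its serialized result to a socket pair and the parent reads it back. fork_map() is a hypothetical name; the code assumes the pcntl extension, the command line, a Unix-like system, and results small enough to serialize comfortably.

```php
<?php
// Hypothetical helper: run $fn on every input in its own child process
// and return the results keyed like the input array.
function fork_map(callable $fn, array $inputs): array
{
    $children = [];
    foreach ($inputs as $key => $input) {
        $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        $pid  = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('could not fork');
        }
        if ($pid === 0) {
            // Child: do the work, send the result to the parent, then exit.
            fclose($pair[0]);
            fwrite($pair[1], serialize($fn($input)));
            fclose($pair[1]);
            exit(0);
        }
        // Parent: keep the read end and move on to the next input.
        fclose($pair[1]);
        $children[$key] = [$pid, $pair[0]];
    }

    $results = [];
    foreach ($children as $key => [$pid, $socket]) {
        // Blocks until this child has closed its end of the socket.
        $results[$key] = unserialize(stream_get_contents($socket));
        fclose($socket);
        pcntl_waitpid($pid, $status); // reap the child to avoid zombies
    }
    return $results;
}
```

Calling fork_map(function ($x) { return $x * $x; }, [1, 2, 3]) returns [1, 4, 9], and because all children run at the same time the total time is roughly that of the slowest child. This version starts one process per input, so for many inputs the work should be grouped so that the process count stays near the number of cores.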
This might be a good time to consider using a message queue, even if you run it all on one machine.
The question seems to be a bit confused.
I want to reduce the absolute execution time.
Do you mean elapsed time? Certainly use of the right data structure will improve throughput, but for a given data structure, the minimum order of the algorithm is absolute, and has nothing to do with how you implement the algorithm.
Which design pattern exist to realize....?
Design patterns describe what code is; they are not templates for writing programs, though they are useful tools for curriculum design. To start with a pattern and make your code fit it is in itself an anti-pattern.
Nobody can answer this question without knowing a lot more about your data and how it's structured; however, the key driver for efficiency will be the data structure you use to implement your tree. If elapsed time is important then certainly look at parallel execution. It may also be worth considering performing the operation in a different tool: databases are highly optimized for dealing with large sets of data, but note that the obvious method for describing a tree in a relational database is very inefficient when it comes to isolating sub-trees and walking the tree.
In response to Adam's suggesting of forking you replied:
I "heard" that pcntl isnt a good solution. Any experiences?
Where did you hear that? Certainly forking from a CGI or mod_php invoked script is a bad idea, but nothing wrong with doing it from the command line. Do have a google for long running PHP processes (be warned there is a lot of bad information out there). What code you write will vary depending on the underlying OS - which you've not stated.
I suspect that you could solve a large part of your performance issues by identifying which parts of the tree need to be checked and only checking those parts AND triggering the checks when the tree is updated, or at least marking the nodes as 'dirty'.
You might find these helpful:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
http://en.wikipedia.org/wiki/Threaded_binary_tree
C.
You could use a more efficient data structure, such as a B-tree. I used one once in Java but not in PHP. You can try this script: http://www.phpclasses.org/browse/file/708.html, it is an implementation of a B-tree.
If that is not enough, you can use Hadoop to implement a Map/Reduce pattern, as Michael said. I would not fork PHP processes; it does not seem to help performance.
Personally, I would use PHP as client and put everything in Hadoop. This tutorial might help: http://www.lunchpauze.com/2007/10/writing-hadoop-mapreduce-program-in-php.html.
Another solution can be to use a Java implementation of a B-tree: http://jdbm.sourceforge.net/. JDBM is an object database using a B+tree data structure. Then you can search with PHP by exposing the data through a web service or by accessing it directly with Quercus.
Using web or CLI?
If you use the web, you could integrate that part in Quercus. Then you could use the advantages of Java multithreading.
I don't actually know how reliable Quercus is though. I'd also suggest using a kind of message queue and refactoring the code, so it doesn't need the scope.
Maybe you could rebuild the code to a Map/Reduce pattern. You can then run the PHP code in Hadoop and cluster the processing across a couple of machines.
I don't know if it's useful, but I came across another project, called Gearman. It's also used to cluster PHP processes. I guess you can combine that with a reduce script as well, if Hadoop is not the way you want to go.
pthreads
There is a rather new (since 2012) PHP extension available: pthreads. It can be installed via PECL.
Simple Implementation in PHP Code: extend from Thread Class. Add a run() method and execute the start() method.
<?php
// Example from http://www.phpgangsta.de/richtige-threads-in-php-einfach-erstellen-mit-pthreads
class AsyncOperation extends Thread
{
public function __construct($threadId)
{
$this->threadId = $threadId;
}
public function run()
{
printf("T %s: Sleeping 3sec\n", $this->threadId);
sleep(3);
printf("T %s: Hello World\n", $this->threadId);
}
}
$start = microtime(true);
for ($i = 1; $i <= 5; $i++) {
$t[$i] = new AsyncOperation($i);
$t[$i]->start();
}
echo microtime(true) - $start . "\n";
echo "end\n";
Outputs
>php pthreads.php
0.041301012039185
end
T 1: Sleeping 3sec
T 2: Sleeping 3sec
T 3: Sleeping 3sec
T 4: Sleeping 3sec
T 5: Sleeping 3sec
T 1: Hello World
T 2: Hello World
T 3: Hello World
T 4: Hello World
T 5: Hello World
Try this: PHPThreads
Code Example:
function threadproc($thread, $param) {
echo "\tI'm a PHPThread. In this example, I was given only one parameter: \"". print_r($param, true) ."\" to work with, but I can accept as many as you'd like!\n";
for ($i = 0; $i < 10; $i++) {
usleep(1000000);
echo "\tPHPThread working, very busy...\n";
}
return "I'm a return value!";
}
$thread_id = phpthread_create($thread, array(), "threadproc", null, array("123456"));
echo "I'm the main thread doing very important work!\n";
for ($n = 0; $n < 5; $n++) {
usleep(1000000);
echo "Main thread...working!\n";
}
echo "\nMain thread done working. Waiting on our PHPThread...\n";
phpthread_join($thread_id, $retval);
echo "\n\nOur PHPThread returned: " . print_r($retval, true) . "!\n";
Requires PHP extensions:
posix
pcntl
sockets