I was wondering if there is a way to have a ReactPHP HTTP server handle requests asynchronously. I set up a very basic HTTP server using the documentation (https://github.com/reactphp/http).
HTTPServer.php
<?php
require __DIR__ . '/vendor/autoload.php';

$httpServer = new React\Http\Server(
    function (Psr\Http\Message\ServerRequestInterface $request) {
        $responseData = $request->getUri()->getPath();
        if ($responseData == "/testSlow") { sleep(5); } // simulate a slow response time
        if ($responseData == "/testFast") { sleep(1); } // simulate a fast response time
        return new React\Http\Message\Response(
            200,
            array("Access-Control-Allow-Headers" => "*", "Access-Control-Allow-Origin" => "*", "Content-Type" => "application/json"),
            json_encode($responseData)
        );
    }
);
$socketServer = new React\Socket\Server("0.0.0.0:31");
$httpServer->listen($socketServer);
It seems to be working fine, but synchronously: if I send a request to the /testSlow path and then immediately to the /testFast path, the slow one will always finish first after 5 seconds, and only once it has finished will the fast one start and then finish after 1 second.
Am I missing some additional setup?
ReactPHP's event loop handles requests asynchronously, not in parallel. There is only one running process, and a call to sleep() blocks that process, i.e. it prevents the event loop from handling the next requests. So, in asynchronous apps (in Node.js as well), it is common practice to move heavy processing to dedicated processes.
I am not a ReactPHP expert, so I cannot provide a working example, but I can point out the root cause of the problem. I would recommend reading this excellent blog series: https://sergeyzhuk.me/reactphp-series, and this article in particular: https://sergeyzhuk.me/2018/05/04/reactphp-child-processes
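For the server in the question, one way to avoid blocking is to return a promise that is resolved by a timer instead of calling sleep(). A minimal sketch, assuming the same reactphp/http 1.x setup as the question plus react/event-loop ^1.2 for the static Loop class:

<?php
// Sketch only: replace sleep() with a timer + promise so the loop stays free.
require __DIR__ . '/vendor/autoload.php';

use Psr\Http\Message\ServerRequestInterface;
use React\EventLoop\Loop;
use React\Http\Message\Response;
use React\Promise\Promise;

$httpServer = new React\Http\Server(function (ServerRequestInterface $request) {
    $path  = $request->getUri()->getPath();
    $delay = ($path === "/testSlow") ? 5 : 1; // simulated processing time

    return new Promise(function ($resolve) use ($path, $delay) {
        // Loop::addTimer() schedules the callback without blocking the event loop,
        // so other requests keep being served while this one "works".
        Loop::addTimer($delay, function () use ($resolve, $path) {
            $resolve(new Response(
                200,
                array("Access-Control-Allow-Headers" => "*", "Access-Control-Allow-Origin" => "*", "Content-Type" => "application/json"),
                json_encode($path)
            ));
        });
    });
});

$socketServer = new React\Socket\Server("0.0.0.0:31");
$httpServer->listen($socketServer);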
Related
I have a PHP application that fetches prices from various sources. On every execution, it loads from over 20 API endpoints using Guzzle. Due to the load time of each call, it takes 10-30 seconds per execution cycle. If I could make all the calls in parallel, I could cut it down to less than 5 seconds.
What's the easiest way to make parallel API calls in PHP?
You're probably doing the wrong thing. These requests should probably be made at a regular interval in the background and the data be cached.
What you're trying to do is possible by leveraging non-blocking I/O. Curl offers this with curl_multi, which is available in Guzzle. But there are also other libraries implementing HTTP clients based on non-blocking I/O without a dependency on ext-curl, such as Artax.
Artax is based on Amp, which provides the basic primitives like the event loop and promises. You can start multiple requests and then wait for the set of promises.
$client = new Amp\Artax\DefaultClient;
$promises = [];

foreach ($urls as $url) {
    $promises[$url] = Amp\call(function () use ($client, $url) {
        // "yield" inside a coroutine awaits the resolution of the promise
        // returned from Client::request(). The generator is then continued.
        $response = yield $client->request($url);

        // Same for the body here. Yielding an Amp\ByteStream\Message
        // buffers the entire message.
        $body = yield $response->getBody();

        return $body;
    });
}

$responses = Amp\Promise\wait(Amp\Promise\all($promises));
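Since the question already uses Guzzle, the same fan-out can also be sketched with Guzzle's own promise API, which rides on curl_multi under the hood (assuming Guzzle 6/7 with guzzlehttp/promises >= 1.4 for Utils::settle()):

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$client = new Client(['timeout' => 10]);

$promises = [];
foreach ($urls as $url) {
    // getAsync()/requestAsync() return promises; the transfers run concurrently
    $promises[$url] = $client->getAsync($url);
}

// settle() waits for all promises and does not throw on individual failures
$results = Utils::settle($promises)->wait();

foreach ($results as $url => $result) {
    if ($result['state'] === 'fulfilled') {
        $body = (string) $result['value']->getBody();
        // ... process $body
    }
}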
Try converting the API calls into commands that run in the background, for example:
If you have 20 calls,
exec('php apiCall_1.php >> "./log.txt" &')
exec('php apiCall_2.php >> "./log.txt" &')
.
.
.
.
exec('php apiCall_20.php >> "./log.txt" &')
Notes:
1- All of those commands will be fired asynchronously (do not forget to add the '&' at the end of each command, so the shell does not wait for the response).
2- Each command will normally store its result (fetched from the server) in a DB collection/table.
3- Along with that, you have to write a method that keeps polling the DB to check whether the result has been inserted yet, so you can take it and send it back to your API consumer.
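A rough sketch of that pattern, combining the background launches with the polling side (the api_results table and the connection details are hypothetical placeholders):

// Fire the 20 workers in the background; '&' detaches them from this request.
for ($i = 1; $i <= 20; $i++) {
    exec(sprintf('php apiCall_%d.php >> ./log.txt 2>&1 &', $i));
}

// Poll the database until every worker has stored its result.
// 'api_results' and the credentials below are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
do {
    sleep(1);
    $done = (int) $pdo->query('SELECT COUNT(*) FROM api_results')->fetchColumn();
} while ($done < 20);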
In a Silex application running on HHVM I have set up a dummy event listener on the kernel TERMINATE event:
$app['dispatcher']->addListener(
    KernelEvents::TERMINATE,
    function () use ($app) {
        usleep(10000000);
        $app['logger']->alert("I AM REGISTERED!");
    }
);
I was expecting my application to render the response as fast as possible (within a second), and after 10 s I expected the message "I AM REGISTERED!" to appear in my log.
Yet strangely the response is sent after the event has been executed, meaning the event blocks the response for 10s and I see both the response and the log message at the same time.
What is going on here?
I find it odd that in Application.php it appears that send() is called before terminate():
vendor/silex/silex/src/Silex/Application.php:
/**
 * Handles the request and delivers the response.
 *
 * @param Request|null $request Request to process
 */
public function run(Request $request = null)
{
    if (null === $request) {
        $request = Request::createFromGlobals();
    }

    $response = $this->handle($request);
    $response->send();
    $this->terminate($request, $response);
}
The Symfony2 docs about HttpKernel, which Silex uses as well, say:
Internally, the HttpKernel makes use of the fastcgi_finish_request PHP
function. This means that at the moment, only the PHP FPM server API
is able to send a response to the client while the server's PHP
process still performs some tasks. With all other server APIs,
listeners to kernel.terminate are still executed, but the response is
not sent to the client until they are all completed.
And fastcgi_finish_request is not currently supported by HHVM.
Hence, the response will not be sent until all the kernel.terminate listeners have completed.
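For comparison, this is the idea behind fastcgi_finish_request() in plain PHP under PHP-FPM (a sketch only, outside Silex; the response body and log message are illustrative):

echo json_encode(['status' => 'ok']);   // the actual response body

if (function_exists('fastcgi_finish_request')) {
    // Flushes the response to the client and closes the connection;
    // everything below runs after the client has already been served.
    fastcgi_finish_request();
}

sleep(10);                              // the slow "terminate" work
error_log('I AM REGISTERED!');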
PHP is not asynchronous, so while event handling is possible through the use of callbacks, as soon as an event triggers, the control flow of the process is dedicated to it.
Frameworks tend to delay content response to be the last action taken, in case any form of header modification has to happen.
As you mentioned, the content is being sent/echoed before the TERMINATE event is fired, but that's not the whole story.
It depends on how your server is set up. If, for example, you have gzip enabled in apache (very common), then apache will cache all content until PHP has finished execution (and then it will gzip and send it). You mentioned that you're on HHVM, which could also be the problem - it might not flush the content itself until execution is complete.
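If you want to at least try pushing the content out before the slow listener runs, explicit flushing is a minimal sketch (it may still be buffered by gzip, a proxy, or HHVM itself, as noted above; $content stands in for the rendered response body):

echo $content;

// Flush PHP's own output buffers, then ask the SAPI to send the data.
while (ob_get_level() > 0) {
    ob_end_flush();
}
flush();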
Either way, the best solution is to... well... not sleep. I'm assuming that you're sleeping to give the database a chance to flush to disk (10 seconds is a really long time to wait for that, though). If that's not the case, finding a decent solution won't be easy until we understand why you need to wait that long.
I tried to run a massive update of field values through an API and I ran into maximum execution time for my PHP script.
I divided my job into smaller tasks to run them asynchronously as smaller jobs...
Asynchronous PHP calls?
I found this post and it looks about right, but the comments are a little off-putting... Will using curl to run external script files prevent the caller file from triggering the maximum execution time, or will curl still wait for a response from the server and kill my page?
The question really is: How do you do asynchronous jobs in PHP? Something like Ajax.
EDIT::///
There is a project management tool which has lots of rows of data.
I am using this tools API to access the rows of data and display them on my page.
The user using my tool will select multiple rows of data with a checkbox, and type a new value into a box.
The user will then press an "update row values" button which runs an update script.
This update script divides the hundreds or thousands of items possibly selected into groups of 100.
At this point I was going to use some asynchronous method to contact the project management tool and update all 100 items.
Because updating those items could take that server a long time, I need to make sure that my original page, which splits those jobs up, is not left waiting for a response from that operation, so that I can fire off more requests to update items and let my page tell the user: "Okay, the update is currently happening, it may take a while and we'll send an email once it's complete".
$step = 100;

$itemCount     = GetItemCountByAppId( $appId );
$loopsRequired = $itemCount / $step;
$loopsRequired = ceil( $loopsRequired );

$process = array();

for( $a = 0; $a < $loopsRequired; $a++ )
{
    $items = GetItemsByAppId( $appId, array(
        "amount" => $step,
        "offset" => ( $step * $a )
    ) );

    foreach( $items[ "items" ] as $key => $item )
    {
        foreach( $fieldsGroup as $fieldId => $fieldValues )
        {
            $itemId = $item->__attributes[ "item_id" ];
            /*array_push( $process, array(
                "itemId" => $itemId,
                "fieldId" => $fieldId,
            ) );*/
            UpdateFieldValue( $itemId, $fieldId, $fieldValues );
            // This Update function is actually calling the server and I assume it must be
            // waiting for a response... thus my code times out after 30 secs of execution
        }
    }

    //curl_post_async($url, $params);
}
If you are using PHP-CLI, try threads (the pthreads extension), or pcntl_fork() for the non-thread-safe version.
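A minimal CLI-only sketch of the fork approach (requires ext-pcntl; $itemIds is a placeholder list, and the commented call is the per-field update from the question's code):

$chunks = array_chunk($itemIds, 100);      // $itemIds: placeholder list of item ids

foreach ($chunks as $chunk) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('could not fork');
    }
    if ($pid === 0) {
        // Child process: update one chunk, then exit.
        foreach ($chunk as $itemId) {
            // UpdateFieldValue($itemId, $fieldId, $fieldValues); // from the question
        }
        exit(0);
    }
    // Parent process: immediately moves on to fork the next chunk.
}

// Wait for all children to finish before the parent exits.
while (pcntl_waitpid(0, $status) !== -1);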
Depending on how you implement it, asynchronous PHP might be used to decouple the web request from the processing and therefore isolate the web request from any timeout in the processing (but you could do the same thing within a single thread).

Will breaking the task into smaller concurrent parts make it run faster? Probably not - usually this will extend the length of time it takes for the job to complete; about the only time this is not the case is when you've got a very large processing capacity and can distribute the task effectively (e.g. map-reduce).

Are HTTP calls (curl) an efficient way to distribute work like this? No. There are other methods, including synchronous and asynchronous messaging, batch processing, process forking, threads... each with their own benefits and complications - and we don't know what problem you are trying to solve.
So even before we get to your specific questions, this does not look like a good strategy.
Will using curl to run external script files prevent the caller file triggering maximum execution time
It will be constrained by whatever timeouts are configured on the target server - if that's the same server as the invoking script, then it will be the same timeouts.
will the curl still wait for a response from the server and kill my page?
I don't know what you're asking here - it rather implies that there are functional dependencies you've not told us about.
It sounds like you've picked a solution and are now trying to make it fit your problem.
I have a simple web app which lists a private group of people and the last message they posted.
I'm currently just polling by using Ajax to hit a php script every 30 seconds, which gets all people and latest messages then returns as JSON. I parse that and update the DOM.
Not very efficient, since most of the time nothing has changed, but it still fetches all the data every 30 seconds.
Is there anything basic I can do, just with the code to improve this?
Should I use something like pusher.com?
There are a lot of tutorials out there on implementing long polling, but I'd like to keep this as simple as possible.
When you send an HTTP request every 30 seconds, it can consume a lot of resources; if there are 1000 users or more, I think that is not good for the web server.
My suggestion is to use Node.js. Node.js is server-side JavaScript, a platform built on Chrome's JavaScript runtime for easily building fast applications, and it supports long polling and non-blocking requests.
With Node.js you can build a web server that can handle many users and is well suited to real-time applications.
There are many frameworks that can be used with Node.js:
socket.io
express
And here is a simple tutorial if you want to try it:
http://martinsikora.com/nodejs-and-websocket-simple-chat-tutorial
Without having to change much of the infrastructure, you can return a simple message indicating whether anything has been changed or not.
So, if http://localhost/example/request is returning:
{
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "fax",
"number": "646 555-4567"
}
]
}
on each request, you can instead return the following if nothing has been updated:
{
"updated": false
}
Additionally, you can also have updated: true indicating it's been updated:
{
"updated": true,
"phoneNumbers": [
{
"type": "work",
"number": "111-222-3333"
}
]
}
Overall, all you will have to do is check the updated property of the returned object on each query. You only need to parse the rest of the response if updated is true.
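On the PHP side, the endpoint might look roughly like this (lastModified() and loadPhoneNumbers() are hypothetical helpers, and the client is assumed to pass its last-seen timestamp as ?since=):

header('Content-Type: application/json');

$since      = isset($_GET['since']) ? (int) $_GET['since'] : 0;
$lastChange = lastModified();           // hypothetical: timestamp of the latest change

if ($lastChange <= $since) {
    // Nothing new: tiny response, nothing to parse beyond the flag.
    echo json_encode(['updated' => false]);
    exit;
}

echo json_encode([
    'updated'      => true,
    'lastChange'   => $lastChange,
    'phoneNumbers' => loadPhoneNumbers(), // hypothetical data loader
]);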
In the grand scheme of things, though, the overhead caused by each HTTP request adds up: every time you poll, a new request is made to the server. Having a lot of concurrent users will start introducing performance issues. I would suggest looking at real-time web frameworks such as Node.js (as viyancs mentioned), as they maintain a persistent connection for each user, allowing data to be "pushed" (rather than polled) and hence reducing the HTTP overhead.
In such a case, a better solution would be to use XHR Long Polling. It works somewhat similar to what you're doing right now, i.e. by making AJAX requests.
Here's how it works:
You make the AJAX request as you are right now.
The server side script only returns (echoes) when there is an update. If there isn't one, it keeps looping and waits for an update. This keeps the AJAX request pending.
When there is an update, your server side script returns, your client side JS updates the DOM, and immediately issues a new AJAX request.
PS - For this to work, you'll have to make sure the script isn't set to timeout after 30sec.
In pseudo-code, this is what your server side script will look like:
set_time_limit(0); // make sure the script isn't killed by the 30 sec execution limit

$newInfo = false;
while ($newInfo === false) {
    // getNewInfo() returns the new info, or false if nothing has changed
    $newInfo = getNewInfo();

    // sleep here for a bit so we don't hammer the data source
    usleep(500000);
}

// will only reach here when something has changed
echo json_encode($newInfo);
I have to scrape a web site where I need to fetch multiple URLs and then process them one by one. The current process goes somewhat like this:
I fetch a base URL and get all the secondary URLs from that page; then for each secondary URL I fetch it, process the page I find, download some photos (which takes quite a long time) and store this data in the database, then fetch the next URL and repeat the process.
In this process, I think I am wasting time fetching the secondary URL at the start of each iteration. So I am trying to fetch the next URLs in parallel while processing the first iteration.
The solution in my mind is to call a PHP script from the main process, say a downloader, which will download all the URLs (with curl_multi or wget) and store them in some database.
My questions are
How do I call such a downloader asynchronously? I don't want my main script to wait until the downloader completes.
Is there any location to store the downloaded data, such as shared memory? Other than the database, of course.
Is there any chance that the data gets corrupted while storing and retrieving, and how do I avoid this?
Also, please let me know if anyone has a better plan.
When I hear that someone uses curl_multi_exec, it usually turns out that they just load it with, say, 100 URLs, wait until all of them complete, then process them all, and then start over with the next 100 URLs... Blame me, I was doing so too, but then I found out that it is possible to remove/add handles to curl_multi while something is still in progress, and it really saves a lot of time, especially if you reuse already open connections. I wrote a small library to handle a queue of requests with callbacks; I'm not posting the full version here of course ("small" is still quite a bit of code), but here's a simplified version of the main thing to give you the general idea:
public function launch() {
$channels = $freeChannels = array_fill(0, $this->maxConnections, NULL);
$activeJobs = array();
$running = 0;
do {
// pick jobs for free channels:
while ( !(empty($freeChannels) || empty($this->jobQueue)) ) {
// take free channel, (re)init curl handle and let
// queued object set options
$chId = key($freeChannels);
if (empty($channels[$chId])) {
$channels[$chId] = curl_init();
}
$job = array_pop($this->jobQueue);
$job->init($channels[$chId]);
curl_multi_add_handle($this->master, $channels[$chId]);
$activeJobs[$chId] = $job;
unset($freeChannels[$chId]);
}
$pending = count($activeJobs);
// launch them:
if ($pending > 0) {
while(($mrc = curl_multi_exec($this->master, $running)) == CURLM_CALL_MULTI_PERFORM);
// poke it while it wants
curl_multi_select($this->master);
// wait for some activity, don't eat CPU
while ($running < $pending && ($info = curl_multi_info_read($this->master))) {
// some connection(s) finished, locate that job and run response handler:
$pending--;
$chId = array_search($info['handle'], $channels);
$content = curl_multi_getcontent($channels[$chId]);
curl_multi_remove_handle($this->master, $channels[$chId]);
$freeChannels[$chId] = NULL;
// free up this channel
if ( !array_key_exists($chId, $activeJobs) ) {
// impossible, but...
continue;
}
$activeJobs[$chId]->onComplete($content);
unset($activeJobs[$chId]);
}
}
} while ( ($running > 0 && $mrc == CURLM_OK) || !empty($this->jobQueue) );
}
In my version, jobs are actually instances of a separate class, not of controllers or models. They just handle setting cURL options, parsing the response and calling a given onComplete callback.
With this structure new requests will start as soon as something out of the pool finishes.
Of course it doesn't really help if it's not just the retrieving that takes time but the processing as well... And it isn't true parallel handling. But I still hope it helps. :)
P.S. It did the trick for me. :) A once 8-hour job now completes in 3-4 minutes using a pool of 50 connections. Can't describe that feeling. :) I didn't really expect it to work as planned, because with PHP it rarely works exactly as intended... That was like "ok, hope it finishes in at least an hour... Wha... Wait... Already?! 8-O"
You can use curl_multi: http://www.somacon.com/p537.php
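A bare-bones curl_multi sketch of that approach (the simplest wait-for-everything form; the queued version above is the more efficient variant, and the URLs here are placeholders):

$urls = array('https://example.com/a', 'https://example.com/b'); // placeholder URLs

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Run all transfers concurrently, waiting on activity instead of busy-looping.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running && $status === CURLM_OK);

$results = array();
foreach ($handles as $url => $ch) {
    $results[$url] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);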
You may also want to consider doing this client side and using Javascript.
Another solution is to write a hunter/gatherer that you submit an array of URLs to, then it does the parallel work and returns a JSON array after it's completed.
Put another way: if you had 100 URLs you could POST that array (probably as JSON as well) to mysite.tld/huntergatherer - it does whatever it wants in whatever language you want and just returns JSON.
Aside from the curl_multi solution, another one is just having a batch of Gearman workers. If you go this route, I've found supervisord a nice way to start a load of daemon workers.
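With the Gearman route, the client side can be as small as this (a sketch assuming the pecl/gearman extension, a running gearmand, and a worker registered under the hypothetical name "fetch_url"):

$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);   // default gearmand host/port

foreach ($urls as $url) {
    // doBackground() queues the job and returns immediately,
    // so the fetches are handled by the worker pool in parallel.
    $client->doBackground('fetch_url', $url);
}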
Things you should look at in addition to CURL multi:
Non-blocking streams (example: PHP-MIO)
ZeroMQ for spawning off many workers that do requests asynchronously
While Node.js, Ruby's EventMachine or similar tools are quite great for doing this stuff, the things I mentioned make it fairly easy in PHP too.
Try executing python-pycurl scripts from PHP. It's easier and faster than PHP curl.