Use non-blocking streams to parallelize REST API requests in PHP? - php

Consider the following scenario:
http://www.restserver.com/example.php returns some content that I want to work with in my web application.
I don't want to load it using ajax (SEO issues etc.)
My page takes 100ms to generate, the REST resource also takes 100ms to be loaded.
We assume that the 100ms of generation time for my website occur before I begin working with the REST resource. What comes after that can be neglected.
Example Code:
Index.php of my website
<?php
do_some_heavy_mysql_stuff(); // takes 100 ms
get_rest_resource(); // takes 100 ms
render_html_with_data_from_mysql_and_rest(); // takes a negligible amount of time
?>
Website will take ~200ms to generate.
I want to turn this into:
<?php
Restclient::initiate_rest_loading(); // takes 0 ms
do_some_heavy_mysql_stuff(); // takes 100 ms
Restclient::get_rest_resource(); // takes 0 ms because 100 ms have already passed since initiation
render_html_with_data_from_mysql_and_rest(); // takes a negligible amount of time
?>
Website will take ~100ms to generate.
To accomplish this I thought about using something like this:
(I am pretty sure this code will not work because this question is all about asking how to accomplish this, and whether it's possible. I just thought some naive code could demonstrate it best)
class Restclient {
    public static $handle;
    public static $buffer;

    public static function initiate_rest_loading() {
        // open the resource
        self::$handle = fopen("http://www.restserver.com/example.php", "r");
        // set to non-blocking so fgets will return immediately
        stream_set_blocking(self::$handle, 0);
        // initiate loading, but return immediately to continue website generation
        fgets(self::$handle, 40960);
    }

    public static function get_rest_resource() {
        // set the stream to blocking again because now we really want the data
        stream_set_blocking(self::$handle, 1);
        // get the data and save it so templates can work with it
        self::$buffer = fgets(self::$handle, 40960);
    }
}
So final question:
Is this possible and how?
What do I have to keep an eye on (internal buffer overflows, stream lengths, etc.)?
Are there better methods?
Does this work well with HTTP resources?
Any input is appreciated!
I hope I explained it clearly. If anything is unclear, please leave a comment so I can rephrase it!

As "any input is appreciated", here is mine:
What you want is called asynchronous processing (you want to do something while something else is being done "in the background").
To solve your problem, I thought of this:
Separate do_some_heavy_mysql_stuff and get_rest_resource into two different PHP scripts.
Use cURL's "multi" ability to do simultaneous requests. Please check:
curl_multi_init and related PHP functions
Simultaneous HTTP requests in PHP with cURL
This way, you can perform both scripts at the same time. Using cURL multi features, you can call http://example.com/do_some_heavy_mysql_stuff.php and http://example.com/get_rest_resource.php at the same time, and then play with the results as soon as they're available.
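A bare-bones sketch of that approach, assuming the two hypothetical script URLs above (not a tested implementation, just to illustrate the curl_multi flow):
<?php
// Minimal curl_multi sketch: fire both requests at once, then collect the results.
$urls = array(
    'http://example.com/do_some_heavy_mysql_stuff.php',
    'http://example.com/get_rest_resource.php',
);
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}
// Drive all transfers until they are done.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);
// Collect the responses and clean up.
$results = array();
foreach ($handles as $i => $ch) {
    $results[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>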
These are my first thoughts, and I'm sharing them with you. Maybe there are different and more interesting approaches... Good luck!

Related

Is it possible for a PHP script to wait for a function to finish and then continue?

I'm working on a script that needs to wait for childFunc to run, then return the result of childFunc, and only after that continue. Something like async and await in JavaScript.
<?php
$loginResponse = $system->login($username, $password);
if ($loginResponse !== null && $loginResponse->isTwoFactorRequired()) {
    $twoFactorIdentifier = $loginResponse->TwoFactorInfo()->getTwoFactorIdentifier();
    // I need to wait here for myChaildFucn to load a view, get data from the user and, after some processing, return the result!
    $verificationCode = myChaildFucn();
    $system->checkTwoFactorLogin($username, $password, $twoFactorIdentifier, $verificationCode);
}
Salaam,
(if you want to speak Persian, I'm okay with that, but I prefer to answer in English so that other people who have the same problem can find the solution)
As you know, PHP is synchronous, so we have to use AJAX to split our requests or use some functions to simulate asynchronous behavior.
Anyway...
Generally, we have two options to solve this issue:
Use the sleep function: if you are sure about your delay timing, just use it and define a delay that stops your app for some seconds; otherwise it's not a good idea for your situation.
Use foxyntax_ci (https://github.com/LegenD1995/foxyntax_ci): if you have an SPA or are using a REST API, it's your best option! It has some libraries for CodeIgniter and helps you build your queries, work with files, sessions, cookies, config, authorization and so on with asynchronous functions from JavaScript. (NOTE: it's an alpha version but it doesn't have bugs; it just needs a review of the media function to improve performance.)

Execute three functions simultaneously

I have a PHP script with three functions like this:
public function a($html, $text)
{
    //blaa
    return array();
}

public function b($html, $text)
{
    //blaa
    return array();
}

public function c($html, $text)
{
    //blaa
    return array();
}

require_once 'simple_html_dom.php';
$a = array();
$html = new simple_html_dom();
$a = $this->a($html, $text);
$b = $this->b($html, $text);
$c = $this->c($html, $text);
$html->clear();
unset($html);
$a = array_merge($a, $c);
$a = array_merge($a, $b);
a($html,$text) takes 5 seconds before giving a result
b($html,$text) takes 10 seconds before giving a result
c($html,$text) takes 12 seconds before giving a result
Thus the system takes 27 seconds before giving me a result, but I want my result in 12 seconds. I can't use threads because my hosting does not support threads. How can I solve this problem?
PHP does not support this out of the box. If you really want to do this, you have two basic options (yep, it's going to be dirty). If you want a serious solution depending on your actual use-case, there is another option to consider.
Option 1: Use some AJAX-trickery
Create a page with a button that triggers three AJAX-calls to the different functions that you want to call.
Option 2: Run a command
If you're on UNIX, you can trigger a command from the PHP script to run another PHP script (php xyz.php), and that actually runs it in a separate process.
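For example (a rough sketch, UNIX only; xyz.php is the placeholder script name from above), you can launch the second script in the background and return immediately:
<?php
// Redirect output and append & so exec() returns without waiting for the child.
exec('php xyz.php > /dev/null 2>&1 &');
// The current script continues right away; the child runs as a separate process.
?>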
Serious option: use queues
Seriously: use a queue system like RabbitMQ or Beanstalkd to do this kind of thing. Laravel supports it out of the box.
If the wait time is caused by blocking IO (waiting for server response) then curl_multi might help.
From the code you posted, though, it doesn't look like that is your problem.
It looks more like simple_html_dom is taking a long time to parse your HTML. That's not too surprising, because it's not a very good library. If this is the case you should consider switching to DOMXPath.
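A rough sketch of what that switch could look like with PHP's built-in DOM extension (assuming $html holds the raw HTML string; the XPath query is just an example):
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate real-world malformed HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// e.g. collect all links - adjust the XPath expression to your needs
foreach ($xpath->query('//a[@href]') as $node) {
    echo $node->getAttribute('href'), "\n";
}
?>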
You might want to look into jQuery deferred objects... $.when should handle this kind of situation.

PHP - set time limit effectively

I have a simple script that redirects to the mobile version of a website if it finds that the user is browsing on a mobile phone. It uses the Tera-WURFL webservice to accomplish that, and it will be placed on different hosting than Tera-WURFL itself. I want to protect it in case of Tera-WURFL hosting downtime. In other words, if my script takes more than a second to run, stop executing it and just redirect to the regular website. How do I do this effectively (so that the CPU is not overly burdened by the script)?
EDIT: It looks like the TeraWurflRemoteClient class has a timeout property. Read below. Now I need to find out how to include it in my script, so that it redirects to the regular website in case of this timeout.
Here is the script:
<?php
// Instantiate a new TeraWurflRemoteClient object
$wurflObj = new TeraWurflRemoteClient('http://my-Tera-WURFL-install.pl/webservicep.php');

// Define which capabilities you want to test for. Full list: http://wurfl.sourceforge.net/help_doc.php#product_info
$capabilities = array("product_info");

// Define the response format (XML or JSON)
$data_format = TeraWurflRemoteClient::$FORMAT_JSON;

// Call the remote service (the first parameter is the User Agent - leave it as null to let TeraWurflRemoteClient find the user agent from the server global variable)
$wurflObj->getCapabilitiesFromAgent(null, $capabilities, $data_format);

// Use the results to serve the appropriate interface
if ($wurflObj->getDeviceCapability("is_tablet") || !$wurflObj->getDeviceCapability("is_wireless_device") || $_GET["ver"] == "desktop") {
    header('Location: http://website.pl/'); // default index file
} else {
    header('Location: http://m.website.pl/'); // where to go
}
?>
And here is the relevant source of TeraWurflRemoteClient.php that is being included. It has an optional timeout argument, as mentioned in the documentation:
// The timeout in seconds to wait for the server to respond before giving up
$timeout = 1;
The TeraWurflRemoteClient class has a timeout property, and it is 1 second by default, as I see in the documentation.
So, this script won't be executed for longer than a second.
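If the timeout is exposed through the constructor, which is an assumption based on the $timeout variable shown above rather than something verified here, passing it explicitly might look like this:
// Sketch only: assumes the constructor accepts the data format and timeout
// as its second and third arguments - check the actual class source.
$wurflObj = new TeraWurflRemoteClient(
    'http://my-Tera-WURFL-install.pl/webservicep.php',
    TeraWurflRemoteClient::$FORMAT_JSON,
    1 // give up after one second
);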
Try achieving this by setting a very short timeout on the HTTP request to Tera-WURFL inside their class, so that if the response doesn't come back within 2-3 seconds, consider the check to be false and show the full website.
The place to look for setting a shorter timeout might vary depending on the transport you use to make your HTTP request. For example, in cURL you can set the timeout for the HTTP request.
After this, reset your HTTP request timeout back to what it was so that you don't affect any other code.
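As a standalone sketch of the idea (not the actual Tera-WURFL client internals; the URL and fallback location are the ones from the question), a short cURL timeout plus a fallback could look roughly like this:
<?php
$ch = curl_init('http://my-Tera-WURFL-install.pl/webservicep.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1); // seconds to wait for the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 2);        // seconds for the whole request
$response = curl_exec($ch);
if ($response === false) {
    // timed out (or failed) - fall back to the regular website
    header('Location: http://website.pl/');
    exit;
}
curl_close($ch);
?>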
Also, I found this while researching the topic; you might want to give it a read, though I would say stay away from forking unless you are very well aware of how things work.
And just now Adelf posted that the TeraWurflRemoteClient class has a timeout of 1 second by default, so that solves your problem, but I will post my answer anyway.

Downloading pages in parallel using PHP

I have to scrape a website where I need to fetch multiple URLs and then process them one by one. The current process goes somewhat like this:
I fetch a base URL and get all secondary URLs from this page; then for each secondary URL I fetch that URL, process the found page, download some photos (which takes quite a long time) and store this data in a database, then fetch the next URL and repeat the process.
In this process, I think I am wasting some time fetching the secondary URL at the start of each iteration, so I am trying to fetch the next URLs in parallel while processing the first iteration.
The solution in my mind is to call a PHP script from the main process, say a downloader, which will download all the URLs (with curl_multi or wget) and store them in some database.
My questions are:
How do I call such a downloader asynchronously? I don't want my main script to wait till the downloader completes.
Where can I store the downloaded data, such as shared memory? Of course, other than the database.
Are there any chances that the data gets corrupted while storing and retrieving, and how do I avoid this?
Also, please let me know if anyone has a better plan.
When I hear someone uses curl_multi_exec, it usually turns out they just load it with, say, 100 URLs, then wait until all complete, then process them all, and then start over with the next 100 URLs... Blame me, I was doing so too, but then I found out that it is possible to remove/add handles to curl_multi while something is still in progress, and it really saves a lot of time, especially if you reuse already open connections. I wrote a small library to handle a queue of requests with callbacks; I'm not posting the full version here of course ("small" is still quite a bit of code), but here's a simplified version of the main thing to give you the general idea:
public function launch() {
    $channels = $freeChannels = array_fill(0, $this->maxConnections, NULL);
    $activeJobs = array();
    $running = 0;
    do {
        // pick jobs for free channels:
        while ( !(empty($freeChannels) || empty($this->jobQueue)) ) {
            // take a free channel, (re)init the curl handle and let
            // the queued object set its options
            $chId = key($freeChannels);
            if (empty($channels[$chId])) {
                $channels[$chId] = curl_init();
            }
            $job = array_pop($this->jobQueue);
            $job->init($channels[$chId]);
            curl_multi_add_handle($this->master, $channels[$chId]);
            $activeJobs[$chId] = $job;
            unset($freeChannels[$chId]);
        }
        $pending = count($activeJobs);
        // launch them:
        if ($pending > 0) {
            // poke it while it wants
            while (($mrc = curl_multi_exec($this->master, $running)) == CURLM_CALL_MULTI_PERFORM);
            // wait for some activity, don't eat CPU
            curl_multi_select($this->master);
            while ($running < $pending && ($info = curl_multi_info_read($this->master))) {
                // some connection(s) finished, locate that job and run the response handler:
                $pending--;
                $chId = array_search($info['handle'], $channels);
                $content = curl_multi_getcontent($channels[$chId]);
                curl_multi_remove_handle($this->master, $channels[$chId]);
                // free up this channel
                $freeChannels[$chId] = NULL;
                if ( !array_key_exists($chId, $activeJobs) ) {
                    // impossible, but...
                    continue;
                }
                $activeJobs[$chId]->onComplete($content);
                unset($activeJobs[$chId]);
            }
        }
    } while ( ($running > 0 && $mrc == CURLM_OK) || !empty($this->jobQueue) );
}
In my version the $jobs are actually instances of a separate class, not controllers or models. They just handle setting cURL options, parsing the response and calling a given onComplete callback.
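For illustration, a job object compatible with launch() could look roughly like this; the class name and constructor are hypothetical, only init() and onComplete() are implied by the code above:
<?php
// Hypothetical job class: configures a (re)used cURL handle and forwards the result.
class DownloadJob {
    private $url;
    private $callback;

    public function __construct($url, callable $callback) {
        $this->url = $url;
        $this->callback = $callback;
    }

    // Called by launch() with a reusable cURL handle: set the request options here.
    public function init($ch) {
        curl_setopt($ch, CURLOPT_URL, $this->url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    }

    // Called by launch() when the transfer has finished.
    public function onComplete($content) {
        call_user_func($this->callback, $content);
    }
}
?>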
With this structure new requests will start as soon as something out of the pool finishes.
Of course it doesn't really save you if it isn't just the retrieving that takes time but the processing as well... and it isn't true parallel handling. But I still hope it helps. :)
P.S. It did the trick for me. :) A once 8-hour job now completes in 3-4 minutes using a pool of 50 connections. Can't describe that feeling. :) I didn't really expect it to work as planned, because with PHP it rarely works exactly as supposed to... That was like "ok, hope it finishes in at least an hour... Wha... Wait... Already?! 8-O"
You can use curl_multi: http://www.somacon.com/p537.php
You may also want to consider doing this client side and using Javascript.
Another solution is to write a hunter/gatherer that you submit an array of URLs to, then it does the parallel work and returns a JSON array after it's completed.
Put another way: if you had 100 URLs you could POST that array (probably as JSON as well) to mysite.tld/huntergatherer - it does whatever it wants in whatever language you want and just returns JSON.
Aside from the curl_multi solution, another one is just having a batch of Gearman workers. If you go this route, I've found supervisord a nice way to start a load of daemon workers.
Things you should look at in addition to CURL multi:
Non-blocking streams (example: PHP-MIO); see the sketch after this list
ZeroMQ for spawning off many workers that do requests asynchronously
While node.js, Ruby EventMachine or similar tools are quite great for doing this stuff, the things I mentioned make it fairly easy in PHP too.
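To illustrate the non-blocking streams item from the list above (and the idea in the original question), here is a very rough sketch using stream_socket_client and stream_select; do_some_heavy_mysql_stuff() is the placeholder from the question and error handling is omitted:
<?php
// Open the connection and send the request up front, without blocking on the reply.
$fp = stream_socket_client('tcp://www.restserver.com:80', $errno, $errstr, 1);
stream_set_blocking($fp, false);
fwrite($fp, "GET /example.php HTTP/1.0\r\nHost: www.restserver.com\r\n\r\n");

do_some_heavy_mysql_stuff(); // the response can pile up in the socket buffer meanwhile

// Now wait until the response is readable and drain it.
$response = '';
while (!feof($fp)) {
    $read = array($fp);
    $write = $except = array();
    if (stream_select($read, $write, $except, 5) > 0) {
        $response .= fread($fp, 8192);
    }
}
fclose($fp);
?>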
Try executing python-pycurl scripts from PHP. It's easier and faster than PHP's cURL.

PHP Singleton class for all requests

I have a simple problem. I use PHP on the server side and produce HTML output. My site shows the status of another server. So the flow is:
Browser user goes on www.example.com/status
Browser contacts www.example.com/status
PHP server receives the request and asks for the status on www.statusserver.com/status
PHP receives the data, transforms it into readable HTML output and sends it back to the client
Browser user can see the status.
Now, I've created a singleton class in PHP which accesses the status server only every 8 seconds. So it updates the status every 8 seconds. If a user requests an update in between, the server returns the locally (on www.example.com) stored status.
That's nice, isn't it? But then I did a simple test and started 5 browser windows to see if it works. Here it comes: the PHP server created a singleton instance for each request. So now 5 clients request the status from the status server every 8 seconds. This means I have 5 calls to the status server every 8 seconds instead of one!
Isn't there a possibility to provide only one instance to all users within an Apache server? That would solve the problem in case 1000 users are connecting to www.example.com/status...
Thanks for any hints.
=============================
EDIT:
I already use caching on the hard drive:
public function getFile($filename)
{
    $diff = (time() - filemtime($filename));
    //echo "diff:$diff<br/>";
    if ($diff > 8) {
        //echo 'greater than 8<br/>';
        self::updateFile($filename);
    }
    if (is_readable($filename)) {
        try {
            $returnValue = @ImageCreateFromPNG($filename);
            if ($returnValue == '') {
                sleep(1);
                return self::getFile($filename);
            } else {
                return $returnValue;
            }
        } catch (Exception $e) {
            sleep(1);
            return self::getFile($filename);
        }
    } else {
        sleep(1);
        return self::getFile($filename);
    }
}
This is the call in the singleton. I ask for a file and save it on the hard drive, but all the requests call it at the same time and start requesting the status server.
I think the only solution would be a standalone application which updates the file every 8 seconds... All requests should just read the file and no longer be able to update it.
This standalone could be a Perl script or something similar...
PHP requests are handled by different processes, and each of them has a different state; there isn't any resident process like in other web development frameworks. You should handle that behavior directly in your class, for instance by using some caching.
The method which queries the server status should have this logic:
public function getStatus() {
    if (!$status = $cache->load()) {
        // cache miss
        $status = // do your query here
        $cache->save($status); // store the result in cache
    }
    return $status;
}
This way, only one request out of X will fetch the real status. The X value depends on your cache configuration.
Some cache library you can use:
APC
Memcached
Zend_Cache which is just a wrapper for actual caching engines
Or you can store the result in a plain text file, check the mtime of the file on every request, and rewrite it if more than xx seconds have passed.
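A rough sketch of that file-based variant (the status URL, file name and interval are placeholders):
<?php
// Rewrite the cached status only when the file is older than the interval.
function getCachedStatus($cacheFile = '/tmp/status.cache', $ttl = 8)
{
    if (!file_exists($cacheFile) || (time() - filemtime($cacheFile)) > $ttl) {
        $status = file_get_contents('http://www.statusserver.com/status'); // the real query
        file_put_contents($cacheFile, $status, LOCK_EX);
    }
    return file_get_contents($cacheFile);
}
?>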
Update
Your code is pretty strange: why all those sleep calls? Why a try/catch block when ImageCreateFromPNG does not throw?
You're asking a different question; since PHP is not an application server and cannot store state across processes, your approach is correct. I suggest you use APC (it uses shared memory, so it would be at least 10x faster than reading a file) to share the status across different processes. With this approach your code could become:
public function getFile($filename)
{
    $latest_update = apc_fetch('latest_update');
    if (false == $latest_update) {
        // cache expired or first request
        apc_store('latest_update', time(), 8); // 8 is the ttl in seconds
        // fetch the file here and save it on local storage
        self::updateFile($filename);
    }
    // here you can process the file
    return $your_processed_file;
}
With this approach the code in the if part will be executed by two different processes only if a process is blocked just after the if line, which should not happen because it is almost an atomic operation.
Furthermore, if you want to guarantee that, you should use something like semaphores, but that would be an oversized solution for this kind of requirement.
Finally, IMHO 8 seconds is a small interval; I'd use something bigger, at least 30 seconds, but this depends on your requirements.
As far as I know it is not possible in PHP. However, you surely can serialize and cache the object instance.
Check out http://php.net/manual/en/language.oop5.serialization.php
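A tiny sketch of that idea (the file path and variable name are placeholders):
<?php
// Store the instance at the end of one request...
file_put_contents('/tmp/status_checker.ser', serialize($statusChecker), LOCK_EX);

// ...and restore it at the start of the next one.
$statusChecker = unserialize(file_get_contents('/tmp/status_checker.ser'));
?>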
