In my application I'm using the GuzzleHttp library. It's probably not the cause of the problem, but it's worth mentioning. Every minute (via cron) I need to fetch data from 40+ addresses, so I chose GuzzleHttp to be as fast as possible.
Guzzle code:
$client = new Client();
$rectangles = $this->db->query("SELECT * FROM rectangles");

$requests = function ($rectangles) {
    foreach ($rectangles as $rectangle) {
        // some GEO coords (not important)
        $left   = $rectangle["lft"];
        $right  = $rectangle["rgt"];
        $top    = $rectangle["top"];
        $bottom = $rectangle["bottom"];

        $this->indexes[] = $rectangle;
        $uri = "https://example.com/?left=$left&top=$top&right=$right&bottom=$bottom";

        yield new Request("GET", $uri);
    }
};
$pool = new Pool($client, $requests($rectangles), [
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) {
        $resp = $response->getBody();
        $carray = json_decode($resp, true);

        if ($carray["data"] != null) {
            $alerts = array_filter($carray["data"], function ($alert) {
                return $alert["type"] == 'xxx';
            });

            $this->data = array_merge($this->data, $alerts);
            $this->total_count += count($alerts);
        }
    },
    'rejected' => function ($reason, $index) {},
]);

$promise = $pool->promise();
$promise->wait();

return $this->data;
Of course I benchmarked this:
1. getting data from the remote server: 0.000xx sec
2. json_decode: 0.001-0.0100 sec (this is probably the problem :-()
The entire script takes about 6-8 seconds, depending on how much data the remote server returns.
All along I thought Guzzle performs the requests asynchronously, so the total time would match the longest request (slowest request = 200 ms == all requests = 200 ms). But this is apparently not true! Or I am doing something wrong.
I pass true to json_decode to get an associative array (I feel this saves about 1 second, but I'm not sure).
My question is: can I optimize this code further and speed it up? I'd like it to be as fast as the single slowest request (0.200 sec).
PS: The data I'm getting from the URLs are just long JSONs. Thanks!
EDIT: I changed 'concurrency' => 5 to 'concurrency' => 100 and now the duration is about 2-4 seconds.
To start, increase the concurrency value in the Pool config to the total number of requests you need to send. This should be fine and may in fact make things even faster.
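For illustration, here's a minimal sketch of that change, reusing $client, $requests, and $rectangles from the question (and assuming $rectangles is, or can be turned into, a countable array; the callbacks are elided):

$pool = new Pool($client, $requests($rectangles), [
    'concurrency' => count($rectangles),  // one slot per request, so none queue
    'fulfilled'   => function ($response, $index) { /* ...as before... */ },
    'rejected'    => function ($reason, $index) { /* ... */ },
]);
$pool->promise()->wait();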
As for speeding up json_decode by milliseconds: that depends on many factors, including the hardware of the server that processes the JSON and the varying sizes of the JSON payloads. I don't think there is anything you can do programmatically in PHP to speed up that core function, though I could be wrong.
Another part of your code to look at is $this->data = array_merge($this->data, $alerts); you could use a loop instead.
You are also doing double work: array_filter iterates over the array internally, and then array_merge walks it again.
So, instead of:
if ($carray["data"] != null) {
$alerts = array_filter($carray["data"], function($alert) {
return $alert["type"] == 'xxx';
});
$this->data = array_merge($this->data, $alerts);
$this->total_count += count($alerts);
}
Maybe try this:
if ($carray["data"] != null) {
foreach ($carray["data"] as $cdata) {
if ($cdata["type"] == 'xxx') {
$this-data[] = $cdata;
$this->total_count++;
}
}
}
Related
In my scenario I may need to make over 100 curl requests to get the information I need. There's no way to get this information beforehand, and I don't have access to the server I'll be making the requests to. My plan is to use curl_multi_init(). Each response comes back as JSON. The problem is that I need to receive the responses in the order I sent the requests; otherwise I won't know where everything goes once the responses come back. How do I solve this?
When you get the handles back from curl_multi_info_read, you can compare those handles against your keyed list, then of course use the key to know where your response goes. Here's the direct implementation, based on a model I use for a scraper:
// here's our list of URLs, in the order we care about
$easy_handles['google']     = curl_init('https://google.com/');
$easy_handles['bing']       = curl_init('https://bing.com/');
$easy_handles['duckduckgo'] = curl_init('https://duckduckgo.com/');

// our responses will be here, keyed the same as the URL list
$responses = [];

// here's the code to do the multi-request -- it's all boilerplate
$common_options = [CURLOPT_FOLLOWLOCATION => true, CURLOPT_RETURNTRANSFER => true];

$multi_handle = curl_multi_init();
foreach ($easy_handles as $easy_handle) {
    curl_setopt_array($easy_handle, $common_options);
    curl_multi_add_handle($multi_handle, $easy_handle);
}

do {
    $status = curl_multi_exec($multi_handle, $runCnt);
    assert(CURLM_OK === $status);
    do {
        $status = curl_multi_select($multi_handle, 2/*seconds timeout*/);
        if (-1 === $status) usleep(10); // reported bug in PHP
    } while (0 === $status);
    while (false !== ($info = curl_multi_info_read($multi_handle))) {
        foreach ($easy_handles as $key => $easy_handle) { // find the response handle
            if ($info['handle'] === $easy_handle) {       // from our list
                if (CURLE_OK === $info['result']) {
                    $responses[$key] = curl_multi_getcontent($info['handle']);
                } else {
                    $responses[$key] = new \RuntimeException(
                        curl_strerror($info['result'])
                    );
                }
            }
        }
    }
} while (0 < $runCnt);
Most of this is boilerplate machinery to do the multi fetch. The lines that target your specific question are:
foreach ($easy_handles as $key => $easy_handle) { // find the response handle
    if ($info['handle'] === $easy_handle) {       // from our list
        if (CURLE_OK === $info['result']) {
            $responses[$key] = curl_multi_getcontent($info['handle']);
Loop over your list comparing the returned handle against each stored handle, then use the corresponding key to fill in your response.
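As an aside, and purely my sketch rather than part of the original answer: the inner scan can be replaced with a strict array_search, which recovers the key directly from the handle:

// inside the curl_multi_info_read() loop, instead of the foreach scan
$key = array_search($info['handle'], $easy_handles, true); // strict handle comparison
if ($key !== false) {
    $responses[$key] = (CURLE_OK === $info['result'])
        ? curl_multi_getcontent($info['handle'])
        : new \RuntimeException(curl_strerror($info['result']));
}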
Obviously, since the requests are asynchronous, you cannot predict the order in which the responses will arrive. Therefore, in your design, you must provide for each request to include "some random bit of information" – a so-called nonce – which each client will somehow be obliged to return to you verbatim.
Based upon this "nonce," you will then be able to pair each response to the request which originated it – and to discard any random bits of garbage that wander in "out of the blue."
Otherwise, there is no(!) solution to your problem.
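To make the nonce idea concrete, here is a hypothetical sketch. It assumes the remote API echoes a nonce query parameter back in its JSON body, which is exactly the obligation described above; the parameter name and payload shape are mine:

// tag every request with a nonce and remember which request it belongs to
$pending = [];
foreach ($urls as $key => $url) {
    $nonce = bin2hex(random_bytes(8));        // the "random bit of information"
    $pending[$nonce] = $key;
    $handles[$key] = curl_init($url . '&nonce=' . $nonce);
}

// ...run the multi request, then for each response body received:
$body = json_decode($responseBody, true);
$key  = $pending[$body['nonce']] ?? null;     // null => garbage "out of the blue"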
I am attempting to use Guzzle promises to make some HTTP calls. To illustrate, I made this simple example where a fake HTTP request takes 5 seconds:
$then = microtime(true);

$promise = new Promise(
    function () use (&$promise) {
        // Make a request to an http server
        $httpResponse = 200;
        sleep(5);
        $promise->resolve($httpResponse);
    });

$promise2 = new Promise(
    function () use (&$promise2) {
        // Make a request to an http server
        $httpResponse = 200;
        sleep(5);
        $promise2->resolve($httpResponse);
    });

echo 'PROMISE_1 ' . $promise->wait();
echo 'PROMISE_2 ' . $promise2->wait();

echo 'Took: ' . (microtime(true) - $then);
What I want is to start both of them, and then have both echos wait for the responses. What actually happens is that promise 1 fires, waits 5 seconds, then promise 2 fires and waits another 5 seconds.
From my understanding I should maybe be using the promise's ->resolve() function to make it start, but I don't know how to pass resolve a function in which I would make an HTTP call.
By using wait() you're forcing the promise to be resolved synchronously: https://github.com/guzzle/promises#synchronous-wait
According to the Guzzle FAQ you should use requestAsync() with your RESTful calls:
Can Guzzle send asynchronous requests?
Yes. You can use the requestAsync, sendAsync, getAsync, headAsync, putAsync, postAsync, deleteAsync, and patchAsync methods of a client to send an asynchronous request. The client will return a GuzzleHttp\Promise\PromiseInterface object. You can chain then functions off of the promise.
$promise = $client->requestAsync('GET', 'http://httpbin.org/get');
$promise->then(function ($response) {
    echo 'Got a response! ' . $response->getStatusCode();
});
You can force an asynchronous response to complete using the wait() method of the returned promise.
$promise = $client->requestAsync('GET', 'http://httpbin.org/get');
$response = $promise->wait();
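Putting those two snippets together for the original question, here is a sketch that starts both requests first and only then waits for both. It assumes guzzlehttp/guzzle 6+ and guzzlehttp/promises 1.4+ (where Utils::all exists), with httpbin's /delay/5 endpoint standing in for the 5-second fake request:

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$then = microtime(true);
$client = new Client();

// both requests start immediately; nothing blocks here
$promises = [
    'first'  => $client->requestAsync('GET', 'http://httpbin.org/delay/5'),
    'second' => $client->requestAsync('GET', 'http://httpbin.org/delay/5'),
];

// wait for both at once: total time is roughly the slowest request, not the sum
$responses = Utils::all($promises)->wait();

echo 'PROMISE_1 ' . $responses['first']->getStatusCode();
echo 'PROMISE_2 ' . $responses['second']->getStatusCode();
echo 'Took: ' . (microtime(true) - $then);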
This question is a little old but I see no answer, so I'll give it a shot, maybe someone will find it helpful.
You can use the function all($promises).
I can't find documentation for this function, but you can find its implementation here.
The comment above this function starts like this:
Given an array of promises, return a promise that is fulfilled when all the items in the array are fulfilled.
Sounds like what you are looking for, so you can do something like this:
use GuzzleHttp\Promise\Promise;
use function GuzzleHttp\Promise\all;

$then = microtime(true);

$promises = [];

$promises[0] = new Promise(
    function () use (&$promises) {
        // Make a request to an http server
        $httpResponse = 200;
        sleep(5);
        $promises[0]->resolve($httpResponse);
    });

$promises[1] = new Promise(
    function () use (&$promises) {
        // Make a request to an http server
        $httpResponse = 200;
        sleep(5);
        $promises[1]->resolve($httpResponse);
    });

all($promises)->wait();

echo 'Took: ' . (microtime(true) - $then);
If this function isn't the one that helps you solve your problem, there are other interesting functions in that file like some($count, $promises), any($promises) or settle($promises).
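For example, settle() is useful when some requests may fail: it waits for every promise to finish and reports each outcome instead of rejecting as soon as one fails. A sketch using the namespaced function from guzzle/promises 1.x:

use function GuzzleHttp\Promise\settle;

// $promises built as above; the aggregate promise never rejects
$results = settle($promises)->wait();

foreach ($results as $i => $result) {
    if ($result['state'] === 'fulfilled') {
        echo "$i ok: " . $result['value'] . "\n";
    } else {
        echo "$i failed: " . $result['reason'] . "\n"; // reason may be an exception
    }
}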
You can use Utils::all($promises)->wait();
Here is a code example for "guzzlehttp/promises": "^1.4":
use GuzzleHttp\Promise\Promise;
use GuzzleHttp\Promise\Utils;

$promises = [];
$key = 0;

foreach (something...) {
    $key++;
    $promises[$key] = new Promise(
        function () use (&$promises, $key) {
            // here you can call some sort of async operation
            // ...
            // at the end call the ->resolve method
            $promises[$key]->resolve('bingo');
        }
    );
}

$res = Utils::all($promises)->wait();
It is important that the operation inside the promise be non-blocking if you want a concurrent workflow. For example, sleep(1) is a blocking operation, so 10 promises with sleep(1) will still take 10 seconds in total.
My web app requires making 7 different SOAP (WSDL) API requests to complete one task (the users have to wait for the results of all the requests). The average response time is 500 ms to 1.7 s per request, so I need to run all these requests in parallel to speed up the process.
What's the best way to do that:
pthreads, or
Gearman workers
forking processes
curl_multi (I'd have to build the XML SOAP body myself)
Well, the first thing to say is that it's never really a good idea to create threads in direct response to a web request; think about how far that will actually scale.
If you create 7 threads for everyone who comes along and 100 people turn up, you'll be asking your hardware to execute 700 threads concurrently, which is quite a lot to ask of anything really...
However, scalability is not something I can usefully help you with, so I'll just answer the question.
<?php
/* the first service I could find that worked without authorization */
define("WSDL", "http://www.webservicex.net/uklocation.asmx?WSDL");

class CountyData {
    /* this works around simplexmlelements being unsafe (and shit) */
    public function __construct(SimpleXMLElement $element) {
        $this->town = (string)$element->Town;
        $this->code = (string)$element->PostCode;
    }

    public function run(){}

    protected $town;
    protected $code;
}

class GetCountyData extends Thread {
    public function __construct($county) {
        $this->county = $county;
    }

    public function run() {
        $soap = new SoapClient(WSDL);

        $result = $soap->getUkLocationByCounty(array(
            "County" => $this->county
        ));

        foreach (simplexml_load_string(
                     $result->GetUKLocationByCountyResult) as $element) {
            $this[] = new CountyData($element);
        }
    }

    protected $county;
}

$threads  = [];
$thread   = 0;
$threaded = true; # change to false to test without threading

$counties = [ # will create as many threads as there are counties
    "Buckinghamshire",
    "Berkshire",
    "Yorkshire",
    "London",
    "Kent",
    "Sussex",
    "Essex"
];

while ($thread < count($counties)) {
    $threads[$thread] =
        new GetCountyData($counties[$thread]);
    if ($threaded) {
        $threads[$thread]->start();
    } else $threads[$thread]->run();

    $thread++;
}

if ($threaded)
    foreach ($threads as $thread)
        $thread->join();

foreach ($threads as $county => $data) {
    printf(
        "Data for %s %d\n", $counties[$county], count($data));
}
?>
Note that the SoapClient instance is not, and cannot be, shared between threads; this may well slow you down, so you might want to enable caching of WSDLs...
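For instance, WSDL caching can be switched on through the stock SOAP extension; a sketch, not part of the original example (the TTL value is my choice):

// cache parsed WSDLs so each thread's SoapClient doesn't re-fetch and re-parse them
ini_set('soap.wsdl_cache_enabled', '1');
ini_set('soap.wsdl_cache_dir', '/tmp');
ini_set('soap.wsdl_cache_ttl', '86400'); // one day

$soap = new SoapClient(WSDL, [
    'cache_wsdl' => WSDL_CACHE_BOTH,     // cache in memory and on disk
]);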
I am developing a simple RESTful API using Laravel 4.
I have set up a route that calls a controller function which basically does this:
If the information is in the database, pack it into a JSON object and return it as a response.
Else, try to download it (HTML/XML parsing), store it, and finally pack the JSON response and send it.
I have noticed that while doing a total of 1700 requests, only 2 at a time, the CPU load rises to 70-90%.
I am a complete PHP and Laravel beginner, and I built the API following this tutorial, so maybe I'm doing something wrong, or it's just a proof of concept lacking optimizations. How can I improve this code? (The entry point is getGames.)
Do you think the root of the problem is Laravel, or would I get the same result with another framework or raw PHP?
UPDATE 1: I also set up a file cache, but the CPU load is still ~50%.
UPDATE 2: I lowered the query rate to two every 500 ms and the CPU load dropped to 12%, so I guess this code is missing queue handling or something similar.
class GameController extends BaseController {

    private static $platforms = array(
        "Atari 2600",
        "Commodore 64",
        "Sega Dreamcast",
        "Sega Game Gear",
        "Nintendo Game Boy",
        "Nintendo Game Boy Color",
        "Nintendo Game Boy Advance",
        "Atari Lynx",
        "M.A.M.E.",
        "Sega Mega Drive",
        "Colecovision",
        "Nintendo 64",
        "Nintendo DS",
        "Nintendo Entertainment System (NES)",
        "Neo Geo Pocket",
        "Turbografx 16",
        "Sony PSP",
        "Sony PlayStation",
        "Sega Master System",
        "Super Nintendo (SNES)",
        "Nintendo Virtualboy",
        "Wonderswan");

    private function getDataTGDB($name, $platform) {
        $url = 'http://thegamesdb.net/api/GetGame.php?';
        if (null == $name || null == $platform) return NULL;
        $url .= 'name=' . urlencode($name);

        $xml = simplexml_load_file($url);

        $data = new Data;
        $data->query = $name;
        $resultPlatform = (string)$xml->Game->Platform;
        $data->platform = $platform;
        $data->save();

        foreach ($xml->Game as $entry) {
            $games = Game::where('gameid', (string)$entry->id)->get();
            if ($games->count() == 0) {
                if (strcasecmp($platform, $entry->Platform) == 0 ||
                    (strcasecmp($platform, "Sega Mega Drive") == 0 &&
                        ($entry->Platform == "Sega Genesis" ||
                         $entry->Platform == "Sega 32X" ||
                         $entry->Platform == "Sega CD"))) {
                    $game = new Game;
                    $game->gameid = (string)$entry->id;
                    $game->title = (string)$entry->GameTitle;
                    $game->releasedate = (string)$entry->ReleaseDate;
                    $genres = '';
                    if (NULL != $entry->Genres->genre)
                        foreach ($entry->Genres->genre as $genre) {
                            $genres .= $genre . ',';
                        }
                    $game->genres = $genres;
                    unset($genres);
                    $game->description = (string)$entry->Overview;

                    foreach ($entry->Images->boxart as $boxart) {
                        if ($boxart["side"] == "front") {
                            $game->bigcoverurl = (string)$boxart;
                            $game->coverurl = (string)$boxart["thumb"];
                        }
                        continue;
                    }

                    $game->save();
                    $data->games()->attach($game->id);
                }
            }
            else foreach ($games as $game) {
                $data->games()->attach($game->id);
            }
        }

        unset($xml);
        unset($url);

        return $this->printJsonArray($data);
    }

    private function getArcadeHits($name) {
        $url = "http://www.arcadehits.net/index.php?p=roms&jeu=";
        $url .= urlencode($name);
        $html = file_get_html($url);

        $data = new Data;
        $data->query = $name;
        $data->platform = 'M.A.M.E.';
        $data->save();

        $games = Game::where('title', $name)->get();
        if ($games->count() == 0) {
            $game = new Game;
            $game->gameid = -1;
            $title = $html->find('h4', 0)->plaintext;
            if ("Derniers jeux commentés" == $title) {
                unset($game);
                return Response::json(array('status' => '404'), 200);
            }
            else {
                $game->title = $title;
                $game->description = "(No description.)";
                $game->releasedate = $html->find('a[href*=yearz]', 0)->plaintext;
                $game->genres = $html->find('a[href*=genre]', 0)->plaintext;
                $minithumb = $html->find('img.minithumb', 0);
                $game->coverurl = $minithumb->src;
                $game->bigcoverurl = str_replace("/thumb/", "/jpeg/", $minithumb->src);
                $game->save();
                $data->games()->attach($game->id);
            }
        }

        unset($html);
        unset($url);
        return $this->printJsonArray($data);
    }

    private function printJsonArray($data) {
        $games = $data->games()->get();
        $array_games = array();
        foreach ($games as $game) {
            $array_games[] = array(
                'GameTitle'   => $game->title,
                'ReleaseDate' => $game->releasedate,
                'Genres'      => $game->genres,
                'Overview'    => $game->description,
                'CoverURL'    => $game->coverurl,
                'BigCoverURL' => $game->bigcoverurl
            );
        }
        $result = Response::json(array(
            'status' => '200',
            'Game'   => $array_games
        ), 200);
        $key = $data->query . $data->platform;
        if (!Cache::has($key))
            Cache::put($key, $result, 1440);
        return $result;
    }

    private static $baseImgUrl = "";

    public function getGames($apikey, $title, $platform) {
        $key = $title . $platform;
        if (Cache::has($key)) return Cache::get($key);
        if (!in_array($platform, GameController::$platforms))
            return Response::json(array("status" => "403", "message" => "non valid platform"));

        $datas = Data::where('query', $title)
            ->where('platform', $platform)
            ->get();

        // If this query has already been done we return the data,
        // otherwise we call the proper parser according to $platform.
        if ($datas->count() == 0) {
            if ("M.A.M.E." == $platform) {
                return $this->getArcadeHits($title);
            }
            else {
                return $this->getDataTGDB($title, $platform);
            }
        } else {
            return $this->printJsonArray($datas->first());
        }
    }
}
?>
You're trying to retrieve data from other people's servers, and your script blocks until that data is fully retrieved. That's what makes your code so "CPU expensive" (couldn't find a better fit =/ ): the script sits waiting until the data is received before the rest of the work can continue.
I strongly suggest that you make asynchronous calls. That would free your application to keep working on the code while another part of the system fetches the information you need.
I hope that helps! =D
UPDATE
To give examples, I'd have to refactor your code (and I'm lazy as anything!). But I can tell you this for sure: if you put your request code, the calls that fetch the other sites' XML, onto a queue, you would gain a lot of free CPU time. Every request is redirected to a queue; once the responses are ready, you treat them as you wish. Laravel has a beautiful way of dealing with queues.
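A hypothetical sketch of that queue idea using Laravel 4's Queue facade; the job class name and payload are mine, not from the question:

// in the controller: push a job instead of fetching the XML inline
Queue::push('FetchGameJob', array('title' => $title, 'platform' => $platform));

// app/jobs/FetchGameJob.php -- processed by `php artisan queue:listen`
class FetchGameJob {
    public function fire($job, $data)
    {
        // do the slow thegamesdb.net / arcadehits.net fetching here,
        // then store the result in the database/cache as before
        $job->delete();
    }
}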
What I would do first is use a profiler to find out which parts need optimization. You can use, for example, this:
http://xdebug.org/docs/profiler
Also, you didn't specify what kind of CPU it is, or how many cores you are using. Is it actually a problem that your CPU usage is that high?
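If it helps, a minimal sketch of enabling the Xdebug 2.x profiler only for triggered requests, per the linked docs (the output directory is my choice):

; php.ini
xdebug.profiler_enable = 0
xdebug.profiler_enable_trigger = 1   ; profile only when XDEBUG_PROFILE is sent
xdebug.profiler_output_dir = /tmp

Then open the resulting cachegrind.out.* file in KCachegrind/QCacheGrind or Webgrind to see where the time actually goes.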
You should use Laravel's queue system along with beanstalkd, for example, and then monitor the queue (worker) with artisan queue:listen.
So I have the following code:
private function getArtistInfo($artist) {
    $artisan = json_decode($artist, true);
    $artistObj = array();
    //fb($artist);
    $artistObj['id'] = $artisan['name']['ids']['nameId'];

    $memcache = new Memcached($artistObj['id']);
    $artistCache = $memcache->getMemcache();

    if ($artistCache === false) {
        $artistObj['name'] = $artisan['name']['name'];
        $artistObj['image'] = $artisan['name']['images'][0]['url'];
        $initArtist = array('id' => $artistObj['id'], 'name' => $artistObj['name'], 'image' => $artistObj['image']);
        $artistObj = $this->buildArtist($artisan, $artistObj);
        $memcache->setMemcache($artistObj);
    }
    else {
        $initArtist = array('id' => $artistCache['id'], 'name' => $artistCache['name'], 'image' => $artistCache['image']);
    }
    return $initArtist;
}
Now, the code works, but getArtistInfo() takes too long to finish when all I want is the $initArtist value. I would like my client to get $initArtist right away once it's constructed, and somehow let the caching of $artistObj run in the background.
So far I have read up on several topics I thought might be useful: event delegation, callback functions, call_user_func, the observer pattern, threading, Gearman, etc. However, I have no idea which of them would actually do what I want. Please point me in the right direction.
EDIT:
My Memcached class:
class Memcached {
    private static $MEMCACHED_HOST = "localhost";
    private static $MEMCACHED_PORT = "11211";

    private $id, $key, $memcache, $cacheOK;

    function __construct($id) {
        $this->id = $id;
        $this->key = 'artistID_' . $this->id;
        $this->memcache = new Memcache;
        $this->cacheOK = $this->memcache->connect(Memcached::$MEMCACHED_HOST, Memcached::$MEMCACHED_PORT);
    }

    // public: it is called from getArtistInfo() outside this class
    public function getMemcache() {
        $artistInfo = null;
        if ($this->cacheOK === true) {
            $artistInfo = $this->memcache->get($this->key);
        }
        if ($artistInfo === false) {
            return false;
        }
        return $artistInfo;
    }

    public function setMemcache($artistInfo) {
        $this->memcache->set($this->key, $artistInfo, 0, 60);
    }
}
My buildArtist() code:
private function buildArtist($artisan, $artistObj) {
    $artistObj['amgID'] = $artisan['name']['ids']['amgPopId'];
    $discography = $artisan['name']['discography'];
    foreach ($discography as $album) {
        $albumID = $album['ids']['amgPopId'];
        preg_match('/(\d+)/', $albumID, $matches);
        $albumObj['amgAlbumID'] = $matches[1];
        $albumObj['title'] = $album['title'];
        $albumObj['releaseDate'] = $album['year'];
        $albumObj['more'] = $this->getMoreMusic($albumObj['title'], $artistObj['name']);
        $artistObj['discography'][] = $albumObj;
    }
    return $artistObj;
}
Well, it's not entirely clear how long "too long" is, or which part of this code is slowing you down. For all we know, the slow part isn't the part that stores the data in Memcached.
In any case, once you identify your bottleneck, one thing you can do to get this kind of out-of-order execution is to use a brokerless messaging queue like ZeroMQ to accept the JSON objects that need caching. A separate PHP script can then take on the job of processing and caching these requests asynchronously, outside of any web request. That separate script could be run through a cron job or some other job manager that handles the caching part in parallel.
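A hypothetical sketch of that hand-off, assuming the php-zmq extension; the socket types and the endpoint are my choices:

// in the web request: push the freshly built artist data and return at once
$ctx  = new ZMQContext();
$push = $ctx->getSocket(ZMQ::SOCKET_PUSH);
$push->connect('tcp://127.0.0.1:5555');
$push->send(json_encode($artistObj)); // fire and forget

// worker.php -- run outside the web request (cron, supervisor, ...)
$pull = (new ZMQContext())->getSocket(ZMQ::SOCKET_PULL);
$pull->bind('tcp://127.0.0.1:5555');
while (true) {
    $artistObj = json_decode($pull->recv(), true);
    // ...do the setMemcache() work here...
}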
You want to use set and get rather than going through that memcache persistence ID; I'm not even sure what setMemcache and getMemcache are, but they aren't in the extension documentation.
Here's an example from the documentation:
<?php
$m = new Memcached();
$m->addServer('localhost', 11211);

if (!($ip = $m->get('ip_block'))) {
    if ($m->getResultCode() == Memcached::RES_NOTFOUND) {
        $ip = array();
        $m->set('ip_block', $ip);
    } else {
        /* log error */
        /* ... */
    }
}
Please show the code of buildArtist if you'd like help optimizing it.