I've been working on a script that makes close to a thousand async requests using getAsync and Promise\settle. Each page requested is then parsed using the Symfony crawler's filter method (also slow, but a separate issue).
My code looks something like this:
$requestArray = [];
$request = new Client($url);
foreach ($thousandItemArray as $item) {
    $requestArray[] = $request->getAsync(null, $query);
}

// Wait for every request to settle, then crawl each result.
$results = Promise\settle($requestArray)->wait(true);
foreach ($results as $item) {
    $item->crawl();
}
Is there a way I can crawl the requested pages as they come in, rather than waiting for them all and then crawling? Am I right in thinking this would speed things up, if it's possible?
Thanks for your help in advance.
You can. getAsync() returns a promise, so you can assign an action to it using ->then().
$promisesList[] = $request->getAsync(/* ... */)->then(
    function (Response $resp) {
        // Do whatever you want right after the response is available.
    }
);
$results = Promise\settle($promisesList)->wait(true);
P.S.
You probably want to limit the concurrency to some fixed number of requests rather than starting them all at once. If so, use the each_limit() function instead of settle(). And vote for my PR to be able to use settle_limit() ;)
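For illustration, a minimal sketch of that approach might look like this (assuming Guzzle 6+ and the guzzlehttp/promises functions; the URL list and the crawling callback are placeholders for your own code):

use GuzzleHttp\Client;
use GuzzleHttp\Promise;
use Psr\Http\Message\ResponseInterface;

$client = new Client();

// Generator: promises are created lazily, so requests only start
// when each_limit() pulls them in.
$promises = (function () use ($client, $thousandItemArray) {
    foreach ($thousandItemArray as $url) {
        yield $client->getAsync($url)->then(function (ResponseInterface $resp) {
            // Parse/crawl this page as soon as its response arrives.
        });
    }
})();

// Keep at most 25 requests in flight at any one time.
Promise\each_limit($promises, 25)->wait();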
I'm just looking at the Microsoft Graph API PHP SDK to get a bunch of resources, notably Users.
Looking at the SDK docs, there are two ways to get users: one using the createRequest() method and the other using the createCollectionRequest() method.
The docs suggest using createCollectionRequest() and then just doing a while loop with array_merge() and getPage() to build an array.
$docs = [];
while (!$docGrabber->isEnd()) {
    $docs = array_merge($docs, $docGrabber->getPage());
}
The issue is, I have a collection of ~50,000 users, so this method isn't particularly efficient.
I guess the biggest issue is that the above example (using the while loop) exists to avoid using the @odata.nextLink that the API returns.
But, what if we actually want to use this, instead of returning every single record in a single array?
Thanks
Instead of using getPage() and that sample, you can access the nextLink with something like this:
$url = "/users";
// Get the first page
$response = $graph->createCollectionRequest("GET", $url)
->setPageSize(50)
->execute();
if ($response->getNextLink())
{
$url = $response->getNextLink();
// TODO: remove https://graph.microsoft.com/v1.0 part of nextlink
} else {
// There are no more pages.
return null;
}
// get the next page, page size is already set in the next link
$response = $graph->createCollectionRequest("GET", $url)
->execute();
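If you want to keep following the link until the collection is exhausted, a rough (untested) sketch could look like the following; stripping the service root with str_replace() is just one way to handle the TODO above, and getBody() is assumed to return the decoded JSON for the current page:

$users = [];
$url = "/users";

do {
    $response = $graph->createCollectionRequest("GET", $url)
        ->setPageSize(50)
        ->execute();

    // 'value' holds this page's users; process or store them here.
    foreach ($response->getBody()['value'] as $user) {
        $users[] = $user;
    }

    // nextLink is absolute, but createCollectionRequest() expects a relative
    // endpoint, so strip the service root before requesting the next page.
    $nextLink = $response->getNextLink();
    $url = $nextLink ? str_replace("https://graph.microsoft.com/v1.0", "", $nextLink) : null;
} while ($url !== null);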
I have an array of altcoins I want to query using the Bittrex API. I loop through them and pass each one to the API as a parameter, like so:
$coins = array("BTC-LTC", "BTC-PTOY", "BTC-BLK");
for($i=0; $i<3; $i++) {
$mkt_code = $coins[$i];
getPrice($mkt_code);
}
function getPrice($mkt_code) {
$uri = 'https://bittrex.com/api/v1.1/public/getticker?market='.$mkt_code;
$obj = json_decode(file_get_contents($uri));
/*do something like save prices in a database*/
}
But if I set a cron job to run this file, say, every minute and the array size is over 100, this could take quite a while to execute sequentially. I want a way of doing it concurrently so that I can get the prices all at once, much faster and closer to real time. Help please, I'm relatively new to this. Thanks
Use Guzzle's concurrent request feature.
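For example, a rough sketch using getAsync() and Promise\settle() (assuming guzzlehttp/guzzle is installed; the markets and the ticker URL are taken from the question):

use GuzzleHttp\Client;
use GuzzleHttp\Promise;

$coins = array("BTC-LTC", "BTC-PTOY", "BTC-BLK");
$client = new Client(['base_uri' => 'https://bittrex.com/api/v1.1/']);

// Start every ticker request at once; each array entry is a promise.
$promises = [];
foreach ($coins as $mkt_code) {
    $promises[$mkt_code] = $client->getAsync('public/getticker', [
        'query' => ['market' => $mkt_code],
    ]);
}

// Wait until all of them have completed (fulfilled or rejected).
$results = Promise\settle($promises)->wait();

foreach ($results as $mkt_code => $result) {
    if ($result['state'] === 'fulfilled') {
        $obj = json_decode($result['value']->getBody());
        // e.g. save $obj->result->Last for $mkt_code to the database
    }
}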
Is there a way, using PHP, to sideload API calls (make multiple API calls at the same time) to lessen the impact on API call limits?
For example, we're using the EchoNest API to gather information on musicians. When the artist page on our site is accessed, we run multiple functions which each call a different API method that returns the specific data that we need. Everything works and looks awesome!
Here are a few (abbreviated) methods that we're calling that each count against our call limit:
function artistPageNews() {
    $artist_name = $_GET['artistname'];
    $results = iTunes::search($artist_name, array(
        'entity' => 'musicVideo'
    ))->results;

    $echonest_api_key = "OUR_API_KEY";

    // News Method
    $echonest_news = 'http://developer.echonest.com/api/v4/artist/news?api_key=' . $echonest_api_key . '&name=' . str_replace(" ", "+", $artist_name) . '&format=json&results=2&start=0';
    $echonest_news_json = file_get_contents($echonest_news);
    $news_json = json_decode($echonest_news_json);
    $news_entry = $news_json->response->news;

    foreach ($news_entry as $news) {
        // Do Magic Stuff Here...
    }
}

function artistPageVideos() {
    $artist_name = $_GET['artistname'];
    $results = iTunes::search($artist_name, array(
        'entity' => 'musicVideo'
    ))->results;

    $echonest_api_key = "OUR_API_KEY";

    // Videos Method
    $echonest_videos = 'http://developer.echonest.com/api/v4/artist/video?api_key=' . $echonest_api_key . '&name=' . str_replace(" ", "+", $artist_name) . '&format=json&results=6&start=0';
    $echonest_videos_json = file_get_contents($echonest_videos);
    $videos_json = json_decode($echonest_videos_json);
    $videos_entry = $videos_json->response->video;

    foreach ($videos_entry as $video) {
        // Do More Magic Stuff Here...
    }
}
We have about 7 (or more) of these methods that are called on each artist page load. Obviously this can mean trouble when lots of people are viewing artist pages every hour.
I understand that there's a way to store the more static information in a database and use that instead of calling the API methods on every request. I am currently exploring that option. But I also read here that there may be a way to "sideload" the API calls so that you can make multiple requests at one time. In that example, they're using Curl. I'm trying to do this with PHP.
curl https://{subdomain}.zendesk.com/api/v2/help_center/fr/articles.json?include=users \
-v -u {email_address}:{password}
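From what I can tell, PHP's curl_multi functions might be the rough equivalent of that; here's an untested sketch of what I'm imagining for two of the EchoNest calls above:

// Untested sketch: fire the news and video EchoNest calls concurrently.
$urls = [
    'news'   => 'http://developer.echonest.com/api/v4/artist/news?api_key=' . $echonest_api_key . '&name=' . urlencode($artist_name) . '&format=json&results=2&start=0',
    'videos' => 'http://developer.echonest.com/api/v4/artist/video?api_key=' . $echonest_api_key . '&name=' . urlencode($artist_name) . '&format=json&results=6&start=0',
];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $key => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$key] = $ch;
}

// Run all handles until every transfer has completed.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running && $status === CURLM_OK);

$responses = [];
foreach ($handles as $key => $ch) {
    $responses[$key] = json_decode(curl_multi_getcontent($ch));
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

// $responses['news']->response->news and $responses['videos']->response->video
// would then be available after a single round trip.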
Can anyone help me get started with this or perhaps recommend a better way to do this, such as storing this information into a database or table and pulling from that instead of calling the API every time?
Thanks in advance.
I need to get the state and country from the visitor's IP. I will be using the country info to showcase custom-made products. The state info will not be used for the same purpose, only for record keeping to track demand.
I have found on this site an instance of using the ipinfo.io API with this example code:
function ip_details($ip) {
    $json = file_get_contents("http://ipinfo.io/{$ip}/json");
    $details = json_decode($json);
    return $details;
}
However, since I do not need the full details, I see that the site does allow grabbing single fields. So I am considering using these two:
1) ipinfo.io/{ip}/region
2) ipinfo.io/{ip}/country
like so:
function ip_details($ip) {
    $ip_state = file_get_contents("http://ipinfo.io/{$ip}/region");
    $ip_country = file_get_contents("http://ipinfo.io/{$ip}/country");
    return $ip_state . $ip_country;
}
OR would I be better off going with:
function ip_details($ip) {
    $json = file_get_contents("http://ipinfo.io/{$ip}/geo");
    $details = json_decode($json);
    return $details;
}
The last one has "/geo" in the URL to slim down the selection compared to the first one with "/json". Currently I am leaning toward the second option above, using two file_get_contents calls, but I wanted to know whether it is slower than the last one, which returns everything in a single response. I just want to minimize the load time. If any other method can be suggested, it would be much appreciated.
In short, go for the single-request "/geo" option (file_get_contents makes a GET request when passed a URL).
Decode the response into an array and access the details you want via their keys:
function ip_details($ip) {
    $json = file_get_contents("http://ipinfo.io/{$ip}/geo");
    // Pass true so json_decode() returns an associative array.
    $details = json_decode($json, true);
    return $details;
}

$ipinfo = ip_details('86.178.xxx.xxx');
echo $ipinfo['country']; // GB
// etc.
Regarding the speed difference: the vast majority of the overhead is network latency, so making ONE request and parsing out the details you need will be much faster than making two separate requests for individual fields.
I'm writing a chat application for Joomla (Apache server) and use this construction to emulate long polling (server side):
function get_messages($last_id) {
    $db = JFactory::getDbo(); // Joomla database object
    $time = time();
    while ((time() - $time) < 25) {
        $sql = 'SELECT * FROM #__messages WHERE `id` > ' . intval($last_id);
        $db->setQuery($sql);
        $rows = $db->loadAssocList();
        if (count($rows) > 0) {
            echo 'JSON STRING HERE';
        } else {
            flush();
        }
        usleep(5000000); // wait 5 seconds between polls
    }
}
How can I optimize this part of the code?
Should I use an infinite loop, or should I avoid the while construction?
P.S. I know Apache is not the best choice to write a chat app on and node.js is better.
Thanks!
Infinite loops are never a good idea because they hammer your server resources. You are better off having JavaScript provide the intermittent polling of your get_messages function. Use a timeout or interval and embed the script on any page that shows the messages.
I'm going to answer based on the limited information I've got, in the broadest way possible, following industry standards. You should not code the way you currently are, because it is very inefficient and, quite frankly, dangerous.
Here is the MooTools code required to run interval-based polling (I've used MooTools since you said you're using Joomla; I've assumed you're on 1.6+, as 1.5 is EOL this month):
// This sets how often you want to update (in milliseconds).
setInterval(chatPoll, 2000);

// This function grabs the raw response from the specified URL
// and dumps it into the specified div.
function chatPoll()
{
    var unixTimestamp = Math.round(new Date().getTime() / 1000);
    var req = new Request({
        method: 'get',
        url: 'http://www.yoururltoupdate.com/file.php?last=' + (unixTimestamp - 2),
        data: { 'do': '1' },
        onComplete: function(response) { $('my-chat-wrapper').set('html', response); }
    }).send();
}
Your PHP file should look something like this:
get_messages($_GET['last']);

function get_messages($last_id) {
    $db = JFactory::getDbo(); // Joomla database object
    $sql = 'SELECT * FROM #__messages WHERE `id` > ' . intval($last_id);
    $db->setQuery($sql);
    $rows = $db->loadAssocList();
    if (count($rows) > 0) {
        echo json_encode($rows);
    }
}
I haven't fully tested this code, but it should work, and if not it should at least show how what you're trying to do ought to be approached, rather than the way you originally posted. If you really wanted to get fancy you could check out node.js as well. There are also plenty of extensions for Joomla which work as chat mediums for support purposes, if that's what you were after.