Microsoft Graph API - paging large collections - php

I'm just looking at the Microsoft Graph API PHP SDK to get a bunch of resources, notably Users.
Looking at the SDK docs, there are two ways to get users: one using the createRequest() method and the other using the createCollectionRequest() method.
The docs suggest using createCollectionRequest() and then doing a while loop with array_merge() and getPage() to build an array:
while (!$docGrabber->isEnd()) {
    $docs = array_merge($docs, $docGrabber->getPage());
}
The issue is, I have a collection of ~50,000 users, so this method isn't particularly efficient.
I guess the biggest issue is that the above example (using the while loop) avoids using the @odata.nextLink that the API returns.
But what if we actually want to use that, instead of returning every single record in one array?
Thanks

Instead of using getPage() and that sample, you can access the nextLink with something like this:
$url = "/users";
// Get the first page
$response = $graph->createCollectionRequest("GET", $url)
->setPageSize(50)
->execute();
if ($response->getNextLink())
{
$url = $response->getNextLink();
// TODO: remove https://graph.microsoft.com/v1.0 part of nextlink
} else {
// There are no more pages.
return null;
}
// get the next page, page size is already set in the next link
$response = $graph->createCollectionRequest("GET", $url)
->execute();
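Putting it together, a minimal paging sketch could look like the following. It assumes the getNextLink() behaviour shown above and that getBody() on the SDK's response returns the decoded JSON with the records under 'value'; the str_replace() is just one way to handle the TODO of stripping the service root:
$url = "/users";

// First page: set an explicit page size
$response = $graph->createCollectionRequest("GET", $url)
    ->setPageSize(50)
    ->execute();

while (true) {
    // Process just this page instead of merging everything into one array
    foreach ($response->getBody()['value'] as $user) {
        // ... handle $user ...
    }

    $nextLink = $response->getNextLink();
    if (!$nextLink) {
        break; // no more pages
    }

    // The SDK expects a path relative to the service root, so strip the prefix
    $url = str_replace("https://graph.microsoft.com/v1.0", "", $nextLink);

    // The page size is already encoded in the nextLink
    $response = $graph->createCollectionRequest("GET", $url)->execute();
}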

Related

Pagination not getting previous cursor google app engine in php

I have implemented pagination in Google App Engine
with this code
https://github.com/GoogleCloudPlatform/php-docs-samples/blob/master/datastore/api/src/functions/concepts.php
function cursor_paging(DatastoreClient $datastore, $pageSize, $pageCursor = '')
{
    $query = $datastore->query()
        ->kind('Task')
        ->limit($pageSize)
        ->start($pageCursor);
    $result = $datastore->runQuery($query);
    $nextPageCursor = '';
    $entities = [];
    /** @var Entity $entity */
    foreach ($result as $entity) {
        $nextPageCursor = $entity->cursor();
        $entities[] = $entity;
    }
    return array(
        'nextPageCursor' => $nextPageCursor,
        'entities' => $entities
    );
}
I am getting the next cursor, but I cannot get the previous cursor from this.
Indeed, dealing with the previous cursor is something of a problem; even in other languages it seems to be the harder part of paginating data, and there aren't many articles on the internet about how to achieve it.
I will try to explain, as best I can, the possibilities I believe are available to you.
As per this other question from the Community here, you can create a new cursor with the information from the previous page, so you can use it for your pagination. This means that you would have three cursors: $pageCursor, $nextPageCursor and $previousPageCursor. Used like this, you should be able to keep the cursor of the current page before setting the one for the next page (a sketch follows below).
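For illustration, here is a minimal sketch of that three-cursor idea, building on the cursor_paging() function above. The extra $previousPageCursor parameter and the way the caller passes it back are assumptions about how you would wire this into your own page handling, not part of the Datastore API:
function cursor_paging_with_prev(DatastoreClient $datastore, $pageSize, $pageCursor = '', $previousPageCursor = '')
{
    $query = $datastore->query()
        ->kind('Task')
        ->limit($pageSize)
        ->start($pageCursor);
    $result = $datastore->runQuery($query);

    $nextPageCursor = '';
    $entities = [];
    /** @var Entity $entity */
    foreach ($result as $entity) {
        $nextPageCursor = $entity->cursor();
        $entities[] = $entity;
    }

    return array(
        // Cursor the caller used to reach the page before this one
        'previousPageCursor' => $previousPageCursor,
        // Cursor this page was started from; hand it back as previousPageCursor on the next request
        'currentPageCursor' => $pageCursor,
        'nextPageCursor' => $nextPageCursor,
        'entities' => $entities
    );
}

// Usage sketch: when rendering the "next" link, pass the current cursor along as "prev"
// so the previous page can be replayed later, e.g.
// $page = cursor_paging_with_prev($datastore, 10, $_GET['cursor'] ?? '', $_GET['prev'] ?? '');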
Besides that, PHP can reverse an array (for example with array_reverse()), which might help in case you want to count how many pages your application will have and walk them backwards for the previous cursor. That way you would have the inverted order to use in your pagination as well, although I still believe the first option is the better one to try.
I have found some other useful questions from the Community that might help you achieve your goal, and which I believe are worth taking a look at as well:
How do appengine cursors work?
App Engine datastore paging - previous page
Let me know if the information helped you!

Guzzle Async process response as it comes in

I've been working on a script that makes close to a thousand async requests using getAsync and Promise\settle. Each requested page is then parsed using the Symfony DomCrawler filter method (also slow, but a separate issue).
My code looks something like this:
$requestArray = [];
$request = new Client(['base_uri' => $url]);

foreach ($thousandItemArray as $item) {
    $requestArray[] = $request->getAsync(null, $query);
}

$results = Promise\settle($requestArray)->wait(true);

foreach ($results as $item) {
    $item->crawl();
}
Is there a way I can crawl the requested pages as they come in, rather than waiting for them all and then crawling? Am I right in thinking this would speed things up, if possible?
Thanks for your help in advance.
You can. getAsync() returns a promise, so you can assign an action to it using ->then().
$promisesList[] = $request->getAsync(/* ... */)->then(
    function (Response $resp) {
        // Do whatever you want right after the response is available.
    }
);

$results = Promise\settle($promisesList)->wait(true);
P.S.
Probably you want to limit the concurrency level to some number of requests (rather than starting all the requests at once). If so, use the each_limit() function instead of settle(). And vote for my PR to be able to use settle_limit() ;)
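For example, a minimal sketch of that idea with each_limit(); the base URI, the path built from each item and the concurrency of 25 are placeholders:
use GuzzleHttp\Client;
use GuzzleHttp\Promise;
use Psr\Http\Message\ResponseInterface;

$client = new Client(['base_uri' => 'https://example.com']);

// A generator, so each request is only created when each_limit() is ready to send it
$promises = (function () use ($client, $thousandItemArray) {
    foreach ($thousandItemArray as $item) {
        yield $client->getAsync('/page/' . $item)->then(
            function (ResponseInterface $resp) {
                // Crawl/parse this page as soon as it arrives
            }
        );
    }
})();

// Keep at most 25 requests in flight at a time
Promise\each_limit($promises, 25)->wait();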

Magento2 Swagger generated php client broken

I've generated a PHP client library for Magento2 using swagger-codegen. I'm able to connect to Magento and am just trying some methods to see how usable the generated client is. It seems like I'm either missing something, or maybe the Swagger spec published by Magento is not quite there yet.
In particular, invoking the various list operations seems limited by design and broken in practice in the generated Swagger client. Take for example the operation to list products, /V1/products. Swagger UI indicates this can be parameterized with GET parameters (and in fact it seems you must; when I call it with no parameters, Magento returns an HTTP 400). Here's the sample code from the Markdown documentation that Swagger generated along with the client library:
try {
    $result = $api_instance->catalogProductRepositoryV1GetListGet(
        $search_criteria_filter_groups_filters_field,
        $search_criteria_filter_groups_filters_value,
        $search_criteria_filter_groups_filters_condition_type,
        $search_criteria_sort_orders_field,
        $search_criteria_sort_orders_direction,
        $search_criteria_page_size,
        $search_criteria_current_page
    );
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling CatalogProductRepositoryVApi->catalogProductRepositoryV1GetListGet: ',
        $e->getMessage(), "\n";
}
The first thing I notice is that these parameters only allow a single entry for each field, when the API actually allows you to define multiple filter_groups, multiple filters per filter_group etc. This great blog post helped me understand how the API is supposed to work.
Stepping back, though, and supposing a limit of one filter_group with one filter is acceptable, I took the generated client on faith and tried to put together a simple call:
// Fetch all products with a contrived like query
$oMageClient = new Swagger\Client\Api\CatalogProductRepositoryVApi($oApiClient);
$result = $oMageClient->catalogProductRepositoryV1GetListGet('name', '%', 'like');
Magento complains with an HTTP 400, and it's because of the generated client's request params:
searchCriteria[filterGroups][][filters][][field]=name&searchCriteria[filterGroups][][filters][][value]=%&searchCriteria[filterGroups][][filters][][conditionType]=like
What it's done is break the parameters up into different filter_groups... Sure enough, when I look at the generated Swagger\Client\Api\CatalogProductRepositoryVApi::catalogProductRepositoryV1GetListGetWithHttpInfo method, I find the culprit where the query params are set. By changing
// query params
if ($search_criteria_filter_groups_filters_field !== null) {
    $queryParams['searchCriteria[filterGroups][][filters][][field]'] = $this->apiClient->getSerializer()->toQueryValue($search_criteria_filter_groups_filters_field);
}
// query params
if ($search_criteria_filter_groups_filters_value !== null) {
    $queryParams['searchCriteria[filterGroups][][filters][][value]'] = $this->apiClient->getSerializer()->toQueryValue($search_criteria_filter_groups_filters_value);
}
// query params
if ($search_criteria_filter_groups_filters_condition_type !== null) {
    $queryParams['searchCriteria[filterGroups][][filters][][conditionType]'] = $this->apiClient->getSerializer()->toQueryValue($search_criteria_filter_groups_filters_condition_type);
}
to
// query params
if ($search_criteria_filter_groups_filters_field !== null) {
    $queryParams['searchCriteria[filterGroups][0][filters][0][field]'] = $this->apiClient->getSerializer()->toQueryValue($search_criteria_filter_groups_filters_field);
}
// query params
if ($search_criteria_filter_groups_filters_value !== null) {
    $queryParams['searchCriteria[filterGroups][0][filters][0][value]'] = $this->apiClient->getSerializer()->toQueryValue($search_criteria_filter_groups_filters_value);
}
// query params
if ($search_criteria_filter_groups_filters_condition_type !== null) {
    $queryParams['searchCriteria[filterGroups][0][filters][0][conditionType]'] = $this->apiClient->getSerializer()->toQueryValue($search_criteria_filter_groups_filters_condition_type);
}
I'm able to get a response back from Magento. So I have a couple of questions.
So is there an issue with the JSON Magento is publishing causing the generated Swagger code to be buggy? Or is there some other step I've messed up in generating the client?
It feels like something's not right, because if you look at the blog article and the generated Swagger documentation, Swagger is suggesting the filter_groups parameter is a string, when really it should be an array of objects.
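For reference, here is a minimal sketch of the query string Magento's REST API expects, built with plain PHP and bypassing the generated client entirely; the field/value/conditionType are the contrived example from the question, and the host name is a placeholder:
// searchCriteria with explicit filterGroups/filters indexes, which is what Magento expects
$searchCriteria = [
    'searchCriteria' => [
        'filterGroups' => [
            0 => [
                'filters' => [
                    0 => [
                        'field' => 'name',
                        'value' => '%',
                        'conditionType' => 'like',
                    ],
                ],
            ],
        ],
        'pageSize' => 50,
        'currentPage' => 1,
    ],
];

// Produces searchCriteria[filterGroups][0][filters][0][field]=name&... (URL-encoded)
$queryString = http_build_query($searchCriteria);
$url = 'https://magento.example.com/rest/V1/products?' . $queryString;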

Sideload API Calls with PHP

Is there a way to sideload API calls (load multiple API calls at the same time) to lessen the impact on API call limits, using PHP?
For example, we're using the EchoNest API to gather information on musicians. When the artist page on our site is accessed, we run multiple functions which each call a different API method that returns the specific data that we need. Everything works and looks awesome!
Here are a few (abbreviated) methods that we're calling that each count against our call limit:
function artistPageNews() {
    $artist_name = $_GET['artistname'];
    $results = iTunes::search($artist_name, array(
        'entity' => 'musicVideo'
    ))->results;
    $echonest_api_key = "OUR_API_KEY";

    // News Method
    $echonest_news = 'http://developer.echonest.com/api/v4/artist/news?api_key='.$echonest_api_key.'&name='.str_replace(" ", "+", $artist_name).'&format=json&results=2&start=0';
    $echonest_news_json = file_get_contents($echonest_news);
    $news_json = json_decode($echonest_news_json);
    $news_entry = $news_json->response->news;

    foreach ($news_entry as $news) {
        // Do Magic Stuff Here...
    }
}

function artistPageVideos() {
    $artist_name = $_GET['artistname'];
    $results = iTunes::search($artist_name, array(
        'entity' => 'musicVideo'
    ))->results;
    $echonest_api_key = "OUR_API_KEY";

    // Videos Method
    $echonest_videos = 'http://developer.echonest.com/api/v4/artist/video?api_key='.$echonest_api_key.'&name='.str_replace(" ", "+", $artist_name).'&format=json&results=6&start=0';
    $echonest_videos_json = file_get_contents($echonest_videos);
    $videos_json = json_decode($echonest_videos_json);
    $videos_entry = $videos_json->response->video;

    foreach ($videos_entry as $video) {
        // Do More Magic Stuff Here...
    }
}
We have maybe about 7 (or more) of these methods that are called on each Artist page load. Obviously this can mean trouble when lots of people are viewing the artist pages every hour.
I understand that there's a way to store the more static information in a database and use that instead of calling the API methods on every request, and I am currently exploring that option. But I also read here that there may be a way to 'sideload' the API calls so that you can make multiple requests at one time. In that example, they're using cURL on the command line; I'm trying to do this with PHP.
curl https://{subdomain}.zendesk.com/api/v2/help_center/fr/articles.json?include=users \
-v -u {email_address}:{password}
Can anyone help me get started with this or perhaps recommend a better way to do this, such as storing this information into a database or table and pulling from that instead of calling the API every time?
Thanks in advance.
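One way to fire several of those EchoNest requests in parallel from PHP is the curl_multi family of functions. A minimal sketch, reusing the two endpoint URLs from the functions above (error handling omitted):
$artist_name = urlencode($_GET['artistname']);
$echonest_api_key = "OUR_API_KEY";

// The two EchoNest endpoints from the question, requested in parallel
$urls = [
    'news'   => "http://developer.echonest.com/api/v4/artist/news?api_key={$echonest_api_key}&name={$artist_name}&format=json&results=2&start=0",
    'videos' => "http://developer.echonest.com/api/v4/artist/video?api_key={$echonest_api_key}&name={$artist_name}&format=json&results=6&start=0",
];

$multi = curl_multi_init();
$handles = [];

foreach ($urls as $key => $url) {
    $handles[$key] = curl_init($url);
    curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($multi, $handles[$key]);
}

// Run all requests concurrently
do {
    curl_multi_exec($multi, $running);
    curl_multi_select($multi);
} while ($running > 0);

$responses = [];
foreach ($handles as $key => $handle) {
    $responses[$key] = json_decode(curl_multi_getcontent($handle));
    curl_multi_remove_handle($multi, $handle);
    curl_close($handle);
}
curl_multi_close($multi);

// $responses['news']->response->news and $responses['videos']->response->video
// can then be looped over just as in the original functions.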

IP location using ipinfo.io

I need to get the state and country from the visitor IP. I will be using the country info to showcase custom made products. As for the state info it will not be used for the same purpose but only for record keeping to track the demand.
I have found on this site an instance of using the ipinfo.io API with this example code:
function ip_details($ip) {
    $json = file_get_contents("http://ipinfo.io/{$ip}/json");
    $details = json_decode($json);
    return $details;
}
However, since I do not need the full details, I see that the site does allow grabbing single fields. So I am considering using these two:
1) ipinfo.io/{ip}/region
2) ipinfo.io/{ip}/country
like so:
function ip_details($ip) {
    $ip_state = file_get_contents("http://ipinfo.io/{$ip}/region");
    $ip_country = file_get_contents("http://ipinfo.io/{$ip}/country");
    return $ip_state . $ip_country;
}
OR would I be better off going with:
function ip_details($ip) {
    $json = file_get_contents("http://ipinfo.io/{$ip}/geo");
    $details = json_decode($json);
    return $details;
}
The last one has "/geo" in the URL to slim down the response compared to the first one with "/json". Currently I am leaning towards the option with two file_get_contents calls, but I wanted to know whether it is slower than the last one, which returns everything in one response. I just want to minimize the load time. Any other method would also be much appreciated.
In short, go for the single-request "/geo" option (file_get_contents makes a GET request when passed a URL).
The result decodes to a simple associative array; access the details you want via their keys:
function ip_details($ip) {
    // Pass true so json_decode() returns an associative array
    $json = file_get_contents("http://ipinfo.io/{$ip}/geo");
    return json_decode($json, true);
}

$ipinfo = ip_details('86.178.xxx.xxx');
echo $ipinfo['country']; // GB
// etc.
Regarding the speed difference: 99% of the overhead is network latency, so making ONE request and parsing out the details you need will be much faster than making two separate requests for individual details.
