So I'm using Guzzle 6 to make an indeterminate number of concurrent API calls. One of the things I want to do is keep track of which array value the promise is currently processing, since I originally build the API calls from a database query result. Afterwards I want to update the value back into the database with whatever I get back from the API.
use GuzzleHttp\Pool;
use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;

$client = new Client();

$requests = function () use ($database_result) {
    $uri = 'http://127.0.0.1:8126/guzzle-server/perf';
    foreach ($database_result as $res) {
        // each $res contains ['id' => 'db id', 'query' => 'get query array']
        $url = $uri . '?' . http_build_query($res['query']);
        yield new Request('GET', $url);
    }
};
$pool = new Pool($client, $requests(), [
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) {
        /**
         * HERE I want to be able to somehow retrieve
         * the current response's db id, so I can
         * update whatever I want on the db side
         */
    },
    'rejected' => function ($reason, $index) {
        /**
         * Same here: I want the db id for this
         * request so I can update the db on
         * failure as well
         */
    },
]);

// Initiate the transfers and create a promise
$promise = $pool->promise();

// Force the pool of requests to complete.
$promise->wait();
...
Any help with this would be amazing. I'd like advice on how best to approach this situation; I would prefer to do it in a smart, logical manner. Thank you for your help.
So I figured this out. Basically:
$requests = function () use ($database_result) {
    $uri = 'http://127.0.0.1:8126/guzzle-server/perf';
    foreach ($database_result as $key => $res) {
        // the res array was updated to be ['id' => 'get query array']
        $url = $uri . '?' . http_build_query($res);
        // here is the key difference: yield with an explicit key
        yield $key => new Request('GET', $url);
    }
};
Now the $index passed to the fulfilled and rejected callbacks in the Pool will be the key you yielded, i.e. exactly the index you want.
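To make that concrete, here is a minimal sketch (assuming $database_result is now keyed by db id, as above) of how the callbacks can map $index back to the right row:

$pool = new Pool($client, $requests(), [
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) {
        // $index is the db id we yielded as the key, so the row to
        // update is known; e.g. with whatever DB layer you use:
        // update_row($index, (string) $response->getBody());
    },
    'rejected' => function ($reason, $index) {
        // same $index here, so failures can be recorded against the right row
    },
]);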
Hope this helps.
Reference: https://github.com/guzzle/guzzle/pull/1203
I have a problem using GuzzleHttp with multiple requests. I want to get the path of a URL after setting up the request but before sending it. Here's my code:
include "../../vendor/autoload.php";
use GuzzleHttp\Client;
/* Initiate Guzzle Client */
$client = new Client([
"verify" => false, // disable ssl certificate verification
"timeout" => 30, // maximum timeout for requests
"http_errors" => false, // disable exceptions
]);
$requests = [];
$requests["a"] = $client->requestAsync('GET', "https://www.aaa.de/aaa.html");
$requests["b"] = $client->requestAsync('GET', "https://www.bbb.de/bbb.html");
$requests["c"] = $client->requestAsync('GET', "https://www.ccc.de/ccc.html");
$content = performMultiRequest($requests);
function performMultiRequest($requests)
{
foreach ($requests as $key => $object) {
print_r($object);
exit;
}
/**
* here comes more to send the requests, but that doesn't care for this problem
*/
}
In this case I get a GuzzleHttp\Promise\Promise object. My goal is to get only the path /aaa.html from $object. It has to happen inside the function performMultiRequest(). There's no way to read and parse the URL earlier, e.g. at the point where requestAsync() is called.
The relevant part of $object I need is the request URI, e.g. the path /aaa.html. I tried SO, the Guzzle documentation, Google, trial & error... found nothing. Any ideas?
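One possible way to attack this (a sketch, not from the original thread): a Guzzle promise doesn't expose its request, but you can attach a middleware that sees every request just before it is sent and record the path there. Middleware::mapRequest is the standard hook for this:

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;

$paths = []; // collects the path of every outgoing request

$stack = HandlerStack::create();
$stack->push(Middleware::mapRequest(function (RequestInterface $request) use (&$paths) {
    // runs before the request is sent; "/aaa.html" etc. ends up in $paths
    $paths[] = $request->getUri()->getPath();
    return $request;
}));

$client = new Client(['handler' => $stack, 'verify' => false]);

Alternatively, since $requests is built by hand anyway, you could simply store each URL next to its promise, e.g. $requests["a"] = ['path' => '/aaa.html', 'promise' => $client->requestAsync(...)], and read it back inside performMultiRequest().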
I have this code with a while loop to work around the API's limit of 1000 records per request: it merges the arrays from all the responses into one array and passes it to the view. But the waiting time is far too long. Is there a better way to do this and speed up the process? This is my code:
public function guzzleGet()
{
    $aData = [];
    $sCursor = null;

    while ($aResponse = $this->guzzleGetData($sCursor)) {
        if (empty($aResponse['data'])) {
            break;
        }

        $aData = array_merge($aData, $aResponse['data']);

        if (empty($aResponse['meta']['next_cursor'])) {
            break;
        }

        $sCursor = $aResponse['meta']['next_cursor'];
    }

    $user = Auth::user()->name;

    return view("".$user."/home")->with(['data' => json_encode($aData)]);
}
protected function guzzleGetData($sCursor = null)
{
    $client = new \GuzzleHttp\Client();
    $token = 'token';

    $response = $client->request('GET', 'https://data.beneath.dev/v1/user/project/table', [
        'headers' => [
            'Authorization' => 'Bearer ' . $token,
        ],
        'query' => [
            'limit' => 1000,
            'cursor' => $sCursor,
        ],
    ]);

    if ($response->getBody()) {
        return json_decode($response->getBody(), true) ?: [];
    }

    return [];
}
You would have to debug where the bottleneck is. If it's your network/bandwidth, there is not much you can do about it. It could also be that the API is limiting your download speed.
Another bottleneck could be the download speed of the client. Since you are building up a big array, when the server sends it to the client the client has to download all of it, and that takes time.
You could potentially gain a little speed by reusing the same curl handle, or the same Guzzle client. Another area you could improve is the array_merge; you can replace it with your own custom logic, as explained here: https://stackoverflow.com/a/23348715/8485567.
If you have control of the external API, make sure to use gzip and HTTP/2, or possibly even gRPC instead of HTTP.
However, I would recommend you do this on the client side using JS; that way you avoid the additional bandwidth it takes for the client to download the merged result from your server. You could use the same limit-1000 approach, or even stream the response as it comes in and render it progressively.
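As a minimal sketch of the client-reuse and single-merge points (assuming the same endpoint and Laravel controller as in the question; $this->token stands in for wherever you keep the token, and guzzleGetData() would call getClient() instead of constructing a new client each time):

protected $client;

protected function getClient()
{
    // create the Guzzle client once and reuse its connection pool
    if (!$this->client) {
        $this->client = new \GuzzleHttp\Client([
            'base_uri' => 'https://data.beneath.dev',
            'headers'  => ['Authorization' => 'Bearer ' . $this->token],
        ]);
    }
    return $this->client;
}

public function guzzleGet()
{
    $aPages = [];
    $sCursor = null;

    while ($aResponse = $this->guzzleGetData($sCursor)) {
        if (empty($aResponse['data'])) {
            break;
        }
        // collect the pages and merge once at the end; array_merge in a
        // loop re-copies the accumulated array on every iteration
        $aPages[] = $aResponse['data'];
        $sCursor = $aResponse['meta']['next_cursor'] ?? null;
        if (!$sCursor) {
            break;
        }
    }

    $aData = $aPages ? array_merge(...$aPages) : [];
    // ... pass $aData to the view as before ...
}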
I'm playing around with the GuzzleHttp client, GuzzleCacheMiddleware and Memcached.
The setup calls the same URL with different parameters.
This results in just one Memcached hit, so I think the Memcached key is created from the URL and the URL alone.
Can I somehow change this behaviour so that the key includes an md5 of the parameters?
You would have to create your own cache strategy class. For example, you can extend the PrivateCacheStrategy class and override the getCacheKey method, which is responsible for creating the cache key:
https://github.com/Kevinrob/guzzle-cache-middleware/blob/master/src/Strategy/PrivateCacheStrategy.php#L123
You are right that it creates the storage key based only on the URL and request method.
I decided to look into it. You are right; it needs GreedyCacheStrategy, because that strategy caches everything regardless of the usual RFC caching rules. Here is a custom class for creating the cache key:
use Kevinrob\GuzzleCache\KeyValueHttpHeader;
use Kevinrob\GuzzleCache\Strategy\GreedyCacheStrategy;
use Psr\Http\Message\RequestInterface;

class ParamsGreedyCacheStrategy extends GreedyCacheStrategy
{
    /**
     * Ignoring any headers, just straight up cache key based on
     * method, URI, request body/params.
     *
     * @param RequestInterface $request
     * @param KeyValueHttpHeader|null $varyHeaders
     * @return string
     */
    protected function getCacheKey(RequestInterface $request, KeyValueHttpHeader $varyHeaders = null)
    {
        return hash(
            'sha256',
            'greedy' . $request->getMethod() . $request->getUri() . $request->getBody()
        );
    }
}
Creating the requests. I used Laravel caching here; you can use Memcached instead. I also allow the POST HTTP method to be cached, because by default only GET is cached!
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use Illuminate\Support\Facades\Cache;
use Kevinrob\GuzzleCache\CacheMiddleware;
use Kevinrob\GuzzleCache\Storage\LaravelCacheStorage;

$handlerStack = HandlerStack::create();

$cacheMiddleware = new CacheMiddleware(
    new ParamsGreedyCacheStrategy(
        new LaravelCacheStorage(
            Cache::store('file')
        ),
        10 // TTL in seconds
    )
);

// Not documented, but if you look at the source code there are methods for
// setting the allowed HTTP methods. By default only GET is allowed (per standards).
$cacheMiddleware->setHttpMethods(['GET' => true, 'POST' => true]);

$handlerStack->push($cacheMiddleware, 'cache');

$client = new Client([
    'base_uri'    => 'https://example.org',
    'http_errors' => false,
    'handler'     => $handlerStack,
]);

for ($i = 0; $i < 4; $i++) {
    $response = $client->post('/test', [
        'form_params' => ['val' => $i],
    ]);
    // The middleware attaches an 'X-Kevinrob-Cache' header that lets us
    // know whether we hit the cache or not!
    dump($response->getHeader('X-Kevinrob-Cache'));
}
I don't know if these are the right terms to use...
I made an API in which the response is sent via the die() function, to avoid some further useless calculations and/or function calls. Example:
if (isset($authorize->refusalReason)) {
    die($this->api_return(true, [
        'resultCode' => $authorize->resultCode,
        'reason'     => $authorize->refusalReason,
    ]));
}

// api_return method:
protected function api_return($error, $params = [])
{
    $time = (new DateTime())->format('Y-m-d H:i:s');
    $params = (array) $params;
    $params = ['error' => $error, 'date_time' => $time] + $params;

    return Response::json($params)->sendHeaders()->getContent();
}
But my website is built on this API, so I made a function that creates a Request and returns its contents, based on a URI, method, params, and headers:
protected function get_route_contents($uri, $type, $params = [], $headers = [])
{
    $request = Request::create($uri, $type, $params);

    if (Auth::user()->check()) {
        $request->headers->set('S-token', Auth::user()->get()->Key);
    }

    foreach ($headers as $key => $header) {
        $request->headers->set($key, $header);
    }

    // merge the inputs into the new request
    $originalInput = Request::input();
    Request::replace($request->input());
    $response = Route::dispatch($request);
    Request::replace($originalInput);

    $response = json_decode($response->getContent());

    // This header cancels the one set in api_return; sendHeaders() sets
    // Content-Type: application/json
    header('Content-Type: text/html');

    return $response;
}
But now, when I try to call an API function this way, the die() in the API endpoint also kills my current request.
public function postCard($token)
{
    $auth = $this->get_route_contents("/api/v2/booking/payment/card/authorize/$token", 'POST', Input::all());

    // the code below is never executed, since the API request uses die()
    if ($auth->error === false) {
        return Redirect::route('appts')->with(['success' => trans('messages.booked_ok')]);
    }

    return Redirect::back()->with(['error' => $auth->reason]);
}
Do you know if I can handle this better? Any suggestions for how I should restructure my code?
I know I could just use return statements, but I was always wondering whether there were other solutions. I mean, I want to get better, so I wouldn't ask this question if I knew for sure that returning is the only way of doing what I want.
So it seems that you are calling an API endpoint from your own code as if the call were coming from the browser (client), and I am assuming that your Route::dispatch is not making any external request (like curl etc.).
There are various approaches to handle this:
1. If your function get_route_contents is going to handle all the requests, remove the die() from your endpoints and simply make them return the data (instead of echoing it). Your "handler" will take care of the response. A sketch of this follows below.
2. Give your endpoint functions an optional parameter (or some property set on the $request object) which tells the function that this is an internal request and the data should be returned; when the request comes directly from a browser (client) you can echo it.
3. Make an external call to your own code using curl etc. (only do this if there is no other option).
I'm using Guzzle (http://guzzlephp.org) to GET a large number of URLs (~300k). The URLs are retrieved from an Elasticsearch instance, and I would like to keep adding URLs to a Pool so the Pool stays rather small, instead of adding them all at once.
Is this possible? I looked at Pool.php but did not find a way to do this. Is there a way?
Use a while loop and a generator (yield):
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client();

$uris = ['http://base_url'];
$visited_uris = []; // maybe a database instead of an array

$requests = function () use (&$uris) {
    while (count($uris) > 0) {
        yield new Request('GET', array_pop($uris));
    }
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) use (&$uris, &$visited_uris) {
        $new_uri = get_new_uri(); // implement this to fetch the next URI, e.g. from Elasticsearch
        if (!in_array($new_uri, $visited_uris)) {
            array_push($uris, $new_uri);
        }
        array_push($visited_uris, $new_uri);
    },
]);

$promise = $pool->promise();
$promise->wait();
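Note that the generator is lazy: the Pool only pulls the next Request when a concurrency slot frees up, so the fulfilled callback has a chance to push fresh URIs onto $uris before the generator evaluates count($uris) again. That is what keeps the pool small but continuously fed. One caveat: if $uris ever runs empty while requests are still in flight, the while loop terminates and the pool will not pick up URIs added after that point.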