I'm using Guzzle (http://guzzlephp.org) to GET a large number of URLs (~300k). The URLs are retrieved from an Elasticsearch instance, and I would like to keep adding URLs to a Pool while it runs, so the Pool stays small instead of receiving them all at once.
Is this possible? I looked at Pool.php but did not find a way to do this. Is there a way?
Use a while loop and a generator (yield).
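use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
$client = new Client();
$uris = ['http://base_url'];
$visited_uris = []; // maybe a database instead of an array
// Capture both lists by reference so the generator sees URIs that the
// 'fulfilled' callback adds later.
$requests = function () use (&$uris, &$visited_uris) {
    while (count($uris) > 0) {
        $uri = array_pop($uris);
        $visited_uris[] = $uri;
        yield new Request('GET', $uri);
    }
};
$pool = new Pool($client, $requests(), [
    // Note: with a concurrency above 1 the Pool advances the generator right
    // after a yield, so if $uris is empty at that moment the generator ends
    // and URIs queued later are never sent. With 'concurrency' => 1 the
    // generator is only advanced after the 'fulfilled' callback has run.
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) use (&$uris, &$visited_uris) {
        $new_uri = get_new_uri(); // implement a function to get the new $uri
        if (!in_array($new_uri, $visited_uris) && !in_array($new_uri, $uris)) {
            $uris[] = $new_uri;
        }
    },
]);
$promise = $pool->promise();
$promise->wait();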
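In the Elasticsearch case from the question the URLs do not depend on the responses, so no feedback loop is needed: a generator that fetches one page of URLs at a time keeps memory bounded, because the Pool pulls items from its iterator on demand, up to 'concurrency' at a time. The sketch below shows only the lazy paging part in plain PHP. `url_stream()` and the `$fetchPage` callback (a stand-in for an Elasticsearch scroll or search_after query) are hypothetical names, and the fake index is made-up data.

```php
<?php
// Yields URLs one by one, asking $fetchPage for the next page only when the
// consumer has used up the current one.
function url_stream(callable $fetchPage, int $pageSize): Generator
{
    $offset = 0;
    while (true) {
        $page = $fetchPage($offset, $pageSize);
        if ($page === []) {
            return; // no more results
        }
        foreach ($page as $url) {
            yield $url;
        }
        $offset += count($page);
    }
}

// Hypothetical stand-in for an Elasticsearch query over 10 fake URLs;
// counts how many pages were requested.
$fakeIndex = array_map(function ($i) { return "http://example.com/doc/$i"; }, range(0, 9));
$pagesFetched = 0;
$fetchPage = function (int $offset, int $size) use ($fakeIndex, &$pagesFetched): array {
    $pagesFetched++;
    return array_slice($fakeIndex, $offset, $size);
};

// Consume only two URLs: just the first page has been fetched.
$firstTwo = [];
foreach (url_stream($fetchPage, 4) as $url) {
    $firstTwo[] = $url;
    if (count($firstTwo) === 2) {
        break;
    }
}
$pagesAfterTwo = $pagesFetched;

// Consume everything: pages of 4, 4 and 2 URLs, then an empty page ends the stream.
$pagesFetched = 0;
$all = iterator_to_array(url_stream($fetchPage, 4), false);
```

To feed a Pool with it, wrap each URL in a request inside another generator, e.g. `foreach (url_stream($fetchPage, 1000) as $url) { yield new Request('GET', $url); }`, and pass that generator to `new Pool($client, ...)`.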
Guzzle provides a mechanism to send concurrent requests: Pool. I used the example from the docs: http://docs.guzzlephp.org/en/stable/quickstart.html#concurrent-requests. It works fine and sends concurrent requests, except for one thing: Guzzle seems to ignore HTTP/2 in this case.
I've prepared a simplified script that sends two requests to https://stackoverflow.com: the first one uses Pool, the second one is a regular Guzzle request. Only the regular request connects via HTTP/2.
<?php
include_once 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
$client = new Client([
    'version' => 2.0,
    'debug' => true
]);
/************************/
$requests = function () {
    yield new Request('GET', 'https://stackoverflow.com');
};
$pool = new Pool($client, $requests());
$promise = $pool->promise();
$promise->wait();
/************************/
$client->get('https://stackoverflow.com', [
    'version' => 2.0,
    'debug' => true,
]);
Here is the output: https://pastebin.com/k0HaDWt6 (I highlighted the important parts with "!!!!!").
Does anybody know why Guzzle does this and how to make Pool work with HTTP/2?
Found what was wrong: the 'version' option passed to new Client() is not applied when the Pool is given requests created with new Request(). Either the protocol version must be set on every request (the fifth argument of the Request constructor), or the requests must be created with $client->getAsync() (or ->postAsync(), etc.).
See the corrected code:
...
$client = new Client([
    'debug' => true
]);
$requests = function () {
    // the fifth Request argument is the protocol version
    yield new Request('GET', 'https://stackoverflow.com', [], null, '2.0');
};
/* OR
$client = new Client([
    'version' => 2.0,
    'debug' => true
]);
$requests = function () use ($client) {
    // Pool also accepts callables that return a promise
    yield function () use ($client) {
        return $client->getAsync('https://stackoverflow.com');
    };
};
*/
$pool = new Pool($client, $requests());
$promise = $pool->promise();
$promise->wait();
...
Is there any way to mock response and request in Guzzle?
I have a class which sends some requests, and I want to test it.
In the Guzzle docs I found how to mock the response and how to check the request separately. But how can I combine them?
If I use the history middleware, Guzzle tries to send a real request.
And vice versa: when I use the mock handler, I can't test the request.
use GuzzleHttp\Exception\BadResponseException;
class MyClass {
    private $client;
    private $response;
    public function __construct($guzzleClient) {
        $this->client = $guzzleClient;
    }
    public function registerUser($name, $lang)
    {
        $body = ['name' => $name, 'lang' => $lang, 'state' => 'online'];
        // Guzzle 6 rejects an array passed as 'body'; 'json' encodes it
        // ('form_params' would send it as a form instead)
        $response = $this->sendRequest('PUT', '/users', ['json' => $body]);
        return $response->getStatusCode() == 201;
    }
    protected function sendRequest($method, $resource, array $options = [])
    {
        try {
            $response = $this->client->request($method, $resource, $options);
        } catch (BadResponseException $e) {
            // thrown for 4xx/5xx responses when http_errors is enabled
            $response = $e->getResponse();
        }
        $this->response = $response;
        return $response;
    }
}
Test:
class MyClassTest extends \PHPUnit\Framework\TestCase {
    //....
    public function testRegisterUser()
    {
        $guzzleMock = new \GuzzleHttp\Handler\MockHandler([
            new \GuzzleHttp\Psr7\Response(201, [], 'user created response'),
        ]);
        $guzzleClient = new \GuzzleHttp\Client(['handler' => $guzzleMock]);
        $myClass = new MyClass($guzzleClient);
        /**
         * But how can I check that the request contains all the fields I put in the body? Or that I added some extra header?
         */
        $this->assertTrue($myClass->registerUser('John Doe', 'en'));
    }
    //...
}
@Alex Blex was very close.
Solution:
$container = [];
$history = \GuzzleHttp\Middleware::history($container);
$guzzleMock = new \GuzzleHttp\Handler\MockHandler([
new \GuzzleHttp\Psr7\Response(201, [], 'user created response'),
]);
$stack = \GuzzleHttp\HandlerStack::create($guzzleMock);
$stack->push($history);
$guzzleClient = new \GuzzleHttp\Client(['handler' => $stack]);
First of all, you don't mock requests: the requests are the real ones you are going to use in production. MockHandler::createWithMiddleware() returns a handler stack, so you can push more middleware onto it:
$container = [];
$history = \GuzzleHttp\Middleware::history($container);
$stack = \GuzzleHttp\Handler\MockHandler::createWithMiddleware([
new \GuzzleHttp\Psr7\Response(201, [], 'user created response'),
]);
$stack->push($history);
$guzzleClient = new \GuzzleHttp\Client(['handler' => $stack]);
After your test runs, $container holds every transaction for you to assert on; in your particular test there is a single one. You are interested in $container[0]['request']: $container[0]['response'] only contains your canned response, so there is nothing to assert there.
So I'm using Guzzle 6 to make an indeterminate number of concurrent API calls. One thing I want to do is keep track of which array value the promise is currently processing, since the API calls are built from a database query result. After that I want to write whatever I get back from the API into the database.
use GuzzleHttp\Pool;
use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;
$client = new Client();
// `use` makes $database_result visible inside the closure
$requests = function () use ($database_result) {
    $uri = 'http://127.0.0.1:8126/guzzle-server/perf';
    foreach ($database_result as $res) {
        /* the res array contains
           ['id' => 'db id', 'query' => 'get query array'];
        */
        $url = $uri . '?' . http_build_query($res['query']);
        yield new Request('GET', $url);
    }
};
$pool = new Pool($client, $requests(), [
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) {
        /**
         * HERE I want to be able to somehow
         * retrieve the current response's db id,
         * so that I can update anything
         * I want on the db side
         */
    },
    'rejected' => function ($reason, $index) {
        /**
         * HERE I want to be able to somehow
         * retrieve the current response's db id,
         * so that I can update anything
         * I want on the db side
         */
    },
]);
// Initiate the transfers and create a promise
$promise = $pool->promise();
// Force the pool of requests to complete.
$promise->wait();
...
Any help with this would be amazing. I want to get advice on how to best approach this situation. I would prefer to do it in a smart, logical manner.
Thank you for your help
So I figured this out.
Basically
$requests = function () use ($database_result) {
    $uri = 'http://127.0.0.1:8126/guzzle-server/perf';
    foreach ($database_result as $key => $res) {
        /* $database_result was changed to be keyed by db id:
           ['id' => 'get query array'];
        */
        $url = $uri . '?' . http_build_query($res);
        // here is the key change: yield the key along with the request
        yield $key => new Request('GET', $url);
    }
};
Now the $index argument passed to the Pool's 'fulfilled' and 'rejected' callbacks will be the key you yielded.
Hope this helps.
Reference: https://github.com/guzzle/guzzle/pull/1203
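The mechanism is plain PHP: `yield $key => $value` sets the iterator key, and the Pool passes that key to the 'fulfilled' and 'rejected' callbacks as $index. A minimal sketch that yields the row id as the key, using made-up rows in the question's `['id' => ..., 'query' => ...]` shape. It yields URL strings instead of Request objects so it runs without Guzzle, and a foreach stands in for the Pool.

```php
<?php
// Made-up rows in the question's ['id' => ..., 'query' => ...] shape.
$database_result = [
    ['id' => 17, 'query' => ['q' => 'foo']],
    ['id' => 42, 'query' => ['q' => 'bar', 'page' => 2]],
];

$requests = function () use ($database_result) {
    $uri = 'http://127.0.0.1:8126/guzzle-server/perf';
    foreach ($database_result as $res) {
        $url = $uri . '?' . http_build_query($res['query']);
        // With Guzzle: yield $res['id'] => new Request('GET', $url);
        yield $res['id'] => $url;
    }
};

// Stand-in for the Pool: the key seen here is what 'fulfilled'/'rejected'
// would receive as $index.
$seen = [];
foreach ($requests() as $index => $url) {
    $seen[$index] = $url;
}
```

This keeps the original row shape, so the database result does not need to be re-keyed by id before building the requests.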
I'm trying to retrieve the total request time in Guzzle 6, right after a simple GET request:
$client = new GuzzleHttp\Client();
$response = $client->get('http://www.google.com/');
But I can't find anything about it in the docs. Any ideas?
Thanks a lot.
Since Guzzle 6.1.0 you can use the 'on_stats' request option to get the transfer time and other transfer statistics.
More information can be found at Request Options - on_stats
https://github.com/guzzle/guzzle/releases/tag/6.1.0
You can use a setter and a getter.
use GuzzleHttp\TransferStats;
// Inside your class:
private $totaltime = 0;
public function getTotaltime() {
    return $this->totaltime;
}
public function setTotaltime($time) {
    $this->totaltime = $time;
}
// Inside a method of the same class:
$reqtime = new self();
$response = $client->post($endpointLogin, [
    'json' => $payload,
    'headers' => $this->header,
    'on_stats' => function (TransferStats $stats) use ($reqtime) {
        //** set it here **//
        $reqtime->setTotaltime($stats->getTransferTime());
    }
]);
dd($reqtime->getTotaltime()); // dd() is Laravel's dump-and-die helper
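A setter/getter pair is not strictly needed: a closure can write into an outer variable captured by reference with `use (&$var)`. In the sketch below the callback takes a plain float so it runs without Guzzle; in real code it would be `function (TransferStats $stats) use (&$transferTime) { $transferTime = $stats->getTransferTime(); }` passed as the 'on_stats' option. `$onStats` and the 0.25 value are illustrative only.

```php
<?php
// Filled in by the callback through the by-reference capture.
$transferTime = null;

// Stand-in for the 'on_stats' callback; the real one receives a
// GuzzleHttp\TransferStats and would call $stats->getTransferTime().
$onStats = function (float $seconds) use (&$transferTime) {
    $transferTime = $seconds;
};

// Guzzle would invoke the callback after the transfer; simulate that here.
$onStats(0.25);
```

Without the `&`, the closure would get its own copy of `$transferTime` and the outer variable would stay `null`.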
A specific example based on @Michael's post:
$client = new GuzzleHttp\Client();
$response = $client->get('http://www.google.com/', [
    'on_stats' => function (\GuzzleHttp\TransferStats $stats) {
        echo $stats->getEffectiveUri() . ' : ' . $stats->getTransferTime();
    }
]);
$client = new GuzzleHttp\Client();
$one = microtime(true);
$response = $client->get('http://www.google.com/');
$two = microtime(true);
echo 'Total Request time: ' . ($two - $one);
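Timing around the call measures the whole call, PHP overhead included, not just the transfer that 'on_stats' reports. If you do time it yourself, `hrtime(true)` (PHP 7.3+) is a monotonic nanosecond clock that is not affected by system clock adjustments, unlike `microtime()`. A small sketch; `timed()` is a hypothetical helper, and the `usleep()` call stands in for `$client->get(...)`.

```php
<?php
// Runs $fn and returns [its result, elapsed seconds], measured with the
// monotonic hrtime() clock (PHP 7.3+).
function timed(callable $fn): array
{
    $start = hrtime(true); // nanoseconds as an int
    $result = $fn();
    $elapsed = (hrtime(true) - $start) / 1e9;
    return [$result, $elapsed];
}

// usleep() stands in for $client->get('http://www.google.com/').
[$value, $seconds] = timed(function () {
    usleep(20000); // 20 ms
    return 'done';
});
```

With Guzzle, the closure would return the response, so `$value` would be the ResponseInterface and `$seconds` the elapsed time.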
I had a similar problem, although with Guzzle 5.3.
See Guzzle 5.3 - Get request duration for asynchronous requests
Maybe listening to an event in Guzzle 6 and retrieving the TransferInfo will do the trick for you too.
This works for synchronous and asynchronous requests alike.
I need to send multiple requests, so I want to implement a batch request.
How can I do it in Guzzle 6?
Using the old way:
$client->send(array(
    $client->get($courses), //api url
    $client->get($job_categories), //api url
));
is giving me the error:
GuzzleHttp\Client::send() must implement interface Psr\Http\Message\RequestInterface, array given
Try something like this:
$client = new Client();
$requests = [];
foreach ($links as $link) {
    $requests[] = new Request('GET', $link);
}
// returns a response or an exception for each request, in the same order
$responses = Pool::batch($client, $requests, [
    'concurrency' => 15,
]);
foreach ($responses as $response) {
    // do something
}
Don't forget:
use GuzzleHttp\Pool;
use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;
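`Pool::batch()` does not throw when a request fails: it returns an array holding either a response or an exception for each request, in the same order as `$requests`. A sketch of separating the two while keeping the indexes; `partition_batch_results()` is a hypothetical helper, and the `stdClass`/`RuntimeException` values stand in for real responses and request exceptions so the example runs without Guzzle.

```php
<?php
// Splits the array returned by Pool::batch() into successes and failures,
// keeping the original request indexes.
function partition_batch_results(array $results): array
{
    $ok = [];
    $failed = [];
    foreach ($results as $i => $result) {
        if ($result instanceof \Throwable) {
            $failed[$i] = $result;
        } else {
            $ok[$i] = $result;
        }
    }
    return [$ok, $failed];
}

// Stand-ins: stdClass for a response, RuntimeException for a failed request.
$results = [new stdClass(), new RuntimeException('timeout'), new stdClass()];
[$ok, $failed] = partition_batch_results($results);
```

Keeping the indexes lets you map each result back to `$links[$i]`.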