I am using Guzzle promises to send concurrent requests, but I want to control the concurrency, which is why I want to use a Guzzle pool. How can I transform Guzzle promises into a Guzzle pool? Here is my code:
public function getDispenceryforAllPage($dispencery)
{
$GetAllproducts = [];
$promiseGetPagination = $this->client->getAsync($dispencery)
->then(function ($response) {
return $this->getPaginationNumber($response->getBody()->getContents());
});
$Pagination = $promiseGetPagination->wait();
$pagearray = array();
for($i=1;$i<=$Pagination; $i++){
$pagearray[] = $i;
}
foreach($pagearray as $page_no) {
$GetAllproducts[] = $this->client->getAsync($dispencery.'?page='.$page_no)
->then(function ($response) {
$promise = $this->getData($response->getBody()->getContents());
return $promise;
});
}
$results = GuzzleHttp\Promise\settle($GetAllproducts)->wait();
return $results;
}
I have the working example below for Guzzle 6.
It uses postAsync and Pool.
function postInBulk($inputs)
{
$client = new Client([
'base_uri' => 'https://a.b.com'
]);
$headers = [
'Authorization' => 'Bearer token_from_directus_user'
];
$requests = function ($a) use ($client, $headers) {
for ($i = 0; $i < count($a); $i++) {
yield function() use ($client, $headers) {
return $client->postAsync('https://a.com/project/items/collection', [
'headers' => $headers,
'json' => [
"snippet" => "snippet",
"rank" => "1",
"status" => "published"
]
]);
};
}
};
$pool = new Pool($client, $requests($inputs),[
'concurrency' => 5,
'fulfilled' => function (Response $response, $index) {
// this is delivered each successful response
},
'rejected' => function (RequestException $reason, $index) {
// this is delivered each failed request
},
]);
$pool->promise()->wait();
}
Just use each_limit() or each_limit_all() (instead of settle()) with a generator. The promise they return resolves to null rather than to the list of results, so collect the results inside the then() callback.
function getDispenceryforAllPage($dispencery)
{
    $promiseGetPagination = $this->client->getAsync($dispencery)
        ->then(function ($response) {
            return $this->getPaginationNumber($response->getBody()->getContents());
        });
    $Pagination = $promiseGetPagination->wait();
    $pagearray = range(1, $Pagination);
    $results = [];
    $requestGenerator = function () use ($dispencery, $pagearray, &$results) {
        foreach ($pagearray as $page_no) {
            yield $this->client->getAsync($dispencery . '?page=' . $page_no)
                ->then(function ($response) use (&$results, $page_no) {
                    // Key by page number, because responses can arrive out of order.
                    $results[$page_no] = $this->getData($response->getBody()->getContents());
                });
        }
    };
    // Max 5 concurrent requests.
    // In guzzlehttp/promises 2.x use \GuzzleHttp\Promise\Each::ofLimitAll() instead.
    \GuzzleHttp\Promise\each_limit_all($requestGenerator(), 5)->wait();
    return $results;
}
I have modified your code to support pool.
class GuzzleTest
{
    private $client;

    public function __construct($baseUrl)
    {
        $this->client = new \GuzzleHttp\Client([
            // Base URI is used with relative requests
            'base_uri' => $baseUrl,
            // You can set any number of default request options.
            'timeout' => 2.0,
        ]);
    }

    public function getDispenceryforAllPage($dispencery)
    {
        $GetAllproducts = [];
        $promiseGetPagination = $this->client->getAsync($dispencery)
            ->then(function ($response) {
                return $this->getPaginationNumber($response->getBody()->getContents());
            });
        $Pagination = $promiseGetPagination->wait();
        $pagearray = array();
        for ($i = 1; $i <= $Pagination; $i++) {
            $pagearray[] = $i;
        }
        $pool = new \GuzzleHttp\Pool($this->client, $this->_yieldRequest($pagearray, $dispencery), [
            'concurrency' => 5,
            'fulfilled' => function ($response, $index) use (&$GetAllproducts) {
                // this is delivered each successful response
                $GetAllproducts[$index] = $this->getData($response->getBody()->getContents());
            },
            'rejected' => function ($reason, $index) {
                // this is delivered each failed request
            },
        ]);
        // Initiate the transfers and create a promise
        $poolPromise = $pool->promise();
        // Force the pool of requests to complete. The pool promise resolves to null,
        // so the results are collected in the 'fulfilled' callback above.
        $poolPromise->wait();
        return $GetAllproducts;
    }

    private function _yieldRequest($pagearray, $dispencery)
    {
        foreach ($pagearray as $page_no) {
            $uri = $dispencery . '?page=' . $page_no;
            yield function () use ($uri) {
                return $this->client->getAsync($uri);
            };
        }
    }
}
Related
I am using Symfony 4.4 and PHP 7.4.
I would like to use Guzzle Pool with multipart and I need to add a proxy.
$requests = function ($total) {
$uri = $this->parameters['endpoint'] . self::EMAIL_ENDPOINT;
for ($i = 0; $i < $total; $i++) {
yield new Request(
'POST', $uri,[], new MultipartStream($this->multiPartConfiguration->getMultipart())
);
}
};
$pool = new Pool($this->client, $requests(4185), [
//'concurrency' => 5,
'fulfilled' => function (Response $response, $index) {
$data = json_decode((string)$response->getBody(), true);
dump($data);
},
'rejected' => function (RequestException $reason, $index) {
dump('Reason : '.$reason->getMessage());
},
]);
I don't know where to add the proxy option.
Thanks
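One option, as far as I can tell from the Pool constructor: the pool config accepts an 'options' key, and those request options are passed along with every request the pool sends. The sketch below assumes guzzlehttp/guzzle 7 installed through Composer. The proxy URL and the example.test endpoint are made up, and a MockHandler plus the history middleware stand in for the real transport so that the applied options can be inspected without a network.

```php
<?php
require 'vendor/autoload.php'; // assumes a Composer install of guzzlehttp/guzzle

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Canned responses replace the real endpoint (hypothetical setup).
$history = [];
$stack = HandlerStack::create(new MockHandler([new Response(200), new Response(200)]));
$stack->push(Middleware::history($history)); // records each request with its options
$client = new Client(['handler' => $stack]);

$requests = function ($total) {
    for ($i = 0; $i < $total; $i++) {
        yield new Request('POST', 'http://example.test/email');
    }
};

$pool = new Pool($client, $requests(2), [
    'concurrency' => 5,
    // Request options applied to every request sent by the pool.
    'options' => ['proxy' => 'http://proxy.example.test:8080'], // hypothetical proxy URL
    'fulfilled' => function (Response $response, $index) {
        // handle the response here
    },
]);
$pool->promise()->wait();
```

Setting 'proxy' in the array passed to the Client constructor should work as well, since client defaults are merged into every request.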
I am trying to load many URLs asynchronously, but I can't figure out how to tell which URL was processed. Code below:
$requests = function ($total, $urls) {
for ($i = 0; $i < $total; $i++) {
yield new Request('GET', $urls[$i]);
}
};
$pool = new Pool($client, $requests(50, $urls), [
'concurrency' => 5,
'fulfilled' => function (Response $response, $index) {
// How can I know which URL was processed?
},
'rejected' => function (RequestException $reason, $index) {
// And here too..
},
]);
$promise = $pool->promise();
$promise->wait();
Please help, any ideas..
I'm using Guzzle in a loop to get an array of promises. This is the line in the loop:
$promises[] = $this->client->getAsync($url, $options);
Next, I call:
$res = collect(Promise\settle($promises)->wait());
One of the items from the result is:
As you can see, it's just an array with a string field and a GuzzleHttp\Psr7\Response object. So how can I get the requested URL from this construction?
Thank you for any help!
I ran into the same issue.
My uses:
use GuzzleHttp\Client;
use GuzzleHttp\Promise\EachPromise;
use GuzzleHttp\Psr7\Response;
use Requests;
Extracted function from class:
public function getUrlsInParallelRememberingSource($urls,
$numberThreads = 10)
{
$client = new Client();
$responsesModified = [];
$promises = (function () use ($urls, $client, &$responsesModified)
{
foreach ($urls as $url) {
yield $client->getAsync($url)->then(function($response) use ($url, &$responsesModified)
{
$data = [
'url' => $url,
'body' => 'res' // pass here whatever you want
];
$responsesModified[] = $data;
return $response;
});
}
})();
$eachPromise = new EachPromise($promises,
[
'concurrency' => $numberThreads,
'fulfilled' => function (Response $response)
{
},
'rejected' => function ($reason)
{
}
]);
$eachPromise->promise()->wait();
return $responsesModified;
}
This gives the following result:
http://i.kagda.ru/5001133750442_01-11-2020-00:34:55_5001.png
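Another option, offered as a sketch rather than a definitive answer: Pool hands the iterator key of each request to the 'fulfilled' and 'rejected' callbacks as $index, so the URL can be looked up in the original array. This assumes guzzlehttp/guzzle 7 installed through Composer. The example.test URLs and the MockHandler queue are made up so the snippet runs without a network.

```php
<?php
require 'vendor/autoload.php'; // assumes a Composer install of guzzlehttp/guzzle

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

$urls = ['http://example.test/a', 'http://example.test/b', 'http://example.test/c'];

// Canned responses, handed out in the order the requests are sent (hypothetical data).
$mock = new MockHandler([new Response(200), new Response(404), new Response(200)]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$requests = function (array $urls) {
    foreach ($urls as $i => $url) {
        // The yielded key is what the callbacks receive as $index.
        yield $i => new Request('GET', $url);
    }
};

$done = [];
$failed = [];
$pool = new Pool($client, $requests($urls), [
    'concurrency' => 2,
    'fulfilled' => function (Response $response, $index) use ($urls, &$done) {
        $done[$urls[$index]] = $response->getStatusCode();
    },
    'rejected' => function ($reason, $index) use ($urls, &$failed) {
        // The 404 is rejected because http_errors is enabled by default.
        $failed[] = $urls[$index];
    },
]);
$pool->promise()->wait();
```

The same lookup works with a plain array of promises passed to settle(), because the result array keeps the keys of the input array.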
When I catch too many exceptions, I want to stop sending requests in Guzzle. Does anybody know how I can do that?
Here is my snippet of code:
protected function parseAsyncCustomers($urls)
{
$promises = (function () use ($urls) {
do {
$uri = new Uri(current($urls));
$request = new Request('GET', $uri, ['User-Agent' => UserAgent::random()]);
yield $this->httpClient->sendAsync($request, [
'timeout' => 15,
'connect_timeout' => 15,
]);
} while (next($urls) !== false);
})();
(new \GuzzleHttp\Promise\EachPromise($promises, [
// Multiple Concurrent HTTP Requests
'concurrency' => 10,
'fulfilled' => function (ResponseInterface $response, $index) {
$content = $response->getBody()->getContents();
$this->parseCustomerContent($content, $index);
},
'rejected' => function ($reason, $index) {
// This is delivered each failed request
if ($reason instanceof GuzzleException) {
if ($this->reject++ > 30) {
// how can stop sending next requests?
}
}
},
]))->promise()->wait();
}
The third parameter to the rejected callback is the aggregate promise, which represents the whole EachPromise. You can reject it in your condition, and it will stop the execution flow.
'rejected' => function ($reason, $index, $aggregate) {
    // This is delivered each failed request
    if ($reason instanceof GuzzleException) {
        if ($this->reject++ > 30) {
            $aggregate->reject('Attempts limit exceeded');
        }
    }
},
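A minimal, self-contained sketch of this stop-on-failure pattern, assuming guzzlehttp/guzzle 7 installed through Composer. The MockHandler queue, the example.test URLs and the threshold of 2 are made up for the demonstration. The callback here takes the aggregate promise as its third argument, which is how the guzzlehttp/promises documentation for each() describes it.

```php
<?php
require 'vendor/autoload.php'; // assumes a Composer install of guzzlehttp/guzzle

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Promise\EachPromise;
use GuzzleHttp\Promise\RejectionException;
use GuzzleHttp\Psr7\Response;

// Five canned server errors; with the default http_errors option each one rejects.
$mock = new MockHandler([
    new Response(500), new Response(500), new Response(500),
    new Response(500), new Response(500),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$promises = (function () use ($client) {
    foreach (range(1, 5) as $n) {
        // A request is only created when the generator is advanced.
        yield $client->getAsync('http://example.test/customer/' . $n);
    }
})();

$failures = 0;
$maxFailures = 2; // hypothetical threshold
$each = new EachPromise($promises, [
    'concurrency' => 1, // one at a time, to keep the demonstration deterministic
    'rejected' => function ($reason, $index, $aggregate) use (&$failures, $maxFailures) {
        if (++$failures >= $maxFailures) {
            // Rejecting the aggregate promise stops the iteration.
            $aggregate->reject('Attempts limit exceeded');
        }
    },
]);

$stopped = false;
try {
    $each->promise()->wait();
} catch (RejectionException $e) {
    $stopped = true; // wait() throws because the aggregate was rejected
}
$remaining = $mock->count(); // queued responses that were never requested
```

Because the aggregate is rejected with a plain string, wait() raises a RejectionException, so the caller needs to catch it or call wait(false).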
I am trying to write an HTML parser with the help of Goutte. It works very well. However, Goutte uses blocking requests. This is fine if you are dealing with a single service, but it becomes a problem if I want to query lots of services that are independent of each other. Goutte uses BrowserKit and Guzzle. I have tried to change the doRequest function, but it failed with
Argument 1 passed to
Symfony\Component\BrowserKit\CookieJar::updateFromResponse() must be
an instance of Symfony\Component\BrowserKit\Response
protected function doRequest($request)
{
$headers = array();
foreach ($request->getServer() as $key => $val) {
$key = strtolower(str_replace('_', '-', $key));
$contentHeaders = array('content-length' => true, 'content-md5' => true, 'content-type' => true);
if (0 === strpos($key, 'http-')) {
$headers[substr($key, 5)] = $val;
}
// CONTENT_* are not prefixed with HTTP_
elseif (isset($contentHeaders[$key])) {
$headers[$key] = $val;
}
}
$cookies = CookieJar::fromArray(
$this->getCookieJar()->allRawValues($request->getUri()),
parse_url($request->getUri(), PHP_URL_HOST)
);
$requestOptions = array(
'cookies' => $cookies,
'allow_redirects' => false,
'auth' => $this->auth,
);
if (!in_array($request->getMethod(), array('GET', 'HEAD'))) {
if (null !== $content = $request->getContent()) {
$requestOptions['body'] = $content;
} else {
if ($files = $request->getFiles()) {
$requestOptions['multipart'] = [];
$this->addPostFields($request->getParameters(), $requestOptions['multipart']);
$this->addPostFiles($files, $requestOptions['multipart']);
} else {
$requestOptions['form_params'] = $request->getParameters();
}
}
}
if (!empty($headers)) {
$requestOptions['headers'] = $headers;
}
$method = $request->getMethod();
$uri = $request->getUri();
foreach ($this->headers as $name => $value) {
$requestOptions['headers'][$name] = $value;
}
// Let BrowserKit handle redirects
$promise = $this->getClient()->requestAsync($method,$uri,$requestOptions);
$promise->then(
function (ResponseInterface $response) {
return $this->createResponse($response);
},
function (RequestException $e) {
$response = $e->getResponse();
if (null === $response) {
throw $e;
}
}
);
$promise->wait();
}
How can I change Goutte\Client.php so that it makes requests asynchronously? If that is not possible, how can I run my scrapers, which target different endpoints, simultaneously? Thanks
Goutte is essentially a bridge between Guzzle and Symfony's BrowserKit and DomCrawler.
The biggest drawback of using Goutte is that all requests are made synchronously.
To run requests asynchronously, you will have to forgo Goutte and use Guzzle and DomCrawler directly.
For example:
$requests = [
    new GuzzleHttp\Psr7\Request('GET', $uri[0]),
    new GuzzleHttp\Psr7\Request('GET', $uri[1]),
    new GuzzleHttp\Psr7\Request('GET', $uri[2]),
    new GuzzleHttp\Psr7\Request('GET', $uri[3]),
    new GuzzleHttp\Psr7\Request('GET', $uri[4]),
    new GuzzleHttp\Psr7\Request('GET', $uri[5]),
    new GuzzleHttp\Psr7\Request('GET', $uri[6]),
];
$client = new GuzzleHttp\Client();
$pool = new GuzzleHttp\Pool($client, $requests, [
    'concurrency' => 5, // how many concurrent requests we want active at any given time
    'fulfilled' => function ($response, $index) use ($uri) {
        // $index is the key of the request in $requests, so it maps back to $uri
        $crawler = new Symfony\Component\DomCrawler\Crawler(null, $uri[$index]);
        $crawler->addContent(
            (string) $response->getBody(),
            $response->getHeaderLine('Content-Type')
        );
    },
    'rejected' => function ($reason, $index) {
        // do something if the request failed.
    },
]);
$promise = $pool->promise();
$promise->wait();