How to update object variables from promises closure - php

I have object properties that I want to update from inside Guzzle promise closures:
foreach ($urls as $i => $url) {
$this->facebook[$url] = 0;
$this->googlePlus[$url] = 0;
$this->pinterest[$url] = 0;
$this->twitter[$url] = 0;
$this->metaResults[$url] = [
'url' => false,
'title' => false,
'desc' => false,
'h1' => false,
'word_count' => 0,
'keyword_count' => 0
];
$that = $this;
$promise = $client->getAsync($url)->then(function ($content) {
return $content->getBody()->getContents();
})->then(function($html) use (&$url, &$that) {
$that->metaResults[$url] = $this->parseMeta($html);
});
$promeses['meta'][$url] = $promise;
}
$responses = Promise\Utils::settle($promises)->wait();
The problem, as you can see above, is that the assignment $that->metaResults[$url] = $this->parseMeta($html); is never saved on the object property. Is there a way to do this?

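It seems to me there are a few errors. If you want to use $url and $that inside the callbacks registered with the promise's then() method, pass them in with use (...) by value. Capturing $url by reference (&$url) in a loop means every callback sees whatever $url holds when it finally runs, which is usually the last URL. Note also that the promises are stored in $promeses but settled from $promises. As for $this: since PHP 5.4, closures created inside a class method are bound to $this automatically, so it should be accessible inside the callback, but check it on your PHP version; using $that avoids the question entirely.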
$promise = $client->getAsync($url)
    ->then(function (ResponseInterface $content) {
        return $content->getBody()->getContents();
    })
    ->then(function ($html) use ($url, $that) {
        // $url is captured by value, so each callback keeps its own URL
        $that->metaResults[$url] = $that->parseMeta($html);
    });
$promises['meta'][$url] = $promise;
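To see why the capture mode matters, here is a minimal sketch in plain PHP with no HTTP involved. The URLs are placeholders and the closures stand in for then() callbacks, which also run only after the loop has finished.

```php
<?php
// Hypothetical stand-in for the URL list; no requests are made here.
$urls = ['https://example.com/a', 'https://example.com/b'];

$byReference = [];
$byValue = [];
foreach ($urls as $url) {
    // Captures the variable itself: the closure sees whatever $url holds later.
    $byReference[] = function () use (&$url) {
        return $url;
    };
    // Captures a copy of the value $url has right now.
    $byValue[] = function () use ($url) {
        return $url;
    };
}

// The callbacks run only after the loop, as promise callbacks do after wait().
$seenByReference = array_map(function ($f) { return $f(); }, $byReference);
$seenByValue = array_map(function ($f) { return $f(); }, $byValue);

print_r($seenByReference); // both entries are the last URL
print_r($seenByValue);     // one entry per URL, in order
```

With by-reference capture every result would be written under the last URL's key, which can look as if the other results were never saved.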

Related

Laravel get data out of foreach loop

The code below raises the following error (on the line if ($response) {):
Undefined variable: response
I am checking the if condition inside the foreach because I wanted to check whether each id in the UserEnabledNotifications table exists in notifications table. Also dump($response); inside the if condition of foreach shows data.
Can I get the data in $response outside the foreach loop? What shall I try?
$notificationData = UserEnabledNotifications::all();
foreach ($notificationData->where('status', 'true') as $user => $value) {
if (Notifications::where('userEnabledNotificationsId', $value['id'])->exists() == false) {
$notificationTypeName = NotificationTypes::where('id', $value['notificationTypesId'])
->value('notificationTypeName');
$userData = User::where('id', $value['userId'])
->get()
->toArray();
$data = [];
$data['notificationTypesId'] = $value['notificationTypesId'];
$data['notificationTypeName'] = $notificationTypeName;
$data['userId'] = $value['userId'];
$data['email'] = $userData[0]['email'];
$data['recipientName'] = $userData[0]['FullName'];
$data['userEnabledNotificationsId'] = $value['id'];
$response = Notifications::create($data);
//dump($response);
$tags[] = $response;
}
}
if ($response) {
return response()->json([
'message' => 'success',
'data' => $tags,
'statusCode' => 200,
'status' => 'success'
], 200);
}
You define $response only inside the if body within the loop, so it is undefined whenever no iteration enters that branch. Add $response = null; before the loop.
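You might declare the variable before the loop and read it afterwards. If this code lives in a class, you could instead use a private or protected property declared at class level and access it directly via $this or through methods.
$notificationData = UserEnabledNotifications::all();
$response = null; // plain local variable: visibility keywords such as private are only valid on class properties
$tags = [];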
foreach ($notificationData->where('status', 'true') as $user => $value) {
if(Notifications::where('userEnabledNotificationsId',$value['id'])->exists()==false){
$notificationTypeName = NotificationTypes::where('id', $value['notificationTypesId'])->value('notificationTypeName');
$userData = User::where('id', $value['userId'])->get()->toArray();
$data = [];
$data['notificationTypesId'] = $value['notificationTypesId'];
$data['notificationTypeName'] = $notificationTypeName;
$data['userId'] = $value['userId'];
$data['email'] = $userData[0]['email'];
$data['recipientName'] = $userData[0]['FullName'];
$data['userEnabledNotificationsId'] = $value['id'];
$response = Notifications::create($data);
$tags[] = $response;
}
}
if ($response) {
return response()->json([
'message' => 'success',
'data' => $tags,
'statusCode' => 200,
'status' => 'success'
], 200);
}
But now you need to check whether $response is null everywhere you use it.
Why private or protected or public?
Check this answer : What is the difference between public, private, and protected?
I quote
public scope to make that property/method available from anywhere, other classes, and instances of the object.
private scope when you want your property/method to be visible in its own class only.
protected scope when you want to make your property/method visible in all classes that extend current class including the parent class.
Simply initialise $response to null (or an empty array) before the loop and you will be able to get the data out of the loop.
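A minimal sketch of that idea in plain PHP, without Laravel. The $rows array and the 'nothing to create' message are made up for illustration, and the array assigned to $response stands in for the result of Notifications::create($data).

```php
<?php
// Hypothetical rows standing in for the UserEnabledNotifications records.
$rows = [
    ['id' => 1, 'alreadyNotified' => true],
    ['id' => 2, 'alreadyNotified' => false],
];

$response = null; // defined even if the loop body never assigns it
$tags = [];       // same for the collected results

foreach ($rows as $row) {
    if (!$row['alreadyNotified']) {
        // Stand-in for Notifications::create($data).
        $response = ['userEnabledNotificationsId' => $row['id']];
        $tags[] = $response;
    }
}

// Testing the collection avoids depending on the last $response only.
$payload = $tags !== []
    ? ['message' => 'success', 'data' => $tags]
    : ['message' => 'nothing to create', 'data' => []];

print_r($payload);
```

Checking $tags rather than $response also covers the case where nothing was created, so the function always has something to return.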

How to get requested url for a guzzle async request?

I'm using Guzzle in a loop to build an array of promises. This is the line inside the loop:
$promises[] = $this->client->getAsync($url, $options);
Next, I make:
$res = collect(Promise\settle($promises)->wait());
One of the items from the result is:
As you can see, it's just an array with a string field and a GuzzleHttp\Psr7\Response object. So how can I get the requested URL from this structure?
Thank you for any help!
I ran into the same issue.
My uses:
use GuzzleHttp\Client;
use GuzzleHttp\Promise\EachPromise;
use GuzzleHttp\Psr7\Response;
use Requests;
Extracted function from class:
public function getUrlsInParallelRememberingSource($urls,
$numberThreads = 10)
{
$client = new Client();
$responsesModified = [];
$promises = (function () use ($urls, $client, &$responsesModified)
{
foreach ($urls as $url) {
yield $client->getAsync($url)->then(function($response) use ($url, &$responsesModified)
{
$data = [
'url' => $url,
'body' => 'res' // pass here whatever you want
];
$responsesModified[] = $data;
return $response;
});
}
})();
$eachPromise = new EachPromise($promises,
[
'concurrency' => $numberThreads,
'fulfilled' => function (Response $response)
{
},
'rejected' => function ($reason)
{
}
]);
$eachPromise->promise()->wait();
return $responsesModified;
}
This gives the following result:
http://i.kagda.ru/5001133750442_01-11-2020-00:34:55_5001.png

Use Guzzle pool instead of guzzle promises

I am using Guzzle promises to send concurrent requests, but I want to control the concurrency, which is why I want to use a Guzzle pool. How can I convert my Guzzle promises code to use a pool? Here is my code:
public function getDispenceryforAllPage($dispencery)
{
$GetAllproducts = [];
$promiseGetPagination = $this->client->getAsync($dispencery)
->then(function ($response) {
return $this->getPaginationNumber($response->getBody()->getContents());
});
$Pagination = $promiseGetPagination->wait();
$pagearray = array();
for($i=1;$i<=$Pagination; $i++){
$pagearray[] = $i;
}
foreach($pagearray as $page_no) {
$GetAllproducts[] = $this->client->getAsync($dispencery.'?page='.$page_no)
->then(function ($response) {
$promise = $this->getData($response->getBody()->getContents());
return $promise;
});
}
$results = GuzzleHttp\Promise\settle($GetAllproducts)->wait();
return $results;
}
I have the below working example for guzzle 6.
I use postAsync and pool.
function postInBulk($inputs)
{
$client = new Client([
'base_uri' => 'https://a.b.com'
]);
$headers = [
'Authorization' => 'Bearer token_from_directus_user'
];
$requests = function ($a) use ($client, $headers) {
for ($i = 0; $i < count($a); $i++) {
yield function() use ($client, $headers) {
return $client->postAsync('https://a.com/project/items/collection', [
'headers' => $headers,
'json' => [
"snippet" => "snippet",
"rank" => "1",
"status" => "published"
]
]);
};
}
};
$pool = new Pool($client, $requests($inputs),[
'concurrency' => 5,
'fulfilled' => function (Response $response, $index) {
// this is delivered each successful response
},
'rejected' => function (RequestException $reason, $index) {
// this is delivered each failed request
},
]);
$pool->promise()->wait();
}
Just use each_limit() or each_limit_all() (instead of settle()) with a generator.
function getDispenceryforAllPage($dispencery)
{
$promiseGetPagination = $this->client->getAsync($dispencery)
->then(function ($response) {
return $this->getPaginationNumber($response->getBody()->getContents());
});
$Pagination = $promiseGetPagination->wait();
$pagearray = range(1, $Pagination);
$requestGenerator = function () use ($dispencery, $pagearray) {
foreach ($pagearray as $page_no) {
yield $this->client->getAsync($dispencery . '?page=' . $page_no)
->then(function ($response) {
return $this->getData($response->getBody()->getContents());
});
}
};
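// Max 5 concurrent requests; each_limit_all() resolves to null, so collect the values in the callback
$results = [];
GuzzleHttp\Promise\each_limit_all($requestGenerator(), 5, function ($value, $index) use (&$results) {
$results[$index] = $value;
})->wait();
return $results;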
}
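The generator matters because items are produced only when the consumer asks for the next one, which is what lets a concurrency limit work without building every request up front. Here is a small sketch of that laziness in plain PHP; pageUrls() and the example URL are hypothetical, and strings stand in for the promises.

```php
<?php
// Hypothetical helper: lazily yields one page URL at a time.
function pageUrls(string $base, int $lastPage): Generator
{
    for ($page = 1; $page <= $lastPage; $page++) {
        yield $page => $base . '?page=' . $page;
    }
}

$created = 0;
$gen = (function () use (&$created) {
    foreach (pageUrls('https://example.com/list', 3) as $page => $url) {
        $created++; // counts how many items have been produced so far
        yield $page => $url;
    }
})();

$first = $gen->current();      // runs the generator up to its first yield only
$createdAfterFirst = $created; // still 1: pages 2 and 3 have not been built yet

$all = iterator_to_array(pageUrls('https://example.com/list', 3));
print_r($all);
```

A for loop is used instead of range() here because range(1, 0) returns [1, 0], so a page count of zero would otherwise still produce two entries.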
I have modified your code to support pool.
class GuzzleTest
{
private $client;
public function __construct($baseUrl)
{
$this->client = new \GuzzleHttp\Client([
// Base URI is used with relative requests
'base_uri' => $baseUrl,
// You can set any number of default request options.
'timeout' => 2.0,
]);
}
public function getDispenceryforAllPage($dispencery)
{
$GetAllproducts = [];
$promiseGetPagination = $this->client->getAsync($dispencery)
->then(function ($response) {
return $this->getPaginationNumber($response->getBody()->getContents());
});
$Pagination = $promiseGetPagination->wait();
$pagearray = array();
for ($i = 1; $i <= $Pagination; $i++) {
$pagearray[] = $i;
}
$pool = new \GuzzleHttp\Pool($this->client, $this->_yieldRequest($pagearray, $dispencery), [
'concurrency' => 5,
'fulfilled' => function ($response, $index) use (&$GetAllproducts) {
// this is delivered each successful response; keep the parsed page under its index
$GetAllproducts[$index] = $this->getData($response->getBody()->getContents());
},
'rejected' => function ($reason, $index) {
// this is delivered each failed request
},
]);
// Initiate the transfers and create a promise
$poolPromise = $pool->promise();
// Force the pool of requests to complete; the pool promise itself resolves to null
$poolPromise->wait();
return $GetAllproducts;
}
private function _yieldRequest($pagearray, $dispencery){
foreach ($pagearray as $page_no) {
$uri = $dispencery . '?page=' . $page_no;
yield function() use ($uri) {
return $this->client->getAsync($uri);
};
}
}
}

Multi-threaded downloading files with Guzzle HTTP client: EachPromises vs Pool objects

For testing purposes, I have an array of 2000 image URIs (strings) which I download asynchronously with these functions. After some googling, testing and trying, I've come up with two functions that both work (well, to be honest, downloadFilesAsync2 throws an InvalidArgumentException at the last line).
The function downloadFilesAsync2 is based on the class GuzzleHttp\Promise\EachPromise and downloadFilesAsync1 is based on the GuzzleHttp\Pool class.
Both functions download pretty well the 2000 files asynchronously, with the limit of 10 threads at the same time.
I know that they work, but nothing more. I wonder if someone could explain both approaches, whether one is better than the other, their implications, etc.
// for the purpose of this question i've reduced the array to 5 files!
$uris = array(
"https://cdn.enchufix.com/media/catalog/product/u/n/unix-48120.jpg",
"https://cdn.enchufix.com/media/catalog/product/u/n/unix-48120-01.jpg",
"https://cdn.enchufix.com/media/catalog/product/u/n/unix-48120-02.jpg",
"https://cdn.enchufix.com/media/catalog/product/u/n/unix-48120-03.jpg",
"https://cdn.enchufix.com/media/catalog/product/u/n/unix-48120-04.jpg",
);
function downloadFilesAsync2(array $uris, string $dir, $overwrite=true) {
$client = new \GuzzleHttp\Client();
$requests = array();
foreach ($uris as $i => $uri) {
$loc = $dir . DIRECTORY_SEPARATOR . basename($uri);
if ($overwrite && file_exists($loc)) unlink($loc);
$requests[] = new GuzzleHttp\Psr7\Request('GET', $uri, ['sink' => $loc]);
echo "Downloading $uri to $loc" . PHP_EOL;
}
$pool = new \GuzzleHttp\Pool($client, $requests, [
'concurrency' => 10,
'fulfilled' => function (\Psr\Http\Message\ResponseInterface $response, $index) {
// this is delivered each successful response
echo 'success: '.$response->getStatusCode().PHP_EOL;
},
'rejected' => function ($reason, $index) {
// this is delivered each failed request
echo 'failed: '.$reason.PHP_EOL;
},
]);
$promise = $pool->promise(); // Start transfers and create a promise
$promise->wait(); // Force the pool of requests to complete.
}
function downloadFilesAsync1(array $uris, string $dir, $overwrite=true) {
$client = new \GuzzleHttp\Client();
$promises = (function () use ($client, $uris, $dir, $overwrite) {
foreach ($uris as $uri) {
$loc = $dir . DIRECTORY_SEPARATOR . basename($uri);
if ($overwrite && file_exists($loc)) unlink($loc);
yield $client->requestAsync('GET', $uri, ['sink' => $loc]);
echo "Downloading $uri to $loc" . PHP_EOL;
}
})();
(new \GuzzleHttp\Promise\EachPromise(
$promises, [
'concurrency' => 10,
'fulfilled' => function (\Psr\Http\Message\ResponseInterface $response) {
// echo "\t=>\tDONE! status:" . $response->getStatusCode() . PHP_EOL;
},
'rejected' => function ($reason, $index) {
echo 'ERROR => ' . strtok($reason->getMessage(), "\n") . PHP_EOL;
},
])
)->promise()->wait();
}
First, I will address the InvalidArgumentException within the downloadFilesAsync2 method. There are actually a pair of issues with this method. Both relate to this:
$requests[] = $client->request('GET', $uri, ['sink' => $loc]);
The first issue is that Client::request() is a synchronous utility method which wraps $client->requestAsync()->wait(). $client->request() returns an instance of Psr\Http\Message\ResponseInterface, so $requests[] is actually populated with ResponseInterface implementations. This is what ultimately causes the InvalidArgumentException: $requests does not contain any Psr\Http\Message\RequestInterface instances, and the exception is thrown by the generator defined in Pool::__construct(). The second issue follows from the first: because request() blocks, each file is downloaded one at a time while the array is being built.
A corrected version of this method should contain code which looks more like:
$requests = [
new Request('GET', 'https://www.google.com', [], null, '1.1'),
new Request('GET', 'https://www.ebay.com', [], null, '1.1'),
new Request('GET', 'https://www.cnn.com', [], null, '1.1'),
new Request('GET', 'https://www.red.com', [], null, '1.1'),
];
$pool = new Pool($client, $requests, [
'concurrency' => 10,
'fulfilled' => function(ResponseInterface $response) {
// do something
},
'rejected' => function($reason, $index) {
// do something error handling
},
'options' => ['sink' => $some_location], // applied to every request, so all responses share this one sink
]);
$promise = $pool->promise();
$promise->wait();
To answer your second question, "What is the difference between these two methods?", the answer is simply that there is none. To explain this, let me copy and paste Pool::__construct():
/**
* @param ClientInterface $client Client used to send the requests.
* @param array|\Iterator $requests Requests or functions that return
* requests to send concurrently.
* @param array $config Associative array of options
* - concurrency: (int) Maximum number of requests to send concurrently
* - options: Array of request options to apply to each request.
* - fulfilled: (callable) Function to invoke when a request completes.
* - rejected: (callable) Function to invoke when a request is rejected.
*/
public function __construct(
ClientInterface $client,
$requests,
array $config = []
) {
// Backwards compatibility.
if (isset($config['pool_size'])) {
$config['concurrency'] = $config['pool_size'];
} elseif (!isset($config['concurrency'])) {
$config['concurrency'] = 25;
}
if (isset($config['options'])) {
$opts = $config['options'];
unset($config['options']);
} else {
$opts = [];
}
$iterable = \GuzzleHttp\Promise\iter_for($requests);
$requests = function () use ($iterable, $client, $opts) {
foreach ($iterable as $key => $rfn) {
if ($rfn instanceof RequestInterface) {
yield $key => $client->sendAsync($rfn, $opts);
} elseif (is_callable($rfn)) {
yield $key => $rfn($opts);
} else {
throw new \InvalidArgumentException('Each value yielded by '
. 'the iterator must be a Psr7\Http\Message\RequestInterface '
. 'or a callable that returns a promise that fulfills '
. 'with a Psr7\Message\Http\ResponseInterface object.');
}
}
};
$this->each = new EachPromise($requests(), $config);
}
Now compare that to a simplified version of the code within the downloadFilesAsync1 method:
$promises = (function () use ($client, $uris) {
foreach ($uris as $uri) {
yield $client->requestAsync('GET', $uri, ['sink' => $some_location]);
}
})();
(new \GuzzleHttp\Promise\EachPromise(
$promises, [
'concurrency' => 10,
'fulfilled' => function (\Psr\Http\Message\ResponseInterface $response) {
// do something
},
'rejected' => function ($reason, $index) {
// do something
},
])
)->promise()->wait();
In both examples, there is a generator which yields promises that resolve to instances of ResponseInterface and that generator along with the configuration array (fulfilled callable, rejected callable, concurrency) is also fed into a new instance of EachPromise.
In summary:
downloadFilesAsync1 is functionally the same thing as using Pool only without the error checking that has been built into Pool::__construct().
There are a few errors within downloadFilesAsync2 which will cause the files to be downloaded in a synchronous fashion prior to receiving an InvalidArgumentException when the Pool is instantiated.
My only recommendation is: use whichever feels more intuitive for you to use.

Async HTML parser with Goutte

I am trying to write HTML parser with the help of Goutte. It works very well. However Goutte uses blocking requests. This works well if you are dealing with a single service. If I want to query lots of services which are independent from each other, this causes a problem. Goutte uses BrowserKit and Guzzle. I have tried to change doRequest function but it failed with
Argument 1 passed to
Symfony\Component\BrowserKit\CookieJar::updateFromResponse() must be
an instance of Symfony\Component\BrowserKit\Response
protected function doRequest($request)
{
$headers = array();
foreach ($request->getServer() as $key => $val) {
$key = strtolower(str_replace('_', '-', $key));
$contentHeaders = array('content-length' => true, 'content-md5' => true, 'content-type' => true);
if (0 === strpos($key, 'http-')) {
$headers[substr($key, 5)] = $val;
}
// CONTENT_* are not prefixed with HTTP_
elseif (isset($contentHeaders[$key])) {
$headers[$key] = $val;
}
}
$cookies = CookieJar::fromArray(
$this->getCookieJar()->allRawValues($request->getUri()),
parse_url($request->getUri(), PHP_URL_HOST)
);
$requestOptions = array(
'cookies' => $cookies,
'allow_redirects' => false,
'auth' => $this->auth,
);
if (!in_array($request->getMethod(), array('GET', 'HEAD'))) {
if (null !== $content = $request->getContent()) {
$requestOptions['body'] = $content;
} else {
if ($files = $request->getFiles()) {
$requestOptions['multipart'] = [];
$this->addPostFields($request->getParameters(), $requestOptions['multipart']);
$this->addPostFiles($files, $requestOptions['multipart']);
} else {
$requestOptions['form_params'] = $request->getParameters();
}
}
}
if (!empty($headers)) {
$requestOptions['headers'] = $headers;
}
$method = $request->getMethod();
$uri = $request->getUri();
foreach ($this->headers as $name => $value) {
$requestOptions['headers'][$name] = $value;
}
// Let BrowserKit handle redirects
$promise = $this->getClient()->requestAsync($method,$uri,$requestOptions);
$promise->then(
function (ResponseInterface $response) {
return $this->createResponse($response);
},
function (RequestException $e) {
$response = $e->getResponse();
if (null === $response) {
throw $e;
}
}
);
$promise->wait();
}
How can I change Goutte\Client.php so that it makes requests asynchronously? If that is not possible, how can I run my scrapers, which target different endpoints, simultaneously? Thanks.
Goutte is essentially a bridge between Guzzle and Symfony's BrowserKit and DomCrawler.
The biggest drawback of using Goutte is that all requests are made synchronously.
To do things asynchronously you will have to forgo Goutte and use Guzzle and DomCrawler directly.
For example:
$requests = [
    new GuzzleHttp\Psr7\Request('GET', $uri[0]),
    new GuzzleHttp\Psr7\Request('GET', $uri[1]),
    new GuzzleHttp\Psr7\Request('GET', $uri[2]),
    new GuzzleHttp\Psr7\Request('GET', $uri[3]),
    new GuzzleHttp\Psr7\Request('GET', $uri[4]),
    new GuzzleHttp\Psr7\Request('GET', $uri[5]),
    new GuzzleHttp\Psr7\Request('GET', $uri[6]),
];
$client = new GuzzleHttp\Client();
$pool = new GuzzleHttp\Pool($client, $requests, [
    'concurrency' => 5, // how many concurrent requests we want active at any given time
    'fulfilled' => function ($response, $index) use ($uri) {
        // $uri must be imported with use (...) to be visible inside the closure
        $crawler = new Symfony\Component\DomCrawler\Crawler(null, $uri[$index]);
        $crawler->addContent(
            (string) $response->getBody(),
            $response->getHeaderLine('Content-Type')
        );
    },
    'rejected' => function ($reason, $index) {
        // do something if the request failed.
    },
]);
$promise = $pool->promise();
$promise->wait();
