How to perform multiple Guzzle requests at the same time? - php

I can perform single requests using Guzzle, and I'm very pleased with Guzzle's performance so far. However, I read something in the Guzzle API about MultiCurl and Batching.
Could someone explain to me how to make multiple requests at the same time? Async if possible. I don't know if that is what they mean by MultiCurl. Sync would also not be a problem. I just want to make multiple requests at the same time or in very quick succession.

From the docs:
http://guzzle3.readthedocs.org/http-client/client.html#sending-requests-in-parallel
For an easy to use solution that returns a hash of request objects mapping to a response or error, see http://guzzle3.readthedocs.org/batching/batching.html#batching
Short example:
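<?php
require 'vendor/autoload.php';
// Guzzle 3: send() accepts an array of requests and sends them in parallel
$client = new Guzzle\Http\Client();
$responses = $client->send(array(
    $client->get('http://www.example.com/foo'),
    $client->get('http://www.example.com/baz'),
    $client->get('http://www.example.com/bar')
));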

An update for the newer GuzzleHttp package (guzzlehttp/guzzle):
Concurrent/parallel calls are now made through a few different methods, including promises (see "Concurrent Requests" in the Guzzle documentation).
The old way of passing an array of RequestInterface objects to send() no longer works.
Here is an example:
$newClient = new \GuzzleHttp\Client(['base_uri' => $base]);
$requestArr = [];
foreach ($documents->documents as $doc) {
    $params = [
        'language' => 'eng',
        'text' => $doc->summary,
        'apikey' => $key
    ];
    // getAsync() returns a promise immediately instead of blocking on the response
    $requestArr[$doc->reference] = $newClient->getAsync('/1/api/sync/analyze/v1?' . http_build_query($params));
}
$time_start = microtime(true);
// unwrap() waits for all promises and throws if any of them is rejected
// (newer guzzlehttp/promises versions expose this as \GuzzleHttp\Promise\Utils::unwrap())
$responses = \GuzzleHttp\Promise\unwrap($requestArr);
$time_end = microtime(true);
$this->get('logger')->error(' NewsPerf Dev: took ' . ($time_end - $time_start));
Update:
As suggested in the comments and asked by @sankalp-tambe, you can also use a different approach so that one failed request in a set of concurrent requests does not prevent you from getting all the other responses.
While the option suggested with Pool is feasible, I still prefer promises.
With promises, you use the settle() and wait() methods instead of unwrap().
The difference from the example above would be:
$responses = \GuzzleHttp\Promise\settle($requestArr)->wait();
I have created a full example below for reference on how to handle the $responses too.
require __DIR__ . '/vendor/autoload.php';
use GuzzleHttp\Client as GuzzleClient;
use GuzzleHttp\Promise as GuzzlePromise;
$client = new GuzzleClient(['timeout' => 12.0]); // request timeout in seconds
$requestPromises = [];
$sitesByDomain = [];
$sitesArray = SiteEntity::getAll(); // returns an array with objects that contain a domain
foreach ($sitesArray as $site) {
    $domain = $site->getDomain();
    $sitesByDomain[$domain] = $site; // index the sites by domain so results can be matched back
    $requestPromises[$domain] = $client->getAsync('http://' . $domain);
}
// settle() waits for every promise, whether it succeeds or fails
// (newer guzzlehttp/promises versions expose this as GuzzlePromise\Utils::settle())
$results = GuzzlePromise\settle($requestPromises)->wait();
// $this->logger and $this->entityManager come from the surrounding service class
foreach ($results as $domain => $result) {
    $site = $sitesByDomain[$domain];
    $this->logger->info('Crawler FetchHomePages: domain check ' . $domain);
    if ($result['state'] === 'fulfilled') {
        $response = $result['value'];
        if ($response->getStatusCode() == 200) {
            $site->setHtml((string) $response->getBody());
        } else {
            $site->setHtml($response->getStatusCode());
        }
    } else if ($result['state'] === 'rejected') {
        // if a call fails, Guzzle marks it as rejected and provides the reason (usually an exception)
        $site->setHtml('ERR: ' . $result['reason']);
    } else {
        $site->setHtml('ERR: unknown exception ');
        $this->logger->error('Crawler FetchHomePages: unknown fetch fail domain: ' . $domain);
    }
    $this->entityManager->persist($site); // this is a call to Doctrine's entity manager
}
This example code was originally posted here.

Guzzle 6.0 has made sending multiple async requests very easy.
There are multiple ways to do it.
You can create the async requests and add the resultant promises to a single array, and get the result using the settle() method like this:
$promise1 = $client->getAsync('http://www.example.com/foo1');
$promise2 = $client->getAsync('http://www.example.com/foo2');
$promises = [$promise1, $promise2];
$results = GuzzleHttp\Promise\settle($promises)->wait();
You can then loop through these results and read each response (or the reason it failed). Alternatively, you can use GuzzleHttp\Promise\all or GuzzleHttp\Promise\each. Refer to this article for further details.
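Each entry returned by settle() is an array with a 'state' key: 'fulfilled' entries carry the result under 'value', and 'rejected' entries carry the failure under 'reason'. As a minimal sketch, that loop can be factored into a plain function that splits the settled results into successes and failures. splitSettledResults() is a hypothetical helper (not part of Guzzle), and the stand-in values in the usage part are assumptions so the sketch runs without Guzzle or a network connection.

```php
<?php

/**
 * Splits the array returned by settle()->wait() into fulfilled values
 * and rejection reasons, preserving the original keys.
 * Hypothetical helper for illustration; not part of Guzzle.
 */
function splitSettledResults(array $results): array
{
    $fulfilled = [];
    $rejected = [];

    foreach ($results as $key => $result) {
        if ($result['state'] === 'fulfilled') {
            // For HTTP requests this is a PSR-7 response object.
            $fulfilled[$key] = $result['value'];
        } else {
            // settle() only produces 'fulfilled' or 'rejected'; for HTTP
            // requests the reason is usually a Guzzle exception.
            $rejected[$key] = $result['reason'];
        }
    }

    return [$fulfilled, $rejected];
}

// Stand-in data shaped like settle() output (no network needed).
$results = [
    'foo1' => ['state' => 'fulfilled', 'value' => 'response for foo1'],
    'foo2' => ['state' => 'rejected', 'reason' => new RuntimeException('timeout')],
];

[$ok, $failed] = splitSettledResults($results);

foreach ($failed as $key => $reason) {
    echo $key . ' failed: ' . $reason->getMessage() . PHP_EOL;
}
```

Keeping the original keys (for example the domain names in the crawler example) means each result can still be matched back to the request that produced it.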
If you have an indeterminate number of requests to send (say 5 here), you can use GuzzleHttp\Pool::batch().
Here is an example:
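require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
$client = new Client();
// Create the requests lazily with a generator
$requests = function ($total) {
    for ($i = 1; $i <= $total; $i++) {
        yield new Request('GET', 'http://www.example.com/foo' . $i);
    }
};
// Pool::batch() returns a response or an exception per request, in the order the requests were sent
$pool_batch = Pool::batch($client, $requests(5));
foreach ($pool_batch as $index => $res) {
    if ($res instanceof \Throwable) {
        // The request failed and $res is the exception. Do sth
        continue;
    }
    // $res is a response. Do sth
}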


PHP webcrawler programmed in Visual Studio Code has problems with unknown class, how do I fix that?

Thanks in advance. I am trying to build a web scraper with PHP, and I use Visual Studio Code.
When I run the following code, the following problem shows up:
Use of unknown class: 'Goutte\Client'
Does anyone know how to solve that issue?
I have googled all over the place, looked at SO and asked the forbidden one, but still after three days I have not achieved any progress. (I am also a noob, so maybe it is not as difficult to solve as I think).
Looking forward to your feedback and tips.
<?php
require 'vendor/autoload.php';
use Goutte\Client;
// Initialize the Goutte client
$client = new Client();
// Create a new array to store the scraped data
$data = array();
// Loop through the pages
for ($i = 0; $i < 3; $i++) {
    // Make a request to the website
    $crawler = $client->request('GET', 'https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives_de?page=' . $i);
    // Find all the initiatives on the page
    $crawler->filter('.initiative')->each(function ($node) use (&$data) {
        // Extract the information for each initiative
        $title = $node->filter('h3')->text();
        $link = $node->filter('a')->attr('href');
        $description = $node->filter('p')->text();
        $deadline = $node->filter('time')->attr('datetime');
        // Append the data for the initiative to the data array
        $data[] = array($title, $link, $description, $deadline);
    });
    // Sleep for a random amount of time between 5 and 10 seconds
    $sleep = rand(5, 10);
    sleep($sleep);
}
// Open the output file
$fp = fopen('initiatives.csv', 'w');
// Write the header row
fputcsv($fp, array('Title', 'Link', 'Description', 'Deadline'));

How to MODIFY a Google Docs document via API using search-and-replace?

I need an example of how to modify an existing document with existing text in Google Docs via the API. The documentation only shows how to insert and delete text, not how to update it. I have been looking frantically on the web for examples or a direction on how to do it, but without luck.
I finally figured it out myself.
First, follow this video to set up authentication for the Google Docs API (it is about Google Sheets, but the process is basically the same). It consists of these steps:
create project in Google Developer Console
enable Google Docs API
create credentials, including a service account for programmatic access
share your document with the service account client email address
install the Google API PHP client: composer require google/apiclient
Then create a script like the following:
require_once(__DIR__ .'/vendor/autoload.php');
$client = new \Google_Client();
$client->setApplicationName('Some name'); //this name doesn't matter
$client->setScopes([\Google_Service_Docs::DOCUMENTS]);
$client->setAccessType('offline');
$client->setAuthConfig(__DIR__ .'/googleapi-credentials.json'); //see https://www.youtube.com/watch?v=iTZyuszEkxI for how to create this file
$service = new \Google_Service_Docs($client);
$documentId = 'YOUR-DOCUMENT-ID-GOES-HERE'; //set your document ID here, eg. "j4i1m57GDYthXKqlGce9WKs4tpiFvzl1FXKmNRsTAAlH"
$doc = $service->documents->get($documentId);
// Collect all pieces of text (see https://developers.google.com/docs/api/concepts/structure to understand the structure)
$allText = [];
foreach ($doc->body->content as $structuralElement) {
    if ($structuralElement->paragraph) {
        foreach ($structuralElement->paragraph->elements as $paragraphElement) {
            if ($paragraphElement->textRun) {
                $allText[] = $paragraphElement->textRun->content;
            }
        }
    }
}
// Go through and create search/replace requests
$requests = $textsAlreadyDone = $forEasyCompare = [];
foreach ($allText as $currText) {
    if (in_array($currText, $textsAlreadyDone, true)) {
        // If two identical pieces of text are found, only search-and-replace once; there is no reason to do it multiple times
        continue;
    }
    if (preg_match_all("/(.*?)(dogs)(.*?)/", $currText, $matches, PREG_SET_ORDER)) {
        //NOTE: for simple static text searching you could of course just use strpos()
        // - and then the loop on $matches wouldn't be necessary, and str_replace() would be simplified
        $modifiedText = $currText;
        foreach ($matches as $match) {
            $modifiedText = str_replace($match[0], $match[1] . 'cats' . $match[3], $modifiedText);
        }
        $forEasyCompare[] = ['old' => $currText, 'new' => $modifiedText];
        $replaceAllTextRequest = [
            'replaceAllText' => [
                'replaceText' => $modifiedText,
                'containsText' => [
                    'text' => $currText,
                    'matchCase' => true,
                ],
            ],
        ];
        $requests[] = new \Google_Service_Docs_Request($replaceAllTextRequest);
    }
    $textsAlreadyDone[] = $currText;
}
// you could dump out $forEasyCompare to see the changes that would be made
$batchUpdateRequest = new \Google_Service_Docs_BatchUpdateDocumentRequest(['requests' => $requests]);
$response = $service->documents->batchUpdate($documentId, $batchUpdateRequest);
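The traversal and the request building can be kept separate from the API client, which makes them easier to check. A minimal sketch, assuming the documents.get response has been decoded into plain PHP arrays that use the REST field names (body.content[].paragraph.elements[].textRun.content) and that a plain substring search is enough instead of the regex; collectTextRuns() and buildReplaceRequests() are hypothetical helpers. Each returned array could then be wrapped in new \Google_Service_Docs_Request(...) as in the script above.

```php
<?php

/**
 * Collects the content of every textRun in the document body.
 * $document is assumed to be the documents.get JSON decoded with
 * json_decode($json, true). Hypothetical helper for illustration.
 */
function collectTextRuns(array $document): array
{
    $texts = [];
    foreach ($document['body']['content'] ?? [] as $structuralElement) {
        // Non-paragraph elements (section breaks, tables, ...) are skipped.
        foreach ($structuralElement['paragraph']['elements'] ?? [] as $element) {
            if (isset($element['textRun']['content'])) {
                $texts[] = $element['textRun']['content'];
            }
        }
    }
    return $texts;
}

/**
 * Builds one replaceAllText request per distinct text run containing $search.
 */
function buildReplaceRequests(array $texts, string $search, string $replace): array
{
    $requests = [];
    foreach (array_unique($texts) as $text) {
        if (strpos($text, $search) === false) {
            continue;
        }
        $requests[] = [
            'replaceAllText' => [
                'replaceText' => str_replace($search, $replace, $text),
                'containsText' => ['text' => $text, 'matchCase' => true],
            ],
        ];
    }
    return $requests;
}

// A tiny hand-written document structure, for illustration only.
$document = [
    'body' => [
        'content' => [
            ['sectionBreak' => []],
            ['paragraph' => ['elements' => [
                ['textRun' => ['content' => "I like dogs.\n"]],
            ]]],
            ['paragraph' => ['elements' => [
                ['textRun' => ['content' => "No pets here.\n"]],
            ]]],
        ],
    ],
];

$requests = buildReplaceRequests(collectTextRuns($document), 'dogs', 'cats');
```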
This is my way, an easier one:
public function replaceText($search, $replace)
{
    $client = $this->getClient(); // assumed: your own method that returns an authorized \Google_Client
    $service = new \Google_Service_Docs($client);
    $documentId = ''; // Put your document ID here
    // Match the placeholder {{search}}, ignoring case
    $criteria = new \Google_Service_Docs_SubstringMatchCriteria();
    $criteria->setText('{{' . $search . '}}');
    $criteria->setMatchCase(false);
    $requests = [];
    $requests[] = new \Google_Service_Docs_Request(array(
        'replaceAllText' => array(
            'replaceText' => $replace,
            'containsText' => $criteria
        ),
    ));
    $batchUpdateRequest = new \Google_Service_Docs_BatchUpdateDocumentRequest(array(
        'requests' => $requests
    ));
    $response = $service->documents->batchUpdate($documentId, $batchUpdateRequest);
}
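Because batchUpdate accepts a list of requests, the same replaceAllText pattern can fill several {{placeholder}} fields in one call. A minimal sketch, assuming the {{...}} delimiter convention from the method above; buildPlaceholderRequests() is a hypothetical helper that only builds the request arrays, which would then be wrapped in \Google_Service_Docs_Request objects and sent with batchUpdate as shown.

```php
<?php

/**
 * Builds one replaceAllText request per placeholder.
 * Keys of $values are placeholder names without braces, e.g. 'name' for {{name}}.
 * Hypothetical helper for illustration.
 */
function buildPlaceholderRequests(array $values): array
{
    $requests = [];
    foreach ($values as $placeholder => $replacement) {
        $requests[] = [
            'replaceAllText' => [
                'replaceText' => (string) $replacement,
                'containsText' => [
                    'text' => '{{' . $placeholder . '}}',
                    'matchCase' => false,
                ],
            ],
        ];
    }
    return $requests;
}

$requests = buildPlaceholderRequests(['name' => 'Alice', 'date' => '2024-01-31']);
```

Sending all replacements in one batchUpdate call means one HTTP round trip instead of one per placeholder.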

Get total number of members in Discord using PHP

I have a Discord server with 1361 members, and on my website I want to display the total number of joined users.
I have figured out how to get all online members on the server using:
<?php
$jsonIn = file_get_contents('https://discordapp.com/api/guilds/356230556738125824/widget.json');
$JSON = json_decode($jsonIn, true);
$membersCount = count($JSON['members']);
echo "Number of members: " . $membersCount;
?>
What should I do differently to get a total number of ALL users that have joined the server, and not just display the online members?
I realize I am reviving a pretty old thread here, but I figure some might still find an answer useful. As jrenk pointed out, you should instead access https://discordapp.com/api/guilds/356230556738125824/members.
Your 401: Unauthorized error comes from the fact that you are, you guessed it, not authorized.
If you have created a bot, it is fairly easy: just add a request header Authorization: Bot YOUR_BOT_TOKEN_HERE. If you use a normal Discord account, the whole problem is a bit more tricky:
You will first have to send a POST request to https://discordapp.com/api/auth/login and set the body to {"email": "EMAIL_HERE", "password": "PASSWORD_HERE"}.
You will get a response with the parameter token. Save this token, you will need it later. BUT:
NEVER, UNDER ANY CIRCUMSTANCES show anyone this token, as it is equivalent to your login credentials!
With this token, you can now send a GET request to the members endpoint (https://discordapp.com/api/guilds/356230556738125824/members), adding the header Authorization: YOUR_TOKEN_HERE. Note the missing "Bot" at the beginning.
Also, what you mustn't forget:
If you don't add the parameter ?limit=MAX_USERS, you will only get the first guild member. Take a look here to see details.
You have to count the number of online members.
Here is the working code:
<?php
$members = json_decode(file_get_contents('https://discordapp.com/api/guilds/356230556738125824/widget.json'), true)['members'];
$membersCount = 0;
foreach ($members as $member) {
    if ($member['status'] == 'online') {
        $membersCount++;
    }
}
echo "Number of members: " . $membersCount;
?>
You need a bot on your Discord server to get all members. Use the discord.js library, for example.
First create a discord bot and get a token, see the following url:
https://github.com/reactiflux/discord-irc/wiki/Creating-a-discord-bot-&-getting-a-token
As @2Kreeper noted, do not reveal your token publicly.
Then use the following code, replacing "enter-bot-token-here" and "enter-guild-id-here" with your own information:
<?php
$json_options = [
    "http" => [
        "method" => "GET",
        "header" => "Authorization: Bot enter-bot-token-here"
    ]
];
$json_context = stream_context_create($json_options);
// limit=1000 is the maximum per request; larger servers need paging with the "after" parameter
$json_get = file_get_contents('https://discordapp.com/api/guilds/enter-guild-id-here/members?limit=1000', false, $json_context);
$json_decode = json_decode($json_get, true);
echo '<h2>Member Count</h2>';
echo count($json_decode);
echo '<h2>JSON Output</h2>';
echo '<pre>';
print_r($json_decode);
echo '</pre>';
?>
For anyone still interested, here's the solution I currently use using RestCord:
use RestCord\DiscordClient;
$serverId = <YourGuildId>;
$discord = new DiscordClient([
    'token' => '<YourBotToken>'
]);
$limit = 1000;
$membercnt = 0;
$_ids = array();
function getTotalUsersCount($ids, $limit, $serverId, $discord) {
    if (count($ids) > 0) {
        $last_id = max($ids);
        $last_id = (int)$last_id;
    } else {
        $last_id = null;
    }
    $members = $discord->guild->listGuildMembers(['guild.id' => $serverId, 'limit' => $limit, 'after' => $last_id]);
    $_ids = array();
    foreach ($members as $member) {
        $ids[] = $member->user->id;
        $_ids[] = $member->user->id;
    }
    if (count($_ids) > 0) {
        return getTotalUsersCount($ids, $limit, $serverId, $discord);
    } else {
        return $ids;
    }
}
$ids = getTotalUsersCount($_ids, $limit, $serverId, $discord);
$membercnt = count($ids);
echo "Member Count: " . $membercnt;
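The recursion above can also be written as a loop. A minimal sketch, assuming the list-members endpoint behaves as described in these answers: at most $limit members per page, with the after parameter set to the highest user ID seen so far. countGuildMembers() is a hypothetical helper, and $fetchPage stands in for whatever performs the actual request (RestCord, file_get_contents with a bot token, etc.), so the sketch can be exercised without a network connection.

```php
<?php

/**
 * Counts guild members by paging through the list-members endpoint.
 * $fetchPage(int $limit, ?string $after) must return an array of members,
 * each shaped like ['user' => ['id' => '...']]. Hypothetical helper.
 */
function countGuildMembers(callable $fetchPage, int $limit = 1000): int
{
    $total = 0;
    $after = null;

    do {
        $page = $fetchPage($limit, $after);
        $total += count($page);

        if ($page !== []) {
            // Snowflake IDs fit in a 64-bit PHP int, so compare them numerically.
            $ids = array_map(fn ($member) => (int) $member['user']['id'], $page);
            $after = (string) max($ids);
        }
        // A page shorter than $limit means there are no more members to fetch.
    } while (count($page) === $limit);

    return $total;
}

// Fake endpoint over 2500 generated IDs, for illustration only.
$allIds = range(1, 2500);
$fetchPage = function (int $limit, ?string $after) use ($allIds): array {
    $remaining = array_filter($allIds, fn ($id) => $after === null || $id > (int) $after);
    return array_map(
        fn ($id) => ['user' => ['id' => (string) $id]],
        array_slice($remaining, 0, $limit)
    );
};

echo countGuildMembers($fetchPage); // 2500
```

A loop avoids growing the call stack on very large servers and only keeps the running total instead of every ID.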
In addition to Soubhagya Kumar's answer and the comment by iTeY: you can simply use count(); there is no need to loop if you do not require one.
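For the widget.json data from the question, both counts can be taken without an explicit loop. A minimal sketch, assuming the decoded 'members' array as in the question; the sample data below is made up for illustration. As the question notes, the widget only lists online members, so neither number is the server's total membership.

```php
<?php

// Stand-in for json_decode(file_get_contents('.../widget.json'), true)['members'].
$members = [
    ['username' => 'alice', 'status' => 'online'],
    ['username' => 'bob', 'status' => 'idle'],
    ['username' => 'carol', 'status' => 'online'],
];

// Every member listed in the widget.
$listedCount = count($members);

// Only members whose status is exactly 'online' (excludes idle, dnd, ...).
$onlineCount = count(array_filter($members, fn ($m) => $m['status'] === 'online'));

echo "Listed: $listedCount, online: $onlineCount";
```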
I'm reviving this since it still seems relevant, and the other answers seem a bit too complex (maybe the API used to be more limited). So:
Generate a permanent Discord invite and keep the code at the end of it (https://discord.gg/xxxxxxx); then all you need to do is this:
<?php
$server_code = "xxxxxxx";
$url = "https://discord.com/api/v9/invites/" . $server_code . "?with_counts=true&with_expiration=true";
$jsonIn = file_get_contents($url);
$json_obj = json_decode($jsonIn); // decode into an object
$total = $json_obj->approximate_member_count;
?>
And there you go, that's the total member count. Keep in mind that this probably also counts bots, so you have to account for that if you want a more precise number.

Facebook PHP SDK "Too many requests in batch message. Maximum batch size is 50"

I'm trying to get the profiles of a large group of friends, and I'm getting the error
Too many requests in batch message. Maximum batch size is 50
from the API. I understand the error message, but I thought I had built the function to prevent this error: I specifically make the calls in chunks of 50. I don't change $chunk_size in any of the methods that call it, so I don't really know what is going on here.
This is the function that is spitting out the error:
protected function getFacebookProfiles($ids, array $fields = array('name', 'picture'), $chunk_size = 50)
{
    $facebook = App::make('Facebook');
    $fields = implode(',', $fields);
    $requests = array();
    foreach ($ids as $id) {
        $requests[] = array('method' => 'GET', 'relative_url' => "{$id}?fields={$fields}");
    }
    $responses = array();
    $chunks = array_chunk($requests, $chunk_size);
    foreach ($chunks as $chunk) {
        $batch = json_encode($requests);
        $response = $facebook->api("?batch={$batch}", 'POST');
        foreach ($response as &$profile) {
            $profile = json_decode($profile['body']);
            if (empty($profile->picture->data)) {
                // something has gone REALLY wrong, this should never happen but if it does we'll have more debug information
                if (empty($profile->error->message)) {
                    throw new Exception('Unexpected error when retrieving user information for IDs:' . implode(', ', $ids));
                }
                $profile->error = (array) $profile->error;
                throw new FacebookApiException((array) $profile);
            }
            $profile->picture = $profile->picture->data;
        }
        $responses = array_merge($responses, $response);
    }
    return $responses;
}
You are not using your $chunk variable in the JSON you generate for your API call; you are still encoding the original, unmodified $requests, so every batch contains all requests.
Happy slamming :-)
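A minimal sketch of the corrected loop, with each batch payload built from $chunk. buildBatchPayloads() is a hypothetical helper that only produces the JSON strings; the $facebook->api(...) call from the question would then be made once per payload. The limit of 50 is the one quoted in the error message.

```php
<?php

/**
 * Builds one JSON batch payload per chunk of at most $chunkSize requests.
 * Hypothetical helper for illustration.
 */
function buildBatchPayloads(array $ids, array $fields = ['name', 'picture'], int $chunkSize = 50): array
{
    $fieldList = implode(',', $fields);

    $requests = [];
    foreach ($ids as $id) {
        $requests[] = ['method' => 'GET', 'relative_url' => "{$id}?fields={$fieldList}"];
    }

    $payloads = [];
    foreach (array_chunk($requests, $chunkSize) as $chunk) {
        // Encode the current chunk, not the full $requests array.
        $payloads[] = json_encode($chunk);
    }
    return $payloads;
}

// 120 IDs produce three payloads of 50, 50 and 20 requests.
$payloads = buildBatchPayloads(range(1, 120));
```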

Symfony2 HTTP cache: is there a way to ignore query parameters when generating the cache?

I found a post outlining how to exclude parameters from the cache for Symfony 1.4, and I would like to do something similar for Symfony 2.3.
When using, say, AdWords, a bunch of query parameters that have nothing to do with rendering the page are included in the URI [gclid, x, y, utm_source, utm_medium, utm_campaign, utm_content], and I would like a way to tell the Symfony2 cache that the following pages are the same and should be cached as one page:
http://www.example.com
http://www.example.com?gclid=1
Anyone know how to do this?
This assumes you are using the Symfony2 AppCache and not Varnish. AppCache is a PHP reverse proxy: it caches the response for a URI and processes the headers.
Obviously, the following URIs:
http://www.example.com
http://www.example.com?gclid=1
are different, so the trick is to make them equal for the reverse proxy.
You can do that at several levels:
on the web server
on the Request object
on the AppCache Storage
IMO the easiest solution is to remove them from the Request at creation. The following code does the trick directly in app.php; if you want, you can do the same with a subclass of the Request object, but you will have to deal with the bootstrap.
use Symfony\Component\HttpFoundation\Request;
$loader = require_once __DIR__.'/../app/bootstrap.php.cache';
require_once __DIR__.'/../app/AppKernel.php';
require_once __DIR__.'/../app/AppCache.php';
$kernel = new AppKernel('prod', false);
$kernel->loadClassCache();
$kernel = new AppCache($kernel);
Request::enableHttpMethodParameterOverride();
$request = Request::createFromGlobals();
// Modify the query string here: AppCache builds its cache key from it
$qs = $request->server->get('QUERY_STRING');
if ('' != $qs) {
    $parts = array();
    foreach (explode('&', $qs) as $chunk) {
        $param = explode('=', $chunk);
        if (!in_array($param[0], array('gclid', 'x', 'y', 'utm_source', 'utm_medium', 'utm_campaign', 'utm_content'))) {
            $parts[] = $chunk;
        }
    }
    $request->server->set('QUERY_STRING', implode('&', $parts));
}
$response = $kernel->handle($request);
$response->send();
$kernel->terminate($request, $response);
Based on LFI's answer, I simplified the whole construct a little (AppBundle::CACHE_SKIP_PARAMS contains the array of parameters to skip):
// Exclude irrelevant parameters
$qs = $request->server->get('QUERY_STRING');
if ('' != $qs)
{
    parse_str($qs, $params);
    $relevantParams = [];
    foreach ($params as $key => $value)
    {
        if (!in_array($key, \AppBundle\AppBundle::CACHE_SKIP_PARAMS))
        {
            $relevantParams[$key] = $value;
        }
    }
    // http_build_query() keeps both keys and values (implode() would drop the keys)
    $request->server->set('QUERY_STRING', http_build_query($relevantParams));
}
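The filtering step can be isolated in a plain function. A minimal sketch, assuming the same skip list as in the question; stripQueryParams() is a hypothetical helper, not a Symfony API. It uses array_diff_key() instead of the manual loop and http_build_query() to rebuild the string, so both keys and values survive.

```php
<?php

/**
 * Removes tracking parameters from a raw query string.
 * Hypothetical helper for illustration.
 */
function stripQueryParams(string $queryString, array $skip): string
{
    parse_str($queryString, $params);

    // Keep only parameters whose names are not in $skip.
    $relevant = array_diff_key($params, array_flip($skip));

    return http_build_query($relevant);
}

$skip = ['gclid', 'x', 'y', 'utm_source', 'utm_medium', 'utm_campaign', 'utm_content'];

echo stripQueryParams('page=2&utm_source=adwords', $skip) . PHP_EOL; // page=2
```

Its result could be passed to $request->server->set('QUERY_STRING', ...) in app.php, as in the answers above.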
