This actually follows on from a previous question I had that, unfortunately, did not receive any answers so I'm not exactly holding my breath for a response but I understand this can be a bit of a tricky issue to solve.
I am currently trying to implement rate limiting on outgoing requests to an external API to match the limit on their end. I have tried to implement a token bucket library (https://github.com/bandwidth-throttle/token-bucket) into the class we are using to manage Guzzle requests for this particular API.
Initially, this seemed to be working as intended but we have now started seeing 429 responses from the API as it no longer seems to be correctly rate limiting the requests.
I have a feeling what is happening is that the number of tokens in the bucket is now being reset every time the API is called due to how Symfony handles services.
I am setting currently setting up the bucket location, rate and starting amount in the service's constructor:
public function __construct()
{
$storage = new FileStorage(__DIR__ . "/api.bucket");
$rate = new Rate(50, Rate::MINUTE);
$bucket = new TokenBucket(50, $rate, $storage);
$this->consumer = new BlockingConsumer($bucket);
$bucket->bootstrap(50);
}
I'm then attempting to consume a token before each request:
public function fetch(): array
{
try {
$this->consumer->consume(1);
$response = $this->client->request(
'GET', $this->buildQuery(), [
'query' => array_merge($this->params, ['api_key' => $this->apiKey]),
'headers' => [ 'Content-type' => 'application/json' ]
]
);
} catch (ServerException $e) {
// Process Server Exception
} catch (ClientException $e) {
// Process Client Exception
}
return $this->checkResponse($response);
}
I can't see anything obvious in that, that would allow it to request more than 50 times per minute unless the amount of available tokens was being reset on each request.
This is being supplied to a set of repository services that handle converting the data from each endpoint into objects used within the system. Consumers use the appropriate repository to request the data needed to complete their process.
If the amount of tokens is being reset by the bootstrap function being in service constructor, where should it be moved to within the Symfony framework that would still work with consumers?
I assume that it should work, but maybe try to move the ->bootstrap(50) call from every request? Not sure, but it can be the reason.
Anyway it's better to do that only once, as a part of your deployment (every time you deploy a new version). It doesn't have anything with Symfony, really, because the framework doesn't have any restrictions on deployment procedure. So it depends on how you do the deployment.
P.S. Have you considered to just handle 429 errors from the server? IMO you can wait (that's what BlockingConsumer does inside) when you receive 429 error. It's simpler and doesn't require an additional layer in your system.
BTW, have you considered nginx's ngx_http_limit_req_module as an alternative solution? It usually comes with nginx by default, so no additional actions to install, only a small configuration is required.
You can place an nginx proxy behind your code and the target web service and enable limits on it. Then in your code you will handle 429 as usual, but the requests will be throttled by your local nginx proxy, not by the external web service. So the final destination will get only limited amount of requests.
I have found a trick using Guzzle bundle for symfony.
I had to improve a sequential program sending GET requests to a Google API. In code example, it a pagespeed URL.
To have a rate limit, there an option to delay the requests before they are sent asynchronously.
Pagespeed rate limit is 200 requests per minute.
A quick calculation gives 200/60 = 0.3s per request.
Here is the code I tested on 300 urls, getting a fantastic result of no error, except if the url passed as a parameter in the GET request gives a 400 HTTP Error (Bad request).
I put a delay of 0.4s and the average result time is less then 0.2s, whereas it took more than a minute with a sequential program.
use GuzzleHttp;
use GuzzleHttp\Client;
use GuzzleHttp\Promise\EachPromise;
use GuzzleHttp\Exception\ClientException;
// ... Now inside class code ... //
$client = new GuzzleHttp\Client();
$promises = [];
foreach ($requetes as $i=>$google_request) {
$promises[] = $client->requestAsync('GET', $google_request ,['delay'=>0.4*$i*1000]); // delay is the trick not to exceed rate limit (in ms)
}
GuzzleHttp\Promise\each_limit($promises, function(){ // function returning the number of concurrent requests
return 100; // 1 or 100 concurrent request(s) don't really change execution time
}, // Fulfilled function
function ($response,$index)use($urls,$fp) { // $urls is used to get the url passed as a parameter in GET request and $fp a csv file pointer
$feed = json_decode($response->getBody(), true); // Get array of results
$this->write_to_csv($feed,$fp,$urls[$index]); // Write to csv
}, // Rejected function
function ($reason,$index) {
if ($reason instanceof GuzzleHttp\Exception\ClientException) {
$message = $reason->getMessage();
var_dump(array("error"=>"error","id"=>$index,"message"=>$message)); // You could write the errors to a file or database too
}
})->wait();
Related
I'm creating an application in Symfony, and I need to retrieve a large number of customer records (tens of thousands) from an external API endpoint, then store it in a Doctrine database. The API will only return 100 results at a time, so it's paginated into a few hundred pages. Triggering this synchronously resulted in a very long wait before running out of memory (Not surprising), so I pulled the code out to a message handler, which essentially looks like this
public function __invoke(GetCustomersMessage $message)
{
$response = $this->client->request('GET', $message->getQueryUrl() . '&page=' . $message->getPage(), [
'auth_basic' => $this->authCreds,
]);
$statusCode = $response->getStatusCode();
if ($statusCode != 200) {
throw new \Exception('Error: ' . $statusCode);
}
$content = $response->getContent();
$data = json_decode($content, true);
$result = $this->createCustomers($data, $message->getStoreId());
// If we successfully added all new customers, get another batch.
if ($result === true) {
$this->bus->dispatch(new GetCustomersMessage($message->getQueryUrl(), $message->getPage() + 1, $message->getStoreId()));
}
}
So essentially I query the api, then try to add those customers to my database, and if they all get added successfully, I dispatch a message to get the next batch. The message transport is the async Doctrine transport. I spun up a worker to consume the messages, and it synced way more customers than the previous attempt, but still ran out of memory after about 250. I was surprised to see though that when the worker died, it didn't leave the final message as incomplete, nor move it to the "failed" queue, it was just gone, so when I created another worker it was unable to pick back up where the other one left off.
This is my first time attempting a messenger/bus architecture, am I approaching this wrong? I considered queueing up all the messages at once, but I still believe I'd lose the data contained in whichever message the worker died on. Secondarily the intention is that this would run whenever we need to sync customers, so it stops when it reaches a record we already have in the database, if I queue up a message for every batch it would make a few hundred useless calls on every sync besides the first. Is there a way to monitor a worker and kill it before it reaches the memory limit?
My PHP script is working with multiple external services using API requests and theirs SDKs. It happens, that requests might fail and I would like to implement a retry mechanism.
SDKs don't have this feature built-in and they will just throw an Exception if request was not successful.
I've created multiple functions that use SDKs to retrieve / upload required data and now I would like to add retry mechanism if any of SDK requests fail. I could add a piece of code around every SDK call like this:
$retry = 0;
while ($retry < 3) {
try {
$account = $client->load('account');
} catch (\Exception $e) {
$retry++;
}
}
But this means that I will add the same piece of code to every SDK function call and it's going to be a real mess.
Is there any possibility in PHP to create a generic function or some kind of wrapper that will retry executing function if it has failed (thrown an Exception)?
P.S. SDKs use curl to send requests. Maybe, it's possible to tweak curl to implement this kind of retry mechanism?
I need to download a large number of large files, stored across multiple identical servers. A file, like '5.doc', that is stored on server 3, is also stored on server 55.
To speed this up, instead of using just one server to download all the files one after another, I'm using all servers at the same time. The problem is that one of the servers may be much slower than the others, or may even be down. When using Guzzle to batch download files, all of the files in that batch must be downloaded before another batch starts.
Is there a way to immediately start downloading another file alongside others so that all of the servers are constantly downloading a file?
If a server is down, I've set a timeout of 300 seconds and when this is reached Guzzle will catch it's ConnectionException.
How do I identify which of the promises (downloads) have failed so I can cancel them? Can I get information about which file/server failed?
Below is a simplified example of the code I'm using to illustrate the point. Thanks for the help!
$filesToDownload = [['5.doc', '8.doc', '10.doc'], ['1.doc', '9.doc']]; //The file names that we need to download
$availableServers = [3, 55, 88]; //Server id's that are available
foreach ($filesToDownload as $index => $fileBatchToDownload) {
$promises = [];
foreach ($availableServers as $key => $availableServer) {
array_push(
$promises, $client->requestAsync('GET', 'http://domain.com/' . $fileBatchToDownload[$index][$key], [
'timeout' => 300,
'sink' => '/assets/' . $fileBatchToDownload[$index][$key]
])
);
$database->updateRecord($fileBatchToDownload[$index][$key], ['is_cached' => 1]);
}
try {
$results = Promise\unwrap($promises);
$results = Promise\settle($promises)->wait();
} catch (\GuzzleHttp\Exception\ConnectException $e) {
//When can't connect to the server or didn't download within timeout
foreach ($e->failed() as $failedPromise) {
//Re-set record in database to is_cached = 0
//Delete file from server
//Remove this server from the $availableServers list as it may be down or too slow
//Re-add this file to the next batch to download $filesToDownload
}
}
}
I'm not sure how you are doing an asynchronous download of one file from multiple servers using Guzzle, but getting array index of failed requests can be done by promise's then() method:
array_push(
$promises,
$client->requestAsync('GET', "http://localhost/file/{$id}", [
'timeout' => 10,
'sink' => "/assets/{$id}"
])->then(function() {
echo 'Success';
},
function() use ($id) {
echo "Failed: $id";
}
)
);
then() accepts two callbacks. First one is triggered on success and the second one on failure. Source calls them $onFullfilled and $onRejected. Other usages are documented in guzzle documentation. This way you can start downloading a file immediately after its failure.
Can I get information about which file/server failed?
When a promise failed then it means request remained unfulfilled. In this case you can get host and requested path by passing an instance of RequestException class to second then()'s callback:
use GuzzleHttp\Exception\RequestException;
.
.
.
array_push(
$promises,
$client->requestAsync('GET', "http://localhost/file/{$id}", [
'timeout' => 10,
'sink' => "/assets/{$id}"
])->then(function() {
echo 'Success';
},
function(RequestException $e) {
echo "Host: ".$e->getRequest()->getUri()->getHost(), "\n";
echo "Path: ".$e->getRequest()->getRequestTarget(), "\n";
}
)
);
So you will have full information about failing host and file's name. If you may need access to more information you should know that $e->getRequest() returns an instance of GuzzleHttp\Psr7\Request class and all methods on this class are available to be used here. (Guzzle and PSR-7)
When an item is successfully downloaded, can we then immediately
start a new file download on this free server, whilst the other files
are still downloading?
I think you should decide to download new files only on creating promises at the very beginning and repeat/renew failed requests within second callback. Trying to make new promises followed by a successful promise may result in an endless process with downloading duplicated files and that's not simple to handle.
I'm working on trace logger of sorts that pushes log message requests onto a Queue on a Service Bus, to later be picked off by a worker role which would insert them into the table store. While running on my machine, this works just fine (since I'm the only one using it), but once I put it up on a server to test, it produced the following error:
HTTP_Request2_MessageException: Malformed response: in D:\home\site\wwwroot\vendor\pear-pear.php.net\HTTP_Request2\HTTP\Request2\Adapter\Socket.php on line 1013
0 HTTP_Request2_Response->__construct('', true, Object(Net_URL2)) D:\home\site\wwwroot\vendor\pear-pear.php.net\HTTP_Request2\HTTP\Request2\Adapter\Socket.php:1013
1 HTTP_Request2_Adapter_Socket->readResponse() D:\home\site\wwwroot\vendor\pear-pear.php.net\HTTP_Request2\HTTP\Request2\Adapter\Socket.php:139
2 HTTP_Request2_Adapter_Socket->sendRequest(Object(HTTP_Request2)) D:\home\site\wwwroot\vendor\pear-pear.php.net\HTTP_Request2\HTTP\Request2.php:939
3 HTTP_Request2->send() D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\Common\Internal\Http\HttpClient.php:262
4 WindowsAzure\Common\Internal\Http\HttpClient->send(Array, Object(WindowsAzure\Common\Internal\Http\Url)) D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\Common\Internal\RestProxy.php:141
5 WindowsAzure\Common\Internal\RestProxy->sendContext(Object(WindowsAzure\Common\Internal\Http\HttpCallContext)) D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\Common\Internal\ServiceRestProxy.php:86
6 WindowsAzure\Common\Internal\ServiceRestProxy->sendContext(Object(WindowsAzure\Common\Internal\Http\HttpCallContext)) D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\ServiceBus\ServiceBusRestProxy.php:139
7 WindowsAzure\ServiceBus\ServiceBusRestProxy->sendMessage('<queuename>/mes…', Object(WindowsAzure\ServiceBus\Models\BrokeredMessage)) D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\ServiceBus\ServiceBusRestProxy.php:155
⋮
I've seen previous posts that describe similar issues; Namely:
Windows Azure PHP Queue REST Proxy Limit (Stack Overflow)
Operations on HTTPS do not work correctly (GitHub)
That imply that this is a known issue regarding the PHP Azure Storage libraries, where there are a limited amount of HTTPS connections allowed. Before requirements were changed, I was accessing the table store directly, and ran into this same issue, and fixed it in the way the first link describes.
The problem is that the Service Bus endpoint in the connection string, unlike Table Store (etc.) connection string endpoints, MUST be 'HTTPS'. Trying to use it with 'HTTP' will return a 400 - Bad Request error.
I was wondering if anyone had any ideas on a potential workaround. Any advice would be greatly appreciated.
Thanks!
EDIT (After Gary Liu's Comment):
Here's the code I use to add items to the queue:
private function logToAzureSB($source, $msg, $severity, $machine)
{
// Gather all relevant information
$msgInfo = array(
"Severity" => $severity,
"Message" => $msg,
"Machine" => $machine,
"Source" => $source
);
// Encode it to a JSON string, and add it to a Brokered message.
$encoded = json_encode($msgInfo);
$message = new BrokeredMessage($encoded);
$message->setContentType("application/json");
// Attempt to push the message onto the Queue
try
{
$this->sbRestProxy->sendQueueMessage($this->azureQueueName, $message);
}
catch(ServiceException $e)
{
throw new \DatabaseException($e->getMessage, $e->getCode, $e->getPrevious);
}
}
Here, $this->sbRestProxy is a Service Bus REST Proxy, set up when the logging class initializes.
On the recieving end of things, here's the code on the Worker role side of this:
public override void Run()
{
// Initiates the message pump and callback is invoked for each message that is received, calling close on the client will stop the pump.
Client.OnMessage((receivedMessage) =>
{
try
{
// Pull the Message from the recieved object.
Stream stream = receivedMessage.GetBody<Stream>();
StreamReader reader = new StreamReader(stream);
string message = reader.ReadToEnd();
LoggingMessage mMsg = JsonConvert.DeserializeObject<LoggingMessage>(message);
// Create an entry with the information given.
LogEntry entry = new LogEntry(mMsg);
// Set the Logger to the appropriate table store, and insert the entry into the table.
Logger.InsertIntoLog(entry, mMsg.Service);
}
catch
{
// Handle any message processing specific exceptions here
}
});
CompletedEvent.WaitOne();
}
Where Logging Message is a simple object that basically contains the same fields as the Message Logged in PHP (Used for JSON Deserialization), LogEntry is a TableEntity which contains these fields as well, and Logger is an instance of a Table Store Logger, set up during the worker role's OnStart method.
This was a known issue with the Windows Azure PHP, which hasn't been looked at in a long time, nor has it been fixed. In the time between when I posted this and now, We ended up writing a separate API web service for logging, and had our PHP Code send JSON strings to it over cURL, which works well enough as a temporary work around. We're moving off of PHP now, so this wont be an issue for much longer anyways.
We're currently looking into doing some performance tweaking on a website which relies heavily on a Soap webservice. But ... our servers are located in Belgium and the webservice we connect to is locate in San Francisco so it's a long distance connection to say the least.
Our website is PHP powered, using PHP's built in SoapClient class.
On average a call to the webservice takes 0.7 seconds and we are doing about 3-5 requests per page. All possible request/response caching is already implemented so we are now looking at other ways to improved the connection speed.
This is the code which instantiates the SoapClient, what i'm looking for now is other ways/methods to improve speed on single requestes. Anyone has idea's or suggestions?
private function _createClient()
{
try {
$wsdl = sprintf($this->config->wsUrl.'?wsdl', $this->wsdl);
$client = new SoapClient($wsdl, array(
'soap_version' => SOAP_1_1,
'encoding' => 'utf-8',
'connection_timeout' => 5,
'cache_wsdl' => 1,
'trace' => 1,
'features' => SOAP_SINGLE_ELEMENT_ARRAYS
));
$header_tags = array('username' => new SOAPVar($this->config->wsUsername, XSD_STRING, null, null, null, $this->ns),
'password' => new SOAPVar(md5($this->config->wsPassword), XSD_STRING, null, null, null, $this->ns));
$header_body = new SOAPVar($header_tags, SOAP_ENC_OBJECT);
$header = new SOAPHeader($this->ns, 'AuthHeaderElement', $header_body);
$client->__setSoapHeaders($header);
} catch (SoapFault $e){
controller('Error')->error($id.': Webservice connection error '.$e->getCode());
exit;
}
$this->client = $client;
return $this->client;
}
So, the root problem is number of request you have to do. What about creating grouped services ?
If you are in charge of the webservices, you could create specialized webservices which do multiple operations at the same time so your main app can just do one request per page.
If not you can relocate your app server near SF.
If relocating all the server is not possible and you can not create new specialized webservices, you could add a bridge, located near the webservices server. This bridge would provide the specialized webservices and be in charge of calling the atomic webservices. Instead of 0.7s * 5 you'd have 0.7s + 5 * 0.1 for example.
PHP.INI
output_buffering = On
output_handler = ob_gzhandler
zlib.output_compression = Off
Do you know for sure that it is the network latency slowing down each request? 0.7s seems a long round time, as Benoit says. I'd look at doing some benchmarking - you can do this with curl, although I'm not sure how this would work with your soap client.
Something like:
$ch = curl_init('http://path/to/sanfrancisco/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
$info will return an array including elements for total_time, namelookup_time, connect_time, pretransfer_time, starttransfer_time and redirect_time. From these you should be able to work out whether it's the dns, request, the actual soap server or the response that's taking up the time.
One obvious thing that's just occurred to me is are you requesting the SOAP server via a domain or an IP? If you're using a domain, your dns might be slowing things down significantly (although it will be cached at several stages). Check your local dns time (in your soap client or php.ini - not sure) and the TTL of your domain (in your DNS Zone). Set up a static IP for your SanFran server and reference it that way if not already.
Optimize the Servers (not the client!) HTTP response by using caching and HTTP compressing. Check out the tips at yahoo http://developer.yahoo.com/performance/rules.html
1 You can assert your soap server use gzip compression for http content, as well as your site output does. A 0,7s roundup to SF seems a bit long, it's either webservice is long to answer, either there is an important natwork latency.
If you can, give a try to other hosting companies for your belgium server, in France some got a far better connectivity to the US than others.
I experienced to move a website from one host o another and network latency between Paris and New york has almost doubled ! it's uge and my client with a lot of US visitors was unhappy with it.
The solution of relocating web server to SF can be an option, you'll get a far better connectivity between servers, but be careful of latency if your visitors are mainly located in Europe.
2 You can use an opcode cache mecanism, such as xcache or APC. It wil not change the soap latency, but will improve php execution time.
3 Depending if soap request are repetitive, and how long could a content update could be extended, you can give it a real improvement using cache on soap results. I suggest you to use in-memory caching system (Like xcache/memcached or other) because they're ay much faster than file or DB cache system.
From your class, the createclient method isn't the most adapted exemple functionality to be cached, but for any read operation it's just the best way to perf :
private function _createClient()
{
$xcache_key = 'clientcache'
if (!xcache_isset($key)) {
$ttl = 3600; //one hour cache lifetime
$client = $this->_getClient(); ///private method embedding your soap request
xcache_set($xcache_key, $client, $ttl);
return $client;
}
//return result form mem cache
return xcache_get($xcache_key);
}
The example is for xcache extension, but you can use other systems in a very similar manner
4 To go further you can use similar mecanism to cache your php processing results (like template rendering output and other ressource consumming operations). The key to success with this technic is to know exactly wich part is cached and for how long it will stay withous refreshing.
Any chance of using an AJAX interface.. if the requests can be happening in the background, you will not seem to be left waiting for the response.