PHP cUrl Rate Limiting per API

I am working on an API mashup in PHP of various popular APIs and would like to implement rate limiting to ensure I am playing nice.
I did some research and have taken a look at CURLOPT_MAXCONNECTS and CURLOPT_TIMEOUT but I have some confusion about how they function.
As I understand it, likely incorrectly:
CURLOPT_MAXCONNECTS
---
Each script that calls a cUrl request opens a connection.
When the MAXCONNECTS limit is reached, then the server delays the request.
CURLOPT_TIMEOUT
---
The amount of time that the server will wait to make a connection.
Working with MAXCONNECTS, does that mean that cUrl will make the listed
number of connections and then wait up to TIMEOUT for an open thread?
So-- I am, obviously, very confused about how cUrl actually functions with these parameters. The application I am developing needs to limit cUrl requests at different limits for each API I am calling. As I understand things, the cUrl options are server wide? Is there some method of attaching a token to a specific cUrl call and applying the limit per API that way? Do I need to work some global/shared memory magic?
Yours truly and considerably confused,
Samantha.

CURLOPT_MAXCONNECTS is just the maximum number of connections cURL will keep open in its connection cache for reuse; it does not queue or delay anything.
CURLOPT_TIMEOUT is the maximum time cURL will let the whole request run before aborting it.
Neither option rate-limits per API, so you'll have to enforce your limits manually.
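For what it's worth, here is a minimal sketch of doing that by hand: an in-process throttle keyed by API name. The API names, intervals, URLs and the rate_limited_get() helper are all invented for illustration.

<?php
// Minimal per-API throttle; API names, intervals, and the example URLs are invented.
// $minIntervals maps an API name to the minimum number of seconds between requests to it.
function rate_limited_get($api, $url, array $minIntervals) {
    static $lastRequest = array();   // last request timestamp per API, for this process only

    $minGap = isset($minIntervals[$api]) ? $minIntervals[$api] : 1.0;
    if (isset($lastRequest[$api])) {
        $wait = $minGap - (microtime(true) - $lastRequest[$api]);
        if ($wait > 0) {
            usleep((int) ($wait * 1000000));   // sleep until this API's window opens again
        }
    }
    $lastRequest[$api] = microtime(true);

    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,          // abort slow transfers; unrelated to rate limiting
    ));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Example: at most one Twitter call every 2 seconds, one Flickr call per second.
$limits = array('twitter' => 2.0, 'flickr' => 1.0);
$trends = rate_limited_get('twitter', 'https://api.twitter.com/example-endpoint', $limits);

Note that a static array only throttles within a single PHP process; if several processes or scheduled scripts hit the same APIs, the timestamps need to live somewhere shared (a file, APCu, Redis, or a database), which is the global/shared memory part of the question.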

Related

PHP: efficiently set read timeout on HTTPS requests

Is there a way to efficiently set a timeout on an https request in php?
There is a solution using cURL to simulate the SOAP request; however, it is significantly slower.
Benchmarking it against a slow service shows an average of 6.5 seconds instead of the 5.5 seconds the SOAP client takes for the same request.
There is an alternative using a stream socket
(implemented by Zend Framework in ZendHttpClientAdapterSocket); however, the
stream_set_timeout function does not seem to work on HTTPS connections.
Please note that the issue is the read timeout (time to get a response), not the connect timeout (time to do the handshake), which works on both HTTP and SOAP requests.
Finding a solution to make curl faster would resolve the issue as well.
USER STORY
I am making requests to an external HTTPS SOAP web service using the Zend SOAP client.
The service usually responds in about 5.5 seconds when the network is OK.
When there are network issues, however, some requests take up to 7 or 8 minutes,
consuming server resources.
If I use cURL and force a timeout, my problem is solved when there are network issues with the web service.
However, my average response time goes up to 6.5 seconds when the network is OK.
The business requirement suggests that requests taking longer than 30 seconds should rather be dropped in order to ensure the stability of the system.
That depends on what you're using other than cURL.
If you're using streams you can just use stream_set_timeout (which sets the read-write timeout).
The connect timeout you can specify in fsockopen or however you create your stream.
See if you can specify a read-write timeout in your SOAP client?
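As a rough illustration of the stream approach (over plain HTTP, since, as noted in the question, stream_set_timeout may not take effect on TLS-wrapped streams; the host, path, and timeout values are placeholders):

<?php
// Illustrative only: read timeout on a raw HTTP stream (host/path are placeholders).
$fp = fsockopen('example.com', 80, $errno, $errstr, 5);  // 5s connect timeout
if (!$fp) {
    die("connect failed: $errstr ($errno)");
}
stream_set_timeout($fp, 30);                             // 30s read/write timeout

fwrite($fp, "GET /service HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n");

$response = '';
while (!feof($fp)) {
    $response .= fread($fp, 8192);
    $meta = stream_get_meta_data($fp);
    if ($meta['timed_out']) {                            // stop once the read timeout fires
        break;
    }
}
fclose($fp);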

Twitter API works locally, but is spotty on remote server

I wrote a script that pulls the current top Twitter trends using cURL and it works 100% of the time locally but when I FTP it up to my mediatemple server it seems to only work sometimes. Is this caused by Twitter? Mediatemple? Some error in my code?
EDIT: How can I cache content in a flat-file?
If the code works sometimes, that suggests it is not a problem with your code, so there are two logical areas for potential blame:
1) Web Server Load
This could be that your server is too bogged down. If the server (not just your site - consider this if you're on shared hosting) is experiencing a heavy load, then it may take your server too long to complete the cURL request. To combat this, try increasing the timeout on the request using the following:
CURLOPT_CONNECTTIMEOUT
2) Twitter Rate Limit
Twitter limits the number of API calls you can make from one authorized account per hour (I believe the number is around 100ish - check their API documentation). If you are hitting this limit you will be refused further calls until the one-hour anniversary of the first call. To combat this, either have a cron job run the cURL request at a set interval and cache the result in a text file or database, or store the time of each request and use an IF to allow only one request every 2 or 3 minutes, caching the results and pulling them from the cache (a flat-file version of this is sketched below).
Making a call to the Twitter API on every page load is a waste of resources and bandwidth and can increase page load time.
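On the flat-file caching question, a bare-bones sketch - the file name, endpoint, and cache lifetime here are arbitrary: serve the cached copy while it is fresh enough, otherwise call Twitter and rewrite the file.

<?php
// Simple flat-file cache for the trends call (path, URL, and TTL are illustrative).
$cacheFile = __DIR__ . '/trends_cache.json';
$ttl       = 180; // seconds; refresh at most every 3 minutes

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
    $trends = file_get_contents($cacheFile);             // serve the cached copy
} else {
    $ch = curl_init('https://api.twitter.com/example-trends-endpoint');
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 15,
    ));
    $fresh = curl_exec($ch);
    curl_close($ch);

    if ($fresh !== false && $fresh !== '') {
        file_put_contents($cacheFile, $fresh, LOCK_EX);   // refresh the cache
        $trends = $fresh;
    } elseif (is_file($cacheFile)) {
        $trends = file_get_contents($cacheFile);          // fall back to a stale copy
    } else {
        $trends = null;                                   // nothing cached yet and the call failed
    }
}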

cURL sometimes returning blank string for a valid URL

I'm using the rolling-curl [https://github.com/LionsAd/rolling-curl] library to asynchronously retrieve content from a large amount of web resources as part of a scheduled task. The library allows you to set the maximum number of concurrent CURL connections, and I started out at 20 but later moved up to 50 to increase speed.
It seems that every time I run it, arbitrary urls out of the several thousand being processed just fail and return a blank string. It seems the more concurrent connections I have, the more failed requests I get. The same url that failed one time may work the next time I attempt to run the function. What could be causing this, and how can I avoid it?
Everything Luc Franken wrote is accurate, and his answer led me to the solution to my version of the questioner's problem, which is:
Remote servers respond according to their own, highly variable, schedules. To give them enough time to respond, it's important to set two cURL parameters to provide a liberal amount of time. They are:
CURLOPT_CONNECTTIMEOUT => 30
CURLOPT_TIMEOUT => 30
You can try longer and shorter amounts of time until you find something that minimizes errors. But if you're getting intermittent non-responses with curl/multi-curl/rollingcurl, you can likely solve most of the issue this way.
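On a plain curl handle those two options look like the following (the URL is a placeholder; rolling-curl also lets you pass curl options through to its requests, though check the library for the exact mechanism):

<?php
// Generous connect and total-transfer timeouts on a single curl handle
// (the 30-second values are the ones suggested above; tune to taste).
$ch = curl_init('https://example.com/resource');          // placeholder URL
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_CONNECTTIMEOUT => 30,   // max time to establish the connection
    CURLOPT_TIMEOUT        => 30,   // max time for the whole transfer
));
$body = curl_exec($ch);
if ($body === false) {
    error_log('curl error: ' . curl_error($ch));           // e.g. operation timed out
}
curl_close($ch);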
In general, you assume this should not happen.
When you are accessing external servers, though, that is just not the case. Your code should be fully aware that servers might not respond, might not respond in time, or might respond incorrectly. It is part of the HTTP process that things can go wrong. If you reach the server you should be notified by an HTTP error code (although that doesn't always happen), but network issues can also produce no response or a useless one.
Don't trust external input. That's the root of the issue.
In your concrete case, you keep increasing the number of concurrent requests. That creates more requests, more open sockets, and more resource use. To find the cause of your exact issue you would need privileged access to the server so you can read the log files and monitor open connections and other resources. Preferably, test this on a test server with no other software creating connections, so you can isolate the issue.
But however well you test it, uncertainty remains. For example, you might get blocked by external servers because you make too many requests, or get caught in security filters such as DDoS protection. Monitoring and tuning the number of requests (automatically or by hand) will give you the most stable solution. You could also simply accept the lost requests and maintain a queue that makes sure you eventually get the contents at some point in time (a crude version of this is sketched below).
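A crude version of that queue idea, purely illustrative - the file name and URL list are invented: URLs that come back empty are written out and retried on the next scheduled run instead of being treated as fatal.

<?php
// Illustrative retry queue: URLs that fail this run are saved and retried on the next run.
$retryFile = __DIR__ . '/retry_queue.txt';                 // made-up file name
$urls      = array('https://example.com/a', 'https://example.com/b'); // this run's batch

// Pick up anything the previous run failed to fetch.
if (is_file($retryFile)) {
    $urls = array_merge(file($retryFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES), $urls);
}

$failed  = array();
$results = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_TIMEOUT        => 30,
    ));
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body === false || $body === '') {
        $failed[] = $url;                                   // not fatal; try again next run
    } else {
        $results[$url] = $body;
    }
}

file_put_contents($retryFile, implode("\n", $failed));      // picked up by the next scheduled run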

What are the limits of PHP's multi curl functions?

Are there any limits on the maximum number of concurrent connections a multi curl session can make?
I am using it to process batches of calls that I need to make to an API service; I just want to be careful that this does not affect the rest of my app.
A few queries: do cURL sessions take up connections the Apache server could otherwise serve? Is multi curl a RAM- or CPU-hungry operation? I'm not concerned about bandwidth because I have lots of it, a mighty fast host, and only small amounts of data are being sent and received per request.
And I imagine it depends on server hardware / config...
But I can't seem to find what limits the amount of curl sessions on the documentation.
PHP doesn't impose any limitations on the number of concurrent curl requests you can make. You might hit the maximum execution time or the memory limit though. It's also possible that your host limits the amount of concurrent connections you're allowed to make.
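If you want to stay well below whatever your host allows, you can cap concurrency yourself with a rolling window over curl_multi. A sketch, with the window size and URLs as placeholders:

<?php
// Rolling window over curl_multi: never more than $window handles in flight.
$urls   = array('https://example.com/a', 'https://example.com/b'); // placeholder batch
$window = 10;

$mh       = curl_multi_init();
$queue    = $urls;
$inFlight = 0;
$results  = array();

while ($queue || $inFlight > 0) {
    // Top the window up from the queue.
    while ($queue && $inFlight < $window) {
        $url = array_shift($queue);
        $ch  = curl_init($url);
        curl_setopt_array($ch, array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_CONNECTTIMEOUT => 10,
            CURLOPT_TIMEOUT        => 30,
            CURLOPT_PRIVATE        => $url,   // remember which URL this handle belongs to
        ));
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    }

    // Drive the transfers and wait briefly for activity.
    do {
        $status = curl_multi_exec($mh, $running);
    } while ($status === CURLM_CALL_MULTI_PERFORM);
    if (curl_multi_select($mh, 1.0) === -1) {
        usleep(100000);                       // avoid a busy loop on select failure
    }

    // Harvest finished handles so their slots free up.
    while ($info = curl_multi_info_read($mh)) {
        $ch  = $info['handle'];
        $url = curl_getinfo($ch, CURLINFO_PRIVATE);
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $inFlight--;
    }
}
curl_multi_close($mh);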

KeepAlive packets over a Soap request

I've been debugging some Soap requests we are making between two servers on the same VLAN. The app on one server is written in PHP, the app on the other is written in Java. I can control and make changes to the PHP code, but I can't affect the Java server. The PHP app forms the XML using the DOMDocument objects, then sends the request using the cURL extension.
When the soap request took longer than 5 minutes to complete, it would always wait until the max timeout limit and exit with a message like this:
Operation timed out after 900000 milliseconds with 0 bytes received
After sniffing the packets that were being sent, it turns out that the problem was caused by a 5 minute timeout in the network that was closing what it thought was a stale connection. There were two ways to fix it: bump up the timeout in iptables, or start sending KeepAlive packets over the request.
To be thorough, I would like to implement both solutions. Bumping up the timeout was easy for ops to do, but sending KeepAlive packets is turning out to be difficult. The cURL library itself supports this (see the --keepalive-time flag for the CLI app), but it doesn't appear that this has been implemented in the PHP cURL library. I even checked the source to make sure it wasn't an undocumented feature.
So my question is this: How the heck can I get these packets sent? I see a few clear options, but I don't like any of them:
Write a wrapper that will kick off the request by shell_execing the CLI app. This is a hack that I just don't like
Update the cURL extension to support this. This is a non-option according to Ops.
Open the socket myself. I know just enough to be dangerous. I also haven't seen a way to do this with fsockopen, but I could be missing something.
Switch to another library. What exists that supports this?
Thanks for any help you can offer.
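For what it's worth, option 3 is less daunting than it sounds if the sockets extension is available: open the connection as a stream, import it, and flip SO_KEEPALIVE on the underlying socket. A rough sketch - the host, port, timeout, and the pre-built request string are placeholders:

<?php
// Rough sketch of option 3: open the connection ourselves and enable TCP keepalive.
// Requires the sockets extension (PHP 5.4+ for socket_import_stream).
$fp = stream_socket_client('tcp://soap-server.example:8080', $errno, $errstr, 30);
if (!$fp) {
    die("connect failed: $errstr ($errno)");
}

$socket = socket_import_stream($fp);                       // get the underlying socket
socket_set_option($socket, SOL_SOCKET, SO_KEEPALIVE, 1);   // let the kernel send keepalive probes

stream_set_timeout($fp, 900);                              // mirror the 900s cURL timeout
fwrite($fp, $soapHttpRequest);                             // hypothetical: raw HTTP POST with the SOAP envelope, built elsewhere
$response = stream_get_contents($fp);
fclose($fp);

One caveat: SO_KEEPALIVE alone uses the kernel's default probe timing (net.ipv4.tcp_keepalive_time, typically 7200 seconds on Linux), so that interval would still need to be lowered below the network's five-minute idle cutoff for the probes to help here.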
