Are there any limits on the maximum number of concurrent connections a multi curl session can make?
I am using it to process batches of calls that I need to make to an API service, and I just want to be careful that this does not affect the rest of my app.
A few queries: do curl sessions take up connections from the pool the Apache server can serve? Is multi curl a RAM- or CPU-hungry operation? I'm not concerned about bandwidth because I have lots of it, a mighty fast host, and only small amounts of data are being sent and received per request.
And I imagine it depends on server hardware / config...
But I can't seem to find anything in the documentation about what limits the number of curl sessions.
PHP doesn't impose any limitations on the number of concurrent curl requests you can make. You might hit the maximum execution time or the memory limit, though. It's also possible that your host limits the number of concurrent connections you're allowed to make.
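If you do push a big batch through a single request, it can help to check or raise those two limits up front. A minimal sketch, assuming your host lets you change them at runtime (the values here are arbitrary):

```php
<?php
// Arbitrary example values; many shared hosts cap or ignore these overrides.
set_time_limit(300);              // raises max_execution_time for this script only
ini_set('memory_limit', '256M');  // raises the per-request memory ceiling

echo ini_get('max_execution_time'), "\n";
echo ini_get('memory_limit'), "\n";

// For long batches it is often cleaner to run from the CLI, where
// max_execution_time defaults to 0 (no limit).
```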
Related
Suppose I use curl_multi to perform parallel operations (upload/download, etc.).
Is there a recommended maximum limit of parallel operations that I can perform or can I set it so that it runs, for instance, 100 operations in parallel? What about 1000?
What sort of factors should I consider when I specify the number of concurrent operations using CURL?
For 1000 cURL requests in parallel you will need good bandwidth. The recommended number of parallel requests can be found by debugging: set up a function that checks for timed-out connections and, when they occur, decreases (or increases) the limit or the timeout period.
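A rough sketch of that tuning idea, assuming you already collect each handle's curl_errno() result after a batch; the function name, step sizes, and halving strategy are placeholders, not anything prescribed above:

```php
<?php
// Adjust the parallel limit for the next batch based on how many transfers
// timed out in the last one. $errors is assumed to be an array of curl_errno()
// values gathered elsewhere.

function tune_parallel_limit(array $errors, $maxParallel)
{
    // CURLE_OPERATION_TIMEOUTED is libcurl error 28 (transfer timed out).
    $timeouts = count(array_filter($errors, function ($e) {
        return $e === CURLE_OPERATION_TIMEOUTED;
    }));

    if ($timeouts > 0) {
        // Some transfers timed out: back off.
        $maxParallel = max(1, (int) floor($maxParallel / 2));
    } else {
        // Clean batch: probe a slightly higher limit.
        $maxParallel += 5;
    }

    return $maxParallel;
}
```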
The answer to this question depends on your network capacity, the capacity of your NIC, and what you are trying to optimize.
If you are trying to minimize the latency of a single request, the answer is quite possibly 1, or something close to 1.
If you are trying to maximize throughput, then you should keep increasing the number of parallel operations until your throughput peaks and then either plateaus or falls. That will be the sweet spot.
Most browsers will open at most 4 connections at a time to a single server. That's probably a good guideline.
If you're downloading from different servers, the only problem may be that you'll use up lots of local TCP ports, and this might interfere with other applications on the system. It won't get in the way of incoming connections TO your own server, since those all use the same local port 80 (or 443 for SSL). But if you have other applications using cURL, or your machine sends mail, they might not be able to get an outgoing port if you use them all up. There are typically 15,000-30,000 ephemeral ports, so you could probably get away with using 1,000 of them.
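One way to keep both the port usage and the concurrency under control is a "rolling window" around curl_multi that never has more than a fixed number of transfers in flight. A sketch under those assumptions; fetch_all(), $maxParallel, and the timeout values are made up for illustration, not taken from the answers above:

```php
<?php
// Rolling window: at most $maxParallel transfers in flight at once, so a large
// batch cannot exhaust ephemeral ports or file descriptors.

function fetch_all(array $urls, $maxParallel = 10)
{
    $mh       = curl_multi_init();
    $queue    = $urls;
    $inFlight = 0;
    $results  = [];

    $add = function () use (&$queue, &$inFlight, $mh, $maxParallel) {
        while ($inFlight < $maxParallel && $queue) {
            $ch = curl_init(array_shift($queue));
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
            curl_setopt($ch, CURLOPT_TIMEOUT, 15);
            curl_multi_add_handle($mh, $ch);
            $inFlight++;
        }
    };

    $add();
    do {
        curl_multi_exec($mh, $running);
        if (curl_multi_select($mh, 1.0) === -1) {
            usleep(100000); // some libcurl builds return -1 right away; don't spin
        }

        // Harvest finished transfers and top the window back up.
        while ($info = curl_multi_info_read($mh)) {
            $ch  = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            $results[$url] = ($info['result'] === CURLE_OK)
                ? curl_multi_getcontent($ch)
                : false;
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $inFlight--;
            $add();
        }
    } while ($running || $inFlight);

    curl_multi_close($mh);
    return $results;
}
```

Start $maxParallel at something browser-like (4-10) and move it up or down while watching throughput, as the answers above suggest.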
How can I create an exact amount of artificial CPU load using PHP? I am looking for PHP code that generates 1% CPU utilization per HTTP request per second, so that I get 50% CPU utilization at 50 HTTP requests per second, 80% at 80 requests per second, and so on.
It is not possible to do this using PHP alone. You should look into virtualization systems. For example, Cloudmin:
Using Cloudmin you can create, destroy, resize, startup, shutdown and restrict multiple instances using different virtualization technologies from a single interface. It also has a full command line API that can be used to manage virtual systems from a shell script or via HTTP requests.
Cloudmin supports additional logins called system owners, who can be given limited access to a subset of virtual systems, and can be restricted in the actions that they perform. Owners can have limits set on their disk, RAM and CPU usage that apply across all their virtual systems, either defined on a per-owner limit or from a plan.
I have a script which runs 1000 cURL requests using curl_multi_* functions in PHP.
What is the bottleneck behind them timing out?
Would it be the CPU usage? Is there some more efficient way, in terms of how that number of outbound connections is handled by the server, to do this?
I cannot change the functionality and the requests themselves are simple calls to a remote API. I am just wondering what the limit is - would I need to increase memory on the server, or Apache connections, or CPU? (Or something else I have missed)
Your requests are made in a single thread of execution. The bottleneck is almost certainly CPU; have you ever actually watched curl multi code run? It is incredibly CPU hungry, because you don't really have enough control over dealing with the requests. curl_multi makes it possible for you to orchestrate 1000 requests at once, but this doesn't make it a good idea. You have almost no chance of using curl_multi efficiently because you cannot control the flow of execution finely enough; just servicing the sockets and select()'ing on them will account for a lot of the high CPU usage you would see watching your code run on the command line.
The reason the CPU usage is high during such tasks is this: PHP is designed to run for a fraction of a second and do everything as fast as it can. It usually does not matter how the CPU is utilized, because it runs for such a short space of time. When you prolong a task like this, the problem becomes more apparent: the overhead incurred with every opcode becomes visible to the programmer.
I'm aware you have said you cannot change the implementation, but still, for a complete answer: such a task is far more suitable for threading than curl multi, and you should start reading http://php.net/pthreads, starting with http://php.net/Thread
Left to their own devices on an idle CPU, even 1000 threads would consume as much CPU as curl_multi. The point is that you can control precisely the code responsible for downloading every byte of the response and uploading every byte of the request, and if CPU usage is a concern you can implement a "nice" process by explicitly calling usleep, or by limiting connection usage in a meaningful way. Additionally, your requests can be serviced in separate threads.
I do not suggest that 1000 threads is the thing to do; it is more than likely not. The thing to do would be to design a Stackable (see the documentation) whose job is to make and service a request in a "nice", efficient way, and to design pools (see the examples on GitHub and in the pecl extension sources) of workers to execute your newly designed requests.
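For completeness, a minimal sketch of the Thread approach linked above. It assumes the pthreads extension on a thread-safe (ZTS) build of PHP; RequestThread and $urls are names invented here for illustration, and a real version would use a pool of workers rather than one thread per URL, as the answer recommends:

```php
<?php
// Requires the pthreads extension (ZTS build). One thread per request, just to
// show the shape of the API; a Worker/pool setup is the better fit for large batches.

class RequestThread extends Thread
{
    public $url;
    public $response;

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function run()
    {
        $ch = curl_init($this->url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        $this->response = curl_exec($ch);
        curl_close($ch);

        usleep(10000); // be "nice": give a little CPU back between requests
    }
}

$threads = [];
foreach (array_slice($urls, 0, 8) as $url) {   // $urls assumed to exist
    $t = new RequestThread($url);
    $t->start();
    $threads[] = $t;
}

foreach ($threads as $t) {
    $t->join();
    // $t->response now holds the body, or false if the transfer failed
}
```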
My PHP script takes approximately 7 seconds to run, since it fetches and processes data from various sources on the web. How is this related to the number of requests I can process per second?
It depends on what resources your script uses.
Basically, when you run out of CPU, disk I/O, memory on your script's server, or on your database server, or on any of the servers that you fetch data from, or hit third party API request limits, the game is over.
This generally has to do with concurrent requests rather than requests per second - how many requests can, at the same time, access the resources they require from the pool available to all requests. This is actually a lot more complicated in real life, since requests access different resources at different parts of their lifecycle, and also you will generally handle requests for different scripts on the same server, each with a different mix of resource requirements.
Long-running requests have an interesting interaction with requests per second. Usually, requests will take, say, 200 ms; if you can handle 50 concurrent requests, that means you could handle something like 250 requests per second. With your 7-second scripts and the same 50 concurrent slots, you can only handle about 7 requests per second.
In the first second, you will have 7 running scripts. In the second, 14. Up to 49 in the seventh. Then, in the eighth, 7 will free up due to completion, and 7 will be added from new requests.
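The arithmetic behind those numbers is just concurrent slots divided by request duration; a rough back-of-the-envelope check (50 slots assumed, as in the example above):

```php
<?php
// Steady-state estimate: throughput ~ concurrent slots / request duration.
$slots = 50;

printf("%.0f req/s at 200 ms per request\n", $slots / 0.2); // 250
printf("%.1f req/s at 7 s per request\n",    $slots / 7);   // ~7.1
```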
You may run into Apache configuration issues well before you run into actual resource usage issues - you will need to increase the number of servers/workers because you have a somewhat non-standard use case (i.e., most requests are handled in less than a second, while yours run for about seven). Depending on how complicated your processing is, you may be able to handle several hundred concurrent scripts if most of their time is spent on network I/O.
Benchmarking and other performance analysis are the only way to get more accurate information about requests per second and/or concurrent connections.
I am looking to create an ajax powered chatroom for my website.
I have used yshout and it seems very good but crashes when there are too many connections.
What is the best way to go about doing this using the minimum resources possible?
Probably one of the following:
Exceeding the number of available threads. Depending on your configuration, you'll have a limit on how many requests can be served simultaneously. Since yshout will be maintaining open connections for longer than normal requests, you're far more likely to exhaust your thread/process limit. See the relevant Apache documentation for more info (assuming Apache, of course).
Exceeding PHP memory limits. For the reasons above, you're likely to need more memory to handle multiple long-running HTTP requests. Try adding more memory to your server and bumping up PHP's memory limit.