Optimize PHP+cURL connection times

I have PHP scripts doing CURL POST requests to distant Nginx servers via HTTPS (several times per second).
My issue is that each request needs 3 round-trips (TCP connection + SSL handshake) before the transfer can start, which significantly slows down the process.
Is there a way to reduce this, for instance with some sort of "Keep-Alive", to avoid renegotiating TCP/SSL for each request?
Thank you!

There is no way to keep a connection alive between two different PHP executions, as the PHP script "dies" at the end (thus closing any open sockets). The only way to achieve what you want would be to have a background PHP script that never stops, takes care of fetching the data, and puts it into a database or a file that you can easily and rapidly query later.
On another topic, making multiple HTTPS requests per second is perhaps not the most efficient approach. If you have control over the server you query, you might want to use WebSockets, which would let you make multiple queries per second without any major performance penalty.
I hope this answered your question. Have a good day!
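Within a single long-running PHP process (such as a background script), you can also reuse one cURL handle across requests so libcurl keeps the TCP/TLS connection alive instead of renegotiating each time. A minimal sketch; the endpoint URL and payloads are placeholders:

```php
<?php
// Sketch: inside one long-running PHP process, reuse a single cURL handle.
// libcurl then keeps the TCP/TLS connection to the host open between
// requests instead of redoing the handshakes for each one.
$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_URL            => 'https://example.com/endpoint', // hypothetical
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TCP_KEEPALIVE  => 1,   // send TCP keep-alive probes
]);

foreach (['a=1', 'a=2'] as $payload) {          // placeholder payloads
    curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
    $response = curl_exec($ch);   // the connection is reused after the first call
}
curl_close($ch);
```

This only helps while the process stays alive, which is why the background-script approach above is a prerequisite.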

Related

How to process multiple parallel requests from one client to one PHP script

I have a webpage that when users go to it, multiple (10-20) Ajax requests are instantly made to a single PHP script, which depending on the parameters in the request, returns a different report with highly aggregated data.
The problem is that a lot of the reports require heavy SQL calls to get the necessary data, and in some cases, a report can take several seconds to load.
As a result, because one client is sending multiple requests to the same PHP script, you end up seeing the reports slowly load on the page one at a time. In other words, the generating of the reports is not done in parallel, and thus causes the page to take a while to fully load.
Is there any way to get around this in PHP and make it possible for all the requests from a single client to a single PHP script to be processed in parallel so that the page and all its reports can be loaded faster?
Thank you.
As far as I know, it is possible to do multi-threading in PHP.
Have a look at pthreads extension.
What you could do is make the report-generation part/function of the script execute in parallel. This will make sure that each function runs in a thread of its own and will retrieve your results much sooner. Also, set the maximum number of concurrent threads to 10 or fewer so that it doesn't become a resource hog.
Here is a basic tutorial to get you started with pthreads.
And a few more examples which could be of help (notably the SQLWorker example in your case).
Server setup
This is more of a server-configuration issue and depends on how PHP is installed on your system: if you use php-fpm, you have to increase the pm.max_children option. If you use PHP via (F)CGI, you have to configure the webserver itself to use more children.
Database
You also have to make sure that your database server allows that many concurrent processes to run. It won’t do any good if you have enough PHP processes running but half of them have to wait for the database to notice them.
In MySQL, for example, the setting for that is max_connections.
Browser limitations
Another problem you're facing is that browsers won't make 10-20 parallel requests to the same host. It depends on the browser, but to my knowledge modern browsers will only open 2-6 connections to the same host (domain) simultaneously, so any further requests just get queued, regardless of server configuration.
Alternatives
If you use MySQL, you could try to merge all your calls into one request and use parallel SQL queries using mysqli::poll().
If that’s not possible you could try calling child processes or forking within your PHP script.
Of course PHP can execute multiple requests in parallel if it runs behind a web server like Apache or Nginx. The PHP dev server is single-threaded, but it should only be used for development anyway. If you are using PHP's file-based sessions, however, access to the session is serialized, i.e. only one script can have the session file open at any time. Solution: fetch the information you need from the session at script start, then close the session.
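The "fetch then close" pattern mentioned above can be sketched like this (the $_SESSION key is a hypothetical example):

```php
<?php
// Sketch: read what you need from the session, then release the session
// file lock immediately, so parallel requests from the same client are
// no longer serialized on the session file.
session_start();
$userId = $_SESSION['user_id'] ?? null;  // hypothetical session key
session_write_close();                   // releases the session file lock

// The long-running work (heavy SQL, report generation, ...) happens here.
// Other requests sharing this session are no longer blocked by this one.
```

Note that after session_write_close() you can still read the values you copied out, but writes to $_SESSION will no longer be persisted unless you reopen the session.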

Apache, PHP multi thread over windows for concurrent users

I have an MS SQL Server that handles queries sent by a system written in PHP, everything running on Windows. The problem I have right now is that if a query takes a long time to process, all the remaining incoming requests made by other users won't be processed until PHP gets the results from the first one and finishes the first request.
Is there a way to allow/make PHP handle many requests in parallel, simultaneously? Right now I have a very big bottleneck, since the SQL server can handle a lot of simultaneous queries but the web application can only send the queries request by request.
If necessary, I can use Linux-based solutions too.
Finally I solved my problem using the function session_write_close().
Since the session file was the same for all the requests and PHP was locking it, all the requests got stuck.

Should I $mysqli->close a connection after each page load, if PHP runs via FCGI?

I run PHP via FCGI - that is, my web server spawns several PHP processes and they keep running for something like 10,000 requests until they get recycled.
My question is: if I have a $mysqli->connect at the top of my PHP script, do I need to call $mysqli->close when I'm about to end the script?
Since PHP processes stay open for a long time, I'd imagine each $mysqli->connect would leak one connection, because the process keeps running and no one closes the connection.
Am I right in my thinking or not? Should I call $mysqli->close?
When PHP exits it closes the database connections gracefully.
The only reason to use the close method is when you want to terminate a database connection that you'll not use anymore while you still have lots of work to do, like processing and streaming data; but if that work is quick, you can forget about the close statement.
Putting it at the end of a script is redundant: there is no performance or memory gain.
In a bit more detail, specifically about FastCGI:
FastCGI keeps PHP processes running between requests. FastCGI is good at reducing CPU usage by leveraging your server's available RAM to keep PHP scripts in memory instead of having to start up a separate PHP process for each and every request.
FastCGI starts a master process and as many forks of that master process as you have defined, and yes, those forked processes may live for a long time. In effect, the process doesn't have to start up the complete PHP runtime each time it needs to execute a script. But it's not as if your scripts are now running all the time: there is still a start-up and shutdown phase each time a script is executed. At this point things like global variables (e.g. $_POST and $_GET) are populated, etc. You can run functions each time your process shuts down via register_shutdown_function().
If you aren't using persistent database connections and aren't closing database connections, nothing bad will happen. As Colin Schoen explained, PHP will eventually close them during shutdown.
Still, I highly encourage you to close your connections because a correctly crafted program knows when the lifetime of an object is over and cleans itself up. It might give you exactly the milli- or nanosecond that you need to deliver something in time.
Simply always create self-contained objects that are also cleaning up after they are finished with whatever they did.
I've never trusted FCGI to close my database connections for me. One habit I learned in a beginners book many years ago is to always explicitly close my database connections.
Are sixteen keystrokes not worth avoiding a possible memory and connection leak? As far as I'm concerned, it's cheap insurance.
If you have long running FastCGI processes, via e.g. php-fpm, you can gain performance by reusing your database connection inside each process and avoiding the cost of opening one.
Since you are most likely opening a connection at some point in your code, you should read up on how to have mysqli open a persistent connection and return it to you on subsequent requests managed by the same process.
http://php.net/manual/en/mysqli.quickstart.connections.php
http://php.net/manual/en/mysqli.persistconns.php
In this case you don't want to close the connection, or you would defeat the purpose of keeping it open. Also, be aware that each PHP process uses a separate connection, so your database should allow at least that number of connections to be open simultaneously.
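A minimal connection-configuration sketch of the persistent approach described above: prefixing the hostname with "p:" asks mysqli for a persistent connection that the process reuses across requests. Host, credentials, and database name below are placeholders.

```php
<?php
// Sketch: the "p:" prefix on the host requests a persistent connection.
// All connection parameters here are hypothetical placeholders.
$mysqli = new mysqli('p:db.example.com', 'app_user', 'secret', 'app_db');
if ($mysqli->connect_errno) {
    die('Connect failed: ' . $mysqli->connect_error);
}
// Use the connection as usual; do NOT call $mysqli->close() at the end,
// so the same FastCGI process can hand the connection to the next request.
```

This sketch needs a live MySQL server to actually run, so treat it as illustration only.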
You're right in your way of thinking. It is still important to close connections to prevent memory leaks and connection exhaustion.
You can also lower the number of requests each process serves before being recycled, so that connections are closed and reopened more often.
For example: every 2,500 script runs, stop, close, and reopen the connection.
Also recommended: back up your data frequently.
Hope I helped.

php doing curl in loop will slow down server?

If I have a loop with a lot of curl executions happening, will that slow down the server that is running the process? I've noticed that when this process runs and I open a new tab to access some other page on the website, the page doesn't load until the curl process finishes. Is there a way for this process to run without interfering with the performance of the site?
For example this is what I'm doing:
foreach ($chs as $ch) {
    $content = curl_exec($ch); // blocks until this single request completes
    // ... do random stuff ...
}
I know I can do multi curl, but for the purposes of what I'm doing, I need to do it like this.
Edit:
Okay, maybe this changes things a bit, but I actually want this process to run via WordPress cron. If it runs as a WordPress "cron" job, would it hinder the page performance of the WordPress site? In essence: if the process is running and people try to access the site, will they be lagged?
The curl requests are not asynchronous, so with curl used like that, any code after the loop has to wait to execute until each of the curl requests has finished in turn.
curl_multi_init is PHP's fix for this issue. You mentioned you need to do it the way you are, but is there a way you can refactor to use that?
http://php.net/manual/en/function.curl-multi-init.php
As an alternate, this library is really good for this purpose too: https://github.com/petewarden/ParallelCurl
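For reference, a minimal sketch of the curl_multi approach, in case the loop can be refactored after all; the URLs are placeholders:

```php
<?php
// Sketch: run the requests concurrently with curl_multi instead of
// one-by-one. The URLs below are hypothetical placeholders.
$urls = ['https://example.com/a', 'https://example.com/b'];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until every one has finished.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh);  // wait for activity instead of busy-looping
    }
} while ($active && $status === CURLM_OK);

foreach ($handles as $ch) {
    $content = curl_multi_getcontent($ch);
    // ... do random stuff ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```

The total wall-clock time then approaches that of the slowest single request rather than the sum of all of them.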
Not likely, unless you use a strictly single-threaded server for development. Different requests are, e.g. in Apache, handled by workers (which, depending on your exact setup, can be either threads or separate processes), and all these workers run independently.
The effect you're seeing is caused by your browser, not by the server. RFC 2616 suggests that a client only open a limited number of parallel connections to a server:
"Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy."
(By the way, the standard usage of capitalized keywords like SHOULD and SHOULD NOT here is explained in RFC 2119.)
And that's what e.g. Firefox, and probably other browsers too, use as their default. By opening more tabs you quickly exhaust these parallel channels, and that's what causes the wait.
EDIT: but after reading earl3s' reply I realize that there's more to it: earl3s addresses the performance within each page request (and thus the server's "performance" as experienced by the individual user), which can in fact be sped up by parallelizing the curl requests. But that comes at the cost of creating more than one simultaneous link to the system(s) you're querying, and that's where RFC 2616's recommendation comes back into play: unless the backend systems delivering the content are under your control, you should think twice before parallelizing your curl requests, as each page hit on your system will hit the backend with n simultaneous hits...
EDIT2: to answer the OP's clarification: no (for the same reason explained in the first paragraph: the "cron" job runs in a different worker than the ones serving your users), and if you don't overdo it, i.e. don't go wild with parallel threads, you can even mildly parallelize the outgoing requests. But do the latter more to be a good neighbour than out of fear of melting down your own server.
I just tested it and it looks like the multi curl process running on WP's "cron" made no noticeable negative impact on the site's performance. I was able to load multiple other pages with no terrible lag on the site while the site was running the multi curl process. So looks like it's okay. And I also made sure that there is locking so that this process doesn't get scheduled multiple times. And besides, this process will only run once a day in U.S. low-peak hours. Thanks.

cURL sometimes returning blank string for a valid URL

I'm using the rolling-curl [https://github.com/LionsAd/rolling-curl] library to asynchronously retrieve content from a large number of web resources as part of a scheduled task. The library allows you to set the maximum number of concurrent cURL connections; I started out at 20 but later moved up to 50 to increase speed.
It seems that every time I run it, arbitrary URLs out of the several thousand being processed simply fail and return a blank string. The more concurrent connections I have, the more failed requests I get, and the same URL that failed one time may work the next time I run the function. What could be causing this, and how can I avoid it?
Everything Luc Franken wrote is accurate, and his answer led me to the solution to my version of the questioner's problem, which is:
Remote servers respond according to their own, highly variable, schedules. To give them enough time to respond, it's important to set two cURL parameters to provide a liberal amount of time. They are:
CURLOPT_CONNECTTIMEOUT => 30
CURLOPT_TIMEOUT => 30
You can try longer and shorter amounts of time until you find something that minimizes errors. But if you're getting intermittent non-responses with curl/multi-curl/rollingcurl, you can likely solve most of the issue this way.
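With plain cURL, setting the two options looks like this (a minimal sketch; the URL is a placeholder, and rolling-curl lets you pass the same option array through its own configuration):

```php
<?php
// Sketch: give remote servers a generous window to connect and respond.
// Tune both values until intermittent blank responses disappear.
$ch = curl_init('https://example.com/resource'); // hypothetical URL
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_CONNECTTIMEOUT => 30,  // seconds allowed to establish the connection
    CURLOPT_TIMEOUT        => 30,  // seconds allowed for the whole transfer
]);
$content = curl_exec($ch);         // false (not '') on timeout, so check it
curl_close($ch);
```

CURLOPT_CONNECTTIMEOUT bounds only the connection phase, while CURLOPT_TIMEOUT bounds the entire request, so the two cover different failure modes.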
In general you assume that this should not happen.
When accessing external servers, that is simply not the case. Your code should be fully aware that servers might not respond, might not respond in time, or might respond incorrectly. In the HTTP process it is expected that things can go wrong. If you reach the server you should be notified by an HTTP error code (although that doesn't always happen), but network issues can also produce empty or useless responses.
Don't trust external input. That's the root of the issue.
In your concrete case, you keep increasing the number of concurrent requests. That creates more requests, open sockets, and other resource usage. To find the solution to your exact issue you need privileged access to the server, so you can inspect the log files and monitor open connections and other concerns. Preferably, test this on a test server without any other software creating connections, so you can isolate the issue.
But however well you test it, you are still left with uncertainties. For example, you might get blocked by external servers because you make too many requests, or get caught in security filters such as DDoS filters. Monitoring and tuning the number of requests (automated or by hand) will give you the most stable solution. You could also simply accept the lost requests and maintain a stable queue which makes sure you get the contents at a certain moment in time.
