I am developing a web app that needs to call functions from a web service using SOAP.
I have coded a simple PHP backend to access the SOAP functions. The backend mainly consists of classes with static "helper" methods (I use __autoload() to load the classes only when needed).
At some point in the app, I need to call two SOAP functions at the same time. Since I want to speed up the process, I use two simultaneous ajax calls.
The thing is, those calls don't always run simultaneously. The requests are sent at the same time, but the second call lasts longer than the first one (they should have roughly the same run time). If the first call lasts 600 ms, the second one lasts 1400-1600 ms.
I have setup a simple test, where I call both functions first in parallel and then, when both requests are complete, in serial one after the other.
I have tried profiling the PHP side (both using microtime() and xdebug), but I've consistently found that the calls have the same execution time on PHP (about 500ms).
What is REALLY driving me insane, though, is the fact that the very same code seems to work and break in an apparently random fashion. Sometimes I reload the page and the calls are executed simultaneously. Then I reload again, and the second call takes longer, once more.
PHP Session lock should not be an issue, I open the session only when needed and close it with session_write_close() afterwards.
The SOAP server called by the php backend supports multiple connections, and besides my microtime() profiling has shown me that it takes about 400ms to complete a SOAP request, so I don't think the web service is to blame.
On the client side, the requests are sent at the same time. I tried adding a slight delay to the second call, to no avail.
But then again, this "random" behaviour is baffling me.
I have tested the app both on local (WAMP) and on two different servers (shared hosting). In all of the environments, PHP 5.5 was used.
What could be causing this discrepancy, and how could one try to isolate the problem, if the behaviour is not 100% consistent?
I've solved the mystery.
In my local environment, once I disabled xdebug, the ajax calls ran flawlessly in parallel.
In remote, I still had the same problem (and no xdebug at all).
So first of all, I've tried using one single AJAX call to a PHP script that used curl_multi_exec() to run the SOAP requests in parallel, basically a similar setup but mainly server-side.
I found that I had the same discrepancies in the timings using this technique as well, with a very similar pattern.
I've contacted the hosting provider explaining my problem, and they told me that it might be due to the shared environment having limited resources depending on the overall load.
So in the end while it's possible to achieve this behaviour, in my specific case I had to change my strategy and use two serial AJAX calls, updating the view after the first call to give the impression of a faster response while waiting for the second call to complete.
Related
After some investigation, I updated the title of the question. Please see my updates below.
Original question:
I'm building a website with Wordpress and make sometimes use of async calls to WP REST API endpoints.
Calling this endpoints from my AJAX functions leads often to TTFB times from at least ~780ms:
But if I open the URL/endpoint directly in the browser I get TTFB times that are 4-5 time faster:
I wonder where the delays come frome. I'm running this page on my local dev server with Apache 2.4, HTTP/2 and PHP 7 enabled.
What is the best way to monitor such performance "issues"?
Please mind: I'm not using Wordpress' built-in AJAX-functionality. I'm just calling something like
axios.get(`${url}/wp-json/wp/v2/flightplan`)
inside a React component that I've mounted in my homepage-template.
Update
Damn interesting: clearing cookies reduces TTFB a lot:
Update 2
After removing the other both AJAX calls, the flightplan request performs much faster. I think there are some issues with concurrent AJAX requests. I've read a bit about sessions locking, but since Wordpress and all of the installed plugins doesn't make use of sessions, this can't be the reason.
Update 3
Definitively, it has something to do with my local server setup. Just deployed the site to a "real" webserver:
But it would still be interesting to know how to setup a server that can handle concurrent better.
Update 4
I made a little test: calling 4 dummy-requests before calling the "real" ones. The script only returns a "Foobar" string. Everything looks fine at this time:
But when adding sleep(3) to the dummy AJAX-script, all other requests takes much longer too:
Why?
Because your Ajax call will wait the loading of all your WP plugins :)
So you need to make some test without plugin, and activate one by one to see which one slow down your ajax call.
I have a webpage that when users go to it, multiple (10-20) Ajax requests are instantly made to a single PHP script, which depending on the parameters in the request, returns a different report with highly aggregated data.
The problem is that a lot of the reports require heavy SQL calls to get the necessary data, and in some cases, a report can take several seconds to load.
As a result, because one client is sending multiple requests to the same PHP script, you end up seeing the reports slowly load on the page one at a time. In other words, the generating of the reports is not done in parallel, and thus causes the page to take a while to fully load.
Is there any way to get around this in PHP and make it possible for all the requests from a single client to a single PHP script to be processed in parallel so that the page and all its reports can be loaded faster?
Thank you.
As far as I know, it is possible to do multi-threading in PHP.
Have a look at pthreads extension.
What you could do is make the report generation part/function of the script to be executed in parallel. This will make sure that each function is executed in a thread of its own and will retrieve your results much sooner. Also, set the maximum number of concurrent threads <= 10 so that it doesn't become a resource hog.
Here is a basic tutorial to get you started with pthreads.
And a few more examples which could be of help (Notably the SQLWorker example in your case)
Server setup
This is more of a server configuration issue and depends on how PHP is installed on your system: If you use php-fpm you have to increase the pm.max_children option. If you use PHP via (F)CGI you have to configure the webserver itself to use more children.
Database
You also have to make sure that your database server allows that many concurrent processes to run. It won’t do any good if you have enough PHP processes running but half of them have to wait for the database to notice them.
In MySQL, for example, the setting for that is max_connections.
Browser limitations
Another problem you’re facing is that browsers won’t do 10-20 parallel requests to the same hosts. It depends on the browser, but to my knowledge modern browsers will only open 2-6 connections to the same host (domain) simultaneously. So any more requests will just get queued, regardless of server configuration.
Alternatives
If you use MySQL, you could try to merge all your calls into one request and use parallel SQL queries using mysqli::poll().
If that’s not possible you could try calling child processes or forking within your PHP script.
Of course PHP can execute multiple requests in parallel, if it uses a Web Server like Apache or Nginx. PHP dev server is single threaded, but this should ony be used for dev anyway. If you are using php's file sessions however, access to the session is serialized. I.e. only one script can have the session file open at any time. Solution: Fetch information from the session at script start, then close the session.
I'm creating a plugin for a CMS and need one or more preriodical tasks in background. As it is a plugin for an open source CMS, cron job is not a perfect solution because users may not have access to cron on their server.
I'm going to start a infinite loop via an AJAX request then abort XHR request. So HTTP connection will be closed but script continue running.
Is it a good solution generally? What about server resources? Is there any shutdown or limitation policies in servers (such as Apache) for long time running threads?
Long running php scripts are not too good idea. If your script uses session variables your user won't be able to load any pages until the other session based script is closed.
If you really need long running scripts make sure its not using any session and keep them under the maximum execution time. Do not let it run without your control. It can cause various problems. I remember when I made such a things like that and my server just crashed several times.
Know what you want to do and make sure it's well tested on different servers.
Also search for similiar modules and check what methods they use for such a problems like that. Learn from the pros. :)
If I have a loop with a lot of curl executions happening, will that slow down the server that is running that process? I realize that when this process runs, and I open a new tab to access some other page on the website, it doesn't load until this curl process that's happening finishes, is there a way for this process to run without interfering with the performance of the site?
For example this is what I'm doing:
foreach ($chs as $ch) {
$content = curl_exec($ch);
... do random stuff...
}
I know I can do multi curl, but for the purposes of what I'm doing, I need to do it like this.
Edit:
Okay, maybe this might change things a bit but I actually want this process to run using WordPress cron. If this is running as a WordPress "cron", would it hinder the page performance of the WordPress site? So in essence, if the process is running, and people try to access the site, will they be lagged up?
The curl requests are not asynchronous so using curl like that, any code after that loop will have to wait to execute until after the curl requests have each finished in turn.
curl_multi_init is PHP's fix for this issue. You mentioned you need to do it the way you are, but is there a way you can refactor to use that?
http://php.net/manual/en/function.curl-multi-init.php
As an alternate, this library is really good for this purpose too: https://github.com/petewarden/ParallelCurl
Not likely unless you use a strictly 1-thread server for development. Different requests are eg in Apache handled by workers (which depending on your exact setup can be either threads or separate processes) and all these workers run independently.
The effect you're seeing is caused by your browser and not by the server. It is suggested in rfc 2616 that a client only opens a limited number of parallel connections to a server:
Clients that use persistent connections SHOULD limit the number of
simultaneous connections that they maintain to a given server. A
single-user client SHOULD NOT maintain more than 2 connections with
any server or proxy.
btw, the standard usage of capitalized keywords like here SHOULD and SHOULD NOT is explained in rfc 2119
and that's what eg Firefox and probably other browsers also use as their defaults. By opening more tabs you quickly exhaust these parallel open channels, and that's what causes the wait.
EDIT: but after reading #earl3s 'reply I realize that there's more to it: earl3s addresses the performance within each page request (and thus the server's "performance" as experienced by the individual user), which can in fact be sped up by parallelizing curl requests. But at the cost of creating more than one simultaneous link to the system(s) you're querying... And that's where rfc2616's recommendation comes back into play: unless the backend systems delivering the content are under your control you should think twice before paralleling your curl requests, as each page hit on your system will hit the backend system with n simultaneous hits...
EDIT2: to answer OP's clarification: no (for the same reason I explained in the first paragraph - the "cron" job will be running in another worker than those serving your users), and if you don't overdo it, ie, don't go wild on parallel threads, you can even mildly parallelize the outgoing requests. But the latter more to be a good neighbour than because of fear to met down your own server.
I just tested it and it looks like the multi curl process running on WP's "cron" made no noticeable negative impact on the site's performance. I was able to load multiple other pages with no terrible lag on the site while the site was running the multi curl process. So looks like it's okay. And I also made sure that there is locking so that this process doesn't get scheduled multiple times. And besides, this process will only run once a day in U.S. low-peak hours. Thanks.
Greetings All!
I am having some troubles on how to execute thousands upon thousands of requests to a web service (eBay), I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000 - 10,000 requests every minute to every 5 minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make a API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) execute the multi curl session
So you can imagine 1,000-10,000 GET requests occurring...
This was ok, around 100-200 requests where occurring in about a minute or two, however, only 100-200 of the 1,000 items actually processed, I am thinking that i'm hitting some sort of Apache or MySQL limit?
But this does add latency, its almost like performing a DoS attack on myself.
I'm wondering how you would handle this problem? What if you had to make 10,000 web service requests and 10,000 MySQL updates from the return data from the web service... And this needs to be done in at least 5 minutes.
I am using PHP and MySQL with the Zend Framework.
Thanks!
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As suggested by grossvogel, you need to use many processes to speed things up because the script is spending most of it's time waiting for a response.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup /path/to/script.php >> /tmp/logfile 2>&1 & echo $!'), $processId);
You can pass parameters (getopt) to the php script on the command line to tell it which "batch" to process. You can have the master script do a sleep/check cycle to see if the scripts are still running by checking for the process id's. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are and (b) use batching and cacheing wherever possible.
Mysqli allows multiple-statement queries, so you could definitely batch those database updates.
The http requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, maybe you want a single master script to shell out to a bunch of individual processes, each of which makes an api call and stores the results in a file or memcached. The master can periodically read the results and update the db. (Careful to rotate the data store for safe reading and writing by multiple processes.)
To understand your requirements better, you must implement your solution only in PHP? Or you can interface a PHP part with another part written in another language?
If you could not go for another language, try to perform this update maybe as php script that runs in the background and not through the apache.
You can follow Brent Baisley advice for a simple use case.
If you want to build a robuts solution, then you need to :
set up a representation of the actions in a table in database that will be your process queue;
set up a script that pop this queue and process your action;
set up a cron daemon that run this script every x.
This way you can have 1000 PHP scripts running, using your OS parallelism capabilities and not hanging when ebay is taking to to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting :
the number of request one PHP script does;
the order / number / type / priority of the action in the queue;
the number or scripts the cron daemon runs.
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely, rather than executing the sub-processes using CURL like i did before, the forking takes a massive load off, it also nicely gets around the issues with max out my apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be setup with multiple processes.
I have created a system that resemebles the one that you are describing. It's running in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because i do not have to have any communication between the processes. Everything resides in a database so each process is spawned with the full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
Unfortunately it does not appear to be a widespread idea to use PHP for this type of application, and i really had to write wrappers for the low level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that i'm able to fork off actual PHP code, not execute other programs.