How to profile and approach a slow PHP application

I have an enterprise PHP application hosted on RHEL 5.5. It works with MySQL and Perl scripts.
It is causing regular CPU and memory spikes; I can see the httpd and MySQL processes in the output of top.
I know I can profile individual PHP scripts, but is there a way to get statistics on how many web hits my application got, which script was called with which arguments, and what its execution time was?
I intend to start refactoring and optimizing the top 10 scripts that show up in the results, until performance becomes acceptable.

Your first port of call is your web server logs. You should be able to correlate the spikes in CPU usage with URLs.
Consider using a log analyzer such as Webalizer, which can extract a lot of useful usage data from the Apache logs.
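If you want the hit counts the question asks about without installing anything, a few lines of PHP over the access log will do. A rough sketch, assuming the default combined log format and a log at /var/log/httpd/access_log (both are assumptions; adjust for your setup):
<?php
// Count hits per requested URL in an Apache access log.
$counts = array();
$fh = fopen('/var/log/httpd/access_log', 'r');
while (($line = fgets($fh)) !== false) {
    // The request line looks like: "GET /script.php?arg=1 HTTP/1.1"
    if (preg_match('/"(?:GET|POST) ([^ ]+)/', $line, $m)) {
        $url = $m[1];
        $counts[$url] = isset($counts[$url]) ? $counts[$url] + 1 : 1;
    }
}
fclose($fh);
arsort($counts);  // most-hit URLs first
foreach (array_slice($counts, 0, 10, true) as $url => $hits) {
    printf("%8d  %s\n", $hits, $url);
}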

For web server statistics you can always use AWStats; it is a handy tool for statistical information such as hit counts, dynamic reports, etc. http://awstats.sourceforge.net/

Related

Debugging potential network bottleneck on AJAX calls

I've written some JS scripts on my school's VLE.
It uses the UWA Widget Format and to communicate with a locally-hosted PHP script, it uses a proxy and AJAX requests.
Recently we've moved the aforementioned locally-hosted server from a horrible XP-based WAMP setup to a virtualized Windows Server 2008 machine running IIS and FastCGI PHP.
Since then - or maybe it was before and I just didn't notice - my AJAX calls are starting to take in excess of 1 second to run.
I've run the associated PHP script's queries in phpMyAdmin and, for example, the associated getCategories SQL takes 0.00023s to run, so I don't think the problem lies there.
I've pinged the server and it consistently returns <1ms as it should for a local network server on a relatively small scale network. The VLE is on this same network.
My question is this: what steps can I take to determine where the "bottleneck" might be?
First of all, test how long your script is actually running:
Simplest way to profile a PHP script
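At its simplest, that is just wall-clock timing around the work, logged somewhere you can read later. A minimal sketch:
<?php
// Crude wall-clock timing for one script.
$start = microtime(true);

// ... the code you want to measure ...

$elapsed = microtime(true) - $start;
// Writes to the PHP/web-server error log so you can inspect it afterwards.
error_log(sprintf('%s took %.4f s', $_SERVER['SCRIPT_NAME'], $elapsed));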
Secondly, you should check the disk activity on the server. If it is running too many FastCGI processes for the amount of available RAM, it will swap and it will be very slow. If the disk activity is very high, then you know you've found your culprit. Solve it by reducing the maximum number of FastCGI processes or by increasing the amount of server RAM.

PHP script that works forever :)

I'm looking for ideas on how to do the following. I need a PHP script to perform a certain action for quite a long time. This is an extension for a CMS, so it can't be anything but PHP. It also can't be a command-line script, because it will be used by ordinary people who only have the standard means of the CMS. One option is a cron job (most simple hostings have one) that triggers the script often, so that instead of working for a long time it performs the action step by step, preserving its state from one launch to the next. This is not perfect, but I can't see any other solution. If the script keeps redirecting to itself, the server will interrupt it. What other options could work?
Thanks everyone in advance!
What you're talking about is a daemon: a long-running program that waits for calls from client programs, performs an action, provides a response, then keeps waiting for more calls.
You might be familiar with these in the form of Apache and MySQL ;) Anyway, PHP is generally OK in this regard: it can work over raw sockets as well as fork sub-processes to handle multiple requests simultaneously.
Having said that, PHP daemons are a tool where YMMV. Some folks will say they work great; other folks like me will say they have issues with interprocess communication and leaking memory, even amid a plethora of unset() calls.
Anyway, you likely won't be able to deploy a daemon of any type on a shared hosting environment. You'll need to get a better server package or stick with a cron-based solution.
Here's a link about writing a PHP daemon.
Also, one more note: daemons do crash from time to time, so you may still need to store state about what's going on, just in case someone trips over the power cord to your shared server :)
I would also suggest thinking about making it a daemon, but if not, you can simply use
set_time_limit(0);
ignore_user_abort(true);
at the top to tell it not to time out and not to be interrupted by anything. Then call it from cron to start it every day or whatever. I have this on many long-running daily tasks and it works great for me. However, it won't be able to easily talk to the outside world (other scripts can't query it or anything; if that is what you want, look into PHP services), so once you get it running, make sure it will stop, and have it print its progress to a logfile.
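Put together, a cron-launched task might look like the sketch below. The work functions and the log path are placeholders, not real APIs:
<?php
set_time_limit(0);        // never time out
ignore_user_abort(true);  // keep running even if the connection drops

$log = fopen('/var/log/myapp/daily-task.log', 'a');  // hypothetical log path
$items = fetch_items_to_process();                   // placeholder for your own work queue

foreach ($items as $i => $item) {
    process_item($item);                             // placeholder for the actual task
    if ($i % 100 === 0) {                            // note progress every 100 items
        fwrite($log, date('c') . " processed $i of " . count($items) . "\n");
    }
}
fwrite($log, date('c') . " done\n");
fclose($log);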

Is there a way to find out which PHP pages are taking more resources on a Linux server?

My Linux server's websites keep going down again and again, but SSH, FTP, etc. stay alive. So I had a look at the server through SSH and used the top command, which lists all the processes. It shows that when some PHP pages are executed, MySQL's CPU usage reaches 100%. So is there any command or log that can be used to find out which PHP pages are causing so much MySQL load? Thank you...
You may want to take a look at your Apache log format to see if it includes the %D parameter, which records the time taken to serve a request, in microseconds.
If you filter down to just the requests for PHP scripts, you should get an idea of which scripts are taking the longest, suggesting high execution time. Obviously this could also mean a very large response payload...
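Once %D is being logged, a short script can rank your PHP pages by average service time. A sketch, assuming %D was appended as the last field of each log line (adjust the regex to match your actual LogFormat):
<?php
// Average request time per PHP script, assuming %D (microseconds)
// is the last field on each access-log line.
$total = array();
$hits  = array();
foreach (file('/var/log/httpd/access_log') as $line) {
    if (preg_match('/"(?:GET|POST) ([^ ?"]+\.php)[^"]*" .* (\d+)$/', $line, $m)) {
        $script = $m[1];
        $total[$script] = (isset($total[$script]) ? $total[$script] : 0) + (int)$m[2];
        $hits[$script]  = (isset($hits[$script])  ? $hits[$script]  : 0) + 1;
    }
}
arsort($total);  // biggest total time first
foreach ($total as $script => $us) {
    printf("%-40s %6d hits  avg %8.1f ms\n", $script, $hits[$script], $us / $hits[$script] / 1000);
}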
There are multiple aspects to resource consumption.
As mobius mentioned, you can use SHOW FULL PROCESSLIST in MySQL to see what is currently running. Look at the processes taking longer than you would expect and examine the query to find hints about where it originates in your application.
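If you would rather watch this from a PHP script than from the mysql console, a minimal sketch using mysqli (the credentials are placeholders):
<?php
// List currently running MySQL queries, longest-running first.
$db = new mysqli('localhost', 'user', 'password');  // placeholder credentials

$rows = array();
$result = $db->query('SHOW FULL PROCESSLIST');
while ($row = $result->fetch_assoc()) {
    $rows[] = $row;
}

function byTime($a, $b) { return $b['Time'] - $a['Time']; }
usort($rows, 'byTime');

foreach ($rows as $row) {
    printf("%6ds  %-8s  %s\n", $row['Time'], $row['Command'], $row['Info']);
}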
The problem may not be with the application. It might simply be a matter of tuning MySQL, which most of the time means adding or changing indexes. EXPLAIN is the command that will help you analyze the execution plan MySQL decided to use. Reading EXPLAIN output takes some practice. The best reference I have is High Performance MySQL.
You can also use the MySQL slow query log to get information about the slow queries happening when you are not in front of the server.
If MySQL is running at 100%, you will probably find the problem from there. If you really want to track the usage from PHP, you can set up XHProf, a high performance profiler created by Facebook to run on production sites. You can set it up to sample one request out of 100 and get a bigger picture of the performance of your site. There are a few articles out there that explain how to set it up.
Finally, XDebug and KCacheGrind can be used in development to profile one request at a time.
If MySQL is getting stuck at 100%, then you've probably got some badly tuned MySQL queries inside one of your PHP applications. That time racks up inside the MySQL daemon, so a PHP-level profile won't show you where it goes (though it still counts toward the request's %D). This could be caused by out-of-date indexes.
If you have access to the database at the command prompt over SSH, then you could try running ANALYZE TABLE and OPTIMIZE TABLE on any large tables. Also look at "The Slow Query Log" in the MySQL documentation.
Unfortunately fixing this will probably need you to get into the Application internals.
mytop - http://jeremy.zawodny.com/mysql/mytop/ (a live SHOW FULL PROCESSLIST view of your MySQL)
Xdebug Profiler - http://xdebug.org/docs/profiler

JMeter multiple users problem

We are using JMeter to test our PHP application running on the Apache 2 web server. I can load up JMeter with 25 or 50 threads and the load on the server does not increase; however, the response time from the server does. The more threads, the slower the response time. It seems like JMeter or Apache is queuing the requests. I have changed the MaxClients value in the Apache configuration file, but this does not change the problem. While JMeter is running I can use the application and get respectable response times. What gives? I would expect to be able to tax my server down to 0% idle by increasing the number of threads. Can anyone help point me in the right direction?
Update: I found that if I remove sessions from my application I am able to simulate a full load on the server. I have tried to re-enable sessions and use an HTTP Cookie Manager for each thread, but it does not seem to make an impact.
You need to identify where the bottleneck is occurring, and then attempt to remediate the problem.
The JMeter client should be running on a well-equipped machine. I prefer a Solaris/Unix server running the JVM, but for fewer than 200 threads a modern Windows machine will do just fine. JMeter can become a bottleneck itself, and you won't get any meaningful results once it does. Additionally, it should run on a separate machine from the one you're testing, preferably on the same network. WAN latency can become a problem if your test rig and server are far apart.
The second thing to check is your Apache workers. Apache has a module, mod_status, which will show you the state of every worker. It's possible to have your pool size set too low. From mod_status you'll be able to see how many workers are in use. Too few, and Apache won't have any workers left to process requests, and the requests will queue up. Too many, and Apache may exhaust the memory on the box it's running on.
Next, you should check your database. If it's on a separate machine, the database could have an IO or CPU shortage.
If you're hitting a bottleneck and the server and DB are on the same machine, you'll generally hit a CPU, RAM, or IO limit. I listed those in the order in which they are easiest to identify. If you have a CPU-bound app, you can easily see your CPU usage go to 100%. If you run out of RAM, your machine will start swapping. On both Windows and Unix it's fairly easy to see your available free RAM. Lastly, you may be IO-bound. This too can be monitored using various tools or stats, but it's not as obvious as CPU.
Lastly, and specifically to your question, one thing that stands out: it's possible to have a huge number of session files stored in a single directory. PHP often stores session information in files. If this directory gets large, it will take an increasingly long time for PHP to find a session. If you ran your test with cookies turned off, the PHP app may have created thousands of session files, one per request. On a Windows server it will slow down faster than on a Unix server, due to differences in the way the two operating systems store directories.
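You can check for that condition directly from PHP; a quick sketch (assumes file-based session storage):
<?php
// Count session files in PHP's session save path; thousands of sess_*
// files in one directory will slow every session_start() down.
$path = session_save_path();
if ($path == '') {
    $path = '/tmp';  // common default when save_path is unset
}
// save_path can look like "N;/path" when directory depth is configured.
$parts = explode(';', $path);
$dir   = end($parts);

$files = glob($dir . '/sess_*');
echo ($files ? count($files) : 0) . " session files in $dir\n";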
Are you using a Constant Throughput Timer? If JMeter can't service the throughput with the threads allocated to it, you'll see this queueing and blowouts in the response time. To figure out if this is the problem, try adding more threads.
I also found a report of this happening when there are JavaScript calls inside the script. In that case, try moving the JavaScript calls to the test plan element at the top of the script, or look for ways to pre-calculate the value.
Try requesting a static file served by Apache rather than by PHP to see whether the problem is in the Apache config or the PHP config.
Also check your network connections and configuration. Our JMeter testing was progressing nicely until it hit a wall; we eventually realized we only had a 100Mb connection and it was saturated. Going to gigabit fixed it. Your network cards or switch may be running at a lower speed than you think, especially if their speed setting is "auto".

Multithreaded Programming in PHP to avoid runtime limitations

I know PHP isn't multithreaded, but I talked with a friend about this: if I have a large algorithmic problem I want to solve with PHP, isn't the solution simply to use the curl_multi_* interface and start n HTTP requests against the same server? This is what I would call PHP-style multithreading.
Are there any problems with this in the typical web server environment? The master request, which is waiting on curl_multi_exec, shouldn't count any of that time against its own maximum runtime or memory limit.
I have never seen this promoted anywhere as a solution to prevent a script being killed by overly restrictive admin settings for PHP.
If I add this as a feature to a popular PHP system, will there be server admins hiring a Russian mafia hitman to get revenge for this hack?
"If I add this as a feature to a popular PHP system, will there be server admins hiring a Russian mafia hitman to get revenge for this hack?"
No, but it's still a terrible idea, if for no other reason than that PHP is meant to render web pages, not run big algorithms. I see people trying to do this in ASP.NET all the time. There are two proper solutions:
1. Have your PHP script spawn a process that runs independently of the web server and updates a common data store (probably a database) with information about the progress of the task, which your PHP scripts can access.
2. Have a constantly running daemon that checks for jobs in a common data store that the PHP scripts can issue jobs to and view the progress of currently running jobs (a sketch of this follows below).
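As a rough sketch of the second option, the daemon's main loop could poll a jobs table. The table layout, credentials, and process_job() are all hypothetical:
<?php
// Minimal worker daemon: polls a jobs table and processes pending entries.
$db = new mysqli('localhost', 'user', 'password', 'cms');  // placeholder credentials

while (true) {
    $result = $db->query("SELECT id, payload FROM jobs WHERE status = 'pending' LIMIT 1");
    if ($result && ($job = $result->fetch_assoc())) {
        $id = (int)$job['id'];
        $db->query("UPDATE jobs SET status = 'running' WHERE id = $id");
        process_job($job['payload']);  // placeholder for the real work
        $db->query("UPDATE jobs SET status = 'done' WHERE id = $id");
    } else {
        sleep(5);  // nothing pending; don't hammer the database
    }
}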
By using curl, you are adding a network timeout dependency into the mix. Ideally you would run everything from the command line to avoid timeout issues.
PHP does support forking (pcntl_fork). You can fork some processes and then monitor them with something like pcntl_waitpid. You end up with one "parent" process monitoring the children it spawned.
Keep in mind that while one process can start up, load everything, then fork, you can't share things like database connections, so each forked process should establish its own. I've used forking for up to 50 processes.
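A minimal sketch of that pattern (requires the pcntl extension, which is normally only enabled for the CLI; do_chunk_of_work() is a placeholder):
<?php
// Fork a few worker processes and wait for each one to finish.
$children = array();
for ($i = 0; $i < 4; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child: open its OWN database connection here, do the work, exit.
        do_chunk_of_work($i);  // placeholder for the real work
        exit(0);
    }
    $children[] = $pid;  // parent remembers each child's PID
}
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);  // block until this child exits
}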
If forking isn't available for your install of PHP, you can spawn a process as Spencer mentioned. Just make sure you spawn the process in such a way that it doesn't stop processing of your main script. You also want to get the process ID so you can monitor the spawned processes.
exec("nohup /path/to/php.script > /dev/null 2>&1 & echo $!", $output);
$pid = $output[0];
You can also use the above exec() setup to spawn a process started from a web page and get control back immediately.
Out of curiosity - what is your "large algorithmic problem" attempting to accomplish?
You might be better off writing it as an Amazon EC2 service, then selling access to the service rather than the package itself.
Edit: you now mention "mass emails". There are already services that do this, they're generally known as "spammers". Please don't.
Lothar,
As far as I know, PHP doesn't run as a service the way its competitors do, so there is no way for PHP to know how much time has passed unless you constantly interrupt the process to check. So, IMO, no, you can't do that in PHP :)
