How to identify the bottlenecks with Xhprof?

I have an issue with a very slow API call and want to find out what it is caused by, using Xhprof: the default GUI and the callgraph. How should this data be analyzed?
What is the approach to finding the places in the code that should be optimized, and especially the most expensive bottlenecks?
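For context, here is roughly how such a profile is collected in the first place; a minimal sketch, assuming the xhprof extension is loaded and its bundled xhprof_lib is available (the include paths and call_slow_api() are placeholders):
<?php
// Start profiling with CPU and memory counters enabled.
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

$result = call_slow_api();   // the slow call under investigation (placeholder)

// Stop profiling; returns the raw parent==>child timing data.
$data = xhprof_disable();

// Save the run so the default GUI and the callgraph can render it.
include_once '/path/to/xhprof_lib/utils/xhprof_lib.php';
include_once '/path/to/xhprof_lib/utils/xhprof_runs.php';
$runs   = new XHProfRuns_Default();
$run_id = $runs->save_run($data, 'slow_api');
// Open index.php?run=<run_id>&source=slow_api to see the IWall%/EWall% columns.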

Of all those columns, focus on the one called "IWall%", column 5.
Notice that send, doRequest, read, and fgets each have 72% inclusive wall-clock time.
What that means is if you took 100 stack samples, each of those routines would find itself on 72 of them, give or take, and I suspect they would appear together.
(Your graph should show that too.)
So since the whole thing takes 23 seconds, that means about 17 seconds are spent simply reading.
The only way you can reduce that 17 seconds is if you can find that some of the reading is unnecessary. Can you?
What about the remaining 28% (6 seconds)?
First, is it worth it?
Even if you could reduce that to zero (17 seconds total, which you can't), the speedup factor would be 1/(1 - 0.28) = 1.39, or 39%.
If you could reduce it by half (20 seconds total), it would be 1/(1-0.14) = 1.16, or 16%.
20 seconds versus 23, it's up to you to decide if it's worth the trouble.
If you decide it is, I recommend the random pausing method, because it doesn't flood you with noise.
It gets right to the heart of the matter, not only telling you which routines, but which lines of code, and why they are being executed.
(The why is most important, because you can't replace it if it's absolutely necessary.
With profilers, you tend to assume it is necessary, because you have no way to tell otherwise.)
Since you are looking for something taking about 14% of the time, you're going to have to examine 2/0.14 = 14 samples, on average, to see it twice, and that will tell you what it is.
Keep in mind that about 14 * 0.72 = 10 of those samples will land in fgets (and all its callers), so you can either ignore those or use them to make sure all that I/O is really necessary.
(For example, is it just possible that you're reading things twice, for some obscure reason like it was easier to do that way? I've seen that.)
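If you want to try something close to random pausing without leaving PHP, one rough approach is to interrupt the run with a timer signal and record the call stack each time; a minimal CLI sketch, assuming the pcntl extension is available and run_slow_api_call() stands in for the code under test (on a web SAPI you would pause a debugger such as Xdebug instead):
<?php
// Rough stack sampler: interrupt the running code at semi-random intervals
// and remember the full backtrace each time.
pcntl_async_signals(true);

$samples = [];
pcntl_signal(SIGALRM, function () use (&$samples) {
    $samples[] = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS);
    pcntl_alarm(random_int(1, 3));   // schedule the next "pause"
});
pcntl_alarm(1);

run_slow_api_call();                 // placeholder for the 23-second call

pcntl_alarm(0);                      // cancel any pending alarm
// Inspect $samples by hand: whatever lines show up on most of them
// (here, presumably the fgets/read chain) is where the time goes.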

What is the difference between revolutions and iterations in phpbench?

I have already read the documentation, but when testing I'm still not able to fully understand the difference between them.
For example, with this simple file:
<?php
class StackOverflowBench
{
    public function benchNothing()
    {
    }
}
When I set 1000 revolutions, and only one iteration, here is my result:
subject      | set | revs  | its | mem_peak   | best     | mean     | mode     | worst    | stdev   | rstdev | diff
benchNothing | 0   | 10000 | 1   | 2,032,328b | 10.052μs | 10.052μs | 10.052μs | 10.052μs | 0.000μs | 0.00%  | 1.00x
The best, mean, mode and worst are always the same, which means they are based on the only iteration I ran.
When I run it with 10 revolutions and still 1 iteration, I have this:
subject      | set | revs | its | mem_peak   | best     | mean     | mode     | worst    | stdev   | rstdev | diff
benchNothing | 0   | 10   | 1   | 2,032,328b | 10.200μs | 10.200μs | 10.200μs | 10.200μs | 0.000μs | 0.00%  | 1.00x
which seems to mean that the calculated times are not a sum of all the revolutions, but something like an average for each iteration.
If I wanted to measure the best and worst execution time of each time the method is executed, I'd try 1000 iterations and only 1 revolution each, but that takes way too much time. I launched it with 100 iterations of 1 revolution; here's the result:
subject      | set | revs | its | mem_peak   | best     | mean     | mode     | worst    | stdev   | rstdev | diff
benchNothing | 0   | 1    | 100 | 2,032,328b | 20.000μs | 25.920μs | 25.196μs | 79.000μs | 5.567μs | 21.48% | 1.00x
This time, the times seem to be at least twice as long, and I'm wondering what I haven't understood. I may be using this information badly (I know my last example is a wrong one).
Is it necessary to measure the best and worst of each revolution, like I want to do?
What is the point of iterations?
Revolution vs iteration
Let's take your example class:
class StackOverflowBench
{
    public function benchNothing()
    {
    }
}
If you have 100 revolutions and 3 iterations, this is the pseudo code that will be run:
// Iterations
for ($i = 0; $i < 3; $i++) {
    // Reset memory stats here...
    // Start the timer for this iteration...

    // Create the instance
    $obj = new StackOverflowBench();

    // Revolutions
    for ($j = 0; $j < 100; $j++) {
        $obj->benchNothing();
    }

    // Stop the timer...
    // Call `memory_get_usage` to collect memory stats
}
What does the report mean?
Almost all of the calculated stats (mem_peak, best, mean, mode, worst, stdev and rstdev) in the output are based on individual iterations and are documented here.
The diff stat is the odd one out; it is documented here and mentioned elsewhere as:
the percentage difference from the lowest measurement
When you run a test, you can specify which column to report the difference on. So if your diff_column is run time, and iteration #1 takes 10 seconds while #2 takes 20 seconds, the diff for #1 would be 1.00 (since it is the lowest) and for #2 it would be 2.00, since it took twice as long. (Actually, I'm not 100% sure that is the exact usage of that column.)
Measuring revolution vs iteration
Some code needs to be run thousands or millions of times in a task/request/action/etc., which is why revolutions exist. If I run a simple but critical block of code just once, a report might tell me it takes 0.000 seconds, which isn't helpful. That's why some blocks of code need to have their revolution count kicked up, to get a rough idea, based on possible real-world usage, of how they perform under load. Array sorting algorithms are great examples of a tightly-coupled call that will happen a lot in a single request.
Other code might only do a single thing, such as making an API or database request, and for those blocks of code we need to know how many system resources they will take up as a whole. So if I make a database call that consumes 2MB of memory, and I'm expecting to have 1,000 concurrent users, those calls could take up 2GB of memory. (I'm simplifying, but you should get the gist.)
If you look at my pseudo code above, you'll see that setting up each iteration is more expensive than each revolution. The revolution basically just invokes a method, but the iteration calculates memory and does instantiation-related work.
So, to your second-to-last question:
Is it necessary to measure the best and worst of each revolution, like I want to do?
Probably not, although there are tools out there that will tell you. You could, for instance, find out how much memory was used before and after a method to determine whether your code is sub-optimal, but you can also do that with PHPBench by making a 1-iteration, 1-revolution run and looking for methods with high memory usage.
I'd further say that if you have code with great variance per revolution, it is almost certainly related to IO factors rather than the code itself, or it is related to the test dataset, most probably its size.
You should hopefully know all of your IO-related paths, so benchmarking the various problems related to those paths really isn't a factor of this tool.
Dataset-related problems, however, are interesting, and that is a case where you'd potentially want to know about each run. There, too, though, the measurements exist either to tell you how to fix or change your code, or to confirm that your code runs with a certain time complexity.
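For completeness, both knobs can be pinned directly on the subject with phpbench's docblock annotations; a minimal sketch (the numbers are arbitrary), which runs benchNothing() 1000 times per iteration and repeats that 5 times, so mean, stdev and rstdev are computed across the 5 iteration averages:
<?php

class StackOverflowBench
{
    /**
     * @Revs(1000)
     * @Iterations(5)
     */
    public function benchNothing()
    {
    }
}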

Truncate last X digits of number without division by 10eX

I'm making a blocking algorithm, and I just realised that adding a timeout to such an algorithm is not so easy if it should remain precise.
Adding a timeout means that the blocking algorithm should abort after X ms, if not earlier. Now I seem to have two options:
Iterating time (has some error, but is fast)
Check the blocking condition
Increment time_elapsed by 1 (which means 1e-6 sec, given the use of usleep)
Compare time_elapsed with the timeout (here is the problem I will talk about)
usleep(1)
Getting system time every iteration (slow, but precise)
I know how to do this, please do not post any answers about that.
Comparing timeout with time_elapsed
And here is what bothers me. The timeout will be in milliseconds (1e-3 s) while usleep sleeps in microseconds (1e-6 s). So my time_elapsed will be 1000 times more precise than the timeout. I want to truncate the last three digits of time_elapsed (an operation equivalent to floor($time_elapsed/1000)) without dividing. The division algorithm is too slow.
Summary
I want to make my variable 1000 times smaller without dividing it by 1000; I just want to get rid of the data. In binary I'd use a bit-shift operator, but I have no idea how to apply that to the decimal system.
Code sample:
Sometimes, when people on SO cannot answer the theoretical question, they really hunger for the code. Here it is:
floor($time_elapsed/1000);
I want to replace this code with something much faster. Please note that though the question itself is full of timeouts, the question title is only about truncating that data. Other users may find the solution useful for other purposes than timing.
Maybe PHP's number_format will help. It does cause rounding, though; if that is unacceptable then I don't think it's possible, because PHP is loosely typed and you can't define numbers with a particular level of precision.
Try this:
(int)($time_elapsed * 0.001)
This should be a lot faster.
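If you want to check that claim on your own machine, a throwaway micro-benchmark sketch along these lines will do (the loop count and test value are arbitrary; intdiv() needs PHP 7+):
<?php
// Throwaway comparison of three truncation variants (timings vary by machine).
$n     = 1000000;
$value = 123456789;

$t = microtime(true);
for ($i = 0; $i < $n; $i++) { $x = floor($value / 1000); }
printf("floor(/1000):  %.4f s\n", microtime(true) - $t);

$t = microtime(true);
for ($i = 0; $i < $n; $i++) { $x = intdiv($value, 1000); }  // PHP 7+
printf("intdiv(,1000): %.4f s\n", microtime(true) - $t);

$t = microtime(true);
for ($i = 0; $i < $n; $i++) { $x = (int)($value * 0.001); }
printf("(int)(*0.001): %.4f s\n", microtime(true) - $t);
Keep in mind that (int)($x * 0.001) can differ from floor($x / 1000) for some inputs, because 0.001 is not exactly representable as a float; if exact truncation matters, the integer-only intdiv() is the safer choice.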

Possibilities to speed up PHP-CLI script?

I wrote a PHP-CLI script that mixes two audio (.WAV PCM) files (with some math involved), so PHP needs to crunch through thousands (if not millions) of samples with unpack(), do math on them and save them with pack().
Now, I don't need actual info on how to do the mixing or anything; as the title says, I'm looking for possibilities to speed this process up, since the script needs 30 seconds of processing time to produce 10 seconds of audio output.
Things that I tried:
Cache the audio files in memory and crunch through them with substr() instead of fseek()/fread(). Performance gain: 3 seconds.
Write the output file in 5000-sample chunks. Performance gain: 10 seconds.
After those optimizations I ended up at approximately 17 seconds of processing time for 10 seconds of audio output. What bugs me is that other tools can do simple audio operations, like mixing two files, in realtime or even much faster.
Another idea I had was parallelization, but I refrained from that due to the extra problems that would occur (like calculating correct seek positions for the forks/threads and other related things).
So am I missing something, or is this actually good performance for a PHP-CLI script?
Thanks for everyone's input on this one.
I rewrote the thing in C++ and can now perform the above actions in less than a second.
I'd never have thought that the speed difference would be that huge (the compiled application is ~40x faster).
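For later readers who want to stay in PHP, the biggest win usually comes from unpacking and packing thousands of samples per call instead of one at a time; a minimal sketch, assuming 16-bit little-endian mono PCM with the WAV headers already stripped (file names and the chunk size are placeholders):
<?php
// Mix two raw 16-bit LE PCM streams by averaging, working in large chunks
// so unpack()/pack() are called once per chunk instead of once per sample.
const CHUNK_BYTES = 8192;                        // 4096 samples per pass

$a   = file_get_contents('track_a.raw');         // placeholder inputs
$b   = file_get_contents('track_b.raw');
$len = min(strlen($a), strlen($b));

$out = '';
for ($off = 0; $off < $len; $off += CHUNK_BYTES) {
    $take = min(CHUNK_BYTES, $len - $off);
    $sa = unpack('v*', substr($a, $off, $take)); // unsigned 16-bit LE
    $sb = unpack('v*', substr($b, $off, $take));

    $mixed = [];
    foreach ($sa as $i => $s) {
        // unsigned -> signed, average the two samples, back to unsigned
        $x = $s      >= 0x8000 ? $s      - 0x10000 : $s;
        $y = $sb[$i] >= 0x8000 ? $sb[$i] - 0x10000 : $sb[$i];
        $m = (int)(($x + $y) / 2);
        $mixed[] = $m < 0 ? $m + 0x10000 : $m;
    }
    $out .= pack('v*', ...$mixed);               // one pack() per chunk (PHP 5.6+)
}
file_put_contents('mixed.raw', $out);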

What is the performance of PHP's strtotime()?

I am doing some large timestamp-list iterations: putting them in tables with date ranges, and grouping them by range.
In order to do that, I found strtotime() a very helpful function, but I am worried about its performance.
For example, a function that loops over a list of weeks (say, weeks 49 to 05) has to determine the beginning of each week and the timestamp at the end of that week. A useful way to do that would be:
foreach ($this->weeks($first, $amount) as $starts_at) {
    $ends_at = strtotime('+1 week', $starts_at);
    $groups[$week_key] = $this->slice($timestamps, $starts_at, $ends_at);
}
// $this->weeks returns a list of timestamps at which each week starts.
// $this->slice is a simple helper that returns only the timestamps within a range, from a list of timestamps.
Instead of strtotime(), I could potentially work out the number of seconds between the beginning and end of the week; 99% of the time that would be 24 * 60 * 60 * 7. But in the rare cases where there is a DST switch, that 24 should be either 23 or 25. Code to sort that out will probably be a lot slower than strtotime(), won't it?
I use the same patterns for ranges of years, months (months being very inconsistent!), days and hours. Only with hours would I suspect that simply adding 3600 to the timestamp is faster.
Any other gotchas? Are there ways (that do not depend on PHP 5.3!) that offer better routes to consistent, DST- and leap-year-safe date ranges?
Why are you worried about its performance? Do you have evidence that it's slowing down your system? If not, don't try to over-complicate the solution for unnecessary reasons. Remember that premature optimization is the root of all evil. Write readable code that makes sense, and only optimize if you KNOW it's going to be an issue...
But something else to consider is that it's also compiled C code, so it should be quite efficient for what it does. You MIGHT be able to build a sub-set of the code in PHP land and make it faster, but it's going to be a difficult job (due to all the overhead involved in PHP code).
Like I said before, use it until you prove it's a problem, then fix that problem. Don't forget that rewriting it for your needs isn't free either: it takes time and introduces bugs. Is it worth it if the gain is minimal (meaning it wasn't a performance problem to begin with)? So don't bother trying to micro-optimize unless you KNOW it's a problem...
I know this probably isn't the answer you're looking for, but your best bet is profiling it with a real use case in mind.
My instinct is that, as you think, strtotime will be slower. But even if it's, say, 3 times slower, this is only meaningful in context. Maybe your routine, with real data, takes 60 ms using strtotime, so in most cases, you'd be just saving 40 ms (I totally made up these numbers, but you get the idea). So, you might find out that optimising this wouldn't really pay off (considering you're opening your code to more potential bugs and you'll have to invest more time to get it right).
By the way, if you have good profiling tools, awesome, but even if you don't, comparing timestamps should give you a rough idea.
To finally respond to the question:
Based on many benchmarks like this one: https://en.code-bude.net/2013/12/19/benchmark-strtotime-vs-datetime-vs-gettimestamp-in-php/
we can see that strtotime() is more efficient than you might think.
So yes, to convert a string to a timestamp, the strtotime function has pretty good performance.
Very interesting question. I'd say that the only way you can really figure this out, is to set up your own performance test. Observe the value of microtime() at the beginning and end of the script, to determine performance. Run a ridiculous number of values through a loop with one method, then the other method. Compare times.
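Along those lines, a throwaway sketch of such a comparison (the iteration count and start date are arbitrary):
<?php
// Compare strtotime('+1 week', ...) against plain second arithmetic.
$iterations = 100000;
$starts_at  = strtotime('2011-01-03');          // an arbitrary Monday

$t = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $ends_at = strtotime('+1 week', $starts_at);
}
printf("strtotime('+1 week'): %.4f s\n", microtime(true) - $t);

$t = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $ends_at = $starts_at + 7 * 24 * 60 * 60;   // ignores DST transitions
}
printf("plain addition:       %.4f s\n", microtime(true) - $t);
Even if the plain addition wins, remember that it silently gets the DST weeks wrong, which is exactly the trade-off being weighed above.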

PHP memory: how much is too much?

I'm currently re-writing my site using my own framework (it's very simple and does exactly what I need; I've no need for something like Zend or CakePHP). I've done a lot of work making sure everything is cached properly, caching pages in files to avoid SQL queries and generally limiting the number of SQL queries.
Overall it looks like it's very speedy. The average time taken for the front page (taken over 100 times) is 0.046152 microseconds.
But one thing I'm not sure about is whether I've done enough to reduce PHP memory usage. The only time I've ever encountered problems with it is when uploading large files.
Using memory_get_peak_usage(TRUE), which I THINK returns the highest amount of memory used whilst the script has been running, the average (taken over 100 times) is 1572864 bytes.
Is that good?
I realise you don't know what it is I'm doing (it's rather simple: get the 10 latest articles, the comment count for each, get the user controls, popular tags in the sidebar, etc.). But would you be at all worried about a script using that sort of memory getting hit 50,000 times a day? Or once every second at peak times?
I realise that this is a very open-ended question. Hopefully you can understand that it's a bit of a stab in the dark and I'm really just looking for some reassurance that it's not going to die horribly come re-launch day.
EDIT: Just a mini experiment I did for myself. I downloaded and installed WordPress; a default installation with no extra add-ons, just one user and one post, used 10.5 megabytes of memory, or "11010048 bytes". Quite pleased with my 1.5MB now.
Memory usage values can vary heavily and are subject to fluctuation, but as you already say in your update, a regular WordPress instance is much, much fatter than that. I have had great trouble getting the WordPress backend running with a memory_limit of sixteen megabytes, let alone when plug-ins come into play. So from that, I'd say a peak of 1.5 megabytes while performing normal tasks is quite okay.
Generation time is extremely subject to the hardware your site runs on, obviously. However, a generation time of 0.046152 seconds (I assume you mean seconds here) sounds very okay to me under normal circumstances.
It is a subjective question. PHP has a lot of overhead and when calling the function with TRUE, that overhead will be included. You'll see what I mean when you call the function in a simple Hello World script. Also keep in mind that results can differ greatly depending on whether PHP is run as an apache module or FastCGI.
Unfortunately, no one can provide assurances. There will always be unforeseen variables that can bring down a site. Perform load testing. Use a code profiler to narrow down the location of any bottlenecks to see if there are ways to make those code blocks more efficient.
Encyclopaedia Britannica thought they were prepared when they launched their ad-supported encyclopedia ten years ago. The developers didn't know they would be announcing it on Good Morning America the day of the launch. The whole thing came crashing down for days.
As long as your systems aren't swapping, your memory usage is reasonable. Any additional concern is just premature optimization.
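As an aside on the TRUE argument mentioned above: with TRUE, memory_get_peak_usage() reports the memory actually reserved from the system, while without it you get only what PHP's own allocator has handed out to the script. A small sketch to see both readings:
<?php
// Allocate something noticeable, then compare the two peak readings.
$data = range(1, 100000);

printf("peak (allocator):     %d bytes\n", memory_get_peak_usage());
printf("peak (real / system): %d bytes\n", memory_get_peak_usage(true));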
