How to debug, and protect against, infinite loops in PHP?

I recently ran into a problem that challenged my programming abilities: an entirely accidental infinite loop. I had rewritten some code to DRY it up and changed a function so that it was being repeatedly called by the very methods it called; an elementary issue, certainly. Apache decided to solve the problem by crashing, and the log noted nothing but "spawned child process". The problem cropped up in the afternoon and I never actually finished debugging it that day, so I had to solve it today.
In the end, my solution was simple: log the logic manually and see what happened. The problem was immediately apparent when I had a log file consisting of two unique lines, followed by two lines that were repeated some two hundred times apiece.
What are some ways to protect ourselves against infinite loops? And, when that fails (and it will), what's the fastest way to track it down? Is it indeed the log file that is most effective, or something else?
Your answer could be language agnostic if it were a best practice, but I'd rather stick with PHP specific techniques and code.

You could use a debugger such as Xdebug (a debugger and profiler tool for PHP) and step through the code that you suspect contains the infinite loop.
You can also set max_execution_time to limit the time the infinite loop will burn before crashing.
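As a rough sketch of the max_execution_time suggestion: the limit can be set in php.ini or at runtime. The 5-second value below is an arbitrary assumption, not a recommendation. Keep in mind that the command-line SAPI defaults to 0 (no limit), and that on non-Windows systems time spent in sleep() or waiting on the database does not count toward the limit.

```php
<?php
// Cap script execution time at runtime; 5 seconds is an arbitrary example value.
// The equivalent php.ini line would be: max_execution_time = 5
ini_set('max_execution_time', '5');

// Read the value back to confirm which limit is active.
$limit = (int) ini_get('max_execution_time');
echo "Execution limit: {$limit} seconds\n";
```

A runaway loop then ends with a fatal "Maximum execution time exceeded" error instead of spinning until the web server gives up.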

I sometimes find the safest method is to incorporate a limit check in the loop. The type of loop construct doesn't matter. In this example I chose a 'while' statement:
$max_loop_iterations = 10000;
$i = 0;
$test = true;
while ($test) {
    if ($i++ == $max_loop_iterations) {
        echo "too many iterations...";
        break;
    }
    ...
}
A reasonable value for $max_loop_iterations might be based upon:
a constant value set at runtime
a computed value based upon the size of an input
or perhaps a computed value based upon relative runtime speed
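To illustrate the second option (a limit computed from the size of the input), here is a minimal sketch. The function find_root() and its parent-map input are hypothetical, made up for this example; the point is only that the cap is derived from count($parents) rather than hard-coded.

```php
<?php
// Hypothetical helper: follows a child => parent map up to the root.
// Without a cap this loops forever if the data contains a cycle.
function find_root(array $parents, string $node): string
{
    // No valid chain can be longer than the number of entries, so use that as the cap.
    $maxIterations = count($parents) + 1;
    $i = 0;
    while (isset($parents[$node])) {
        if ($i++ >= $maxIterations) {
            throw new RuntimeException("Too many iterations; possible cycle at '$node'");
        }
        $node = $parents[$node];
    }
    return $node;
}

echo find_root(['c' => 'b', 'b' => 'a'], 'c'), "\n";
```

Throwing an exception rather than just breaking out makes the failure visible in the log along with a stack trace.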
Hope this helps,
- N

Unit tests might be a good idea, too. You might want to try PHPUnit.

Can you trace execution and dump out a call graph? Infinite loops will be obvious, and you can easily tell the ones that are on purpose (a huge main loop) from the accidents (a local a-b-a-b-a... loop, or a loop that never returns to the main loop).
I've heard of software that does this for large, complex programs, but I couldn't find a screenshot for you. Someone traced a program like 3ds Max and dumped out a really pretty call graph.
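A poor man's version of such a trace can be hand-rolled. The sketch below is an assumption about how one might do it, not an existing tool: CallTrace, render() and build() are hypothetical names. Each function reports when it is entered, the tracer indents by nesting depth, and it aborts once the depth passes a limit, which makes an a-b-a-b pattern visible immediately.

```php
<?php
// Hypothetical tracer: records each call with its nesting depth and aborts past a limit.
final class CallTrace
{
    public static array $lines = [];
    private static int $depth = 0;

    public static function enter(string $name, int $maxDepth = 50): void
    {
        self::$lines[] = str_repeat('  ', self::$depth) . $name;
        if (++self::$depth > $maxDepth) {
            throw new RuntimeException("Call depth exceeded $maxDepth in $name");
        }
    }

    public static function leave(): void
    {
        self::$depth--;
    }
}

// Two functions that accidentally call each other forever.
function render(): void { CallTrace::enter(__FUNCTION__, 4); build(); CallTrace::leave(); }
function build(): void  { CallTrace::enter(__FUNCTION__, 4); render(); CallTrace::leave(); }

try {
    render();
} catch (RuntimeException $e) {
    // The indented trace shows render/build alternating until the limit is hit.
    echo implode("\n", CallTrace::$lines), "\n", $e->getMessage(), "\n";
}
```

Xdebug can produce function traces without touching the code, so this kind of manual instrumentation is mainly useful where installing an extension is not an option.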

Write simple code, so bugs will be more apparent? I wonder if your infinite loop is due to some ridiculous yarn-ball of control structures that no human being can possibly understand. Thus of course you f'ed it up.
Not all infinite loops are bad (e.g. while (!quit)).

Related

Does extensive use of php echo statement make page load times slower?

I know this question has been asked before but I haven't been able to find a definitive answer.
Does overuse of the echo statement slow down end-user load times?
Having more echo statements in a file increases the file size, so I assume that would be a factor. Correct me if I'm wrong.
After some research I know that using PHP's ob_start() function along with raising Apache's SendBufferSize can help decrease load times, but from what I understand this is more a decrease in PHP execution time: it allows PHP to finish and exit sooner, which in turn allows Apache to exit sooner.
That said, PHP does exit sooner, but does that mean PHP actually took less time to execute and in turn sped things up on the end user's side?
To be clear, what I mean is this: if I had two files with the same content, one using an echo statement for every HTML tag and the other using the standard method of breaking in and out of PHP, then aside from the difference in file size from the heavy use of echo (within reason, I'm guessing), which one would be faster? Or would there really not be any difference?
Maybe I'm going about this or looking at this wrong?
Edit: I have done a bit of checking around and found a way to create a stopwatch to check the execution time of a script, and it seems to work quite well. If anybody is interested in doing the same, here is the link to the method I have chosen to use for now.
http://www.phpjabbers.com/measuring-php-page-load-time-php17.html
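For anyone who just wants the gist of such a stopwatch, a minimal sketch using microtime(true) might look like this. The loop is only a stand-in for the page being measured, and the 10,000 iterations are an arbitrary assumption.

```php
<?php
// Start the stopwatch; microtime(true) returns the current time as a float in seconds.
$start = microtime(true);

// Stand-in for the page being measured: build some output with many echo calls.
ob_start();
for ($i = 0; $i < 10000; $i++) {
    echo '<p>row ', $i, '</p>';
}
$html = ob_get_clean();

// Stop the stopwatch and report.
$elapsed = microtime(true) - $start;
printf("Generated %d bytes in %.4f seconds\n", strlen($html), $elapsed);
```

This measures only PHP's execution time on the server; network transfer and browser rendering are not included.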
"Does overuse of the echo statement slow down end-user load times?"
No.
"Having more echo statements in a file increases the file size, so I assume that would be a factor. Correct me if I'm wrong."
You are wrong.
"Does that mean PHP actually took less time to execute and in turn sped things up on the end user's side?"
No.
"Or would there really not be any difference?"
Yes.
"Maybe I'm going about this or looking at this wrong?"
Definitely.
There is a common problem with performance-related questions: most of them come not from real needs but from imagination, whereas one should solve only real problems, not imaginary ones.
This is not an issue.
You are overthinking things.
This is an old question, but the problem with the logic presented here is that it assumes "more commands equals slower performance" when, in terms of modern programming and modern systems, this is an utterly irrelevant issue. These are concerns only for someone who, for some reason, programs at an extremely low level in something like assembler.
The reason is that there might be a slowdown, but nothing anyone would ever be able to perceive: a slowdown of such a small fraction of a second that any effort you make to optimize that code would not yield anything worthwhile.
That said, speed and performance should always be a concern when programming, but not in terms of how many of a command you use.
As someone who uses PHP with echo statements, I would recommend that you organize your code for readability. A pile of echo statements is simply hard to read and edit. Depending on your needs you should concatenate the contents of those echo statements into a string that you then echo later on.
Or, as a nice technique I use, create an array of the values you need to echo and then run echo implode('', $some_array); instead.
The benefit of an array over string concatenation is that it's naturally easier to understand that $some_array[] = 'Hello!'; adds a new element to the array, whereas something like $some_string .= 'Hello!'; might seem simple but can be confusing to debug when you have tons of concatenation happening.
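A small sketch of the array-and-implode technique described above. The list items and the $parts name are made up for illustration.

```php
<?php
// Collect output fragments in an array instead of echoing them one by one.
$parts = [];
$parts[] = '<ul>';
foreach (['apples', 'pears'] as $item) {
    $parts[] = '<li>' . htmlspecialchars($item) . '</li>';
}
$parts[] = '</ul>';

// Join everything once and emit it with a single echo.
$html = implode('', $parts);
echo $html, "\n";
```

Because the fragments stay in an array until the end, they can still be inspected, reordered or dropped before anything is sent.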
But at the end of the day, clean code that is easy to read is more important to all involved than shaving fractions of a second off of a process. If you are a modern programmer, program with an eye towards readability as a first draft and then—if necessary—think about optimizing that code.
Do not worry about having 10 or 100 calls to echo. When optimizing, these shouldn't even be taken into consideration.
Consider that on a normal server a simple echo call runs in less than 1/100,000 of a second.
Always worry more about code readability and maintenance than about those X extra echo calls.
I haven't run any benchmarks. All I can say is that when you echo strings (HTML or not), double quotes (") can be marginally slower than single quotes (').
For strings in double quotes, PHP has to parse the string. You may know that you can put variables inside a double-quoted string simply by inserting them:
echo "you're $age years old!";
PHP has to parse your string to look up those variables and replace them automatically. When you're sure you don't have any variables inside your string, use single quotes.
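To make the difference concrete, here is a short sketch of how the two quote styles treat the same text. The value of $age is an arbitrary example.

```php
<?php
$age = 30;

// Double quotes: PHP substitutes the value of $age into the string.
$interpolated = "you're $age years old!";

// Single quotes: the text is taken literally, so $age is not replaced.
$literal = 'you\'re $age years old!';

echo $interpolated, "\n", $literal, "\n";
```

The behavioral difference is usually a better reason to pick one style than any speed difference.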
Hope this helps.
Even when you use a bunch of echo calls, I don't think it will slow down your loading time. Loading time depends on your server's response time and the script's execution time. If your loading time is too high for the given task, check the whole code rather than only the possibility that echoes are slowing down your server; there is probably something else wrong in your code.

PHP efficiency question

I am working on a website and I am trying to make it as fast as possible, especially the small things that can make my site a little bit quicker.
So, on to my question: I have a loop that runs 5 times and echoes something each time. If I make a variable, have the loop append the text I want to echo to that variable, and echo the variable only at the end, will it be faster?
loop 1 (with the echo inside the loop)
for ($i = 0; $i < 5; $i++)
{
    echo "test";
}
loop 2 (with the echo outside, after the loop finishes)
$echostr = "";
for ($i = 0; $i < 5; $i++)
{
    $echostr .= "test";
}
echo $echostr;
I know that loop 2 will increase the file size a bit and therefore the user will have to download more bytes, but if I have a huge loop, will it be better to use the second loop or not?
Thanks.
The difference is negligible. Do whatever is more readable (which in this case is definitely the first case). The first approach is not a "naive" approach so there will be no major performance difference (it may actually be faster, I'm not sure). The first approach will also use less memory. Also, in many languages (not sure about PHP), appending to strings is expensive, and therefore so is concatenation (because you have to seek to the end of the string, reallocate memory, etc.).
Moreover, file size does not matter because PHP is entirely server-side -- the user never has to download your script (in fact, it would be scary if they did/could). These types of things may matter in Javascript but not in PHP.
Long story short -- don't write code constantly trying to make micro-optimizations like this. Write the code in the style that is most readable and idiomatic, test to see if performance is good, and if performance is bad then profile and rewrite the sections that perform poorly.
I'll end on a quote:
"premature emphasis on efficiency is a big mistake which may well be the source of most programming complexity and grief."
- Donald Knuth
This is a classic case of premature optimization. The performance difference is negligible.
I'd say that in general you're better off constructing a string and echoing it at the end, but because it leads to cleaner code (side effects are bad, mkay?) not because of performance.
If you optimize like this, from the ground up, you're at risk of obfuscating your code for no perceptible benefit. If you really want your script to be as fast as possible then profile it to find out where the real bottlenecks are.
Someone else mentioned that using string concatenation instead of an immediate echo will use more memory. This isn't true unless the size of the string exceeds the size of output buffer. In any case to actually echo immediately you'd need to call flush() (perhaps preceded by ob_flush()) which adds the overhead of a function call*. The web server may still keep its own buffer which would thwart this anyway.
If you're spending a lot of time on each iteration of the loop then it may make sense to echo and flush early so the user isn't kept waiting for the next iteration, but that would be an entirely different question.
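For the echo-and-flush-early case, a minimal sketch might look like the following. The function name emit_progressively() and the sample chunks are hypothetical; whether the browser actually receives each chunk immediately still depends on the web server's own buffering, as noted above.

```php
<?php
// Emit each chunk as soon as it is ready instead of collecting everything first.
function emit_progressively(iterable $chunks): int
{
    $sent = 0;
    foreach ($chunks as $chunk) {
        echo $chunk;
        // Push PHP's own output buffer first (if one is active), then the SAPI layer.
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
        $sent++;
    }
    return $sent;
}

emit_progressively(["step 1 done\n", "step 2 done\n"]);
```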
Also, the size of the PHP file has no effect on the user - it may take marginally longer to parse but that would be negated by using an opcode cache like APC anyway.
To sum up, while it may be marginally faster to echo each iteration, depending on circumstance, it makes the code harder to maintain (think Wordpress) and it's most likely that your time for optimization would be better spent elsewhere.
* If you're genuinely worried about this level of optimization then a function call isn't to be sniffed at. Flushing in pieces also implies extra protocol overhead.
The size of your PHP file does not increase the size of the download by the user. The output of the PHP file is all that matters to the user.
Generally, you want to do the first option: echo as soon as you have the data. Assuming you are not using output buffering, this means that the user can stream the data while your PHP script is still executing.
The user does not download the PHP file, but only its output, so the second loop has no effect on the user's download size.
It's best not to worry about small optimizations, but instead focus on quickly delivering working software. However, if you want to improve the performance of your site, Yahoo! has done some excellent research: developer.yahoo.com/performance/rules.html
The code you identify as "loop 2" wouldn't make the file any larger for users to download. String concatenation is faster than calling echo repeatedly, so I'd go with loop 2. For only 5 iterations of a loop I don't think it really matters all that much.
Overall, there are a lot of other areas to focus on such as compiling PHP instead of running it as a scripted language.
http://phplens.com/lens/php-book/optimizing-debugging-php.php
Your first example would, in theory, be fastest. Because your provided code is so extremely simplistic, I doubt any performance increase over your second example would be noticed or even useful.
In your first example the only variable PHP needs to initialize and utilize is $i.
In your second example PHP must first create an empty string variable. Then create the loop and its variable, $i. Then append the text to $echostr and then finally echo $echostr.

Best approach for running an "endless" process monitoring MySQL?

I have a process that has to be run against certain things, and it isn't suitable to run at the user's end (15+ seconds to process), so I considered using a cron job, but this is also unsuitable because it will create a backlog. I have narrowed my options down to either running an endless process that monitors for MySQL changes, or configuring MySQL to trigger the script when it detects a change. The latter is not something I want to get into unless it's my only option, which leaves me with the "endless" monitoring option.
The sort of thing I'm considering with PHP is:
while (true) {
    $db->query('SELECT * FROM database');
    while ($row = $db->fetch_assoc()) {
        // do the stuff here
    }
    sleep(5);
}
and then running it via the command line. Now this is theoretically sound but in practice it isn't doing as well as I hoped, using more resources than I would expect (but not out of my range, just not what I'm aiming for optimally). So my questions are as follows:
Is PHP the wrong language to do this in? PHP is what I work with, but I understand that there are times when it's the wrong choice and I think maybe this is. If it is, what language should I use?
Is there a better approach that I haven't considered and that isn't any of the ideas I have listed?
If PHP is the correct option, how can I optimise the code I posted? Is there a better method than sleeping for 5 seconds after each completed operation?
Thanks in advance! I'm open to any ideas as long as they're not too far out there. I'm running my own server with free rein, so there's no theoretical limit on what I can do.
I recommend moving the loop out into a shell script and then executing a new PHP process for every iteration. This way PHP will never use unbounded resources (even if there is a memory/connection leak somewhere) since the process is terminated on each iteration. Something like the following should be fine (Bash):
while true; do
    php /path/to/your/script.php 2>&1 | logger ...(logger options)
    sleep 5
done
I've found this approach to be far more robust for long-running scripts in PHP, probably because this is very like the way PHP operates when run as a CGI script.
You should always work with the language you're most familiar with. If this is PHP, then it's not a wrong choice.
Disconnect from the database before sleeping. This way your script won't keep a connection reserved, and it will work fine even after database restart.
Free mysql result after using it. Always check for error conditions in daemonized processes, and deal with them appropriately.
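A rough sketch of the disconnect-before-sleeping advice, assuming nothing about the actual database class: run_poller() and its three callables are hypothetical stand-ins for real connect, query and close code. The $maxCycles cap exists only so the example terminates; a real daemon would use a very large value or loop until told to stop.

```php
<?php
// Hypothetical poller: $connect returns a connection, $process does the work,
// $disconnect closes it. All three are stand-ins for real database code.
function run_poller(callable $connect, callable $process, callable $disconnect,
                    int $sleepSeconds, int $maxCycles): int
{
    $completed = 0;
    for ($cycle = 0; $cycle < $maxCycles; $cycle++) {
        try {
            $conn = $connect();          // fresh connection each cycle
            try {
                $process($conn);
                $completed++;
            } finally {
                $disconnect($conn);      // never hold the connection while sleeping
            }
        } catch (Throwable $e) {
            // Log and carry on; the loop should survive a database restart.
            error_log('poll cycle failed: ' . $e->getMessage());
        }
        sleep($sleepSeconds);
    }
    return $completed;
}

// Demonstration with dummy callables and no sleep.
$done = run_poller(
    fn() => new ArrayObject(),
    function ($conn) { /* query and handle rows here */ },
    function ($conn) { /* close the connection here */ },
    0,
    3
);
echo "Completed $done cycles\n";
```

Catching Throwable inside the loop is what lets one failed cycle be logged without killing the whole process.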
PHP might be the wrong language as it's really designed for serving requests on an ad-hoc basis, rather than creating long-running daemons. (It was originally created as a preprocessor language, then later on came into general use as a web application language.)
Something like Python might work better for your needs; it's a little more naturally designed for "daemon-like" programs.
That said, it is possible to do what you want in PHP.
What kind of problems are you experiencing?
I don't know about the database class you have there in $db, but it could be leaking memory.
Furthermore, I would suggest closing all your connections and unsetting your variables where necessary at the end of the loop, then reopening at the beginning.
If it's only a 5-second sleep, maybe do that only on every 10th iteration or so; you can use a counter for that.
With these points considered, there's nothing wrong with this approach.

functions vs repeated code

I am writing some PHP code to create PDFs using the FPDF library, and I basically use the same 4 lines of code to print every line of the document. I was wondering which is more efficient: repeating these 4 lines over and over, or making them into a function? I'm curious because it feels like a function would have a larger overhead because the function would only be 4 lines long.
The code I am questioning looks like this:
$pdf->checkIfPageBreakNeeded($lineheight * 2, true);
$text = ' label';
$pdf->MultiCell(0, $lineheight, $text, 1, 'L', 1);
$text = $valueFromForm;
$pdf->MultiCell(0, $lineheight, $text, 1, 'L');
$pdf->Ln();
This should answer it:
http://en.wikipedia.org/wiki/Don%27t_repeat_yourself
and
http://www.codinghorror.com/blog/2007/03/curlys-law-do-one-thing.html
Curly's Law, Do One Thing, is reflected in several core principles of modern software development:
Don't Repeat Yourself
If you have more than one way to express the same thing, at some point the two or three different representations will most likely fall out of step with each other. Even if they don't, you're guaranteeing yourself the headache of maintaining them in parallel whenever a change occurs. And change will occur. Don't repeat yourself is important if you want flexible and maintainable software.
Once and Only Once
Each and every declaration of behavior should occur once, and only once. This is one of the main goals, if not the main goal, when refactoring code. The design goal is to eliminate duplicated declarations of behavior, typically by merging them or replacing multiple similar implementations with a unifying abstraction.
Single Point of Truth
Repetition leads to inconsistency and code that is subtly broken, because you changed only some repetitions when you needed to change all of them. Often, it also means that you haven't properly thought through the organization of your code. Any time you see duplicate code, that's a danger sign. Complexity is a cost; don't pay it twice.
Rather than asking yourself which is more efficient you should instead ask yourself which is more maintainable.
Writing a function is far more maintainable.
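As a sketch of what that function could look like for the code in the question: print_labeled_row() is a hypothetical name, and $pdf is assumed to be the asker's own FPDF subclass, since checkIfPageBreakNeeded() is taken from the question rather than from FPDF itself.

```php
<?php
// Hypothetical helper wrapping the repeated calls from the question.
// $pdf is assumed to be an FPDF subclass that provides checkIfPageBreakNeeded().
function print_labeled_row($pdf, float $lineheight, string $label, string $value): void
{
    $pdf->checkIfPageBreakNeeded($lineheight * 2, true);
    $pdf->MultiCell(0, $lineheight, $label, 1, 'L', 1); // filled label cell
    $pdf->MultiCell(0, $lineheight, $value, 1, 'L');    // value cell
    $pdf->Ln();
}
```

Each row in the document then becomes a single call such as print_labeled_row($pdf, $lineheight, ' label', $valueFromForm);, and a layout change only has to be made in one place.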
"I'm curious because it feels like a function would have a larger overhead because the function would only be 4 lines long."
This is where spaghetti comes from.
Definitely encapsulate it into a function and call it. The overhead that you fear is the worst kind of premature optimization.
DRY - Don't Repeat Yourself.
Make it a function. Function call overhead is pretty small these days. In general you'll be able to save far more time by finding better high-level algorithms than fiddling with such low-level details. And making and keeping it correct is far easier with such a function. For what shall it profit a man, if he shall gain a little speed, and lose his program's correctness?
A function is certainly preferable, especially if you have to go back later to make a change.
Don't worry about overhead; worry about yourself, a year in the future, trying to debug this.
In the light of the above, Don't Repeat Yourself and make a tiny function.
In addition to all the valuable answers about the far more important topic of maintainability; I'd like to add a little something on the question of overhead.
I don't understand why you fear that a four line function would have a greater overhead.
In a compiled language, a good compiler would probably be able to inline it anyway, if appropriate.
In an interpreted language (such as PHP), every copy of the repeated code has to be parsed and compiled when the script is loaded. To me, that suggests that repetition might carry an even greater overhead than a function call.
Worrying about function call overhead here is ghastly premature optimisation. In matters like this, the only way to really know which is faster, is to profile it.
Make it work, make it right, make it fast. In that order.
The overhead is actually very small and won't make a big difference in your application.
Would you rather accept this small overhead and have a program that is easier to maintain, or save a mere millisecond but spend hours making small changes that are repeated all over the code?
If you ask me or any other developer out there, we would definitely choose the first option.
So go with the function. You may not be maintaining the code today, but when you do, you will hate yourself for trying to save those mere milliseconds.

Reusing MySQL results

I have a somewhat theoretical question: I'm designing my own CMS/app framework (as many PHP programmers at various levels have done before, and always will), either to build a production-ready solution or to develop various modules/plugins that I'll use later.
Anyway, I'm thinking of gathering the SQL queries from the whole app and then running them in one place:
index.php:
<?php
include ('latestposts.php');
include ('sidebar.php');
?>
latestposts.php:
<?php
function gather_data ($arg){ $sql= ""; }
function draw ($data) {...}
?>
sidebar.php:
<?php
function gather_data ($arg){ $sql= ""; }
function draw ($data) {...}
?>
Now, while the whole module system is yet to be figured out, the idea is already floating somewhere in my brain. What I'm wondering is whether I can first load all the gather_data functions, then run the SQL, and then run the draw functions, and whether I can reuse results along the way.
If, for example, $sql is SELECT * FROM POSTS LIMIT 10 and $sql2 is SELECT * FROM POSTS LIMIT 5, is it possible to program PHP to see: "ah, it's the same SQL, I'll call it just once and reuse the first 5 rows"?
Or is it possible to add this behavior to some ORM?
However, as the tags say, this is still just an idea in progress. If it proves easy to accomplish, I will post more questions about how :)
So, basically: is it possible, and does it make sense? If both are yes, then... any ideas how?
Don't get me wrong, that sounds like a plausible idea and you can probably get it running. But I wonder if it is really going to be beneficial. Will it cause a system to be faster? Give you more control? Make development easier?
I would just look into using (or building) a system using well practiced MVC style coding standards, build a good DB structure, and tweak the heck out of Apache (or use something like Lighttpd). You will have a lot more widespread acceptance of your code if you ever decide to make it open source, and if you ever need a hand with it another developer could step right in and pick up the keyboard.
Also, check out query caching in MySQL. You will see a similar (though not one-to-one) benefit from caching your query results server-side with regard to your query example. Even better, the cache is stored in server memory, so the PHP/MySQL overhead is dropped AND you don't have to code it.
All of that aside, I do think it is possible. =)
Generally speaking, such a cache system can generate significant time savings, but at the cost of memory and complexity. The more results you want to keep, the more memory it will take; and there's no guarantee that your results will ever be used again, particularly the larger result sets.
Second, there are certain queries that should never be cached, or that should be run again even if they're in the cache. For the most part, only SELECT and SHOW queries can be cached effectively, but you need to worry about invalidating them when you modify the underlying data. Even in the same pageview, you might find yourself working around your own cache system on occasion.
Third, this kind of problem has already been solved several times. First, consider turning on the MySQL query cache. Most of the time, it will speed things up a bit without requiring any code changes on your end. However, it's a bit aggressive about invalidating entries, so you could gain some performance at a higher level.
If you need another level, consider memcached. You'll have to store and invalidate entries manually, but it can store results across page views (where you'll really find the performance benefit), and will let unused entries expire before running out of memory.
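To show the basic shape of a per-request result cache like the one discussed above, here is a minimal sketch. QueryCache and its runner callable are hypothetical; the runner stands in for whatever actually talks to MySQL. Note that this version only recognises identical SQL strings, so the sidebar asks for the same 10-row query and slices off the first 5 rows itself.

```php
<?php
// Hypothetical per-request cache: identical SQL strings are run only once.
final class QueryCache
{
    private array $results = [];
    private $runner;

    // $runner is any callable that takes an SQL string and returns an array of rows.
    public function __construct(callable $runner)
    {
        $this->runner = $runner;
    }

    public function fetchAll(string $sql): array
    {
        if (!array_key_exists($sql, $this->results)) {
            $this->results[$sql] = ($this->runner)($sql);
        }
        return $this->results[$sql];
    }

    // Call after any INSERT/UPDATE/DELETE so stale rows are not reused.
    public function invalidate(): void
    {
        $this->results = [];
    }
}

// Stand-in runner that counts how often the "database" is hit.
$hits = 0;
$cache = new QueryCache(function (string $sql) use (&$hits): array {
    $hits++;
    return array_map(fn($i) => ['id' => $i], range(1, 10));
});

$latest  = $cache->fetchAll('SELECT * FROM POSTS LIMIT 10');
$sidebar = array_slice($cache->fetchAll('SELECT * FROM POSTS LIMIT 10'), 0, 5);
echo "Database hits: $hits\n";
```

Anything smarter, such as recognising that LIMIT 5 is a prefix of LIMIT 10, would mean parsing the SQL, which is where the complexity cost mentioned above starts to bite.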
