PHP's ability to handle recursion

I've seen in a few places lately people saying that PHP has a poor capacity for recursion. Recently I wrote a recursive PHP function for graph traversal and found it to be very slow compared to Java. I don't know whether this is because of PHP's capacity for recursion or because PHP is slower than Java in general.
Some googling revealed this (http://bugs.php.net/bug.php?id=1901)
[7 Aug 1999 12:25pm UTC] zeev at cvs dot php dot net
PHP 4.0 (Zend) uses the stack for intensive data, rather than using the heap. That means that its tolerance to recursive functions is significantly lower than that of other languages. It's relatively easy to tell Zend not to use the stack for this data, and use the heap instead - which would greatly increase the number of recursive functions possible - at the price of reduced speed. If you're interested in such a setting, let me know, we may add a compile-time switch.
What does it mean to say that PHP uses the stack for intensive data? Does PHP not set up a run-time stack? Also, is it true in general that recursion in PHP is much slower than in other languages? And by how much?
Thanks!

Okay, I'll take a stab at it.
First: "The stack" is the area used for function call tracking in standard C/C++ type programs. It's a place the operating system and programming language conventions define in memory, and it's treated like a stack (the data structure). When you call a C function fibbonaci(int i) then it places the variable i, and the return address of the function that was already busy calling it, on the stack. That takes some memory. When it's done with that function call, the memory is available again. The stack is of finite size. If you are storing very massive variables on it and make many many recursive calls, then you may run out of room. Right?
So.....
Apparently Zend has two ways to allocate data: on the heap (a more-general area to request memory from) and on the stack, and the stack is a more efficient place to have them because of the way things are programmed. (I don't know why, but I can guess. There may be very-low-level caching concerns - I expect the stack is likelier to be in L1 or L2 cache than arbitrary memory in the heap would be, because the CPU is very likely to be using that area very frequently - every time you call a function, in fact. There may also be allocation overhead for heap data access.)
"Intensive" data in this context, I believe, refers to data which is very likely to be used very soon or very often. It would make sense to use the speedier stack-based allocation for these variables. What sort of variables are you certain to be using very quickly? Well, how about parameters to a function? You're very likely to use those: otherwise why would you be bothering to pass them around? They're also probably likelier to be small data items (references to massive data structures rather than massive data structures themselves - because that gives you copying overhead, among other things). So the stack probably makes sense for storing PHP function parameters for most PHP programmers... but it fails sooner in recursion.
Hopefully that answers at least "what does this mean?". For your recursion performance question: Go benchmark it yourself; it probably depends on what sort of recursion you're trying to do.
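If you want a starting point for that benchmark, here is a minimal sketch (the function and the input size are purely illustrative): the recursive version pays for one call frame per step, while the iterative version keeps constant stack usage.

<?php
// Naive recursive version: one stack frame per call.
function fib_recursive($n) {
    return $n < 2 ? $n : fib_recursive($n - 1) + fib_recursive($n - 2);
}

// Iterative version: constant stack usage.
function fib_iterative($n) {
    $a = 0; $b = 1;
    for ($i = 0; $i < $n; $i++) {
        list($a, $b) = array($b, $a + $b);
    }
    return $a;
}

$start = microtime(true);
fib_recursive(25);
echo "recursive: ", microtime(true) - $start, "s\n";

$start = microtime(true);
fib_iterative(25);
echo "iterative: ", microtime(true) - $start, "s\n";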

At a guess, I'd say that your problem lies elsewhere than the recursion itself. For many things, Java is a lot faster than PHP. There are, sort of, ways to improve PHP's performance.
However, the PHP recursion limitation results in PHP running out of stack and crashing, with the dreaded 'stack overflow' message (pun sort of intended). At this point, your program ceases to execute.
If PHP is using a dynamic stack, you could see some (mild) slowdown due to the time it takes to realloc the stack to a larger block of memory.
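If the stack depth itself turns out to be your problem, a common workaround is to manage your own stack on the heap and make the traversal iterative. A rough sketch for a graph traversal like yours, assuming a hypothetical adjacency-list array $graph:

<?php
// $graph maps a node name to an array of neighbour names (hypothetical data).
$graph = array(
    'a' => array('b', 'c'),
    'b' => array('d'),
    'c' => array('d'),
    'd' => array(),
);

function dfs($graph, $start) {
    $stack = array($start);    // explicit stack, lives on the heap
    $visited = array();
    while (!empty($stack)) {
        $node = array_pop($stack);
        if (isset($visited[$node])) {
            continue;          // already seen, skip
        }
        $visited[$node] = true;
        foreach ($graph[$node] as $neighbour) {
            $stack[] = $neighbour;
        }
    }
    return array_keys($visited);
}

print_r(dfs($graph, 'a'));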
Anyway, I'd need to know a bit more about what you're doing to pinpoint your performance problem, which is something I do for a living...

Related

Does PHP silently optimize consecutive fseek-commands into one fseek command?

I am running Windows 7 - 64 bit, with the latest XAMPP version that has a 32-bit PHP version.
On testing http://php.net/manual/en/function.fseek.php#112647
for a very big file (bigger than PHP_INT_MAX, 2147483647), I'm now pretty sure that consecutive fseeks are summed up before being executed on the file pointer.
I have two questions:
Can I prevent this summing-up by reasonable means (or only with the workaround mentioned in the link above)?
Is this aggregation happening in PHP (as I assume, though I don't know where in PHP) or in Windows 7?
Answering myself: Trying two workarounds with multiple seeks didn't work on my system. Instead they put the file pointer at different positions below PHP_INT_MAX. (32-bit PHP can only seek up to PHP_INT_MAX + 8192. Reading from there on is still possible, but I don't know how far.)
Therefore the question is obsolete for my specific case, as 32-bit PHP can only seek up to PHP_INT_MAX + 8192, whatever you do. I leave the question open, because two people voted it up and might be interested in a general answer.
I filed a bug report here:
https://bugs.php.net/bug.php?id=69213
Result: With a 64-bit PHP build it might work, but I didn't try it.
It doesn't. It actually does something even dumber. Here's a snippet from the PHP source code:
switch (whence) {
    case SEEK_CUR:
        offset = stream->position + offset;
        whence = SEEK_SET;
        break;
}
This is in the guts of the implementation for PHP's fseek. What's happening here is: if you tell PHP to seek from the current position, it translates that to an "equivalent" seek from the start of the file. This only works when that offset computation doesn't overflow; if it does, well, offset is a signed integer, so that's undefined behavior.
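To make the failure mode concrete, here is roughly what happens on a 32-bit build, where PHP_INT_MAX is 2147483647 (the file name is hypothetical):

<?php
$fp = fopen('very_big_file.bin', 'rb');  // hypothetical file larger than 2 GB

fseek($fp, PHP_INT_MAX, SEEK_SET);  // file position: 2147483647
fseek($fp, 1000, SEEK_CUR);         // internally rewritten to SEEK_SET with
                                    // offset = 2147483647 + 1000, which
                                    // overflows a signed 32-bit int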
And, okay, this is there because PHP buffers streams internally, so they need to do something. But it doesn't have to be this way.
You're probably best off trying to do your work in a language that actually does what you tell it to.
If aggregation were to happen, it would likely have to be done as an opcode optimization or occur at a low level via a buffer.
I can answer at the low level. fseek() in PHP is implemented using PHP streams. It is declared in ext/standard/file.h and defined in the corresponding .c file. Its implementation calls php_stream_seek(), which calls through to _php_stream_seek() in streams.c. The low-level implementation is handled through the plain streams wrapper, in which case seek calls through to either zend_seek or zend_fseek, which in turn map through to the 32-bit or 64-bit C seek calls (_seeki64 and friends).
So... if any aggregation happens, it would have to be in the opcode optimizations or even further below, in the OS or hardware. Hard disks implement out-of-order fetching to reduce head seek distances, and filesystem buffering might be able to eliminate seeks that have no side effects. If you are concerned about disk read time, the first of these handles it automatically. If you are concerned about thrashing memory (seeking great distances unnecessarily in the buffer), you might consider another approach. See: http://www.cs.iit.edu/~cs561/cs450/disksched/disksched.html for more info on how disks avoid wasting seek time.
I hope this helps.

PHP - is it possible to instantiate too many objects

I have a PHP program that requires me to instantiate 1800 objects, and each object is associated with 7-10 arrays filled with historical data (about 500 records per array). This program is run by cron every 5 minutes, not by users.
Anyways, the designer of the program says instantiating 1800 objects at once is required, and is not something we can change. My question is whether or not instantiating this many objects alone is a "code smell", and if having this much data in memory (arrays consisting of a total of 9,000,000 records), is something that would be hard for PHP to handle (assuming adequate memory is available on the host).
Thanks
Classes and objects are mostly a conceptual tool used to organise code in a logical fashion that more or less applies to "things" in the real world. There's no significant difference for the computer when executing code written procedurally vs. object oriented code. The OO code may add a little bit of overhead compared to code written in the most optimal procedural way, but you will hardly ever notice this difference. 1800 objects can be created and destroyed within milliseconds, repeatedly. They by themselves are not a problem.
The question is: does writing it this way in OO significantly help code organisation? If done properly, likely yes. Is there any other realistic way to write the same algorithm in a procedural way which is significantly faster in execution? Would this other way be as logically structured, understandable and maintainable? Would the difference in code level quality be worth the difference in performance? Is it really too slow with its 1800 objects? Are the objects the bottleneck (likely: no) or is the overall algorithm and approach the bottleneck?
In other words: there's no reason to worry about 1800 objects unless you have a clear indication that they are a bottleneck, which they likely are not in and of themselves. Storing the same data in memory without an object wrapper will not typically significantly reduce any resource usage.
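If you'd rather measure than guess, a throwaway benchmark along these lines will tell you what the objects actually cost on your host (the Series class and the record counts are made up to mimic your setup):

<?php
class Series {
    public $data;
    public function __construct($records) {
        $this->data = $records;
    }
}

$start  = microtime(true);
$before = memory_get_usage();

$objects = array();
for ($i = 0; $i < 1800; $i++) {
    // one array of 500 dummy records per object;
    // scale up to the 7-10 arrays your objects really hold
    $objects[$i] = new Series(range(1, 500));
}

echo "time:   ", microtime(true) - $start, "s\n";
echo "memory: ", memory_get_usage() - $before, " bytes\n";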
It would be slow for an application to initialize all those objects just so your system can run. I know why you would do it, as I have done it before: I would load a lookup object to avoid tapping into the DB every time I need a lookup.
However, 1800 objects with 500-record arrays is pretty heavy, and it defeats the purpose of avoiding the database. I'm aware that memory will be available, but considering that this is just the load-up, before the actual crunching, I'm not sure the 5-minute cron will finish in time.
I suggest benchmarking this with a profiler to see how much memory is used and how much time elapses before running this in production.
There isn't a (practically relevant) limit on the number of objects itself. As long as there is enough physical RAM, you can always increase the memory limit. However, from an architectural standpoint it might be very unwise to keep all of this in RAM for the whole execution time when it is not actually needed. Because of their highly dynamic nature, PHP arrays are quite expensive, so this could be a massive performance hit. However, without any details or profiling it is not possible to give you a definitive answer.
But admittedly, it seems quite odd that so many objects are needed. A DBMS might be an alternative for handling this amount of data.

Floating Point Arithmetic, C/C++ the way to go?

I'm creating a web application that does some very heavy floating point arithmetic calculations, and lots of them! I've been reading a lot and have read that you can write C (and C++) functions and call them from within PHP. I was wondering if I'd notice a speed increase by doing so?
I would like to do it this way even if it's only a second difference, unless it's actually slower.
It all depends on the actual number of calculations you are doing. If you have thousands of calculations to do, then it will certainly be worthwhile to write an extension to handle it for you. In particular, a lot of data is where PHP really fails: its memory manager can't handle a lot of objects or large arrays (based on experience working with such data).
If the algorithm isn't too difficult you may wish to write it in PHP first anyway. This gives you a good reference speed but more importantly it'll help define exactly what API you need to implement in a module.
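As a sketch of that approach: write the PHP version with a small timing harness so you have a baseline to beat later. The calculate() function below is a purely hypothetical stand-in for your real formula:

<?php
// Hypothetical stand-in for the heavy floating point routine.
function calculate($a, $b, $c, $d, $e, $f) {
    return sqrt($a * $a + $b * $b) * exp($c) + pow($d, $e) / ($f + 1.0);
}

$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    calculate(1.1, 2.2, 3.3, 4.4, 5.5, 6.6);
}
echo "100000 calls: ", microtime(true) - $start, "s\n";

If a C extension can't beat that baseline by a margin that matters for your page load, it isn't worth the maintenance cost.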
Update, regarding "75-100 calculations with 6 numbers":
If you are doing this only once per page load, I suspect it won't be a significant part of the overall load time (it depends what else you do, of course). If you are calling this function many times, then yes, even 75 ops might be slow -- however, since you use only 6 variables, perhaps the optimizer will do a good job (whereas with 100 variables it's pretty much guaranteed not to).
Check out SWIG.
SWIG is a way to generate PHP (and other language) modules from your C sources rather easily.

PHP Optimization - Reducing memory usage

I'm running Eclipse in Linux and I was told I could use Xdebug to optimize my program. I use a combination algorithm in my script that takes too long to run.
I am just asking for a starting point to debug this. I know how to do the basics...break points, conditional break points, start, stop, step over, etc... but I want to learn more advanced techniques so I can write better, optimized code.
The first step is to know how to calculate the asymptotic memory usage, which means how much the memory grows as the problem gets bigger. This is done by saying that one recursion takes up X bytes (X being a constant; the easiest is to set it to 1). Then you write down the recurrence, i.e., in what manner the function calls itself or loops, and try to conclude how much the memory grows (is it quadratic in the problem size, linear, or maybe less?)
This is taught in elementary computer science classes at universities, since it's really useful when assessing how efficient an algorithm is. The exact method is hard to describe in a simple forum post, so I recommend that you pick up a book on algorithms (I recommend "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein, MIT Press).
But if you don't have a clue about this type of work, start by using memory_get_usage() and echoing how much memory you're using in your loop/recursion. This can give you a hint about where the problem is. Try to reduce the amount of things you keep in memory. Throw away everything you don't need (for example, don't build up a giant array of all the data if you can boil it down to intermediate values earlier).
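A minimal sketch of that kind of instrumentation (the loop body is just a placeholder for your real work):

<?php
$results = array();
for ($i = 0; $i < 10000; $i++) {
    $results[] = str_repeat('x', 100);  // placeholder for your real work

    if ($i % 1000 == 0) {
        // watch how memory grows with the problem size
        echo "iteration $i: ", memory_get_usage(), " bytes\n";
    }
}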

How to use each bit of memory sensitively in PHP

Can anybody give me an introduction to programming PHP efficiently, so that my program minimizes memory usage and generates its results using the minimum of memory?
Based on how I read your question, I think you may be barking up the wrong tree with PHP. It was never designed for a low memory overhead.
If you just want to be as efficient as possible, then look at the other answers. Remember that every single variable costs a fair bit of memory, so use only what you have to, and let the garbage collector work. Make sure that you only declare variables in a local scope so they can get GC'd when the program leaves that scope. Objects will be more expensive than scalar variables. But the biggest common abuse I see are multiple copies of data. If you have a large array, operate directly on it rather than copying it (It may be less CPU efficient, but it should be more memory efficient).
If you are looking to run it in a low memory environment, I'd suggest finding a different language to use. PHP is nice because it manages everything for you (with respect to variables). But that type coercion and flexibility comes at a price (speed and memory usage). Each variable requires a lot of metadata stored with it. An int that takes 4 or 8 bytes to store in C will likely take more than 64 bytes in PHP (because of all of the "tracking" information associated with it, such as type, name, scoping information, etc). That overhead is normally seen as OK, since PHP was not designed for large memory loads. So it's a trade-off: more memory used for easier programming. But if you have tight memory constraints, I'd suggest moving to a different language...
It's difficult to give advice with so little information on what you're trying to do and why memory utilization is a problem. In the common scenarios (web servers that serve many requests), memory is not a limiting factor, and it's preferable to serve the requests as fast as possible, even if this means sacrificing memory for speed.
However, the following general guidelines apply:
Unset your variables as soon as you don't need them. In a well-written program this won't have a big impact, however, as variables going out of scope have the same effect.
In long-running scripts with lots of variables holding circular references, and if using PHP 5.3, try calling the garbage collector explicitly at certain points (see the sketch after this list).
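A minimal sketch of both guidelines together (load_data() is a hypothetical stand-in for an expensive data source; gc_collect_cycles() needs PHP 5.3+):

<?php
function load_data($batch) {
    return array_fill(0, 50000, $batch);  // stand-in for a real data source
}

function process_batch($batch) {
    $big = load_data($batch);
    $summary = count($big);
    unset($big);               // release the large array as soon as possible
    return $summary;
}

for ($batch = 0; $batch < 100; $batch++) {
    process_batch($batch);
    if ($batch % 10 == 0) {
        gc_collect_cycles();   // PHP 5.3+: reclaim circular references
    }
}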
First of all: Don't try to optimize memory usage by using references. PHP is smart enough not to copy the contents of a variable if you do something like this:
$array = array(1, 2, 3, 4, 5);
$var = $array;   // no copy of the array contents happens here yet
PHP will only copy the contents of the variable when you write to it. Using references all the time because you think they will save you copying the variable contents can often backfire ;)
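You can watch this copy-on-write behaviour yourself with memory_get_usage(); a rough sketch:

<?php
$array = array_fill(0, 100000, 42);

$before = memory_get_usage();
$var = $array;                 // assignment alone: no copy yet
echo memory_get_usage() - $before, " bytes after assignment\n";  // close to 0

$var[0] = 1;                   // the first write triggers the real copy
echo memory_get_usage() - $before, " bytes after write\n";       // roughly the array's size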
But I think your question is hard to answer unless you are more precise.
For example, if you are working with files, it can be advisable not to always file_get_contents() the whole file, but to use the fopen()/fread() family of functions to load only small parts of the file at once, or even to skip whole chunks.
Or if you are working with strings make use of functions which return a string offset instead of the rest of a string (e.g. strcspn instead of strpbrk) when possible.
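For the file case, a minimal sketch of chunked processing (the file name is hypothetical):

<?php
$fp = fopen('large.log', 'rb');   // hypothetical large input file
while (!feof($fp)) {
    $chunk = fread($fp, 8192);    // only 8 KB in memory at a time
    // ... process $chunk here ...
}
fclose($fp);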
