Best practices to stop memory leaks and improve performance - PHP

To put it simply, I am a fairly new PHP coder, and I was wondering if anyone could guide me towards the best ways to improve performance in code as well as stopping those pesky memory leaks. My host is one of those that doesn't have APC or the like installed, so it would all have to be hand coded -_-

I don't think ordinary memory leaks (like forgetting to dispose of objects or strings) are common in PHP, but resource leaks in general are. I've had issues with:
Database connections -- you should really call pg_close/mysql_close/etc. when you're done with the connection. Though I think PHP's connection pooling mitigates this (but it can have problems of its own).
Images -- if you use the GD2 extension to open or create images, you need to call imagedestroy() on them, because otherwise they'll occupy memory for as long as the script runs. And images tend to be big in terms of data size.
Note that if your scripts run as pure CGI (no HTTP server modules), then the resources will effectively be cleaned up when the script exits. However, there may still be memory issues during the script's runtime, especially in the case of images, where it's not uncommon to perform many manipulations in a single script execution.
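A minimal sketch of that explicit cleanup, assuming the mysqli and GD extensions are available (the credentials and file path are hypothetical; mysqli stands in here because the old mysql_* functions were removed in PHP 7):

```php
<?php
// Hypothetical connection details.
$db = mysqli_connect('localhost', 'user', 'pass', 'mydb');

$img = imagecreatetruecolor(800, 600);                  // allocates a pixel buffer
$white = imagecolorallocate($img, 255, 255, 255);
imagefilledrectangle($img, 0, 0, 799, 599, $white);
imagepng($img, '/tmp/out.png');

imagedestroy($img);   // free the image buffer as soon as it is no longer needed
mysqli_close($db);    // release the connection instead of waiting for script end
```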

In general, PHP scripts can't leak memory at the process level: the PHP runtime manages all memory for its scripts. The script itself may leak memory, but this is reclaimed when the PHP process ends. Since PHP is mainly used for processing HTTP requests, and these generally run for a very short time, leaking a bit of memory along the way is a non-issue. Memory leaks should only really concern you if you use PHP for non-HTTP tasks. Performance should be a bigger concern for you than memory usage; use a tool such as Xdebug to profile your code.
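Before reaching for a full Xdebug profile, a crude timing harness can already point at the slow section. A sketch, where the loop body stands in for whatever code you suspect:

```php
<?php
// Crude wall-clock timing; the work below is a stand-in for real code.
$start = microtime(true);

$rows = [];
for ($i = 0; $i < 100000; $i++) {
    $rows[] = str_repeat('x', 10);
}

printf("elapsed: %.1f ms\n", (microtime(true) - $start) * 1000);
```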

Related

Can I share a large array in memory between PHP processes?

I use PHP to do a lot of data processing (realizing I'm probably pushing into territory where I should be using other languages and/or techniques).
I'm doing entity extraction with a PHP process that loads an array containing ngrams to look for into memory. That array uses 3GB of memory and takes about 20 seconds to load each time I launch a process. I generate it once locally on the machine and each process loads it from a .json file. Each process then tokenizes the text it's processing and does an array_intersect between these two arrays to extract entities.
Is there any way to preload this into memory on the machine that is running all these processes and then share the resource across all the processes?
Since it's probably not possible with PHP: What type of languages/methods should I be researching to do this sort of entity extraction more efficiently?
If the array never gets modified after it's loaded, then you could use pcntl_fork() and fork off a bunch of copies of the script. With copy-on-write semantics, they'd all be reading from the exact same memory copy of the array.
However, as soon as the array gets modified, then you'll pay a huge penalty as the array gets copied into each forked child's memory space. This would be especially true if any of the scripts finish their run early - they'd shut down, that PHP process starts shutdown cleanup, and that'd count as a write on the array's memory space, causing the copying.
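A rough sketch of that fork-after-load approach, assuming the pcntl extension (CLI only) and a hypothetical ngram file path; note that even refcount updates can dirty copy-on-write pages, so the sharing is not perfect:

```php
<?php
// Load the big array once, then fork; children read it via copy-on-write.
$ngrams = json_decode(file_get_contents('/data/ngrams.json'), true);

$pids = [];
for ($i = 0; $i < 4; $i++) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // Child: as long as nothing writes to $ngrams, no physical copy is made.
        $tokens = ['example', 'tokens'];              // stand-in for real input
        $found  = array_intersect($tokens, $ngrams);  // entity extraction step
        exit(0);
    }
    $pids[] = $pid;
}

foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);                     // reap each child
}
```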
In your case, the best way of sharing might be read-only mmap access.
I don't know if this is possible in PHP. A lot of languages will allow you to mmap a file into memory - and your operating system will be smart enough to realize that read-only maps can be shared. Also, if you don't need all of it, the operating system can reclaim the memory, and load it again from disk as necessary. In fact, it may even allow you to map more memory than you physically have.
mmap is really elegant. Nevertheless, dealing with such mapped data in PHP will likely be a pain, and slow. In general PHP is slow: in benchmarks, it is common to see PHP come in at 40-50 times the runtime of a good C program. This is much worse than e.g. Java, where a good Java program is only about twice as slow as highly optimized C; there it may pay off to have Java's powerful development tools, as opposed to having to debug low-level C code. But PHP does not have any key benefit: it is neither elegant to write, nor does it have a superior toolchain, nor is it fast...
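PHP has no core mmap() wrapper, but the shmop extension offers raw shared memory, which gets part of the way there. A sketch under that assumption (the key derivation, size, and path are hypothetical, and each reader still pays for decoding the blob into its own PHP array):

```php
<?php
// Writer: publish the raw JSON once into a System V shared memory segment.
$key  = ftok(__FILE__, 'n');                  // derive an IPC key from a file
$blob = file_get_contents('/data/ngrams.json');
$shm  = shmop_open($key, 'c', 0644, strlen($blob));
shmop_write($shm, $blob, 0);

// Reader (in another process): attach read-only and decode locally.
$shm    = shmop_open($key, 'a', 0, 0);
$json   = shmop_read($shm, 0, shmop_size($shm));
$ngrams = json_decode($json, true);
```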

PHP Out of Memory Exception

I have a PHP program that will run forever (not a webpage, but a socket server). After processing over 1000 requests, the program eventually crashes due to an out-of-memory exception.
Here is a link to my project.
Here is a link to my program.
I am not sure why this happens. I have tried using garbage collection functions in the function that processes requests (onMessage), but it does not result in any changes. Any suggestions would be appreciated.
Investing huge amounts of effort, you may be able to mitigate this for a while, but in the end you will have trouble running a non-terminating PHP application.
Check out "PHP is meant to die". This article discusses PHP's memory handling (among other things) and specifically focuses on why all long-running PHP processes eventually fail. Some excerpts:
There’s several issues that just make PHP the wrong tool for this. Remember, PHP will die, no matter how hard you try. First and foremost, there’s the issue of memory leaks. PHP never cared to free memory once it’s not used anymore, because everything will be freed at the end — by dying. In a continually-running process, that will slowly keep increasing the allocated memory (which is, in fact, wasted memory), until reaching PHP’s memory_limit value and killing your process without a warning. You did nothing wrong, except expecting the process to live forever. Under load, replace the “slowly” part with “pretty quickly”.
There’s been improvements in the “don’t waste memory” front. Sadly, they’re not enough. As things get complex or the load increases, it’ll crash.
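In the spirit of that article, the usual mitigation (a sketch only, with hypothetical names and limits) is to let the worker die on purpose once its footprint grows, and have a supervisor such as systemd or supervisord restart it:

```php
<?php
// Hypothetical worker loop: exit cleanly once memory crosses a threshold
// and let an external supervisor start a fresh process.
const MEMORY_LIMIT_BYTES = 128 * 1024 * 1024;

function handleOneRequest(): void
{
    /* stand-in for the real onMessage() work */
}

while (true) {
    handleOneRequest();
    gc_collect_cycles();                        // sweep circular references

    if (memory_get_usage(true) > MEMORY_LIMIT_BYTES) {
        exit(0);                                // die; the supervisor restarts us
    }
    usleep(1000);                               // stand-in for waiting on a socket
}
```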

How to reduce the memory footprint of a multi-process PHP application

I have a multi-process PHP (CLI) application that runs continuously. I am trying to optimize the memory usage because the amount of memory used by each process limits the number of forks that I can run at any given time (since I have a finite amount of memory available). I have tried several approaches. For example, following the advice given by preinheimer, I re-compiled PHP, disabling all extensions and then re-enabling only those needed for my application (mysql, curl, pcntl, posix, and json). This, however, did not reduce the memory usage. It actually increased slightly.
I am nearly ready to abandon the multi-process approach, but I am making a last ditch effort to see if anyone else has any better ideas on how to reduce memory usage. I will post my alternative approach, which involves significant refactoring of my application, below.
Many thanks in advance to anyone who can help me tackle this challenge!
Multi-process PHP applications (e.g. an application that forks itself using pcntl_fork()) are inherently inefficient in terms of memory, because each child process loads an entire copy of the PHP executable into memory. This can easily equate to 10 MB of memory per process or more (depending on the application). Compiling extensions as shared libraries should, in theory, reduce the memory footprint, but I have had limited success with this (actually, my attempts at this made the memory usage worse, for some unknown reason).
A better approach is to use multi-threading. In this approach, the application resides in a single process, but multiple actions can be performed concurrently* in separate threads (i.e. multi-tasking). Traditionally, PHP has not been ideal for multi-threaded applications, but recently some new extensions have made multi-threading in PHP more feasible. See, for example, this answer to a question about multithreading in PHP (whose accepted answer is rather outdated).
For the above problem, I plan to refactor my application into a multi-threaded one using pthreads. This requires a significant number of modifications, but it will (hopefully) result in a much more efficient overall architecture for the application. I will update this answer as I proceed and offer some refactoring examples for anyone else who would like to do something similar. Others, feel free to provide feedback and to update this answer with code examples!
*Footnote about concurrency: unless one has a multi-core machine, the actions will not actually be performed in parallel. They will, however, be scheduled to run on the CPU in small time slices, so from the user's perspective they appear to run concurrently.
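A rough sketch of what the pthreads refactoring might look like, assuming a thread-safe (ZTS) PHP build with the pthreads extension installed; the class and job names are invented:

```php
<?php
// Each Job runs in its own thread inside a single PHP process.
class Job extends Thread
{
    private $payload;

    public function __construct(string $payload)
    {
        $this->payload = $payload;
    }

    public function run()
    {
        // The heavy per-task work goes here.
        printf("done: %s (thread %lu)\n", $this->payload, $this->getThreadId());
    }
}

$jobs = [];
foreach (['a', 'b', 'c', 'd'] as $name) {
    $job = new Job($name);
    $job->start();            // begin executing run() in a new thread
    $jobs[] = $job;
}
foreach ($jobs as $job) {
    $job->join();             // wait for each thread to finish
}
```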

Is the memory allocated by PHP in a single request always released at the end?

I'm a bit confused about the memory leaks in PHP.
I've read that PHP automatically releases the memory used in each request, thanks to the Zend Memory Manager:
http://www.webreference.com/programming/php_mem/2.html
But I see a lot of people and topics (even here in SO) concerned about PHP and memory leaks.
So I feel that I'm losing something.
Is it possible to have memory leaks in PHP between different requests?
It is not possible for PHP scripts to leak memory between different requests (when using the default Apache configuration): the variables and code used in one request are released at the end of that request, and PHP's memory allocator starts afresh for the next one. Bugs in the PHP interpreter or in extensions could leak memory separately, however.
A much greater problem is that Apache child processes have PHP's memory space inside them. They swell to the peak memory usage of a PHP script and then maintain this memory allocation until the child process is killed (once a process has asked the kernel to allocate a portion of memory, that memory won't be released until the process dies). For a more detailed explanation of why this is a problem and how to combat it, see my answer on Server Fault.
Memory leaks in a script, where variables are not unset and the PHP garbage collector fails, are very rare: most PHP scripts run for a few hundred milliseconds, which is generally not enough time for even a serious memory leak to manifest.
You can monitor how much memory your PHP script is using with memory_get_usage() and memory_get_peak_usage() - there is also a good explanation on memory usage and how to program defensively in the PHP manual.
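For example, a defensive snippet using those functions (the array here is just a hypothetical large working set):

```php
<?php
// Watch memory before, during, and after holding a large working set.
$before = memory_get_usage();

$report = array_fill(0, 200000, 'row data');    // hypothetical big allocation

printf("in use: %d bytes (delta %d)\n",
    memory_get_usage(), memory_get_usage() - $before);

unset($report);                                 // release it as early as possible
printf("after unset: %d bytes, peak: %d bytes\n",
    memory_get_usage(), memory_get_peak_usage());
```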
PHP's memory management is explained in detail in this article.
Edit: you can determine which modules are compiled into Apache with httpd -l; defaults vary by OS distribution and repository configuration. There are plenty of ways to interface PHP with Apache - most are detailed here.

Strategies for handling memory consumption in PHP5?

We have a large management application that produces big reports of all kinds, based on numerous loops, database retrievals, many object creations, and so on.
On PHP4 it ran happily with a memory limit of 64 MB. Now we have moved it to a new server with the same database and the same code, and the same reports won't come up without a gig of memory limit...
I know that PHP5 changed quite a lot under the hood, but is there a way to make it behave?
The question, in the end, is: what strategies do you apply when you need to put your scripts on a diet?
A big problem we ran into was circular references between objects stopping them from freeing memory when they go out of scope.
Depending on your architecture, you may be able to use __destruct() and manually unset any references. For our problem, I ended up restructuring the classes and removing the circular references.
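As an illustration of the problem (class and property names invented), breaking the cycle by hand lets plain refcounting free the objects immediately; PHP 5.3's cycle collector can also sweep such cycles eventually:

```php
<?php
// Two objects referencing each other form a cycle that pure refcounting
// can never free on its own.
class Node
{
    public $other;

    public function release(): void
    {
        $this->other = null;   // break the cycle explicitly
    }
}

$a = new Node();
$b = new Node();
$a->other = $b;
$b->other = $a;                // cycle: $a <-> $b

$a->release();                 // break one side of the cycle...
unset($a, $b);                 // ...so both objects are freed immediately

gc_collect_cycles();           // PHP >= 5.3: sweep any remaining cycles
```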
When I need to optimize resources in any script, I always try to analyze, profile and debug my code. I use Xdebug and the Xdebug Profiler; there are other options like APD and Benchmark Profiler.
Additionally, I recommend these articles:
Make PHP apps fast, faster, fastest..
Profiling PHP Applications (PDF)
PHP & Performance (PDF)
Since moving to the new server, have you verified that your MySQL and PHP system variables are identical to the way they were on your old server?
PHP5 introduced a lot of new functionality, but due to its backward-compatibility mantra, I don't believe that the differences between PHP5 and PHP4 should be causing this large an effect on the performance of an application whose code and database have not been altered.
Are you also running on the same version of Apache or IIS?
It sounds like a problem that is more likely related to your new system environment than to an upgrade from PHP4 to 5.
Bertrand,
If you are interested in refactoring the existing code, then I would recommend that you first monitor your CPU and memory usage while executing reports. Are you locking up your SQL server, or are you locking up Apache (which happens if a lot of stress is being put on the system by the PHP code)?
I worked on a project that initially bogged down MySQL so severely that we had to refactor the entire report generation process. However, when we finished the load was simply transferred to Apache (through the more complex PHP code). Our final solution was to refactor the database design to provide for better performance for reporting functions and to use PHP to pick up the slack on what we couldn't do natively in MySQL.
Depending on the nature of the reports you might consider denormalizing the data that is being used for the reports. You might even consider constructing a second database that serves as a data warehouse and is designed around OLAP principles rather than OLTP principles. You can start at Wikipedia for a general explanation of OLAP and data warehousing.
However, before you start looking at serious refactoring, have you verified that your environments are sufficiently similar by looking at phpinfo() for PHP and SHOW VARIABLES; in MySQL?
A gig!?!
Even 64 MB is big.
Ignoring the discrepancy between environments (which does sound very peculiar), it sounds like the code may need some refactoring.
Any chance you can refactor your code so that the result sets from database queries are not dumped into arrays? I would recommend that you construct an iterator for your result sets (you can then treat them as arrays for most purposes); there is a big difference between handling one record at a time and handling 10,000 records at a time. A sketch of the idea follows.
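One way to sketch that iterator idea is a generator over mysqli (connection details and query are hypothetical; generators arrived in PHP 5.5, later than the PHP5 era of this question, where a hand-written Iterator class would be the equivalent):

```php
<?php
// Stream rows one at a time instead of building one huge array.
function streamRows(mysqli $db, string $sql): Generator
{
    $result = $db->query($sql, MYSQLI_USE_RESULT);  // unbuffered result set
    while ($row = $result->fetch_assoc()) {
        yield $row;                                 // hand out one record at a time
    }
    $result->free();
}

$db = new mysqli('localhost', 'user', 'pass', 'reports');
foreach (streamRows($db, 'SELECT * FROM big_table') as $row) {
    // process a single record; memory stays flat regardless of table size
}
```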
Secondly, have a look at whether your code is creating multiple instances of the data. Can you pass the objects by reference (use the '&')? We had to do a similar thing when using an early variant of the Horde framework: a 1 MB attachment would blow out to 50 MB from numerous calls that passed the whole dataset as a copy rather than as a reference. A toy illustration follows.
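A toy illustration of the copy cost (in PHP 5+, objects are already passed as handles and arrays are copy-on-write, so the duplicate only materializes when the callee modifies its copy):

```php
<?php
// Modifying a by-value array forces a full copy; a by-reference parameter
// mutates the caller's array in place.
function stampCopy(array $rows): array
{
    $rows[] = 'footer';        // write triggers the copy-on-write duplication
    return $rows;
}

function stampByRef(array &$rows): void
{
    $rows[] = 'footer';        // modifies the original, no duplicate made
}

$big = array_fill(0, 500000, 'data');

$copy = stampCopy($big);       // peak memory roughly doubles here
stampByRef($big);              // peak memory stays near the original size
```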
