Memory management is not something that most PHP developers ever need to think about, but I'm running into an issue where my command-line script is running out of memory. It performs multiple iterations over a large array of objects, making multiple database requests per iteration. Increasing the memory ceiling may be a short-term fix, but I don't think it's an appropriate long-term solution. What should I be doing to make sure that my script is not using too much memory, and is using memory efficiently?
The golden rule
The number one thing to do when you encounter (or expect to encounter) memory pressure is: do not read massive amounts of data into memory at once if you intend to process them sequentially.
Examples:
Do not fetch a large result set into memory as an array; instead, fetch each row in turn and process it before fetching the next
Do not read large text files into memory (e.g. with file()); instead, read one line at a time
This is not always the most convenient thing in PHP (arrays don't cut it, and there is a lot of code that only works on arrays), but in recent versions and especially after the introduction of generators it's easier than ever to stream your data instead of chunking it.
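As a rough illustration, here is a minimal sketch of the streaming approach using a generator over a PDO result set (the $pdo connection, the orders table, and processRow() are assumptions for the example):

function streamRows(PDO $pdo, string $sql): Generator {
    $stmt = $pdo->query($sql);
    // Yield one row at a time instead of building a huge array with fetchAll()
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        yield $row;
    }
}

// Only one row is held in memory at any given moment
foreach (streamRows($pdo, 'SELECT * FROM orders') as $row) {
    processRow($row); // hypothetical per-row processing
}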
Following this practice religiously will "automatically" take care of other things for you as well:
There is no longer any need to clean up resources with a big memory footprint by closing them and losing all references to them on purpose, because there will be no such resources to begin with
There is no longer a need to unset large variables after you are done with them, because there will be no such variables either
Other things to do
Be wary of creating closures inside loops; avoiding this should be easy, as creating closures inside loops is a bad code smell anyway. You can always lift the closure out of the loop and give it more parameters.
When expecting massive input, design your program and pick your algorithms accordingly. For example, you can mergesort any number of text files of any size using a constant amount of memory; a rough sketch follows.
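As a rough sketch of that constant-memory idea, here is a merge of two text files whose lines are assumed to be already sorted; only one line per file is held in memory at a time:

function mergeSortedFiles(string $a, string $b, string $out): void {
    $fa = fopen($a, 'r');
    $fb = fopen($b, 'r');
    $fo = fopen($out, 'w');
    $la = fgets($fa);
    $lb = fgets($fb);
    while ($la !== false && $lb !== false) {
        if (strcmp($la, $lb) <= 0) {
            fwrite($fo, $la);
            $la = fgets($fa);
        } else {
            fwrite($fo, $lb);
            $lb = fgets($fb);
        }
    }
    // Copy whatever remains in either file
    while ($la !== false) { fwrite($fo, $la); $la = fgets($fa); }
    while ($lb !== false) { fwrite($fo, $lb); $lb = fgets($fb); }
    fclose($fa);
    fclose($fb);
    fclose($fo);
}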
You could try profiling it by putting some calls to memory_get_usage() at key points, to find the place where usage peaks.
Of course, knowing what the code really does will give you more information for reducing its memory usage.
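A minimal sketch of that kind of checkpoint profiling (the labels, loadItems() and processBatch() are placeholders):

function logMemory(string $label): void {
    // Report current and peak usage in megabytes
    printf("%-20s current: %6.2f MB  peak: %6.2f MB\n",
        $label,
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576);
}

logMemory('before load');
$items = loadItems();          // hypothetical expensive load
logMemory('after load');
processBatch($items);          // hypothetical processing step
logMemory('after processing');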
When you compute your large array of objects, try not to compute it all at once. Work in steps: process a batch of elements, free the memory, then take the next batch.
It will take more time, but you can control the amount of memory you use.
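One way to sketch that batching idea; loadBatch() and processItem() are hypothetical placeholders for your own data access and processing:

$batchSize = 500;
$offset = 0;
while (true) {
    $batch = loadBatch($offset, $batchSize); // hypothetical: returns at most $batchSize items
    if (empty($batch)) {
        break;
    }
    foreach ($batch as $item) {
        processItem($item);                  // hypothetical per-item work
    }
    unset($batch);                           // drop the reference so the memory can be reused
    $offset += $batchSize;
}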
Related
Can a PHP script get the allowed memory size, and also how much memory can still be allocated? I know that it is possible to clean up memory using unset, but I'd like to understand how to create PHP scripts that consume as little memory as possible.
The basic mechanism which PHP uses is garbage collection, built on top of reference counting.
In short, it works something like this:
Say you have a certain memory location M allocated to store variable $m e.g.:
$m = [ 0,1,2,3,4,5 ]; //M refers to the memory which is storing this array
As long as $m keeps pointing to M then PHP is not allowed to destroy M. However if you do something like:
$m = null;
This makes $m point to nothing, and therefore M is no longer referenced by anything. PHP is now allowed to clear that memory, but may not do so immediately. The point is: if you make sure that you stop referencing something when you don't need it anymore, you give PHP the opportunity to run as memory-efficiently as possible.
However, garbage collection for large complex applications is expensive so keep in mind that PHP may opt to delay garbage collection if it can.
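A small sketch that makes this visible (exact numbers vary by PHP version and configuration):

$before = memory_get_usage();
$m = range(1, 100000);  // allocate a large array; call its memory M
echo 'allocated: ', memory_get_usage() - $before, " bytes\n";

$m = null;              // M is no longer referenced by anything
// A plain array like this is freed as soon as the last reference goes away;
// gc_collect_cycles() is only needed to reclaim structures with circular references.
gc_collect_cycles();
echo 'after release: ', memory_get_usage() - $before, " bytes\n";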
unset will release your reference so the memory can be freed.
That said, the major memory consumption will usually come from resources: I/O handles, database connections, result sets, etc.
For these, it is very important to release the memory, or free up the resources, so they are available for further use.
Also note that the processed output will often have a footprint similar to the input data, so freeing it once you are done with it improves things as well.
To reduce memory usage further, make functions as pure as possible, so that after a function executes there is only its output and no side effects. This way there will be fewer things kept alive in the global memory space.
I have a PHP program that requires me to instantiate 1800 objects, and each object is associated with 7-10 arrays filled with historical data (about 500 records per array). This program is run by cron every 5 minutes, not by users.
Anyway, the designer of the program says instantiating 1800 objects at once is required, and is not something we can change. My question is whether instantiating this many objects alone is a "code smell", and whether having this much data in memory (arrays consisting of a total of roughly 9,000,000 records) is something that would be hard for PHP to handle (assuming adequate memory is available on the host).
Thanks
Classes and objects are mostly a conceptual tool used to organise code in a logical fashion that more or less applies to "things" in the real world. There's no significant difference for the computer when executing code written procedurally vs. object oriented code. The OO code may add a little bit of overhead compared to code written in the most optimal procedural way, but you will hardly ever notice this difference. 1800 objects can be created and destroyed within milliseconds, repeatedly. They by themselves are not a problem.
The question is: does writing it this way in OO significantly help code organisation? If done properly, likely yes. Is there any other realistic way to write the same algorithm in a procedural way which is significantly faster in execution? Would this other way be as logically structured, understandable and maintainable? Would the difference in code level quality be worth the difference in performance? Is it really too slow with its 1800 objects? Are the objects the bottleneck (likely: no) or is the overall algorithm and approach the bottleneck?
In other words: there's no reason to worry about 1800 objects unless you have a clear indication that they are a bottleneck, which they likely are not in and of themselves. Storing the same data in memory without an object wrapper will not typically significantly reduce any resource usage.
It would be slow for the application to initialize all those objects just for your system to run. I do know why you would do it, as I have done it before: I would load a lookup object to avoid hitting the DB every time I need to do a lookup.
However, 1800 objects with 500 records per array is pretty heavy and defeats the purpose of having a database. I'm aware that the memory may be available, but considering that this is just the load-up, before the real crunching, I'm not sure the 5-minute cron interval will be enough.
I suggest benchmarking this with a profiler to see how much memory is used and how much time elapses before committing to running it this way.
There isn't a (practically relevant) limit on the number of objects itself. As long as there is enough physical RAM, you can always increase the memory limit. However, from an architectural standpoint it might be very unwise to keep all of this in RAM for the whole execution time when it is not actually needed. Because of their highly dynamic nature, PHP arrays are quite expensive in terms of memory, so this could be a massive performance hit. However, without any details or profiling it is not possible to give you a definitive answer.
But admittedly, it seems quite odd that so many objects are needed. A DBMS might be an alternative for handling this amount of data.
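To get a feel for the array overhead mentioned above, a quick sketch comparing a plain PHP array with SplFixedArray; the exact numbers depend heavily on the PHP version:

$n = 100000;

$before = memory_get_usage();
$plain = range(0, $n - 1);        // ordinary dynamic array
echo 'array:         ', memory_get_usage() - $before, " bytes\n";
unset($plain);

$before = memory_get_usage();
$fixed = new SplFixedArray($n);   // fixed-size, integer-indexed structure
for ($i = 0; $i < $n; $i++) {
    $fixed[$i] = $i;
}
echo 'SplFixedArray: ', memory_get_usage() - $before, " bytes\n";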
I intend to make a dynamic list in PHP, for which I have a plain text file with one element of the list per line. Every line has a string that needs to be parsed into several smaller chunks before rendering the final HTML document.
Last time I did something similar, I used the file() function to load my file into an array, but in this case I have a 12KB file with more than 50 lines, which will most certainly grow bigger over time. Should I load the entries from the file into a SQL database to avoid performance issues?
Yes, put the information into a database. Not for performance reasons (in terms of sequential reading), because a 12KB file will be read very quickly, but because of the part about parsing into separate chunks. Make those chunks into columns of your DB table. It will make the whole programming process go faster and give you greater flexibility.
Breaking stuff up into a properly formatted database is almost always a good idea and will be a performance saver.
However, 50 lines is pretty minor (even a few hundred lines is pretty minor). A bit of quick math, 12KB / 50 lines tells me each line is only about 240 characters long on average.
I doubt that amount of processing (or even several times that much) will be a significant enough performance hit to cause dread unless this is a super high performance site.
While 50 lines doesn't seem like much, it would be a good idea to use the database now rather than making the change later. One thing you have to remember is that using a database won't straight away eliminate performance issues, but it helps you make better use of resources. In fact, you can write a similarly optimized process using files, and it would work about the same except for the I/O difference.
I reread the question and realize that you might mean you would load the file into the database every time. I don't see how this can help unless you are using the database as a form of cache to avoid repeated hits to the file. Ultimately, reading from a file or from a database only differs in how the script uses I/O, disk caches, etc. The processing you do on the list probably makes more of a difference here.
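For completeness, a minimal sketch of the file-based variant, reading one line at a time and splitting it into chunks; the file name, the pipe delimiter, and renderListItem() are assumptions:

$fp = fopen('list.txt', 'r');          // hypothetical input file
if ($fp === false) {
    throw new RuntimeException('Could not open list.txt');
}
while (($line = fgets($fp)) !== false) {
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        continue;                      // skip blank lines
    }
    $chunks = explode('|', $line);     // assumed delimiter between chunks
    renderListItem($chunks);           // hypothetical rendering step
}
fclose($fp);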
I have some very large data files and for business reasons I have to do extensive string manipulation (replacing characters and strings). This is unavoidable. The number of replacements runs into hundreds of thousands.
It's taking longer than I would like. PHP is generally very quick but I'm doing so many of these string manipulations that it's slowing down and script execution is running into minutes. This is a pain because the script is run frequently.
I've done some testing and found that str_replace is fastest, followed by strstr, followed by preg_replace. I've also tried individual str_replace statements as well as constructing arrays of patterns and replacements.
I'm toying with the idea of isolating string manipulation operation and writing in a different language but I don't want to invest time in that option only to find that improvements are negligible. Plus, I only know Perl, PHP and COBOL so for any other language I would have to learn it first.
I'm wondering how other people have approached similar problems?
I have searched and I don't believe that this duplicates any existing questions.
Well, considering that in PHP some string operations are faster than array operations, and you are still not satisfied with the speed, you could write an external program as you mentioned, probably in some "lower level" language. I would recommend C/C++.
There are two ways of handling this, IMO:
[easy] Precompute some generic replacements in a background process and store them in a DB/file; see the sketch after this list. (This trick comes from game development, where all the sines and cosines are precomputed once and then stored in RAM.) You can easily run into the curse of dimensionality here, though;
[not so easy] Implement the replacement tool in C++ or another fast, compiled programming language and use it from PHP afterwards. Sphinx is a good example of a fast manipulation tool for big textual data sets implemented in C++.
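As a rough sketch of the "precompute and store" option, caching the processed output keyed by a hash of the source file; the file names and replacement pairs are only placeholders:

// Run the expensive replacements once and cache the result, keyed by a hash of
// the source file, so the frequently-run script only pays when the source changed.
function getProcessed(string $sourceFile, array $search, array $replace): string {
    $cacheFile = sys_get_temp_dir() . '/replaced_' . md5_file($sourceFile) . '.txt';
    if (!is_file($cacheFile)) {
        $data = file_get_contents($sourceFile);
        file_put_contents($cacheFile, str_replace($search, $replace, $data));
    }
    return $cacheFile;
}

$ready = getProcessed('data.txt', array('bad', 'worse'), array('good', 'better'));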
If you can allow the replacement to be handled over multiple executions, you could create a script that processes each file, temporarily creating replacement files with duplicate content. This would let you extract data from one file to another, process the copy, and then merge the changes; if you use a stream buffer you might be able to remember each row so the copy/merge step can be skipped.
The problem, though, is that you might process a file without completing it, leaving it in a mixed state. That is why a temporary file is suitable.
This would allow the script to run as many times as there are still changes to be made; all you need is a state file that remembers which files have been processed.
The limiting factor is PHP rebuilding the strings. Consider:
$out=str_replace('bad', 'good', 'this is a bad example');
It's a relatively low-cost operation to locate 'bad' in the string, but in order to make room for the substitution, PHP then has to shift each of the characters e,l,p,m,a,x,e,space before writing in the new value.
Passing arrays for the needle and haystack will improve performance, but not as much as it might.
AFAIK, PHP does not have low-level memory access functions, so an optimal solution would have to be written in a different language, dividing the data up into 'pages' which can be stretched to accommodate changes. You could approximate this using chunk_split to divide the string up into smaller units (so each replacement requires less memory juggling).
Another approach would be to dump it into a file and use sed (this still operates one search/replace at a time), e.g.
sed -i 's/good/bad/g;s/worse/better/g' file_containing_data
If you only have to do this operation once and you are replacing with static content, you can use Dreamweaver or another editor, so you will not need PHP. It will be much faster.
Still, if you do need to do this dynamically with PHP (because you need database records or similar), you can use shell commands via exec - do a Google search for search-replace.
It is possible that you have hit a wall with PHP. PHP is great, but in some areas it fails, such as processing LOTS of data. There are a few things you could do:
Use more than one PHP process to accomplish the task (2 processes could potentially take half as long).
Install a faster CPU.
Do the processing on multiple machines.
Use a compiled language to process the data (Java, C, C++, etc)
I think the question is why are you running this script frequently? Are you performing the computations (the string replacements) on the same data over and over again, or are you doing it on different data every time?
If the answer is the former then there isn't much more you can do to improve performance on the PHP side. You can improve performance in other ways, such as using better hardware (SSDs for faster reads/writes on the files, faster RAM with higher bus speeds), multicore CPUs, and breaking the data up into smaller pieces so that multiple scripts can process it concurrently.
If the answer is the latter then you might want to consider caching the result using something like Memcached or Redis (key/value cache stores), so that you only perform the computation once and afterwards it's just a linear read from memory, which is very cheap and involves virtually no CPU overhead (you might also utilize the CPU cache at this level).
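A minimal sketch of that caching approach with the Memcached extension (the server address, cache key, and computeReplacements() are assumptions; Redis or APCu would look very similar):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key = 'replaced:' . md5_file('data.txt');     // hypothetical input file
$result = $mc->get($key);
if ($result === false) {
    // Cache miss: do the expensive string work once, then store it for an hour
    $result = computeReplacements('data.txt'); // hypothetical expensive step
    $mc->set($key, $result, 3600);
}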
String manipulation in PHP is already cheap because PHP strings are essentially just byte arrays. There's virtually no overhead from PHP in reading a file into memory and storing it in a string. If you have some sample code that demonstrates where you're seeing performance issues, and some benchmark numbers, I might have some better advice, but right now it just looks like you need to refactor your approach based on what your underlying needs are.
For example, there are both CPU and I/O costs to consider individually when you're dealing with data in different situations. I/O involves blocking since it's a system call. This means your CPU has to wait for more data to come over the wire (while your disk transfers data to memory) before it can continue to process or compute that data. Your CPU is always going to be much faster than memory and memory is always much faster than disk.
Here's a simple benchmark to show you the difference:
/* First, let's create a simple test file to benchmark */
file_put_contents('in.txt', str_repeat(implode(" ",range('a','z')),10000));
/* Now let's write two different tests that replace all vowels with asterisks */
// The first test reads the entire file into memory and performs the computation all at once
function test1($filename, $newfile) {
    $start = microtime(true);
    $data = file_get_contents($filename);
    // Replace every vowel with an asterisk in one pass over the whole string
    $changes = str_replace(array('a','e','i','o','u'), '*', $data);
    file_put_contents($newfile, $changes);
    return sprintf("%.6f", microtime(true) - $start);
}
// The second test reads only 8KB chunks at a time and performs the computation on each chunk
function test2($filename, $newfile) {
    $start = microtime(true);
    $fp = fopen($filename, "r");
    $changes = '';
    while (!feof($fp)) {
        // Replace the vowels in each 8KB chunk as it is read
        $changes .= str_replace(array('a','e','i','o','u'), '*', fread($fp, 8192));
    }
    fclose($fp);
    file_put_contents($newfile, $changes);
    return sprintf("%.6f", microtime(true) - $start);
}
The above two tests do the same exact thing, but Test2 proves significantly faster for me when I'm using smaller amounts of data (roughly 500KB in this test).
Here's the benchmark you can run...
// Conduct 100 iterations of each test and average the results
for ($i = 0; $i < 100; $i++) {
    $test1[] = test1('in.txt', 'out.txt');
    $test2[] = test2('in.txt', 'out.txt');
}
echo "Test1 average: ", sprintf("%.6f", array_sum($test1) / count($test1)), "\n",
     "Test2 average: ", sprintf("%.6f\n", array_sum($test2) / count($test2));
For me the above benchmark gives Test1 average: 0.440795 and Test2 average: 0.052054, which is an order of magnitude difference, and that's just testing on 500KB of data. Now, if I increase the size of this file to about 50MB, Test1 actually proves to be faster, since there are fewer I/O system calls per iteration (i.e. we're just reading from memory linearly in Test1), but a larger CPU cost per iteration (i.e. we're performing a much larger computation at once). The CPU can generally handle much larger amounts of data at a time than your I/O devices can send over the bus.
So it's not a one-size-fits-all solution in most cases.
Since you know Perl, I would suggest doing the string manipulations in Perl using regular expressions and using the final result in the PHP web page.
This seems better for the following reasons
You already know Perl
Perl does string processing better
You can use PHP only where necessary.
Does this manipulation have to happen on the fly? If not, might I suggest pre-processing... perhaps via a cron job.
Define what rules you're going to be using:
Is it just one str_replace or a few different ones?
Do you have to do the entire file in one shot, or can you split it into multiple batches (e.g. half the file at a time)?
Once your rules are defined, decide when you will do the processing (e.g. 6am, before everyone gets to work).
Then you can set up a job queue. I have used cron jobs to run my PHP scripts on a given time schedule.
For a project I worked on a while ago I had a setup like this:
7:00 - pull 10,000 records from MySQL and write them to 3 separate files.
7:15 - run a complex regex on file one.
7:20 - run a complex regex on file two.
7:25 - run a complex regex on file three.
7:30 - combine all three files into one.
8:00 - walk into the meeting with the formatted file your boss wants. *profit*
Hope this helps get you thinking...
Can anybody give me an introduction to programming efficiently in PHP? I want to write my programs correctly so that they generate their results using minimal memory.
Based on how I read your question, I think you may be barking up the wrong tree with PHP. It was never designed for a low memory overhead.
If you just want to be as efficient as possible, then look at the other answers. Remember that every single variable costs a fair bit of memory, so use only what you have to, and let the garbage collector work. Make sure that you only declare variables in a local scope so they can get GC'd when the program leaves that scope. Objects will be more expensive than scalar variables. But the biggest common abuse I see is multiple copies of data. If you have a large array, operate directly on it rather than copying it (it may be less CPU efficient, but it should be more memory efficient).
If you are looking to run in a low-memory environment, I'd suggest finding a different language to use. PHP is nice because it manages everything for you (with respect to variables), but that type coercion and flexibility comes at a price (speed and memory usage). Each variable requires a lot of metadata stored with it. So an integer that takes 8 bytes to store in C (on a 64-bit build) will likely take more than 64 bytes in PHP (because of all of the "tracking" information associated with it, such as type, name, scoping information, etc). That overhead is normally seen as OK, since PHP was not designed for large memory loads. So it's a trade-off: more memory used for easier programming. But if you have tight memory constraints, I'd suggest moving to a different language...
It's difficult to give advice with so little information on what you're trying to do and why memory utilization is a problem. In the common scenarios (web servers that serve many requests), memory is not a limiting factor and it's preferable to serve the requests as fast as possible, even if this means sacrificing memory for speed.
However, the following general guidelines apply:
unset your variables as soon as you don't need them. In a program that's well written, this, however, won't have a big impact, as variables going out of scope have the same effect.
In long-running scripts with lots of variables holding circular references, and if you are using PHP 5.3 or later, try calling the garbage collector explicitly at certain points.
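A rough sketch of that explicit collection in a long-running loop (PHP 5.3 or later; $jobs, buildGraph() and handle() are placeholders):

gc_enable();                        // make sure cycle collection is on
foreach ($jobs as $i => $job) {
    $graph = buildGraph($job);      // hypothetical structure containing circular references
    handle($graph);                 // hypothetical work
    unset($graph);
    if ($i % 1000 === 0) {
        gc_collect_cycles();        // reclaim cyclic garbage at a known point
    }
}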
First of all: Don't try to optimize memory usage by using references. PHP is smart enough not to copy the contents of a variable if you do something like this:
$array = array(1,2,3,4,5,);
$var = $array;
PHP will only copy the contents of the variable when you write to it. Using references all the time because you think they will save you copying the variable content can often backfire ;)
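A small sketch that makes the copy-on-write behaviour visible:

$array = range(1, 100000);
$before = memory_get_usage();

$copy = $array;   // nothing is copied yet, the two variables share the same data
echo 'after assignment: ', memory_get_usage() - $before, " bytes\n";

$copy[] = 42;     // the first write triggers the actual copy
echo 'after write:      ', memory_get_usage() - $before, " bytes\n";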
But I think your question is hard to answer unless you are more precise.
For example, if you are working with files, it is often advisable not to read the whole file with file_get_contents(), but to use the fopen()/fread() family of functions to load only small parts of the file at a time, or even to skip whole chunks.
Or, if you are working with strings, make use of functions which return a string offset instead of the rest of the string (e.g. strcspn instead of strpbrk) when possible.
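For instance, a small sketch of the offset-based approach:

$subject = 'abcdef;ghijkl';

// strcspn() returns only the length of the initial segment; no substring is allocated
$offset = strcspn($subject, ';');
echo "first delimiter at offset $offset\n";

// strpbrk(), by contrast, returns the whole remainder of the string as a new value
$rest = strpbrk($subject, ';');
echo "remainder: $rest\n";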