Create large numbers of objects efficiently - PHP

edit: I made some wrong assumptions when I posted this question, and I feel it might be misleading.
The efficiency problem actually turned out to be unrelated to array_push.
The comments were helpful in helping me understand that:
1) this should not take so long, and
2) diagnosing efficiency problems with microtime() is a good practice.
end edit
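For reference, a minimal sketch of the microtime() diagnosis mentioned above (the loop body is a placeholder for whatever block you suspect is slow):
$start = microtime(true);

for ($i = 0; $i < 1400; $i++) {
    // ... the suspect block goes here ...
}

echo "Elapsed: " . round((microtime(true) - $start) * 1000, 2) . " ms\n";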
I am creating ~1400 objects in a test scenario. I think ~1400 is within an order of magnitude of typical use.
public $t = array();
...
for (...) {
    // code...
    for (...) {
        array_push($this->t, new T($i, $str)); // <-- this line slows the program
        $count++;
    }
    // code...
}
Unfortunately the script is taking about 90 seconds to run. If I comment out the one line of code with array_push, the script runs in about 1/6 the time, about 15 seconds.
The inner loop count varies, but averages about 3 to 15 cycles with one new object for each cycle.
Questions:
I am not an expert in PHP. I would like to know:
1) if it would help (and if so, how) to allocate memory space beforehand.
2) if there are any efficiency steps I should be taking that would help the script run faster, or a data structure that would be more efficient than an array of objects. The newly created objects currently have two attributes: an integer and a string representing a single word (roughly averaging ~10 characters).
edit:
This is the constructor:
class T {
    public $line;
    public $text;

    function __construct($ln, $txt) {
        $this->line = $ln;
        $this->text = $txt;
    }
}

The runtime depends on a few different factors:
The server you're using to run the script
Code efficiency, of course - here's a great article about writing efficient PHP code
There are more factors, of course, but from previous experience it shouldn't take that long. Still, the information I have about the objects you're creating isn't detailed enough for me to tell where the problem is.
By the way, using a PHP accelerator such as APC or XCache might improve your code's runtime.
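On question 1: plain PHP arrays grow dynamically, so explicit preallocation usually isn't needed. On question 2, a couple of things worth trying, sketched below (class T is the one from the question; SplFixedArray requires PHP >= 5.3):
// Appending with $t[] = ... avoids the function-call overhead of
// array_push() when adding a single element (a micro-optimization).
$t = array();
$t[] = new T(1, "example");

// If the final element count is known up front, SplFixedArray
// preallocates its storage and is usually more memory-compact
// than a plain array.
$fixed = new SplFixedArray(1400);
$fixed[0] = new T(1, "example");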

Related

will White Spaces, Comments, Empty Lines increase PHP execution time? [duplicate]

I have had this doubt for many days. In every PHP file I write, I put many comments, leave a lot of white space, and leave 4 to 5 empty lines between sections (to make it clearer for me).
Will all these white spaces, comments, and empty lines make my PHP code run slower?
Personal experiences are much appreciated :)
This is really a matter of IO and hard drive speed. If your bare file is 10KB and comments and line breaks add 4KB, then the extra time the hard drive spends reading those extra kilobytes is what you would need to benchmark (it's negligible, by the way) - not even worth your time.
If you start getting into micro-optimization, then you run the risk of making your code absolutely horrid to read and maintain.
The best way to speed up your code is to refactor where necessary and not do silly things that obviously hog resources, like this crude example:
<?php
$arr = array(); // pretend it has 50,000 items

// GOOD IDEA: count the array once and reference that number
$arr_count = count($arr);
for ($i = 0; $i < $arr_count; $i++) {
    echo $arr[$i];
}

// BAD IDEA: re-counting the array on every iteration
for ($i = 0; $i < count($arr); $i++) {
    echo $arr[$i];
}
?>
Also, unsetting a large array after you are done using it is better than waiting for the garbage collector to kick in. For example: pull data from the DB, loop through it, unset the data when done, and keep coding.
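A minimal sketch of that pattern (the query helper and processing step are hypothetical):
$rows = fetch_all_rows(); // hypothetical: a large DB result set
foreach ($rows as $row) {
    process_row($row);    // hypothetical processing step
}
unset($rows); // release the large array now rather than waiting for GC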
Comments and white space are completely ignored when the code is run. You can think of all that extra stuff as being completely wiped away once you're done and the code is doing its thing.
Extra white space and comments are solely there for you and fellow coders to be better able to read and understand your code. In fact, if you don't use extra white space and comments, coders will get angry with you for writing and providing terrible code!
Consider the following code.
<?php
$time = round(microtime(true) * 1000);
for ($i = 0; $i < 1000000; $i++) {
    /*
    */
}
echo (round(microtime(true) * 1000) - $time) . "<br/>";

$time = round(microtime(true) * 1000);
for ($i = 0; $i < 1000000; $i++) {
}
echo (round(microtime(true) * 1000) - $time) . "<br/>";
?>
There are times when the first is faster and others when the second is faster. So comments do not affect the speed.
Not really, is the simple answer for general scripts and coding.
It's likely that if you had to think about gaining a few milliseconds here and there, and removing comments were effective, then A) you have too many comments, and B) you'd already know all about it and be running benchmarks etc.
The amount of comments is usually proportionate to the amount of code you have: a line or two of comments for a load of if/else blocks, setting variables from POST or SESSION, DB queries, and so on. And since the majority of PHP's time parsing a script goes to opening the file, accessing memory, checking thousands of things including the cache, reading and executing the code, and accessing the database, the time taken to skip over your comments is probably around 0.001%.
Comments are there for you, and possibly other developers, to understand the code. Just keep them neat and as short as possible while remaining factual and useful.

Form code on the fly and eval() it for repeatable blocks of code - PRO & CONTRA?

I was reading the sources of the OpenCart and phpBB engines and noticed that there are a lot of lines (sometimes full-screen listings) of repeated code, differing only in one parameter. Such as:
$this->data['button_cart'] = $this->language->get('button_cart');
$this->data['button_wishlist'] = $this->language->get('button_wishlist');
$this->data['button_compare'] = $this->language->get('button_compare');
$this->data['button_continue'] = $this->language->get('button_continue');
I am thinking about using a function to generate code from patterns, and then eval() it.
Some such function:
function CodeGenerator($pattern, $placements_arr) {
    $echo_str = '';
    foreach ($placements_arr as $placement) {
        $echo_str .= str_replace('=PATTERN=', $placement, $pattern);
    }
    if (substr($echo_str, -1) !== ';') {
        $echo_str .= ';'; # so eval() receives a complete statement
    }
    return $echo_str;
}
And then, for large repeated blocks of code with the same pattern:
$pattern = '$this->data[\'=PATTERN=\'] = $this->language->get(\'=PATTERN=\');'; // single-quoted so $this is not interpolated before eval()
$placements_arr = array('button_cart', 'button_wishlist', 'button_compare', 'button_continue');
$echo_str = CodeGenerator($pattern, $placements_arr);
eval($echo_str);
I want to understand the PROs and CONs of such a design, because I am thinking about using it in my future development.
The only problem I see right now is slightly slower execution. Any others?
Well, for the block of code you have shown, you could rewrite it like this:
$params = array('button_cart', 'button_wishlist', 'button_compare', 'button_continue');
foreach($params as $param)
$this->data[$param] = $this->language->get($param);
You are writing out the parameters anyway, so I cannot see one benefit of your code over something like what I have shown above. Plus, this is only 3 lines of code vs 11 of yours, and mine is instantly readable.
In 99.9% of the code you write, you can write it without eval. There are some corner cases where eval makes sense, but in my 5 years of coding PHP so far I have used it maybe once or twice, and if I went back to that code I could probably rewrite it so it didn't need eval at all.
If I had to maintain a project with code that you wrote, I would be tearing my hair out. Just look at what OpenCart wrote, and look at what you wrote. Which one is easier to understand? I actually have to look at your code a few times to understand what it is doing; I can skim over the OpenCart code and instantly understand what is happening.
Maintainability - if that's a word - might be a small concern. I would despise such a construct because it seems unnecessarily complex. Working as a web developer, I have inherited many a poorly designed PHP site, and in just about zero cases can I recall it being considered a nuisance to scroll through a list of variable assignments like the above. However, I would become infuriated by having to deal with weird, lazy functions that attempt to escape the banality of repetitive typing.
In the end, you're talking about fractions of a second of processing, so that's hardly an argument for doing something like this. And if the microseconds are a concern, use a caching mechanism to write to a flat text file and wipe out the redundant processing altogether.
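A minimal sketch of that flat-file caching idea (the path, lifetime, and data-builder function are illustrative):
$cacheFile = '/tmp/view_data.cache';

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < 3600) {
    // Serve the cached copy and skip the redundant processing entirely.
    $data = unserialize(file_get_contents($cacheFile));
} else {
    $data = build_view_data(); // hypothetical expensive step
    file_put_contents($cacheFile, serialize($data));
}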
Hey, my 2 cents. If it's your project and you don't expect anyone else to maintain it then knock yourself out.

Is it better to call a function every time or store that value in a new variable?

I often use the function sizeof($var) in my web application, and I'd like to know whether it is better (in terms of resources) to store this value in a new variable and use that, or to call the function every time; or maybe it makes no difference :)
TLDR: it's better to set a variable, calling sizeof() only once. (IMO)
I ran some tests on the looping aspect of this small array:
$myArray = array("bill", "dave", "alex", "tom", "fred", "smith", "etc", "etc", "etc");

// A)
for ($i = 0; $i < 10000; $i++) {
    echo sizeof($myArray);
}

// B)
$sizeof = sizeof($myArray);
for ($i = 0; $i < 10000; $i++) {
    echo $sizeof;
}
With an array of 9 items:
A) took 0.0085 seconds
B) took 0.0049 seconds
With an array of 180 items:
A) took 0.0078 seconds
B) took 0.0043 seconds
With an array of 3600 items:
A) took 0.5-0.6 seconds
B) took 0.35-0.5 seconds
Although there isn't much of a difference, you can see that as the array grows, the difference becomes more and more pronounced. I think this has made me re-think my opinion, and say that from now on, I'll be setting the variable pre-loop.
Storing a PHP integer takes 68 bytes of memory. This is a small enough amount that I think I'd rather worry about processing time than memory space.
In general, it is preferable to assign the result of a function you are likely to repeat to a variable.
In the example you suggested, the difference between this approach and the alternative (repeatedly calling the function) would be insignificant. However, where the function in question is more complex, it would be better to avoid executing it repeatedly.
For example:
for ($i = 0; $i < 10000; $i++) {
    echo date('Y-m-d');
}
Executes in 0.225273 seconds on my server, while:
$date = date('Y-m-d');
for ($i = 0; $i < 10000; $i++) {
    echo $date;
}
executes in 0.134742 seconds. I know these snippets aren't quite equivalent, but you get the idea. Over many page loads by many users over many months or years, even a difference of this size can be significant. If we were to use some complex function, serious scalability issues could be introduced.
A main advantage of not assigning a return value to a variable is that you need one less line of code. In PHP, we can commonly do our assignment at the same time as invoking our function:
$sql = "SELECT...";
if(!$query = mysql_query($sql))...
...although this is sometimes discouraged for readability reasons.
In my view for the sake of consistency assigning return values to variables is broadly the better approach, even when performing simple functions.
If you are calling the function over and over, it is probably best to keep the result in a variable; that way the server doesn't have to keep recomputing the answer, it just looks it up. If the result is likely to change, however, it is best to keep running the function.
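A small illustration of that caveat (illustrative values): once the array changes, a cached count goes stale.
$items = array("a", "b", "c");
$count = count($items); // cached once: 3

$items[] = "d";         // the array changed, so the cached value is stale
echo $count;            // still prints 3
echo count($items);     // prints 4 - re-calling the function reflects the change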
Since you allocate a new variable, this will take a tiny bit more memory. But it might make your code a tiny bit faster.
The trouble it could bring, though, might be big. For example, if you include another file that applies the same trick, and both store the size in a variable named $sizeof, bad things might happen: strange bugs that show up when you least expect them. Or you forget to add global $sizeof in your function.
There are so many possible bugs you could introduce, and for what? Since the speed gain is likely not measurable, I don't think it's worth it.
Unless you are calling this function a million times your "performance boost" will be negligible.
I do not think that it really matters. In a sense, you do not want to perform the same thing over and over again, but considering that it is sizeof(), unless it is an enormous array you should be fine either way.
I think you should avoid constructs like:
for ($i = 0; $i < sizeof($array); $i += 1) {
    // do stuff
}
Here, sizeof() will be executed on every iteration, even though the size is often unlikely to change.
Whereas in constructs like this:
while (sizeof($array) > 0) {
    if ($someCondition) {
        $entry = array_pop($array);
    }
}
You often have no choice but to calculate it every iteration.

Force freeing memory in PHP

In a PHP program, I sequentially read a bunch of files (with file_get_contents), gzdecode them, json_decode the result, analyze the contents, throw most of it away, and store about 1% in an array.
Unfortunately, with each iteration (I traverse over an array containing the filenames), there seems to be some memory lost (according to memory_get_peak_usage, about 2-10 MB each time). I have double- and triple-checked my code; I am not storing unneeded data in the loop (and the needed data hardly exceeds about 10MB overall), but I am frequently rewriting (actually, strings in an array). Apparently, PHP does not free the memory correctly, thus using more and more RAM until it hits the limit.
Is there any way to do a forced garbage collection? Or, at least, to find out where the memory is used?
It has to do with memory fragmentation.
Consider two strings concatenated into one. Each original must remain until the output is created, and the output is longer than either input.
Therefore, a new allocation must be made to store the result of such a concatenation. The original strings are freed, but they are small blocks of memory.
In the case of 'str1' . 'str2' . 'str3' . 'str4', you have several temporaries being created at each '.' -- and none of them fit in the space that has been freed up. The strings are likely not laid out in contiguous memory (that is, each string is, but the various strings are not laid end to end) due to other uses of the memory. So freeing a string creates a problem, because the space can't be reused effectively. You grow with each temporary you create, and you never re-use anything.
Using the array-based implode(), you create only one output -- exactly the length you require -- performing only one additional allocation. So it's much more memory efficient, and it doesn't suffer from concatenation fragmentation. The same is true of Python. If you need to concatenate strings, more than one concatenation should always be array based:
''.join(['str1','str2','str3'])
in python
implode('', array('str1', 'str2', 'str3'))
in PHP
sprintf equivalents are also fine.
The memory reported by memory_get_peak_usage() is basically always the "last" bit of memory in the virtual map that it had to use. Since that is always growing, it reports rapid growth, as each allocation falls "at the end" of the currently used memory block.
In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass.
Note: You need to have zend.enable_gc enabled in your php.ini, or call gc_enable() to activate the circular reference collector.
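Put together, a minimal sketch:
gc_enable(); // ensure the circular reference collector is active

// ... create and release objects with circular references here ...

$cycles = gc_collect_cycles(); // force a collection pass (PHP >= 5.3.0)
echo "Collected " . $cycles . " cycles\n";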
Found the solution: it was string concatenation. I was generating the input line by line by concatenating some variables (the output is a CSV file). However, PHP seems not to free the memory used for the old copy of the string, effectively clobbering RAM with unused data. Switching to an array-based approach (and imploding the array with commas just before fputs-ing it to the outfile) circumvented this behavior.
For some reason - not obvious to me - PHP reported the increased memory usage during json_decode calls, which misled me into assuming that the json_decode function was the problem.
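A minimal sketch of that array-plus-implode pattern (the field names and the $records source are illustrative):
$fh = fopen('out.csv', 'w');
foreach ($records as $record) {
    // Collect fields in an array instead of repeated concatenation,
    // then build each line with a single implode() allocation.
    $fields = array($record['id'], $record['name'], $record['score']);
    fputs($fh, implode(',', $fields) . "\n");
}
fclose($fh);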
There's a way.
I had this problem one day. I was writing from a DB query into CSV files - always allocating one $row, then reassigning it in the next step - and kept running out of memory. Unsetting $row didn't help; putting a 5MB string into $row first (to avoid fragmentation) didn't help; creating an array of $row-s (loading many rows into it and unsetting the whole thing every 5000th step) didn't help. But it was not the end, to quote a classic.
When I made a separate function that opened the file, transferred 100,000 lines (just enough not to eat up the whole memory), and closed the file, then made subsequent calls to this function (appending to the existing file), I found that PHP removed the garbage on every function exit. It was a local-variable-scope thing.
TL;DR
When a function exits, it frees all local variables.
If you do the job in smaller portions, like 0 to 1000 in the first function call, then 1001 to 2000 and so on, then every time the function returns, your memory will be regained. Garbage collection is very likely to happen on return from a function. (If it's a relatively slow function eating a lot of memory, we can safely assume it always happens.)
Side note: for variables passed by reference this will obviously not work; a function can only free its internal variables, which would be lost anyway on return.
I hope this saves your day as it saved mine!
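A minimal sketch of that chunked approach (the row source and file name are hypothetical):
// Write rows [$from, $to) and return, so PHP can reclaim all the
// function's local variables on exit.
function write_chunk($filename, $from, $to) {
    $fh = fopen($filename, 'a'); // append to the existing file
    for ($i = $from; $i < $to; $i++) {
        fputs($fh, fetch_row($i)); // hypothetical row source
    }
    fclose($fh);
}

for ($offset = 0; $offset < 1000000; $offset += 100000) {
    write_chunk('out.csv', $offset, $offset + 100000);
}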
I've found that PHP's internal memory manager is most likely to be invoked upon completion of a function. Knowing that, I've refactored code in a loop like so:
while (condition) {
// do
// cool
// stuff
}
to
while (condition) {
do_cool_stuff();
}
function do_cool_stuff() {
// do
// cool
// stuff
}
EDIT
I ran this quick benchmark and did not see an increase in memory usage. This leads me to believe the leak is not in json_decode():
for ($x = 0; $x < 10000000; $x++) {
    do_something_cool();
}

function do_something_cool() {
    $json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
    $result = json_decode($json);
    echo memory_get_peak_usage() . PHP_EOL;
}
I was going to say that I wouldn't necessarily expect gc_collect_cycles() to solve the problem, since presumably the files are no longer mapped to zvals. But did you check that gc_enable() was called before loading any files?
I've noticed that PHP seems to gobble up memory when doing includes - much more than is required for the source and the tokenized file - so this may be a similar problem. I'm not saying that this is a bug, though.
I believe one workaround would be to use fopen()...fgets()...fclose() rather than file_get_contents(), which maps the whole file into memory in one go. But you'd need to try it to confirm.
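A minimal sketch of that line-by-line alternative (the filename is illustrative):
$fh = fopen('input.json', 'r');
while (($line = fgets($fh)) !== false) {
    // ... process one line at a time instead of holding the whole file ...
}
fclose($fh);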
HTH
C.
Call memory_get_peak_usage() after each statement, and ensure you unset() everything you can. If you are iterating with foreach(), use a referenced variable to avoid making a copy of the original array:
foreach ($x as &$y) { /* ... */ }
unset($y); // break the reference once the loop is done
If PHP is actually leaking memory a forced garbage collection won't make any difference.
There's a good article on PHP memory leaks and their detection at IBM
There was recently a similar issue with System_Daemon. Today I isolated my problem to file_get_contents.
Could you try using fread instead? I think that may solve your problem.
If it does, it's probably time to file a bug report over at PHP.

Doctrine query memory usage

Doctrine appears to be taking well over 4MB of RAM to execute a single, simple query:
print memory_get_peak_usage()." <br>\n";
$q = Doctrine_Query::create()
->from('Directories d')
->where('d.DIRECTORY_ID = ?', 5);
$dir = $q->fetchOne();
print $dir['name']." ".$dir['description']."<br>\n";
print memory_get_peak_usage()." <br>\n";
/*************** OUTPUT: **************************
6393616
testname testdescription
10999648
/***************************************************/
This is on a test database with very little data in it - the item that I am querying doesn't contain any data other than what is displayed here.
Is there potentially something wrong with the way I have the system set up, or is this standard memory usage for Doctrine?
From what I can see, your code doesn't seem to be wrong...
As a test, I've set up a quick example, with a very simple table (only four fields).
Here is the relevant code:
var_dump(number_format(memory_get_peak_usage()));
$test = Doctrine::getTable('Test')->find(1);
var_dump(number_format(memory_get_peak_usage()));
When doing that, I get this kind of output:
string '1,316,088' (length=9)
string '2,148,760' (length=9)
Considering the table is really simple and that I am only fetching one line, it seems like "a lot" to me too -- but that's quite consistent with what you are getting, and with what I saw on other projects :-(
If you only need to display your data, and not work with it (i.e. update/delete/...), a solution might be to fetch not complex objects, but only a simple array:
$test = Doctrine::getTable('Test')->find(1, Doctrine::HYDRATE_ARRAY);
But, in this case, it doesn't make much of a difference, actually :-( :
string '1,316,424' (length=9)
string '2,107,128' (length=9)
Only 40 KB of difference -- well, with bigger objects / more lines, it might still be a good idea...
In the Doctrine manual, there is a page called Improving Performance; maybe it could help you, especially these sections:
Conservative Fetching
Free Objects
Oh, btw: I did this test on PHP 5.3.0; maybe this has an impact on the amount of memory used...
I agree with romanb's answer - using an opcode cache is a definite must when using large libs/frameworks.
An example related to opcode caching:
I've recently adopted Doctrine with Zend Framework and was curious about memory usage - so, like the OP, I created a method using similar criteria to the OP's test and ran it as an overall test to see what ZF + Doctrine's peak memory usage would be.
I got the following results:
Result without APC:
10.25 megabytes
RV David
16.5 megabytes
Result with APC:
3 megabytes
RV David
4.25 megabytes
Opcode caching makes a very significant difference.
Well, where does this memory usage come from? As Pascal MARTIN pointed out, array hydration does not make a great difference, which is logical considering that we're only talking about a few records here.
The memory consumption comes from all the classes that are loaded on demand through autoloading.
If you don't have APC set up, then yes, there is something wrong with the way your system is set up. Don't even start to measure performance and expect good results with any large PHP library without an opcode cache like APC. It will not only speed up execution but also reduce memory usage by at least 50% on all page loads except the very first one (where APC needs to cache the bytecode first).
And 4MB with your simple example really smells like no APC; otherwise it would indeed be a bit high.
Caution with fetchOne() on a Doctrine query: this call will not append "LIMIT 1" to the SQL.
If you just need one record from the DB, make sure to use:
$q->limit(1)->fetchOne()
Memory usage drops tremendously on large tables.
As you can see below, fetchOne() fetches from the DB as a collection first, then returns the first element:
public function fetchOne($params = array(), $hydrationMode = null)
{
    $collection = $this->execute($params, $hydrationMode);

    if (is_scalar($collection)) {
        return $collection;
    }

    if (count($collection) === 0) {
        return false;
    }

    if ($collection instanceof Doctrine_Collection) {
        return $collection->getFirst();
    } else if (is_array($collection)) {
        return array_shift($collection);
    }

    return false;
}
Doctrine provides a free() function on Doctrine_Record, Doctrine_Collection, and Doctrine_Query which eliminates the circular references on those objects, freeing them up for garbage collection.
More info..
To make memory usage a little bit lower, you can try the following:
$record->free(true) – does a deep free-up, calling free() on all relations too
$collection->free() – frees all collection references
Doctrine_Manager::connection()->clean()/clear() – cleans up the connection (and removes identity map entries)
$query->free()
I would guess that most of that memory is used up loading Doctrine's classes, not actually for objects associated with the query itself.
Which version of Doctrine are you using?
Are you using the autoloader?
In Doctrine 1.1, the default autoload behavior is called 'aggressive', which means that it loads all of your model classes even if you're only using one or two on any particular request. Setting that behavior to 'conservative' would reduce memory usage.
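If I recall correctly, the switch looks roughly like this in Doctrine 1.x - treat the exact attribute and constant names as an assumption and check your version's documentation:
// Assumed Doctrine 1.x API: switch model loading from the default
// 'aggressive' to 'conservative' (lazy) loading.
$manager = Doctrine_Manager::getInstance();
$manager->setAttribute(Doctrine::ATTR_MODEL_LOADING, Doctrine::MODEL_LOADING_CONSERVATIVE);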
I just wrote a "daemonized" script with symfony 1.4, and setting the following stopped the memory hogging:
sfConfig::set('sf_debug', false);
