Store text: array vs constant for best performance - PHP

I'm coding a multi-language website. The text for each page is loaded from a MySQL database and then assigned to an array or to constants so it can be inserted into the web content.
I would like to know which is better for memory use and performance: storing the text in an array or in constants? I.e.:
foreach ($db_text_object as $t) {
    $text[$t->key] = $t->text;
}
or:
foreach ($db_text_object as $t) {
    define($t->key, $t->text);
}
To be used as:
echo $text['mytext'];
or:
echo mytext;
Any other comments about the advantages or disadvantages of each method would be appreciated.
Thank you.

I don't think either will have a significant impact on performance, especially since the bottleneck will be getting all the data from the database every single time. If you're really interested in whether it makes a difference: try it both ways and measure it.
When you're done doing that, do it right by using a native extension like gettext, which was made for this exact purpose, does internal caching of binary translation files and comes with a whole ecosystem of tools supporting the translation workflow.
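For illustration, a minimal sketch of what a gettext setup usually looks like (this assumes the gettext extension is installed and a compiled ./locale/de_DE/LC_MESSAGES/messages.mo catalogue already exists; the locale and paths are just example values):
<?php
// Hypothetical locale and catalogue location, purely for illustration.
$locale = 'de_DE.UTF-8';
putenv('LC_ALL=' . $locale);
setlocale(LC_ALL, $locale);
// Tell gettext where the compiled .mo catalogues live and which domain to use.
bindtextdomain('messages', __DIR__ . '/locale');
bind_textdomain_codeset('messages', 'UTF-8');
textdomain('messages');
echo _('mytext'); // prints the translation, or 'mytext' itself if none is found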

The performance difference between the two will be trivial either way.
I personally go with arrays, since you can access them in a simpler way.
Benchmarking results:
<?php
$db_text_object = [1, 2, 3, 4, 5];
$start = microtime(true);
foreach ($db_text_object as $k => $v) {
    $text[$k] = $v;
}
echo "Array Performance: " . (microtime(true) - $start) . "\n";
$start = microtime(true);
foreach ($db_text_object as $k => $v) {
    define($k, $v); // numeric names are only reachable via constant(), but fine for timing
}
echo "Constant Performance: " . (microtime(true) - $start) . "\n";
OUTPUT:
Array Performance: 1.9073486328125E-5
Constant Performance: 1.3113021850586E-5
Benchmarking Demo at CodeViper
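Speed aside, one practical difference worth weighing, sketched here using the 'mytext' key from the question: array entries can be looked up with a key chosen at runtime and interpolated into strings, while constants need constant() for dynamic names and cannot be interpolated.
$key = 'mytext';
echo $text[$key];                    // array: direct dynamic lookup
echo "Welcome: {$text['mytext']}";   // arrays interpolate inside double quotes
echo constant($key);                 // constant: dynamic names need constant()
echo 'Welcome: ' . mytext;           // constants must be concatenated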


How can I quickly delete values of fewer than two characters from a large array?

I want to delete values of fewer than two characters from my large array, which has 9,436,065 string values. I did this with preg_grep() using this code:
function delLess($array, $less)
{
    return preg_grep('~\A[^qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM]{'.$less.',}\z~u', $array);
}
$words = array("ӯ", "ӯро", "ӯт", "ғариб", "афтода", "даст", "ра");
echo "<pre>";
print_r(delLess($words, 2));
echo "</pre>";
But it runs slowly. Is it possible to optimize this code?
I would go for the array_filter function; performance should be better.
function filter($var)
{
    return strlen($var) > 2;
}
$newArray = array_filter($array, "filter");
Given the size of the dataset, I'd use a database, so it would probably look like this:
delete from table where length(field) <= 2
Maybe something like SQLite?
You could try using the strlen function instead of regular expressions and see if that is faster. (Or mb_strlen for multibyte characters.)
$newArr = array();
foreach ($words as $val) {
    if (strlen($val) > 2) {
        $newArr[] = $val;
    }
}
echo "<pre>";
print_r($newArr);
echo "</pre>";
Any work on 10 million strings will take time. In my opinion, this kind of operation is a one-off, so it does not really matter if it is not instantaneous.
Where are the strings coming from? You almost certainly got them from a database. If so, do the work in the database: it will be faster, and at least you will not be polluted with those values any more. This kind of operation will be faster in a database than in PHP, but it could still take time.
Again, if the data is stored in a database, it did not get there magically, so you could also make sure that no new unwanted entries get into it. That way you guarantee this operation will never need to be redone.
I am aware that this does not really answer your question, because we should stick to PHP and you already have the best way to do it there. Optimizing such a simple function would cost a lot of time and would not bring much, if any, improvement. The only other suggestion I could make is to use another tool: if not database-based, then file-based, like sed, awk or anything that reads/writes files. You'd have one string per line and parse the file, reducing its size accordingly, but writing the file from PHP, exec'ing the script and loading the file back into PHP would make things too complicated for nothing.

What is better in a foreach loop... using the & symbol or reassigning based on key?

Consider the following PHP Code:
//Method 1
$array = array(1,2,3,4,5);
foreach($array as $i=>$number){
$number++;
$array[$i] = $number;
}
print_r($array);
//Method 2
$array = array(1,2,3,4,5);
foreach($array as &$number){
$number++;
}
print_r($array);
Both methods accomplish the same task, one by assigning a reference and another by re-assigning based on key. I want to use good programming techniques in my work and I wonder which method is the better programming practice? Or is this one of those it doesn't really matter things?
Since the highest scoring answer states that the second method is better in every way, I feel compelled to post an answer here. True, looping by reference is more performant, but it isn't without risks/pitfalls.
Bottom line, as always: "Which is better X or Y", the only real answers you can get are:
It depends on what you're after/what you're doing
Oh, both are OK, if you know what you're doing
X is good for Such, Y is better for So
Don't forget about Z, and even then ...("which is better X, Y or Z" is the same question, so the same answers apply: it depends, both are ok if...)
Be that as it may, as Orangepill showed, the reference approach offers better performance. In this case, the tradeoff is one of performance vs. code that is less error-prone and easier to read/maintain. In general, it's considered better to go for safer, more reliable, and more maintainable code:
'Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.' — Brian Kernighan
I guess that means the first method has to be considered best practice. But that doesn't mean the second approach should be avoided at all times, so what follows here are the downsides, pitfalls and quirks that you'll have to take into account when using a reference in a foreach loop:
Scope:
For a start, PHP isn't truly block-scoped like C(++), C#, Java, Perl or (with a bit of luck) ECMAScript6... That means that the $value variable will not be unset once the loop has finished. When looping by reference, this means a reference to the last value of whatever object/array you were iterating is floating around. The phrase "an accident waiting to happen" should spring to mind.
Consider what happens to $value, and subsequently $array, in the following code:
$array = range(1, 10);
foreach ($array as &$value)
{
    $value++;
}
echo json_encode($array), PHP_EOL;
$value++;
echo json_encode($array), PHP_EOL;
$value = 'Some random value';
echo json_encode($array), PHP_EOL;
The output of this snippet will be:
[2,3,4,5,6,7,8,9,10,11]
[2,3,4,5,6,7,8,9,10,12]
[2,3,4,5,6,7,8,9,10,"Some random value"]
In other words, by reusing the $value variable (which references the last element in the array), you're actually manipulating the array itself. This makes for error-prone code, and difficult debugging. As opposed to:
$array = range(1, 10);
$array[] = 'foobar';
foreach ($array as $k => $v)
{
    $array[$k]++;                // increments 'foobar' to 'foobas'!
    if ($array[$k] === ($v + 1)) // $v + 1 yields 1 if $v === 'foobar' (pre-PHP 8; PHP 8 throws a TypeError here)
    {                            // so 'foobas' === 1 => false
        $array[$k] = $v;         // restore initial value: foobar
    }
}
Maintainability/idiot-proofness:
Of course, you might say that the dangling reference is an easy fix, and you'd be right:
foreach ($array as &$value)
{
    $value++;
}
unset($value);
But after you've written your first 100 loops with references, do you honestly believe you won't have forgotten to unset a single reference? Of course not! It's so uncommon to unset variables that have been used in a loop (we assume the GC will take care of it for us) that most of the time you don't bother. When references are involved, this is a source of frustration, mysterious bug reports, or traveling values, especially where you're using complex nested loops, possibly with multiple references... The horror, the horror.
Besides, as time passes, who's to say that the next person working on your code won't forget about unset? Who knows, they might not even know about references, or might see your numerous unset calls, deem them redundant, a sign of your being paranoid, and delete them altogether. Comments alone won't help you: they need to be read, and everyone working with your code should be thoroughly briefed, perhaps even made to read a full article on the subject. The examples listed in the linked article are bad, but I've seen worse, still:
foreach ($nestedArr as &$array)
{
    if (count($array) % 2 === 0)
    {
        foreach ($array as &$value)
        {// pointless, but you get the idea...
            $value = array($value, 'Part of even-length array');
        }
        //$value now references the last index of $array
    }
    else
    {
        $value = array_pop($array); // assigns new value to var that might be a reference!
        $value = is_numeric($value) ? $value/2 : null;
        array_push($array, $value); // congrats, X-references ==> traveling value!
    }
}
This is a simple example of a traveling value problem. I did not make this up, BTW, I've come across code that boils down to this... honestly. Quite apart from spotting the bug, and understanding the code (which has been made more difficult by the references), it's still quite obvious in this example, mainly because it's a mere 15 lines long, even using the spacious Allman coding style... Now imagine this basic construct being used in code that actually does something even slightly more complex, and meaningful. Good luck debugging that.
Side-effects:
It's often said that functions shouldn't have side-effects, because side-effects are (rightfully) considered to be code-smell. Though foreach is a language construct, and not a function, in your example, the same mindset should apply. When using too many references, you're being too clever for your own good, and might find yourself having to step through a loop, just to know what is being referenced by what variable, and when.
The first method hasn't got this problem: you have the key, so you know where you are in the array. What's more, with the first method, you can perform any number of operations on the value, without changing the original value in the array (no side-effects):
function recursiveFunc($n, $max = 10)
{
    if (--$max)
    {
        return $n === 1 ? 10 - $max : recursiveFunc($n % 2 ? ($n * 3) + 1 : $n / 2, $max);
    }
    return null;
}
$array = range(10, 20);
foreach ($array as $k => $v)
{
    $v = recursiveFunc($v); // reassigning $v here
    if ($v !== null)
    {
        $array[$k] = $v; // only now will the actual array change
    }
}
echo json_encode($array);
This generates the output:
[7,11,12,13,14,15,5,17,18,19,8]
As you can see, the first, seventh and tenth elements have been altered, the others haven't. If we were to rewrite this code using a loop by reference, the loop looks a lot smaller, but the output will be different (we have a side-effect):
$array = range(10, 20);
foreach ($array as &$v)
{
    $v = recursiveFunc($v); // changes the original array...
    // granted, if your version permits it, you'd probably do:
    $v = recursiveFunc($v) ?: $v;
}
echo json_encode($array);
//[7,null,null,null,null,null,5,null,null,null,8]
To counter this, we'll either have to create a temporary variable, or call the function twice, or add a key and recalculate the initial value of $v, but that's just plain stupid (that's adding complexity to fix what shouldn't be broken):
foreach ($array as &$v)
{
    $temp = recursiveFunc($v); // creating a copy here, anyway
    $v = $temp ? $temp : $v;   // assignment doesn't require the lookup, though
}
//or:
foreach ($array as &$v)
{
    $v = recursiveFunc($v) ? recursiveFunc($v) : $v; // 2 calls === twice the overhead!
}
//or
$base = reset($array); // get the base value
foreach ($array as $k => &$v)
{// silly: combine both methods to fix what needn't be a problem to begin with
    $v = recursiveFunc($v);
    if ($v === 0)
    {
        $v = $base + $k;
    }
}
Anyway, adding branches, temp variables and what have you, rather defeats the point. For one, it introduces extra overhead which will eat away at the performance benefits references gave you in the first place.
If you have to add logic to a loop, to fix something that shouldn't need fixing, you should step back, and think about what tools you're using. 9/10 times, you chose the wrong tool for the job.
The last thing that, to me at least, is a compelling argument for the first method is simple: readability. The reference-operator (&) is easily overlooked if you're doing some quick fixes, or try to add functionality. You could be creating bugs in the code that was working just fine. What's more: because it was working fine, you might not test the existing functionality as thoroughly because there were no known issues.
Discovering a bug that went into production, because of your overlooking an operator might sound silly, but you wouldn't be the first to have encountered this.
Note:
Passing by reference at call-time has been removed since PHP 5.4. Be wary of features/functionality that are subject to change. A standard iteration of an array hasn't changed in years. I guess it's what you could call "proven technology". It does what it says on the tin, and it is the safer way of doing things. So what if it's slower? If speed is an issue, you can optimize your code and introduce references to your loops then.
When writing new code, go for the easy-to-read, most failsafe option. Optimization can (and indeed should) wait until everything's tried and tested.
And, as always: premature optimization is the root of all evil. Choose the right tool for the job, not the one that's new and shiny.
As far as performance is concerned, Method 2 is better, especially if you have a large array and/or are using string keys.
While both methods use the same amount of memory, the first method requires the array element to be looked up again on each assignment; even though this lookup is done by key or index, it has some overhead.
Given this test script:
$array = range(1, 1000000);
$start = microtime(true);
foreach($array as $k => $v){
$array[$k] = $v+1;
}
echo "Method 1: ".((microtime(true)-$start));
echo "\n";
$start = microtime(true);
foreach($array as $k => &$v){
$v+=1;
}
echo "Method 2: ".((microtime(true)-$start));
The average output is
Method 1: 0.72429609298706
Method 2: 0.22671484947205
If I scale back the test to only run ten times instead of 1 million I get results like
Method 1: 3.504753112793E-5
Method 2: 1.2874603271484E-5
With string keys the performance difference is more pronounced.
So running:
$array = array();
for($x = 0; $x<1000000; $x++){
$array["num".$x] = $x+1;
}
$start = microtime(true);
foreach($array as $k => $v){
$array[$k] = $v+1;
}
echo "Method 1: ".((microtime(true)-$start));
echo "\n";
$start = microtime(true);
foreach($array as $k => &$v){
$v+=1;
}
echo "Method 2: ".((microtime(true)-$start));
Yields performance like
Method 1: 0.90371179580688
Method 2: 0.2799870967865
This is because looking up a string key has more overhead than an integer index.
It is also worth noting that, as suggested in Elias Van Ootegem's answer, to properly clean up after yourself you should unset the reference after the loop has completed, i.e. unset($v);. And the performance gains should be measured against the loss in readability.
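For reference, that cleanup applied to the reference loop from the benchmark above:
foreach ($array as $k => &$v) {
    $v += 1;
}
unset($v); // break the reference so later writes to $v cannot modify $array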
There are some minor performance differences, but they aren't going to have any significant effect.
I would choose the first option for two reasons:
It's more readable. This is a bit of a personal preference, but at first glance, it's not immediately obvious to me that $number++ is updating the array. By explicitly using something like $array[$i]++, it's much clearer, and less likely to cause confusion when you come back to this code in a year.
It doesn't leave you with a dangling reference to the last item in the array. Consider this code:
$array = array(1,2,3,4,5);
foreach($array as &$number){
$number++;
}
// ... some time later in an unrelated section of code
$number = intval("100");
// now unexpectedly, $array[4] == 100 instead of 6
I guess that depends. Do you care more about code readability/maintainability or about minimizing memory usage? The second method would use slightly less memory, but I would honestly prefer the first, as assignment by reference in a foreach definition does not seem to be commonplace practice in PHP.
Personally if I wanted to modify an array in place like this I would go with a third option:
array_walk($array, function(&$value) {
$value++;
});
The first method will be insignificantly slower, because each time through the loop it assigns a new value (a copy) to the $number variable. The second method works on the array element directly through the reference, so it doesn't need that extra assignment on each iteration.
But, as I said, the difference is not significant; the main thing to consider is readability.
In my opinion, the first method makes more sense when you don't need to modify the value in the loop and the $number variable is only read.
The second method makes more sense when you need to modify the $number variable often, as you don't need to repeat the key each time you want to modify it, and it is more readable.
Have you considered array_map? It is designed to change values inside arrays.
$array = array(1, 2, 3, 4, 5);
$new = array_map(function ($number) {
    return $number + 1; // note: "return $number++" would return the original, unchanged value
}, $array);
var_dump($new);
I'd choose #2, but it's a personal preference.
I disagree with the other answers: using references to array items in foreach loops is quite common, but it depends on the framework you're using. As always, try to follow the existing coding conventions in your project or framework.
I also disagree with the other answers that suggest array_map or array_walk. These introduce the overhead of a function call for each array element. For small arrays, this won't be significant, but for large arrays, this will add a significant overhead for such a simple function. However, they are appropriate if you're performing more significant calculations or actions - you'll need to decide which to use depending on the scenario, perhaps by benchmarking.
Most of the answers interpreted your question to be about performance.
This is not what you asked. What you asked is:
I wonder which method is the better programming practice?
As you said, both do the same thing. Both work. In the end, better is often a matter of opinion.
Or is this one of those it doesn't really matter things?
I wouldn't go so far as to say it doesn't matter. As you can see there can be performance considerations for Method 1 and reference gotchas for Method 2.
I can say what matters more is readability and consistency. While there are dozens of ways to increment array elements in PHP, some look like line noise or code golf.
Ensuring your code is readable to future developers and you consistently apply your method of solving problems is a far better macro programming practice than whatever micro differences exist in this foreach code.

Exploding an array within a foreach loop parameter

foreach (explode(',', $foo) as $bar) { ... }
vs
$test = explode(',', $foo);
foreach ($test as $bar) { ... }
In the first example, does it explode the $foo string for each iteration or does PHP keep it in memory exploded in its own temporary variable? From an efficiency point of view, does it make sense to create the extra variable $test or are both pretty much equal?
I could make an educated guess, but let's try it out!
I figured there were three main ways to approach this.
1) explode and assign before entering the loop
2) explode within the loop, no assignment
3) string tokenize
My hypotheses:
1) probably consumes more memory due to the assignment
2) probably identical to #1 or #3, not sure which
3) probably both quicker and with a much smaller memory footprint
Approach
Here's my test script:
<?php
ini_set('memory_limit', '1024M');
$listStr = 'text';
$listStr .= str_repeat(',text', 9999999);
$timeStart = microtime(true);
/*****
* {INSERT LOOP HERE}
*/
$timeEnd = microtime(true);
$timeElapsed = $timeEnd - $timeStart;
printf("Memory used: %s kB\n", memory_get_peak_usage()/1024);
printf("Total time: %s s\n", $timeElapsed);
And here are the three versions:
1)
// explode separately
$arr = explode(',', $listStr);
foreach ($arr as $val) {}
2)
// explode inline-ly
foreach (explode(',', $listStr) as $val) {}
3)
// tokenize
$tok = strtok($listStr, ',');
while ($tok = strtok(',')) {}
Results
Conclusions
Looks like some assumptions were disproven. Don't you love science? :-)
In the big picture, any of these methods is sufficiently fast for a list of "reasonable size" (few hundred or few thousand).
If you're iterating over something huge, time difference is relatively minor but memory usage could be different by an order of magnitude!
When you explode() inline without pre-assignment, it's a fair bit slower for some reason.
Surprisingly, tokenizing is a bit slower than explicitly iterating a declared array. Working on such a small scale, I believe that's due to the call stack overhead of making a function call to strtok() every iteration. More on this below.
In terms of the number of function calls, explode()ing really beats tokenizing: O(1) vs O(n).
I added a bonus to the chart where I run method 1) with a function call in the loop. I used strlen($val), thinking it would be a relatively similar execution time. That's subject to debate, but I was only trying to make a general point. (I only ran strlen($val) and ignored its output. I did not assign it to anything, for an assignment would be an additional time-cost.)
// explode separately
$arr = explode(',', $listStr);
foreach ($arr as $val) {strlen($val);}
As you can see from the results table, it then becomes the slowest method of the three.
Final thought
This is interesting to know, but my suggestion is to do whatever you feel is most readable/maintainable. Only if you're really dealing with a significantly large dataset should you be worried about these micro-optimizations.
In the first case, PHP explodes it once and keeps it in memory.
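A quick way to see that the expression is only evaluated once; explodeOnce() is a hypothetical wrapper added here purely for illustration:
function explodeOnce($separator, $string) {
    static $calls = 0;
    echo 'explode called ' . ++$calls . " time(s)\n";
    return explode($separator, $string);
}
foreach (explodeOnce(',', 'a,b,c,d') as $bar) {
    // loop body runs four times...
}
// ...but "explode called 1 time(s)" is printed only once.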
The impact of creating a separate variable (or not) would be negligible. The PHP interpreter needs to maintain a pointer to the location of the next item either way, whether the array is user-defined or not.
From a memory point of view it will not make a difference, because PHP uses copy-on-write.
Apart from that, I personally would opt for the first option - it's a line less, but not less readable (imho!).
Efficiency in what sense: memory or processor? For the processor it wouldn't make a difference; for memory, you can always do $foo = explode(',', $foo).

Why is echo faster than print?

In PHP, why is echo faster than print?
They do the same thing... Why is one faster than the other?
Do they do exactly the same thing?
echo and print are virtually (though not technically) the same thing. The (pretty much only) difference between the two is that print returns the integer 1, whereas echo returns nothing. Keep in mind that neither is actually a function; both are language constructs. echo can take multiple comma-separated arguments when used without parentheses (e.g., echo $var1, $var2, $var3;), while print takes exactly one.
echo can also be shorthanded by using the syntax <?= $var1; ?> (in place of <?php echo $var1; ?>).
As far as which is faster, there are many online resources that attempt to answer that question. PHP Benchmark concludes that "[i]n reality the echo and print functions serve the exact purpose and therefore in the backend the exact same code applies. The one small thing to notice is that when using a comma to separate items whilst using the echo function, items run slightly faster."
It will really come down to your preference, since the differences in speed (whatever they actually are) are negligible.
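A small sketch of the behavioural differences described above (this is about behaviour, not speed):
$r = print "Hello\n";            // print is an expression and returns 1
var_dump($r);                    // int(1)
echo 'Hello', ', ', "world\n";   // echo takes several comma-separated arguments
echo 'Hello' . ', ' . "world\n"; // concatenation builds one string before output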
Print always returns 1, which is also probably why it's slower
Print has a return value, this is the only difference.
The speed difference (if any) is so minuscule that it's not worth the effort thinking about micro-optimisations like this, and it's absolutely not worth updating any old code to switch prints to echos. There are much better ways to speed up your site, if that is your goal.
In my experience, the other answers have it backwards: print is faster than echo in tight loops and when outputting large amounts of hypertext.
Which is faster?
I implemented a test that shows the difference between print and echo:
$start = microtime(1);
for ($i = 0; $i < 100000; $i++)
    echo "Hello world!";
echo "\necho time: " . round(microtime(1) - $start, 5) . "\n";
$start = microtime(1);
for ($i = 0; $i < 100000; $i++)
    print "Hello world!";
echo "\nprint time: " . round(microtime(1) - $start, 5) . "\n";
result:
echo time: .09
print time: .04
Another reference is phpbench, which shows the same result.
Comparison
Now it's time to investigate why print comes out faster than echo here. When you use echo in a loop, PHP has to check whether it was given one value or several, whereas print always takes exactly one argument, so there is nothing extra to check on each iteration. Also, when echo is given multiple values, each of them has to be converted to a string and streamed out separately; I believe that with huge amounts of hypertext this extra processing shows up too, because you are forcing PHP to do work before printing. For small jobs, printing a single short string, echo is fine (especially with concatenation), because unlike print it doesn't return anything.

How important is it to unset variables in PHP?

I am somewhat new to PHP, and I am wondering:
How important is it to unset variables in PHP?
I know that in languages like C we free allocated memory to prevent leaks, etc. Will using unset on variables when I am done with them significantly increase the performance of my applications?
Also, is there a benchmark anywhere that compares the difference between using unset and not using unset?
See this example (and the article I linked below the question):
$x = str_repeat('x', 80000);
echo memory_get_usage() . "<br>\n"; // 120172
echo memory_get_peak_usage() . "<br>\n"; // 121248
$x = str_repeat('x', 80000);
echo memory_get_usage() . "<br>\n"; // 120172
echo memory_get_peak_usage() . "<br>\n"; // 201284
As you can see, at one point PHP had used up almost double the memory. This is because, before assigning the new 'x' string to $x, PHP builds the new string in memory while still holding the previous value of the variable, too. This could have been prevented by unsetting $x first.
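As a sketch, the same snippet with the unset added; exact byte counts will vary with PHP version and platform, but the peak should stay close to the first reading:
$x = str_repeat('x', 80000);
echo memory_get_usage() . "<br>\n";
echo memory_get_peak_usage() . "<br>\n";
unset($x);                               // free the first string before rebuilding
$x = str_repeat('x', 80000);
echo memory_get_usage() . "<br>\n";
echo memory_get_peak_usage() . "<br>\n"; // peak no longer roughly doubles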
Another example:
for ($i=0; $i<3; $i++) {
$str = str_repeat("Hello", 10000);
echo memory_get_peak_usage(), PHP_EOL;
}
This will output something like
375696
425824
425824
At the first iteration $str is still empty before the assignment. On the second iteration, though, $str already holds the generated string. When str_repeat is then called for the second time, it does not immediately overwrite $str, but first creates the string that is to be assigned, in memory. So you end up holding both $str and the value that is about to be assigned to it: double the memory. If you unset $str, this will not happen:
for($i=0;$i<3;$i++) {
$str = str_repeat("Hello", 10000);
echo memory_get_peak_usage(), PHP_EOL;
unset($str);
}
// outputs something like
375904
376016
376016
Does it matter? Well, the linked article sums it up quite well with:
This isn't critical, except when it is.
It doesn't hurt to unset your variables when you no longer need them. Maybe you are on a shared host and want to do some iterating over large datasets. If unsetting would prevent PHP from ending with Allowed memory size of XXXX bytes exhausted, then it's worth the tiny effort.
What should also be taken into account is that even if the request lifetime is just a second, doubling the memory usage effectively halves the maximum number of simultaneous requests that can be served. If you are nowhere near the server's limit anyway, then who cares; but if you are, then a simple unset could save you the money for more RAM or an additional server.
There are many situations in which unset will not actually deallocate much of anything, so its use is generally quite pointless unless the logical flow of your code necessitates its non-existence.
Applications written in languages like C usually run for many hours, but a typical PHP script's runtime is only about 0.05 seconds, so it is much less important to use unset.
It depends; you should make sure you unset very long strings (such as a blog post's content selected from a DB).
<?php
$post = get_blog_post();
echo $post;
unset($post);
?>
