PHP variable name and SQL table column name length

I have a really newbie question :)
Setting aside the fact that
$lastInvoiceNumber vs. $lastInvNum
or:
last_invoice_number (int 10) vs. last_inv_num (int 10)
differ only in how much typing they save, do the shorter names have any benefit (even the slightest)
performance-wise?
Long vs. short?
Is there any chance that PHP, and more importantly MySQL, will consume
less memory if the query uses a shorter table column name?
For example, if I have to fetch 500 rows in a single query, I imagine
the column name is handled 500 times, so using
last_inv_num
instead of
last_invoice_number
might save some memory or make things slightly faster.
Thanks.

No, there is really no noticeable difference in performance whatsoever, and you'll gain a huge improvement in readability by using descriptive variable names. Internally, these variables are referred to by memory addresses (to put it simply), not by their ASCII/Unicode names. The impact it may have on performance, in nearly any language, is so infinitesimal that it would never be noticed.
Edit:
I've added a benchmark. It shows that there is really no difference at all between using a single letter as a variable name and using a 17-character variable name. The single letter might even be a tiny bit slower. However, I do notice a slight consistent increase in time when using a 90-character variable name, but again, the difference is too small to ever notice for practical purposes. Here's the benchmark and output:
<?php
# To prevent any startup costs from skewing the results of the first test.
$start = microtime(true);
for ($i = 0; $i < 1000; $i++)
{
    $noop = null;
}
$end = microtime(true);

# Let's benchmark!
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++)
{
    $thisIsAReallyLongAndReallyDescriptiveVariableNameInFactItIsJustWayTooLongHonestlyWtf = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a long name took %f seconds.\n", ($end - $start));

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++)
{
    $thisIsABitTooLong = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a medium name took %f seconds.\n", ($end - $start));

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++)
{
    $t = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a short name took %f seconds.\n", ($end - $start));
Output:
$ php so-test.php
Using a long name took 0.148200 seconds.
Using a medium name took 0.142286 seconds.
Using a short name took 0.145952 seconds.
The same should be true for MySQL as well; I would almost guarantee it, but it's not as easy to benchmark. With MySQL, you will have far more overhead from the network and IO than anything to do with symbol naming in the code. Just as with PHP, internally, column names aren't just strings that are iterated over; data is stored in memory-efficient formats.
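If you want to see this on your own data, a rough sketch along these lines could work; the PDO connection details and an invoices table containing both column variants are assumptions for illustration, not part of the answer above:
<?php
// Rough sketch: compare fetching 500 rows by a long vs. a short column name.
// Connection details and table/column names are placeholders.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=test', 'user', 'password');

$start = microtime(true);
$rows  = $pdo->query('SELECT last_invoice_number FROM invoices LIMIT 500')->fetchAll();
printf("Long column name:  %f seconds\n", microtime(true) - $start);

$start = microtime(true);
$rows  = $pdo->query('SELECT last_inv_num FROM invoices LIMIT 500')->fetchAll();
printf("Short column name: %f seconds\n", microtime(true) - $start);
In practice the network round-trip and row fetching dominate, so the two timings should be indistinguishable run to run.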

Related

Nested loops in PHP extremely slow

I have 6 nested loops in a PHP program; however, the script's execution time is extremely slow. I would like to ask if there is a better way of implementing the 6 loops to reduce the computation time, even if it means switching to another language. The nature of the algorithm I'm implementing requires iteration, so I don't know how to implement it any better.
Here's the code.
<?php
$time1 = microtime(true);
$res = 16;
$imageres = 128;
for ($x = 0; $x < $imageres; ++$x) {
    for ($y = 0; $y < $imageres; ++$y) {
        $pixels[$x][$y] = 1;
    }
}
$quantizermatrix = 1;
$scalingcoefficient = 1 / ($res / 2);
for ($currentimagex = 0; $currentimagex < ($res * ($imageres / $res - 1) + 1); $currentimagex += $res) {
    for ($currentimagey = 0; $currentimagey < ($res * ($imageres / $res - 1) + 1); $currentimagey += $res) {
        for ($u = 0; $u < $res; ++$u) {
            for ($v = 0; $v < $res; ++$v) {
                for ($x = 0; $x < $res; ++$x) {
                    for ($y = 0; $y < $res; ++$y) {
                        if ($u == 0) { $a = 1 / sqrt(2); } else { $a = 1; }
                        if ($v == 0) { $b = 1 / sqrt(2); } else { $b = 1; }
                        $xes[$y] = $pixels[$x + $currentimagex][$y + $currentimagey]
                            * cos((M_PI / $res) * ($x + 0.5) * $u)
                            * cos(M_PI / $res * ($y + 0.5) * $v);
                    }
                    $xes1[$x] = array_sum($xes);
                }
                $xes2 = array_sum($xes1) * $scalingcoefficient * $a * $b;
                $dctarray[$u + $currentimagex][$v + $currentimagey] = round($xes2 / $quantizermatrix) * $quantizermatrix;
            }
        }
    }
}
foreach ($dctarray as $dct) {
    foreach ($dct as $dc) {
        echo $dc . " ";
    }
    echo "<br>";
}
$time2 = microtime(true);
echo 'script execution time: ' . ($time2 - $time1);
?>
I've removed a large portion of the code that's irrelevant, since this is the section that's problematic.
Essentially, the code iterates through every pixel in a PNG image and outputs a computed matrix (2D array). It takes around 2 seconds for a 128x128 image, which makes the program impractical for normal images larger than 128x128.
There is a function available in the Imagick library:
Imagick::exportImagePixels
Refer to the link below; it might help you out:
http://www.php.net/manual/en/imagick.exportimagepixels.php
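For illustration, a minimal sketch of how exportImagePixels() could replace the per-pixel work in PHP; the filename and the grayscale "I" channel map are assumptions, not taken from the question:
<?php
// Export raw pixel intensities in one call instead of touching each pixel
// from PHP code (requires the Imagick extension; "input.png" is illustrative).
$image  = new Imagick('input.png');
$width  = $image->getImageWidth();
$height = $image->getImageHeight();

// "I" exports grayscale intensity; PIXEL_CHAR yields integer values 0-255.
$flat = $image->exportImagePixels(0, 0, $width, $height, 'I', Imagick::PIXEL_CHAR);

// $flat is a flat array of $width * $height values; rebuild a 2-D grid if needed.
$pixels = [];
for ($y = 0; $y < $height; $y++) {
    for ($x = 0; $x < $width; $x++) {
        $pixels[$x][$y] = $flat[$y * $width + $x];
    }
}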

PHP rand vs mt_rand vs openssl_random_pseudo_bytes

I want to generate a random string and was doing some research and found the following link:
http://golearnphp.com/php-rand-vs-mt_rand-and-openssl_random_pseudo_bytes/
function generateRandom($length) {
    $validCharacters = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $myKeeper = '';
    for ($n = 1; $n < $length; $n++) {
        $whichCharacter = rand(0, strlen($validCharacters) - 1);
        $myKeeper .= $validCharacters[$whichCharacter];
    }
    return $myKeeper;
}

function generateRandomdMT($length) {
    $validCharacters = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $myKeeper = '';
    for ($n = 1; $n < $length; $n++) {
        $whichCharacter = mt_rand(0, strlen($validCharacters) - 1);
        $myKeeper .= $validCharacters[$whichCharacter];
    }
    return $myKeeper;
}

$start = microtime(true);
echo htmlentities(generateRandom(100000));
var_dump(microtime(true) - $start);

$start = microtime(true);
echo htmlentities(generateRandomdMT(100000));
var_dump(microtime(true) - $start);

$start = microtime(true);
echo htmlentities(substr(base64_encode(openssl_random_pseudo_bytes(100000)), 0, 100000));
var_dump(microtime(true) - $start);
In the post the writer is saying that openssl_random_pseudo_bytes is significantly faster than the other two. Is this true? Is openssl_random_pseudo_bytes really that much faster? Is this the correct way to test the speed of these functions?
openssl_random_pseudo_bytes() was created to be cryptographically strong (check the second parameter). rand() is the old random function with a small repetition period. mt_rand() is better than rand() but is not meant for cryptographic use.
I bet the difference in execution time will not impact your application.
Also, those functions return different results. The first two return a string drawn from 36 possible characters, while the third returns a string drawn from 64 possible symbols. The result of the first two functions is also shorter than the third.
If you are optimizing to speed up your application, the first thing you should learn is how to profile your code.
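As a starting point, a minimal timing sketch along these lines (the measured call is just an example) is often enough before reaching for a full profiler:
<?php
// Time the exact call you care about before deciding it is "too slow".
$start = microtime(true);
$token = substr(base64_encode(openssl_random_pseudo_bytes(100000)), 0, 100000);
printf("Generation took %.4f seconds\n", microtime(true) - $start);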
In the post the writer is saying that openssl_random_pseudo_bytes is significantly faster than the other two. Is this true?
In normal situations mt_rand() is significantly faster than openssl_random_pseudo_bytes().
It's only slower in the test code you've posted because you are comparing apples and oranges. For rand() and mt_rand() you are using complex functions which build up a string one byte at a time, whereas for openssl_random_pseudo_bytes() you're using the raw binary stream it produces with base64_encode() which is going to be much faster.
If you could get a raw binary stream out of mt_rand() or rand(), or a sequence of numbers 0 to 63 from openssl_random_pseudo_bytes(), you could do an apples to apples comparison.
In my testing, I found mt_rand() about 4 times as fast as openssl_random_pseudo_bytes(4) when I used unpack('V', openssl_random_pseudo_bytes(4) & "\xff\xff\xff\x7f") in order to get an equivalent output to mt_rand(). However this is still technically an apples to oranges situation because I'm doing additional processing on one in order to match it to the other, just in the opposite direction to you.
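For illustration, a rough sketch of such a comparison; the iteration count is arbitrary and the masking mirrors the unpack() trick described above:
<?php
$iterations = 100000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $n = mt_rand();
}
printf("mt_rand():                      %f seconds\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    // Mask the top bit so the range matches mt_rand()'s 0..2^31-1 output.
    $n = unpack('V', openssl_random_pseudo_bytes(4) & "\xff\xff\xff\x7f")[1];
}
printf("openssl_random_pseudo_bytes(4): %f seconds\n", microtime(true) - $start);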
At the time you asked this question there was a bug report here: https://bugs.php.net/bug.php?id=70014 (PHP 5.6.10). It seems to be fixed in newer versions of PHP.
In my experience it has never been necessary; I prefer mt_rand(). But if you are generating random values for encryption purposes, as I am, then do not use it; use random_bytes() instead, ref. https://www.php.net/manual/en/function.random-bytes.php
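For completeness, a minimal sketch of building such a string with the CSPRNG functions available since PHP 7 (the function name and alphabet are illustrative):
<?php
function generateToken(int $length): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $token = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is cryptographically secure (PHP 7+).
        $token .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $token;
}

echo generateToken(32), "\n";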

How to optimise an Exponential Moving Average algorithm in PHP?

I'm trying to retrieve the last EMA of a large dataset (15000+ values). It is a very resource-hungry algorithm since each value depends on the previous one. Here is my code:
$k = 2 / ($range + 1);
for ($i; $i < $size_data; ++$i) {
    $lastEMA = $lastEMA + $k * ($data[$i] - $lastEMA);
}
What I already did:
Isolate $k so it is not computed 10000+ times
Keep only the latest computed EMA, and not keep all of them in an array
use for() instead of foreach()
the $data[] array doesn't have keys; it's a basic array
This allowed me to reduce execution time from 2000ms to about 500ms for 15000 values!
What didn't work:
Using SplFixedArray(): this shaved only ~10ms when running over 1,000,000 values
Using the PHP Trader extension: this returns an array containing all the EMAs instead of just the latest, and it's slower
Writing and running the same algorithm in C# and running it over 2,000,000 values takes only 13ms! So obviously, using a compiled, lower-level language seems to help ;P
Where should I go from here? The code will ultimately run on Ubuntu, so which language should I choose? Will PHP be able to call and pass such a huge argument to the script?
Clearly, implementing this with an extension gives you a significant boost.
Additionally, the calculation itself can be improved, and that gain carries over to whichever language you choose.
It is easy to see that lastEMA can be calculated as follows:
$lastEMA = 0;
$k = 2 / ($range + 1);
for ($i = 0; $i < $size_data; ++$i) {
    $lastEMA = (1 - $k) * $lastEMA + $k * $data[$i];
}
This can be rewritten as follows in order to move as much as possible out of the loop:
$lastEMA = 0;
$k = 2 / ($range + 1);
$k1m = 1 - $k;
for ($i = 0; $i < $size_data; ++$i) {
    $lastEMA = $k1m * $lastEMA + $data[$i];
}
$lastEMA = $lastEMA * $k;
To explain the extraction of $k: in the previous formulation it is as if all the original raw data were multiplied by $k, so in practice you can multiply the end result instead.
Note that, rewritten this way, you have 2 operations inside the loop instead of 3 (to be precise, the loop also increments $i, compares $i with $size_data, and assigns the new $lastEMA value), so you can expect an additional speedup in the range of 16% to 33%.
Furthermore, there are other improvements that can be considered, at least in some circumstances:
Consider only the last values
The earliest values are multiplied many times by $k1m = 1 - $k, so their contribution may be tiny or even fall below the floating-point precision (or below the acceptable error).
This idea is particularly helpful if you can assume that the older data are of the same order of magnitude as the newer, because if you consider only the last $n values the error you make is
$err = $EMA_of_discarded_data * (1-$k) ^ $n.
So if the orders of magnitude are broadly the same, the relative error is
$rel_err = $err / $lastEMA = $EMA_of_discarded_data * (1-$k) ^ $n / $lastEMA,
which is almost equal to simply (1-$k) ^ $n.
Under the assumption that $lastEMA is almost equal to $EMA_of_discarded_data:
say you can accept a relative error $rel_err;
then you can safely consider only the last $n values, where (1 - $k)^$n < $rel_err.
That means you can pre-calculate $n = log($rel_err) / log(1-$k) before the loop and run the computation over only the last $n values (see the sketch after the example figures below).
If the dataset is very big this can give a sensible speedup.
Consider that 64-bit floating-point numbers have a relative precision (related to the mantissa) of 2^-53 (about 1.1e-16; it is only 2^-24 = 5.96e-8 for 32-bit floats), so you cannot obtain a relative error better than that;
basically you should never have an advantage in calculating more than $n = log(1.1e-16) / log(1-$k) values.
To give an example, if $range = 2000 then $n = log(1.1e-16) / log(1-2/2001) = 36'746.
I think it is interesting to know that any extra calculations would be lost in the rounding, so they are useless and better not done.
Now an example for the case where you can accept a relative error larger than the floating-point precision: with $rel_err = 1ppm = 1e-6 = 0.0001% = 6 significant decimal digits, you get $n = log(1e-6) / log(1-2/2001) = 13'815.
I think that is quite a small number compared to your total number of samples, so in that case the speedup could be noticeable (I'm assuming that $range = 2000 is meaningful or on the high side for your application, but that I cannot know).
Just a few more numbers, since I do not know what your typical figures are:
$rel_err = 1e-3; $range = 2000 => $n = 6'907
$rel_err = 1e-3; $range = 200 => $n = 691
$rel_err = 1e-3; $range = 20 => $n = 69
$rel_err = 1e-6; $range = 2000 => $n = 13'815
$rel_err = 1e-6; $range = 200 => $n = 1'381
$rel_err = 1e-6; $range = 20 => $n = 138
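A small sketch of the truncation idea, assuming $data and $range are defined as in the question and the error threshold is just an example:
<?php
// Compute the EMA over only the last $n samples, where $n comes from the
// acceptable relative error as described above.
$rel_err = 1e-6;

$k   = 2 / ($range + 1);
$k1m = 1 - $k;

// Number of most recent samples that still matter at this error level.
$n = (int) ceil(log($rel_err) / log($k1m));

$size  = count($data);
$first = max(0, $size - $n);

$lastEMA = 0;
for ($i = $first; $i < $size; ++$i) {
    $lastEMA = $k1m * $lastEMA + $k * $data[$i];
}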
If the assumption "$lastEMA almost equal to $EMA_of_discarded_data" cannot be made, things are less easy, but since the advantage can be significant it may still be worth going on:
we need to reconsider the full formula: $rel_err = $EMA_of_discarded_data * (1-$k) ^ $n / $lastEMA
so $n = log($rel_err * $lastEMA / $EMA_of_discarded_data) / log(1-$k) = (log($rel_err) + log($lastEMA / $EMA_of_discarded_data)) / log(1-$k)
the central point is to estimate $lastEMA / $EMA_of_discarded_data (without actually calculating $lastEMA or $EMA_of_discarded_data, of course)
one case is when we know a priori that, for example, $EMA_of_discarded_data / $lastEMA < M (for example M = 1000 or M = 1e6)
in that case $n < log($rel_err/M) / log(1-$k)
if you cannot give any such M,
you have to find a good way to over-estimate $EMA_of_discarded_data / $lastEMA;
one quick way could be to take M = max(data) / min(data)
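A quick sketch of that estimate (assuming strictly positive $data, with $rel_err and $k as above):
<?php
// Over-estimate the discarded/last EMA ratio with max/min, then derive a
// safe sample count $n for the truncated loop.
$M = max($data) / min($data);
$n = (int) ceil((log($rel_err) - log($M)) / log(1 - $k));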
Parallelization
The calculation can be re-written in a form where it is a simple addition of independent terms:
$lastEMA = 0;
$k = 2 / ($range + 1);
$k1m = 1 - $k;
for ($i = 0; $i < $size_data; ++$i) {
    // pow(), not ^ (which is bitwise XOR in PHP)
    $lastEMA += pow($k1m, $size_data - 1 - $i) * $data[$i];
}
$lastEMA = $lastEMA * $k;
So if the implementing language supports parallelization, the dataset can be divided into 4 (or 8 or n... basically the number of CPU cores available) chunks, the sum of terms in each chunk can be computed in parallel, and the individual results summed up at the end.
I won't go into detail on this since this reply is already terribly long and I think the concept is already expressed.
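For illustration, a sequential sketch of that chunked decomposition; in a real parallel setup (e.g. the pcntl or parallel extensions) each chunk's partial sum would run on its own core. $data and $range are assumed inputs and the chunk count is arbitrary:
<?php
$chunks = 4;
$k      = 2 / ($range + 1);
$k1m    = 1 - $k;
$size   = count($data);
$per    = (int) ceil($size / $chunks);

$lastEMA = 0;
for ($c = 0; $c < $chunks; $c++) {
    $begin   = $c * $per;
    $end     = min($size, $begin + $per);
    $partial = 0;
    for ($i = $begin; $i < $end; ++$i) {
        $partial += pow($k1m, $size - 1 - $i) * $data[$i];
    }
    $lastEMA += $partial; // in a parallel run these additions happen after the join
}
$lastEMA = $lastEMA * $k;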
Building your own extension definitely improves performance. Here's a good tutorial from the Zend website.
Some performance figures. Setup: Ubuntu 14.04, PHP 5.5.9, 1-core Intel CPU @ 3.3GHz, 128MB RAM (it's a VPS).
Before (PHP only, 16,000 values): 500ms
C extension (16,000 values): 0.3ms
C extension (100,000 values): 3.7ms
C extension (500,000 values): 28.0ms
But I'm memory limited at this point, using 70MB. I will fix that and update the numbers accordingly.

PHP reduce function calls by optimizer

Everyone knows that function calls in PHP hurt performance badly. This script demonstrates the problem:
// Plain variable assignment.
$time = microtime(true);
$i = 100000;
while ($i--)
{
    $x = 'a';
}
echo microtime(true) - $time . "\n\n";
// 0.017973899841309

$time = microtime(true);
function f() { $a = "a"; return $a; }
$i = 100000;
while ($i--)
{
    $x = f();
}
echo microtime(true) - $time . "\n\n";
// 0.18558096885681
By the way, anonymous functions are the worst; they are 10 times slower.
Is there a PHP script optimizer that reduces the number of function calls and minifies the script?
There is also this post: Why are PHP function calls *so* expensive?, which is related to this article.
You only ever call the functions that you actually need at any given time, so no.
The thing you could do to optimize your code is to use as few anonymous functions as possible, reduce the amount of whitespace (e.g. use a PHP minifier), and rename your functions to one-letter names;
this would at least make the script quicker to tokenize, which would allow for slightly faster parsing of the functions.
But in terms of optimization you are better off not doing so, because the readability goes completely down the drain.

Performance of variable expansion vs. sprintf in PHP

Regarding performance, is there any difference between doing:
$message = "The request $request has $n errors";
and
$message = sprintf('The request %s has %d errors', $request, $n);
in PHP?
I would say that calling a function involves more overhead, but I do not know what PHP is doing behind the scenes to expand variable names.
Thanks!
It does not matter.
Any performance gain would be so minuscule that you would see it (as an improvement in the hundredths of seconds) only with tens or hundreds of thousands of iterations, if even then.
For specific numbers, see this benchmark. You can see it has to generate 1MB+ of data using 100,000 function calls to achieve a measurable difference in the hundreds of milliseconds. Hardly a real-life situation. Even the slowest method ("sprintf() with positional params") takes only 0.00456 milliseconds vs. 0.00282 milliseconds with the fastest. For any operation requiring 100,000 string output calls, you will have other factors (network traffic, for example) that will be an order of magnitude slower than the 100ms you may be able to save by optimizing this.
Use whatever makes your code most readable and maintainable for you and others. To me personally, the sprintf() method is a neat idea; I have to think about starting to use that myself.
In neither case will the second be faster if you supply a double-quoted format string, which has to be parsed for variables as well. If you are going for micro-optimization, the proper way is:
$message = sprintf('The request %s has %d errors', $request, $n);
Still, I believe the second is slower (and as #Pekka pointed out, the difference really does not matter) because of the overhead of a function call, parsing the format string, converting values, etc. But please note that the two lines of code are not equivalent, since in the second case $n is converted to an integer. If $n is "no error", the first line will output:
The request $request has no error errors
while the second one will output:
The request $request has 0 errors
A performance analysis of "variable expansion vs. sprintf" was made here.
As #pekka says, use whatever "makes your code most readable and maintainable for you and others". When the performance gains are "low" (roughly less than a factor of two), ignore them.
Summarizing the benchmark: PHP is optimized for double-quoted and heredoc resolution. Expressed as a percentage of the average time for building a very long string:
double-quoted resolution: 75%
heredoc resolution: 82%
single-quote concatenation: 93%
sprintf formatting: 117%
sprintf formatting with indexed params: 133%
Note that only sprintf performs any actual formatting (see the benchmark's '%s%s%d%s%f%s'), and as #Darhazer shows, it makes some difference in the output. A better test is two benchmarks: one comparing only concatenation times (the '%s' formatter), the other including the formatting process, for example '%3d%2.2f' and functional equivalents applied before expanding variables into double quotes; plus one more benchmark combination using short template strings. A rough sketch of that two-benchmark idea follows below.
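For illustration, a rough sketch of that idea; the values and iteration counts are arbitrary, and str_pad()/number_format() stand in as the "functional equivalents" before interpolation:
<?php
$name  = 'request';
$count = 42;
$ratio = 3.14159;
$iterations = 100000;

// Benchmark 1: concatenation only ('%s' formatter, no real formatting).
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $s = sprintf('%s: %s', $name, $count);
}
printf("sprintf, %%s only:       %f s\n", microtime(true) - $start);

// Benchmark 2: real formatting done by sprintf itself.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $s = sprintf('%s: %3d %2.2f', $name, $count, $ratio);
}
printf("sprintf, formatting:     %f s\n", microtime(true) - $start);

// Benchmark 2b: equivalent formatting done before double-quote interpolation.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $c = str_pad((string) $count, 3, ' ', STR_PAD_LEFT);
    $r = number_format($ratio, 2);
    $s = "$name: $c $r";
}
printf("manual + interpolation:  %f s\n", microtime(true) - $start);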
PROS and CONS
The main advantage of sprintf is, as the benchmarks show, its very low-cost formatting (!). For generic templating I suggest the vsprintf function (a small sketch follows at the end of this answer).
The main advantages of double-quoted strings (and heredoc) are some performance, plus the readability and maintainability of named placeholders, which grows with the number of parameters (beyond 1) when compared with the positional marks of sprintf.
Indexed placeholders sit halfway along the maintainability scale towards sprintf.
NOTE: do not use single-quote concatenation unless it is really necessary. Remember that PHP allows safe syntax like "Hello {$user}_my_brother!" and references like "Hello {$this->name}!".
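A tiny vsprintf sketch (the template and the value array are illustrative):
<?php
// vsprintf() takes the values as an array, which is handy for generic templating.
$template = 'The request %s has %d errors';
$values   = ['GET /invoices', 3];

echo vsprintf($template, $values), "\n"; // The request GET /invoices has 3 errors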
I am surprised, but for PHP 7.* "$variables replacement" is the fastest approach:
$message = "The request {$request} has {$n} errors";
You can simply prove it yourself:
$request = "XYZ";
$n = "0";
$mtime = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
$message = "The request {$request} has {$n} errors";
}
$ctime = microtime(true);
echo '
"variable $replacement timing": '. ($ctime-$mtime);
$request = "XYZ";
$n = "0";
$mtime = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
$message = 'The request '.$request.' has '.$n.' errors';
}
$ctime = microtime(true);
echo '
"concatenation" . $timing: '. ($ctime-$mtime);
$request = "XYZ";
$n = "0";
$mtime = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
$message = sprintf('The request %s has %d errors', $request, $n);
}
$ctime = microtime(true);
echo '
sprintf("%s", $timing): '. ($ctime-$mtime);
The result for PHP 7.3.5:
"variable $replacement timing": 0.091434955596924
"concatenation" . $timing: 0.11175799369812
sprintf("%s", $timing): 0.17482495307922
You have probably already found recommendations like 'use sprintf instead of variables contained in double quotes, it's about 10x faster' (What are some good PHP performance tips?).
That was true once, namely before PHP 5.2.*.
Here is a sample of how it was in those days, on PHP 5.1.6:
"variable $replacement timing": 0.67681694030762
"concatenation" . $timing: 0.24738907814026
sprintf("%s", $timing): 0.61580610275269
For injecting multiple string variables into a string, the first one will be faster:
$message = "The request $request has $n errors";
And for a single injection, dot (.) concatenation will be faster:
$message = 'The request ' . $request . ' has 0 errors';
Run the iteration in a large enough loop and measure the difference yourself.
For example:
<?php
$request = "XYZ";
$n = "0";
$mtime = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $message = "The request {$request} has {$n} errors";
}
$ctime = microtime(true);
echo ($ctime - $mtime);
?>
Ultimately, the first is the fastest when considering the context of a single variable assignment, which can be seen by looking at various benchmarks. Perhaps, though, using the sprintf flavor of core PHP functions could allow for more extensible code that is better suited to bytecode-level caching mechanisms like OPcache or APC. In other words, an application of a given size could use less code when utilizing the sprintf method. The less code you have to cache into RAM, the more RAM you have for other things or for more scripts. However, this only matters if your scripts wouldn't otherwise fit into RAM.
