I measured the memory usage of some simple variables and got unexpected results; please see this code:
$datetimes = [];
$memory_before = memory_get_usage();
for ($x = 0; $x < 1000; $x++) {
    $datetimes[] = new \DateTime();
}
var_dump('DateTimes: ' . (memory_get_usage() - $memory_before));

$ints = [];
$memory_before = memory_get_usage();
for ($x = 0; $x < 1000; $x++) {
    $ints[] = $x;
}
var_dump('Integers: ' . (memory_get_usage() - $memory_before));
I get this output (on PHP 7.4, 64bit):
string(17) "DateTimes: 350504"
string(15) "Integers: 37160"
37 KB of memory for 1000 ints does not make sense to me, right? I'd expect 8000 bytes plus some array overhead.
My experiment scales: for a million ints, I get 33558808 bytes of memory usage.
I have disabled xdebug.
That's how PHP works; it's the downside of dynamically-typed variables.
The integer is in reality stored in a zval (Zend value) structure.
1000 x (2 x 64 bits) = 128 Kbit, so 16 KB just for the zvals.
Add to that the overhead of the array structure holding those 1000 elements.
In memory, a zval is represented as two 64-bit words: the first word holds the value, and the second holds the type, type_flags, extra, and reserved fields.
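As a rough sanity check (my own back-of-envelope sketch, assuming PHP 7 stores each array element in a 32-byte bucket, i.e. a 16-byte zval plus 16 bytes of hash metadata, with the bucket table sized to the next power of two), the observed numbers line up fairly well:

<?php
// Hypothetical helper, not an official API: estimate the memory a PHP 7 array
// of $count integers needs, counting only the bucket storage.
function estimate_packed_array_bytes(int $count): int {
    $slots = 8;             // minimum table size
    while ($slots < $count) {
        $slots <<= 1;       // the table grows in powers of two
    }
    return $slots * 32;     // 32 bytes per bucket (16-byte zval + 16 bytes metadata)
}

var_dump(estimate_packed_array_bytes(1000));    // int(32768)    -- observed ~37160
var_dump(estimate_packed_array_bytes(1000000)); // int(33554432) -- observed 33558808

The remaining few kilobytes in the 1000-element case come from the array header and hash slots, which this sketch ignores.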
I have two points (x1 and x2) and want to generate a normal distribution between them in a given number of steps, such that the sum of the y values for the x values between x1 and x2 is 1. Now to the actual problem:
I'm fairly new to Python and wonder why the following code produces the desired result but runs about 100x slower than the same program in PHP. There are about 2000 x1-x2 pairs and about 5 step values per pair.
I tried compiling with Cython and used multiprocessing, but that only improved things 2x, which is still 50x slower than PHP. Any suggestions on how to improve the speed to at least match PHP's performance?
from scipy.stats import norm
import numpy as np
import time

# Calculates normal distribution
def calculate_dist(x1, x2, steps, slope):
    points = []
    range = np.linspace(x1, x2, steps+2)
    for x in range:
        y = norm.pdf(x, x1+((x2-x1)/2), slope)
        points.append([x, y])
    sum = np.array(points).sum(axis=0)[1]
    norm_points = []
    for point in points:
        norm_points.append([point[0], point[1]/sum])
    return norm_points

start = time.time()
for i in range(0, 2000):
    for j in range(10, 15):
        calculate_dist(0, 1, j, 0.15)
print(time.time() - start)  # Around 15 seconds or so
Edit, PHP Code:
$start = microtime(true);
for ($i = 0; $i < 2000; $i++) {
    for ($j = 10; $j < 15; $j++) {
        $x1 = 0; $x2 = 1; $steps = $j; $slope = 0.15;
        $step = abs($x2 - $x1) / ($steps + 1);
        $points = [];
        for ($x = $x1; $x <= $x2 + 0.000001; $x += $step) {
            $y = stats_dens_normal($x, $x1 + (($x2 - $x1) / 2), $slope);
            $points[] = [$x, $y];
        }
        $sum = 0;
        foreach ($points as $point) {
            $sum += $point[1];
        }
        $norm_points = [];
        foreach ($points as &$point) {
            array_push($norm_points, [$point[0], $point[1] / $sum]);
        }
    }
}
return microtime(true) - $start; # Around 0.1 seconds or so
Edit 2: I profiled each line and found that norm.pdf() was taking 98% of the time, so I found a custom normpdf function and defined it. Now the time is around 0.67s, which is considerably faster but still around 10x slower than PHP. Also, I think redefining common functions goes against the idea of Python's simplicity?!
The custom function (source is some other Stack Overflow answer):
from math import sqrt, pi, exp

def normpdf(x, mu, sigma):
    u = (x-mu)/abs(sigma)
    y = (1/(sqrt(2*pi)*abs(sigma)))*exp(-u*u/2)
    return y
The answer is that you aren't using the right tools/data structures for the task in Python.
Calling numpy functionality has quite an overhead in Python (scipy.stats.norm.pdf uses numpy under the hood), so you would never call these functions for a single element but rather for the whole array (so-called vectorized computation). That means instead of
for x in range:
    y = norm.pdf(x, x1+((x2-x1)/2), slope)
    ys.append(y)
one would rather use:
ys = norm.pdf(x,x1+((x2-x1)/2), slope)
which calculates the pdf for all elements of x and pays the overhead only once rather than len(x) times.
For example, calculating the pdf for 10^4 elements takes less than 10 times as long as for a single element:
%timeit norm.pdf(0) # 68.4 µs ± 1.62 µs
%timeit norm.pdf(np.zeros(10**4)) # 415 µs ± 12.4 µs
Using vectorized computation will not only make your program faster but often also shorter/easier to understand, for example:
def calculate_dist_vec(x1, x2, steps, slope):
    x = np.linspace(x1, x2, steps+2)
    y = norm.pdf(x, x1+((x2-x1)/2), slope)
    ys = y/np.sum(y)
    return x, ys
Using this vectorized version gives you a speed-up of around 10x.
The problem: norm.pdf is optimized for long vectors (nobody really cares how fast or slow it is for 10 elements as long as it is very fast for one million elements), but your test is biased against numpy because it uses/creates only short arrays, so norm.pdf cannot shine.
So if it is really about small arrays and you are serious about speeding it up, you will have to roll your own version of norm.pdf. Using Cython to create this fast, specialized function might be worth a try.
I have this really newbie question :)
Apart from the fact that the shorter versions of
$lastInvoiceNumber
$lastInvNum
or, as table columns:
last_invoice_number (int 10)
last_inv_num (int 10)
save a bit of time to write, do they have any benefits (even the slightest) performance-wise?
Long vs. short?
Is there any chance that PHP, and more importantly MySQL, will consume
less memory if the query uses a shorter table column name?
For example, if I have to fetch 500 rows in a single query, I imagine
the query would run 500 times, and running
last_invoice_number 500 times
vs. running
last_inv_num could save some memory or make things slightly faster.
Thanks.
No, there is really no noticeable difference in performance whatsoever, and you'll gain a huge improvement in readability by using descriptive variable names. Internally, these variables are referred to by memory addresses (to put it simply), not by their ASCII/Unicode names. The impact it may have on performance, in nearly any language, is so infinitesimal that it would never be noticed.
Edit:
I've added a benchmark. It shows that there is really no difference at all between using a single letter as a variable name and using a 17-character variable name. The single letter might even be a tiny bit slower. However, I do notice a slight consistent increase in time when using a 90-character variable name, but again, the difference is too small to ever notice for practical purposes. Here's the benchmark and output:
<?php
# To prevent any startup costs from skewing results of the first test.
$start = microtime(true);
for ($i = 0; $i < 1000; $i++)
{
    $noop = null;
}
$end = microtime(true);

# Let's benchmark!
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++)
{
    $thisIsAReallyLongAndReallyDescriptiveVariableNameInFactItIsJustWayTooLongHonestlyWtf = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a long name took %f seconds.\n", ($end - $start));

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++)
{
    $thisIsABitTooLong = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a medium name took %f seconds.\n", ($end - $start));

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++)
{
    $t = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a short name took %f seconds.\n", ($end - $start));
Output:
$ php so-test.php
Using a long name took 0.148200 seconds.
Using a medium name took 0.142286 seconds.
Using a short name took 0.145952 seconds.
The same should be true for MySQL as well; I would almost guarantee it, but it's not as easy to benchmark. With MySQL, you will have far more overhead from the network and IO than anything to do with symbol naming in the code. Just as with PHP, internally, column names aren't just strings that are iterated over; data is stored in memory-efficient formats.
I'm trying to retrieve the last EMA of a large dataset (15,000+ values). It is a very resource-hungry algorithm since each value depends on the previous one. Here is my code:
$k = 2/($range+1);
for ($i; $i<$size_data; ++$i) {
    $lastEMA = $lastEMA + $k * ($data[$i]-$lastEMA);
}
What I already did:
Isolated $k so it is not computed 10,000+ times
Kept only the latest computed EMA, rather than keeping all of them in an array
Used for() instead of foreach()
The $data[] array doesn't have keys; it's a basic array
This allowed me to reduce execution time from 2000ms to about 500ms for 15,000 values!
What didn't work:
Using SplFixedArray(); this shaved only ~10ms when executing over 1,000,000 values
Using the PHP_Trader extension; this returns an array containing all the EMAs instead of just the latest, and it's slower
Writing the same algorithm in C# and running it over 2,000,000 values takes only 13ms! So obviously, using a compiled, lower-level language seems to help ;P
Where should I go from here? The code will ultimately run on Ubuntu, so which language should I choose? Will PHP be able to call and pass such a huge argument to the script?
Clearly, implementing this with an extension gives you a significant boost.
Additionally, the calculation itself can be improved, and that gain carries over to whichever language you choose.
It is easy to see that lastEMA can be calculated as follows:
$lastEMA = 0;
$k = 2/($range+1);
for ($i = 0; $i < $size_data; ++$i) {
    $lastEMA = (1-$k) * $lastEMA + $k * $data[$i];
}
This can be rewritten as follows in order to move as much as possible out of the loop:
$lastEMA = 0;
$k = 2/($range+1);
$k1m = 1 - $k;
for ($i = 0; $i < $size_data; ++$i) {
    $lastEMA = $k1m * $lastEMA + $data[$i];
}
$lastEMA = $lastEMA * $k;
To explain the extraction of $k: in the previous formulation it is as if all the original raw data were multiplied by $k, so you can simply multiply the end result by $k once instead.
Note that, rewritten in this way, you have 2 operations inside the loop instead of 3 (to be precise, inside the loop there are also the $i increment, the comparison of $i with $size_data, and the assignment to $lastEMA), so you can expect an additional speedup somewhere between 16% and 33%.
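If it helps to see it concretely, here is a tiny numeric check (my own illustrative snippet, not part of the original answer) that both formulations produce the same value:

$data  = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0];
$range = 5;
$k     = 2 / ($range + 1);

// Original recurrence: lastEMA = (1-k)*lastEMA + k*data[i]
$emaA = 0;
foreach ($data as $v) {
    $emaA = (1 - $k) * $emaA + $k * $v;
}

// Rewritten recurrence: drop $k inside the loop, multiply once at the end
$k1m  = 1 - $k;
$emaB = 0;
foreach ($data as $v) {
    $emaB = $k1m * $emaB + $v;
}
$emaB = $emaB * $k;

var_dump(abs($emaA - $emaB) < 1e-12); // bool(true)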
Further there are other improvements that can be considered at least in some circumstances:
Consider only the last values
The earliest values are multiplied by $k1m = 1 - $k many times, so their contribution may be tiny or even fall below the floating-point precision (or the acceptable error).
This idea is particularly helpful if you can assume that older data are of the same order of magnitude as the newer data, because if you consider only the last $n values the error you make is
$err = $EMA_of_discarded_data * (1-$k) ^ $n.
So, if the orders of magnitude are broadly the same, we can say that the relative error is
$rel_err = $err / $lastEMA = $EMA_of_discarded_data * (1-$k) ^ $n / $lastEMA
which is almost equal to simply (1-$k) ^ $n.
Under the assumption that $lastEMA is almost equal to $EMA_of_discarded_data:
let's say that you can accept a relative error $rel_err;
then you can safely consider only the last $n values where (1 - $k)^$n < $rel_err.
This means you can pre-calculate (before the loop) $n = log($rel_err) / log(1-$k) and run the computation over only the last $n values.
If the dataset is very big this can give a noticeable speedup.
Consider that for 64-bit floating point numbers the relative precision (of the mantissa) is 2^-53 (about 1.1e-16; it is only 2^-24 = 5.96e-8 for 32-bit floats), so you cannot obtain a relative error better than that,
which basically means you never gain anything by calculating more than $n = log(1.1e-16) / log(1-$k) values.
To give an example: if $range = 2000 then $n = log(1.1e-16) / log(1-2/2001) = 36'746.
It is interesting to know that any extra calculations would simply get lost in the rounding, so they are useless and better not done.
Now an example for the case where you can accept a relative error larger than the floating point precision: with $rel_err = 1ppm = 1e-6 = 0.0001% (6 significant decimal digits) you have $n = log(1e-6) / log(1-2/2001) = 13'815.
I think that is quite a small number compared to your dataset sizes, so in those cases the speedup could be evident (I'm assuming that $range = 2000 is meaningful or on the high side for your application, but that I cannot know).
Just a few more example numbers, since I do not know your typical figures (a quick sketch to reproduce them follows the list):
$rel_err = 1e-3; $range = 2000 => $n = 6'907
$rel_err = 1e-3; $range = 200 => $n = 691
$rel_err = 1e-3; $range = 20 => $n = 69
$rel_err = 1e-6; $range = 2000 => $n = 13'815
$rel_err = 1e-6; $range = 200 => $n = 1'381
$rel_err = 1e-6; $range = 20 => $n = 138
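Here is the quick sketch mentioned above (my own helper, not from the original post) to reproduce these figures:

function ema_samples_needed($rel_err, $range) {
    $k = 2 / ($range + 1);
    // number of most-recent samples needed for the requested relative error
    return (int)ceil(log($rel_err) / log(1 - $k));
}

echo ema_samples_needed(1e-3, 2000), "\n";    // ~6907
echo ema_samples_needed(1e-6, 2000), "\n";    // ~13815
echo ema_samples_needed(1.1e-16, 2000), "\n"; // ~36746, the 64-bit floating point limit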
If the assumption that $lastEMA is almost equal to $EMA_of_discarded_data cannot be made, things are less easy, but since the advantage can be significant it may still be worth going on:
we need to re-consider the full formula: $rel_err = $EMA_of_discarded_data * (1-$k) ^ $n / $lastEMA
so $n = log($rel_err * $lastEMA / $EMA_of_discarded_data) / log (1-$k) = (log($rel_err) + log($lastEMA / $EMA_of_discarded_data)) / log (1-$k)
the central point is to estimate $lastEMA / $EMA_of_discarded_data (without actually calculating $lastEMA or $EMA_of_discarded_data, of course)
one case is when we know a priori that, for example, $EMA_of_discarded_data / $lastEMA < M (for example M = 1000 or M = 1e6)
in that case $n < log($rel_err / M) / log (1-$k)
if you cannot give any such M number,
you have to find a good way to over-estimate $EMA_of_discarded_data / $lastEMA;
one quick way could be to take M = max(data) / min(data).
Parallelization
The calculation can be re-written in a form where it is a simple addition of independent terms:
$lastEMA = 0;
$k = 2/($range+1);
$k1m = 1 - $k;
for ($i = 0; $i < $size_data; ++$i) {
    // pow() rather than ^, which is bitwise XOR in PHP
    $lastEMA += pow($k1m, $size_data - 1 - $i) * $data[$i];
}
$lastEMA = $lastEMA * $k;
So if the implementing language supports parallelization, the dataset can be divided into 4 (or 8, or n, basically the number of available CPU cores) chunks, the sum of terms can be computed on each chunk in parallel, and the individual results summed up at the end.
I won't go into more detail since this reply is already terribly long and I think the concept is clear; a small sequential sketch of the decomposition follows below.
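For completeness, here is that sketch (my own illustration, computed sequentially; real parallelism would need worker processes or an extension, and it assumes $data and $range are already defined):

// Each chunk's partial sum is independent of the others, so the chunks could
// be handed to separate workers; here they are simply computed one after another.
function ema_chunk(array $data, $k1m, $n, $from, $to) {
    $partial = 0;
    for ($i = $from; $i < $to; $i++) {
        $partial += pow($k1m, $n - 1 - $i) * $data[$i];
    }
    return $partial;
}

$k      = 2 / ($range + 1);
$k1m    = 1 - $k;
$n      = count($data);
$chunks = 4;                      // e.g. one chunk per CPU core
$size   = (int)ceil($n / $chunks);

$total = 0;
for ($c = 0; $c < $chunks; $c++) {
    $from   = $c * $size;
    $to     = min($from + $size, $n);
    $total += ema_chunk($data, $k1m, $n, $from, $to);
}
$lastEMA = $k * $total;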
Building your own extension definitely improves performance. Here's a good tutorial from the Zend website.
Some performance figures (hardware: Ubuntu 14.04, PHP 5.5.9, single-core Intel CPU @ 3.3 GHz, 128 MB RAM; it's a VPS):
Before (PHP only, 16,000 values): 500 ms
C extension (16,000 values): 0.3 ms
C extension (100,000 values): 3.7 ms
C extension (500,000 values): 28.0 ms
But I'm memory limited at this point, using 70MB. I will fix that and update the numbers accordingly.
It looks like PHP requires about 213 bytes to store one integer; is that true?
Okay, please take a look at the following code:
$N = 10000;
echo memory_get_usage()."\n";
$v = array();
for ($i = 0; $i < $N; $i++) {
    $v[] = $i;
}
echo memory_get_usage()."\n";
unset($v);
echo memory_get_usage()."\n";
The output is:
641784
2773768
642056
So the difference is 2773768 - 641784 = 2131984 bytes, or about 213 bytes per integer.
Why so much? 4 bytes would be more than enough.
4 bytes is only enough if you simply store an integer value somewhere in memory, without making any allowance for the fact that it is a variable, which needs a datatype identification, flags to indicate whether there are any other references to it, its name, etc., all of which require additional memory.
PHP stores the value in a zval, so on top of the actual value there are all the additional bytes used to store the zval details.
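As an aside (my own illustration, not part of the original answer), you can see how much of the cost is array bookkeeping rather than the integer itself by comparing a plain array with SplFixedArray, which skips most of the hash-table overhead; the exact figures vary by PHP version:

$N = 10000;

$before = memory_get_usage();
$v = array();
for ($i = 0; $i < $N; $i++) {
    $v[] = $i;
}
printf("array:         ~%d bytes per integer\n", (memory_get_usage() - $before) / $N);
unset($v);

$before = memory_get_usage();
$f = new SplFixedArray($N);
for ($i = 0; $i < $N; $i++) {
    $f[$i] = $i;
}
printf("SplFixedArray: ~%d bytes per integer\n", (memory_get_usage() - $before) / $N);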
I want to solve a problem from Project Euler (BTW, problem 25), and I found a solution in Python:
fibonacci = 1
old1 = 0
old2 = 1
limit = 1000

i = 1
while len(str(fibonacci)) < limit:
    fibonacci = old1 + old2
    old1 = old2
    old2 = fibonacci
    i = i + 1
print(i)
It took 1.5 seconds to calculate.
I implemented the same in PHP, this is the code:
$fibonacci = 1;
$old1 = 0;
$old2 = 1;
$limit = 1000;

$i = 1;
while (strlen((string)$fibonacci) < $limit) {
    $fibonacci = $old1 + $old2;
    $old1 = $old2;
    $old2 = $fibonacci;
    $i = $i + 1;
}
print($i);
And it took more than 30 minutes and was still calculating...
I know that Python is considered faster than PHP, but the difference still shouldn't be that big. How can I improve my PHP code to get the results faster, if there is a way to do it?
EDIT:
I'm editing this post based on the comments below; first of all, my original solution was never going to work.
One option is to replace the old while condition with this one:
while (strlen(number_format($fibonacci, 0, '', '')) < $limit){ ... }
But again, this has a big speed issue.
So the final solution is using BCMath:
$fibonacci = '1';
$old1 = '0';
$old2 = '1';
$limit = 1000;

$i = 1;
while (strlen($fibonacci) < $limit) {
    $fibonacci = bcadd($old1, $old2);
    $old1 = $old2;
    $old2 = $fibonacci;
    $i = $i + 1;
}
echo $fibonacci . "<br />";
print($i);
So you can get the results at the same speed as Python in PHP.
The PHP version is definitely going into an infinite loop. There's no way it could take that long unless something was wrong...
I don't think counting the digits of these numbers with strlen is going to work in PHP. PHP is dealing with the numbers in scientific notation, in lower precision than Python.
I added debugging echo statements to PHP, to print out $fibonacci and $i for each step.
A typical Python line looks like
fib is 7540113804746346429
i is 92
In PHP, that's
fib is 7.54011380475E+18
i is 92
To accomplish this in PHP, you'll probably need to use a higher precision math library.
Check out http://www.php.net/manual/en/book.bc.php - you can use the bcadd function to accomplish the addition, and it will work as it does in Python.
It isn't a speed issue; it's a logic problem in the while termination condition.
It's probably never going to finish. When you cast the current value of $fibonacci to a string in your while test, it is rendered in scientific notation and truncated to a limited number of significant digits (dependent on your precision setting). That number of digits will be far less than 1000, so the while termination condition will never be met.
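A quick way to see this for yourself (my own snippet; the exact string depends on your precision ini setting):

$big = PHP_INT_MAX + 1;          // overflows from int to float
var_dump($big);                  // e.g. float(9.2233720368548E+18)
var_dump(strlen((string)$big));  // about 20 characters in scientific notation; it stays
                                 // that short as the value grows, so strlen() never reaches 1000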
The problem is that you are working with big numbers. You should use the BC Math functions (php.net/bc). Your code can be:
$fibonacci = "1";
$old1 = "0";
$old2 = "1";
$limit = 1000;
$i = 1;
while (strlen($fibonacci) < $limit){
$fibonacci = bcadd($old1, $old2);
$old1 = $old2;
$old2 = $fibonacci;
$i = $i + 1;
}
print($i);
I have tried it and it takes about 0.095s.
Many Project Euler problems have to handle big numbers.
PHP will make your big numbers look like 2.579234678963E+12, which is the exponential representation of the number... It's obviously hard to work with.
So, for most problems, it's best to go with the BCMath functions. These will keep your number as it is, even if it is a giant number.
Note that using echo bcmul(500, 500); will never be as fast as echo 500*500, and BCMath function return values are always strings.
To fix your problem, replace all arithmetic operations with the corresponding BCMath function.
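To illustrate the mapping (my own examples; all BCMath arguments and return values are arbitrary-precision number strings, and the optional last parameter sets the decimal scale):

$a = '123456789012345678901234567890';
$b = '987654321098765432109876543210';

echo bcadd($a, $b), "\n";     // $a + $b
echo bcsub($b, $a), "\n";     // $b - $a
echo bcmul($a, $b), "\n";     // $a * $b
echo bcdiv($b, $a, 10), "\n"; // $b / $a, with 10 decimal places
echo bcpow($a, '2'), "\n";    // $a squared
echo bccomp($a, $b), "\n";    // comparison: returns -1, 0 or 1 instead of using <, ==, >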
I optimized the Python code a bit. Using len(str()) to check the number of digits is very slow; replacing it with math.log10 makes your program run much faster.
The first term in the Fibonacci sequence to contain 1000 digits is : 4782
Calculated in 0.008573 seconds
import time
from math import log10

def digits(n):  # Return the number of digits for n >= 1
    return int(log10(n)) + 1

fibonacci = 1L  # Thanks to Python for handling very big numbers
old1 = 0
old2 = 1
limit = 1000
i = 1

start = time.time()  # Start timer for the benchmark
while digits(fibonacci) < limit:
    fibonacci = old1 + old2
    old1 = old2
    old2 = fibonacci
    i += 1

print "The first term in the Fibonacci sequence to contain %s digits is : %s" % (str(limit), str(i))
print "Calculated in %3.6f seconds" % (time.time() - start)
Since the problem seems to be with converting to strings, here's a much faster way to do it that doesn't require it. This is essentially the same algorithm as you have posted (so I don't feel bad showing it to you) but demonstrates how to use division to test the length of an integer instead of converting it to a string.
def fibonacci_digits(limit):
    limit = 10**limit
    fib = 1
    old1 = 0
    old2 = 1
    i = 1

    size = 1
    while size < limit:
        fib = old1 + old2
        if not size//fib:  # // is Python's integer division operator, not a comment
            size *= 10
        old1 = old2
        old2 = fib
        i += 1

    return i

print fibonacci_digits(1000)
Converting to a string is slow and is almost never the right thing to do. Here are the timeit results:
$ python -mtimeit -s'import fib' 'fib.fibonacci_digits(1000)'
10 loops, best of 3: 30.2 msec per loop
$ python -mtimeit -s'import fib' 'fib.fibonacci_digits2(1000)'
10 loops, best of 3: 1.41 sec per loop