Best way to count coincidences in an Array in PHP

I have an array with thousands of rows, and I want to know the best way (or best practice) in PHP to count the rows that match a condition.
In the example below, I count the records whose score falls within a range.
I'm considering these two options:
Option 1:
$rowCount = 0;
foreach ($this->data as $row) {
    if ($row['score'] >= $rangeStart && $row['score'] <= $rangeEnd) {
        $rowCount++;
    }
}
return $rowCount;
Option 2:
$countScoreRange = array_filter($this->data, function($result) {
    return $result['score'] >= $this->rangeStart && $result['score'] <= $this->rangeEnd;
});
return count($countScoreRange);
Thanks in advance.

It depends on what you mean by "best practice".
If you mean performance, there is one trade-off you need to keep in mind:
**speed <===> memory**
If memory usage is what matters most:
When you iterate over a large data set in PHP, you can use yield to write a generator function. From the PHP documentation:
What is a generator function?
A generator function looks just like a normal function, except that instead of returning a value, a generator yields as many values as it needs to. Any function containing yield is a generator function.
Why use a generator function?
A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate.
So it is better to hold one row at a time in memory than to build an array of thousands of rows. This only helps if the rows can be produced one at a time (for example, read from a file or a database cursor); if the array already exists, wrapping it in a generator saves nothing.
The drawback:
Unfortunately, PHP does not allow you to use the array functions, including array_filter, array_map, etc., directly on generators.
So, to summarize, if you:
are iterating over an array
of thousands of elements,
especially inside a function that is called from many places,
and you care about performance, specifically memory usage,
then a generator function is strongly recommended instead.
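As a rough illustration of the generator approach, here is a minimal sketch. The names generateRows() and countInRange() and the made-up score data are assumptions for the example, not part of the question; in real code the rows would more likely come from a file or a database cursor.

```php
<?php
// Hypothetical generator: yields one row at a time instead of building an array.
function generateRows(int $n): Generator
{
    for ($i = 0; $i < $n; $i++) {
        // Made-up data: scores cycle through 0..99.
        yield ['id' => $i, 'score' => $i % 100];
    }
}

// Counts rows whose score lies in [$rangeStart, $rangeEnd].
// Accepts any iterable, so it works for arrays and generators alike.
function countInRange(iterable $rows, int $rangeStart, int $rangeEnd): int
{
    $count = 0;
    foreach ($rows as $row) {
        if ($row['score'] >= $rangeStart && $row['score'] <= $rangeEnd) {
            $count++;
        }
    }
    return $count;
}

$matches = countInRange(generateRows(1000), 10, 19);
echo $matches, PHP_EOL; // prints 100
```

Because the loop only ever holds one row, memory use stays flat no matter how many rows are generated. The price is that array_filter() cannot be applied to the generator directly.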
**If speed is what matters most:**
The comparison involves four things:
for
foreach
array_* functions
array pointer functions (such as next(), reset(), etc.)
foreach can be slower than a for loop. By value, foreach works on a copy of the array (PHP uses copy-on-write, so the data is only duplicated if the array is modified during the loop).
But there is a trick if you still want to use it:
Iterating by reference avoids that copy and can improve performance, and foreach remains easy to use.
What about foreach versus array_filter?
As the other answer says, and as two articles on the subject also report:
It is incorrect to assume that array_* functions are faster!
Of course, if you work on a critical system you should consider this advice. Array functions are a little bit slower than basic loops, but I think that there is no significant impact on performance.

Using foreach is much faster, and incrementing your own counter is also faster than calling count() on a filtered array.
I once benchmarked both, and foreach was about 3x faster than array_filter.
I'd go with the first option.
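If you want to check this on your own data, a small timing harness along these lines may help. It is only a sketch: the function names, the 100,000 made-up rows, and the range 25..74 are assumptions, and the timings will vary with PHP version and hardware.

```php
<?php
// Option 1: plain loop with a counter.
function countWithForeach(array $data, int $rangeStart, int $rangeEnd): int
{
    $rowCount = 0;
    foreach ($data as $row) {
        if ($row['score'] >= $rangeStart && $row['score'] <= $rangeEnd) {
            $rowCount++;
        }
    }
    return $rowCount;
}

// Option 2: array_filter builds a filtered array, then count() measures it.
function countWithFilter(array $data, int $rangeStart, int $rangeEnd): int
{
    $matches = array_filter($data, function ($row) use ($rangeStart, $rangeEnd) {
        return $row['score'] >= $rangeStart && $row['score'] <= $rangeEnd;
    });
    return count($matches);
}

// Made-up test data: 100,000 rows with scores cycling through 0..99.
$data = [];
for ($i = 0; $i < 100000; $i++) {
    $data[] = ['score' => $i % 100];
}

$start = microtime(true);
$a = countWithForeach($data, 25, 74);
$foreachSeconds = microtime(true) - $start;

$start = microtime(true);
$b = countWithFilter($data, 25, 74);
$filterSeconds = microtime(true) - $start;

printf("foreach:      %d matches in %.4f s\n", $a, $foreachSeconds);
printf("array_filter: %d matches in %.4f s\n", $b, $filterSeconds);
```

Both functions return the same count; only the elapsed time differs. A single run is noisy, so repeat the measurement a few times before drawing conclusions.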

Related

Optimize laravel project with extracting methods

I am working on an ongoing Laravel project that uses Laravel 5.1. One process takes over 8 seconds, and within it there is a method with almost 250 lines of code. From an optimization perspective, if I extract that code into several methods, will it run faster, or will it take longer?

Breaking a big function into multiple small functions will not make it faster; if anything, it will be slightly slower because of the extra function calls.
I would recommend changing your approach: use queues for asynchronous processing, etc.

With PHP there are usually a few things you can do to speed up code:
Use references where possible.
By default, arguments are passed by value. PHP uses copy-on-write, so a large array or string is only physically copied if the function writes to it; when that happens in a function that is called frequently, it can be very expensive. So instead of doing things like:
function f($array) {
Use:
function f(&$array) {
This implies any modification to $array will affect the original array too, so it should be used only in situations where either $array isn't modified or you don't care that it is modified.
The same is good for foreach loops. Just make sure to unset() the variable after the loop is done to avoid problems.
foreach ($set as &$element) { ... }
unset($element);
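To see why the unset() matters, here is a small sketch. The two function names are hypothetical; each one doubles the values by reference and then runs a second, by-value loop that reuses the same variable name.

```php
<?php
// Doubles each value by reference but forgets to unset the reference.
function doubleWithoutUnset(array $set): array
{
    foreach ($set as &$element) {
        $element *= 2;
    }
    // $element still points at the last slot of $set here, so every
    // assignment made by the next loop overwrites that last slot.
    foreach ($set as $element) {
        // read-only loop, yet it changes $set through the stale reference
    }
    return $set;
}

// Same code, but the reference is broken right after the first loop.
function doubleWithUnset(array $set): array
{
    foreach ($set as &$element) {
        $element *= 2;
    }
    unset($element); // break the reference to the last slot
    foreach ($set as $element) {
        // now a plain local variable; $set is left alone
    }
    return $set;
}

print_r(doubleWithoutUnset([1, 2, 3])); // [2, 4, 4]
print_r(doubleWithUnset([1, 2, 3]));    // [2, 4, 6]
```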
Another trick is to allocate buffers in advance instead of letting them grow automatically. For example, instead of:
$numbers = range(1, 10000000);
$inverted = [];
foreach ($numbers as $num) { $inverted[] = 1 / $num; }
You make sure $inverted already has the size required to fit all the elements:
$numbers = range(1, 10000000);
$inverted = array_fill(0, count($numbers), 0);
foreach ($numbers as $i => $num) { $inverted[$i] = 1 / $num; }
Likewise, avoid resizing strings and arrays whenever possible. Work with indexes, etc.
Regular expressions, while convenient, are the enemy of performance. Avoid using them for simple tasks.
All of that, of course, assumes the problem is in the PHP code. Maybe it isn't. If you are using a database, you might want to reassess how you perform your SQL queries. The way you structure them can impact performance severely.

How can I quickly delete a value of less than two characters from a large array?

I want to delete values shorter than two characters from a large array that holds 9436065 string values. I did it with preg_grep() using this code:
function delLess($array, $less)
{
    return preg_grep('~\A[^qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM]{'.$less.',}\z~u', $array);
}
$words = array("ӯ","ӯро","ӯт","ғариб","афтода","даст", "ра");
echo "<pre>";
print_r(delLess($words,2));
echo "</pre>";
But it is slow. Is it possible to optimize this code?

I would go for the array_filter function; performance should be better.
function filter($var)
{
    return strlen($var) > 2;
}
$newArray = array_filter($array, "filter");

Given the size of the dataset, I'd use a database, so it would probably look like this:
delete from table where length(field) <= 2
Maybe something like SQLite?

You could try using the strlen function instead of regular expressions and see if that is faster. (Or mb_strlen for multibyte characters.)
$newArr = array();
foreach ($words as $val) {
    if (strlen($val) > 2) {
        $newArr[] = $val;
    }
}
echo "<pre>";
print_r($newArr);
echo "</pre>";

Any work on 10 million strings will take time. In my opinion, this kind of operation is a one-off, so it does not really matter if it is not instantaneous.
Where are the strings coming from? You most likely got them from a database; if so, do the work in the database. It will be faster, and at least the unwanted entries will never reach your PHP code. This kind of operation will be faster in a database than in PHP, but could still take time.
Again, if it is stored in a database, it has not got there magically... So you could also make sure that no new unwanted entry gets in it, that way you make sure this operation will not need to be redone.
I am aware that this absolutely does not answer your question at all, because we should stick to PHP and you got the best way to do it... Optimizing such a simple function would cost a lot of time and wouldn't bring much if any optimization... The only other suggestion I could make is use another tool, if not database-based, file-based like sed, awk or anything that reads/writes to files... You'd have one string per line and parse the file reducing its size accordingly, but writing the file from PHP, exec the script and load the file back in PHP would make things too complicated for nothing...

mysqli_fetch_array() or mysqli_fetch_all()?

I see comparisons between mysqli_fetch_array() and mysqli_fetch_all() that say that with mysqli_fetch_all() it will take more memory and I will have to iterate over the array.
mysqli_fetch_all() is one call to mysqli but mysqli_fetch_array() is one call per row, for example 100 calls.
I don't know how the mysqli processing works: is calling mysqli_fetch_array() really more efficient when you also take the number of calls into account?
(I already understand that the returned data can be associative arrays or not)

It has nothing to do with efficiency. It's all about usability.
fetch_all() is what is called "syntactic sugar": a shorthand to automate a frequently performed operation. It can easily be implemented as a userland function:
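// Named my_fetch_all here because mysqli_fetch_all() already exists as a built-in.
function my_fetch_all($resource, $mode = MYSQLI_BOTH)
{
    $ret = [];
    // Collect every remaining row of the result set into one array.
    while ($row = $resource->fetch_array($mode))
    {
        $ret[] = $row;
    }
    return $ret;
}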
From this you can tell the use cases for these functions:
fetch_all() should be used if you need an array consisting of all the returned rows, to be used elsewhere.
fetch_assoc() in a loop should be used if you are going to process the rows one by one, right in place.
As simple as that.
These functions serve different purposes, so there is no point in comparing them.
Note that PDO is tenfold sweeter in terms of syntactic sugar, as its fetchAll() function can return data in dozens of different formats.

From PHP's page on mysqli_fetch_all():
I tested using fetch_all versus while / fetch_array and:
fetch_all uses less memory (but not by much).
In my case (test1 and test2): 147008 and 262848 bytes (fetch_all) versus 147112 and 262888 bytes (fetch_array & while).
So memory usage is about the same in both cases.
However, about the performance:
My test takes 350ms (worst case) using fetch_all, while it takes 464ms (worst case) using fetch_array, or about 35% worse using fetch_array and a while loop.
So, for normal code that returns a moderate amount of information, using fetch_all is:
a. cleaner (a single line of code)
b. uses less memory (about 0.01% less)
c. faster.
php 5.6 32bits, windows 8.1 64bits

for loop best practice

As far as I know, the second and third expressions are evaluated on every iteration of a for loop.
I always took for granted that, performance-wise, the second option is recommended. Can anyone confirm this?
1) for($i=0;$i<=dosomething();$i++) [...]
2)
$max = dosomething();
for($i=0;$i<=$max;$i++) [...]

You shouldn't call a function inside of a loop definition because that function will be executed every iteration. When you only have a small loop the effect is negligible, however if you have a loop of hundreds or thousands of iterations you'll definitely notice.
But even if you only have a small loop, it's just bad practice. So in a word: don't.
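A quick way to see the difference is to count the calls. In this sketch, CallCounter and the body of dosomething() are made up for the illustration; the function simply returns 3 and records each invocation.

```php
<?php
// Hypothetical helper that records how often dosomething() runs.
final class CallCounter
{
    public static $calls = 0;
}

// Stand-in for dosomething(): always returns 3.
function dosomething(): int
{
    CallCounter::$calls++;
    return 3;
}

// 1) Function call in the loop condition: evaluated before every iteration.
CallCounter::$calls = 0;
for ($i = 0; $i <= dosomething(); $i++) {
    // loop body
}
$callsInCondition = CallCounter::$calls; // 5: checked for $i = 0, 1, 2, 3, 4

// 2) Result stored before the loop: evaluated once.
CallCounter::$calls = 0;
$max = dosomething();
for ($i = 0; $i <= $max; $i++) {
    // loop body
}
$callsHoisted = CallCounter::$calls; // 1

echo "in condition: $callsInCondition, hoisted: $callsHoisted", PHP_EOL;
```

The first form calls the function once per condition check, including the final check that ends the loop.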

Unless dosomething() can return a different value on each call, compute it once and use the second method.
$options = array(1,2,3,4,5);
$element_count = count($options);
The result of a function like count(), which returns the same value on every call, can be stored in a variable that you then use in your for loop.
If you are very strict about performance, use ++$i instead of $i++.

The second method is always going to perform better, especially if there is substantial work to be done in doSomething(). If you are only doing tens of loops, and doSomething() is just returning a local variable, then it won't make a noticeable difference.

Yes, I can confirm. You can search for benchmarks if you want.
Though I don't know whether it still holds if the function is only a getter on an object.

Is it better call a function every time or store that value in a new variable?

I often use the function sizeof($var) in my web application, and I'd like to know whether it is better (in terms of resources) to store this value in a new variable and use that, or to call the function every time; or maybe it makes no difference :)

TLDR: it's better to set a variable, calling sizeof() only once. (IMO)
I ran some tests on the looping aspect of this small array:
$myArray = array("bill", "dave", "alex", "tom", "fred", "smith", "etc", "etc", "etc");
// A)
for($i=0; $i<10000; $i++) {
    echo sizeof($myArray);
}
// B)
$sizeof = sizeof($myArray);
for($i=0; $i<10000; $i++) {
    echo $sizeof;
}
With an array of 9 items:
A) took 0.0085 seconds
B) took 0.0049 seconds
With an array of 180 items:
A) took 0.0078 seconds
B) took 0.0043 seconds
With an array of 3600 items:
A) took 0.5-0.6 seconds
B) took 0.35-0.5 seconds
Although there isn't much of a difference, you can see that the gap widens as the array grows. This has made me rethink my opinion: from now on, I'll be setting the variable before the loop.
Storing a PHP integer took 68 bytes of memory in my test (the exact figure depends on the PHP version and platform). That is a small enough amount that I'd rather worry about processing time than memory space.

In general, it is preferable to assign the result of a function you are likely to repeat to a variable.
In the example you suggested, the difference in processing code produced by this approach and the alternative (repeatedly calling the function) would be insignificant. However, where the function in question is more complex it would be better to avoid executing it repeatedly.
For example:
for($i=0; $i<10000; $i++) {
    echo date('Y-m-d');
}
Executes in 0.225273 seconds on my server, while:
$date = date('Y-m-d');
for($i=0; $i<10000; $i++) {
    echo $date;
}
executes in 0.134742 seconds. I know these snippets aren't quite equivalent, but you get the idea. Over many page loads by many users over many months or years, even a difference of this size can be significant. If we were to use some complex function, serious scalability issues could be introduced.
A main advantage of not assigning a return value to a variable is that you need one less line of code. In PHP, we can commonly do our assignment at the same time as invoking our function:
$sql = "SELECT...";
if(!$query = mysql_query($sql))...
...although this is sometimes discouraged for readability reasons.
In my view for the sake of consistency assigning return values to variables is broadly the better approach, even when performing simple functions.

If you are calling the function over and over, it is probably best to keep this info in a variable. That way the server doesn't have to keep processing the answer, it just looks it up. If the result is likely to change, however, it will be best to keep running the function.

Since you allocate a new variable, this will take a tiny bit more memory. But it might make your code a tiny bit faster.
The trouble it brings could be big. For example, if you include another file that applies the same trick, and both store the size in a variable called $sizeof, bad things might happen: strange bugs that appear when you don't expect them. Or you forget to add global $sizeof in your function.
You introduce so many possible bugs, and for what? Since the speed gain is likely not measurable, I don't think it's worth it.

Unless you are calling this function a million times your "performance boost" will be negligible.

I do not think that it really matters. In a sense, you do not want to perform the same thing over and over again, but considering that it is sizeof(), unless it is an enormous array you should be fine either way.

I think you should avoid constructs like:
for ($i = 0; $i < sizeof($array); $i += 1) {
    // do stuff
}
This is because sizeof will be executed on every iteration, even though its result is unlikely to change.
Whereas in constructs like this:
while(sizeof($array) > 0) {
    if ($someCondition) {
        $entry = array_pop($array);
    }
}
You often have no choice but to calculate it on every iteration.
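The two situations can be put side by side in a short sketch. The variable names and sample values are assumptions for the illustration only.

```php
<?php
$queue = ['a', 'b', 'c'];
$processed = [];

// The size changes on every pass, so it has to be re-checked each time.
// An empty array is falsy in PHP, so `while ($queue)` would also work.
while (count($queue) > 0) {
    $processed[] = array_pop($queue);
}
// $processed is now ['c', 'b', 'a'] and $queue is empty.

// Here the size is fixed during the loop, so it is read once up front.
$total = count($processed);
for ($i = 0; $i < $total; $i++) {
    echo $processed[$i], PHP_EOL;
}
```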
