PHP: Performance when looping huge datasets


I'd like to know the best way (in terms of both performance and best practice) to loop over huge datasets.
Let's say you have a query result with thousands of rows, each holding a lot of data.
You need to loop over all of them and check whether a row matches the date string from another loop. If it matches, add it to the date array, which will be the actual result set.
The same dataset has to be looped over many times to find all the matches.
Is it better to:
Unset the dataset array key when a match is found to shrink the array, resulting in an empty dataset at the very end?
Leave it unchanged and just unset it when all the looping is done?
Example code:
$dates = ['2022-01-01' => [], '2022-01-02' => []]; // Hundreds of dates
$dataset = ['manyThousandRowsOfData']; // Thousands of heavy rows
foreach ($dates as $date => $dateData) {
    foreach ($dataset as $key => $data) {
        if ($data['date'] !== $date) {
            continue;
        }
        $dates[$date] = $data;
        unset($dataset[$key]); // Unset? Or leave it be?
        break;
    }
}
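One alternative worth measuring against both options is to avoid the repeated inner loop altogether: index the dataset by date in a single pass, then look each date up directly. This is only a sketch under assumptions, not a benchmarked answer. The sample rows and the `value` field are made up for illustration, and it keeps the first matching row per date, as the `break` in the example does.

```php
<?php
// Hypothetical sample data standing in for the query result in the question.
$dates = ['2022-01-01' => [], '2022-01-02' => [], '2022-01-03' => []];
$dataset = [
    ['date' => '2022-01-02', 'value' => 'b'],
    ['date' => '2022-01-01', 'value' => 'a'],
    ['date' => '2022-01-02', 'value' => 'c'], // later row for the same date, ignored
];

// One pass over the dataset: remember the first row seen for each date.
$firstRowByDate = [];
foreach ($dataset as $row) {
    if (!isset($firstRowByDate[$row['date']])) {
        $firstRowByDate[$row['date']] = $row;
    }
}

// One pass over the dates: a hash lookup replaces the inner loop.
foreach ($dates as $date => $unused) {
    if (isset($firstRowByDate[$date])) {
        $dates[$date] = $firstRowByDate[$date];
    }
}
```

With this layout the total work grows with the number of rows plus the number of dates, rather than with their product, so the question of unsetting matched rows no longer arises.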

Related

Replace repeating value with zero in PHP string

The following is the code
<?php
$id ="202883-202882-202884-0";
$str = implode('-',array_unique(explode('-', $id)));
echo $str;
?>
The result is
202883-202882-202884-0
for $id ="202883-202882-202882-0";, result is 202883-202882-0
I would like to replace the duplicate value with zero instead of just removing it, so that the result is 202883-202882-0-0.
For $id ="202883-0-0-0";, the result should be 202883-0-0-0. Zero should not be replaced; repeating zeros are allowed.
How can I achieve that?
More info:
I want to replace every duplicate number, because this is for a product comparison website. There will be a maximum of 4 numbers. Each will be either a 6-digit number or a single-digit zero. All zeros means no product was selected. One 6-digit number and 3 zeros means one product selected and 3 blank.
Each 6-digit number will collect data from the database. I don't want to allow users to enter the same number multiple times (this will happen only if the number is added to the URL manually).
Update: I understand that my question was not clear; maybe my English is poor.
Here is more explanation. This function is for a smartphone comparison website.
The URL format is sitename.com/compare.html?id=202883-202882-202889-202888.
Each number is a different smartphone (its database product ID).
I don't want to let users type in the same product ID twice, like id=202883-202882-202882-202888. It will not display two 202882 results on the website, but it will cause some small issues. The URL will stay the same, but the internal PHP code should treat it as id=202883-202882-202888-0.
The duplicates should be replaced with zero and moved to the end.
There will be only 4 numbers separated by "-".
The following examples might clear things up:
if pid=202883-202882-202889-202888 the result should be 202883-202882-202889-202888
if pid=202883-202883-202883-202888 the result should be 202888-0-0-0
if pid=202883-202882-202883-202888 the result should be 202883-202882-202888-0
if pid=202882-202882-202882-202882 the result should be 202882-0-0-0
I want to allow only 6-digit numbers or a single-digit zero in the string.
if pid=rgfsdg-fgsdfr4354-202883-0 the result should be 202883-0-0-0
if pid=fasdfasd-asdfads-adsfds-dasfad the result should be 0-0-0-0
if pid=4354-45882-445202882-202882 the result should be 202882-0-0-0
This is too complicated for me to write myself. I know there are bright minds out there who can do it much more efficiently than I can.

You can call array_unique (which preserves keys), then fill the gaps with 0. Sort by key and you are done :)
The + operator on arrays merges them by key, and the left-hand array takes priority where keys collide.
Code
$input = "0-1-1-3-1-1-3-5-0";
$array = explode('-', $input);
$result = array_unique($array) + array_fill(0, count($array), 0);
ksort($result);
var_dump(implode('-',$result));
Code (v2 - suggested by mickmackusa) - shorter and easier to understand
Create an array of zeros the same size as the input array, then overwrite it with the values left over from array_unique. No ksort is needed, because the values land on the keys that array_unique preserved.
$input = "0-1-1-3-1-1-3-5-0";
$array = explode('-', $input);
$result = array_replace(array_fill(0, count($array), 0), array_unique($array));
var_export($result);
Output (from the var_dump in the first version)
string(17) "0-1-0-3-0-0-0-5-0"
References
ksort - sort by key
array_fill - generate an array of a given length filled with a value (here 0)

This is another way to do it.
$id = "202883-202882-202882-0-234567-2-2-45435";
First, explode the string into an array using the delimiter, which in this case is '-'.
$id_array = explode('-', $id);
Then we loop through the array and store every unique entry we find in another array, so we build up the result as we search.
$id_array_temp = [];
// Loop through the array
foreach ($id_array as $value) {
    if (in_array($value, $id_array_temp)) {
        // If the entry exists, replace it with a 0
        $id_array_temp[] = 0;
    } else {
        // If the entry does not exist, save the value so we can inspect it on the next loop.
        $id_array_temp[] = $value;
    }
}
At the end of this operation we will have an array of unique values with any duplicates replaced with a 0.
To recreate the string, we can use implode...
$str = implode('-', $id_array_temp);
echo $str;
Refactoring this to use a ternary in place of the if/else:
$id_array = explode('-', $id);
$id_array_temp = [];
foreach ($id_array as $value) {
    $id_array_temp[] = in_array($value, $id_array_temp) ? 0 : $value;
}
$str = implode('-', $id_array_temp);
echo $str;
Output is
202883-202882-0-0-234567-2-0-45435

This appears to be a classic XY Problem.
The essential actions only need to be:
Separate the substrings in the hyphen delimited string.
Validate that the characters in each substring are in the correct format AND are unique to the set.
Only take meaningful action on qualifying values.
You see, there is no benefit to replacing or sanitizing anything when you only really need to validate the input data. Adding zeros to your input just creates more work later.
In short, you should use a direct approach similar to this flow:
if (!empty($_GET['id'])) {
    $ids = array_unique(explode('-', $_GET['id']));
    foreach ($ids as $id) {
        if (ctype_digit($id) && strlen($id) === 6) {
        // or: if (preg_match('~^\d{6}$~', $id)) {
            takeYourNecessaryAction($id);
        }
    }
}
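If the padded four-slot string from the question is still wanted internally, the same validation can be extended to produce it. The following is a minimal sketch, not the asker's or any answerer's code: `normalizeCompareIds` is a hypothetical helper name, and it assumes the first occurrence of a duplicated ID is the one to keep.

```php
<?php
// Hypothetical helper: keep unique 6-digit IDs in order, pad with zeros to $slots entries.
function normalizeCompareIds(string $raw, int $slots = 4): string
{
    $valid = [];
    foreach (explode('-', $raw) as $part) {
        // Accept exactly six digits; zeros, short numbers, and text are dropped.
        if (preg_match('/^\d{6}\z/', $part) === 1) {
            $valid[$part] = true; // array keys de-duplicate while keeping first-seen order
        }
    }
    $ids = array_slice(array_keys($valid), 0, $slots);

    // Refill the freed slots with zeros at the end.
    return implode('-', array_pad($ids, $slots, 0));
}

echo normalizeCompareIds('202883-202882-202883-202888'), "\n";
echo normalizeCompareIds('rgfsdg-fgsdfr4354-202883-0'), "\n";
```

Using array keys for de-duplication avoids a separate array_unique call and a repeated in_array scan, which matters little for four values but keeps the function short.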

Compare two multidimensional arrays with different number of elements

I have two multidimensional arrays with different number of elements:
$complete = array(array("24","G:\TVShows\24"),array("Lost","G:\TVShows\Lost"),array("Game of Thrones","G:\TVShows\Game of Thrones"));
$subset = array(array("24","G:\TVShows\24","English"));
The first one ($complete) is the complete list of my tv shows on disk (name of the show, path to files). The second one ($subset) come from my database and include the spoken language as a third column / element.
I would like to return the shows that I have on disk but that do not appear in the database. How can I compare those two arrays when they have a different number of elements?
Thank you for your help!

Since it's a multi-level array, you could combine array_map() with serialize()/unserialize(). Consider this example:
// Single quotes keep the backslashes literal; in double quotes "\24" would be parsed as an octal escape.
$complete = array(
    array('24', 'G:\TVShows\24'),
    array('Lost', 'G:\TVShows\Lost'),
    array('Game of Thrones', 'G:\TVShows\Game of Thrones'),
    array('The Walking Dead', 'G:\TVShows\The Walking Dead'),
    array('Breaking Bad', 'G:\TVShows\Breaking Bad'),
    array('Heroes', 'G:\TVShows\Heroes'),
);
$subset = array(
    array('24', 'G:\TVShows\24', 'English'),
    array('The Walking Dead', 'G:\TVShows\The Walking Dead', 'English'),
    array('Heroes', 'G:\TVShows\Heroes', 'English'),
);
$shows_not_in_db = array();
// reduce each subset row to (name, path) so it is comparable with $complete
foreach ($subset as $key_s => $value_s) {
    array_pop($value_s); // remove the last element "English"
    $subset[$key_s] = serialize($value_s);
}
// serialize each row of $complete
$complete = array_map('serialize', $complete);
// diff the serialized rows, then unserialize what is left
$shows_not_in_db = array_map('unserialize', array_diff($complete, $subset));
print_r($shows_not_in_db);
Edit: For case insensitive comparisons, you may use this alternative:
$shows_not_in_db = array_map('unserialize', array_udiff($complete, $subset, 'strcasecmp'));
// sample: The walking dead - The Walking Dead
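If the show name alone is enough to identify a show, a lookup table keyed by name avoids serializing every row. This is a sketch under that assumption (it is not stated in the question), using shortened sample data. The variable names `$known` and `$showsNotInDb` are illustrative.

```php
<?php
$complete = [
    ['24', 'G:\TVShows\24'],
    ['Lost', 'G:\TVShows\Lost'],
    ['Heroes', 'G:\TVShows\Heroes'],
    ['Breaking Bad', 'G:\TVShows\Breaking Bad'],
];
$subset = [
    ['24', 'G:\TVShows\24', 'English'],
    ['Heroes', 'G:\TVShows\Heroes', 'English'],
];

// Names already in the database, flipped so that isset() can test membership.
$known = array_flip(array_column($subset, 0));

// Keep only the shows whose name is not in the lookup table.
$showsNotInDb = array_values(array_filter(
    $complete,
    function (array $show) use ($known) {
        return !isset($known[$show[0]]);
    }
));

print_r($showsNotInDb);
```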

Best to processing large arrays in PHP with my date wise case

I have a large array in PHP with around 168,000 keys and values. Each key holds a date (Y-m-d) and an hour, and each value is a number.
So the value is just numeric.
And the key is in Y-m-d_H format.
The array looks like this:
$input = array('2008-01-01_00' => 123, '2008-01-01_01' => 456, ...... , '2012-09-22_16' => 789);
I need to find the totals for last month, last year, the current year, the current month, and so on.
What is the best way to do this? Please suggest.

What about this:
$results = array();
foreach ($input as $k => $v) {
    $date = explode('_', $k);
    $date = explode('-', $date[0]);
    // Store year
    $key = $date[0];
    if (!isset($results[$key])) $results[$key] = 0;
    $results[$key] += $v;
    // Store month
    $key .= '-' . $date[1];
    if (!isset($results[$key])) $results[$key] = 0;
    $results[$key] += $v;
    // Store day
    $key .= '-' . $date[2];
    if (!isset($results[$key])) $results[$key] = 0;
    $results[$key] += $v;
}
print_r($results);
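When only a handful of periods are needed, the same single pass can target them directly by key prefix. The sketch below is one possible variation rather than part of the answer above: `periodTotals` is a hypothetical function name, and the reference date is passed in so the result does not depend on the day the script runs.

```php
<?php
// Hypothetical helper: totals for four periods relative to $now, computed in one pass.
function periodTotals(array $input, DateTimeImmutable $now): array
{
    $lastMonth = $now->modify('first day of last month');
    // Each period is identified by the prefix its keys start with.
    $wanted = [
        'current_year'  => $now->format('Y'),
        'last_year'     => (string) ((int) $now->format('Y') - 1),
        'current_month' => $now->format('Y-m'),
        'last_month'    => $lastMonth->format('Y-m'),
    ];
    $totals = array_fill_keys(array_keys($wanted), 0);

    foreach ($input as $key => $value) {
        foreach ($wanted as $name => $prefix) {
            // Keys look like "2012-09-22_16", so a prefix match selects the period.
            if (strncmp((string) $key, $prefix, strlen($prefix)) === 0) {
                $totals[$name] += $value;
            }
        }
    }
    return $totals;
}

$input = [
    '2011-05-01_00' => 1,
    '2012-01-01_00' => 4,
    '2012-08-31_23' => 2,
    '2012-09-22_16' => 3,
];
$totals = periodTotals($input, new DateTimeImmutable('2012-09-22'));
print_r($totals);
```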

Change the keys to ints, e.g.
'2008-01-01_00' -> 2008010100
Sort your array with ksort().
Then you can use a binary search to find the entries between, say, 2008010100 and 2009010100.
That said, the fastest approach is to traverse the array once and calculate all the statistics you need.
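The integer-key idea can be sketched as follows, reading the suggested search as a binary search over the sorted keys. The function names `prepareByIntKey`, `lowerBound`, and `sumBetween` are hypothetical, the range is treated as half-open (from inclusive, to exclusive), and 64-bit integers are assumed.

```php
<?php
// Convert "Y-m-d_H" keys to ints such as 2008010100 and sort once.
function prepareByIntKey(array $input): array
{
    $byInt = [];
    foreach ($input as $key => $value) {
        $byInt[(int) str_replace(['-', '_'], '', $key)] = $value;
    }
    ksort($byInt);
    return [array_keys($byInt), array_values($byInt)];
}

// Binary search: index of the first key that is >= $target.
function lowerBound(array $sortedKeys, int $target): int
{
    $lo = 0;
    $hi = count($sortedKeys);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sortedKeys[$mid] < $target) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return $lo;
}

// Sum of all values whose key k satisfies $from <= k < $to.
function sumBetween(array $keys, array $values, int $from, int $to)
{
    $start = lowerBound($keys, $from);
    $end = lowerBound($keys, $to);
    return array_sum(array_slice($values, $start, $end - $start));
}

$input = [
    '2012-09-22_16' => 789,
    '2008-01-01_00' => 123,
    '2009-01-01_00' => 10,
    '2008-01-01_01' => 456,
];
list($keys, $values) = prepareByIntKey($input);
$total2008 = sumBetween($keys, $values, 2008010100, 2009010100);
echo $total2008, "\n";
```

The conversion and sort are done once; each later range query then needs only two binary searches plus the summation of the matching slice.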

Break it down into manageable execution units and thread them ...
http://pthreads.org/

Algorithm for unique elements in n number of arrays (or objects)

Programming language I am using: PHP
I have 30 results from the database, which can be arrays or objects.
What I want is an algorithm to find the common values.
Example:
$data[0] = array('effected_object_id'=>54,'data'=>array('some_data'));
$data[1] = array('effected_object_id'=>21,'data'=>array('some_data'));
$data[2] = array('effected_object_id'=>63,'data'=>array('some_data'));
$data[3] = array('effected_object_id'=>21,'data'=>array('some_data'));
$data[4] = array('effected_object_id'=>54,'data'=>array('some_data'));
$data[5] = array('effected_object_id'=>21,'data'=>array('some_data'));
...... 30 arrays
In the above example, the effected_object_id column has a few common elements, such as 21 (3 times) and 54 (2 times).
I want to get these common elements.
Sorry if this has already been asked and solved somewhere. I googled and racked my brain over it, but I can't find a fast solution. I don't want too many loops here.
Thanks in advance. :)

$groups = array();
foreach ($data as $row) {
    $groups[ $row['effected_object_id'] ][] = $row;
}

The easiest way is to just use an array as a hash table.
Basically, you could create an array formatted as:
array[k] = v
where k is the effected_object_id and v is an array of the entries that have effected_object_id = k.
The result would look like:
$vals = array(
    21 => array('data1', 'data2'),
);
And it can be built like this:
$indexedData = array();
foreach ($data as $d) {
    $indexedData[$d['effected_object_id']][] = $d['data'];
}
Then, to find repeated values, you could just loop through and check count() > 1.
Depending on your exact needs you may want to structure this approach differently, but hopefully this conveys the basic idea.
It's worth noting that finding the duplicates this way is O(n). It takes at most 2n iterations: n iterations to index the entries, and then at most n more to loop through the indexed array and look for the duplicates.
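Put together, the index-then-count approach might look like the sketch below. The sample rows mirror the question; `$repeated` is an illustrative name, and storing the occurrence count per ID is an assumption about what the asker wants back.

```php
<?php
$data = [
    ['effected_object_id' => 54, 'data' => ['some_data']],
    ['effected_object_id' => 21, 'data' => ['some_data']],
    ['effected_object_id' => 63, 'data' => ['some_data']],
    ['effected_object_id' => 21, 'data' => ['some_data']],
    ['effected_object_id' => 54, 'data' => ['some_data']],
    ['effected_object_id' => 21, 'data' => ['some_data']],
];

// First pass: group the entries by effected_object_id.
$indexedData = [];
foreach ($data as $d) {
    $indexedData[$d['effected_object_id']][] = $d['data'];
}

// Second pass: keep only the IDs that occur more than once, with their counts.
$repeated = [];
foreach ($indexedData as $id => $entries) {
    if (count($entries) > 1) {
        $repeated[$id] = count($entries);
    }
}

print_r($repeated);
```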

Get single column from 2d array [duplicate]

This question already has answers here:
Is there a function to extract a 'column' from an array in PHP?
This is my array
Array
(
    [0] => Array
        (
            [sample_id] => 3
            [time] => 2010-05-30 21:11:47
        )
    [1] => Array
        (
            [sample_id] => 2
            [time] => 2010-05-30 21:11:47
        )
    [2] => Array
        (
            [sample_id] => 1
            [time] => 2010-05-30 21:11:47
        )
)
I want to get all the sample_ids in one array. Can someone please help?
Can this be done without for loops (because the arrays are very large)?

$ids = array_map(function($el){return $el["sample_id"];}, $array);
Or in earlier versions:
function get_sample_id($el){return $el["sample_id"];}
$ids = array_map('get_sample_id', $array);
However, this is probably not going to be faster.
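On PHP 5.5 and later, the built-in array_column() does this extraction in a single call. A minimal sketch with the data from the question (it still iterates internally, so it removes the visible loop rather than the work):

```php
<?php
$array = [
    ['sample_id' => 3, 'time' => '2010-05-30 21:11:47'],
    ['sample_id' => 2, 'time' => '2010-05-30 21:11:47'],
    ['sample_id' => 1, 'time' => '2010-05-30 21:11:47'],
];

// Pull the sample_id value out of every row, in order.
$ids = array_column($array, 'sample_id');

print_r($ids);
```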

This is a problem I've had MANY times. There isn't an easy way to flatten arrays in PHP. You'll have to loop over them, adding the values to another array. Failing that, rethink how you're working with the data so that you use the original structure and don't need to flatten it.
EDIT: I thought I'd add some measurements. I created an array $data = array(array('key' => value, 'value' => other_value), ...); with 150,000 elements. I then ran the 3 typical ways of flattening.
$start = microtime(true); // true returns a float, so $end - $start gives the elapsed time
$values = array_map(function($ele){return $ele['key'];}, $data);
$end = microtime(true);
This produced a run time of 0.304405. Running it 5 times averaged just below 0.30.
$start = microtime(true);
$values = array();
foreach ($data as $value) {
    $values[] = $value['key'];
}
$end = microtime(true);
This produced a run time of 0.167301, with an average of 0.165.
$start = microtime(true);
$values = array();
for ($i = 0; $i < count($data); $i++) {
    $values[] = $data[$i]['key'];
}
$end = microtime(true);
This produced a run time of 0.353524, with an average of 0.355.
In every case, using a foreach on the data array was significantly faster. For array_map(), this is likely due to the overhead of executing a function for each element in the array.
Further Edit: I also ran this test with a predefined function. Below are the average numbers over 10 iterations for 'On the fly' (defined inline) and 'Pre Defined' (string lookup).
Averages:
On the fly: 0.29714539051056
Pre Defined: 0.31916437149048

No array manipulation can be done without a loop.
If you can't see a loop, it doesn't mean it's absent.
I can make a function
function array_summ($array) {
    $ret = 0;
    foreach ($array as $value) $ret += $value;
    return $ret;
}
and then call it as array_summ($arr) without any visible loops. But don't be fooled by this. There is a loop. Every PHP array function iterates over the array as well. You just don't see it.
So the real solution you should look for is not a magic function but a way to reduce these arrays.
Where does the data come from? From a database, most likely.
Consider making the database do all the calculations.
It will save you much more time than any PHP built-in function.

I don't think you'll have any luck doing this without loops. If you really don't want to iterate the whole structure, I'd consider looking for a way to alter your circumstances...
Can you generate the sample_id data structure at the same time the larger array is created?
Do you really need an array of sample_id entries, or is that just a means to an end? Maybe there's a way to wrap the data in a class that uses a cache and a cursor to keep from iterating the whole thing when you only need certain pieces?
