Fastest way to combine thousands of arrays uniquely in PHP

I am getting data from a database. Each result looks something like this
ASDF-1234-JKL-F1-STUFF
There are 50,000 results. Each one is being exploded
$exploded = explode('-',$dash_delimited_datum);
// $exploded = array('ASDF','1234','JKL','F1','STUFF');
I tried this:
$data = array();
while($row = mysql_fetch_array($result) ){
$i++;
if($i > 99999) {
break;
}
$data = array_merge($data,explode('-',$row[0]));
}
But I hit the server timeout of 5 minutes with it.
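And this didn't work at all (the array_push call has an unbalanced parenthesis, and array_push returns the new element count rather than the array):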
while($row = mysql_fetch_array($result) ){
$i++;
if($i > 99999) {
break;
}
$data_parts = explode('-',$row[0]);
foreach($data_parts as $value) {
$data = array_push(($data,$value);
}
}
Unexpectedly, this worked, taking "only" 9 seconds, but I wonder if I can make it even faster:
while($row = mysql_fetch_array($result) ){
$i++;
if($i > 99999) {
break;
}
$data = array_unique(array_merge($data,explode('-',$row[0])));
}
EDIT: I came up with a solution that I thought would be the best one, at 800 ms.
Note that I used a closure (the anonymous function) to remove numeric values, which would otherwise become numeric keys after array_flip, and I assumed it was a drag on speed. But actually, removing it caused the script to time out at 30 s.
$data=array();
while($row = mysql_fetch_array($result) ){
$i++;
if($i > 99999) {
break;
}
$data_parts = array_flip(array_filter(explode('-',$row[0]),
function($value) {
if(is_numeric($value)) {
return false;
} else return true;
}));
$data = array_merge($data,$data_parts);
}
$data = array_keys($data);
sort($data);
Conclusions:
Every fast answer used tricks involving the array keys rather than the values. The difference between my best answer and the two very fast answers below seems to be their use of foreach inside the while loop to assign values directly to the main $data array. PHP function calls are supposedly expensive, and this example seems to show that they really are. Both of the best answers gave me results in under 300 milliseconds. My best answer was only fast when I filtered out numeric values; otherwise it ran into the 30-second server timeout.
So, I guess if you are processing massive amounts of data, use constructs and not functions whenever you can.
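A likely reason the array_flip/array_merge version was only fast with numeric values filtered out: PHP casts numeric string keys such as '1234' to integers, and array_merge renumbers integer keys and appends them instead of overwriting. A minimal sketch with made-up tokens (no database involved; the variable names are illustrative) compares that with direct key assignment:

```php
<?php
// Hypothetical tokens from one exploded row; '1234' is a numeric string.
$parts = array_flip(array('ASDF', '1234', 'JKL'));
// $parts is now array('ASDF' => 0, 1234 => 1, 'JKL' => 2): '1234' became an integer key.

// Merge the same row twice, as the while loop would for two identical rows.
$merged = array_merge(array(), $parts);
$merged = array_merge($merged, $parts);

// Direct key assignment leaves integer keys alone, so duplicates collapse.
$set = array();
foreach (array('ASDF', '1234', 'JKL', 'ASDF', '1234', 'JKL') as $token) {
    $set[$token] = 1;
}

print_r(array_keys($merged)); // string keys collapsed; integer key renumbered and appended
print_r(array_keys($set));    // one key per distinct token
```

In $merged the token 1234 no longer appears as a key, and the array gains one element per numeric token per row, which would be consistent with the timeout described above.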
Note about the (yes I know they're deprecated) mysql functions
One answer suggested that I use mysql_fetch_assoc instead of mysql_fetch_array. Actually, mysql_fetch_row is supposed to be "fastest", but the change made absolutely no difference in the speed of page loading with this data set (about 48,000 results). I also tried using mysql_result. The PHP docs say that it's slower when retrieving multiple rows, and it is a lot slower.
This took 6.27 seconds to load, versus about 0.27 seconds (270 milliseconds) for the similarly structured best answer.
$i=0;
while($data_parts = explode('-',mysql_result($result,$i,0)) ){
$i++;
if($i > 99999) {
break;
}
foreach($data_parts as $value) {
$data[$value] = 1;
}
}
$data = array_keys($data);

To speed up the process, avoid expensive array functions and use an associative array (hash) to ensure unique values; that should make the whole thing faster:
$i = 0;
$hash = array();
while($row = mysql_fetch_array($result)) {
$i++;
if($i > 99999) {
break;
}
foreach (explode('-', $row[0]) as $s) {
$hash[ $s ] = 1;
}
}
This way, all strings are stored uniquely as keys of an associative array (called $hash).
The resulting array ($data) is the list of keys of $hash:
$data = array_keys( $hash );
print_r( $data );
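The same idea can be tried without a database. The following is a minimal sketch in which a plain array of strings stands in for the query result; the function name uniqueTokens is hypothetical. It also casts the keys back to strings, because PHP turns numeric string keys such as '1234' into integers:

```php
<?php
/**
 * Collect the distinct dash-separated tokens from a list of strings.
 * $rows stands in for the database result; the function name is hypothetical.
 */
function uniqueTokens(array $rows)
{
    $hash = array();
    foreach ($rows as $datum) {
        foreach (explode('-', $datum) as $s) {
            $hash[$s] = 1; // assigning an existing key again does not add an element
        }
    }
    // Numeric strings such as '1234' become integer keys, so cast back to strings.
    return array_map('strval', array_keys($hash));
}

$data = uniqueTokens(array(
    'ASDF-1234-JKL-F1-STUFF',
    'ASDF-5678-JKL-F2-STUFF',
));
print_r($data);
```

Tokens come back in first-seen order; call sort($data) afterwards if a sorted list is needed.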

How about this (I removed your counter, but you can add it back in if it's necessary):
$data = array();
$i = 0;
while($row = mysql_fetch_array($result) )
{
$data_parts = explode('-',$row[0]);
foreach($data_parts as $value)
{
if (!isset($data[$value]))
$data[$value] = $i++;
}
}
$data = array_flip($data);
I can't really benchmark on my computer, so if it's slower than your implementations, let me know!

Try using mysql_fetch_assoc instead of mysql_fetch_array. By default, mysql_fetch_array returns both numeric and associative indexes (effectively doubling the size of each row array). In addition, try to use as few function calls as possible within your while loop. For instance, if you iterate through 50,000 elements and make 3 function calls in each iteration, that's 150,000 function calls.
In addition, why not strip duplicates before even passing the result to the loop?
SELECT someField
FROM someTable
GROUP BY someField
HAVING COUNT(someField)>0
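After running that query, run your loop:
$data = array();
$i = 0;
while($row = mysql_fetch_assoc($result) ){
$i++;
if($i > 99999) {
break;
}
// mysql_fetch_assoc returns string keys only, so read the column by name
$data[] = explode('-',$row['someField']);
}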

Related

In PHP need all unique combinations of multiple arrays

First, I have looked through dozens of similar questions/answers on this site and have not found any type of solution I could use or modify for this problem. There are a lot of things that come into play with this. I got close, but it kept running out of memory or some other resource, so here I am.
I need to create a function that takes a set of arrays and returns all combinations, subject to a specific set of rules. To start, I pull the arrays. There are 9 of them, of various lengths. Four of the arrays are duplicates of four others. Here's how I pull the arrays, just so you have an idea of how the data looks.
$prodA1 = Products::where('type','a')->pluck('id')->toArray();
$prodA2 = Products::where('type','a')->pluck('id')->toArray();
$prodB1 = Products::where('type','b')->pluck('id')->toArray();
$prodB2 = Products::where('type','b')->pluck('id')->toArray();
$prodC1 = Products::where('type','c')->pluck('id')->toArray();
$prodC2 = Products::where('type','c')->pluck('id')->toArray();
$prodD1 = Products::where('type','d')->pluck('id')->toArray();
$prodD2 = Products::where('type','d')->pluck('id')->toArray();
$prodE = Products::where('type','e')->pluck('id')->toArray();
I need every returned set to have the shape array('prodA1','prodA2','prodB1','prodB2','prodC1','prodC2','prodD1','prodD2','prodE'), with the values being IDs, not the names of the arrays.
Each returned array must be unique
Each individual array can have as many as 96 ids but they all vary
I found the below function which if I add a take(5) to the above so it limits the output seems to produce at least all the combinations which I can then apply other rules to but it takes forever to run. This is the closest I have gotten though, there has to be a faster solution to this. Any help would be appreciated.
public function combinations($arrays, $i = 0) {
if (!isset($arrays[$i])) {
return array();
}
if ($i == count($arrays) - 1) {
return $arrays[$i];
}
// get combinations from subsequent arrays
$tmp = $this->combinations($arrays, $i + 1);
$result = array();
// concat each array from tmp with each element from $arrays[$i]
foreach ($arrays[$i] as $v) {
foreach ($tmp as $t) {
$result[] = is_array($t) ?
array_merge(array($v), $t) :
array($v, $t);
}
}
return $result;
}
EDIT:
The input would be a set of 9 arrays with ids in them. For instance:
$prodA1 = array(1,2,3,4,5);
$prodA2 = array(1,2,3,4,5);
$prodB1 = array(6,7,8,9);
$prodB2 = array(6,7,8,9);
$prodC1 = array(10,11,12,13);
$prodC2 = array(10,11,12,13);
$prodD1 = array(14,15,16,17);
$prodD2 = array(14,15,16,17);
$prodE = array(18,19,20,21);
returns would look something like
array(1,2,6,7,10,11,14,15,18)
array(1,4,6,8,10,12,14,16,18)
so all the results would be unique, each one would contain 9 values made up of 2 prodA,2 prodB, 2 prodC, 2 prodD and 1 prodE
Hope that clears it up a little.
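One possible reading of these rules, sketched below under stated assumptions: "2 prodA" is taken to mean two distinct IDs of type A with order irrelevant, so each type contributes unordered pairs rather than two independent picks, and the results are produced lazily with a generator instead of being collected into one array. With 96 IDs in a type there are 96 * 95 / 2 = 4560 pairs, so four such types plus 96 choices for prodE would give roughly 4 * 10^16 results, which cannot fit in memory however they are built. The function names unorderedPairs and lazyProduct are hypothetical.

```php
<?php
// All unordered pairs from one ID list ($i < $j, so array(1,2) and array(2,1) are not both produced).
function unorderedPairs(array $ids)
{
    $ids = array_values($ids);
    $n = count($ids);
    $pairs = array();
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $pairs[] = array($ids[$i], $ids[$j]);
        }
    }
    return $pairs;
}

// Lazily yields every way of taking one choice from each group, flattened into one list.
function lazyProduct(array $groups, $i = 0, array $prefix = array())
{
    if ($i === count($groups)) {
        yield $prefix;
        return;
    }
    foreach ($groups[$i] as $choice) {
        // (array) wraps a single ID and leaves a pair unchanged.
        $next = array_merge($prefix, (array) $choice);
        foreach (lazyProduct($groups, $i + 1, $next) as $combo) {
            yield $combo;
        }
    }
}

// Sample input from the question's EDIT.
$groups = array(
    unorderedPairs(array(1, 2, 3, 4, 5)),
    unorderedPairs(array(6, 7, 8, 9)),
    unorderedPairs(array(10, 11, 12, 13)),
    unorderedPairs(array(14, 15, 16, 17)),
    array(18, 19, 20, 21),
);

$count = 0;
$first = null;
foreach (lazyProduct($groups) as $combo) {
    if ($first === null) {
        $first = $combo;
    }
    $count++; // apply further rules here instead of storing every combination
}
echo implode(',', $first), "\n";
echo $count, "\n";
```

For the sample input this walks 10 * 6 * 6 * 6 * 4 = 8640 combinations. Any additional rules are best applied inside the loop, and ideally used to prune groups before the product is taken.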

Basic PHP concepts (accessing/declaring array variables)

$result = array();
for ( $i = 10; $i < 101; $i = $i + 10 ){
$result[] = $i;
}
echo implode(", ", $result);
Hello... I'm new to PHP, and this really confused me: the code declares the array variable even though it works without the declaration.
I found this code here in the forum, regarding the removal of the trailing comma in a for loop. I was wondering which variable is used when it is echoed: is it the $result = array() or the $result[]? I've tried removing $result = array(); and the code still works. Does that mean it is OK to just remove $result = array();? Does removing it cause any coding issues?
The line $result = array(); is used to declare an array.
It's a better approach to include it, especially when a previously assigned variable $result holds some other value. This first line resets any previously assigned value of $result and declares it as an array.
$result[] = $i; means the value of $i is appended to $result on every iteration.
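A small sketch of what can go wrong without the reset, assuming $result was already used earlier in the same scope (the 'stale' value is made up for illustration):

```php
<?php
// Pretend earlier code left something in $result.
$result = array('stale');

for ($i = 10; $i < 101; $i = $i + 10) {
    $result[] = $i; // appends to whatever $result already holds
}
$withoutReset = implode(", ", $result);

// Same loop, but with the explicit declaration first.
$result = array();
for ($i = 10; $i < 101; $i = $i + 10) {
    $result[] = $i;
}
$withReset = implode(", ", $result);

echo $withoutReset, "\n";
echo $withReset, "\n";
```

Only the second run prints just the ten multiples of 10; the first still carries the leftover element.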

More elegant way of looping through array and aggregating result in PHP

I need to loop through a set of data (example below) and generate an aggregate. Original data format is CSV (but could be other kind).
LOGON;QUERY;COUNT
L1;Q1;1
L1;Q1;2
L1;Q2;3
L2;Q2;1
I need to group the quantities by LOGON and QUERY, so at the end I would have an array like:
"L1-Q1" => 3,
"L1-Q2" => 3,
"L2-Q2" => 1,
I usually use a code like this:
$logon = NULL;
$query = NULL;
$count = 0;
$result = array();
// just imagine I get each line as a named array
foreach ($csvline as $line) {
if ($logon != $line['logon'] || $query != $line['query']) {
if ($logon !== NULL) {
$result[$logon . $query] = $count;
}
$logon = $line['logon'];
$query = $line['query'];
$count = 0;
}
$count += $line['count'];
}
$result[$logon . $query] = $count;
Honestly, I don't think this is nice, as I have to repeat the last statement to include the last line. So, is there a more elegant way of solving this in PHP?
Thanks!
You simply need to check whether the key exists and then increment; create any missing key with the value 0.
Then you don't need to repeat anything:
$result = array();
foreach ($csvline as $line) {
if (!isset($result[$line['logon'] . $line['query']])){
//create entry
$result[$line['logon'] . $line['query']] = 0;
}
//increment, no matter what we encounter
$result[$line['logon'] . $line['query']] += $line['count'];
}
For readability and to avoid mistakes, you should generate the key just once instead of performing the same concatenation over and over:
foreach ($csvline as $line) {
$curKey = $line['logon'] . $line['query'];
if (!isset($result[$curKey])){
//create entry
$result[$curKey] = 0;
}
//increment, no matter what we encounter
$result[$curKey] += $line['count'];
}
This would allow you to change how the key is built without touching several lines of code.
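As a further sketch, assuming PHP 7.0 or later, the null coalescing operator can fold the existence check and the increment into one line. The sample rows mirror the CSV in the question, and the key is built with a '-' separator so that it matches the "L1-Q1" form shown there:

```php
<?php
// Sample rows mirroring the CSV in the question.
$csvline = array(
    array('logon' => 'L1', 'query' => 'Q1', 'count' => 1),
    array('logon' => 'L1', 'query' => 'Q1', 'count' => 2),
    array('logon' => 'L1', 'query' => 'Q2', 'count' => 3),
    array('logon' => 'L2', 'query' => 'Q2', 'count' => 1),
);

$result = array();
foreach ($csvline as $line) {
    $curKey = $line['logon'] . '-' . $line['query'];
    // ?? yields 0 when the key is not set yet, so no separate isset() branch is needed.
    $result[$curKey] = ($result[$curKey] ?? 0) + $line['count'];
}
print_r($result);
```

Unlike the original loop, this does not depend on the rows being sorted by LOGON and QUERY.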

PHP in_array() horrible performance. Fastest way to search array for value

I have the following simple code to test against collision on a primary key I am creating:
$machine_ids = array();
for($i = 0; $i < 100000; $i++) {
//Generate machine id returns a 15 character alphanumeric string
$mid = Functions::generate_machine_id();
if(in_array($mid, $machine_ids)) {
die("Collision!");
} else {
$machine_ids[] = $mid;
}
}
die("Success!");
Any idea why this is taking many minutes to run? Any way to speed it up?
for($i = 0; $i < 100000; $i++)
{
//Generate machine id returns a 15 character alphanumeric string
$mid = Functions::generate_machine_id();
if (isset($machine_ids[$mid]))
{
die("Collision!");
}
$machine_ids[$mid] = true;
}
For this, use $mid as keys, and dummy value as value. Specifically, instead of
if(in_array($mid, $machine_ids)) {
die("Collision!");
} else {
$machine_ids[] = $mid;
}
use
if(isset($machine_ids[$mid])) {
die("Collision!");
} else {
$machine_ids[$mid] = 1;
}
At the end you can extract the array you originally wanted with array_keys($machine_ids).
This should be much faster. If it is still slow, then your Functions::generate_machine_id() is slow.
EDITED to add isset as per comments.
Checking for array membership with in_array is an O(n) operation, since the value has to be compared against every element in the array. As the array grows, each check naturally gets slower.
If you need to do a whole bunch of membership tests, as is the case here, you should use a different data structure that supports O(1) membership tests, such as a hash.
Refactor your code so that it uses an associative array to hold the machine IDs, and use isset to check:
if( isset($machine_ids[$mid]) ) die("Collision");
$machine_ids[$mid] = $mid;
Using isset should be faster.
If you need the best performance for your case, store your data as array keys and
use isset or array_key_exists (as of PHP 7.4, array_key_exists is as fast as isset) instead of in_array.
Attention. It is true that isset on a hash map is faster than searching through an array for a value (in_array), but keep in mind
that converting an array of values, ["foo", "bar", "baz"], to a hash
map, ["foo" => true, "bar" => true, "baz" => true], incurs a memory
cost (as well as potentially constructing the hash map, depending on
how and when you do it). As with all things, you'll have to weigh the
pros & cons for each case to determine if a hash map or array (list)
of values works best for your needs. This isn't specific to PHP but
more of a general problem space of computer science.
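A minimal sketch of that conversion, using the example values from the paragraph above (the variable names are illustrative). array_fill_keys builds the lookup map in one call, and array_keys recovers the list:

```php
<?php
$values = array('foo', 'bar', 'baz');

// Build the hash map once: array('foo' => true, 'bar' => true, 'baz' => true).
$lookup = array_fill_keys($values, true);

$hasBar = isset($lookup['bar']);              // key lookup, independent of array size
$hasQux = isset($lookup['qux']);
$viaInArray = in_array('bar', $values, true); // linear scan over the list

// The original list can be recovered from the keys.
$roundTrip = array_keys($lookup);

var_dump($hasBar, $hasQux, $viaInArray);
```

Building $lookup costs one pass and extra memory, so it pays off mainly when many membership tests follow.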
And some performance tests from https://gist.github.com/alcaeus/536156663fac96744eba77b3e133e50a
<?php declare(strict_types = 1);
function testPerformance($name, Closure $closure, $runs = 1000000)
{
$start = microtime(true);
for (; $runs > 0; $runs--)
{
$closure();
}
$end = microtime(true);
printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}
$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
$items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);
$assocItems = array_combine($items, array_fill(0, count($items), true));
$in_array = function () use ($items) {
in_array(1111111, $items);
};
$isset = function () use ($assocItems) {
isset($assocItems[1111111]);
};
$array_key_exists = function () use ($assocItems) {
array_key_exists(1111111, $assocItems);
};
testPerformance('in_array', $in_array, 100000);
testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);
Output:
Function call in_array took 5.01030 seconds
Function call isset took 0.00627 seconds
Function call array_key_exists took 0.00620 seconds

PHP for loop will affect the page load speed?

$categories = array("google","adobe","microsoft","exoot","yahoo");
$sql='google,exoot,adobe';//from mysql_query
$categs = explode(",",$sql);
for($x=0;$x<count($categs);$x++){
for($y=0;$y<count($categories);$y++){
if($categs[$x] == $categories[$y]){
$str .= $y.",";
}
}
}
echo $str; // 0,3,1,
Will this code will affect page render time? Can I do it using any other fast methods?
Thanks in advance.
$str = implode(',', array_keys(array_intersect($categories, $categs)));
You can use array_intersect() to find the common items and then use implode() to construct a comma-separated list:
$str = implode(',', array_intersect($categories, $categs)) . ',';
Unless you're dealing with a large number of items (thousands), it won't affect page speed. The one issue is that this intersection is O(n^2). Putting the values into keys could speed it up considerably, as that changes each lookup from O(n) to near O(1), making the whole operation O(n).
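A minimal sketch of the values-into-keys idea, using the data from the question: array_flip turns the category list into a name-to-index map, so each name from the query needs a single key lookup instead of an inner loop.

```php
<?php
$categories = array("google", "adobe", "microsoft", "exoot", "yahoo");
$sql = 'google,exoot,adobe'; // stands in for the value from mysql_query
$categs = explode(",", $sql);

// name => index, e.g. 'exoot' => 3
$indexOf = array_flip($categories);

$ids = array();
foreach ($categs as $name) {
    if (isset($indexOf[$name])) {
        $ids[] = $indexOf[$name];
    }
}
$str = implode(',', $ids);
echo $str, "\n";
```

The indexes come out in the order of the query string, matching the nested-loop version but without the trailing comma.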
Yes, it will, since you are looping inside a loop.
The best thing is to check with in_array:
$categories = array("google","adobe","microsoft","exoot","yahoo");
$sql='google,exoot,adobe';//from mysql_query
$categs = explode(",",$sql);
$str = array();
foreach($categs as $id => $categs_check)
{
if(in_array($categs_check, $categories))
{
// it's better to put it into an array and implode it at a later point if you need it comma-separated.
$str[] = $id;
}
}
I'm not completely sure what you are trying to do but it should be something like the above
I don't think that str_replace is a faster method than all the array functions but another possible solution is:
$categories = array("google","adobe","microsoft","exoot","yahoo");
$sql='google,exoot,adobe';//from mysql_query
foreach($categories as $i=> $c) {
$sql = str_replace($c, $i, $sql);
}
$arrCategories = array("google","adobe","microsoft","exoot","yahoo");
$sql='google,exoot,adobe';//from mysql_query
$arrCategs = explode(",",$sql);
$arrAns = array();
for($i = 0, $intCnt = count($arrCategs); $i < $intCnt; $i++) {
if(in_array($arrCategs[$i],$arrCategories)) {
$arrAns[$arrCategs[$i]] = array_search($arrCategs[$i], $arrCategories);
}
}
print "<pre>";
print_r($arrAns);
print "</pre>";
