PHP: Efficiently Search through Collections

I have collections of numbers (arbitrary order) to store.
Pseudocode:
id_a:[3,5,7,11]
id_x:[3,5,10,21]
id_b:[12,24,25,26]
etc.
I need to be able to search through all the collections and return the group_IDs.
For example, if I look up 5, I should get back ['id_a','id_x']. I want to do this efficiently with some sort of mapping, not by looping through all numbers of all collections. I also want to be able to map directly to each key and get back the collection (e.g., 'id_x' returns [3,5,10,21]); again, I would prefer this to be done efficiently, without looping through the keys.
Edit:
I could use the numbers as the keys and efficiently get back 'id_'. Or, I could go the other way and use 'id_' as keys and efficiently get back the array of numbers. However, I want to be able to go efficiently in both directions. I guess I could maintain two arrays, but that seems messy.

Your examples all show the array values in sorted order. If they are always in sorted order, then you can use a binary search to find known values. This code:
function binarySearch($needle, array $haystack) {
    $high = count($haystack) - 1;
    $low = 0;
    $mid = false; // stays false only when $haystack is empty
    while ($high >= $low) {
        $mid = ($high + $low) >> 1;
        $t = $needle - $haystack[$mid];
        if ($t < 0) {
            $high = $mid - 1;
        } elseif ($t > 0) {
            $low = $mid + 1;
        } else {
            return $mid;
        }
    }
    // Not found: return the last probed index; the caller must verify the value.
    return $mid;
}
function searchArrays($needle) {
    static $id_a = array(3,5,7,11);
    static $id_x = array(3,5,10,21);
    static $id_b = array(12,24,25,26);
    static $arrayNames = array('id_a', 'id_x', 'id_b');
    $rv = array();
    foreach ($arrayNames as $arrayName) {
        $array = $$arrayName;
        $index = binarySearch($needle, $array);
        // $index is false for an empty array, so guard before indexing
        if ($index !== false && $array[$index] == $needle) {
            $rv[] = $arrayName;
        }
    }
    return $rv;
}
$needles = range(3,8);
foreach ($needles as $needle) {
    $result = searchArrays($needle);
    printf("searchArrays(%s)=%s\n", $needle, join(', ', $result));
}
will output the following:
searchArrays(3)=id_a, id_x
searchArrays(4)=
searchArrays(5)=id_a, id_x
searchArrays(6)=
searchArrays(7)=id_a
searchArrays(8)=
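If the values are not guaranteed to be sorted, or lookups by number are frequent, the two-array idea from the question can be kept tidy by deriving the reverse map from the forward one. This is only a minimal sketch, assuming PHP 7+ (for the `??` operator); `buildInvertedIndex` is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Build a number => [collection ids] map from an id => [numbers] map.
function buildInvertedIndex(array $collections): array
{
    $index = [];
    foreach ($collections as $id => $numbers) {
        foreach ($numbers as $number) {
            $index[$number][] = $id;
        }
    }
    return $index;
}

$collections = [
    'id_a' => [3, 5, 7, 11],
    'id_x' => [3, 5, 10, 21],
    'id_b' => [12, 24, 25, 26],
];
$index = buildInvertedIndex($collections);

// Number -> ids: a single hash lookup, no scan over the collections.
$idsForFive = $index[5] ?? [];
// Id -> numbers: also a single hash lookup.
$numbersForX = $collections['id_x'];

print_r($idsForFive);
print_r($numbersForX);
```

Both directions are then plain array lookups. The cost is that the index has to be rebuilt, or updated alongside the forward map, whenever a collection changes.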

Related

Algorithm for exclude combinations with unique value using php

The algorithm I'm trying builds different combinations of values and should give me the combinations whose sum exactly or approximately matches my target value.
I have attached an image with a detailed explanation. I created a total column holding the sum of each row's values, then summed all the totals; that grand total is my expected output value.
So I'm trying to take combinations of the row sums that add up to that total.
Here is the algorithm, which I found through Google:
function extractList($array, &$list, $temp = array()) {
if (count($temp) > 0 && ! in_array($temp, $list))
$list[] = $temp;
for($i = 0; $i < sizeof($array); $i ++) {
$copy = $array;
$elem = array_splice($copy, $i, 1);
if (sizeof($copy) > 0) {
$add = array_merge($temp, array($elem[0]));
sort($add);
extractList($copy, $list, $add);
} else {
$add = array_merge($temp, array($elem[0]));
sort($add);
if (! in_array($add, $list)) {
$list[] = $add;
}
}
}
}
echo "<pre>";
$sum = 32 ; //SUM
$array = array(5.14327,5.72355,5.91794,4.8209,8.69933,4.12977,4.12977,2.92791,2.36829,2.21819,1.33759,1.72278,1.72278,0.589,1.06405,0.6387,0.6387,1.68995,2.51669,3.97842,2.38058,2.17175,4.88264,5.84811,6.14215);
$list = array();
# Extract All Unique Conbinations
extractList($array, $list);
#Filter By SUM = $sum
$list = array_filter($list,function($var) use ($sum) { return(array_sum($var) == $sum);});
#Return Output
print_r($list);
Attached Image here
You need to decide how you determine approximate equality. A percentage? Or an absolute amount? That's what you need in your filter lambda function.
// outside lambda
$error = $sum * 5 / 100;// 5%, or
$error = 0.02;// an absolute
...
// inside lambda
return abs(array_sum($var) - $sum) <= $error;
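Put together, the tolerance check might look like the sketch below. It assumes PHP 7+ (for the scalar type declarations); `sumsApproximately` and the three sample combinations are made up for illustration and are not taken from the question's data.

```php
<?php
// True when the values add up to $target within an absolute tolerance.
function sumsApproximately(array $values, float $target, float $tolerance): bool
{
    return abs(array_sum($values) - $target) <= $tolerance;
}

$sum = 32.0;
$error = 0.02; // absolute tolerance; use $sum * 5 / 100 for a 5% band instead

// Hypothetical candidate combinations, standing in for the extracted $list.
$list = [[5.14, 5.72], [10.0, 22.01], [16.0, 15.5]];

$matches = array_values(array_filter($list, function (array $var) use ($sum, $error) {
    return sumsApproximately($var, $sum, $error);
}));

print_r($matches);
```

Only the second combination lies within 0.02 of 32, so it is the only one kept.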

Object comparison and array sorting in PHP

I have a problem with object comparison in PHP. What seems like straightforward code actually runs way too slow for my liking, and as I am not that advanced in the language, I would like some feedback and suggestions regarding the following code:
class TestTokenGroup {
private $tokens;
...
public static function create($tokens) {
$instance = new static();
$instance->tokens = $tokens;
...
return $instance;
}
public function getTokens() {
return $this->tokens;
}
public static function compare($tokenGroup1, $tokenGroup2) {
$i = 0;
$minLength = min(array(count($tokenGroup1->getTokens()), count($tokenGroup2->getTokens())));
$equalLengths = (count($tokenGroup1->getTokens()) == count($tokenGroup2->getTokens()));
$comparison = strcmp($tokenGroup1->getTokens()[$i], $tokenGroup2->getTokens()[$i]);
while ($comparison == 0) {
$i++;
if (($i == $minLength) && ($equalLengths == true)) {
return 0;
}
$comparison = strcmp($tokenGroup1->getTokens()[$i], $tokenGroup2->getTokens()[$i]);
}
$result = $comparison;
if ($result < 0)
return -1;
elseif ($result > 0)
return 1;
else
return 0;
}
...
}
In the code above $tokens is just a simple array of strings.
Using the method above through usort() for an array of TestTokenGroup consisting of around 40k objects takes ~2secs.
Is there a sensible way to speed that up? Where is the bottleneck here?
EDIT: Added the getTokens() method I initially forgot to include.
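Note that objects are passed by handle (effectively by reference), while arrays have value semantics.
PHP implements array values with copy-on-write, so returning $this->tokens from getTokens() does not physically duplicate the array unless it is modified. Each call still pays method-call overhead, though, and compare() invokes getTokens() several times per comparison.
Try accessing $tokens directly via $tokenGroup1->tokens, which works here because compare() is a static method of the same class. You could also use references (&), although returning a reference doesn't work in all PHP versions.
Alternatively, call the getter only once per object:
$tokens1 = $tokenGroup1->getTokens();
$tokens2 = $tokenGroup2->getTokens();
Even if each token group is relatively small, this saves at least 40000 * ( 6 + $average_token_group_length * 2) getTokens() calls.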
UPDATE
I've benchmarked OP's code (removing the ... lines) using:
function gentokens() {
$ret = [];
for ( $i=0; $i< 3; $i++)
{
$str = "";
for ( $x = rand(0,3); $x < 10; $x ++ )
$str .= chr( rand(0,25) + ord('a') );
$ret[] = $str;
}
return $ret;
}
$start = microtime(true);
$array = []; // this will hold the TestTokenGroup instances
$dummy = ""; // this will hold the tokens, space-separated and newline-separated
$dummy2= []; // this will hold the space-concatenated strings
for ( $i=0; $i < 40000; $i++)
{
$array[] = TestTokenGroup::create( $t = gentokens() );
$dummy .= implode(' ', $t ) . "\n";
$dummy2[] = implode(' ', $t );
}
// write a test file to benchmark GNU sort:
file_put_contents("sort-data.txt", $dummy);
$inited = microtime(true);
printf("init: %f s\n", ($inited-$start));
usort( $array, [ 'TestTokenGroup', 'compare'] );
$sorted = microtime(true);
printf("sort: %f s\n", ($sorted-$inited));
usort( $dummy2, 'strcmp' );
$sorted2 = microtime(true);
printf("sort: %f s\n", ($sorted2-$sorted));
With the following results:
init: 0.359329 s // for generating 40000 * 3 random strings and setup
sort: 1.012096 s // for the TestTokenGroup::compare
sort: 0.120583 s // for the 'strcmp' compare
And, running time sort sort-data.txt > /dev/null yields
.052 u (user-time, in seconds).
Optimisation 1: remove the getTokens() calls
Replacing ->getTokens() with ->tokens yields (I'll only list the TestTokenGroup::compare results):
sort: 0.832794 s
Optimisation 2: remove redundant array() in min
Changing the $minLength line to:
$minLength = min(count($tokenGroup1->tokens), count($tokenGroup2->tokens));
gives
sort: 0.779134 s
Optimisation 3: only call count() once for each tokenGroup
$count1 = count($tokenGroup1->tokens);
$count2 = count($tokenGroup2->tokens);
$minLength = min($count1, $count2);
$equalLengths = ($count1 == $count2);
gives
sort: 0.679649 s
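For reference, here is roughly what a comparator looks like with all three optimisations applied at once. This is a sketch under assumptions, not the OP's class: `TokenGroupSketch` is a hypothetical stand-in, it needs PHP 7+ (return types and `<=>`), and it deliberately treats a group that is a prefix of another as the smaller one, a case the original loop does not handle cleanly.

```php
<?php
// Hypothetical stand-in for TestTokenGroup, reduced to what compare() needs.
class TokenGroupSketch
{
    private $tokens;

    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    public function getTokens(): array
    {
        return $this->tokens;
    }

    public static function compare(self $a, self $b): int
    {
        // Read each property and count each array exactly once.
        $t1 = $a->tokens;
        $t2 = $b->tokens;
        $count1 = count($t1);
        $count2 = count($t2);
        $minLength = min($count1, $count2);

        for ($i = 0; $i < $minLength; $i++) {
            $comparison = strcmp($t1[$i], $t2[$i]);
            if ($comparison !== 0) {
                return $comparison < 0 ? -1 : 1;
            }
        }
        // All shared positions are equal: the shorter group sorts first.
        return $count1 <=> $count2;
    }
}

$groups = [
    new TokenGroupSketch(['b']),
    new TokenGroupSketch(['a', 'c']),
    new TokenGroupSketch(['a']),
];
usort($groups, ['TokenGroupSketch', 'compare']);
```

After the usort() call the groups are ordered ['a'], ['a', 'c'], ['b'].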
Alternative approach
The fastest sort so far is usort( $dummy2, 'strcmp' ): 0.12s. That is still twice as slow as GNU sort, but the latter only does one thing, and does it well.
So, to sort the TokenGroups efficiently we need to construct a sort key consisting of a simple string. We can use \0 as a delimiter for the tokens, and we don't have to worry about them being equal length, because as soon as one character is different, the compare aborts. Note that objects with identical tokens produce the same key, so only the last of them survives in the keyed array used here.
Here's the implementation:
$arr2 = [];
foreach ( $array as $o )
$arr2[ implode("\0", $o->getTokens() ) ] = $o;
$init2 = microtime(true);
printf("init2: %f s\n", ($init2-$sorted2));
uksort( $arr2, 'strcmp' );
$sorted3 = microtime(true);
printf("sort: %f s\n", ($sorted3-$init2));
and here the results:
init2: 0.125939 s
sort: 0.104717 s
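One caveat with using the joined string as the array key is that two groups with the same tokens collapse into one entry. A possible variation, sketched here with plain token arrays standing in for the TestTokenGroup objects, keeps the keys in a separate array and sorts that with asort(), which preserves the original indexes. `sortGroupsByKey` is a hypothetical name, and no timing claim is made for this version.

```php
<?php
// Sort token groups by a "\0"-joined key without dropping duplicates.
function sortGroupsByKey(array $groups): array
{
    $keys = [];
    foreach ($groups as $i => $tokens) {
        $keys[$i] = implode("\0", $tokens);
    }

    // asort() keeps index association, so each key still points at its group.
    asort($keys, SORT_STRING);

    $sorted = [];
    foreach ($keys as $i => $unused) {
        $sorted[] = $groups[$i];
    }
    return $sorted;
}

$groups = [['b', 'a'], ['a', 'b'], ['a'], ['a', 'b']];
$sorted = sortGroupsByKey($groups);
print_r($sorted);
```

Both ['a', 'b'] entries are still present in the result, next to each other.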

Issue with custom script to sort Arrays ascending and descending order

I have an issue to deal with here (99% sure it's a logical error in my code). I just can't seem to find a way to fix it, but I bet one of you will find the problem in no time!
I have to create a function which sorts the array passed to it in ascending or descending order, but I can't use any array sorting functions!
I've been struggling with loops until now and I finally want to ask for help from other devs (you).
Currently only the ascending code has been worked on; descending should be no problem once this works. It sorts the values up to some point, but then stops (it stops if the next smallest value is at the end of the passed array). What could I do to prevent this and make it sort the whole array and its elements?
Here is the code so far.
<?php
function order_array($array,$mode = 'ascending') {
$length = count($array);
if($mode == 'descending') {
return $array;
} else {
$sorted_array = array();
$used_indexes = array();
for($i = 0; $i < $length; $i++) {
$smallest = true;
echo $array[$i] . '<br/>';
for($y = 0; $y < $length; $y++) {
//echo $array[$i] . ' > ' . $array[$y] . '<br/>';
// if at ANY time during checking element vs other ones in his array, he is BIGGER than that element
// set smallest to false
if(!in_array($y,$used_indexes)) {
if($array[$i] > $array[$y]) {
$smallest = false;
break;
}
}
}
if($smallest) {
$sorted_array[] = $array[$i];
$used_indexes[] = $i;
}
}
return $sorted_array;
}
}
$array_to_sort = array(1, 3, 100, 99, 33, 20);
$sorted_array = order_array($array_to_sort);
print_r($sorted_array);
?>
I've solved the issue myself by doing it completely differently. Now it correctly sorts all the elements of the passed-in array. The logical issue was my use of the for() loop: it ran only a fixed number of times (the length of the passed array), while we need to keep looping until every element has been moved into the new sorted array. Here is the code that works:
function order_array($array, $mode = 'ascending') {
    $sorted = array(); // initialised so an empty input returns an empty array
    if ($mode == 'descending') {
        // Keep looping until $array is empty, removing the largest value each time.
        while (count($array)) {
            $value = max($array);
            $key = array_search($value, $array);
            if ($key !== false) {
                unset($array[$key]);
            }
            $sorted[] = $value;
        }
        return $sorted;
    } else {
        // Keep looping until $array is empty, removing the smallest value each time.
        while (count($array)) {
            $value = min($array);
            $key = array_search($value, $array);
            if ($key !== false) {
                unset($array[$key]);
            }
            $sorted[] = $value;
        }
        return $sorted;
    }
}
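Alternatively, keeping the original nested-loop approach, repeat the passes until every element has been placed, and reverse the result for descending order:
function order_array($array, $mode = 'ascending') {
    $array = array_values($array); // make sure the indexes run from 0 to $length - 1
    $length = count($array);
    $sorted_array = array();
    $used_indexes = array();
    // A single pass can leave elements behind, so repeat until all are placed.
    while (count($sorted_array) < $length) {
        for ($i = 0; $i < $length; $i++) {
            if (in_array($i, $used_indexes)) {
                continue; // already placed in an earlier pass
            }
            $smallest = true;
            for ($y = 0; $y < $length; $y++) {
                // if the element is BIGGER than any other unused element, it is not the smallest
                if (!in_array($y, $used_indexes) && $array[$i] > $array[$y]) {
                    $smallest = false;
                    break;
                }
            }
            if ($smallest) {
                $sorted_array[] = $array[$i];
                $used_indexes[] = $i;
            }
        }
    }
    // The returns must come after the loops, not inside them.
    if ($mode == 'descending') {
        return array_reverse($sorted_array);
    }
    return $sorted_array;
}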

Php consolidate number (zip codes)

I have a PHP array with zip codes returned from a DB query. The zip codes are in German format, so they are 5 digits long.
Example:
array ('90475', '90419', '90425', '90415', '90429', '90479', '90485');
I would like to consolidate the values into "ranges" with placeholders, like:
array ('90...', '904..', '9041.', '9042.', '9047.');
//90485 is left out because there is only 1 match.
Edit / logic:
This is for an autosuggestion search. I'm trying to build a tree so users can search for entries that match any zip code starting with 90 or 904, etc. For the autocomplete to make sense, I only want to provide the "9041." value if at least two entries match (90419 and 90415 in the example). The zip codes are always 5 digits long, from 00000 to 99999.
Highly appreciate any help.
Thanks.
Here you are:
$length = 5;
$zip = array('90475', '90419', '90425', '90415', '90429', '90479', '90485');
$result = array();
for ($i = 2; $i <= $length - 1; $i++) {
    $pass = array();
    foreach ($zip as $val) {
        $prefix = substr($val, 0, $i);
        // initialise the counter explicitly to avoid an undefined-index notice
        $pass[$prefix] = isset($pass[$prefix]) ? $pass[$prefix] + 1 : 1;
    }
    foreach ($pass as $key => $val) {
        if ($val > 1) {
            $result[] = $key.str_repeat('.', $length - $i);
        }
    }
}
sort($result);
var_dump($result);
This will return in $result an array:
array ('90...', '904..', '9041.', '9042.', '9047.');
Any prefix that matches only one zip code is ignored and not returned in the $result array.
$myArray = array ('90475', '90419', '90425', '90415', '90429', '90479', '90485');
$consolidate = array();
foreach($myArray as $zip) {
for ($c = 2; $c < 5; ++$c) {
$key = substr($zip, 0, $c) . str_repeat('.',5 - $c);
$consolidate[$key] = (isset($consolidate[$key])) ? $consolidate[$key] + 1 : 1;
}
}
$consolidate = array_filter(
$consolidate,
function ($value) {
return $value > 1;
}
);
var_dump($consolidate);

Sorting x and y coordinates in an array in PHP most efficiently?

Currently I have an array that contains x and y coordinates of various positions.
ex.
$location[0]['x'] = 1; $location[0]['y'] = 1;
This indicates id 0 has a position of (1,1).
Sometimes I want to sort this array by x, and other times by y.
Currently I am using array_multisort() to sort my data, but I feel this method is inefficient since every time before I sort, I must make a linear pass through the $location array just to build the index (on the x or y key) before I can invoke the array_multisort() command.
Does anyone know a better way to do this? Perhaps it is a bad idea even to store the data like this? Any suggestions would be great.
You could use usort() which lets you choose how your array elements are compared.
// sort by 'y'
usort($location, 'cmp_location_y');
// or sort by 'x'
usort($location, 'cmp_location_x');
// here are the comparison functions
function cmp_location_x($a, $b) {
return cmp_location($a, $b, 'x');
}
function cmp_location_y($a, $b) {
return cmp_location($a, $b, 'y');
}
function cmp_location($a, $b, $key) {
if ($a[$key] == $b[$key]) {
return 0;
} else if ($a[$key] < $b[$key]) {
return -1;
} else {
return 1;
}
}
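On PHP 7 or later the three comparison functions can be collapsed into one small factory that returns a closure for whichever key you want. This is just a sketch; `compareByKey` is a hypothetical name and the sample points are made up.

```php
<?php
// Returns a comparator for usort() that orders rows by the given key.
function compareByKey(string $key): callable
{
    return function (array $a, array $b) use ($key): int {
        return $a[$key] <=> $b[$key];
    };
}

$location = [
    ['x' => 3, 'y' => 1],
    ['x' => 1, 'y' => 2],
    ['x' => 2, 'y' => 0],
];

usort($location, compareByKey('x'));
$byX = $location;

usort($location, compareByKey('y'));
$byY = $location;

print_r($byX);
print_r($byY);
```

The behaviour is the same as with cmp_location_x and cmp_location_y; only the boilerplate per key goes away.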
You want to keep using multisort.
I made a quick benchmark of usort and array_multisort. Even at a count of only 10, multisort with building an index is faster than usort. At 100 elements it's about 5 times faster. At around 1000 elements the improvement levels off at about an order of magnitude faster. User function calls are just too slow. I'm running PHP 5.2.6.
$count = 100;
$data = array();
$data2 = array();
for ($i = 0; $i < $count; $i++)
{
    $temp = array('x' => rand(), 'y' => rand());
    $data[] = $temp;
    $data2[] = $temp;
}
// usort() expects an integer below, equal to, or above zero, not a boolean
function sortByX($a, $b) {
    if ($a['x'] == $b['x']) return 0;
    return ($a['x'] < $b['x']) ? -1 : 1;
}
$start = microtime(true);
usort($data, "sortByX");
echo (microtime(true) - $start) * 1000000, "<br/>\n";
$start = microtime(true);
$s = array();
foreach ($data2 as $temp)
    $s[] = $temp['x'];
array_multisort($s, SORT_NUMERIC, $data2);
echo (microtime(true) - $start) * 1000000, "<br/>\n";
PHP 5.2 doesn't have a built-in pluck function like Ruby's. PHP 5.5 and later provide array_column(), so there you can replace this code
foreach ($data2 as $temp)
$s[] = $temp['x'];
with
$s = array_column($data2, 'x');
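On PHP 5.5 or later the index-building pass can be handed to the built-in array_column(), so the whole "build index, then multisort" step fits in a couple of lines. A sketch, with `sortRowsBy` as a hypothetical helper name and made-up sample rows (the type declarations need PHP 7+):

```php
<?php
// Sort row-oriented data by one key using array_column() + array_multisort().
function sortRowsBy(array $rows, string $key): array
{
    $column = array_column($rows, $key);          // e.g. all the 'x' values
    array_multisort($column, SORT_NUMERIC, $rows); // $rows is reordered along with $column
    return $rows;
}

$location = [
    ['x' => 3, 'y' => 1],
    ['x' => 1, 'y' => 2],
    ['x' => 2, 'y' => 0],
];

$byX = sortRowsBy($location, 'x');
$byY = sortRowsBy($location, 'y');

print_r($byX);
print_r($byY);
```

This still makes a linear pass to extract the column, but it happens inside the engine rather than in a userland loop.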
Something like what jcinacio said. With this class you can store and sort all sorts of data really, not just locations in different dimensions. You can implement other methods like remove etc as needed.
class Locations {
public $locations = array();
public $data = array();
public $dimensions = 2;
public function __construct($dimensions = null)
{
if (is_int($dimensions))
$this->dimensions = $dimensions;
}
public function addLocation()
{
$t = func_num_args();
if ($t !== $this->dimensions)
throw new Exception("This Locations object has {$this->dimensions} dimensions");
$args = func_get_args();
for ($i = 0; $i < $t; $i++)
$this->locations[$i][] = $args[$i];
return $this;
}
public function sortByDimension($dimension = 1)
{
if ($dimension > $this->dimensions)
throw new Exception("Wrong number of dimensions");
--$dimension;
$params[] = &$this->locations[$dimension];
for ($i = 0, $t = $this->dimensions; $i < $t; $i++) {
if ($i === $dimension)
continue;
$params[] = &$this->locations[$i];
}
call_user_func_array('array_multisort', $params);
return $this;
}
}
test data:
$loc = new Locations(3);
$loc
->addLocation(1, 1, 'A')
->addLocation(2, 3, 'B')
->addLocation(4, 2, 'C')
->addLocation(3, 2, 'D')
;
$loc->sortByDimension(1);
var_dump($loc->locations);
$loc->sortByDimension(2);
var_dump($loc->locations);
Keeping the arrays and the multisort you have, changing the structure to something like the following would eliminate the need for a previous pass:
$locations = array(
'x' => $x_coordinates,
'y' => $y_coordinates,
'data' => $data_array
);
Then just use array_multisort() on all columns.
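As a rough illustration of that column-oriented layout, the sketch below sorts all three columns together by whichever axis is passed in. `sortLocationsBy` is a hypothetical helper, the sample values are made up, and it assumes exactly the 'x', 'y' and 'data' columns shown above (PHP 7+ for the type declarations).

```php
<?php
// Sort parallel columns together; the first array passed decides the order.
function sortLocationsBy(array $locations, string $axis): array
{
    $other = ($axis === 'x') ? 'y' : 'x';
    array_multisort(
        $locations[$axis], SORT_NUMERIC, // primary sort column
        $locations[$other],              // reordered to match, breaks ties
        $locations['data']               // reordered to match
    );
    return $locations;
}

$locations = [
    'x'    => [4, 1, 3],
    'y'    => [1, 3, 2],
    'data' => ['C', 'A', 'D'],
];

$byX = sortLocationsBy($locations, 'x');
$byY = sortLocationsBy($locations, 'y');

print_r($byX);
print_r($byY);
```

No index has to be built first, because each coordinate already lives in its own array.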
