Best method to intersect many arrays - php

I have a 2-dimensional array ($a): the first dimension has 600 elements, and the second dimension can have from 1k to 10k elements. Something like:
$a[0] = array(4 => true, 10 => true, 18 => true...);
$a[1] = array(6 => true, 10 => true, 73 => true...);
$a[599] = array(106 => true, 293 => true, 297 => true...);
I need to save to a file the intersection of every combination of 5 of those 600 elements. Something like this:
for ($i=0;$i<=599;$i++) {
for ($j=$i+1;$j<=599;$j++) {
for ($k=$j+1;$k<=599;$k++) {
for ($p=$k+1;$p<=599;$p++) {
for ($m=$p+1;$m<=599;$m++) {
$intersection = array_intersect_key($a[$i], $a[$j], $a[$k], $a[$p], $a[$m]);
}
}
}
}
}
I am using array_intersect_key because it's much faster than array_intersect: O(n) vs O(n^2).
That code works fine but runs pretty slowly. So I made many other attempts, which run a little faster, something like:
for ($i=0;$i<=599;$i++) {
for ($j=$i+1;$j<=599;$j++) {
$b = array_intersect_key($a[$i], $a[$j]);
for ($k=$j+1;$k<=599;$k++) {
$c = array_intersect_key($b, $a[$k]);
for ($p=$k+1;$p<=599;$p++) {
$d = array_intersect_key($c, $a[$p]);
for ($m=$p+1;$m<=599;$m++) {
$intersection = array_intersect_key($d, $a[$m]);
}
}
}
}
}
This runs 3x faster than the first version, but it's still pretty slow. I also tried creating long binary vectors like 00000011001110... (one for each of the 600 elements) and doing bitwise AND (&) operations, which works exactly like an intersect, but it's not much faster and uses a lot more memory.
So I am wondering, do you have any breakthrough suggestion? Any mathematical operation, matrix operation... that I can use? I am really tempted to reimplement the C code of array_intersect_key because at every execution it does a lot of things that I don't need, like ordering the arrays before performing the intersection. I don't need that because I already ordered all my arrays before the operations start, so it creates a big overhead. But I still don't want to go that route because it's not that simple to create a PHP extension that performs better than the native implementation.
Note: my array $a has all its elements already sorted, and the first-dimension elements that have fewer second-dimension elements are positioned first in the array, so array_intersect_key works much faster because that function only has to check until it reaches the end of its first parameter (which is smaller in size).
EDIT
@user1597430 Again, I appreciate all your attention and the further explanations you gave me. You made me completely understand everything you said! After that, I decided to take a shot at my own implementation of your algorithm, and I think it is a little bit simpler and faster because I never need to use sort and I also make good use of array keys (instead of values). Feel free to use the algorithm below if you ever need it; you already gave me lots of your time!
After implementing the algorithm below I realized one thing at the end: I can ONLY start checking the intersections/combinations after the whole combinatorial phase is done, and after that I need to run another algorithm to parse the results. For me that's not feasible because 600x1000 already takes a long time, and I may very well end up having to use 600x20000 (yep, 600 x 20k), which I am pretty sure will take forever. Only after that "forever" would I be able to check (parse) the combinations/intersections result. Do you know a way to check the intersections after each computation, without having to wait for ALL the combinations to be generated? It's pretty sad that, after taking almost 6 hours to understand and implement the algorithm below, I came to the conclusion that it won't fit my needs. Don't worry, I will accept your answer if no better one comes, since you already helped me a lot and I LEARNED A LOT WITH you!
<?php
$quantity_first_order_array = 10;
$quantity_second_order_array = 20;
$a = array();
$b = array();
for ($i=0;$i<$quantity_first_order_array;$i++) {
for ($j=0;$j<$quantity_second_order_array;$j++) {
//I just use `$i*2 + $j*3 + $j*$i` as a way to avoid using `rand`, dont try to make sense of this expression, it's just to avoid using `rand`.
$a['group-' . $i][$i*2 + $j*3 + $j*$i] = true;
$b[$i*2 + $j*3 + $j*$i] = true;
}
}
//echo "aaaa\n\n";var_dump($a);exit();
$c = array();
foreach ($b as $chave1 => $valor1) {
$c[$chave1] = array();
foreach ($a as $chave2 => $valor2) {
if (isset($a[$chave2][$chave1])) {
$c[$chave1][] = $chave2;
}
}
}
foreach ($c as $chave1 => $valor1) {
if (count($c[$chave1]) < 5) {
unset($c[$chave1]);
}
}
//echo "cccc\n\n";var_dump($c);exit();
$d = array();
foreach ($c as $chave1 => $valor1) {
$qntd_c_chave1 = count($c[$chave1]);
for ($a1=0;$a1<$qntd_c_chave1;$a1++) {
for ($a2=$a1 + 1;$a2<$qntd_c_chave1;$a2++) {
for ($a3=$a2 + 1;$a3<$qntd_c_chave1;$a3++) {
for ($a4=$a3 + 1;$a4<$qntd_c_chave1;$a4++) {
for ($a5=$a4 + 1;$a5<$qntd_c_chave1;$a5++) {
$d[$c[$chave1][$a1] . '|' . $c[$chave1][$a2] . '|' . $c[$chave1][$a3] . '|' . $c[$chave1][$a4] . '|' . $c[$chave1][$a5]][] = $chave1;
}
}
}
}
}
}
echo "dddd\n\n";var_dump($d);exit();
?>

Well, I usually don't do this for free, but since you mentioned genetic algos... Hello from Foldit community.
<?php
# a few magic constants
define('SEQUENCE_LENGTH', 600);
define('SEQUENCE_LENGTH_SUBARRAY', 1000);
define('RAND_MIN', 0);
define('RAND_MAX', 100000);
define('MIN_SHARED_VALUES', 5);
define('FILE_OUTPUT', sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'out.csv');
# generates a sample data for processing
$a = [];
for ($i = 0; $i < SEQUENCE_LENGTH; $i++)
{
for ($j = 0; $j < SEQUENCE_LENGTH_SUBARRAY; $j++)
{
$a[$i][] = mt_rand(RAND_MIN, RAND_MAX);
}
$a[$i] = array_unique($a[$i]);
sort($a[$i]);
}
# prepares a map where key indicates the number
# and value is a sequence that contains this number
$map = [];
foreach ($a as $i => $array)
{
foreach ($array as $value)
{
$map[$value][$i] = $i;
}
}
ksort($map);
# extra optimization: drop all values that don't have at least
# MIN_SHARED_VALUES (default = 5) sequences
foreach ($map as $index => $values)
{
if (count($values) < MIN_SHARED_VALUES)
{
unset($map[$index]);
}
else
{
# reindex array keys - we need that later for simple
# "for" loops that should start from "0"
$map[$index] = array_values($map[$index]);
}
}
$file = fopen(FILE_OUTPUT, 'w');
# combinations: generates every group of 5 sequences that share the same $value
foreach ($map as $value => $array)
{
$array_length = count($array);
for ($i = 0; $i < $array_length; $i++)
{
for ($j = $i + 1; $j < $array_length; $j++)
{
for ($k = $j + 1; $k < $array_length; $k++)
{
for ($l = $k + 1; $l < $array_length; $l++)
{
for ($m = $l + 1; $m < $array_length; $m++)
{
# index indicates a group that share a $value together
$index = implode('-', [$array[$i], $array[$j], $array[$k], $array[$l], $array[$m]]);
# instead of CSV export you can use something like
# "$result[$index][] = $value" here, but you need
# tons of RAM to do that
fputcsv($file, [$index, $value]);
}
}
}
}
}
}
fclose($file);
Locally the script generates ~6.3 million combinations and takes ~15s in total; 70% of that time goes to HDD I/O operations.
You are free to play with the output method: run the Linux command sort out.csv -o sorted-out.csv on your CSV file, or create a separate file pointer for each unique $index sequence to dump your results. It's up to you.
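If you go the sort route, the sorted file can be read back one group at a time, because all rows sharing the same $index end up adjacent. This is only a sketch under assumptions: readGroups() is a hypothetical helper name, the file has the two-column layout written by fputcsv above, and it has already been sorted.

```php
<?php
// Hypothetical helper: streams a SORTED "index,value" CSV and yields one
// group per distinct index, so only a single group is held in memory.
function readGroups(string $path): Generator
{
    $file = fopen($path, 'r');
    $current = null;
    $values = [];
    // Explicit delimiter/enclosure/escape arguments avoid relying on defaults.
    while (($row = fgetcsv($file, 0, ',', '"', '\\')) !== false) {
        if (count($row) < 2) {
            continue; // skip blank or malformed lines
        }
        [$index, $value] = $row;
        if ($index !== $current) {
            if ($current !== null) {
                yield $current => $values; // previous group is complete
            }
            $current = $index;
            $values = [];
        }
        $values[] = (int) $value;
    }
    if ($current !== null) {
        yield $current => $values; // flush the last group
    }
    fclose($file);
}
```

Each yielded pair is a "group of 5 sequences" key and the list of values they all share, i.e. the intersection for that group, and it can be processed as soon as it is read.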

Related

combinations without duplicates using php

I need all possible combinations in math sense (without duplicates) where n=30 and k=18
function subcombi($arr, $arr_size, $count)
{
$combi_arr = array();
if ($count > 1) {
for ($i = $count - 1; $i < $arr_size; $i=$i+1) {
$highest_index_elem_arr = array($i => $arr[$i]);
foreach (subcombi($arr, $i, $count - 1) as $subcombi_arr)
{
$combi_arr[] = $subcombi_arr + $highest_index_elem_arr;
}
}
} else {
for ($i = $count - 1; $i < $arr_size; $i=$i+1) {
$combi_arr[] = array($i => $arr[$i]);
}
}
return $combi_arr;
}
function combinations($arr, $count)
{
if ( !(0 <= $count && $count <= count($arr))) {
return false;
}
return $count ? subcombi($arr, count($arr), $count) : array();
}
$numeri="01.02.03.04.05.06.07.08.09.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30";
$numeri_ar=explode(".",$numeri);
$numeri_ar=array_unique($numeri_ar);
for ($combx = 2; $combx < 19; $combx++)
{
$combi_arr = combinations($numeri_ar, $combx);
}
print_r($combi_arr);
It works, but it terminates with an out-of-memory error; of course, the number of combinations is too large.
Now I do not need exactly all the combinations. I need only a few of them.
I'll explain.
I need this work for a statistical study over Italian lotto.
I have the lotto archive in this format saved in $archivio array
...
35.88.86.03.54
70.72.45.18.09
55.49.35.30.43
15.52.49.41.72
74.26.54.77.90
33.14.56.42.11
08.79.41.01.52
82.33.32.83.43
...
A full archive is available here
https://pastebin.com/tut6kFXf
newer extractions are on top.
I tried (unsuccessfully) to modify the function to do this:
For each 18-number combination found by the combinations function, the function should check whether at least 3 of its numbers appear in any one of the first 30 rows of $archivio. If yes, the combination must not be saved in the combination array, because it has no statistical value for my needs. If no, the combination must be saved in the combination array, because it has great statistical value for my needs.
This way the total number of combinations will be no more than a few hundred or a few thousand, so I will avoid the out-of-memory error and have what I need.
The script will surely take a long time, but there should be no out-of-memory error this way.
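The filtering idea described above could be sketched with a generator, so that only one combination exists in memory at a time. This is a sketch under assumptions, not a tested solution for the full archive: combinationsLazy() and hitsAnyRow() are hypothetical helper names, the rows use the dot-separated format shown above, and the run time for n=30, k=18 (tens of millions of combinations) will still be long.

```php
<?php
// Hypothetical helper: yields every k-element combination of $items lazily.
function combinationsLazy(array $items, int $k): Generator
{
    $items = array_values($items);
    $n = count($items);
    if ($k < 1 || $k > $n) {
        return;
    }
    $idx = range(0, $k - 1); // current combination, as positions in $items
    while (true) {
        $combo = [];
        foreach ($idx as $i) {
            $combo[] = $items[$i];
        }
        yield $combo;
        // Find the rightmost position that can still be advanced.
        $pos = $k - 1;
        while ($pos >= 0 && $idx[$pos] === $n - $k + $pos) {
            $pos--;
        }
        if ($pos < 0) {
            return; // last combination reached
        }
        $idx[$pos]++;
        for ($j = $pos + 1; $j < $k; $j++) {
            $idx[$j] = $idx[$j - 1] + 1;
        }
    }
}

// Hypothetical helper: true if any row shares at least $min numbers with $combo.
function hitsAnyRow(array $combo, array $rows, int $min): bool
{
    $lookup = array_flip($combo);
    foreach ($rows as $row) {
        $hits = 0;
        foreach (explode('.', $row) as $number) {
            if (isset($lookup[$number]) && ++$hits >= $min) {
                return true;
            }
        }
    }
    return false;
}

// Tiny demonstration with made-up data (not the real archive).
$rows = ['01.02.03.09.10'];
foreach (combinationsLazy(['01', '02', '03', '04'], 3) as $combo) {
    if (!hitsAnyRow($combo, $rows, 3)) {
        echo implode('.', $combo), PHP_EOL; // keep only the "interesting" ones
    }
}
```

With the real data, $rows would be array_slice($archivio, 0, 30) and the kept combinations could be appended to a file instead of an array.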
Is anyone able to help me with this?
Thank you

Removing enclosed intervals in an array of intervals in PHP

I have an array of intervals sorted by lower bound ($a[$i]['l'] <= $a[$i+1]['l'] for every $i), where key l is the lower bound and key h is the upper bound, and I'd like to remove all rows whose intervals are enclosed by larger intervals.
$a[0] = array('l' => 123, 'h'=>241);
$a[1] = array('l' => 250, 'h'=>360);
$a[2] = array('l' => 280, 'h'=>285);
$a[3] = array('l' => 310, 'h'=>310);
$a[4] = array('l' => 390, 'h'=>400);
So the result I'd like to get is
$a[0] = array('l' => 123, 'h'=>241);
$a[1] = array('l' => 250, 'h'=>360);
$a[2] = array('l' => 390, 'h'=>400);
This is what I attempted
function dup($a){
$c = count($a)-1;
for ($i = $c; $i > 0; $i --){
while ($a[$i]['h'] <= $a[$i-1]['h']){
unset($a[$i]);
}
}
$a = array_values($a);
}
The first answer that comes to mind was given, with different variations, by other contributors: for each interval, loop over every interval looking for a larger, enclosing one. It's simple to understand and to write, and it works for sure.
This is basically O(n^2), which means that for n intervals we do n*n loop iterations. There are some tricks to optimize it:
breaking when we find an enclosing interval in the nested loop, as in user3137702's answer, because it's useless to continue once we find at least one enclosing interval
avoiding comparing an interval with itself in the nested loop, because we know an interval can't be strictly enclosed in itself (not significant)
avoiding looping over already excluded intervals in the nested loop (can have a significant impact)
looping over the intervals (outer loop) in ascending width = (h - l) order, because smaller intervals are more likely to be enclosed in others, and the earlier we eliminate intervals, the more effective the following iterations are (can be significant too, in my opinion)
searching for enclosing intervals (nested loop) in descending width order, because larger intervals are more likely to enclose other intervals (I think it can have a significant impact too)
probably many other things that do not come to mind at the moment
Let me say now that:
optimization does not matter much if we only have a few intervals to compute from time to time, and the currently accepted answer by user3137702 does the trick
to develop a suitable algorithm, it is necessary anyway to study the characteristics of the data we have to deal with: in the case before us, how are the intervals distributed? Are there many enclosed intervals? This can help to choose the most useful tricks from the list above.
For educational purposes, I wondered if we could develop a different algorithm that avoids the n*n order, whose running time deteriorates very quickly as the number of intervals grows.
"Virtual rule" algorithm
I imagined this algorithm, which I called the "virtual rule".
place the starting and ending points of the intervals on a virtual rule
run through the points along the rule in ascending order
during the run, register which intervals are currently open
when an interval starts and ends while another was opened before it and is still open, we can say it is enclosed
so when an interval ends, check whether it was opened after one of the other currently open intervals and whether it closes strictly before that interval. If yes, it is enclosed!
I do not claim this is the best solution. But we can assume it is faster than the basic method because, despite the many tests done during the loop, each point is visited only once: apart from the sort (ksort) and the scan of the currently open intervals, this is of order n.
Code example
I wrote comments to make it as clear as possible.
<?php
function removeEnclosedIntervals_VirtualRule($a, $debug = false)
{
$rule = array();
// place one point on a virtual rule for each low or up bound, referring to the interval's index in $a
// virtual rule has 2 levels because there can be more than one point for a value
foreach($a as $i => $interval)
{
$rule[$interval['l']][] = array('l', $i);
$rule[$interval['h']][] = array('h', $i);
}
// used in the foreach loop
$open = array();
$enclosed = array();
// loop through the points on the ordered virtual rule
ksort($rule);
foreach($rule as $points)
{
// Will register open intervals
// When an interval starts and ends while another was opened before and is still open, it is enclosed
// starts
foreach($points as $point)
if($point[0] == 'l')
$open[$point[1]] = $point[1]; // register it as open
// ends
foreach($points as $point)
{
if($point[0] == 'h')
{
unset($open[$point[1]]); // UNregister it as open
// was it opened after a still open interval ?
foreach($open as $i)
{
if($a[$i]['l'] < $a[$point[1]]['l'])
{
// it is enclosed.
// is it *strictly* enclosed ?
if($a[$i]['h'] > $a[$point[1]]['h'])
{
// so this interval is strictly enclosed
$enclosed[$point[1]] = $point[1];
if($debug)
echo debugPhrase(
$point[1], // $iEnclosed
$a[$point[1]]['l'], // $lEnclosed
$a[$point[1]]['h'], // $hEnclosed
$i, // $iLarger
$a[$i]['l'], // $lLarger
$a[$i]['h'] // $hLarger
);
break;
}
}
}
}
}
}
// obviously
foreach($enclosed as $i)
unset($a[$i]);
return $a;
}
?>
Benchmarking against basic method
It runs tests on randomly generated intervals.
The basic method works without a doubt. Comparing the results of the two methods lets me claim the "VirtualRule" method works because, as far as I tested, it returned the same results.
// * include removeEnclosedIntervals_VirtualRule function *
// arbitrary range for intervals start and end
// Note that it could be interesting to do benchmarking with different MIN and MAX values !
define('MIN', 0);
define('MAX', 500);
// Benchmarking params
define('TEST_MAX_NUMBER', 100000);
define('TEST_BY_STEPS_OF', 100);
// from http://php.net/manual/en/function.microtime.php
// used later for benchmarking purpose
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
function debugPhrase($iEnclosed, $lEnclosed, $hEnclosed, $iLarger, $lLarger, $hLarger)
{
return '('.$iEnclosed.')['.$lEnclosed.' ; '.$hEnclosed.'] is strictly enclosed at least in ('.$iLarger.')['.$lLarger.' ; '.$hLarger.']'.PHP_EOL;
}
// 2 foreach loops solution (based on user3137702's *damn good* work ;) and currently accepted answer)
function removeEnclosedIntervals_Basic($a, $debug = false)
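{
$out = array(); // defined up front so the function returns an array even if every interval is enclosed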
foreach ($a as $i => $valA)
{
$found = false;
foreach ($a as $j => $valB)
{
if (($valA['l'] > $valB['l']) && ($valA['h'] < $valB['h']))
{
$found = true;
if($debug)
echo debugPhrase(
$i, // $iEnclosed
$a[$i]['l'], // $lEnclosed
$a[$i]['h'], // $hEnclosed
$j, // $iLarger
$a[$j]['l'], // $lLarger
$a[$j]['h'] // $hLarger
);
break;
}
}
if (!$found)
{
$out[$i] = $valA;
}
}
return $out;
}
// runs a benchmark with $number intervals
function runTest($number)
{
// Generating a random set of intervals with values between MIN and MAX
$randomSet = array();
for($i=0; $i<$number; $i++)
// avoiding self-closing intervals
$randomSet[] = array(
'l' => ($l = mt_rand(MIN, MAX-2)),
'h' => mt_rand($l+1, MAX)
);
/* running the two methods and comparing results and execution time */
// Basic method
$start = microtime_float();
$Basic_result = removeEnclosedIntervals_Basic($randomSet);
$end = microtime_float();
$Basic_time = $end - $start;
// VirtualRule
$start = microtime_float();
$VirtualRule_result = removeEnclosedIntervals_VirtualRule($randomSet);
$end = microtime_float();
$VirtualRule_time = $end - $start;
// Basic method works for sure.
// If results are the same, comparing execution time. If not, sh*t happened !
if(md5(var_export($Basic_result, true)) == md5(var_export($VirtualRule_result, true)))
echo $number.';'.$Basic_time.';'.$VirtualRule_time.PHP_EOL;
else
{
echo '/;/;/;Work harder, results are not the same ! Cant say anything !'.PHP_EOL;
exit(1); // abort the benchmark
}
}
// CSV header
echo 'Number of intervals;Basic method exec time (s);VirtualRule method exec time (s)'.PHP_EOL;
for($n=TEST_BY_STEPS_OF; $n<TEST_MAX_NUMBER; $n+=TEST_BY_STEPS_OF)
{
runTest($n);
flush();
}
Results (for me)
As I thought, the two methods perform clearly differently.
I ran the tests on a Core i7 computer with PHP5 and on an (old) AMD Quad Core computer with PHP7. There are clear differences in performance between the two versions on my systems, which in principle can be explained by the difference in PHP versions, because the computer that is running PHP5 is much more powerful...
A simplistic approach, maybe not exactly what you want, but should at least point you in the right direction. I can refine it if needed, just a bit busy and didn't want to leave the question unanswered..
$out = [];
foreach ($a as $valA)
{
$found = false;
foreach ($a as $valB)
{
if (($valA['l'] > $valB['l']) && ($valA['h'] < $valB['h']))
{
$found = true;
break;
}
}
if (!$found)
{
$out[] = $valA;
}
}
This is entirely untested, but should end up with only the unique (large) ranges in $out. Overlaps as I mentioned in my comment are unhandled.
The problem was a missing break in the while loop:
function dup($a){
$c = count($a)-1;
for ($i = $c; $i > 0; $i --){
while ($a[$i]['h'] <= $a[$i-1]['h']){
unset($a[$i]);
break; //here
}
}
return array_values($a); // hand the reindexed array back to the caller
}
Here is the code
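function sort_by_low($item1, $item2){
if($item1['l'] == $item2['l'])
return 0;
return ($item1['l'] > $item2['l']) ? 1 : -1; // ascending by lower bound
}
usort($a, 'sort_by_low');
$n = count($a); // cache the size: unset() below changes count($a)
for($i=0; $i<$n; $i++){
if(!isset($a[$i])) continue; // already removed as enclosed
for($j=$i+1; $j<$n; $j++){
if(isset($a[$j]) && $a[$i]['l']<=$a[$j]['l'] && $a[$i]['h']>=$a[$j]['h']){
unset($a[$j]);
}
}
}
$a=array_values($a);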
Here is the working code (Tested)
$result = array();
usort($a, function ($item1, $item2) {
if ($item1['l'] == $item2['l']) return 0;
return $item1['l'] < $item2['l'] ? -1 : 1;
});
foreach ($a as $element) {
$exists = false;
foreach ($result as $r) {
if (($r['l'] < $element['l'] && $r['h'] > $element['h'])) {
$exists = true;
break;
}
}
if (!$exists) {
$result[] = $element;
}
}
$result will contain the desired result
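As an alternative to the nested loops, strictly enclosed intervals can also be dropped in a single pass after sorting, by remembering the largest upper bound seen among intervals with a strictly smaller lower bound. This is a sketch, assuming the same strict definition as above (l strictly greater and h strictly smaller); removeStrictlyEnclosed() is a hypothetical name and the spaceship operator requires PHP 7+.

```php
<?php
// Hypothetical helper: removes intervals strictly enclosed by another one.
function removeStrictlyEnclosed(array $intervals): array
{
    usort($intervals, function ($x, $y) {
        return $x['l'] <=> $y['l']; // ascending by lower bound
    });
    $result = [];
    $maxHighBefore = null; // largest 'h' among intervals with a strictly smaller 'l'
    $groupLow = null;      // lower bound shared by the current run of intervals
    $groupMaxHigh = null;  // largest 'h' inside the current run
    foreach ($intervals as $interval) {
        if ($groupLow !== null && $interval['l'] != $groupLow) {
            // A new lower bound starts: the previous run now lies strictly before.
            $maxHighBefore = $maxHighBefore === null
                ? $groupMaxHigh
                : max($maxHighBefore, $groupMaxHigh);
            $groupMaxHigh = null;
        }
        $groupLow = $interval['l'];
        $groupMaxHigh = $groupMaxHigh === null
            ? $interval['h']
            : max($groupMaxHigh, $interval['h']);
        // Keep the interval unless some earlier interval ends strictly after it.
        if ($maxHighBefore === null || $interval['h'] >= $maxHighBefore) {
            $result[] = $interval;
        }
    }
    return $result;
}
```

The cost is dominated by the sort, so it should scale better than the double loop on large inputs, though I have not benchmarked it.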

How to revert a function in PHP?

I am building a little game and got stuck in developing the leveling system. I created a function that will exponentially increase the experience required for the next level. However, I am not sure how to turn it around so that I can put in the amount of experience a user has gained and get the corresponding level.
PHP function
function experience($level, $curve = 300) {
// Preset value to prevent notices
$a = 0;
// Calculate level cap
for ($x = 1; $x < $level; $x++) {
$a += floor($x+$curve*pow(2, ($x/7)));
}
// Return amount of experience
return floor($a/4);
}
The issue
I am wondering how I can reverse engineer this function in order to return the correct level for a certain amount of experience.
Using the above function, my code would output the following:
Level 1: 0
Level 2: 83
Level 3: 174
Level 4: 276
Level 5: 388
Level 6: 512
Level 7: 650
Level 8: 801
Level 9: 969
Level 10: 1154
What I am looking for is a way to invert this function so that I can input a certain amount and it will return the corresponding level.
An experience of 1000 should return level 9, for example.
Plugging the values into excel and creating a trend line, I got the following equation:
y = 1.17E-09x^3 - 4.93E-06x^2 + 1.19E-02x + 6.43E-02
So your reverse engineered equation would be
function level($xp) {
$a = 1.17e-9;
$b = -4.93e-6;
$c = 0.0119;
$d = 0.0643;
return round($a*pow($xp, 3) + $b*pow($xp,2) + $c * $xp + $d);
}
Results are accurate to within 1 decimal place, but if your $curve changes, you'd need to recalculate the coefficients. I also haven't extended the fit higher than level 10.
Other options include caching the results of the lookup:
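$levelXpAmounts = array();
// the $minLevel / $maxLevel defaults are placeholders; set them to your game's range
function populateLevelArray($curve = 300, $minLevel = 1, $maxLevel = 100) {
global $levelXpAmounts;
$levelXpAmounts[$curve] = array();
for ($level = $minLevel; $level <= $maxLevel; $level++) {
$levelXpAmounts[$curve][$level] = experience($level, $curve);
}
}
//at game load:
populateLevelArray();
Then, your reverse lookup would be
function level($xp, $curve = 300, $minLevel = 1, $maxLevel = 100) {
global $levelXpAmounts;
if (!array_key_exists($curve, $levelXpAmounts))
populateLevelArray($curve, $minLevel, $maxLevel);
for ($level = $minLevel; $level <= $maxLevel; $level++) {
if ($xp < $levelXpAmounts[$curve][$level]) {
return $level - 1;
}
}
return $maxLevel; // $xp is beyond the last cached level
}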
That way, the iteration through all the levels is only done once for each different value of $curve. You can also replace your old experience() function with a (quite likely faster) lookup.
Note: it's been a while since I've written any php, so my syntax may be a little rusty. I apologize in advance for any errors in that regard.
You can do another function called level which uses the experience function to find the level:
function level($experience)
{
for ($level = 1; $level <= 10; $level++) {
if ($experience < experience($level + 1)) { // still below the next level's threshold
return $level;
}
}
}
function experience($level, $curve = 300)
{
$a = 0;
for ($x = 1; $x < $level; $x++) {
$a += floor($x+$curve*pow(2, ($x/7)));
}
return floor($a/4);
}
var_dump(level(1000));
You can clearly work through the math here and find an inverse formula. I am not sure whether it will be a nice and easy formula, so I would suggest an alternative approach which is easy to implement.
Precalculate the results for all the levels you realistically want your players to achieve (I highly doubt that you need more than 200 levels, because based on my estimation you would need tens of billions of exp points).
Store all these levels in an array: $arr = [0, 83, 174, 276, 388, 512, 650, ...];. Now your array is sorted and you need to find the position where your experience value fits.
If you are looking for 400 exp points, you see that it should be inserted after the 5th position, so it is the 5th level. Even a simple loop will suffice, but you can also write a binary search.
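The binary search could look roughly like this. It is a sketch under assumptions: levelForExperience() is a hypothetical name, the thresholds are the ten values listed in the question, index 0 holds the threshold for level 1, and intdiv() requires PHP 7+.

```php
<?php
// Hypothetical helper: returns the highest level whose threshold is <= $xp.
// $thresholds must be sorted ascending, with $thresholds[0] for level 1.
function levelForExperience(array $thresholds, int $xp): int
{
    $lo = 0;
    $hi = count($thresholds) - 1;
    while ($lo < $hi) {
        // Round the midpoint up so the loop always makes progress.
        $mid = intdiv($lo + $hi + 1, 2);
        if ($thresholds[$mid] <= $xp) {
            $lo = $mid;      // the level at $mid is already reached
        } else {
            $hi = $mid - 1;  // the level at $mid is not reached yet
        }
    }
    return $lo + 1; // convert the 0-based index to a 1-based level
}

$thresholds = [0, 83, 174, 276, 388, 512, 650, 801, 969, 1154];
echo levelForExperience($thresholds, 1000), PHP_EOL;
```

For only a few hundred levels the plain loop is probably just as good; the binary search mostly matters if the lookup runs very often.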
This task could be solved in another way: the method of partial sums.
Let's assume you have a class which stores an array of exponential values calculated by this function:
function formula($level, $curve){ return floor($level+$curve*pow(2, ($level/7)));}
$MAX_LEVEL = 90;
function calculateCurve($curve){
global $MAX_LEVEL; // the limit is defined outside the function
$array = [];
for($i = 0; $i < $MAX_LEVEL; $i++) $array[] = formula($i, $curve);
return $array;
}
Now we can calculate the experience needed for a level:
$curve = calculateCurve(300);
function getExperienceForLevel($level, $curve){
$S = 0;
for($i = 0; $i < $level; $i++) $S += $curve[$i];
return $S; // sum of the per-level increments below $level
}
And calculate level for experience:
function getLevelForExperience($exp, $curve){
global $MAX_LEVEL; // the limit is defined outside the function
for($i = 0; $i < $MAX_LEVEL; $i++){
$exp -= $curve[$i];
if($exp < 0) return $i-1;
}
return $MAX_LEVEL;
}
I assume there could be index problems (I didn't test the code), but I suppose the main idea is clearly explained.
Pros:
The code is cleaner: there are no magic numbers or interpolation coefficients.
You can easily change your learning curve.
It is possible to improve the calculating functions to O(1).
Cons:
There is a $curve array to store, or to calculate somewhere.
Also, you could make an even more advanced version of this:
function calculateCurve($curve){
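global $MAX_LEVEL; // the limit is defined outside the function
$array = [];
$exp = 0;
for($i = 0; $i < $MAX_LEVEL; $i++) {
$exp += formula($i, $curve);
$array[] = $exp; // running total up to and including index $i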
}
return $array;
}
Now calculating the experience has O(1) complexity:
function getExperienceForLevel($level, $curve){
global $MAX_LEVEL;
return $curve[min($MAX_LEVEL - 1, $level)]; // the last valid index is $MAX_LEVEL - 1
}
Perhaps not the best way, but it's working.
function level($experience, $curve = 300)
{
$minLevel = 1;
$maxLevel = 10;
for($level = $minLevel; $level <= $maxLevel; $level++)
{
if(experience($level, $curve) <= $experience && $experience < experience($level + 1, $curve))
{
return $level;
}
}
return $maxLevel;
}

Random generator returning endless duplicates

I am trying to create a random string which will be used as a short reference number. I have spent the last couple of days trying to get this to work, but it seems to get to around 32766 records and then it continues with endless duplicates. I need at least 200,000 variations.
The code below is a very simple mockup to explain what happens. The codes should follow the pattern 1a-x1y2z (example), which should give a lot more than 32k results.
I have a feeling it may be related to memory but not sure. Any ideas?
<?php
function createReference() {
$num = rand(1, 9);
$alpha = substr(str_shuffle("abcdefghijklmnopqrstuvwxyz"), 0, 1);
$char = '0123456789abcdefghijklmnopqrstuvwxyz';
$charLength = strlen($char);
$rand = '';
for ($i = 0; $i < 6; $i++) {
$rand .= $char[rand(0, $charLength - 1)];
}
return $num . $alpha . "-" . $rand;
}
$codes = [];
for ($i = 1; $i <= 200000; $i++) {
$code = createReference();
while (in_array($code, $codes) == true) {
echo 'Duplicate: ' . $code . '<br />';
$code = createReference();
}
$codes[] = $code;
echo $i . ": " . $code . "<br />";
}
exit;
?>
UPDATE
So I am beginning to wonder whether this is something to do with our WAMP setup (Bitnami), as our local machine gets to exactly 1024 records before it starts duplicating. By removing 1 character from the string above (5 instead of 6 iterations in the for loop) it gets to exactly 32768 records.
I uploaded the script to our CentOS server and had no duplicates.
What in our environment could cause such behaviour?
The code looks overly complex to me. Let's assume for the moment you really want to create n unique strings, each based on a single random value (rand/mt_rand/something between INT_MIN,INT_MAX).
You can start by decoupling the generation of the random values from the encoding (there seems to be nothing in the code that makes a string dependent on any previous state, except for the uniqueness). Comparing integers is quite a bit faster than comparing arbitrary strings.
mt_rand() called without arguments returns a value between 0 and mt_getrandmax() (2^31 - 1), which gives ~2^31 possible values. You want to pick 200k, let's make it 400k; that's roughly 1/5000 of the value range. It's therefore reasonable to assume everything goes well with the uniqueness... and then check at a later time, adding more values if a collision occurred. Again, this is much faster than checking in_array in each iteration of the loop.
Once you have enough values, you can encode/convert them to whatever format you wish. I don't know whether the <digit><character>-<something> format is mandatory, but I assume it is not, hence base_convert():
<?php
function unqiueRandomValues($n) {
$values = array();
while( count($values) < $n ) {
for($i=count($values);$i<$n; $i++) {
$values[] = mt_rand();
}
$values = array_unique($values);
}
return $values;
}
function createReferences($n) {
return array_map(
function($e) {
return base_convert($e, 10, 36);
},
unqiueRandomValues($n)
);
}
$start = microtime(true);
$references = createReferences(400000);
$end = microtime(true);
echo count($references), ' ', count(array_unique($references)), ' ', $end-$start, ' ', $references[0];
prints e.g. 400000 400000 3.3981630802155 f3plox on my i7-4770. (The $end-$start part is constantly between 3.2 and 3.4)
Using base_convert() there can be strings like li10, which can be quite annoying to decipher if you have to manually type the string.
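If that is a concern, the conversion can be done by hand with an alphabet that leaves out the look-alike characters. This is a sketch, not part of the benchmark above: encodeWithAlphabet() is a hypothetical helper, the default alphabet (no 0, 1, i, l, o) is just one possible choice, and the input is assumed to be a non-negative integer such as a value returned by mt_rand().

```php
<?php
// Hypothetical helper: converts a non-negative integer to a string using a
// custom alphabet, as a replacement for base_convert($e, 10, 36).
function encodeWithAlphabet(int $number, string $alphabet = '23456789abcdefghjkmnpqrstuvwxyz'): string
{
    $base = strlen($alphabet);
    if ($number === 0) {
        return $alphabet[0];
    }
    $out = '';
    while ($number > 0) {
        // Prepend the character for the least significant digit.
        $out = $alphabet[$number % $base] . $out;
        $number = intdiv($number, $base);
    }
    return $out;
}

echo encodeWithAlphabet(mt_rand()), PHP_EOL;
```

Because each integer maps to exactly one string, unique input values stay unique after encoding; the strings just get slightly longer than with base 36.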

PHP - Nest a for loop a variable number of times

I'm a beginner with PHP (and programming in general).
To test what I've learned so far I wrote this code, which prints all the possible combinations of a set number of dice with a certain number of faces (you'll find the code at the end).
What I want to do is dynamically change the number of nested for loops according to the $dicenumber variable. Right now it can only process 3 dice, since the code is:
for ($d1=1; $d1 <= $d1value ; $d1++) {
for ($d2=1; $d2 <= $d2value ; $d2++) {
for ($d3=1; $d3 <= $d3value ; $d3++) {
array_push(${'sum'.($d1+$d2+$d3)}, "$d1"."$d2"."$d3");
}
}
}
But I want to change it so that, for example, if $dicenumber were 2, it would produce something like:
for ($d1=1; $d1 <= $d1value ; $d1++) {
for ($d2=1; $d2 <= $d2value ; $d2++) {
array_push(${'sum'.($d1+$d2)}, "$d1"."$d2");
}
}
I want the code to process for whatever number $dicenumber may be, without limits. Looking around, it seems like I have to add some kind of recursive code, but I don't know how to do that. Any tips? Also, any feedback on what I did wrong in general, would be extremely helpful! thanks!
<?php
//defines the number and type of dice
$dicenumber = 3;
$dtype = 6;
//defines the maximum value of every die
for ($i=1; $i <=$dicenumber ; $i++) {
${'d'.$i.'value'} = $dtype;
}
//defines an array for each possible sum resulting from the roll of the given number of dice.
for ($i=$dicenumber; $i <= ($dtype*$dicenumber) ; $i++) {
${'sum'.$i} = array();
}
//the troublesome piece of code I want to change
for ($d1=1; $d1 <= $d1value ; $d1++) {
for ($d2=1; $d2 <= $d2value ; $d2++) {
for ($d3=1; $d3 <= $d3value ; $d3++) {
array_push(${'sum'.($d1+$d2+$d3)}, "$d1"."$d2"."$d3");
}
}
}
//prints all the possible roll combinations, each line lists combination that share the same sum
for ($i=$dicenumber; $i <= ($dtype*$dicenumber); $i++) {
print join(" ", ${'sum'.$i})."<br />";
}
?>
Here we have a two-function process. The first function, buildArrays, creates arrays in the proper format to feed into the second function, allCombinations. So, for this example with 3 d6's in play, buildArrays will produce an array equivalent to this:
$data = array(
array(1, 2, 3, 4, 5, 6),
array(1, 2, 3, 4, 5, 6),
array(1, 2, 3, 4, 5, 6));
I will warn you that as you increase the number of dice and the number of sides, the number of possible combinations increases exponentially! This means that you could place a very large demand on the server, and both timeout and max memory limits will quickly come into play. The arrays generated could be very, very large and quickly consume more than the max memory limit. That said, here we go:
function buildArrays($dicenumber, $dtype){
for ($i = 0; $i<$dicenumber; $i++){
$tmp = array();
for ($j = 1; $j<=$dtype; $j++){
$tmp[] = $j;
}
$data[$i] = $tmp;
}
return $data;
}
function allCombinations($data){
$result = array(array()); //this is crucial, dark magic.
foreach ($data as $key => $array) {
$new_result = array();
foreach ($result as $old_element){
foreach ($array as $element){
if ($key == 0){
$new_result[] = $element;
} else {
$new_result[] = $old_element.$element;
}
}
}
$result = $new_result; // replace the partial results once per die
}
return $result;
}
//set variables
$dicenumber = 3;
$dtype = 6;
//set_time_limit(0); //You may need to uncomment this for large values.
//call functions
$data = buildArrays($dicenumber, $dtype);
$results = allCombinations($data);
//print out the results
foreach ($results as $result){
echo $result."<br/>";
}
N.B. This answer is a variant of the cartesian product code
If you have learned functions, you can do a recursive call and keep track of which die you're on, moving to the next die on each call to the function (and stop recursing once you've handled $dicenumber dice).
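A recursive version could be sketched like this. It is only an illustration: collectRolls() is a hypothetical name, the rolls are grouped in a single array keyed by their sum instead of the separate $sumN variables from the question, and all dice are assumed to have the same number of faces.

```php
<?php
// Hypothetical helper: each call handles one die and recurses for the rest.
// $bySum is filled by reference: sum => list of roll strings such as "136".
function collectRolls(int $diceLeft, int $faces, array &$bySum, string $roll = '', int $sum = 0): void
{
    if ($diceLeft === 0) {
        // Every die has a value: record the finished roll under its sum.
        $bySum[$sum][] = $roll;
        return;
    }
    for ($value = 1; $value <= $faces; $value++) {
        collectRolls($diceLeft - 1, $faces, $bySum, $roll . $value, $sum + $value);
    }
}

$dicenumber = 3;
$dtype = 6;
$bySum = [];
collectRolls($dicenumber, $dtype, $bySum);
ksort($bySum);
foreach ($bySum as $rolls) {
    print join(" ", $rolls) . "<br />";
}
```

The depth of the recursion equals $dicenumber, so the number of nested loops is no longer fixed in the code; the number of rolls still grows as $dtype to the power of $dicenumber.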
understanding basic recursion
