Creating Arrays of test data in PHP - php

Firstly, I'm still a beginner to PHP so my terminology may be a bit wrong - please let me know and I'll amend the question.
Task:
I have a function which I'm looking to test to see how long it takes to run at large scale. I need to pass it data in the following format:
$data = [
[ 'A', 'B', 'C', 'D' ],
[ 'C', 'B' ],
[ 'C', 'B' ],
];
As you can see, the number of items in an array can vary - although they are drawn from an overall set (range of integers or letters).
For my testing purposes, I'd like to be able to change the number of items in each nested array.
I also need to be able to change how many arrays are created.
Example tests I'd like to perform
e.g.
Run one test with a small number of arrays, but a large amount of data within each.
Run a second test with a large number of arrays, but a small amount of data in each.
A third with huge numbers of items and arrays.
The story so far
I was Googling and know I could use range() to create an array that counts sequentially (or with a certain step). But I have to set the upper and lower bounds for each array.
I figure I could use a do.. while loop to add X number of arrays within $data, but I'm not sure how I can vary the amount of data within each array.
For the function to work, I need there to be either a letter or integer repeated. In other words: I couldn't have the first array count from 1-10, the next 11-21. It's as if all the data is drawn from the pool of integers 1-10,000,000.
Bonus points if the data can be randomized in order in each array.
Really appreciate any guidance and pointers on what to use / research - I'm sure this is a totally n00b question.
Many thanks in advance.

Generate a random range:
range(mt_rand(0, 100), mt_rand(101, 1000))
Generate an array of letters from a range (65 = A, 90 = Z):
array_map('chr', range(65, 90))
Generate a random order:
$data = range(..);
shuffle($data);
Take a random slice of an array:
$data = range(..);
$data = array_slice($data, mt_rand(0, count($data) - 1), mt_rand(1, count($data)));
Generate arrays of random length:
for ($i = 0, $length = mt_rand(0, 100); $i < $length; $i++) {
$data[] = ..;
}
You can nest two of those to generate randomly long arrays of randomly long arrays.
Now combine all these techniques as needed to spit out the kind of test data you want.
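As a rough sketch of how those pieces might combine, the function below builds a configurable number of arrays, each holding a random number of distinct integers drawn from one shared pool. The name makeTestData and its parameters are hypothetical, and it assumes that items within one array should be unique (as in the question's example) and that $minItems is not larger than $maxItems.

```php
<?php
// Hypothetical helper: $numArrays arrays, each with between $minItems and
// $maxItems distinct integers drawn from the shared pool 1..$poolSize.
function makeTestData(int $numArrays, int $minItems, int $maxItems, int $poolSize): array
{
    // An array cannot hold more distinct values than the pool contains.
    $maxItems = min($maxItems, $poolSize);
    $minItems = min($minItems, $maxItems);

    $data = [];
    for ($i = 0; $i < $numArrays; $i++) {
        $length = mt_rand($minItems, $maxItems);
        $items = [];
        // Using the values as keys discards duplicates without building the whole pool.
        while (count($items) < $length) {
            $items[mt_rand(1, $poolSize)] = true;
        }
        $row = array_keys($items);
        shuffle($row); // randomize the order within the array
        $data[] = $row;
    }
    return $data;
}

// Few arrays with lots of data, then many arrays with little data.
$wide = makeTestData(5, 1000, 2000, 10000000);
$tall = makeTestData(10000, 2, 5, 10000000);
```

Drawing random keys instead of calling range() on the whole pool keeps memory use proportional to the generated data, which matters when the pool is as large as 1 to 10,000,000.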

Related

Getting every combination of X numbers given Y numbers?

I've come to a mathematical problem for which I can't program the logic.
Let me explain it with an example:
Let's say I have 4 holes and 3 marbles, the holes are in order and my marbles are A,B and C and also in order.
I need to get every possible ORDERED combination:
ABC4
AB3C
A2BC
1ABC
This is very simple, but what if the number of holes changes? Let's say now I have 5 holes.
ABC45
AB3C5
A2BC5
1ABC5
AB34C
A2B4C
1AB4C
A23BC
1A3BC
12ABC
Now let's say we have 5 holes and 4 marbles.
ABCD5
ABC4D
AB3CD
A2BCD
1ABCD
And this can be any number of holes and any number of marbles.
The number of combinations is given by:
$combinations = factorial($number_of_holes) / (factorial($number_of_marbles) * factorial($number_of_holes - $number_of_marbles));
(Here it is the factorial function in case you need it)
function factorial($number) {
if ($number < 2) {
return 1;
} else {
return ($number * factorial($number-1));
}
}
What I need and can't figure out how to program, is a function or a loop or something, that returns an array with the position of the holes, given X numbers of holes and Y number of marbles.
For the first example it would be: [[4],[3],[2],[1]], for the second: [[4,5],[3,5],[2,5],[1,5],[3,4],[2,4],[1,4],[2,3],[1,3],[1,2]], for the third: [[5],[4],[3],[2],[1]].
It doesn't have to be returned in order, I just need all the elements.
As you can see, another approach is the complementary (or inverse) one: the solution is every combination of X free holes out of Y holes. So if I have 10 holes and 5 marbles, there are 5 free holes, and the array returned would be every combination of 5 that can be formed from (1,2,3,4,5,6,7,8,9,10). There are 252 such combinations, and what I need is all 252 of them.
Examples for the 2nd approach:
Given an array=[1,2,3,4], return every combination for sets of 2 and 3.
Sets of 2
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Sets of 3
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
What I need is the logic to do this, I'm trying to do it in PHP, but I just can't figure out how to do it.
The function would receive the array and the set size and would return the array of sets:
function getCombinations($array, $setsize) {
    // magic code which I can't figure out
    return $sets; // the array of sets
}
I hope this is clear enough and someone can help me, I've been stuck for several days now, but it seems to be just too much for me to handle by myself.
This post, PHP algorithm to generate all combinations of a specific size from a single set, is for all possible combinations, repeating the elements, where order doesn't matter. It's a good lead and I did read it, but it doesn't solve my problem; it's very different. I need them without repeating the elements and ordered as explained.
Let's say I already have a set of [3,4] in my array: I don't want [4,3] as another set.
Here's a recursive solution in PHP:
function getCombinations($array, $setsize) {
    if ($setsize == 0) {
        return [[]];
    }
    // not enough elements left to fill a set of this size:
    if (count($array) < $setsize) {
        return [];
    }
    // generate combinations including the first element by generating combinations for
    // the remainder of the array with one less element and prepending the first element:
    $sets = getCombinations(array_slice($array, 1), $setsize - 1);
    foreach ($sets as &$combo) {
        array_unshift($combo, $array[0]);
    }
    unset($combo); // break the reference left over from the loop
    // generate combinations not including the first element and add them to the list:
    if (count($array) > $setsize) {
        $sets = array_merge($sets, getCombinations(array_slice($array, 1), $setsize));
    }
    return $sets;
}
// test:
print_r(getCombinations([1, 2, 3, 4], 3));
Algorithm works like this:
If setsize is 0 then you return a single, empty combination
Otherwise, generate all combinations that include the first element, by recursively generating all combinations off the array excluding the first element with setsize - 1 elements, and then prepending the first element to each of them.
Then, if the array size is greater than setsize (meaning including the first element is not compulsory), generate all the combinations for the rest of the list and add them to the ones we generated in the second step.
So basically at each step you need to consider whether an element will be included or excluded in the combination, and merge together the set of combinations representing both choices.
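To tie the include/exclude recursion back to the holes-and-marbles question, here is a small self-contained sketch. The function name combinationsOf is hypothetical; it follows the same idea as the steps above and is then used to list the free-hole positions for 5 holes and 3 marbles.

```php
<?php
// Hypothetical helper: every combination of $size elements from $items,
// built by either including or excluding the first element at each step.
function combinationsOf(array $items, int $size): array
{
    if ($size === 0) {
        return [[]]; // exactly one combination: the empty one
    }
    if (count($items) < $size) {
        return []; // not enough elements left
    }
    $first = $items[0];
    $rest = array_slice($items, 1);

    // Combinations that include the first element.
    $result = [];
    foreach (combinationsOf($rest, $size - 1) as $combo) {
        $result[] = array_merge([$first], $combo);
    }
    // Combinations that exclude it.
    return array_merge($result, combinationsOf($rest, $size));
}

$holes = 5;
$marbles = 3;
// Positions of the free holes: choose (holes - marbles) positions out of 1..holes.
$freeHoles = combinationsOf(range(1, $holes), $holes - $marbles);
echo count($freeHoles), "\n"; // 10
```

The count matches the factorial formula from the question: 5! / (3! * 2!) = 10.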

Random integer with conditions

I have a PHP script where I have an array of integers, let's say $forbidden.
I want to get a random integer from 1 to 400 that is not in $forbidden.
Of course, I don't want a loop that keeps calling rand until it gives a working result. I'd like something more efficient.
How do you do this?
Place all forbidden numbers in an array, and use array_diff from range(1,400). You'll get an array of allowed numbers, pick a random one with array_rand().
<?php
$forbidden = array(2, 3, 6, 8);
$complete = range(1,10);
$allowed = array_diff($complete, $forbidden);
echo $allowed[array_rand($allowed)];
This way you're removing the excluded numbers from the selection set, and nullifying the need for a loop :)
Produce an array of the allowed numbers. Find out how many numbers are in this array. Select one of them randomly.
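A minimal sketch of the same idea wrapped in a function and applied to the 1 to 400 range from the question. The name randomAllowed is hypothetical, and returning null when every number is forbidden is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: a random integer in [$min, $max] that is not in $forbidden,
// or null if no such integer exists.
function randomAllowed(int $min, int $max, array $forbidden): ?int
{
    // array_diff keeps the original keys, so pick by key with array_rand.
    $allowed = array_diff(range($min, $max), $forbidden);
    if (count($allowed) === 0) {
        return null;
    }
    return $allowed[array_rand($allowed)];
}

$forbidden = array(2, 3, 6, 8);
echo randomAllowed(1, 400, $forbidden), "\n";
```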

What is the best algorithm to see if my number is in an array of ranges?

I have a two-dimensional array in PHP containing ranges, for example:
From.........To
---------------
125..........3957
4000.........5500
5217628......52198281
52272128.....52273151
523030528....523229183
and so on
It is a very long list. Now I want to see if a number given by the user is in one of the ranges.
For example, the numbers 130, 4200 and 52272933 are in my ranges, but the numbers 1 and 5600 are not.
Of course I can walk through all the indexes and see if my number is bigger than the first item and smaller than the second. But is there a faster algorithm, or a more efficient way of doing it using a PHP function?
added later
It is sorted. The numbers are actually created with ip2long() and cover all the IPs of a country.
I just wrote a code for it:
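$ips[1] = array(2, 20, 100);  // range starts
$ips[2] = array(10, 30, 200); // range ends
$n = 11; // input ip
$count = count($ips[1]); // number of ranges (count($ips) would be 2)
$found = false;
for ($i = 0; $i < $count; $i++) {
    if ($n < $ips[1][$i]) {
        break; // starts are sorted, so no later range can contain $n
    }
    if ($n <= $ips[2][$i]) {
        $found = true;
        break;
    }
}
echo $found ? "$i found" : "not found";
In this situation the numbers 2, 8, 22 and 200 are in range, but the numbers 1, 11 and 300 are not.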
Put the ranges in a flat array, sorted from lower to higher, like this:
a[0] = 125
a[1] = 3957
a[2] = 4000
a[3] = 5500
a[4] = 5217628
a[5] = 52198281
a[6] = 52272128
a[7] = 52273151
a[8] = 523030528
a[9] = 523229183
Then do a binary search to determine at what index of this array the number in question should be inserted. If the insertion index is even then the number is not in any sub-range. If the insertion index is odd, then the number falls inside one of the ranges.
Examples:
n = 20 inserts at index 0 ==> not in a range
n = 126 inserts at index 1 ==> within a range
n = 523030529 inserts at index 9 ==> within a range
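A possible PHP sketch of this insertion-index test follows. The function name isInRanges is hypothetical, and it assumes the flat array is sorted, the ranges do not overlap or touch, and both endpoints count as inside the range. Because a number equal to a lower bound gets an even insertion index, that case is checked explicitly.

```php
<?php
// Hypothetical helper. $bounds is the flat sorted array
// [from0, to0, from1, to1, ...] described above.
function isInRanges(array $bounds, int $n): bool
{
    // Binary search for the insertion index: the number of elements < $n.
    $lo = 0;
    $hi = count($bounds);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($bounds[$mid] < $n) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    if ($lo % 2 === 1) {
        return true; // odd index: from < $n <= to
    }
    // Even index: inside only if $n is exactly a lower bound.
    return $lo < count($bounds) && $bounds[$lo] === $n;
}

$a = array(125, 3957, 4000, 5500, 5217628, 52198281,
           52272128, 52273151, 523030528, 523229183);
var_dump(isInRanges($a, 4200)); // bool(true)
var_dump(isInRanges($a, 5600)); // bool(false)
```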
You can speed things up by implementing a binary search algorithm. Thus, you don't have to look at every range.
Then you can use in_array to check if the number is in the array.
I'm not sure if I got you right, do your arrays really look like this:
array(125, 126, 127, ..., 3957);
If so, what's the point? Why not just have?
array(125, 3957);
That contains all the information necessary.
The example you give suggests that the numbers may be large and the space sparse by comparison.
At that point, you don't have very many options. If the array is sorted, binary search is about all there is. If the array is not sorted, you're down to plain, old CS101 linear search.
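The general-purpose data structure for this problem is an interval tree, which also copes with overlapping ranges. For sorted, non-overlapping ranges like these it offers no asymptotic advantage over binary search: both take O(log n) per lookup.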
I am assuming that the ranges do not overlap.
If that is the case, you can maintain a map data structure that is keyed on the lower value of the range.
Now all you have to do (given the number N) is find the largest key in the map that is not greater than N (using binary search, logarithmic complexity) and then check whether N is less than or equal to the upper value of that range.
Basically, it is a binary search (logarithmic) on the constructed map.
From a pragmatic point of view, a linear search may very well turn out to be the fastest lookup method. Think of page faults and hard disk seek time here.
If your array is large enough (whatever "enough" actually means), it may be wise to stuff your IPs in a SQL database and let the database figure out how to efficiently compute SELECT ID FROM ip_numbers WHERE x BETWEEN start AND end;.

I have two unordered integer arrays, and i need to know how many integers these arrays have in common

I'm working in a LAMP environment, so PHP is the language; failing that, I can use Python.
As the title says, I have two unordered integer arrays.
$array_A = array(13, 4, 59, 38, 9, 69, 72, 93, 1, 3, 5);
$array_B = array(29, 72, 21, 3, 6);
I want to know how many integers these arrays have in common; in the example, as you can see, the result is 2. I'm not interested in which integers are in common, like (72, 3).
I need a faster method than taking every element of array B and checking whether it's in array A (O(n×m)).
Arrays can be sorted through asort or with sql ordering (they came from a sql result).
An idea that came to me is to create a 'vector' for every array, where each integer present sets its position to 1 and integers not present leave a 0.
So, for array A (starting at pos 1)
(1, 0, 1, 1, 1, 0, 0, 0, 1, 0, ...)
Same for array B
(0, 0, 1, 0, 0, 1, ...)
And then compare these two vectors in a single loop. The problem is that this way the vector length is about 400k.
Depending on your data (size) you might want to use array_intersect_key() instead of array_intersect(). Apparently the implementation of array_intersect (testing php 5.3) does not use any optimization/caching/whatsoever but loops through the array and compares the values one by one for each element in array A. The hashtable lookup is incredibly faster than that.
<?php
function timefn($fn) {
static $timer = array();
if ( is_null($fn) ) {
return $timer;
}
$x = range(1, 120000);
$y = range(2, 100000);
foreach($y as $k=>$v) { if (0===$k%3) unset($y[$k]); }
$s = microtime(true);
$fn($x, $y);
$e = microtime(true);
$timer[$fn] = ($timer[$fn] ?? 0) + ($e - $s); // accumulate elapsed time per function
}
function fnIntersect($x, $y) {
$z = count(array_intersect($x,$y));
}
function fnFlip($x, $y) {
$x = array_flip($x);
$y = array_flip($y);
$z = count(array_intersect_key($x, $y));
}
for ($i=0; $i<3; $i++) {
timefn( 'fnIntersect' );
timefn( 'fnFlip' );
}
print_r(timefn(null));
prints
Array
(
[fnIntersect] => 11.271192073822
[fnFlip] => 0.54442691802979
)
which means the array_flip/intersect_key method is ~20 times faster on my notebook.
(as usual: this is an ad hoc test. If you spot an error, tell me ...I'm expecting that ;-) )
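Applied to the two arrays from the question, the flip-and-intersect approach might look like the sketch below. The function name countCommon is hypothetical. Note that array_flip collapses duplicate values, so each distinct integer is counted once.

```php
<?php
// Hypothetical helper: number of distinct integers present in both arrays.
function countCommon(array $a, array $b): int
{
    // After array_flip the integers are keys, so the intersection
    // is done with hash lookups instead of value-by-value comparison.
    return count(array_intersect_key(array_flip($a), array_flip($b)));
}

$array_A = array(13, 4, 59, 38, 9, 69, 72, 93, 1, 3, 5);
$array_B = array(29, 72, 21, 3, 6);
echo countCommon($array_A, $array_B), "\n"; // 2
```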
I don't know a great deal about PHP so you may get a more specific answer from others, but I'd like to present a more language-agnostic approach.
By checking every element in A against every element in B, it is indeed O(n²) [I'll assume the arrays are of identical length here to simplify the equations but the same reasoning will hold for arrays of differing lengths].
If you were to sort the data in both arrays, you could reduce the time complexity to O(n log n) or similar, depending on the algorithm chosen.
But you need to keep in mind that the complexity only really becomes important for larger data sets. If those two arrays you gave were typical of the size, I would say don't sort it, just use the "compare everything with everything" method - sorting won't give you enough of an advantage over that. Arrays of 50 elements would still only give you 2,500 iterations (whether that's acceptable to PHP, I don't know, it would certainly be water off a duck's back for C and other compiled languages).
And before anyone jumps in and states that you should plan for larger data sets just in case, that's YAGNI, as unnecessary as premature optimization. You may never need it in which case you've wasted time that would have been better spent elsewhere. The time to implement that would be when it became a problem (that's my opinion of course, others may disagree).
If the data sets really are large enough to make the O(n²) unworkable, I think sorting then walking through the arrays in parallel is probably your best bet.
One other possibility is if the range of numbers is not too big - then your proposed solution of a vector of booleans is quite workable since that would be O(n), walking both arrays to populate the vector followed by comparisons of fixed locations within the two vectors. But I'm assuming your range is too large or you wouldn't have already mentioned the 400K requirement. But again, the size of the data sets will dictate whether or not that's worth doing.
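A minimal PHP sketch of the "sort, then walk both arrays in parallel" idea mentioned above. The function name countCommonSorted is hypothetical, and it assumes neither array contains duplicate values.

```php
<?php
// Hypothetical helper: sort both arrays, then advance two cursors in step.
function countCommonSorted(array $a, array $b): int
{
    sort($a);
    sort($b);
    $i = 0;
    $j = 0;
    $common = 0;
    $na = count($a);
    $nb = count($b);
    while ($i < $na && $j < $nb) {
        if ($a[$i] < $b[$j]) {
            $i++; // value in $a is too small to match, move on
        } elseif ($a[$i] > $b[$j]) {
            $j++; // value in $b is too small to match, move on
        } else {
            $common++; // equal values: one common integer
            $i++;
            $j++;
        }
    }
    return $common;
}

echo countCommonSorted(
    array(13, 4, 59, 38, 9, 69, 72, 93, 1, 3, 5),
    array(29, 72, 21, 3, 6)
), "\n"; // 2
```

The sorting dominates at O(n log n); the walk itself touches each element at most once. If the arrays already arrive sorted from SQL, the two sort() calls can be dropped.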
The simplest way would be:
count(array_intersect($array_A, $array_B));
if I understand what you're after.
Should be fast.
If both arrays came from SQL, could you not write an SQL query with an inner join on the 2 sets of data to get your result?
You want the array_intersect() function. From there you can count the result. Don't worry about speed until you know you have a problem. The built-in functions execute much faster than anything you'll be able to write in PHP.
I have written a PHP extension that provides functions for efficient set operations like union, intersection, binary search, etc. Internal data layout is an ordinary int32_t array stored in a PHP string. Operations are based on merge algorithms.
Example:
// Create two intarrays
$a = intarray_create_from_array(array(1, 2, 3));
$b = intarray_create_from_array(array(3, 4, 5));
// Get a union of them
$u = intarray_union($a, $b);
// Dump to screen
intarray_dump($u);
It's available here: https://github.com/tuner/intarray

Collect Lowest Numbers Algorithm

I'm looking for an algorithm (or PHP code, I suppose) to end up with the 10 lowest numbers from a group of numbers. I was thinking of making a ten item array, checking to see if the current number is lower than one of the numbers in the array, and if so, finding the highest number in the array and replacing it with the current number.
However, I'm planning on finding the lowest 10 numbers from thousands, and was thinking there might be a faster way to do it. I plan on implementing this in PHP, so any native PHP functions are usable.
Sort the array and use the ten first/last entries.
Honestly: sorting an array with a thousand entries costs less time than it takes you to blink.
What you're looking for is called a selection algorithm. The Wikipedia page on the subject has a few subsections in the selecting k smallest or largest elements section. When the list is large enough, you can beat the time required for the naive "sort the whole list and choose the first 10" algorithm.
Naive approach is to just sort the input. It's likely fast enough, so just try it and profile it before doing anything more complicated.
Potentially faster approach: Linearly search the input, but keep the output array sorted to make it easier to determine if the next input belongs in the array or not. Pseudocode:
output[0-9] = input[0-9];
sort(output);
for i=10..n-1
if input[i] < output[9]
insert(input[i])
where insert(x) will find the right spot (binary search) and do the appropriate shifting.
But seriously, just try the naive approach first.
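For reference, the pseudocode above might translate to PHP roughly as follows. The function name lowestK is hypothetical; it keeps a sorted buffer of the k smallest values seen so far and uses a binary search to place each new candidate.

```php
<?php
// Hypothetical helper: the $k lowest values of $input, in ascending order.
function lowestK(array $input, int $k = 10): array
{
    if ($k <= 0) {
        return [];
    }
    $input = array_values($input);
    $output = array_slice($input, 0, $k);
    sort($output);
    $n = count($input);
    for ($i = $k; $i < $n; $i++) {
        $x = $input[$i];
        if ($x >= $output[$k - 1]) {
            continue; // not lower than the current k-th lowest
        }
        // Binary search for the first position whose value is greater than $x.
        $lo = 0;
        $hi = $k - 1;
        while ($lo < $hi) {
            $mid = intdiv($lo + $hi, 2);
            if ($output[$mid] <= $x) {
                $lo = $mid + 1;
            } else {
                $hi = $mid;
            }
        }
        array_splice($output, $lo, 0, [$x]); // insert, shifting the tail right
        array_pop($output);                  // drop the largest to keep $k items
    }
    return $output;
}

print_r(lowestK(array(5, 3, 9, 1, 7, 2, 8), 3)); // 1, 2, 3
```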
Where are you getting this group of numbers?
If your list of numbers is already in an array you could simply do a sort(), and then an array_slice() to get the first 10.
It doesn't matter much for a small array, but as it gets larger a fast and easy way to increase processing speed is to take advantage of array key indexing, which for 1 million rows takes about 40% of the time. Note that duplicate values collapse into a single key with this approach. Example:
// sorting array values
$numbers = array();
for($i = 0; $i < 1000000; ++$i)
{
$numbers[$i] = rand(1, 999999);
}
$start = microtime(true);
sort($numbers);
$res = array_slice($numbers, 0, 10, true);
echo microtime(true) - $start . "\n";
// 2.6612658500671
print_r($res);
unset($numbers, $res, $start);
// sorting array keys
$numbers = array();
for($i = 0; $i < 1000000; ++$i)
{
$numbers[rand(1, 999999)] = $i;
}
$start = microtime(true);
ksort($numbers);
$res = array_keys(array_slice($numbers, 0, 10, true));
echo microtime(true) - $start . "\n";
// 0.9651210308075
print_r($res);
But if the array data is from a database the fastest is probably to just sort it there:
SELECT number_column FROM table_with_numbers ORDER BY number_column LIMIT 10
Create a sorted set (TreeSet in Java, I don't know about PHP), and add the first 10 numbers. Now iterate over the rest of the numbers: add each new one, then remove the biggest number from the set.
This algorithm is O(n) if n >> 10.
I would use a heap with 10 elements and the highest number at the root of the tree. Then start at the beginning of the list of numbers:
If the heap has fewer than 10 elements: add the number to the heap
Otherwise, if the number is smaller than the highest number in the heap, remove the highest number in the heap, and then add the current number to the heap
Otherwise, ignore it.
You will end up with the 10 lowest numbers in the heap. If you are using an array as the heap data structure, then you can just use the array directly.
(alternatively: you can slice out the first 10 elements, and heapify them instead of using the first step above, which will be slightly faster).
However, as other people have noted, for 1000 elements, just sort the list and take the first 10 elements.
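In PHP the heap variant can be sketched with the SPL class SplMaxHeap, which keeps the highest value at the root. The function name lowestKWithHeap is hypothetical.

```php
<?php
// Hypothetical helper: the $k lowest values, in ascending order,
// tracked with a max-heap of at most $k elements.
function lowestKWithHeap(iterable $numbers, int $k = 10): array
{
    if ($k <= 0) {
        return [];
    }
    $heap = new SplMaxHeap();
    foreach ($numbers as $x) {
        if (count($heap) < $k) {
            $heap->insert($x);      // heap not full yet
        } elseif ($x < $heap->top()) {
            $heap->extract();       // remove the current highest
            $heap->insert($x);      // and keep the lower number instead
        }
        // otherwise ignore $x
    }
    // Iterating an SplMaxHeap extracts its elements from highest to lowest.
    $result = [];
    foreach ($heap as $value) {
        $result[] = $value;
    }
    return array_reverse($result);
}

print_r(lowestKWithHeap(array(5, 3, 9, 1, 7, 2, 8), 3)); // 1, 2, 3
```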
