Constructing an Array for Dataset - Different Start Points?


I have an array of values set up like:
$array[0]['Year', 'Value']
There are 100 or so arrays inside the main array, each with 30 pairs of 'Year' and 'Value'. This worked well for all of my applications until I came across graphing libraries that want their data formatted like:
[x1, y1, y2, y3 ...], [x2, y1, y2, y3 ...], ...
This would be easy if all of my data points started from the same year, but they do not. My arrays all start at different years, and overlap. The problem is that the graphing libraries' format requires a 'null' value as a placeholder for a series that has no Y-axis value at that X-point.
Example: My $array[0] goes from 1900-1930, my $array[1] goes from 1901-1931. So the new array needs to be [1900, $array[0]['Value'], null] (to indicate that $array[1] doesn't have a value there, but is still a series on the graph), then [1901, $array[0]['Value'], $array[1]['Value']], and so on.
I've been drawing out diagrams and trying to wrap my head around creating a series of loops to take my old array and convert it... but I'm at a loss. Any ideas?

First of all you need to define your y points, and for this we need a range. As you are creating your array I would keep two variables, $minYear and $maxYear. For every entry you add to the array, check whether its year is smaller than $minYear or bigger than $maxYear, and if so replace that variable with the new year.
That gives us our min and max; now we need a range. You need to determine how many years lie between each point. To keep things simple I'm going to assume 1 year is one increment on the y axis. However, if there is a maximum number of points you can use on the y axis, you can use the following formula to work out how many years are between each point:
$increment = ($maxYear - $minYear) / $numPointsOnYAxis;
That will give you how many years are between each point on the Y Axis. For simplicity you may wish to round that number using the ceil() function. It means you won't use every point on the Y axis, but it keeps the maths nice and easy.
Now we have our range you can use foreach loops to go through your data and plot the graph.
$space = '';
foreach ($array as $num => $a) {
    echo $space . '[x' . $num;
    // Assumes each series is keyed by year: $year => $value
    foreach ($a as $year => $value) {
        // Work out how many increments up the scale the year needs to be.
        $y = ceil(($year - $minYear) / $increment);
        echo ', y' . $y;
    }
    echo ']';
    $space = ', ';
}
That should echo out what you are after. Please note that I haven't tested this code.
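For the conversion the question describes (one row per year, with null where a series has no value), a minimal sketch might look like the following. It assumes each series is a list of ['Year' => ..., 'Value' => ...] pairs as laid out in the question; the name build_chart_rows() and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: turns a list of series into rows of
// [year, value-of-series-0, value-of-series-1, ...], using null as the
// placeholder where a series has no value for that year.
// Assumes each series is a list of ['Year' => int, 'Value' => mixed] pairs.
function build_chart_rows(array $series): array
{
    // Index every series by year and collect the set of all years seen.
    $byYear = [];
    $years = [];
    foreach ($series as $i => $points) {
        $byYear[$i] = [];
        foreach ($points as $point) {
            $byYear[$i][$point['Year']] = $point['Value'];
            $years[$point['Year']] = true;
        }
    }
    $years = array_keys($years);
    sort($years);

    // One row per year; a missing value becomes null.
    $rows = [];
    foreach ($years as $year) {
        $row = [$year];
        foreach ($byYear as $values) {
            $row[] = $values[$year] ?? null;
        }
        $rows[] = $row;
    }
    return $rows;
}

// Made-up sample: two short series that start one year apart.
$series = [
    [['Year' => 1900, 'Value' => 5], ['Year' => 1901, 'Value' => 6]],
    [['Year' => 1901, 'Value' => 7], ['Year' => 1902, 'Value' => 8]],
];
echo json_encode(build_chart_rows($series)), "\n";
// prints [[1900,5,null],[1901,6,7],[1902,null,8]]
```

Indexing each series by year first keeps the second pass to a simple lookup, so series that start or end at different years need no special handling.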

Related

How to calculate the difference between minutes of a series of time intervals

I need to calculate the difference, for example in minutes, for a series of intervals. There is no limit on how many, because there may be N time records between "tipo_2_pause" and "tipo_3_return".
Unlike "tipo_1_start" and "tipo_4_end", both "tipo_2_pause" and "tipo_3_return" are arrays with N records.
Calculating the difference between "tipo_4_end" and "tipo_1_start" is easy, but I can't find a way to always calculate the differences for the respective pairs of "tipo_2_pause" and "tipo_3_return".
In the example, for the first pair of "tipo_2_pause" and "tipo_3_return" I expect to get the difference, which in this case is 10 minutes. For the second pair it could, by chance, also be 10 minutes, but it could be any length of time. If there is no pair, the result would be 0.
Objective:
Being able to calculate the difference in minutes between each item in the array "tipo_2_pause" that pairs with the array "tipo_3_return".

You're right that the key to getting pairs is to use the length of the shorter array.
Then you just apply a for-loop.
Somewhat verbose (for clarity) first pass:
$numPairs = min(count($tipo2), count($tipo3));
$duration = 0;
for ($i = 0; $i < $numPairs; $i++) {
    $start = $tipo2[$i];
    $end = $tipo3[$i];
    $duration += minutes_between($start, $end); // implementing minutes_between is left as an exercise.
}
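One possible way to fill in minutes_between(), sketched under the assumption that each record is a date/time string that DateTimeImmutable can parse (the question does not show the actual format). The sample data is made up.

```php
<?php
// One possible minutes_between(), assuming both arguments are date/time
// strings that DateTimeImmutable can parse (e.g. "2021-03-01 09:10:00").
function minutes_between(string $start, string $end): int
{
    $startTime = new DateTimeImmutable($start);
    $endTime = new DateTimeImmutable($end);
    // Work on Unix timestamps so that multi-day gaps are handled too.
    return intdiv($endTime->getTimestamp() - $startTime->getTimestamp(), 60);
}

// Made-up sample data: two pauses, but only one has a matching return.
$tipo2 = ['2021-03-01 10:00:00', '2021-03-01 12:00:00'];
$tipo3 = ['2021-03-01 10:10:00'];

$numPairs = min(count($tipo2), count($tipo3));
$duration = 0;
for ($i = 0; $i < $numPairs; $i++) {
    $duration += minutes_between($tipo2[$i], $tipo3[$i]);
}
echo $duration, "\n"; // prints 10
```

Because the loop runs only over complete pairs, an unmatched pause contributes nothing, and no pairs at all gives 0.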

PHP - Optimize finding closest point in an Array

I have created a script which takes a big array of points and then, for each one, finds the closest point in 3D space from a limited array of chosen points. It works great. However, sometimes I get over 2 million points to compare to an array of 256 items, so it is over 530 million calculations! That takes a considerable amount of time and power (given that it will be running comparisons like that a few times a minute).
I have a limited group of 3D coordinates like this:
array (size=XXX)
0 => 10, 20, 30
1 => 200, 20, 13
2 => 36, 215, 150
3 => ...
4 => ...
... // this is limited to max 256 items
Then I have another very large group of, let's say, random 3D coordinates which can vary in size from 2,500 -> ~ 2,000,000+ items. Basically, what I need to do is to iterate through each of those points and find the closest point. To do that I use Euclidean distance:
sqrt((q1-p1)^2 + (q2-p2)^2 + (q3-p3)^2)
This gives me the distance and I compare it to the current closest distance, if it is closer, replace the closest, else continue with next set.
I have been looking on how to change it so I don't have to do so many calculations. I have been looking at Voronoi Diagrams then maybe place the points in that diagram, then see which section it belongs to. However, I have no idea how I can implement such a thing in PHP.
Any idea how I can optimize it?

Just a quick shot from the hip ;-)
You should be able to gain a nice speed-up if you don't compare each point to every other point. Many points can be skipped because they are already too far away if you just look at one of the x/y/z coordinates.
<?php
$coord = array(18, 200, 15);
$points = array(
    array(10, 20, 30),
    array(200, 20, 13),
    array(36, 215, 150)
);
$closestPoint = $closestDistance = false;
foreach ($points as $point) {
    list($x, $y, $z) = $point;
    // Not compared yet, use first point as closest
    if ($closestDistance === false) {
        $closestPoint = $point;
        $closestDistance = distance($x, $y, $z, $coord[0], $coord[1], $coord[2]);
        continue;
    }
    // If distance in any direction (x/y/z) is bigger than closest distance so far: skip point
    if (abs($coord[0] - $x) > $closestDistance) continue;
    if (abs($coord[1] - $y) > $closestDistance) continue;
    if (abs($coord[2] - $z) > $closestDistance) continue;
    $newDistance = distance($x, $y, $z, $coord[0], $coord[1], $coord[2]);
    if ($newDistance < $closestDistance) {
        $closestPoint = $point;
        $closestDistance = $newDistance; // reuse the value computed above
    }
}
var_dump($closestPoint);
function distance($x1, $y1, $z1, $x2, $y2, $z2) {
    return sqrt(pow($x1 - $x2, 2) + pow($y1 - $y2, 2) + pow($z1 - $z2, 2));
}
A working code example can be found at http://sandbox.onlinephpfunctions.com/code/8cfda8e7cb4d69bf66afa83b2c6168956e63b51e
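A possible variation on the same idea, not taken from the answer above: comparing squared distances preserves the ordering, so sqrt() can be skipped entirely, and the running sum can be abandoned as soon as it exceeds the best squared distance so far. The name closest_point() and the $palette parameter are hypothetical.

```php
<?php
// Variation on the early-exit idea (not the original author's code):
// compares squared distances, which preserves ordering and avoids sqrt().
// closest_point() is a hypothetical name; $palette stands for the small
// array of up to 256 reference points.
function closest_point(array $coord, array $palette): ?array
{
    $closest = null;
    $closestSq = INF;
    foreach ($palette as $point) {
        $dx = $coord[0] - $point[0];
        $sq = $dx * $dx;
        if ($sq >= $closestSq) {
            continue; // x alone is already too far away
        }
        $dy = $coord[1] - $point[1];
        $sq += $dy * $dy;
        if ($sq >= $closestSq) {
            continue; // x and y together are already too far away
        }
        $dz = $coord[2] - $point[2];
        $sq += $dz * $dz;
        if ($sq < $closestSq) {
            $closestSq = $sq;
            $closest = $point;
        }
    }
    return $closest; // null when $palette is empty
}

$points = [[10, 20, 30], [200, 20, 13], [36, 215, 150]];
var_dump(closest_point([18, 200, 15], $points)); // the point (36, 215, 150)
```

This still visits every reference point for every input point; it only makes each visit cheaper, so whether it is enough for millions of points would need measuring.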

How can I gradually make an array sparser?

I have a fully-populated array of values, and I would like to arbitrarily remove elements from this array with more removed towards the far end.
For example, given input ( where a . signifies a populated index )
............................................
I would like something like
....... . ... .. . . .. . .
My first thought was to count the elements, then iterate over the array generating a random number somewhere between the current index and the total size of the array, eg:
if ( mt_rand( 0, $total ) > $total - $current_index )
//remove this element
however, as this entails making a random number each time the loop goes round it becomes very arduous.
Is there a better way of doing this?

One easy way is to flip a weighted coin for each entry with coin flips more weighted towards the end. For example, if the array is size n, for each entry you could choose a random number from 0 to n-1 and only keep the value if the index is less than or equal to the random number. (That is, keep each entry with probability 1 - index/total.) This has the nice advantage that if you're going to be compacting your array anyways, and you're using a good enough but efficient random number generator (could be a simple integer hash over a nonce), it's going to be rather fast for memory access.
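The weighted coin flip described above can be sketched as follows. The name thin_out() is made up, and mt_rand() is used only as a convenient stand-in for whatever random number generator is preferred.

```php
<?php
// Sketch of the weighted coin flip: entry $index of $total survives when
// a fresh random number in 0..$total-1 is at least $index, i.e. with
// probability 1 - $index / $total. thin_out() is a hypothetical name.
function thin_out(array $values): array
{
    $values = array_values($values); // assumes a list; re-index to 0..n-1
    $total = count($values);
    $kept = [];
    foreach ($values as $index => $value) {
        if ($index <= mt_rand(0, $total - 1)) {
            $kept[$index] = $value; // keep the original index
        }
    }
    return $kept;
}

// Print a row like the one in the question: '.' for kept, ' ' for removed.
$sparse = thin_out(range(0, 43));
for ($i = 0; $i < 44; $i++) {
    echo isset($sparse[$i]) ? '.' : ' ';
}
echo "\n";
```

Index 0 always survives, because every possible random number is at least 0, while the last index survives only about once in n runs.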
On the other hand, if you're only blanking out a few items and aren't rearranging the array, you can go with some sort of weighted random number generator that more often chooses numbers toward the end of the index range. For example, if you have a random number generator that generates floats in the range [0,1] (closed or open bounds likely not mattering much), consider obtaining such a random float r and squaring it. This will tend to prefer lower values. You can fix this by flipping it around: 1-r^2. Of course, you need this to be in your index range of 0 to n - 1, so take floor(n * (1 - r^2)) and clamp a result of n (which happens when r is 0) down to n-1.
There's practically an infinite number of variations on both of these techniques.
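The second technique, picking an index with floor(n * (1 - r^2)), might be sketched like this. The name weighted_index() and the choice of 20 removals are illustrative assumptions.

```php
<?php
// Sketch of the second idea: pick the index to blank out as
// floor(n * (1 - r^2)) with r uniform in [0, 1], clamped to n - 1.
// Higher indexes are chosen more often. weighted_index() is a made-up name.
function weighted_index(int $n): int
{
    $r = mt_rand() / mt_getrandmax();          // float in [0, 1]
    $index = (int) floor($n * (1 - $r * $r));
    return min($index, $n - 1);                // r = 0 would give n
}

$array = range(0, 43);
$n = count($array);
for ($i = 0; $i < 20; $i++) {
    unset($array[weighted_index($n)]); // a repeated index hits an empty slot
}
```

Since the same index can be drawn more than once, 20 draws remove at most 20 elements, usually fewer.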

This is quite probably not the best/most efficient way to do this, but it is the best I can come up with and it does work.
N.B. the codepad example takes a long time to execute, but this is because of the pretty-print loop I added to the end so you can see it visibly working. If you remove the inner loop, execution time drops to acceptable levels.
<?php
$array = range(0, 99);
for ($i = 0, $count = count($array); $i < $count; $i++) {
    // Get array keys
    $keys = array_keys($array);
    // Get a random number between 0 and count($keys) - 1
    $rand = mt_rand(0, count($keys) - 1);
    // Cut $rand elements off the beginning of the keys
    $keys = array_slice($keys, $rand);
    // Unset a random key from the remaining keys
    unset($array[$keys[array_rand($keys)]]);
}

This method isn't random; it works by defining a function and its inverse. Different functions, with different constant coefficients, will have different distribution characteristics.
The results are very pattern-like, as expected when mapping a continuous function to a discrete structure like an array.
Here's an example using a quadratic function. You could try varying the constant.
demo: http://codepad.org/ojU3s9xM
# as in y = x^2 / 7
function y($x) {
    return $x * $x / 7;
}
# inverse of y(): x = sqrt(7 * y)
function x($y) {
    return sqrt(7 * $y);
}
$theArray = range(0, 100);
$size = count($theArray);
// use func inverse to find the max value we can input to y() without going out of array bounds
$maximumX = x($size);
for ($i = 0; $i < $maximumX; $i++) {
    $index = (int) y($i);
    // unset the index if it still exists, else, the next greatest index
    while (!isset($theArray[$index]) && $index < $size) {
        $index++;
    }
    unset($theArray[$index]);
}
for ($i = 0; $i < $size; $i++) {
    printf("[%-3s]", isset($theArray[$i]) ? $theArray[$i] : '');
}

Slicing / Limiting an Array by Value

Background:
I'm creating a dropdown menu, within a form, for a fun gambling game (students can 'bet' how much that they are right).
Variables:
$balance
Students begin with £3 and play on the £10 table.
$table (there is a:
£10 table, with a range of 1, 2, 3 etc. to 10;
£100 table, with a range of 10, 20, 30 etc. to 100;
£1,000 table, with a range of 100, 200, 300, 400 etc. to 1000.)
I have assigned $table to equal the number of zeros in the table's maximum value,
e.g. $table = 2; for the £100 table.
Limitations;
I only want the drop down menu to offer the highest 12 possible values (this could include the table below -IMP!).
Students are NOT automatically allowed to play on the 'next' table.
resources;
an array of possible values;
$a = array(1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000);
I can write a way to restrict the array by table;
(the maximum key for any table is (9*$table) )//hence why i use the zeroes above (the real game goes to $1 billion!)
$arrayMaxPos = (9*$table);
$maxbyTable = array_slice($a, 0, $arrayMaxPos);
Now I need a way to make sure no VALUE in the $maxbyTable is greater than $balance.
to create a $maxBet array of all allowed bets.
THIS IS WHERE I'M STUCK!
(I would then perform "array_slice($maxBet, -12);" to present only the highest 12 in the dropdown)
EDIT - I'd prefer to NOT have to use array walk because it seems unnecessary when I know where i want the array to end.
SECOND EDIT Apologies I realised that there is a way to mathematically ascertain which KEY maps to the highest possible bid.
It would be as follows
$integerLength = strlen($balance);//number of digits in $balance
$firstDigit = substr($balance, 0, 1);
then with some trickery because of this particular pattern
$maxKeyValue = (($integerLength*9) - 10 + $firstDigit);
So for example;
$balance = 792;
$maxKeyValue = ((3*9) - 10 + 7);// (key[24] = 700)
This only works for this particular problem, though, and does not solve my programming problem.

Optional!
First of all, assuming the same rule applies, you don't need the $a array to know what prices are allowed on table $n
$table = $n; // $n being an integer
for ($i = 1; $i <= 10; $i++) {
    $a[] = $i * pow(10, $n - 1); // table #1 starts at 1, table #2 at 10, etc.
}
Will generate a perfectly valid array (where table #1 is 1-10, table #2 is 10-100 etc).
As for slicing it according to value, use a foreach loop and generate a new array, then stop when you hit the limit.
foreach ($a as $value) {
    if ($value > $balance) { break; }
    $allowedByTable[] = $value;
}
This will leave you with an array $allowedByTable that only has the possible bets which are no higher than the user's current balance.
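Putting the two steps above together with the array_slice($maxBet, -12) idea from the question, a sketch could look like this. The name bet_options() is made up, and it assumes $table is the number of zeros in the table maximum (1 for the £10 table, 2 for the £100 table, and so on), with lower tables' values also allowed in the dropdown.

```php
<?php
// Hypothetical helper combining the generate-and-stop loop with the
// array_slice($maxBet, -12) idea from the question.
// Assumes $table is the number of zeros in the table maximum
// (1 for the £10 table, 2 for the £100 table, ...).
function bet_options(int $balance, int $table, int $limit = 12): array
{
    $bets = [];
    for ($t = 1; $t <= $table; $t++) {
        for ($i = 1; $i <= 10; $i++) {
            $value = $i * (10 ** ($t - 1));
            if ($value > $balance) {
                break 2; // values never decrease from here on, so stop
            }
            $bets[$value] = $value; // keyed by value to drop repeats such as 10
        }
    }
    // Keep only the highest $limit values.
    return array_slice(array_values($bets), -$limit);
}

print_r(bet_options(792, 3)); // 50, 60, ..., 100, 200, ..., 700
```

The same function should be usable on the server side to check that a submitted bet is one of the values that was actually offered.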
Important note
Even though you set what you think is right as options, never trust the user input and always validate the input on the server side. It's fairly trivial for someone to change the value in the combobox using DOM manipulation and bet on sums he's not supposed to have. Always check that the input you're getting is what you expect it to be!

Calculate average without being thrown by strays

I am trying to calculate an average without being thrown off by a small set of far off numbers (ie, 1,2,1,2,3,4,50) the single 50 will throw off the entire average.
If I have a list of numbers like so:
19,20,21,21,22,30,60,60
The average is 31
The median is 30
The mode is 21 & 60 (averaged to 40.5)
But anyone can see that the majority is in the range 19-22 (5 in, 3 out), and if you take the average of just that major range it's 20.6 (quite different from any of the numbers above).
I am thinking that you can get this like so:
c+d-r
Where c is the count of numbers, d is the number of distinct values, and r is the range. You can then apply this to all the possible ranges, and the highest score is the optimal range to take an average from.
For example 19,20,21,21,22 would be 5 numbers, 4 distinct values, and the range is 3 (22 - 19). If you plug this into my equation you get 5+4-3=6
If you applied this to the entire number list it would be 8+6-41=-27
I think this works pretty good, but I have to create a huge loop to test against all possible ranges. In just my small example there are 21 possible ranges:
19-19, 19-20, 19-21, 19-22, 19-30, 19-60, 20-20, 20-21, 20-22, 20-30, 20-60, 21-21, 21-22, 21-30, 21-60, 22-22, 22-30, 22-60, 30-30, 30-60, 60-60
I am wondering if there is a more efficient way to get an average like this.
Or if someone has a better algorithm all together?

You might get some use out of standard deviation here, which basically measures how concentrated the data points are. You can define an outlier as anything more than 1 standard deviation (or whatever other number suits you) from the average, throw them out, and calculate a new average that doesn't include them.
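A minimal sketch of that approach, assuming a non-empty array of numbers and using the population standard deviation. The name mean_without_outliers() and the $k parameter are illustrative, not from any library.

```php
<?php
// Sketch of the standard-deviation approach. mean_without_outliers() is a
// hypothetical name; $k is how many standard deviations count as "too far".
// Assumes a non-empty array of numbers.
function mean_without_outliers(array $numbers, float $k = 1.0): float
{
    $n = count($numbers);
    $mean = array_sum($numbers) / $n;

    // Population standard deviation.
    $sumSquares = 0.0;
    foreach ($numbers as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }
    $stdDev = sqrt($sumSquares / $n);

    // Keep only the values within $k standard deviations of the mean.
    $kept = array_filter($numbers, function ($x) use ($mean, $stdDev, $k) {
        return abs($x - $mean) <= $k * $stdDev;
    });

    // Fall back to the plain mean if everything was thrown out.
    return $kept ? array_sum($kept) / count($kept) : $mean;
}

echo mean_without_outliers([19, 20, 21, 21, 22, 30, 60, 60]), "\n";
```

For the list in the question, one standard deviation keeps 19 through 30 and drops both 60s, so the result is roughly 22.2 rather than the 20.6 of the hand-picked range.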

Here's a pretty naive implementation that you could fix up for your own needs. I purposely kept it pretty verbose. It's based on the five-number-summary often used to figure these things out.
function get_median($arr) {
    sort($arr);
    $n = count($arr);
    $mid = (int) floor($n / 2);
    if ($n % 2 == 0) {
        // Even count: average the two middle values
        return ($arr[$mid - 1] + $arr[$mid]) / 2;
    }
    // Odd count: the single middle value
    return $arr[$mid];
}
function get_five_number_summary($arr) {
    sort($arr);
    $n = count($arr);
    // Lower and upper halves; the middle value is left out when the count is odd
    $half = (int) floor($n / 2);
    $lower_half = array_slice($arr, 0, $half);
    $upper_half = array_slice($arr, $n - $half);
    // Minimum, lower quartile, median, upper quartile, maximum
    return array($arr[0], get_median($lower_half), get_median($arr), get_median($upper_half), $arr[$n - 1]);
}
function find_outliers($arr) {
    $fns = get_five_number_summary($arr);
    $interquartile_range = $fns[3] - $fns[1];
    // Fences at 1 x IQR beyond the quartiles (1.5 x IQR is the more common choice)
    $low = $fns[1] - $interquartile_range;
    $high = $fns[3] + $interquartile_range;
    foreach ($arr as $v) {
        if ($v > $high || $v < $low) {
            echo "$v is an outlier<br>";
        }
    }
}
//$numbers = array( 19,20,21,21,22,30,60 ); // 60 is an outlier
$numbers = array( 1,230,239,331,340,800); // 1 is an outlier, 800 is an outlier
find_outliers($numbers);
Note that this method, albeit much simpler to implement than standard deviation, will not find the two 60 outliers in your example, but it works pretty well. Use the code for whatever, hopefully it's useful!
To see how the algorithm works and how I implemented it, go to: http://www.mathwords.com/o/outlier.htm
This, of course, doesn't calculate the final average, but it's kind of trivial after you run find_outliers() :P

Why don't you use the median? It's not 30, it's 21.5.

You could put the values into an array, sort the array, and then find the median, which is usually a better number than the average anyway because it discounts outliers automatically, giving them no more weight than any other number.

You might sort your numbers, choose your preferred subrange (e.g., the middle 90%), and take the mean of that.
There is no one true answer to your question, because there are always going to be distributions that will give you a funny answer (e.g., consider a biased bi-modal distribution). This is why many statistics are often presented using box-and-whisker diagrams showing mean, median, quartiles, and outliers.
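The "sort, pick a middle subrange, take the mean" idea can be sketched as a trimmed mean. The name trimmed_mean() and the $trim parameter are hypothetical; $trim is the fraction dropped from each end, so 0.05 keeps the middle 90%.

```php
<?php
// Sketch of a trimmed mean: sort, drop a fraction from each end, average the
// rest. trimmed_mean() and $trim are hypothetical; $trim = 0.05 keeps the
// middle 90%. Assumes a non-empty array and 0 <= $trim < 0.5.
function trimmed_mean(array $numbers, float $trim = 0.05): float
{
    sort($numbers);
    $n = count($numbers);
    $drop = (int) floor($n * $trim);           // values dropped per end
    $middle = array_slice($numbers, $drop, $n - 2 * $drop);
    return array_sum($middle) / count($middle);
}

// Dropping a quarter from each end of the question's list leaves 21,21,22,30.
echo trimmed_mean([19, 20, 21, 21, 22, 30, 60, 60], 0.25), "\n"; // prints 23.5
```

With a list as short as the one in the question, a 5% trim drops nothing, which is why the example uses a larger fraction.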
