Given two arrays: $births, containing a list of birth years, and $deaths, containing a list of death years, how can we find the year in which the population was highest?
For example given the following arrays:
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
The year in which the population was highest should be 1996, because 3 people were alive during that year, the highest population count across all the years.
Here's the running math on that:
| Birth | Death | Population |
|-------|-------|------------|
| 1981 | | 1 |
| 1984 | | 2 |
| 1984 | 1984 | 2 |
| 1991 | 1991 | 2 |
| 1996 | | 3 |
Assumptions
We can safely assume that in the year someone is born the population increases by one, and in the year someone dies the population decreases by one. So in this example, 2 people were born in 1984 and 1 person died in 1984, meaning the population increased by 1 that year.
We can also safely assume that the number of deaths will never exceed the number of births and that no death can occur when the population is at 0.
We can also safely assume that the years in both $deaths and $births will never be negative or floating point values (they're always positive integers greater than 0).
We cannot assume that the arrays will be sorted or that there won't be duplicate values, however.
Requirements
We must write a function that returns the year in which the highest population occurred, given these two arrays as input. The function may return 0, false, "", or NULL (any falsey value is acceptable) if the input arrays are empty or if the population was always at 0 throughout. If the highest population occurred in multiple years, the function may return the first year in which the highest population was reached, or any later year in which it occurred.
For example:
$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];
/* The highest population was 3 in 1997, 1998 and 1999; any of those years is a correct answer */
Additionally, including the Big O of the solution would be helpful.
My best attempt at doing this would be the following:
function highestPopulationYear(Array $births, Array $deaths): Int {
    sort($births);
    sort($deaths);
    $nextBirthYear = reset($births);
    $nextDeathYear = reset($deaths);
    $years = [];
    if ($nextBirthYear) {
        $years[] = $nextBirthYear;
    }
    if ($nextDeathYear) {
        $years[] = $nextDeathYear;
    }
    if ($years) {
        $currentYear = max(0, ...$years);
    } else {
        $currentYear = 0;
    }
    $maxYear = $maxPopulation = $currentPopulation = 0;
    while (current($births) !== false || current($deaths) !== false || $years) {
        while ($currentYear === $nextBirthYear) {
            $currentPopulation++;
            $nextBirthYear = next($births);
        }
        while ($currentYear === $nextDeathYear) {
            $currentPopulation--;
            $nextDeathYear = next($deaths);
        }
        if ($currentPopulation >= $maxPopulation) {
            $maxPopulation = $currentPopulation;
            $maxYear = $currentYear;
        }
        $years = [];
        if ($nextBirthYear) {
            $years[] = $nextBirthYear;
        }
        if ($nextDeathYear) {
            $years[] = $nextDeathYear;
        }
        if ($years) {
            $currentYear = min($years);
        } else {
            $currentYear = 0;
        }
    }
    return $maxYear;
}
The algorithm above should run in polynomial time, since it is at worst O(2(n log n) + k), where n is the number of elements to be sorted from each array and k is the number of birth years (we know that k >= y, where y is the number of death years). However, I'm not sure if there is a more efficient solution.
My interest is purely in improving the Big O computational complexity of the existing algorithm. Memory complexity is of no concern, and neither is runtime optimization, at least not as a primary concern; any minor or major runtime optimizations are welcome, but they're not the key factor here.
We can solve this in linear time with bucket sort. Let's say the size of the input is n, and the range of years is m.
O(n): Find the min and max year across births and deaths.
O(m): Create an array of size max_yr - min_yr + 1, with ints initialized to zero. Treat the first cell of the array as min_yr, the next as min_yr + 1, and so on.
O(n): Parse the births array, incrementing the appropriate index of the array: arr[birth_yr - min_yr] += 1
O(n): Ditto for deaths, decrementing the appropriate index of the array: arr[death_yr - min_yr] -= 1
O(m): Parse your array, keeping track of the cumulative sum and its max value.
The year where the largest cumulative sum occurs is your answer.
The running time is O(n+m), and the additional space needed is O(m).
This is a linear solution in n if m is O(n); i.e., if the range of years isn't growing more quickly than the number of births and deaths. This is almost certainly true for real world data.
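For illustration, here is a rough PHP sketch of that counting approach (the function name and exact structure are mine, not from the answer above):

function highestPopulationYearByCounting(array $births, array $deaths): ?int
{
    if (!$births) {
        return null; // no births means the population never rises above 0
    }
    // O(n): find the range of years
    $allYears = array_merge($births, $deaths);
    $minYear = min($allYears);
    $maxYear = max($allYears);
    // O(m): one bucket per year in the range, initialised to zero
    $delta = array_fill($minYear, $maxYear - $minYear + 1, 0);
    // O(n): +1 for every birth, -1 for every death
    foreach ($births as $year) {
        $delta[$year]++;
    }
    foreach ($deaths as $year) {
        $delta[$year]--;
    }
    // O(m): running sum over the buckets, remembering where the maximum occurs
    $population = $maxPopulation = 0;
    $maxPopulationYear = null;
    foreach ($delta as $year => $change) {
        $population += $change;
        if ($population > $maxPopulation) {
            $maxPopulation = $population;
            $maxPopulationYear = $year;
        }
    }
    return $maxPopulationYear;
}

// e.g. highestPopulationYearByCounting([1984, 1981, 1984, 1991, 1996], [1991, 1984]) returns 1996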
I think we can have O(n log n) time with O(1) additional space by first sorting, then maintaining a current population and global maximum as we iterate. I tried to use the current year as a reference point but the logic still seemed a bit tricky so I'm not sure it's completely worked out. Hopefully, it can give an idea of the approach.
JavaScript code (counterexamples/bugs welcome)
function f(births, deaths){
    births.sort((a, b) => a - b);
    deaths.sort((a, b) => a - b);
    console.log(JSON.stringify(births));
    console.log(JSON.stringify(deaths));
    let i = 0;
    let j = 0;
    let year = births[i];
    let curr = 0;
    let max = curr;
    while (deaths[j] < births[0])
        j++;
    while (i < births.length || j < deaths.length){
        while (year == births[i]){
            curr = curr + 1;
            i = i + 1;
        }
        if (j == deaths.length || year < deaths[j]){
            max = Math.max(max, curr);
            console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
        } else if (j < deaths.length && deaths[j] == year){
            while (deaths[j] == year){
                curr = curr - 1;
                j = j + 1;
            }
            max = Math.max(max, curr);
            console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
        }
        if (j < deaths.length && deaths[j] > year && (i == births.length || deaths[j] < births[i])){
            year = deaths[j];
            while (deaths[j] == year){
                curr = curr - 1;
                j = j + 1;
            }
            console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
        }
        year = births[i];
    }
    return max;
}
var input = [
    [[1997, 1997, 1997, 1998, 1999],
     [1998, 1999]],
    [[1, 2, 2, 3, 4],
     [1, 2, 2, 5]],
    [[1984, 1981, 1984, 1991, 1996],
     [1991, 1984, 1997]],
    [[1984, 1981, 1984, 1991, 1996],
     [1991, 1982, 1984, 1997]]
];
for (let [births, deaths] of input)
    console.log(f(births, deaths));
If the year range, m, is on the order of n, we could store the counts for each year in the range and have O(n) time complexity. If we wanted to get fancy, we could also have O(n * log log m) time complexity, by using a Y-fast trie that allows successor lookup in O(log log m) time.
First aggregate the births and deaths into a map (year => population change), sort that by key, and calculate the running population over that.
This should be approximately O(2n + n log n), where n is the number of births.
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
function highestPopulationYear(array $births, array $deaths): ?int
{
    $indexed = [];
    foreach ($births as $birth) {
        $indexed[$birth] = ($indexed[$birth] ?? 0) + 1;
    }
    foreach ($deaths as $death) {
        $indexed[$death] = ($indexed[$death] ?? 0) - 1;
    }
    ksort($indexed);
    $maxYear = null;
    $max = $current = 0;
    foreach ($indexed as $year => $change) {
        $current += $change;
        if ($current >= $max) {
            $max = $current;
            $maxYear = $year;
        }
    }
    return $maxYear;
}
var_dump(highestPopulationYear($births, $deaths));
I solved this problem with a memory requirement of O(n + m) in the worst case (O(n) in the best case) and a time complexity of O(n log n).
Here, n and m are the lengths of the births and deaths arrays.
I don't know PHP or JavaScript. I've implemented it in Java, and the logic is very simple, but I believe my idea can be implemented in those languages as well.
Technique Details:
I used Java's TreeMap structure to store the birth and death records.
A TreeMap keeps its data sorted by key as (key, value) pairs; here the key is the year and the value is the net change in population for that year (births minus deaths, so deaths contribute negative values).
We don't need to insert deaths that happened after the highest birth year.
Once the TreeMap is populated with the birth and death records, the cumulative sums are computed in key order, tracking the maximum population and its year as the scan progresses.
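The original Java code isn't shown, but a rough PHP approximation of the same idea might look like this, with ksort standing in for TreeMap's key ordering (function and variable names are mine):

function yearOfMaxPopulation(array $births, array $deaths): ?int
{
    if (!$births) {
        return null;
    }
    $lastBirthYear = max($births);
    $changes = [];
    foreach ($births as $year) {
        $changes[$year] = ($changes[$year] ?? 0) + 1;
    }
    foreach ($deaths as $year) {
        // deaths after the last birth year can never affect the maximum, so skip them
        if ($year <= $lastBirthYear) {
            $changes[$year] = ($changes[$year] ?? 0) - 1;
        }
    }
    ksort($changes); // keeps the map ordered by year, like a TreeMap
    $population = $maxPopulation = 0;
    $maxYear = null;
    foreach ($changes as $year => $change) {
        $population += $change;
        if ($population > $maxPopulation) {
            $maxPopulation = $population;
            $maxYear = $year;
        }
    }
    return $maxYear;
}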
Sample input & output: 1
Births: [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906]
Deaths: [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915]
Year counts Births: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1911=2, 1914=1, 1919=2}
Year counts Birth-Deaths combined: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1910=-1, 1911=0, 1912=-1, 1913=-1, 1914=-2, 1915=-2, 1919=2}
Yearwise population: {1900=2, 1901=3, 1903=4, 1904=5, 1906=6, 1908=9, 1909=10, 1910=9, 1911=9, 1912=8, 1913=7, 1914=5, 1915=3, 1919=5}
maxPopulation: 10
yearOfMaxPopulation: 1909
Sample input & output: 2
Births: [1906, 1901, 1911, 1902, 1905, 1911, 1902, 1905, 1910, 1912, 1900, 1900, 1904, 1913, 1904]
Deaths: [1917, 1908, 1918, 1915, 1907, 1907, 1917, 1917, 1912, 1913, 1905, 1914]
Year counts Births: {1900=2, 1901=1, 1902=2, 1904=2, 1905=2, 1906=1, 1910=1, 1911=2, 1912=1, 1913=1}
Year counts Birth-Deaths combined: {1900=2, 1901=1, 1902=2, 1904=2, 1905=1, 1906=1, 1907=-2, 1908=-1, 1910=1, 1911=2, 1912=0, 1913=0}
Yearwise population: {1900=2, 1901=3, 1902=5, 1904=7, 1905=8, 1906=9, 1907=7, 1908=6, 1910=7, 1911=9, 1912=9, 1913=9}
maxPopulation: 9
yearOfMaxPopulation: 1906
Here, the deaths that occurred after the last birth year, 1913 (i.e. 1914 and later), were not counted at all, which avoids unnecessary computation.
For a total of 10 million records (births and deaths combined) spread over a range of more than 1000 years, the program took about 3 seconds to finish.
With the same amount of data over a 100-year range, it took 1.3 seconds.
All inputs were generated randomly.
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
$years = array_unique(array_merge($births, $deaths));
sort($years);
$increaseByYear = array_count_values($births);
$decreaseByYear = array_count_values($deaths);
$populationByYear = array();
foreach ($years as $year) {
    $increase = $increaseByYear[$year] ?? 0;
    $decrease = $decreaseByYear[$year] ?? 0;
    $previousPopulationTally = end($populationByYear);
    $populationByYear[$year] = $previousPopulationTally + $increase - $decrease;
}
$maxPopulation = max($populationByYear);
$maxPopulationYears = array_keys($populationByYear, $maxPopulation);
$maxPopulationByYear = array_fill_keys($maxPopulationYears, $maxPopulation);
print_r($maxPopulationByYear);
This will account for the possibility of tied years, as well as for death years that do not correspond to any birth year.
Memory-wise, the idea is to keep only currentPopulation and currentYear. Starting by sorting both the $births and $deaths arrays is a very good idea, because the sort is not that heavy a task, yet it allows cutting some corners:
<?php
$births = [1997, 1999, 2000];
$deaths = [2000, 2001, 2001];

function highestPopulationYear(array $births, array $deaths): Int {
    // sorting takes time, but is necessary for further optimizations
    sort($births);
    sort($deaths);
    // the first death year is the first year where the population might decrease
    // so-far max population
    $currentYearComputing = $deaths[0];
    // the year before the first death has the potential of having the biggest population
    $maxY = $currentYearComputing - 1;
    // calculate the population at the beginning of the year of the first death; start maxPopulation there
    $population = $maxPop = count(array_splice($births, 0, array_search($deaths[0], $births)));
    // instead of empty checks every time: `while(!empty($deaths) || !empty($births))`
    // we can control a target counter. It reserves some memory, but this slot is decreased
    // every iteration.
    $iterations = count($deaths) + count($births);
    while ($iterations > 0) {
        while (current($births) === $currentYearComputing) {
            $population++;
            $iterations--;
            array_shift($births); // decreasing memory usage
        }
        while (current($deaths) === $currentYearComputing) {
            $population--;
            $iterations--;
            array_shift($deaths); // decreasing memory usage
        }
        if ($population > $maxPop) {
            $maxPop = $population;
            $maxY = $currentYearComputing;
        }
        // In $iterations we have the number of birth/death events left. Assuming all
        // of them are births, if this number added to the current population can never
        // exceed the current maximum, we can break the loop and save some time at the
        // cost of some memory.
        if ($maxPop >= ($population + $iterations)) {
            break;
        }
        $currentYearComputing++;
    }
    return $maxY;
}

echo highestPopulationYear($births, $deaths);
I'm not really keen on diving into the Big O analysis, so I'll leave that to you.
Also, if you recompute currentYearComputing on every iteration, you can change the inner loops into if statements and be left with just one loop:
while ($iterations > 0) {
    $changed = false;
    if (current($births) === $currentYearComputing) {
        // ...
        $changed = array_shift($births); // decreasing memory usage
    }
    if (current($deaths) === $currentYearComputing) {
        // ...
        $changed = array_shift($deaths); // decreasing memory usage
    }
    if ($changed === false) {
        $currentYearComputing++;
        continue;
    }
I feel very comfortable with this solution; its Big O complexity is O(n + m).
<?php
function getHighestPopulation($births, $deaths){
    $max = [];
    $currentMax = 0;
    $tmpArray = [];
    foreach($deaths as $key => $death){
        if(!isset($tmpArray[$death])){
            $tmpArray[$death] = 0;
        }
        $tmpArray[$death]--;
    }
    foreach($births as $k => $birth){
        if(!isset($tmpArray[$birth])){
            $tmpArray[$birth] = 0;
        }
        $tmpArray[$birth]++;
        if($tmpArray[$birth] > $currentMax){
            $max = [$birth];
            $currentMax = $tmpArray[$birth];
        } else if ($tmpArray[$birth] == $currentMax) {
            $max[] = $birth;
        }
    }
    return [$currentMax, $max];
}

$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];
print_r (getHighestPopulation($births, $deaths));
?>
Here is one of the simplest and clearest approaches to your problem.
$births = [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906];
$deaths = [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915];
/* for generating 1 million records
for ($i = 1; $i <= 1000000; $i++) {
    $births[] = rand(1900, 2020);
    $deaths[] = rand(1900, 2020);
}
*/
function highestPopulationYear(Array $births, Array $deaths): Int {
    $start_time = microtime(true);
    $population = array_count_values($births);
    $deaths = array_count_values($deaths);
    foreach ($deaths as $year => $death) {
        $population[$year] = ($population[$year] ?? 0) - $death;
    }
    ksort($population, SORT_NUMERIC);
    $cumulativeSum = $maxPopulation = $maxYear = 0;
    foreach ($population as $year => &$number) {
        $cumulativeSum += $number;
        if($maxPopulation < $cumulativeSum) {
            $maxPopulation = $cumulativeSum;
            $maxYear = $year;
        }
    }
    print " Execution time of function = ".((microtime(true) - $start_time)*1000)." milliseconds";
    return $maxYear;
}
print highestPopulationYear($births, $deaths);
output:
1909
complexity:
roughly O(n + m + k log k), where k is the number of distinct years (the ksort dominates)
Related
I am working on an algorithm for sorting teams based on the highest score. Teams are to be generated from a list of players. The conditions for creating a team are:
It should have 6 players.
The collective salary of the 6 players must be less than or equal to 50K.
Teams are to be generated based on the highest collective projection.
What I did to get this result is generate all possible teams, then run checks on them to exclude the teams that have more than 50K salary, and then sort the remainder based on projection. But generating all the possibilities takes a lot of time and sometimes it consumes all the memory. For a list of 160 players it takes around 90 seconds. Here is the code:
$base_array = array();
$query1 = mysqli_query($conn, "SELECT * FROM temp_players ORDER BY projection DESC");
while($row1 = mysqli_fetch_array($query1))
{
    $player = array();
    $mma_id = $row1['mma_player_id'];
    $salary = $row1['salary'];
    $projection = $row1['projection'];
    $wclass = $row1['wclass'];
    array_push($player, $mma_id);
    array_push($player, $salary);
    array_push($player, $projection);
    array_push($player, $wclass);
    array_push($base_array, $player);
}
$result_base_array = array();
$totalsalary = 0;
for($i=0; $i<count($base_array)-5; $i++)
{
    for($j=$i+1; $j<count($base_array)-4; $j++)
    {
        for($k=$j+1; $k<count($base_array)-3; $k++)
        {
            for($l=$k+1; $l<count($base_array)-2; $l++)
            {
                for($m=$l+1; $m<count($base_array)-1; $m++)
                {
                    for($n=$m+1; $n<count($base_array)-0; $n++)
                    {
                        $totalsalary = $base_array[$i][1]+$base_array[$j][1]+$base_array[$k][1]+$base_array[$l][1]+$base_array[$m][1]+$base_array[$n][1];
                        $totalprojection = $base_array[$i][2]+$base_array[$j][2]+$base_array[$k][2]+$base_array[$l][2]+$base_array[$m][2]+$base_array[$n][2];
                        if($totalsalary <= 50000)
                        {
                            array_push($result_base_array,
                                array($base_array[$i], $base_array[$j], $base_array[$k], $base_array[$l], $base_array[$m], $base_array[$n],
                                    $totalprojection, $totalsalary)
                            );
                        }
                    }
                }
            }
        }
    }
}
usort($result_base_array, "cmp");
And the cmp function
function cmp($a, $b) {
    if ($a[6] == $b[6]) {
        return 0;
    }
    return ($a[6] < $b[6]) ? 1 : -1;
}
Is there any way to reduce the time it takes to do this task, or any other workaround for getting the desired number of teams?
Regards
Because the number of elements in the array can be very big (for example, 100 players can generate about 1.2 * 10^9 teams), you can't hold it all in memory. Try saving the resulting array to a file in parts (truncating the array after each save), then use external file sorting.
It will be slow, but at least it will not fail because of memory.
If you only need the top n teams (like the 10 teams with the highest projection), then you should convert the code that generates result_base_array into a generator, so it yields the next team instead of pushing it into an array. Then iterate over this generator; on each iteration, add the new item to a sorted result array and cut the redundant elements, as sketched below.
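A rough sketch of that idea in PHP. The recursive generator, the player array format of ['salary' => ..., 'projection' => ...] and the 50000 cap are illustrative assumptions, not your actual code:

function teamCombinations(array $players, int $size, int $cap, int $start = 0, array $chosen = [], int $salary = 0)
{
    if (count($chosen) === $size) {
        yield ['salary' => $salary, 'members' => $chosen];
        return;
    }
    for ($i = $start; $i < count($players); $i++) {
        if ($salary + $players[$i]['salary'] <= $cap) {
            yield from teamCombinations(
                $players, $size, $cap,
                $i + 1,
                array_merge($chosen, [$players[$i]]),
                $salary + $players[$i]['salary']
            );
        }
    }
}

// Keep only the best $topN teams while iterating, instead of storing every combination.
$topN = 10;
$best = []; // kept sorted ascending by projection
foreach (teamCombinations($players, 6, 50000) as $team) {
    $projection = array_sum(array_column($team['members'], 'projection'));
    $best[] = ['projection' => $projection] + $team;
    usort($best, function ($a, $b) { return $a['projection'] <=> $b['projection']; });
    if (count($best) > $topN) {
        array_shift($best); // cut the redundant (lowest-projection) entry
    }
}

Memory stays bounded at $topN entries, no matter how many combinations the generator produces.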
Depending on whether the salaries are often the cause of exclusion, you could perform tests on this in the other loops as well. If after 4 player selections their summed salaries are already above 50K, there is no use to select the remaining 2 players. This could save you some iterations.
This can be further improved by remembering the lowest 6 salaries in the pack, and then check if after selecting 4 members you would still stay under 50K if you would add the 2 lowest existing salaries. If this is not possible, then again it is of no use to try to add the two remaining players. Of course, this can be done at each stage of the selection (after selecting 1 player, 2 players, ...)
Another related improvement comes into play when you sort your data by ascending salary. If after selecting the 4th player, the above logic brings you to conclude you cannot stay under 50K by adding 2 more players, then there is no use to replace the 4th player with the next one in the data series either: that player would have a greater salary, so it would also yield to a total above 50K. So that means you can backtrack immediately and work on the 3rd player selection.
As others pointed out, the number of potential solutions is enormous. For 160 players and a team size of 6 members, the number of combinations is:
160 . 159 . 158 . 157 . 156 . 155
--------------------------------- = 21 193 254 160
6 . 5 . 4 . 3 . 2
21 billion entries is a stretch for memory, and probably not useful to you either: will you really be interested in the team at the 4 432 456 911th place?
You'll probably be interested in something like the top-10 of those teams (in terms of projection). This you can achieve by keeping a list of 10 best teams, and then, when you get a new team with an acceptable salary, you add it to that list, keeping it sorted (via a binary search), and ejecting the entry with the lowest projection from that top-10.
Here is the code you could use:
$base_array = array();
// Order by salary, ascending, and only select what you need
$query1 = mysqli_query($conn, "
    SELECT mma_player_id, salary, projection, wclass
    FROM temp_players
    ORDER BY salary ASC");
// Specify with option argument that you only need the associative keys:
while($row1 = mysqli_fetch_array($query1, MYSQLI_ASSOC)) {
    // Keep the named keys, it makes interpreting the data easier:
    $base_array[] = $row1;
}

function combinations($base_array, $salary_limit, $team_size) {
    // Get lowest salaries, so we know the least value that still needs to
    // be added when composing a team. This will allow an early exit when
    // the cumulative salary is already too great to stay under the limit.
    $remaining_salary = [];
    foreach ($base_array as $i => $row) {
        if ($i == $team_size) break;
        array_unshift($remaining_salary, $salary_limit);
        $salary_limit -= $row['salary'];
    }
    $result = [];
    $stack = [0];
    $sum_salary = [0];
    $sum_projection = [0];
    $index = 0;
    while (true) {
        $player = $base_array[$stack[$index]];
        if ($sum_salary[$index] + $player['salary'] <= $remaining_salary[$index]) {
            $result[$index] = $player;
            if ($index == $team_size - 1) {
                // Use yield so we don't need to build an enormous result array:
                yield [
                    "total_salary" => $sum_salary[$index] + $player['salary'],
                    "total_projection" => $sum_projection[$index] + $player['projection'],
                    "members" => $result
                ];
            } else {
                $index++;
                $sum_salary[$index] = $sum_salary[$index-1] + $player['salary'];
                $sum_projection[$index] = $sum_projection[$index-1] + $player['projection'];
                $stack[$index] = $stack[$index-1];
            }
        } else {
            $index--;
        }
        while (true) {
            if ($index < 0) {
                return; // all done
            }
            $stack[$index]++;
            if ($stack[$index] <= count($base_array) - $team_size + $index) break;
            $index--;
        }
    }
}

// Helper function to quickly find where to insert a value in an ordered list
function binary_search($needle, $haystack) {
    $high = count($haystack)-1;
    $low = 0;
    while ($high >= $low) {
        $mid = (int)floor(($high + $low) / 2);
        $val = $haystack[$mid];
        if ($needle < $val) {
            $high = $mid - 1;
        } elseif ($needle > $val) {
            $low = $mid + 1;
        } else {
            return $mid;
        }
    }
    return $low;
}

$top_team_count = 10; // set this to the desired size of the output
$top_teams = []; // this will be the output
$top_projections = [];
foreach(combinations($base_array, 50000, 6) as $team) {
    $j = binary_search($team['total_projection'], $top_projections);
    array_splice($top_teams, $j, 0, [$team]);
    array_splice($top_projections, $j, 0, [$team['total_projection']]);
    if (count($top_teams) > $top_team_count) {
        // forget about lowest projection, to keep memory usage low
        array_shift($top_teams);
        array_shift($top_projections);
    }
}
$top_teams = array_reverse($top_teams); // Put highest projection first
print_r($top_teams);
Have a look at the demo on eval.in, which just generates 12 players with random salary and projection data.
Final remarks
Even with the above-mentioned optimisations, doing this for 160 players might still require a lot of iterations. The more often the salaries amount to more than 50K, the better the performance will be. If this never happens, the algorithm cannot escape from having to look at each of the 21 billion combinations. If you knew beforehand that the 50K limit would not play any role, you would of course order the data by descending projection, like you originally did.
Another optimisation could be if you would feed back into the combination function the 10th highest team projection you have so far. The function could then eliminate combinations that would lead to a lower total projection. You could first take the 6 highest player projection values and use this to determine how high a partial team projection can still grow by adding the missing players. It might turn out that this becomes impossible after having selected a few players, and then you can skip some iterations, much like done on the basis of salaries.
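For what it's worth, here is a rough sketch of that projection bound (the function and variable names are mine): pre-sum the highest projections so that, with k team slots still to fill, $bound[k] is the most projection those slots could possibly add.

function projectionBounds(array $base_array, int $team_size): array
{
    $projections = array_column($base_array, 'projection');
    rsort($projections);
    $bound = [0 => 0.0];
    for ($k = 1; $k <= $team_size; $k++) {
        $bound[$k] = $bound[$k - 1] + ($projections[$k - 1] ?? 0);
    }
    return $bound;
}

// Inside the combination generator you could then prune with something like:
//   $slotsLeft = $team_size - $index;
//   if ($sum_projection[$index] + $bound[$slotsLeft] <= $tenth_best_projection) {
//       // backtrack instead of descending further
//   }
// where $tenth_best_projection is fed back from the caller's current top-10 list.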
I have two arrays of matching data: one with the values and the other with the times. I round each time to the nearest 5 minutes, and as a result I get a new array of times that repeat themselves. I group the equal times, count them, sum their respective values from the other array and take an average for that group. Here is the code:
$first_hour = array('11:01', '12:04', '13:00', '15:28', "15:43", "15:53", "15:55", "16:02", "16:05", "16:17", "16:15", "16:21", "16:25", "16:33", "16:35", "16:43", "16:45", "16:56", "16:58", "17:00", "17:04", "17:07", "17:19");
$values = array(12, 23, 5, 90, 12, 23, 45, 56, 12, 15, 43, 48, 54, 62, 52, 41, 74, 54, 84, 75, 96, 69, 36);
$minutes = 5;
$precision = 60 * $minutes;
$time = "00:00";
$average = 0;
//round first array to nearest 5 minutes
$timestamp = strtotime($first_hour);
$data_ora = date("H,i", round($timestamp / $precision) * $precision);
//the hours array will become like this:
$first_hour = array('11:00', '12:05', '13:00', '15:30', "15:45", "15:50", "15:55", "16:00", "16:05", "16:15", "16:15", "16:20", "16:25", "16:35", "16:35", "16:45", "16:45", "17:00", "17:00", "17:00", "17:05", "17:10", "17:20");
if ($first_hour == $time) {
    if ($values[$i] > 0) {
        $indexi = $indexi + 1;
        $total = $total + $values;
        echo "</br> total = ".$total;
        $i++;
    } else {
        $time = date("H,i", round($timestamp / $precision) * $precision);
        //echo "time has changed";
        $average = $total / $index;
        $dataora = date("Y,m,d", round($timestamp)) . "," . $time;
        //fill in the vectors
        $v[1] = $average;
        $v[2] = $dataora;
        echo "</br> average = " . $v[1];
        $indexi = 0;
        $totali = 0;
    }
}
The output is like this:
"total = 2996
total = 3325
total = 3656
total = 3996
average = 333
total = 329
total = 652
total = 976
total = 1304
total = 1630
total = 1961
total = 2297
total = 2629
average = 328.625
total = 332
total = 660
total = 997
total = 1320
total = 1646
total = 1967
An average is calculated for each group, except for the last group of values. How can I calculate its average? For the last group, execution enters the first condition, but there is never a change of hour afterwards to make it fall into the else branch where the average is calculated.
I'm almost certain the sample code you have provided doesn't match your real code, so I'm not going to try and fix that, but I will try and explain how you might fix the problem you claim to have in your real code.
Presumably you have a loop of some sort that is iterating over the values in the $values array. I also assume that some of the values in that array are less than or equal to zero, and those are the points at which you display the average.
Your problem then is that your loop reaches the end of the array without having a chance to display the average for the last set of values.
One easy solution is to add a zero to the end of the array if there isn't already one there. That way the last value the loop encounters will always be a zero, which will match the condition needed to display an average.
If that is not feasible, you could make your loop not stop when it reaches the end of the array, so if it's a while loop, it essentially becomes:
while (true) ...
Then you change your first condition to this:
if ($i < count($values) && $values[$i] > 0) {
so it only matches if you haven't gone past the end of the array. Otherwise it falls through to displaying the average.
And at the end of the average code, you add another check to see if you're past the end of the array and break out of the loop.
if ($i >= count($values)) break;
These changes alone won't fix the sample code you've provided, but it might help if your real code matches the functionality you've described in the question.
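A rough sketch of the loop shape being described, under the same assumption that non-positive values mark the end of a group (the variable names are illustrative, not from the original code):

$i = 0;
$total = 0;
$count = 0;
while (true) {
    if ($i < count($values) && $values[$i] > 0) {
        // still inside the current group: keep accumulating
        $total += $values[$i];
        $count++;
        $i++;
    } else {
        // group ended, or we ran past the end of the array: emit the average
        if ($count > 0) {
            echo "average = " . ($total / $count) . "\n";
        }
        $total = 0;
        $count = 0;
        if ($i >= count($values)) {
            break; // past the end of the array - the last group's average has now been printed
        }
        $i++; // skip the delimiter value and continue with the next group
    }
}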
I'm making a game in PHP (don't ask lol), and the player has a location which is an integer. There's a travel page and this basically shows a 5x5 tiled map. Each tile is a different part of the player's universe. By clicking on it he can travel there.
Just to give you an idea of the integers behind the map:
11, 12, 13, 14, 15
21, 22, 23, 24, 25
31, 32, 33, 34, 35
41, 42, 43, 44, 45
51, 52, 53, 54, 55
Let's say the player starts at 33 (the middle) and I want to charge him different rates depending on how far he travels. So, for example, 1 tile in any direction is 100 credits, 2 tiles is 200 and so on.
So what I came up with is this. $ol represents the player's current location and $nl is where they are travelling to...
if($ol-11==$nl || $ol-10==$nl || $ol-9==$nl || $ol+1==$nl || $ol+11==$nl || $ol+10==$nl || $ol+9==$nl || $ol-1==$nl || $ol-11==$nl ){
    echo "cost 100 credits!";
}
else if($ol-22==$nl || $ol-21==$nl || $ol-20==$nl || $ol-19==$nl || $ol-18==$nl || $ol-8==$nl || $ol+2==$nl || $ol+12==$nl || $ol+22==$nl
        || $ol+21==$nl || $ol+20==$nl || $ol+19==$nl || $ol+18==$nl || $ol+8==$nl || $ol-2==$nl || $ol-12==$nl ){
    echo "cost 200 credits!";
}
That's the code for 1 and 2 tile travel. As you can see it's a lengthy statement.
I basically worked out a pattern for the grid I'd set up. For example, travelling up 1 tile would always be -10 of the current tile.
Before I type out any more ridiculously long if statements, is there a neater or more efficient way to do this?
I would use a different method: since the first digit defines the row and the second digit the column, I would split the number into these two digits and use them to determine how many rows and how many columns are being travelled.
So for any position:
$row = floor($tile_value / 10);
$column = $tile_value % 10;
With this it is easy to calculate distances.
Edit: A small example to measure absolute distances:
$row_org = floor($tile_org_value / 10);
$column_org = $tile_org_value % 10;
$row_new = floor($tile_new_value / 10);
$column_new = $tile_new_value % 10;
$row_diff = $row_new - $row_org;
$col_diff = $column_new - $column_org;
$distance = sqrt(pow($row_diff, 2) + pow($col_diff, 2));
As in my comment above, you cannot measure distance in units, since not all points can be reached in a straight line through points.
You need to consider these points to be points (x, y coordinates) on a graph. Then you can get the distance between any 2 points using Pythagoras.
For example, if we consider your top row as being the coordinates (1,1) (1,2) and so on, if the person travels from (1,1) to (4,3), the distance travelled is the square root of 3 (4-1) squared plus 2 (3-1) squared, i.e. sqrt(9+4) = sqrt(13)
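A small sketch combining the digit-split idea from the earlier answer with Pythagoras, assuming the tiles keep their two-digit row/column codes (the function name is mine):

function travelDistance(int $ol, int $nl): float {
    $rowDiff = intdiv($nl, 10) - intdiv($ol, 10);
    $colDiff = ($nl % 10) - ($ol % 10);
    return sqrt($rowDiff ** 2 + $colDiff ** 2);
}

// e.g. travelDistance(11, 43) gives sqrt(3*3 + 2*2) = sqrt(13), matching the example above

Note that this is straight-line distance; if a diagonal move should cost the same as a straight move (1 tile in any direction = 100 credits), the maximum of the row and column differences may be a better fit.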
I would probably try an array for coordinates. This will allow you to set the initial coordinates. You can then pass new coordinates to the function which will move the position and calculate the cost.
<?php
$array = array( );

//populate the array with 0's
for( $i = 1; $i <= 5; $i++ ) {
    for( $j = 1; $j <= 5; $j++ ) {
        $array[$i][$j] = 0;
    }
}

//set beginning position
$array[3][3] = 1;

function newPosition( $array, $newX, $newY ) {
    $oldX = 0;
    $oldY = 0;
    //locate current position
    foreach($array as $key=>$subArray) {
        foreach($subArray as $subKey=>$val) {
            if($val === 1) {
                $oldX = $key;
                $oldY = $subKey;
            }
        }
    }
    //delete old position
    $array[$oldX][$oldY] = 0;
    //set new position
    $array[$newX][$newY] = 1;
    //Calculate x and y difference
    $xTravel = abs($oldX - $newX);
    $yTravel = abs($oldY - $newY);
    //Add x and y difference
    $totalTravel = $xTravel + $yTravel;
    //Calculate the cost
    $totalCost = $totalTravel * 100;
    echo "cost $totalCost credits!\n";
    return $array;
}

$array = newPosition( $array, 5, 2 );
$array = newPosition( $array, 1, 5 );
$array = newPosition( $array, 1, 5 );
$array = newPosition( $array, 3, 3 );
Output
cost 300 credits!
cost 700 credits!
cost 0 credits!
cost 400 credits!
See the demo
Your code seems legit. You could order the conditions so that the most used ones are first.
Writing a routine to display data on a horizontal axis (using PHP gd2, but that's not the point here).
The axis runs from $min to $max and displays a diamond at $result; such an image will be around 300px wide and 30px high, like this:
[example axis image - source: testwolke.de]
In the example above, $min=0, $max=3, $result=0.6.
Now, I need to calculate a scale and labels that make sense, in the above example e.g. dotted lines at 0 .25 .50 .75 1 1.25 ... up to 3, with number-labels at 0 1 2 3.
If $min=-200 and $max=600, dotted lines should be at -200 -150 -100 -50 0 50 100 ... up to 600, with number-labels at -200 -100 0 100 ... up to 600.
With $min=.02 and $max=5.80, dotted lines at .02 .5 1 1.5 2 2.5 ... 5.5 5.8 and numbers at .02 1 2 3 4 5 5.8.
I tried explicitly telling the function where to put dotted lines and numbers by arrays, but hey, it's the computer who's supposed to work, not me, right?!
So, how to calculate???
An algorithm (example values $min=-186 and $max=+153 as limits):
1. Take these two limits $min, $max and mark them if you wish.
2. Calculate the difference between $max and $min: $diff = $max - $min
   153 - (-186) = 339
3. Calculate the base-10 logarithm of the difference: $base10 = log($diff, 10) = 2.5302
4. Round down: $power = floor($base10) = 2. This power of ten is your base unit.
5. To calculate $step, compute:
   $base_unit = 10^$power = 100;
   $step = $base_unit / 2; (if you want 2 ticks per $base_unit)
6. Check whether $min is divisible by $step; if not, take the nearest multiple above it (in the case of $step = 50 that is $loop_start = -150).
7. for ($i = $loop_start; $i <= $max; $i += $step) { ... } // the values of $i are your ticks
I tested it in Excel and it gives quite nice results. You may want to extend it, for example (in point 5) by calculating $step directly from $diff, say $step = $diff / 4, rounded so that $base_unit is divisible by $step; this avoids situations where you get only four ticks with $step = 25 between 101 and 201, but 39 ticks with $step = 25 between 0 and 999.
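A compact PHP sketch of the algorithm above (the function name and defaults are mine, not from the answer):

function axisTicks(float $min, float $max, int $ticksPerBaseUnit = 2): array {
    $diff = $max - $min;                     // e.g. 153 - (-186) = 339
    $power = floor(log10($diff));            // e.g. floor(2.5302) = 2
    $baseUnit = 10 ** $power;                // e.g. 100
    $step = $baseUnit / $ticksPerBaseUnit;   // e.g. 50
    $loopStart = ceil($min / $step) * $step; // first tick at or above $min, e.g. -150
    $ticks = [];
    for ($i = $loopStart; $i <= $max; $i += $step) {
        $ticks[] = $i;
    }
    return $ticks;
}

// e.g. axisTicks(-186, 153) => [-150, -100, -50, 0, 50, 100, 150]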
ACM Algorithm 463 provides three simple functions to produce good axis scales with outputs xminp, xmaxp and dist for the minimum and maximum values on the scale and the distance between tick marks on the scale, given a request for n intervals that include the data points xmin and xmax:
Scale1() gives a linear scale with approximately n intervals and dist being an integer power of 10 times 1, 2 or 5.
Scale2() gives a linear scale with exactly n intervals (the gap between xminp and xmaxp tends to be larger than the gap produced by Scale1()).
Scale3() gives a logarithmic scale.
The original 1973 paper is online here, which provides more explanation than the code linked to above.
The code is in Fortran but it is just a set of arithmetical calculations so it is very straightforward to interpret and convert into other languages. I haven't written any PHP myself, but it looks a lot like C so you might want to start by running the code through f2c which should give you something close to runnable in PHP.
There are more complicated functions that give prettier scales (e.g. the ones in gnuplot), but Scale1() would likely do the job for you with minimal code.
(This answer builds on my answer to a previous question Graph axis calibration in C++)
(EDIT -- I've found an implementation of Scale1() that I did in Perl):
use strict;
use POSIX qw(log10);

sub scale1 ($$$) {
    # from TOMS 463
    # returns a suitable scale ($xMinp, $xMaxp, $dist), when called with
    # the minimum and maximum x values, and an approximate number of intervals
    # to divide into. $dist is the size of each interval that results.
    # @vInt is an array of acceptable values for $dist.
    # @sqr is an array of geometric means of adjacent values of @vInt, which
    # is used as break points to determine which @vInt value to use.
    #
    my ($xMin, $xMax, $n) = @_;
    my @vInt = (1, 2, 5, 10);
    my @sqr = (1.414214, 3.162278, 7.071068);
    if ($xMin > $xMax) {
        my ($tmp) = $xMin;
        $xMin = $xMax;
        $xMax = $tmp;
    }
    my ($del) = 0.0002; # accounts for computer round-off
    my ($fn) = $n;
    # find approximate interval size $a
    my ($a) = ($xMax - $xMin) / $fn;
    my ($al) = log10($a);
    my ($nal) = int($al);
    if ($a < 1) {
        $nal = $nal - 1;
    }
    # $a is scaled into a variable named $b, between 1 and 10
    my ($b) = $a / 10**$nal;
    # the closest permissible value for $b is found
    my ($i);
    for ($i = 0; $i < scalar(@sqr); $i++) {
        last if ($b < $sqr[$i]);
    }
    # the interval size is computed
    my $dist = $vInt[$i] * 10**$nal;
    my $fm1 = $xMin / $dist;
    my $m1 = int($fm1);
    $m1-- if ($fm1 < 0);
    $m1++ if (abs(($m1 + 1.0) - $fm1) < $del);
    # the new minimum and maximum limits are found
    my $xMinp = $dist * $m1;
    my $fm2 = $xMax / $dist;
    my $m2 = int($fm2) + 1;
    $m2-- if ($fm2 < -1);
    $m2-- if (abs($fm2 + 1 - $m2) < $del);
    my $xMaxp = $dist * $m2;
    # adjust limits to account for round-off if necessary
    $xMinp = $xMin if ($xMinp > $xMin);
    $xMaxp = $xMax if ($xMaxp < $xMax);
    return ($xMinp, $xMaxp, $dist);
}

sub scale1_Test {
    my @par = (-3.1, 11.1, 5,
               5.2, 10.1, 5,
               -12000, -100, 9);
    print "xMin\txMax\tn\txMinp\txMaxp,dist\n";
    for (my $i = 0; $i < scalar(@par) / 3; $i++) {
        my ($xMinp, $xMaxp, $dist) = scale1($par[3*$i+0],
            $par[3*$i+1], $par[3*$i+2]);
        print "$par[3*$i+0]\t$par[3*$i+1]\t$par[3*$i+2]\t$xMinp\t$xMaxp,$dist\n";
    }
}
I know that this isn't exactly what you are looking for, but hopefully it will get you started in the right direction.
$min = -200;
$max = 600;
$difference = $max - $min;
$labels = 10;
$picture_width = 300;
/* Get units per label */
$difference_between = $difference / ($labels - 1);
$width_between = $picture_width / $labels;
/* Make the label array */
$label_arr = array();
$label_arr[] = array('label' => $min, 'x_pos' => 0);
/* Loop through the number of labels */
for($i = 1, $l = $labels; $i < $l; $i++) {
    $label = $min + ($difference_between * $i);
    $label_arr[] = array('label' => $label, 'x_pos' => $width_between * $i);
}
A quick example would be something along the lines of $increment = ($max - $min) / $scale, where you can tweak $scale to be the variable by which the increment scales. Since you divide by it, the increment changes proportionately as your max and min values change. After that you would have a function like:
$end = false;
$last_value = $min;
while ($end == false) {
    $breakpoint = $last_value + $increment; // that's your current breakpoint
    if ($breakpoint > $max) {
        $end = true;
    }
    $last_value = $breakpoint;
}
At least that's the concept... Let me know if you have trouble with it.
Below is a function (from a previous question that went unanswered) that creates an array of $n values whose sum is equal to $max.
function randomDistinctPartition($n, $max) {
    $partition = array();
    for ($i = 1; $i < $n; $i++) {
        $maxSingleNumber = $max - $n;
        $partition[] = $number = rand(1, $maxSingleNumber);
        $max -= $number;
    }
    $partition[] = $max;
    return $partition;
}
For example: if I set $n = 4 and $max = 30, then I should get something like the following:
array(5, 7, 10, 8);
However, this function does not take duplicates and zeros into account. What I would like - and have been trying to accomplish - is to generate an array of unique numbers that add up to my predetermined variable $max: no duplicate numbers, and no zeros or negative integers.
Ok, this problem actually revolves around linear sequences. With a minimum value of 1 consider the sequence:
f(n) = 1 + 2 + ... + n - 1 + n
The sum of such a sequence is equal to:
f(n) = n * (n + 1) / 2
so for n = 4, as an example, the sum is 10. That means if you're selecting 4 different numbers the minimum total with no zeroes and no negatives is 10. Now go in reverse: if you have a total of 10 and 4 numbers then there is only one combination of (1,2,3,4).
So first you need to check if your total is at least as high as this lower bound. If it is less there is no combination. If it is equal, there is precisely one combination. If it is higher it gets more complicated.
Now imagine your constraints are a total of 12 with 4 numbers. We've established that f(4) = 10. But what if the first (lowest) number is 2?
2 + 3 + 4 + 5 = 14
So the first number can't be higher than 1. You know your first number. Now you generate a sequence of 3 numbers with a total of 11 (being 12 - 1).
1 + 2 + 3 = 6
2 + 3 + 4 = 9
3 + 4 + 5 = 12
The second number has to be 2 because it can't be one. It can't be 3 because the minimum sum of three numbers starting with 3 is 12 and we have to add to 11.
Now we find two numbers that add up to 9 (12 - 1 - 2) with 3 being the lowest possible.
3 + 4 = 7
4 + 5 = 9
The third number can be 3 or 4. With the third number found the last is fixed. The two possible combinations are:
1, 2, 3, 6
1, 2, 4, 5
You can turn this into a general algorithm. Consider this recursive implementation:
$all = all_sequences(14, 4);

echo "\nAll sequences:\n\n";
foreach ($all as $arr) {
    echo implode(', ', $arr) . "\n";
}

function all_sequences($total, $num, $start = 1) {
    if ($num == 1) {
        return array($total);
    }
    $max = lowest_maximum($start, $num);
    $limit = (int)(($total - $max) / $num) + $start;
    $ret = array();
    if ($num == 2) {
        for ($i = $start; $i <= $limit; $i++) {
            $ret[] = array($i, $total - $i);
        }
    } else {
        for ($i = $start; $i <= $limit; $i++) {
            $sub = all_sequences($total - $i, $num - 1, $i + 1);
            foreach ($sub as $arr) {
                array_unshift($arr, $i);
                $ret[] = $arr;
            }
        }
    }
    return $ret;
}

function lowest_maximum($start, $num) {
    return sum_linear($num) + ($start - 1) * $num;
}

function sum_linear($num) {
    return ($num + 1) * $num / 2;
}
Output:
All sequences:
1, 2, 3, 8
1, 2, 4, 7
1, 2, 5, 6
1, 3, 4, 6
2, 3, 4, 5
One implementation of this would be to get all the sequences and select one at random. This has the advantage of equally weighting all possible combinations, which may or may not be useful or necessary to what you're doing.
That will become unwieldy with large totals or large numbers of elements, in which case the above algorithm can be modified to return a random element in the range from $start to $limit instead of every value.
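For example, assuming the all_sequences() function above, one random distinct partition of 30 into 4 parts could be picked like this:

$candidates = all_sequences(30, 4);
$partition = $candidates[array_rand($candidates)];
print_r($partition); // e.g. [1, 2, 4, 23] - every valid combination is equally likely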
I would use the 'area under triangle' formula... like cletus (!?).
I'm really going to have to start paying more attention to things...
Anyway, I think this solution is pretty elegant now: it applies the desired minimum spacing between all elements evenly, scales the gaps (the distribution) evenly to maintain the original sum, and does the job non-recursively (except for the sort):
Given an array a() of random numbers of length n:
Generate a sort index s() and work on the sorted intervals a(s(0))-a(s(1)), a(s(1))-a(s(2)), etc.
Increase each interval by the desired minimum separation size, e.g. 1 (this necessarily warps their 'randomness').
Decrease each interval by a factor calculated to restore the series sum to what it was without the added spacing.
If we add 1 to each element of a series we increase the series sum by 1 * len; adding 1 to each of the series intervals increases the sum by len*(len+1)/2 // (? Pascal's triangle)
Draft code:
$series;                          // the input sequence, of length $length
$seriesum = sum($series);         // its sum
$minsepa = 1;                     // minimum separation
$sorti = sort_index_of($series);  // sorted index - PHP has a function for this?
$sepsum = $minsepa * ($length * ($length + 1)) / 2;
// sum of the extra separation
$unsepfactor100 = ($seriesum * 100) / ($seriesum + $sepsum);
// scale factor for the original separation to maintain the sum
// (*100 ~ for integer arithmetic)
$px = $series[$sorti[0]];         // the loop needs the value of the previous element
for ($x = 1; $x < $length; $x++) {
    $tx = $series[$sorti[$x]];            // value of the element to adjust
    $series[$sorti[$x]] = ($minsepa * $x) // adjust relative to the previous one
        + $px
        + (($tx - $px) * $unsepfactor100) / 100;
    $px = $tx;                            // store for the next iteration
}
All intervals are reduced by a constant (non-random-warping) factor.
The separation can be set to values other than one.
Implementations need to be carefully tweaked (I usually test and 'calibrate') to accommodate rounding errors; probably scale everything up by ~15 and then back down afterwards. The intervals should survive if done right.
After the sort index is generated, shuffle the order of indexes that point to duplicate values, to avoid runs in the sequence where values collided (or just shuffle the final output if order never mattered).
Shuffle indexes of dupes:
for ($x = 1; $x < $len; $x++) {
    if ($series[$srt[$x]] == $series[$srt[$x - 1]]) {
        if (rand(0, 1)) {
            $sw = $srt[$x];
            $srt[$x] = $srt[$x - 1];
            $srt[$x - 1] = $sw;
        }
    }
}
A kind of minimal disturbance can be applied to a 'random sequence' by parting duplicates only by the minimum required, rather than moving them by more than the minimum - the 'random' amount that the question sought. The code here separates every element by the minimum separation, whether duplicate or not, which should be fairly even-handed, but is maybe overdone. The code could be modified to separate only the duplicates, by looking through series[sorti[n]] for them and accumulating sepsum as += minsep * (len - n) for each duplicate. Then the adjustment loop just has to test again for a duplicate before applying the adjustment.
The code here separates every element by the min separation, whether duplicate or not, that should be kindof evenhanded, but overdone maybe. The code could be modified to only separate the dupes by looking through the series(sorti(n0:n1..len)) for them and calculating sepsum as +=minsep*(len-n) for each dupe. Then the adjustment loop just has to test again for dupe before applying adjustment.