Given two arrays, $births containing a list of birth years indicating when someone was born and $deaths containing a list of death years indicating when someone died, how can we find the year on which the population was highest?
For example given the following arrays:
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
The year on which the population was highest should be 1996, because 3 people were alive during that year, which was the highest population count of all those years.
Here's the running math on that:
| Birth | Death | Population |
|-------|-------|------------|
| 1981 | | 1 |
| 1984 | | 2 |
| 1984 | 1984 | 2 |
| 1991 | 1991 | 2 |
| 1996 | | 3 |
Assumptions
We can safely assume that the year on which someone is born the population can increase by one and the year on which someone died the population can decrease by one. So in this example, 2 people were born on 1984 and 1 person died on 1984, meaning the population increased by 1 on that year.
We can also safely assume that the number of deaths will never exceed the number of births and that no death can occur when the population is at 0.
We can also safely assume that the years in both $deaths and $births will never be negative or floating point values (they're always positive integers greater than 0).
We cannot assume that the arrays will be sorted or that there won't be duplicate values, however.
Requirements
We must write a function to return the year on which the highest population occurred, given these two arrays as input. The function may return 0, false, "", or NULL (any falsey value is acceptable) if the input arrays are empty or if the population was always at 0 throughout. If the highest population occurred on multiple years the function may return the first year on which the highest population was reached or any subsequent year.
For example:
$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];
/* The highest population was 3 on 1997, 1998 and 1999, either answer is correct */
Additionally, including the Big O of the solution would be helpful.
My best attempt at doing this would be the following:
function highestPopulationYear(Array $births, Array $deaths): Int {
sort($births);
sort($deaths);
$nextBirthYear = reset($births);
$nextDeathYear = reset($deaths);
$years = [];
if ($nextBirthYear) {
$years[] = $nextBirthYear;
}
if ($nextDeathYear) {
$years[] = $nextDeathYear;
}
if ($years) {
$currentYear = max(0, ...$years);
} else {
$currentYear = 0;
}
$maxYear = $maxPopulation = $currentPopulation = 0;
while(current($births) !== false || current($deaths) !== false || $years) {
while($currentYear === $nextBirthYear) {
$currentPopulation++;
$nextBirthYear = next($births);
}
while($currentYear === $nextDeathYear) {
$currentPopulation--;
$nextDeathYear = next($deaths);
}
if ($currentPopulation >= $maxPopulation) {
$maxPopulation = $currentPopulation;
$maxYear = $currentYear;
}
$years = [];
if ($nextBirthYear) {
$years[] = $nextBirthYear;
}
if ($nextDeathYear) {
$years[] = $nextDeathYear;
}
if ($years) {
$currentYear = min($years);
} else {
$currentYear = 0;
}
}
return $maxYear;
}
The algorithm above should run in polynomial time: at worst it is O(((n log n) * 2) + k), where n is the number of elements to be sorted in each array and k is the number of birth years (we know that k >= y, where y is the number of death years). However, I'm not sure whether there is a more efficient solution.
My interests are purely in an improved Big O of computational complexity upon the existing algorithm. Memory complexity is of no concern. Nor is the runtime optimization. At least it's not a primary concern. Any minor/major runtime optimizations are welcome, but not the key factor here.
We can solve this in linear time with bucket sort. Let's say the size of the input is n, and the range of years is m.
O(n): Find the min and max year across births and deaths.
O(m): Create an array of size max_yr - min_yr + 1, ints initialized to zero.
Treat the first cell of the array as min_yr, the next as min_yr+1, etc...
O(n): Parse the births array, incrementing the appropriate index of the array.
arr[birth_yr - min_yr] += 1
O(n): Ditto for deaths, decrementing the appropriate index of the array.
arr[death_yr - min_yr] -= 1
O(m): Parse your array, keeping track of the cumulative sum and its max value.
The year at which the cumulative sum reaches its maximum is your answer.
The running time is O(n+m), and the additional space needed is O(m).
This is a linear solution in n if m is O(n); i.e., if the range of years isn't growing more quickly than the number of births and deaths. This is almost certainly true for real world data.
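The steps above could be sketched in PHP roughly as follows. This is only a minimal sketch, not the answerer's own code: the function name highestPopulationYearLinear and the variable names are made up for illustration, and it assumes the year range is small enough that an array of m counters fits comfortably in memory.

```php
<?php
// Hypothetical helper illustrating the counting-array idea described above.
function highestPopulationYearLinear(array $births, array $deaths): int
{
    if (!$births) {
        return 0; // no births: the population never rises above zero
    }

    // O(n): find the min and max year across both arrays.
    $all = array_merge($births, $deaths);
    $minYear = min($all);
    $maxYear = max($all);

    // O(m): one counter per year in [minYear, maxYear], all starting at zero.
    $delta = array_fill(0, $maxYear - $minYear + 1, 0);

    // O(n): births add one to their year, deaths subtract one.
    foreach ($births as $year) {
        $delta[$year - $minYear]++;
    }
    foreach ($deaths as $year) {
        $delta[$year - $minYear]--;
    }

    // O(m): running sum, remembering the first year at which it peaks.
    $population = $maxPopulation = $bestYear = 0;
    foreach ($delta as $offset => $change) {
        $population += $change;
        if ($population > $maxPopulation) {
            $maxPopulation = $population;
            $bestYear = $minYear + $offset;
        }
    }
    return $bestYear;
}

echo highestPopulationYearLinear([1984, 1981, 1984, 1991, 1996], [1991, 1984]), "\n";
```

Because the comparison is strict, this sketch returns the first year on which the peak is reached, which the question allows.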
I think we can have O(n log n) time with O(1) additional space by first sorting, then maintaining a current population and global maximum as we iterate. I tried to use the current year as a reference point but the logic still seemed a bit tricky so I'm not sure it's completely worked out. Hopefully, it can give an idea of the approach.
JavaScript code (counterexamples/bugs welcome)
function f(births, deaths){
births.sort((a, b) => a - b);
deaths.sort((a, b) => a - b);
console.log(JSON.stringify(births));
console.log(JSON.stringify(deaths));
let i = 0;
let j = 0;
let year = births[i];
let curr = 0;
let max = curr;
while (deaths[j] < births[0])
j++;
while (i < births.length || j < deaths.length){
while (year == births[i]){
curr = curr + 1;
i = i + 1;
}
if (j == deaths.length || year < deaths[j]){
max = Math.max(max, curr);
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
} else if (j < deaths.length && deaths[j] == year){
while (deaths[j] == year){
curr = curr - 1;
j = j + 1;
}
max = Math.max(max, curr);
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
}
if (j < deaths.length && deaths[j] > year && (i == births.length || deaths[j] < births[i])){
year = deaths[j];
while (deaths[j] == year){
curr = curr - 1;
j = j + 1;
}
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
}
year = births[i];
}
return max;
}
var input = [
[[1997, 1997, 1997, 1998, 1999],
[1998, 1999]],
[[1, 2, 2, 3, 4],
[1, 2, 2, 5]],
[[1984, 1981, 1984, 1991, 1996],
[1991, 1984, 1997]],
[[1984, 1981, 1984, 1991, 1996],
[1991, 1982, 1984, 1997]]
]
for (let [births, deaths] of input)
console.log(f(births, deaths));
If the year range, m, is on the order of n, we could store the counts for each year in the range and have O(n) time complexity. If we wanted to get fancy, we could also have O(n * log log m) time complexity, by using a Y-fast trie that allows successor lookup in O(log log m) time.
First aggregate the births and deaths into a map (year => population change), sort that by key, and calculate the running population over that.
This should be approximately O(2n + n log n), where n is the number of births.
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
function highestPopulationYear(array $births, array $deaths): ?int
{
$indexed = [];
foreach ($births as $birth) {
$indexed[$birth] = ($indexed[$birth] ?? 0) + 1;
}
foreach ($deaths as $death) {
$indexed[$death] = ($indexed[$death] ?? 0) - 1;
}
ksort($indexed);
$maxYear = null;
$max = $current = 0;
foreach ($indexed as $year => $change) {
$current += $change;
if ($current >= $max) {
$max = $current;
$maxYear = $year;
}
}
return $maxYear;
}
var_dump(highestPopulationYear($births, $deaths));
I solved this problem with a memory requirement of O(n+m) in the worst case (O(n) in the best case) and a time complexity of O(n log n).
Here, n and m are the lengths of the births and deaths arrays.
I don't know PHP or JavaScript, so I implemented it in Java. The logic is very simple, and I believe the idea can be implemented in those languages as well.
Technique Details:
I used java TreeMap structure to store births and deaths records.
TreeMap keeps its entries sorted by key as (key, value) pairs. Here the key is the year and the value is the net change for that year (births minus deaths), which is later turned into a cumulative sum.
We don't need to insert deaths that happened after the highest birth year.
Once the TreeMap is populated with the birth and death records, a single pass updates the cumulative sums and records the maximum population, together with its year, as it goes.
Sample input & output: 1
Births: [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906]
Deaths: [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915]
Year counts Births: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1911=2, 1914=1, 1919=2}
Year counts Birth-Deaths combined: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1910=-1, 1911=0, 1912=-1, 1913=-1, 1914=-2, 1915=-2, 1919=2}
Yearwise population: {1900=2, 1901=3, 1903=4, 1904=5, 1906=6, 1908=9, 1909=10, 1910=9, 1911=9, 1912=8, 1913=7, 1914=5, 1915=3, 1919=5}
maxPopulation: 10
yearOfMaxPopulation: 1909
Sample input & output: 2
Births: [1906, 1901, 1911, 1902, 1905, 1911, 1902, 1905, 1910, 1912, 1900, 1900, 1904, 1913, 1904]
Deaths: [1917, 1908, 1918, 1915, 1907, 1907, 1917, 1917, 1912, 1913, 1905, 1914]
Year counts Births: {1900=2, 1901=1, 1902=2, 1904=2, 1905=2, 1906=1, 1910=1, 1911=2, 1912=1, 1913=1}
Year counts Birth-Deaths combined: {1900=2, 1901=1, 1902=2, 1904=2, 1905=1, 1906=1, 1907=-2, 1908=-1, 1910=1, 1911=2, 1912=0, 1913=0}
Yearwise population: {1900=2, 1901=3, 1902=5, 1904=7, 1905=8, 1906=9, 1907=7, 1908=6, 1910=7, 1911=9, 1912=9, 1913=9}
maxPopulation: 9
yearOfMaxPopulation: 1906
Here, the deaths that occurred after the last birth year, 1913 (that is, in 1914 and later), were not counted at all, which avoids unnecessary computation.
For a total of 10 million data (births & deaths combined) and over 1000 years range, the program took about 3 sec. to finish.
If same size data with 100 years range, it took 1.3 sec.
All the inputs are randomly taken.
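For readers following along in PHP, the same idea might look roughly like the sketch below. It is not the Java program described above: a plain PHP array sorted with ksort stands in for the TreeMap, and the function name highestPopulationYearPruned is hypothetical.

```php
<?php
// Hypothetical PHP stand-in for the TreeMap approach: aggregate the net
// change per year, skip deaths after the last birth year, then scan.
function highestPopulationYearPruned(array $births, array $deaths): int
{
    if (!$births) {
        return 0;
    }
    $lastBirth = max($births);

    // year => number of births in that year
    $change = array_count_values($births);
    foreach ($deaths as $year) {
        if ($year > $lastBirth) {
            continue; // the population can only fall after the last birth
        }
        $change[$year] = ($change[$year] ?? 0) - 1;
    }

    ksort($change); // plays the role of the TreeMap's key ordering

    $population = $maxPopulation = $maxYear = 0;
    foreach ($change as $year => $delta) {
        $population += $delta;
        if ($population > $maxPopulation) {
            $maxPopulation = $population;
            $maxYear = $year;
        }
    }
    return $maxYear;
}

$births = [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906];
$deaths = [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915];
echo highestPopulationYearPruned($births, $deaths), "\n"; // 1909, as in sample 1
```

Skipping the late deaths does not change the asymptotic cost; it only trims the number of keys that have to be sorted and scanned.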
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
$years = array_unique(array_merge($births, $deaths));
sort($years);
$increaseByYear = array_count_values($births);
$decreaseByYear = array_count_values($deaths);
$populationByYear = array();
foreach ($years as $year) {
$increase = $increaseByYear[$year] ?? 0;
$decrease = $decreaseByYear[$year] ?? 0;
$previousPopulationTally = end($populationByYear);
$populationByYear[$year] = $previousPopulationTally + $increase - $decrease;
}
$maxPopulation = max($populationByYear);
$maxPopulationYears = array_keys($populationByYear, $maxPopulation);
$maxPopulationByYear = array_fill_keys($maxPopulationYears, $maxPopulation);
print_r($maxPopulationByYear);
This will account for the possibility of a tied year, as well as if a year of someone's death does not correspond to someone's birth.
Memory-wise, it is enough to keep only the current population and the current year. Sorting both the $births and $deaths arrays first is a very good starting point, because sorting is not that heavy a task, yet it allows us to cut some corners:
<?php
$births = [1997, 1999, 2000];
$deaths = [2000, 2001, 2001];
function highestPopulationYear(array $births, array $deaths): Int {
// sorting takes time, but is necessary for further optimizations
sort($births);
sort($deaths);
// the first death year is the first year in which the population might decrease,
// so the population just before it is the maximum so far
$currentYearComputing = $deaths[0];
// the year before the first death potentially has the biggest population
$maxY = $currentYearComputing-1;
// count the births before the year of the first death (this also works when nobody
// was born in that year) and use that as the starting population and maxPopulation
$bornBefore = count(array_filter($births, fn($year) => $year < $deaths[0]));
$population = $maxPop = count(array_splice($births, 0, $bornBefore));
// instead of every time empty checks: `while(!empty($deaths) || !empty($births))`
// we can control a target time. It reserves a memory, but this slot is decreased
// every iteration.
$iterations = count($deaths) + count($births);
while($iterations > 0) {
while(current($births) === $currentYearComputing) {
$population++;
$iterations--;
array_shift($births); // decreasing memory usage
}
while(current($deaths) === $currentYearComputing) {
$population--;
$iterations--;
array_shift($deaths); // decreasing memory usage
}
if ($population > $maxPop) {
$maxPop = $population;
$maxY = $currentYearComputing;
}
// In $iterations we have a sum of birth/death events left. Assuming all
// are births, if this number added to currentPopulation will never exceed
// current maxPoint, we can break the loop and save some time at cost of
// some memory.
if ($maxPop >= ($population+$iterations)) {
break;
}
$currentYearComputing++;
}
return $maxY;
}
echo highestPopulationYear($births, $deaths);
I'm not really keen on diving into the Big O analysis, so I've left that to you.
Also, if you rediscover currentYearComputing every loop, you can change loops into if statements and leave with just one loop.
while($iterations > 0) {
$changed = false;
if(current($births) === $currentYearComputing) {
// ...
$changed = array_shift($births); // decreasing memory usage
}
if(current($deaths) === $currentYearComputing) {
// ...
$changed = array_shift($deaths); // decreasing memory usage
}
if ($changed === false) {
$currentYearComputing++;
continue;
}
// ... (max-population check and early exit as in the full version above)
}
I feel very comfortable with this solution; its complexity is O(n + m).
<?php
function getHighestPopulation($births, $deaths){
$max = [];
$currentMax = 0;
$tmpArray = [];
foreach($deaths as $key => $death){
if(!isset($tmpArray[$death])){
$tmpArray[$death] = 0;
}
$tmpArray[$death]--;
}
foreach($births as $k => $birth){
if(!isset($tmpArray[$birth])){
$tmpArray[$birth] = 0;
}
$tmpArray[$birth]++;
if($tmpArray[$birth] > $currentMax){
$max = [$birth];
$currentMax = $tmpArray[$birth];
} else if ($tmpArray[$birth] == $currentMax) {
$max[] = $birth;
}
}
return [$currentMax, $max];
}
$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];
print_r (getHighestPopulation($births, $deaths));
?>
One of the simplest and clearest approaches to your problem:
$births = [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906];
$deaths = [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915];
/* for generating 1 million records
for($i=1;$i<=1000000;$i++) {
$births[] = rand(1900, 2020);
$deaths[] = rand(1900, 2020);
}
*/
function highestPopulationYear(Array $births, Array $deaths): Int {
$start_time = microtime(true);
$population = array_count_values($births);
$deaths = array_count_values($deaths);
foreach ($deaths as $year => $death) {
$population[$year] = ($population[$year] ?? 0) - $death;
}
ksort($population, SORT_NUMERIC);
$cumulativeSum = $maxPopulation = $maxYear = 0;
foreach ($population as $year => &$number) {
$cumulativeSum += $number;
if($maxPopulation < $cumulativeSum) {
$maxPopulation = $cumulativeSum;
$maxYear = $year;
}
}
print " Execution time of function = ".((microtime(true) - $start_time)*1000)." milliseconds";
return $maxYear;
}
print highestPopulationYear($births, $deaths);
output:
1909
complexity:
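O(n + m + k log k), where n and m are the lengths of the births and deaths arrays and k is the number of distinct years (counting is linear, and ksort dominates)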
I tried asking this earlier, but I don't think I phrased the question correctly so I worked out something that got me the result I was after and now am hoping that it will help someone help me.
Problem: I have 10 items. If you buy 1, it's $10. I will sell you the second one for $9. I will sell you the third item for $8. I will keep taking off money until we get to $5/item because that is the lowest I will sell it for. So, if you buy all 10, it will cost you $65.
This is the pricing model I am trying to achieve, except at a much larger scale. Instead of a handful of items using dollars, I'm talking about up to millions and using fractions of pennies.
This is my current code:
<?php
function getCost($num_items)
{
$min_price = 0.002;
$max_price = 0.007;
$discount_range = 1000000;
$discount_per_additional_item = ($max_price - $min_price) / ($discount_range - 1);
$price_per_unit = MAX($min_price, ($max_price - ($num_items - 1) * $discount_per_additional_item) );
return $price_per_unit;
}
$array = [100, 1000, 10000, 100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000, 900000, 1000000];
foreach ($array as $value)
{
$sum = 0;
for ($i = 0; $i < $value; ++$i)
$sum += getCost($i);
echo number_format($value) . ' | $' . number_format($sum) . "\n";
}
Which results in:
100 | $1
1,000 | $7
10,000 | $70
100,000 | $675
200,000 | $1,300
300,000 | $1,875
400,000 | $2,400
500,000 | $2,875
600,000 | $3,300
700,000 | $3,675
800,000 | $4,000
900,000 | $4,275
1,000,000 | $4,500
I'm using $array as a sanity check where in the real world, I would simply calculate for the actual number the customer is being charged for.
My question is: Is there a way to accomplish this without using a for loop? Something, perhaps, more elegant?
I made an example online: http://sandbox.onlinephpfunctions.com/code/47e270dbad8cbe16c9ea906ffd2dce098a52fbca
This code will have the same output, and does not have the inner loop:
$min_price = 0.002;
$max_price = 0.007;
$discount_range = 1000000;
$discount_per_additional_item = ($max_price - $min_price)/($discount_range - 1);
$num_progressively_discounted_items =
ceil(($max_price - $min_price) / $discount_per_additional_item);
foreach ($array as $value) {
$num_items_above_min = min($value, $num_progressively_discounted_items);
$num_items_at_min = $value - $num_items_above_min;
$sum = $num_items_at_min * $min_price +
$num_items_above_min * $max_price -
$discount_per_additional_item
* $num_items_above_min * ($num_items_above_min - 1)/2;
echo number_format($value) . ' | $' . number_format($sum) . "\n";
}
This is what it does:
It first checks how many times the unit discount can be subtracted from the original price before hitting the minimum price. If more than the number of items you are buying, then this calculated figure is corrected to that number of items.
The remaining number of items (if any) are also taken note of: these will all have the minimum price.
The sum consists of two parts. The easy part is represented by the number of items that will go for the minimum price, and it is a simple multiplication.
The second part of the sum consists of an always decreasing term. Put otherwise: it is the maximum price times the number of items n that don't go for the minimum price, minus the unit discount multiplied by 0+1+2+...+(n-1). That sum has a known closed form: n(n-1)/2.
Like I mentioned in comments, there is something strange in your code: for $i=0 the value returned by getCost($i) is higher than the max price, as the unit discount gets added to it. This can be corrected by starting your inner loop with $i=1. Anyway, this means there is a tiny difference in the result of my proposed code, as it does not have this peculiarity. But as the discount per unit is so tiny, you don't actually notice it in the printed output.
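To make the formula concrete, here is a small sketch that applies it to the toy example from the question (10 items, $10 down to a $5 floor in $1 steps). The function name tieredTotal and its parameters are made up for illustration; it assumes the first item costs exactly the maximum price, i.e. it counts items from 1 as suggested above.

```php
<?php
// Hypothetical helper: total for $n items when the i-th item (1-based)
// costs max($minPrice, $maxPrice - ($i - 1) * $step).
function tieredTotal(int $n, float $maxPrice, float $minPrice, float $step): float
{
    // Number of items that are still priced above the minimum.
    $discounted = min($n, (int) ceil(($maxPrice - $minPrice) / $step));
    // Everything after that goes for the minimum price.
    $atMin = $n - $discounted;

    return $atMin * $minPrice
        + $discounted * $maxPrice
        - $step * $discounted * ($discounted - 1) / 2; // step * (0+1+...+(k-1))
}

echo tieredTotal(10, 10.0, 5.0, 1.0), "\n"; // 10+9+8+7+6 + 5*5 = 65
```

With these numbers five items are priced above the floor (10, 9, 8, 7, 6) and five go for $5, so the closed form gives 25 + 50 - 10 = 65, matching the figure in the question.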
You can do this in a slightly more functional style:
function sumOfNaturalSeries($n)
{
return ((1 + $n) / 2) * $n;
}
$minPrice = 0.002;
$maxPrice = 0.007;
$discountRange = 1000000;
$discountStep = ($maxPrice - $minPrice) / $discountRange;
$getPrice = function ($numberOfItems) use (
$minPrice,
$maxPrice,
$discountRange,
$discountStep
) {
if ($numberOfItems <= $discountRange) {
return $maxPrice * $numberOfItems - sumOfNaturalSeries($numberOfItems - 1) * $discountStep;
}
$itemsAboveRange = $numberOfItems - $discountRange;
return $maxPrice * $discountRange - sumOfNaturalSeries($discountRange - 1) * $discountStep + $minPrice * $itemsAboveRange;
};
$array = [100, 1000, 10000, 100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000, 900000, 1000000];
$sums = array_map($getPrice, $array);
var_dump($sums);
var_dump(array_map('number_format', $sums));
Here is a demo.
Take note of the computational error.
I have two arrays of matching data: one with the values and the other with the times. I round each time to the nearest 5 minutes, and as a result I get a new array in which some hours repeat themselves. I group the equal hours, count them, sum their respective values from the other array, and take the average for that hour. Here is the code:
$first_hour = array('11:01', '12:04', '13:00', '15:28', "15:43", "15:53", "15:55", "16:02", "16:05", "16:17", "16:15", "16:21", "16:25", "16:33", "16:35", "16:43", "16:45", "16:56", "16:58", "17:00", "17:04", "17:07", "17:19");
$values = array(12, 23, 5, 90, 12, 23, 45, 56, 12, 15, 43, 48, 54, 62, 52, 41, 74, 54, 84, 75, 96, 69, 36);
$minutes = 5;
$precision = 60 * $minutes;
$time = "00:00";
$average = 0;
//round first array to nearest 5 minutes
$timestamp = strtotime($first_hour);
$data_ora = date("H,i", round($timestamp / $precision) * $precision);
//the hours array will become like this:
$first_hour = array('11:00', '12:05', '13:00', '15:30', "15:45", "15:50", "15:55", "16:00", "16:05", "16:15", "16:15", "16:20", "16:25", "16:35", "16:35", "16:45", "16:45", "17:00", "17:00", "17:00", "17:05", "17:10", "17:20");
if ($first_hour == $time) {
if ($values[$i] > 0) {
$indexi = $indexi + 1;
$total = $total + $values;
echo "</br> total = ".$total;
$i++;
} else {
$time = date("H,i", round($timestamp / $precision) * $precision);
//echo "time has changed";
$average = $total / $index;
$dataora = date("Y,m,d", round($timestamp)) . "," . $time;
//fill in the vectors
$v[1] = $average;
$v[2] = $dataora;
echo "</br> average = " . $v[1];
$indexi = 0;
$totali = 0;
}
}
The output is like this:
"total = 2996
total = 3325
total = 3656
total = 3996
average = 333
total = 329
total = 652
total = 976
total = 1304
total = 1630
total = 1961
total = 2297
total = 2629
average = 328.625
total = 332
total = 660
total = 997
total = 1320
total = 1646
total = 1967
For each group there is an average calculated, except for the last group of values. How can I calculate its average? For the last group it gets into the first condition but it doesn't take a break (change hour) to fall into the else so that the average is calculated.
I'm almost certain the sample code you have provided doesn't match your real code, so I'm not going to try and fix that, but I will try and explain how you might fix the problem you claim to have in your real code.
Presumably you have a loop of some sort that is iterating over the values in the $values array. I also assume that some of the values in that array are less than or equal to zero, and those are the points at which you display the average.
Your problem then is that your loop reaches the end of the array without having a chance to display the average for the last set of values.
One easy solution is to add a zero to the end of the array if there isn't already one there. That way the last value the loop encounters will always be a zero, which will match the condition needed to display an average.
If that is not feasible, you could make your loop not stop when it reaches the end of the array, so if it's a while loop, it essentially becomes:
while (true) ...
Then you change your first condition to this:
if ($i < count($values) && $values[$i] > 0) {
so it only matches if you haven't gone past the end of the array. Otherwise it falls through to displaying the average.
And at the end of the average code, you add another check to see if you're past the end of the array and break out of the loop.
if ($i >= count($values)) break;
These changes alone won't fix the sample code you've provided, but it might help if your real code matches the functionality you've described in the question.
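Put together, the loop shape described above might look like the sketch below. It rests on the same assumption as the rest of this answer, namely that a value less than or equal to zero marks the end of a group; the function name runAverages and the sample input are hypothetical and are not taken from the question's real code.

```php
<?php
// Hypothetical helper: averages each run of positive values. A value <= 0,
// or the end of the array, closes the current run.
function runAverages(array $values): array
{
    $averages = [];
    $total = 0;
    $count = 0;
    $i = 0;

    while (true) {
        // Only accumulate while we are still inside the array.
        if ($i < count($values) && $values[$i] > 0) {
            $total += $values[$i];
            $count++;
            $i++;
            continue;
        }

        // Separator or end of input: emit the average of the run so far.
        if ($count > 0) {
            $averages[] = $total / $count;
        }
        $total = $count = 0;

        // Past the end of the array: the last group has been handled.
        if ($i >= count($values)) {
            break;
        }
        $i++; // skip the separator value
    }
    return $averages;
}

print_r(runAverages([10, 20, 0, 4, 6]));
```

The last group (4 and 6 here) is averaged even though no separator follows it, because running off the end of the array takes the same branch as a separator would.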