Find the smallest positive integer that does not occur in an array - php

I am trying out the following codility.com exercise to improve my skills online. I was presented with the following problem:
This is a demo task.
Write a function:
class Solution { public int solution(int[] A); }
that, given an array A of N integers, returns the smallest positive
integer (greater than 0) that does not occur in A.
For example,
given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [-1, -3], the function should return 1.
Write an efficient algorithm for the following assumptions:
• N is an integer within the range [1..100,000];
• each element of array A is an integer within the range [−1,000,000..1,000,000].
Copyright 2009– by Codility Limited
I solved it using the following solution:
<?php
class Solution {
    public function solution($A) {
        $posInts = [1, 2, 3, 4, 5, 6, 7, 8, 9];
        $diffs = array_diff($posInts, $A);
        $smallestPosInt = min($diffs);
        return $smallestPosInt;
    }
}
However upon submitting I got the following score:
Now I am very unsure of what I did wrong here or how I can rewrite the code with a better algorithm.

Here is an answer in JavaScript that, if I am not mistaken, runs in O(N) time, which is the best possible performance.
function solution(A) {
const set = new Set(A)
let i = 1
while (set.has(i)) {
i++
}
return i
}

I would just loop over (increment) any possible integers:
function solution($A) {
$result = 1;
$maxNumber = max($A);
for (; $result <= $maxNumber; $result++) {
if (!in_array($result, $A)) {
break;
}
}
return $result;
}
var_dump(solution([1, 3, 6, 4, 1, 2])); // int(5)
var_dump(solution([1, 2, 3])); // int(4)
var_dump(solution([-1, -3])); // int(1)
// As a bonus, this also works for larger numbers:
var_dump(solution([1, 3, 6, 4, 1, 2, 7, 8, 9, 10, 11, 12, 13, 5, 15])); // int(14)
Edit regarding performance:
As pointed out in the comments (and you already said yourself), this is not a very efficient solution.
While I do not have enough time on my hands currently to do real performance testing, I think this should be close to an O(n) solution: (keeping in mind that I am not sure how arrays are implemented on the C-side of PHP)
function solution($A) {
$result = 1;
$maxNumber = max($A);
$values = array_flip($A);
for (; $result <= $maxNumber; $result++) {
if (!isset($values[$result])) {
break;
}
}
return $result;
}
// Not posting the output again because it is naturally the same ;)
The "trick" here is to flip the array first so that the values become the indexes. Since a) we do not care about the original indexes and b) we do not care if duplicated values overwrite each other, we can safely do that.
Using isset() instead of in_array() should be a lot quicker since it basically just checks if a variable (in this case stored at a specific index of the array) exists and PHP does therefore not have to iterate through the array in order to check whether or not each number we loop over exists within it.
P.S.: After thinking twice, I think this is closer to O(2n) (which is still linear), because max() has to loop over the array to find the highest value. You could also remove that line and check against the highest integer PHP supports as an emergency exit, like so: for (; $result <= PHP_INT_MAX; $result++) { ... } as a further optimization. Or just hard-code the highest allowed number as specified in the task.

If we're allowed to modify the input, perform this in place, otherwise create a new array of size n + 1:
For each element encountered in the original array, if it is greater than n + 1 or smaller than 1, assign 0 at the element's index (index - 1 if performing in place); otherwise assign 1 at the index of the array the value is and assign 0 at its own index if it is different. After that run a second traversal and report the first index (index + 1 if performing in place) greater than zero with value 0, or n + 1.
[1, 3, 6, 4, 1, 2]
=>
[1, 1, 1, 1, 0, 1]
report 5
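The variant that uses a separate array (rather than modifying the input) can be sketched as follows. This is a minimal Python sketch, not the answerer's own code; the function name smallest_missing_positive and the boolean list seen are names chosen here for illustration.

```python
def smallest_missing_positive(a):
    """Smallest integer >= 1 that does not occur in a, in O(n) time and space."""
    n = len(a)
    # seen[v] is True when the value v (1 <= v <= n) occurs in a; index 0 is unused.
    seen = [False] * (n + 1)
    for v in a:
        # Values outside 1..n cannot be the answer's predecessor, so they are ignored.
        if 1 <= v <= n:
            seen[v] = True
    # The first unmarked slot is the answer; if all are marked, it is n + 1.
    for candidate in range(1, n + 1):
        if not seen[candidate]:
            return candidate
    return n + 1
```

The answer can never exceed n + 1, because an array of n elements can cover at most the values 1..n, which is why the marker array only needs n + 1 slots.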

Related

Permutations and big arrays in PHP - performance issues

I have an array of numbers (int or float) and I need to find a value by combining array values. Once the smallest possible combination is found the function returns the array values. Therefore I start with sample-size=1 and keep incrementing it.
Here's a simplified example of the given data:
$values = [10, 20, 30, 40, 50];
$lookingFor = 80;
Valid outcomes:
[30, 50] // return this
[10, 20, 50], [10, 30, 40] // just to demonstrate the possible combinations
Permutations solve this problem and I've tried many different implementations (for example: Permutations - all possible sets of numbers, Get all permutations of a PHP array?, https://github.com/drupol/phpermutations). My favourite is this one with a parameter for permutation-size using the Generator pattern: https://stackoverflow.com/a/43307800
What's my problem? Performance! My arrays have 5 - 150 numbers and sometimes the sum of 30 array numbers is needed to find the searched value. Sometimes the value can't be found, which means I needed to try all possible combinations. Basically with permutation-size > 5 the task becomes too time consuming.
An alternative, yet not precise way is to sort the array, take the first X and last X numbers and compare with the searched value. Like this:
sort($values, SORT_NUMERIC);
$countValues = count($values);
if ($sampleSize > $countValues)
{
$sampleSize = $countValues;
}
$minValues = array_slice($values, 0, $sampleSize);
$maxValues = array_slice($values, $countValues - $sampleSize, $sampleSize);
$possibleMin = array_sum($minValues);
$possibleMax = array_sum($maxValues);
if ($possibleMin === $lookingFor)
{
return $minValues;
}
if ($possibleMax === $lookingFor)
{
return $maxValues;
}
return [];
Hopefully somebody has dealt with a similar problem and can guide me in the right direction. Thank you!
You must use combinations instead of permutations (e.g. for 15 elements: 15! = 1307674368000 orderings vs 2^15 = 32768 subsets).
If array_sum is less than the target number, no solution exists (assuming there are no negative values).
If in_array(target_number, numbers), a solution with 1 element is found.
Sort from lowest to highest.
Start with C(n,2): keep the 1st element fixed and pair it with the 2nd, then the 3rd, etc.
If that loop found no solution, continue with the 2nd element fixed: 2nd and 3rd, then 2nd and 4th, etc.
If C(n,2) had no solution, move on to C(n,3), this time with 2 fixed numbers and 1 varying one.
If the loops end without a solution, no solution exists.
Lastly, I would adjust this question and ask it on the statistics branch of Stack Exchange (Cross Validated), since the mean, median and cumulative distribution of the sums may hint at ways to decrease the number of iterations significantly, and this is their profession.
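The combination-based search described above could look roughly like this. It is a hedged Python sketch rather than PHP; smallest_combination is a hypothetical name, and it assumes sums can be compared exactly (true for integers, not always for floats). It remains exponential in the worst case; the bounds check only skips sizes that cannot possibly match.

```python
from itertools import combinations

def smallest_combination(values, target):
    """Return the first smallest-size combination of values summing to target, or []."""
    values = sorted(values)
    for size in range(1, len(values) + 1):
        # The `size` smallest and `size` largest values bound every sum of that size,
        # so sizes whose bounds exclude the target are skipped entirely.
        if sum(values[:size]) > target or sum(values[-size:]) < target:
            continue
        for combo in combinations(values, size):
            if sum(combo) == target:
                return list(combo)
    return []
```

Because sizes are tried in increasing order, the first hit is guaranteed to use the fewest elements.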

Finding the nearest array set

I wanted to know what algorithm I could use to find which of the arrays in the set below is nearest to [0,0,0]. I am thinking of scoring each array by adding up its values, but the problem is that index 1, [0,2,1], has a sum of 3, which is equal to that of index 3. The answer should be index 3. Or do you have a better suggestion? Thanks in advance.
$sets = [
[4,5,6], // 0
[0,2,1], // 1
[1,3,0], // 2
[1,1,1], // 3
[0,1,3], // 4
[5,4,3], // 5
];
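Well, what you're basically describing corresponds to finding the distance to the origin for a point in 3D space, the formula for which is:
$$d = \sqrt{x^2 + y^2 + z^2}$$
Based on that, the point [1, 1, 1] is indeed closer to the origin than [0, 2, 1]:
$$\sqrt{1^2 + 1^2 + 1^2} = \sqrt{3} \approx 1.73 < \sqrt{0^2 + 2^2 + 1^2} = \sqrt{5} \approx 2.24$$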
In PHP, you could calculate the distances as follows:
$sets = [[4,5,6], [0,2,1], [1,3,0], [1,1,1], [0,1,3], [5,4,3]];
$distances = array_map(function ($i) {
return sqrt($i[0]**2 + $i[1]**2 + $i[2]**2);
}, $sets);
print_r($distances);
Finding the closest point then becomes trivial.
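For completeness, picking the closest point can be sketched like this in Python (nearest_to_origin is a name made up for this example). It compares squared distances, which gives the same ordering as the distances themselves and avoids the square root.

```python
def nearest_to_origin(sets):
    """Return the index of the point with the smallest Euclidean distance to the origin."""
    # Squared distance is monotonic in distance, so sqrt is unnecessary for comparison.
    return min(range(len(sets)), key=lambda i: sum(c * c for c in sets[i]))
```

On ties, min returns the first index with the smallest distance.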

Iterate through 2d array of booleans and leave only the largest contiguous "2D blob of ones"

Ok, so the question is kind of awkwardly phrased, but I hope this will clear things up.
I have this sample 2d array.
$array = array(
array(1, 0, 0, 0, 1, 0, 0, 1),
array(0, 0, 1, 1, 1, 1, 0, 1),
array(0, 1, 1, 0, 1, 0, 0, 0),
array(0, 1, 1, 0, 0, 0, 1, 0),
array(1, 0, 0, 0, 1, 1, 1, 1),
array(0, 1, 1, 0, 1, 0, 1, 0),
array(0, 0, 0, 0, 0, 0, 0, 1)
);
When iterated by rows (and terminating each row with \n), and for every row then iterated by column, it will echo something like this: (░░ = 0, ▓▓ = 1)
▓▓░░░░░░▓▓░░░░▓▓
░░░░▓▓▓▓▓▓▓▓░░▓▓
░░▓▓▓▓░░▓▓░░░░░░
░░▓▓▓▓░░░░░░▓▓░░
▓▓░░░░░░▓▓▓▓▓▓▓▓
░░▓▓▓▓░░▓▓░░▓▓░░
░░░░░░░░░░░░░░▓▓
But what I'd like to do is to "analyse" the array and only leave 1 contiguous shape (the one with the most "cells"), in this example, the result would be:
░░░░░░░░▓▓░░░░░░
░░░░▓▓▓▓▓▓▓▓░░░░
░░▓▓▓▓░░▓▓░░░░░░
░░▓▓▓▓░░░░░░░░░░
▓▓░░░░░░░░░░░░░░
░░▓▓▓▓░░░░░░░░░░
░░░░░░░░░░░░░░░░
My initial approach was to:
Assign each ▓▓ cell a unique number (be it completely random, or the current iteration number):
01      02    03
    04050607  08
  0910  11
  1213      14
15      16171819
  2021  22  23
              24
Iterate through the array many, MANY times: every iteration, each ▓▓ cell assumes the largest unique number among its neighbours. The loop would go on until there's no change detected between the current state and the previous state. After the last iteration, the result would be this:
01      21    08
    21212121  08
  2121  21
  2121      24
21      24242424
  2121  24  24
              24
Now it all comes down to counting the value that occurs the most. Then, iterating once again, to turn all the cells whose value is not the most popular one, to 0, giving me the desired result.
However, I feel it's quite a roundabout and computationally heavy approach for such a simple task and there has to be a better way. Any ideas would be greatly appreciated, cheers!
BONUS POINTS: Divide all the blobs into an array of 2D arrays, ordered by number of cells, so we can do something with the smallest blob, too
Always fun, these problems. And done before, so I'll dump my code here, maybe you can use some of it. This basically follows every shape by looking at a cell and its surrounding 8 cells, and if they connect go to the connecting cell, look again and so on...
<?php
$shape_nr=1;
$ln_max=count($array);
$cl_max=count($array[0]);
$done=[];
//LOOP ALL CELLS, GIVE 1's unique number
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
if($array[$ln][$cl]===0)continue;
$array[$ln][$cl] = ++$shape_nr;
}}
//DETECT SHAPES
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
if($array[$ln][$cl]===0)continue;
$shape_nr=$array[$ln][$cl];
if(in_array($shape_nr,$done))continue;
look_around($ln,$cl,$ln_max,$cl_max,$shape_nr,$array);
//SET SHAPE_NR to DONE, no need to look at that number again
$done[]=$shape_nr;
}}
//LOOP THE ARRAY and COUNT SHAPENUMBERS
$res=array();
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
if($array[$ln][$cl]===0)continue;
if(!isset($res[$array[$ln][$cl]]))$res[$array[$ln][$cl]]=1;
else $res[$array[$ln][$cl]]++;
}}
//get largest shape
$max = max($res);
$shape_value_max = array_search ($max, $res);
//get smallest shape
$min = min($res);
$shape_value_min = array_search ($min, $res);
// recursive function: detect connecting cells
function look_around($ln,$cl,$ln_max,$cl_max,$nr,&$array){
//create mini array
$mini=mini($ln,$cl,$ln_max,$cl_max);
if($mini===false)return false;
//loop surrounding cells
foreach($mini as $v){
if($array[$v[0]][$v[1]]===0){continue;}
if($array[$v[0]][$v[1]]!==$nr){
// set shape_nr of connecting cell
$array[$v[0]][$v[1]]=$nr;
// follow the shape
look_around($v[0],$v[1],$ln_max,$cl_max,$nr,$array);
}
}
return $nr;
}
// CREATE ARRAY WITH THE (UP TO) 8 SURROUNDING CELLS
function mini($ln,$cl,$ln_max,$cl_max){
$look=[];
$mini=[[-1,-1],[-1,0],[-1,1],[0,-1],[0,1],[1,-1],[1,0],[1,1]];
foreach($mini as $v){
if( $ln + $v[0] >= 0 &&
$ln + $v[0] < $ln_max &&
$cl + $v[1] >= 0 &&
$cl + $v[1] < $cl_max
){
$look[]=[$ln + $v[0], $cl + $v[1]];
}
}
if(count($look)===0){return false;}
return $look;
}
I can only think of a few minor improvements:
Keep a linked list of the non-empty fields. In step 2 you then do not need to touch all n² matrix elements, only the ones in your linked list, which might be far fewer depending on how sparse your matrix is.
You only need to compare in the right, right-down, left-down and down directions. The other directions have already been checked from the previous row/column. What I mean: when my number is greater than my right neighbour's, I can already change the number of the right neighbour (same for down and right-down). This halves the number of comparisons.
If your array size isn't huge and memory won't be a problem, maybe a recursive solution would be faster. I found a C++ algorithm that does this here:
https://www.geeksforgeeks.org/find-length-largest-region-boolean-matrix/
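As an alternative to repeated relabelling, each blob can be collected with a single flood fill. The following is a minimal Python sketch (not taken from the linked article); find_blobs and keep_largest are hypothetical names, cells touching diagonally are treated as connected, and an explicit queue is used instead of recursion so large grids do not hit a recursion limit.

```python
from collections import deque

def find_blobs(grid):
    """Return all blobs as lists of (row, col) cells, largest first (8-connectivity)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Start a new blob and flood-fill outwards from this cell.
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    blobs.sort(key=len, reverse=True)
    return blobs

def keep_largest(grid):
    """Return a copy of grid in which only the largest blob is left as 1s."""
    blobs = find_blobs(grid)
    result = [[0] * len(row) for row in grid]
    if blobs:
        for y, x in blobs[0]:
            result[y][x] = 1
    return result
```

Every cell is visited a constant number of times, so the running time is linear in the grid size. The sorted list returned by find_blobs also covers the "bonus" request, since the smallest blob is simply its last entry.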

Peculiar behavior of array_udiff?

I've got the following Php script:
<?php
function filt($k, $l){
if($k===$l){
var_dump("valid: ".$k."-".$l);
return 0;
}
return 1;
}
$a6=array(7, 9, 3, 33);
$a7=array(2, 9, 3, 33);
$u=array_udiff($a6, $a7, "filt");
var_dump($u);
?>
With the following output:
string 'valid: 3-3' (length=10)
array
0 => int 7
1 => int 9
3 => int 33
As far as I know, array_udiff should drop the equal values and leave only the values from the first array that differ.
What seems to be the problem here?
I run WampServer Version 2.2 on Windows 7. Php version: 5.3.9.
Note that the documentation says:
The comparison function must return an integer less than, equal to, or
greater than zero if the first argument is considered to be respectively
less than, equal to, or greater than the second.
You're not doing that. To make sure that you do, simply make your filt function return $l - $k
There is a simple explanation for that: the elements might be in any order. To avoid having to compare each element to every other element, it first sorts them. That's why you need + / 0 / -
You're not returning all the necessary values (e.g. -1, 0, 1). See: array_udiff
$a6 = array(7, 9, 3, 33);
$a7 = array(2, 9, 3, 33);
$u = array_udiff($a6, $a7, function ($k, $l){
return $k > $l ? 1 : ($k < $l ? -1 : 0);
});
print_r($u);
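To see why a sort-based difference needs a real three-way comparator, here is a simplified Python model. It is an illustration only and an assumption about the general technique, not PHP's actual implementation of array_udiff; the names udiff, three_way and equal_only are invented for this sketch, and unlike PHP it returns the values in sorted order without keys.

```python
from functools import cmp_to_key

def udiff(a, b, cmp):
    """Values of a that are not in b, found by sorting both lists and walking them together."""
    key = cmp_to_key(cmp)
    sa, sb = sorted(a, key=key), sorted(b, key=key)
    result, j = [], 0
    for x in sa:
        # Skip entries of b that sort before x; this relies on cmp being a true ordering.
        while j < len(sb) and cmp(sb[j], x) < 0:
            j += 1
        if j == len(sb) or cmp(sb[j], x) != 0:
            result.append(x)
    return result

def three_way(k, l):
    """Proper comparator: negative, zero or positive."""
    return (k > l) - (k < l)

def equal_only(k, l):
    """Broken comparator like filt() in the question: never returns a negative value."""
    return 0 if k == l else 1
```

With three_way the walk finds every match, while equal_only never reports "less than", so neither the sort nor the skip step can line the two lists up and equal values are missed.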

Detecting if integer can be written as sum of given integers

Suppose I have the constants 3, 5, 6, 9, 10. How can I write the input $n as a sum of these constants using the fewest terms?
Examples
$n=10, S=10
$n=18, S=9+9
$n=24, S=9+9+6
$n=27, S=9+9+9
$n=28, S=10+9+9
Thanks
This is another Python solution, but hopefully it's easy for you to convert to PHP (I would do it myself, but I'm no PHP expert - I'm sure you could do a better job of it). I've tried not to use any advanced Python functions, so that it is easier for non-Python readers to understand, but if some Python syntax is not clear, please just ask.
allowed = [3, 5, 6, 9, 10]
n = 28
solutions = [None] * (n + 1)
solutions[0] = []
for i in range(n + 1):
    if solutions[i] is None: continue
    for a in allowed:
        if i + a > n: continue
        if solutions[i + a] is None or len(solutions[i]) + 1 < len(solutions[i + a]):
            solutions[i + a] = solutions[i] + [a]
print(solutions[28])
It works by starting from 0 and building up to the desired number, keeping a cache of the shortest solution seen so far for each possible total. It has a running time of O(n * a), where a is the number of different allowed values.
By the way, your answer to n=28 is wrong. It should be [9, 9, 10].
Update: here's my attempt at a PHP solution:
<?php
$allowed = array(3, 5, 6, 9, 10);
$n = 28;
$solutions = array();
$solutions[0] = array();
foreach (range(0, $n) as $i) {
if (is_null($solutions[$i])) continue;
foreach ($allowed as $a) {
if ($i + $a > $n) continue;
if (is_null($solutions[$i + $a]) ||
sizeof($solutions[$i]) + 1 < sizeof($solutions[$i + $a])) {
$solutions[$i + $a] = array_merge($solutions[$i], array($a));
}
}
}
var_dump($solutions[$n]);
?>
It gives the right answer, but please be aware that I'm not a professional PHP coder - I just looked up the equivalent functions in the PHP documentation.
This is Mark Byers' algorithm, rewritten using loop structures that are more familiar to PHP developers, and constructs that won't generate PHP notices. $C is your set of integers, $S the solutions.
$n = 28;
$C = array(3, 5, 6, 9, 10);
$S = array(array());
// if your set isn't sorted already, you have to call sort()
//sort($C);
for ($i = 0; $i <= $n; ++$i)
{
if (!isset($S[$i]))
{
continue;
}
foreach ($C as $v)
{
if ($i + $v > $n)
{
break;
}
if (!isset($S[$i + $v])
|| count($S[$i + $v]) > 1 + count($S[$i]))
{
$S[$i + $v] = $S[$i];
$S[$i + $v][] = $v;
}
}
}
print_r($S[$n]);
Two obvious approaches suggest themselves:
Write a series of linear equations, and solve to find various solutions. Choose one with the least number of terms.
Trial and error, starting with the largest terms first.
Find all possible solutions for "S=3A+5B+6C+9D+10E" then choose the one with the most 0 values for A,B,C,D,E
A rough sketch of an unscalable but correct solution (sorry, so far it's only Python):
#!/usr/bin/env python
import itertools, sys
pool = [3, 5, 6, 9, 10]
repeat, found, solutions = 1, False, set()
try: x = int(sys.argv[1])
except (IndexError, ValueError): x = 42
# Try 1 term, then 2 terms, ...; note this never terminates if x cannot be formed.
while not found:
    for n in itertools.product(pool, repeat=repeat):
        s = sum(n)
        if s == x:
            solutions.add(n)
            found = True
            break
    repeat = repeat + 1
# Python 2 prints set([...]) as shown below; Python 3 prints {...}.
print(solutions)
would yield:
$ python 1850629.py 11
set([(5, 6)])
$ python 1850629.py 19
set([(9, 10)])
$ python 1850629.py 21
set([(3, 9, 9)])
$ python 1850629.py 42
set([(3, 9, 10, 10, 10)])
In addition to the excellent general answers already provided, bear in mind that if your set of values has certain properties, much more optimal solutions exist.
Specifically, if your solution is 'minimal' - that is, a single best solution exists for any value - then you can find the smallest number of elements using a 'greedy' algorithm: Simply add the largest value until the remainder is smaller than it, repeat with the next largest value, and so forth.
As an example, the denominations used for money in many countries are .01, .02, .05, .10, .20, .50, 1, 2, 5, .... This set is minimal, so you can just repeatedly add the largest valid denomination.
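The greedy approach can be sketched in a few lines of Python. greedy_sum is a hypothetical name, the values are assumed to be positive integers (money is expressed in cents to avoid float rounding), and the function returns None when greedy gets stuck, which does not prove that no solution exists.

```python
def greedy_sum(n, values):
    """Repeatedly take the largest value that still fits; return the terms or None."""
    terms = []
    for v in sorted(values, reverse=True):
        # Use this value as often as it fits into the remainder.
        while n >= v:
            terms.append(v)
            n -= v
    return terms if n == 0 else None
```

For coin-like sets such as 1, 2, 5, 10, 20, 50, 100, 200 this finds a shortest sum. For the set in the question it does not: with 3, 5, 6, 9, 10 and n = 18, greedy takes 10, then 6, and is left with a remainder of 2, even though 9 + 9 works, so the dynamic programming answers are needed there.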
NP-complete problem
Subset sum problem
