I have an array of 52 different values that I can pass through a class to get a number in return.
$array = array("A","B","C","D"...);
Each value passed through the class gives a different number that can be either positive or negative.
The numbers are not equally distributed but are sorted in natural order.
E.g.
$myclass->calculate("A"); // 2.3
$myclass->calculate("B"); // 0.25
$myclass->calculate("C"); // -1.3
$myclass->calculate("D"); // -6
I want to get the last value that returns a number >= 0.20 (in the example, that would be "B").
This should be done with the minimum number of class invocations, to avoid wasting time.
I thought of something like this: split $array into 2 pieces and calculate the number for the value in the middle; if it is >= 0.20, split the last part of $array into 2 smaller pieces, and so on. But I don't know if this would work.
How would you solve this?
Thanks in advance.
What you're describing is called a binary search. The textbook version doesn't apply as-is, because you aren't searching for a known value. Rather, you're searching for the lowest number >= 0.2 in a set where the exact value 0.2 may not exist (if it were guaranteed to exist, you could simply do a binary search for 0.2 and take the letter at that position). Since your results are sorted, you would need a variant that narrows in on the boundary instead of on an exact match.
If your range is always A-Z, a simple linear search would definitely be the easiest method. The time savings on a data set of 26 elements for using a more efficient method is negligible (talking milliseconds here), compared to implementation time.
Edit: I see you actually mentioned 52 elements, not 26. My point is still the same, though. The number of elements would need to be in the tens of thousands or more for there to be any significant savings, unless you are performing this operation in a tight loop.
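For reference, here is a minimal sketch of a boundary-style binary search, assuming the results really are sorted in descending order along the array, as the question states. The function name lastAtLeast and the closure that stands in for $myclass->calculate() are hypothetical, and the sample scores are the four from the question.

```php
<?php
// Hypothetical helper: returns the last key whose score is >= $threshold,
// assuming the scores are in descending order along $keys.
function lastAtLeast(array $keys, callable $score, float $threshold): ?string
{
    $lo = 0;
    $hi = count($keys) - 1;
    $found = null;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($score($keys[$mid]) >= $threshold) {
            $found = $keys[$mid]; // candidate; a later key may still qualify
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;       // too small; the answer lies earlier
        }
    }
    return $found;
}

// Stand-in for $myclass->calculate(), counting how often it is invoked.
$scores = ['A' => 2.3, 'B' => 0.25, 'C' => -1.3, 'D' => -6];
$calls = 0;
$calculate = function ($key) use ($scores, &$calls) {
    $calls++;
    return $scores[$key];
};

$result = lastAtLeast(array_keys($scores), $calculate, 0.20);
echo $result, ' after ', $calls, " calls\n";
```

With 52 elements this kind of search needs at most six invocations, compared with up to 52 for a linear scan, so whether it is worth the effort depends on how expensive a single invocation is.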
In my PHP script I do a calculation on entries from a MySQL db. The fields concerned are defined as decimal(10,3). It's an accounting platform where I have to check that debit = credit in every entry.
I do this with the following operation:
$sumupNet = 0;
$sumup = 0;
foreach($val['Record'] as $subkey => $subval)
{
$sumupNet = $sumupNet + $subval['lc_amount_net'];
$sumup = $sumup + $subval['lc_amount_debit'] - $subval['lc_amount_credit'];
}
Now, say every entry is correct; then $sumupNet and $sumup should both be 0. In most cases, this works. But in some cases the result is something like -1.4432899320127E-15 or -8.8817841970013E-15. If I calculate these values manually, the result is 0. I guess (not sure) that the above results are numbers near 0 that are output in exponential notation.
So I think I have to convert something, or my calculation is wrong. But what? I tried floatval() at some points, but it didn't work. If anybody has a hint, I'd be very grateful.
You're getting this because you are doing math with floating-point values. Read some theory about it.
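To see the effect in isolation, here is a small sketch using the classic 0.1 + 0.2 example rather than the asker's data. The isZero helper and its tolerance of 1e-9 are assumptions chosen for illustration; a tolerance check only hides the symptom and does not make the arithmetic exact.

```php
<?php
// 0.1, 0.2 and 0.3 have no exact binary representation,
// so the sum below is tiny but not exactly zero.
$sum = 0.1 + 0.2 - 0.3;
var_dump($sum == 0.0); // bool(false)

// Hypothetical helper: treat anything smaller than a tolerance as zero.
function isZero(float $x, float $epsilon = 1e-9): bool
{
    return abs($x) < $epsilon;
}

var_dump(isZero($sum));                 // bool(true)
var_dump(isZero(-1.4432899320127E-15)); // bool(true)
```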
You really don't want to calculate money like that as you might get weird rounding problems that you can't really do anything to fix.
For PHP, there are plenty of libraries that help you evade the problem, such as BC Math, or GMP.
Another solution would be to calculate all of the values in the smallest monetary unit the currency has (like cents), so you are always using integers.
These are rounding problems. These are perfectly normal when we are talking about floats. To give you an everyday example,
1/3 = 0.3333333333333333...333333333...3333...
Reason: 10 is relatively prime to 3. You might wonder where the 10 comes from. We use base 10 for numbers; that is, the digits of a number represent powers of 10. The computer works with binary, that is, base-2 numbers. This means that division often results in an endless sequence of digits. For instance, 1/3 as a binary number looks like this:
0.010101010101010101010101010101010101010101010101010101...
Decimal types represent decimal, that is, base-10 numbers. You use three digits for the part after the decimal point. Let's suppose your number ends like this:
.xyz
this means:
xyz / 1000
However, 1000 has the following prime factors:
2 and 5.
Since 5 is relatively prime to 2, whenever you represent the result of a division by 5 as a binary number, the result may be an endless cycle of digits. 1/5 as a binary number looks like this:
0.0011001100110011001100110011001100110011001100110011...
Since a computer cannot store endless digits, it has to round the number, that is, find a nearby value that it can represent. If the number a is rounded to b and the two are not equal, a certain amount of precision is lost, and this is the cause of the bug you have mentioned.
You can solve the problem as follows: when you select the values from the database, multiply them by 1000 and round to the nearest integer (thus converting them into integers), and then do the calculations. At the end, divide by 1000.
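A minimal sketch of that idea, assuming the database driver returns the decimal(10,3) columns as strings. The toThousandths helper and the three sample rows are made up for illustration; only the column names come from the question.

```php
<?php
// Hypothetical helper: convert a decimal(10,3) string such as "12.345"
// to an integer number of thousandths.
function toThousandths(string $amount): int
{
    // round() removes the tiny binary error introduced by the multiplication.
    return (int) round((float) $amount * 1000);
}

// Made-up sample rows using the column names from the question.
$records = [
    ['lc_amount_debit' => '0.100', 'lc_amount_credit' => '0.000'],
    ['lc_amount_debit' => '0.200', 'lc_amount_credit' => '0.000'],
    ['lc_amount_debit' => '0.000', 'lc_amount_credit' => '0.300'],
];

$sumup = 0;
foreach ($records as $record) {
    $sumup += toThousandths($record['lc_amount_debit'])
            - toThousandths($record['lc_amount_credit']);
}

var_dump($sumup === 0); // bool(true): integer addition is exact
```

Because the running total is an integer, it can be compared with 0 directly. Divide by 1000 only when a value has to be displayed.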
I am using an application that collects price data and calculates sensible buying and selling prices each time data is retrieved. Now it can happen that the numbers are way too high or way too low because of how the system works. I can't do anything about this.
Now my question is, if I have an array of number like:
$prices = ['300','312','293','298','1025','12'];
What would be a good algorithm to get rid of the 12 and the 1025? Note that a really high number appears far more often than a really low number, so simply taking an average doesn't work.
I thought about taking an average of the whole array, looping through the array, calculating the percentage difference for each item and checking whether it is under a threshold, but I thought that this wouldn't be as accurate as I would like.
Have you thought about absolute numbers?
If I understood you correctly, there are multiple price lists, so the average valid price could differ: it could be around 1000 for some lists and around 300 for others, as in your example. My suggested algorithm works for both. You did not say whether prices would always be as close together as in the examples, or whether the spread could be larger when the price is higher.
I will split my answer into four parts. The first part applies to both situations (the price difference is low at low values and high at high values). The second part is useful if the price difference increases as the average valid price increases. The third part is the whole algorithm, showing how to wrap it all together. The last part covers what to do on the first run.
Part 1: Finding a value for validation processing
You say that you have a list of these numbers and that new data is retrieved all the time. What I would suggest is that you subtract two numbers from each other and take the absolute value.
Example:
|300-312|=12
With the number 12 we can conclude that both of these prices are in the valid price range. Now let's take 2 other examples, one where both values are invalid and one where only one is invalid.
Example:
|1025-12|=1013
We can see that 1013 is nowhere near a normal difference between prices in this list. Since both values are invalid, we have to test each of them against a valid price. The algorithm will then remove them both.
Example:
|300-12|=288
We can see that 288 isn't a valid difference either; the algorithm will remove 12.
Part 2: validating a price with varying price differences
If you have lists where the average price could differ by 400, a fixed difference of -50 to +50 will give you bugs in your algorithm. You therefore need a scalable way to determine the allowed difference, so that higher numbers can have larger price differences.
If the absolute difference is higher than 20% (or another percentage) of the average of the two numbers, they need further validation.
Example:
(300+312)/2=306 is the average.
306*0.2=61.2
If you have stored the highest and lowest valid numbers, you can use 20% of their average as the threshold.
(293+312)/2=302.5
302.5*0.2=60.5
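The pairwise check from parts 1 and 2 can be sketched as follows. The function name withinThreshold and the default ratio of 0.2 are assumptions taken from the 20% suggestion above, not a fixed rule.

```php
<?php
// Hypothetical helper: true when two prices differ by no more than
// $ratio (20% by default) of their average.
function withinThreshold(float $a, float $b, float $ratio = 0.2): bool
{
    $threshold = ($a + $b) / 2 * $ratio;
    return abs($a - $b) <= $threshold;
}

var_dump(withinThreshold(300, 312)); // bool(true):  12 <= 61.2
var_dump(withinThreshold(1025, 12)); // bool(false): 1013 > 103.7
var_dump(withinThreshold(300, 12));  // bool(false): 288 > 31.2
```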
Part 3: wrapping it all up and making an algorithm
So the first thing you should do is determine the amount of data in each list, the number of lists, and how often you receive data. The more data you have and the more often you receive it, the more reasonable it is to index your data. What I would suggest is that for each list you save the highest and lowest valid number. If you don't have that much data, you can skip this part and look at part 4, as you can basically run the algorithm against the whole list each time you receive new data.
First add 4 values to a list: min price, max price, average price and threshold. The average price is (max price+min price)/2. After this you can use a percentage of the average price to determine a threshold for your prices. I suggest 20%, since it will result in a number close to the number you use, which is 50; find the threshold by multiplying the average by 0.2.
Depending on your data, you can also choose to base the threshold on 20% of the average of the min value, the max value and the new number ((min+max+new)/3*0.2). You can change this calculation if the difference should ever change.
When you receive new numbers, your algorithm should check the absolute difference against the threshold.
If new numbers arrive at a low frequency, I would suggest this:
ProcessNumber(var value)
{
    // Depending on how many numbers you want to be valid, you can change the threshold.
    // Using MaxValue here allows the maximum to change if the new number is valid but higher than the max value.
    if(absoluteValue(MinValue-value)<=MaxValue*0.2)
    {
        addNumber(value);
    }
    else
    {
        deleteNumber(value);
    }
}
If new numbers are retrieved very often, you can process two numbers at once. If odd (invalid) numbers occur about 1/3 of the time, I'd suggest the method above instead.
ProcessNumbers(var value1, var value2)
{
    if(absoluteValue(value1-value2)<=threshold) // use <= if you want the threshold value itself to be valid too
    {
        // if you have a method to add them
        addNumber(value1);
        addNumber(value2);
        return true;
    }
    else if(checkNumber(value1)) // returns true if valid
    {
        // we now know value1 is valid; because the pair check failed, value2 must be the invalid one
        deleteNumber(value2);
        addNumber(value1);
    }
    else if(checkNumber(value2))
    {
        // we now know value2 is valid
        deleteNumber(value1);
        addNumber(value2);
    }
    else
    {
        // we now know both values are invalid
        deleteNumber(value1);
        deleteNumber(value2);
    }
}
Part 4: first run
You will need an algorithm for the first run. If there currently are no invalid numbers and you didn't skip part 3, you can ignore this part.
For the first run you should group the numbers into sorted lists according to the threshold they fall under.
You take two numbers at a time and see if their absolute difference is below the threshold.
absolute = absoluteValue(value1-value2);
threshold = (value1+value2)/2*0.2;
if(absolute<threshold)
    AddToThreshold(threshold,value1,value2);
else
    AddToLater(value1,value2);
AddToLater fills a list of values you have to double-check, since you don't know whether value1, value2 or both caused the pair to end up in this list.
AddToThreshold makes sure that, if there is already a threshold group with a threshold higher than the one submitted, the values are added to that group.
Now you should have a few groups with thresholds. What you do now is take the lowest value of the lowest group and the lowest value of the highest group and check whether their absolute difference is below their threshold. You can then use this threshold to figure out whether other absolute differences are below it, and so separate the values from each other. Let's take your list and compare the lowest threshold with the highest absolute difference between two valid numbers.
Threshold:
(293+298)/2=295.5
295.5*0.2=59.1 (this is the threshold)
Highest possible absolute difference between 2 valid numbers:
|293-312|=19
This became a really long post, and I hope it gives you at least some inspiration. That much processing might not be necessary if you do not have that many lists; all of this might be overkill unless you are planning something scalable.
best of luck!
What you are describing is called outlier detection. There are statistical tests for this purpose. Beware anyway that nothing can guarantee 100% reliability.
http://en.wikipedia.org/wiki/Outlier#Identifying_outliers
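As one simple illustration (not a formal statistical test), the sketch below keeps only the prices that lie within a fixed percentage of the median. The median is used because, unlike the mean, it is barely moved by a few extreme values. The function names and the 20% ratio are assumptions, and the input is assumed to be non-empty.

```php
<?php
// Median of a non-empty list of numbers.
function median(array $values): float
{
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);
    return $n % 2 ? (float) $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Hypothetical filter: keep prices within $ratio of the median.
function dropOutliers(array $prices, float $ratio = 0.2): array
{
    $prices = array_map('floatval', $prices);
    $m = median($prices);
    return array_values(array_filter($prices, function ($p) use ($m, $ratio) {
        return abs($p - $m) <= $m * $ratio;
    }));
}

$prices = ['300', '312', '293', '298', '1025', '12'];
print_r(dropOutliers($prices)); // keeps 300, 312, 293 and 298
```

For the sample list the median is 299, so anything further than 59.8 away from it is dropped. The ratio would need tuning against real data.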
Can someone tell me about this performance issue?
I've got 2 arrays.
I need to pick 5 numbers from these 2 arrays and work on the logic.
The first array has 5 numbers, out of which I need to pick 3,
and the second array has 4 numbers, out of which I need to pick 2.
Taking this into consideration: 5C3 = 10 and 4C2 = 6,
which means 60 iterations for a single case.
Is my approach the right way?
Is there any performance issue with this type of iteration?
If you have to go through the whole array and pick numbers, then there is no optimization for that. The execution time depends on the size of the arrays: the bigger the arrays, the higher the execution time.
However, if you know that it will always be exactly 5 numbers from two rows whose elements will not change, then I think you could generate all the possible valid combinations, store them in a database or file, and return a random one (if a random choice is what you are looking for). In this case, you will achieve some optimization.
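To make the count concrete, here is a sketch that enumerates all 60 picks. The combinations function is a hypothetical helper written for this example, the array contents are made up, and both arrays are assumed to be plain zero-indexed lists.

```php
<?php
// Returns every way to choose $k items from $items (order within a choice is ignored).
function combinations(array $items, int $k): array
{
    if ($k === 0) {
        return [[]];
    }
    if (count($items) < $k) {
        return [];
    }
    $first = $items[0];
    $rest = array_slice($items, 1);
    // Choices that include the first item, followed by choices that skip it.
    $withFirst = array_map(function ($c) use ($first) {
        return array_merge([$first], $c);
    }, combinations($rest, $k - 1));
    return array_merge($withFirst, combinations($rest, $k));
}

$a = [1, 2, 3, 4, 5]; // made-up values
$b = [6, 7, 8, 9];

$picks = [];
foreach (combinations($a, 3) as $three) {
    foreach (combinations($b, 2) as $two) {
        $picks[] = array_merge($three, $two);
    }
}
echo count($picks); // 60
```

Sixty iterations is a trivial amount of work, so precomputing the list only pays off if the logic applied to each pick is itself expensive.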
This is more of a maths/general programming question, but I am programming with PHP, if that makes a difference.
I think the easiest way to explain is with an example.
Say the range is between 1 and 10.
I want to generate a number that is between 1 and 10 but is more likely to be low than high.
The only way I can think of is to generate an array with 10 elements equal to 1, 9 elements equal to 2, 8 elements equal to 3 ... 1 element equal to 10, and then generate a random number based on the number of elements.
The trouble is I am potentially dealing with 1 - 100000 and that array would be ridiculously big.
So how best to do it?
Generate a random number between 0 and a random number!
Generate a number between 1 and foo(n), where foo runs an algorithm over n (e.g. a logarithmic function). Then reverse foo() on the result.
Generate a number n with 0 <= n < 1, multiply it by itself, then multiply by x, run floor on it and add 1. Sorry, I used PHP too long ago to write code in it.
You could do
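// rand(0, 1) only returns the integers 0 or 1, so build floats between 0 and 1 instead
$rand = floor(100000 * (mt_rand() / mt_getrandmax()) * (mt_rand() / mt_getrandmax()));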
Or something along these lines
There are basically two (or more?) ways to map uniform density to any distribution function: Inverse transformation sampling and Rejection sampling. I think in your case you should use the former.
Quick and simple:
rand(1, rand(1, n))
What you need to do is generate a random number over a greater interval (preferably floating point), and map that into [1,10] in a nonuniform way. Exactly what way depends on how much more likely you want a 1 to be than a 9 or 10.
For C language solutions, see these libraries. You may find use for this in PHP.
Generally speaking, it looks like you want to draw a random number from a Poisson distribution rather than the [uniform distribution](http://en.wikipedia.org/wiki/Uniform_distribution_(continuous)). On the wiki page cited above there is a section which specifically states how you can use the continuous distribution to generate a pseudo-Poisson distribution... check it out. Note that you may want to test different values of λ to ensure the distribution works as you want it to.
It depends on what distribution you want to have exactly, i.e., what number should appear with what probability.
For instance, for even n you could do the following: generate one random integer x between 1 and n/2 and a second random integer y between 1 and n+1. If y > x, you return x; otherwise you return n-x+1. This should give you the distribution in your example.
I think this should give the requested distribution:
Generate a random number in the range 1 .. x. Generate another one in the range 1 .. x+1.
Return the minimum of the two.
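A sketch of this in PHP, with a quick empirical check. The function name skewedRandom, the fixed seed and the sample size are assumptions for illustration. Working through the probabilities, the value k comes out with weight n - k + 1, which matches the array described in the question.

```php
<?php
// Hypothetical helper: returns an integer in 1..$n where k has weight n - k + 1.
function skewedRandom(int $n): int
{
    return min(mt_rand(1, $n), mt_rand(1, $n + 1));
}

mt_srand(42); // fixed seed so the run is repeatable
$counts = array_fill(1, 10, 0);
for ($i = 0; $i < 55000; $i++) {
    $counts[skewedRandom(10)]++;
}
print_r($counts); // roughly 10000 ones, falling to roughly 1000 tens
```

No array of weights is needed, so this works just as well for a range of 1 to 100000.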
Let's think about how your array idea changes the probabilities. Normally every element from 1 to n has a probability of 1/n and is thus equally likely.
Since you have n entries for 1, n-1 entries for 2 ... 1 entry for n, the total number of entries is an arithmetic series. The sum of an arithmetic series counting from 1 to n is n(n+1)/2. So now we know every element's probability should use that as the denominator.
Element 1 has n entries, so its probability is n / (n(n+1)/2). Element 2 is (n-1) / (n(n+1)/2) ... and element n is 1 / (n(n+1)/2). That gives a general formula for the numerator of n-i+1, where i is the number you are checking. That means we now have a function for the probability of any element: (n-i+1) / (n(n+1)/2). All probabilities are between 0 and 1 and sum to 1 by definition, and that is key to the next step.
How can we use this function to skew the number of times an element appears? It's easier with continuous distributions (i.e. doubles instead of ints), but we can do it. First let's make an array of our probabilities, call it c, then make a running sum of them (cumsum) and store it back in c. If that doesn't make sense, it's just a loop like
for(j=1; j < n; j++)
    c[j]+=c[j-1]
Now that we have this cumulative distribution, generate a number i from 0 to 1 (a double, not an int). If i is between 0 and c[0], return 1; if i is between c[0] and c[1], return 2; and so on all the way up to n, e.g.
for(j=0; j < n; j++)
    if(i <= c[j]) return j+1
This will distribute the integers according to the probabilities you have calculated.
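Put together in PHP, the cumulative approach might look like the sketch below. The function names cumulativeWeights and pick are hypothetical, and the final return in pick is a guard I have added in case rounding leaves the last running sum just below 1.

```php
<?php
// Builds the running sum of P(i) = (n - i + 1) / (n(n + 1) / 2) for i = 1..n.
function cumulativeWeights(int $n): array
{
    $total = $n * ($n + 1) / 2;
    $c = [];
    $running = 0.0;
    for ($i = 1; $i <= $n; $i++) {
        $running += ($n - $i + 1) / $total;
        $c[] = $running;
    }
    return $c;
}

// Maps a uniform value $u in [0, 1] to an integer 1..n via the cumulative array.
function pick(array $c, float $u): int
{
    foreach ($c as $j => $limit) {
        if ($u <= $limit) {
            return $j + 1;
        }
    }
    return count($c); // rounding guard, see the note above
}

$c = cumulativeWeights(10);
echo pick($c, mt_rand() / mt_getrandmax());
```

The array only has n entries rather than n(n+1)/2, and for a large n the linear scan in pick could be replaced with a binary search over $c.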
<?php
//get random number between 1 and 10,000
$random = mt_rand(1, 10000);
?>
I'm trying to write a function in PHP that gets all permutations of all possible sizes. I think an example would be the best way to start off:
$my_array = array(1,1,2,3);
Possible permutations of varying size:
1
1 // * See Note
2
3
1,1
1,2
1,3
// And so forth, for all the sets of size 2
1,1,2
1,1,3
1,2,1
// And so forth, for all the sets of size 3
1,1,2,3
1,1,3,2
// And so forth, for all the sets of size 4
Note: I don't care if there's a duplicate or not. For the purposes of this example, all future duplicates have been omitted.
What I have so far in PHP:
function getPermutations($my_array){
$permutation_length = 1;
$keep_going = true;
while($keep_going){
while($there_are_still_permutations_with_this_length){
// Generate the next permutation and return it into an array
// Of course, the actual important part of the code is what I'm having trouble with.
}
$permutation_length++;
if($permutation_length>count($my_array)){
$keep_going = false;
}
else{
$keep_going = true;
}
}
return $return_array;
}
The closest thing I can think of is shuffling the array, picking the first n elements, seeing if it's already in the results array, and if it's not, add it in, and then stop when there are mathematically no more possible permutations for that length. But it's ugly and resource-inefficient.
Any pseudocode algorithms would be greatly appreciated.
Also, for super-duper (worthless) bonus points, is there a way to get just 1 permutation with the function but make it so that it doesn't have to recalculate all previous permutations to get the next?
For example, I pass it a parameter 3, which means it's already done 3 permutations, and it just generates number 4 without redoing the previous 3? (Passing it the parameter is not necessary, it could keep track in a global or static).
The reason I ask this is because as the array grows, so does the number of possible combinations. Suffice it to say that one small data set with only a dozen elements grows quickly into the trillions of possible combinations and I don't want to task PHP with holding trillions of permutations in its memory at once.
Sorry no php code, but I can give you an algorithm.
It can be done with small amounts of memory and since you don't care about dupes, the code will be simple too.
First: Generate all possible subsets.
If you view the subset as a bit vector, you can see that there is a 1-1 correspondence between subsets and binary numbers.
So if your array has 12 elements, you will have 2^12 subsets (including the empty set).
So to generate the subsets, you start with 0 and keep incrementing until you reach 2^12 - 1. At each stage you read the set bits in the number to get the corresponding subset of the array.
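The subset step can be sketched in PHP with a generator, so that only one subset is held in memory at a time. The function name subsets is hypothetical and the input is assumed to be a zero-indexed list; permuting each subset is left out here.

```php
<?php
// Yields every subset of $items, one per bit pattern from 0 to 2^n - 1.
function subsets(array $items): Generator
{
    $n = count($items);
    for ($mask = 0; $mask < (1 << $n); $mask++) {
        $subset = [];
        for ($bit = 0; $bit < $n; $bit++) {
            if ($mask & (1 << $bit)) { // bit set: include that element
                $subset[] = $items[$bit];
            }
        }
        yield $subset;
    }
}

foreach (subsets([1, 1, 2, 3]) as $subset) {
    echo implode(',', $subset), "\n"; // the first line is empty (the empty set)
}
```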
Once you get one subset, you can now run through its permutations.
The next permutation (of the array indices, not the elements themselves) can be generated in lexicographic order like here: http://www.de-brauwer.be/wiki/wikka.php?wakka=Permutations and can be done with minimal memory.
You should be able to combine these two to give yourself a next_permutation function. Instead of passing in numbers, you could pass in an array of 12 elements which contains the previous permutation, plus possibly some more info (little memory again) on whether you need to go to the next subset etc.
You should actually be able to find very fast algorithms which use minimal memory, provide a next_permutation type feature and do not generate dupes: Search the web for multiset permutation/combination generation.
Hope that helps. Good luck!
The best set of functions I've come across was the one provided by a user in the comments on the shuffle function on php.net. Here is the link. It works pretty well.
Hope it's useful.
The problem seems to be trying to give an index to every permutation and having a constant access time. I cannot think of a constant time algorithm, but maybe you can improve this one to be so. This algorithm has a time complexity of O(n) where n is the length of your set. The space complexity should be reducible to O(1).
Assume our set is 1,1,2,3 and we want the 10th permutation. Also, note that we will index each element of the set from 0 to 3. Going by your order, this means the single element permutations come first, then the two element, and so on. We are going to subtract from the number 10 until we can completely determine the 10th permutation.
First up are the single element permutations. There are 4 of those, so we can view this as subtracting one four times from 10. We are left with 6, so clearly we need to start considering the two element permutations. There are 12 of these, and we can view this as subtracting three up to four times from 6. We discover that the second time we subtract 3, we are left with 0. This means the indexes of our permutation must be 2 (because we subtracted 3 twice) and 0, because 0 is the remainder. Therefore, our permutation must be 2,1.
Division and modulus may help you.
If we were looking for the 12th permutation, we would run into the case where we have a remainder of 2. Depending on your desired behavior, the permutation 2,2 might not be valid. Getting around this is very simple, however, as we can trivially detect that the indexes 2 and 2 (not to be confused with the element) are the same, so the second one should be bumped to 3. Thus the 12th permutation can trivially be calculated as 2,3.
The biggest confusion right now is that the indexes and the element values happen to match up. I hope my algorithm explanation is not too confusing because of that. If it is, I will use a set other than your example and reword things.
Inputs: Permutation index k, indexed set S.
Pseudocode:
L = {S_1}
for i = 2 to |S| do
Insert S_i before L_{k % i}
k <- k / i
loop
return L
This algorithm can also be easily modified to work with duplicates.
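One possible reading of the pseudocode in PHP is sketched below. It makes several assumptions: k is zero-based and runs from 0 to n! - 1, list positions are zero-based, an insert position equal to the current list length means "append", and only full-length permutations are produced. The function name permutationByIndex is hypothetical.

```php
<?php
// Builds the permutation with index $k by inserting each element of $set
// at the position given by the next mixed-radix digit of $k.
function permutationByIndex(array $set, int $k): array
{
    $list = [$set[0]];
    for ($i = 2; $i <= count($set); $i++) {
        // k % i picks one of the i possible insert positions (0..i-1).
        array_splice($list, $k % $i, 0, [$set[$i - 1]]);
        $k = intdiv($k, $i);
    }
    return $list;
}

for ($k = 0; $k < 6; $k++) {
    echo $k, ': ', implode(',', permutationByIndex(['a', 'b', 'c'], $k)), "\n";
}
```

Each index yields a different ordering, and any single permutation can be computed without generating the ones before it, which is what the bonus question asked for.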