Make an if-statement understand which "time-left" format is bigger - php

I am trying to make an IF-statement that can calculate which "time-left" is bigger.
This is my code.
if('6h : 27m : 45s' > '6h : 27m : 15s') {
echo 'Timer to the left is bigger';
}
This code works fine, it actually prints Timer to the left is bigger.
However, when any time value (hours, minutes or seconds) has only one digit instead of two, it doesn't work. Like this:
if('6h : 27m : 45s' > '6h : 27m : 5s') {
echo 'Timer to the left is bigger';
}
This time it does not print Timer to the left is bigger, even though the timer on the right has only 5s left, which is even less than 15s. The seconds value has only one digit instead of two, and the IF-statement doesn't understand that.
How can I parse this to an understandable format for the if-statement?
NOTE: the string comes from an API, so I can't edit it in any way.

PHP's version_compare() function is like strcmp(), but it compares each run of digits as a single number. It has some side effects because it (virtually) converts certain characters, but I don't think that matters here:
So
if('6h : 27m : 45s' > '6h : 27m : 15s')
becomes
if( version_compare('6h : 27m : 45s','6h : 27m : 15s') > 0 )
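To see why the plain comparison fails on one-digit values while version_compare() does not, here is a minimal sketch using the two strings from the question. The variable names are only illustrative.

```php
<?php
$a = '6h : 27m : 45s';
$b = '6h : 27m : 5s';

// Plain string comparison goes character by character,
// and the character '4' sorts before '5', so $a is NOT greater.
$plain = $a > $b;

// version_compare() treats each run of digits as one number, so 45 > 5.
$versioned = version_compare($a, $b) > 0;

var_dump($plain, $versioned);
```

The first result is false and the second is true, which matches the behaviour described in the question.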

Your post suggests that all three time units are always available. So just create two arrays of numbers however you wish. PHP7's spaceship operator will make numeric comparisons when handling two numeric values. It will compare the elements from left to right as you require.
No time parsing is necessary because the time units are ordered from greater to lesser.
Code: (Demo)
$left = '6h : 27m : 5s';
$right = '6h : 27m : 15s';
$lookup = [
-1 => 'less',
0 => 'equal',
1 => 'more'
];
$lArray = preg_split('/\D+/', $left, 0, PREG_SPLIT_NO_EMPTY);
$rArray = preg_split('/\D+/', $right, 0, PREG_SPLIT_NO_EMPTY);
$comparison = $lArray <=> $rArray;
echo $comparison . ' -> ' . $lookup[$comparison];
Output:
-1 -> less
You may write a conditional check for -1 if that is all that you require. The lookup was purely to clarify the demonstration.
In other words:
if ((preg_split('/\D+/', $left, 0, PREG_SPLIT_NO_EMPTY) <=> preg_split('/\D+/', $right, 0, PREG_SPLIT_NO_EMPTY)) == -1) {
// left is less than right
}
Or use a non-regex alternative that will cast each matched number as an integer then compare the two generated arrays containing three integers each. (Demo)
$format = '%dh : %dm : %ds'; // must mirror the input, including the colon before the seconds
$comparison = sscanf($left, $format) <=> sscanf($right, $format);
I just noticed Wiimm's outside-the-box call of version_compare() to parse your time expressions (+1 from me, but I think it deserves more explanation). While this is not its intended use, it is a clever and reliable non-regex technique here, because your time-unit substrings (h, m, s) are not in the list of development-stage substrings that hold special meaning, and they are uniformly positioned in both strings. This means that even if you had non-numeric substrings with special meaning, the comparison would still be decided solely by the numeric values.
Special substrings in order:
dev
alpha or a
beta or b
RC or rc
#
pl or p
This function is a three-way-comparing tool - like the spaceship operator. It also accepts a third parameter to command that it returns a boolean response. Using gt will return true if the first string is "greater than" the second string. While the third parameter is an abbreviation, I find it more intuitive than checking for -1.
Code: (Demo)
if (version_compare($left, $right, 'gt')) {
echo 'left is greater than right';
} else {
echo 'left is less than or equal to right';
}
Output:
left is less than or equal to right

It's not the prettiest way of doing it, but since you get the data from an API you can't modify, you'll have to work with the data you've got.
There are several ways of doing it. One way is to convert the time into seconds and compare that. Another is to convert the times into objects and work with those. The "gruntwork" is the same: you have to split the data into hours, minutes and seconds, which is more usable than the format Xh : Ym : Zs.
Here we split each time into three pieces on the delimiter :. This gets us the individual parts, but we still need to fetch the integer value, which is what we're interested in. We can grab that by using intval(), as the number comes first. Once we have the three pieces, we can either put them into an HH:mm:ss format, or multiply them so that everything is in seconds and add them together (first is hours, second is minutes, third is seconds). Here we're not doing that; we glue them together in a readable format and use strtotime() to convert them into a timestamp, which we finally, after all this work, can compare!
$time_left = '6h : 27m : 45s';
$time_right = '6h : 27m : 5s';
$parts_left = explode(":", $time_left);
$parts_right = explode(":", $time_right);
$left = intval($parts_left[0]).":".intval($parts_left[1]).":".intval($parts_left[2]);
$right = intval($parts_right[0]).":".intval($parts_right[1]).":".intval($parts_right[2]);
if (strtotime($left) > strtotime($right)) {
echo 'Timer to the left is bigger';
}
Live demo at https://3v4l.org/uUQVh
According to your edited comment, the hours can be greater than 24 - in which case you need to convert the hours and minutes to seconds, and sum that up.
$time_left = '6h : 27m : 45s';
$time_right = '6h : 27m : 5s';
$parts_left = explode(":", $time_left);
$parts_right = explode(":", $time_right);
$left = intval($parts_left[0]) * 60*60 + intval($parts_left[1])*60 + intval($parts_left[2]);
$right = intval($parts_right[0]) * 60*60 + intval($parts_right[1])*60 + intval($parts_right[2]);
if ($left > $right) {
echo "Timer to the left is bigger";
}
Live demo at https://3v4l.org/DaINX
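The seconds conversion above can also be wrapped in a small reusable function. This is only a sketch: the function name time_left_to_seconds() is made up for illustration, and it assumes the API always sends all three units in the "Xh : Ym : Zs" shape shown in the question.

```php
<?php
// Hypothetical helper: convert a string like "6h : 27m : 5s" into total seconds.
function time_left_to_seconds(string $timeLeft): int
{
    // Capture the three digit runs; whitespace around the colons is optional.
    if (!preg_match('/^\s*(\d+)h\s*:\s*(\d+)m\s*:\s*(\d+)s\s*$/', $timeLeft, $m)) {
        throw new InvalidArgumentException('Unexpected time format: ' . $timeLeft);
    }
    return (int) $m[1] * 3600 + (int) $m[2] * 60 + (int) $m[3];
}

if (time_left_to_seconds('6h : 27m : 45s') > time_left_to_seconds('6h : 27m : 5s')) {
    echo 'Timer to the left is bigger';
}
```

Because everything ends up as a plain integer, hours above 24 need no special handling.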

Related

similar_text - string / integer comparing

it's my first post here, so welcome everyone!
I'm trying to write a rule to protect my website against flooding by users posting its content. I decided to use the similar_text() function in PHP to compare strings (the last string added by the user and the one being added at the moment), calculate the similarity (%), and if the result is too high (more than 90% similar) the script will not add a record to the database.
Here is what I have:
similar_text($last_record, $new_record, $sim);
$similarity = (int) number_format($sim, 0);
if ($similarity < 90)
{
// add the record
}
else
{
// dont add anything
}
The problem is with this: if ($similarity < 90). I format the number and then convert it from string to int value, but the script doesn't care.
When I use quotes it works: if ($similarity < "90"). The question is why the script doesn't work when I use the value as an integer and does work when I use it as a string?
number_format returns a string where you need an int. So a string comparison to "90" works, but an int comparison fails. You can use intval() to convert to an int.
Also, I'm wondering if maybe you have something else wrong. I took your code sample and ran it locally, and it seems to work just fine even without swapping (int) for intval().
With the following values:
$last_record = "asdfasdfasdf";
$new_record = "aasdfasdfasdf";
$similarity is 96 and the greater than section of the if triggers.
With these values:
$last_record = "asdfasdfasdf";
$new_record = "fffdfasdfasdf";
$similarity is 80 and the less than section of the if triggers.
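For what it's worth, the percentage that similar_text() writes into its third argument is already a float, so it can be compared to 90 directly without number_format(). A minimal sketch, where is_too_similar() is a made-up helper name and the 90% threshold is the one from the question:

```php
<?php
// Hypothetical helper: true when the new record is at least $threshold percent
// similar to the previous one.
function is_too_similar(string $last, string $new, float $threshold = 90.0): bool
{
    // similar_text() fills $percent with a float between 0 and 100.
    similar_text($last, $new, $percent);
    return $percent >= $threshold;
}

var_dump(is_too_similar('asdfasdfasdf', 'aasdfasdfasdf')); // the 96% pair from above
var_dump(is_too_similar('asdfasdfasdf', 'fffdfasdfasdf')); // the 80% pair from above
```

Keeping the value numeric from start to finish avoids the string/int question entirely.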

How to get number of digits in both right, left sides of a decimal number

I wonder if there is a good way to get the number of digits on the right/left side of a decimal number in PHP. For example:
12345.789 -> RIGHT SIDE LENGTH IS 3 / LEFT SIDE LENGTH IS 5
I know it is readily attainable with string functions and by exploding the number. I mean, is there a mathematical or programmatic way to do it that is better than string manipulation?
Your answers would be greatly appreciated.
Update
The best solution for left side till now was:
$left = floor(log10($x))+1;
but it is still not sufficient for the right side.
Still waiting ...
To get the digits on the left side you can do this:
$left = floor(log10($x))+1;
This uses the base 10 logarithm to get the number of digits.
The right side is harder. A simple approach would look like this, but due to floating point numbers, it would often fail:
$decimal = $x - floor($x);
$right = 0;
while (floor($decimal) != $decimal) {
$right++;
$decimal *= 10; //will bring in floating point 'noise' over time
}
This will loop through multiplying by 10 until there are no digits past the decimal. That is tested with floor($decimal) != $decimal.
However, as Ali points out, giving it the number 155.11 (a hard-to-represent number in binary) results in an answer of 14. This is because the number is stored as something like 155.11000000000001 in the 64-bit double-precision float that PHP uses.
So instead, a more robust solution is needed. (PopoFibo's solution is particularly elegant, and uses PHP's inherent float comparison functions well.)
The fact is, we can never distinguish between an input of 155.11 and 155.11000000000001. We will never know which number was originally given. They will both be represented the same. However, if we define the number of zeroes that we can see in a row before we just decide the decimal is 'done', then we can come up with a solution:
$x = 155.11; //the number we are testing
$LIMIT = 10; //number of zeroes in a row until we say 'enough'
$right = 0; //number of digits we've checked
$empty = 0; //number of zeroes we've seen in a row
while (floor($x) != $x) {
$right++;
$base = floor($x); //so we can see what the next digit is;
$x *= 10;
$base *= 10;
$digit = floor($x) - $base; //the digit we are dealing with
if ($digit == 0) {
$empty += 1;
if ($empty == $LIMIT) {
$right -= $empty; //don't count all those zeroes
break; // exit the loop, we're done
}
} else {
$empty = 0; // a non-zero digit resets the run of zeroes
}
}
This should find the solution given the reasonable assumption that 10 zeroes in a row means any other digits just don't matter.
However, I still like PopoFibo's solution better, as without any multiplication, PHP's default comparison functions effectively do the same thing, without the messiness.
I am lost on PHP semantics big time but I guess the following would serve your purpose without the String usage (that is at least how I would do in Java but hopefully cleaner):
Working code here: http://ideone.com/7BnsR3
Non-string solution (only Math)
Left side is resolved hence taking the cue from your question update:
$value = 12343525.34541;
$left = floor(log10($value))+1;
echo($left);
$num = floatval($value);
$right = 0;
while($num != round($num, $right)) {
$right++;
}
echo($right);
Prints
85
8 for the LHS and 5 for the RHS.
Because I'm using floatval(), 155.0 yields 0 digits on the right-hand side, which I think is valid; if you need to count that trailing zero, string functions can handle it.
php > $num = 12345.789;
php > $left = strlen(floor($num));
php > $right = strlen($num - floor($num));
php > echo "$left / $right\n";
5 / 16 <--- 16 digits, huh?
php > $parts = explode('.', $num);
php > var_dump($parts);
array(2) {
[0]=>
string(5) "12345"
[1]=>
string(3) "789"
}
As you can see, floats aren't the easiest to deal with... Doing it "mathematically" leads to bad results. Doing it by strings works, but makes you feel dirty.
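If the number is available as a string before it ever becomes a float (from a form field or a database column, say), the string route sidesteps the floating point noise completely. A minimal sketch under that assumption; digit_lengths() is a made-up name:

```php
<?php
// Hypothetical helper: count the digits on each side of the decimal point.
// Assumes $number is a plain decimal string such as "12345.789" (no exponent).
function digit_lengths(string $number): array
{
    $number = ltrim($number, '+-');           // a sign is not a digit
    $parts = explode('.', $number, 2);        // [integer part, fraction part?]
    $left = strlen($parts[0]);
    $right = isset($parts[1]) ? strlen($parts[1]) : 0;
    return [$left, $right];
}

list($left, $right) = digit_lengths('12345.789');
echo "$left / $right\n"; // 5 / 3
```

Trailing zeroes such as in "155.10" are counted as written, which a float could never tell you.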
$number = 12345.789;
list($whole, $fraction) = sscanf($number, "%d.%d");
This will always work, even if $number is an integer and you’ll get two real integers returned. Length is best done with strlen() even for integer values. The proposed log10() approach won't work for 10, 100, 1000, … as you might expect.
// 5 - 3
echo strlen($whole) , " - " , strlen($fraction);
If you really, really want to get the length without calling any string function here you go. But it's totally not efficient at all compared to strlen().
/**
* Get integer length.
*
* @param integer $integer
* The integer to count.
* @param boolean $count_zero [optional]
* Whether 0 is to be counted or not, defaults to FALSE.
* @return integer
* The integer's length.
*/
function get_int_length($integer, $count_zero = false) {
// 0 would be 1 in string mode! Highly depends on use case.
if ($count_zero === false && $integer === 0) {
return 0;
}
return floor(log10(abs($integer))) + 1;
}
// 5 - 3
echo get_int_length($whole) , " - " , get_int_length($fraction);
The above will correctly count the result of 1 / 3, but be aware that the precision is important.
$number = 1 / 3;
// Above code outputs
// string : 1 - 10
// math : 0 - 10
$number = bcdiv(1, 3);
// Above code outputs
// string : 1 - 0 <-- oops
// math : 0 - INF <-- 8-)
No problem there.
I would like to apply a simple logic.
<?php
$num=12345.789;
$num_str="".$num; // Converting number to string
$array=explode('.',$num_str); //Explode number (String) with .
echo "Left side length : ".intval(strlen($array[0])); // $array[0] contains the left-hand side; check its string length
echo "<br>";
if(sizeof($array)>1)
{
echo "Right side length : ".intval(strlen($array[1])); // $array[1] contains the right-hand side; check its string length
}
?>

Can the for loop be eliminated from this piece of PHP code?

I have a range of whole numbers that might or might not have some numbers missing. Is it possible to find the smallest missing number without using a loop structure? If there are no missing numbers, the function should return the maximum value of the range plus one.
This is how I solved it using a for loop:
$range = [0,1,2,3,4,6,7];
// sort just in case the range is not in order
asort($range);
$range = array_values($range);
$first = true;
for ($x = 0; $x < count($range); $x++)
{
// don't check the first element
if ( ! $first )
{
if ( $range[$x - 1] + 1 !== $range[$x])
{
echo $range[$x - 1] + 1;
break;
}
}
// if we're on the last element, there are no missing numbers
if ($x + 1 === count($range))
{
echo $range[$x] + 1;
}
$first = false;
}
Ideally, I'd like to avoid looping completely, as the range can be massive. Any suggestions?
Algo solution
There is a way to check if there is a missing number using an algorithm. It's explained here. Basically if we need to add numbers from 1 to 100. We don't need to calculate by summing them we just need to do the following: (100 * (100 + 1)) / 2. So how is this going to solve our issue ?
We're going to get the first element of the array and the last one. We calculate the sum with this algo. We then use array_sum() to calculate the actual sum. If the results are the same, then there is no missing number. We could then "backtrack" the missing number by subtracting the actual sum from the calculated one. This of course only works if there is only one number missing and will fail if there are several missing. So let's put this in code:
$range = range(0,7); // Creating an array
echo check($range) . "\r\n"; // check
unset($range[3]); // unset offset 3
echo check($range); // check
function check($array){
if($array[0] == 0){
unset($array[0]); // get rid of the zero
}
sort($array); // sorting
$first = reset($array); // get the first value
$last = end($array); // get the last value
$sum = ($last * ($last + 1)) / 2; // the algo: sum of 1..$last, so a missing 1 is detected too
$actual_sum = array_sum($array); // the actual sum
if($sum == $actual_sum){
return $last + 1; // no missing number
}else{
return $sum - $actual_sum; // missing number
}
}
Output
8
3
Online demo
If there are several numbers missing, then just use array_map() or something similar to do an internal loop.
Regex solution
Let's take this to a new level and use regex ! I know it's nonsense, and it shouldn't be used in real world application. The goal is to show the true power of regex :)
So first let's make a string out of our range in the following format: I,II,III,IIII for the range 1 to 4.
$range = range(0,7);
if($range[0] === 0){ // get rid of 0
unset($range[0]);
}
$str = implode(',', array_map(function($val){return str_repeat('I', $val);}, $range));
echo $str;
The output should be something like: I,II,III,IIII,IIIII,IIIIII,IIIIIII.
I've come up with the following regex: ^(?=(I+))(^\1|,\2I|\2I)+$. So what does this mean ?
^ # match begin of string
(?= # positive lookahead, we use this to not "eat" the match
(I+) # match I one or more times and put it in group 1
) # end of lookahead
( # start matching group 2
^\1 # match begin of string followed by what's matched in group 1
| # or
,\2I # match a comma, with what's matched in group 2 (recursive !) and an I
| # or
\2I # match what's matched in group 2 and an I
)+ # repeat one or more times
$ # match end of line
Let's see what's actually happening ....
I,II,III,IIII,IIIII,IIIIII,IIIIIII
^
(I+) do not eat but match I and put it in group 1
I,II,III,IIII,IIIII,IIIIII,IIIIIII
^
^\1 match what was matched in group 1, which means I gets matched
I,II,III,IIII,IIIII,IIIIII,IIIIIII
^^^ ,\2I match what was matched in group 1 (one I in this case) and add an I to it
I,II,III,IIII,IIIII,IIIIII,IIIIIII
^^^^ \2I match what was matched previously in group 2 (,II in this case) and add an I to it
I,II,III,IIII,IIIII,IIIIII,IIIIIII
^^^^^ \2I match what was matched previously in group 2 (,III in this case) and add an I to it
We're moving forward since there is a + sign which means match one or more times,
this is actually a recursive regex.
We put the $ to make sure it's the end of string
If the number of I's don't correspond, then the regex will fail.
See it working and failing. And Let's put it in PHP code:
$range = range(0,7);
if($range[0] === 0){
unset($range[0]);
}
$str = implode(',', array_map(function($val){return str_repeat('I', $val);}, $range));
if(preg_match('#^(?=(I*))(^\1|,\2I|\2I)+$#', $str)){
echo 'works !';
}else{
echo 'fails !';
}
Now let's take in account to return the number that's missing, we will remove the $ end character to make our regex not fail, and we use group 2 to return the missed number:
$range = range(0,7);
if($range[0] === 0){
unset($range[0]);
}
unset($range[2]); // remove 2
$str = implode(',', array_map(function($val){return str_repeat('I', $val);}, $range));
preg_match('#^(?=(I*))(^\1|,\2I|\2I)+#', $str, $m); // REGEEEEEX !!!
$n = strlen($m[2]); //get the length ie the number
$sum = array_sum($range); // array sum
if($n == $sum){
echo $n + 1; // no missing number
}else{
echo $n - 1; // missing number
}
Online demo
EDIT: NOTE
This question is about performance. Functions like array_diff and array_filter are not magically fast. They can add a huge time penalty. Replacing a loop in your code with a call to array_diff will not magically make things fast, and will probably make things slower. You need to understand how these functions work if you intend to use them to speed up your code.
This answer uses the assumption that no items are duplicated and no invalid elements exist to allow us to use the position of the element to infer its expected value.
This answer is theoretically the fastest possible solution if you start with a sorted list. The solution posted by Jack is theoretically the fastest if sorting is required.
In the series [0,1,2,3,4,...], the n'th element has the value n if no elements before it are missing. So we can spot-check at any point to see if our missing element is before or after the element in question.
So you start by cutting the list in half and checking to see if the item at position x = x
[ 0 | 1 | 2 | 3 | 4 | 5 | 7 | 8 | 9 ]
^
Yup, list[4] == 4. So move halfway from your current point to the end of the list.
[ 0 | 1 | 2 | 3 | 4 | 5 | 7 | 8 | 9 ]
^
Uh-oh, list[6] == 7. So somewhere between our last checkpoint and the current one, one element was missing. Divide the difference in half and check that element:
[ 0 | 1 | 2 | 3 | 4 | 5 | 7 | 8 | 9 ]
^
In this case, list[5] == 5
So we're good there. So we take half the distance between our current check and the last one that was abnormal. And oh.. it looks like cell n+1 is one we already checked. We know that list[6]==7 and list[5]==5, so the element number 6 is the one that's missing.
Since each step divides the number of elements to consider in half, you know that your worst-case performance is going to check no more than log2 of the total list size. That is, this is an O(log(n)) solution.
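The walk-through above can be sketched in a few lines of PHP. This is one possible reading of the idea, not code from the original answer: first_missing() is a made-up name, and it relies on the same assumptions as the answer (a sorted, zero-based list of unique whole numbers starting at 0).

```php
<?php
// Hypothetical helper: binary search for the first index whose value
// no longer equals the index; that index is the smallest missing number.
function first_missing(array $sorted): int
{
    $lo = 0;               // everything before $lo is known to be in place
    $hi = count($sorted);  // the first gap lies at or before $hi
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] === $mid) {
            $lo = $mid + 1; // no gap up to $mid, look to the right
        } else {
            $hi = $mid;     // a gap exists at or before $mid
        }
    }
    return $lo;
}

echo first_missing([0, 1, 2, 3, 4, 5, 7, 8, 9]), PHP_EOL; // 6
echo first_missing(range(0, 7)), PHP_EOL;                 // 8
```

When nothing is missing, the search ends at count($sorted), which is the maximum plus one as the question asks.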
If this whole arrangement looks familiar, It's because you learned it back in your second year of college in a Computer Science class. It's a minor variation on the binary search algorithm--one of the most widely used index schemes in the industry. Indeed this question appears to be a perfectly-contrived application for this searching technique.
You can of course repeat the operation to find additional missing elements, but since you've already tested the values at key elements in the list, you can avoid re-checking most of the list and go straight to the interesting ones left to test.
Also note that this solution assumes a sorted list. If the list isn't sorted then obviously you sort it first. Except, binary searching has some notable properties in common with quicksort. It's quite possible that you can combine the process of sorting with the process of finding the missing element and do both in a single operation, saving yourself some time.
Finally, to sum up the list, that's just a stupid math trick thrown in for good measure. The sum of a list of numbers from 1 to N is just N*(N+1)/2. And if you've already determined that any elements are missing, then obviously just subtract the missing ones.
Technically, you can't really do without the loop (unless you only want to know if there's a missing number). However, you can accomplish this without first sorting the array.
The following algorithm uses O(n) time with O(n) space:
$range = [0, 1, 2, 3, 4, 6, 7];
$N = count($range);
$temp = str_repeat('0', $N); // assume all values are out of place
foreach ($range as $value) {
if ($value < $N) {
$temp[$value] = 1; // value is in the right place
}
}
// count number of leading ones
echo strspn($temp, '1'), PHP_EOL;
It builds an ordered identity map of N entries, marking each value against its position as "1"; in the end all entries must be "1", and the first "0" entry is the smallest value that's missing.
Btw, I'm using a temporary string instead of an array to reduce physical memory requirements.
I honestly don't get why you wouldn't want to use a loop. There's nothing wrong with loops. They're fast, and you simply can't do without them. However, in your case, there is a way to avoid having to write your own loops, using PHP core functions. They do loop over the array, though, but you simply can't avoid that.
Anyway, I gather what you're after, can easily be written in 3 lines:
function highestPlus(array $in)
{
$compare = range(min($in), max($in));
$diff = array_diff($compare, $in);
return empty($diff) ? max($in) + 1 : reset($diff); // array_diff() preserves keys, so take the first element, not index 0
}
Tested with:
echo highestPlus(range(0,11));//echoes 12
$arr = array(9,3,4,1,2,5);
echo highestPlus($arr);//echoes 6
And now, to shamelessly steal Pé de Leão's answer (but "augment" it to do exactly what you want):
function highestPlus(array $range)
{//an unreadable one-liner... horrid, so don't, but know that you can...
return min(array_diff(range(0, max($range)+1), $range)); // max+1 is always in the diff, so no fallback is needed
}
How it works:
$compare = range(min($in), max($in));//range(lowest value in array, highest value in array)
$diff = array_diff($compare, $in);//get all values present in $compare, that aren't in $in
return empty($diff) ? max($in) + 1 : reset($diff);
//-------------------------------------------------
// read as:
if (empty($diff))
{//every number in min-max range was found in $in, return highest value +1
return max($in) + 1;
}
//there were numbers in min-max range, not present in $in, return first missing number:
return reset($diff);
That's it, really.
Of course, if the supplied array might contain null or falsy values, or even strings, and duplicate values, it might be useful to "clean" the input a bit:
function highestPlus(array $in)
{
$clean = array_filter(
$in,
'is_numeric'//or even is_int
);
$compare = range(min($clean), max($clean));
$diff = array_diff($compare, $clean);//duplicates aren't an issue here
return empty($diff) ? max($clean) + 1 : reset($diff);
}
Useful links:
The array_diff man page
The max and min functions
Good Ol' range, of course...
The array_filter function
The array_map function might be worth a look
Just as array_sum might be
$range = array(0,1,2,3,4,6,7);
// sort just in case the range is not in order
asort($range);
$range = array_values($range);
$indexes = array_keys($range);
$diff = array_diff($indexes,$range);
echo reset($diff); // >> will print: 5 (array_diff() keeps the original key, so $diff[0] is not set)
// if $diff is an empty array - you can print
// the "maximum value of the range plus one": $range[count($range)-1]+1
echo min(array_diff(range(0, max($range)+1), $range));
Simple
$array1 = array(0,1,2,3,4,5,6,7);// array with actual number series
$array2 = array(0,1,2,4,6,7); // array with your custom number series
$missing = array_diff($array1,$array2);
sort($missing);
echo $missing[0];
$range = array(0,1,2,3,4,6,7);
$max=max($range);
$expected_total=($max*($max+1))/2; // sum if no number was missing.
$actual_total=array_sum($range); // sum of the input array.
if($expected_total==$actual_total){
echo $max+1; // no difference, so no number is missing: echo the maximum plus one.
}else{
echo $expected_total-$actual_total; // the difference will be the missing number.
}
you can use array_diff() like this
<?php
$range = array("0","1","2","3","4","6","7","9");
asort($range);
$len=count($range);
if($range[$len-1]==$len-1){
$r=$range[$len-1];
}
else{
$ref= range(0,$len-1);
$result = array_diff($ref,$range);
$r=implode($result);
}
echo $r;
?>
function missing( $v ) {
static $p = -1;
$d = $v - $p - 1;
$p = $v;
return $d?1:0;
}
$result = array_search( 1, array_map( "missing", $ARRAY_TO_TEST ) );

python to php code conversion

I have to translate two Python functions into PHP. The first one is:
def listspaces(string):
    return [i -1 for i in range(len(string)) if string.startswith(' ', i-1)]
I am assuming that this will check for a space in the provided string and return True when the first occurrence of a space is found. Is this correct?
What is i-1 here? Is it -1?
In PHP we use [] for arrays. Here [] is used with return; will this function return true or false, or an array of the locations of the spaces?
Second function is
def trimcopy(copy, spaces, length=350):
    try:
        if len(copy) < length:
            return copy
        else:
            loc = 0
            for space in spaces:
                if space < length:
                    loc = space
                else:
                    return copy[:loc]
    except :
        return None
What is for space in spaces: here, and what is return copy[:loc]?
I think a good process for these type of conversions is:
work out what the code is doing
refactor it into a PHP-style in Python (this enables you to check that the logic still works, e.g. using assertion tests). e.g. convert list comprehensions to for loops
convert to PHP
For example, listspaces(string) returns the positions of spaces in string, and although using a list comprehension is Pythonic, it's not very "PHP-onic".
def listspaces2(string): #PHP-onic listspaces
    space_positions = []
    for i in range(len(string)):
        if string[i] == ' ':
            space_positions.append(i)
    return space_positions
The second example, trimcopy is rather trickier (since the try, except may purposefully be catching some expected - to the writer (!) - exceptions - two possibles are string not having a len and spaces containing values longer than len(copy)), but it's hard to say so it's a good idea to refactor in Python and test.
You can do array slicing in PHP like copy[:loc] using array_slice($copy, 0, $loc);.
Note: usually in Python we would state explicitly which exception we are defending against (as opposed to Pokemon exception handling).
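Since copy is a string in the original code, the closest PHP counterpart of copy[:loc] is probably substr() rather than array_slice(), which is the one to reach for when the value is a real array. A small sketch with made-up sample values:

```php
<?php
$copy = 'Hallo du';
$loc = 5;

// String case: the PHP counterpart of Python's copy[:loc].
$prefix = substr($copy, 0, $loc);

// Array case: the same slice on a list of values.
$firstTwo = array_slice([10, 20, 30, 40], 0, 2);

var_dump($prefix, $firstTwo);
```

Note that substr() counts bytes, so for multibyte text mb_substr() may be the safer choice.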
You may notice that the first function could also have been written as
def listspaces(str):
    return [i for i, c in enumerate(str) if c==' ']
That version has the following straightforward conversion to PHP:
function listspaces($str) {
$spaces = array();
foreach (str_split($str) as $i => $chr)
if ($chr == ' ') $spaces[] = $i;
return $spaces;
}
As for the other function, this seems to do the same thing in very nearly the same idiom:
function trimcopy($copy, $spaces, $length=350) {
if (strlen($copy) < $length) {
return $copy;
} else {
$loc = 0; // same default as loc = 0 in the Python version
foreach ($spaces as $space) {
if ($space < $length) {
$loc = $space;
} else {
return substr($copy, 0, $loc);
}
}
}
}
As others have pointed out, the intent of both of these functions could probably be better expressed by using wordwrap.
Why don't you just test those functions to see what they are doing?
listspaces(string) returns an array with the positions of all spaces within the string:
$ ipython
IPython 0.10.2 -- An enhanced Interactive Python.
In [1]: def listspaces(string):
...: return [i -1 for i in range(len(string)) if string.startswith(' ', i-1)]
...:
In [2]: listspaces('Hallo du schöne neue Welt!')
Out[2]: [5, 8, 16, 21]
(i -1 is the position of a space when starting to count with zero)
I don't know much about Python and I can't paste the second function as there are too many "IndentationError"s.
I think that trimcopy() will return a string (from input copy), where everything behind the last space position given in the array spaces (obviously a return value from listspaces()) is trimmed, unless the input is no longer than length.
In other words: the input is cut off at the highest space position that is smaller than length.
As of the example above, the part ' Welt!' will get cut off:
s = 'Hallo du schöne neue Welt!'
trimcopy( s, listspaces( s ) )
/* should return: 'Hallo du schöne neue' */
The first function returns indexes of all spaces in given string.
range(len(string)) results in list with numbers from 0 to length of the input string
if string.startswith(' ', i-1)] condition is evaluated for each index i, it returns true when string (here it is not a keyword) starts with ' ' at position given by the index i-1
The result is as feela posted.
For the second function I don't know what the spaces parameter is.
Hope this will help you to create a PHP version.
This is equivalent to both the functions in Python
list($short) = explode("\n",wordwrap($string,350));
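The same one-liner can be wrapped in a function for reuse. This is only a sketch: trim_at_word() is a made-up name, and it assumes the text contains no newlines of its own, since the first line after wrapping is taken as the result.

```php
<?php
// Hypothetical helper: cut $text at the last space that keeps it within
// $length characters, using wordwrap() to find the word boundary.
function trim_at_word(string $text, int $length = 350): string
{
    // wordwrap() inserts "\n" at word boundaries; the first line is the trimmed copy.
    list($short) = explode("\n", wordwrap($text, $length));
    return $short;
}

echo trim_at_word('Hallo du schoene neue Welt!', 20); // Hallo du schoene
```

Text that is already shorter than the limit comes back unchanged.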

Calculate average without being thrown by strays

I am trying to calculate an average without being thrown off by a small set of far off numbers (ie, 1,2,1,2,3,4,50) the single 50 will throw off the entire average.
If I have a list of numbers like so:
19,20,21,21,22,30,60,60
The average is 31
The median is 30
The mode is 21 & 60 (averaged to 40.5)
But anyone can see that the majority is in the range 19-22 (5 in, 3 out), and if you take the average of just that major range it's 20.6 (quite different from any of the numbers above).
I am thinking that you can get this like so:
c+d-r
Where c is the count of numbers, d is the number of distinct values, and r is the range. Then you can apply this to all the possible ranges, and the highest score is the optimal range to get an average from.
For example 19,20,21,21,22 would be 5 numbers, 4 distinct values, and the range is 3 (22 - 19). If you plug this into my equation you get 5+4-3=6
If you applied this to the entire number list it would be 8+6-41=-27
I think this works pretty well, but I have to create a huge loop to test against all possible ranges. In just my small example there are 21 possible ranges:
19-19, 19-20, 19-21, 19-22, 19-30, 19-60, 20-20, 20-21, 20-22, 20-30, 20-60, 21-21, 21-22, 21-30, 21-60, 22-22, 22-30, 22-60, 30-30, 30-60, 60-60
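For reference, the brute-force scoring described above could be sketched like this. The function name best_range() and the shape of the returned array are made up for illustration; the score itself is the c + d - r formula from above.

```php
<?php
// Hypothetical helper: score every low-high range with c + d - r and
// return the best one together with the average of the numbers inside it.
function best_range(array $numbers): ?array
{
    sort($numbers);
    $distinct = array_values(array_unique($numbers));
    $best = null;
    foreach ($distinct as $i => $low) {
        for ($j = $i; $j < count($distinct); $j++) {
            $high = $distinct[$j];
            // c: all numbers that fall inside the candidate range
            $inRange = array_filter($numbers, function ($n) use ($low, $high) {
                return $n >= $low && $n <= $high;
            });
            // d: distinct values in the range, r: width of the range
            $score = count($inRange) + ($j - $i + 1) - ($high - $low);
            if ($best === null || $score > $best['score']) {
                $best = [
                    'low' => $low,
                    'high' => $high,
                    'score' => $score,
                    'average' => array_sum($inRange) / count($inRange),
                ];
            }
        }
    }
    return $best;
}

print_r(best_range([19, 20, 21, 21, 22, 30, 60, 60]));
```

For the example list this picks 19-22 with a score of 6 and an average of 20.6, the same numbers as worked out by hand above.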
I am wondering if there is a more efficient way to get an average like this.
Or if someone has a better algorithm all together?
You might get some use out of standard deviation here, which basically measures how concentrated the data points are. You can define an outlier as anything more than 1 standard deviation (or whatever other number suits you) from the average, throw them out, and calculate a new average that doesn't include them.
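A rough sketch of that idea follows. The name mean_without_outliers() and the parameter $k (how many standard deviations still count as "close") are assumptions for illustration, and the population standard deviation is used for simplicity.

```php
<?php
// Hypothetical helper: average the values that lie within $k standard
// deviations of the mean. Assumes $values is non-empty and $k >= 1.
function mean_without_outliers(array $values, float $k = 1.0): float
{
    $n = count($values);
    $mean = array_sum($values) / $n;

    // Population standard deviation of the full list.
    $sumSquares = 0.0;
    foreach ($values as $v) {
        $sumSquares += ($v - $mean) ** 2;
    }
    $stdDev = sqrt($sumSquares / $n);

    // Throw out everything further than $k standard deviations from the mean.
    $kept = array_filter($values, function ($v) use ($mean, $stdDev, $k) {
        return abs($v - $mean) <= $k * $stdDev;
    });

    return array_sum($kept) / count($kept);
}

echo mean_without_outliers([19, 20, 21, 21, 22, 30, 60, 60]);
```

With the list from the question, both 60s fall outside one standard deviation, so the result is the mean of 19, 20, 21, 21, 22 and 30, roughly 22.17.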
Here's a pretty naive implementation that you could fix up for your own needs. I purposely kept it pretty verbose. It's based on the five-number-summary often used to figure these things out.
function get_median($arr) {
sort($arr);
$c = count($arr) - 1;
if ($c%2) {
$b = round($c/2);
$a = $b-1;
return ($arr[$b] + $arr[$a]) / 2 ;
} else {
return $arr[($c/2)];
}
}
function get_five_number_summary($arr) {
sort($arr);
$c = count($arr) - 1;
$fns = array();
if ($c%2) {
$b = round($c/2);
$a = $b-1;
$lower_quartile = array_slice($arr, 1, $a-1);
$upper_quartile = array_slice($arr, $b+1, count($lower_quartile));
$fns = array($arr[0], get_median($lower_quartile), get_median($arr), get_median($upper_quartile), $arr[$c-1]);
return $fns;
}
else {
$b = round($c/2);
$a = $b-1;
$lower_quartile = array_slice($arr, 1, $a);
$upper_quartile = array_slice($arr, $b+1, count($lower_quartile));
$fns = array($arr[0], get_median($lower_quartile), get_median($arr), get_median($upper_quartile), $arr[$c-1]);
return $fns;
}
}
function find_outliers($arr) {
$fns = get_five_number_summary($arr);
$interquartile_range = $fns[3] - $fns[1];
$low = $fns[1] - $interquartile_range;
$high = $fns[3] + $interquartile_range;
foreach ($arr as $v) {
if ($v > $high || $v < $low)
echo "$v is an outlier<br>";
}
}
//$numbers = array( 19,20,21,21,22,30,60 ); // 60 is an outlier
$numbers = array( 1,230,239,331,340,800); // 1 is an outlier, 800 is an outlier
find_outliers($numbers);
Note that this method, albeit much simpler to implement than standard deviation, will not find the two 60 outliers in your example, but it works pretty well. Use the code for whatever, hopefully it's useful!
To see how the algorithm works and how I implemented it, go to: http://www.mathwords.com/o/outlier.htm
This, of course, doesn't calculate the final average, but it's kind of trivial after you run find_outliers() :P
Why don't you use the median? It's not 30, it's 21.5.
You could put the values into an array, sort the array, and then find the median, which is usually a better number than the average anyway because it discounts outliers automatically, giving them no more weight than any other number.
You might sort your numbers, choose your preferred subrange (e.g., the middle 90%), and take the mean of that.
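That is usually called a trimmed mean. A minimal sketch, where trimmed_mean() and $trimFraction (the share dropped from each end, so 0.05 keeps the middle 90%) are made-up names, and the input is assumed to be non-empty with a fraction below 0.5:

```php
<?php
// Hypothetical helper: sort, drop a fraction of the values from each end,
// and average what is left.
function trimmed_mean(array $values, float $trimFraction = 0.05): float
{
    sort($values);
    $drop = (int) floor(count($values) * $trimFraction); // values cut from each end
    $middle = array_slice($values, $drop, count($values) - 2 * $drop);
    return array_sum($middle) / count($middle);
}

// Dropping a quarter from each end of the question's list leaves 21, 21, 22, 30.
echo trimmed_mean([19, 20, 21, 21, 22, 30, 60, 60], 0.25); // 23.5
```

How much to trim is a judgment call that depends on how many strays you expect.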
There is no one true answer to your question, because there are always going to be distributions that will give you a funny answer (e.g., consider a biased bi-modal distribution). This is why many statistics are often presented using box-and-whisker diagrams showing mean, median, quartiles, and outliers.
