PHP is_numeric vs ~(~(float)$value) performance and function

Recently I wanted to search an array for numeric values (ints, doubles, and numbers with exponent notation) as quickly as possible.
I initially used is_numeric(), since that is our usual go-to for such checks, but I wanted to see whether something faster was possible.
I noticed that if I cast to float, PHP produces a value other than zero as long as the value is numeric. So, by applying the bitwise NOT operator twice, I can test for "not zero" directly in the if statement surrounding the search:
if (~(~(float)$value)) {
    // add to result array
}
After initial testing I found things seemed to speed up by 2 whole seconds with a moderately sized array of numerics and non numerics. However this was little more than a simple unit test.
Does anyone have experience of performance of casting as a float vs is_numeric? I know they're probably not 100% functionally equivalent (I think the cast to float would convert hexadecimal) but for my purposes I'm only going to be casting ints, doubles and numbers with an exponent notation. Is this a performance gain over is_numeric() or have I imagined this?

Warning!
is_numeric() is not just a whim. Here is a small piece of code that shows the error your cast-based check makes. Many attacks on PHP use strings that begin with a number but carry injected code behind it.
Code:
<?php
$a = "1809809808908099878758765<?php echo \"I powned you\"; ?>";
echo is_numeric($a) ? "yes" : "no"; // prints: no
echo "\n";
echo (~(~(float)$a)) ? "Yes" : "No"; // prints: Yes
If you do it that way you might gain performance, but depending on what you do with the value you could open a security hole!
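To make the functional difference concrete, here is a minimal sketch that runs both checks on a few inputs. The helper name castCheck is made up for illustration. Because the bitwise NOT works on the integer value of a float, the double NOT effectively tests whether (int)(float)$value is non-zero, so a numeric zero fails the check and a string with a numeric prefix passes it.

```php
<?php
// Hypothetical helper: the cast-based check from the question, as a boolean.
function castCheck($value): bool
{
    // ~ on a float operand uses its integer value, so ~~ yields (int)(float)$value.
    return (bool) ~(~(float) $value);
}

$samples = ["42", "1e3", "0", "12abc", "abc"];
foreach ($samples as $s) {
    printf(
        "%-8s is_numeric=%-6s cast=%s\n",
        var_export($s, true),
        var_export(is_numeric($s), true),
        var_export(castCheck($s), true)
    );
}
// "0" is numeric but fails the cast check; "12abc" is not numeric but passes it.
```

If the input can contain zeros or arbitrary strings, the two checks are therefore not interchangeable, whatever the speed difference turns out to be.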

Related

PHP Rounding Float

I'm working on a system where I need to round financial payments down to the nearest penny. Naively I thought I would multiply by 100, take the floor and then divide back down. However, the following example misbehaves:
echo 1298.34*100;
correctly shows:
129834
but
echo floor(1298.34*100);
unexpectedly shows:
129833
I get the same problem using intval for example.
I suspect the multiplication is falling foul of floating point rounding. But if I can't rely on multiplication, how can I do this? I always want to round down reliably, and I don't need to take negative amounts into consideration.
To be clear, I want any fractional penny amounts to be stripped off:
1298.345 should give 1298.34
1298.349 should give 1298.34
1298.342 should give 1298.34

Since you mention you only use this for displaying purposes, you could take the amount, turn it into a string and truncate anything past the second decimal. A regular expression could do the job:
preg_match('/\d+\.{0,1}\d{0,2}/', (string) $amount, $matches);
This expression works with any number of decimals (including zero). How it works in detail:
\d+ matches one or more digits
\.{0,1} matches 0 or 1 literal dot
\d{0,2} matches up to two digits after the dot
You can run the following code to test it:
$amounts = [
    1298,
    1298.3,
    1298.34,
    1298.341,
    1298.349279745,
];
foreach ($amounts as $amount) {
    preg_match('/\d+\.{0,1}\d{0,2}/', (string) $amount, $matches);
    var_dump($matches[0]);
}

You can use round() to round to the required precision, and with the expected behavior when rounding the final 5 (which is another financial hurdle you might encounter).
$display = round(3895.0 / 3.0, 2);
Also, as a reminder, I have the habit of always writing whole-number floating-point literals with a final dot or a ".0". This prevents some languages from inferring the wrong type and doing, say, integer division, so that 5 / 3 yields 1.
If you need "custom rounding" and want to be sure: the reason your approach didn't work is that not every decimal number has an exact machine representation. 1298.34 does not exist as a float; what exists in its place (I'm making the precise digits up!) might be 1298.33999999999999124.
So when you multiply it by 100 you get 129833.999999999999124, and truncating that of course yields 129833.
What you need to do then is to add a small quantity that must be enough to cover the machine error but not enough to matter in the financial calculation. There is an algorithm to determine this quantity, but you can probably get away with "one thousandth after upscaling".
So:
$display = floor((3895.0 / 3.0) * 100.0 + 0.001) / 100.0;
Please be aware that a number you "see" as, say, 1234.56 might again not exist precisely. It might really be 1234.5600000000000123 or 1234.559999999999876. This might have consequences in complex, composite calculations.
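As a rough sketch of the "add a small quantity" idea applied to the amounts from the question: the helper name floorToPennies is hypothetical, it returns whole pennies as an integer so the result is exact, and it assumes non-negative amounts as the question states.

```php
<?php
// Hypothetical helper: round a non-negative amount down to whole pennies.
function floorToPennies(float $amount): int
{
    // 0.001 of a penny is the tolerance suggested above.
    return (int) floor($amount * 100.0 + 0.001);
}

foreach ([1298.34, 1298.345, 1298.349, 1298.342] as $amount) {
    $pennies = floorToPennies($amount);
    // Format from integers so no further float rounding is involved.
    printf("%.3f -> %d.%02d\n", $amount, intdiv($pennies, 100), $pennies % 100);
}
```

Keeping the value as an integer number of pennies until display avoids reintroducing the representation problem by dividing back down.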

Since you're working with financial values, you should use a money library such as https://github.com/moneyphp/money. Almost all other solutions are asking for trouble.
Other ways, which I don't recommend, are: a) use integers only, b) calculate with bcmath, or c) use the Number class from the Money library, e.g.:
function getMoneyValue($value): string
{
    if (!is_numeric($value)) {
        throw new \RuntimeException(sprintf('Money value has to be a numeric value, "%s" given', is_object($value) ? get_class($value) : gettype($value)));
    }
    $number = \Money\Number::fromNumber($value)->base10(-2);
    return $number->getIntegerPart();
}

The other function available is round(), which takes two parameters: the number to round, and the number of decimal places to round to. If a number is exactly halfway between two values, round() rounds away from zero by default.
Use round():
echo round(1298.34 * 100);
Result:
129834

CTF Type Juggling with ripemd160 hash

I am trying to solve a CTF in which type juggling should be used. The code is:
if ($_GET["hash"] == hash("ripemd160", $_GET["hash"]))
{
    echo $flag;
}
else
{
    echo "<h1>Bad Hash</h1>";
}
I made a Python script that tries random inputs, looking for one whose RIPEMD160 hash begins with "0e" and otherwise contains only digits. The code is:
import hashlib
import random
import string

def id_generator(size, chars=string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

param = "0e"
results = []
while True:
    h = hashlib.new('ripemd160')
    h.update("{0}".format(str(param)).encode('utf-8'))
    hashed = h.hexdigest()
    if param not in results:
        print(param)
        if hashed.startswith("0e") and hashed[2:].isdigit():
            print(param)
            print(hashed)
            break
        results.append(param)
    else:
        print("CHECKED")
    param = "0e" + str(id_generator(size=10))
Any suggestions on how to solve it? Thank you!

There seems to be a bit of misunderstanding in the comments, so I'll start by explaining the problem a little more:
Type juggling refers to the behaviour of PHP whereby values are implicitly converted to different data types under certain conditions. For example, all of the following loose comparisons evaluate to true in PHP 7 and earlier (since PHP 8.0 a non-numeric string such as "abc" no longer equals 0; the other cases still hold):
0 == 0                      // int vs. int
"0" == 0                    // str -> int
"abc" == 0                  // any non-numeric string -> 0 (true before PHP 8.0 only)
"1.234E+03" == "0.1234E+04" // string that looks like a float -> float
"0e215962017" == 0          // another string that looks like a float
The last of these examples is interesting because its MD5 hash value is another string consisting of 0e followed by a bunch of decimal digits (0e291242476940776845150308577824). So here's another logical expression in PHP that will evaluate to true:
"0e215962017" == md5("0e215962017")
To solve this CTF challenge, you have to find a string that is "equal" to its own hash value, but using the RIPEMD160 algorithm instead of MD5. When this is provided as a query string variable (e.g., ?hash=0e215962017), then the PHP script will disclose the value of a flag.
Fake hash collisions like this aren't difficult to find. Roughly 1 in every 256 MD5 hashes will start with '0e', and the probability that the remaining 30 characters are all digits is (10/16)^30. If you do the maths, you'll find that the probability of an MD5 hash equating to zero in PHP is approximately one in 340 million. It took me about a minute (almost 216 million attempts) to find the above example.
Exactly the same method can be used to find similar values that work with RIPEMD160. You just need to test more hashes, since the extra hash digits mean that the probability of a "collision" will be approximately one in 14.6 billion. Quite a lot, but still tractable (in fact, I found a solution to this challenge in about 15 minutes, but I'm not posting it here).
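The arithmetic behind those two estimates can be checked with a few lines. This is only a sketch of the calculation described above; the function name expectedAttempts is made up, and it counts only hashes of the exact form "0e" followed by decimal digits.

```php
<?php
// Expected number of hashes to try before one looks like "0e" + digits only.
// $hexLength is the length of the hex digest (32 for MD5, 40 for RIPEMD160).
function expectedAttempts(int $hexLength): float
{
    $prefix = 16 ** 2;                           // "0e" is 1 of 256 two-character prefixes
    $digitsOnly = (16 / 10) ** ($hexLength - 2); // each remaining character is a digit 10/16 of the time
    return $prefix * $digitsOnly;
}

printf("MD5:       about 1 in %.3e\n", expectedAttempts(32));
printf("RIPEMD160: about 1 in %.3e\n", expectedAttempts(40));
```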
Your code, on the other hand, will take much, much longer to find a solution. First of all, there is absolutely no point in generating random inputs. Sequential values will work just as well, and will be much faster to generate.
If you use sequential input values, then you also won't need to worry about repeating the same hash calculations. Your code uses a list structure to store previously hashed values. This is a terrible idea. Searching for an item in a list is an O(n) operation, so once your code has (unsuccessfully) tested a billion inputs, it will have to compare every new input against each of these billion inputs at each iteration, causing your code to grind to a complete standstill. Your code would actually run a lot faster if you didn't bother checking for duplicates. When you have time, I suggest you learn when to use lists, dicts and sets in Python.
Another problem is that your code only tests 10-digit numbers, which means it can only test a maximum of 10 billion possible inputs. Based on the numbers given above, are you sure this is a sensible limit?
Finally, your code is printing every single input string before you calculate its hash. Before your program outputs a solution, you can expect it to print out somewhere in the order of a billion screenfuls of incorrect guesses. Is there any point in doing this? No.
Here's the code I used to find the MD5 collision I mentioned earlier. You can easily adapt it to work with RIPEMD160, and you can convert it to Python if you like (although the PHP code is much simpler):
$n = 0;
while (1) {
    $s = "0e$n";
    $h = md5($s);
    if ($s == $h) break;
    $n++;
}
echo "$s : $h\n";
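One possible way to adapt that loop to other algorithms is to pass the algorithm name to PHP's hash() function and cap the number of attempts. The helper name findSelfEqualHash and the attempt limit are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: try "0e0", "0e1", ... and return the first input that
// loosely equals its own hash, or null if none is found within $maxAttempts.
function findSelfEqualHash(string $algo, int $maxAttempts): ?string
{
    for ($n = 0; $n < $maxAttempts; $n++) {
        $s = "0e$n";
        // Loose comparison on purpose: this mimics the vulnerable check.
        if ($s == hash($algo, $s)) {
            return $s;
        }
    }
    return null;
}

// A tiny budget like this is only a smoke test; a real search needs billions of attempts.
var_dump(findSelfEqualHash('ripemd160', 1000));
```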
Note: Use PHP's hash_equals() function and strict comparison operators to avoid this sort of vulnerability in your own code.
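A short sketch of the difference, using the MD5 example from above:

```php
<?php
$input  = "0e215962017";
$digest = md5($input); // per the example above, this also looks like "0e" + digits

var_dump($input == $digest);            // loose: both sides are read as the float 0
var_dump($input === $digest);           // strict: compared as strings
var_dump(hash_equals($digest, $input)); // string comparison, safe against timing attacks
```

Only the loose comparison treats the two different strings as equal.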

Reliable Margin of Error for Float -> String -> Float Conversion?

I have a float value that I need to store as a string in PHP and then compare later after casting back into a float.
Due to the conversion I know that relying on equality would be a mistake, as there's potential for a loss of precision, so I'm doing something like the following:
if (abs((float)$string_value - $float_value) < 0.001) { echo "Values are close enough\n"; }
Now, while a margin for error of 0.001 should be fine for my immediate purposes, it got me wondering; what is the smallest margin of error that I can reliably/safely use?
I realise that the safe margin of error will change with the size of the float (i.e. larger values have less or even no fractional precision), so an answer should probably account for this.
So to put it another way; given a float value that I want to store in base 10 and read back, how can I reliably decide what my margin of error should be such that I can reasonably confirm that the two values are the same?
Unfortunately the values I'm handling must be stored in plain decimal form, so my usual go-to of packing them as a network order 64-bit integer is not an option here ☹️
EDIT: To clarify; please assume that my question is about handling arbitrarily sized floats; the example code I've given is for a recent case where I'm handling floats within a limited range, so setting the margin of error manually is fine, but I'd like to be able to handle floats of any magnitude in future.

As mentioned in Mark Dickinson's comment, it is possible to convert a floating-point number to a string and back without losing precision. This only works if
you use enough significant decimal digits (17 for IEEE doubles)
the conversions are accurate (i.e. they're guaranteed to convert to the nearest number)
From a quick look, it seems that casting a double $f to a string in PHP, either implicitly or with (string) $f, only uses 14 significant digits, so this method isn't accurate enough. But you can use sprintf with a %.16e conversion specifier to get 17 significant digits. So after the following roundtrip
$s = sprintf("%.16e", $f);
$f2 = (double) $s;
$f2 should equal $f exactly unless PHP uses suboptimal algorithms internally.
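A quick way to check the round trip on your own PHP build is something like the following sketch. The helper name roundTripsExactly and the sample values are arbitrary choices for illustration.

```php
<?php
// Hypothetical helper: does $f survive the sprintf("%.16e") round trip unchanged?
function roundTripsExactly(float $f): bool
{
    $s = sprintf('%.16e', $f); // 17 significant digits
    return (float) $s === $f;
}

foreach ([0.1, 1 / 3, 1298.34, 1.0e300, -2.5e-10] as $f) {
    printf("%s => %s\n", sprintf('%.16e', $f), roundTripsExactly($f) ? 'same' : 'different');
}
```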
Note that the %e conversion specifier uses scientific (exponential) notation. If you need plain decimal strings, you can use the %f specifier and calculate the required number of digits after the decimal point using log10:
if ($f != 0) {
    // Digits needed after the decimal point for 17 significant digits in total.
    $prec = max(0, 16 - (int) floor(log10(abs($f))));
}
else {
    $prec = 0;
}
$s = sprintf("%.{$prec}f", $f);
This can produce extremely long strings for very small or large numbers, though.
It would probably require a huge amount of research to tell whether these methods are completely reliable, and if not, what the maximum error is. It all depends on several implementation details like PHP version, underlying C library, etc.
Another idea is to compare the string representations instead of floating-point values:
# Assuming $string_value was also converted with float_to_string
if ($string_value == float_to_string($float_value)) {
    echo "Values are close enough\n";
}
This should be reliable as long as you stick to the same PHP version.
If you must compare floating-point numbers, it often makes more sense to compare the relative error. See Bruce Dawson's excellent blog for more details.
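A minimal sketch of a relative-error comparison, under the assumption that a tolerance of 1e-12 suits the stored precision; both the helper name nearlyEqual and the default tolerance are illustrative. Note that a purely relative test is very strict when one value is exactly zero.

```php
<?php
// Hypothetical helper: compare two floats by relative rather than absolute error.
// The default tolerance is an assumption; pick one that matches your stored precision.
function nearlyEqual(float $a, float $b, float $relTol = 1.0e-12): bool
{
    if ($a === $b) {
        return true; // also covers both values being exactly zero
    }
    // Scale the tolerance by the larger magnitude so it grows with the values.
    $scale = max(abs($a), abs($b));
    return abs($a - $b) <= $relTol * $scale;
}

var_dump(nearlyEqual(0.1 + 0.2, 0.3));
var_dump(nearlyEqual(1.0, 1.001));
```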

PHP Long string. Output it all

So I got a really long string, made by a calculator.
$string='483451102828322427131269442894636268716773727170';
$result=(8902543901+$string)*($string/93.189)/($string)+55643907015.57895461;
echo $result;
This outputs 5.1878558931668E+45
So now my question is: how can I output the whole number, without that nasty E+45?

PHP on a 64-bit machine can only represent integers exactly up to 9223372036854775807 (PHP_INT_MAX). As soon as a calculation goes beyond that, PHP switches to floats, which may lose some precision, especially when you use division.
There's an extension for PHP called BCMath that lets you do calculations on numbers held as strings.
Example:
$string = '483451102828322427131269442894636268716773727170';
$result = bcadd($string, '8902543901');
echo $result;
bcadd() is for additions, bcdiv() for divisions and bcmul() for multiplying.
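For the full expression from the question, a sketch along these lines might work, assuming the bcmath extension is loaded. The function name evaluateWithBcmath and the scale of 8 decimal places are arbitrary choices for illustration; every operand is passed as a string so no precision is lost before BCMath sees it.

```php
<?php
// A sketch assuming the bcmath extension is loaded; $scale (decimal places kept
// by each step) is an arbitrary choice for illustration.
function evaluateWithBcmath(string $n, int $scale = 8): string
{
    $sum      = bcadd('8902543901', $n, $scale);   // 8902543901 + $string
    $quotient = bcdiv($n, '93.189', $scale);       // $string / 93.189
    $product  = bcmul($sum, $quotient, $scale);
    $divided  = bcdiv($product, $n, $scale);       // ... / $string
    return bcadd($divided, '55643907015.57895461', $scale);
}

if (extension_loaded('bcmath')) {
    echo evaluateWithBcmath('483451102828322427131269442894636268716773727170'), "\n";
}
```

bcdiv() truncates at the given scale, so the last decimal places depend on the scale chosen.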

You can't print the exact value, because the calculation turns $string into a number (a float in this case), and floats have limited precision.
If you want to do operations on big numbers, you should use BCMath.
However, if you just want to display the result without scientific notation, you can use:
echo sprintf("%f", $result);
or
echo sprintf("%.0f", $result);
if you want to omit the decimal part.

algorithm to convert md5 (or maybe another hashing method?) to integer where it is possible to set possible resulting integer ranges (eg: 1-10000)?

the topic pretty much describes what we would like to accomplish.
a) start with a possible range of integers, for example, 1 to 10000.
b) take any md5 hash, run it thru this algo.
c) result that pops out will be an integer between 1 to 10000.
we are open to using another hashing method too.
the flow would ideally look like this:
string -> md5(string) -> algo(md5(string),range) -> resulting integer within range
is something like this possible?
final note: the range will always start with 1.
if you have an answer, feel free to post just the general idea, or if you so desire, php snippet works too :)
thanks!

MD5 gives you 128 bits of data (SHA-1 gives 160, and so on). In PHP you get it as a hexadecimal string, so you need to convert it to an integer first. That number modulo 10000 gives you an integer from 0 to 9999; add 1 to get your 1 to 10000 range.
Note however that many different hashes will map to the same integer; this is unavoidable with any sort of conversion to your integer range, as the modulo operation essentially maps a larger set of numbers (in this case, 128 bits, that is numbers from 0 to 340,282,366,920,938,463,463,374,607,431,768,211,455) to a smaller set of numbers (fewer than 14 bits, numbers from 1 to 10,000).
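As a general idea in PHP, something like the following sketch could work. The helper name hashToRange is made up, and using only the first 7 hex digits of the hash (28 bits) is an assumption made so the value fits in a native integer on any platform; working with the full 128 bits would need GMP or BCMath.

```php
<?php
// Hypothetical helper: map a string to an integer in 1..$range via its MD5 hash.
// Assumption: only the first 7 hex digits (28 bits) are used so the value fits
// in a native integer; the full 128 bits would need GMP or BCMath.
function hashToRange(string $input, int $range): int
{
    $value = hexdec(substr(md5($input), 0, 7)); // 0 .. 2^28 - 1
    return $value % $range + 1;                 // 1 .. $range
}

echo hashToRange('this is my test string', 10000), "\n";
```

The same input always maps to the same integer, and different inputs can collide, as noted above.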

since the range that we want will always start at 1, the following works great. all credit goes to Piskvor, as he was the one who provided the basic idea of how to go at this.
the code below seems to accomplish what we want. please chime in if the idea (not the code, which is just for reference) can be improved at all. running the code below gives 6305 / 10000 unique results, which in our case is good enough.
<?php
$final = array();
$range = 10000;
for ($i = 1; $i <= $range; $i++) {
    $string = 'this is my test string - attempt #' . $i;
    echo 'initial string: ' . $string . PHP_EOL;
    $crc32 = crc32($string);
    echo 'crc32 of string: ' . $crc32 . PHP_EOL;
    $postalgo = $crc32 % $range + 1; // + 1 shifts 0..9999 to 1..10000
    echo 'post algo: ' . $postalgo . PHP_EOL;
    if (!in_array($postalgo, $final)) {
        $final[] = $postalgo;
    }
}
echo 'unique results for ' . ($i - 1) . ' attempts: ' . count($final) . PHP_EOL;
?>
enjoy!
