I have the following code as the only way I know to convert a float to a string with the fewest possible significant digits required to reproduce it (dtoa() with mode 4 in C).
$i = 14;
do {
$str = sprintf("%.{$i}e", $x);
$i++;
} while ($x != (float) $str);
The Hack typechecker reports an error because it expects the first parameter to sprintf() to be a literal string so it can check it against the arguments. Is there a way I can turn that off for this line?
Or is there another way I could achieve the same thing? Perhaps with the NumberFormatter class?
The typechecker has various methods of suppressing errors. The most appropriate in this case is probably HH_IGNORE_ERROR to suppress this particular error.
As written, your code will produce an error like Typing[4110] Invalid argument. Take the error code, in this case "4110", and use it to add the ignore annotation:
/* HH_IGNORE_ERROR[4110] Allow dynamic sprintf() explain explain etc */
$str = sprintf("%.{$i}e", $x);
I think your error code is exactly 4110, but I don't have the typechecker in front of me to verify, so make sure to use the code from your own error message.
Note that for technical reasons the parser is pretty finicky about HH_IGNORE_ERROR: it must be a block-style comment with exactly the whitespace shown above, up to the final ]. After that point you can write as much as you like in the comment to explain the suppression.
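Separately from the suppression comment, here is a minimal sketch of the loop from the question wrapped in a plain PHP function. The name shortest_float_repr() is made up for illustration, and starting the search at zero fractional digits (rather than 14) is an assumption of this sketch. The guard is there because NAN never compares equal to itself, so the original do/while would not terminate for it. In Hack, the sprintf() call would still need the HH_IGNORE_ERROR annotation.

```php
<?php
/**
 * Return an exponent-notation string with as few digits as this search
 * finds that still converts back to exactly the same float.
 * Hypothetical helper name; not part of PHP or Hack.
 */
function shortest_float_repr(float $x): string
{
    // NAN and INF cannot round-trip through sprintf(), so bail out early.
    if (is_nan($x) || is_infinite($x)) {
        return (string) $x;
    }

    // 17 significant digits (16 after the point) are enough for any
    // IEEE 754 double, so the loop is bounded.
    for ($digits = 0; $digits < 16; $digits++) {
        $str = sprintf('%.' . $digits . 'e', $x);
        if ((float) $str === $x) {
            return $str;
        }
    }

    return sprintf('%.16e', $x);
}
```

The result keeps the exponent notation of the original loop; only the number of digits changes.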
For example
$var = '10/2';
Is there a way for me to output the value 5 from that easily?
So something like this:
$foo = ($var) + 5;
I want $foo to have a value of 10?
Currently, the best way I know is to explode $var and then divide $var[0] by $var[1]. Is there a quicker way?
Another way of asking: is there a way to tell the system to treat '10/2' as an equation instead of a string?
You could use a solution like this:
$foo = eval('return (' . $var . ') + 5;');
print $foo;
eval() requires a complete statement, so you have to build a valid line of code.
Use eval() to either assign in place or return a value:
eval("\$foo = ($var) + 5;");
$foo = eval("return ($var) + 5;");
First things first, EVAL IS HAZARDOUS AND SHOULD NEVER BE USED UNLESS YOU KNOW WHAT YOU'RE DOING! As an addendum to this, you don't know what you're doing. No matter how experienced you are as a programmer, when it comes to eval just assume you don't know what you're doing. Believe me, in the long run your sanity will thank you.
As for your problem, you're basically asking how to write an equation parser. Writing a full-blown one that can handle all cases of valid input (and reliably identify invalid input) is a much bigger job than you might at first think. If you really do need to parse string equations, it may be better to look for a library that will do it for you, because the chances are whoever wrote the library thought of a lot of stuff that you didn't, such as handling operator precedence and how parentheses can modify it, how to parse numbers in scientific notation, etc.
One of your comments suggests using PHP Math Parser. I've never personally used it, but I know the author by reputation well enough to believe it's a reliable library.
If your use case is always going to be as simple as your example then you can simply split the string and process the resulting fragments.
$parts = explode ("/", $var);
$foo = $parts[0] / $parts[1];
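The split-and-divide approach can be hardened a little. The following is only a sketch under the assumption that the input is always a single "a/b" pair of plain decimal numbers; the function name evaluate_fraction() is hypothetical. It rejects anything else instead of passing it on, and it checks for a zero denominator.

```php
<?php
/**
 * Evaluate a string of the form "a/b" without eval().
 * The name evaluate_fraction() is hypothetical, chosen for this sketch.
 */
function evaluate_fraction(string $expr): float
{
    // A plain decimal number with an optional sign and fractional part.
    $number = '-?\d+(?:\.\d+)?';
    $pattern = '/^\s*(' . $number . ')\s*\/\s*(' . $number . ')\s*$/';

    if (preg_match($pattern, $expr, $m) !== 1) {
        throw new InvalidArgumentException("Not a simple fraction: $expr");
    }

    $denominator = (float) $m[2];
    if ($denominator == 0.0) {
        throw new DivisionByZeroError('Denominator is zero.');
    }

    return (float) $m[1] / $denominator;
}

$var = '10/2';
$foo = evaluate_fraction($var) + 5; // float(10)
```

Anything beyond a single division (precedence, parentheses, other operators) is outside what this sketch handles.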
The NO-BREAK SPACE and many other UTF-8 symbols need 2 bytes for their representation; so, in a context where strings are supposed to be UTF-8, an isolated non-ASCII byte (>127, not preceded by xC2) is an unrecognized character... OK, it is only a layout problem (!), but it corrupts the whole string?
How can I avoid this unexpected behaviour? (It occurs in some functions and not in others.)
Example (producing the unexpected behaviour with preg_match_all only):
header("Content-Type: text/plain; charset=utf-8"); // same if text/html
//PHP Version 5.5.4-1+debphp.org~precise+1
//using a .php file encoded as UTF8.
$s = "THE UTF-8 NO-BREAK\xA0SPACE"; // a non-ASCII byte
preg_match_all('/[-\'\p{L}]+/u',$s,$m);
var_dump($m); // empty! (corrupted)
$m=str_word_count($s,1);
var_dump($m); // ok
$s = "THE UTF-8 NO-BREAK\xC2\xA0SPACE"; // utf8-encoded nbsp
preg_match_all('/[-\'\p{L}]+/u',$s,$m);
var_dump($m); // ok!
$m=str_word_count($s,1);
var_dump($m); // ok
This is not a complete answer because I do not say why some PHP functions "fail entirely on invalidly encoded strings" and others do not: see @deceze in the question's comments and @hakre's answer.
If you are looking for a PCRE replacement for str_word_count(), see my preg_word_count() below.
PS: about the "uniformity of PHP5's built-in library behaviour" discussion, my conclusion is that PHP5 is not so bad, but we have to create a lot of user-defined wrapper (façade) functions (see the diversity of PHP frameworks!)... Or wait for PHP6 :-)
Thanks @pebbl! If I understand your link, PHP lacks error messages here. So a possible workaround for the problem I illustrated is to add an error condition... I found the condition here (it ensures valid UTF-8!)... And thanks @deceze for reminding me that a built-in function exists to check this condition (I edited the code afterwards).
Putting the issues together, here is a solution translated to a function (EDITED, thanks to @hakre's comments!):
function my_word_count($s,$triggError=true) {
if ( preg_match_all('/[-\'\p{L}]+/u',$s,$m) !== false )
return count($m[0]);
else {
if ($triggError) trigger_error(
// no need for mb_check_encoding($s,'UTF-8'), see hakre's answer;
// I was wrong, there is no 'mysterious error' with the preg functions
(preg_last_error()==PREG_BAD_UTF8_ERROR)?
'non-UTF8 input!': 'other error',
E_USER_NOTICE
);
return NULL;
}
}
Now (edited after thinking about @hakre's answer), about uniform behaviour: we can develop a reasonable function with the PCRE library that mimics the str_word_count behaviour, accepting bad UTF-8. For this task I used @bobince's iconv tip:
/**
* Like str_word_count() but showing how preg can do the same.
* This function is more flexible but not faster than str_word_count.
* @param $wRgx the "word regular expression" as defined by the user.
* @param $triggError whether to trigger an error event on failure.
* @param $OnBadUtfTryAgain mimic the str_word_count behaviour.
* @return 0 or positive integer as word-count, negative as PCRE error.
*/
function preg_word_count($s,$wRgx='/[-\'\p{L}]+/u', $triggError=true,
$OnBadUtfTryAgain=true) {
if ( preg_match_all($wRgx,$s,$m) !== false )
return count($m[0]);
else {
$lastError = preg_last_error();
$chkUtf8 = ($lastError==PREG_BAD_UTF8_ERROR);
if ($OnBadUtfTryAgain && $chkUtf8)
return preg_word_count(
iconv('CP1252','UTF-8',$s), $wRgx, $triggError, false
);
elseif ($triggError) trigger_error(
$chkUtf8? 'non-UTF8 input!': "error PCRE_code-$lastError",
E_USER_NOTICE
);
return -$lastError;
}
}
Demonstrating (try other inputs!):
$s = "THE UTF-8 NO-BREAK\xA0SPACE"; // a non-ASCII byte
print "\n-- str_word_count=".str_word_count($s,0);
print "\n-- preg_word_count=".preg_word_count($s);
$s = "THE UTF-8 NO-BREAK\xC2\xA0SPACE"; // utf8-encoded nbsp
print "\n-- str_word_count=".str_word_count($s,0);
print "\n-- preg_word_count=".preg_word_count($s);
Okay, I can somewhat feel your disappointment that things didn't work out easily when switching from str_word_count to preg_match_all. However, the way you ask the question is a bit imprecise; I'll try to answer it anyway. Imprecise, because you make a large number of wrong assumptions that you obviously take for granted (it happens to the best of us). I hope I can correct this a little:
$s = "THE UTF-8 NO-BREAK\xA0SPACE"; // a non-ASCII byte
preg_match_all('/[-\'\p{L}]+/u',$s,$m);
var_dump($m); // empty! (corrupted)
This code is wrong. You blame PHP here for not giving a warning or something, but I must admit, the only one to blame here is "you". PHP does allow you to check for the error. Before you decide so quickly that error handling has to mean a warning, I have to remind you that there are different ways to deal with errors. Some functions report errors with messages; others signal them through return values. And if we visit the manual page of preg_match_all and look for the documentation of the return value, we can find this:
Returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.
The part at the end:
FALSE if an error occurred [Highlight by me]
is a common way in error handling to signal to the calling code that an error occurred. Let's review the code that you think does not work:
$s = "THE UTF-8 NO-BREAK\xA0SPACE"; // a non-ASCII byte
preg_match_all('/[-\'\p{L}]+/u',$s,$m);
var_dump($m); // empty! (corrupted)
The only thing this code shows is that the person who typed it (I guess it was you) clearly decided not to do any error handling. That's fine, unless that person also protests that the code doesn't work.
The sad thing about this is that it is a common user error: if you write fragile code (e.g. without error handling), don't expect it to work in a solid manner. That will never happen.
So what does this require when you program? First of all, you should know the functions you use. That normally requires knowledge of the input parameters and the return values. You normally find that information documented. Use the manual. Second, you actually need to care about return values and do the error handling yourself. The function alone does not know what it means if an error occurred. Is it an exception? Then you probably need to do the exception handling, as in this demo example:
<?php
/**
* @link http://stackoverflow.com/q/19316127/367456
*/
$s = "THE UTF-8 NO-BREAK\xA0SPACE"; // a non-ASCII byte
$result = preg_match_all('/[-\'\p{L}]+/u',$s,$m);
if ($result === FALSE) {
switch (preg_last_error()) {
case PREG_BAD_UTF8_ERROR:
throw new InvalidArgumentException(
'UTF-8 encoded binary string expected.'
);
default:
throw new RuntimeException('preg error occurred.');
}
}
var_dump($m); // nothing at all corrupted...
In any case it means you need to look at what you are doing, learn about it and write more code. No magic. No bug. Just a bit of work.
The other part in front of you is perhaps to understand what characters are in software, but that is largely independent of concrete programming languages like PHP. For an introductory read, you can start here:
A tutorial on character code issues
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
The first is a must-read, or perhaps a must-bookmark, because it is a lot to read, but it explains it all very well.
I don't know if it's just me, but I am allergic to one-line ifs in any C-like language. I always like to see curly brackets after an if, so instead of
if($a==1)
$b = 2;
or
if($a==1) $b = 2;
I'd like to see
if($a==1){
$b = 2;
}
I guess I can support my preference by arguing that the first one is more prone to errors, and it has less readability.
My problem right now is that I'm working on code that is packed with these one-line ifs. I was wondering if there is some sort of utility that would help me correct them, some sort of PHP code beautifier that would do this.
I was also thinking of developing some sort of regex to use with Linux's sed command to accomplish this, but I'm not sure that's even possible. The regex would have to match one-line ifs and wrap them in curly brackets: find an if and its condition, then look for a { following the condition; if none is found, wrap the next line, or the next run of characters before a line break, in { and }.
What do you think?
Your best bet is probably to use the built-in PHP tokenizer, and to parse the resulting token stream.
See this answer for more information about the PHP Tokenizer: https://stackoverflow.com/a/5642653/1005039
You can also take a look at a script I wrote to parse PHP source files to fix another common problem in legacy code, namely to fix unquoted array indexes:
https://github.com/GustavBertram/php-array-index-fixer/blob/master/aif.php
The script uses a state machine instead of a generalized parser, but in your case it might be good enough.
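To give an idea of what the tokenizer approach looks like, here is a rough sketch that only reports the line numbers of if/elseif statements whose body is not wrapped in braces. The function name find_braceless_ifs() is made up for this example, and the sketch makes no attempt to rewrite the source or to handle else, for, while or foreach; it only shows how token_get_all() lets you skip over the condition reliably, which is the part a sed regex struggles with.

```php
<?php
/**
 * Return the line numbers of if/elseif statements not followed by "{"
 * (or by ":" for the alternative syntax). Hypothetical helper name.
 */
function find_braceless_ifs(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $lines = [];

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if (!is_array($token) || ($token[0] !== T_IF && $token[0] !== T_ELSEIF)) {
            continue;
        }

        // Walk to the parenthesis that closes the condition.
        $depth = 0;
        for ($j = $i + 1; $j < $count; $j++) {
            if ($tokens[$j] === '(') {
                $depth++;
            } elseif ($tokens[$j] === ')') {
                $depth--;
                if ($depth === 0) {
                    break;
                }
            }
        }

        // Skip whitespace and comments between the condition and the body.
        $skippable = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
        for ($j++; $j < $count; $j++) {
            $next = $tokens[$j];
            if (!is_array($next) || !in_array($next[0], $skippable, true)) {
                break;
            }
        }

        if ($j < $count && $tokens[$j] !== '{' && $tokens[$j] !== ':') {
            $lines[] = $token[2]; // line on which the if keyword starts
        }
    }

    return $lines;
}
```

Calling find_braceless_ifs(file_get_contents($path)) on each file gives a list of places to fix by hand or to feed into a rewriting step.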
I found this line of code in the Virtuemart plugin for Joomla on line 2136 in administrator/components/com_virtuemart/classes/ps_product.php
eval ("\$text_including_tax = \"$text_including_tax\";");
Scrap my previous answer.
The reason this eval() is here is shown in the php eval docs
This is what's happening:
$text_including_tax = '$tax ...';
...
$tax = 10;
...
eval ("\$text_including_tax = \"$text_including_tax\";");
At the end of this $text_including_tax is equal to:
"10 ..."
The single quotes prevent $tax from being interpolated in the original definition of the string. Using eval() forces PHP to re-evaluate the string and include the value of $tax in it.
I'm not a fan of this particular method, but it is correct. An alternative could be to use sprintf().
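As a rough illustration of the sprintf() alternative, and of a second option that leaves the '$tax' marker in place, something like the following would avoid eval() entirely. The template texts are invented for this sketch; the real strings in ps_product.php are not shown in the question.

```php
<?php
$tax = 10;

// Option 1: sprintf() with an explicit placeholder in the template.
// "%%" is a literal percent sign.
$template = 'Price including %s%% tax'; // hypothetical template text
$text_including_tax = sprintf($template, $tax);

// Option 2: strtr() keeps the original '$tax' marker but treats it as
// plain text to be replaced, never as PHP code.
$legacy_template = 'Price including $tax% tax'; // hypothetical template text
$text_from_legacy = strtr($legacy_template, ['$tax' => (string) $tax]);
```

Both variables end up holding the same text, and nothing in the template is ever executed.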
This code seems to be a bad way of forcing $text_including_tax to be a string.
The reason it is bad is because if $text_including_tax can contain data entered by a user it is possible for them to execute arbitrary code.
For example, if $text_including_tax was set to:
"\"; readfile('/etc/passwd'); \$_dummy = \"";
The eval would become:
eval("\$text_including_tax = \"\"; readfile('/etc/passwd'); \$_dummy = \"\";");
Giving the malicious user a dump of the passwd file.
A more correct method for doing this would be to cast the variable to string:
$text_including_tax = (string) $text_including_tax;
or even just:
$text_including_tax = "$text_including_tax";
If $text_including_tax is only an internal variable or contains already validated content, there isn't a security risk. But it's still a bad way to convert a variable to a string because there are more obvious and safer ways to do it.
I'm guessing that it's a funky way of forcing $text_including_tax to be a string and not a number.
Perhaps it's an attempt to cast the variable as a string? Just a guess.
You will need the eval to get the tax rate into the output. Just moved this to a new server and for some reason this line caused a server error. As a quick fix, I changed it to:
//eval ("\$text_including_tax = \"$text_including_tax\";");
$text_including_tax = str_replace('$tax', $tax, $text_including_tax);
It is evaluating the string as PHP code.
But it seems to be making a variable equal itself? Weird.
As others have pointed out, it's code written by someone who doesn't know what on earth they're doing.
I also had a quick browse of the code to find a total lack of text escaping when putting HTML/URIs/etc. together. There are probably many injection holes to be found here in addition to the eval issues, if you can be bothered to audit it properly.
I would not want this code running on my server.
I've looked through that codebase before. It's some of the worst PHP I have seen.
I imagine you'd do that kind of thing to cover up mistakes you made somewhere else.
No, it's doing this:
Say $text_including_tax = "flat". This code evaluates the line:
$flat = "flat";
It isn't necessarily good, but I did use a technique like this once to suck all the MySQL variables in an array like this:
while ($row = mysql_fetch_assoc($result)) {
$var = $row["Variable_name"];
$$var = $row["Value"];
}
At work today we were trying to come up with any reason you would use strspn.
I searched Google Code to see if it's ever been used in a useful way and came up blank. I just can't imagine a situation in which I would really need to know the length of the first segment of a string that contains only characters from another string. Any ideas?
Although you link to the PHP manual, the strspn() function comes from C libraries, along with strlen(), strcpy(), strcmp(), etc.
strspn() is a convenient alternative to picking through a string character by character, testing if the characters match one of a set of values. It's useful when writing tokenizers. The alternative to strspn() would be lots of repetitive and error-prone code like the following:
for (p = stringbuf; *p; p++) {
if (*p == 'a' || *p == 'b' || *p = 'c' ... || *p == 'z') {
/* still parsing current token */
}
}
Can you spot the error? :-)
Of course in a language with builtin support for regular expression matching, strspn() makes little sense. But when writing a rudimentary parser for a DSL in C, it's pretty nifty.
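For what it's worth, the same tokenizer idea can be sketched in PHP, where strspn() takes an offset argument. This is a toy example, not a real lexer: the function name tokenize() and the three token kinds are invented for illustration.

```php
<?php
/**
 * Split an input string into number, identifier and symbol tokens,
 * using strspn() to measure each run of characters.
 * Hypothetical helper for illustration only.
 */
function tokenize(string $input): array
{
    $blank = " \t\r\n";
    $digits = '0123456789';
    $letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_';

    $tokens = [];
    $pos = 0;
    $len = strlen($input);

    while ($pos < $len) {
        // Skip any run of whitespace.
        if (($n = strspn($input, $blank, $pos)) > 0) {
            $pos += $n;
            continue;
        }
        // A run of digits is a number.
        if (($n = strspn($input, $digits, $pos)) > 0) {
            $tokens[] = ['number', substr($input, $pos, $n)];
            $pos += $n;
            continue;
        }
        // A run of letters or underscores is an identifier.
        if (($n = strspn($input, $letters, $pos)) > 0) {
            $tokens[] = ['identifier', substr($input, $pos, $n)];
            $pos += $n;
            continue;
        }
        // Anything else is a single-character symbol.
        $tokens[] = ['symbol', $input[$pos]];
        $pos++;
    }

    return $tokens;
}
```

Each strspn() call replaces one of the hand-written character-by-character loops from the C snippet.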
It's based on the ANSI C function strspn(). It can be useful in low-level C parsing code, where there is no high-level string class. It's considerably less useful in PHP, which has lots of useful string parsing functions.
Well, by my understanding, it's the same thing as this regex:
^[set]*
Where set is the string containing the characters to be found.
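A small sketch of that equivalence, assuming byte-wise matching (no /u modifier) and using a made-up function name, strspn_via_regex(). preg_quote() is used so that characters such as ] or - in the set are taken literally inside the character class.

```php
<?php
/**
 * Compute the same length as strspn($subject, $set) using ^[set]*.
 * Hypothetical helper, written only to illustrate the equivalence.
 */
function strspn_via_regex(string $subject, string $set): int
{
    // An empty character class is not a valid pattern, and strspn()
    // returns 0 for an empty set anyway.
    if ($set === '') {
        return 0;
    }

    $class = preg_quote($set, '/');
    preg_match('/^[' . $class . ']*/', $subject, $m);

    return strlen($m[0]);
}
```

For example, strspn_via_regex('42 apples', '0123456789') and strspn('42 apples', '0123456789') both give 2.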
You could use it to search for any number or text at the beginning of a string and split.
It seems it would be useful when porting code to php.
I think it's great for blacklisting and letting the user know where the error started, much like MySQL returns the part of the query where the error occurred.
Please see this function, which lets the user know which part of their comment is not valid:
function blacklistChars($yourComment){
$blacklistedChars = "!@#$%^&*()";
$validLength = strcspn($yourComment, $blacklistedChars);
if ($validLength !== strlen($yourComment))
{
$error = "Your comment contains invalid chars starting from here: `" .
substr($yourComment, $validLength) . "`";
return $error;
}
return false;
}
$yourComment = "Hello, why can you not type and $ dollar sign in the text?";
$yourCommentError = blacklistChars($yourComment);
if ($yourCommentError <> false)
echo $yourCommentError;
It is useful specifically for functions like atoi - where you have a string you want to convert to a number, and you don't want to deal with anything that isn't in the set "-.0123456789"
But yes, it has limited use.
-Adam