'Greedy Token Parsing' in PHP

What is 'greedy token parsing' in PHP?
I found this in the CodeIgniter guidelines:
"Always use single quoted strings unless you need variables parsed, and in cases where you do need variables parsed, use braces to prevent greedy token parsing."
"My string {$foo}"
An answer with a good explanation would help.
Thanks!

Greedy token parsing refers to something like this:
$fruit = "apple";
$amount = 3;
$string = "I have $amount $fruits";
Possible expected output: "I have 3 apples"
Actual output: "I have 3 "
Of course, this is a beginner mistake, but even experts make mistakes sometimes!
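Here is a minimal sketch of the braces the CodeIgniter guideline recommends, applied to the same example. Without braces, the parser reads the longest valid variable name it can find (`$fruits`). The closing brace marks exactly where the name ends.

```php
<?php
$fruit = "apple";
$amount = 3;

// "{$fruit}s": the variable name stops at the closing brace,
// so the trailing "s" is plain text rather than part of the name.
$string = "I have $amount {$fruit}s";

echo $string, "\n"; // I have 3 apples
```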
Personally, I don't like to interpolate variables at all, braces or not. I find my code much more readable like this:
$string = "I have ".$amount." ".$fruit."s";
Note that code editors (Notepad++, for example) have an easier job colour-coding this line.
Then again, some people might prefer letting the engine do the interpolation:
$string = sprintf("I have %d %ss",$amount,$fruit);
It's all up to personal preference really, but the point made in the guideline you quoted is to be careful of what you are writing.

"Greedy" is a general term in parsing referring to "getting as much as you can". The opposite would be "ungreedy" or "get only as much as you need".
In variable interpolation, for example:
$foo = 'bar';
echo "$foos";
The parser here will greedily parse as much as makes sense and try to interpolate the variable "$foos", instead of the actually existing variable "$foo".
Another example in regular expressions:
preg_match('/.+\s/', 'foo bar baz')
This greedily matches "foo bar " (including the trailing space), because that is the longest string that fits the pattern .+\s. On the other hand:
preg_match('/.+?\s/', 'foo bar baz')
The ungreedy +? only matches "foo " (again with the trailing space), which is the minimum needed to match the pattern.
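The two regex examples can be checked directly by passing a `$matches` array to `preg_match()`. The full match in element 0 shows where each quantifier stops. This is an illustrative sketch. The variable names are arbitrary.

```php
<?php
$subject = 'foo bar baz';

// Greedy: .+ first consumes the whole subject, then backtracks
// only as far as needed for \s to match the LAST space.
preg_match('/.+\s/', $subject, $greedy);

// Lazy: .+? takes as few characters as possible,
// so \s matches the FIRST space.
preg_match('/.+?\s/', $subject, $lazy);

var_dump($greedy[0]); // string(8) "foo bar "
var_dump($lazy[0]);   // string(4) "foo "
```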

Related

preg_replace not working but regex working

Lately I've been studying regex (mostly in practice, to tell the truth), and I'm noticing its power. Through a question I asked earlier (link), I learned about backreferences. I think I understand how they work: the pattern works in JavaScript, but not in PHP.
For example, I have this string:
[b]Text B[/b]
[i]Text I[/i]
[u]Text U[/u]
[s]Text S[/s]
And I use the following regex:
\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]
Testing it on regex101.com works, and it also works in JavaScript, but it does not work in PHP.
Example of preg_replace (not working):
echo preg_replace(
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i",
"<$1>$2</$1>",
"[b]Text[/b]"
);
While this way works:
echo preg_replace(
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/(b|i|u|s)\]/i",
"<$1>$2</$1>",
"[b]Text[/b]"
);
I can't understand what I'm doing wrong. Thanks to everyone who helps!
It is because you use a double-quoted string. Inside a double-quoted string, \1 is read as an octal escape for a character (the control character SOH, "start of heading"), not as a backslash followed by 1.
So there are two ways to fix it:
use a single-quoted string:
'/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i'
or escape the backslash to obtain a literal backslash (for the string, not for the pattern):
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\\1\]/i"
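Here is a runnable check of the explanation above. The input is the four-tag string from the question, and the pattern is the single-quoted fix. The last call is illustrative only. It shows what the backreference buys over the `(b|i|u|s) ... (b|i|u|s)` workaround: mismatched tags are no longer replaced.

```php
<?php
// In double quotes, PHP turns "\1" into chr(1) (SOH) before PCRE sees the pattern.
var_dump("\1" === "\x01"); // bool(true)

// Single quotes keep the backslash, so PCRE receives a real \1 backreference.
$pattern = '/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i';
$input = "[b]Text B[/b]\n[i]Text I[/i]\n[u]Text U[/u]\n[s]Text S[/s]";

// The replacement is single-quoted too, so $1/$2 are never treated as PHP variables.
$html = preg_replace($pattern, '<$1>$2</$1>', $input);
echo $html, "\n";

// \1 forces the closing tag to match the opening one, so this is left unchanged.
$mismatched = preg_replace($pattern, '<$1>$2</$1>', '[b]Text[/i]');
echo $mismatched, "\n"; // [b]Text[/i]
```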
As an aside, you can write your pattern like this:
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\1]~i';
// with \g{} notation (Perl 5.10 syntax, also supported by PCRE)
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\g{1}]~i';
// same notation, but relative:
// (the second group to the left of the current position)
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\g{-2}]~i';
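This sketch checks that the three pattern spellings in the aside behave the same. It runs each one over the question's sample string and compares the results. Only the loop and variable names are additions.

```php
<?php
// \1 (numbered), \g{1} (numbered, brace syntax) and \g{-2} (relative).
$patterns = [
    '~\[([bius])]\s*(.*?)\s*\[/\1]~i',
    '~\[([bius])]\s*(.*?)\s*\[/\g{1}]~i',
    '~\[([bius])]\s*(.*?)\s*\[/\g{-2}]~i',
];

$input = "[b]Text B[/b]\n[i]Text I[/i]\n[u]Text U[/u]\n[s]Text S[/s]";

$results = [];
foreach ($patterns as $pattern) {
    $results[] = preg_replace($pattern, '<$1>$2</$1>', $input);
}

// All three spellings should produce identical output.
var_dump(count(array_unique($results)) === 1); // bool(true)
echo $results[0], "\n";
```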

Find all instances of $variables in single-quoted strings?

Someone made a change to the repo, replacing all or most instances of " with ' in string assignments. This has had the unintended effect of breaking many strings that relied on variable parsing. Examples:
$query = 'ALTER TABLE ' . $items . ' ADD `user_$name`';
$query .= '($length)';
etc.
Obviously this is breaking SQL queries, but it may not be limited to just strings assigned to $query.
Is there a regex or some function of PhpStorm that I can use to find all instances of this and fix them, either by reverting back to " or using ' with concatenation?
So you want to find single-quoted string literals containing a $. The obvious thing to try would be this:
/'[^'$]*[$][^']*'/
That should find any single-quoted string containing a $. Unfortunately that doesn't quite work, since there may be backslash-escaped characters. So this is slightly better:
/'(?:[^'$\\]|\\.)*[$](?:[^'\\]|\\.)*'/
But another problem is that you might have something like this:
'foo' . $bar . 'baz'
and that regex would match ' . $bar . '. To get round this, we could write a regex that matches a substring from the start of a line until the first single-quoted string literal that contains $:
/^(?:[^']|'(?:[^\\']|\\.)*')*'(?:[^'$\\]|\\.)*[$](?:[^'\\]|\\.)*'/
Caveats:
I haven't tested these, so they may not be quite right.
My PHP is a little rusty, so I'm not sure exactly what string literals are supposed to look like.
I don't know how you're running the regex; if you're using a regex constructor from a string literal you need an extra layer of character escaping (e.g. in javascript, /\\/ is equivalent to new RegExp('\\\\')).
You probably have to set a flag, possibly called m (again, I don't know what regex engine you're using), so that the initial ^ will match at the start of every line, not just the start of the string; likewise, unset the flag which may be called s, since you don't want . to match newlines.
There may be some obscure case I haven't thought of. It's theoretically possible there might be a code comment containing a case you don't want to replace.
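If the files can be scanned with a small PHP script instead of PhpStorm's search, the third regex can be applied line by line. This is an untested-in-production sketch. `findSuspectLines()` is a hypothetical helper. Testing each line separately stands in for the m flag. The plain-character classes also exclude the backslash (`[^'$\\]`, `[^'\\]`), so \\ and \' are always consumed as pairs. Without that exclusion, a literal such as 'C:\\' can be misread as unterminated. The same caveats apply: double-quoted strings containing apostrophes, and comments, can still confuse it.

```php
<?php
// Nowdoc: PHP passes every backslash to PCRE untouched.
$pattern = <<<'RE'
/^(?:[^']|'(?:[^\\']|\\.)*')*'(?:[^'$\\]|\\.)*[$](?:[^'\\]|\\.)*'/
RE;

// Hypothetical helper: returns 1-based numbers of lines containing a
// single-quoted literal with a '$' in it.
function findSuspectLines(string $source, string $pattern): array
{
    $hits = [];
    foreach (explode("\n", $source) as $i => $line) {
        if (preg_match($pattern, $line) === 1) {
            $hits[] = $i + 1;
        }
    }
    return $hits;
}

// Sample input: the question's two examples plus two lines that should NOT match.
$source = <<<'PHP'
$query = 'ALTER TABLE ' . $items . ' ADD `user_$name`';
$query .= '($length)';
$ok = 'foo' . $bar . 'baz';
$path = 'C:\\' . $dir . 'x';
PHP;

print_r(findSuspectLines($source, $pattern));
```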

Is it possible in PHP to escape newlines in strings as in C?

In C you can continue a string literal in the next line escaping the newline character:
char* p = "hello \
new line.";
(My C is a bit rusty, so this may not be 100% accurate.)
But in PHP, the backslash is taken literally:
$p = "hello \
new line.";
That is, the backslash character becomes part of the string.
Is there a way to get the C behavior in PHP in this case?
You could simply concatenate your string like this:
$p = "hello " .
"new line.";
There are a few similar ways to do this in PHP, but no way to do it with a continuation character.
For starters, you can continue your string on the next line without using any particular character. The following is valid PHP:
$foo = 'hello there two line
string';
$foo = 'hello there two line
    string';
The second example shows one of the drawbacks of this approach: unless you left-justify the remaining lines, you're adding extra whitespace to your string.
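The following sketch shows what a multi-line single-quoted literal actually contains. It assumes the source file uses LF line endings. A file saved with CRLF would embed "\r\n" instead.

```php
<?php
// The line break and any indentation of the continuation line
// become part of the string value.
$flush = 'hello there two line
string';

$indented = 'hello there two line
    string';

var_dump($flush === "hello there two line\nstring");         // bool(true)
var_dump($indented === "hello there two line\n    string"); // bool(true)
```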
The second approach is to use string concatenation:
$foo = 'hello there two line ' .
'string';
$foo = 'hello there two line ' .
    'string';
Both examples above produce the same string; in other words, there's no additional whitespace. The trade-off is that you need to perform a string concatenation, which isn't free (although with PHP's mutable strings and modern hardware, you can get away with a lot of concatenation before you start noticing performance hits).
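This sketch checks the claim that indentation doesn't matter with concatenation. A space is placed at the end of the first piece so the words don't run together. That space is an illustrative choice, not part of the technique.

```php
<?php
// Indentation between the pieces is only source formatting;
// it never reaches the resulting string.
$flush = 'hello there two line ' .
'string';

$indented = 'hello there two line ' .
    'string';

var_dump($flush === $indented);                        // bool(true)
var_dump($indented === 'hello there two line string'); // bool(true)
```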
Finally there's the HEREDOC format. Similar to the first option, HEREDOC will allow you to break your strings over multiple lines as well.
$foo = <<<TEST
I can go to town between the start and end TEST identifiers.
Wooooo Hoooo. You can also drop $php_vars anywhere you'd like.
Oh yeah!
TEST;
You get the same leading-whitespace problem as with the first example, but some people find HEREDOC more readable.
In PHP you can simply continue a string on a new line,
e.g.:
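Since PHP 7.3, the heredoc leading-whitespace problem is easier to avoid. The closing identifier may be indented, and that same amount of indentation is stripped from every line of the body. A minimal sketch, assuming PHP 7.3 or later:

```php
<?php
$name = 'world';

// The closing identifier is indented by 4 spaces, so 4 spaces are
// removed from each body line; any extra indentation is kept.
$text = <<<TEXT
    Hello, $name.
      This line keeps two extra spaces.
    TEXT;

var_dump($text === "Hello, world.\n  This line keeps two extra spaces."); // bool(true)
```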
<?php
$var = 'this is
some text
in a var';
?>
The short answer is that you cannot do it as easily as in C; you need to either concatenate (best):
$p = "This is a long".
" non-multiline string";
or remove the newlines afterwards (awful, don't do this):
$p = "This will contain the newlines
before this line";
//for instance str_replace() can remove the newlines

Google Style Regular Expression Search

It's been several years since I have used regular expressions, and I was hoping I could get some help on something I'm working on. You know how Google's search is quite powerful: it treats anything inside quotes as a literal phrase and excludes anything with a minus sign in front of it.
Example: "this is literal" -donotfindme site:examplesite.com
This example would search for the phrase "this is literal" on pages that don't include the word donotfindme, on the website examplesite.com.
Obviously I'm not looking for something as complex as Google I just wanted to reference where my project is heading.
Anyway, I wanted to start with the basics, which is the literal phrases inside quotes. With the help of another question on this site, I was able to do the following (this is PHP):
$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches);
But this outputs "this" (with the quotes) instead of the desired this, and doesn't work at all for multiple phrases in quotes. Could someone point me in the right direction?
I don't necessarily need code; even a really nice tutorial site would probably do the job.
Thanks!
Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:
$pattern = '/"(.*)"/';
and then $matches will be an array of length 2 that contains the text between the quotes in element 1 (element 0 still contains the full text matched). In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
will match a quoted string that starts with two lowercase words, each followed by a single space, and then anything. Then $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
For finding multiple phrases, you can either use preg_match_all() or pick them out one at a time with preg_match(). preg_match() takes an optional "offset" parameter that indicates where in the string it should start searching; to find multiple matches, pass the position right after the previous match as the offset. See the documentation for details.
You could also try searching Google for "regular expression tutorial" or something like that; there are plenty of good ones out there.
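Here is a sketch of both options for multiple phrases, with a sample search string modelled on the question's. It also shows why `"(.*)"` alone is not enough: the greedy `.*` runs from the first quote to the last one. Capturing `[^"]*` instead makes each match stop at the next quote. `quotedPhrases()` is a hypothetical helper name.

```php
<?php
$search = '"this is literal" -donotfindme "and this" site:examplesite.com';

// Greedy (.*): one match from the FIRST quote to the LAST quote.
preg_match('/"(.*)"/', $search, $greedy);

// Option 1: preg_match_all() collects every phrase in one call.
preg_match_all('/"([^"]*)"/', $search, $all);

// Option 2: the offset loop, one preg_match() per phrase.
function quotedPhrases(string $search): array
{
    $phrases = [];
    $offset = 0;
    while (preg_match('/"([^"]*)"/', $search, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $phrases[] = $m[1][0];                 // captured text of group 1
        $offset = $m[0][1] + strlen($m[0][0]); // resume right after this match
    }
    return $phrases;
}

echo $greedy[1], "\n";
print_r($all[1]);
print_r(quotedPhrases($search));
```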
Sorry, my PHP is a bit rusty, but this code will probably do what you asked:
$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches[1]);
$matches[1] will contain the first captured subexpression; $matches[0] contains the full matched pattern.
See preg_match in the PHP documentation for specifics about subexpressions.
I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.
Here is a complete answer for all sorts of search terms (literal, minus, quotes, ...) with replacements (for Google visitors, at least).
But maybe it shouldn't be done with regular expressions alone.
Not only would a huge, super-complex regular expression be hard for you or other developers to work on and extend,
but this approach might even be faster.
It might still need a lot of improvement but at least here is a working complete solution in a class. There is a bit more in here than asked in the question, but it illustrates some reasons behind some choices.
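class mySearchToSql extends mysqli {
    // Collected WHERE conditions (declared; dynamic properties are deprecated as of PHP 8.2)
    protected $W = array();

    // Public so it can be called on an instance (it was declared protected)
    public function filter($what) {
        if (isset($what)) {
            //echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug
            // Split into different desires: plain words, "quoted phrases", -excluded words
            preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i', $what, $split);
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            // Surround with SQL ([$this, 'sur'] instead of 'self::sur', which is deprecated as of PHP 8.2)
            // Note: MySQL 8.0.4+ (ICU regex) no longer supports [[:<:]] / [[:>:]]; use \b there
            array_walk($split[1], array($this, 'sur'), array('`Field` LIKE "%', '%"'));
            array_walk($split[2], array($this, 'sur'), array('`Desc` REGEXP "[[:<:]]', '[[:>:]]"'));
            array_walk($split[3], array($this, 'sur'), array('`Desc` NOT LIKE "%', '%"'));
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            // Add AND or OR: exclusions are ANDed, words and phrases are ORed
            $this->where($split[3])
                 ->where(array_merge($split[1], $split[2]), true);
        }
    }

    protected function sur(&$v, $k, $sur) {
        if (!empty($v))
            $v = $sur[0].$this->real_escape_string($v).$sur[1];
    }

    function where($s, $OR = false) {
        if (empty($s)) return $this;
        if (is_array($s)) {
            $s = array_filter($s); // drop empty slots left by non-matching alternatives
            if (empty($s)) return $this;
            if ($OR == true)
                $this->W[] = '('.implode(' OR ', $s).')';
            else
                $this->W[] = '('.implode(' AND ', $s).')';
        } else
            $this->W[] = $s;
        return $this;
    }

    function showSQL() {
        // PHP_EOL stands in for the constant L, which is not defined in this snippet
        echo $this->W ? 'WHERE '.implode(PHP_EOL.' AND ', $this->W).PHP_EOL : '';
    }
}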
Thanks to all the Stack Overflow answers that helped me get here!
You're in luck because I asked a similar question regarding string literals recently. You can find it here: Regex for managing escaped characters for items like string literals
I ended up using the following for searching for them and it worked perfectly:
(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1
This regex differs from the others as it properly handles escaped quotation marks inside the string.
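Here is a minimal sketch of using that regex from PHP with `preg_match_all()`. The sample subject string is illustrative. The pattern sits in a nowdoc, so PHP leaves its backslashes alone. Group 1 captures the quote character and group 2 the raw, still-escaped contents. The leading `(?<!\\)(?:\\\\)*` keeps a quote preceded by an odd number of backslashes from being taken as an opening quote.

```php
<?php
$pattern = <<<'RE'
/(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1/
RE;

// Sample input containing escaped quotes inside both kinds of literal.
$subject = <<<'TXT'
say "hi \"there\"" and 'it\'s'
TXT;

preg_match_all($pattern, $subject, $m);

print_r($m[1]); // quote characters
print_r($m[2]); // literal contents, escapes left as-is
```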
