PHP preg_replace won't accept outside reference

I have the following function which, as you can see, removes every character from a string that is not allowed by the pattern. It only works when I enter the pattern as a string literal, as in the first commented-out line. I put an echo in there to test what was coming back, and it's as it should be, so I don't know what's going on. Has anyone any clues?
private function check_string( $s )
{
    //return preg_replace( '/[^a-z 0-9~%\.:_\\-()"]/i', '', $s );
    // a-z 0-9~%\.:_\\-()"
    echo $this->permitted_uri_chars;
    // /[^a-z 0-9~%\.:_\\-()"]/i
    $pattern = '/[^'. $this->permitted_uri_chars .']/i';
    return preg_replace( $pattern, '', $s );
}
The error I get is
Message: preg_replace(): Compilation failed: range out of order in character class at offset 18
ANSWER
Thanks to Jason McCreary
$pattern = '/[^'. preg_quote($this->config->item('permitted_uri_chars'), '/') .']+/i';

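It works in the first example because the literal is escaped for both PHP and the regular expression: inside a single-quoted PHP string, \\ collapses to a single backslash, so the regex engine receives \- (an escaped dash).
A value that already sits in a variable does not go through that PHP step again. Whatever backslashes it contains reach the regex engine unchanged, so a stored \\- is read as a literal backslash followed by an unescaped dash, which starts a range.
The PHP half of the escaping is demonstrated by the following example: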
echo '/[^a-z 0-9~%\.:_\\-()"]/i';
// becomes: /[^a-z 0-9~%\.:_\-()"]/i
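A few options would be:
Double-escape the backslash.
Avoid the regular expression escaping altogether by placing the dash at the end of the class: /[^a-z 0-9~%.:_()"-]/
Use preg_quote() if you're going to accept strings that may contain regular expression syntax. Be aware that preg_quote() also escapes the dash, so ranges such as a-z stop working as ranges.
Note: I'd encourage you to read about escaping inside character classes.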
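The failing class and the dash-at-the-end option can be reproduced with a minimal sketch. It assumes the stored value contains two literal backslashes before the dash, which is consistent with the offset reported in the error message; $stored and the sample input are made up for illustration.

```php
<?php
// Assumption: the stored value holds two literal backslashes before the dash.
// In a single-quoted literal, \\\\ yields two backslashes and \\. yields \.
$stored = 'a-z 0-9~%\\.:_\\\\-()"';

// The regex engine sees "\\" (a literal backslash) followed by "-(",
// i.e. a range from "\" down to "(", which is out of order.
// @ only hides the warning for this demonstration.
$broken = @preg_replace('/[^' . $stored . ']/i', '', 'abc<>def');
var_dump($broken); // NULL: the pattern failed to compile

// With the dash placed last in the class it needs no escaping at all.
$safe = '/[^a-z 0-9~%.:_()"-]/i';
echo preg_replace($safe, '', 'abc<>def'), "\n"; // abcdef
```

Placing the dash last sidesteps the question of how many escaping layers the value passes through before it reaches the regex engine.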

Related

Function preg_quote works incorrectly?

Suppose I want to check input in order to allow Unicode letters and numbers plus configured symbols.
$allow_symbols = './*!#%&[]:,-_ ';
// $allow_symbols = '';
$pattern = '/^['.preg_quote($allow_symbols).'\p{L}\p{N}]+$/iu';
print $pattern."\n";
preg_match($pattern, '');
Sandbox is here: http://sandbox.onlinephpfunctions.com/code/b99a8f042695d1dc1528834d21e6eb6ad62972e6
I got
Warning: preg_match(): Unknown modifier '\' in [...][...] on line 9
The problem originates from $allow_symbols: if I override it with an empty string, as in the commented-out line, nothing goes wrong. And when I paste the exact printed pattern into https://www.phpliveregex.com/p/rxj it works fine.
So, what's the matter and how to deal with it?
preg_quote does not escape the regex's delimiter by default, because it can be any non-alphanumeric, non-backslash, non-whitespace character.
Set its second parameter ($delimiter) to also escape forward slashes:
$escaped_symbols = preg_quote($allow_symbols, '/');
$pattern = "/^[$escaped_symbols\p{L}\p{N}]+$/iu";
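The effect of the second argument can be checked directly. This is a small sketch reusing $allow_symbols from the question; the sample inputs are made up for illustration.

```php
<?php
$allow_symbols = './*!#%&[]:,-_ ';

// Without a delimiter argument the slash is left alone and ends the pattern early.
$without = preg_quote($allow_symbols);
// With '/' as the delimiter argument the slash is escaped as \/.
$with = preg_quote($allow_symbols, '/');

var_dump(strpos($without, '\/') !== false); // bool(false)
var_dump(strpos($with, '\/') !== false);    // bool(true)

$pattern = '/^[' . $with . '\p{L}\p{N}]+$/iu';
var_dump(preg_match($pattern, 'Hello, мир!')); // int(1)
var_dump(preg_match($pattern, '<script>'));    // int(0)
```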
You can use the T-Regx library, which automatically chooses delimiters and handles unsafe characters:
$allow_symbols = './*!#%&[]:,-_ ';
Pattern::prepare(['^[', [$allow_symbols], '\p{L}\p{N}]+$'], 'iu')->match('');

preg_replace PHP not working?

Why doesn't preg_replace return anything in this scenario? I've been trying to figure it out all night.
Here is the text contained within $postContent:
Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.
Here is my code:
echo "Test I'm Here!!!";
$startQuotePos = strpos($postContent,'[Quote]')+7;
$endQuotePos = strpos($postContent,'[/Quote]');
$postStrLength = strlen($postContent);
$quotePostID = substr($postContent,$startQuotePos,($endQuotePos-$postStrLength));
$quotePattern = '[Quote]'.$quotePostID.'[/Quote]';
$newPCAQ = preg_replace($quotePattern,$quotePostID,$postContent);
echo "<br />$startQuotePos<br />$endQuotePos<br />$quotePostID<br />Qpattern:$quotePattern<br />PCAQ: $newPCAQ<br />";
These are my results:
Test I'm Here!!!
35
36
1
Qpattern:[Quote]1[/Quote]
PCAQ:
In a regular expression, "[Quote]" is a character class: it matches a single character that is one of Q, u, o, t, or e.
If you want preg_replace() to find the literal "[Quote]", you need to escape it as "\[Quote\]". preg_quote() is the function you should use: preg_quote("[Quote]").
Your code is also wrong because a regular expression is expected to start with a delimiter. In the preg_replace() call I am showing at the end of my answer, that is #, but you could use another character, as long as it doesn't appear in the regular expression and is also used at the end of the regular expression. (In my case, the closing # is followed by a pattern modifier; pattern modifiers are the only characters allowed after the closing delimiter.)
If you are going to use preg_replace(), there is no point in first finding where "[Quote]" is. I would rather use the following code:
$newPCAQ = preg_replace('#\[Quote\](.+?)\[/Quote\]#i', '\1', $postContent);
I will explain the regular expression I am using:
The final '#i' tells preg_replace() to ignore the difference between lowercase and uppercase characters; the string could contain "[QuOte]234[/QuOTE]", and that substring would match the regular expression just the same.
I use a question mark in "(.+?)" to keep ".+" from being greedy and matching too many characters. Without it, the regular expression could take a substring like "[Quote]234[/Quote] Other text [Quote]475[/Quote]" as a single match, while this should be matched as two substrings: "[Quote]234[/Quote]" and "[Quote]475[/Quote]".
The '\1' string I am using as the replacement tells preg_replace() to use the string matched by the sub-group "(.+?)" as the replacement. In other words, the call to preg_replace() removes the "[Quote]" and "[/Quote]" surrounding other text. (It doesn't replace a "[/Quote]" that isn't paired with a preceding "[Quote]", such as in "[/Quote] Other text [Quote]".)
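The difference between the lazy and the greedy quantifier described above can be seen in a short sketch; the input string is made up for illustration.

```php
<?php
$text = '[Quote]234[/Quote] Other text [quote]475[/QUOTE]';

// Lazy: each [Quote]...[/Quote] pair is matched on its own.
$lazy = preg_replace('#\[Quote\](.+?)\[/Quote\]#i', '\1', $text);
echo $lazy, "\n";   // 234 Other text 475

// Greedy: one match runs from the first opening tag to the last closing tag.
$greedy = preg_replace('#\[Quote\](.+)\[/Quote\]#i', '\1', $text);
echo $greedy, "\n"; // 234[/Quote] Other text [quote]475
```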
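Your regex must start and end with a delimiter such as '/', and the literal text inside it must be escaped (including the '/' in "[/Quote]"):
$quotePattern = '/' . preg_quote('[Quote]'.$quotePostID.'[/Quote]', '/') . '/';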
The reason you don't see anything for the return value of preg_replace is because it has returned NULL (see the manual link for details). This is what preg_replace returns when an error occurs, which is what happened in your situation. The string value of NULL is a zero-length string. You can see this by using var_dump instead, which will tell you that preg_replace returned NULL.
Your regular expression is invalid, and as such PHP raises an E_WARNING-level "Unknown modifier" error from preg_replace().
There are a couple of reasons for this. First, you need to specify an opening and closing delimiter for your regular expression, as the preg_* functions use PCRE-style regular expressions. Second, you should also consider using preg_quote on your pattern (sans the delimiter) to ensure it is escaped properly.
$postContent = "Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.";
/* Specify a delimiter for your regular expression */
$delimiter = '#';
$startQuotePos = strpos($postContent,'[Quote]')+7;
$endQuotePos = strpos($postContent,'[/Quote]');
$postStrLength = strlen($postContent);
$quotePostID = substr($postContent,$startQuotePos,($endQuotePos-$postStrLength));
/* Make sure you use the delimiter in your pattern and escape it properly */
$quotePattern = $delimiter . preg_quote("[Quote]{$quotePostID}[/Quote]", $delimiter) . $delimiter;
$newPCAQ = preg_replace($quotePattern,$quotePostID,$postContent);
echo "<br />$startQuotePos<br />$endQuotePos<br />$quotePostID<br />Qpattern:$quotePattern<br />PCAQ: $newPCAQ<br />";
The output will be:
35
36
1
Qpattern:#\[Quote\]1\[/Quote\]#
PCAQ: Test this. Here is a quote: 1 Quote is now over.
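The NULL return value mentioned above can be observed with var_dump. A minimal sketch follows; the fallback to the original text is a hypothetical guard, not something from the question.

```php
<?php
$postContent = 'Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.';

// The asker's pattern has no proper delimiters. @ hides the warning for this demo.
$result = @preg_replace('[Quote]1[/Quote]', '1', $postContent);
var_dump($result); // NULL

// Delimited and quoted pattern, plus a hypothetical guard for a failed call.
$fixed = preg_replace('#' . preg_quote('[Quote]1[/Quote]', '#') . '#', '1', $postContent);
if ($fixed === null) {
    $fixed = $postContent; // keep the original text if the regex failed
}
echo $fixed, "\n"; // Test this. Here is a quote: 1 Quote is now over.
```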

Preg_Match returns unknown modifier 'C' error

I'm new to regular expressions and am trying to find a string using preg_match, here's my code:
$artist = $row['ARTIST'];
$bool = preg_match("/$artist/", $description, $match);
My error is:
Unknown modifier 'C' in ...
If anyone could tell me what I'm doing wrong I'd appreciate it, thanks.
You have to escape possible special characters in your variable:
$bool = preg_match('/' . preg_quote($artist, '/') . '/', $description, $match);
preg_quote() in the PHP Manual:
preg_quote() takes str and puts a backslash in front of every
character that is part of the regular expression syntax. This is
useful if you have a run-time string that you need to match in some
text and the string may contain special regex characters.
Hint: try echoing your $artist variable and you should see which character is causing the problem.
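As an illustration, here is a sketch with a made-up artist name containing a slash. The value 'AC/DC' is an assumption, since the question does not show the real data, but it produces the same kind of failure.

```php
<?php
// Hypothetical data: the question does not show the actual artist value.
$artist = 'AC/DC';
$description = 'Live recording by AC/DC, 1979';

// Unescaped, the pattern becomes /AC/DC/ and the text after the second
// slash is read as modifiers. D is a valid modifier, C is not.
$raw = @preg_match("/$artist/", $description);
var_dump($raw); // bool(false)

// Quoted with the delimiter passed in, the slash becomes \/ and stays in the pattern.
$ok = preg_match('/' . preg_quote($artist, '/') . '/', $description, $match);
var_dump($ok);       // int(1)
var_dump($match[0]); // string(5) "AC/DC"
```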

how to use long regex strings in php

I have this regex string that I got from a website to pull emails from a file:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
I've tested it in RegexBuddy (regex testing software) and it works!
When I copy and paste the regex from RegexBuddy into my PHP file, I have to escape two " characters to make the regex a valid PHP string.
In PHP I use it like this:
$file = file_get_contents(/* URL TO GET */);
$email_pattern = "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])";
$matches = array();
if ( preg_match_all ( $email_pattern, $file, $matches ))
{
    echo print_r($matches, true);
}
But I get this warning:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '@'
However, this regex works in RegexBuddy.
Where am I going wrong?
Two things:
Step 1:
You need to put delimiters around the regex (here a / before and after it); any modifiers then go after the closing delimiter:
$email_pattern = "/(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/";
Step 2:
As you're in a PHP string, you'll need to escape all the special characters (\ must become \\, $ would become \$, and so on).
So the regex, escaped for inclusion in a PHP string, should look like this:
(?:[a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*|\\\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\\\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])
You also have to escape /, as we use that character as the delimiter in the first step. The regex needs to see \/, and since we express the regex in a PHP string, we replace / with \\/ (inside single quotes, a plain \/ also reaches the regex unchanged).
If I'm right (usually I use RegexBuddy's PHP export tool for the conversion too, but I don't have it at hand, so I've done it by hand), it should give something LIKE this:
$email_pattern = '/(?:[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/';
I would also suggest that you put the string inside single quotes.
I tried and...
Single quotes will give an error...
Use double quotes and the {} as delimiters // gives an error also
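The two points above (quoting style and delimiter choice) can be tried on a much smaller example. The pattern below is a deliberately simplified, hypothetical email matcher, not the long regex from the question, and the sample text is made up.

```php
<?php
// Double quotes turn \x41 into the single byte "A";
// single quotes keep the four characters \, x, 4, 1 for the regex engine.
var_dump(strlen("\x41")); // int(1)
var_dump(strlen('\x41')); // int(4)

// Simplified, hypothetical pattern. "~" is chosen as the delimiter because
// it does not occur in the pattern, so nothing needs delimiter escaping.
$pattern = '~[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}~i';
$text = 'Contact alice@example.com or bob@test.org for details.';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]); // alice@example.com and bob@test.org
```

Picking a delimiter that never appears in the pattern avoids the extra layer of \/ escaping entirely. That is harder for the long regex above, which already contains /, # and ~ as literal characters.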

PHP's filter_var() Function Generating Warning

Does anybody know why the filter_var() function below is generating the warning? Is there a limit on how many characters can be in a character class?
$regex = "/^[\w\041\042\043\044\045\046\047\050\051\052\053\054\055\056\057\072\073\074\075\076\077\100\133\134\135\136\140\173\174\175\176]*$/";
$string = "abc";
if(!filter_var($string, FILTER_VALIDATE_REGEXP, array("options" => array("regexp"=>$regex))))
{
    echo "dirty";
}
else
{
    echo "clean";
}
Warning: filter_var() [function.filter-var]: Unknown modifier ':'
PHP interprets your regex as this string:
string '/^[\w!"#$%&'()*+,-./:;<=>?@[\]^`{|}~]*$/' (length=40)
(use var_dump on $regex and you'll get that)
So there is a slash right in the middle of your regex; as you are using a slash to delimit the regex (it's the first character of $regex), PHP thinks this slash in the middle marks the end of the regex.
So PHP thinks your regex is actually:
/^[\w!"#$%&'()*+,-./
Every character that comes after the closing slash is interpreted as a modifier.
And ':' is not a valid modifier.
You might want to escape the slash in the middle of the regex ;-)
As well as some other characters, btw...
A solution for that might be to use the preg_quote function.
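A minimal sketch of the preg_quote approach follows, assuming the intent is to allow word characters plus the ASCII punctuation listed in the question. The variable $punct and the test inputs are made up for illustration.

```php
<?php
// The punctuation from the question, written literally instead of as octal escapes.
// In this single-quoted literal, \' is a quote and \\ is one backslash.
$punct = '!"#$%&\'()*+,-./:;<=>?@[\\]^`{|}~';

// preg_quote() escapes the regex metacharacters and, because '/' is passed
// as the delimiter argument, the slash as well.
$regex = '/^[\w' . preg_quote($punct, '/') . ']*$/';

$options = array('options' => array('regexp' => $regex));
var_dump(filter_var('abc', FILTER_VALIDATE_REGEXP, $options));   // string(3) "abc"
var_dump(filter_var('a/b:c', FILTER_VALIDATE_REGEXP, $options)); // string(5) "a/b:c"
var_dump(filter_var('a b', FILTER_VALIDATE_REGEXP, $options));   // bool(false), space is not allowed
```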
Here is the current working regex:
/^[\w\041\042\043\044\045\046\047\050\051\052\053\054\134\055\056\134\057\072\073\074\075\076\077\100\133\134\134\134\135\134\136\140\173\174\175\176]*$/i
