Range out of order in character class


I'm getting this odd error in the preg_match() function:
Warning: preg_match(): Compilation failed: range out of order in character class at offset 54
The line which is causing this is:
preg_match("/<!--GSM\sPER\sNUMBER\s-\s$gsmNumber\s-\sSTART-->(.*)<!--GSM\sPER\sNUMBER\s-\s$gsmNumber\s-\sEND-->/s", $fileData, $matches);
What this regular expression does is parse an HTML file, extracting only the part between:
<!--GSM PER NUMBER - 5550101 - START-->
and:
<!--GSM PER NUMBER - 5550101 - END-->
Do you have a hint about what could be causing this error?

Hi, I got the same error and solved it:
Warning: preg_match(): Compilation failed: range out of order in character class at offset <N>
Research phase:
"Range out of order": so there is a range defined that can't be used.
"At offset N": I had a quick look at my regex pattern. Position N was the "-", which inside a character class defines ranges like "a-z" or "0-9".
Solution
I simply escaped the "-":
\-
Now it is interpreted as the literal character "-" and not as a range!
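As a sketch of the same fix, here is a hypothetical phone-number class (not from the question) where an unescaped "-" sits between "+" and "(", so PCRE reads "+-(" as a range from 0x2B down to 0x28. The @ operator only hides the compile warning so the script can report the failure itself.

```php
<?php
// Hypothetical patterns, for illustration only.
$broken  = '/^[0-9+-( )]+$/';   // "+-(" is parsed as a range from "+" (0x2B) down to "(" (0x28)
$escaped = '/^[0-9+\-() ]+$/';  // "\-" is a literal hyphen
$atEnd   = '/^[0-9+() -]+$/';   // a "-" placed last cannot start a range

foreach ([$broken, $escaped, $atEnd] as $pattern) {
    // preg_match() returns false (and emits a warning, silenced by @) when compilation fails.
    $result = @preg_match($pattern, '+1 (555) 010-1');
    echo $pattern, ' => ', var_export($result, true), "\n";
}
```

Escaping the hyphen and moving it to the end of the class are equivalent here; escaping is the safer habit because it keeps working if more characters are appended later.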

If $gsmNumber contains a square bracket, backslash or various other special characters it might trigger this error. If that's possible, you might want to validate that to make sure it actually is a number before this point.
Edit 2016:
PHP has a function that escapes special characters for use in regular expressions: preg_quote().
Use it like this:
preg_match(
'/<!--GSM\sPER\sNUMBER\s-\s' .
preg_quote($gsmNumber, '/') . '\s-\sSTART-->(.*)<!--GSM\sPER\sNUMBER\s-\s' .
preg_quote($gsmNumber, '/') . '\s-\sEND-->/s', $fileData, $matches);
Since the same string is used twice here, you could assign the quoted version to a variable first and reuse it.
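A minimal sketch of that suggestion, quoting the number once and reusing it. The helper name extract_gsm_block and the sample HTML are hypothetical; the pattern is the one from the answer.

```php
<?php
// Hypothetical helper; the pattern is the answer's, with the number quoted once.
function extract_gsm_block(string $fileData, string $gsmNumber): ?string
{
    // Quote once and reuse it; the second argument makes preg_quote() escape the "/" delimiter too.
    $quoted = preg_quote($gsmNumber, '/');

    $pattern = '/<!--GSM\sPER\sNUMBER\s-\s' . $quoted . '\s-\sSTART-->(.*)'
             . '<!--GSM\sPER\sNUMBER\s-\s' . $quoted . '\s-\sEND-->/s';

    // Group 1 holds everything between the START and END markers.
    return preg_match($pattern, $fileData, $matches) === 1 ? $matches[1] : null;
}

// Assumed sample input; in the question this comes from an HTML file.
$html = "<!--GSM PER NUMBER - 5550101 - START--><b>payload</b><!--GSM PER NUMBER - 5550101 - END-->";
echo extract_gsm_block($html, '5550101'), "\n"; // prints <b>payload</b>
```

With quoting in place, a value such as "[555]-0101" is matched literally instead of being compiled as a character class followed by text.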

This error is caused by an incorrect range, for example 9-0 or a-Z.
To correct this, change 9-0 to 0-9 and a-Z to a-zA-Z.
In your case you are not escaping the "-" character, so preg_match tries to parse it as a range and fails.
Escaping the "-" should solve your problem.
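A quick way to see which of these ranges PCRE accepts is to compile each one and check for a false return. This is only a sketch; a range is valid when its start code point is not greater than its end code point.

```php
<?php
// Each range must run from the lower code point to the higher one.
$patterns = [
    '/[9-0]/',     // "9" (0x39) > "0" (0x30): out of order
    '/[0-9]/',
    '/[a-Z]/',     // "a" (0x61) > "Z" (0x5A): out of order
    '/[a-zA-Z]/',
];

foreach ($patterns as $pattern) {
    // preg_match() returns false when the pattern fails to compile; @ hides the warning.
    $compiles = @preg_match($pattern, '') !== false;
    echo $pattern, ' ', $compiles ? 'compiles' : 'fails', "\n";
}
```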

I was receiving this error with the following sequence:
[/-.]
Simply moving the . to the beginning fixed the problem, since the - is then the last character in the class and is taken literally:
[./-]
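A small sketch of both classes. The answer doesn't show its delimiter, so "#" is assumed here so that the "/" inside the class needs no escaping; the subject string is made up.

```php
<?php
// "#" is an assumed delimiter, chosen so that "/" inside the class needs no escaping.
$broken = '#^[0-9/-.]+$#';   // "/-." is a range from "/" (0x2F) down to "." (0x2E)
$fixed  = '#^[0-9./-]+$#';   // "-" is last, so it is a literal hyphen

var_dump(@preg_match($broken, '2010/02-14.5')); // bool(false): the pattern does not compile
var_dump(preg_match($fixed, '2010/02-14.5'));   // int(1)
```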

While the other answers are correct, I'm surprised that no one has suggested escaping the variable with preg_quote() before using it in a regex. That way, if the value contains an actual bracket or anything else that has a special meaning in a regex, it is converted to a literal token. Passing the delimiter as the second argument also escapes any "/" in the value:
$escaped = preg_quote($gsmNumber, '/');
preg_match( '/<!--GSM\sPER\sNUMBER\s-\s'.$escaped.'\s-\sSTART-->(.*)<!--GSM\sPER\sNUMBER\s-\s'.$escaped.'\s-\sEND-->/s', $fileData, $matches);

Your users probably enter mobile numbers containing +, -, ( or ) characters, which you then use as-is in your preg_match, so you might want to sanitize the data before using it (for example, by stripping these characters out completely).
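One way to do the stripping, sketched under the assumption that only the digits of the number matter. The helper name normalize_gsm_number is hypothetical.

```php
<?php
// Hypothetical helper: keep only the digits, e.g. "+1 (555) 010-1" becomes "15550101".
function normalize_gsm_number(string $raw): string
{
    // \D matches any non-digit; preg_replace() returns null only on error.
    return preg_replace('/\D+/', '', $raw) ?? '';
}

$gsmNumber = normalize_gsm_number('+1 (555) 010-1');

// Nothing left means the input contained no digits at all.
if ($gsmNumber === '') {
    throw new InvalidArgumentException('Not a phone number');
}
echo $gsmNumber, "\n"; // prints 15550101
```

Because the result contains only digits, it can be placed into the pattern without any regex metacharacters sneaking in.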

This is a bug in several versions of PHP, as I have just verified for the current 5.3.5 version, as packaged with XAMPP 1.7.4 on Windows XP home edition.
Even some very simple examples exhibit the problem, e.g.,
$pattern = '/^[\w_-. ]+$/';
$uid = 'guest';
if (preg_match($pattern, $uid)) echo ("<style> p { text-decoration:line-through } </style>");
The PHP folks have known about the bug since 1/10/2010.
See http://pear.php.net/bugs/bug.php?id=18182.
The bug is marked "closed" yet persists.
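Whatever the history of that bug report, it may help to check what PCRE sees in this particular class: the "-" sits between "_" and ".", so "_-." is read as a range, and comparing the two code points shows it runs backwards. A short sketch:

```php
<?php
// In [\w_-. ] the "-" sits between "_" and ".", so PCRE reads "_-." as a range.
printf("_ = %d, . = %d\n", ord('_'), ord('.')); // prints "_ = 95, . = 46", so the range runs backwards

$uid = 'guest';
var_dump(@preg_match('/^[\w_-. ]+$/', $uid));   // bool(false): compilation fails
var_dump(preg_match('/^[\w_\-. ]+$/', $uid));   // int(1): "\-" is a literal hyphen
var_dump(preg_match('/^[\w. -]+$/', $uid));     // int(1): "-" last, and "_" is already in \w
```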

Related

Symfony2 url validation : "preg_match(): Compilation failed: range out of order in character class" [duplicate]

Unexpected ] error in simple preg replace script [duplicate]

This question already has answers here:
preg_match() Unknown modifier '[' help
(2 answers)
Closed 8 years ago.
I have a script that downloads the latest newsletter from a group inbox on a spare touchscreen in our office. It works fine, but people keep accidentally unsubscribing us so I want to hide the unsubscribe link from the email.
preg_replace() seems like it would work because I can set up a pattern that simply removes any link with the word "unsubscribe" in it. I validated the pattern below using the tool at http://regex101.com/, and it even picks up variations like "manage subscription" as well. It is OK if the odd legitimate link with the word "subscribe" also gets removed; there won't be many and it's only for internal use.
However, when I execute I get an error.
Here's my code:
line 53: $pat='<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>';
line 54: $themail[bodycontent]= preg_replace($pat, ' ',$themail[bodycontent]);
and I get this error:
preg_replace() [function.preg-replace]: Unknown modifier ']' in /home/trev/public_html/bigscreen/screen-functions.php on line 54
It must be something really simple like an unescaped char but I have gone code blind and can't for the life of me see it.
How do I get this pattern:
<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>
to run in a simple php script?
Thanks

You haven't used any delimiters, so PHP is treating the < character as the delimiter.
Try something like this instead:
$pat='#<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>#';

You have no delimiter. Or rather you do, but it's not the one you meant. PCRE is interpreting your first < as the opening delimiter (you can use matching brackets as delimiters - in fact, I use parentheses to help remind myself that the entire match is index 0). Then it sees the first > as the ending delimiter. Anything after that should be a modifier, but of course ] is not a modifier.
Wrap your regex with (...) to give it a proper set of delimiters.

$themail[bodycontent] should be either $themail['bodycontent'] or $themail[$bodycontent].
It's trying to parse bodycontent] ... as the array index.

Patterns used in preg_match need to be enclosed by a pair of delimiter characters.
For example, a / or a ~ at the start and end of the string.
Anything outside of these delimiters at the end of the string is considered to be a regex "modifier".
Your example doesn't have delimiters, so PHP wrongly assumes that the < character is the delimiter. Because < is a bracket-style delimiter, it then treats the next > character (here, the one inside [^>]) as the closing delimiter, and anything after that as a modifier. Obviously all that stuff is supposed to be inside the pattern and isn't valid as modifiers, which is why PHP complains about the ].
Solution: add a pair of delimiter characters:
$pat='~<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>~';
(one ~ is added at the start of the pattern and one at the end)
It doesn't have to be ~; you can choose your own delimiter character to suit your needs. The best one to use is one that doesn't occur in your pattern (although you can escape it if it does).
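A sketch of the delimited pattern applied with preg_replace(). The e-mail body below is an assumed sample standing in for $themail['bodycontent'].

```php
<?php
// The pattern from the answer, wrapped in "~" delimiters.
$pat = '~<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>~';

// Assumed sample body; in the question this is $themail['bodycontent'].
$body = '<p>News</p><a href="http://example.com/u">Unsubscribe</a> <a href="http://example.com/n">Read more</a>';

// The unsubscribe link is replaced by a single space; the other link is untouched.
echo preg_replace($pat, ' ', $body), "\n";
```

An i modifier after the closing ~ would make the (a|A) alternations unnecessary, but the sketch keeps the asker's pattern unchanged.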

Start and end the pattern with a slash /:
$pat='/<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>/';

Trying To Find Forward Slash In Preg_Match

I've been searching for hours trying to find a solution to this. I am trying to determine if the REQUEST URI is legit and break it down from there.
$samplerequesturi = "/variable/12345678910";
To determine if it is legit, the first section variable is only letters and is variable in length. The second section is numbers, which should have 11 total. My problem is escaping the forward slash so it is matched in the uri. I've tried:
preg_match("/^[\/]{1}[a-z][\/]{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^[\\/]{1}[a-z][\\/]{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^#/#{1}[a-z]#/#{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^|/|{1}[a-z]|/|{1}[0-9]{11}+$/", $samplerequesturi)
Among others which I can't remember now.
The request usually errors out:
preg_match(): Unknown modifier '|'
preg_match(): Unknown modifier '#'
preg_match(): Unknown modifier '['
Edit:
I guess I should state that the REQUEST URI is already known. I'm trying to validate the whole string to make sure it isn't a bogus string, i.e. to make sure the first set is only lowercase letters and the second set is exactly 11 digits.

/ is not the only thing you can use as a delimiter. In fact, you can use almost any non-alphanumeric, non-backslash, non-whitespace character. Personally I like to use () because it reminds me that the first item of the result array is the entire match, and it never needs escaping in the pattern as long as the parentheses inside it are balanced.
preg_match("(^/([a-z]+)/(\d+)$)i",$samplerequesturi,$out);
var_dump($out);
That should do it.

If you want to use a regex (which I don't think is necessary in this case; simply splitting on "/" should be fine):
$samplerequesturi = "/variable/12345678910";
preg_match("#^/([A-Za-z]+)/(\d+)$#", $samplerequesturi, $out);
echo $out[1];
echo $out[2];
should get you going
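The "just split on /" alternative mentioned above can be sketched without any regex. The helper name split_request_uri is hypothetical; the rules (lowercase letters, then exactly 11 digits) come from the question.

```php
<?php
// Hypothetical helper sketching the "split on /" idea.
function split_request_uri(string $uri): ?array
{
    $parts = explode('/', $uri); // "/variable/12345678910" -> ["", "variable", "12345678910"]
    if (count($parts) !== 3 || $parts[0] !== '') {
        return null;
    }
    [, $name, $number] = $parts;

    // strspn() counts the leading characters that belong to the given set.
    $isLowercase    = $name !== '' && strspn($name, 'abcdefghijklmnopqrstuvwxyz') === strlen($name);
    $isElevenDigits = strlen($number) === 11 && strspn($number, '0123456789') === 11;

    return ($isLowercase && $isElevenDigits) ? [$name, $number] : null;
}

print_r(split_request_uri('/variable/12345678910'));
```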

Your problem may be that you are using the / forward-slash as a regex delimiter (at the start and end of the regex expression). Switch to using a character other than the forward-slash, such as a # hash symbol or any other symbol which will never need to appear in this particular expression. Then you won't need to escape the forward-slash character at all in the expression.
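A sketch of that advice applied to the question's rules, using "~" as the delimiter so the slashes need no escaping. The helper name parse_request_uri is hypothetical.

```php
<?php
// Hypothetical helper: "~" as delimiter, so the "/" characters need no escaping.
function parse_request_uri(string $uri): ?array
{
    // [a-z]+ : one or more lowercase letters (no i modifier, so uppercase is rejected)
    // \d{11} : exactly eleven digits
    // \z     : the very end of the string ($ would also accept a trailing newline)
    if (preg_match('~^/([a-z]+)/(\d{11})\z~', $uri, $m) === 1) {
        return [$m[1], $m[2]];
    }
    return null;
}

var_dump(parse_request_uri('/variable/12345678910'));
```

Compared with the attempts in the question, [a-z]+ allows a variable-length first section, and {11} without a trailing + is a plain "exactly eleven" quantifier.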

regular expression to validate URL not working correctly in PHP

I am using a regular expression to validate URLs. This expression works very well in JavaScript, but in PHP it gives me this error:
A PHP Error was encountered
Severity: Warning
Message: preg_match() [function.preg-match]: Unknown modifier '('
Filename: home/auth.php
Line Number: 1596
A PHP Error was encountered
Severity: Warning
Message: preg_match() [function.preg-match]: Unknown modifier '('
Filename: home/auth.php
Line Number: 1601
This is my expression
$pattern ="/^(http|https|ftp)\:\/\/www\.([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*#)*(\.){1}((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$/";
This is the php function
public function valid_url($data)
{
$data = trim($data);
if(!$data)
{
return TRUE;
}
$pattern ="/^(http|https|ftp)\:\/\/www\.([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*#)*(\.){1}((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$/";
$valid = preg_match($pattern,$data);
if(!$valid)
{
$data = "http://".$data;
$valid = preg_match($pattern,$data);
}
if(!$valid)
{
$this->form_validation->set_message('valid_url', 'Please enter a valid URL.');
return FALSE;
}
else
{
return TRUE;
}
}
I am not very good at regular expressions so I could not figure out the issue, please help me correct the regular expression.

Wow, that is a big expression. I found several faults in it, and I shall hopefully explain them to you. Let's break it apart:
$pattern ="/
Here was your first mistake. As a forward slash is used in multiple sections of a URL, you should use a different delimiter. I would suggest a tilde ~, as this is not used in a URL very often. This means you don't have to keep escaping the forward slash everywhere with \/.
^(http|https|ftp)\:\/\/www\.([a-zA-Z0-9\.\-]+
This character class contains the next error. Within a character class, a dot just means a dot, so there is no need to escape it. Furthermore, by placing the dash at the end, it also does not need escaping, as it cannot possibly mean a range. The character class can be shortened to [a-zA-Z0-9.-]+.
(\:[a-zA-Z0-9\.&%\$\-]+
Here we have the next possible error, the & within the character class. If it was written as the HTML entity &amp;, the class will match an &, an a, an m, a p or a ;, not just an &. You don't need to convert it to the HTML code, as doing so means matching any of the characters that the code contains. And using the previous knowledge, you don't need to escape the dot, or the dash if it is at the end. You also don't need to escape the dollar sign, as in a character class it just means a dollar. Remember, within a character class all metacharacters are just standard characters except the caret ^ (when it comes first), the backslash \, the closing square bracket ], the dash - (which can be left unescaped if it's at the end), and whatever you choose as your delimiter, e.g. tilde ~. This character class can then become [a-zA-Z0-9.&%$-]+.
)*#)*(\.){1}
Part of this might be an error, or it might not. Basically, is there any need to capture the dot here? If not, leave out the parentheses. However, the repetition is definitely redundant: {1} is completely superfluous, since every token is matched exactly once unless you say otherwise. It just makes the code messy. The above can be shortened into )*#)*\..
((25[0-5]|2[0-4][0-9]|[0-1]{1}
Again, the {1} is not needed. Remove it, ((25[0-5]|2[0-4][0-9]|[0-1].
[0-9]{2}|[1-9]{1}[0-9]{1}
And again twice, this becomes [0-9]{2}|[1-9][0-9].
You keep doing this; the next block of code you have can be shortened from:
|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])
Into
|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])
It's not amazingly better, but every little helps. Next:
|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+
The two character classes can be optimized, |([a-zA-Z0-9-]+\.)*[a-zA-Z0-9-]+.
\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2})
This is very restrictive, but I assume you have it like this for a reason so I'll leave it.
)(\:[0-9]+)*(/
And here is the cause of your error. You did not escape the forward slash. However, I am going to leave it as using a different delimiter would avoid this and also tidy up your pattern.
($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$/";
That character class can be greatly shortened now knowing that we don't need to escape everything within them. It can become, ($|[a-zA-Z0-9.,?'\\+&%$#=~_-]+))*$/";.
Using everything we now know your pattern can be made much prettier and easier to handle.
It can become instead:
$pattern = "~^(http|https|ftp)://www\.([a-zA-Z0-9.-]+(:[a-zA-Z0-9.&%$-]+)*#)*((25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])|([a-zA-Z0-9-]+\.)+(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(:[0-9]+)*(/($|[a-zA-Z0-9.,?'\\+&%$#=\~_-]+))*$~";
Now that you have a smaller expression, finding faults and more customization should be a little easier.
Just a quick note
I keep noticing that you have used the following syntax at the beginning of some groupings, (\:. I have removed the backslash as it is not needed for a colon. However, were you trying to make it so the group was not captured? If so, the syntax for that is, (?:.
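To illustrate the difference, here is a small sketch with an assumed host name: the same subject matched with a capturing group and with a non-capturing (?:...) group.

```php
<?php
// Same match, with a capturing group and with a non-capturing (?:...) group.
preg_match('~^www\.example\.com(:\d+)?$~', 'www.example.com:8080', $capturing);
preg_match('~^www\.example\.com(?::\d+)?$~', 'www.example.com:8080', $nonCapturing);

// $capturing holds the whole match plus ":8080" as group 1;
// $nonCapturing holds only the whole match.
print_r($capturing);
print_r($nonCapturing);
```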
Edit: You can also shorten the pattern further by using the shorthand character classes:
\d = [0-9]
\w = [a-zA-Z0-9_]
Adding the i modifier after the closing delimiter turns on case insensitivity too, which means that instead of writing [a-zA-Z] you can just write [a-z].
Also, http|https can just become https?.
So your pattern could be shortened further too:
$pattern = "~^(https?|ftp)://www\.([a-z\d.-]+(:[a-z\d.&%$-]+)*#)*((25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]\d|[1-9]|0)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]\d|[1-9]|0)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]\d|\d)|([a-z\d-]+\.)+(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-z]{2}))(:\d+)*(/($|[\w.,?'\\+&%$#=\~-]+))*$~i";

I see one error:
[0-9]+)*(/($
to
[0-9]+)*(\/($
or to
[0-9]+)*(($
if the / is supposed to be an ender, which it's not supposed to be.
But seriously, is there no other way you can achieve this? This string is really hard to troubleshoot.

Why not use the standard PHP function filter_var()?
http://lv.php.net/manual/ru/function.filter-var.php
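A minimal sketch of valid_url()'s logic rebuilt around filter_var() with the FILTER_VALIDATE_URL filter. The name is_valid_url is hypothetical and the CodeIgniter set_message() call is left out. FILTER_VALIDATE_URL only handles ASCII URLs and does not insist on the www. prefix that the original regex requires, so the set of accepted URLs differs.

```php
<?php
// Hypothetical replacement for valid_url(); returns true for an acceptable URL.
function is_valid_url(string $data): bool
{
    $data = trim($data);
    if ($data === '') {
        return true; // valid_url() above also accepts an empty field
    }
    if (filter_var($data, FILTER_VALIDATE_URL) !== false) {
        return true;
    }
    // Like valid_url(), retry once with "http://" prepended (e.g. for "example.com").
    return filter_var('http://' . $data, FILTER_VALIDATE_URL) !== false;
}

var_dump(is_valid_url('example.com')); // bool(true)
```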

Weird error using preg_match and unicode

if (preg_match('(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)', '2010/02/14/this-is-something'))
{
// do stuff
}
The above code works. However this one doesn't.
if (preg_match('/\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+/u', '2010/02/14/this-is-something'))
{
// do stuff
}
Maybe someone could shed some light as to why the one below doesn't work. This is the error that is being produced:
A PHP Error was encountered
Severity: Warning
Message: preg_match() [function.preg-match]: Unknown modifier '\'

Try this (delimit the regex with # instead of /):
if (preg_match('#\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+#', '2010/02/14/this-is-something'))
{
// do stuff
}
Edited

The modifier u is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32.
Also, as nvl observed, you are using / as the delimiter and you are not escaping the / characters in the regex. So you'll have to use:
/\p{Nd}{4}\/\p{Nd}{2}\/\p{Nd}{2}\/\p{L}+/u
To avoid this escaping you can use a different set of delimiters like:
#\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+#
As a tip, if your delimiter is present in your regex, it's better to choose a different delimiter that is not found in the regex. This keeps the regex clean and short.

In the second regex you're using / as the regex delimiter, but you're also using it in the regex. The compiler is trying to interpret this part as a complete regex:
/\p{Nd}{4}/
It thinks the next character after the second / should be a modifier like 'u' or 'm', but it sees a backslash instead, so it emits that cryptic warning.
In the first regex you're using parentheses as regex delimiters; if you wanted to add the u modifier, you would put it after the closing paren:
'(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)u'
Although it's legal to use parentheses or other bracketing characters ({}, [], <>) as regex delimiters, it's not a good idea IMO. Most people prefer to use one of the less common punctuation characters. For example:
'~\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+~u'
'%\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+%u'
Of course, you could also escape the slashes in the regex with backslashes, but why bother?
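A sketch comparing the three options from this thread on the question's subject string: "/" as delimiter with unescaped slashes, escaped slashes, and a "~" delimiter.

```php
<?php
$subject = '2010/02/14/this-is-something';

// "/" as delimiter: the second "/" ends the pattern and "\" is read as a modifier.
var_dump(@preg_match('/\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+/u', $subject)); // bool(false)

// Escaping the inner slashes works...
var_dump(preg_match('/\p{Nd}{4}\/\p{Nd}{2}\/\p{Nd}{2}\/\p{L}+/u', $subject)); // int(1)

// ...but a different delimiter keeps the pattern readable.
preg_match('~\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+~u', $subject, $m);
echo $m[0], "\n"; // prints 2010/02/14/this
```

The match stops at "this" because \p{L}+ covers letters only and the "-" that follows is not a letter.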

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
