I have a regular expression that is used to redefine a constant in a PHP file using preg_match; the input file is screened using htmlspecialchars.
E.g. for
define('MEMBERSHIP', 'GOLD');
the following regex works:
/define.*["e\']' . $constant . '["e;\'].*;/i
However, it matches up to the last semicolon. That works in most scenarios but fails in a case like the following:
define("MEMBERSHIP", 'GOLD'); // membership subscription; empty means not in use.
Notice the last semicolon (the one inside the comment), which results in the replaced code being
define("MEMBERSHIP", 'SILVER'); empty means not in use.
which breaks the code. I tried the regex below, but it didn't work for lines that use double quotes:
/define.*["e;\']' . $constant . '["e;\'][^;]*;/i
Any idea how to fix this?
If you add a ? after the *, it becomes non-greedy (lazy) and takes the smallest possible number of characters. So try
/define.*?["e\']' . $constant . '["e;\'].*?;/i
to see if it does what you want.
Generally speaking, you should avoid using . (dot) if you don't actually mean any character.
You can match corresponding quotes by using a backreference:
'/define[^"\']*(["\'])' . $constant . '\1[^;]*;/i'
Otherwise, the negated character class you have at the end is definitely the way to go.
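As a rough sketch of how the backreference pattern above might be applied: this assumes the file text contains real quote characters (not htmlspecialchars-encoded ones) and that the old value contains no semicolon. The helper name redefineConstant() and the sample line are made up for illustration.

```php
<?php
// Hypothetical helper: rewrite the define() for $constant with a new value.
function redefineConstant(string $source, string $constant, string $newValue): string
{
    $name = preg_quote($constant, '/');
    // (["\']) captures the opening quote and \1 requires the same closing quote;
    // [^;]*; then stops at the FIRST semicolon, so trailing comments are untouched.
    $pattern = '/define[^"\']*(["\'])' . $name . '\1[^;]*;/i';
    $replacement = "define('" . $constant . "', " . var_export($newValue, true) . ');';
    // A callback is used so that "$1" or "\1" inside the new value is never
    // interpreted as a replacement reference.
    return preg_replace_callback($pattern, function () use ($replacement) {
        return $replacement;
    }, $source);
}

$line = 'define("MEMBERSHIP", \'GOLD\'); // membership subscription; empty means not in use.';
echo redefineConstant($line, 'MEMBERSHIP', 'SILVER'), "\n";
// define('MEMBERSHIP', 'SILVER'); // membership subscription; empty means not in use.
```

Running the constant name through preg_quote() keeps names that contain regex metacharacters from changing the pattern.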
Related
So I'm trying to check for a match and, if there is one, extract a variable name out of a string. The variable name should be preceded by "$" and cannot be escaped with "\", so for example "$name" should extract "name", while "\$name" or "name" shouldn't match. Here's the command:
$match = preg_match("/^(?<!\\)(\$.*)$/", $potential, $name);
I constructed and tested it using regex101.com and it works there, however, I'm getting an error from PHP saying
"preg_match(): Compilation failed: missing ) at offset 13 in ..."
and I have no clue what it's referring to.
My thought is that you will need to escape certain characters to use the regular expression in PHP:
$match = preg_match('/^(?<!\\\\)(\$.*)$/', $potential, $name);
Edit: the backslash is the escape character in both regex and PHP, so you need to double-escape the backslashes.
You've escaped a bracket:
preg_match('/^(?<!\\) <----HERE
FYI, you can use several other delimiters to make your regexes more readable. Because we so often have slashes and escaped characters, using '/' makes the pattern hard to read. Consider using '#' or '~' to increase readability.
Also, re: your online regex tool of choice, how accurate your results are depends on which regular expression implementation (and version) the service uses. I always use rubular.com (note that it uses Ruby's regex engine rather than PCRE), but for PHP you can use phpliveregex.com
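To see why the double-quoted version fails, it can help to print the string that PCRE actually receives. This is only an illustrative sketch; the variable names are made up, and '~' is picked as one possible alternative delimiter.

```php
<?php
// What PCRE receives after PHP has processed each string literal.
$broken = "/^(?<!\\)(\$.*)$/";    // double quotes: \\ becomes \ and \$ becomes $
$fixed  = '/^(?<!\\\\)(\$.*)$/';  // single quotes: \\\\ becomes \\ and \$ stays \$

echo $broken, "\n"; // /^(?<!\)($.*)$/   "\)" is an escaped paren, so the group never closes
echo $fixed, "\n";  // /^(?<!\\)(\$.*)$/

// The same fixed pattern with "~" as the delimiter.
preg_match('~^(?<!\\\\)(\$.*)$~', '$name', $m);
echo $m[1], "\n";   // $name (the capture group includes the dollar sign)
```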
I'm getting this odd error in the preg_match() function:
Warning: preg_match(): Compilation failed: range out of order in character class at offset 54
The line which is causing this is:
preg_match("/<!--GSM\sPER\sNUMBER\s-\s$gsmNumber\s-\sSTART-->(.*)<!--GSM\sPER\sNUMBER\s-\s$gsmNumber\s-\sEND-->/s", $fileData, $matches);
What this regular expression does is parse an HTML file, extracting only the part between:
<!--GSM PER NUMBER - 5550101 - START-->
and:
<!--GSM PER NUMBER - 5550101 - END-->
Do you have a hint about what could be causing this error?
Hi I got the same error and solved it:
Warning: preg_match(): Compilation failed: range out of order in character class at offset <N>
Research Phase:
.. Range out of order .. So there is a range defined which can't be used.
.. at offset N .. I had a quick look at my regex pattern. Position N was the "-". It's used to define ranges like "a-z" or "0-9" etc.
Solution
I simply escaped the "-".
\-
Now it is interpreted as the character "-" and not as range!
If $gsmNumber contains a square bracket, backslash or various other special characters it might trigger this error. If that's possible, you might want to validate that to make sure it actually is a number before this point.
Edit 2016:
There exists a PHP function that can escape special characters inside regular expressions: preg_quote().
Use it like this:
preg_match(
'/<!--GSM\sPER\sNUMBER\s-\s' .
preg_quote($gsmNumber, '/') . '\s-\sSTART-->(.*)<!--GSM\sPER\sNUMBER\s-\s' .
preg_quote($gsmNumber, '/') . '\s-\sEND-->/s', $fileData, $matches);
Obviously in this case because you've used the same string twice you could assign the quoted version to a variable first and re-use that.
This error is caused by an incorrect range, for example 9-0 or a-Z.
To correct this, change 9-0 to 0-9 and a-Z to a-zA-Z.
In your case the "-" character is not escaped, so preg_match tries to parse it as part of a range and fails.
Escaping the "-" should solve your problem.
I was receiving this error with the following sequence:
[/-.]
Simply moving the . to the beginning fixed the problem:
[./-]
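The reason the reordering helps is that [/-.] is read as a range from "/" (0x2F) to "." (0x2E), which runs backwards. A small sketch of the two usual ways to get a literal hyphen, with made-up variable names and '~' as the delimiter so the slash needs no escaping:

```php
<?php
// A hyphen is taken literally when it is last (or first) in the class, or escaped.
$hyphenLast    = '~^[./-]+$~';
$hyphenEscaped = '~^[/\-.]+$~';

var_dump(preg_match($hyphenLast, './-'));    // int(1)
var_dump(preg_match($hyphenEscaped, './-')); // int(1)
var_dump(preg_match($hyphenLast, 'a-b'));    // int(0)
```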
While the other answers are correct, I'm surprised to see that no-one has suggested escaping the variable with preg_quote() before using it in a regex. So if you're looking to match an actual bracket or anything else that means something in regex, that'll be converted to a literal token:
$escaped = preg_quote($gsmNumber, '/'); // pass the delimiter so "/" is escaped too
preg_match( '/<!--GSM\sPER\sNUMBER\s-\s'.$escaped.'\s-\sSTART-->(.*)<!--GSM\sPER\sNUMBER\s-\s'.$escaped.'\s-\sEND-->/s', $fileData, $matches);
You probably have people enter mobile numbers that include +, -, ( and/or ) characters and then use these as-is in your preg_match, so you might want to sanitize the data before using it (i.e. by stripping these characters out completely).
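A sketch that combines both suggestions, stripping non-digits first and then quoting what is left. The input number and the file contents here are invented for the example:

```php
<?php
// Hypothetical user input and file contents, for illustration only.
$gsmNumber = '+1 (555) 0101';
$fileData  = "<!--GSM PER NUMBER - 15550101 - START-->\nrow 1\nrow 2\n"
           . "<!--GSM PER NUMBER - 15550101 - END-->";

// Keep digits only, then quote the result before building the pattern.
$digits = preg_replace('/\D+/', '', $gsmNumber);
$quoted = preg_quote($digits, '/');

$pattern = '/<!--GSM\sPER\sNUMBER\s-\s' . $quoted . '\s-\sSTART-->(.*)'
         . '<!--GSM\sPER\sNUMBER\s-\s' . $quoted . '\s-\sEND-->/s';

if (preg_match($pattern, $fileData, $matches)) {
    echo trim($matches[1]), "\n"; // prints "row 1" and "row 2" on separate lines
}
```

Once only digits remain, preg_quote() has nothing left to escape, but keeping the call means the pattern stays safe if the sanitizing rule is relaxed later.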
This is a bug in several versions of PHP, as I have just verified for the current 5.3.5 version, as packaged with XAMPP 1.7.4 on Windows XP home edition.
Even some very simple examples exhibit the problem, e.g.,
$pattern = '/^[\w_-. ]+$/';
$uid = 'guest';
if (preg_match($pattern, $uid)) echo
("<style> p { text-decoration:line-through } </style>");
The PHP folks have known about the bug since 1/10/2010.
See http://pear.php.net/bugs/bug.php?id=18182.
The bug is marked "closed" yet persists.
I'm using PHP. I'm trying to get a Regex pattern to match everything between value=" and " i.e. Line 1 Line 2,...,to Line 4.
value="Line 1
Line 2
Line 3
Line 4"
I've tried /.*?/ but it doesn't seem to work.
I'd appreciate some help.
Thanks.
P.S. I'd just like to add, in response to some comments, that all strings between the first " and last " are acceptable. I'm just trying to find a way to get everything between the very first " and very last " even when there is a " in between. I hope this makes sense. Thanks.
Assuming the desired character is "double quote":
$pat = '/\"([^\"]*?)\"/'; // text between quotes excluding quotes
$value='"Line 1 Line 2 Line 3 Line 4"';
preg_match($pat, $value, $matches);
echo $matches[1]; // $matches[0] is string with the outer quotes
If you just want the answer and don't need a specific regex, you can use this:
<?php
$str='value="Line 1
Line 2
Line 3
Line 4"';
$need=explode("\"",$str);
var_dump($need[1]);
?>
/.*?/ does not match newline characters, because . excludes them by default. If you want to match them too, you need to use a regular expression like /([^"]*)/.
I agree with Josh K that a regular expression is not required in this case (especially if you know there will be no quotes apart from the ones that delimit the string). You could adopt his solution as well.
If you must use regex:
if (preg_match('!"([^"]+)"!', $value, $m))
echo $m[1];
You need the s pattern modifier. Something like: /value="(.*)"/s
I'm not a regex guru, but why not just explode it?
// Say $var contains this value="..." string
$arr = explode('value="', $var); // explode() needs the string to split as its second argument
$mid = explode('"', $arr[1]);
$fin = $mid[0]; // Contains what you're looking for.
The specification isn't clear, but you can try something like this:
/value="[^"]*"/
Explanation:
First, value=" is matched literally
Then, match [^"]*, i.e. anything but ", possibly spanning multiple lines
Lastly, match " literally
This does not allow " to appear between the "real" quotes, not even if it's escaped by e.g. preceding with a backslash.
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
References
regular-expressions.info/Examples - Programming Language Constructs - Strings
Has variations on different string patterns (e.g. allowing escaped quotes)
Related questions
Difference between .*? and .* for regex
As much as is practical, negated character class is always a better option than .*?
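The two regex suggestions in this thread behave differently once a quote appears inside the value. A short sketch with made-up sample strings:

```php
<?php
$str = "value=\"Line 1\nLine 2\nLine 3\nLine 4\"";

// Negated class: newlines are allowed, and the match stops at the first closing quote.
preg_match('/value="([^"]*)"/', $str, $first);

// Greedy dot with the s modifier: runs to the LAST quote, so inner quotes are kept.
$withInner = "value=\"Line 1\nsaid \"hi\"\nLine 4\"";
preg_match('/value="(.*)"/s', $withInner, $last);

echo $first[1], "\n---\n", $last[1], "\n";
```

The greedy version is the one that fits the "very first quote to very last quote" requirement from the question's P.S.; the negated class is the safer choice when the value can never contain a quote.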
How can I match everything with a PHP regular expression? I tried: /[.\r\n]*/, but it isn't working. Any ideas? Thanks.
This is for a method I made for a PHP class to parse e-mails:
public function getHeader($headerName) {
    preg_match('/[\r\n]' . $headerName . '[:][ ](.+)[\r\n][^ \t]/Uis', "\n" . ltrim($this->originalMessage), $matches);
    return preg_replace('/[\r\n]*/', '', $matches[1]);
}
/.*/s (see perl's docs). The s option means (quoting from that URL):
Treat string as single line. (Make . match a newline)
I assume, based on your inclusion of \n and \r above, that you want to match across multiple lines. In this case, use:
/.*/s
(note the explicit /s modifier, that is, change . to match any character whatsoever, even a newline, which it normally would not match.)
See http://www.perl.com/doc/manual/html/pod/perlre.html
Why do you want to match everything? There's no point in using it as a condition because it's always true. If you want to capture the text you don't need a regex to do it because you just use the entire string. If you're trying to get around taint-checking, then shame on you (and ask a separate question about doing that right).
Note that we have a bit of the XY Problem here. You have some task X in mind, and think Y is part of the solution. You ask about Y but never tell us X. It's hard to answer your real question when we don't know what you are trying to do. :)
What about /.*/s?
In a character class ( the [] ), . just means period.
Does /[\.\r\n]+/ do what you want?
This kludge has also worked for me before:
my $abstract_text = /Abstract:([\s\S]+?)\nReferences/m;
It's useful if you want to capture patterns with arbitrary text included or intervening between multiple captures.
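For comparison in PHP, here is a small sketch of the three variants discussed above, run against an invented two-line string:

```php
<?php
$text = "first line\r\nsecond line";

// Inside [...] the dot is a literal period, so this class only matches ".", "\r" or "\n".
preg_match('/[.\r\n]*/', $text, $class);

// With the s modifier the dot also matches newlines, so .* spans the whole string.
preg_match('/.*/s', $text, $dotall);

// [\s\S] matches any character without needing a modifier.
preg_match('/[\s\S]*/', $text, $either);

var_dump($class[0] === '');     // bool(true): zero characters matched at offset 0
var_dump($dotall[0] === $text); // bool(true)
var_dump($either[0] === $text); // bool(true)
```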