So I'm trying to check for match and if match, extract a variable name out of a string. The variable name should be preceded by "$" and cannot be escaped with "\", so for example "$name" should extract "name" and "\$name" or "name" shouldn't match. Heres the command:
$match = preg_match("/^(?<!\\)(\$.*)$/", $potential, $name);
I constructed and tested it using regex101.com and it works there, however, I'm getting an error from PHP saying
"preg_match(): Compilation failed: missing ) at offset 13 in ..."
and I have no clue what its referring to.
My thought is that you will need to escape certain characters to consume the regular expression in PHP
$match = preg_match('/^(?<!\\\\)(\$.*)$/', $potential, $name);
Edit: the backslash is the escape character in both Regex and PHP, you will need to doubly escape the slashes.
You've escaped a bracket:
preg_match('/^(?<!\\) <----HERE
FYI you can use several other delimiters to make your regex's more readable. Because so often we have slashes and escaped chars, then using '/' makes it hard to read. Consider using '#' or '~' or even '#' to increase readability.
Also reL your online regex tool of choice, it depends on which regular expression implementation (and version) the service uses, as to how accurate your results. I always use rubular.com (Uses PCRE) but for PHP you can use phpliveregex.com
Related
I've been searching for hours trying to find a solution to this. I am trying to determing if the REQUEST URI is legit and break it down from there.
$samplerequesturi = "/variable/12345678910";
To determine if it is legit, the first section variable is only letters and is variable in length. The second section is numbers, which should have 11 total. My problem is escaping the forward slash so it is matched in the uri. I've tried:
preg_match("/^[\/]{1}[a-z][\/]{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^[\\/]{1}[a-z][\\/]{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^#/#{1}[a-z]#/#{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^|/|{1}[a-z]|/|{1}[0-9]{11}+$/", $samplerequesturi)
Among others which I can't remember now.
The request usually errors out:
preg_match(): Unknown modifier '|'
preg_match(): Unknown modifier '#'
preg_match(): Unknown modifier '['
Edit:
I guess I should state that the REQUEST URI is already known. I'm trying to prove the whole string to make sure it isn't a bogus string ie to make sure there the 1st set is only lower case letters, and the 2nd set is only 11 numbers.
/ is not the only thing you can use as a delimiter. In fact, you can use almost any non-slphanumeric character. Personally I like to use () because it reminds me that the first item of the result array is the entire match and it also never needs escaping in the pattern.
preg_match("(^/([a-z]+)/(\d+)$)i",$samplerequesturi,$out);
var_dump($out);
That should do it.
If you want to use regex (which I don't think is necessary in this case, simply splitting on "/" should be fine:
$samplerequesturi = "/variable/12345678910";
preg_match("#^/([A-Za-z]+)/(\d+)$#", $samplerequesturi, $out);
echo $out[1];
echo $out[2];
should get you going
Your problem may be that you are using the / forward-slash as a regex delimiter (at the start and end of the regex expression). Switch to using a character other than the forward-slash, such as a # hash symbol or any other symbol which will never need to appear in this particular expression. Then you won't need to escape the forward-slash character at all in the expression.
It seems to be that the HTML5 spec (and therefore ECMA262) allows <input type="text" pattern="[0-9]/[0-9]" /> to match the string '0/0' even though the forward slash is not escaped. Web applications like Drupal would like to provide server-side validation for browsers that don't support HTML5 with something like:
<?php
preg_match('/^(' . $pattern . ')$/', $value);
?>
Unfortunately the string '[0-9]/[0-9]' is not a valid PRCE regex. It appears that most if not all HTML5-capable browser support both pattern="[0-9]/[0-9]" and pattern="[0-9]\/[0-9]" which begs the question - what can we use as a delimiter to run this pattern against Perl-style regex?
We've filed a bug report against the W3C spec but are the browsers wrong here? Does the HTML5 spec need to be clarified? Is there a workaround we can use in PHP?
It is a valid regex if you use # instead of / for the delimiter. Example:
preg_match('#^('.$pattern.')$#', $value);
I recomend using "\xFF" byte as pattern delimiter, because it is not allowed in UTF-8 string, so we can be sure it will not occur in the pattern. And because preg_match does not understand UTF-8, it will cause no trouble.
Example: preg_match("\xFF$pattern\$\xFFADmsu", $subject);
Please note ADmsu modifiers and adding $. The u modifier requires valid UTF-8 bytes only in the pattern, but not in delimiters around.
One of the problems with PCRE is that almost any delimiter is legal for the start and end markers, depending on what makes the rest of the escaping easier. So #foo# is legal, /foo/ is legal, !foo! is legal (I think), etc. Undelimited regex, I'd say, are extremely dangerous for exactly that reason. That sounds like an HTML5 spec bug that it doesn't specify.
Maybe in PHP, scan the string and pick a delimiter from a whitelist that is not present in the string? (Eg, if there's no / use that, if there is use #, if that's there use %, etc.)
I think chr(0) would work just fine. Edit: no. But chr(1) does work.
Just enclose it in brackets or parentheses (yes, that's strange!):
<?php
preg_match('(^' . $pattern . '$)', $value);
?>
The manual states that you can use all corresponding pairs: http://php.net/manual/en/regexp.reference.delimiters.php
Not easy at first, but it clearly deals with ANY character you may use in between. For example '(^(foo|bar)$)' works as the final regular expression: ^(foo|bar)$, without any potentially risky escapes.
Given that a PHP application (Drupal in this case) is generating the input field, it seems like a workaround would be to do something along the lines of:
$pattern = '[0-9]/[0-9]';
...
$cleanPattern = preg_replace('/\//', '\\/', $pattern);
preg_match('/' . $cleanPattern . '/', $subject, $matches);
I couldn't think of a case where this wouldn't work, with / being used as a literal in the expression.
The HTML5 spec defers to ECMA262 for the legal pattern specification:
If specified, the attribute's value must match the JavaScript Pattern production. [ECMA262]
Since there is BNF defined in ECMA262, a full parser (instead of using PCRE) seems like the safest approach.
You can also use T-Regx and let it choose the delimiter accordingly:
<?php
pattern("^($pattern)$")->match($value);
it adds whatever delimiter was not used in the pattern.
Here is the input I am searching:
\u003cspan class=\"prs\">email_address#me.com\u003c\/span>
Trying to just return email_address#me.com.
My regex class=\\"prs\\">(.*?)\\ returns "class=\"prs\">email_address#me.com\" in RegExp which is OK, I can work with that result.
But I can't get it to work in PHP.
$regex = "/class=\\\"prs\\\">(.*?)\\/";
Gives me an error "No ending delimiter"
Can someone please help?
Your original code:
$regex = "/class=\\\"prs\\\">(.*?)\\/";
The reason you get No ending delimiter is that although you are escaping the backslash prior to the closing forward slash, what you have done is escaped it in the context of the PHP string, not in the context of the regex engine.
So the PHP string escaping mechanism does its thing, and by the time the regex engine gets it, it will look like this:
/class=\"prs\">(.*?)\/
This means that the regular expression engine will see the backslash at the end of the expression as escaping the forward slash that you are intending to use to close the expression.
The usual PHP solution to this kind of thing is to switch to using single-quoted string instead of a double-quoted one, but this still won't work, as \\ is an escaped backslash in both single and double quoted strings.
What you need to do is double up the number of backslash characters at the end of your string, so your code needs to look like this:
$regex = "/class=\\\"prs\\\">(.*?)\\\\/";
The way to prove what it's doing is to print the contents of the $regex variable, so you can see what the string will look like to the regex engine. These kinds of errors are actually very hard to spot, but looking at the actual content of the string will help you spot them.
Hope that helps.
If you change to single quotes it should fix it
$regex = '/class=\\\"prs\\\">(.*?)\\/';
i am looking for a regex that can contain special chracters like / \ . ' "
in short i would like a regex that can match the following:
may contain lowercase
may contain uppercase
may contain a number
may contain space
may contain / \ . ' "
i am making a php script to check if a certain string have the above or not, like a validation check.
The regular expression you are looking for is
^[a-z A-Z0-9\/\\.'"]+$
Remember if you are using PHP you need to use \ to escape the backslashes and the quotation mark you use to encapsulate the string.
In PHP using preg_match it should look like this:
preg_match("/^[a-z A-Z0-9\\/\\\\.'\"]+$/",$value);
This is a good place to find the regular expressions you might want to use.
http://regexpal.com/
You can always escape them by appending a \ in front of the special characters.
try this:
preg_match("/[A-Za-z0-9\/\\.'\"]/", ...)
NikoRoberts is 100% correct.
I would only add the following suggestion: When creating a PHP regex pattern string, always use: single-quotes. There are far fewer chars which need to be escaped (i.e. only the single quote and the backslash itself needs to be escaped (and the backslash only needs to be escaped if it appears at the end of the string)).
When dealing with backslash soup, it helps to print out the (interpreted) regex string. This shows you exactly what is being presented to the regex engine.
Also, a "number" might have an optional sign? Yes? Here is my solution (in the form of a tested script):
<?php // test.php 20110311_1400
$data_good = 'abcdefghijklmnopqrstuvwxyzABCDE'.
'FGHIJKLMNOPQRSTUVWXYZ0123456789+- /\\.\'"';
$data_bad = 'abcABC012~!###$%^&*()';
$re = '%^[a-zA-Z0-9+\- /\\\\.\'"]*$%';
echo($re ."\n");
if (preg_match($re, $data_good)) {
echo("CORRECT: Good data matches.\n");
} else {
echo("ERROR! Good data does NOT match.\n");
}
if (preg_match($re, $data_bad)) {
echo("ERROR! Bad data matches.\n");
} else {
echo("CORRECT: Bad data does NOT match.\n");
}
?>
The following regex will match a single character that fits the description you gave:
[a-zA-Z0-9\ \\\/\.\'\"]
If your point is to insure that ONLY characters in this range of characters are used in your string, then you can use the negation of this which would be:
[^a-zA-Z0-9\ \\\/\.\'\"]
In the second case, you could use your regex to find the bad stuff (that you don't want to be included), and if it didn't find anything then your string pattern must be kosher, because I'm assuming that if you find one character that is not in the proper range, then your string is not valid.
so to put it in PHP syntax:
$regex = "[^a-zA-Z0-9\ \\\/\.\'\"]"
if preg_match( $regex, ... ) {
// handle the bad stuff
}
Edit 1:
I've completely ignored the fact that backslashes are special in php double-quoted strings, so here is a correcting to the above code:
$regex = "[^a-zA-Z0-9\\ \\\\\\/\\.\\'\\\"]"
If that doesn't work it shouldn't take too much for someone to debug how many of the backslashes need to be escaped with a backslash, and what other characters need also to be escaped....
I get this warning from php after the change from split to preg_split for php 5.3 compatibility :
PHP Warning: preg_split(): Delimiter must not be alphanumeric or backslash
the php code is :
$statements = preg_split("\\s*;\\s*", $content);
How can I fix the regex to not use anymore \
Thanks!
The error is because you need a delimiter character around your regular expression.
$statements = preg_split("/\s*;\s*/", $content);
Although the question was tagged as answered two minutes after being asked, I'd like to add some information for the records.
Similar to the way strings are delimited by quotation marks, regular expressions in many languages, such as Perl or JavaScript, are delimited by forward slashes. This will lead to expressions looking like this:
/\s*;\s*/
This syntax also allows to specify modifiers:
/\s*;\s*/Ui
PHP's Perl-compatible regular expressions (aka preg_... functions) inherit this. However, PHP itself doesn't support this syntax so feeding preg_split() with /\s*;\s*/ would raise a parse error. Instead, you enclose it with quotes to build a regular string.
One more thing you must take into account is that PHP allows to change the delimiter. For instance, you can use this:
#\s*;\s*#Ui
What is it good for? It simplifies the use of forward slashes inside the expression since you don't need to escape them. Compare:
/^\/home\/.*$/i
#^/home/.*$#i
If you don't like delimiters, you can use T-Regx tool:
pattern("\\s*;\\s*")->split($content):
You can also use Pattern::of("\\s*;\\s*")->split()