PHP how to use a variable in match - php

I am using the code below to see if my password contains special characters, which works fine. But I would like to be able to use a variable like $mySpecialChar instead of the "[\'^£$%&*()}{##~?><>,|=_+¬-]" string, and I'm not sure if I can do that. The reason is that I want to be able to pull the string from a datatable.
I've tried preg_match_all("/".$mySpecialChar."/"), but no luck.
$matches = array();
if (preg_match_all("/[\'^£$%&*()}{##~?><>,|=_+¬-]/", $pwd, $matches) > 0) {
foreach ($matches[0] as $match) { $specialcase += strlen($match); }
}

Make sure to escape any variables you put in a regular expression
preg_match_all('/'.preg_quote($mySpecialChar, '/').'/', $pwd, $matches);
preg_quote
string preg_quote ( string $str [, string $delimiter = NULL ] )
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
Note that / is not a special regular expression character.
You have 5 or more special characters in there.
Note the last line of the quote above: / is not a special regular expression character. It isn't strictly necessary in your case (I don't see a / in your string), but if you pass the delimiter as the second argument, preg_quote will escape that too. That is exactly what I did above with preg_quote($mySpecialChar, '/').
If you don't quote these, it's anyone's guess what will happen: you could get an error, you could get an empty capture group (), you could match anything with the ., and so on. As you have it,
[\'^£$%&*()}{##~?><>,|=_+¬-]
This is a character class, so most of the characters inside it are treated as literals, assuming that's intentional. If you had [^\'£$%&*()}{##~?><>,|=_+¬-] you would have a negated character class.
Seeing as you are using preg_match_all, and not preg_match, I can probably assume you don't want the character set. Otherwise why use preg_match_all?
It should simply be, if you want to match everything in $mySpecialChar:
preg_match('/['.preg_quote($mySpecialChar, '/').']+/', $pwd, $matches);
If you are only trying to match the characters between the [....], I would still escape them, since escaping does no harm there. It does matter if the string you store in the database starts with ^, or if a - ends up between two characters (0-9, for example), because those change the meaning of the class. Escaping never hurts: just remove the [] when you save the string and add them back as I have above.
[ ... ]+ means one or more, [ ... ]* means zero or more, and [ ... ]+? means one or more, non-greedy. So you should be able to use just [...]+ with preg_match, which will give you a cleaner match than using [...] (match one) with preg_match_all.
Most of the time \W (uppercase) will also match most symbols. It is basically [^a-zA-Z0-9_], that is, anything that is not a-z, A-Z, 0-9 or _.
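As a rough sketch of the approach described above: the stored string holds only the characters, without the surrounding [ ], and the class is built around the quoted value. The contents of $mySpecialChar and $pwd here are made-up sample values, not taken from the question. The sample sticks to ASCII on purpose, since multibyte characters such as £ or ¬ would most likely also need the u modifier to be treated as single characters.

```php
<?php
// Hypothetical value, standing in for a string loaded from the database.
// Note: no surrounding [ ] in the stored string.
$mySpecialChar = '\'^$%&*()}{#~?><,|=_+-';
$pwd = 'pa$$w0rd-#1'; // sample password, assumed for illustration

// preg_quote() escapes regex metacharacters and, because of the second
// argument, the "/" delimiter as well. The [ ] are added afterwards.
$pattern = '/[' . preg_quote($mySpecialChar, '/') . ']/';

// preg_match_all() returns the number of matches, which here is the
// number of special characters found in the password.
$specialCount = preg_match_all($pattern, $pwd, $matches);

echo $specialCount, "\n";
```

Because the quoting happens before the brackets are added, a leading ^ or a stray - in the stored string stays a plain character instead of changing what the class means.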

You could always just look for characters that AREN'T the basic ones:
preg_match_all('/[^0-9A-Za-z]/', $pwd, $matches)
Much shorter and just as effective.
You can easily put this in a string if you like:
$specialChars = '[^0-9A-Za-z]';
preg_match_all("/{$specialChars}/", $pwd, $matches);
Running this on the provided password fills $matches[0] with all of the special characters from the string. To evaluate password complexity, all you need is the length of $pwd and the number of entries in $matches[0] (which is also the return value of preg_match_all), as this tells you the number of special characters.
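A minimal sketch of that counting step, using a made-up password since the question does not show one:

```php
<?php
$pwd = 'Tr0ub4dor&3!'; // sample password, assumed for illustration
$specialChars = '[^0-9A-Za-z]';

// The return value is the number of full matches; the matched
// characters themselves end up in $matches[0].
$specialCount = preg_match_all("/{$specialChars}/", $pwd, $matches);

echo strlen($pwd), " characters, ", $specialCount, " special\n";
```

Keep in mind that this negated class treats anything outside 0-9, A-Z and a-z as special, including spaces and underscores.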

Related

Why does PHP's preg_quote escape unnecessary characters?

From http://php.net/manual/en/function.preg-quote.php:
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
Note that / is not a special regular expression character.
} is unnecessary but I can understand why they'd include it for symmetry. E.g. the following code works:
$re = '/}{This is fine}{/';
preg_match($re, $re, $match);
var_dump($match);
The output is:
array(1) {
[0] =>
string(16) "}{This is fine}{"
}
Why do they include = ! < > :? As far as I can tell, they're only ever special after being introduced by another unescaped meta character, e.g. immediately after (?, both of which characters also get escaped. : can also be special inside character classes like so: [[:alpha:]], but all four brackets get escaped.
I think the idea behind it is to have consistent behaviour.
The goal of preg_quote is to produce a literal string for a regex pattern. This means that no character in the returned string can be interpreted as anything other than itself, whatever the context, and the context can be a concatenation with another part of the pattern.
If I write '/(?' . preg_quote('>') . 'abc)/', I expect that the > will not be interpreted as the > of an atomic group, and that the pattern returns an error.
If I write '/.{3' . preg_quote('}') . '/', I expect that the } will not be interpreted as the closing curly bracket of a quantifier, and that the pattern matches a string like 'a{3}', but not 'abc'.
You can easily build the same kind of examples for = ! < > : using lookahead assertions, named groups, non-capturing groups, or atomic groups.
The important thing is that the expected behaviour is always the same, whatever the way or the context in which the function is used.
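The quantifier example above can be checked directly. This is only a small demonstration of that one case; the variable names are mine.

```php
<?php
// preg_quote() turns "}" into "\}", so it can no longer close a quantifier.
$quoted = preg_quote('}');
$pattern = '/.{3' . $quoted . '/'; // the pattern string is /.{3\}/

// With the quoted brace, "{3}" is matched literally after any one character.
$matchesLiteral = preg_match($pattern, 'a{3}');
$matchesAbc = preg_match($pattern, 'abc');

// For comparison: unquoted, {3} is a quantifier and "abc" matches.
$matchesAsQuantifier = preg_match('/.{3}/', 'abc');

var_dump($quoted, $matchesLiteral, $matchesAbc, $matchesAsQuantifier);
```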
Well what happens if you're trying to write some code like this:
$lookahead = getUserInput(); // Not escaped
$results = preg_match('/abc(?' . $lookahead . ')/', $subject);
and the user gives the input !def? The answer is you get negative lookahead instead of regular lookahead. If you don't want to allow negative lookaheads, you're going to want to make sure that exclamation mark is escaped.

Finding match, removing the bits I don't want, and then putting it back in

I'm trying to parse through a file, find a particular match, filter it in some way, and then write that data back into the file with some of the characters removed. I've been trying different things for a couple of hours with preg_split and preg_replace, but my regular expression knowledge is limited, so I haven't made much progress.
I have a large file that has many instances like this: [something]{title:value}. I want to find everything between "[" and "}" and remove everything besides the "something" bit.
After that part is done, I want to find everything between "{" and "}" in whatever is left, like {title:value}, and then remove everything besides the "value" part. I'm sure there is some simple method to do this, so even just a resource on how to get started would be helpful.
Not sure if I get your meaning right (and haven't touched PHP for months), what about this?
$matches = array();
preg_match_all("/\[(.*?)\]\{.*?:(.*?)\}/", $str, $matches);
$something = $matches[1]; // $something stores all texts in the "something" part
$value = $matches[2]; // $value stores all texts in the "value" part
Doc for preg_match_all
For the regex pattern \[(.*?)\]\{.*?:(.*?)\}:
We escape all the [, ], { and } with a backslash because these characters have a special meaning in regex and need escaping to match the literal character.
.*? is a lazy match all, which will match any character until the next character matches the next token. It is used instead of .* so that it won't match other symbols
(.*?) is a capturing group, getting what we need and PHP will put those matches in $matches array
So the entire thing is: match the [ character, then any string up to the ] character and put it in capturing group 1, then the ]{ characters, then any string up to the : character (no capturing group because we don't care about it), then the : character, then any string up to the } character and put it in capturing group 2.
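Here is the same pattern run against a small sample, to show what ends up in each capture group. The input string is invented for illustration.

```php
<?php
// Sample input, assumed for illustration; the real data comes from a file.
$str = 'intro [first]{title:one} middle [second]{name:two}';

$matches = array();
$count = preg_match_all('/\[(.*?)\]\{.*?:(.*?)\}/', $str, $matches);

$something = $matches[1]; // text between [ and ]
$value = $matches[2];     // text between : and }

print_r($something);
print_r($value);
```

$matches[0] holds the full [something]{title:value} strings, so the three arrays line up by index.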
You can do it in one shot:
$txt = preg_replace('~\[\K[^]]*(?=])|{[^:}]+:\K[^}]+(?=})~', '', $txt);
\K removes from the match result everything that has been matched to its left.
The lookahead (?=...) (followed by) performs a check but adds nothing to the match result.

PCRE regex with lookahead and lookbehind always returns true

I’m trying to create a regex for form validation but it always returns true. The user must be able to add something like {user|2|S} as input but also use brackets if they are escaped with \.
This code checks for the left bracket { for now.
$regex = '/({(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)}))|[^{]|(?<=\\\){)*/';
if (preg_match($regex, $value)) {
return TRUE;
} else {
return FALSE;
}
A possible correct input would be:
Hello {user|1|S}, you have {amount|2|D2}
or
Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}
However, this should return false:
Hello {user|1|S}, you have {amount|2}
and this also:
Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}
A live example can be found here: http://regexr.com?37tpu Note that there is a \ in the lookbehind at the end, PHP was giving me error messages because I had to escape it an extra time in my code.
The main error is that you do not specify that the regex should match from the beginning to the end of the checked string. Use the ^ and $ assertions.
I think you have to escape { and } in your regex as they have special meaning. Together they form a quantifier.
The (?<=\\\) is better written (?<=\\\\). The backslash has to be double escaped as it has special meaning in both single-quoted string and PCRE regex. Using \\\ works too, because if single-quoted string contains any escape sequence except \\ and \', it handles it as literal backslash and letter, therefore \) is taken literally. But explicitly escaping the backslash twice seems easier to read to me.
The regex should be
$regex = '/^(\{(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\}))|[^{]|(?<=\\\\)\{)*$/';
But notice that the look-around assertions are not necessary. This regex should do the job too:
$regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';
Any non-{ characters are matched by the first alternative. When a { is read, one of the remaining two alternatives is used. Either the pattern for the brace thing matches, or the regex engine backtracks one character and tries to match \{ character sequence. If it fails, both ways, it backtracks further till it reaches string start and fails completely.
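To see how the lookaround-free pattern behaves, here is a sketch that runs it against the four sample inputs from the question. The function name isValidTemplate is made up for this example.

```php
<?php
// Hypothetical helper; the pattern is the lookaround-free one given above.
function isValidTemplate(string $value): bool
{
    $regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($regex, $value) === 1;
}

$samples = [
    'Hello {user|1|S}, you have {amount|2|D2}',
    'Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}',
    'Hello {user|1|S}, you have {amount|2}',
    'Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}',
];

foreach ($samples as $sample) {
    echo isValidTemplate($sample) ? 'valid:   ' : 'invalid: ', $sample, "\n";
}
```

Note that this pattern only checks the opening brace, as in the question, so a stray unescaped } is still accepted.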
Matching without lookbehind
You can make a regex for this without using lookbehind/lookaheads (which is usually recommended).
For example, if your requirement is that you can match any character but a { and a } unless it's preceded by a \. You can also say:
Match any character but a { and a } OR match a \{ or a \}. To match any character but a { and a } use:
[^{}]
To match a \{ use:
\\\{
One backslash is for escaping the { (which might not be necessary, depending on your regex compiler) and one backslash is for escaping the other backslash.
You would end up with this:
(?:
[^{}]
|
\\\{
|
\\\}
)+
I formatted this regex so that it's readable. If you want to use it in your code like this, make sure to use the PCRE_EXTENDED (x) modifier.
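A sketch of how the spaced-out version could look in PHP with the x modifier. The ^ and $ anchors are my addition so that the whole string is checked, and the backslashes are doubled once more because the pattern sits in a single-quoted PHP string.

```php
<?php
// x modifier: whitespace inside the pattern is ignored, so the
// alternatives can be spaced out for readability.
// In a single-quoted string, \\\\ is two backslashes for the regex engine,
// i.e. one literal backslash in the subject.
$pattern = '/^(?: [^{}] | \\\\\{ | \\\\\} )+$/x';

$plain = preg_match($pattern, 'plain text');
$escaped = preg_match($pattern, 'a \{b\}');
$unescaped = preg_match($pattern, 'a {b}');

var_dump($plain, $escaped, $unescaped);
```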
Looks more of a job for a lookbehind to me:
/((?<!\\\\)\{[a-zA-Z0-9]+\|[0-9]+\|[SD][0-9]*\})/
However, the obfuscation factor is so high that I would rather recognize all bracketed strings and parse them later.

Regex match fails when adding special characters

Consider this,
$uri = '/post/search/foo';
$pattern = '~/post/search/[A-Za-z0-9_-]+(?=/|$)~i';
$matches = array();
preg_match($pattern, $uri, $matches);
print_r($matches); // Success
It works fine, since [A-Za-z0-9_-] belongs to foo. Since I'm writing a route plugin,
I want this to be able to match special chars as well.
I imagine a regex pattern to be like this:
[A-Z0-9!##$%^&*()_+|\/?><~"№;:'*]+(?=/|$)
I've tried to escape each special character with a backslash, and to escape the whole pattern using preg_quote(), with no luck: I always encounter compilation failures.
The question is, how a proper matching for A-Z0-9!##$%^&*()_+|\/?><~"№;:'* should be done?
Is there a reason you don't want to just use an ungreedy .?
As in:
'~/post/search/.+(?=/|$)~iU'
Escaping inside the character class is not difficult. The only special characters there are ^ (only at the first position), - (not at the first or last position), \ and [ ]. You also need to escape ' as the string delimiter, and additionally the regex delimiter.
You use ~ as regex delimiter and I think this is the critical point in your character class, because the delimiter is not escaped by default when using preg_quote().
So this should be working
[A-Z0-9!##$%^&*()_+|\/?><\~"№;:\'*]+(?=/|$)
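Instead of escaping by hand, the delimiter can also be passed to preg_quote(). The following is a sketch under a few assumptions: the list of special characters is an ASCII-only sample of my own, the ranges are kept outside the quoted part so they stay ranges, and the test URI is invented.

```php
<?php
$ranges = 'A-Z0-9'; // kept unquoted so they remain ranges
$specials = '!#$%^&*()_+|/?><~";:\'-'; // sample list, assumed for illustration

// Passing '~' as the second argument makes preg_quote() escape the
// delimiter too, which it would not do by default.
$class = '[' . $ranges . preg_quote($specials, '~') . ']';
$pattern = '~/post/search/' . $class . '+(?=/|$)~i';

$uri = '/post/search/foo!~bar';
$matches = array();
$found = preg_match($pattern, $uri, $matches);

print_r($matches);
```

Since / is itself in the class, the + will run across further path segments, just as it would with the hand-written class above.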

Can you explain/simplify this regular expression (PCRE) in PHP?

preg_match('/.*MyString[ (\/]*([a-z0-9\.\-]*)/i', $contents, $matches);
I need to debug this one. I have a good idea of what it's doing but since I was never an expert at regular expressions I need your help.
Can you tell me what it does block by block (so I can learn)?
Can the syntax be simplified (I think there is no need to escape the dot with a backslash)?
The regexp...
'/.*MyString[ (\/]*([a-z0-9\.\-]*)/i'
.* matches any character zero or more times
MyString matches that string. But you are using case-insensitive matching, so the matched string will spell "mystring" with any capitalization.
EDIT: (Thanks to Alan Moore) [ (\/]*. This matches any of the characters space, ( or /, repeated zero or more times. As Alan points out, the escape before the final / stops the / from being treated as the regexp delimiter.
EDIT: The ( does not need escaping and neither does the . (thanks AlexV) because:
All non-alphanumeric characters other than \, -, ^ (at the start) and
the terminating ] are non-special in character classes, but it does no
harm if they are escaped.
-- http://www.php.net/manual/en/regexp.reference.character-classes.php
The hyphen generally does need to be escaped, otherwise it will try to define a range. For example:
[A-Z] // matches all upper case letters of the alphabet
[A\-Z] // matches 'A', '-', and 'Z'
However, where the hyphen is at the end of the list you can get away with not escaping it (but it's always best to be in the habit of escaping it... I got caught out by this).
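The difference is easy to check with a few throwaway tests. The variable names and sample strings below are mine.

```php
<?php
// Unescaped hyphen between two characters: a range from A to Z.
$rangeMatchesWord = preg_match('/^[A-Z]+$/', 'HELLO');

// Escaped hyphen: only the three characters 'A', '-' and 'Z'.
$escapedMatchesLiteral = preg_match('/^[A\-Z]+$/', 'A-Z');
$escapedMatchesB = preg_match('/^[A\-Z]+$/', 'B');

// Hyphen at the end of the class: literal even without a backslash.
$trailingMatches = preg_match('/^[AZ-]+$/', 'Z-A');

var_dump($rangeMatchesWord, $escapedMatchesLiteral, $escapedMatchesB, $trailingMatches);
```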
([a-z0-9\.\-]*) matches any string containing the characters a through z (note again that this is affected by the case-insensitive match), 0 through 9, a dot or a hyphen, repeated zero or more times. The surrounding parentheses () tell preg_match to "capture" this string, which means that $matches[1] will contain the string matched by [a-z0-9\.\-]*.
e.g.
<?php
$input = "aslghklfjMyString(james321-james.org)blahblahblah";
preg_match('/.*MyString[ (\/]*([a-z0-9.\-]*)/i', $input, $matches);
print_r($matches);
?>
outputs
Array
(
[0] => aslghklfjMyString(james321-james.org
[1] => james321-james.org
)
Note that because you use a case insensitive match...
$input = "aslghklfjmYsTrInG(james321898-james.org)blahblahblah";
Will also match and give the same answer in $matches[1]
Hope this helps....
Let's break this down step-by step, removing the explained parts from the expression.
"/.*MyString[ (\/]*([a-z0-9\.\-]*)/i"
Let's first strip the regex delimiters (/i at the end means it's case-insensitive):
".*MyString[ (\/]*([a-z0-9\.\-]*)"
Then we've got a greedy wildcard: .* matches any character any number of times, backtracking until the rest of the pattern can match.
"MyString[ (\/]*([a-z0-9\.\-]*)"
Then match 'MyString' literally, followed by any number (note the '*') of any of the following: ' ', '(', '/'. Inside a character class the '(' does not need to be escaped, although escaping it does no harm; the '/' has to stay escaped because it is the regex delimiter.
"([a-z0-9\.\-]*)"
Then we get a capture group for any number of any of the following: a-z literals, 0-9 digits, '.', or '-'.
That's pretty much all of it.
