Regex Character Meaning - php

I'm sure this has been posted before, but I am having trouble locating an answer.
preg_match("/^[a-zA-Z0-9 -\.]{1,25}+$/i", ...
The regular expression above allows for all alphabetic characters, all numeric characters, and the following (,-,.). It also limits whatever string we are checking against to max 25 characters total. What I cannot understand is the purpose of +$/i. I can find most of those characters in documentation but do not understand why they are needed. The only one I cannot find any information on is i.
Edit: I suppose the $ ties into our use of the ^ character?
Edit2: Thanks to comments below it seems the i makes the expression case insensitive. Still looking for information in regards to the other characters.

The /i flag at the end of the regex makes the preceding pattern case insensitive. So actually, you could have just used this:
preg_match("/^[a-z0-9 -\.]{1,25}+$/i", ...
That is, in /i mode, preg_match will match a-z for both lowercase and uppercase letters, so you need only specify one range.
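For the other characters in the question: $ anchors the match at the end of the string (pairing with ^ at the start), and a + placed directly after {1,25} makes that quantifier possessive, so it is not a second repetition. Below is a minimal sketch of the /i behaviour; the sample strings are made up for illustration. One caveat worth checking: inside the class, " -\." is read as a range from space to dot, so characters such as ! are accepted as well. The $strict pattern is only my assumption about what was intended; it puts the hyphen last so that it is taken literally.

```php
<?php
// Pattern from the answer: a single a-z range is enough because of /i.
$pattern = '/^[a-z0-9 -\.]{1,25}+$/i';

var_dump(preg_match($pattern, 'Hello World-1.0'));   // int(1): uppercase accepted via /i
var_dump(preg_match($pattern, str_repeat('a', 26))); // int(0): more than 25 characters
var_dump(preg_match($pattern, 'a!b'));               // int(1): "!" lies inside the range " -."

// Assumed intent: only letters, digits, space, dot and hyphen.
$strict = '/^[a-z0-9 .-]{1,25}$/i';
var_dump(preg_match($strict, 'a!b'));                // int(0)
```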

Related

Using Multiple Regular Expressions in PHP

I need to write a regular expression that will evaluate the following conditions:
2 consecutive lower case characters
at least 1 digit
at least 1 upper case character
2 consecutive identical punctuation characters
For example, the string 'aa1A!!' should match, as should '!!A1aa'.
I have written the following regular expression:
'/(?=([a-z]){2,})(?=[0-9])(?=[A-Z])(?=(\W)\1)/'
I have found each individual expression works, but I am struggling to put it all together. What am I missing?
First, your pattern must be anchored to be sure that lookaheads are only tested from the position at the start of string. Then, since your characters can be everywhere in the string, you need to start the subpatterns inside lookahead with .*.
\W is a character class for non-word characters: everything that is not [A-Za-z0-9_], which includes spaces, control characters, accented letters and so on. IMO, \pP or [[:punct:]] are more appropriate.
/^(?=.*[a-z]{2})(?=.*[0-9])(?=.*[A-Z])(?=.*(\pP)\1)/
About the idea to make 4 patterns instead of 1, it looks like a good idea, it tastes like a good idea, but it's useless and slower. However, it can be interesting if you want to know what particular rule fails.
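A rough sketch of both approaches, using the two strings from the question. The failedRules() helper and the rule labels are hypothetical names introduced here only to show how separate patterns can report which rule failed.

```php
<?php
// Combined pattern from the answer above.
$combined = '/^(?=.*[a-z]{2})(?=.*[0-9])(?=.*[A-Z])(?=.*(\pP)\1)/';

// The same rules as separate patterns; the labels are illustrative only.
$rules = [
    'two consecutive lowercase letters'    => '/[a-z]{2}/',
    'at least one digit'                   => '/[0-9]/',
    'at least one uppercase letter'        => '/[A-Z]/',
    'two identical punctuation characters' => '/(\pP)\1/',
];

// Hypothetical helper: returns the labels of the rules a string fails.
function failedRules(string $input, array $rules): array
{
    $failed = [];
    foreach ($rules as $label => $pattern) {
        if (preg_match($pattern, $input) !== 1) {
            $failed[] = $label;
        }
    }
    return $failed;
}

var_dump(preg_match($combined, 'aa1A!!')); // int(1)
var_dump(preg_match($combined, '!!A1aa')); // int(1)
print_r(failedRules('aa1A!?', $rules));    // only the punctuation rule is listed
```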

Regex Challenge - either ... or

I haven't been able to figure this one out.
I need to match all of these strings by matching whole and its surrounding underscores (in one regex statement):
whole_anything
anything_whole
anything_whole_anything
but it must NOT match this
anythingwholeanything
anything_wholeanything
anythingwhole_anything
That means: write a regex that matches the phrase whole only if it has an underscore before it, after it, or both, and not if there are no underscores.
The following
preg_match("/(whole_|_whole_|_whole)/", $string)
is not a solution ;)
2015/02/09 Edit: added conditions 5. and 6. for clarification
You could reduce the number of cases in the alternatives:
preg_match('/(_whole_?|whole_)/', $string);
If there's an underscore before, the underscore after is optional. But if there's no underscore before, the underscore after is required.
You can use a PHP variable to solve the problem of putting the word twice:
$word = preg_quote('whole');
preg_match("/(_{$word}_?|{$word}_)/", $string);
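As a quick check, the variable-based version can be wrapped in a small function and run against the strings from the question. The function name hasUnderscoredWord() is hypothetical. Note that this pattern only looks at the underscores, so the mixed cases added in the edit (anything_wholeanything, anythingwhole_anything) still match it.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function hasUnderscoredWord(string $word, string $subject): bool
{
    // Quote the word so regex metacharacters in it are taken literally.
    $w = preg_quote($word, '/');
    return preg_match("/(_{$w}_?|{$w}_)/", $subject) === 1;
}

var_dump(hasUnderscoredWord('whole', 'whole_anything'));          // bool(true)
var_dump(hasUnderscoredWord('whole', 'anything_whole'));          // bool(true)
var_dump(hasUnderscoredWord('whole', 'anything_whole_anything')); // bool(true)
var_dump(hasUnderscoredWord('whole', 'anythingwholeanything'));   // bool(false)
var_dump(hasUnderscoredWord('whole', 'anything_wholeanything'));  // bool(true): still matches
```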
Another alternative. This way we check for the existence of a word boundary or _ both before and after whole, but we exclude the word whole by itself through a negative lookahead.
(?!\bwhole\b)((?:_|\b)whole(?:_|\b))
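A small sketch that runs this pattern against the six strings from the question; the $cases array and its expected values are simply the question's own list restated, and the slash delimiters are added for preg_match().

```php
<?php
$pattern = '/(?!\bwhole\b)((?:_|\b)whole(?:_|\b))/';

// Strings from the question mapped to whether they are supposed to match.
$cases = [
    'whole_anything'          => true,
    'anything_whole'          => true,
    'anything_whole_anything' => true,
    'anythingwholeanything'   => false,
    'anything_wholeanything'  => false,
    'anythingwhole_anything'  => false,
];

foreach ($cases as $subject => $expected) {
    $matched = preg_match($pattern, $subject) === 1;
    echo $subject, ': ', $matched === $expected ? 'as expected' : 'unexpected', "\n";
}
```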
You could exclude all alphanumeric characters before and after. Unfortunately you can't use \w because _ is considered a word character.
([^a-zA-Z0-9])_?whole_?([^a-zA-Z0-9])
That will exclude alphanumeric characters before and after from matching, and the underscore in front, behind, or both is optional. If none exist, it can't match because it can't be preceded by a letter or number. You could change it to include special characters and the lot.

secured regular expression that restrict specific special characters

I tried to create a regular expression with the specification below:
any alphabetic character (at least one)
any numeric character (at least one)
no spaces
accept all special characters (except ",;&|')
^(?=.*[0-9])(?=.*[a-z])(?!.*\s)((?!.*[",;&|'])|(?=(.*\W){1,}))(?!.*[",;&|'])$
This is the one I tried.
What can I do with this?
Question is still vague in nature, please provide some examples of accepted strings.
Just to get you started you can use:
character class in a negative lookahead
Don't forget start & end anchors:
Regex:
/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|']).+$/i
This regex will match 1 or more characters, none of which is a space or one of ",;&|', and requires at least one digit and at least one letter from the a-z alphabet (case-insensitively).
Live Demo: http://www.rubular.com/r/nxdi79ZcRx
In PHP use it like this:
'/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|\']).+$/i'
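A minimal usage sketch, assuming the intent described above; the function name isAcceptable() and the sample inputs are made up for illustration.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function isAcceptable(string $input): bool
{
    $pattern = '/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|\']).+$/i';
    return preg_match($pattern, $input) === 1;
}

var_dump(isAcceptable('abc123'));  // bool(true)
var_dump(isAcceptable('a1!b'));    // bool(true): "!" is not in the forbidden set
var_dump(isAcceptable('abc 123')); // bool(false): contains a space
var_dump(isAcceptable('abcdef'));  // bool(false): no digit
var_dump(isAcceptable('a1;b'));    // bool(false): contains ";"
```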

Regex to find string containing special characters in text

I'm trying to formulate a regular expression that will allow me to find a string within a piece of text, if the string exists on its own i.e. not within another word (but surrounded by special characters is ok).
/\bword\b/i
The above regex works fine, and finds "word" in the text. The problem comes when the word I want to find is something like "c++". In this case it matches on any occurrence of the "c" character on its own. I've tried escaping the "+" characters but it doesn't make any difference. I'm assuming that because "+" is a non-word character, I'm possibly going down the wrong route and using word boundaries is not what I should be doing.
So I guess the question is, how can I use a regular expression to find a string in a piece of text, on its own, and regardless of whether the string is alphanumeric or contains special characters. So in the following piece of text it should match on the 3 occurrences of "c++":
c++
(c++)
perl/c++/assembly
But it should not match on the following:
maniac++
c++abc
This is intended so that my script can tell if a specific skill exists within a user's CV/resume. I'm using this with PHP's preg_match_all() function.
I've done a lot of searching but can't come up with a solution, hopefully someone with good regex knowledge can help.
Try this:
/(?<!\w)(c\+\+)(?!\w)/
The (?<!\w) is a negative lookbehind clause, meaning that a word character should not immediately precede your pattern. The (?!\w) part is negative lookahead, meaning that a word character should not immediately follow.
Hope this helps!
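To apply the same lookarounds to arbitrary skill names, the term can be escaped with preg_quote() before it is placed in the pattern. This is only a sketch: countMentions() is a hypothetical helper, and the /i flag is carried over from the pattern in the question.

```php
<?php
// Hypothetical helper: counts standalone occurrences of $term in $text.
function countMentions(string $term, string $text): int
{
    // preg_quote() escapes "+" and other metacharacters, plus the "/" delimiter.
    $quoted = preg_quote($term, '/');
    return (int) preg_match_all('/(?<!\w)' . $quoted . '(?!\w)/i', $text);
}

// The sample lines from the question.
$cv = "c++\n(c++)\nperl/c++/assembly\nmaniac++\nc++abc";

var_dump(countMentions('c++', $cv)); // int(3)
```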

Why does this regex not validate in the same way in PHP?

When I try preg_match with the following expression: /.{0,5}/, it still matches strings longer than 5 characters.
It does, however, work properly when I try it in an online regexp matcher.
The site you reference, myregexp.com, is focussed on Java.
Java has a specific function for matching an exact pattern, without needing to use anchor characters. This is the function which myregexp.com uses.
In most other languages, in order to match an exact pattern, you would need to add the anchoring characters ^ and $ at the start and end of the pattern respectively, otherwise the regex assumes it only needs to find the matched pattern somewhere within the string, rather than the whole string being the match.
This means that without the anchors, your pattern will match any string, of any length, because whatever the string, it will contain within it somewhere a match for "zero to five of any character".
So in PHP, and Perl, and virtually any other language, you need your pattern to look like this:
/^.{0,5}$/
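A short sketch of the difference, using a made-up eight-character string:

```php
<?php
$long = 'abcdefgh';

// Unanchored: succeeds by matching the first five characters only.
var_dump(preg_match('/.{0,5}/', $long, $m)); // int(1)
var_dump($m[0]);                             // string(5) "abcde"

// Anchored: the whole string has to fit within 0 to 5 characters.
var_dump(preg_match('/^.{0,5}$/', $long));   // int(0)
var_dump(preg_match('/^.{0,5}$/', 'abc'));   // int(1)
```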
Having explained all that, I would make one final observation though: this specific pattern really doesn't need to be a regular expression; you could achieve the same thing with strlen(). In addition, the dot character in regex may not work exactly as you expect. It matches almost any character, but some characters, including new line characters, are excluded by default, so if your string contains five characters and one of them is a new line, it will fail your regex when you might have expected it to pass. With this in mind, strlen() would be a safer option (or mb_strlen() if you expect to have Unicode characters).
If you need to match any character in regex, and the default behaviour of the dot isn't good enough, there are two options: One is to add the s modifier at the end of the expression (ie it becomes /^.{0,5}$/s). The s modifier tells regex to include new line characters in the dot "any character" match.
The other option (which is useful for languages that don't support the s modifier) is to use an expression and its negative together in a character class - eg [\s\S] - instead of the dot. \s matches any white space character, and \S is a negative of \s, so any character not matched by \s. So together in a character class they match any character. It's more long winded and less readable than a dot, but in some languages it's the only way to be sure.
You can find out more about this here: http://www.regular-expressions.info/dot.html
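The newline behaviour and both workarounds can be seen in a small sketch; the five-character test string is made up for illustration.

```php
<?php
$s = "ab\ncd"; // five characters, one of them a new line

var_dump(preg_match('/^.{0,5}$/', $s));       // int(0): the dot skips the new line
var_dump(preg_match('/^.{0,5}$/s', $s));      // int(1): s modifier lets the dot match it
var_dump(preg_match('/^[\s\S]{0,5}$/', $s));  // int(1): character class workaround
var_dump(strlen($s) <= 5);                    // bool(true): plain length check
```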
Hope that helps.
You need to anchor it with ^$. These symbols match the beginning and end of the string respectively, so it must be 0-5 characters between the beginning and end. Leaving out the anchors will match anywhere in the string so it could be longer.
/^.{0,5}$/
For better readability, I would probably also enclose the . in (), but that's kind of subjective.
/^(.){0,5}$/
