I require a regex to match the string in the following way:
#1234abc : Should get matched
#abc123 : Should get matched
#123abc123 : Should get matched
#123 : Should not get matched
#123_ : Should not get matched
#123abc_ : Should get matched
In other words, it should match only if the string contains digits or underscores along with letters. Digits and/or underscores alone should not match, and neither should any other special characters.
This regex is basically to get hashtags from a string. I have already tried the following, but it didn't work well for me.
preg_match_all('/(?:^|\s)#([a-zA-Z0-9_]+$)/', $text, $matches);
Please suggest something.
If you need to match hashtags in the format you specified in a larger string, use
(?<!\S)#\w*[a-zA-Z]\w*
Details:
(?<!\S) - there must be a start of string or a whitespace before
# - a hash symbol
\w* - 0+ word chars (that is, letters, digits or underscore)
[a-zA-Z] - a letter (you may use \p{L} instead)
\w* - 0+ word chars.
Other alternatives (that may appear faster, but are a bit more complex):
(?<!\S)#(?![0-9_]+\b)\w+
(?<!\S)#(?=\w*[a-zA-Z])\w+
The idea is that each pattern matches 1+ word chars preceded with a # that is either at the start of the string or right after whitespace. In the first alternative, the (?![0-9_]+\b) negative lookahead fails any match where the part after # consists only of digits and/or underscores. In the second, the (?=\w*[a-zA-Z]) positive lookahead requires at least 1 ASCII letter after 0+ word chars.
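To compare the three patterns above, here is a minimal PHP sketch. It assumes the question's sample tags appear in one space-separated string (our own test input); the loop and the $results array are only for illustration.

```php
<?php
// Sketch: run the three hashtag patterns from this answer over the question's
// sample tags, joined into one space-separated string (our own test input).
$text = '#1234abc #abc123 #123abc123 #123 #123_ #123abc_';

$patterns = [
    '/(?<!\S)#\w*[a-zA-Z]\w*/',     // letter somewhere among the word chars
    '/(?<!\S)#(?![0-9_]+\b)\w+/',   // reject tags made only of digits/underscores
    '/(?<!\S)#(?=\w*[a-zA-Z])\w+/', // require at least one letter via lookahead
];

$results = [];
foreach ($patterns as $pattern) {
    preg_match_all($pattern, $text, $matches);
    $results[$pattern] = $matches[0]; // full matches, including the leading #
    echo $pattern, ' => ', implode(' ', $matches[0]), PHP_EOL;
}
```

With this input, each pattern should return the same four tags: #1234abc, #abc123, #123abc123 and #123abc_.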
You can use this Regex:
((.*?(\d+)[a-zA-Z]+.*)|(.*[a-zA-Z]+(\d+).*)).
Access it here: http://regexr.com/3ef6q
Do:
^(?=.*[A-Za-z])[\w_]+$
[\w_]+ matches one or more letters, digits or underscores (\w already includes _, so this is the same as \w+)
The zero width positive lookahead pattern, (?=.*[A-Za-z]), makes sure the match contains at least one letter
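Since this pattern is anchored with ^ and $ and does not include the #, it has to be applied to the tag text on its own. A small PHP sketch, assuming a hypothetical isValidTag() helper that strips the leading # first:

```php
<?php
// Sketch only: the answer's pattern contains no '#', so this hypothetical
// helper strips leading '#' characters before testing the rest of the tag.
function isValidTag(string $tag): bool
{
    $body = ltrim($tag, '#');
    // At least one ASCII letter somewhere, and only letters, digits or _ overall.
    return preg_match('/^(?=.*[A-Za-z])[\w_]+$/', $body) === 1;
}

foreach (['#1234abc', '#abc123', '#123abc123', '#123', '#123_', '#123abc_'] as $tag) {
    echo $tag, ' => ', isValidTag($tag) ? 'match' : 'no match', PHP_EOL;
}
```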
I'm trying to understand negative and positive lookaheads. Basically, I want to match everything that isn't in the character class [a-zA-Z] and isn't the literal string "b4g", so that I would be left with just letters and the b4g literal if it occurs in the string.
Given: $100b4gb$2000
It would match $100 and $2000.
I would then replace all matches with an empty string ''.
Something like
preg_replace('/(?!.*b4g)[^a-zA-Z]+/', '', $subject);
I've tried this and can't get it to work
This matches everything, but it also strips the 4 from "b4g":
[^a-zA-Z]+
I can't get this lookahead to work either:
(?!.*b4g)[^a-zA-Z]+
You can use
preg_replace('~b4g(*SKIP)(*F)|[^a-zA-Z]~', '', $text)
Details:
b4g(*SKIP)(*F) - matches a b4g substring, then (*F) makes the match fail and (*SKIP) makes the next search start from the position where (*SKIP) was reached, i.e. right after b4g, so b4g itself is never removed
| - or
[^a-zA-Z] - any char other than an ASCII letter.
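A quick PHP check of this replacement on the question's input, with the question's first attempt alongside for comparison (the variable names are only illustrative):

```php
<?php
// Sketch: the question's sample input run through the answer's pattern,
// next to the question's first attempt for comparison.
$text = '$100b4gb$2000';

// Question's attempt: removes every non-letter run, including the 4 in "b4g".
$naive = preg_replace('/[^a-zA-Z]+/', '', $text);

// Answer's pattern: "b4g" is matched, then failed and skipped, so it survives.
$result = preg_replace('~b4g(*SKIP)(*F)|[^a-zA-Z]~', '', $text);

echo $naive, PHP_EOL;  // bgb
echo $result, PHP_EOL; // b4gb
```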
I need to check whether a password matches the following rules:
At least 8 characters (length)
One capital letter
One lower letter
One number
One special char
Can't contain '.' or '_' (tricky part)
For example:
Bft$ns2E => should match
H2od%^.,3 => shouldn't match (notice the '.')
I tried this:
^(?=.*?[A-Z])(?=(.*[a-z]){1,})(?=(.*[\d]){1,})(?=(.*[\W]){1,})(?!.*\s).{8,}$
That satisfies all the rules except the last one (can't contain '.' or '_'). Regexes are always a pain for me and I can't figure out how to do this.
Thanks to all!
Your regex is on the right track. I would use:
^(?=.*?[A-Z])(?=.*[a-z])(?=.*\d)(?=.*\W)(?!.*[._]).{8,}$
This pattern says to:
^
(?=.*?[A-Z]) assert capital letter
(?=.*[a-z]) assert lowercase letter
(?=.*\d) assert digit
(?=.*\W) assert non word/special character
(?!.*[._]) assert NO dot or underscore
.{8,} match a password of length 8 or greater
$
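A minimal PHP sketch that runs this pattern over the question's two sample passwords; the third sample, containing an underscore, is our own addition:

```php
<?php
// Sketch: the question's sample passwords plus one extra sample with an
// underscore (our own addition), checked against the answer's pattern.
$pattern = '/^(?=.*?[A-Z])(?=.*[a-z])(?=.*\d)(?=.*\W)(?!.*[._]).{8,}$/';

foreach (['Bft$ns2E', 'H2od%^.,3', 'Bft_ns2E$'] as $password) {
    $verdict = preg_match($pattern, $password) === 1 ? 'match' : 'no match';
    echo $password, ' => ', $verdict, PHP_EOL;
}
```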
In a lookahead like (?=(.*[a-z]){1,}), you can omit the group and the {1,} quantifier, as asserting the condition once is enough.
If you don't want to match a space, . or _, you can use a negated character class that matches 8 or more characters excluding those.
Using a negated character class as well in the lookahead assertions prevents unnecessary backtracking.
^(?=[^A-Z\r\n]*[A-Z])(?=[^a-z\r\n]*[a-z])(?=[^\d\r\n]*\d)(?=\w*\W)[^\s._]{8,}$
The pattern matches:
^ Start of string
(?=[^A-Z\r\n]*[A-Z]) Assert a char A-Z
(?=[^a-z\r\n]*[a-z]) Assert a char a-z
(?=[^\d\r\n]*\d) Assert a digit
(?=\w*\W) Assert a non word char
[^\s._]{8,} Match 8+ times any char except a whitespace char, . or _
$ End of string
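Here is a PHP sketch that puts this pattern next to the one from the previous answer. The extra sample 'Bft$ ns2E' is our own; it contains a space, which only this negated-class version rejects, since .{8,} accepts spaces.

```php
<?php
// Sketch: both password patterns side by side. The extra sample 'Bft$ ns2E'
// (our own) contains a space, which only the negated-class version rejects.
$lookahead = '/^(?=.*?[A-Z])(?=.*[a-z])(?=.*\d)(?=.*\W)(?!.*[._]).{8,}$/';
$negated   = '/^(?=[^A-Z\r\n]*[A-Z])(?=[^a-z\r\n]*[a-z])(?=[^\d\r\n]*\d)(?=\w*\W)[^\s._]{8,}$/';

foreach (['Bft$ns2E', 'H2od%^.,3', 'Bft$ ns2E'] as $password) {
    printf("%-10s lookahead=%d negated=%d\n",
        $password, preg_match($lookahead, $password), preg_match($negated, $password));
}
```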
I have already found helpful answers for a regex that matches Twitter-like username mentions in this answer and this answer:
(?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9_]+)
(?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9-_]+)
However, I need to update this regex to also include usernames that have dots.
One or more dots are allowed in a username.
The username must not start or end with a dot.
No two consecutive dots are allowed.
Example of a matched string:
#valid.user.name
^^^^^^^^^^^^^^^^
Examples of non-matched strings:
#.user.name // starts with a dot
#user.name. // ends with a dot
#user..name // has two consecutive dots
You can use this refactored regex:
(?<=[^\w.-]|^)#([A-Za-z]+(?:\.\w+)*)$
RegEx Details:
(?<=[^\w.-]|^): Lookbehind to assert that we have start of line or any non-word, non-dot, non-hyphen character before current position
#: Match literal #
(: Start capture group
[A-Za-z]+: Match 1+ ASCII letters
(?:\.\w+)*: Match 0 or more instances of dot followed 1+ word characters
): End capture group
$: End
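Because this pattern ends in $, it only matches a username at the end of the string. A minimal PHP sketch that checks the question's examples as separate strings:

```php
<?php
// Sketch: the refactored pattern checked against the question's examples.
// Each example is tested as its own string because the pattern ends in $.
$pattern = '/(?<=[^\w.-]|^)#([A-Za-z]+(?:\.\w+)*)$/';

foreach (['#valid.user.name', '#.user.name', '#user.name.', '#user..name'] as $s) {
    if (preg_match($pattern, $s, $m) === 1) {
        echo $s, ' => match, group 1 = ', $m[1], PHP_EOL;
    } else {
        echo $s, ' => no match', PHP_EOL;
    }
}
```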
The (?<=^|(?<=[^a-zA-Z0-9-_\.])) is a positive lookbehind that requires a match to be at the start of the string or right after a char other than an alphanumeric, -, _ or . char. You may write it more compactly as (?<![\w.-]), a negative lookbehind.
Next, ([A-Za-z]+[A-Za-z0-9_]+) captures 1+ ASCII letters and then 1+ ASCII letters, digits or underscores. You seem to want the first char to be a letter, followed by word chars and then any number of sequences of . and 1+ word chars, that is, you may use [A-Za-z]\w*(?:\.\w+)*.
As you do not want to match it if there is a . right after the expected match, you need a lookahead that requires whitespace or the end of string, (?!\S).
So, combining it, you can use
'~(?<![\w.-])#([A-Za-z]\w*(?:\.\w+)*)(?!\S)~'
Details
(?<![\w.-]) - no letters, digits, _, . and - immediately to the left of the current location are allowed
# - a # char
([A-Za-z]\w*(?:\.\w+)*) - Group 1:
[A-Za-z] - an ASCII letter
\w* - 0+ letters, digits, _
(?:\.\w+)* - 0+ sequences of
\. - dot
\w+ - 1+ letters, digits, _
(?!\S) - whitespace or end of string are required immediately to the right of the current location.
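To illustrate extracting mentions from running text, here is a PHP sketch with a made-up sample sentence (our own); group 1 holds the username without the #:

```php
<?php
// Sketch: pull username mentions out of running text with the combined pattern.
// The sample sentence is our own; x#nope is preceded by a word char, so it is skipped.
$pattern = '~(?<![\w.-])#([A-Za-z]\w*(?:\.\w+)*)(?!\S)~';
$text = 'hi #valid.user.name and #bob x#nope #user..name #user.name.';

preg_match_all($pattern, $text, $matches);
print_r($matches[1]); // group 1: the usernames without '#'
```

Note that (?!\S) also rejects a mention followed directly by punctuation such as a comma, which may or may not be what you want.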
EDIT: Simpler version (same result)
^#[a-zA-Z](\.?[\w-]+)*$
Original
Another one:
^#[a-zA-Z][a-zA-Z_-]?(\.?[\w\d-]+){0,}$
^# starts with #
[a-zA-Z] first char
[a-zA-Z_-]? match a-zA-Z_- 0 or 1 time (optional)
( start group
\.? match . (optional)
[\w\d-]+ match a-zA-Z0-9-_ 1 or more times
) end group
{0,} repeat the group 0 or more times
$ end
Tests
valid:
#validusername
#valid.user.name
#valid-user-name
#valid_user-name
#valid-user123_name
#a.valid-user123_name
not valid:
#-invalid.user
#_invalid.user
#1notvalid-user_123name33
#.user.name
#user.name.
#user..name
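The test list can be checked in PHP as well. This sketch runs it through both the simpler and the original pattern and collects any unexpected result (the helper variables are our own):

```php
<?php
// Sketch: run the answer's test list through both versions of the pattern
// and record every string whose result differs from the expected one.
$simpler  = '/^#[a-zA-Z](\.?[\w-]+)*$/';
$original = '/^#[a-zA-Z][a-zA-Z_-]?(\.?[\w\d-]+){0,}$/';

$valid = ['#validusername', '#valid.user.name', '#valid-user-name',
          '#valid_user-name', '#valid-user123_name', '#a.valid-user123_name'];
$invalid = ['#-invalid.user', '#_invalid.user', '#1notvalid-user_123name33',
            '#.user.name', '#user.name.', '#user..name'];

$failures = [];
foreach (['simpler' => $simpler, 'original' => $original] as $name => $pattern) {
    $rejected = array_filter($valid, fn($s) => preg_match($pattern, $s) !== 1);
    $accepted = array_filter($invalid, fn($s) => preg_match($pattern, $s) === 1);
    $failures[$name] = array_values(array_merge($rejected, $accepted));
    echo $name, ': ', count($failures[$name]), ' unexpected result(s)', PHP_EOL;
}
```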
I'm trying to apply a regex constraint to a Symfony form input. The requirement for the input is that the start of the string and all commas must be followed by zero or more whitespace, then a # or # symbol, except when it's the empty string.
As far as I can tell, there is no way to tell the constraint to use preg_match_all instead of just preg_match, but it does have the ability to negate the match. So, I need a regular expression that preg_match will NOT MATCH for the given scenario: any string containing the start of the string or a comma, followed by zero or more whitespace, followed by any character that is not a # or # and is not the end of the string, but will match for everything else. Here are a few examples:
preg_match(..., ''); // No match
preg_match(..., '#yolo'); // No match
preg_match(..., '#yolo, #swag'); // No match
preg_match(..., '#yolo,#swag'); // No match
preg_match(..., '#yolo, #swag,'); // No match
preg_match(..., 'yolo'); // Match
preg_match(..., 'swag,#yolo'); // Match
preg_match(..., '#swag, yolo'); // Match
I would've thought for sure that /(^|,)\s*[^##]/ would work, but it fails in every case with 1 or more spaces, and it appears to be because of the asterisk. If I get rid of the asterisk, preg_match('/(^|,)\s[^##]/', '#yolo, #swag') does not match (as desired) when there's exactly one space, but as soon as I reintroduce the asterisk it breaks for any number of spaces > 0.
My theory is that the regex engine is interpreting the second space as a character that is not in the character set [##], but that's just a theory and I don't know what to do about it. I know that I could create a custom constraint to use preg_match_all instead to get around this, but I'd like to avoid that if possible.
You may use
'~(?:^|,)\s*+[^##]~'
Here, the + symbol makes *+ a possessive quantifier: it matches 0 or more whitespace chars and prevents the regex engine from backtracking into the \s* pattern if [^##] cannot match the next char.
Details
(?:^|,) - either start of string or ,
\s*+ - zero or more whitespace chars, possessively matched (i.e. if the next char is not matched with [^##] pattern, the whole pattern match will fail)
[^##] - a negated character class matching any char but # and #.
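A PHP sketch of the question's examples against both the question's greedy \s* pattern and this possessive \s*+ version. The [^##] class is copied as it appears in the thread; written this way it is equivalent to [^#].

```php
<?php
// Sketch: compare the question's greedy \s* with the answer's possessive \s*+.
// [^##] is copied verbatim from the thread; as written it behaves like [^#].
$greedy     = '~(?:^|,)\s*[^##]~';
$possessive = '~(?:^|,)\s*+[^##]~';

// Expected preg_match() results from the question (1 = "Match", 0 = "No match").
$expected = [
    ''              => 0,
    '#yolo'         => 0,
    '#yolo, #swag'  => 0,
    '#yolo,#swag'   => 0,
    '#yolo, #swag,' => 0,
    'yolo'          => 1,
    'swag,#yolo'    => 1,
    '#swag, yolo'   => 1,
];

foreach ($expected as $input => $want) {
    printf("%-16s greedy=%d possessive=%d expected=%d\n",
        "'$input'", preg_match($greedy, $input), preg_match($possessive, $input), $want);
}
```

With the greedy version, '#yolo, #swag' matches because \s* backs off and [^##] consumes the space itself; the possessive version cannot give the space back, so the match fails as intended.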
I'm looking for a regex that matches words in which a letter is repeated more than once, with the repeated letters next to each other.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that matches exactly:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain which regexp I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches any non-whitespace character
* quantifier, zero or more occurrences of \S
(\w) matches a single word character and captures it in \1
(?=\1+) positive lookahead. Asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches any non-whitespace characters
EDIT
If the character must repeat more than once, a slight modification of the regex does the trick:
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
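A minimal PHP sketch applying this pattern to the sentence from the question:

```php
<?php
// Sketch: the EDIT pattern applied to the question's sample sentence.
$str = 'This is an exxxmaple oooonnnnllllyyyyy!';
preg_match_all('/\S*(\w)(?=\1{2,})\S*/', $str, $arr);
print_r($arr[0]); // full matches
```

Because \S* also consumes punctuation, the second match keeps the trailing "!" (oooonnnnllllyyyyy!).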
Use this if you want to discard words like apple, where a letter appears only twice in a row:
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See the demos:
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
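A PHP sketch running both patterns on the question's sentence, with "apple" inserted (our own addition) to show that a letter repeated only twice is skipped:

```php
<?php
// Sketch: both patterns from this answer on the sample sentence, with
// "apple" added (our own addition) to show a double letter being ignored.
$str = 'This is an exxxmaple apple oooonnnnllllyyyyy!';

$found = [];
foreach (['/\b\w*(\w)(?=\1\1+)\w*\b/', '/\b(?=[^\s]*(\w)\1\1+)\w+\b/'] as $pattern) {
    preg_match_all($pattern, $str, $arr);
    $found[$pattern] = $arr[0];
    echo $pattern, ' => ', implode(', ', $arr[0]), PHP_EOL;
}
```

Unlike the \S-based pattern, both of these stop at the word boundary, so the "!" is not included.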
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. Note that the word boundary can be removed; however, keeping it reduces the number of steps needed to obtain a match (as does the lazy quantifier). Note too that if you are looking for words in the common meaning, you need to remove the word boundary and use [a-zA-Z] instead of \w.
(\w)\1{2} checks whether a character repeated three times in a row is present: a word character is captured in group 1 and must be followed by two copies of the capture group's content (the backreference \1).
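In PHP, this could look like the following sketch (again with "apple" added as our own test word); group 1 captures the repeated character:

```php
<?php
// Sketch: the lazy pattern on the sample sentence, with "apple" added
// (our own addition); group 1 holds the character that repeats.
$str = 'This is an exxxmaple apple oooonnnnllllyyyyy!';
preg_match_all('/\b\w*?(\w)\1{2}\w*/', $str, $arr);
print_r($arr[0]); // whole words
print_r($arr[1]); // first character found repeated three times in each word
```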