PHP regex: zero or more whitespace not working

I'm trying to apply a regex constraint to a Symfony form input. The requirement for the input is that the start of the string and all commas must be followed by zero or more whitespace, then a # or # symbol, except when it's the empty string.
As far as I can tell, there is no way to tell the constraint to use preg_match_all instead of just preg_match, but it does have the ability to negate the match. So, I need a regular expression that preg_match will NOT MATCH for the given scenario: any string containing the start of the string or a comma, followed by zero or more whitespace, followed by any character that is not a # or # and is not the end of the string, but will match for everything else. Here are a few examples:
preg_match(..., ''); // No match
preg_match(..., '#yolo'); // No match
preg_match(..., '#yolo, #swag'); // No match
preg_match(..., '#yolo,#swag'); // No match
preg_match(..., '#yolo, #swag,'); // No match
preg_match(..., 'yolo'); // Match
preg_match(..., 'swag,#yolo'); // Match
preg_match(..., '#swag, yolo'); // Match
I would've thought for sure that /(^|,)\s*[^##]/ would work, but it fails in every case with one or more spaces, apparently because of the asterisk. If I get rid of the asterisk, preg_match('/(^|,)\s[^##]/', '#yolo, #swag') does not match (as desired) when there's exactly one space, but as soon as I reintroduce the asterisk it breaks for any number of spaces greater than 0.
My theory is that the regex engine is interpreting the second space as a character that is not in the character set [##], but that's just a theory and I don't know what to do about it. I know that I could create a custom constraint to use preg_match_all instead to get around this, but I'd like to avoid that if possible.

You may use
'~(?:^|,)\s*+[^##]~'
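Here, the + after the * turns \s* into the possessive quantifier \s*+. It matches 0 or more whitespace chars and prevents the regex engine from backtracking into the \s* pattern if [^##] cannot match the subsequent char. With the plain greedy \s*, the engine backtracks, gives up one whitespace char, and lets [^##] match that whitespace instead, which is why your original pattern matched.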
See the regex demo.
Details
(?:^|,) - either start of string or ,
\s*+ - zero or more whitespace chars, possessively matched (i.e. if the next char is not matched with [^##] pattern, the whole pattern match will fail)
[^##] - a negated character class matching any char but # and #.
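A minimal sketch of how the possessive pattern behaves on the examples from the question. As an assumption for illustration, a single ASCII # stands in for both allowed symbols, and the variable names are made up for this example. The greedy variant is included only to show the backtracking problem.

```php
<?php
// Assumption: a plain ASCII '#' stands in for the allowed leading symbols.
$possessive = '~(?:^|,)\s*+[^#]~'; // \s*+ never gives whitespace back
$greedy     = '~(?:^|,)\s*[^#]~';  // \s* may backtrack and let [^#] match a space

$inputs = [
    '', '#yolo', '#yolo, #swag', '#yolo,#swag', '#yolo, #swag,', // should not match
    'yolo', 'swag,#yolo', '#swag, yolo',                         // should match
];

$results = [];
foreach ($inputs as $input) {
    // preg_match returns 1 on a match and 0 on no match
    $results[$input] = preg_match($possessive, $input);
}

// The greedy pattern wrongly matches: [^#] consumes the space after the comma.
$greedyResult = preg_match($greedy, '#yolo, #swag');

foreach ($results as $input => $matched) {
    printf("%-16s %s\n", var_export($input, true), $matched ? 'Match' : 'No match');
}
```

Since a match means the input is invalid, this pattern is meant to be used with the constraint's negated matching, as described in the question.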

Related

Special Expression not allowed - Regular Expression in PHP

I am trying to match my string so that a dimension suffix is not allowed, for example the 150x150 in the image name below:
test-string-150x150.png
I am using the following pattern to match this String:
/^([^0-9x0-9]+)\..+/
It works fine, except in a case like this:
teststring.com-150x150.jpg
What I need: the mask must disallow dimensions only at the end of the string. Here are some examples:
test-string-150x150.png > must disallow
any-string.png > allow
200x200-test.png > allow
1x1.png-100x100.jpg > disallow
You could use a negative lookahead to assert that the string does not contain the sizes followed by a dot and 1+ word characters till the end of the string.
^(?!.*\d+x\d+\.\w+$).+$
Explanation
^ Start of string
(?! Negative lookahead, assert what is on the right is not
.* Match 0+ occurrences of any char except a newline
\d+x\d+ Match the sizes format, where \d+ means 1 or more digits
\.\w+$ Match a dot, 1+ word characters and assert the end of the string $
) Close lookahead
.+ Match 1+ occurrences of any char except a newline
$ End of string
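As a rough check, the lookahead pattern above can be run against the file names from the question. This is only a sketch; the $names and $allowed variables are hypothetical names chosen for the example.

```php
<?php
// Reject names that end in <digits>x<digits>.<extension>
$pattern = '/^(?!.*\d+x\d+\.\w+$).+$/';

$names = [
    'test-string-150x150.png',
    'any-string.png',
    '200x200-test.png',
    '1x1.png-100x100.jpg',
    'teststring.com-150x150.jpg',
];

$allowed = [];
foreach ($names as $name) {
    // true when the name does NOT end with a dimension suffix
    $allowed[$name] = preg_match($pattern, $name) === 1;
}

foreach ($allowed as $name => $ok) {
    echo $name, ' > ', $ok ? 'allow' : 'disallow', "\n";
}
```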
Regex demo
If I understand your question, you're trying to find image names that do not include the image dimensions. If so, try this:
/^(?![\w-\.]+(\d+x\d+))[\w-\.]+\.\w+$/gm
For details about this code, please see regexr.com/4tmd1. This site is a great place to play around with regexes to make sure you're getting the results you expect.
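Be aware that the exact syntax of the regular expression depends on the regex engine used by whatever program you're running. For example, PHP's preg functions do not accept the g flag; preg_match_all is used there to find all matches.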

Regex to match numbers only if alphabets are present

I require a regex to match the string in the following way:
#1234abc : Should get matched
#abc123 : Should get matched
#123abc123 : Should get matched
#123 : Should not get matched
#123_ : Should not get matched
#123abc_ : Should get matched
This implies that it should only get matched if the string contains numbers or underscores along with letters. Strings with only numbers/underscores should not get matched. Any other special characters should not get matched either.
This regex is basically to get hashtags from a string. I have already tried the following, but it didn't work well for me.
preg_match_all('/(?:^|\s)#([a-zA-Z0-9_]+$)/', $text, $matches);
Please suggest something.
If you need to match hashtags in the format you specified in a larger string, use
(?<!\S)#\w*[a-zA-Z]\w*
See the regex demo
Details:
(?<!\S) - there must be a start of string or a whitespace before
# - a hash symbol
\w* - 0+ word chars (that is, letters, digits or underscore)
[a-zA-Z] - a letter (you may use \p{L} instead)
\w* - 0+ word chars.
Other alternatives (that may appear faster, but are a bit more complex):
(?<!\S)#(?![0-9_]+\b)\w+
(?<!\S)#(?=\w*[a-zA-Z])\w+
The point here is that the pattern basically matches 1+ word chars preceded with # that is either at the string start or after whitespace, but (?![0-9_]+\b) negative lookahead fails all matches where the part after # is all digits/underscores, and the (?=\w*[a-zA-Z]) positive lookahead requires that there should be at least 1 ASCII letter after 0+ word chars.
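A small sketch comparing the three patterns above on the sample hashtags from the question. The $text string and variable names are assumptions made for the example; on this input all three patterns are expected to return the same list.

```php
<?php
$text = 'Tags: #1234abc #abc123 #123abc123 #123 #123_ #123abc_';

$patterns = [
    'letter required'    => '/(?<!\S)#\w*[a-zA-Z]\w*/',
    'negative lookahead' => '/(?<!\S)#(?![0-9_]+\b)\w+/',
    'positive lookahead' => '/(?<!\S)#(?=\w*[a-zA-Z])\w+/',
];

$found = [];
foreach ($patterns as $label => $pattern) {
    preg_match_all($pattern, $text, $matches);
    $found[$label] = $matches[0]; // full matches only
}

// Hashtags made only of digits/underscores (#123, #123_) are skipped.
print_r($found['letter required']);
```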
You can use this Regex:
((.*?(\d+)[a-zA-Z]+.*)|(.*[a-zA-Z]+(\d+).*)).
Access it here: http://regexr.com/3ef6q
Do:
^(?=.*[A-Za-z])[\w_]+$
[\w_]+ matches one or more of letters, digits, _
The zero width positive lookahead pattern, (?=.*[A-Za-z]), makes sure the match contains at least one letter
Demo

PHP Regex display either abc or abc xyz format

I am trying to build regex for the expression to get values for either Boost Mobile or BoostMobile whichever is present.
Any suggestions please ?
In NFA regexes, in unanchored alternation groups, the first branch matched stops the group processing, the other branches located further on the right are not checked against the string. You may read more on that at Alternation with The Vertical Bar or Pipe Symbol.
So, swapping the values and simplifying the pattern you could use
/\b(Boost \s*Mobile|Boost)\b/i
However, the most effective way here is through using an optional group:
/\bBoost(?:\s*Mobile)?\b/i
        ^^^          ^^
See the regex demo
The i case insensitive modifier is set on the whole regex. You need not switch it on and off at the beginning/end of the pattern. Also, \W* can match an empty string, so your way of checking a word boundary may fail here when \b will work.
Pattern details:
\b - leading word boundary
Boost - a literal substring
(?:\s*Mobile)? - an optional group matching 1 or 0 sequences of
\s* - 0+ whitespaces
Mobile - a literal substring
\b - trailing word boundary
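The effect of branch order and of the optional group can be sketched as follows. The sample text and variable names are hypothetical, and the "short branch first" pattern is only an illustration of the problem described above, since the original pattern from the question is not shown.

```php
<?php
$text = 'Boost Mobile, BoostMobile and boost';

// Optional group: "Mobile" is consumed whenever it is present.
$optional = '/\bBoost(?:\s*Mobile)?\b/i';
preg_match_all($optional, $text, $all);
$optionalMatches = $all[0];

// Short branch first: the engine stops at "Boost" as soon as it succeeds.
$shortFirst = '/\b(?:Boost|Boost\s*Mobile)\b/i';
preg_match($shortFirst, 'Boost Mobile', $m);
$shortFirstMatch = $m[0];

print_r($optionalMatches);
echo $shortFirstMatch, "\n";
```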

REGEX - match words that contain letters repeating next to each other

I'm looking for a regex that matches words in which one or more letters are repeated consecutively.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that can exactly match:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain what regexp I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches any non-whitespace character
* quantifier, zero or more occurrences of \S
(\w) matches a single word character, captured in \1
(?=\1+) positive lookahead. Asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches zero or more non-whitespace characters
EDIT
If the repeating must be more than once, a slight modification of the regex would do the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
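A quick sketch of the two patterns above, using the sentence from the question with the word apple appended as an assumption to show the difference. Note that because \S matches any non-whitespace character, trailing punctuation is included in the match.

```php
<?php
// "apple" is appended here only to illustrate the difference between the patterns.
$str = 'This is an exxxmaple oooonnnnllllyyyyy! apple';

// A character followed by at least one more copy of itself (doubled letters).
preg_match_all('/\S*(\w)(?=\1+)\S*/', $str, $doubled);

// A character followed by at least two more copies (three or more in a row).
preg_match_all('/\S*(\w)(?=\1{2,})\S*/', $str, $tripled);

$doubledWords = $doubled[0];
$tripledWords = $tripled[0];

print_r($doubledWords);
print_r($tripledWords);
```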
Use this if you want to discard words like apple, etc.
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See the demos:
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. Note that the word boundary can be removed; however, it reduces the number of steps needed to obtain a match (as does the lazy quantifier). Note too that if you are looking for words (in the common meaning), you need to remove the word boundary and use [a-zA-Z] instead of \w.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed with the content of the capture group (the backreference \1).

Regular Expression to match ([^>(),]+) but include some \w's in it?

I'm using php's preg_replace function, and I have the following regex:
(?:[^>(),]+)
to match any characters but >(),. The problem is that I want to make sure that there is at least one letter in it (\w) and the match is not empty, how can I do that?
Is there a way to say what I DO WANT to match in the [^>(),]+ part?
You can add a lookahead assertion:
(?:(?=.*\p{L})[^>(),]+)
This makes sure that there will be at least one letter (\p{L}; \w also matches digits and underscores) somewhere in the string.
You don't really need the (?:...) non-capturing parentheses, though:
(?=.*\p{L})[^>(),]+
works just as well. Also, to ensure that we always match the entire string, it might be a good idea to surround the regex with anchors:
^(?=.*\p{L})[^>(),]+$
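A minimal sketch of the anchored pattern in PHP. The u modifier is added here on the assumption that the subject is UTF-8, so that \p{L} works on Unicode letters; the sample subjects and variable names are hypothetical.

```php
<?php
// Anchored pattern: at least one letter, and none of the characters >(),
$pattern = '/^(?=.*\p{L})[^>(),]+$/u';

$subjects = ['abc 123', ' x ', '123 456', 'a(b)', ''];

$valid = [];
foreach ($subjects as $subject) {
    $valid[$subject] = preg_match($pattern, $subject) === 1;
}

foreach ($valid as $subject => $ok) {
    echo var_export($subject, true), ': ', $ok ? 'match' : 'no match', "\n";
}
```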
EDIT:
For the added requirement of not including surrounding whitespace in the match, things get a little more complicated. Try
^(?=.*\p{L})(\s*)((?:(?!\s*$)[^>(),])+)(\s*)$
In PHP, for example to replace all those strings we found with REPLACEMENT, leaving leading and trailing whitespace alone, this could look like this:
$result = preg_replace(
    '/^               # Start of string
    (?=.*\p{L})       # Assert that there is at least one letter
    (\s*)             # Match and capture optional leading whitespace (--> \1)
    (                 # Match and capture... (--> \2)
      (?:             # ...at least one character of the following:
        (?!\s*$)      # (unless it is part of trailing whitespace)
        [^>(),]       # any character except >(),
      )+              # End of repeating group
    )                 # End of capturing group
    (\s*)             # Match and capture optional trailing whitespace (--> \3)
    $                 # End of string
    /xu',
    '\1REPLACEMENT\3', $subject);
You can just "insert" \w inside: (?:[^>(),]+\w[^>(),]+). That way it will contain at least one word character and obviously cannot be empty. BTW, \w matches digits and underscores as well as letters. If you want only letters, you can use the Unicode letter class \p{L} instead of \w.
How about this:
(?:[^>(),]*\w[^>(),]*)
