I have an initial string of words, like:
abab sbs abc ffuuu qwerty uii onnl ghj
And I would like to be able to extract only the words that do not contain adjacently-repeating characters, so that the above string is returned as:
abc qwerty ghj
How to accomplish this task using Regular Expressions?
I guess the post is open again after a little rewording of the question.
This is moved from the comments, to the answer region.
A while ago I saw this style problem on a question about no duplicate characters
that encompased the entire string. I just translated it to word boundries.
#Michael J Mulligan did a test case for it (see comments).
The links:
'Working Regex test (regex101.com/r/bA2wB0/1) and a working PHP example (ideone.com/7ID8Ct)'
# For NO duplicate letters anywhere within word characters
# -----------------------------------------------------------
# \b(?!\w*(\w)\w*\1)\w+
\b # Word boundry
# Only word chars now
(?! # Lookahead assertion (like a true/false conditional)
# It doesn't matter if the assertion is negative or positive.
# In this section, the engine is forced to match if it can,
# it has no choice, it can't backtrack its way out of here.
\w*
( \w ) # (1), Pick a word char, any word char
\w*
\1 # Now it is here again
# Ok, the expression matched, time to check if the assertion is correct.
) # End assertion
\w+ # Its here now, looks like the assertion let us through
# The assert is that no duplicate word chars ahead,
# so free to match word chars 'en masse'
# For ONLY duplicate letters anywhere within word characters
# just do the inverse. In this case, the inverse is changing
# the lookahead assertion to positive (want duplicates).
# -----------------------------------------------------------
# \b(?=\w*(\w)\w*\1)\w+
Related
I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
This matches the following data in the following cases. The only one that does not match is the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks
One option could be using lookaheads to assert 3 digits, not 2 backslashes and not 2 times a hyphen.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert what is on the left is not a non whitespace char
(?=(?:[^\d\s]*\d){3}) Assert wat is on the right is 3 times a whitespace char or digit
(?!(?:[^\s-]*-){2}) Assert what is on the right is not 2 times a whitespace char a hyphen
(?!(?:[^\s\\]*\\){2}) Assert what is on the right is not 2 times a whitespace char a backslash
[A-Z0-9/\\-]+ Match any of the listed 1+ times
(?!\S) Assert what is on the right is not a non whitespace char
Regex demo
Your patterns can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Check out this Regex101 demo.
Remark
This solution assumes that each string tested resides on its own line, ie. not just being separated by whitespace.
In case the strings are separated by whitespace, choose the solution of user #TheFourthBird (which essentially is the same as this one but caters for the whitespace separation)
You can test the condition for both the hyphen and the slash into a same lookahead using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
demo
detailled:
~ # using the tild as pattern delimiter avoids to escape all slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's no more than one hyphen and one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # eventual other characters until the end of the string
\z # end of the string.
~
To better understand (if you are not familiar with): these three subpatterns start from the same position (in this case the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the two first are zero-width assertions that are simple tests and don't consume any character.
I 'm working on a regex to match the example multi-line input below. I have tried this pattern but it is not working:
(^[A].*\n.*\nZ.*)[^A]
What regex can I use to select texts BETWEEN two delimiters? The markers A and Z are case-sensitive and start of line.
Start Marker=A
Stop Marker=Z
---INPUT---
AThis is the first linea AA
Csecond line - today is a good day
ZC is the delimeter
AZThis A is the fourth line
Bravo
Delta blah blah's test
Echo test test
Z The end of the second match
AAnother match here - the third one
CZharlie test--
Omega test
Zend of the third match...
------------
---EXPECTED MATCHES-----
[1]
AThis is the first linea AA
Csecond line - today is a good day
ZC is the delimeter
[2]
AZThis A is the fourth line
Bravo
Delta blah blah's test
Echo test test
Z The end of the second match
[3]
AAnother match here - the third one
CZharlie test--
Omega test
Zend of the third match...
------------------------
Can anyone help me figure out the correct pattern?
Activating multiline and dotall modifiers you could have that expression:
/^A.*?^Z.*?$/ms
Key points:
Caret used with multiline (m) modifier ensures A or Z match only at the beginning of a line.
Dotall (s) modifier makes . match all characters including newline.
Use of non-greedy repetition (*?) to not go to far.
$ at the end allow to capture the whole line instead of just the character Z
Demo
Edit: Effectively with a double star, catastrophic backtracking is never that far. Let's build something stronger.
Add more contrast, be explicit, and be possessive!
I ended with /^A(?>[^Z]*(?>(?!^)Z[^Z]*)*)^Z.*$/gm.
Let's decompose :
^A # Match starts with an A at the beginning of a line.
(?> # Matches the following as atomic and never come back to it!
[^Z]* # Matches any non Z.
(?> # Nested atomic group.
(?!^)Z # Matches a Z if not at the beginning of a line.
[^Z]* # Matches any non Z.
)* # Repeat this atomic group as much as possible.
) # End of the atomic group.
^Z # Matches a line beginning Z.
.*$ # Matches any character until end of the line.
Note that we removed the single line flag which isn't needed anymore.
I have this text:
SU4R
C45G
G3HD
61U14XE7AR23 914K16W471LV V6SQ5V16LG91 24YL4HW956C3 UZ26J12K615V T741MH4N739W 31ST445G726H 621EH6VW7Q6M 55N629WJ945P 56TX2W6LC949 44DS765CF739 XC262HV1JZ6V 26YD4N1Y71F7 S4M3F1XeDC0D
I want to use preg_match to find specific type of code in that text, so I should contains:
4 or 12 characters
it should returs all elements
non case sensitive
letters and numbers
I've ended with this:
preg_match("/^(?=.*\d)(?=.*[A-Za-z])[0-9A-Za-z!##$%]{4,12}/", $input_line, $output_array);
But:
it only works at http://www.phpliveregex.com/p/byZ
it find only first 4 elements
why when I try to use this page https://www.functions-online.com/preg_match.html it shows only 1 match?
Using preg_match_all(), something like this probably works
http://www.phpliveregex.com/p/bz0
# '/(?<!\S)(?i:[a-z\d]{4}|[a-z\d]{12})(?!\S)/'
(?<! \S ) # whitespace boundary
(?i: # case insensitive cluster group
[a-z\d]{4} # 4 alnum
| # or
[a-z\d]{12} # 12 alnum
)
(?! \S ) # whitespace boundary
I am a serious newbie with regular expression so please disregard my mistakes. I need to be sure that several criteria in a string are met.
Requirements:
Have at most 5 words
Max of 256 characters
Word is considered 1 or more characters - no spaces
Shouldn't contain two consecutive spaces
Example:
Tree blows in the wind
1-Tree falls over
Failure Example:
Tree blows in the night sky
Tree breaks 2 limbs during night
Can this be done in one single expression or should it be broken up?
Validating for 2 spaces:
- /^\s\s$/
Max of 256 characters:
- /^[a-zA-Z0-9]{,256}$/
I am not sure how to test case for the 5 words and combine the other criteria that I impose. Can anyone help?
Test for word:
- /^\w{1,5}$
You can try this:
(?s)\A(?!.{257}|.*\s\s)\W*\w*(?:\W+\w+){0,4}\W*\z
pattern details:
(?s) # turn on the singleline mode: allow the dot to match newlines
\A # start of the string anchor
(?! # open a negative lookahead assertion: means not followed by
.{257} # 257 characters
| # OR
.*\s\s # two consecutive whitespaces
) # close the negative lookahead
\W* # optional non-word characters
\w* # optional word characters (nothing in your requirements forbids to have a string without words or an empty string)
(?: # open a non-capturing group
\W+ # non-word characters: words are obviously separated with non-word characters
\w+ # an other word
){0,4} # repeat the non-capturing group between zero and 4 times
\W* # optional non-word characters
\z # anchor for the end of the string
I have a string that looks like this:
[if-abc] 12345 [if-def] 67890 [/if][/if]
I have the following regex:
/\[if-([a-z0-9-]*)\]([^\[if]*?)\[\/if\]/s
This matches the inner brackets just like I want it to. However, when I replace the 67890 with text (ie. abcdef), it doesn't match it.
[if-abc] 12345 [if-def] abcdef [/if][/if]
I want to be able to match ANY characters, including line breaks, except for another opening bracket [if-.
This part doesn't work like you think it does:
[^\[if]
This will match a single character that is neither of [, i or f. Regardless of the combination. You can mimic the desired behavior using a negative lookahead though:
~\[if-([a-z0-9-]*)\]((?:(?!\[/?if).)*)\[/if\]~s
I've also included closing tags in the lookahead, as this avoid the ungreedy repetition (which is usually worse performance-wise). Plus, I've changed the delimiters, so that you don't have to escape the slash in the pattern.
So this is the interesting part ((?:(?!\[/?if).)*) explained:
( # capture the contents of the tag-pair
(?: # start a non-capturing group (the ?: are just a performance
# optimization). this group represents a single "allowed" character
(?! # negative lookahead - makes sure that the next character does not mark
# the start of either [if or [/if (the negative lookahead will cause
# the entire pattern to fail if its contents match)
\[/?if
# match [if or [/if
) # end of lookahead
. # consume/match any single character
)* # end of group - repeat 0 or more times
) # end of capturing group
Modifying a little results in:
/\[if-([a-z0-9-]+)\](.+?)(?=\[if)/s
Running it on [if-abc] 12345 [if-def] abcdef [/if][/if]
Results in a first match as: [if-abc] 12345
Your groups are: abc and 12345
And modifying even further:
/\[if-([a-z0-9-]+)\](.+?)(?=(?:\[\/?if))/s
matches both groups. Although the delimiter [/if] is not captured by either of these.
NOTE: Instead of matching the delimeters I used a lookahead ((?=)) in the regex to stop when the text ahead matches the lookahead.
Use a period to match any character.