I have this text:
SU4R
C45G
G3HD
61U14XE7AR23 914K16W471LV V6SQ5V16LG91 24YL4HW956C3 UZ26J12K615V T741MH4N739W 31ST445G726H 621EH6VW7Q6M 55N629WJ945P 56TX2W6LC949 44DS765CF739 XC262HV1JZ6V 26YD4N1Y71F7 S4M3F1XeDC0D
I want to use preg_match to find a specific type of code in that text. A code should contain:
4 or 12 characters
it should return all elements
case-insensitive
letters and numbers
I've ended up with this:
preg_match("/^(?=.*\d)(?=.*[A-Za-z])[0-9A-Za-z!##$%]{4,12}/", $input_line, $output_array);
But:
it only works at http://www.phpliveregex.com/p/byZ
it finds only the first 4 elements
Why does https://www.functions-online.com/preg_match.html show only 1 match when I try it there?
Using preg_match_all(), something like this probably works
http://www.phpliveregex.com/p/bz0
# '/(?<!\S)(?i:[a-z\d]{4}|[a-z\d]{12})(?!\S)/'
(?<! \S ) # whitespace boundary
(?i: # case insensitive cluster group
[a-z\d]{4} # 4 alnum
| # or
[a-z\d]{12} # 12 alnum
)
(?! \S ) # whitespace boundary
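A minimal sketch of how the pattern above could be used from PHP. The sample text is an assumed, shortened subset of the question's input, with one 5-character token added to show a rejection. It also illustrates why preg_match() reports a single match: it stops after the first hit, while preg_match_all() keeps scanning.

```php
<?php
// Assumed sample: a few codes from the question plus "ABCDE" (5 chars, should be rejected).
$text = "SU4R\nC45G\n61U14XE7AR23 S4M3F1XeDC0D ABCDE";

// Whitespace-bounded run of exactly 4 or exactly 12 alphanumerics, case-insensitive.
$pattern = '/(?<!\S)(?i:[a-z\d]{4}|[a-z\d]{12})(?!\S)/';

// preg_match_all() collects every match; preg_match() would return only the first.
$count = preg_match_all($pattern, $text, $matches);

print_r($matches[0]); // SU4R, C45G, 61U14XE7AR23, S4M3F1XeDC0D
```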
Related
I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
This matches the following data in the following cases. The only one that does not match is the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks
One option could be using lookaheads to assert 3 digits, not 2 backslashes and not 2 times a hyphen.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert what is on the left is not a non whitespace char
(?=(?:[^\d\s]*\d){3}) Assert that what is on the right contains 3 digits, each optionally preceded by chars that are neither digits nor whitespace
(?!(?:[^\s-]*-){2}) Assert that what is on the right does not contain 2 hyphens before the next whitespace char
(?!(?:[^\s\\]*\\){2}) Assert that what is on the right does not contain 2 backslashes before the next whitespace char
[A-Z0-9/\\-]+ Match any of the listed 1+ times
(?!\S) Assert what is on the right is not a non whitespace char
Your patterns can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Remark
This solution assumes that each string tested resides on its own line, i.e. the strings are not merely separated by whitespace.
In case the strings are separated by whitespace, choose the solution of user @TheFourthBird (which is essentially the same as this one but caters for the whitespace separation).
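A short sketch of how the anchored lookahead pattern from this answer might be checked in PHP, one candidate string at a time. The candidate list is a subset of the question's data; the variable names are illustrative only.

```php
<?php
// Pattern from the answer, with ~ as delimiter.
$pattern = '~^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$~';

// Subset of the question's samples: the last two should be rejected.
$candidates = ['02ABU-D9435', '013DFC', '03SDFA9433/', 'O4AGFC4430', '01/01/2018', 'AA-AA'];

// preg_grep() keeps the entries that match; array_values() reindexes the result.
$accepted = array_values(preg_grep($pattern, $candidates));

print_r($accepted); // 02ABU-D9435, 013DFC, 03SDFA9433/, O4AGFC4430
```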
You can test the condition for both the hyphen and the slash into a same lookahead using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
Detailed:
~ # using the tilde as the pattern delimiter avoids escaping the slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's no more than one hyphen and one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # any remaining characters until the end of the string
\z # end of the string.
~
To better understand this (if you are not familiar with lookaheads): these three subpatterns all start from the same position (in this case the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the first two are zero-width assertions: they are simple tests and don't consume any characters.
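As a rough illustration of the backreference approach, the sketch below runs the pattern against a few strings. Some come from the question; '1-2/3' and '1-2-345' are made-up inputs added to show that one hyphen plus one slash is accepted while a repeated hyphen is not.

```php
<?php
// Pattern from the answer: the lookahead rejects any repeated '-' or repeated '/'.
$pattern = '~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~';

$candidates = ['02ABU-D9435', '03SDFA9433/1', '1-2/3', '01/01/2018', '1-2-345', 'AA-AA'];

// Keep only the strings the pattern accepts.
$accepted = array_values(preg_grep($pattern, $candidates));

print_r($accepted); // 02ABU-D9435, 03SDFA9433/1, 1-2/3
```

Note that, unlike the previous answer's pattern, this one also restricts the allowed characters to uppercase letters, digits, '/' and '-'.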
Example text:
There is an unique news in itlogic.com. I was read it when Mrs.leafa is cooking.
I want to get output like this:
Array (
[0] There is an unique news in itlogic.com.
[1] I was read it when Mrs.leafa is cooking.
)
If I use explode() with '.' as the first parameter, itlogic.com and Mrs.leafa are separated.
I think preg_split is a good tool for this as there may or may not be a space after the dot, right?
$array = preg_split("/\.(?=\s|$)/m", $Text);
Explanation:
\. Match a period
(?=\s|$) Then assert a whitespace character or end of line afterwards
See here: Click on preg_split, http://www.phpliveregex.com/p/kdz
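A small sketch of the split in action, using the example text from the question. Because the matched period is consumed by preg_split(), the sketch re-appends it afterwards; that step and the variable names are assumptions for illustration, not part of the answer.

```php
<?php
$text = 'There is an unique news in itlogic.com. I was read it when Mrs.leafa is cooking.';

// Split on a period only when whitespace or end of line follows it.
// PREG_SPLIT_NO_EMPTY drops the empty piece after the final period.
$parts = preg_split('/\.(?=\s|$)/m', $text, -1, PREG_SPLIT_NO_EMPTY);

// Trim the leading space of later sentences and put the consumed period back.
$sentences = array_map(function ($part) {
    return trim($part) . '.';
}, $parts);

print_r($sentences);
```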
Update #2
Regex:
(?(DEFINE) # Construct a definition structure
(?<punc>[!?.]+) # Define `punc` group consisting of `.`, `?` and `!`
) # End of definition
\b # Match a word boundary position
(?> # Open an atomic (non-capturing) group (a)
[a-z0-9] # Match a digit or a lower case letter
\w* # And any number of word characters
| # Or
[A-Z] # Match an upper case letter
\w{3,} # And 3 or more word characters
(?= # Followed by
(?&punc) # One or more `.`, `?` and `!` characters
) # End of positive lookahead
) # End of grouping (a)
(?&punc) # Match one or more `.`, `?` and `!` characters
\K\B\s* # Reset the match start, assert a non-word-boundary position, then match any amount of whitespace
PHP code:
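// $RE is assumed to hold the regex above, wrapped in delimiters and used with the x (extended) flag.
$str = 'There is an unique news in itlogic.com. I was read it when Mrs. leafa is cooking.';
print_r(preg_split($RE, $str, -1, PREG_SPLIT_NO_EMPTY));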
Outputs:
Array
(
[0] => There is an unique news in itlogic.com.
[1] => I was read it when Mrs. leafa is cooking.
)
Try this:
$s= explode('. ',$your_sentence);
I'm trying to write a regex pattern to parse a string with similar tags (3 chars), where those tags are repeated in the string
ABC=TEXT 1 - HERE.. DEF=/TEXT 2: TEXT .. ZYX=TEXT 3 TEXT
When I use
#([A-Z]{3})=(.*)+#isU
I only get the tags ABC, DEF, ... but not the content. How can I get both?
I would like to get the result as pairs of tags and content
ABC
TEXT 1 - HERE..
DEF
/TEXT 2: TEXT ..
ZYX
TEXT 3 TEXT
Update: See my example at https://regex101.com/r/uI0fW4/1
You need to use a positive lookahead assertion.
([A-Z]{3})=(.*?)(?=[A-Z]{3}=|$)
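A minimal sketch of how the lookahead pattern could be used to build tag/content pairs in PHP. The trim() call and the $pairs array are illustrative additions, since the lazy group also captures the space before the next tag.

```php
<?php
$text = 'ABC=TEXT 1 - HERE.. DEF=/TEXT 2: TEXT .. ZYX=TEXT 3 TEXT';

// Lazy content, stopped by a lookahead for the next "XXX=" tag or the end of the string.
$pattern = '/([A-Z]{3})=(.*?)(?=[A-Z]{3}=|$)/';

// PREG_SET_ORDER gives one array per match: [full match, tag, content].
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[$match[1]] = trim($match[2]);
}

print_r($pairs);
```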
This ([A-Z]{3})=(.*)+ regex, specifically
this subexpression
(.*)+
tells the engine to overwrite capture group 2 as many times as it can.
On the last write, .* matched nothing because it can match nothing.
Thus that capture group is empty.
You could use this instead to get data in capture group 2.
# (\b[A-Z]{3})=((?:(?!\b[A-Z]{3}=).)*)
( \b [A-Z]{3} ) # (1)
=
( # (2 start)
(?:
(?! \b [A-Z]{3} = )
.
)*
) # (2 end)
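For comparison, a short sketch of this pattern applied to the question's sample string. Each character of the content is consumed only if a new tag does not start at that position; the trimming step is an assumption added for readability.

```php
<?php
$text = 'ABC=TEXT 1 - HERE.. DEF=/TEXT 2: TEXT .. ZYX=TEXT 3 TEXT';

// Group 1: the tag. Group 2: every following character that does not begin a new "XXX=" tag.
$pattern = '/(\b[A-Z]{3})=((?:(?!\b[A-Z]{3}=).)*)/';

preg_match_all($pattern, $text, $matches);

$tags = $matches[1];
$contents = array_map('trim', $matches[2]);

print_r($tags);
print_r($contents);
```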
I have an initial string of words, like:
abab sbs abc ffuuu qwerty uii onnl ghj
And I would like to be able to extract only the words that do not contain adjacently-repeating characters, so that the above string is returned as:
abc qwerty ghj
How to accomplish this task using Regular Expressions?
I guess the post is open again after a little rewording of the question.
This is moved from the comments, to the answer region.
A while ago I saw this style of problem in a question about no duplicate characters
that encompassed the entire string. I just translated it to word boundaries.
@Michael J Mulligan did a test case for it (see comments).
The links:
'Working Regex test (regex101.com/r/bA2wB0/1) and a working PHP example (ideone.com/7ID8Ct)'
# For NO duplicate letters anywhere within word characters
# -----------------------------------------------------------
# \b(?!\w*(\w)\w*\1)\w+
\b # Word boundary
# Only word chars now
(?! # Lookahead assertion (like a true/false conditional)
# It doesn't matter if the assertion is negative or positive.
# In this section, the engine is forced to match if it can,
# it has no choice, it can't backtrack its way out of here.
\w*
( \w ) # (1), Pick a word char, any word char
\w*
\1 # Now it is here again
# Ok, the expression matched, time to check if the assertion is correct.
) # End assertion
\w+ # It's here now, looks like the assertion let us through
# The assert is that no duplicate word chars ahead,
# so free to match word chars 'en masse'
# For ONLY duplicate letters anywhere within word characters
# just do the inverse. In this case, the inverse is changing
# the lookahead assertion to positive (want duplicates).
# -----------------------------------------------------------
# \b(?=\w*(\w)\w*\1)\w+
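Both variants can be tried on the question's input with a sketch like the one below. The variable names are illustrative; the patterns are the two shown in the comments above.

```php
<?php
$text = 'abab sbs abc ffuuu qwerty uii onnl ghj';

// Negative lookahead: words in which no character occurs twice.
preg_match_all('/\b(?!\w*(\w)\w*\1)\w+/', $text, $unique);

// Positive lookahead: words in which some character occurs at least twice.
preg_match_all('/\b(?=\w*(\w)\w*\1)\w+/', $text, $repeated);

$uniqueWords = $unique[0];     // abc, qwerty, ghj
$repeatedWords = $repeated[0]; // abab, sbs, ffuuu, uii, onnl

echo implode(' ', $uniqueWords), "\n";
echo implode(' ', $repeatedWords), "\n";
```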
I have a string that looks like this:
[if-abc] 12345 [if-def] 67890 [/if][/if]
I have the following regex:
/\[if-([a-z0-9-]*)\]([^\[if]*?)\[\/if\]/s
This matches the inner brackets just like I want it to. However, when I replace the 67890 with text (ie. abcdef), it doesn't match it.
[if-abc] 12345 [if-def] abcdef [/if][/if]
I want to be able to match ANY characters, including line breaks, except for another opening bracket [if-.
This part doesn't work like you think it does:
[^\[if]
This will match a single character that is none of [, i or f, regardless of the combination. You can mimic the desired behavior using a negative lookahead though:
~\[if-([a-z0-9-]*)\]((?:(?!\[/?if).)*)\[/if\]~s
I've also included closing tags in the lookahead, as this avoids the ungreedy repetition (which is usually worse performance-wise). Plus, I've changed the delimiters, so that you don't have to escape the slash in the pattern.
So this is the interesting part ((?:(?!\[/?if).)*) explained:
( # capture the contents of the tag-pair
(?: # start a non-capturing group (the ?: is just a performance
# optimization). this group represents a single "allowed" character
(?! # negative lookahead - makes sure that the next character does not mark
# the start of either [if or [/if (the negative lookahead will cause
# the entire pattern to fail if its contents match)
\[/?if
# match [if or [/if
) # end of lookahead
. # consume/match any single character
)* # end of group - repeat 0 or more times
) # end of capturing group
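A small sketch of the full pattern applied to the problematic input from the question. Since the lookahead refuses to cross any [if or [/if, the outer [if-abc] cannot match and the innermost pair is found instead.

```php
<?php
$text = '[if-abc] 12345 [if-def] abcdef [/if][/if]';

// Content is consumed one character at a time, but never across "[if" or "[/if".
$pattern = '~\[if-([a-z0-9-]*)\]((?:(?!\[/?if).)*)\[/if\]~s';

$found = preg_match($pattern, $text, $match);

echo $match[0], "\n";       // [if-def] abcdef [/if]
echo $match[1], "\n";       // def
echo trim($match[2]), "\n"; // abcdef
```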
Modifying a little results in:
/\[if-([a-z0-9-]+)\](.+?)(?=\[if)/s
Running it on [if-abc] 12345 [if-def] abcdef [/if][/if]
Results in a first match as: [if-abc] 12345
Your groups are: abc and 12345
And modifying even further:
/\[if-([a-z0-9-]+)\](.+?)(?=(?:\[\/?if))/s
matches both tags, although the delimiter [/if] is not captured by either of these patterns.
NOTE: Instead of matching the delimiters I used a lookahead ((?=)) in the regex to stop when the text ahead matches the lookahead.
Use a period to match any character.
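A brief sketch of the second lookahead-based pattern with preg_match_all(), to show that it yields one match per opening tag. Collecting the results into a $pairs array and trimming the content are assumptions made for the example.

```php
<?php
$text = '[if-abc] 12345 [if-def] abcdef [/if][/if]';

// Lazy content that stops right before the next "[if" or "[/if".
$pattern = '/\[if-([a-z0-9-]+)\](.+?)(?=(?:\[\/?if))/s';

preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[$match[1]] = trim($match[2]);
}

print_r($pairs);
```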