How to match required characters in random order using regular expression?

I need to match text which has #, #, and any number in it. The characters can be in random position as long as they are in the text. Given this input:
abc##d9
a9b#c#d
##abc#9
abc9d##
a#b#c#d
The regex should match the first 3 lines. Currently my regex is:
/#.*?#.*?[0-9]/
This doesn't work because it only matches the three characters in that exact sequence. How can I match the three characters in any order?

Here is one such regex, ugly as it is, if you really must use one:
/(?=.*#)(?=.*#)(?=.*[0-9]).*/
http://jsfiddle.net/BP53f/2/
The regex relies on a feature called lookahead:
http://www.regular-expressions.info/lookaround.html
A simple case from the link above is matching a q that is followed by a u, written q(?=u). That is why it is called lookahead: it matches the q only if a u lies ahead, without making the u part of the match.
Let's take one of your valid case: a9b#c#d
The first lookahead is (?=.*#), which states: match anything, followed by a #. It succeeds, with .* consuming a9b#c before the final #. Since whatever a lookahead matches must be discarded, the engine steps back to the start of the string, which is the a. It then evaluates the next lookahead from that same position, and so on for (?=.*[0-9]). The difference between using lookaheads and a sequence like (a)(b)(c) is basically this stepping back: every lookahead is tested from the same starting position, so the order of the characters in the text does not matter.
From the link above:
Let's take one more look inside, to make sure you understand the
implications of the lookahead. Let's apply q(?=u)i to quit. I have
made the lookahead positive, and put a token after it. Again, q
matches q and u matches u. Again, the match from the lookahead must be
discarded, so the engine steps back from i in the string to u. The
lookahead was successful, so the engine continues with i. But i cannot
match u. So this match attempt fails. All remaining attempts will fail
as well, because there are no more q's in the string.
It is ugly because it is difficult to maintain: you basically have three separate sub-regexes, one inside each pair of parentheses.
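A minimal PHP sketch of the lookahead idea. The first pattern is the one quoted above. The second pattern is my own variant, not from the answer: it assumes the requirement really is "at least two # characters plus one digit", so both # characters go into a single lookahead. The sample '9#abc' is also made up to show the difference.

```php
<?php
// Pattern as posted: every lookahead is tested from the same position,
// so two identical (?=.*#) checks only prove that one # exists.
$asPosted = '/(?=.*#)(?=.*#)(?=.*[0-9]).*/';

// Assumed variant: one lookahead that needs two # characters, plus one for a digit.
$twoHashes = '/^(?=.*#.*#)(?=.*[0-9])/';

$samples = ['a9b#c#d', 'a#b#c#d', '9#abc'];
$results = [];
foreach ($samples as $sample) {
    $results[$sample] = [
        preg_match($asPosted, $sample) === 1,
        preg_match($twoHashes, $sample) === 1,
    ];
}
var_dump($results);
```

With these samples, 'a9b#c#d' passes both patterns, 'a#b#c#d' passes neither (no digit), and '9#abc' passes only the posted pattern because it contains a single #.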

Use separate expressions to make sure # and # are present. Once they are, remove them and match for the rest of the characters/digits.
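One way to read this suggestion, sketched in PHP. The function name hasTwoHashesAndDigit is made up for illustration, and the sketch assumes that "# and #" means at least two # characters. It counts instead of removing anything.

```php
<?php
// Hypothetical helper: two independent checks instead of one combined regex.
function hasTwoHashesAndDigit(string $text): bool
{
    // substr_count() counts non-overlapping occurrences of '#'.
    $enoughHashes = substr_count($text, '#') >= 2;
    // A separate, trivial regex checks for any digit.
    $hasDigit = preg_match('/[0-9]/', $text) === 1;

    return $enoughHashes && $hasDigit;
}

var_dump(hasTwoHashesAndDigit('a9b#c#d')); // two # and a digit
var_dump(hasTwoHashesAndDigit('a#b#c#d')); // no digit
```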

Decided I better write this as an answer:
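$text = "a9b#c#d";
$themAll = "##";          // every character listed here must occur in $text
$themAny = "0123456789";  // at least one of these must occur in $text
// strspn() counts the leading characters of $themAll that also appear in $text.
// strpbrk() returns a string or false, so compare with false: a result of "0" is falsy.
var_dump(strspn($themAll, $text) == strlen($themAll) && strpbrk($text, $themAny) !== false);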
For maintenance and some (limited) extension this should be as easy as it gets, especially with longer $themAll lists.

Related

PHP PCRE regex with multiple SKIP FAIL in a pattern

I have a simple string:
$string = '--#--%--%2B--';
I want to percent-encode all characters (including the "lonely" %), except the - character and the triplets of the form %xy. So I wrote the following pattern alternatives:
$pattern1 = '/(?:[\-]+|%[A-Fa-f0-9]{2})(*SKIP)(*FAIL)|./us';
$pattern2 = '/(?:[\-]+)(*SKIP)(*FAIL)|(?:%[A-Fa-f0-9]{2})(*SKIP)(*FAIL)|./us';
Please notice the use of (multiple) (*SKIP)(*FAIL) and of (?:).
The result of matching and replacing is the same - and the correct one too:
--%23--%25--%2B--
I would like to ask:
Are the two patterns equivalent? If not, which one would be the proper one to use for URL-encoding? Could you please explain in a few words why?
Would you suggest other alternatives (implying backtracking control verbs), or are my patterns a good choice?
Can I apply only one (?:) around the whole (chosen) pattern, even if the (multiple) (*SKIP)(*FAIL) will be inside it?
I know that I request a little too much from you by asking more questions at once. Please accept my apology! Thank you very much.
P.S: I've tested with the following PHP code:
$result = preg_replace_callback($patternX, function($matches) {
    return rawurlencode($matches[0]);
}, $string);
echo $result;

First of all, both the patterns leverage the SKIP-FAIL PCRE verb sequence that is quite a well-known "trick" to match some text and skip it. See How do (*SKIP) or (*F) work on regex? for some more details.
The two patterns yield the same results. (?:[\-]+|%[A-Fa-f0-9]{2})(*SKIP)(*FAIL) matches either [\-]+ or %[A-Fa-f0-9]{2} and then skips the match, while (?:[\-]+)(*SKIP)(*FAIL)|(?:%[A-Fa-f0-9]{2})(*SKIP)(*FAIL) first tries to match [\-]+ and skips it if found, and then tries to match %[A-Fa-f0-9]{2} and skips that match if it is found. The (?:...) non-capturing groups in the second pattern are redundant, as there is no alternation inside them and the groups are not quantified. You may use any number of (*SKIP)(*FAIL) in your pattern; just make sure you place them before the | to skip the relevant match.
The SKIP-FAIL technique is used when you want to match some text only in a specific context: when a char should be skipped ("avoided") if it is preceded and followed by certain chars, or when you need to avoid matching a whole sequence of chars, as in this scenario. So SKIP-FAIL is a good fit here.
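To check the equivalence on the sample string from the question, here is a short sketch that runs both patterns through the same callback and collects the results. The variable names $patterns and $results are mine.

```php
<?php
$string = '--#--%--%2B--';

$patterns = [
    '/(?:[\-]+|%[A-Fa-f0-9]{2})(*SKIP)(*FAIL)|./us',
    '/(?:[\-]+)(*SKIP)(*FAIL)|(?:%[A-Fa-f0-9]{2})(*SKIP)(*FAIL)|./us',
];

$results = [];
foreach ($patterns as $pattern) {
    // Runs of - and %xy triplets are skipped; every other single
    // character reaches the callback and gets percent-encoded.
    $results[] = preg_replace_callback($pattern, function (array $matches): string {
        return rawurlencode($matches[0]);
    }, $string);
}

print_r($results);
```

Both entries come out as --%23--%25--%2B--, which agrees with the result reported in the question.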

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.

A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.
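Applied to the example sentence from the question, the call looks like this. This is only a quick sketch; the character class is the one from this answer and may not cover every valid Reddit name.

```php
<?php
$string = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// \1 in the replacement refers to the first capturing group, i.e. the whole /u/... or /r/... token.
$linked = preg_replace(
    '#(/(?:u|r)/[a-zA-Z0-9_-]+)#',
    '[\1](http://reddit.com\1)',
    $string
);

echo $linked, "\n";
```

All three mentions are replaced in one call, because preg_replace substitutes every match by default.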

You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, which would match those exactly. This can be simplified to match one or the other with /(?:u|r)/, which matches the same text as those two patterns. Next you want to match everything from that point up to a space. You would use a negated character class, [^ ], which matches any character that is not a space, and apply a quantifier, *, to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure your match is preceded by a space so you're not matching part of a longer word, which results in (?<= )/(?:u|r)/[^ ]*. You wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the contents within the parentheses so they can be used in a replacement pattern. You can express your chosen replacement using the \1 reference to the first captured group as [\1](http://reddit.com\1).
In PHP you would pass the matching pattern, the replacement pattern, and the subject string to the preg_replace function.
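A sketch of that call, using # as the delimiter (my choice) so the slashes in the pattern need no escaping. The second sample string is made up to show one consequence of the lookbehind: a mention at the very start of the string has no space before it and is left alone.

```php
<?php
$pattern = '#((?<= )/(?:u|r)/[^ ]*)#';
$replacement = '[\1](http://reddit.com\1)';

$a = preg_replace($pattern, $replacement, 'Contact /u/someone or visit /r/subreddit');
$b = preg_replace($pattern, $replacement, '/r/first and /r/second');

echo $a, "\n";
echo $b, "\n";
```

If mentions at the start of the string should be linked too, the lookbehind would need to be loosened, for example to (?<![^ ]), which is an adjustment of mine and not part of this answer.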

In my opinion regex would be overkill for such a simple operation. If you just want to replace instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you should use str_replace, although preg_replace would need less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace operations. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

Why does this regex fail to work...any ideas?

I am faced with strings as follows:
start of line;
characters C, M, P, T, K, X, or Q;
3 more word characters;
any number of other characters except newline;
space;
possible M literal;
2 digits;
/;
possible M literal;
2 or 3 digits;
space.
I am nearly certain I have translated this into the following regex correctly but this line of PHP code still returns NULL when passed valid strings. Furthermore, when I test this regex with regexpal and the identical subject string, the correct result is returned. I'm pretty sure I'm having a problem with the pattern delimiter or the first 2 groups (start of line then character check). Any ideas? - Brandon
preg_match_all('&^(\C|\M|\P|\T|\K|\X|\Q)[A-Z0-9]{3}.*\sM?[0-9]{2}/M?[0-9]{2,3}\s&', $subject, $resultArr);

First, I would typically suggest using a more common pattern delimiter such as /, #, or ~. I personally would not actually use / here since you use that in the pattern. This is just preference though, & is totally valid.
Second, there is no need for backslashes along with the characters at the start of the line (you can also use a character class for these, which I find more readable). As shown, some of these do form valid escape sequences, so you are likely getting unpredictable behavior.
Third, I am guessing you want an ungreedy search (the U pattern modifier after the pattern). I find that in most cases this is the desired behavior when .* appears somewhere in a pattern. Since you are using preg_match_all(), a greedy search is particularly problematic here: it would pair the first place where the beginning of your pattern matches with the last place where the end of your pattern matches, lumping all other potential matches into the .* portion of the pattern.
So this leaves us with something like this:
$pattern = '#^[CMPTKXQ][A-Z0-9]{3}.*\sM?[0-9]{2}/M?[0-9]{2,3}\s#U';
preg_match_all($pattern, $subject, $resultArr);
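A quick way to try the corrected pattern. The subject line below is a made-up sample that follows the format described in the question; it is not taken from the original post.

```php
<?php
$pattern = '#^[CMPTKXQ][A-Z0-9]{3}.*\sM?[0-9]{2}/M?[0-9]{2,3}\s#U';

// Hypothetical input: K + 3 word characters, filler, space, M05, /, M12, space.
$subject = 'KJFK 101853Z 10SM M05/M12 A3012';

$count = preg_match_all($pattern, $subject, $resultArr);

var_dump($count);      // number of full matches found
print_r($resultArr);   // $resultArr[0] holds the matched text
```

Note that without the m modifier, ^ only matches at the very start of $subject, so a multi-line subject would yield at most one match.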

preg_replace with exceptions doesn't work for me

I got a "little" problem already...
I only want to replace some names with other names or something.
It works fine so far, but with some names I got a problem.
For example I want to replace "Cho" with "Cho'Gath",
but of course I don't want to replace "Cho'Gath" with "Cho'Gath'Gath".
So I created this regular expression, which replaces every "Cho" except the one in "Cho'Gath":
/\bCho(?!.+Gath)\b/i
This works and it doesn't replace "Cho'Gath", but it also doesn't replace "Cho Hello World Gath" ... that is my first problem!
The second one is the following: I also want to replace all "Yi", but not "Master Yi", so I tried the same with the following regular expression:
/\b(?!Master.+)Yi\b/i
This doesn't replace "Master Yi", okay. But it also doesn't replace "Yi", and it should! (I also tried /\b(?!Master(\s))Yi\b/i but this doesn't work either.)
So far I don't know what to do now... can anyone help me with that?

Your first problem is easily solved if you replace .+ with the actual character that you want to match (or not to match): ', but let's have a look at the second one, this is quite interesting:
I also want to replace all "Yi", but not "Master Yi", so I tried the
same with the following regular expression:
/\b(?!Master.+)Yi\b/i
This is a negative lookahead on \b. The expression does match a single "Yi", but look what it does with "Master Yi":
Hello I am Master Yi
                  ^
                  \b
This boundary is not followed by "Master" but followed by "Yi". So your expression also matches the "Yi" in this string.
The negative lookahead is quite pointless because it checks if the boundary that is directly followed by "Yi" (remember that a lookahead assertion just "looks ahead" without moving the pointer forward) is not directly followed by "Master". This is always the case.
You could use a lookbehind assertion instead, but only without the (anyways unnecessary) .+, because lookbehind assertions must have fixed lengths:
/\b(?<!Master )Yi\b/i
matches every "Yi" that is not preceded by "Master ".
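Both fixes can be tried with a small script. The sample sentences are made up, and the first pattern follows the advice above of replacing .+ with a literal apostrophe.

```php
<?php
// Literal apostrophe in the lookahead instead of the greedy .+
$choPattern = '/\bCho(?!\'Gath)\b/i';
// Fixed-length negative lookbehind instead of a lookahead
$yiPattern = '/\b(?<!Master )Yi\b/i';

$cho = preg_replace($choPattern, "Cho'Gath", "Cho, Cho'Gath and Cho Hello World Gath");
$yi  = preg_replace($yiPattern, 'Master Yi', 'Yi met Master Yi');

echo $cho, "\n";
echo $yi, "\n";
```

The existing "Cho'Gath" and "Master Yi" are left untouched, while the bare "Cho" and "Yi" are expanded.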

For the first regex:
\bCho(?!.Gath)\b
For the second:
(?<!\bMaster )Yi\b
Your first regex had .+ in it, that is, any character, one or more times. As quantifiers are greedy by default, this swallows the whole input before reluctantly giving characters back to match the next token (G).
Your second regex used a negative lookahead, what you wanted was a negative lookbehind. That is, a position where the text before that position does not match.
And note that regexes in lookbehinds must be of finite length.

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create regex to match ugly abbreviations and numbers. These can be one of following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to match the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?

In your regex you say "start of string, followed by maybe a -, followed by at least one digit, or by 0 or more digits followed by a dot and at least one digit, followed by the end of string".
So your regex could match, for example, 4.5, -.1 etc. This is exactly what you tell it to do.
Your test input string does not match since there are other characters present after the number 1.1, and even if it somehow magically matched, your "double" matching regex is wrong.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
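To make this concrete, here is one possible way to fill in the placeholder, as a sketch only. It assumes "[2-3 length of any alphabet char]" means two or three ASCII letters, adds the asker's first format (one letter plus one digit) as an alternative, and anchors only at the start so the rest of the title may follow. The function name leadingCode is hypothetical.

```php
<?php
// Assumption: the "ugly" prefix is either letter+digit (A1) or double, whitespace, 2-3 letters (1.1 AA).
$pattern = '/^(?:[A-Za-z][0-9]|[-+]?[0-9]+(?:\.[0-9]+)?\s[A-Za-z]{2,3})\b/';

// Hypothetical helper: returns the matched prefix, or null when the title has none.
function leadingCode(string $title, string $pattern): ?string
{
    return preg_match($pattern, $title, $matches) === 1 ? $matches[0] : null;
}

var_dump(leadingCode('1.1 AA My test title', $pattern));
var_dump(leadingCode('A1 Title', $pattern));
var_dump(leadingCode('My test title', $pattern));
```

The trailing \b keeps a longer word from being mistaken for a two- or three-letter abbreviation.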

I see multiple issues with your regex:
You try to match the whole string (as a number) by the anchors: ^ at the beginning and $ at the end. If you don't want that, remove those.
The number group is non-capturing. It will be checked for matches, but those won't be added to $matches as a separate entry. That's because of the ?: you set in (?:...). Remove ?: to make that group capturing.
You place the shorter digit pattern before the longer one. If you swap the order, the regex engine will try the longer one first and, on success, prefer it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
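The effect of the alternation order can be seen by running both orders, without anchors, against the example title. The variable names are mine.

```php
<?php
$source = '1.1 AA My test title';

// Shorter alternative first: \d+ succeeds on "1" and the engine stops there.
preg_match('/-?(\d+|\d*\.\d+)/', $source, $shortFirst);

// Longer alternative first: \d*\.\d+ gets the chance to match "1.1".
preg_match('/-?(\d*\.\d+|\d+)/', $source, $longFirst);

echo $shortFirst[0], "\n";
echo $longFirst[0], "\n";
```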
