preg_replace with exceptions doesn't work for me

I've already got a "little" problem...
I only want to replace some names with other names.
It works fine so far, but some names cause a problem.
For example, I want to replace "Cho" with "Cho'Gath",
but of course I don't want to replace "Cho'Gath" with "Cho'Gath'Gath".
So I created this regular expression to replace every "Cho" except "Cho'Gath":
/\bCho(?!.+Gath)\b/i
This works and doesn't replace "Cho'Gath", but it also doesn't replace "Cho Hello World Gath"... that is my first problem!
The second one is the following: I also want to replace every "Yi", but not "Master Yi", so I tried the same with the following regular expression:
/\b(?!Master.+)Yi\b/i
This doesn't replace "Master Yi", okay. But it also doesn't replace "Yi", which it should! (I also tried /\b(?!Master(\s))Yi\b/i but this doesn't work either.)
So far I don't know what to do now... can anyone help me with that?

Your first problem is easily solved if you replace .+ with the actual character that you want to match (or not match), the apostrophe ('). But let's have a look at the second one, which is quite interesting:
I also want to replace all "Yi", but not "Master Yi", so I tried the
same with the following regular expression:
/\b(?!Master.+)Yi\b/i
This is a negative lookahead on \b. The expression does match a single "Yi", but look what it does with "Master Yi":
Hello I am Master Yi
                  ^
                  \b
This boundary is not followed by "Master" but by "Yi". So your expression also matches the "Yi" in this string.
The negative lookahead is pointless here: it checks whether the boundary that is directly followed by "Yi" is not directly followed by "Master" (remember that a lookahead assertion just "looks ahead" without moving the pointer forward). That is always the case.
You could use a lookbehind assertion instead, but only without the .+ (which is unnecessary anyway), because lookbehind assertions must have a fixed length:
/\b(?<!Master )Yi\b/i
matches every "Yi" that is not preceded by "Master ".
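A minimal sketch comparing the two patterns on one sample sentence. The sample text and the replacement "Master Yi" are assumptions for illustration; the question does not say what "Yi" should be replaced with.

```php
<?php
// Sample text and replacement are made up for this sketch.
$text = 'Hello I am Master Yi, but call me Yi.';

// Lookahead version from the question: the assertion is checked at the
// boundary directly before "Yi", where "Master" can never follow.
$lookahead = preg_replace('/\b(?!Master.+)Yi\b/i', 'Master Yi', $text);

// Lookbehind version: rejects a "Yi" that is preceded by "Master ".
$lookbehind = preg_replace('/\b(?<!Master )Yi\b/i', 'Master Yi', $text);

echo $lookahead, "\n";  // Hello I am Master Master Yi, but call me Master Yi.
echo $lookbehind, "\n"; // Hello I am Master Yi, but call me Master Yi.
```

The lookahead version touches both occurrences, while the lookbehind version leaves the existing "Master Yi" alone.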

For the first regex:
\bCho(?!.Gath)\b
For the second:
(?<!\bMaster )Yi\b
Your first regex had .+ in it, that is any character, one or more times. As quantifiers are greedy by default, this swallows the rest of the input and then backtracks, giving characters back one at a time until the next token (G) can match. So any "Gath" later in the input makes the lookahead succeed.
Your second regex used a negative lookahead; what you wanted was a negative lookbehind. That is, a position where the text before that position does not match.
And note that regexes in lookbehinds must be of finite length.
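To illustrate the first fix, here is a short sketch that runs the original pattern and a variant with the literal apostrophe on the same input. The sample string is an assumption built from the names in the question.

```php
<?php
// Sample input is made up from the examples in the question.
$text = "Cho, Cho'Gath and Cho Hello World Gath";

// Original pattern: .+Gath finds a "Gath" somewhere later in the string
// for every "Cho", so the negative lookahead always fails.
$original = preg_replace('/\bCho(?!.+Gath)\b/i', "Cho'Gath", $text);

// Variant with the literal apostrophe: only "Cho'Gath" itself is skipped.
$fixed = preg_replace('/\bCho(?!\'Gath)\b/i', "Cho'Gath", $text);

echo $original, "\n"; // Cho, Cho'Gath and Cho Hello World Gath
echo $fixed, "\n";    // Cho'Gath, Cho'Gath and Cho'Gath Hello World Gath
```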

Related

NOT words in Regex Pattern

I am trying to grab the text after the first hyphen in a pattern
<title>.*?-(.*?)(-|<\/title>)
which then grabs DesiredText from the pattern below:
<title>Stuff - DesiredText - Other Stuff</title>
However in this pattern:
<title>Stuff - Unwanted - DesiredText - Otherstuff</title>
I want it to skip the 'Unwanted' text and match the text after the next hyphen instead (DesiredText). I made a regex101 with both patterns and need to modify my basic regex so that if a word or words I don't want to match are present in that capture group it then matches the second hyphen text instead:
https://regex101.com/r/veSqH3/1

I believe this is what you are looking for. The key is the caret (^) at the start of a square-bracket character class ([]). Together they form a negated character class, which only matches characters that are NOT in the list.
https://regex101.com/r/alAZhj/3
Pattern: <title>.*?-\s*([^-\s]*)\s*- End<\/title>
This matches anything in between the middle hyphens that is not a hyphen or space. You can of course modify the pattern to include such characters by using the following pattern.
Pattern: <title>.*?-\s*([^-]*)\s*- End<\/title>
This will match anything in between the middle hyphens that is not a hyphen, so that you can have less restricted text in there.

This will use a negative lookahead to disqualify Note. There may be ways to optimize the pattern, but I cannot do so with confidence because I don't know how variable your inputs strings are.
Pattern: /<title>.*?- (?P<title>(?!Note).*?)(?= -|<)/
Demo
I am using a positive lookahead to ensure the captured match doesn't have any unwanted trailing characters.
If you just want the second last delimited value, you could do something like this to return the value as the fullstring match:
~- \K[^-]*(?= - [^-]*?</title>)~
Or faster with a capture group:
~- ([^-]*) - [^-]*?</title>~
This assumes there are no hyphens in the value.
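As a rough check, the two "second last value" patterns can be run against the two titles from the question. The variable names are only illustrative.

```php
<?php
// The two sample titles from the question.
$titles = [
    '<title>Stuff - DesiredText - Other Stuff</title>',
    '<title>Stuff - Unwanted - DesiredText - Otherstuff</title>',
];

$captured = [];   // results from the capture-group pattern
$fullMatch = [];  // results from the \K pattern (full-string match)

foreach ($titles as $title) {
    preg_match('~- ([^-]*) - [^-]*?</title>~', $title, $m);
    $captured[] = $m[1];

    // \K discards everything matched so far from the reported match.
    preg_match('~- \K[^-]*(?= - [^-]*?</title>)~', $title, $k);
    $fullMatch[] = $k[0];
}

print_r($captured);  // both entries are "DesiredText"
print_r($fullMatch); // both entries are "DesiredText"
```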

I took a different approach and focused on returning the capture prior to the last word, rather than any sort of negation. In this way it's highly generic.
This pattern will match what you want in the capture group:
\s-\s([a-zA-Z]+)\s-\s[a-zA-Z]+<\/title>
If you are concerned that this only match between title tags, then you can add:
<title>.*?\s-\s([a-zA-Z]+)\s-\s[a-zA-Z]+<\/title>
Here's a link to the Test
The only limitation to this I see, is that it uses words and whitespace, so if your desired match is "- Some phrase -" then this won't work with it, but that was not indicated in your example. It's a bit unclear because you used "other stuff" and then "otherstuff".

Php regex that matches substring followed by any length of character and then comma

I have a long string containing Copyright: 'any length of unknown string here',
What regex should I write to match exactly this as a substring of a string?
I tried preg_replace('/Copyright:(.*?)/', 'mytext', $str); but it's not working, it only matches the Copyright:

A lazily quantified pattern at the end of the pattern will always match no text in case of *? and 1 char only in case of +?, i.e. will match as few chars as possible to return a valid match.
You need to make sure the match reaches the ', by putting those characters into the pattern:
'/Copyright:.*?\',/'
               ^^^
See the regex demo
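A small sketch of the difference, using a made-up input string (the field names and values are assumptions, only the Copyright: '...', shape comes from the question):

```php
<?php
// Hypothetical input; only the Copyright: '...', part mirrors the question.
$str = "Title: 'Foo', Copyright: 'Acme Corp 2014', License: 'MIT',";

// Lazy group at the very end of the pattern: it matches nothing,
// so only "Copyright:" itself is replaced.
$lazyOnly = preg_replace('/Copyright:(.*?)/', 'mytext', $str);

// With the closing quote and comma in the pattern, the lazy .*?
// has to expand until it reaches them.
$anchored = preg_replace('/Copyright:.*?\',/', 'mytext', $str);

echo $lazyOnly, "\n"; // Title: 'Foo', mytext 'Acme Corp 2014', License: 'MIT',
echo $anchored, "\n"; // Title: 'Foo', mytext License: 'MIT',
```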

The ? in your group 1 (.*?) makes this block lazy, i.e. matching as few characters as possible. Removing that would solve it.
Copyright:(.*)',
However, that would match everything up to the last ', on that line. If there is more text on the same line, make sure to limit it further. The grouping () is only there to make the match easier to inspect; you can do without the parentheses.
I usually use RegExr (regexr.com) to test my regular expressions. There are many similar tools online; this one has great UX, but does not support lookbehind.

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.

A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.
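Applied to the example string from the question, the call above behaves roughly like this (a sketch; the character set is the one from the answer, not a verified list of valid Reddit names):

```php
<?php
// Example string taken from the question.
$string = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// \1 refers to the captured "/u/name" or "/r/name"; preg_replace
// replaces every match in one call.
$linked = preg_replace(
    '#(/(?:u|r)/[a-zA-Z0-9_-]+)#',
    '[\1](http://reddit.com\1)',
    $string
);

echo $linked, "\n";
```

Because the character set excludes commas and spaces, trailing punctuation stays outside the generated link.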

You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, which would match those exactly. This can be simplified to match one or the other with /(?:u|r)/, which matches the same as those two patterns. Next you want to match everything from that point up to a space. You would use a negated character class, [^ ], which matches any character that is not a space, and apply a quantifier, *, to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure your match is preceded by a space so you're not matching part of a longer word, which results in (?<= )/(?:u|r)/[^ ]*. Note that this also skips a match at the very start of the string, where no space precedes it. You wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the contents within the parentheses for use in the replacement. You can express your chosen replacement using the \1 reference to the first captured group as [\1](http://reddit.com\1).
In php you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.
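A minimal sketch of that call, assuming # as the pattern delimiter so the slashes need no escaping. The two sample strings are made up to show how the [^ ]* and lookbehind choices behave at punctuation and at the start of the string.

```php
<?php
// Pattern and replacement as described above; # is an assumed delimiter.
$pattern = '#((?<= )/(?:u|r)/[^ ]*)#';
$replacement = '[\1](http://reddit.com\1)';

// [^ ]* runs up to the next space, so the comma becomes part of the match.
$withComma = preg_replace($pattern, $replacement, 'visit /r/php, or /u/someone');

// No space precedes the first character, so the lookbehind rejects it.
$atStart = preg_replace($pattern, $replacement, '/r/php is nice');

echo $withComma, "\n"; // visit [/r/php,](http://reddit.com/r/php,) or [/u/someone](http://reddit.com/u/someone)
echo $atStart, "\n";   // /r/php is nice
```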

In my opinion regex would be overkill for such a simple operation. If you just want to replace instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you could use str_replace, although preg_replace needs less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace. You can also use the http://www.jslab.dk/tools.regex.php regular expression generator if you have something complex to capture in the string.

What do these certain symbols/parts mean in preg_match?

I know a little about preg_match, however there are some that look rather complex and some that contain symbols that I don't entirely understand. For example:
On the first one - I can only assume this has something to do with an e-mail address and url, but what do things like [^/] and the ? mean?
preg_match('#^(?:http://)?([^/]+)#i', $variable);
.....
In the second one - what do things like the ^, {5} and $ mean?
preg_match("/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/", $variable);
It's just these small things I'm not entirely sure on and a brief explanation would be much appreciated.

Here are the direct answers. I kept them short because they won't make sense without an understanding of regex. That understanding is best gained at http://www.regular-expressions.info/tools.html. I advise you to also try out the regex helper tools listed there, they allow you to experiment - see live capturing/matching as you edit the pattern, very helpful.
Simple parentheses ( ) around something makes it a group. Here you have (?=) which is an assertion, specifically a positive look ahead assertion. All it does is check whether what's inside actually exists forward from the current cursor position in the haystack. Still with me?
Example: foo(?=bar) matches foo only if followed by bar. bar is never matched, only foo is returned.
With this in mind, let's dissect your regex:
/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/
Reads as:
^.*          From the start, match 0 or more of any character
(?=.{4,})    only if at least 4 characters follow this position
(?=.*[0-9])  only if a digit appears somewhere after this position
(?=.*[a-z])  only if a lowercase letter appears somewhere after this position
(?=.*[A-Z])  only if an uppercase letter appears somewhere after this position
.*$          then match 0 or more of any character up to the end
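The breakdown above can be tried out with a few candidate strings. The candidates below are made up for this sketch; preg_match returns 1 on a match and 0 otherwise.

```php
<?php
$pattern = '/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/';

// Hypothetical inputs: one valid, three each breaking a single rule.
$candidates = ['aB3d', 'abcd', 'aB3', 'ABC123'];

$results = [];
foreach ($candidates as $candidate) {
    $results[$candidate] = preg_match($pattern, $candidate);
}

print_r($results);
// aB3d   => 1 (4+ characters, digit, lowercase, uppercase)
// abcd   => 0 (no digit, no uppercase)
// aB3    => 0 (fewer than 4 characters)
// ABC123 => 0 (no lowercase)
```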

Although I am not a fan of just posting links, I think a regex tutorial would be too much. So check out this Regular Expression cheat sheet; it will probably get you on your way if you already have a little understanding of what it does.
Also check out this for some explanations and more helpful links; http://coding.smashingmagazine.com/2009/06/01/essential-guide-to-regular-expressions-tools-tutorials-and-resources/

First one:
The # characters don't have anything to do with the content that is matched. Usually, you use / as the delimiter character in a regex. The downside is that you need to escape it every time you want to use it inside the pattern. So here, # is used as the delimiter.
[^/] is a character class. [/] would match only the / character; ^ inverts this. [^/] matches any character except /.
Second one:
^ matches the beginning of the string, $ the end of the string. You can use this to enforce that the regex has to apply to the whole string you are matching on.
{5} is a quantifier. It is equivalent to {5,5} which is minimum 5, maximum 5, so it matches exactly 5 characters.
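Both patterns from the question can be exercised with a few sample values. The inputs below are assumptions chosen for illustration.

```php
<?php
// First pattern: optional "http://", then everything up to the first slash.
preg_match('#^(?:http://)?([^/]+)#i', 'http://www.php.net/index.html', $m);
$host = $m[1];

// Second pattern: exactly 5 uppercase letters, 4 digits, 1 uppercase letter.
$second = '/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/';
$exact    = preg_match($second, 'ABCDE1234F');   // whole string fits
$tooShort = preg_match($second, 'ABCD1234F');    // only 4 leading letters
$extra    = preg_match($second, 'xABCDE1234F');  // ^ rejects the leading "x"

echo $host, "\n";                          // www.php.net
echo "$exact $tooShort $extra", "\n";      // 1 0 0
```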

First one:
[^/] = any character except a slash
Second one:
^ start matching at the beginning of $variable
{5} exactly 5 occurrences of [A-Z]
$ match up to the end of $variable
The combination of ^ and $ means that the pattern between them has to match all of $variable

How to match required characters in random order using regular expression?

I need to match text which has #, #, and any number in it. The characters can be in random position as long as they are in the text. Given this input:
abc##d9
a9b#c#d
##abc#9
abc9d##
a#b#c#d
The regex should match the first 3 lines. Currently my regex is:
/#.*?#.*?[0-9]/
Which doesn't work since it will only match the three chars in sequence. How to match the three chars in random order?

Found one of this ugly regex, if you really must use one:
/(?=.*#)(?=.*#)(?=.*[0-9]).*/
http://jsfiddle.net/BP53f/2/
The regex is basically using what they call lookahead
http://www.regular-expressions.info/lookaround.html
A simple case from the link above is trying to match q, followed by u, by doing q(?=u), that's why it's called lookahead, it finds q followed by u ahead.
Let's take one of your valid case: a9b#c#d
The first lookahead is (?=.*#), which states: match anything, followed by a #. So it does: since .* is greedy, it matches a9b#c#, up to the last #. Then, since the match from the lookahead must be discarded, the engine steps back to the start of the string, which is an a. Then it goes to
the second (?=.*#), which succeeds in exactly the same way, and so on. Note that the second lookahead can be satisfied by the same # as the first; to really require two of them, put both into one lookahead: (?=.*#.*#). The difference between using lookahead and (a)(b)(c) is basically the stepping back.
From the link above:
Let's take one more look inside, to make sure you understand the
implications of the lookahead. Let's apply q(?=u)i to quit. I have
made the lookahead positive, and put a token after it. Again, q
matches q and u matches u. Again, the match from the lookahead must be
discarded, so the engine steps back from i in the string to u. The
lookahead was successful, so the engine continues with i. But i cannot
match u. So this match attempt fails. All remaining attempts will fail
as well, because there are no more q's in the string.
It is ugly because it's difficult to maintain... You basically have 3 different sub-regexes inside the parentheses.
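A short sketch of how the lookahead pattern behaves in PHP. Two identical (?=.*#) lookaheads both start from the same position, so a single # satisfies both. The second pattern below is a variant (not from the answer) that requires two # inside one lookahead. The input 'a#9' is made up to show the difference.

```php
<?php
$separate = '/(?=.*#)(?=.*#)(?=.*[0-9]).*/'; // pattern from the answer
$combined = '/(?=.*#.*#)(?=.*[0-9]).*/';     // variant: both # in one lookahead

$inputs = ['a9b#c#d', 'a#b#c#d', 'a#9'];

$bySeparate = [];
$byCombined = [];
foreach ($inputs as $input) {
    $bySeparate[$input] = preg_match($separate, $input);
    $byCombined[$input] = preg_match($combined, $input);
}

print_r($bySeparate); // a9b#c#d => 1, a#b#c#d => 0, a#9 => 1
print_r($byCombined); // a9b#c#d => 1, a#b#c#d => 0, a#9 => 0
```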

Use separate expressions to make sure # and # are present. Once they are, remove them and match for the rest of the characters/digits.

Decided I better write this as an answer:
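$text = "a9b#c#d";
$themAll = "##";          // every character listed here must occur in $text
$themAny = "0123456789";  // at least one of these must occur in $text
// strspn() only tests presence, so the repeated "#" is satisfied by a single "#" in $text.
// strpbrk() is compared with false because a returned string such as "0" is falsy.
var_dump(strspn($themAll, $text) == strlen($themAll) && strpbrk($text, $themAny) !== false);
For maintenance and some (limited) extension this should be as easy as it gets, especially with longer $themAll lists.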
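If repeated characters in the required list should count (two # really meaning two occurrences), one possible variation is to compare character counts with count_chars(). The function name containsAllAndAny is hypothetical, and this is only a sketch of the idea, not part of the answer above.

```php
<?php
// Hypothetical helper: true if $text contains every character of $all at
// least as often as it is listed there, plus at least one character of $any.
function containsAllAndAny(string $text, string $all, string $any): bool
{
    $have = count_chars($text, 1); // byte value => number of occurrences

    foreach (count_chars($all, 1) as $byte => $needed) {
        if (($have[$byte] ?? 0) < $needed) {
            return false; // a required character is missing or too rare
        }
    }

    // strpbrk() returns false when none of the characters is found.
    return strpbrk($text, $any) !== false;
}

var_dump(containsAllAndAny('a9b#c#d', '##', '0123456789')); // bool(true)
var_dump(containsAllAndAny('a#b#c#d', '##', '0123456789')); // bool(false), no digit
var_dump(containsAllAndAny('a#9', '##', '0123456789'));     // bool(false), only one #
```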
