regexp: need to find word with minus and others - PHP

For now I have a few types of code that I need to find and replace with a regex:
{word1/G_KP8zXsDp8/word2}
{word1/GKP8zXsDp8/word2}
{word1/G-KP8zXsDp8/word2}
My current pattern is /({word1\/)(\w+)\/(\w+)/. It matches the 1st and 2nd cases but not the 3rd one, and I need the code to end up in match[2] ($2).

The problem is that \w doesn't match a hyphen: by default it is shorthand for the character class [a-zA-Z0-9_].
To fix this, you can update your regex to include - as well. Use [\w-]+ instead of \w+.
/({word1\/)([\w-]+)\/(\w+)/
RegEx Demo
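As a quick check, here is a minimal sketch that runs the corrected pattern against the three sample inputs from the question. The variable names are made up for illustration, and the opening brace is escaped here only for clarity.

```php
<?php
// Pattern from the answer: [\w-]+ also accepts hyphens in the second group.
$pattern = '/(\{word1\/)([\w-]+)\/(\w+)/';

// The three sample inputs from the question.
$inputs = [
    '{word1/G_KP8zXsDp8/word2}',
    '{word1/GKP8zXsDp8/word2}',
    '{word1/G-KP8zXsDp8/word2}',
];

$codes = [];
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $match)) {
        $codes[] = $match[2]; // the code between the two slashes ($2)
    }
}

print_r($codes);
```

All three inputs should now yield their code in group 2, including the hyphenated one.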

Related

preg_replace does not replace what I want

I have this regex that matches strings that I want to check on validity.
Recently, however, I wanted to use this same regex to replace every character it considers invalid with another character (let's say x).
My regex to match these types of strings is: '#^[\pL\'\’\d][\pL\.\-\ \'\/\,\’\d]*$#iu'
This allows the first character to be a letter in any language, a digit, or one of a few specific special characters. The following characters may be the same, plus a few more special characters.
This is what I do (nothing special).
preg_replace($regex, 'x', $string);
Things I tried include trying to negate the regex:
'(?![\pL\'\’\d][\pL\.\-\ \'\/\,\’\d]*)'
'[^\pL\'\’\d][^\pL\.\-\ \'\/\,\’\d]*'
I've also tried splitting up the string into the firstchar and the rest of the string and split the regex in 2.
$validationRegex1 = '[^\pL\'\’\d]';
$validationRegex2 = '[^\pL\.\-\ \'\/\,\’\d]*';
$fixedStr1 = (string) preg_replace($validationRegex1, 'x', $firstChar)
. (string) preg_replace($validationRegex2, 'x', $theRest);
But this also did not seem to work.
I've experimented a bit with this online tool: https://www.functions-online.com/preg_replace.html
Does anyone know what I am overlooking?
Examples of strings and their expected results
'-' should become 'x'.
'Random-morestuff' stays 'Random-morestuff'
'Random%morestuff' should become 'Randomxmorestuff'
'Rândôm' stays 'Rândôm'
Just an idea but if I got you right, you could use
(?(DEFINE)
(?<first>[\pL\d'’])
(?<other>[-\ \pL\d.'/,’])
)
\b(?&first)(?&other)+\b(*SKIP)(*FAIL)|.
This needs to be replaced by x. You do not have to escape everything in a character class, I changed this accordingly.
See a demo on regex101.com.
A bit more explanation: The (?(DEFINE)...) thingy lets you define subroutines that can be used afterwards and is just syntactic sugar in this case (maybe a bit showing off, really). As you have stated that other characters are allowed depending on their positions, I just called them first and other. The \b marks a word boundary, that is a boundary between \w (usually [a-zA-Z0-9_]) and \W (not \w). All of these "words" are allowed, so we let the engine "forget" what has been matched with the (*SKIP)(*FAIL) mechanism and match any other character on the right side of the alternation (|). See how (*SKIP)(*FAIL) works here on SO.
Use
$fixedStr1 = preg_replace('/[\p{L}\'\’\d][\p{L}\.\ \'\/\,\’\d-]*(*SKIP)(*FAIL)|./u', 'x', $input_string);
See regex proof.
The pattern matches valid words and discards them with (*SKIP)(*FAIL), then replaces every character that appears anywhere else.
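To see how this behaves on the sample strings from the question, here is a small sketch. The function name replaceInvalid is hypothetical, and the character classes are written without the redundant escapes; the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper wrapping the (*SKIP)(*FAIL) approach from the answer.
function replaceInvalid(string $input): string
{
    // Left side: a valid "word" (allowed first char, then allowed others),
    // which is skipped. Right side: any remaining character, replaced by x.
    $pattern = '/[\p{L}\'’\d][\p{L}. \'\/,’\d-]*(*SKIP)(*FAIL)|./u';
    return (string) preg_replace($pattern, 'x', $input);
}

// Sample strings taken from the question.
foreach (['-', 'Random-morestuff', 'Random%morestuff', 'Rândôm'] as $sample) {
    echo $sample, ' => ', replaceInvalid($sample), PHP_EOL;
}
```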

Regex match section within string

I have a string foo-foo-AB1234-foo-AB12345678. The string can be in any format. Is there a way of matching only the following pattern: letter, letter, then 3-5 digits?
I have the following implementation:
preg_match_all('/[A-Za-z]{2}[0-9]{3,6}/', $string, $matches);
Unfortunately this finds a match on AB1234 AND AB12345678 which has more than 6 digits. I only wish to find a match on AB1234 in this instance.
I tried:
preg_match_all('/^[A-Za-z]{2}[0-9]{3,6}$/', $string, $matches);
You will notice ^ and $ to mark the beginning and end, but this only applies to the string, not the section, therefore no match is found.
I understand why the code is behaving like it is. It makes logical sense. I can't figure out the solution though.
You must be looking for word boundaries \b:
\b\p{L}{2}\p{N}{3,5}\b
See demo
Note that \p{L} matches a Unicode letter, and \p{N} matches a Unicode number.
You can also use your own regex with the same modification: \b[a-zA-Z]{2}[0-9]{3,5}\b. Note that using anchors makes your regex match only at the beginning of a string (with ^) and/or at the end of the string (with $).
In case you have underscored words (like foo-foo_AB1234_foo_AB12345678_string), you will need a slight modification:
(?<=\b|_)\p{L}{2}\p{N}{3,5}(?=\b|_)
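A short sketch of both variants follows, using the string from the question and the underscored string from this answer. The variable names are illustrative only.

```php
<?php
// Hyphen-separated input: plain word boundaries are enough.
$plain = 'foo-foo-AB1234-foo-AB12345678';
preg_match_all('/\b[a-zA-Z]{2}[0-9]{3,5}\b/', $plain, $plainMatches);

// Underscore-separated input: "_" is a word character, so \b alone
// does not fire between "_" and a letter or digit.
$underscored = 'foo-foo_AB1234_foo_AB12345678_string';
preg_match_all(
    '/(?<=\b|_)\p{L}{2}\p{N}{3,5}(?=\b|_)/u',
    $underscored,
    $underscoredMatches
);

print_r($plainMatches[0]);       // full matches for the first string
print_r($underscoredMatches[0]); // full matches for the second string
```

In both cases only AB1234 should be reported, since AB12345678 has too many digits to sit between the required boundaries.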
You have to end your regular expression with a pattern for a non-digit. In Java this would be \D, this should be the same in PHP.

Regex Challenge - either ... or

I haven't been able to figure this one out.
I need to match all of these strings by matching whole and its surrounding underscores (in one regex statement):
whole_anything
anything_whole
anything_whole_anything
but it must NOT match this
anythingwholeanything
anything_wholeanything
anythingwhole_anything
That means: write a regex that matches the phrase whole only if it has an underscore before it, after it, or both, and not if there are no underscores.
The following
preg_match("/(whole_|_whole_|_whole)/", $string)
is not a solution ;)
2015/02/09 Edit: added conditions 5. and 6. for clarification
You could reduce the number of cases in the alternatives:
preg_match('/(_whole_?|whole_)/', $string);
If there's an underscore before, the underscore after is optional. But if there's no underscore before, the underscore after is required.
You can use a PHP variable to solve the problem of putting the word twice:
$word = preg_quote('whole', '/');
preg_match("/(_{$word}_?|{$word}_)/", $string);
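The following sketch runs this pattern over the six strings from the question so the behavior can be checked case by case. The $results array is just an illustrative name. Note that, as far as I can tell, this simple alternation treats an underscore on either side as sufficient, so the two strings added in the question's later edit (anything_wholeanything and anythingwhole_anything) also match.

```php
<?php
// Build the pattern once; the second argument to preg_quote escapes the delimiter too.
$word = preg_quote('whole', '/');
$pattern = "/(_{$word}_?|{$word}_)/";

// The six strings listed in the question.
$strings = [
    'whole_anything',
    'anything_whole',
    'anything_whole_anything',
    'anythingwholeanything',
    'anything_wholeanything',
    'anythingwhole_anything',
];

$results = [];
foreach ($strings as $string) {
    $results[$string] = preg_match($pattern, $string) === 1;
}

var_dump($results);
```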
Another alternative. This way we check for the existence of a word boundary or _ both before and after whole, but we exclude the word whole by itself through a negative lookahead.
(?!\bwhole\b)((?:_|\b)whole(?:_|\b))
Regex Demo here.
You could exclude all alphanumeric characters prior to and after. Unfortunately you can't use \w because _ is considered a word character
([^a-zA-Z0-9])_?whole_?([^a-zA-Z0-9])
That will exclude alphanumeric characters before and after from matching, and the underscore in front, behind, or both, is optional. If none exist, it can't match because it can't be preceded by a letter or number. You could change it to include special characters and the lot.

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a subreddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.
You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, each of which matches exactly that text. This can be simplified to match one or the other with /(?:u|r)/, which matches the same as those two patterns. Next, you want to match everything from that point up to a space. Use a negated character class, [^ ], which matches any character that is not a space, and apply a quantifier, *, to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure your match is preceded by a space so you're not matching part of a longer word, which results in (?<= )/(?:u|r)/[^ ]*. Wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the contents within the parentheses for use in a replacement pattern. You can express your chosen replacement using the \1 reference to the first captured group: [\1](http://reddit.com\1).
In php you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.
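Put together, the steps above might look like the sketch below, using the example sentence from the question. The variable names are illustrative. Because of the lookbehind, a /r/ or /u/ at the very start of the string would not be matched, and [^ ]* would also swallow trailing punctuation; neither occurs in this example.

```php
<?php
// Example sentence from the question.
$input = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// Capture "/u/..." or "/r/..." up to the next space, only when preceded by a space.
$pattern = '#((?<= )/(?:u|r)/[^ ]*)#';

// \1 refers to the captured path, used both as link text and as URL suffix.
$replacement = '[\1](http://reddit.com\1)';

$output = preg_replace($pattern, $replacement, $input);
echo $output, PHP_EOL;
```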
In my opinion, a regex would be overkill for such a simple operation. If you just want to replace instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you can use str_replace, although preg_replace needs less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace on strings. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

preg_replace with exceptions doesn't work for me

I got a "little" problem already...
I only want to replace some names with other names or something.
It works fine so far, but with some names I got a problem.
For example I want to replace "Cho" with "Cho'Gath",
but of course I don't want to replace "Cho'Gath" with "Cho'Gath'Gath".
So therefore I created this regular expression, and replace all "Cho"'s except of "Cho'Gath":
/\bCho(?!.+Gath)\b/i
This works and it doesn't replace "Cho'Gath", but it also doesn't replace "Cho Hello World Gath" ... that is my first problem!
The second one is the following: I also want to replace all "Yi", but not "Master Yi", so I tried the same with the following regular expression:
/\b(?!Master.+)Yi\b/i
This doesn't replace "Master Yi", okay. But it also doesn't replace "Yi", which it should! (I also tried /\b(?!Master(\s))Yi\b/i but this doesn't work either.)
So far I don't know what to do now... can anyone help me with that?
Your first problem is easily solved if you replace .+ with the actual character that you want to match (or not to match): ', but let's have a look at the second one, this is quite interesting:
I also want to replace all "Yi", but not "Master Yi", so I tried the
same with the following regular expression:
/\b(?!Master.+)Yi\b/i
This is a negative lookahead on \b. The expression does match a single "Yi", but look what it does with "Master Yi":
Hello I am Master Yi
                  ^
                  \b
This boundary is not followed by "Master" but followed by "Yi". So your expression also matches the "Yi" in this string.
The negative lookahead is quite pointless because it checks if the boundary that is directly followed by "Yi" (remember that a lookahead assertion just "looks ahead" without moving the pointer forward) is not directly followed by "Master". This is always the case.
You could use a lookbehind assertion instead, but only without the (anyways unnecessary) .+, because lookbehind assertions must have fixed lengths:
/\b(?<!Master )Yi\b/i
matches every "Yi" that is not preceded by "Master ".
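Both fixes can be tried out with a sketch like the one below. The function names are hypothetical, and since the question does not say what "Yi" should be replaced with, "Master Yi" is assumed here purely for illustration.

```php
<?php
// Replace "Cho" unless it is directly followed by "'Gath"
// (the literal apostrophe replaces the .+ from the question).
function expandCho(string $text): string
{
    return (string) preg_replace('/\bCho(?!\'Gath)\b/i', "Cho'Gath", $text);
}

// Replace "Yi" unless it is directly preceded by "Master ".
// The replacement text is an assumption, not taken from the question.
function expandYi(string $text): string
{
    return (string) preg_replace('/\b(?<!Master )Yi\b/i', 'Master Yi', $text);
}

echo expandCho('Cho'), PHP_EOL;
echo expandCho("Cho'Gath"), PHP_EOL;
echo expandCho('Cho Hello World Gath'), PHP_EOL;
echo expandYi('Yi'), PHP_EOL;
echo expandYi('Hello I am Master Yi'), PHP_EOL;
```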
For the first regex:
\bCho(?!.Gath)\b
For the second:
(?<!\bMaster )Yi\b
Your first regex had .+ in it, which means any character, one or more times; since quantifiers are greedy by default, this swallows the rest of the input before reluctantly giving characters back to match the next token (G).
Your second regex used a negative lookahead, what you wanted was a negative lookbehind. That is, a position where the text before that position does not match.
And note that regexes in lookbehinds must be of finite length.
