PHP REGEX find & replace patterns - php

I am trying to construct a regex that locates any non-whitespace character followed by two consecutive double quotes.
This regex locates each occurrence properly:
(\S"")
Given the example below
$string='"WEINSTEIN","ANTONIA \"TOBY"","STILES","HOOPER \"PETER"","HENDERSON",';
$pattern = '(\S"")';
$replacement = '\\""';
$result=preg_replace($pattern, $replacement, $string);
My result turns out to be
"WEINSTEIN","ANTONIA \"TOB\"","STILES","HOOPER \"PETE\"","HENDERSON"
But I am seeking
"WEINSTEIN","ANTONIA \"TOBY\"","STILES","HOOPER \"PETER\"","HENDERSON"
I understand that the replacement substitutes the whole match, but how can I keep the matched character and replace only the quotes that follow it?

You can change your pattern to use a positive lookbehind instead, so that the non-whitespace character is checked but is not part of the match:
$string='"WEINSTEIN","ANTONIA \"TOBY"","STILES","HOOPER \"PETER"","HENDERSON",';
$pattern = '/(?<=\S)""/';
$replacement = '\\""';
$result=preg_replace($pattern, $replacement, $string);
echo $result;
Output
"WEINSTEIN","ANTONIA \"TOBY\"","STILES","HOOPER \"PETER\"","HENDERSON",
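For comparison, the same result can be reached without a lookbehind by capturing the non-whitespace character and putting it back with a backreference. The sketch below is only an illustration: the shortened $string and the variable names $viaLookbehind and $viaCapture are assumptions made for the example, not part of the original answer. The backslash in the replacement is doubled here, which is the form the PHP manual recommends for a literal backslash.

```php
<?php
// Shortened sample input (hypothetical; same shape as the string in the question).
$string = '"ANTONIA \"TOBY"","STILES"';

// Lookbehind: only the two quotes are matched, so only they are replaced.
$viaLookbehind = preg_replace('/(?<=\S)""/', '\\\\""', $string);

// Capture group: the character is matched too, so it is re-inserted via $1.
$viaCapture = preg_replace('/(\S)""/', '$1\\\\""', $string);

echo $viaLookbehind, PHP_EOL;
echo $viaCapture, PHP_EOL;
```

Both calls should give the same string; the lookbehind version simply avoids having to repeat the captured character in the replacement.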

Related

Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning long and I have hit the wall. In the following string I want to match every forward slash which follows .com/<first_word>, with the exception of any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because the lookbehind cannot contain quantifiers (it must match a fixed length) and is a zero-width assertion, I am wondering whether I have gone down the wrong path and should look for another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.
If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
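This replaces each / with | once the first / after the leading .com has been passed.
The negative lookbehind (?<!^) is needed because \G also matches at the very start of the subject; without it, a string that does not begin with the .com part, such as /foo/bar/baz/abcd, would have its slashes replaced too.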
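To make the role of (?<!^) concrete, here is a small sketch that runs the pattern from the answer next to a variant with the guard removed. The $unguarded pattern and the variable names are assumptions introduced for this illustration only.

```php
<?php
// Pattern from the answer above, plus a variant without the (?<!^) guard.
$guarded   = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$unguarded = '~(?:^.*?\.com/|\G)[^/\h]*\K/~';

$url   = 'http://example.com/foo/12/jacket Input/Output';
$plain = '/foo/bar/baz/abcd'; // no leading "...com/" part

$a = preg_replace($guarded, '|', $url);     // slashes after the first path word
$b = preg_replace($guarded, '|', $plain);   // no entry point, so nothing matches
$c = preg_replace($unguarded, '|', $plain); // \G matches at offset 0 and chains on

echo $a, PHP_EOL; // http://example.com/foo|12|jacket Input/Output
echo $b, PHP_EOL; // /foo/bar/baz/abcd
echo $c, PHP_EOL; // |foo|bar|baz|abcd
```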
Use \K here along with \G, then grab the groups.
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
Replace
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);
Another \G and \K based idea.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
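The (?: ... ) non-capturing group either sets the entry point with ^\S+\.com/\w or glues the next match to the end of the previous one with \G(?!^).
\w*+\K/ possessively matches any number of word characters up to a slash. \K resets the start of the reported match, so only the slash is reported as the match.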
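One way to check which slashes this pattern actually reports is to ask preg_match_all for offsets. This is a minimal sketch using the pattern and sample string from above; the loop and its output format are assumptions added for illustration.

```php
<?php
$re  = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
$str = 'http://example.com/foo/12/jacket Input/Output';

// PREG_OFFSET_CAPTURE returns [matched text, byte offset] pairs.
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    echo "matched '$text' at offset $offset", PHP_EOL;
}
```

Because of \K, each reported match should be just the slash itself, not the word characters that precede it.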

PHP exploding url from text, possible?

I need to extract the YouTube URL from this line:
[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]
Is it possible? I need to delete [embed] & [/embed].
preg_match is what you need.
<?php
$str = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
preg_match("/\[embed\](.*)\[\/embed\]/", $str, $matches);
echo $matches[1]; //https://www.youtube.com/watch?v=L3HQMbQAWRc
$string = '[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]';
$string = str_replace(['[embed]', '[/embed]'], '', $string);
See str_replace
why not use str_replace? :) Quick & Easy
http://php.net/manual/de/function.str-replace.php
Just for good measure, you can also use positive lookbehinds and lookaheads in your regular expressions:
(?<=\[embed\])(.*)(?=\[\/embed\])
You'd use it like this:
$string = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
$pattern = '/(?<=\[embed\])(.*)(?=\[\/embed\])/';
preg_match($pattern, $string, $matches);
echo $matches[1];
Here is an explanation of the regex:
(?<=\[embed\]) is a Positive Lookbehind - it asserts that the text immediately before the current position is [embed], without making it part of the match.
(.*) is a Capturing Group - . matches any character (except a newline) and the Quantifier * repeats it between zero and unlimited times, as many times as possible. This is the text between the two assertions. These are the droids you're looking for.
(?=\[\/embed\]) is a Positive Lookahead - it asserts that the text immediately after the current position is [/embed], without making it part of the match.
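One caveat with the greedy (.*) is worth a quick sketch: if a line holds more than one [embed] block, the greedy quantifier runs to the last [/embed]. A lazy .*? combined with preg_match_all is one possible way around that. The two-block $text and the example.com URLs are hypothetical inputs, not taken from the question.

```php
<?php
// Hypothetical input containing two [embed] blocks on one line.
$text = '[embed]https://example.com/a[/embed] and [embed]https://example.com/b[/embed]';

// Greedy: .* runs to the last [/embed], so both blocks merge into one match.
preg_match('/(?<=\[embed\]).*(?=\[\/embed\])/', $text, $greedy);

// Lazy: .*? stops at the first [/embed] that follows each [embed].
preg_match_all('/(?<=\[embed\]).*?(?=\[\/embed\])/', $text, $lazy);

echo $greedy[0], PHP_EOL;
print_r($lazy[0]);
```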

Regular Expression to replace a bracket with word in front when not preceded by word

I am facing a problem with a regular expression.
I have a string like ('A'&'B')
Now I want to convert it to CONCAT('A'&'B') which is simple and I have done using
str_replace("(", "CONCAT(", $subject)
But I want to replace "(" with "CONCAT(" only when it is not preceded by the string "extract_json_value".
So extract_json_value('A'&'B') should not become extract_json_valueCONCAT('A'&'B'); it should stay as it is.
You can expand your regex with a negative lookbehind:
(?<!extract_json_value)\(
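Plugged into preg_replace, the lookbehind pattern could be used roughly as follows. This is a sketch rather than the answerer's own code: the $inputs list (including the hello(...) case that appears further down the page) and the $results array are assumptions for the example.

```php
<?php
// Replace "(" unless it is directly preceded by "extract_json_value".
$pattern = '/(?<!extract_json_value)\(/';

$inputs = [
    "('A'&'B')",
    "extract_json_value('A'&'B')",
    "hello('A'&'B')",
];

$results = [];
foreach ($inputs as $subject) {
    $results[] = preg_replace($pattern, 'CONCAT(', $subject);
}

print_r($results);
```

Note that the lookbehind only excludes the one named prefix, so any other prefix, such as hello, still gets CONCAT inserted before the bracket.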
You could use strpos to do this.
if (strpos($subject, '(') === 0) {
$subject = str_replace('(', 'CONCAT(', $subject);
}
If your string contains other text, you can use preg_replace() with the non-word-boundary assertion \B, which only lets the ( match when it is not directly preceded by a word character.
$subject = preg_replace('/\B\(/', 'CONCAT(', $subject);
You can use negative lookbehind in order to match a group not preceded by a string.
First, here is a negative lookbehind that prevents a match from starting directly after "extract_json_value":
(?<!extract_json_value).*
Now, let's use preg_replace
$string = "extract_json_value('A'&'B')";
$pattern = '/^(?<!extract_json_value)(\(.+\))$/';
$replacement = 'CONCAT\1';
echo preg_replace($pattern, $replacement, $string);
// prints out "extract_json_value('A'&'B')"
It works too with
$string = "('A'&'B')";
...
// prints out "CONCAT('A'&'B')"
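However, it does nothing for a string with any other prefix, because the ^ anchor requires the string to start with "(":
$string = "hello('A'&'B')";
...
// prints out "hello('A'&'B')" (no match, so the string is unchanged)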
So, continue with a preg_replace_callback:
http://php.net/manual/fr/function.preg-replace-callback.php

preg_replace only substring before defined char

I'm trying to replace chars not [A-Z] and before the # inside a string. So this
AreplacehereZ#domain.tld
needs to become:
A***********Z#domain.tld
I tried with:
$string = 'AreplacehereZ#domain.tld';
$pattern = '/(?<!#)[^A-Z#\.]/';
$replacement = '*';
$replace = preg_replace($pattern, $replacement, $string);
but the result is
'A***********Z#d*****.***'
I can't find a way to avoid replacing characters in #domain.tld using only preg_replace().
domain.tld can be anything so I can't use (?<!#domain.tld) in the $pattern var.
You can match [^A-Z] at the current position and then use a lookahead to make sure that, after any number of non-# characters, a # still lies ahead:
$pattern = '/[^A-Z](?=[^#]*#)/';
Produces:
A***********Z#domain.tld
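Wrapped in a function, the lookahead approach might look like the sketch below. The helper name mask_before_hash() and the second test string are hypothetical and exist only for this illustration.

```php
<?php
// mask_before_hash() is a hypothetical helper name used only for this sketch.
function mask_before_hash(string $input): string
{
    // Replace every non-uppercase character that still has a "#" somewhere after it.
    // Fall back to the input if preg_replace reports an error (returns null).
    return preg_replace('/[^A-Z](?=[^#]*#)/', '*', $input) ?? $input;
}

echo mask_before_hash('AreplacehereZ#domain.tld'), PHP_EOL;
echo mask_before_hash('no-hash-here.tld'), PHP_EOL; // no "#" ahead, so nothing is replaced
```

The # itself is never replaced in the first string because no further # follows it, so the lookahead fails at that position.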

preg_replace() seems to remove entire word instead of part of it

I'm trying to match a certain word and replace part of the word with certain text but leave the rest of the word intact. It is my understanding that adding parentheses to part of the regex pattern means that the pattern match within the parentheses gets replaced when you use preg_replace().
For testing purposes I used:
$text = 'batman';
echo $new_text = preg_replace('#(bat)man#', 'aqua', $text);
I only want 'bat' to be replaced by 'aqua' to get 'aquaman'. Instead, $new_text echoes 'aqua', leaving out the 'man' part.
preg_replace replaces the entire text matched by the regular expression, not just the part inside the parentheses.
$text = 'batman';
echo $new_text = preg_replace('#bat(man)#', 'aqua\\1', $text);
Capture man instead and append it to your aqua prefix.
Another way of doing that is to use assertions:
$text = 'batman';
echo $new_text = preg_replace('#bat(?=man)#', 'aqua', $text);
I would not use preg_* functions for this and would just use str_replace():
echo str_replace('batman', 'aquaman', $text);
This is simpler as a regex is not really needed in this case. Otherwise it would be with a regular expression:
echo $new_text = preg_replace('#bat(man)#', 'aqua\\1', $text);
This will substitute your man in after aqua when replacing the entire search phrase. preg_replace replaces the entire matching portion of the pattern.
The way you're trying to do it, it would be more like:
preg_replace('#bat(man)#', 'aqua$1', $text);
I'd use a positive lookahead:
preg_replace('/bat(?=man)/', 'aqua', $text)
Demo here: http://ideone.com/G9F4q
The parentheses create a capturing group, which means you can access the part matched by this group using \1.
You can either do what zerkms suggested or use a lookahead, which only checks what follows without making it part of the match.
$text = 'batman';
echo $new_text = preg_replace('#bat(?=man)#', 'aqua', $text);
This will match "bat" but only if it is followed by "man", and only "bat" is replaced.
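The approaches from the answers above can be put side by side in one short sketch. The variable names $whole, $backref and $lookahead are assumptions chosen for this example.

```php
<?php
$text = 'batman';

// Whole match replaced: the capture group does not limit what gets replaced.
$whole = preg_replace('#(bat)man#', 'aqua', $text);

// Backreference: "man" is matched, then put back with $1.
$backref = preg_replace('#bat(man)#', 'aqua$1', $text);

// Lookahead: "man" is only checked and never becomes part of the match.
$lookahead = preg_replace('#bat(?=man)#', 'aqua', $text);

echo $whole, PHP_EOL;     // aqua
echo $backref, PHP_EOL;   // aquaman
echo $lookahead, PHP_EOL; // aquaman
```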
