PHP: Replace consecutive occurrences of characters in a sentence

I want to collapse runs of the same character in each word when the run is longer than three (three is the maximum possible in German, two in English, so I know the example output is not valid English).
Example input:
Hellooooo Louis, whaaaaaat's up pal?
Expected output:
Hellooo Louis, whaaat's up pal?
I tried to change:
preg_replace('/(\w)\1+/', '$1', $word);
to
preg_replace('/(\w)\3+/', '$1', $word);
However, it doesn't output anything.

You can use the following regex:
((\w)\2{2})\2+
Replace with $1.
Code:
$re = "#((\w)\\2{2})\\2+#";
$str = "Hellooooo Louis, whaaaaaat's up pal?";
$subst = "$1";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
Hellooo Louis, whaaat's up pal?
EXPLANATION:
We capture the character with (\w); this is Group 2. Then \2{2} requires the same character exactly two more times, and those three characters together are captured in Group 1. Finally, \2+ matches any further identical characters, which the replacement $1 drops.
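As an illustration, the same pattern can be parameterised on the maximum run length. The helper name squeezeRuns and its $max parameter are hypothetical and not part of the answer above; this is only a sketch, assuming ASCII word characters (add the u modifier for UTF-8 input). It also hints at why the \3 attempt printed nothing: that pattern refers to a group that does not exist, so the pattern fails to compile and preg_replace returns null.

```php
<?php
// Hypothetical helper: collapse runs of one word character to at most $max.
function squeezeRuns(string $text, int $max = 3): string
{
    if ($max < 1) {
        throw new InvalidArgumentException('$max must be at least 1');
    }
    // Group 2 is the character, group 1 is the first $max copies of it;
    // the trailing \2+ matches the surplus copies that get dropped.
    $pattern = '/((\w)\2{' . ($max - 1) . '})\2+/';
    return preg_replace($pattern, '$1', $text);
}

echo squeezeRuns("Hellooooo Louis, whaaaaaat's up pal?"), PHP_EOL;
// Hellooo Louis, whaaat's up pal?
```

Calling squeezeRuns($text, 2) would give the English-style limit mentioned in the question.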

Here is a way to go:
preg_replace('/((\w)\2\2)\2+/', '$1', $word);

You can also use \K to reset the start of the match after the first three characters and replace the remainder with an empty string, which is a bit more efficient:
(\w)\1\1\K\1+
See regex101
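To see that the \K form and the capture-group form agree, here is a small side-by-side check on the question's sample string. The variable names are just for illustration.

```php
<?php
$str = "Hellooooo Louis, whaaaaaat's up pal?";

// Capture-and-restore variant: keep group 1, drop the rest of the run.
$withGroup = preg_replace('/((\w)\2\2)\2+/', '$1', $str);

// \K variant: the first three characters are matched but excluded from the
// reported match, so only the surplus is replaced by the empty string.
$withReset = preg_replace('/(\w)\1\1\K\1+/', '', $str);

var_dump($withGroup === $withReset); // bool(true)
echo $withReset, PHP_EOL;            // Hellooo Louis, whaaat's up pal?
```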

Related

No solution works for me: how can I replace the second occurrence of a match in PHP?

I'm searching a paragraph (string) for a certain word, and I want to replace that word with another word, but only at the second occurrence of my search term.
Here is what I tried:
$string = 'hello my name is hello';
$output = str_replace('hello', 'Gary', $string);
// desired output
//hello my name is Gary
It is very simple but I can't get it right. Please bear in mind that my string is very long and has all types of characters in it.
Use this regex: /^.*?hello\b.*?\Khello/
^ asserts position at the start of the string
.*? matches any character (except newline), as few times as possible
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
\K resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
Check this demo: https://regex101.com/r/lW2kK1/2
which gives you:
$re = "/^.*?hello\\b.*?\\Khello/";
$str = "hello my name is hello";
$subst = "Gary";
$result = preg_replace($re, $subst, $str);
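If the word or the occurrence number varies, a counting callback may be easier to reuse than a hand-written pattern. The sketch below is an alternative technique, not the answer's method, and the function name replaceNthOccurrence is made up for illustration. It also sidesteps one limitation of the pattern above: . does not match line breaks unless the s modifier is added, which matters for long multi-line text.

```php
<?php
// Hypothetical helper: replace only the $n-th occurrence of a literal string.
function replaceNthOccurrence(string $subject, string $search, string $replacement, int $n): string
{
    $count = 0;
    // preg_quote() makes the search term literal, whatever characters it contains.
    $pattern = '/' . preg_quote($search, '/') . '/';

    return preg_replace_callback(
        $pattern,
        function (array $m) use (&$count, $replacement, $n): string {
            $count++;
            // Swap in the replacement on the $n-th hit, keep every other hit.
            return $count === $n ? $replacement : $m[0];
        },
        $subject
    );
}

echo replaceNthOccurrence('hello my name is hello', 'hello', 'Gary', 2), PHP_EOL;
// hello my name is Gary
```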

Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning long and I have hit the wall. In the following string I want to match every forward slash which follows .com/<first_word>, with the exception of any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because the lookbehind can have no modifiers and needs to be a zero length assertion I am wondering if I have just tripped down the wrong path and should seek another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.
If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
This replaces each / with a | after the first / that follows the initial .com.
The negative lookbehind (?<!^) is needed to avoid replacing slashes in a string that does not start with a .com URL, such as /foo/bar/baz/abcd.
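A quick way to check both claims is to run the pattern against the sample URL and against a string without a leading .com part. The wrapper name replaceUrlSlashes is only for illustration.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function replaceUrlSlashes(string $s): string
{
    // Either start at the ".com/" prefix, or continue (\G) right after the
    // previous match; (?<!^) stops \G from matching at the very start.
    return preg_replace('~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~', '|', $s);
}

echo replaceUrlSlashes('http://example.com/foo/12/jacket Input/Output'), PHP_EOL;
// http://example.com/foo|12|jacket Input/Output

echo replaceUrlSlashes('/foo/bar/baz/abcd'), PHP_EOL;
// /foo/bar/baz/abcd
```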
Use \K here along with \G, and grab the captured groups.
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
Replace
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);
Another idea based on \G and \K:
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
The (?: non-capturing group either sets the entry point with ^\S+\.com/\w or glues the match to the previous one with \G(?!^).
\w*+\K/ possessively matches any number of word characters up to a slash; \K resets the start of the match, so only the slash is reported.
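Since the question asks to match the slashes rather than replace them, one way to inspect what this pattern reports is preg_match_all with PREG_OFFSET_CAPTURE. The helper name slashOffsets is an assumption for this sketch; the offsets are byte positions in the input string.

```php
<?php
// Hypothetical helper: return the byte offsets of the slashes the pattern matches.
function slashOffsets(string $s): array
{
    preg_match_all('~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~', $s, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matched text, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

print_r(slashOffsets('http://example.com/foo/12/jacket Input/Output'));
// offsets 22 and 25: the slashes after "foo" and after "12"
```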
See demo at regex101

preg_match parenthesized pattern

I'm trying to change a bunch of decimals in a string to two decimal places. The regex seems to match it just fine. The problem is with the replace.
This is my code:
$input_lines = "-33.873293252 151.201538015999972,-33.873175 151.201689183999946";
print preg_replace("/[0-9]+(\.[0-9][0-9]?)?/", "$0 $2", $input_lines);
This outputs the decimals that I want, followed by the truncated digits that I don't want:
-33.87 3293252 151.20 1538015999972 ,-33.87 3175 151.20 1689183999946
So I tried changing the replacement to $0. But now the replace stopped working, and is instead giving me:
-33.873293252 151.201538015999972,-33.873175 151.201689183999946
How can I rewrite my regular expression so it gives me the desired output?
Better:
preg_replace("/(?<=\.\d\d)\d+/","",$input_lines);
Replaces all trailing decimals after the first two with nothing.
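Wrapped in a function, the lookbehind approach looks like this. The name truncateDecimals is made up for this sketch. Note that it truncates rather than rounds, and that numbers with fewer than three decimal digits are left alone.

```php
<?php
// Hypothetical helper: cut every decimal number in the text to two decimals.
function truncateDecimals(string $text): string
{
    // (?<=\.\d\d) asserts a dot and two digits just before the current
    // position; \d+ then matches the surplus digits, which are removed.
    return preg_replace('/(?<=\.\d\d)\d+/', '', $text);
}

$input = "-33.873293252 151.201538015999972,-33.873175 151.201689183999946";
echo truncateDecimals($input), PHP_EOL;
// -33.87 151.20,-33.87 151.20
```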
([-+]?\d+(?:\.\d{2})?)(\d*)
Try this. Replace with $1. See the demo:
https://regex101.com/r/vD5iH9/46
$re = "/([-+]?\\d+(?:\\.\\d{2})?)(\\d*)/m";
$str = "-33.873293252 151.201538015999972,-33.873175 151.201689183999946";
$subst = "$1";
$result = preg_replace($re, $subst, $str);
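Both patterns above cut digits off. If rounding is wanted instead, which the question does not say, a callback can hand each number to number_format. This is a different technique from the answers, and roundDecimals and its simple number pattern are assumptions for the sketch.

```php
<?php
// Hypothetical helper: round every decimal number in the text to $places.
function roundDecimals(string $text, int $places = 2): string
{
    return preg_replace_callback(
        '/-?\d+\.\d+/', // assumed number format: optional minus, digits, dot, digits
        function (array $m) use ($places): string {
            // '.' as decimal point, no thousands separator.
            return number_format((float) $m[0], $places, '.', '');
        },
        $text
    );
}

echo roundDecimals("-33.873293252 151.201538015999972,-33.873175 151.201689183999946"), PHP_EOL;
// -33.87 151.20,-33.87 151.20
```

The difference from truncation shows up on inputs such as 1.999, which this version turns into 2.00.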

Regex in PHP: Replacing text between strings

Okay I have made some progress on a problem I am solving, but need some help with a small glitch.
I need to remove all characters from the filenames in the specific path images/prices/ BEFORE the first digit, except for where there is from_, in which case remove all characters from the filename BEFORE from_.
Examples:
BEFORE AFTER
images/prices/abcde40.gif > images/prices/40.gif
images/prices/UgfVe5559.gif > images/prices/5559.gif
images/prices/wedsxcdfrom_88457.gif > images/prices/from_88457.gif
What I've done:
$pattern = '%images/(.+?)/([^0-9]+?)(from_|)([0-9]+?)\.gif%';
$replace = 'images/\\1/\\3\\4.gif';
$string = "AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD";
$newstring = str_ireplace('from_','733694521548',$string);
while(preg_match($pattern,$newstring)){
$newstring=preg_replace($pattern,$replace,$newstring);
}
$newstring=str_ireplace('733694521548','from_',$newstring);
echo "Original:\n$string\n\nNew:\n$newstring";
My expected output is:
AAA images/prices/40.gif BBB images/prices/from_88457.gif CCC images/prices/5559.gif DDD
But instead I am getting:
AAA images/prices/40.gif BBB images/from_88457.gif CCC images/5559.gif DDD
The prices/ part of the path is missing from the last two paths.
Note that the AAA, BBB etc. portions are just placeholders. In reality the paths are scattered all across a raw HTML file parsed into a string, so we cannot rely on any pattern in between occurrences of the text to be replaced.
Also, I know the method I am using of substituting from_ is hacky, but this is purely for a local file operation and not for a production server, so I am okay with it. However if there is a better way, I am all ears!
Thanks for any assistance.
You can use lookaround assertions:
preg_replace('~(?<=/)(?:([a-z]+)(?=\d+\.gif)|(\w+)(?=from_))~i', '', $value);
Explanation:
(?<=/) # If preceded by a '/':
(?: # Begin group
([a-z]+) # Match letters a-z, one or more times
(?=\d+\.gif) # If followed by digit(s) and '.gif'
| # OR
(\w+) # Match word characters, one or more times
(?=from_) # If followed by 'from_'
) # End group
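The commented layout above can be kept in the code itself with the x modifier, which makes PCRE ignore whitespace and # comments inside the pattern. The sketch below does that for the question's test string; the capture groups are dropped because the replacement does not use them, and the wrapper name stripPrefixes is made up.

```php
<?php
// Hypothetical wrapper: same lookaround idea, written in extended (x) mode.
function stripPrefixes(string $text): string
{
    $pattern = '~
        (?<=/)              # preceded by a slash
        (?:
            [a-z]+          # letters only ...
            (?=\d+\.gif)    # ... directly followed by digits and .gif
          |
            \w+             # or word characters ...
            (?=from_)       # ... directly followed by from_
        )
    ~ix';
    return preg_replace($pattern, '', $text);
}

$string = "AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD";
echo stripPrefixes($string), PHP_EOL;
// AAA images/prices/40.gif BBB images/prices/from_88457.gif CCC images/prices/5559.gif DDD
```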
Code:
$pattern = '~(?<=/)(?:([a-z]+)(?=\d+\.gif)|(\w+)(?=from_))~i';
echo preg_replace($pattern, '', $string);
You can use this regex for replacement:
^(images/prices/)\D*?(from_)?(\d+\..+)$
And use this expression for replacement:
$1$2$3
Code:
$re = '~^(images/prices/)\D*?(from_)?(\d+\..+)$~m';
$str = "images/prices/abcde40.gif\nimages/prices/UgfVe5559.gif\nimages/prices/wedsxcdfrom_88457.gif";
$result = preg_replace($re, '$1$2$3', $str);
You can try with Lookaround as well. Just replace with blank string.
(?<=^images\/prices\/).*?(?=(from_)?\d+\.gif$)
regex101 demo
Sample code: (directly from above site)
$re = "/(?<=^images\\/prices\\/).*?(?=(from_)?\\d+\\.gif$)/m";
$str = "images/prices/abcde40.gif\nimages/prices/UgfVe5559.gif\nimages/prices/wedsxcdfrom_88457.gif";
$subst = '';
$result = preg_replace($re, $subst, $str);
If the string is not multi-line, use \b as a word boundary instead of ^ and $ (which match the start and end of the line/string):
(?<=\bimages\/prices\/).*?(?=(from_)?\d+\.gif\b)
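For paths scattered through a larger string, as in the question, the \b variant can be tried like this. One change is my own assumption and not part of the answer: .*? is swapped for [^\s/]*? so that a match cannot run across whitespace into a later path when a filename has no digits. The function name stripPricePrefix is also made up.

```php
<?php
// Hypothetical helper: strip the filename prefix before the digits or "from_".
function stripPricePrefix(string $text): string
{
    // Lookbehind anchors the match right after "images/prices/";
    // the lookahead requires an optional "from_", digits and ".gif" to follow.
    return preg_replace('~(?<=\bimages/prices/)[^\s/]*?(?=(?:from_)?\d+\.gif\b)~', '', $text);
}

$string = "AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD";
echo stripPricePrefix($string), PHP_EOL;
// AAA images/prices/40.gif BBB images/prices/from_88457.gif CCC images/prices/5559.gif DDD
```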
$arr = array(
'images/prices/abcde40.gif',
'images/prices/UgfVe5559.gif',
'images/prices/wedsxcdfrom_88457.gif'
);
foreach($arr as $str){
echo preg_replace('#images/prices/.*?((from_|\d).*)#i','images/prices/$1',$str);
}
EDIT:
$str = 'AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD';
echo preg_replace('#images/prices/.*?((from_|\d).*?\s|$)#i','images/prices/$1',$str), PHP_EOL;

Splitting a string using lookback in PHP

I have a product feed where the product options are formatted like this:
Color{1} : Black[14], White[42] Size{2} : Small[16], Medium[17], Large[18]
For my script to understand and parse the product options correctly, it needs to be in the following format:
Color:Black,White|Size:Small,Medium,Large
I started out like this to remove unnecessary information:
$matches[1] = preg_replace("/\{\d{1,}\} : /", ': ', $matches[1]);
$matches[1] = preg_replace("/\[\d{1,}\]/", '', $matches[1]);
Which gives this output:
Color: Black, White Size: Small, Medium, Large
But my problem now is how to insert a pipe before each option name, unless there is only one option or it is the first option. I guess I need to use some sort of lookbehind, but I have no idea how.
First, split the string into several individual options using preg_split():
$arr = preg_split('/\s+(?=[a-z]+{\d+})/i', $str);
(?=[a-z]+{\d+}) is a positive lookahead that asserts that the whitespace (\s+) is followed by a string of the format <string>{xx}. It's used here to pinpoint on which spaces the split should happen. It's important to note that the lookahead assertion is zero-width, i.e. it doesn't consume any characters at all.
Once you have the split array, loop through it, and remove {xx}, [xx] parts and whitespace:
foreach ($arr as &$str) {
    $str = preg_replace('/(?:{\d+}|\[\d+\]|\s*)/', '', $str);
}
unset($str); // break the reference left over from the loop
Join the array by |:
echo join('|', $arr);
Output:
Color:Black,White|Size:Small,Medium,Large
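Put together, the three steps above fit in one small function. This is a sketch; the name normalizeOptions is made up, the braces are escaped for readability, and array_map is used instead of the by-reference loop.

```php
<?php
// Hypothetical helper: "Color{1} : Black[14], ..." => "Color:Black,...|Size:..."
function normalizeOptions(string $feed): string
{
    // Step 1: split on whitespace that is followed by "<name>{<digits>}".
    $groups = preg_split('/\s+(?=[a-z]+\{\d+\})/i', $feed);

    // Step 2: remove the {n} and [n] markers and all whitespace in each group.
    $groups = array_map(
        function (string $group): string {
            return preg_replace('/\{\d+\}|\[\d+\]|\s+/', '', $group);
        },
        $groups
    );

    // Step 3: join the groups with a pipe.
    return implode('|', $groups);
}

echo normalizeOptions('Color{1} : Black[14], White[42] Size{2} : Small[16], Medium[17], Large[18]'), PHP_EOL;
// Color:Black,White|Size:Small,Medium,Large
```

With a single option group there is nothing to split, so no pipe is added.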
This method uses only two regex substitutions.
First, delete all spaces, along with the digits and the braces or brackets around them:
$re = "/(.\\d+.|[ ]+)/";
$str = "Color{1} : Black[14], White[42] Size{2} : Small[16], Medium[17], Large[18]";
$subst = '';
$result = preg_replace($re, $subst, $str);
Then add in the pipe
$re = "/([a-z])([A-Z])/";
$subst = '\1|\2';
$endresult = preg_replace($re, $subst, $result);
Input:
Color{1} : Black[14], White[42] Size{2} : Small[16], Medium[17], Large[18]
Output:
Color:Black,White|Size:Small,Medium,Large
Note: I'm assuming that the digits are always surrounded by curly braces or brackets without any spacing in between, and that the option names consist only of alphabetic characters (never digits).
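The two passes can be wrapped up as below to see where those assumptions matter. The function name is hypothetical. The second call shows a case the note does not mention: a value made of two capitalised words also produces a lowercase-uppercase seam and so gets a pipe.

```php
<?php
// Hypothetical wrapper around the two substitutions described above.
function normalizeOptionsTwoPass(string $feed): string
{
    // Pass 1: drop spaces, and any run of digits together with the one
    // character on each side of it (the surrounding braces or brackets).
    $stripped = preg_replace('/.\d+.|[ ]+/', '', $feed);

    // Pass 2: a lowercase letter directly followed by an uppercase letter
    // is taken as the seam between two option groups.
    return preg_replace('/([a-z])([A-Z])/', '$1|$2', $stripped);
}

echo normalizeOptionsTwoPass('Color{1} : Black[14], White[42] Size{2} : Small[16], Medium[17], Large[18]'), PHP_EOL;
// Color:Black,White|Size:Small,Medium,Large

echo normalizeOptionsTwoPass('Color{1} : Dark Blue[14]'), PHP_EOL;
// Color:Dark|Blue
```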
