Regex in PHP: Replacing text between strings


Okay, I have made some progress on a problem I am solving, but I need some help with a small glitch.
I need to remove all characters from the filenames in the specific path images/prices/ BEFORE the first digit, except for where there is from_, in which case remove all characters from the filename BEFORE from_.
Examples:
BEFORE AFTER
images/prices/abcde40.gif > images/prices/40.gif
images/prices/UgfVe5559.gif > images/prices/5559.gif
images/prices/wedsxcdfrom_88457.gif > images/prices/from_88457.gif
What I've done:
$pattern = '%images/(.+?)/([^0-9]+?)(from_|)([0-9]+?)\.gif%';
$replace = 'images/\\1/\\3\\4.gif';
$string = "AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD";
$newstring = str_ireplace('from_','733694521548',$string);
while(preg_match($pattern,$newstring)){
$newstring=preg_replace($pattern,$replace,$newstring);
}
$newstring=str_ireplace('733694521548','from_',$newstring);
echo "Original:\n$string\n\nNew:\n$newstring";
My expected output is:
AAA images/prices/40.gif BBB images/prices/from_88457.gif CCC images/prices/5559.gif DDD
But instead I am getting:
AAA images/prices/40.gif BBB images/from_88457.gif CCC images/5559.gif DDD
The prices/ part of the path is missing from the last two paths.
Note that the AAA, BBB etc. portions are just placeholders. In reality the paths are scattered all across a raw HTML file parsed into a string, so we cannot rely on any pattern in between occurrences of the text to be replaced.
Also, I know my method of substituting from_ is hacky, but this is purely a local file operation, not something for a production server, so I am okay with it. However, if there is a better way, I am all ears!
Thanks for any assistance.
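Why the prices/ segment disappears: (.+?) is lazy, but nothing stops it from running across spaces and slashes. Once the first pass has cleaned the file names, the while loop keeps finding matches in which (.+?) stretches from one images/ to a later one (for example prices/40.gif BBB images), so the prices/ of the following path lands in the discarded group \2. A minimal sketch of one possible fix, assuming directory names never contain spaces or slashes: limit that part to [^/\s]+ and make from_ an optional group. This is our adjustment of the question's pattern, not code from the question or the answers.

```php
<?php
// Adjusted version of the question's pattern (an assumption, not the original):
// [^/\s]+ keeps the directory name inside one path, and the lazy
// [^/\s0-9]*? prefix stops at the first "from_" or digit, so neither the
// while loop nor the 733694521548 placeholder is needed.
$pattern = '%images/([^/\s]+)/[^/\s0-9]*?(from_)?([0-9]+)\.gif%';
$replace = 'images/$1/$2$3.gif'; // an unmatched (from_)? group is replaced by ""
$string  = "AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD";
$newstring = preg_replace($pattern, $replace, $string);
echo "Original:\n$string\n\nNew:\n$newstring\n";
```

Because each match is confined to a single path, one preg_replace call handles every path in the HTML string.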

You can use lookaround assertions:
preg_replace('~(?<=/)(?:([a-z]+)(?=\d+\.gif)|(\w+)(?=from_))~i', '', $value);
Explanation:
(?<=/) # If preceded by a '/':
(?: # Begin group
([a-z]+) # Match letters a-z (either case, because of the i flag), one or more times
(?=\d+\.gif) # If followed by digit(s) and '.gif'
| # OR
(\w+) # Match word characters, one or more times
(?=from_) # If followed by 'from_'
) # End group
Code:
$pattern = '~(?<=/)(?:([a-z]+)(?=\d+\.gif)|(\w+)(?=from_))~i';
echo preg_replace($pattern, '', $string);

You can use this regex to match:
^(images/prices/)\D*?(from_)?(\d+\..+)$
And use this expression as the replacement:
$1$2$3
Code:
$re = '~^(images/prices/)\D*?(from_)?(\d+\..+)$~m';
$str = "images/prices/abcde40.gif\nimages/prices/UgfVe5559.gif\nimages/prices/wedsxcdfrom_88457.gif";
$result = preg_replace($re, '$1$2$3', $str);

You can also try lookarounds and simply replace the match with an empty string.
(?<=^images\/prices\/).*?(?=(from_)?\d+\.gif$)
Sample code:
$re = "/(?<=^images\\/prices\\/).*?(?=(from_)?\\d+\\.gif$)/m";
$str = "images/prices/abcde40.gif\nimages/prices/UgfVe5559.gif\nimages/prices/wedsxcdfrom_88457.gif";
$subst = '';
$result = preg_replace($re, $subst, $str);
If the paths sit inside one long string rather than one per line (as in the question), use \b word boundaries instead of the ^ and $ anchors, which only match at the start and end of a line or string:
(?<=\bimages\/prices\/).*?(?=(from_)?\d+\.gif\b)
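A quick way to check the \b variant against the inline string from the question (a sketch; only the pattern comes from this answer, and it is written in a single-quoted PHP string so the backslashes need no doubling):

```php
<?php
// Every match is the junk prefix of a file name under images/prices/,
// so replacing it with '' leaves "from_" and the digits in place.
$re  = '/(?<=\bimages\/prices\/).*?(?=(from_)?\d+\.gif\b)/';
$str = "AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD";
$result = preg_replace($re, '', $str);
echo $result, PHP_EOL;
```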

$arr = array(
'images/prices/abcde40.gif',
'images/prices/UgfVe5559.gif',
'images/prices/wedsxcdfrom_88457.gif'
);
foreach($arr as $str){
echo preg_replace('#images/prices/.*?((from_|\d).*)#i','images/prices/$1',$str), PHP_EOL;
}
EDIT:
$str = 'AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD';
echo preg_replace('#images/prices/.*?((from_|\d).*?(?:\s|$))#i','images/prices/$1',$str), PHP_EOL;

Related

How to not perform preg_replace if subject starts with quote

I'm trying to convert plain links to HTML links using preg_replace. However, it's also replacing links that are already converted.
To combat this I'd like it to ignore the replacement if the link starts with a quote.
I think a positive lookahead may be needed but everything I've tried hasn't worked.
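$string = '<a href="http://www.example.com">test</a> http://www.example.com';
$string = preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", '<a href="$1">$1</a>', $string);
var_dump($string);
The above outputs:
<a href="<a href="http://www.example.com">http://www.example.com</a>">test</a> <a href="http://www.example.com">http://www.example.com</a>
When it should output:
<a href="http://www.example.com">test</a> <a href="http://www.example.com">http://www.example.com</a>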

You might get along with lookarounds.
Lookarounds are zero-width assertions that make sure to match/not to match anything immediately around the string in question. They do not consume any characters.
That being said, a negative lookbehind might be what you need in your situation:
(?<![">])\bhttps?://\S+\b
In PHP this would be:
<?php
$string = 'I want to be transformed to a proper link: http://www.google.com ';
$string .= 'But please leave me alone ';
$string .= '(https://www.google.com).';
$regex = '~ # delimiter
(?<![">]) # a neg. lookbehind
https?://\S+ # http:// or https:// followed by not a whitespace
\b # a word boundary
~x'; # verbose to enable this explanation.
$string = preg_replace($regex, "<a href='$0'>$0</a>", $string);
echo $string;
?>
However, an HTML parser might be more appropriate.
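A small sketch of the lookbehind idea on input shaped like the question's (an existing anchor followed by a bare URL). One assumption of ours: the replacement puts double quotes around the href, so its own output is also skipped by (?<![">]) and running it twice changes nothing. It remains a heuristic; a URL in a single-quoted attribute, for example, would not be protected.

```php
<?php
// Existing anchor first, bare URL second.
$html  = '<a href="http://www.example.com">test</a> http://www.example.com';
// Skip URLs directly preceded by " (inside href="...") or > (anchor text).
$regex = '~(?<![">])https?://\S+\b~';
$linked = preg_replace($regex, '<a href="$0">$0</a>', $html);
echo $linked, PHP_EOL;
// A second pass finds nothing new: every URL is now preceded by " or >.
echo preg_replace($regex, '<a href="$0">$0</a>', $linked), PHP_EOL;
```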

Since preg_replace accepts arrays of patterns and replacements, this might be convenient, depending on what you want to achieve:
<?php
$string = 'test http://www.example.com';
$rx = array("&(<a.+https?:\/\/[\w]+[^ \,\"\n\r\t<]*>)(.*)(<\/a\>)&si", "&(\s){1,}(https?:\/\/[\w]+[^ \,\"\n\r\t<]*)&");
$rp = array("$1$2$3", "$2");
$string = preg_replace($rx,$rp, $string);
var_dump($string);
// DUMPS:
// 'testhttp://www.example.com'

The Idea
You can split your string at the already existing anchors, and only parse the pieces in between.
The Code
$input = '<a href="http://www.example.com">test</a> http://www.example.com';
// Split the string at existing anchors
// PREG_SPLIT_DELIM_CAPTURE includes the delimiters in the result set;
// it is the fourth (flags) argument, after the limit (-1 = no limit)
$parts = preg_split('/(<a.*?>.*?<\/a>)/is', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
// Use array_map to parse each piece, and then join all pieces together
$output = implode('', array_map(function ($key, $part) {
// Because we return the delimiter in the result set,
// every $part with an odd key is an existing anchor: leave it untouched
return $key % 2
? $part
: preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", '<a href="$1">$1</a>', $part);
}, array_keys($parts), $parts));
echo $output;

Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning long and I have hit a wall. In the following string I want to match every forward slash that follows .com/<first_word>, with the exception of any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because a lookbehind cannot contain quantifiers (it must have a fixed length) and is a zero-width assertion, I am wondering if I have gone down the wrong path and should try another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.

If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
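This replaces each / with |, starting at the first / after the leading .com/ segment and stopping at the first whitespace.
The negative lookbehind (?<!^) is needed so that \G cannot match at the very start of the string; without it, a string that does not start with a .com URL, such as /foo/bar/baz/abcd, would also have its slashes replaced.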
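A sketch that runs this answer's pattern on the question's string and on a path that does not start with a .com URL, to show what the (?<!^) guard does:

```php
<?php
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
// Slashes inside the URL are replaced; the one in "Input/Output" is not.
$url  = preg_replace($re, '|', "http://example.com/foo/12/jacket Input/Output");
// No .com entry point and \G may not start at position 0, so nothing changes.
$path = preg_replace($re, '|', "/foo/bar/baz/abcd");
echo $url, PHP_EOL;  // http://example.com/foo|12|jacket Input/Output
echo $path, PHP_EOL; // /foo/bar/baz/abcd
```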

Use \K here along with \G, and grab the slashes from capture group 1.
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
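Replace (here the slash has to come after \K: with \K at the very end, every match is empty, so preg_replace would only insert | instead of replacing the slashes):
$re = "/(?:^.*?\\.com\\/\\w+|\\G(?!^)\\w+)\\K\\//m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);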

Another \G and \K based idea.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
The (?: non-capturing group sets the entry point ^\S+\.com/\w, or glues each following match to the previous one with \G(?!^).
\w*+\K/ possessively matches any number of word characters up to a slash; \K resets the start of the match so only the / is replaced.
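The same kind of check for this variant (a sketch using the question's string plus a hypothetical relative path, foo/bar/baz, that has no .com entry point):

```php
<?php
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
// Entry point "http://example.com/f", then \G chains through "oo/", "12/".
$url  = preg_replace($re, '|', "http://example.com/foo/12/jacket Input/Output");
// Without ".com/" the first branch fails and \G(?!^) cannot start at offset 0.
$path = preg_replace($re, '|', "foo/bar/baz");
echo $url, PHP_EOL;  // http://example.com/foo|12|jacket Input/Output
echo $path, PHP_EOL; // foo/bar/baz
```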

PHP Replace consecutive occurrence of characters in sentence

I want to shorten every run of identical consecutive characters in each WORD to three when it is longer than three (three being the maximum possible in German, two in English, so I know the output example is not correct English).
Example input:
Hellooooo Louis, whaaaaaat's up pal?
Expected output:
Hellooo Louis, whaaat's up pal?
I tried to change:
preg_replace('/(\w)\1+/', '$1', $word);
to
preg_replace('/(\w)\3+/', '$1', $word);
However, it doesn't output anything.

You can use the following regex:
((\w)\2{2})\2+
Replace with $1.
Code:
$re = "#((\w)\\2{2})\\2+#";
$str = "Hellooooo Louis, whaaaaaat's up pal?";
$subst = "$1";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
Hellooo Louis, whaaat's up pal?
EXPLANATION:
We capture the character with (\w); this is Group 2. Then \2{2} checks that it is followed by exactly two more copies of the same character, and those three characters together are captured into Group 1. Finally, \2+ matches any further identical characters, which are dropped when the whole match is replaced with $1.

Here is a way to go:
preg_replace('/((\w)\2\2)\2+/', '$1', $word);

You can also use \K to reset the match start after the first three characters and replace the rest with an empty string, which is a bit more efficient:
(\w)\1\1\K\1+
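A side-by-side sketch of the backreference version and the \K version on the sample sentence; both should produce the expected output from the question:

```php
<?php
$str = "Hellooooo Louis, whaaaaaat's up pal?";
// Keep three identical word characters (group 1) and drop the rest of the run.
$viaGroup = preg_replace('/((\w)\2\2)\2+/', '$1', $str);
// Match three, reset the match start with \K, and delete only the surplus.
$viaK = preg_replace('/(\w)\1\1\K\1+/', '', $str);
echo $viaGroup, PHP_EOL; // Hellooo Louis, whaaat's up pal?
echo $viaK, PHP_EOL;     // Hellooo Louis, whaaat's up pal?
```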

Change URL and append file extension with REGEX

I've been reading up on the RegEx docs, but I must say I'm still a bit out of my element, so I apologize for not posting what I have tried; it was all just plain wrong.
Here's the issue:
I've got images using the following source:
src="http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi"
I need to get to this:
src="http://newsite.com/wp-content/uploads/2014/07/6a015433877b2b970c01a3fd22309b970b-800wi.jpg"
Essentially, I need to swap the domain for the new upload path, remove the /.a/ from the URL, and append .jpg to the end of the image file name. If it helps, I'm using this plug-in: http://urbangiraffe.com/plugins/search-regex/
Thanks All.

This might help you.
(?<=src="http:\/\/)samplesite\/\.a\/([^"]*)
Sample code:
$re = "/(?<=src=\"http:\/\/)samplesite\/\.a\/([^\"]*)/";
$str = "src=\"http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi\"";
$subst = 'newsite.com/wp-content/uploads/2014/07/$1.jpg';
$result = preg_replace($re, $subst, $str);
Output:
src="http://newsite.com/wp-content/uploads/2014/07/6a015433877b2b970c01a3fd22309b970b-800wi.jpg"
Pattern Description:
(?<= look behind to see if there is:
src="http: 'src="http:'
\/ '/'
\/ '/'
) end of look-behind
samplesite 'samplesite'
\/ '/'
\. '.'
a 'a'
\/ '/'
( group and capture to \1:
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
) end of \1
You can also do it without a positive lookbehind, by capturing the prefix and putting it back with $1:
(src="http:\/\/)samplesite\/\.a\/([^"]*)
Sample code:
$re = "/(src=\"http:\/\/)samplesite\/\.a\/([^\"]*)/";
$str = "src=\"http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi\"";
$subst = '$1newsite.com/wp-content/uploads/2014/07/$2.jpg';
$result = preg_replace($re, $subst, $str);

You can use this:
$replaced = preg_replace('~src="http://samplesite/\.a/([^"]+)"~',
'src="http://newsite.com/wp-content/uploads/2014/07/\1.jpg"',
$yourstring);
Explanation
([^"]+) captures one or more characters other than " into Group 1.
\1 inserts Group 1 in the replacement.
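A sketch applying this replacement to a hypothetical HTML fragment with two old-style image URLs (the second file name, abc123-320wi, is made up for illustration). The target path wp-content/uploads/2014/07 is hardcoded exactly as in the question, so images from other months would need a different replacement.

```php
<?php
// Hypothetical input: two <img> tags pointing at the old site.
$html = '<img src="http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi"> '
      . '<img src="http://samplesite/.a/abc123-320wi">';
// Group 1 is the file name; \1 puts it back with ".jpg" appended.
$replaced = preg_replace(
    '~src="http://samplesite/\.a/([^"]+)"~',
    'src="http://newsite.com/wp-content/uploads/2014/07/\1.jpg"',
    $html
);
echo $replaced, PHP_EOL;
```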

CSS or PHP add space in number format

How can I add a space after the first 3 characters and then after the next 4?
I have this number: +4420719480
The result needs to be: +44 2071 9480
How can I add the spaces with css or php after 4 characters?
I have tried the following code:
$str = "+4420719480";
echo chunk_split($str, 4, ' ');
But how do I add the space to the first 3 characters and then to the 4th?

You can use preg_replace
$str = '+4420719480';
echo preg_replace('~^.{3}|.{4}(?!$)~', '$0 ', $str);
pattern explanation:
~ # pattern delimiter
^.{3} # any character 3 times at the start of the string
| # OR
.{4} # any character 4 times
(?!$) # not followed by the end of the string
~ # pattern delimiter
replacement: '$0 ' (the whole pattern and a space)
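To see that the pattern does not depend on the string length, here it is wrapped in a hypothetical helper, formatNumber(), and tried on the number from the question plus two made-up longer inputs:

```php
<?php
// First 3 characters, then groups of 4; (?!$) avoids a trailing space.
function formatNumber(string $str): string
{
    return preg_replace('~^.{3}|.{4}(?!$)~', '$0 ', $str);
}

echo formatNumber('+4420719480'), PHP_EOL;     // +44 2071 9480
echo formatNumber('+44207194801234'), PHP_EOL; // +44 2071 9480 1234
echo formatNumber('+442071948012'), PHP_EOL;   // +44 2071 9480 12
```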

Sometimes the most mundane solution will do the job just fine.
$str = "+4420719480";
$new = substr($str,0,3).' '.substr($str,3,4).' '.substr($str,7);

Using your code, you can do:
$str = "+4420719480";
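echo ltrim(strrev(chunk_split(strrev($str), 4, ' ')));
Kinda clunky and only works for this size of $str, but it works! (ltrim() removes the leading space that comes from the separator chunk_split() appends after its last chunk.)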
