Change URL and append file extension with REGEX

I've been reading up on RegEx docs, but I'm still a bit out of my element, so I apologize for not posting what I have tried; it was all just plain wrong.
Here's the issue:
I've got images using the following source:
src="http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi"
I need to get to this:
src="http://newsite.com/wp-content/uploads/2014/07/6a015433877b2b970c01a3fd22309b970b-800wi.jpg"
Essentially, I need to swap the old host and /.a/ segment for the new upload path and append .jpg to the end of the image file name. If it helps, I'm using this plug-in: http://urbangiraffe.com/plugins/search-regex/
Thanks All.

This might help you.
(?<=src="http:\/\/)samplesite\/\.a\/([^"]*)
Online demo
Sample code:
$re = "/(?<=src=\"http:\/\/)samplesite\/\.a\/([^\"]*)/";
$str = "src=\"http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi\"";
$subst = 'newsite.com/wp-content/uploads/2014/07/$1.jpg';
$result = preg_replace($re, $subst, $str);
Output:
src="http://newsite.com/wp-content/uploads/2014/07/6a015433877b2b970c01a3fd22309b970b-800wi.jpg"
Pattern Description:
(?<= look behind to see if there is:
src="http: 'src="http:'
\/ '/'
\/ '/'
) end of look-behind
samplesite 'samplesite'
\/ '/'
\. '.'
a 'a'
\/ '/'
( group and capture to \1:
[^"]* any character except: '"' (0 or more times (matching the most amount possible))
) end of \1
You can try it without using Positive Lookbehind as well
(src="http:\/\/)samplesite\/\.a\/([^"]*)
Online demo
Sample code:
$re = "/(src=\"http:\/\/)samplesite\/\.a\/([^\"]*)/";
$str = "src=\"http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi\"";
$subst = '$1newsite.com/wp-content/uploads/2014/07/$2.jpg';
$result = preg_replace($re, $subst, $str);

You can use this:
$replaced = preg_replace('~src="http://samplesite/\.a/([^"]+)"~',
'src="http://newsite.com/wp-content/uploads/2014/07/\1.jpg"',
$yourstring);
Explanation
([^"]+) matches any characters that are not a " to Group 1
\1 inserts Group 1 in the replacement.
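
A minimal sketch of how the replacement above might be applied to a larger chunk of markup. The function name rewrite_image_src and the sample $html are made up for illustration; the pattern and replacement come from the answer, with $1 written in place of \1 (preg_replace accepts both forms).

```php
<?php
// Hypothetical helper: rewrites old image URLs to the new upload path.
function rewrite_image_src(string $html): string
{
    return preg_replace(
        '~src="http://samplesite/\.a/([^"]+)"~',                       // Group 1 = file name without extension
        'src="http://newsite.com/wp-content/uploads/2014/07/$1.jpg"',  // re-insert Group 1 and append .jpg
        $html
    );
}

// Assumed sample input: one image on the old host, one unrelated image.
$html = '<img src="http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi"> '
      . '<img src="http://other/pic.png">';

echo rewrite_image_src($html), PHP_EOL;
```

Only the src attribute that points at samplesite/.a/ is rewritten; the second image is left as it is because the pattern does not match it.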

Related

Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning long and I have hit the wall. In the following string I want to match every forward slash which follows .com/<first_word>, with the exception of any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because the lookbehind can have no modifiers and needs to be a zero length assertion I am wondering if I have just tripped down the wrong path and should seek another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.

If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
This replaces each / with a | after the first / that follows the leading .com.
The negative lookbehind (?<!^) stops \G from matching at the very start of the string, so an input with no leading .com part, such as /foo/bar/baz/abcd, is left untouched.
RegEx Demo
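
To see both behaviours described in this answer side by side, here is a small sketch that runs the same pattern over two inputs. The second sample string is the "no leading .com" case mentioned above; the $samples array is only there for illustration.

```php
<?php
// Pattern from the answer above: the first branch anchors on the leading
// "...com/", the second continues from the end of the previous match via \G.
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';

$samples = [
    'http://example.com/foo/12/jacket Input/Output',
    '/foo/bar/baz/abcd', // no leading .com part, so nothing should change
];

foreach ($samples as $s) {
    echo preg_replace($re, '|', $s), PHP_EOL;
}
```

The chain of \G matches breaks at the first space, which is why the slash in Input/Output is never reached.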

Use \K here along with \G and grab the groups.
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
Replace
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);

Another \G and \K based idea.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
The (?: non-capturing group either sets the entry point ^\S+\.com/\w or glues the match to the previous one with \G(?!^).
\w*+\K/ possessively matches any number of word characters up to a slash. \K resets the start of the reported match, so only the slash is matched.
See demo at regex101

How can I remove a specific format from string with RegEx?

I have a list of string like this
$16,500,000(#$2,500)
$34,000(#$11.00)
$214,000(#$18.00)
$12,684,000(#$3,800)
How can I extract all symbols and the (#$xxxx) from these strings so that they can be like
16500000
34000
214000
12684000

\(.*?\)|\$|,
Try this. Replace with an empty string. See demo.
https://regex101.com/r/vD5iH9/42
$re = "/\\(.*?\\)|\\$|,/m";
$str = "\$16,500,000(#\$2,500)\n\$34,000(#\$11.00)\n\$214,000(#\$18.00)\n\$12,684,000(#\$3,800)";
$subst = "";
$result = preg_replace($re, $subst, $str);

To remove the end (#$xxxx) characters, you could use the regex:
\(\#\$.+\)
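and replace it with nothing:
preg_replace('/\(#\$.+\)/', '', $myStringToReplaceWith);
PHP's preg_replace replaces every match by default, so no g (global) modifier is needed; PHP rejects g as an unknown modifier. Use single quotes so that \$ reaches the regex engine as an escaped dollar sign.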
Here's a breakdown of that regex:
\( matches the ( character literally
\# matches the # character literally
\$ matches the $ character literally
.+ matches any character 1 or more times
\) matches the ) character literally
Here's a live example on regex101.com
In order to remove all of these characters:
$ , ( ) # .
From a string, you could use the regex:
\$|\,|\(|\)|#|\.
Which will match all of the characters above.
The | character above is the regex or operator, effectively making it so
$ OR , OR ( OR ) OR # OR . will be matched.
Next, you could replace it with nothing using preg_replace, which already replaces every match rather than stopping at the first one:
preg_replace('/\$|,|\(|\)|#|\./', '', $myStringToReplaceWith);
Here's a live example on regex101.com
So in the end, your code could look like this:
$str = preg_replace('/\(#\$.+\)/', '', $str);
$str = preg_replace('/\$|,|\(|\)|#|\./', '', $str);
Although it isn't in one regex, it does not use any look-ahead, or look-behind (both of which are not bad, by the way).
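
As a rough sketch, the two steps can also be folded into a single call, since preg_replace accepts an array of subjects and returns an array. The variable names $prices and $clean are assumptions for illustration; the input values are the ones from the question.

```php
<?php
// Sample values from the question.
$prices = [
    '$16,500,000(#$2,500)',
    '$34,000(#$11.00)',
    '$214,000(#$18.00)',
    '$12,684,000(#$3,800)',
];

// \(.*?\) removes the parenthesised part; [$,] removes dollar signs and commas.
// Single quotes keep PHP from treating $ as the start of a variable.
$clean = preg_replace('/\(.*?\)|[$,]/', '', $prices);

print_r($clean);
```

Each element ends up as a plain digit string (for example 16500000 for the first entry), so it can be cast to an integer afterwards if needed.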

Regex in PHP: Replacing text between strings

Okay I have made some progress on a problem I am solving, but need some help with a small glitch.
I need to remove all characters from the filenames in the specific path images/prices/ BEFORE the first digit, except for where there is from_, in which case remove all characters from the filename BEFORE from_.
Examples:
BEFORE AFTER
images/prices/abcde40.gif > images/prices/40.gif
images/prices/UgfVe5559.gif > images/prices/5559.gif
images/prices/wedsxcdfrom_88457.gif > images/prices/from_88457.gif
What I've done:
$pattern = '%images/(.+?)/([^0-9]+?)(from_|)([0-9]+?)\.gif%';
$replace = 'images/\\1/\\3\\4.gif';
$string = "AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD";
$newstring = str_ireplace('from_','733694521548',$string);
while(preg_match($pattern,$newstring)){
$newstring=preg_replace($pattern,$replace,$newstring);
}
$newstring=str_ireplace('733694521548','from_',$newstring);
echo "Original:\n$string\n\nNew:\n$newstring";
My expected output is:
AAA images/prices/40.gif BBB images/prices/from_88457.gif CCC images/prices/5559.gif DDD
But instead I am getting:
AAA images/prices/40.gif BBB images/from_88457.gif CCC images/5559.gif DDD
The prices/ part of the path is missing from the last two paths.
Note that the AAA, BBB etc. portions are just placeholders. In reality the paths are scattered all across a raw HTML file parsed into a string, so we cannot rely on any pattern in between occurrences of the text to be replaced.
Also, I know the method I am using of substituting from_ is hacky, but this is purely for a local file operation and not for a production server, so I am okay with it. However if there is a better way, I am all ears!
Thanks for any assistance.

You can use lookaround assertions:
preg_replace('~(?<=/)(?:([a-z]+)(?=\d+\.gif)|(\w+)(?=from_))~i', '', $value);
Explanation:
(?<=/) # If preceded by a '/':
(?: # Begin group
([a-z]+) # Match letters a-z, one or more times
(?=\d+\.gif) # If followed by digit(s) and '.gif'
| # OR
(\w+) # Match word characters, one or more times
(?=from_) # If followed by 'from_'
) # End group
Code:
$pattern = '~(?<=/)(?:([a-z]+)(?=\d+\.gif)|(\w+)(?=from_))~i';
echo preg_replace($pattern, '', $string);
Demo

You can use this regex for replacement:
^(images/prices/)\D*?(from_)?(\d+\..+)$
And use this expression for replacement:
$1$2$3
RegEx Demo
Code:
$re = '~^(images/prices/)\D*?(from_)?(\d+\..+)$~m';
$str = "images/prices/abcde40.gif\nimages/prices/UgfVe5559.gif\nimages/prices/wedsxcdfrom_88457.gif";
$result = preg_replace($re, '$1$2$3', $str);

You can try with Lookaround as well. Just replace with blank string.
(?<=^images\/prices\/).*?(?=(from_)?\d+\.gif$)
regex101 demo
Sample code: (directly from above site)
$re = "/(?<=^images\\/prices\\/).*?(?=(from_)?\\d+\\.gif$)/m";
$str = "images/prices/abcde40.gif\nimages/prices/UgfVe5559.gif\nimages/prices/wedsxcdfrom_88457.gif";
$subst = '';
$result = preg_replace($re, $subst, $str);
If string is not multi-line then use \b as word boundary instead of ^ and $ to match start and end of the line/string.
(?<=\bimages\/prices\/).*?(?=(from_)?\d+\.gif\b)

$arr = array(
'images/prices/abcde40.gif',
'images/prices/UgfVe5559.gif',
'images/prices/wedsxcdfrom_88457.gif'
);
foreach($arr as $str){
echo preg_replace('#images/prices/.*?((from_|\d).*)#i','images/prices/$1',$str);
}
DEMO
EDIT:
$str = 'AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD';
echo preg_replace('#images/prices/.*?((from_|\d).*?\s|$)#i','images/prices/$1',$str), PHP_EOL;
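
For the single-string case from the question, where the paths are scattered through other text, the following is one possible sketch that avoids both the loop and the from_ placeholder trick. The pattern is my own combination of the ideas in the answers above, not taken from any one of them, and it assumes file names contain no slashes or whitespace.

```php
<?php
// Input string from the question.
$string = 'AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif '
        . 'CCC images/prices/UgfVe5559.gif DDD';

// Group 1 keeps the directory. The lazy [^/\s]*? skips the unwanted prefix.
// Group 2 keeps an optional "from_" followed by the digits and ".gif".
$pattern = '~(images/prices/)[^/\s]*?((?:from_)?\d+\.gif)~i';

$newstring = preg_replace($pattern, '$1$2', $string);

echo "Original:\n$string\n\nNew:\n$newstring\n";
```

Because the prefix is matched lazily, the engine tries the optional from_ at every position before consuming another character, so from_ is kept whenever it sits directly in front of the digits.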

Regex PHP - dont match specific string followed by numeric

I'm looping over a large number of files in a directory and want to extract the numeric part of every filename that starts with lin64exe; for instance, lin64exe005458002.17 would yield 005458002.17. I have this part sorted, but the directory also contains other files, such as part005458. How can I make it so I only get the digits (and the .) after lin64exe?
This is what I have so far:
[^lin64exe][^OTHERTHINGSHERE$][0-9]+

The regex to match the number (with its decimal point) that comes right after lin64exe is:
^lin64exe\K\d+\.\d+$
DEMO
<?php
$mystring = "lin64exe005458002.17";
$regex = '~^lin64exe\K\d+\.\d+$~';
if (preg_match($regex, $mystring, $m)) {
$yourmatch = $m[0];
echo $yourmatch;
}
?> //=> 005458002.17

You can try with look around as well
(?<=^lin64exe)\d+(\.\d+)?$
Here is demo
Pattern explanation:
(?<= look behind to see if there is:
^ the beginning of the string
lin64exe 'lin64exe'
) end of look-behind
\d+ digits (0-9) (1 or more times (most possible))
( group and capture to \1 (optional):
\. '.'
\d+ digits (0-9) (1 or more times (most possible))
)? end of \1
$ the end of the string
Note: use i for ignore case
sample code:
$re = "/(?<=^lin64exe)\\d+(\\.\\d+)?$/i";
$str = "lin64exe005458002.17\nlin64exe005458002\npart005458";
preg_match_all($re, $str, $matches);

You can use this regex and use captured group #1 for your number:
^lin64exe\D*([\d.]+)$
RegEx Demo
Code:
$re = '/^lin64exe\D*([\d.]+)$/i';
$str = "lin64exe005458002.17\npart005458";
if ( preg_match($re, $str, $m) )
var_dump ($m[1]);
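
Since the question is about looping over many file names, here is a small sketch of how one of the patterns above might be wrapped for that. The helper name lin64_number and the $files list are assumptions for illustration; the pattern is the \K variant with the decimal part made optional.

```php
<?php
// Hypothetical helper: returns the numeric part after "lin64exe", or null
// when the file name does not have that form.
function lin64_number(string $filename): ?string
{
    // \K drops "lin64exe" from the reported match; the decimal part is optional.
    if (preg_match('~^lin64exe\K\d+(?:\.\d+)?$~', $filename, $m) === 1) {
        return $m[0];
    }
    return null;
}

// Assumed sample names; in practice these could come from scandir().
$files = ['lin64exe005458002.17', 'lin64exe005458002', 'part005458'];

foreach ($files as $file) {
    $number = lin64_number($file);
    echo $file, ' => ', $number ?? '(skipped)', PHP_EOL;
}
```

Note that the attempt in the question, [^lin64exe], is a character class: it matches one character that is not l, i, n, 6, 4, e or x, which is why it cannot exclude or require a whole word.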

PHP regex: Converting and manipulating text into specific HTML

I have the following text:
I'm a link - http://google.com
I need to convert this into the following HTML
<a href="http://google.com">I'm a link</a>
How can I achieve this in PHP? I'm assuming this needs some sort of regex to search for the actual text and link then manipulate the text into the HTML but I wouldn't know where to start, any help would be greatly appreciated.

If it's always like this, you don't really need regex here:
$input = "I'm a link - http://google.com";
list($text, $link) = explode(" - ", $input);
echo "<a href='". $link ."'>". $text ."</a>";

If a regex is needed, here's fully functional code:
<?php
$content = <<<EOT
test
http://google.com
test
EOT;
$content = preg_replace(
'/(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/',
'<a href=\'$0\'>I\'m a link</a>',
$content
);
echo $content;
?>
Example here: http://phpfiddle.io/fiddle/1166866001
If it's always one line of text, though, it would be better to go with 1nflktd's solution.

Try with capturing groups and substitution:
^([^-]*) - (.*)$
DEMO
Sample code:
$re = "/^([^-]*) - (.*)$/i";
$str = "I'm a link - http://google.com";
$subst = '<a href="$2">$1</a>';
$result = preg_replace($re, $subst, $str);
Output:
<a href="http://google.com">I'm a link</a>
Pattern Explanation:
^ the beginning of the string
( group and capture to \1:
[^-]* any character except: '-' (0 or more times)
) end of \1
- ' - '
( group and capture to \2:
.* any character except \n (0 or more times)
) end of \2
$ before an optional \n, and the end of the string
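
If the text or URL can come from user input, it is probably worth escaping both before building the tag. The sketch below is one way to do that; the function name text_to_link and the stricter URL pattern (https?://\S+) are my assumptions, not part of the answers above.

```php
<?php
// Hypothetical helper: turns "label - URL" into an anchor tag, escaping both
// parts. Input that does not have that shape is returned escaped, without a link.
function text_to_link(string $input): string
{
    if (preg_match('~^(.*?) - (https?://\S+)$~', $input, $m) !== 1) {
        return htmlspecialchars($input, ENT_QUOTES);
    }

    return sprintf(
        '<a href="%s">%s</a>',
        htmlspecialchars($m[2], ENT_QUOTES), // URL
        htmlspecialchars($m[1], ENT_QUOTES)  // link text
    );
}

echo text_to_link("I'm a link - http://google.com"), PHP_EOL;
```

With ENT_QUOTES the apostrophe in the link text is emitted as an HTML entity, which browsers render as a normal apostrophe.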
