Exclude link starts with a character from PREG_REPLACE - php

This codes convert any url to clickable link:
$str = preg_replace('/(http[s]?:\/\/[^\s]*)/i', '$1', $str);
How to make it not convert when url starts with [ character? Like this:
[http://google.com

Use a negative lookbehind:
$str = preg_replace('/(?<!\[)(http[s]?:\/\/[^\s]*)/i', '$1', $str);
^^^^^^^
Then, the http... substring that is preceded with [ won't be matched.
You may enhance the pattern as
preg_replace('/(?<!\[)https?:\/\/\S*/i', '$0', $str);
that is: remove the ( and ) (the capturing group) and replace the backreferences from $1 with $0 in the replacement pattern, and mind that [^\s] = \S, but shorter. Also, [s]? = s?.

Related

PHP Regular Expression Exclusion

Here is the sample PHP code:
<?php
$str = '10,000.1 $100,000.1';
$pattern = '/(?!\$)\d+(,\d{3})*\.?\d*/';
$replacement_str = 'Without$sign';
echo preg_replace($pattern, $replacement_str, $str);?>
Target is to replace numbers only (i.e. "$100,000.1" should not be replaced). But the above code replaces both 10,000.1 and $100,000.1. How to achieve the exclusion?
This assertion is always true (?!\$)\d+ as you match a digit which can not be a $
As the . and the digits at the end of the pattern are optional, it could also match ending on a dot like for example 0,000.
Instead you can assert a whitespace boundary to the left, and optionally match a dot followed by 1 or more digits:
(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?\b
Regex demo
Example:
$str = '10,000.1 $100,000.1';
$pattern = '/(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?\b/';
$replacement_str = 'Without$sign';
echo preg_replace($pattern, $replacement_str, $str);
Output (If you remove the numbers, the text "Without$sign" is not correct)
Without$sign $100,000.1

Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning long and I have hit the wall. In the following string I wan't to match every forward slash which follows .com/<first_word> with the exception of any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because the lookbehind can have no modifiers and needs to be a zero length assertion I am wondering if I have just tripped down the wrong path and should seek another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.
If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
Thus replacing each / by a | after first / that appears after starting .com.
Negative Lookbehind (?<!^) is needed to avoid replacing a string without starting .com like /foo/bar/baz/abcd.
RegEx Demo
Use \K here along with \G.grab the groups.
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
Replace
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);
Another \G and \K based idea.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
The (: non capture group to set entry point ^\S+\.com/\w or glue matches \G(?!^) to it.
\w*+\K/ possessively matches any amount of word characters until a slash. \K resets match.
See demo at regex101

Regex to remove non-alphanumeric characters and all characters after dot?

I need a regular expression (php) to remove the forward slash, the dot and eveything after the dot in my string so that
$str = "ab/12c.3de";
becomes
$newstr = "ab12c";
You can use alternation in regex:
$str = "ab/12c.3de";
$newstr = preg_replace('~/|\..*~', '', $str);
//=> ab12c
Regex: /|\..*
/ matches literal /
| OR (alternation)
\..* matches a dot and everything after it
Replacement is just by empty string.

How can I remove a specific format from string with RegEx?

I have a list of string like this
$16,500,000(#$2,500)
$34,000(#$11.00)
$214,000(#$18.00)
$12,684,000(#$3,800)
How can I extract all symbols and the (#$xxxx) from these strings so that they can be like
16500000
34000
214000
12684000
\(.*?\)|\$|,
Try this.Replace by empty string.See demo.
https://regex101.com/r/vD5iH9/42
$re = "/\\(.*?\\)|\\$|,/m";
$str = "\$16,500,000(#\$2,500)\n\$34,000(#\$11.00)\n\$214,000(#\$18.00)\n\$12,684,000(#\$3,800)";
$subst = "";
$result = preg_replace($re, $subst, $str);
To remove the end (#$xxxx) characters, you could use the regex:
\(\#\$.+\)
and replace it with nothing:
preg_replace("/\(\#\$.+\)/g"), "", $myStringToReplaceWith)
Make sure to use the g (global) modifier so the regex doesn't stop after it finds the first match.
Here's a breakdown of that regex:
\( matches the ( character literally
\# matches the # character literally
\$ matches the $ character literally
.+ matches any character 1 or more times
\) matches the ) character literally
Here's a live example on regex101.com
In order to remove all of these characters:
$ , ( ) # .
From a string, you could use the regex:
\$|\,|\(|\)|#|\.
Which will match all of the characters above.
The | character above is the regex or operator, effectively making it so
$ OR , OR ( OR ) OR # OR . will be matched.
Next, you could replace it with nothing using preg_replace, and with the g (global) modifier, which makes it so the regex doesn't return on the first match:
preg_replace("/\$|\,|\(|\)|#|\./g"), "", $myStringToReplaceWith)
Here's a live example on regex101.com
So in the end, your code could look like this:
$str = preg_replace("/\(\#\$.+\)/g"), "", $str)
$str = preg_replace("/\$|\,|\(|\)|#|\./g"), "", $str)
Although it isn't in one regex, it does not use any look-ahead, or look-behind (both of which are not bad, by the way).

preg_replace() seems to remove entire word instead of part of it

I'm trying to match a certain word and replace part of the word with certain text but leave the rest of the word intact. It is my understanding that adding parentheses to part of the regex pattern means that the pattern match within the parentheses gets replaced when you use preg_replace()
for testing purposes I used:
$text = 'batman';
echo $new_text = preg_replace('#(bat)man#', 'aqua', $text);
I only want 'bat' to be replaced by 'aqua' to get 'aquaman'. Instead, $new_text echoes 'aqua', leaving out the 'man' part.
preg_replace replaces all the string matched by regular expression
$text = 'batman';
echo $new_text = preg_replace('#bat(man)#', 'aqua\\1', $text);
Capture man instead and append it to your aqua prefix
Another way of doing that is to use assertions:
$text = 'batman';
echo $new_text = preg_replace('#bat(?=man)#', 'aqua', $text);
I would not use preg_* functions for this and just do str_replace() DOCs:
echo str_replace('batman', 'aquaman', $text);
This is simpler as a regex is not really needed in this case. Otherwise it would be with a regular expression:
echo $new_text = preg_replace('#bat(man)#', 'aqua\\1', $text);
This will substitute your man in after aqua when replacing the entire search phrase. preg_replace DOCs replaces the entire matching portion of the pattern.
The way you're trying to do it, it would be more like:
preg_replace('#bat(man)#', 'aqua$1', $text);
I'd using positive lookahead:
preg_replace('/bat(?=man)/', 'aqua', $text)
Demo here: http://ideone.com/G9F4q
The brackets are creating a capturing group, that means you can access the part matched by this group using \1.
you can do either what zerkms suggested or use a lookahead that does just check but not match.
$text = 'batman';
echo $new_text = preg_replace('#bat(?=man)#', 'aqua', $text);
This will match "bat" but only if it is followed by "man", and only "bat" is replaced.

Categories