How to not perform preg_replace if subject starts with quote - php

I'm trying to convert plain links to HTML links using preg_replace. However it's replacing links that are already converted.
To combat this I'd like it to ignore the replacement if the link starts with a quote.
I think a positive lookahead may be needed but everything I've tried hasn't worked.
$string = 'test http://www.example.com';
$string = preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", "$1", $string);
var_dump($string);
The above outputs:
http://www.example.com">test</a> http://www.example.com
When it should output:
test http://www.example.com

You might get along with lookarounds.
Lookarounds are zero-width assertions that make sure to match/not to match anything immediately around the string in question. They do not consume any characters.
That being said, a negative lookbehind might be what you need in your situation:
(?<![">])\bhttps?://\S+\b
In PHP this would be:
<?php
$string = 'I want to be transformed to a proper link: http://www.google.com ';
$string .= 'But please leave me alone ';
$string .= '(https://www.google.com).';
$regex = '~ # delimiter
(?<![">]) # a neg. lookbehind
https?://\S+ # http:// or https:// followed by not a whitespace
\b # a word boundary
~x'; # verbose to enable this explanation.
$string = preg_replace($regex, "<a href='$0'>$0</a>", $string);
echo $string;
?>
See a demo on ideone.com. However, maybe a parser is more appropriate.

Since you can use Arrays in preg_replace, this might be convenient to use depending on what you want to achieve:
<?php
$string = 'test http://www.example.com';
$rx = array("&(<a.+https?:\/\/[\w]+[^ \,\"\n\r\t<]*>)(.*)(<\/a\>)&si", "&(\s){1,}(https?:\/\/[\w]+[^ \,\"\n\r\t<]*)&");
$rp = array("$1$2$3", "$2");
$string = preg_replace($rx,$rp, $string);
var_dump($string);
// DUMPS:
// 'testhttp://www.example.com'

The Idea
You can split your string at the already existing anchors, and only parse the pieces in between.
The Code
$input = 'test http://www.example.com';
// Split the string at existing anchors
// PREG_SPLIT_DELIM_CAPTURE flag includes the delimiters in the results set
$parts = preg_split('/(<a.*?>.*?<\/a>)/is', $input, PREG_SPLIT_DELIM_CAPTURE);
// Use array_map to parse each piece, and then join all pieces together
$output = join(array_map(function ($key, $part) {
// Because we return the delimiter in the results set,
// every $part with an uneven key is an anchor.
return $key % 2
? preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", "$1", $part)
: $part;
}, array_keys($parts), $parts);

Related

RegEx for adding a space in a special pattern

Quick note: I know markdown parsers don't care about this issue. It's for the sake of visual consistency in the md file and also experimentation.
Sample:
# this
##that
###or this other
Goal: read each line and,if a markdown header does not have a space after the pound/hashtag sign, add one so that it would look like:
# this
## that
### or this other
My non-regex attempt:
function inelegantFunction (string $string){
$array = explode('#',$string);
$num = count($array);
$text = end($array);
return str_repeat('#', $num-1)." ".$text;
}
echo inelegantFunction("###or this other");
// returns ### or this other
This works, but it has no mechanism to match the unlikely case of seven '#'.
Regardless of efficacy, I would like to figure out how to do this with regex in php (and perhaps javascript if that matters).
Try to match (?m)^#++\K\S which matches lines starting with one or more number signs then replace it with $0 in your function:
return preg_replace('~(?m)^#++\K\S~', ' $0', $string);
See live demo here
To limit the number of #s to six use:
(?m)^(?!#{7})#++\K\S
I'm guessing that a simple expression with a right char-list boundary might be working here, maybe:
(#)([a-z])
If we might be having more chars, we can simply add it to [a-z].
Demo
Test
$re = '/(#)([a-z])/m';
$str = '#this
##that
###that
### or this other';
$subst = '$1 $2';
$result = preg_replace($re, $subst, $str);
echo "The result of the substitution is ".$result;

Modify my regex so that pattern search should not be case sensitive

I have following php code that removes whole word that matches the pattern
$patterns = ["re", "get", "ER"];
$string = "You are definitely getting better today";
$alternations = implode('|', $patterns);
$re = '(?!(?<=\s)(?:'.$alternations.')(?=\s))\S*(?:'.$alternations.')\S*';
$string = preg_replace('#'.$re.'#', '', $string);
$string = preg_replace('#\h{2,}#', ' ', $string);
echo $string;
I want two modifications
The pattern search should not be case sensitive e.g. the pattern ER must remove better in $string
If removed word in $string have line breaks before or after it, only one line break should be removed.
If $string is
You are definitely getting
better
today
Output must be
You definitely
today
Sample PHP Code
Regards,
You may use
$patterns = ["re", "get", "ER"];
$string = "You are definitely getting\nbetter\ntoday";
$alternations = implode('|', $patterns);
$re = '\R?(?!(?<=\s)(?:'.$alternations.')(?=\s))\S*(?:'.$alternations.')\S*';
$string = preg_replace('#'.$re.'#i', '', $string);
$string = preg_replace('#\h{2,}#', ' ', $string);
echo $string;
See the PHP demo.
While the i modifier provides the case insensitivity to regex matching, another, less obvious thing here is that you need to add an optional line break pattern.
That line break can be matched in various ways, but in PHP PCRE, you may easily match it with \R construct.
Adding a ? quantifier after it, you may make it match 1 or 0 times, i.e. make it optional, so that the whole pattern could still match at the start of the string.

Operation on string in PHP. Remove part of string

How can i remove part of string from example:
##lang_eng_begin##test##lang_eng_end##
##lang_fr_begin##school##lang_fr_end##
##lang_esp_begin##test33##lang_esp_end##
I always want to pull middle of string: test, school, test33. from this string.
I Read about ltrim, substr and other but I had no good ideas how to do this. Becouse each of strings can have other length for example :
'eng', 'fr'
I just want have string from middle between ## and ##. to Maye someone can help me? I tried:
foreach ($article as $art) {
$title = $art->titl = str_replace("##lang_eng_begin##", "", $art->title);
$art->cleanTitle = str_replace("##lang_eng_end##", "", $title);
}
But there
##lang_eng_end##
can be changed to
##lang_ger_end##
in next row so i ahvent idea how to fix that
If your strings are always in this format, an explode way looks easy:
$str = "##lang_eng_begin##test##lang_eng_end## ";
$res = explode("##", $str)[2];
echo $res;
You may use a regex and extract the value in between the non-starting ## and next ##:
$re = "/(?!^)##(.*?)##/";
$str = "##lang_eng_begin##test##lang_eng_end## ";
preg_match($re, $str, $match);
print_r($match[1]);
See the PHP demo. Here, the regex matches a ## that is not at the string start ((?!^)##), then captures into Group 1 any 0+ chars other than newline as few as possible ((.*?)) up to the first ## substring.
Or, replace all ##...## substrings with `preg_replace:
$re = "/##.*?##/";
$str = "##lang_eng_begin##test##lang_eng_end## ";
echo preg_replace($re, "", $str);
See another demo. Here, we just remove all non-overlapping substrings beginning with ##, then having any 0+ chars other than a newline up to the first ##.

Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning long and I have hit the wall. In the following string I wan't to match every forward slash which follows .com/<first_word> with the exception of any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because the lookbehind can have no modifiers and needs to be a zero length assertion I am wondering if I have just tripped down the wrong path and should seek another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.
If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
Thus replacing each / by a | after first / that appears after starting .com.
Negative Lookbehind (?<!^) is needed to avoid replacing a string without starting .com like /foo/bar/baz/abcd.
RegEx Demo
Use \K here along with \G.grab the groups.
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
Replace
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);
Another \G and \K based idea.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
The (: non capture group to set entry point ^\S+\.com/\w or glue matches \G(?!^) to it.
\w*+\K/ possessively matches any amount of word characters until a slash. \K resets match.
See demo at regex101

Regular Expression to replace a bracket with word in front when not preceded by word

I am facing problem with a regular expression.
I have a string like ('A'&'B')
Now I want to convert it to CONCAT('A'&'B') which is simple and I have done using
str_replace("(", "CONCAT(", $subject)
But I want to replace "(" to "CONCAT(" if the string doesn't have prior string "extract_json_value".
So I don't want to replace extract_json_value('A'&'B') to extract_json_valueCONCAT('A'&'B') but it will stay as it is extract_json_value('A'&'B').
You can expand your regex with a negative lookbehind:
(?<!extract_json_value)\(
Here is a regex demo!
You could use strpos to do this.
if (strpos($subject, '(') === 0) {
$subject = str_replace('(', 'CONCAT(', $subject);
}
If your string contains other text you can use preg_replace() and use a word boundary \B for this.
$subject = preg_replace('/\B\(/', 'CONCAT(', $subject);
You can use negative lookbehind in order to match a group not preceded by a string.
First, let's have a regexp matching all strings but those containing "extract_json_value":
(?<!extract_json_value).*
Now, let's use preg_replace
$string = "extract_json_value('A'&'B')";
$pattern = '/^(?<!extract_json_value)(\(.+\))$/';
$replacement = 'CONCAT\1';
echo preg_replace($pattern, $replacement, $string);
// prints out "extract_json_value('A'&'B')"
It works too with
$string = "('A'&'B')";
...
// prints out "CONCAT('A'&'B')"
However, it does not work with
$string = "hello('A'&'B')";
...
// prints out "helloCONCAT('A'&'B')"
So, continue with a preg_replace_callback:
http://php.net/manual/fr/function.preg-replace-callback.php

Categories