Matching all of a certain character after a Positive Lookbehind - php

I have been trying to get the regex right for this all morning and I have hit a wall. In the following string I want to match every forward slash that follows .com/<first_word>, except for any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because a lookbehind is a zero-width assertion that cannot contain variable-length quantifiers such as + or * (PCRE requires it to match a fixed-length string), I am wondering whether I have gone down the wrong path and should look for a different regex approach.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.
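To see why both attempts fall short, here is a small sketch (added for illustration, not part of the original question) that runs them through preg_match_all with PREG_OFFSET_CAPTURE and prints the byte offsets of the slashes they match. The helper slashOffsets() and the ~ delimiters are assumptions of this sketch.

```php
<?php
// Hypothetical helper: byte offsets of every whole-pattern match.
function slashOffsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    // $m[0] is a list of [matchedText, offset] pairs; keep only the offsets.
    return array_column($m[0], 1);
}

// Attempt 1: the lookbehind sees ".com/" followed by exactly ONE word character.
$first = '~(?<=.com\/\w)(\/)~';
echo json_encode(slashOffsets($first, "http://example.com/foo/12/jacket Input/Output")), "\n"; // []
echo json_encode(slashOffsets($first, "http://example.com/f/12/jacket Input/Output")), "\n";   // [20]

// Attempt 2: any slash preceded by "/" plus one word character, anywhere in the string.
$second = '~(?<=\/\w)(\/)~';
echo json_encode(slashOffsets($second, "http://example.com/foo/20/jacket Input/O/utput")), "\n"; // [40]
echo json_encode(slashOffsets($second, "http://example.com/f/2/jacket Input/O/utput")), "\n";    // [20,22,37]
```

A lookbehind only inspects a fixed-width window just before the slash, so it can neither span a first segment of arbitrary length nor tell whether the slash is still inside the URL.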

If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
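This replaces each / that follows the first path segment after the leading .com/ with |, stopping at the first whitespace.
The negative lookbehind (?<!^) is needed because \G also matches at the very start of the string; without it, a string that does not start with a .com URL, such as /foo/bar/baz/abcd, would have its slashes replaced as well.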
RegEx Demo
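A minimal sketch that runs this pattern with preg_replace and also checks the claim about (?<!^). The $unguarded variant (the same pattern with the lookbehind removed) is added here only for comparison.

```php
<?php
// Pattern from the answer: the first alternative consumes everything up to ".com/";
// later matches must start exactly where the previous one ended (\G).
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';

echo preg_replace($re, '|', "http://example.com/foo/12/jacket Input/Output"), "\n";
// http://example.com/foo|12|jacket Input/Output

// Without (?<!^), \G succeeds at offset 0, so a string with no ".com/" prefix is changed too.
$unguarded = '~(?:^.*?\.com/|\G)[^/\h]*\K/~';
echo preg_replace($unguarded, '|', "/foo/bar/baz/abcd"), "\n"; // |foo|bar|baz|abcd
echo preg_replace($re, '|', "/foo/bar/baz/abcd"), "\n";        // /foo/bar/baz/abcd
```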

Use \K here along with \G, then grab the groups:
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo: https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
Replace
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);
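A quick check of this pattern (a sketch, assuming default PHP PCRE settings) suggests that every overall match is empty, because \K sits at the very end of each alternative; the slashes are only available through capture group 1. As a consequence, the preg_replace call above inserts | after foo, 12 and jacket rather than replacing the slashes:

```php
<?php
// Pattern from the answer (double-quoted, so every regex backslash is doubled).
$re  = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";

preg_match_all($re, $str, $matches);
// \K resets the match start at the end of each alternative, so each overall match is "".
echo json_encode($matches[0], JSON_UNESCAPED_SLASHES), "\n"; // ["","",""]
// Group 1 holds the slashes (empty for the first, entry-point match).
echo json_encode($matches[1], JSON_UNESCAPED_SLASHES), "\n"; // ["","/","/"]

// Empty matches mean preg_replace inserts "|" instead of replacing "/".
echo preg_replace($re, "|", $str), "\n"; // http://example.com/foo|/12|/jacket| Input/Output
```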

Another \G and \K based idea.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
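The (?: non-capturing group sets the entry point ^\S+\.com/\w, or glues each later match to the end of the previous one with \G(?!^).
\w*+\K/ possessively matches any number of word characters up to a slash; \K then resets the start of the match, so only the / itself is matched (and replaced).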
See demo at regex101
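A small sketch showing that with this pattern each reported match is just the slash itself (PREG_OFFSET_CAPTURE gives its byte offset), and that it also handles a one-character first segment such as /f/:

```php
<?php
$re  = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";

preg_match_all($re, $str, $m, PREG_OFFSET_CAPTURE);
// Thanks to \K, each match is the "/" alone, reported with its byte offset.
echo json_encode($m[0], JSON_UNESCAPED_SLASHES), "\n"; // [["/",22],["/",25]]

echo preg_replace($re, '|', $str), "\n";
// http://example.com/foo|12|jacket Input/Output
echo preg_replace($re, '|', "http://example.com/f/12/jacket Input/Output"), "\n";
// http://example.com/f|12|jacket Input/Output
```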

Related

PHP REGEX find & replace patterns

Trying to construct a regex that will locate any non-whitespace character followed by two double quotes.
This regex locates each occurrence properly
(\S"")
Given the example below
$string='"WEINSTEIN","ANTONIA \"TOBY"","STILES","HOOPER \"PETER"","HENDERSON",';
$pattern = '(\S"")';
$replacement = '\\""';
$result=preg_replace($pattern, $replacement, $string);
My result turns out to be
"WEINSTEIN","ANTONIA \"TOB\"","STILES","HOOPER \"PETE\"","HENDERSON"
But I am seeking
"WEINSTEIN","ANTONIA \"TOBY\"","STILES","HOOPER \"PETER\"","HENDERSON"
I understand the replacement is replacing the whole match, but how can I keep that first character and replace only the quotes?
You can change your pattern to use a positive lookbehind instead, so that the non-space character is only checked, not consumed (and therefore not replaced):
$string='"WEINSTEIN","ANTONIA \"TOBY"","STILES","HOOPER \"PETER"","HENDERSON",';
$pattern = '/(?<=\S)""/';
$replacement = '\\""';
$result=preg_replace($pattern, $replacement, $string);
echo $result;
Output
"WEINSTEIN","ANTONIA \"TOBY\"","STILES","HOOPER \"PETER\"","HENDERSON",
Demo on 3v4l.org
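A side-by-side sketch of the original consuming pattern and the lookbehind version, using the question's string:

```php
<?php
$string = '"WEINSTEIN","ANTONIA \"TOBY"","STILES","HOOPER \"PETER"","HENDERSON",';

// Original attempt: \S is consumed by the match, so Y and R are replaced too.
// (The parentheses act as pattern delimiters here, not as a group.)
$consuming = preg_replace('(\S"")', '\\""', $string);

// Lookbehind: (?<=\S) only checks the preceding character; just the two quotes are replaced.
$lookbehind = preg_replace('/(?<=\S)""/', '\\""', $string);

echo $consuming, "\n";  // "WEINSTEIN","ANTONIA \"TOB\"","STILES","HOOPER \"PETE\"","HENDERSON",
echo $lookbehind, "\n"; // "WEINSTEIN","ANTONIA \"TOBY\"","STILES","HOOPER \"PETER\"","HENDERSON",
```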

Exclude links that start with a certain character from preg_replace

This code converts any URL to a clickable link:
$str = preg_replace('/(http[s]?:\/\/[^\s]*)/i', '<a href="$1">$1</a>', $str);
How can I make it skip a URL that starts with a [ character, like this:
[http://google.com
Use a negative lookbehind:
$str = preg_replace('/(?<!\[)(http[s]?:\/\/[^\s]*)/i', '<a href="$1">$1</a>', $str);
With the added (?<!\[) negative lookbehind, an http... substring that is preceded by [ won't be matched.
You can simplify the pattern further:
preg_replace('/(?<!\[)https?:\/\/\S*/i', '<a href="$0">$0</a>', $str);
That is: remove the ( and ) (the capturing group) and replace the $1 backreferences in the replacement pattern with $0 (the whole match). Also note that [^\s] is the same as \S, just shorter, and [s]? is the same as s?.
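A minimal sketch of the simplified pattern. The <a href=...> replacement markup is an assumption standing in for whatever link markup you use; the point is that a URL preceded by [ is left alone:

```php
<?php
$str = 'Visit http://example.com or [http://google.com today';

// (?<!\[) rejects any "http" that sits directly after a "[" character.
$linked = preg_replace('/(?<!\[)https?:\/\/\S*/i', '<a href="$0">$0</a>', $str);

echo $linked, "\n";
// Visit <a href="http://example.com">http://example.com</a> or [http://google.com today
```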

PHP regex match between a slash and a hyphen

Hello, I need a regex to get the string "trkfixo" from
SIP/trkfixo-000072b6
I was trying to use explode but I prefer a regex solution.
$ex = explode("/",$sip);
$ex2 = explode("-",$ex[1]);
echo $ex2[0];
You may use '~/([^-]+)~':
$re = '~/([^-]+)~';
$str = "SIP/trkfixo-000072b6";
preg_match($re, $str, $match);
echo $match[1]; // => trkfixo
See the regex demo and a PHP demo
Pattern details:
/ - matches a /
([^-]+) - Group 1, capturing one or more (+) characters other than - ([^-] is a negated character class: it matches any character except those listed inside the class).
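The same pattern wrapped in a small helper, as a sketch; the function name extractBetweenSlashAndHyphen() is hypothetical, and it returns null when the subject contains no /:

```php
<?php
// Hypothetical helper: text between the first "/" and the next "-" (or end of string).
function extractBetweenSlashAndHyphen(string $channel): ?string
{
    // ~/([^-]+)~ : a "/" followed by one or more characters that are not "-".
    return preg_match('~/([^-]+)~', $channel, $match) ? $match[1] : null;
}

echo extractBetweenSlashAndHyphen("SIP/trkfixo-000072b6"), "\n"; // trkfixo
var_dump(extractBetweenSlashAndHyphen("no slash here"));          // NULL
```

Note that [^-]+ also matches when there is no hyphen at all (SIP/abc yields abc); add a trailing - to the pattern if the hyphen must be present.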
preg_match('/\/([a-zA-Z]+)-/', "SIP/trkfixo-000072b6", $match);
echo $match[1]; // => trkfixo

PHP exploding url from text, possible?

I need to extract the YouTube URL from this line:
[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]
Is it possible? I need to remove [embed] and [/embed].
preg_match is what you need.
<?php
$str = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
preg_match("/\[embed\](.*)\[\/embed\]/", $str, $matches);
echo $matches[1]; //https://www.youtube.com/watch?v=L3HQMbQAWRc
$string = '[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]';
$string = str_replace(['[embed]', '[/embed]'], '', $string);
See str_replace
Why not use str_replace? :) Quick and easy.
http://php.net/manual/de/function.str-replace.php
Just for good measure, you can also use positive lookbehinds and lookaheads in your regular expressions:
(?<=\[embed\])(.*)(?=\[\/embed\])
You'd use it like this:
$string = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
$pattern = '/(?<=\[embed\])(.*)(?=\[\/embed\])/';
preg_match($pattern, $string, $matches);
echo $matches[1]; // https://www.youtube.com/watch?v=L3HQMbQAWRc
Here is an explanation of the regex:
(?<=\[embed\]) is a positive lookbehind: it asserts that the current position is immediately preceded by [embed], without including [embed] in the match.
(.*) is a capturing group: . matches any character (except a newline) and the * quantifier repeats it between zero and unlimited times, as many times as possible. This is what gets matched between the lookbehind and the lookahead. These are the droids you're looking for.
(?=\[\/embed\]) is a positive lookahead: it asserts that the current position is immediately followed by [/embed], without including it in the match.
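A sketch confirming that, because the lookarounds are zero-width, the whole match $matches[0] already excludes the tags and is identical to group 1, and that the str_replace approach gives the same result for this input:

```php
<?php
$string  = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
$pattern = '/(?<=\[embed\])(.*)(?=\[\/embed\])/';

preg_match($pattern, $string, $matches);

// The lookarounds are zero-width, so the tags are not part of the overall match.
echo $matches[0], "\n";                // https://www.youtube.com/watch?v=L3HQMbQAWRc
var_dump($matches[0] === $matches[1]); // bool(true)

// The str_replace approach gives the same string for this input.
echo str_replace(['[embed]', '[/embed]'], '', $string), "\n";
```

If one line can contain several [embed] blocks, the greedy .* runs to the last [/embed]; a lazy .*? with preg_match_all stops at the nearest one instead.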

Regex to match words starting with hyphen

I have a regex that handles every case except one. The PHP code for the word match is:
$string = preg_replace("/\b".$wordToMatch."\b/","<span class='sp_err' style='background-color:yellow;'>".$wordToMatch."</span>",$string);
With the regex above, when $wordToMatch is "-abc" and $string is "The word -abc should match and abc-abc should not match", it fails to catch "-abc".
I want to improve the regex so that it catches "-abc" in $string, but does not match the "-abc" inside "abc-abc".
In case your keywords can have non-word characters on either end, you can rely on lookarounds for a whole-word match:
"/(?<!\\w)".$wordToMatch."(?!\\w)/"
Here, the (?<!\w) negative lookbehind makes sure there is no word character before the keyword, and the (?!\w) negative lookahead makes sure there is no word character after it. These subpatterns are unambiguous, while the meaning of \b depends on context: \b in front of - only matches if a word character precedes the hyphen, which is why \b-abc\b matches inside abc-abc but not after a space.
See the regex demo showing that -abc is not matched in abc-abc but is matched when it is not surrounded by word characters.
PHP demo:
$wordToMatch = "-abc";
$re = "/(?<!\\w)" . $wordToMatch . "(?!\\w)/";
$str = "abc-abc -abc";
$subst = "!$0!";
$result = preg_replace($re, $subst, $str);
echo $result; // => abc-abc !-abc!
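A sketch wrapping the lookaround pattern in a helper (the name markWord() is hypothetical). It also passes the keyword through preg_quote(), which the original code does not do; without it, a keyword containing regex metacharacters such as . would not be matched literally. The \b line at the end reproduces the original problem:

```php
<?php
// Hypothetical helper: wraps every standalone occurrence of $word in "!...!".
function markWord(string $word, string $text): string
{
    // (?<!\w) and (?!\w): no word character directly before or after the keyword.
    // preg_quote() escapes metacharacters so the keyword is matched literally.
    $re = '/(?<!\w)' . preg_quote($word, '/') . '(?!\w)/';
    return preg_replace($re, '!$0!', $text);
}

echo markWord('-abc', 'The word -abc should match and abc-abc should not match'), "\n";
// The word !-abc! should match and abc-abc should not match
echo markWord('a.b', 'axb a.b'), "\n"; // axb !a.b!

// The original \b version: matches inside abc-abc, misses the standalone -abc.
echo preg_replace('/\b-abc\b/', '!$0!', 'abc-abc -abc'), "\n"; // abc!-abc! -abc
```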
