Split and catch text by a variable delimiter - php

I have a text which includes delimiter tags in the following format:
<\!--[od]+-\d+--\>
Example:
<!--od-14-->
<!--od-1-->
<!--od-65-->
I need a regex which will split the text and capture the \d+ numeric argument of each delimiter, as well as the text after it.
Here's the regex I came up with; the problem is that it does not return text spanning multiple lines.
https://regex101.com/r/xvw8Xw/2

One option is to make the dot match a newline using for example an inline modifier (?s). Then use a non greedy match with a positive lookahead to assert the next comment or the end of the string:
(?s)<\!--[od]+-(\d+)-->(.*?)(?=<!--|$)
(?s) Inline modifier, make the dot match a newline
<\!-- match <!--
[od]+-(\d+)--> Match 1+ times either o or d (which might just be od), then -, capture 1+ digits in group 1 and match -->
(.*?) Capture in group 2 any char 0+ times, including a newline because of (?s), non greedy
(?=<!--|$) Positive lookahead, assert what is on the right is <!-- or the end of the string
For example using /s in the pattern:
$re = '/<\!--[od]+-(\d+)-->(.*?)(?=<!--|$)/s';
$str = '<!--od-1--> cdskc sdkjc
dsd
sk<!--od-2-->cscdscsdcsd
cdscs
csdcsdc
<!--od-432-->cdcdscsd';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
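As a possible follow-up, the match sets can be folded into a lookup from delimiter number to the text that follows it. This is only a sketch: the variable name $sections is an assumption for illustration, and the sample input is the one used above.

```php
<?php
// Sample input from the answer above, written with explicit newlines.
$str = "<!--od-1--> cdskc sdkjc\ndsd\nsk<!--od-2-->cscdscsdcsd\ncdscs\ncsdcsdc\n<!--od-432-->cdcdscsd";

// Group 1 is the number, group 2 the text up to the next comment or the end.
$re = '/<!--[od]+-(\d+)-->(.*?)(?=<!--|$)/s';
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// $sections is a hypothetical name: delimiter number => following text.
$sections = [];
foreach ($matches as $m) {
    $sections[(int) $m[1]] = $m[2];
}

print_r($sections);
```

Note that a repeated delimiter number would overwrite the earlier entry in this sketch.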

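This expression might also work here without the s flag, since [\s\S] matches newlines too (do not add the m flag, or $ will also match at every line end and cut the capture short):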
<!--od-(\d+)--\>([\s\S]*?)(?=<|$)
or this one in s mode:
<!--od-(\d+)--\>(.*?)(?=<|$)
Test
$re = '/<!--od-(\d+)--\>(.*?)(?=<|$)/s';
$str = '<!--od-1--> cdskc sdkjc
dsd
sk<!--od-2-->cscdscsdcsd
cdscs
csdcsdc
<!--od-432-->cdcdscsd';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
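Since the question asks for a split, another option might be preg_split with the PREG_SPLIT_DELIM_CAPTURE flag, which returns the captured number alongside the pieces of text. A minimal sketch, assuming the text always starts with a delimiter (the sample string is made up for illustration):

```php
<?php
// Hypothetical sample input; the text starts with a delimiter.
$str = "<!--od-1--> first\nline<!--od-2-->second<!--od-432-->third";

// The parenthesized part of the delimiter is returned as its own element.
$parts = preg_split('/<!--od-(\d+)-->/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// Drop the empty piece that precedes the first delimiter.
array_shift($parts);

// $parts now alternates: number, text, number, text, ...
$pairs = array_chunk($parts, 2);
print_r($pairs);
```

No s flag is needed here because the pattern contains no dot.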

Related

Find a pattern in a string

I am trying to detect a string inside the following pattern: [url('example')] in order to replace the value.
I thought of using a regex to get the strings inside the squared brackets and then another to get the text inside the parenthesis but I am not sure if that's the best way to do it.
//detect all strings inside brackets
preg_match_all("/\[([^\]]*)\]/", $text, $matches);
//loop though results to get the string inside the parenthesis
preg_match('#\((.*?)\)#', $match, $matches);
To match the string between the parenthesis, you might use a single pattern to get a match only:
\[url\(\K[^()]+(?=\)])
The pattern matches:
\[url\( Match [url(
\K Forget what was matched so far (the reported match starts after this point)
[^()]+ Match 1+ chars other than ( and )
(?=\)]) Positive lookahead, assert )] to the right
See a regex demo.
For example
$re = "/\[url\(\K[^()]+(?=\)])/";
$text = "[url('example')]";
if (preg_match($re, $text, $match)) {
var_dump($match[0]);
}
Output
string(9) "'example'"
Another option could be using a capture group. You can place the ' inside or outside the group to capture the value:
\[url\(([^()]+)\)]
See another regex demo.
For example
$re = "/\[url\(([^()]+)\)]/";
$text = "[url('example')]";
if (preg_match($re, $text, $match)) {
var_dump($match[1]);
}
Output
string(9) "'example'"
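Because the stated goal is to replace the value, the capture-group variant could be combined with preg_replace_callback. The following is only a sketch: the $replacements map and its URLs are hypothetical, and here the quotes are kept outside the group so that only the bare value is looked up.

```php
<?php
$text = "a [url('example')] b [url('other')]";

// Hypothetical lookup table: old value => new value.
$replacements = [
    'example' => 'https://example.com/',
    'other'   => 'https://example.org/',
];

// Group 1 holds the value between the single quotes.
$result = preg_replace_callback(
    "/\[url\('([^'()]+)'\)]/",
    function (array $m) use ($replacements) {
        // Fall back to the original value when no replacement is known.
        $value = $replacements[$m[1]] ?? $m[1];
        return "[url('" . $value . "')]";
    },
    $text
);

echo $result, PHP_EOL;
```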

split string in numbers and text but accept text with a single digit inside

Let's say I want to split this string in two variables:
$string = "levis 501";
I will use
preg_match('/\d+/', $string, $num);
preg_match('/\D+/', $string, $text);
but then let's say I want to split this one in two
$string = "levis 5° 501";
as $text = "levis 5°"; and $num = "501";
So my guess is I should add a rule to the preg_match('/\d+/', $string, $num); that looks for numbers only at the END of the string and I want it to be between 2 and 3 digits.
But also the $text match now has one number inside...
How would you do it?
To split a string in two parts, use any of the following:
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
This regex matches:
^ - the start of the string
(.*?) - Group 1 capturing zero or more characters, as few as possible (as *? is a "lazy" quantifier) up to...
\s* - zero or more whitespace symbols
(\d+) - Group 2 capturing 1 or more digits
\D* - zero or more characters other than digit (it is the opposite shorthand character class to \d)
$ - end of string.
The s modifier (placed after the closing ~ delimiter) is the DOTALL one: it forces the . to match any character, even a newline, which it does not match without this modifier.
Or
preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2);
This \s*(?=\s*\d+\D*$) pattern:
\s* - zero or more whitespaces, but only if followed by...
(?=\s*\d+\D*$) - zero or more whitespaces followed with 1+ digits followed with 0+ characters other than digits followed with end of string.
The (?=...) construct is a positive lookahead that does not consume characters and just checks if the pattern inside matches and if yes, returns "true", and if not, no match occurs.
See IDEONE demo:
$s = "levis 5° 501";
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
print_r($matches[1] . ": ". $matches[2]. PHP_EOL);
print_r(preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2));
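The question also mentions that the trailing number should be 2 to 3 digits long. One way to express that, sketched here as an assumption about the intended rule, is to require \d{2,3} right before the end of the string and to reject longer digit runs with a lookbehind:

```php
<?php
$s = "levis 5° 501";

$text = null;
$num  = null;

// (?<!\d) makes sure the 2-3 digits are not the tail of a longer number.
if (preg_match('~^(.*?)\s*(?<!\d)(\d{2,3})$~s', $s, $m)) {
    $text = $m[1]; // everything before the trailing number
    $num  = $m[2]; // the trailing 2-3 digit number
}

var_dump($text, $num);
```

With this pattern a string such as "levis 12345" does not match at all, which may or may not be the desired behaviour.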

Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning long and I have hit the wall. In the following string I want to match every forward slash which follows .com/<first_word> with the exception of any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because the lookbehind can have no modifiers and needs to be a zero length assertion I am wondering if I have just tripped down the wrong path and should seek another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.
If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
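This replaces each / with | once the first / after the leading .com has been passed; the slash directly after .com and the slashes after the space are left alone.
The negative lookbehind (?<!^) is needed so that \G cannot match at the start of the string, which would otherwise replace slashes in input that does not start with a .com URL, such as /foo/bar/baz/abcd.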
Use \K here along with \G, and grab the groups.
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
Replace
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);
Another \G and \K based idea.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
The (?: non capture group sets the entry point ^\S+\.com/\w or glues matches to it with \G(?!^).
\w*+\K/ possessively matches any amount of word characters until a slash. \K resets match.
See demo at regex101
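To try this last pattern against the string from the question, a short sketch such as the following could be used. The offsets are collected only to show which slashes were hit.

```php
<?php
$re  = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";

// Replace only the slashes that follow .com/<first_word> inside the URL.
$replaced = preg_replace($re, '|', $str);
echo $replaced, PHP_EOL;

// Collect the byte offsets of the matched slashes.
preg_match_all($re, $str, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1);
print_r($offsets);
```

The slash in Input/Output is not touched, because \G only matches where the previous match ended and the chain breaks at the space.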

php regexp how to capture substring to variable

In PHP, if I have a string
$string = 'gardens, countryside #teddy135';
how do I capture #username from that string into a new variable? The username begins with #, is preceded by a space, and ends at a space or at the end of the string.
So I would end up with
$string = 'gardens, countryside';
$username = '#teddy135';
Use the following regex
\s#(\w+)\b
\s: Matches one whitespace character
#: Matches # literally
(\w+): Matches one or more alphanumeric characters including _ and put it in first capturing group
\b: Word boundary
Code:
$re = "/\\s#(\\w+)\\b/";
$str = "gardens, countryside #teddy135 #tushar and #abc";
preg_match_all($re, $str, $matches);
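To capture the # as well and allow characters such as - and . in the name, match non-whitespace characters instead:
$regex = "/\s(#\S+)/";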
$mystr = "gardens, countryside #teddy135 #xyz-12 and #abc.abc";
preg_match_all($regex, $mystr, $matches);
print_r($matches);
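To end up with both variables from the question, the username can be captured first and then removed from the original string. A minimal sketch, assuming only the first #username should be extracted and that the name consists of word characters:

```php
<?php
$string   = 'gardens, countryside #teddy135';
$username = null;

// "#name" preceded by whitespace and followed by whitespace or the end.
if (preg_match('/\s(#\w+)(?=\s|$)/', $string, $m)) {
    $username = $m[1];
    // Remove the first matched token together with its leading whitespace.
    $string = trim(preg_replace('/\s#\w+(?=\s|$)/', '', $string, 1));
}

var_dump($string, $username);
```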

Get all strings matching pattern in text

I'm trying to get from text all strings which are between t(" and ") or t(' and ').
I came up with the regexp /[^t\(("|\')]*(?=("|\')\))/, but it does not ignore the character 't' when it is not directly before '('.
For example:
$str = 'This is a text, t("string1"), t(\'string2\')';
preg_match_all('/[^t\(("|\')]*(?=("|\')\))/', $str, $m);
var_dump($m);
returns ring1 and ring2, but I need to get string1 and string2.
You could also consider this.
It uses a separate alternative for each quote type:
(?<=t\(").*?(?="\))|(?<=t\(\').*?(?='\))
Code:
$re = "/(?<=t\\(\").*?(?=\"\\))|(?<=t\\(\\').*?(?='\\))/m";
$str = "This is a text, t(\"string1\"), t('string2')";
preg_match_all($re, $str, $matches);
OR
Use capturing group along with \K
t\((['"])\K.*?(?=\1\))
\K discards the previously matched characters, so they are not included in the final match.
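The \K variant has no PHP code above, so here is a minimal sketch of how it might be used with the string from the question. Group 1 holds the opening quote, and \1 makes sure the same quote closes the string.

```php
<?php
$str = 'This is a text, t("string1"), t(\'string2\')';

// After \K only the text between the quotes is part of the reported match.
preg_match_all('/t\(([\'"])\K.*?(?=\1\))/', $str, $m);

print_r($m[0]);
```

The full matches in $m[0] are the bare strings, while $m[1] contains the quote character that was used.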
You can do it in few steps with this pattern:
$pattern = '~t\((?|"([^"\\\]*+(?s:\\\.[^"\\\]*)*+)"\)|\'([^\'\\\]*+(?s:\\\.[^\'\\\]*)*+)\'\))~';
if (preg_match_all($pattern, $str, $matches))
print_r($matches[1]);
It is a little long and repetitive, but it is fast and can deal with escaped quotes.
details:
t\(
(?| # Branch reset feature allows captures to have the same number
"
( # capture group 1
[^"\\]*+ # all that is not a double quote or a backslash
(?s: # non capturing group in singleline mode
\\. # an escaped character
[^"\\]* # all that is not a double quote or a backslash
)*+
)
"\)
| # OR the same with single quotes (and always in capture group 1)
'([^'\\]*+(?s:\\.[^'\\]*)*+)'\)
)
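To illustrate the claim about escaped quotes, the pattern can be run on input that contains them. This is a sketch with a made-up test string; a nowdoc is used so that the backslashes reach the regex engine unchanged.

```php
<?php
$pattern = '~t\((?|"([^"\\\]*+(?s:\\\.[^"\\\]*)*+)"\)|\'([^\'\\\]*+(?s:\\\.[^\'\\\]*)*+)\'\))~';

// Hypothetical input with escaped quotes inside both kinds of strings.
$str = <<<'TXT'
t("say \"hi\""), t('it\'s')
TXT;

if (preg_match_all($pattern, $str, $matches)) {
    // Thanks to the branch reset, both alternatives fill group 1.
    print_r($matches[1]);
}
```

The captured values still contain the backslashes; removing them, for example with stripslashes, would be a separate step.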
