Regular expression for between two dynamic patterns - php

I want to find anything between
[^1] and [/^1]
For example, if the subject is like this:
sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]
I want to get back an array with abcdef and 12345 as the elements.
I read this
I wrote this code, but I can't get past matching the text between [ and ]:
<?php
$test = '[12345]';
getnumberfromstring($test);
function getnumberfromstring($text)
{
$pattern= '~(?<=\[)(.*?)(?=\])~';
$matches= array();
preg_match($pattern, $text, $matches);
var_dump($matches);
}
?>

Your test uses the string '[12345]', which doesn't follow the rule of an "opening" [^digit] and a "closing" [/^digit]. Also, you're using preg_match when you should be using preg_match_all.
Try this:
<?php
$test = 'sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]';
getnumberfromstring($test);
function getnumberfromstring($text)
{
$pattern= '/(?<=\[\^\d\])(.*?)(?=\[\/\^\d\])/';
$matches= array();
preg_match_all($pattern, $text, $matches);
var_dump($matches);
}
?>

That other answer doesn't really apply to your case; your delimiters are more complex and you have to use part of the opening delimiter to match the closing one. Also, unless the numbers inside the tags are limited to one digit, you can't use a lookbehind to match the first one. You have to match the tags in the normal way and use a capturing group to extract the content. (Which is how I would have done it anyway. Lookbehind should never be the first tool you reach for.)
'~\[\^(\d+)\](.*?)\[/\^\1\]~'
The number from the opening delimiter is captured in the first group and the backreference \1 matches the same number, thus ensuring that the delimiters are correctly paired. The text between the delimiters is captured in group #2.
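A minimal sketch of how that pattern might be used with preg_match_all. The helper name getTaggedContents and the two-digit [^12] tag in the sample string are illustrative assumptions, not part of the question:

```php
<?php
// Hypothetical helper: returns the text between each [^N]...[/^N] pair.
function getTaggedContents($text)
{
    // Group 1 = the tag number; \1 requires the same number in the closing tag;
    // group 2 = the text in between.
    preg_match_all('~\[\^(\d+)\](.*?)\[/\^\1\]~', $text, $matches);
    return $matches[2];
}

// The second tag uses two digits to show that \d+ is not limited to one.
$subject = 'sometext[^1]abcdef[/^1]somemoretext[^12]12345[/^12]';
$contents = getTaggedContents($subject);
print_r($contents); // elements: abcdef, 12345
```

With the default PREG_PATTERN_ORDER flag, $matches[2] already holds every group-2 capture, so no loop is needed to build the result array.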

I have tested the following code in PHP 5.4.5:
<?php
$foo = 'sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]';
function getnumberfromstring($text)
{
$matches= array();
# match [^1]...[/^1], [^2]...[/^2]
preg_match_all('/\[\^(\d+)\]([^\[\]]+)\[\/\^\1\]/', $text, $matches, PREG_SET_ORDER);
for($i = 0; $i < count($matches); ++$i)
printf("%s\n", $matches[$i][2]);
}
getnumberfromstring($foo);
?>
output:
abcdef
12345

Related

Find a pattern in a string

I am trying to detect a string inside the following pattern: [url('example')] in order to replace the value.
I thought of using one regex to get the strings inside the square brackets and then another to get the text inside the parentheses, but I am not sure if that's the best way to do it.
//detect all strings inside brackets
preg_match_all("/\[([^\]]*)\]/", $text, $matches);
//loop through results to get the string inside the parentheses
preg_match('#\((.*?)\)#', $match, $matches);
To match the string between the parentheses, you might use a single pattern to get a match only:
\[url\(\K[^()]+(?=\)])
The pattern matches:
\[url\( Match [url(
\K Clear the current match buffer
[^()]+ Match 1+ chars other than ( and )
(?=\)]) Positive lookahead, assert )] to the right
For example
$re = "/\[url\(\K[^()]+(?=\)])/";
$text = "[url('example')]";
if (preg_match($re, $text, $match)) {
var_dump($match[0]);
}
Output
string(9) "'example'"
Another option could be using a capture group. You can place the ' inside or outside the group to capture the value:
\[url\(([^()]+)\)]
For example
$re = "/\[url\(([^()]+)\)]/";
$text = "[url('example')]";
if (preg_match($re, $text, $match)) {
var_dump($match[1]);
}
Output
string(9) "'example'"
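Since the goal in the question is to replace the value, here is a hedged sketch of both ideas: keeping the quotes outside the capture group, and using the \K form with preg_replace. The replacement text 'new-value' is a made-up placeholder:

```php
<?php
$text = "[url('example')]";

// Quotes kept outside the capture group, so group 1 is the bare value.
preg_match("~\[url\('([^()']+)'\)]~", $text, $match);
$value = $match[1];

// Replace only the value: \K drops the opening part from the match,
// and the lookahead checks for the closing part without consuming it.
$replaced = preg_replace("~\[url\('\K[^()']+(?='\)])~", 'new-value', $text);

echo $value, "\n";    // example
echo $replaced, "\n"; // [url('new-value')]
```

Using ~ as the delimiter and a double-quoted PHP string avoids having to escape the single quotes inside the pattern.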

Extract shortcode from Instagram URL

I am trying to extract the shortcode from an Instagram URL.
Here is what I have already tried, but I don't know how to extract it when there is a username in the middle. Thanks a lot for your answer.
Instagram pattern: /p/shortcode/
https://regex101.com/r/nO4vdd/1/
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/
expected : BxKRx5CHn5i
I took your original pattern and added a .* before the \/p\/
This gave a pattern of
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com.*\/p\/)([\d\w\-_]+)(?:\/)?(\?.*)?$
This would be simpler, assuming the shortcode always follows the /p/
^(?:.*\/p\/)([\d\w\-_]+)
You could prepend an optional (?:\/\w+)? non capturing group.
Note that \w also matches _ and \d so the capturing group could be updated to ([\w-]+) and the forward slash in the non capturing group might also be written as just /
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com(?:\/\w+)?\/p\/)([\w-]+)(?:\/)?(\?.*)?$
You don't have to escape the forward slashes if you use a delimiter other than /. Your pattern might look like:
^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$
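As a rough illustration, that pattern could be applied in PHP with ~ as the delimiter. The sample URLs are the ones from the question; the variable names are arbitrary:

```php
<?php
// Same pattern as above, wrapped in ~ delimiters so "/" needs no escaping.
$pattern = '~^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$~';

$urls = [
    'https://www.instagram.com/p/BxKRx5CHn5i/',
    'https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176',
    'https://www.instagram.com/username/p/BxKRx5CHn5i/',
];

$codes = [];
foreach ($urls as $url) {
    // Group 1 holds the shortcode; null marks a URL that did not match.
    $codes[] = preg_match($pattern, $url, $m) ? $m[1] : null;
}
print_r($codes); // BxKRx5CHn5i three times
```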
This expression might also work:
^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$
Test
$re = '/^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$/m';
$str = 'https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $match) {
var_export($match[1]);
}
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Assuming that you aren't simply trusting /p/ as the marker before the substring, you can use this pattern which will consume one or more of the directories before your desired substring.
Notice that \K restarts the fullstring match, and effectively removes the need to use a capture group -- this means a smaller output array and a shorter pattern.
Choosing a pattern delimiter like ~ which doesn't occur inside your pattern alleviates the need to escape the forward slashes. This again makes your pattern more brief and easier to read.
If you do want to rely on the /p/ substring, then just add p/ before my \K.
Code: (Demo)
$strings = [
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];
foreach ($strings as $string) {
echo preg_match('~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/\K\w+~', $string , $m) ? $m[0] : '';
echo " (from $string)\n";
}
Output:
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BrODg5XHlE6 (from https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176)
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere)
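For the variant that does rely on /p/, a sketch with p/ placed before the \K might look like the following. The third URL is a made-up example without a /p/ segment, included only to show the no-match case:

```php
<?php
$strings = [
    "https://www.instagram.com/p/BxKRx5CHn5i/",
    "https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere",
    "https://www.instagram.com/explore/tags/regex/", // hypothetical: no /p/ marker
];

$results = [];
foreach ($strings as $string) {
    // "/p/" must now appear literally right before the captured shortcode.
    $found = preg_match('~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/p/\K\w+~', $string, $m);
    $results[] = $found ? $m[0] : '';
}
print_r($results); // BxKRx5CHn5i, BxE5PpZhoa9, and an empty string
```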
If you are implicitly trusting the /p/ as the marker and you know that you are dealing with Instagram links, then you can avoid regex and just cut out the 11-character substring that starts 3 characters after the position of the marker.
Code: (Demo)
$strings = [
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];
foreach ($strings as $string) {
$pos = strpos($string, '/p/');
if ($pos === false) {
continue;
}
echo substr($string, $pos + 3, 11);
echo " (from $string)\n";
}
(Same output as previous technique)

Find next word after colon in regex

A Laravel console command returns a result like this:
Some text as: 'Nerad'
I tried this:
$regex = '/(?<=\bSome text as:\s)(?:[\w-]+)/is';
preg_match_all( $regex, $d, $matches );
but it returns an empty result.
My guess is that something is wrong with the single quotes, so I need to change the regex.
Any ideas?
Note that you get no match because the ' before Nerad is not matched, nor checked with the lookbehind.
If you need to check the context, but avoid including it into the match, in PHP regex, it can be done with a \K match reset operator:
$regex = '/\bSome text as:\s*\'\K[\w-]+/i';
The output array structure will be cleaner than when using a capturing group and you may check for unknown width context (lookbehind patterns are fixed width in PHP PCRE regex):
$re = '/\bSome text as:\s*\'\K[\w-]+/i';
$str = "Some text as: 'Nerad'";
if (preg_match($re, $str, $match)) {
echo $match[0];
} // => Nerad
Alternatively, anchor at the end of the string and capture the word in a group. Group 1 will hold the required string.
/:\s*'(\w+)'$/
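A small sketch of how that end-anchored pattern could be used in PHP, assuming the quoted word is the last thing in the string. The single quotes are escaped because the pattern sits in a single-quoted PHP string:

```php
<?php
$str = "Some text as: 'Nerad'";

// Group 1 captures the word between the quotes at the end of the string.
$word = preg_match('/:\s*\'(\w+)\'$/', $str, $m) ? $m[1] : null;

echo $word, "\n"; // Nerad
```

Note that \w+ does not include hyphens, so a value such as 'foo-bar' would need [\w-]+ in the group instead.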

preg_match match all starting words

I am trying to get all matched patterns from a list of words;
$pattern = '/^(ab|abc|abcd|asdf)/';
preg_match_all($pattern, 'abcdefgh', $matches);
I want to get 'ab', 'abc' and 'abcd'.
But this returns only 'ab'. It works if I explode the pattern and loop through the words.
Is there any way to solve it with a single match?
Regular expressions consume characters as they are matching through the string, so they can't natively find overlapping matches.
You can use extended features like lookahead assertions together with capturing groups, but that requires an ugly construction:
$subject = 'abcdefgh';
preg_match_all(
'/^
(?:(?=(ab)))?
(?:(?=(abc)))?
(?:(?=(abcd)))?
(?:(?=(asdf)))?
/x',
$subject, $result, PREG_SET_ORDER);
// Group 0 is the empty overall match; groups 1 and up hold the words that matched.
foreach ($result as $match) {
    foreach (array_slice($match, 1) as $word) {
        if ($word !== '') {
            echo $word, "\n";
        }
    }
}
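For comparison, the loop the question mentions can be written without regex at all. This is only a sketch, assuming the candidate words are plain literals rather than patterns; the variable names are arbitrary:

```php
<?php
$subject = 'abcdefgh';
$words   = ['ab', 'abc', 'abcd', 'asdf'];

// Keep every word that the subject starts with.
$found = array_values(array_filter($words, function ($word) use ($subject) {
    // strncmp compares only the first strlen($word) bytes of both strings.
    return strncmp($subject, $word, strlen($word)) === 0;
}));

print_r($found); // ab, abc, abcd
```

This avoids the one-group-per-word pattern, at the cost of one comparison per candidate word.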

PHP - regular expression (preg_match)

<?php
$string = "http://example.com/file/D1 http://example.com/file/D2
http://example.com/file/D3";
preg_match_all('/(https?\:\/\/)?(www\.)?example\.com\/file\/(\w+)/i', $string, $matches);
foreach($matches[3] as $value)
{
print $value;
}
?>
I want to match only the third link and get "D3".
I don't want it to match the other two links, which is why it should check whether the link has whitespace at the beginning or the end.
I know that \s matches whitespace. I tried it, but somehow I can't get it to work. :(
You can add the $ anchor so that the pattern only matches at the end of the string; it will then return only the last link.
preg_match_all('/(https?\:\/\/)?(www\.)?example\.com\/file\/(\w+)$/i', $string, $matches);
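A quick sketch of that pattern run against the string from the question (the line break between the second and third link is written as \n here):

```php
<?php
$string = "http://example.com/file/D1 http://example.com/file/D2\nhttp://example.com/file/D3";

// Without the m flag, $ only matches at the very end of the string
// (or just before a final newline), so only the last link qualifies.
preg_match_all('/(https?\:\/\/)?(www\.)?example\.com\/file\/(\w+)$/i', $string, $matches);

$ids = $matches[3];
print_r($ids); // a single element: D3
```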
