I try to extract the shortcode from Instagram URL
Here what i have already tried but i don't know how to extract when they are an username in the middle. Thank you a lot for your answer.
Instagram pattern : /p/shortcode/
https://regex101.com/r/nO4vdd/1/
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/
expected : BxKRx5CHn5i
I took you original query and added a .* bafore the \/p\/
This gave a query of
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com.*\/p\/)([\d\w\-_]+)(?:\/)?(\?.*)?$
This would be simpler assuming the username always follows the /p/
^(?:.*\/p\/)([\d\w\-_]+)
You could prepend an optional (?:\/\w+)? non capturing group.
Note that \w also matches _ and \d so the capturing group could be updated to ([\w-]+) and the forward slash in the non capturing group might also be written as just /
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com(?:\/\w+)?\/p\/)([\w-]+)(?:\/)?(\?.*)?$
Regex demo
You don't have to escape the backslashes if you use a different delimiter than /. Your pattern might look like:
^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$
This expression might also work:
^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$
Test
$re = '/^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$/m';
$str = 'https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $match) {
var_export($match[1]);
}
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Assuming that you aren't simply trusting /p/ as the marker before the substring, you can use this pattern which will consume one or more of the directories before your desired substring.
Notice that \K restarts the fullstring match, and effectively removes the need to use a capture group -- this means a smaller output array and a shorter pattern.
Choosing a pattern delimiter like ~ which doesn't occur inside your pattern alleviates the need to escape the forward slashes. This again makes your pattern more brief and easier to read.
If you do want to rely on the /p/ substring, then just add p/ before my \K.
Code: (Demo)
$strings = [
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];
foreach ($strings as $string) {
echo preg_match('~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/\K\w+~', $string , $m) ? $m[0] : '';
echo " (from $string)\n";
}
Output:
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BrODg5XHlE6 (from https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176)
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere)
If you are implicitly trusting the /p/ as the marker and you know that you are dealing with instagram links, then you can avoid regex and just cut out the 11-character-substring, 3-characters after the marker.
Code: (Demo)
$strings = [
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];
foreach ($strings as $string) {
$pos = strpos($string, '/p/');
if ($pos === false) {
continue;
}
echo substr($string, $pos + 3, 11);
echo " (from $string)\n";
}
(Same output as previous technique)
Related
I'm trying to decode the content-disposition header (from curl) to get the filename using the following regular expression:
<?php
$str = 'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'';
preg_match('/^.*?filename=(["\'])([^"\']+)\1/m', $str, $matches);
print_r($matches);
So while it matches if the filename is in single or double quotes, it fails if there are no quotes around the filename (which can happen)
$str = 'attachment;filename=unnamed.jpg;filename*=unnamed.jpg';
Right now I'm using two regular expressions (with if-else) but I just wanted to learn if it is possible to do in a single regex? Just for my own learning to master regex.
I will use the branch reset feature (?|...|...|...) that gives a more readable pattern and avoids to create a capture group for the quotes. In a branch-reset group, each capture groups have the same numbers for each alternative:
if ( preg_match('~filename=(?|"([^"]*)"|\'([^\']*)\'|([^;]*))~', $str, $match) )
echo $match[1], PHP_EOL;
Whatever the alternative that succeeds, the capture is always in group 1.
Just to put my two cents in - you could use a conditional regex:
filename=(['"])?(?(1)(.+?)\1|([^;]+))
Broken down, this says:
filename= # match filename=
(['"])? # capture " or ' into group 1, optional
(?(1) # if group 1 was set ...
(.+?)\1 # ... then match up to \1
| # else
([^;]+) # not a semicolon
)
Afterwards, you need to check if group 2 or 3 was present.
Alternatively, go for #Casimir's answer using the (often overlooked) branch reset.
See a demo on regex101.com.
One approach is to use an alternation in a single regex to match either a single/double quoted filename, or a filename which is completely unquoted. Note that one side effect of this approach is that we introduce more capture groups into the regex. So we need a bit of extra logic to handle this.
<?php
$str = 'attachment;filename=unnamed.jpg;filename*=UTF-8\'\'unnamed.jpg\'';
$result = preg_match('/^.*?filename=(?:(?:(["\'])([^"\']+)\1)|([^"\';]+))/m',
$str, $matches);
print_r($matches);
$index = count($matches) == 3 ? 2 : 3;
if ($result) {
echo $matches[$index];
}
else {
echo "filename not found";
}
?>
Demo
You could make your capturing group optional (["\'])? and \1? like:
and add a semicolon or end of the string to the end of the regex in a non capturing group which checks if there is a ; or the end of the line (?:;|$)
^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)
$str = 'attachment;filename=unnamed.jpg;filename*=UTF-8\'\'unnamed.jpg\'';
preg_match('/^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)/m', $str, $matches);
print_r($matches);
Output php
You can also use \K to reset the starting point of the reported match and then match until you encounter a double quote or a semicolon [^";]+. This will only return the filename.
^.*?filename="?\K[^";]+
foreach ($strings as $string) {
preg_match('/^.*?filename="?\K[^";]+/m', $string, $matches);
print_r($matches);
}
Output php
I am getting a result as a return of a laravel console command like
Some text as: 'Nerad'
Now i tried
$regex = '/(?<=\bSome text as:\s)(?:[\w-]+)/is';
preg_match_all( $regex, $d, $matches );
but its returning empty.
my guess is something is wrong with single quotes, for this i need to change the regex..
Any guess?
Note that you get no match because the ' before Nerad is not matched, nor checked with the lookbehind.
If you need to check the context, but avoid including it into the match, in PHP regex, it can be done with a \K match reset operator:
$regex = '/\bSome text as:\s*'\K[\w-]+/i';
See the regex demo
The output array structure will be cleaner than when using a capturing group and you may check for unknown width context (lookbehind patterns are fixed width in PHP PCRE regex):
$re = '/\bSome text as:\s*\'\K[\w-]+/i';
$str = "Some text as: 'Nerad'";
if (preg_match($re, $str, $match)) {
echo $match[0];
} // => Nerad
See the PHP demo
Just come from the back and capture the word in a group. The Group 1, will have the required string.
/:\s*'(\w+)'$/
I have some string, for example:
cats, e.g. Barsik, are funny. And it is true. So,
And I want to get as result:
cats, e.g. Barsik, are funny.
My try:
mb_ereg_search_init($text, '((?!e\.g\.).)*\.[^\.]');
$match = mb_ereg_search_pos();
But it gets position of second dot (after word "true").
How to get desired result?
Since a naive approach works for you, I am posting an answer. However, please note that detecting a sentence end is a very difficult task for a regex, and although it is possible to some degree, an NLP package should be used for that.
Having said that, I suggested using
'~(?<!\be\.g)\.(?=\s+\p{Lu})~ui'
The regex matches any dot (\.) that is not preceded with a whole word e.g (see the negative lookbehind (?<!\be\.g)), but that is followed with 1 or more whitespaces (\s+) followed with 1 uppercase Unicode letter \p{Lu}.
See the regex demo
The case insensitive i modifier does not impact what \p{Lu} matches.
The ~u modifier is required since you are working with Unicode texts (like Russian).
To get the index of the first occurrence, use a preg_match function with the PREG_OFFSET_CAPTURE flag. Here is a bit simplified regex you supplied in the comments:
preg_match('~(?<!т\.н)(?<!т\.к)(?<!e\.g)\.(?=\s+\p{L})~iu', $text, $match, PREG_OFFSET_CAPTURE);
See the lookaheads are executed one by one, and at the same location in string, thus, you do not have to additionally group them inside a positive lookahead. See the regex demo.
IDEONE demo:
$re = '~(?<!т\.н)(?<!т\.к)(?<!e\.g)\.(?=\s+\p{L})~iu';
$str = "cats, e.g. Barsik, are funny. And it is true. So,";
preg_match($re, $str, $match, PREG_OFFSET_CAPTURE);
echo $match[0][1];
Here are two approaches to get substring from start to second last . position of the initial string:
using strrpos and substr functions:
$str = 'cats, e.g. Barsik, and e.g. Lusya are funny. And it is true. So,';
$len = strlen($str);
$str = substr($str, 0, (strrpos($str, '.', strrpos($str, '.') - $len - 1) - $len) + 1);
print_r($str); // "cats, e.g. Barsik, and e.g. Lusya are funny."
using array_reverse, str_split and array_search functions:
$str = 'cats, e.g. Barsik, and e.g. Lusya are funny. And it is true. So,';
$parts = array_reverse(str_split($str));
$pos = array_search('.', $parts) + 1;
$str = implode("", array_reverse(array_slice($parts, array_search('.', array_slice($parts, $pos)) + $pos)));
print_r($str); // "cats, e.g. Barsik, and e.g. Lusya are funny."
I am trying to make a regex that will look behind .txt and then behind the "-" and get the first digit .... in the example, it would be a 1.
$record_pattern = '/.txt.+/';
preg_match($record_pattern, $decklist, $record);
print_r($record);
.txt?n=chihoi%20%283-1%29
I want to write this as one expression but can only seem to do it as two. This is the first time working with regex's.
You can use this:
$record_pattern = '/\.txt.+-(\d)/';
Now, the first group contains what you want.
Your regex would be,
\.txt[^-]*-\K\d
You don't need for any groups. It just matches from the .txt and upto the literal -. Because of \K in our regex, it discards the previously matched characters. In our case it discards .txt?n=chihoi%20%283- string. Then it starts matching again the first digit which was just after to -
DEMO
Your PHP code would be,
<?php
$mystring = ".txt?n=chihoi%20%283-1%29";
$regex = '~\.txt[^-]*-\K\d~';
if (preg_match($regex, $mystring, $m)) {
$yourmatch = $m[0];
echo $yourmatch;
}
?> //=> 1
I'm trying to return a certain part of a string. I've looked at substr, but I don't believe it's what I'm looking for.
Using this string:
/text-goes-here/more-text-here/even-more-text-here/possibly-more-here
How can I return everything between the first two // i.e. text-goes-here
Thanks,
$str="/text-goes-here/more-text-here/even-more-text-here/possibly-more-here";
$x=explode('/',$str);
echo $x[1];
print_r($x);// to see all the string split by /
<?php
$String = '/text-goes-here/more-text-here/even-more-text-here/possibly-more-here';
$SplitUrl = explode('/', $String);
# First element
echo $SplitUrl[1]; // text-goes-here
# You can also use array_shift but need twice
$Split = array_shift($SplitUrl);
$Split = array_shift($SplitUrl);
echo $Split; // text-goes-here
?>
The explode methods above certainly work. The reason for matching on the second element is that PHP inserts blank elements in the array whenever it starts with or runs into the delimiter without anything else. Another possible solution is to use regular expressions:
<?php
$str="/text-goes-here/more-text-here/even-more-text-here/possibly-more-here";
preg_match('#/(?P<match>[^/]+)/#', $str, $matches);
echo $matches['match'];
The (?P<match> ... part tells it to match with a named capture group. If you leave out the ?P<match> part, you'll end up with the matching part in $matches[1]. $matches[0] will contain the part with the forward slashes like "/text-goes-here/".
Just use preg_match:
preg_match('#/([^/]+)/#', $string, $match);
$firstSegment = $match[1]; // "text-goes-here"
where
# - start of regex (can be any caracter)
/ - a litteral /
( - beginning of a capturing group
[^/] - anything that isn't a litteral /
+ - one or more (more than one litteral /)
) - end of capturing group
/ - a litteral /
# - end of regex (must match first character of the regex)