Hello, here is my HTML:
<div>
hello.domain.com
holla.domain.com
stack.domain.com
overflow.domain.com </div>
I want to return an array with: hello, holla, stack, overflow
Then I have this: https://hello.domain.com/c/mark?lang=fr
I want to return the value: mark
I assume it should be done with regular expressions, but any solution is fine, with or without a regular expression. Thank you.
Part 1: Subdomains
$regex = '~\w+(?=\.domain\.com)~i';
preg_match_all($regex, $yourstring, $matches);
print_r($matches[0]);
See the matches in the regex demo.
Match Array:
[0] => hello
[1] => holla
[2] => stack
[3] => overflow
Explanation
The i modifier makes it case-insensitive
\w+ matches letters, digits or underscores (our match)
The lookahead (?=\.domain\.com) asserts that it is followed by .domain.com
Part 2: Substring
$regex = '~https://hello\.domain\.com/c/\K[^\s#?]+(?=\?)~';
if (preg_match($regex, $yourstring, $m)) {
$thematch = $m[0];
}
else { // no match...
}
See the match in the regex demo.
Explanation
https://hello\.domain\.com/c/ matches https://hello.domain.com/c/
The \K tells the engine to drop what was matched so far from the final match it returns
[^\s#?]+ matches any chars that are not a white-space char, ? or # url fragment marker
The lookahead (?=\?) asserts that it is followed by a ?
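Although I am not sure where you are trying to take this, the following may help you. Note that trim() treats its second argument as a list of characters, not as a suffix, so trim($input, '.domain.com') would turn hello.domain.com into hell. Strip the suffix explicitly instead:
$input = 'something.domain.com';
$string = preg_replace('~\.domain\.com$~', '', $input);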
About the second part of your question, you can use the parse_url function:
$yourURL = 'https://hello.domain.com/c/mark?lang=fr';
$segments = explode('/', parse_url($yourURL, PHP_URL_PATH));
$result = end($segments); // end() takes its argument by reference, so it needs a variable
For the second part of your question (extracting part of a URL), others have answered with a highly specific regex solution. More generally, what you are trying to do is parse a URL, and the parse_url() function already exists for that. You will find the following more flexible and applicable to other URLs:
php > $url = 'https://hello.domain.com/c/mark?lang=fr';
php > $urlpath = parse_url($url, PHP_URL_PATH);
php > print $urlpath ."\n";
/c/mark
php > print basename($urlpath) . "\n";
mark
php > $url = 'ftp://some.where.com.au/abcd/efg/wow?lang=id&q=blah';
php > print basename(parse_url($url, PHP_URL_PATH)) . "\n";
wow
This assumes that you are after the last part of the URL path, but you could use explode("/", $urlpath) to access other components in the path.
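As a rough sketch of that last suggestion, the path can be split into its segments and indexed. The helper name pathSegments() is made up for illustration, and it simply returns an empty array when the URL has no path.

```php
<?php
// Hypothetical helper: split the path of a URL into its segments.
function pathSegments(string $url): array
{
    // parse_url() gives null when there is no path and false for a malformed URL.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }
    // Drop the leading/trailing "/" so explode() yields no empty entries at the ends.
    $trimmed = trim($path, '/');
    return $trimmed === '' ? [] : explode('/', $trimmed);
}

$segments = pathSegments('https://hello.domain.com/c/mark?lang=fr');
echo $segments[0], "\n"; // c
echo $segments[1], "\n"; // mark
```

Indexing from the end (for example with end() on a variable) gives the same result as basename() for the last segment.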
I am trying to extract the shortcode from an Instagram URL.
Here is what I have already tried, but I don't know how to extract it when there is a username in the middle. Thank you very much for your answers.
Instagram pattern : /p/shortcode/
https://regex101.com/r/nO4vdd/1/
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/
expected : BxKRx5CHn5i
I took your original query and added a .* before the \/p\/
This gave a query of
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com.*\/p\/)([\d\w\-_]+)(?:\/)?(\?.*)?$
This would be simpler, assuming the shortcode always follows the /p/
^(?:.*\/p\/)([\d\w\-_]+)
You could prepend an optional (?:\/\w+)? non capturing group.
Note that \w also matches _ and \d so the capturing group could be updated to ([\w-]+) and the forward slash in the non capturing group might also be written as just /
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com(?:\/\w+)?\/p\/)([\w-]+)(?:\/)?(\?.*)?$
Regex demo
You don't have to escape the forward slashes if you use a different delimiter than /. Your pattern might look like:
^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$
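A quick way to try that pattern from PHP, as a sketch only: the function name instagramShortcode() is an assumption for illustration, and ~ is used as the delimiter so the slashes need no escaping. Since the optional username part is \w+, a username containing a dot would not match.

```php
<?php
// Hypothetical wrapper around the pattern above; returns null when nothing matches.
function instagramShortcode(string $url): ?string
{
    $pattern = '~^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$~';
    // Group 1 is the shortcode that follows /p/.
    return preg_match($pattern, $url, $m) ? $m[1] : null;
}

$urls = [
    'https://www.instagram.com/p/BxKRx5CHn5i/',
    'https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176',
    'https://www.instagram.com/username/p/BxKRx5CHn5i/',
];
foreach ($urls as $url) {
    echo instagramShortcode($url), "\n"; // BxKRx5CHn5i each time
}
```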
This expression might also work:
^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$
Test
$re = '/^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$/m';
$str = 'https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $match) {
var_export($match[1]);
}
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Assuming that you aren't simply trusting /p/ as the marker before the substring, you can use this pattern which will consume one or more of the directories before your desired substring.
Notice that \K restarts the fullstring match, and effectively removes the need to use a capture group -- this means a smaller output array and a shorter pattern.
Choosing a pattern delimiter like ~ which doesn't occur inside your pattern alleviates the need to escape the forward slashes. This again makes your pattern more brief and easier to read.
If you do want to rely on the /p/ substring, then just add p/ before my \K.
Code: (Demo)
$strings = [
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];
foreach ($strings as $string) {
echo preg_match('~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/\K\w+~', $string , $m) ? $m[0] : '';
echo " (from $string)\n";
}
Output:
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BrODg5XHlE6 (from https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176)
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere)
If you are implicitly trusting the /p/ as the marker and you know that you are dealing with Instagram links, then you can avoid regex and just cut out the 11-character substring that starts 3 characters after the position of the marker.
Code: (Demo)
$strings = [
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];
foreach ($strings as $string) {
$pos = strpos($string, '/p/');
if ($pos === false) {
continue;
}
echo substr($string, $pos + 3, 11);
echo " (from $string)\n";
}
(Same output as previous technique)
There are many similar questions, but I still have not found a solution to what I try to achieve in php. I preg_match_all a string which can contain URLs written in various ways, but also contains text which should not match. What I need to match is:
www.something.com
https://something.com
http://something.com
https://www.something.com
http://www.something.com
And any /..../.... after the URL, but not:
www.something.com</p> // this should match everything until the '</p>'
www.something.com. // this should match everything until the '.'
What I have so far is
/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:#\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:#\-_=#])*/
and the function
if(preg_match_all("/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:#\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:#\-_=#])*/",$text,$urls)){
foreach($urls[0]as $url ){
$text = str_replace($url,'<a href="'.$url.'">'.$url.'</a>',$text);
}
}
but this has a problem with http://www.... (the http:// is not included in the displayed text), and for a URL without http or https the created link is relative to the domain I show the page on. Suggestions?
Here's a live Demo
Edit: my best regex so far for any URL with http or https is /(http|https)\:\/\/[a-zA-Z0-9\-\.]+(\.[a-zA-Z]{2,3})?(\/[A-Za-z0-9-._~!$&()*+,;=:]*)*/. Now I just need a way to match the URLs that start with only www.something... and transform them into http://www.something... in the href.
Here's another live demo with different examples.
Edit 2: the answer from this question is quite good. The only problems I still encounter are a </p> directly after the URL, and words before and after a dot (this.for example).
$url = '#(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])#';
$string = preg_replace($url, '<a href="$0">$0</a>', $string);
echo $string;
Maybe this one fits your needs:
$text = preg_replace_callback('~(https?://|www)[a-z\d.-]+[\w/.?=&%:#]*\w~i', function($m) {
$prefix = stripos($m[0], 'www') === 0 ? 'http://' : '';
return "<a href='{$prefix}{$m[0]}'>{$m[0]}</a>";
}, $text);
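To see how this behaves on the two problem cases from the question (a trailing dot and a trailing </p>), here is a small sketch. The wrapper name linkify() and the sample text are made up for illustration; the pattern and callback are the ones above.

```php
<?php
// Hypothetical wrapper around the callback approach above.
function linkify(string $text): string
{
    return preg_replace_callback(
        '~(https?://|www)[a-z\d.-]+[\w/.?=&%:#]*\w~i',
        function (array $m): string {
            // Bare "www" matches get a scheme so the href is not treated as a relative link.
            $prefix = stripos($m[0], 'www') === 0 ? 'http://' : '';
            return "<a href='{$prefix}{$m[0]}'>{$m[0]}</a>";
        },
        $text
    );
}

echo linkify('<p>See www.something.com. Or http://something.com/a/b</p>');
```

Because the pattern has to end on a \w character, the sentence-ending dot and the < of the closing tag stay outside the link.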
$text = "<p>Some string www.test.com with urls http://test.com in it http://www.test.com. </p>";
$text = preg_replace_callback("#(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])#", 'replace_callback', $text);
function replace_callback($matches){
return '<a href="' . $matches[0] . '">' . $matches[0] . '</a>';
}
Your regex was almost correct!
You were matching a literal dot \. followed by 0 or more characters from a group that includes the dot.
So I changed it to match a literal dot followed by 1 or more characters excluding the dot, which seems to be what you want. Here is the final regex:
((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:#\-_=#]+\.([a-zA-Z0-9\&\/\?\:#\-_=#])+
See it in action:
https://regex101.com/r/h5pUvC/3/
I am getting a result back from a Laravel console command like
Some text as: 'Nerad'
Now I tried
$regex = '/(?<=\bSome text as:\s)(?:[\w-]+)/is';
preg_match_all( $regex, $d, $matches );
but it's returning empty.
My guess is that something is wrong with the single quotes, so I need to change the regex.
Any ideas?
Note that you get no match because the ' before Nerad is not matched, nor checked with the lookbehind.
If you need to check the context, but avoid including it into the match, in PHP regex, it can be done with a \K match reset operator:
$regex = '/\bSome text as:\s*\'\K[\w-]+/i';
See the regex demo
The output array structure will be cleaner than when using a capturing group, and the context before \K may be of unknown width (lookbehind patterns must be fixed-width in PHP PCRE regex):
$re = '/\bSome text as:\s*\'\K[\w-]+/i';
$str = "Some text as: 'Nerad'";
if (preg_match($re, $str, $match)) {
echo $match[0];
} // => Nerad
See the PHP demo
Just anchor at the end of the string and capture the word in a group. Group 1 will hold the required string.
/:\s*'(\w+)'$/
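A minimal sketch of using that pattern from PHP. The function name trailingQuotedWord() is invented for illustration, and the input is assumed to end right after the closing quote (apart from an optional final newline, which $ tolerates).

```php
<?php
// Hypothetical helper: return the quoted word at the end of the string, or null.
function trailingQuotedWord(string $s): ?string
{
    // The single quotes inside the pattern are escaped for the PHP string literal.
    return preg_match('/:\s*\'(\w+)\'$/', $s, $m) ? $m[1] : null;
}

echo trailingQuotedWord("Some text as: 'Nerad'"); // Nerad
```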
I would like to know how I can cut a string in PHP starting from the last character back to a specific character. Let's say I have the following link:
www.whatever.com/url/otherurl/2535834
and I want to get 2535834
Important note: the number can have a different length, which is why I want to cut out to the / no matter how many numbers there are.
Thanks
In this special case, a URL, use basename():
echo basename('www.whatever.com/url/otherurl/2535834');
A more general solution would be preg_replace(), like this:
// The / in the pattern is the delimiter which separates the search string from the remaining part of the string
echo preg_replace('#.*/#', '', $url);
The pattern '#.*/#' makes use of the default greediness of the PCRE regex engine, meaning it will match as many chars as possible and will therefore consume /abc/123/xyz/ instead of just /abc/ when matching the pattern.
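To make the greediness visible, this sketch uses preg_match() to show what each variant actually consumes. The lazy form .*? is added here only for contrast; it is not part of the answer above.

```php
<?php
$url = 'www.whatever.com/url/otherurl/2535834';

// Greedy: .* runs to the end of the string, then backtracks to the LAST slash.
preg_match('#.*/#', $url, $greedy);
echo $greedy[0], "\n"; // www.whatever.com/url/otherurl/

// Lazy: .*? expands one character at a time and stops at the FIRST slash.
preg_match('#.*?/#', $url, $lazy);
echo $lazy[0], "\n";   // www.whatever.com/
```

Removing the greedy match therefore leaves only the part after the last slash.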
Use
explode() AND end()
<?php
$str = 'www.whatever.com/url/otherurl/2535834';
$tmp = explode('/', $str);
echo end ($tmp);
?>
Working Demo
This should work for you:
(So you can get the number with or without a slash, if you need that)
<?php
$url = "www.whatever.com/url/otherurl/2535834";
preg_match("/\/(\d+)$/",$url,$matches);
print_r($matches);
?>
Output:
Array ( [0] => /2535834 [1] => 2535834 )
With strstr() and str_replace() in action
$str = 'www.whatever.com/url/otherurl/2535834';
echo str_replace("otherurl/", "", strstr($str, "otherurl/"));
strstr() finds everything (including the needle) after the needle and the needle gets replaced by "" using str_replace()
if your pattern is fixed you can always do:
$str = 'www.whatever.com/url/otherurl/2535834';
$tmp = explode('/', $str);
echo $tmp[3];
Here's my version:
$string = "www.whatever.com/url/otherurl/2535834";
echo substr($string, strrpos($string, "/") + 1, strlen($string));
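One edge case with this approach: strrpos() returns false when the string contains no slash, and false + 1 evaluates to 1, so the first character would be dropped silently. A guarded variant might look like the sketch below; the function name afterLastSlash() is made up for illustration.

```php
<?php
// Hypothetical helper: everything after the last "/", or the whole string if there is none.
function afterLastSlash(string $s): string
{
    $pos = strrpos($s, '/');
    // Without this check, a slash-less input would lose its first character.
    return $pos === false ? $s : substr($s, $pos + 1);
}

echo afterLastSlash('www.whatever.com/url/otherurl/2535834'), "\n"; // 2535834
echo afterLastSlash('2535834'), "\n";                               // 2535834
```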
This code outputs the $captured array, but $captured[1] contains bar/this rather than my expected bar. What's missing in my regex to stop from returning more than bar?
<?php
$pattern = '/foo/:any/';
$subject = '/foo/bar/this/that';
$pattern = str_replace(':any', '(.+)', $pattern);
$pattern = str_replace(':num', '([0-9]+)', $pattern);
$pattern = str_replace(':alpha', '([A-Za-z]+)', $pattern);
echo '<pre>';
$pattern = '#^' . $pattern . '#';
preg_match($pattern, $subject, $captured);
print_r($captured);
echo '</pre>';
Use a non-greedy modifier to make the + match as few characters as possible instead of as many as possible:
$pattern = str_replace(':any', '(.+?)', $pattern); // the ? after the + makes it non-greedy
You probably also want to add delimiters round your regular expression and anchor it to the start of the string:
$pattern = '#^/foo/:any/#';
The dot is greedy and matches as many characters as possible. Either make it lazy:
$pattern = str_replace(':any', '(.+?)', $pattern);
or keep it from matching slashes:
$pattern = str_replace(':any', '([^\/]+)', $pattern);
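To compare the three variants side by side, here is a small sketch built on the code from the question. The helper name matchRoute() is an assumption for illustration, and the template is inserted into the regex without preg_quote(), as in the original.

```php
<?php
// Hypothetical helper: expand :any in a route template and return the captures, or null.
function matchRoute(string $template, string $subject, string $anyPattern): ?array
{
    // '#' is the delimiter, so the slashes in the template need no escaping.
    $regex = '#^' . str_replace(':any', $anyPattern, $template) . '#';
    return preg_match($regex, $subject, $captured) ? $captured : null;
}

$subject = '/foo/bar/this/that';
echo matchRoute('/foo/:any/', $subject, '(.+)')[1], "\n";    // bar/this (greedy)
echo matchRoute('/foo/:any/', $subject, '(.+?)')[1], "\n";   // bar (lazy)
echo matchRoute('/foo/:any/', $subject, '([^/]+)')[1], "\n"; // bar (no slashes allowed)
```

The negated class states the intent ("one path segment") directly, whereas the lazy version only gives the same result because a slash follows the placeholder in the template.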
Your code is rather confusing and misleading, and if you run it, it outputs a warning:
Warning: preg_match(): Unknown modifier '(' in php shell code on line 1
What I think is wrong is:
$pattern = '/foo/:any/';
#should be
$pattern = '/foo\/:any/';
because a forward slash must be escaped when / is also used as the regex delimiter.
After this is fixed the script returns:
(
[0] => foo/bar/this/that
[1] => bar/this/that
)
This is the expected result, as you match foo/ and everything afterwards with (.*). If you want to match anything only up to the next forward slash, you have some options:
$pattern = '/foo/(.*?)/';    # non-greedy
$pattern = '/foo/([^\/]*)/'; # not matching any forward slash
$pattern = '#foo/:any/#';    # or using different start and end markers, e.g. #