I have a working preg_match in PHP, but for the life of me I cannot get the same pattern to work in JavaScript/jQuery.
This is what I am stuck on currently:
yt=$('#yt').val().match(/~^\(?:https?://\)?(?:www\.)?(?:youtube\.com|youtu\.be)(?:/)(?:watch\?v=)?([^&]+)~x/);
alert(yt[1]);
This is the working function in PHP:
$rx = "~"
."^(?:https?://)?" // Optional protocol
."(?:www\.)?" // Optional subdomain
."(?:youtube\.com|youtu\.be)" // Mandatory domain name
."(?:/)" // Mandatory slash
."(?:watch\?v=)?" // Optional watch?v= prefix
."([^&]+)" // Video id as capture group 1
."~x";
$has_match = preg_match($rx, $url, $matches);
Any idea how to get this functioning?
I found some similar posts on Stack Overflow, but their patterns are far less complex than this regex, and I couldn't get my head around the differences.
Not 100% sure but I think you haven't escaped everything correctly.
yt=$('#yt').val().match("^(?:https?://)?(?:www\.)?(?:youtube\.com|youtu\.be)(?:/)(?:watch\?v=)?([^&]+)")
alert(yt[1]);
"https://www.youtube.com/watch?v=dQw4w9WgXcQ".match("^(?:https?://)?(?:www\.)?(?:youtube\.com|youtu\.be)(?:/)(?:watch\?v=)?([^&]+)");
results in
["https://www.youtube.com/watch?v=dQw4w9WgXcQ", "watch?v=dQw4w9WgXcQ"]
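For comparison, here is a minimal sketch of the same pattern written as a JavaScript regex literal. JavaScript has no ~ delimiters and no x flag, so the forward slashes are escaped instead. A literal also keeps the backslashes before . and ?, which a quoted string silently drops. The names YT_URL and extractVideoId are hypothetical.

```javascript
// Same pattern as the PHP version, written as a regex literal.
const YT_URL = /^(?:https?:\/\/)?(?:www\.)?(?:youtube\.com|youtu\.be)\/(?:watch\?v=)?([^&]+)/;

// Hypothetical helper: returns the video id, or null when the URL does not match.
function extractVideoId(url) {
  const m = url.match(YT_URL);
  return m ? m[1] : null;
}
```

With this form, capture group 1 holds only the id (dQw4w9WgXcQ for the URL used above) rather than the whole watch?v=... part.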
How can I validate a YouTube channel URL using a regex?
I found this pattern, but it doesn't work properly:
/((http|https):\/\/|)(www.|)youtube\.com\/(channel\/|user\/|)[a-zA-Z0-9]{1,}/
Can anyone help me?
Your problem is the extra pipe after user\/.
Here is the corrected regex:
((http|https):\/\/|)(www\.|)youtube\.com\/(channel\/|user\/)[a-zA-Z0-9_-]{1,}
This is a problem because the extra pipe adds an empty alternative, which makes (channel|user) optional.
A better way to write this regex is:
(https?:\/\/)?(www\.)?youtube\.com\/(channel|user)\/[\w-]+
After some in-depth research, I came up with the following RegEx:
^https?:\/\/(www\.)?youtube\.com\/(channel\/UC[\w-]{21}[AQgw]|(c\/|user\/)?[\w-]+)$
It allows:
https or http
www. or not
channel/ URLs (since 2015)
user/ URLs (legacy)
custom (c/) URLs (since 2016)
short URLs (user/ or c/ removed)
It also validates channel IDs since they follow a pattern:
Start with UC
21 characters of [0-9a-zA-Z_-] (same as [\w-])
end with one of [AQgw]
Tested with:
https://www.youtube.com/channel/UCARj2eHnsYMuCZDmYZ5q4_g
https://www.youtube.com/channel/UCUZHFZ9jIKrLroW8LcyJEQQ
https://www.youtube.com/c/YouTubeCreators
https://www.youtube.com/YouTubeCreators
https://www.youtube.com/user/partnersupport
https://www.youtube.com/partnersupport
http://www.youtube.com/partnersupport
https://youtube.com/partnersupport
http://youtube.com/partnersupport
Sources:
https://support.google.com/youtube/answer/6180214
https://breadnbeyond.com/youtube-marketing/youtube-custom-channel-url/
https://webapps.stackexchange.com/a/101153
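As a rough illustration, the pattern above can be wrapped in a small JavaScript check. This is only a sketch: the names CHANNEL_URL and isChannelUrl are hypothetical, and the regex is copied unchanged from this answer.

```javascript
// Pattern from the answer above, as a regex literal.
const CHANNEL_URL = /^https?:\/\/(www\.)?youtube\.com\/(channel\/UC[\w-]{21}[AQgw]|(c\/|user\/)?[\w-]+)$/;

// Hypothetical helper: true when the whole string is an accepted channel URL.
function isChannelUrl(url) {
  return CHANNEL_URL.test(url);
}
```

Because the pattern is anchored with ^ and $, a channel/ URL whose id does not follow the UC pattern is rejected rather than partially matched.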
To get the channel name or channel ID from a YouTube URL, use:
(?:https|http)\:\/\/(?:[\w]+\.)?youtube\.com\/(?:c\/|channel\/|user\/)?([a-zA-Z0-9\-]{1,})
Works for:
https://www.youtube.com/user/channelblabla
https://www.youtube.com/channel/channelblabla
https://www.youtube.com/c/channelblabla
https://www.youtube.com/channelblabla
Channel IDs start with 'UC'. I don't know of any other way to tell channel IDs from channel names.
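A sketch of how the extraction and the 'UC' check could be combined in JavaScript follows. It makes two assumptions that are not part of this answer: the character class is widened to [\w-] so that ids containing an underscore are not cut short, and an id is recognised by the 24-character UC pattern described in the earlier answer. The names CHANNEL_RE and parseChannel are hypothetical.

```javascript
// Assumption: [\w-] instead of [a-zA-Z0-9\-], so underscores in ids are kept.
const CHANNEL_RE = /^https?:\/\/(?:\w+\.)?youtube\.com\/(?:c\/|channel\/|user\/)?([\w-]+)/;

// Hypothetical helper: returns { value, kind } or null when the URL does not match.
function parseChannel(url) {
  const m = url.match(CHANNEL_RE);
  if (!m) return null;
  const value = m[1];
  // Assumption: ids are "UC" + 21 characters + one of A, Q, g, w.
  const kind = /^UC[\w-]{21}[AQgw]$/.test(value) ? "id" : "name";
  return { value, kind };
}
```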
I found the best solution for me is
/(https?:\/\/)?(www\.)?youtu((\.be)|(be\..{2,5}))\/((user)|(channel))\/?([a-zA-Z0-9\-_]{1,})/
This works for
http://www.youtube.com/channel/uc_fglsfl
http://youtube.co.uk/channel/asdasgfgjd
https://youtube.com/channel/ghjgk+öää,
https://youtube.net/channel/43568&gsldkfj
https://youtube.de/channel/dtgzu&&dadg
http://youtube.com/channel/vgujsgh&as=gr
http://youtube.com/channel/xdfhxfgu
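let str = 'http://www.youtube.com/channel/uc_fglsfl'
// Use a regex literal: a quoted string would keep the slashes as literal characters and drop the backslashes
let pattern = /(https?:\/\/)?(www\.)?youtu((\.be)|(be\..{2,5}))\/((user)|(channel))\/?([a-zA-Z0-9\-_]{1,})/
let matchs = str.match(pattern)
// result id (capture group 9)
matchs[9]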
I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name.
For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.
Does anyone know of a function or a way to remove certain parts of a string? Maybe it would search for http and, when found, remove everything up to and including the //. Then it would search for www and, if found, remove everything up to and including the following dot. Then it would keep everything until the next dot and remove everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org, because I'd be left with only en.
Any ideas (preferably in PHP, but JavaScript is also welcome)?
EDIT 1:
Thanks to great feedback I think I've been able to work out a function that does what I want:
function getdomain($url) {
    // Prepend a scheme only when none is present, so parse_url() can find the host
    // (the original check also prefixed https:// URLs and read an undefined index)
    if (empty(parse_url($url, PHP_URL_SCHEME))) {
        $url = 'http://'.$url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    $remove = explode('.', $host);
    $result = $remove[0];
    if ($result == 'www') {
        $result = $remove[1];
    }
    return $result;
}
It's not perfect, at least where subdomains are concerned, but I think it's possible to do something about that. Maybe add a second if statement at the end to check the length of the array: if it has more than two items, choose item 1 instead of item 0. This obviously gives me trouble with any domain using .co.uk (because that will be three items long, but I don't want to return co). I'll work on it a bit more and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:
$url = 'http://facebook.com/blahblah';
$parts = parse_url($url);
$host = $parts['host']; // facebook.com
$foo = explode('.', $host);
$result = $foo[0]; // facebook
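Since the question also welcomes JavaScript, here is a rough sketch of the same idea using the standard URL class. The function name getDomainLabel is hypothetical. Like the PHP version, it only skips a leading www, so it does not handle other subdomains or suffixes such as .co.uk.

```javascript
// Hypothetical helper: returns the first host label after an optional "www".
function getDomainLabel(input) {
  // The URL class needs a scheme, so add one when the input has none.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  const host = new URL(hasScheme ? input : "http://" + input).hostname;
  const labels = host.split(".");
  return labels[0] === "www" && labels.length > 1 ? labels[1] : labels[0];
}
```

For http://www.en.wikipedia.org this still returns en, which is the limitation the question already points out.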
You can use the parse_url function from PHP which returns exactly what you want - see
Use the parse_url function in PHP to get domain.com, and then replace .com with an empty string.
I am a little rusty on my regular expressions but this should work.
$url = 'http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); // Returns www.en.wikipedia.org
$domain = preg_replace('/\.(?:com|org)$/', '', $domain); // Pattern needs delimiters; strips a trailing .com or .org
http://php.net/manual/en/function.parse-url.php
PHP REGEX: Get domain from URL
http://rubular.com/r/MvyPO9ijnQ //Check regular expressions
You're looking for regular expressions. They're a bit complicated, so be prepared to read up. In your case, preg_match and preg_replace are the most useful: preg_match searches for a match based on your pattern, and preg_replace replaces the matches with your replacement.
preg_match
preg_replace
I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.
// preg_replace returns the new string, so assign the result back to $url
if (preg_match("/^http:\/\//i", $url))
    $url = preg_replace("/^http:\/\//i", "", $url);
if (preg_match("/www\./i", $url))
    $url = preg_replace("/www\./i", "", $url);
if (preg_match("/\.com/i", $url))
    $url = preg_replace("/\.com/i", "", $url);
if (preg_match("/\/*$/", $url))
    $url = preg_replace("/\/*$/", "", $url);
^ = at the start of the string
i = case insensitive
\ = escape char
$ = the end of the string
This will have to be played around with and tweaked, but it should get you pointed in the right direction.
Javascript:
document.domain.replace(".com","")
PHP:
$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google
This is quite a quick method but should do what you want in PHP:
function getDomain( $URL ) {
return explode('.',$URL)[1];
}
I will update it when I get a chance, but basically it splits the URL into pieces at each full stop and then returns the second item, which should be the domain. A bit more logic would be required for longer domains such as www.abc.xyz.com, but for normal URLs it would suffice.
I am using a script to check links on a given page. I am using simple html DOM to parse the information into an array. I have to check the href of all the a tags to find if they contain a file or something like # or JS.
I tried the following without success.
if(preg_match("|^(.*)|iU", $href)){
save_link();
}
I don't know if my pattern is wrong or if there is a better way to do this.
I want to be able to detect if $href contains .com .php .file extensions. This way it will filter out items like # "function()" and other items used in the href attribute.
EDIT:
parse_url will not work, so please stop posting it. The value # is returned as a valid URL. As I stated above, I am trying to match any string followed by a dot, with no more than 4 characters after the dot.
I believe that the function you're looking for is parse_url().
This function will take a URL string, and return an array of components, which will allow you to work out what kind of URL it is.
However, note that it has issues with incomplete URLs in PHP versions prior to 5.4.7, so you need PHP 5.4.7 or later to get the best out of it.
Hope that helps.
See http://php.net/manual/en/function.parse-url.php
I'm assuming you don't want to match fragments (#) because you are not concerned with following internal anchors.
parse_url breaks up the different parts of the url into an array. You can see the path component of the URL in this array and run your check against that.
You can use parse_url() , like this :
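$res = parse_url($href);
// parse_url() returns false for malformed URLs, and 'scheme' is absent for values like "#"
if (is_array($res) && isset($res['scheme']) && in_array($res['scheme'], array('http', 'https'))) {
    // valid url
    save_link();
}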
UPDATE:
I've added code to filter only http and https urls, thanks to Baba for spotting this.
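For the check the question actually describes (a dot followed by no more than 4 characters), a regex-only filter might look like the JavaScript sketch below. It is an illustration under assumptions, not a full URL validator: the names SHORT_EXTENSION and looksLikeLink are hypothetical, and the extension is assumed to consist of letters or digits and to end at the end of the string or before a /, ? or #.

```javascript
// A dot, then 1 to 4 letters or digits, then the end of the string or one of / ? #
const SHORT_EXTENSION = /\.[a-z0-9]{1,4}(?=$|[\/?#])/i;

// Hypothetical helper: rejects fragments and javascript: hrefs, then applies the pattern.
function looksLikeLink(href) {
  const value = href.trim();
  if (value === "" || value.startsWith("#") || /^javascript:/i.test(value)) {
    return false;
  }
  return SHORT_EXTENSION.test(value);
}
```

Values such as #, function() and javascript:void(0) are filtered out, while index.php or http://example.com/page.html pass.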
I've been working with the Sphider search engine for an internal website, we need to be able to quickly search for contact details in exported .htm(l) files.
$fulltxt = ereg_replace("[_A-Za-z0-9-]+(\.[_A-Za-z0-9-]+)*#[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,3})", "\\0", $fulltxt);
I am replacing e-mail addresses with a convenient mailto: link so users can open Outlook straight from the search results.
However,
while (preg_match("/[^\>](".$change.")[^\<]/i", " ".$fulltxt." ", $regs)) {
$fulltxt = preg_replace("/".$regs[1]."/i", "<b>".$regs[1]."</b>", $fulltxt);
}
It replaces all matches in the search results with bold tags, which results in the tags being included in Outlook's 'To...' field. It looks something like this in HTML (thanks Yuriy):
<b>name</b>.surname#domain
I have tried adding a value to the 'limit' parameter:
while (preg_match("/[^\>](".$change.")[^\<]/i", " ".$fulltxt." ", $regs)) {
$fulltxt = preg_replace("/".$regs[1]."/i", "<b>".$regs[1]."</b>", $fulltxt, 1);
}
Supposedly this should solve my problem by replacing only the first occurrence (which is the name, as the pattern is name, phone number, e-mail and we always search by name). Instead, it only makes the search so slow that I get a timeout message from the server. I've been trying various solutions but have been out of luck.
Any ideas? Am I doing something wrong?
Thanks.
(*Original heavily edited).
Did I understand you right that something like this happens?
<b>email#domain</b>
Why don't you put the tags into the search results first, and only then apply the "mailto:" anchors to the e-mails? The added <b> tags would be easy to filter out in the pattern in that second step.
It seems Google's URLs are structured differently these days. So it is harder to extract the referring keyword from them. Here is an example:
http://www.google.co.uk/search?q=jquery+post+output+46&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a#pq=jquery+post+output+46&hl=en&cp=30&gs_id=1v&xhr=t&q=jquery+post+output+php+not+running&pf=p&sclient=psy-ab&client=firefox-a&hs=8N5&rls=org.mozilla:en-US%3Aofficial&source=hp&pbx=1&oq=jquery+post+output+php+not+run&aq=0w&aqi=q-w1&aql=&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=bdeb326aa44b07c5&biw=1280&bih=875
The search I performed was actually "jquery post output php not running", so the first 'q=' does not contain the full search. The second one does. I'd like to write a script that always extracts the last 'q=', but I'm not sure if Google's URLs always have the full search last. Has anyone had any experience with this?
You can accomplish this using parse_url() and parse_str(), where $str is the referrer string. parse_str() already URL-decodes the values, so no separate urldecode() call is needed:
$fragment = parse_url($str, PHP_URL_FRAGMENT);
parse_str($fragment, $arr);
$query = $arr['q']; // jquery post output php not running
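The same idea can be sketched in JavaScript with the standard URL and URLSearchParams classes. This version collects every q value from the query string and then from the fragment and returns the last one, so it also copes with referrers that have no fragment. The function name lastSearchQuery is hypothetical, and whether Google always puts the full search last remains an open assumption.

```javascript
// Hypothetical helper: returns the last "q" value in the URL, or null if there is none.
function lastSearchQuery(referrer) {
  const url = new URL(referrer);
  const fromQuery = url.searchParams.getAll("q");
  // url.hash includes the leading "#", which must be removed before parsing.
  const fromHash = new URLSearchParams(url.hash.replace(/^#/, "")).getAll("q");
  const all = fromQuery.concat(fromHash);
  return all.length ? all[all.length - 1] : null;
}
```

URLSearchParams decodes + and percent-escapes itself, so the returned value is already plain text.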