I want to extract only the domain name from a given URL, for any kind of TLD. The TLD can have any number of characters, or may be only two characters long.
<?php $domain_name = 'http://www.subdomain.domain.co.in' ;
$ParsedURL = parse_url($domain_name);
$domain_name = preg_replace("/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/", '$2', $ParsedURL['host']);
$domain_name = current(explode('.', $domain_name));
print_r($domain_name);
The solution above works only for a limited set of URLs. It fails when the TLD parts are three or more characters long, e.g. org.in.
I also get different results when the URL has no subdomain.
Please help me out, as I only want "domain" from the given URL.
Try this regex: /^\.{1}.*\.[a-zA-Z0-9]{2,3}(\.[a-zA-Z0-9]{2,3})?$/ The only extra step is to strip the leading '.' from the matched string: <?php substr($domain_name, 1); ?>
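One way to handle both cases from the question (suffixes like co.in or org.in, and URLs with no subdomain) is to split the host into labels and check the last two against a list of known multi-part suffixes. This is only a minimal sketch: extract_domain_label and its default suffix list are made up for illustration, and a real list would be far longer.

```php
<?php
// Hypothetical helper: returns the registrable label ("domain") of a URL.
// The default $multiPartSuffixes list is illustrative and incomplete.
function extract_domain_label(
    string $url,
    array $multiPartSuffixes = ['co.in', 'org.in', 'co.uk', 'com.mx']
): ?string {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component found
    }

    $labels = explode('.', strtolower($host));
    $count = count($labels);
    if ($count === 1) {
        return $labels[0]; // e.g. "localhost"
    }

    // Decide whether the suffix spans one label ("com") or two ("co.in").
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $suffixLength = in_array($lastTwo, $multiPartSuffixes, true) ? 2 : 1;

    // The label directly before the suffix is the domain.
    $index = $count - $suffixLength - 1;
    return $index >= 0 ? $labels[$index] : null;
}

echo extract_domain_label('http://www.subdomain.domain.co.in'), "\n";
echo extract_domain_label('http://domain.org.in'), "\n";
```

Because the suffix is looked up rather than guessed from its length, the result does not depend on how many characters each part of the TLD has.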
I'm working with some code used to try and find all the website URLs within a block of text. Right now we've already got checks that work fine for URLs formatted such as http://www.google.com or www.google.com but we're trying to find a regex that can locate a URL in a format such as just google.com
Right now our regex is set to search for every top-level domain that we could find registered, which is around 1400 in total, so it looks like this:
/(\S+\.(COM|NET|ORG|CA|EDU|UK|AU|FR|PR)\S+)/i
Except with ALL 1400 TLDs to check in the group (the full thing is around 8400 characters long). Naturally it's running quite slowly, and we've already had the idea to simply check for the 10 or so most commonly used TLDs, but I wanted to check here first to see if there was a more efficient way to check for this specific formatting of website URLs rather than singling every single one out.
You could use a double pass search.
Search for every url-like string, e.g.:
((http|https):\/\/)?([\w-]+\.)+[\S]{2,5}
On every result do some non-regex checks, like, is the length enough, is the text after the last dot part of your tld list, etc.
function isUrl($urlMatch) {
$tldList = ['com', 'net'];
$urlParts = explode(".", $urlMatch);
$lastPart = end($urlParts);
return in_array($lastPart, $tldList);
}
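Put together, the two passes might look like the sketch below. The name find_urls and the short TLD list are assumptions for illustration. The first-pass pattern is also a slightly tightened variant of the one above: it ends in letters only instead of [\S]{2,5}, so trailing punctuation is not captured.

```php
<?php
// Hypothetical helper combining both passes; $tldList is illustrative only.
function find_urls(string $text, array $tldList = ['com', 'net', 'org']): array
{
    // Pass 1: cheap regex that grabs anything shaped like host.tld,
    // with an optional http:// or https:// prefix.
    preg_match_all('~(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}~i', $text, $matches);

    $urls = [];
    foreach ($matches[0] as $candidate) {
        // Pass 2: non-regex check of the text after the last dot.
        $tld = strtolower(substr($candidate, strrpos($candidate, '.') + 1));
        if (in_array($tld, $tldList, true)) {
            $urls[] = $candidate;
        }
    }
    return $urls;
}

print_r(find_urls('Visit google.com or http://www.example.net, not file.txt today'));
```

With the full TLD list, flipping it once with array_flip and testing with isset would avoid a linear in_array scan per candidate.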
Example
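function get_host($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host component, e.g. a relative URL
    }
    $names = explode(".", $host);
    if (count($names) == 1) {
        return $names[0];
    }
    // Keep only the last two labels (does not handle suffixes like co.uk).
    $names = array_reverse($names);
    return $names[1] . '.' . $names[0];
}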
Usage
echo get_host('https://google.com'); // google.com
echo "\n";
echo get_host('https://www.google.com'); // google.com
echo "\n";
echo get_host('https://sub1.sub2.google.com'); // google.com
echo "\n";
echo get_host('http://localhost'); // localhost
I am new to PHP and hope someone can help me with this.
I want PHP to give me the name of the current page of my website.
The important thing is that I need this without any leading slashes and without any trailing extensions etc., just the plain page name.
Example:
The URL of a page is http://www.myurl.com/index.php?lang=en
In this case it should only return "index".
I found a way to get rid of the leading part using the following, but I have trouble removing the trailing part since it varies (it can be just .php, or .php?lang=en, or .php?lang=de, etc.).
$pageName = basename($_SERVER["REQUEST_URI"]);
The only thing I found is the following but this doesn't cover the variable extension part:
$pageName = basename($_SERVER["REQUEST_URI"], ".php");
Can someone tell me how to get rid of the trailing part as well ?
Many thanks in advance,
Mike
You can use parse_url in combination with pathinfo:
<?php
$input = 'http://www.myurl.com/index.php?lang=en';
$output = pathinfo(parse_url($input, PHP_URL_PATH), PATHINFO_FILENAME);
var_dump($output); // => index
demo: https://eval.in/382330
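The same idea can be applied to the current request instead of a fixed string. A minimal sketch, assuming the page name should come from $_SERVER["REQUEST_URI"] as in the question; page_name is a made-up helper name.

```php
<?php
// Hypothetical helper: returns the file name of a request URI without
// directory, extension, or query string.
function page_name(string $requestUri): string
{
    // parse_url drops the query string; it may return null or false
    // when there is no usable path.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // PATHINFO_FILENAME strips both the directory and the extension.
    return pathinfo($path, PATHINFO_FILENAME);
}

echo page_name('/index.php?lang=en'), "\n";
```

In a real page the call would be page_name($_SERVER["REQUEST_URI"]).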
One possible way is:
$url = "http://www.myurl.com/index.php?lang=en";
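// \w already includes the underscore; "[\w-_]" is rejected as an invalid range by PCRE2 (PHP 7.3+)
preg_match('/\/([\w-]+)\.php/i', $url, $match);
echo $match[1];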
If you need help with the regex look here:
https://regex101.com/r/cM8sS3/1
Here is the simplest solution.
$pagename = basename($_SERVER['PHP_SELF']);
$a = explode(".",$pagename);
echo $a[0];
A tutorial on how to do it
With an .htaccess file you can:
Redirect the user to a different page
Password protect a specific directory
Block users by IP
Prevent hot linking of your images
Rewrite URIs
Specify your own Error Documents
Try this
//return url
$pageName = base64_decode($_GET["return_url"]);
function Url($pageName) {
$pageName= strtolower($pageName);
$pageName= str_replace('.',' ',$pageName);
$pageName= preg_replace("/[^a-z0-9_\s-]/", "", $pageName);
$pageName= preg_replace("/[\s-]+/", " ", $pageName);
$pageName= preg_replace("/[\s_]/", "-", $pageName);
return $pageName ;
}
$cleanurl=Url($pageName);
echo $cleanurl;
This is a situation where I would just use a regular expression. Here's the code:
$pagename = basename("http://www.myurl.com/index.php?lang=en");
$pagename = preg_replace("/\..*/", "", $pagename);
You can see a working demo here: https://ideone.com/RdrHzc
The first argument is an expression that matches for a literal period followed by any number of characters. The second argument tells the function to replace the matched string with an empty string, and the last argument is the variable to operate on.
I need to get the Vine video ID from the URL,
so that for a link like this
https://vine.co/v/bXidIgMnIPJ
the output is
bXidIgMnIPJ
I tried to use code from another question here, written for Vimeo (NOT VINE):
Get img thumbnails from Vimeo?
This is what I tried, but it did not work:
$url = 'https://vine.co/v/bXidIgMnIPJ';
preg_replace('~^https://(?:www\.)?vine\.co/(?:clip:)?(\d+)~','$1',$url)
basename maybe?
<?php
$url = 'https://vine.co/v/bXidIgMnIPJ';
var_dump(basename($url));
http://codepad.org/vZiFP27y
Assuming it will always be in that format, you can just split the url by the / delimiter. Regex is not needed for a simple url such as this.
// end() takes its argument by reference, so store the array in a variable first
$parts = explode('/', $url);
$id = end($parts);
Since the question asks about preg_replace, here is a solution using it:
$s = 'https://vine.co/v/bXidIgMnIPJ';
$new_s = preg_replace('/^.*\//','',$s);
echo $new_s;
// => bXidIgMnIPJ
or if you need to validate that an input string is indeed a link to vine.co :
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co.*\//','',$s);
I don't know whether the /v/ part is always present, or whether it is always v. If it is, it can also be added to the regex for stricter validation:
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co\/v\//','',$s);
Here's what I am using:
function getVineId($url) {
    // The dot is escaped so it only matches a literal "."
    preg_match("#(?<=vine\.co/v/)[0-9A-Za-z]+#", $url, $matches);
if (isset($matches[0])) {
return $matches[0];
}
return false;
}
I used a look-behind to ensure "vine.co/v/" always precedes the ID, while ignoring if the url is HTTP or HTTPS (or if it lacks a protocol altogether). It assumes the ID is alphanumeric, of any length. It will ignore any characters or parameters after the id (like Google campaign tracking parameters, etc).
I used the "#" delimiter so I wouldn't have to escape the forward slashes (/), for a cleaner look.
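For comparison, the same check can be sketched without a regex by using parse_url. The assumptions here are that the ID is always the path segment right after /v/ and that only vine.co and www.vine.co count as valid hosts; vine_id_from_url is a made-up name. Unlike the look-behind version, this one needs a scheme in the URL, because parse_url does not report a host without one.

```php
<?php
// Hypothetical alternative: returns the Vine ID, or false if the URL
// does not look like https://vine.co/v/<id>.
function vine_id_from_url(string $url)
{
    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($host) || !is_string($path)) {
        return false; // malformed URL, or no scheme/host/path present
    }
    if (!in_array(strtolower($host), ['vine.co', 'www.vine.co'], true)) {
        return false; // some other site
    }

    // "/v/bXidIgMnIPJ" becomes ['v', 'bXidIgMnIPJ'].
    $segments = explode('/', trim($path, '/'));
    if (count($segments) < 2 || $segments[0] !== 'v' || !ctype_alnum($segments[1])) {
        return false;
    }
    return $segments[1];
}

var_dump(vine_id_from_url('https://vine.co/v/bXidIgMnIPJ?utm_source=newsletter'));
```

Query parameters are ignored automatically because only the path component is inspected.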
Explode the string on '/'; the last element is what you are looking for :) Code:
$vars = explode("/",$url);
echo $vars[count($vars)-1];
$url = 'https://vine.co/v/b2PFre2auF5';
$regex = '/^http(?:s?):\/\/(?:www\.)?vine\.co\/v\/([a-zA-Z0-9]{1,13})$/';
preg_match($regex,$url,$m);
print_r($m);
$m[1] contains b2PFre2auF5
I have this code right here:
// get host name from URL
preg_match('#^(?:http://)?([^/]+)#i',
"http://www.joomla.subdomain.php.net/index.html", $matches);
$host = $matches[1];
// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
The output will be php.net
I need just php without .net
Although regexes are fine here, I'd recommend parse_url
$host = parse_url('http://www.joomla.subdomain.php.net/index.html', PHP_URL_HOST);
$domains = explode('.', $host);
echo $domains[count($domains)-2];
This will work for TLDs like .com, .org, .net, etc., but not for .co.uk or .com.mx. You'd need some more logic (most likely an array of TLDs) to parse those out.
Group the first part of your 2nd regex into /([^.]+)\.[^.]+$/ and $matches[1] will be php
Late answer and it doesn't work with subdomains, but it does work with any tld (co.uk, com.de, etc):
$domain = "somesite.co.uk";
$domain_solo = explode(".", $domain)[0];
print($domain_solo);
It's really easy:
function get_tld($domain) {
    // Remove a leading http:// or https:// and a leading "www." only at the start of the string
    $domain = preg_replace('#^(?:https?://)?(?:www\.)?#i', '', $domain);
    $nd = explode(".", $domain);
    $domain_name = $nd[0];
    // Everything after the first label is treated as the TLD
    $tld = substr($domain, strlen($domain_name) + 1);
    return $tld;
}
To get the domain name instead, simply return $domain_name. This works only when the host has no subdomain; with a subdomain you will get the subdomain name.
I want to work with email addresses and split them using PHP's explode function.
It's fine to separate the user from the domain (the host) like this:
list($user, $domain) = explode('@', $email);
But when trying to split the domain into domain_name and domain_extension, I realised that when exploding on "." it will not always be foo.bar; it can sometimes be foo.ba.ar, like fooooo.co.uk.
So how do I separate "fooooo.co" from "uk", keeping the co with the fooooo, so that I finally get the TLD separated from the other part?
I know that co.uk is supposed to be treated as the TLD, but it's not official, like fooooo.nat.tn or fooooo.gov.tn.
Thank You.
Just use strripos() to find the last occurrence of ".":
$blah = "hello.co.uk";
$i = strripos($blah, ".");
echo "name = " . substr($blah, 0, $i) . "\n";
echo "TLD = " . substr($blah, $i + 1) . "\n";
Better use imap_rfc822_parse_adrlist or mailparse_rfc822_parse_addresses to parse the email address if available. And for removing the “public suffix” from the domain name, see my answer to Remove domain extension.
Expanding on Oli's answer...
substr($address, (strripos($address, '.') + 1));
Will give the TLD without the '.'. Lose the +1 and you get the dot, too.
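$parts = explode('.', $email); end($parts); will give you the TLD (end() needs a variable, not a function result, because it takes its argument by reference). To get the domain name without that, you can do any number of other string manipulation tricks, such as subtracting off that length.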
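The pieces from the answers above can be combined into one small function. A minimal sketch, assuming the address contains an "@" and that only the part after the last "." should count as the TLD, as the question asks; split_email is a made-up name.

```php
<?php
// Hypothetical helper: splits an address into user, domain name, and TLD,
// where the TLD is only the part after the last dot (so "uk", not "co.uk").
function split_email(string $email): ?array
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return null; // not an email address
    }
    $user = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    $dot = strrpos($domain, '.');
    if ($dot === false) {
        return null; // domain has no dot, so there is no TLD to split off
    }
    return [
        'user' => $user,
        'name' => substr($domain, 0, $dot),
        'tld'  => substr($domain, $dot + 1),
    ];
}

print_r(split_email('someone@fooooo.co.uk'));
```

For someone@fooooo.co.uk this keeps "fooooo.co" together and returns "uk" on its own.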