Regular expression to get the main domain of a URL

I have never used regex before and I was wondering how to write a regular expression in PHP that gets the domain of the URL. For example:
http://www.hegnar.no/bors/article488276.ece --> hegnar.no

You don't need a regular expression for this task.
Check PHP's built-in function parse_url():
http://php.net/manual/en/function.parse-url.php

Just use parse_url() if you are specifically dealing with URLs.
For example:
$url = "http://www.hegnar.no/bors/article488276.ece";
$url_u_want = parse_url($url, PHP_URL_HOST);
To strip the leading www. from the result, use:
$url_u_want = preg_replace("/^www\./", "", $url_u_want);
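The two steps above can be combined into one helper. This is a minimal sketch; the function name host_without_www is hypothetical (it is not part of PHP), and it assumes the input carries a scheme such as http://, because parse_url() does not report a host for a bare "example.com/path".

```php
<?php
// Hypothetical helper: returns the host of a URL with a leading "www." removed,
// or null when parse_url() cannot find a host.
function host_without_www(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // The pattern is anchored, so only a leading "www." is stripped.
    return preg_replace('/^www\./i', '', $host);
}

echo host_without_www('http://www.hegnar.no/bors/article488276.ece'), "\n"; // hegnar.no
```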

$page = "http://google.no/page/page_1.html";
preg_match_all("/((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])/", $page, $result, PREG_PATTERN_ORDER);
print_r($result);

$host = parse_url($url, PHP_URL_HOST);
$host = array_reverse(explode('.', $host));
$host = $host[1].'.'.$host[0];
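Taking the last two labels works for www.hegnar.no, but it would return co.uk for a host such as www.example.co.uk. One possible workaround is sketched below, assuming a small hand-maintained list of two-level suffixes. Both the list and the function name registrable_domain are illustrative only; a complete solution would consult the Public Suffix List.

```php
<?php
// Hypothetical helper: keeps the last two labels of a host, or the last three
// when the host ends in one of the listed two-level suffixes.
function registrable_domain(string $host, array $twoLevelSuffixes = ['co.uk', 'co.in', 'com.br']): string
{
    $labels = explode('.', strtolower($host));
    if (count($labels) <= 2) {
        return implode('.', $labels);
    }
    $lastTwo = implode('.', array_slice($labels, -2));
    $keep = in_array($lastTwo, $twoLevelSuffixes, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

echo registrable_domain('www.hegnar.no'), "\n";     // hegnar.no
echo registrable_domain('www.example.co.uk'), "\n"; // example.co.uk
```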

See also "PHP Regex for extracting subdomains of arbitrary domains" and "Javascript/Regex for finding just the root domain name without sub domains".

One problem with relying on parse_url() alone: when the URL has no .com, .net or similar extension, it still returns a host. For the URL below the result is bannedadsense, which looks like success even though bannedadsense is not a domain.
$url = 'http://bannedadsense/isbanned'; // the preg_match below rejects this host
//$url = 'http://bannedadsense.com/isbanned'; // the preg_match below accepts this host
$domain = parse_url($url, PHP_URL_HOST);
// $domain is "bannedadsense", as if it were a valid domain.
So we also need to check that the host ends in a dot extension (.com, .net, .org, etc.):
if(preg_match("/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$/i",$domain)) {
echo $domain;
}else{
echo "<br>";
echo "false";
}

Related

Check if URL has content in PHP

I've a PHP file. In this file I need to check, if my URL has the following ending:
www.example.de/dashboard/2/
So the ending can be a number 1 - 99+ which is always at the end of the url between two slashes. I can't use $_GET here. If it is $_GET, it would be easy:
if ( isset($_GET['ending']) ) :
So how can I do this without a parameter in the URL? Thanks for your help!

if (preg_match('~^/dashboard/(\d+)~', $_SERVER['REQUEST_URI'])) {
foo();
}
This applies a regular expression to the request URI. Note that PCRE patterns in PHP must be wrapped in delimiters (here ~).
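To read the number as well, pass a third argument to preg_match() so the capture group is collected. This is a sketch that assumes the path has the form /dashboard/<number>/ and uses a hypothetical function name, dashboard_id. The query string is stripped first, because REQUEST_URI includes it.

```php
<?php
// Hypothetical helper: returns the numeric ending of /dashboard/<number>/ or null.
function dashboard_id(string $requestUri): ?int
{
    // Drop any query string so "/dashboard/2/?x=1" is handled too.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    if (preg_match('~^/dashboard/(\d+)/?$~', $path, $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}

// In a web request you would pass $_SERVER['REQUEST_URI']; a literal is used here.
var_dump(dashboard_id('/dashboard/2/')); // int(2)
```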

You can make use of parse_url and explode:
$url = 'http://www.example.de/dashboard/2/';
$path = parse_url($url, PHP_URL_PATH); // '/dashboard/2/'
$parts = explode('/', $path); // ['', 'dashboard', '2', '']
$section = $parts[1]; // 'dashboard'
$ending = $parts[2]; // '2'
Demo: https://3v4l.org/dv6Cn
You can also make use of URL rewriting (this is for an Apache-based web server, but you can find similar resources for nginx or any other web server if need be).

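A more dynamic way is to explode the URL, use array_filter to remove empty values, then pick the last item.
explode returns strings, so is_int can't be used to test whether the item is a number.
Comparing $ending*1 with $ending is unreliable: on PHP 7 a non-numeric string such as "dashboard" is converted to 0 and 0 == "dashboard" is true, and on PHP 8 the multiplication throws a TypeError. is_numeric() is used here instead.
$url = "http://www.example.de/dashboard/2/";
$parts = array_filter(explode("/", $url));
$ending = end($parts);
if (is_numeric($ending)) echo $ending; // 2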
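A slightly stricter variant of the same idea is sketched below, assuming that only the path should be inspected and that the ending must consist of digits only. The function name last_numeric_segment is hypothetical.

```php
<?php
// Hypothetical helper: returns the last non-empty path segment if it is all digits.
function last_numeric_segment(string $url): ?string
{
    // Casting covers the null/false results parse_url() gives when there is no path.
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Keep every non-empty segment; unlike a bare array_filter(), this keeps "0".
    $segments = array_values(array_filter(
        explode('/', $path),
        static function (string $segment): bool {
            return $segment !== '';
        }
    ));
    if ($segments === []) {
        return null;
    }
    $last = $segments[count($segments) - 1];
    // ctype_digit() accepts only strings made up entirely of the digits 0-9.
    return ctype_digit($last) ? $last : null;
}

var_dump(last_numeric_segment('http://www.example.de/dashboard/2/')); // string(1) "2"
```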

First, route this URL to your script in the web server config. For nginx and index.php (named locations in nginx start with @):
try_files $uri @rewrite_location;
location @rewrite_location {
rewrite ^/(.*) /index.php?link=$1&$args last;
}
Second, parse the URI. $end will hold the value you want:
$link_as_array = array_values(array_diff(explode("/", $url), array('')));
$max = count($link_as_array) - 1;
$end = $link_as_array[$max];

I would think this way. If the URL is always the same, or the same format, I'll do the following:
Check for the approx URL.
Split the URL into pieces.
Find if there's the part I am looking for.
Extract the number.
<?php
$url = "http://www.example.de/dashboard/2/";
if (strpos($url, "www.example.de/dashboard") === 7 or strpos($url, "www.example.de/dashboard") === 8) {
$urlParts = explode("/", $url);
if (isset($urlParts[4]) && is_numeric($urlParts[4]))
echo "Yes! It is {$urlParts[4]}.";
}
?>
The strpos with 7 and 8 is for URL with http:// or https://.
The above will give you the output as the numeric part if it is set. I hope this works out.

Get domain name without www and .com in PHP

I need to get the domain name from a URL, excluding "www" and ".com", ".co.uk", or any other suffix.
For example, I have URLs like these:
http://www.example.com
http://www.example.co.uk
http://subdomain.example.com
http://subdomain.example.co.uk
The suffix could be anything: ".com", ".org", ".co.in", ".co.uk".

I tried this and it works for me.
$original_url="http://subdomain.example.co.uk"; //try with all urls above
$pieces = parse_url($original_url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
echo strstr( $regs['domain'], '.', true );
}
Output: example
I got this from "Get domain name from full URL".

(?:https?:\/\/)?(?:www\.)?(.*)\.(?=[\w.]{3,4})
Try this. See the demo and grab the capture group.
http://regex101.com/r/bW3aR1/2

You should use the PHP function parse_url() in combination with a str_replace() or regex, or maybe even an explode. It depends on a few things:
Things to note:
Will there always be a subdomain?
Will there be a specific list of allowed subdomains?
I would do something like this:
<?php
$url = 'http://www.something.com';
$parts = explode('.', parse_url($url, PHP_URL_HOST));
echo $parts[1]; // "something"
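Indexing $parts[1] assumes the host always starts with exactly one subdomain label; for http://something.com it would return com. A sketch that counts from the right instead is shown below. The function name domain_label and the short, incomplete list of two-part suffixes are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the label directly left of the suffix,
// e.g. "example" for www.example.co.uk, or null if there is none.
function domain_label(string $url, array $twoPartSuffixes = ['co.uk', 'co.in']): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $labels = explode('.', strtolower($host));
    // Treat the suffix as two labels only when it is in the illustrative list.
    $lastTwo = implode('.', array_slice($labels, -2));
    $suffixLength = in_array($lastTwo, $twoPartSuffixes, true) ? 2 : 1;
    $index = count($labels) - $suffixLength - 1;
    return $index >= 0 ? $labels[$index] : null;
}

echo domain_label('http://subdomain.example.co.uk'), "\n"; // example
```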

Smarty how to parse URL in template file

How would I translate this PHP statement: $domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST)); to a smarty function such as:{$url|str_ireplace:'something':'etc'}
I want to print $domain in this case. $url is a smarty variable that is set for a certain URL. How do I do this?

You can pipe multiple modifiers, to first extract the host and then strip the www.:
{$url|parse_url:$smarty.const.PHP_URL_HOST|replace:'www.':''}
So for:
$url = 'http://www.example.com/foo/bar.html';
It prints:
example.com

Preg_replace domain problem

I'm stuck trying to get the domain using preg_replace.
I have a list of URLs:
download.adwarebot.com/setup.exe
athena.vistapages.com/suspended.page/
prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail
freeserials.spb.ru/key/68703.htm
What I want is:
adwarebot.com
vistapages.com
prosearchs.com
spb.ru
Can anybody help me with preg_replace?
I'm using http://gskinner.com/RegExr/ for testing :)

Using preg_replace, if the number of TLDs is limited:
$urls = array( 'download.adwarebot.com/setup.exe',
'athena.vistapages.com/suspended.page/',
'prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail',
'freeserials.spb.ru/key/68703.htm' );
$domains = preg_replace('~^(?:[^/]*\.)?([^./]+\.(?:com|ru))/.*$~', '$1', $urls);
The capture group matches the run of non-period characters directly before .com or .ru (so subdomains are left out). The rest of the pattern consumes the subdomain and the path, so only the captured domain remains after the replacement.
You could, however, use PHP's built-in parse_url function to get the host (including the subdomain), then use another regex, substr or array manipulation to get rid of the subdomain:
$host = parse_url('http://download.adwarebot.com/setup.exe', PHP_URL_HOST);
if(count($parts = explode('.', $host)) > 2)
$host = implode('.', array_slice($parts, -2));
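parse_url() only reports a host when the URL has a scheme, so the scheme-less entries from the question need a prefix first. The sketch below applies the approach above to the whole list. The helper name last_two_labels is hypothetical, and prepending http:// is an assumption about how bare entries should be read.

```php
<?php
// Hypothetical helper: returns the last two labels of the host in a URL-like entry.
function last_two_labels(string $entry): ?string
{
    // Add a scheme when none is present so parse_url() can identify the host.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $entry)) {
        $entry = 'http://' . $entry;
    }
    $host = parse_url($entry, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    return implode('.', array_slice(explode('.', $host), -2));
}

$urls = array(
    'download.adwarebot.com/setup.exe',
    'athena.vistapages.com/suspended.page/',
    'prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail',
    'freeserials.spb.ru/key/68703.htm',
);
print_r(array_map('last_two_labels', $urls));
```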

The following code assumes that every entry sits at the beginning of a line in $list (the m modifier lets ^ match at each line start):
preg_match_all('#^([\w]*\.)?([\w]*\.[\w]*)/#m', $list, $m);
// var_dump($m[2]);
P.S. But the correct answer is still parse_url.

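Why use a complicated regular expression? Of course it is possible, but something like this:
$domains = array();
foreach ($url_list as $url) {
$url_parts = explode('/', $url);
// Strip the first label only when another dot follows, so "prosearchs.com" stays intact.
$domains[] = preg_replace('~^[^.]+\.(?=[^.]+\.)~', '', $url_parts[0]);
}
$domains = array_unique($domains);
will do just fine.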

Maybe a more generic solution, tested with grep (I don't have a PHP environment, sorry):
kent$ echo "download.adwarebot.com/setup.exe
dquote> athena.vistapages.com/suspended.page/
dquote> prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail
dquote> freeserials.spb.ru/key/68703.htm"|grep -Po '(?<!/)([^\./]+\.[^\./]+)(?=/.+)'
output:
adwarebot.com
vistapages.com
prosearchs.com
spb.ru

Remove http from variable

I have a variable such as this:
$domain = "http://test.com";
I need to use preg_replace or str_replace to get the variable like this:
$domain = "test.com";
I have tried the following, but neither works:
1) $domain = preg_replace('; ((ftp|https?)://|www3?\.).+? ;', ' ', $domain);
2) $domain = preg_replace(';\b((ftp|https?)://|www3?\.).+?\b;', ' ', $domain);
Any suggestions?

Or you can use parse_url:
parse_url($domain, PHP_URL_HOST);

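// Caution: ltrim($domain, "http://") treats its second argument as a set of characters,
// so "http://test.com" would become "est.com". Strip the exact prefix instead:
$domain = preg_replace('~^http://~', '', $domain);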

Did you try str_replace?
$domain = "http://test.com";
$domain = str_replace('http://', '', $domain);
Your regular expressions probably don't find a match for the pattern.

$domain = preg_replace('~(https://|http://|ftp://)~', '', $domain);

preg_match('~^[a-z]+://(.+)$~', $domain, $matches);
echo($matches[1]);
This should be what you are looking for: it gives you everything after the protocol, so http://domain.com/test becomes "domain.com/test". However, it accepts any protocol. If you only want to support specific protocols such as HTTP and FTP, use this instead (the remainder is then in $matches[2]):
preg_match('~^(http|ftp)://(.+)$~', $domain, $matches);
If you only want the domain though, or similar parts of the URI, I'd recommend PHP's parse_url() instead. It does all the hard work for you and does it the proper way. Depending on your needs, I would probably recommend you use it anyway and just put it all back together instead.
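The "put it all back together" route could look like the sketch below. The function name strip_scheme is hypothetical, and the sketch assumes that user and password components, if present, should be dropped.

```php
<?php
// Hypothetical helper: rebuilds a URL from its parse_url() parts, without the scheme.
function strip_scheme(string $url): string
{
    $parts = parse_url($url);
    // Without a recognisable host there is nothing to rebuild; return the input as is.
    if ($parts === false || !isset($parts['host'])) {
        return $url;
    }
    $result = $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

echo strip_scheme('http://test.com'), "\n";        // test.com
echo strip_scheme('http://domain.com/test'), "\n"; // domain.com/test
```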

simple regex:
preg_replace('~^(?:f|ht)tps?://~i','', 'https://www.site.com.br');
