I have to add a GET variable to a url. But the URL might already have GET variables. What's the most efficient way to add this new variable?
Example URLs:
http://domain.com/
http://domain.com/index.html?name=jones
I need to add: tag=xyz:
http://domain.com/?tag=xyz
http://domain.com/index.html?name=jones&tag=xyz
What's the most efficient way to know whether to prepend my string with a ? or &?
Here's a version of the function I have so far:
// where arrAdditions looks like array('key1'=>'value1','key2'=>'value2');
function appendUrlQueryString($url, $arrAdditions) {
$arrQueryStrings = array();
foreach ($arrAdditions as $k=>$v) {
$arrQueryStrings[] = $k . '=' . $v;
}
$strAppend = implode('&',$arrQueryStrings);
if (strpos($url, '?') !== false) {
$url .= '&' . $strAppend;
} else {
$url = '?' . $strAppend;
}
return $url;
}
But, is simply looking for the ? in the existing url problematic? For example, is it possible that a url includes a ? but not actual queries, perhaps as an escaped character?
Take a look at PHP PECL's http_build_url. Said by the doc page:
Build a URL.
The parts of the second URL will be merged into the first according to the flags argument.
Addition:
If you don't have PECL installed, we can jump through some hoops. This approach is somewhat solid right up until we try to rebuild the new URL. Stock PHP (minus PECL) doesn't have a reverse of parse_url(). Making it harder, parse_url() removes some of the grammar from a URL in the resulting parts array so we have to put them back in when we reassemble. http_build_url() can take care of this for us, but if it were available, you wouldn't be reading this portion as it's what I originally recommended. Anyway, here's that code:
<?php
/**
* addQueryParam - given a URL and some new params for its query string, return the modified URL
*
* #see http://us1.php.net/parse_url
* #see http://us1.php.net/parse_str
* #throws Exception on bad input
* #param STRING $url A parseable URL to add query params to
* #param MIXED $input_query_vars - STRING of & separated pairs of = separated key values OR ASSOCIATIVE ARRAY of STRING keys => STRING values
* #return STRING new URL
*/
function addQueryParam ($url, $input_query_vars) {
// Parse new parameters
if (is_string($input_query_vars)) {
parse_str($input_query_vars, $input_query_vars);
}
// Ensure array of parameters now available
if (!is_array($input_query_vars)) {
throw new Exception(__FUNCTION__ . ' expects associative array or query string as second parameter.');
}
// Break up given URL
$url_parts = parse_url($url);
// Test for proper URL parse
if (!is_array($url_parts)) {
throw new Exception(__FUNCTION__ . ' expects parseable URL as first parameter');
}
// Produce array of original query vars
$original_query_vars = array();
if (isset($url_parts['query']) && $url_parts['query'] !== '') {
parse_str($url_parts['query'], $original_query_vars);
}
// Merge new params inot original
$new_query_vars = array_merge($original_query_vars, $input_query_vars);
// replace the original query string
$url_parts['query'] = http_build_query($new_query_vars);
// Put URL grammar back in place
if (!empty($url_parts['scheme'])) {
$url_parts['scheme'] .= '://';
}
if (!empty($url_parts['query'])) {
$url_parts['query'] = '?' . $url_parts['query'];
}
if (!empty($url_parts['fragment'])) {
$url_parts['fragment'] = '#' . $url_parts['fragment'];
}
// Put it all back together and return it
return implode('', $url_parts);
}
// Your demo URLs
$url1 = 'http://domain.com/';
$url2 = 'http://domain.com/index.html?name=jones';
//Some usage (I did this from CLI)
echo $url1, "\n";
echo addQueryParam($url1, 'tag=xyz'), "\n";
echo addQueryParam($url1, array('tag' => 'xyz')), "\n";
echo $url2, "\n";
echo addQueryParam($url2, 'tag=xyz'), "\n";
echo addQueryParam($url2, array('tag' => 'xyz')), "\n";
echo addQueryParam($url2, array('name' => 'foo', 'tag' => 'xyz')), "\n";
To check if parameter already exists you could try parse_str().
This will parse your URL and put variables into an array.
It will give you some issues if you will pass full URL:
$url = "http://domain.com/index.html?name=jones&tag=xyz";
parse_str($url', $arr);
print_r($arr);
will give you
Array ( [http://domain_com/index_html?name] => jones [tag] => xyz )
but with
$url = "name=jones&tag=xyz";
you will get
Array ( [name] => jones [tag] => xyz )
You could explode full URL by '?' and check the second part. After the check you could know how to modify your URL. But I'm not sure this would work all the time.
$url_one = "http://www.stackoverflow.com?action=submit&id=example";
$new_params = "user=john&pass=123";
$final_url = $url_one."&".$new_param
s;
Now $final_url has old url and new params added to it.
I need to write a function to parse variables which contain domain names. It's best I explain this with an example, the variable could contain any of these things:
here.example.com
example.com
example.org
here.example.org
But when passed through my function all of these must return either example.com or example.co.uk, the root domain name basically. I'm sure I've done this before but I've been searching Google for about 20 minutes and can't find anything. Any help would be appreciated.
EDIT: Ignore the .co.uk, presume that all domains going through this function have a 3 letter TLD.
Stackoverflow Question Archive:
How to get domain name from url?
Check if domain equals value?
How do I get the base url?
print get_domain("http://somedomain.co.uk"); // outputs 'somedomain.co.uk'
function get_domain($url)
{
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}
return false;
}
If you want a fast simple solution, without external calls and checking against predefined arrays. Works for new domains like "www.domain.gallery" also, unlike the most popular answer.
function get_domain($host){
$myhost = strtolower(trim($host));
$count = substr_count($myhost, '.');
if($count === 2){
if(strlen(explode('.', $myhost)[1]) > 3) $myhost = explode('.', $myhost, 2)[1];
} else if($count > 2){
$myhost = get_domain(explode('.', $myhost, 2)[1]);
}
return $myhost;
}
domain.com -> domain.com
sub.domain.com -> domain.com
www.domain.com -> domain.com
www.sub.sub.domain.com -> domain.com
domain.co.uk -> domain.co.uk
sub.domain.co.uk -> domain.co.uk
www.domain.co.uk -> domain.co.uk
www.sub.sub.domain.co.uk -> domain.co.uk
domain.photography -> domain.photography
www.domain.photography -> domain.photography
www.sub.domain.photography -> domain.photography
I would do something like the following:
// hierarchical array of top level domains
$tlds = array(
'com' => true,
'uk' => array(
'co' => true,
// …
),
// …
);
$domain = 'here.example.co.uk';
// split domain
$parts = explode('.', $domain);
$tmp = $tlds;
// travers the tree in reverse order, from right to left
foreach (array_reverse($parts) as $key => $part) {
if (isset($tmp[$part])) {
$tmp = $tmp[$part];
} else {
break;
}
}
// build the result
var_dump(implode('.', array_slice($parts, - $key - 1)));
I ended up using the database Mozilla has.
Here's my code:
fetch_mozilla_tlds.php contains caching algorhythm. This line is important:
$mozillaTlds = file('http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1');
The main file used inside the application is this:
function isTopLevelDomain($domain)
{
$domainParts = explode('.', $domain);
if (count($domainParts) == 1) {
return false;
}
$previousDomainParts = $domainParts;
array_shift($previousDomainParts);
$tld = implode('.', $previousDomainParts);
return isDomainExtension($tld);
}
function isDomainExtension($domain)
{
$tlds = getTLDs();
/**
* direct hit
*/
if (in_array($domain, $tlds)) {
return true;
}
if (in_array('!'. $domain, $tlds)) {
return false;
}
$domainParts = explode('.', $domain);
if (count($domainParts) == 1) {
return false;
}
$previousDomainParts = $domainParts;
array_shift($previousDomainParts);
array_unshift($previousDomainParts, '*');
$wildcardDomain = implode('.', $previousDomainParts);
return in_array($wildcardDomain, $tlds);
}
function getTLDs()
{
static $mozillaTlds = array();
if (empty($mozillaTlds)) {
require 'fetch_mozilla_tlds.php';
/* #var $mozillaTlds array */
}
return $mozillaTlds;
}
UPDATE:
The database has evolved and is now available at its own website - http://publicsuffix.org/
Almost certainly, what you're looking for is this:
https://github.com/Synchro/regdom-php
It's a PHP library that utilizes the (as nearly as is practical) full list of various TLD's that's collected at publicsuffix.org/list/ , and wraps it up in a spiffy little function.
Once the library is included, it's as easy as:
$registeredDomain = getRegisteredDomain( $domain );
$full_domain = $_SERVER['SERVER_NAME'];
$just_domain = preg_replace("/^(.*\.)?([^.]*\..*)$/", "$2", $_SERVER['HTTP_HOST']);
There are two ways to extract subdomain from a host:
The first method that is more accurate is to use a database of tlds (like public_suffix_list.dat) and match domain with it. This is a little heavy in some cases. There are some PHP classes for using it like php-domain-parser and TLDExtract.
The second way is not as accurate as the first one, but is very fast and it can give the correct answer in many case, I wrote this function for it:
function get_domaininfo($url) {
// regex can be replaced with parse_url
preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/" , $matches);
$parts = explode(".", $matches[2]);
$tld = array_pop($parts);
$host = array_pop($parts);
if ( strlen($tld) == 2 && strlen($host) <= 3 ) {
$tld = "$host.$tld";
$host = array_pop($parts);
}
return array(
'protocol' => $matches[1],
'subdomain' => implode(".", $parts),
'domain' => "$host.$tld",
'host'=>$host,'tld'=>$tld
);
}
Example:
print_r(get_domaininfo('https://mysubdomain.domain.co.uk/index.php'));
Returns:
Array
(
[protocol] => https
[subdomain] => mysubdomain
[domain] => domain.co.uk
[host] => domain
[tld] => co.uk
)
This is a short way of accomplishing that:
$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
Check this my simple solution!
function getHost($a){
$tld = preg_replace('/.*\.([a-zA-Z]+)$/','$1',$a);
return trim(preg_replace('/.*(([\.\/][a-zA-Z]{2,}){'.((substr_count($a, '.') <= 2 && mb_strlen( $tld) != 2) ? '2,3' : '3,4').'})/im','$1',$a),'./');
}
echo getHost('https://webmail.google.com.br')."<br>";
echo getHost('https://google.com.br')."<br>";
echo getHost('https://webmail.google.net.br')."<br>";
echo getHost('https://webmail.google.net')."<br>";
echo getHost('https://google.net')."<br>";
echo getHost('webmail.google.com.br')."<br>";
#output
google.com.br
google.com.br
google.net.br
google.net
google.net
google.com.br
As a variant to Jonathan Sampson
function get_domain($url) {
if ( !preg_match("/^http/", $url) )
$url = 'http://' . $url;
if ( $url[strlen($url)-1] != '/' )
$url .= '/';
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if ( preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs) ) {
$res = preg_replace('/^www\./', '', $regs['domain'] );
return $res;
}
return false;
}
This script generates a Perl file containing a single function, get_domain from the ETLD file. So say you have hostnames like img1, img2, img3, ... in .photobucket.com. For each of those get_domain $host would return photobucket.com. Note that this isn't the fastest function on earth, so in my main log parser that's using this, I keep a hash of host to domain mappings and only run this for hosts that aren't in the hash yet.
#!/bin/bash
cat << 'EOT' > suffixes.pl
#!/bin/perl
sub get_domain {
$_ = shift;
EOT
wget -O - http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 \
| iconv -c -f UTF-8 -t ASCII//TRANSLIT \
| egrep -v '/|^$' \
| sed -e 's/^\!//' -e "s/\"/'/g" \
| awk '{ print length($0),$0 | "sort -rn"}' | cut -d" " -f2- \
| while read SUFF; do
STAR=`echo $SUFF | cut -b1`
if [ "$STAR" = '*' ]; then
SUFF=`echo $SUFF | cut -b3-`
echo " return \"\$1\.\$2\.$SUFF\" if /([a-zA-Z0-9\-]+)\.([a-zA-Z0-9\-]+)\.$SUFF\$/;"
else
echo " return \"\$1\.$SUFF\" if /([a-zA-Z0-9\-]+)\.$SUFF\$/;"
fi
done >> suffixes.pl
cat << 'EOT' >> suffixes.pl
}
1;
EOT
This isn't foolproof and should only really be used if you know the domain isn't going to be anything obscure, but it's easier to read than most of the other options:
$justDomain = $_SERVER['SERVER_NAME'];
switch(substr_count($justDomain, '.')) {
case 1:
// 2 parts. Must not be a subdomain. Do nothing.
break;
case 2:
// 3 parts. Either a subdomain or a 2-part suffix
// If the 2nd part is over 3 chars's, assume it to be the main domain part which means we have a subdomain.
// This isn't foolproof, but should be ok for most domains.
// Something like domainname.parliament.nz would cause problems, though. As would www.abc.com
$parts = explode('.', $justDomain);
if(strlen($parts[1]) > 3) {
unset($parts[0]);
$justDomain = implode('.', $parts);
}
break;
default:
// 4+ parts. Must be a subdomain.
$parts = explode('.', $justDomain, 2);
$justDomain = $parts[1];
break;
}
// $justDomain should now exclude any subdomain part.
//For short domain like t.co (twitter) the function should be :
function get_domain($url)
{
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{0,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}
return false;
}
Regex could help you out there. Try something like this:
([^.]+(.com|.co.uk))$
I think your problem is that you haven't clearly defined what exactly you want the function to do. From your examples, you certainly don't want it to just blindly return the last two, or last three, components of the name, but just knowing what it shouldn't do isn't enough.
Here's my guess at what you really want: there are certain second-level domain names, like co.uk., that you'd like to be treated as a single TLD (top-level domain) for purposes of this function. In that case I'd suggest enumerating all such cases and putting them as keys into an associative array with dummy values, along with all the normal top-level domains like com., net., info., etc. Then whenever you get a new domain name, extract the last two components and see if the resulting string is in your array as a key. If not, extract just the last component and make sure that's in your array. (If even that isn't, it's not a valid domain name) Either way, whatever key you do find in the array, take that plus one more component off the end of the domain name, and you'll have your base domain.
You could, perhaps, make things a bit simpler by writing a function, instead of using an associative array, to tell whether the last two components should be treated as a single "effective TLD." The function would probably look at the next-to-last component and, if it's shorter than 3 characters, decide that it should be treated as part of the TLD.
To do it well, you'll need a list of the second level domains and top level domains and build an appropriate regular expression list. A good list of second level domains is available at https://wiki.mozilla.org/TLD_List. Another test case apart from the aforementioned CentralNic .uk.com variants is The Vatican: their website is technically at http://va : and that's a difficult one to match on!
Building on Jonathan's answer:
function main_domain($domain) {
if (preg_match('/([a-z0-9][a-z0-9\-]{1,63})\.([a-z]{3}|[a-z]{2}\.[a-z]{2})$/i', $domain, $regs)) {
return $regs;
}
return false;
}
His expression might be a bit better, but this interface seems more like what you're describing.
Ah - if you just want to handle three character top level domains - then this code works:
<?php
// let's test the code works: these should all return
// example.com , example.net or example.org
$domains=Array('here.example.com',
'example.com',
'example.org',
'here.example.org',
'example.com/ignorethis',
'example.net/',
'http://here.example.org/longtest?string=here');
foreach ($domains as $domain) {
testdomain($domain);
}
function testdomain($url) {
if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{3})(\/.*)?$/',$url,$matches)) {
print 'Domain is: '.$matches[3].'.'.$matches[4].'<br>'."\n";
} else {
print 'Domain not found in '.$url.'<br>'."\n";
}
}
?>
$matches[1]/$matches[2] will contain any subdomain and/or protocol, $matches[3] contains the domain name, $matches[4] the top level domain and $matches[5] contains any other URL path information.
To match most common top level domains you could try changing it to:
if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{2,6})(\/.*)?$/',$url,$matches)) {
Or to get it coping with everything:
if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.(co\.uk|me\.uk|org\.uk|com|org|net|int|eu)(\/.*)?$/',$url,$matches)) {
etc etc
Based on http://www.cafewebmaster.com/find-top-level-domain-international-urls-php
function find_tld($url){
$purl = parse_url($url);
$host = strtolower($purl['host']);
$valid_tlds = ".ab.ca .bc.ca .mb.ca .nb.ca .nf.ca .nl.ca .ns.ca .nt.ca .nu.ca .on.ca .pe.ca .qc.ca .sk.ca .yk.ca .com.cd .net.cd .org.cd .com.ch .net.ch .org.ch .gov.ch .co.ck .ac.cn .com.cn .edu.cn .gov.cn .net.cn .org.cn .ah.cn .bj.cn .cq.cn .fj.cn .gd.cn .gs.cn .gz.cn .gx.cn .ha.cn .hb.cn .he.cn .hi.cn .hl.cn .hn.cn .jl.cn .js.cn .jx.cn .ln.cn .nm.cn .nx.cn .qh.cn .sc.cn .sd.cn .sh.cn .sn.cn .sx.cn .tj.cn .xj.cn .xz.cn .yn.cn .zj.cn .com.co .edu.co .org.co .gov.co .mil.co .net.co .nom.co .com.cu .edu.cu .org.cu .net.cu .gov.cu .inf.cu .gov.cx .edu.do .gov.do .gob.do .com.do .org.do .sld.do .web.do .net.do .mil.do .art.do .com.dz .org.dz .net.dz .gov.dz .edu.dz .asso.dz .pol.dz .art.dz .com.ec .info.ec .net.ec .fin.ec .med.ec .pro.ec .org.ec .edu.ec .gov.ec .mil.ec .com.ee .org.ee .fie.ee .pri.ee .eun.eg .edu.eg .sci.eg .gov.eg .com.eg .org.eg .net.eg .mil.eg .com.es .nom.es .org.es .gob.es .edu.es .com.et .gov.et .org.et .edu.et .net.et .biz.et .name.et .info.et .co.fk .org.fk .gov.fk .ac.fk .nom.fk .net.fk .tm.fr .asso.fr .nom.fr .prd.fr .presse.fr .com.fr .gouv.fr .com.ge .edu.ge .gov.ge .org.ge .mil.ge .net.ge .pvt.ge .co.gg .net.gg .org.gg .com.gi .ltd.gi .gov.gi .mod.gi .edu.gi .org.gi .com.gn .ac.gn .gov.gn .org.gn .net.gn .com.gr .edu.gr .net.gr .org.gr .gov.gr .com.hk .edu.hk .gov.hk .idv.hk .net.hk .org.hk .com.hn .edu.hn .org.hn .net.hn .mil.hn .gob.hn .iz.hr .from.hr .name.hr .com.hr .com.ht .net.ht .firm.ht .shop.ht .info.ht .pro.ht .adult.ht .org.ht .art.ht .pol.ht .rel.ht .asso.ht .perso.ht .coop.ht .med.ht .edu.ht .gouv.ht .gov.ie .co.in .firm.in .net.in .org.in .gen.in .ind.in .nic.in .ac.in .edu.in .res.in .gov.in .mil.in .ac.ir .co.ir .gov.ir .net.ir .org.ir .sch.ir .gov.it .co.je .net.je .org.je .edu.jm .gov.jm .com.jm .net.jm .com.jo .org.jo .net.jo .edu.jo .gov.jo .mil.jo .co.kr .or.kr .com.kw .edu.kw .gov.kw .net.kw .org.kw .mil.kw .edu.ky .gov.ky .com.ky .org.ky .net.ky .org.kz .edu.kz .net.kz .gov.kz .mil.kz .com.kz .com.li .net.li .org.li .gov.li .gov.lk .sch.lk .net.lk .int.lk .com.lk .org.lk .edu.lk .ngo.lk .soc.lk .web.lk .ltd.lk .assn.lk .grp.lk .hotel.lk .com.lr .edu.lr .gov.lr .org.lr .net.lr .org.ls .co.ls .gov.lt .mil.lt .gov.lu .mil.lu .org.lu .net.lu .com.lv .edu.lv .gov.lv .org.lv .mil.lv .id.lv .net.lv .asn.lv .conf.lv .com.ly .net.ly .gov.ly .plc.ly .edu.ly .sch.ly .med.ly .org.ly .id.ly .co.ma .net.ma .gov.ma .org.ma .tm.mc .asso.mc .org.mg .nom.mg .gov.mg .prd.mg .tm.mg .com.mg .edu.mg .mil.mg .com.mk .org.mk .com.mo .net.mo .org.mo .edu.mo .gov.mo .org.mt .com.mt .gov.mt .edu.mt .net.mt .com.mu .co.mu .aero.mv .biz.mv .com.mv .coop.mv .edu.mv .gov.mv .info.mv .int.mv .mil.mv .museum.mv .name.mv .net.mv .org.mv .pro.mv .com.mx .net.mx .org.mx .edu.mx .gob.mx .com.my .net.my .org.my .gov.my .edu.my .mil.my .name.my .edu.ng .com.ng .gov.ng .org.ng .net.ng .gob.ni .com.ni .edu.ni .org.ni .nom.ni .net.ni .gov.nr .edu.nr .biz.nr .info.nr .com.nr .net.nr .ac.nz .co.nz .cri.nz .gen.nz .geek.nz .govt.nz .iwi.nz .maori.nz .mil.nz .net.nz .org.nz .school.nz .com.pf .org.pf .edu.pf .com.pg .net.pg .com.ph .gov.ph .com.pk .net.pk .edu.pk .org.pk .fam.pk .biz.pk .web.pk .gov.pk .gob.pk .gok.pk .gon.pk .gop.pk .gos.pk .com.pl .biz.pl .net.pl .art.pl .edu.pl .org.pl .ngo.pl .gov.pl .info.pl .mil.pl .waw.pl .warszawa.pl .wroc.pl .wroclaw.pl .krakow.pl .poznan.pl .lodz.pl .gda.pl .gdansk.pl .slupsk.pl .szczecin.pl .lublin.pl .bialystok.pl .olsztyn.pl .torun.pl .biz.pr .com.pr .edu.pr .gov.pr .info.pr .isla.pr .name.pr .net.pr .org.pr .pro.pr .edu.ps .gov.ps .sec.ps .plo.ps .com.ps .org.ps .net.ps .com.pt .edu.pt .gov.pt .int.pt .net.pt .nome.pt .org.pt .publ.pt .net.py .org.py .gov.py .edu.py .com.py .com.ru .net.ru .org.ru .pp.ru .msk.ru .int.ru .ac.ru .gov.rw .net.rw .edu.rw .ac.rw .com.rw .co.rw .int.rw .mil.rw .gouv.rw .com.sa .edu.sa .sch.sa .med.sa .gov.sa .net.sa .org.sa .pub.sa .com.sb .gov.sb .net.sb .edu.sb .com.sc .gov.sc .net.sc .org.sc .edu.sc .com.sd .net.sd .org.sd .edu.sd .med.sd .tv.sd .gov.sd .info.sd .org.se .pp.se .tm.se .parti.se .press.se .ab.se .c.se .d.se .e.se .f.se .g.se .h.se .i.se .k.se .m.se .n.se .o.se .s.se .t.se .u.se .w.se .x.se .y.se .z.se .ac.se .bd.se .com.sg .net.sg .org.sg .gov.sg .edu.sg .per.sg .idn.sg .edu.sv .com.sv .gob.sv .org.sv .red.sv .gov.sy .com.sy .net.sy .ac.th .co.th .in.th .go.th .mi.th .or.th .net.th .ac.tj .biz.tj .com.tj .co.tj .edu.tj .int.tj .name.tj .net.tj .org.tj .web.tj .gov.tj .go.tj .mil.tj .com.tn .intl.tn .gov.tn .org.tn .ind.tn .nat.tn .tourism.tn .info.tn .ens.tn .fin.tn .net.tn .gov.to .gov.tp .com.tr .info.tr .biz.tr .net.tr .org.tr .web.tr .gen.tr .av.tr .dr.tr .bbs.tr .name.tr .tel.tr .gov.tr .bel.tr .pol.tr .mil.tr .k12.tr .edu.tr .co.tt .com.tt .org.tt .net.tt .biz.tt .info.tt .pro.tt .name.tt .edu.tt .gov.tt .gov.tv .edu.tw .gov.tw .mil.tw .com.tw .net.tw .org.tw .idv.tw .game.tw .ebiz.tw .club.tw .co.tz .ac.tz .go.tz .or.tz .ne.tz .com.ua .gov.ua .net.ua .edu.ua .org.ua .cherkassy.ua .ck.ua .chernigov.ua .cn.ua .chernovtsy.ua .cv.ua .crimea.ua .dnepropetrovsk.ua .dp.ua .donetsk.ua .dn.ua .if.ua .kharkov.ua .kh.ua .kherson.ua .ks.ua .khmelnitskiy.ua .km.ua .kiev.ua .kv.ua .kirovograd.ua .kr.ua .lugansk.ua .lg.ua .lutsk.ua .lviv.ua .nikolaev.ua .mk.ua .odessa.ua .od.ua .poltava.ua .pl.ua .rovno.ua .rv.ua .sebastopol.ua .sumy.ua .ternopil.ua .te.ua .uzhgorod.ua .vinnica.ua .vn.ua .zaporizhzhe.ua .zp.ua .zhitomir.ua .zt.ua .co.ug .ac.ug .sc.ug .go.ug .ne.ug .or.ug .ac.uk .co.uk .gov.uk .ltd.uk .me.uk .mil.uk .mod.uk .net.uk .nic.uk .nhs.uk .org.uk .plc.uk .police.uk .bl.uk .icnet.uk .jet.uk .nel.uk .nls.uk .parliament.uk .sch.uk .ak.us .al.us .ar.us .az.us .ca.us .co.us .ct.us .dc.us .de.us .dni.us .fed.us .fl.us .ga.us .hi.us .ia.us .id.us .il.us .in.us .isa.us .kids.us .ks.us .ky.us .la.us .ma.us .md.us .me.us .mi.us .mn.us .mo.us .ms.us .mt.us .nc.us .nd.us .ne.us .nh.us .nj.us .nm.us .nsn.us .nv.us .ny.us .oh.us .ok.us .or.us .pa.us .ri.us .sc.us .sd.us .tn.us .tx.us .ut.us .vt.us .va.us .wa.us .wi.us .wv.us .wy.us .edu.uy .gub.uy .org.uy .com.uy .net.uy .mil.uy .com.ve .net.ve .org.ve .info.ve .co.ve .web.ve .com.vi .org.vi .edu.vi .gov.vi .com.vn .net.vn .org.vn .edu.vn .gov.vn .int.vn .ac.vn .biz.vn .info.vn .name.vn .pro.vn .health.vn .com.ye .net.ye .ac.yu .co.yu .org.yu .edu.yu .ac.za .city.za .co.za .edu.za .gov.za .law.za .mil.za .nom.za .org.za .school.za .alt.za .net.za .ngo.za .tm.za .web.za .co.zm .org.zm .gov.zm .sch.zm .ac.zm .co.zw .org.zw .gov.zw .ac.zw .com.ac .edu.ac .gov.ac .net.ac .mil.ac .org.ac .nom.ad .net.ae .co.ae .gov.ae .ac.ae .sch.ae .org.ae .mil.ae .pro.ae .name.ae .com.ag .org.ag .net.ag .co.ag .nom.ag .off.ai .com.ai .net.ai .org.ai .gov.al .edu.al .org.al .com.al .net.al .com.am .net.am .org.am .com.ar .net.ar .org.ar .e164.arpa .ip6.arpa .uri.arpa .urn.arpa .gv.at .ac.at .co.at .or.at .com.au .net.au .asn.au .org.au .id.au .csiro.au .gov.au .edu.au .com.aw .com.az .net.az .org.az .com.bb .edu.bb .gov.bb .net.bb .org.bb .com.bd .edu.bd .net.bd .gov.bd .org.bd .mil.be .ac.be .gov.bf .com.bm .edu.bm .org.bm .gov.bm .net.bm .com.bn .edu.bn .org.bn .net.bn .com.bo .org.bo .net.bo .gov.bo .gob.bo .edu.bo .tv.bo .mil.bo .int.bo .agr.br .am.br .art.br .edu.br .com.br .coop.br .esp.br .far.br .fm.br .g12.br .gov.br .imb.br .ind.br .inf.br .mil.br .net.br .org.br .psi.br .rec.br .srv.br .tmp.br .tur.br .tv.br .etc.br .adm.br .adv.br .arq.br .ato.br .bio.br .bmd.br .cim.br .cng.br .cnt.br .ecn.br .eng.br .eti.br .fnd.br .fot.br .fst.br .ggf.br .jor.br .lel.br .mat.br .med.br .mus.br .not.br .ntr.br .odo.br .ppg.br .pro.br .psc.br .qsl.br .slg.br .trd.br .vet.br .zlg.br .dpn.br .nom.br .com.bs .net.bs .org.bs .com.bt .edu.bt .gov.bt .net.bt .org.bt .co.bw .org.bw .gov.by .mil.by .ac.cr .co.cr .ed.cr .fi.cr .go.cr .or.cr .sa.cr .com.cy .biz.cy .info.cy .ltd.cy .pro.cy .net.cy .org.cy .name.cy .tm.cy .ac.cy .ekloges.cy .press.cy .parliament.cy .com.dm .net.dm .org.dm .edu.dm .gov.dm .biz.fj .com.fj .info.fj .name.fj .net.fj .org.fj .pro.fj .ac.fj .gov.fj .mil.fj .school.fj .com.gh .edu.gh .gov.gh .org.gh .mil.gh .co.hu .info.hu .org.hu .priv.hu .sport.hu .tm.hu .2000.hu .agrar.hu .bolt.hu .casino.hu .city.hu .erotica.hu .erotika.hu .film.hu .forum.hu .games.hu .hotel.hu .ingatlan.hu .jogasz.hu .konyvelo.hu .lakas.hu .media.hu .news.hu .reklam.hu .sex.hu .shop.hu .suli.hu .szex.hu .tozsde.hu .utazas.hu .video.hu .ac.id .co.id .or.id .go.id .ac.il .co.il .org.il .net.il .k12.il .gov.il .muni.il .idf.il .co.im .net.im .gov.im .org.im .nic.im .ac.im .org.jm .ac.jp .ad.jp .co.jp .ed.jp .go.jp .gr.jp .lg.jp .ne.jp .or.jp .hokkaido.jp .aomori.jp .iwate.jp .miyagi.jp .akita.jp .yamagata.jp .fukushima.jp .ibaraki.jp .tochigi.jp .gunma.jp .saitama.jp .chiba.jp .tokyo.jp .kanagawa.jp .niigata.jp .toyama.jp .ishikawa.jp .fukui.jp .yamanashi.jp .nagano.jp .gifu.jp .shizuoka.jp .aichi.jp .mie.jp .shiga.jp .kyoto.jp .osaka.jp .hyogo.jp .nara.jp .wakayama.jp .tottori.jp .shimane.jp .okayama.jp .hiroshima.jp .yamaguchi.jp .tokushima.jp .kagawa.jp .ehime.jp .kochi.jp .fukuoka.jp .saga.jp .nagasaki.jp .kumamoto.jp .oita.jp .miyazaki.jp .kagoshima.jp .okinawa.jp .sapporo.jp .sendai.jp .yokohama.jp .kawasaki.jp .nagoya.jp .kobe.jp .kitakyushu.jp .per.kh .com.kh .edu.kh .gov.kh .mil.kh .net.kh .org.kh .net.lb .org.lb .gov.lb .edu.lb .com.lb .com.lc .org.lc .edu.lc .gov.lc .army.mil .navy.mil .weather.mobi .music.mobi .ac.mw .co.mw .com.mw .coop.mw .edu.mw .gov.mw .int.mw .museum.mw .net.mw .org.mw .mil.no .stat.no .kommune.no .herad.no .priv.no .vgs.no .fhs.no .museum.no .fylkesbibl.no .folkebibl.no .idrett.no .com.np .org.np .edu.np .net.np .gov.np .mil.np .org.nr .com.om .co.om .edu.om .ac.com .sch.om .gov.om .net.om .org.om .mil.om .museum.om .biz.om .pro.om .med.om .com.pa .ac.pa .sld.pa .gob.pa .edu.pa .org.pa .net.pa .abo.pa .ing.pa .med.pa .nom.pa .com.pe .org.pe .net.pe .edu.pe .mil.pe .gob.pe .nom.pe .law.pro .med.pro .cpa.pro .vatican.va .ac .ad .ae .aero .af .ag .ai .al .am .an .ao .aq .ar .arpa .as .at .au .aw .az .ba .bb .bd .be .bf .bg .bh .bi .biz .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cat .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .com .coop .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .edu .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gov .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .info .int .io .iq .ir .is .it .je .jm .jo .jobs .jp .ke .kg .kh .ki .km .kn .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .mg .mh .mil .mk .ml .mm .mn .mo .mobi .mp .mq .mr .ms .mt .mu .museum .mv .mw .na .name .nc .ne .net .nf .ng .ni .nl .no .np .nr .nu .nz .om .org .pa .pe .pf .pg .ph .pk .pl .pm .pn .post .pr .pro .ps .pt .pw .py .qa .re .ro .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .travel .tt .tv .tw .tz .ua .ug .uk .um .us .uy .uz .va .vc .ve .vg .vi .vn .vuwf .ye .yt .yu .za .zm .zw .ca .cd .ch .cn .cu .cx .dm .dz .ec .ee .es .fr .ge .gg .gi .gr .hk .hn .hr .ht .hu .ie .in .ir .it .je .jo .jp .kr .ky .li .lk .lt .lu .lv .ly .ma .mc .mg .mk .mo .mt .mu .nl .no .nr .nr .pf .ph .pk .pl .pr .ps .pt .ro .ru .rw .sc .sd .se .sg .tj .to .to .tt .tv .tw .tw .tw .tw .ua .ug .us .vi .vn";
$tld_regex = '#(.*?)([^.]+)('.str_replace(array('.',' '),array('\\.','|'),$valid_tlds).')$#';
//remove the extension
preg_match($tld_regex,$host,$matches);
if(!empty($matches) && sizeof($matches) > 2){
$extension = array_pop($matches);
$tld = array_pop($matches);
return $tld.$extension;
}else{ //change to "false" if you prefer
return $host;
}
}
Here is what I am using:
It works great without needing any arrays for tld's
$split = array_reverse(explode(".", $_SERVER['HTTP_HOST']));
$domain = $split[1].".".$split[0];
if(function_exists('gethostbyname'))
{
if(gethostbyname($domain) != $_SERVER['SERVER_ADDR'] && isset($split[2]))
{
$domain = $split[2].".".$split[1].".".$split[0];
}
}
It is not possible without using a TLD list to compare with as their exist many cases like
http://www.db.de/ or http://bbc.co.uk/
But even with that you won't have success in every case because of SLD's like http://big.uk.com/ or http://www.uk.com/
If you need a complete list you can use the public suffix list:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
Feel free to use my function. It won't use regex and it is fast:
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm#3471878
Here is how you strip the TLD from any URL - I wrote the code to work on my site: http://internet-portal.me/ - This is a working solution that is used on my site.
$host is the URL that has to be parsed. This code is a simple solution and reliable
compared to everything else I have seen, It works on any URL that I have tried!!!
see this code parsing the page you are looking at right now!
http://internet-portal.me/domain/?dns=https://stackoverflow.com/questions/1201194/php-getting-domain-name-from-subdomain/6320437#6320437
================================================================================
$host = filter_var($_GET['dns']);
$host = $host . '/'; // needed if URL does not have trailing slash
// Strip www, http, https header ;
$host = str_replace( 'http://www.' , '' , $host );
$host = str_replace( 'https://www.' , '' , $host );
$host = str_replace( 'http://' , '' , $host );
$host = str_replace( 'https://' , '' , $host );
$pos = strpos($host, '/'); // find any sub directories
$host = substr( $host, 0, $pos ); //strip directories
$hostArray = explode (".", $host); // count parts of TLD
$size = count ($hostArray) -1; // really only need to know if not a single level TLD
$tld = $hostArray[$size]; // do we need to parse the TLD any further -
// remove subdomains?
if ($size > 1) {
if ($tld == "aero" or $tld == "asia" or $tld == "biz" or $tld == "cat" or
$tld == "com" or $tld == "coop" or $tld == "edu" or $tld == "gov" or
$tld == "info" or $tld == "int" or $tld == "jobs" or $tld == "me" or
$tld == "mil" or $tld == "mobi" or $tld == "museum" or $tld == "name" or
$tld == "net" or $tld == "org" or $tld == "pro" or $tld == "tel" or
$tld == "travel" or $tld == "tv" or $tld == "ws" or $tld == "XXX") {
$host = $hostArray[$size -1].".".$hostArray[$size]; // parse to 2 level TLD
} else {
// parse to 3 level TLD
$host = $hostArray[$size -2].".".$hostArray[$size -1].".".$hostArray[$size] ;
}
}
No need for listing all the countries TLD, they are all 2 letters, besides the special ones listed by IANA
https://gist.github.com/pocesar/5366899
and the tests are here http://codepad.viper-7.com/QfueI0
Comprehensive test suit along with working code. The only caveat is that it won't work with unicode domain names, but that's another level of data extraction.
From the list, I'm testing against:
$urls = array(
'www.example.com' => 'example.com',
'example.com' => 'example.com',
'example.com.br' => 'example.com.br',
'www.example.com.br' => 'example.com.br',
'www.example.gov.br' => 'example.gov.br',
'localhost' => 'localhost',
'www.localhost' => 'localhost',
'subdomain.localhost' => 'localhost',
'www.subdomain.example.com' => 'example.com',
'subdomain.example.com' => 'example.com',
'subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.biz.br' => 'example.biz.br',
'subdomain.example.biz.br' => 'example.biz.br',
'subdomain.example.net' => 'example.net',
'www.subdomain.example.net' => 'example.net',
'www.subdomain.example.co.kr' => 'example.co.kr',
'subdomain.example.co.kr' => 'example.co.kr',
'example.co.kr' => 'example.co.kr',
'example.jobs' => 'example.jobs',
'www.example.jobs' => 'example.jobs',
'subdomain.example.jobs' => 'example.jobs',
'insane.subdomain.example.jobs' => 'example.jobs',
'insane.subdomain.example.com.br' => 'example.com.br',
'www.doubleinsane.subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.jobs' => 'example.jobs',
'test' => 'test',
'www.test' => 'test',
'subdomain.test' => 'test',
'www.detran.sp.gov.br' => 'sp.gov.br',
'www.mp.sp.gov.br' => 'sp.gov.br',
'ny.library.museum' => 'library.museum',
'www.ny.library.museum' => 'library.museum',
'ny.ny.library.museum' => 'library.museum',
'www.library.museum' => 'library.museum',
'info.abril.com.br' => 'abril.com.br',
'127.0.0.1' => '127.0.0.1',
'::1' => '::1',
);
As already said Public Suffix List is only one way to parse domain correctly. I recomend TLDExtract package, here is sample code:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('here.example.com');
$result->getSubdomain(); // will return (string) 'here'
$result->getHostname(); // will return (string) 'example'
$result->getSuffix(); // will return (string) 'com'
This is to get domain.tld in any case
public static function get_domain_with_extension($url)
{
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : $pieces['path'];
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}
return false;
}
NO NEED FOR REGEX. There exists native parse_url:
echo parse_url($your_url)['host'];
I am trying to get the search keyword from a referrer url. Currently, I am using the following code for Google urls. But sometimes it is not working...
$query_get = "(q|p)";
$referrer = "http://www.google.com/search?hl=en&q=learn+php+2&client=firefox";
preg_match('/[?&]'.$query_get.'=(.*?)[&]/',$referrer,$search_keyword);
Is there another/clean/working way to do this?
Thank you,
Prasad
If you're using PHP5 take a look at http://php.net/parse_url and http://php.net/parse_str
Example:
// The referrer
$referrer = 'http://www.google.com/search?hl=en&q=learn+php+2&client=firefox';
// Parse the URL into an array
$parsed = parse_url( $referrer, PHP_URL_QUERY );
// Parse the query string into an array
parse_str( $parsed, $query );
// Output the result
echo $query['q'];
There are different query strings on different search engines. After trying Wiliam's method, I have figured out my own method. (Because, Yahoo's is using 'p', but sometimes 'q')
$referrer = "http://search.yahoo.com/search?p=www.stack+overflow%2Ccom&ei=utf-8&fr=slv8-msgr&xargs=0&pstart=1&b=61&xa=nSFc5KjbV2gQCZejYJqWdQ--,1259335755";
$referrer_query = parse_url($referrer);
$referrer_query = $referrer_query['query'];
$q = "[q|p]"; //Yahoo uses both query strings, I am using switch() for each search engine
preg_match('/'.$q.'=(.*?)&/',$referrer,$keyword);
$keyword = urldecode($keyword[1]);
echo $keyword; //Outputs "www.stack overflow,com"
Thank you,
Prasad
To supplement the other answers, note that the query string parameter that contains the search terms varies by search provider. This snippet of PHP shows the correct parameter to use:
$search_engines = array(
'q' => 'alltheweb|aol|ask|ask|bing|google',
'p' => 'yahoo',
'wd' => 'baidu',
'text' => 'yandex'
);
Source: http://betterwp.net/wordpress-tips/get-search-keywords-from-referrer/
<?php
class GET_HOST_KEYWORD
{
public function get_host_and_keyword($_url) {
$p = $q = "";
$chunk_url = parse_url($_url);
$_data["host"] = ($chunk_url['host'])?$chunk_url['host']:'';
parse_str($chunk_url['query']);
$_data["keyword"] = ($p)?$p:(($q)?$q:'');
return $_data;
}
}
// Sample Example
$obj = new GET_HOST_KEYWORD();
print_r($obj->get_host_and_keyword('http://www.google.co.in/search?sourceid=chrome&ie=UTF-&q=hire php php programmer'));
// sample output
//Array
//(
// [host] => www.google.co.in
// [keyword] => hire php php programmer
//)
// $search_engines = array(
// 'q' => 'alltheweb|aol|ask|ask|bing|google',
// 'p' => 'yahoo',
// 'wd' => 'baidu',
// 'text' => 'yandex'
//);
?>
$query = parse_url($request, PHP_URL_QUERY);
This one should work For Google, Bing and sometimes, Yahoo Search:
if( isset($_SERVER['HTTP_REFERER']) && $_SERVER['HTTP_REFERER']) {
$query = getSeQuery($_SERVER['HTTP_REFERER']);
echo $query;
} else {
echo "I think they spelled REFERER wrong? Anyways, your browser says you don't have one.";
}
function getSeQuery($url = false) {
$segments = parse_url($url);
$keywords = null;
if($query = isset($segments['query']) ? $segments['query'] : (isset($segments['fragment']) ? $segments['fragment'] : null)) {
parse_str($query, $segments);
$keywords = isset($segments['q']) ? $segments['q'] : (isset($segments['p']) ? $segments['p'] : null);
}
return $keywords;
}
I believe google and yahoo had updated their algorithm to exclude search keywords and other params in the url which cannot be received using http_referrer method.
Please let me know if above recommendations will still provide the search keywords.
What I am receiving now are below when using http referrer at my website end.
from google: https://www.google.co.in/
from yahoo: https://in.yahoo.com/
Ref: https://webmasters.googleblog.com/2012/03/upcoming-changes-in-googles-http.html