check for subdomain in parse_url


I am trying to write a function that just gets the user's profile id or username from Facebook. They enter their URL into a form, and I then try to figure out whether it's a Facebook profile page or some other page. The problem is that if they enter an app page or another page that has a subdomain, I would like to ignore that request.
Right now I have:
$author_url = "http://facebook.com/profile?id=12345";
$grav_url = "";
if (preg_match("/facebook/i", $author_url)) {
    $parse_author_url = parse_url($author_url);
    // 'query' is absent when the URL has no query string
    $parse_author_url_q = isset($parse_author_url['query']) ? $parse_author_url['query'] : '';
    if (preg_match('/id[=]([0-9]*)/', $parse_author_url_q, $match)) {
        $fb_id = "/" . $match[1];
    } else {
        $fb_id = $parse_author_url['path'];
    }
    $grav_url = "http://graph.facebook.com" . $fb_id . "/picture?type=square";
}
echo $grav_url;
This works: if $author_url has "id=", that is used as the profile id; if not, it must be a user name or page name, so the path is used instead. I need to run one more check: if the URL contains facebook but is a subdomain, ignore it. I believe I can do that in the first preg_match: preg_match("/facebook/i",$author_url)
Thanks!

To ignore Facebook subdomains, you can check that
$parse_author_url['host']
is exactly facebook.com.
If it's anything else, like login.facebook.com or apps.facebook.com, you need not proceed.
Alternatively, you can ensure that the URL begins with http://facebook.com (the ^ anchor is what ties the match to the start of the string):
if(preg_match("#^(?:http://)?facebook\.com#i",$author_url)){
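The host check described in this answer can be sketched as follows. This is only a minimal sketch: the function name is_plain_facebook_url is made up for illustration, and treating www.facebook.com the same as facebook.com is an assumption that the question does not confirm.

```php
<?php
// Hypothetical helper: true only for facebook.com itself (optionally with www.),
// false for subdomains such as apps.facebook.com or login.facebook.com.
function is_plain_facebook_url(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host found, e.g. the scheme is missing or the URL is malformed
    }
    $host = strtolower($host);
    // Assumption: www.facebook.com counts as the main site.
    return $host === 'facebook.com' || $host === 'www.facebook.com';
}

var_dump(is_plain_facebook_url('http://facebook.com/profile?id=12345'));
var_dump(is_plain_facebook_url('http://apps.facebook.com/someapp'));
```

Comparing the parsed host, rather than searching the whole URL for "facebook", also avoids false positives from URLs that merely mention facebook in their path or query string.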

This isn't a direct solution for what you were asking, but the parts are here to do what you need to do.
I found that parse_url had trouble with a subdomain URL: it returned an array with only $result['path'] and no 'host' or 'scheme'. (This is what parse_url does when the string has no scheme such as http:// in front of it.)
My theory here is that if parse_url returns no 'host' or 'scheme' and the string contains a domain suffix (.ext), it is a subdomain rather than a relative path.
Here is the code
(the $src is a URL; I had to sort out the relative src values from subdomains):
$srcA = parse_url( $src );
//..if no scheme or host test if subdomain.
if( empty( $srcA['scheme'] ) && empty( $srcA['host'] ) ){
    //..this string / array is set elsewhere but for this example I will put it here
$tld = "AC,AD,AE,AERO,AF,AG,AI,AL,AM,AN,AO,AQ,AR,ARPA,AS,ASIA,AT,AU,AW,AX,AZ,BA,BB,BD,BE,BF,BG,BH,BI,BIZ,BJ,BM,BN,BO,BR,BS,BT,BV,BW,BY,BZ,CA,CAT,CC,CD,CF,CG,CH,CI,CK,CL,CM,CN,CO,COM,COOP,CR,CU,CV,CW,CX,CY,CZ,DE,DJ,DK,DM,DO,DZ,EC,EDU,EE,EG,ER,ES,ET,EU,FI,FJ,FK,FM,FO,FR,GA,GB,GD,GE,GF,GG,GH,GI,GL,GM,GN,GOV,GP,GQ,GR,GS,GT,GU,GW,GY,HK,HM,HN,HR,HT,HU,ID,IE,IL,IM,IN,INFO,INT,IO,IQ,IR,IS,IT,JE,JM,JO,JOBS,JP,KE,KG,KH,KI,KM,KN,KP,KR,KW,KY,KZ,LA,LB,LC,LI,LK,LR,LS,LT,LU,LV,LY,MA,MC,MD,ME,MG,MH,MIL,MK,ML,MM,MN,MO,MOBI,MP,MQ,MR,MS,MT,MU,MUSEUM,MV,MW,MX,MY,MZ,NA,NAME,NC,NE,NET,NF,NG,NI,NL,NO,NP,NR,NU,NZ,OM,ORG,PA,PE,PF,PG,PH,PK,PL,PM,PN,POST,PR,PRO,PS,PT,PW,PY,QA,RE,RO,RS,RU,RW,SA,SB,SC,SD,SE,SG,SH,SI,SJ,SK,SL,SM,SN,SO,SR,ST,SU,SV,SX,SY,SZ,TC,TD,TEL,TF,TG,TH,TJ,TK,TL,TM,TN,TO,TP,TR,TRAVEL,TT,TV,TW,TZ,UA,UG,UK,US,UY,UZ,VA,VC,VE,VG,VI,VN,VU,WF,WS,XXX,YE,YT,ZA,ZM,ZW";
    $tldA = explode( ',' , strtolower( $tld ) );
    $isSubdomain = false;
    foreach( $tldA as $tld ){
        if( strpos( $src , '.'.$tld ) !== false ){
            $isSubdomain = true;
            break;
        }
    }
    //..prefixing with the $host (set elsewhere) if it is not a subdomain.
    if( !$isSubdomain ){
        $src = $host . '/' . $srcA['path'];
    }
}
You could add a further confirmation by taking the part of each $isSubdomain == true string before the first '/' and testing its characters against a RegEx.
Hope this helps some people out.
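One way to sidestep the missing 'host' described in this answer is to add a scheme before calling parse_url. The sketch below assumes that scheme-less input such as apps.example.com/page should be treated as http. The function name host_from_loose_url is hypothetical, and the function does not try to tell relative paths from hostnames, which is what the TLD list above is for.

```php
<?php
// Hypothetical helper: return the lowercase host of a URL that may lack a scheme,
// or null if no host can be determined.
function host_from_loose_url(string $url): ?string
{
    $url = trim($url);
    // parse_url() puts everything into 'path' when there is no scheme,
    // so prepend one (assumption: http) if the string does not start with "scheme://".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . ltrim($url, '/');
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

var_dump(host_from_loose_url('apps.example.com/some/page'));
var_dump(host_from_loose_url('https://Example.com/'));
```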

Related

Remove subdomain from URL/host to match domains in affiliate link array

I want to make a redirect file in PHP that automatically adds an affiliate tag to any link, the way https://freekaamaal.com/links?url=https://www.amazon.in/ works.
If I open the above link, it automatically adds the affiliate tag, and the final link that opens is 'https://www.amazon.in/?tag=freekaamaal-21'. The same goes for Flipkart and many other sites.
It automatically adds affiliate tags to various links, for example Amazon, Flipkart, Ajio, etc.
I'll be very thankful if anyone can help me with this.
Thanks in advance 🙏
Right now I have the code below, but the problem is that sometimes a link has an extra subdomain, for example https://dl.flipkart.com/ or https://m.shopclues.com/. For these links it does not pick the redirect from the array; instead it redirects to the default link.
<?php
$subid = isset($_GET['subid']) ? $_GET['subid'] : 'telegram'; //subid for external tracking
$affid = $_GET['url']; //main link
$parse = parse_url($affid);
$host = $parse['host'];
$host = str_ireplace('www.', '', $host);
//flipkart affiliate link generates here
$url_parts = parse_url($affid);
$url_parts['host'] = 'dl.flipkart.com';
$url_parts['path'] .= "/";
if(strpos($url_parts['path'],"/dl/") !== 0) $url_parts['path'] = '/dl'.rtrim($url_parts['path'],"/");
$url = $url_parts['scheme'] . "://" . $url_parts['host'] . $url_parts['path'] . (empty($url_parts['query']) ? '' : '?' . $url_parts['query']);
$afftag = "harshk&affExtParam1=$subid"; //our affiliate ID
if (strpos($url, '?') !== false) {
if (substr($url, -1) == "&") {
$url = $url.'affid='.$afftag;
} else {
$url = $url.'&affid='.$afftag;
}
} else { // start a new query string
$url = $url.'?affid='.$afftag;
}
$flipkartlink = $url;
//amazon link generates here
$amazon = $affid;
$amzntag = "subhdeals-21"; //our affiliate ID
if (strpos($amazon, '?') !== false) {
    if (substr($amazon, -1) == "&") {
        $amazon = $amazon.'tag='.$amzntag;
    } else {
        $amazon = $amazon.'&tag='.$amzntag;
    }
} else { // start a new query string
    $amazon = $amazon.'?tag='.$amzntag;
}
$amazonlink = $amazon;
$cueurl = "https://linksredirect.com/?subid=$subid&source=linkkit&url="; //cuelinks deeplink for redirection
$ulpsub = '&subid=' .$subid; //subid
$encoded = urlencode($affid); //url encode
$home = $cueurl . $encoded; // default link for redirection.
$partner = array( //Insert links here
"amazon.in" => "$amazonlink",
"flipkart.com" => "$flipkartlink",
"shopclues.com" => $cueurl . $encoded,
"aliexpress.com" => $cueurl . $encoded,
"ajio.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
"croma.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
"myntra.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
);
$store = array_key_exists($host, $partner) === false ? $home : $partner[$host]; //Checks if the host exists if not then redirect to your default link
header("Location: $store"); //Do not change
exit(); //Do not change
?>

Thank you for updating your question with the code you have and explaining what the actual problem is. Since your reference array for the affiliate links is indexed by base domain, we will need to normalize the hostname to remove any possible subdomains. Right now you have:
$host = str_ireplace('www.', '', $host);
Which will do the job only if the subdomain is www., obviously. Now, one might be tempted to simply explode by . and take the last two components. However, that would fail with .co.in and other second-level domains. We're better off using a regular expression.
One could craft a universal regular expression that handles all possible second-level domains (co., net., org., edu., ...) but that would become a long list. For your use case, since your list currently only has the .com, .in and .co.in domain extensions, and is unlikely to have many more, we'll just hard-code these into the regex to keep things fast and simple:
$host = preg_replace('#^.*?([^.]+\.)(com|in|co\.in)$#i', '\1\2', $host);
To explain the regex we're using:
^ start-of-subject anchor;
.*? ungreedy optional match for any characters (if a subdomain, or a sub-sub-domain, exists);
([^.]+\.) capturing group for non-. characters followed by . (main domain name);
(com|in|co\.in) capturing group for domain extension (add to list as necessary);
$ end-of-subject anchor.
Then we replace the hostname with the contents of the capture groups that matched the domain name and its extension. This will return example.com for www.example.com, foo.bar.example.com, or example.com itself; and example.co.in for www.example.co.in, foo.bar.example.co.in, or example.co.in itself. This should help your script work as intended. If there are further problems, please update the OP and we'll see what solutions are available.
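To see how the normalization fits into the lookup, here is a small sketch. It assumes the extension list com, in and co.in, chosen here to match the domains in the question's array. The function name base_domain and the placeholder link values are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: strip subdomains, keeping "name.extension" for the listed extensions.
// Hosts with other extensions are returned unchanged (lowercased).
function base_domain(string $host): string
{
    return preg_replace('#^.*?([^.]+\.)(com|in|co\.in)$#i', '\1\2', strtolower($host));
}

// Placeholder targets; in the real script these are the affiliate links.
$partner = array(
    'amazon.in'     => 'AMAZON_LINK',
    'flipkart.com'  => 'FLIPKART_LINK',
    'shopclues.com' => 'SHOPCLUES_LINK',
);
$home = 'DEFAULT_LINK';

foreach (array('dl.flipkart.com', 'm.shopclues.com', 'www.amazon.in', 'example.org') as $host) {
    $key = base_domain($host);
    // Fall back to the default link when the base domain is not a known partner.
    $store = array_key_exists($key, $partner) ? $partner[$key] : $home;
    echo $host . ' -> ' . $store . "\n";
}
```

Because the lookup key is the normalized host, dl.flipkart.com and m.shopclues.com resolve to their partner entries instead of falling through to the default link.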

How to redirect to correct URL without knowing input URL

Basically how Stack Overflow does it.
So if the old URL is /product-old-url_152 and it then changes to /product-new-url_152, the following URLs:
/product-old-url_152
/product-some-other-url_152
would both redirect to:
/product-new-url_152
What's the best way of doing this?
EDIT: 152 is the ID of the post in the database.

One way to do this:
// Extract the id from the requested URL
if (preg_match('/product-(.*)_(\d+)$/', $_SERVER['REQUEST_URI'], $matches)) {
    $old = $matches[1];
    $id = $matches[2];
    // look up the current slug in the database (fetch_slug_from_database stands for your own query)
    $slug = fetch_slug_from_database($id);
    // and send a redirect to the client, if the URL changed
    if ($slug !== $old) {
        header("Location: /product-{$slug}_{$id}");
        exit;
    }
}
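The same idea can be written so that the decision is separate from the header() call, which makes it easy to try without a web server. This is a sketch under assumptions: the URL shape /product-<slug>_<id> is taken from the question, the function name canonical_product_redirect is hypothetical, and the $lookupSlug callable stands in for the database query.

```php
<?php
// Hypothetical helper: return the path to redirect to, or null if the requested
// path is already canonical or does not look like a product URL.
// $lookupSlug maps an id to its current slug, or returns null for an unknown id.
function canonical_product_redirect(string $requestPath, callable $lookupSlug): ?string
{
    if (!preg_match('#^/product-(.*)_(\d+)$#', $requestPath, $matches)) {
        return null;
    }
    $slug = $lookupSlug((int) $matches[2]);
    if ($slug === null || $slug === $matches[1]) {
        return null; // unknown id, or the slug is already current
    }
    return "/product-{$slug}_{$matches[2]}";
}

// Example lookup backed by an array instead of a database.
$slugs = array(152 => 'new-url');
$lookup = function (int $id) use ($slugs) {
    return isset($slugs[$id]) ? $slugs[$id] : null;
};

var_dump(canonical_product_redirect('/product-old-url_152', $lookup));
var_dump(canonical_product_redirect('/product-new-url_152', $lookup));
```

In a request handler, a non-null result would be passed to header('Location: ...'), followed by exit.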

PHP - Determine if a URL is an internal or external URL

This is a self Q&A
I found myself often needing to parse a URL supplied by a CMS user to determine if it's an external URL, or an internal one. Often clients want external URL's to be highlighted differently, or to force target="_blank" for them.
So, I want a piece of code that can parse a URL and determine if it's an internal or external URL, and then return a different class and target for either scenario.

The code below takes a URL as a string, plus two class names as strings, and compares the URL's host to the current host (I also left in a commented-out WordPress-specific line in case it is needed).
function parse_external_url( $url = '', $internal_class = 'internal-link', $external_class = 'external-link') {
// Abort if parameter URL is empty
if( empty($url) ) {
return false;
}
// Parse home URL and parameter URL
$link_url = parse_url( $url );
$home_url = parse_url( 'http://' . $_SERVER['HTTP_HOST'] ); // HTTP_HOST has no scheme, so add one or parse_url() returns no 'host'
//$home_url = parse_url( home_url() ); // Works for WordPress
// Decide on target
if( empty($link_url['host']) ) {
// Is an internal link
$target = '_self';
$class = $internal_class;
} elseif( $link_url['host'] == $home_url['host'] ) {
// Is an internal link
$target = '_self';
$class = $internal_class;
} else {
// Is an external link
$target = '_blank';
$class = $external_class;
}
// Return array
$output = array(
'class' => $class,
'target' => $target,
'url' => $url
);
return $output;
}
You would use the code like this:
$url_data = parse_external_url( 'http://www.funkhaus.us', 'internal-link-class', 'external-link-class' );
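<a href="<?php echo $url_data['url']; ?>" class="<?php echo $url_data['class']; ?>" target="<?php echo $url_data['target']; ?>">This is a link</a>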

Change URL Extension from given URL [duplicate]

Is there any way to accept a URL and change its domain extension to .com?
For example, if a user were to submit www.example.in, I want to check that the URL is valid, and change it to www.example.com. I have built a regex checker that can check whether the URL is valid, but I'm not entirely sure how to check whether the given extension is valid, and then change it to .com.
EDIT: To be clear, I am not actually going to these URLs. I am getting them submitted as user input in a form, and am simply storing them. These are operations I want to perform on the URL before storing it, that is all.
Edit 2: An example to make this clearer:
$url = 'www.example.co.uk';
$newurl = my_function($url); // my_function is a placeholder for the function I am looking for
echo $newurl;
which would yield the output
www.example.com

Are you looking for something like this on the server side, which translates a list of selected TLDs to .com?
<?php
$url = "www.example.in";
$replacement_tld = "com";
# array of all TLDs you wish to support
$valid_tlds = array("in","co.uk");
# possible TLD source lists
# http://data.iana.org/TLD/tlds-alpha-by-domain.txt
# https://wiki.mozilla.org/TLD_List
# from http://stackoverflow.com/a/10473026/723139
function endsWith($haystack, $needle)
{
$haystack = strtolower($haystack);
$needle = strtolower($needle);
return $needle === "" || substr($haystack, -strlen($needle)) === $needle;
}
foreach($valid_tlds as $tld){
    # match on ".tld" so that a host merely ending in the same letters is not rewritten
    if(endsWith($url, "." . $tld))
    {
        echo substr($url, 0, -strlen($tld)) . $replacement_tld . "\n";
        break;
    }
}
?>

Create an empty text file using a text editor such as notepad, and save it as htaccess.txt.
301 (Permanent) Redirect: Point an entire site to a different URL on a permanent basis. This is the most common type of redirect and is useful in most situations. In this example, we are redirecting to the "mt-example.com" domain:
# This allows you to redirect your entire website to any other domain
Redirect 301 / http://mt-example.com/
302 (Temporary) Redirect: Point an entire site to a different temporary URL. This is useful for SEO purposes when you have a temporary landing page and plan to switch back to your main landing page at a later date:
# This allows you to redirect your entire website to any other domain
Redirect 302 / http://mt-example.com/
For more details : http://kb.mediatemple.net/questions/242/How+do+I+redirect+my+site+using+a+.htaccess+file%3F

The question is not entirely clear; I'm assuming you want to implement this logic in PHP.
Here's a useful function to parse such strings:
function parseUrl ( $url )
{
$r = "^(?:(?P<scheme>\w+)://)?";
$r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
$r .= "(?P<host>(?:(?P<subdomain>[\w\.\-]+)\.)?" . "(?P<domain>\w+\.(?P<extension>\w+)))";
$r .= "(?::(?P<port>\d+))?";
$r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
$r .= "(?:\?(?P<arg>[\w=&]+))?";
$r .= "(?:#(?P<anchor>\w+))?";
$r = "!$r!";
preg_match( $r, $url, $out );
return $out;
}
You can parse the URL, validate it, and then rebuild it from the resulting array, replacing anything you want.
If you want to practice regular expressions and create your own patterns, this site is a good place to do it.
If your goal is to route users from one URL to another, or to change the URI style, then you need to use mod_rewrite.
In that case you will actually end up configuring your web server, probably a virtual host, because it will only route the listed domains (those parked at the server).

To validate a URL in PHP you can use filter_var():
filter_var($url, FILTER_VALIDATE_URL)
Then, to get the top-level domain (TLD) and replace it with .com, you can use the following function:
$url = "http://www.dslreports.in";
$ext = "com";
function change_url($url, $ext)
{
    if (!filter_var($url, FILTER_VALIDATE_URL)) {
        return "Not a valid URL";
    }
    $host = parse_url((string) $url, PHP_URL_HOST);
    if (!is_string($host) || strpos($host, '.') === false) {
        return $url; // no host, or no TLD to replace
    }
    // Swap only the last label of the host; a plain str_replace() on the whole URL
    // would also change the same letters wherever else they appear.
    $new_host = substr($host, 0, strrpos($host, '.') + 1) . $ext;
    return substr_replace($url, $new_host, strpos($url, $host), strlen($host));
}
echo change_url($url, $ext);
Hope this helps!

How do you strip out the domain name from a URL in php?

I'm looking for a method (or function) to strip out the domain.ext part of any URL that's fed into the function. The domain extension can be anything (.com, .co.uk, .nl, .whatever), and the URL that's fed into it can be anything from http://www.domain.com to www.domain.com/path/script.php?=whatever
What's the best way to go about doing this?

parse_url turns a URL into an associative array:
php > $foo = "http://www.example.com/foo/bar?hat=bowler&accessory=cane";
php > $blah = parse_url($foo);
php > print_r($blah);
Array
(
[scheme] => http
[host] => www.example.com
[path] => /foo/bar
[query] => hat=bowler&accessory=cane
)

You can also write a regular expression to get exactly what you want.
Here is my attempt at it:
$pattern = '/\w+\..{2,3}(?:\..{2,3})?(?:$|(?=\/))/i';
$url = 'http://www.example.com/foo/bar?hat=bowler&accessory=cane';
if (preg_match($pattern, $url, $matches) === 1) {
echo $matches[0];
}
The output is:
example.com
This pattern also takes into consideration domains such as 'example.com.au'.
Note: I have not consulted the relevant RFC.

You can use parse_url() to do this:
$url = 'http://www.example.com';
$domain = parse_url($url, PHP_URL_HOST);
$domain = str_replace('www.','',$domain);
In this example, $domain should contain example.com, irrespective of it having www or not. It also works for a domain such as .co.uk
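One caveat with str_replace('www.', '', ...) is that it removes every occurrence of www. in the host, not just a leading one. A variant that only strips the prefix might look like the sketch below; the function name host_without_www is hypothetical.

```php
<?php
// Hypothetical helper: host of a URL with a single leading "www." removed,
// or null when the URL has no host.
function host_without_www(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    // The ^ anchor limits the replacement to the start of the host.
    return preg_replace('/^www\./i', '', $host);
}

var_dump(host_without_www('http://www.example.com'));
var_dump(host_without_www('http://example.co.uk/path'));
var_dump(host_without_www('http://blog.www.example.com'));
```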

Following code will trim protocol, domain and port from absolute URL:
$urlWithoutDomain = preg_replace('#^.+://[^/]+#', '', $url);

Here are a couple simple functions to get the root domain (example.com) from a normal or long domain (test.sub.domain.com) or url (http://www.example.com).
/**
 * Get root domain from full domain
 * @param string $domain
 */
public function getRootDomain($domain)
{
    $domain = explode('.', $domain);
    $tld = array_pop($domain);
    $name = array_pop($domain);
    $domain = "$name.$tld";
    return $domain;
}
/**
 * Get domain name from url
 * @param string $url
 */
public function getDomainFromUrl($url)
{
    $domain = parse_url($url, PHP_URL_HOST);
    $domain = $this->getRootDomain($domain);
    return $domain;
}

Solved this...
Say we're calling dev.mysite.com and we want to extract 'mysite.com'
$requestedServerName = $_SERVER['SERVER_NAME']; // = dev.mysite.com
$thisSite = explode('.', $requestedServerName); // site name now an array
array_shift($thisSite); //chop off the first array entry eg 'dev'
$thisSite = join('.', $thisSite); //join it back together with dots ;)
echo $thisSite; //outputs 'mysite.com'
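This works with dev.mysite.co.uk too, but note that it always drops the first label, so it only gives the right answer when the host has exactly one subdomain (mysite.com on its own would become com).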

I spent some time thinking about whether it makes sense to use a regular expression for this, but in the end I think not.
firstresponder's regexp came close to convincing me it was the best way, but it didn't work on anything missing a trailing slash (so http://example.com, for instance). I fixed that with the following: '/\w+\..{2,3}(?:\..{2,3})?(?=[\/\W])/i', but then I realized that matches twice for urls like 'http://example.com/index.htm'. Oops. That wouldn't be so bad (just use the first one), but it also matches twice on something like this: 'http://abc.ed.fg.hij.kl.mn/', and the first match isn't the right one. :(
A co-worker suggested just getting the host (via parse_url()), and then taking the last two or three array elements (after explode() on '.'; the old split() function was removed in PHP 7). Whether it is two or three would be based on a list of suffixes, like 'co.uk', etc. Making up that list becomes the hard part.
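The co-worker's suggestion can be sketched as follows. The function name registrable_domain is hypothetical, and the default suffix list is a tiny illustrative sample, not a complete one; a real list would have to cover every two-part suffix you expect to see.

```php
<?php
// Hypothetical helper implementing the "last two or three labels" idea.
// $twoPartSuffixes is an illustrative sample, not a complete list.
function registrable_domain(string $url, array $twoPartSuffixes = array('co.uk', 'com.au', 'co.in')): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host, e.g. the URL has no scheme
    }
    $labels = explode('.', strtolower($host));
    $count = count($labels);
    if ($count < 2) {
        return strtolower($host); // single-label host such as "localhost"
    }
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    // Keep three labels when the last two form a known two-part suffix.
    $keep = (in_array($lastTwo, $twoPartSuffixes, true) && $count >= 3) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

var_dump(registrable_domain('http://www.example.com/foo/bar'));
var_dump(registrable_domain('http://www.example.co.uk'));
var_dump(registrable_domain('http://abc.ed.fg.hij.kl.mn/'));
```

Unlike the regular expression, this never matches twice and does not depend on a trailing slash, because it works on the parsed host only.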

There is only one correct way to extract domain parts: use the Public Suffix List (a database of TLDs). I recommend the TLDExtract package; here is sample code:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('www.domain.com/path/script.php?=whatever');
$result->getSubdomain(); // will return (string) 'www'
$result->getHostname(); // will return (string) 'domain'
$result->getSuffix(); // will return (string) 'com'

This function should work:
function Delete_Domain_From_Url($Url = false)
{
if($Url)
{
$Url_Parts = parse_url($Url);
$Url = isset($Url_Parts['path']) ? $Url_Parts['path'] : '';
$Url .= isset($Url_Parts['query']) ? "?".$Url_Parts['query'] : '';
}
return $Url;
}
To use it:
$Url = "https://stackoverflow.com/questions/176284/how-do-you-strip-out-the-domain-name-from-a-url-in-php";
echo Delete_Domain_From_Url($Url);
# Output:
#/questions/176284/how-do-you-strip-out-the-domain-name-from-a-url-in-php
