PHP - Determine if a URL is an internal or external URL - php

This is a self Q&A.
I often need to parse a URL supplied by a CMS user to determine whether it's external or internal. Clients often want external URLs to be highlighted differently, or to force target="_blank" for them.
So, I want a piece of code that can parse a URL, determine whether it's internal or external, and return a different class and target for each case.

The code below takes a URL as a string plus two class names as strings, and compares the URL's host to the current host (I also left a commented-out WordPress-specific line in case you need it).
function parse_external_url( $url = '', $internal_class = 'internal-link', $external_class = 'external-link') {
// Abort if parameter URL is empty
if( empty($url) ) {
return false;
}
// Parse home URL and parameter URL
$link_url = parse_url( $url );
$home_url = parse_url( 'http://' . $_SERVER['HTTP_HOST'] ); // HTTP_HOST has no scheme; parse_url() needs one to fill in 'host'
//$home_url = parse_url( home_url() ); // Works for WordPress
// Decide on target
if( empty($link_url['host']) ) {
// Is an internal link
$target = '_self';
$class = $internal_class;
} elseif( $link_url['host'] == $home_url['host'] ) {
// Is an internal link
$target = '_self';
$class = $internal_class;
} else {
// Is an external link
$target = '_blank';
$class = $external_class;
}
// Return array
$output = array(
'class' => $class,
'target' => $target,
'url' => $url
);
return $output;
}
You would use the code like this:
$url_data = parse_external_url( 'http://www.funkhaus.us', 'internal-link-class', 'external-link-class' );
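<a href="<?php echo $url_data['url']; ?>" class="<?php echo $url_data['class']; ?>" target="<?php echo $url_data['target']; ?>">This is a link</a>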

Related

Remove subdomain from URL/host to match domains in affiliate link array

I want to make a redirect file in PHP that automatically adds affiliate tags to all links, like https://freekaamaal.com/links?url=https://www.amazon.in/ does.
If I open the above link, it automatically adds the affiliate tag, and the final link that opens is 'https://www.amazon.in/?tag=freekaamaal-21'. The same happens for Flipkart and many other sites.
It automatically adds affiliate tags to various links, for example Amazon, Flipkart, Ajio, etc.
I'll be very thankful if anyone can help me with this.
Thanks in advance 🙏
Right now I have the code below, but the problem is that some links have an extra subdomain, for example https://dl.flipkart.com/ or https://m.shopclues.com/. For these links, it does not pick the entry from the array and redirects to the default link instead.
<?php
$subid = isset($_GET['subid']) ? $_GET['subid'] : 'telegram'; //subid for external tracking
$affid = $_GET['url']; //main link
$parse = parse_url($affid);
$host = $parse['host'];
$host = str_ireplace('www.', '', $host);
//flipkart affiliate link generates here
$url_parts = parse_url($affid);
$url_parts['host'] = 'dl.flipkart.com';
$url_parts['path'] .= "/";
if(strpos($url_parts['path'],"/dl/") !== 0) $url_parts['path'] = '/dl'.rtrim($url_parts['path'],"/");
$url = $url_parts['scheme'] . "://" . $url_parts['host'] . $url_parts['path'] . (empty($url_parts['query']) ? '' : '?' . $url_parts['query']);
$afftag = "harshk&affExtParam1=$subid"; //our affiliate ID
if (strpos($url, '?') !== false) {
if (substr($url, -1) == "&") {
$url = $url.'affid='.$afftag;
} else {
$url = $url.'&affid='.$afftag;
}
} else { // start a new query string
$url = $url.'?affid='.$afftag;
}
$flipkartlink = $url;
//amazon link generates here
$amazon = $affid;
$amzntag = "subhdeals-21"; //our affiliate ID
if (strpos($amazon, '?') !== false) {
if (substr($amazon, -1) == "&") {
$amazon = $amazon.'tag='.$amzntag;
} else {
$amazon = $amazon.'&tag='.$amzntag;
}
} else { // start a new query string
$amazon = $amazon.'?tag='.$amzntag;
}
$amazonlink = $amazon;
$cueurl = "https://linksredirect.com/?subid=$subid&source=linkkit&url="; //cuelinks deeplink for redirection
$ulpsub = '&subid=' .$subid; //subid
$encoded = urlencode($affid); //url encode
$home = $cueurl . $encoded; // default link for redirection.
$partner = array( //Insert links here
"amazon.in" => "$amazonlink",
"flipkart.com" => "$flipkartlink",
"shopclues.com" => $cueurl . $encoded,
"aliexpress.com" => $cueurl . $encoded,
"ajio.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
"croma.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
"myntra.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
);
$store = array_key_exists($host, $partner) === false ? $home : $partner[$host]; //Checks if the host exists if not then redirect to your default link
header("Location: $store"); // Do not change
exit(); // Do not change
?>
Thank you for updating your question with the code you have and explaining what the actual problem is. Since your reference array for the affiliate links is indexed by base domain, we need to normalize the hostname by removing any subdomains. Right now you have:
$host = str_ireplace('www.', '', $host);
That only does the job if the subdomain is www. Now, one might be tempted to simply explode by . and take the last two components. However, that would fail with .co.in and other second-level domains. We're better off using a regular expression.
One could craft a universal regular expression that handles all possible second-level domains (co., net., org., edu., ...), but that would become a long list. For your use case, since your list currently only needs the .com, .in and .co.in domain extensions, and is unlikely to need many more, we'll just hard-code these into the regex to keep things fast and simple:
$host = preg_replace('#^.*?([^.]+\.)(com|in|co\.in)$#i', '\1\2', $host);
To explain the regex we're using:
^ start-of-subject anchor;
.*? ungreedy optional match for any characters (if a subdomain, or a sub-subdomain, exists);
([^.]+\.) capturing group for non-. characters followed by . (main domain name);
(com|in|co\.in) capturing group for the domain extension (add to the list as necessary);
$ end-of-subject anchor.
Then we replace the hostname with the contents of the two capture groups: the domain name with its trailing dot, and its extension. This returns example.com for www.example.com, foo.bar.example.com, or example.com; and example.co.in for www.example.co.in, foo.bar.example.co.in, or example.co.in. This should help your script work as intended. If there are further problems, please update the OP and we'll see what solutions are available.
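As a rough sketch of how the normalized host could feed the lookup from the question: the helper name base_domain() and the shortened $partner array below are illustrative assumptions, and the extension list (com, in, co.in) is taken from the partner list in the question.

```php
<?php
// Hypothetical helper: reduce a hostname to its base domain.
// Assumes only the extensions com, in and co.in need to be handled.
function base_domain(string $host): string
{
    $host = strtolower($host);
    // Fall back to the unchanged host if preg_replace() reports an error (null).
    return preg_replace('#^.*?([^.]+\.)(com|in|co\.in)$#i', '\1\2', $host) ?? $host;
}

// Shortened stand-in for the $partner array from the question.
$partner = array(
    'amazon.in'     => 'amazon-link',
    'flipkart.com'  => 'flipkart-link',
    'shopclues.com' => 'cuelinks-link',
);
$home = 'default-link'; // stand-in for the default redirect

foreach (array('dl.flipkart.com', 'm.shopclues.com', 'www.amazon.in', 'example.org') as $host) {
    $key   = base_domain($host);
    $store = array_key_exists($key, $partner) ? $partner[$key] : $home;
    echo $host . ' => ' . $store . "\n";
}
```

Hosts whose extension is not in the list are left unchanged, so they simply miss the array and fall through to the default link.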

strpos(): Empty needle WordPress Plugin

I've just finished building my first plugin and have tested it with various plugins on my personal site with no errors. However some users are saying the plugin is causing the following errors for them:
strpos(): Empty needle in
/west/XXXXX/public_html/wp-content/plugins/bot-block/bot-plugin.php on
line 200
On line 200 I have this:
//See if the domain that referred is in the current block url
$pos = strpos( $referrer, $site );
Now I can't see a problem with that line so I'll give you the whole function:
//Check referrer function
function bot_block_parse()
{
//Get the options for the plugin
$options = get_option( 'bot_block' );
//See if the request was from another site
if( isset( $_SERVER['HTTP_REFERER'] ) )
{
//Split the URL into its components
$referrer = parse_url( $_SERVER['HTTP_REFERER'] );
//Trim the components
$referrer = array_map( 'trim', $referrer );
//Get the domain name
$referrer = $referrer['host'];
//Get the block list
$list = $this->create_block_list();
//Loop through all the blocked domains
foreach( $list as $site )
{
//Trim the domain
$site = trim( $site );
//Set the prefix for domains that aren't sub domains
$prefix = 'www';
//Split domain into smaller components
$domainParts = explode( ".", $referrer );
//See if the domain that referred is in the current block url
$pos = strpos( $referrer, $site );
//See if block subdomains is checked
if( isset( $options['subdomains'] ) )
{
//Check to see if the domain was the current blocked site and if the prefix is not www
if( $pos !== false && $domainParts[0] != $prefix )
{
//Log spam
$this->log_spam( $site );
//Call the redirect function to see where to send the user
$this->bot_block_redirect();
exit;
}
}
//See if the domain was the current site blocked and the prefix is www
if( $pos !== false && $domainParts[0] == $prefix )
{
//Log spam
$this->log_spam( $site );
//Call the redirect function to see where to send the user
$this->bot_block_redirect();
exit;
}
}
}
}
If you need to see the full plugin code I have put it on pastebin here: http://pastebin.com/gw7YbPVa
Can anybody help me figure this out please?
The quick fix is to see if your needle ($site) is empty before attempting to call strpos(). If it is empty, certainly it can't be found in the haystack, so we should skip altogether and set $pos to false.
$pos = strpos( $referrer, $site );
Becomes:
if ( $site == '' || !$site ) {
$pos = false;
} else {
$pos = strpos( $referrer, $site );
}
The better solution is to determine why your $site variable is empty in the first place. Does each child element in $list array contain another array, instead of a string as you expect? You can use var_dump( $site ); in your loop to see the contents of that variable.
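One way to apply both suggestions at once is to skip unusable entries before they reach strpos(). This is only a sketch: find_blocked_site() is a hypothetical standalone helper, not part of the plugin, and it assumes the block list is a flat array that may contain blank or non-string entries.

```php
<?php
// Hypothetical helper: return the first blocked entry found in the referrer
// host, or null if none matches. Unusable entries never reach strpos().
function find_blocked_site(string $referrer_host, array $list): ?string
{
    foreach ($list as $site) {
        if (!is_string($site)) {
            continue; // e.g. a nested array instead of a domain string
        }
        $site = trim($site);
        if ($site === '') {
            continue; // an empty needle is what triggers the warning
        }
        if (strpos($referrer_host, $site) !== false) {
            return $site;
        }
    }
    return null;
}

var_dump(find_blocked_site('www.spam-example.com', array('', "  \n", 'spam-example.com')));
var_dump(find_blocked_site('www.good-example.com', array('', 'spam-example.com')));
```

A blank line in the settings field (for instance a trailing newline in a textarea) would produce exactly this kind of empty entry, which may explain why only some users see the warning.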

Change URL Extension from given URL [duplicate]

This question already has answers here:
How to get host name from this kind of URL?
(2 answers)
Closed 8 years ago.
Is there any way to accept a URL and change its domain extension to .com?
For example, if a user were to submit www.example.in, I want to check if the URL is valid and change it to www.example.com. I have built a regex checker that can check if the URL is valid, but I'm not sure how to check whether the given extension is valid and then change it to .com.
EDIT : To be clear, I am not actually going to these URLs. I am getting them submitted as user input in a form, and am simply storing them. These are functions I want to apply to the URL before storing, that is all.
Edit 2 : An example to make this clearer -
$url = 'www.example.co.uk';
$newurl = some_function($url); // the function I am looking for
echo $newurl;
which would yield the output
www.example.com
Are you looking for something like this on the server side to replace a list of selected TLDs to be translated to .coms?
<?php
$url = "www.example.in";
$replacement_tld = "com";
# array of all TLDs you wish to support
$valid_tlds = array("in","co.uk");
# possible TLD source lists
# http://data.iana.org/TLD/tlds-alpha-by-domain.txt
# https://wiki.mozilla.org/TLD_List
# from http://stackoverflow.com/a/10473026/723139
function endsWith($haystack, $needle)
{
$haystack = strtolower($haystack);
$needle = strtolower($needle);
return $needle === "" || substr($haystack, -strlen($needle)) === $needle;
}
foreach($valid_tlds as $tld){
if(endsWith($url, $tld))
{
echo substr($url, 0, -strlen($tld)) . $replacement_tld . "\n";
break;
}
}
?>
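A variation on the same idea, sketched under a few assumptions: swap_tld() is a hypothetical name, the input is a bare hostname as in the question, and the suffix list is whatever you choose to support. It differs from the loop above in two ways: it requires a dot before the suffix, so a host such as www.examplein is not treated as ending in "in", and it tries longer suffixes first, so co.uk is matched before uk.

```php
<?php
// Hypothetical helper: replace a known suffix of a bare hostname with a new TLD.
// Returns null when the host does not end in any of the given suffixes.
function swap_tld(string $host, string $new_tld, array $suffixes): ?string
{
    $host = strtolower($host);
    // Longest suffix first, so "co.uk" is tried before "uk".
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    foreach ($suffixes as $suffix) {
        $dotted = '.' . strtolower($suffix);
        if (strlen($host) > strlen($dotted) && substr($host, -strlen($dotted)) === $dotted) {
            return substr($host, 0, -strlen($dotted)) . '.' . $new_tld;
        }
    }
    return null;
}

var_dump(swap_tld('www.example.co.uk', 'com', array('uk', 'in', 'co.uk')));
var_dump(swap_tld('www.examplein', 'com', array('in')));
```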
Create an empty text file using a text editor such as notepad, and save it as htaccess.txt.
301 (Permanent) Redirect: Point an entire site to a different URL on a permanent basis. This is the most common type of redirect and is useful in most situations. In this example, we are redirecting to the "mt-example.com" domain:
# This allows you to redirect your entire website to any other domain
Redirect 301 / http://mt-example.com/
302 (Temporary) Redirect: Point an entire site to a different temporary URL. This is useful for SEO purposes when you have a temporary landing page and plan to switch back to your main landing page at a later date:
# This allows you to redirect your entire website to any other domain
Redirect 302 / http://mt-example.com/
For more details : http://kb.mediatemple.net/questions/242/How+do+I+redirect+my+site+using+a+.htaccess+file%3F
The question is not entirely clear; I'm assuming you want to do this in PHP.
Here's a useful function to parse such strings:
function parseUrl ( $url )
{
$r = "^(?:(?P<scheme>\w+)://)?";
$r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
$r .= "(?P<host>(?:(?P<subdomain>[\w\.\-]+)\.)?" . "(?P<domain>\w+\.(?P<extension>\w+)))";
$r .= "(?::(?P<port>\d+))?";
$r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
$r .= "(?:\?(?P<arg>[\w=&]+))?";
$r .= "(?:#(?P<anchor>\w+))?";
$r = "!$r!";
preg_match( $r, $url, $out );
return $out;
}
You can parse the URL, validate it, and then rebuild it from the resulting array, replacing anything you want.
If you want to practice regexp and create your own patterns, this site will be the best place to do it.
If your goal is to route users from one URL to another or to change the URI style, then you need to use mod_rewrite.
In that case you will end up configuring your web server, probably a virtual host, because it will only route the listed domains (those parked on the server).
To validate a URL in PHP, you can use filter_var():
filter_var($url, FILTER_VALIDATE_URL)
Then, to get the top-level domain (TLD) and replace it with .com, you can use the following function:
$url="http://www.dslreports.in";
$ext="com";
function change_url($url,$ext)
{
if(filter_var($url, FILTER_VALIDATE_URL)) {
$tld = '';
$url_parts = parse_url( (string) $url );
if( is_array( $url_parts ) && isset( $url_parts[ 'host' ] ) )
{
$host_parts = explode( '.', $url_parts[ 'host' ] );
if( is_array( $host_parts ) && count( $host_parts ) > 0 )
{
$tld = array_pop( $host_parts );
}
}
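// Replace only the host's trailing TLD; str_replace() would also rewrite matches elsewhere in the URL
$new_url = preg_replace( '/\.' . preg_quote( $tld, '/' ) . '(?=[:\/?#]|$)/', '.' . $ext, $url, 1 );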
return $new_url;
}else{
return "Not a valid URL";
}
}
echo change_url($url,$ext);
Hope this helps!

How To Check Whether A URL Is External URL or Internal URL With PHP?

I'm getting all ahrefs of a page with this loop:
foreach($html->find('a[href!="#"]') as $ahref) {
$ahrefs++;
}
I want to do something like this:
foreach($html->find('a[href!="#"]') as $ahref) {
if(isexternal($ahref)) {
$external++;
}
$ahrefs++;
}
Where isexternal is a function
function isexternal($url) {
// FOO...
// Test if link is internal/external
if(/*condition is true*/) {
return true;
}
else {
return false;
}
}
Help!
Use parse_url and compare host to your local host (often but not always it's the same as $_SERVER['HTTP_HOST'])
function isexternal($url) {
$components = parse_url($url);
return !empty($components['host']) && strcasecmp($components['host'], 'example.com'); // empty host will indicate url like '/relative.php'
}
However, this will treat www.example.com and example.com as different hosts. If you want all your subdomains to be treated as local links, then the function will be somewhat larger:
function isexternal($url) {
$components = parse_url($url);
if ( empty($components['host']) ) return false; // we will treat url like '/relative.php' as relative
if ( strcasecmp($components['host'], 'example.com') === 0 ) return false; // url host looks exactly like the local host
return strrpos(strtolower($components['host']), '.example.com') !== strlen($components['host']) - strlen('.example.com'); // check if the url host is a subdomain
}
This is how you can simply detect external URLs:
$url = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';
$internal = (
false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
stripos( $url, '.' . $domain ) || // include subdomains, like "www.my-domain.com". DANGEROUS (see below)!
(
0 !== strpos( $url, '//' ) && // exclude protocol relative URLs, like "//example.com"
0 === strpos( $url, '/' ) // include root-relative URLs, like "/demo"
)
);
The above check will treat www.my-domain.com and my-domain.com as being "internal".
Why this rule is dangerous:
The subdomain logic introduces a weakness that could be exploited: When an external URL contains your domain inside the path, for example, https://external.com/www.my-domain.com is treated as internal!
More secure code:
This problem can be eliminated by removing subdomain support (which I suggest to do):
$url = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';
$internal = (
false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
(
0 !== strpos( $url, '//' ) && // exclude protocol relative URLs, like "//example.com"
0 === strpos( $url, '/' ) // include root-relative URLs, like "/demo"
)
);
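If subdomain support is still wanted, one option is to compare only the parsed host rather than searching the whole URL string, which avoids the path weakness described above. This is a sketch under assumptions: is_internal_url() is a hypothetical name, and URLs without a host (relative paths, but also things like mailto: links) are treated as internal.

```php
<?php
// Hypothetical helper: a URL is internal if its host equals $domain or is a
// subdomain of it. Only the host is inspected, never the path or query.
function is_internal_url(string $url, string $domain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if ($host === false) {
        return false; // seriously malformed URL: do not trust it
    }
    if ($host === null) {
        return true; // no host at all, e.g. "/demo" (assumption: treat as internal)
    }
    $host   = strtolower($host);
    $domain = strtolower($domain);
    if ($host === $domain) {
        return true;
    }
    // Subdomain check: the host must end in ".<domain>".
    $suffix = '.' . $domain;
    return substr($host, -strlen($suffix)) === $suffix;
}

$domain = 'my-domain.com';
foreach (array('https://www.my-domain.com/demo/', 'https://external.com/www.my-domain.com', '/demo', '//example.com') as $url) {
    echo $url . ' => ' . (is_internal_url($url, $domain) ? 'internal' : 'external') . "\n";
}
```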
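function isexternal($url) {
// Test if link is internal/external
// strpos() returns an integer, so compare with 0, not the string '0'
if(strpos($url,'domainname.com') !== false || strpos($url,"/") === 0)
{
return false; // contains our domain or starts with "/": internal
}
else
{
return true; // anything else is external
}
}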
I know this post is old, but here is the function I just wrote. Maybe someone else needs it too.
function IsResourceLocal($url){
if( empty( $url ) ){ return false; }
$urlParsed = parse_url( $url );
$host = isset( $urlParsed['host'] ) ? $urlParsed['host'] : ''; /* relative URLs have no host */
if( empty( $host ) ){
/* maybe we have a relative link like: /wp-content/uploads/image.jpg */
/* add absolute path to begin and check if file exists */
$doc_root = $_SERVER['DOCUMENT_ROOT'];
$maybefile = $doc_root.$url;
/* Check if file exists */
$fileexists = file_exists ( $maybefile );
if( $fileexists ){
/* maybe you want to convert to full url? */
return true;
}
}
/* strip www. if exists */
$host = str_replace('www.','',$host);
$thishost = $_SERVER['HTTP_HOST'];
/* strip www. if exists */
$thishost = str_replace('www.','',$thishost);
if( $host == $thishost ){
return true;
}
return false;
}
You probably want to check if the link is in the same domain. That will only work, though, if all your href attributes are absolute and contain the domain. Relative ones like /test/file.html are tricky because one can have folders that have the same name as domains. So, if you have full URLs in each link:
function isexternal($url) {
// Test if link is internal/external
// Use === 0: with == '0', a false result from strpos() would also match
if(stristr($url, "myDomain.com") || strpos($url,"/") === 0)
return false; // contains our domain or starts with "/": internal
else
return true;
}

check for subdomain in parse_url

I am trying to write a function to get just the user's profile ID or username from Facebook. They enter their URL into a form, then I try to figure out if it's a Facebook profile page or another page. The problem is that if they enter an app page or another page that has a subdomain, I would like to ignore that request.
Right now I have:
$author_url = "http://facebook.com/profile?id=12345";
if(preg_match("/facebook/i",$author_url)){
$parse_author_url = (parse_url($author_url));
$parse_author_url_q = $parse_author_url['query'];
if(preg_match('/id[=]([0-9]*)/', $parse_author_url_q, $match)){
$fb_id = "/".$match[1];}
else{ $fb_id = $parse_author_url['path'];
}
$grav_url= "http://graph.facebook.com".$fb_id."/picture?type=square";
}
echo $grav_url;
This works: if $author_url has "id=", that is used as the profile ID; if not, it must be a user name or page name, so the path is used instead. I need one more check: if the URL contains facebook but is on a subdomain, ignore it. I believe I can do that in the first preg_match, preg_match("/facebook/i",$author_url).
Thanks!
To ignore facebook subdomains you can ensure that
$parse_author_url['host']
is facebook.com.
If it's anything else, like login.facebook.com or apps.facebook.com, you need not proceed.
Alternatively, you can ensure that the URL begins with http://facebook.com:
if(preg_match("#^(?:http://)?facebook\.com#i",$author_url)){
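The host check described above could look roughly like this. It is a sketch, not a drop-in replacement: is_facebook_main_host() is a hypothetical name, and accepting www.facebook.com alongside facebook.com is an added assumption.

```php
<?php
// Hypothetical helper: accept only the main Facebook host, so subdomains
// such as apps.facebook.com or login.facebook.com are rejected.
function is_facebook_main_host(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host (e.g. scheme missing) or malformed URL
    }
    // Assumption: treat the www. variant as the main site too.
    return in_array(strtolower($host), array('facebook.com', 'www.facebook.com'), true);
}

var_dump(is_facebook_main_host('http://facebook.com/profile?id=12345'));
var_dump(is_facebook_main_host('http://apps.facebook.com/someapp'));
```

Note that parse_url() only fills in the host when the URL has a scheme (or starts with //), so user input without http:// would need a scheme prepended before this check.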
This isn't a direct solution to what you were asking, but the parts are here to do what you need.
I found that a subdomain caused an issue with parse_url: it returned an array with only $result['path'] and no 'host' or 'scheme'.
My theory is that if parse_url returns no 'host' or 'scheme' and the string contains a domain suffix (.ext), it is a subdomain.
Here is the code:
($src is a URL; I had to tell relative src values apart from subdomains):
$srcA = parse_url( $src );
//..if no scheme or host test if subdomain.
if( empty( $srcA['scheme'] ) && empty( $srcA['host'] ) ){
//..this string / array is set elsewhere but for this example I will put it here
$tld = "AC,AD,AE,AERO,AF,AG,AI,AL,AM,AN,AO,AQ,AR,ARPA,AS,ASIA,AT,AU,AW,AX,AZ,BA,BB,BD,BE,BF,BG,BH,BI,BIZ,BJ,BM,BN,BO,BR,BS,BT,BV,BW,BY,BZ,CA,CAT,CC,CD,CF,CG,CH,CI,CK,CL,CM,CN,CO,COM,COOP,CR,CU,CV,CW,CX,CY,CZ,DE,DJ,DK,DM,DO,DZ,EC,EDU,EE,EG,ER,ES,ET,EU,FI,FJ,FK,FM,FO,FR,GA,GB,GD,GE,GF,GG,GH,GI,GL,GM,GN,GOV,GP,GQ,GR,GS,GT,GU,GW,GY,HK,HM,HN,HR,HT,HU,ID,IE,IL,IM,IN,INFO,INT,IO,IQ,IR,IS,IT,JE,JM,JO,JOBS,JP,KE,KG,KH,KI,KM,KN,KP,KR,KW,KY,KZ,LA,LB,LC,LI,LK,LR,LS,LT,LU,LV,LY,MA,MC,MD,ME,MG,MH,MIL,MK,ML,MM,MN,MO,MOBI,MP,MQ,MR,MS,MT,MU,MUSEUM,MV,MW,MX,MY,MZ,NA,NAME,NC,NE,NET,NF,NG,NI,NL,NO,NP,NR,NU,NZ,OM,ORG,PA,PE,PF,PG,PH,PK,PL,PM,PN,POST,PR,PRO,PS,PT,PW,PY,QA,RE,RO,RS,RU,RW,SA,SB,SC,SD,SE,SG,SH,SI,SJ,SK,SL,SM,SN,SO,SR,ST,SU,SV,SX,SY,SZ,TC,TD,TEL,TF,TG,TH,TJ,TK,TL,TM,TN,TO,TP,TR,TRAVEL,TT,TV,TW,TZ,UA,UG,UK,US,UY,UZ,VA,VC,VE,VG,VI,VN,VU,WF,WS,XXX,YE,YT,ZA,ZM,ZW";
$tldA = explode( ',' , strtolower( $tld ) );
$isSubdomain = false;
foreach( $tldA as $tld ){
if( strpos( $src , '.'.$tld ) !== false ){
$isSubdomain = true;
break;
}
}
//..prefixing with the $host if it is not a subdomain.
$src = $isSubdomain ? $src : $host . '/' . $srcA['path'];
}
You could add a further confirmation by taking the part of the $isSubdomain == true strings before the first '/' and testing it against a regex.
Hope this helps some people out.
