How to force an HTML link to be absolute? - php

On my website, users can put a URL in their profile.
This URL can be http://www.google.com, www.google.com or google.com.
If I just insert $url in my PHP code, the resulting link is not always absolute.
How can I force the <a> tag's link to be absolute?

If you prefix the URL with // it will be treated as an absolute one. For example:
<a href="//google.com">Google</a>
Keep in mind this will use the same protocol the page is being served with (e.g. if your page is served over https://, the resulting URL will be https://google.com).
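In PHP terms, a minimal sketch (assuming $url holds the user-supplied value without a scheme):
// Sketch only: prefixing the stored value with "//" makes the browser resolve it
// as an absolute, protocol-relative URL instead of a path on your own site.
$url = 'www.google.com'; // e.g. the value from the user's profile
echo '<a href="//' . htmlspecialchars($url) . '">' . htmlspecialchars($url) . '</a>';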

Use a protocol, preferably http://
<a href="http://google.com">Google</a>
Ask users to enter the URL in this format, or concatenate http:// if it is not present.
If you prefix the URL with // only, it will use the same protocol the page is being served with:
<a href="//google.com">Google</a>

I recently had to do something similar.
if (strpos($url, 'http') === false) {
    $url = 'http://' . $url;
}
Basically, if the URL doesn't contain 'http', add it to the front of the string as a prefix.
Or we can do the same with a regex:
$http_pattern = "/^http[s]*:\/\/[\w]+/i";
if (!preg_match($http_pattern, $url, $match)) {
    $url = 'http://' . $url;
}
Thanks to @JamesHilton for pointing out a mistake. Thank you!
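For illustration, here is how that check behaves on a few sample inputs (add_scheme() is just a hypothetical name wrapping the snippet above):
function add_scheme($url) {
    $http_pattern = "/^http[s]*:\/\/[\w]+/i";
    if (!preg_match($http_pattern, $url)) {
        $url = 'http://' . $url;
    }
    return $url;
}

echo add_scheme('google.com');          // http://google.com
echo add_scheme('www.google.com');      // http://www.google.com
echo add_scheme('https://google.com');  // https://google.com (unchanged)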

Related

Q: How to check the extension of a website URL?

I would like to know if there is a way to check the extension of a website URL. For example, do something when the website is like http://example.es/ and do something else when the website is like http://example.fr/.
I have seen that there is something like
$actual_link = "http://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
which returns the current URL of the web page.
Thanks for the help.
Use the parse_url() function to get the host part of the URL, then explode it by "." and take the last element of the array.
Example below:
$url = 'http://' . $_SERVER['SERVER_NAME'];
// Assign the exploded array to a variable first, since end() takes its argument by reference.
$parts = explode(".", parse_url($url, PHP_URL_HOST));
echo end($parts);
// echoes "com"
From your example I assume that you are using PHP; you can then use parse_url to get the URL components.
https://www.php.net/parse-url
For example you can get the host (example.fr or example.com), then explode the host string to get the TLD (.fr or .com), which should help you with the further if/else handling.
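A minimal sketch of that approach, assuming the URL to inspect is already in $url (the hostname and branches are only examples):
$url   = 'http://example.fr/some/page';
$host  = parse_url($url, PHP_URL_HOST);   // "example.fr"
$parts = explode('.', $host);
$tld   = end($parts);                     // "fr"

if ($tld === 'es') {
    // do something for .es sites
} elseif ($tld === 'fr') {
    // do something else for .fr sites
}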

file_get_contents() returns all local/short links as 404

I am currently setting up a site, which requires some sort of "proxy" work. Basically through $_GET['url'] I can grab a site's content using file_get_contents($url). However, when links are shown like: <a href="images/image.png".../>, they will link to my site instead of theirs, which makes all images, links, etc. load from my site, which returns a 404 not found error.
I have not been able to find anything about this anywhere. This is how I do the "proxying" in theory, not as a final product:
$url = $_GET['url'];
$content = file_get_contents($url);
echo $content;
What could I possibly do to change this, so that links don't depend on what the browser sees but on where they actually come from (the site in $_GET['url'])? That would basically turn relative links into absolute ones. Thanks!
You would have to know what their site is in order to make a request from it.
To do this, you can parse the URL:
$urlParsed = parse_url($url);
$urlHostOnly = $urlParsed['scheme'] . "://" . $urlParsed['host'] . "/";
Then, the tricky part: you have to prepend the host-only URL to each link.
Most links in HTML live in href and src attributes, so here is a simple replacer to deal with those.
$content = file_get_contents($url);
// The negative lookahead skips links that already start with http:// or https://.
$replaced_content = preg_replace(
    "/(href|src)=\"((?!https?:\/\/)[^\"]*)\"/",
    "$1=\"$urlHostOnly$2\"",
    $content
);
Now that you have the replaced contents, echo them to the client:
echo $replaced_content;
Note: there can be some conflicts with stylesheets and SSL if you do not specify the correct protocol (http/https) when entering the URL.
See: http://i.imgur.com/tz6Hn28.png for an example of this.
Seems like I've solved this, thanks to advice from a friend.
// Grabs the URL of the site I am working with (the $_GET['url'] site, basically).
$fullUrl = basename($url);
// Replaces <head> with <head> followed by a base tag whose href attribute is the website's URL.
// This will make all relative links resolve against that base href.
$content = str_replace("<head>", "<head>\n<base href='http://" . $fullUrl . "' />", $content);
echo $content;
Voilà, this site now functions perfectly.
EDIT: Okay, it did not work perfectly for some reason. If the URL linked to a file like help.asp, basename() would return help.asp. I went with a different route:
function addhttp($url) {
    if (!preg_match("~^(?:f|ht)tps?://~i", $url)) {
        $url = "http://" . $url;
    }
    return $url;
}
$url = addhttp($url);
preg_match('/^(?:https?:\/\/)?(?:[^#\n]+#)?(?:www\.)?([^:\/\n]+)/', $url, $fullUrl);
$fullUrl = $fullUrl[1];
No more wrong URLs being loaded. This all works... for now.

Converting mixed (absolute/relative) links to absolute links

I have this PHP script which works well and converts almost all pages nicely, but on a few pages it is unable to convert relative URLs to absolute URLs. It gives the wrong result for the links below.
$url = 'http://www.lowridermagazine.com/girls/1201_lrms_cat_cuesta_lowrider_girls_model/photo_01.html';
// Example of a relative link of the page above.
$relative = 'photo_01.html';
// Parse the URL the crawler was sent to.
$url = parse_url($url);
if (FALSE === filter_var($relative, FILTER_VALIDATE_URL))
{
    // If the link isn't a valid URL then assume it's relative and
    // construct an absolute URL.
    print $url['scheme'].'://'.$url['host'].'/'.ltrim($relative, '/');
}
else
{
    print $relative;
}
It works nicely for the URL http://www.santabanta.com/photos/shriya/9830084.htm but fails for the URL above.
Any idea where I am making a mistake?
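For reference, a minimal sketch of one likely fix: a document-relative link such as photo_01.html has to be resolved against the directory of the page's path, not against the host root (the resolve_relative() name is only illustrative):
function resolve_relative($pageUrl, $relative) {
    // Already an absolute URL? Return it unchanged.
    if (filter_var($relative, FILTER_VALIDATE_URL) !== false) {
        return $relative;
    }
    $parts = parse_url($pageUrl);
    $base  = $parts['scheme'] . '://' . $parts['host'];
    if ($relative !== '' && $relative[0] === '/') {
        // Root-relative link: append to the host only.
        return $base . $relative;
    }
    // Document-relative link: append to the directory of the page's path.
    $dir = isset($parts['path']) ? dirname($parts['path']) : '';
    return $base . rtrim($dir, '/') . '/' . $relative;
}

$page = 'http://www.lowridermagazine.com/girls/1201_lrms_cat_cuesta_lowrider_girls_model/photo_01.html';
echo resolve_relative($page, 'photo_01.html');
// http://www.lowridermagazine.com/girls/1201_lrms_cat_cuesta_lowrider_girls_model/photo_01.html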

URL output from database

With this code:
if (empty($aItemInfo['url'])) {
    $url = '<p> </p>';
} else {
    $url = ' | <a href="' . $aItemInfo['url'] . '">LINK</a>';
}
I've got this as output:
http://localhost/tester/www.google.com
In the DB there is only www.google.com (and of course it's fictional).
What am I doing wrong?
You need to add http:// to the URL in your PHP code, before using it in the <a> tag.
If all your URLs will be without http:// use this code:
$url = 'http://'.$aItemInfo['url'];
Then use $url
Not too sure what you're trying to link to. If you're linking to an external site you'll need to add http:// in front of the link. If not, the link will be appended to the current domain name, as shown above.
Links can be relative or absolute paths. If you don't include the "http://" part, then the browser assumes it is a relative path. Use href="http://'.$aItemInfo['url'].'" instead.
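A minimal sketch combining those suggestions, assuming $aItemInfo['url'] holds a bare value such as www.google.com:
$url = $aItemInfo['url'];                      // e.g. "www.google.com"
if (!preg_match('~^https?://~i', $url)) {
    $url = 'http://' . $url;                   // only prefix when no scheme is present
}
$url = ' | <a href="' . htmlspecialchars($url) . '">LINK</a>';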

How to detect if a link is from a particular domain

I need to be able to detect with PHP whether a link is from a particular domain. I cannot just check whether the domain string is present in the link, because that can be faked by appending the domain to another URL.
Thanks.
Just use parse_url() as konforce mentioned. For example:
$url = "http://www.google.com/";
$parts = parse_url ($url);
print $parts["host"]; // will print www.google.com
// Or, for PHP 5.1 and above
$host = parse_url ($url, PHP_URL_HOST); // returns www.google.com
Now, the good thing about this is that appending a domain to the end of a URL, like this:
http://www.google.com/?www.foo.com
won't work, as the host element will still say that the link points to www.google.com and not www.foo.com.
Hope this helps.
I believe you'd want to check the referrer, and make sure you check with the double forward slashes, since that's part of the protocol (HTTP/HTTPS) and can't be faked.
Check this link for extra reference: Determining Referer in PHP
I would check against something like...
//www.mydomain.com
//mydomain.com
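A small sketch of that kind of check, comparing the parsed host against your own domain (www.mydomain.com is just a placeholder):
function is_from_domain($url, $allowedHost) {
    $host = parse_url($url, PHP_URL_HOST);
    // Compare the parsed host, so "http://evil.com/?www.mydomain.com" does not pass.
    return is_string($host) && strcasecmp($host, $allowedHost) === 0;
}

var_dump(is_from_domain('http://www.mydomain.com/page', 'www.mydomain.com'));      // bool(true)
var_dump(is_from_domain('http://evil.com/?www.mydomain.com', 'www.mydomain.com')); // bool(false)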
