PHP creating back-link with $_SERVER['HTTP_REFERER']

Is it safe to create a back link with:
$backLink = htmlentities($_SERVER['HTTP_REFERER']);
or is there a better solution?

An easier way might be to do something like this:
<a href="javascript:history.back()">Go back</a>
That does not rely on the browser populating the Referer header, but instead does exactly the same thing as pressing the browser "Back" button.
This may be considered better since it actually goes back in the browser history, instead of adding the previous page to the browser history in the forward direction. It acts just as you would expect the Back button to act.

It's quite safe, as long as you check for its existence. In some browsers it can be turned off, and I'm not sure it's mandatory for browsers anyway. The bottom line is, you can't count on it existing. (RFC 2616 doesn't say the Referer header must exist.)
If you really need reverse navigation, you could instead use a session variable to save the previously visited page (the current one, really, but only update it after displaying the back link).
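A minimal sketch of that session idea, assuming a hypothetical helper renderBackLink() that returns the back-link HTML for the page stored on the previous request and only then records the current page:

```php
<?php
// Hypothetical helper for the session approach described above.
// $session is the session array; $currentUri is the page being rendered.
function renderBackLink(array &$session, string $currentUri): ?string
{
    $html = null;
    if (isset($session['previous_page'])) {
        // Escape before emitting, just as with the referer approach.
        $safe = htmlspecialchars($session['previous_page'], ENT_QUOTES, 'UTF-8');
        $html = '<a href="' . $safe . '">Back</a>';
    }
    // Update only after building the link, so the link points to the
    // previous page rather than the current one.
    $session['previous_page'] = $currentUri;
    return $html;
}

// Typical use on a page (after session_start()):
// echo renderBackLink($_SESSION, $_SERVER['REQUEST_URI']);
```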

Given that:
The referer header is optional
Some security software will rewrite the referer header (e.g. to XXXX:XXXXXXXX or Advert For Product)
Linking to the referer will, at best, duplicate the built in functionality of the back button
Users will often expect a link marked 'back' to take them to the previous page in a sequence, not to the previous page they were on
No, it isn't safe. The dangers are not great, but the benefits are tiny.

It will work in some cases. However, you should be aware that the HTTP Referer header is not guaranteed. User agents (browsers, search spiders, etc.) cannot be relied on to send anything, correct or otherwise. In addition, if a user browses directly to the page, no Referer header will be present. Some internet security software products even strip out the HTTP referer for "security" reasons.
If you wish to use this solution, be sure to have a fallback in place such as not showing the back link, or linking to a default start page or something (it would depend on the situation this is to be used in).
An alternative solution might be to use JavaScript to call history.back(). This will use the browser's back/history function to return to the previous page the user was on.
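A sketch of the fallback described above: use the Referer header when it exists, and otherwise fall back to a default page (the /index.php default here is an assumption):

```php
<?php
// Returns an escaped back-link target, falling back to a default page
// when the Referer header is absent (the fallback path is a placeholder).
function backLinkTarget(array $server, string $fallback = '/index.php'): string
{
    if (!empty($server['HTTP_REFERER'])) {
        // Escape to prevent the header value from injecting HTML.
        return htmlspecialchars($server['HTTP_REFERER'], ENT_QUOTES, 'UTF-8');
    }
    return $fallback;
}

// echo '<a href="' . backLinkTarget($_SERVER) . '">Back</a>';
```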

I think Facebook uses a similar technique to redirect the user.
They use a GET variable called 'from'.

You must be careful with htmlentities because it corrupts non-ASCII encodings. For example,
echo htmlentities("Привет, друг!"); // contains Russian letters
is displayed as
Ïðèâåò,
äðóã!
Which is of course incorrect.
Every browser sends non-ASCII characters in URLs however it wants: Mozilla in Unicode, IE in the system's current charset (Windows-1251 in Russia).
So it might be useful to replace htmlentities with htmlspecialchars.
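To illustrate the difference: for UTF-8 pages, htmlspecialchars escapes only the HTML-significant characters and leaves Cyrillic (or any other non-ASCII) text intact:

```php
<?php
// htmlspecialchars converts only &, <, >, " and ' — non-ASCII text
// passes through unchanged when the correct encoding is given.
echo htmlspecialchars("Привет, друг!", ENT_QUOTES, 'UTF-8');
// Привет, друг!

echo htmlspecialchars("<b>друг</b>", ENT_QUOTES, 'UTF-8');
// &lt;b&gt;друг&lt;/b&gt;
```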

Related

PHP redirect without # [duplicate]

For some reason, non IE browsers seem to persist a URL hash (if present) when a server-side redirect is sent (using the Location header). Example:
Test.aspx:
// a simple redirect using Response.Redirect("http://www.yahoo.com");
If I visit:
Test.aspx#foo
In Firefox/Chrome, I'm taken to:
http://www.yahoo.com#foo
Can anyone explain why this happens? I've tried this with various server side redirects in different platforms as well (all resulting in the Location header, though) and this always seems to happen. I don't see it anywhere in the HTTP spec, but it really seems to be a problem with the browsers themselves. The URL hash (as expected) is never sent to the server, so the server redirect isn't polluted by it, the browsers are just persisting it for some reason.
Any ideas?
I suggest that this is the correct behaviour. The 302 and 307 status codes indicate that the resource is to be found elsewhere. #bookmark is a location within the resource.
Once the resource (html document) has been located it is for the browser to locate the #bookmark within the document.
The analogy is this: You want to look something up in a book in chapter 57, so you go to the library to get the book. But there is a note on the shelf saying the book has moved, it is now in the other building. So you go to the new location. You still want chapter 57 - it is irrelevant where you got the book.
This is an aspect that was not covered by previous HTTP specifications but has been addressed in the later HTTP development:
If the server returns a response code of 300 ("multiple choice"), 301 ("moved permanently"), 302 ("moved temporarily") or 303 ("see other"), and if the server also returns one or more URIs where the resource can be found, then the client SHOULD treat the new URIs as if the fragment identifier of the original URI was added at the end. The exception is when a returned URI already has a fragment identifier. In that case the original fragment identifier MUST NOT be added to it.
So the fragment of the original URI should also be used for the redirection URI unless it also contains a fragment.
Although this was just a draft that expired in 2000, it seems that the behavior described above is the de facto standard among today's web browsers.
@Julian Reschke or @Mark Nottingham probably know more/better about this.
From what I have found, it doesn't seem clear what the exact behaviour should be. There are plenty of people having problems with this; some of them want to keep the bookmark through the redirect, some of them want to get rid of it.
Different browsers handle this differently, so in practice it's not useful to rely on either behaviour.
It definitely is a browser issue. The browser never sends the bookmark part of the URL to the server, so there is nothing that the server could do to find out if there is a bookmark or not, and nothing that could be done about it reliably.
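One server-side workaround follows from the draft quoted earlier: if the Location URI carries its own fragment, browsers use it instead of the original one. A sketch (the target path and fragment are hypothetical):

```php
<?php
// Builds a Location header value; an explicit fragment on the target
// takes precedence over whatever #fragment the browser would carry over.
function redirectHeader(string $path, ?string $fragment = null): string
{
    return 'Location: ' . $path . ($fragment !== null ? '#' . $fragment : '');
}

// header(redirectHeader('/target.php', 'top')); // lands at #top
// header(redirectHeader('/target.php'));        // browser may keep the old #foo
```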
When I put the full URL in the action attribute of the form, it will keep the hash. But when I just do the query string then it drops the hash. E.g.,
Keeps the hash:
https://example.com/edit#alrighty
<form action="https://example.com/edit?ok=yes">
Drops the hash:
https://example.com/edit
<form action="?ok=yes">

What "version" of a php page would crawlers see?

I am considering building a website using PHP to deliver different HTML depending on browser and version. A question that came to mind was: which version would crawlers see? What would happen if the content was made different for each version? How would this be indexed?
The crawlers see the page you show them.
See this answer for info on how Googlebot identifies itself. Also remember that if you show different content to the bot than what users see, your page might be excluded from Google's search results.
As a sidenote, in most cases it's really not necessary to build separate HTML for different browsers, so it might be best to rethink that strategy altogether which will solve the search engine indexing issue as well.
The crawlers would see the page that you have specified for them to see via your user-agent handling.
Your idea seems to suggest trying to trick the indexer somehow, don't do that.
You'd use the User-Agent HTTP Header, which is often sent by the browsers, to identify the browsers/versions that interest you, and send a content that would be different in some cases.
So, the crawlers would receive the content you'd send for their specific User-Agent string -- or, if you don't code a specific case for those, your default content.
Still, note that Google doesn't really appreciate it if you send it content that is not the same as what real users get (and if someone using a given browser sends a link to a friend who doesn't see the same thing because he's using another browser, it will not feel "right").
Basically: sending content that differs depending on the browser is not really good practice, and should in most (if not all) cases be avoided.
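A rough sketch of the User-Agent inspection described above. Note that the header is client-supplied and easily spoofed; string matching like this is only a first approximation (Google suggests verifying genuine Googlebot traffic by reverse DNS rather than by the UA string alone):

```php
<?php
// Naive User-Agent check — the header is optional and spoofable, so
// treat this as a hint, not an identity guarantee.
function isGooglebot(string $userAgent): bool
{
    return stripos($userAgent, 'Googlebot') !== false;
}

// $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
// if (isGooglebot($ua)) { /* serve the same content as real users */ }
```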
That depends on what content you'll serve to bots. Crawlers usually identify themselves as some bot or other in the user agent header, not as a regular browser. Whatever you serve these clients is what they'll index.
The crawler obviously only sees the version your server hands to it.
If you create a designated version for the search engine, this version would be indexed (and may eventually get you banned from the index).
If you have a version for the default/undetected browser - this one.
If you have no default version - nothing would be indexed.
Sincerely yours, colonel Obvious.
PS. Assuming you are talking about contents, not markup. Search engines do not index markup.

Permanently changing the UserAgent string in Firefox 4

Here is the deal. I have created some HTML/JavaScript dashboards that will be displayed on big-screen displays. The displays are powered by thin clients running WinXP and Firefox 4. There will also be a desktop version as well. I would like to use one URL (dashboard.php) and then redirect to the appropriate page. I need to be able to differentiate between the big-screen displays and someone using Firefox from the desktop. My thought was to permanently change the UserAgent string on the big-screen deployments and use browser sniffing to determine which version to forward the user to. The problem is, it appears that FF4 has removed the ability to change the UA string permanently. Does anyone have an idea of how I could do this, or of how I can otherwise differentiate between big screens and desktop users?
What about using the IP address of the computers displaying on the big screens? Especially if the big displays are on an internal network, assign them a static IP address and use that to identify the computers. Other than that, just pass a GET parameter such as ?view=bigDisplay. You can simply put in your code
$bigDisplay = (isset($_GET['view'])&&$_GET['view']=='bigDisplay');
then you would have a boolean of whether to display the bigDisplay code.
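The static-IP idea can be sketched similarly; the addresses below are placeholders for whatever you assign to the big-screen thin clients:

```php
<?php
// Identify big-screen clients by their (static, internal) IP address.
function isBigDisplay(string $remoteAddr, array $bigScreenIps): bool
{
    return in_array($remoteAddr, $bigScreenIps, true);
}

// $big = isBigDisplay($_SERVER['REMOTE_ADDR'] ?? '',
//                     ['192.168.0.50', '192.168.0.51']); // placeholder IPs
```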
Edit:
also, just googled and found this: http://support.mozilla.com/en-US/questions/806795
Javascript
if ((screen.width >= 1024) && (screen.height >= 768)) {
    window.location = '?big=1';
}
PHP
if (isset($_GET['big']) && $_GET['big'] == 1) {
    setcookie('big', 1, 0); // 0 = expires with the browser session
}
Then just read cookie, and that's it...
If IP address detection is not an option, you could simply set a cookie for the big screen machines.
You can do this by creating a special URL, e.g., /bigscreen which will set the cookie with an expiration date far into the future. Then in your script, simply check for the existence of that cookie.
Using a cookie means that you don't have to worry about continuing to append query strings to subsequent URLs.
Edit: You could even manually place the cookie in Firefox if you wish to avoid visiting a special URL. There are add-ons to facilitate that.
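A sketch of that special-URL approach; the cookie name, lifetime, and the /bigscreen path are all arbitrary choices:

```php
<?php
// Describes the long-lived cookie a hypothetical /bigscreen page would set.
// Returning the parameters (rather than calling setcookie directly) keeps
// the sketch testable.
function bigScreenCookie(int $now): array
{
    return [
        'name'    => 'bigscreen',
        'value'   => '1',
        'expires' => $now + 60 * 60 * 24 * 365 * 5, // roughly five years out
    ];
}

// On /bigscreen:
// $c = bigScreenCookie(time());
// setcookie($c['name'], $c['value'], $c['expires'], '/');
// Everywhere else:
// $isBigScreen = isset($_COOKIE['bigscreen']);
```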
You can set the UA string just fine in Firefox 4. The general.useragent.override preference will let you set it to whatever you want.
What was removed was a way to modify parts of the UA string without overriding the whole thing.

document.referrer - limitations?

I am unable to get a lot of referrer URLs using document.referrer. I'm not sure what is going on. I would appreciate it if anyone had any info on its limitations (like which browser does not support what), etc.
Is there something else I could use (in a different language perhaps) that covers more browsers?
I wouldn't put any faith in document.referrer in your Javascript code. The value is sent in client side request headers (Referer) and as such it can be spoofed and manipulated.
For more info see my answer to this question about the server side HTTP_REFERER server variable:
How reliable is HTTP_REFERER
Which browser are you looking in? If the referring website is sending the traffic via window.open('some link') instead of a regular <a> tag, then IE will not see a referrer. It thinks it's a new request at that point, similar to you simply going to a URL directly (in which case there is no referrer). Firefox and Chrome do not have the same issue.
This is NOT just a JavaScript limitation; HTTP_REFERER will NOT work either in this specific scenario.
Just to make sure you're on the same page, you do know that if someone types a URL directly in their web browser, the document.referrer property is empty, right? That being said, you might be interested in a JavaScript method to get all HTTP headers. If you prefer PHP (since you're using that tag), the standard $_SERVER variable will provide what information is available. Note that the information is only as reliable as the reporting web browser and server, as noted by Kev.
The document.referrer will be an empty string if:
You access the site directly, by entering the URL;
You access the site by clicking on a bookmark;
The source link contains rel="noreferrer";
The source is a local file.
Check out https://developer.mozilla.org/en-US/docs/Web/API/Document/referrer

Efficient Method for Preventing Hotlinking via .htaccess

I need to confirm something before I go accuse someone of ... well I'd rather not say.
The problem:
We allow users to upload images and embed them within text on our site. In the past we allowed users to hotlink to our images as well, but due to server load we unfortunately had to stop this.
Current "solution":
The method the programmer used to solve our "too many connections" issue was to rename the file that receives and processes image requests (image_request.php) to image_request2.php, and replace the contents of the original with
<?php
header("HTTP/1.1 500 Internal Server Error");
?>
Obviously this has caused all images with their src attribute pointing to the original image_request.php to be broken, and is also the wrong code to be sending in this case.
Proposed solution:
I feel a more elegant solution would be:
In .htaccess
If the request is for image_request.php
Check referrer
If referrer is not our site, send the appropriate header
If referrer is our site, proceed to image_request.php and process image request
What I would like to know is:
Compared to simply returning a 500 for each request to image_request.php:
How much more load would be incurred if we were to use my proposed alternative solution outlined above?
Is there a better way to do this?
Our main concern is that the site stays up. I am not willing to agree that breaking all internally linked images is the best / only way to solve this. I refuse to tell our users that because of something WE changed they must now manually change the embed code in all their previously uploaded content.
OK, then you can use the mod_rewrite capability of Apache to prevent hot-linking:
http://www.cyberciti.biz/faq/apache-mod_rewrite-hot-linking-images-leeching-howto/
Using mod_rewrite will probably give you less load than running a PHP script. I think your solution would be lighter.
Make sure that you only block access in step 3 if the Referer header is not empty. Some browsers and firewalls block the Referer header completely, and you wouldn't want to block those requests.
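A minimal mod_rewrite sketch along those lines, allowing requests with an empty referer as advised above (example.com stands in for your own domain):

```apache
RewriteEngine On
# Let requests with an empty Referer through (some browsers/firewalls strip it)
RewriteCond %{HTTP_REFERER} !^$
# ...and block only those whose Referer is some other site
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Refuse image requests from everyone else
RewriteRule ^image_request\.php$ - [F]
```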
I assume you store image paths in the database with image ids, right?
And then you query the database for the image path, giving it an image id.
I suggest you install Memcached on the server and cache user requests. It's easy to do in PHP. After that you will see the server load and can decide whether you should stop this hotlinking thing at all.
Your increased load is equal to that of a string comparison in PHP (zilch).
The obfuscation solution doesn't even solve the problem to begin with, as it doesn't stop future hotlinking from happening. If you do check the referrer header, make absolutely certain that all major mainstream browsers will set the header as you expect. It's an optional header, and the behavior might vary from browser to browser for images embedded in an HTML document.
You likely have sessions enabled for all requests (whether they're authenticated or not) -- as a backup plan, you can also rename your session cookie name to something obscure (edit: obscurity here actually doesn't matter as long as the cookie is set for your host only (and it is)) and check that a cookie by that name is set in image_request.php (no cookie set would indicate that it's a first-request to your site). Only use that as a fallback or redundancy check. It's worse than checking the referrer.
If you were generating the IMG HTML on the fly from markdown or something else, you could use a private key hash strategy with a short-live expire time attached to the query string. Completely air tight, but it seems way over the top for what you're doing.
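A sketch of that private-key hash strategy: the query string carries an expiry time plus an HMAC over the image id and expiry, and the server refuses anything expired or tampered with. The secret and parameter names are placeholders:

```php
<?php
// Placeholder secret — in practice, load from configuration, not source.
const SECRET_KEY = 'change-me';

// Build a short-lived, signed image URL.
function signImageUrl(int $imageId, int $expires): string
{
    $sig = hash_hmac('sha256', $imageId . '|' . $expires, SECRET_KEY);
    return "/image_request.php?id=$imageId&expires=$expires&sig=$sig";
}

// Verify a request in image_request.php: reject expired or forged links.
function verifyImageRequest(int $imageId, int $expires, string $sig, int $now): bool
{
    if ($now > $expires) {
        return false; // link has expired
    }
    $expected = hash_hmac('sha256', $imageId . '|' . $expires, SECRET_KEY);
    return hash_equals($expected, $sig); // constant-time comparison
}
```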
Also, there is no "appropriate header" for lying to a client about the availability of a resource ;) Just send a 404.
