I have a secure link direction service I'm running (expiringlinks.co). If I change the headers in php to redirect my visitors, then facebook is able to show a preview of the website I'm redirecting to when users send links to one another via facebook. I wish to avoid this. Right now, I'm using an AJAX call to get the URL and javascript to redirect, but it's causing problems for users who don't use javascript.
Here are a number of ways I'd like to block facebook, but I can't seem to get working:
I've tried blocking the facebook bot (facebookexternalhit/1.0 and facebookexternalhit/1.1) but it's not working, I don't think they're using them for this functionality.
I'm thinking of blocking the facebook IP addresses, but I can't find all of them, and I don't think it'll work unless I get all of them.
I've thought of using a CAPTCHA or even a button, but I can't bring myself to do that to my visitors. Not to mention I don't think anyone would use the site.
I've searched the facebook docs for meta tags that would "opt-me out", but haven't found one, and doubt that I would trust it if I had.
Any creative ideas or any idea how to implement the ones above? Thank you so much in advance!
Try this - it works for me ...
<?php
$ua = $_SERVER['HTTP_USER_AGENT'];
if (preg_match('/facebookexternalhit/si',$ua)) {
header('Location: no_fb_page.php');
die() ;
}
?>
You could try to get the logfile of your Webserver, and search there for unusal useragents. (maybe containing facebook)
Or, otherwise get the Logs and delete every containing internet explorer/firefox/opera...
Then you should have only bots useragents in the end.
Then you could search for the facebook one.
All you need to do is appropriately set up robots.txt.
http://www.robotstxt.org/robotstxt.html
You could try using a meta refresh instead of a javascript redirect. They work for all browsers and because the page still returns a 200 response any crawler should stop resolving there.
Related
I want to create a private url as
http://domain.com/content.php?secret_token=XXXXX
Then, only visitors who have the exact URL (e.g. received by email) can see the page. We check the $_GET['secret_token'] before displaying the content.
My problem is that if by any chance search bots find the URL, they will simply index it and the URL will be public. Is there a practical method to avoid bot visits and subsequent index?
Possible But Unfavorable Methods:
Login system (e.g. by php session): But I do not want to offer user login.
Password-protected folder: The problem is as above.
Using Robots.txt: Many search engine bots do not respect it.
What you are talking about is security through obscurity. Its never a good idea. If you must, I would offer these thoughts:
Make the link expire
Lock the link to the C or D class of IPs that it was accessed from the first time
Have the page challenge the user with something like a logic question before forwarding to the real page with a time sensitive token (2 step process), and if the challenge fails send a 404 back so the crawler stops.
Try generating a 5-6 alphanumeric password and attach along with the email, so eventhough robots spider it , they need password to access the page. (Just an extra added safety measure)
If there is no link to it (including that the folder has no index
view), the robot won't find it
You could return a 404, if the token is wrong: This way, a robot (and who else doesn't have the token) will think, there is no such page
As long as you don't link to it, no spider will pick it up. And, since you don't want any password protection, the link is going to work for everyone. Consider disabling the secret key after it is used.
you only need to tell the search engines not to index /content.php, and search engines that honor robots.txt wont index any pages that start with /content.php.
Leaving the link unpublished will be ok in most circumstances...
...However, I will warn you that the prevalence of browser toolbars (Google and Yahoo come to mind) change the game. One company I worked for had pages from their intranet indexed in Google. You could search for the page, and a few results came up, but you couldn't access them unless you were inside our firewall or VPN'd in.
We figured the only way those links got propagated to Google had to be through the toolbar. (If anyone else has a better explanation, I'd love to hear it...) I've been out of that company a while now, so I don't know if they ever figured out definitively what happened there.
I know, strange but true...
I have a simple signup form that needs to track number of hits from one specific external referer. This is a simple task with PHP's:
$_SERVER['HTTP_REFERER']
however, it is blank. After doing some research i tried to use some javascript:
document.referrer
Still blank. :(
I really dont need anything elaborate, but am trying to NOT use awstats.
Is there any other way to get the referer (hacks accepted)?? Or am I stuck with the stats???
-thanks
In short: If the user don't want it, you will never know, where he comes from. However, a more "reliable" solution may be to add the referrer to the link from the origin site to yours. Something like
Visit example.com
This requires, that external sites cannot just link to your site, but always needs to add their personal id. If this is not possible there is not much you can do.
At all its possible, that someone may change this id too.
The referer is possibly sent in the HTTP request's header.
It is possible that the browser will not even send it, or some kind of proxy, firewall or security suite strips it out or even changes it. You cannot rely on it.
There is only one thing you can do: if it is empty, consider that you don't know the referer.
I've been working on this problem all day long, so I really need your help.
I'm trying to create a multi-site login system with Facebook Connect and unfortunatly I can't retrieve Cookies.
Here's a little more details:
I'm having a website (www.first.com) which has an iFrame to www.second.com, which display the Facebook Connect button. I have to use this method because a Facebook App is only valid for 1 website, and I will need to use it on multiple.
When the user clicks on the button and log into Facebook, he is redirect to www.second.com, which saves values in a database, which is later retrieved on www.first.com
Everything is working fine in Firefox, IE 8/7 works fine too since I've added the P3P header.
The problem is that I can't make it work on Safari, which requires some kind of interaction from this user to the iframe.
I found a code ( http://anantgarg.com/2010/02/18/cross-domain-cookies-in-safari/ ) but I'm not sure how to use it, I've tried every possible way (I think), and nothing. I guess it doesn't work because I would need to use this on Facebook's server (which i can obviously ;) )
Does anyone have an idea?
Sorry for the huge block of text ;) let me know if you need more information.
Cross Domain Cookies sounds like a security-bug.
Are you sure you don't offend against the same origin policy?
In fact afaik, you can't access a cookies for first.com from second.com
I want to track the site URL from where user reached my site.
From where he came i.el, Google, GMail, Facebook, etc.
I tried $_SERVER['HTTP_REFERER'] but it does not contain anything when user click on my site link from any external site but resides the value when I visit among my site pages and this is also not trusted.
So, What I can do from here?
Is there any other way to track the external URL through PHP?
Any idea?
EDIT: Now HTTP_REFERER is able to get the url from most of sites but not able to get the url if user came through Gmail and AOL. What could be the causes?
HTTP_REFERER is the only way to get any information about previous site.
And that is also up to the broser if it supplies that information, most do as default.
Its a header that is set by the browser in the request to your server, if it is not present, then you will never know where the user came from.
If the browser is sending and you still to not get anything on the server check if you have any code that interferes with the $_SERVER variable.
Try this URL, its a google search result that goes to a page that just dumps the HTTP_REFERER.
As the pages indicates, if the box lists (none), then your browser is not sending HTTP_REFERER but if you get a result then the problem is in sour server.
http://www.google.com/url?sa=t&source=web&cd=1&sqi=2&ved=0CBIQFjAA&url=http%3A%2F%2Fkarmak.org%2F2004%2Freftest%2Ftest&rct=j&q=http_referer%20test&ei=cNQ2TdGYGsmUOp_ExPoD&usg=AFQjCNFVSmYmQBUcL2l3_ZpmZzVWZztjWg&cad=rja
You can compare it to when you load the page withour google to redirect you:
http://karmak.org/2004/reftest/test
Here is their own start page with link:
http://karmak.org/2004/reftest/
Have you tried it in a variety of browsers? It's down to the browser (As far as I'm aware) to set HTTP_REFERER and sometimes privacy settings can prevent this.
Visitors coming from google can be tracked using google analytics, it gives you the search query terms used before.
This solution also track a lot of other things from your visitors. I undertand it's not PHP based, but it's the only other kind of solution I know if HTTP_REFERRER is not enough to you, and as you quoted google...
My adsense ad have a dedicated land page.
I want to show the content only to those who came through that ad.
The page is coded with PHP so I'm using $_SERVER['HTTP_REFERER'].
Two questions here:
Is there a better alternative to $_SERVER['HTTP_REFERER'] ?
To what strings/domains should I compare the referrer's domain (I'll handle extracting it)? I mean, I'm guessing that google has more than one domain they're using for the ads, or not? There's doubleclick.com.... any other domain? How can I check it, besides try/fail?
$_SERVER['HTTP_REFERER'] is the canonical way to determine where a click came from generally. There are more reliable (and complicated) methods for clicks within a site you fully control, but that's not much help for clicks from Google. Yes, it can be spoofed, and yes, it can be null, but as long as you're not targeting nuclear weapons based on that data, and you can handle null values gracefully, it should be good enough.
As for domains, you have to consider the international google domains, as well as all the google*.com domains.
I suggest adding a parameter on the link you give to Google. i.e. instead of yoursite.com/landing, do yoursite.com/landing?campaign=12.
If you are concerned that curious users will play with this parameter, the fix is simple-- redirect via a server 301 redirect when they hit that URL.
That is, if I request yoursite.com/landing?campaign=12, your server--before serving a page-- should log my visit to campaign 12 and redirect me to the plain url yoursite.com/landing. This has the added advantage that reloads won't increment your campaign hit count.
Yes, users could still mess with the original link if they are clever or curious enough to look at it before they click on it, but I think this is going to be far more effective than sniffing the referer.
Rather than trying to work out on your own how to measure your page views, you can consider using an existing system for that, like Google Analytics