I'm using a single sign-on solution from jahrain. Basically, I want to detect users coming from the identity-provider domains (Facebook, Yahoo, Google, MySpace, Live/Hotmail, OpenID); then, if they are not logged in, redirect them to a web page intended for those visitors. I'm using PHP.
While this is not foolproof, a common way to do this is to examine $_SERVER['HTTP_REFERER'], which PHP populates from the Referer header the browser generally sends.
That said, note the things from this thread: Determining Referer in PHP
Look at $_SERVER['HTTP_REFERER'].
This is an optional HTTP header the client may or may not send, so it's not guaranteed to be correct, trustworthy, or present at all, but it's your only choice.
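As a minimal sketch of that approach (the landing-page URL, the session flag and the provider list are illustrative assumptions, and remember the header can be missing or forged):

<?php
// Minimal sketch: send visitors referred by known identity-provider domains
// to a dedicated landing page if they are not logged in. The provider list,
// $_SESSION['logged_in'] and /welcome-sso.php are assumptions for this example.
session_start();

$providers = array('facebook.com', 'yahoo.com', 'google.com',
                   'myspace.com', 'live.com', 'hotmail.com', 'openid.net');

$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$host    = strtolower((string) parse_url($referer, PHP_URL_HOST));

$fromProvider = false;
foreach ($providers as $provider) {
    // Match "provider.com" itself or any subdomain such as "www.provider.com".
    if ($host === $provider || substr($host, -strlen('.' . $provider)) === '.' . $provider) {
        $fromProvider = true;
        break;
    }
}

if ($fromProvider && empty($_SESSION['logged_in'])) {
    header('Location: /welcome-sso.php');
    exit;
}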
I want to know whether a user is actually looking at my site (i.e. the page is not just being loaded by a browser, but is actually being viewed by a human).
I know of two methods that should work:
JavaScript.
If the page is loaded by a browser, it will run the JS code automatically, unless the browser forbids it. Then use AJAX to call back to the server.
A 1×1 transparent image in the HTML.
Use an img tag to call back to the server.
Does anyone know the pitfalls of these methods, or a better method?
Also, I don't know how to detect a 0×0 or 1×1 iframe, which would defeat the methods above.
A bot can drive a real browser, e.g. http://browsershots.org
The bot can request that 1x1 image.
In short, there is no real way to tell. Best you could do is use a CAPTCHA, but then it degrades the experience for humans.
Just use a CAPTCHA where required (user sign up, etc).
The image way seems better, as JavaScript might be turned off by normal users as well. Robots generally don't load images, so this should indeed work. Nonetheless, if you're just looking to filter a known set of robots (say Google and Yahoo), you can simply check the HTTP User-Agent header, as those robots will actually identify themselves as robots.
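For what it's worth, the server side of that 1×1 image could be as simple as the sketch below; the file names and the session flag are made up for illustration, and the GIF payload is just the commonly used transparent 1×1 image.

<?php
// pixel.php - embedded in pages as <img src="/pixel.php">.
// Records that the client actually fetched images, then returns a 1x1 GIF.
session_start();
$_SESSION['image_loaded'] = true; // images were loaded, so probably a real browser

$line = date('c') . "\t"
      . $_SERVER['REMOTE_ADDR'] . "\t"
      . (isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '')
      . "\n";
file_put_contents(__DIR__ . '/pixel.log', $line, FILE_APPEND);

// A commonly used base64-encoded 1x1 transparent GIF.
header('Content-Type: image/gif');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');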
You can create a Google Webmasters account; it tells you how to configure your site for bots and also shows you how the robot will read your website.
I agree with others here, this is really tough - generally nice crawlers will identify themselves as crawlers, so using the User-Agent is a pretty good way to filter out those guys. A good source for user agent strings can be found at http://www.useragentstring.com. I've used Chris Schuld's PHP script (http://chrisschuld.com/projects/browser-php-detecting-a-users-browser-from-php/) to good effect in the past.
You can also filter these guys at the server level using the Apache config or .htaccess file, but I've found that to be a losing battle keeping up with it.
However, if you watch your server logs you'll see lots of suspect activity with valid (browser) user-agents or funky user-agents so this will only work so far. You can play the blacklist/whitelist IP game, but that will get old fast.
Lots of crawlers do load images (e.g. Google image search), so I don't think that will work all the time.
Very few crawlers have JavaScript engines, so that is probably a good way to differentiate them. And let's face it, how many users actually turn off JavaScript these days? I've seen the stats on that, but I think those stats are very skewed by the sheer number of crawlers/bots out there that don't identify themselves. However, a caveat is that I have seen that the Google bot does run JavaScript now.
So, bottom line, it's tough. I'd go with a hybrid strategy for sure - if you filter using user agent, images, IP and JavaScript I'm sure you'll get most bots, but expect some to get through despite that.
Another idea: you could always use a known JavaScript browser quirk to test whether the reported user agent (if it claims to be a browser) really is that browser.
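A hedged sketch of that hybrid idea in PHP, assuming the image beacon and a JS callback each set a session flag (the flag names and the bot pattern are invented here):

<?php
// Combine several weak signals: a user-agent pattern plus the flags set by
// the 1x1 image script and by an AJAX "ping" endpoint (not shown).
session_start();

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

$claimsToBeBot  = (bool) preg_match('/bot|crawl|slurp|spider/i', $ua);
$loadedImage    = !empty($_SESSION['image_loaded']); // set by the pixel script
$answeredJsPing = !empty($_SESSION['js_ping']);      // set by an AJAX callback

// Treat the visitor as human only if nothing identifies itself as a bot and
// at least one browser-like signal came back. Expect some false results either way.
$probablyHuman = !$claimsToBeBot && ($loadedImage || $answeredJsPing);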
"Nice" robots like those from google or yahoo will usually respect a robots.txt file. Filtering by useragent might also help.
But in the end - if someone wants to gain automated access it will be very hard to prevent that; you should be sure it is worth the effort.
Inspect the User-Agent header of the http request.
Crawlers should set this to anything but a known browser.
Here is Google's documentation of its crawlers' user agents: http://code.google.com/intl/nl-NL/web/controlcrawlindex/docs/crawlers.html
In PHP you can get the user agent with:
$Uagent=$_SERVER['HTTP_USER_AGENT'];
Then you just compare it with the known headers.
As a tip, preg_match() could be handy to do this all in a few lines of code.
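For example, something along these lines (the pattern is only a starting point, not an exhaustive list):

<?php
$Uagent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Case-insensitive match against a few well-known crawler signatures.
$isCrawler = (bool) preg_match(
    '/googlebot|bingbot|slurp|duckduckbot|baiduspider|yandex/i',
    $Uagent
);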
I am currently researching the best way to share the same session across two domains (for a shared shopping cart / shared account feature). I have narrowed it down to the following three approaches:
Every 15 minutes, send a one-time-only token (made from a secret plus the user's IP/user agent; see the sketch after these options) to "sync the sessions" using:
an img src tag:
<img src="http://domain-two.com/sessionSync.png?token=urlsafebase64_hash">
This displays an empty 1x1 pixel image and starts a remote session with the same session ID on the remote server. The .png is actually a PHP script, with some mod_rewrite action.
Drawbacks: what if images are disabled?
a succession of 302 redirect headers (almost the same as above, just sending the token using 302s instead):
redirect to domain-2.com/sessionSync.php?token=urlsafebase64_hash,
then from domain-2.com/sessionSync.php, set (or refresh) the session and redirect back to domain-1.com to continue the original request.
Question: What does Google think about this in terms of SEO/PageRank? Will their bots have issues crawling my site properly? Will they think I am trying to trick the user?
Drawbacks: 3 requests before a user gets a page load, which is slower than the IMG technique.
Advantages: Almost always works?
use JSONP to do the same as above.
Drawbacks: won't work if JavaScript is disabled. I am avoiding this option particularly because of this.
Advantages: callback function on success may be useful (but not really in this situation)
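For reference, the kind of token described in the options above could be built roughly like the sketch below; the secret, the 15-minute window and the function name are assumptions, not a recommendation of a specific scheme.

<?php
// Sketch: an expiring, URL-safe token derived from a shared secret plus the
// client's IP and user agent, carrying the session ID to the other domain.
define('SYNC_SECRET', 'change-me'); // must be identical on both domains

function makeSyncToken($sessionId) {
    $window  = floor(time() / 900); // changes every 15 minutes
    $ua      = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $payload = $sessionId . '|' . $_SERVER['REMOTE_ADDR'] . '|' . $ua . '|' . $window;
    $mac     = hash_hmac('sha256', $payload, SYNC_SECRET);

    // URL-safe base64, as in the question.
    return rtrim(strtr(base64_encode($sessionId . '|' . $mac), '+/', '-_'), '=');
}

// Usage: <img src="http://domain-two.com/sessionSync.png?token=TOKEN">
// where TOKEN is makeSyncToken(session_id()); domain-two recomputes the HMAC
// from the same inputs to verify the token before trusting the session ID.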
My questions are:
What will google think of using 302's as stated in example 2 above? Will they punish me?
What do you think the best way is?
Are there any security considerations introduced by any of these methods?
Am I not realizing something else that might cause problems?
Thanks for all the help in advance!
Just some ideas:
You could use the JSONP approach and use the <noscript> tag to fall back to the 302-chain mode.
You won't find a lot of js disabled clients in the human part of your web clients.
But the web crawlers will mostly fall into the 302-chain mode, and if you care about them you could implement some user-agent checks in sessionSync to give them specific instructions, for example a 301 permanent redirect. Your session-synchronisation needs probably do not apply to web crawlers, so maybe you can redirect them permanently (that is, only the first time) without doing any session synchronisation for them. It depends on your implementation of the 302 chain, but you could also set something in a crawler's session to let it crawl domain-1 without any check against domain-2; since the redirect depends on the URLs you generate on the page, you could keep something in the session that suppresses the domain-2 redirect when generating those URLs.
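A rough sketch of how sessionSync on domain-2 could treat crawlers differently, along the lines suggested above (the crawler pattern and the two helper functions are hypothetical):

<?php
// sessionSync.php on domain-2: crawlers get a permanent redirect and no
// session work; normal clients with a valid token get their session synced.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (preg_match('/googlebot|bingbot|slurp|spider|crawl/i', $ua)) {
    header('Location: http://domain-1.com/', true, 301); // permanent, only hit once
    exit;
}

$token = isset($_GET['token']) ? $_GET['token'] : '';
if ($token !== '' && verifySyncToken($token)) {   // hypothetical verifier
    session_id(sessionIdFromToken($token));       // hypothetical extractor
    session_start();
    $_SESSION['synced_at'] = time();
}

// Continue the 302 chain back to domain-1 for the originally requested page.
header('Location: http://domain-1.com/', true, 302);
exit;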
I have several domains that point to the same site, some of them ending in ".br" (the domain for Brazil, thus for Portuguese speakers).
I want to detect which domain the person came through (.br or not) and load the correct language...
I can use PHP, JavaScript or standard HTML/CSS etc... How do I do it? (and with what?)
On the server side, use the HTTP_HOST variable, which is basically the Host header and a reliable way of checking which host the request was sent to.
$_SERVER['HTTP_HOST']
See this question for a nice comparison between SERVER_NAME and the HTTP_HOST variables.
On the client side, use document.domain. For this page - https://developer.mozilla.org/en/document.domain, the value of document.domain is
"developer.mozilla.org"
$_SERVER['HTTP_REFERER'] should get that information, but it is not a sure-fire way: some people have the referrer turned off or spoofed in their browsers, etc. This is the only way I know of, unless you can append GET data to the URLs on the domain to set the language; then you just check for that GET data.
If you are on PHP5.3+ you can use
Locale::acceptFromHttp — Tries to find out best available locale based on HTTP "Accept-Language" header
If not, you can still determine it from the Accept-Language header yourself. Using the Accept-Language header should be somewhat more reliable than using the TLD, especially if you also need to use any of the other intl extension features.
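A small sketch of that, assuming the intl extension is available (the default locale and the supported list are assumptions):

<?php
$header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';

// Best locale according to the Accept-Language header, or a default.
$locale = $header !== '' ? Locale::acceptFromHttp($header) : null;
if ($locale === null || $locale === false) {
    $locale = 'en_US';
}

// Optionally narrow it down to the locales the site actually supports.
$lang = Locale::lookup(array('pt_BR', 'en_US'), $locale, true, 'en_US');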
Is it possible to find out where users come from? For example, I give a client a banner and a link. The client may put the banner/link on any website, let's say a site called www.domain.com.
When a user clicks the banner, is it possible to know where he is coming from (www.domain.com)?
Have a look at the HTTP_REFERER variable. It will tell you what site the user was on before he came to your site.
Yes. You give the client a unique URL, like www.yourdomain.com/in/e10c89ee4fec1a0983179c8231e30a45. Then, track these urls and accesses in a database.
The real problem is tracking unique visitors.
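A sketch of that unique-URL idea: it assumes a rewrite rule maps /in/CODE to the script below as ?code=CODE, and the table and column names are invented for illustration.

<?php
// in.php - look up the inbound code, log the click, forward the visitor.
$code = isset($_GET['code']) ? $_GET['code'] : '';

$pdo  = new PDO('mysql:host=localhost;dbname=tracking', 'user', 'pass');
$stmt = $pdo->prepare('SELECT target_url FROM inbound_links WHERE code = ?');
$stmt->execute(array($code));
$target = $stmt->fetchColumn();

if ($target !== false) {
    $log = $pdo->prepare(
        'INSERT INTO clicks (code, ip, user_agent, clicked_at) VALUES (?, ?, ?, NOW())'
    );
    $log->execute(array(
        $code,
        $_SERVER['REMOTE_ADDR'],
        isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '',
    ));
    header('Location: ' . $target);
} else {
    header('Location: /'); // unknown code: fall back to the homepage
}
exit;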
See
$_SERVER["HTTP_REFERER"]
Although that can't always be trusted as it's set by the client but you may not care in your case.
Note that $_SERVER['HTTP_REFERER'] itself does not depend on register_globals; it is the old bare $HTTP_REFERER global that only exists when register_globals is set to on in php.ini.
Register globals can allow exploitation in loosely coded PHP applications, commonly in apps that allow users to post data, so it is best left off.
I have used the following method in the past to check referrers in applications where I control the operator input.
session_start();

// Remember the first referrer seen in this session; guard against the header being absent.
if (!isset($_SESSION['url_referer']))
{
    $_SESSION['url_referer'] = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
}
Short of hashing strings into session variables, I do not know of a more efficient practice. Does anyone know the best practice?
Finest Regards,
Brad
The only chance is to use a unique ID (as pointed out by gnud). This way you can track the incoming links. The referrer may be altered or removed by browsers or proxies (many companies do that).
Using the IP to track unique visitors is a bad idea. AOL still pools IPs, so a visitor might use a different IP every few minutes, and with proxies your counting will not be very accurate.
I'd say, go with the unique ID.
Is it possible to check who is entering your website in PHP? I have a web application (written in PHP) that should only allow users entering from some particular websites. Is it possible to get the referring websites by examining the $_REQUEST object? If yes, how?
Yes, but keep in mind some proxies and other things strip this information out, and it can be easily forged. So never rely on it. For example, don't think your web app is secure from CSRF because you check the referrer to match your own server.
$referringSite = $_SERVER['HTTP_REFERER']; // is that spelt wrong in PHP ?
If you want to allow requests only from a specific domain you'll need to parse part of the URL to get the top-level domain. As I've since learned, this can be done with PHP's parse_url().
As andyk points out in the comments, you will also have to allow for www.example.com and example.com.
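A minimal sketch of that (the allowed domain is an assumption):

<?php
$allowed = 'example.com';

$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$host    = strtolower((string) parse_url($referer, PHP_URL_HOST));

// Strip a leading "www." so www.example.com and example.com both pass.
if (substr($host, 0, 4) === 'www.') {
    $host = substr($host, 4);
}

if ($host !== $allowed) {
    header('HTTP/1.1 403 Forbidden');
    exit('Access restricted.');
}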
While you can look at $_SERVER['HTTP_REFERER'] to get the referring site, don't bet the farm on it. The browser sets this header and it's easily spoofed.
If it's critical that only people coming from specific referrers view your site, don't use this method. You'll have to find another way, like basic auth, to protect your content. I'm not saying that you shouldn't use this technique, just keep in mind that it's not fool-proof.
BTW, you can also block referrers at the Apache level using mod_rewrite.
You cannot trust the referrer. Despite coming from the $_SERVER array, it is actually a user/browser supplied value and is easily faked, using such things as the Firefox RefControl addon.
You need to examine the $_SERVER array for the 'HTTP_REFERER' key.