I am curious to know whether detecting the visitor's browser with a client-side script is more reliable than doing it with a server-side script.
It is easy and common to detect the visitor's browser with either PHP or JavaScript. With PHP, we analyze $_SERVER['HTTP_USER_AGENT'], which comes from the request headers. However, that header is not always reliable. Can JavaScript be more reliable, since it detects the browser on the visitor's own machine?
In other words, is it possible for the User-Agent header to be missing while JavaScript can still identify the browser?
UPDATE: Please do not suggest tools such as jQuery; I am familiar with them. I just want to know whether the header's User-Agent can fail while JavaScript can still detect the browser, essentially a comparison of the client-side and server-side methods.
The User-Agent can be tested server-side or client-side; either way it can be spoofed.
You can fingerprint the browser with JavaScript (seeing what methods and objects the browser provides) and use that to infer the browser, but that is less precise, and JavaScript can be disabled, blocked, or edited by the client.
So neither is entirely reliable.
It is generally a bad idea to do anything based on the identity of the browser, though.
OK. So the User-Agent header is not required by the RFC:
User agents SHOULD include this field with requests.
https://www.rfc-editor.org/rfc/rfc2616#section-14.43
Which means server-side detection is not guaranteed.
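To make that concrete, here is a minimal PHP sketch of the server-side check that guards against the header being absent (the fallback behaviour and the browser tokens are illustrative assumptions, not a complete sniffing routine):

    <?php
    // The User-Agent header is optional, so guard against its absence.
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    if ($userAgent === '') {
        // No header was sent: fall back to a generic experience instead of guessing.
        $browser = 'unknown';
    } elseif (stripos($userAgent, 'Firefox') !== false) {
        $browser = 'Firefox';
    } elseif (stripos($userAgent, 'Chrome') !== false) {
        // Note: real sniffing needs more care, e.g. Edge's UA also contains "Chrome".
        $browser = 'Chrome';
    } else {
        $browser = 'other';
    }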
Similarly, client-side detection typically relies on navigator.userAgent, but that value is also provided by the user agent (browser or otherwise) and likewise cannot be guaranteed.
Thus the answer to your question is 50/50 :)
Now, if you are trying to figure out how to handle different browsers - feature detection is your safest bet here - but that's a different question ;)
I would just use the server side detection.
If a user wants to mask their browser, their browser will likely be masked on both ends.
If you want to find out their browser for HTML compatibility, they should expect mildly broken pages if they've masked their browser (but you should always try your best not to have browser-specific HTML). If it's for JavaScript compatibility, they should also expect some broken JavaScript.
Take a look at $.browser in jQuery (note that it was deprecated and later removed in jQuery 1.9).
A different angle: why do we want to detect the browser?
In the case of analytics, there isn't much you can do really. Anyone that does a little research can send whatever user agent string they like, but who's going to go through all the trouble ;)
If we're talking about features to enable/disable on a website, you should really be going for feature detection. By focusing on what the browser can/can't do, instead of what it calls itself, you can generally expect that browser to perform whatever action reliably if the feature you need is present.
More info: http://jibbering.com/faq/notes/detect-browser/
One big advantage of using client-side JavaScript is that you can get much more information about the browser.
Here is an interesting example: https://panopticlick.eff.org/
Related
I'm writing a mobile game where the user sends his highscore to a PHP server.
I want to verify on the server that the HTTP request comes only from mobile devices, and refuse calls that a malicious user may send via curl or other HTTP clients with a fake score.
What is the standard, usual way of doing this?
I thought that I could encrypt the HTTP message in the mobile client, but then I would need to release the binary with the encryption key, which could be retrieved if the binary were decompiled.
Thank you.
Take a look at this:
https://github.com/serbanghita/Mobile-Detect
It is pretty accurate; however, it won't stop clients who are faking their User-Agent.
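For reference, a minimal usage sketch of that library (the class and method names follow the Mobile-Detect README; verify them against the version you install):

    require_once 'Mobile_Detect.php';

    $detect = new Mobile_Detect;

    if ($detect->isMobile()) {
        // Looks like a phone or tablet (based on the User-Agent, so it can still be faked).
        if ($detect->isTablet()) {
            // Tablet-specific handling.
        }
    } else {
        // Desktop, or a client sending a desktop User-Agent.
    }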
Generally, though, the best approach is not to let the client make any decisions.
Take a game like Eve Online, for example. Every action you make is sent as a user action to the server; the server then validates the action and makes the appropriate decision.
If the server relied on the client to decide how much damage a ship is doing, the game would be subject to no end of trainer hacks.
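Applied to the highscore case, a rough PHP sketch of that idea would have the client submit the raw game events rather than a final score, and the server recompute the score itself (the event format and scoring rule below are made up for illustration):

    <?php
    // Hypothetical endpoint: the client posts the list of game events, not the final score.
    $events = json_decode(file_get_contents('php://input'), true);

    if (!is_array($events)) {
        http_response_code(400);
        exit;
    }

    $score = 0;
    foreach ($events as $event) {
        // Apply the same scoring rules the game uses, and reject impossible values.
        if (!isset($event['type'], $event['points']) || $event['points'] > 100) {
            http_response_code(400);
            exit;
        }
        $score += (int) $event['points'];
    }

    // Store $score for the authenticated player here.

The point is that the only number the server trusts is the one it computed itself.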
You can use JavaScript to fetch information about the user's client, such as Browser CodeName, Browser Name, Browser Version, Platform, User-agent header, User-agent language, and so on.
There are probably viable libraries out there (maybe something like the one Flosculu mentioned) that can help specifically with mobile detection, but you must understand that all of that information can be manipulated anyway. It mainly depends on HOW you transfer the data, so maybe you shouldn't be over-thinking this and should instead focus on safe data transfer methods.
A quick search pointed to this mobile detection script, though.
But again, you can't rely on anything like this. If it's not too time-consuming, then by all means add another validation layer if it makes you feel better :)
I've seen code that detects whether someone is using a mobile browser in Javascript (e.g. a jQuery script) and I've seen some that work in PHP (or other server-side language). But I've never seen a good explanation for whether one is a better choice than the other in all or any situations. Is there a reason why one is a better choice?
The typical answer: it depends on why you are doing the check...
From my standpoint, here is what I usually consider:
If you want to present the user a different experience (mobile, tablet, laptop, etc) based on browser, do it at the server.
If you want to present the same general experience, but need to account for browser compatibility issues, do it at the client.
It is also considered by some in the UX field to be "bad form" to present the user an empty page and fill it in dynamically. Instead, a preliminary page should be populated and content can be dynamically added or altered. If this is a concern for you, a combination of server side and client side may be necessary.
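For the first case, here is a rough PHP sketch of deciding on the server which experience to serve (the regex is deliberately crude and the template paths are placeholders; a maintained detection library is preferable in practice):

    <?php
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // Crude check for illustration only; real detection should use a maintained list.
    $isMobile = (bool) preg_match('/Mobile|Android|iPhone|iPad|IEMobile/i', $userAgent);

    if ($isMobile) {
        include 'templates/mobile.php';   // placeholder template paths
    } else {
        include 'templates/desktop.php';
    }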
I'd say the better way would be server-side, because with JavaScript you have to wait until the page has loaded, while on the server the decision happens before anything is sent.
If you're trying to detect this in order to decide what JavaScript features are available, you'll get greater accuracy, without any major loss of speed, if you do it in JavaScript.
If you're going to completely change what sort of page is rendered, like a full website or a mobile website, you're better off doing this server side.
As Ricebowl stated, never trust the client. However, I feel that it's almost always a problem if you do trust the client. If your application is worth writing, it's worth properly securing. If anyone can break it by writing their own client and passing data you don't expect, that's a bad thing. For that reason, you need to validate on the server.
Is green better than red?
Everything has its benefits and drawbacks. For example, doing it server side is more reliable, doing it client-side means less work for the server.
In fact, the client may have JavaScript disabled (see the NoScript extension for Firefox and ScriptNo for Chrome, which allow a smart user to enable JS only on sites where you actually need it; a nice side effect is that it also eliminates almost all ads these days, as they largely seem to rely on JS from third-party domains now). So just using the User-Agent string is more reliable, but less flexible.
If you work JS-heavy, you might get away with a dumb server, i.e. you do not need slow PHP and can serve all your data with high-performance static serving, through the various CDNs etc. But anything that requires JS will work less well with search spiders, and some users will likely just block it.
As a web developer and UX/UI programmer, I figure that if anyone wants to change their UA, that's fine; they get to deal with the incompatibilities. For mobile vs. desktop I would recommend the light version of browscap.ini: instead of searching for the device, check the ismobiledevice key. The if statement will be true or false, and you can then also check for tablets via the istablet key in the associative array. You can use this to serve phone or tablet CSS.
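A sketch of what that looks like with PHP's get_browser() (this assumes the browscap directive in php.ini points at a browscap.ini that includes the ismobiledevice and istablet properties; verify the key names against the browscap version you use):

    <?php
    // Returns an associative array describing the current request's browser.
    $info = get_browser(null, true);

    if (!empty($info['ismobiledevice'])) {
        $stylesheet = !empty($info['istablet']) ? 'tablet.css' : 'phone.css';
    } else {
        $stylesheet = 'desktop.css';
    }
    // Emit the <link rel="stylesheet"> tag using $stylesheet.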
I want to know whether a user is actually looking at my site (I know the page is just loaded by the browser and displayed; that doesn't guarantee a human is actually looking at it).
I know of two methods that might work.
JavaScript.
If the page is loaded by a browser, it will run the JS code automatically, unless scripts are blocked by the browser. Then use AJAX to call back to the server.
A 1x1 transparent image in the HTML.
Use the img request to call back to the server.
Does anyone know the pitfalls of these methods, or a better method?
Also, I don't know how to detect a 0x0 or 1x1 iframe, which could be used to defeat the methods above.
A bot can access a browser, e.g. http://browsershots.org
The bot can request that 1x1 image.
In short, there is no real way to tell. Best you could do is use a CAPTCHA, but then it degrades the experience for humans.
Just use a CAPTCHA where required (user sign up, etc).
The image way seems better, as JavaScript might be turned off by normal users as well. Robots generally don't load images, so this should indeed work. Nonetheless, if you're just looking to filter out a known set of robots (say Google and Yahoo), you can simply check the User-Agent header, as those robots will actually identify themselves as robots.
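For completeness, a minimal PHP sketch of the 1x1 image approach: a beacon script that logs the hit and returns a transparent GIF (the file names and log format are illustrative assumptions):

    <?php
    // pixel.php - referenced from the page as <img src="pixel.php" width="1" height="1">
    $line = date('c') . ' ' . $_SERVER['REMOTE_ADDR'] . ' ' .
            (isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '-') . "\n";
    file_put_contents(__DIR__ . '/views.log', $line, FILE_APPEND);

    // 1x1 transparent GIF, base64-encoded.
    header('Content-Type: image/gif');
    echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');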
You can create a Google Webmasters account, and it tells you how to configure your site for bots. It also shows how the robot will read your website.
I agree with others here, this is really tough. Generally, nice crawlers will identify themselves as crawlers, so using the User-Agent is a pretty good way to filter out those guys. A good source for user agent strings can be found at http://www.useragentstring.com. I've used Chris Schuld's PHP script (http://chrisschuld.com/projects/browser-php-detecting-a-users-browser-from-php/) to good effect in the past.
You can also filter these guys at the server level using the Apache config or .htaccess file, but I've found that to be a losing battle keeping up with it.
However, if you watch your server logs you'll see lots of suspect activity with valid (browser) user-agents or funky user-agents so this will only work so far. You can play the blacklist/whitelist IP game, but that will get old fast.
Lots of crawlers do load images (e.g. Google Image Search), so I don't think that will work all the time.
Very few crawlers have JavaScript engines, so that is probably a good way to differentiate them. And let's face it, how many users actually turn off JavaScript these days? I've seen the stats on that, but I think those stats are very skewed by the sheer number of crawlers/bots out there that don't identify themselves. However, a caveat is that I have seen that the Google bot does run JavaScript now.
So, bottom line, it's tough. I'd go with a hybrid strategy for sure: if you filter using user agent, images, IP, and JavaScript, I'm sure you'll catch most bots, but expect some to get through despite that.
Another idea: you could always use a known JavaScript browser quirk to test whether the reported user agent (if it claims to be a browser) really is that browser.
"Nice" robots like those from google or yahoo will usually respect a robots.txt file. Filtering by useragent might also help.
But in the end - if someone wants to gain automated access it will be very hard to prevent that; you should be sure it is worth the effort.
Inspect the User-Agent header of the HTTP request.
Well-behaved crawlers set this to something other than a known browser string.
Here are the Googlebot user agents: http://code.google.com/intl/nl-NL/web/controlcrawlindex/docs/crawlers.html
In PHP you can get the user agent with:
$Uagent = $_SERVER['HTTP_USER_AGENT'];
Then you just compare it with the known crawler strings.
As a tip, preg_match() can be handy to do all this in a few lines of code.
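Putting those pieces together, a small sketch of that comparison with preg_match() (the bot list here is a short, incomplete sample and would need to be extended):

    <?php
    $Uagent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // Short, incomplete sample of crawler tokens; extend as needed.
    if (preg_match('/Googlebot|bingbot|Slurp|DuckDuckBot|Baiduspider/i', $Uagent)) {
        // Treat as a known crawler: skip analytics, serve cacheable content, etc.
    } else {
        // Treat as a (claimed) browser.
    }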
Is there a way to detect in my script whether the request is coming from a normal web browser or from some script executing curl? I can see the headers and can distinguish using the User-Agent and a few other headers, but curl can send fake headers, so I am not able to identify the request reliably.
Please suggest ways of identifying curl or other similar non-browser requests.
The only way to catch most "automated" requests is to code in logic that spots activity that couldn't possibly be human with a browser.
For example: hitting pages too fast, filling out a form too fast, or including an external resource in the HTML (like a fake CSS file served through a PHP script) and checking whether the requesting IP downloaded it during the previous stage of your site (a kind of reverse honeypot). You would need to exclude certain IPs/user agents from being blocked, though, otherwise you'll block Google's web spiders, etc.
This is probably the only way of doing it if curl (or any other automated script) is faking its headers to look like a browser.
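As one concrete example of the "hitting pages too fast" check, here is a session-based PHP sketch that flags implausibly rapid requests (the thresholds are arbitrary assumptions, and a client that refuses cookies will get a fresh session each time, so this is only a heuristic):

    <?php
    session_start();

    $now  = microtime(true);
    $last = isset($_SESSION['last_request']) ? $_SESSION['last_request'] : 0.0;
    $_SESSION['last_request'] = $now;

    // Several requests per second is unlikely to be a human clicking through pages.
    if ($now - $last < 0.5) {
        $_SESSION['fast_hits'] = isset($_SESSION['fast_hits']) ? $_SESSION['fast_hits'] + 1 : 1;
        if ($_SESSION['fast_hits'] > 5) {
            http_response_code(429);
            exit('Too many requests');
        }
    } else {
        $_SESSION['fast_hits'] = 0;
    }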
Strictly speaking, there is no way.
There are indirect techniques, but I would never discuss them in public, especially on a site like Stack Overflow, which encourages screen scraping, content swiping, autoposting, and all this dirty robot stuff.
In some cases you can use CAPTCHA test to tell a human from a bot.
As far as I know, you can't tell the difference between a "real" call from your browser and one from curl.
You can compare the User-Agent header, but that's all I know.
I'm building a web bot to log in to some of my accounts on websites, but one of the URLs sets a cookie from JavaScript, and curl is unable to store it. Any suggestions?
You could parse the JavaScript file using whatever language you're using and look for the document.cookie statement. You could then use this data to set the cookie manually in curl (CURLOPT_COOKIE).
It wouldn't exactly be the best idea if you're hoping for this to work with a number of sites, but since you state that you know the site you'll need to load, it's a possibility, as you'll have an idea of what the JavaScript will look like.
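A short sketch of that in PHP, once you've worked out the cookie name and value from the site's JavaScript (the cookie name, value, and URL below are placeholders):

    <?php
    // Placeholder values: derive the real name/value from the site's JavaScript.
    $cookie = 'jsCookieName=valueComputedFromTheJavascript';

    $ch = curl_init('https://example.com/login'); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIE, $cookie);               // send the hand-set cookie
    curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt'); // keep cookies the server sets
    curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt');

    $response = curl_exec($ch);
    curl_close($ch);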
If you have months of free time on your hands, you could compile WebKit and its JavaScript engine and modify the cookie-setting functionality so that it exports the cookies to stdout (and then grab them with PHP's exec). Good luck with that, though. Considering you're asking this question in relation to cURL, I don't think this is quite up your alley...
I'd sort of go with Kewley's answer if you're desperate, though. You should be able to reverse-engineer the JavaScript and see the logic behind how the web application sets its cookies. If it authenticates and returns the login result with XHR, watch what's sent and received by the browser (with Firebug). Add breakpoints on the document.cookie lines and observe which cookies are being set (and what they're being set to). Once you know the precise logic behind authentication, perform the necessary behind-the-scenes requests to grab a session on the site with cURL.
cURL doesn't parse JavaScript, therefore the cookies won't ever be set.