I would like to check whether a mobile version exists for a specific website. As I understand it, we cannot assume that every website has its mobile version at http://m.example.com/, so I am testing with a cURL request. Here is how I am doing it:
* I send mobile browser headers in the cURL request, which returns the contents of the resulting URL.
* If the site has a mobile version, the request returns the contents of the mobile site.
* I then check whether the content includes the @media keyword. If it does, I assume the site has a mobile version.
The problem is that if the CSS is loaded from external files, I also have to send cURL requests for each stylesheet, which makes the process even slower. Is there a specific solution to my problem, or can I speed this process up a bit?
Any help would be appreciated. Thanks.
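The checks described in the question can at least be bundled into one pass over the HTML, so that external stylesheets are only fetched when nothing conclusive is found inline. The sketch below is in JavaScript (Node) rather than PHP, and the function name `mobileHints` is hypothetical. It is a heuristic under stated assumptions: an inline @media rule or a viewport meta tag (an extra signal not mentioned in the question) counts as a hint, and stylesheet URLs are only collected, not fetched. It cannot see anything that JavaScript adds to the page.

```javascript
// Heuristic only: scans raw HTML for common signs of a mobile-friendly page.
// It cannot see stylesheets injected by JavaScript or server-side UA sniffing.
function mobileHints(html, baseUrl) {
  // Assumption: a viewport meta tag suggests the page targets small screens.
  const hasViewport = /<meta[^>]+name=["']?viewport["']?/i.test(html);
  // @media rules in inline <style> blocks (what the question searches for).
  const hasInlineMediaQuery = /@media\b/i.test(html);
  // External stylesheets that would each need their own request to inspect.
  const stylesheets = [];
  for (const tag of html.match(/<link\b[^>]*>/gi) || []) {
    if (!/rel=["']?stylesheet/i.test(tag)) continue;
    const href = /href=["']?([^"'\s>]+)/i.exec(tag);
    // Resolve relative hrefs against the page URL.
    if (href) stylesheets.push(new URL(href[1], baseUrl).href);
  }
  return { hasViewport, hasInlineMediaQuery, stylesheets };
}
```

A caller would fetch the returned `stylesheets` only when both flags are false, which keeps the common case to a single request.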
The problem with your approach, which smells a bit like an XY Problem, is that it is simply unreliable.
A website has many ways to provide a mobile version, including:
1. Using CSS media queries
The problem with this method is twofold. For starters, you would have to scan every single CSS file and <link> declaration. Secondly, the site can dynamically introduce stylesheets to the page using JavaScript, which you will never see using cURL because it lacks a JavaScript parser.
2. Browser sniffing using (client side) JavaScript, or screen width sniffing using JavaScript
Again, this JavaScript will never get executed, so you will never see that result.
3. Browser sniffing using server side code
Well, I guess you could try to use a mobile user-agent string with your cURL request, and see where that takes you, but all of these methods are hackish and unreliable.
4. The page could be mobile-friendly from the get-go (credit to @Quentin)
As @Quentin mentioned in the comments, the page could be mobile-friendly without any additional checks on the client or server side (for example, a responsive design that simply uses percentage-based values instead of media queries).
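For method 3, "see where that takes you" can be made concrete by requesting the same URL with a desktop and a mobile User-Agent and comparing the answers. This is a sketch in Node (18+, which has a global `fetch`); the User-Agent strings and the function name `compareUserAgents` are hypothetical, and the fetch function is injectable so the logic can be tried without network access. Browsers generally will not let page scripts override User-Agent, so this is meant for server-side use. A "no difference" result proves nothing, for the reasons listed above.

```javascript
// Hypothetical User-Agent strings; any realistic desktop/mobile pair would do.
const DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
const MOBILE_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile";

// Requests the URL once per User-Agent without following redirects and
// reports whether the server treated the two requests differently.
async function compareUserAgents(url, fetchFn = fetch) {
  const get = (ua) =>
    fetchFn(url, { headers: { "User-Agent": ua }, redirect: "manual" });
  const [desktop, mobile] = await Promise.all([get(DESKTOP_UA), get(MOBILE_UA)]);
  const desktopLocation = desktop.headers.get("location");
  const mobileLocation = mobile.headers.get("location");
  return {
    // True when only the mobile request is sent somewhere else (e.g. m.site).
    redirectsMobile: mobileLocation !== null && mobileLocation !== desktopLocation,
    mobileLocation,
    // Different bodies can also mean server-side sniffing without a redirect.
    bodyDiffers: (await desktop.text()) !== (await mobile.text()),
  };
}
```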
Related
I am considering building a website using PHP to deliver different HTML depending on the browser and its version. A question that came to mind: which version would crawlers see? If the content differed between versions, how would it be indexed?
The crawlers see the page you show them.
See this answer for info on how Googlebot identifies itself. Also remember that if you show the bot different content from what users see, your page might be excluded from Google's search results.
As a side note, in most cases it's really not necessary to build separate HTML for different browsers, so it might be best to rethink that strategy altogether, which would also solve the search engine indexing issue.
The crawlers would see the page that you have specified for them to see via your user-agent handling.
Your idea seems to suggest trying to trick the indexer somehow; don't do that.
You'd use the User-Agent HTTP header, which browsers usually send, to identify the browsers/versions that interest you, and send different content in some cases.
So, the crawlers would receive the content you'd send for their specific User-Agent string -- or, if you don't code a specific case for those, your default content.
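The selection logic described here (match known User-Agent patterns, otherwise fall back to the default content) can be sketched as below. It is in JavaScript rather than PHP, and the rule list, the variant names and `chooseVariant` are all hypothetical, chosen only to illustrate the fallback behaviour.

```javascript
// Illustrative only: patterns and variant names are hypothetical.
// Each rule maps a User-Agent pattern to the content variant to serve.
const RULES = [
  { pattern: /MSIE [1-6]\./, variant: "legacy" },
  { pattern: /Mobile|Android|iPhone/i, variant: "mobile" },
];

// Returns the variant for a User-Agent string. Anything that matches no rule,
// including crawlers that were not special-cased, gets the default content.
function chooseVariant(userAgent, defaultVariant = "default") {
  const ua = userAgent || ""; // the header may be missing entirely
  for (const { pattern, variant } of RULES) {
    if (pattern.test(ua)) return variant;
  }
  return defaultVariant;
}
```

Because a crawler that matches no rule falls through to the default branch, it sees exactly what an unrecognised browser sees.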
Still, note that Google doesn't really appreciate being sent content that differs from what real users get (and if someone using a given browser sends a link to a friend who doesn't see the same thing because they're using another browser, it will not feel "right").
Basically, sending content that differs by browser is not really good practice and should be avoided in most, if not all, cases.
That depends on what content you'll serve to bots. Crawlers usually identify themselves as some bot or other in the user agent header, not as a regular browser. Whatever you serve these clients is what they'll index.
The crawler obviously only sees the version your server hands to it.
If you create a designated version for the search engine, that version would be indexed (and could eventually get you banned from the index).
If you have a version for the default/undetected browser, that one would be indexed.
If you have no default version, nothing would be indexed.
Sincerely yours, Colonel Obvious.
P.S. This assumes you are talking about content, not markup. Search engines do not index markup.
I was looking for a way to block old browsers such as IE 6.0 from accessing a page that isn't compatible with them, and to return a message saying that the browser is outdated and must be upgraded to view the page.
I know a bit of PHP, and writing a small script for this isn't hard, but just as I was about to start, a big question came to mind.
If I write a PHP script that blocks browsers based on their name and version, could it also block some search engine spiders?
I was thinking about doing the browser identification with this function: http://php.net/manual/en/function.get-browser.php
A crawler will probably be identified as a crawler, but is it possible that a crawler supplies some kind of browser name and version?
If nobody has tested this or played with these kinds of functions before, I will probably not risk it. Otherwise I could make a test folder inside a website and see whether the pages there get indexed; if they don't, I'll either abandon the idea or modify it so that it works. To save myself the trouble, I figured it would be best to ask, since I couldn't find this information after a lot of searching.
No, it shouldn't affect any of the major crawlers. get_browser() relies on the User-Agent string sent with the request, so it shouldn't be a problem for crawlers, which use their own user-agent strings (e.g. Google's spider identifies itself as "Googlebot").
Now, I personally think it's a bit unfriendly to block a website completely for someone using IE. I'd just put a red banner at the top saying "Site might not function correctly. Please update your browser or get a new one", or something to that effect.
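The banner approach combined with the crawler concern from the question could look roughly like this. It is a JavaScript sketch rather than PHP's get_browser(); the crawler token list is an assumption and is not exhaustive, the IE 6 cut-off comes from the question, and `outdatedBrowserBanner` is a hypothetical name. The page itself is always served; only the banner text depends on the User-Agent.

```javascript
// Assumed, non-exhaustive list of tokens used by major search engine crawlers.
const CRAWLER_TOKENS = /Googlebot|bingbot|Slurp|DuckDuckBot|Baiduspider|YandexBot/i;

// IE 6-10 report "MSIE <major>.<minor>"; IE 11 dropped the MSIE token.
function isOldInternetExplorer(ua) {
  const m = /MSIE (\d+)\./.exec(ua);
  return m !== null && Number(m[1]) <= 6;
}

// Returns the banner text to show, or null when no banner is needed.
function outdatedBrowserBanner(userAgent) {
  const ua = userAgent || "";
  if (CRAWLER_TOKENS.test(ua)) return null; // never nag search engine crawlers
  if (!isOldInternetExplorer(ua)) return null;
  return "Site might not function correctly. Please update your browser.";
}
```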
Specifically, I am making an Ajax app and trying to preserve the back button. My JavaScript is working properly and registers a new URL in the address bar with an anchor-like hash:
http://t2b.localhost/#/clients/
I can catch the URL with JavaScript when the page loads and load the "clients" page, but is there a way to read the entire URL with PHP or .htaccess? Looking at the usual server variables, I only seem to get the URL up to the "#" (http://t2b.localhost/).
The browser doesn't send the fragment (the part of the URL after the #) to the server.
It is intended to be used locally by the client.
In Firefox (and in Internet Explorer too), document.location.hash contains the fragment part of the URL. With JavaScript you can read it and send its value to the server in an ordinary request parameter.
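To make the split concrete: the sketch below separates a URL into the part the server receives and the fragment, then builds a request URL that carries the fragment as a query parameter PHP could read through $_GET. The endpoint `/load.php`, the parameter name `fragment` and both function names are hypothetical. In a page you would pass `location.href`.

```javascript
// Splits a browser URL into what the server receives and the fragment,
// which never leaves the browser.
function splitUrl(href) {
  const url = new URL(href);
  const fragment = url.hash.slice(1); // drop the leading "#"
  url.hash = ""; // removes the fragment, including the "#"
  return { sentToServer: url.href, fragment };
}

// Builds a request URL carrying the fragment as an ordinary query parameter.
// The endpoint path and the "fragment" parameter name are hypothetical.
function fragmentRequestUrl(href, endpoint = "/load.php") {
  const { fragment } = splitUrl(href);
  const url = new URL(endpoint, href);
  url.searchParams.set("fragment", fragment); // percent-encoded automatically
  return url.href;
}
```

For http://t2b.localhost/#/clients/, the server only ever sees http://t2b.localhost/, while a request to the built URL would make $_GET['fragment'] equal to "/clients/".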
Please use any of the available javascript libraries to track the history state or browse by ajax requests. There are so many problems involved, such as certain browsers not notifying scripts when the hash part changes, or not adding a pseudo-'navigation' event to the browser's history list etc., that you'll end up recreating an expensive wheel that wouldn't work very well. I recommend YUI's History library, although it has problems on Google Chrome.
I'm pretty sure you can't parse it with PHP alone, because the hash part is handled only on the client side (JavaScript).
For history I'd recommend Ben Alman's BBQ plugin.
See: Can I read the hash portion of the URL on my server-side application (PHP, Ruby, Python, etc.)?
You could use JavaScript to store the current URL in a cookie and then read it with PHP on the next request.
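A minimal sketch of the cookie variant, assuming a hypothetical cookie name `url_fragment`. In a page you would run `document.cookie = fragmentCookie(location.hash)`. The cookie only reaches the server with the next request, so PHP can read it through $_COOKIE on that request, not on the page load that set it.

```javascript
// Builds a cookie string that stores the current fragment.
// The cookie name "url_fragment" is hypothetical.
function fragmentCookie(hash, name = "url_fragment") {
  // Strip the "#" and encode characters that are not allowed in cookie values.
  const value = encodeURIComponent(hash.replace(/^#/, ""));
  return `${name}=${value}; path=/; SameSite=Lax`;
}
```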
For example, I'd like to have my registration, about and contact pages resolve to different content, but via hash tags:
Three links, one each to the registration, contact and about pages:
www.site.com/index.php#about
www.site.com/index.php#registration
www.site.com/index.php#contact
Is there a way using Javascript or PHP to resolve these pages to the separated content?
The hash is not sent to the server, so you can only do it in Javascript.
Check the value of location.hash.
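Checking location.hash could look roughly like the sketch below: map the hash to one of the section ids from the question and show only that section, re-running whenever the hash changes. It assumes each section is an element whose id matches the hash (about, registration, contact), and the choice of "about" as the fallback is an assumption; the function names are hypothetical. The browser wiring is skipped outside a page.

```javascript
// Section ids mirror the question's #about / #registration / #contact links.
const SECTIONS = ["about", "registration", "contact"];

// Maps a location.hash value to a known section id, or the fallback.
function sectionForHash(hash, fallback = "about") {
  const id = hash.replace(/^#/, "");
  return SECTIONS.includes(id) ? id : fallback;
}

// Shows the section matching the current hash and hides the others.
function showSectionForCurrentHash() {
  const active = sectionForHash(window.location.hash);
  for (const id of SECTIONS) {
    const el = document.getElementById(id);
    if (el) el.hidden = id !== active;
  }
}

// Only wire up the listeners when running in a browser.
if (typeof window !== "undefined") {
  window.addEventListener("hashchange", showSectionForCurrentHash);
  window.addEventListener("DOMContentLoaded", showSectionForCurrentHash);
}
```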
There's no server-side way to do it. You could work with AJAX, but this will break the site for non-javascript users. The best way would probably be to have server-side content URLs (index.php?page=<page_id>) and rewrite these locally with JavaScript (to #<page_id>) and handle the content loading with AJAX then. That way you can have your hash-URLs for JS-enabled devices and everybody else can still use the site.
It does, however, require a bit of redundancy, because you need to provide the same content twice: once for inclusion via AJAX and once with the proper layout and everything via PHP.
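The "rewrite these locally with JavaScript" step could be sketched as a function that turns a server-side URL of the form index.php?page=&lt;page_id&gt; into its hash form, leaving other links untouched so they keep working. The name `toHashUrl` is hypothetical, and the AJAX loading itself is not shown.

```javascript
// Converts index.php?page=<page_id> into index.php#<page_id>, keeping any
// other query parameters. Links without a "page" parameter are unchanged.
function toHashUrl(href, base) {
  const url = new URL(href, base);
  const page = url.searchParams.get("page");
  if (page === null) return url.href;
  const params = new URLSearchParams(url.search);
  params.delete("page");
  url.search = params.toString(); // an empty string removes the "?" entirely
  url.hash = page;
  return url.href;
}
```

In the page, a script could loop over the links that contain page= and set each href to `toHashUrl(a.getAttribute("href"), location.href)`; without JavaScript, the original links still reach the PHP version.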
If you just want hash URLs for aesthetic reasons, but don't want to rely on JS, you're out of luck. The semantics of URLs are against you: fragment IDs shouldn't really affect the content the URL is referring to, merely the fragment within that content. AJAX URLs are changing those semantics, but there's no good reason to do that if you don't have to.
I suppose you probably have a good reason, but can I ask why you would do this? It breaks the widely understood standard of how hashes in URLs are supposed to work, and it's just asking for interoperability trouble with other clients down the road.
You can use PHP's superglobals (for example $_SERVER['REQUEST_URI']) to grab the requested URL, but the hashtag itself never reaches PHP; JavaScript has to copy it into a query parameter before you can parse it out on the server.
While cross-site scripting is generally regarded as negative, I've run into several situations where it's necessary.
I was recently working within the confines of a very limiting content management system. I needed to include database code within the page, but the hosting server didn't have anything usable available. I set up a couple bare-bones scripts on my own server, originally thinking that I could use AJAX to import the contents of my scripts directly into the template of the CMS (thus retaining dynamic images, menu items, CSS, etc.). I was wrong.
Due to the limitations of XMLHttpRequest objects, it's not possible to grab content from a different domain. So I thought of an iframe: even though I'm not a fan of frames, I thought I could create a frame that matched the width and height of the content so that it would appear native. Again, I was blocked by cross-site scripting "protections." While I could indeed load a remote file into the iframe, I couldn't execute JavaScript to modify its size from either the host page or the loaded page.
In this particular scenario, I wasn't able to point a subdomain to my server. I also couldn't create a script on the CMS server that could proxy content from my server, so my last thought was to use a remote JavaScript.
A remote JavaScript works. It breaks when the user has JavaScript disabled, which is a downside; but it works. The "problem" I was having with using a remote JavaScript was that I had to use the JS function document.write() to output any content. Any output that isn't JS causes script errors. In addition to using document.write() for every line, you also have to ensure that the content is escaped - or else you end up with more script errors.
My solution was as follows:
My script received a GET parameter ("page") and then looked for the file ({$page}.php), and read the contents into a variable. However, I had to use awkward buffering techniques in order to actually execute the included scripts (for things like database interaction) then strip the final content of all line break characters (\n) followed by escaping all required characters. The end result is that my original script (which outputs JavaScript) accesses seemingly "standard" scripts on my server and converts their standard output to JavaScript for displaying within the CMS template.
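The "strip line breaks, then escape the required characters" step can be replaced by a JSON encoder, which already produces a valid JavaScript string literal. The sketch below is in JavaScript with a hypothetical function name; in PHP, json_encode() plays the role of JSON.stringify here. The extra replacements are precautions: "&lt;/" is escaped so the output could also be inlined in a script block, and U+2028/U+2029 are escaped because older JavaScript engines treated them as line terminators.

```javascript
// Turns arbitrary HTML into one JavaScript statement that writes it out.
// JSON.stringify escapes quotes, backslashes and line breaks, so there is
// no need to strip "\n" or hand-escape characters.
function htmlToDocumentWrite(html) {
  const literal = JSON.stringify(html)
    // Keep "</script>" inside the string from closing an inline <script>.
    .replace(/<\//g, "<\\/")
    // U+2028/U+2029 are valid in JSON but were line terminators in older JS.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
  return `document.write(${literal});`;
}
```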
While this solution works, it seems like there may be a better way to accomplish the same thing. What is the best way to make cross-site scripting work specifically for the purpose of including content from a completely different domain?
You've got three choices:
1. Create a server-side proxy script.
2. Create a remote script to read in remote dynamic HTML. Use a library like jQuery to make this easier; you can use the load function to inject HTML where needed. EDIT: What I originally meant by option 2 was JSONP, which requires the server-side script to recognize the "callback=?" parameter.
3. Use a client-side Flash proxy and set up a crossdomain.xml file in your server's web root.
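For the JSONP variant of the second option, the server side essentially wraps its JSON output in a call to the function named by the callback parameter. This is a JavaScript sketch rather than PHP; the function name, the response shape and the example path are hypothetical. Validating the callback name keeps a caller from injecting arbitrary script through that parameter.

```javascript
// Allows plain or dotted JavaScript identifiers such as "handle" or "ns.cb".
const CALLBACK_NAME = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/;

// Wraps a payload as JSONP for the callback named in the request URL.
function jsonpResponse(requestUrl, payload) {
  const callback = new URL(requestUrl, "http://localhost").searchParams.get("callback");
  if (callback === null || !CALLBACK_NAME.test(callback)) {
    return { status: 400, contentType: "text/plain", body: "invalid callback" };
  }
  const json = JSON.stringify(payload)
    .replace(/\u2028/g, "\\u2028") // line terminators in older JS engines
    .replace(/\u2029/g, "\\u2029");
  return { status: 200, contentType: "application/javascript", body: `${callback}(${json});` };
}
```

With jQuery, "callback=?" is replaced by a generated function name, which passes this check.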
Personally, I would call to that other domain on the server and get and parse the data there for use in your page. That way you avoid any problems and you get the power of a server-side language/platform for getting and parsing the data.
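Calling the other domain from the server can be sketched as a small proxy. This version uses Node's `http` module and the global `fetch` (Node 18+) rather than PHP; the origin `https://other-domain.example` and the function names are hypothetical. It only fetches paths on one fixed origin, which matters for the question about adapting the YDN script: a proxy that fetches whatever URL it is given is an open relay.

```javascript
const http = require("http");

// Hypothetical origin whose pages are proxied; nothing else is fetched.
const ORIGIN = "https://other-domain.example";

// Resolves the requested path against ORIGIN and rejects anything that would
// leave it (e.g. "//evil.example/x" or an absolute URL smuggled in as a path).
function upstreamUrl(requestPath) {
  const target = new URL(requestPath, ORIGIN);
  return target.origin === new URL(ORIGIN).origin ? target.href : null;
}

function createProxy() {
  return http.createServer(async (req, res) => {
    const target = upstreamUrl(req.url);
    if (target === null) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      return res.end("bad path");
    }
    try {
      const upstream = await fetch(target);
      res.writeHead(upstream.status, {
        "Content-Type": upstream.headers.get("content-type") || "text/html",
      });
      res.end(await upstream.text());
    } catch {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("upstream error");
    }
  });
}
```

Running `createProxy().listen(8080)` on the same domain as the page makes the browser's request same-origin, so no cross-site workaround is needed.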
Not sure if that would work for your specific scenario...hard to know even with your verbose description...
You could try easyXDM: by including very little code, you can pass data or method calls between documents on different domains.
I've come across that YDN server side proxy script before. It says it's built to work with Yahoo's Search APIs.
Will it work with any domain, if you simply trim the Yahoo API code out? Or do you need to replace it with the domain you want it to work with?
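Remote content in an iframe can be accessed by local JavaScript if both pages set document.domain.
The remote page has to set its document.domain to Site A's domain, and the page on Site A must set document.domain to that same value. This only works when Site B is a subdomain of Site A's domain (or both share a common parent domain); it cannot bridge two unrelated domains.
For example:
Site A contains an iframe with src='Site B/home.php'.
home.php looks like this: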
<?php /* PHP stuff */ ?>
<script type="text/javascript">document.domain = 'Site A';</script>