Why does Facebook not use _escaped_fragment_ instead of #! in some cases? - php

I have code in my page that is activated by this (I can output its value in a comment in the header):
isset($_GET['_escaped_fragment_'])
and I'm looking at the source of 'what scraper sees' using this tool https://developers.facebook.com/tools/debug
and my URL has #! shebang in it.
Still, one of the sites I'm testing receives the _escaped_fragment_ (Facebook visits using ?_escaped_fragment_= in the URL), while on another it doesn't.
I don't think it has anything to do with what's on the page (og metas) since it determines whether or not to rewrite #! to ?_escaped_fragment_= before even loading the URL.
Can someone enlighten me what's required to make this feature work?

It IS because of the meta og:url / link rel=canonical. I've found that Facebook's 'what scraper sees' view presents the final result, not the 'first crawl' that you'd expect. The FB crawler goes to the page and sees a meta tag with og:url or, most importantly, the link rel=canonical. It then stops crawling the page and goes to the URL specified. Then it presents you the source of that URL, and that doesn't have the shebang in it. It's all logical, but I didn't count on it doing this 'hidden redirect' or bounce behind the scenes. The solution is to filter out / remove meta og:url and link rel=canonical from the head, that's about it. Several WP plugins add these, by the way.
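As a rough illustration of "filtering out" those two tags, here is a minimal JavaScript sketch that strips them from a string of head markup. The function name stripCanonicalHints is hypothetical, and regex-based HTML filtering is an assumption made for brevity; on a real WordPress site you would more likely disable the plugin output at the source.

```javascript
// Hypothetical helper: remove <meta property="og:url"> and
// <link rel="canonical"> tags from a chunk of HTML.
// Assumption: attributes are quoted and tags contain no '>' inside values.
function stripCanonicalHints(html) {
  return html
    .replace(/<meta\b[^>]*\bproperty=["']og:url["'][^>]*>\s*/gi, '')
    .replace(/<link\b[^>]*\brel=["']canonical["'][^>]*>\s*/gi, '');
}

const head =
  '<head><meta property="og:url" content="http://example.com/">' +
  '<link rel="canonical" href="http://example.com/"><title>t</title></head>';

console.log(stripCanonicalHints(head));
```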

Related

VIEWPORT doesn't work with a redirected URL

I have a problem using meta tag viewport.
When I enter my website through the domain name I registered (a redirection to get a more comfortable URL), the viewport meta doesn't work, but if I use the original URL (really ugly and not useful at all, which is why I have a redirection) to access my site, the viewport works. Here are the URLs so you can try it yourself:
(HERE DOESN'T WORK) This is the redirection URL: http://padelcniinfinit.com/tag_test.php
(HERE WORKS) Real URL of my site: http://aitor.rvfcursos.com/tag_test.php
How is this possible? Any clue about how to fix it?
thanks guys!
The kind of redirection you are using works by embedding the destination page in a frameset. You can check this by displaying the page source or inspecting it with your browser's developer tools. It's not an actual redirection (as in HTTP 3xx) but a page at a nice URL which embeds your ugly-URL page.
As such, the embedded page gets an invalid viewport value.
There may be some workarounds that you can use discussed in this thread.

What is the use of # in url

I realized that many web apps use # in their URLs.
For example, Google Analytics.
This address is in the URL bar when I am viewing the visitor's language page:
https://www.google.com/analytics/web/?hl=en#report/visitors-language/a33185827w60383872p61754588/
This address is in the address bar when I am viewing the visitors' geolocation page:
https://www.google.com/analytics/web/?hl=en#report/visitors-geo/a33185827w60383872p61754588/
I think that this is the Google Analytics web app passing #report/visitors-language and #report/visitors-geo.
I know that Google analytics is using an <iframe>. It seems that only the main content box is changing when displaying content.
Is # used because of the <iframe> functionality?
There are several answers but none cover the backend part.
Here is a URL, one from your own example:
www.google.com/analytics/web/?hl=en#report/visitors-language/a33185827w60383872p61754588/
You can think about the post-hash (including the hash #) part as a client-side request.
The web server will never know what was entered after the hash sign. It is the browser pointing to a specific ID on the page.
For basic web pages, if you have this HTML: <a name="main">welcome</a>
on a web page at www.example.com/welcome, going to www.example.com/welcome#main will scroll your browser viewport to the welcome text in the <a> HTML tag.
The web server will not know whether #main was in the URL or not.
Values in the URL after a question mark are called URL parameters, e.g. www.example.com/?foo=bar. The web server can deliver different content based on those values.
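To make the split concrete, here is a small JavaScript sketch using the standard URL class. The helper name splitUrl and its property names are made up for illustration; it simply separates the path and query, which are sent in the HTTP request, from the fragment, which stays in the browser.

```javascript
// Hypothetical helper: separate what the server receives from what only
// the browser keeps.
function splitUrl(rawUrl) {
  const url = new URL(rawUrl);
  return {
    sentToServer: url.pathname + url.search, // path and ?query go in the request
    keptInBrowser: url.hash,                 // '#...' is never sent to the server
  };
}

const parts = splitUrl(
  'https://www.google.com/analytics/web/?hl=en#report/visitors-language/a33185827w60383872p61754588/'
);

console.log(parts.sentToServer);  // /analytics/web/?hl=en
console.log(parts.keptInBrowser); // #report/visitors-language/a33185827w60383872p61754588/
```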
However, a technique called AJAX (Asynchronous JavaScript and XML) lets a page request data from the server in the background, and it can be combined with the # part of the URL to deliver different content without a page load. It's not using an <iframe>.
Using JavaScript, you can trigger a change in the URL's post-hash part and make a request to the server to get a specific part of the page, for example for the URL www.example.com/welcome#main2. Even if an element named #main2 does not exist, you can show one using JavaScript.
A hashbang is #!. It is used to make search engine indexing easier by indicating that this part is a dynamic web page.
This is the "hash" in the URL.
Many browsers support the hashchange event in JavaScript.
As far as I know, the hash change was a revolution for AJAX callbacks.
When the user follows any link with a hash, the hashchange event is fired and you can do anything you like with JavaScript.
One more thing: hash changes are recorded in the browser history.
see below URL
SEO and the use of !# in a url
or Read it
'#! is called a "hashbang" and they are the root of all that is evil in web development.'
Basically, weak web developers decided to use #anchor names as a kludgy hack to get "web 2.0" things to work on their page, then complained to google that their page rank suffered. Google made a work around to their kludge by enabling the hashbang.
Weak web developers took this work around as gospel. Don't use it. It is a crutch.
Web development that depends on hashbangs is web-development done wrong.
This article is far more well worded than I could ever be, and deals with the Gawker media fiasco from their migration to a (failed) hashbang centric website. It tells you WHAT is happening and why it's bad.
http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs
Correct me if I'm wrong, but the hash in that URL would be used as an anchor to scroll the page to an element with a matching id. For example, if I send you to the URL http://example.com/sample#example, the page would scroll to (just display) the element with the id "example" (I'm using a div as an arbitrary example, it could be anything).
AJAX and the hash mark in the URL are mostly used for quick actions.
If part of your site becomes visible only when an event fires (mostly a click), it would be hard to share. With a hash mark in the URL, you can (with JavaScript) make the page behave as if the required action had been performed, so it displays the relevant part.
Normally, a '#' in a URL makes the browser find the element whose id follows the '#' on that page. This way we can jump directly to content in the middle of the page.

How should I implement Hashbang (AJAX) in content page tabs?

As some of you may know, Google is now crawling AJAX. The implementation is far from elegant, but at least it also applies to Yahoo and Bing, AFAIK.
Context: My site is driven by WordPress & HTML5. A Custom Post Type has three types of content, and these are loaded by AJAX. The solution I came up with, to avoid using hashbangs (#!) until I fully understand how to implement them, is rather "risqué". Every link has an HREF pointing to *site.com/article-one/?tab=first_tab*, which shows only the contents of the selected tab (<div>Content...</div>). Like this:
This First Tab
As you may note, data-tab is the value that JavaScript sends with an AJAX GET request, which fetches the related content and renders it inside a container. On the other side, the server reads the variable and does a <?php get_template_part('tab-first-tab'); ?> to deliver the content.
As for the risqué part: I can see that Google and other search engines will fetch *http://site.com/article-one/?tab=first_tab* instead of http://site.com/article-one/, sending users to that URL instead of showing the main page with the tab content selected automatically.
The problem now is the implementation to avoid that.
Hashbang: From what I learned, I should do this.
HREF should become site.com/article-one/#!first-tab
JS should extract the "first-tab" of the href and pass it out to $_GET (just for the sake of not using "data-tab").
JS should change the URL to site.com/article-one/#!first-tab
JS should detect if the URL has #!first-tab, and show the selected tab instead of the default one.
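The client-side steps above could be sketched roughly as follows. This is only an illustration: tabFromHash and showTab are hypothetical names, 'first-tab' is assumed to be the default tab, and the browser wiring is guarded so the pure function can also be run outside a browser.

```javascript
// Hypothetical helper: extract the tab name from a hashbang fragment.
// location.hash includes the leading '#', e.g. '#!first-tab'.
function tabFromHash(hash, defaultTab) {
  if (typeof hash === 'string' && hash.startsWith('#!') && hash.length > 2) {
    return hash.slice(2);
  }
  return defaultTab; // no hashbang present: fall back to the default tab
}

// Browser-only wiring (skipped when there is no window object).
if (typeof window !== 'undefined') {
  // Placeholder: a real page would fetch and render the tab content here.
  const showTab = (name) => console.log('show tab:', name);
  const update = () => showTab(tabFromHash(window.location.hash, 'first-tab'));
  window.addEventListener('hashchange', update); // user clicked a #! link
  update();                                      // initial page load
}

console.log(tabFromHash('#!second-tab', 'first-tab')); // second-tab
```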
Now, for the server-side implementation, here is where I'm kind of lost in the woods.
How will WordPress handle site.com/article-one/?_escaped_fragment_=first-tab?
Do I have to change something in .htaccess?
What should the HTML snapshot contain? My guess is the whole page, but with the requested tab showing, instead of only the tab content.
I think I can separate what WordPress does based on whether it detects the _escaped_fragment_ parameter. If it is present, as with a request from Google, it will show the whole page plus the selected tab content; if not, the request comes from AJAX and it will show only the tab content. Is that right?
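The decision described above can be sketched like this. It is written in JavaScript rather than PHP purely for illustration, and chooseResponse, the mode names, and the site.com base are assumptions, not WordPress APIs.

```javascript
// Hypothetical helper: decide what the server should render for a request.
// - '_escaped_fragment_' present: a crawler wants a full HTML snapshot
//   with the requested tab already selected.
// - 'tab' present: the page's own AJAX call wants only the tab content.
// - neither: a normal visit, render the full page with the default tab.
function chooseResponse(requestTarget, defaultTab) {
  // The base URL is only needed so the relative target can be parsed.
  const params = new URL(requestTarget, 'http://site.com').searchParams;
  if (params.has('_escaped_fragment_')) {
    return { mode: 'snapshot', tab: params.get('_escaped_fragment_') || defaultTab };
  }
  if (params.has('tab')) {
    return { mode: 'partial', tab: params.get('tab') };
  }
  return { mode: 'full', tab: defaultTab };
}

console.log(chooseResponse('/article-one/?_escaped_fragment_=first-tab', 'first-tab'));
console.log(chooseResponse('/article-one/?tab=first_tab', 'first-tab'));
```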
I'm gonna talk third person.
Since this has no responses, I have a good reason why you should not do this. Yes, the same reason why Twitter banged them:
http://danwebb.net/2011/5/28/it-is-about-the-hashbangs
Instead of using hashbangs, you should make normal URIs. For example, an article with the summary tab selected should be "site.com/article/summary", and if that tab is the default one that pops up (or is already requested), the URL should also change to that URI using pushState().
If the user selects the tab "exercises", the URL should change to "site.com/article/exercises" using pushState() while the site loads the content through AJAX, and the link should still keep the original href to "site.com/article/exercises". Without JavaScript the user should still see the content: not only the tab content, but the whole page with the tab selected.
For that to work, the .htaccess needs some editing to handle the /[tab] part of the URL.
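A minimal sketch of the path-based approach, assuming URLs of the form /article/tab as in the examples above. The names parseTabPath, tabUrl and selectTab are hypothetical; history.pushState is the standard browser API, and the browser part is guarded so the helpers can run anywhere.

```javascript
// Hypothetical helper: read the article and tab from a path such as
// '/article/exercises'. Falls back to defaultTab when no tab is given.
function parseTabPath(pathname, defaultTab) {
  const segments = pathname.split('/').filter(Boolean);
  return { article: segments[0] || null, tab: segments[1] || defaultTab };
}

// Hypothetical helper: build the URL that both the href and pushState use.
function tabUrl(article, tab) {
  return '/' + article + '/' + tab;
}

// Browser-only: update the address bar without reloading the page.
function selectTab(article, tab) {
  const url = tabUrl(article, tab);
  if (typeof window !== 'undefined' && window.history && window.history.pushState) {
    window.history.pushState({ tab: tab }, '', url);
    // A real page would now load the tab content through AJAX.
  }
  return url;
}

console.log(parseTabPath('/article/exercises', 'summary'));
console.log(selectTab('article', 'summary')); // /article/summary
```

Because the href and the pushed URL are the same string, the link still works as a normal page request when JavaScript is unavailable.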

how to change url in browser url box?

I really wonder how Facebook and Google can change the URL without reloading the page. They just change a block of content on their site.
I noticed that when I am using Facebook and click on "news feed", the URL is "http://www.facebook.com/" and the page doesn't reload; then when I click on "messages", the URL changes to "http://www.facebook.com/messages/" and the page still doesn't reload, only the "content" block of the site changes.
So how do I change the URL without reloading the page?
edit: i got the answer.
there are 2 cases here:
the browser supports the HTML5 history API (Firefox 4+ etc.): use HTML5 history. (example: www.facebook.com => www.facebook.com/messages )
the browser doesn't support the HTML5 history API (IE6, IE7, IE8 etc.): use the hash (#). (example: www.facebook.com => www.facebook.com/#!/messages )
I hope this helps anyone who had the same doubt as me.
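The two cases could be sketched as a single helper that picks the URL form by feature detection. urlForView and navigate are hypothetical names; this is not Facebook's actual code, only an illustration of the fallback described above.

```javascript
// Hypothetical helper: choose the URL form for a view such as '/messages'.
function urlForView(path, supportsHistory) {
  return supportsHistory ? path : '/#!' + path;
}

// Change the address bar without a reload, using whichever mechanism exists.
function navigate(path) {
  const inBrowser = typeof window !== 'undefined';
  const supportsHistory =
    inBrowser && !!(window.history && window.history.pushState);
  const target = urlForView(path, supportsHistory);
  if (!inBrowser) return target; // nothing to update outside a browser
  if (supportsHistory) {
    window.history.pushState(null, '', target); // HTML5 history case
  } else {
    window.location.hash = '!' + path;          // hashbang fallback case
  }
  return target;
}

console.log(urlForView('/messages', true));  // /messages
console.log(urlForView('/messages', false)); // /#!/messages
```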
Have you looked into the history API for Javascript?
http://diveintohtml5.ep.io/history.html
EDIT: You could also use mod_rewrite with Apache, but that would cause a refresh.
Or there is this JQuery Plugin
http://www.asual.com/jquery/address/
The URL usually changes to http://facebook/#!messages, so the change of the "fragment" URL part doesn't make the browser reload the page. Instead, there is some JavaScript library that watches fragment changes and makes appropriate requests in order to reload the page content.
The usage of #! is almost becoming a "standard" for doing these things, I've seen this used elsewhere (eg. on Twitter). I don't remember if they all use the same library or just the naming convention, but you should be able to dig about it on the fb/twitter developers pages.
You could look into the Content-Location HTTP header for this purpose. See here for more info.
I code on JSBin.com, mainly use CSS and HTML (Abandoned Javascript loooong time ago) and have a question. For example a page's URL is http://www.codingrules.com/
Well, using HTML, How can I change that URL to for example
http://www.ilovecoding.com

Crawl Website using PHP

I've tried a bunch of techniques to crawl this url (see below), and for some reason the title comes back incorrect. If I look at the source of the page with firebug I can see the correct title tag, however, if I view the page source it's different.
Using several php techniques I get the same result. Digg is able to crawl the page and parse the correct title.
Here's the link: http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
The correct title is "How to Make Your iPhone (or Other iOS Device) More Like Android"
The parsed title is "Lifehacker, tips and downloads for getting things done"
Is this normal? How are they doing this? Is there a way to get the correct title?
That's because when you request it using PHP (without any JS support) you're getting the main page of lifehacker - which is lifehacker.com.
Lifehacker switched their CMS recently so that all requests go to an initial page, and then everything after the hashbang is read by a JS script in the main page to figure out which page needs to be served. You need to modify your program to take this into account.
EDIT
Have a gander at these links
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch
Found the answer:
http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
becomes:
http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android
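The rewrite can be automated with a small function. This is a sketch only: toEscapedFragmentUrl is a hypothetical name, and the escaping covers just a handful of ASCII characters (whitespace and control characters, #, %, &, +), which is an assumption that is good enough for URLs like the one above.

```javascript
// Hypothetical helper: turn a '#!' URL into its '_escaped_fragment_' form,
// so a crawler without JavaScript can request the real content.
function toEscapedFragmentUrl(url) {
  const idx = url.indexOf('#!');
  if (idx === -1) return url; // no hashbang: nothing to rewrite

  const base = url.slice(0, idx);
  const fragment = url.slice(idx + 2);

  // Percent-encode only characters that would break the query string.
  const escaped = fragment.replace(/[\x00-\x20#%&+\x7f]/g, (c) =>
    '%' + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0')
  );

  // Append to an existing query string if there is one.
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(
  toEscapedFragmentUrl('http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android')
);
// http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android
```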
