I tried two different web crawlers (Sistrix and http://ssitemap.com). Both report duplicate-content errors for URLs like / and /?katID=12.
It turns out that when the crawler fetches the URL /projekte/index.php?katID=12, it finds the Home link and records it as a link to /?katID=12. It looks as if the parameter ?katID=12 is appended to every link on the page that does not already have a parameter.
If I use a browser or wget, I see my plain HTML link to / as intended.
Did I do something wrong? Server config?
Is this a bug or a feature in the crawler?
I added <link rel="canonical" href="..."> to every page to help crawlers identify identical pages.
See also http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
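A minimal sketch of how such a tag can be emitted in PHP. The /projekte/index.php path is taken from the question above; the host name and function name are illustrative assumptions:

```php
<?php
// Build a canonical URL from the current request, dropping the query
// string so that / and /?katID=12 both declare the same canonical page.
// 'www.example.com' is a placeholder for the real host.
function canonical_url(string $requestUri, string $host = 'www.example.com'): string
{
    $path = parse_url($requestUri, PHP_URL_PATH) ?? '/';
    return 'http://' . $host . $path;
}

// In the <head> of every page:
$uri = $_SERVER['REQUEST_URI'] ?? '/';
echo '<link rel="canonical" href="'
    . htmlspecialchars(canonical_url($uri))
    . '">';
```

With this in place, /projekte/index.php and /projekte/index.php?katID=12 both advertise the same canonical address, which is exactly the duplicate pair the crawlers complained about.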
I have a problem: on my website, all <a></a> tags automatically get a prefix, which is my website address.
For example:
My website's home URL is mysite.com.
Now when I create a normal <a></a> tag in my page, like a link labelled check that points at another address,
it is rendered in my page with my website address prepended to that link's href.
Does anyone know why this is happening?
Thanks in advance.
It's not caused by Laravel. The browser cannot interpret the target because you haven't indicated that the resource is a URL, so it cannot differentiate between external URLs, relative URLs and files.
Add the protocol in order for it to work as you expected:
<a href="http://mysite.com">check</a>
Alternatively, you can use two slashes (a protocol-relative URL); however, that won't work if you save the page output as an HTML file on your device and open it from there.
<a href="//mysite.com">check</a>
I have code in my page that is activated by this (I can output its value in a comment in the header):
isset($_GET['_escaped_fragment_'])
and I'm looking at the source of what the scraper sees using this tool: https://developers.facebook.com/tools/debug
and my URL has a #! (hashbang) in it.
Still, one of the sites I'm testing receives the _escaped_fragment_ (Facebook visits using ?_escaped_fragment_= in the URL), while on another it doesn't.
I don't think it has anything to do with what's on the page (og metas) since it determines whether or not to rewrite #! to ?_escaped_fragment_= before even loading the URL.
Can someone enlighten me what's required to make this feature work?
It IS because of the meta og:url / link rel=canonical. I've found that Facebook's "what the scraper sees" presents the final result, not the "first crawl" you'd expect.

The FB crawler goes to the page and sees a meta tag with og:url or, most importantly, a link rel=canonical. It then stops crawling the page and moves on to the URL specified, and presents you the source of that URL, which doesn't have the hashbang in it. It's all logical, but I didn't expect this "hidden redirect" or bounce behind the scenes.

The solution is to filter out / remove meta og:url and link rel=canonical from the head; that's about it. Several WP plugins add these, by the way.
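If the canonical tag comes from WordPress core itself rather than a plugin, its output can be suppressed with a hook. This is a sketch for a theme's functions.php, assuming the default rel_canonical behaviour; tags printed by SEO plugins have to be removed through that plugin's own settings or filters:

```php
<?php
// Stop WordPress core from printing <link rel="canonical" ...> on
// singular pages, so the Facebook crawler keeps scraping the #!-URL it
// was given instead of bouncing to the canonical one. The
// function_exists guard only makes the snippet safe to load outside
// WordPress; inside a theme it is unnecessary.
if (function_exists('remove_action')) {
    remove_action('wp_head', 'rel_canonical');
}

// og:url, if emitted by a plugin, must likewise be removed or pointed
// at the URL you actually want Facebook to use.
```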
Links on the website I am making currently look like this:
http://www.example.net/blogs/151/This-is-a-title-in-a-url
My PHP system pulls out the id (say 151) and uses it to pull the content from my database. The text afterwards is effectively ignored (much like Stack Overflow does).
Now my problem is that this creates duplicate titles that Google will sometimes index and I lose SEO as a result:
http://www.example.net/blogs/151/This-is
http://www.example.net/blogs/151/
What is the best way to make it so that google and other search engines only see the correct full link so that I don't end up with duplicates and get the best ranking possible?
EDIT: I notice that on the Stack Overflow site you get dynamically redirected to another page. How do they do that?
Pick a URI to be canonical.
When you get a request for http://example.com/123/anything, compare the anything part to the canonical URI instead of ignoring it.
If it doesn't match, issue a 301 Moved Permanently redirect.
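The steps above can be sketched in PHP. The function name is illustrative, and the canonical slug would come from your own database lookup by id; it is hard-coded here only so the example is self-contained:

```php
<?php
// Return the 301 target for a non-canonical title, or null when the
// requested slug already matches the canonical one.
function canonical_target(int $id, string $givenSlug, string $canonicalSlug): ?string
{
    if ($givenSlug === $canonicalSlug) {
        return null; // URL is already canonical: render the page
    }
    return "/blogs/{$id}/{$canonicalSlug}";
}

// Typical front-controller usage:
//   $target = canonical_target($id, $slug, $slugFromDatabase);
//   if ($target !== null) {
//       header('Location: ' . $target, true, 301); // 301 Moved Permanently
//       exit;
//   }
```

This way http://www.example.net/blogs/151/This-is and http://www.example.net/blogs/151/ both bounce to the one full URL, and search engines consolidate ranking onto it.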
A less optimal approach would be to specify the canonical URI in the page instead of redirecting:
<link rel="canonical" href="http://example.com/123/anything"/>
I am unable to share a WordPress page that has my custom query string in the Facebook Like box using the Facebook API. For example, with http://www.example.com/?page_id=10&myquery=10, after hitting the Like button "myquery=10" is removed from the URL; only "http://www.example.com/?page_id=10" reaches my wall.
Thanks & regards,
Arunabathan.G
The problem here is the canonical URL your page is set to.
How to discover the problem?
If you check your URL in the Facebook URL debugger tool, you'll see that the fetched URL is the URL with the query string (http://breakbounce.com/lookbook/?slideID=4), but the canonical URL does not have the query string (http://breakbounce.com/lookbook/).
Where does this come from?
This problem can originate from two meta tags: either og:url is set to a different URL, or a <link rel="canonical" ...> is defined (the latter being your problem; view your page source and search for <link rel='canonical' href='http://breakbounce.com/lookbook/' />).
How to fix it?
In a normal situation, you alter or delete the tag you identified as problematic in the previous step.
In your case, you need to either change or delete the <link rel='canonical' href='http://breakbounce.com/lookbook/' /> tag.
Important note: after changing or deleting the tag, you'll need to visit the Facebook URL debugger again and enter your URL, in order to clear Facebook's cache of it.
If anything is unclear, let me know.
You can also fix it like this:
http://www.example.com/?page_id=10%26myquery=10
The %26 is the URL-encoded ampersand. I had this problem when I tried additional parameters in a query, and this worked for me :)
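That encoding can be produced with PHP's urlencode; the page_id/myquery values below are the ones from the question:

```php
<?php
// Encode only the extra parameter's separator, so Facebook treats
// "myquery=10" as part of the page_id parameter's value instead of
// stripping it as a separate query argument.
$base  = 'http://www.example.com/?page_id=10';
$extra = 'myquery=10';
$url   = $base . urlencode('&') . $extra;   // '&' becomes '%26'
```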
I have created an ajax driven website which can load any page when given the correct parameters. For instance: www.mysite.com/?page=blog&id=7 opens a blog post.
If I create a sitemap with links to all pages within the website will this be indexed?
Many thanks.
If you provide a URL for each page that will actually display the full page, then yes. If those requests just respond with JSON, or with only part of a page, then no.

In reality this is probably a poor design SEO-wise. Each page should have its own URL, e.g. www.mysite.com/unicorns instead of www.mysite.com/?page=blog&id=1, and the links on the page should point to those. Then you should use JavaScript to capture the click events on the AJAX links and update the page however you like. Or better yet, try out PJAX, which loads just the content of a page instead of doing a full page refresh, speeding things up a little without really any changes to your normal site setup.
Be aware that with that sitemap, all your links in search-engine results will be ugly.
As Google explains, a page can still be crawled with a nice URL if you use a fragment identifier:
<meta name="fragment" content="!"> <!-- the meta fragment tag -->
and when you generate your page with AJAX, append the fragment to the URL:
www.mysite.com/#!page=blog-7 (and split the value server-side)
The page should then load the content directly in PHP using $_GET['_escaped_fragment_'].
I've read that Bing and Yahoo have started crawling with the same process.
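A sketch of that server side in PHP. The blog-7 format follows the example URL above; the function name and the exact split are illustrative assumptions:

```php
<?php
// When a crawler requests www.mysite.com/?_escaped_fragment_=page=blog-7,
// serve the same content the AJAX page would have built client-side.
function parse_escaped_fragment(string $fragment): array
{
    // The fragment looks like "page=blog-7": parse it as a query string,
    // then split the value into page name and id.
    parse_str($fragment, $params);           // ['page' => 'blog-7']
    [$page, $id] = array_pad(explode('-', $params['page'] ?? '', 2), 2, null);
    return ['page' => $page, 'id' => $id];
}

if (isset($_GET['_escaped_fragment_'])) {
    $route = parse_escaped_fragment($_GET['_escaped_fragment_']);
    // ...render the full HTML for $route['page'] / $route['id'] here,
    // the same content a browser would get after running the AJAX code.
}
```

So www.mysite.com/#!page=blog-7 in the sitemap becomes www.mysite.com/?_escaped_fragment_=page=blog-7 when the crawler fetches it, and this handler routes it to the blog post with id 7.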