SEO duplicate content issue with alternative URLs - php

I have a PHP website where every page can be accessed either by page ID or by page name:
http://domain/page_id=ID
http://domain/page=NAME
The problem is that Google treats this as duplicate content. What is the best practice to avoid duplicate content in this case? Would a 303 redirect be better than entirely avoiding having two different URLs lead to the same page?

According to Google:
In the world of content management and online shopping systems, it's
common for the same content to be accessed through multiple URLs.
Therefore,
Indicate the preferred URL with the rel="canonical" link element
Suppose you want
http://blog.example.com/dresses/green-dresses-are-awesome/ to be the
preferred URL, even though a variety of URLs can access this content.
You can indicate this to search engines as follows:
Mark up the canonical page and any other variants with a
rel="canonical" link element. Add a <link> element with the attribute
rel="canonical" to the <head> section of these pages:
<link rel="canonical" href="http://blog.example.com/dresses/green-dresses-are-awesome/" />
This indicates the preferred URL to use to access the green dress
post, so that the search results will be more likely to show users
that URL structure. (Note: We attempt to respect this, but cannot
guarantee this in all cases.)
So, all you need to do is add the canonical link element, with an absolute URL, to the <head> section of your pages.
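For example, a minimal PHP sketch (the name-to-ID lookup is a hypothetical helper, not from the question) that emits the same canonical URL from both entry points:

<?php
// Both URL variants resolve to one page; pick the ID-based URL as
// the preferred one and emit it on every variant.
$pageId = isset($_GET['page_id'])
    ? (int) $_GET['page_id']
    : page_id_from_name($_GET['page'] ?? ''); // hypothetical name-to-ID lookup

$canonical = 'http://domain/page_id=' . $pageId;
?>
<head>
  <link rel="canonical" href="<?= htmlspecialchars($canonical, ENT_QUOTES) ?>">
</head>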

Related

Are robots.txt and meta tags enough to stop search engines from indexing dynamic pages that depend on $_GET variables?

I created a PHP page that is only accessible by means of a token/pass received through $_GET.
Therefore, if you go to the following URL you'll get a generic or blank page:
http://fakepage11.com/secret_page.php
However, if you use the link with the token, it shows you special content:
http://fakepage11.com/secret_page.php?token=344ee833bde0d8fa008de206606769e4
Of course this is not as safe as a login page, but my only concern is to create a dynamic page that is not indexable and only accessed through the provided link.
Are dynamic pages that depend on $_GET variables indexed by Google and other search engines?
If so, will including the following be enough to hide them?
robots.txt:
User-agent: *
Disallow: /
Meta tag: <META NAME="ROBOTS" CONTENT="NOINDEX">
Will it stay hidden even if I type into Google:
site:fakepage11.com/
Thank you!
If a search engine bot finds the link with the token somehow¹, it may crawl and index it.
If you use robots.txt to disallow crawling the page, conforming search engine bots won’t crawl the page, but they may still index its URL (which then might appear in a site: search).
If you use meta-robots to disallow indexing the page, conforming search engine bots won’t index the page, but they may still crawl it.
You can’t have both: If you disallow crawling, conforming bots can never learn that you also disallow indexing, because they are not allowed to visit the page to see your meta-robots element.
¹ There are countless ways search engines might find a link. For example, a user who visits the page might use a browser toolbar that automatically sends all visited URLs to a search engine.
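If the goal is "crawlable but not indexed", one option is the HTTP-header equivalent of meta-robots. A minimal sketch of the question's page (the token value is the one from the question; everything else is assumed):

<?php
// secret_page.php - sketch of the "noindex, but don't block crawling" setup.
// Do NOT also disallow this path in robots.txt, or conforming bots will
// never see this directive.
header('X-Robots-Tag: noindex');

$expected = '344ee833bde0d8fa008de206606769e4';
$token    = isset($_GET['token']) ? $_GET['token'] : '';

if (hash_equals($expected, $token)) {
    echo 'Special content for token holders.';
} else {
    echo 'Generic or blank page.';
}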
If your page isn't discoverable then it will not be indexed.
by "discoverable" we mean:
it is a standard web page, i.e. index.*
it is referenced by another link either yours or from another site
So in your case, by using the GET parameter for access you avoid (1), but not necessarily (2), since someone may reference that link and hence expose the "hidden" page.
You can use the robots.txt that you gave, and in that case the page will not get indexed by a bot that respects it (not all do). Of course, not indexing your page doesn't mean that the "hidden" page's URL will not be out in the wild.
Furthermore, another issue - depending on your requirements - is that you use unencrypted HTTP, which means that your "hidden" URLs and the content of your pages are visible to every server between your server and the user.
Apart from search engines, take care that certain services cache/resolve content when URLs are exchanged, for example in Skype or Facebook Messenger. In those cases they will visit the URL and try to extract metadata, and maybe cache it if applicable. Of course this scenario does not expose your URL to the public, but it is exposed to the systems of those services, and with it the content that you have "hidden".
UPDATE:
Another issue to consider is exposing a "hidden" page by linking to another page. In that case your page will show up as the referrer in the logs of the server that hosts the linked URL, and thus be visible; that extends to Google Analytics etc. as well. So if you want to remain stealthy, do not link to other pages from the hidden page.
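As a hedged aside that goes beyond the original answer: if the hidden page must link out anyway, modern browsers honor a Referrer-Policy header, which PHP can send before any output:

<?php
// Sketch: ask browsers not to send a Referer header when following
// links from this page, so the tokenized URL stays out of the logs
// and analytics of linked sites. Modern browsers only; not a guarantee.
header('Referrer-Policy: no-referrer');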

Php - URL structure to link specific comment

Context: I am building the blog section of a website in Symfony2.
Question:
What is the best way to link to a specific comment in a news item? How should I define the route structure?
Examples:
Single News Url:
example.com/news/{news_id}
Single News + Comment Url:
example.com/news/{news_id}/comment/{comment_id}
or
example.com/news/{news_id}#comment-{comment_id}
or
example.com/news/{news_id}?comment={comment_id}
These are just some suggestions...
VERY IMPORTANT:
I need to use both the news_id and the comment_id inside a controller. They need to be retrievable/available.
The structures of the suggested links will have different outcomes, and your comment_id variable won't be available to your script in all cases.
Something important for a news page: they also affect SEO differently.
first two variants of the URL
example.com/news/{news_id}/comment/{comment_id}
example.com/news/{news_id}?comment={comment_id}
comment_id WILL be available in your script. Symfony will pass it to your controller, or you will be able to get it from the Request object (or from the $_GET variable - but don't do that)
the browser WILL NOT scroll to the comment, since these URLs contain no fragment
don't worry, you WON'T have duplicate content if you don't create both routes for the same page
from an SEO point of view you are creating a separate page for each comment (I know you are not); that's what Google would expect from that URL structure. To avoid duplicate content, just add a canonical link element to the HEAD pointing to the root URL: <link rel="canonical" href="example.com/news/{news_id}" />
hash (#) URL variant
example.com/news/{news_id}#comment-{comment_id}
comment_id WILL NOT be available in your script. Everything after # is handled directly by the browser, which WILL NOT send it to the server at all. The comment_id value WILL still be available via JavaScript (this is how Stack Overflow does it)
the browser WILL try to move (scroll) to the portion of the HTML whose ID matches the key after #, e.g. <div id="comment_id-123">. If you don't have an element with that ID in your markup, the page will stay at the top.
solution
Based on the assumption that you don't want a separate page for each comment, and you only need the comment_id for pagination of the comments:
The right solution would be to use the # variant of the URL (a sketch follows this list).
load the page with just the news_id
after page load, do an AJAX call with the comment_id parameter to fetch the comments, or the 1st page if there is no parameter
update the comment section with the returned information about pages etc.
add loader images so the user knows what's happening; this improves the UX
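A minimal sketch of that setup in a Symfony controller (route wiring, template names, and the comment repository service are assumptions, not the asker's code). The page action sees only news_id; the fragment after # never reaches the server, so client-side JavaScript reads it and calls the AJAX endpoint:

<?php
// src/AppBundle/Controller/NewsController.php (sketch)
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;

class NewsController extends Controller
{
    // Handles example.com/news/{news_id}; the #comment-{comment_id}
    // fragment is handled entirely in the browser.
    public function showAction($news_id)
    {
        return $this->render('news/show.html.twig', array('news_id' => $news_id));
    }

    // AJAX endpoint, e.g. example.com/news/{news_id}/comments?page=N,
    // called after page load with the page derived from the fragment.
    public function commentsAction(Request $request, $news_id)
    {
        $page = max(1, (int) $request->query->get('page', 1));
        // Hypothetical repository service returning one page of comments:
        $comments = $this->get('app.comment_repository')
                         ->findPage($news_id, $page);

        return new JsonResponse(array('page' => $page, 'comments' => $comments));
    }
}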
more SEO-suitable alternative
If you want better SEO, a more suitable URL would use slugs instead of IDs, and leave out unnecessary words. I personally suggest this:
example.com/n/{news_slug}#{comment_title_slug}-{comment_id}
// would become something like
example.com/n/answer-to-life-is-awesome#yeah-it-was-a-great-book-5435335435
If you can handle the parsing and DB querying, it could also work without the /n prefix.
You mixed two different concepts:
Links
The two following links carry the same parameters, news and comment ids:
example.com/news/{news_id}/comment/{comment_id}
or
example.com/news/{news_id}?comment={comment_id}
A browser opening these URLs will not scroll the page.
Links with anchor
The following link has only one parameter, the news id, and it uses an anchor:
example.com/news/{news_id}#comment-{comment_id}
A browser opening this URL will scroll to the anchor.
So it depends on your needs: if you want the visitors' browsers to scroll to the comment, use an anchor (it's also possible with JavaScript).
Here are valid anchors:
<div id="comment-42">...</div>
or
<p id="comment-42">...</p>
Here is a link leading to this answer: https://stackoverflow.com/questions/33176341/php-url-structure-to-link-specific-comment/33179630#33179630
See also #! URLs.
I believe the better way is using #comment-{comment_id}, because of the duplicate-URL problems that the other two formats may cause.

Repeating Content - PHP includes? [SEO]

I have a fairly large amount of the same content that needs to be repeated on all 28 product pages of a website I am working on.
In terms of SEO, I know search engines like Google don't like this and just see it as duplicated content.
I thought using <?php include 'page.php' ?> would resolve this, but it just writes the text as HTML and therefore makes no difference, meaning it would still be seen as duplicated content.
I know I can use <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> so that bots don't read these pages, but if I were to do this, the only page they would follow is the homepage.
What would be the best way to get around this?
Is it possible to use the NOFOLLOW method for certain sections of the website?
Any suggestions on this would be very helpful!
My suggestion would be to think about your visitors first, not Google and their SEO requirements. If the repetition of content is beneficial to the visitors, then do it.
In SEO terms: if you duplicate content 28 times it might be seen as the same content, so it's not counted separately. So what? It IS the same content, and you know it.
Websites are made for visitors in the first place, and search engines secondly. You should take SEO optimization in consideration, but don't let it dictate the user experience of your website. Make the best website you can, for real people.
Google understands boilerplate content, so if you need the information on the pages, then so be it.
Google is generally quite good at recognizing "boilerplate text" (text
which you repeat on many pages) and treating it appropriately. I
wouldn't worry about having to place a disclaimer on your pages. If
you want to make it clearer to search engines that it's not relevant
to your content, you could also just place the text in an image
(personally, I'd just place the text on the pages normally).
https://www.seroundtable.com/google-duplicate-text-14515.html
There's a better way to do this.
Combine the 28 product pages into 1 page that serves dynamic content from a database.
So each time you go to this single page, you pass a particular product ID (through a query string parameter, form data, a cookie, or a SESSION variable).
So instead of:
product_bicycle.php
product_skateboard.php
product_widget.php
etc.
You could have:
products.php?id=123
products.php?id=234
products.php?id=345
etc.
If all your product information is stored in raw HTML files, you'd just have to put it into a MySQL database. Create a table called products. Give it a column called something like "Product_ID", which would hold values like the "123", "234", and "345" shown above, and another column like "Product_Details", which would hold the HTML of the product description.
When the page loads, you'd want PHP to do the following things:
Show the HTML for the page header, logo, navigation tabs, and other shared items. (Use the PHP "echo" or "print" statement - or just include it in the raw HTML of the page.)
Use the $_GET, $_POST, $_COOKIE, or $_REQUEST variables to get the product ID passed to the page. For example, "$id = $_GET['id'];"
Do a SQL query to pull up the product record based on the ID.
Show the description of the product from the result set.
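A minimal sketch of that products.php (the table and column names come from this answer; the connection details and header include are placeholders):

<?php
// products.php?id=123 - one dynamic page for all 28 products (sketch).
$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'pass');

include 'header.php'; // hypothetical shared header/logo/navigation

// Get the product ID passed to the page.
$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;

// Pull up the product record; a prepared statement keeps the query safe.
$stmt = $pdo->prepare('SELECT Product_Details FROM products WHERE Product_ID = ?');
$stmt->execute(array($id));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// Show the description of the product from the result set.
if ($row) {
    echo $row['Product_Details']; // column holds the product's HTML
} else {
    http_response_code(404);
    echo '<p>Product not found.</p>';
}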

How to prevent duplicate title tags on dynamic content

Links on the website I am making currently look like this:
http://www.example.net/blogs/151/This-is-a-title-in-a-url
My PHP system pulls out the ID (151, say) and uses it to pull the content from my database. The text afterwards is effectively ignored (much like Stack Overflow does it).
Now my problem is that this creates duplicate titles that Google will sometimes index and I lose SEO as a result:
http://www.example.net/blogs/151/This-is
http://www.example.net/blogs/151/
What is the best way to make it so that google and other search engines only see the correct full link so that I don't end up with duplicates and get the best ranking possible?
EDIT: I notice that on the Stack Overflow site you get dynamically redirected to another page. How do they do that?
Pick a URI to be canonical.
When you get a request for http://example.com/123/anything then, instead of ignoring the anything, compare it to the canonical URI.
If it doesn't match, issue a 301 Moved Permanently redirect.
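A minimal sketch of that comparison, using the question's /blogs/{id}/{title} layout (the slug lookup is a hypothetical database call):

<?php
// Sketch: redirect any /blogs/{id}/{anything} request to the canonical slug.
// Example request: /blogs/151/This-is  ->  /blogs/151/This-is-a-title-in-a-url
$path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$parts = explode('/', trim($path, '/'), 3); // ["blogs", "151", "This-is"]

$id            = isset($parts[1]) ? (int) $parts[1] : 0;
$requestedSlug = isset($parts[2]) ? $parts[2] : '';

$canonicalSlug = get_canonical_slug($id); // hypothetical DB lookup

if ($requestedSlug !== $canonicalSlug) {
    header('Location: http://www.example.net/blogs/' . $id . '/' . $canonicalSlug, true, 301);
    exit;
}
// ...otherwise render the post normally.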
A less optimal approach would be to specify the canonical URI in the page instead of redirecting:
<link rel="canonical" href="http://example.com/123/anything"/>

Make slug mandatory in page URL?

I'm deciding whether or not to make the slug mandatory in order to view a submission.
Right now either of these works to get to a submission:
domain.com/category/id/1/slug-title-here
domain.com/category/id/1/slug-blah-foo-bar
domain.com/category/id/1/
All go to the same submission.
You can also change the slug to whatever you want and it'll still work as it just checks for the category, id, and submission # (in the second example).
I'm wondering if this is the proper way to do this? From an SEO standpoint should I be doing it like this? And if not, what should I be doing to users who request the URL without the slug?
The slug in the URL can serve three purposes:
It can act as a content key when there is no id (you have an id, so this one doesn't apply)
When just a URL is posted as a link to your site, it can let users know what content to expect because they see it in the URL
It can be used by search engines as a ranking signal (Google does not use URL words as a ranking signal very much right now, as far as I can tell)
Slugs can create problems:
URLs are longer, harder to type, harder to remember, and often get truncated
They can lead to multiple URLs for the same page and SEO problems with content duplication
I am personally not a fan of using a slug unless it can be made the content key because of the additional issues it creates. That being said, there are several ways to handle the duplicate content problems.
Do nothing and let search engines sort out duplicate content
They seem to be doing better at this all the time, but I wouldn't recommend it.
Use the canonical tag
When a user visits any of the URLs for the content, they should get a canonical tag like this:
<link rel="canonical" href="http://domain.com/category/id/1/slug-title-here" />
As far as Google is concerned, the canonical tag can even exist on the canonical URL itself, pointing to itself. Bing has advised against self-referential canonical tags, though. For more information on canonical tags see: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
Use 301 redirects
Before canonical tags, the only way to avoid duplicate content was with 301 redirects. Your software can examine the URL path and compare the slug to the correct slug. If they don't match, it can issue a 301 redirect that sends the user to the canonical URL with the correct slug. The Stack Overflow software works this way.
So these URLs:
domain.com/category/id/1/slug-blah-foo-bar
domain.com/category/id/1/
would redirect to
domain.com/category/id/1/slug-title-here
which would be the only URL that actually has the content.
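To produce the correct slug to compare against, a typical slugifier looks something like this (a generic sketch, not the asker's code):

<?php
// Generic slugifier: "Slug Title Here!" -> "slug-title-here"
function slugify($title)
{
    $slug = strtolower(trim($title));
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug); // runs of non-alphanumerics -> one dash
    return trim($slug, '-');                          // strip leading/trailing dashes
}

// If the requested slug differs from slugify($title), issue the
// 301 redirect described above.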
Assuming you're not ever going to change a page's slug, I'd just set up domain.com/category/id/1/ to do a 301 redirect (permanent) to domain.com/category/id/1/slug-title-here, and any time someone enters a slug which is incorrect for that article (domain.com/category/id/1/slug-title-here-oops-this-is-wrong), also 301 them to the correct address.
That way you're saying to the search engines "I don't have duplicate content, look, this is a permanent redirect" so it doesn't harm your SEO, and you're being useful to the user in always taking them to the correct "friendly url" page.
I suggest you use a rel="canonical" link element.
This avoids doing a redirect each time, considering someone can link to your page with infinite variants like this:
domain.com/category/id/1/?fakeparam=1
From an SEO standpoint, as long as you're only ever linking to one version of those URLs, it should be fine, as the other URLs won't even be picked up by the search engine (since nothing links to them).
If, however, you are linking to all 3, it could hurt your rankings (as it'll be considered duplicate content).
I personally wouldn't make the slug required, but I would make sure that (internally) all links would point to a URL including the slug.
