Strategy for permanent links (not wordpress) - php

I'm building a small database-driven PHP CMS. I'm trying to figure out the best strategy for this scenario:
I have a URL like this:
http://www.my.com/news/cool-slug
Someone saves or shares this URL (or it gets indexed by Google).
Now I realize that the slug is not quite right and change it to:
http://www.my.com/news/coolest-slug
Google and users who previously saved the URL will hit a 404 error.
Is this the best and common solution (showing the 404) or should I keep a table in my database with all the history of the generated URLs mapped to the ID of the page and redirect with a 301 header?
Will this be an unnecessary load on my system (this table can get lots of records...)?

One very common solution used by many sites (including StackOverflow as far as I can tell) is to include the ID in the URL. The slug is just here for SEO/beauty/whatever, but is not used to identify the page.
Example: http://stackoverflow.com/questions/27877901/strategy-for-permanent-links-not-wordpress
As long as you have the right ID, it doesn't matter what slug you use. The site will just detect that the slug is wrong, and generate a redirect to the right one. For example, the following URL is valid:
http://stackoverflow.com/questions/27877901/old-slug
If for some reason you do not want the ID in the URL, then either you forbid changes (many news sites do that: you can notice that sometimes the slug and the title of a news article do not match), or you have to live with the occasional 404 when the slug changes. I've never seen any website with a system that keeps the slug history, as it can be quite annoying (you won't be able to "reuse" a slug, for example).
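A minimal sketch of the ID-plus-slug approach described above, assuming a hypothetical /news/{id}/{slug} URL layout and a plain array standing in for the database lookup. The function name resolveNewsUrl and the array shape it returns are illustrative, not part of any framework:

```php
<?php
// Hypothetical helper: decide how to answer a request for /news/{id}/{slug}.
// $slugsById stands in for a database lookup (article ID => current slug).
function resolveNewsUrl(string $path, array $slugsById): array
{
    // Expect paths such as /news/42/cool-slug; the slug part is optional.
    if (!preg_match('#^/news/(\d+)(?:/([^/]*))?/?$#', $path, $m)) {
        return ['status' => 404];
    }
    $id = (int) $m[1];
    if (!isset($slugsById[$id])) {
        return ['status' => 404];
    }
    // Only the ID identifies the page; the slug is rebuilt from stored data.
    $canonical = '/news/' . $id . '/' . $slugsById[$id];
    if ($path !== $canonical) {
        return ['status' => 301, 'location' => $canonical];
    }
    return ['status' => 200, 'id' => $id];
}

$slugs = [42 => 'coolest-slug'];
$result = resolveNewsUrl('/news/42/cool-slug', $slugs);
// A front controller could then send the redirect, for example:
// header('Location: ' . $result['location'], true, 301);
```

Because the old slug is never stored, renaming a page needs no history table: any stale slug with a valid ID is redirected to the current one.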

Related

Google Index: Using final URL or Link

This may seem like a basic question (and I've searched for the answer), but I'm wondering if anyone knows whether Google indexes using the final URL displayed (in the address bar), or just the link it used to get there (or both).
A client has a Drupal 7 website with product category links in friendly URL format, e.g.
website.com/product/cat/bedroom
...which is perfectly fine for the primary Menu, but when using the filtering menu I would like those links to be "standard" PHP Queries e.g.
/?q=product/cat/bedroom/cat/bathroom
This can be achieved quite easily as Drupal (and obviously PHP) already accept the query and just requires editing of the filter block's link. However, when landing on that filtered page the URL will drop the query and revert to the friendly URL. My question is, will Google index that URL, or the link's query URL - or both?
The ultimate goal is to block Google from indexing the q parameter, meaning it won't dive deep into filtering, which has resulted in tens of thousands of useless indexes, and only have the pages linked directly to with friendly URLs.

Doing a redirect without using 301 but a PHP script, is that OK?

I've been reading about redirection, and how it can affect (or not if done properly) SEO.
I'm changing my website's content platform from Drupal to a PHP custom made code.
In my current site I have two links that point to the same page, like this:
.../node/123
.../my-node-title
This is mainly because Drupal allows you to create custom-made links, so every article has a default one (node/123) and a custom-made one (/my-node-title).
My question is about what to do in order to prevent losing any SEO that each link may have.
In the new website all articles are structured like this: content.php?id=123
I've stored in the database the custom-made link of every article.
Instead of doing a 301 redirect, I'm sending every request for a link that does not exist to a redirect.php page, which processes the request. There I take the string from the link, look for it in the database and redirect the user.
The process is like this:
in .htaccess file:
RewriteRule ^.*$ ./redirect.php
In redirect.php:
I grab $_SERVER['REQUEST_URI'] and, using explode(), get the last part of the link (i.e. my-node-title), look for it in the database, grab the ID of the article (i.e. 123) and save it in a $link variable.
Then I use header() function and do the redirect: header('Location: '.$link);
So people still click on .../my-node-title, but when the article loads the address bar shows /content.php?id=123
I would like to know your comments about this solution. I know that with SEO there are no fixed rules or certainty about anything, but I would like to know if what I am doing is acceptable. Thanks!
Your SEO strategy should not only focus on discoverability of your pages, but also take proper UX into account. Having a user follow /some-link/, and then landing on /index.php?page_id=123 may disorient them.
As for saving your ranking, a 302 redirect (which is what header('Location: ...') sends by default in PHP, unless you set another status code) will not affect PageRank, according to Google. I have no information on how it might adversely affect other ranking signals. You would probably do well to specify a canonical URL for all distinct links that point to the same resource.
Also, be aware that your algorithm won't work if query parameters are present. You might also want to look at properly handling optional trailing slashes.
Ideally, in my opinion, you would want to provide consistent URLs to the outside world, without any need for redirection. Your URL handling would then internally resolve them to their respective resources, serving the canonical URL on every page load.
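As a rough sketch of the points above, the slug extraction can be made tolerant of query strings and trailing slashes by parsing the path first. The helper name extractSlug and the $idsBySlug array (a stand-in for the database lookup) are assumptions for illustration:

```php
<?php
// Hypothetical helper: pull the last path segment out of a request URI,
// ignoring any query string and any trailing slash.
function extractSlug(string $requestUri): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no usable path component
    }
    $segments = explode('/', trim($path, '/'));
    $slug = end($segments);
    return $slug === '' ? null : rawurldecode($slug);
}

// Stand-in for the database lookup described in the question (slug => ID).
$idsBySlug = ['my-node-title' => 123];

$target = null;
$slug = extractSlug('/my-node-title/?utm_source=feed');
if ($slug !== null && isset($idsBySlug[$slug])) {
    $target = '/content.php?id=' . $idsBySlug[$slug];
    // Passing 301 as the third argument replaces PHP's default 302:
    // header('Location: ' . $target, true, 301);
}
```

In redirect.php the argument would be $_SERVER['REQUEST_URI'] rather than a literal string, and an unknown slug should end in a 404 response rather than a redirect.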

URL Rewrite: multiple addresses per article

I have a Joomla! website with rewrite rules activated. My article URL is mysite.com/category/ID-alias.html. The only important part of this URL is the ID, because I can access the article with any text in place of "category" and any text in place of "alias".
Let's show a concrete example:
My article URL: mysite.com/flowers/15-begonia.html
I can access the same by changing category name and alias directly from url:
mysite.com/tralala/15-anything.html //Shows the same article as above.
Is this bad for SEO? If one of my visitors wants to destroy my website's SEO, can he open my articles with different addresses (like above) so that Google decides the articles are duplicated? Does Google know when a visitor goes to a webpage that is not linked from anywhere?
Hope my question is clear.
Thanks.
Google do a good job of deciding which is the "right" version of a page - it is worth watching this video to see how they handle this situation:
http://www.youtube.com/watch?v=mQZY7EmjbMA
Since these wrong URLs should not be linked to from anywhere, it is unlikely they will be indexed by mistake.
However, should they index the wrong version of a page, setting a sitemap with the right one will usually fix it.
A visitor could not harm your SEO with this knowledge. The worst they could do would be to provide good links to a non-indexed page, which would cause the wrong URL to be indexed. However, it would then be very easy for you to 301 redirect that page and turn their attempts at harm into an SEO benefit.
I personally think Joomla should look into adding the canonical tag, but if you want that currently, you must use an extension like this:
http://extensions.joomla.org/extensions/site-management/seo-a-metadata/url-canonicalization-/25795
(NB I have never used this extension so cannot guarantee its quality - the reviews are good, though)

Make slug mandatory in page URL?

I'm deciding whether or not to make the slug mandatory in order to view a submission.
Right now either of these works to get to a submission:
domain.com/category/id/1/slug-title-here
domain.com/category/id/1/slug-blah-foo-bar
domain.com/category/id/1/
All go to the same submission.
You can also change the slug to whatever you want and it'll still work as it just checks for the category, id, and submission # (in the second example).
I'm wondering if this is the proper way to do this? From an SEO standpoint should I be doing it like this? And if not, what should I be doing to users who request the URL without the slug?
The slug in the url can serve three purposes:
It can act as a content key when there is no id (you have an id, so this one doesn't apply)
When just a url is posted as a link to your site, it can let users know what content to expect because they see it in the url
It can be used by search engines as a ranking signal (Google does not use url words as a ranking signal very much right now as far as I can tell)
Slugs can create problems:
Urls are longer, harder to type, harder to remember, and often get truncated
It can lead to multiple urls for the same page and SEO problems with content duplication
I am personally not a fan of using a slug unless it can be made the content key because of the additional issues it creates. That being said, there are several ways to handle the duplicate content problems.
Do nothing and let search engines sort out duplicate content
They seem to be doing better at this all the time, but I wouldn't recommend it.
Use the canonical tag
When a user visits any of the urls for the content, they should get a canonical tag like such:
<link rel="canonical" href="http://domain.com/category/id/1/slug-title-here" />
As far as Google is concerned, the canonical tag can even exist on the canonical url itself, pointing to itself. Bing has advised against self referential canonical tags though. For more information on canonical tags see: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
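A small sketch of how the tag above might be emitted from PHP. The helper name canonicalTag is hypothetical; the only library call used is htmlspecialchars(), which escapes the URL for use inside an HTML attribute:

```php
<?php
// Hypothetical helper: build the canonical <link> element for a page.
function canonicalTag(string $canonicalUrl): string
{
    // Escape the URL so characters such as & and " are safe in the attribute.
    $href = htmlspecialchars($canonicalUrl, ENT_QUOTES, 'UTF-8');
    return '<link rel="canonical" href="' . $href . '" />';
}

// Every variant of the URL would print the same tag in its <head>.
echo canonicalTag('http://domain.com/category/id/1/slug-title-here'), "\n";
```

The canonical URL should be built from stored data (category, id and the current slug), not from the URL that was requested, otherwise each variant would declare itself canonical.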
Use 301 redirects
Before canonical tags, the only way to avoid duplicate content was with 301 redirects. Your software can examine the url path and compare the slug to the correct slug. If they don't match, it can issue a 301 redirect that will send the user to the canonical url with the correct slug. The Stack Overflow software works this way.
so these urls:
domain.com/category/id/1/slug-blah-foo-bar
domain.com/category/id/1/
would redirect to
domain.com/category/id/1/slug-title-here
which would be the only url that would actually have the content.
Assuming you're not ever going to change a page's slug, I'd just set up domain.com/category/id/1/ to do a 301 redirect (permanent) to domain.com/category/id/1/slug-title-here, and any time someone enters a slug which is incorrect for that article (domain.com/category/id/1/slug-title-here-oops-this-is-wrong), also 301 them to the correct address.
That way you're saying to the search engines "I don't have duplicate content, look, this is a permanent redirect" so it doesn't harm your SEO, and you're being useful to the user in always taking them to the correct "friendly url" page.
I suggest you add a rel=canonical link tag.
This avoids having to redirect every time, considering that someone can link to your page with infinite variants like this:
domain.com/category/id/1/?fakeparam=1
From an SEO standpoint, as long as you're only ever linking to one version of those URLs, it should be fine, as the other URLs won't even be picked up by the search engine (since nothing links to them).
If however you are linking to all 3, then it could hurt your rankings (as it'll be considered duplicate content).
I personally wouldn't make the slug required, but I would make sure that (internally) all links would point to a URL including the slug.

Concept & Algorithm: How to record only single URL for widget?

I have created a widget for my web application. Users get the code and paste it into their website, and my widget then works on their site, much like the Twitter, Digg and other social widgets.
My widget is on the basis of post, for a single post (say postid: 234) I am providing single widget, so anyone can embed the widget on their website.
Now I want to know where my widget has been placed, and for which post. For that I record the URL of the site when my widget starts (onload), but a problem arises when someone places the widget in the common sidebar of their blog or website. I record the URL each time, so if the widget is in a blog's sidebar it records a URL for every post, which creates duplicates.
Can anyone help with this? How should I go about it so that I have only a single record for a widget on a site?
I think doing something like this is a bit tricky. Here are some ideas that come to mind:
You could for example ask the user to input their site's URL when they get the widget, or the widget could track the domain or subdomain, thus giving fewer URLs.
Just tracking the domain would obviously be problematic if the actual site is domain.com/sitename/, and there could be more than one site under the domain. In that case, you could attempt to detect the highest common directory. Something like this:
You have multiple URLs like this: domain.com/site/page1, domain.com/site/page2, and so on. Here the highest common directory would be domain.com/site.
I don't think that will always work correctly or provide completely accurate results. For accuracy, I think the best is to just ask the user for the URL when they download the code for the widget.
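The "highest common directory" idea above could be sketched roughly as follows. The function name highestCommonDirectory is hypothetical, and the URLs are assumed to be recorded without a scheme, as in the examples:

```php
<?php
// Hypothetical helper: find the deepest directory shared by the URLs
// recorded for one widget. URLs are assumed to have no scheme prefix.
function highestCommonDirectory(array $urls): string
{
    if ($urls === []) {
        return '';
    }
    $common = null;
    foreach ($urls as $url) {
        $segments = explode('/', trim($url, '/'));
        if (count($segments) > 1) {
            array_pop($segments); // drop the page, keep its directory
        }
        if ($common === null) {
            $common = $segments;
            continue;
        }
        // Keep only the leading segments that both lists share.
        $shared = [];
        foreach ($common as $i => $segment) {
            if (!isset($segments[$i]) || $segments[$i] !== $segment) {
                break;
            }
            $shared[] = $segment;
        }
        $common = $shared;
    }
    return implode('/', $common);
}

echo highestCommonDirectory(['domain.com/site/page1', 'domain.com/site/page2']), "\n";
```

This treats the last path segment as a page, so a URL that ends in a directory name loses one level; that is one of the ways the result can be inaccurate.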
Edit: new idea - Just generate a unique ID for each user. This could be accomplished by simply taking the current timestamp or something, and hiding it into the code snippet the user is supposed to copy. This way you can track the ID itself and any URLs and domains it appears in can be grouped under it.
If you have an ID which doesn't get a hit in, say, a week or so, you could remove it from your database, and that way avoid filling it up with unused IDs.
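A minimal sketch of the unique-ID idea, with one deliberate change: it uses random_bytes() instead of a timestamp, so that two snippets generated in the same second cannot get the same ID. The helper names generateWidgetId and recordHit, and the in-memory $hits array standing in for a database table, are assumptions:

```php
<?php
// Hypothetical helper: create an opaque ID to embed in each user's snippet.
function generateWidgetId(): string
{
    return bin2hex(random_bytes(8)); // 16 hexadecimal characters
}

// Hypothetical helper: group recorded page URLs under the widget ID.
function recordHit(array $hits, string $widgetId, string $url): array
{
    $hits[$widgetId][$url] = true; // URLs as keys, so repeats collapse
    return $hits;
}

$id = generateWidgetId();
$hits = [];
$hits = recordHit($hits, $id, 'blog.example/post-1');
$hits = recordHit($hits, $id, 'blog.example/post-2');
$hits = recordHit($hits, $id, 'blog.example/post-1'); // repeat visit
// $hits now holds one entry for the widget, listing two distinct pages.
```

With this layout a sidebar widget still produces one URL per page, but they all fall under a single ID, so counting widgets means counting IDs rather than URLs.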
I agree with Jani regarding a unique id. When you dish out the script you'll then be able to always relate back to that id. You are still going to have duplicates if the user uses the same id over and over, but at least you'll have a way of differentiating one user from another. Another useful advantage is that you are now able to, as Jani said, group by the ID and get a cumulative number for all of the instances where that user used the script & id.
