I'm developing a PHP website, and currently my links are in a Facebook-ish style, like so:
me.com/profile.php?id=123
I'm thinking of moving to something more friendly to crawling search engines
(like here at stackoverflow), something like:
me.com/john-adams
But how can I differentiate between two users with the same name? Or, more correctly, how does Stack Overflow tell the difference between two questions with the same title?
I was thinking of doing something like
me.com/john-adams-123
and parsing the url.
Any other recommendations?
Stackoverflow does something similar to your me.com/john-adams-123 option, except more like me.com/123/john-adams where the john-adams part actually has no programmatic meaning. The way you're proposing is slightly better because the semantic-content-free numeric ID is farther to the right in the URL.
What I would do is store a unique slug (these SEO-friendly URL components are generally called slugs) in the user table and do the number append thing when necessary to get a unique one.
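As a rough sketch of the "append a number when necessary" idea: the function name and the $takenSlugs array are made up for illustration, and the array stands in for a lookup against the user table. The numbering starts at 2 here, which is just one possible convention.

```php
<?php
// Sketch only: $takenSlugs is a stand-in for a query against the user table.

/** Return $base unchanged if it is free, otherwise $base-2, $base-3, ... */
function uniqueSlug(string $base, array $takenSlugs): string
{
    $candidate = $base;
    $n = 2;
    while (in_array($candidate, $takenSlugs, true)) {
        $candidate = $base . '-' . $n;
        $n++;
    }
    return $candidate;
}

echo uniqueSlug('john-adams', []), "\n";             // john-adams
echo uniqueSlug('john-adams', ['john-adams']), "\n"; // john-adams-2
```

In a real application the uniqueness check and the insert would need to happen atomically (for example with a unique index on the slug column), otherwise two sign-ups at the same moment could end up with the same slug.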
In stack overflow's case, it's
http://stackoverflow.com/questions/975240/using-seo-friendly-links
http://stackoverflow.com/questions <- Constant prefix
/975240 <- Unique question id
/using-seo-friendly-links <- Any text at all, defaults to title of question.
Facebook, on the other hand, has decided to just make everyone pick a unique ID. Then they are going to use that as a profile page. Something like http://facebook.com/p/username/. They are solving the problem of uniqueness between users, by just requiring it to be some string that the user picks that is unique among all existing users.
SO 'cheats' :-).
The link for your question ends in "using-seo-friendly-links", but the same link with any other text after the number also works.
The part after the number is the SEO friendly bit, but SO doesn't really care what's there. I think it defaults to the question title.
So in your case you could construct a link like:
me.com/123/john-adams
A second John Adams would have a different ID and therefore a unique URL, like:
me.com/111/john-adams
I would say that your proposed solution is better than Stack Overflow's, as it preserves content hierarchy:
me.com/john-adams-123
Usage of the unique ID before the username is simply nonsensical.
I would, however, recommend enforcement of content type:
me.com/john-adams-123.html
This will allow for consistent urls while serving a variety of content types.
Additionally, you could make use of sexatrigesimal (base 36) for the unique ID, to further reduce the amount of unnecessary cruft in your URL, especially for high-end numbers, but this is often overkill :D
me.com/john-adams-123.html -> me.com/john-adams-3F.html
me.com/john-adams-1234567890.html -> me.com/john-adams-KF12OI.html
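A minimal sketch of that conversion using PHP's built-in base_convert(). The helper names encodeId and decodeId are hypothetical, and the upper-casing only mirrors the examples above.

```php
<?php
// Hypothetical helpers around PHP's built-in base_convert().

/** Encode a numeric ID as an upper-case base 36 string. */
function encodeId(int $id): string
{
    return strtoupper(base_convert((string) $id, 10, 36));
}

/** Decode a base 36 string back to the numeric ID. */
function decodeId(string $code): int
{
    return (int) base_convert(strtolower($code), 36, 10);
}

echo encodeId(123), "\n";        // 3F
echo encodeId(1234567890), "\n"; // KF12OI
echo decodeId('KF12OI'), "\n";   // 1234567890
```

Note that base_convert() works with floating point internally, so very large IDs can lose precision.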
Finally, be sure to utilize 301 redirects on non-conforming accessible URIs to redirect to the "correct" seo-friendly schema to prevent duplicate content penalties.
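One way the redirect decision could look, as a sketch: the function name and the $slugsById array are assumptions, with the array standing in for a database lookup. The function only decides where to redirect; the header() call is left as a comment.

```php
<?php
// Sketch: $slugsById stands in for a database lookup; all names are hypothetical.

/**
 * Returns the path to 301-redirect to, or null when the requested path is
 * already canonical, does not look like <slug>-<id>, or the id is unknown.
 */
function redirectTarget(string $path, array $slugsById): ?string
{
    // Expect "/<anything>-<digits>", e.g. "/john-adams-123".
    if (!preg_match('#^/(.*)-(\d+)$#', $path, $m)) {
        return null;
    }
    $id = (int) $m[2];
    if (!isset($slugsById[$id])) {
        return null; // unknown id: the caller would send a 404 instead
    }
    $canonical = '/' . $slugsById[$id] . '-' . $id;
    return $canonical === $path ? null : $canonical;
}

$slugsById = [123 => 'john-adams'];
$target = redirectTarget('/johnny-123', $slugsById);
if ($target !== null) {
    // In a real request handler:
    // header('Location: ' . $target, true, 301); exit;
    echo "301 -> $target\n";
}
```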
I'd go with your style of me.com/john-adams-123, because I think the leftmost part of the URI has more importance in SEO ranking.
Actually, if you are willing to use this on several controllers (not just user profile), you may want to do it more like me.com/john-adams-profile-123 with a rewriting rule redirecting /.+-profile-(\d+) to profile.php?uid=$1 and still be able to use, say, me.com/john-adams-articles-123 for this user's articles...
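The same mapping can be sketched in plain PHP rather than as a rewrite rule. The function name, the articles.php script and its uid parameter are assumptions for illustration; only profile.php?uid=$1 comes from the rule above.

```php
<?php
// Sketch of the rewrite idea in PHP. articles.php is a hypothetical script name.

/** Map "/<text>-<controller>-<id>" to the script that should handle it. */
function resolveRoute(string $path): ?string
{
    $controllers = ['profile' => 'profile.php', 'articles' => 'articles.php'];
    if (!preg_match('#^/.+-(profile|articles)-(\d+)$#', $path, $m)) {
        return null; // not one of the known URL shapes
    }
    return $controllers[$m[1]] . '?uid=' . $m[2];
}

echo resolveRoute('/john-adams-profile-123'), "\n";  // profile.php?uid=123
echo resolveRoute('/john-adams-articles-123'), "\n"; // articles.php?uid=123
```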
To avoid problems with links that contain special characters, you can use this plugin for Zend Framework.
https://github.com/btlagutoli/CharConvert
$filter2 = new Zag_Filter_CharConvert(array(
'onlyAlnum' => true,
'replaceWhiteSpace' => '-'
));
echo $filter2->filter('éééé ááááá ? 90 :');//eeee-aaaaa-90
This can help you deal with strings in other languages.
Cheers everyone!
Please bear with me, I really did do some research on this, but I couldn't come to a final solution, hence I'm here to hear your opinions.
What I want to build is a small i18n-CMS with dynamic hierarchical pages such as:
domain.tld/en/I/am/a/path
I want to find the least performance intense way that allows me to have beautiful, SEO and human-friendly URLs.
I use a Closure-Table, so two tables in the database, one for the pagenodes and one for the pathtree plus another table for the localised page, that references a certain pagenode (three in total).
My different solutions so far:
Sure I could make an algorithm, that goes through all the different request segments and checks if there is an English "path" under an "a" under an "am" under an "I", but this seems very unwise considering a multitude of page-hits.
Or is it?
Positive: I wouldn't need to save the path anywhere, because it would be calculated. So moving pages around wouldn't need to recalculate the path and save it again.
I could simply save the whole path to the database, as VARCHAR(2000) or something and then just check if there is a page with path "I/am/a/path" in English language and get that one.
This seems to be rather messy.
As I do it now. Currently I add an "ID" at the end of my path. Such as:
domain.tld/en/I/am/a/path.1
So if you enter "domain.tld/en.1" you get forwarded to the one with the right slug. But here again I need to save the slug to the database, for each single page.
Also I would love to get rid of the id (could I do this with mod-rewrite and .htaccess?)
Any more insights on this one? I'm not a web developer, so I'm not really sure about the performance implications.
Kindest regards,
Meren
It seems to me that page requests will happen a million times more often than an editor changing a page address. So I would definitely go with the save-to-db option. What you can do is create an extra field in which you save the 'slug' for that page; in combination with .htaccess you can then redirect pages from the 'slug' addresses. For example, in http://www.fuuu.com/futest-fu , 'futest-fu' is a slug which could be rewritten to an ID number (or anything you would want it to be). Among others, WordPress works this way. Check out this discussion for some insights: http://wordpress.org/support/topic/where-are-the-permalinks-slug-stored-in-the-database
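To illustrate the save-to-db option for the hierarchical case in the question, here is a sketch in which a nested array stands in for an indexed (language, path) lookup. The function name, the array layout and the page ID 42 are all made up.

```php
<?php
// Sketch: $pages stands in for a table indexed on (language, path).

/** Look up the page ID for a request such as "/en/I/am/a/path". */
function findPageId(string $requestPath, array $pages): ?int
{
    // Split off the first segment as the language, keep the rest as the path.
    $parts = explode('/', trim($requestPath, '/'), 2);
    $lang = $parts[0];
    $path = $parts[1] ?? '';
    return $pages[$lang][$path] ?? null;
}

$pages = ['en' => ['I/am/a/path' => 42]];
var_dump(findPageId('/en/I/am/a/path', $pages)); // int(42)
var_dump(findPageId('/en/no/such/page', $pages)); // NULL
```

The point is that each request costs a single keyed lookup, while the stored path only has to be recomputed when an editor moves a page.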
I was wondering if there was any way to detect a page's genre/category.
Possibly there is a way to find keywords or something?
Unfortunately I don't have any idea so far, so I don't have any code to show you.
But if anybody has any ideas at all, let me know.
Thanks!
EDIT #Nican
Perhaps there is a way to set, let's say, 10 categories (Entertainment, Funny, Tech).
Then create keywords for these categories (Funny = Laughter, Funny, Joke etc).
Then search through a webpage (maybe fetched using cURL) for these keywords and assign it to the right category.
Hope that makes sense.
What you are talking about is basically what Google Adsense and similar services do, and it's based on analyzing the content of a page and matching it to topics. Generally, this kind of stuff is beyond what you would call simple programming / development and would require significant resources to be invested to get it to work "right".
A basic system might work along the following lines:
Get page content
Get X most commonly used words (omitting stuff like "and" "or" etc.)
Get words used in headings
Assign weights to different words according to a set of factors (is used in heading, is used in more than one paragraph, is used in link anchors)
Match the filtered words against a database of words related to a specific "category"
If cumulative score > threshold, classify site as belonging to category
Rinse and repeat
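A very small sketch of the steps above, assuming the page content has already been fetched and split into heading text and body text. The function name, stop-word list, keyword lists, weights and threshold are all made up for illustration; a real system would need far more than this.

```php
<?php
// Sketch only: categories, keywords, weights and threshold are invented.

/** Return the categories whose cumulative keyword score exceeds $threshold. */
function classify(string $headingText, string $bodyText, array $categories, int $threshold): array
{
    $stopWords = ['and', 'or', 'the', 'a', 'of', 'to', 'is', 'in'];
    $scores = [];
    // Words used in headings weigh three times as much as body words.
    foreach ([[$headingText, 3], [$bodyText, 1]] as [$text, $weight]) {
        foreach (str_word_count(strtolower($text), 1) as $word) {
            if (in_array($word, $stopWords, true)) {
                continue;
            }
            foreach ($categories as $category => $keywords) {
                if (in_array($word, $keywords, true)) {
                    $scores[$category] = ($scores[$category] ?? 0) + $weight;
                }
            }
        }
    }
    $passing = array_filter($scores, function ($score) use ($threshold) {
        return $score > $threshold;
    });
    return array_keys($passing);
}

$categories = [
    'Funny' => ['laughter', 'funny', 'joke'],
    'Tech'  => ['software', 'computer'],
];
print_r(classify('A funny joke', 'This joke is about laughter and a computer', $categories, 4));
```

With these inputs "Funny" scores 8 and "Tech" scores 1, so only "Funny" passes a threshold of 4.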
Folksonomy may be a way of accomplishing what you're looking for:
http://en.wikipedia.org/wiki/Folksonomy
For instance, in Drupal they have a Folksonomy module:
http://drupal.org/node/19697 (Note this module appears to be dead, see http://drupal.org/taxonomy/term/71)
Couple that with a tag cloud generator, and you may get somewhere:
http://drupal.org/project/searchcloud
Plus, a little more complexity may be able to derive mapped relationships to other terms, especially if you control the structure of the tagging options.
http://intranetblog.blogware.com/blog/_archives/2008/5/22/3707044.html
EDIT
In general, the type of system you're trying to build relies on unique word values on a page. So you would need to...
Get unique word values from your content (index values or create a bot to crawl your site)
Remove all words and symbols you can't use (at, the, or, and, etc...)
Count the number of times the unique words appear on the page
Add them to some type of datastore so you can call them based on the relationships you're mapping
If you have a root label system in place, associate those values with the word counts on the page (such as a query or derived table)
This is very general, and there are a number of ways this can be implemented/interpreted. Folksonomies are meant to "crowdsource" much of the effort for you, in a "natural way", as long as you have a user base that will contribute.
Within my application I want to avoid ID numbers in the URLs if possible, so the best way to do this would be to create a unique version of the title that's valid for URL schemes.
SO does something similar, but because it allows duplicate question titles, it keeps the ID within the URI:
http://stackoverflow.com/questions/3637971/how-to-edit-onchange-attribute-in-a-select-tag-using-jquery
WordPress has implemented such a feature as well.
my question is:
What's the best way to accomplish this, sticking to the URI RFC as well as keeping search engines happy.
The Drupal Path/Pathauto modules do this, so I'd check that implementation. For a quick hit, if there are titles that reduce to duplicates:
CaseySoftware is awesome
CaseySoftware is awesome!
They would become:
caseysoftware-is-awesome
caseysoftware-is-awesome-0
You will definitely need to scrub out punctuation, but you may want to do the same to common articles like "a, the, is".
To keep search engines happy
You should put this in your <head>:
<link rel="canonical" href="http://yoursite.com/page/uniqueTitle/"/>
This tells search engines that all pages declaring that specific canonical URL are the same page.
For example, this page has the following line :
<link rel="canonical" href="http://stackoverflow.com/questions/3637990/foolproof-unique-title-for-urls">
If you change the title in the URL, that value will stay the same. This is how search engines really know it's all the same page.
How to generate
As for how those URLs are generated, you should stick to lower-case alphanumeric characters ([a-z0-9]) and replace spaces with "-".
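A minimal sketch of that rule in PHP. The function name slugify is my own, and it handles ASCII input only; accented or non-Latin characters would need transliterating first.

```php
<?php
/** Reduce a title to lower-case [a-z0-9] words joined by "-" (ASCII only). */
function slugify(string $title): string
{
    $slug = strtolower($title);
    // Drop everything except a-z, 0-9, whitespace and hyphens.
    $slug = preg_replace('/[^a-z0-9\s-]/', '', $slug);
    // Replace each run of whitespace or hyphens with a single "-".
    $slug = preg_replace('/[\s-]+/', '-', $slug);
    return trim($slug, '-');
}

echo slugify('Foolproof unique title for URLs'), "\n"; // foolproof-unique-title-for-urls
echo slugify('CaseySoftware is awesome!'), "\n";       // caseysoftware-is-awesome
```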
"Friendly URLs — Possibly all of what makes a good URL structure" is a nice article about that topic, and it includes a short example implementation in Python.
To make the URLs really unique without having to use a numeric ID everywhere, I'd try to generate my new URL, see if it already exists (shouldn't occur very often), and only if it does, append a short sequence number at the end.
So, here's an example on Forrst, a CodeIgniter website:
http://forrst.com/posts/PHP_Nano_Framework_WIP_Just_throwing_some_ideas_o-mU8
Look at that nice URL. You've got the root site, then posts, then the post title and a short extract. This is pretty cool for user experience.
However, my CodeIgniter site's URLs just plain suck. E.G.
http://mysite.com/code/view/120
So it accesses the controller code, then the function view, then the 120 on the end is the Post ID (and it does the database queries based on that).
I realised I could do some routing. So in my routes.php file, I put the following in:
$route['posts/(:num)'] = "code/view/$1"; - so this will make http://mysite.com/posts/120 be the same as http://mysite.com/code/view/120. A bit nicer, I think you'll agree.
My question is - how can I use a similar technique to Forrst, whereby an extract of the post is actually appended to the URL? I can't really see how this would be possible. How can the PHP script determine what it should look up in the database, especially if there are several things with the same title?
Thanks!
Jack
To get a URL like the one in your example you need to add a routing rule, like you've already done with $route['posts/(:num)'] = "code/view/$1";. Forrst's URL seems to be "mapped" (or something like that). I think the last part of the URI is the identifier (o-mU8 looks like a hash, but I prefer an int ID), which is stored in the db. So to query, you split the URI segment on the underscores (_) and take the last part, like this within your controller action:
// Segment 2 holds the title followed by the identifier.
$elements = explode('_', $this->uri->segment(2));
$identifier = $elements[count($elements) - 1];
$results = $this->myModel->myQuery($identifier);
Basically the string between the controller/ and the identifier is totally useless, but not if your goal is a better SEO.
I hope this helps
See the official discussions. The term that is often related to this is "slug". I haven't tested the approach from the CI forums myself, but the suggestions and examples look pretty good.
The URL helper in CodeIgniter has a function called url_title(). I haven't used it myself, but I think it's what you are looking for.
I'm writing an app that allows you to filter database results based on Location and Category.
If someone was to search for Liverpool under the Golf category the URI would be /index.php/search/Liverpool/Golf.
Should someone want to search by Location but not category, they would be sent to /index.php/search/Liverpool
However, should someone want to filter only by category they would be unable to use /index.php/search/Golf because that would be caught by the location search.
Is there a best practice way to have /index.php/search/Golf be recognised? Some best practice as to what else to add to the URI to make these two queries distinct? /index.php/search/category/Golf perhaps?
Though that is beginning to show characteristics of /index.php?search&category=Golf which is exactly what I'm trying to avoid.
Try using $this->uri->uri_to_assoc(n)
described here http://codeigniter.com/user_guide/libraries/uri.html (half way down on page)
basically you will structure your url like this:
mysite.com/index.php/search/location/liverpool/category/golf
NOTE: the parameters are optional, so you don't have to have both in there all the time. You can just as well do
mysite.com/index.php/search/location/liverpool/
and
mysite.com/index.php/search/category/golf
This way it will return FALSE if the element you are looking for does not exist.
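To show the key/value idea outside the framework, here is a plain-PHP imitation. It is not CodeIgniter's own code: the function name is made up, $start counts how many leading segments to skip (rather than CodeIgniter's segment number), and missing keys come back as null here.

```php
<?php
// Plain-PHP imitation of the key/value segment idea, not CodeIgniter's code.

/** Turn ".../key1/value1/key2/value2" into ['key1' => 'value1', ...]. */
function segmentsToAssoc(string $path, int $start): array
{
    $segments = explode('/', trim($path, '/'));
    // Skip the leading segments, e.g. "index.php" and "search".
    $pairs = array_slice($segments, $start);
    $assoc = [];
    for ($i = 0; $i < count($pairs); $i += 2) {
        // A key with no value after it is stored as null.
        $assoc[$pairs[$i]] = $pairs[$i + 1] ?? null;
    }
    return $assoc;
}

$params = segmentsToAssoc('/index.php/search/category/golf', 2);
$location = $params['location'] ?? null; // null: no location filter requested
$category = $params['category'] ?? null; // "golf"
var_dump($location, $category);
```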
It would probably be best to keep your URI segments relevant no matter what they are searching for.
index.php/LOCATION/CATEGORY
If they are not interested in a location then pass a filler to the system:
index.php/anywhere/golf
Then in your code you just check for that specific string of ANYWHERE to determine if they only want to see the activity. I assume that you are going to be redirecting them with either links or forms (and that they aren't typing the URI string themselves) so you should be safe in just passing information that you expect and testing against that.
I use the format suggested by Tom above and then do something along the lines of below to determine the value of the parameters.
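$segment_array = $this->uri->segment_array();
// array_search() returns the position of 'location', or FALSE if it is absent.
$is_location_searched = array_search('location', $segment_array);
if ($is_location_searched !== FALSE && $this->uri->segment($is_location_searched + 1))
{
    // The value is the segment right after the 'location' keyword.
    $location = $this->uri->segment($is_location_searched + 1);
}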
Have a look at http://lucenebook.com/#/p:solr/s:wiki and click around a bit on the left-hand navigation. Pay close attention to what happens in the url when you do. I really like this scheme for many reasons.
It's SEO-friendly.
"Curious" people can mix/match the urls and it still resolves to a proper search.
It just looks good!
Of course, the trick is really in the code, in how you build the thing. It took me a few weeks to sort it out, but I finally have my own version of that site. Just not ajax based, because I like search engines better than ajax. Ajax don't pay the bills.