How to write parameters in a URL? - php

Concerning search engine optimization, I wonder what the best practice is for writing parameters in a URL. Should I include parameter names? Does an id value have a negative effect on search engines?
Here are some options that come to mind:
/project/pid/171/name/my_funny_name
/project/171/my_funny_name
/project/my_funny_name

A good rule is that fewer params are better. If you need a numerical id, the 2nd option is quite good; if you force the my_funny_name part to be unique, you may rely on it alone as the id. However, keep in mind that if you change the name, the URL will break.
Also remember to avoid duplicate URLs for the same content, like /project/171/my_funny_name and /project/171/my_old_name. Use a canonical link to mark the preferred one: <link rel="canonical" href="http://example.com/project/171/my_funny_name">
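A minimal sketch of the second option, assuming a /project/{id}/{slug} layout. findProjectSlug() and resolveProjectPath() are hypothetical names, and the array inside findProjectSlug() stands in for a database query. The id does the lookup; any request whose slug differs from the current one gets a 301 to the canonical URL, so old names keep working without creating duplicate pages.

```php
<?php
// Hypothetical lookup: maps project ids to their current slugs (stand-in for a DB query).
function findProjectSlug(int $id): ?string
{
    $projects = [171 => 'my_funny_name'];
    return $projects[$id] ?? null;
}

// Resolves /project/{id}/{slug}. Returns [HTTP status, canonical path or null].
function resolveProjectPath(string $path): array
{
    if (!preg_match('#^/project/(\d+)(?:/([^/]*))?$#', $path, $m)) {
        return [404, null];
    }
    $id = (int) $m[1];
    $requestedSlug = $m[2] ?? ''; // slug part may be missing entirely
    $slug = findProjectSlug($id);
    if ($slug === null) {
        return [404, null];
    }
    $canonical = "/project/$id/$slug";
    if ($requestedSlug !== $slug) {
        // Old or missing name: permanently redirect to the canonical URL.
        return [301, $canonical];
    }
    return [200, $canonical];
}

// In a front controller you would act on the result, e.g.:
// [$status, $location] = resolveProjectPath(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
// if ($status === 301) { header("Location: $location", true, 301); exit; }
```

Only the id is trusted; the slug is decorative, which is what lets the name change without breaking links.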

In the URL you want, AT MINIMUM, the keywords that people will be searching for to find the page in question. The ids, in your case, should not have a negative effect.
Google SEO

Having an id value in your url won't have a negative effect, but you should definitely add words before or after your id for SEO.
What I personally like is a "logical" structure in the URL, with the id at the beginning of each segment, like this:
/123_my-category/456_my-great-stuff
As for the underscore, you should rather use "-": search engines treat it as a word separator, while "_" only helps readability and effectively ties the words together.
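A sketch of reading that structure back, assuming every segment has the form "<digits>_<hyphenated-words>" as in the example. parseIdSlugSegments() is a hypothetical helper; only the numeric ids are needed for lookups.

```php
<?php
// Splits a path like /123_my-category/456_my-great-stuff into [id, slug] pairs.
// Assumes every segment looks like "<digits>_<lowercase-hyphenated-words>".
function parseIdSlugSegments(string $path): array
{
    $result = [];
    foreach (explode('/', trim($path, '/')) as $segment) {
        if (!preg_match('/^(\d+)_([a-z0-9-]+)$/', $segment, $m)) {
            throw new InvalidArgumentException("Unexpected segment: $segment");
        }
        // The id drives the lookup; the words are for readers and search engines.
        $result[] = ['id' => (int) $m[1], 'slug' => $m[2]];
    }
    return $result;
}
```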

Related

PHP: Get specific links with preg_match_all()

I want to extract specific links from a website.
The links look like this:
<a href="1494761,offer-mercedes-used.html">
The links always look the same, except for the brand name (mercedes in this case).
This works fine so far but only delivers the first part of the link:
preg_match_all('/((\d{7}),offer-)/s',$inhalt,$results);
And this one matches from the first link all the way to the end of the page :(
preg_match_all('/((\d{7}).*html)/s',$inhalt,$results);
Any ideas?
Note that I use preg_match_all() and not preg_match().
Thanks,
Chama
While .*? would work (it is non-greedy), in both cases you should specify a more precise pattern.
Here [\w.-]+ would do. But [^">]+ might also be feasible, if the HTML source is consistent (or you specifically wish to ignore other variations).
preg_match_all('/((\d{7}),offer-[\w.-]+)/s',$inhalt,$results);
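A quick check of the corrected pattern (note the + after the character class) against some sample HTML; the $inhalt content below is invented for illustration. The /s modifier is dropped here because the pattern has no "." outside a character class, so it changes nothing.

```php
<?php
// Made-up sample input with two offer links and some unrelated markup.
$inhalt = '<a href="1494761,offer-mercedes-used.html">Mercedes</a>'
        . '<p>other text</p>'
        . '<a href="1494762,offer-bmw-used.html">BMW</a>';

// [\w.-]+ stops at the closing quote, so each match covers exactly one href.
preg_match_all('/((\d{7}),offer-[\w.-]+)/', $inhalt, $results);

// $results[1] holds the full links, $results[2] the 7-digit ids.
print_r($results[1]);
```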
Trying to parse XML/HTML with regex generally isn't a good idea, but if you're sure it will always be well formatted, this should return any links in the content:
/<a href="([^">]+)">/
This matches the example pattern you gave more closely, but I'm not sure what variations you might have:
/<a href="([0-9]{7},offer-[a-z]+-used\.html)">/
// [7 numbers],offer-[at least one letter]-used.html
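Since regex on raw HTML is fragile, one option is to let PHP's DOMDocument find the anchors and apply the stricter pattern shown above only to the href values. This is a sketch: extractOfferLinks() is a hypothetical name, and it requires the dom extension, which is enabled in most PHP builds.

```php
<?php
// Parses the HTML with DOMDocument, then keeps only hrefs matching the offer pattern.
function extractOfferLinks(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // [7 numbers],offer-[at least one letter]-used.html
        if (preg_match('/^[0-9]{7},offer-[a-z]+-used\.html$/', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}
```

The parser copes with attribute order, extra attributes, or single quotes, which would all defeat a pattern anchored on <a href=".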

How to robustly check Wikipedia pages via API using search terms of different casing

I have a website which allows users to submit photos of wildlife. Once uploaded, they can identify the species in the photo, for example "Polar bear".
This triggers a request to Wikipedia for information about that species, using that search term:
$query = "http://en.wikipedia.org/w/api.php?action=query&rvprop=content&format=json&titles=" . urlencode($query);
$pages = file_get_contents($query);
Such a query returns one of the following:
An array of pageids, which I can then query for that page's content
Nothing, because there simply isn't any match
A REDIRECT result, which allows me to resolve the page with the proper name
The problem I have has to do with casing. For example, the search term "Milky stork" returns nothing, not even a redirect, while "Milky Stork" does work. Capitalizing each word in the query is not a solution either, because some page titles are partly lowercase, and for those the capitalized query fails. There's no consistency.
I'm looking for a way to make this more robust. It shouldn't be that a query fails because of wrong casing, which cannot even be predicted on the user's side.
Does anyone know of a solution for this? Other than trying every possible combination of casings?
Note: Some may suggest using DBpedia instead, but it does not cover all my needs.
Unfortunately, there is no easy solution; read http://www.mediawiki.org/wiki/API:Opensearch#Note_on_case_sensitivity
Instead, you can try using opensearch to find the right casing (if the normal query returns nothing usable):
http://en.wikipedia.org/w/api.php?action=opensearch&search=milky+stork&namespace=0&suggest=
will give you
["milky stork",["Milky Stork"]]
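A sketch of that fallback: build the opensearch URL, then take the first suggested title from the JSON response. opensearchUrl() and firstSuggestion() are hypothetical helpers, and the network call is left out so the parsing can be checked on its own.

```php
<?php
// Builds the opensearch URL for a user-typed term.
function opensearchUrl(string $term): string
{
    return 'http://en.wikipedia.org/w/api.php?' . http_build_query([
        'action'    => 'opensearch',
        'search'    => $term,
        'namespace' => 0,
    ]);
}

// Picks the first suggested title from a response such as
// ["milky stork",["Milky Stork"]]; returns null when there is no suggestion.
function firstSuggestion(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data[1]) || !is_array($data[1])) {
        return null;
    }
    return $data[1][0];
}

// Usage (network call omitted):
// $title = firstSuggestion(file_get_contents(opensearchUrl('milky stork')));
```

The suggested title can then be fed back into the normal titles= query.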
I think trying every possible combination is a viable solution. So, your query might look like:
http://en.wikipedia.org/w/api.php?action=query&rvprop=content&format=json&titles=Milky stork|Milky Stork
Note that the first letter is not case-sensitive on Wikipedia.
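Literally every combination grows as 2^n with the number of letters, so a sketch that tries only a few common casings in one pipe-separated titles= request may be more practical. casingVariants() and titlesQueryUrl() are hypothetical names. Every variant starts with a capital, since the first letter is not case-sensitive anyway.

```php
<?php
// A few likely casings: as typed, sentence case, and Title Case.
// Uses ASCII-only case functions; switch to mb_* functions for non-English names.
function casingVariants(string $term): array
{
    $term = trim($term);
    $variants = [
        ucfirst($term),
        ucfirst(strtolower($term)),
        ucwords(strtolower($term)),
    ];
    return array_values(array_unique($variants));
}

// Several titles can be sent in one query, separated by "|".
function titlesQueryUrl(array $titles): string
{
    return 'http://en.wikipedia.org/w/api.php?' . http_build_query([
        'action' => 'query',
        'format' => 'json',
        'titles' => implode('|', $titles),
    ]);
}
```

For "milky stork" this yields "Milky stork" and "Milky Stork", matching the query above.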

Foolproof Unique title for Urls

Within my application I want to avoid id numbers in the URLs if possible, so the best way to do this would be to create a unique version of the title that's valid for URL schemas.
SO does something similar, but since it allows duplicate question titles, it keeps the id in the URI!
http://stackoverflow.com/questions/3637971/how-to-edit-onchange-attribute-in-a-select-tag-using-jquery
WordPress has implemented such a feature as well.
My question is:
What's the best way to accomplish this while sticking to the URI RFC and keeping search engines happy?
The Drupal Path/Pathauto modules do this, so I'd check that implementation. For a quick hit, if there are titles that reduce to duplicates:
CaseySoftware is awesome
CaseySoftware is awesome!
They would become:
caseysoftware-is-awesome
caseysoftware-is-awesome-0
You will definitely need to scrub out punctuation, and you may want to do the same with common short words like "a", "the", and "is".
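A sketch of that scrubbing step, not Pathauto's actual code: lowercase the title, split on anything that isn't a-z or 0-9, drop stop words, and join with "-". titleToSlug() is a hypothetical name and the default stop-word list is only an example.

```php
<?php
// Turns a title into a slug: lowercase, punctuation removed, words joined by "-".
function titleToSlug(string $title, array $stopWords = ['a', 'an', 'the', 'is']): string
{
    // Any run of non-alphanumeric characters acts as a separator.
    $words = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    $words = array_values(array_diff($words, $stopWords));
    return implode('-', $words);
}
```

With stop words removed, both example titles collapse to "caseysoftware-awesome"; passing an empty list keeps "caseysoftware-is-awesome". Either way, duplicates still need a suffix.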
To keep search engines happy
You should use this in your <head>:
<link rel="canonical" href="http://yoursite.com/page/uniqueTitle/"/>
This tells search engines that all pages carrying that canonical URL are the same page.
For example, this page has the following line:
<link rel="canonical" href="http://stackoverflow.com/questions/3637990/foolproof-unique-title-for-urls">
If you change the title, that value stays the same. This is how search engines really know it's all the same page.
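A small sketch for emitting that tag, assuming the /page/{slug}/ layout from the example above. canonicalLinkTag() is a hypothetical helper; the key point is to pass the slug you stored, not one regenerated from a possibly edited title.

```php
<?php
// Builds the canonical <link> tag from a base URL and the page's stored slug.
// Escaping keeps odd characters from breaking the attribute.
function canonicalLinkTag(string $baseUrl, string $slug): string
{
    $href = rtrim($baseUrl, '/') . '/page/' . rawurlencode($slug) . '/';
    return '<link rel="canonical" href="' . htmlspecialchars($href, ENT_QUOTES) . '"/>';
}
```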
How to generate
As for how those URLs are generated, you should stick to lowercase alphanumeric characters ([a-z0-9]) and replace spaces with "-".
"Friendly URLs — Possibly all of what makes a good URL structure" is a nice article about that topic, and it includes a short example implementation in Python.
To make the URLs really unique without having to use a numeric ID everywhere, I'd try to generate my new URL, see if it already exists (shouldn't occur very often), and only if it does, append a short sequence number at the end.
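A sketch of that check-then-append approach. uniqueSlug() is a hypothetical helper; $existing stands in for a database lookup, and the suffix starts at 2 here (Pathauto's example above uses -0, so pick whichever you prefer). In production, a unique index on the slug column also protects against two requests claiming the same slug at once.

```php
<?php
// Returns $base if it is free, otherwise $base-2, $base-3, ... until an unused one is found.
function uniqueSlug(string $base, array $existing): string
{
    $taken = array_flip($existing); // O(1) membership checks
    if (!isset($taken[$base])) {
        return $base;
    }
    $n = 2;
    while (isset($taken["$base-$n"])) {
        $n++;
    }
    return "$base-$n";
}
```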

How can I have Code Igniter URI segments for multiple variables?

I'm writing an app that allows you to filter database results based on Location and Category.
If someone were to search for Liverpool under the Golf category, the URI would be /index.php/search/Liverpool/Golf.
Should someone want to search by Location but not Category, they would be sent to /index.php/search/Liverpool.
However, should someone want to filter only by category they would be unable to use /index.php/search/Golf because that would be caught by the location search.
Is there a best practice way to have /index.php/search/Golf be recognised? Some best practice as to what else to add to the URI to make these two queries distinct? /index.php/search/category/Golf perhaps?
Though that is beginning to show characteristics of /index.php?search&category=Golf which is exactly what I'm trying to avoid.
Try using $this->uri->uri_to_assoc(n)
It is described here: http://codeigniter.com/user_guide/libraries/uri.html (halfway down the page).
Basically, you structure your URL like this:
mysite.com/index.php/search/location/liverpool/category/golf
NOTE: the parameters are optional, so you don't have to include both every time. You can just as well use
mysite.com/index.php/search/location/liverpool/
and
mysite.com/index.php/search/category/golf
This way, it will return FALSE if the element you are looking for does not exist.
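To show the idea outside CodeIgniter, here is a plain-PHP illustration of pairing segments into key/value filters. It is not CodeIgniter's implementation: segmentsToAssoc() is hypothetical, uses a 0-based $offset for the first key, and returns null rather than FALSE for missing values.

```php
<?php
// Pairs segments like search/location/liverpool/category/golf into
// ['location' => 'liverpool', 'category' => 'golf'], starting at $offset.
function segmentsToAssoc(string $path, int $offset): array
{
    $segments = array_values(array_filter(explode('/', $path), fn($s) => $s !== ''));
    $pairs = [];
    for ($i = $offset; $i < count($segments); $i += 2) {
        $pairs[$segments[$i]] = $segments[$i + 1] ?? null; // key without value → null
    }
    return $pairs;
}

// Category-only search: the location key is simply absent.
$filters = segmentsToAssoc('search/category/golf', 1);
$location = $filters['location'] ?? null;
$category = $filters['category'] ?? null;
```

Because each value is labelled by its key, /search/category/golf and /search/location/liverpool can no longer be confused.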
It would probably be best to keep your URI segments relevant no matter what they are searching for.
index.php/LOCATION/CATEGORY
If they are not interested in a location then pass a filler to the system:
index.php/anywhere/golf
Then in your code you just check for that specific ANYWHERE string to determine whether they only want to filter by activity. I assume that you are going to be sending them there with either links or forms (and that they aren't typing the URI string themselves), so you should be safe in just passing information that you expect and testing against that.
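A sketch of that check, assuming "anywhere" is the agreed filler and the two segments arrive as method arguments (as a CodeIgniter controller method would receive them). parseLocationCategory() is a hypothetical name.

```php
<?php
// Interprets index.php/LOCATION/CATEGORY, where the filler "anywhere"
// means "no location filter". Returns [location or null, category].
const LOCATION_FILLER = 'anywhere';

function parseLocationCategory(string $location, string $category): array
{
    if (strtolower($location) === LOCATION_FILLER) {
        $location = null; // search by category only
    }
    return [$location, $category];
}
```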
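I use the format suggested by Tom above and then do something along the lines of the code below to determine the values of the parameters.
$segment_array = $this->uri->segment_array();
// array_search() returns the segment number of 'location' (segments are numbered from 1), or FALSE
$is_location_searched = array_search('location', $segment_array);
if ($is_location_searched && $this->uri->segment($is_location_searched + 1))
{
    // The value follows the 'location' key, e.g. .../location/liverpool
    $location = $this->uri->segment($is_location_searched + 1);
}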
Have a look at http://lucenebook.com/#/p:solr/s:wiki and click around a bit on the left-hand navigation. Pay close attention to what happens in the url when you do. I really like this scheme for many reasons.
It's SEO-friendly.
"Curious" people can mix/match the urls and it still resolves to a proper search.
It just looks good!
Of course, the real trick is in the code, in how you build the thing. It took me a few weeks to sort it out, but I finally have my own version of that site, just not Ajax-based, because I like search engines better than Ajax. Ajax doesn't pay the bills.

Using SEO-friendly links

I'm developing a PHP website, and currently my links are in a facebook-ish style, like so
me.com/profile.php?id=123
I'm thinking of moving to something more friendly to crawling search engines (like here at stackoverflow), something like:
me.com/john-adams
But how can I differentiate between two users with the same name, or more precisely, how does stackoverflow tell apart two questions with the same title?
I was thinking of doing something like
me.com/john-adams-123
and parsing the url.
Any other recommendations?
Stackoverflow does something similar to your me.com/john-adams-123 option, except more like me.com/123/john-adams where the john-adams part actually has no programmatic meaning. The way you're proposing is slightly better because the semantic-content-free numeric ID is farther to the right in the URL.
What I would do is store a unique slug (these SEO-friendly URL components are generally called slugs) in the user table and do the number append thing when necessary to get a unique one.
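For the me.com/john-adams-123 scheme from the question, a sketch of recovering the parts. parseSlugWithId() is a hypothetical helper. The id comes from the trailing "-<digits>", so hyphenated names still parse; you would look the user up by id and 301 to the stored slug if the requested one differs.

```php
<?php
// Splits "john-adams-123" into ['slug' => 'john-adams', 'id' => 123].
// The greedy slug group backtracks until only the final "-<digits>" is left for the id.
function parseSlugWithId(string $segment): ?array
{
    if (!preg_match('/^([a-z0-9-]+)-(\d+)$/', $segment, $m)) {
        return null; // no trailing id: treat as not found
    }
    return ['slug' => $m[1], 'id' => (int) $m[2]];
}
```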
In stack overflow's case, it's
http://stackoverflow.com/questions/975240/using-seo-friendly-links
http://stackoverflow.com/questions <- Constant prefix
/975240 <- Unique question id
using-seo-friendly-links <- Any text at all, defaults to title of question.
Facebook, on the other hand, has decided to just make everyone pick a unique ID. Then they are going to use that as a profile page. Something like http://facebook.com/p/username/. They are solving the problem of uniqueness between users, by just requiring it to be some string that the user picks that is unique among all existing users.
SO 'cheats' :-).
The link for your question is http://stackoverflow.com/questions/975240/using-seo-friendly-links, but the same id followed by any other text also works.
The part after the number is the SEO friendly bit, but SO doesn't really care what's there. I think it defaults to the question title.
So in your case you could construct a link like:
me.com/123/john-adams
A second John Adams would have a different id and a unique URL like:
me.com/111/john-adams
I would say that your proposed solution is better than Stack Overflow's, as it preserves content hierarchy:
me.com/john-adams-123
Usage of the unique ID before the username is simply nonsensical.
I would, however, recommend enforcement of content type:
me.com/john-adams-123.html
This will allow for consistent urls while serving a variety of content types.
Additionally, you could use sexatrigesimal (base 36) for the unique id to further reduce the unnecessary cruft in your URL, especially for large numbers, but this is often overkill :D
me.com/john-adams-123.html -> me.com/john-adams-3F.html
me.com/john-adams-1234567890.html -> me.com/john-adams-KF12OI.html
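PHP's built-in base_convert() can do this conversion; the wrappers below are hypothetical names. It returns lowercase letters ("3f" rather than "3F"), which arguably fits lowercase slugs better, and it accepts either case when decoding. It works with floats internally, so it may lose precision on very large numbers.

```php
<?php
// Encodes a numeric id in base 36 (0-9 then a-z).
function idToBase36(int $id): string
{
    return base_convert((string) $id, 10, 36);
}

// Decodes a base-36 code back to the numeric id.
function base36ToId(string $code): int
{
    return (int) base_convert(strtolower($code), 36, 10);
}
```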
Finally, be sure to utilize 301 redirects on non-conforming accessible URIs to redirect to the "correct" seo-friendly schema to prevent duplicate content penalties.
I'd go with your style of me.com/john-adams-123, because I think the leftmost part of the URI has more importance in SEO ranking.
Actually, if you are willing to use this on several controllers (not just user profile), you may want to do it more like me.com/john-adams-profile-123 with a rewriting rule redirecting /.+-profile-(\d+) to profile.php?uid=$1 and still be able to use, say, me.com/john-adams-articles-123 for this user's articles...
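The rewrite rule can also be mirrored in a PHP front controller, assuming all requests are routed to one script. route() is a hypothetical name and "profile"/"articles" are just the two example controllers from the answer.

```php
<?php
// Maps /john-adams-profile-123 or /john-adams-articles-123 to a controller and user id,
// mirroring the rewrite rule /.+-profile-(\d+) → profile.php?uid=$1.
function route(string $path): ?array
{
    if (preg_match('#^/.+-(profile|articles)-(\d+)$#', $path, $m)) {
        return ['controller' => $m[1], 'uid' => (int) $m[2]];
    }
    return null; // no match: serve a 404
}
```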
To avoid dealing with links that contain special characters, you can use this plugin for Zend Framework:
https://github.com/btlagutoli/CharConvert
$filter2 = new Zag_Filter_CharConvert(array(
'onlyAlnum' => true,
'replaceWhiteSpace' => '-'
));
echo $filter2->filter('éééé ááááá ? 90 :');//eeee-aaaaa-90
This can help you deal with strings in other languages.
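Without Zend Framework, a hand-written character map gets a similar result for the characters you expect. This is a sketch: asciiSlug() is a hypothetical name, the map is deliberately tiny (lowercase accents only), and any unmapped non-ASCII character is simply dropped.

```php
<?php
// Transliterates a few accented characters, then keeps only a-z and 0-9 joined by "-".
function asciiSlug(string $text): string
{
    $map = ['á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a',
            'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
            'í' => 'i', 'ó' => 'o', 'ö' => 'o', 'ú' => 'u', 'ü' => 'u',
            'ñ' => 'n', 'ç' => 'c'];
    $text = strtolower(strtr($text, $map));
    // Collapse every run of non-alphanumeric characters into a single "-".
    return trim(preg_replace('/[^a-z0-9]+/', '-', $text), '-');
}
```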
