Parsing URLs using PHP

I have posted a similar question here. However, this was more about getting advice on what to do. Now that I know what to do, I am looking for a little help on how to do it!
Basically I have a website that is pretty much 100% dynamic. All the website links are generated using PHP and all the pages are made up of php includes/code. I am trying to improve the SEO of the site by improving the URLs (as stated in the other question) and I am struggling a little.
I am using mod_rewrite to rewrite the nice URLs to the ugly URLs on the server. What I need now is to convert the ugly URLs (which are generated by the PHP code in the pages) to the nicer URLs.
Here are the URLs I need to parse (these are in the other question as well):
/index.php?m=ModuleType
/index.php?m=ModuleType&categoryID=id
/index.php?m=ModuleType&categoryID=id&productID=id
/index.php?page=PageType
/index.php?page=PageType&detail=yes
Here is what I want the above URLs to be parsed to:
/ModuleType
/ModuleType/CategoryName
/ModuleType/CategoryName/ProductName
/PageType
/PageType/Detail
There is an example on the other question, posted by Gumbo; however, I felt it was a bit messy and unclear about exactly what it was doing.
Could someone help me solve this problem?
Thanks in advance.

I think I see what you're after... You've done all the URL rewriting, but all the links between your pages are using the old URL syntax.
The only way I can see around this is to do some kind of regex search and replace on the links so they use the new syntax. This will be a bit more complicated if all the links are dynamically generated, but hopefully there won't be too much of this to do.
Without seeing how your links are generated at the moment, it's difficult to say how to change the code. I imagine it works something like this though:
<?php echo "<a href='/index.php?m=$ModuleType&categoryID=$id'>"; ?>
So you'd change this to:
<?php echo "<a href='/$ModuleType/$id'>"; ?>
Sorry if I've made errors in the syntax, just off the top of my head...

Unless I misunderstood your question, you don't parse the "ugly" URLs: your PHP script is called with them, so you read your parameters (m, categoryID, productID) from $_GET[] and combine them to make your nice URLs. That shouldn't be too hard (just a bit of logic to check whether each parameter is present, then concatenate the strings).
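As a rough sketch of that logic, something like the following could work. The function name niceUrl and the array keys (m, category, product, page, detail) are assumptions modelled on the URLs in the question, and the category and product values are assumed to be names that have already been looked up from their IDs.

```php
<?php
// Hypothetical helper: builds a "nice" URL from the parameters the old
// query-string links used. Key names are assumptions, not the site's own.
function niceUrl(array $params): string
{
    $segments = [];
    if (isset($params['m'])) {
        $segments[] = $params['m'];
        if (isset($params['category'])) {
            $segments[] = $params['category'];
            if (isset($params['product'])) {
                $segments[] = $params['product'];
            }
        }
    } elseif (isset($params['page'])) {
        $segments[] = $params['page'];
        if (($params['detail'] ?? '') === 'yes') {
            $segments[] = 'Detail';
        }
    }
    // rawurlencode keeps each segment safe to place in a URL path.
    return '/' . implode('/', array_map('rawurlencode', $segments));
}

echo niceUrl(['m' => 'Shop', 'category' => 'Shoes', 'product' => 'Boots']), "\n"; // /Shop/Shoes/Boots
echo niceUrl(['page' => 'About', 'detail' => 'yes']), "\n";                       // /About/Detail
```

Every link in the templates would then go through this one function, so the URL scheme only has to change in one place.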

You will need a front controller, which will dispatch the URL to the correct page.
Apache will rewrite the URL using rules in .htaccess, so that any requested path is internally rewritten to index.php?q=. For example, typing http://example.com/i/am/here will result in a call to index.php?q=/i/am/here
index.php will parse the path from $_GET["q"] and decide what to do. For example, it may include a page, or go to the database, look the path up, get the appropriate content and print it out.
If you want a working example of a .htaccess which will do exactly that (rewrite to index.php with ?q=path), take a look at how Drupal does it:
http://cvs.drupal.org/viewvc.py/drupal/drupal/.htaccess?revision=1.104
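On the PHP side, a minimal front controller might look roughly like this. It is only a sketch: the route table, the file names and the resolveRoute function are placeholders I made up, and a real site might look the path up in a database instead.

```php
<?php
// Hypothetical route table: path => page script. Paths and file names are
// placeholders, not taken from any real site.
const ROUTES = [
    ''          => 'pages/home.php',
    'i/am/here' => 'pages/here.php',
];

// Maps the rewritten path (the value of $_GET['q']) to a page script,
// or returns null when nothing matches so the caller can send a 404.
function resolveRoute(string $q, array $routes = ROUTES): ?string
{
    $path = trim($q, '/');
    return $routes[$path] ?? null;
}

$page = resolveRoute($_GET['q'] ?? '');
if ($page === null) {
    http_response_code(404);
    echo "Not found\n";
} else {
    echo "Would include: $page\n"; // in a real index.php: require $page;
}
```

Looking the path up in a fixed table, rather than building a file name from it, means a request can never include a file you did not list.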

As Palantir wrote, this is done using mod_rewrite and .htaccess. To get the correct rewrite conditions into your .htaccess you might want to take a look at a Mod Rewrite Generator (e.g. http://www.generateit.net/mod-rewrite/). It makes things a lot easier.

Related

Purpose of replacing .php with .html extension through .htaccess file

I was reading a little about .htaccess file and I found that it's possible to change .php in the url to .html
But I do not understand what the point of doing it is or what it tries to achieve.
Please note that I'm a beginner with .htaccess. I've also searched for this on Google but didn't find what I'm looking for.
There is an ongoing debate over SEO-friendly URLs. It all depends on the purpose of your site. If, for example, you have a script with dynamic pages, like a search script or a generator of some type (very broad, right? ;) ), and your URL looks like this:
domain.tld/products?id=183
Rather than
domain.tld/products|183.html
then I don't think anything is better than .php: the page is still dynamic and search engines will discover that. If you have a blog, though, you might consider proper wording in the URL instead of IDs. Apart from what search engines see, I can think of only one other reason: what your visitor sees. Someone glancing at domain.tld/date/rather-simple-title may be more interested than in id=183. As for whether the URL ends in an extension, domain.tld/date/rather-simple-title.php or domain.tld/date/rather-simple-title.html, in my humble opinion there's no impact on either search engines or your visitors.
Pure theory and a rather opinion-based conversation here, like the question.

PHP Get method without?

Hey, so I'm working on a website and one part of it allows you to look up a user based on their name. At the moment I have it using a $_GET request, so the link looks like:
http://website.com/p?name=John+Smith
How would I be able to remove that ?name= part? I see a lot of sites doing things like:
http://website.com/p/John+Smith
How would I achieve this? To my knowledge there aren't any other form request types, only POST and GET.
URL rewriting is definitely what you're looking to do. It's well worth playing carefully with it but lots of testing is recommended. With great power comes great responsibility!
Most dynamic sites include variables in their URLs that tell the site what information to show the user. The example you provided is exactly like this.
Unfortunately, a cleaned up URL cannot be easily understood by a server without some work. When a request is made for the clean URL, the server needs to work out how to process it so that it knows what to send back to the user. URL rewriting is the technique used to "translate" a URL like the last one into something the server can understand.
To accomplish this, you first need to create a text file called ".htaccess" to contain the rules. This would be placed in the site's root directory. To tell the server to rewrite a URL pattern, you need to add the following to the file:
# Turn on the rewriting engine
RewriteEngine On
# Rewriting rule here
RewriteRule ^p/([A-Za-z+]+)$ /p/?name=$1 [NC,L]
The NC bit denotes case insensitive URLs and the L indicates this is the last rule that should be applied before attempting to access the final URL.
You can do quite a bit with this one rule, but the specifics extend far beyond the space of my answer here.
https://www.addedbytes.com/articles/for-beginners/url-rewriting-for-beginners/
I would highly suggest reading that thorough guide to help you on your quest!
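If you want to see what a rule of this kind captures before touching the server, the pattern can be tried in PHP with preg_match. This is only a sketch: it assumes a pattern with a capture group, ^p/([A-Za-z+]+)$, because $1 in the substitution refers to the first parenthesised group, and extractName is a made-up helper name.

```php
<?php
// Assumed pattern with a capture group so that $1 has something to refer
// to. "#" delimiters avoid escaping "/"; the i flag plays the role of [NC].
$pattern = '#^p/([A-Za-z+]+)$#i';

// In .htaccess, mod_rewrite matches the path without the leading slash,
// so the test paths below omit it too.
function extractName(string $path, string $pattern): ?string
{
    return preg_match($pattern, $path, $m) === 1 ? $m[1] : null;
}

var_dump(extractName('p/John+Smith', $pattern)); // string(10) "John+Smith"
var_dump(extractName('p/', $pattern));           // NULL

// Query-string decoding turns "+" into a space, as urldecode does.
var_dump(urldecode('John+Smith'));               // string(10) "John Smith"
```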

Custom URL query arguments or not?

I'm redirecting all requests to index.php, which parses the URL and fires the appropriate controller based on it.
Is it a good idea to change the way query arguments appear in the URL?
For example http://site.com/somepage/sub-subpage/page=20,offset=100, and then parse those arguments and pass them to the controller, because it looks more readable.
Or should I stick with the $_GET approach, like http://site.com/somepage/sub-subpage/?page=20&offset=100
Not a very good idea, because you'll have to implement query parsing yourself. I see no advantages to doing it this way. If you use the standard ?name=val&name=val notation you get:
Automatic parsing and storage in $_GET[]
The possibility to start using POST in no time.
Fewer possible vulnerabilities in parsing. At least they are known.
Stick with standards; it's better to stick with $_GET.
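To make the difference concrete, here is a small sketch comparing the two. parse_str is the built-in that applies the same parsing PHP uses for $_GET; parseCustomArgs is a hypothetical helper showing the kind of code you would have to write and maintain yourself for the comma-separated form.

```php
<?php
// Standard notation: PHP already does this for $_GET; parse_str applies
// the same parsing to an arbitrary string.
parse_str('page=20&offset=100', $standard);

// Custom "page=20,offset=100" notation: hand-written parsing is needed.
// parseCustomArgs is a hypothetical helper, not a built-in.
function parseCustomArgs(string $segment): array
{
    $args = [];
    foreach (explode(',', $segment) as $pair) {
        if ($pair === '') {
            continue;
        }
        // A limit of 2 keeps any "=" inside the value intact.
        [$name, $value] = array_pad(explode('=', $pair, 2), 2, '');
        $args[rawurldecode($name)] = rawurldecode($value);
    }
    return $args;
}

$custom = parseCustomArgs('page=20,offset=100');
var_dump($standard === $custom); // bool(true)
```

Even this short version has to decide what to do about empty pairs, missing values and commas inside values, which is exactly the work the standard notation saves you.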
YAGNI - You ain't gonna need it. Don't think too much, just do it. Apart from a matter of taste (someone might say "I dislike question marks in my URLs"), there is a lot of benefit in just using the common format, which simply works and for which many parsers and functions exist. Additionally, you can find documentation to refer to if you're unclear about the format.
It's better to have nice URLs (so-called SEO-friendly URLs) even if you don't care about Google or it's an admin area.
The reasons for using nice URLs are:
They are more readable.
You can change a parameter by hand.
When you paste one into an email, IM or other media, the URL makes sense.
Ugly URLs make it difficult to read the actual values. Sometimes you need them.
When you look at the address bar it looks nice and clean and you know where you are.
With ugly URLs you don't know where you are. Each URL seems like the middle of nowhere.
Creating clean URLs with a little help from mod_rewrite is not tough.
Rewrite all URIs to your index.php as /index.php/REQUESTED_URL
In index.php, just parse the URL and invoke the controller.
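The parsing step in index.php could be sketched like this. It assumes the /index.php/REQUESTED_URL form described above; the function name parseRequest and the "home" default controller are my own placeholders.

```php
<?php
// Splits a request path of the form /index.php/REQUESTED_URL into a
// controller name and its arguments. The "home" default is an assumption.
function parseRequest(string $requestUri): array
{
    // Keep only the path, dropping any ?query part.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = '';
    }
    // Remove the leading /index.php prefix if present.
    $path = preg_replace('#^/index\.php#', '', $path);
    // Split on "/" and discard empty segments.
    $segments = array_values(array_filter(
        explode('/', $path),
        fn($s) => $s !== ''
    ));
    $controller = array_shift($segments) ?? 'home';
    return [
        'controller' => $controller,
        'args'       => array_map('rawurldecode', $segments),
    ];
}

print_r(parseRequest('/index.php/user/clara?tab=posts'));
```

In a live script you would pass $_SERVER['REQUEST_URI'] and then map the controller name to a class or function through a whitelist.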
I think this is only interesting for SEO. Have a look at Google's opinion on this question: http://googlewebmastercentral.blogspot.com/2008/09/dynamic-urls-vs-static-urls.html
So it makes sense when this "pseudo" URL defines a page with really different content (i.e. /user/clara, /user/tom ...), but avoid putting dynamic variables like session IDs into this static form.

Changing URL with query string

I have been looking around the web for an answer to my question, but nothing I found was quite right.
So my issue is:
I'm creating my own CMS, and right now I've got an issue with the URLs. They aren't really that SEO friendly.
When I create a new page, it gets the URL index.php?page=(id). That doesn't tell much.
What I would love is for the URL to be something like www.myurl.com/home instead of page=id. Is this possible?
I have to mention that I need the ID number later on for editing the pages. I'm relying on GET to be able to edit my pages and to show them one by one.
Thanks. :o)
Try setting your .htaccess file to the following:
RewriteEngine On
RewriteRule ^([^/]*)\.html$ index.php?page=$1 [L,NS]
This way you can translate what visitors see as yourdomain.com/home.html into what PHP reads as yourdomain.com/index.php?page=home. Afterwards you can of course use a translation array containing your IDs:
$translationArray = array("home" => 1, "contact" => 2);
$id = $translationArray[ $_GET['page'] ]; // $id now contains 1 when page is "home"
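Since the ID is still needed for editing while the links should show the name, the lookup can go both ways. A possible sketch follows; the slugs, IDs and the function names pageId and pageUrl are placeholders, and in a CMS the map would more likely come from the database.

```php
<?php
// Same idea as the translation array: slug => page ID.
// The slugs and IDs here are placeholders.
$slugToId = ['home' => 1, 'contact' => 2];

// Looks up the ID for a slug taken from the URL; null means "no such page".
function pageId(array $slugToId, string $slug): ?int
{
    return $slugToId[$slug] ?? null;
}

// Builds the public URL for a page ID, for use when generating links.
function pageUrl(array $slugToId, int $id): ?string
{
    $slug = array_search($id, $slugToId, true);
    return $slug === false ? null : "/$slug.html";
}

var_dump(pageId($slugToId, 'home')); // int(1)
var_dump(pageUrl($slugToId, 2));     // string(13) "/contact.html"
var_dump(pageId($slugToId, 'nope')); // NULL
```

Returning null for an unknown slug lets index.php answer with a 404 instead of raising an undefined-index notice.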
What you're looking for is called Semantic URLs. Other keywords that will aid you: .htaccess, mod_rewrite
A full solution is too complicated to expand upon here but the underlying idea is fairly simple.

URL rewriting with 3 variables

I want to do URL rewriting of my webpage. There are 2 sorts of links possible on the same page as follows:
Pagination:
http://www.xxxxx.com/dictionnaire.php?page=4
That I want to look like this:
http://www.xxxxx.com/dictionnaire/p4
Word:
http://www.xxxxx.com/dictionnaire.php?idW=675&word=Resto-basket
That I want to look like this:
http://www.xxxxx.com/dictionnaire/675/Resto-basket
In the .htaccess, I have the following:
RewriteRule ^dictionnaire/p([0-9]+)?$ dictionnaire.php?page=$1 [NC,L]
RewriteRule ^dictionnaire/([0-9]+)/([a-z-]+)$ dictionnaire.php?idW=$1&word=$2 [NC,L]
QUESTIONS:
Is this the best google friendly way of doing this? (mostly for the word link, or is there better?)
Can you have 2 rewrite rules for one link? Like above?
Is there an error in my code? If so, please help.
When I created this code, my CSS and images weren't appearing. Can you help me fix it?
I know it's a long question, but I thought it would be easier that way.
Thanks for the help.
Is this the best google friendly way of doing this? (mostly for the word link, or is there better?) Well, you could use page4 in the pagination part; either way it won't really affect things much. For the word page, why do you find words by ID instead of by the actual word? The word=(WORD) part doesn't really seem to do anything at all. Perhaps remove the ID entirely and search by word, so that the URL could be dictionnaire/word/(WORD) instead.
Can you have 2 rewrite rules for one link? Like above? Yes, it is completely possible to have two or more rewrite rules (think of the many forums that do this).
Is there an error in my code? If so, please help. I haven't looked hard, but there don't appear to be any errors.
When I created this code, my CSS and images weren't appearing. Can you help me fix it? The problem here is that the browser resolves a relative path against the URL in the address bar, so for a page at /dictionnaire/p4 it requests /dictionnaire/css.file.css. It isn't looking in your root directory as I suppose you want. Use a root-relative path to your CSS file, starting with a / at the beginning.
This should be "Google friendly".
Yes, rewrite rules are applied in order; as soon as one matches and replaces the URL, it's unlikely any further ones will match (since they'll be working on the replaced URL of the previous one).
Looks OK to me. You could make it into one rule though, if you allowed the destination URLs to be a little different (to both use page= rather than idW= on the second one).
That's because the browser will ask for resources relative to the non-rewritten URL (it of course doesn't know about what's going on behind the scenes). You'll have to use absolute URLs for your images and CSS (or alternatively, use ../ in the URLs, or add more rewrite rules for your resources to make them work).
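To illustrate why the relative paths break, here is a simplified sketch of how a relative reference is resolved against the page's path. The function resolveRelative is made up for illustration and ignores "..", query strings and hosts; the css/style.css path is a placeholder.

```php
<?php
// Simplified sketch of how a browser resolves a relative reference against
// the path of the page URL. Assumes $pagePath starts with "/".
function resolveRelative(string $pagePath, string $ref): string
{
    if ($ref !== '' && $ref[0] === '/') {
        return $ref; // root-relative: independent of the page path
    }
    // Drop everything after the last "/" of the page path, then append.
    $dir = substr($pagePath, 0, strrpos($pagePath, '/') + 1);
    return $dir . $ref;
}

echo resolveRelative('/dictionnaire.php', 'css/style.css'), "\n";
// /css/style.css
echo resolveRelative('/dictionnaire/675/Resto-basket', 'css/style.css'), "\n";
// /dictionnaire/675/css/style.css
echo resolveRelative('/dictionnaire/675/Resto-basket', '/css/style.css'), "\n";
// /css/style.css
```

The same relative href that worked from /dictionnaire.php points somewhere else once the page lives at a deeper path, while the version starting with / resolves the same way from anywhere.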
Hope that helps.
EDIT: Sorry, because you have that [L] at the end of your rules, it will stop trying to match any more rules once it matches one. This won't have any practical effect in this case.
