As almost every programmer, I'm writing my own PHP framework for educational purposes. And now I'm looking at the problem with parsing URLs for MVC routing.
Framework will use friendly URLs everywhere. But the question is how to parse them in front controller. For example the link /foo/bar/aaa/bbb may mean "Call the controller's foo action bar and pass parameter aaa with value bbb. But in case someone installs a framework into the subdirectory of the domain root, the directory part should be stripped before determining controller name and action name. And I'm looking for a way to do it safely.
Also I would like to support a fallback case if URL rewriting is not supported on the server.
On different systems different sets of $_SERVER variables are defined. For example, on my local machine from the set of PATH_INFO, REQUEST_URI, REQUEST_URL, ORIG_REQUEST_URI, SCRIPT_NAME, PHP_SELF only REQUEST_URI, SCRIPT_NAME and PHP_SELF are defined. I wonder, if I can rely on them.
Mature frameworks like Symfony or ZF have some compicated algorithms of parsing URLs (at least it seemed to be so). So, I can't just take a part from there for mine.
Two workarounds:
Add config variable with url / instalation directory to your application, and strip it from $_SERVER['REQUEST_URI']
Make apache rewrite it to get variable
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) index.php?myrequest=$1 [QSA,L]
I'm currently doing the same research. But everything I see is so complicated that I'll most probably continue using mod_rewrite anyway. After all you end up with the same thing rather you use SEF with PHP or mod_rewrite with apache. Anyway I'll be monitoring this topic.. it's interesting :)
Hope the php gurus around here have some more info about this :)
Edit:
It really depends on what you want to do. For my needs I hardcoded most of the pages so they looked SEF. But something like the example below should work as well.
RewriteEngine on
RewriteRule ^/posts/([A-Za-z0-9_\-]+)/([A-Za-z0-9_\-]+)\.html$ posts.php?$1=$2 [NC]
With this example above:
http://localhost/posts/view/23
http://localhost/posts/delete/23
is equal to:
http://localhost/posts.php?view=23
http://localhost/posts.php?delete=23
It really depends on what exactly you're doing :)
The example above should be working but I haven't tested them.
I usually use the following for determining an application base URL path, assuming all your requests always goes through the same gateway script:
$base = dirname($_SERVER['PHP_SELF']);
For your second question, if you want to check if mod_rewrite is enabled, you can use:
if (in_array('mod_rewrite', apache_get_modules())) {
// rewrite is enabled
}
However, it doesn't necessarily means that RewriteEngine is enabled, so you probably should use an extra condition:
if (in_array('mod_rewrite', apache_get_modules()) &&
preg_match('/RewriteEngine +On/i', file_get_contents('/path/to/.htaccess'))) {
// rewrite is enabled and active
}
Maybe you could take PHP_SELF and remove the first n chars where n is the length of SCRIPT_NAME.
Edit: Oops... seems like you can just take PHP_SELF: http://php.about.com/od/learnphp/qt/_SERVER_PHP.htm
Related
When you edit a question on stackoverflow.com, you will be redirected to a URL like this:
https://stackoverflow.com/posts/1807421/edit
But usually, it should be
https://stackoverflow.com/posts/1807491/edit.php
or
https://stackoverflow.com/posts/edit.php?id=1807491
How was
https://stackoverflow.com/posts/1807421/edit
created?
I know that Stackoverflow.com was not created by using PHP, but I am wondering how to achieve this in PHP?
With apache and PHP, you might perform one of your examples using a mod_rewrite rule in your apache config as follows:
RewriteEngine On
RewriteRule ^/posts/(\d+)/edit /posts/edit.php?id=$1
This looks for URLs of the "clean" form, and then rewrites them so that they are internally redirected to a particular PHP script.
Quite often rules like this are used to route all requests into a common controller script, which might do something like instantiate a "PostsController" class and ask it to handle an edit request. This is a common feature of most PHP application frameworks.
It's called routing. Take a look at tutorials on the subject.
If you use a framework such as cake php it should be built in.
As #mr-euro stated you can use mod_rewrite but front controller is a far better solution.
You force every request to index.php and you write your application controlling in index.php.
You use Apache's .htaccess/mod_rewrite, and optionally a PHP file, which is the approach I like to take myself.
For the .htaccess, something like this:
RewriteEngine On
RewriteRule ^(.*)$ index.php
Then in your PHP file, you can do something like this:
The following should get everything after the first slash.
$url = $_SERVER['REQUEST_URI'];
You can then use explode to turn it into an array.
$split = explode('/', $url);
Now you can use the array to determine what to load:
if ($split[1] == 'home')
{
// display homepage
}
The array is starting from 1 since 0 will usually be empty.
It's indeed done by mod_rewrite, or with multiviews. But i prefer mod_rewrite.
First: you create a .htaccessfile with these contents:
RewriteEngine On
RewriteRule ^posts/([0-9])/(edit|delete)$ /index.php?page=posts&postId=$1&action=$2
Obvious, mod_rewrite must be enabled by your hostingprovider ;)
Using mod_rewrite this can be achieved very easily.
I am poor at this but i do know you can redirect urls using apache mod_rewrite and by touching config files. From what i remember htaccess can be used to redirect. Then internally when the user hits
http://stackoverflow.com/posts/1807421/edit it can use your page http://stackoverflow.com/edit.php?p=1807421 instead or whatever you want.
You could use htaccess + write an URI parser class.
I've always been bad at apache and used very simple solutions. Right now I have built a cms software.. but the .htaccess is starting to be a huge downsize.
I will first explain, how my friendly-urls work and look like. My language-switch is url based and always contains two characters. And it looks like this: stackoverflow.com/en/ this makes the switching really easy and since its url based.. it works well in the SEO terms. Also, if no language-id is set, then the default language will be used (stackoverflow.com/).
There are no page-ids in numbers. I have unique page-ids in text: stackoverflow.com/services.html and for SEO and folder-directories-anti-conflict purposes .html at the end..
For subpages I have "$current_page" and "$parent_page" style variables: stackoverflow.com/services/translating.html Services being the parent and translating being the current page.
Some sample code too (I nerfed it alot, so you don't think its incomplete):
RewriteRule ^(et|en|fi)\/(.+)\/(.+)\.html index.php?language=$1&pagelink=$3&parentlink=$2 [L,NC,QSA]
RewriteRule ^(.+)\/(.+)\.html index.php?language=0&pagelink=$2&parentlink=$1 [L,NC,QSA]
RewriteRule ^(.+)\.html index.php?language=0&pagelink=$1&parentlink=0 [L,NC,QSA]
How can I make the language-switch part more dynamic?
This method ..^(et|en|fi)\/.. means, that when I set up the cms, I must manually set the languages list. Best bet would be to set it somehow from the cms settings. Because, this way there are no conflicts related to folders. Is it possible global apace variable via php and then display it the .htaccess file? Something like this: ..^(LANGUAGELISTS)\/..? If this isn't possible, then next best thing would be to match 2 characters in that location and pass it as $_GET['language'].
How can I have unlimited parents dynamically?
Meaning, that the "$parent_page" is not set statically and I have unlimited children, similar to this: stackoverflow.com/services/translating/english/somesubpage.html. If that is possible, then also, how will it be used in the php, with an array?
Bounty edit
First part of the question is basically solved, unless somebody comes up with some php -> apache-array -> .htaccess way.
However, the second part of the question is still not solved. Since this is been the problem with all my projects and could possibly help somebody else in the future, I decided to add bounty to this question.
To answer your first question:
You could use RewriteRule ^([a-zA-Z]{2})([/]?)(.*)$ path/file.php?language=$1
This limits the first string to two characters and passes it on to $_GET['language']
Edit: adding RewriteCond %{REQUEST_FILENAME} !-f
and RewriteCond %{REQUEST_FILENAME} !-d will prevent conflicts with existing directories / files
Second question is much more difficult..
Update:
What Shad and toopay say is a good start in my opinion.
Using explode() to seperate levels and comparing it to the slug is quite simple.
But it's getting complicated once you want to add flexibility to the script.
function get_URL_items() {
$get_URL_items_url = $_SERVER['REQUEST_URI'];
$get_URL_items_vars = explode("/",$get_URL_items_url);
for ($get_URL_items_i = 0; $get_URL_items_i < count($get_URL_items_vars) ; $get_URL_items_i++) {
if(strlen(trim($get_URL_items_vars[$get_URL_items_i])) == 0) {
unset($get_URL_items_vars[$get_URL_items_i]);
}
}
return $get_URL_items_vars;
Let's say you you've got a website with a sub-section called "Festival" and a database filled with info for 100+ artist and you want your URLs to look like website.com/festival/<artistgenre>/<artistname>/.
You don't want to create 100+ pages in your CMS so <artistgenre> and <artistname> are some kind of wildcards.
I found it hard to achieve this without a lot of if/else statements like:
$item = get_URL_items();
if(is_user($item[2]) && is_genre($item[1]) && is_festival($item[0])) {
// do mysql stuff here
}
If I were you, I would use something like this:
.htaccess:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !/main.php$
RewriteRule ^([a-zA-Z]{2})?(.*)$ main.php?lang=$1&path=$2 [L,QSA]
main.php:
$langs = array('en','de','ru'); // list of supported languages
$currentLang = isset($_GET['lang'])&&in_array($_GET['lang']) ? $_GET['lang'] : $defaultLang; // current selected language
$path = $_GET['path']; // current path
Then, in main.php you may parser path according to your needs
In answer to your bounty question I would use this:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([A-Z]{2}\/)*(([A-Z]+\/)*)([A-Z]+)\.html$ index.php?lang=$1&parents=$2&pagelink=$4 [NC,QSA,L]
Since you want to be able to handle any number of generations/levels in your URL, have you thought about how you want to catch them in you PHP script?
You definitely don't want to be going and checking isset($_GET['parent1']);isset($_GET['parent2']) etc etc etc.
As some of the other responses have indicated, you really need to be parsing your URL inside your PHP script; to that end, my RewriteRule hands off the entire 'parents' section of the URL, which your script can then explode and parse; but doesn't interfere with normal no-parent urls. =)
I somehow think this answer won't be very popular but here goes anyway. :)
mod_rewrite reaches a point where using it the old fashioned way with regular expressions becomes annoying. I suggest you skip all the pain and swap to using an external program/script to do your rewrites. I wouldn't suggest you do rewrites on all files using this method, but instead just for the urls that most users will see and type. As long as you know how to write efficient code you can even redirect to a php script to do the rewrites (as I have done in the past on a very high traffic site) and it will not have a noticeable effect on load times. If you ever reach a point where the rewrites are the main thing slowing down your site you can then switch it out for a program written in a quicker language, however I'd be surprised if you reach that.
Some things be aware of:
You need to set a rewrite lock directive or you will get lots of crazy output.
Remember that the rewrite script is a command line PHP script. It has no knowledge of things such as the $_SERVER global. This is surprisingly easy to forget.
This script is loaded at server start so any changes to it require a server restart before they take effect.
Always test this on the command line by passing a url and checking the output before restarting the server. If your script is broken restarting the server will result in anything from non functioning rewrites to the server not starting at all.
It a bit more hassle in the beginning, but once you have set this up you will find adding new rewrite rules to be an absolute breeze and a hell of a lot more flexible.
Here is the only tutorial I was able to find on how to do this using PHP...
Using MySQL to control mod_rewrite via PHP
This is far from the standard way of doing rewrites so I imagine I'm going to cop a lot of flack for this answer. Oh well. :)
Well, for SEO part, i think its better to have slug for each article (referencing you are use this for CMS). Means in your database, you have some "translation" table which translate the requesting uri/slug and associated it with $parent_page.
When you edit a question on stackoverflow.com, you will be redirected to a URL like this:
https://stackoverflow.com/posts/1807421/edit
But usually, it should be
https://stackoverflow.com/posts/1807491/edit.php
or
https://stackoverflow.com/posts/edit.php?id=1807491
How was
https://stackoverflow.com/posts/1807421/edit
created?
I know that Stackoverflow.com was not created by using PHP, but I am wondering how to achieve this in PHP?
With apache and PHP, you might perform one of your examples using a mod_rewrite rule in your apache config as follows:
RewriteEngine On
RewriteRule ^/posts/(\d+)/edit /posts/edit.php?id=$1
This looks for URLs of the "clean" form, and then rewrites them so that they are internally redirected to a particular PHP script.
Quite often rules like this are used to route all requests into a common controller script, which might do something like instantiate a "PostsController" class and ask it to handle an edit request. This is a common feature of most PHP application frameworks.
It's called routing. Take a look at tutorials on the subject.
If you use a framework such as cake php it should be built in.
As #mr-euro stated you can use mod_rewrite but front controller is a far better solution.
You force every request to index.php and you write your application controlling in index.php.
You use Apache's .htaccess/mod_rewrite, and optionally a PHP file, which is the approach I like to take myself.
For the .htaccess, something like this:
RewriteEngine On
RewriteRule ^(.*)$ index.php
Then in your PHP file, you can do something like this:
The following should get everything after the first slash.
$url = $_SERVER['REQUEST_URI'];
You can then use explode to turn it into an array.
$split = explode('/', $url);
Now you can use the array to determine what to load:
if ($split[1] == 'home')
{
// display homepage
}
The array is starting from 1 since 0 will usually be empty.
It's indeed done by mod_rewrite, or with multiviews. But i prefer mod_rewrite.
First: you create a .htaccessfile with these contents:
RewriteEngine On
RewriteRule ^posts/([0-9])/(edit|delete)$ /index.php?page=posts&postId=$1&action=$2
Obvious, mod_rewrite must be enabled by your hostingprovider ;)
Using mod_rewrite this can be achieved very easily.
I am poor at this but i do know you can redirect urls using apache mod_rewrite and by touching config files. From what i remember htaccess can be used to redirect. Then internally when the user hits
http://stackoverflow.com/posts/1807421/edit it can use your page http://stackoverflow.com/edit.php?p=1807421 instead or whatever you want.
You could use htaccess + write an URI parser class.
My current code is something like this
store.php?storeid=12&page=3
and I'm looking to translate it to something like this
mysite.com/roberts-clothing-store/store/12/3
and something like this:
profile.php?userid=19
to
mysite.com/robert-ashcroft/user/19
I understand that it's best to have the SEO-friendly text as far left as possible, ie not
mysite.com/user/19/robert-ashcroft
(what stackoverflow does)
I can't figure out how to do this in apache's mod_rewrite.
Actually, you may have to think "upside-down" with mod_rewrite.
The easiest way is that to make your PHP emit the rewritten mysite.com/roberts-clothing-store/store/12/3 links.
mod_rewrite will then proxy the request to one PHP page for rewrite.php?path=roberts-clothing-store/store/12/3 that will decode the URL and sets the arguments (here storeid and page)
and dynamically include the correct PHP file, or just emit 301 for renamed pages.
A pure solution with mod_rewrite is possible, but this one is much easier to get right, especially when you don't master mod_rewrite.
The main prob could be with the overhead that might be significant but is the price of simplicity & flexibility. mod_rewrite is much faster
Update:
The other posts do answer the question, but they don't solve the typical duplicate-content problem that avoided by having canonical urls (and using 301/404 for all those URLs that seems ok, but aren't).
Try these rules:
RewriteRule ^[^/]+/store/([0-9]+)/([0-9]+)$ store.php?storeid=$1&page=$2
RewriteRule ^[^/]+/user/([0-9]+)/ profile.php?userid=$1
But I wouldn’t use such URLs. They don’t make sense when you think of the URL path as a hierarchy and the path segments as their levels.
RewriteRule ^roberts-clothing-store/store/([^.]+)/([^.]+)$ store.php?id=$1&page=$2
RewriteRule ^robert-ashcroft/user/([^.]+)$ profile.php?userid=$1
Then you can just use RewriteRule directive in a .htaccess like:
RewriteRule roberts-clothing-store/store/(\d+)/(\d+)$ store.php?storeid=$1&page=$2 [L]
See http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html for help, or google.
My approach is to make the .htaccess as easy as possible and to do all the hard work in PHP:
RewriteRule ^(.*?)$ index.php?$1
This basically means to take everything and reroute it to my index.php file (in css/javascript/image directories I simply use "RewriteEngine off" to grand access to these files). In PHP I than just split("/", $param, 5) the string and run a foreach() to check all the parameters. Encapsulated in a nice function this works fine for me.
Update:
For this easy case I highly recommend the use of explode() instead of using split(), because explode() doesn't come with the overhead by using regular expressions.
What's the best way to implement a URL interpreter / dispatcher, such as found in Django and RoR, in PHP?
It should be able to interpret a query string as follows:
/users/show/4 maps to
area = Users
action = show
Id = 4
/contents/list/20/10 maps to
area = Contents
action = list
Start = 20
Count = 10
/toggle/projects/10/active maps to
action = toggle
area = Projects
id = 10
field = active
Where the query string can be a specified GET / POST variable, or a string passed to the interpreter.
Edit: I'd prefer an implementation that does not use mod_rewrite.
Edit: This question is not about clean urls, but about interpreting a URL. Drupal uses mod_rewrite to redirect requests such as http://host/node/5 to http://host/?q=node/5. It then interprets the value of $_REQUEST['q']. I'm interested in the interpreting part.
If appropriate, you can use one that already exists in an MVC framework.
Check out projects such as -- in no particular order -- Zend Framework, CakePHP, Symfony, Code Ignitor, Kohana, Solar and Akelos.
have a look at the cakephp implementation as an example:
https://trac.cakephp.org/browser/trunk/cake/1.2.x.x/cake/dispatcher.php
https://trac.cakephp.org/browser/trunk/cake/1.2.x.x/cake/libs/router.php
You could also do something with mod_rewrite:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ $1 [L]
RewriteRule ^([a-z]{2})/(.*)$ $2?lang=$1 [QSA,L]
RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]
</IfModule>
This would catch urls like /en/foo /de/foo and pass them to index.php with GET parameters 'lang' amd 'url'. Something similar can be done for 'projects', 'actions' etc
Why specifically would you prefer not to use mod_rewrite? RoR uses mod_rewrite. I'm not sure how Django does this, but mod_php defaults to mapping URLs to files, so unless you create a system that writes a separate PHPfile for every possible URL (a maintenance nightmare), you'll need to use mod_rewrite for clean URLs.
What you are describing in your question should actually be the URL mapper part. For that, you could use a PEAR package called Net_URL_Mapper. For some information on how to use that class, have a look at this unit test.
The way that I do this is very simple.
I use wordpress' .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
What this .htaccess does is when something returns a 404, it sends the user to index.php.
In the above, /index.php is the "interpreter" for the URL.
In index.php, I have something along the lines of:
$req = $_SERVER['REQUEST_URI'];
$req = explode("/",$req);
The second line splits up the URL into sections based on "/". You can have
$area = $req['0'];
$action= $req['1'];
$id = $req['2'];
What I end up doing is:
function get_page($offset) {//offset is the chunk of URL we want to look at
$req = $_SERVER['REQUEST_URI'];
$req = explode("/",$req);
$page = $req[$offset];
return $page;
}
$area = get_page(0);
$action = get_page(1);
$id = get_page(2);
Hope this helps!
Just to second #Cheekysoft's suggestion, check out the Zend_Controller component of the Zend Framework. It is an implementation of the Front Controller pattern that can be used independently of the rest of the framework (assuming you would rather not use a complete MVC framework).
And obviously, CakePHP is the most similar to the RoR style.
I'm doing a PHP framework that does just what you are describing - taking what Django is doing, and bringing it to PHP, and here's how I'm solving this at the moment:
To get the nice and clean RESTful URLs that Django have (http://example.com/do/this/and/that/), you are unfortunately required to have mod_rewrite. But everything isn't as glum as it would seem, because you can achieve almost the same thing with a URI that contains the script's filename (http://example.com/index.php/do/this/and/that/). My mod_rewrite just forwards all calls to that format, so it's almost as usable as without the mod_rewrite trick.
To be truthful, I'm currently doing the latter method by GET (http://example.com/index.php?do/this/and/that/), and fixing stuff in case there are some genuine GET variables passed around. But my initial research says that using the direct slash after the filename should be even easier. You can dig it out with a certain $_SERVER superglobal index and doesn't require any Apache configuration. Can't remember the exact index off-hand, but you can trivially do a phpinfo() testpage to see how stuff look like under the hood.
Hope this helps.