What are the pros for using extension-less URLs?
For example, why should I change...
http://yoursite.com/mypage.html
http://yoursite.com/mypage.php
http://yoursite.com/mypage.aspx
to...
http://yoursite.com/mypage
And is it possible to have extension-less URLs for every page?
Update:
Are extension-less URLs better for site security?
The reason for extension-less URLs is that they are technology-independent. If you want to change how your content is rendered, you do not have to change the URL.
W3: Cool URIs don't change
File name extension
This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension....
Conclusion
Keeping URIs so that they will still be around in 2, 20 or 200 or even 2000 years is clearly not as simple as it sounds. However, all over the Web, webmasters are making decisions which will make it really difficult for themselves in the future. Often, this is because they are using tools whose task is seen as to present the best site in the moment, and no one has evaluated what will happen to the links when things change. The message here is, however, that many, many things can change and your URIs can and should stay the same. They only can if you think about how you design them.
It's mostly done for aesthetic purposes.
There is a very minor potential security benefit (a user doesn't immediately know what language the backend code is written in) but this is negligible.
A related blog post.
People claim it makes for better SEO, even if I am not personally convinced of that. Many clients request these extension-less URLs nowadays, so it's just as well that it can be easily achieved.
If you are running IIS 7, you can switch the AppPool to run on the Integrated Pipeline, thereby removing the need to have specific extensions mapped to the ASP.NET engine. Once that is done, you can instruct Sitecore to use extension-less URLs through this web.config setting (assuming Sitecore 6):
<linkManager defaultProvider="sitecore">
  <providers>
    <clear />
    <!-- addAspxExtension is set to true by default -->
    <add name="sitecore" type="Sitecore.Links.LinkProvider, Sitecore.Kernel"
         addAspxExtension="false"
         alwaysIncludeServerUrl="false"
         encodeNames="true"
         languageEmbedding="asNeeded"
         languageLocation="filePath"
         shortenUrls="true"
         useDisplayName="false" />
  </providers>
</linkManager>
And you're set.
Be aware that early versions of Sitecore 6 had a few issues when running Integrated Pipeline. More information can be found here.
As stated, one advantage is that you do not tie URLs to a specific technology or language. Another is that it lets you manage the output format from within the application if you wish to do so.
But this is relevant only within a "routed" code framework, where you would basically attach url routes to code.
For instance, in my code library, you can specify the allowed output format of an url by
1) Setting an Accept header in the HTTP header
2) Attaching a valid extension to the URL
So the code for /my/simple/url.html, /my/simple/url.xml and /my/simple/url.json is exactly the same. The output manager is responsible for outputting the content in the proper format.
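The two ways of choosing a format described above could be sketched roughly as follows. This is not the answerer's library: the function name `resolveFormat`, the list of supported formats and the fallback to HTML are all assumptions made for illustration, and the Accept header handling deliberately ignores q-values.

```php
<?php
// Hypothetical helper (not from any real library): split a request path into
// the route and the output format. An extension on the URL wins, then the
// Accept header is consulted, then HTML is used as a default.
function resolveFormat(string $path, string $accept = ''): array
{
    // Assumed set of supported formats and their media types.
    $formats = [
        'html' => 'text/html',
        'xml'  => 'application/xml',
        'json' => 'application/json',
    ];

    // 1) A known extension attached to the URL selects the format directly.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (isset($formats[$ext])) {
        $route = substr($path, 0, -(strlen($ext) + 1)); // strip ".ext"
        return [$route, $ext];
    }

    // 2) Otherwise pick the first supported media type named in the Accept
    //    header. Simplification: q-values (preferences) are ignored.
    foreach ($formats as $name => $mime) {
        if (stripos($accept, $mime) !== false) {
            return [$path, $name];
        }
    }

    // 3) Nothing matched: fall back to HTML.
    return [$path, 'html'];
}

// All three URLs resolve to the same route, so the same code handles them.
list($route, $format) = resolveFormat('/my/simple/url.json');
```

With this split, the routing code only ever sees `/my/simple/url`, and the format is passed separately to whatever renders the output.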
So if you change the underlying technology, you can still keep the same URL pattern in the new version of your application.
From there, since you are parsing the URL within your own code to extract the data, you generally have the opportunity to make SEO-friendly URLs, i.e. URLs that are more meaningful for search engine indexing. You can then define more meaningful URL patterns within your web application structure.
Because the user does not need to know the technology behind a page.
Example: domain.com/Programs/Notepad
The only thing I can think of is that it makes URLs easier for the end user to remember and type; other than that I don't see any reason. I also ran this by our admin, who says some people cite SEO, but if he were to use it, he would use it for a level of security.
hi
I am working on a great website (a social network in PHP) and I've decided to create only one PHP page (index.php). This page will contain PHP if conditions and statements based on the $_GET value, and will display the required page (but all within the same index.php).
This means that the code(javascript+xhtml+php) will be very huge (nearly all the project in one page).
I will also use .htaccess to rewrite the URLs of those pages to avoid any malicious requests (so it will appear just like a normal website).
But, before doing so, I just want to know about the advantages and downsides of this technique, seeing it from all other sides (security, server resources, etc...)
thank you
I think what you're trying to do is organize your code properly and effectively, which I commend.
However if I understand correctly, you're going to put all of your javascript, html, and PHP in one file, which is really bad. You want your code to be modular, not lumped together in a single file.
I think you should look into using a framework (e.g. Zend). PHP frameworks are specifically designed to help your code remain organized, modular, and secure. Your intent (organizing your code effectively) is great, but your idea for how to organize your code isn't very good. If you're absolutely adamant about not using a framework (for example, if this is a learning/school project), you should at least make sure you're following best practices.
This approach is not good because of server resource usage. In order to get access to say jQuery.js your web server is going to:
Determine that jQuery.js actually passes through index.php
Pass index.php through the php parser
Wait for php to generate a response.
Serve that response.
Or it could serve the file like this:
Determine jQuery.js exists in /var/www/mysite/jQuery.js
Serve it as the response.
Likewise for anything that's "static", i.e. isn't generated by PHP directly. The more ifs there are in the PHP script, the more tests need to be done to find your file.
You do not need to pass your static content through some form of URL routing; only your dynamic content. For real speed, it's also better to generate responses ahead of time and store them, which is called caching, particularly if the dynamic content is expensive to generate in terms of CPU cycles. Other caching techniques include keeping frequently accessed database data in memory, which is what memcached does.
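As a rough illustration of the caching idea, here is a minimal file-based sketch. The function name `cachedResponse`, the cache file naming and the use of the system temp directory are assumptions for the example; a real site would more likely rely on a framework's cache layer or on memcached.

```php
<?php
// Hypothetical helper: return a stored copy of an expensive response while it
// is still fresh, otherwise generate it once and store it for later requests.
function cachedResponse(string $key, int $ttlSeconds, callable $generate, ?string $cacheDir = null): string
{
    // Assumption: the system temp directory is an acceptable cache location.
    $cacheDir = $cacheDir ?? sys_get_temp_dir();
    $file = $cacheDir . DIRECTORY_SEPARATOR . 'page_' . md5($key) . '.cache';

    // Serve the stored copy while it is younger than the time-to-live.
    if (is_file($file) && (time() - filemtime($file)) < $ttlSeconds) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache miss (or unreadable file): do the expensive work and store it.
    $body = $generate();
    file_put_contents($file, $body, LOCK_EX);
    return $body;
}

// Usage: the closure stands in for an expensive page build.
$html = cachedResponse('/profile/42', 300, function () {
    return '<h1>Profile 42</h1>';
});
```

The second request for the same key within five minutes reads the file instead of running the closure again.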
If you're developing a social network, these things really do matter. Heck, Facebook wrote a PHP-to-C++ compiler to save clock cycles.
I second the framework recommendation because it really will make code organisation easier and might integrate with a caching-based solution.
In terms of PHP frameworks, there are many. Here's a list of many web application frameworks in many languages and from the same page, the PHP ones. Take a look and decide which you like best. That's what I did and I ended up learning Python to use Django.
I came across this question while searching, and since the best answer is old, here is a more modern one, taken from this question:
Why use a single index.php page for entire site?
A front controller (index.php) ensures that everything that is common to the whole site (e.g. authentication) is always correctly handled, regardless of which page you request. If you have 50 different PHP files scattered all over the place, it's difficult to manage that. And what if you decide to change the order in which the common library files get loaded? If you have just one file, you can change it in one place. If you have 50 different entry points, you need to change all of them.
Someone might say that loading all the common stuff all the time is a waste of resources and you should only load the files that are needed for this particular page. True. But today's PHP frameworks make heavy use of OOP and autoloading, so this "waste" doesn't exist anymore.
A front controller also makes it very easy for you to have pretty URLs in your site, because you are absolutely free to use whatever URL you feel like and send it to whatever controller/method you need. Otherwise you're stuck with every URL ending in .php followed by an ugly list of query strings, and the only way to avoid this is to use even uglier rewrite rules in your .htaccess file. Even WordPress, which has dozens of different entry points (especially in the admin section), forces most common requests to go through index.php so that you can have a flexible permalink format.
Almost all web frameworks in other languages use single points of entry -- or more accurately, a single script is called to bootstrap a process which then communicates with the web server. Django works like that. CherryPy works like that. It's very natural to do it this way in Python. The only widely used language that allows web applications to be written any other way (except when used as an old-style CGI script) is PHP. In PHP, you can give any file a .php extension and it'll be executed by the web server. This is very powerful, and it makes PHP easy to learn. But once you go past a certain level of complexity, the single-point-of-entry approach begins to look a lot more attractive.
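To make the front controller idea concrete, here is a minimal sketch of an index.php that maps pretty URLs to handlers. The `dispatch` function and the route table are hypothetical; real frameworks use controller classes and an autoloader instead of inline closures.

```php
<?php
// Hypothetical front controller core: look up the request path in a route
// table and call the matching handler.
function dispatch(string $requestUri, array $routes): string
{
    // Anything common to the whole site (sessions, authentication, logging)
    // would run here, once, for every request.

    // Drop the query string and trailing slash so "/blog/?page=2" matches "/blog".
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $path = '/' . trim($path, '/');

    if (isset($routes[$path])) {
        return $routes[$path]();
    }
    return '404 Not Found';
}

// Assumed route table: URL path => handler.
$routes = [
    '/'     => function () { return 'home page'; },
    '/blog' => function () { return 'blog index'; },
];

// In index.php this would typically be:
// echo dispatch($_SERVER['REQUEST_URI'], $routes);
```

Adding a page then means adding one entry to the route table, while the shared setup stays in a single place.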
It will be a hell of a mess.
You also won't be able to upgrade parts of the website or work on them without messing with the whole thing.
You will not be able to apply some programming architecture like MVC.
It could theoretically be faster, because you have only one file that needs to be fetched from disk, but only under the assumption that all or at least almost all the code is going to be executed.
In practice you will have to load and compile the whole file for every single request, including the parts that are not needed, so it will slow you down.
What you however CAN do is have a single point of entry where all requests originate from. That helps controlling a lot and is called a bootstrap file.
But most importantly:
Why would you want that?
From what I know most CMSes (and probably all modern ones) are made so that the requested page is the same index.php, but that file is just a dispatcher to other sections. The code is written properly in different files that are built together with includes.
Edit: If you're afraid your included scripts are vulnerable, the solution is trivial: put them outside of the web root.
Simplistic example:
<?php
/* This folder shouldn't even be in the site root,
   it should be in a totally different place on the server
   so there is no way someone could request something from it */
$safeRoot = '/path/to/safe/folder/';
include $safeRoot . 'all_pages_need_this.php'; // aka The Bootstrap

// Read the parameter safely; a missing value falls through to the 404 module.
$page = isset($_GET['page']) ? $_GET['page'] : '';

switch ($page) {
    case 'home':
        include $safeRoot . 'home.module.php';
        break;
    case 'blog':
        include $safeRoot . 'blog.module.php';
        break;
    case 'store':
        include $safeRoot . 'store.module.php';
        break;
    default:
        include $safeRoot . '404.module.php';
}
This means that the code(javascript+xhtml+php) will be very huge (nearly all the project in one page).
Yes and it'll be slow.
So you're not going to have any HTML caching?
It's all purely in one file, hard to update and slow to interpret? geesh, good luck.
What you are referring to is called single point of entry and is something many web applications (most notably the ones built following the MVC pattern) use.
The code of your point of entry file doesn't have to be huge as you can simply include() other files as needed. For example:
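<?php
// Read the requested module once; a missing parameter matches nothing.
$module = isset($_GET['module']) ? $_GET['module'] : '';

if ($module == 'messages') {
    include('inbox.php');
}
if ($module == 'profile') {
    include('profile.php');
}
// ...and so on for the other modules.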
I am taking over an existing PHP project. I noticed that the previous developer uses a single index.php page for the entire site, currently 10+ pages. This is the second project that I have seen done like this. I don't see the advantage of this approach. In fact, it seems to overcomplicate everything, because now you can't just add a new page to the site and link to it. You also have to update the main index page with an if clause that checks for that page type and then loads the page. If they are just trying to reuse a template, it seems it would be easier to use includes for the header and footer and then create each new page with those files referenced.
Can someone explain why this approach would be used? Is this some form of an MVC pattern that I am not familiar with? PHP is a second language so I am not as familiar with best practices.
I have tried doing some searches in Google for "single index page with php" and things like that but I can not find any good articles explaining why this approach is being used. I really want to kick this old stuff to the curb and not continue down that path but I want to have some sound reasoning before making the suggestion.
Having a single index.php file in the public directory can also limit the damage if the PHP interpreter stops working. A lot of frameworks use the index.php file only to include a bootstrap file that lives outside of the doc root. If PHP goes down and the server starts sending files as plain text, the user will be able to see the source code of this single file rather than the entire codebase.
Well, if the only thing that changes is the URL, It doesn't seem like it's done for any reason besides aesthetic purposes...
As for me - single entry point can help you to have better control of your application: it helps to handle errors easily, route requests, debug application.
A single "index.php" is an easy way to make sure all requests to your application flow through the same gate. This way when you add a second page you don't have to make sure bootstrapping, authentication, authorization, logging, etc are all configured--you get it for free by merit of the framework.
In modern web frameworks this could be using a front controller but it is impossible to tell since a lot of PHP code/developers suffer from NIH syndrome.
Typically such approaches are used when the contents of the pages are determined by database contents. Thus all the work would get done in a single file. This is seen often in CMS systems.
I am working on a project that uses PHP, AS3, and AMFPHP.
The project allows users to upload and download images, among other things. Since I am fairly new to PHP/Flash security, I have been trying to gather as much info as I can about making things as secure as possible. I've got some good advice about using .htaccess files, and a few other tricks.
My main question at the moment is how to hide the "path" info to and from the PHP scripts and assets, and to and from the AMFPHP services.
Currently I have all the paths hard-coded in one .as file that returns an object with the paths to any of the other classes that need or request it. I found this method to work well, since I only need to change this one .as file and it will branch out to the other classes that need it.
I'm not super worried about others decompiling my code, and they could probably "sniff" out the paths if they really wanted. I'm mostly concerned about giving others easy access to all of my AMFPHP services or letting them into parts of the site I do not wish them to be in. Basically, I realize that things aren't going to be 100% secure regardless, but I would like to take precautions.
So my main question is ...
What's the best and simplest way to obscure or hide the paths being used in a PHP/AS3 project? I entertained the possibility of PHP includes or even a SQL database if need be. I'd rather not spend a bunch of time and money on questionable obfuscation software, unless there's a tried and true (and inexpensive) one for Flash (not Flex). I currently do not have SSL, but I don't know how critical or common this is.
As you've noted, anyone could find out your paths by using Wireshark to watch traffic sent to your site, or a Flash decompiler to look at your source code and find the links directly.
I don't think it sounds worth the trouble to try to hide your paths, since all it would be adding is a slight layer of obscurity. Anyone interested could figure it out with relatively little effort, but the average person would have no clue whatsoever about how to make an AMF call to one of your services. Instead, I'd concentrate on making your AMFPHP functions themselves as secure as possible.
You could use mod_rewrite rules (with Apache, e.g. in an .htaccess file) to remove or change the file extensions for your pages. For example, this rule serves bob.php when bob.html is requested:
RewriteEngine on
RewriteRule ^bob\.html$ bob.php [L]
See http://www.workingwith.me.uk/articles/scripting/mod_rewrite for more examples.
This would not change the links hardcoded in flash but could make them less obvious to a user.
If you are using Windows then you can use OBFU to obfuscate your Flash code. It is expensive but very secure. There are a few open source alternatives, but they are not as secure.
See http://tech.motion-twin.com/obfu.html
But what Code Duck is saying is correct in that there is no way to completely protect it.
Would using a central "page handler" affect SEO negatively?
E.g. a page request comes in for www.mysite.com/index.php, which mod_rewrite passes on as www.mysite.com/handler.php?page=index. handler.php gathers the page-specific includes, language files and templates, and outputs the resultant HTML.
My understanding is that the page handler method won't be any different SEO-wise than serving index.php directly, as the content and publicly visible url remain the same regardless of the monkey-business going on behind-the-scenes, but I've been wrong before... :)
Search engines can only see the end HTML result. They have no idea if you're using a central page handler - how would they without hacking into your site's FTP?
Also, as many frameworks and CMSes use this technique (Drupal and WordPress come to mind immediately), Google et al. would be lunatics to penalise it, even if they could detect it.
Because mod_rewrite happens within the server, the requester will only see that they requested index.php and got a response. Without a redirect, the requester will only know that index.php exists.
Many content management systems use this method. While in Drupal every page is actually served by the request /index.php?q=request/path through mod_rewrite, any links on the site will be seen as /request/path, with the requester oblivious that they are all passed through one PHP script. There are also modules that redirect the ?q= path to the "clean path", telling the requester that the path with a query is invalid or doesn't exist.
A well formed URI is a bonus when it comes to SEO. It helps indexing. Consider that there are sites like PRWeb.com that sell you URI space. Not subdomains, but URI keywords.
Also, while many customers merely want to mouse around, astute web users are impressed with an intuitive URI pattern. If you chop the filename off a path, you should get something logical, like a homepage or an index page, not an error screen.
If your application will eventually be statically cached, you want to be able to leverage the file system. So if you have content that will publish well in a static form, I wouldn't hide it behind a convoluted query string.
Also, when conducting web analytics, having an easily parsed URI certainly helps you craft your reports.
Your URI doesn't have to correspond to your filesystem. REST-style APIs commonly use paths as a way to divide up areas of the API. Your application might leverage some pathing in the URI as a way to separate features. The same goes for access control: if you want to restrict Googlebot, for instance, it doesn't make a lot of sense to put ?action=blah in a robots.txt file. It expects paths and file globs.
Apache mod_rewrite is awesome. I love it, I live it. I'd rather design in mod_rewrite to proxy a consistent URI space to a changing application codebase early, rather than use mod_rewrite as a bandage on an aging file structure or application layout.
I've seen it mentioned in many blogs around the net, but I believe it should be discussed here.
What can we do when we have an MVC framework (I am interested in ZEND) in PHP but our host does not provide mod_rewrite?
Are there any "short-cuts"? Can we transfer control in any way (so that a mapping may occur between pages)? Any ideas?
Thank you :-)
Zend Framework should work without mod_rewrite, if you can live with your URLs looking more like "/path/to/app/index.php/controller/action". If you had mod_rewrite you could do away with the "index.php" bit, but it should work without it too.
It's all a matter of setting up the routes to accept the index.php part.
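As an illustration only (this is not Zend Framework's own router), the part of the URL after index.php can be split into a controller and an action by hand. The function name `parsePathInfo` and the `index` defaults are assumptions for the sketch.

```php
<?php
// Hypothetical parser for URLs of the form /index.php/controller/action/...
// It takes the path that follows the script name and splits it into parts.
function parsePathInfo(string $pathInfo): array
{
    // Split on "/" and drop empty segments caused by leading or double slashes.
    $segments = array_values(array_filter(explode('/', $pathInfo), 'strlen'));

    return [
        'controller' => isset($segments[0]) ? $segments[0] : 'index', // assumed default
        'action'     => isset($segments[1]) ? $segments[1] : 'index', // assumed default
        'params'     => array_slice($segments, 2),
    ];
}

// When the server provides it, $_SERVER['PATH_INFO'] holds the part of the
// URL after "index.php", e.g. "/news/view/42".
$route = parsePathInfo(isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '');
```

A request for /index.php/news/view/42 would then yield the controller `news`, the action `view` and one parameter, with no rewrite rules involved.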
OK, my verdict :-): I have successfully used Zend without mod_rewrite, and as you've all said, the URLs look like site/index.php/controller/action. I knew that before posting this. I've also found a technique around the net that "pushes" 404 pages to index.php, so anything that is not a resource (e.g. CSS, images, etc.) ends up there, with one exception: POST values.
So I decided that the next time an application has to be made in the specific server, to ask politely for mod_rewrite. If the administrator can not provide it, talk with my boss or if it is for me, switch provider.
Generally, it is a shame that the PHP market is so fragmented (php4, php5, php6, mod_rewrite, mod_auth, mod_whatever), but that is another story...
mod_rewrite is almost essential in today's hosting environment, but unfortunately not everyone got the message.
Lots of the large php programs (I'm thinking magento, but most can cope) have a pretty-url fall back mode for when mod_rewrite isn't available.
URLs end up looking like www.site.com/index.php?load-this-page
They must be running some magic to grab the variable name from the $_GET variable and using it as the selector for what module/feature to execute.
On a related note, I've seen lots of messed-up URLs on the new Facebook site where it's using the #, so links look like www.new.facebook.com/home.php#/inbox/. Clearly we're not meant to see that. Note that the part after the # is never sent to the server, so it would not show up in $_SERVER['REQUEST_URI']; that kind of routing has to be handled by JavaScript in the browser.
If you can find a non-mod_rewrite way to redirect all requests to index.php (or wherever your init script is), you can, as mentioned above, use 'REQUEST_URI' to grab the portion of the address after the domain, then parse it as you like and make the request do what you want it to. This is how WordPress does it (granted, with mod_rewrite). As long as you can redirect requests to your index page while retaining the same URI, you can do what you need to process the request.
Drupal's rewrite rules translate
http://example.com/path/goes/here
into
http://example.com/index.php?q=path/goes/here
...and has logic to decide which flavor of URLs to generate. If you can live with ugly URLs, this would let you keep all the logic of a single front controller in place without relying on URL rewriting.
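The "which flavor to generate" decision can be sketched in a few lines. This is not Drupal's actual API; `buildUrl`, its `$cleanUrls` flag and the default base URL are hypothetical names for the illustration.

```php
<?php
// Hypothetical link builder: produce either a clean URL or the query-string
// fallback, depending on whether URL rewriting is available on the host.
function buildUrl(string $path, bool $cleanUrls, string $base = 'http://example.com'): string
{
    // Encode each path segment but keep the slashes readable.
    $segments = explode('/', trim($path, '/'));
    $encoded = implode('/', array_map('rawurlencode', $segments));

    if ($cleanUrls) {
        return $base . '/' . $encoded;
    }
    // Fallback flavor: everything goes through index.php explicitly.
    return $base . '/index.php?q=' . $encoded;
}

$clean = buildUrl('path/goes/here', true);
$ugly  = buildUrl('path/goes/here', false);
```

If every link on the site is produced by a function like this, switching between the two flavors is a single configuration change, and the front controller reads the path from `q` when rewriting is unavailable.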