URL handling – PHP vs Apache Rewrite

Currently I let a single PHP script handle all incoming URLs. This PHP script then parses the URL and loads the specific handler for that URL. Like this:
if(URI === "/")
{
require_once("root.php");
}
else if(URI === "/shop")
{
require_once("shop.php");
}
else if(URI === "/contact")
{
require_once("contact.php");
}
...
else
{
require_once("404.php");
}
Now, I keep thinking that this is actually highly inefficient and will waste a lot of processing power once my site gets more traffic. So I thought: why not do it within Apache with mod_rewrite and let Apache load the PHP script directly:
RewriteRule ^$ root.php [L]
RewriteRule ^shop$ shop.php [L]
...
However, because I have a lot of those URLs, I only want to make the change if it really is worth it.
So, here's my question: What option is better (efficiency-wise and otherwise) and why?
Btw, I absolutely want to keep the URL scheme and not simply make the scripts accessible via their actual file names (something.php).

So, here's my question: What option is better (efficiency-wise and otherwise) and why?
If every resource has to run through a PHP based check, as you say in your comment:
some resources are only available to logged in users, so I get to check cookies and login state first, then I serve them with readfile().
then you can indeed use PHP-side logic to handle things: A PHP instance is going to be started anyway, which renders the performance improvement of parsing URLs in Apache largely moot.
If you have static resources that do not need any session or other PHP-side check, you should absolutely handle the routing in the .htaccess file if possible, because you avoid starting a separate PHP process for every resource. But in your case, that won't apply.
Some ideas to increase performance:
consider whether every resource really needs to be protected by PHP-based authentication. Could style sheets or some images be public, so they skip the relatively expensive PHP process?
try combining resources into as few files as possible, e.g. by merging all style sheets into one, and using CSS sprites to reduce the number of images.
I've heard that nginx is better prepared to handle this specific kind of scenario - at least I'm told it can very efficiently handle the delivery of a file after the authentication check has been done, instead of having to rely on PHP's readfile().
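For illustration, here is a hedged sketch of that delegation idea using the X-Accel-Redirect (nginx) or X-Sendfile (Apache with mod_xsendfile) response headers; the file path and the user_is_logged_in() helper are assumptions, not code from the question:
<?php
if (user_is_logged_in()) {                                    // your existing cookie/session check
    header('Content-Type: application/pdf');
    header('X-Accel-Redirect: /protected/report.pdf');        // nginx: an "internal" location serves the file
    // header('X-Sendfile: /var/www/protected/report.pdf');   // Apache + mod_xsendfile equivalent
} else {
    header('HTTP/1.0 403 Forbidden');
}
exit;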

The PHP approach is correct but it could use a bit of improvement.
$file = basename($uri).".php"; // basename() guards against directory traversal
if (!is_file($file)) { header("HTTP/1.0 404 Not Found"); require_once("404.php"); die(); }
require_once($file);

OK, as for efficiency: an .htaccess version with a single regexp, or a PHP version with a single regexp that loads the matching file, would be faster than many .htaccess rules or a long PHP if-else chain.
Apart from that, the .htaccess and PHP approaches should be similar in efficiency in this case, probably with a small gain for .htaccess (it eliminates one require in PHP).
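A minimal sketch of that single-regexp PHP version (the pages/ directory and file names are assumptions):
<?php
$path = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
if ($path === '') {
    require 'pages/root.php';
} elseif (preg_match('/^[a-z]+$/', $path) && is_file("pages/$path.php")) {
    require "pages/$path.php";                  // one rule covers every simple page
} else {
    header('HTTP/1.0 404 Not Found');
    require 'pages/404.php';
}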

RewriteRule ^([a-z]+)$ $1.php [L]
and rename root.php to index.php.
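A slightly fuller sketch of the same rule, assuming it lives in an .htaccess in the site root (the conditions keep real files and directories reachable):
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z]+)$ $1.php [L]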

Related

User friendly URLs without htaccess

I want to create friendly URLs for my website script using only PHP. Right now I'm using the query style (e.g. index.php?location=register) and I would like to convert them to something like this:
https://www.sitename.com/index.php/Register
Right now I'm using a $_GET based function to parse and include the PHP script based on the $_GET value.
$includeDir = ".".DIRECTORY_SEPARATOR."assets/controllers".DIRECTORY_SEPARATOR;
$includeDefault = $includeDir."Home.php";
if(isset($_GET['ajaxpage']) && !empty($_GET['ajaxpage'])){
$_GET['ajaxpage'] = str_replace("\0", '', $_GET['ajaxpage']);
$includeFile = basename(realpath($includeDir.$_GET['ajaxpage'].".php"));
$includePath = $includeDir.$includeFile;
if(!empty($includeFile) && file_exists($includePath)) {
include($includePath);
}
else{
include($includeDefault);
}
exit();
}
if(isset($_GET['location']) && !empty($_GET['location']))
{
$_GET['location'] = str_replace("\0", '', $_GET['location']);
$includeFile=basename(realpath($includeDir.$_GET['location'].".php"));
$includePath = $includeDir.$includeFile;
if(!empty($includeFile) && file_exists($includePath))
{
include($includePath);
}
else
{
include($includeDefault);
}
}
else
{
include($includeDefault);
}
Kind regards!
Okay, my comment keeps growing...so I guess I'll just provide an answer...
1) This still requires server configuration. In the case of Apache, I believe the relevant option is called MultiViews. It is what allows Apache to look up the directory tree when the first path /file.php/somepage is not found; if you don't have the right configuration, it will just give a 404 error even though file.php exists. So, if your intention is to avoid the need for server configuration, it won't work.
2) What you are doing is dangerous:
$includeFile = basename(realpath($includeDir.$_GET['ajaxpage'].".php"));
All I have to do is know where some of your files are and I can potentially cause one of your PHP files to run, e.g. run your nightly cron every 5 minutes and overwhelm your server, or hit some other page that might do damage. You need some way of enforcing that only files with a certain name can be included, e.g.
$includeFile = basename(realpath($includeDir.$_GET['ajaxpage']."Controller.php"));
By forcing a suffix of Controller on the filename, you just have to make sure not to end the file name with Controller for any file you don't want to be include-able.
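A hedged sketch of that suffix idea (the directory name and the HomeController fallback are assumptions):
<?php
$includeDir  = './assets/controllers/';
$page        = isset($_GET['ajaxpage']) ? basename($_GET['ajaxpage']) : '';  // strips any path components
$includePath = $includeDir . $page . 'Controller.php';                       // only *Controller.php files are reachable
if (preg_match('/^[A-Za-z0-9_]+$/', $page) && is_file($includePath)) {
    include $includePath;
} else {
    include $includeDir . 'HomeController.php';
}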
3) There are so many MV* style frameworks out there...and there are so many security considerations, etc., that it is not always wise to create your own until you understand many or most of them. Even if you don't like them, using those frameworks will also help you learn some best practices for creating your own.
4) Finally, what in the world is the reason to avoid URL rewriting? URL rewriting is the STANDARD for both Apache and IIS to create clean URLs. There is a reason that "everybody's doing it." If it's performance, your way will actually probably be slower, because Apache first has to check whether the full path exists, then go up a directory and check whether that file exists, and so on until it hits a match, and only then open that file.
Why do you need to show index.php in the URL?
I would create my URL to look like this: https://www.sitename.com/register if you truly want clean URLs, but for that you need a rewrite.
You would need .htaccess or Apache config rules such as this:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/?$ index.php?location=$1 [L]
Then in your PHP code you can read the location variable with $_GET["location"] and load the page based on the value sent.
For this URL, $_GET["location"] would be register, and you would then display that page.
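For instance, a hedged sketch of loading that page safely (page names and paths are illustrative; mapping through a whitelist avoids including arbitrary files):
<?php
$pages = array(
    'home'     => 'assets/controllers/Home.php',
    'register' => 'assets/controllers/Register.php',
);
$location = isset($_GET['location']) ? strtolower($_GET['location']) : 'home';
$file = isset($pages[$location]) ? $pages[$location] : $pages['home'];
include $file;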
I don't suggest using MultiViews, as it can cause issues if you have files and folders with the same name, e.g. /admin and admin.php.

Show a 404 error when users access a directory on the server

I'm writing a small php site and I'm trying to structure it properly so that requests are handled centrally and there is a single point of entry to the code.
So I'm using the guide for a single entry point described here: https://thomashunter.name/blog/php-navigation-system-using-single-entry-point/
The mod_rewrite rule works well until you request a directory that actually exists on the server. I've disabled directory indexes, but now requests behave differently depending on whether there is a trailing slash in the URL. With a slash, everything behaves the way I'd like (the URL bar shows www.example.com/images/ and the server returns a 404 created by my PHP script), but without a trailing slash the URL is rewritten to www.example.com/index.php?s=images, which looks messy and exposes too much information about the structure of my site.
Seeing how it works with a trailing slash, ideally I want it to work the same way without the trailing slash. Otherwise, if it didn't work in any case, I'd settle for a simple redirect, although I don't like the idea of highlighting the real directories.
My .htaccess looks like this:
RewriteEngine On
RewriteRule ^([a-zA-Z0-9_]+)/?$ index.php?s=$1 [QSA]
Options -Indexes
ErrorDocument 403 index.php
I've also put in a header redirect but the ?s=images still gets appended to the URL (I am aware this can be written better but I'm experimenting with various combinations at the moment):
if (startsWith($_SERVER['REQUEST_URI'], "/main"))
{
    header('Location: '.'/');
    exit;
}
else if (startsWith($_SERVER['REQUEST_URI'], "/images"))
{
    header('Location: '.'/');
    exit;
}
where the definition of startsWith is (taken from this StackOverflow answer):
function startsWith($haystack, $needle)
{
    return $needle === "" || strpos($haystack, $needle) === 0;
}
Finally, I'm looking for a solution that doesn't require copying dummy index.phps into every directory as that can get difficult to maintain. Any help or guidance will be appreciated.
You want some sort of routing, either via Apache alone (see Routing URLs in PHP) or via one of the available (Apache + PHP) libraries,
like Slim: http://www.slimframework.com/
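A minimal front-controller sketch of the routing idea (not Slim itself; the route table and file names are illustrative): unknown paths, including real directories such as /images, all get the same scripted 404 page.
<?php
$routes = array(
    ''      => 'pages/home.php',
    'about' => 'pages/about.php',
);
$path = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
if (isset($routes[$path])) {
    require $routes[$path];
} else {
    header('HTTP/1.0 404 Not Found');   // real directories fall through to here as well
    require 'pages/404.php';
}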

Supersimple static file based (html) php site cache

I have a website that basically only displays things, without any forms or POST/GET handling.
This website is PHP based and hosted on shared hosting. It rarely changes.
I would like to enable caching for this website.
It's shared hosting, so I need a solution that:
does not use Memcached
doesn't need me to move my website to a VPS
doesn't use APC or other such extensions
So basically what I would like to accomplish is to cache every subpage to HTML, have PHP serve the cached HTML version of the current subpage for 5 minutes, and refresh the cache after 5 minutes.
I've been looking around for a while and there are some tutorials and frameworks that support this kind of cache.
But what I need is just one good library that is extremely easy to use.
I imagine it to work in this way:
<?php
if (current_site_cache_is_valid())
{
    display_cached_version();
    die;
}
// ... my website rendering code
?>
As simple as it sounds, I hope some good fellow developer has written a library of this kind before. So, do you know of such a ready-to-use solution that isn't very time consuming to implement?
This is how I normally do this, though I don't know your URL design or your directory / file layout.
I do it with .htaccess and mod_rewrite.
The webserver checks whether a cached HTML file exists, and if so, delivers it. You can also check its age.
If it's too old or if it does not exist, your PHP script(s?) is started. At the beginning of your script you start the output buffer (ob_start). At the end of your script, you obtain the output buffer's contents, write them into the cache file, and then output them.
The benefit of this solution is that Apache will deliver static files when they exist, and there is no need to invoke a PHP process. If you do it all within PHP itself, you won't have that benefit.
I would even go a step further and run a cron job that removes older cache files, instead of doing a time check inside the .htaccess. That done, you can make the rewrite less complex and simply prefer a .php.cached file over the .php file.
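A minimal .htaccess sketch of that check (the cache/ path and the URL pattern are assumptions): if a matching HTML file exists in the cache directory, Apache serves it directly and PHP never runs.
RewriteEngine On
RewriteCond %{REQUEST_METHOD} =GET
RewriteCond %{DOCUMENT_ROOT}/cache/$1.html -f
RewriteRule ^([a-zA-Z0-9_-]+)/?$ cache/$1.html [L]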
I have a simple algorithm for HTML caching, predicated on the following conditions:
The user is a guest (logged on users have a blog_user cookie set)
The request URI is a GET that contains no request parameters
An HTML cache version of the file exists
then an .htaccess rewrite rule kicks in, mapping the request to a cached file. Anything else is assumed to be context-specific and therefore not cacheable. Note that I use wikipedia-style URI mapping for my blog, so /article-23 gets mapped to /index.php?page=article-23 when not cached.
I use a single .htaccess file in my DOCUMENT_ROOT directory, and here is the relevant extract. It's the third rewrite rule that does what you want. Any script which generates cacheable output wraps it in an ob_start() / ob_get_clean() pair and writes out the HTML cache file (though this is all handled by my templating engine). Updates also flush the HTML cache directory as necessary.
RewriteEngine on
RewriteBase /
# ...
# Handle blog index
RewriteRule ^blog/$ blog/index [skip=1]
# If the URI maps to a file that exists then stop. This will kill endless loops
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^blog/.* - [last]
# If the request is HTML cacheable (a GET to a specific list, with no query params)
# the user is not logged on and the HTML cache file exists then use it instead of executing PHP
RewriteCond %{HTTP_COOKIE} !blog_user
RewriteCond %{REQUEST_METHOD}%{QUERY_STRING} =GET [nocase]
RewriteCond %{DOCUMENT_ROOT}/blog/html_cache/$1.html -f
RewriteRule ^blog/(article-\d+|index|sitemap.xml|search-\w+|rss-[0-9a-z]*)$ \
blog/html_cache/$1.html [last]
# Anything else relating to the blog pass to index.php
RewriteRule blog/(.*) blog/index.php?page=$1 [qsappend,last]
Hope this helps. My blog describes this in more detail. :-)
It's a while since you asked this, but as this is still gathering search hits I thought I'd give you a better answer.
You can do static caching in PHP without .htaccess or other trickery. I found this at http://simonwillison.net/2003/may/5/cachingwithphp/ :
<?php
$cachefile = 'cache/index-cached.html';
$cachetime = 5 * 60;
// Serve from the cache if it is younger than $cachetime
if (file_exists($cachefile) && time() - $cachetime < filemtime($cachefile)) {
include($cachefile);
echo "<!-- Cached copy, generated ".date('H:i', filemtime($cachefile))." -->\n";
exit;
}
ob_start(); // Start the output buffer
/* The code to dynamically generate the page goes here */
// Cache the output to a file
$fp = fopen($cachefile, 'w');
fwrite($fp, ob_get_contents());
fclose($fp);
ob_end_flush(); // Send the output to the browser
?>
Just to add a little more to nico's response to make it more useful for generic copy-and-paste use, by saving the time of typing individual cache-file names into each file.
Original:
$cachefile = 'cache/index-cached.html';
Modified:
$cachefile = $_SERVER['DOCUMENT_ROOT'].'/cache/'.pathinfo($_SERVER['SCRIPT_NAME'], PATHINFO_FILENAME).'-cached.html';
What this does is take the filename of whatever file the snippet is located in, minus the extension (.php in my case), and append the "-cached.html" suffix to form the cache file name. There are probably more efficient ways of doing this, but it works for me and hopefully saves others some time and effort.
You should give skycache a try. Edit: this project seems cool too: cacheme.
Another solution is to use auto_prepend_file/auto_append_file. Something like what's described in this tutorial: Output caching for beginners
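A hedged sketch of that approach, assuming Apache with mod_php (where php_value is honoured in .htaccess) and two hypothetical scripts: cache_start.php checks for a fresh cached copy, serves it and exits, while cache_end.php writes the output buffer to the cache file.
php_value auto_prepend_file "/var/www/cache_start.php"
php_value auto_append_file "/var/www/cache_end.php"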

Couple of questions about .htaccess and friendly urls

I've always been bad at Apache and have used very simple solutions. Right now I have built a CMS, but the .htaccess is starting to be a huge downside.
I will first explain how my friendly URLs work and what they look like. My language switch is URL based and always contains two characters. It looks like this: stackoverflow.com/en/. This makes switching really easy, and since it's URL based it works well in SEO terms. Also, if no language id is set, the default language is used (stackoverflow.com/).
There are no numeric page ids. I have unique page ids in text: stackoverflow.com/services.html, with .html at the end for SEO and to avoid conflicts with folder names.
For subpages I have "$current_page" and "$parent_page" style variables: stackoverflow.com/services/translating.html, services being the parent and translating being the current page.
Some sample code too (I simplified it a lot, so don't take it as incomplete):
RewriteRule ^(et|en|fi)\/(.+)\/(.+)\.html index.php?language=$1&pagelink=$3&parentlink=$2 [L,NC,QSA]
RewriteRule ^(.+)\/(.+)\.html index.php?language=0&pagelink=$2&parentlink=$1 [L,NC,QSA]
RewriteRule ^(.+)\.html index.php?language=0&pagelink=$1&parentlink=0 [L,NC,QSA]
How can I make the language-switch part more dynamic?
The ^(et|en|fi)\/ method means that when I set up the CMS, I must set the language list manually. The best option would be to set it somehow from the CMS settings; listing the languages explicitly also avoids conflicts with folder names. Is it possible to set a global Apache variable via PHP and then use it in the .htaccess file, something like ^(LANGUAGELIST)\/? If that isn't possible, then the next best thing would be to match two characters in that position and pass them as $_GET['language'].
How can I have unlimited parents dynamically?
Meaning that "$parent_page" is not set statically and I can have unlimited children, similar to this: stackoverflow.com/services/translating/english/somesubpage.html. If that is possible, then also: how would it be used in PHP, with an array?
Bounty edit
The first part of the question is basically solved, unless somebody comes up with some PHP -> Apache-array -> .htaccess way.
However, the second part of the question is still not solved. Since this has been a problem in all my projects and could help somebody else in the future, I decided to add a bounty to this question.
To answer your first question:
You could use RewriteRule ^([a-zA-Z]{2})([/]?)(.*)$ path/file.php?language=$1
This limits the first string to two characters and passes it on to $_GET['language']
Edit: adding RewriteCond %{REQUEST_FILENAME} !-f
and RewriteCond %{REQUEST_FILENAME} !-d will prevent conflicts with existing directories / files
The second question is much more difficult...
Update:
What Shad and toopay say is a good start in my opinion.
Using explode() to separate levels and comparing them to the slug is quite simple.
But it's getting complicated once you want to add flexibility to the script.
function get_URL_items() {
    $get_URL_items_url = $_SERVER['REQUEST_URI'];
    $get_URL_items_vars = explode("/", $get_URL_items_url);
    for ($get_URL_items_i = 0; $get_URL_items_i < count($get_URL_items_vars); $get_URL_items_i++) {
        if (strlen(trim($get_URL_items_vars[$get_URL_items_i])) == 0) {
            unset($get_URL_items_vars[$get_URL_items_i]);
        }
    }
    return $get_URL_items_vars;
}
Let's say you've got a website with a sub-section called "Festival" and a database filled with info for 100+ artists, and you want your URLs to look like website.com/festival/<artistgenre>/<artistname>/.
You don't want to create 100+ pages in your CMS so <artistgenre> and <artistname> are some kind of wildcards.
I found it hard to achieve this without a lot of if/else statements like:
$item = get_URL_items();
if(is_user($item[2]) && is_genre($item[1]) && is_festival($item[0])) {
// do mysql stuff here
}
If I were you, I would use something like this:
.htaccess:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !/main.php$
RewriteRule ^([a-zA-Z]{2})?(.*)$ main.php?lang=$1&path=$2 [L,QSA]
main.php:
$langs = array('en','de','ru'); // list of supported languages
$defaultLang = 'en';            // fallback when no valid language is given
$currentLang = isset($_GET['lang']) && in_array($_GET['lang'], $langs) ? $_GET['lang'] : $defaultLang; // current selected language
$path = $_GET['path']; // current path
Then, in main.php you can parse the path according to your needs.
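For example, a minimal sketch of that parsing, assuming URLs like /en/services/translating.html as in the question (variable names are illustrative):
$segments = array_values(array_filter(explode('/', $path), 'strlen'));
$current  = count($segments) ? array_pop($segments) : 'index.html';
$pagelink = preg_replace('/\.html$/', '', $current);  // current page, e.g. "translating"
$parents  = $segments;                                // remaining parents, e.g. array('services')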
In answer to your bounty question I would use this:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([A-Z]{2}\/)*(([A-Z]+\/)*)([A-Z]+)\.html$ index.php?lang=$1&parents=$2&pagelink=$4 [NC,QSA,L]
Since you want to be able to handle any number of generations/levels in your URL, have you thought about how you want to catch them in your PHP script?
You definitely don't want to be going and checking isset($_GET['parent1']);isset($_GET['parent2']) etc etc etc.
As some of the other responses have indicated, you really need to parse your URL inside your PHP script; to that end, my RewriteRule hands off the entire 'parents' section of the URL, which your script can then explode and parse, while not interfering with normal no-parent URLs. =)
I somehow think this answer won't be very popular but here goes anyway. :)
mod_rewrite reaches a point where using it the old-fashioned way with regular expressions becomes annoying. I suggest you skip all the pain and switch to using an external program/script to do your rewrites. I wouldn't do rewrites on all files this way, but only for the URLs that most users will see and type. As long as you write efficient code, you can even delegate the rewrites to a PHP script (as I have done in the past on a very high traffic site) and it will not have a noticeable effect on load times. If you ever reach a point where the rewrites are the main thing slowing down your site, you can then switch it out for a program written in a faster language, but I'd be surprised if you reach that.
Some things to be aware of:
You need to set a rewrite lock directive (RewriteLock) or you will get lots of crazy output.
Remember that the rewrite script is a command line PHP script. It has no knowledge of things such as the $_SERVER global. This is surprisingly easy to forget.
This script is loaded at server start so any changes to it require a server restart before they take effect.
Always test this on the command line by passing a url and checking the output before restarting the server. If your script is broken restarting the server will result in anything from non functioning rewrites to the server not starting at all.
It's a bit more hassle in the beginning, but once you have set this up you will find adding new rewrite rules an absolute breeze and a hell of a lot more flexible.
Here is the only tutorial I was able to find on how to do this using PHP...
Using MySQL to control mod_rewrite via PHP
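For reference, what the answer describes is Apache's RewriteMap with a prg: map; here is a hedged sketch with paths as assumptions (the map must be declared in the server/vhost config, not in .htaccess, and on Apache 2.2 RewriteLock is required):
RewriteEngine On
RewriteLock /var/lock/apache_rewrite.lock
RewriteMap urlmap prg:/var/www/scripts/rewritemap.php
RewriteRule ^/(.+)$ ${urlmap:$1} [L]
And the long-running rewrite script itself (the lookup logic is purely illustrative):
#!/usr/bin/php
<?php
// Apache writes one requested path per line to STDIN and expects exactly
// one rewritten path per line on STDOUT.
$stdin = fopen('php://stdin', 'r');
while (($line = fgets($stdin)) !== false) {
    $path = trim($line);
    echo ($path === 'shop' ? 'shop.php' : 'index.php') . "\n";
    flush();
}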
This is far from the standard way of doing rewrites so I imagine I'm going to cop a lot of flack for this answer. Oh well. :)
Well, for the SEO part, I think it's better to have a slug for each article (since you are using this for a CMS). That means that in your database you have some "translation" table which translates the requested URI/slug and associates it with $parent_page.

PHP alter URL without redirecting to it?

How can I alter the URL, or part of it, using PHP? I lost the code that I had used and now I cannot find the answer online.
I know that header('Location: www.site.com') does a redirect, but how can I just show a fake URL?
I don't want to use mod_rewrite for now.
It is impossible in PHP, which is executed server-side. Any change to the URL you make will trigger a page load.
I think it may be possible in JavaScript, but I really doubt this is a good idea; if you want to rewrite a URL only in the user's address bar, you're doing something wrong, or bad ;)
What you've actually asked for isn't possible using PHP (although in JavaScript you can use the dreadful hashbang or the poorly supported bleeding-edge pushState).
However, in a comment on another answer you stated that your goal is actually friendly URIs without mod_rewrite. This isn't about showing a different URI to the real one, but about making a URI that isn't based on a simple set of files and directories.
This is usually achieved (in PHP-land) with mod_rewrite, so you should probably stick with that approach.
However, you can do it using ScriptAlias (assuming you use Apache, other webservers may have different approaches).
e.g. ScriptAlias /example /var/www/example.php in the Apache configuration.
Then in the PHP script you can read $_SERVER['REQUEST_URI'] to find out what is requested and pull in the appropriate content.
You can make somewhat SEO-friendly URLs by adding directories after the php script name so that your URLs become:
http://yoursite.com/script.php/arg1/arg2
In script.php:
<?php
$args = preg_split('#/#', $_SERVER['PATH_INFO']);
echo "arg1 = ".$args[1].", arg2 = ".$args[2]."\n";
?>
If you use some more .htaccess trickery, you can make the script.php part look like something else (see @David's answer for an idea).
You can try using:
file_get_contents("https://website.com");
This is not going to redirect; it fires the request, and you can capture the output by assigning the function's return value to a variable.
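For example (the URL is illustrative): fetch the remote page and echo it under the current URL.
<?php
$html = file_get_contents("https://website.com/register");
echo $html;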
