PHP request file caching in load-balanced server environment - php

I'm looking to write a basic PHP file caching driver in a PHP application that routes all traffic to a front controller. For example's sake, assume the following simplified setup using apache mod_proxy_balancer:
In a single-server environment I would cache request responses on disk in a directory structure matching the request URI. Then, simple apache rewrite rules like the following could allow apache to return static cache files (if they exist) and avoid the PHP process altogether:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /front_controller.php [L]
Obviously, this is problematic in a load-balanced environment because the cache file would only be written to disk on the specific PHP server where the request was served up and the results cached.
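To make the single-server idea concrete, here is a minimal sketch of how the front controller might map a request URI to a cache file. The function names, the "index.html" leaf name and the directory layout are all assumptions for illustration; the rewrite rules would need to match whatever layout is actually chosen.

```php
<?php
// Sketch only: maps a request URI to a cache file below $cacheRoot.
// The "index.html" leaf name and the directory layout are assumptions.
function cache_path_for_uri(string $cacheRoot, string $requestUri): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URI
    }
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // skip empty and no-op segments
        }
        if ($segment === '..') {
            return null; // never write outside the cache root
        }
        $segments[] = $segment;
    }
    $segments[] = 'index.html';
    return rtrim($cacheRoot, '/') . '/' . implode('/', $segments);
}

// Writes the response body for a URI; returns false if it could not be cached.
function write_cache_file(string $cacheRoot, string $requestUri, string $body): bool
{
    $file = cache_path_for_uri($cacheRoot, $requestUri);
    if ($file === null) {
        return false;
    }
    $dir = dirname($file);
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        return false;
    }
    // Write to a temp file and rename, so readers never see a partial file.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    if (file_put_contents($tmp, $body) === false) {
        return false;
    }
    return rename($tmp, $file);
}
```

The query string is ignored here, so two requests that differ only in their query string would share one cache file; that may or may not be acceptable for a given application.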
Solving the issue ...
So, to solve this problem, I figured I could knock out some code to have the individual back-end PHP servers write/delete cache data to the load balancer. However, being mostly ignorant as to the capabilities of mod_proxy_balancer (and any other load balancing options, really), I need some outside verification for the following questions:
And the questions ...
Is it possible to do some form of checking like the above RewriteRules to have the front-facing load balancer serve up a static file before sending off requests to one of the backend servers?
Is this even advisable? Should the load balancer be allowed to route traffic exclusively and not be bothered with serving up static content?
Would it be better to just use an acceptable TTL on the cached files at the PHP server level and deal with an accepted level of stale cache overlap?
Finally, apologies if this is too broad or has already been answered; I'm not really sure what to search for as a result of my aforementioned ignorance on the load-balancing subject.

For the simplest solution, you can use NFS. Mount a file system via NFS on all of the PHP servers and it acts like local storage, but is the same for all servers. To get a little more sophisticated, use something like Nginx or Varnish that can cache what is on the NFS file system.
Using memcache is also a viable alternative, which is a distributed memory based storage system. The nice thing about memcache is that you don't need to manage cache clearing or purging if you don't want to. You can set TTL for each cached item, or if memcache gets full, it automatically purges cached items.
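As a rough illustration of the TTL approach, the sketch below caches a rendered response under a key derived from the URI. The function name and the "page:" key prefix are assumptions. The $cache argument is expected to behave like PHP's Memcached class, where get() returns false on a miss and set() takes an expiration in seconds as its third argument.

```php
<?php
// Sketch only: $cache is any object exposing get($key) and
// set($key, $value, $ttl) in the style of the Memcached extension, e.g.
//   $cache = new Memcached();
//   $cache->addServer('cache.internal', 11211); // hypothetical host name
function cached_response($cache, string $uri, int $ttl, callable $render): string
{
    $key = 'page:' . md5($uri); // hashed so long URIs stay within key limits
    $hit = $cache->get($key);
    if ($hit !== false) {
        return $hit; // served from the shared cache
    }
    $body = $render($uri);          // cache miss: build the page
    $cache->set($key, $body, $ttl); // the entry expires after $ttl seconds
    return $body;
}
```

Because every PHP server talks to the same cache, a page rendered on one back-end is immediately available to the others, which is the property the file-based approach lacks.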

This sounds like something Nginx could do easily, and would remove the need to write to files on disk.
Nginx can do both the load balancing and the caching. Here's a tutorial on it:
http://nathanvangheem.com/news/nginx-with-built-in-load-balancing-and-caching

Related

Load balancing and APC

I am interested in a scenario where webservers serving a PHP application are set up behind a load balancer.
There will be multiple webservers with APC behind the load balancer. All requests will have to go through the load balancer, which then sends it to one of the web servers to process.
I understand that memcached should be used for distributed caching, but I think having the APC cache on each machine cache things like application configurations and other objects that will NOT be different across any of the servers would yield even better performance.
There is also an administrator area for this application. It is also accessed via the load balancer (for example, site.com/admin). In a case like this, how can I call apc_clear_cache to clear the APC object cache on ALL servers?
Externally, you have one public IP that routes all requests to your load balancer, which distributes the load round robin. From the outside you therefore cannot clear the cache on each server one at a time, because you don't know which server will handle any given request. Within your network, however, each machine has its own internal IP and can be called directly. Knowing this, you can do some funny/weird things that still work when triggered externally.
A solution I like is to be able to hit a single URL and get everything done such as http://www.mywebsite/clearcache.php or something like that. If you like that as well, read on. Remember you can have this authenticated if you like so your admin can hit this or however you protect it.
You could create logic where you can externally make one request to clear your cache on all servers. Whichever server receives the request to clear cache will have the same logic to talk to all servers to clear their cache. This sounds weird and a bit frankenstein but here goes the logic assuming we have 3 servers with IPs 10.232.12.1, 10.232.12.2, 10.232.12.3 internally:
1) All servers would have two files called "initiate_clear_cache.php" and "clear_cache.php" that would be the same copies for all servers.
2) "initiate_clear_cache.php" would do a file_get_contents for each machine in the network (including itself), calling "clear_cache.php" on each one,
for example:
file_get_contents('http://10.232.12.1/clear_cache.php');
file_get_contents('http://10.232.12.2/clear_cache.php');
file_get_contents('http://10.232.12.3/clear_cache.php');
3) The file called "clear_cache.php" is actually doing the cache clearing for its respective machine.
4) You only need to make a single request now, such as http://www.mywebsite/initiate_clear_cache.php, and you are done.
Let me know if this works for you. I've done something similar in .NET and Node.js but haven't tried it in PHP yet; I'm sure the concept is the same. :)
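The steps above could be sketched roughly as follows for "initiate_clear_cache.php". The function names, the two-second timeout and the injectable $fetch callable are assumptions added for illustration; "clear_cache.php" on each server would simply call apc_clear_cache as described in the question.

```php
<?php
// Sketch only: builds the internal URL of clear_cache.php for each server.
function clear_cache_urls(array $ips, string $path = '/clear_cache.php'): array
{
    return array_map(function ($ip) use ($path) {
        return 'http://' . $ip . $path;
    }, $ips);
}

// Calls clear_cache.php on every server and reports which calls succeeded.
// $fetch is injectable so the fan-out can be exercised without a network.
function clear_all_caches(array $ips, ?callable $fetch = null): array
{
    if ($fetch === null) {
        // Assumed timeout, so one dead server does not hang the request.
        $context = stream_context_create(['http' => ['timeout' => 2]]);
        $fetch = function ($url) use ($context) {
            return @file_get_contents($url, false, $context);
        };
    }
    $results = [];
    foreach (clear_cache_urls($ips) as $url) {
        $results[$url] = $fetch($url) !== false;
    }
    return $results;
}

// Example call with the internal IPs from the answer:
// $report = clear_all_caches(['10.232.12.1', '10.232.12.2', '10.232.12.3']);
```

Returning a per-server result makes it possible to show the administrator which machines, if any, failed to clear.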

Make clean URLS and retrieve query string

Is there a way to use relative URLs such as /forums/(forumID)/ in the links my PHP code generates while setting up my site? And then, when I need to know which forumID the current page belongs to, can I get it via a $_GET request without using a framework or template system (Smarty, CakePHP, etc.) or the Apache rewrite module? Or is it a huge headache? I just don't want to be bound to one web server type (Apache).
Clean urls are fairly easy to do, but if the web pages are vastly different, it may cause some problems.
You'll need to edit your .htaccess file and add something similar to this
RewriteEngine On
RewriteRule ^([a-zA-Z0-9]+)$ index.php?page=$1
#This will process http://example.com/forum as http://example.com/index.php?page=forum
RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1
#This includes the trailing slash
RewriteRule ^([a-zA-Z0-9]+)/([a-zA-Z0-9]+)$ index.php?page=$1&id=$2
#This will process http://example.com/forum/512 as http://example.com/index.php?page=forum&id=512
This is a good source for more information http://www.desiquintans.com/cleanurls
... carried on from OP comments.
These frameworks read the request again in their respective languages to allow the framework to route to specific controllers, but they still need the webserver to be set up to send the request to the framework in the first place.
Client requests http://example.com/forums/123
After DNS lookup, request hits server at 127.0.0.1:80
Webserver (e.g. Apache) listens on port 80 and accepts request
Apache routes request to server-side script
This is the journey as the web server sees it. It needs to read the request before it even hits the server-side scripting language (PHP/Python/Ruby etc).
The server-side languages can then re-read the URL once the webserver has hit the front controller as they please.
The main reasons to have clean urls from an architecture point of view:
They are not tied to any programming language. If you have .php or any other extensions, you'd have to set up your server to accept .php extensions for other languages if you switch to ASP.net.
They are easy to route in any language or server setup. All modern servers I know of have modules to route URLs.
Note that to use a programming language to route the urls, you still have to set up your server to direct everything to a bootstrap file. Honestly, you are not getting around server configurations of some kind no matter what.
Conclusion: your logic for wanting your project set up this way will not work without doing some server setup.
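Once the server does hand every request to a bootstrap file, reading the forum ID in PHP is short. A minimal sketch, assuming URLs of the form /forums/(forumID)/ as in the question; the function name is hypothetical.

```php
<?php
// Sketch only: extracts the numeric forum ID from a path like /forums/123/.
// Returns null when the URI does not match that shape.
function forum_id_from_uri(string $requestUri): ?int
{
    $path = parse_url($requestUri, PHP_URL_PATH); // drops any query string
    if (is_string($path) && preg_match('#^/forums/(\d+)/?$#', $path, $m)) {
        return (int) $m[1];
    }
    return null;
}

// In the bootstrap file this might be used as:
// $forumId = forum_id_from_uri($_SERVER['REQUEST_URI'] ?? '/');
```

This code is the same on any web server; only the rule that sends requests to the bootstrap file is server-specific.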

Htaccess: Trigger php or write to log

I want a redirect which is as fast as possible. So I decided to use an htaccess redirect, because it responds even before the PHP interpreter is initialized. But I also want to log the redirects and write something to the database.
I tried to redirect and call RewriteMap just to trigger a PHP file, but it only throws a 500 error.
I would be OK with just creating a log file, even if the log processing is delayed. The only important things are: fast redirect, and tracking/logging each redirect.
Have you got any ideas or recommendations on this?
Thank you in advance,
all the best,
emre
You could use RewriteLog to log rewriting actions to a file; that would be done by Apache itself, without invoking PHP.
This is quite fast, but it logs only to a file, not a database. Still, as you said, the log processing can be delayed and done by a script run from the crontab.
See also RewriteLogLevel, to configure how verbose that log should be. (Both directives belong to Apache 2.2; Apache 2.4 removed them in favour of the LogLevel directive with rewrite:trace levels.)
Can be used in apache/vhost config, but not in .htaccess (so if you can put it there, remember to reload apache):
RewriteEngine On
RewriteRule /foo http://www.example.com [R=301,E=redirected]
CustomLog /path/to/log combined env=redirected
'combined' is a default log format, but you can define your own
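The delayed processing could then be a cron-run PHP script that reads the log and extracts the fields to store. A minimal sketch, assuming the standard 'combined' format; the function name is hypothetical, and lines containing escaped quotes or user names with spaces would need a more careful parser.

```php
<?php
// Sketch only: parses one line of Apache's 'combined' log format.
// Returns null for lines that do not match the expected shape.
function parse_combined_log_line(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/';
    if (!preg_match($pattern, rtrim($line), $m)) {
        return null;
    }
    return [
        'ip'         => $m[1],
        'time'       => $m[2],
        'method'     => $m[3],
        'path'       => $m[4],
        'status'     => (int) $m[5],
        'referer'    => $m[7],
        'user_agent' => $m[8],
    ];
}
```

Each parsed row could then be inserted into the database in batches, keeping the redirect itself free of any PHP or database work.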
If your goal is simply to be as fast as possible with logging, then the primary point of concern is keeping disk I/O to a minimum. Relying on .htaccess, you're doing a directory scan at each level of the URL (http://muffinresearch.co.uk/archives/2008/04/07/avoiding-the-use-of-htaccess-for-performance/).
If you could setup your RewriteRule in Apache's conf, and point your redirects to a PHP file, then you could have PHP running w/ APC and have the logging done to a memcache object. That way your entire client's access could occur purely in fast-access memory, and you could have a cron-job that'd routinely take the data from memcache and push it to the database (this way you'd still have long-term storage but the client's access should never require the disk to be read.)
Obviously if you're flexible on the database, you could use a Couchbase-style solution that'd essentially let you have speed of writing to memcache without storing the information in volatile memory, but I'm guessing you're locked into the database you're currently using.
You might want to try an Apache log-to-MySQL module like:
http://www.outoforder.cc/projects/apache/mod_log_sql/
here is a howto for debian
http://www.howtoforge.com/apache2-logging-to-a-mysql-database-with-mod_log_sql-on-debian-etch
However I'm not 100% sure this works with the latest apache.
Good Luck

Apache, PHP caching

I have set up an internal proxy of sorts using Curl and PHP. The setup is like this:
The proxy server is a rather cheap VPS (which has slow disk i/o at times). All requests to this server are handled by a single index.php script. The index.php fetches data from another, fast server and displays to the user.
The data transfer between the two servers is very fast and the bottleneck is only the disk i/o on the proxy server. Since there is only one index.php - I want to know
1) How do I ensure that index.php is permanently "cached" in Apache on the proxy server? (Googling for PHP cache, I found many custom solutions that cache the "data" output by PHP. I want to know if there are any pre-built modules in Apache that will cache the PHP script itself.)
2) Is the data fetched from the backend server always in the RAM/cache on the proxy server? (assuming there is enough memory)
3) Does Apache read any config files or other files from disk when handling requests?
4) Does Apache wait for logs to be written to disk before serving the content? If so, I will disable logging on the proxy server (or is there a way to ensure content is served first, without waiting for logs to be written?).
Basically, I want to eliminate disk I/O altogether on the 'proxy' server.
Thanks,
JP
1) Install APC (http://pecl.php.net/apc); this will compile your PHP script once and keep it in shared memory for the lifetime of the webserver process (or a given TTL). On PHP 5.5 and later, the bundled OPcache extension fills the same role.
2) If your script fetches data and does not cache/store it on the filesystem, it will be in RAM, yes. But only for the duration of the request. PHP uses a 'share-nothing' strategy which means -all- memory is released after a request. If you do cache data on the filesystem, consider using memcached (http://memcached.org/) instead to bypass file i/o.
3) If you have .htaccess support activated, Apache will search for those files in each directory on the path leading to your PHP file. See "Why can't I disable .htaccess in Apache?" for more info.
4) Not 100% sure, but it probably does wait.
Why not use something like Varnish which is explicitly built for this type of task and does not carry the overhead of Apache?
I would recommend "tinyproxy" for this purpose.
It does everything you want very efficiently.

Question on dynamic URL parsing

I see many, many sites that have URLs for individual pages such as
http://www.mysite.com/articles/this-is-article-1
http://www.mysite.com/galleries/575
And they don't redirect, they don't run slowly...
I know how to parse URLs, that's easy enough. But in my mind, that seems slow and cumbersome on a dynamic site. As well, if the pages are all statically built (hence the custom URL) then that means all components of the page are static as well... (which would be bad)
I'd love to hear some ideas about how this is typically accomplished.
There are many ways you can handle the above. Generally speaking, there is always at least some form of redirection involved (usually an internal rewrite that the visitor never sees), although that could be at the .htaccess level rather than in PHP. Here's a scenario:
Use .htaccess to redirect to your php processing script.
Parse the uri ($_SERVER['REQUEST_URI']) and ascertain the type of content (for instance, articles or galleries as per your examples).
Use the provided id (generally appended to the end of the uri, again as in your examples) to obtain the correct data - be that by serving a static file or querying a database for the requested content.
This method is a very popular way of improving SEO, but as you rightly highlight there can be difficulties in taking this approach. These are not typically about performance, but it can make development or administration more troublesome (the latter if your implementation is not well thought out and scalable).
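The scenario above can be sketched as a tiny dispatcher. The function name and the handler map are assumptions for illustration; a real application would query a database or serve a file inside each handler.

```php
<?php
// Sketch only: splits a path like /articles/this-is-article-1 into a content
// type and an identifier, then hands the identifier to the matching handler.
// Returns null when the path has the wrong shape or the type is unknown.
function dispatch(string $requestUri, array $handlers): ?string
{
    $path = trim((string) parse_url($requestUri, PHP_URL_PATH), '/');
    $parts = $path === '' ? [] : explode('/', $path);
    if (count($parts) !== 2 || !isset($handlers[$parts[0]])) {
        return null; // the caller would send a 404 here
    }
    return $handlers[$parts[0]](rawurldecode($parts[1]));
}

// Hypothetical handlers keyed by the first path segment:
$handlers = [
    'articles'  => function (string $slug): string { return 'article:' . $slug; },
    'galleries' => function (string $id): string { return 'gallery:' . (int) $id; },
];

// echo dispatch($_SERVER['REQUEST_URI'] ?? '/', $handlers);
```

The parsing itself is a handful of string operations per request, which is why this approach is rarely a performance concern compared with the database work that follows it.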
Firstly, when comparing plain URL rewriting at the application level, or plain CGI (CGI can be PHP, ISAPI, ASP.NET, etc.), with serving static pages, serving static files will always, always win. There is simply less work. For example, in Windows and Linux (that I know of) there are even enhancements in the kernel for serving static files on a local drive via HTTP. To further make the point I even found a benchmark using several servers and OSs: http://www.litespeedtech.com/web-server-performance-comparison-litespeed-2.0-vs.html#RESULT Note that serving static files is dramatically faster than using any type of CGI.
However, there can potentially be performance and scalability gains by using rewritten URLs effectively and it is done with caching. If you return proper cache headers (see cache-control directive in HTTP documentation) then it enables downstream servers to cache the data so you won't even get hits on your site. However, I guess you could get the same benefit with static pages :) I just happen to read an article on this very topic a day or two ago at the High Scalability blog: http://highscalability.com/strategy-understanding-your-data-leads-best-scalability-solutions
A rewrite engine is the best approach, as they are fast and optimised, and it lets your server-side script work with plain local variables.
Apache's mod_rewrite is the most common.
It's usually done via a rewrite engine, either in the server (via something like mod_rewrite in Apache) or in the web application (all requests are routed to the web application, which looks for a route for the path specified).
In my case, I stick to the web framework with this feature already built-in. (CodeIgniter)
"... As well, if the pages are all statically built (hence the custom URL) then that means all components of the page are static as well... (which would be bad)"
... yes, this is very bad indeed. :o
It is possible to rewrite at two levels:
1) The server level, in either the .htaccess file or the httpd.conf or vhosts.conf file. This is typically faster than the next level of rewriting, which is done at the application level.
2) The application level (in this instance with PHP). You can write custom redirects that analyse the URL and redirect in some way based on that. Modern web frameworks such as the Zend Framework (ZF) use routes to control URL rewriting. The following is an example of a static route with ZF
$route = new Zend_Controller_Router_Route_Static('latest/news/this/week',
array('controller' => 'news'));
Which would route any request for http://somedomain.com/latest/news/this/week to the news controller.
An example of a dynamic route would be
$route = new Zend_Controller_Router_Route('galleries/:id', array('controller' => 'gallery'));
Where the id parameter would be available to that controller as a request parameter (and using our example above would be 575)
These are very useful tools that allow you to develop an application and retrospectively change the URL to anything you want.
A very simple way is to have a CGI parse the PATH_INFO portion of the URL.
In your example:
http://www.example.com/articles/12345 (where "articles" is a CGI script)
                       ^CGI^   ^^^^^^ PATH_INFO
Everything after the script name is passed to the script in the PATH_INFO CGI environment variable.
Then you can do a database lookup or whatever you wish to generate the page.
Use caution when accessing this value as the IIS server and Apache server put different portions of the URL in PATH_INFO. (IIRC: IIS incorrectly uses the entire URL and Apache prunes it as stated above.)
On Apache servers mod_rewrite is the most common choice for this. It's an Apache module which allows you to rewrite request URLs to other URLs using regular expressions, so for your example something like this would be used:
RewriteEngine ON
RewriteRule ^articles/(.*) articles.php?article=$1 [L]
RewriteRule ^galleries/(\d+) galleries.php?gallerie=$1 [L]
This costs hardly any time, and in practice is just as fast as having the url:
www.mysite.com/galleries.php?gallerie=575 but looks way better
I have used this method previously. You just need to add the file extensions that should not be redirected to the regex; everything else is then handled by PHP, so you don't need to keep going into your .htaccess file.
Succeed with URLs
