File-based caching under PHP

I've been using http://code.google.com/p/phpbrowscap/ for a project, and it usually works nicely. But a few times its cache, which consists of plain PHP files (see http://code.google.com/p/phpbrowscap/source/browse/trunk/browscap/Browscap.php#372 et al.), has been "zeroed", i.e. the whole cache file has become a large blob of NUL bytes.
Instead of trying to find out why the files get zeroed, I thought it might be better to change the caching strategy to something more resilient.
So I wonder if anyone has good ideas about what a good solution would be. I've been looking at http://www.jongales.com/blog/2009/02/18/simple-file-based-php-cache-class/ and http://www.phpclasses.org/package/313-PHP-Cache-arbitrary-data-in-files-.html, and I also thought of just saving a serialized array to the file instead of pure PHP as it does now, but I'm uncertain which approach I should target here.
I'm grateful for any insight into this area of technology, as I know it's complex from a performance point of view.

What you're describing appears to be a bug in phpbrowscap. You may want to check what's causing it.
Anyway, phpbrowscap's strategy is a relatively sensible one because by writing the cache into a PHP file, it can also take advantage of opcode caches.
However, I think the best strategy would be to serialize the object and put the result in a memory cache like APC. Another possible strategy would be to implement the functionality in an extension, which would always be in memory.
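For the serialized-array option raised in the question, one way to make a file cache harder to corrupt is to write to a temporary file and rename it into place, so a reader never sees a half-written file. This is only a rough sketch, not phpbrowscap's code and not the APC approach suggested above: the names cache_write and cache_read are made up for illustration.

```php
<?php
// Hypothetical helpers (not part of phpbrowscap): store an array as a
// serialized string and replace the cache file in one step.
function cache_write(string $file, array $data): bool
{
    // Write to a temporary file in the same directory first...
    $tmp = tempnam(dirname($file), 'cache_');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, serialize($data)) === false) {
        unlink($tmp);
        return false;
    }
    // ...then rename it over the target. Within one filesystem the rename
    // is atomic on POSIX systems, so readers see the old or the new file.
    if (!rename($tmp, $file)) {
        unlink($tmp);
        return false;
    }
    return true;
}

function cache_read(string $file): ?array
{
    if (!is_file($file)) {
        return null;
    }
    $raw = file_get_contents($file);
    if ($raw === false || $raw === '') {
        return null;
    }
    // A truncated or zeroed file fails to unserialize; treat it as a miss
    // so the caller can rebuild the cache instead of using garbage.
    $data = @unserialize($raw, ['allowed_classes' => false]);
    return is_array($data) ? $data : null;
}
```

A caller would treat a null from cache_read() as "rebuild and call cache_write() again", which also covers the zeroed-file case.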

Related

Why do MVC frameworks in PHP not persist between requests?

What I have been able to grasp from reading the source and documentation from several PHP frameworks is that they generally don't persist, except for what you personally cache or throw into a $_SESSION var. Why is this? It seems a waste to essentially initialize the framework for every single request, would it not be better to at least serialize and store some core objects and variables to save processing and time?
At first I thought this was rather subjective and avoided asking, but everything I've read doesn't really speak about it at all, so there must be something obvious I'm missing.
The only real mention/discussion I've found of this is here which doesn't directly answer my question and some of which goes over my head a little.
Edit for Clarification: I am not asking about the inner workings of PHP. I know how persistence works (i.e. nothing persists unless you make it persist through caching or session vars). I am asking why PHP frameworks don't do this for their core objects. Again it seems subjective to me, but as almost nothing I've read mentions it, and it seems to be fairly standard practice, I'd like to know what I'm missing.
Memory:
Most frameworks don't store these core mechanisms in $_SESSION due to memory concerns. Frameworks often generate variables / objects that can contain several megabytes of information. That may not sound like a lot, but scale that to a few thousand users and you've got a problem.
Data "Freshness"
The second issue with shoving framework components into memory is that they can become out of date very quickly. Pulling an object out of memory, checking whether it's outdated, and then recreating it (if it is indeed outdated) is, most of the time, less efficient than just recreating it on every request.
I hope this clarifies things.
If you want data to persist between server requests then you need to use cookies/sessions or store your data in a database. This is just the way that it works. PHP cannot store data in itself for use between server requests.
Some frameworks may store core objects in a database or to a local file on disk, but it would depend on the framework.

Php simple caching technique for small-mid size websites

I am looking for HTML/text content caching for a small to mid-size site using PHP. I'll mostly cache the site's dynamic navigation menu, HTML reports generated from the DB, etc. Primarily I am looking for session-based caching (is that a bad idea?). It can also be file based.
Any existing solution is much appreciated. For example, Zend Framework is well known for its loosely coupled components, so Zend_Cache could be a candidate, but I could not find a session-based caching adapter. Moreover, it is not a completely independent component. Can anybody tell me which classes I need to take in order to use Zend_Cache?
Another option is PEAR's Cache_Lite. What's your take on this?
Is there any other framework, from where I can easily separate the caching component and use it with less learning curve?
Thanks.
Memcached comes to mind, as a really lightweight and efficient solution.
But you can also cache content in simple files. The filesystem is usually fast, and handles read/write locks without problems. And there's no need for any fancy library to handle that...the functions filemtime, file_put_contents and file_get_contents are all you need.
Check if the cache has been written more than N seconds ago with filemtime()
If it's too old, generate the content and write it with file_put_contents()
If not, simply load it with file_get_contents()
Edit: I'll add a link to that post I made a few months ago : Best Solution for caching. It's not completely on topic, but it might help you in your research :)
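The three steps above can be sketched roughly as follows. The function name cached_content and the menu.html path are invented for the example; the lifetime is whatever N you pick.

```php
<?php
// Hypothetical helper: return the cached content if it is younger than
// $maxAge seconds, otherwise regenerate it with $generate and store it.
function cached_content(string $file, int $maxAge, callable $generate): string
{
    // Step 1: check the age of the cache file with filemtime().
    if (is_file($file) && time() - filemtime($file) < $maxAge) {
        // Step 3: still fresh, so simply load it.
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;
        }
    }
    // Step 2: missing or too old, so generate the content and write it.
    $content = $generate();
    file_put_contents($file, $content, LOCK_EX);
    return $content;
}

// Usage: cache a generated menu for five minutes.
$menu = cached_content(sys_get_temp_dir() . '/menu.html', 300, function () {
    return '<ul><li>Home</li></ul>';
});
```

LOCK_EX is there so two requests regenerating at the same moment don't interleave their writes.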
Session based caching is probably not a good idea. It's only appropriate in limited cases where you need to cache a specific result per-user (not for everyone).
APC is pretty widely deployed, so if you have access to it, I'd look into Zend_Cache with APC on the back end. If APC is not available, Zend_Cache with flat files on the back end should be sufficient for small/medium sites.
JPCache is a decent lightweight caching library.
You can look at the caching in CakePHP. I doubt that you will be able to separate it from the framework, but it should help you to understand how to cache dynamic content.
Most of the php caching libraries are implemented using the output buffer control functions.
You can implement your own very simple caching the same way.
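<?php
// Assumed cache location and lifetime; pick whatever suits your site.
$cacheFile = sys_get_temp_dir() . '/page_' . md5($_SERVER['REQUEST_URI'] ?? 'cli') . '.html';
$maxAge = 300; // seconds
function callback($buffer)
{
    global $cacheFile;
    // Store output in cache, then pass it on unchanged to the browser
    file_put_contents($cacheFile, $buffer, LOCK_EX);
    return $buffer;
}
if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
    // Cached copy is still valid: output it to the browser and stop
    readfile($cacheFile);
    exit(0);
}
ob_start("callback");
?>
<html>...</html>
<?php
ob_end_flush();
?>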
You can omit the ob_end_flush() if you like, since it will be triggered automatically at the end of the output.
The interesting thing to note is that this structure could be wrapped around smaller units than the page. For example you mention caching just the navigation menu. You'd need a bit more logic around the block to be cached, but the principle is the same.
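For a smaller unit such as the navigation menu, the same idea might look like the sketch below. The helper name cached_fragment and the nav.html path are assumptions for illustration; ob_start() and ob_get_clean() are the standard output control functions.

```php
<?php
// Hypothetical helper: run $render (which echoes its output as usual),
// capture what it prints, and cache the HTML in $file for $maxAge seconds.
function cached_fragment(string $file, int $maxAge, callable $render): string
{
    if (is_file($file) && time() - filemtime($file) < $maxAge) {
        $html = file_get_contents($file);
        if ($html !== false) {
            return $html;
        }
    }
    ob_start();              // start capturing output
    $render();               // the block being cached echoes as usual
    $html = ob_get_clean();  // fetch the captured output and stop buffering
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

// Usage: only the navigation menu is cached; the rest of the page is not.
echo cached_fragment(sys_get_temp_dir() . '/nav.html', 600, function () {
    echo '<ul><li><a href="/">Home</a></li></ul>';
});
```

The extra logic compared with whole-page caching is mostly choosing a cache file per fragment and deciding how long each one may live.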

Cached XML for PHP?

I have a custom PHP framework and am discovering the joys of storing configuration settings as human-readable, simple XML.
I am very tempted to introduce XML (really simple XML, like this)
<mainmenu>
<button action="back" label="Back"/>
<button action="new" label="New item"/>
</mainmenu>
in a number of places in the framework. It is just so much more fun to read, maintain and extend. Until now, configuration was done using
name=value;name=value;name=value;
pairs, arrays, collections of variables, and so on.
However, I fear for the framework's performance. While the individual XML operations amount to next to nothing in profiling, they are of course more expensive than a simple explode() on a big string. I feel uncomfortable with simpleXML (my library of choice) doing a full well-formedness check on a dozen XML chunks every time the user loads a page.
I could cache the XML objects myself but would like to 1.) avoid the hassle and 2.) not add to the framework's complexity.
I am therefore looking for an unobtrusive XML "caching" solution. The perfect thing would be a function that I can give a file path, and returns a parsed simpleXML object. Internally, it maintains a cache somewhere with serialized simpleXML objects or whatever.
Does anybody know tools to do this?
No extension-dependent solutions please, as the framework is designed to run in shared webhosting environments (which is the reason why performance matters).
You could transform the XML into your former format once and then check for modification time of the XML and the text file via filemtime. If XML is newer than the textfile, then do the transformation again.
This would increase complexity in a way, but on the other hand would help you reuse your existing code. Of course, caching is another viable option.
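A rough sketch of that filemtime comparison, assuming the XML is converted to a plain array rather than the old name=value format. SimpleXMLElement objects cannot be serialized directly, so the cache stores the array instead. The names load_xml_cached and menu_to_array are hypothetical.

```php
<?php
// Hypothetical helper: parse $xmlFile with SimpleXML only when it is newer
// than the cached copy; otherwise return the cached plain array.
// $toArray converts the parsed SimpleXMLElement into a plain array.
function load_xml_cached(string $xmlFile, string $cacheFile, callable $toArray): array
{
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($xmlFile)) {
        $raw = file_get_contents($cacheFile);
        $data = @unserialize((string) $raw, ['allowed_classes' => false]);
        if (is_array($data)) {
            return $data;
        }
    }
    $xml = simplexml_load_file($xmlFile);
    if ($xml === false) {
        throw new RuntimeException("Cannot parse $xmlFile");
    }
    $data = $toArray($xml);
    file_put_contents($cacheFile, serialize($data), LOCK_EX);
    return $data;
}

// Example conversion for the <mainmenu> snippet from the question.
function menu_to_array(SimpleXMLElement $menu): array
{
    $buttons = [];
    foreach ($menu->button as $button) {
        $buttons[] = [
            'action' => (string) $button['action'],
            'label'  => (string) $button['label'],
        ];
    }
    return $buttons;
}
```

This needs only the bundled SimpleXML extension. Note that filemtime() has one-second resolution, so an XML file edited in the same second the cache was written would not be picked up until its next change.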
Hi there, I'm using Zend Cache for this kind of thing and must say it's very fast. It got one page down from about 2 seconds to 0.5 seconds.

How important is to not load unused scripts in PHP?

On a site where 90% of the pages use the same libraries, should you just load the libraries all the time or only load them when needed? The other pages would be ajax or simple pages that don't have any real functionality.
Also, should you only load the code when needed? If part way down a page you need a library, should you load it then or just load it at the top? It's possible the script may never get that far because of an error or wrong data. (Loading at the top makes it somewhat easier to understand, but may result in extra code not needed.)
I'm also wondering if I should make the libraries more specific so that I'm not, say, loading the editing code when only viewing.
Basically, how much should I worry about loading code or not loading code?
I would always try to give a file, class, and method a single responsibility. Because of that, separating the displaying from the editing code could be a good idea in either case.
As for loading libraries, I believe that the performance loss of including libraries that aren't required is quite irrelevant in a lot of cases. However, include, require, include_once, and require_once are relatively slow as they (obviously) access the file system. If the libraries you do not use on each occasion are quite big and usually include a lot of different files themselves, removing unnecessary includes could help reduce the time spent there. Nonetheless, this cost could also be reduced drastically by using an efficient caching system.
Given you are on PHP5 and your libraries are nicely split up into classes, you could leverage PHP's autoloading functionality, which includes required classes as the PHP script needs them. That would pretty effectively keep a lot of unused code from being included.
Finally, if you make any of those changes which could affect your website's performance, run some benchmarks and profile the gain or loss in performance. That way, you do not run into the risk of doing some possibly cool optimization which just costs too much time to fully implement or even degrades performance.
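For a quick before/after comparison, a very small timing harness is often enough. This is only a sketch (the helper name benchmark is made up), and a real profiler will tell you far more.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return the average
// wall-clock time per call in seconds.
function benchmark(callable $fn, int $iterations = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

// Usage: time two ways of doing the same job before picking one.
$explode = benchmark(function () {
    explode(';', 'name=value;name=value;name=value;');
});
$regex = benchmark(function () {
    preg_split('/;/', 'name=value;name=value;name=value;');
});
printf("explode: %.9f s, preg_split: %.9f s\n", $explode, $regex);
```

Run it several times and on realistic input; single runs of micro-benchmarks like this are noisy.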
Bear in mind that each script that is loaded gets parsed, as PHP is compiled at run-time, so there is a penalty for loading unneeded scripts. This might be minor depending on your application structure and requirements, but there are cases where it is not.
There are two things you can do to negate such concerns:
Use __autoload to load your scripts as they are needed. This removes the need to maintain a long 'require' list of scripts and only loads what's needed for the current run.
Use APC as a byte-code cache to lower the cost of loading scripts. APC caches scripts in their compiled state and will do wonders for your application performance.
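A minimal autoloading sketch, using spl_autoload_register(), which is the supported mechanism today (the __autoload() function was deprecated in PHP 7.2 and removed in PHP 8.0). The one-class-per-file layout under lib/ and the helper name register_autoloader are assumptions for the example.

```php
<?php
// Hypothetical helper. Assumed layout: one class per file under $baseDir,
// e.g. $baseDir/Editor.php for class Editor, with namespace separators
// mapped to directory separators.
function register_autoloader(string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($baseDir): void {
        $file = $baseDir . '/' . str_replace('\\', '/', $class) . '.php';
        if (is_file($file)) {
            require $file;
        }
    });
}

register_autoloader(__DIR__ . '/lib');

// No class file is read until the class is actually used, e.g.:
// $editor = new Editor();   // would load lib/Editor.php on first use
```

A page that only views data never touches Editor, so its file is never read or compiled.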
+1 Vote for the autoload technique.
The additional benefit of using autoload is it eliminates some of the potential for abusive code. If something fails, pop a back-trace and an "included_files" list and you get a list of places where the problem could come from.
This means you have fewer files to hunt through if somebody hides malicious code at the end of one of them, or designs something fruity.
I once worked on a codebase (not mine) where the presence of certain tokens in the URL caused unexpected behaviour. Because the code was horrible, tracking down the origin of the problem was a nightmare: it was buried in one of the 200 included files, which was rewriting the entire request and then calling "die".
The question was "how important".
Answer: it is NOT important at all. If you don't have a dozen servers running this app already, then this is probably premature optimization, and as we all know, premature optimization is the root of all evil.
In other words: don't even worry about it. There are a lot of other things to optimize speed before you should even consider this.

PHP performance considerations?

I'm building a PHP site, but for now the only PHP I'm using is a half-dozen or so includes on certain pages. (I will probably use some database queries eventually.)
Are simple include() statements a concern for speed or scaling, as opposed to static HTML? What kinds of things tend to cause a site to bog down?
Certainly include() is slower than static pages. However, with modern systems you're not likely to see this as a bottleneck for a long time - if ever. The benefits of using includes to keep common parts of your site up to date outweigh the tiny performance hit, in my opinion (having different navigation on one page because you forgot to update it leads to a bad user experience, and thus bad feelings about your site/company/whatever).
Using caching will really not help either - caching code is going to be slower than just an include(). The only time caching will benefit you is if you're doing computationally-intensive calculations (very rare, on web pages), or grabbing data from a database.
Sounds like you are participating in a bit of premature optimization. If the application is not built, while performance concerns are good to be aware of, your primary concern should be getting the app written.
Includes are a fact of life. Don't worry about number, worry about keeping your code well organized (PEAR folder structure is a lovely thing, if you don't know what I'm talking about look at the structure of the Zend Framework class files).
Focus on getting the application written with a reasonable amount of abstraction. Group all of your DB calls into a class (or classes) so that you minimize code duplication (KISS principles and all) and when it comes time to refactor and optimize your queries they are centrally located. Also get started on some unit testing to prevent regression.
Once the application is up and running, don't ask us what is faster or better since it depends on each application what your bottleneck will be. It may turn out that even though you have lots of includes, your loops are eating up your time, or whatever. Use XDebug and profile your code once it's up and running. Look for the segments of code that are eating up a disproportionate amount of time, then refactor. If you focus too much now on the performance hit between include and include_once you'll end up chasing a ghost when those curl requests running in sync are eating your breakfast.
In the meantime, though, the best suggestion is to look through the php.net manual and, if there's a built-in function that does what you are trying to do, use it! PHP's C-based extensions will almost always be faster than any PHP code that you could write, and you'll be surprised how much of what you are trying to do is done already.
But again, I cannot stress this enough, premature optimization is BAD!!! Just get your application up off the ground with good levels of abstraction, profile it, then fix what actually is eating up your time rather than fixing what you think might eat up your time.
Strictly speaking, straight HTML will always serve faster than a server-side approach since the server doesn't have to do any interpretation of the code.
To answer the bigger question, there are a number of things that will cause your site to bog down; there's just no specific threshold for when your code is causing the problem vs. PHP. (keep in mind that many of Yahoo's sites are PHP-driven, so don't think that PHP can't scale).
One thing I've noticed is that the PHP-driven sites that are the slowest are the ones that include more than is necessary to display a specific page. OSCommerce (oscommerce.com) is one of the most popular PHP-driven shopping carts. It has a bad habit, however, of including all of their core functionality (just in case it's needed) on every single page. So even if you don't need to display an 'info box', the function is loaded.
On the other hand, there are many PHP frameworks out there (such as CakePHP, Symfony, and CodeIgniter) that take a 'load it as you need it' approach.
I would advise the following:
Don't include more functionality than you need for a specific page
Keep base functions separate (use an MVC approach when possible)
Use require_once instead of include if you think you'll have nested includes (e.g. page A includes file B which includes file C). This will avoid including the same file more than once. It will also stop the process if a file can't be found; thus helping your troubleshooting process ;)
Cache static pages as HTML if possible - to avoid having to reparse when things don't change
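To illustrate the require_once point from the list above, here is a tiny self-contained sketch. The file name and the function example_greet are invented, and the file is created on the fly only so that the example runs on its own.

```php
<?php
// Hypothetical library file, written here just for the demonstration.
$lib = sys_get_temp_dir() . '/example_helpers.php';
file_put_contents($lib, '<?php function example_greet() { return "hello"; }');

require_once $lib;  // loads the file and defines example_greet()
require_once $lib;  // already loaded, so nothing happens; a plain include
                    // here would fail with "Cannot redeclare" instead

echo example_greet(), "\n";

// If the file were missing, require_once would stop the script with a
// fatal error, whereas include would only emit a warning and carry on.
```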
Nah includes are fine, nothing to worry about there.
You might want to think about tweaking your caching headers a bit at some point, but unless you're getting significant hits it should be no problem. Assuming this is all static data, you could even consider converting the whole site to static HTML (easiest way: write a script that grabs every page via the webserver and dumps it out in a matching dir structure)
Most web applications are limited by the speed of their database (or whatever their external storage is, but 9/10 times that'll be a database), the application code is rarely cause for concern, and it doesn't sound like you're doing anything you need to worry about yet.
Before you make any long-lasting decisions about how to structure the code for your site, I would recommend that you do some reading on the Model-View-Controller design pattern. While there are others this one appears to be gaining a great deal of ground in web development circles and certainly will be around for a while. You might want to take a look at some of the other design patterns suggested by Martin Fowler in his Patterns of Enterprise Application Architecture before making any final decisions about what sort of design will best fit your needs.
Depending on the size and scope of your project, you may want to go with a ready-made framework for PHP like Zend Framework or PHP On Trax or you may decide to build your own solution.
Specifically regarding the rendering of HTML content I would strongly recommend that you use some form of templating in order to keep your business logic separate from your display logic. I've found that this one simple rule in my development has saved me hours of work when one or the other needed to be changed. I've used Smarty (http://www.smarty.net/) and I know that most of the frameworks out there either have a template system of their own or provide a plug-in architecture that allows you to use your own preferred method. As you look at possible solutions, I would recommend that you look for one that is capable of creating cached versions.
Lastly, if you're concerned about speed on the back-end then I would highly recommend that you look at ways to minimize your calls to your back-end data store (whether it be a database or just system files). Try to avoid loading and rendering too much content (say a large report stored in a table that contains hundreds of records) all at once. If possible look for ways to make the user interface load smaller bits of data at a time.
And if you're specifically concerned about the actual load time of your html content and its CSS, Javascript or other dependencies I would recommend that you review these suggestions from the guys at Yahoo!.
To add on what JayTee mentioned - loading functionality when you need it. If you're not using any of the frameworks that do this automatically, you might want to look into the __autoload() functionality that was introduced in PHP5 - basically, your own logic can be invoked when you instantiate a particular class if it's not already loaded. This gives you a chance to include() a file that defines that class on-demand.
The biggest thing you can do to speed up your application is to use an Opcode cache, like APC. There's an excellent list and description available on Wikipedia.
As far as simple includes are concerned, be careful not to include too many files on each request as the disk I/O can cause your application not to scale well. A few dozen includes should be fine, but it's generally a good idea to package your most commonly included files into a single script so you only have one include. The cost in memory of having a few classes here and there you don't need loaded will be better than the cost of disk I/O for including hundreds of smaller files.
