Cached XML for PHP?


I have a custom PHP framework and am discovering the joys of storing configuration settings as human-readable, simple XML.
I am very tempted to introduce XML (really simple XML, like this)
<mainmenu>
    <button action="back" label="Back"/>
    <button action="new" label="New item"/>
</mainmenu>
in a number of places in the framework. It is just so much more fun to read, maintain and extend. Until now, configuration was done using
name=value;name=value;name=value;
pairs, arrays, collections of variables, and so on.
However, I fear for the framework's performance. While the individual XML operations amount to next to nothing in profiling, they are of course more expensive than a simple explode() on a big string. I feel uncomfortable with SimpleXML (my library of choice) doing a full well-formedness check on a dozen XML chunks every time the user loads a page.
I could cache the XML objects myself but would like to 1.) avoid the hassle and 2.) not add to the framework's complexity.
I am therefore looking for an unobtrusive XML "caching" solution. The perfect thing would be a function that I can give a file path and that returns a parsed SimpleXML object. Internally, it would maintain a cache somewhere with serialized SimpleXML objects or whatever.
Does anybody know tools to do this?
No extension-dependent solutions please, as the framework is designed to run in shared webhosting environments (which is the reason why performance matters).

You could transform the XML into your former format once and then check for modification time of the XML and the text file via filemtime. If XML is newer than the textfile, then do the transformation again.
This would increase complexity in a way, but on the other hand would help you reuse your existing code. Of course, caching is another viable option.
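A minimal sketch of the filemtime check described above, assuming the menu XML from the question. The function name load_menu_cached and the cache file layout are hypothetical, not part of any library. It caches a plain PHP array written with var_export rather than a serialized SimpleXML object, because PHP refuses to serialize SimpleXMLElement instances.

```php
<?php
// Hypothetical helper: returns the menu buttons from an XML file and
// re-parses the XML only when it is newer than the cached PHP array.
function load_menu_cached(string $xmlPath, string $cachePath): array
{
    // The cache is considered fresh if it exists and is at least as new as the XML.
    if (is_file($cachePath) && filemtime($cachePath) >= filemtime($xmlPath)) {
        return include $cachePath;
    }

    $xml = simplexml_load_file($xmlPath);
    if ($xml === false) {
        throw new RuntimeException("Cannot parse $xmlPath");
    }

    // Copy the attributes into plain arrays so the result can be exported.
    $buttons = [];
    foreach ($xml->button as $button) {
        $buttons[] = [
            'action' => (string) $button['action'],
            'label'  => (string) $button['label'],
        ];
    }

    // Write to a temporary file and rename it, so readers never see a half-written cache.
    $tmp = $cachePath . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, '<?php return ' . var_export($buttons, true) . ';');
    rename($tmp, $cachePath);

    return $buttons;
}
```

Because the comparison uses whole-second modification times, an XML edit made in the same second as the cache write would go unnoticed; for configuration files that change rarely this is usually acceptable.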

Hi there, I'm using Zend Cache for this kind of thing and must say it's very fast: it brought one page down from about 2 seconds to 0.5 seconds.

Related

File based caching under PHP

I've been using http://code.google.com/p/phpbrowscap/ for a project, and it usually works nicely. But a few times its cache, which consists of plain PHP files (see http://code.google.com/p/phpbrowscap/source/browse/trunk/browscap/Browscap.php#372 et al.), has been "zeroed", i.e. the whole cache file has become a large blob of NULLs.
Instead of trying to find out why the files become NULL, I thought it might be better to change the caching strategy to something more resilient.
So I wonder if you have any good ideas about what a good solution would be. I've been looking at http://www.jongales.com/blog/2009/02/18/simple-file-based-php-cache-class/ and http://www.phpclasses.org/package/313-PHP-Cache-arbitrary-data-in-files-.html and I also thought of just saving a serialized array to the file instead of pure PHP as it's been doing now, but I'm uncertain which approach I should target here.
I'm grateful for any insight into this area of technology, as I know it's complex from a performance point of view.

What you're describing appears to be a bug in phpbrowscap. You may check what's causing it.
Anyway, phpbrowscap's strategy is a relatively sensible one because by writing the cache into a PHP file, it can also take advantage of opcode caches.
However, I think the best strategy would be to serialize the object and put the result in a memory cache like APC. Another possible strategy would be to implement the functionality in an extension, which would always be in memory.
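As a rough illustration of the "serialized array in a file" idea from the question, here is a sketch that writes atomically and treats an unreadable or zeroed file as a cache miss. The function names cache_write and cache_read are hypothetical, and this is a file-based stand-in rather than the APC approach, which needs an extension.

```php
<?php
// Hypothetical helper: write the cache to a temporary file, then rename it into
// place. Readers see either the old file or the new one, never a partial write.
function cache_write(string $file, array $data): bool
{
    $tmp = $file . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, serialize($data)) === false) {
        return false;
    }
    return rename($tmp, $file);
}

// Hypothetical helper: returns the cached array, or null when the file is
// missing, empty, or does not contain a valid serialized array.
function cache_read(string $file): ?array
{
    if (!is_file($file)) {
        return null;
    }
    $raw = file_get_contents($file);
    if ($raw === false || $raw === '') {
        return null;
    }
    // Suppress the notice for corrupt data; forbid object instantiation.
    $data = @unserialize($raw, ['allowed_classes' => false]);
    return is_array($data) ? $data : null;
}
```

A caller that gets null back would rebuild the cache from the source data, so a corrupted file costs one slow request instead of breaking the page.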

Is object-oriented PHP slow?

I used to use procedural-style PHP. Later, I started creating some classes. Later still, I learned Zend Framework and started to program in OOP style. Now my programs are based on my own framework (with elements of a CMS, but without any design in the framework), which is built on top of Zend Framework.
Now it consists of lots of classes. But the more I program, the more I'm afraid. I'm afraid that my program will be slow because of them. I'm afraid to add every additional class, which could help me develop but could also slow down the application.
All I know is that including lots of files slows down the application (using eAccelerator and gathering all the code in one file can speed up the application 20 times!), but I have no idea whether creating new classes and objects slows PHP down by itself.
Does anyone have any information about it?

This bugs me. See...procedural code is not always spaghetti code, yet the OOP fanboys always presume that it is. I've written several procedural based web apps as well as an IRC services daemon in PHP. Amazingly, it seems to outperform most of the other ones that are out there and editing it is super easy. One of my friends who generally does OOP took a look at it and said "no code has the right to be this clean"
Conversely, I wrote my own PHP framework (out of boredom) and it was done in a purely OOP manner.
A good programmer can write great procedural code without the overhead classes bring. A bad programmer who uses OOP will always write crappy OOP code that slows things down.
There is no one right answer to which is better for PHP, but rather which is better for the exact scenario.

Here's a good article discussing the issue. I have also seen some anecdotal benchmarks that put the overhead of OOP PHP at 10-15%.
Personally, I think OOP is the better choice, since in the end it may perform better simply because it was probably better designed and thought through. Procedural code tends to be messy and hard to maintain. So in the end it comes down to how critical the performance difference is for your app versus the ability to maintain, extend and simply comprehend it.

The most important thing to remember is, design first, optimize later. A better design, which is more maintainable, is better than spaghetti code. Otherwise, you might as well write your web app in assembler. After you're done, you can profile (instead of guess), and optimize what seems slowest.

Yes, every include makes your program slower, but there is more to it than that.
If you decompose your program over many files, there is a balance point between including, parsing and executing the least amount of code and the overhead of including all those files.
Furthermore, having lots of files with little code isn't so bad because, as you said, using things like eAccelerator or APC is a trivial way to get a ton of performance back. At the same time you get, if you believe in them, all the wonderful benefits of having an object-oriented code base.
Also, slow on a per request basis != not scalable.
Updated
As requested: PHP is still faster at straight-up array manipulation than it is with classes. I vaguely remember the Doctrine ORM project, and someone comparing hydration of arrays versus objects, and the arrays came out faster. It's not an order of magnitude, but it is noticeable; this is in French, but the code and results are completely understandable. Note that Doctrine uses the magic methods __get and __set a lot, and these are also slower than explicit variable access. Part of Doctrine's object hydration slowness could be attributed to that, so I would treat it as a worst-case scenario. Lastly, even if you're using arrays, if you have to do a lot of moving around in memory, or tons of tests such as isset, or functions like in_array (which is O(N)), you'll lose the performance benefits. Also remember that objects are just arrays underneath; the interpreter just treats them specially. I would personally favour better code over a small performance increase; you'll get more benefit from having smarter algorithms.
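To see the difference on your own machine rather than trusting remembered numbers, a micro-benchmark along these lines may help. It is only a sketch: the class names are made up, it assumes PHP 7 or later, and the closure call adds the same fixed cost to every variant, so only the relative ordering is meaningful.

```php
<?php
// Hypothetical record class using magic accessors, as some ORMs do.
class MagicRecord
{
    private $data = [];
    public function __get($name) { return $this->data[$name] ?? null; }
    public function __set($name, $value) { $this->data[$name] = $value; }
}

// Hypothetical record class with an explicit public property.
class PlainRecord
{
    public $title;
}

// Runs $fn the given number of times and returns the elapsed seconds.
function time_it(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($i);
    }
    return microtime(true) - $start;
}

$n = 200000;
$array = [];
$plain = new PlainRecord();
$magic = new MagicRecord();

// Each variant performs one write and one read per iteration.
$timings = [
    'array'  => time_it(function ($i) use (&$array) { $array['title'] = $i; $x = $array['title']; }, $n),
    'public' => time_it(function ($i) use ($plain) { $plain->title = $i; $x = $plain->title; }, $n),
    'magic'  => time_it(function ($i) use ($magic) { $magic->title = $i; $x = $magic->title; }, $n),
];

foreach ($timings as $label => $seconds) {
    printf("%-7s %.4f s\n", $label, $seconds);
}
```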

If your project contains many files, then because of the nature of PHP's file access checking and restrictions I'd recommend turning on realpath_cache, bumping up its configuration settings to reasonable numbers, and turning off open_basedir and safe_mode. Make sure to use PHP-FPM or suEXEC to run the PHP process under a user ID that is restricted to the document root, to get back the security one usually gains from open_basedir and/or safe_mode.
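Before changing anything, it can help to look at how the realpath cache is currently configured and how full it is. The snippet below is a small sketch using the built-in realpath_cache_size() and realpath_cache_get() functions; the cache is per process, so numbers from the CLI will differ from those seen under a web server.

```php
<?php
// Configured limits, as set in php.ini.
printf("realpath_cache_size setting: %s\n", ini_get('realpath_cache_size'));
printf("realpath_cache_ttl setting:  %s s\n", ini_get('realpath_cache_ttl'));

// Current usage in this process.
printf("bytes currently used:        %d\n", realpath_cache_size());
printf("paths currently cached:      %d\n", count(realpath_cache_get()));
```

If the bytes in use sit close to the configured size on a busy page, the cache is probably too small for the number of files being included.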
Here are a few pointers why this is a performance gain:
https://bugs.php.net/bug.php?id=46965
http://nirlevy.blogspot.de/2009/01/slow-lstat-slow-php-slow-drupal.html
Also consider my comment on the answer from @Ólafur:
I found auto-loading in particular to be the biggest slowdown. PHP is extremely slow at directory lookup and file open access; the more PHP functions you use in a custom auto-loader, the bigger the slowdown. You can help it a bit by turning off safe_mode (deprecated anyway) or even open_basedir (but I would not do that), but the biggest improvement comes from not using auto-loading and simply using "require_once" with complete filesystem paths to require all dependencies in each PHP file you use.

Using large frameworks for web apps that do not actually require such a large number of classes for everything is probably the worst problem, and one that many are not aware of. Strip the framework down so that it does not include every bit of code; keep just what you need and throw out the rest.

If you're using include_once() then you are causing an unnecessary slowdown, regardless of OOP design or not.
OOP will add an overhead to your code but I will bet that you will never notice it.

You may want to rethink your class structure and how you implement it. If you find that OOP is slower, you may have to redesign your classes and their implementation. A class is just a template for an object, so any badly designed method affects all the objects of that class.
Use inheritance and polymorphism as much as you can. This will effectively reduce the number of behaviors and independent methods your classes need, but first of all you need to create a good inheritance map, abstracting your base (parent) classes as much as you can.
The problem is not how many classes you have; the problem is how many methods, properties or fields they have and how well those methods are structured. Inheritance dramatically reduces the number of methods to design, and the amount of code to be compiled too.

As several other people have pointed out, there is a mild overhead to OO PHP, but you can offset it by focusing your optimization effort on the core classes that your various other classes derive from. This is why C++ is becoming increasingly popular in the world of high-performance computing, traditionally the realm of C and Fortran.
Personally, I've never seen a PHP server that was CPU-constrained. Check your RAM use (you can optimize the core classes for this as well) and make sure you're not making unnecessary database calls, which are orders of magnitude more expensive than any extra CPU work you're doing.

If you design a huge OOP object hog that does everything, rather than decomposing the functionality into various classes, you will obviously fill up memory with useless ballast code. Also, a slow framework will not make even a simple Hello World fast. I have noticed a trend (a bad habit) where, for one single Facebook icon, people include a whole Font Awesome library, and then next to it there is a search icon with Fontello included. Each time they need to accomplish something unusual, they pull in an entire framework. If you want to create a fast-loading OOP app, use one framework only, like Zephir/Phalcon or whatever you fancy, and stick to it.

There are ways to limit the penalty from the include_once entries, and that's by having functions declared in the 'include_once' file that themselves have their code content in an 'include' statement. This will load your library of code, but only those functions actually being used will load code as it is needed. You take a second file system hit for the included code, but memory usages drop to practically nothing for the library itself, and only the code used by your program gets loaded. The hit from the second file system access can be mitigated by caching. When dealing with a large project of procedural based PHP, this provides low memory usage and fast processing. DO NOT do this with classes. This would be for a production instance, a development server will show all the penalty of hits since you don't want caching turned on.

Embedding PHP in XML

I am trying to execute PHP code embedded in XML. Below is the code. Is there a better way of executing it? We are using eval, and as far as I know it degrades performance by 80-85%, as it is supposed to be used by the browser.
function processing_instruction($inParser, $inTarget, $inCode) {
    if ($inTarget === 'php') {
        eval($inCode);
    }
}

"If eval() is the answer, you're almost certainly asking the wrong question."
-Rasmus Lerdorf, BDFL of PHP
Is the code you are running so varied that it can't be decided upon as a series of files to be included on demand or a XML-RPC style function call? There is generally very little to gain by allowing arbitrary code execution, and that's before you consider the staggering amount you stand to lose.
If there is a finite, predictable number of things these files could possibly do, I would strongly recommend taking the time to create a semi-generic XML-RPC interface (or at least a series of files that you could specify in the XML file and then include on the fly, perhaps after setting some environment variables, depending on your coding style) and using that.
The number of risks you take when creating a portal to eval() are nigh innumerable.
I had considered providing some examples here, but XML-RPC ought to be a well enough known concept that my doing so is altogether unnecessary.
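For readers who do want something concrete, the "finite, predictable number of things" idea could be sketched as a whitelist dispatcher. This is not XML-RPC itself, just an illustration: the processing instruction names an allowed action instead of carrying PHP source. The "action" target, the action names and run_xml_actions are all hypothetical.

```php
<?php
// Hypothetical whitelist: the XML may only name one of these actions.
$allowedActions = [
    'show_hello' => function (): string { return 'Hello'; },
    'show_year'  => function (): string { return date('Y'); },
];

// Parses $xml and runs whitelisted actions found in "action" processing
// instructions. Anything else, including "php" instructions, is ignored.
function run_xml_actions(string $xml, array $allowedActions): string
{
    $output = '';
    $parser = xml_parser_create();

    xml_set_processing_instruction_handler(
        $parser,
        function ($parser, string $target, string $data) use (&$output, $allowedActions) {
            $name = trim($data);
            if ($target === 'action' && isset($allowedActions[$name])) {
                $output .= $allowedActions[$name]();
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException('Invalid XML: ' . xml_error_string(xml_get_error_code($parser)));
    }

    return $output;
}

echo run_xml_actions('<page><?action show_hello?></page>', $allowedActions), "\n";
```

The XML can still trigger behaviour, but only behaviour that already exists in reviewed PHP files, and no eval() call is involved.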

eval() sadly, is actually the only way to execute it.
UNLESS...
If the code in the XML gets executed more than once, for instance when you have a set of 6 XML files that contain code, kind of like a plugin system.
If that's the case, you can read the code out of the XML, write it out to a .php file, then include that. That would be slower the first time for sure, but you only have to do it once per XML file. After that you can just run the pure PHP files.
And, yes like everyone else said, you can't trust untrustworthy code (duh)

To understand the use of "Embedding PHP in XML", see http://code.google.com/p/smallest-php-xml-xsl-framework/
It is a full application with XML+PHP (PHP generating XML) and XSLT as template system. In a MVC architecture the XML+PHP do the "MVC-Model processing" and XSLT the MVC-View.

XML as a Data Layer for a PHP application

I was wondering how I should go about writing an XML data layer for a fairly simple PHP web site. The reasons for this are:
db server is not available.
Simple data schema that can be expressed in xml.
I like the idea of having a self contained app, without server dependencies.
I would possibly want to abstract it to a small framework for reuse in other projects.
The schema resembles a simple book catalog with a few lookup tables plus i18n. So, it is quite simple to express.
The size of the main xml file is in the range of 100kb to 15mb. But it could grow at some point to ~100mb.
I am actually considering extending my model classes to handle xml data.
Currently I fetch data with a combination of XMLReader and SimpleXml, like this:
public function find($xpath){
    // Stream through the file; only one <book> element is expanded at a time.
    while($this->xml_reader->read()){
        if($this->xml_reader->nodeType === XMLReader::ELEMENT &&
           $this->xml_reader->localName == 'book'){
            // Copy the current node into its own small DOM document.
            $node = $this->xml_reader->expand();
            $dom = new DOMDocument();
            $n = $dom->importNode($node, true);
            $dom->appendChild($n);
            $sx = simplexml_import_dom($n);
            // xpath returns an array
            $res = $sx->xpath($xpath);
            if(isset($res[0]) && $res[0]){
                $this->results[] = $res;
            }
        }
    }
    return $this->results;
}
So, instead of loading the whole xml file in memory, I create a SimpleXml object for each section and run an xpath query on that object. The function returns an array of SimpleXml objects. For conservative search I would probably break on first found item.
The questions I have to ask are:
1. Would you consider this as a viable solution, even for a medium to large data store?
2. Are there any considerations/patterns to keep in mind, when handling XML in PHP?
3. Does the above code scale for large files (100mb)?
4. Can inserts and updates in large xml files be handled in a low overhead manner?
5. Would you suggest an alternative data format as a better option?

"If you have a saw and you need to pound in a nail, don't use the saw. Get a hammer." (folk saying)
In other words, if you want a data store, use a data-base, not a markup language.
PHP has good support for various database systems via PDO; for small data sets, you can use SQLite, which doesn't need a server (it is stored in a normal file). Later, should you need to switch to a full-featured database, it is quite simple.
To answer your questions:
1. Viable solution - no, definitely not. XML has its purposes, but simulating a database is not one, not even for a small data set.
2. With XML, you're shuffling strings around, all the time. That might be just bearable on read, but is a real nightmare on write (slow to parse, large memory footprint, etc.). While you could subvert XML to work as a data store, it is simply the wrong tool for the job.
3. No (everything will take forever, if you don't run out of memory before that).
4. No, for many reasons (locking, re-writing the whole XML-string/file, not to mention memory again).
5a. SQLite was designed with very small and simple databases in mind - simple, no server dependencies (the db is contained in one file). As @Robert Gould points out in a comment, it doesn't scale for larger applications, but then
5b. for a medium to large data store, consider a relational database (and it is usually easier to switch databases than to switch from XML to a database).
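To give an idea of what the SQLite route looks like through PDO, here is a minimal sketch. It assumes the pdo_sqlite driver is available, uses an in-memory database so it runs anywhere, and the books table and its sample rows are invented for illustration. For a real site the DSN would point at a file, for example 'sqlite:' followed by the path to the database file.

```php
<?php
// In-memory database for the sketch; a file path in the DSN makes it persistent.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema for a simple book catalog.
$db->exec('CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL, lang TEXT NOT NULL)');

// Inserts go through a prepared statement; no file rewriting is needed.
$insert = $db->prepare('INSERT INTO books (title, lang) VALUES (:title, :lang)');
$sampleRows = [['Dune', 'en'], ['Solaris', 'pl'], ['Neuromancer', 'en']];
foreach ($sampleRows as [$title, $lang]) {
    $insert->execute([':title' => $title, ':lang' => $lang]);
}

// Lookups become queries instead of XPath scans over the whole file.
$find = $db->prepare('SELECT title FROM books WHERE lang = :lang ORDER BY title');
$find->execute([':lang' => 'en']);
$titles = $find->fetchAll(PDO::FETCH_COLUMN);

print_r($titles);
```

Because the code talks to PDO rather than to SQLite directly, switching to another database later mostly means changing the DSN and adjusting any SQLite-specific SQL.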

No, it won't scale. It's not feasible.
You'd be better off using e.g. SQLite. You don't need a server, it's bundled in with PHP by default and stores data in regular files.

I would go with SQLite instead, which is perfect for small websites and x-copy style deployments.
XML-based data storage won't scale well.
"SQLite is an ACID-compliant embedded relational database management system contained in a relatively small (~225 kB) C programming library. The source code for SQLite is in the public domain.
Unlike client-server database management systems, the SQLite engine is not a standalone process with which the program communicates. Instead, the SQLite library is linked in and thus becomes an integral part of the program. It can also be called dynamically. The program uses SQLite's functionality through simple function calls, which reduces latency in database access as function calls within a single process are more efficient than inter-process communication. The entire database (definitions, tables, indices, and the data itself) is stored as a single cross-platform file on a host machine. This simple design is achieved by locking the entire database file at the beginning of a transaction."

Everyone loves to throw dirt on XML files, but in reality it works. I've seen large applications use them, and I know of an MMO that uses simple flat files for storage and it works fine (by the way, the MMO is among the top 5 worldwide, so it's not just a toy). However, my job right now is creating a better and more savvy persistence layer based on SQL. If your site will be big, SQL is the best solution, but XML is capable of massive (MMO) scalability if done well.
But a caveat is migration from XML to SQL is rough if the mapping isn't easy.

PHP performance considerations?

I'm building a PHP site, but for now the only PHP I'm using is a half-dozen or so includes on certain pages. (I will probably use some database queries eventually.)
Are simple include() statements a concern for speed or scaling, as opposed to static HTML? What kinds of things tend to cause a site to bog down?

Certainly include() is slower than static pages. However, with modern systems you're not likely to see this as a bottleneck for a long time - if ever. The benefits of using includes to keep common parts of your site up to date outweigh the tiny performance hit, in my opinion (having different navigation on one page because you forgot to update it leads to a bad user experience, and thus bad feelings about your site/company/whatever).
Using caching will really not help either - caching code is going to be slower than just an include(). The only time caching will benefit you is if you're doing computationally-intensive calculations (very rare, on web pages), or grabbing data from a database.

Sounds like you are participating in a bit of premature optimization. If the application is not built, while performance concerns are good to be aware of, your primary concern should be getting the app written.
Includes are a fact of life. Don't worry about number, worry about keeping your code well organized (PEAR folder structure is a lovely thing, if you don't know what I'm talking about look at the structure of the Zend Framework class files).
Focus on getting the application written with a reasonable amount of abstraction. Group all of your DB calls into a class (or classes) so that you minimize code duplication (KISS principles and all) and when it comes time to refactor and optimize your queries they are centrally located. Also get started on some unit testing to prevent regression.
Once the application is up and running, don't ask us what is faster or better, since what your bottleneck will be depends on each application. It may turn out that even though you have lots of includes, your loops are eating up your time, or whatever. Use XDebug and profile your code once it's up and running. Look for the segments of code that are eating up a disproportionate amount of time, then refactor. If you focus too much now on the performance hit between include and include_once, you'll end up chasing a ghost when those curl requests running in sync are eating your breakfast.
In the meantime, though, the best suggestion is to look through the php.net manual and, if there's a built-in function doing something you are trying to do, use it! PHP's C-based extensions will always be faster than any PHP code that you could write, and you'll be surprised how much of what you are trying to do is done already.
But again, I cannot stress this enough, premature optimization is BAD!!! Just get your application up off the ground with good levels of abstraction, profile it, then fix what actually is eating up your time rather than fixing what you think might eat up your time.

Strictly speaking, straight HTML will always serve faster than a server-side approach since the server doesn't have to do any interpretation of the code.
To answer the bigger question, there are a number of things that will cause your site to bog down; there's just no specific threshold for when your code is causing the problem vs. PHP. (keep in mind that many of Yahoo's sites are PHP-driven, so don't think that PHP can't scale).
One thing I've noticed is that the PHP-driven sites that are the slowest are the ones that include more than is necessary to display a specific page. OSCommerce (oscommerce.com) is one of the most popular PHP-driven shopping carts. It has a bad habit, however, of including all of their core functionality (just in case it's needed) on every single page. So even if you don't need to display an 'info box', the function is loaded.
On the other hand, there are many PHP frameworks out there (such as CakePHP, Symfony, and CodeIgniter) that take a 'load it as you need it' approach.
I would advise the following:
Don't include more functionality than you need for a specific page
Keep base functions separate (use an MVC approach when possible)
Use require_once instead of include if you think you'll have nested includes (e.g. page A includes file B which includes file C). This will avoid including the same file more than once. It will also stop the process if a file can't be found; thus helping your troubleshooting process ;)
Cache static pages as HTML if possible - to avoid having to reparse when things don't change
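The last point, caching rendered pages as HTML, can be sketched with output buffering. The function name render_cached, the time-to-live parameter and the cache file location are assumptions made for this example, not part of any framework mentioned here.

```php
<?php
// Hypothetical helper: returns the cached HTML if it is younger than
// $ttlSeconds, otherwise runs $render, stores its output, and returns it.
function render_cached(string $cacheFile, int $ttlSeconds, callable $render): string
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttlSeconds) {
        return file_get_contents($cacheFile);
    }

    // Capture everything the page code echoes.
    ob_start();
    $render();
    $html = ob_get_clean();

    // LOCK_EX keeps two concurrent requests from interleaving their writes.
    file_put_contents($cacheFile, $html, LOCK_EX);

    return $html;
}
```

A page script would wrap its includes and echo statements in the callable and print the returned string, so the expensive part runs at most once per time-to-live window.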

Nah includes are fine, nothing to worry about there.
You might want to think about tweaking your caching headers a bit at some point, but unless you're getting significant hits it should be no problem. Assuming this is all static data, you could even consider converting the whole site to static HTML (easiest way: write a script that grabs every page via the webserver and dumps it out in a matching dir structure)
Most web applications are limited by the speed of their database (or whatever their external storage is, but 9/10 times that'll be a database), the application code is rarely cause for concern, and it doesn't sound like you're doing anything you need to worry about yet.

Before you make any long-lasting decisions about how to structure the code for your site, I would recommend that you do some reading on the Model-View-Controller design pattern. While there are others this one appears to be gaining a great deal of ground in web development circles and certainly will be around for a while. You might want to take a look at some of the other design patterns suggested by Martin Fowler in his Patterns of Enterprise Application Architecture before making any final decisions about what sort of design will best fit your needs.
Depending on the size and scope of your project, you may want to go with a ready-made framework for PHP like Zend Framework or PHP On Trax or you may decide to build your own solution.
Specifically regarding the rendering of HTML content I would strongly recommend that you use some form of templating in order to keep your business logic separate from your display logic. I've found that this one simple rule in my development has saved me hours of work when one or the other needed to be changed. I've used Smarty (http://www.smarty.net/) and I know that most of the frameworks out there either have a template system of their own or provide a plug-in architecture that allows you to use your own preferred method. As you look at possible solutions, I would recommend that you look for one that is capable of creating cached versions.
Lastly, if you're concerned about speed on the back-end then I would highly recommend that you look at ways to minimize your calls to your back-end data store (whether it be a database or just system files). Try to avoid loading and rendering too much content (say a large report stored in a table that contains hundreds of records) all at once. If possible look for ways to make the user interface load smaller bits of data at a time.
And if you're specifically concerned about the actual load time of your html content and its CSS, Javascript or other dependencies I would recommend that you review these suggestions from the guys at Yahoo!.

To add on what JayTee mentioned - loading functionality when you need it. If you're not using any of the frameworks that do this automatically, you might want to look into the __autoload() functionality that was introduced in PHP5 - basically, your own logic can be invoked when you instantiate a particular class if it's not already loaded. This gives you a chance to include() a file that defines that class on-demand.
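A minimal sketch of on-demand loading follows. It uses spl_autoload_register() rather than __autoload(), since __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0. The helper name register_class_autoloader and the convention that class Foo_Bar lives in Foo/Bar.php under a base directory are assumptions for the example.

```php
<?php
// Hypothetical helper: registers a loader that maps a class name such as
// Foo_Bar (or the namespaced form) to <baseDir>/Foo/Bar.php.
function register_class_autoloader(string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($baseDir) {
        $relative = str_replace(['\\', '_'], DIRECTORY_SEPARATOR, $class) . '.php';
        $file = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $relative;

        // Only load files that exist, so other registered loaders can still try.
        if (is_file($file)) {
            require $file;
        }
    });
}
```

After calling register_class_autoloader() once at startup, a class file is read only on the first use of that class, so pages that never touch a class never pay for its include.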

The biggest thing you can do to speed up your application is to use an Opcode cache, like APC. There's an excellent list and description available on Wikipedia.
As far as simple includes are concerned, be careful not to include too many files on each request as the disk I/O can cause your application not to scale well. A few dozen includes should be fine, but it's generally a good idea to package your most commonly included files into a single script so you only have one include. The cost in memory of having a few classes here and there you don't need loaded will be better than the cost of disk I/O for including hundreds of smaller files.
