Reduce number of included files - php

My application based on Zend Framework and Doctrine includes > 300 files on each request. They are mostly the same files.
This is quite a large overhead. It is partially solved by Zend_Cache (and Memcache), but not all of the pages can be cached.
How to reduce this number? How to speed up?
Doctrine has an option to compile the needed files which seems quite rational for production server and final version of the app.
My plan is to compile other libraries too (I have already stripped all require_once's).
Are there any tools for this task? Maybe some cache drivers do it automatically? How to set them up?

The overhead of PHP file inclusions can usually be countered with an opcode cache such as APC, an extension available through PECL. Opcode caches generally work by caching the compiled bytecode, so that the overhead of reading and parsing the source is only incurred on the first request. This largely removes the need for, and the benefit of, any source compilation of your PHP files.
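Before relying on an opcode cache it helps to confirm one is actually loaded. The following is a minimal sketch, not a definitive check: the function name `detectOpcodeCache` is hypothetical, and the extension names tested are assumptions about how the extensions register themselves. A loaded extension can still be disabled by its INI settings.

```php
<?php
// Hypothetical helper: report which opcode cache extension, if any, is loaded.
function detectOpcodeCache(): ?string
{
    // Assumed extension names; extend the list for other caches.
    $candidates = ['Zend OPcache', 'apc'];
    foreach ($candidates as $name) {
        if (extension_loaded($name)) {
            return $name;
        }
    }
    return null; // no known opcode cache extension loaded
}

$cache = detectOpcodeCache();
echo $cache === null ? "No opcode cache loaded\n" : "Opcode cache loaded: $cache\n";
```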

The best option is to use APC or Zend_Accelerator. You can still write "compilation" scripts that merge classes into one file, which reduces the required I/O to a minimum. Unfortunately, you then also need to rewrite the autoloading process so that it looks in the appropriate file. You can usually condense related classes together (Zend_Form + Elements + Decorators, frequently used validators, Request + Response + Router + Controller, Zend_Db + adapters + Zend_Db_Select, etc.). In particular, the classes used on every request can easily be condensed and included manually as one file. The best way to find them is to add a debug call that saves all included files (http://www.php.net/get_included_files) into a DB, one row per file per request, and then:
SELECT filename FROM files GROUP BY filename HAVING COUNT(*) = $numOfRequests
All the files in the result can be safely merged into a single file and included before bootstrapping :)
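The set of files present on every request can also be computed directly in PHP. This is a rough sketch under assumptions: the function name `filesIncludedOnEveryRequest` and the sample file lists are made up for illustration, and in a real application each list would come from get_included_files() captured at the end of a request.

```php
<?php
// Return the files that appear in every request's include list.
function filesIncludedOnEveryRequest(array $includeLists): array
{
    if ($includeLists === []) {
        return [];
    }
    $common = array_shift($includeLists);
    foreach ($includeLists as $list) {
        // Keep only the files that are also present in this request.
        $common = array_intersect($common, $list);
    }
    return array_values(array_unique($common));
}

// Hypothetical sample data; real lists could be captured per request with
// register_shutdown_function() and get_included_files().
$requests = [
    ['Zend/Loader.php', 'Zend/Db.php', 'Zend/Form.php'],
    ['Zend/Loader.php', 'Zend/Db.php'],
    ['Zend/Loader.php', 'Zend/Db.php', 'Zend/Cache.php'],
];
print_r(filesIncludedOnEveryRequest($requests));
```

With the sample data only Zend/Loader.php and Zend/Db.php survive, so those two would be the candidates for merging.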

Related

(How) does PHP cache script and custom INI-files?

Currently I'm storing the configuration for my PHP scripts in variables and constants within another PHP script (e.g. config.php).
So each time a script is called, it includes the configuration script to gain access to the values of the variables/constants.
Since INI files are easier for other scripts to parse, I thought about storing my configuration values in such a file and reading it with parse_ini_file().
As I understand it, PHP keeps script files in memory, so including a script file does (usually) not cause I/O. (Or does Zend do the caching? Or are the sources not cached at all?)
What about custom INI files? I know that there is caching for .user.ini (see user_ini.cache_ttl), but does PHP also cache custom INI files, or does a call to parse_ini_file() always cause I/O?
Summary
The time required to load configuration directives (which is not the same as the time needed by the app to perform those directives) is usually negligible - below one millisecond for most "reasonably sized" configurations. So don't worry - INI, PHP, or JSON are, performance wise, all equally good choices. Even if PHP were ten times faster than JSON, that would be like loading in 0.001s instead of 0.01s; very few will ever notice.
That said, there are considerations when deciding where to store config data.
.ini vs .php config storage
Time to load: mostly identical unless caching is involved (see below), and as I said, not really important.
ease of use: .ini is easier to read and modify for a human. This may be an advantage, or a disadvantage (if the latter, think integrity check).
data format: PHP can store more structured data than .ini files, unless really complicated workarounds are used. But consider the possibility of using JSON instead of INI.
More structured data means that you can more easily create a "super configuration" PHP or JSON holding the equivalent of several INI files, while keeping information well isolated.
automatic redundancy control: PHP file inclusion can be streamlined with require_once.
user modifications: there are visual INI and JSON editors that allow a user to modify an INI or JSON file while keeping it, at least, syntactically valid. Not so for PHP (you would need to roll your own).
Caching
The PHP core does not do caching. Period. That said, you'll never use the PHP core alone: it will be loaded as a (fast)CGI, an Apache module, et cetera. Also you might not use a "barebones" installation but you could have (chances are that you will have) several modules installed.
Both the "loader" part and the "module" part might do caching; and their both doing this could lead to unnecessary duplications or conflicts, so it is worth checking this out:
the file (but this does not change between INI, JSON and PHP files) will be cached into the filesystem I/O subsystem layer and, unless memory is really at a premium, will be loaded from there (on a related note, this is one of the reasons why not all filesystems are equally good for all websites).
if you need the configuration in several files, and use require_once in all of them, the configuration will be loaded once only, as soon as it is needed. This is not caching, but it is a performance improvement nonetheless.
several modules exist (Zend, opcache, APC, ...) that will cache all PHP files, configuration included. They will not cache INI files, though.
the caching done by modules (e.g. opcache) can (a) ignore further modifications to the file system, which means that upon modifying a PHP file, you'll need to somehow reload or invalidate the cache; how to do this changes from module to module; (b) implement shortcuts that might conflict with either the file system data management or its file structure (famously, opcache can ignore the path part of a file, allowing for much faster performances unless you have two files with the same name in different directories, when it risks loading one instead of the other).
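Since opcode caches cover PHP files but not INI files, one way to get both the editable INI format and a cacheable PHP file is to generate the latter from the former. The sketch below is an illustration only: the function name `loadIniCached` and the file layout are assumptions, and with some opcode cache settings the generated file may need explicit invalidation after it is rewritten.

```php
<?php
// Load an INI file, reusing a generated PHP copy while it is still fresh.
function loadIniCached(string $iniPath, string $cachePath): array
{
    // Reuse the generated file only if it is at least as new as the INI source.
    if (is_file($cachePath) && filemtime($cachePath) >= filemtime($iniPath)) {
        return require $cachePath;
    }
    $config = parse_ini_file($iniPath, true);
    if ($config === false) {
        throw new RuntimeException("Cannot parse $iniPath");
    }
    // var_export() emits valid PHP source for the array.
    $code = '<?php return ' . var_export($config, true) . ';' . PHP_EOL;
    file_put_contents($cachePath, $code, LOCK_EX);
    return $config;
}
```

The first call parses the INI file and writes the PHP copy; later calls go through require, which an opcode cache can serve from memory.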
Performance enhancement: cache digested data instead of config directives
Quite often it will be the case that depending on some config directive, you will have to perform one of several not trivial operations. Then you will use the results for the actual output.
What slows down the workflow in this case is not reading whether, say, "config.layout" is "VERTICAL" or "HORIZONTAL", but actually generating the layout (or whatever else). In this case you might reap huge benefits by storing the generated object somewhere:
serialized inside a file (e.g. cache/config.layout.vertical.html.gz). You will probably need to deploy some kind of 'stale data check' if the layout changes, or some kind of cache invalidation procedure. (For layouts specifically, you could check out Twig, which also does parameterized template caching).
inside a keystore, such as Redis.
in a RDBMS database such as MySQL (even if that's overkill - you'd use it as a keystore, basically).
faster NoSQL alternatives such as MongoDB.
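The file-based variant with a simple stale-data check could look roughly like this. It is a sketch under assumptions: `cachedOutput`, the cache key, and the time-based expiry are illustrative choices, and a real application might prefer explicit invalidation when the configuration changes.

```php
<?php
// Return cached output for $key, rebuilding it when missing or older than $ttl seconds.
function cachedOutput(string $cacheDir, string $key, int $ttl, callable $build): string
{
    $file = $cacheDir . '/' . sha1($key) . '.cache';
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $data = file_get_contents($file);
        if ($data !== false) {
            return $data; // fresh cache hit
        }
    }
    $data = $build(); // the expensive part, e.g. generating the layout
    file_put_contents($file, $data, LOCK_EX);
    return $data;
}

// Usage with a hypothetical layout generator.
$dir = sys_get_temp_dir() . '/layout_cache_' . uniqid();
mkdir($dir);
echo cachedOutput($dir, 'config.layout.vertical', 300, function () {
    return '<div class="vertical">layout</div>';
}), PHP_EOL;
```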
Additional options
You will probably want to read about client caching and headers, and possibly explore whatever options your hosting offers (load balancers, HTTP caches such as Varnish, etc.).
parse_ini_file() uses standard operations to convert the file into an array.

Opcache with Joomla! and Resolved Name

I have been researching the best use of OPcache with Joomla.
This github page, The Zend Engine and OPcode caching, is the best explanation of how OPcache works that I've seen and was trying to get answers to a couple of points here.
Resolved filename:
What does "Resolved filename" mean?
What does OPcache use as the "Resolved filename"? I use the Joomla! CMS, and I know that it always calls index.php but passes different parameters. Is the resolved filename index.php?[querystring]
Timestamp Used:
How does "timestamp" apply to a CMS/framework system such as Joomla!? Since the index.php file never changes, it seems to me that the cache would never refresh.
Joomla! CMS Caching system:
Does it make sense to use the cache in Joomla? It writes the pages it builds out to the file system, in the folder named "cache", as PHP files, and those PHP pages will be called instead of Joomla rebuilding the pages every time.
Resolved Filenames
The PHP equivalent of a resolved filename is obtained with the realpath() function. It resolves all symbolic links, any references to '/./' and '/../', and extra '/' characters in the input path (against the current working directory in the case of a relative filename) and returns the canonicalized absolute pathname. In other words, the resolved filename is a complete filename mapping onto the underlying filesystem. It's not necessarily unique, because of hard links, etc.
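A small illustration of that resolution, using a throwaway directory under the system temp dir (the directory and file names are made up for the example):

```php
<?php
// Build a small directory tree to resolve paths against.
$base = sys_get_temp_dir() . '/resolved_' . uniqid();
mkdir($base . '/lib', 0777, true);
file_put_contents($base . '/lib/helper.php', "<?php\n");

// A "messy" spelling of the same file: './', '../' and a doubled slash.
$messy = $base . '/lib/./../lib//helper.php';
$clean = realpath($messy);

echo $clean, PHP_EOL;
// Both spellings resolve to the same canonical path.
var_dump($clean === realpath($base . '/lib/helper.php'));
```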
OPcache uses the resolved filename as the index into its internal compiled script database for two reasons:
Having relative filenames and embedded symlinks opens all sorts of security and simple application programming beartraps that can cause bugs or enable exploitable vulnerabilities. By using the resolved filename for each script as its key, OPcache avoids these issues.
This also can have material performance benefits with multi installations of packages like phpBB, WordPress, MediaWiki (and I assume Joomla) which typically use a hierarchical PHP directory structure. You can symlink many versions of a common subdirectory onto a shared library folder, and this way separate logical instances of a package can share the same compiled script in the OPcache internal database.
The query parameters are quite separate from a script being executed. The parameters typically vary from request to request depending on the request context but the executed script is the same, and ditto any included scripts for the same processing path.
Script Timestamps
The timestamp of each underlying script file is used by OPcache as a secondary key. This is to enable detection of changes to the underlying script which will normally result in a changed timestamp. There are various opcache INI parameters which can be used to reduce the performance hit as well as OPcache API calls (such as opcache_invalidate()) which can enable sysadmins to do this explicitly.
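Explicit invalidation through the OPcache API can be sketched as follows. The wrapper name `invalidateCompiledScript` is hypothetical; opcache_invalidate() itself is only available when the OPcache extension is loaded, hence the guard.

```php
<?php
// Drop one script from the opcode cache, e.g. after it was rewritten on disk.
// Returns false when OPcache is not loaded or reports that it is disabled.
function invalidateCompiledScript(string $path): bool
{
    if (!function_exists('opcache_invalidate')) {
        return false; // OPcache extension not loaded
    }
    // $force = true invalidates regardless of whether the timestamp changed.
    return opcache_invalidate($path, true);
}

var_dump(invalidateCompiledScript(__FILE__));
```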
Since the (standard) OPcache internal cache is entirely in-memory, it does not have a persistent version on, say, the filesystem. Hence it must be rebuilt every time the underlying PHP process hierarchy (which is typically web-server specific) is reloaded. And yes, this does result in a startup performance hit whilst the cache is re-primed.
This use of timestamps concerns the caching of script compilations and is quite separate from any application content caching.
Application Caching
What OPcache does is avoid the per-request compilation costs. For any PHP application based on a framework or a complex package such as Joomla or MediaWiki, this can typically represent 50-90% of the per-request CPU cost, hence leading to a 2-10X throughput improvement.
Application caching is application-specific and relates to avoiding the per-request cost of executing application code that repeats the same processing of application data.
These are quite separate and to get good application performance, you always should consider doing both.

Speeding up symfony 1.4 templates with memcached

I was wondering if it would be possible to somehow speed up symfony templates by loading the files into memcached and then, instead of doing an include, grabbing them from memory. Has anyone tried this? Would it work?
Have you looked at the view cache already? This built-in system makes it possible to cache the output from actions, and has a lot of configuration options, and is overridable on a per-action (and per-component) level. It works by default on a file level, but I think it is possible to configure it in a way that the action output is cached to memcached. (Or you should write this part)
If you want really lightning fast pages, you should also look at the sfSuperCachePlugin, which stores the output as an HTML file in your public HTML folder. That way Apache can directly serve the pages, and doesn't need to start up PHP and symfony to generate the output.
Sorry for not having more time to give an explanation here but you can review the notes at:
http://www.symfony-project.org/book/1_2/12-Caching
under the heading:
Alternative Caching storage
Quote from the page:
"By default, the symfony cache system stores data in files on the web server hard disk. You may want to store cache in memory (for instance, via memcached) or in a database (notably if you want to share your cache among several servers or speed up cache removal). You can easily alter symfony's default cache storage system because the cache class used by the symfony view cache manager is defined in factories.yml."
good luck!

Need some thoughts and advice if I need to do anything more to improve performance of my webapp

I'm working on a webapp that uses a lot of ajax to display data and I'm wondering if I could get any advice on what else I could do to speed up the app, and reduce bandwidth, etc.
I'm using php, mysql, freeBSD, Apache, Tomcat for my environment. I own the server and have full access to all config files, etc.
I have gzip deflate compression turned on in the apache http.conf file. I have obfuscated and minified all the .js and .css files.
My webapp works in this general manner. After login the user lands on the index.php page. All links on the index page are ajax calls to read a .php class function that will retrieve the html in a string and display it inside a div somewhere on the main index.php page.
Most of the functions returning the html are returning strings like:
<table>
<tr>
<td>Data here</td>
</tr>
</table>
I don't return the full "<html><head>" stuff, because it already exists in the main index.php page.
However, the html strings returned are formatted with tabs, spaces, comments, etc. for easy reading of the code. Should I take the time to minify these pages and remove the tabs, comments, spaces? Or is there negligible benefit in minifying the .php pages because they are on the server?
I guess I'm trying to figure out if the way I've structured the webapp is going to cause bandwidth issues and if I can reduce the .php class file size could I improve some performance by reducing them. Most of the .php classes are 40-50KB with the largest being 99KB.
For speed, I have thought about using memcache, but don't really know if adding it after the fact is worth it and I don't quite know how to implement it. I don't know if there is any caching turned on on the server...I guess I have left that up to the browser...I'm not very well versed in the caching arena.
Right now the site doesn't appear slow, but I'm the only user...I'm just wondering if it's worth the extra effort.
Any advice, or articles would be appreciated.
Thanks in advance.
My recommendation would be to NOT send the HTML over the AJAX calls. Instead, send just the underlying data (the "Data here" part) as JSON, then process that data through a client-side function that decorates it with the right HTML and injects it into the DOM. This shrinks the responses and can speed up the Ajax calls considerably.
Memcache provides an API that allows you to cache data. What you additionally need (and, in my opinion, what is more important) is a strategy for what to cache and when to invalidate the cache. This cannot be determined by looking at the source code; it comes from how your site is used.
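On the server side, returning data instead of markup can be as small as the sketch below. The function name `rowsToJson` and the row fields are assumptions for illustration; the client-side rendering is not shown.

```php
<?php
// Encode rows as JSON for an AJAX response instead of pre-rendered HTML.
function rowsToJson(array $rows): string
{
    $json = json_encode(['rows' => $rows]);
    if ($json === false) {
        throw new RuntimeException('JSON encoding failed: ' . json_last_error_msg());
    }
    return $json;
}

// In a real endpoint you would first send:
// header('Content-Type: application/json');
echo rowsToJson([['id' => 1, 'name' => 'Data here']]), PHP_EOL;
```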
However, an opcode cache (e.g. APC) could be used right away.
Code beautification is for humans, not for machines.
As part of the optimization, you should strip it out.
Or simply add a flag check in your application: when a certain condition matches (such as debug mode), it returns nicely formatted output. Otherwise, whitespace means nothing to the machine.
APC
You should always use APC to compile and cache PHP scripts as opcode.
Why?
changes are rarely made after deployment
if every script is already available as opcode, your server does not need to compile the plain-text script into binary opcode on the fly
compile once, use many times
What are the benefits?
fewer execution cycles spent compiling plain-text scripts
less memory consumed (the two are related)
simple math: if a request that takes 2 seconds in your current environment takes 0.5 seconds with APC, you get 4 times the performance, because the same 2 seconds can now serve 4 requests. That means that if you could previously fit 50 concurrent users, you can now allow roughly 200 concurrent users
Memcache - NO GO?
It depends. If you are in a single-host environment, you will probably not gain much. The biggest advantage of memcache is information sharing and distribution (which means a multiple-server environment: cache once, use many times).
etc?
static files with expiration header (prime cache concept, no request is fastest, and save bandwidth)
cache your expensive request into memcache/disk-cache or even database (expensive request such as report/statistics generation)
always review your code for best optimization (but do not over-do)
always do benchmark and compare the results (was and current)
fine-tune your apache/tomcat configuration
consider recompiling PHP with a minimal set of libraries/extensions and loading only the necessary libraries at run time (for example, if the application uses mysqli and not PDO, there is no reason to keep PDO)

Optimizing PHP require_once's for low disk i/o?

Q1)
I'm designing a CMS (-who isn't!) but priority is being given to caching. Literally everything is cached. DB rows, DB id queries, Configuration data, processed data, compiled templates. Currently it has two layers of caching.
The first is an opcode cache or memory cache such as apc, eaccelerator, xcache or memcached. If an entry is not found in there, it is then searched for in the secondary, slower cache, i.e. php includes.
Are the opcode caches actually faster than doing a require_once to a php file with a var_export'd array of data in it? My tests are inconclusive as my development box (5.3 of XAMPP) keeps throwing errors installing any of the aforementioned programs.
Q2)
The CMS has numerous helper classes that are autoloaded on demand instead of loading all files. Mostly each has a require before it so no autoloading needs to take place, however this is not the question. Because a page script can have up to 50-60 helper files included, I have a feeling that if the site was under pressure it would buckle because of all the I/O that this incurs. Ignore for the moment that there is an output cache in place that would remove the need for what I am about to suggest, and also that opcode caches would render this moot. What I have tried to do is join all the helper files required for the script's execution into one single file. This is achievable and works well, however it has the side effect of dramatically increasing memory usage, even though technically the same code is being used.
What are your thoughts and opinions on this?
Using a compiler cache like APC should help out as it will take your helper files and cache them after they are converted to opcode. That will mean the files will not only be cached but already in opcode so they do not need to be parsed and compiled each time they are required.
Looks like you just have no idea what you want to cache (and why).
You just cannot compare "opcode cache" and "require_once". Opcode cache will cache required code as well as other code.
First, keep in mind that your operating system will cache files in memory if they are being accessed frequently enough.
Also, don't use require_once. It is significantly slower than require. If you aren't using an autoloader, you should be. There is no reason to be manually including files in a modern php application (very few exceptions).
50-60 helper files is crazy. Isn't there some way to combine these? Can't you put them all in a related helper class, like OutputHelper or CacheHelper? That way you only have to include the class, which, again, should be taken care of by your autoloader. It sounds to me like you're doing something like putting one function per file.
Opcode caching greatly reduces memory usage and execution time, but I'm not sure what effect it has on require statements.
I agree with ryeguy. require_once is slower than require or include because it has to log every include and check against it. If you're only doing one require/include per file (which you should be for classes), then you don't need require_once or include_once.
Autoloading is great for optimization, as you only load classes when they are needed. So if your app has 500 classes but only needs 15 to run a certain page/script, then only those 15 get loaded. Which is nice.
If you take a peek at any big framework, you will notice that they have migrated to using autoloaders. They used to call require_once at the last moment, like this example from Zend Framework version 1:
require_once 'Zend/Db/Exception.php';
throw new Zend_Db_Exception('Adapter name must be specified in a string');
Zend Framework Version 2 is going to be using auto loaders instead. I believe this is the fastest and it's also the easiest to code for.
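A minimal autoloader for underscore-separated class names like the one above might look like this. It is a sketch, not the framework's own loader: the helper name `classToPath` is made up, and it assumes the library root is on the include path.

```php
<?php
// Map a PEAR-style class name to a relative file path,
// e.g. Zend_Db_Exception -> Zend/Db/Exception.php.
function classToPath(string $class): string
{
    return str_replace('_', '/', $class) . '.php';
}

// Load a class file only when the class is first used.
spl_autoload_register(function (string $class): void {
    // Look the file up on the include path; returns false when not found.
    $file = stream_resolve_include_path(classToPath($class));
    if ($file !== false) {
        require $file;
    }
});
```

Because each class file is loaded at most once by the autoloader, plain require is sufficient here and require_once is not needed.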
