I've read a lot of articles about ZF performance and still can't understand, if I've enabled byte-code caching (APC), does it make sense to use some other tricks? E.g. disabling autoload and using one big php-file with all necessary classes instead.
I was surprised to find that this is the only question on the site tagged performance, autoload, php. What a better place than this to dispel the #1 autoload myth:
Modern, well-designed autoloaders won't break APC (or PHP 5.5's OPcache), and are not any worse for performance than require_once (except for the function call overhead, of course).
Why? Well, now we have spl_autoload_register, which lets you add multiple autoload handlers. This allows each third party library to ship it's own autoloader that knows how to load that library's files, and skip the rest.
For example, Zend Framework 1's Zend_Loader_Autoloader restricts itself to trying to load classes that start with a specific pseudo-namespace -- Zend_ (and anything else the user asks it to load). If it doesn't start with the desired pseudo-namespace, it simply returns and lets the next loader on the stack run. It also knows that it can find Zend_Foo_Bar_Baz in Zend/Foo/Bar/Baz.php, so it doesn't need to search the include path by hand. Like other modern framework autoloaders, it follows the the PSR-0 autoloading standard.
Any dependencies installed via composer also get automatically built namespaced autoloaders in the same way.
It's that include path scouring that makes poorly-designed autoloaders suck. You generally don't see these in modern PHP code. The intense filesystem stat calls that result from trying to find files are a frequent performance drag. Check out this presentation by PHP creator Rasmus Lerdorf, in which he increases the performance of Wordpress through benchmarking, profiling, and careful removal of slow operations like stat calls.
The require_once-everything-up-front from the olden days is is unnecessary when you're using modern libraries and don't have a sucky autoloader. It's only a major win when you disable apc.stat if you're using APC, or fiddling with OPcache's validate_, revalidate_, and enable_file_override INI options if you're using OPcache.
tl;dr: Unless you know that statting include files is your largest bottleneck, the Zend autoloader is just fine, and you don't need to resort to a require_once fest.
Related
There are a hell lot of inbuilt PHP functions. I was wondering that after almost 2 and a half years of working as a software engineer I hardly use a little fraction of those. But all of them are defined and can be used with the default PHP installations.
I read somewhere in SO that PHP provides all these inbuilt things but doing similar things with languages like JAVA needs a lot of coding. Is that correct? I am not experienced in other languages much.
Also, am I correct to assume that a large portion of these functions are not used by any of the other inbuilt functions or anything (internal dependencies)? E.g. these functions pdf_fit_table(), gzopen() are needed only in case of PDF and gzip file related things respectively.
If so, then as advanced programmers, does PHP provide any option to us to selectively load them, based on the specific project requirements or more dynamically, based on a specific module? e.g. load PDF related functions only if I have PDF related tasks. If possible, at what level can it be done? If at the PHP installation level, then I think it is not possible in case of shared hosting. Is a better solution to this possible?
I am just speaking from a common sense point of view, we include files containing functions on a need basis.
Is it going to give a performance boost?
I am not much aware of the core libraries etc. of PHP. So, please shed some light.
Updates:
Thanks for the answers
#pygorex1 - The HipHop way is to optimize PHP overall. So, putting in very simple terms, if I am correct, if it was taking 1 second to run before then using HipHop it may make it 0.7 second. But in both the cases, the presence of those extra unnecessary defined functions are adding their overhead (say 0.1 second in first case and 0.07 sec in HipHop case). If so, then HipHop targets something else and does not answer my question. However, the other two points you gave say that all has to be done while compiling. So, it probably means if I compile with an extension then the function groups under that will be loaded every time . Then probably there is no further way of removing the inclusion? Some kind of everride?
#Tyler - I agree that it might be difficult to do what I am asking for but the reason is not what you are saying. It cannot be so difficult to find out the dependencies. Just applying common sense, I can say that functions like is_numeric(), is_array(), array_walk(), func_get_args() etc. are very basic ones and are probably called by many but there are easily distinguishable groups like the socket functions group containing e.g. socket_connect() which need not be included if not explicitly needed. The problem probably is that it needs to be specified while compiling, like pygorex1 has answered.
Concerning any potential performance boost - you're probably not going to notice it unless you're serving a ton of dynamic PHP pages. This road has been traveled before - take a look at HipHop, Facebook's tool to optimize PHP into C++. Utilizing byte code caches like APC and eAccelerator AND/OR rewriting your PHP code to cache intelligently with memcached will improve PHP performance far more than enabling/disabling certain PHP functions.
That having been said, there's two main ways to pare down the number of functions that PHP has available:
PHP compile-time options
Available when compiling PHP from source. One of the functions noted in the question gzopen() is part of the zlib extension and has to be enabled at compile time. There's quite a few built-in compile-time options.
PHP modules
These are loaded dynamically by PHP and are controlled by the php.ini config file under extensions - they are .dll files on Windows or .so files on Linux. A snippet from my development php.ini:
...
extension=php_bz2.dll
;extension=php_curl.dll
;extension=php_dba.dll
;extension=php_dblib.dll
extension=php_mbstring.dll
extension=php_exif.dll
extension=php_fileinfo.dll
extension=php_gd2.dll
...
There is dl() to load a PHP extension at runtime.
Example to load an extension dynamically:
if (!extension_loaded('sqlite')) {
$prefix = (PHP_SHLIB_SUFFIX === 'dll') ? 'php_' : '';
dl($prefix . 'sqlite.' . PHP_SHLIB_SUFFIX);
}
This is taken from http://php.net/manual/en/function.dl.php
The php function namespace debacle, is, well, exactly that.
No, there's no way selectively load them at run-time. Just because you don't call something, doesn't mean something you call doesn't call it.
Dont bother compiling out built-in functions. Learn about shared libraries and linux caching system. Those files(and functions) are basically always loaded and cached so it has very little impact on an application. As pygorex1 said, its better to use a good caching mechanism than crippling the PHP distribution on purpose.
#powtac: doing dl() as a way to dinamically load some libraries might acually slow down your app(depends on how many dl() you do, it might be better to have them always loaded in memory than loading them on request)
#Tyler Eaves: you may disable some function from being called actually. There's nothing preventing their loading though..
Also, hip-hop as far as i knwo actually compiles php code down to C/C++ code, and then compiles it. This has the BIG advantage of skipping the virtual machine, and php-specific upcodes and lots of overhead over a scripted language, but has the big disatvantage that its not a scripted language anymore.
Q1)
I'm designing a CMS (-who isn't!) but priority is being given to caching. Literally everything is cached. DB rows, DB id queries, Configuration data, processed data, compiled templates. Currently it has two layers of caching.
The first is a opcode cache or memory cache such as apc, eaccelerator, xcache or memcached. If an entry is not found in there it is then searched for in the secondary slow cache, ie php includes.
Are the opcode caches actually faster than doing a require_once to a php file with a var_export'd array of data in it? My tests are inconclusive as my development box (5.3 of XAMPP) keeps throwing errors installing any of the aforementioned programs.
Q2)
The CMS has numerous helper classes that are autoloaded on demand instead of loading all files. Mostly each has a require before it so no autoloading needs to take place, however this is not the question. Because a page script can have up to 50/60 helper files included I have a feeling that if the site was under pressure it would buckle because of all the i/o that this incurs. Ignore for the moment that there is output cache in place that would remove the need for what I am about to suggest, and also that opcode caches would render this moot. What I have tried to do is join all the helper files required for the scripts execution in one single file. This is achievable and works well, however it has a side effect of greatly increasing the memory usage dramatically even though technically the same code is being used.
What are your thoughts and opinions on this?
Using a compiler cache like APC should help out as it will take your helper files and cache them after they are converted to opcode. That will mean the files will not only be cached but already in opcode so they do not need to be parsed and compiled each time they are required.
Looks like you just have no idea what you want to cache (and why).
You just cannot compare "opcode cache" and "require_once". Opcode cache will cache required code as well as other code.
First, keep in mind that your operating system will cache files in memory if they are being accessed frequently enough.
Also, don't use require_once. It is significantly slower than require. If you aren't using an autoloader, you should be. There is no reason to be manually including files in a modern php application (very few exceptions).
50-60 helper files is crazy. Isn't there some way to combine these? Can't you put them all in a related helper class, like OutputHelper or CacheHelper? That way you only have to include the class, which, again, should be taken care of your autoloader. It sounds to me you're doing something like putting one function per file.
Opcode caching greatly reduces memory usage and execution speed, but I'm not sure what effect it has on require statements.
I agree with ryeguy. require_once is slower than require or include because it has to log every include and check against it. If your only doing one require/include (which you should be for classes) then you don't need require_once or include_once.
Autoloading is great for optimization. As you only will load in classes when needed. So if your app has 500 classes, but only needs 15 to run a certain page/script. Then only those 15 get loaded. Which is nice.
If you take a peak at any big framework. You will notice that they have migrated to using autoloaders. They use to use require_once at the last moment like this example from the Zend Framework Version 1.
require_once 'Zend/Db/Exception.php';
throw new Zend_Db_Exception('Adapter name must be specified in a string');
Zend Framework Version 2 is going to be using auto loaders instead. I believe this is the fastest and it's also the easiest to code for.
A number of frameworks utilize spl_autoload_register() for dynamically loading classes (i.e. controllers and models). There are a couple of posts on the issue of autoloading and opcode caching. One post in particular has a response by #cletus which references #Rasmus making a number of statements which prove to be unsavoury for those utilizing APC as an opcode cache:
Do PHP opcode cache work with __autoload?
There does not appear to be any discussion as to any possible alternatives to autoloading which do not affect opcode cache performance.
Is there a way to get around the fact autoloaded classes do not get added to the byte code cache?
If not, are there any alternative methods for dynamically loading classes which will get cached?
There seems to still be confusion about this topic, however in most cases it comes down to ease vs performance.
A good mailing list thread to read would be this one on Zend Frameworks mailing list:
http://n4.nabble.com/ZF-and-Autoloading-td640085i20.html
Now, the correlation is here because if you inherit from not-yet-defined
class, you might rely on autoload to define it (though you might also
rely on include), and actually the presence of the autoload facility may
encourage you to use such inheritance. But this is not the autoload
which brings trouble (see after Ramus' "it's not just autoload" in the
blog for some examples of troublesome things).
So the right phrase would be "people which tend to rely on autoload tend
also to use code which defies compile-time binding". Which can't be seen
as autoload fault, of course, and just avoiding autoload won't help a
bit with that - you would also have to rewrite your code so that
compile-time binding could happen. And it has nothing to do with uses of
autoload with "new", for example.
As for the slowdown from the effects described above - i.e., absence of
the compile-time binding - the code indeed becomes a bit slower and such
code can lead in some obscure cases to some trouble to opcode caches
(not in the autoload cases - but in cases where classes are defined
inside conditions, or, God forbid, different definition is created
depending on condition) - but it has next to nothing to do with using
autoload by itself.
The amount of slowdown, however, seem to be greatly exagerrated by
people - it is nothing (and I repeat to be clear - NOTHING) compared
to the performance benefit given by the opcode cache due to the absence
of the disk operations and compilation stage. You could probably
compose an artificial benchmark that would show some significant
slowdown, but I do not believe any real application would even notice.
source: http://n4.nabble.com/ZF-and-Autoloading-td640085i20.html#a640092
I'm trying to accomplish a task and turns out that the code I need is packaged as a PHP extension, which according to what I've been told means I have to have root access to install it (I'm on shared hosting so that's a bit of a problem.
I'll solve this problem later, but for now I'm trying to understand the difference between an extension, a library, and a class. Is it more of a packaging thing that could be overridden and repackaged a different way, or is there a valid architectural reasoning behind it?
Also when releasing your own code, what makes you decide to release as library vs. class vs. extension? or do you go with whichever sounds better?
thanks in advance.
P.S. If you must know which extension I'm talking about, it's Libpuzzle, but that's really beside the point, my question is more general.
An extension is a pice of code programmed in C which will be included into the PHP core when PHP starts. Normally you have some more native functions available after including a extension. For example a zip functionality.
A class is a abstract pice of PHP code which solves common tasks. For example sending emails. You can find some common classes at pear.php.net.
A library is a collection of PHP classes wich solve more generic tasks for example buliding HTML forms AND sending emails. The Zend Framework is a framework which consists of many, many PHP classes.
Normally extension features can be programmed in PHP. For example the PEAR::Compat class. Often you will find the functionality you need as a PHP class available. I'm sure the stackoverflow readers will supply you with ideas where to find a specific PHP class.
Extensions are low-level. Usually written in C/C++, and compiled into native-code shared libraries, they interact with the Zend Engine directly. It has pros and cons, main advantages being the speed and more control; and main disadvantages - they are harder to install, and require compilation (and that requires a compiler and PHP headers); it's not true they require root access though - you only need ability to use custom php.ini (or dl() function, but I see they deprecated it for some reason).
Libraries/classes are high-level and interpreted. If you don't know if you need to write extension, then you probably don't. About what classes are - read about OOP. A library is a reusable collection of code (most commonly in form of functions/classes).
Some libraries (including libpuzzle) also include a command-line tool. So if you're unable to use the PHP library due to your shared hosting environment, maybe you can compile the command-line tool. Then you can run it from PHP using something like exec. It will be slower and require more memory than a library, but it might get the job done. Of course, many hosts also have restrictions on commands like exec, so this might not work either.
zend framework has many components/services I don't need, it has many includes.
All this I think slow down application.
Do you know how to speed up it? may be remove not used(what is common) components, or combine files to one file?
APC or eAccelerator (APC will be included by default in future releases, so I'd recommend using it, even though raw speed is slightly below of eAccelerator)
Two level cache for configuration, full-page, partial views, queries, model objects:
1st level cache (in memory KV-store, APC store or memcached)
2nd level cache (persistent KV-store in files, SQLite or dedicated KV-store)
RDBMS connection pooling, if avaliable.
Before you start worrying about actively modifying things for more performance, you'll want to check the Performance Guide from the manual. One of the simplest steps you can do is to enable an opcode cache (such as APC) on your server - an Opcode cache alone can give you a 3-4x boost.
Code on disk that isn't being called, doesn't take any time. The only way to see what is slow is to measure it. That said, if you aren't running an opcode-cache such as APC, then you are wasting time.
I agree with Topbit, that you should start with code profiling.
Find what is the problem.
I don't think that the problem is just because of ZF has so many files. It uses autoloading, so only files required at the moment are loaded. You definitely shouldn't split different files contents.
For many perfomance problems, caching is your friend.
you can get a bit of extra speed by optimizing the requirements statements
as stated in the optimizing help topic ... first remove all the requirements
and i also recommend using pear naming and overwriting the autoloader,
function __autoload($class) {
require str_replace('_', '/', $class) . '.php';
}
you can find more details here
Are you being forced to use the Zend Framework? If there is no obligation to use it, then not using it would obviously be the fastest way to speed things up. There are several lightweight PHP frameworks that don't come with all the overhead and bulk of Zend. For instance, Codeigniter, Yii, Symfony, and Kohana are all excellent choices and I know at least that codenigniter and Kohana both support the use of Zend components (for instance:Using Zend with Codeigniter).
Good luck!