I hope this is not a completely stupid question. I have searched quite a bit for an answer, but I can't find (or recognise) one exactly on point.
I understand that functions in PHP are not parsed until actually run. Therefore, if I have a large class with many functions, only one of which requires a large include file, will I potentially save memory if I only include the "include file" within the function (as opposed to at the top of the class file)?
I presume that, even if this would save memory, it would only do so until such time as the function was called, after which the memory would not be released until the current script stopped running?
Many Thanks,
Rob
I love this saying: "Make it work and then, if needed, make it fast." -some good programmer?
In most cases you would probably be better off focusing on good OOP structure and application design than on speed. If your server is using something like Zend Optimizer, having all your methods in a single file won't make any difference, since it is all pre-compiled and stored in memory. (It's more complicated than this, but you get the idea.)
You can also load all your include files when Apache starts. Then all the functions are loaded in memory. You wouldn't want to do that while developing unless you like restarting Apache every time you make a code change, but when done on production servers it can make a huge difference. And if you really want to make things fast, you can write the code in C++ and load it as a module for Apache.
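If you want something along those lines without writing an Apache module, PHP's auto_prepend_file directive is one common approximation (the path below is made up): the named file is pulled into every request, and combined with an opcode cache it is compiled once and served from memory thereafter.

; php.ini -- illustrative; point this at your own bootstrap file
auto_prepend_file = /var/www/lib/bootstrap.php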
But in the end... do you really need that speed?
Yes, it will, but be sure the function doesn't depend on anything else included by the parent file. Memory consumption also depends on a couple of things, from the size of the file itself to the amount of memory its variables consume and whether garbage collection can reclaim it.
If the function is inside a class, it's called a method, and it might depend on its class to extend another class.
Just some things to consider. Always include the bare minimum.
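To illustrate the pattern the question asks about, here is a minimal sketch (the file path and class are hypothetical) that defers a heavy include until the one method that needs it actually runs:

<?php
class ReportGenerator
{
    public function generatePdf()
    {
        // Parsed and loaded only if/when this method is called,
        // not when the class file itself is included.
        require_once dirname(__FILE__) . '/lib/big_pdf_library.php';
        // ... use the library here ...
    }

    public function generateCsv()
    {
        // This code path never pays for the PDF library.
    }
}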
Don't save memory in such cases unless you really need to; save development time instead. Memory is usually cheap, but development/support time isn't. Use a PHP opcode cache like eAccelerator or APC; it will increase execution speed because all files will be pre-compiled and stored in memory.
I'm developing a web application, and I'm using a file called functions.php in which I store all the functions related to my application. Currently the file has about 1500 lines of code with over 30 functions.
I was wondering if it is a problem, could it slow down the process when calling functions? Should I make other files to move some of the functions there?
In most cases, the number of lines in the .php script isn't going to affect the speed of the program nearly as much as the code itself. If execution time is your number one concern, then optimizing your code should be your number one priority. Start with the functions that are called the most, and make sure the code there is as tight as possible.
Splitting the functions up into different files would technically make the script slower since the interpreter would have to do disk I/O to parse the files. But the speed hit would be infinitesimal, so I'd argue that splitting them up might save time in the long run since it'll be easier to debug and optimize if you're not always staring down a huge file with 30 functions in it.
Finally, if you do have to use a bunch of huge .php files in your app, you might want to look into using something like Zend that will compile your scripts on the server.
If you only use one or two functions per page, would it be faster to split them up into separate files and only include the ones you need? Yes, because PHP needs to read through the whole file to include it. Will it make a noticeable enough difference that it's worth splitting the file? In most cases the answer's probably no.
To give some context:
I had a discussion with a colleague recently about the use of Autoloaders in PHP. I was arguing in favour of them, him against.
My point of view is that Autoloaders can help you minimise manual source dependency which in turn can help you reduce the amount of memory consumed when including lots of large files that you may not need.
His response was that including files that you do not need is not a big problem because after a file has been included once it is kept in memory by the Apache child process and this portion of memory will be available for subsequent requests. He argues that you should not be concerned about the amount of included files because soon enough they will all be loaded into memory and used on-demand from memory. Therefore memory is less of an issue and the overhead of trying to find the file you need on the filesystem is much more of a concern.
He's a smart guy and tends to know what he's talking about. However, I always thought that the memory used by Apache and PHP was specific to that particular request being handled.
Each request is assigned an amount of memory equal to memory_limit PHP option and any source compilation and processing is only valid for the life of the request.
Even with op-code caches such as APC, I thought that each individual request still needs to load each file into its own portion of memory, and that APC is just a shortcut to having it pre-compiled for the responding process.
I've been searching for some documentation on this but haven't managed to find anything so far. I would really appreciate it if someone can point me to any useful documentation on this topic.
UPDATE:
Just to clarify, the autoloader discussion part was more of a context :).
It may not have been clear but my main question is about whether Apache will pool together its resources to respond to multiple requests (especially memory used by included files), or whether each request will need to retrieve the code required to satisfy the execution path in isolation from other requests handled from the same process.
e.g.:
Files 1, 2, 3 and 4 are 100KB each.
Request A includes file 1, 2 and 3.
Request B includes file 1, 2, 3 and 4.
In his mind, Request A will consume 300KB for the entirety of its execution, and Request B will only consume a further 100KB because files 1, 2 and 3 are already in memory.
In my mind it's 300KB and 400KB, because they are both being processed independently (even if by the same process).
This brings him back to his argument that "just include the lot 'cos you'll use it anyway" as opposed to my "only include what you need to keep the request size down".
This is fairly fundamental to how I approach building a PHP website, so I would be keen to know if I'm off the mark here.
I've also always been of the belief that for a large-scale website, memory is the most precious resource and more of a concern than the file-system checks for an autoloader, which are probably cached by the kernel anyway.
You're right though, it's time to benchmark!
Here's how you win arguments: run a realistic benchmark, and be on the right side of the numbers.
I've had this same discussion, so I tried an experiment. Using APC, I tried a Kohana app with a single monolithic include (containing all of Kohana) as well as with the standard autoloader. The final result was that the single include was faster by a statistically irrelevant margin (less than 1%) but used slightly more memory (according to PHP's memory functions). Running the test without APC (or XCache, etc.) is pointless, so I didn't bother.
So my conclusion was to continue using autoloading because it's much simpler to use. Try the same thing with your app and show your friend the results.
Now you don't need to guess.
Disclaimer: I wasn't using Apache. I cannot emphasize enough to run your own benchmarks on your own hardware on your own app. Don't trust that my experience will be yours.
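If you want a starting point for such a run, something like this (file names are placeholders, and a real run should average over many requests):

<?php
$start = microtime(true);

require 'all_in_one.php';       // monolithic include...
// spl_autoload_register(...); // ...or register your autoloader instead

// ... exercise a typical request's code path here ...

printf("time: %.4f s, peak memory: %d bytes\n",
    microtime(true) - $start,
    memory_get_peak_usage(true));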
You are the wiser ninja, grasshopper.
Autoloaders don't load the class file until the class is requested. This means they will use at most the same amount of memory as manual includes, and usually much less.
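A minimal sketch of that behaviour (the directory layout and class name are assumptions):

<?php
spl_autoload_register(function ($class) {
    $file = dirname(__FILE__) . '/classes/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
});

$helper = new CacheHelper(); // classes/CacheHelper.php is read only now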
Classes get read fresh from file on each request, even if an Apache process can handle multiple requests, so your friend's "eventually they're all read into memory" argument doesn't hold water.
You can prove this by putting an echo 'foo'; above the class definition in the class file. You'll see that on each new request the line is executed, regardless of whether you autoload or manually include the whole world of class files at the start.
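For example (hypothetical class file):

<?php
// MyClass.php
echo "MyClass.php was parsed\n"; // side effect above the class definition

class MyClass
{
}

Hit the page twice: the message appears on every request, whether the file arrives via an autoloader or a manual require. An opcode cache skips re-compiling the file, but the top-level echo still executes on each inclusion.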
I couldn't find any good, concise documentation on this (I may write some with memory usage examples), as I've also had to explain this to others and show evidence to get it to sink in. I think the folks at Zend didn't expect anyone to miss the benefits of autoloading.
Yes, APC and such (like all caching solutions) can overcome the resource negatives and even eke out small gains in performance, but you eat up lots of unneeded memory if you do this with a non-trivial number of libraries while serving a large number of clients. Try loading a healthy chunk of the PEAR libraries in one massive include file while handling 500 connections hitting your page at the same time.
Even using things like APC, you benefit from using autoloaders with any non-namespaced classes (most existing PHP code today), as they can help avoid global namespace pollution when dealing with large numbers of class libraries.
This is my opinion.
I think autoloaders are a very bad idea, for the following reasons:
I like to know what and where my scripts are grabbing data/code from. It makes debugging easier.
There are also configuration problems: if one of your developers changes a file (an upgrade, etc.) or the configuration and things stop working, it is harder to find out where it is broken.
I also think that it is lazy programming.
As to memory/performance issues, it is just as cheap to buy some more memory for the computer if it is struggling.
From my point of view, both PHP and Java have a similar structure. First you write some high-level code, which must then be translated into a simpler code format to be executed by a VM. One difference is that PHP works directly from the source code files, while Java stores the bytecode in .class files, from which the VM can load it.
Nowadays the requirements for speedy PHP execution keep growing, which leads people to believe that it would be better to work directly with the opcodes rather than go through the compilation step each time a user hits a file.
The solution seems to be a load of so-called accelerators, which basically store the compiled result in a cache and then use the cached opcodes instead of compiling again.
Another approach, done by Facebook, is to completely compile the PHP code to a different language.
So my question is, why is nobody in the PHP world doing what Java does? Are there some dynamic elements that really need to be recompiled each time or something like that? Otherwise it would be really smarter to compile everything when the code goes into production and then just work with that.
The most important difference is that the JVM has an explicit specification that covers the bytecode completely. That makes bytecode files portable and useful for more than just execution by a specific JVM implementation.
PHP doesn't even have a language specification. PHP opcodes are an implementation detail of a specific PHP engine, so you can't really do anything interesting with them and there's little point in making them more visible.
PHP opcodes are not the same as Java classfiles. Java classfiles are well specified, and are portable between machines. PHP opcodes are not portable in any way. They have memory addresses baked into them, for example. They are strictly an implementation detail of the PHP interpreter, and shouldn't be considered anything like Java bytecode.
Does it have to be this way? No, probably not. But the PHP source code is a mess, and there is neither the desire, nor the political will in the PHP internals community to make this happen. I think there was talk of baking an opcode cache into PHP 6, but PHP 6 died, and I don't know the status of that idea.
Reference: I wrote phc so I was pretty knee deep in PHP implementation/compilation for a few years.
It's not quite true that nobody in the PHP world is doing what Java does. Projects such as Alexey Zakhlestin's appserver provide a degree of persistence more akin to a Java servlet container (though his inspiration is more Ruby's Rack and Python's WSGI than Java).
PHP does not use a standard mechanism for opcodes. I wish it had stuck with either a stack VM (Python, Java) or a register VM (x86, Perl 6, etc.), but it uses something absolutely homegrown, and therein lies the rub.
It uses a linked structure in memory, which results in each opcode having an ->op1, ->op2 and ->result. Each of those is either a constant or an entry in a temp table, etc. These pointers cannot be serialized in any sane fashion.
Now, people have accomplished this using tools like pecl/bcompiler, which does dump the opcode stream to disk.
But classes make this even more complicated, which means there are potential code fragments like:
if (<condition>)
{
    class XYZ { }
}
else
{
    class XYZ { }
}

class ABC extends XYZ { }
This means that a large number of decisions about classes & functions can only be made at runtime; something like Java would choke on two classes with the same name defined conditionally at runtime. Basically, APC's inheritance & class caching code is perhaps the most complicated & bug-prone part of the codebase. Whenever a class is cached, all inherited parent members have to be scrubbed out before it can be saved to the opcode cache.
The pointer problem is not insurmountable. There is an apc_bindump which I have never bothered to fix up to load entire cache entries off disk directly whenever a restart is done. But it's painful to debug all of that to get something that still needs to locate all the system pointers. The Apache case is too easy, because all PHP processes have the same system pointers thanks to the fork behaviour. The old FastCGI versions were slower because they used to fork first & init PHP later; php-fpm fixed that by doing it the other way around.
But eventually, what's really missing in PHP is the will to invent a bytecode format, throw away the current engine & all modules, and rewrite it using a stack VM & build a JIT. I wish I had the time. The fb guys are almost there with their HipHop HHVM, which sacrifices eval() for faster performance, which is a fair sacrifice :)
PS: I'm the guy who can't find time to update APC for 5.4 properly
I think all of you are misinformed. HHVM is not a compiler to another language; it is a virtual machine itself. The confusion arises because Facebook used to compile PHP to C++, but that approach was too slow for the developers' requirements (ten minutes of compiling just to test some tiny change).
Q1)
I'm designing a CMS (-who isn't!) but priority is being given to caching. Literally everything is cached. DB rows, DB id queries, Configuration data, processed data, compiled templates. Currently it has two layers of caching.
The first is an opcode cache or memory cache such as APC, eAccelerator, XCache or memcached. If an entry is not found there, it is then searched for in the secondary slow cache, i.e. PHP includes.
Are the opcode caches actually faster than doing a require_once on a PHP file with a var_export'd array of data in it? My tests are inconclusive, as my development box (XAMPP with PHP 5.3) keeps throwing errors installing any of the aforementioned programs.
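For reference, the var_export/require pattern I'm testing looks roughly like this (simplified, with made-up paths and data):

<?php
$data = array('title' => 'Home', 'ttl' => 3600);

// Write the cache entry as an includable PHP file.
file_put_contents('/tmp/cache_entry.php',
    '<?php return ' . var_export($data, true) . ';');

// Later, possibly on another request; with an opcode cache the
// compiled file comes straight from memory, skipping the re-parse.
$cached = require '/tmp/cache_entry.php';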
Q2)
The CMS has numerous helper classes that are autoloaded on demand instead of all files being loaded up front. Mostly each has a require before it so no autoloading needs to take place; however, this is not the question. Because a page script can have up to 50/60 helper files included, I have a feeling that if the site were under pressure it would buckle because of all the I/O this incurs. Ignore for the moment that there is an output cache in place that would remove the need for what I am about to suggest, and also that opcode caches would render this moot. What I have tried to do is join all the helper files required for a script's execution into one single file. This is achievable and works well; however, it has the side effect of dramatically increasing memory usage even though technically the same code is being used.
What are your thoughts and opinions on this?
Using a compiler cache like APC should help, as it will take your helper files and cache them once they have been converted to opcodes. That means the files will not only be cached but already in opcode form, so they don't need to be parsed and compiled each time they are required.
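A typical starting configuration, if you go that route (the directive names are real APC settings; the values are guesses you should tune for your app):

; php.ini
extension=apc.so
apc.enabled=1
apc.shm_size=64M  ; shared memory for the cache (older APC builds take a bare MB number)
apc.stat=1        ; set to 0 in production to skip per-request file mtime checks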
Looks like you just have no idea what you want to cache (and why).
You just cannot compare an "opcode cache" and "require_once". An opcode cache will cache the required code just like any other code.
First, keep in mind that your operating system will cache files in memory if they are being accessed frequently enough.
Also, don't use require_once; it is significantly slower than require. If you aren't using an autoloader, you should be. There is no reason to be manually including files in a modern PHP application (with very few exceptions).
50-60 helper files is crazy. Isn't there some way to combine these? Can't you put them all in related helper classes, like OutputHelper or CacheHelper? That way you only have to include the class, which, again, should be taken care of by your autoloader. It sounds to me like you're doing something like putting one function per file.
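Something along these lines (the class and method names are invented for illustration):

<?php
class OutputHelper
{
    public static function escape($value)
    {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }

    public static function truncate($text, $limit = 80)
    {
        return strlen($text) > $limit
            ? substr($text, 0, $limit) . '...'
            : $text;
    }
}

// One autoloadable class instead of two single-function files:
echo OutputHelper::escape('<b>hi</b>');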
Opcode caching greatly reduces memory usage and execution time, but I'm not sure what effect it has on require statements.
I agree with ryeguy. require_once is slower than require or include because it has to log every include and check against that log. If you're only doing one require/include (which you should be for classes), then you don't need require_once or include_once.
Autoloading is great for optimization, as you only load classes when they're needed. So if your app has 500 classes but only needs 15 to run a certain page/script, then only those 15 get loaded. Which is nice.
If you take a peek at any big framework, you will notice that they have migrated to using autoloaders. They used to require_once at the last moment, like this example from Zend Framework version 1:
require_once 'Zend/Db/Exception.php';
throw new Zend_Db_Exception('Adapter name must be specified in a string');
Zend Framework version 2 is going to be using autoloaders instead. I believe this is the fastest approach, and it's also the easiest to code for.
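For instance, with a PSR-0-style autoloader registered (sketch below; it assumes the library root is on the include_path), the explicit require_once disappears entirely:

<?php
spl_autoload_register(function ($class) {
    // Zend_Db_Exception => Zend/Db/Exception.php
    require str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
});

throw new Zend_Db_Exception('Adapter name must be specified in a string');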
In my index.php file I always load some classes that are used later. The profiler shows this can sometimes take about 20% of the entire run time. Is there any improvement that can make this process faster?
I would try to make this list of classes shorter, but the app is very big and checking all the dependencies would be costly.
Op-code caches such as APC and eAccelerator store a compiled version of your scripts in a cache. This dramatically reduces memory usage and loading time for frequently used static scripts.
While using an opcode cache (such as APC) will reduce the impact of loading/parsing/compiling the classes, you'll still be loading them all on every page load and doing whatever initialization accompanies a require_once() call. If you set up an autoload function, the classes won't be loaded until your code actually needs them. There's a little overhead involved in using a class autoloader, but it makes the code easier to maintain.
As always, YMMV, so benchmark your application to see if it's worthwhile in your case.
You might want to look at APC: php.net/apc