When is a PHP include file parsed? At startup, or during execution?
My web forms call a single PHP script. Depending on the arguments passed in the URL, a switch/case condition determines what the script will do. Each "case" within the switch has its own include files.
If include files are parsed during initial load, then my PHP script will take more memory and time to process, which leads me to believe that having individual PHP files called from my web form is better than having one which includes what it needs.
If include files are parsed only when needed (that is, the include is performed when execution reaches a specific case statement), then my code will be reasonably conservative on memory.
So.... my question... When is a PHP include file parsed? At initial load, or during execution?
(note... I failed to find the answer here, and I have read http://php.net/manual/en/function.include.php)
Files are included if and when the include statement is reached at runtime. To very succinctly summarise what that means, the following file is never going to be included:
if (false) {
    include 'foo.php';
}
Since you're concerned about memory usage from too many includes, I feel that a bit more detail will be useful over and above a direct answer to your question.
Firstly, to directly answer you, PHP files are parsed as soon as they are loaded -- if a file contains a syntax error, you will be told of that immediately; it won't wait till it gets to that line of code. However subsequent files are only included if the specific line of code containing the include statement is executed.
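As a rough illustration of the behaviour described above, here is a minimal sketch. The temp files, the $action variable and the file names are all made up for the demo; $action stands in for your URL argument. One of the files contains a deliberate syntax error, yet nothing complains, because the branch that includes it is never executed.

```php
<?php
// Hypothetical temp files, created only for this demonstration.
$dir = sys_get_temp_dir() . '/include_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/used.php", "<?php return 'used';");
// Deliberate syntax error: PHP only reports it if this file is actually included.
file_put_contents("$dir/broken.php", "<?php this is not valid PHP");

$action = 'used'; // stands in for the URL argument

switch ($action) {
    case 'used':
        $result = include "$dir/used.php";   // reached, so read and parsed now
        break;
    case 'broken':
        $result = include "$dir/broken.php"; // not reached, so never parsed
        break;
}

// List what was really loaded during this run.
$included = array_map('basename', get_included_files());
echo $result, PHP_EOL;

// Clean up the demonstration files.
unlink("$dir/used.php");
unlink("$dir/broken.php");
rmdir($dir);
```

get_included_files() is a handy way to check, on a real page, which files a given request actually pulled in.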
You're concerned about memory usage, but having a lot of included files is generally not a major memory issue, nor a major performance issue. Indeed, most modern PHP applications of any size will use a framework library that loads hundreds of PHP files for every page load. Memory and performance issues are far more likely to be caused by bugs within your code than by simply loading too much code.
If you are concerned about memory and performance from this, you should consider using PHP's OpCache feature. With this feature enabled, PHP stores a cache in memory of the compiled state of all the files it has included within a system. When it runs the page again, therefore, it does not need to actually load or parse anything when it encounters an include statement; it simply fetches it from the cache.
Using OpCache you can write your code with a very large number of files to include, and without any performance penalty at all.
The good news is that OpCache has shipped with PHP since version 5.5 (whether it is actually switched on depends on how your PHP was built and configured), and it is completely transparent to the developer -- you don't even need to know that it's there; the only difference you'll see between it being turned on and off is your site running faster.
So, firstly, make sure your PHP version is up to date (v5.5 or higher). Then make sure OpCache is enabled in your php.ini file. Then just sit back and stop worrying about these kinds of things.
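If you are not sure whether OpCache is active on your setup, a quick check along these lines should tell you. This is only a sketch: it looks at the opcache.enable directive for web requests and at opcache.enable_cli when run from the command line, and the variable names are mine.

```php
<?php
// Is the OPcache extension loaded at all?
$loaded = extension_loaded('Zend OPcache');

// opcache.enable covers web requests; the command line has its own switch.
$directive = PHP_SAPI === 'cli' ? 'opcache.enable_cli' : 'opcache.enable';
$enabled = $loaded && filter_var(ini_get($directive), FILTER_VALIDATE_BOOLEAN);

echo 'OPcache loaded:  ', $loaded ? 'yes' : 'no', PHP_EOL;
echo 'OPcache enabled: ', $enabled ? 'yes' : 'no', " ($directive)", PHP_EOL;
```

Run it through the web server rather than the CLI if you want to know what your pages see, since the two often use different php.ini files.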
Files included with an include statement are parsed during execution. When your PHP code hits an include statement, it starts parsing the file to see what is in there.
From w3schools
The include (or require) statement takes all the text/code/markup that
exists in the specified file and copies it into the file that uses the
include statement.
There are other questions on a similar topic:
In PHP, how does include() exactly work?
Related
I was always sure that the PHP functions file_get_contents and readfile execute any PHP code in any files - regardless of file type - that are given to it. I tried this on multiple setups, and it always worked.
I received a question regarding this here, and the user seems to think that this is not the case.
I looked at the PHP documentation for the functions, and they do not mention code execution (which is something that I would expect if this is normally the case, as it has serious security implications).
I also searched for it, and found a lot of claims that the functions do not execute PHP code. For example:
readfile does not execute the code on your server so there is no issue there. source
Searching for "php file_get_contents code execution" also returns various questions trying to execute the retrieved PHP code, which seems odd if it would indeed normally execute any given PHP code.
I also found one question that asks about not executing PHP code, so execution does seem to happen to others as well.
So my questions are:
do the functions file_get_contents and readfile execute PHP code in retrieved files?
does this depend on some php.ini setting? If so, what setting(s)?
does it depend on the PHP version, and if so, what versions are affected?
if it is not normally the case, what may be the reasons that they execute the PHP code in my setups?
file_get_contents and readfile do not execute code. All they do is return the raw contents of the file. That could be text, PHP code, binary (e.g. image files), or anything else. No interpretation of the files' contents is happening at all.
The only situations in which it may appear as if execution is happening are:
<?php ?> tags will likely be hidden by the browser because it's trying to interpret them as HTML tags, so this may lead to the impression that the PHP disappeared and hence may have been executed.
You're reading from a source which executes the code, e.g. when reading from http://example.com/foo.php. In this case the functions have the same effect as visiting those URLs in a web browser: the serving web server is executing the PHP code and returning the result, but file_get_contents merely gets that result and returns it.
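To see the difference for yourself, here is a small sketch that reads the same local file once with file_get_contents and once with include. The temp file and variable names are just for the demo.

```php
<?php
// Hypothetical temp file containing a tiny PHP script.
$file = tempnam(sys_get_temp_dir(), 'fgc');
file_put_contents($file, '<?php echo 1 + 1;');

// file_get_contents returns the raw source text; nothing is executed.
$raw = file_get_contents($file);

// Escaping the text makes the PHP tags visible in a browser instead of
// being swallowed as an unknown HTML tag.
$escaped = htmlspecialchars($raw);

// include, by contrast, really runs the code; capture what it prints.
ob_start();
include $file;
$executed = ob_get_clean();

unlink($file);

echo "raw:      $raw", PHP_EOL;
echo "escaped:  $escaped", PHP_EOL;
echo "executed: $executed", PHP_EOL;
```

If your own test pages seemed to "execute" the code, viewing the page source in the browser (or escaping the output as above) should show the PHP tags still sitting there.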
Those functions are described in the «Function Reference / File System Related Extensions / Filesystem» section of the manual, while functions that execute code are described at «Function Reference / Process Control Extensions».
I'm pretty sure the misunderstanding comes from a somewhat widespread confusion between the file system and the network, made worse by the PHP streams feature, which provides protocol wrappers that let you use the same functions to transparently open any kind of resource: local files, network resources, compressed archives, etc. I see endless posts here where someone does something like this:
file_get_contents('http://example.com/inc/database.inc.php');
... and wonders why they cannot see the database connection. And the answer is clear: you are not loading a file, you're fetching a URL. As a result, the code inside database.inc.php does get executed... but by the remote web server, not by your script.
I would like to understand how the PHP compilation process works.
Assuming I have a file called funcs.php and this file has three functions, if I include or require it, will all the three functions be compiled during the file load? Or will the source code be read and kept in memory, until I call them and this call will trigger the compilation process?
Thanks,
Yes, all three functions will be read in and prepared for execution and their names will be saved into a table and from then on be reserved. So, syntax errors will also appear if you don't execute the function.
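A quick way to convince yourself, as a sketch: the file contents and the demo_* function names below are invented for the example, and the ParseError catch assumes PHP 7 or later. All three functions exist right after the require even though none was called, and a syntax error inside a function that is never called still surfaces at include time.

```php
<?php
// Hypothetical funcs.php with three functions, written to a temp file.
$file = sys_get_temp_dir() . '/funcs_' . uniqid() . '.php';
$source = <<<'PHP'
<?php
function demo_one() { return 1; }
function demo_two() { return 2; }
function demo_three() { return 3; }
PHP;
file_put_contents($file, $source);

require $file;

// None of the functions has been called, yet all of them are registered.
$defined = array_filter(['demo_one', 'demo_two', 'demo_three'], 'function_exists');

// A file whose only function contains a syntax error and is never called.
$brokenFile = sys_get_temp_dir() . '/broken_' . uniqid() . '.php';
file_put_contents($brokenFile, "<?php function demo_never_called() { return 1 +; }");

$parseErrorSeen = false;
try {
    include $brokenFile; // compilation fails here, at include time
} catch (ParseError $e) {
    $parseErrorSeen = true;
}

unlink($file);
unlink($brokenFile);

printf("%d functions defined, parse error seen: %s\n",
    count($defined), $parseErrorSeen ? 'yes' : 'no');
```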
This process doesn't really consume much time, but you should try to reduce the amount of code and remove unused stuff. Mainly because it could cause problems after a major PHP upgrade.
Each page on my website is rendered using PHP.
Each PHP file uses around 10 includes. So for every page that is displayed, the server needs to fetch 10 files, in addition to the rest of its functions (MySQL, etc).
Should I combine them into a single include file? Will that make ANY difference to the real-world speed? It's not a trivial task as there would be a spaghetti of variable scope to sort out.
Include files are processed on the server, so they're not "fetched" by the browser. The performance difference of using includes vs. copy and pasting the code or consolidating files is so negligible (and I'm guessing we're talking about in the 10 ms to 100 ms range, at the absolute most), that it isn't at all worth it.
Feel free to include and require to your heart's content. Clean code is substantially more important than shaving less than 100 ms off a page load. If you're building something where timing is that critical, you shouldn't be using PHP anyway.
What takes time is figuring out where the files are actually located in the include path. If you have multiple locations in your include path, PHP will search each location until it either finds the file or fails (in which case include raises a warning and require a fatal error). That's why the location where most of the included files are found should come first in the include path.
If you use absolute paths in your include path, PHP will cache the path in the realpath cache, but note that this gets stale very quickly. So yes, including ten files is potentially slower than including one large file, simply because PHP has to check the include path more often. However, unless your webserver is a really weak machine, ten files is not enough to make an impact. This only gets interesting when you include hundreds of files or have many locations to search, in which case you should use an opcode cache anyway.
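Here is a small sketch of that lookup. The two directories and helper.php are made up for the demo; the file only exists in the second location, so a relative include has to try the first location before it finds it, while an absolute path skips the search entirely.

```php
<?php
// Hypothetical layout: two include-path locations, file lives in the second.
$base = sys_get_temp_dir() . '/path_demo_' . uniqid();
$first = "$base/first";
$second = "$base/second";
mkdir($first, 0777, true);
mkdir($second, 0777, true);
file_put_contents("$second/helper.php", "<?php return 'helper loaded';");

$previous = set_include_path($first . PATH_SEPARATOR . $second);

// Relative name: PHP walks the include path, location by location.
$resolved = stream_resolve_include_path('helper.php');
$resolvedDir = basename(dirname($resolved));
$viaSearch = include 'helper.php';

// Absolute name: no include-path search is needed.
$viaAbsolute = include "$second/helper.php";

// Restore the original include path and clean up.
set_include_path($previous);
unlink("$second/helper.php");
rmdir($first);
rmdir($second);
rmdir($base);

echo "helper.php was found in: $resolvedDir", PHP_EOL;
```

In your own code, building paths from __DIR__ gives you absolute paths without hard-coding the server's directory structure.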
Also note that when including files, it is not good practice to include each and every file right at the beginning, because you might be including files that are never called by your application for a specific request.
Reference
http://de2.php.net/manual/en/ini.core.php#ini.include-path
http://de2.php.net/manual/en/ini.core.php#ini.sect.performance
http://en.wikipedia.org/wiki/List_of_PHP_accelerators
Although disk I/O operations are among the biggest performance-eaters, a regular site won't notice any reasonable number of includes.
Before you hit any problems with includes, you would probably already have an opcode cache, which eliminates this problem too.
include and require only open files on the server side, but that might be time-consuming depending on the hardware, filesystem, etc.
Anyway, if you can, use an autoloader. Only the needed files will be loaded that way.
Then, if you think included files are a source of slowdown (and I think there are a lot of other places to look for improvement first), you can try to automatically merge the files. You still have one file per class when developing, but you can build a file that contains each class's definition to have only one include (something like cat <all your included file>.php > to_include.php).
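One caveat with a plain cat: a second opening PHP tag in the middle of PHP code is a syntax error, so the tags have to be stripped while merging. A naive sketch of such a build step follows. merge_php_files, the MergeDemo classes and the temp files are all hypothetical, and it assumes simple class files without namespaces or declare() statements.

```php
<?php
// Hypothetical helper: naively merges PHP source files into one string.
// Assumes plain class files without namespaces or declare() statements.
function merge_php_files(array $files)
{
    $merged = "<?php\n";
    foreach ($files as $file) {
        $code = file_get_contents($file);
        // Strip the opening tag, and a trailing closing tag if there is one,
        // so the merged file contains a single opening tag only.
        $code = preg_replace('/^\s*<\?php\s*/', '', $code);
        $code = preg_replace('/\?>\s*$/', '', $code);
        $merged .= '// ---- ' . basename($file) . " ----\n" . rtrim($code) . "\n\n";
    }
    return $merged;
}

// Demonstration with two made-up class files.
$dir = sys_get_temp_dir() . '/merge_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.php", "<?php\nclass MergeDemoA {}\n");
file_put_contents("$dir/b.php", "<?php\nclass MergeDemoB {}\n?>\n");

$merged = merge_php_files(["$dir/a.php", "$dir/b.php"]);
file_put_contents("$dir/to_include.php", $merged);

require "$dir/to_include.php"; // one include defines both classes

unlink("$dir/a.php");
unlink("$dir/b.php");
unlink("$dir/to_include.php");
rmdir($dir);
```

With an opcode cache in place, the gain from merging is likely to be small, so measure before committing to a build step like this.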
Why is it a good practice to remove PHP files from the htdocs/public directory?
They are being parsed anyway, right?
If PHP files are at some point not parsed due to a configuration error or, say, a failing interpreter, there is no danger of the source code (and possibly passwords) being revealed to the world as clear text.
Also, human mistakes like renaming a .php file to .php.bak are less dangerous that way.
I had this once, years ago, when a colleague, from the Perl world and totally ignorant about PHP, decided to set "short_open_tag" to "off" on a server we shared, because short open tags messed with some XML experiment he had going (<?xml version="1.0"?>). That was fun! :)
and a second thing:
Calling includes out of context
Having includes (i.e. pieces of PHP code that are included elsewhere) under the web root makes you potentially vulnerable to people calling those includes directly, out of context, possibly bypassing security checks and initializations.
If you can't/won't avoid having PHP code reside in the web root, at least be sure to start each file by checking whether it is running in the correct context.
Set this in your main script(s):
define ("RUNNING_IN_SCRIPT", true);
and add this to the 1st line of each include:
if (!defined("RUNNING_IN_SCRIPT")) die ("This file cannot be called directly.");
Yes, they are parsed. However, that is completely dependent on you or the server admin not screwing up the config files.
All it takes is a quick typo in the Apache config before Apache forgets to parse the PHP (I've had this happen). Since Apache won't know what to do with a PHP file after that, your source code just gets output as plain text, and can be immediately copied. Heck, it's even cached in the user's browser, so a malicious user can quickly copy all your code and browse it later at their convenience, looking for security holes.
You don't want your source to be visible even for a second. If you have no code files in the htdocs directory, this can't happen. They can easily be included into your code from outside the directory however.
Many MVC frameworks use this method of sandboxing for just this purpose.
The more executable PHP files you have, the more security risks you also have:
What if there is a problem in your configuration (it happens!), and the source code of your PHP file containing your database credentials is sent to the browser?
What if there is some "bad" thing left in one of those files that you didn't think about and no one ever tested?
The fewer executable PHP files you have... well, that's a couple of potential problems you don't have to care about.
That's why it's often considered best to:
put under the document root only the PHP files that have to be called via Apache (like index.php, for instance),
and put outside of the document root the PHP files that are not accessed directly, but only included by the first ones (ie, libraries / frameworks, for instance).
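A minimal sketch of that layout, built in a temp directory so it can be run as-is. The public/ and src/ directory names, config.php and the dummy password are assumptions for the demo; in a real setup only public/ would be the document root.

```php
<?php
// Hypothetical project layout:
//   <root>/public/index.php   <- the only file the web server can reach
//   <root>/src/config.php     <- outside the document root
$root = sys_get_temp_dir() . '/layout_demo_' . uniqid();
mkdir("$root/public", 0777, true);
mkdir("$root/src");

// Sensitive file, kept where no URL maps to it.
file_put_contents("$root/src/config.php",
    "<?php return ['db_password' => 'secret'];");

// Entry point: reaches out of the document root with a path based on __DIR__.
file_put_contents("$root/public/index.php",
    "<?php return require __DIR__ . '/../src/config.php';");

// Simulate the web server running the entry point.
$config = require "$root/public/index.php";

unlink("$root/public/index.php");
unlink("$root/src/config.php");
rmdir("$root/public");
rmdir("$root/src");
rmdir($root);

echo 'config keys: ', implode(', ', array_keys($config)), PHP_EOL;
```

Even if the server stopped parsing PHP, the worst that could leak here is the one-line index.php, not the credentials.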
The question might prompt some people to say a definitive YES or NO almost immediately, but please read on...
I have a simple website where there are 30 php pages (each has some php server side code + HTML/CSS etc...). No complicated hierarchy, nothing. Just 30 pages.
I also have a set of purely back-end php files - the ones that have code for saving stuff to database, doing authentication, sending emails, processing orders and the like. These will be reused by those 30 content-pages.
I have a master php file to which I send a parameter. This specifies which one of those 30 files is needed and it includes the appropriate content-page. But each one of those may require a variable number of back-end files to be included. For example one content page may require nothing from back-end, while another might need the database code, while something else might need the emailer, database and the authentication code etc...
I guess whatever back-end page is required, can be included in the appropriate content page, but one small change in the path and I have to edit tens of files. It will be too cumbersome to check which content page is requested (switch-case type of thing) and include the appropriate back-end files, in the master php file. Again, I have to make many changes if a single path changes.
Being lazy, I included ALL back-end files in the master file so that no content page can request something that is not included.
First question - is this good practice, if it is done by anyone at all?
Second, will there be a performance problem or any kind of problem due to me including all the back-end files regardless of whether they are needed?
EDIT
The website gets anywhere between 3000 - 4000 visits a day.
You should benchmark. Time the execution of the same page with different includes. But I guess it won't make much difference with 30 files.
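A rough sketch of such a timing test: it generates 30 trivial files in a temp directory (the file names and contents are made up) and measures how long it takes to include them all. Swap in your real back-end files to get numbers that mean something for your site.

```php
<?php
// Generate 30 tiny hypothetical include files.
$dir = sys_get_temp_dir() . '/bench_demo_' . uniqid();
mkdir($dir);
$count = 30;
for ($i = 0; $i < $count; $i++) {
    file_put_contents("$dir/part$i.php", "<?php return $i;");
}

// Time how long it takes to include every one of them.
$start = microtime(true);
$sum = 0;
for ($i = 0; $i < $count; $i++) {
    $sum += (include "$dir/part$i.php");
}
$elapsedMs = (microtime(true) - $start) * 1000;

printf("Included %d files in %.3f ms\n", $count, $elapsedMs);

// Clean up.
for ($i = 0; $i < $count; $i++) {
    unlink("$dir/part$i.php");
}
rmdir($dir);
```

Run it several times and through the web server, since the first run pays for cold disk reads and an opcode cache changes the picture on later runs.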
But you can save yourself the time and just enable APC in the php.ini (it is a PECL extension, so you need to install it). It will cache the parsed content of your files, which will speed things up significantly.
BTW: There is nothing wrong with laziness, it's even a virtue ;)
If your site is object-oriented I'd recommend using auto-loading (http://php.net/manual/en/language.oop5.autoload.php).
This uses a magic function (__autoload, which was deprecated in PHP 7.2 and removed in PHP 8.0 in favour of spl_autoload_register) to look for a class when needed (it's lazy, just like you!), so if a particular page doesn't need all the classes, it doesn't have to get them!
Again, though, this depends on if it is object-oriented or not...
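For what it's worth, a minimal autoloader sketch using spl_autoload_register. The AutoloadDemoMailer class, the temp directory and the one-class-per-file naming scheme are assumptions for the example; real projects usually follow a convention such as PSR-4 and let Composer generate the autoloader.

```php
<?php
// Hypothetical class directory with one class per file.
$classDir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($classDir);
file_put_contents(
    "$classDir/AutoloadDemoMailer.php",
    "<?php class AutoloadDemoMailer { public function name() { return 'mailer'; } }"
);

// The callback only runs when an unknown class is first used.
spl_autoload_register(function ($class) use ($classDir) {
    $file = "$classDir/$class.php";
    if (is_file($file)) {
        require $file;
    }
});

// Passing false as the second argument checks without triggering autoloading.
$loadedBefore = class_exists('AutoloadDemoMailer', false);

$mailer = new AutoloadDemoMailer(); // first use: the file is loaded now

$loadedAfter = class_exists('AutoloadDemoMailer', false);

unlink("$classDir/AutoloadDemoMailer.php");
rmdir($classDir);

echo $mailer->name(), PHP_EOL;
```

Pages that never touch the class never load its file, which is the lazy behaviour described above.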
It will slow down your site, though probably not by a noticeable amount. It doesn't seem like a healthy way to organize your application, though; I'd rethink it. Try to separate the application logic (e.g. most of the server-side code) from the presentation layer (e.g. the HTML/CSS).
It's not bad practice if the files are small and contain just definitions and settings.
If they actually run code, or are extremely large, it will cause a performance issue.
Now - if your site has 3 visitors an hour - who cares? If you have 30000... that's another issue, and you need to work harder to minimize that.
You can mitigate some of the disadvantages of PHP code-compiling by using XCache. This PHP module caches the PHP opcode, which reduces compile time and improves performance.
Considering the size of your website; if you haven't noticed a slowdown, why try to fix it?
When it comes to larger sites, the first thing you should do is install APC. Even though your current method of including files might not benefit as much from APC as it could, APC will still do an amazing job speeding stuff up.
If response-speed is still problematic, you should consider including all your files. APC will keep a cached version of your sourcefiles in memory, but can only do this well if there are no conditional includes.
Only when your PHP application is at a size where memory exhaustion is a big risk (note that for most large-scale websites memory is not the bottleneck) might you want to conditionally include parts of your application.
Rasmus Lerdorf (the man behind PHP) agrees: http://pooteeweet.org/blog/538
As others have said, it shouldn't slow things down much, but it's not 'ideal'.
If the main issue is that you're too lazy to go changing the paths for all the included files (if the path ever needs to be updated in the future), then you can use a constant to define the path in your main file, and use the constant any time you need to include/require a file.
define('PATH_TO_FILES', '/var/www/html/mysite/includes/go/in/here/');
require_once PATH_TO_FILES.'database.php';
require_once PATH_TO_FILES.'sessions.php';
require_once PATH_TO_FILES.'otherstuff.php';
That way if the path changes, you only need to modify one line of code.
It will indeed slow down your website, mostly because of the relatively slow loading and processing of PHP. The more code you include, the slower the application will get.
I live by "include as little as possible, as much as necessary", so I usually just include my config and session handling for everything. Each page then includes just what it needs, using an include path defined in the config include, so for path changes you still only need to change one file.
If you include everything, the slowdown won't be noticeable until you get a lot of page hits (several hits per second), so in your case just including everything might be OK.