Alternatively, does it copy the compiled code from the cache (APC), or load it from disk on each call?
The reason I am asking is that I have a large data structure initialized in a file, and now I am worried about the performance of the script.
PHP is actually very efficient at dealing with large data structures. It will, however, always store them in memory, which is not shared between calls. If the structure is large enough, you might want to consider loading it piece by piece or storing it in a datastore of some sort. If your data file is 100MiB, you're going to load at least 100MiB, plus the memory required by PHP itself, on every call.
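To make the trade-off concrete, here is a minimal sketch (the file names and schema are assumptions for illustration): the include pulls the whole structure into this request's memory, while a datastore lets each call fetch only the piece it needs.

    <?php
    // Loads the ENTIRE structure into this request's (non-shared) memory:
    $data = include 'bigdata.php';   // assumed: bigdata.php returns a huge array

    // A datastore fetches only the needed piece per call (SQLite via PDO):
    $pdo  = new PDO('sqlite:data.db');
    $stmt = $pdo->prepare('SELECT value FROM items WHERE key = ?');
    $stmt->execute(array('some-key'));
    $value = $stmt->fetchColumn();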
APC won't entirely help this situation either. When PHP loads a file initially (without APC), it performs the following steps:
Reading: the entire file is read into memory
Lexing: the file is tokenized into standard tokens ("lexemes") for the parser to read
Parsing: the tokens are then used to generate the expressions required for compilation
Compiling: the expressions are turned into "opcodes", similar to how Java "compiles"
Executing: the opcodes are executed in the PHP runtime (data actually gets manipulated here)
You might have noticed that steps 1-4 are all redundant across multiple calls, which is why compiled languages have a dedicated compiler to perform these steps and either a runtime, VM, or OS to run the generated bytecode or binary. APC actually tries to give PHP that same edge: by precompiling each file, it can store the (typically smaller) precompiled opcode version in memory and serve it whenever someone accesses the page.
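If you want to see step 2 for yourself, PHP exposes its own tokenizer; a tiny sketch:

    <?php
    // token_get_all() runs the same lexer PHP uses before parsing/compiling.
    $tokens = token_get_all('<?php $x = 1 + 2;');
    foreach ($tokens as $token) {
        if (is_array($token)) {
            echo token_name($token[0]) . ': ' . $token[1] . "\n";
        }
    }
    // Prints e.g. T_OPEN_TAG, T_VARIABLE, T_LNUMBER, ... one line per token.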
The problem with your use-case is that this does absolutely nothing for literal data within a file. The data must still be declared, and it won't even be touched until step 5, which is why I am emphasizing the importance of using an external data store if you see a significant performance hit.
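For completeness, a minimal sketch of the shared-memory workaround (assuming the APC userland cache is available and that a hypothetical bigdata.php returns the structure): build the data once, then share it between calls instead of re-declaring it.

    <?php
    // apc_fetch() sets $hit to false on a cache miss.
    $data = apc_fetch('big_data', $hit);
    if ($hit === false) {
        $data = include 'bigdata.php';  // assumed file returning the array
        apc_store('big_data', $data);   // kept in shared memory for later calls
    }
    // Note: fetching still unserializes the structure, so a very large
    // array will still cost time and memory on every request.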
Please use a profiler like Xdebug or something similar to gain more insight into what is actually slowing your script down (if anything), so you can make a more informed decision on where to go from here.
Related
Currently I'm storing the configuration for my PHP scripts in variables and constants within another PHP script (e.g. config.php).
So each time a script is called, it includes the configuration script to gain access to the values of the variables/constants.
Since INI files are easier for other scripts to parse, I thought about storing the values for my configuration in such a file and reading it using parse_ini_file().
My notion is that PHP keeps script files in memory, so including a script file does (usually) not cause I/O. (Or does Zend do the caching? Or are the sources not cached at all?)
How is it with reading custom INI files? I know that for .user.ini there is caching (see user_ini.cache_ttl), but does PHP also cache custom INI files, or does a call to parse_ini_file() always cause I/O?
Summary
The time required to load configuration directives (which is not the same as the time needed by the app to act on those directives) is usually negligible - below one millisecond for most "reasonably sized" configurations. So don't worry: INI, PHP, and JSON are, performance-wise, all equally good choices. Even if PHP were ten times faster than JSON, that would be like loading in 0.001s instead of 0.01s; very few will ever notice.
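If you want to confirm this for your own configuration, a quick micro-benchmark sketch (the file names are assumed):

    <?php
    $t = microtime(true);
    $ini = parse_ini_file('config.ini', true);
    printf("INI:  %.4f ms\n", (microtime(true) - $t) * 1000);

    $t = microtime(true);
    $php = include 'config.php';   // assumed: config.php returns an array
    printf("PHP:  %.4f ms\n", (microtime(true) - $t) * 1000);

    $t = microtime(true);
    $json = json_decode(file_get_contents('config.json'), true);
    printf("JSON: %.4f ms\n", (microtime(true) - $t) * 1000);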
That said, there are considerations when deciding where to store config data.
.ini vs .php config storage
Time to load: mostly identical unless caching is involved (see below), and as I said, not really important.
Ease of use: .ini is easier for a human to read and modify. This may be an advantage or a disadvantage (if the latter, think integrity check).
Data format: PHP can store more structured data than .ini files, unless really complicated workarounds are used. But consider the possibility of using JSON instead of INI (see the sketch after this list).
More structured data means that you can more easily create a "super configuration" PHP or JSON file holding the equivalent of several INI files, while keeping the information well isolated.
Automatic redundancy control: PHP file inclusion can be streamlined with require_once.
User modifications: there are visual INI and JSON editors that allow a user to modify an INI or JSON file while keeping it at least syntactically valid. Not so for PHP (you would need to roll your own).
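To make the data-format point concrete, compare a sectioned INI file with its PHP equivalent (both files are illustrative):

    ; config.ini - one level of nesting via [sections], strings and numbers only
    [database]
    host = "localhost"
    user = "app"

    <?php
    // config.php - arbitrary nesting and native types come for free
    return array(
        'database' => array('host' => 'localhost', 'user' => 'app'),
        'layout'   => array('columns' => array(1, 2, 3)), // awkward in plain INI
    );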
Caching
The PHP core does not do caching. Period. That said, you'll never use the PHP core alone: it will be loaded as (Fast)CGI, as an Apache module, et cetera. Also, you might not use a "barebones" installation; chances are you will have several modules installed.
Both the "loader" part and the "module" part might do caching; and their both doing this could lead to unnecessary duplications or conflicts, so it is worth checking this out:
the file (and this does not change between INI, JSON, and PHP files) will be cached by the filesystem I/O layer and, unless memory is really at a premium, will be loaded from there (on a related note, this is one of the reasons why not all filesystems are equally good for all websites).
if you need the configuration in several files and use require_once in all of them, the configuration will be loaded only once, as soon as it is needed (a sketch follows this list). This is not caching, but it is a performance improvement nonetheless.
several modules exist (Zend Optimizer+, OPcache, APC, ...) that will cache all PHP files, configuration included. They will not cache INI files, though.
the caching done by modules (e.g. OPcache) can (a) ignore further modifications to the file system, which means that upon modifying a PHP file you'll need to somehow reload or invalidate the cache (how to do this changes from module to module); and (b) implement shortcuts that might conflict with the file system's data management or its file structure (famously, OPcache can ignore the path part of a file name, allowing much faster lookups, unless you have two files with the same name in different directories, in which case it risks loading one instead of the other).
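As promised above, a sketch of the "loaded once" pattern (the names are assumed): the file is parsed on the first call only, and later calls reuse the static copy.

    <?php
    function get_config() {
        static $config = null;               // persists across calls within one request
        if ($config === null) {
            $config = require 'config.php';  // assumed: config.php returns an array
        }
        return $config;
    }
    // require_once alone deduplicates the file read, but only returns the
    // array the first time; the static variable makes repeated lookups cheap.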
Performance enhancement: cache digested data instead of config directives
Quite often, depending on some config directive, you will have to perform one of several non-trivial operations, and then use the results for the actual output.
What slows down the workflow in this case is not reading whether, say, "config.layout" is "VERTICAL" or "HORIZONTAL", but actually generating the layout (or whatever else). In this case you might reap huge benefits by storing the generated object somewhere:
serialized inside a file (e.g. cache/config.layout.vertical.html.gz). You will probably need to deploy some kind of 'stale data check' if the layout changes, or some kind of cache invalidation procedure; see the sketch after this list. (For layouts specifically, you could check out Twig, which also does parameterized template caching.)
inside a keystore, such as Redis.
in an RDBMS such as MySQL (even if that's overkill; you'd basically be using it as a keystore).
faster NoSQL alternatives such as MongoDB.
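A sketch of the 'stale data check' mentioned above (generate_layout() is a hypothetical expensive function; the file names are assumed):

    <?php
    function cached_layout($configFile, $cacheFile) {
        // Fresh enough? Reuse the previously generated output.
        if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($configFile)) {
            return file_get_contents($cacheFile);
        }
        $config = parse_ini_file($configFile, true);
        $html   = generate_layout($config);             // the expensive part
        file_put_contents($cacheFile, $html, LOCK_EX);  // refresh the cache
        return $html;
    }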
Additional options
You will probably want to read about client caching and headers, and possibly explore whatever options your hosting offers (load balancers, HTTP caches such as Varnish, etc.).
parse_ini_file() uses standard file operations to convert the file into an array, so each call reads the file again.
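In other words, something like this re-reads the file from disk on every request, subject only to OS-level filesystem caching (the file name is illustrative):

    <?php
    $config = parse_ini_file('config.ini', true);  // true = keep [sections]
    echo $config['database']['host'];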
I have a PHP file that is included by a lot of other PHP scripts, which all use only a subset of the functions and variables defined in that included file. (I guess this is the usual case for most larger libraries.)
For this reason, in most cases only a small part of the included file is actually used and most of it simply ignored (unused functions, never referenced variables, etc.).
But AFAIK all recent versions of PHP come with the Zend Optimizer, which, as far as I understand it, produces some kind of bytecode that is then used at runtime. It should therefore filter out all unused code, so even a huge number of unused functions would cause zero overhead at runtime.
Is this the case or is there a performance overhead for using large libraries in PHP?
From the PHP 5.5 changelog of new features:

The Zend Optimiser+ opcode cache has been added to PHP as the new OPcache extension. OPcache improves PHP performance by storing precompiled script bytecode in shared memory, thereby removing the need for PHP to load and parse scripts on each request.
What I understand from that statement is that every .php file, once converted into bytecode, is saved into shared memory so that the conversion does not need to be repeated for each request. Since we no longer perform that step, processing time goes down.
This means that the uncalled functions and unneeded variables still get compiled and stored in the cache, even though they are never used.
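You can observe the cache directly; a sketch, assuming OPcache is loaded and enabled:

    <?php
    var_dump(opcache_is_script_cached(__FILE__)); // true once compiled and cached
    $status = opcache_get_status(false);          // false = omit the per-script list
    echo $status['memory_usage']['used_memory'] . " bytes of cached opcodes\n";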
is there a performance overhead for using large libraries in PHP?
The answer to that is almost always "yes". There have been numerous benchmarks showing that a library is slow even when using opcode caching (such as APC or Zend Optimiser).
I recall reading on php.net (although I unfortunately can't seem to find the page) that the PHP interpreter can run in different ways - most commonly, every time a page is requested, an instance of the PHP interpreter is created, runs its course, and is then destroyed, along with all the memory associated with that particular page call. Apparently, it is also possible to let all the memory linger, so that it can be used again in future page calls; as I understood it, this essentially allows multiple different PHP scripts to access and modify the same objects, without losing them after each script is complete.
Or at least, so I remember. Is there any truth to this? If so, how would I set it up?
PHP doesn't work that way; it's "run and forget".
You can save data between requests using userland shared-memory extensions, for example APC, XCache, or Memcached.
Or you can use the $_SESSION data array after calling session_start().
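A minimal sketch of that approach; note the session is per-user, not shared between different visitors:

    <?php
    session_start();  // resumes the session identified by the client's cookie
    $_SESSION['hits'] = isset($_SESSION['hits']) ? $_SESSION['hits'] + 1 : 1;
    echo 'Visit number ' . $_SESSION['hits'];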
Don't think of PHP scripts like a Java application running in e.g. Tomcat; standard PHP was not designed for that use case. The PHP compiler works on the fly.
You can use shared memory for some of what you want, but Redis/Memcache are probably better bets.
Let the server persist between requests, and your objects will too:
appserver-in-php - a generic HTTP application approach for PHP 5.3+ (inspired by Rack and WSGI)
With well-written applications that gives you more speed than APC; however, it doesn't scale across users if you need to run multiple boxes (though you can still use sticky sessions).
I'm working on a webapp that uses a lot of ajax to display data and I'm wondering if I could get any advice on what else I could do to speed up the app, and reduce bandwidth, etc.
I'm using PHP, MySQL, FreeBSD, Apache, and Tomcat in my environment. I own the server and have full access to all config files, etc.
I have gzip/deflate compression turned on in the Apache httpd.conf file. I have obfuscated and minified all the .js and .css files.
My webapp works in this general manner: after login, the user lands on the index.php page. All links on the index page are Ajax calls to a .php class function that retrieves the HTML as a string and displays it inside a div somewhere on the main index.php page.
Most of the functions returning the html are returning strings like:
<table>
<tr>
<td>Data here</td>
</tr>
</table>
I don't return the full "<html><head>" stuff, because it already exists in the main index.php page.
However, the HTML strings returned are formatted with tabs, spaces, comments, etc. for easy reading of the code. Should I take the time to minify these responses and remove the tabs, comments, and spaces? Or is minifying the .php pages negligible because they are on the server?
I guess I'm trying to figure out whether the way I've structured the webapp is going to cause bandwidth issues, and whether reducing the .php class file sizes could improve performance. Most of the .php classes are 40-50KB, with the largest being 99KB.
For speed, I have thought about using memcache, but don't really know if adding it after the fact is worth it and I don't quite know how to implement it. I don't know if there is any caching turned on on the server...I guess I have left that up to the browser...I'm not very well versed in the caching arena.
Right now the site doesn't appear slow, but I'm the only user... I'm just wondering if it's worth the extra effort.
Any advice, or articles would be appreciated.
Thanks in advance.
My recommendation would be to NOT send the HTML over the Ajax calls. Instead, send just the underlying data (the "Data here" part) as JSON, then process that data with a client-side function that decorates it with the right HTML and injects it into the DOM. This will drastically speed up the Ajax calls.
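A sketch of what the data-only endpoint could look like (the row data here is assumed; in reality it would come from your MySQL queries):

    <?php
    header('Content-Type: application/json');
    echo json_encode(array(
        array('id' => 1, 'name' => 'Data here'),
        array('id' => 2, 'name' => 'More data'),
    ));
    // The client-side callback then wraps each row in <tr><td>...</td></tr>
    // markup and injects the finished table into the target div.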
Memcache provides an API that allows you to cache data. What you additionally need (and what is, in my opinion, more important) is a strategy about what to cache and when to invalidate the cache. This cannot be determined by looking at the source code; it comes from how your site is used.
However, an opcode cache (e.g. APC) could be used right away.
Code beautification is for humans, not for machines. As part of your optimization you should strip it out.
Or simply add a flag check in your application: when a certain condition matches (like debug mode), return nicely formatted output. Otherwise, whitespace means nothing to the machine.
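A sketch of that flag check (DEBUG_MODE is a hypothetical constant; the whitespace squeeze is deliberately crude and would mangle <pre> blocks):

    <?php
    function emit_html($html) {
        if (defined('DEBUG_MODE') && DEBUG_MODE) {
            return $html;  // keep tabs and comments for human readers
        }
        return preg_replace('/>\s+</', '><', trim($html));  // collapse whitespace between tags
    }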
APC
You should always use APC to compile and cache PHP scripts into opcodes.
Why?
changes are rarely made after deployment
if every script is already available as opcodes, your server does not need to compile plain-text scripts into binary opcodes on the fly
compile once, use many times
What are the benefits?
fewer execution cycles spent compiling plain-text scripts
lower memory consumption (the two are related)
some simple math: if a request is served in 2 seconds in your current environment and in 0.5 seconds with APC, you gain 4x better performance; in the 2 seconds one request used to take, APC can serve 4 requests. That means where you could previously fit 50 concurrent users, you can now allow 200
Memcache - NO GO?
It depends. In a single-host environment you probably won't gain much. The biggest advantage of memcache is information sharing and distribution (that is, a multi-server environment: cache once, use everywhere).
etc?
serve static files with an expiration header (the "prime cache" concept: no request is the fastest request, and it saves bandwidth); see the sketch after this list
cache your expensive requests in memcache/disk-cache or even a database (expensive requests such as report/statistics generation)
always review your code for optimization opportunities (but do not over-do it)
always benchmark and compare the results (before and after)
fine-tune your Apache/Tomcat configuration
consider recompiling PHP with the minimum of libraries/extensions and loading only the necessary ones at runtime (for example, if your application uses mysqli and not PDO, there is no reason to keep PDO compiled in)
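As an example of the first point in this list, a sketch of "prime cache" expiration headers for an asset served through PHP (the bundle name is assumed):

    <?php
    $ttl = 604800;  // one week
    header('Content-Type: application/javascript');
    header('Cache-Control: public, max-age=' . $ttl);
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $ttl) . ' GMT');
    readfile('bundle.min.js');  // with these headers the browser will not re-request it until expiry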
I was just reading over this thread where the pros and cons of using include_once and require_once were being debated. From that discussion (particularly Ambush Commander's answer), I've taken away the fact(?) that any sort of include in PHP is inherently expensive, since it requires the processor to parse a new file into OP codes and so on.
This got me to thinking.
I have written a small script which will "roll" a number of JavaScript files into one (appending all the contents into one file), so that it can be packed to reduce HTTP requests and overall bandwidth usage.
Typically for my PHP applications, I have one "includes.php" file which is included on each page, and that then includes all the classes and other libraries I need. (I know this probably isn't best practice, but it works; the __autoload feature of PHP 5 is making this better in any case.)
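For reference, a sketch of that autoload approach (spl_autoload_register() supersedes __autoload; the classes/ directory layout is an assumption): each class file is read and compiled only when the class is first used.

    <?php
    spl_autoload_register(function ($class) {
        $file = __DIR__ . '/classes/' . $class . '.php';
        if (is_file($file)) {
            require $file;  // parsed lazily, on first use of the class
        }
    });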
Should I apply the same "rolling" technique on my PHP files?
I know of that saying about premature optimisation being evil, but let's take this question as theoretical, ok?
There is a problem with Apache/PHP on Windows which causes applications to be extremely slow when loading or even touching too many files (a page which loads approx. 50-100 files may spend a few seconds on file operations alone). This problem appears both with including/requiring and when working with files (fopen, file_get_contents, etc.).
So if you (or, given the age of this post, more likely anybody else) ever run your app on Apache/Windows, reducing the number of loaded files is absolutely necessary. Combine more PHP classes into one file (an automated script for this would be useful; I haven't found one yet) or be careful not to touch any unneeded file in your app.
That would depend somewhat on whether it is more work to parse several small files or one big one. If you require files on an as-needed basis (not saying you necessarily should do things that way), then presumably for some execution paths there would be considerably less compilation required than if all your code were rolled into one big PHP file that the parser had to encode in its entirety, whether it was needed or not.
In keeping with the question, this is thinking aloud more than expertise on the internals of the PHP runtime; it doesn't sound as though there is any real-world benefit to getting too involved with this at all. If you run into a serious slowdown in your PHP, I would be very surprised if the use of require_once turned out to be the bottleneck.
As you've said: "premature optimisation ...". Then again, if you're worried about performance, use an opcode cache like APC, which makes this problem almost disappear.
This isn't an answer to your direct question, just about your "js packing".
If you leave your JavaScript files alone and allow them to be included individually in the HTML source, the browser will cache those files. On subsequent requests, when the browser asks for the same JavaScript file, your server will return a 304 Not Modified header and the browser will use its cached version. However, if you're "packing" the JavaScript files together on every request, the browser will re-download the file on every page load.
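If you do want to keep a packed file, here is a sketch of serving a pre-built bundle with 304 support (the file name is assumed): rebuild the pack only when the sources change, and let the browser reuse its cached copy otherwise.

    <?php
    $file  = 'packed.js';
    $mtime = filemtime($file);
    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
        strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
        header('HTTP/1.1 304 Not Modified');  // browser keeps its cached copy
        exit;
    }
    header('Content-Type: application/javascript');
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
    readfile($file);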