I have been researching the best use of OPcache with Joomla.
The GitHub page The Zend Engine and OPcode caching is the best explanation of how OPcache works that I've seen, and I'm trying to get answers to a couple of points here.
Resolved filename:
What does "Resolved filename" mean?
What does OPcache use as the "resolved filename"? I use the Joomla! CMS, and I know that it always calls index.php but passes different parameters. Is the resolved filename index.php?[querystring]?
Timestamp Used:
How does "timestamp" apply with a CMS/Framework system such as Joomla! because since the index.php file never changes it seems to me that the cache would never refresh.
Joomla! CMS Caching system:
Does it make sense to use the cache in Joomla? It writes the pages it builds out to the filesystem as PHP files in a folder named "cache", and those pages are then called instead of Joomla rebuilding the pages every time.
Resolved Filenames
The PHP equivalent of a resolved filename is what the realpath() function returns. This expands all symbolic links and resolves any references to '/./', '/../' and extra '/' characters in the input path (against the current working directory in the case of a relative filename), returning the canonicalized absolute pathname. In other words, the resolved filename is a complete filename mapping onto the underlying filesystem. It is not necessarily unique, because of hard links, etc.
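As a minimal sketch of what resolution means in practice (the paths here are hypothetical):

// Suppose /var/www/shop is a symlink to /var/www/releases/v2.
echo realpath('/var/www/shop/./lib/../index.php');
// => /var/www/releases/v2/index.php
// Symlinks, '/./', '/../' and extra '/' characters are all resolved,
// yielding the canonicalized absolute pathname.
var_dump(realpath('/no/such/file.php')); // bool(false) for non-existent files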
OPcache uses the resolved filename as the index into its internal compiled script database for two reasons:
Having relative filenames and embedded symlinks opens up all sorts of security and application-programming bear traps that can cause bugs or enable exploitable vulnerabilities. By using the resolved filename for each script as its key, OPcache avoids these issues.
This can also have material performance benefits with multiple installations of packages like phpBB, WordPress, MediaWiki (and, I assume, Joomla), which typically use a hierarchical PHP directory structure. You can symlink many versions of a common subdirectory onto a shared library folder, and this way separate logical instances of a package can share the same compiled scripts in the OPcache internal database.
The query parameters are quite separate from a script being executed. The parameters typically vary from request to request depending on the request context but the executed script is the same, and ditto any included scripts for the same processing path.
Script Timestamps
The timestamp of each underlying script file is used by OPcache as a secondary key. This is to enable detection of changes to the underlying script, which will normally result in a changed timestamp. There are various OPcache INI parameters that can be used to reduce the performance hit of these checks, as well as OPcache API calls (such as opcache_invalidate()) that allow sysadmins to invalidate scripts explicitly.
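As a sketch, a typical production setup turns the timestamp checks down or off in php.ini and invalidates explicitly on deployment (the directives and functions are standard OPcache ones; the path is made up):

// php.ini:
//   opcache.validate_timestamps = 0  ; skip timestamp checks entirely, or
//   opcache.revalidate_freq     = 2  ; if checking, stat each script at most every 2s

// With timestamp validation off, a deploy script invalidates explicitly:
opcache_invalidate('/var/www/site/index.php', true); // force recompilation of one script
opcache_reset();                                     // or drop the whole compiled-script cache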
Since the (standard) OPcache internal cache is entirely in-memory, it does not have a persistent version on, say, the filesystem. Hence it must be rebuilt every time the underlying PHP process hierarchy (which is typically web-server specific) is reloaded. And yes, this does result in a startup performance hit whilst the cache is re-primed.
This use of timestamps has to do with the caching of script compilations and is quite separate from any application-content-related caching.
Application Caching
What OPcache does is avoid per-request compilation costs. For any PHP application based on a framework or a complex package such as Joomla or MediaWiki, this can typically represent 50-90% of the per-request CPU cost, hence leading to a 2-10X throughput improvement.
Application caching is application-specific and relates to avoiding the per-request cost of executing application code that would duplicate processing of application data.
These are quite separate, and to get good application performance you should always consider doing both.
Related
ionCube stores PHP files in an encrypted format, and it is installed as a PHP extension. What I want to know is: when I request the encrypted PHP file from a non-encrypted PHP file, how does the PHP compiler execute it?

Does it send the encrypted file to an ionCube server, get the original file back and compile that, or is there something else?

In other words, how does the communication between our server and ionCube work? I guess it is through cURL, but I want to know how it works.
As you may have picked up by now, the original code is never obtained, and processing is based on bytecode.
Here's some high level information that may help.
PHP Extensions
PHP has two types of extensions: module extensions, such as cURL, that typically wrap external APIs and expose their functionality via new PHP functions, and PHP engine extensions. Though the distinction isn't set in stone, engine extensions tend to interact with PHP's compiler and execution engine, though they may add new PHP functions too. ionCube is an engine extension that also adds PHP functions for its API and to support ionCube24, though it also used to be installable as a module extension using dl(). Both kinds of extension are shared libraries, and a single line in the php.ini file is used to add an extension to PHP, with PHP making use of OS functions to dynamically link the library into the running process.
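For illustration, the php.ini lines look something like this (the exact loader filename varies by platform and PHP version, so treat these as placeholders):

extension=curl.so                        ; module extension: wraps an external API
zend_extension=opcache.so                ; engine extension: hooks compile/execute
zend_extension=ioncube_loader_lin_7.4.so ; engine extension: the ionCube Loader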
Hooks
PHP has internal hooks that allow an extension to intercept the compile and execute stages of source file processing. An extension might use these simply to perform additional steps before or after regular processing, or replace the usual processing entirely. The ionCube Loader uses the compile hook to examine a file before the PHP engine compiles it, and takes over the task of processing the file if it is an ionCube file. The result of either reading an ionCube file or normal compilation is ultimately bytecode, however ionCube bytecode is non-standard, and with version 9 it may still be encrypted or unavailable for other reasons after initial processing of a file. As the standard execution engine cannot process ionCube bytecode, the Loader also uses the execution hook to take over execution of the compiled code if it was read from an ionCube encoded file.
A further task of the Loader is to allow files produced for certain older versions of PHP to run on newer versions, and where necessary the Loader performs on-the-fly transformations of the compiled code to make it usable on whatever version of PHP is running. PHP internals change significantly from time to time, most recently and most significantly between PHP 5 and 7, making this a challenging but important task for the end-user experience.
Processing of ionCube files does not require communication with outside servers, however since version 9, code can be protected with encryption keys that only exist when created at runtime by the PHP application itself, and an application developer may write PHP code that makes external calls to obtain data for constructing the decryption keys when required.
Encoded files
In terms of the files themselves, early PHP encoding tools of this type in essence compiled to bytecode and serialised that form directly to files. There was little knowledge of or interest in PHP internals among developers in general, and this approach gave good protection and excellent performance. When interest in producing bytecode decompilers first emerged around 2006, from a hacker group in China called "Blue Wind", simply compiling to bytecode was clearly no longer acceptable. To varying degrees, tools such as ionCube then added more protection around the bytecode to hamper the task of successful reverse engineering. Though steps can be taken to limit the effectiveness of decompilation even if bytecode is recovered, success at code protection still depends fundamentally on the ability to hide the necessary decoding key(s), and all encoding tools of this type store such a key in the encoded file itself.
In evolving code protection for ionCube version 9, a challenge was to address the limitation of stored keys, and the ability to encrypt code without storing the necessary decryption key statically anywhere was the obvious and necessary next step. This was added as a feature called "Dynamic Keys".
Hopefully that gives some insight into how ionCube and, in some respects, similar tools work. For more detailed knowledge of engine extension implementation, I'd recommend looking at the source code for the PHP OPcache and also Derick Rethans' Xdebug.
Disclosure: I am associated with ionCube.
Currently I'm storing the configuration for my PHP scripts in variables and constants within another PHP script (e.g. config.php).
So each time a script is called, it includes the configuration script to gain access to the values of the variables/constants.
Since INI files are easier for other scripts to parse, I thought about storing the values for my configuration in such a file and reading it using parse_ini_file().
My understanding is that PHP keeps script files in memory, so including a script file does not (usually) cause I/O. (Or does Zend do the caching? Or are the sources not cached at all?)
What about reading custom INI files? I know that for .user.ini there is caching (see user_ini.cache_ttl), but does PHP also cache custom INI files, or does a call to parse_ini_file() always cause I/O?
Summary
The time required to load configuration directives (which is not the same as the time the app needs to act on those directives) is usually negligible: below one millisecond for most "reasonably sized" configurations. So don't worry: INI, PHP, and JSON are, performance-wise, all equally good choices. Even if PHP were ten times faster than JSON, that would be like loading in 0.001s instead of 0.01s; very few will ever notice.
That said, there are considerations when deciding where to store config data.
.ini vs .php config storage
Time to load: mostly identical unless caching is involved (see below), and as I said, not really important.
Ease of use: .ini is easier for a human to read and modify. This may be an advantage or a disadvantage (if the latter, think integrity check).
Data format: PHP can store more structured data than .ini files, unless really complicated workarounds are used. But consider the possibility of using JSON instead of INI. (The two file styles are sketched after this list.)
More structured data means that you can more easily create a "super configuration" PHP or JSON file holding the equivalent of several INI files, while keeping information well isolated.
Automatic redundancy control: PHP file inclusion can be streamlined with require_once.
User modifications: there are visual INI and JSON editors that allow a user to modify an INI or JSON file while keeping it at least syntactically valid. Not so for PHP (you would need to roll your own).
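As a minimal sketch of the two storage styles (file names and keys invented):

// --- config.php: native PHP, opcode-cacheable, arbitrary structure ---
// return ['db' => ['host' => 'localhost', 'name' => 'app'], 'layout' => 'VERTICAL'];
$config = require __DIR__ . '/config.php';

// --- config.ini: easier for humans and external tools to edit ---
// [db]
// host = localhost
// name = app
$config = parse_ini_file(__DIR__ . '/config.ini', true); // true = keep [section]s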
Caching
The PHP core does not do caching. Period. That said, you'll never use the PHP core alone: it will be loaded as a (Fast)CGI, an Apache module, et cetera. Also, you might not use a "barebones" installation; chances are you will have several modules installed.
Both the "loader" part and the "module" part might do caching; and their both doing this could lead to unnecessary duplications or conflicts, so it is worth checking this out:
the file itself (and this does not differ between INI, JSON and PHP files) will be cached in the filesystem I/O subsystem layer and, unless memory is really at a premium, will be loaded from there (on a related note, this is one of the reasons why not all filesystems are equally good for all websites).
if you need the configuration in several files, and use require_once in all of them, the configuration will be loaded once only, as soon as it is needed. This is not caching, but it is a performance improvement nonetheless.
several modules exist (Zend, OPcache, APC, ...) that will cache all PHP files, configuration included. They will not cache INI files, though (see the sketch after this list).
the caching done by modules (e.g. OPcache) can (a) ignore further modifications to the file system, which means that upon modifying a PHP file you'll need to somehow reload or invalidate the cache; how to do this changes from module to module; (b) implement shortcuts that might conflict with either the file system's data management or its file structure (famously, OPcache can ignore the path part of a file, allowing for much faster performance unless you have two files with the same name in different directories, when it risks loading one instead of the other).
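One way to observe that asymmetry, assuming OPcache is loaded (opcache_is_script_cached() is a standard OPcache function; the file names are invented):

$config = require __DIR__ . '/config.php';
var_dump(opcache_is_script_cached(__DIR__ . '/config.php')); // true once compiled

// parse_ini_file() is a plain file-reading function: nothing caches its
// result for you, so every call re-reads and re-parses the INI file.
$ini = parse_ini_file(__DIR__ . '/config.ini', true);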
Performance enhancement: cache digested data instead of config directives
Quite often, depending on some config directive, you will have to perform one of several non-trivial operations. You will then use the results for the actual output.
What slows down the workflow in this case is not reading whether, say, "config.layout" is "VERTICAL" or "HORIZONTAL", but actually generating the layout (or whatever else). In this case you might reap huge benefits by storing the generated object somewhere (the file-based variant is sketched after this list):
serialized inside a file (e.g. cache/config.layout.vertical.html.gz). You will probably need to deploy some kind of 'stale data check' if the layout changes, or some kind of cache invalidation procedure. (For layouts specifically, you could check out Twig, which also does parameterized template caching).
inside a keystore, such as Redis.
in an RDBMS such as MySQL (even if that's overkill; you'd basically be using it as a keystore).
faster NoSQL alternatives such as MongoDB.
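A hedged sketch of the file-based variant with a 'stale data check' (all names are invented; buildLayout() stands in for the expensive generation step):

function getLayout(string $configFile, string $cacheFile): string
{
    // Regenerate only when the cache is missing or older than the config.
    if (!is_file($cacheFile) || filemtime($cacheFile) < filemtime($configFile)) {
        $config = parse_ini_file($configFile, true);
        $layout = buildLayout($config);               // hypothetical expensive step
        file_put_contents($cacheFile, $layout, LOCK_EX);
        return $layout;
    }
    return file_get_contents($cacheFile);
}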
Additional options
You will probably want to read about client caching and headers, and possibly explore whatever options your hosting offers (load balancers, HTTP caches such as Varnish, etc.).
parse_ini_file() uses standard file operations to read the file and convert it into an array; PHP does not cache the result, so each call involves I/O.
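If in doubt, it's easy to measure on your own setup; a rough sketch:

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $config = parse_ini_file(__DIR__ . '/config.ini', true); // re-read on every call
}
printf("1000 parses: %.2f ms\n", (microtime(true) - $start) * 1000);
// The OS page cache usually keeps the file in memory, so repeated reads
// stay cheap even though PHP itself does no caching here.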
I'm new to PHP and want to try caching (for the first time), so I made a website and it has:
dynamic home page
dynamic portfolio page
dynamic contact page
static about page
static admin page
So I read a tutorial about caching and tried to make my own caching system:
It uses a file cache based on which page is requested. When a page is requested, the caching system checks whether there is a cache file in the cache directory. If there is no cache file yet, it writes all the output (HTML) from the PHP script (in this case, the output buffer's contents) to one; if there is a cache file corresponding to the specific id (based on the URI), it just include_once()s that HTML file.
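For reference, that pattern boils down to something like this (the cache path and five-minute lifetime are invented):

$cacheFile = __DIR__ . '/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';

if (is_file($cacheFile) && filemtime($cacheFile) > time() - 300) {
    readfile($cacheFile); // cache hit: serve the stored HTML and stop
    exit;
}

ob_start();                   // capture everything the page prints
// ... build the dynamic page here ...
$html = ob_get_clean();
file_put_contents($cacheFile, $html, LOCK_EX);
echo $html;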
Then I read that CodeIgniter (I built this website with CI) offers APC for caching, so I read up on APC; what I read is that it caches DB results. Now I'm confused about which I should use.
What I've got so far:
file caching would probably be slower if there are a lot of requests (I don't know whether this is true, but I read it somewhere via a search engine)
APC is fast
But I'm still confused about which I should use; I'm on shared hosting.
The levels of caching most relevant in a PHP application:
File / Script caching - The operating system will actually do this to a large extent. When a file is opened it's added to an OS-level cache. It stays there until the file is touched or the OS needs to free memory for other processes. A homegrown PHP solution isn't a good replacement for this.
Opcode caching - In order to function, PHP needs to parse and compile a script into opcodes. A mechanism like APC will cache the opcodes of every PHP script executed by Apache, provided that the cache doesn't overflow. A homegrown PHP solution built on top of APC can partially do this, but APC already does it ... so don't bother.
Query caching - If your script accesses a lot of data that doesn't change very frequently, or wherein some latency between updates and the visibility of those updates is acceptable, caching the results from complex queries is beneficial. A homegrown PHP solution built on APC is acceptable and beneficial at this level (see the sketch after this list). But a database-level solution is also appropriate here, and often more appropriate.
Output caching - If your page is largely deterministic and/or the same sort of latency applicable to query caching is acceptable, you can cache the entire output of the script using output buffering and APC. A homegrown PHP solution built on APC is acceptable here, but generally not necessary. If the page is static, you're probably not saving yourself any re-computation. And if it's dynamic, it's usually preferable to just re-render the page anyway.
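As a sketch of the query-caching level using APC's user cache (the key name, TTL and query are invented):

function getRecentPosts(PDO $db): array
{
    $posts = apc_fetch('recent_posts', $hit);
    if (!$hit) {
        $stmt  = $db->query('SELECT id, title FROM posts ORDER BY id DESC LIMIT 10');
        $posts = $stmt->fetchAll(PDO::FETCH_ASSOC);
        apc_store('recent_posts', $posts, 60); // keep for 60 seconds
    }
    return $posts;
}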
In a dedicated or virtual-dedicated environment you'd need to install APC (or something similar) yourself. But in a shared hosting environment, it's very likely that APC is installed. And if it isn't, you couldn't install it yourself anyway.
And, due to my own uncertainty, I'd recommend not performing any query or output caching with APC in a shared environment -- I'm not sure whether APC segregates caches by virtual host. Even if it does, I wouldn't assume that my site is truly a separate virtual host.
My application, based on Zend Framework and Doctrine, includes more than 300 files on each request. They are mostly the same files.
This is quite a large overhead. It is partially solved by Zend_Cache (and Memcache), but not all pages can be cached.
How to reduce this number? How to speed up?
Doctrine has an option to compile the needed files, which seems quite rational for a production server and the final version of the app.
My plan is to compile other libraries too (I have already stripped all require_once's).
Are there any tools for this task? Maybe some cache drivers do it automatically? How to set them up?
The overhead of PHP file inclusions can usually be countered with an opcode cache such as APC, an extension available through PECL. Opcode caches generally work by caching the compiled bytecode so that the overhead of reading and parsing the source is only incurred on the first request. This will largely remove the need for, or benefit of, any source compilation of your PHP files.
The best option is to use APC or Zend Accelerator. But you can still make "compilation" scripts that merge classes together into one file; that lowers the required IO to a minimum. Unfortunately, you also need to rewrite the autoloading process so that it looks into the appropriate file. You can usually condense common classes together (Zend_Form + Elements + Decorators, frequently used validators, Request + Response + Router + Controller, Zend_Db + adapters + Zend_Db_Select, etc.). The classes always used on each request can easily be condensed and included manually in one file. The best way is to add a debug call that saves all included files (http://www.php.net/get_included_files) into a DB, and then:
SELECT filename FROM files GROUP BY filename HAVING COUNT(filename) = $numOfRequests
All the files in the result can be safely merged into a single file and included before bootstrapping :)
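The debug call could be as simple as this (the table layout and $pdo connection are assumed):

// At the end of each request, record every file the request loaded:
register_shutdown_function(function () use ($pdo) {
    $stmt = $pdo->prepare('INSERT INTO files (request_id, filename) VALUES (?, ?)');
    $requestId = uniqid('', true);
    foreach (get_included_files() as $file) {
        $stmt->execute([$requestId, $file]);
    }
});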
I've used Rails, Merb, Django and ASP.NET MVC applications in the past. What they have in common (that is relevant to the question) is that they have code that sets up the framework. This usually means creating objects and state that are persisted until the web server is recycled (like setting up routing, or checking which controllers are available, etc.).
As far as I know PHP is more like a CGI script that gets compiled to some bytecode each time it's run, and after the request it's discarded. Of course you can have sessions, to persist data between requests from the same user, and as I see there are extensions like APC, with which you can persist objects between requests at the server level.
My question is: how can one create a PHP application that works like rails and such? I mean an application that on the first requests sets up the framework, then on the 2nd and later requests use the objects that are already set up. Is there some built in caching facility in mod_php? (for example that stores the compiled bytecode of the executed php applications) Or is using APC or some similar extensions the only way to solve this problem? How would you do it?
Thanks.
EDIT: Alternative question: if I create a large PHP application that has a very large set up time, but minor running time (like in the frameworks mentioned above) then how should I "cache" the things that are already set up (this might mean a lot of things, except for maybe the database connections, because for that you have persistent connections in PHP already).
To justify large set up time: what if I'm using PHP reflection to check what objects are available and set the runtime according to that. Doing a lot of reflection is usually slow, but one has to do it only once (and re-evaluate only if the source code is modified).
EDIT2: It seems it's APC then. The fact that it caches bytecode automatically is good to know.
I'm not sure if APC is the only solution, but APC does take care of all your issues.
First, your script will be compiled once with APC and the bytecode is stored in memory.
If you have something that takes a long time to set up, you can also cache it in APC as user data. For example, I do this all the time:
$table = apc_fetch(TABLE_KEY); // TABLE_KEY is a constant naming the cache entry
if ($table === false) {        // apc_fetch() returns false on a cache miss
    $table = new Table();      // takes a long time
    apc_store(TABLE_KEY, $table);
}
With APC, the task of creating the table is only performed once per server instance.
PHP (and Ruby, for that matter) are interpreted languages. That is, they parse the files each time they are requested and, I suppose you could say, convert them to a pseudo-bytecode. One could say this is more 'apparent' with PHP than with, say, RoR, but they both behave the same way.
The feature of persisting data between requests is a feature of the server, not of the language itself. For example, the RoR routing you speak of is in fact cached, but it's cached in the server's local memory. It isn't compiled and stored for faster reading. When the server (and by server I mean both the box and the web service instances) restarts, this information is gone. The 'setting up the framework' you speak of still involves parsing EACH file involved in the framework. Rails parses each file during the request again and again; the production-level features may in fact cache this data in memory, but in development it certainly does not. I only mention that because it illustrates that this is a feature of the server, not the language.
To achieve the same thing in PHP you could use Zend Server. As far as I know this is the only PHP interpreter that will 'compile' and use bytecode when told to. Otherwise you'll need to find a way to store the data you want to persist across requests. APC, as you mentioned, is a very powerful option; a more distributed one is Memcached, and then of course there are more persistent forms like disk and SQL.
I am interested in knowing why you'd like this particular feature. Are you noticing performance issues that would be 'solved' by doing this?
I think you're making some incorrect generalizations. All of those frameworks (ex: Rails) can be run with different configurations. Under some, a process is created for every request. This obviously hurts performance, but it shows that these frameworks don't rely on a long-running process. They can set things up (reparse config files, create objects, etc.) every request if needed.
Of course, mod_php (the way PHP is usually used) runs inside the web server process, unlike CGI. So I don't see anything fundamentally different between CakePHP (for example) and Rails.
I think perhaps you are looking for something like Python's WSGI or Ruby's Rack, but for PHP. This specifies an interface (independent of how the language is run) for an application. For a new request, a new instance of an application object is created. As far as I know, this does not exist for PHP.