(How) does PHP cache script and custom INI-files?

(How) does PHP cache script and custom INI-files? - php

Currently I'm storing the configuration for my PHP scripts in variables and constants within another PHP script (e.g. config.php).
So each time a script is called, it includes the configuration script to gain access to the values of the variables/constants.
Since INI-files are easier to parse by other scripts, I thought about storing values for my configuration in such a file an read it using parse_ini_file().
In my notion PHP keeps script-files in memory, so including a script-file does (usually) not cause IO (Or does Zend do the caching? Or are the sources not cached at all?).
How is it with reading custom INI-files. I know that for .user.ini there is caching (see user_ini.cache_ttl), but does PHP also cache custom INI-files?, or does a call to parse_ini_file() always cause IO?

Summary
The time required to load configuration directives (which is not the same as the time needed by the app to perform those directives) is usually negligible - below one millisecond for most "reasonably sized" configurations. So don't worry - INI, PHP, or JSON are, performance wise, all equally good choices. Even if PHP were ten times faster than JSON, that would be like loading in 0.001s instead of 0.01s; very few will ever notice.
That said, there are considerations when deciding where to store config data.
.ini vs .php config storage
Time to load: mostly identical unless caching is involved (see below), and as I said, not really important.
ease of use: .ini is easier to read and modify for a human. This may be an advantage, or a disadvantage (if the latter, think integrity check).
data format: PHP can store more structured data than .ini files, unless really complicated workarounds are used. But consider the possibility of using JSON instead of INI.
More structured data means that you can more easily create a "super configuration" PHP or JSON holding the equivalent of several INI files, while keeping information well isolated.
automatic redundancy control: PHP file inclusion can be streamlined with require_once.
user modifications: there are visual INI and JSON editors that can allow a user to modify a INI or JSON file while keeping it, at least, syntactically valid. Not so for PHP (you would need to roll your own).
Caching
The PHP core does not do caching. Period. That said, you'll never use the PHP core alone: it will be loaded as a (fast)CGI, an Apache module, et cetera. Also you might not use a "barebones" installation but you could have (chances are that you will have) several modules installed.
Both the "loader" part and the "module" part might do caching; and their both doing this could lead to unnecessary duplications or conflicts, so it is worth checking this out:
the file (but this does not change between INI, JSON and PHP files) will be cached into the filesystem I/O subsystem layer and, unless memory is really at a premium, will be loaded from there (on a related note, this is one of the reasons why not all filesystems are equally good for all websites).
if you need the configuration in several files, and use require_once in all of them, the configuration will be loaded once only, as soon as it is needed. This is not caching, but it is a performance improvement nonetheless.
several modules exist (Zend, opcache, APC, ...) that will cache all PHP files, configuration included. They will not cache INI files, though.
the caching done by modules (e.g. opcache) can (a) ignore further modifications to the file system, which means that upon modifying a PHP file, you'll need to somehow reload or invalidate the cache; how to do this changes from module to module; (b) implement shortcuts that might conflict with either the file system data management or its file structure (famously, opcache can ignore the path part of a file, allowing for much faster performances unless you have two files with the same name in different directories, when it risks loading one instead of the other).
Performance enhancement: cache digested data instead of config directives
Quite often it will be the case that depending on some config directive, you will have to perform one of several not trivial operations. Then you will use the results for the actual output.
What slows down the workflow in this case is not reading whether, say, "config.layout" is "VERTICAL" or "HORIZONTAL", but actually generating the layout (or whatever else). In this case you might reap huge benefits by storing the generated object somewhere:
serialized inside a file (e.g. cache/config.layout.vertical.html.gz). You will probably need to deploy some kind of 'stale data check' if the layout changes, or some kind of cache invalidation procedure. (For layouts specifically, you could check out Twig, which also does parameterized template caching).
inside a keystore, such as Redis.
in a RDBMS database such as MySQL (even if that's overkill - you'd use it as a keystore, basically).
faster NoSQL alternatives such as MongoDB.
Additional options
You will probably want to read about client caching and headers, and possibly explore whatever options your hosting offers (load balancers, HTTP caches such as Varnish, etc.).

parse_ini_file() uses standard operations to convert the file into an array.

Related

Does php running on a webserver share the code segment

Alternatively does it copy the compiled code from cache (APC) or load it from disk for each call.
The reason I am asking this is that I have a large data structure that I have initialized in a file. Now I worry about the performance of the script.

PHP is actually very efficient at dealing with large data-structures. It is, however, always going to store them in memory (which is not shared between calls). If it is large enough, you might want to consider buffering it piece by piece or storing it in a datastore of some sort. If your data file is 100MiB, you're going to be loading at least 100MiB plus the memory required by PHP with every call.
APC wont entirely help this situation either. When PHP loads initially (without APC), it will perform the following steps:
Read the entire file into memory
Lexing, it tokenizes the file into standard codes or "Lexicons" for the parser to read
Parsing, it then utilizes the tokens in the file to generate the required expressions for complication
Compiling, It takes the expressions and creates "opt-codes" similar to how Java "compiles"
Executing, the opt-codes are executed in the PHP runtime (data actually gets manipulated here)
You might have noticed that steps 1-4 are all redundant with multiple calls which is why compiled languages have a dedicated compiler to perform these steps and either a Runtime, VM, or OS to run the generated bytecode or binary on. APC actually tries to give PHP that same edge: by precompiling each file, it can then store the (typically smaller) precompiled, opt-code file into memory and access it when someone accesses the page.
The problem with your use-case is that this does absolutely nothing for literal data within a file. Data still must be declared and wont even be touched until step 5, which is why I am emphasizing the importance of perhaps using an external data store if you see a significant performance hit.
Please use a profile like XDebug or something similar to gain some more insight on what is actually slowing your script down (if at all) so you can make a more informed decision on where to go from here.

Opcache with Joomla! and Resolved Name

I have been researching the best use of OPcache with Joomla.
This github page, The Zend Engine and OPcode caching, is the best explanation of how OPcache works that I've seen and was trying to get answers to a couple of points here.
Resolved filename:
What does "Resolved filename" mean?
What does Opache use as the "Resolved filename" since I use Joomla! CMS and I know that it always call the index.php but passes different parameters is the resolved file name index.php?[querystring]
Timestamp Used:
How does "timestamp" apply with a CMS/Framework system such as Joomla! because since the index.php file never changes it seems to me that the cache would never refresh.
Joomla! CMS Caching system:
Does it makes sense to use the cache in Joomla? It writes the pages it builds out to the file system in the folder named "cache" as php files and those php pages will be called instead of Joomla rebuilding the pages every time

Resolved Filenames
The PHP equivalent of a resolved filename is obtained by the realpath() function. This converts all symbolic links, any references to '/./', '/../' and extra '/' characters in the input path, against the current working directory in the case of a relative filename returning the canonicalized absolute pathname. In other words the resolved filename is a complete filename mapping onto the underlying filesystem. It's not necessarily unique, because of hard links, etc..
OPcache uses the resolved filename as the index into its internal compiled script database for two reasons:
Having relative filenames and embedded symlinks opens all sorts of security and simple application programming beartraps that can cause bugs or enable exploitable vulnerabilities. By using the resolved filename for each script as its key, OPcache avoids these issues.
This also can have material performance benefits with multi installations of packages like phpBB, WordPress, MediaWiki (and I assume Joomla) which typically use a hierarchical PHP directory structure. You can symlink many versions of a common subdirectory onto a shared library folder, and this way separate logical instances of a package can share the same compiled script in the OPcache internal database.
The query parameters are quite separate from a script being executed. The parameters typically vary from request to request depending on the request context but the executed script is the same, and ditto any included scripts for the same processing path.
Script Timestamps
The timestamp of each underlying script file is used by OPcache as a secondary key. This is to enable detection of changes to the underlying script which will normally result in a changed timestamp. There are various opcache INI parameters which can be used to reduce the performance hit as well as OPcache API calls (such as opcache_invalidate()) which can enable sysadmins to do this explicitly.
Since the (standard) OPcache internal cache is entirely in-memory, it does not have a persistent version on, say, the filesystem. Hence it must be rebuild every time the underlying PHP process hierarchy (which is typically web-server specific) is reloaded. And yes, this does result in a startup performance hit whilst the cache is re-primed.
This use of timestamps is to do with the caching of script compilations and quite separate to any application content related caching
Application Caching
What OPcache does is to avoid per-request the compilation costs. For any PHP application based on a framework or a complex package such as Joomla or MediaWiki this can represent typically 50-90% of the per-request CPU cost, hence leading to a 2-10X throughput improvement.
Application caching is application-specific and relate to avoiding per-request costs of execution applications code for duplicated processing of application data.
These are quite separate and to get good application performance, you always should consider doing both.

Php apc opcode cache - caching whole files vs variables

Are entire php files added to apc just by using and enabling it?
I understand how fetch and store works with variables, but when should this be used? Is the caching of whole files done automatically? If a variable is cached - should it only be a global variable or a user-specific variable?

Generally, you should cache Database responses that don't need to be updated frequently, but is accessed frequently. This data need not be from a database — could also be from a file or any type of data store. The key is to feed the most popular things from cache/memory to avoid i/o which is costly.
Take a look at this answer for a good explanation of Opcode caching. Opcode caching basically just stores your PHP file on memory, so that it can be interpreted quicker when run.
APC works automatically, and detects changes to your file to see if it needs re-caching. Quoting from the above answer:
The apc.stat option defines whether APC should examine the last modification date/time of a file to decide between using the opcodes from RAM, or re-compiling the file if it is more recent that the opcodes in RAM.
Also, to answer your global vs user specific question. It all depends on exposure, you should cache anything with large amounts of exposure. But generally user-specific data will have less exposure than global data.

cache methods in php?

what are the available cache methods i could use in php ?
Cache HTML output
Cache some variables
it would be great to implement more than one caching method , so i need them all , all the available out there (i do caching currently with files , any other ideas ?)

Most PHP build don't have a caching mechanism built in. There are extensions though that can take care of caching for you.
Have a look at APC or MemCache

If you are using a framework, then most come with some form of caching mechanism that you can use e.g. Zend Framework's Zend_Cache.
If you are not using a framework then the APC or Memcache as Pelle ten Cate mentioned can be used. The correct approach to use does depend in your situation though, do you have your website or application running on more than server and does the information in the cache need to be shared between those servers? (if yes then something like memcache is your answer, or maybe a database or distributed NoSQL solution if you are feeling brave).
If you code is only running on the one server you could try something simple like serializing your variables, and writing them to disk, then on every request afterwards, see if the files exists, if it does, open it and unserialize the string into the variable you need.
This though is only worth it if it would take a long time to generate the varaible normally,
(e.g longer than it would to open,read,unserialize the file on disk)
For HTML caching you are generally going to get the most mileage from using a proxy like Varnish or Squid to do it for you but i realise that this may not be an option for you.
If its not then you could the write to disk approach i mentioned above, and save chunks of HTML to files. look in the PHP manual for ob_start and its friends.

Since every PHP run starts from scratch on page request, there is nothing that would persist between calls, making cacheing moot.
Well, that's the basic view. Of course there are ways to implement a caching, sort of - and a few packages and extensions do so (like Zend Extensions and APC). However, you should have a very close look whether it actually improves performance. Other methods like memcache (for DB results), or switching from PHP to e.g. Java will often yield better results.
You can store variables in the $_SESSION, but you shouldn't keep larger HTML there.
Please check what you are actually trying to do. "Bytecode cacheing" (that is, saving PHP parsing time) needs to be done by the PHP runtime executable. For cacheing Database (SQL) request/reply-pairs, there is memcache. Cacheing HTML output can be done, but is often not a good idea.
See also an earlier answer on a similar question.

Reduce number of included files

My application based on Zend Framework and Doctrine includes > 300 files on each request. They are mostly the same files.
This is a quite huge overhead. Partially solved by Zend_Cache (and Memcache), but not all the pages may be cached.
How to reduce this number? How to speed up?
Doctrine has an option to compile the needed files which seems quite rational for production server and final version of the app.
My plan is to compile other libraries too (I have already stripped all require_once's).
Are there any tools for this task? Maybe some cache drivers do it automatically? How to set them up?

The overhead of php file inclusions can usually be countered with an opcode cache such as APC, an extension available through pecl. Opcode caches generally work by caching the compiled bytecode so that the overhead of reading and parsing the source is only incurred on the first request. This will greatly negate the need or benefit of any source compilation on your php files.

Best option is to use APC or Zend_Accelerator. But still you can make these "compilation" scripts that merge classes together into one file. That lowers the required IO to minimum. Unfortunately, you also need to rewrite the autoloading process so that it looks into appropriate file. You can usually condense common classes together (Zend_Form + Elements + Decorators, frequently used validators, Request + Response + Router + Controller, Zend_Db + adapters + Zend_Db_Select, etc.). Mainly the classes always used on each request can be easily condensed and included manually in one file. Best way is to add debug call, that save all included files (http://www.php.net/get_included_files) into DB and then:
SELECT * FROM files GROUP BY filename WHERE COUNT(filename) = $numOfRequests
All the files in the result can be safely merged into a single file and included before bootstraping :)

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.