How to persist objects between requests in PHP

How to persist objects between requests in PHP - php

I've been using rails, merb, django and asp.net mvc applications in the past. What they have common (that is relevant to the question) is that they have code that sets up the framework. This usually means creating objects and state that is persisted until the web server is recycled (like setting up routing, or checking which controllers are available, etc).
As far as I know PHP is more like a CGI script that gets compiled to some bytecode each time it's run, and after the request it's discarded. Of course you can have sessions, to persist data between requests from the same user, and as I see there are extensions like APC, with which you can persist objects between requests at the server level.
My question is: how can one create a PHP application that works like rails and such? I mean an application that on the first requests sets up the framework, then on the 2nd and later requests use the objects that are already set up. Is there some built in caching facility in mod_php? (for example that stores the compiled bytecode of the executed php applications) Or is using APC or some similar extensions the only way to solve this problem? How would you do it?
Thanks.
EDIT: Alternative question: if I create a large PHP application that has a very large set up time, but minor running time (like in the frameworks mentioned above) then how should I "cache" the things that are already set up (this might mean a lot of things, except for maybe the database connections, because for that you have persistent connections in PHP already).
To justify large set up time: what if I'm using PHP reflection to check what objects are available and set the runtime according to that. Doing a lot of reflection is usually slow, but one has to do it only once (and re-evaluate only if the source code is modified).
EDIT2: It seems it's APC then. The fact that it caches bytecode automatically is good to know.

Not sure if APC is the only solution but APC does take care of all your issues.
First, your script will be compiled once with APC and the bytecode is stored in memory.
If you have something taking long time to setup, you can also cache it in APC as user data. For example, I do this all the time,
$table = #apc_fetch(TABLE_KEY);
if (!$table) {
$table = new Table(); // Take long time
apc_store(TABLE_KEY, $table);
}
With APC, the task of creating table is only performed once per server instance.

PHP (and ruby for that matter) are interpretive languages. That is they parse the files each time they are requested and I suppose you could say are converted to a pseudo byte code. It is more 'apparent' one could say that PHP is more like this than say RoR but they both behave the same way.
The feature of persisting data between requests is a feature of the server not of the language itself. For example, the RoR routing you speak of is in fact cached but that's cached in the server's local memory. It isn't compiled and stored for faster readins. The server (and by server I mean both the box & the web service instances) restarts this information is gone. The 'setting up the framework' you speak of still involves parsing EACH file involved in the framework. Rails parses each file during the request again and again, the production level features may in fact cache this data in memory but certainly in development it does not. The only reason I mention that is because it illustrates that it's a feature of the server not the language.
To achieve the same thing in PHP you could use Zend Server. As far as I know this is the only PHP interpreter that will 'compile' and use byte code when told to. Otherwise you'll need to find a way to store the data you want to persist over requests. APC as you mentioned is a very powerful feature, a more distributed one is Memcached and then of course there's more persistent forms like disc & sql.
I am interested in knowing why you'd like this particular feature. Are you noticing performance issues that would be 'solved' by doing this?

I think you're making some incorrect generalizations. All of those frameworks (ex: Rails) can be run with different configurations. Under some, a process is created for every request. This obviously hurts performance, but it shows that these frameworks don't rely on a long-running process. They can set things up (reparse config files, create objects, etc.) every request if needed.
Of course, mod_php (the way PHP is usually used) runs inside the web server process, unlike CGI. So I don't see anything fundamentally different between CakePHP (for example) and Rails.
I think perhaps you are looking for something like Python's WSGI or Ruby's Rack, but for PHP. This specifies an interface (independent of how the language is run) for an application. For a new request, a new instance of an application object is created. As far as I know, this does not exist for PHP.

Related

Why does PHP use opcode caches while Java compiles to bytecode files?

From my point of view, both PHP and Java have a similar structure. At first you write some high-level code, which then must be translated in a simpler code format to be executed by a VM. One difference is, that PHP works directly from the source code files, while Java stores the bytecode in .class files, from where the VM can load them.
Nowadays the requirements for speedy PHP execution grow, which leads people to believe that it would be better to directly work with the opcodes and not go through the compiling step each time a user hits a file.
The solution seem to be a load of so called Accelerators, which basically store the compiled results in cache and then use the cached opcodes instead of compiling again.
Another approach, done by Facebook, is to completely compile the PHP code to a different language.
So my question is, why is nobody in the PHP world doing what Java does? Are there some dynamic elements that really need to be recompiled each time or something like that? Otherwise it would be really smarter to compile everything when the code goes into production and then just work with that.

The most important difference is that the JVM has an explicit specification that covers the bytecode completely. That makes bytecode files portable and useful for more than just execution by a specific JVM implementation.
PHP doesn't even have a language specification. PHP opcodes are an implementation detail of a specific PHP engine, so you can't really do anything interesting with them and there's little point in making them more visible.

PHP opcodes are not the same as Java classfiles. Java classfiles are well specified, and are portable between machines. PHP opcodes are not portable in any way. They have memory addresses baked into them, for example. They are strictly an implementation detail of the PHP interpreter, and shouldn't be considered anything like Java bytecode.
Does it have to be this way? No, probably not. But the PHP source code is a mess, and there is neither the desire, nor the political will in the PHP internals community to make this happen. I think there was talk of baking an opcode cache into PHP 6, but PHP 6 died, and I don't know the status of that idea.
Reference: I wrote phc so I was pretty knee deep in PHP implementation/compilation for a few years.

It's not quite true that nobody in the PHP world is doing what java does. Projects such as Alexey Zakhlestin's appserver provide a degree of persistence more akin to a java servlet container (though his inspiration is more Ruby’s Rack and Python’s WSGI than Java)

PHP does not use a standard mechanism for opcodes. I wish it either stuck to a stack VM (python,java) or a register VM (x86, perl6 etc). But it uses something absolutely homegrown and there in lies the rub.
It uses a connected list in memory which results in each opcode having a ->op1 ->op2 and ->result. Now each of those are either constants or entries in a temp table etc. These pointers cannot be serialized in any sane fashion.
Now, people have accomplished this using items like pecl/bcompiler which does dump the stream into the disk.
But the classes make this even more complicated, which means that there are potential code fragments like
if(<conditon>)
{
class XYZ() { }
}
else
{
class XYZ() { }
}
class ABC extends XYZ {}
Which means that a large number of decisions about classes & functions can only be done at runtime - something like Java would choke on two classes with the same name, which are defined conditionally at runtime. Basically, APC's inheritance & class caching code is perhaps the most complicated & bug-prone part of the codebase. Whenever a class is cached, all parent inherited members have to be scrubbed out before it can be saved to the opcode cache.
The pointer problem is not insurmountable. There is an apc_bindump which I have never bothered to fix up to load entire cache entries off disk directly whenever a restart is done. But it's painful to debug all that to get something that still needs to locate all system pointers - the apache case is too easy, because all php processes have the same system pointers because of the fork behaviour. The old fastcgi versions were slower because they used to fork first & init php later - the php-fpm fixed that by doing it the other way around.
But eventually, what's really missing in PHP is the will to invent a bytecode format, throw away the current engine & all modules - to rewrite it using a stack VM & build a JIT. I wish I had the time - the fb guys are almost there with their hiphop HHVM. Which sacrifies eval() for faster performance - which is a fair sacrifice :)
PS: I'm the guy who can't find time to update APC for 5.4 properly

I think all of you are misinformed. HHVM is not a compiler to another languague is a virtual machine itself. The confusion is because facebook use to compile to c++, but that approach was to slowly for the requirements of the developers (ten minutes compiling only for test some tiny things).

Is it possible not to load the bootstrapping mechanism at every call?

This is not a PHP question, but my expertise is with PHP frameworks.
A lot of frameworks have a bootstrapping (loading of classes and files) mechanism. (Drupal, Zend Framework to name a few)
Everytime that you make a request, the complete bootloading process needs to be repeated. And it can be optimized using APC by automatically caching some intermediate code
The general question is:
For any language, is there any way to not load the complete bootstrapping process? Is there any way of "caching" the state (or starting at) at the end of the bootstraping process to not load everything again? (maybe the answer is in some other language/framework/pattern)
It looks to me as extremely inefficient.

In general, it's quite possible to perform bootstrap / init code once per process, instead of having to reload it for every request. In your specific case, I don't think this is possible with PHP (but my knowledge of PHP is limited). I know I have seen this as a frequently criticism of PHP's architecture... but to be fair to PHP, it's not the only language or framework that does things this way. To go into some detail...
The style of "run everything for every request" came about with "CGI" scripts (c.f. Common Gateway Interface), which were essentially just programs that got executed as a separate process by the webserver whenever a request came in matching the file, and predefined environmental variables would be set providing meta information. The file could be basically any executable, written in any language. Since this was basically the first way anyone came up with of doing server-side scripting, a number of the first languages to integrate into a webserver used the cgi interface, Perl and PHP among them.
To eliminate the inefficiency you identified, the a second method was devised, which used plugins into the webserver itself... for Apache, this includes mod_perl for Perl, and mod_python for Python (the latter now replaced by mod_wsgi for Python). Using these plugins, you could configure the server to identify a program to load once per process, which then does the requisite initialization, loads it's persistent state into memory, and offers up a single function for the server to call whenever there is a request. This can lead to some extremely fast frameworks, as well as things such as easy database connection pooling.
The other solution that was devised was to write a web server (usually stripped down) in the language required, and then use the real webserver to act as a proxy for the complicated requests, while still serving static files directly. This route is also used frequently by Python (quite often via the server provided by the 'Paste' project). It's also used by Java, through the Tomcat webserver. These servers, in turn, offer approximately the same interface as I mentioned in the last paragraph.

The short answer is: in PHP there's no good way to skip the bootstrapping. (Technically you could run a PHP service 24/7 that ran forked children to handle requests, but that's not going to make your life any better.)
A good framework shouldn't do much in bootstrapping. In my personal one that I use, it simply registers an autoload function for classes, loads the config settings from MemCache, and connects to a database.
At that point, it parses the request and sends it to the proper controller / action. While creating the new router object every time is a "waste," the actual process of handling the request needs to be done regardless if the bootstrapping process is magically "cached" between requests.
So I would measure the time it takes between starting the page and getting to the action method to see if it's even a problem. If the framework is doing expensive things related to configuration and class loading, you should be able to minimize that via storing the end results in memcache.
Note that you should always be using an opcode cache (e.g. APC) and a persistent SAPI (e.g., php-fpm) in production. Otherwise, there is a lot of overhead with starting up and shutting down.

I would suggest you to look into FastCGI and C/C++ interface if you want to handle multiple requests. Usually it brings many problems (such as data caching / flushing, memory leaks etc), but can raise performance 10-100 times.
PHP is more suitable for web interface, and if you need fast-processing then you can write a persistent handler.
Also take a look at Java / Tomcat, Python and mod_perl. Some people have also suggested xcache.
For the PHP frameworks they do need to support a multi-request structure in the core, and I'm not aware of any framework doing that.
However said that, I'd love to have a project which would let PHP script to respond to multiple requests inside a loop. Not simultaneously, but bypassing the initialization.
Also you can take a look at https://github.com/kvz/system_daemon, and http://gearman.org/.

cache methods in php?

what are the available cache methods i could use in php ?
Cache HTML output
Cache some variables
it would be great to implement more than one caching method , so i need them all , all the available out there (i do caching currently with files , any other ideas ?)

Most PHP build don't have a caching mechanism built in. There are extensions though that can take care of caching for you.
Have a look at APC or MemCache

If you are using a framework, then most come with some form of caching mechanism that you can use e.g. Zend Framework's Zend_Cache.
If you are not using a framework then the APC or Memcache as Pelle ten Cate mentioned can be used. The correct approach to use does depend in your situation though, do you have your website or application running on more than server and does the information in the cache need to be shared between those servers? (if yes then something like memcache is your answer, or maybe a database or distributed NoSQL solution if you are feeling brave).
If you code is only running on the one server you could try something simple like serializing your variables, and writing them to disk, then on every request afterwards, see if the files exists, if it does, open it and unserialize the string into the variable you need.
This though is only worth it if it would take a long time to generate the varaible normally,
(e.g longer than it would to open,read,unserialize the file on disk)
For HTML caching you are generally going to get the most mileage from using a proxy like Varnish or Squid to do it for you but i realise that this may not be an option for you.
If its not then you could the write to disk approach i mentioned above, and save chunks of HTML to files. look in the PHP manual for ob_start and its friends.

Since every PHP run starts from scratch on page request, there is nothing that would persist between calls, making cacheing moot.
Well, that's the basic view. Of course there are ways to implement a caching, sort of - and a few packages and extensions do so (like Zend Extensions and APC). However, you should have a very close look whether it actually improves performance. Other methods like memcache (for DB results), or switching from PHP to e.g. Java will often yield better results.
You can store variables in the $_SESSION, but you shouldn't keep larger HTML there.
Please check what you are actually trying to do. "Bytecode cacheing" (that is, saving PHP parsing time) needs to be done by the PHP runtime executable. For cacheing Database (SQL) request/reply-pairs, there is memcache. Cacheing HTML output can be done, but is often not a good idea.
See also an earlier answer on a similar question.

Seriously speeding up PHP?

I've been writing PHP for years, and have used every framework under the sun, but one thing has always bugged me... and that's that the whole bloody thing has to be interpreted and executed every time someone tells my server they want the page served.
I've experimented with caching, FastCGI, the Zend Job Queue (and symfony plug-ins that do similar - as well as my own DB-based solutions that implement the System_Daemon class to run background processes) and I've managed to make my apps fairly quick using all that stuff... but I can't get over the mental block that my settings files, system/environment check functions, and all the stuff that should only really be loaded ONCE... loads every darn time someone hits my page.
So, my ramble leads to the following Q--
Is there some method/technique for loading certain aspects of PHP into RAM so that when that page is requested, all my settings.yml files, system checks, framework files, cached pages etc can be loaded directly from memory without ever even touching the HD... or needing to go through the same loading mechanism 50,000 times per day to init the program?
If there's nothing in PHP... are there any other 'web' languages that can be compiled in this way, to allow for true init-once apps?

I think you should give memcached a try, if you're talking about caching data. I think PHP is fairly proficient in caching compiled php-pages if you use stuff like mod_php in apache (which doesn't die in between requests).

Take a look on APC (Alternative PHP Cache), it keeps a cache of compiled files (PHP Opcode) and also lets you store random variables on memory with apc_fetch, apc_store.
The instalation is very simple and it really gives a boost on performance.

Create a full page cache on the ram disk and make your web server serve the page from there. This is a method that wordpress supercache plugin uses and it works great if your web site is suitable for full page caching. This whay you are not even invoking the PHP interpreter.
For users that are logged in (have an open session) you can create a rewrite condition that will redirect their request to the PHP engine.
Also, always use an opcode cache like APC and use it for caching config files (memcache is also fine).

If you are asking for a JVM/Tomcat like application server, then the answer is likely no. To my knowledge nothing (usable) like this exists for PHP. PHP uses a shared-nothing architecture, so it is by design everything is setup on all requests. But actually, this makes PHP scale pretty well.
As for speeding up your apps, try to use memcached and a code accelerator. Maybe look into Zend Server to get a complete package.

Regarding your last question, I believe at least most of the Python and Ruby web frameworks work like that.
Ruby web applications are nowadays built so that the app is only initialized once per server process. When requests come in, the server (Apache, for example) passes them to the web application (over Rack interface) which is running on the background.
This is how web frameworks based on Rack work. Older versions of Ruby on Rails were similar, although they used a different interface to talk to the web server.

I'd keep an eye on the Facebook Engineering ppage (http://www.facebook.com/notes.php?id=9445547199), every now and then they come up with posts about how they keep things fast/optimize/scale. I think they're use of php is super impressive.

PHP vs. long-running process (Python, Java, etc.)?

I'd like to have your opinion about writing web apps in PHP vs. a long-running process using tools such as Django or Turbogears for Python.
As far as I know:
- In PHP, pages are fetched from the hard-disk every time (although I assume the OS keeps files in RAM for a while after they've been accessed)
- Pages are recompiled into opcode every time (although tools from eg. Zend can keep a compiled version in RAM)
- Fetching pages every time means reading global and session data every time, and re-opening connections to the DB
So, I guess PHP makes sense on a shared server (multiple sites sharing the same host) to run apps with moderate use, while a long-running process offers higher performance with apps that run on a dedicated server and are under heavy use?
Thanks for any feedback.

After you apply memcache, opcode caching, and connection pooling, the only real difference between PHP and other options is that PHP is short-lived, processed based, while other options are, typically, long-lived multithreaded based.
The advantage PHP has is that its dirt simple to write scripts. You don't have to worry about memory management (its always released at the end of the request), and you don't have to worry about concurrency very much.
The major disadvantage, I can see anyways, is that some more advanced (sometimes crazier?) things are harder: pre-computing results, warming caches, reusing existing data, request prioritizing, and asynchronous programming. I'm sure people can think of many more.
Most of the time, though, those disadvantages aren't a big deal. You can scale by adding more machines and using more caching. The average web developer doesn't need to worry about concurrency control or memory management, so taking the minuscule hit from removing them isn't a big deal.

With APC, which is soon to be included by default in PHP compiled bytecode is kept in RAM.
With mod_php, which is the most popular way to use PHP, the PHP interpreter stays in web server's memory.
With APC data store or memcache, you can have persistent objects in RAM instead of for example always creating them all anew by fetching data from DB.
In real life deployment you'd use all of above.

PHP is fine for either use in my opinion, the performance overheads are rarely noticed. It's usually other processes which will delay the program. It's easy to cache PHP programs with something like eAccelerator.

As many others have noted, PHP nor Django are going to be your bottlenecks. Hitting the hard disk for the bytecode on PHP is irrelevant for a heavily trafficked site because caching will take over at that point. The same is true for Django.
Model/View and user experience design will have order of magnitude benefits to performance over the language itself.

PHP is a language like Java etc.
Only your executable is the php binary and not the JVM! You can set another MAX-Runtime for PHP-Scripts without any problems (if your shared hosting provider let you do so).
Where your apps are running shouldn't depend on the kind of the server. It should depend on the ressources used by the application (CPU-Time,RAM) and what is given by your Server/Vserver/Shared Host!
For performance tuning reasons you should have a look at eAccelerator etc.
Apache supports also modules for connection pooling! See mod_dbd.
If you need to scale (like in a cluster) you can use distributed memory caching systems like memcached!

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.