Is it possible not to load the bootstrapping mechanism at every call? - php

This is not a PHP question, but my expertise is with PHP frameworks.
A lot of frameworks have a bootstrapping (loading of classes and files) mechanism. (Drupal, Zend Framework to name a few)
Everytime that you make a request, the complete bootloading process needs to be repeated. And it can be optimized using APC by automatically caching some intermediate code
The general question is:
For any language, is there any way to not load the complete bootstrapping process? Is there any way of "caching" the state (or starting at) at the end of the bootstraping process to not load everything again? (maybe the answer is in some other language/framework/pattern)
It looks to me as extremely inefficient.

In general, it's quite possible to perform bootstrap / init code once per process, instead of having to reload it for every request. In your specific case, I don't think this is possible with PHP (but my knowledge of PHP is limited). I know I have seen this as a frequently criticism of PHP's architecture... but to be fair to PHP, it's not the only language or framework that does things this way. To go into some detail...
The style of "run everything for every request" came about with "CGI" scripts (c.f. Common Gateway Interface), which were essentially just programs that got executed as a separate process by the webserver whenever a request came in matching the file, and predefined environmental variables would be set providing meta information. The file could be basically any executable, written in any language. Since this was basically the first way anyone came up with of doing server-side scripting, a number of the first languages to integrate into a webserver used the cgi interface, Perl and PHP among them.
To eliminate the inefficiency you identified, the a second method was devised, which used plugins into the webserver itself... for Apache, this includes mod_perl for Perl, and mod_python for Python (the latter now replaced by mod_wsgi for Python). Using these plugins, you could configure the server to identify a program to load once per process, which then does the requisite initialization, loads it's persistent state into memory, and offers up a single function for the server to call whenever there is a request. This can lead to some extremely fast frameworks, as well as things such as easy database connection pooling.
The other solution that was devised was to write a web server (usually stripped down) in the language required, and then use the real webserver to act as a proxy for the complicated requests, while still serving static files directly. This route is also used frequently by Python (quite often via the server provided by the 'Paste' project). It's also used by Java, through the Tomcat webserver. These servers, in turn, offer approximately the same interface as I mentioned in the last paragraph.

The short answer is: in PHP there's no good way to skip the bootstrapping. (Technically you could run a PHP service 24/7 that ran forked children to handle requests, but that's not going to make your life any better.)
A good framework shouldn't do much in bootstrapping. In my personal one that I use, it simply registers an autoload function for classes, loads the config settings from MemCache, and connects to a database.
At that point, it parses the request and sends it to the proper controller / action. While creating the new router object every time is a "waste," the actual process of handling the request needs to be done regardless if the bootstrapping process is magically "cached" between requests.
So I would measure the time it takes between starting the page and getting to the action method to see if it's even a problem. If the framework is doing expensive things related to configuration and class loading, you should be able to minimize that via storing the end results in memcache.
Note that you should always be using an opcode cache (e.g. APC) and a persistent SAPI (e.g., php-fpm) in production. Otherwise, there is a lot of overhead with starting up and shutting down.

I would suggest you to look into FastCGI and C/C++ interface if you want to handle multiple requests. Usually it brings many problems (such as data caching / flushing, memory leaks etc), but can raise performance 10-100 times.
PHP is more suitable for web interface, and if you need fast-processing then you can write a persistent handler.
Also take a look at Java / Tomcat, Python and mod_perl. Some people have also suggested xcache.
For the PHP frameworks they do need to support a multi-request structure in the core, and I'm not aware of any framework doing that.
However said that, I'd love to have a project which would let PHP script to respond to multiple requests inside a loop. Not simultaneously, but bypassing the initialization.
Also you can take a look at https://github.com/kvz/system_daemon, and http://gearman.org/.

Related

How does PHP works and what is its architecture ?

Guys recently I decided to go back to PHP and do some more complex stuff than a simple log in page. For 3 years I've been programming with Java/JavaEE and have a good understanding of the architecture of of Java Applications. Basically, a virtual machine ( a simple OS process ) that runs compiled code called byte code. a simple Java web server is basically a java application that listens on provided TCP port for Http requests and responds accordingly of course it is more complicated than that but this is its initial work.
Now, what about PHP ? How does it work ? What, in a nutshell, is its architecture.
I googled about this question but in 90% the articles explain how to implement and construct a web application with PHP which is not what I am looking for.
The biggest difference between a Java web server and PHP is that PHP doesn't have its own built-in web server. (Well, newer versions do, but it's supposed to be for testing only, it's not a production ready web server.) PHP itself is basically one executable which reads in a source code file of PHP code and interprets/executes the commands written in that file. That's it. That's PHP's architecture in a nutshell.
That executable supports a default API which the userland PHP code can call, and it's possible to add extensions to provide more APIs. Those extensions are typically written in C and compiled together with the PHP executable at install time. Some extensions can only be added by recompiling PHP with additional flags, others can be compiled against a PHP install and activated via a configuration file after the fact. PHP offers the PEAR and PECL side projects as an effort to standardise and ease such after-the-fact installs. Userland PHP code will often also include additional third party libraries simply written in PHP code. The advantage of C extensions is their execution speed and low-level system access, the advantage of userland code libraries is their trivial inclusion. If you're administering your own PHP install, it's often simple enough to add new PHP extensions; however on the very popular shared-host model there's often a tension between what the host wants to install and what the developer needs.
In practice a web service written in PHP runs on a third party web server, very often Apache, which handles any incoming requests and invokes the PHP interpreter with the given requested PHP source code file as argument, then delivers any output of that process back to the HTTP client. This also means there's no persistent PHP process running at all times with a persistent state, like Java typically does, but each request is handled by starting up and then tearing down a new PHP instance.
While Java simply saves persistent data in memory, data persistence between requests in PHP is handled via a number of methods like memcache, sessions, databases, files etc.; depending on the specific needs of the situation. PHP does have opcode cache addons, which kind of work like Java byte code, simply so PHP doesn't have to repeat the same parse and compile process every single time it's executing the same file.
Do keep in mind that it's entirely feasible to write a persistent PHP program which keeps running just like Java, it's simply not PHP's default modus operandi. Personally I'm quite a fan of writing workers for specific tasks on Gearman or ZMQ which run persistently, and have some ephemeral scripts running on the web server as "frontend" which delegate work to those workers as needed.
If this sounds like a typical PHP app is much more of a glued-together accumulation of several disparate components, you'd be correct. Java is pretty self-contained, except for external products like RDBMS servers. PHP on the other hand often tends to rely on a bunch of third party products; which can work to its advantage in the sense that you can use best-of-breed products for specific tasks, but also requires more overhead of dealing with different systems.
This is how does PHP work:
(one of the best over the Internet)
In general terms, PHP as an engine interprets the content of PHP files (typically *.php, although alternative extensions are used occasionally) into an abstract syntax tree. The PHP engine then processes the translated AST and then returns the result given whatever inputs and processing are required.
Below image will depict more information
Source: freecodecamp.org

Loading a process to run a webrequest

When I was in my previous workplace which was a search engine company, I noted that they had executable files built using C++ , which were invoked with command line parameters by cgi script for serving each webrequest. (eg., when user hits a search button)
I could not understand the complete big picture, but was surprised that how much overhead would be there in launching a new process for each user request, since OS loader has to map the process space, etc. (it was unix solaris)
Is it an obsolete technology, or am I missing something ? (eg., if a process launching can be optimized by creating permanent mapping and they would have done that). Or are there better alternatives to run C++ code for a webrequest?
Solaris is probably well optimized for that usage. Yes, it has to initialize the process memory, but it can probably reuse a lot of the work and only a few kilobytes really need to be copied.
The only alternative to loading a process per request is to allow extensibility within the server process. This can affect stability, place restrictions on the server extension, and place additional demands on the programmer. The performance benefit might just not be worth it.
If the performance benefit is worth it, then you can rewrite the application as an extension/module/servlet/whatever.

How to persist objects between requests in PHP

I've been using rails, merb, django and asp.net mvc applications in the past. What they have common (that is relevant to the question) is that they have code that sets up the framework. This usually means creating objects and state that is persisted until the web server is recycled (like setting up routing, or checking which controllers are available, etc).
As far as I know PHP is more like a CGI script that gets compiled to some bytecode each time it's run, and after the request it's discarded. Of course you can have sessions, to persist data between requests from the same user, and as I see there are extensions like APC, with which you can persist objects between requests at the server level.
My question is: how can one create a PHP application that works like rails and such? I mean an application that on the first requests sets up the framework, then on the 2nd and later requests use the objects that are already set up. Is there some built in caching facility in mod_php? (for example that stores the compiled bytecode of the executed php applications) Or is using APC or some similar extensions the only way to solve this problem? How would you do it?
Thanks.
EDIT: Alternative question: if I create a large PHP application that has a very large set up time, but minor running time (like in the frameworks mentioned above) then how should I "cache" the things that are already set up (this might mean a lot of things, except for maybe the database connections, because for that you have persistent connections in PHP already).
To justify large set up time: what if I'm using PHP reflection to check what objects are available and set the runtime according to that. Doing a lot of reflection is usually slow, but one has to do it only once (and re-evaluate only if the source code is modified).
EDIT2: It seems it's APC then. The fact that it caches bytecode automatically is good to know.
Not sure if APC is the only solution but APC does take care of all your issues.
First, your script will be compiled once with APC and the bytecode is stored in memory.
If you have something taking long time to setup, you can also cache it in APC as user data. For example, I do this all the time,
$table = #apc_fetch(TABLE_KEY);
if (!$table) {
$table = new Table(); // Take long time
apc_store(TABLE_KEY, $table);
}
With APC, the task of creating table is only performed once per server instance.
PHP (and ruby for that matter) are interpretive languages. That is they parse the files each time they are requested and I suppose you could say are converted to a pseudo byte code. It is more 'apparent' one could say that PHP is more like this than say RoR but they both behave the same way.
The feature of persisting data between requests is a feature of the server not of the language itself. For example, the RoR routing you speak of is in fact cached but that's cached in the server's local memory. It isn't compiled and stored for faster readins. The server (and by server I mean both the box & the web service instances) restarts this information is gone. The 'setting up the framework' you speak of still involves parsing EACH file involved in the framework. Rails parses each file during the request again and again, the production level features may in fact cache this data in memory but certainly in development it does not. The only reason I mention that is because it illustrates that it's a feature of the server not the language.
To achieve the same thing in PHP you could use Zend Server. As far as I know this is the only PHP interpreter that will 'compile' and use byte code when told to. Otherwise you'll need to find a way to store the data you want to persist over requests. APC as you mentioned is a very powerful feature, a more distributed one is Memcached and then of course there's more persistent forms like disc & sql.
I am interested in knowing why you'd like this particular feature. Are you noticing performance issues that would be 'solved' by doing this?
I think you're making some incorrect generalizations. All of those frameworks (ex: Rails) can be run with different configurations. Under some, a process is created for every request. This obviously hurts performance, but it shows that these frameworks don't rely on a long-running process. They can set things up (reparse config files, create objects, etc.) every request if needed.
Of course, mod_php (the way PHP is usually used) runs inside the web server process, unlike CGI. So I don't see anything fundamentally different between CakePHP (for example) and Rails.
I think perhaps you are looking for something like Python's WSGI or Ruby's Rack, but for PHP. This specifies an interface (independent of how the language is run) for an application. For a new request, a new instance of an application object is created. As far as I know, this does not exist for PHP.

Seriously speeding up PHP?

I've been writing PHP for years, and have used every framework under the sun, but one thing has always bugged me... and that's that the whole bloody thing has to be interpreted and executed every time someone tells my server they want the page served.
I've experimented with caching, FastCGI, the Zend Job Queue (and symfony plug-ins that do similar - as well as my own DB-based solutions that implement the System_Daemon class to run background processes) and I've managed to make my apps fairly quick using all that stuff... but I can't get over the mental block that my settings files, system/environment check functions, and all the stuff that should only really be loaded ONCE... loads every darn time someone hits my page.
So, my ramble leads to the following Q--
Is there some method/technique for loading certain aspects of PHP into RAM so that when that page is requested, all my settings.yml files, system checks, framework files, cached pages etc can be loaded directly from memory without ever even touching the HD... or needing to go through the same loading mechanism 50,000 times per day to init the program?
If there's nothing in PHP... are there any other 'web' languages that can be compiled in this way, to allow for true init-once apps?
I think you should give memcached a try, if you're talking about caching data. I think PHP is fairly proficient in caching compiled php-pages if you use stuff like mod_php in apache (which doesn't die in between requests).
Take a look on APC (Alternative PHP Cache), it keeps a cache of compiled files (PHP Opcode) and also lets you store random variables on memory with apc_fetch, apc_store.
The instalation is very simple and it really gives a boost on performance.
Create a full page cache on the ram disk and make your web server serve the page from there. This is a method that wordpress supercache plugin uses and it works great if your web site is suitable for full page caching. This whay you are not even invoking the PHP interpreter.
For users that are logged in (have an open session) you can create a rewrite condition that will redirect their request to the PHP engine.
Also, always use an opcode cache like APC and use it for caching config files (memcache is also fine).
If you are asking for a JVM/Tomcat like application server, then the answer is likely no. To my knowledge nothing (usable) like this exists for PHP. PHP uses a shared-nothing architecture, so it is by design everything is setup on all requests. But actually, this makes PHP scale pretty well.
As for speeding up your apps, try to use memcached and a code accelerator. Maybe look into Zend Server to get a complete package.
Regarding your last question, I believe at least most of the Python and Ruby web frameworks work like that.
Ruby web applications are nowadays built so that the app is only initialized once per server process. When requests come in, the server (Apache, for example) passes them to the web application (over Rack interface) which is running on the background.
This is how web frameworks based on Rack work. Older versions of Ruby on Rails were similar, although they used a different interface to talk to the web server.
I'd keep an eye on the Facebook Engineering ppage (http://www.facebook.com/notes.php?id=9445547199), every now and then they come up with posts about how they keep things fast/optimize/scale. I think they're use of php is super impressive.

In need to program an algorithem to be very fast, should I do it as php extension, or some otherway?

Most of my application is written in PHP ((Front and Back ends).
There is a part that works too slowly and I will need to rewrite it, probably not in PHP.
What will give me the following:
1. Most speed
2. Fastest development
3. Easily maintained.
I have in my mind to rewrite this piece of code in CPP as a PHP extension, but may be I am locked on this solution and misses some simpler/better solutions?
The algorithm is PorterStemmerAlgorithm on several MB of data each time it is run.
The answer really depends on what kind of process it is.
If it is a long running process (at least seconds) then perhaps an external program written in C++ would be super easy. It would not have the complexities of a PHP extension and it's stability would not affect PHP/apache. You could communicate over pipes, shared memory, or the sort...
If it is a short running process (measured in ms) then you will most likely need to write a PHP extension. That would allow it to be invoked VERY fast with almost no per-call overhead.
Another possibility is a custom server which listens on a Unix Domain Socket and will quickly respond to PHP when PHP asks for information. Then your per-call overhead is basically creating a socket (not bad). The server could be in any language (c, c++, python, erlang, etc...), and the client could be a 50 line PHP class that uses the socket_*() functions.
A lot of information needs evaluated before making this decision. PHP does not typically show slowdowns until you get into really tight loops or thousands of repeated function calls. In other words, the overhead of the HTTP request and network delays usually make PHP delays insignificant (unless the above applies)
Perhaps there is a better way to write it in PHP?
Are you database bound?
Is it CPU bound, Network bound, or IO bound?
Can the result be cached?
Does a library already exist which will do the heavy lifting.
By committing to a custom PHP extension, you add significantly to the base of knowledge required to maintain it (even above C++). But it is a great option when necessary.
Feel free to update your question with more details, and I'm sure Stack Overflow will be happy to help out.
Suggestion
The PorterStemmerAlgorithm has a C implementation available at http://tartarus.org/~martin/PorterStemmer/c.txt
It should be an easy matter to tie this C program into your data sources and make it a stand alone executable. Then you could simply invoke it from PHP with one of the proc functions, such as proc_open()
Unless you need to invoke this program many times PER php request, then this approach should save you the effort of building and integrating a PHP extension, not to mention that the hard work (in c) is already done.
Am not sure about what the PorterStemmerAlgorithm is. However if you could make your process run in parallel and collect the information together , you could look at parallel running processes easily implemented in JAVA. Not sure how you could call it in PHP, but definitely maintainable.
You can have a look at this framework. Looks simple to implement
https://computefarm.dev.java.net/
Regards,
Franklin.
If you absolutely need to rewrite in a different language for speed reasons then I think gahooa's answer covers the options nicely. However, before you do, are you absolutely sure you've done everything you can to improve the performance if the PHP implementation?
Is caching the output viable in your situation? Could you get away with running the algorithm once and caching the output rather than on every page load?
Have you tried profiling the code to ensure there's no unnecessary work being done (db queries in an inner loop and the like). Xdebug can help here.
Are there other stemming algorithms available which might perform better on your dataset?

Categories