I'm looking into methods of reducing maintenance downtime for a certain application.
One thing I could do is make use of anonymous functions in a way that allows me to re-include a file with a new function definition at runtime. That way the application's behaviour can be patched while it runs.
I'm wondering if there are any frameworks out there that would help in implementing this behaviour on a large scale throughout an application.
I'm thinking it could help with things like keeping track of which functions are loaded from which files, when/how often to reload them and similar tasks.
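A minimal sketch of the idea, with names of my own invention (no framework implied): a definition file returns an array of closures, and a helper re-includes it whenever the file's mtime changes.

```php
<?php
// Hypothetical helper: $file is expected to return an array of closures.
// Re-including it when its mtime changes swaps in new behaviour at
// runtime, without restarting the application.
function reload_if_changed($file, array &$handlers, &$mtime)
{
    clearstatcache(false, $file);   // make filemtime() see fresh data
    $current = filemtime($file);
    if ($current !== $mtime) {
        $handlers = include $file;  // the file returns the new closures
        $mtime    = $current;
    }
}
```

A framework built around this would mostly add bookkeeping: which handler came from which file, and a policy for how often to stat the files.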
I'll leave this open for now, but since I can't find anything at this time, I've started my own project: PHP - Reloaded
When inheriting the maintenance programming of a large web application that has many php5 scripts (4000+), what is the best way to determine which scripts are no longer in use and can probably be removed?
There is a project Dead Code Detector (DCD).
It finds functions that are never called, which may help to clean up even the files you keep in production.
For finding unused files, no firm software/algorithm exists for PHP (at least to my knowledge); I think that part has to be done manually.
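Until something firm exists, a crude static pass can at least narrow the manual work. The sketch below is my own (not part of DCD): it flags files whose basename never appears in any other file's source. Dynamic includes (e.g. `include $path;`) are invisible to it, so the output is only a candidate list for review.

```php
<?php
// Given a map of path => source code, report files whose basename is
// never mentioned by any *other* file. Dynamic includes slip through,
// so treat hits as candidates for removal, not certainties.
function find_unreferenced(array $sources)
{
    $candidates = array();
    foreach ($sources as $path => $ignored) {
        $name = basename($path);
        $referenced = false;
        foreach ($sources as $other => $code) {
            if ($other !== $path && strpos($code, $name) !== false) {
                $referenced = true;
                break;
            }
        }
        if (!$referenced) {
            $candidates[] = $path;
        }
    }
    return $candidates;
}
```

Note that entry points (index.php and friends) will also show up, since nothing includes them; cross-checking against access logs filters those out.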
From my point of view, both PHP and Java have a similar structure. First you write some high-level code, which must then be translated into a simpler format to be executed by a VM. One difference is that PHP works directly from the source files, while Java stores the bytecode in .class files, from which the VM can load it.
Nowadays the requirements for speedy PHP execution are growing, which leads people to believe it would be better to work directly with the opcodes instead of going through the compilation step each time a user hits a file.
The solution seems to be a host of so-called accelerators, which basically store the compiled results in a cache and then use the cached opcodes instead of compiling again.
Another approach, done by Facebook, is to completely compile the PHP code to a different language.
So my question is: why is nobody in the PHP world doing what Java does? Are there some dynamic elements that really need to be recompiled each time, or something like that? Otherwise it would surely be smarter to compile everything when the code goes into production and then just work with that.
The most important difference is that the JVM has an explicit specification that covers the bytecode completely. That makes bytecode files portable and useful for more than just execution by a specific JVM implementation.
PHP doesn't even have a language specification. PHP opcodes are an implementation detail of a specific PHP engine, so you can't really do anything interesting with them and there's little point in making them more visible.
PHP opcodes are not the same as Java classfiles. Java classfiles are well specified, and are portable between machines. PHP opcodes are not portable in any way. They have memory addresses baked into them, for example. They are strictly an implementation detail of the PHP interpreter, and shouldn't be considered anything like Java bytecode.
Does it have to be this way? No, probably not. But the PHP source code is a mess, and there is neither the desire, nor the political will in the PHP internals community to make this happen. I think there was talk of baking an opcode cache into PHP 6, but PHP 6 died, and I don't know the status of that idea.
Reference: I wrote phc so I was pretty knee deep in PHP implementation/compilation for a few years.
It's not quite true that nobody in the PHP world is doing what Java does. Projects such as Alexey Zakhlestin's appserver provide a degree of persistence more akin to a Java servlet container (though his inspiration is more Ruby's Rack and Python's WSGI than Java).
PHP does not use a standard mechanism for opcodes. I wish it had stuck either to a stack VM (Python, Java) or a register VM (x86, Perl 6, etc.). But it uses something absolutely homegrown, and therein lies the rub.
It uses a linked structure in memory, which results in each opcode having a ->op1, ->op2 and ->result. Each of those is either a constant or an entry in a temp table, etc. These pointers cannot be serialized in any sane fashion.
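To make that shape concrete, here is a toy three-address representation in plain PHP arrays (nothing like the real Zend structures, which hold raw pointers): `$c = $a + $b` becomes an ADD into a temporary plus an ASSIGN out of it.

```php
<?php
// Toy "VM" over three-address oplines: op1/op2/result name either
// variables ($x) or temp-table slots (~n). In the real engine these
// operands are raw pointers, which is why they don't serialize.
function run_oplines(array $oplines, array $vars)
{
    $tmp = array();
    $get = function ($ref) use (&$vars, &$tmp) {
        return $ref[0] === '~' ? $tmp[$ref] : $vars[$ref];
    };
    foreach ($oplines as $op) {
        switch ($op['op']) {
            case 'ADD':
                $tmp[$op['result']] = $get($op['op1']) + $get($op['op2']);
                break;
            case 'ASSIGN':
                $vars[$op['op1']] = $get($op['op2']);
                break;
        }
    }
    return $vars;
}

// $c = $a + $b compiled into two oplines:
$oplines = array(
    array('op' => 'ADD',    'op1' => '$a', 'op2' => '$b', 'result' => '~0'),
    array('op' => 'ASSIGN', 'op1' => '$c', 'op2' => '~0', 'result' => null),
);
```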
Now, people have accomplished this using items like pecl/bcompiler which does dump the stream into the disk.
But the classes make this even more complicated, which means that there are potential code fragments like
if ($condition)
{
    class XYZ { }
}
else
{
    class XYZ { }
}
class ABC extends XYZ { }
Which means that a large number of decisions about classes & functions can only be made at runtime - something like Java would choke on two classes with the same name defined conditionally at runtime. Basically, APC's inheritance & class caching code is perhaps the most complicated & bug-prone part of the codebase. Whenever a class is cached, all inherited parent members have to be scrubbed out before it can be saved to the opcode cache.
The pointer problem is not insurmountable. There is an apc_bindump which I have never bothered to fix up so that it loads entire cache entries off disk directly whenever a restart is done. But it's painful to debug all that to get something that still needs to relocate all system pointers - the Apache case is easy, because all PHP processes share the same system pointers thanks to the fork behaviour. The old FastCGI versions were slower because they used to fork first & init PHP later - php-fpm fixed that by doing it the other way around.
But eventually, what's really missing in PHP is the will to invent a bytecode format and throw away the current engine & all modules - to rewrite it using a stack VM & build a JIT. I wish I had the time - the fb guys are almost there with their HipHop HHVM, which sacrifices eval() for faster performance - a fair sacrifice :)
PS: I'm the guy who can't find time to update APC for 5.4 properly
I think you are all misinformed. HHVM is not a compiler to another language; it is a virtual machine itself. The confusion arises because Facebook used to compile to C++, but that approach was too slow for the developers' requirements (ten minutes of compiling just to test some tiny change).
I came across this article, How to implement a front controller. The article suggests that a better way to load controllers is to leave it to apache as this is what it was designed for.
So I have a few questions...
Is using .htaccess a viable alternative to using PHP for routing requests to controllers?
Which way is better: faster, more modular, more portable?
Has anyone actually implemented an MVC framework in this way?
If so, got any tips?
Does anyone know of any websites that discuss this technique (I couldn't find anything on Google)?
The primary objection in that article to using a single entry point seems to be:
...what about when you have hundreds of Page Controllers? You end up with a massive switch statement or perhaps something disguised in an array, an XML document or whatever. For every page request, PHP will have to reload a bunch of data which is irrelevant to the current request the user is trying to perform.
That's a very weak argument. Firstly, that's a terrible way to implement a routing mechanism. Secondly, an application would have to be considerably complex for this to have any measurable effect - and if an application is this complex, it's likely that any performance hit at the entry point is minimal compared to the execution of the rest of the application.
And consider: if a PHP script for handling the front end of a complex web app is hard to maintain, imagine what the equivalent .htaccess file would look like!
Finally, you can avoid the issue with a bytecode cache, making the "problem" of loading the script for every request moot.
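For contrast, a lean single-entry-point router needs neither a massive switch statement nor per-request reloading of irrelevant data. A sketch (hypothetical, not from the article):

```php
<?php
// The route table is just an array of path => closure, so dispatch is
// a single array lookup per request, however many controllers exist.
function dispatch(array $routes, $path)
{
    if (!isset($routes[$path])) {
        return '404 Not Found';
    }
    return call_user_func($routes[$path]);
}

$routes = array(
    '/'      => function () { return 'home'; },
    '/about' => function () { return 'about'; },
);
```

A real application would load the table from a config file and defer including each controller until its route matches, so nothing irrelevant is ever loaded.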
Maybe it's the beer, but that article made little sense to me; it also put a lot of "words" in quotes. I disagree with some of the things mentioned in there. It does say "...that this approach to implementing a Front Controller in PHP does alot (sic) to raise the learning curve required to become fluent with the framework." Sure, I suppose that's true, but when has any powerful, flexible, large system not required a bit of learning?
In regards to your questions:
.htaccess could be a somewhat viable alternative to using PHP, but it is much less extensible and quickly becomes complicated and hard to manage. You can do URL configuration like this in Apache, lighttpd and nginx, and I've seen it done on occasion, but for some people doing it this way would mean a big learning curve.
If you use PHP to do the routing, it can get its route information from config files, arrays, or even injected via an object. This gives you much flexibility and makes it possible to include or exclude routes depending on many factors.
Using the server config file to configure URL routing might be somewhat faster, but the difference is minute. Server config would be far less modular, and not portable across different HTTP servers. The native language front controller works on any server platform.
I've not seen an MVC framework that does that, but I haven't investigated many outside of PHP.
Can't help on this one either.
Personally, in PHP I use Zend Framework a lot. It uses a front controller pattern that routes everything through one script. I've not hit any limitations with this method; it provides everything I need and more.
Those are my thoughts, hope it helps.
This is not a PHP question, but my expertise is with PHP frameworks.
A lot of frameworks have a bootstrapping mechanism (the loading of classes and files); Drupal and Zend Framework, to name a few.
Every time you make a request, the complete bootstrapping process has to be repeated. It can be optimized using APC, which automatically caches some intermediate code.
The general question is:
For any language, is there a way to avoid repeating the complete bootstrapping process? Is there any way of "caching" the state at the end of the bootstrapping process (or starting from it) so as not to load everything again? (Maybe the answer lies in some other language/framework/pattern.)
It looks extremely inefficient to me.
In general, it's quite possible to perform bootstrap/init code once per process, instead of having to reload it for every request. In your specific case, I don't think this is possible with PHP (but my knowledge of PHP is limited). I know I have seen this as a frequent criticism of PHP's architecture... but to be fair to PHP, it's not the only language or framework that does things this way. To go into some detail...
The style of "run everything for every request" came about with "CGI" scripts (c.f. Common Gateway Interface), which were essentially just programs that got executed as a separate process by the webserver whenever a request came in matching the file, and predefined environmental variables would be set providing meta information. The file could be basically any executable, written in any language. Since this was basically the first way anyone came up with of doing server-side scripting, a number of the first languages to integrate into a webserver used the cgi interface, Perl and PHP among them.
To eliminate the inefficiency you identified, a second method was devised, which used plugins in the webserver itself... for Apache, this includes mod_perl for Perl and mod_python for Python (the latter now replaced by mod_wsgi). Using these plugins, you can configure the server to load a program once per process, which then does the requisite initialization, loads its persistent state into memory, and offers up a single function for the server to call whenever there is a request. This can lead to some extremely fast frameworks, as well as things such as easy database connection pooling.
The other solution that was devised was to write a web server (usually stripped down) in the language required, and then use the real webserver to act as a proxy for the complicated requests, while still serving static files directly. This route is also used frequently by Python (quite often via the server provided by the 'Paste' project). It's also used by Java, through the Tomcat webserver. These servers, in turn, offer approximately the same interface as I mentioned in the last paragraph.
The short answer is: in PHP there's no good way to skip the bootstrapping. (Technically you could run a PHP service 24/7 that ran forked children to handle requests, but that's not going to make your life any better.)
A good framework shouldn't do much in bootstrapping. In my personal one that I use, it simply registers an autoload function for classes, loads the config settings from MemCache, and connects to a database.
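A bootstrap that lean can be kept testably small. A sketch with hypothetical names, where the cache is anything exposing get()/set() (a Memcached instance, say) and the expensive config parsing runs only on a cache miss:

```php
<?php
// Register a class autoloader, then fetch config from a cache,
// falling back to the expensive loader only on a miss.
spl_autoload_register(function ($class) {
    $file = __DIR__ . '/lib/' . str_replace('\\', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
    }
});

function load_config($cache, $loader)
{
    $config = $cache->get('app_config');
    if ($config === false) {              // miss: do the expensive work
        $config = call_user_func($loader);
        $cache->set('app_config', $config, 300);
    }
    return $config;
}
```

With the parsed config living in the cache, the per-request cost of bootstrapping drops to one cache round trip plus the database connect.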
At that point, it parses the request and sends it to the proper controller / action. While creating the new router object every time is a "waste," the actual process of handling the request needs to be done regardless if the bootstrapping process is magically "cached" between requests.
So I would measure the time it takes between starting the page and getting to the action method to see if it's even a problem. If the framework is doing expensive things related to configuration and class loading, you should be able to minimize that via storing the end results in memcache.
Note that you should always be using an opcode cache (e.g. APC) and a persistent SAPI (e.g., php-fpm) in production. Otherwise, there is a lot of overhead with starting up and shutting down.
I would suggest looking into FastCGI and a C/C++ interface if you want one process to handle multiple requests. It usually brings many problems (such as data caching/flushing, memory leaks, etc.), but can raise performance 10-100 times.
PHP is more suitable for the web interface, and if you need fast processing behind it you can write a persistent handler.
Also take a look at Java / Tomcat, Python and mod_perl. Some people have also suggested xcache.
PHP frameworks would need to support a multi-request structure in the core, and I'm not aware of any framework doing that.
Having said that, I'd love to see a project that would let a PHP script respond to multiple requests inside a loop - not simultaneously, but bypassing the initialization.
Also you can take a look at https://github.com/kvz/system_daemon, and http://gearman.org/.
I can imagine that in larger projects some things tend to get redundant in most PHP scripts. Off the top of my head: including classes, authentication, including a configuration file, setting the include path, etc.
As far as I can tell, this has to be done in absolutely every PHP script in the project. It could then be simplified by adding a "core" PHP script that handles all of it.
However, from this very site, I can quote
"I am planning on creating a PHP file "core.php" that will be included at the top of EVERY SINGLE PHP file in the project. This file will handle authentication and include base functions. Thoughts?"
I cannot stress enough: do not do this. There is a rule of thumb among experienced PHP developers that a large core.php file is a warning sign of bad development, and such projects are best avoided.
Source
Which leaves me at a loss. Is it better to redundantly write the same 20-30 lines of code at the top of every file than to embrace DRY coding?
Any clarification would be appreciated!
I'll quickly clarify here. The "Front Controller pattern" which I actually use when writing most websites and applications does not really fit the type of project I'm talking about. Well actually it does, and I already intend to use it, but my project also contains a lot of PHP scripts that should return content for Ajax requests. It is those PHP scripts that my question regards.
I recommend taking the same approach as Wordpress, the Front Controller pattern.
Basically it filters all incoming page requests through index.php. Open up .htaccess and you can see that it filters all requests through index.php unless the file or directory already exists. This allows you to parse the URL into sections in any syntax you would like. No need to make different files for different URLs. You can receive example.com/page/1 and map the section at the end to any page.
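The .htaccess involved is short; WordPress's stock rules look essentially like this (anything that is not an existing file or directory goes to index.php):

```apache
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```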
Kohana is a great library for trying to understand and master this concept. It lets you extend classes and takes advantage of tons of PHP 5 features. As an added bonus, Kohana is MVC (also HMVC), which is incredibly important for large sites.
I do not believe that answer was against DRY, even though it does not make it easy to see. The author did suggest using an established framework, which most certainly takes care of initialization and common application backend features in a centralized and modular manner.
Possibly the author meant "do not produce a homegrown big ball of spaghetti code"; in practice that might be an ill-conceived attempt at building a framework by bunching a boatload of core methods into a monolithic script.
If building (at least in a big part) to learn, I find nothing wrong with trying to centralize your core functions, organize them and start producing a fledgling framework in such a way. Doing this in a thoughtful manner will gain you invaluable practical experience and insight into how applications in general can be architected. Otherwise, I will side with the author of that answer: why have your application suffer from possibly wrong design decisions when there are many fantastic frameworks ready for use?
If you need to include 20-30 lines on top of every page, it sounds like it's time for a better architecture. Look into Dispatching/Routing for example. Every request is handled by a central .php file, the Dispatcher, which parses the request and decides which files need to be invoked and loaded.
This is implemented in most PHP frameworks. Play around with one to get a feeling for it.
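A toy version of such a dispatcher, with hypothetical naming conventions (/controller/action maps to a method on a {Name}Controller class):

```php
<?php
// Parse "/controller/action" from the request path and invoke the
// matching class method; every request enters here, so the 20-30
// shared setup lines live in this one file instead of every page.
function route($path)
{
    $parts      = array_values(array_filter(explode('/', $path)));
    $controller = ucfirst(isset($parts[0]) ? $parts[0] : 'home') . 'Controller';
    $action     = isset($parts[1]) ? $parts[1] : 'index';

    if (!class_exists($controller) || !method_exists($controller, $action)) {
        return '404';
    }
    return call_user_func(array(new $controller, $action));
}

class PageController
{
    public function view() { return 'page view'; }
}
```

The Ajax endpoints from the question fit the same scheme: each becomes an action method, and the dispatcher does the shared authentication and includes before invoking it.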