How is memory management in PHP different from that in Python? - php

What is the difference in how they are handled?
Specifically, why is it common to find Python used in production-level long lived applications like web-servers while PHP isn't given their similar efficiency levels?

PHP was designed as a hypertext scripting language. Every process was designed to end after a very short time. So memory management and GC basically didn't matter.
However the ease and popularity of PHP have invoked its usage in long lived programs such as daemons, extensive calculations, socket servers etc.
PHP 5.3 introduced a lot of features and fixes that made it suitable for such applications, however in my opinion memory management was of lower significance on that matter.
PHPs error management is quite good now, but as in every programming language that I know of you can produce memory leaks.
You still cannot code in the same style that you can code Java or Python applications. A lot of PHP programs will probably show severe problems where Java/Python do not.
You can characterize this as "worse", but I would not. PHP just is a different set of tools that you have to handle different.
The company I work at has a lot of system programs and daemons written in PHP that run like a charm.
I think the biggest caveat for PHP when it comes to as you describe "production-level long lived applications" is its multi-processing and threading ability (the 2nd is basically nonexistent).
Of course there is the possibility to fork processes, access shared memory, do inter process communications and have message queues and stuff. But Python is far ahead on that matter, because it was designed bottom up for jobs like that.

Related

Would it be simply better to use the system's functions rather than use the language?

There are many scenarios where I've questioned PHP's performance with some of its functions, and whether I should build a complex class to handle specific things using its seemingly slow tools.
For example, Complex regular expressions with sed and processing with awk would seemingly be exponential in performance rather than making PHP's regular expression and seemingly excessive functions parse and in time manage to finish it. If I were to do a lot of network tasks such as MX lookups/DIGging/retrieving simultaneously I would rather pass it via system() and let the OS handle it itself. There are simply too many functions in PHP, that are inefficient and result in slow pages or can be handled easier by the OS.
What are your opinions?
Do you think I should do the hard work with the OS in its own/custom functions?
System calls can very often be faster than using a solution built in PHP (although that doesn't always hold true, seeing as PHP's functions are themselves built and compiled in C. Many PHP core functions and extensions are pretty fast).
Apart from speed, a second factor is the memory limit. Externally called processes don't eat away at PHP's per-script limitation, which can be great when working with large files for example.
Also, some functions simply are not available in PHP itself. There is no way to imitate the feature set of ImageMagick entirely within PHP, for example. The GD library doesn't come close to what ImageMagick has to offer.
The big, big minus is that by using system commands, you effectively eliminate portability, which is part of PHP's beauty. Moving the application to a different server becomes a huge burden because the feature set of the external commands needs to be identical - and that isn't always the case even across different Linux distros, not to speak of crossing the OS border into Windows or Unix-based Mac OS. I have myself experienced issues with wget and ImageMagick in this respect, I'm sure there are many more.
If you are working on a custom application for which you entirely control the server environment (and the decision what kind of servers will be bought in the next five years), that may not be a problem. It will be one, though, if you build software that needs to be portable.
I personally tend to rather cut away a feature (that would need an external dependency) than lose portability, but then, I am in the trade of building portable applications very much. It really depends on your focus.
Even if system processes are faster and hog less memory (extensive testing is a must here), there's something to keep in mind:
I'd be cautious with using system() calls and only use it if you control the hardware your script will run on. Using those calls may result in the need to install further software / packages and may not work (the same way) on all OS, so if you can't control the server, I'd stick with PHP-functions.
I think that would be slower actually, because each time you call such a function the OS would start a new process, and that is time consuming.
I'd say if your program is intended to be executed on the shell using other external programs like sed/awk is fine as shellscripts also make excessive use of external programs and a php script run on the shell is just like a shell script, just in another language.
However, if it's a web application, better do it in php - most shared hosting environments don't allow you to execute external programs from php scripts.
(My experience is that "system calls" usually refers to calling kernel operations - not invoking other programs - "pass it via system() and let the OS handle it" - you seem to think the same - but none of the programs you mention are OS services - they are just other programs).
PHP is essentially a scripting language - which conventionally are just a glue for moving data between other programs, but some things to consider:
1) performance - forking a new process can be computationally expensive
2) security - giving your webserver unlimited access to all the programs on the system (even constrained by permissions) is potentially very dangerous
3) bearing in mind (2) most configurations will prevent or limit what you can do
4) for large scale development this is rather dangerous - letting programmers write their code in any language of their choice then putting a skim of PHP over the top means you will end up with an application written in lots of different languages
5) You can write your own native code PHP extensions quite easily
If I were to do a lot of network tasks such as MX lookups/DIGging/retrieving
While I could believe that mashing up large data sets using awk/sed might be faster/more efficient then native PHP code, I find it a bit surprising that DNS lookups are faster using a different client. How did you measure this?

Why is PHP apt for high-traffic websites?

I was surprised to learn today that PHP is used widely in high-traffic websites.
I always thought that PHP is not strong in terms of performance, being a dynamic, scripting language (e.g. compared to statically typed, compiled language like C/Java/C# etc.).
So how come it performs so well?
What you'll usually find is that it's not as slow as you think. The reason a lot of sites are slow is because the hosts are overloaded.
But one primary benefit of PHP over a compiled language is ease of maintenance. Because PHP is designed from the ground up for HTTP traffic, there's less to build than with most other compiled languages. Plus, merging in changes becomes easier as you don't need to recompile and restart the server (as you would with a compiled binary)...
I've done a considerable amount of benchmarks on both, and for anywhere under about 50k requests per second (based upon my numbers) there really isn't a significant gain to using a compiled binary (FastCGI). Sure, it's a little faster using compiled C, but unless you're talking Facebook level traffic, that's not really going to mean significant $$$. And it's definitely not going to offset the relatively rapid rate of development that PHP will afford in comparison to using C (which more than likely will require many times the code since it's not memory managed)...
PHP, if properly written can be quite scalable. The limiting factors are typically in your database engine. And that's going to be a common factor no matter what technology you use...
Java deployments in a big enterprise setting are a mess...fighting with builds and code that might not compile for the slightest little things. Also, PHP runs on a fairly simple setup server-wise, not the bulky code that is Weblogic (or others), so others are right in that it's low cost to develop and cheap to deploy on several different machines. It certainly didn't help that I was soured by working in a large, VERY inefficient corporate setting while doing Java....
I wouldn't say that PHP developers are cheaper per se (I make more now as a PHP developer than I did as a Java UI developer) but I do know that my last employer paid me for a not-insignificant amount of time spent configuring, deploying, compiling, etc that is not required in PHP. We're talking probably one day/week of related configuration fussing due to new branch roll outs or release-related configurations. So, the extra I'm paid now is made up for by a significant amount more code that I'm able to work through each week.
PHP is certainly being helped by the fact that MySQL and Postgres (to a smaller extent) have become so much more powerful. They're not directly linked, but having that as a common pairing just sweetens the deal for those making decisions.
It doesn't really perform "so well", just well enough to be used. Keep in mind, though, that Java and C#.NET are also run as bytecode inside a VM. PHP, with tools such as Zend Optimizer, can also skip the compilation step and run as bytecode.
PHP will not run as fast as native, compiled C code, but websites such as Facebook compile PHP to C++ to make it run faster (see HipHop-PHP).
Most websites have performance bottle necks when querying a database etc. The amount of time the script spends executing is usually small compared to this. Using things like libmemcached can help mitigate this.
Many sites started as low-traffic sites. Once you have your PHP website running and suddenly you have to handle much higher traffic, it's cheaper just to buy more servers than to rewrite your app from PHP to something else. Moreover there are tools that improve PHP performance.
Also note, that there are other factors: database, caching strategy which affect performance more than PHP itself.
It doesn't, which is why there are projects like HipHop, but dynamic languages are often faster to develop in, and hardware is cheaper than developers.
In my opinion the stateless nature of PHP is the most important factor to it's scalability. It's been a while since I've done any web work with Java/ASP.NET, but I recall that both technologies have a central application "engine" that all requests are piped through. That's great, because information and state can be shared between instances, and a lot of bootstrapping (reading configuration files, connecting to databases, etc) can be done once, and then shared among instances. It's bad though because that central "engine" itself becomes a bottleneck for the whole application.
The lack of a central engine in PHP also means scaling your application is usually a simple matter of adding another web server to your rig (although scaling the database along with it is more complicated). I imagine scaling a Java/ASP.NET application is a good deal more complicated, and they reach a saturation point where adding more hardware gives less of a boost each time.

PHP Speed Vs Other Languages

I have heard a lot that PHP is slow compared other languages. Is the speed difference noticeable enough that I should switch to another language? And if so what other language would you recommend? Or what would be some good optimizations that could speed up the PHP?
This question comes up a lot. The answer is:
Yes it's slower than C#, Java, C/C++, etc.
No it probably won't matter.
You can build large scale PHP systems. 4 of the top 20 visited Websites are powered by PHP (Facebook, Yahoo, Wikipedia, Flickr). PHP with an opcode cache (eg APC) can take you much further than you'll probably need or care about.
Most slow Websites have nothing to do with the language they're using. A lot of the time spent on an HTTP request comes down to network latency, absent or ineffectual caching of static resources, lack of compression resulting in more bandwidth used than necessary, poorly performning Javascript and so on.
If you get really desperate for performance you can always use HipHop, which compiles PHP to C++.
PHP will be plenty fast enough for web site applications if you use best practices.
If you compare PHP to, say C++, of course it will be slower. But you need to consider total cost of development. Just because one language produces faster programs doesn't mean it will be more cost effective. Depending on your programming style, experience, and the project you are working on, you may find that a different language is better suited for the task.
If you use an opcode cache, you will get a very big speed gain simply by removing the need for accessing the disk and parsing the PHP files.
As with any language, you do need to be familiar with the data structures and how they are to be used efficiently. Poor algorithms will be slow regardless of the language, but especially in a scripting language where lots of "magic" happens under the hood.
To speed up PHP, try APC - Alternative PHP Cache.
It can cache the compiled code so the source code files don't need to be reparsed for every request.
More info about APC and other PHP accelerators can be found at Wikipedia.
It depends on usage case. Nice example to illustrate this:
When you use PHP as server side web scripting language it will be faster than C/C++ program running as a CGI (this is because for CGI a separate process needs to be created and some setup must be done, while PHP scripts are running inside http server module and are just "ready to go")
On the other hand, when you use PHP for numerical computation it will be drastically slower than program written in C/C++
PHP is designed to be server side web programming language and for that purpose it should be used. It is reasonably efficient for this task but you can speed it up with caching tools. If even that is not enough, you can write extension in Zend API.

Facebook's HipHop - What's it for?

The news in the PHP world today is Facebook's HipHop, which:
HipHop for PHP isn't technically a compiler itself. Rather it is a source code transformer. HipHop programmatically transforms your PHP source code into highly optimized C++ and then uses g++ to compile it. HipHop executes the source code in a semantically equivalent manner and sacrifices some rarely used features — such as eval() — in exchange for improved performance. HipHop includes a code transformer, a reimplementation of PHP's runtime system, and a rewrite of many common PHP Extensions to take advantage of these performance optimizations.
My question is, what type of web applications is this actually useful for?
Seems like typical database-bound web apps may not be greatly served by this, but rarer CPU-bound apps would.
Web applications that do a lot of processing and/or use a lot of memory. Apparently this HipHop will reduce CPU usage by around 50% and also reduce memory usage (I didn't see how much the memory usage would be reduced by mentioned anywhere). This means that you should be able to serve the same number of requests with fewer servers.
An added benefit may be that there will be some basic type checking to ensure that the code is consistent before it is compiled. This should help to locate the type of bugs that PHP currently tends to ignore as a result of its weak type system.
The downside appears to be that it might not support some of PHP's more dynamic features such as eval (though arguably that's a positive too).
Well it "transforms" PHP into C++ to help performance of a largely scalable website.
So, HipHop is for when you have a website that you started at Harvard that you quickly grow into a billion dollar company and that people are making a movie about starring Justin Timberlake. When you have such a website and want to save CPU cycles, but don't want to rewrite your codebase, you use HipHop.
If you are just starting out, unless you are trapped on a desert island with only PHP programmers that refuse to learn a more scalable language, you don't use HipHop.
Running machine code over interpreted code is faster. This is useful in one sense, but also reduces the amount of machines you require, as each processor has less work to do.
This is good for a company like Facebook, in that they can cut the amount of machines they need.
In terms of why it's useful for them, they probably run a lot of sorting and indexing, on the large amounts of data they have.
This article:
http://terrychay.com/article/hiphop-for-faster-php.shtml
answers this question perfectly with its series of "if" statements.
You can think of it as some sort of compiler that takes in a bunch of .php files, and generate a bunch of c++ files for which you can then compile using g++ (Not sure if other compilers are supported). The resulting exe is your web application with a web server included. That means you could run the exe and you are good to go. The web server is based on libevent and supposedly pretty efficient.
Hip Hop is essentially pointless to everyone except Facebook and other gigantic PHP-based sites. I'm sure many people will jump on the bandwagon due to "it's fast" but how many PHP based apps use whole server farms?
Just because you are working on a social network site, doesn't mean you should consider using HH.

Simple Facebook HipHop Performance Question

If I write a hello world app using a PHP web framework such as CodeIgniter and then I compile it and run it using HipHop. Will it run faster than if I write the same hello world app in django or rails?
HIPHOP converts php code into C++ code, which needs to be compiled to run. Since pre-compiled code runs faster and uses less memory then scriping languages like python/php it will probably run faster in the example you have given.
However, HIPHOP does not convert all code. A lot of code in php is dynamic and can not be changed to c++, this means you will have to write your code with this in mind. If codeigniter can even be compiled using HIPHOP is another question.
Terry Chay wrote a big article about HIPHOP, covering when to use it, it's limitations and future. I would recomment reading this, as it will most likely answer most of your questions and give you some insight into how it works :)
http://terrychay.com/article/hiphop-for-faster-php.shtml
At that point the run time is inconsequential. HipHop was designed for scaling... meaning billions of requests. There's absolutely no need to use something like HipHop for even a medium size website.
But more to the point of your question... I don't think there have been comparison charts available for us to see, but I doubt the run time would be faster at that level.
i don't know about django or rails, so this is a bit off-topic.
with plain php, the request goes to apache, then to mod_php. mod_php loads the helloworld.php script from disk, parses & tokenizes it, compiles it to bytecode, then interprets the bytecode, passes the output back to apache, apache serves it to the user.
with php and an optimizer the first run is about the same as with plain php, but the compiled source code is stored in ram. then, for the second request: goes to apache, apache to mod_php, apc loads bytecode from ram, interprets it, passes it back to apache, back to the user.
with hiphop there is no apache, but hiphop itself and there's no interpreter, so request goes directly to hiphop and back to the user. so yes, it's faster, because of several reasons:
faster startup because there's no bytecode compilation needed - the program is already in machine-readable code. so no per-request compilation and no source file reading.
no interpreter. machine code is not necessarily faster - that depends on the quality of source translation (hiphop) and the quality of the static compiler (g++). hiphop translated code is not fast compared to hand-written c code, because there's a bit of overhead because of type handling and such.
with node.js, there's also no apache. the script is started and directly compiled to machine code (because the V8 compiler does that), so it's kind of AOT (ahead of time) compiling (or is it still called JIT? i don't really know). every request is then directly handled by the already compiled machine code; so node.js is actually very comparable to hiphop. i assume hiphop to be multithreaded or something like this, while node does evented IO.
facebook claims a 50% speed gain, which is not really that much; if you compare the results of the language shootout, you'll see for the execution speed of assorted algorithms, php is 5 to 250 times slower.
so why only 50%? because ...
web apps depend on much more than just execution speed, e.g. IO
php's type system prevents hiphop to make the best use of c++'s static types
in practice, a lot of php is already C, because most of the functionality is either built in or comes from extensions. extensions are programmed in C and statically compiled.
i'm not sure if there was a huge performance gain for hello world, because hello world, even with a good framework, is still so small execution speed could be negligible in comparison to all the other overhead (network latency and stuff).
imo: if you want speed and ease of use, go for node.js :)
Running a simple application is always faster in any language. When it's become as complex as facebook, then you will face numerous of problems. PHP slowness will be show it's face. In same times, converting existing code to another language is not an options, since all logic and code is not so easy to translated to other language's syntax. That's why facebook developer decide to keep the old code, and make PHP faster. That's the reason they create their own PHP compiler, called HipHop.
Read this story from the perspective one of Facebook developer, so you know the history of HipHop.
That is not really an apple to apples comparison. In the most level playing field you might have something like:
Django running behind apache
Django rendering an HTML template to say hello world (no caching)
AND
HPHP running behind apache
HPHP rendring an HTML template to say hello world (again, no caching)
There is no database, almost no file I/O, and no caching. If you hit the page 10,000 times with a load generator at varying concurrency levels you will probably find that HPHP will outperform Django or rails - that is to say it can serve render more pages per second and keep up with your traffic a bit better.
The question is, will you ever have this many concurrent users? If you will, will they likely be hitting a database or a cached page?
HPHP sounds cool, but IMHO there is no reason to jump ship just yet (unless you are getting lots of traffic, in which case it might make sense to check it out).
Will it run faster than if I write the
same hello world app in django or
rails?
It probably will, but don't fret. If we're talking prospective speed improvements from yet unreleased projects, Pythonistas have pypy-jit and unladen-swallow to look forward to ;)

Categories