When writing Python, Perl, Ruby, or PHP, I'll often use one of the following:
Perl:
`[SHELL COMMAND HERE]`
system("[SHELL]", "[COMMAND]", "[HERE]")
Python:
import os
os.system("[SHELL COMMAND HERE]")
from subprocess import call
call(["[SHELL]", "[COMMAND]", "[HERE]"])
Ruby:
`[SHELL COMMAND HERE]`
system("[SHELL COMMAND HERE]")
PHP:
shell_exec("SHELL COMMAND HERE");
How much does spawning a subprocess in the shell slow down the performance of a program?
For example, I was just writing a script with perl and libcurl, and it was difficult, with all of libcurl's parameters, to get it to work. I stopped using libcurl and just started using curl and the performance seemed to IMPROVE, scripting became much easier, and furthermore, I could run my script on systems that only had basic perl (no cpan modules) and the basic shell utilities installed.
Why is spawning this subshell considered bad programming practice? Should it, in theory, always be much slower than using a language-specific binding or equivalent library?
The first reason executing shell commands is bad is maintainability. Context switching between tasks is bad enough without also switching languages. Security is also a consideration, but good coding practice (avoiding injections, etc.) makes it less significant.
There are several factors that impact performance:
Forking a process: This takes a while, but if the code being executed performs well, it becomes less significant.
Optimization becomes impossible: When control is handed over to another process, the interpreter or compiler cannot perform any optimizations across that boundary, and neither can you.
Blocking: Shell commands are blocking operations. They will not be scheduled the way a native part of the code would be.
Parsing: If there is a need to do something about the output, it needs to be parsed. In native code, the data would already be in a relevant data structure. Parsing is also prone to errors.
Command line generation: Generating a command line for an executable may require iterating. Sometimes that takes more cycles than performing the same natively.
Most of these problems arise when the external command is executed in a loop. It may be easy to find examples where none of these become a problem.
Ferrix stated several of the performance-related issues quite nicely.
Regarding security and maintainability, I would submit the following:
Portability/isolation from external dependencies
Sure, you can shell out to call wget--if you're on Linux. On Windows or Mac, it'll die horribly, and you'll either have to explain to your boss why you have to re-write it to use the built-in methods, or support the users/co-workers who need to use your tool (neither of which will be very fun).
Someday you'll spend hours trying to figure out why your script no longer works, only to find that the upgraded version of your external program needs different command-line parameters and no longer works the way your code expects.
Escape characters in one language (Perl/Python/PHP) don't necessarily map to escape characters in the shell language (e.g., an SQL-injection attack is arguably the result of unescaped characters from one language (HTML) being mixed into a different language (SQL)).
Debugging is hard enough in one language--trying to debug a command that generates a command for another language is even harder (especially when escaping quotation-marks, it's easy to end up with strings like \\\\\"some value\\\\\"...)
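In Python, at least, you can avoid the mismatch between the two escaping schemes by not involving a shell at all. Pass the program and its arguments as a list, and each value reaches the child as exactly one argument. The following is a minimal sketch. The Python interpreter stands in for the external program, and `echo_argument` is a hypothetical helper name.

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Pass `value` to a child process as a single argv entry, with no shell involved."""
    # The child is just the Python interpreter printing its first argument;
    # it stands in for any external tool.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1], end='')", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Quotes, backticks and $(...) arrive verbatim because no shell ever parses them.
tricky = 'some "quoted" value; `rm -rf /` $(whoami)'

if __name__ == "__main__":
    print(echo_argument(tricky))
```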
Who says spawning a shell process is bad practice? Beware the dogmatists. There is no hard and fast rule that will define when to do it or not to do it. In your example, when you started shelling out to curl, you finished your project faster and you got better performance.
The proof is always in the pudding.
As far as performance goes, forking (and exec'ing) a new process incurs a cost, so you should avoid it for short operations. If the subprocess runs for a few seconds, you won't notice the 25 ms (just a placeholder number) it takes to spin it up. But if a transient function that runs very quickly is called often, calling it via a subshell will incur a significant performance hit.
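Here is a rough way to see the spin-up cost for yourself, sketched in Python. It performs the same trivial operation in-process and in a freshly spawned child, and times each per call. Spawning a whole Python interpreter inflates the child's number compared with a small binary, so treat the figures as illustrative only. The function names are made up for the example.

```python
import subprocess
import sys
import time


def native_upper(text: str) -> str:
    return text.upper()


def spawned_upper(text: str) -> str:
    # Same work, but done by a fresh child process on every call:
    # process creation, interpreter start-up, pipe I/O, then reading the result back.
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1].upper(), end='')", text],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


def seconds_per_call(func, arg: str, calls: int) -> float:
    start = time.perf_counter()
    for _ in range(calls):
        func(arg)
    return (time.perf_counter() - start) / calls


if __name__ == "__main__":
    native = seconds_per_call(native_upper, "hello", 10_000)
    spawned = seconds_per_call(spawned_upper, "hello", 20)
    print(f"in-process: {native * 1e6:.2f} us/call")
    print(f"subprocess: {spawned * 1e3:.2f} ms/call")
```

On a typical machine the spawned version should take orders of magnitude longer per call. That overhead matters for quick, frequently called operations and disappears into the noise for jobs that run for seconds.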
One thing about subprocesses is that they are independently testable from the command line. They really are standalone tools, and this can be highly useful for some problems.
One last thing to consider: if you believe in "the right tool for the job", and the right tool happens to already be on the box, and you can solve the task at hand by shelling out to it, then why not? I've seen so much code in my life that was ultimately irrelevant because the problem was already solved by some freely available (and already installed) tool. It just happened not to fit into the monolithic (read: single-tool) implementation environment chosen by the programmers.
The corollary being "if all you have is a hammer, everything looks like a nail". Don't be afraid to reach for the screwdriver, and beware the "one hammer to rule them all" cultists.
I understand that PHP does not support multithreading, but I would love to know if there is a good workaround for executing several functions in PHP concurrently. I wrote some code that calculates moments of invariance. There are seven functions, each calculating one moment, and each successive moment takes longer to execute than the last. Any suggestions welcome. Thanks
It seems Gearman is what you need. There is also a PHP extension for it.
Also take a look at the pcntl_fork() function.
I generally use this to spawn children from a worker. Then I use the main process to watch the children, harvesting dead children and spawning new ones.
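pcntl_fork() is a thin wrapper around Unix fork(), so the fork-then-harvest pattern looks much the same in any language. Here is a Python sketch using os.fork() and os.waitpid() (Unix only) that runs several functions in parallel children, for example one per moment. Passing results back through a pipe with pickle is my own assumption, not something from the answer. square and cube are placeholder workloads.

```python
import os
import pickle


def square(x):
    return x * x


def cube(x):
    return x ** 3


def run_in_children(funcs, arg):
    """Fork one child per function, collect each result over a pipe, then reap every child.

    Unix only (os.fork). Returns {function name: result}.
    """
    children = {}  # pid -> (function name, read end of that child's pipe)
    for func in funcs:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute, send the pickled result to the parent, exit without cleanup.
            os.close(read_fd)
            status = 0
            try:
                with os.fdopen(write_fd, "wb") as pipe:
                    pipe.write(pickle.dumps(func(arg)))
            except BaseException:
                status = 1
            os._exit(status)
        os.close(write_fd)  # the parent only reads
        children[pid] = (func.__name__, read_fd)

    results = {}
    for pid, (name, read_fd) in children.items():
        with os.fdopen(read_fd, "rb") as pipe:
            data = pipe.read()  # read first, so a child blocked on a full pipe can finish
        _, status = os.waitpid(pid, 0)  # harvest the child so it does not linger as a zombie
        if status != 0:
            raise RuntimeError(f"child for {name} failed")
        results[name] = pickle.loads(data)
    return results


if __name__ == "__main__":
    print(run_in_children([square, cube], 3))  # {'square': 9, 'cube': 27}
```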
Leaving aside invoking new processes via fork, most modern operating systems (even MS Windows) can spawn non-blocking processes from the shell, although the syntax varies. So you could use the various program execution functions to invoke them.
Another approach would be to split the functionality into multiple URLs (probably restricting access to localhost), then use curl_multi_exec() to invoke them from a controlling script. Note that this is likely to be less efficient than running them as separate processes, which in turn will be less efficient than running via fork.
However, any discussion of how to shard a process across multiple threads/processes is predicated on whether the process itself is shardable, and on whether sharding will improve performance. I'll leave those questions to you.
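To illustrate launching processes without blocking and collecting them later, here is a Python sketch with subprocess.Popen. Popen returns as soon as the child has started. The worker commands are hypothetical stand-ins for your real scripts.

```python
import subprocess
import sys

# Hypothetical workers: each child sleeps briefly, then exits with a chosen status.
DEMO_JOBS = [
    [sys.executable, "-c", "import sys, time; time.sleep(0.2); sys.exit(0)"],
    [sys.executable, "-c", "import sys, time; time.sleep(0.2); sys.exit(3)"],
]


def start_all(commands):
    """Launch every command without waiting for it, much like `cmd &` in a Unix shell."""
    return [subprocess.Popen(cmd) for cmd in commands]


def wait_all(processes):
    """Block until each child exits; return the exit codes in launch order."""
    return [proc.wait() for proc in processes]


if __name__ == "__main__":
    procs = start_all(DEMO_JOBS)  # both children are now running at the same time
    print(wait_all(procs))        # [0, 3]
```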
I have a site heavily developed in PHP, but due to a lack of support for threading, I've decided to use Ruby.
I'd like to know the fastest way to execute a Ruby script from PHP and pass it some data. The Ruby script would in turn execute multiple PHP scripts and pass the results back to the original PHP script.
I'm quite sure this will never gain you anything over a standard multiprocess model like Apache mod_php, and Ruby does not seem like the optimal language to try it with. What do you gain by running something in threads? Basically, you share the code and keep it in memory. Then you can hopefully saturate multiple processors by staying busy during I/O waits, etc. But it requires certain safety measures (thread safety). So the first question is how you can keep a bunch of PHP code in memory to serve multiple processes. Honestly, I don't know enough about PHP internals to know how that would be possible, but Zend Cache is where I would start investigating.
Why would you avoid using bash commands via exec() in PHP?
I am not considering portability (I definitely will not port this to Windows). This is just a question of good scripting practice.
On the one hand:
I need to write many more lines in PHP than in bash to accomplish the same task.
For example, when I need to filter some lines in a file, I just can't imagine using anything other than cat file | grep string > new_file. This would take much more time and effort to do in PHP.
I do not want to analyze every situation in which something might go wrong. I will just show the bash command's output to the user, so they know exactly what happened.
I do not need to write another wrapper around filesystem functions and use it. It is much more efficient to leverage the OS for file searching, manipulation etc.
On the other hand:
Calling a Unix command with exec() may be inefficient in most cases, because spawning a separate process is quite expensive. Scripts running under Apache fare even worse than command-line scripts.
Sometimes the result turns into 'black magic', Perl-like scripting, though detailed comments can mitigate that.
Maybe I'm just trying to use two different tools together when they are not supposed to be. Each tool has its own application, and they should not be mixed.
Even though I'm sure users will not try to run the script with malicious intent, using exec() is a potential security threat. In most cases user data can be escaped with escapeshellarg(), but it is still an issue to take into account.
Another reason to avoid this is that it makes it much easier to create security holes.
For example, if a user manages to sneak
`rm -rf /`
(with backticks) into the input, your bash code might actually nuke the server (or at least nuke something).
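Quoting each user-supplied value before it reaches the shell keeps backticks like those above from being executed. PHP's tool for this is escapeshellarg(). The following Python sketch applies the same idea with shlex.quote(), which targets POSIX shells only. The grep command line is just an example.

```python
import shlex


def build_grep_command(pattern: str, path: str) -> str:
    """Build a shell command string with both user-supplied values quoted for a POSIX shell."""
    # shlex.quote wraps unsafe values in single quotes, so backticks, $(...), ';' and
    # spaces stay literal text instead of being interpreted by the shell.
    return "grep -F -- " + shlex.quote(pattern) + " " + shlex.quote(path)


if __name__ == "__main__":
    hostile = "`rm -rf /`"
    command = build_grep_command(hostile, "notes.txt")
    print(command)               # grep -F -- '`rm -rf /`' notes.txt
    print(shlex.split(command))  # the hostile input is still a single argument
```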
This is mostly a religious thing. Most developers try to write code that always works, and relying on external commands is a sure way to get your code to fail on some systems (even on the same OS).
What are you trying to achieve? PHP has regex-based functions to find what you need from a file. Yes, you would probably need about 5 lines of code to do it, but it would probably be no more or less efficient.
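For a sense of scale outside PHP, here is a Python sketch of the `cat file | grep string > new_file` filter from the question. It uses a plain substring match (closer to `grep -F`), which is an assumption about what the search needs. The file names are placeholders.

```python
import os
import tempfile


def filter_lines(src_path: str, dst_path: str, needle: str) -> int:
    """Write the lines of src_path that contain `needle` to dst_path; return the match count.

    Roughly `grep -F needle src > dst`: a plain substring test, not a regular expression.
    """
    matched = 0
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if needle in line:
                dst.write(line)
                matched += 1
    return matched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "file")
        dst = os.path.join(tmp, "new_file")
        with open(src, "w", encoding="utf-8") as f:
            f.write("alpha\nbeta string\ngamma\nstring delta\n")
        print(filter_lines(src, dst, "string"))  # 2
```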
The main reason against using exec() in PHP is for security. If you're trusting your user to give you a command to exec() in bash, they could easily run malicious commands, such as installing and starting backdoor trojans, removing files, and the like.
As long as you're careful though (use the shell escaping commands to clean user input, restrict the Apache user permissions etc) it shouldn't be a problem. I'm just working on a complete platform at the moment, which relies on the front-end executing shell processes simply because C++ is much faster than PHP, so I've written a lot of the backend logic as a shell application and keep PHP for the front-end logic.
Even though you say portability isn't an issue, you never know for certain what the future holds, so I'd encourage you to reconsider that position. For example, I was once asked to port an editor that was written (by someone else) from Unix to DOS. The original program wasn't expected to be ported and was written with Unix specific calls deeply embedded in the code. After reviewing the amount of work required, we abandoned the task as too time consuming.
I have used exec calls in PHP; however, I had no other way to accomplish what I needed (I had to call another program written in another language with no other bridge between the languages). However, IMO, exec calls which aren't necessary are ugly. As others have said, they can also create security risks and slow your program down.
As you said yourself, you need to document the exec calls well to be sure they'll be understood by programmers. Why create the extra work? Not just now but in the future, when any changes to the exec call will also need to be documented.
Finally, I suggest you learn PHP and its functions a bit better. I'm not that good with PHP, but in just a matter of minutes with Google and php.net, I think I accomplished the same thing you gave as an example with:
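// preg_grep() expects a regex: quote the literal search string and add delimiters
$pattern = '/' . preg_quote($search_string, '/') . '/';
// FILE_IGNORE_NEW_LINES strips each line's trailing newline, so lines aren't double-spaced below
$search_results = preg_grep($pattern, file($file_name, FILE_IGNORE_NEW_LINES));
foreach ($search_results as $result) {
    echo $result . "\n";
}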
Yes, it's a bit more code, but not that much, and you can put it in a function if appropriate ... and I wouldn't be surprised if a PHP guru could shorten it.
IMHO, the main concern with using exec() to execute *nix commands via PHP is security, more than performance or even code style.
If you have very good input sanitization (and this is very hard to achieve), you may be able to avoid security holes.
Personally, if portability isn't an issue, I would use *nix commands like grep, locate, etc. any day over trying to duplicate that functionality in PHP.
It's about using the best tool for the job. In some cases, arguably more often than most people realize, it is much more efficient to leverage the OS for file searching, manipulation, etc. (amongst other things)
A lot of people would descend on you like a ton of bricks for even mentioning exec. Some people would consider it blasphemy, but not me. I can see nothing wrong with exec in some situations if your server has been properly configured. The disadvantage, though, is that you are spawning another process.
If you are running your PHP under a web server, the "user" that runs the script may not have permission to run certain shell commands. You said portability is not an issue, but I can guarantee you that it IS an issue (unless you are creating PHP scripts for fun). In the business world, where things and conditions change fast, you can't know whether you might one day have to run your scripts on other platforms.
It is not secure unless you take extreme precautions to make sure it can't be used by people executing the code.
PHP is not a good executor. PHP spawns the process from Apache, and if that process hangs, your Apache server will hang; if your site is also running on the same Apache, it will fail.
You can expect silly issues like the following as well. If it happens, you can't even restart Apache without manually killing the spawned process from the shell.
http://bugs.php.net/bug.php?id=38915
So, setting security aside: running Linux commands from PHP fails more often than you'd think. The worst part of using exec is that it's not always possible to get error messages back to PHP, or to write subsequent code that depends on what happened inside exec.
consider this pseudo example:
exec ('bash myscript.sh',$x)
if (myScriptWasOk == true) then do this
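There is no reliable way to get that 'myScriptWasOk' value right. exec()'s optional third argument returns the exit status, but that only tells you something if the script sets a meaningful one, and $x (the captured output lines) helps only sometimes.
All this being said, if you need something simple, and you have tested your script and it works OK, just go for it.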
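For contrast, here is a Python sketch that does not have to guess: subprocess.run() hands back the exit status, and stderr separately from stdout. The failing child below is a hypothetical stand-in for `bash myscript.sh`.

```python
import subprocess
import sys

# Stand-in for `bash myscript.sh`: prints something, complains on stderr, exits with status 2.
FAILING_SCRIPT = [
    sys.executable, "-c",
    "import sys; print('partial output'); print('disk full', file=sys.stderr); sys.exit(2)",
]


def run_script(args):
    """Run a command; return (exit status, stdout, stderr) rather than inferring success from output."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    status, out, err = run_script(FAILING_SCRIPT)
    if status == 0:
        print("script OK:", out.strip())
    else:
        print(f"script failed with status {status}: {err.strip()}")
```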
If you are only aiming for Unix compatibility (which is perfectly fine), I can't see anything wrong with it. Virtually every server operating system available today is a Unix clone, except of course for Windows, which I think is ridiculous to use as a server platform in the first place. I'm talking from experience here; this is not just Microsoft hatred. In my opinion, Unix compatibility is a perfectly legitimate requirement on any server.
The only real reason I can see to avoid it is performance. You will quickly find that executing external processes in general is extremely slow. It's slow in C, and it's slow in PHP. I would think that's the biggest real, non-religious concern.
EDIT:
Oh, and as for the security problem, that's a simple matter of making sure that you are in total control of the variables passed to the operating system. It's a concern you have to make when communicating between processes and languages anyway, for example when you do SQL queries. It's not a big enough reason in my opinion to not do something, it's just something that has to be taken into account in this case, like in every case.
If portability really isn't an issue, because you are building a company solution that will always run on your own, totally controlled servers, I say go for shell commands as much as you want. There is no inherent security problem as long as you do proper basic sanitization using escapeshellarg() and similar functions.
At the same time, in my projects portability mostly is an issue, and when it is, I try not to use shell commands at all - only when something can't be done in PHP at all (e.g. MP3 decoding/encoding, ImageMagick, Video operations) or not reasonably (i.e. a PHP based solution is way too slow) will I use external commands.
I wish to create a background process and I have been told these are usually written in C or something of that sort. I have recently found out PHP can be used to create a daemon and I was hoping to get some advice if I should make use of PHP in this way.
Here are my requirements for a daemon.
Continuously check if a row has been added to a MySQL database table
Run FFmpeg commands on what was retrieved from the database
Insert the output into a MySQL table
I am not sure what else I can offer to help make this decision. Just to add, I have not done C before. Only Java and PHP and basic bash scripting.
Does it even make that much of a performance difference?
Please allow for my ignorance, I am learning! :)
Thanks all
As others have noted, various versions of PHP have issues with their garbage collectors. Of course, if you know that your version does not have such issues, you eliminate that problem. The point is, you don't know (for sure) until you write the daemon and run it through valgrind to see if the installed PHP leaks or not on any given machine. So on that hand, you may write it just to discover that what Zend thinks is fixed might still be buggy, or you are dealing with a slightly older version of PHP or some extension. Icky.
The other problem is somewhat buggy signals. In my experience, signal handlers are not always entered correctly with PHP, especially when the signal is queued instead of merged. That may not be an issue for you, i.e. if you just need to handle SIGINT/SIGUSR1/SIGUSR2/SIGHUP.
So, I suggest:
If the daemon is simple, go ahead and use PHP. If it looks like it's going to get rather complex, or allocate lots of memory, you might consider writing it in C after prototyping it in PHP.
I am a pretty die hard C person. However, I see nothing wrong with hammering out something quick using PHP (beyond the cases that I explained). I also see nothing wrong with using PHP to prototype something that may or may not be later rewritten in C. For instance, handling database stuff is going to be much simpler if you use PHP, versus managing callbacks using other interfaces in C. So in that instance, for a 'one off', you will surely get it done much faster.
I would be inclined to perform this task with a cron job, rather than polling the database in a daemon.
It's likely that your FFmpeg command will take a while to do its thing, right? In that case, is it really necessary to be constantly polling the database? Wouldn't a cron job running every minute (or every five, ten or twenty minutes, for that matter) be a simpler way to achieve the same thing?
PHP isn't any better or worse for this kind of thing than any of the other common scripting languages. It has fairly complete access to all of the system calls and library utilities you would need for this sort of work. If you are most comfortable using PHP for scripting, then PHP will do the job for you.
The only downside is that PHP is not quite as ubiquitous as, say, Perl or Python, which are installed on almost every flavor of Unix. PHP tends to be found only on systems that serve dynamic web content. Not that a PHP interpreter is too large or costly to install, but if your biggest concern is getting your program onto many systems, that may be a slight hurdle.
I'll be contrary and recommend you try the php daemon. It's apparently the language you know the best. You'll presumably incorporate a timer in any case, so you can duplicate the querying frequency on the database. There's really no penalty as long as you aren't naively looping on a query.
If it's something not executed frequently, you could alternatively run the PHP script from cron, letting your code drain the queue and then die.
But don't be afraid to stick with what you know best, as a first approximation.
Try not to use triggers. They'll impose unnecessary coupling, and they're no fun to test and debug.
One problem with properly daemonizing a PHP script is that PHP doesn't have interfaces to the dup() or dup2() syscalls, which are needed for detaching the file descriptors.
A cron-job would probably work just fine, if near-instant actions is not required.
I'm just about to put live, a system I've built, based on the queueing daemon 'beanstalkd'. I send various small messages from (in this case, PHP) webpage calls to the daemon, and a PHP script then picks them up from the queue and performs various tasks, such as resizing images or checking databases (often passing info back via a Memcache-based store).
To avoid long-running processes, I've wrapped it in a BASH script that, depending on the value returned from the script ("exit(1);"), restarts the script after every (say) 50 tasks it has performed. If it's restarting because I planned it, it does so instantly. Any other exit value (the default is 0, so I don't use that) pauses a few seconds before the script is restarted.
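Here is the restart logic described above, sketched in Python rather than bash (an assumption made to keep the examples in one language). Status 1 means "planned restart, go again now"; anything else pauses first. The max_runs limit exists only so the sketch terminates; the wrapper in the answer loops forever.

```python
import subprocess
import sys
import time

PLANNED_RESTART = 1  # exit status the worker uses after finishing its batch (e.g. 50 tasks)


def supervise(command, max_runs, pause_seconds=5.0):
    """Run `command` repeatedly and return the exit statuses seen.

    A planned restart (status 1) relaunches the worker immediately; any other status
    waits pause_seconds first. max_runs exists only so the sketch terminates.
    """
    statuses = []
    for _ in range(max_runs):
        status = subprocess.run(command).returncode
        statuses.append(status)
        if status != PLANNED_RESTART:
            time.sleep(pause_seconds)
    return statuses


if __name__ == "__main__":
    # Hypothetical worker that "finishes its batch" and asks to be restarted.
    worker = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(worker, max_runs=3))  # [1, 1, 1]
```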
Running as a cron job with sensibly determined periodicity, a PHP script can do the job, and production stability is certainly achievable. You might want to limit the number of simultaneous FFmpeg instances, and be sure to have complete application logging and exception handling. I have implemented continuously running polling processes in Java, as well as the every-ten-minute cron'd PHP script, and both do the job nicely.
You might want to consider making a mysql trigger that executes a system command (i.e. FFmpeg) instead of a daemon. If some lag isn't a problem, you could also put something in cron that executes every few minutes to check. Cron would be my choice, if it is an option.
To answer your question, php is perfectly fine to run as a daemon. It does not have to be done in C.
If you combine the answers from Kent Fredric, tokenmacguy and Domster you get something useful.
PHP is probably not good for long execution times, so let's keep every execution cycle short and make sure the OS takes care of cleaning up any memory leaks.
Cron can be a good tool for starting your PHP script.
And if you do it like that, there is not much difference between languages.
However, the question still stands:
Is PHP even capable of running as a normal daemon for long periods (some years)?
Or will assorted memory leaks eat up all your RAM and kill the system?
/Johan
If you do so, pay attention to memory leaks. PHP 5.2 has some problems with its garbage collector, according to this (fixed in 5.3). Perhaps it's better to use cron, so the script starts clean on every run.
For what you've described, I would go with a daemon. Make sure that you stick a sleep in the poll loop, so that you don't bombard the database when there are no new tasks. A cronjob works better for workflow/report type of jobs, where there isn't some particular event that triggers the next run.
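Here is a minimal sketch of such a poll loop in Python, with sqlite3 standing in for MySQL. The `jobs` table, its `status` values, and the `handle` callback (where the FFmpeg call would go) are all hypothetical. Claiming rows before processing them is one possible design, not the only one.

```python
import sqlite3
import time


def claim_new_rows(conn):
    """Return the ids of rows not yet processed and mark them as in progress."""
    with conn:  # commit the select-and-mark step as one transaction
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM jobs WHERE status = 'new' ORDER BY id")]
        conn.executemany("UPDATE jobs SET status = 'working' WHERE id = ?",
                         [(job_id,) for job_id in ids])
    return ids


def poll(conn, handle, interval_seconds=5.0, max_idle_polls=None):
    """Process new rows as they appear, sleeping whenever there is nothing to do.

    max_idle_polls only lets the sketch stop; a real daemon would pass None and loop forever.
    Returns the ids that were handled.
    """
    handled = []
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        ids = claim_new_rows(conn)
        if not ids:
            idle_polls += 1
            time.sleep(interval_seconds)  # keeps an empty queue from hammering the database
            continue
        idle_polls = 0
        for job_id in ids:
            handle(job_id)  # e.g. run FFmpeg for this row (hypothetical callback)
            with conn:
                conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
            handled.append(job_id)
    return handled


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany("INSERT INTO jobs (status) VALUES (?)", [("new",), ("new",)])
    print(poll(conn, lambda job_id: None, interval_seconds=0.1, max_idle_polls=1))  # [1, 2]
```

If you run it once from cron instead of in a loop, the same claim-and-process step becomes the "drain the queue and exit" approach suggested in other answers.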
As mentioned, PHP has some problems with memory management. You need to be sure that you test your code for memory leaks, since these would build up over time in a long-running script. PHP doesn't have real garbage collection; it relies on reference counting, which means that cyclic references will cause leaks. If you're aware of this, you can code around it.
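Python, like PHP, uses reference counting, but CPython adds a separate cycle collector on top. Switching that collector off gives a small demonstration of the leak described above. Two objects pointing at each other stay alive after every outside reference is gone, until a cycle collection runs. This sketch illustrates the general mechanism only; it does not measure PHP's behavior.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


def cycle_survives_refcounting() -> bool:
    """Build a two-object cycle, drop every outside reference, and check what frees it.

    The automatic cycle collector is switched off to mimic a runtime that relies on
    reference counting alone. Returns True if the cycle survived refcounting but was
    freed by an explicit cycle collection.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # each object keeps the other's refcount above zero
        probe = weakref.ref(a)
        del a, b
        alive_after_del = probe() is not None  # refcounting alone cannot free the cycle
        gc.collect()                           # a manual cycle collection does free it
        freed_after_collect = probe() is None
        return alive_after_del and freed_after_collect
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(cycle_survives_refcounting())  # True on CPython
```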
If you do decide to go down the daemon route, there is a great PEAR module called System_Daemon, which I've recently used successfully on a PHP v5.3.0 installation. It is documented on the author's blog: http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php
If you have PEAR installed, you can install this module using:
pear install -f System_Daemon
You will also need to create an initialisation script: /etc/init.d/<your_daemon_name>
Then you can:
Start the daemon: /etc/init.d/<your_daemon_name> start
Stop the daemon: /etc/init.d/<your_daemon_name> stop
Logs are kept at: /var/log/<your_daemon_name>.log
I wouldn't recommend it. PHP is not designed for long-term execution; it's designed primarily for short-lived pages.
In my experience PHP can have problems with leaking memory for some of the larger tasks.
A cron job and a little bit of bash scripting should be everything you need, by the sounds of it. You can do things like:
file=$(mysql -h server -N -e "select file from table;" dbname)  # -N skips the column header; server, table and dbname are placeholders
ffmpeg -i "$file" -r 50 output.avi  # -r sets the output frame rate; output.avi is a placeholder
So, IMHO, bash would be easier to write, port, and maintain than PHP.
If you know what you are doing sure. You need to understand your operating system well. PHP generally isn't suited for most daemons because it isn't threaded and doesn't have a decent event based system for all tasks. However if it suits your needs then no problem. Modern PHP (5.3+) is really stable and doesn't have any memory leaks. As long as you enable the GC and don't implement your own memory leaks, etc you'll be fine.
Here are the stats for one daemon I am running:
uptime: 17 days (last restart due to PHP upgrade)
bytes written: 200 GB
connections: hundreds
connections handled: hundreds of thousands
items/requests processed: millions
Node.js is generally better suited, although it has some minor annoyances. Some attempts have been made to improve PHP in the same areas, but they aren't really that great.
Cron job? Yes.
Daemon which runs forever? No.
PHP does not have a garbage collector (or at least, last time I checked it did not). Therefore, if you create a circular reference, it NEVER gets cleaned up - at least not until the main script execution finishes. In daemon process this is approximately never.
If they've added a GC in new versions, then yes you can.
Go for it. I had to do it once also.
Like others said, it's not ideal but it'll get-er-done. Using Windows, right? Good.
If you only need it to run occasionally (once per hour, etc.):
Make a new shortcut to Firefox and place it somewhere relevant.
Open up the properties for the shortcut, change "Target" to:
"C:\Program Files\Mozilla Firefox\firefox.exe" http://localhost/path/to/script.php
Go to Control Panel>Scheduled Tasks
Point your new scheduled task at the shortcut.
If you need it to run constantly or pseudo-constantly, you'll need to spice the script up a bit.
Start your script with
set_time_limit(0);
ob_implicit_flush(true);
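If the script uses a loop, flush the output buffers on every iteration:
foreach ($my_array as $item) {
    // do stuff with $item
    if (ob_get_level() > 0) {
        ob_flush(); // pass PHP's own output buffer on, if one is active
    }
    flush();        // then ask the web server to send its buffer to the browser
    sleep(17);      // pause before the next item
}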