Why shouldn't I use Unix commands from PHP?

Why would you prefer to avoid calling bash commands via exec() in PHP?
I'm not concerned with the portability issue (I definitely will not port it to run on Windows). This is simply a question of good script-writing practice.
On the one hand:
I need to write many more lines in PHP than in bash to accomplish the same task.
For example, when I need to filter some lines in a file, I just can't imagine using anything instead of cat file | grep string > new_file. Doing the same in PHP would take much more time and effort.
I do not want to analyze every situation where something might go wrong. I will just show the bash command's output to the user, so they will know exactly what happened.
I do not need to write yet another wrapper around the filesystem functions. It is much more efficient to leverage the OS for file searching, manipulation, and so on.
On the other hand:
Calling a Unix command with exec() can be inefficient in many cases: spawning a separate process is quite expensive. That goes double for scripts running under Apache, which is even less efficient than spawning from command-line scripts.
The result can read like 'black magic', Perl-style scripting that nobody else can follow, though detailed comments can mitigate that.
Maybe I'm just trying to use two different tools together when they are not supposed to be mixed. Each tool has its own domain of application.
Even though I'm sure users will not try to run the script with malicious intent, using exec() is a potential security threat. In most cases user data can be escaped with escapeshellarg(), but it is still an issue to take into account.

Another reason to avoid this is that it's much easier to create security holes this way.
For example, if a user manages to sneak
`rm -rf /`
(with backticks) into the input, your bash code might actually nuke the server (or at least nuke something).
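To make that concrete, here is a minimal sketch of the unsafe pattern and the standard escapeshellarg() fix (the 'file' request parameter is hypothetical):

$filename = $_GET['file'];                   // attacker submits: `rm -rf /`
exec("cat $filename");                       // unsafe: the shell expands the backticks
exec('cat ' . escapeshellarg($filename));    // safe: the input becomes one quoted literal

escapeshellarg() wraps the value in single quotes and escapes any embedded quotes, so the shell never gets a chance to interpret it as code.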
This is also mostly a religious thing: most developers try to write code that always works, and relying on external commands is a sure way to get your code to fail on some systems (even on the same OS).

What are you trying to achieve? PHP has regex-based functions to find what you need from a file. Yes, you would probably need about 5 lines of code to do it, but it would probably be no more or less efficient.
The main reason against using exec() in PHP is for security. If you're trusting your user to give you a command to exec() in bash, they could easily run malicious commands, such as installing and starting backdoor trojans, removing files, and the like.
As long as you're careful, though (use the shell-escaping functions to clean user input, restrict the Apache user's permissions, etc.), it shouldn't be a problem. I'm just working on a complete platform at the moment which relies on the front end executing shell processes, simply because C++ is much faster than PHP; I've written a lot of the backend logic as a shell application and keep PHP for the front-end logic.

Even though you say portability isn't an issue, you never know for certain what the future holds, so I'd encourage you to reconsider that position. For example, I was once asked to port an editor that was written (by someone else) from Unix to DOS. The original program wasn't expected to be ported and was written with Unix specific calls deeply embedded in the code. After reviewing the amount of work required, we abandoned the task as too time consuming.
I have used exec calls in PHP; however, I had no other way to accomplish what I needed (I had to call another program written in another language with no other bridge between the languages). However, IMO, exec calls which aren't necessary are ugly. As others have said, they can also create security risks and slow your program down.
As you said yourself, you need to document the exec calls well to be sure they'll be understood by programmers. Why create the extra work? Not just now but in the future, when any changes to the exec call will also need to be documented.
Finally, I suggest you learn PHP and its functions a bit better. I'm not that good with PHP, but in just a matter of minutes with Google and php.net, I think I accomplished the same thing you gave as an example with:
// $search_string must be a PCRE pattern with delimiters, e.g. '/foo/'
$search_results = preg_grep($search_string, file($file_name, FILE_IGNORE_NEW_LINES));
foreach ($search_results as $result) {
    echo $result . "\n";
}
Yes, it's a bit more code, but not that much, and you can put it in a function if appropriate ... and I wouldn't be surprised if a PHP guru could shorten it.

IMHO, the main concern with using exec() to execute *nix commands from PHP is security, more than performance or even code style.
If you have very good input sanitization (and that is very hard to achieve), you may manage to avoid any security holes.

Personally, if portability isn't an issue, I would totally use *nix commands like grep, locate, etc. anyday over trying to duplicate that functionality in PHP.
It's about using the best tool for the job. In some cases, arguably more often than most people realize, it is much more efficient to leverage the OS for file searching, manipulation, etc. (amongst other things)

A lot of people would descend on you like a ton of bricks for even mentioning exec. Some would consider it blasphemy, but not me. I can see nothing wrong with exec in some situations, provided your server has been properly configured. The disadvantage, though, is that you are spawning another process.

If you are running your PHP under a web server, the "user" that runs the script may not have permission to run certain shell commands. You said portability is not an issue, but I can guarantee you that it IS an issue (unless you are creating PHP scripts for fun). In the business world, where things and conditions change fast, you never know: you might one day have to run your scripts on other platforms.

It is not secure unless you take extreme precautions to make sure it can't be abused by the people executing the code.

PHP is not a good executor. PHP spawns the process from Apache, and if that process hangs, your Apache worker hangs with it; if your site is running on the same Apache instance, it will fail too.
You can expect silly issues like this one as well; when it happens, you can't even restart Apache without killing the spawned process manually from a shell:
http://bugs.php.net/bug.php?id=38915
So, security aside, running Linux commands from PHP fails more often than you'd think. The worst part of using exec is that it's not always possible to get error messages back into PHP (stderr is not captured), or to write follow-up logic that depends on what actually happened inside exec.
consider this pseudo example:
exec('bash myscript.sh', $x);
if ($myScriptWasOk == true) { /* then do this */ }
There is no way to get that $myScriptWasOk variable right; you just don't know anything about what happened inside the script. $x (the captured output lines) will help you sometimes.
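That said, exec() does accept a third by-reference parameter that receives the command's exit code, and a shell redirect can fold stderr into the captured output. A minimal sketch (myscript.sh is the hypothetical script from above):

exec('bash myscript.sh 2>&1', $output, $exitCode);
if ($exitCode === 0) {
    // the script reported success
} else {
    // $output now also contains stderr lines, thanks to the 2>&1 redirect
}

Of course, this is only as reliable as the exit codes the script itself bothers to set, which is part of the point being made here.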
All this being said, if you need something simple, and you have tested your script and it works OK, just go for it.

If you are only aiming for Unix compatibility (which is perfectly fine), I can't see anything wrong with it. Virtually every server operating system available today is a Unix clone, except of course for Windows, which I think is ridiculous to use as a server platform in the first place (and I'm talking from experience here; this is not just Microsoft hatred). Unix compatibility is a perfectly legitimate requirement for any server, in my opinion.
The only real reason I can see to avoid it is performance. You will quickly find that executing external processes in general is extremely slow. It's slow in C, and it's slow in PHP. I would think that's the biggest real, non-religious concern.
EDIT:
Oh, and as for the security problem, that's simply a matter of making sure you are in total control of the variables passed to the operating system. It's a concern you have to address whenever you communicate between processes and languages, for example when you build SQL queries. In my opinion it's not a big enough reason to avoid something; it's just something that has to be taken into account here, as in every case.

If portability really isn't an issue, because you are building a company solution that will always run on your own, totally controlled servers, I say go for shell commands as much as you want. There is no inherent security problem as long as you do proper basic sanitization using escapeshellarg() and friends.
At the same time, in my projects portability mostly is an issue, and when it is, I try not to use shell commands at all; only when something can't be done in PHP at all (e.g. MP3 decoding/encoding, ImageMagick, video operations), or not reasonably (i.e. a PHP-based solution is far too slow), will I use external commands.

Related

Does calling a shell command from within a scripting language slow down performance?

When writing Python, Perl, Ruby, or PHP
I'll often use ...
Perl:
`[SHELL COMMAND HERE]`
system("[SHELL]", "[COMMAND]", "[HERE]")
Python:
import os
os.system("[SHELL COMMAND HERE]")
from subprocess import call
call(["[SHELL]", "[COMMAND]", "[HERE]"])   # subprocess.call takes the arguments as one list
Ruby:
`[SHELL COMMAND HERE]`
system("[SHELL COMMAND HERE]")
PHP:
shell_exec("SHELL COMMAND HERE")
How much does spawning a subprocess in the shell slow down the performance of a program?
For example, I was just writing a script with perl and libcurl, and it was difficult, with all of libcurl's parameters, to get it to work. I stopped using libcurl and just started using curl and the performance seemed to IMPROVE, scripting became much easier, and furthermore, I could run my script on systems that only had basic perl (no cpan modules) and the basic shell utilities installed.
Why is spawning this subshell considered bad programming practice? Should it, in theory, always be much slower than using a specific binding or equivalent library within the language?
The first reason why executing shell commands is bad is maintainability. Context switching between tasks is bad enough without adding language switching on top. Security is also a consideration, but good coding practice makes it less significant (avoiding injections, and so on).
There are several factors that impact performance:
Forking a process: This takes a while but in case the code being executed performs well, this becomes less significant.
Optimization becomes impossible: when control is handed over to another process, neither the interpreter nor the compiler can optimize across the boundary, and neither can you.
Blocking: Shell commands are blocking operations. They will not be scheduled like a native part of the code would.
Parsing: If there is a need to do something about the output, it needs to be parsed. In native code, the data would already be in a relevant data structure. Parsing is also prone to errors.
Command line generation: Generating a command line for an executable may require iterating. Sometimes that takes more cycles than performing the same natively.
Most of these problems arise when the external command is executed in a loop. It may be easy to find examples where none of these become a problem.
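To put a rough number on the forking cost, here is a hypothetical PHP micro-benchmark (the file name and pattern are made up) comparing a native search with spawning grep in a loop:

// native filtering, 100 iterations
$start = microtime(true);
for ($i = 0; $i < 100; $i++) {
    $matches = preg_grep('/foo/', file('data.txt'));
}
printf("native: %.3fs\n", microtime(true) - $start);

// spawning grep, 100 iterations
$start = microtime(true);
for ($i = 0; $i < 100; $i++) {
    exec('grep foo data.txt', $output);
}
printf("exec:   %.3fs\n", microtime(true) - $start);

On most systems the exec loop is dominated by process start-up time, which is exactly the fork-in-a-loop problem described above.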
Ferrix stated several of the performance-related issues quite nicely.
Regarding security and maintainability, I would submit the following:
Portability/isolation from external dependencies
Sure, you can shell out to call wget--if you're on Linux. On Windows or Mac, it'll die horribly, and you'll either have to explain to your boss why you have to re-write it to use the built-in methods, or support the users/co-workers who need to use your tool (neither of which will be very fun).
Someday you'll spend hours trying to figure out why your script no longer works, only to find that the upgraded version of your external program needs different command-line parameters and no longer works the way your code expects.
Escape characters in one language (Perl/Python/PHP) don't necessarily map to escape characters in the shell language (ex: an SQL-injection attack is arguably the result of non-escaped characters in one language (HTML) being mixed with a different language (SQL)).
Debugging is hard enough in one language--trying to debug a command that generates a command for another language is even harder (especially when escaping quotation-marks, it's easy to end up with strings like \\\\\"some value\\\\\"...)
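One way to sidestep the escaping problem entirely, in PHP 7.4 and later, is to pass proc_open() an argv-style array instead of a command string, so no shell ever parses your arguments. A sketch, with $pattern and $file standing in for user-supplied values:

$proc = proc_open(
    ['grep', $pattern, $file],   // array form: no shell, no quoting layer to get wrong
    [1 => ['pipe', 'w']],        // capture the child's stdout
    $pipes
);
if (is_resource($proc)) {
    echo stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);
}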
Who says spawning a shell process is bad practice? Beware the dogmatists. There is no hard and fast rule that will define when to do it or not to do it. In your example, when you started shelling out to curl, you finished your project faster and you got better performance.
The proof is always in the pudding.
As far as performance goes, forking (and exec'ing) a new process incurs a hit, so you should avoid it for short operations. If the sub-process runs for a few seconds, you won't notice the 25ms (just a placeholder number) it takes to spin it up. But if there's a quick transient function that you call often, invoking it via a sub-shell will incur a significant performance hit.
One thing about subprocesses is that they are independently testable from the command line. So they are really stand alone tools, and this can be highly useful for some problems.
One last thing to consider: if you believe in the "right tool for the job", and the right tool happens to already be on the box, and you can solve the task at hand by shelling out to it, then why not? I've seen so much code in my life that was ultimately irrelevant because the problem was already solved by some freely available (and already installed) tool. It just happened not to fit into the monolithic (read: single-tool) implementation environment chosen by the programmers.
The corollary being "if all you have is a hammer, everything looks like a nail". Don't be afraid to reach for the screwdriver, and beware the "one hammer to rule them all" cultists.

How to safely allow users to submit code to run periodically?

Basically I need to allow users to submit code to be run periodically server side.
The users will submit simple scripts, and I'll run their code server-side to determine who came up with the better solution. I created a simple submit form, and the code is stored in an SQL database.
I'm obviously worried about safety, but I also don't know which language to use. I need a scripting language with an easy syntax that lets me limit what users can do (I only need to let them define variables, create functions, use loops, and use some array and algebraic functions). Maybe I should even create a pseudo-language with an easy syntax.
So basically:
What language could I use?
How do I run users' code periodically? (I only know about cron jobs, but I don't know if they allow for long execution times.)
Would it be a good idea to create a pseudo-language? If so, please point me in the right direction.
What language: well, you could use almost any language; just make sure it runs with minimal permissions. A scripting language like Ruby or Python would be easier, though.
If this task fell into my lap, I'd look into Python's virtualenv so that I have an isolated environment, and then be very careful about the permissions of the process that runs the uploaded programs.
This also means you could set up a separate Python environment for each user of the service.
As for running it periodically: yes, cron works.
As for a pseudo-language: a good answer doesn't really fit in this scope, but google DSL or Domain Specific Language and you're sure to find some tutorials.
If you're targeting PHP specifically you can use the runkit extension - specifically created to run user-supplied PHP code:
http://www.php.net/manual/en/intro.runkit.php
There's also a newer runkit project available (though you'll have to compile it manually):
https://github.com/zenovich/runkit/
Q1. What language could I use?
A1. Pretty much any. Because compilers would add to the complexity of the system, an interpreted (or JIT-compiled) language would be preferable.
Q2. How do I run users' code periodically? (I only know about cron jobs, but I don't know if they allow for long execution times.)
A2. Cron jobs are probably the way to go; cron doesn't care about execution times. However, that means it's your job to make sure you only start a new run once the prior run has finished (assuming that's what you want), as in the lock-file sketch below.
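A minimal sketch of that guard using PHP's flock() (the lock-file path is arbitrary):

$lock = fopen('/tmp/user-jobs.lock', 'c');
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0);   // the previous cron run still holds the lock
}
// ... run the submitted jobs here ...
flock($lock, LOCK_UN);

The non-blocking LOCK_NB flag makes the new run exit immediately instead of piling up behind a long-running predecessor.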
Q3. Would it be a good idea to create a pseudo-language? If so, please point me in the right direction.
A3. Reinventing the wheel is rarely a good idea. You could do this, but there is reasonable doubt that it is necessary or advisable.
My personal pointer would be towards JavaScript as the scripting language: since it is so widespread, there are tons of tools and documentation around. So you might want to look at Node.js and this sandboxing model to run it server-side.

Efficient PHP programming using exec

I am writing a PHP script that runs under Linux. In the script, I need to call many other system tools/programs using exec to achieve my goals. I know that whenever I run a shell command in a terminal, a new child process is created and runs alongside the parent. If I use many exec calls in my PHP script, there will be many processes spawned back and forth, which I assume is inefficient because processes are heavyweight.
So here is my question: what are the efficient ways and common patterns to approach this kind of programming goal on Linux? Is PHP ideal in such a situation?
Even though the overhead of using exec is more than that of a standard PHP function call, I would not consider it expensive at all. It is a pretty effective way of doing things, and as long as you keep security considerations in mind, I'd say there is nothing wrong with it.
You might ask whether premature optimization is worth the trouble; I'd say no.

Is it wise to use PHP for a daemon?

I wish to create a background process, and I have been told these are usually written in C or something of that sort. I have recently found out that PHP can be used to create a daemon, and I was hoping to get some advice on whether I should use PHP in this way.
Here are my requirements for a daemon.
1. Continuously check whether a row has been added to a MySQL database table
2. Run FFmpeg commands on what was retrieved from the database
3. Insert the output into a MySQL table
I am not sure what else I can offer to help make this decision. Just to add, I have not done C before. Only Java and PHP and basic bash scripting.
Does it even make that much of a performance difference?
Please allow for my ignorance, I am learning! :)
Thanks all
As others have noted, various versions of PHP have issues with their garbage collectors. Of course, if you know that your version does not have such issues, you eliminate that problem. The point is, you don't know (for sure) until you write the daemon and run it through valgrind to see if the installed PHP leaks or not on any given machine. So on that hand, you may write it just to discover that what Zend thinks is fixed might still be buggy, or you are dealing with a slightly older version of PHP or some extension. Icky.
The other problem is somewhat buggy signal handling. In my experience, signal handlers are not always entered correctly in PHP, especially when the signal is queued instead of merged. That may not be an issue for you, e.g. if you just need to handle SIGINT/SIGUSR1/SIGUSR2/SIGHUP.
So, I suggest:
If the daemon is simple, go ahead and use PHP. If it looks like it's going to get rather complex, or allocate lots of memory, you might consider writing it in C after prototyping it in PHP.
I am a pretty die hard C person. However, I see nothing wrong with hammering out something quick using PHP (beyond the cases that I explained). I also see nothing wrong with using PHP to prototype something that may or may not be later rewritten in C. For instance, handling database stuff is going to be much simpler if you use PHP, versus managing callbacks using other interfaces in C. So in that instance, for a 'one off', you will surely get it done much faster.
I would be inclined to perform this task with a cron job, rather than polling the database in a daemon.
It's likely that your FFmpeg command will take a while to do its thing, right? In that case, is it really necessary to be constantly polling the database? Wouldn't a cron job running every minute (or every five, ten, or twenty minutes for that matter) be a simpler way to achieve the same thing?
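For illustration, a hypothetical crontab entry along those lines (the paths are made up):

*/5 * * * * /usr/bin/php /path/to/process_queue.php >> /var/log/encoder.log 2>&1

Each run drains whatever is waiting in the table and exits, so the script always starts from a clean slate.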
PHP isn't any better or worse for this kind of thing than any of the other common scripting languages. It has fairly complete access to all of the system calls and library utilities you would need for this sort of work. If you are most comfortable scripting in PHP, then PHP will do the job for you.
The only downside is that PHP is not quite as ubiquitous as, say, Perl or Python, which are installed on almost every flavor of Unix. PHP tends to be found only on systems that serve dynamic web content. Not that a PHP interpreter is too large or costly to install, but if your biggest concern is getting your program onto many systems, that may be a slight hurdle.
I'll be contrary and recommend you try the PHP daemon. It's apparently the language you know best. You'll presumably incorporate a timer in any case, so you can control the querying frequency on the database. There's really no penalty as long as you aren't naively looping on a query.
If it's something that isn't executed frequently, you could alternatively run the PHP from cron, letting your code drain the queue and then die.
But don't be afraid to stick with what you know best, as a first approximation.
Try not to use triggers. They impose unnecessary coupling, and they're no fun to test and debug.
One problem with properly daemonizing a PHP script is that PHP doesn't have interfaces to the dup() or dup2() syscalls, which are needed for detaching the file descriptors.
A cron job would probably work just fine, if near-instant action is not required.
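For completeness, here is a hedged sketch of how far you can get with plain PHP using the pcntl and posix extensions; note that the standard descriptors can only be closed, not re-pointed, precisely because dup2() is missing:

$pid = pcntl_fork();
if ($pid < 0) exit(1);   // fork failed
if ($pid > 0) exit(0);   // parent exits, child carries on
posix_setsid();          // detach from the controlling terminal
fclose(STDIN);           // without dup2(), the best we can do is
fclose(STDOUT);          // close stdio rather than redirect it
fclose(STDERR);
while (true) {
    // poll the database, run FFmpeg, etc.
    sleep(10);
}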
I'm just about to put live a system I've built, based on the queueing daemon beanstalkd. I send various small messages from (in this case, PHP) webpage calls to the daemon, and a PHP script then picks them up from the queue and performs various tasks, such as resizing images or checking databases (often passing info back via a memcache-based store).
To avoid long-running processes, I've wrapped the worker in a bash script that restarts it after every (say) 50 tasks, depending on the value the script returns ("exit(1);"). If it's restarting because I planned it to, it restarts instantly; any other exit value (the default is 0, so I don't use that) pauses a few seconds before the restart.
Running as a cron job with sensibly determined periodicity, a PHP script can do the job, and production stability is certainly achievable. You might want to limit the number of simultaneous FFMpeg instances, and be sure to have complete application logging and exception handling. I have implemented continuously running polling processes in Java, as well as the every-ten-minute cron'd PHP script, and both do the job nicely.
You might want to consider making a MySQL trigger that executes a system command (i.e. FFmpeg) instead of a daemon. If some lag isn't a problem, you could also put something in cron that executes every few minutes to check. Cron would be my choice, if it is an option.
To answer your question, php is perfectly fine to run as a daemon. It does not have to be done in C.
If you combine the answers from Kent Fredric, tokenmacguy and Domster you get something useful.
PHP is probably not good for long execution times, so keep every execution cycle short and let the OS take care of cleaning up any memory leaks.
As a tool to start your PHP script, cron can serve well.
And if you do it like that, there is not much difference between the languages.
However, the question still stands: is PHP even capable of running as a normal daemon for a long time (some years), or will assorted memory leaks eat up all your RAM and kill the system?
/Johan
If you do so, pay attention to memory leaks. PHP 5.2 had some problems with its garbage collector, according to this (fixed in 5.3). It may be better to use cron, so the script starts clean on every run.
For what you've described, I would go with a daemon. Make sure that you stick a sleep in the poll loop, so that you don't bombard the database when there are no new tasks. A cronjob works better for workflow/report type of jobs, where there isn't some particular event that triggers the next run.
As mentioned, PHP has some problems with memory management. You need to be sure that you test your code for memory leaks, since these build up over time in a long-running script. PHP doesn't have real garbage collection; it relies on reference counting, which means that cyclic references will cause leaks. If you're aware of this, you can code around it.
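A minimal sketch of such a poll loop, with a sleep to spare the database and a memory guard against slow leaks (the table, columns, and threshold are all made up):

$db = new PDO('mysql:host=localhost;dbname=videos', 'user', 'pass');
while (true) {
    $row = $db->query("SELECT id, path FROM queue WHERE status = 'new' LIMIT 1")->fetch();
    if ($row) {
        exec('ffmpeg -i ' . escapeshellarg($row['path'])
           . ' ' . escapeshellarg($row['path'] . '.mp4'));
        $db->exec("UPDATE queue SET status = 'done' WHERE id = " . (int)$row['id']);
    } else {
        sleep(5);    // nothing to do; don't bombard the database
    }
    if (memory_get_usage() > 100 * 1024 * 1024) {
        exit(0);     // let a wrapper script or cron restart a fresh process
    }
}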
If you do decide to go down the daemon route, there is a great PEAR module called System_Daemon, which I've recently used successfully on a PHP 5.3.0 installation. It is documented on the author's blog: http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php
If you have PEAR installed, you can install this module using:
pear install -f System_Daemon
You will also need to create an init script: /etc/init.d/<your_daemon_name>
Then you can:
Start Daemon: /etc/init.d/projNotifMailDaemon start
Stop Daemon: /etc/init.d/projNotifMailDaemon stop
Logs are kept at: /var/log/<your_daemon_name>.log
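A rough bootstrap along the lines of that blog post (treat the option names and loop helpers as illustrative of the System_Daemon API, not a drop-in daemon):

require_once 'System/Daemon.php';

System_Daemon::setOption('appName', 'projNotifMailDaemon');
System_Daemon::setOption('appDir', dirname(__FILE__));
System_Daemon::start();                // fork, detach, write the pid file

while (!System_Daemon::isDying()) {
    // poll the database, kick off FFmpeg, etc.
    System_Daemon::iterate(2);         // sleep two seconds between passes
}
System_Daemon::stop();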
I wouldn't recommend it. PHP is not designed for long-term execution; it is designed primarily for short-lived pages.
In my experience PHP can have problems with leaking memory for some of the larger tasks.
A cron job and a little bit of bash scripting should be everything you need, by the sounds of it. You can do things like:
file=$(mysql -N -h server -e "SELECT file FROM table;" dbname)
ffmpeg -i "$file" -r 50 output.avi
So bash would be easier to write, port, and maintain, IMHO, than PHP.
If you know what you are doing, sure. You need to understand your operating system well. PHP generally isn't suited for most daemons because it isn't threaded and doesn't have a decent event-based system for all tasks. However, if it suits your needs, then no problem. Modern PHP (5.3+) is really stable and doesn't have any memory leaks, as long as you enable the GC and don't implement your own memory leaks, etc.
Here are the stats for one daemon I am running:
uptime: 17 days (last restart was due to a PHP upgrade)
bytes written: 200GB
connections: hundreds
connections handled: hundreds of thousands
items/requests processed: millions
Node.js is generally better suited, although it has some minor annoyances. Some attempts have been made to improve PHP in the same areas, but they aren't really that great.
Cron job? Yes.
Daemon which runs forever? No.
PHP does not have a garbage collector (or at least, last time I checked, it did not). Therefore, if you create a circular reference, it NEVER gets cleaned up, at least not until the main script execution finishes, which in a daemon process is approximately never.
If they've added a GC in new versions, then yes you can.
Go for it. I had to do it once also.
Like others said, it's not ideal but it'll get-er-done. Using Windows, right? Good.
If you only need it to run occasionally (once per hour, etc.):
Make a new shortcut to your Firefox and place it somewhere relevant.
Open up the properties for the shortcut, and change "Target" to:
"C:\Program Files\Mozilla Firefox\firefox.exe" http://localhost/path/to/script.php
Go to Control Panel > Scheduled Tasks
Point your new scheduled task at the shortcut.
If you need it to run constantly or pseudo-constantly, you'll need to spice the script up a bit.
Start your script with
set_time_limit(0);
ob_implicit_flush(true);
If the script uses a loop (like while) you have to clear the buffer:
$i = 0;
while ($i < count($my_array)) {
    // do stuff
    flush();
    ob_clean();
    sleep(17);
    $i++;
}

I need to program an algorithm to be very fast; should I do it as a PHP extension, or some other way?

Most of my application is written in PHP (front and back ends).
There is a part that works too slowly and I will need to rewrite it, probably not in PHP.
Which rewrite will give me the following:
1. The most speed
2. The fastest development
3. The easiest maintenance
I have it in my mind to rewrite this piece of code in C++ as a PHP extension, but maybe I'm locked into this solution and am missing some simpler/better options?
The algorithm is the PorterStemmerAlgorithm, run over several MB of data each time.
The answer really depends on what kind of process it is.
If it is a long-running process (at least seconds), then an external program written in C++ could be super easy. It would not have the complexities of a PHP extension, and its stability would not affect PHP/Apache. You could communicate over pipes, shared memory, or the like...
If it is a short running process (measured in ms) then you will most likely need to write a PHP extension. That would allow it to be invoked VERY fast with almost no per-call overhead.
Another possibility is a custom server that listens on a Unix domain socket and quickly responds when PHP asks it for information. Then your per-call overhead is basically creating a socket (not bad). The server could be in any language (C, C++, Python, Erlang, etc.), and the client could be a 50-line PHP class that uses the socket_*() functions.
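As a sketch of the client side of that last option (the socket path and one-word protocol are invented for illustration):

// ask a local stemming server for the stem of one word
$sock = socket_create(AF_UNIX, SOCK_STREAM, 0);
socket_connect($sock, '/tmp/stemmer.sock');
socket_write($sock, "running\n");
$stem = trim(socket_read($sock, 1024));
socket_close($sock);
echo $stem;   // e.g. "run"

The per-call overhead is a single socket connect, while the server process, and anything expensive it loaded at start-up, stays resident between requests.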
A lot of information needs to be evaluated before making this decision. PHP does not typically show slowdowns until you get into really tight loops or thousands of repeated function calls; in other words, the overhead of the HTTP request and network delays usually makes PHP's own delays insignificant (unless the above applies).
Perhaps there is a better way to write it in PHP?
Are you database bound?
Is it CPU bound, Network bound, or IO bound?
Can the result be cached?
Does a library already exist which will do the heavy lifting?
By committing to a custom PHP extension, you add significantly to the base of knowledge required to maintain it (even above C++). But it is a great option when necessary.
Feel free to update your question with more details, and I'm sure Stack Overflow will be happy to help out.
Suggestion
The PorterStemmerAlgorithm has a C implementation available at http://tartarus.org/~martin/PorterStemmer/c.txt
It should be an easy matter to tie this C program into your data sources and make it a stand-alone executable. Then you could simply invoke it from PHP with one of the proc functions, such as proc_open().
Unless you need to invoke this program many times per PHP request, this approach should save you the effort of building and integrating a PHP extension, not to mention that the hard work (in C) is already done.
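A hedged sketch of that invocation (the binary path is assumed, and in this illustration the stemmer reads words on stdin and writes stems to stdout):

$descriptors = [
    0 => ['pipe', 'r'],   // child's stdin
    1 => ['pipe', 'w'],   // child's stdout
];
$proc = proc_open('/usr/local/bin/stemmer', $descriptors, $pipes);
if (is_resource($proc)) {
    fwrite($pipes[0], $textToStem);   // $textToStem holds the several MB of input
    fclose($pipes[0]);
    $stemmed = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);
}

One process launch per request is usually negligible next to stemming several MB of text.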
I am not sure what the PorterStemmerAlgorithm involves. However, if you could make your process run in parallel and collect the results together, you could look at parallel processing, which is easily implemented in Java. I'm not sure how you would call it from PHP, but it would definitely be maintainable.
You can have a look at this framework; it looks simple to implement:
https://computefarm.dev.java.net/
Regards,
Franklin.
If you absolutely need to rewrite in a different language for speed reasons, then I think gahooa's answer covers the options nicely. However, before you do, are you absolutely sure you've done everything you can to improve the performance of the PHP implementation?
Is caching the output viable in your situation? Could you get away with running the algorithm once and caching the output rather than on every page load?
Have you tried profiling the code to ensure there's no unnecessary work being done (db queries in an inner loop and the like). Xdebug can help here.
Are there other stemming algorithms available which might perform better on your dataset?
