Is it wise to use PHP for a daemon? - php

I wish to create a background process and I have been told these are usually written in C or something of that sort. I have recently found out PHP can be used to create a daemon and I was hoping to get some advice if I should make use of PHP in this way.
Here are my requirements for a daemon.
Continuously check if a row has been added to MySQL database table
Run FFmpeg commands on what was retrieved from database
Insert output into MySQL table
I am not sure what else I can offer to help make this decision. Just to add, I have not done C before. Only Java and PHP and basic bash scripting.
Does it even make that much of a performance difference?
Please allow for my ignorance, I am learning! :)
Thanks all

As others have noted, various versions of PHP have issues with their garbage collectors. Of course, if you know that your version does not have such issues, you eliminate that problem. The point is, you don't know (for sure) until you write the daemon and run it through valgrind to see if the installed PHP leaks or not on any given machine. So on that hand, you may write it just to discover that what Zend thinks is fixed might still be buggy, or you are dealing with a slightly older version of PHP or some extension. Icky.
The other problem is somewhat buggy signals. In my experience, signal handlers are not always entered correctly with PHP, especially when the signal is queued instead of merged. That may not be an issue for you, i.e. if you just need to handle SIGINT/SIGUSR1/SIGUSR2/SIGHUP.
So, I suggest:
If the daemon is simple, go ahead and use PHP. If it looks like it's going to get rather complex, or allocate lots of memory, you might consider writing it in C after prototyping it in PHP.
I am a pretty die hard C person. However, I see nothing wrong with hammering out something quick using PHP (beyond the cases that I explained). I also see nothing wrong with using PHP to prototype something that may or may not be later rewritten in C. For instance, handling database stuff is going to be much simpler if you use PHP, versus managing callbacks using other interfaces in C. So in that instance, for a 'one off', you will surely get it done much faster.

I would be inclined to perform this task with a cron job, rather than polling the database in a daemon.
It's likely that your FFmpeg command will take a while to do its thing, right? In that case, is it really necessary to be constantly polling the database? Wouldn't a cronjob running each minute (or every five, ten or twenty minutes for that matter) be a simpler way to achieve the same thing?

Php isn't any better or worse for this kind of thing than any of the other common scripting languages. It has fairly complete access to all of the system calls and library utilities you would need to do this sort of work. If you are most comfortable using PHP for scripting, then php will do the job for you.
The only down side is that php is not quite as ubiquitous as, say, perl or python, which is installed on almost every flavor of unix. Php is only found on systems that are going to be serving dynamic web content. Not that a Php interpreter is too large or costly to install also, but if your biggest concern is getting your program to many systems, that may be a slight hurdle.

I'll be contrary and recommend you try the php daemon. It's apparently the language you know the best. You'll presumably incorporate a timer in any case, so you can duplicate the querying frequency on the database. There's really no penalty as long as you aren't naively looping on a query.
If it's something not executed frequently, you could alternatively run the php from cron, letting your code drain the queue and then die.
But don't be afraid to stick with what you know best, as a first approximation.
Try not to use triggers. They'll impose unnecessary coupling, and they're no fun to test and debug.

One problem with properly daemonizing a PHP script is that PHP doesn't have interfaces to the dup() or dup2() syscalls, which are needed for detaching the file descriptors.

A cron-job would probably work just fine, if near-instant actions are not required.
I'm just about to put live, a system I've built, based on the queueing daemon 'beanstalkd'. I send various small messages from (in this case, PHP) webpage calls to the daemon, and a PHP script then picks them up from the queue and performs various tasks, such as resizing images or checking databases (often passing info back via a Memcache-based store).
To avoid long-running processes, I've wrapped it in a BASH script, that, depending on the value returned from the script ("exit(1);") will restart the script, for every (say) 50 tasks it's performed. If it's restarting because I plan it to, it will do so instantly, any other exit value (the default is 0, so I don't use that) would pause a few seconds before it was restarted.

Running as a cron job with sensibly determined periodicity, a PHP script can do the job, and production stability is certainly achievable. You might want to limit the number of simultaneous FFMpeg instances, and be sure to have complete application logging and exception handling. I have implemented continuously running polling processes in Java, as well as the every-ten-minute cron'd PHP script, and both do the job nicely.
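One way to keep cron runs from piling up while a long FFmpeg job is still going is a lock file. This is only a minimal sketch: the function names and the lock path are hypothetical, not taken from any answer here, and it caps the script at one running instance rather than some larger number.

```php
<?php
// Hypothetical helper: take an exclusive, non-blocking lock on $lockPath.
// Returns the open file handle on success, or null if the lock could not
// be taken (typically because an earlier cron run is still working).
function acquire_run_lock($lockPath)
{
    $handle = fopen($lockPath, 'c'); // create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

// Release a lock previously returned by acquire_run_lock().
function release_run_lock($handle)
{
    flock($handle, LOCK_UN);
    fclose($handle);
}
```

At the top of the cron'd script you would call acquire_run_lock() with a path of your choosing and simply exit if it returns null. The lock also goes away when the process dies, so a crashed run does not block the next one.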

You might want to consider making a mysql trigger that executes a system command (i.e. FFmpeg) instead of a daemon. If some lag isn't a problem, you could also put something in cron that executes every few minutes to check. Cron would be my choice, if it is an option.
To answer your question, php is perfectly fine to run as a daemon. It does not have to be done in C.

If you combine the answers from Kent Fredric, tokenmacguy and Domster you get something useful.
php is probably not good for long execution times,
so let's keep every execution cycle short and make sure the OS takes care of the cleanup of any memory leaks.
As a tool to start your php script cron can be a good tool.
And if you do it like that, there is not much difference between languages.
However, the question still stands.
Is php even capable to run as a normal daemon for long times (some years)?
Or will assorted memory leaks eat up all your RAM and kill the system?
/Johan

If you do so, pay attention to memory leaks. PHP 5.2 has some problems with its garbage collector, according to this (fixed in 5.3). Perhaps it's better to use cron, so the script starts clean every run.

For what you've described, I would go with a daemon. Make sure that you stick a sleep in the poll loop, so that you don't bombard the database when there are no new tasks. A cronjob works better for workflow/report type of jobs, where there isn't some particular event that triggers the next run.
As mentioned, PHP has some problems with memory management. You need to be sure that you test your code for memory leaks, since these would build up over time, in a long running script. PHP doesn't have real garbage collection - It relies on reference counting, which means that cyclic references will cause leaks. If you're aware of this, you can code around it.
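A poll loop with a sleep in it could look roughly like this. It is a sketch, not a finished daemon: poll_loop() and its parameters are made-up names, and the two callables stand in for your own "SELECT the next row" and "run FFmpeg and INSERT the output" code.

```php
<?php
// Hypothetical poll loop. $fetchJob returns the next job, or null when there
// is nothing to do; $handleJob processes one job. Both are supplied by the
// caller. $maxIterations bounds the loop so the process can exit cleanly and
// be restarted, which also limits the damage any leak can do.
function poll_loop(callable $fetchJob, callable $handleJob, $idleSleepSeconds = 5, $maxIterations = 1000)
{
    $handled = 0;
    for ($i = 0; $i < $maxIterations; $i++) {
        $job = $fetchJob();
        if ($job === null) {
            // Nothing new: back off instead of bombarding the database.
            sleep($idleSleepSeconds);
            continue;
        }
        $handleJob($job);
        $handled++;
    }
    return $handled; // number of jobs processed in this run
}
```

Passing the work in as callables keeps the loop itself trivial to test without a database, and lets you swap the polling query later without touching the loop.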

If you do decide to go down the daemon route, there is a great PEAR module called System_Daemon which I've recently used successfully on a PHP v5.3.0 installation. It is documented on the author's blog: http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php
If you have PEAR installed, you can install this module using:
pear install -f System_Daemon
You will also need to create an initialisation script: /etc/init.d/<your_daemon_name>
Then you can:
Start Daemon: /etc/init.d/projNotifMailDaemon start
Stop Daemon: /etc/init.d/projNotifMailDaemon stop
Logs are kept at: /var/log/<your_daemon_name>.log

I wouldn't recommend it. PHP is not designed for long-term execution. It's designed primarily for short-lived pages.
In my experience PHP can have problems with leaking memory for some of the larger tasks.

A cron job and a little bit of bash scripting should be everything you need by the sounds of it. You can do things like:
file=$(mysql -h server -N -B -e "select file from table;")
ffmpeg -i "$file" -r 50 output.a   # etc.
so bash would be easier to write, port and maintain IMHO than to use PHP.

If you know what you are doing sure. You need to understand your operating system well. PHP generally isn't suited for most daemons because it isn't threaded and doesn't have a decent event based system for all tasks. However if it suits your needs then no problem. Modern PHP (5.3+) is really stable and doesn't have any memory leaks. As long as you enable the GC and don't implement your own memory leaks, etc you'll be fine.
Here are the stats for one daemon I am running:
uptime 17 days (last restart due to PHP upgrade).
bytes written: 200GB
connections: hundreds
connections handled, hundreds of thousands
items/requests processed: millions
node.js is generally better suited although has some minor annoyances. Some attempts to improve PHP in the same areas have been made but they aren't really that great.

Cron job? Yes.
Daemon which runs forever? No.
PHP does not have a garbage collector (or at least, last time I checked it did not). Therefore, if you create a circular reference, it NEVER gets cleaned up - at least not until the main script execution finishes. In daemon process this is approximately never.
If they've added a GC in new versions, then yes you can.
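For what it's worth, PHP 5.3 and later do ship a cycle collector, and you can check on your own install that circular references get reclaimed. A small sketch (make_cycles() is just an illustrative helper name):

```php
<?php
// Build $count pairs of objects that point at each other. When the function
// returns, nothing outside refers to them any more, so plain reference
// counting alone would never free them.
function make_cycles($count)
{
    for ($i = 0; $i < $count; $i++) {
        $a = new stdClass();
        $b = new stdClass();
        $a->other = $b;
        $b->other = $a;
    }
}

gc_enable();                       // the cycle collector, available since PHP 5.3
make_cycles(100);
$collected = gc_collect_cycles();  // force a collection run and get its count
echo "collected: $collected\n";
```

If the printed number is greater than zero, the collector found and freed the orphaned cycles. In a long-running loop you could call gc_collect_cycles() every so often and log memory_get_usage() next to it to see whether memory is actually growing.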

Go for it. I had to do it once also.
Like others said, it's not ideal but it'll get-er-done. Using Windows, right? Good.
If you only need it to run occasionally (Once per hour, etc).
Make a new shortcut to your firefox, place it somewhere relevant.
Open up the properties for the shortcut, change "Target" to:
"C:\Program Files\Mozilla Firefox\firefox.exe" http://localhost/path/to/script.php
Go to Control Panel>Scheduled Tasks
Point your new scheduled task at the shortcut.
If you need it to run constantly or pseudo-constantly, you'll need to spice the script up a bit.
Start your script with
set_time_limit(0);
ob_implicit_flush(true);
If the script uses a loop (like while) you have to clear the buffer:
$i=0;
while($i<sizeof($my_array)){
//do stuff
flush();
ob_clean();
sleep(17);
$i++;
}

Related

Parallel processing in PHP using zeroMQ

Some background:
I am building a server application in php that will need to execute a number of independent tasks on a user request. There is a severe requirement on speed for my application so I would like to execute all of those tasks in parallel.
I've looked at several solutions (e.g gearman, rabbitMQ, zeroMQ) and I've decided to go with zeroMQ (fast, good docs, flexible, and doesn't require a broker). This solves the communication/sync problem between the threads for me.
Question:
I would like to initiate the tasks only when the server receives a request (not to have a long running process). So I receive a request -> start parallel computation -> return the result of the computation to the client. One solution for that seems to be pcntl_fork; however, the docs mention that there are some issues with using it in a production server env but don't really specify what they are.
My other option is to use proc_open, but I like it less because it would require me to serialize the inputs in some way, which seems less flexible and fast than forking. Does it have any advantages over pcntl_fork?
Is there another solution (still using php :p)?
Tread carefully, I see several red-flags in your question that lead me to believe you are concerned about things that maybe you don't need to be, and you probably aren't concerned with things you should be.
You say you have a severe requirement for speed - have you validated that normal single threaded PHP is not fast enough? Run any benchmarks, figured out your bottlenecks? If your speed requirement is that great, you might even consider using a different language, for all of PHP's charms it's never going to be the most efficient hammer in the toolbox. Java is a good option for all-out speed, and node.js is a good option if your bottlenecks are IO dependent. My main concern is that, absent more information, this question smells of premature optimization. This may be unfair and you may have omitted those details because it wasn't the heart of your question, but as an outsider I at least wanted to make sure that you think about these things if you haven't already.
You want to avoid long-running processes - why? There's nothing inherently wrong with long-running processes - but it does feel wrong when what you're used to is the pseudo-efficient "on-demand" nature of Apache+mod_php. Be sure you're not trying to avoid something just because you're not used to it.
What you seem to be describing is performing parallel processing from within your PHP web-app - just like any other web page you write, Apache initiates your PHP script, that script forks another process and, rather than performing its actions serially, performs them in parallel, completes and returns to the user at the completion of the page-render. If that's correct, then here is the answer to your original question:
You cannot use pcntl_fork from within a web process, only from the command line. The details of this are on the page you linked to, down in the comments:
It's not a matter of "should not", it's "can not". Even though I have compiled in PCNTL with --enable-pcntl, it turns out that it only compiles in to the CLI version of PHP, not the Apache module. [...] function_exists('pcntl_fork') was returning false even though it compiled correctly. It turns out it returns true just fine from the CLI, and only returns false for HTTP requests. The same is true of ALL of the pcntl_*() functions.
... which means that either you'll have to initiate your forking process as a separate long-running process, or you'll have to start it on demand with proc_open, there is no way to get it to work the way I assume you want it to.

debugging long running PHP script

I have php script running as a cron job, extensively using third party code. Script itself has a few thousands LOC. Basically it's the data import / treatment script. (JSON to MySQL, but it also makes a lot of HTTP calls and some SOAP).
Now, performance is degrading over time. When testing with a few records (around 100), performance is OK; it is done in 10-20 minutes. When running the whole import (about 1600 records), the mean import time per record grows steadily, and the whole thing takes more than 24 hours, so at least 5 times longer than expected.
Memory seems not to be a problem, usage growing as it should, without unexpected peaks.
So, I need to debug it to find the bottleneck. It can be some problem with the script, underlying code base, php itself, database, os or network part. I am suspecting for now some kind of caching somewhere which is not behaving well with a near 100 % miss ratio.
I cannot use XDebug, profile file grows too fast to be treatable.
So question is: how can I debug this kind of script?
PHP version: 5.4.41
OS: Debian 7.8
I can have root privileges if necessary, and install the tools. But it's the production server and ideally debugging should not be too disrupting.
Yes, it's possible, and you can use Kint (a PHP debugging script).
What is it?
Kint for PHP is a tool designed to present your debugging data in the absolutely best way possible.
In other words, it's var_dump() and debug_backtrace() on steroids. Easy to use, but powerful and customizable. An essential addition to your development toolbox.
Still lost? You use it to see what's inside variables.
Act as a debug_backtrace replacer, too
you can download here or Here
Total Documentations and Help is here
Plus, it also supports almost all php framework
CodeIgniter
Drupal
Symfony
Symfony 2
WordPress
Yii framework
Zend Framework
All the Best.... :)
There are three things that come to mind:
Set up an IDE so you can debug the PHP script line by line
Add some logging to the script
Look for long running queries in MySQL
Debug option #2 is the easiest. Since this is running as a cron job, you add a bunch of echo's in your script:
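<?php
// Print one timestamped log line; cron redirects stdout to the log file.
function log_message($type, $message) {
    // Function calls are not interpolated inside double-quoted strings,
    // so the line is built with sprintf() instead.
    echo sprintf("[%s, %s] %s\n", strtoupper($type), date('d-m-Y H:i:s'), $message);
}
log_message('info', 'Import script started');
// ... the rest of your script
log_message('info', 'Import script finished');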
Then pipe stdout to a log file in the cron job command.
01 04 * * * php /path/to/script.php >> /path/to/script.log
Now you can add log_message('info|warn|debug|error', 'Message here') all over the script and at least get an idea of where the performance issue lies.
Debug option #3 is just straight investigation work in MySQL. One of your queries might be taking a long time, and it might show up in a long running query utility for MySQL.
Profiling tool:
There is a PHP profiling tool called Blackfire which is currently in public beta. There is specific documentation on how to profile CLI applications. Once you collected the profile you can analyze the application control flow with time measurements in a nice UI:
Memory consumption suspicious:
Memory seems not to be a problem, usage growing as it should, without unexpected peaks.
A growing memory usage actually sounds suspicious! If the current dataset does not depend on all previous datasets of the import, then a growing memory most probably means, that all imported datasets are kept in memory, which is bad. PHP may also frequently try to garbage collect, just to find out that there is nothing to remove from memory. Especially long running CLI tasks are affected, so be sure to read the blog post that discovered the behavior.
Use strace to see what the program is basically doing from the system perspective. Is it hanging in IO operations etc.? strace should be the first thing you try when encountering performance problems with whatever kind of Linux application. Nobody can hide from it! ;)
If you find that the program hangs in network-related calls like connect, recvfrom and friends, meaning the network communication stalls at some point while connecting or waiting for responses, then you can use tcpdump to analyze this.
Using the above methods you should be able to find out most common performance problems. Note that you can even attach to a running task with strace using -p PID.
If the above methods don't help, I would profile the script using xdebug. You can analyse the profiler output using tools like KCachegrind.
Although it is not stipulated, and if my guess is correct you seem to be dealing with records one at a time, but in one big cron.
i.e. Grab a record#1, munge it somehow, add value to it, reformat it then save it, then move to record#2
I would consider breaking the big cron down. ie
Cron#1: grab all the records, and cache all the salient data locally (to that server). Set a flag if this stage is achieved.
Cron #2: Now you have the data you need, munge and add value, cache that output. Set a flag if this stage is achieved.
Cron #3: Reformat that data and store it. Delete all the files.
This kind of "divide and conquer" will ease your debugging woes, and lead to a better understanding of what is actually going on, and as a bonus give you the opportunity to rerun say, cron 2.
I've had to do this many times, and for me logging is the key to identifying weaknesses in your code, identify poor assumptions about data quality, and can hint at where latency is causing a problem.
I've run into strange slowdowns when doing network heavy efforts in the past. Basically, what I found was that during manual testing the system was very fast but when left to run unattended it would not get as much done as I had hoped.
In my case the issue I found was that I had default network timeouts in place and many web requests would simply time out.
In general, though not an external tool, you can use the difference between two microtime(TRUE) requests to time sections of code. To keep the logging small set a flag limit and only test the time if the flag has not been decremented down to zero after reducing for each such event. You can have individual flags for individual code segments or even for different time limits within a code segment.
$flag['name'] = 10; // How many times to fire
$slow['name'] = 0.5; // How long in seconds before it's a problem?
$start = microtime(TRUE);
do_something($parameters);
$used = microtime(TRUE) - $start;
if ( $flag['name'] && $used >= $slow['name'] )
{
logit($parameters);
$flag['name']--;
}
If you output what URL, or other data/event took to long to process, then you can dig into that particular item later to see if you can find out how it is causing trouble in your code.
Of course, this assumes that individual items are causing your problem and not simply a general slowdown over time.
EDIT:
I (now) see it's a production server. This makes editing the code less enjoyable. You'd probably want to make integrating with the code a minimal process having the testing logic and possibly supported tags/flags and quantities in an external file.
setStart('flagname');
// Do stuff to be checked for speed here
setStop('flagname',$moredata);
For maximum robustness the methods/functions would have to ensure they handled unknown tags, missing parameters, and so forth.
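To make that concrete, here is one way such helpers might look. This is only a sketch under my own assumptions: the SectionTimer class, the default threshold and log limit, and the log format are all invented for illustration, and error_log() is used simply because it needs no setup.

```php
<?php
// Hypothetical backing store for setStart()/setStop(). Per tag it keeps the
// start time, the "slow" threshold in seconds, and how many more slow
// events may still be logged.
final class SectionTimer
{
    private static $tags = [];

    public static function start($tag, $slowSeconds, $maxLogs)
    {
        if (!isset(self::$tags[$tag])) {
            self::$tags[$tag] = ['slow' => $slowSeconds, 'left' => $maxLogs];
        }
        self::$tags[$tag]['start'] = microtime(true);
    }

    // Returns true if the section was slow and got logged, false otherwise.
    public static function stop($tag, $moredata)
    {
        if (!isset(self::$tags[$tag]['start'])) {
            return false; // unknown tag, or stop without a matching start
        }
        $used = microtime(true) - self::$tags[$tag]['start'];
        unset(self::$tags[$tag]['start']);
        if (self::$tags[$tag]['left'] <= 0 || $used < self::$tags[$tag]['slow']) {
            return false; // log budget used up, or fast enough
        }
        self::$tags[$tag]['left']--;
        error_log(sprintf('[slow] %s took %.3fs %s', $tag, $used, json_encode($moredata)));
        return true;
    }
}

function setStart($tag, $slowSeconds = 0.5, $maxLogs = 10)
{
    SectionTimer::start($tag, $slowSeconds, $maxLogs);
}

function setStop($tag, $moredata = null)
{
    return SectionTimer::stop($tag, $moredata);
}
```

The threshold and log budget are fixed the first time a tag is seen, so later calls can use the short setStart('flagname') form. Unknown tags are ignored rather than raising errors, which seems the safer choice on a production box.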
xdebug_print_function_stack is an option, but what you can also do is to create a "function trace". There are three output formats. One is meant as a human readable trace, another one is more suited for computer programs as it is easier to parse, and the last one uses HTML for formatting the trace.
http://www.xdebug.org/docs/execution_trace
Okay, basically you have two possibilities - it's either the ineffective PHP code or ineffective MySQL code. Judging by what you say, it's probably inserting into indexed table a lot of records separately, which causes the insertion time to skyrocket. You should either disable indexes and rebuild them after insertion, or optimize the insertion code.
But, about the tools.
You can configure the system to automatically log slow MySQL queries:
https://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
You can also do the same with PHP scripts, but you need a PHP-FPM environment (and you probably have Apache).
https://rtcamp.com/tutorials/php/fpm-slow-log/
These tools are very powerful and versatile.
P.S. 10-20 minutes for 100 records seems like A LOT.
You can use https://github.com/jmartin82/phplapse to record the application activity for determinate time.
For example start recording after n iterations with:
phplapse_start();
And stop it in next iteration with:
phplapse_stop();
With this process you will have created a snapshot of the execution at the point where everything seems to run slowly.
(I'm the author of project, don't hesitate to contact with me to improve the functionality)
I have a similar thing running each night (a cron job to update my database). I have found the most reliable way to debug is to set up a log table in the database and regularly insert / update a json string containing a multi-dimensional array with info about each record and whatever useful info you want to know about each record. This way if your cron job does not finish you still have detailed information about where it got up to and what happened along the way. Then you can write a simple page to pull out the json string, turn it back into an array and print useful data onto the page including timing and passed tests etc. When you see something as an issue you can concentrate on putting more info from that area into the json string.
Regular "top" command can show you, if CPU usage by php or mysql is bottleneck. If not, then delays may be caused by http calls.
If CPU usage by mysqld is low, but constant, then it may be disk usage bottleneck.
Also, you can check your bandwidth usage by installing and using "speedometer", or other tools.

Running 30 php script at once in the background

I have a PHP script that must run 30 parallel times each with a different argument. What is the best way to do this so that each script can have as much even exposure to the processor as possible?
Problem description
As some other users are saying (me too), you should give a little more explanation (maybe code samples). For example, should these tasks run forever, or just once when the PHP script is called?
Message Queue
First off, I think running so many tasks at once should be avoided if possible; instead, schedule them (be gentle to the PC) with a message queue such as beanstalkd.
PHP solution
I don't think PHP is the right tool for your problem because it has no thread model. Threads are lightweight, while creating a new process is heavy. You could do it the way stroncium explains. In my opinion, running this code on a shared host will not be appreciated, because if all users ran long-running processes they would overutilize the server.
Quote from nettuts
There's no better resource than PHP's creator for knowing what PHP is capable of. Rasmus Lerdorf created PHP in 1995, and since then the language has spread like wildfire through the developer community, changing the face of the Internet. However, Rasmus didn't create PHP with that intent. PHP was created out of a need to solve web development problems.
However, you can't use PHP for everything. Lerdorf is the first to admit that PHP is really just a tool in your toolbox, and that even PHP has limitations.
Better language
Like I said previously I don't think PHP is the right tool.
Some languages which I think could solve the problem better:
java
python
C
Of course, many more languages that support a thread model are the right tool for the job, but PHP wasn't originally designed for tasks like this. Even the creator of PHP, Rasmus, confirms this. You can read about this on this list from nettuts, which I think has some pretty good points.
Google app engine
Last, I would advise you to have a look at the Task Queue API from Google App Engine, because this is also a really good option ;). I might even consider it the best option. You have a free quota and the costs are fair if you exceed the quota. The task queue uses webhooks, so the hooks could be coded in PHP.
PHP itself has no thread support. But you can just run a few copies of your script simultaneously by using popen() or proc_open().
Sometimes multi-curl is used for this purpose (when popen and the like are restricted).
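A rough sketch of the proc_open() route, in case it helps. run_parallel() is a name I made up; it starts every command before reading any output, so the children run side by side and the kernel schedules them.

```php
<?php
// Start every command in $commands at once, then collect each one's stdout.
// Keys of $commands are preserved in the result. Commands are shell strings,
// so escape any arguments with escapeshellarg() before passing them in.
function run_parallel(array $commands)
{
    $procs = [];
    $pipes = [];
    foreach ($commands as $key => $command) {
        $pipes[$key] = [];
        $proc = proc_open($command, [1 => ['pipe', 'w']], $pipes[$key]);
        if (is_resource($proc)) {
            $procs[$key] = $proc; // the child is now running in the background
        }
    }
    $results = [];
    foreach ($procs as $key => $proc) {
        // Blocks until this child closes its stdout; the others keep running.
        $output = stream_get_contents($pipes[$key][1]);
        fclose($pipes[$key][1]);
        $results[$key] = ['output' => $output, 'exit' => proc_close($proc)];
    }
    return $results;
}
```

For the 30-argument case you would build 30 command strings, for example 'php worker.php ' . escapeshellarg($arg) where worker.php is a placeholder for your own script, and hand them over in one call.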
I don't think its CPU affinity that you have to worry about (so much), its how I/O bound each process is bound (pardon the pun) to become.
If using a UNIX like operating system, you can try using the nice command to adjust for processes that you predict will be doing more disk / network / database access, but I don't think you'll see any significant speed up.
If all processes are going to handle the same amount of I/O, you are probably better off just letting the kernel's scheduler do its job.
A little more information regarding what your jobs are actually accomplishing would be extremely helpful.
If you run it CLI you can fork 29-30 child processes and run the code there. You can have one main process with open sockets to each child or serial link them if you want to. You'd mostly have to hope the kernel will balance the processes if they have the same priority.
Given the simplicity of the question, I suggest you look for the simplest answer. Off the top, I'd say you might consider using one instance looping through 30 arguments.

From PHP workers to Python threads

Right now I'm running 50 individual PHP workers (processes, in CLI mode) per machine that are waiting to receive their workload (job). For example, the job of resizing an image. In the workload they receive the image (binary data) and the desired size. The worker does its work and returns the resized image. Then it waits for more jobs (it loops in a smart way). I'm presuming that I have the same executable, libraries and classes loaded and instantiated 50 times. Am I correct? Because this does not sound very effective.
What I'd like to have now is one process that handles all this work and being able to use all available CPU cores while having everything loaded only once (to be more efficient). I presume a new thread would be started for each job and after it finishes, the thread would stop. More jobs would be accepted if there are less than 50 threads doing the work. If all 50 threads are busy, no additional jobs are accepted.
I am using a lot of libraries (for Memcached, Redis, MogileFS, ...) to have access to all the various components that the system uses and Python is pretty much the only language apart from PHP that has support for all of them.
Can Python do what I want and will it be faster and more efficient than the current PHP solution?
Most probably - yes. But don't assume you have to do multithreading. Have a look at the multiprocessing module. It already has an implementation of a Pool included, which is what you could use. And it basically solves the GIL problem (multithreading can run only 1 "standard python code" at any time - that's a very simplified explanation).
It will still fork a process per job, but in a different way than starting it all over again. All the initialisations done and libraries loaded before entering the worker process will be inherited in a copy-on-write way. You won't do more initialisations than necessary and you will not waste memory for the same library/class if you didn't actually make it different from the pre-pool state.
So yes - looking only at this part, python will be wasting less resources and will use a "nicer" worker-pool model. Whether it will really be faster / less CPU-abusing, is hard to tell without testing, or at least looking at the code. Try it yourself.
Added: If you're worried about memory usage, python may also help you a bit, since it has a "proper" garbage collector, while in php GC is not a priority and not that good (and for a good reason too).
Linux has shared libraries, so those 50 php processes use mostly the same libraries.
You don't sound like you even have a problem at all.
"this does not sound very effective." is not a problem description, if anything those words are a problem on their own. Writing code needs a real reason, else you're just wasting time and/or money.
Python is a fine language and won't perform worse than php. Python's multiprocessing module will probably help a lot too. But there isn't much to gain if the php implementation is not completely insane. So why even bother spending time on it when everything works? That is usually the goal, not a reason to rewrite ...
If you are on a sane operating system then shared libraries should only be loaded once and shared among all processes using them. Memory for data structures and connection handles will obviously be duplicated, but the overhead of stopping and starting the systems may be greater than keeping things up while idle. If you are using something like gearman it might make sense to let several workers stay up even if idle and then have a persistent monitoring process that will start new workers if all the current workers are busy up until a threshold such as the number of available CPUs. That process could then kill workers in a LIFO manner after they have been idle for some period of time.

I need to program an algorithm to be very fast; should I do it as a PHP extension, or some other way?

Most of my application is written in PHP (front and back ends).
There is a part that works too slowly and I will need to rewrite it, probably not in PHP.
What will give me the following:
1. Most speed
2. Fastest development
3. Easily maintained.
I have in mind to rewrite this piece of code in C++ as a PHP extension, but maybe I am locked onto this solution and am missing some simpler/better solutions?
The algorithm is PorterStemmerAlgorithm on several MB of data each time it is run.
The answer really depends on what kind of process it is.
If it is a long running process (at least seconds) then perhaps an external program written in C++ would be super easy. It would not have the complexities of a PHP extension and its stability would not affect PHP/apache. You could communicate over pipes, shared memory, or the like...
If it is a short running process (measured in ms) then you will most likely need to write a PHP extension. That would allow it to be invoked VERY fast with almost no per-call overhead.
Another possibility is a custom server which listens on a Unix Domain Socket and will quickly respond to PHP when PHP asks for information. Then your per-call overhead is basically creating a socket (not bad). The server could be in any language (c, c++, python, erlang, etc...), and the client could be a 50 line PHP class that uses the socket_*() functions.
A lot of information needs to be evaluated before making this decision. PHP does not typically show slowdowns until you get into really tight loops or thousands of repeated function calls. In other words, the overhead of the HTTP request and network delays usually make PHP delays insignificant (unless the above applies).
Perhaps there is a better way to write it in PHP?
Are you database bound?
Is it CPU bound, Network bound, or IO bound?
Can the result be cached?
Does a library already exist which will do the heavy lifting.
By committing to a custom PHP extension, you add significantly to the base of knowledge required to maintain it (even above C++). But it is a great option when necessary.
Feel free to update your question with more details, and I'm sure Stack Overflow will be happy to help out.
Suggestion
The PorterStemmerAlgorithm has a C implementation available at http://tartarus.org/~martin/PorterStemmer/c.txt
It should be an easy matter to tie this C program into your data sources and make it a stand alone executable. Then you could simply invoke it from PHP with one of the proc functions, such as proc_open()
Unless you need to invoke this program many times PER php request, then this approach should save you the effort of building and integrating a PHP extension, not to mention that the hard work (in c) is already done.
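As a sketch of what the PHP side might look like, assuming you have built the C code into an executable that reads text on stdin and writes the result to stdout (that executable and the run_filter() name are hypothetical):

```php
<?php
// Feed $input to an external command on stdin and return what it prints.
// The input goes through a temporary file rather than a pipe, so the parent
// never sits blocked writing several MB while the child is itself blocked
// writing its output back.
function run_filter($command, $input)
{
    $tmp = tempnam(sys_get_temp_dir(), 'stem');
    file_put_contents($tmp, $input);
    $spec = [
        0 => ['file', $tmp, 'r'],  // child's stdin reads the temp file
        1 => ['pipe', 'w'],        // child's stdout comes back to us
    ];
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        unlink($tmp);
        return false;
    }
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);
    unlink($tmp);
    return $output;
}
```

You would then call something like run_filter('/path/to/stemmer', $text) once per request. The process start-up cost is paid per call, which is why this fits the "few calls with lots of data" case better than many tiny calls.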
Am not sure about what the PorterStemmerAlgorithm is. However if you could make your process run in parallel and collect the information together , you could look at parallel running processes easily implemented in JAVA. Not sure how you could call it in PHP, but definitely maintainable.
You can have a look at this framework. Looks simple to implement
https://computefarm.dev.java.net/
Regards,
Franklin.
If you absolutely need to rewrite in a different language for speed reasons then I think gahooa's answer covers the options nicely. However, before you do, are you absolutely sure you've done everything you can to improve the performance of the PHP implementation?
Is caching the output viable in your situation? Could you get away with running the algorithm once and caching the output rather than on every page load?
Have you tried profiling the code to ensure there's no unnecessary work being done (db queries in an inner loop and the like). Xdebug can help here.
Are there other stemming algorithms available which might perform better on your dataset?
