PHP thread pool? - php

I have scheduled a CRON job to run every 4 hours which needs to gather user accounts information.
Now I want to speed things up and to split the work between several processes and to use one process to update the MySQL DB with the retrieved data from other processes.
In Java I know there is a thread pool to which I can dedicate some threads to do the work.
How do I do this in PHP?
Any advice is welcome.
Thanks

PHP is probably not the most suitable language for multi-threading.
You might want to look at different solutions. For example, Thrift allows you to have a PHP front-end talking to a Java back-end, where you could easily implement your desired behaviour.
If you still want to do this in PHP, you might want to have a look at:
http://www.php.net/pcntl
http://www.electrictoolbox.com/article/php/process-forking/

PHP and Threads (these 2 words) cannot go together in the same sentence. PHP does not offer thread support. You can try the pcntl forking mechanisms or asynchronous processing, which in your case is not helpful.
You can use a workload distribution mechanism that might be what you want by having a look at Gearman (suggest you google it).
As described by others "it is a distributed forking machine" that can offer the workload distribution that you are looking for in order "to speed things up".
regards,

As others have said, forking processes is easier than spawning threads with PHP. But why do you think that having a single dedicated thread to write the results back to the database is a good idea? Although this is slightly simpler to do with threads than with processes, it's still a complex overhead which doesn't seem to add any value to the overall objective.
Indeed, it's a lot simpler to start up several instances of the script (with some parameter to partition the data) from cron rather than initiating a fork from within the PHP code, and not to bother with any bottleneck for recording the data back into the database.
C.
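The "several instances with a partition parameter" idea could look roughly like the sketch below. The function name accountsForShard, the script name gather.php and the ID list are all made up for illustration. Each instance is told its own index and the total number of instances, and only handles the account IDs that fall into its share.

```php
<?php
// Hypothetical helper: pick the account IDs this instance is responsible for.
// $shard is this instance's 0-based index, $total is the number of instances.
function accountsForShard(array $accountIds, int $shard, int $total): array
{
    if ($total < 1 || $shard < 0 || $shard >= $total) {
        throw new InvalidArgumentException('need 0 <= shard < total');
    }
    // Keep only the IDs whose remainder modulo $total equals this shard.
    return array_values(array_filter(
        $accountIds,
        function (int $id) use ($shard, $total): bool {
            return $id % $total === $shard;
        }
    ));
}

// Assumed CLI usage: php gather.php <shard> <total>
if (PHP_SAPI === 'cli' && isset($argv[1], $argv[2])) {
    $allIds = range(1, 20); // stand-in for the real list of account IDs
    foreach (accountsForShard($allIds, (int) $argv[1], (int) $argv[2]) as $id) {
        echo "would gather account $id\n";
    }
}
```

With four cron entries running the same script as "0 4", "1 4", "2 4" and "3 4", every account is covered exactly once, and each instance can write its own results straight to the database.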

You can also check out this article that shows how to simulate threading, including a thread pool manager using async HTTP calls, and a web server:
http://w-shadow.com/blog/2008/05/24/improved-thread-simulation-class-for-php/

You can fork new processes in PHP too: pcntl_fork()
BTW, is that script running longer than 4 hours? Otherwise I see no reason to complicate it with thread or process management.

Nice process pool of Arbow on github
With modification mentioned here

Check these posts -
* http://www.alternateinterior.com/2007/05/multi-threading-strategies-in-php.html
* http://www.electrictoolbox.com/article/php/process-forking/
Basically you need to share data between processes and, as I see it, you will probably need to write to some file first. Then fetch it using the main process (make it an AJAX-polling type of process) and write to the DB.

Related

Run functions from PHP library concurrently making HTTP requests, without using curl multi

I want to use Google translate's v3 PHP library to translate some text into a bunch of different languages. There may be workarounds (though none ideal that I know of), but I'm also just trying to learn.
I wanted to use multiple calls to translateText, one call per target language. However, to make things faster, I would need to do these requests concurrently, so I was looking into some concurrency options. I wanted to use calls to translateText instead of constructing a bunch of curl requests manually using curl multi.
I tried the first code example I found from one of the big concurrency libraries I've seen recommended, amphp. I used the function parallelMap, but I'm getting timeout errors when creating processes. I'd guess that I'm probably forking out too many processes at a time.
I'd love to learn if there is an easy way to do concurrency in PHP without having to make a bunch of decisions about how many processes to have running at a time, whether I should use threads vs processes, profiling memory usage, and what PHP thread extension is even any good / if the one I've heard of called "parallel" may be discontinued (as suggested in a comment here).
Any stack overflow post I've found so far is just a link to one giant concurrency library or another that I don't want to have to read a bunch of documentation for. I'd be interested to hear how concurrency like this is normally done / what options there are. I've found many people claim that processes aren't much slower than threads these days, and I can't find quick Google answers as to whether they take a lot more memory than threads. I'm not even positive that a lack of memory is my problem, but it probably is. There has been more complexity involved than I would have expected.
I'm wondering how this is normally handled.
Do people normally just use a pool of workers (processes by default, or threads using, say the "parallel" PHP extension) and set a max number of processes to run at a time to make concurrent requests?
If in a hurry, do people just kind of pick a number that isn't very optimized for how many worker processes to use?
It would be nice if the number of workers to use was set dynamically for you based on how much RAM was available or something, but I guess that's not realistic, since the amount of RAM available can quickly change.
So, would you just need to set up a profiler to see how much ram one worker process/thread uses, or otherwise just make some sort of educated guess as to how many worker processes/threads to use?

Multithreading in Php

I understand that PHP does not support multithreading, but I would love to know if there is a good workaround for executing several functions in PHP concurrently. I wrote some code that calculates moments of invariance. There are seven functions, one per moment, and each one takes longer to execute than the one before it. Any suggestions welcomed. Thanks
It seems Gearman is what you need. There is also a PHP extension for it.
Also take a look at the pcntl_fork function.
I generally use this to spawn children from a worker. Then I use the main thread to watch the children and handle harvesting dead children and spawning new ones.
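A minimal sketch of that pattern follows, assuming the pcntl extension is available (CLI on a Unix-like system). runPool and its parameters are illustrative names, not part of any library. The parent forks up to $maxChildren children, waits for one to exit whenever the pool is full, and records each child's exit code.

```php
<?php
// Sketch of a fork-based worker pool. Requires the pcntl extension (CLI only).
// runPool() is a hypothetical helper; $work receives one job and returns an exit code.
function runPool(array $jobs, int $maxChildren, callable $work): array
{
    $running = [];   // pid => job key
    $statuses = [];  // job key => child exit code (-1 if killed by a signal)

    // Block until one child exits, then record its exit code.
    $reap = function () use (&$running, &$statuses): void {
        $pid = pcntl_wait($status);
        if ($pid <= 0) {
            throw new RuntimeException('pcntl_wait failed');
        }
        if (isset($running[$pid])) {
            $statuses[$running[$pid]] = pcntl_wifexited($status)
                ? pcntl_wexitstatus($status)
                : -1;
            unset($running[$pid]);
        }
    };

    foreach ($jobs as $key => $job) {
        while (count($running) >= $maxChildren) {
            $reap(); // pool is full: harvest a dead child first
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child process: do the work, then leave without returning to the loop.
            exit((int) $work($job));
        }
        $running[$pid] = $key; // parent: remember which job this child runs
    }
    while ($running) {
        $reap(); // wait for the remaining children
    }
    return $statuses;
}
```

Children inherit the parent's open resources, so in practice each child would usually open its own database connection rather than reuse one created before the fork.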
Leaving aside invoking new processes via fork, most modern operating systems (even including MSWindows) have the facility to spawn non-blocking processes from the shell, albeit that the syntax varies. So you could use the various program execution functions to invoke them.
Another approach would be to split the functionality into multiple URLs (probably restricting access to localhost) and then use curl_multi_exec() to invoke them from a controlling script (note that this is likely to be less efficient than running them as separate processes, which in turn will be less efficient than running via fork).
However, any discussion of how to shard a process across multiple threads / processes is predicated on the question of whether the process itself is shardable, and also whether sharding will improve performance. I'll leave those questions to you.

Recommended way to manage persistent PHP script processes?

First off - hello, this is my first Stack Overflow question so I'll try my best to communicate properly.
The title of my question may be a bit ambiguous so let me expand upon it immediately:
I'm planning a project which involves taking data inputs from several "streaming" APIs, Twitter being one example. I've got a basic script coded up in PHP which runs indefinitely from the command line, taking input from the Twitter streaming API and doing very basic things with it.
My eventual objective is to have several such processes running (perhaps daemonized using the System Daemon PEAR class), and I would like to be able to manage them from some governing process (also a PHP script). By manage I mean basic operations such as stop/start and (most crucially) automatically restarting a process that crashes.
I would appreciate any pointers on how best to approach this process management angle. Again, apologies if this question is too expansive - tips on more tightly focused lines of enquiry would be appreciated if necessary. Thanks for reading and I look forward to your answers.
I think the recommended way of having persistent processes is not doing it in PHP at all ;)
But here are some related questions, it looks like some of the feedback contains some good thoughts and experience in doing this.
Is it wise to use PHP for a daemon?
Run php script as daemon process (especially 2nd and 3rd answer)
PHP Daemon/worker environment
More in the search.
This doesn't work in the context of running a persistent PHP script, but Cron is really handy for running scripts at different times and at different intervals. Instead of having a PHP script that is running constantly, stopping and starting various other scripts, you could run them all using Cron at a suitable interval.
The way you want to do it is possible but will be complex and relatively difficult to maintain.
If you look at it in a different way, instead of streaming continuously, you could be chunking in data at regular intervals. Technically it's still streaming, especially with feeds like Twitter.
If some feed is pumping out data in real time and you might miss some of it in between runs, then maybe this is not an option for you.
It's far easier to manage processes that start and stop and which handle small amounts of data. They could all be checking a database for control data and updating the status. Also, using cron is a real pleasure.
That's how I would do it these days.

Running 30 php script at once in the background

I have a PHP script that must run 30 parallel times each with a different argument. What is the best way to do this so that each script can have as much even exposure to the processor as possible?
Problem description
As some other users have said (and I agree), you should give a little more explanation (maybe code samples). For example, should these tasks run forever, or just once when the PHP script is called?
Message Queue
First off, I think that if possible you should avoid running so many tasks at once and instead schedule them (be gentle to the machine) with a message queue such as beanstalkd.
PHP solution
I don't think PHP is the right tool for your problem because it has no thread model. Threads are lightweight, while creating a new process is heavy. You could do it the way stroncium explains. In my opinion, running this code on a shared host will not be appreciated, because if all users ran long-running processes they would over-utilize the server.
Quote from Nettuts
There's no better resource than PHP's creator for knowing what PHP is capable of. Rasmus Lerdorf created PHP in 1995, and since then the language has spread like wildfire through the developer community, changing the face of the Internet. However, Rasmus didn't create PHP with that intent. PHP was created out of a need to solve web development problems.
However, you can't use PHP for everything. Lerdorf is the first to admit that PHP is really just a tool in your toolbox, and that even PHP has limitations.
Better language
Like I said previously I don't think PHP is the right tool.
Some languages which I think could solve the problem better:
java
python
C
Of course, many more languages that support a thread model are the right tool for the job, but PHP wasn't originally designed for tasks like this. Even Rasmus, the creator of PHP, confirms this. You can read about it in this list from Nettuts, which I think has some pretty good points.
Google app engine
Lastly, I would advise you to have a look at the Task Queue API from Google App Engine, because this is also a really good option ;). I might even consider it the best option. You have a free quota, and the costs are fair if you exceed it. The task queue uses webhooks, so the hooks could be coded in PHP.
PHP itself has no thread support. But you can run several copies of your script simultaneously by using popen() or proc_open().
Sometimes multi-curl is used for this purpose (when popen and the like are restricted).
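As a rough sketch of the proc_open() route: the helper below starts one child PHP process per argument, lets them all run at the same time, and then collects each child's output. The name runInParallel is made up, and the sketch assumes it is run from the CLI so that PHP_BINARY points at the php executable.

```php
<?php
// Sketch: run one copy of a worker script per argument using proc_open().
// runInParallel() is a hypothetical helper; $script is the path to the worker script.
function runInParallel(string $script, array $arguments): array
{
    $procs = [];
    $stdout = [];
    foreach ($arguments as $key => $arg) {
        $cmd = escapeshellarg(PHP_BINARY) . ' ' . escapeshellarg($script)
             . ' ' . escapeshellarg((string) $arg);
        // Only stdout is captured; stdin and stderr are inherited from the parent.
        $proc = proc_open($cmd, [1 => ['pipe', 'w']], $pipes);
        if (!is_resource($proc)) {
            throw new RuntimeException("could not start: $cmd");
        }
        $procs[$key] = $proc;
        $stdout[$key] = $pipes[1];
    }

    // Every child is already running at this point; now read and reap them in turn.
    $results = [];
    foreach ($procs as $key => $proc) {
        $results[$key] = stream_get_contents($stdout[$key]);
        fclose($stdout[$key]);
        proc_close($proc);
    }
    return $results;
}
```

For the question above, this would be called with the script path and an array of the 30 arguments. Scheduling between the children is left to the operating system.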
I don't think it's CPU affinity that you have to worry about (so much); it's how I/O-bound each process is bound (pardon the pun) to become.
If using a UNIX like operating system, you can try using the nice command to adjust for processes that you predict will be doing more disk / network / database access, but I don't think you'll see any significant speed up.
If all processes are going to handle the same amount of I/O, you are probably better off just letting the kernel's scheduler do its job.
A little more information regarding what your jobs are actually accomplishing would be extremely helpful.
If you run it CLI you can fork 29-30 child processes and run the code there. You can have one main process with open sockets to each child or serial link them if you want to. You'd mostly have to hope the kernel will balance the processes if they have the same priority.
Given the simplicity of the question, I suggest you look for the simplest answer. Off the top, I'd say you might consider using one instance looping through 30 arguments.

Is it wise to use PHP for a daemon?

I wish to create a background process and I have been told these are usually written in C or something of that sort. I have recently found out PHP can be used to create a daemon and I was hoping to get some advice if I should make use of PHP in this way.
Here are my requirements for a daemon.
Continuously check if a row has been added to MySQL database table
Run FFmpeg commands on what was retrieved from database
Insert output into MySQL table
I am not sure what else I can offer to help make this decision. Just to add, I have not done C before. Only Java and PHP and basic bash scripting.
Does it even make that much of a performance difference?
Please allow for my ignorance, I am learning! :)
Thanks all
As others have noted, various versions of PHP have issues with their garbage collectors. Of course, if you know that your version does not have such issues, you eliminate that problem. The point is, you don't know (for sure) until you write the daemon and run it through valgrind to see if the installed PHP leaks or not on any given machine. So on that hand, you may write it just to discover that what Zend thinks is fixed might still be buggy, or you are dealing with a slightly older version of PHP or some extension. Icky.
The other problem is somewhat buggy signals. In my experience, signal handlers are not always entered correctly with PHP, especially when the signal is queued instead of merged. That may not be an issue for you, i.e. if you just need to handle SIGINT/SIGUSR1/SIGUSR2/SIGHUP.
So, I suggest:
If the daemon is simple, go ahead and use PHP. If it looks like it's going to get rather complex, or allocate lots of memory, you might consider writing it in C after prototyping it in PHP.
I am a pretty die hard C person. However, I see nothing wrong with hammering out something quick using PHP (beyond the cases that I explained). I also see nothing wrong with using PHP to prototype something that may or may not be later rewritten in C. For instance, handling database stuff is going to be much simpler if you use PHP, versus managing callbacks using other interfaces in C. So in that instance, for a 'one off', you will surely get it done much faster.
I would be inclined to perform this task with a cron job, rather than polling the database in a daemon.
It's likely that your FFmpeg command will take a while to do its thing, right? In that case, is it really necessary to be constantly polling the database? Wouldn't a cronjob running each minute (or every five, ten or twenty minutes for that matter) be a simpler way to achieve the same thing?
PHP isn't any better or worse for this kind of thing than any of the other common scripting languages. It has fairly complete access to all of the system calls and library utilities you would need to do this sort of work. If you are most comfortable using PHP for scripting, then PHP will do the job for you.
The only downside is that PHP is not quite as ubiquitous as, say, Perl or Python, which are installed on almost every flavor of Unix. PHP is only found on systems that are going to be serving dynamic web content. Not that a PHP interpreter is too large or costly to install, but if your biggest concern is getting your program onto many systems, that may be a slight hurdle.
I'll be contrary and recommend you try the php daemon. It's apparently the language you know the best. You'll presumably incorporate a timer in any case, so you can duplicate the querying frequency on the database. There's really no penalty as long as you aren't naively looping on a query.
If it's something not executed frequently, you could alternatively run the PHP script from cron, letting your code drain the queue and then die.
But don't be afraid to stick with what you know best, as a first approximation.
Try not to use triggers. They'll impose unnecessary coupling, and they're no fun to test and debug.
One problem with properly daemonizing a PHP script is that PHP doesn't have interfaces to the dup() or dup2() syscalls, which are needed for detaching the file descriptors.
A cron job would probably work just fine, if near-instant action is not required.
I'm just about to put live, a system I've built, based on the queueing daemon 'beanstalkd'. I send various small messages from (in this case, PHP) webpage calls to the daemon, and a PHP script then picks them up from the queue and performs various tasks, such as resizing images or checking databases (often passing info back via a Memcache-based store).
To avoid long-running processes, I've wrapped it in a BASH script, that, depending on the value returned from the script ("exit(1);") will restart the script, for every (say) 50 tasks it's performed. If it's restarting because I plan it to, it will do so instantly, any other exit value (the default is 0, so I don't use that) would pause a few seconds before it was restarted.
Running as a cron job with sensibly determined periodicity, a PHP script can do the job, and production stability is certainly achievable. You might want to limit the number of simultaneous FFmpeg instances, and be sure to have complete application logging and exception handling. I have implemented continuously running polling processes in Java, as well as the every-ten-minute cron'd PHP script, and both do the job nicely.
You might want to consider making a mysql trigger that executes a system command (i.e. FFmpeg) instead of a daemon. If some lag isn't a problem, you could also put something in cron that executes every few minutes to check. Cron would be my choice, if it is an option.
To answer your question, php is perfectly fine to run as a daemon. It does not have to be done in C.
If you combine the answers from Kent Fredric, tokenmacguy and Domster you get something useful.
PHP is probably not good for long execution times,
so let's keep every execution cycle short and make sure the OS takes care of cleaning up any memory leaks.
Cron can be a good tool for starting your PHP script.
And if you do it like that, there is not much difference between languages.
However, the question still stands.
Is PHP even capable of running as a normal daemon for a long time (some years)?
Or will assorted memory leaks eat up all your RAM and kill the system?
/Johan
If you do so, pay attention to memory leaks. PHP 5.2 has some problems with its garbage collector, according to this (fixed in 5.3). Perhaps it's better to use cron, so the script starts clean every run.
For what you've described, I would go with a daemon. Make sure that you stick a sleep in the poll loop, so that you don't bombard the database when there are no new tasks. A cronjob works better for workflow/report type of jobs, where there isn't some particular event that triggers the next run.
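The "sleep in the poll loop" advice can be sketched as below. pollLoop, $fetchTasks and $handle are placeholder names: the first callable stands in for the database query and the second for the FFmpeg step. The $maxIdlePolls limit exists only so the example can terminate, since a real daemon would loop until told to stop.

```php
<?php
// Sketch of a polling loop that sleeps only when there is nothing to do.
// $fetchTasks returns an array of pending tasks (empty when idle);
// $handle processes a single task. Both are placeholders for real code.
function pollLoop(callable $fetchTasks, callable $handle, int $idleSeconds, int $maxIdlePolls): int
{
    $handled = 0;
    $idlePolls = 0;
    while ($idlePolls < $maxIdlePolls) {
        $tasks = $fetchTasks();
        if (!$tasks) {
            // Nothing new: back off instead of hammering the database.
            $idlePolls++;
            sleep($idleSeconds);
            continue;
        }
        $idlePolls = 0;
        foreach ($tasks as $task) {
            $handle($task);
            $handled++;
        }
    }
    return $handled;
}
```

Because the loop polls again immediately after doing work, a backlog is drained quickly, while an empty table costs only one query per $idleSeconds.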
As mentioned, PHP has some problems with memory management. You need to be sure that you test your code for memory leaks, since these would build up over time, in a long running script. PHP doesn't have real garbage collection - It relies on reference counting, which means that cyclic references will cause leaks. If you're aware of this, you can code around it.
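To make the cyclic-reference point concrete, here is a small sketch. Node is a made-up class. Each call to makeCycle() leaves behind two objects that still point at each other, so reference counting alone never frees them. On PHP 5.3 and later, which other answers here mention, the cycle collector can reclaim them, and gc_collect_cycles() forces a collection run.

```php
<?php
// Sketch: a reference cycle that plain reference counting cannot free.
// Node is a hypothetical class used only for this illustration.
class Node
{
    public $peer = null;
}

function makeCycle(): void
{
    $a = new Node();
    $b = new Node();
    $a->peer = $b;
    $b->peer = $a;
    // $a and $b go out of scope here, but each object is still referenced by the other.
}

gc_enable();          // make sure the cycle collector is on (PHP 5.3+)
gc_collect_cycles();  // start from a clean slate

for ($i = 0; $i < 100; $i++) {
    makeCycle();
}

// Force a collection run; the return value is greater than zero if cycles were freed.
$collected = gc_collect_cycles();
echo "collected: $collected\n";
```

In a long-running script, calling gc_collect_cycles() after each batch of work, or breaking cycles by hand (for example by setting $a->peer = null), is one way to code around the problem.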
If you do decide to go down the daemon route, there is a great PEAR module called System_Daemon which I've recently used successfully on a PHP v5.3.0 installation. It is documented on the author's blog: http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php
If you have PEAR installed, you can install this module using:
pear install -f System_Daemon
You will also need to create an initialisation script: /etc/init.d/<your_daemon_name>
Then you can:
Start Daemon: /etc/init.d/projNotifMailDaemon start
Stop Daemon: /etc/init.d/projNotifMailDaemon stop
Logs are kept at: /var/log/<your_daemon_name>.log
I wouldn't recommend it. PHP is not designed for long-term execution. It's designed primarily for short-lived pages.
In my experience PHP can have problems with leaking memory for some of the larger tasks.
A cron job and a little bit of bash scripting should be everything you need by the sounds of it. You can do things like:
file=$(mysql -h server -N -e "select file from table;")
ffmpeg -i "$file" -r 50 output.a etc.
so bash would be easier to write, port and maintain IMHO than to use PHP.
If you know what you are doing, sure. You need to understand your operating system well. PHP generally isn't suited for most daemons because it isn't threaded and doesn't have a decent event-based system for all tasks. However, if it suits your needs then no problem. Modern PHP (5.3+) is really stable and doesn't have any memory leaks. As long as you enable the GC and don't implement your own memory leaks, etc., you'll be fine.
Here are the stats for one daemon I am running:
uptime 17 days (last restart due to PHP upgrade).
bytes written: 200GB
connections: hundreds
connections handled: hundreds of thousands
items/requests processed: millions
node.js is generally better suited although has some minor annoyances. Some attempts to improve PHP in the same areas have been made but they aren't really that great.
Cron job? Yes.
Daemon which runs forever? No.
PHP does not have a garbage collector (or at least, last time I checked it did not). Therefore, if you create a circular reference, it NEVER gets cleaned up - at least not until the main script execution finishes. In daemon process this is approximately never.
If they've added a GC in new versions, then yes you can.
Go for it. I had to do it once also.
Like others said, it's not ideal but it'll get-er-done. Using Windows, right? Good.
If you only need it to run occasionally (once per hour, etc.):
Make a new shortcut to your firefox, place it somewhere relevant.
Open up the properties for the shortcut, change "Target" to:
"C:\Program Files\Mozilla Firefox\firefox.exe" http://localhost/path/to/script.php
Go to Control Panel>Scheduled Tasks
Point your new scheduled task at the shortcut.
If you need it to run constantly or pseudo-constantly, you'll need to spice the script up a bit.
Start your script with
set_time_limit(0);
ob_implicit_flush(true);
If the script uses a loop (like while) you have to clear the buffer:
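foreach ($my_array as $item) {
    // do stuff with $item
    flush();                 // push pending output to the client
    if (ob_get_level() > 0) {
        ob_clean();          // clear PHP's output buffer, if one is active
    }
    sleep(17);               // pause before the next item
}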
