Multithreaded Programming in PHP to avoid runtime limitations

I know PHP is not multithreaded, but I talked with a friend about this: if I have a large algorithmic problem I want to solve with PHP, isn't the solution simply to use the "curl_multi_xxx" interface and start n HTTP requests against the same server? This is what I would call PHP-style multithreading.
Are there any problems with this in a typical webserver environment? The master request, which is waiting on "curl_multi_exec", shouldn't have any of that time counted against its maximum runtime or memory limit.
I have never seen this promoted anywhere as a way to keep a script from being killed by overly restrictive PHP admin settings.
If I add this as a feature to a popular PHP system, will server admins hire a Russian mafia hitman to get revenge for this hack?
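To illustrate, the pattern I have in mind is roughly this (just a sketch; worker.php and the request count are placeholders, not part of any real system):
$n = 4; // number of "threads" (placeholder)
$mh = curl_multi_init();
$handles = [];
for ($i = 0; $i < $n; $i++) {
    $ch = curl_init("http://localhost/worker.php?chunk=$i"); // hypothetical worker URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
// Drive all transfers concurrently until every request has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);
foreach ($handles as $ch) {
    $results[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);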

If I add this as a feature to a popular PHP system, will server admins hire a Russian mafia hitman to get revenge for this hack?
No, but it's still a terrible idea, if for no other reason than that PHP is meant to render web pages, not run big algorithms. I see people trying to do this in ASP.NET all the time. There are two proper solutions:
1. Have your PHP script spawn a process that runs independently of the web server and updates a common data store (probably a database) with information about the progress of the task, which your PHP scripts can access.
2. Have a constantly running daemon that checks for jobs in a common data store that the PHP scripts can issue jobs to and view the progress of currently running jobs.
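A minimal sketch of the first option (the long_task.php script, the jobs table, and $pdo are hypothetical names for illustration, not part of the original answer):
// Launch a detached worker process and remember a job id for it.
$jobId = uniqid('job_');
exec("nohup php /path/to/long_task.php $jobId > /dev/null 2>&1 &");

// The worker periodically runs something like:
//   UPDATE jobs SET progress = :pct WHERE id = :jobId
// ...so any later web request can read the progress back:
$stmt = $pdo->prepare("SELECT progress FROM jobs WHERE id = ?"); // $pdo: your PDO connection
$stmt->execute([$jobId]);
echo $stmt->fetchColumn();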

By using curl, you are adding a network timeout dependency into the mix. Ideally you would run everything from the command line to avoid timeout issues.
PHP does support forking (pcntl_fork). You can fork some processes and then monitor them with something like pcntl_waitpid. You end up with one "parent" process that monitors the children it spawned.
Keep in mind that while one process can start up, load everything, then fork, you can't share things like database connections, so each forked process should establish its own. I've used forking for up to 50 processes.
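A minimal sketch of the fork-and-monitor pattern described above (do_work() is a placeholder for the actual job):
$children = [];
for ($i = 0; $i < 5; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child: establish its own DB connection here (connections
        // cannot be shared across a fork), then do its slice of work.
        do_work($i); // do_work() is a placeholder
        exit(0);
    }
    $children[] = $pid; // parent records each child's PID
}
// Parent: wait for each child to finish.
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}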
If forking isn't available for your install of PHP, you can spawn a process as Spencer mentioned. Just make sure you spawn the process in such a way that it doesn't stop processing of your main script. You also want to get the process ID so you can monitor the spawned processes.
exec("nohup /path/to/php.script > /dev/null 2>&1 & echo $!", $output);
$pid = $output[0];
You can also use the above exec() setup to spawn a process started from a web page and get control back immediately.

Out of curiosity - what is your "large algorithmic problem" attempting to accomplish?
You might be better to write it as an Amazon EC2 service, then sell access to the service rather than the package itself.
Edit: you now mention "mass emails". There are already services that do this, they're generally known as "spammers". Please don't.

Lothar,
As far as I know, PHP doesn't work with services the way its competitors do, so you have no way for PHP to know how much time has passed unless you're constantly interrupting the process to check the elapsed time. So, IMO, no, you can't do that in PHP :)


Having a process run until completion after the browser is closed

I have a website, created using PHP and running on Apache. I want a subscriber to be able to log in and start a process on the server. They can then log out or close the browser without interrupting the process. Later they can log in and see the progress or see the results of the original process. What is the best way to accomplish this (having the process run until completion, after the browser is closed)?
Just looking for someone to point me in the right direction. A few people mentioned Gearman.
Gearman would be an ideal candidate, and I would use it for exactly the purpose you describe. It has everything you need out of the box to meet your requirements ("background" a long-running, CPU-bound process on another machine, e.g. video encoding).
There is a Gearman PHP library, but you can write your worker code in a different language if it's better suited to doing the work.
For reporting progress information, I recommend having the worker write to Redis or Memcached - some kind of temporary storage that your web server can also access.
Check out the simple PHP example on the Gearman site. For learning, I recommend setting up a lab environment that contains 3 separate VM's, one for your web server (the client), one for the Gearman job queue (the server) and another for processing jobs (the workers).
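As a rough sketch of the client/worker split described above (the 'encode_video' job name and server address are made up; the classes come from the pecl/gearman extension):
// client.php -- submits a job from the web request and returns immediately
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730); // the gearmand job server
$client->doBackground('encode_video', json_encode(['file' => 'in.avi']));

// worker.php -- runs on the worker machine, picking up jobs as they arrive
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('encode_video', function (GearmanJob $job) {
    $args = json_decode($job->workload(), true);
    // ...do the heavy work, writing progress to Redis/Memcached as it goes...
});
while ($worker->work()); // block, handling jobs forever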

Logical explanation of PHP background processes

I'm a moderate-to-good PHP programmer and have experience with terminal/shell scripts, but what I'm trying to wrap my head around is the logic behind background processes. I most certainly don't mean cron or cron jobs, but a continual flow of data.
I recently talked to someone who made a little web app that worked with the Twitter streaming API and Phirehose to gather tweets and save them to a DB. Now, that sounds simple, but all of this happens in the background as a process. What I'm (ineptly) used to is:
call process -> process finishes -> handle data from process.
What is so different about this is that it happens all the time, non-stop. I remember there was also some talk of a socket connection as well.
So my questions are:
When executing a background process, is it a continual loop of a specific function? That's all I can logically conclude. Or does it somehow "stay open" and just happen?
What does a socket connection do in this equation?
Is there any form of latency inherent in running this type of process?
I know this is not a "code specific" type of question, but I can't find much information on this topic.
With PHP, it's most likely that a cronjob is scheduled to execute the scripts once every hour or so. The script doesn't run continuously.
PHP has many ways of connecting to resources, most of these use sockets. If you do file_get_contents() to connect to a webserver, you're using sockets as well, you might just not notice it.
1. When executing a background process, is it a continual loop of a specific function? That's all I can logically conclude. Or does it somehow "stay open" and just happen?
No, there is no requirement for such a continual loop. A background process can just be invoked, run, and finish, like any other process; after that it no longer runs. That may not be useful for a background process, but it is possible. (A sketch of the more typical continual loop follows these three answers.)
2. What does a socket connection do in this equation?
Sockets are sometimes used to allow communication between different processes, also known as IPC (Inter-Process Communication).
3. Is there any form of latency inherent in running this type of process?
Yes, every form of indirection comes with a price. Additionally, if you run multiple processes in parallel, there is also some overhead for the computer system to manage these multiple processes (which it does anyway nowadays, but just saying, if there were only one process, there would be nothing to manage).
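To make the "continual loop" from question 1 concrete, a typical daemon body is just an endless loop with a sleep; fetch_pending_jobs() and process() below are hypothetical helpers:
set_time_limit(0); // CLI scripts have no limit by default, but be explicit
while (true) {
    $jobs = fetch_pending_jobs(); // hypothetical: read from a DB, queue, or socket
    foreach ($jobs as $job) {
        process($job); // hypothetical work function
    }
    sleep(1); // avoid spinning the CPU while idle
}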
If you want to take a tutorial on background processes:
http://thedjbway.b0llix.net/daemontools/blabbyd.html - really useful.
Daemontools makes it very easy to maintain background processes (daemons).

PHP script that works forever :)

I'm looking for ideas on how to do the following: I need a PHP script to perform a certain action for quite a long time. This is an extension for a CMS, so it can't be anything but PHP, and it can't be a command-line script because it will be used by ordinary people who only have the standard means of the CMS. One option is a cron job (most simple hostings have one) that triggers the script often, so that instead of working for a long time it performs the action step by step, preserving its state from one launch to the next. This is not perfect, but I can't think of any other solutions. If the script keeps redirecting to itself, the server will interrupt it. What other options would suit?
Thanks everyone in advance!
What you're talking about is a daemon: a long-running program that waits for calls from client programs, performs an action, provides a response, then keeps waiting for more calls.
You might be familiar with these in the form of Apache & MySQL ;) Anyway, PHP is generally OK in this regard: it has the ability to work over raw sockets as well as fork sub-processes to handle multiple requests simultaneously (see the sketch after this answer).
Having said that, PHP daemons are a tool where YMMV. Some folks will say they work great; other folks like me will say they have issues with interprocess communication and leaking memory, even amidst a plethora of unset() calls.
Anyway you likely won't be able to deploy a daemon of any type on a shared hosting environment. You'll need to get a better server package or stick with a Cron based solution.
Here's a link about writing a PHP daemon.
Also, one more note: daemons do crash from time to time, so you may still need to store state about what's going on, just in case someone trips over the power cord to your shared server :)
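As a sketch of the raw-sockets-plus-fork idea mentioned above (the port and response are made up; a real daemon would also reap its children with pcntl_waitpid):
$server = stream_socket_server('tcp://127.0.0.1:9000', $errno, $errstr);
if (!$server) {
    die("$errstr ($errno)\n");
}
while ($conn = stream_socket_accept($server, -1)) { // block until a client connects
    if (pcntl_fork() === 0) {
        // Child: handle this one client, then exit.
        fwrite($conn, "hello\n"); // placeholder response
        fclose($conn);
        exit(0);
    }
    fclose($conn); // parent closes its copy and loops back to accept
}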
I would also suggest that you think about making it a daemon, but if not, you can simply put
set_time_limit(0);       // no execution time limit
ignore_user_abort(true); // keep running even if the browser disconnects
at the top to tell it not to time out and not to get interrupted by anything. Then call it from cron to start it every day or whatever. I have this on many long daily processing tasks and it works great for me. However, it won't easily be able to talk to the outside world (other scripts can't query it or anything; if that is what you want, look into PHP services), so once you get it running, make sure it will stop, and have it print its progress to a logfile.
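A sketch of the cron-driven, step-by-step approach the question describes (the state file and the load_items()/process_item() helpers are made up):
// Cron runs this every few minutes; each run does one bounded chunk of work
// and saves its position, so no single execution hits the time limit.
set_time_limit(0);
ignore_user_abort(true);

$stateFile = '/tmp/task.state'; // hypothetical state file
$offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

$items = load_items($offset, 100); // hypothetical: fetch the next 100 items
foreach ($items as $item) {
    process_item($item); // hypothetical work
    $offset++;
}
file_put_contents($stateFile, $offset); // resume point for the next cron run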

What is a daemon? What are its practical uses? Usage with PHP?

Could someone explain to me, in a few words, what a daemon is and what use they have in PHP?
I know that it's a process which runs all the time.
But I can't understand the use of it in a PHP app.
Can someone please give examples of use?
Can I use a daemon to lessen the memory usage of my app?
As I understand it, a daemon can hold data and serve it on request, so basically I could store the most-used data there to avoid fetching it from MySQL for each visitor.
Or am I totally wrong? :)
Thanks ;)
A daemon is an endlessly running process which just waits for jobs. A webserver ("HTTP daemon") waits for requests to handle, a printer daemon waits for something to print, and so on. On Windows systems it's called a "service".
Whether you can use one for your application in some way depends heavily on your application and what you want the daemon to do. Either way, I don't recommend PHP for that.
Could someone explain to me, in a few words, what a daemon is and what use they have in PHP?
A CLI application or process.
I know that it's a process which runs all the time. But I can't understand the use of it in a PHP app.
You can use it to do jobs that are not visible to the user or the interface, e.g. cleaning up stale database data, or scheduled tasks that update part of the DB or a page in the background.
Can someone please give examples of use? Can I use a daemon to lessen the memory usage of my app?
I think Drupal has a cron script; perhaps checking it would help. Lessen memory? No: memory optimization always comes down to the application design and how the script is coded.
As I understand it, a daemon can hold data and serve it on request, so basically I could store the most-used data there to avoid fetching it from MySQL for each visitor.
No, a daemon is just a script; however, you can create a JSON or XML data file that the daemon script can process.
Please see this answer regarding the use of PHP for a daemon. There are times when you might want to fork a child process in PHP, perhaps to execute some query while the parent does other work and then inform the parent that the job as a whole can be completed.
I would not, however use PHP to set up a socket server or similar, nor would I use PHP in any other instance where execution was measured in units greater than seconds.
I don't want to discourage you from exploring and experimenting, just caution you against putting too much trust in an implementation that exceeds the capabilities of the language.
Because a daemon is just a process that runs in an infinite loop, whether or not a daemon can be helpful for your particular app is entirely up to the daemon and the requirements of your app.
MySQL is itself run as a daemon, but a typical way of decreasing the number of calls to MySQL is to cache their output in Memcached (which not surprisingly also runs as a daemon). So the advantage of using Memcached isn't that it's a daemon, it's that it's a daemon more geared to a specific task (caching objects) than MySQLd (providing a SQL-queryable database).
If your app repeatedly needs to make the same SQL queries, then it's definitely worth considering using Memcache or another caching layer (which, yes, will most likely be provided by a daemon) in between the app and MySQL.
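The cache-aside pattern described above, sketched with the pecl/memcached extension (the key, query, and TTL are arbitrary):
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key = 'popular_items'; // arbitrary cache key
$data = $mc->get($key);
if ($data === false) { // cache miss: fall back to MySQL, then cache the result
    // $pdo: your PDO connection (not shown)
    $data = $pdo->query('SELECT * FROM items LIMIT 10')->fetchAll();
    $mc->set($key, $data, 300); // keep for 5 minutes
}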

How to do this (PHP) in python or ruby?

My app takes a loooong list of URLs and splits it into X chunks (where X = $threads), so that I can start a thread.php for each chunk and assign its URLs to it. Each one then does GET and POST requests to retrieve data.
I am using this:
for($x=1;$x<=$threads;$x++){
$pid[] = exec("/path/bin/php thread.php <options> > /dev/null & echo \$!");
}
For "threading" (I know its not really threading, is it forking or what?), I save the pids into a file for later checking if N thread is running and to stop them.
Now I want to move away from PHP. I was thinking about using Python, because I'd like to learn more about it.
How can I achieve this kind of "threading" with Python (or Ruby)?
Or is there a better way to launch multiple background threads in Python or Ruby that run in parallel (at the same time)?
The threads don't need to communicate with each other or with a main thread; they are independent. They do HTTP requests and interact with a MySQL DB, and they may need to access/modify the same table entries (I haven't thought about this or how I will solve it yet).
The app works with "projects"; each project has a "max threads" variable, and I use a web interface to control it (so I could still use PHP for the interface [starting/stopping threads] in the new app).
I wanted to use
from threading import Thread
in Python, but I've been told those threads won't run in parallel, only one at a time.
The app is intended to run on Linux web servers.
Any suggestion will be appreciated.
For Python 2.6+, consider the multiprocessing module:
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
For Python 2.5, the same functionality is available via pyprocessing.
In addition to the example at the links above, here are some additional links to get you started:
multiprocessing Basics
Communication between processes with multiprocessing
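A minimal sketch of the fetch-in-parallel pattern with multiprocessing, in modern Python 3 syntax (the URL list and worker count are placeholders):
from multiprocessing import Pool
from urllib.request import urlopen

URLS = ["http://example.com/a", "http://example.com/b"]  # placeholder list

def fetch(url):
    # Runs in a separate process, so the GIL is not a bottleneck.
    with urlopen(url) as resp:
        return url, resp.status

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # four worker processes
        for url, status in pool.map(fetch, URLS):
            print(url, status)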
You don't want threading. You want a work queue like Gearman that you can send jobs to asynchronously.
It's worth noting that this is a cross-platform, cross-language solution. There are bindings for many languages (including Python and PHP) provided officially, and many more unofficially with a bit of work with Google.
The original intent is effectively load balancing, but it works just as well with only one machine. Basically, you can create one or more Workers that listen for Jobs. You can control the number of Workers and the types of Jobs they can listen for.
If you insert five Jobs into the queue at the same time, and there happen to be five Workers waiting, each Worker will be handed one of the Jobs. If there are more Jobs than Workers, the Jobs get handled sequentially. Your Client (the thing that submits Jobs) can either wait for all of the Jobs it's created to complete, or it can simply place them in the queue and continue on.
