I am trying to create a multi threaded PHP application right now. I have read lots of paper that explains how to create multi threading. All of those examples are built on diving the processes on different worker PHP files. Actualy that is also what I am trying to do but there is a problem :)
There are too many jobs even to divide
in 30 seconds (which is the execution time limit)
We are using multi server environment on local network to complete the processes as the processes do not linked to each other or shares the same memory. We just need to fire them up and let them work at an exact time. Each of the processes works for 0.5 secs but it has a possibility to work for 30 secs.
Most of the examples fires up the PHP's and waits for the results. But unfortunately in my situation I dont need to expect a result from the thread. I just need it to execute the command and write the result to its own database.
How can I achieve to fire up the phps and wait for them to work for 10000 processes ?
ADDITIONAL INFO:
I know that PHP neither have multi threading feature nor is built for. But we have to create a way to use it for instance we can send request to http://server1/dothis.php?jobid=5 but standart methods makes us wait for the result. If we can manage to send request to this server without waiting for result it would solve our problem I think or we will need completely different approach such as a process divider with c++ or qt.
As has been pointed out, php doesn't support multi threading. However, and as tomaszsobczak mentioned, there is a library which will let you create "threads" and leave them running, and reconnect to them through other scripts to check their status and so on, called "Gearman".
From the project homepage: "Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates."
Rasmus' blog has a great write up about it here:
playing with gearman and for your case, it might just be the solution, although I've not read any in depth test cases... Would be interested to know though, so if you end up using this, please report back!
As the comments say, multi-threading is not possible in PHP. But based on your comment:
If we can manage to send request to this server without waiting for result it would solve our problem I think
You can start a PHP script to run in the background using exec(), redirecting the script's output somewhere else (e.g. /dev/null). I think that's the best you will get. From the manual:
Note: If a program is started with this function, in order for it to continue running in the background, the output of the program must be redirected to a file or another output stream. Failing to do so will cause PHP to hang until the execution of the program ends.
There are several notes and pointers in the User Contributed Comments, for example this snippet that allows background execution on both Windows and Linux platforms.
Of course, that PHP script will not share the state, or any data, of the PHP script you are running. You will have to initialize the background script as if you are making a completely new request.
Since PHP does not have the ability to support multithreading, I don't quite know how to advise you. Each time a new script is loaded, a new instance of PHP is loaded, so if your server can handle X many PHP instances, then you can do what you want.
Here is something however:
Code for background execution
<?php
function execInBackground($cmd) {
if (substr(php_uname(), 0, 7) == "Windows"){
pclose(popen("start /B ". $cmd, "r"));
}
else {
exec($cmd . " > /dev/null &");
}
}
?>
Found at http://php.net/manual/en/function.exec.php#86329
Do you really want to have multi-threading in php?
OR do you just want to execute a php script every second? For the latter case, a cronjob-like "execute this file every second" approach via linux console tools should be enough.
If your task is to make a lot of HTTP requests you can use curl multi. There is a good library for doing it: http://code.google.com/p/rolling-curl/
As everyone's already mentioned, PHP doesn't natively support multi-threading and the workarounds are, well, workarounds...
That said, have you heard of the Facebook PHP Compiler? Basically it compiles your PHP to highly optimized C++ and uses g++ to compile it. This opens up a world of opportunities, including but not limited to multi-threading!
The project is open-source and its on github
if you just want to post a HTTP request, just do it using PHP CURL lib. It will solve your issue.
Related
I'm looking for some ideas to do the following. I need a PHP script to perform certain action for quite a long time. This is an extension for a CMS and this can't be anything else but PHP. It also can't be a command line script because it should be used by common people that will have only the standard means of the CMS. One of the options is having a cron job (most simple hostings have it) that will trigger the script often so that instead of working for a long time it could perform the action step by step preserving its state from one launch to the next one. This is not perfect but I can't see of any other solutions. If the script will be redirecting to itself server will interrupt it. What other options can suit?
Thanks everyone in advance!
What you're talking about is a daemon or long running program that waits for calls by client programs, performs and action, provides a response then keeps on waiting for more calls.
You might be familiar w/ these in the form of Apache & MySQL ;) Anyway PHP is generally OK in this regard, it does have the ability to function over raw sockets as well as fork sub-processes to handle multiple requests simultaneously.
Having said that PHP daemons are a tool where YMMV. Some folks will say they work great, other folks like me will say they have issues w/ interprocess communication and leaking memory even amidst plethora unset() calls.
Anyway you likely won't be able to deploy a daemon of any type on a shared hosting environment. You'll need to get a better server package or stick with a Cron based solution.
Here's a link about writing a PHP daemon.
Also, one more note. Daemons do crash from time to time and therefore you may still need to store state about whats going on, just in case someone trips over the power cord to your shared server :)
I would also suggest that you think about making it a daemon but if not then you can simply use
set_time_limit(0);
ignore_user_abort(true);
at the top to tell it not to time out and not to get interrupted by anything. Then call it from the cron to start it every day or whatever. I have this on many long processing daily tasks and it works great for me. However, it won't be able to easily talk to the outside world (other scripts can't query it or anything -- if that is what you want look into php services) so once you get it running make sure it will stop and have it print its progress to a logfile.
I am looking for the PHP equivalent for VB doevents.
I have written a realtime analysis package in VB and used doevents to release to the operating system.
Doevents allows me to stay in memory and run continuously without filling up memory and allows me to respond to user input.
I have rewritten the package in PHP and I am looking for that same doevents feature.
If it doesn't exist I could reschedule myself and exit.
But I currently don't know how to do that and I think that would add a lot more overhead.
Thank you, gerardg
usleep is what you are looking for.. Delays program execution for the given number of micro seconds
http://php.net/manual/en/function.usleep.php
It's been almost 10 years since I last wrote anything in VB and as I recall, doevents() function allowed the application to yield to the processor during intensive processing (usually to allow other system events to fire - the most common being WM_PAINT so that your UI won't appear hung).
I don't think PHP has such functionality - your script will run as a single process and end (either when it's done or when it hits the default 30 second timeout).
If you are thinking in terms of threads (as most Windows programmers tend to do) and needing to spawn more than 1 instance of your script, perhaps you should take look at PHP's Process Control functions as a start.
I'm not entirely sure which aspects of doevents you're looking to emulate, so here's pretty much everything that could be useful for you.
You can use ob_implicit_flush(true) at the top of your script to enable implicit output buffer flushing. That means that whenever your script calls echo or print or whatever you use to display stuff, PHP will automatically send it all to the user's browser. You could also just use ob_flush() after each call to display something, which acts more like Application.DoEvents() in VB with regards to keeping your UI active, but must be called each time something is output.
Naturally if your script uses the output buffer already, you could build a copy of the buffer before flushing, with ob_get_contents().
If you need to allow the script to run for more time than usual, you can set a longer tiemout with set_time_limit($time). If you need more memory, and you have access to edit your .htaccess file, place the following code and edit the value:
php_value memory_limit 64M
That sets the memory limit to 64 megabytes.
For running multiple scripts at once, you can use pcntl_exec to start another one running.
If I am missing something important about DoEvents(), let me know and I will try to help you make it work.
PHP is designed for asynchronous on demand processing. However it can be forced to become a background task with a little hackery.
As PHP is running as a single thread you do not have to worry about letting the CPU do other things as that is already taken care of. If this was not the case then a web server would only be able to serve up one page at a time and all other requests would have to sit in a queue. You will need to write some sort of look that never expires until some detectable condition happens (like the "now please exit" message you set in the DB or something).
As pointed out by others you will need to set_time_limit($something); with perhaps usleep stopping the code from running "too fast" if it eats very much CPU each loop. However if you are also using a Database connection most of your script time is actually the script waiting for the Database (by far the biggest overhead for a script).
I have seen PHP worker threads created by using screen and detatching it to a background task. Other approaches also work so long as you do not have a session that will time out or exit (say when the web browser is closed). A cron that starts a script to check if the script is running every x mins or hours gives you automatic recovery from forced exists and/or system restarts.
TL;DR: doevents is "baked in" to PHP and you don't have to worry about it.
I'm trying to build a web interface for some python scripts. The thing is I have to use PHP (and not CGI) and some of the scripts I execute take quite some time to finish: 5-10 minutes. Is it possible for PHP to communicate with the scripts and display some sort of progress status? This should allow the user to use the webpage as the task runs and display some status in the meantime or just a message when it's done.
Currently using exec() and on completion I process the output. The server is running on a Windows machine, so pcntl_fork will not work.
LATER EDIT:
Using another php script to feed the main page information using ajax doesn't seem to work because the server kills it (it reaches max execution time, and I don't really want to increase this unless necessary)
I was thinking about socket based communication but I don't see how is this useful in my case (some hints, maybe?
Thank you
You want inter-process communication. Sockets are the first thing that comes to mind; you'd need to set up a socket to listen for a connection (on the same machine) in PHP and set up a socket to connect to the listening socket in Python and send it its status.
Have a look at this socket programming overview from the Python documentation and the Python socket module's documentation (especially the examples at the end). I'm sure PHP has similar resources.
Once you've got an more specific idea of what you want to build and need help, feel free to ask a new question on StackOverflow (if it isn't already answered).
I think you would have to use a meta refresh and maybe have the python write the status to a file and then have the php read from it.
You could use AJAX as well to make it more dynamic.
Also, probably shouldn't use exec()...that opens up a world of vulnerabilities.
You could use a queuing service like Gearman, with a client in PHP and a worker in Python or vice versa.
Someone has created an example setup here.
https://github.com/dbaltas/gearman-python-worker
Unfortunately my friend, I do believe you'll need to use Sockets as you requested. :( I have little experience working with them, but This Python Tutorial on Sockets/Network Programming may help you get the Python socket interaction you need. (Beau Martinez's links seem promising as well.)
You'd also need to get some PHP socket connections, too, so it can request the status.
Continuing on that, my thoughts would be that your Python script is likely going to run in a loop. Ergo, I'd put the "Check for a status request" check inside the beginning of a part of that loop. It'd reply one status, while a later loop inside that script would reply with an increased status.. etc.
Good luck!
Edit: I think that the file writing recommendation from Thomas Schultz is probably the easiest to implement. The only downside is waiting for the file to be opened-- You'll need to make sure your PHP and Python scripts don't hang or return failure without trying again.
I know about PHP not being multithreaded but i talked with a friend about this: If i have a large algorithmic problem i want to solve with PHP isn't the solution to simply using the "curl_multi_xxx" interface and start n HTTP requests on the same server. This is what i would call PHP style multithreading.
Are there any problems with this in the typical webserver environment? The master request which is waiting for "curl_multi_exec" shouldn't count any time against its maximum runtime or memory length.
I have never seen this anywhere promoted as a solution to prevent a script killed by too restrictive admin settings for PHP.
If i add this as a feature into a popular PHP system will there be server admins hiring a russian mafia hitman to get revenge for this hack?
If i add this as a feature into a
popular PHP system will there be
server admins hiring a russian mafia
hitman to get revenge for this hack?
No but it's still a terrible idea for no other reason than PHP is supposed to render web pages. Not run big algorithms. I see people trying to do this in ASP.Net all the time. There are two proper solutions.
Have your PHP script spawn a process
that runs independently of the web
server and updates a common data
store (probably a database) with
information about the progress of
the task that your PHP scripts can
access.
Have a constantly running daemon
that checks for jobs in a common
data store that the PHP scripts can
issue jobs to and view the progress
on currently running jobs.
By using curl, you are adding a network timeout dependency into the mix. Ideally you would run everything from the command line to avoid timeout issues.
PHP does support forking (pcntl_fork). You can fork some processes and then monitor them with something like pcntl_waitpid. You end up with one "parent" process to monitor the children it spanned.
Keep in mind that while one process can startup, load everything, then fork, you can't share things like database connections. So each forked process should establish it's own. I've used forking for up 50 processes.
If forking isn't available for your install of PHP, you can spawn a process as Spencer mentioned. Just make sure you spawn the process in such a way that it doesn't stop processing of your main script. You also want to get the process ID so you can monitor the spawned processes.
exec("nohup /path/to/php.script > /dev/null 2>&1 & echo $!", $output);
$pid = $output[0];
You can also use the above exec() setup to spawn a process started from a web page and get control back immediately.
Out of curiosity - what is your "large algorithmic problem" attempting to accomplish?
You might be better to write it as an Amazon EC2 service, then sell access to the service rather than the package itself.
Edit: you now mention "mass emails". There are already services that do this, they're generally known as "spammers". Please don't.
Lothar,
As far as I know, php don't work with services, like his concorrent, so you don't have a way for php to know how much time have passed unless you're constantly interrupting the process to check the time passed .. So, imo, no, you can't do that in php :)
Greetings All!
I am having some troubles on how to execute thousands upon thousands of requests to a web service (eBay), I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000 - 10,000 requests every minute to every 5 minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make a API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) execute the multi curl session
So you can imagine 1,000-10,000 GET requests occurring...
This was ok, around 100-200 requests where occurring in about a minute or two, however, only 100-200 of the 1,000 items actually processed, I am thinking that i'm hitting some sort of Apache or MySQL limit?
But this does add latency, its almost like performing a DoS attack on myself.
I'm wondering how you would handle this problem? What if you had to make 10,000 web service requests and 10,000 MySQL updates from the return data from the web service... And this needs to be done in at least 5 minutes.
I am using PHP and MySQL with the Zend Framework.
Thanks!
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As suggested by grossvogel, you need to use many processes to speed things up because the script is spending most of it's time waiting for a response.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup /path/to/script.php >> /tmp/logfile 2>&1 & echo $!'), $processId);
You can pass parameters (getopt) to the php script on the command line to tell it which "batch" to process. You can have the master script do a sleep/check cycle to see if the scripts are still running by checking for the process id's. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are and (b) use batching and cacheing wherever possible.
Mysqli allows multiple-statement queries, so you could definitely batch those database updates.
The http requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, maybe you want a single master script to shell out to a bunch of individual processes, each of which makes an api call and stores the results in a file or memcached. The master can periodically read the results and update the db. (Careful to rotate the data store for safe reading and writing by multiple processes.)
To understand your requirements better, you must implement your solution only in PHP? Or you can interface a PHP part with another part written in another language?
If you could not go for another language, try to perform this update maybe as php script that runs in the background and not through the apache.
You can follow Brent Baisley advice for a simple use case.
If you want to build a robuts solution, then you need to :
set up a representation of the actions in a table in database that will be your process queue;
set up a script that pop this queue and process your action;
set up a cron daemon that run this script every x.
This way you can have 1000 PHP scripts running, using your OS parallelism capabilities and not hanging when ebay is taking to to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting :
the number of request one PHP script does;
the order / number / type / priority of the action in the queue;
the number or scripts the cron daemon runs.
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely, rather than executing the sub-processes using CURL like i did before, the forking takes a massive load off, it also nicely gets around the issues with max out my apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be setup with multiple processes.
I have created a system that resemebles the one that you are describing. It's running in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because i do not have to have any communication between the processes. Everything resides in a database so each process is spawned with the full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
Unfortunately it does not appear to be a widespread idea to use PHP for this type of application, and i really had to write wrappers for the low level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that i'm able to fork off actual PHP code, not execute other programs.