I have to call a php function wich takes one second to response, in a "for" loop :
for ($i=0; $i<count($wsdlTab); $i++)
{
$serverLoadTab[$i] = $this->getServerLoad($wsdlTab[$i]);
}
My problem is that I would like to call my getServerLoad($wsdlTab[$i]) function simultaneous for each row of my $wsdlTab[$i], to not have to wait one second on each loop.
That is the reason why I need to call that function in a thread.
I have seen various ways to "emulate" threads, but I have not found any way with my limitations :
I have to get the return value of my getServerLoad($wsdlTab[$i]), and put in in an array
The Apache server is on Windows
Thanks in advance for your responses.
You should check out Gearman for parallel processing: http://gearman.org/
PHP doesn't really have asynchronous or threading built-in, as you've discovered.
What I might do in a case like this is separate the script I need to execute in a parallel, putting it into its own, small and self-contained PHP file. Then I'd execute that in a separate thread, storing the result somewhere I could monitor in the original thread. Once all the scripts have returned and filled the results, or with some given timeout, I would then continue with processing.
So, for instance,
prepareResults(); // something like clearing a db row or zeroing out a file or whatever
for ($i=0; $i<count($wsdlTab); $i++) {
exec('./doServerLoad.php ' . $wsdlTab[$i] . ' &');
}
while (!waitingForResults()) { // checking the results table/row/file
}
$serverLoadTab = parseTheResults();
May be request WSDLs in multi threads/forks using pcntl_fork() or curl_multi, and save them to the local drive. Then just parse them:
$soapClient = new SoapClient('WSDLSTemp/wsdl1.wsdl');
// .. do whatever you want...
Related
I have sript php with three function like this:
public function a($html,$text)
{
//blaa
return array();
}
public function b($html,$text){
//blaa
return array();
}
public function c($html,$text){
//blaa
return array();
}
require_once 'simple_html_dom.php';
$a=array();
$html=new simple_html_dom();
$a=$this->a($html,$text);
$b=$this->b($html,$text);
$c=$this->c($html,$text);
$html->clear();
unset($html);
$a=array_merge($a, $c);
$a=array_merge($a, $b);
a($html,$text) takes 5 seconds before giving a result
b($html,$text) takes 10 seconds before giving a result
c($html,$text) takes 12 seconds before giving a result
Thus the system takes 27 seconds before geving me a result, but I want take my result in 12 seconds. I can't use threads because my hosting does not support threads. How can I solve this problem?
PHP does not support this out of the box. If you really want to do this, you have two basic options (yep, it's going to be dirty). If you want a serious solution depending on your actual use-case, there is another option to consider.
Option 1: Use some AJAX-trickery
Create a page with a button that triggers three AJAX-calls to the different functions that you want to call.
Option 2: Run a command
If you're on UNIX, you can trigger a command from the PHP script to run a PHP script (php xyz.php) and that actually runs it on a different thread.
Serious option: use queues
Seriously: use a queue system like rabbitMQ or BeanstalkD to do these kind of things. Laravel supports it out of the box.
If the wait time is caused by blocking IO (waiting for server response) then curl_multi might help.
From the code you posted, though, it doesn't look like is your problem.
It looks more like simple html dom is taking a long time to parse your html. That's not too surprising because it's not a very good library. If this is the case you should consider switching to DomXPath.
You might wanna look into jQuery deferred objects.... $.when should handle this kinda of situation.
I have to scrap a web site where i need to fetch multiple URLs and then process them one by one. The current process somewhat goes like this.
I fetch a base URL and get all secondary URLs from this page, then for each secondary url I fetch that URL, process found page, download some photos (which takes quite a long time) and store this data to database, then fetch next URL and repeat the process.
In this process, I think I am wasting some time in fetching secondary URL at the start of each iteration. So I am trying to fetch next URLs in parallel while processing first iteration.
The solution in my mind is, from main process call a PHP script, say downloader, which will download all the URL (with curl_multi or wget) and store them in some database.
My questions are
How to call such downloder asynchronously, I don't want my main script to wait till downloder completes.
Any location to store downloaded data, such as shared memory. Of course, other than database.
There any chances that data gets corrupt while storing and retrieving, how to avoid this?
Also, please guide me know if anyone have a better plan.
When I hear someone uses curl_multi_exec it usually turns out they just load it with, say, 100 urls, then wait when all complete, and then process them all, and then start over with the next 100 urls... Blame me, I was doing so too, but then I found out that it is possible to remove/add handles to curl_multi while something is still in progress, And it really saves a lot of time, especially if you reuse already open connections. I wrote a small library to handle queue of requests with callbacks; I'm not posting full version here of course ("small" is still quite a bit of code), but here's a simplified version of the main thing to give you the general idea:
public function launch() {
$channels = $freeChannels = array_fill(0, $this->maxConnections, NULL);
$activeJobs = array();
$running = 0;
do {
// pick jobs for free channels:
while ( !(empty($freeChannels) || empty($this->jobQueue)) ) {
// take free channel, (re)init curl handle and let
// queued object set options
$chId = key($freeChannels);
if (empty($channels[$chId])) {
$channels[$chId] = curl_init();
}
$job = array_pop($this->jobQueue);
$job->init($channels[$chId]);
curl_multi_add_handle($this->master, $channels[$chId]);
$activeJobs[$chId] = $job;
unset($freeChannels[$chId]);
}
$pending = count($activeJobs);
// launch them:
if ($pending > 0) {
while(($mrc = curl_multi_exec($this->master, $running)) == CURLM_CALL_MULTI_PERFORM);
// poke it while it wants
curl_multi_select($this->master);
// wait for some activity, don't eat CPU
while ($running < $pending && ($info = curl_multi_info_read($this->master))) {
// some connection(s) finished, locate that job and run response handler:
$pending--;
$chId = array_search($info['handle'], $channels);
$content = curl_multi_getcontent($channels[$chId]);
curl_multi_remove_handle($this->master, $channels[$chId]);
$freeChannels[$chId] = NULL;
// free up this channel
if ( !array_key_exists($chId, $activeJobs) ) {
// impossible, but...
continue;
}
$activeJobs[$chId]->onComplete($content);
unset($activeJobs[$chId]);
}
}
} while ( ($running > 0 && $mrc == CURLM_OK) || !empty($this->jobQueue) );
}
In my version $jobs are actually of separate class, not instances of controllers or models. They just handle setting cURL options, parsing response and call a given callback onComplete.
With this structure new requests will start as soon as something out of the pool finishes.
Of course it doesn't really save you if not just retrieving takes time but processing as well... And it isn't a true parallel handling. But I still hope it helps. :)
P.S. did a trick for me. :) Once 8-hour job now completes in 3-4 mintues using a pool of 50 connections. Can't describe that feeling. :) I didn't really expect it to work as planned, because with PHP it rarely works exactly as supposed... That was like "ok, hope it finishes in at least an hour... Wha... Wait... Already?! 8-O"
You can use curl_multi: http://www.somacon.com/p537.php
You may also want to consider doing this client side and using Javascript.
Another solution is to write a hunter/gatherer that you submit an array of URLs to, then it does the parallel work and returns a JSON array after it's completed.
Put another way: if you had 100 URLs you could POST that array (probably as JSON as well) to mysite.tld/huntergatherer - it does whatever it wants in whatever language you want and just returns JSON.
Aside from the curl multi solution, another one is just having a batch of gearman workers. If you go this route, I've found supervisord a nice way to start a load of deamon workers.
Things you should look at in addition to CURL multi:
Non-blocking streams (example: PHP-MIO)
ZeroMQ for spawning off many workers that do requests asynchronously
While node.js, ruby EventMachine or similar tools are quite great for doing this stuff, the things I mentioned make it fairly easy in PHP too.
Try execute from PHP, python-pycurl scripts. Easier, faster than PHP curl.
Can PHP call a function and don't wait for it to return? So something like this:
function callback($pause, $arg) {
sleep($pause);
echo $arg, "\n";
}
header('Content-Type: text/plain');
fast_call_user_func_array('callback', array(3, 'three'));
fast_call_user_func_array('callback', array(2, 'two'));
fast_call_user_func_array('callback', array(1, 'one'));
would output
one (after 1 second)
two (after 2 seconds)
three (after 3 seconds)
rather than
three (after 3 seconds)
two (after 3 + 2 = 5 seconds)
one (after 3 + 2 + 1 = 6 seconds)
Main script is intended to be run as a permanent process (TCP server). callback() function would receive data from client, execute external PHP script and then do something based on other arguments that are passed to callback(). The problem is that main script must not wait for external PHP script to finish. Result of external script is important, so exec('php -f file.php &') is not an option.
Edit:
Many have recommended to take a look at PCNTL, so it seems that such functionality can be achieved. PCNTL is not available in Windows, and I don't have an access to a Linux machine right now, so I can't test it, but if so many people have advised it, then it should do the trick :)
Thanks, everyone!
On Unix platforms you can enable the PCNTL functions, and use pcntl_fork to fork the process and run your jobs in child processes.
Something like:
function fast_call_user_func_array($func, $args) {
if (pcntl_fork() == 0) {
call_user_func_array($func, $args);
}
}
Once you call pcntl_fork, two processes will execute your code from the same position. The parent process will get a PID returned from pcntl_fork, while the child process will get 0. (If there's an error the parent process will return -1, which is worth checking for in production code).
You can check out PHP Process Control:
http://us.php.net/manual/en/intro.pcntl.php
Note: This is not threading, but the handling of separate processes. There is more overhead attached.
Wouldn't it solve your problem to fork, keeping the parent process free for other connections & actions? See http://www.php.net/pcntl_fork. If you need an answer back you could possibly listen to a socket in the parent, and write with the child. A simple while(true) loop with a read could possibly do, and probably you already have that basic functionality if you run a permanent TCP server. Another option would be to keep track of your childprocess-ids, keep a accessable store somewhere (file/database/memcached etc), with a pcnt_wait in the main process with a WNOHANG to check which process has exited, and retrieve the data from the store.
You can do some threading in PHP if you use the method pcntl_fork.
http://ca.php.net/manual/en/function.pcntl-fork.php
I have never use this myself, but the are some good example of how to use it on php.net.
PHP doesn't have this functionality as far as I know
You can emulate the function using a different technique, like this one:
Parallel functions in PHP
PHP does not support multi-threading, so there's no other option than taking advantage of the OS or the web server multi processing capabilities. Note that actually you can fetch both the result and output of exec:
string exec ( string $command [,
array &$output [, int &$return_var
]] )
You can, at least, prevent the parent process from hanging until the child process is done by ignoring the child signals using pcntl_signal(SIGCHLD, SIG_IGN).
So, let's say you want to fork a process and execute another PHP function that takes a while without making the parent wait for it to finish (since you want the main process to finish in a timely manner):
pcntl_signal(SIGCHLD, SIG_IGN);
$pid = pcntl_fork();
if ($pid < 0) {
exit(0);
} elseif (!$pid) {
my_slow_function();
exit(0);
}
// Parent keeps executing and finishes before the child does
If you want to execute a slow external script as the child process, pcntl_exec is handy:
$script = array('/path/to/my/script'); // E.g. /home/my_user/my_script.php
pcntl_exec('/path/to/program/executable',$script); // E.g. /usr/bin/php
i have a php script that accepts a POST request as a listener to a web service then process all the data to two final arrays,
I'm looking for a way to initiate a second script that GET's those serialized arrays and do some more processing.
include() will not be good for me since i actually want to "free" or "end" the first script after passing the data
your help is much appreciated as always :)
EDIT - OK so looks like queue might be the solution! i never did anything like this before any examples or reference?
Does it need to happen immediately? Otherwise you could set up a cronjob that does that every X minutes. You'll have to make some kind of queue in which your first script sticks "requests" to the second script. The cronjob then processes the requests in the queue.
You should get into the habit of writing php scripts that are just a collection of functions (no auto-ran scripts, per se). This way you can include a script file at the top of the script your talking about and then call the function that does what you want.
For instance:
<?php
include('common_functions.php');
$array_1 = whatever_you_do_with_post_values();
$array_2 = other_thing_you_do_with_post_values();
// this function is located in 'common_functions.php'
do_stuff_with_arrays($array_1,$array_2);
?>
In Fact:
Just to be consistent with what I'm saying:
<?php
include('common_functions.php');
do_your_stuff();
function do_your_stuff() {
$array_1 = whatever_you_do_with_post_values();
$array_2 = other_thing_you_do_with_post_values();
// this function is located in 'common_functions.php'
do_stuff_with_arrays($array_1,$array_2);
}
?>
Obviously you should use better function & variable names, haha.
I'd do it all in one request. It cuts down on latency and makes the whole operation more efficient.
Remember you can have a long running request, but still service other requests. Apache will just spawn another php process to handle the other request from the webservice even though the first has not completed. As long as the script doesn't lock a shared resource (database file etc) this will work just fine.
That said, you should use cURL to call the second script. then post the unserialized array. cUrl will handle the rest.
I have a list of data that needs to be processed. The way it works right now is this:
A user clicks a process button.
The PHP code takes the first item that needs to be processed, takes 15-25 secs to process it, moves on to the next item, and so on.
This takes way too long. What I'd like instead is that:
The user clicks the process button.
A PHP script takes the first item and starts to process it.
Simultaneously another instance of the script takes the next item and processes it.
And so on, so around 5-6 of the items are being process simultaneously and we get 6 items processed in 15-25 secs instead of just one.
Is something like this possible?
I was thinking that I use CRON to launch an instance of the script every second. All items that need to be processed will be flagged as such in the MySQL database, so whenever an instance is launched through CRON, it will simply take the next item flagged to be processed and remove the flag.
Thoughts?
Edit: To clarify something, each 'item' is stored in a mysql database table as seperate rows. Whenever processing starts on an item, it is flagged as being processed in the db, hence each new instance will simply grab the next row which is not being processed and process it. Hence I don't have to supply the items as command line arguments.
Here's one solution, not the greatest, but will work fine on Linux:
Split the processing PHP into a separate CLI scripts in which:
The command line inputs include `$id` and `$item`
The script writes its PID to a file in `/tmp/$id.$item.pid`
The script echos results as XML or something that can be read into PHP to stdout
When finished the script deletes the `/tmp/$id.$item.pid` file
Your master script (presumably on your webserver) would do:
`exec("nohup php myprocessing.php $id $item > /tmp/$id.$item.xml");` for each item
Poll the `/tmp/$id.$item.pid` files until all are deleted (sleep/check poll is enough)
If they are never deleted kill all the processing scripts and report failure
If successful read the from `/tmp/$id.$item.xml` for format/output to user
Delete the XML files if you don't want to cache for later use
A backgrounded nohup started application will run independent of the script that started it.
This interested me sufficiently that I decided to write a POC.
test.php
<?php
$dir = realpath(dirname(__FILE__));
$start = time();
// Time in seconds after which we give up and kill everything
$timeout = 25;
// The unique identifier for the request
$id = uniqid();
// Our "items" which would be supplied by the user
$items = array("foo", "bar", "0xdeadbeef");
// We exec a nohup command that is backgrounded which returns immediately
foreach ($items as $item) {
exec("nohup php proc.php $id $item > $dir/proc.$id.$item.out &");
}
echo "<pre>";
// Run until timeout or all processing has finished
while(time() - $start < $timeout)
{
echo (time() - $start), " seconds\n";
clearstatcache(); // Required since PHP will cache for file_exists
$running = array();
foreach($items as $item)
{
// If the pid file still exists the process is still running
if (file_exists("$dir/proc.$id.$item.pid")) {
$running[] = $item;
}
}
if (empty($running)) break;
echo implode($running, ','), " running\n";
flush();
sleep(1);
}
// Clean up if we timeout out
if (!empty($running)) {
clearstatcache();
foreach ($items as $item) {
// Kill process of anything still running (i.e. that has a pid file)
if(file_exists("$dir/proc.$id.$item.pid")
&& $pid = file_get_contents("$dir/proc.$id.$item.pid")) {
posix_kill($pid, 9);
unlink("$dir/proc.$id.$item.pid");
// Would want to log this in the real world
echo "Failed to process: ", $item, " pid ", $pid, "\n";
}
// delete the useless data
unlink("$dir/proc.$id.$item.out");
}
} else {
echo "Successfully processed all items in ", time() - $start, " seconds.\n";
foreach ($items as $item) {
// Grab the processed data and delete the file
echo(file_get_contents("$dir/proc.$id.$item.out"));
unlink("$dir/proc.$id.$item.out");
}
}
echo "</pre>";
?>
proc.php
<?php
$dir = realpath(dirname(__FILE__));
$id = $argv[1];
$item = $argv[2];
// Write out our pid file
file_put_contents("$dir/proc.$id.$item.pid", posix_getpid());
for($i=0;$i<80;++$i)
{
echo $item,':', $i, "\n";
usleep(250000);
}
// Remove our pid file to say we're done processing
unlink("proc.$id.$item.pid");
?>
Put test.php and proc.php in the same folder of your server, load test.php and enjoy.
You will of course need nohup (unix) and PHP cli to get this to work.
Lots of fun, I may find a use for it later.
Use an external workqueue like Beanstalkd which your PHP script writes a bunch of jobs too. You have as many worker processes pulling jobs from beanstalkd and processing them as fast as possible. You can spin up as many workers as you have memory / CPU. Your job body should contain as little information as possible, maybe just some IDs which you hit the DB with. beanstalkd has a slew of client APIs and itself has a very basic API, think memcached.
We use beanstalkd to process all of our background jobs, I love it. Easy to use, its very fast.
There is no multithreading in PHP, however you can use fork.
php.net:pcntl-fork
Or you could execute a system() command and start another process which is multithreaded.
can you implementing threading in javascript on the client side? seems to me i've seen a javascript library (from google perhaps?) that implements it. google it and i'm sure you'll find something. i've never done it, but i know its possible. anyway, your client-side javascript could activate (ajax) a php script once for each item in separate threads. that might be easier than trying to do it all on the server side.
-don
If you are running a high traffic PHP server you are INSANE if you do not use Alternative PHP Cache: http://php.net/manual/en/book.apc.php . You do not have to make code modifications to run APC.
Another useful technique that can work along with APC is using the Smarty template system which allows you to cache output so that pages do not have to be rebuilt.
To solve this problem, I've used two different products; Gearman and RabbitMQ.
The benefit of putting your jobs into some sort of queuing software like Gearman or Rabbit is that you have multiple machines, they can all participate in processing items off the queue(s).
Gearman is easier to setup, so I'd suggest poking around with it a bit first. If you find you need something more heavy duty with queue robustness; Look into RabbitMQ
http://www.danga.com/gearman/
http://pear.php.net/package/Net_Gearman (PEAR library)
You can use pcntl_fork() and family to fork a process - however you may need something like IPC to communicate back to the parent process that the child process (the one you fork'd) is finished.
You could have them write to shared memory, like via memcache or a DB.
You could also have the child process write the completed data to a file, that the parent process keeps checking - as each child process completes the file is created/written to/updated, and parent process can grab it, one at a time, and them throw them back to the callee/client.
The parent's job is to control the queue, to make sure the same data isn't processed twice and also to sanity check the children (better kill that runaway process and start over...etc)
Something else to keep in mind - on windows platforms you are going to be severely limited - I dont even think you have access to pcntl_ unless you compiled PHP with support for it.
Also, can you cache the data once its been processed, or is it unique data every time? that would surely speed things up..?