Related
I'm building an integration that communicates data to several different systems via API (REST). I need to process data as quickly as possible. This is a basic layout:
Parse and process data (probably into an array as below)
$data = array( Title => "Title", Subtitle => "Test", .....
Submit data into service (1) $result1 = $class1->functionservice1($data);
Submit data into service (2) $result2 = $class2->functionservice2($data);
Submit data into service (3) $result3 = $class3->functionservice3($data);
Report completion echo "done";
Run in a script as above I'll need to wait for each function to finish before it starts the next one (taking 3 times longer).
Is there an easy way to run each service function asynchronously but wait for all to complete before (5) reporting completion. I need to be able to extract data from each $result and return that as one post to a 4th service.
Sorry if this is an easy question - I'm a PHP novice
Many thanks, Ben
Yes, there are multiple ways.
The most efficient is to use an event loop that leverages non-blocking I/O to achieve concurrency and cooperative multitasking.
One such event loop implementation is Amp. There's an HTTP client that works with Amp, it's called Artax. An example is included in its README. You should have a look at how promises and coroutines work. There's Amp\wait to mix synchronous code with async code.
<?php
Amp\run(function() {
$client = new Amp\Artax\Client;
// Dispatch two requests at the same time
$promises = $client->requestMulti([
'http://www.google.com',
'http://www.bing.com',
]);
try {
// Yield control until all requests finish
list($google, $bing) = (yield Amp\all($promises));
var_dump($google->getStatus(), $bing->getStatus());
} catch (Exception $e) {
echo $e;
}
});
Other ways include using threads and or processes to achieve concurrency. Using multiple processes is the easiest way if you want to use your current code. However, spawning processes isn't cheap and using threads in PHP isn't really a good thing to do.
You can also put your code in another php file and call it using this :
exec("nohup /usr/bin/php -f your script > /dev/null 2>&1 &");
If you want to use asynchronicity like you can do in other languages ie. using threads, you will need to install the pthreads extension from PECL, because PHP does not support threading out of the box.
You can find an explaination on how to use threads with this question :
How can one use multi threading in PHP applications
I have sript php with three function like this:
public function a($html,$text)
{
//blaa
return array();
}
public function b($html,$text){
//blaa
return array();
}
public function c($html,$text){
//blaa
return array();
}
require_once 'simple_html_dom.php';
$a=array();
$html=new simple_html_dom();
$a=$this->a($html,$text);
$b=$this->b($html,$text);
$c=$this->c($html,$text);
$html->clear();
unset($html);
$a=array_merge($a, $c);
$a=array_merge($a, $b);
a($html,$text) takes 5 seconds before giving a result
b($html,$text) takes 10 seconds before giving a result
c($html,$text) takes 12 seconds before giving a result
Thus the system takes 27 seconds before geving me a result, but I want take my result in 12 seconds. I can't use threads because my hosting does not support threads. How can I solve this problem?
PHP does not support this out of the box. If you really want to do this, you have two basic options (yep, it's going to be dirty). If you want a serious solution depending on your actual use-case, there is another option to consider.
Option 1: Use some AJAX-trickery
Create a page with a button that triggers three AJAX-calls to the different functions that you want to call.
Option 2: Run a command
If you're on UNIX, you can trigger a command from the PHP script to run a PHP script (php xyz.php) and that actually runs it on a different thread.
Serious option: use queues
Seriously: use a queue system like rabbitMQ or BeanstalkD to do these kind of things. Laravel supports it out of the box.
If the wait time is caused by blocking IO (waiting for server response) then curl_multi might help.
From the code you posted, though, it doesn't look like is your problem.
It looks more like simple html dom is taking a long time to parse your html. That's not too surprising because it's not a very good library. If this is the case you should consider switching to DomXPath.
You might wanna look into jQuery deferred objects.... $.when should handle this kinda of situation.
I want to write an Ajax web application, a game to be specific. Two web clients have to communicate with each other via the PHP server. My approach to solve this is to use Ajax between client and server and server and client. Each client creates a separate server process using Ajax. I want that these two server processes communicate via MySQL and via named pipes. I need the named pipes to get immediate response for the whole application.
I cannot use one server process, which first creates a pipe and then forks into two processes, which use the pipe. Web applications create server processes when the web-browser sends a request. So, I need named pipes, where each process doesn't know more than the file name of the named pipe. They cannot exchange file handles (at least I don't know how).
My problem is that named pipes in the PHP way indeed work as long as they are
used within the same function:
public function writeAndReadPipe_test(){
$pipeA = fopen("testpipe",'r+');
fwrite($pipeA, 'ABCD');
$pipeB = fopen("testpipe",'r+');
$content = fread($pipeB, 4);
echo "[" . $content . "]<br>\n";
}
public function testingPipes_same_function(){
posix_mkfifo("testpipe", 0777);
$this->writeAndReadPipe_test();
}
But, when I use different functions, then the fread($pipeB, 4) command blocks the whole application:
public function writePipe_test(){
$pipeA = fopen("testpipe",'r+');
fwrite($pipeA, 'ABCD');
}
public function readPipe_test(){
$pipeB = fopen("testpipe",'r+');
$content = fread($pipeB, 4);
echo "[" . $content . "]<br>\n";
}
public function testingPipes_different_functions(){
posix_mkfifo("testpipe", 0777);
$this->writePipe_test();
$this->readPipe_test();
}
Does anybody know why? And what can I do to make it work between different functions in the first step? In the second step, it should work even between different processes! I found out that I also get a problem when the writer closes the pipe before the reader reads from
it. I suppose that the function closes it automatically when it ends, but this is only a guess.
If the PHP way does not work, I plan to let PHP open a command line, generate BASH commands and let execute them. This should work in any case as long as my web-server works in a LAMP environment. Disadvantage is that it will not work in WAMP environments.
So, has anybody some ideas to this?
P.S:
I need blocking pipes to let the reader wait until an event is sent. I know that the pipes can work in a non-blocking mode using
stream_set_blocking($pipe,false);
or so, but the whole idea is to do it without polling just using a pipe, which wakes the
reader up as soon as an event is fired.
Just a short statement, that I actually found a nice solution
using named pipes:
public function createPipe_test(){
posix_mkfifo("testpipe", 0777);
}
public function writePipe_test($value){
$pipeA = fopen("testpipe",'w');
$valueString = number_format($value);
$valueLen = number_format(strlen($valueString));
fwrite($pipeA, $valueLen);
fwrite($pipeA, $valueString);
}
public function readPipe_test(){
$pipeB = fopen("testpipe",'r');
$valueLen = fread($pipeB, 1);
return fread($pipeB, $valueLen);
}
I have two processes.
If process 1 calls writePipe_test(), then it waits until process 2 calls
readPipe_test() to read the data out of the pipe.
If process 1 calls readPipe_test(), then it waits until process 2 calls
writePipe_test() to write something into the pipe.
The trick is 'w' and 'r' instead of 'r+'.
When you use the pipes in separate functions like this, the write pipe A would seem to be closed/discarded again (local scope of $pipeA). The assumption would be that the pipe must be opened for reading and/or writing in order to retain any info, which makes sense really. Though I don't know the inner magic.
You can also observe that your blocking read-call succeeds when you feed the pipe from another process (like echo magic >> testpipe). So you'd already have step 2 done, but you need some pipehandle management.
If you change it as follows it'd work:
private $pipeA;
public function writePipe_test(){
$this->pipeA = fopen("testpipe",'r+');
fwrite($this->pipeA, 'ABCD');
}
Edit: or setting $pipeA to have global scope, for that matter..
Im not sure wether I understand your 2nd last post..
But to comment on the last one, if I don't misunderstand, TCP might be even more complex because you will have to establish a connection before you can either read or write, so youve got different overhead
As for the pipehandle closing at the function end, I assume you'll face the same problem with the sockets; but the pipefile remains!
Persistent storage (files,db) would make the clients independent timingwise, if you want to use blocking calls then files might actually be a way to go..
Which design pattern exist to realize the execution of some PHP processes and the collection of the results in one PHP process?
Background:
I do have many large trees (> 10000 entries) in PHP and have to run recursive checks on it. I want to reduce the elapsed execution time.
If your goal is minimal time - the solution is simple to describe, but not that simple to implement.
You need to find a pattern to divide the work (You don't provide much information in the question in this regard).
Then use one master process that forks children to do the work. As a rule the total number of processes you use should be between n and 2n, where n is the number of cores the machine has.
Assuming this data will be stored in files you might consider using non-blocking IO to maximize the throughput. Not doing so will make most of your process spend time waiting for the disk. PHP has stream_select() that might help you. Note that using it is not trivial.
If you decide not to use select - increasing the number of processes might help.
In regards to pcntl functions: I've written a deamon with them (a proper one with forking, changing session id, the running user, etc...) and it's one of the most reliable piece of software I've written. Because it spawns workers for every task, even if there is a bug in one of the tasks, it does not affect the others.
From your php script, you could launch another script (using exec) to do the processing. Save status updates in a text file, which could then be read periodically by the parent thread.
Note: to avoid php waiting for the exec'd script to complete, pipe the output to a file:
exec('/path/to/file.php | output.log');
Alternatively, you can fork a script using the PCNTL functions. This uses one php script, which when forked can detect whether it is the parent or the child and operate accordingly. There are functions to send/receive signals for the purpose of communicating between parent/child, or you have the child log to a file and the parent read from that file.
From the pcntl_fork manual page:
$pid = pcntl_fork();
if ($pid == -1) {
die('could not fork');
} else if ($pid) {
// we are the parent
pcntl_wait($status); //Protect against Zombie children
} else {
// we are the child
}
This might be a good time to consider using a message queue, even if you run it all on one machine.
The question seems to be a bit confused.
I want to reduce the absolute execution time.
Do you mean elapsed time? Certainly use of the right data-structure will improve throughput, but for a given data-structure, the minmimum order of the algorithm is absolute, and nothing to do with how you implement the algorithm.
Which design pattern exist to realize....?
Design Patterns are something which code is, not a template for writing programs, and a useful tools for curriculum design. To start with a pattern and make your code fit it is in itself an anti-pattern.
Nobody can answer this question withuot knowing a lot more about your data and how its structured, however the key driver for efficiency will be the data-structure you use to implement your tree. If elapsed time is important then certainly look at parallel execution, however it may also be worth considering performing the operation in a different tool - databases are highly optimized for dealing with large sets of data, however note that the obvious method for describing a tree in a relational database is very inefficient when it comes to isolating sub-trees and walking the tree.
In response to Adam's suggesting of forking you replied:
I "heard" that pcntl isnt a good solution. Any experiences?
Where did you hear that? Certainly forking from a CGI or mod_php invoked script is a bad idea, but nothing wrong with doing it from the command line. Do have a google for long running PHP processes (be warned there is a lot of bad information out there). What code you write will vary depending on the underlying OS - which you've not stated.
I suspect that you could solve a large part of your performance issues by identifying which parts of the tree need to be checked and only checking those parts AND triggering the checks when the tree is updated, or at least marking the nodes as 'dirty'.
You might find these helpful:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
http://en.wikipedia.org/wiki/Threaded_binary_tree
C.
You could use a more efficient data structure, such as a btree. I used once in Java but not in PHP. You can try this script: http://www.phpclasses.org/browse/file/708.html, it is an implementation of btree.
If it is not enough, you can use Hadoop to implement a Map/Reduce pattern, as Michael said. I would not fork PHP process, it does not seem to help for performace.
Personally, I would use PHP as client and put everything in Hadoop. This tutorial might help: http://www.lunchpauze.com/2007/10/writing-hadoop-mapreduce-program-in-php.html.
Another solution can be to use a Java implementation of Btree: http://jdbm.sourceforge.net/. JDBM is an object database using a Btree+ data astructures. Then you can search with PHP by exposing data with a web service or by accessing it directly with Quercus
Using web or CLI?
If you use web, you could intergrate that part in Quercus Then you could use the advantages of JAVA multithreading.
I don't actually know how reliable Quercus is though. I'd also suggest using a kind of message queue and refactoring the code, so it doesn't need the scope.
Maybe you could rebuild the code to a Map/Reduce pattern. You then can run the PHP code in Hadoop Then you can cluster the processing through a couple of machines.
I don't know if it's useful, but I came across another project, called Gearman. It's also used to cluster PHP processes. I guess you can combine that with a reduce script as well, if Hadoop is not the way you want to go.
pthreads
There is a rather new (since 2012) PHP extension available: pthreads. It can be installed via PECL.
Simple Implementation in PHP Code: extend from Thread Class. Add a run() method and execute the start() method.
<?php
// Example from http://www.phpgangsta.de/richtige-threads-in-php-einfach-erstellen-mit-pthreads
class AsyncOperation extends Thread
{
public function __construct($threadId)
{
$this->threadId = $threadId;
}
public function run()
{
printf("T %s: Sleeping 3sec\n", $this->threadId);
sleep(3);
printf("T %s: Hello World\n", $this->threadId);
}
}
$start = microtime(true);
for ($i = 1; $i <= 5; $i++) {
$t[$i] = new AsyncOperation($i);
$t[$i]->start();
}
echo microtime(true) - $start . "\n";
echo "end\n";
Outputs
>php pthreads.php
0.041301012039185
end
T 1: Sleeping 3sec
T 2: Sleeping 3sec
T 3: Sleeping 3sec
T 4: Sleeping 3sec
T 5: Sleeping 3sec
T 1: Hello World
T 2: Hello World
T 3: Hello World
T 4: Hello World
T 5: Hello World
Try this: PHPThreads
Code Example:
function threadproc($thread, $param) {
echo "\tI'm a PHPThread. In this example, I was given only one parameter: \"". print_r($param, true) ."\" to work with, but I can accept as many as you'd like!\n";
for ($i = 0; $i < 10; $i++) {
usleep(1000000);
echo "\tPHPThread working, very busy...\n";
}
return "I'm a return value!";
}
$thread_id = phpthread_create($thread, array(), "threadproc", null, array("123456"));
echo "I'm the main thread doing very important work!\n";
for ($n = 0; $n < 5; $n++) {
usleep(1000000);
echo "Main thread...working!\n";
}
echo "\nMain thread done working. Waiting on our PHPThread...\n";
phpthread_join($thread_id, $retval);
echo "\n\nOur PHPThread returned: " . print_r($retval, true) . "!\n";
Requires PHP extensions:
posix
pcntl
sockets
I've encountered the dreaded error-message, possibly through-painstaking effort, PHP has run out of memory:
Allowed memory size of #### bytes exhausted (tried to allocate #### bytes) in file.php on line 123
Increasing the limit
If you know what you're doing and want to increase the limit see memory_limit:
ini_set('memory_limit', '16M');
ini_set('memory_limit', -1); // no limit
Beware! You may only be solving the symptom and not the problem!
Diagnosing the leak:
The error message points to a line withing a loop that I believe to be leaking, or needlessly-accumulating, memory. I've printed memory_get_usage() statements at the end of each iteration and can see the number slowly grow until it reaches the limit:
foreach ($users as $user) {
$task = new Task;
$task->run($user);
unset($task); // Free the variable in an attempt to recover memory
print memory_get_usage(true); // increases over time
}
For the purposes of this question let's assume the worst spaghetti code imaginable is hiding in global-scope somewhere in $user or Task.
What tools, PHP tricks, or debugging voodoo can help me find and fix the problem?
PHP doesn't have a garbage collector. It uses reference counting to manage memory. Thus, the most common source of memory leaks are cyclic references and global variables. If you use a framework, you'll have a lot of code to trawl through to find it, I'm afraid. The simplest instrument is to selectively place calls to memory_get_usage and narrow it down to where the code leaks. You can also use xdebug to create a trace of the code. Run the code with execution traces and show_mem_delta.
Here's a trick we've used to identify which scripts are using the most memory on our server.
Save the following snippet in a file at, e.g., /usr/local/lib/php/strangecode_log_memory_usage.inc.php:
<?php
function strangecode_log_memory_usage()
{
$site = '' == getenv('SERVER_NAME') ? getenv('SCRIPT_FILENAME') : getenv('SERVER_NAME');
$url = $_SERVER['PHP_SELF'];
$current = memory_get_usage();
$peak = memory_get_peak_usage();
error_log("$site current: $current peak: $peak $url\n", 3, '/var/log/httpd/php_memory_log');
}
register_shutdown_function('strangecode_log_memory_usage');
Employ it by adding the following to httpd.conf:
php_admin_value auto_prepend_file /usr/local/lib/php/strangecode_log_memory_usage.inc.php
Then analyze the log file at /var/log/httpd/php_memory_log
You might need to touch /var/log/httpd/php_memory_log && chmod 666 /var/log/httpd/php_memory_log before your web user can write to the log file.
I noticed one time in an old script that PHP would maintain the "as" variable as in scope even after my foreach loop. For example,
foreach($users as $user){
$user->doSomething();
}
var_dump($user); // would output the data from the last $user
I'm not sure if future PHP versions fixed this or not since I've seen it. If this is the case, you could unset($user) after the doSomething() line to clear it from memory. YMMV.
There are several possible points of memory leaking in php:
php itself
php extension
php library you use
your php code
It is quite hard to find and fix the first 3 without deep reverse engineering or php source code knowledge. For the last one you can use binary search for memory leaking code with memory_get_usage
I recently ran into this problem on an application, under what I gather to be similar circumstances. A script that runs in PHP's cli that loops over many iterations. My script depends on several underlying libraries. I suspect a particular library is the cause and I spent several hours in vain trying to add appropriate destruct methods to it's classes to no avail. Faced with a lengthy conversion process to a different library (which could turn out to have the same problems) I came up with a crude work around for the problem in my case.
In my situation, on a linux cli, I was looping over a bunch of user records and for each one of them creating a new instance of several classes I created. I decided to try creating the new instances of the classes using PHP's exec method so that those process would run in a "new thread". Here is a really basic sample of what I am referring to:
foreach ($ids as $id) {
$lines=array();
exec("php ./path/to/my/classes.php $id", $lines);
foreach ($lines as $line) { echo $line."\n"; } //display some output
}
Obviously this approach has limitations, and one needs to be aware of the dangers of this, as it would be easy to create a rabbit job, however in some rare cases it might help get over a tough spot, until a better fix could be found, as in my case.
I came across the same problem, and my solution was to replace foreach with a regular for. I'm not sure about the specifics, but it seems like foreach creates a copy (or somehow a new reference) to the object. Using a regular for loop, you access the item directly.
I would suggest you check the php manual or add the gc_enable() function to collect the garbage... That is the memory leaks dont affect how your code runs.
PS: php has a garbage collector gc_enable() that takes no arguments.
I recently noticed that PHP 5.3 lambda functions leave extra memory used when they are removed.
for ($i = 0; $i < 1000; $i++)
{
//$log = new Log;
$log = function() { return new Log; };
//unset($log);
}
I'm not sure why, but it seems to take an extra 250 bytes each lambda even after the function is removed.
I didn't see it explicitly mentioned, but xdebug does a great job profiling time and memory (as of 2.6). You can take the information it generates and pass it off to a gui front end of your choice: webgrind (time only), kcachegrind, qcachegrind or others and it generates very useful call trees and graphs to let you find the sources of your various woes.
Example (of qcachegrind):
If what you say about PHP only doing GC after a function is true, you could wrap the loop's contents inside a function as a workaround/experiment.
One huge problem I had was by using create_function. Like in lambda functions, it leaves the generated temporary name in memory.
Another cause of memory leaks (in case of Zend Framework) is the Zend_Db_Profiler.
Make sure that is disabled if you run scripts under Zend Framework.
For example I had in my application.ini the folowing:
resources.db.profiler.enabled = true
resources.db.profiler.class = Zend_Db_Profiler_Firebug
Running approximately 25.000 queries + loads of processing before that, brought the memory to a nice 128Mb (My max memory limit).
By just setting:
resources.db.profiler.enabled = false
it was enough to keep it under 20 Mb
And this script was running in CLI, but it was instantiating the Zend_Application and running the Bootstrap, so it used the "development" config.
It really helped running the script with xDebug profiling
I'm a little late to this conversation but I'll share something pertinent to Zend Framework.
I had a memory leak problem after installing php 5.3.8 (using phpfarm) to work with a ZF app that was developed with php 5.2.9. I discovered that the memory leak was being triggered in Apache's httpd.conf file, in my virtual host definition, where it says SetEnv APPLICATION_ENV "development". After commenting this line out, the memory leaks stopped. I'm trying to come up with an inline workaround in my php script (mainly by defining it manually in the main index.php file).
I didn't see it mentioned here but one thing that might be helpful is using xdebug and xdebug_debug_zval('variableName') to see the refcount.
I can also provide an example of a php extension getting in the way: Zend Server's Z-Ray. If data collection is enabled it memory use will balloon on each iteration just as if garbage collection was off.