I'm new to PHP. I am familiar with ASP.NET, which supports asynchronous programming: if a request needs to do some I/O work, the suggested approach is to write the web page in the BeginProcess/EndProcess style. Asynchronous programming is key to improving scalability.
I'm wondering whether there is a counterpart to asynchronous programming (BeginXXXX/EndXXXX) in the PHP world.
In .NET, the BeginXXX/EndXXX paradigm relies heavily on threading, while in PHP I am not sure that you can even start a new thread (except maybe with a PECL package).
FastCGI is the alternative to multithreading in most interpreted languages. Instead of spawning new threads it uses processes, but since spawning a new process is expensive, it keeps a reusable process pool, much like the ThreadPool in .NET.
If the I/O is performed with sockets or files, you should use socket_select() or stream_select() respectively (similar to the select() system call in C/C++).
Here's a simple command line chat tutorial done with PHP:
Simple PHP socket-based terminal chat
Note: This is not a general multi-threading solution, but a simple solution for situations where you need "semi-parallel" I/O
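As a rough illustration of the "semi-parallel" I/O mentioned above, here is a minimal sketch built on stream_select(). The helper name readReady() and the local socket pairs are assumptions made so the example runs without a network (stream_socket_pair() with STREAM_PF_UNIX assumes a Unix-like system); a real script would pass its client connections instead.

```php
<?php
// Minimal sketch: wait on several streams at once with stream_select().
// readReady() is a hypothetical helper, not part of PHP.

/**
 * Returns the data read from every stream that became readable
 * within $timeoutSec seconds, keyed like the input array.
 */
function readReady(array $streams, int $timeoutSec = 1): array
{
    $read = $streams;      // stream_select() modifies this array in place
    $write = null;
    $except = null;
    $result = [];

    $ready = stream_select($read, $write, $except, $timeoutSec);
    if ($ready === false || $ready === 0) {
        return $result;    // error or timeout: nothing to read
    }
    foreach ($read as $stream) {
        // Map the ready stream back to the caller's key.
        $key = array_search($stream, $streams, true);
        $result[$key] = fread($stream, 8192);
    }
    return $result;
}

// Two local socket pairs stand in for two client connections.
$a = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$b = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

fwrite($a[0], "hello from a");   // only connection "a" has data pending

$data = readReady(['a' => $a[1], 'b' => $b[1]]);
print_r($data);
```

The script never blocks on the idle connection: only streams reported as readable are read, which is the whole point of multiplexing in a single process.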
The core has a set of process control functions, including the ability to fork a process.
I don't know that I'd use these in a web script, but have used them in command line scripts before.
http://www.php.net/manual/en/book.pcntl.php
http://www.php.net/manual/en/pcntl.example.php
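For a command line script, forking with these functions might look like the following minimal sketch. It assumes the pcntl extension is loaded and the script runs from the CLI; runInChildren() and the task names are hypothetical.

```php
<?php
// Minimal sketch of fork-and-wait using the pcntl extension (CLI only).
// runInChildren() is a hypothetical helper, not part of PHP.

function runInChildren(array $tasks): array
{
    $pids = [];
    foreach ($tasks as $name => $task) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException("fork failed for $name");
        }
        if ($pid === 0) {
            // Child: run the task and use its integer result as exit status.
            exit((int) $task());
        }
        $pids[$pid] = $name;   // Parent: remember which child runs which task.
    }

    // Parent: reap every child and collect its exit status.
    $statuses = [];
    foreach ($pids as $pid => $name) {
        pcntl_waitpid($pid, $status);
        $statuses[$name] = pcntl_wexitstatus($status);
    }
    return $statuses;
}

$statuses = runInChildren([
    'fast' => function () { return 0; },
    'slow' => function () { usleep(200000); return 3; },
]);
print_r($statuses);
```

Both children run at the same time; the parent only blocks while collecting exit codes. Anything larger than an exit code has to travel through a pipe, a file or a database.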
Here's an interesting link on the subject of PHP multiplexing with PHP4 and PHP5 samples:
http://netevil.org/blog/2005/may/guru-multiplexing
PHP doesn't, but you could use AJAX once the page has loaded, which will allow asynchronous requests.
Honestly though, there is no point. If you really want that heavyweight of a back end, you're better off writing a separate program that does the heavy lifting. PHP modules are written in pure C as far as I'm aware, so you should be able to use that and then call your own custom function from PHP.
Using stream_select you can create child processes via an HTTP request. Check out the code in http://drupal.org/project/httprl for some ideas on how to do this. I plan on pushing this library to GitHub once I get it more polished, as something that can be run outside of Drupal. But for now it lives in Drupal land.
I recently read about http://php.net/pcntl and was wondering how well those functions work and whether it would be smart to use multithreading in PHP, since it isn't a core feature of PHP.
I would want to use it to trigger events that don't require feedback, like firing a cron job execution manually.
All of it is supposed to run in a web app written with Zend Framework
The pcntl package works quite well - it just wraps the corresponding Unix functions. The only shortcoming is that you can't use it when PHP is invoked from a web server context, i.e. you can use it in shell scripts, but not on web pages - at least not without a hack like calling a forking script with exec or similar.
[edit]
I just found a page explaining why mod_php cannot fork. Basically it's a security issue.
[/edit]
This is not thread control, this is process control. The library for threads is pthreads (POSIX threads) and it's not included in PHP, so there are no multi-threading functions in PHP.
As for multiprocessing, you cannot use that in mod_php, as that would be a giant security hole (a spawned process would have all the web server's privileges).
The only possible way to have PHP code executing in multiple threads is to run PHP as a module of a threaded web server, which is useless because the threads are fully isolated and your code has no control over them. As far as I know, pcntl only manages processes, not threads.
If I needed to do manual crontab executions or the like from PHP, I'd probably use a queue. Have a database table that you append jobs to. Another process, either from a cron or running as a daemon, executes the jobs as they show up.
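A minimal sketch of such a queue table follows. The table name, its columns and the in-memory SQLite database are assumptions made so the example is self-contained (it needs the pdo_sqlite extension); in practice the web app and the worker would share a real database.

```php
<?php
// Minimal sketch of a database-backed job queue. Table layout and the
// helper names enqueue()/claimNext() are assumptions for illustration.

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    command TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)");

// Called by the web application: append a job and return immediately.
function enqueue(PDO $db, string $command): int
{
    $stmt = $db->prepare('INSERT INTO jobs (command) VALUES (?)');
    $stmt->execute([$command]);
    return (int) $db->lastInsertId();
}

// Called by the cron script or daemon: take the oldest pending job, if any.
function claimNext(PDO $db): ?array
{
    $db->beginTransaction();
    $job = $db->query(
        "SELECT id, command FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    )->fetch(PDO::FETCH_ASSOC);
    if ($job === false) {
        $db->commit();
        return null;
    }
    $db->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")
       ->execute([$job['id']]);
    $db->commit();
    return $job;
}

enqueue($db, 'rebuild-cache');
enqueue($db, 'send-newsletter');

$processed = [];
while (($job = claimNext($db)) !== null) {
    // A real worker would execute the job here; the sketch only records it.
    $processed[] = $job['command'];
    $db->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")
       ->execute([$job['id']]);
}
print_r($processed);
```

With several workers on a shared database, the claim step needs proper locking (for example a row lock) so that two workers cannot pick the same job.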
Another way to do it is to set up a separate script and do an HTTP GET to it. It's not quite threading, but it's one way of shelling to another command in PHP.
For example, if I wanted to run /usr/bin/somescript.sh on demand, I'd have a somescript.php that did a system call. This would be on a virtual host only accessible from localhost.
I'd do a socket call to the webserver and GET the script. The key is to not read on the socket so it doesn't block. If I wanted to check the return value of somescript.php, I'd do it later in my main script to prevent blocking.
If somescript.php takes a long time to execute (longer than the calling script), you'll have to do some magic to stop apache from killing the script when the socket is closed.
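The idea could be sketched roughly like this. The host, port and path are placeholders, and buildGetRequest() and fireAndForget() are hypothetical helper names. This variant closes the socket straight away; to check the result later you would keep the socket open and read from it at the end of the main script instead.

```php
<?php
// Minimal sketch of "GET the script and don't read the response".
// Host, port and path are placeholders, not values from a real setup.

function buildGetRequest(string $host, string $path): string
{
    return "GET $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

function fireAndForget(string $host, int $port, string $path): bool
{
    $socket = @fsockopen($host, $port, $errno, $errstr, 2.0);
    if ($socket === false) {
        return false;              // could not connect
    }
    fwrite($socket, buildGetRequest($host, $path));
    fclose($socket);               // close without reading, so nothing blocks
    return true;
}

// Usage (commented out because it needs a web server on localhost):
// fireAndForget('127.0.0.1', 80, '/somescript.php');
```

On the receiving side, one commonly used option for keeping the script alive after the caller disconnects is to call ignore_user_abort(true) at the top of somescript.php; whether that is sufficient depends on the server setup.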
Multiplatform PHP Multithreading engine
http://anton.vedeshin.com/articles/lightweight-and-multiplatform-php-multithreading-engine
Examples of multithreading working in PHP (with excerpts from their project pages):
Cron Multi-Threaded.
As of October 25th, 2011, this module has reached "end of life" and is deprecated in favor of projects such as Elysia Cron. This module wasn't completely useless in that a core patch inspired by Cron MT was committed to D7.
Boost.
... provides static page caching for Drupal enabling a very significant performance and scalability boost for sites that receive mostly anonymous traffic. For shared hosting this is your best option in terms of improving performance. On dedicated servers, you may want to consider Varnish instead.
I'm working on an embedded system that is programmed with PHP 4.4.9 - unfortunately without the PCNTL extension.
I need to create a script that runs in the background as a daemon. You'd usually do this using fork(), or in the PHP case, pcntl_fork() - but this function is not available. A shell is also missing, so I can't use the standard tools.
So, what other ways are there to cleanly start a process in the background?
As kingCrunch says, you really should upgrade.
Firstly, there's more to making a daemon than just calling pcntl_fork(). You might want to read the Unix programming FAQ and the Unix socket FAQ.
Next, you've not mentioned how you intend to solve the problem of concurrency - while forking is one solution to this it is not the only reason for using fork() in a daemon.
So you've really got 2 problems to solve, first how you daemonize the program then how you handle concurrency.
Note that one approach to the latter which obviates the former is to run the server from [x]inetd.
Another approach to solving the concurrency problem is to run a single threaded server and use socket_select (or stream_select) to multiplex the connections - but I'm not sure how well that is supported in PHP 4 - there is a good example here.
A simple solution would be to write a small wrapper program in C using daemon() to bootstrap the program. Or you could start it up directly from inittab. Or, for a solution with complex management facilities, have a look at DJB's daemontools.
I am currently trying to implement a job queue in php. The queue will then be processed as a batch job and should be able to process some jobs in parallel.
I already did some research and found several ways to implement it, but I am not really aware of their advantages and disadvantages.
E.g. doing the parallel processing by calling a script several times through fsockopen like explained here:
Easy parallel processing in PHP
Another way I found was using the curl_multi functions.
curl_multi_exec PHP docs
But I think those 2 ways will add quite a lot of overhead for batch processing on a queue that should mainly run in the background?
I also read about pcntl_fork, which also seems to be a way to handle the problem. But that looks like it can get really messy if you don't really know what you are doing (like me at the moment).
I also had a look at Gearman, but there I would also need to spawn the worker threads dynamically as needed, and not just run a few and let the Gearman job server send the jobs to the free workers. This matters especially because the threads should exit cleanly after one job has been executed, to avoid eventual memory leaks (the code may not be perfect in that respect).
Gearman Getting Started
So my question is, how do you handle parallel processing in PHP? And why do you choose your method, and what advantages/disadvantages do the different methods have?
I use exec(). It's easy and clean. You basically need to build a thread manager and thread scripts that will do what you need.
I don't like fsockopen() because it opens a server connection; these build up and may hit Apache's connection limit.
I don't like the curl functions for the same reason.
I don't like pcntl because it needs the pcntl extension to be available, and you have to keep track of parent/child relations.
Never played with Gearman...
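The exec() approach from this answer could be sketched as follows, assuming a Unix-like shell. startBackground() and isRunning() are hypothetical helper names, and the sleep commands stand in for real worker scripts.

```php
<?php
// Minimal sketch of a tiny "manager" built on exec(), Unix-like shells only.
// startBackground() and isRunning() are hypothetical helpers.

function startBackground(string $command): int
{
    // Redirect output and append "&" so exec() returns at once;
    // "echo $!" prints the PID of the background process.
    $output = [];
    exec(sprintf('%s > /dev/null 2>&1 & echo $!', $command), $output);
    return (int) ($output[0] ?? 0);
}

function isRunning(int $pid): bool
{
    // "kill -0" sends no signal; it only checks that the process exists.
    $output = [];
    exec(sprintf('kill -0 %d 2>/dev/null', $pid), $output, $exitCode);
    return $exitCode === 0;
}

// Start three placeholder workers; they now run side by side.
$pids = [];
foreach (['sleep 2', 'sleep 2', 'sleep 2'] as $command) {
    $pids[] = startBackground($command);
}
print_r($pids);
```

The commands here are trusted literals. Anything built from user input should go through escapeshellarg() before being passed to exec().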
Well I guess we have 3 options there:
A. Multi-Thread:
PHP does not support multithread natively.
But there is one PHP extension (experimental) called pthreads (https://github.com/krakjoe/pthreads) that allows you to do just that.
B. Multi-Process:
This can be done in 3 ways:
Forking
Executing Commands
Piping
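As an illustration of the piping variant, here is a minimal sketch that starts several children with popen() and reads their output through the pipes. The child commands are placeholders that just run one line of PHP via PHP_BINARY; the sketch assumes it is run from the CLI on a Unix-like system.

```php
<?php
// Minimal sketch of parallel child processes connected through pipes.
// The commands are placeholders; real ones would be worker scripts.

$php = escapeshellarg(PHP_BINARY);
$commands = [
    'one' => $php . ' -r ' . escapeshellarg('usleep(300000); echo "one done";'),
    'two' => $php . ' -r ' . escapeshellarg('usleep(300000); echo "two done";'),
];

// Start all children first so that they run at the same time...
$pipes = [];
foreach ($commands as $name => $command) {
    $pipes[$name] = popen($command, 'r');
}

// ...then collect their output. Each read blocks until that child finishes.
$results = [];
foreach ($pipes as $name => $pipe) {
    $results[$name] = stream_get_contents($pipe);
    pclose($pipe);
}
print_r($results);
```

Because both children are started before either pipe is read, the total run time is close to that of the slowest child rather than the sum of both.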
C. Distributed Parallel Processing:
How it works:
The Client App sends data (AKA a message, which can be JSON formatted) to the Engine (the MQ Engine, which can be local or an external web service).
The MQ Engine stores the data (mostly in memory and optionally in a database) inside queues (you can define the queue name).
The Client App asks the MQ Engine for data (messages) to process in order (FIFO or based on priority); you can also request data from a specific queue.
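The flow above can be sketched in-process as follows. A real MQ engine runs as a separate service; TinyQueueEngine is a hypothetical stand-in that only shows the push/store/pop steps with named FIFO queues and JSON messages.

```php
<?php
// Minimal in-process sketch of the message flow. TinyQueueEngine is a
// hypothetical class for illustration, not a real library.

class TinyQueueEngine
{
    /** @var SplQueue[] one FIFO queue per queue name */
    private $queues = [];

    // Steps 1 and 2: the client sends a message, the engine stores it.
    public function push(string $queue, array $message): void
    {
        if (!isset($this->queues[$queue])) {
            $this->queues[$queue] = new SplQueue();
        }
        $this->queues[$queue]->enqueue(json_encode($message));
    }

    // Step 3: a client asks for the next message from a specific queue.
    public function pop(string $queue): ?array
    {
        if (!isset($this->queues[$queue]) || $this->queues[$queue]->isEmpty()) {
            return null;
        }
        return json_decode($this->queues[$queue]->dequeue(), true);
    }
}

$engine = new TinyQueueEngine();
$engine->push('emails', ['to' => 'a@example.com']);
$engine->push('emails', ['to' => 'b@example.com']);

$first = $engine->pop('emails');   // messages come back in FIFO order
print_r($first);
```

Priority-based delivery could be sketched the same way by swapping SplQueue for SplPriorityQueue.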
Some MQ Engines:
ZeroMQ (good option, hard to use)
A message-oriented IPC library, not a message queue server: a socket library that acts as a concurrency framework. Faster than TCP for clustered products and supercomputing.
RabbitMQ (good option, easy to use)
Self-hosted enterprise message queue, with a broker written in Erlang. Not really a work queue, but rather a message queue that can be used as a work queue with additional semantics.
Beanstalkd (best option, easy to use)
(Laravel built-in support; a work queue originally built for a Facebook application) - has a "Beanstalkd console" tool which is very nice
Gearman
(problem: centralized broker system for distributed processing)
Apache ActiveMQ
The most popular open source message broker written in Java (problem: lots of bugs and issues)
Amazon SQS
(Laravel built in support, Hosted - so no administration is required. Not really a work queue thus will require extra work to handle semantics such as burying a job)
IronMQ
(Laravel built in support, Written in Go, Available both as cloud version and on-premise)
Redis
(Laravel built-in support; not that fast, as it's not designed for that)
Sparrow
(written in Ruby, based on memcache)
Starling
(written in Ruby, based on memcache, built at Twitter)
Kestrel
(just another MQ)
Kafka
(Written at LinkedIn in Scala)
EagleMQ
open source, high-performance and lightweight queue manager (Written in C)
More of them can be found here: http://queues.io
If your application is going to run in a Unix/Linux environment, I would suggest you go with the forking option. It's basically child's play to get it working. I have used it for a cron manager and had code for it to fall back to a Windows-friendly code path if forking was not an option.
The options that run the entire script several times do, as you state, add quite a bit of overhead. If your script is small, it might not be a problem. But you will probably get used to doing parallel processing in PHP the way you first chose, and the next time, when you have a job that uses 200 MB of data, it might very well be a problem. So you'd be better off learning a way that you can stick with.
I have also tested Gearman and I like it a lot. There are a few things to think about, but as a whole it offers a very good way to distribute work to different servers running different applications written in different languages. Apart from setting it up, actually using it from within PHP, or any other language for that matter, is... once again... child's play.
It could very well be overkill for what you need to do. But it will open your eyes to new possibilities when it comes to handling data and jobs, so I would recommend you to try Gearman for that fact alone.
Here's a summary of a few options for parallel processing in PHP.
AMP
Check out Amp - Asynchronous concurrency made simple - this looks to be the most mature PHP library I've seen for parallel processing.
Peec's Process Class
This class was posted in the comments on PHP's exec() function and provides a really simple starting point for forking new processes and keeping track of them.
Example:
// You may use status(), start(), and stop(). Note that the start() method gets called automatically one time.
$process = new Process('ls -al');
// Or, if you already have the PID; however, here only the status() method will work.
$process = new Process();
$process->setPid($my_pid);
// Then you can start/stop/check the status of the job.
$process->stop();
$process->start();
if ($process->status()) {
    echo "The process is currently running";
} else {
    echo "The process is not running.";
}
Other Options Compared
There's also a great article Async processing or multitasking in PHP that explains the pros and cons of various approaches:
pthreads extension (see also this SitePoint article)
Amp\Thread Library
Hack's async (requires running Facebook's HHVM)
pcntl_fork
popen
fopen/curl/fsockopen
Doorman
Then, there's also this simple tutorial which was wrapped up into a little library called Doorman.
Hope these links provide a useful starting point for more research.
First of all, this answer assumes a Linux environment.
Yet another PECL extension is parallel. You can install it with pecl install parallel, but it has some prerequisites:
A ZTS (Zend Thread Safety) build of PHP 7.2+
If you build the extension from source, check your php.ini (or similar config file) and add extension=parallel.so to it.
Then see the full example gist: https://gist.github.com/krakjoe/0ee02b887288720d9b785c9f947f3a0a
or the official PHP documentation: https://www.php.net/manual/en/book.parallel.php
Use native PHP (7.2+) parallel, i.e.:
$sampleFunc = function($num, $param2, $param3) {
    echo "[Start: $num]";
    sleep(rand(1, 3));
    echo "[end:$num]";
};
// Schedule the closure 11 times; the arguments are passed as a plain array.
for ($i = 0; $i < 11; $i++) {
    \parallel\run($sampleFunc, [$i, null, "blabla"]);
}
// Meanwhile the main script keeps running its own code.
for ($i = 0; $i < 11; $i++) {
    echo " <REGULAR_CODE> ";
    sleep(1);
}
(BTW, you will need to go the hard way and install PHP with ZTS support, and then enable parallel. I recommend phpbrew for that.)
I prefer exec() and gearman.
exec() is easy, needs no connection and uses less memory.
Gearman needs a socket connection, and the workers take some memory.
But Gearman is more flexible and faster than exec(). Most importantly, it can deploy the workers on other servers, which helps if the work is time- and resource-consuming.
I'm using gearman in my current project.
I use PHP's pcntl - it is good as long as you know what you are doing. I understand your situation, but I don't think the code is difficult to understand; we just have to be a little more careful than usual when implementing a job queue or parallel processing.
As long as you code it carefully and make sure the flow is correct, it works; of course, you have to keep parallel processing in mind while you implement it.
Where you could make mistakes:
Loops - these should be manageable with global variables.
Processing some set of transactions - again, as long as you define the sets properly, you should be able to get it done.
Take a look at this example - https://github.com/rakesh-sankar/Tools/blob/master/PHP/fork-parallel-process.php.
Hope it helps.
The method described in 'Easy parallel processing in PHP' is downright scary - the principle is OK - but the implementation??? As you've already pointed out, the curl_multi_ functions provide a much better way of implementing this approach.
But I think those 2 ways will add pretty much overhead
Yes, you probably don't need a client and server HTTP stack for handing off the job - but unless you're working for Google, your development time is much more expensive than your hardware costs - and there are plenty of tools for managing HTTP/analysing performance - and there is a defined standard covering stuff such as status notifications and authentication.
A lot of how you implement the solution depends on the level of transactional integrity you require and whether you require in-order processing.
Out of the approaches you mention I'd recommend focussing on the HTTP request method using curl_multi_ . But if you need good transactional control / in order delivery then you should definitely run a broker daemon between the source of the messages and the processing agents (there is a well written single threaded server suitable for use as a framework for the broker here). Note that the processing agents should process a single message at a time.
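The curl_multi_ approach could be sketched as below. It assumes the curl extension is loaded; fetchAll() is a hypothetical helper and the URLs in the usage comment are placeholders for whatever endpoints process the jobs.

```php
<?php
// Minimal sketch of parallel HTTP requests with the curl_multi_ functions.
// fetchAll() is a hypothetical helper, not part of the curl extension.

function fetchAll(array $urls, int $timeoutSec = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSec);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0);   // wait for activity, don't spin
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the response bodies, keyed like the input array.
    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    return $bodies;   // handles are released when they go out of scope
}

// Usage (commented out because it needs reachable endpoints):
// $bodies = fetchAll(['job1' => 'http://localhost/worker.php?id=1',
//                     'job2' => 'http://localhost/worker.php?id=2']);
```

All requests are in flight at once, so the batch takes roughly as long as the slowest request. Error handling (curl_multi_info_read(), HTTP status checks) is left out of the sketch.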
If you need a highly scalable solution, then take a look at a proper message queuing system such as RabbitMQ.
HTH
C.
I realize it's probably something strange, but here is what I have.
I have an application (a handwriting recognition engine) written in C/C++. This application has a Perl wrapper which was made by the application's authors using SWIG. My website is written in PHP, so I'm looking for some way to make PHP work with the C/C++ application.
The only way I can think of now is to create a CGI script (a Perl script) which accepts a POST request from my website (an AJAX request), sends it to the recognition engine through its Perl wrapper, gets the required data and returns it as the response to the AJAX request.
Do you think it could be done this way? Are there any better solutions?
Thank you!
Do you think it could be done this way?
Yes, no reason it can't be done.
Are there any better solutions?
Maybe. If you intend to execute the Perl wrapper as a system call to a separate Perl script, you don't need a separate Perl CGI script. You can just do system calls from PHP in your site directly. Not a big difference, but it might help if PHP is more of your comfort zone for web stuff than Perl CGI.
OTOH, if the Perl script wrapper is a fairly obvious and simple set of API calls, and you feel comfortable with Perl CGI, a better solution is to port that command line Perl script into the Perl CGI script, which then uses the API internally, bypassing system calls.
For high-volume stuff, removing system calls is a Big Win performance wise, plus allows for much better and easier error handling.
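For the "system calls from PHP" option, a minimal sketch using proc_open() is shown below. runCommand() is a hypothetical helper, and since the name of the Perl wrapper script is not given here, the usage example runs a one-line PHP child as a stand-in (assuming the CLI on a Unix-like system).

```php
<?php
// Minimal sketch of calling an external program from PHP and exchanging
// data over stdin/stdout. runCommand() is a hypothetical helper.

function runCommand(string $command, string $input): array
{
    $descriptors = [
        0 => ['pipe', 'r'],   // child's stdin
        1 => ['pipe', 'w'],   // child's stdout
        2 => ['pipe', 'w'],   // child's stderr
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if ($process === false) {
        throw new RuntimeException("could not start: $command");
    }

    fwrite($pipes[0], $input);
    fclose($pipes[0]);                        // signal end of input

    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    return ['exit' => proc_close($process), 'out' => $stdout, 'err' => $stderr];
}

// Stand-in for the real wrapper: a child that upper-cases its input.
$child = escapeshellarg(PHP_BINARY) . ' -r '
       . escapeshellarg('echo strtoupper(file_get_contents("php://stdin"));');
$result = runCommand($child, 'abc');
print_r($result);
```

This simple version writes all input before reading any output, which is fine for small payloads; very large inputs or outputs would need interleaved reads (for example with stream_select()) to avoid filling a pipe buffer.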
What you are proposing is:
web client <-> Perl CGI script <-> Perl wrapper <-> C program
There is nothing particularly wrong with this approach, although it's clearly not the most efficient possible way to do it. How important is performance? If it doesn't have to be amazingly fast, sure, do it this way, which sounds like the easiest to develop.
If you want to go a step further, then the obvious point of optimization in the schematic above is to collapse the two Perl layers:
web client <-> Perl CGI script <-> C program
The question is, is it worth your time to do this? You can look at the source for the Perl wrapper and decide for yourself.
My advice would be to initially develop it the simple way and then if, for whatever reason, you decide it is insufficient, to merge the two Perl scripts into one. But for now, don't worry about it and go ahead with your idea.