Does PHP have threading? - php

I found this PECL package called threads, but there is not a release yet. And nothing is coming up on the PHP website.

From the PHP manual for the pthreads extension:
pthreads is an Object Orientated API that allows user-land multi-threading in PHP. It includes all the tools you need to create multi-threaded applications targeted at the Web or the Console. PHP applications can create, read, write, execute and synchronize with Threads, Workers and Stackables.
As unbelievable as this sounds, it's entirely true. Today, PHP can multi-thread for those wishing to try it.
The first release of PHP4, 22 May 2000, PHP was shipped with a thread safe architecture - a way for it to execute multiple instances of it's interpreter in separate threads in multi-threaded SAPI ( Server API ) environments. Over the last 13 years, the design of this architecture has been maintained and advanced: It has been in production use on the worlds largest websites ever since.
Threading in user land was never a concern for the PHP team, and it remains as such today. You should understand that in the world where PHP does it's business, there's already a defined method of scaling - add hardware. Over the many years PHP has existed, hardware has got cheaper and cheaper and so this became less and less of a concern for the PHP team. While it was getting cheaper, it also got much more powerful; today, our mobile phones and tablets have dual and quad core architectures and plenty of RAM to go with it, our desktops and servers commonly have 8 or 16 cores, 16 and 32 gigabytes of RAM, though we may not always be able to have two within budget and having two desktops is rarely useful for most of us.
Additionally, PHP was written for the non-programmer, it is many hobbyists native tongue. The reason PHP is so easily adopted is because it is an easy language to learn and write. The reason PHP is so reliable today is because of the vast amount of work that goes into it's design, and every single decision made by the PHP group. It's reliability and sheer greatness keep it in the spot light, after all these years; where it's rivals have fallen to time or pressure.
Multi-threaded programming is not easy for most, even with the most coherent and reliable API, there are different things to think about, and many misconceptions. The PHP group do not wish for user land multi-threading to be a core feature, it has never been given serious attention - and rightly so. PHP should not be complex, for everyone.
All things considered, there are still benefits to be had from allowing PHP to utilize it's production ready and tested features to allow a means of making the most out of what we have, when adding more isn't always an option, and for a lot of tasks is never really needed.
pthreads achieves, for those wishing to explore it, an API that does allow a user to multi-thread PHP applications. It's API is very much a work in progress, and designated a beta level of stability and completeness.
It is common knowledge that some of the libraries PHP uses are not thread safe, it should be clear to the programmer that pthreads cannot change this, and does not attempt to try. However, any library that is thread safe is useable, as in any other thread safe setup of the interpreter.
pthreads utilizes Posix Threads ( even in Windows ), what the programmer creates are real threads of execution, but for those threads to be useful, they must be aware of PHP - able to execute user code, share variables and allow a useful means of communication ( synchronization ). So every thread is created with an instance of the interpreter, but by design, it's interpreter is isolated from all other instances of the interpreter - just like multi-threaded Server API environments. pthreads attempts to bridge the gap in a sane and safe way. Many of the concerns of the programmer of threads in C just aren't there for the programmer of pthreads, by design, pthreads is copy on read and copy on write ( RAM is cheap ), so no two instances ever manipulate the same physical data, but they can both affect data in another thread. The fact that PHP may use thread unsafe features in it's core programming is entirely irrelevant, user threads, and it's operations are completely safe.
Why copy on read and copy on write:
public function run() {
...
(1) $this->data = $data;
...
(2) $this->other = someOperation($this->data);
...
}
(3) echo preg_match($pattern, $replace, $thread->data);
(1) While a read, and write lock are held on the pthreads object data store, data is copied from its original location in memory to the object store. pthreads does not adjust the refcount of the variable, Zend is able to free the original data if there are no further references to it.
(2) The argument to someOperation references the object store, the original data stored, which it itself a copy of the result of (1), is copied again for the engine into a zval container, while this occurs a read lock is held on the object store, the lock is released and the engine can execute the function. When the zval is created, it has a refcount of 0, enabling the engine to free the copy on completion of the operation, because no other references to it exist.
(3) The last argument to preg_match references the data store, a read lock is obtained, the data set in (1) is copied to a zval, again with a refcount of 0. The lock is released, The call to preg_match operates on a copy of data, that is itself a copy of the original data.
Things to know:
The object store's hash table where data is stored, thread safe, is
based on the TsHashTable shipped with PHP, by Zend.
The object store has a read and write lock, an additional access lock is provided for the TsHashTable such that if requires ( and it does, var_dump/print_r, direct access to properties as the PHP engine wants to reference them ) pthreads can manipulate the TsHashTable outside of the defined API.
The locks are only held while the copying operations occur, when the copies have been made the locks are released, in a sensible order.
This means:
When a write occurs, not only are a read and write lock held, but an
additional access lock. The table itself is locked down, there is no
possible way another context can lock, read, write or affect it.
When a read occurs, not only is the read lock held, but the
additional access lock too, again the table is locked down.
No two contexts can physically nor concurrently access the same data from the object store, but writes made in any context with a reference will affect the data read in any context with a reference.
This is shared nothing architecture and the only way to exist is co-exist. Those a bit savvy will see that, there's a lot of copying going on here, and they will wonder if that is a good thing. Quite a lot of copying goes on within a dynamic runtime, that's the dynamics of a dynamic language. pthreads is implemented at the level of the object, because good control can be gained over one object, but methods - the code the programmer executes - have another context, free of locking and copies - the local method scope. The object scope in the case of a pthreads object should be treated as a way to share data among contexts, that is it's purpose. With this in mind you can adopt techniques to avoid locking the object store unless it's necessary, such as passing local scope variables to other methods in a threaded object rather than having them copy from the object store upon execution.
Most of the libraries and extensions available for PHP are thin wrappers around 3rd parties, PHP core functionality to a degree is the same thing. pthreads is not a thin wrapper around Posix Threads; it is a threading API based on Posix Threads. There is no point in implementing Threads in PHP that it's users do not understand or cannot use. There's no reason that a person with no knowledge of what a mutex is or does should not be able to take advantage of all that they have, both in terms of skill, and resources. An object functions like an object, but wherever two contexts would otherwise collide, pthreads provides stability and safety.
Anyone who has worked in java will see the similarities between a pthreads object and threading in java, those same people will have no doubt seen an error called ConcurrentModificationException - as it sounds an error raised by the java runtime if two threads write the same physical data concurrently. I understand why it exists, but it baffles me that with resources as cheap as they are, coupled with the fact the runtime is able to detect the concurrency at the exact and only time that safety could be achieved for the user, that it chooses to throw a possibly fatal error at runtime rather than manage the execution and access to the data.
No such stupid errors will be emitted by pthreads, the API is written to make threading as stable, and compatible as is possible, I believe.
Multi-threading isn't like using a new database, close attention should be paid to every word in the manual and examples shipped with pthreads.
Lastly, from the PHP manual:
pthreads was, and is, an experiment with pretty good results. Any of its limitations or features may change at any time; that is the nature of experimentation. It's limitations - often imposed by the implementation - exist for good reason; the aim of pthreads is to provide a useable solution to multi-tasking in PHP at any level. In the environment which pthreads executes, some restrictions and limitations are necessary in order to provide a stable environment.

Here is an example of what Wilco suggested:
$cmd = 'nohup nice -n 10 /usr/bin/php -c /path/to/php.ini -f /path/to/php/file.php action=generate var1_id=23 var2_id=35 gen_id=535 > /path/to/log/file.log & echo $!';
$pid = shell_exec($cmd);
Basically this executes the PHP script at the command line, but immediately returns the PID and then runs in the background. (The echo $! ensures nothing else is returned other than the PID.) This allows your PHP script to continue or quit if you want. When I have used this, I have redirected the user to another page, where every 5 to 60 seconds an AJAX call is made to check if the report is still running. (I have a table to store the gen_id and the user it's related to.) The check script runs the following:
exec('ps ' . $pid , $processState);
if (count($processState) < 2) {
// less than 2 rows in the ps, therefore report is complete
}
There is a short post on this technique here: http://nsaunders.wordpress.com/2007/01/12/running-a-background-process-in-php/

There is nothing available that I'm aware of. The next best thing would be to simply have one script execute another via CLI, but that's a bit rudimentary. Depending on what you are trying to do and how complex it is, this may or may not be an option.

In short: yes, there is multithreading in php but you should use multiprocessing instead.
Backgroud info: threads vs. processes
There is always a bit confusion about the distinction of threads and processes, so i'll shortly describe both:
A thread is a sequence of commands that the CPU will process. The only data it consists of is a program counter. Each CPU core will only process one thread at a time but can switch between the execution of different ones via scheduling.
A process is a set of shared resources. That means it consists of a part of memory, variables, object instances, file handles, mutexes, database connections and so on. Each process also contains one or more threads. All threads of the same process share its resources, so you may use a variable in one thread that you created in another. If those threads are parts of two different processes, then they cannot access each others resources directly. In this case you need inter-process communication through e.g. pipes, files, sockets...
Multiprocessing
You can achieve parallel computing by creating new processes (that also contain a new thread) with php. If your threads do not need much communication or synchronization, this is your choice, since the processes are isolated and cannot interfere with each other's work. Even if one crashes, that doesn't concern the others. If you do need much communication, you should read on at "multithreading" or - sadly - consider using another programming language, because inter-process communication and synchronization introduces a lot of complexion.
In php you have two ways to create a new process:
let the OS do it for you: you can tell your operation system to create a new process and run a new (or the same) php script in it.
for linux you can use the following or consider Darryl Hein's answer:
$cmd = 'nice php script.php 2>&1 & echo $!';
pclose(popen($cmd, 'r'));
for windows you may use this:
$cmd = 'start "processname" /MIN /belownormal cmd /c "script.php 2>&1"';
pclose(popen($cmd, 'r'));
do it yourself with a fork: php also provides the possibility to use forking through the function pcntl_fork(). A good tutorial on how to do this can be found here but i strongly recommend not to use it, since fork is a crime against humanity and especially against oop.
Multithreading
With multithreading all your threads share their resources so you can easily communicate between and synchronize them without a lot of overhead. On the other side you have to know what you are doing, since race conditions and deadlocks are easy to produce but very difficult to debug.
Standard php does not provide any multithreading but there is an (experimental) extension that actually does - pthreads. Its api documentation even made it into php.net.
With it you can do some stuff as you can in real programming languages :-) like this:
class MyThread extends Thread {
public function run(){
//do something time consuming
}
}
$t = new MyThread();
if($t->start()){
while($t->isRunning()){
echo ".";
usleep(100);
}
$t->join();
}
For linux there is an installation guide right here at stackoverflow's.
For windows there is one now:
First you need the thread-safe version of php.
You need the pre-compiled versions of both pthreads and its php extension. They can be downloaded here. Make sure that you download the version that is compatible with your php version.
Copy php_pthreads.dll (from the zip you just downloaded) into your php extension folder ([phpDirectory]/ext).
Copy pthreadVC2.dll into [phpDirectory] (the root folder - not the extension folder).
Edit [phpDirectory]/php.ini and insert the following line
extension=php_pthreads.dll
Test it with the script above with some sleep or something right there where the comment is.
And now the big BUT: Although this really works, php wasn't originally made for multithreading. There exists a thread-safe version of php and as of v5.4 it seems to be nearly bug-free but using php in a multi-threaded environment is still discouraged in the php manual (but maybe they just did not update their manual on this, yet). A much bigger problem might be that a lot of common extensions are not thread-safe. So you might get threads with this php extension but the functions you're depending on are still not thread-safe so you will probably encounter race conditions, deadlocks and so on in code you did not write yourself...

You can use pcntl_fork() to achieve something similar to threads. Technically it's separate processes, so the communication between the two is not as simple with threads, and I believe it will not work if PHP is called by apache.

If anyone cares, I have revived php_threading (not the same as threads, but similar) and I actually have it to the point where it works (somewhat) well!
Project page
Download (for Windows PHP 5.3 VC9 TS)
Examples
README

pcntl_fork() is what you are searching for, but its process forking not threading.
so you will have the problem of data exchange. to solve them you can use phps semaphore functions ( http://www.php.net/manual/de/ref.sem.php ) message queues may be a bit easier for the beginning than shared memory segments.
Anyways, a strategy i am using in a web framework that i am developing which loads resource intensive blocks of a web page (probably with external requests) parallel:
i am doing a job queue to know what data i am waiting for and then i fork off the jobs for every process. once done they store their data in the apc cache under a unique key the parent process can access. once every data is there it continues.
i am using simple usleep() to wait because inter process communication is not possible in apache (children will loose the connection to their parents and become zombies...).
so this brings me to the last thing:
its important to self kill every child!
there are as well classes that fork processes but keep data, i didn't examine them but zend framework has one, and they usually do slow but reliably code.
you can find it here:
http://zendframework.com/manual/1.9/en/zendx.console.process.unix.overview.html
i think they use shm segments!
well last but not least there is an error on this zend website, minor mistake in the example.
while ($process1->isRunning() && $process2->isRunning()) {
sleep(1);
}
should of course be:
while ($process1->isRunning() || $process2->isRunning()) {
sleep(1);
}

There is a Threading extension being activley developed based on PThreads that looks very promising at https://github.com/krakjoe/pthreads

Just an update, its seem that PHP guys are working on supporting thread and its available now.
Here is the link to it:
http://php.net/manual/en/book.pthreads.php

I have a PHP threading class that's been running flawlessly in a production environment for over two years now.
EDIT: This is now available as a composer library and as part of my MVC framework, Hazaar MVC.
See: https://git.hazaarlabs.com/hazaar/hazaar-thread

I know this is a way old question, but you could look at http://phpthreadlib.sourceforge.net/
Bi-directional communication, support for Win32, and no extensions required.

Ever heard about appserver from techdivision?
It is written in php and works as a appserver managing multithreads for high traffic php applications. Is still in beta but very promesing.

There is the rather obscure, and soon to be deprecated, feature called ticks. The only thing I have ever used it for, is to allow a script to capture SIGKILL (Ctrl+C) and close down gracefully.

Related

Does PHP's SyncMutex class named mutexes work across different processes on the same server?

BACKGROUND
I am using Apache and PHP to build a web application and I need to synchronize access to a region of shared memory. Since different instances of PHP have different process ID's, I am wondering if PHP's SyncMutex class with named mutexes can be used for this. Doing a Google search, I see quite a bit of information about using files as mutexes, but not much for the SyncMutex class. Not even the manual has much more information beyond the definition of the class and a few examples on how to use it.
QUESTION
Will a SyncMutex class named mutex in one process be visible in a different process?
RESEARCH
https://helperbyte.com/questions/470561/how-to-prevent-simultaneous-running-of-a-php-script
PHP mutual exclusion (mutex)
PHP rewrite an included file - is this a valid script?
Most reliable & safe method of preventing race conditions in PHP
FINALLY
The data involved is VERY transient and can change on a moment's notice (think volatile keyword in C). The data becomes useless after 30 seconds or so, and is purged periodically. For performance reasons, I'm storing it in the server's shared memory where the access is much faster than writing to a file or a database. Is this a valid use case or am I barking up the wrong tree? Should I use something else? Like a semaphore?
ADDITIONAL INFORMATION (EDIT 8/25/2021)
Upon further research, I have discovered that the pthreads module for PHP has been depreciated by the module owner several months ago and is no longer maintained. PHP is now using something called parallel to facilitate multithreading. Because of the design of the setup that I'm using, parallel is not compatible with what I am trying to do. So it looks like that I will have to use MySQL to handle this after all for right now.
My idea of using a memory server is still viable, but the server needs to be written in a language other than PHP due to the change in PHP's multithreading architecture. That will be done using C++, but not right now.
Thank you to everyone who responded.
According to the user supplied example below the documentation page for SyncMutex::unlock, SyncMutex lock created in one process is visible in another.
Additionally SyncSharedMemory class description says it explicitly:
Shared memory lets two separate processes communicate without the need
for complex pipes or sockets. [...] Synchronization objects (e.g. SyncMutex) are still required to protect most uses of shared memory.

Working with Threads using Apache and PHP

Is there anyway to work with threads in PHP via Apache (using a browser) on Linux/Windows?
The mere fact that it is possible to do something, says nothing whatever about whether it is appropriate.
The facts are, that the threading model used by pthreads+PHP is 1:1, that is to say one user thread to one kernel thread.
To deploy this model at the frontend of a web application inside of apache doesn't really make sense; if a front end controller instructs the hardware to create even a small number of threads, for example 8, and 100 clients request the controller at the same time, you will be asking your hardware to execute 800 threads.
pthreads can be deployed inside of apache, but it shouldn't be. What you should do is attempt to isolate those parts of your application that require what threading provides and communicate with the isolated multi-threading subsystem via some sane form of RPC.
I wrote pthreads, please listen.
Highly discouraged.
The pcntl_fork function, if allowed at all in your setup, will fork the Apache worker itself, rather than the script, and most likely you won't be able to claim the child process after it's finished.
This leads to many zombie Apache processes.
I recommend using a background worker pool, properly running as a daemon/service or at least properly detached from a launching console (using screen for example), and your synchronous PHP/Apache script would push job requests to this pool, using a socket.
Does it help?
[Edit] I would have offered the above as a commment, but I did not have enough reputation to do so (I find it weird btw, to not be able to comment because you're too junior).
[Edit2] pthread seems a valid solution! (which I have not tried so I can't advise)
The idea of "thread safe" can be very broad. However PHP is on the very, furthest end of the spectrum. Yes, PHP is threadsafe, but the driving motivation and design goals focus on keeping the PHP VM safe to operate in threaded server environments, not providing thread safe features to PHP userspace. A huge amount of sites use PHP, one request at a time. Many of these sites change very slowly - for example statistics say more sites serve pages without Javascript still than sites that use Node.js on the server. A lot of us geeks like the idea of threads in PHP, but the consumers don't really care. Even with multiple packages that have proved it's entirely possible, it will likely be a long time before anything but very experimental threads exist in PHP.
Each of these examples (pthreads, pht, and now parallel) worked awesome and actually did what they were designed to do - as long as you use very vanilla PHP. Once you start using more dynamic features, and practically any other extensions, you find that PHP has a long way to go.

difference between Threads, Workers, Mutex, Stackable?

After some effort, I managed to set up a PHP multithreaded server that receives input from multiple WebSockets and passes it on to a single TCP socket.
I accomplished this with pthreads and managed to use the Stackable object as an array.
The next step now is to really understand what I'm doing and most definitely what I can do better. I read somewhere here on SO that multithreading is not something you just 'do'. For that reason I like to dig deeper first before I move on coding.
Brings me to these questions:
What is meant by A Worker Thread has a persistent state (pthreads.org). Will a Worker continue running when out of scope? What is their relation to Stackables?
The PHP manual says Worker threads should be used over Threads, why?
I hope my question is not to broad.
A Worker Thread has a persistent state: when you start a worker thread, the worker is available for execution until the variable goes out of scope or the worker is shutdown.
Take the example of a script that downloads multiple pages from multiple sources online: on a tiny scale, you might get away with, and it might be the best design to have multiple threads, each thread dealing with a single request and processing the content. However, normally, you should consider it a huge waste to start a thread to make a single request, or execute anything very simple, instead you would start as many workers as your hardware and environment can manage, and implement the request and processing of content as a Stackable. When using Workers, each request does not cost you the initialization of a threading context. Persistence in this context refers to the fact that each of the Stackables can read $this->worker, and access all/any of the global (including class level static variable) scopes, so from the perspective of the Stackable during execution the context of the Worker persists, in reality the Stackables are simply sharing a context.
Mutex and Cond are there for hardcore posix threading addicts, and for my purposes during development of pthreads itself. Normal users shouldn't really have to use this functionality, though it's as well documented as the counterpart posix functions, and much of the original documentation still applies (with a layer of sensible on top). Mutex and Cond are both thin wrappers around familiar functionality, and not really aimed at Joe Bloggs.
I guess you are wondering if mutex are required in order to have safe reading and writing of objects, this is not the case, safety is built into pthreads.
Times you should use a Thread over a Worker would be; from the example, on the tiny scale. Additionally, if you have the hardware to support it, Threads can create Workers (and Threads), which lends itself to more complex systems design possibilities. You might also turn out to be awesome at this, in which case you can program your threads to act like workers without the overhead of object initialization (stackables) for every task. A Worker is a Thread, only some of the Thread's functionality has been hi-jacked to execute a list of Stackables, you will notice subtle differences in the synchronization methods of Thread and Worker, which doesn't matter if you are using workers as intended. If however your awesomeness leads you to think it'd be better to avoid the standard model, then you will require the synchronization from a Thread object to implement a Worker pattern in PHP.
The fact is, that if you are looking at threading, you are wanting to do more than 2 or 3 things at a time, the chances are that you could have found ways to do a few things. So providing Worker/Stackable functionality as part of the pthreads package makes sense, because it's better suited to executing a bunch of things in separate contexts, writing this stuff in PHP (even with the help of pthreads) would be tricky.

Why does PHP use opcode caches while Java compiles to bytecode files?

From my point of view, both PHP and Java have a similar structure. At first you write some high-level code, which then must be translated in a simpler code format to be executed by a VM. One difference is, that PHP works directly from the source code files, while Java stores the bytecode in .class files, from where the VM can load them.
Nowadays the requirements for speedy PHP execution grow, which leads people to believe that it would be better to directly work with the opcodes and not go through the compiling step each time a user hits a file.
The solution seem to be a load of so called Accelerators, which basically store the compiled results in cache and then use the cached opcodes instead of compiling again.
Another approach, done by Facebook, is to completely compile the PHP code to a different language.
So my question is, why is nobody in the PHP world doing what Java does? Are there some dynamic elements that really need to be recompiled each time or something like that? Otherwise it would be really smarter to compile everything when the code goes into production and then just work with that.
The most important difference is that the JVM has an explicit specification that covers the bytecode completely. That makes bytecode files portable and useful for more than just execution by a specific JVM implementation.
PHP doesn't even have a language specification. PHP opcodes are an implementation detail of a specific PHP engine, so you can't really do anything interesting with them and there's little point in making them more visible.
PHP opcodes are not the same as Java classfiles. Java classfiles are well specified, and are portable between machines. PHP opcodes are not portable in any way. They have memory addresses baked into them, for example. They are strictly an implementation detail of the PHP interpreter, and shouldn't be considered anything like Java bytecode.
Does it have to be this way? No, probably not. But the PHP source code is a mess, and there is neither the desire, nor the political will in the PHP internals community to make this happen. I think there was talk of baking an opcode cache into PHP 6, but PHP 6 died, and I don't know the status of that idea.
Reference: I wrote phc so I was pretty knee deep in PHP implementation/compilation for a few years.
It's not quite true that nobody in the PHP world is doing what java does. Projects such as Alexey Zakhlestin's appserver provide a degree of persistence more akin to a java servlet container (though his inspiration is more Ruby’s Rack and Python’s WSGI than Java)
PHP does not use a standard mechanism for opcodes. I wish it either stuck to a stack VM (python,java) or a register VM (x86, perl6 etc). But it uses something absolutely homegrown and there in lies the rub.
It uses a connected list in memory which results in each opcode having a ->op1 ->op2 and ->result. Now each of those are either constants or entries in a temp table etc. These pointers cannot be serialized in any sane fashion.
Now, people have accomplished this using items like pecl/bcompiler which does dump the stream into the disk.
But the classes make this even more complicated, which means that there are potential code fragments like
if(<conditon>)
{
class XYZ() { }
}
else
{
class XYZ() { }
}
class ABC extends XYZ {}
Which means that a large number of decisions about classes & functions can only be done at runtime - something like Java would choke on two classes with the same name, which are defined conditionally at runtime. Basically, APC's inheritance & class caching code is perhaps the most complicated & bug-prone part of the codebase. Whenever a class is cached, all parent inherited members have to be scrubbed out before it can be saved to the opcode cache.
The pointer problem is not insurmountable. There is an apc_bindump which I have never bothered to fix up to load entire cache entries off disk directly whenever a restart is done. But it's painful to debug all that to get something that still needs to locate all system pointers - the apache case is too easy, because all php processes have the same system pointers because of the fork behaviour. The old fastcgi versions were slower because they used to fork first & init php later - the php-fpm fixed that by doing it the other way around.
But eventually, what's really missing in PHP is the will to invent a bytecode format, throw away the current engine & all modules - to rewrite it using a stack VM & build a JIT. I wish I had the time - the fb guys are almost there with their hiphop HHVM. Which sacrifies eval() for faster performance - which is a fair sacrifice :)
PS: I'm the guy who can't find time to update APC for 5.4 properly
I think all of you are misinformed. HHVM is not a compiler to another languague is a virtual machine itself. The confusion is because facebook use to compile to c++, but that approach was to slowly for the requirements of the developers (ten minutes compiling only for test some tiny things).

Parallel processing in PHP - How do you do it?

I am currently trying to implement a job queue in php. The queue will then be processed as a batch job and should be able to process some jobs in parallel.
I already did some research and found several ways to implement it, but I am not really aware of their advantages and disadvantages.
E.g. doing the parallel processing by calling a script several times through fsockopen like explained here:
Easy parallel processing in PHP
Another way I found was using the curl_multi functions.
curl_multi_exec PHP docs
But I think those 2 ways will add pretty much overhead for creating batch processing on a queue that should mainly run on the background?
I also read about pcntl_fork which also seems to be a way to handle the problem. But that looks like it can get really messy if you don't really know what you are doing (like me at the moment).
I also had a look at Gearman, but there I would also need to spawn the worker threads dynamically as needed and not just run a few and let the gearman job server then sent it to the free workers. Especially because the threads should be exit cleanly after one job has been executed, to not run into eventual memory leaks (code may not be perfect in that issue).
Gearman Getting Started
So my question is, how do you handle parallel processing in PHP? And why do you choose your method, which advantages/disadvantages may the different methods have?
i use exec(). Its easy and clean. You basically need to build a thread manager, and thread scripts, that will do what you need.
I dont like fsockopen() because it will open a server connection, that will build up and may hit the apache's connection limit
I dont like curl functions for the same reason
I dont like pnctl because it needs the pnctl extension available, and you have to keep track of parent/child relations.
never played with gearman...
Well I guess we have 3 options there:
A. Multi-Thread:
PHP does not support multithread natively.
But there is one PHP extension (experimental) called pthreads (https://github.com/krakjoe/pthreads) that allows you to do just that.
B. Multi-Process:
This can be done in 3 ways:
Forking
Executing Commands
Piping
C. Distributed Parallel Processing:
How it works:
The Client App sends data (AKA message) “can be JSON formatted” to the Engine (MQ Engine) “can be local or external a web service”
The MQ Engine stores the data “mostly in Memory and optionally in Database” inside a queues (you can define the queue name)
The Client App asks the MQ Engine for a data (message) to be processed them in order (FIFO or based on priority) “you can also request data from specific queue".
Some MQ Engines:
ZeroMQ (good option, hard to use)
a message orientated IPC Library, is a Message Queue Server in Erlang, stores jobs in memory. It is a socket library that acts as a concurrency framework. Faster than TCP for clustered products and supercomputing.
RabbitMQ (good option, easy to use)
self hosted, Enterprise Message Queues, Not really a work queue - but rather a message queue that can be used as a work queue but requires additional semantics.
Beanstalkd (best option, easy to use)
(Laravel built in support, built by facebook, for work queue) - has a "Beanstalkd console" tool which is very nice
Gearman
(problem: centralized broker system for distributed processing)
Apache ActiveMQ
the most popular open source message broker in Java, (problem: lot of bugs and problems)
Amazon SQS
(Laravel built in support, Hosted - so no administration is required. Not really a work queue thus will require extra work to handle semantics such as burying a job)
IronMQ
(Laravel built in support, Written in Go, Available both as cloud version and on-premise)
Redis
(Laravel built in support, not that fast as its not designed for that)
Sparrow
(written in Ruby that based on memcache)
Starling
(written in Ruby that based on memcache, built in twitter)
Kestrel
(just another QM)
Kafka
(Written at LinkedIn in Scala)
EagleMQ
open source, high-performance and lightweight queue manager (Written in C)
More of them can be foun here: http://queues.io
If your application is going to run under a unix/linux enviroment I would suggest you go with the forking option. It's basically childs play to get it working. I have used it for a Cron manager and had code for it to revert to a Windows friendly codepath if forking was not an option.
The options of running the entire script several times do, as you state, add quite a bit of overhead. If your script is small it might not be a problem. But you will probably get used to doing parallel processing in PHP by the way you choose to go. And next time when you have a job that uses 200mb of data it might very well be a problem. So you'd be better of learning a way that you can stick with.
I have also tested Gearman and I like it a lot. There are a few thing to think about but as a whole it offers a very good way to distribute works to different servers running different applications written in different languages. Besides setting it up, actually using it from within PHP, or any other language for that matter, is... once again... childs play.
It could very well be overkill for what you need to do. But it will open your eyes to new possibilities when it comes to handling data and jobs, so I would recommend you to try Gearman for that fact alone.
Here's a summary of a few options for parallel processing in PHP.
AMP
Checkout Amp - Asynchronous concurrency made simple - this looks to be the most mature PHP library I've seen for parallel processing.
Peec's Process Class
This class was posted in the comments of PHP's exec() function and provides a real simple starting point for forking new processes and keeping track of them.
Example:
// You may use status(), start(), and stop(). notice that start() method gets called automatically one time.
$process = new Process('ls -al');
// or if you got the pid, however here only the status() metod will work.
$process = new Process();
$process.setPid(my_pid);
// Then you can start/stop/check status of the job.
$process.stop();
$process.start();
if ($process.status()) {
echo "The process is currently running";
} else {
echo "The process is not running.";
}
Other Options Compared
There's also a great article Async processing or multitasking in PHP that explains the pros and cons of various approaches:
pthreads extension (see also this SitePoint article)
Amp\Thread Library
hack's async (requires running Facebook's HHVM)
pcntl_fork
popen
fopen/curl/fsockopen
Doorman
Then, there's also this simple tutorial which was wrapped up into a little library called Doorman.
Hope these links provide a useful starting point for more research.
First of all, this answer is based on the linux OS env.
Yet another pecl extension is parallel,you can install it by issuing pecl install parallel,but it has some prerequisities:
Installing ZTS(Zend Thread safety) Build PHP 7.2+ version
if you build this extension by source, you should check the php.ini like config file,then add extension=parallel.so to it
then see the full example gist :https://gist.github.com/krakjoe/0ee02b887288720d9b785c9f947f3a0a
or the php official site url:https://www.php.net/manual/en/book.parallel.php
Use native PHP (7.2+) Parallel , i.e.:
use \parallel\Runtime;
$sampleFunc = function($num, $param2, $param3) {
echo "[Start: $num]";
sleep(rand(1,3) );
echo "[end:$num]";
};
for($i = 0; $i < 11; $i++) {
\parallel\run($sampleFunc, [$param1=$i, $param2=null, $param3="blabla"] );
}
for ($i = 0; $i < 11; $i++) {
echo " <REGULAR_CODE> ";
sleep(1);
}
(BTW, you will need to go through hard path to install PHP with ZTS support, and then enable parallel. I recommend phpbrew to do that.)
I prefer exec() and gearman.
exec() is easy and no connection and less memory consuming.
gearman should need a socket connection and the worker should take some memory.
But gearman is more flexible and faster than exec(). And the most important is that it can deploy the worker in other server. If the work is time and resource consuming.
I'm using gearman in my current project.
I use PHP's pnctl - it is good as long as you know what you do. I understand you situation but I don't think it's something difficult to understand our code, we just have to be little more conscious than ever when implementing JOB queue or Parallel process.
I feel as long as you code it perfectly and make sure the flow is perfect off-course you should keep PARALLEL PROCESS in mind when you implement.
Where you could do mistakes:
Loops - should be able to handle by GLOBAL vars.
Processing some set of transactions - again as long as you define the sets proper, you should be able to get it done.
Take a look at this example - https://github.com/rakesh-sankar/Tools/blob/master/PHP/fork-parallel-process.php.
Hope it helps.
The method described in 'Easy parallel processing in PHP' is downright scary - the principle is OK - but the implementation??? As you've already pointed out the curl_multi_ fns provide a much better way of implementing this approach.
But I think those 2 ways will add pretty much overhead
Yes, you probably don't need a client and server HTTP stack for handing off the job - but unless you're working for Google, your development time is much more expensive than your hardware costs - and there are plenty of tools for managing HTTP/analysing performance - and there is a defined standard covering stuff such as status notifications and authentication.
A lot of how you implement the solution depends on the level transactional integrity you require and whether you require in-order processing.
Out of the approaches you mention I'd recommend focussing on the HTTP request method using curl_multi_ . But if you need good transactional control / in order delivery then you should definitely run a broker daemon between the source of the messages and the processing agents (there is a well written single threaded server suitable for use as a framework for the broker here). Note that the processing agents should process a single message at a time.
If you need a highly scalable solution, then take a look at a proper message queuing system such as RabbitMQ.
HTH
C.

Categories