I am not very familiar with threads in PHP. While searching for options for using threads in PHP, the most suitable tool I could find is pthreads. Although it is very convenient to use, it requires ZTS, and the documentation clearly states that it cannot be used in a web server environment.
Warning The pthreads extension cannot be used in a web server
environment. Threading in PHP is therefore restricted to CLI-based
applications only.
So I was wondering what the best way is to use threads, or multithreading, in a web server environment in PHP.
Yaba daba don't.
Why not? Because other languages do threads better.
What you want to do can be done with a worker that consumes events from a queue. This is how we do things in the PHP world.
Essentially you run another PHP process somewhere else (from a cron job, for example) that does your background processing. The web-related (fpm) workers should be as lightweight as possible and only submit these tasks or events to the queue.
Multithreading in PHP (not actual threading) is achieved by an HTTP server sending multiple requests to a php-fpm daemon (or mod_php, if you are so inclined) and by running schedulers or workers in the background as separate, independent processes.
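The split described above (the web request only enqueues a task, and a separate process does the work later) can be sketched roughly as follows. This is a minimal illustration rather than a production queue: it is written in Python only to keep it short, it uses an SQLite table as a stand-in for a real queue, and the names open_queue, submit and run_worker_once are made up for this example. It also assumes a single worker, so there is no locking.

```python
import json
import sqlite3


def open_queue(path=":memory:"):
    """Open (and if needed create) the hypothetical jobs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def submit(conn, payload):
    """Called from the web request: store the job and return at once."""
    cur = conn.execute(
        "INSERT INTO jobs (payload) VALUES (?)", (json.dumps(payload),)
    )
    conn.commit()
    return cur.lastrowid


def run_worker_once(conn, handler):
    """Called from cron or a long-running worker: process all pending jobs."""
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE done = 0 ORDER BY id"
    ).fetchall()
    for job_id, payload in rows:
        handler(json.loads(payload))
        # Mark the job as done only after the handler returned without error.
        conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
        conn.commit()
    return len(rows)
```

In a real setup the web process and the worker would open the same database file (or talk to a proper queue server) instead of sharing one in-memory connection.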
I found that pthreads does not work in a web environment. I use PHP 7.1 with FPM on Debian Linux, together with Symfony 3.2. All I want to do is, for example:
The user makes a request and PUTs a file (1 GB).
The PHP server receives the file and processes it.
It immediately returns true to the user (jsonResponse) without waiting for the uploaded file to be processed.
Later, when processing of the file is finished (move, copy, duplicate, whatever you want), it adds an event or does a callback from the background and notifies the user.
For this I created a console command. I execute Process('bin/console my:command')->start(); in the background and do my processing there. But this feels like killing a fly with a bazooka: I have to pass many variables to this executable command.
All I want is to create another thread and return to the user without waiting for the processing.
You may say this is a duplicate and point to pthreads. But pthreads states that it is intended for the CLI only. Also, the latest version of pthreads doesn't work with Symfony (fatal error).
I am stuck at this point and unsure whether I should stay with creating a process for each uploaded file or move to Python and Django.
You don't want threads. You want a job queue. Have a look at Gearman or similar things.
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
I understand that PHP supports handling multiple concurrent connections and that, depending on the server, it can be configured as mentioned in this answer.
How does the server manage multiple connections: does it fork a child process for each request, does it use threads, or does it use a thread pool?
The linked answer says a process is forked, and then the author says "threads or process" in a comment, which makes it unclear whether requests are served using child processes, threads or a thread pool.
As far as I know, every web server has its own way of handling multiple simultaneous requests.
Usually Apache2 serves each new request in a separate child process. But you can configure this behaviour, as mentioned in the StackOverflow answer you linked.
Nginx, for example, handles many requests in one thread (it processes new connections asynchronously, as Node.js does) and can also serve from a cache if configured to do so; Nginx can also be used as a load balancer or HTTP proxy. It is a matter of choosing the right web server for your application.
Apache2 can be a very good web server, but you need more load balancing when you use it in production. It also performs well with many short-lived connections, or with documents that don't change at all (or when caching is used).
Nginx is very good if you expect many long-lasting connections with fairly long processing times. You don't need as much load balancing then.
I hope I was able to help you out with this ;)
Sources:
https://httpd.apache.org/docs/2.4/mod/worker.html
https://anturis.com/blog/nginx-vs-apache/
I recommend that you also look at: What is thread safe or non-thread safe in PHP?
I think the answer depends on how the web server and the CGI layer are deployed.
In my company, we use Nginx as the web server and php-fpm as the FastCGI backend, so concurrent requests are handled by php-fpm as processes, not threads.
We configure the maximum number of processes, and each request is handled by a single PHP process. If more requests arrive than there are processes, the extra requests wait.
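To make the "they wait" part concrete, here is a small simulation of a fixed-size process pool (in php-fpm this limit is the pm.max_children setting). It is only a sketch under simplifying assumptions: all requests arrive at time 0, they are served in order, and the function name schedule and the durations are invented for the example. It is written in Python and does not start any real processes.

```python
import heapq


def schedule(max_processes, durations):
    """Return (start, finish) per request when at most max_processes run at once.

    Assumes every request arrives at time 0 and requests are served in order.
    """
    # Time at which each worker process becomes free again.
    free_at = [0.0] * max_processes
    heapq.heapify(free_at)
    timeline = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest free process takes the request
        finish = start + duration
        heapq.heappush(free_at, finish)
        timeline.append((start, finish))
    return timeline
```

With two processes and three one-second requests, the third request cannot start until one of the first two has finished, which is the waiting described above.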
So I believe PHP itself can support all of these models; which one is used depends on the deployment.
After doing some research, I ended up with the conclusions below.
To get insight into this, it is important to consider how PHP servers are set up. When setting up the server and PHP on your own, there are three possibilities:
1) Using PHP as a module (for many servers PHP has a direct module interface, also called SAPI)
2) CGI
3) FastCGI
Considering case #1, PHP as a module: here the module is integrated into the web server itself, so it is entirely up to the web server how requests are handled in terms of forking processes, using threads, thread pools, etc.
For the module case, Apache mod_php appears to be very commonly used, and Apache itself handles requests using processes and threads in two models, as mentioned in this answer:
Prefork MPM uses multiple child processes with one thread each and
each process handles one connection at a time.
Worker MPM uses multiple child processes with many threads each.
Each thread handles one connection at a time.
Obviously, other servers may take other approaches, but I am not aware of them.
For #2 and #3, the web server and the PHP part run in different processes, and how the web server handles a request and how the application (the PHP part) then processes it can differ. For example, NGINX may handle requests using asynchronous non-blocking I/O and Apache may handle requests using threads, but how a request is processed by the FastCGI or CGI application is a separate aspect, described below. Both aspects, i.e. how the web server handles requests and how the PHP part processes them, matter for a PHP server's performance.
Considering #2, the CGI protocol makes the web server and the application (PHP) independent of each other. CGI requires the application and the web server to run in different processes, and the protocol does not provide for reuse of the same process, which in turn means a new process is required to handle each request.
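The "new process per request" cost of CGI can be illustrated with a short sketch. It does not implement the CGI protocol itself (there are no environment variables or headers); it only shows the process model, using a made-up function handle_request_cgi_style that starts a fresh interpreter for every request and reports the child's PID. The example is in Python and assumes the interpreter path in sys.executable is usable.

```python
import os
import subprocess
import sys

# Code run by the short-lived child: print its PID, then the "response".
CHILD_CODE = (
    "import os, sys\n"
    "print(os.getpid())\n"
    "print(sys.stdin.read().upper(), end='')\n"
)


def handle_request_cgi_style(body):
    """Spawn a fresh interpreter for one request, as a CGI server would."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=body,
        capture_output=True,
        text=True,
        check=True,
    )
    pid_line, _, response = result.stdout.partition("\n")
    return int(pid_line), response


if __name__ == "__main__":
    for body in ("first request", "second request"):
        pid, response = handle_request_cgi_style(body)
        print("parent", os.getpid(), "child", pid, "->", response)
```

Every call pays the full interpreter start-up cost, which is the overhead that process reuse is meant to avoid.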
Considering #3, the FastCGI protocol overcomes this limitation of CGI by allowing process reuse. If you check the IIS FastCGI link: FastCGI addresses the performance issues that are inherent in CGI by providing a mechanism to reuse a single process over and over again for many requests.
FastCGI maintains compatibility with non-thread-safe libraries by
providing a pool of reusable processes and ensuring that each process
handles only one request at a time.
That said, in the case of FastCGI it appears that the server maintains a process pool and uses it to handle incoming client requests. Since each process in the pool handles only one request at a time, no thread-safety checks are required, which gives good performance.
PHP does not handle requests. The web server does.
For Apache HTTP Server, the most popular option is "mod_php". This module is actually PHP itself, but compiled as a module for the web server, and so it gets loaded right inside it.
Since with mod_php PHP gets loaded right into Apache, if Apache is going to handle concurrency using its Worker MPM (that is, using threads), then PHP has to be thread safe.
For nginx, PHP runs entirely outside of the web server, as multiple PHP processes.
This sometimes gives you the choice between non-thread-safe and thread-safe PHP.
But the setlocale() function (when supported) actually modifies the operating system's process state, and it is not thread safe.
You should remember this when you are not sure how legacy code works.
I am familiar with the various methods available within php for spawning new processes, forking, etc... Everything I have read urges against using pcntl_fork from within a web-accessible app. Can anyone tell me why this is not recommended?
At a fundamental level, I can see how if you are not careful, things could quickly get out of hand. But what if you are careful? In my case, I would like to pcntl_fork my parent script into a new child, run a short series of specific functions, and then close the child. Seems pretty straightforward, right? Would it still be dangerous for me to try this?
On a related note, can anyone talk about the overhead involved in doing this a different way... Calling proc_open() to launch an entirely new PHP process? Will I lose any possible speed increase by having to launch the new process?
Background: Consider a site with roughly 2,000 concurrent users running fastcgi.
Have you considered gearman for 'forking' new processes? It's also described as 'a distributed forking mechanism' so your workers do not need to be on the same machine.
Synchronous and asynchronous calls are also available.
You will find it here: http://gearman.org/ and it might be a candidate solution to the problem.
I would like to propose another possibility... Tell me what you think about this.
What if I created a pool of web servers whose sole job was to respond to job requests from the master application server? I would have something like this:
Master Application Server (Apache, PHP - FastCGI)
Application Worker Server (Apache, PHP - FastCGI)
Application Worker Server (Apache, PHP - FastCGI)
Application Worker Server (Apache, PHP - FastCGI)
Application Worker Server (Apache, PHP - FastCGI)
Instead of spawning new PHP processes on my master application server, I would send out job requests to my "workers" using asynchronous sockets. The workers would then run these jobs in realtime and send the result back to the main application server.
Has anyone tried this? Do you foresee any problems? It seems to me that this might work wonderfully.
The problem is not that the app is web-accessible.
The problem is that the web server (or here the FastCGI module) may not handle forks very well. Just try it yourself.
My current web host allows for up to 25 processes running at once. From what I can figure, Python scripts take up a spot in processes, but PHP doesn't?
I get a 500 error if more than 25 processes are running at once (unlikely, but still a hassle), so I was wondering if it would be easier on the server if I were to port my site over to PHP?
Thanks!
You are using HostGator. Switch hosts. Their shared server offerings should only be used for very low-traffic brochure sites, as they cram hundreds of vhosts onto each server.
If you can't switch, ensure you're set up to use mod_php (not suPHP or CGI) or the Python equivalent. Otherwise, new processes will be spawned on each request and you'll be serving up blank pages in no time.
It depends on how you have PHP/Python set up. If you have, say, Apache loading PHP via mod_php, then it doesn't actually spawn a new process. Likewise, if you were using, say, Tornado to handle web requests, then the webserver itself is already running the Python process, and thus there's no additional Python processes required.
Basically... don't change languages just to alter the number of processes you have running. Instead, figure out what methods your current language has to reduce the process count.
My app takes a long list of URLs and splits it into X parts (where X = $threads), so that I can start a thread.php for each part and process its URLs. Each one then makes GET and POST requests to retrieve data.
I am using this:
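$pid = [];
for ($x = 1; $x <= $threads; $x++) {
    // Start thread.php in the background; "echo $!" prints its PID, which exec() returns.
    $pid[] = exec("/path/bin/php thread.php <options> > /dev/null & echo \$!");
}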
For "threading" (I know it's not really threading; is it forking, or what?), I save the PIDs into a file so that I can later check whether thread N is running and stop it.
Now I want to move away from PHP. I was thinking about using Python because I'd like to learn more about it.
How can I achieve this kind of "threading" with Python (or Ruby)?
Or is there a better way to launch multiple background threads in python or ruby that runs in parallel (at the same time)?
The threads don't need to communicate with each other or with a main thread. They are independent: they make HTTP requests and interact with a MySQL database, and they may need to access or modify the same table entries (I haven't thought about this, or how I will solve it, yet).
The app works with "projects", each project has a "max threads" variable and I use a web interface to control it (so I could still use php for the interface [starting/stopping threads] in the new app).
I wanted to use
from threading import Thread
in Python, but I've been told those threads won't run in parallel, only one at a time.
The app is intended to run on Linux web servers.
Any suggestion will be appreciated.
For Python 2.6+, consider the multiprocessing module:
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows
For Python 2.5, the same functionality is available via pyprocessing.
In addition to the example at the links above, here are some additional links to get you started:
multiprocessing Basics
Communication between processes with multiprocessing
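For the case in the question (a long list of URLs split into N independent parts), a minimal sketch with multiprocessing could look like this. The helper names split_evenly and handle_urls, the example URLs and the worker count of 3 are all made up for illustration; handle_urls is only a placeholder for the real HTTP and database work.

```python
from multiprocessing import Process


def split_evenly(items, parts):
    """Split items into `parts` chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def handle_urls(urls):
    """Placeholder for the real work (HTTP requests, database writes)."""
    for url in urls:
        print("processing", url)


if __name__ == "__main__":
    urls = ["http://example.com/%d" % n for n in range(10)]  # made-up input
    workers = [
        Process(target=handle_urls, args=(chunk,))
        for chunk in split_evenly(urls, 3)
    ]
    for worker in workers:
        worker.start()  # each chunk runs in its own OS process
    for worker in workers:
        worker.join()   # wait until every process has finished
```

Each Process object exposes a pid attribute after start(), so the PIDs can still be written to a file for later checks, as in the PHP version.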
You don't want threading. You want a work queue like Gearman that you can send jobs to asynchronously.
It's worth noting that this is a cross-platform, cross-language solution. There are bindings for many languages (including Python and PHP) provided officially, and many more unofficially with a bit of work with Google.
The original intent is effectively load balancing, but it works just as well with only one machine. Basically, you can create one or more Workers that listen for Jobs. You can control the number of Workers and the types of Jobs they can listen for.
If you insert five Jobs into the queue at the same time, and there happen to be five Workers waiting, each Worker will be handed one of the Jobs. If there are more Jobs than Workers, the Jobs get handled sequentially. Your Client (the thing that submits Jobs) can either wait for all of the Jobs it's created to complete, or it can simply place them in the queue and continue on.
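The Job and Worker behaviour described above can be imitated on a single machine without Gearman. The sketch below is not Gearman's API: it uses Python's standard ProcessPoolExecutor as a stand-in for a pool of Workers, and the names job and run_jobs are invented for the example. With five Jobs and two Workers, the extra Jobs simply wait in the pool's queue until a Worker is free.

```python
from concurrent.futures import ProcessPoolExecutor, wait


def job(n):
    """Stand-in for real work; returns a value so the client can collect it."""
    return n * n


def run_jobs(values, workers):
    """Submit one job per value and wait for all of them to complete."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # submit() returns immediately; the job is queued until a worker is free.
        futures = [pool.submit(job, value) for value in values]
        # A client that does not need the results could skip this wait
        # and carry on with other work instead.
        wait(futures)
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_jobs([1, 2, 3, 4, 5], workers=2))
```

Unlike Gearman, this pool lives inside the client process, so it does not give you workers on other machines or in other languages.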