As I understand, Apache isn't suited to serving long-poll requests, as each request into Apache will use one worker thread until the request completes, which may be a long time for long-poll/COMET requests.
But what about socket connections? On the PHP website I saw an example of a "simple multi-client server written in PHP that really works".
My question: Do such socket servers use only one worker thread for all established connections? And what about the opposite: is it possible to write a PHP client that connects to several socket servers simultaneously using only one worker thread?
Look at phpDaemon. It is designed for long-polling applications and similar workloads. But I advise you to use node.js for these tasks, if possible.
That is an example of a polling-loop style server - see the MSG_DONTWAIT constant being passed to socket_recv()? Essentially, it has a single thread that loops through all of its open sockets to see if any of them has data waiting. If a socket doesn't have data waiting, it moves on to the next and checks it.
However, note that with such a server, you don't get nice protocol handling beyond the TCP base - you have to worry about parsing a stream of raw data yourself.
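The polling loop described above can be sketched in a few lines. This is Python rather than PHP, purely as an illustration: `poll_once` and `make_client_pairs` are made-up names, `socket.socketpair()` stands in for real client connections, and putting the socket in non-blocking mode plays the role that MSG_DONTWAIT plays in the PHP example.

```python
import socket

def poll_once(sockets):
    """Make one pass over all sockets and read from those that have data.

    Returns a dict mapping each socket's position in the list to the bytes
    received on this pass. Sockets with nothing waiting are skipped.
    """
    received = {}
    for index, sock in enumerate(sockets):
        try:
            data = sock.recv(4096)  # non-blocking: returns or raises at once
        except BlockingIOError:
            continue  # nothing waiting here, move on to the next socket
        if data:
            received[index] = data  # raw bytes; any protocol parsing is up to you
    return received

def make_client_pairs(count):
    """Create connected socket pairs; the 'server' side is non-blocking."""
    pairs = []
    for _ in range(count):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        pairs.append((server_side, client_side))
    return pairs
```

A real server would call `poll_once` in a loop, usually with a short sleep between passes so that the single thread does not spin at 100% CPU.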
All of your connections are made over sockets. The main difference is whether the I/O is blocking or not. Receiving from a blocking socket causes the thread to block until data arrives, whereas a receive with MSG_DONTWAIT returns immediately whether or not data is available.
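As for the client side of the question, one thread can hold connections to several servers as long as it never blocks on any single one. A minimal sketch in Python (not PHP), assuming the sockets are already connected; `read_from_ready` is a made-up name. Instead of polling each socket in turn, it asks the operating system which sockets are readable. PHP offers the same idea through socket_select() and stream_select().

```python
import select
import socket

def read_from_ready(sockets, timeout=1.0):
    """Wait up to `timeout` seconds, then read from every readable socket.

    Returns a dict mapping file descriptor -> bytes received. Sockets with
    no pending data are left alone, so the calling thread waits at most
    `timeout` seconds in total rather than per socket.
    """
    readable, _, _ = select.select(sockets, [], [], timeout)
    return {sock.fileno(): sock.recv(4096) for sock in readable}
```

The timeout is the only point where the thread waits, which is what lets a single thread serve many connections.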
Apache gives you a few options in this regard. You can fork for concurrent connections (mpm-prefork), use a different thread for each connection (mpm-worker), or threads with non-blocking I/O (mpm-event).
From PHP.net:
http://www.php.net/manual/en/function.pfsockopen.php
I understand the gist of what this function accomplishes, but I'm still unclear as to whether this will accomplish what I'd like it to. Here is my scenario:
I have a large PHP application that is used by many users simultaneously. Within the application, I'm opening a TCP socket to a remote server for logging messages, etc... It was my hope that I might be able to leverage pfsockopen in order that many fewer connections would need to be opened. For example, user1 signs in - socket opens. User2 signs in, no socket is opened because he can "piggyback" on the socket opened by user1.
Is this possible?
pfsockopen will indeed keep the socket open when the script ends, allowing it to be reused from one request to the next, so fewer connections are opened, as you would expect. However, this is not compatible with all SAPIs.
The persistence is per process. As such, pfsockopen run under the CLI SAPI will close and reopen the socket at every execution, because a CLI script runs in a single process that starts, opens a socket, and ends (closing the socket along with the process).
In CGI mode with one process per script, this is also true.
With the Apache SAPI, it depends on which multi-processing module (MPM) is in use. mpm-prefork keeps a pool of child processes, each of which serves many requests one after another, so a socket opened by one child can be reused by later requests that land on the same child, but it is never shared between children. mpm-worker spawns threads, so it will probably work there too. mpm-winnt is a Windows variant of a multi-threaded MPM, so it should work as well.
The worst that can happen is that the call will be executed as a normal fsockopen call.
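To make the per-process behaviour concrete, here is a rough Python analogy, not PHP's actual implementation. The module-level dictionary stands in for the process's store of persistent sockets; `persistent_open` and its `connect` parameter are made-up names. A real implementation would also need to check that a cached socket is still alive before handing it out.

```python
import socket

# Hypothetical per-process cache. It lives only as long as the process does,
# which is why a CLI script or one-process-per-request CGI gains nothing.
_persistent_sockets = {}

def persistent_open(host, port, connect=socket.create_connection):
    """Return the cached socket for (host, port), connecting only on a miss."""
    key = (host, port)
    sock = _persistent_sockets.get(key)
    if sock is None:
        sock = connect((host, port))  # first use in this process: really connect
        _persistent_sockets[key] = sock
    return sock
```

Two requests handled by the same long-lived process get the same socket back; two requests handled by different processes each open their own.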
So a friend and I are building a web-based AJAX chat application with a jQuery and PHP core. Up to now, we've been using the standard procedure of calling the server every two seconds or so looking for updates. However, I've come to dislike this method as it's not fast, nor is it "cost effective", in that there are tons of requests going back and forth from the server even if no data is returned.
One of our project supporters recommended we look into a technique known as COMET, or more specifically, Long Polling. However, after reading about it in different articles and blog posts, I've found that it isn't all that practical when used with Apache servers. It seems that most people just say "It isn't a good idea", but don't give many specifics, such as how many requests Apache can handle at one time.
The whole purpose of PureChat is to provide people with a chat that looks great, goes fast, and works on most servers. As such, I'm assuming that about 96% of our users will be using Apache, and not Lighttpd or Nginx, which are supposedly more suited for long polling.
Getting to the Point:
In your opinion, is it better to continue using setInterval and repeatedly request new data? Or is it better to go with Long Polling, despite the fact that most users will be using Apache? Also, is it possible to get a more specific rundown on approximately how many people can be using the chat before an Apache server rolls over and dies?
As Andrew stated, a socket connection is the ultimate solution for asynchronous communication with a server, although only the most cutting edge browsers support WebSockets at this point. socket.io is an open source API you can use which will initiate a WebSocket connection if the browser supports it, but will fall back to a Flash alternative if the browser does not support it. This would be transparent to the coder using the API however.
Socket connections basically keep open communication between the browser and the server so that each can send messages to each other at any time. The socket server daemon would keep a list of connected subscribers, and when it receives a message from one of the subscribers, it can immediately send this message back out to all of the subscribers.
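The subscriber list at the heart of such a daemon is small. A minimal sketch in Python, with the networking left out: `ChatHub` is a made-up name, and each subscriber is represented by a callable that delivers a message to one connected client.

```python
class ChatHub:
    """Keeps the list of connected subscribers and fans messages out to them."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, send):
        """Register a callable that delivers a message to one client."""
        self._subscribers.append(send)

    def unsubscribe(self, send):
        """Forget a client, for example after its connection closes."""
        self._subscribers.remove(send)

    def publish(self, message):
        """Send a message received from one client out to every subscriber."""
        for send in list(self._subscribers):  # copy: callbacks may unsubscribe
            send(message)
```

In a real daemon each `send` callable would write to that client's open socket.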
For socket connections, however, you need a socket server daemon running full time on your server. While this can be done with command line PHP (no Apache needed), it is better suited to something like node.js, a non-blocking server-side JavaScript runtime.
node.js would also be better for what you are talking about, long polling. Basically, node.js is event driven and single threaded. This means you can keep many connections open without dedicating a thread to each one, which would eat up tons of memory (Apache's problem). This allows for high concurrency. What you have to keep in mind, however, is that even if you were using a non-blocking web server like Nginx, PHP has many blocking network calls. If such code runs on a single-threaded server, each (for instance) MySQL call basically halts the server until the response for that call is returned. Nothing else gets done while this is happening, making your non-blocking server useless. If, however, you use a non-blocking platform like JavaScript on node.js for your network calls, this is not an issue. Instead of waiting for a response from MySQL, it registers a handler function to process the response whenever it becomes available, allowing the server to handle other requests in the meantime.
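The difference is easy to see in a small experiment. The sketch below uses Python's asyncio rather than node.js, and `asyncio.sleep` stands in for a slow database call; all function names are made up. Five simulated requests that each wait 0.2 seconds finish in roughly 0.2 seconds in total on one thread, because each wait hands control back to the event loop. With a blocking call in the same place they would take about a second.

```python
import asyncio
import time

async def handle_request(name, query_seconds):
    """Simulate a request that waits on a slow database call."""
    await asyncio.sleep(query_seconds)  # yields to the event loop while waiting
    return name

async def serve_all(count, query_seconds):
    """Start all requests, then wait for every one of them to finish."""
    tasks = [handle_request(f"req-{i}", query_seconds) for i in range(count)]
    return await asyncio.gather(*tasks)

def timed_run(count=5, query_seconds=0.2):
    """Run the simulated requests and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(serve_all(count, query_seconds))
    return results, time.perf_counter() - start
```

Replacing `await asyncio.sleep(...)` with `time.sleep(...)` reproduces the problem described above: the single thread stalls on every call and the requests run one after another.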
For long polling, you would basically send a request and the server would wait up to 50 seconds before responding. It responds sooner than 50 seconds if it has anything to report, otherwise it waits. If there is nothing to report after 50 seconds, it sends a response anyway so that the browser does not time out. The response triggers the browser to send another request, and the process starts over again. This allows for fewer requests and snappier responses, but again, it is not as good as a socket connection.
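The server side of that wait can be sketched as follows. This is Python, and the in-process `queue.Queue` is only an assumed stand-in for wherever new chat messages actually arrive; `long_poll` is a made-up name.

```python
import queue

def long_poll(events, timeout=50.0):
    """Hold the request until an event arrives or the timeout elapses.

    Returns the event as soon as one is available, or None after `timeout`
    seconds so the client gets a response and knows to poll again.
    """
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None
```

Either way the client receives a response and immediately issues the next request, which is the loop described above.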
Can you tell me how a server handles several HTTP requests at the same time? If 10 users are logged in to a site and request a page at the same time, what will happen?
Usually, each of the users sends an HTTP request for the page. The server receives the requests and delegates them to different workers (processes or threads).
Depending on the URL given, the server reads a file and sends it back to the user. If the file is a dynamic file such as a PHP file, the file is executed before the result is sent back to the user.
Once the requested file has been sent back, the server usually closes the connection after a few seconds.
For more, see: HowStuffWorks Web Servers
HTTP uses TCP which is a connection-based protocol. That is, clients establish a TCP connection while they're communicating with the server.
Multiple clients are allowed to connect to the same destination port on the same destination machine at the same time. The server just opens up multiple simultaneous connections.
Apache has a multi-processing module (MPM), and most other HTTP servers have an equivalent mechanism. This is responsible for allocating Apache threads/processes to handle connections. These processes or threads can then run in parallel, each on its own connection, without blocking each other. Apache's MPM also tends to keep "spare" threads or processes around even when no connections are open, which helps speed up subsequent requests.
The program ab (short for ApacheBench) which comes with Apache lets you test what happens when you open up multiple connections to your HTTP server at once.
Apache's configuration files will normally set a limit for the number of simultaneous connections it will accept. This will be set to a reasonable number, such that during normal operation this limit should never be reached.
Note too that the HTTP protocol (from version 1.1) allows for a connection to be kept open, so that the client can make multiple HTTP requests before closing the connection, potentially reducing the number of simultaneous connections they need to make.
More on Apache's MPMs:
Apache itself can use a number of different multi-processing modules (MPMs). Apache 1.x used a pre-forking process model, which Apache 2.x retains as the "prefork" MPM: it creates a number of Apache processes in advance, so that incoming connections can often be handed to an existing process. This is as I described above.
Apache 2.x added an MPM called "worker", which uses multithreading (running multiple execution threads within a single process) to achieve the same thing. Which MPM is the default depends on the platform and the Apache version. The advantage of multithreading over separate processes is that threads are a lot more lightweight than separate processes and may even use a bit less memory. It's very fast.
The disadvantage of multithreading is that it makes modules like mod_php risky to run. When you're multithreading, all your add-in libraries need to be "thread-safe", that is, they need to be written with a multithreaded environment in mind. It's harder to write a multithreaded application: because threads within a process share memory and other resources, it is easy to create race-condition bugs where one thread reads or writes memory while another thread is in the middle of writing to it. Getting around this requires techniques such as locking. Many of the libraries that PHP extensions rely on are not known to be thread-safe, so those wishing to use mod_php are generally advised to avoid Apache's "worker" MPM.
Apache 2 has two different modes of operation. One is running as a threaded server; the other is a mode called "prefork" (multiple processes).
The requests will be processed simultaneously, to the best ability of the HTTP daemon.
Typically, the HTTP daemon will spawn either several processes or several threads and each one will handle one client request. The server may keep spare threads/processes so that when a client makes a request, it doesn't have to wait for the thread/process to be created. Each thread/process may be mapped to a different processor or core so that they can be processed more quickly. In most circumstances, however, what holds the requests is network I/O, not lack of raw computing, so there is frequently no slowdown from having a number of processors/cores significantly lower than the number of requests handled at one time.
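The "10 users at once" case can be simulated with a pool of worker threads. This is a Python sketch of the general model, not of Apache itself: `handle_page_request` and `serve` are made-up names, and the 0.1 second sleep stands in for the network and disk I/O that dominates a typical request.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def handle_page_request(user_id):
    """Handle one request; the sleep stands in for network or disk I/O."""
    time.sleep(0.1)
    return user_id, threading.current_thread().name

def serve(user_ids, workers=10):
    """Hand every request to a worker thread and collect results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_page_request, user_ids))
```

With 10 workers, ten requests take about as long as one. With `workers=1` they queue up and take ten times as long, which is roughly what happens when a server runs out of free workers.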
The server (Apache) is multi-threaded, meaning it can run multiple threads of execution at once. A few years ago, a single CPU would switch back and forth quickly between multiple threads, giving the appearance that two things were happening at once. These days, computers have multiple processors, so the computer can actually run two threads of code simultaneously. That being said, threads aren't really mapped to processors in any simple way.
With that ability, a PHP program can be thought of as a single thread of execution. If two requests reach the server at the same time, two threads can be used to process the request simultaneously. They will probably both get about the same amount of CPU, so if they are doing the same thing, they will complete at approximately the same time.
One of the most common issues with multi-threading is "race conditions", where two requests are "racing" to do the same thing. If a single resource is involved, only one of them is going to win. If they both insert a record into the database, they can't both get the same id; one of them will win. So when writing code, you need to keep in mind that other requests are running at the same time and may modify your database, write files or change globals.
That being said, the programming model allows you to mostly ignore this complexity.
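For the cases where you cannot ignore it, the usual fix is a lock around the shared resource. A minimal Python sketch of the "two requests must not get the same id" example; `IdAllocator` and `allocate_concurrently` are made-up names, and in practice a database does this for you with an auto-increment column or a sequence.

```python
import threading

class IdAllocator:
    """Hands out unique, increasing ids even when called from many threads."""

    def __init__(self):
        self._next_id = 1
        self._lock = threading.Lock()

    def allocate(self):
        # Read-then-increment must happen as one step, so only one thread
        # at a time is allowed inside this block.
        with self._lock:
            allocated = self._next_id
            self._next_id += 1
            return allocated

def allocate_concurrently(allocator, thread_count, per_thread):
    """Allocate ids from several threads at once and return all of them."""
    ids = []
    def worker():
        ids.extend([allocator.allocate() for _ in range(per_thread)])
    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return ids
```

Without the lock, two threads could read the same `_next_id` before either increments it, and both requests would "win" the same id.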
My understanding of PHP's p* connections is that they keep a connection to the service (be it memcache, a socket, etc.) persistent between page loads. But are these connections thread safe? What happens when two pages try to access the same connection at the same time?
In the typical unix deployment, PHP is installed as a module that runs inside the apache web server, which in turn is configured to dispatch HTTP requests to one of a number of spawned children.
For the sake of efficiency, apache will often spawn these processes ahead of time (pre-forking them) and maintain them, so that they can dispatch more than one request, and save the overhead of starting up a process for every request that comes in.
PHP works on the principle of starting every request with a clean environment; no script variables persist between page loads. (Contrast this with mod_perl or python, where applications often manifest subtle bugs due to unexpected state hangovers).
This means that the typical resource allocated by a PHP script, be it an image handle for GD or a database connection, will be released at the end of a request.
Some resources, particularly Oracle database connections, have quite a high cost to establish, so it is desirable to somehow cache that connection between dispatched web requests.
Enter persistent resources.
The way these work is that any given apache child process may maintain a resource beyond the scope of a request by registering it in a "persistent list" of resources. The persistent list is not cleaned up at the end of the request (known as RSHUTDOWN internally). When you use a pconnect function, it will look up the persistent list entry for a given set of unique credentials and return that, if it exists, or establish a new connection with those credentials.
If you have configured apache to maintain 200 child processes, you should expect to see that many connections established from your web server to your database machine.
If you have many web servers and a single database machine, you may end up loading your database machine much more than you anticipated.
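The arithmetic is worth writing down before choosing the database's connection limit. A tiny Python sketch: the function name and the `credential_sets` parameter are made up, and the result is an upper bound that assumes every child process has connected at least once with each distinct set of credentials.

```python
def max_persistent_connections(web_servers, children_per_server, credential_sets=1):
    """Upper bound on persistent connections arriving at one database machine."""
    return web_servers * children_per_server * credential_sets
```

One web server with 200 children gives at most 200 connections, as above; five such web servers give up to 1000 against the same database machine.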
With a threaded SAPI, the persistent list is maintained per thread, so it should be thread safe and have similar benefits, but the usual caveat about PHP not being recommended to run in a threaded SAPI applies. While PHP itself is thread safe, many of the libraries that it uses may have thread safety problems of their own and cause you a good number of headaches.
The manual page Persistent Database Connections might give you some information about persistent connections.
It doesn't say anything specific about thread safety, though. As far as I remember, I've never seen anything about that anywhere, so I suppose it "just works OK". My guess would be that a connection is reused only if it is not already in use by another thread at the same time, but that's just a (logical) wild guess...
Generally speaking, PHP will make one persistent connection per process or thread running on the webserver. Because of this, a process or thread will not access the connection of another process or thread.
Instead, when you make a database connection PHP will check to see if one is already open (in the process or thread that is handling the page request) and if it is then it will use it, otherwise it will just initialize a new one.
So to answer your question, they aren't necessarily thread safe but because of how they operate there isn't a situation where two threads or processes will access the same connection.
Generally speaking, when a PHP script requests a persistent connection, PHP will look for one in the connection pool with the same connection parameters.
If one is found that is NOT being used, it is given to the script, and returned to the pool at the end of the script.
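The behaviour described in this last answer amounts to a pool keyed by connection parameters. Here is a Python sketch of that description only, not of PHP's internals: `ConnectionPool`, `acquire` and `release` are made-up names, and `connect` is whatever function actually opens a connection.

```python
class ConnectionPool:
    """Hands out idle connections by parameters; never shares one in use."""

    def __init__(self, connect):
        self._connect = connect
        self._idle = {}  # connection parameters -> list of idle connections

    def acquire(self, params):
        """Reuse an idle connection with these parameters, else open a new one."""
        idle = self._idle.get(params, [])
        if idle:
            return idle.pop()  # removed from the idle list while in use
        return self._connect(params)

    def release(self, params, connection):
        """Return a connection to the pool at the end of the script."""
        self._idle.setdefault(params, []).append(connection)
```

Because a connection is removed from the idle list while it is in use, two scripts asking at the same moment get two different connections, which matches the earlier point that no two threads or processes end up on the same connection.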