How to reuse pfsockopen resource [duplicate] - php

From PHP.net:
http://www.php.net/manual/en/function.pfsockopen.php
I understand the gist of what this function accomplishes, but I'm still unclear as to whether this will accomplish what I'd like it to. Here is my scenario:
I have a large PHP application that is used by many users simultaneously. Within the application, I'm opening a TCP socket to a remote server for logging messages, etc... It was my hope that I might be able to leverage pfsockopen in order that many fewer connections would need to be opened. For example, user1 signs in - socket opens. User2 signs in, no socket is opened because he can "piggyback" on the socket opened by user1.
Is this possible?

pfsockopen will indeed keep the socket open when the script ends, allowing it to be reused from one request to the next, so fewer connections are opened, as you would expect. However, this does not work with all SAPIs.
The persistence occurs on a per-process basis. As such, pfsockopen run under the CLI SAPI will close and re-open the socket at every execution, because the CLI script is executed in a single process that starts, opens a socket and ends (closing the socket along with the process).
In CGI mode with one process per script, this is also true.
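With the Apache SAPI, the scope of the persistence depends on what type of multi-processing module (MPM) is in use. mpm-prefork keeps a pool of child processes, each of which serves many requests one after another, so a persistent socket is kept per child process and reused by later requests that land on the same child. You therefore end up with at most one connection per child rather than one per request. mpm-worker runs several threads per process, and mpm-winnt is a Windows variant of a multi-threaded MPM, so it should work there too.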
The worst that can happen is that the call will be executed as a normal fsockopen call.
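As a rough illustration of the logging scenario in the question, here is a minimal sketch. The helper names open_log_socket() and send_log_line() are made up for this example, and the host and port are whatever your log server uses. Whether the connection is actually reused depends on the SAPI, as described above.

```php
<?php
// Hypothetical helper: open (or reuse) a persistent TCP connection to a log server.
function open_log_socket(string $host, int $port, float $timeout = 2.0)
{
    $errorCode = 0;
    $errorMessage = '';
    // pfsockopen() returns a stream resource on success and false on failure.
    // If this process already holds a persistent socket for the same host and
    // port, that socket is handed back instead of a new connection being made.
    $socket = @pfsockopen($host, $port, $errorCode, $errorMessage, $timeout);
    if ($socket === false) {
        throw new RuntimeException(
            "Cannot connect to {$host}:{$port}: {$errorMessage} ({$errorCode})"
        );
    }
    return $socket;
}

// Hypothetical helper: send one newline-terminated message over the socket.
function send_log_line($socket, string $message): bool
{
    return fwrite($socket, $message . "\n") !== false;
}
```

Note that the script should not fclose() the socket at the end of the request if it is meant to be reused. A reused socket may also have been closed by the remote side in the meantime, so real code would probably want to detect a failed write and reconnect.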

Related

Do socket connections only need one thread?

As I understand, Apache isn't suited to serving long-poll requests, as each request into Apache will use one worker thread until the request completes, which may be a long time for long-poll/COMET requests.
But what about socket connections. On the PHP website I saw an example of a "simple multi-client server written in PHP that really works".
My question: Do such socket servers use only one worker thread for all established connections? And what about the opposite: is it possible to write a PHP client which connects to several socket servers simultaneously using only one worker thread?
Look at phpDaemon. It is designed for long-poll applications and similar. But I advise you to use node.js for these tasks, if possible.
That is an example of a polling-loop style server - see the MSG_DONTWAIT constant being passed to socket_recv()? Essentially, it has a single thread that loops through all of its open sockets to see if any of them has data waiting. If a socket doesn't have data waiting, it moves on to the next and checks it.
However, note that with such a server, you don't get nice protocol handling beyond the TCP base - you have to worry about parsing a stream of raw data yourself.
All of your connections are done with sockets. The main difference is whether I/O is blocking or not. Receiving from a blocking socket will cause the thread to block until data arrives, whereas a receive with MSG_DONTWAIT returns immediately, whether or not data is available.
Apache gives you a few options in this regard. You can fork for concurrent connections (mpm-prefork), use a different thread for each connection (mpm-worker), or threads with non-blocking I/O (mpm-event).
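To make the polling-loop idea concrete, here is a small sketch of one non-blocking pass over a set of connections. It uses PHP's stream functions with stream_set_blocking() rather than socket_recv() with MSG_DONTWAIT, which is a substitution on my part to avoid depending on the sockets extension; the principle is the same. The function name poll_once() is invented for this example.

```php
<?php
// Hypothetical helper: make one non-blocking pass over all open streams and
// return whatever data was waiting, keyed the same way as the input array.
function poll_once(array $streams, int $maxBytes = 2048): array
{
    $received = [];
    foreach ($streams as $id => $stream) {
        // In non-blocking mode fread() returns immediately, with an empty
        // string when nothing is waiting on this stream.
        stream_set_blocking($stream, false);
        $chunk = fread($stream, $maxBytes);
        if ($chunk === false || $chunk === '') {
            continue; // nothing to read here, move on to the next stream
        }
        $received[$id] = $chunk;
    }
    return $received;
}
```

A single-threaded server would call something like this in a loop, together with accepting new connections. In practice stream_select() is usually preferable to a busy loop, because it lets the process sleep until at least one stream is readable.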

what does threading mean when it comes to programming?

I read somewhere that PHP has a threading ability called pcntl_fork(), but I still don't get what it means. What's its purpose?
From what I understand, programming languages have multithreading abilities. Correct me if I'm wrong, but is it the ability for a parent object to have children of some kind?
thanks
From wikipedia: "In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. It generally results from a fork of a computer program into two or more concurrently running tasks."
Basically, having threads is having the ability to do multiple things within the same running application (or, process space as said by RC).
For example, if you're writing a chat server app in PHP, it would be really nice to hand certain tasks off to a separate thread: if the server would otherwise get hung up on something like a file transfer or a very slow client, it can spawn a thread to take care of that work while the main application continues to pass messages between clients without delay. Last time I used PHP, threading was clunky and not very well supported.
Or, on the client end, while sending said file, it would be a good idea to thread the file transfer, otherwise you wouldn't be able to send messages to the server while sending the file.
It is not a very good metaphor. Think of a thread like a worker or helper that will do work or execute code for you in the background while your program could possibly be doing other tasks such as taking user input.
Threading means you can have more than one line of execution within the same process space. Note that this is different than a multi-process paradigm as processes will not share the same memory space.
This wiki link will do a good job of getting you up to speed with threads. Having said that, the function pcntl_fork() in PHP creates a child process, which falls in line with the multi-process paradigm rather than with threading.
In layman's terms, since a thread gives you more than one line of execution within a program, it allows you to do more than one thing at the same time. Technically, you're not always doing these things simultaneously as in a single core processor, you're really just time-slicing so it appears you're doing more than one thing at a time.
A pretty straight-forward use of threads is how connections to a web server are handled. If you didn't have multiple threads, your application would listen for a connection on a socket, accept the connection when a client requested a connection, and then would process whatever page the client asked for. This seems well and good until you have a page that takes 5 seconds to load and you have 2 clients connecting at the same time. One of the clients will sit and wait for the server to accept their connection for ~5 seconds, because the 1st client is using the only line of execution to serve the page and it can't do that and accept the 2nd connection.
Now if you have multiple threads, you'll have one thread (i.e. the listener thread) that only accepts connections. As soon as a connection is accepted by the listener thread, it passes the connection on to another thread. We'll call it the processor thread. Once the connection is passed on to the processor thread, the listener thread immediately goes back to waiting for a new connection. Meanwhile, the processor thread uses its own execution line to serve the page that takes 5 seconds. In the scenario above, the 2nd client would have its connection accepted immediately after the 1st client was handed to the 1st processor thread, and a 2nd processor thread would be created to handle the request from the 2nd client. This would typically allow you to serve both clients the data in a little over 5 seconds, while the single-threaded app would take ~10 seconds.
Hope this helps with your understanding of application threading.
Threading means your code is not limited to strictly sequential execution, whatever your programming language is.
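Since the question asked about pcntl_fork() specifically, here is a small sketch of what it does. It assumes the pcntl extension, which is normally only available on the CLI on Unix-like systems. The helper name run_in_child() is invented for this example. It shows the point made above: a forked child is a separate process with its own copy of memory, not a thread sharing the parent's memory.

```php
<?php
// Hypothetical helper: run $task in a forked child process and return the
// child's exit code to the parent.
function run_in_child(callable $task): int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }
    if ($pid === 0) {
        // Child process: it works on its own copy of the parent's memory.
        exit((int) $task());
    }
    // Parent process: block until the child finishes, then read its status.
    pcntl_waitpid($pid, $status);
    return pcntl_wifexited($status) ? pcntl_wexitstatus($status) : -1;
}

$counter = 1;
$exitCode = run_in_child(function () use (&$counter): int {
    $counter = 2; // changes only the child's copy of $counter
    return 7;
});
// Back in the parent, $counter is still 1 and $exitCode is 7.
```

With real threads the assignment would have been visible to the parent, which is exactly why threaded code needs locking and forked code needs some explicit channel (pipes, sockets, files) to pass results back.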

How server manage different user's requests at a time?

Can you tell me how a server handles different HTTP requests at the same time? If 10 users are logged in to a site and request a page at the same time, what will happen?
Usually, each of the users sends a HTTP request for the page. The server receives the requests and delegates them to different workers (processes or threads).
Depending on the URL given, the server reads a file and sends it back to the user. If the file is a dynamic file such as a PHP file, the file is executed and its output is sent back to the user.
Once the requested file has been sent back, the server usually closes the connection after a few seconds.
For more, see: HowStuffWorks Web Servers
HTTP uses TCP which is a connection-based protocol. That is, clients establish a TCP connection while they're communicating with the server.
Multiple clients are allowed to connect to the same destination port on the same destination machine at the same time. The server just opens up multiple simultaneous connections.
Apache (and most other HTTP servers) have a multi-processing module (MPM). This is responsible for allocating Apache threads/processes to handle connections. These processes or threads can then run in parallel on their own connection, without blocking each other. Apache's MPM also tends to keep open "spare" threads or processes even when no connections are open, which helps speed up subsequent requests.
The program ab (short for ApacheBench) which comes with Apache lets you test what happens when you open up multiple connections to your HTTP server at once.
Apache's configuration files will normally set a limit for the number of simultaneous connections it will accept. This will be set to a reasonable number, such that during normal operation this limit should never be reached.
Note too that the HTTP protocol (from version 1.1) allows for a connection to be kept open, so that the client can make multiple HTTP requests before closing the connection, potentially reducing the number of simultaneous connections they need to make.
More on Apache's MPMs:
Apache itself can use a number of different multi-processing modules (MPMs). Apache 1.x used a pre-forking process model, which Apache 2.x retains as the "prefork" MPM: it creates a number of Apache processes in advance, so that incoming connections can often be sent to an existing process. This is as I described above.
Apache 2.x also offers an MPM called "worker", which uses multithreading (running multiple execution threads within a single process) to achieve the same thing. The advantage of multithreading over separate processes is that threads are more lightweight to create than processes and usually use less memory.
The disadvantage of multithreading is that it is a poor fit for things like mod_php. When you're multithreading, all your add-in libraries need to be "thread-safe", that is, they need to be aware of running in a multithreaded environment. It's harder to write a multi-threaded application. Because threads within a process share memory and other resources, it is easy to create race condition bugs where one thread reads or writes memory while another thread is in the middle of writing to it. Getting around this requires techniques such as locking. Many of the libraries PHP relies on are not thread-safe, so those wishing to use mod_php are generally advised to stay with "prefork" rather than Apache's "worker" MPM.
Apache 2 has two different modes of operation. One is running as a threaded server the other is using a mode called "prefork" (multiple processes).
The requests will be processed simultaneously, to the best ability of the HTTP daemon.
Typically, the HTTP daemon will spawn either several processes or several threads and each one will handle one client request. The server may keep spare threads/processes so that when a client makes a request, it doesn't have to wait for the thread/process to be created. Each thread/process may be mapped to a different processor or core so that they can be processed more quickly. In most circumstances, however, what holds the requests is network I/O, not lack of raw computing, so there is frequently no slowdown from having a number of processors/cores significantly lower than the number of requests handled at one time.
The server (Apache) is multi-threaded, meaning it can run multiple programs at once. A few years ago, a single CPU could switch back and forth quickly between multiple threads, giving the appearance that two things were happening at once. These days, computers have multiple processors, so the computer can actually run two threads of code simultaneously. That being said, threads aren't really mapped to processors in any simple way.
With that ability, a PHP program can be thought of as a single thread of execution. If two requests reach the server at the same time, two threads can be used to process the request simultaneously. They will probably both get about the same amount of CPU, so if they are doing the same thing, they will complete at approximately the same time.
One of the most common issues with multi-threading is "race conditions", where two requests are "racing" to do the same thing. If they are competing for a single resource, one of them is going to win. If they both insert a record into the database, they can't both get the same id: one of them will win. So when writing code you need to keep in mind that other requests are running at the same time and may modify your database, write files or change globals.
That being said, the programming model allows you to mostly ignore this complexity.
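For the cases where you cannot ignore it, here is a minimal sketch of guarding a shared file against the kind of race described above, using an exclusive advisory lock. The function name increment_counter() and the idea of a counter file are invented for the example. For database rows you would normally rely on the database's own mechanisms (auto-increment columns, transactions) instead.

```php
<?php
// Hypothetical helper: atomically increment a number stored in a file that
// several concurrent requests may update at the same time.
function increment_counter(string $path): int
{
    // 'c+' opens for reading and writing, creates the file if it is missing
    // and does not truncate it.
    $handle = fopen($path, 'c+');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    try {
        // Block until no other process holds the lock on this file.
        if (!flock($handle, LOCK_EX)) {
            throw new RuntimeException("Cannot lock {$path}");
        }
        $next = (int) stream_get_contents($handle) + 1;
        ftruncate($handle, 0);
        rewind($handle);
        fwrite($handle, (string) $next);
        fflush($handle);
        flock($handle, LOCK_UN);
        return $next;
    } finally {
        fclose($handle);
    }
}
```

Without the lock, two requests could both read the same old value and both write back the same new one, losing an increment.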

How do PHP's p* connect methods work?

My understanding is that PHP's p* connect functions keep a connection to the service (be it memcache, a socket, etc.) persistent between page loads. But are these connections thread safe? What happens when two pages try to access the same connection at the same time?
In the typical unix deployment, PHP is installed as a module that runs inside the apache web server, which in turn is configured to dispatch HTTP requests to one of a number of spawned children.
For the sake of efficiency, apache will often spawn these processes ahead of time (pre-forking them) and maintain them, so that they can dispatch more than one request, and save the overhead of starting up a process for every request that comes in.
PHP works on the principle of starting every request with a clean environment; no script variables persist between page loads. (Contrast this with mod_perl or python, where applications often manifest subtle bugs due to unexpected state hangovers).
This means that the typical resource allocated by a PHP script, be it an image handle for GD or a database connection, will be released at the end of a request.
Some resources, particularly Oracle database connections, have quite a high cost to establish, so it is desirable to somehow cache that connection between dispatched web requests.
Enter persistent resources.
The way these work is that any given apache child process may maintain a resource beyond the scope of a request by registering it in a "persistent list" of resources. The persistent list is not cleaned up at the end of the request (known as RSHUTDOWN internally). When you use a pconnect function, it will look up the persistent list entry for a given set of unique credentials and return that, if it exists, or establish a new connection with those credentials.
If you have configured apache to maintain 200 child processes, you should expect to see that many connections established from your web server to your database machine.
If you have many web servers and a single database machine, you may end up loading your database machine much more than you anticipated.
With a threaded SAPI, the persistent list is maintained per thread, so it should be thread safe and have similar benefits, but the usual caveat about PHP not being recommended to run in threaded SAPI applies--while PHP is itself thread safe, so many libraries that it uses may have thread safety problems of their own and cause you a good number of headaches.
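The lookup described above (return the entry registered for a given set of credentials, otherwise connect and register it) can be modelled in plain PHP. This is only an illustration of the logic, not PHP's internal implementation: the class PersistentList, its methods and the key format are all invented here, and a userland object like this does not itself survive between requests.

```php
<?php
// Illustrative model of a per-process "persistent list" keyed by credentials.
final class PersistentList
{
    /** @var array<string, mixed> connections keyed by their credentials */
    private array $entries = [];

    // Return the connection registered for these credentials, creating it
    // with $connect only when no entry exists yet.
    public function acquire(string $host, string $user, callable $connect)
    {
        $key = "pconnect__{$user}@{$host}"; // hypothetical key format
        if (!array_key_exists($key, $this->entries)) {
            $this->entries[$key] = $connect($host, $user);
        }
        return $this->entries[$key];
    }

    // Number of distinct connections this process is holding open.
    public function count(): int
    {
        return count($this->entries);
    }
}
```

Each Apache child would hold one such list, which is why the number of connections seen by the database scales with the number of child processes rather than with the number of requests.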
The manual's page Persistent Database Connections might give you some information about persistent connections.
It doesn't say anything specific about thread safety, though. I've never seen anything about that anywhere, as far as I remember, so I suppose it "just works OK". My guess would be that a connection is re-used only if it is not already in use by another thread at the same time, but that is just a (logical) wild guess...
Generally speaking, PHP will make one persistent connection per process or thread running on the webserver. Because of this, a process or thread will not access the connection of another process or thread.
Instead, when you make a database connection PHP will check to see if one is already open (in the process or thread that is handling the page request) and if it is then it will use it, otherwise it will just initialize a new one.
So to answer your question, they aren't necessarily thread safe but because of how they operate there isn't a situation where two threads or processes will access the same connection.
Generally speaking, when a PHP script requests a persistent connection, PHP will look for one in the connection pool with the same connection parameters.
If one is found that is NOT being used, it is given to the script, and returned to the pool at the end of the script.
