I'm trying to figure out the best way to minimize resource utilization when I have PHP talking to various backend services (e.g. Amazon S3 or any other random web services -- I'd like a general solution). Ideally, I'd like to have a single persistent connection to the backend (or maybe a small pool of persistent connections) with some caching, and then have all of the PHP tasks share it. We can consider it all read-only for the purposes of this question. It's not obvious to me how to do this in PHP. There's the database-specific stuff like mysql_pconnect(), but that doesn't really do it for me.
One idea I've had, which seems somewhat suboptimal (but is still better than having every single request create and destroy a new connection) is to use a local caching proxy (in a separate process) that would effectively do the pooling and caching. PHP would still be opening and closing a connection for every request, but at least it would be to a local process, so it should be a little faster (and it would reduce load on the backends). But it doesn't seem like this kind of craziness should be necessary. There's gotta be a better way. This is easy in other languages. Please tell me what I'm missing!
There's a large ideological disconnect between the various web technologies. Some are essentially daemons that run full-time in the background, and handle requests passed in on their own. Because there's a process always running, you can keep a pool of connections that are already open and working.
PHP (and normal CGI scripts) does not have a daemon behind the scenes. Every time a request comes in, the PHP interpreter is started up with a clean slate, compiles the scripts, and runs the bytecode. There's no persistence. The PHP database functions that support persistent connections establish the connection at the web server child level (i.e. mod_php attached to an Apache process). This isn't exactly a connection pool, as you can only ever see the persistent connection attached to your own process.
Without having a daemon or similar process sitting behind the scenes to hand out resources, you won't get real connection pooling.
Keep in mind that most new connections to most services are not heavy-weight, and non-database connections that are heavy-weight might not be friendly to the concept of a connection pool.
Before you think about writing your own PHP-based daemon to handle stuff like this, keep in mind that it may already be a solved problem. Python came up with something called WSGI, with a similar implementation in Ruby called Rack. Perl also has something remarkably similar but I can't remember the name of it off the top of my head. A quick look at Google didn't show any PHP implementations of WSGI, but that doesn't mean they don't exist...
Because S3 and other webservices use HTTP as their transport, you won't get a significant benefit from caching the connection.
Although you may be using an API that appears to authenticate as a first step, the S3 documentation shows that authentication happens with every request, so there is no benefit in authenticating once and reusing a connection.
Web service requests over HTTP are lightweight and typically stateless. Once your request has been answered, no resources (connection or session state) are consumed on the server. This allows the web service implementer to use many machines to answer your request without tying up resources on a particular server.
TL;DR: I'm not sure this topic has its place on StackOverflow, but basically it's just a topic of debate and thinking about making PHP apps like we would do with NodeJS for example (stateless request flow, asynchronous calls, etc.)
The situation
We know NodeJS can be used as both a web-server and web-app.
But for PHP, the internal web-server is not recommended for production (so says the documentation).
But, as Symfony full-stack is based on the Kernel which handles Request objects, it means we should be able to send lots of requests to the same kernel, only if we could "bootstrap" the php web-server (not the app) by creating a kernel before listening to HTTP requests. And our router would only create a Request object and make the kernel handle it.
But for this, a Symfony app has to be stateless, for example we need Doctrine to effectively clear its unit of work after a request, or maybe we would need to sort of isolate some components based on a request (By identifying a request with its unique PHP class reference id? Or by using other php processes?), and obviously, we would need more asynchronous things in PHP or in the way we use the internal web-server.
The main questions I sometimes ask myself, and now ask the community
To clarify this, I have some questions about PHP:
Why exactly is the internal PHP webserver not recommended for production?
I mean, if we can configure how the server is run and its "router" file, we should be able to use it like any PHP server, yes or no?
How does it behave internally? Is memory shared between two requests?
By using the router, it seems obvious to me that variables are not shared, else we could make nodejs-like apps, but it seems PHP is not capable of doing something like this.
Is it really possible to make a full-stateless application with Symfony?
e.g. I send two different requests to the same kernel object, in this case, is there any possibility that the two requests create a conflict in Symfony core components?
Actually, the idea of "Create a kernel -> start server -> on request, make the kernel handle it" behavior would be awesome, because it would be something quite similar to NodeJS, but actually, the PHP paradigm is not compatible with this because we would need each request to be handled asynchronously. But if a kernel and its container is stateless, then, there should be a way to do something like that, shouldn't it?
Some thoughts
I've heard about React PHP, Ratchet PHP for Websocket integration, Icicle, PHP-PM but never experienced them, it seems a bit too complex to me for now (I may lack some concepts about asynchronicity in apps, that's why my brain won't understand until I have some more answers :D ).
Is there any way that these libraries could be used as "wrappers" for our kernel request handling?
I mean, let's create this reactphp/icicle/whatever environment setup, create our kernel like we would do in any Symfony app, and run the app as web-server, and when a request is retrieved, we send it asynchronously to our kernel, and as long as the kernel has not sent the response, the client waits for it, even if the response is also sent asynchronously (from nested callbacks, etc., like in NodeJS).
This would make any existing Symfony app compatible with this paradigm, as long as the app is stateless, obviously. (if the app config changes based on a request, there's a paradigm issue in the app itself...)
Is it even a possible reality with PHP libraries rather than using PHP internal web-server in another way?
Why ask these questions?
Actually, it would be kind of a revolution if PHP could implement real asynchronous stuff internally, like Javascript has, but this would also have a big impact on performance in PHP, because of persistent data in our web-server and less bootstrapping (require autoloader, instantiate kernel, get heavy things from cached files, resolve routing, etc.).
In my thoughts, only the $kernel->handleRaw($request); would consume CPU, the whole rest (container, parameters, services, etc.) would be already in the memory, or, for the case of services, "awaiting to be instantiated". Then, performance boost, I think.
And it may troll a bit the people who still think PHP is a very bad and slow language to use :D
For readers and responders ;)
If a core PHP contributor reads me, is there any way that internally PHP could be more asynchronous even with a specific new internal API based on functions or classes?
I'm not a pro of all of these concepts, and I hope really good experts are going to read this and answer me!
It could be a great advance in the PHP world if all of this was possible in any way.
Why exactly is the internal PHP webserver not recommended for
production? I mean, if we can configure how the server is run and its
"router" file, we should be able to use it like any PHP server, yes or
no?
Because it's not written to behave well under load, and there are no configuration options that let you handle HTTP request processing before it reaches PHP.
Basically, it lacks features if you compare it to nginx. It would be equal to comparing a skateboard to a Lamborghini.
It can get you from A to B but.. you get the gist.
How does it behave internally? Is memory shared between two requests?
By using the router, it seems obvious to me that variables are not
shared, else we could make nodejs-like apps, but it seems PHP is not
capable of doing something like this.
The documentation states it's single-threaded, so it appears that it would behave the same as if you wrote while(true) { // all your processing here }.
It's a toy designed to quickly check a few things if you can't be bothered to set up a proper web server before trying out your code.
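To make the "router" file from the question concrete, here is a minimal sketch of one, assuming it is started with something like php -S localhost:8000 router.php. The file name, the isStaticAsset() helper and the list of extensions are made up for illustration. It also shows why you don't get Node-style shared variables: the script runs from scratch for each request.

```php
<?php
// router.php: hypothetical router script for PHP's built-in web server.
// Assumed invocation: php -S localhost:8000 router.php

/** Decide whether a request URI looks like a static asset (illustrative list). */
function isStaticAsset(string $uri): bool
{
    $path = parse_url($uri, PHP_URL_PATH);

    return is_string($path)
        && preg_match('/\.(?:png|jpg|jpeg|gif|css|js)$/', $path) === 1;
}

// Only act as a router when actually running under the built-in server.
if (PHP_SAPI === 'cli-server') {
    if (isStaticAsset($_SERVER['REQUEST_URI'])) {
        return false; // tell the built-in server to serve the file as-is
    }

    // The script starts from a clean slate on every request, so this
    // counter never carries over from one request to the next.
    $hits = ($hits ?? 0) + 1;
    echo "Handled by PHP; hits seen by this request: {$hits}\n";
}
```

Returning false from the router hands the request back to the built-in server; anything else is treated as the response.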
Is it really possible to make a full-stateless application with
Symfony? e.g. I send two different requests to the same kernel object,
in this case, is there any possibility that the two requests create a
conflict in Symfony core components?
Why would it go to the same kernel object? Why not design your app in such a way that it's not relevant which object or even processing server gets the request? Why not design for redundancy and high availability from the get go? HTTP = stateless by default. Your task = make it irrelevant what processes the request. It's not difficult to do so, if you avoid coupling with the actual processing server (example: don't store sessions to local filesystem etc.)
Actually, the idea of "Create a kernel -> start server -> on request,
make the kernel handle it" behavior would be awesome, because it would
be something quite similar to NodeJS, but actually, the PHP paradigm
is not compatible with this because we would need each request to be
handled asynchronously. But if a kernel and its container is
stateless, then, there should be a way to do something like that,
shouldn't it?
Actually, nginx + php-fpm behave almost identically to node.js.
nginx uses a reactor to handle all connections on the same thread. Node.js does the exact same thing. What you do is create a closure / callback that is fed into Node's libraries and I/O is handled in a threaded environment. Multithreading is abstracted from you (related to I/O, not CPU). That's why you can experience that Node.js blocks when it's asked to do a CPU intensive task.
nginx implements the exact same concept, except this callback isn't a closure written in javascript. It's a callback that expects an answer from php-fpm during <timeout> seconds. Nginx takes care of async for you. Your task is simply to write what you want in PHP. Now, if you're reading a huge file, then async code in your PHP would make sense, except it's not really needed.
With nginx and sending off requests for processing to a fastcgi worker, scaling becomes trivial. For example, let's assume that 1 PHP machine isn't enough to deal with the number of requests you're dealing with. No problem, add more machines to nginx's pool.
This is taken from nginx docs:
upstream backend {
    server backend1.example.com weight=5;
    server backend2.example.com:8080;
    server unix:/tmp/backend3;
    server backup1.example.com:8080 backup;
    server backup2.example.com:8080 backup;
}
server {
    location / {
        proxy_pass http://backend;
    }
}
You define a pool of servers and then assign various weights / proxying options related to balancing how requests are handled.
However, the important part is that you can add more servers to cope with availability requirements.
This is the reason why nginx + php-fpm stack is appealing. Since nginx acts as a proxy, it can proxy requests to node.js as well, letting you handle web socket related operations in node.js (which, in turn, can perform an HTTP request to a PHP endpoint, allowing you to contain your entire app logic in PHP).
I know this answer might not be what you're after, but what I wanted to highlight is the way node.js works (conceptually) is identical to what nginx does when it comes to handling incoming requests. You could make php work as node does, but there's no need for that.
Your questions can be summed up as this:
"Could PHP be more like Node?"
to which the answer is of course "Yes." But that leads us to another question:
"Should PHP be more like Node?"
and now the answer is not that obvious.
Of course in theory PHP could be made more like Node - even to a point to make it exactly the same. Just take the next version of Node and call it PHP 6.0 or something.
I would argue that it would be harmful to both Node and PHP. There is a diversity in the runtime environments for a reason. One of the variations is the concurrency model used in a given environment. Making one like the other would mean less choice for the programmer. And less choice is less freedom of expression.
PHP and Node were created in different times and for different reasons.
PHP was developed in 1995 and the name stood for Personal Home Page. The use case was to add some server-side dynamic features to HTML. We already had SSI and CGI at that point but people wanted to be able to inject right into the HTML - synchronously, as it wouldn't make much sense otherwise - results of database queries and other computations. It isn't a surprise how good it is at this job even today.
Node, on the other hand, was developed in 2009 - almost 15 years later - to create high performance network servers. So it shouldn't surprise us that writing such servers in Node is easy and that they have great performance characteristics. This is why Node was created in the first place. One of the choices it had to make was a 100% non-blocking environment of single-threaded, asynchronous event loops.
Now, single-threading concurrency is conceptually more difficult than multi-threading. But if you want performance for I/O-heavy operations then currently you have no other options. You will not be able to create 10,000 threads but you can easily handle 10,000 connections with Node in a single thread. There is a reason why nginx is single-threaded and why Redis is single threaded. And one common characteristic of nginx and Redis is amazing performance - but both of those were hard to write.
Now, as far as Node and PHP go, those technologies are so far from each other that it's hard to even comprehend what their fusion would look like. It reminds me of the old April Fool's joke about unifying Perl and Python that so many people believed in.
PHP has its strengths and Node has its strengths. And just like it would be hard to imagine Node with blocking I/O, it would be equally hard to imagine PHP with non-blocking I/O.
To summarize: it could be possible to make PHP like Node, but I wouldn't expect it to happen any time soon - if ever.
I'm coming from a PHP background and I'm a bit confused about how to safely use resources with Golang. My main concern is, in a web context, with PHP, scripts are usually short-lived (HTTP request / response lifetime), but with Golang they're supposed to run forever (because the Golang program acts as a web server and a web application at the same time).
So, when it comes to deal with database connections, log files, I often see that they should be opened once and not for each request, which makes sense. However how stable is it to do this?
For example, if I open a database connection, how can I be sure it won't break at some point? (If the database decides to kill it for some reason, or if my machine loses internet access, would the connection become valid again when I gain internet access later on?) Same for log files: with PHP it's not a problem for a sysadmin to set up log rotation, however in Golang it would break the file handle I think (if the program doesn't know about this)?
I'd really like to work the Golang way and not open/close those resources for each request, but I'm not sure what the "safe" way to do this is. Are there any recommendations for this? Or are there built-in features for those concerns?
In Go, there are built-in packages that will handle both log files and databases in a Go idiomatic way.
They are:
database/sql: https://golang.org/pkg/database/sql
log: https://golang.org/pkg/log
In Go, as with any language where long-running daemons can be created, error checking and error handling will be important.
I am trying to understand websockets.
I have seen 2 examples here in doc and also here.
Both examples are using endless loop cycling, listening for when a new client connects, when they do something interesting and when they are disconnected.
My question is: Is using websockets (with endless loop cycling) better than an ajax solution with http requests per x time ?
AJAX and WebSockets are vastly different. Asking if one is better than the other is like asking if a screwdriver is better than a hammer.
WebSockets are used for real time, interactive communication. Both sides of a WebSocket connection can send data and it will be received within milliseconds by the other end. The connection stays open, reducing latency due to connection negotiation.
However, it only sort of plays nicely with HTTP. That is, it plays nicely with proxies that are WebSocket aware, and with firewalls. WebSocket traffic is most definitely not HTTP traffic, except for the client's first packet, which requests switching from HTTP to the WebSocket protocol.
AJAX, on the other hand, is pure HTTP. The only difference between AJAX and a standard web request is that an AJAX request is initiated by client side scripts and the response is available to that same script rather than reloading the page.
In both AJAX and WebSockets, the client scripts can receive data and use it within that same script. That's where the similarities end.
WebSockets set up a permanent connection and both sides can send data at any time, or sit quietly at any time. With AJAX, the client makes a request and the server responds.
For instance, if you were to set up a new message notification system, if you were using WebSockets, then as soon as a new message is available, the server sends it straight to the browser. If there are no new messages, the server stays quiet. If you were using AJAX, the client would periodically send a request to the server, which would always respond, either saying there were no new messages, or delivering the notifications that are pending. There is no way for the server to initiate things on its end, it must wait for the AJAX request.
Server side, things diverge from the traditional PHP web development paradigms. A typical WebSocket server will be a standalone CLI application running as a daemon. (If that last sentence doesn't make sense, please spend a while taking the time to really understand how to administer a server.)
This means that multiple clients will be connecting to the same script, and superglobal variables like $_GET and $_SESSION will be absolutely meaningless. It seems easy to conceptualize in a small use case, but remember that you will most likely want to get information from other parts of your site, which often means using libraries and frameworks that have absolutely no concept of accessing data outside of the HTTP request/response model.
Thus, for simplicity, you'll usually want to stick with AJAX requests and periodic polling, unless you have the means to rethink the network data and possibly re-implement things that your libraries automate, if you're looking to update standard web traffic.
As for the server's loop:
It is not a busy loop, it is an IO blocked loop.
If the server tries to read network data and none is available, the operating system will block (pause) the script and go off to do whatever else needs to be done. In my WS server, I block waiting for network traffic for at most 1 second at a time, before the script returns to check and see if anything else new happened that I should notify my clients of. Typically, this takes barely a few milliseconds before the server goes right back to its IO blocked state waiting for new data on the wire. Some others have implemented my server using LibEv, which allows them to respond to events outside of the network IO without having to wait for the block to time out.
This is the way nearly every server does things. This is why you can have Apache actively listening and serving web traffic without every server that runs Apache being pegged at 100% CPU usage even when there is no traffic.
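The "block for at most 1 second" behaviour can be sketched with stream_select(). This is not the code of the server described above, just a small illustration; the waitForReadable() helper is made up, and a local socket pair (STREAM_PF_UNIX, so not Windows) stands in for a real client connection.

```php
<?php
/**
 * Wait up to $timeoutSec seconds for data on any of the given streams.
 * Returns the streams that are ready to read (possibly none).
 */
function waitForReadable(array $streams, int $timeoutSec = 1): array
{
    if ($streams === []) {
        return [];
    }

    $read = $streams;
    $write = null;
    $except = null;

    // The process sleeps inside stream_select(); no CPU is burned while waiting.
    $ready = stream_select($read, $write, $except, $timeoutSec);

    return ($ready === false || $ready === 0) ? [] : $read;
}

// A connected socket pair stands in for one client connection.
[$clientSide, $serverSide] = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);

$start = microtime(true);
$idle = waitForReadable([$serverSide], 1); // nothing sent yet: blocks until the timeout
$waited = microtime(true) - $start;

fwrite($clientSide, 'ping');
$busy = waitForReadable([$serverSide], 1); // data is pending: returns without waiting out the timeout

printf("idle: %d ready after %.1fs, busy: %d ready\n", count($idle), $waited, count($busy));
```

Between the two calls the server is free to do its housekeeping, which is the "check and see if anything else new happened" step.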
In closing, WebSockets is a wonderful technology, but web libraries and frameworks are simply not built to use them. Thus, unless you're working in a system where waiting 3 seconds for a full AJAX request is far, far too long, it's probably best to use AJAX. If you're writing a multiplayer interactive game or a chat system, then you've found a perfect use for WebSockets.
I do heartily encourage everyone to learn WebSockets... but it's not a magic bullet, and few parts of the web are designed in ways where people can get real use out of it.
Yes, sockets are better in many cases.
It's not a forever loop pinned at 100% CPU; it's just the main loop that every daemon application has.
The process spends 99.99% of its time blocked in a synchronous accept call.
An Ajax heartbeat means more traffic and more server CPU and memory.
I too am in the learning phase. I have built a php-based websocket server and have it communicating with web pages. Perhaps my 2c perspective is useful.
Getting the websocket server (wss) working using available sources as a starting point is not that difficult, but what to do with it all next is.
The wss runs in the CLI version of php. A late-model browser loads a normal http or https page containing a request to the wss, along with anything else that page needs to do, and a handshake occurs. Communication is then possible directly between browser and wss at the whim of either end. This is low overhead and hence fast and simple. Very cool. What is said over that link needs to be understood by both ends - subprotocol agreement. You may have to roll your own in php and in javascript. No more http headers, urls, etc etc.
The wss is a long-lived, stateful instance of php (very unlike apache etc which forget you on sending the page). An entire app can be run in the wss instance, keeping state for itself and each connected client. It used to be said that php was too leaky for long life but I don't hear that much any more. But I believe you still have to be careful with memory.
However, being a single php instance there is not the usual separation between client instances. For example statics in classes are shared with every class instance and hence every client. So for a single user style app sharing data with a heap of clients this is great. I can see that Ajax type calls can be replaced in this way, but if the app still had to rebuild state to service each client, and then release it to save resources, that seems to lessen the advantage.
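Here is a tiny sketch of what "statics are shared with every client" means in a long-lived process. The ClientHandler class is made up for illustration; in a real wss you would have one such object per connection.

```php
<?php
// Hypothetical per-client handler living inside one long-running PHP process.
final class ClientHandler
{
    /** One copy for the whole process, so every client sees it. */
    public static int $messagesSeen = 0;

    /** One copy per connected client. */
    private int $mine = 0;

    public function onMessage(string $payload): void
    {
        self::$messagesSeen++;
        $this->mine++;
    }

    public function mine(): int
    {
        return $this->mine;
    }
}

$alice = new ClientHandler();
$bob = new ClientHandler();

$alice->onMessage('hi');
$alice->onMessage('again');
$bob->onMessage('hello');

printf(
    "alice=%d bob=%d total=%d\n",
    $alice->mine(),
    $bob->mine(),
    ClientHandler::$messagesSeen
);
// prints "alice=2 bob=1 total=3"
```

Under Apache or php-fpm the static would be reset at the end of each request; here it keeps growing for as long as the process lives, which is handy for shared data and a trap for anything meant to be per-client.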
Going a step further and keeping truly stateful instances for clients seems like a possible next step. Replicating the traditional session based system is one possibility, alternatively fork new php interpreters and look after communications between parent and children via sockets or suchlike. But this would require resources per client that would be severely limiting for any non-trivial app.
Or perhaps it is possible to put the bulk of the app in the parent and let the children just do the very client specific stuff. Or break the app design into small independent units that can communicate directly via sockets. Socket communication does seem to be catching on nowadays.
As Ghedpunk says in so many ways, the real world does not yet seem ready to realise the full potential of the web socket concept but it can certainly replace Ajax. The added advantage of the server sending without being asked opens up new possibilities previously too difficult to consider.
This is a bit complicated, so please don't jump to conclusions, feel free to ask about anything that is not clear enough.
Basically, I have a websocket server written in PHP. Please note that websocket messages are asynchronous, that is, a response to a request might take a lot of time, all the while the client keeps on working (if applicable).
Clients are supposed to ask the server for access to files on other servers. This can be an FTP service, or Dropbox, for the matter.
Here, please take note of two issues: connections should be shared and reused and the server actually 'freezes' while it does its work, hence any requests are processed after the server has 'unfrozen'.
Therefore, I thought, why not offload file access (which is what freezes the server) to PHP threads?
The problem here is twofold;
how do I make a connection resource in the main thread (the server) available to the sub threads (not possible with the above threading model)?
what would happen if two threads end up needing the same resource? It's perfectly fine if one is locked until the other one finishes, but we still need to figure out issue #1.
Perhaps my train of thought is all screwed up, if you can find a better solution, I'm eager to hear it out. I've also had the idea of having a PHP thread hosting a connection resource, but it's pretty memory intensive.
PHP supports no threads. The purpose of PHP is to respond to web requests quickly. That's what the architecture was built for. Different libraries try to do something like threads but they usually cause more issues than they solve.
In general there are two ways to achieve what you want:
Off-load the long processes to an external process. A common approach is using a system like gearman http://php.net/gearman
Use asynchronous operations. Some stream operations and such provide an "async" flag or "non-blocking" mode. http://php.net/stream-set-blocking
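For the second option, a minimal sketch of non-blocking mode with stream_set_blocking(). The tryRead() helper is made up, and a local socket pair (STREAM_PF_UNIX, so not Windows) stands in for the slow FTP/Dropbox connection, which is an assumption on my part.

```php
<?php
// A socket pair stands in for a connection to a slow remote service.
[$remote, $local] = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);

// In non-blocking mode a read returns at once instead of freezing the server.
stream_set_blocking($local, false);

/**
 * Try to read without waiting.
 * Returns null when nothing is available (or the peer has closed).
 */
function tryRead($stream): ?string
{
    $chunk = fread($stream, 8192);

    return ($chunk === false || $chunk === '') ? null : $chunk;
}

$first = tryRead($local);               // nothing has been sent yet
fwrite($remote, 'file listing ready');  // the "remote" side finally answers
$second = tryRead($local);              // now there is something to pick up

var_dump($first, $second);
```

In a real server you would combine this with stream_select() so you only call tryRead() on streams that actually have data, rather than polling them in a tight loop.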
I have been searching Google for a while, but the problem I am running into is I am not exactly sure what it is I need to be searching for. (Searching for PHP C++ communication doesn't seem to be what I need) I am basically developing a c++ plugin for a game server, and I would like to create a web interface that can pass/pull data to and from the C++ plugin. The game already uses an RCON port for remote administrative access, but I stumbled across a header for the Network interface they use, so I assume I could use this.
My problem is I am not very familiar with using sockets. I assume I will basically need to open a socket in C++ and leave it listening, and then in the PHP, connect to that socket, pass the data, and close it.
Here is the interface...
http://www.ampaste.net/m2f6b6dbc
I am mostly just going to be pulling information like current list of connected players, names, and scores. And passing commands to restart the server, shut it down, etc.
Any help would be great, thanks!
You could try Thrift. It was written by the engineers at Facebook, and it's now an Apache project.
Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
Link: http://incubator.apache.org/thrift/
In a nutshell it does exactly what you're trying to do. It makes it easy for different languages to communicate with each other. Rather than trying to come up with some socket based protocol for communication, you can call a function in PHP like this:
$game->getScores();
And it automatically plugs into a function named getScores in your C/C++ program. The only drawback is it can be a bit of a pain to configure correctly.
I'd dare to recommend to use some standard means of distributed components communication, for example, XML RPC. There are libraries for both PHP and C++: http://en.wikipedia.org/wiki/XML-RPC#Implementations
This approach will keep you from reinventing the wheel during communication protocol implementation, and will make further maintenance cheaper.
I assume I will basically need to open a socket in C++ and leave it listening
err, yes, that's the description I'd give to my 12 year-old daughter - but if you're going to have more than one client connecting it's a bit more involved. Especially if you are bolting the code onto an existing server. So have a read of the socket programming FAQ.
You do need to define a protocol of how data will be represented when travelling across the socket. There are lots of 'standard' methods - but sometimes things like CORBA / SOAP etc can just be overkill and more effort than starting from scratch.
If you are bolting code onto an existing server, life will be a lot simpler if you use the current socket and extend the protocol if necessary.
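As an illustration of what "define a protocol" can mean in practice, here is a sketch of the PHP side of a made-up wire format: a 4-byte big-endian length followed by a JSON payload. The format, the function names and the command names are all assumptions, and the C++ end would have to implement the same framing.

```php
<?php
// Hypothetical wire format: 4-byte big-endian length, then a JSON payload.

/** Turn a message into bytes ready to be written to the socket. */
function encodeFrame(array $message): string
{
    $json = json_encode($message, JSON_THROW_ON_ERROR);

    return pack('N', strlen($json)) . $json;
}

/**
 * Pull one complete frame off the front of $buffer.
 * Returns the decoded message, or null if more bytes are still needed.
 */
function decodeFrame(string &$buffer): ?array
{
    if (strlen($buffer) < 4) {
        return null; // length prefix not complete yet
    }

    $length = unpack('Nlen', $buffer)['len'];
    if (strlen($buffer) < 4 + $length) {
        return null; // payload not complete yet
    }

    $json = substr($buffer, 4, $length);
    $buffer = (string) substr($buffer, 4 + $length); // keep any following frames

    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

// Two frames arriving back to back, as they might on a busy socket.
$buffer = encodeFrame(['cmd' => 'players']) . encodeFrame(['cmd' => 'restart']);

$first = decodeFrame($buffer);
$second = decodeFrame($buffer);
$none = decodeFrame($buffer);

var_dump($first, $second, $none);
```

The length prefix is what lets the receiver cope with TCP delivering half a message, or two messages at once.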
There are 3 models for writing a socket server - the code snippet you provided does not seem to include details of which you are currently working with:
forking server (may split threads rather than processes)
single-threaded server
socketless server
forking server
An instance of the server is started (call it p1), calling setsid()
p1 starts listening on the relevant socket
a client tries to connect
p1 forks to create p2
p2 then accepts the connection and starts conversing with the client
p1 continues to listen for further connections
p2 exits when the connection closes
(There are variations of this - p2 may accept further connections, p1 might fork prior to a connection coming in.)
single-threaded
An instance of the server is started, calling setsid()
it starts listening for a connection, and creates an array of the sockets in use (including the initial one)
socket_select() is used to identify activity from any of the sockets
when a client connects, the connection is accepted and added to an array of connections
whenever socket_select() returns activity on one of the sockets, the server generates an appropriate response / closes the socket / binds the new connection
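The single-threaded steps above can be sketched roughly as follows. Note that I've swapped in stream_select() and the stream_socket_* functions instead of socket_select(), since they don't need the sockets extension. The pollOnce() name, the upper-casing "protocol" and the address in the usage comment are placeholders.

```php
<?php
/**
 * One pass of a single-threaded server loop: wait for activity, then accept
 * new clients, answer readable ones, and drop the ones that have closed.
 */
function pollOnce($server, array &$clients, int $timeoutSec = 1): void
{
    // Watch the listening socket plus every connected client.
    $read = array_values($clients);
    $read[] = $server;
    $write = null;
    $except = null;

    $ready = stream_select($read, $write, $except, $timeoutSec);
    if ($ready === false || $ready === 0) {
        return; // interrupted or timed out: nothing to do this pass
    }

    foreach ($read as $socket) {
        if ($socket === $server) {
            // Activity on the listening socket means a new client.
            $conn = @stream_socket_accept($server, 0);
            if ($conn !== false) {
                $clients[(int) $conn] = $conn;
            }
            continue;
        }

        $data = fread($socket, 8192);
        if ($data === false || $data === '') {
            // The client went away: forget it and close our end.
            unset($clients[(int) $socket]);
            fclose($socket);
            continue;
        }

        // Placeholder "protocol": echo the input back in upper case.
        fwrite($socket, strtoupper($data));
    }
}

// Typical use (address is hypothetical):
//   $server = stream_socket_server('tcp://127.0.0.1:9000', $errno, $errstr);
//   $clients = [];
//   while (true) { pollOnce($server, $clients); }
```

One process serves every client, so any slow work inside the loop delays all of them.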
socketless server
some process (e.g. inetd) handles all the socket stuff
when a client connects, this other server starts an instance of your program and binds the socket I/O to the STDIN/STDOUT of your program
when your program exits, the other process closes the socket (if it's still open) and handles the clean up (e.g. if it is implemented as a forking server, then the spawned process may end)
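In the socketless model your program never touches a socket at all, which makes it easy to sketch. The serve() function, the command names and the replies below are made up; the only assumption about the supervisor (inetd or similar) is that it wires the client connection to STDIN/STDOUT as described above.

```php
<?php
/**
 * Answer newline-terminated commands read from $in, writing replies to $out,
 * until end of input or a "quit" command.
 * The command names and replies are invented for illustration.
 */
function serve($in, $out): void
{
    while (($line = fgets($in)) !== false) {
        $command = trim($line);

        if ($command === 'quit') {
            fwrite($out, "bye\n");
            return;
        }

        if ($command === 'players') {
            fwrite($out, "players: 0\n"); // a real plugin would report live data
            continue;
        }

        fwrite($out, "error: unknown command\n");
    }
}

// When started by inetd (or a similar supervisor), STDIN and STDOUT are the
// client connection, so the entry point would simply be:
//   serve(STDIN, STDOUT);
```

Because serve() only sees two streams, you can exercise it with php://memory streams or a pipe before putting it behind a real socket.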
What it appears you want to google is C++ client / server. There are two approaches I could suggest here.
First, would be to make a very basic HTTP protocol server so that your php script can simply go to http://yourip/ and send your commands through the POST variables. You can find an example of a C++ Web Server at: https://stackoverflow.com/questions/175507/c-c-web-server-library
The second approach which allows a lot more flexibility is make up your own basic protocol and use PHP's SOCKETS to connect to the server and send commands. You can find an example of a C++ client / server application at http://www.codeproject.com/KB/IP/client_server_socket.aspx. Keep in mind, for the C++ end, you are only concerned about the Server part. You can find a basic PING client in PHP, using sockets, at the following URL: http://www.planet-source-code.com/vb/scripts/ShowCode.asp?lngWId=8&txtCodeId=1786. There are also classes out there to handle most of the protocol part, though I am not aware of any that work for both languages.
Please note I have not tested any of the codes I linked to. I simply found them on google.
A good place to start would be http://php.net/manual/en/book.sockets.php.
Basically, you're going to create another remote administration port and method for PHP to connect. Naturally, if you're going to only be accepting web communication from one IP, that's a good way to secure it (check and allow access to only the one IP which will connect). However, you will need the C++ server to listen on a (secure?) port and have PHP connect to it (as long as the host allows it).
So overall, if you already have a server running, this should be simple from the C++ side. All you need to do from the PHP side is really research connecting to different servers and passing information along (which PHP is more than capable of doing efficiently)
But, this is obviously an alternative to the poster up 2. I personally enjoy (in many cases) "reinventing the wheel" so to speak as to be able to manage my own work. But of course, that is not always efficient by cost or otherwise.
Good luck!