Stateless & asynchronous web-server with PHP (and Symfony)

TL;DR: I'm not sure this topic has its place on Stack Overflow, but it's basically an open question about making PHP apps work the way we would with NodeJS, for example (stateless request flow, asynchronous calls, etc.).
The situation
We know NodeJS can be used as both a web-server and web-app.
But for PHP, the internal web-server is not recommended for production (so says the documentation).
But since Symfony full-stack is based on the Kernel, which handles Request objects, we should be able to send lots of requests to the same kernel, if only we could "bootstrap" the PHP web server (not the app) by creating a kernel before listening to HTTP requests. Our router would then only create a Request object and make the kernel handle it.
But for this, a Symfony app has to be stateless: for example, we would need Doctrine to effectively clear its unit of work after each request, or we might need to somehow isolate some components per request (by identifying a request with a unique PHP object id? by using separate PHP processes?), and obviously we would need more asynchronous facilities in PHP, or in the way we use the internal web server.
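To give an idea, here is a very rough sketch of what I imagine, assuming a standard App\Kernel; getNextIncomingRequest() is purely hypothetical and just stands for "whatever a long-running server would hand us":

<?php
use App\Kernel;
use Symfony\Component\HttpFoundation\Request;

require __DIR__.'/vendor/autoload.php';

// Boot the kernel once, before listening for HTTP requests.
$kernel = new Kernel('prod', false);
$kernel->boot();

// Then feed every incoming request to the same kernel.
while ($raw = getNextIncomingRequest()) {               // hypothetical
    $request  = Request::create($raw['uri'], $raw['method']);
    $response = $kernel->handle($request);
    // ...send $response back to the client here...
    $kernel->terminate($request, $response);
}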
The main questions I sometimes ask myself, and now ask to the community
To clarify this, I have some questions about PHP:
Why exactly is the internal PHP webserver not recommended for production?
I mean, if we can configure how the server is run and its "router" file, we should be able to use it like any PHP server, yes or no?
How does it behave internally? Is memory shared between two requests?
Judging by the router, it seems obvious that variables are not shared, otherwise we could make NodeJS-like apps; but it seems PHP is not capable of something like this.
Is it really possible to make a fully stateless application with Symfony?
e.g. if I send two different requests to the same kernel object, is there any possibility that the two requests conflict in Symfony's core components?
Actually, the idea of a "create a kernel -> start server -> on request, make the kernel handle it" behavior would be awesome, because it would be quite similar to NodeJS; but the PHP paradigm is not compatible with this, because we would need each request to be handled asynchronously. Still, if a kernel and its container are stateless, there should be a way to do something like that, shouldn't there?
Some thoughts
I've heard about React PHP, Ratchet PHP (for WebSocket integration), Icicle and PHP-PM, but I have never tried them; they seem a bit too complex for me for now (I may be missing some concepts about asynchronicity in apps, which is why my brain won't understand until I have some more answers :D ).
Is there any way that these libraries could be used as "wrappers" for our kernel request handling?
I mean, let's create this ReactPHP/Icicle/whatever environment, create our kernel like we would in any Symfony app, and run the app as a web server; when a request comes in, we send it asynchronously to our kernel, and as long as the kernel has not sent the response, the client waits for it, even if the response itself is sent asynchronously (from nested callbacks, etc., like in NodeJS).
This would make any existing Symfony app compatible with this paradigm, as long as the app is stateless, obviously. (if the app config changes based on a request, there's a paradigm issue in the app itself...)
Is it even a possible reality with PHP libraries rather than using PHP internal web-server in another way?
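To make it more concrete, here is the kind of wrapper I imagine, assuming react/http v1's callback-style API and a standard App\Kernel. I haven't tested this, and the PSR-7 to HttpFoundation conversion is deliberately naive (a real setup would use symfony/psr-http-message-bridge):

<?php
use App\Kernel;
use Psr\Http\Message\ServerRequestInterface;
use React\Http\HttpServer;
use React\Http\Message\Response;
use React\Socket\SocketServer;
use Symfony\Component\HttpFoundation\Request;

require __DIR__.'/vendor/autoload.php';

// Boot the Symfony kernel once.
$kernel = new Kernel('prod', false);
$kernel->boot();

// Every incoming request is dispatched to the same, already-booted kernel.
$http = new HttpServer(function (ServerRequestInterface $psrRequest) use ($kernel) {
    // Naive PSR-7 -> HttpFoundation conversion, just to show the idea.
    $sfRequest = Request::create(
        (string) $psrRequest->getUri(),
        $psrRequest->getMethod(),
        [], [], [], [],
        (string) $psrRequest->getBody()
    );

    $sfResponse = $kernel->handle($sfRequest);

    return new Response(
        $sfResponse->getStatusCode(),
        $sfResponse->headers->all(),
        (string) $sfResponse->getContent()
    );
});

$http->listen(new SocketServer('127.0.0.1:8080'));

This would keep the container and routing in memory between requests, which is exactly the bootstrapping cost I'd like to avoid paying on every request.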
Why ask these questions?
Actually, it would be kind of a revolution if PHP implemented real asynchronous behavior internally, like JavaScript has, and it would also have a big impact on PHP performance, thanks to persistent data in our web server and less bootstrapping (requiring the autoloader, instantiating the kernel, loading heavy things from cached files, resolving routing, etc.).
In my mind, only $kernel->handleRaw($request); would consume CPU; everything else (container, parameters, services, etc.) would already be in memory or, in the case of services, "waiting to be instantiated". Hence a performance boost, I think.
And it may troll a bit the people who still think PHP is a very bad and slow language to use :D
For readers and responders ;)
If a core PHP contributor reads me, is there any way that internally PHP could be more asynchronous even with a specific new internal API based on functions or classes?
I'm not a pro of all of these concepts, and I hope really good experts are going to read this and answer me!
It could be a great advance in the PHP world if all of this was possible in any way.

Why exactly is the internal PHP webserver not recommended for production? I mean, if we can configure how the server is run and its "router" file, we should be able to use it like any PHP server, yes or no?
Because it's not written to behave well under load, and there are no configuration options that let you handle HTTP request processing before it reaches PHP.
Basically, it lacks features compared to nginx; it would be like comparing a skateboard to a Lamborghini.
It can get you from A to B, but... you get the gist.
How does it behave internally? Is memory shared between two requests? Judging by the router, it seems obvious that variables are not shared, otherwise we could make NodeJS-like apps; but it seems PHP is not capable of something like this.
The documentation states it's single-threaded, so it appears that it behaves the same as if you wrote while(true) { /* all your processing here */ }.
It's a toy designed for quickly checking a few things when you can't be bothered to set up a proper web server before trying out your code.
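For reference, the router script is just a plain PHP file handed to the built-in server (started with something like php -S 127.0.0.1:8000 router.php); the file paths below are made up:

<?php
// router.php
if (preg_match('/\.(?:css|js|png|jpe?g|gif)$/', $_SERVER['REQUEST_URI'])) {
    return false;   // returning false tells the built-in server to serve the file as-is
}
require __DIR__ . '/public/index.php';   // everything else goes to the front controller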
Is it really possible to make a fully stateless application with Symfony? e.g. if I send two different requests to the same kernel object, is there any possibility that the two requests conflict in Symfony's core components?
Why would it go to the same kernel object? Why not design your app in such a way that it doesn't matter which object, or even which server, processes the request? Why not design for redundancy and high availability from the get-go? HTTP is stateless by default; your task is to make it irrelevant what processes the request. That's not difficult to do if you avoid coupling with the actual processing server (for example: don't store sessions on the local filesystem, etc.).
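For instance, here is a rough sketch of what "don't store sessions on the local filesystem" can look like, using Symfony's RedisSessionHandler; the host is a placeholder, and in a Symfony app you would normally configure this through framework.session.handler_id rather than wiring it by hand:

<?php
use Symfony\Component\HttpFoundation\Session\Storage\Handler\RedisSessionHandler;

$redis = new \Redis();                      // phpredis extension
$redis->connect('redis.internal', 6379);    // placeholder host

// Keep session state off the local machine so any worker can handle any request.
session_set_save_handler(new RedisSessionHandler($redis), true);
session_start();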
Actually, the idea of a "create a kernel -> start server -> on request, make the kernel handle it" behavior would be awesome, because it would be quite similar to NodeJS; but the PHP paradigm is not compatible with this, because we would need each request to be handled asynchronously. Still, if a kernel and its container are stateless, there should be a way to do something like that, shouldn't there?
Actually, nginx + php-fpm behave almost identically to node.js.
nginx uses a reactor to handle all connections on the same thread, and Node.js does the exact same thing: you create a closure / callback that is fed into Node's libraries, and the I/O is handled in a threaded environment. Multithreading is abstracted away from you (for I/O, not CPU). That's why Node.js blocks when it's asked to do a CPU-intensive task.
nginx implements the exact same concept, except the callback isn't a closure written in JavaScript: it's a callback that expects an answer from php-fpm within <timeout> seconds. nginx takes care of the async part for you; your task is simply to write what you want in PHP. Now, if you're reading a huge file, then async code in your PHP would make sense, except it's not really needed.
With nginx and sending off requests for processing to a fastcgi worker, scaling becomes trivial. For example, let's assume that 1 PHP machine isn't enough to deal with the amount of requests you're dealing with. No problem, add more machines to nginx's pool.
This is taken from nginx docs:
upstream backend {
    server backend1.example.com weight=5;
    server backend2.example.com:8080;
    server unix:/tmp/backend3;

    server backup1.example.com:8080 backup;
    server backup2.example.com:8080 backup;
}

server {
    location / {
        proxy_pass http://backend;
    }
}
You define a pool of servers and then assign various weights / proxying options related to balancing how requests are handled.
However, the important part is that you can add more servers to cope with availability requirements.
This is the reason why nginx + php-fpm stack is appealing. Since nginx acts as a proxy, it can proxy requests to node.js as well, letting you handle web socket related operations in node.js (which, in turn, can perform an HTTP request to a PHP endpoint, allowing you to contain your entire app logic in PHP).
I know this answer might not be what you're after, but what I wanted to highlight is that the way node.js works (conceptually) is identical to what nginx does when it comes to handling incoming requests. You could make PHP work the way Node does, but there's no need for that.

Your questions can be summed up as this:
"Could PHP be more like Node?"
to which the answer is of course "Yes." But that leads us to another question:
"Should PHP be more like Node?"
and now the answer is not that obvious.
Of course in theory PHP could be made more like Node - even to the point of making it exactly the same. Just take the next version of Node and call it PHP 6.0 or something.
I would argue that it would be harmful to both Node and PHP. There is a diversity in the runtime environments for a reason. One of the variations is the concurrency model used in a given environment. Making one like the other would mean less choice for the programmer. And less choice is less freedom of expression.
PHP and Node were created in different times and for different reasons.
PHP was developed in 1995 and the name stood for Personal Home Page. The use case was to add some server-side dynamic features to HTML. We already had SSI and CGI at that point, but people wanted to be able to inject the results of database queries and other computations right into the HTML - synchronously, as it wouldn't make much sense otherwise. It isn't a surprise how good it is at this job even today.
Node, on the other hand, was developed in 2009 - almost 15 years later - to create high performance network servers. So it shouldn't surprise us that writing such servers in Node is easy and that they have great performance characteristics. This is why Node was created in the first place. One of the choices it had to make was a 100% non-blocking environment of single-threaded, asynchronous event loops.
Now, single-threading concurrency is conceptually more difficult than multi-threading. But if you want performance for I/O-heavy operations then currently you have no other options. You will not be able to create 10,000 threads but you can easily handle 10,000 connections with Node in a single thread. There is a reason why nginx is single-threaded and why Redis is single threaded. And one common characteristic of nginx and Redis is amazing performance - but both of those were hard to write.
Now, as far as Node and PHP go, those technologies are so far from each other that it's hard to even comprehend what their fusion would look like. It reminds me of the old April Fools' joke about unifying Perl and Python that so many people believed.
PHP has its strengths and Node has its strengths. And just as it would be hard to imagine Node with blocking I/O, it would be equally hard to imagine PHP with non-blocking I/O.
To summarize: it could be possible to make PHP like Node, but I wouldn't expect it to happen any time soon - if ever.

Related

Is there a speed disadvantage to API-centric websites

I'm building an MVC web application (let's call it xyz.com). I want it to also support mobile apps via an API on the same server (let's call this api.xyz.com).
I'm confused about how to structure the web app vs. the API:
Should the web app use the API to query the database, or should it query the database directly (through its own model)? I mean, should the flow be (User > Controller > API > Database) or (User > Controller > Model > Database)?
If we use the API to query the database, how would the web app query the API? Would we use something like cURL?
If we use cURL, wouldn't that slow things down (since we're making a second request from the controller)? What's the ideal way to do this?
I've tried reading up on API-centric web apps, but there's not too much information about this on the net.
Any guidance would be appreciated.
Since I don't have enough points to comment, I'll just leave my thoughts here.
1) I would make the web app use the API for 2 main reasons:
I don't like having multiple components accessing my database: you'll most likely end up with a lot of duplicated data-extraction code, and if you have a lot of filtering you might make mistakes and the two paths will behave differently.
I believe (correct me if I'm wrong) that it'll be easier to improve performance in the app than in the database. Since the API serves resources, they're easier to cache than a full HTML web page.
2) To query the API I would most likely use Composer and find a nice HTTP library; I'm not sure which one though, since I haven't used PHP in a while (Symfony's and Laravel's HTTP clients are quite nice if I remember correctly). A rough example with plain cURL is sketched after these points.
3) Yeah, it might slow things down a little, but I believe not enough to be noticed by the user. As I said above, with correct cache handling you'll do just fine.
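As a rough sketch of point 2) with nothing but plain cURL (the endpoint URL is made up; in practice you'd wrap this in a small client class or use a library such as Guzzle):

<?php
// Inside the web app's controller: fetch a resource from the API over HTTP.
$ch = curl_init('https://api.xyz.com/products/42');   // hypothetical endpoint
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);        // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 2);                  // fail fast if the API is unreachable
$json = curl_exec($ch);
curl_close($ch);

$product = ($json !== false) ? json_decode($json, true) : null;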
Hope my thoughts on this help; if I'm wrong somewhere, please feel free to correct me in the comments below (don't know if I'll be able to respond though... :s).
Have a nice day!
You really want to know the difference between an n-tier architecture and a single-tier architecture. An n-tier architecture is comprised of several tiers, which are connected via an internal protocol and API. E.g.:
            backend                                         frontend

data --- application --- HTTP server --- HTTP client --- application --- HTTP server --- browser
You could have many more tiers in there. Those tiers can all run on the same physical machine, but they still talk to each other over HTTP; more likely you'd want to run each tier on a different machine though.
Yes, this will obviously incur some overhead in talking to your database backend over HTTP instead of doing it within the same PHP process. However, this is offset by:
Caching is built into this architecture. If you make proper use of HTTP caching, with a fully capable HTTP server and client, you can enormously reduce the absolute number of queries made. In practice you'd set up a reverse proxy on the web server of your frontend server, so your queries go PHP → curl → web server reverse proxy → backend server. If the reverse proxy caches properly, that's where the chain often stops, which can be much faster than executing the actual query against the database. If the backend server uses HTTP caching effectively, it too can often respond with a simple 304 Not Modified, which is also very fast (see the sketch after these two points).
Load is distributed among more machines, which speeds up each individual machine. If your frontend servers can largely work with cached data without needing to bother the database, the database is much faster for the queries that it does need to handle. You can also scale up your frontend servers to many instances, each of which is faster because it has less load.
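To illustrate the caching point, here is a rough sketch of a backend endpoint honouring conditional requests, so a reverse proxy (or the frontend's HTTP client) can often get away with a cheap 304 instead of a full database round trip; the version token and the fetch function are hypothetical:

<?php
$etag = '"products-v42"';                       // hypothetical version token for this resource
header('ETag: ' . $etag);
header('Cache-Control: public, max-age=60');

if (($_SERVER['HTTP_IF_NONE_MATCH'] ?? '') === $etag) {
    http_response_code(304);                    // nothing changed: no body, no database query
    exit;
}

header('Content-Type: application/json');
echo json_encode(fetchProductsFromDatabase());  // hypothetical data-access function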
These are the advantages of such an architecture. A further advantage is that you can also directly expose your internal HTTP API to the outside world (again, reverse proxies make a lot of sense here). If you're not using an internal HTTP API, you have to write:
A controller and view which gets data from the model and renders to HTML.
A controller and view which gets data from the model and renders to JSON (or whatever).
To some degree controllers can be reused and simply the view switched out, if everything else is the same, but often you'll find that you need to duplicate each logical "thing" to be served as HTML in one case and JSON in another, and you need to keep those in sync.

How Can a LAMP Guy Easily Implement WebSockets?

I've always worked with Apache, MySQL, and PHP. I'd like to eventually branch out to Python/Django or Ruby/Ruby on Rails, but that's another discussion. Two great things about Apache, MySQL, and PHP are all three are ubiquitous and it's very easy to launch a website. Just set up an Apache virtual host, import the database into MySQL, and copy the PHP files onto the server. That's it. This is all I've ever done and all I've ever known. Please keep this in mind.
These days, it's becoming increasingly important for websites to be able to deliver data in real-time to the users. Users expect this too due to the live nature of Facebook and Gmail. This effect can be faked with Ajax polling, but that has a lot of overhead, as explained here. I'd like to use WebSockets. Now remember that I've always been a LAMP guy. I've only ever launched websites using the method I described earlier. So if I have, say, a CakePHP site, how can I "add on" the feature of WebSockets? Do I need to install some other server or something or can I get it to work smoothly with Apache? Will it require Apache 2.4? Please explain the process to me keeping in mind that I only know about LAMP. Thanks!
One key thing to keep in mind, is that a realtime websockets server needs to be "long running", so that it can push stuff to clients. In the classic LAMP setup, Apache spawns a PHP interpreter on each request. Between requests the PHP interpreter is not running, and the only protocol state kept between requests is sessions.
One nice property of the LAMP way is that memory management is easy: you just implicitly allocate whatever memory you need, and it is automatically reclaimed when the request is done and the PHP process exits. As soon as you want the server to keep running, you need to think about memory management. In some languages, like C++, you manage allocation and deallocation explicitly. In other languages, like Java or JavaScript, you have garbage collection. In PHP you throw everything away and start with a fresh slate on each request.
I think you will have a hard time making long-running servers with something like Cake or any other classic PHP framework. Those frameworks work by basically taking an HTTP request and turning it into an HTTP response.
My advice is that you should look into something like Node.JS and SocketIO. If you know Javascript, or don't mind learning, these technologies allow you to easily implement real-time servers and clients. If necessary you could run a reverse proxy like nginx, so that your existing LAMP stack would get some requests, and one or more NodeJS servers would get some.
This answer came out a bit fluffy, but I hope that it helps a little.. :-)

Socket.io and ajax request to php page

I'm setting up a realtime app that will be using socket.io. There's currently some core functionality in PHP that uses a memcache and MySQL backend.
Would it make sense for the socket.io server to make an HTTP (ajax-style) request to the PHP page that handles this, if that's even possible? There's a lot of MySQL querying; I know it can be done in node.js, but I'd rather keep this part abstracted in PHP if possible.
So, my question is, is that a proper thing to do? Call a php page from within the socket.io server to then return to the client?
Thanks!
I don't see any problems with having your node.js app communicate with your PHP app by exposing a RESTful API or some PHP script that you can POST to or GET from your socket.io node.js server. There are plenty of npm modules (like request) that can make HTTP requests like that a breeze for you. After retrieving the data from PHP in your node app, you can use socket.io to emit() the data to the socket.io client on the frontend.
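To sketch the PHP side of that (table names, credentials and the cache key are all made up), the endpoint your socket.io server calls could look roughly like this:

<?php
// Small JSON endpoint backed by memcached + MySQL, for the node.js side to call.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$messages = $cache->get('latest_messages');
if ($messages === false) {
    $pdo  = new PDO('mysql:host=127.0.0.1;dbname=app', 'app_user', 'secret');
    $stmt = $pdo->query('SELECT id, body, created_at FROM messages ORDER BY id DESC LIMIT 20');
    $messages = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $cache->set('latest_messages', $messages, 5);   // cache for 5 seconds
}

header('Content-Type: application/json');
echo json_encode($messages);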
There is nothing wrong with that. You are simply using a RESTful API to access the MySQL data, thus isolating the database details.
If one day you are tired of PHP, you can easily switch to Ruby, Python or whatever for that part without even touching the node.js side. If your logic is already written in PHP (you are upgrading an old app), it makes even more sense, as you can reuse what has already been tested and debugged. A lot of folks advocate that kind of separation between systems. Just look at all the SOA (Service Oriented Architecture) buzz.
Where I work we are using this very architecture in a project (though in this case it's an ASP.NET MVC website calling a Java EE app) and it has served us very well. With the event model of node.js, it's even better, since you won't block waiting for the PHP.
But of course, there are some drawbacks:
Performance overhead
Architecture is more complicated
You now work with two languages instead of only one (though JavaScript and PHP are so often used together that I don't think it's really a problem in this case)
So you need to ask yourself whether your problem really needs that solution. But in a lot of cases the answer may be yes. Just don't forget the virtue of Keeping It Simple, Stupid (the KISS principle).

Should I be worried about HTTP overhead when calling local web services?

I'm currently re-developing a fairly large-scale PHP web application. Part of this redevelopment involves moving the bulk of some fairly hefty business logic out of the core of the web app and moving it into a set of SOAP web services.
What's currently concerning me (just slightly) is the perceived overhead this brings with it in terms of local HTTP traffic. I should explain that the SOAP web services currently, and for the foreseeable future, will reside on the same physical server, and if and when they move they will remain on the same network. What concerns me is the fact that every call that used to be an internal PHP function call is now an HTTP request invoking a similar function call.
Obviously this is something I can measure as we move further along the line, but I was wondering if anyone could offer any advice, or more importantly share any previous experience of taking an application down this route.
Are you doing hundreds or thousands of these calls a second? If not, then you probably don't have to worry.
But profile it. Set up a prototype of the system working across the network with a large number of SOAP calls, and see if it slows down to unacceptable levels.
If the server is running on the same physical box then you can't have any privilege separation. And increasing capacity by adding tiers to the stack (instead of equivalent nodes) is a recipe for non-availability.
If you're hiding something behind SOAP, the HTTP overhead is likely to be relatively small compared to what the 'something' is doing (e.g. reading from a database). However, when you add up the cost of constructing the SOAP request, decomposing the SOAP request and composing the response, plus the HTTP overhead, one has to wonder why you don't provide a shortcut by calling the PHP code implemented within the server directly, in the same thread of execution.
If you were to design an abstract SOAP interface which mapped directly to and from PHP types, this could be short-circuited without any overhead in maintaining different APIs.
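To make that concrete, here is a hedged sketch of the two options side by side; the service URL, class and method names are all hypothetical:

<?php
// Option 1: go through SOAP/HTTP even though the service lives on the same box.
$client = new SoapClient('http://localhost/services/orders?wsdl');   // hypothetical WSDL
$order  = $client->getOrder(42);                                      // hypothetical operation

// Option 2: short-circuit and call the implementation in-process,
// in the same thread of execution.
require __DIR__ . '/services/OrderService.php';                       // hypothetical class file
$service = new OrderService();
$order   = $service->getOrder(42);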
Does it HAVE to be a SOAP web service? Is this the client telling you this, or is it your decision?
You seem concerned about HTTP calls, which is fine, but what about the overhead of unnecessarily serializing/de-serializing to/from XML to be transferred over the "wire" - that "wire" being the same machine. =)
It doesn't really make sense to have a SOAP-based web service that's only going to be consumed by a client on the same machine.
However I agree with #Skilldrick's answer - it's not going to be an issue as long as you are intelligent about your HTTP calls.
Cache whenever you can, batch your calls, etc.
SOAP is more verbose than REST. REST uses the HTTP protocol to do the same with less network bandwidth, if that's your concern.
See:
WhatIsREST
Wikipedia
To really answer your question, remember the 80/20 rule: use a benchmarking/tracing tool to help you find where your hotspots are. Fix those and forget about the rest.

Shared/pooled connections to backend services in PHP

I'm trying to figure out the best way to minimize resource utilization when I have PHP talking to various backend services (e.g. Amazon S3 or any other random web services -- I'd like a general solution). Ideally, I'd like to have a single persistent connection to the backend (or maybe a small pool of persistent connections) with some caching, and then have all of the PHP tasks share it. We can consider it all read-only for the purposes of this question. It's not obvious to me how to do this in PHP. There's the database-specific stuff like mysql_pconnect(), but that doesn't really do it for me.
One idea I've had, which seems somewhat suboptimal (but is still better than having every single request create and destroy a new connection), is to use a local caching proxy (in a separate process) that would effectively do the pooling and caching. PHP would still be opening and closing a connection for every request, but at least it would be to a local process, so it should be a little faster (and it would reduce load on the backends). But it doesn't seem like this kind of craziness should be necessary. There's gotta be a better way. This is easy in other languages. Please tell me what I'm missing!
There's a large ideological disconnect between the various web technologies. Some are essentially daemons that run full-time in the background, and handle requests passed in on their own. Because there's a process always running, you can have a pool of already open existing working connections.
PHP (and normal CGI scripts) does not have a daemon behind the scenes. Every time a request comes in, the PHP interpreter is started up with a clean slate, compiles the scripts, and runs the bytecode. There's no persistence. The PHP database functions that support persistent connections establish the connection at the web server child level (i.e. mod_php attached to an Apache process). This isn't exactly a connection pool, as you can only ever see the persistent connection attached to your own process.
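For example, mysqli's persistent connections work this way (host and credentials below are placeholders): the "p:" prefix asks for a connection that survives the end of the request, but it stays attached to the web server child / php-fpm worker that opened it rather than going into a shared pool:

<?php
// "p:" = persistent: reuse an open connection already held by this worker process, if any.
$mysqli = new mysqli('p:127.0.0.1', 'app_user', 'secret', 'app_db');
$result = $mysqli->query('SELECT 1');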
Without having a daemon or similar process sitting behind the scenes to hand out resources, you won't get real connection pooling.
Keep in mind that most new connections to most services are not heavy-weight, and non-database connections that are heavy-weight might not be friendly to the concept of a connection pool.
Before you think about writing your own PHP-based daemon to handle stuff like this, keep in mind that it may already be a solved problem. Python came up with something called WSGI, with a similar implementation in Ruby called Rack. Perl also has something remarkably similar but I can't remember the name of it off the top of my head. A quick look at Google didn't show any PHP implementations of WSGI, but that doesn't mean they don't exist...
Because S3 and other webservices use HTTP as their transport, you won't get a significant benefit from caching the connection.
Although you may be using an API that appears to authenticate as a first step, looking at the S3 documentation, the authentication happens with every request - so there is no benefit in authenticating once and reusing a connection.
Web service requests over HTTP are lightweight and typically stateless. Once your request has been answered, no resources (connection or session state) are consumed on the server. This allows the web service implementer to use many machines to answer your request without tying up resources on a particular server.
