Bootstrapping web application - PHP, Ruby, Python, Node.js


Assumption 1: When running PHP via apache or nginx, each incoming request results in the script bootstrapping all of its include files, so essentially there is no shared memory, and the "world is recreated" upon each request.
Assumption 2: Node.js applications are bootstrapped when the server is started. The "world is only created once".
Are Python and Ruby applications bootstrapped in a similar way as PHP or as Node.js?
If possible, would appreciate some guidance regarding terminology: is this basically a question of multi-threaded or concurrency support?

It depends totally on how the application is run.
Most web applications in Python are run as servers which receive requests, rather than being a 'dead' script that gets called on request. "The world is created before the request arrives".
Note I didn't say "only once", as you phrased it.
The reason I phrase it that way is that there are different ways of serving python web applications.
Most python (web) apps are 'WSGI' applications. WSGI is a specification which basically requires the application (or framework) to have a single entry-point function:
def app(environment, start_response):
where environment is all the stuff like the address being asked for, cookies, request type, query args, etc. start_response is a callback function, which the app function needs to call with the response HTTP code, and headers.
start_response('200 OK', [('Content-type', 'text/html')])
for example. Once this has been called, the function must either return the body of the response to be sent back to the client, or yield it piece by piece as a generator (for very large files).
All of this is usually handled by a WSGI framework, which does it transparently and provides an easier interface for writing your application logic.
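To make the raw interface described above concrete, here is a minimal sketch of a WSGI application that uses only the standard library. The helper call_app is a made-up name for this illustration; it plays the role of the server by building an environment and supplying the start_response callback.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Single WSGI entry point: called once per incoming request."""
    path = environ.get("PATH_INFO", "/")
    body = f"You asked for {path}".encode("utf-8")
    # Status line and headers go through the callback...
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # ...and the body is returned as an iterable of bytes.
    return [body]


def call_app(path):
    """Invoke the app the way a WSGI server would, without a network socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To try it over real HTTP, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() should be enough for local experiments.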
In PHP, all your routes & routing is normally handled by apache (or nginx/php-fpm) running individual script files. This, as you rightly suggest, requires re-creating the whole world each time. With WSGI, the world is already created, and WSGI simply calls the application function each time a new request comes in. Most python based web frameworks have some kind of router, either the flask style:
@app.route('/elephants')
def elephants_view():
    return 'view the elephants!'
or Django style routing table:
urls = [
    (r'^/kangaroos$', 'views.kangaroos'),
]
# in views.py:
def kangaroos():
    return 'kangaroos, baby!'
or other ways. There are many different WSGI frameworks which all have their pros and cons. Some of the popular WSGI-based frameworks include Flask, Django, Falcon.
There are many different ways to serve WSGI applications. Flask & Django come with basic development servers, which are single-threaded, and great for development, but not suitable for production.
Since they're single-threaded, "the world is only created once". So global variables last between requests, etc.
There are many other WSGI servers, which can serve any of the frameworks on top of WSGI. Waitress is a great pure-Python one. uWSGI is another production-grade server, as is Gunicorn, and there are many others.
These servers do NOT guarantee that global state is shared between requests, and will 'create the world' an unspecified (configurable) number of times. Some of them use a fixed number of workers, to which the main incoming receiver hands out requests; others may spin up new worker threads or processes as they are needed.
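The difference between "one world" and "several worlds" can be sketched without starting any real server. In this illustration create_world is a made-up factory that stands in for a worker process importing the application; real servers such as Gunicorn or uWSGI do this with actual processes or threads, so treat this only as a model of the behaviour.

```python
def create_world():
    """Stands in for importing the application once inside one worker process."""
    state = {"hits": 0}  # plays the role of a module-level global

    def app(environ, start_response):
        state["hits"] += 1
        body = f"hit {state['hits']} in this worker".encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]

    return app


def request(app):
    """Call a WSGI app once and return the decoded body."""
    return b"".join(app({}, lambda status, headers: None)).decode("utf-8")


# A single-threaded development server: one world, so the counter keeps growing.
dev_server = create_world()
dev_results = [request(dev_server) for _ in range(3)]

# A server with two workers: each worker built its own world.
workers = [create_world(), create_world()]
prod_results = [request(workers[i % 2]) for i in range(4)]
```

With one world the counter runs 1, 2, 3. With two workers each one counts on its own, which is why global variables are not a safe place for data that every request must see.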
Flask, and most of the other Python WSGI frameworks do have the concepts of 'Application Globals' which is how you can store data which must last the whole server lifespan. These special values are shared between 'worlds'. (By using magic rings and pools in a forest).
(Side note: For fun, I started writing a WSGI server using the very cool gevent async library, which does work in the same kind of manner as Node.js in that it is only a single process, which does as much as possible asynchronously (although without Node.js callback style...) in a single thread. It's very short, just one file, so it's pretty easy to see how it all works.)
Ruby is pretty similar to Python in this way, except the protocol is called 'Rack', rather than WSGI, and common servers are 'Puma', 'Unicorn' and 'Rainbows!'. Common Ruby Rack-based frameworks are 'Ruby on Rails', 'Sinatra', and 'Merb'.
One advantage of this kind of model is that you can create 'middleware' which sits between the application responder, and the WSGI (or Rack) server, and "does stuff" to the request on the way (such as minifying javascript, caching, logging, authentication, etc).
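As a rough illustration of the middleware idea, the sketch below wraps a WSGI app and adds a timing header to every response. TimingMiddleware, hello_app and the X-Elapsed-Ms header are names invented for this example, not part of any framework.

```python
import time


class TimingMiddleware:
    """Wraps any WSGI app and adds an X-Elapsed-Ms response header."""

    def __init__(self, wrapped_app):
        self.wrapped_app = wrapped_app

    def __call__(self, environ, start_response):
        started = time.perf_counter()

        def timed_start_response(status, headers, exc_info=None):
            # Measure the time until the wrapped app starts its response.
            elapsed_ms = (time.perf_counter() - started) * 1000
            headers = list(headers) + [("X-Elapsed-Ms", f"{elapsed_ms:.3f}")]
            return start_response(status, headers, exc_info)

        return self.wrapped_app(environ, timed_start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# The server only ever sees the outermost callable.
app = TimingMiddleware(hello_app)
```

Because the middleware is itself a WSGI callable, several of them can be stacked, and neither the server nor the inner app needs to know they are there.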
Another good introduction to WSGI, and how it works is in 'Full stack Python'.
There are other ways of writing web servers than using WSGI (or Rack). For instance, the Tornado and Twisted frameworks in Python allow a totally different asynchronous style of (web) app to be written. They are also using 'the world is created before requests come in' style servers.

Related

Stateless & asynchronous web-server with PHP (and Symfony)

TL;DR: I'm not sure this topic has its place on StackOverflow, but basically it's just a topic of debate and thinking about making PHP apps like we would do with NodeJS for example (stateless request flow, asynchronous calls, etc.)
The situation
We know NodeJS can be used as both a web-server and web-app.
But for PHP, the internal web-server is not recommended for production (so says the documentation).
But, as Symfony full-stack is based on the Kernel which handles Request objects, it means we should be able to send lots of requests to the same kernel, only if we could "bootstrap" the php web-server (not the app) by creating a kernel before listening to HTTP requests. And our router would only create a Request object and make the kernel handle it.
But for this, a Symfony app has to be stateless, for example we need Doctrine to effectively clear its unit of work after a request, or maybe we would need to sort of isolate some components based on a request (By identifying a request with its unique PHP class reference id? Or by using other php processes?), and obviously, we would need more asynchronous things in PHP or in the way we use the internal web-server.
The main questions I sometimes ask myself, and now ask to the community
To clarify this, I have some questions about PHP:
Why exactly is the internal PHP webserver not recommended for production?
I mean, if we can configure how the server is run and its "router" file, we should be able to use it like any PHP server, yes or no?
How does it behave internally? Is memory shared between two requests?
By using the router, it seems obvious to me that variables are not shared, else we could make nodejs-like apps, but it seems PHP is not capable of doing something like this.
Is it really possible to make a full-stateless application with Symfony?
e.g. I send two different requests to the same kernel object, in this case, is there any possibility that the two requests create a conflict in Symfony core components?
Actually, the idea of "Create a kernel -> start server -> on request, make the kernel handle it" behavior would be awesome, because it would be something quite similar to NodeJS, but actually, the PHP paradigm is not compatible with this because we would need each request to be handled asynchronously. But if a kernel and its container is stateless, then, there should be a way to do something like that, shouldn't it?
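The "create a kernel, start the server, let the kernel handle each request" idea can be sketched in a language-neutral way. The Kernel class below is a hypothetical stand-in written in Python, not Symfony's API; it only counts how often the expensive boot step runs and shows where per-request state has to be reset.

```python
class Kernel:
    """Hypothetical stand-in for a framework kernel; not Symfony's API."""

    boots = 0  # class-level counter so we can see how often boot work ran

    def __init__(self):
        Kernel.boots += 1
        # Expensive one-off work: config, container, routing table.
        self.routes = {"/": "home", "/about": "about"}
        self.request_scratch = {}

    def handle(self, path):
        # Per-request state must be reset, or one request leaks into the next.
        self.request_scratch = {"path": path}
        view = self.routes.get(path, "not_found")
        return f"{view} for {self.request_scratch['path']}"


def php_style(paths):
    """Recreate the world for every request."""
    return [Kernel().handle(p) for p in paths]


def long_running_style(paths):
    """Boot once, then feed every request to the same kernel."""
    kernel = Kernel()
    return [kernel.handle(p) for p in paths]
```

Both styles return the same responses; the difference is that the long-running style pays the boot cost once, at the price of having to keep every service free of leftover request state.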
Some thoughts
I've heard about React PHP, Ratchet PHP for Websocket integration, Icicle, PHP-PM but never experienced them, it seems a bit too complex to me for now (I may lack some concepts about asynchronicity in apps, that's why my brain won't understand until I have some more answers :D ).
Is there any way that these libraries could be used as "wrappers" for our kernel request handling?
I mean, let's create this reactphp/icicle/whatever environment setup, create our kernel like we would do in any Symfony app, and run the app as a web-server, and when a request is received, we send it asynchronously to our kernel, and as long as the kernel has not sent the response, the client waits for it, even if the response is also sent asynchronously (from nested callbacks, etc., like in NodeJS).
This would make any existing Symfony app compatible with this paradigm, as long as the app is stateless, obviously. (if the app config changes based on a request, there's a paradigm issue in the app itself...)
Is it even a possible reality with PHP libraries rather than using PHP internal web-server in another way?
Why ask these questions?
Actually, it would be kind of a revolution if PHP could implement real asynchronous stuff internally, like Javascript has, but this would also have a big impact on performance in PHP, because of persistent data in our web-server and less bootstrapping (require autoloader, instantiate kernel, get heavy things from cached files, resolve routing, etc.).
In my thoughts, only the $kernel->handleRaw($request); would consume CPU, the whole rest (container, parameters, services, etc.) would be already in the memory, or, for the case of services, "awaiting to be instantiated". Then, performance boost, I think.
And it may troll a bit the people who still think PHP is a very bad and slow language to use :D
For readers and responders ;)
If a core PHP contributor reads this, is there any way that internally PHP could be more asynchronous, even with a specific new internal API based on functions or classes?
I'm not a pro of all of these concepts, and I hope really good experts are going to read this and answer me!
It could be a great advance in the PHP world if all of this was possible in any way.

Why exactly is the internal PHP webserver not recommended for
production? I mean, if we can configure how the server is run and its
"router" file, we should be able to use it like any PHP server, yes or
no?
Because it's not written to behave well under load, and there are no configuration options that let you handle HTTP request processing before it reaches PHP.
Basically, it lacks features if you compare it to nginx. It would be equal to comparing a skateboard to a Lamborghini.
It can get you from A to B but.. you get the gist.
How does it behave internally? Is memory shared between two requests?
By using the router, it seems obvious to me that variables are not
shared, else we could make nodejs-like apps, but it seems PHP is not
capable of doing something like this.
The documentation states it's single-threaded, so it appears that it would behave the same as if you wrote while(true) { // all your processing here }.
It's a playtoy designed to quickly check a few things if you can't be bothered to set up a proper web server before trying out your code.
Is it really possible to make a full-stateless application with
Symfony? e.g. I send two different requests to the same kernel object,
in this case, is there any possibility that the two requests create a
conflict in Symfony core components?
Why would it go to the same kernel object? Why not design your app in such a way that it's not relevant which object or even processing server gets the request? Why not design for redundancy and high availability from the get go? HTTP = stateless by default. Your task = make it irrelevant what processes the request. It's not difficult to do so, if you avoid coupling with the actual processing server (example: don't store sessions to local filesystem etc.)
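A small sketch of "make it irrelevant what processes the request": the workers below keep no session data themselves and read and write it through a shared store. SessionStore and Worker are names made up for this example, and the in-memory dictionary merely stands in for an external store such as a database or cache server.

```python
class SessionStore:
    """Stands in for an external store such as a database or cache server."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


class Worker:
    """One processing server; it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        session = self.store.load(session_id)
        session["visits"] = session.get("visits", 0) + 1
        self.store.save(session_id, session)
        return f"{self.name}: visit {session['visits']}"


store = SessionStore()
workers = [Worker("worker-a", store), Worker("worker-b", store)]
# The same client is served by alternating workers, yet the count is continuous.
replies = [workers[i % 2].handle("client-1") for i in range(4)]
```

Had each worker kept its own dictionary (the equivalent of sessions on the local filesystem), the client would have seen two separate counters depending on which worker answered.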
Actually, the idea of "Create a kernel -> start server -> on request,
make the kernel handle it" behavior would be awesome, because it would
be something quite similar to NodeJS, but actually, the PHP paradigm
is not compatible with this because we would need each request to be
handled asynchronously. But if a kernel and its container is
stateless, then, there should be a way to do something like that,
shouldn't it?
Actually, nginx + php-fpm behave almost identically to node.js.
nginx uses a reactor to handle all connections on the same thread. Node.js does the exact same thing. What you do is create a closure / callback that is fed into Node's libraries and I/O is handled in a threaded environment. Multithreading is abstracted from you (related to I/O, not CPU). That's why you can experience that Node.js blocks when it's asked to do a CPU intensive task.
nginx implements the exact same concept, except this callback isn't a closure written in javascript. It's a callback that expects an answer from php-fpm within <timeout> seconds. Nginx takes care of async for you. Your task is to write what you want in PHP. Now, if you're reading a huge file, then async code in your PHP would make sense, except it's not really needed.
With nginx and sending off requests for processing to a fastcgi worker, scaling becomes trivial. For example, let's assume that 1 PHP machine isn't enough to deal with the amount of requests you're dealing with. No problem, add more machines to nginx's pool.
This is taken from nginx docs:
upstream backend {
    server backend1.example.com weight=5;
    server backend2.example.com:8080;
    server unix:/tmp/backend3;
    server backup1.example.com:8080 backup;
    server backup2.example.com:8080 backup;
}
server {
    location / {
        proxy_pass http://backend;
    }
}
You define a pool of servers and then assign various weights / proxying options related to balancing how requests are handled.
However, the important part is that you can add more servers to cope with availability requirements.
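To get a feel for what the weights in that upstream block mean, here is a simplified model in Python. It is not nginx's actual scheduling code (nginx interleaves the servers more smoothly); it only shows the long-run proportions, assuming the documented default of weight 1 for servers without an explicit weight.

```python
from collections import Counter
from itertools import cycle

# Primary servers from the upstream block; a server without weight= counts as 1.
# The two "backup" servers are left out because they only receive traffic
# when the primary servers are unavailable.
UPSTREAM = [
    ("backend1.example.com", 5),
    ("backend2.example.com:8080", 1),
    ("unix:/tmp/backend3", 1),
]


def weighted_rotation(servers):
    """Yield server names forever, each repeated in proportion to its weight."""
    expanded = [name for name, weight in servers for _ in range(weight)]
    return cycle(expanded)


rotation = weighted_rotation(UPSTREAM)
picks = Counter(next(rotation) for _ in range(70))
```

Out of every 7 requests, 5 go to backend1 and 1 to each of the other two, so adding a machine to the pool is just another entry in the list.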
This is the reason why nginx + php-fpm stack is appealing. Since nginx acts as a proxy, it can proxy requests to node.js as well, letting you handle web socket related operations in node.js (which, in turn, can perform an HTTP request to a PHP endpoint, allowing you to contain your entire app logic in PHP).
I know this answer might not be what you're after, but what I wanted to highlight is that the way node.js works is (conceptually) identical to what nginx does when it comes to handling incoming requests. You could make PHP work as node does, but there's no need for that.

Your questions can be summed up as this:
"Could PHP be more like Node?"
to which the answer is of course "Yes." But that leads us to another question:
"Should PHP be more like Node?"
and now the answer is not that obvious.
Of course in theory PHP could be made more like Node - even to a point to make it exactly the same. Just take the next version of Node and call it PHP 6.0 or something.
I would argue that it would be harmful to both Node and PHP. There is a diversity in the runtime environments for a reason. One of the variations is the concurrency model used in a given environment. Making one like the other would mean less choice for the programmer. And less choice is less freedom of expression.
PHP and Node were created in different times and for different reasons.
PHP was developed in 1995 and the name stood for Personal Home Page. The use case was to add some server-side dynamic features to HTML. We already had SSI and CGI at that point but people wanted to be able to inject right into the HTML - synchronously, as it wouldn't make much sense otherwise - results of database queries and other computations. It isn't a surprise how good it is at this job even today.
Node, on the other hand, was developed in 2009 - almost 15 years later - to create high performance network servers. So it shouldn't surprise us that writing such servers in Node is easy and that they have great performance characteristics. This is why Node was created in the first place. One of the choices it had to make was a 100% non-blocking environment of single-threaded, asynchronous event loops.
Now, single-threading concurrency is conceptually more difficult than multi-threading. But if you want performance for I/O-heavy operations then currently you have no other options. You will not be able to create 10,000 threads but you can easily handle 10,000 connections with Node in a single thread. There is a reason why nginx is single-threaded and why Redis is single threaded. And one common characteristic of nginx and Redis is amazing performance - but both of those were hard to write.
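The claim about handling thousands of connections in one thread can be illustrated with Python's asyncio, which uses the same single-threaded event-loop idea. This is only a sketch: the "connections" are simulated with asyncio.sleep standing in for a network or database wait, and handle_connection and serve are names invented for the example.

```python
import asyncio
import threading
import time


async def handle_connection(conn_id, thread_ids):
    """Simulated request: wait on 'I/O' without blocking the thread."""
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(0.05)  # stands in for a database or network wait
    return conn_id


async def serve(n):
    thread_ids = set()
    started = time.perf_counter()
    results = await asyncio.gather(
        *(handle_connection(i, thread_ids) for i in range(n))
    )
    elapsed = time.perf_counter() - started
    return len(results), len(thread_ids), elapsed


handled, threads_used, elapsed = asyncio.run(serve(10_000))
```

Handled one after another, 10,000 waits of 0.05 seconds would take about 500 seconds; here they overlap on a single thread. The same loop would stall if any handler did CPU-heavy work instead of waiting, which is the blocking behaviour mentioned for Node.js.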
Now, as far as Node and PHP go, those technologies are so far from each other that it's hard to even comprehend what their fusion would look like. It reminds me of the old April Fool's joke about unifying Perl and Python that so many people believed in.
PHP has its strengths and Node has its strengths. And just like it would be hard to imagine Node with blocking I/O, it would be equally hard to imagine PHP with non-blocking I/O.
To summarize: it could be possible to make PHP like Node, but I wouldn't expect it to happen any time soon - if ever.

How is application lifecycle managed in PHP frameworks?

I come from a Java background, where the JVM is a long-running process, servers can be started by user code and application state is normally kept and managed by the framework of choice (e.g. Spring).
In the PHP world, things are stateless, as each script execution is volatile in the sense that it does not keep state (unless it uses an external medium, like a DB, in-memory cache, etc). The web server invokes the script via CGI (probably through php-fpm to optimise resources).
Is it correct that every single HTTP request incurs the overhead of initialising the framework and middleware only for that request? It appears so when reading Laravel's Request Lifecycle, for example.
Doesn't this entail a lot of repetitive overhead for every single request that enters the system (e.g. detecting environment, initialising handlers, routes, ORM, logging, etc.)?
Or am I missing something? Do these frameworks indeed keep state in some manner?

Is there a central / main context in PHP?

Here is my question: Consider Django or web2py in Python (as web frameworks) or Java WEB applications (being simple servlets apps or complex struts2/wicket/whatever frameworks). They share at least two things I like:
There's a Context environment or a way to access data outside of the request or session contexts (i.e. global data, singletons, pools ... anything that can share in-memory values and behavior).
Classes are loaded/initialized ONCE. Perhaps I'm missing something, but AFAIK in PHP a class is loaded and initialized on a PER-REQUEST basis (so, in a regular class, if I (e.g.) modify a static value, this will live only in the current request, and even a simultaneous request hitting that value will get a different one).
Is there a way to get that in PHP? E.g. in Python/Django I can declare a regular class and that class can hold static data or be a true singleton (again: perhaps a pool or a kind of central queue manager), and will be the same object until the Django server dies (note: modules in Python are kept loaded in the Python context when imported).

The fact that PHP's "context" lives on a per-request basis is pretty much core to how the language works with web servers.
If you want to get it working more like Java or other languages where the data doesn't get reset every request, you basically have two options:
Serialize data into a file, DB, whatever, and reload it on the next request
Instead of serving your pages through a web server, write the server using PHP
Serializing data into storage and reloading it on subsequent requests is the typical approach.
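The serialize-and-reload approach looks roughly like this. The sketch is in Python with a JSON file as the storage; the file name and the handle_request function are assumptions made for the example, and a real application would more likely use a database or cache and would need locking for concurrent requests.

```python
import json
import os
import tempfile

# Hypothetical location for the persisted state.
STATE_PATH = os.path.join(tempfile.gettempdir(), "hit_counter_demo.json")


def handle_request(path=STATE_PATH):
    """One 'request': load state, change it, write it back, then forget it."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            state = json.load(fh)
    except FileNotFoundError:
        state = {"hits": 0}  # first request ever: start from scratch
    state["hits"] += 1
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    return state["hits"]
```

Nothing survives in memory between calls, yet each call sees the value left by the previous one, which is the same pattern a per-request PHP script follows.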
Writing a server in PHP itself, while possible, is not something I would recommend. Despite much effort, PHP still has sort of bad memory management, and you are very likely to encounter memory leaks in long-running PHP processes.

Can Node.js have the same functionality as PHP, or should they be used together?

I know the basics of both PHP and Node.js, but I don't understand why some people argue over which one is better... Can Node.js be used to render web pages like PHP can? For example, can you make a BBS using Node.js (I know you have access to DBs, but rendering posts seems to be a problem)? It seems to me like Node.js can be used either as a very simple HTTP server that only serves basic HTML without changing it, or for communication (which I use it for). For example, I'm making a browser MMO game, and I use PHP to serve the site and the forum/devblog, and Node.js for the actual game. Or am I missing something?

Can Node.js be used to render web pages like PHP can?
Yes. There are some differences, but the main idea is to use Node.js on the server-side to serve responses to the requests, similarly to PHP. So again, yes, you can render web pages using Node.js.
Can you make a Bulletin Board System using Node.js?
Yes. Rendering posts can also be done within Node.js. You can render posts in client-side JavaScript too, so why would you think server-side JavaScript (Node.js) would be more limited? For templating see e.g. {{ moustache }} or Pure.
The main advantage of Node.js over "standard" PHP is that JavaScript is event-driven and you use a single thread for all the requests, whereas in PHP you typically use a separate process or thread for every request.
Node.js lets you write your own "server" using only JavaScript. It simplifies a lot if you want to build eg. communication server.
More resources
For more comprehensive solution for rendering web pages in Node.js see eg. ExpressJS Node.js framework. You may specifically be interested in the way ExpressJS allows you to render views.

Someone will give a more detailed answer than me, but basically node.js can replace both Apache and PHP. It's a platform that allows you to create a web application using only JavaScript. It has a different model compared to Apache because node is event-driven (being written in JavaScript) while Apache uses threads.
Anyway, take a look here for a decent tutorial on node; the best way to understand it is by using it, I think.

Node.js and PHP can both be used to build server-side applications, but they have different strengths and weaknesses. While Node.js excels at handling large amounts of I/O and real-time applications, PHP is often used for traditional web applications. They can be used together to complement each other's strengths.

RPC w/ PHP - agnostic to transport mechanism

For a recent project, I have a PHP script running as a CLI-based daemon. This daemon will be responsible for monitoring/controlling independent worker processes.
Periodically, users will issue requests to manage workers through a PHP web front-end (CLI daemon and front-end code are on the same physical server). The front-end will need to make method calls to the daemon.
I'm confused about how to handle these "remote" method calls. I thought that using a RPC protocol such as JSON-RPC over a standard UNIX or TCP socket would be the way to go, but every implementation of JSON-RPC, XML-RPC, SOAP, etc. for PHP seems to be tightly coupled to HTTP. Since I'm not communicating over the web, HTTP is completely unnecessary.
So, two questions:
Why are most of the PHP RPC packages coupled to HTTP?
What is the best way to handle method calls as described above?

Why are most of the PHP RPC packages coupled to HTTP?
This is easy. PHP is tailored for the web. It's rarer to write CLI applications in PHP.
What is the best way to handle method calls as described above?
It's more common to have PHP perform RPC on programs running in another language, such as Java, and there are good options there.
For a CLI PHP program, I'm not aware of any out-of-the-box solution. But it should be possible to implement a custom solution with UNIX sockets. See the sockets extension. Note that the nonexistence of multi-threading support in PHP may make this a bit more difficult (to handle multiple connections you'll have to fork or implement your own single-thread scheduler...)
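To show that the RPC message format does not depend on HTTP, here is a minimal sketch of JSON-RPC 2.0 messages sent over a plain socket. It is written in Python rather than PHP, and several things are assumptions for the sake of a self-contained example: the method names in METHODS are invented, messages are framed with a trailing newline, and socket.socketpair() stands in for the UNIX socket path a real daemon would listen on.

```python
import json
import socket
import threading

# Hypothetical daemon-side methods; the names are made up for this sketch.
METHODS = {
    "worker.count": lambda: 3,
    "worker.restart": lambda worker_id: f"restarted {worker_id}",
}


def serve_one(conn):
    """Daemon side: read one newline-terminated request, send one response."""
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())
        handler = METHODS.get(request.get("method"))
        if handler is None:
            reply = {
                "jsonrpc": "2.0",
                "error": {"code": -32601, "message": "Method not found"},
                "id": request.get("id"),
            }
        else:
            result = handler(*request.get("params", []))
            reply = {"jsonrpc": "2.0", "result": result, "id": request.get("id")}
        stream.write(json.dumps(reply).encode("utf-8") + b"\n")
        stream.flush()


def call(method, *params):
    """Front-end side: one call over a fresh connected socket pair."""
    client, server = socket.socketpair()
    thread = threading.Thread(target=serve_one, args=(server,))
    thread.start()
    with client, client.makefile("rwb") as stream:
        request = {"jsonrpc": "2.0", "method": method, "params": list(params), "id": 1}
        stream.write(json.dumps(request).encode("utf-8") + b"\n")
        stream.flush()
        reply = json.loads(stream.readline())
    thread.join()
    return reply
```

Only the two functions that touch the socket know about the transport; swapping in a TCP socket or a UNIX socket path would leave the request and response dictionaries unchanged.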

You could still use HTTP and connect to localhost, that would not generate any network traffic. I don't think there is any real advantage to be gained by using sockets directly, however if you really want a different transport layer, you could use Ripcord (http://ripcord.googlecode.com/) which allows you to specify your own transport layer class. For full disclosure, I am the author of Ripcord, so I may be biased.
