I want to make the most lightweight possible HTTP server in C that supports PHP and possibly FastCGI if it will make a huge difference.
I'm not sure how to implement PHP support. Does the server just call PHP.exe with the path to a .php file and read the output? What about things like header() in PHP? How are those handled by the server?
And another question, is it ideal to use separate threads for each request? I don't expect a very heavy load, but I'm not 100% sure on the design aspect of this...
I'm still pretty new to C and C++ and this is a learning experience.
Firstly let me say that if the goal is a lightweight HTTP server that serves PHP pages, this has already been done. Have a look at nginx.
As for a learning experience, you've chosen something that's actually fairly tough.
Multithreading is hard at the best of times. In C/C++ (anything with manual memory allocation, really) it's an order of magnitude harder.
Added to this is network communication. There are quirks to deal with, different versions of HTTP (mostly a non-issue now), all sorts of HTTP headers to deal with and so on.
The most intuitive solution to this problem is to have a process that listens to a port. When it receives a request, it spawns a process, which may exec to a PHP process if required.
This however does not scale. The first (obvious) optimization is to use threads instead of processes and some form of interthread communication. While this helps, it will still only scale so far.
Go beyond that and you're looking at async socket handling, which is fairly low level.
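To give a feel for what "async socket handling" means, here is a rough single-threaded sketch built on readiness notification. It is in Python only to keep it short; in C the same structure would sit on select(), poll(), or epoll. The names make_listener and run_loop and the canned response are invented for this illustration, and a real server would still need write buffering, timeouts, and actual request parsing.

```python
import selectors
import socket

# Canned reply used for every request (illustrative only).
RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 lets the OS pick one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    sock.setblocking(False)
    return sock


def run_loop(listener, max_requests):
    """Serve connections from one thread until max_requests replies are sent."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    pending = {}  # socket -> bytes received so far
    served = 0
    while served < max_requests:
        for key, _events in sel.select():
            sock = key.fileobj
            if sock is listener:
                # New client: register it and go back to waiting.
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                pending[conn] = b""
                continue
            data = sock.recv(4096)
            pending[sock] += data
            if data and b"\r\n\r\n" not in pending[sock]:
                continue  # request headers incomplete; wait for more bytes
            if data:
                # Tiny reply, so sendall is assumed not to block here.
                sock.sendall(RESPONSE)
                served += 1
            sel.unregister(sock)
            sock.close()
            del pending[sock]
    sel.close()
```

The point of the design is that no connection ever gets its own thread or process: the loop only touches sockets that are ready, so an idle client costs a dictionary entry rather than a stack.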
All of these however are fairly big projects.
Is there any particular reason you're doing this in C/C++? Or any particular reason you're learning one or both of those languages? These languages certainly have their place but they're increasingly becoming niche languages. Managed (garbage collected) languages/platforms have almost completely taken over. Joel argues that garbage collection is about the only huge productivity increase in programming in about the last 20 years and I tend to agree.
For a learning experience regarding HTTP code written in C you may also take a look at:
http://hping.org/wbox/
To make your own HTTP server, I recommend getting inspiration from other people's code. ry, the programmer famous for the node.js framework, has written simple, elegant code on this matter.
Check out his libebb library. It has a parser generated with Ragel, a state machine compiler (it's based on Zed Shaw's Mongrel parser). Also check the example usage. It is really clean and usable code.
libebb is a lightweight HTTP server library for C. It lays the foundation for writing a web server by providing the socket juggling and request parsing. By implementing the HTTP/1.1 grammar provided in RFC 2616, libebb understands most valid HTTP/1.1 connections (persistent, pipelined, and chunked requests included) and rejects invalid or malicious requests. libebb supports HTTP over SSL.
Regarding PHP-Server coupling, the easiest way is CGI but if you feel adventurous dig into php source code under SAPI (Server API) modules to see how to do it.
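To make the CGI route concrete, here is a minimal sketch of what the server side has to do, written in Python for brevity (the same steps map onto fork/exec and pipes in C). It assumes a CGI-capable interpreter such as php-cgi is installed; the function names run_cgi and parse_cgi_output are made up for illustration. Under CGI the script writes its headers first (including anything set with header() in PHP), then a blank line, then the body, and the server turns an optional Status header into the HTTP status line.

```python
import os
import subprocess


def parse_cgi_output(raw):
    """Split raw CGI output into (status, headers, body)."""
    # Headers end at the first blank line; accept CRLF or bare LF endings.
    cuts = [(raw.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    cuts = [(pos, size) for pos, size in cuts if pos != -1]
    if not cuts:
        raise ValueError("CGI output has no header/body separator")
    pos, size = min(cuts)
    head, body = raw[:pos], raw[pos + size:]

    status = "200 OK"  # CGI default when the script sets no Status header
    headers = []
    for line in head.splitlines():
        name, _, value = line.decode("latin-1").partition(":")
        if name.strip().lower() == "status":
            status = value.strip()  # becomes the HTTP status line
        else:
            headers.append((name.strip(), value.strip()))
    return status, headers, body


def run_cgi(argv, script_path, method="GET", query_string="", body=b""):
    """Run a CGI program (argv) with request data passed via the environment."""
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query_string,
        "SCRIPT_FILENAME": script_path,
        "CONTENT_LENGTH": str(len(body)),
        # Assumption: php-cgi builds with force-cgi-redirect expect this.
        "REDIRECT_STATUS": "200",
    })
    result = subprocess.run(argv, input=body, env=env,
                            stdout=subprocess.PIPE, check=True)
    return parse_cgi_output(result.stdout)
```

With php-cgi on the PATH, a call might look like run_cgi(["php-cgi"], "/path/to/index.php"), where the path is a placeholder. Spawning a process per request is exactly the cost that FastCGI and the SAPI modules avoid.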
Similar to libebb, see http://www.gnu.org/software/libmicrohttpd/. It too uses GnuTLS for optional SSL.
I once saw a post suggesting that PHP was originally built merely for showing static web pages, not for actual programming, and that it therefore has serious drawbacks in long-running execution due to memory leaks.
I know PHP "can" be used to crawl dozens of web pages, process audio/video files, etc., but is it "good" at it? Of course, judging whether it is good at something is a relative assessment. In this case, the comparisons are ASP, node.js, and Python.
Has PHP 7.x been improved or adapted for long-running execution?
This is a general question rather than a specific one, but I think this post may give useful insight to many people.
PHP is not a bad choice for writing long-running processes. The main issue is that starting a long-running process from mod_php or PHP-FPM gets overly complicated.
There are full web servers, WebSocket servers, and more written in pure PHP that are long-running and work quite well. They are perhaps not as fast as Node.js or Python in raw execution, but for network-bound or database-bound workloads I don't think there would be a significant difference.
If you are comfortable programming in PHP, then I would suggest it is a good choice to get started with.
Some examples of PHP-based servers:
https://reactphp.org/
https://github.com/appserver-io/webserver
Both of those examples are non-blocking servers written in PHP.
I'm starting to consider websockets as a solution to replace long polling in a new build PHP app I am commissioning.
I have a few questions which I wonder if people could help me out with.
Can a Node.js server call PHP, and if it did, wouldn't it suffer the same shortcomings as just going through Apache in terms of the connections? We all know Node.js is non-blocking and Apache etc. isn't, but if Node.js is just making a call to a PHP server in its own procedure, would that not bottleneck in a similar way?
Are PHP and websockets a good match?
Are there any good JS libraries besides Socket.IO, which apparently only works with Node.js?
Has anyone found a good tutorial which uses websockets and a PHP backend maybe using something like that Ratchet PHP library which might help me get on my way?
Thoughts would be muchly appreciated.
Please excuse my paraphrasing of your questions.
1: Can Node.js call PHP, and wouldn't that have the same shortcomings as Apache?
Calling a run-once PHP script will have the same general shortcomings as calling a web page, except that you are removing an extra layer of processing. Apache or any web server itself is such a thin layer that, while you'll save some time, the savings will be insignificant.
If PHP is more effective at gathering data for your clients than Node.js, for whatever reason, then it might be wise to include PHP in your application.
2: Are PHP and WebSockets a good match?
Traditional PHP scripts are normally intended to be run once per request. The vast majority of PHP developers are unfamiliar with event driven development, and PHP itself does not (yet) have support for asynchronous processing.
PHP is a fast, mature scripting language that is only getting faster, even with all of its many warts and shortcomings. (Some say that its weak typing is a shortcoming. Others say that it's a shortcoming that its typing isn't weak enough.)
That said, the minimum that any language needs in order to implement WebSockets is the ability to open up a basic TCP port and listen for requests. For PHP, it is implemented as a thin wrapper around the C sockets library, and there are additional extensions and frameworks available that can also change the feel of working in TCP sockets with PHP.
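For a sense of how little is needed once a TCP socket is open, here is a sketch of the server half of the WebSocket opening handshake from RFC 6455. It is written in Python rather than PHP purely for illustration, and the function names are my own; the GUID constant and the SHA-1 plus Base64 steps come from the RFC. Frame parsing, masking, and ping/pong are left out.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key):
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(client_key):
    """Build the raw 101 response that upgrades the connection."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(client_key)}\r\n"
        "\r\n"
    ).encode("ascii")
```

After this response is written to the socket, both sides switch to the framed WebSocket protocol on the same TCP connection, which is why a long-running process is needed on the server.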
PHP's garbage collector has also matured. Memory leaks come either from gross disregard for the memory space (I'm looking at you, Zend Framework) or from intentional sabotage of the garbage collection system by developers who think they're clever or want to prove how easy it is to defeat the GC. (Spoiler: It's easy in every language, if you know the details!)
It is quite possible and very easy to set up a daemon (long running background process) in PHP. It's even possible to make it well behaved enough to gracefully restart and hand its connections off to a new version of the same script, or even the same script on the same server running different versions of PHP, though this is treading out of scope just a tiny little bit.
As for whether it's a good match, that is completely up to the developer. Are you willing, able, and happy to work with PHP to write a WebSockets server, or to use one of the existing servers? Yes? Then you're a good match for PHP and WebSockets.
3: JS Libraries for WebSockets
I honestly haven't researched them.
4: Tutorials for using PHP and Websockets
I'm personally fond of this tutorial: http://www.phpbuilder.com/articles/application-architecture/optimization/creating-real-time-applications-with-php-and-websockets.html
Although I have it on good authority that the specifics of that tutorial will soon be obsolete for that specific WebSockets server. (There will still be an actively maintained legacy branch for that server, though.)
In case of link rot:
Using the PHP-Websockets server (available on GitHub, will be homed soon), extend the base WebSocketServer abstract class and implement the abstract methods process(), connected(), and closed().
There's much better information at the link above, though, so follow it as long as the link exists.
It would hit the same bottleneck if you go through Apache. This can be remedied by using a different web server, such as lighttpd or nginx. You won't even need Node at all.
PHP does not have decent shared memory, which makes the biggest advantages of WebSockets irrelevant. It should be decent enough if you don't want interaction between users, but even then I would have to frown upon the usage of PHP. PHP is great for a lot of things, but real-time communication is not one of them.
You might want to look at https://github.com/einaros/ws.
PHP is not a good back-end. Anything with an execution model that isn't run-and-forget in its own sandbox, such as Node, .NET, C/C++ and Java, is a good match. PHP is suited for short running executions, such as actual web sites and even web services -- but not real time connections.
Per this post here, there are 3 ways:
(1) do the whole thing in C++, making your program a standalone web server (possibly proxying through Apache to provide things like SSL, static media, authentication, etc.)
(2) run C++ in a cgi-bin, through Apache
(3) make a PHP wrapper that shells out to the C++ part (this is a nice option if the performance-critical part is small, as you can still use the comfort that PHP's garbage collection and string manipulation gives you)
I'm not sure which is best so I looked at what a high volume site does. Here is a post from Facebook in 2010
They use a static analysis tool, HipHop, to convert PHP to C++.
I don't need the static analysis tool as I only have about 1500 lines and can convert by hand...but I need a starting point.
Right now I run a LAMP stack and want to stay on it minus the (P)HP.
Here is a link that explains how Facebook works. Not sure how accurate it is.
Thanks
As the comments note, Facebook is almost certainly using a highly-customized solution that involves high administration costs in return for very high efficiency. It is unlikely that this is actually what you want.
Since what you want is simply to replace the "P" in your LAMP stack, that implies that you probably want to keep the "LAM" -- the Linux, Apache, and MySQL (if relevant) parts. That's a good idea; while there are advantages at Facebook's scale to running a custom web server, it is extremely unlikely that it will actually be useful for you, and continuing to run Apache is certainly much easier and simpler. (And probably more secure, since you don't have to think about the security and fix bugs all by yourself.)
And you're planning to translate all your PHP, not just part of it, so calling C++ from PHP doesn't make sense.
Thus, in your case, the best solution is most likely to be running the C++ application via cgi-bin with your existing Apache server.
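For orientation, the contract a cgi-bin program has to honour is small: read the request from environment variables (and stdin for POST bodies), then write headers, a blank line, and the body to stdout. The sketch below shows that contract in Python only because it is compact; a C++ port would use getenv() and std::cout the same way. The function name build_response and the "name" parameter are invented for this example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def build_response(environ):
    """Return the bytes a CGI program would write to stdout."""
    # Apache passes the part of the URL after '?' in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"<p>Hello, {escape(name)}</p>\n".encode("utf-8")
    headers = (
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separates headers from body
    )
    return headers.encode("ascii") + body


if __name__ == "__main__":
    # When run from cgi-bin, the web server captures stdout as the response.
    sys.stdout.buffer.write(build_response(os.environ))
```

Everything PHP did implicitly (query parsing, escaping, content type) becomes explicit, which is most of the work in a hand conversion of this kind.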
FastCGI is a much better option than CGI, and can act like CGI in certain circumstances. If you only want to work with Apache, you can also develop an Apache module, and there's an excellent book on the subject: The Apache Modules Book. This describes many elements of C development with Apache acting in many ways like a (sort of) application server.
With careful C/C++ coding, you can achieve remarkable performance with limited memory. Not for everything, but in some circumstances, very powerful.
There have been great things happening in the Haskell web development world, and some of the available frameworks (Yesod and Snap server) seem quite mature. However the learning curve can be a bit steep, and perhaps building web apps cannot quite be considered Haskell's forte.
The answer to another SO question of mine indicates that writing PHP extensions in Haskell should be possible. In fact, I'm currently in the process of trying to convert a small Haskell program to a PHP extension as a proof of concept.
So, the question is - is there a case for creating a Haskell web framework that is meant to be run as a PHP extension and leaves all the request/response / cookies etc. handling to PHP?
What would be the design decisions involved in creating such a framework? Right now, the only thing I can think of is that it would probably expose an XML/JSON API accessible by the PHP pages using GET and SET function calls.
I can't think of a use case where this makes any sense. If you want something else to handle the HTTP request/response, you'd be better off writing to the Apache API directly.
Introducing PHP gives you argument parsing and cookie handling but also introduces a lot of other silliness. Not only are many of the common practices very unsafe or insecure, but you are limited to content generation -- if you want to dispatch to other parts of code based on the URL you have to write all that yourself. Many mature PHP programs end up just having one "start" PHP script. You also will have problems if you want to do anything interesting with uploaded files, because PHP handles that in a suboptimal way.
You could theoretically do something very processor intensive in your Haskell extension, but you might as well just write a C extension for PHP in that case. And PHP invocations are never supposed to hang around for very long anyway.
Seems like you are limiting yourself to PHP's brain-damaged model of a web application for the very trivial benefit of argument and header parsing.
Writing a Haskell interface to the Apache API could potentially be liberating. You could rely on a battle-tested web server, and also hook into every phase of the Apache request cycle. Apache's way of preforking and killing children every now and then might be a way of dealing with Haskell space leaks, although it's a sledgehammer approach.
Most of my application is written in PHP (front and back ends).
There is a part that works too slowly and I will need to rewrite it, probably not in PHP.
What will give me the following:
1. Most speed
2. Fastest development
3. Easily maintained.
I have in mind to rewrite this piece of code in C++ as a PHP extension, but maybe I am locked onto this solution and am missing some simpler/better solutions?
The code in question runs PorterStemmerAlgorithm on several MB of data each time it is run.
The answer really depends on what kind of process it is.
If it is a long running process (at least seconds) then perhaps an external program written in C++ would be super easy. It would not have the complexities of a PHP extension and its stability would not affect PHP/Apache. You could communicate over pipes, shared memory, or the like...
If it is a short running process (measured in ms) then you will most likely need to write a PHP extension. That would allow it to be invoked VERY fast with almost no per-call overhead.
Another possibility is a custom server which listens on a Unix Domain Socket and will quickly respond to PHP when PHP asks for information. Then your per-call overhead is basically creating a socket (not bad). The server could be in any language (c, c++, python, erlang, etc...), and the client could be a 50 line PHP class that uses the socket_*() functions.
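Here is a rough sketch of that socket idea, assuming a Unix-like system and a simple one-line-in, one-line-out protocol that I made up for the example. The server is in Python because the language is free to choose; ask() plays the role the small PHP class with socket_*() calls would play. The names listen_unix, serve, and ask are hypothetical, and the handler is whatever function does the real work.

```python
import os
import socket


def listen_unix(path):
    """Bind a listening Unix domain socket at path, replacing a stale one."""
    if os.path.exists(path):
        os.unlink(path)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen()
    return srv


def serve(srv, handle, max_connections):
    """Answer each request line with handle(line) for a fixed number of clients."""
    with srv:
        for _ in range(max_connections):
            conn, _addr = srv.accept()
            with conn, conn.makefile("rwb") as stream:
                for line in stream:
                    stream.write(handle(line.rstrip(b"\n")) + b"\n")
                    stream.flush()


def ask(path, request):
    """Client side: connect, send one line, read one line back."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(request + b"\n")
        with client.makefile("rb") as stream:
            return stream.readline().rstrip(b"\n")
```

The per-call cost on the client side is one connect plus one round trip, and the server process keeps any expensive state loaded between calls. A production version would loop forever and handle clients concurrently rather than one at a time.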
A lot of information needs to be evaluated before making this decision. PHP does not typically show slowdowns until you get into really tight loops or thousands of repeated function calls. In other words, the overhead of the HTTP request and network delays usually make PHP delays insignificant (unless the above applies).
Perhaps there is a better way to write it in PHP?
Are you database bound?
Is it CPU bound, Network bound, or IO bound?
Can the result be cached?
Does a library already exist which will do the heavy lifting?
By committing to a custom PHP extension, you add significantly to the base of knowledge required to maintain it (even above C++). But it is a great option when necessary.
Feel free to update your question with more details, and I'm sure Stack Overflow will be happy to help out.
Suggestion
The PorterStemmerAlgorithm has a C implementation available at http://tartarus.org/~martin/PorterStemmer/c.txt
It should be an easy matter to tie this C program into your data sources and make it a standalone executable. Then you could simply invoke it from PHP with one of the proc functions, such as proc_open().
Unless you need to invoke this program many times per PHP request, this approach should save you the effort of building and integrating a PHP extension, not to mention that the hard work (in C) is already done.
I'm not sure what the PorterStemmerAlgorithm is. However, if you could make your process run in parallel and collect the information together, you could look at parallel running processes, which are easily implemented in Java. Not sure how you could call it from PHP, but it would definitely be maintainable.
You can have a look at this framework. It looks simple to implement.
https://computefarm.dev.java.net/
Regards,
Franklin.
If you absolutely need to rewrite in a different language for speed reasons then I think gahooa's answer covers the options nicely. However, before you do, are you absolutely sure you've done everything you can to improve the performance of the PHP implementation?
Is caching the output viable in your situation? Could you get away with running the algorithm once and caching the output rather than on every page load?
Have you tried profiling the code to ensure there's no unnecessary work being done (db queries in an inner loop and the like)? Xdebug can help here.
Are there other stemming algorithms available which might perform better on your dataset?
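On the caching question above, one cheap variant is to cache per word rather than per page, since natural-language text repeats the same words constantly. The sketch below shows the idea in Python; the same pattern works in PHP with an array keyed by word. make_cached and stem_text are names invented here, and the stemmer passed in is assumed to be a pure function of the word.

```python
from functools import lru_cache


def make_cached(stem, maxsize=100_000):
    """Wrap a stemming function so each distinct word is stemmed only once."""
    return lru_cache(maxsize=maxsize)(stem)


def stem_text(text, stem):
    """Lower-case the text, split on whitespace, and stem every word."""
    return [stem(word) for word in text.lower().split()]
```

If profiling shows that most of the time goes into repeated words, this alone may remove the need for a rewrite; if the input is mostly unique tokens, it will not help much.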