PHP vs. C CGI for small web service

I am creating a small web service that will only be accessed by machines, not users. It simply takes a query string and makes a few MySQL queries. I decided to code it in PHP because PHP is simple, easy to write, and does the job well. My boss, however, wants me to write it as a CGI in C (using FastCGI), because he says it will be faster and use less memory. I'm not so keen on this idea, for a few reasons:
The MySQL API for C seems to have a lot more calls than the PHP equivalent, and needs a lot more error handling.
String manipulation in C is somewhat complicated and messy.
The code in C is almost 3 times as long as the equivalent code in PHP and looks rather messy, with lots of error handling.
But that's just my opinion. What other factors do I need to take into account? Is C the best tool for this job? Or is PHP?
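For a sense of scale, the whole PHP side is roughly as short as the sketch below (an illustration only, not my actual service: the widgets table, the id parameter, and the credentials are made up). The C version needs mysql_init(), mysql_real_connect(), mysql_query() and mysql_store_result() calls, each followed by an error check.

    <?php
    // Hypothetical handler: table, parameter and credentials are invented.
    $db = new mysqli('localhost', 'user', 'pass', 'mydb');
    if ($db->connect_error) {
        header('HTTP/1.1 500 Internal Server Error');
        exit;
    }
    $stmt = $db->prepare('SELECT name, price FROM widgets WHERE id = ?');
    $stmt->bind_param('i', $_GET['id']);
    $stmt->execute();
    $stmt->bind_result($name, $price);
    header('Content-Type: text/plain');
    while ($stmt->fetch()) {
        echo "$name\t$price\n";
    }
    $stmt->close();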

If speed is your (or your boss's) concern, check out the G-WAN server, which allows you to write C scripts. There are some MySQL samples in the forum. It'll be much faster than FastCGI (which has to cross process boundaries via sockets).

Some points about this:
It's much easier to implement any web service in PHP than in C, because PHP is a language designed for the web while C is a general-purpose language.
C is compiled, which makes it much faster than PHP, but if you are really worried about optimization (your program consumes a lot of resources and your hardware is very limited) you can use a PHP "accelerator" that caches the compiled opcodes, APC for example.
Database access is equally penalized in both languages, so if your program has to access the DB a lot, it will have similar performance either way.
The most important point: the boss is the boss :)

IMHO, if most of the processing is done by MySQL, there is no need to write this code in C, because the difference is negligible; but if a lot of processing will be done by your own code, it makes sense to listen to your boss and write it in C.

Ask your boss if it wouldn't be better if you used C++.
Remind him about the maintainability of the code, and that with C++ you can introduce fewer bugs, at the price of just a little speed.
Remind him that the "time" needed for fixing or improving software is effectively part of its "execution time": the time you'd have the service suffer because of a little mistake.
And try fastcgi++. You've also got Boost and other libraries which can help you do the boring stuff faster, so you can concentrate on WHAT MATTERS.
And if the application does lots of calculations, it is a good idea to make it in C / C++.
Otherwise, remind your boss that usually the application won't be slowed down by computations, but by "waiting for resources"; waiting for the disk or waiting for the database are the most prevalent cases, and that happens no matter the language.
Remind your boss that, speed-wise, you can get about a 10-60% improvement just by using APC.
If he simply doesn't want, then you have two choices:
quit the job
do as he asks you to
Why C++, and not C
It is true that with good coding conventions, you can write manageable code in C too.
But with C, you still have the verbosity of error handling (as opposed to exceptions), of NUL-terminated strings, of hand-rolled maps and what not. I too love C more than C++, but let's face it:
this is not about the language; it's about the mentality of a despot (the boss) who needs to be met with arguments and to learn to lend an ear to his employees.

Both tools are OK. If your boss wants C, make it in C.

You've got no choice, because your boss wants it that way. Unless you can't code in C, just C it.

Your app is most likely I/O bound, so it won't even be noticeably faster in C.

Shudder... You should leave the company. A boss shouldn't be allowed to micromanage the workers... unless he's also the programming guru of the company :)


Embed C in PHP to improve speed

I'm thinking of writing a PHP extension from C, just to improve the speed. strpos() and preg_match() etc. are way too slow for my project.
But it struck me that strpos() and preg_match() must themselves have originally been written in C or some other low-level language.
So here is my question: is it worthwhile to write an extension in C just to improve computation speed?
It might be useful if you can identify a "self-contained" bottleneck. PHP is still a scripting language: there are a lot of lookup operations and some memory operations which can be optimized away in C, maybe a handle/value/memory block from one of the underlying libraries that you could store or use more efficiently in your specific case, and so on.
But make sure that the code block you're touching is worth the effort, i.e. first identify the bottleneck. Run a PHP profiler (e.g. Xdebug), and then maybe even a C profiler, to see where the time is spent in the PHP runtime. And keep in mind that if you write the extension, it's your job to keep it up to date, running and functional (including bug tracking/fixing, quality assurance, ...).
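For reference, Xdebug's profiler is switched on from php.ini; the setting names below are the Xdebug 2 ones (check the docs for your version), and the output directory is just an example. The resulting cachegrind.out.* files can be opened in KCacheGrind or Webgrind.

    ; php.ini - enable Xdebug's profiler (Xdebug 2 setting names)
    xdebug.profiler_enable = 1
    ; where the cachegrind.out.* files get written (example path)
    xdebug.profiler_output_dir = /tmp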
It is really cool that you are interested in writing some extension in PHP.
Please go through the link below to understand more about how Facebook started the HipHop project to increase speed; it achieves this by transforming PHP code into compiled C++.
http://developers.facebook.com/blog/post/2010/02/02/hiphop-for-php--move-fast/
But instead of rewriting an extension PHP already has, try writing a new one; you will find many articles on writing a new PHP extension.
The existing extensions are already optimized, so if you want to do some specific work, and have a good algorithm to support it, go for writing your own extension.
It's not proven, but I think you can't gain significantly better speed by just doing your own low-level implementation of regular expressions or string scanning.
PHP is written in C and is highly optimized already.
Check your code and improve the flow.
If that's impossible, take a look at HipHop from Facebook.
It's not meaningful to write another implementation of strpos() or preg_match() in C, because PHP has already implemented them in C.
Rather, it's meaningful to optimize your PHP code so that it uses those functions properly instead of abusing them.
Still, if you really want to speed things up by providing yet another implementation, it will help if and only if yours is actually faster; otherwise it's just a waste of time and labour.
You can have a look at the PHP source code, check the current implementation of these functions, and see whether you can really improve on them.
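As a small illustration of "using instead of abusing" (needle and haystack made up): a regex is overkill for a plain substring test, and strpos() is the cheaper built-in.

    <?php
    // Abusing a regex where a substring test is enough:
    if (preg_match('/foo/', $haystack)) { /* ... */ }
    // Cheaper: strpos() with a strict comparison, because a match at
    // offset 0 returns 0, which is falsy.
    if (strpos($haystack, 'foo') !== false) { /* ... */ }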

Server side execution of user submitted code

Here is my situation. I am building an application that contains some heavy mathematical calculations where the formula needs to be editable by a sufficiently privileged, but untrusted, user.
I need a secure server side scripting language. I need to be able to access constants and values from 4+ database tables, the results of previous calculations, define user variables and functions, use if/then/else statements, and I'm sure more that I can't think of right now.
Some options I've considered:
I have considered using something like this matheval library but I would end up needing to extend it considerably for my use case. I would essentially be creating my own custom language.
PHP runkit sandbox. I've never used this before but am very concerned about the security issues involved. Considering the possible security issues, I don't think that this is a viable option.
One other idea that has crossed my mind, which I don't know is possible, would be to use something like JavaScript on the server side. I've seen JS used as a scripting platform in desktop applications to extend functionality, and it seems a similar approach may be feasible here. I could ideally define the environment that things ran in, such as disabling filesystem access, etc. Again, security seems like it would be an issue.
From the research I have done, it seems like #1 is probably my only option, but I thought I would check with a larger talent pool. :-)
If #3 is possible, it seems that it would be the way to go, but I can't seem to turn up anything that is helpful. On the other hand, there may not be much difference between #2 and #3.
Performance is another consideration. There will be roughly 65-odd formulas, each executing about 450 times. Each formula will have access to approximately 15 unique variables, a hundred or so constants, and the results of previous formulas. (Yes, there is a specific order of execution.)
I can work with an asynchronous approach to calculation where the calculation would be initiated by a user event and stored in the db, but would prefer to not have to.
What is the best way to work with this situation? Are there any other third party libraries that I haven't turned up in my research? Is there another option in addition to my 3 that I should consider?
There's almost no reason to create a custom language today. There are so many available and hackable languages that writing your own is really a waste of time.
If you're not serving a zillion users (for assorted values of a zillion), most any modern scripting language is securable, especially if you're willing to take draconian measures to do so (such as completely eliminating I/O and system interfaces).
JavaScript is a valid option. It's straightforward to create mini-sandboxes within JS itself to run foreign code. If you want folks to be able to persist state across runs, simply require them to store it in "JSON-like" JS structures that can be readily serialized from the system on exit, and just as easily reloaded. These can even be the results of the function.
If there's a function or routine you don't want them to use, you can un-define it before firing off the foreign code. Don't want them using read to read a file? read = function (s) {};
Obviously you should talk to the mailing lists of the JS implementation you want to use to get some tips for better securing it.
But JS has good support, is well documented, and the interpreters are really accessible.
You have two basic choices:
a) Provide your own language, in which you completely control what is done, so nothing bad can happen,
b) Use some other execution engine, and check everything it does to verify nothing bad happens.
My problem with b) is it is pretty hard to figure out all the bad things somebody might do in obscure ways.
I prefer a), because you only have to give them the ability to do what you allow.
If you have a rather simple set of formulas you want to process, it is actually pretty easy to write a parser/evaluator. See Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
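To give a flavour of how small such an evaluator can start out, here is a sketch in PHP of a recursive-descent evaluator for +, -, *, / and parentheses. It is illustrative only (no variables, user functions, unary minus or error handling yet); those get added the same way.

    <?php
    // Minimal sketch of option (a): tokenize, then recursive descent.
    class Evaluator {
        private $tokens;
        private $pos;
        public function evaluate($expr) {
            preg_match_all('/\d+\.?\d*|[()+\-*\/]/', $expr, $m);
            $this->tokens = $m[0];
            $this->pos = 0;
            return $this->expression();
        }
        private function expression() {     // term (('+'|'-') term)*
            $value = $this->term();
            while ($this->peek() === '+' || $this->peek() === '-') {
                $op = $this->tokens[$this->pos++];
                $value = ($op === '+') ? $value + $this->term()
                                       : $value - $this->term();
            }
            return $value;
        }
        private function term() {           // factor (('*'|'/') factor)*
            $value = $this->factor();
            while ($this->peek() === '*' || $this->peek() === '/') {
                $op = $this->tokens[$this->pos++];
                $value = ($op === '*') ? $value * $this->factor()
                                       : $value / $this->factor();
            }
            return $value;
        }
        private function factor() {         // number | '(' expression ')'
            if ($this->peek() === '(') {
                $this->pos++;               // consume '('
                $value = $this->expression();
                $this->pos++;               // consume ')'
                return $value;
            }
            return (float) $this->tokens[$this->pos++];
        }
        private function peek() {
            return isset($this->tokens[$this->pos]) ? $this->tokens[$this->pos] : null;
        }
    }
    $e = new Evaluator();
    echo $e->evaluate('2 * (3 + 4.5)');     // prints 15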
It isn't clear to me that you have a performance problem. Yes, you want to execute something 450 times, but that includes database accesses, whose cost will dominate any computation involving 1000 arithmetic steps. You may find that your speed is limited by DB access, and that you need to cache the DB accesses to make it go faster.

Multi-tier applications with PHP?

I am relatively new to PHP, but I am an experienced Java programmer in complex enterprise environments with SOA architectures and multi-tier applications. There, we'd normally implement business applications with the business logic on the middle tier.
I am programming an alternative currency system, which should be easily deployable and customizable by individuals and communities; it will be open source. That's why PHP/MySQL seems the best choice for me.
Users have accounts, and they get a balance. Also, the system calculates prices depending on total services delivered and total available assets.
This means that on a purchase a series of calculations happens: the balance and the totals get updated. These are derived figures, something normally not put into a database.
Nevertheless, I resorted to putting triggers and stored procedures into the DB, so that none of these updates are made in the PHP code.
What do people think? Is that a good approach? My experience suggests to me that this is not the best solution, and prompts me to implement a middle tier. However, I would not even know how to do that. On the other hand, what I have so far with stored procs seems to me the most appropriate.
I hope I made my question clear. All comments appreciated. There might not be a "perfect" solution.
As is the tendency these days, getting away from the DB is generally a good thing: you get easier version control and you get to work in just one language. Beyond that, I feel that stored procedures are a hard way to go. On the other hand, if you like that stuff and feel comfortable with SPs in MySQL, they're not bad; my feeling has always been that they're harder to debug and harder to handle.
On the triggers issue, I'm not sure whether that's necessary for your app. Since the events that trigger the calculations are invoked by the user, those things can happen in PHP, even if the user is redirected to a "waiting" page or another page in the meantime. True triggers can only be done at the DB level; you could approximate them with a daemon that runs a PHP script every X seconds, but avoid that at all costs and try to get the event to trigger from the user side.
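If the calculations do move into PHP, the usual way to keep the integrity that triggers gave you is to wrap the updates in a transaction. A rough sketch, where the table and column names are invented and $pdo is assumed to be an existing PDO connection to a transactional (InnoDB) database:

    <?php
    // Hypothetical purchase handling in PHP instead of a DB trigger.
    $pdo->beginTransaction();
    try {
        $pdo->prepare('INSERT INTO purchases (account_id, amount) VALUES (?, ?)')
            ->execute(array($accountId, $amount));
        $pdo->prepare('UPDATE accounts SET balance = balance - ? WHERE id = ?')
            ->execute(array($amount, $accountId));
        $pdo->commit();
    } catch (Exception $e) {
        $pdo->rollBack();                   // nothing is half-applied
        throw $e;
    }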
All of this said, I wanted to plug my favorite solution for the data access layer on PHP: Doctrine. It's not perfect, but PHP being what it is, it's good enough. Does most of what you want, and keeps you working with objects instead of database procedures and so forth.
Regarding your title: multiple tiers are totally doable in PHP, but you have to build them and respect them. PHP code can call other PHP code, and the language is now (5.2+) nicely OO and all that. Do make sure to ignore the fact that a lot of PHP code you'll see around is total crap and does not even use methods, let alone tiers and decent OO modelling. It's all possible if you want to do it, including writing your own (or using an existing) MVC solution.
One issue with pushing lots of features to the DB level, instead of a data abstraction layer, is that you get locked into the DBMS's feature set. Open source software is often written so that it can be used with different DBs (certainly not always). It's possible that down the road you will want to make it easy to port to postgres or some other DBMS. Using lots of MySQL specific features now will make that harder.
There is absolutely nothing wrong with using triggers, stored procedures, and other features provided by your DB server. It works and works well; you are using the full potential of the DB instead of relegating it to being a simplistic data store.
However, I'm sure that for every developer on here who agrees with you (and me), there are at least as many who think the exact opposite and have had good experiences with doing that.
Thanks guys.
I was using DB triggers because I thought it might be easier to control transaction integrity that way. As you might realize, I am a developer who is also trying to get a grip on DB knowledge.
Now I see there is the option of spreading the PHP code over multiple tiers, not only logically but also physically, by deploying on different servers.
However, at this stage of development, I think I'll stick to my triggers/SP solution, as that doesn't feel that wrong. Distributing over multiple layers would require me to redesign my app considerably.
Also, thinking open source: if someone likes the alternative money system, it might be easier for people to just change the layout for their requirements, while I would not need to worry about calculations going wrong if people touch the PHP code.
On the other hand, of course, I agree that DB stuff might get very hard to debug.
The DB init scripts are in source control, as are the php files :)
Thanks again

Is it worth it to compile a C program and run it instead of a PHP page?

It seems that most of the time the speed gained is not worth it; is that so? Otherwise many people would do it for their most popular pages. Is there a real benefit to using a C program? I can think of a case where it is not important: when the network bottleneck on the server is much bigger than the CPU bottleneck, how fast the program runs matters less.
C is an excellent language. But it was designed for systems level programming not making web pages. PHP on the other hand was designed for making web pages. Use the right tool for the right job. In this case PHP.
Also, you're starting with a faulty premise, namely that PHP won't be fast enough to deliver the page content. There are a multitude of websites out there that simply disagree with that statement. Maybe there is some corner case out there where C is the only choice for the job, but I find it highly unlikely that you are going to run into that scenario.
When you use C as a hammer, everything looks like a thumb.
As Jared stated above, use the right tools. You could do it all in C; many have. But the development speed of PHP vs. C for the web is something you might look into also. Something that is pretty simple to do in PHP (dynamic arrays, for example) is not simple in C.
Most websites use in-memory caching for high-speed responses.
In that case the load is not in the creation of the page anymore, so I wouldn't go for a C program.
I would say that even in the case you illustrated, it would never be worth it to redesign your server-side work to run as a C program, regardless of the amount of traffic.
The cost of upgrading your server or implementing effective caching will always be less than the work required to rewrite your server-side code in C.
It really depends on the code. For instance, I'd rather compile C code that deals with my network graph algorithm; however, I'd never convert a simple static page that displays minimal CSS.
If you even need to ask the question, the answer is no.
Web-based apps are limited by network, database, I/O, and CPU, in that order. The speed increase you would get by writing your code in C would certainly be offset by the increased development time, due to the nature of the language, and the app would still be affected far more markedly by those other factors.
If your code needs to run really fast, doing something quite computationally difficult or involved, perhaps C could be the right answer, but most web sites are not like this.
It could be worthwhile having very specific processing or heavy lifting code rewritten as C modules once it is determined that the current implementation is too slow, but these optimisations are best done down the track, with sure knowledge of the real bottlenecks in the app.
If you have a decent library or platform, e.g. .NET MVC, using C# for general web dev is a feasible idea. C, not so much.
There is one scenario where it is worth it (in all other cases I can think of, the answer is that it won't be; good reasoning is in the other answers).
So the case is when:
the program needs to do a lot of data processing or calculations
there is heavily optimized C code available
I'd say a complex raytracing system is a candidate for this situation, but nothing below that level of complexity.
Even in this case, I'd implement the pages in PHP and run a background job to do the heavy lifting.

Seriously, should I write bad PHP code? [closed]

I'm doing some PHP work recently, and in all the code I've seen, people tend to use few methods. (They also tend to use few variables, but that's another issue.) I was wondering why this is, and I found this note "A function call with one parameter and an empty function body takes about the same time as doing 7-8 $localvar++ operations. A similar method call is of course about 15 $localvar++ operations" here.
Is this true, even when the PHP page has been compiled and cached? Should I avoid using methods as much as possible for efficiency? I like to write well-organized, human-readable code with methods wherever a code block would be repeated. If it is necessary to write flat code without methods, are there any programs that will "inline" method bodies? That way I could write nice code and then ugly it up before deployment.
By the way, the code I've been looking at is from the Joomla 1.5 core and several WordPress plugins, so I assume they are people who know what they're doing.
Note: I'm pleased that everyone has jumped on this question to talk about optimization in general, but in fact we're talking about optimization in interpreted languages. At least some hint of the fact that we're talking about PHP would be nice.
How much "efficiency" do you need? Have you even measured? Premature optimization is the root of all evil, and optimization without measurement is ALWAYS premature.
Remember also the rules of Optimization Club.
The first rule of Optimization Club is, you do not Optimize.
The second rule of Optimization Club is, you do not Optimize without measuring.
If your app is running faster than the underlying transport protocol, the optimization is over.
One factor at a time.
No marketroids, no marketroid schedules.
Testing will go on as long as it has to.
If this is your first night at Optimization Club, you have to write a test case.
I think Joomla and WordPress are not the greatest examples of good PHP code, no offense. I have nothing personal against the people working on them, and it's great how they enable people to have a website or blog. I know a lot of people spend all their free time on those projects, but the code quality is rather poor.
Review the security announcements over the past year if you don't believe me; and if you are looking for performance from either of the two, their code does not excel there either. So it's by no means good code, but WordPress and Joomla both excel on the front end: they're pretty easy to use, and people get a website and can do stuff.
And that's why they are so successful: people don't select them based on code quality but on what they enable them to do.
To answer your performance question: yes, it's true that all the good stuff (functions, classes, etc.) slows your application down. So I guess if your application/script is all in one file, so be it. Feel free to write bad PHP code then.
As soon as you expand and start to duplicate code, you should consider the trade off (in speed) which writing maintainable code brings along. :-)
IMHO this trade off is rather small because of two things:
CPU is cheap.
Developers are not cheap.
When you need to go back into your code six months from now, think about whether those nanoseconds saved running it still add up when you have to fix a nasty bug (three or four times over, because of duplicated code).
You can do all sorts of things to make PHP run faster. Generally people recommend a cache, such as APC. APC is really awesome. It runs all sorts of optimizations in the background for you, e.g. caching the bytecode of a PHP file and also provides you with functions in userland to save data.
So, for example, if you parse a configuration file each time you run a script, disk I/O is really critical. With a simple apc_store() and apc_fetch() you can put the parsed configuration file into a memory-based (RAM) cache and retrieve it from there until the cache expires or is deleted.
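A minimal sketch of that pattern; the cache key, the five-minute TTL, and the parse_config() helper are invented for illustration:

    <?php
    // Fetch the parsed config from APC's userland cache, or rebuild it.
    $config = apc_fetch('app_config');
    if ($config === false) {                // miss or expired
        $config = parse_config('/etc/myapp.ini');   // hypothetical parser
        apc_store('app_config', $config, 300);      // cache for 5 minutes
    }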
APC is not the only cache, of course.
You should see the responses to this question: Should a developer aim for readability or performance first?
To summarize the consensus: Unless you know for a fact (through testing/profiling) that your performance needs to be addressed in some specific area, readability is far more important.
In 99% of cases, you should worry more about code understandability. Write code that is easy to test, understand and maintain.
In those few cases where performance really is critical, scripting languages like PHP are not your best choice. There's a reason many base library functions in PHP are written in C, after all.
Personally, while there may be overhead for a function call, if it means I write the code once (parameterized), and then use it in 85 places, I'm WAY further ahead because I can fix it in one place.
Scripting languages tend to give people the idea that "good enough" and "works" are the only criteria to consider when coding.
Especially with a fast interpreter like PHP's, I don't think lack of readability/maintainability is EVER worth the efficiency you may (or may not!) gain from it.
And a note about WordPress: I've done a lot of browsing of the WordPress code. Don't assume those people know anything about good code, please.
To answer your first question: yes, it is true, and it is also true for compiled opcode. And yes, you can make your code faster by avoiding function calls, except in extreme cases where your code grows too large because of code duplication.
You should do what you like: "write well-organized, human-readable code with methods wherever a code block would be repeated."
If you're going to commit the horrible atrocity of removing all function calls, at least use a profiler and only do it to the 10% of your code that matters.
An example of how micro-optimization leads to macro slowdowns:
If you're seriously considering manually inlining functions, consider manually unrolling loops.
JMPs are expensive, and if you can eliminate loops by unrolling and also eliminate all conditional blocks, you'll eliminate all that time wasted merely seeking around the CPU's cache.
Variable augmentation at runtime is slow too, as is pulling things out of a database, so you should inline all that data into your code as well.
Actually, loading up an interpreter merely to execute code and copy memory out to a user is exhaustively wasteful. Why don't we just pre-compute all the possible pages and store each page in memory, ready to go, so it's just a mem-copy? Surely that's fast!
Ah, but now we've got that slow thing called the Internet between us, which hinders the user experience and limits how much content we can use. How about we pre-compute the pages in advance, archive them all, and run them on the user's local machine? That'll be really fast!
But that's going to waste CPU cycles, lots of them, what with page load time and browser content rendering, etc. We'll skip the middleman and just deliver the pages on printed media! Genius!
/me watches your company collapse on its face while you spend 10 years precomputing (by hand) and printing pages nobody wants to see.
This may sound silly to you, but to the rest of us, what you proposed is just that ridiculous.
Optimisation is good, but draw the line somewhere sensible, so you don't have to worry about the future people who work on the code tracking you down in your sleep for leaving behind a crappy codebase that's unmaintainable.
Note: yes, I use Gentoo. How did you guess?
Of course you shouldn't write bad PHP code. But once you have something badly written, you may always use performance as an excuse :-)
This is premature optimization. While it is true that a function call costs more than incrementing a local integer variable (nearly everything costs more), the cost of a function call is still very low compared to a database query.
See also:
Wikipedia -> Optimization -> When to optimize
c2.com Wiki -> Premature Optimization
PHP's main strength is that it's quick and easy to get a working app. That strength comes from the opportunity to write loose (bad) code and have it still operate in a somewhat expected way.
If you are in a position to need to conserve a few CPU cycles, PHP is not what you should be using. When PHP web apps perform poorly, it is far more likely due to inefficient queries, not the speed of the code execution.
If you're that worried about every bit of efficiency, then why on earth are you using a scripting language? You should be programming in a much faster language (insert your favorite compiled language here), probably resulting in more, and less readable, code, but it'll run really fast and you can still aim for best coding practices.
Seriously, if you're coding for running speed, you shouldn't be using PHP at all.
If you develop web applications with an MVC architectural pattern, you can benefit greatly from caching and serialization. You can cache views, or portions of them, and you can serialize models.
From experience, models often parse and generate most of the data that's being displayed. If you know a certain model won't be generating new data frequently, like a model that parses an RSS feed, you can just have it stuffed somewhere with all the parsed data and have it refreshed every once in a while.
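A sketch of that idea; the FeedModel class, the cache path, and the 15-minute refresh interval are all made up:

    <?php
    // Reuse a serialized model while it is fresh; rebuild it otherwise.
    $cacheFile = '/tmp/feed_model.ser';
    if (file_exists($cacheFile) && time() - filemtime($cacheFile) < 900) {
        $model = unserialize(file_get_contents($cacheFile));
    } else {
        $model = new FeedModel('http://example.com/feed.rss'); // hypothetical
        $model->parse();                                       // the expensive part
        file_put_contents($cacheFile, serialize($model));
    }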
If you look at WordPress PHP code, it intermingles PHP tags with its HTML, which leads to spaghetti in my mind.
phpBB3, however, is way better in that regard. For example, it has a strict division between the PHP part and the styles part, the latter being XHTML-formatted files with {template} tags parsed by a template engine. That is much cleaner.
Write a couple of 10-minute examples and run them in your profiler.
That will tell you which is faster to the millisecond.
If you don't have a profiler, post them here, and I will run them in my PHPEd profiler.
I suspect that much of the time difference, if any, comes from having to open the file that a class is stored in, but that would have to be tested too.
Then ask yourself if you care that much about a few milliseconds vs having to maintain spaghetti code - will any of your users ever notice?
Edit
The profiler won't simulate high traffic volumes, but it will tell you which method is faster for a single user, and which parts of the code are using how much time. Especially if you profile the operations being done repeatedly - say 1000 times each in a loop.
We can assume (though not always safely) that code which is faster for a single user will still be faster when used by a lot of people.
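For what it's worth, such a micro-benchmark can be as simple as the sketch below; the no-op function and the iteration count are arbitrary, chosen only to make the difference measurable.

    <?php
    // Time a trivial function call against an inline assignment.
    function noop($x) { return $x; }
    $n = 1000000;
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) { noop($i); }
    $withCall = microtime(true) - $start;
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) { $local = $i; }
    $inline = microtime(true) - $start;
    printf("call: %.4fs  inline: %.4fs\n", $withCall, $inline);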
Those who lecture you about code micro-optimization are generally the same ones who have 50 SQL queries per page, taking up a total of 2 seconds, because they have never heard of profiling. But their code is optimized!!! (and slow as hell)
Fact: adding another web server is not difficult. Replicating a database is.
Optimizing webserver code can be a net loss if it adds load on the DB.
Note: 2-3 ms for simple pages (like a forum topic), including SQL, is a good target for a PHP website. My old website used to do that.
