I am making a website where I have to keep track of logged-in users, so in every PHP document I have written code to connect to the database. If I write the database connection code in one PHP document and include it in the other PHP documents, will it make my pages slow?
Instead of putting all features on one page, what if I design the features on different pages and pull them all together on one page? Will that slow down the loading speed of the website?
It would certainly have some impact, but this should be weighed against the benefits of code organization. In this case, I'd strongly err on the side of code organization, so I suggest breaking your logic up into multiple files. A few points in favor of this approach:
Keep in mind that you are talking server-side only. That means the delay comes from opening local files on the server, rather than, say, sending HTTP requests. This is a very fast operation on any modern computer.
"Premature optimization is the root of all evil". Until you actually have speed issues, bending backwards for optimization's sake is universally considered a bad idea. This is because optimization tends to obfuscate code while rarely providing appreciable speed benefits. This bogs down developer comprehension and increases the likelyhood of bugs.
And, as Andreas pointed out, code reuse is king. Rewriting the same code in multiple places means that making a change requires duplicating that change in all those places, which takes time and (again) increases the likelyhood of bugs.
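For example, here is a minimal sketch of the shared-connection approach; the file names, credentials, and users table are placeholders rather than anything from the question:

<?php
// db.php - the one place where the connection is set up (DSN and credentials are placeholders).
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'secret', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));

<?php
// profile.php - any page that needs the database simply includes db.php.
require_once __DIR__ . '/db.php';

$stmt = $pdo->prepare('SELECT username FROM users WHERE id = :id');
$stmt->execute(array(':id' => (int) $_GET['id']));
$username = $stmt->fetchColumn();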
Related
With this question, I aim to understand the inner workings of PHP a little better.
Assume that you have a 50K library. The library is loaded with a bunch of handy functions that you use here and there. Also assume that these functions are needed/used by, say, 10% of the pages of your site. But your home page definitely needs it.
Now, the question is... should you use a global include that points to this library - across the board - so that ALL the pages (including the 90% that do not need the library) will get it, or should you selectively add that include reference only on the pages that need it?
Before answering this question, let me point out "why" I ask it...
When you include that reference, PHP may be caching it. So the performance hit I worry about may be a one-time deal rather than something paid on every request. Once that one-time hit is out of the way, subsequent loads may not be as bad as one might think. That's all because of the smart caching mechanisms that PHP deploys - which I do not have a deep knowledge of, hence the question...
Since the front page needs that library anyway, the argument could be: why not keep that library warm and fresh in memory and have it served across the board?
When answering this question, please approach the matter strictly from a caching/performance point of view, not from a convenience point of view, so that the discussion does not shift to programming style and its dos and don'ts.
Thank you
Measure it, then you know.
The caching benefit is likely marginal after the first hit, since the OS will cache the file as well, but that saves only the I/O hit (granted, this is not nothing). However, you will still incur the processing hit. If you include your 50K of code into a "Hello World" page, you will still pay the CPU and memory penalty to load and parse that 50K of source, even if you do not execute any of it. That part of the processing will most likely not be cached in any way.
In general, CPU is extremely cheap today, so it may not be "worth saving". But that's why you need to actually measure it, so you can decide yourself.
The caching I think you are referring to would be opcode caching from something like APC? All that does is prevent PHP from needing to interpret the source each time. You still take some hit for each include or require you are using. One paradigm is to scrap the procedural functions and use classes loaded via __autoload(). That makes for a simple use-on-demand strategy with large apps. Also agree with Will that you should measure this if you are concerned. Premature optimization never helps.
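As a rough sketch of that use-on-demand idea with classes, spl_autoload_register (the modern replacement for __autoload) lets PHP load a class file only when the class is first used; the lib/ path and the Report class below are made up for illustration:

<?php
// Register an autoloader so a class file is loaded and parsed only
// the first time the class is actually used.
spl_autoload_register(function ($class) {
    $file = __DIR__ . '/lib/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
});

$report = new Report();   // lib/Report.php is included here, on demand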
I very much appreciate your concerns about performance.
The short answer is that, for best performance, I'd probably conditionally include the file on only the pages that need it.
PHP's opcode caches will maintain both include files in a cached form, so you don't have to worry about keeping the cache "warm" as you might when using other types of caches. The cache will remain until there are memory limitations (not an issue with your 50K script), the source file is updated, you manually clear the cache, or the server is restarted.
That said, opcode (PHP bytecode) caching is only one part of the PHP parsing process. Every time a script is run, the bytecode is then processed to build up the functions, classes, objects, and other instance variables that are defined and optionally used within the script. This all adds up.
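To make the conditional-include suggestion concrete, here is a minimal sketch; the file names and the helper function are placeholders, not anything from the question:

<?php
// home.php - one of the ~10% of pages that actually use the library
// (the path and function name are placeholders).
require_once __DIR__ . '/lib/helpers.php';   // the 50K library

echo render_front_page_widgets();            // hypothetical function defined in the library

// A page in the other 90% simply never issues the require_once,
// so it never pays the load-and-compile cost for that 50K of source.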
In this case, a simple change can lead to significant improvement in performance. Be green, every cycle counts :)
Here is my situation. I am building an application that contains some heavy mathematical calculations where the formula needs to be editable by a sufficiently privileged, but untrusted, user.
I need a secure server side scripting language. I need to be able to access constants and values from 4+ database tables, the results of previous calculations, define user variables and functions, use if/then/else statements, and I'm sure more that I can't think of right now.
Some options I've considered:
I have considered using something like this matheval library but I would end up needing to extend it considerably for my use case. I would essentially be creating my own custom language.
PHP runkit sandbox. I've never used this before but am very concerned about the security issues involved. Considering the possible security issues, I don't think that this is a viable option.
One other idea that has crossed my mind, which I don't know is possible, would be to use something like JavaScript on the server side. I've seen JS used as a scripting platform in desktop applications to extend functionality, and it seems a similar approach may be feasible here. I could ideally define the environment that things ran in, such as disabling filesystem access, etc. Again, security seems like it would be an issue.
From the research I have done, it seems like #1 is probably my only option, but I thought I would check with a larger talent pool. :-)
If #3 is possible, it seems that it would be the way to go, but I can't seem to turn up anything that is helpful. On the other hand, there may not be much difference between #2 and #3.
Performance is another consideration. There will be roughly 65-odd formulas, each executing about 450 times. Each formula will have access to approximately 15 unique variables, a hundred or so constants, and the results of previous formulas. (Yes, there is a specific order of execution.)
I can work with an asynchronous approach to calculation where the calculation would be initiated by a user event and stored in the db, but would prefer to not have to.
What is the best way to work with this situation? Are there any other third party libraries that I haven't turned up in my research? Is there another option in addition to my 3 that I should consider?
There's almost no reason to create a custom language today. There are so many available and hackable languages that writing your own is really a waste of time.
If you're not serving a zillion users (for assorted values of a zillion), most any modern scripting language is securable, especially if you're willing to take draconian measures to do so (such as completely eliminating I/O and system interfaces).
JavaScript is a valid option. It's straightforward to create mini-sandboxes within JS itself to run foreign code. If you want folks to be able to persist state across runs, simply require them to store it in "JSON-like" JS structures that can be readily serialized from the system on exit and just as easily reloaded. These can even be the results of the function.
If there's a function or routine you don't want them to use, you can un-define it before firing off the foreign code. Don't want them using "read" to read a file? read = function(s) {};
Obviously you should talk to the mailing lists of the JS implementation you want to use to get some tips for better securing it.
But JS has good support, is well documented, and the interpreters are really accessible.
You have two basic choices:
a) Provide your own language, in which you completely control what is done, so nothing bad can happen, or
b) Use some other execution engine, and check everything it does to verify nothing bad happens.
My problem with b) is that it is pretty hard to figure out all the bad things somebody might do in obscure ways.
I prefer a), because you only have to give them the ability to do what you allow.
If you have a rather simple set of formulas you want to process, it is actually pretty easy to write a parser/evaluator. See Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
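As a rough illustration of how small such an evaluator can be (this is not the library linked above), here is a sketch of a recursive-descent parser/evaluator in PHP for +, -, *, / and named variables; the class, grammar, and sample formula are invented for this example, and the variable values would come from your database rows:

<?php
// Grammar: expr -> term (('+'|'-') term)*
//          term -> factor (('*'|'/') factor)*
//          factor -> NUMBER | NAME | '(' expr ')'
class FormulaEvaluator
{
    private $tokens;
    private $pos = 0;
    private $vars;

    public function __construct(array $vars = array())
    {
        $this->vars = $vars;   // variables/constants pulled from the database
    }

    public function evaluate($formula)
    {
        // Tokenize: numbers, identifiers, and single-character operators.
        preg_match_all('/\d+(?:\.\d+)?|[A-Za-z_]\w*|[()+\-*\/]/', $formula, $m);
        $this->tokens = $m[0];
        $this->pos = 0;
        return $this->expr();
    }

    private function peek()
    {
        return $this->pos < count($this->tokens) ? $this->tokens[$this->pos] : null;
    }

    private function expr()
    {
        $value = $this->term();
        while (in_array($this->peek(), array('+', '-'), true)) {
            $op  = $this->tokens[$this->pos++];
            $rhs = $this->term();
            $value = ($op === '+') ? $value + $rhs : $value - $rhs;
        }
        return $value;
    }

    private function term()
    {
        $value = $this->factor();
        while (in_array($this->peek(), array('*', '/'), true)) {
            $op  = $this->tokens[$this->pos++];
            $rhs = $this->factor();
            $value = ($op === '*') ? $value * $rhs : $value / $rhs;
        }
        return $value;
    }

    private function factor()
    {
        $token = $this->tokens[$this->pos++];
        if ($token === '(') {
            $value = $this->expr();
            $this->pos++;             // consume ')'
            return $value;
        }
        if (is_numeric($token)) {
            return (float) $token;
        }
        if (isset($this->vars[$token])) {
            return $this->vars[$token];
        }
        throw new InvalidArgumentException("Unknown token: $token");
    }
}

// Usage with hypothetical values:
$eval = new FormulaEvaluator(array('rate' => 0.05, 'base' => 200));
echo $eval->evaluate('base * (1 + rate)');   // 210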
It isn't clear to me that you have a performance problem. Yes, you want to execute something 450 times, but that includes database accesses, whose cost will dominate any computation involving a thousand arithmetic steps. You may find that your speed is limited by the DB access and that you need to cache the DB accesses to get it to go faster.
I am trying to build a very user-friendly user interface for my site. The standard right now is to use client side as well as server side validation for forms. Right? I was wondering if I could just forgo client side validation, and rely simply on server side. The validation would be triggered on blur, and will use ajax.
To go one step ahead, I was also planning to save a particular field in the database if it has been validated as correct. Something like a real-time form update.
You see, I am totally new to programming, so I don't know if this approach can work practically. I mean, will there be speed or connection problems? Will it take a toll on the server in case of high traffic? Will the site slow down on HTTPS?
Are there any sites out there which have implemented this?
Also, the way I see it, I would need a separate PHP script for every field! Is there a shorter way?
What you want to do is very doable. In fact, this is the out-of-the-box functionality you would get if you were using JSF with a rich component framework like ICEfaces or PrimeFaces.
Like all web technology, being able to do it with one language means you can do it with others. I have written forms like you describe in PHP manually. It's a substantial amount of work, and when you're first getting started it will definitely be easiest with one script per field backing the form. As you get better, you will discover how you can include the field name in the request and pare it down to one Ajax script per form. You can of course reduce the burden even further.
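As a sketch of that one-script-per-form idea, a single endpoint can dispatch on the submitted field name; the field names and validation rules below are invented for illustration:

<?php
// validate.php - one Ajax endpoint for the whole form.
header('Content-Type: application/json');

$field = isset($_POST['field']) ? $_POST['field'] : '';
$value = isset($_POST['value']) ? trim($_POST['value']) : '';

switch ($field) {
    case 'email':
        $ok = filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
        $message = $ok ? '' : 'Please enter a valid email address.';
        break;
    case 'username':
        $ok = preg_match('/^[A-Za-z0-9_]{3,20}$/', $value) === 1;
        $message = $ok ? '' : 'Usernames are 3-20 letters, digits, or underscores.';
        break;
    default:
        $ok = false;
        $message = 'Unknown field.';
}

echo json_encode(array('valid' => $ok, 'message' => $message));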
PHP frameworks may be able to make this process less onerous, but I haven't used them and would recommend you avoid them initially until you get your bearings. The magic that a system like Cake or Rails provides is very helpful but you have to understand the tradeoffs and the underlying technology or it will be very hard to build robust systems atop their abstractions.
Calculating the server toll is not intuitive. On the one hand, handling large submissions is more work than handling smaller ones. It may be that you are replacing one big request with several tiny ones for a net gain. It's going to depend on the kind of work you have to do with each form field. For example, auto completion is much more expensive than checking for a username already being taken, which is more expensive than (say) verifying that some string is actually a number or some other obvious validation.
Since you don't want to repeat yourself, it's very tempting to put all your validation on one side or the other, but there are tradeoffs either way, and it is true that server-side validation is going to be slower than client-side. But the speed of client-side validation is no excuse for relying on it alone, since doing so introduces security problems. So my general approach is to do validation on the server side, and if I have time, I add it to the client side as well to improve responsiveness. (In point of fact, I actually start with validation in the database as much as possible, then in the server-side code, then client-side, because this way even if my app blows up I don't have invalid data sticking around to worry about.)
It used to be that you could expect your site to run about 1/3 as fast under SSL. I don't have up-to-date numbers but it will always be more expensive than unencrypted. It's just plain more work. SSL setup is also not a great deal of fun. Most sites I've worked on either put the whole thing under SSL, or broke the site into some kind of shopping cart which was encrypted and left the rest alone. I would not spend undue energy trying to optimize this. If you need encryption, use it and get on with your day.
At your stage of the game I would not lose too much sleep over performance. Since you're totally new, focus on the learning process, try to implement the features that you think will be gratifying, and aim for improvement. It's easy to obsess about performance, but you're not going to have the kind of traffic that will squash you for a long time, unless half the planet wants to buy your product and your site is extremely heavy and your host extremely weak. When that traffic comes, profile your code, find where you are doing too much work, and fix that; you will get much further that way than by trying to design a performant system up front. You just don't have enough data yet to do that.

Most servers these days are well equipped to handle fairly heavy load. You're probably not going to have hundreds of visitors per second sustained in the near future, and it will take a lot more than that to bring down a $20 VPS running a fairly simple PHP site. Consider that one visitor a second works out to roughly 86,000 hits a day, so you'd need over 8 million hits a day to reach 100/second. You're not going to need a whole second to render a page unless you've done something stupid. Which we all do, a few times, when we're learning. :)
Good luck on your journey!
I've just made a user-content orientated website.
It is done in PHP, MySQL and jQuery's AJAX. At the moment there are only a dozen or so submissions, and already I can feel it lagging slightly when it goes to a new page (and therefore runs a new MySQL query).
Is it most important for me to try and optimise my MySQL queries (by prepared statements), or is it worth looking at CDNs (Amazon S3) and caching static HTML files when no new content has been submitted (much like the WordPress plugin WP Super Cache)?
Which route is the most beneficial for me, as a developer, to take, i.e. where am I better off concentrating my efforts to speed up the site?
Premature optimization is the root of all evil
-Donald Knuth
Optimize when you see issues, don't jump to conclusions and waste time optimizing what you think might be the issue.
Besides, I think you have more important things to work out on the site (like being able to cast multiple votes on the same question) before worrying about a caching layer.
It is done in PHP, MySQL and jQuery's AJAX. At the moment there are only a dozen or so submissions and already I can feel it lagging slightly when it goes to a new page (therefore running a new MySQL query)
"Can feel it lagging slightly" – Don't feel it, know it. Run benchmarks and time your queries. Are you running queries effectively? Is the database setup with the right indexes and keys?
That being said...
CDN's
A CDN works great for serving static content: CSS, JavaScript, images, etc. This can speed up the loading of the page by minimizing the time it takes to request all those resources. It will not fix bad query practice.
Content Caching
The easiest way to implement content caching is with something like Varnish. It basically sits in front of your site and re-serves content that hasn't been updated. It is minimally intrusive and easy to set up, while being amazingly effective.
Database
Is it most important for me to try and optimise my MySQL queries (by prepared statements)
Why the hell aren't you already using prepared statements? If you're doing raw SQL queries, always use prepared statements unless you absolutely trust the content in the queries. Given a user-content-based site, I don't think you can safely say that. If you notice query times running high, then take a look at the database schema, the queries you are running per page, and the amount of content you have. With a few dozen entries you should not be noticing any issue even with the worst queries.
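For reference, a minimal sketch of a parameterised query with PDO; the DSN, credentials, and table/column names are placeholders:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'secret', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));

// User input is bound as a parameter, never concatenated into the SQL string.
$stmt = $pdo->prepare('SELECT id, title FROM submissions WHERE author = :author');
$stmt->execute(array(':author' => $_GET['author']));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);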
I checked out your site and it seems a bit sluggish to me as well, although it's not 100% clear it's the database.
A good first step here is to start on the outside and work your way in. So use something like Firebug (for Firefox), which, like similar plug-ins of its type, will allow you to break down where the time goes in loading a page.
http://getfirebug.com/
Second, per your comment above, do start using prepared statements where applicable; it can make a big difference.
Third, make sure your DB work is minimally complete - that means making sure you have indexes in the right places. It can be useful here to run the types of queries you get on your site and see where the time goes. EXPLAIN plans
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and MySQL driver logging (if your driver supports it) can be helpful here.
If the site is still slow and you've narrowed it to use of the database, my suggestion is to do a simple optimization at first. Caching DB data, if feasible, is likely to give you a pretty big bang for the buck here. One very simple solution towards that end, especially given the stack you mention above, is to use Memcached:
http://memcached.org/
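As a rough sketch of what that looks like with the Memcached extension (the cache key, query, and five-minute TTL are illustrative):

<?php
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key  = 'latest_submissions';
$rows = $cache->get($key);

if ($rows === false) {
    // Cache miss: query the database and store the result for 300 seconds.
    $pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'secret');
    $stmt = $pdo->query('SELECT id, title FROM submissions ORDER BY id DESC LIMIT 20');
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $cache->set($key, $rows, 300);
}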
After injecting that into your stack, measure your performance + scalability and only pursue more advanced technologies if you really need to. I think you'll find that simple load balancing, caching, and a few instances of your service will go pretty far in addressing basic performance + scalability goals.
In parallel, I suggest coming up with a methodology to measure this more regularly and accurately. For example, decide how you will actually do automated latency measures and load testing, etc.
For me, optimising the DB comes first, because with caching, whenever you find and fix a problem you need to rebuild the whole cache.
There are several areas that can be optimized.
Server
CSS/JS/Images
PHP Code/Setup
MySQL Code/Setup
1st, I would use Firefox and the YSlow add-on to evaluate your website's performance; it will give server-based suggestions.
Another solution I have used is this add-on:
http://aciddrop.com/php-speedy/
"PHP Speedy is a script that you can install on your web server to automatically speed up the download time of your web pages."
2nd, I would create a static domain name like static.yourdomainname.com, pointing at a different folder, and move all your images, CSS and JS there. Then point all your code to that domain, and tweak your web server settings to cache all those files.
3rd, I would look at articles/techniques like this, http://www.catswhocode.com/blog/3-ways-to-compress-css-files-using-php , to help compress/optimize your static files like CSS/JS.
4th, review all your images and their sizes, and make sure they are fully optimized. Or convert to using CSS sprites.
http://www.smashingmagazine.com/2009/04/27/the-mystery-of-css-sprites-techniques-tools-and-tutorials/
http://css-tricks.com/css-sprites/
Basically, for all your main site images, move them into one CSS sprite, then change your CSS to refer to different spots on that sprite to display the image needed.
5th, review your content pages: which pages change frequently and which ones rarely change. Make the rarely changing ones into static HTML pages. Those that change frequently you can either leave as PHP pages, or create a cron or scheduled task using the PHP command line to generate new static HTML versions of them (see the sketch at the end of this answer).
6th, for MySQL, I recommend turning the slow query log on to help identify slow queries. Review your table structure and make sure your tables are well designed. Use views and stored procedures to move heavy SQL logic from PHP to MySQL.
I know this is a lot, but I hope it's useful.
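Here is the kind of command-line script the 5th point has in mind, run from cron to snapshot a dynamic page as static HTML; the file names and schedule are placeholders:

<?php
// regenerate.php - run via the PHP CLI, e.g. from cron:
//   */15 * * * * php /var/www/regenerate.php
ob_start();
include __DIR__ . '/popular.php';   // the dynamic page being snapshotted
$html = ob_get_clean();

// Write the captured output as a static file (the cache/ directory must exist).
file_put_contents(__DIR__ . '/cache/popular.html', $html);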
It depends where your slowdowns really lie. You have a lot of Twitter and Facebook stuff on there that could easily slow your page down significantly.
Use Firebug to see if anything is being downloaded during your perceived slow loading times. You can also install the YSlow Firefox plugin to get tips on speeding up page loads.
A significant portion of perceived slowness can be due to the JavaScript on the page rather than your back end. With such a small site you should not see any performance issues on the back end until you have thousands of submissions.
Is it most important for me to try and optimise my MySQL queries (by prepared statements)
Sure.
But prepared statements have nothing to do with optimization.
Nearly 99% of sites run with no cache at all, so I don't think you really need it.
If your site is running slow, you have to profile it first and then optimise the specific places that are proven to be bottlenecks.
I had an idea today (that millions of others have probably already had) of putting all the site's scripts into a single file, instead of having multiple, separate ones. When submitting a form, there would also be a hidden field called something like 'action' which would indicate which function in the file would handle it.
I know that things like CodeIgniter and CakePHP exist which help separate/organise the code.
Is this a good or bad idea in terms of security, speed and maintenance?
Do things like this already exist that I am not aware of?
What's the point? It's just going to make maintenance more difficult. If you're having a hard time managing multiple files, you should invest the time into finding a better text editor / IDE and stop using Notepad or whatever is making it so difficult in the first place!
Many PHP frameworks rely on the Front Controller design: a single small PHP script serves as the landing point for all requests. Based on request arguments, the front controller invokes code in other PHP scripts.
But storing all code for your site in a single file is not practical, as other people have commented.
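A minimal front-controller sketch might look like this; the 'action' parameter and the routed file names are illustrative:

<?php
// index.php - the single landing point for all requests.
$action = isset($_GET['action']) ? $_GET['action'] : 'home';

// Whitelist the actions so a crafted parameter cannot include arbitrary files.
$routes = array(
    'home'    => 'pages/home.php',
    'contact' => 'pages/contact.php',
    'submit'  => 'pages/submit.php',
);

if (isset($routes[$action])) {
    require __DIR__ . '/' . $routes[$action];
} else {
    header('HTTP/1.1 404 Not Found');
    echo 'Page not found';
}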
There are many forums that do this. Personally, I don't like it, mainly because if you make an error in the file, the entire site is broken until you fix it.
I like separation of each part, but I guess it has its plusses.
It's likely bad for maintenance, as you can't easily disable a section of your site for an update.
Speed: I'm not sure to be honest.
Security: You could accomplish the exact same security settings by just adding a security check to one file and then including that file in all your pages (see the sketch below).
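For instance, a minimal sketch of such a shared check; the session key and login URL are placeholders:

<?php
// auth_check.php - included at the top of every page that requires a logged-in user.
session_start();
if (empty($_SESSION['user_id'])) {
    header('Location: /login.php');
    exit;
}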
If you're not caching your scripts, everything in a single file means less disk I/O, and since disk I/O is generally an expensive operation, this can be a significant benefit.
The thing is, by the time you're getting enough traffic for this to matter, you're probably better off going with caching anyway. I suppose it might make some limited sense, though, in special cases where you're stuck on a shared hosting environment where bandwidth isn't an issue.
Maintenance and security: composing software out of small integral pieces of code a programmer can fit inside their head (and a computer can manage neatly in memory) is almost always a better idea than a huge ol' file. Though if you wanted to make it hell for other devs to tinker with your code, the huge ol' file might serve well enough as part of an obfuscation scheme. ;)
If for some reason you were using the single-file approach to try and squeeze out extra disk I/O, then what you'd want to do is create a build process, where you did your actual development work in a series of broken-out, discrete files and issued a make- or ant-like command to generate your single file.