I'm having somewhat theoretical question: I'm designing my own CMS/app-framework (as many PHP programmers on various levels did before... and always will) to either make production-ready solution or develop various modules/plugins that I'll use later.
Anyway, I'm thinking on gathering SQL connections from whole app and then run them on one place:
index.php:
<?php
include ('latestposts.php');
include ('sidebar.php');
?>
latestposts.php:
<?php
function gather_data ($arg){ $sql= ""; }
function draw ($data) {...}
?>
sidebar.php:
<?php
function gather_data ($arg){ $sql= ""; }
function draw ($data) {...}
?>
Now, while whole module system application is yet-to-be-figured, it's idea is already floating somewhere in my brain. However, I'm thinking, if I'm able to first load all gather_data functions, then run sql and then run draw functions - and if I'm able to reuse results!
If, in example, $sql is SELECT * FROM POSTS LIMIT 10 and $sql2 is SELECT * FROM POSTS LIMIT 5, is it possible to program PHP to see: "ah, it's the same SQL, I'll call it just once and reuse the first 5 rows"?
Or is it possible to add this behavior to some DRM?
However, as tags say, this is still just an idea in progress. If it proves to be easy to accomplish, then I will post more question how :)
So, basically: Is it possible, does it make sense? If both are yes, then... any ideas how?
Don't get me wrong, that sounds like a plausible idea and you can probably get it running. But I wonder if it is really going to be beneficial. Will it cause a system to be faster? Give you more control? Make development easier?
I would just look into using (or building) a system using well practiced MVC style coding standards, build a good DB structure, and tweak the heck out of Apache (or use something like Lighttpd). You will have a lot more widespread acceptance of your code if you ever decide to make it open source, and if you ever need a hand with it another developer could step right in and pick up the keyboard.
Also, check out query caching in MySQL--you will see a similar (though not one-to-one) benefit from caching your query results server side with regard to your query example. Even better that is stored in server memory so PHP/MySQL overhead is dropped AND you don't have to code it.
All of that aside, I do think it is possible. =)
Generally speaking, such a cache system can generate significant time savings, but at the cost of memory and complexity. The more results you want to keep, the more memory it will take; and there's no guarantee that your results will ever be used again, particularly the larger result sets.
Second, there are certain queries that should never be cached, or that should be run again even if they're in the cache. For the most part, only SELECT and SHOW queries can be cached effectively, but you need to worry about invalidating them when you modify the underlying data. Even in the same pageview, you might find yourself working around your own cache system on occasion.
Third, this kind of problem has already been solved several times. First, consider turning on the MySQL query cache. Most of the time, it will speed things up a bit without requiring any code changes on your end. However, it's a bit aggressive about invalidating entries, so you could gain some performance at a higher level.
If you need another level, consider memcached. You'll have to store and invalidate entries manually, but it can store results across page views (where you'll really find the performance benefit), and will let unused entries expire before running out of memory.
Related
I'm wondering if it's worth caching a result that is only going to change every couple of hours but is shown on every page of the website. Makes sense to me to do it but just wanting a second or third opinion.
The situation is that on each page there is 2 top 5 lists. These do change but only really need to be updated every hour for example. The result of this top 5 is found through a MySQL query. Would it be a smart idea to cache these results? What would be the best way to cache them? (Using PHP).
I've had a look through some of the questions asked here but they don't relate well enough so thought it'd be best to ask again.
Thanks.
Your use case is precisely the reason that caching was developed! The most common pattern is to use a caching service, such as memcached. (see Caching with PHP to take stress away from MySQL)
However, I would encourage you to think about whether or not this is a premature optimization. Is your database unable to cope with the requests? Are all pages loading slower than you need? Further, are these database queries slow for some reason? Are you properly using indexes?
I would encourage you to think only add a cache when you need to, and to ensure you have optimized as best as possible before then.
Checkout https://stackoverflow.com/a/1600252/1058853 for a good commentary.
Im on an optimization crusade for one of my sites, trying to cut down as many mysql queries as I can.
Im implementing partial caching, which writes .txt files for various modules of the site, and updates them on demand. I've came across one, that cannot remain static for all the users, so the .txt file thats written on the HD, will need to be altered on the fly via php.
Which is done via
flush();
ob_start();
include('file.txt');
$contents = ob_get_clean();
Then I modify the html in the $contents variable, and echo it out for different users.
Alternatively, I can leave it as it is, which runs a mysql query, which queries a small table that has category names (about 13 of them).
Which one is less expensive? Running a query every single time.... or doing it via the method I posted above, to inject html code on the fly, into a static .txt file?
Reading the file (save in very weird setups) will be minutely faster than querying the DB (no network interaction, &c), but the difference will hardly be measurable -- just try and see if you can measure it!
Optimize your queries first! Then use memcache or similar caching system, for data that is accessed frequently and then you can add file caching. We use all three combined and it runs very smooth. Small optimized queries aren't so bad. If your DB is in local server - network is not an issue. And don't forger to use MySQL query cache (i guess you do use MySQL).
Where is your the performance bottleneck?
If you don't know the bottleneck, you can't make any sensible assessment about optimisations.
Collect some metrics, and optimise accordingly.
Try both and choose the one that either is a clear winner or if not available, more maintainable. This depends on where the DB is, how much load it's getting, and whether you'll need to run more than one application instance (then they'd need to share this file on the network and it's not local anymore).
Here are the patterns that work for me when I'm refactoring PHP/MySQL site code.
The number of queries per page is absolutely critical - one complex query with joins is fastest as long as indexes are proper. A single page can almost always be generated with five or fewer queries in my experience, plus good use of classes and arrays of classes. Often one query for the session and one query for the app.
After indexes the biggest thing to work on is the caching configuration parameters.
Never have queries in loops.
Moving database queries to files has never been a useful strategy, especially since it often ends up screwing up your query integrity.
Alex and the others are right about testing. If your pages are noticeably slow, then they are slow for a reason (or reasons) - don't even start changing anything until you know what the reasons are and can measure the consequences of your changes. Refactoring by guessing is always a losing strategy espeically when (as in your case) you're adding complexity.
A few minutes ago, I asked whether it was better to perform many queries at once at log in and save the data in sessions, or to query as needed. I was surprised by the answer, (to query as needed). Are there other good rules of thumb to follow when building PHP/MySQL multi-user apps that speed up performance?
I'm looking for specific ways to create the most efficient application possible.
hashing
know your hashes (arrays/tables/ordered maps/whatever you call them). a hash lookup is very fast, and sometimes, if you have O(n^2) loops, you may reduce them to O(n) by organizing them into an array (keyed by primary key) first and then processing them.
an example:
foreach ($results as $result)
if (in_array($result->id, $other_results)
$found++;
is slow - in_array loops through the whole $other_result, resulting in O(n^2).
foreach ($other_results as $other_result)
$hash[$other_result->id] = true;
foreach ($results as $result)
if (isset($hash[$result->id]))
$found++;
the second one is a lot faster (depending on the result sets - the bigger, the faster), because isset() is (almost) constant time. actually, this is not a very good example - you could do this even faster using built in php functions, but you get the idea.
optimizing (My)SQL
mysql.conf: i don't have any idea how much performance you can gain by optimizing your mysql configuration instead of leaving the default. but i've read you can ignore every postgresql benchmark that used the default configuration. afaik with configuration matters less with mysql, but why ignore it? rule of thumb: try to fit the whole database into memory :)
explain [query]: an obvious one, a lot of people get wrong. learn about indices. there are rules you can follow, you can benchmark it and you can make a huge difference. if you really want it all, learn about the different types of indices (btrees, hashes, ...) and when to use them.
caching
caching is hard, but if done right it makes the difference (not a difference). in my opinion: if you can live without caching, don't do it. it often adds a lot of complexity and points of failures. google did a bit of proxy caching once (to make the intertubes faster), and some people saw private information of others.
in php, there are 4 different kinds of caching people regulary use:
query caching: almost always translates to memcached (sometimes to APC shared memory). store the result set of a certain query to a fast key/value (=hashing) storage engine. queries (now lookups) become very cheap.
output caching: store your generated html for later use (instead of regenerating it every time). this can result in the biggest speed-ups, but somewhat works against PHPs dynamic nature.
browser caching: what about etags and http responses? if done right you may avoid most of the work right at the beginning! most php programmers ignore this option because they have no idea what HTTP is.
opcode caching: APC, zend optimizer and so on. makes php code load faster. can help with big applications. got nothing to do with (slow) external datasources though, and the potential is somewhat limited.
sometimes it's not possible to live without caches, e.g. if it comes to thumbnails. image resizing is very expensive, but fortunatley easy to control (most of the time).
profiler
xdebug shows you the bottlenecks of your application. if your app is too slow, it's helpful to know why.
queries in loops
there are (php-)experts who do not know what a join is (and for every one you educate, two new ones without that knowledge will surface - and they will write frameworks, see schnalles law). sometimes, those queries-in-loops are not that obvious, e.g. if they come with libraries. count the queries - if they grow with the results shown, there is something wrong.
inexperienced developers do have a primal, insatiable urge to write frameworks and content management systems
schnalle's law
Optimize your MySQL queries first, then the PHP that handles it, and then lastly cache the results of large queries and searches. MySQL is, by far, the most frequent bottleneck in an application. A poorly designed query can take two to three times longer than a well designed query that only selects needed information.
Therefore, if your queries are optimized before you cache them, you have saved a good deal of processing time.
However, on some shared hosts caching is file-system only thanks to a lack of Memcached. In this instance it may be better to run smaller queries than it is to cache them, as the seek time of the hard drive (and waiting for access due to other sites) can easily take longer than the query when your site is under load.
Cache.
Cache.
Speedy indexed queries.
I have a page that will pull many headlines from multiple categories based off a category id.
I'm wondering if it makes more sense to pull all the headlines and then sort them out via PHP if/ifelse statements or it is better to run multiple queries that each contain the headlines from each category.
Why not do it in one query? Something like:
SELECT headline FROM headlines WHERE category_id IN (1, 2, 3, ...);
If you filter your headlines in PHP, think how many you'll be throwing away. If you end up with removing just 10% of the headlines, it won't matter as much as when you'd be throwing away 90% of the results.
These kinds of questions are always hard to answer because the situation determines the best course. There is never a truly correct answer, only better ways. In my experience doesn't really matter whether you attempt to do the work in PHP or in the database because you should always try to cache the results of any expensive operation using a caching engine such as memcached. That way you are not going to spend a lot of time in the db or in php itself since the results will be cached and ready instantaneously for use. When it comes down to it, unlss you profile your application using a tool like xDebug, what you think are your performance bottlenecks are just guesses.
It's usually better not to overload the DB, because you might cause a bottleneck if you have many simultaneous queries.
However, handling your processing in PHP is usually better, as Apache will fork threads as it needs to handle multiple requests.
As usual, it all comes down to: "How much traffic is there?"
MySQL can already do the selecting and ordering of the data for you. I suggest to be lazy and use this.
Also I'd look for a (1) query that fetches all the categories and their headlines at once. Would an ORDER BY category, publishdate or something do?
Every trip to the database costs you something. Returning extra data that you then decide to ignore costs you something. So you're almost certainly better to let the database do your pruning.
I'm sure one could come up with some case where deciding what data you need makes the query hugely complex and thus difficult for the database to optimize, while you could do it in your code easily. But if we're talking about "select headline from story where category='Sports'" followed by "select headline from story where category='Politics'" then "select headline from story where category='Health'" etc, versus "select category, headline from story where category in ('Health','Sports','Politics')", the latter is clearly better.
On the topic of "Faster to query in MYSQL or to use PHP logic", which is how I ended up on this question 10 years later. I have determined that the correct answer is "it depends". There are just too many examples where using the DB saves processing time over writing PHP Code.... but there are just as many examples where writing PHP Code saves time on excessively complex MySQL queries.
There is no right answer here. If you end up here, like I did, then the best I can suggest is try to solve your problem with the skills that you have. Start first with the Query and try to solve it, if you run into issues then start thinking about just gathering the data and running the logic through PHP code to come up with a solution.
At the end of the day, you need to solve a problem.... if you solve it, but its not fast enough, then thats another problem... work on optimizing which may end up meaning that you go back to writing more MySQL logic.
Use the 80/20 rule and try to get things 80% of the way there as quickly as possible. You can go back and optimize once its workable. Spending all your effort on making it perfect the first time will surely mean you miss your deadline.
Thats my $0.02
Okay, so I'm sure plenty of you have built crazy database intensive pages...
I am building a page that I'd like to pull all sorts of unrelated database information from. Here are some sample different queries for this one page:
article content and info
IF the author is a registered user, their info
UPDATE the article's view counter
retrieve comments on the article
retrieve information for the authors of the comments
if the reader of the article is signed in, query for info on them
etc...
I know these are basically going to be pretty lightning quick, and that I could combine some; but I wanted to make sure that this isn't abnormal?
How many fairly normal and un-heavy queries would you limit yourself to on a page?
As many as needed, but not more.
Really: don't worry about optimization (right now). Build it first, measure performance second, and IFF there is a performance problem somewhere, then start with optimization.
Otherwise, you risk spending a lot of time on optimizing something that doesn't need optimization.
I've had pages with 50 queries on them without a problem. A fast query to a non-large (ie, fits in main memory) table can happen in 1 millisecond or less, so you can do quite a few of those.
If a page loads in less than 200 ms, you will have a snappy site. A big chunk of that is being used by latency between your server and the browser, so I like to aim for < 100ms of time spent on the server. Do as many queries as you want in that time period.
The big bottleneck is probably going to be the amount of time you have to spend on the project, so optimize for that first :) Optimize the code later, if you have to. That being said, if you are going to write any code related to this problem, write something that makes it obvious how long your queries are taking. That way you can at least find out you have a problem.
I don't think there is any one correct answer to this. I'd say as long as the queries are fast, and the page follows a logical flow, there shouldn't be any arbitrary cap imposed on them. I've seen pages fly with a dozen queries, and I've seen them crawl with one.
Every query requires a round-trip to your database server, so the cost of many queries grows larger with the latency to it.
If it runs on the same host there will still be a slight speed penalty, not only because a socket is between your application but also because the server has to parse your query, build the response, check access and whatever else overhead you got with SQL servers.
So in general it's better to have less queries.
You should try to do as much as possible in SQL, though: don't get stuff as input for some algorithm in your client language when the same algorithm could be implemented without hassle in SQL itself. This will not only reduce the number of your queries but also help a great deal in selecting only the rows you need.
Piskvor's answer still applies in any case.
Wordpress, for instance, can pull up to 30 queries a page. There are several things you can use to stop MySQL pull down - one of them being memchache - but right now and, as you say, if it will be straightforward just make sure all data you pull is properly indexed in MySQL and don't worry much about the number of queries.
If you're using a Framework (CodeIgniter for example) you can generally pull data for the page creation times and check whats pulling your site down.
As other have said, there is no single number. Whenever possible please use SQL for what it was built for and retrieve sets of data together.
Generally an indication that you may be doing something wrong is when you have a SQL inside a loop.
When possible Use joins to retrieve data that belongs together versus sending several statements.
Always try to make sure your statements retrieve exactly what you need with no extra fields/rows.
If you need the queries, you should just use them.
What I always try to do, is to have them executed all at once at the same place, so that there is no need for different parts (if they're separated...) of the page to make database connections. I figure it´s more efficient to store everything in variables than have every part of a page connect to the database.
In my experience, it is better to make two queries and post-process the results than to make one that takes ten times longer to run that you don't have to post-process. That said, it is also better to not repeat queries if you already have the result, and there are many different ways this can be achieved.
But all of that is oriented around performance optimization. So unless you really know what you're doing (hint: most people in this situation don't), just make the queries you need for the data you need and refactor it later.
I think that you should be limiting yourself to as few queries as possible. Try and combine queries to mutlitask and save time.
Premature optimisation is a problem like people have mentioned before, but that's where you're crapping up your code to make it run 'fast'. But people take this 'maxim' too far.
If you want to design with scalability in mind, just make sure whatever you do to load data is sufficiently abstracted and calls are centralized, this will make it easier when you need to implement a shared memory cache, as you'll only have to change a few things in a few places.