I'm currently saving my language files in a MySQL database.
Is it generally better (I'm thinking about performance) to fetch all page-specific language strings at once (far fewer queries, but each one is bigger and includes some unnecessary strings) or to fetch each string on request (many more queries, but each one is much smaller and never fetches unnecessary strings)?
EDIT: I'm using APC, and there are about 200-250 page-specific strings, but that drops to maybe 100-150 if I fetch them on request. MySQL is hosted on the same machine.
It depends entirely on your situation and your available resources. Fetching everything at once will probably be better if you're making single-threaded requests to a remote server, for example, but many small requests might be faster and less memory-intensive against a local MySQL server (though Tuncay says it results in poor performance). It would probably be even faster if the page were rigged up to make the requests asynchronously, so that you're not waiting for each one to finish before making the next.
However, the only way to really know is to run some benchmarks in your environment.
My experience is that the MySQL server can easily handle a big request, whereas several small ones result in very poor performance. In comparable situations I find one query is almost always better in terms of performance. Get all the data from the database and let PHP sort out the rest.
Better still is fetching only the data you actually need in that one query. Are you sure you can't use an appropriate WHERE clause?
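A minimal sketch of the single-query approach, with the WHERE clause doing the filtering. The `lang_strings` table and its `lang`, `page`, `str_key` and `str_value` columns are hypothetical, and PDO with an in-memory SQLite database stands in for MySQL so the example runs on its own; with MySQL you would mainly change the DSN.

```php
<?php
// Hypothetical schema: one row per translated string, tagged with its page.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE lang_strings (lang TEXT, page TEXT, str_key TEXT, str_value TEXT)');
$db->exec("INSERT INTO lang_strings VALUES
    ('en', 'home', 'title', 'Welcome'),
    ('en', 'home', 'intro', 'Hello there'),
    ('en', 'about', 'title', 'About us'),
    ('de', 'home', 'title', 'Willkommen')");

// One round trip: the WHERE clause limits the result to this page and language.
function loadPageStrings(PDO $db, string $lang, string $page): array
{
    $stmt = $db->prepare(
        'SELECT str_key, str_value FROM lang_strings WHERE lang = ? AND page = ?'
    );
    $stmt->execute([$lang, $page]);
    // FETCH_KEY_PAIR turns a two-column result into [str_key => str_value].
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}

$strings = loadPageStrings($db, 'en', 'home');
echo $strings['title'], "\n"; // Welcome
```

The resulting array is small enough to store in APC keyed by language and page, so the query would only run on a cache miss.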
I have searched for a few hours already but have found nothing on the subject.
I am developing a website that depends on a query to define which elements must be loaded on the page. But to organize the data, I have to iterate over the result of this query four times.
At first, I used mysql_data_seek so I could iterate over the result again, but performance started to suffer. So I replaced mysql_data_seek with putting the data in an array and running a foreach loop over it.
Performance didn't improve in any way I could measure, so I started wondering which is, in fact, the better option: building a rather big data array or calling mysql_fetch_array multiple times.
My application is currently running on PHP 5.2.17 and MySQL, with everything on localhost. Unfortunately, I have a busy database, but I have never had any problems with the number of connections to it.
Is there a preferable way to do this? Is there any option besides mysql_data_seek or the big data array? Does anyone have benchmark results for these options?
Thank you very much for your time.
The answer to your problem may lie in indexing the appropriate fields in your database. Most databases also cache frequently served queries, but they tend to discard those cached results once the underlying table is altered (which makes sense).
So you could trust your database to do what it does well, querying for and retrieving data, and help it by making sure there's little contention on the table and/or by adding appropriate indexes. Indexes can, however, slow down writes, since they have to be computed and maintained; that may matter in your case, and only you can judge that.
The PHP extension you use will play a part as well if speed is of the essence: 'upgrade' to mysqli or PDO and fetch everything at once with mysqli_result::fetch_all() (which requires the mysqlnd driver) or PDOStatement::fetchAll(), since that cuts down on communication between the PHP process and the database server. The only reason against this would be if the amount of data you query is so enormous that it bogs down your PHP/web server processes, or even your whole server by forcing it into swap.
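A minimal sketch of the fetch-once approach with PDO: the result set is pulled into an array in a single call, and every later pass runs over that array instead of re-reading the result or calling mysql_data_seek. The `elements` table and its `zone` column are hypothetical stand-ins for the question's data; in-memory SQLite is used only so the sketch is self-contained.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE elements (id INTEGER PRIMARY KEY, zone TEXT, title TEXT)');
$db->exec("INSERT INTO elements (zone, title) VALUES
    ('header', 'Logo'), ('sidebar', 'Menu'), ('sidebar', 'Search'), ('footer', 'Contact')");

// Fetch the whole result set into a PHP array in one call.
$rows = $db->query('SELECT id, zone, title FROM elements ORDER BY id')
           ->fetchAll(PDO::FETCH_ASSOC);

// Instead of walking the result four times (one pass per zone),
// a single pass over the array groups the rows by zone.
$byZone = [];
foreach ($rows as $row) {
    $byZone[$row['zone']][] = $row['title'];
}

// The array can be read again as often as needed: no re-query, no data_seek.
$total = 0;
foreach ($byZone as $titles) {
    $total += count($titles);
}
```

Grouping once into a keyed array often removes the need for repeated passes altogether, since each part of the page can then look up its own bucket directly.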
The table type you use can be important: certain types of queries seem to run faster on MyISAM than on InnoDB. If you want to retool a bit, you could store this data (or a copy of it) in MySQL's HEAP (MEMORY) engine, so it lives only in memory. You'd need to be careful to synchronize it with a disk-based table on writes, though, if you want to be sure to keep altered data in case of a server failure or shutdown.
Alternatively, you could cache your data in something like memcache or by using apc_store, which should be very fast since it lives in shared memory local to the PHP processes. The big caveat here is that APC generally has less memory available for storage (the default is 32MB). Memcache's big advantage is that, while still fast, it's distributed, so if you have multiple servers running they can share this data.
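A sketch of the cache-aside pattern described above. `ArrayCache` and `remember()` are hypothetical; the in-process array is only a stand-in so the example runs anywhere, and in a real deployment `get`/`set` would be backed by something shared such as APCu (`apcu_fetch()`/`apcu_store()`) or Memcached. The TTL bounds how stale the cached data can get. Requires PHP 7.4+.

```php
<?php
// In-process stand-in for a shared cache such as APCu or Memcached.
// Entries are stored as [expiresAt, value].
final class ArrayCache
{
    private array $items = [];

    public function get(string $key, ?bool &$hit = null)
    {
        if (isset($this->items[$key]) && $this->items[$key][0] > time()) {
            $hit = true;
            return $this->items[$key][1];
        }
        $hit = false;
        return null;
    }

    public function set(string $key, $value, int $ttl): void
    {
        $this->items[$key] = [time() + $ttl, $value];
    }
}

// Cache-aside: return the cached value, or compute it, store it, and return it.
function remember(ArrayCache $cache, string $key, int $ttl, callable $compute)
{
    $value = $cache->get($key, $hit);
    if ($hit) {
        return $value;
    }
    $value = $compute();
    $cache->set($key, $value, $ttl);
    return $value;
}

$cache = new ArrayCache();
$calls = 0;
$load = function () use (&$calls) {
    $calls++; // counts how often the "database" is actually hit
    return ['news', 'sports', 'weather'];
};

$first  = remember($cache, 'categories', 300, $load); // miss: runs $load
$second = remember($cache, 'categories', 300, $load); // hit: served from cache
```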
You could try a NoSQL database, preferably a plain key-value store rather than a document store, such as Redis.
And finally, you could hardcode your values in your PHP script. Make sure to still use an opcode cache such as eAccelerator or APC, and verify whether you really need to use them four times or whether you can simply cache the output of whatever it is you actually build with them.
So I'm sorry I can't give you a ready-made answer but performance questions, when applicable, usually require a multi-pronged approach. :-|
I'm the webmaster for a major US university. Our website, which I've built and been in charge of for the last 7 years or so, receives a great many requests. I've been building ever-more-complex features into it, and it's always been my practice to put as much of the programming burden as possible on our multi-processor Microsoft SQL Server, using stored procedures, views, etc., and to fill in what can't be done there with PHP, ASP, or Perl on the IIS web server. Both servers are very powerful and capable machines. Since I've been doing this alone for so long without anyone else to brainstorm with, I'm curious whether my approach is ideal for the even higher loads we'll have in the future.
My question is: is it better practice to place more of the load on the SQL server using nested SELECT statements, views, stored procedures and aggregate functions, or should I be pulling multiple simpler queries and processing the results with server-side scripts like PHP? Keep on keepin' on, or come up with a better way?
I've recently become more interested in performance after running some load traces and learning just how much I've been putting on the shoulders of the SQL server. Both the web server and the SQL server are fast and responsive throughout the day, almost regardless of how much I put on them, but I'd like to be ready: by the time it becomes important, I want to have trained myself and upgraded my existing code with optimization best practices in mind.
Thanks for your advice and input.
You put each layer in your stack to use in the domain it fits best.
There is no use in having your database server send 1000 rows and using PHP to filter them if a WHERE or GROUP BY clause would suffice. It's not optimal to call the database to add two integers (SELECT 5+9 works fine, but PHP can do it itself, and you save the round trip).
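A small illustration of that split: the database filters and aggregates, so only one row per customer comes back, while the trivial final sum stays in PHP. The `orders` table and its data are made up for the example, and in-memory SQLite stands in for the real server.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)');
$db->exec("INSERT INTO orders (customer, amount) VALUES
    ('alice', 30), ('bob', 5), ('alice', 20), ('carol', 50), ('bob', 15)");

// WHERE and GROUP BY run in the database: one row per customer comes back,
// instead of every order row being shipped to PHP and filtered there.
$stmt = $db->prepare(
    'SELECT customer, SUM(amount) AS total
       FROM orders
      WHERE amount >= ?
      GROUP BY customer
      ORDER BY customer'
);
$stmt->execute([10]);
$totals = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // [customer => total]

// Simple arithmetic on values already in PHP needs no extra round trip.
$grandTotal = array_sum($totals);
```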
You will probably want to look into scalability: which parts of your application can be divided into multiple processes? If you're still using just two layers (script and database), there is a lot of room for scaling there. But always start with the bottleneck.
Some examples: host static content on a CDN, use caching for your pages, read about nginx and memcached, use NoSQL (MongoDB), consider sharding, consider replication.
My opinion is that it's generally (mostly) best to favor letting the web servers do the processing. Two points:
First is scalability. Once your application gets enough usage, you'll need to start worrying about load balancing. And it's a lot easier to drop in a couple of extra web servers pointing to a common database than it is to set up a distributed database cluster. So best to take as much strain away from the Database as you can and keep it on a single machine for as long as possible.
The second point I'd like to make is about optimizing the queries. This will depend a lot on the queries you are using and on the database backend. When I first started working with databases, I fell into the trap of making elaborate SQL queries with multiple JOINs that fetched exactly the data I wanted, even if it came from four or five different tables. I reasoned, "That's what the database is there for; let's get it to do the hard work."
I quickly found that these queries took far too long to execute and often ended up blocking the database for other requests. While it may seem inefficient to split your query into multiple requests (for example, in a for loop), you'll often find that executing multiple small queries with fast indexes makes your application run far more smoothly than trying to pass all the hard work to the database.
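A sketch of the "several small, indexed queries" idea that avoids putting a query inside a loop: one query for the rows, then a single `IN (...)` lookup by primary key for the related rows, stitched together in PHP. The `posts` and `users` tables are hypothetical, and in-memory SQLite keeps the example self-contained.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)');
$db->exec("INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Ben')");
$db->exec("INSERT INTO posts (user_id, title) VALUES (1, 'First'), (2, 'Second'), (1, 'Third')");

// Query 1: the posts themselves.
$posts = $db->query('SELECT id, user_id, title FROM posts ORDER BY id')
            ->fetchAll(PDO::FETCH_ASSOC);

// Query 2: all needed authors at once, looked up by (indexed) primary key.
$ids = array_values(array_unique(array_column($posts, 'user_id')));
$names = [];
if ($ids) { // "IN ()" with an empty list is a syntax error
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $db->prepare("SELECT id, name FROM users WHERE id IN ($placeholders)");
    $stmt->execute($ids);
    $names = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // [id => name]
}

// Stitch the two result sets together in PHP.
foreach ($posts as &$post) {
    $post['author'] = $names[$post['user_id']] ?? null;
}
unset($post); // break the reference left by foreach
```

This stays at two queries no matter how many posts there are, which is the property that a query-per-row loop lacks.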
Firstly, you might want to check if there is any load which can be removed entirely by client side caching (.js, .css, static HTML and images), and use of technologies such as AJAX to do partial updates of screens - this will remove load on both web and sql servers.
Secondly, see if there is sql load which can be reduced by web server caching - e.g. static or low refresh data - if you have a lot of 'content' pages on your systems, have a look at common CMS caching techniques which will scale to allow many more users to view the same data without rebuilding the page or hitting the database.
I tend to do as much as possible outside the db, viewing db calls as expensive/time-intensive.
For example, when performing a select on a user table with fields name_given and name_family, I could fatten the query to return a column called full_name built by concatenation. But that kind of thing can be easily done in a model on your server-side scripting language (PHP, Ruby, etc).
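A minimal sketch of keeping that formatting in the model. The `User` class and its `fromRow()`/`fullName()` methods are hypothetical; the query would simply return `name_given` and `name_family` instead of something like MySQL's `CONCAT(name_given, ' ', name_family) AS full_name`. It uses PHP 8 constructor promotion.

```php
<?php
// Hypothetical model: the query returns raw columns,
// and display formatting lives in PHP.
final class User
{
    public function __construct(
        private string $nameGiven,
        private string $nameFamily
    ) {}

    public static function fromRow(array $row): self
    {
        return new self($row['name_given'], $row['name_family']);
    }

    public function fullName(): string
    {
        // trim() avoids a stray space when one of the parts is empty.
        return trim($this->nameGiven . ' ' . $this->nameFamily);
    }
}

$user = User::fromRow(['name_given' => 'Ada', 'name_family' => 'Lovelace']);
echo $user->fullName(), "\n"; // Ada Lovelace
```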
Of course, there are cases when the db is the more "natural" place to perform an operation. But, in general, I incline more towards putting the load on the web server and optimize there with many of the techniques noted in other answers.
I've been doing a lot of calculations lately. Usually I prefer to do them in PHP rather than MySQL, even though I know PHP is not good at this; I assumed MySQL would be worse. But I've run into a performance problem: some pages load so slowly that the 30-second time limit is not enough for them! So I wonder where the better place to do the calculations is, and whether there are any principles for deciding. Suggestions would be appreciated.
Anything that can be done in an RDBMS (grouping, summing, averaging), where the data can be filtered on the server side, should be done in the RDBMS.
If the calculation is better suited to PHP, then fine, go with that, but otherwise don't try to do in PHP what an RDBMS was made for. YOU WILL LOSE.
I would recommend doing any row level calculations using the RDBMS.
Not only will you benefit from better performance, but it also makes your application more portable if you need to switch to another scripting language (say, from PHP to Python), because you've already sorted, filtered and processed the data in your RDBMS.
It also helps separate your application logic; it has helped me keep my controllers cleaner and neater when working in an MVC environment.
I would say do calculations in languages that were created for that, like C++. But if you're choosing between MySQL and PHP, PHP is better.
Just keep track of where your bottlenecks are. If your table gets locked up because you're trying to run some calculations, everyone else is in queue waiting to read/write the data in the selected tables and the queue will continue to grow.
MySQL is typically faster at processing your commands, but PHP should be able to handle simple problems without too much fuss. Of course, that doesn't mean you should be hitting your database over and over for the same calculation.
You might be better off caching your results if you can and have a cron job updating it once a day/hour (please don't do it every minute, your hosting provider will probably hate you).
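A sketch of the cron approach, assuming a hypothetical JSON summary file: the scheduled job does the expensive calculation and calls `writeSummary()`, while page requests only call `readSummary()`, which treats the file as missing once it is older than a chosen maximum age. Writing to a temporary file and renaming it keeps readers from seeing a half-written file (rename is atomic within one POSIX filesystem). The path and field names are illustrative only.

```php
<?php
// Called by the cron job after doing the expensive calculation.
function writeSummary(string $path, array $summary): void
{
    // Write to a temp file, then rename, so readers never see a partial file.
    $tmp = $path . '.tmp';
    file_put_contents($tmp, json_encode($summary));
    rename($tmp, $path);
}

// Called on every page request; never runs the expensive calculation.
function readSummary(string $path, int $maxAgeSeconds): ?array
{
    if (!is_file($path) || time() - filemtime($path) > $maxAgeSeconds) {
        return null; // missing or stale: the caller decides what to show
    }
    return json_decode(file_get_contents($path), true);
}

$path = sys_get_temp_dir() . '/stats_summary.json';
writeSummary($path, ['orders_today' => 42, 'revenue' => 1234.5]);
$summary = readSummary($path, 3600); // accept data up to one hour old
```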
Do as much filtering and merging as possible to bring the minimum amount of data into php. Once you have that minimum data set, then it depends on what you are doing, server load, and perhaps other factors.
If you can do something equally well in either, and the sql is not overly complex to write (and maintain) then do that. For simple math, sql is usually a good bet. For string manipulations where the strings will end up about the same length or grow, php is probably a good bet.
The most important thing is to request as little data as possible. The manipulation of the data, at least what sql can do, is secondary to retrieving and transferring the data.
Native MySQL functions are very quick. So do what makes sense in your queries.
If you have multiple servers (i.e., a web server and a DB server), note that DB servers are much more expensive than web servers, so if you have a lot of traffic or a very busy DB server, don't do the 'extras' there that can be handled just as easily and efficiently on a web server machine; this helps prevent slowdowns.
cmptrgeekken is right: we would need some more information. BUT if you need to do calculations that pertain to database queries or operate on their results, such as comparisons between certain fields from the database, make the database do it. Doing specialized queries in SQL is cheaper (as far as time is concerned, and SQL is optimized for that). Both PHP and MySQL run on the server side, so it won't really matter where you do the calculations. But, as I said before, if they are operations on database information, write a more complicated SQL query and use that.
Use PHP; don't bog down MySQL with endless calculations. If you're talking about things like sorting, it's OK to use MySQL for that, along with SUM and AVG, but don't overdo it.
I'm on an optimization crusade for one of my sites, trying to cut out as many MySQL queries as I can.
I'm implementing partial caching, which writes .txt files for various modules of the site and updates them on demand. I've come across one module that cannot remain static for all users, so the .txt file written to disk will need to be altered on the fly via PHP.
Which is done via
flush();
ob_start();
include('file.txt');
$contents = ob_get_clean();
Then I modify the HTML in the $contents variable and echo it out for different users.
Alternatively, I can leave it as it is, which runs a MySQL query against a small table of category names (about 13 of them).
Which one is less expensive: running a query every single time, or using the method above to inject HTML on the fly into a static .txt file?
Reading the file (except in very weird setups) will be minutely faster than querying the DB (no network interaction, etc.), but the difference will hardly be measurable; just try it and see whether you can measure it!
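If you do go the file route, here is a sketch of a simpler version of the question's code: `file_get_contents()` reads the cached fragment as plain text, so neither `include` nor output buffering is needed (and the file is never run through the PHP parser), and `str_replace()` fills in the per-user part. The `{{ACTIVE}}` placeholder, the file name and `renderFragment()` are hypothetical.

```php
<?php
// Hypothetical cached fragment containing a placeholder for the per-user part.
$fragmentPath = sys_get_temp_dir() . '/categories_fragment.html';
file_put_contents($fragmentPath, '<ul class="cats">{{ACTIVE}}<li>News</li></ul>');

// file_get_contents() returns the file as a string; unlike include,
// it does not execute the file, so no ob_start()/ob_get_clean() is needed.
function renderFragment(string $path, string $activeHtml): string
{
    $html = file_get_contents($path);
    return str_replace('{{ACTIVE}}', $activeHtml, $html);
}

$category = 'Sports';
// Escape any user-dependent value before putting it into HTML.
$out = renderFragment(
    $fragmentPath,
    '<li class="active">' . htmlspecialchars($category) . '</li>'
);
echo $out, "\n";
```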
Optimize your queries first! Then use memcache or a similar caching system for frequently accessed data, and then you can add file caching. We use all three combined and it runs very smoothly. Small optimized queries aren't so bad. If your DB is on the local server, the network is not an issue. And don't forget to use the MySQL query cache (I guess you do use MySQL), though note that it was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.
Where is your performance bottleneck?
If you don't know the bottleneck, you can't make any sensible assessment about optimisations.
Collect some metrics, and optimise accordingly.
Try both and choose the one that is either a clear winner or, failing that, more maintainable. This depends on where the DB is, how much load it's getting, and whether you'll need to run more than one application instance (in which case the instances would need to share this file over the network, so it's not local anymore).
Here are the patterns that work for me when I'm refactoring PHP/MySQL site code.
The number of queries per page is absolutely critical: one complex query with joins is fastest as long as the indexes are right. In my experience, a single page can almost always be generated with five or fewer queries, plus good use of classes and arrays of classes; often one query for the session and one query for the app.
After indexes the biggest thing to work on is the caching configuration parameters.
Never have queries in loops.
Moving database queries to files has never been a useful strategy, especially since it often ends up screwing up your query integrity.
Alex and the others are right about testing. If your pages are noticeably slow, then they are slow for a reason (or reasons); don't even start changing anything until you know what the reasons are and can measure the consequences of your changes. Refactoring by guessing is always a losing strategy, especially when (as in your case) you're adding complexity.
Okay, so I'm sure plenty of you have built crazy database intensive pages...
I am building a page that I'd like to pull all sorts of unrelated database information from. Here are some sample different queries for this one page:
article content and info
IF the author is a registered user, their info
UPDATE the article's view counter
retrieve comments on the article
retrieve information for the authors of the comments
if the reader of the article is signed in, query for info on them
etc...
I know these are all going to be pretty lightning quick, and that I could combine some, but I wanted to make sure this isn't abnormal.
How many fairly normal and un-heavy queries would you limit yourself to on a page?
As many as needed, but not more.
Really: don't worry about optimization (right now). Build it first, measure performance second, and IFF there is a performance problem somewhere, then start with optimization.
Otherwise, you risk spending a lot of time on optimizing something that doesn't need optimization.
I've had pages with 50 queries on them without a problem. A fast query against a table that isn't large (i.e., one that fits in main memory) can complete in 1 millisecond or less, so you can do quite a few of those.
If a page loads in less than 200 ms, you will have a snappy site. A big chunk of that is being used by latency between your server and the browser, so I like to aim for < 100ms of time spent on the server. Do as many queries as you want in that time period.
The big bottleneck is probably going to be the amount of time you have to spend on the project, so optimize for that first :) Optimize the code later, if you have to. That being said, if you are going to write any code related to this problem, write something that makes it obvious how long your queries are taking. That way you can at least find out you have a problem.
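One way to make query times obvious, sketched with a hypothetical `TimedDb` wrapper around PDO that records each statement and its duration using PHP's monotonic `hrtime()` clock (PHP 7.3+; constructor promotion needs PHP 8). The log could then be printed in a debug footer or written to a file; in-memory SQLite is used only so the sketch runs on its own.

```php
<?php
// Hypothetical helper: run a query and record how long it took.
final class TimedDb
{
    /** @var array<int, array{sql: string, ms: float}> */
    public array $log = [];

    public function __construct(private PDO $pdo) {}

    public function query(string $sql, array $params = []): array
    {
        $start = hrtime(true); // monotonic clock, in nanoseconds
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $this->log[] = ['sql' => $sql, 'ms' => (hrtime(true) - $start) / 1e6];
        return $rows;
    }

    public function totalMs(): float
    {
        return array_sum(array_column($this->log, 'ms'));
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db = new TimedDb($pdo);
$db->query('SELECT 1 AS one');
$db->query('SELECT ? AS two', [2]);
printf("%d queries, %.3f ms total\n", count($db->log), $db->totalMs());
```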
I don't think there is any one correct answer to this. I'd say as long as the queries are fast, and the page follows a logical flow, there shouldn't be any arbitrary cap imposed on them. I've seen pages fly with a dozen queries, and I've seen them crawl with one.
Every query requires a round-trip to your database server, so the cost of many queries grows larger with the latency to it.
If it runs on the same host, there will still be a slight speed penalty, not only because there is a socket between your application and the server but also because the server has to parse your query, build the response, check access, and handle whatever other overhead your SQL server has.
So in general it's better to have fewer queries.
You should try to do as much as possible in SQL, though: don't fetch data as input for some algorithm in your client language when the same algorithm could be implemented without hassle in SQL itself. This will not only reduce the number of your queries but also help a great deal in selecting only the rows you need.
Piskvor's answer still applies in any case.
WordPress, for instance, can run up to 30 queries per page. There are several things you can use to reduce the load on MySQL, memcache being one of them, but right now, and since you say it will be straightforward, just make sure all the data you pull is properly indexed in MySQL and don't worry much about the number of queries.
If you're using a framework (CodeIgniter, for example), you can generally get page-generation times and check what's slowing your site down.
As others have said, there is no single number. Whenever possible, use SQL for what it was built for and retrieve sets of data together.
Generally, an indication that you may be doing something wrong is having SQL inside a loop.
When possible, use joins to retrieve data that belongs together instead of sending several statements.
Always try to make sure your statements retrieve exactly what you need with no extra fields/rows.
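A sketch of that advice applied to the comments example from the question: the comments for one article and their authors' names come back in a single statement instead of one extra query per comment author, and only the needed columns are selected. Table and column names are hypothetical; in-memory SQLite stands in for MySQL.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER, user_id INTEGER, body TEXT)');
$db->exec("INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Ben')");
$db->exec("INSERT INTO comments (article_id, user_id, body) VALUES
    (7, 1, 'Nice'), (7, 2, 'Agreed'), (8, 1, 'Other article')");

// One JOIN replaces "fetch comments, then one query per comment author",
// and only the columns the page needs are selected.
$stmt = $db->prepare(
    'SELECT c.id, c.body, u.name AS author
       FROM comments c
       JOIN users u ON u.id = c.user_id
      WHERE c.article_id = ?
      ORDER BY c.id'
);
$stmt->execute([7]);
$comments = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

If comments can come from unregistered users (a NULL `user_id`), a `LEFT JOIN` would keep those rows, with `author` set to NULL.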
If you need the queries, you should just use them.
What I always try to do is have them all executed at once in the same place, so that different parts of the page (if they're separated) don't need to make their own database connections. I figure it's more efficient to store everything in variables than to have every part of a page connect to the database.
In my experience, it is better to make two queries and post-process the results than to make one query that takes ten times longer to run but needs no post-processing. That said, it is also better not to repeat queries if you already have the result, and there are many different ways to achieve this.
But all of that is oriented around performance optimization. So unless you really know what you're doing (hint: most people in this situation don't), just make the queries you need for the data you need and refactor it later.
I think you should limit yourself to as few queries as possible. Try to combine queries so each one does more work, and save time.
Premature optimisation is a problem, as people have mentioned before, but that's when you're crapping up your code to make it run 'fast'. People take this 'maxim' too far.
If you want to design with scalability in mind, just make sure whatever you do to load data is sufficiently abstracted and the calls are centralized; this will make it easier when you need to implement a shared memory cache, as you'll only have to change a few things in a few places.
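A sketch of that kind of centralization, with hypothetical names throughout: page code asks a single `DataSource` object for data, and caching sits behind a small `Cache` interface. Here `RequestCache` only remembers values for the current request; an implementation backed by APCu or Memcached could be swapped in later without touching the callers. Requires PHP 8 (`mixed`, constructor promotion).

```php
<?php
interface Cache
{
    public function get(string $key): mixed; // null means "not cached"
    public function set(string $key, mixed $value): void;
}

// Per-request cache; a shared-memory implementation could replace it later.
final class RequestCache implements Cache
{
    private array $items = [];

    public function get(string $key): mixed
    {
        return $this->items[$key] ?? null;
    }

    public function set(string $key, mixed $value): void
    {
        $this->items[$key] = $value;
    }
}

// The one place that knows how page data is loaded.
final class DataSource
{
    public int $queries = 0; // how many times the database was actually hit

    public function __construct(private PDO $pdo, private Cache $cache) {}

    public function categories(): array
    {
        $cached = $this->cache->get('categories');
        if ($cached !== null) {
            return $cached;
        }
        $this->queries++;
        $rows = $this->pdo->query('SELECT name FROM categories ORDER BY name')
                          ->fetchAll(PDO::FETCH_COLUMN);
        $this->cache->set('categories', $rows);
        return $rows;
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE categories (name TEXT)');
$pdo->exec("INSERT INTO categories VALUES ('News'), ('Sports')");

$data = new DataSource($pdo, new RequestCache());
$header  = $data->categories(); // runs the query
$sidebar = $data->categories(); // served from the cache
```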