I've been doing a lot of calculating stuff nowadays. Usually I prefer to do these calculations in PHP rather than MySQL though I know PHP is not good at this. I thought MySQL may be worse. But I found some performance problem: some pages were loaded so slowly that 30 seconds' time limit is not enough for them! So I wonder where is the better place to do the calculations, and any principles for that? Suggestions would be appreciated.
Anything that can be done using a RDBMS (GROUPING, SUMMING, AVG) where the data can be filtered on the server side, should be done in the RDBMS.
If the calculation would be better suited in PHP then fine, go with that, but otherwise don't try to do in PHP what a RDBMS was made for. YOU WILL LOSE.
I would recommend doing any row level calculations using the RDBMS.
Not only are you going to benefit from better performance but it also makes your applications more portable if you need to switch to another scripting language, let's say PHP to Python, because you've already sorted, filtered and processed the data using your RBDMS.
It also helps separate your application logic, it has helped me keep my controllers cleaner and neater when working in an MVC environment.
i would say do calculations in languages that were created for that, like c++. But if you choose between mysql and php, php is better.
Just keep track of where your bottlenecks are. If your table gets locked up because you're trying to run some calculations, everyone else is in queue waiting to read/write the data in the selected tables and the queue will continue to grow.
MySQL is typically faster at processing your commands, but PHP should be able to handle simple problems without too much of a fuss. Of course, that does not mean you should be pinging your database multiple times for the same calculation over and over.
You might be better off caching your results if you can and have a cron job updating it once a day/hour (please don't do it every minute, your hosting provider will probably hate you).
Do as much filtering and merging as possible to bring the minimum amount of data into php. Once you have that minimum data set, then it depends on what you are doing, server load, and perhaps other factors.
If you can do something equally well in either, and the sql is not overly complex to write (and maintain) then do that. For simple math, sql is usually a good bet. For string manipulations where the strings will end up about the same length or grow, php is probably a good bet.
The most important thing is to request as little data as possible. The manipulation of the data, at least what sql can do, is secondary to retrieving and transferring the data.
Native MySQL functions are very quick. So do what makes sense in your queries.
If you have multiples servers (ie, a web server and a DB server), note DB servers are much more expensive then web servers, so if you have a lot of traffic or a very busy DB server do not do the 'extras' that can be handled just as easily/efficiently on a web server machine to help prevent slowdowns.
cmptrgeekken is right we would need some more information. BUT if you are needing to do calculations that pertain to database queries or doing operations on them, comparisons certian fields from the database, make the database do it. Doing special queries in SQL is cheape r(as far as time is concerned and it is optimized for that) But both PHP and MySQL are both server side it won't really matter where you are doing the calculations. But like I said before if they are operations on with database information, make a more complicated SQL query and use that.
Use PHP, don't lag up your MySQL doing endless calculations. If your talking about things like sorting its OK to use MySQL for stuff like that, SUM, AVG but don't overdo it.
Related
I've been working with MySQL for a while, but I've never really used the supported mathematical functions, such as FLOOR(), SQRT(), CRC32(), etc.
Is it faster / better to use these functions in queries rather than just doing the same on the result set with PHP?
EDIT: I don't think this question is a duplicate of this, as my question is about mathematical functions, listed on the page I linked, not CONCAT() or NOW() or any other function as in that question. Please consider this before flagging.
It is more efficient to do this in PHP.
Faster depends on the machines involved, if you're talking about faster for one user. If you're talking about faster for a million users hitting a website, then it's more efficient to do these calculations in PHP.
The load of a webserver running PHP is very easily distributed over a large number of machines. These machines can run in parallel, handling requests from visitors and fetching necessary information from the database. The database, however, is not easy to run in parallel. Issues such as replication or sharding are complex and can require specialty software and properly organized data to function well. These are expensive solutions compared to adding another PHP installation to a server array.
Because of this, the value of a CPU cycle on the database machine is far more valuable than one on the webserver. So you should perform these math functions on the webserver where CPU cycles are cheaper and significantly more easy to parallelize.
There's no general answer to that. You certainly shouldn't go out of your way to do math in SQL instead of PHP; it really doesn't make that much of a difference, if there is any. However, if you're doing an SQL query anyway, and you have the choice of doing the operation in PHP before you send it to MySQL or in the query itself... it still won't make much of a difference. Oftentimes there will be a logical difference, in terms of when and how often exactly the operation is performed and where it needs to be performed and where the code to do this is best kept for good maintainability and reuse. That should be your first consideration, not performance.
Overall, you have to do some really complex math for any of it to make any difference whatsoever. Some simple math operations are trivial in virtually any language and environment. If in doubt, benchmark your specific case.
Like deceze said there's probably not going to be much difference in speed between using math functions in SQL and PHP. If you're really worried, you should always benchmark both of your use cases.
However, one example that comes to my mind, when is probably better to use SQL math functions than doing PHP math functions: when you don't need to perform any additional operations on the results from the DB. If you do your operations in MySQL, you avoid having to loop through results in PHP.
There is an additional consideration to think of and that's scaling. You usually have one MySQL server (if you have more than one, then you probably already know all of this). But you can have many web servers connecting to the same MySQL server (e.g. when having load balancer).
In that case, it's going to be better to move the computation to the PHP to take the load of MySQL. It's "easier" to add more web servers than to increase the performance of MySQL. In theory you can add infinite amount of web servers, but the amount of memory and the processor speed / num. of cores in a MySQL server is finite. You can scale MySQL in some other ways, like using MySQL cluster or doing master-slave replication and reading from slaves, but that will always be more complicated / harder to do.
MySQL is faster in scope of SQL query. PHP is faster in PHP code. If you make SQL query to find out SQRT() it should be definitely slower (unless PHP is broken) because MySQL parser and networking overhead.
I'm currently saving my language files in a MySQL database.
Is it generally better (I'm thinking about performance) to fetch all page specific language strings at once (a lot fewer queries, but they are bigger and contains some unnecessary strings) or fetch at request (that gives a lot more requests, but instead, they are much smaller and won't fetch unnecessary strings).
EDIT: I'm using APC, and there's about 200-250 page specific strings, but it becomes maybe 100-150 if I fetch one request. I'm hosting MySQL on the same machine.
It depends entirely on your situation and your available resources. Fetching everything at once will probably be better if you're making single-threaded requests to a remote server, for example, but more small requests might be faster and less memory-intensive running on a local MySQL server (Tuncay said it results in poor performance, tough). It would probably be even faster if the page were rigged up to make the requests asynchronously so that you're not waiting for the last one before making another.
However, the only way to really know is to run some benchmarks in your environment.
My experience is that the mysql server can easily handle a big request. Several small ones instead result in very poor performance. In comparable situations I find one query is most always better in terms of performance. Get the whole data from database and let php sort out the rest.
However, just fetching the data from db in one query is even better. Are you sure you cant use an appropiate "where" clause ?
I'm the webmaster for a major US university. We have a great deal of requests on our website, which I've built and been in charge of for the last 7 years or so. I've been building ever-more-complex features into our website and it's always been my practice to put as much of the programming burden on our multi-processor Microsoft SQL server as possible - using stored procedures, views, etc, and fill-in what can't be done with PHP, ASP, or Perl from the IIS web server. Both servers are very powerful and capable machines. Since I've been doing this alone for so long without anyone else to brainstorm with, I'm curious if my approach is ideal for even higher load situations we'll have in the future.
My question is: Is it better practice to place more of the load burden on the SQL server using nested SELECT statements, views, stored procedures and aggregate functions, or should I be pulling multiple simpler queries and processing through them using server-side compile-time scripts like PHP? Keep on keepin' on or come up with a better way?
I've recently become more interested in performance after I did some load traces and learned just how much I've been putting on the shoulders of the SQL server. Both the web server and SQL servers are fast and responsive throughout the day, and almost without regard for how much I put on them, but I'd like to be ready and have trained myself and upgraded my existing code optimized best practices in mind by the time it becomes important.
Thanks for your advice and input.
You put each layer in your stack to use in the domain it fits best.
There is no use in having your database server send 1000 rows and using PHP to filter them if a WHERE-clause or GROUP-clause would suffice. It's not optimal to call the database to add two integers (SELECT 5+9 works fine, but php can do it itself, and you save the roundtrip).
You will probably want to look into scalability: what parts of your application can be divided unto multiple processes? If you're still just using 2 layers (script & db), there is a lot of room for scaling there. But always start with the bottleneck first.
Some examples: host static contents on CDN, use caching for your pages, read about nginx and memcached, use nosql (mongoDB), consider sharding, consider replication.
My opinion is that it's generally (mostly) best to favor letting the web servers do the processing. Two points:
First is scalability. Once your application gets enough usage, you'll need to start worrying about load balancing. And it's a lot easier to drop in a couple of extra web servers pointing to a common database than it is to set up a distributed database cluster. So best to take as much strain away from the Database as you can and keep it on a single machine for as long as possible.
The second point i'd like to make is about optimizing the queries. This will depend a lot on the queries you are using, and the database backend. When i first started working with databases, i fell into the trap of making elaborate SQL queries with multiple JOINs that fetched exactly the data i wanted, even if it was from four or five different tables. I reasoned that "That's what the database is there for - lets get it to do the hard work"
I quickly found that these queries took way too long to execute, and often ended up blocking the database from other requests. While it may seam inefficient to split your query into multiple requests (for example in a for loop), you'll often find that executing multiple small queries with fast indexes will make your application run far more smoothly than trying to pass all the hard work to the database
Firstly, you might want to check if there is any load which can be removed entirely by client side caching (.js, .css, static HTML and images), and use of technologies such as AJAX to do partial updates of screens - this will remove load on both web and sql servers.
Secondly, see if there is sql load which can be reduced by web server caching - e.g. static or low refresh data - if you have a lot of 'content' pages on your systems, have a look at common CMS caching techniques which will scale to allow many more users to view the same data without rebuilding the page or hitting the database.
I tend to do as much as possible outside the db, viewing db calls as expensive/time-intensive.
For example, when performing a select on a user table with fields name_given and name_family, I could fatten the query to return a column called full_name built by concatenation. But that kind of thing can be easily done in a model on your server-side scripting language (PHP, Ruby, etc).
Of course, there are cases when the db is the more "natural" place to perform an operation. But, in general, I incline more towards putting the load on the web server and optimize there with many of the techniques noted in other answers.
On a site with a reasonable amount of traffic , would it matter if the application/business logic is written as stored procedures ,triggers and views , instead of inside the PHP code itself?
What would be the best way to go keeping scalability in mind.
I can't provide you statistics, but unless you plan to change PHP for another language in the future, i can say keeping the business logic in PHP is more "scalability friendly".
Its always easier and cheaper to solve web server load problems than having them in the database. Your database will always need to be lighting quick and just throwing mirrors at it won't solve the problem. The more database slaves you have, the more writes you have to do.
In my experience, you should put business logic in PHP code rather than move it onto the database. Assuming your database is on a separate server, you don't want your database to be busy calculating formulas when requests come in.
Keep your database lightning fast to handle selects, inserts and updates.
I think you will have far better scalibility keeping database code in the database where it can be performance tuned as the number of records gets larger. You will also have better data integrity which is critical to the data even being useful. You don't see a lot of terrabyte sized relational dbs with all their code in the application.
Read some books on database performance tuning and then decide if you want to risk your company's data on application code.
There are several things to consider when trying to decide whether to place the business logic in the database or in the application code.
Will the same database be accessed
from different websites / web
applications? Will the sites /
applications be written in the same
language or in a different language?
If the database will be used from a single site, and the site is written in a single language then this becomes a non-issue. Otherwise, you'll need to consider the added complexity of stored procedures, triggers, etc vs trying to maintain database access logic etc in multiple code bases.
What are relational databases in
general good for and what is MySQL
good for specifically? What is PHP
best at?
This consideration is fairly straight-forward. Relational databases across the board and specifically in any variant of SQL are going to do a great job at inserting, updating, and deleting data. Generally they also handle ATOMIC transactions well. However, most variants of SQL (including MySQL) are not good at complex calculations, on-the-fly date handling, file system access etc.
PHP on the other hand is very fast at handling calculations, dates, file system accesses. By taking a little time you can even design your PHP code to work in such a way that records are only retrieved once and then stored when necessary.
What are you most familiar /
comfortable with using?
Obviously it tends to make more sense to use the tool with which you are most familiar.
As a last point consider that just because a drill can be used to cut sheet rock or because a hammer can be used to drive a screw doesn't mean that they should be used for these things. Sometimes I think that programmers do more potential damage by trying to make more powerful tools that do everything rather than making simpler tools that do one thing really, really well.
A well done PHP application should be enought, but keep in mind that it also requires you to do the less calls to the database you can. Store values you'll need later in PHP, shorten queries, cache, etc.
MySQL optimization is always a must, as it will also decrease the amount of databse calls by PHP, and thus getting a better performance. Therefore, there's no way you can't think of stored procedures, etc, if your aim is to increase performance. But MySQL by itself would't be enought if your PHP code isn't well done (lots of unecessary database calls), that's why I think PHP must be well coded, keeping in mind the hole process while developing it, so that unecessary stuff doesn't get in the way. Cache for instance, in "duet" with proper MySQL, is a great boost on performance.
My POV, even not having much experience in developing large applications is to write business logic in the DB for some reasons:
1 - Maintainability, I think that languages deprecate functions and changes many other things in a short time period, so if PHP changes version, you'll need to adapt your code to the new version
2 - DBs tends to be more language stable, so when a new version of a RDBMS comes out, it usually doesn't change many things in the way you write your queries or SPs, or it even doesn't change. Writing your logic in DB will reduce code adaptation because of a new DB version
3 - A RDBMS is more likely to be alive for a long period rather than a programming language. Also, as your data is critical, there is a big worry from the RDBMS developers for automatic migration of your whole data to the new RDBMS version, including your SPs. When clipper died, there were no ways to migrate systems to a new programming language, they had to be completely rewritten.
4 - If you think someday to change completely the language you are writing the application for some reason(language death, for example), the only thing to be rewritten will be the presentation and the SP calls, not business logic.
I'd like to know from other people here if what I pointed out makes sense, and if not, why. I'm on the same situation as Sabeen Malik, I'm thinking to begin my first huge project and I'm tending towards SPs because of what I wrote. So it's time to correct my POV if it's not so correct.
MySQL sucks at using advanced DB techniques, it's simple and fast. PHP, being a dynamic language, makes processing data very easy. Therefore, it usually makes sense to use PHP.
Okay, so I'm sure plenty of you have built crazy database intensive pages...
I am building a page that I'd like to pull all sorts of unrelated database information from. Here are some sample different queries for this one page:
article content and info
IF the author is a registered user, their info
UPDATE the article's view counter
retrieve comments on the article
retrieve information for the authors of the comments
if the reader of the article is signed in, query for info on them
etc...
I know these are basically going to be pretty lightning quick, and that I could combine some; but I wanted to make sure that this isn't abnormal?
How many fairly normal and un-heavy queries would you limit yourself to on a page?
As many as needed, but not more.
Really: don't worry about optimization (right now). Build it first, measure performance second, and IFF there is a performance problem somewhere, then start with optimization.
Otherwise, you risk spending a lot of time on optimizing something that doesn't need optimization.
I've had pages with 50 queries on them without a problem. A fast query to a non-large (ie, fits in main memory) table can happen in 1 millisecond or less, so you can do quite a few of those.
If a page loads in less than 200 ms, you will have a snappy site. A big chunk of that is being used by latency between your server and the browser, so I like to aim for < 100ms of time spent on the server. Do as many queries as you want in that time period.
The big bottleneck is probably going to be the amount of time you have to spend on the project, so optimize for that first :) Optimize the code later, if you have to. That being said, if you are going to write any code related to this problem, write something that makes it obvious how long your queries are taking. That way you can at least find out you have a problem.
I don't think there is any one correct answer to this. I'd say as long as the queries are fast, and the page follows a logical flow, there shouldn't be any arbitrary cap imposed on them. I've seen pages fly with a dozen queries, and I've seen them crawl with one.
Every query requires a round-trip to your database server, so the cost of many queries grows larger with the latency to it.
If it runs on the same host there will still be a slight speed penalty, not only because a socket is between your application but also because the server has to parse your query, build the response, check access and whatever else overhead you got with SQL servers.
So in general it's better to have less queries.
You should try to do as much as possible in SQL, though: don't get stuff as input for some algorithm in your client language when the same algorithm could be implemented without hassle in SQL itself. This will not only reduce the number of your queries but also help a great deal in selecting only the rows you need.
Piskvor's answer still applies in any case.
Wordpress, for instance, can pull up to 30 queries a page. There are several things you can use to stop MySQL pull down - one of them being memchache - but right now and, as you say, if it will be straightforward just make sure all data you pull is properly indexed in MySQL and don't worry much about the number of queries.
If you're using a Framework (CodeIgniter for example) you can generally pull data for the page creation times and check whats pulling your site down.
As other have said, there is no single number. Whenever possible please use SQL for what it was built for and retrieve sets of data together.
Generally an indication that you may be doing something wrong is when you have a SQL inside a loop.
When possible Use joins to retrieve data that belongs together versus sending several statements.
Always try to make sure your statements retrieve exactly what you need with no extra fields/rows.
If you need the queries, you should just use them.
What I always try to do, is to have them executed all at once at the same place, so that there is no need for different parts (if they're separated...) of the page to make database connections. I figure it´s more efficient to store everything in variables than have every part of a page connect to the database.
In my experience, it is better to make two queries and post-process the results than to make one that takes ten times longer to run that you don't have to post-process. That said, it is also better to not repeat queries if you already have the result, and there are many different ways this can be achieved.
But all of that is oriented around performance optimization. So unless you really know what you're doing (hint: most people in this situation don't), just make the queries you need for the data you need and refactor it later.
I think that you should be limiting yourself to as few queries as possible. Try and combine queries to mutlitask and save time.
Premature optimisation is a problem like people have mentioned before, but that's where you're crapping up your code to make it run 'fast'. But people take this 'maxim' too far.
If you want to design with scalability in mind, just make sure whatever you do to load data is sufficiently abstracted and calls are centralized, this will make it easier when you need to implement a shared memory cache, as you'll only have to change a few things in a few places.