Recently I've been working on quite a big project with PHP + MySQL, and now I'm concerned about my MySQL performance. What should I do to make my MySQL setup as optimal as possible? Tell me everything you know; I'll be really grateful.
Second question: I run one MySQL query per page load to fetch the page's data. It's quite a big query, because I take information from a few tables with a join. Maybe I should do something else?
Thank you.
Some top tips from the MySQL Performance Tips forge:
Specific Query Performance:
Use EXPLAIN to profile the query execution plan
Use the Slow Query Log (always have it on!)
Don't use DISTINCT when you have or could use GROUP BY
Insert performance:
Batch INSERT and REPLACE
Use LOAD DATA instead of INSERT
LIMIT m,n may not be as fast as it sounds
Don't use ORDER BY RAND() if you have > ~2K records
Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data
Avoid wildcards at the start of LIKE queries
Avoid correlated subqueries in SELECT and WHERE clauses (and try to avoid IN)
Scaling Performance Tips:
Use benchmarking
Isolate workloads: don't let administrative work (e.g. backups) interfere with customer-facing performance.
Debugging sucks, testing rocks!
As your data grows, indexing may change (cardinality and selectivity change), and your structure may need to change with it. Make your schema as modular as your code. Make your code able to scale. Plan for and embrace change, and get developers to do the same.
Network Performance Tips:
Minimize traffic by fetching only what you need.
1. Paging/chunked data retrieval to limit result sizes
2. Don't use SELECT *
3. Be wary of lots of small quick queries if a longer query can be more efficient
Use multi_query if appropriate to reduce round-trips
Use stored procedures to avoid bandwidth wastage
OS Performance Tips:
Use proper data partitions
1. For MySQL Cluster: start thinking about Cluster before you need it
Keep the database host as clean as possible. Do you really need a windowing system on that server?
Utilize the strengths of the OS
Pare down cron scripts
Create a test environment
Learn to use the EXPLAIN tool.
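To make that concrete, here is a minimal sketch of running EXPLAIN from PHP with PDO; the connection details and the articles/users schema are made up for illustration:
<?php
// Connect with PDO; the credentials and the articles/users tables are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Ask MySQL how it plans to execute the query instead of running it for real.
$plan = $pdo->query(
    "EXPLAIN SELECT a.id, a.title
     FROM articles a
     JOIN users u ON u.id = a.author_id
     WHERE u.email = 'someone@example.com'"
)->fetchAll(PDO::FETCH_ASSOC);

// Watch the type, key and rows columns: type=ALL with key=NULL means a full
// table scan, which usually points to a missing index.
foreach ($plan as $row) {
    print_r($row);
}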
Three things:
Joins are not necessarily suboptimal. Oftentimes schemata that use joins will be faster than those that achieve the same but avoid table joins. The important thing is to know that your joins are optimal. EXPLAIN is very helpful but you also need to know how indexes work.
If you're grabbing data from the DB on every page hit, consider whether a caching system would work for you. If so, check out PHP memcache and memcached; it's easy to use in PHP and very fast. It's popular for a reason (see the sketch at the end of this answer).
Back to MySQL: make sure your key buffer is sized correctly. You can also think about using dedicated key buffers for critical indexes that should remain in cache. Read about CACHE INDEX and LOAD INDEX INTO CACHE.
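If the memcache route fits, a minimal read-through cache sketch might look like the following; the server address, cache key, query and credentials are all illustrative:
<?php
// Hypothetical example: cache the result of the big per-page query for 60 seconds.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key  = 'frontpage_articles_v1';
$rows = $cache->get($key);

if ($rows === false) {
    // Cache miss (or error): hit MySQL once, then store the result.
    $pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
    $rows = $pdo->query(
        'SELECT id, title, created_at FROM articles ORDER BY created_at DESC LIMIT 20'
    )->fetchAll(PDO::FETCH_ASSOC);

    $cache->set($key, $rows, 60); // expire after 60 seconds
}

// $rows now comes from memcached on most page loads instead of MySQL.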
"...because I take information from a few tables with a join"
Joins, even "big" joins aren't bad. Just be sure that you have good indexes.
Also note that performance with a couple of records is a lot different than performance with hundreds of thousands of records, so test accordingly.
For performance, this book is good: High Performance MySQL. The associated blog is good too.
my 2 cents: set your slow query log threshold (long_query_time) to under 2 seconds and use mysqlsla (get it from hackmysql.com) to analyse the 'slow' queries... This way you can just drill down into the slower queries as they come along...
(mysqlsla can also benefit from the log-queries-not-using-indexes option)
On hackmysql.com there's also a script called 'mysqlreport' that reports on how your installation is running (once it's been running a while) and gives pointers as to where to tune your setup more precisely...
Being perfect is a bit of a challenge and not the first target to set yourself.
Enable mysql logging of all queries, and write some code which parses the log files and removes any literal values from the SQL statements.
e.g. changes
SELECT * FROM atable WHERE something=5 AND other='splodgy';
and
SELECT * FROM atable WHERE something=1 AND other='zippy';
to something like:
SELECT * FROM atable WHERE something=:1 AND other=:2;
(Sorry, I've not got my code which does this to hand - but it's not rocket science)
Then shove the re-written log into a table so you can prioritize your performance fixes based on length and frequency of execution.
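A rough sketch of that normalization step in PHP (regex-based, so it won't handle every edge case such as escaped quotes or negative numbers):
<?php
// Replace quoted strings and bare numbers with numbered placeholders so that
// queries differing only in their literal values collapse to the same shape.
function normalize_query($sql)
{
    $n = 0;
    return preg_replace_callback(
        '/\'[^\']*\'|\b\d+\b/',
        function ($m) use (&$n) {
            $n++;
            return ':' . $n;
        },
        $sql
    );
}

echo normalize_query("SELECT * FROM atable WHERE something=5 AND other='splodgy';");
// prints: SELECT * FROM atable WHERE something=:1 AND other=:2;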
I have a PHP page on my website, that uses over 100 mysql queries. All the queries are different, and are all just SELECT queries from multiple tables. On average, the page takes about 5 seconds to load, and I wish to improve this time.
What methods of optimization do I have? I did some research and took a look at memcache, but I don't know how it works, what it can do, or whether it applies to my situation, so help would be appreciated.
I was also thinking of a query caching program, but I don't know of any I can use.
Any help?
There are a number of options for MySQL.
The first is to set up the query cache in your MySQL config. If your program is SELECT-heavy, try setting low-priority-updates to on. This gives SELECT statements higher priority on the server and INSERT/DELETE/UPDATE statements lower priority.
Changing MySQL's use of memory might be a good idea, especially if you use a lot of JOIN statements - I usually set the join_buffer_size to about 8M.
From a PHP point-of-view, try caching results.
Edit: the class down the forum page that Suresh Kamrushi posted is a nice way of caching in PHP.
Below are some points which might be useful to optimize your page load:
MySQL:
Enable Query Cache
Select with only specific columns, avoid select * from syntax
Avoid correlated subqueries
Use Indexing
Avoid too many queries. If possible then try to use joins/unions
PHP:
Use a singleton to avoid multiple database connections (see the sketch after this list)
If possible, do calculation work in SQL as well
HTML:
CDN to load images/js/css in parallel
Sprite images
JS include in footer
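For the singleton point above, here is a minimal sketch; the class name, DSN and credentials are made up, and the idea is simply that every part of the page reuses one PDO connection:
<?php
// Hypothetical helper: one shared PDO connection for the whole request.
class Db
{
    private static $pdo = null;

    public static function get()
    {
        if (self::$pdo === null) {
            self::$pdo = new PDO(
                'mysql:host=localhost;dbname=mydb;charset=utf8',
                'user',
                'pass',
                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
            );
        }
        return self::$pdo;
    }
}

// Anywhere in the page, reuse the same connection:
$articles = Db::get()->query('SELECT id, title FROM articles LIMIT 10')->fetchAll();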
We have an existing PHP/MySQL app which doesn't have indexes configured correctly (monitoring shows that we do 85% table scans, ouch!)
What is a good process to follow to identify where we should be putting our indexes?
We're using PHP (Kohana using ORM for the DB access), and MySQL.
The answer likely depends on many things. For example, your strategy might be different if you want to optimize SELECTs at all costs or whether INSERTs are important to you as well. You might do well to read a MySQL Performance Tuning book or web site. There are several decent-to-great ones.
If you have a Slow Query Log, check it to see if there are particular queries that are causing problems. http://dev.mysql.com/doc/refman/5.5/en/slow-query-log.html
If you know the types of queries you'll be running or have identified problematic queries via the Slow Query Log or other mechanisms, you can then use the EXPLAIN command to get some stats on those queries. http://dev.mysql.com/doc/refman/5.5/en/explain.html
Once you have the output from EXPLAIN, you can use it to optimize. See http://dev.mysql.com/doc/refman/5.5/en/using-explain.html
Indexes are not just for the primary keys or the unique keys. If there are any columns in your table that you will search by, you should almost always index them.
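For example, if your code regularly looks users up by email, something along these lines (hypothetical table and column names) usually turns a full scan into an index lookup:
<?php
// One-off sketch: index the column you search by, then re-check the plan.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');

$pdo->exec('ALTER TABLE users ADD INDEX idx_users_email (email)');

// After adding the index, EXPLAIN should show type=ref and key=idx_users_email
// instead of type=ALL (a full table scan) for this kind of query:
$plan = $pdo->query("EXPLAIN SELECT id, name FROM users WHERE email = 'someone@example.com'")
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);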
I think this will help you with your database problems.
http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
Okay, so I'm sure plenty of you have built crazy database intensive pages...
I am building a page that I'd like to pull all sorts of unrelated database information from. Here are some sample different queries for this one page:
article content and info
IF the author is a registered user, their info
UPDATE the article's view counter
retrieve comments on the article
retrieve information for the authors of the comments
if the reader of the article is signed in, query for info on them
etc...
I know these are basically going to be pretty lightning quick, and that I could combine some; but I wanted to make sure that this isn't abnormal?
How many fairly normal and un-heavy queries would you limit yourself to on a page?
As many as needed, but not more.
Really: don't worry about optimization (right now). Build it first, measure performance second, and IFF there is a performance problem somewhere, then start with optimization.
Otherwise, you risk spending a lot of time on optimizing something that doesn't need optimization.
I've had pages with 50 queries on them without a problem. A fast query to a non-large (ie, fits in main memory) table can happen in 1 millisecond or less, so you can do quite a few of those.
If a page loads in less than 200 ms, you will have a snappy site. A big chunk of that is being used by latency between your server and the browser, so I like to aim for < 100ms of time spent on the server. Do as many queries as you want in that time period.
The big bottleneck is probably going to be the amount of time you have to spend on the project, so optimize for that first :) Optimize the code later, if you have to. That being said, if you are going to write any code related to this problem, write something that makes it obvious how long your queries are taking. That way you can at least find out you have a problem.
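A minimal sketch of that kind of instrumentation; the helper name and connection details are made up:
<?php
// Wrap query calls so every query logs how long it took.
function timed_query(PDO $pdo, $sql, array $params = [])
{
    $start = microtime(true);

    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $ms = (microtime(true) - $start) * 1000;
    error_log(sprintf('[%.1f ms] %s', $ms, $sql));

    return $rows;
}

// Usage:
$pdo      = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$comments = timed_query($pdo, 'SELECT * FROM comments WHERE article_id = ?', [42]);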
I don't think there is any one correct answer to this. I'd say as long as the queries are fast, and the page follows a logical flow, there shouldn't be any arbitrary cap imposed on them. I've seen pages fly with a dozen queries, and I've seen them crawl with one.
Every query requires a round-trip to your database server, so the cost of many queries grows larger with the latency to it.
If it runs on the same host there will still be a slight speed penalty, not only because there's a socket between your application and the server, but also because the server has to parse your query, build the response, check access, and whatever other overhead comes with SQL servers.
So in general it's better to have less queries.
You should try to do as much as possible in SQL, though: don't get stuff as input for some algorithm in your client language when the same algorithm could be implemented without hassle in SQL itself. This will not only reduce the number of your queries but also help a great deal in selecting only the rows you need.
Piskvor's answer still applies in any case.
WordPress, for instance, can run up to 30 queries per page. There are several things you can use to take load off MySQL, memcache being one of them, but for now, as you say, if it stays straightforward, just make sure all the data you pull is properly indexed in MySQL and don't worry much about the number of queries.
If you're using a framework (CodeIgniter for example) you can generally pull the page-creation timing data and check what's slowing your site down.
As others have said, there is no single number. Whenever possible, use SQL for what it was built for and retrieve sets of data together.
Generally, an indication that you may be doing something wrong is when you have a SQL query inside a loop (see the sketch below).
When possible, use joins to retrieve data that belongs together instead of sending several statements.
Always try to make sure your statements retrieve exactly what you need with no extra fields/rows.
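A sketch of what that looks like in practice with a hypothetical comments/users schema; the first version runs one query per comment, the second fetches the same data with a single JOIN:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');

// Anti-pattern: a query inside a loop (1 + N queries in total).
$comments = $pdo->query('SELECT id, author_id, body FROM comments WHERE article_id = 42')
                ->fetchAll(PDO::FETCH_ASSOC);
foreach ($comments as &$c) {
    $stmt = $pdo->prepare('SELECT name FROM users WHERE id = ?');
    $stmt->execute([$c['author_id']]);
    $c['author_name'] = $stmt->fetchColumn();
}
unset($c);

// Better: one statement with a JOIN retrieves the rows that belong together.
$comments = $pdo->query(
    'SELECT c.id, c.body, u.name AS author_name
     FROM comments c
     JOIN users u ON u.id = c.author_id
     WHERE c.article_id = 42'
)->fetchAll(PDO::FETCH_ASSOC);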
If you need the queries, you should just use them.
What I always try to do is have them all executed at once in the same place, so that different parts of the page (if they're separated...) don't each need to make database connections. I figure it's more efficient to store everything in variables than to have every part of a page connect to the database.
In my experience, it is better to make two queries and post-process the results than to make one that takes ten times longer to run that you don't have to post-process. That said, it is also better to not repeat queries if you already have the result, and there are many different ways this can be achieved.
But all of that is oriented around performance optimization. So unless you really know what you're doing (hint: most people in this situation don't), just make the queries you need for the data you need and refactor it later.
I think that you should limit yourself to as few queries as possible. Try to combine queries to multitask and save time.
Premature optimisation is a problem, like people have mentioned before, but that's about crapping up your code to make it run 'fast'. People take this 'maxim' too far, though.
If you want to design with scalability in mind, just make sure whatever you do to load data is sufficiently abstracted and calls are centralized, this will make it easier when you need to implement a shared memory cache, as you'll only have to change a few things in a few places.
I have a PHP script that calls an API method that can easily return 6k+ results.
I use PEAR DB_DataObject to write each row in a foreach loop to the DB.
The above script is batch processing 20 users at a time - and although some will only have a few results from the API others will have more. Worst case is that all have 1000's of results.
The loop to call the API seems to be ok, batches of 20 every 5 minutes works fine. My only concern is 1000's of mysql INSERTs for each user (with a long pause between each user for fresh API calls)
Is there a good way to do this? Or am I doing it a good way?!
Well, the fastest way to do it would be to do one insert statement with lots of values, like this:
INSERT INTO mytable (col1, col2) VALUES (?,?), (?,?), (?,?), ...
But that would probably require ditching the DB_DataObject method you are using now. You'll just have to weigh the performance benefits of doing it that way vs. the "ease of use" benefits of using DB_DataObject.
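If you do drop down below DB_DataObject for the inserts, here is a sketch with PDO (hypothetical results table) that builds one multi-row statement per chunk:
<?php
// Build one multi-row INSERT per batch; chunk to keep each statement a sane size.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$rows = [/* e.g. [['user_id' => 1, 'score' => 10], ...] collected from the API */];

foreach (array_chunk($rows, 500) as $chunk) {
    $placeholders = rtrim(str_repeat('(?,?),', count($chunk)), ',');
    $sql = "INSERT INTO results (user_id, score) VALUES $placeholders";

    $params = [];
    foreach ($chunk as $r) {
        $params[] = $r['user_id'];
        $params[] = $r['score'];
    }

    $pdo->prepare($sql)->execute($params);
}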
Like Kalium said, check where the bottleneck is.
If it is really the database, you could try the bulk import feature some DBMS offer.
In DB2, for example, it is called LOAD.
It works without SQL, but reads directly from a named pipe.
It is especially designed to be fast when you need to bring a large number of new rows into the database.
It can be configured to skip checks and index building, making it even faster.
Well, is your method producing more load than you can handle? If it's working, then I don't see any reason to change it offhand.
Database abstraction layers usually add a pretty decent amount of overhead. I've found that, in PHP at least, it's much easier to use a plain mysql_query for the sake of speed than it is to optimize your library of choice.
Like Eric P and weinzierl.name have said, using a multi-row insert or LOAD will give you the best direct performance.
I have a few ideas, but you will have to verify them with testing.
If the table you are inserting to has indexes, try to make sure they are optimized for inserts.
Check out optimization options here:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
Consider mysqli directly, or Pear::MDB2 or PDO. I understand that Pear::DB is fairly slow, though I don't use PEAR myself, so can't verify.
MySQL's LOAD DATA INFILE feature is probably the fastest way to do what you want.
You can take a look at the chapter Speed of INSERT Statements in the MySQL documentation.
It covers a lot of ways to improve INSERT performance in MySQL.
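A sketch of that approach from PHP with PDO; the file path, table and CSV format are illustrative, and LOCAL INFILE has to be enabled on both the client and the server:
<?php
// Write the batch to a CSV file first, then load the whole file in one statement.
$pdo = new PDO(
    'mysql:host=localhost;dbname=mydb;charset=utf8',
    'user',
    'pass',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // needed for LOCAL INFILE
);

$path = '/tmp/results.csv'; // e.g. one "user_id,score" line per row

$pdo->exec("
    LOAD DATA LOCAL INFILE " . $pdo->quote($path) . "
    INTO TABLE results
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    (user_id, score)
");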
I don't think a few thousand records should put any strain on your database; even my laptop should handle it nicely. Your biggest concern might be(come) gigantic tables if you don't do any cleanup or partitioning. Avoid premature optimization on that part.
As for your method, make sure you do each user (or batch) in a separate transaction. If you're on MySQL, make sure you're using InnoDB to avoid unnecessary locking. If you're already using InnoDB/Postgres/another database that supports transactions, you might see a significant performance increase.
Consider using COPY (at least on postgres - unsure about mysql).
Make sure your table is properly indexed (including removing unused ones). Indexes hurt insert speed.
Remember to optimize/vacuum regularly.
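A sketch of the per-batch transaction idea with PDO, assuming an InnoDB table; the table, columns and $apiRows variable are made up:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('INSERT INTO results (user_id, score) VALUES (?, ?)');

// $apiRows stands in for the rows returned by the API for one user/batch.
$pdo->beginTransaction();
try {
    foreach ($apiRows as $row) {
        $stmt->execute([$row['user_id'], $row['score']]);
    }
    $pdo->commit(); // one commit for the whole batch instead of one per row
} catch (Exception $e) {
    $pdo->rollBack(); // the whole batch either lands or doesn't
    throw $e;
}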
I'm building a PHP page with data sent from MySQL.
Is it better to have
1 SELECT query with 4 table joins, or
4 small SELECT queries with no table join; I select by an ID
Which is faster, and what are the pros/cons of each method? I only need one row from each table.
You should run a profiling tool if you're truly worried, because it depends on many things and can vary, but as a rule it's better to have fewer queries being compiled and fewer round trips to the database.
Make sure you filter things as well as you can using your WHERE and JOIN ON clauses.
But honestly, it usually doesn't matter, since you're probably not going to be hit all that hard compared to what the database can handle, so unless optimization is in your spec you shouldn't do it prematurely; just do what's simplest.
Generally, it's better to have one SELECT statement. One of the main reasons to have databases is that they are fast at processing information, particularly when that processing is expressed as a query.
If there is any drawback to this approach, it's that there are some kinds of analysis that you can't do with one big SELECT statement. RDBMS purists will insist that this is a database design problem, in which case you are back to my original suggestion.
When you use JOINs instead of multiple queries, you allow the database to apply its optimizations. You also are potentially retrieving rows that you don't need (if you were to replace an INNER join with multiple selects), which increases the network traffic between your app server and database server. Even if they're on the same box, this matters.
It might depend on what you do with the data after you fetch it from the DB. If you use each of the four results independently, then it would be more logical and clear to have four separate SELECT statements. On the other hand, if you use all the data together, like to create a unified row in a table or something, then I would go with the single SELECT and JOINs.
I've done a bit of PHP/MySQL work, and I find that even for queries on huge tables with tons of JOINs, the database is pretty good at optimizing - if you have smart indexes. So if you are serious about performance, start reading up on query optimization and indexing.
I would say 1 query with the join. This way you need to hit the server only once. And if your tables are joined with indexes, it should be fast.
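For instance, a single statement along these lines (hypothetical schema) pulls the one row you need from each of the four tables in a single round trip:
<?php
$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');

$stmt = $pdo->prepare(
    'SELECT a.title, u.name, c.label, s.view_count
     FROM articles   a
     JOIN users      u ON u.id = a.author_id
     JOIN categories c ON c.id = a.category_id
     JOIN stats      s ON s.article_id = a.id
     WHERE a.id = ?'
);
$stmt->execute([42]);
$row = $stmt->fetch(PDO::FETCH_ASSOC); // one row, one round trip
With indexes on the joined ID columns, each of those joins is effectively a key lookup.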
Well under Oracle you'd want to take advantage of the query caching, and if you have a lot of small queries you are doing in your sequential processing, it would suck if the last query pushed the first one out of the cache...just in time for you to loop around and run that first query again (with different parameter values obviously) on the next pass.
We were building an XML output file using Java stored procedures and definitely found the round trip times for each individual query were eating us alive. We found it was much faster to get all the data in as few queries as possible, then plug those values into the XML DOM as needed.
The only downside is that the Java code was a bit less elegant, as the data fetch was now remote from its usage. But we had to generate a large complex XML file in as close to zero time as possible, so we had to optimize for speed.
Be careful when dealing with MERGE tables, however. It has been my experience that although a single join is good in most situations, when MERGE tables are involved you can run into strange situations.