I am running around 50,000 queries on a MyISAM DB however the longer the query runs the slower it becomes. For example, at the beginning 20 rows can be done in a second and by the end it is doing 5 rows a second. Is this happening because MyISAM locks the tables after each insert or is it some other reason?
I think I have to use MyISAM because I am using match()against() with Fulltext indexes, which as far as I know only MyISAM supports. Is it possible to use InnoDB or some other DB type with this functionality or is there a way to speed up the queries as the execution time for the current query is about 45mins.
I anticipate a lot of answers about my query structure, I am collating three tables with different columns into one table and trying to match up the 3 different rows together to form a 'supertable' as a result I need to do a lot of match()against(), preg_match, selects, and looping in order to match them up with accuracy. In order to optimise my loops I am using PDO with prepared statements but this has only brought down the query time to the time I mentioned above, using mysql_connect and the standard functions it was closer to an hour.
Remember the query isn't slow to begin with, only after it's done about 10,000 queries does it slow down noticeably and as it does more it becomes slower and slower until it reaches an unreasonable level for any server company to accept, can anyone give me a solution?
My recommendation is try to update your MySQL,
If you have a lot of writting MySQL -V 5.6+ which allow you to use full text index in InnoDB Tables, beside that Innodb allow you to make rows lock so that is a HUGE thing.
You can also try to repair index with ANALIZE that could help a little bit the performance
MySQL Full text search can't efficiently handle big data by design. Switching from MyISAM to InnoDB is good idea most of the time as it will help with non-FT queries performance and concurrency and will keep you data safe as InnoDB is supporting transaction, but will not boost your FT queries speed much (according to Percona's benchmarks http://bit.ly/M6DMsj ).
I would suggest to move Full-Text queries out of MySQL. Any external search engine like Solr or Sphinx will be very helpful in this case.
I'm not an expert with Solr, but in case you decided to use Sphinx you could use http://astellar.com/2011/12/replacing-mysql-full-text-search-with-sphinx/ as a guide for initial configuration.
Hope this helps.
Related
I'm setting up a MySQL database and I'm not sure of the best method to structure it:
I am setting up a system (PHP/MySQL based) where a few hundred people will be executing SELECT/UPDATE/SET/DELETE queries to a database (probably about 50 simultaneously). I imagine there are going to be a few thousand rows if they're all using the same database and table. I could split the data across a number of tables but then I would have to make sure they're all uniform AND I, as the administrator, will be running some SELECT DISTINCT queries via cron to update an administrative interface.
What's the best way to approach this? Can I have everybody sharing one database? one table? Will there be a problem when there are a few thousand rows? I imagine there is going to be a huge performance issue over time.
Any tips or suggestions are welcome!
MySQL/php can easily handle this as long as your server is powerful enough. MySQL loves RAM and will use as much as it can (within the limits you provide).
If you're going to have a lot of concurrent users then I would suggest looking at using innodb tables instead of MyISAM (the default in MySQL versions <5.5). Innodb locks individual rows when doing INSERT/UPDATE/DELETE etc, rather than locking the whole table like MyISAM does.
We use php/MySQL and would have 1000+ users on our site at the same time (our master db server does about 4k queries per second).
I currently have 2000 records in a postgresql database being updated every minute that are filtered with a SQL statement. Upto 1000 different filter combinations can exist and approx 500 different filters can be called every minute. At the moment http responses are cached for 59 seconds to ease server load and database calls. However im considering caching the whole db table in memcached and doing the filtering in php. 2000 rows isnt alot but the response time for getting data from memory vs the db would be alot faster.
Would the php processing time outweigh the database response time for sql filtering for this number of rows? The table shouldnt grow anymore than 3000 rows in the foreseeable future.
As with any question relating to is x faster than y, the only real answer is to benchmark it for yourself. However, if the database is properly indexed for the queries you need to perform, it is likely to be quite a bit faster at filtering result sets than most any PHP code you could write.
The RDBMS is on the other hand, is already designed and optimized for locating, filtering, and ordering rows.
The way PostgreSQL operates, if you aren't extremely starving it for memory, 100% of such a small and frequently queried table will be held in RAM (Cache) already by the default caching algorithms. Having the database engine filter it is almost certainly faster than doing the same it in your application.
You may want to inspect your postgresql.conf, especially shared_buffers, the planner cost constants (set random_page_cost almost or exactly as low as seq_page_cost) and effective_cache_size (set it high enough).
You could probably benefit from optimizing indexes. There is a wide range of types available. Consider partial indexes, indexes on expression or multi-column indexes in addition to plain indexes. Test with EXPLAIN ANALYZE and only keep indexes that actually get used and speed up queries. As all of the table resides in RAM, the query planner should calculate that random access is almost or exactly as fast as sequential access. The difference only applies to disc reads.
As you updating every minute, be sure not to keep any indexes that aren't actually helping. Also, vacuuming and analyzing it frequently are keys to performance in such a case. Not VACUUM FULL ANALYZE, just VACUUM ANALYZE. Or use auto-vacuum with tuned settings.
Of course, all the standard advice on performance optimization applies.
I have recently switched my database tables from MYISAM to INNODB and experience bad timeouts with queries, mostly inserts. One function I use previously took <2 seconds to insert, delete and update a large collection of records across ~30 MYISAM tables, but now that they are INNODB, the function causes a PHP timeout.
The timeout was set to 60 seconds. I have optimised my script enough that now, even though there are still many queries, they are combined together (multiple inserts, multiple deletes, etc) and the script now takes ~25 seconds, which is a substantial increase from what appeared to be at least 60 seconds.
This duration is still over 10x quicker when previously using MYISAM, is there any mistakes I could be making in the way I process these queries? Or are there any settings that could assist in the performance? Currently the MySQL is using the default settings of installation.
The queries are nothing special, DELETE ... WHERE ... simple logic, same with the INSERT and UPDATE queries.
Hard to say without knowing too much about your environment, but this might be more of a database tuning problem. InnoDB can be VERY slow on budget hardware where every write forces a true flush. (This affects writes, not reads.)
For instance, you may want to read up on options like:
innodb_flush_log_at_trx_commit=2
sync_binlog=0
By avoiding the flushes you may be able to speed up your application considerably, but at the cost of potential data loss if the server crashes.
If data loss is something you absolutely cannot live with, then the other option is to use better hardware.
Run explain for each query. That is, if the slow query is select foo from bar;, run explain select foo from bar;.
Examine the plan, and add indices as necessary. Re-run the explain, and make sure the indices are being used.
Innodb builds hash indexes which helps to speed up lookup by indexes by passing BTREE index and using hash, which is faster
Recently I've been doing quite a big project with php + mysql. And now I'm concerned about my mysql. What should I do to make my mysql as optimal as possible? Tell everything you know, I'll be really very grateful.
Second question, I use one mysql query per page load which takes information from mysql. It's quite a big query, because I take information from a few tables with a join. Maybe I should do something else?
Thank you.
Some top tips from MySQL Performance tips forge
Specific Query Performance:
Use EXPLAIN to profile the query
execution plan
Use Slow Query Log (always have it
on!)
Don't use DISTINCT when you have or
could use GROUP BY Insert
performance
Batch INSERT and REPLACE
Use LOAD DATA instead of INSERT
LIMIT m,n may not be as fast as it
sounds
Don't use ORDER BY RAND() if you
have > ~2K records
Use SQL_NO_CACHE when you are
SELECTing frequently updated data or
large sets of data
Avoid wildcards at the start of LIKE
queries
Avoid correlated subqueries and in
select and where clause (try to
avoid in)
Scaling Performance Tips:
Use benchmarking
isolate workloads don't let administrative work interfere with customer performance. (ie backups)
Debugging sucks, testing rocks!
As your data grows, indexing may change (cardinality and selectivity change). Structuring may want to change. Make your schema as modular as your code. Make your code able to scale. Plan and embrace change, and get developers to do the same.
Network Performance Tips:
Minimize traffic by fetching only what you need.
1. Paging/chunked data retrieval to limit
2. Don't use SELECT *
3. Be wary of lots of small quick queries if a longer query can be more efficient
Use multi_query if appropriate to reduce round-trips
Use stored procedures to avoid bandwidth wastage
OS Performance Tips:
Use proper data partitions
1. For Cluster. Start thinking about Cluster before you need them
Keep the database host as clean as possible. Do you really need a windowing system on that server?
Utilize the strengths of the OS
pare down cron scripts
create a test environment
Learn to use the explain tool.
Three things:
Joins are not necessarily suboptimal. Oftentimes schemata that use joins will be faster than those that achieve the same but avoid table joins. The important thing is to know that your joins are optimal. EXPLAIN is very helpful but you also need to know how indexes work.
If you're grabbing data from the DB on every page hit, consider if a cacheing system would work for you. If so, check out PHP memcache and memcached. It's easy to use in PHP and very fast. It's popular for a reason.
Back to mysql: make sure you're key buffer is sized correctly. You can also think about using dedicated key buffers for critical indices that should remain in cache. Read about CACHE INDEX and LOAD INDEX INTO CACHE. See also here.
"...because I take information from a few tables with a join"
Joins, even "big" joins aren't bad. Just be sure that you have good indexes.
Also note that performance with a couple of records is a lot different than performance with hundreds of thousands of records, so test accordingly.
For performance, this book is good: High Perofmanace MYSQL. The associated blog is good too.
my 2cents: set your log_slow_queries to <2sec and use mysqlsla (get it from hackmysql.com) to analyse the 'slow' queries... Thisway you can just drilldown into the slower queries as they come along...
(the mysqlsla can also benefit from the log-queries-not-using-indexes option)
on mysqlhack.com there's a script called 'mysqlreport' that gives estimates on how your installation is runnig... (once it's running a while) and also gives pointers as to where to tune your setup more precisely...
Being perfect is a bit of a challenge and not the first target to set yourself.
Enable mysql logging of all queries, and write some code which parses the log files and removes any literal values from the SQL statements.
e.g. changes
SELECT * FROM atable WHERE something=5 AND other='splodgy';
and
SELECT * FROM atable WHERE something=1 AND other='zippy';
to something like:
SELECT * FROM atable WHERE something=:1 AND other=:2;
(Sorry, I've not got my code which does this to hand - but it's not rocket science)
Then shove the re-written log into a table so you can prioritize your performance fixes based on length and frequency of execution.
I'm building a PHP page with data sent from MySQL.
Is it better to have
1 SELECT query with 4 table joins, or
4 small SELECT queries with no table join; I do select from an ID
Which is faster and what is the pro/con of each method? I only need one row from each tables.
You should run a profiling tool if you're truly worried cause it depends on many things and it can vary but as a rule its better to have fewer queries being compiled and fewer round trips to the database.
Make sure you filter things as well as you can using your where and join on clauses.
But honestly, it usually doesn't matter since you're probably not going to be hit all that hard compared to what the database can do, so unless optimization is your spec you should not do it prematurely and do whats simplest.
Generally, it's better to have one SELECT statement. One of the main reasons to have databases is that they are fast at processing information, particularly if it is in the format of query.
If there is any drawback to this approach, it's that there are some kinds of analysis that you can't do with one big SELECT statement. RDBMS purists will insist that this is a database design problem, in which case you are back to my original suggestion.
When you use JOINs instead of multiple queries, you allow the database to apply its optimizations. You also are potentially retrieving rows that you don't need (if you were to replace an INNER join with multiple selects), which increases the network traffic between your app server and database server. Even if they're on the same box, this matters.
It might depend on what you do with the data after you fetch it from the DB. If you use each of the four results independently, then it would be more logical and clear to have four separate SELECT statements. On the other hand, if you use all the data together, like to create a unified row in a table or something, then I would go with the single SELECT and JOINs.
I've done a bit of PHP/MySQL work, and I find that even for queries on huge tables with tons of JOINs, the database is pretty good at optimizing - if you have smart indexes. So if you are serious about performance, start reading up on query optimization and indexing.
I would say 1 query with the join. This way you need to hit the server only once. And if your tables are joined with indexes, it should be fast.
Well under Oracle you'd want to take advantage of the query caching, and if you have a lot of small queries you are doing in your sequential processing, it would suck if the last query pushed the first one out of the cache...just in time for you to loop around and run that first query again (with different parameter values obviously) on the next pass.
We were building an XML output file using Java stored procedures and definitely found the round trip times for each individual query were eating us alive. We found it was much faster to get all the data in as few queries as possible, then plug those values into the XML DOM as needed.
The only downside is that the Java code was a bit less elegant, as the data fetch was now remote from its usage. But we had to generate a large complex XML file in as close to zero time as possible, so we had to optimize for speed.
Be careful when dealing with a merge table however. It has been my experience that although a single join can be good in most situations, when merge tables are involved you can run into strange situations.