I am currently working on a very specialized PHP framework, which might have to handle large database transfers.
For example:
Take roughly half of the total user count; that is the amount of data the framework may have to work through on any given day.
So, if my framework is required by big projects, is it recommended to use single transactions with multiple queries (e.g. doing many things in 1 query with JOINs?), or is auto-commit preferred?
If possible, post some blog entries which have discussed this problem.
Thank you. ;)
MyISAM is transactionless, so autocommit does not affect it.
As for InnoDB, autocommit can make a loop of repeated queries hundreds of times slower, because every single statement is committed (and flushed to disk) on its own.
The best decision is of course doing everything set-based, but if you have to do queries in a loop, turn autocommit off.
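To make that concrete, here is a minimal PDO sketch of the loop case. The table and column names (user_archive, user_id, payload) and the $rows variable are made up for illustration; the point is one explicit transaction around the whole loop instead of a commit per statement.

```php
<?php
// Sketch: one InnoDB transaction around a loop of inserts, instead of
// letting autocommit commit (and flush) every statement individually.
// Assumes an existing data set in $rows; names are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('INSERT INTO user_archive (user_id, payload) VALUES (?, ?)');

$pdo->beginTransaction();            // autocommit is suspended for this unit of work
try {
    foreach ($rows as $row) {
        $stmt->execute([$row['user_id'], $row['payload']]);
    }
    $pdo->commit();                  // a single commit for the whole batch
} catch (Exception $e) {
    $pdo->rollBack();                // nothing is half-written on failure
    throw $e;
}
```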
Also note that "multiple queries" and "doing many things in one query with JOIN" are completely different things.
The latter is an atomic operation which succeeds or fails at once even on MyISAM.
Autocommit does not affect its performance as such: everything inside this query is always done in a single transaction.
Doing many things in one query is in fact the preferred approach.
Joins are not just useful; they are necessary in anything but the simplest of data structures. To me, considering writing queries without them shows a fundamental lack of understanding of relational database design and access. Not using joins will usually get you the wrong answer to your question in a SELECT. You may get away without joins in inserts, updates, and deletes, but they are often used, and useful, there as well.
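For illustration only, this is the sort of joined SELECT being talked about; the orders/customers schema is invented and the snippet assumes an existing PDO connection in $pdo.

```php
<?php
// A joined SELECT answers "orders together with their customer" in one
// atomic statement. Table and column names are hypothetical.
$sql = '
    SELECT o.id, o.total, c.name
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    WHERE o.created_at >= :since
';
$stmt = $pdo->prepare($sql);
$stmt->execute([':since' => '2011-01-01']);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```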
Related
I am building a game site that runs a lot of queries. For optimisation, which is better: handling the data with a lot of tables and relations, or fewer tables but with many fields?
I would think, especially with regard to inserts and updates, that fewer tables with many fields would be better than many tables, since many tables would mean more queries. Or...?
I'm trying to figure out the best course of action, because I am experiencing high load on my server in the evenings when I have a lot of users...
Start off with the database normalized. Ensure that you have the appropriate indexes for the queries/updates/inserts that you are doing. Use optimize table periodically.
If you are still encountering problems do some profiling to find out where the performance is insufficient. Then consider either denormalizing or perhaps rewriting the queries.
In addition make sure that the system cannot have deadlocks. That really messes up performance.
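As a rough sketch of the indexing and maintenance advice above (table, column, and variable names are hypothetical, and it assumes an existing PDO connection in $pdo):

```php
<?php
// Add an index that matches a frequent lookup, then query against it.
$pdo->exec('ALTER TABLE game_scores ADD INDEX idx_player_created (player_id, created_at)');

$stmt = $pdo->prepare(
    'SELECT score FROM game_scores WHERE player_id = ? ORDER BY created_at DESC LIMIT 10'
);
$stmt->execute([$playerId]);

// Periodic maintenance, e.g. from a nightly cron job.
$pdo->query('OPTIMIZE TABLE game_scores');
```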
I don't think the number of columns affects anything much, really; it's all about how well you've indexed the columns. If you do more updates than selects on a particular field, you might want to drop the index if you have one.
Not really an answer, just something I've noticed.
The creator of NotORM shows that it is possible to decompose a join of 3 tables into 3 faster queries: http://www.notorm.com/#performance.
Do you think it is possible to avoid joins and use multiple queries by putting the IDs in an IN clause?
That library (NotORM) does not support joins for the reason mentioned above. Do you think I could give up using joins and just use that library? It seems strange to me that it would be so easy to avoid joins.
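For reference, a hand-rolled version of that kind of decomposition might look like the following sketch. The application/author tables are illustrative and this is not NotORM's actual API, just the general idea of replacing a join with an IN (...) query and stitching the rows together in PHP.

```php
<?php
// Fetch the parent rows first. Assumes an existing PDO connection in $pdo.
$apps = $pdo->query('SELECT id, author_id, title FROM application LIMIT 50')
            ->fetchAll(PDO::FETCH_ASSOC);

// Collect the foreign keys and fetch the related rows with one IN (...) query.
// (Real code should guard against an empty ID list.)
$authorIds    = array_values(array_unique(array_column($apps, 'author_id')));
$placeholders = implode(',', array_fill(0, count($authorIds), '?'));

$stmt = $pdo->prepare("SELECT id, name FROM author WHERE id IN ($placeholders)");
$stmt->execute($authorIds);

$authorsById = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $author) {
    $authorsById[$author['id']] = $author;
}

// Stitch the two result sets together in PHP instead of in SQL.
foreach ($apps as $app) {
    $app['author'] = isset($authorsById[$app['author_id']])
        ? $authorsById[$app['author_id']]
        : null;
    // ... use $app ...
}
```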
For a relatively small scale application this can be feasible, but in situations where you might be passing many arguments (equivalent to your join condition matching many rows) it will fail you. The best example I can think of is a limitation in Informix (I don't know if this is true of the latest versions) where prepared statements did not allow more than 20 or so arguments to be passed.
One reason to use joins over individual queries is that it allows the database's optimizer to come up with the best plan for the tables involved, based on the existing data. If you break a joined query out into individual queries, you're effectively hard-coding the execution plan. Among other things, that assumes that you're going to spend some time thinking about the selectivity of your queries when you write your code and that the selectivity will remain the same for the lifetime of your application. Those are pretty big assumptions, in my opinion.
Also, if you're using your database as more than a dumb receptacle, you'll likely find that there are queries that don't break down into individual queries quite so easily (e.g. just about anytime you use aggregate functions).
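As a hypothetical example of a query that does not split apart nicely, consider an aggregate over a join; the schema here is invented.

```php
<?php
// The aggregate depends on the join itself, so splitting this into per-table
// queries would mean re-implementing GROUP BY/HAVING in application code.
$sql = '
    SELECT c.name, COUNT(o.id) AS order_count, SUM(o.total) AS revenue
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    HAVING SUM(o.total) > 1000
';
foreach ($pdo->query($sql, PDO::FETCH_ASSOC) as $row) {
    // ... $row['name'], $row['order_count'], $row['revenue'] ...
}
```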
Basically we have sales people that request leads to call. Right now it tries a "fresh lead" query to get those.
If there aren't any fresh leads it moves on to a "relatively new" query. We call these "sources" and essentially a closer will go through sources until they find a viable lead.
These queries all hit the same table, just different groups of data. However, there is a lot of complex sorting on each query, and between that and the inserts/updates to the table (the table being InnoDB) we're experiencing lots of lock waits (no deadlocks, I'm pretty sure, since none show up in the InnoDB status), so my guess is we have slow selects coupled with lots of inserts/updates.
NOW, the ultimate question IS:
Should we query the DB for each source, grab about 100 leads (obviously variable depending on the system), and cache them in memcached? Then, as closers request leads, serve them from the cache, but update the cache to reflect an "is_accepted" flag. This way we only run each source's query as we run out of cached leads, so just once when the cache empties, instead of once per closer requesting a lead?
Then we can use simulated locking with memcached - http://code.google.com/p/memcached/wiki/FAQ#Emulating_locking_with_the_add_command
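A rough sketch of that add()-based locking pattern with the pecl/memcached extension follows; the key name, TTL, and what happens inside the critical section are illustrative, not a tested recipe.

```php
<?php
// memcached's add() only succeeds if the key does not already exist,
// so it can act as a crude mutex around the cached lead list.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$lockKey = 'lock:leads:source:fresh';   // hypothetical key name

if ($mc->add($lockKey, 1, 10)) {        // 10s TTL guards against a crashed holder
    try {
        // ... pop a lead from the cached list and flag it as accepted ...
    } finally {
        $mc->delete($lockKey);          // release the lock
    }
} else {
    // Someone else holds the lock: retry shortly or fall back to the DB query.
}
```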
Does this seem like a viable solution? Any recommendations? We need to minimize the chances of lock waits desperately and quickly.
Sounds viable, but have you looked at your indexes and are you using proper isolation levels on your selects?
This previous SO question may help with the answer you're seeking: Any way to select without causing locking in MySQL?
If you perform your select/update in a stored procedure wrapped in a full transaction, this could also speed things up quite a bit due to optimization. Of course, there are times when stored procedures in MySQL are much slower :(
I'd have put this as a comment, but haven't reached that level yet :)
And I did read the part about InnoDB, but experience has shown me improvements even with InnoDB when using isolation levels.
You should definitely look at making sure your DB queries are fully optimized before you employ another datastore.
If you do decide to cache this data then consider using Redis, which makes lists first class citizens.
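If you do go the Redis route, a very rough sketch with the phpredis extension might look like this; key names and the refill logic are purely illustrative.

```php
<?php
// Each "source" becomes a Redis list; closers pop leads atomically with lPop(),
// so two closers can never receive the same cached lead.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$key = 'leads:fresh';                        // hypothetical key per source

// Refill the list when it runs low, e.g. from the "fresh lead" SQL query.
if ($redis->lLen($key) < 10) {
    foreach ($freshLeadIdsFromDb as $leadId) {   // assumed to come from MySQL
        $redis->rPush($key, $leadId);
    }
}

// Hand a lead to a closer.
$leadId = $redis->lPop($key);
if ($leadId !== false) {
    // ... mark the lead accepted in MySQL ...
}
```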
I have heard about this problem, and now I am looking for more specific information.
How does it happen, what are the reasons for it, and what is the mechanism of a deadlock in detail, so I can try to avoid it? How do I detect a deadlock, resolve it, and protect the data from being corrupted by it? The case in question is MySQL used with PHP.
And can I mix InnoDB and MyISAM? I intend to use InnoDB for some major tables with many relationships and not that much data, such as users, roles, privileges, companies, etc., and use MyISAM for tables that hold more data: customer data, action data, etc. I would like to use only InnoDB, but the move from MyISAM scares me a bit in terms of speed and stability. And now these deadlocks :(
Deadlocks can occur if you've got two or more independent queries accessing the same resources (tables/rows) at the same time. A real world example:
Two mechanics are working on two cars. At some point during the repair, they both need a screwdriver and a hammer to loosen some badly stuck part. Mechanic A grabs the screwdriver, Mechanic B grabs the hammer, and now neither can continue, as the second tool they need is not available: they're deadlocked.
Now, humans are smart and one of the mechanics will be gracious and hand over their tool to the other: both can continue working. Databases are somewhat stupid, and neither query will be gracious and unlock whatever resource is causing the deadlock. At this point, the DBMS will turn Rambo and force a roll back (or just kill) one or more of the mutually locked queries. That will let one lucky query continue and proceed to get the locks/transactions it needs, and hopefully the aborted ones have smart enough applications handling them which will restart the transactions again later. On older/simpler DBMSs, the whole system would grind to a halt until the DBA went in and did some manual cleanup.
There are plenty of methods for coping with deadlocks, and for avoiding them in the first place. One big one is to never lock resources in "random" orders. In our mechanics' case, both should reach for the screwdriver first, before reaching for the hammer. That way one can successfully start working immediately, while the other knows he has to wait.
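On the application side, a common way of coping is simply to retry a transaction that was chosen as the deadlock victim (MySQL reports error 1213 for this). A hedged PDO sketch, with made-up names:

```php
<?php
// Retry the whole unit of work a few times when InnoDB picks it as the
// deadlock victim; anything else is re-thrown immediately.
function runWithDeadlockRetry(PDO $pdo, callable $work, $maxRetries = 3)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            $pdo->beginTransaction();
            $work($pdo);                 // the caller's queries run here
            $pdo->commit();
            return;
        } catch (PDOException $e) {
            if ($pdo->inTransaction()) {
                $pdo->rollBack();
            }
            $isDeadlock = isset($e->errorInfo[1]) && (int) $e->errorInfo[1] === 1213;
            if (!$isDeadlock || $attempt >= $maxRetries) {
                throw $e;                // not a deadlock, or out of retries
            }
            usleep(50000 * $attempt);    // brief backoff before retrying
        }
    }
}
```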
As for mixing InnodB/MyISAM - MySQL fully supports mixing/matching table types in queries. You can select/join/update/insert/delete/alter in any order you want, just remember that doing anything to a MyISAM table within an InnoDB transaction will NOT make MyISAM magically transaction aware. The MyISAM portions will execute/commit immediately, and if you roll back the InnoDB side of things, MyISAM will not roll back as well.
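A tiny illustration of that caveat, assuming a hypothetical InnoDB table `accounts` and a MyISAM table `activity_log`:

```php
<?php
// The MyISAM insert commits immediately and ignores the rollback.
$pdo->beginTransaction();
$pdo->exec("INSERT INTO accounts (name) VALUES ('alice')");        // InnoDB: transactional
$pdo->exec("INSERT INTO activity_log (msg) VALUES ('created')");   // MyISAM: commits at once
$pdo->rollBack();
// Afterwards: `accounts` has no new row, but `activity_log` keeps its row.
```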
The only major reason to stick with MyISAM these days is its support for fulltext indexing. Other than that, InnoDB will generally be the better choice, as it's got full transaction support and row-level locking.
I'm building a PHP page with data sent from MySQL.
Is it better to have
1 SELECT query with 4 table joins, or
4 small SELECT queries with no table join; each one selects by an ID
Which is faster, and what are the pros/cons of each method? I only need one row from each table.
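To make the question concrete, the two options might look roughly like this with PDO; the table and column names are invented for the example.

```php
<?php
// Option 1: one SELECT with joins, a single round trip.
$stmt = $pdo->prepare('
    SELECT u.name, p.bio, s.theme, b.balance
    FROM users u
    JOIN profiles p ON p.user_id = u.id
    JOIN settings s ON s.user_id = u.id
    JOIN billing  b ON b.user_id = u.id
    WHERE u.id = ?
');
$stmt->execute([$userId]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// Option 2: four small SELECTs by ID, four round trips.
$q = function ($sql) use ($pdo, $userId) {
    $s = $pdo->prepare($sql);
    $s->execute([$userId]);
    return $s->fetch(PDO::FETCH_ASSOC);
};
$user     = $q('SELECT name    FROM users    WHERE id = ?');
$profile  = $q('SELECT bio     FROM profiles WHERE user_id = ?');
$settings = $q('SELECT theme   FROM settings WHERE user_id = ?');
$billing  = $q('SELECT balance FROM billing  WHERE user_id = ?');
```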
You should run a profiling tool if you're truly worried, because it depends on many things and can vary, but as a rule it's better to have fewer queries being compiled and fewer round trips to the database.
Make sure you filter things as well as you can using your WHERE and JOIN ... ON clauses.
But honestly, it usually doesn't matter, since you're probably not going to be hit all that hard compared to what the database can handle; so unless optimization is your spec, you should not do it prematurely, and do what's simplest.
Generally, it's better to have one SELECT statement. One of the main reasons to have databases is that they are fast at processing information, particularly when the work can be expressed as a single query.
If there is any drawback to this approach, it's that there are some kinds of analysis that you can't do with one big SELECT statement. RDBMS purists will insist that this is a database design problem, in which case you are back to my original suggestion.
When you use JOINs instead of multiple queries, you allow the database to apply its optimizations. You also are potentially retrieving rows that you don't need (if you were to replace an INNER join with multiple selects), which increases the network traffic between your app server and database server. Even if they're on the same box, this matters.
It might depend on what you do with the data after you fetch it from the DB. If you use each of the four results independently, then it would be more logical and clear to have four separate SELECT statements. On the other hand, if you use all the data together, like to create a unified row in a table or something, then I would go with the single SELECT and JOINs.
I've done a bit of PHP/MySQL work, and I find that even for queries on huge tables with tons of JOINs, the database is pretty good at optimizing - if you have smart indexes. So if you are serious about performance, start reading up on query optimization and indexing.
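One quick way to check whether the indexes are actually being used is to prefix the query with EXPLAIN and look at the key and rows columns of MySQL's output; the schema below is made up.

```php
<?php
// EXPLAIN shows, per table, which index was chosen and how many rows
// MySQL expects to examine.
$sql = 'EXPLAIN SELECT u.name, p.bio
        FROM users u
        JOIN profiles p ON p.user_id = u.id
        WHERE u.id = 42';
foreach ($pdo->query($sql, PDO::FETCH_ASSOC) as $row) {
    printf("table=%s key=%s rows=%s\n", $row['table'], $row['key'], $row['rows']);
}
```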
I would say 1 query with the join. This way you need to hit the server only once. And if your tables are joined with indexes, it should be fast.
Well under Oracle you'd want to take advantage of the query caching, and if you have a lot of small queries you are doing in your sequential processing, it would suck if the last query pushed the first one out of the cache...just in time for you to loop around and run that first query again (with different parameter values obviously) on the next pass.
We were building an XML output file using Java stored procedures and definitely found the round trip times for each individual query were eating us alive. We found it was much faster to get all the data in as few queries as possible, then plug those values into the XML DOM as needed.
The only downside is that the Java code was a bit less elegant, as the data fetch was now remote from its usage. But we had to generate a large complex XML file in as close to zero time as possible, so we had to optimize for speed.
Be careful when dealing with MERGE tables, however. It has been my experience that although a single join is good in most situations, when MERGE tables are involved you can run into strange situations.