I am building an application that requires a MySQL table to be emptied and refilled with fresh data every minute. At the same time, the table is expected to receive a constant 10-15 SELECT statements per second. The SELECT statements should in general be very fast (selecting 10-50 medium-length strings each time). A few things I'm worried about:
Is there the potential for a SELECT query to run in between the TRUNCATE and INSERT queries and so return 0 rows? Do I need to lock the table when executing the TRUNCATE-INSERT query pair?
Are there any significant performance issues I should worry about regarding this setup?
There most probably is a better way to achieve your goal. But here's a possible answer to your question anyway: You can encapsulate queries that are meant to be executed together in a transaction. Off the top of my head, something like:
START TRANSACTION;
TRUNCATE foo;
INSERT INTO foo ...;
COMMIT;
EDIT: The above part is plain wrong: TRUNCATE causes an implicit commit in MySQL, so it cannot be rolled back as part of a transaction. See Philip Devine's comment. Thanks.
Regarding the performance question: Repeatedly connecting to the server can be costly. If you have a persistent connection, you should be fine. You can save little bits here and there by executing multiple queries in a batch or using Prepared Statements.
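For illustration, here is a minimal sketch of server-side prepared statements in plain SQL (the table and column names are made up); in application code you would normally use your client library's prepared-statement API instead, which gives the same parse-once benefit:

PREPARE ins FROM 'INSERT INTO foo (val) VALUES (?)';
SET @v = 'some medium-length string';
EXECUTE ins USING @v;    -- re-executing skips the parse step
DEALLOCATE PREPARE ins;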
Why do you need to truncate it every minute? Yes, that will result in your users having no rows returned some of the time. Just update the rows instead of truncating and inserting.
A second option is to insert the new values into a new table and then rename the two tables, like so:
RENAME TABLE tbl_name TO new_tbl_name
[, tbl_name2 TO new_tbl_name2]
Then truncate the old table.
Then your users see zero downtime. The TRUNCATE in the other answer ignores transactions (it causes an implicit commit) and happens immediately, so don't do that!
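A sketch of that swap pattern, assuming the live table is called foo (all names here are illustrative):

CREATE TABLE foo_new LIKE foo;                 -- empty copy with the same structure
INSERT INTO foo_new ...;                       -- load the fresh data
RENAME TABLE foo TO foo_old, foo_new TO foo;   -- atomic swap
TRUNCATE TABLE foo_old;                        -- clean up for the next run

RENAME TABLE renames all the listed tables atomically, so readers either see the old data or the new data, never an empty table.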
I have to insert data into a MySQL database (approx. 200,000 rows). I am a little confused about the insert query. I have two options to insert the data into MySQL:
INSERT INTO paper VALUES('a','b','c','d');
INSERT INTO paper VALUES('e','f','g','h');
INSERT INTO paper VALUES('k','l','m','n');
and
INSERT INTO paper VALUES('a','b','c','d'),('e','f','g','h'),('k','l','m','n');
Which insert query performs faster? What is the difference between the queries?
TL;DR
The second query will be faster. Why? Read below...
Basically, a query is executed in various steps:
Connecting: both versions of your code have to do this
Sending the query to the server: applies to both versions, but the second version sends only one query
Parsing the query: applies to both versions, but the second version needs only one query to be parsed
Inserting rows: same in both cases
Inserting indexes: in theory the same in both cases, but I'd expect MySQL to update the index once after the bulk insert in the second case, making it potentially faster
Closing: same in both cases
Of course, this doesn't tell the whole story: table locks have an impact on performance, and the MySQL configuration, the use of prepared statements, and transactions may result in better (or worse) performance, too. And of course, the way your DB server is set up makes a difference as well.
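For example, with InnoDB and autocommit enabled, every single INSERT is its own transaction and forces its own log flush; wrapping the chunks in one explicit transaction (a sketch, reusing the paper table from the question) commits, and therefore flushes, only once:

START TRANSACTION;
INSERT INTO paper VALUES ('a','b','c','d'), ('e','f','g','h');
INSERT INTO paper VALUES ('k','l','m','n'), ('o','p','q','r');
COMMIT;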
So we return to the age-old mantra:
When in doubt: test!
Depending on what your tests tell you, you might want to change some configuration, and test again until you find the best config.
In case of a big data-set, the ideal compromise will probably be a combination of both versions:
LOCK TABLES paper WRITE;
/* chunked multi-row inserts while holding the lock; a transaction probably helps here, too */
INSERT INTO paper VALUES ('a','b','c','d'), ('e','f','g','h');
INSERT INTO paper VALUES ('k','l','m','n'), ('o','p','q','r');
UNLOCK TABLES;
Just RTM - MySQL insert speed:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster. See Section 5.1.4, “Server System Variables”.
If you can't use multiple values, then locking is an easy way to speed up the inserts too, as explained on the same page:
To speed up INSERT operations that are performed with multiple statements for nontransactional tables, lock your tables:
LOCK TABLES a WRITE;
INSERT INTO a VALUES (1,23),(2,34),(4,33);
INSERT INTO a VALUES (8,26),(6,29);
/* ... */
UNLOCK TABLES;
This benefits performance because the index buffer is flushed to disk only once, after all INSERT statements have completed. Normally, there would be as many index buffer flushes as there are INSERT statements. Explicit locking statements are not needed if you can insert all rows with a single INSERT.
Read through the entire page for details.
I'm not sure which is faster in a purely database-side sense. But when you call the database from your PHP scripts, the second way should be much faster, as you save the overhead of multiple calls.
Anyway. There is just one way to know. TEST IT.
There is a large table that holds millions of records. phpMyAdmin reports a size of 1.2 GB for the table.
There is a calculation that needs to be done for every row. The calculation is not simple (it cannot be expressed as a plain SET col = <calculation>); it uses a stored function to get the values, so currently we run a single UPDATE for each row.
This is extremely slow and we want to optimize it.
Stored function:
https://gist.github.com/a9c2f9275644409dd19d
And this is called by this method for every row:
https://gist.github.com/82adfd97b9e5797feea6
This is performed on an off-live server, and usually it is updated once per week.
What options do we have here?
Why not set up a separate table to hold the computed values and take the load off your current table? It can have two columns: the primary key of each row in your main table, and a column for the computed value.
Then your process can be:
a) Truncate computedValues table - This is faster than trying to identify new rows
b) Compute the values and insert into the computed values table
c) Whenever you need your computed values, you join to the computedValues table using a primary-key join, which is fast; and in case you need more computations, you just add new columns.
d) You can also update the main table using the computed values if you have to
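A minimal sketch of that layout, assuming the main table is usage_bill with primary key id; the function name computeGpcd is invented here, so substitute the real stored function from the gist:

CREATE TABLE computedValues (
  usage_bill_id INT UNSIGNED NOT NULL PRIMARY KEY,  -- PK of the main table
  gpcd DECIMAL(10,2)                                -- the computed value
);

-- weekly refresh:
TRUNCATE TABLE computedValues;
INSERT INTO computedValues (usage_bill_id, gpcd)
SELECT id, computeGpcd(id) FROM usage_bill;

-- reads join on the primary key:
SELECT b.*, c.gpcd
FROM usage_bill b
JOIN computedValues c ON c.usage_bill_id = b.id;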
Well, the problem doesn't seem to be the UPDATE query itself, because no calculations are performed in the query: it seems the calculations are performed first and then the UPDATE query is run. So the UPDATE should be quick enough.
When you say "this is extremely slow", I assume you are not referring to the UPDATE query but the complete process. Here are some quick thoughts:
As you said, there are millions of records; updating that many entries is always time-consuming. And if there are many columns and indexes defined on the table, that adds to the overhead.
I see that there are many REPLACE INTO queries in the function getNumberOfPeople(). These might well be a reason for the slow process. Have you checked how efficient these REPLACE INTO queries are? Can you try removing them and then see if that has any impact on the UPDATE process?
There are a couple of SELECT queries in getNumberOfPeople() too. Check whether they might be impacting the process and, if so, try optimizing them.
In the procedure updateGPCD(), you may try replacing SELECT COUNT(*) INTO _has_breakdown with SELECT COUNT(1) INTO _has_breakdown. In the same query, the WHERE condition reads _ACCOUNT, but this will fail when _ACCOUNT = 0, no?
As another suggestion: if it is the UPDATE that you think is slow, because of reason 1, it might make sense to move the column being updated, gpcd, out of usage_bill into another table. The only other column in that table should be the unique ID from usage_bill.
Hope the above makes sense.
I have a MySQL table fg_stock. Most of the time, concurrent access happens on this table. I used this code, but it doesn't work:
<?php
mysql_query("LOCK TABLES fg_stock READ");
$select=mysql_query("SELECT stock FROM fg_stock WHERE Item='$item'");
while ($res = mysql_fetch_array($select))
{
    $stock = $res['stock'];
    $close_stock = $stock + $qty_in;
    $update = mysql_query("UPDATE fg_stock SET stock='$close_stock' WHERE Item='$item' LIMIT 1");
}
mysql_query("UNLOCK TABLES");
?>
Is this okay?
"Most of the time concurrent access is happening in this table"
So why would you want to lock the ENTIRE table when it's clear you are attempting to access a specific row (WHERE Item='$item')? Chances are you are running the MyISAM storage engine for the table in question; you should look into using the InnoDB engine instead, as one of its strong points is that it supports row-level locking, so you don't need to lock the entire table.
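A sketch of what that can look like (the item value is made up, and converting an engine rebuilds the table, so do it with care):

ALTER TABLE fg_stock ENGINE=InnoDB;   -- one-time conversion

START TRANSACTION;
-- locks only the matching row(s), not the whole table:
SELECT stock FROM fg_stock WHERE Item = 'ITEM42' FOR UPDATE;
UPDATE fg_stock SET stock = stock + 5 WHERE Item = 'ITEM42';
COMMIT;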
Why do you need to lock your table anyway?
mysql_query("UPDATE fg_stock SET stock=stock+$qty_in WHERE Item='$item'");
That's it! No need to lock the table, and no need for an unnecessary loop with a set of queries. Just try to avoid SQL injection, for example by using PHP's intval() function on $qty_in (if it is an integer, of course).
Also, the concurrent-access trouble probably only happens because of non-optimized work with the database, with an excessive number of queries.
PS: Moreover, your example does not make any sense, as MySQL could update the same record over and over in the loop. You did not tell MySQL exactly which record you want to update; you only told it to update one record with Item='$item'. On the next iteration the SAME record could be updated again, because MySQL does not know the difference between records it has already updated and those it has not touched yet.
http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html
mysql> LOCK TABLES real_table WRITE, temp_table WRITE;
mysql> INSERT INTO real_table SELECT * FROM temp_table;
mysql> DELETE FROM temp_table;
mysql> UNLOCK TABLES;
So your syntax is correct.
Also from another question:
Troubleshooting: You can test for table lock success by trying to work with another table that is not locked. If you obtained the lock, trying to write to a table that was not included in the lock statement should generate an error.

You may want to consider an alternative solution. Instead of locking, perform an update that includes the changed elements as part of the WHERE clause. If the data that you are changing has changed since you read it, the update will "fail" and return zero rows modified. This eliminates the table lock, and all the messy horrors that may come with it, including deadlocks.
PHP, mysqli, and table locks?
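Applied to the fg_stock table from the question, that optimistic pattern might look like this (the values are illustrative):

-- 1. read the current value
SELECT stock FROM fg_stock WHERE Item = 'ITEM42';   -- say it returns 7

-- 2. write back only if nothing has changed in the meantime
UPDATE fg_stock
SET stock = 7 + 5
WHERE Item = 'ITEM42' AND stock = 7;

-- if the affected-row count is 0, someone else changed the row first: re-read and retry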
I have frequent updates to a user table that simply set the last-seen time of a user, and I was wondering whether there is a simple way to defer them and group them into a single query after a short timeout (5 minutes or so). This would reduce the number of queries on my user database quite a lot.
If you do an UPDATE LOW_PRIORITY table ... you make sure it only executes your update when the table is not being used for anything else. Besides that, I don't think there are many options inside MySQL.
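For example (the table and column names are invented; note that LOW_PRIORITY only takes effect on storage engines that use table-level locking, such as MyISAM):

UPDATE LOW_PRIORITY users SET last_seen = NOW() WHERE id = 42;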
Also, is it causing problems now, or are you simply optimizing something that isn't a problem? Personally, if I were to batch updates like these, I would simply collect all the IDs in memcached and use a cronjob to update every 5 minutes.
Wolph's suggestion should do the trick. Also possible is to create a second table without any indexes on it and insert all your data into that table; it can even be an in-memory table. Then you can do a periodic INSERT INTO table1 SELECT * FROM table2 ON DUPLICATE KEY UPDATE ... to transfer the rows to the main table, as sketched below.
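A sketch of that staging-table idea, assuming the main table is users with primary key id (all names are invented):

CREATE TABLE last_seen_staging (
  user_id INT UNSIGNED NOT NULL,
  seen_at DATETIME NOT NULL
) ENGINE=MEMORY;   -- no indexes, so writes are cheap

-- during normal operation:
INSERT INTO last_seen_staging VALUES (42, NOW());

-- periodic flush, e.g. from a cron job
-- (assumes every user_id already exists in users, so the UPDATE branch is taken):
INSERT INTO users (id, last_seen)
SELECT user_id, MAX(seen_at) FROM last_seen_staging GROUP BY user_id
ON DUPLICATE KEY UPDATE last_seen = VALUES(last_seen);
TRUNCATE TABLE last_seen_staging;

Note that rows written between the flush and the TRUNCATE would be lost; locking the staging table around the two statements (or swapping in an empty table with RENAME TABLE) closes that gap.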
I am trying to find the fastest way to insert data into a table (the data comes from a SELECT).
I always clear the table first:
TRUNCATE TABLE table;
Then I do this to insert the data:
INSERT INTO table(id,total) (SELECT id, COUNT(id) AS Total FROM table2 GROUP BY id);
Someone told me I shouldn't do this.
He said this would be much faster:
CREATE TABLE IF NOT EXISTS table (PRIMARY KEY (inskey)) SELECT id, count(id) AS total FROM table2 GROUP BY id
Any ideas on this one?
I think my solution is cleaner, because I don't have to check for the table.
This will be run in a cron job a few times a day.
EDIT: I wasn't clear. The TRUNCATE is always run; it's just a matter of the fastest way to insert all the data.
I also think your solution is cleaner; plus, the solution by "someone" looks to me to have some problems:
it does not actually delete old data that may be in the table
CREATE TABLE ... SELECT will create the table's columns with types based on what the SELECT returns. That means changes in the table structure of table2 will propagate to table. That may or may not be what you want; it at least introduces an implicit coupling, which I find to be a bad idea.
As for performance, I see no reason why one should be faster than the other. So the usual advice applies: Choose the cleanest, most maintainable solution, test it, only optimize if performance is a problem :-).
Your solution would be my choice; the performance loss (if any, which I'm not sure about, since you don't drop/create the table and re-compute column types) is negligible and IMHO is outweighed by cleanliness.
CREATE TABLE IF NOT EXISTS table (PRIMARY KEY (inskey))
SELECT id, COUNT(id) AS total
FROM table2
GROUP BY id
This will not delete old values from the table.
If that's what you want, it will be faster indeed.
Perhaps something has been lost in the translation between your Someone and yourself. One possibility s/he might have been referring to is DROP/SELECT INTO vs TRUNCATE/INSERT.
I have heard that the latter is faster as it is minimally logged (but then again, what's the eventual cost of the DROP here?). I have no hard stats to back this up.
I agree with "sleske"'s suggestion that you test it and optimize the solution yourself. DIY!
Every self-respecting DB will give you the opportunity to roll back your transaction.
1. Rolling back your INSERT INTO ... requires the DB to keep track of every row inserted into the table.
2. Rolling back the CREATE TABLE ... is super easy for the DB: simply get rid of the table.
Now, if you were designing & coding the DB, which would be faster? 1 or 2?
"someone"'s suggestion DOES have merit, especially if you are using Oracle.
I'm sure that any time difference is indistinguishable, but yours is IMHO preferable because it's one SQL statement rather than two; any change in your INSERT statement doesn't require more work on the other statement; and yours doesn't require the host to validate that your INSERT matches the fields in the table.
From the manual: Beginning with MySQL 5.1.32, TRUNCATE is treated for purposes of binary logging and replication as DROP TABLE followed by CREATE TABLE — that is, as DDL rather than DML. This is due to the fact that, when using InnoDB and other transactional storage engines where the transaction isolation level does not allow for statement-based logging (READ COMMITTED or READ UNCOMMITTED), the statement was not logged and replicated when using STATEMENT or MIXED logging mode.
You can simplify your insert to:
INSERT INTO table
( SELECT id, COUNT(id) FROM table2 GROUP BY id );