I have frequent updates to a user table that simply set the last-seen time of a user, and I was wondering whether there is a simple way to defer them and group them into a single query after a short timeout (5 minutes or so). This would reduce the query load on my user database quite a lot.
If you do an UPDATE LOW_PRIORITY table ... you make sure the update only executes when MySQL isn't doing anything else with that table. Besides that, I don't think there are many options inside MySQL.
Also, is it causing problems now, or are you simply optimizing something that isn't a problem? Personally, if I were to batch updates like these, I would simply store all the IDs in memcached and use a cron job to flush them to the database every 5 minutes, something like the sketch below.
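A minimal sketch of what such a cron job might run, assuming a hypothetical users table with id and last_seen columns; the ID list is whatever was collected in memcached since the last run:
-- Sketch only: `users`, `id`, and `last_seen` are assumed names.
-- All batched users get the same timestamp, which is acceptable at
-- 5-minute granularity.
UPDATE LOW_PRIORITY users
SET    last_seen = NOW()
WHERE  id IN (17, 42, 108);   -- IDs gathered from memcached since the last run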
Wolph's suggestion should do the trick. Also possible is to create a second table without any indices on it and insert all your data into that table. It can even be an in-memory table. Then you can do a periodic INSERT INTO table1 SELECT * FROM table2 ON DUPLICATE KEY UPDATE ... to transfer the rows to the main table, as in the sketch below.
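A rough sketch of that staging-table approach, assuming every buffered user already exists in a users table keyed on id (all names here are assumptions):
-- Sketch only: table and column names are made up.
CREATE TABLE last_seen_buffer (
    user_id   INT      NOT NULL,
    last_seen DATETIME NOT NULL
) ENGINE = MEMORY;

-- The application does cheap, index-free inserts here:
INSERT INTO last_seen_buffer (user_id, last_seen) VALUES (42, NOW());

-- A periodic job folds the buffered rows into the real table, then empties the buffer:
INSERT INTO users (id, last_seen)
SELECT user_id, MAX(last_seen) FROM last_seen_buffer GROUP BY user_id
ON DUPLICATE KEY UPDATE last_seen = VALUES(last_seen);

TRUNCATE last_seen_buffer;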
I am building an application that requires a MySQL table to be emptied and refilled with fresh data every minute. At the same time, it is expected that the table will receive anywhere from 10 to 15 SELECT statements per second constantly. The SELECT statements should in general be very fast (selecting 10-50 medium-length strings every time). A few things I'm worried about:
Is there the potential for a SELECT query to run in between the TRUNCATE and the refill INSERT queries and thus return 0 rows? Do I need to lock the table when executing the TRUNCATE/INSERT query pair?
Are there any significant performance issues I should worry about regarding this setup?
There most probably is a better way to achieve your goal. But here's a possible answer to your question anyway: you can encapsulate queries that are meant to be executed together in a transaction. Off the top of my head, something like
START TRANSACTION;
TRUNCATE foo;
INSERT INTO foo ...;
COMMIT;
EDIT: The above part is plain wrong: TRUNCATE causes an implicit commit in MySQL, so it cannot be rolled back inside a transaction. See Philip Devine's comment. Thanks.
Regarding the performance question: Repeatedly connecting to the server can be costly. If you have a persistent connection, you should be fine. You can save little bits here and there by executing multiple queries in a batch or using Prepared Statements.
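For instance, a multi-row INSERT costs one round trip instead of dozens, and MySQL's server-side prepared statements look like the sketch below (foo and its columns are made-up names):
-- One round trip for many rows:
INSERT INTO foo (id, label) VALUES
    (1, 'alpha'),
    (2, 'beta'),
    (3, 'gamma');

-- Server-side prepared statement, reusable for each per-minute refill:
PREPARE refill FROM 'INSERT INTO foo (id, label) VALUES (?, ?)';
SET @id = 4, @label = 'delta';
EXECUTE refill USING @id, @label;
DEALLOCATE PREPARE refill;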
Why do you need to truncate it every minute? Yes, that will result in your users having no rows returned. Just update the rows instead of truncating and inserting.
A second option is to insert the new values into a new table and rename the two tables, like so:
RENAME TABLE tbl_name TO new_tbl_name
[, tbl_name2 TO new_tbl_name2]
Then truncate the old table.
Then your users see zero downtime. The truncate in the other answer ignores transactions and happens immediately, so don't do that!!
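A sketch of the full swap, assuming the live table is called foo:
-- Build the fresh copy off to the side:
CREATE TABLE foo_new LIKE foo;
INSERT INTO foo_new (id, label) VALUES (1, 'alpha'), (2, 'beta');  -- the fresh data

-- Atomic swap: readers always see either the old or the new table, never an empty one.
RENAME TABLE foo TO foo_old, foo_new TO foo;

DROP TABLE foo_old;   -- or TRUNCATE it and reuse it next minute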
I am building a PHP RESTful API for remote "worker" machines to self-assign tasks. The MySQL InnoDB table on the API host holds pending records that the workers can pick up from the API whenever they are ready to work on a record. How do I prevent concurrently requesting workers from ever getting the same record?
My initial plan to prevent this is to UPDATE a single record with a uniquely generated ID in a default NULL field, and then poll for the details of the record where the unique ID field matches.
For example:
UPDATE mytable SET status = 'Assigned', uniqueidfield = '3kj29slsad'
WHERE uniqueidfield IS NULL LIMIT 1
And in the same PHP instance, the next query:
SELECT id, status, etc FROM mytable WHERE uniqueidfield = '3kj29slsad'
The resulting record from the SELECT statement above is then given to the worker. Would this prevent simultaneously requesting workers from getting the same records shown to them? I am not exactly sure how MySQL handles the lookups within an UPDATE query, and whether two UPDATEs could "find" the same record and then update it sequentially. If this works, is there a more elegant or standardized way of doing this (not sure if FOR UPDATE would need to be applied here)? Thanks!
Never mind my previous answer. I believe I understand what you are asking. I'll reword it so maybe it is clearer to others.
"If I issue two of the above update statements at the same time, what would happen?"
According to http://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html, the second statement would not interfere with the first one.
Normally, you do not need to lock tables, because all single UPDATE
statements are atomic; no other session can interfere with any other
currently executing SQL statement.
A more elegant way is probably opinion based, but I don't see anything wrong with what you're doing.
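Since you mention FOR UPDATE: the transactional variant would look roughly like the sketch below, reusing the names from your question; whether it is more elegant than your single atomic UPDATE is a matter of taste.
-- Sketch only. Lock one unassigned row so no other worker can claim it concurrently.
START TRANSACTION;

SELECT id, status
FROM   mytable
WHERE  uniqueidfield IS NULL
LIMIT  1
FOR UPDATE;

-- Mark the row as taken, substituting the id returned above:
UPDATE mytable
SET    status = 'Assigned', uniqueidfield = '3kj29slsad'
WHERE  id = 123;

COMMIT;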
Does anyone have any recommendations how to implement this?
table1 will constantly be INSERTed into. This necessitates that every row in table2 be UPDATEd upon each table1 INSERT. Also, an algorithm that I'm not sure MySQL should be responsible for (vs. PHP calculation speed) has to be applied to each row of table2.
I wanted to have PHP handle it whenever the user did the INSERT, but I found out that PHP pages are not persistent after severing the connection to the user (or so I understand; please tell me that's wrong so I can go that route).
So now my problem is that if I use a whole-table UPDATE in a TRIGGER, I'll have locks galore (or so I understand from InnoDB's locking when UPDATing an entire table with a composite primary key, since part of that key will be UPDATEd).
Now, I'm thinking of using a cron job, but I'd rather it fire upon a user's INSERT on table1 instead of on a schedule.
So I was thinking maybe a CURSOR...
What way would be the fastest, with "ABSOLUTELY" NO LOCKING on table2?
Many thanks in advance!
Table structure
table2 is all INTs for speed. However, it has a 2-column primary key. One of those columns is what's being UPDATEd. That key is there for equally important rapid SELECTs.
table1 averages about 2.5x the number of rows of table2.
table2 is actually very small, ~200 MB.
First of all: what you are trying to do is close to impossible. I don't know of an RDBMS that can escalate INSERTs into one table into UPDATEs of another with "ABSOLUTELY NO LOCKING".
That said:
My first point of research would be whether the schema could be overhauled to optimize this hotspot away.
If this cannot be achieved, you might want to look into making table2 an in-memory table that can be recreated from existing data (such as keeping snapshots of it together with the max PK of table1 and rolling forward if a DB restart is required). Since you need to update all rows on every INSERT into table1, it cannot be very big.
The next point of research would be to put the INSERT and the UPDATE into a stored procedure that is called by the insertion logic. This would make a runaway situation, with the resulting locking hell on catch-up, much less likely.
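A rough sketch of such a stored procedure, with made-up column names and the per-row algorithm reduced to a placeholder expression:
DELIMITER //

CREATE PROCEDURE insert_and_recalc(IN p_value INT)
BEGIN
    -- The original INSERT into table1
    INSERT INTO table1 (value) VALUES (p_value);

    -- Recalculate every row of table2 in the same call
    UPDATE table2
    SET    score = score + p_value;   -- placeholder for the real algorithm
END //

DELIMITER ;

-- The insertion logic then calls:
-- CALL insert_and_recalc(42);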
There is a large table that holds millions of records. phpMyAdmin reports a size of 1.2 GB for the table.
There is a calculation that needs to be done for every row. The calculation is not simple (it cannot be put in SET col = calc form); it uses a stored function to get the values, so currently we run a single UPDATE per row.
This is extremely slow and we want to optimize it.
Stored function:
https://gist.github.com/a9c2f9275644409dd19d
And this is called by this method for every row:
https://gist.github.com/82adfd97b9e5797feea6
This is performed on an off-live server, and usually it is updated once per week.
What options do we have here?
Why not set up a separate table to hold the computed values and take the load off your current table? It can have two columns: a primary key matching each row in your main table, and a column for the computed value. See the sketch after this list.
Then your process can be:
a) Truncate the computedValues table - this is faster than trying to identify new rows
b) Compute the values and insert into the computed values table
c) Whenever you need your computed values, you join to the computedValues table using a primary-key join, which is fast; and if you need more computations, you just add new columns.
d) You can also update the main table using the computed values if you have to
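A sketch with made-up names, where main_table stands for your large table and expensive_calc() stands for your stored function:
CREATE TABLE computedValues (
    main_id        INT NOT NULL PRIMARY KEY,
    computed_value DECIMAL(12,4)
);

-- Weekly refresh:
TRUNCATE computedValues;

INSERT INTO computedValues (main_id, computed_value)
SELECT id, expensive_calc(id)
FROM   main_table;

-- Reads join on the primary key:
SELECT m.*, c.computed_value
FROM   main_table m
JOIN   computedValues c ON c.main_id = m.id;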
Well, the problem doesn't seem to be the UPDATE query itself, because no calculations are performed in the query. It seems the calculations are performed first and then the UPDATE query is run, so the UPDATE should be quick enough.
When you say "this is extremely slow", I assume you are not referring to the UPDATE query but the complete process. Here are some quick thoughts:
As you said, there are millions of records, and updating that many entries is always time consuming. If there are many columns and indexes defined on the table, that adds to the overhead.
I see that there are many REPLACE INTO queries in the function getNumberOfPeople(). These might well be a reason for the slow process. Have you checked how efficient these REPLACE INTO queries are? Try removing them and see if that has any impact on the UPDATE process.
There are a couple of SELECT queries too in getNumberOfPeople(). Check if they might be impacting the process and if so, try optimizing them.
In procedure updateGPCD(), you may try replacing SELECT COUNT(*) INTO _has_breakdown with SELECT COUNT(1) INTO _has_breakdown. In the same query, the WHERE condition is reading _ACCOUNT but this will fail when _ACCOUNT = 0, no?
On another note, if it is the UPDATE that you think is slow because of reason 1, it might make sense to move the column being updated, gpcd, out of usage_bill into another table. The only other column in that table should be the unique ID from usage_bill.
Hope the above makes sense.
I have a scraper which visits many sites and finds upcoming events, and another script which is actually supposed to put them in the database. Currently, inserting into the database is my bottleneck, and I need a faster way to batch the queries than what I have now.
What makes this tricky is that a single event has data across three tables which have keys to each other. To insert a single event I insert the location or get the already existing id of that location, then insert the actual event text and other data or get the event id if it already exists (some are repeating weekly etc.), and finally insert the date with the location and event ids.
I can't use a REPLACE INTO because it would orphan older data with those same keys. I asked about this in Tricky MySQL Batch Query, but the TL;DR outcome was that I have to check which keys already exist, preallocate those that don't, then make a single insert into each of the tables (i.e. do most of the work in PHP). That's great, but the problem is that if more than one batch is processing at a time, they could both choose to preallocate the same keys and then overwrite each other. Is there any way around this? Because then I could go back to that solution. The batches have to be able to work in parallel.
What I have right now is that I simply turn off the indexing for the duration of the batch and insert each of the events separately, but I need something faster. Any ideas would be helpful on this rather tricky problem. (The tables are InnoDB now... could transactions help solve any of this?)
I'd recommend starting with MySQL's LOCK TABLES, which you can use to prevent other sessions from writing to the tables while you insert your data.
For example, you might do something similar to this:
mysql_connect("localhost", "root", "password");
mysql_select_db("EventsDB");

// Block other sessions from writing to the table until we unlock it
mysql_query("LOCK TABLE events WRITE");

// First key value that subsequent inserts in this batch will use
$firstEntryIndex = mysql_insert_id() + 1;

/* Do stuff */
...

mysql_query("UNLOCK TABLES");
The above does two things. First, it locks the table, preventing other sessions from writing to it until you are finished and the UNLOCK statement is run. Second, $firstEntryIndex is the first key value that will be used by any subsequent INSERT queries.
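Since a single event spans three tables, you would lock all of them in the same LOCK TABLES statement and use LAST_INSERT_ID() to wire up the keys. A sketch with made-up table names, leaving out the get-or-create checks for brevity:
LOCK TABLES locations WRITE, events WRITE, event_dates WRITE;

INSERT INTO locations (name) VALUES ('Town Hall');
SET @loc_id = LAST_INSERT_ID();

INSERT INTO events (location_id, title) VALUES (@loc_id, 'Farmers Market');
SET @event_id = LAST_INSERT_ID();

INSERT INTO event_dates (event_id, location_id, starts_at)
VALUES (@event_id, @loc_id, '2014-06-01 09:00:00');

UNLOCK TABLES;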