MySQL/InnoDB transactions with table locks - PHP

I did a lot of research and found a lot of information about all the relevant topics. However, I am not confident that I now understand how to put all this information together properly.
This application is written in PHP.
For queries I use PDO.
The MySQL database is configured as InnoDB.
What I need
SELECT ... FROM tableA;
// PHP looks at what comes back and does some logic.
INSERT INTO tableA ...;
INSERT INTO tableB ...;
Conditions:
The INSERTs need to be atomic. If one of them fails I want to roll back.
No reads and writes from/to tableA are allowed to happen between the SELECT and the INSERT from/into tableA.
This looks to me like a very simple problem, yet I am not able to figure out how to do it properly. So my question is:
What is the best approach?
This is an outline for my current plan, heavily simplified:
try {
SET autocommit = 0;
LOCK TABLES tableA WRITE, tableB WRITE;
SELECT ... FROM tableA;
INSERT INTO tableA ...;
INSERT INTO tableB ...;
COMMIT;
UNLOCK TABLES;
SET autocommit = 1;
}
catch {
ROLLBACK;
UNLOCK TABLES;
SET autocommit = 1;
}
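For what it's worth, here is roughly how that outline could look with PDO. This is only a sketch: the single column val in tableA/tableB and the helper some_business_logic() are made up for illustration, and $pdo is assumed to be an existing PDO connection with exceptions enabled. The statements are sent with exec()/query() rather than beginTransaction(), following the manual's SET autocommit / LOCK TABLES / COMMIT / UNLOCK TABLES pattern.
try {
    // Pattern from the MySQL manual: disable autocommit, take the table locks,
    // do the work, COMMIT, then release the locks.
    $pdo->exec("SET autocommit = 0");
    $pdo->exec("LOCK TABLES tableA WRITE, tableB WRITE");

    $rows = $pdo->query("SELECT val FROM tableA")->fetchAll(PDO::FETCH_ASSOC);
    $newVal = some_business_logic($rows); // hypothetical PHP logic

    $pdo->prepare("INSERT INTO tableA (val) VALUES (?)")->execute([$newVal]);
    $pdo->prepare("INSERT INTO tableB (val) VALUES (?)")->execute([$newVal]);

    $pdo->exec("COMMIT");
    $pdo->exec("UNLOCK TABLES");
    $pdo->exec("SET autocommit = 1");
}
catch (Exception $e) {
    $pdo->exec("ROLLBACK");
    $pdo->exec("UNLOCK TABLES");
    $pdo->exec("SET autocommit = 1");
    throw $e;
}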
I feel like there is a lot that could be done better, but I don't know how :/
Why do it like this?
I need some kind of transaction to be able to do a rollback if INSERTs fail.
I need to lock tableA to make sure that no other INSERTs or UPDATEs take place.
Transactions and table locks don't work well together
(https://dev.mysql.com/doc/refman/8.0/en/lock-tables-and-transactions.html)
I want to use autocommit as a standard throughout the rest of my application, which is why I set it back to "1" at the end.
I am really not sure about this, but I picked up somewhere that after locking a table, I can (from within the current connection) only query that table until I unlock it (this does not quite make sense to me). This is why I locked tableB too, although otherwise I wouldn't need to.
I am open to completely different approaches
I am open to any suggestion that stays within the constraints of PHP, MySQL, PDO and InnoDB.
Thank You!
Edit 1 (2018-06-01)
I feel like my problem/question needs some more clarification.
Starting point:
I have two tables, t1 and t2.
t1 has multiple columns of non-unique values.
The specifics of t2 are irrelevant for this problem.
What I want to do:
Step by step:
Select multiple columns and rows from t1.
In PHP analyse the retrieved data. Based on the results of this analysis put together a dataset.
INSERT parts of the dataset into t1 and parts of it into t2.
Additional information:
The INSERTs into the 2 tables must be atomic. This can be achieved using transactions.
No INSERTs from a different connection are allowed to occur between steps 1 and 3. This is very important, because every single INSERT into t1 has to occur with full awareness of the current state of the table. I'd best describe this in more detail. I will leave t2 out of this for now, to make things easier to understand.
Imagine this sequence of events (connections con1 and con2):
con1: SELECT ... FROM t1 WHERE xyz;
con1: PHP processes the information.
con2: SELECT ... FROM t1 WHERE uvw;
con2: PHP processes the information.
con1: INSERT INTO t1 ...;
con2: INSERT INTO t1 ...;
So both connections see t1 in the same state. However, they select different information. Con1 takes the information gathered, does some logic with it and then INSERTs data into a new row in t1. Con2 does the same, but using different information.
The problem is this: Both connections INSERTed data based on calculations that did not take into account whatever the other connection inserted into t1, because this information wasn't there when they read from t1.
Con2 might have inserted a row into t1 that would have met the WHERE-conditions of con1's SELECT-statement. In other words: Had con2 inserted its row earlier, con1 might have created completely different data to insert into t1. This is to say: The two INSERTs might have completely invalidated each other's inserts.
This is why I want to make sure that only one connection can work with the data in t1 at a time. No other connection is allowed to write, but also no other connection is allowed to read until the current connection is done.
I hope this clarifies things a bit... :/
Thoughts:
My thoughts were:
I need to make the INSERTs into the 2 tables atomic. --> I will use a transaction for this. Something like this:
try {
$pdo->beginTransaction();
// INSERT INTO t1 ...
// INSERT INTO t2 ...
$pdo->commit();
}
catch (Exception $e) {
$pdo->rollBack();
throw $e;
}
I need to make sure no other connection writes to or reads from t1. This is where I decided that I need LOCK TABLES.
Assuming I had to use LOCK TABLES, I was confronted with the problem that LOCK TABLES is not transaction-aware, which is why I decided to go with the solution proposed here (https://dev.mysql.com/doc/refman/8.0/en/lock-tables-and-transactions.html) and also in multiple answers on Stack Overflow.
But I wasn't happy with how the code looked, which is why I came here to ask this (meanwhile rather lengthy) question.
Edit 2 (2018-06-01)
This process will not run often, so there is no significant need for high performance or efficiency. This, of course, also means that the chances of two of those processes interfering with each other are rather small. Still, I'd like to make sure nothing can go wrong.

Case 1:
BEGIN;
INSERT ..
INSERT ..
COMMIT;
Other connections will not see the inserted rows until after the commit. That is, BEGIN...COMMIT made the two inserts "atomic".
If anything fails, you still need the try/catch to deal with it.
Do not use LOCK TABLES on InnoDB tables.
Don't bother with autocommit; BEGIN..COMMIT overrides it.
My statements apply to (probably) all frameworks. (Except that some do not have "try" and "catch".)
Case 2: Lock a row in anticipation of possibly modifying it:
BEGIN;
SELECT ... FROM t1 FOR UPDATE;
... work with the values SELECTed
UPDATE t1 ...;
COMMIT;
This keeps others away from the rows SELECTed until after the COMMIT.
Case 3: Sometimes IODKU is useful to do two things in a single atomic statement:
INSERT ...
ON DUPLICATE KEY UPDATE ...
instead of
BEGIN;
SELECT ... FOR UPDATE;
if no row found
INSERT ...;
else
UPDATE ...;
COMMIT;
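A concrete IODKU example, as a sketch only; the counters table (name as the primary key, hits as a counter) is invented purely for illustration:
// Hypothetical table: counters(name VARCHAR(64) PRIMARY KEY, hits INT)
// One atomic statement: insert the row if it is new, otherwise bump the counter.
$stmt = $pdo->prepare(
    "INSERT INTO counters (name, hits) VALUES (?, 1)
     ON DUPLICATE KEY UPDATE hits = hits + 1"
);
$stmt->execute(['homepage']);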
Case 4: Classic banking example:
BEGIN;
UPDATE accounts SET balance = balance - 1000.00 WHERE id='me';
... What if crash occurs here? ...
UPDATE accounts SET balance = balance + 1000.00 WHERE id='you';
COMMIT;
If the system crashes between the two UPDATEs, the first update will be undone. This keeps the system from losing track of the funds transfer.
Case 5: Perhaps close to what the OP wants. It is mostly a combination of Cases 2 and 1.
BEGIN;
SELECT ... FROM t1 FOR UPDATE; -- see note below
... work with the values SELECTed
INSERT INTO t1 ...;
COMMIT;
Notes on Case 5: The SELECT..FOR UPDATE must include any rows that you don't want the other connection to see. This has the effect of delaying the other connection until this connection COMMITs. (Yes, this feels a lot like LOCK TABLES t1 WRITE.)
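If Case 5 is indeed what is wanted, a PDO sketch could look like the following. The column names, the $category filter and the analyse() helper are placeholders for the OP's actual data and logic; the important part is that the SELECT ... FOR UPDATE and both INSERTs live inside one transaction.
try {
    $pdo->beginTransaction();

    // Lock the rows the decision is based on; another connection issuing the
    // same locking read will wait here until this transaction COMMITs.
    $stmt = $pdo->prepare("SELECT id, val FROM t1 WHERE category = ? FOR UPDATE");
    $stmt->execute([$category]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $data = analyse($rows); // hypothetical PHP analysis step

    $pdo->prepare("INSERT INTO t1 (category, val) VALUES (?, ?)")
        ->execute([$category, $data['t1_val']]);
    $pdo->prepare("INSERT INTO t2 (info) VALUES (?)")
        ->execute([$data['t2_info']]);

    $pdo->commit();
}
catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}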
Case 6: The "processing" that needs to be inside the BEGIN..COMMIT will take too long. (Example: the typical online shopping cart.)
This needs a locking mechanism outside of InnoDB's transactions. One way (useful for a shopping cart) is to use a row in some extra table, and have everyone check it. Another way (more practical within a single connection) is to use GET_LOCK('foo') and its friends.
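A minimal sketch of the GET_LOCK() variant; the lock name 't1_maintenance' and the 10-second timeout are arbitrary choices:
// Acquire a named, server-wide advisory lock; wait up to 10 seconds for it.
$got = $pdo->query("SELECT GET_LOCK('t1_maintenance', 10)")->fetchColumn();
if ($got != 1) {
    throw new RuntimeException("Could not acquire lock 't1_maintenance'");
}
try {
    // ... the long-running processing and its queries go here ...
}
finally {
    $pdo->query("SELECT RELEASE_LOCK('t1_maintenance')");
}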
General Discussion
All of the above examples lock only the row(s) involved, not the entire table(s). This makes the action much less invasive, and allows the system to handle much more activity.
Also, read about MVCC. This is a general technique used under the covers to let one connection see the values of the table(s) at some instant in time, even while other connections are modifying the table(s).
"Prevent inserts" -- With MVCC, if you start a SELECT, it is like getting a snapshot in time of everything you are looking at. You won't see the INSERTs until after you complete the transaction that the SELECT is in. You can have your cake and eat it, too. That is, it appears as if the inserts were blocked, but you get the performance benefit of them happening in parallel. Magic.

Related

cron or TRIGGER on continually INSERTing table1 or CURSOR (or alternative) to UPDATE 1m row InnoDB table2 without locking?

Does anyone have any recommendations how to implement this?
table1 will constantly be INSERTed into. This necessitates that every row on table2 be UPDATEd upon each table1 INSERT. Also, an algorithm that I'm not sure MySQL should be responsible for (vs. PHP calculation speed) has to be applied to each row of table2.
I wanted to have PHP handle it whenever the user did the INSERT, but I found out that PHP pages are not persistent after severing the connection to the user (or so I understand; please tell me that's wrong so I can go that route).
So now my problem is that if I use a total table UPDATE in a TRIGGER, I'll have locks galore (or so I understand from InnoDB's locking when UPDATing an entire table with a composite primary key since part of that key will be UPDATEd).
Now, I'm thinking of using a cron job, but I'd rather they fire upon a user's INSERT on table1 instead of on a schedule.
So I was thinking maybe a CURSOR...
What way would be fastest, with "ABSOLUTELY" NO LOCKING on table2?
Many thanks in advance!
Table structure
table2 is all INTs for speed. However, it has a 2-column primary key. One of those columns is what's being UPDATEd. That key is for equally important rapid SELECTs.
table1 averages about 2.5x the number of rows of table2.
table2 is actually very small, ~200mb.
First of all: What you are trying to do is close to impossible - I don't know of an RDBMS that can escalate INSERTs into one table into UPDATEs of another with "ABSOLUTELY NO LOCKING".
That said:
my first point of research would be, whether the schema could be overhauled to optimize this hotspot away.
if this cannot be achieved, you might want to look into making table2 an in-memory type that can be recreated from existing data (such as keeping snapshots of it together with the max PK of table1 and rolling forward if a DB restart is required). Since you need to update all rows on every INSERT into table1 it cannot be very big.
Next point of research would be to put the INSERT and the UPDATE into a stored procedure that is called by the insertion logic. This would make a runaway situation, with the resulting locking hell on catch-up, much less likely.
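As an illustration only (the table1/table2 columns and the "total = total + val" rule are invented; the real per-row algorithm goes where the placeholder comment is), such a procedure could be created once and then called from the insertion logic:
// One-time setup; no DELIMITER gymnastics are needed when the statement
// is sent through the API.
$pdo->exec("
    CREATE PROCEDURE insert_and_refresh(IN p_val INT)
    BEGIN
        INSERT INTO table1 (val) VALUES (p_val);
        UPDATE table2 SET total = total + p_val;  -- placeholder for the real algorithm
    END");

// Called by the insertion logic instead of a bare INSERT:
$pdo->prepare("CALL insert_and_refresh(?)")->execute([42]);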

Locking row for two updates

I need to do two updates to rows, but I need to make sure they are done together and that no other query from another user can interfere with them. I know about SELECT...FOR UPDATE, but I imagine that after the first update the row will of course be unlocked, which means someone could interfere with the second update. If someone else updates that row first, the update will work but will mess up the data. Is there any way to ensure that the two updates happen how they are supposed to? I have been told about transactions, but as far as I know they are only good for making sure the two updates actually happen, not whether they happen "together", unless I am mistaken and the rows will be locked until the transaction is committed?
Here are the queries:
SELECT z FROM table WHERE id='$id'
UPDATE table SET x=x+2 WHERE x>z
UPDATE table SET y=y+2 WHERE y>z
I made a mistake and didn't give full information. That was my fault. I have updated the queries. The issue I have is that z can be updated as well. If z is updated after the SELECT but before the other two updates, the data can get messed up. Does doing the transaction BEGIN/COMMIT work for that?
Learn about TRANSACTION
http://dev.mysql.com/doc/refman/5.0/en/commit.html
[... connect ...]
mysql_query("BEGIN");
$query1 = mysql_query('UPDATE table SET x=x+2 WHERE x>y');
$query2 = mysql_query('UPDATE table SET y=y+2 WHERE y>x');
if($query1 && $query2) {
mysql_query("COMMIT");
echo 'Save Done. All UPDATES done.';
} else {
mysql_query("ROLLBACK");
echo 'Error Save. All UPDATES reverted, and not done.';
}
There are various levels of transaction, but basically, as per the ACID properties, you should expect that within a given transaction all reads and updates are performed consistently, meaning the data is kept in a valid state. More importantly, a transaction is isolated: work being done in another transaction (thread) will not interfere with your transaction (your grouping of SELECT and UPDATE statements). This allows you to take the broad assumption that you are the only thread of execution within the system, and to commit that group of work (atomically) or roll it all back.
Each database may handle the semantics differently (some may lock rows or columns, some may re-order, some may serialize) but that's the beauty of a declarative database interface: you worry about the work you want to get done.
As stated, on MySQL, InnoDB is transactional and will support what is mentioned above, so ensure your tables use InnoDB. Non-transactional engines (e.g. MyISAM) will force you to manage those transactional semantics (locking) manually.
One approach would be to lock the entire table:
LOCK TABLE `table` WRITE;
SELECT z FROM `table` WHERE id='$id';
UPDATE `table` SET x=x+2 WHERE x>z;
UPDATE `table` SET y=y+2 WHERE y>z;
UNLOCK TABLES;
This will prevent other sessions from writing to, and reading from, `table` during the SELECT and UPDATEs.
Whether this is an appropriate solution does depend on how appropriate it is for sessions to wait to read or write from the table.
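If waiting on a full table lock is too heavy, a transactional alternative (a sketch, assuming InnoDB and an index on id) is to keep the selected row locked with SELECT ... FOR UPDATE until COMMIT; the row holding z then cannot be changed between the SELECT and the two UPDATEs, and the rows touched by the UPDATEs are locked by the UPDATEs themselves:
try {
    $pdo->beginTransaction();

    // The row holding z stays locked until COMMIT.
    $stmt = $pdo->prepare("SELECT z FROM `table` WHERE id = ? FOR UPDATE");
    $stmt->execute([$id]);
    $z = $stmt->fetchColumn();

    $pdo->prepare("UPDATE `table` SET x = x + 2 WHERE x > ?")->execute([$z]);
    $pdo->prepare("UPDATE `table` SET y = y + 2 WHERE y > ?")->execute([$z]);

    $pdo->commit();
}
catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}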

MYSQL table locking with PHP

I have a MySQL table fg_stock. Most of the time concurrent access is happening on this table. I used this code but it doesn't work:
<?php
mysql_query("LOCK TABLES fg_stock READ");
$select=mysql_query("SELECT stock FROM fg_stock WHERE Item='$item'");
while($res=mysql_fetch_array($select))
{
$stock=$res['stock'];
$close_stock=$stock+$qty_in;
$update=mysql_query("UPDATE fg_stock SET stock='$close_stock' WHERE Item='$item' LIMIT 1");
}
mysql_query("UNLOCK TABLES");
?>
Is this okay?
"Most of the time concurrent access is happening in this table"
So why would you want to lock the ENTIRE table when it's clear you are attempting to access a specific row from the table (WHERE Item='$item')? Chances are you are running the MyISAM storage engine for the table in question; you should look into using the InnoDB engine instead, as one of its strong points is that it supports row-level locking, so you don't need to lock the entire table.
Why do you need to lock your table anyway?????
mysql_query("UPDATE fg_stock SET stock=stock+$qty_in WHERE Item='$item'");
That's it! No need to lock the table and no need for an unnecessary loop with a set of queries. Just avoid SQL injection, for example by using PHP's intval function on $qty_in (if it is an integer, of course).
And the concurrent access probably only happens because of non-optimized work with the database, with an excessive number of queries.
PS: moreover, your example does not make much sense, as MySQL could update the same record every time through the loop. You did not tell MySQL exactly which record you want to update; you only told it to update one record with Item='$item'. On the next iteration the SAME record could be updated again, as MySQL does not distinguish between records it has already updated and those it has not touched yet.
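For reference, the single-statement UPDATE suggested above could be written as a PDO prepared statement, which also takes care of the injection concern (a sketch; it assumes a PDO connection rather than the old mysql_* API):
$stmt = $pdo->prepare("UPDATE fg_stock SET stock = stock + ? WHERE Item = ?");
$stmt->execute([$qty_in, $item]);
// With InnoDB only the affected row(s) are locked, and only for the duration
// of the statement, so concurrent updates to different items don't block each other.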
http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html
mysql> LOCK TABLES real_table WRITE, temp_table WRITE;
mysql> INSERT INTO real_table SELECT * FROM temp_table;
mysql> DELETE FROM temp_table;
mysql> UNLOCK TABLES;
So your syntax is correct.
Also from another question:
Troubleshooting: You can test for table lock success by trying to work with another table that is not locked. If you obtained the lock, trying to write to a table that was not included in the lock statement should generate an error.
You may want to consider an alternative solution. Instead of locking, perform an update that includes the changed elements as part of the WHERE clause. If the data that you are changing has changed since you read it, the update will "fail" and return zero rows modified. This eliminates the table lock, and all the messy horrors that may come with it, including deadlocks.
PHP, mysqli, and table locks?
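The "update that includes the changed elements in the WHERE clause" idea from the quote above, applied to the fg_stock example, might look like this (a sketch; $oldStock is assumed to be the value read earlier in the request):
$stmt = $pdo->prepare(
    "UPDATE fg_stock SET stock = ? WHERE Item = ? AND stock = ?"
);
$stmt->execute([$oldStock + $qty_in, $item, $oldStock]);

if ($stmt->rowCount() === 0) {
    // Somebody changed the row since it was read: re-read and retry,
    // or report the conflict to the caller.
}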

Fastest way to fill a table

I am trying to find the fastest way to insert data into a table (data from a select)
I always clear the table:
TRUNCATE TABLE table;
Then I do this to insert the data:
INSERT INTO table(id,total) (SELECT id, COUNT(id) AS Total FROM table2 GROUP BY id);
Someone told me I shouldn't do this.
He said this would be much faster:
CREATE TABLE IF NOT EXISTS table (PRIMARY KEY (inskey)) SELECT id, count(id) AS total FROM table2 GROUP BY id
Any ideas on this one?
I think my solution is cleaner, because I don't have to check for the table.
This will be ran in a cron job a few times a day
EDIT: I wasn't clear. The truncate is always run. It's just a matter of the fastest way to insert all the data.
I also think your solution is cleaner, plus the solution by "someone" looks to me to have some problems:
it does not actually delete old data that may be in the table
create table...select will create table columns with types based on what the select returns. That means changes in the table structure of table2 will propagate to table. That may or may not be what you want. It at least introduces an implicit coupling, which I find to be a bad idea.
As for performance, I see no reason why one should be faster than the other. So the usual advice applies: Choose the cleanest, most maintainable solution, test it, only optimize if performance is a problem :-).
Your solution would be my choice: the performance loss (if any, which I'm not sure about, because you don't drop/create the table and re-compute column types) is negligible and is IMHO outweighed by cleanliness.
CREATE TABLE IF NOT EXISTS table (PRIMARY KEY (inskey))
SELECT id, count(id) AS total
FROM table2
GROUP BY
id
This will not delete old values from the table.
If that's what you want, it will be faster indeed.
Perhaps something has been lost in the translation between your Someone and yourself. One possibility s/he might have been referring to is DROP/SELECT INTO vs TRUNCATE/INSERT.
I have heard that the latter is faster as it is minimally logged (but then again, what's the eventual cost of the DROP here?). I have no hard stats to back this up.
I agree with "sleske"s suggestion in asking you test it and optimize the solution yourself. DIY!
Every self-respecting DB will give you the opportunity to roll back your transaction.
1. Rolling back your INSERT INTO... will require the DB to keep track of every row inserted into the table
2. Rolling back the CREATE TABLE... is super easy for the DB - simply get rid of the table.
Now, if you were designing & coding the DB, which would be faster? 1 or 2?
"someone"'s suggestion DOES have merit, especially if you are using Oracle.
Regards,
Shiva
I'm sure that any time difference is indistinguishable, but yours is IMHO preferable because it's one SQL statement rather than two; any change in your INSERT statement doesn't require more work on the other statement; and yours doesn't require the host to validate that your INSERT matches the fields in the table.
From the manual: Beginning with MySQL 5.1.32, TRUNCATE is treated for purposes of binary logging and replication as DROP TABLE followed by CREATE TABLE — that is, as DDL rather than DML. This is due to the fact that, when using InnoDB and other transactional storage engines where the transaction isolation level does not allow for statement-based logging (READ COMMITTED or READ UNCOMMITTED), the statement was not logged and replicated when using STATEMENT or MIXED logging mode.
You can simplify your insert to:
INSERT INTO table
( SELECT id, COUNT(id) FROM table2 GROUP BY id );

How to do monthly refresh of large DB tables without interrupting user access to them

I have four DB tables in an Oracle database that need to be rewritten/refreshed every week or every month. I am writing this script in PHP using the standard OCI functions, that will read new data in from XML and refresh these four tables. The four tables have the following properties
TABLE A - up to 2mil rows, one primary key (One row might take max 2K data)
TABLE B - up to 10mil rows, one foreign key pointing to TABLE A (One row might take max 1100 bytes of data)
TABLE C - up to 10mil rows, one foreign key pointing to TABLE A (One row might take max 1100 bytes of data)
TABLE D - up to 10mil rows, one foreign key pointing to TABLE A (One row might take max 120 bytes of data)
So I need to repopulate these tables without damaging the user experience. I obviously can't delete the tables and just repopulate them as it is a somewhat lengthy process.
I've considered just a big transaction where I DELETE FROM all of the tables and just regenerate them. I get a little concerned about the length of the transaction (don't know yet but it could take an hour or so).
I wanted to create temp table replicas of all of the tables and populate those instead. Then I could DROP the main tables and rename the temp tables. However, you can't do the DROP and ALTER TABLE statements within a transaction, as they always do an auto-commit. This should be able to be done quickly (four DROP and four ALTER TABLE statements), but it can't guarantee that a user won't get an error within that short period of time.
Now, as a combination of the two ideas, I'm considering doing the temp tables, then doing a DELETE FROM on all four original tables and then an INSERT INTO from the temp tables to repopulate the main tables. Since there are no DDL statements here, this would all work within a transaction. Then, however, I'm wondering if the memory it takes to process some 60 million records within a transaction is going to get me in trouble (this would be a concern for the first idea as well).
I would think this would be a common scenario. Is there a standard or recommended way of doing this? Any tips would be appreciated. Thanks.
You could have a synonym for each of your big tables. Create new incarnations of your tables, populate them, drop and recreate the synonyms, and finally drop your old tables. This has the advantage of (1) only one actual set of DML (the inserts) avoiding redo generation for your deletes and (2) the synonym drop/recreate is very fast, minimizing the potential for a "bad user experience".
Reminds me of a minor peeve of mine about Oracle's synonyms: why isn't there an ALTER SYNONYM command?
I'm assuming your users don't actually modify the data in these tables, since it is replaced from another source every week, so it doesn't really matter if you lock the tables for a full hour. The users can still query the data; you just have to size your rollback segment appropriately. A simple DELETE+INSERT therefore should work fine.
Now, if you want to speed things up AND the new data differs little from the previous data, you could load the new data into temporary tables and update the main tables with the delta, using a combination of MERGE+DELETE like this:
Setup:
CREATE TABLE a (ID NUMBER PRIMARY KEY, a_data CHAR(200));
CREATE GLOBAL TEMPORARY TABLE temp_a (
ID NUMBER PRIMARY KEY, a_data CHAR(200)
) ON COMMIT PRESERVE ROWS;
-- Load A
INSERT INTO a
(SELECT ROWNUM, to_char(ROWNUM) FROM dual CONNECT BY LEVEL <= 10000);
-- Load TEMP_A with extra rows
INSERT INTO temp_a
(SELECT ROWNUM + 100, to_char(ROWNUM + 100)
FROM dual
CONNECT BY LEVEL <= 10000);
UPDATE temp_a SET a_data = 'x' WHERE mod(ID, 1000) = 0;
This MERGE statement will insert the new rows and update the old rows only if they are different:
MERGE INTO a
USING (SELECT temp_a.id, temp_a.a_data
         FROM temp_a
         LEFT JOIN a ON (temp_a.id = a.id)
        WHERE decode(a.a_data, temp_a.a_data, 1) IS NULL) temp_a
   ON (a.id = temp_a.id)
 WHEN MATCHED THEN
      UPDATE SET a.a_data = temp_a.a_data
 WHEN NOT MATCHED THEN
      INSERT (id, a_data) VALUES (temp_a.id, temp_a.a_data);
Done
You will then need to delete the rows that aren't in the new set of data:
DELETE FROM a WHERE a.id NOT IN (SELECT temp_a.id FROM temp_a);
100 rows deleted
You would insert into A and then into the child tables, and delete in reverse order.
Am I the only one (except Vincent) who would first test the simplest possible solution, i.e. DELETE/INSERT, before trying to build something more advanced?
Then, however, I wondering if the memory it takes to process some 60 million records within a transaction is going to get me in trouble (this would be a concern for the first idea as well).
Oracle manages memory quite well; it hasn't been written by a bunch of Java novices (oops, it just came out of my mouth!). So the real question is: do you have to worry about the performance penalties of thrashing the REDO and UNDO log files? In other words, build a performance test case, run it on your server and see how long it takes. During the DELETE / INSERT the system will not be as responsive as usual, but other sessions can still perform SELECTs without any fear of deadlocks, memory leaks or system crashes. Hint: DB servers are usually disk-bound, so getting a proper RAID array is usually a very good investment.
On the other hand, if the performance is critical, you can select one of the alternative approaches described in this thread:
partitioning if you have the license
table renaming if you don't, but be mindful that DDLs on the fly can cause some side effects such as object invalidation, ORA-06508...
In Oracle you can partition your tables and indexes based on a date or time column; that way, to remove a lot of data you can simply drop the partition instead of performing a delete command.
We used to use this to manage monthly archives of 100 Million+ records and not have downtime.
http://www.oracle.com/technology/oramag/oracle/06-sep/o56partition.html is a super handy page for learning about partitioning.
I assume that this refreshing activity is the only way of data changing in these tables, so that you don't need to worry about inconsistencies due to other writing processes during the load.
All that deleting and inserting will be costly in terms of undo usage; you also would exclude the option of using faster data loading techniques. For example, your inserts will go much, much faster if you insert into the tables with no indexes, then apply the indexes after the load is done. There are other strategies as well, but both of them preclude the "do it all in one transaction" technique.
Your second choice would be my choice - build the new tables, then rename the old ones to a dummy name, rename the temps to the new name, then drop the old tables. Since the renames are fast, you'd have a less than one second window when the tables were unavailable, and you'd then be free to drop the old tables at your leisure.
If that one second window is unacceptable, one method I've used in situations like this is to use an additional locking object - specifically, a table with a single row that users would be required to select from before they access the real tables, and that your load process could lock in exclusive mode before it does the rename operation.
Your PHP script would use two connections to the db - one where you do the lock, the other where you do the loading, renaming and dropping. This way the implicit commits in the work connection won't release the lock held by the other connection.
So, in the script, you'd do something like:
Connection 1:
Create temp tables, load them, create new indexes
Connection 2:
LOCK TABLE Load_Locker IN SHARE ROW EXCLUSIVE MODE;
Connection 1:
Perform renaming swap of old & new tables
Connection 2:
Rollback;
Connection 1:
Drop old tables.
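With PHP's OCI8 functions the two-connection dance could be sketched roughly as follows; the table names (big_table, big_table_new) are invented, error handling is omitted, and only Load_Locker comes from the scheme above:
// Two independent connections: $work does the loading/DDL, $locker holds the lock.
$work   = oci_connect($user, $pass, $db);
$locker = oci_connect($user, $pass, $db);

// Connection 1: create and populate big_table_new, build its indexes (omitted).

// Connection 2: queue up behind all clients holding SHARE locks.
// OCI_DEFAULT (no auto-commit) keeps the lock until commit/rollback.
oci_execute(oci_parse($locker, "LOCK TABLE Load_Locker IN SHARE ROW EXCLUSIVE MODE"), OCI_DEFAULT);

// Connection 1: the renames auto-commit, but only in $work's own transaction,
// so $locker's lock is unaffected.
oci_execute(oci_parse($work, "ALTER TABLE big_table RENAME TO big_table_old"));
oci_execute(oci_parse($work, "ALTER TABLE big_table_new RENAME TO big_table"));

// Connection 2: release the lock so blocked clients can continue.
oci_rollback($locker);

// Connection 1: drop the old table at leisure.
oci_execute(oci_parse($work, "DROP TABLE big_table_old"));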
Meanwhile, your clients would issue the following command immediately after starting a transaction (or a series of selects):
LOCK TABLE Load_Locker IN SHARE MODE;
You can have as many clients locking the table this way - your process above will block behind them until they have all released the lock, at which point subsequent clients will block until you perform your operations. Since the only thing you're doing inside the context of the SHARE ROW EXCLUSIVE lock is renaming tables, your clients would only ever block for an instant.
Additionally, this level of granularity allows you to control how long the clients have a read-consistent view of the old table; without it, if you had a client that did a series of reads that took some time, you might end up changing the tables mid-stream and wind up with weird results if the early queries pulled old data & the later queries pulled new data. Using SET TRANSACTION READ ONLY would be another way of addressing this issue if you weren't using my approach.
The only real downside to this approach is that if your client read transactions take some time, you run the risk of other clients being blocked for longer than an instant, since any locks in SHARE MODE that occur after your load process issues its SHARE ROW EXCLUSIVE lock will block until the load process finishes its task. For example:
10:00 user 1 issues SHARE lock
10:01 user 2 issues SHARE lock
10:03 load process issues SHARE ROW EXCLUSIVE lock (and is blocked)
10:04 user 3 issues SHARE lock (and is blocked by load's lock)
10:10 user 1 releases SHARE
10:11 user 2 releases SHARE (and unblocks loader)
10:11 loader renames tables & releases SHARE ROW EXCLUSIVE (and releases user 3)
10:11 user 3 commences queries, after being blocked for 7 minutes
However, this is really pretty kludgy. Kinlan's solution of partitioning is most likely the way to go. Add an extra column to your source tables that contains a version number, partition your data based on that version, then create views that look like your current tables but only show data for the current version (determined by the value of a row in a "CurrentVersion" table). Then just do your load into the table, update your CurrentVersion table, and drop the partition for the old data.
Why not add a version column? That way you can add the new rows with a different version number. Create a view against the table that specifies the current version. After the new rows are added recompile the view with the new version number. When that's done, go back and delete the old rows.
What we do in some cases is have two versions of the tables, say SalesTargets1 and SalesTargets2 (an active and an inactive one). Truncate the records from the inactive one and populate it. Since no one but you uses the inactive one, there should be no locking issues or impact on the users while it is populating. Then have a view that selects all the information from the active table (it should be named what the current table is now, say SalesTargets in my example). Then, to switch to the refreshed data, all you have to do is run an ALTER VIEW statement.
Have you evaluated the size of the delta (of changes)?
If the number of rows that get updated (as opposed to inserted) every time you put up a new rowset is not too high, then I think you should consider importing the new set of data into a set of staging tables and doing an update-where-exists and insert-where-not-exists (UPSERT) solution, and just refresh your indexes (ok ok, indices).
Treat it like ETL.
I'm going with an upsert method here.
I added an additional "delete" column to each of the tables.
When I begin processing the feed, I set the delete field for every record to '1'.
Then I go through a series of updates if the record exists, or inserts if it does not. For each of those inserts/updates, the delete field is then set to zero.
At the end of the process I delete all records that still have a delete value of '1'.
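In rough code (a sketch only; tableX, its id/val columns and the $feed array are invented, the flag column is called del here since DELETE is a reserved word, and error handling is left out), that flag-based upsert reads like this:
$conn = oci_connect($user, $pass, $db);

// 1. Mark every existing record as a deletion candidate.
oci_execute(oci_parse($conn, "UPDATE tableX SET del = 1"), OCI_DEFAULT);

// 2. Upsert each feed row, clearing the flag on everything touched.
$upd = oci_parse($conn, "UPDATE tableX SET val = :val, del = 0 WHERE id = :id");
$ins = oci_parse($conn, "INSERT INTO tableX (id, val, del) VALUES (:id, :val, 0)");
foreach ($feed as $row) {
    oci_bind_by_name($upd, ':id',  $row['id']);
    oci_bind_by_name($upd, ':val', $row['val']);
    oci_execute($upd, OCI_DEFAULT);
    if (oci_num_rows($upd) === 0) {       // no existing record: insert instead
        oci_bind_by_name($ins, ':id',  $row['id']);
        oci_bind_by_name($ins, ':val', $row['val']);
        oci_execute($ins, OCI_DEFAULT);
    }
}

// 3. Whatever is still flagged was not in the feed: remove it, then commit.
oci_execute(oci_parse($conn, "DELETE FROM tableX WHERE del = 1"), OCI_DEFAULT);
oci_commit($conn);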
Thanks everybody for your answers. I found it very interesting/educational.
