I am pretty new to database design and PostgreSQL in general, but I am aware of the general concept behind row versioning, transactions and exclusive locks in Postgres (e.g. this article gives a pretty good overview).
My current problem is that a) I am not sure why so many exclusive locks show up in my PG database log files and b) I do not understand why these locks happen at all.
I run PostgreSQL 10 (+ PostGIS extension) with about 300 million rows over 5 tables (200GB). I have about 5 scripts (4x PHP and 1x Python Psycopg2) running 24/7 that make a lot of inserts (with DO UPDATE and COALESCE, in case the entry already exists). However, as far as I understand, the PHP Postgres extension commits automatically after each SQL query, and in my Python script, increasing the number of commits does not significantly reduce locks. I have a couple of triggers that dynamically update rows, but as far as I can tell from the log files they are not the cause of the locks. It should generally be very rare that two or more of my scripts insert/update the same row at the same time.
This is an example log entry:
2018-01-31 01:04:02 CET [808]: [258-1] user=user1,db=maindb,app=[unknown],client=::1 LOG: process 808 still waiting for ExclusiveLock on page 0 of relation 26889 of database 16387 after 1015.576 ms
2018-01-31 01:04:02 CET [808]: [259-1] user=user1,db=maindb,app=[unknown],client=::1 DETAIL: Process holding the lock: 680. Wait queue: 1728, 152, 808.
2018-01-31 01:04:02 CET [808]: [260-1] user=user1,db=maindb,app=[unknown],client=::1 STATEMENT:
INSERT INTO "table1" (...)
VALUES (...)
ON CONFLICT (...)
DO UPDATE SET
...;
I have similar log entries about every 2-3 minutes. Are they problematic? What do they mean exactly: are the locks eventually resolved, or is the data of the transaction lost? There is no log entry stating that the locks were resolved or that the updates were finally committed to the database.
The second type of frequent log entry is similar to this:
2018-01-31 07:22:16 CET [2504]: [16384-1] user=,db=,app=,client= LOG: checkpoint complete: wrote 9999 buffers (3.8%); 0 WAL file(s) added, 0 removed, 7 recycled; write=269.842 s, sync=0.218 s, total=270.123 s; sync files=85, longest=0.054 s, average=0.002 s; distance=66521 kB, estimate=203482 kB
2018-01-31 07:22:46 CET [2504]: [16385-1] user=,db=,app=,client= LOG: checkpoint starting: time
Does this mean an auto-vacuum or auto-commit that resolves all locks?
My general question: should I be concerned and do something or simply leave things as they are?
After a while, I found out what is causing these locks and also how to solve them. All Exclusive locks happened on one relation in the database:
...ExclusiveLock on page 0 of relation 26889 of database...
What is 26889?
SELECT relname FROM pg_class WHERE OID=26889
Result: idx_post_hashtags
It was all caused by a GIN index over a particular column with Array (Text). Furthermore, this GIN index was useless, as the array had a variable length and looking up any specific array value did not benefit from the index. I dropped it: all Exclusive Locks gone!
Reading the logs carefully really helps.
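To see at a glance whether the waits pile up on a single relation, the log can be tallied by relation OID. This is a minimal sketch, assuming the lock-wait messages have the same wording as the example entry above; the function names are made up for illustration, and other message forms (for example waits on a whole relation or on a tuple) would need their own patterns.

```python
import re
from collections import Counter

# Matches the page-level lock-wait message shown in the example log entry.
LOCK_WAIT = re.compile(
    r"process (?P<pid>\d+) still waiting for (?P<mode>\w+) on page (?P<page>\d+) "
    r"of relation (?P<relation>\d+) of database (?P<database>\d+) "
    r"after (?P<waited_ms>[\d.]+) ms"
)

def parse_lock_wait(line):
    """Return the lock-wait fields of one log line as a dict, or None for any other line."""
    match = LOCK_WAIT.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("pid", "page", "relation", "database"):
        fields[key] = int(fields[key])
    fields["waited_ms"] = float(fields["waited_ms"])
    return fields

def waits_per_relation(lines):
    """Count lock-wait messages per relation OID, ignoring every other log line."""
    counts = Counter()
    for line in lines:
        fields = parse_lock_wait(line)
        if fields is not None:
            counts[fields["relation"]] += 1
    return counts

# Two lines copied from the log excerpts in the question.
sample_lines = [
    "2018-01-31 01:04:02 CET [808]: [258-1] user=user1,db=maindb,app=[unknown],client=::1 "
    "LOG: process 808 still waiting for ExclusiveLock on page 0 of relation 26889 "
    "of database 16387 after 1015.576 ms",
    "2018-01-31 07:22:46 CET [2504]: [16385-1] user=,db=,app=,client= "
    "LOG: checkpoint starting: time",
]
counts = waits_per_relation(sample_lines)
print(counts.most_common(1))  # the relation OID to look up in pg_class
```

Run over the whole log file, the most common OID is the one worth resolving with the pg_class query.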
I'm running a PHP script that searches through a relatively large MySQL instance with a table with millions of rows to find terms like "diabetes mellitus" in a column description that has a full text index on it. However, after one day I'm only through a couple hundred queries so it seems like my approach is never going to work. The entries in the description column are on average 1000 characters long.
I'm trying to figure out my next move and I have a few questions:
My MySQL table has unnecessary columns in it that aren't being queried. Will removing those affect performance?
I assume running this locally rather than on RDS will dramatically increase performance? I have a decent MacBook, but I chose RDS since cost isn't an issue, and I tried to run on an instance that was better than my MacBook.
Would using a compiled language like Go rather than PHP do more than the 5-10x boost people report in test examples? That is, given my task is there any reason to think a static language would produce 100X or more speed improvements?
Should I put the data in a text or CSV file rather than MySQL? Is using MySQL just causing unnecessary overhead?
This is the query:
SELECT id
FROM text_table
WHERE match(description) against("+diabetes +mellitus" IN BOOLEAN MODE);
Here's the line of output of EXPLAIN for the query, showing the optimizer is utilizing the FULLTEXT index:
id: 1, select_type: SIMPLE, table: text_table, type: fulltext, possible_keys: idx, key: idx, key_len: 0, ref: NULL, rows: 1, Extra: Using where
The RDS instance is db.m4.10xlarge, which has 160GB of RAM. The InnoDB buffer pool is typically about 75% of RAM on an RDS instance, which makes it 120GB.
The text_table status is:
Name: text_table
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 26000630
Avg_row_length: 2118
Data_length: 55079485440
Max_data_length: 0
Index_length: 247808
Data_free: 6291456
Auto_increment: 29328568
Create_time: 2018-01-12 00:49:44
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
This indicates the table has about 26 million rows, and the size of data and indexes is 51.3GB, but this doesn't include the FT index.
For the size of the FT index, query:
SELECT stat_value * @@innodb_page_size
FROM mysql.innodb_index_stats
WHERE table_name='text_table'
AND index_name = 'FTS_DOC_ID_INDEX'
AND stat_name='size'
The size of the FT index is 480247808 bytes (about 458MB).
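For anyone who wants to check the arithmetic, here is a small sketch that converts the byte counts quoted above into binary units. The variable names are mine; the numbers come straight from the table status and the index-size query.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

data_length = 55079485440    # Data_length from the table status
index_length = 247808        # Index_length from the table status
ft_index_bytes = 480247808   # result of the innodb_index_stats query

table_gib = (data_length + index_length) / GIB
ft_index_mib = ft_index_bytes / MIB
buffer_pool_gb = 160 * 0.75  # 75% of the instance's 160GB of RAM

print(f"table data + indexes: {table_gib:.1f} GiB")
print(f"fulltext index: {ft_index_mib:.0f} MiB")
print(f"buffer pool: {buffer_pool_gb:.0f} GB")
```

Both the table and the fulltext index fit comfortably inside the buffer pool, which is why memory is unlikely to be the bottleneck here.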
Following up on comments above about concurrent queries.
If the query is taking 30 seconds to execute, then the programming language you use for the client app won't make any difference.
I'm a bit skeptical that the query is really taking 1 to 30 seconds to execute. I've tested MySQL fulltext search, and I found a search runs in under 1 second even on my laptop. See my presentation https://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
It's possible that it's not the query that's taking so long, but it's the code you have written that submits the queries. What else is your code doing?
How are you measuring the query performance? Are you using MySQL's query profiler? See https://dev.mysql.com/doc/refman/5.7/en/show-profile.html This will help isolate how long it takes MySQL to execute the query, so you can compare to how long it takes for the rest of your PHP code to run.
PHP is going to be single-threaded, so you are running one query at a time, serially. The RDS instance you are using has 40 CPU cores, so you should be able to run many concurrent queries at a time. But each query would need to be run by its own client.
So one idea would be to split your input search terms into at least 40 subsets, and run your PHP search code against each respective subset. MySQL should be able to run the concurrent queries fine. Perhaps there will be a slight overhead, but this will be more than compensated for by the parallel execution.
You can split your search terms manually into separate files, and then run your PHP script with each respective file as the input. That would be a straightforward way of solving this.
But to get really professional, learn to use a tool like GNU parallel to run the 40 concurrent processes and split your input over these processes automatically.
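The split-and-run-in-parallel idea can also be sketched in a single program. This is only an illustration under assumptions: run_search is a hypothetical stand-in that does no database work, and a real version would open one MySQL connection per worker and run the MATCH ... AGAINST query there. Threads are enough for this pattern because each worker spends its time waiting on the server.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(items, parts):
    """Split items into `parts` round-robin slices of near-equal size."""
    return [items[i::parts] for i in range(parts)]

def run_search(term):
    # Hypothetical stand-in: a real worker would send the fulltext query
    # for `term` over its own MySQL connection and return the matching ids.
    return (term, len(term))

def search_chunk(chunk):
    """Process one subset of the search terms serially, as one client would."""
    return [run_search(term) for term in chunk]

def search_all(terms, workers=40):
    """Run every subset concurrently and flatten the per-subset results."""
    chunks = split_evenly(terms, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(search_chunk, chunks)
        return [row for rows in per_chunk for row in rows]

terms = [f"term{i}" for i in range(100)]
results = search_all(terms)
print(len(results))
```

The default of 40 workers mirrors the 40 CPU cores mentioned above; the right number in practice depends on how the server behaves under load.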
The PHP Documentation says:
If you've never encountered transactions before, they offer 4 major
features: Atomicity, Consistency, Isolation and Durability (ACID). In
layman's terms, any work carried out in a transaction, even if it is
carried out in stages, is guaranteed to be applied to the database
safely, and without interference from other connections, when it is
committed.
QUESTION:
Does this mean that I can have two separate php scripts running transactions simultaneously without them interfering with one another?
ELABORATING ON WHAT I MEAN BY "INTERFERING":
Imagine we have the following employees table:
__________________________
| id | name | salary |
|------+--------+----------|
| 1 | ana | 10000 |
|------+--------+----------|
If I have two scripts with similar/same code and they run at the exact same time:
script1.php and script2.php (both have the same code):
$conn->beginTransaction();
$stmt = $conn->prepare("SELECT * FROM employees WHERE name = ?");
$stmt->execute(['ana']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$salary = $row['salary'];
$salary = $salary + 1000;//increasing salary
$stmt = $conn->prepare("UPDATE employees SET salary = {$salary} WHERE name = ?");
$stmt->execute(['ana']);
$conn->commit();
and assuming the sequence of events is as follows:
script1.php selects data
script2.php selects data
script1.php updates data
script2.php updates data
script1.php commit() happens
script2.php commit() happens
What would the resulting salary of ana be in this case?
Would it be 11000? And would this then mean that 1 transaction will overlap the other because the information was obtained before either commit happened?
Would it be 12000? And would this then mean that regardless of the order in which data was updated and selected, the commit() function forced these to happen individually?
Please feel free to elaborate as much as you want on how transactions and separate scripts can interfere (or don't interfere) with one another.
You are not going to find the answer in the PHP documentation because this has nothing to do with PHP or PDO.
The InnoDB storage engine in MySQL offers four so-called isolation levels, in line with the SQL standard. The isolation level, together with whether reads are blocking or non-blocking, determines the result of the example above. You need to understand the implications of the various isolation levels and choose the appropriate one for your needs.
To sum up: if you use serialisable isolation level with autocommit turned off, then the result will be 12000. In all other isolation levels and serialisable with autocommit enabled the result will be 11000. If you start using locking reads, then the result could be 12000 under all isolation levels.
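Where the 11000 comes from is easy to see in a toy model. The sketch below is not MySQL: it just replays a schedule of selects and updates against one shared value, the way plain non-locking reads behave, with no isolation or locking modeled at all. The function and schedule names are made up for illustration.

```python
def run(schedule, starting_salary=10000):
    """Replay (script, action) steps; each script updates from the value it read earlier."""
    salary = starting_salary   # the value stored in the table
    local = {}                 # what each script holds in its $salary variable
    for script, action in schedule:
        if action == "select":
            local[script] = salary
        elif action == "update":
            salary = local[script] + 1000
    return salary

# The order from the question: both scripts read before either one writes.
interleaved = [
    ("script1", "select"), ("script2", "select"),
    ("script1", "update"), ("script2", "update"),
]
# One script runs to completion before the other starts.
serial = [
    ("script1", "select"), ("script1", "update"),
    ("script2", "select"), ("script2", "update"),
]

print(run(interleaved))  # second update overwrites the first: a lost update
print(run(serial))       # both raises are kept
```

In the interleaved schedule script2 computes its new value from a stale read, so one raise is lost; forcing the serial order is what the stricter settings achieve.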
Judging by the given conditions (a solitary DML statement), you don't need a transaction here, but a table lock. It's a very common confusion.
You need a transaction if you need to make sure that ALL your DML statements were performed correctly or weren't performed at all.
This means:
you don't need a transaction for any number of SELECT queries
you don't need a transaction if only one DML statement is performed
Although, as it was noted in the excellent answer from Shadow, you may use a transaction here with appropriate isolation level, it would be rather confusing. What you need here is table locking. InnoDB engine lets you lock particular rows instead of locking the entire table and thus should be preferred.
In case you want the salary to be 12000, use table locks.
Or - a simpler way - just run an atomic update query:
UPDATE employees SET salary = salary + 1000 WHERE name = ?
In this case all salaries will be recorded.
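Here is a minimal runnable sketch of that atomic update. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in so the example is self-contained; the give_raise helper is hypothetical, and the same UPDATE statement is what you would send to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees (id, name, salary) VALUES (1, 'ana', 10000)")
conn.commit()

def give_raise(connection, name, amount):
    """Add `amount` to the salary in a single statement, so no stale value is written back."""
    with connection:  # commits on success, rolls back if the statement raises
        connection.execute(
            "UPDATE employees SET salary = salary + ? WHERE name = ?",
            (amount, name),
        )

# Two "scripts" each apply their raise; neither one reads the salary first.
give_raise(conn, "ana", 1000)
give_raise(conn, "ana", 1000)

final_salary = conn.execute(
    "SELECT salary FROM employees WHERE name = ?", ("ana",)
).fetchone()[0]
print(final_salary)
```

Because the new value is computed by the database inside the UPDATE itself, there is no window between reading and writing for another script to slip into.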
If your goal is different, better express it explicitly.
But again: you have to understand that transactions in general have nothing to do with the execution of separate scripts. For a race condition like yours, you are interested not in transactions but in table/row locking. This is a very common confusion, and you had better get it straight:
a transaction is to ensure that a set of DML queries within one script were executed successfully.
table/row locking is to ensure that other script executions won't interfere.
The only place where transactions and locking meet is a deadlock, and even then only when a transaction uses locking.
Alas, the "without interference" needs some help from the programmer. It needs BEGIN and COMMIT to define the extent of the 'transaction'. And...
Your example is inadequate. The first statement needs SELECT ... FOR UPDATE. This tells the transaction processing that there is likely to be an UPDATE coming for the row(s) that the SELECT fetches. That warning is critical to "preventing interference". Now the timeline reads:
script1.php BEGINs
script2.php BEGINs
script1.php selects data (FOR UPDATE)
script2.php tries to select data (FOR UPDATE) but is blocked, so it waits
script1.php updates data
script1.php commit() happens
script2.php selects data (and will get the newly-committed value)
script2.php updates data
script2.php commit() happens
(Note: This is not a 'deadlock', just a 'wait'.)
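The effect of that timeline can be imitated without a database. In this sketch a threading.Lock plays the role of the row lock taken by SELECT ... FOR UPDATE and a dict plays the role of the table; both are stand-ins I made up for illustration, not how InnoDB is implemented.

```python
import threading

row_lock = threading.Lock()    # stand-in for the row lock held until COMMIT
employees = {"ana": 10000}     # stand-in for the employees table

def raise_salary(name, amount):
    with row_lock:                          # SELECT ... FOR UPDATE: a second caller waits here
        salary = employees[name]            # read the current value under the lock
        employees[name] = salary + amount   # UPDATE based on that value
    # leaving the block plays the role of COMMIT, which releases the lock

threads = [
    threading.Thread(target=raise_salary, args=("ana", 1000))
    for _ in range(2)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(employees["ana"])
```

Whichever thread gets the lock second reads the value only after the first has written it, which is exactly the "wait, then see the newly-committed value" step in the timeline.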
I am writing an application that tracks financial transactions (as in a bank) to maintain a balance. I am using denormalization to keep performance in check (so that the balance does not have to be calculated at runtime), as discussed here and here.
Now I am facing a race condition: if two people simultaneously perform a transaction on the same entity, the balance calculation described above may return or set inconsistent data, as discussed here and here, and as suggested in the answers.
I am going for MySQL transactions.
Now my question is:
What happens to other, similar queries while a MySQL transaction is underway?
I wish to know whether the other transactions fail (as in an error 500) or are queued and executed once the first transaction finishes.
I also need to know how to deal with either result from the PHP point of view.
And since these transactions are going to be one element of a larger set of operations in PHP, with many prior insert queries, should I also devise a mechanism to roll back those successfully executed queries too? I want atomicity not only for individual queries but also for the whole operation logic (PHP).
Edit 1 :-
Also, if the former is the case, should I check for the error, wait a few seconds and then try that specific transaction again?
Edit 2 :-
Also, MySQL triggers are not an option for me.
With code like this, there is no race condition. Instead, one transaction could be aborted (ROLLBACK'd).
BEGIN;
SELECT balance FROM Accounts WHERE acct_id = 123 FOR UPDATE;
if balance < 100, then ROLLBACK and exit with "insufficient funds"
UPDATE Accounts SET balance = balance - 100 WHERE acct_id = 123;
UPDATE Accounts SET balance = balance + 100 WHERE acct_id = 456;
COMMIT;
And check for errors at each step. On an error, ROLLBACK and rerun the transaction. The second time, it will probably succeed. If it does not, abort: it is probably a logic bug. Only then should you return HTTP error 500.
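The rerun-once advice can be wrapped in a small helper. This is a sketch under assumptions: TransientTransactionError is a hypothetical exception standing in for whatever your driver raises on a deadlock or lock-wait timeout, and the transaction callable is expected to do its own BEGIN, ROLLBACK on failure, and COMMIT.

```python
class TransientTransactionError(Exception):
    """Hypothetical error type standing in for a deadlock or lock-wait timeout."""

def run_with_retry(transaction, attempts=2):
    """Call transaction() up to `attempts` times; re-raise the last transient error."""
    last_error = None
    for _ in range(attempts):
        try:
            return transaction()
        except TransientTransactionError as error:
            last_error = error   # the transaction is assumed to have rolled back already
    raise last_error             # still failing: treat it as a bug and report it

# Illustration: a transaction that hits a deadlock once and then succeeds.
calls = {"count": 0}

def flaky_transfer():
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientTransactionError("deadlock on first attempt")
    return "committed"

outcome = run_with_retry(flaky_transfer)
print(outcome, "after", calls["count"], "attempts")
```

Only the error that survives the final attempt reaches the caller, which is the point at which an HTTP 500 would be appropriate.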
When two users "simultaneously" try to do similar transactions, then one of these things will happen:
The 'second' user will be stalled until the first finishes.
If that stall is longer than innodb_lock_wait_timeout, either your queries are too slow or something else is wrong. You need to fix the system.
If you get a "Deadlock", there may be ways to repair the code. Meanwhile, simply restarting the transaction is likely to succeed.
But it will not mess up the data (assuming the logic is correct).
There is no need to "wait a second" -- unless you have transactions that take "a second". Such would be terribly slow for this type of code.
What I am saying works for "real money", non-real money, non-money, etc.; whatever you need to tally carefully.
I have a website running PHP+MySQL. It is a multiuser system and most of the MySQL tables are MyISAM-based.
The following situation got me puzzled for the last few hours:
I have two (concurrent) users A,B. Both of them will do this:
Perform a Read Operation on Table 1
Perform a Write Operation on another Table 2 (only if the previous Read operation will return a distinct result, e.g. STATUS="OK")
B is slightly behind A.
So it will occur like this:
User A performs a read on Table 1 and sees STATUS="OK".
(User A Schedules Write on Table 2)
User B performs a read on Table 1 and still sees STATUS="OK".
User A performs Write on Table 2 (resulting in STATUS="NOT OK" anymore)
User B performs Write on Table 2 (assuming STATUS="OK")
I think I could prevent this if Reading Table 1 and Writing to Table 2 were defined as a critical section and would be executed atomically. I know this works perfectly fine in Java with threads etc., however in PHP there is no thread communication, as far as I know.
So the solution to my problem must be database-related, right?
Any ideas?
Thanks a lot!
The Right Way: Use InnoDB and transactions.
The Wrong-But-Works Way: Use the GET_LOCK() MySQL function to obtain an exclusive named lock before performing the database operations. When you're done, release the lock with RELEASE_LOCK(). Since only one client can own a particular lock, this will ensure that there's never more than one instance of the script in the "critical section" at the same time.
Pseudo-code:
SELECT GET_LOCK('mylock', 10);
If the query returned "1":
//Read from Table 1
//Update Table 2
SELECT RELEASE_LOCK('mylock');
Else:
//Another instance has been holding the lock for > 10 seconds...
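The pseudo-code above could be wrapped so that the lock is always released, even if the work in the middle fails. This is a sketch, assuming a DB-API style cursor that accepts %s placeholders (as the common MySQL drivers for Python do); named_lock is a made-up helper, and FakeCursor is a test double that grants every request so the example runs without a server.

```python
from contextlib import contextmanager

@contextmanager
def named_lock(cursor, name, timeout_seconds=10):
    """Yield True if GET_LOCK succeeded, else False; release only a lock we obtained."""
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout_seconds))
    acquired = cursor.fetchone()[0] == 1
    try:
        yield acquired
    finally:
        if acquired:
            cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))
            cursor.fetchone()

class FakeCursor:
    """Test double: records each statement and answers 1 to every query."""
    def __init__(self):
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append(sql)

    def fetchone(self):
        return (1,)

cursor = FakeCursor()
with named_lock(cursor, "mylock") as acquired:
    if acquired:
        pass  # read from Table 1, then update Table 2
    else:
        pass  # another instance has held the lock for more than 10 seconds

print(cursor.statements)
```

The try/finally is the part the pseudo-code leaves implicit: without it, an exception inside the critical section would leave the lock held until the connection closes.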
I sometimes get MySQL deadlock errors saying:
'Deadlock found when trying to get lock; try restarting transaction'
I have a queues table from which multiple PHP processes, running simultaneously, select rows. However, I want each process to grab a unique batch of rows on each fetch, so that no rows are selected by more than one process.
So I run this query (which is the query I get the deadlock error on):
$this->db->query("START TRANSACTION;");
$sql = " SELECT mailer_queue_id
FROM mailer_queues
WHERE process_id IS NULL
LIMIT 250
FOR UPDATE;";
...
$sql = "UPDATE mailer_queues
SET process_id = 33044,
status = 'COMPLETED'
WHERE mailer_queue_id
IN (1,2,3...);";
...
if($this->db->affected_rows() > 0) {
$this->db->query("COMMIT;");
} else{
$this->db->query("ROLLBACK;");
}
I'm also:
inserting rows to the table (with no transactions/locks) at the same time
updating rows in the table (with no transactions/locks) at the same time
deleting the rows from the table (with no transactions/locks) at the same time
Also, my updates and deletes only touch rows that have a process_id assigned to them, whereas my transactions that "SELECT rows ... FOR UPDATE" only select rows where process_id IS NULL. In theory they should never overlap.
I'm wondering if there is a proper way to avoid these deadlocks?
Can a deadlock occur because one transaction locks the table for too long while it is selecting/updating, and another process trying to perform the same transaction simply times out?
Any help is much appreciated.
Deadlocks occur when two or more processes request locks in such a way that the resources being locked overlap but are requested in different orders, so that each process is waiting for a resource that's locked by another process, and that other process is waiting for a lock that the original process holds.
In real world terms, consider a construction site: You've got one screwdriver, and one screw. Two workers need to drive in a screw. Worker #1 grabs the screwdriver, and worker #2 grabs the screw. Worker #1 goes to grab the screw as well, but can't, because it's being held by worker #2. Worker #2 needs the screwdriver, but can't get it because worker #1 is holding it. So now they're deadlocked, unable to proceed, because they've got 1 of the 2 resources they need, and neither of them will be polite and "step back".
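One general way out of the screwdriver-and-screw standoff, not specific to MySQL, is to make every worker pick up the resources in the same order. The sketch below uses threading.Lock objects as stand-ins for the two resources; it illustrates the lock-ordering idea only and is not a description of what InnoDB does internally.

```python
import threading

screwdriver = threading.Lock()
screw = threading.Lock()
LOCK_ORDER = [screwdriver, screw]   # every worker takes the tools in this order
finished = []

def worker(name):
    for lock in LOCK_ORDER:          # same order for everyone, so no circular wait
        lock.acquire()
    try:
        finished.append(name)        # both resources held: drive the screw
    finally:
        for lock in reversed(LOCK_ORDER):
            lock.release()

threads = [
    threading.Thread(target=worker, args=(f"worker{i}",))
    for i in (1, 2)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(finished))
```

Whoever gets the screwdriver first is guaranteed to get the screw as well, because the other worker is still waiting for the screwdriver and holds nothing.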
Given that you've got out-of-transaction changes occurring, it's possible that one (or more) of your updates/deletes are overlapping the locked areas you're reserving inside the transactions.
You might want to try LOCK TABLES before starting the transaction, thereby assuring you have explicit control over the tables. The lock will wait until all activity on the particular tables has completed.
I think deadlocks themselves have already been explained very well elsewhere.
MySQL keeps a very useful record of the most recent deadlock and of the queries that were stuck at that time.
Check this MySQL documentation page and search for LATEST DETECTED DEADLOCK.
It's a great log, and it has helped me find many subtle deadlocks.