Combine many MySQL queries with logic into data file - php

Background:
I am parsing a 330 MB XML file (the Netflix catalog) into a DB using a PHP script from the console.
I can successfully add about 1,500 titles every 3 seconds until I add the logic for actors, genres and formats. These are separate tables linked by associative tables.
Right now I have to run many, many queries for each title, in this order (I truncate all tables first, to eliminate old titles, genres, etc.):
1. add the new title to 'titles' and capture the insert id
2. check the actor table for an existing actor
3. if present, get the id; if not, insert the actor and get the insert id
4. insert the title id and actor id into the associative table

(steps 2-4 are repeated for genres too)
This drops my speed down to about 10 titles per 3 seconds, which would take an eternity to add the ~250,000 titles.
So how would I combine the 4 queries into a single query, without adding duplicate actors or genres?
My goal is to just write all queries into a data file and do a bulk insert.
I started by writing all associative queries into a data file, but it didn't do much for performance.
I start by inserting the title and saving its ID:
function insertTitle($nfid, $title, $year) {
    $query = "INSERT INTO ".$this->titles_table." (nf_id, title, year) VALUES ('$nfid','$title','$year')";
    mysql_query($query);
    $this->updatedTitleCount++;
    return mysql_insert_id();
}
That is then used in conjunction with each actor's name to create the association:
function linkActor($value, $title_id) {
    // check if we already know this value
    $query = "SELECT * FROM ".$this->persons_table." WHERE person = '$value' LIMIT 0,1";
    $result = mysql_query($query);
    if ($result && mysql_num_rows($result) != 0) {
        while ($row = mysql_fetch_assoc($result)) {
            $value_id = $row['id'];
        }
    } else {
        // no value known, add to persons table
        $query = "INSERT INTO ".$this->persons_table." (person) VALUES ('$value')";
        mysql_query($query);
        $value_id = mysql_insert_id();
    }
    // link title with person
    $query = "INSERT INTO ".$this->title_persons_table." (title_id, person_id) VALUES ('$title_id','$value_id');";
    //mysql_query($query);
    // write query to data file to be read in bulk style
    fwrite($this->fh, $query);
}
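
For reference, the kind of batching the question is aiming at: rather than writing one INSERT statement per pair, the associative rows can be buffered and flushed as a single multi-row INSERT. A minimal sketch, assuming the same title_persons_table property; the 1,000-row flush threshold is an arbitrary choice:

function linkActorBatched($title_id, $person_id) {
    // buffer the pair instead of writing one INSERT per pair
    $this->pairs[] = "('$title_id','$person_id')";
    if (count($this->pairs) >= 1000) {
        $this->flushPairs();
    }
}

function flushPairs() {
    if (!$this->pairs) return;
    // one statement inserts every buffered pair
    $query = "INSERT INTO ".$this->title_persons_table.
             " (title_id, person_id) VALUES ".implode(',', $this->pairs);
    mysql_query($query);
    $this->pairs = array();
}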

This is a perfect opportunity for using prepared statements.
Also take a look at the tips at http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html, e.g.
To speed up INSERT operations that are performed with multiple statements for nontransactional tables, lock your tables
You can also decrease the number of queries. E.g. you can eliminate the SELECT...FROM persons_table to obtain the id by using INSERT...ON DUPLICATE KEY UPDATE and LAST_INSERT_ID(expr).
Here's an example:
class Foo {
    protected $persons_table = 'personsTemp';
    protected $pdo;
    protected $stmts = array();

    public function __construct($pdo) {
        $this->pdo = $pdo;
        $this->stmts['InsertPersons'] = $pdo->prepare('
            INSERT INTO
                '.$this->persons_table.'
                (person)
            VALUES
                (:person)
            ON DUPLICATE KEY UPDATE
                id = LAST_INSERT_ID(id)
        ');
    }

    public function getActorId($name) {
        $this->stmts['InsertPersons']->execute(array(':person' => $name));
        return $this->pdo->lastInsertId('id');
    }
}
$pdo = new PDO("mysql:host=localhost;dbname=test", 'localonly', 'localonly');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// create a temporary/test table
$pdo->exec('CREATE TEMPORARY TABLE personsTemp (id int auto_increment, person varchar(32), primary key(id), unique key idxPerson(person))');
// and fill in some data
foreach (range('A', 'D') as $p) {
    $pdo->exec("INSERT INTO personsTemp (person) VALUES ('Person $p')");
}
$foo = new Foo($pdo);
foreach (array('Person A', 'Person C', 'Person Z', 'Person B', 'Person Y', 'Person A', 'Person Z', 'Person A') as $name) {
    echo $name, ' -> ', $foo->getActorId($name), "\n";
}
prints
Person A -> 1
Person C -> 3
Person Z -> 5
Person B -> 2
Person Y -> 6
Person A -> 1
Person Z -> 5
Person A -> 1
(someone might want to start a discussion whether a getXYZ() function should perform an INSERT or not ...but not me, not now....)

Your performance is glacially slow; something is very wrong. I assume the following:
- You run your dedicated, otherwise-idle database server on respectable hardware.
- You have tuned it to some extent (i.e. at least configured it to use a few gigs of RAM properly); engine-specific optimisations will be required.
You may be being stung by doing lots of tiny operations with autocommit on; this is a mistake, as it generates an unreasonable number of disc I/O operations. You should do a large amount of work (100, 1,000 records, etc.) in a single transaction and then commit it, as in the sketch below.
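A minimal sketch of that commit batching, assuming a PDO connection and InnoDB tables; insertTitleAndLinks is a hypothetical per-title worker:

$batchSize = 1000;
$count = 0;
$pdo->beginTransaction();
foreach ($titles as $t) {
    insertTitleAndLinks($t);            // hypothetical per-title worker
    if (++$count % $batchSize == 0) {   // commit every $batchSize records
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit(); // flush the final partial batch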
The lookups may be slowing things down because of the simple overhead of doing the queries (the queries themselves will be really easy as you'll have an index on actor name).
I also question your method of assuming that no two actors have the same name - surely your original database contains a unique actor ID, so you don't get them mixed up?

Can you use a language other than PHP? If not, are you running this as a PHP stand-alone script or through a webserver? The webserver is probably adding a lot of overhead you don't need.
I do something very similar at work, using Python, and can insert a couple thousand rows (with associative table lookups) per second on a standard 3.4 GHz, 3 GB RAM machine. The MySQL database isn't hosted locally but within the LAN.

Related

Race condition when inserting in table

I have a rounds MySQL table (InnoDB) that is used to track all games that a given user plays. It has the following columns: id (int), userId (int), gameId (int), status (int).
id is the table's primary key, userId and gameId are foreign keys to other tables, and the int values in status represent 3 different states of the game round: in progress, finished, error.
One user is not allowed to proceed with a new game round while his previous round is still in progress.
My code does the following:
SELECT id FROM rounds WHERE userId = <user> AND gameId = <game> AND status = <in progress> LIMIT 1
If the select returns result, an error is thrown. Otherwise I'm creating new round:
INSERT INTO rounds (userId, gameId, status) VALUES (<user>, <game>, <in progress>);
There is no other code between the select and the insert, and most of the time everything seems to work correctly: creating a new game round while the previous one is still in progress results in an error.
But when I test the code for concurrency and performance under big load, some rounds manage to get inserted simultaneously. (I'm using a simple Node script that sends 200 requests asynchronously.)
Do you have any ideas how I should alter the code so that in all cases I have a maximum of 1 active round?
I seem to be stuck on this problem and I know there must be a simple solution, but I just can't see it. :(
Any help is greatly appreciated!
PS: I've tried using INSERT INTO rounds SELECT ... FROM rounds WHERE NOT EXISTS (SELECT id FROM rounds WHERE userId=<user> AND gameId=<game> AND status=<in progress>), but this results in deadlocks...
EDIT:
The code looks something like this:
In game:
public function play() {
    $gamesInProgress = $this->repo->getGamesInProgress($this->user->getId(), $this->id, self::STATUS_IN_PROGRESS);
    if ($gamesInProgress) throw new \Exception('Only one active game is allowed.');
    $this->createGame();
    // some other code
}
In Repo:
public function getGamesInProgress($userId, $gameId, $status) {
    $stmt = $this->dbh->prepare('SELECT `id` FROM `rounds` WHERE `userId`=:userId AND `gameId`=:gameId AND `status`=:status LIMIT 1');
    $stmt->execute([
        'userId' => $userId,
        'gameId' => $gameId,
        'status' => $status
    ]);
    return $stmt->fetchColumn();
}
It would be good practice to use a composite unique key. This way the database ensures that no duplicate rows are inserted. But then you will have to handle the database error when (accidentally) trying to insert a duplicate entry. A sketch of that approach follows.
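A minimal sketch (column and index names are illustrative). Note that a plain UNIQUE key on (userId, gameId, status) would also forbid two finished rounds for the same user and game, so one common refinement is a nullable marker column, since MySQL unique indexes allow repeated NULLs:

-- illustrative names; activeFlag is 1 while in progress, NULL once finished
ALTER TABLE rounds ADD COLUMN activeFlag TINYINT NULL;
ALTER TABLE rounds ADD UNIQUE KEY uniqActiveRound (userId, gameId, activeFlag);
-- a concurrent second insert with activeFlag = 1 now fails with error 1062
-- (ER_DUP_ENTRY), which the application catches and reports as
-- "only one active round allowed"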
I know that you have tried this, but what happens when you run this?
INSERT INTO rounds (userId, gameId, status)
SELECT <user>, <game>, '<in progress>' FROM DUAL
WHERE NOT EXISTS (
    SELECT id FROM rounds WHERE userId = <user> AND gameId = <game> AND status = '<in progress>' LIMIT 1)

pdo update multiple rows in one query [duplicate]

I know that you can insert multiple rows at once. Is there a way to update multiple rows at once (as in, in one query) in MySQL?
Edit:
For example, I have the following:
Name   id   Col1   Col2
Row1   1    6      1
Row2   2    2      3
Row3   3    9      5
Row4   4    16     8
I want to combine all of the following UPDATEs into one query:
UPDATE table SET Col1 = 1 WHERE id = 1;
UPDATE table SET Col1 = 2 WHERE id = 2;
UPDATE table SET Col2 = 3 WHERE id = 3;
UPDATE table SET Col1 = 10 WHERE id = 4;
UPDATE table SET Col2 = 12 WHERE id = 4;
Yes, that's possible - you can use INSERT ... ON DUPLICATE KEY UPDATE.
Using your example:
INSERT INTO table (id,Col1,Col2) VALUES (1,1,1),(2,2,3),(3,9,3),(4,10,12)
ON DUPLICATE KEY UPDATE Col1=VALUES(Col1),Col2=VALUES(Col2);
Since you have dynamic values, you need to use an IF or CASE for the columns to be updated. It gets kinda ugly, but it should work.
Using your example, you could do it like:
UPDATE table SET
    Col1 = CASE id
        WHEN 1 THEN 1
        WHEN 2 THEN 2
        WHEN 4 THEN 10
        ELSE Col1
    END,
    Col2 = CASE id
        WHEN 3 THEN 3
        WHEN 4 THEN 12
        ELSE Col2
    END
WHERE id IN (1, 2, 3, 4);
The question is old, yet I'd like to extend the topic with another answer.
My point is, the easiest way to achieve it is just to wrap multiple queries in a transaction. The accepted answer, INSERT ... ON DUPLICATE KEY UPDATE, is a nice hack, but one should be aware of its drawbacks and limitations:
- If you happen to launch the query with rows whose primary keys don't exist in the table, the query inserts new "half-baked" records. Probably not what you want.
- If you have a table with a NOT NULL field without a default value and don't want to touch this field in the query, you'll get a "Field 'fieldname' doesn't have a default value" MySQL warning even if you don't insert a single row at all. It will get you into trouble if you decide to be strict and turn MySQL warnings into runtime exceptions in your app.
I made some performance tests for three of the suggested variants, including the INSERT ... ON DUPLICATE KEY UPDATE variant, a variant with a CASE/WHEN/THEN clause, and a naive approach with a transaction. You may get the Python code and results here. The overall conclusion is that the variant with the CASE statement turns out to be twice as fast as the two other variants, but it's quite hard to write correct and injection-safe code for it, so I personally stick to the simplest approach: using transactions.
Edit: The findings of Dakusan prove that my performance estimations are not quite valid. Please see this answer for another, more elaborate piece of research.
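For reference, the transaction variant is just the plain per-row UPDATEs wrapped in an explicit transaction. A sketch, assuming a PDO connection and the Col1 values from the question:

$stmt = $pdo->prepare('UPDATE `table` SET Col1 = :val WHERE id = :id');
$pdo->beginTransaction();
foreach (array(1 => 1, 2 => 2, 4 => 10) as $id => $val) {
    $stmt->execute(array(':val' => $val, ':id' => $id));
}
$pdo->commit(); // one disc flush for the whole batch instead of one per row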
Not sure why another useful option is not yet mentioned:
UPDATE my_table m
JOIN (
SELECT 1 as id, 10 as _col1, 20 as _col2
UNION ALL
SELECT 2, 5, 10
UNION ALL
SELECT 3, 15, 30
) vals ON m.id = vals.id
SET col1 = _col1, col2 = _col2;
All of the following applies to InnoDB.
I feel knowing the speeds of the 3 different methods is important.
There are 3 methods:
INSERT: INSERT with ON DUPLICATE KEY UPDATE
TRANSACTION: Where you do an update for each record within a transaction
CASE: In which you use a CASE/WHEN for each different record within an UPDATE
I just tested this, and the INSERT method was 6.7x faster for me than the TRANSACTION method. I tried on a set of both 3,000 and 30,000 rows.
The TRANSACTION method still has to run each individual query, which takes time, though it batches the results in memory, or something, while executing. The TRANSACTION method is also pretty expensive in both replication and query logs.
Even worse, the CASE method was 41.1x slower than the INSERT method w/ 30,000 records (6.1x slower than TRANSACTION). And 75x slower in MyISAM. INSERT and CASE methods broke even at ~1,000 records. Even at 100 records, the CASE method is BARELY faster.
So in general, I feel the INSERT method is both best and easiest to use. The queries are smaller and easier to read and only take up 1 query of action. This applies to both InnoDB and MyISAM.
Bonus stuff:
The solution for the INSERT non-default-field problem is to temporarily turn off the relevant SQL modes: SET SESSION sql_mode=REPLACE(REPLACE(@@SESSION.sql_mode,"STRICT_TRANS_TABLES",""),"STRICT_ALL_TABLES",""). Make sure to save the sql_mode first if you plan on reverting it.
As for other comments I've seen that say the auto_increment goes up using the INSERT method, this does seem to be the case in InnoDB, but not MyISAM.
Code to run the tests is as follows. It also outputs .sql files, to remove PHP interpreter overhead:
<?php
// Variables
$NumRows = 30000;

// These 2 functions need to be filled in
function InitSQL()
{
}
function RunSQLQuery($Q)
{
}

// Run the 3 tests
InitSQL();
for ($i = 0; $i < 3; $i++)
    RunTest($i, $NumRows);

function RunTest($TestNum, $NumRows)
{
    $TheQueries = Array();
    $DoQuery = function($Query) use (&$TheQueries)
    {
        RunSQLQuery($Query);
        $TheQueries[] = $Query;
    };

    $TableName = 'Test';
    $DoQuery('DROP TABLE IF EXISTS '.$TableName);
    $DoQuery('CREATE TABLE '.$TableName.' (i1 int NOT NULL AUTO_INCREMENT, i2 int NOT NULL, primary key (i1)) ENGINE=InnoDB');
    $DoQuery('INSERT INTO '.$TableName.' (i2) VALUES ('.implode('), (', range(2, $NumRows+1)).')');

    if ($TestNum == 0)
    {
        $TestName = 'Transaction';
        $Start = microtime(true);
        $DoQuery('START TRANSACTION');
        for ($i = 1; $i <= $NumRows; $i++)
            $DoQuery('UPDATE '.$TableName.' SET i2='.(($i+5)*1000).' WHERE i1='.$i);
        $DoQuery('COMMIT');
    }
    if ($TestNum == 1)
    {
        $TestName = 'Insert';
        $Query = Array();
        for ($i = 1; $i <= $NumRows; $i++)
            $Query[] = sprintf("(%d,%d)", $i, (($i+5)*1000));
        $Start = microtime(true);
        $DoQuery('INSERT INTO '.$TableName.' VALUES '.implode(', ', $Query).' ON DUPLICATE KEY UPDATE i2=VALUES(i2)');
    }
    if ($TestNum == 2)
    {
        $TestName = 'Case';
        $Query = Array();
        for ($i = 1; $i <= $NumRows; $i++)
            $Query[] = sprintf('WHEN %d THEN %d', $i, (($i+5)*1000));
        $Start = microtime(true);
        $DoQuery("UPDATE $TableName SET i2=CASE i1\n".implode("\n", $Query)."\nEND\nWHERE i1 IN (".implode(',', range(1, $NumRows)).')');
    }

    print "$TestName: ".(microtime(true)-$Start)."<br>\n";
    file_put_contents("./$TestName.sql", implode(";\n", $TheQueries).';');
}
UPDATE table1, table2 SET table1.col1='value', table2.col1='value' WHERE table1.col3='567' AND table2.col6='567'
This should work for ya.
There is a reference in the MySQL manual for multiple tables.
Use a temporary table
// Reorder items
function update_items_tempdb(&$items)
{
    shuffle($items);
    $table_name = uniqid('tmp_test_');
    $sql = "CREATE TEMPORARY TABLE `$table_name` ("
        ." `id` int(10) unsigned NOT NULL AUTO_INCREMENT"
        .", `position` int(10) unsigned NOT NULL"
        .", PRIMARY KEY (`id`)"
        .") ENGINE = MEMORY";
    query($sql);

    $i = 0;
    $sql = '';
    foreach ($items as &$item)
    {
        $item->position = $i++;
        $sql .= ($sql ? ', ' : '')."({$item->id}, {$item->position})";
    }
    if ($sql)
    {
        query("INSERT INTO `$table_name` (id, position) VALUES $sql");
        $sql = "UPDATE `test`, `$table_name` SET `test`.position = `$table_name`.position"
            ." WHERE `$table_name`.id = `test`.id";
        query($sql);
    }
    query("DROP TABLE `$table_name`");
}
Why does no one mention multiple statements in one query?
In PHP, you use the multi_query method of the mysqli instance.
From the PHP manual:
MySQL optionally allows having multiple statements in one statement string. Sending multiple statements at once reduces client-server round trips but requires special handling.
Here is the result compared to the other 3 methods when updating 30,000 rows. The code can be found here; it is based on the answer from @Dakusan.
Transaction: 5.5194580554962
Insert: 0.20669293403625
Case: 16.474853992462
Multi: 0.0412278175354
As you can see, the multiple-statements query is more efficient than the highest-voted answer.
If you get an error message like this:
PHP Warning: Error while sending SET_OPTION packet
you may need to increase max_allowed_packet in the MySQL config file (on my machine, /etc/mysql/my.cnf) and then restart mysqld.
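A sketch of the config change (the 64M value is an arbitrary choice; size it to fit your largest statement string):

[mysqld]
max_allowed_packet = 64M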
There is a setting you can alter called 'multi statement' that disables MySQL's 'safety mechanism' implemented to prevent (more than one) injected command. Typical of MySQL's 'brilliant' implementation, it also prevents users from doing efficient queries.
Here (http://dev.mysql.com/doc/refman/5.1/en/mysql-set-server-option.html) is some info on the C implementation of the setting.
If you're using PHP, you can use mysqli to do multi statements (I think php has shipped with mysqli for a while now)
$con = new mysqli('localhost','user1','password','my_database');
$query = "Update MyTable SET col1='some value' WHERE id=1 LIMIT 1;";
$query .= "UPDATE MyTable SET col1='other value' WHERE id=2 LIMIT 1;";
//etc
$con->multi_query($query);
$con->close();
Hope that helps.
You can alias the same table to give you the IDs you want to update by (if you are doing a row-by-row update):
UPDATE table1 tab1, table1 tab2 -- alias references the same table
SET
    col1 = 1,
    col2 = 2
    . . .
WHERE
    tab1.id = tab2.id;
Additionally, it should seem obvious that you can also update from other tables as well. In this case, the update doubles as a SELECT statement, giving you the data from the table you are specifying. You are explicitly stating the update values in your query, so the second table is unaffected.
You may also be interested in using joins on updates, which is possible as well.
UPDATE someTable s
INNER JOIN anotherTable a ON s.id = a.id
SET s.someValue = 4
WHERE a.id = 4;
-- Only updates someValue in someTable rows that have a foreign key on anotherTable with a value of 4.
Edit: If the values you are updating aren't coming from somewhere else in the database, you'll need to issue multiple update queries.
No-one has yet mentioned what for me would be a much easier way to do this: use a SQL editor that allows you to execute multiple individual queries. Sequel Ace can do this, and I'd assume that Sequel Pro and probably other editors have similar functionality. (This of course assumes you only need to run this as a one-off rather than as an integrated part of your app/site.)
And now the easy way:
UPDATE my_table m, -- let's create a temp derived table with populated values
    (SELECT 1 as id, 20 as value UNION -- this part will be generated
     SELECT 2 as id, 30 as value UNION -- using backend code
     -- for loop
     SELECT N as id, X as value
    ) t
SET m.value = t.value WHERE t.id = m.id -- now update by join - quick
Yes, it is possible using an INSERT ... ON DUPLICATE KEY UPDATE SQL statement.
Syntax:
INSERT INTO table_name (a,b,c) VALUES (1,2,3),(4,5,6)
ON DUPLICATE KEY UPDATE a=VALUES(a),b=VALUES(b),c=VALUES(c)
Use
REPLACE INTO `table` (`id`, `col1`, `col2`) VALUES
(1,6,1),(2,2,3),(3,9,5),(4,16,8);
Please note:
- id has to be a primary or unique key
- if you use foreign keys to reference the table, REPLACE deletes then inserts, so this might cause an error
I took the answer from @newtover and extended it using the new JSON_TABLE function in MySQL 8. This allows you to create a stored procedure to handle the workload rather than building your own SQL text in code:
drop table if exists `test`;
create table `test` (
    `Id` int,
    `Number` int,
    PRIMARY KEY (`Id`)
);
insert into test (Id, Number) values (1, 1), (2, 2);

DROP procedure IF EXISTS `Test`;
DELIMITER $$
CREATE PROCEDURE `Test`(
    p_json json
)
BEGIN
    update test s
    join json_table(p_json, '$[*]' columns(`id` int path '$.id', `number` int path '$.number')) v
        on s.Id = v.id
    set s.Number = v.number;
END$$
DELIMITER ;

call `Test`('[{"id": 1, "number": 10}, {"id": 2, "number": 20}]');
select * from test;
drop table if exists `test`;
It's a few ms slower than pure SQL, but I'm happy to take the hit rather than generate the SQL text in code. Not sure how performant it is with huge recordsets (the JSON object has a max size of 1 GB), but I use it all the time when updating 10k rows at a time.
The following will update all rows in one table:
Update Table Set
Column1 = 'New Value'
The next one will update all rows where the value of Column2 is more than 5:
Update Table Set
Column1 = 'New Value'
Where
Column2 > 5
Finally, here is Unkwntech's example of updating more than one table:
UPDATE table1, table2 SET
table1.col1 = 'value',
table2.col1 = 'value'
WHERE
table1.col3 = '567'
AND table2.col6 = '567'
UPDATE tableName SET col1='000' WHERE id='3' OR id='5'
This should achieve what you're looking for. Just add more IDs. I have tested it.
UPDATE `your_table` SET
`something` = IF(`id`="1","new_value1",`something`), `smth2` = IF(`id`="1", "nv1",`smth2`),
`something` = IF(`id`="2","new_value2",`something`), `smth2` = IF(`id`="2", "nv2",`smth2`),
`something` = IF(`id`="4","new_value3",`something`), `smth2` = IF(`id`="4", "nv3",`smth2`),
`something` = IF(`id`="6","new_value4",`something`), `smth2` = IF(`id`="6", "nv4",`smth2`),
`something` = IF(`id`="3","new_value5",`something`), `smth2` = IF(`id`="3", "nv5",`smth2`),
`something` = IF(`id`="5","new_value6",`something`), `smth2` = IF(`id`="5", "nv6",`smth2`)
You just build it in PHP like:
$q = 'UPDATE `your_table` SET ';
foreach ($data as $dat) {
    $q .= '
        `something` = IF(`id`="'.$dat->id.'","'.$dat->value.'",`something`),
        `smth2` = IF(`id`="'.$dat->id.'","'.$dat->value2.'",`smth2`),';
}
$q = substr($q, 0, -1); // strip the trailing comma
So you can update the whole table with one query.

PDO last insert id Alternative?

I am moving data from one table to another via an INSERT INTO table1 SELECT * FROM table2 query. The data being moved contains information about employees (first name, last name, etc.) as well as the path to that employee's resume. I'm now trying to split that information up into two different tables: one table for the employee info, and one table for the document (resume) info, linking the two by putting the employee ID in the document table. Both the employee ID and the document ID will be auto-incremented PK values.
I understand that I can put these queries into a for loop and move one row at a time, grabbing the last insert id of the employee table before adding the document info to the document table in order to link the two. I am curious if there is a way to do this in one query: being able to take multiple rows from the original table, split up the info to be inserted into two new/different tables, and use the auto-generated id in the employee table as a value in the document table... hope this makes sense!
Sorry if I get this wrong, but do you want to execute this query once with your current DB tables?
And I guess both tables have the same number of rows (in matching order)?
If you split those up you will get:
Employee table for example:
- employee_id(auto_increment)
- employee_firstname
- employee_lastname
- employee_document_id
- +whatever you want etc
Document table for example:
- document_id(auto_increment)
- document_name
- document_path
- document_employee_id
- +whatever you want etc.
If this is what you mean, then I think the following would work:
1: Set up PDO
<?php
$config['db'] = array(
    'host'     => 'host',
    'username' => 'username',
    'password' => 'password',
    'dbname'   => 'dbname'
);
$db = new PDO('mysql:host=' . $config['db']['host'] . ';dbname=' . $config['db']['dbname'], $config['db']['username'], $config['db']['password']);
?>
2: Set up the insert queries
<?php
$select_query = "SELECT * FROM table1";
// $db is the PDO instance from step 1
$select_all = $db->prepare($select_query);
$select_all->execute();
$count = $select_all->rowCount();

for ($i = 1; $i <= $count; ++$i) {
    $insert_query1 = "INSERT INTO table1 (employee_firstname,
        employee_lastname, employee_document_id)
        VALUES (employee_firstnameValue, employee_lastnameValue, '$i')";
    $insert_query2 = "INSERT INTO table2 (document_name, document_path,
        document_employee_id) VALUES (document_nameValue, document_pathValue, '$i')";
    $insert_table1 = $db->prepare($insert_query1);
    $insert_table1->execute();
    $insert_table2 = $db->prepare($insert_query2);
    $insert_table2->execute();
}
?>
I think the above will work because you get an auto_increment starting at 1, and the ++$i will occur every time. So the employee_document_id and the document_employee_id will both get $i (which starts at 1), just like the auto_increment (also 1).
But maybe this is too much thought... or not going to work in your model.
Side notes:
Using bound parameters in the queries is recommended.
This is just a loose description of a method that came to mind (maybe you can pick something up from here...)
EDIT: Another solution is to use a query like SELECT MAX(id), but this is unsafe.
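
For reference, the row-at-a-time approach the question describes, linking the two inserts with lastInsertId(). A sketch; the table and column names are illustrative:

// a sketch of the per-row approach; table and column names are illustrative
$rows = $db->query('SELECT first_name, last_name, resume_path FROM table2');
$insEmp = $db->prepare('INSERT INTO employee (first_name, last_name) VALUES (?, ?)');
$insDoc = $db->prepare('INSERT INTO document (employee_id, path) VALUES (?, ?)');
foreach ($rows as $row) {
    $insEmp->execute(array($row['first_name'], $row['last_name']));
    $employeeId = $db->lastInsertId();  // auto-increment PK just generated
    $insDoc->execute(array($employeeId, $row['resume_path']));
}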

InnoDB only insert record if referenced id exists (without FOREIGN KEYS)

Foreign keys may be the best approach for this problem. However, I'm trying to learn about table locking/transactions, and so I'm hoping that we can ignore them for the moment.
Let's pretend that I have two tables in an InnoDB database: categories and jokes; and that I'm using PHP/MySQLi to do the work. The tables look like so:
CATEGORIES
id (int, primary, auto_inc) | category_name (varchar[64])
============================================================
1 knock, knock
JOKES
id (int, primary, auto_inc) | category_id (int) | joke_text (varchar[255])
=============================================================================
empty
Here are two functions, each of which is being called by a different connection, at the same time. The calls are: delete_category(1) and add_joke(1,"Interrupting cow. Interrup-MOOOOOOOO!")
function delete_category($category_id) {
    global $conn;
    // only delete the category if there are no jokes in it
    $query = "SELECT id FROM jokes WHERE category_id = '$category_id'";
    $result = $conn->query($query);
    if ( !$result->num_rows ) {
        $query = "DELETE FROM categories WHERE id = '$category_id'";
        $result = $conn->query($query);
        if ( $conn->affected_rows ) {
            return true;
        }
    }
    return false;
}
function add_joke($category_id, $joke_text) {
    global $conn;
    $new_id = -1;
    // only add the joke if the category exists
    $query = "SELECT id FROM categories WHERE id = '$category_id'";
    $result = $conn->query($query);
    if ( $result->num_rows ) {
        $query = "INSERT INTO jokes (category_id, joke_text) VALUES ('$category_id', '$joke_text')";
        $result = $conn->query($query);
        if ( $conn->affected_rows ) {
            $new_id = $conn->insert_id;
            return $new_id;
        }
    }
    return $new_id;
}
Now, if the SELECT statements from both functions execute at the same time and both proceed from there, delete_category will think it's okay to delete the category, and add_joke will think it's okay to add the joke to the existing category, so I'll get an empty categories table and an entry in the jokes table that references a non-existent category_id.
Without using foreign keys, how would you solve this problem?
My best thought so far would be to do the following:
1) "LOCK TABLES categories WRITE, jokes WRITE" at the start of delete_category. However, since I'm using InnoDB, I'm quite keen to avoid locking entire tables (especially main ones that will be used often).
2) Making add_joke a transaction and then doing "SELECT id FROM categories WHERE id = '$category_id'" after inserting the record as well. If it doesn't exist at that point, rollback the transaction. However, since the two SELECT statements in add_joke might return different results, I believe I need to look into transaction isolation levels, which I'm not familiar with.
It seems to me that if I did both of those things, it should work as expected. Nevertheless, I'm keen to hear more informed opinions. Thanks.
You can DELETE a category only if there is no matching joke:
DELETE c FROM categories AS c
LEFT OUTER JOIN jokes AS j ON c.id=j.category_id
WHERE c.id = $category_id AND j.category_id IS NULL;
If there are any jokes for the category, the join will find them, and therefore the outer join will return a non-null result. The condition in the WHERE clause eliminates non-null results, so the overall delete will match zero rows.
Likewise, you can INSERT a joke to a category only if the category exists:
INSERT INTO jokes (category_id, joke_text)
SELECT c.id, '$joke_text'
FROM categories AS c WHERE c.id = $category_id;
If there is no such category, the SELECT returns zero rows, and the INSERT is a no-op.
Both of these cases create a shared lock (S-lock) on the categories table.
Demonstration of an S-lock:
In one session I run:
mysql> INSERT INTO bar (i) SELECT SLEEP(600) FROM foo;
In second session I run:
mysql> SHOW ENGINE INNODB STATUS\G
. . .
---TRANSACTION 3849, ACTIVE 1 sec
mysql tables in use 2, locked 2
2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 18, OS thread handle 0x7faefe7d1700, query id 203 192.168.56.1 root User sleep
insert into bar (i) select sleep(600) from foo
TABLE LOCK table `test`.`foo` trx id 3849 lock mode IS
RECORD LOCKS space id 22 page no 3 n bits 72 index `GEN_CLUST_INDEX` of table `test`.`foo` trx id 3849 lock mode S
You can see that this creates an IS-lock on the table foo, and an S-lock on one row of foo, the table I'm reading from.
The same thing happens for any hybrid read/write operations such as SELECT...FOR UPDATE, INSERT...SELECT, CREATE TABLE...SELECT, to block the rows being read from being modified while they are needed as a source for the write operation.
The IS-lock is a table-level lock that prevents DDL operations on the table, so no one issues DROP TABLE or ALTER TABLE while this transaction is depending on some content in the table.

INSERT new rows, UPDATE old rows: how?

I am making a PHP utility that imports and analyzes a CSV file into $data, with a flag ($saveNew) that decides whether or not to INSERT "new" rows into the database.
Right now, I have a bit of an ugly mess: (in generalized pseudo-PHP)
function synchronize($data, $saveNew) {
    $existing_ids = $table->find_all('ID');   // array of IDs in the table
    $incoming_ids = get_key('ID', $data);     // grabs the ID field from each record
    $new_ids = array_diff($incoming_ids, $existing_ids);
    foreach ($data as $record) {
        if (in_array($record['ID'], $new_ids)) { // it's new
            if ($saveNew) $table->insert($record);
            else continue;
        } else {
            $table->update($record);
        }
    }
}
To me, this just has a smell, and I think I could accomplish this in just a single query, except that I'm not that familiar with SQL.
I'm using a simple ORM in my app, but I can easily just use straight SQL.
Oh, and I'm using MySQL.
Can this be done with just one query? It seems like this would be a common problem with a simple solution, but I just can't figure it out.
Have a look at MySQL's INSERT ... ON DUPLICATE KEY ... syntax, which allows you to do just that: http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html
Whether you want to implement that in your ORM or in that specific subroutine of yours is up to you...
If you have a unique row identifier, you can use the INSERT ... ON DUPLICATE KEY UPDATE syntax:
INSERT INTO foo (id, col1, col2) VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE col1 = VALUES(col1), col2 = VALUES(col2);
Then, when you bind NULL, 4, 5 it'll insert a new row (assuming id is an autoincrement column)
When you bind 1, 4, 5 it'll insert a new row if there is no row 1, otherwise it'll update row 1's col1 and col2 fields to 4 and 5 respectively...
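A sketch of that binding with PDO, using the foo table from the example above:

$stmt = $pdo->prepare(
    'INSERT INTO foo (id, col1, col2) VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE col1 = VALUES(col1), col2 = VALUES(col2)'
);
$stmt->execute(array(null, 4, 5)); // no id: inserts a fresh auto-increment row
$stmt->execute(array(1, 4, 5));    // id 1 exists: updates its col1/col2 to 4 and 5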
