This is my db structure:
ID NAME SOMEVAL API_ID
1 TEST 123456 A123
2 TEST2 223232 A123
3 TEST3 918922 A999
4 TEST4 118922 A999
I'm filling it using a function that calls an API and gets some data from an external service.
On the first run, I want to insert all the data I get back from the API. After that, each time I run the function, I just want to update the existing rows and add any rows that came back from the API call but are not yet in the db.
So my initial thought regarding the update process is to go through each row I get from the API and SELECT to see if it already exists.
I'm just wondering whether this is the most efficient way to do it, or whether it's better to DELETE the relevant rows from the db and just re-insert them all.
NOTE: each batch of rows I get from the API has an API_ID, so when I say delete the rows I mean something like DELETE FROM table WHERE API_ID = 'A999', for example.
If you are retrieving all the rows from the service, I recommend dropping all indexes, truncating the table, inserting all the data, and then recreating the indexes.
If you are retrieving only some of the data from the service, I would drop all indexes, remove all relevant rows, insert all rows, and then recreate the indexes.
In such scenarios I usually go with:
start transaction
get row from external source
select local store to check if it's there
if it's there: update its values, remember local row id in list
if it's not there: insert it, remember local row id in list
at the end, delete all rows that are not in the remembered list of local row ids (with a NOT IN clause if the number of ids allows it, or some other way if the number of rows involved could be large)
commit transaction
Why? Because I usually have local rows referenced by other tables, and deleting them all would break those references (not to mention ON DELETE CASCADE).
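A minimal sketch of the steps above, assuming Python's built-in sqlite3 module as a stand-in for MySQL, a hypothetical `items` table mirroring the question's columns, and (api_id, name) as the hypothetical key that identifies a row. The delete step is scoped to one API_ID batch, as in the question:

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL. Table/column names are hypothetical.
def sync_batch(conn, api_id, rows):
    """Sync one API_ID batch: update rows we already have, insert new ones,
    then delete local rows of this batch that the API no longer returned.
    `rows` is a list of (name, someval) pairs; (api_id, name) is assumed
    to identify a row."""
    kept_ids = []
    with conn:  # writes share one transaction: commit on success, rollback on error
        for name, someval in rows:
            found = conn.execute(
                "SELECT id FROM items WHERE api_id = ? AND name = ?",
                (api_id, name),
            ).fetchone()
            if found:  # already there: update it and remember its local id
                conn.execute("UPDATE items SET someval = ? WHERE id = ?",
                             (someval, found[0]))
                kept_ids.append(found[0])
            else:  # not there: insert it and remember the new local id
                cur = conn.execute(
                    "INSERT INTO items (name, someval, api_id) VALUES (?, ?, ?)",
                    (name, someval, api_id),
                )
                kept_ids.append(cur.lastrowid)
        # Delete rows of this batch that were not seen in this run.
        if kept_ids:
            marks = ",".join("?" * len(kept_ids))
            conn.execute(
                f"DELETE FROM items WHERE api_id = ? AND id NOT IN ({marks})",
                (api_id, *kept_ids),
            )
        else:
            conn.execute("DELETE FROM items WHERE api_id = ?", (api_id,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, "
             "someval INTEGER, api_id TEXT)")
sync_batch(conn, "A999", [("TEST3", 918922), ("TEST4", 118922)])
sync_batch(conn, "A999", [("TEST3", 1), ("TEST5", 2)])  # TEST4 gone, TEST5 new
print(conn.execute("SELECT id, name, someval FROM items ORDER BY id").fetchall())
# [(1, 'TEST3', 1), (3, 'TEST5', 2)]
```

The surviving row (TEST3) keeps id 1 across runs, which is what keeps references from other tables intact.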
I don't see any problem in performing SELECT, then deciding between an INSERT or UPDATE. However, MySQL has the ability to perform so-called "upserts", where it will insert a row if it does not exist, or update an existing row otherwise.
This SO answer shows how to do that.
I would recommend using INSERT...ON DUPLICATE KEY UPDATE.
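If you use INSERT IGNORE instead, a row that would cause a duplicate key is simply skipped: it is neither inserted nor updated.
Add a unique key index for this to work. Note that in the table shown, API_ID alone is not unique (each batch shares one API_ID), so the index probably needs to cover API_ID together with whatever identifies a row within a batch, such as NAME.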
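A sketch of the difference, again using SQLite through Python's sqlite3 as a stand-in. SQLite spells the MySQL statements differently: INSERT OR IGNORE corresponds to INSERT IGNORE, and INSERT ... ON CONFLICT (...) DO UPDATE (SQLite 3.24 or newer) corresponds to INSERT ... ON DUPLICATE KEY UPDATE. The `items` table and the UNIQUE (api_id, name) key are assumptions:

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL. Table and key are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    someval INTEGER,
    api_id TEXT NOT NULL,
    UNIQUE (api_id, name))""")
conn.execute("INSERT INTO items (name, someval, api_id) VALUES ('TEST3', 918922, 'A999')")

# INSERT OR IGNORE (like MySQL's INSERT IGNORE): the clash is silently skipped,
# so someval keeps its old value.
conn.execute("INSERT OR IGNORE INTO items (name, someval, api_id) VALUES ('TEST3', 1, 'A999')")
print(conn.execute("SELECT someval FROM items WHERE name = 'TEST3'").fetchone())  # (918922,)

# Upsert: on a clash with UNIQUE (api_id, name), update the existing row instead.
conn.execute("""INSERT INTO items (name, someval, api_id) VALUES ('TEST3', 2, 'A999')
    ON CONFLICT (api_id, name) DO UPDATE SET someval = excluded.someval""")
print(conn.execute("SELECT id, someval FROM items WHERE name = 'TEST3'").fetchone())  # (1, 2)
```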
If you have all of the data returned from the API that you need to completely reconstruct the rows after you delete them, then go ahead and delete them, and insert afterwards.
Be sure, though, that you do this in a transaction, and that you are using an engine that supports transactions properly, such as InnoDB, so that other clients of the database don't see rows missing from the table just because they are going to be updated.
For efficiency, you should insert as many rows as you can in a single query. Much faster that way.
BEGIN;
DELETE FROM `table` WHERE API_ID = 'A987';
INSERT INTO `table` (NAME, SOMEVAL, API_ID) VALUES
('TEST5', 12345, 'A987'),
('TEST6', 23456, 'A987'),
('TEST7', 34567, 'A987'),
...
('TEST123', 123321, 'A987');
COMMIT;
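The same delete-then-reinsert pattern, sketched with Python's sqlite3 module standing in for MySQL/InnoDB (the `items` table and `replace_batch` helper are hypothetical). The rows go in as multi-row INSERT ... VALUES statements, chunked because SQLite limits the number of ? parameters per statement (999 by default in older versions), and the `with conn:` block makes the DELETE and the INSERTs one transaction, so a failed insert also undoes the delete:

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL. Names are hypothetical.
def replace_batch(conn, api_id, rows, chunk=300):
    """Delete every row of one API_ID batch and re-insert the fresh rows.
    chunk=300 rows x 3 parameters keeps each statement under 999 parameters."""
    with conn:  # rolls back the DELETE too if any INSERT fails
        conn.execute("DELETE FROM items WHERE api_id = ?", (api_id,))
        for start in range(0, len(rows), chunk):
            part = rows[start:start + chunk]
            values = ", ".join(["(?, ?, ?)"] * len(part))
            params = [v for name, someval in part for v in (name, someval, api_id)]
            conn.execute(
                f"INSERT INTO items (name, someval, api_id) VALUES {values}",
                params,
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
             "someval INTEGER, api_id TEXT)")
replace_batch(conn, "A987", [("TEST5", 12345), ("TEST6", 23456), ("TEST7", 34567)])
print(conn.execute("SELECT name, someval FROM items ORDER BY id").fetchall())
# [('TEST5', 12345), ('TEST6', 23456), ('TEST7', 34567)]
```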
Related
Just looking for some tips and pointers for a small project I am doing. I have some ideas, but I am not sure whether they are best practice. I am using MySQL and PHP.
I have a table called nomsing in the database.
It has an integer primary key called row id.
Then I have about 8 other tables referencing this table.
They are called, for instance, nomplu, accsing, accplu, datsing, and datplu.
Each has a column that references the primary key of nomsing.
Within my PHP code I have all the information to insert into the tables except one thing: the row id primary key of the nomsing table. So the PHP generates a series of inserts like the following.
INSERT INTO nomsing (word, postress, gender) VALUES ('велосипед', '8', 'mask');
INSERT INTO nomplu (word, postress, NOMSING_REFERENCE) VALUES ('велосипеды', '2', #the reference to the id of the first insert#);
There are more inserts, but this one gets the point across. The second insert should reference the auto-generated id of the first insert. I want this to work as a transaction, so either all inserts complete or none do.
One idea I have is to not auto-generate the id and instead generate it myself in PHP. That way I would know the id before the transaction, but then I would have to check whether the id was already in the db.
Another idea I have is to do the first insert, then query for the row id of that insert in PHP, and then make the second insert. Both should work, but they don't seem like optimal solutions. I am not too familiar with the database's transactional features, so what would be the best approach in this case? I don't like the idea of inserting, then querying for the id, and then running the rest of the queries. It just seems very inefficient, or perhaps I am wrong.
Just insert a row into the master table. Then you can fetch the insert id (lastInsertId() when using PDO) and use it to populate your other queries.
You could use the PHP version as given by JvdBerg, or MySQL's LAST_INSERT_ID(). I usually use the former option.
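A minimal sketch of that approach, using Python's sqlite3 as a stand-in, where `cursor.lastrowid` plays the role of PDO's lastInsertId() or MySQL's LAST_INSERT_ID(). The `nomsing_id` column name and the `insert_word` helper are hypothetical:

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL/PDO. nomsing_id is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE nomsing (row_id INTEGER PRIMARY KEY, word TEXT, "
             "postress TEXT, gender TEXT)")
conn.execute("CREATE TABLE nomplu (id INTEGER PRIMARY KEY, word TEXT, postress TEXT, "
             "nomsing_id INTEGER NOT NULL REFERENCES nomsing(row_id))")

def insert_word(conn, singular, plural):
    """Insert the parent row, read back its generated id, then insert the
    child row that references it, all inside one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO nomsing (word, postress, gender) VALUES (?, ?, ?)",
            singular,
        )
        parent_id = cur.lastrowid  # plays the role of lastInsertId()/LAST_INSERT_ID()
        conn.execute(
            "INSERT INTO nomplu (word, postress, nomsing_id) VALUES (?, ?, ?)",
            (*plural, parent_id),
        )
    return parent_id

print(insert_word(conn, ("велосипед", "8", "mask"), ("велосипеды", "2")))  # 1
```

No extra SELECT is needed: the id comes back from the same connection that did the insert.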
See a similar SO question here.
You could add a new column to the nomsing table called 'insert_order' (or similar) with a default value of 0. Then, instead of generating one SQL statement per insert, create a bulk insert statement, e.g.
INSERT INTO nomsing (word, postress, gender, insert_order)
VALUES ('велосипед', '8', 'mask', 1), ('abcd', '9', 'hat', 2), ...
You generate the insert_order number with a counter in your loop, starting at one. Then you can perform one SELECT on the table to get the ids, matched to each row by its insert_order, e.g.
SELECT row_id, insert_order
FROM nomsing
WHERE insert_order > 0
ORDER BY insert_order;
Now that you have all the IDs, you can do a bulk insert for the following queries. At the end of your script, just run an update to reset the insert_order column back to 0:
UPDATE nomsing SET insert_order = 0 WHERE insert_order > 0;
It may seem messy to add an extra column to do this but it will add a significant speed increase over performing one query at a time.
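A sketch of the insert_order idea with Python's sqlite3 as a stand-in. It selects insert_order alongside row_id so each generated id can be matched back to the row it belongs to. The `nomplu.nomsing_id` column and the `bulk_insert` helper are hypothetical, and the sketch assumes no other session runs the same import concurrently:

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nomsing (row_id INTEGER PRIMARY KEY, word TEXT, postress TEXT, "
             "gender TEXT, insert_order INTEGER NOT NULL DEFAULT 0)")
conn.execute("CREATE TABLE nomplu (id INTEGER PRIMARY KEY, word TEXT, nomsing_id INTEGER)")

def bulk_insert(conn, words):
    """words: list of (singular, postress, gender, plural) tuples."""
    if not words:
        return
    with conn:
        # 1. One multi-row INSERT; insert_order is a loop counter starting at 1.
        values = ", ".join(["(?, ?, ?, ?)"] * len(words))
        params = [v for n, (w, p, g, _) in enumerate(words, start=1)
                  for v in (w, p, g, n)]
        conn.execute("INSERT INTO nomsing (word, postress, gender, insert_order) "
                     f"VALUES {values}", params)
        # 2. One SELECT maps each counter value to its generated row_id.
        id_by_order = dict(conn.execute(
            "SELECT insert_order, row_id FROM nomsing WHERE insert_order > 0"))
        # 3. One multi-row INSERT for the child table.
        child_values = ", ".join(["(?, ?)"] * len(words))
        child_params = [v for n, word in enumerate(words, start=1)
                        for v in (word[3], id_by_order[n])]
        conn.execute(f"INSERT INTO nomplu (word, nomsing_id) VALUES {child_values}",
                     child_params)
        # 4. Reset the marker column so the next run starts clean.
        conn.execute("UPDATE nomsing SET insert_order = 0 WHERE insert_order > 0")
```

Three statements plus the reset replace one round trip per row, whatever the number of words.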
I need to update two tables in MySQL with PHP. The second table needs the ID of the row being inserted into the first table.
At the moment I have some PHP code that loops through this process for each of the items in an array:
Check if the record exists by attempting to get its ID.
If the record doesn't exist, insert it and get the last insert ID.
Update the second table using the ID we found as a foreign key.
This is very inefficient as multiple database calls are made. I would rather store the data in two arrays, one for each table, then batch insert them when the loop is done. The problem is I need to get the ID of the row in the first table before I can do this.
This is a problem I come across a lot. What is the most efficient / 'best practice' way of doing this?
Thank you
Create a stored procedure that inserts the whole hierarchy in one server call. Supply all parent-child records as XML, then parse it and insert the records inside the procedure (MySQL does have XML functions such as ExtractValue(), though they are more limited than MS SQL's). This results in the same number of INSERT statements, but they execute on the server side, which should improve performance. E.g.
CALL MySp('<Recs><Parent Name="P1"><Child Name="C1" /><Child Name="C2"/></Parent></Recs>');
I've got a PHP script pulling a file from a server and plugging the values in it into a Database every 4 hours.
This file can, and most likely will, change within the 4 hours (or whatever timeframe I finally choose). It's a list of properties and their owners.
Would it be better to check the file and compare it to each DB entry and update any if they need it, or create a temp table and then compare the two using an SQL query?
Neither.
What I'd personally do is run the INSERT command using ON DUPLICATE KEY UPDATE (assuming your table is properly designed and that you are using at least one piece of information from your file as a UNIQUE key, which, based on your comment, you should be).
Reasons:
Creating a temp table is a hassle.
Comparing is a hassle too. You need to select a record, compare it, update it if it differs, and so on. It's a giant waste of time to compare each piece of info when there's a better way to do it.
It is much easier to just insert everything you find; if a clash occurs, the record exists and most likely needs updating.
That way you take care of everything with one query, and your data integrity is preserved, so you can just keep filling your table with new records or updating existing ones.
I think it would be best to download the file and update the existing table, maybe using REPLACE INTO. "REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted." http://dev.mysql.com/doc/refman/5.0/en/replace.html
Presumably you have a list of columns that will have to match in order for you to decide that the two things match.
If you create a UNIQUE index over those columns, then you can use either INSERT ... ON DUPLICATE KEY UPDATE (manual) or REPLACE INTO ... (manual).
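REPLACE and an upsert are not interchangeable. REPLACE deletes the clashing row and inserts a new one, so an auto-generated id can change and any column you don't supply falls back to its default, while an upsert updates the existing row in place. A sketch with Python's sqlite3 standing in for MySQL (SQLite's REPLACE INTO also deletes and re-inserts; the `properties` table is hypothetical):

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL. Table is hypothetical.
def fresh_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE properties (
        id INTEGER PRIMARY KEY,
        property TEXT NOT NULL UNIQUE,
        owner TEXT,
        note TEXT)""")
    conn.execute("INSERT INTO properties (property, owner, note) "
                 "VALUES ('12 Main St', 'Alice', 'added by hand')")
    return conn

# REPLACE: the old row is deleted and a new one inserted.
a = fresh_db()
a.execute("REPLACE INTO properties (property, owner) VALUES ('12 Main St', 'Bob')")
print(a.execute("SELECT id, owner, note FROM properties").fetchall())
# expect a new id, owner 'Bob', and note None (the old row's note is gone)

# Upsert: the existing row is updated in place; id and note survive.
b = fresh_db()
b.execute("""INSERT INTO properties (property, owner) VALUES ('12 Main St', 'Bob')
    ON CONFLICT (property) DO UPDATE SET owner = excluded.owner""")
print(b.execute("SELECT id, owner, note FROM properties").fetchall())
# [(1, 'Bob', 'added by hand')]
```

If other tables reference the row by id, the upsert is usually the safer choice.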
I am setting up an uploader (using PHP) for my client, where they can select a CSV file (in a predetermined format) on their machine to upload. The CSV will likely have 4000-5000 rows. PHP will process the file by reading each line of the CSV and inserting it directly into the DB table. That part is easy.
However, ideally, before appending this data to the database table, I'd like to check 3 of the columns (A, B, and C) to see whether I already have a matching combination of those 3 fields in the table, and if so, UPDATE that row rather than appending. If I do NOT have a matching combination of those 3 columns, I want to go ahead and INSERT the row, appending the data to the table.
My first thought is that I could make columns A, B, and C a unique index in my table and then just INSERT every row, somehow detect a 'failed' INSERT (due to the restriction of my unique index), and then make the update. This method seems like it could be more efficient than making a separate SELECT query for each row just to see whether I already have a matching combination in my table.
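The insert-first idea from the question can be sketched as below, with Python's sqlite3 standing in for MySQL, where a duplicate on the unique index surfaces as sqlite3.IntegrityError. The table `t` and the helper are hypothetical. It costs a second statement for every duplicate row, which the single-statement approaches in the answers avoid:

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d INTEGER, "
             "UNIQUE (a, b, c))")

def insert_or_update(conn, a, b, c, d):
    """Try the INSERT first; if the UNIQUE (a, b, c) index rejects it,
    fall back to an UPDATE of the existing row."""
    try:
        with conn:
            conn.execute("INSERT INTO t (a, b, c, d) VALUES (?, ?, ?, ?)",
                         (a, b, c, d))
    except sqlite3.IntegrityError:
        # Caution: IntegrityError also covers NOT NULL, CHECK and FK failures,
        # not only the duplicate-key case this fallback is meant for.
        with conn:
            conn.execute("UPDATE t SET d = ? WHERE a = ? AND b = ? AND c = ?",
                         (d, a, b, c))

insert_or_update(conn, 1, 2, 3, 5)
insert_or_update(conn, 1, 2, 3, 9)  # duplicate combination: updates d
insert_or_update(conn, 1, 2, 4, 7)  # new combination: inserted
print(conn.execute("SELECT a, b, c, d FROM t ORDER BY c").fetchall())
# [(1, 2, 3, 9), (1, 2, 4, 7)]
```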
A third approach may be to simply append EVERYTHING, using no MySQL unique index, and then only grab the latest unique combination when the client later queries that table. However, I am trying to avoid having a ton of useless data in that table.
Thoughts on best practices or clever approaches?
If you make the 3 columns a unique index, you can do an INSERT with ON DUPLICATE KEY UPDATE:
INSERT INTO `table` (a,b,c,d,e,f) VALUES (1,2,3,5,6,7)
ON DUPLICATE KEY UPDATE d=5,e=6,f=7;
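For a whole CSV file, the same statement can be run once per row inside one transaction. This sketch uses Python's sqlite3 as a stand-in; SQLite's `excluded.d` refers to the value the row tried to insert, much like `VALUES(d)` in MySQL's ON DUPLICATE KEY UPDATE, so the values don't have to be repeated as literals. The table `t` and the `load_csv` helper are hypothetical, and every field is assumed to be an integer:

```python
import csv
import io
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER,
    d INTEGER, e INTEGER, f INTEGER, UNIQUE (a, b, c))""")

UPSERT = """INSERT INTO t (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)
    ON CONFLICT (a, b, c) DO UPDATE
    SET d = excluded.d, e = excluded.e, f = excluded.f"""

def load_csv(conn, text):
    """Upsert every CSV record; (a, b, c) decides insert versus update."""
    rows = [tuple(int(x) for x in rec) for rec in csv.reader(io.StringIO(text))]
    with conn:  # one transaction for the whole file
        conn.executemany(UPSERT, rows)

load_csv(conn, "1,2,3,5,6,7\n1,2,3,8,8,8\n4,5,6,1,1,1\n")
print(conn.execute("SELECT * FROM t ORDER BY a").fetchall())
# [(1, 2, 3, 8, 8, 8), (4, 5, 6, 1, 1, 1)]
```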
You can read more about this handy technique here in the MySQL manual.
If you add a unique index on the (A, B, C) columns, then you can use REPLACE to do this in one statement:
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted...
I'm trying to keep the database tables for a project I'm working on nice and normalized, but I've run into a problem. I'm trying to figure out how to insert a row into a table and then find out what value the auto_increment id column was set to, so that I can insert additional data into another table. I know there are functions such as mysql_insert_id which "get the ID generated from the previous INSERT operation". However, if I'm not mistaken, mysql_insert_id just returns the ID of the very last operation. So, if the site has enough traffic, this wouldn't necessarily return the ID from the query you want, since another query could have run between when you inserted the row and when you look up the ID. Is this understanding of mysql_insert_id correct? Any suggestions on how to do this are greatly appreciated. Thanks.
LAST_INSERT_ID() has session scope.
It will return the identity value inserted in the current session.
If you don't insert any rows between INSERT and LAST_INSERT_ID, then it will work all right.
Note, though, that for multi-row inserts it will return the identity of the first row inserted, not the last one:
INSERT
INTO mytable (identity_column)
VALUES (NULL);
SELECT LAST_INSERT_ID();
-- returns 1
INSERT
INTO mytable (identity_column)
VALUES (NULL), (NULL);
/* This inserts rows 2 and 3 */
SELECT LAST_INSERT_ID();
-- returns 2
/* But this returns 2, not 3 */
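The per-session behaviour can be seen with two connections. This sketch uses SQLite's last_insert_rowid(), which is tracked per database connection, as a stand-in for MySQL's LAST_INSERT_ID(); the table and file names are hypothetical:

```python
import os
import sqlite3
import tempfile

# Stand-in: SQLite via Python's sqlite3, not MySQL. Names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    s1 = sqlite3.connect(path)  # "session" 1
    s1.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, label TEXT)")
    s1.commit()
    s2 = sqlite3.connect(path)  # "session" 2, same database file

    with s1:
        s1.execute("INSERT INTO mytable (label) VALUES ('from s1')")
    with s2:  # another session inserts after s1 did
        s2.execute("INSERT INTO mytable (label) VALUES ('from s2')")

    # Each connection still reports the id of its own last insert.
    id1 = s1.execute("SELECT last_insert_rowid()").fetchone()[0]
    id2 = s2.execute("SELECT last_insert_rowid()").fetchone()[0]
    print(id1, id2)  # 1 2
    s1.close()
    s2.close()
```

This only illustrates the session-scope point; the multi-row detail above is MySQL's behaviour and is not demonstrated here.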
You could:
A. Assume that won't be a problem and use mysql_insert_id
or
B. Include a timestamp in the row and retrieve the last inserted ID before inserting into another table.
The general solution to this is to do one of two things:
1. Create a procedural query that does the insert, then retrieves the last inserted id (e.g. using LAST_INSERT_ID()) and returns it as output from the query.
2. Do the insert, then do another insert where the id value is something like (select myid from table where somecolumnval='val').
2b. Or make the select explicit and standalone, and then do the other inserts using that value.
The disadvantage of the first is that you have to write a procedure for each of these cases. The disadvantage of the second is that some DB engines don't accept it, it clutters your code, and it can be slow if you have to filter on multiple columns.
This assumes that there may be inserts between your calls that you have no control over. If you have explicit control, one of the other solutions above is probably better.
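The second approach (looking the parent id up with a subquery inside the child INSERT) can be sketched like this, with Python's sqlite3 as a stand-in and hypothetical `parent`/`child` tables. It relies on somecolumnval being unique: with duplicates, MySQL rejects a scalar subquery that returns more than one row, while SQLite silently uses the first row:

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL. Tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (myid INTEGER PRIMARY KEY, somecolumnval TEXT UNIQUE)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, label TEXT)")

with conn:
    conn.execute("INSERT INTO parent (somecolumnval) VALUES (?)", ("val",))
    # Look the parent's id up inline instead of reading it back first.
    conn.execute(
        "INSERT INTO child (parent_id, label) "
        "VALUES ((SELECT myid FROM parent WHERE somecolumnval = ?), ?)",
        ("val", "first child"),
    )

print(conn.execute("SELECT parent_id, label FROM child").fetchall())
# [(1, 'first child')]
```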