I have to insert data into a MySQL database (approx. 200,000 rows). I am a little confused about the insert query. I have two options for inserting data into MySQL:
INSERT INTO paper VALUES('a','b','c','d');
INSERT INTO paper VALUES('e','f','g','h');
INSERT INTO paper VALUES('k','l','m','n');
and
INSERT INTO paper VALUES('a','b','c','d'),('e','f','g','h'),('k','l','m','n');
Which insert query performs faster? What is the difference between the queries?
TL;DR
The second query will be faster. Why? Read below...
Basically, a query is executed in various steps:
Connecting: Both versions of your code have to do this
Sending the query to the server: Applies to both versions, but the second version sends only one query
Parsing the query: Same as above, both versions need their queries to be parsed, but the second version needs only one query to be parsed
Inserting rows: Same in both cases
Inserting indexes: Again, the same in both cases in theory. I'd expect MySQL to update the index once after the bulk insert in the second case, making it potentially faster.
Closing: Same in both cases
Of course, this doesn't tell the whole story: table locks have an impact on performance, and the MySQL config, the use of prepared statements, and transactions might result in better (or worse) performance, too. And of course, the way your DB server is set up makes a difference as well.
So we return to the age-old mantra:
When in doubt: test!
Depending on what your tests tell you, you might want to change some configuration, and test again until you find the best config.
In the case of a big data set, the ideal compromise will probably be a combination of both versions:
LOCK TABLES paper WRITE;
/* chunked insert, with lock, probably add a transaction here, too */
INSERT INTO paper VALUES ('a','b','c','d'), ('e','f','g','h');
INSERT INTO paper VALUES ('i','j','k','l'), ('m','n','o','p');
UNLOCK TABLES;
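To make that compromise concrete, here is a rough PHP sketch (not from the question): chunked multi-row INSERTs wrapped in a transaction, assuming a mysqli connection $db, an InnoDB table, and an array $rows of four-element rows; all of those names are placeholders.
<?php
// Rough sketch only: chunked multi-row INSERTs inside a transaction.
// $db (mysqli connection) and $rows (array of 4-element rows) are assumed.
$chunkSize = 1000;
$db->begin_transaction();
foreach (array_chunk($rows, $chunkSize) as $chunk) {
    $placeholders = [];
    $values = [];
    foreach ($chunk as $row) {
        $placeholders[] = '(?, ?, ?, ?)';
        foreach ($row as $value) {
            $values[] = $value;
        }
    }
    // One INSERT per chunk: INSERT INTO paper VALUES (?,?,?,?), (?,?,?,?), ...
    $stmt = $db->prepare('INSERT INTO paper VALUES ' . implode(', ', $placeholders));
    $stmt->bind_param(str_repeat('s', count($values)), ...$values);
    $stmt->execute();
    $stmt->close();
}
$db->commit();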
Just RTM - MySQL insert speed:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster. See Section 5.1.4, “Server System Variables”.
If you can't use multiple values, then locking is an easy way to speed up the inserts too, as explained on the same page:
To speed up INSERT operations that are performed with multiple statements for nontransactional tables, lock your tables:
LOCK TABLES a WRITE;
INSERT INTO a VALUES (1,23),(2,34),(4,33);
INSERT INTO a VALUES (8,26),(6,29);
/* ... */
UNLOCK TABLES;
This benefits performance because the index buffer is flushed to disk only once, after all INSERT statements have completed. Normally, there would be as many index buffer flushes as there are INSERT statements. Explicit locking statements are not needed if you can insert all rows with a single INSERT.
Read through the entire page for details
I'm not sure which is faster purely on the database side. But when you call the database from your PHP scripts, the second way should be much faster, as you save resources on multiple calls.
Anyway. There is just one way to know. TEST IT.
Related
I'm doing some web crawling and inserting the results into a database. It takes about 2 seconds to scrape but a lot longer to insert. There are two tables: table one is a list of URLs and IDs, table two is a set of tagIds and siteIds.
When I add indexes to the siteIds (which are MD5 hashes of the URL; I did this because it speeds up insertion, as I don't have to query the database for each URL's ID to add the site-tag pairings), the insert speed falls off a cliff after 300,000 or so pages.
Example
Table 1
hash                       | url              | title    | description
sjkjsajwoi20doi2jdo2xq2klm | www.somesite.com | somesite | a site with info
Table 2
site                       | tag
sjkjsajwoi20doi2jdo2xq2klm | xn\zmcbmmndkd2
When I took off the indexes it went much faster and I was able to add about 25 million records in 12 hours, but searching unindexed tags is just impossible.
I'm using PHP and mysqli for this; I'm open to suggestions for a better way to organise this data.
Hmm, this is a bit tricky as the slow-down is due to the overhead of the database needing to update the index data structure when each record is inserted.
How are you accessing this? Using PDO for php? Using raw sql? Prepared statements?
I would also check whether you actually need transactions, as the DB could be implicitly using a transaction, and that could slow down the inserts. For atomic records (records not deleted but collected, or ones WITHOUT normalized foreign-key-dependent records) you don't need this.
You could also test whether a STORED PROCEDURE is more efficient (the DB can possibly optimize if it has a stored procedure), and then just call that stored procedure via PDO. It is also possible that the server / install of the DB has a hardware limitation, either in storage (not on an SSD) or in that the DB operations cannot access the full power of the CPU (low priority in the OS, other large processes making the DB wait for CPU cycles, etc.).
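If you do test the stored-procedure route, a minimal PDO sketch might look like the following. The procedure name insert_site_tag, its parameters, and the connection details are all made up for illustration, not taken from the question.
<?php
// Hypothetical sketch: calling a stored procedure through PDO.
// insert_site_tag and its two parameters are assumptions for illustration.
$pdo = new PDO('mysql:host=localhost;dbname=crawler;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$stmt = $pdo->prepare('CALL insert_site_tag(:site, :tag)');
foreach ($siteTagPairs as $pair) {   // $siteTagPairs: assumed array of [site, tag] pairs
    $stmt->execute([':site' => $pair[0], ':tag' => $pair[1]]);
}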
Can you explain to me why a MongoDB insert is faster than a MySQL insert?
I use PHP 5.6, MongoDB 3.2.12, and MySQL 5.6.25.
Please give a reference (book or journal).
(I reopened because the "dup" MySQL vs MongoDB 1000 reads was about reads, not inserts.)
MySQL can be very slow or very fast at INSERTing. To be fair to MySQL, you should spell out the specifics that lead to it being slow or fast.
MyISAM versus InnoDB -- MyISAM is dying, don't consider it in any comparison. https://www.percona.com/blog/2016/10/11/mysql-8-0-end-myisam/
Batch the rows being inserted. Inserting 100 rows in a single INSERT statement easily runs 10 times as fast as one row at a time.
LOAD DATA INFILE (from a .csv) is even faster; see the sketch after this list. https://dev.mysql.com/doc/refman/5.6/en/load-data.html
Consider InnoDB's transactional flags (e.g. innodb_flush_log_at_trx_commit) to get some more speed.
Secondary indexes have some impact.
Inserting at the "end" of a table is faster than inserting at random locations -- think UUIDs or GUIDs. http://mysql.rjweb.org/doc.php/uuid
If the table is too big to be cached, UUID inserts eventually slow down to 1 row per IOP.
Keep in mind that the only reason for inserting into a table is to later fetch from it. So the fetching patterns have some impact on the insertion optimizations.
Etc, etc.
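As promised above, a minimal sketch of the LOAD DATA route from PHP. The table name and file path are placeholders, and LOCAL requires the local_infile option to be enabled on both the client and the server.
<?php
// Sketch: bulk-loading a CSV with LOAD DATA LOCAL INFILE via mysqli.
// Table name t and the file path are placeholders.
$db = mysqli_init();
$db->options(MYSQLI_OPT_LOCAL_INFILE, true);
$db->real_connect('localhost', 'user', 'pass', 'test');
$db->query("
    LOAD DATA LOCAL INFILE '/tmp/rows.csv'
    INTO TABLE t
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
");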
I am building an application that requires a MySQL table to be emptied and refilled with fresh data every minute. At the same time, it is expected that the table will receive anywhere from 10-15 SELECT statements per second constantly. The SELECT statements should in general be very fast (selecting 10-50 medium length strings every time). A few things I'm worried about:
Is there the potential for a SELECT query to run in between the TRUNCATE and INSERT queries and return 0 rows? Do I need to lock the table when executing the TRUNCATE-INSERT query pair?
Are there any significant performance issues I should worry about regarding this setup?
There most probably is a better way to achieve your goal. But here's a possible answer to your question anyway: you can encapsulate queries that are meant to be executed together in a transaction. Off the top of my head, something like
START TRANSACTION;
TRUNCATE foo;
INSERT INTO foo ...;
COMMIT;
EDIT: The above part is plain wrong, see Philip Devine's comment. TRUNCATE causes an implicit commit in MySQL, so it cannot be rolled back inside a transaction. Thanks.
Regarding the performance question: Repeatedly connecting to the server can be costly. If you have a persistent connection, you should be fine. You can save little bits here and there by executing multiple queries in a batch or using Prepared Statements.
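For reference, a "persistent connection" in PHP is just a connection-string option; a small sketch (credentials and database names are placeholders):
<?php
// mysqli: prefix the host with 'p:' to request a persistent connection.
$mysqli = new mysqli('p:localhost', 'user', 'pass', 'mydb');

// PDO: the equivalent is the ATTR_PERSISTENT option.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', [
    PDO::ATTR_PERSISTENT => true,
]);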
Why do you need to truncate it every minute? Yes, that can result in your users having no rows returned. Just update the rows instead of truncating and inserting.
A second option is to insert the new values into a new table and rename the two tables like so:
RENAME TABLE tbl_name TO new_tbl_name
[, tbl_name2 TO new_tbl_name2]
Then truncate the old table.
That way your users see zero downtime. The truncate in the other answer ignores transactions and happens immediately, so don't do that!!
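A sketch of that swap, assuming a mysqli connection $db and two identically structured tables, live_data (what users read) and staging_data; all of these names are placeholders, not from the question.
<?php
// Each minute: refill the staging table, then swap it in atomically.
$db->query('TRUNCATE TABLE staging_data');
// ... INSERT the fresh rows into staging_data here ...

// All renames inside one RENAME TABLE statement are atomic, so a concurrent
// SELECT always sees either the complete old data or the complete new data,
// never an empty table. The previous minute's rows end up back in
// staging_data and are truncated at the start of the next cycle.
$db->query('RENAME TABLE live_data TO old_data,
                         staging_data TO live_data,
                         old_data TO staging_data');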
I have multiple tables, like table1, table2, table3, etc.
What is required:
1. fetch specific row from table1. (for ex: id = 203)
2. fetch all values related to id 203 from table2 (ex: 1,2,3,4,5,6,7....500)
3. again fetch all values for the IDs from step 2 from table3, table4, etc., which have a foreign key relation to table2 (millions of rows)
4. Build insert statements for all of the above 3 steps from the results.
5. Run the insert queries from step 4 against the respective tables (same table names) in the archive DB. In short: archive some part of the data to the archive DB.
How I am doing:
For each table, whenever I get the rows, I create insert statements and store them in a separate array for each table. Once all values up to step 3 have been fetched, I create the insert statements and store them in the arrays. Then I loop over each separate array and execute those queries against the archive DB. Once the queries have executed successfully, I delete all the fetched rows from the main DB and then commit the transaction.
Result:
So far the above approach has worked very well with a small DB of around 10-20 MB of data.
Issue:
For a larger number of rows (say more than 5 GB), PHP throws a memory-exhausted error while fetching the rows and hence this does not work in production, even though I have increased the memory limit to 3 GB. I don't want to increase it further.
The alternative solution I am thinking of is, instead of using arrays to store the queries, to store these queries in files and then internally use an infile command to execute the queries against the archive DB.
Please suggest how to solve the above issue. Once data is moved to the archive DB, there is also a requirement to move it back to the main DB with similar functionality.
There are two keys to handling large result sets.
The first is to stream the result set row by row. Unless you specify this explicitly, the PHP APIs for MySQL immediately attempt to read the entire result set from the MySQL server into client memory, then navigate through it row by row. If your result set has tens or hundreds of thousands of rows, this can make PHP run out of memory.
If you're using the mysql_ interface, use mysql_unbuffered_query(). You should not be using that interface, though. It's deprecated because, well, it sucks.
If you're using the mysqli_ interface, call mysqli_real_query() instead of mysqli_query(). Then call mysqli_use_result() to initiate retrieval of the result set. You can then fetch each row with one of the fetch() variants. Don't forget to use mysqli_free_result() to close the result set when you have fetched all its rows. mysqli_ has object-oriented methods; you can use those as well.
PDO has a similar way of streaming result sets from server to client.
The second key to handling large result sets is to use a second connection to your MySQL server to perform the INSERT and UPDATE operations so you don't have to accumulate them in memory. The same goes if you choose to write information to a file in the file system: write it out a row at a time so you don't have to hold it in RAM.
The trick is to handle one or a few rows at a time, not tens of thousands.
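To make this concrete, a minimal mysqli sketch of both points; the connection details, table, and column names are placeholders based loosely on the question, a sketch rather than a definitive implementation.
<?php
// Sketch: stream a large result set row by row (unbuffered) on one
// connection, and write each row to the archive DB through a second
// connection, so nothing accumulates in PHP memory.
$mainDb    = new mysqli('localhost', 'user', 'pass', 'main_db');
$archiveDb = new mysqli('localhost', 'user', 'pass', 'archive_db');

$insert = $archiveDb->prepare('INSERT INTO table2 (id, value) VALUES (?, ?)');

// real_query() + use_result() = unbuffered: rows are pulled from the server
// one at a time instead of being copied into PHP memory up front.
$mainDb->real_query('SELECT id, value FROM table2 WHERE parent_id = 203');
$result = $mainDb->use_result();

while ($row = $result->fetch_assoc()) {
    $insert->bind_param('is', $row['id'], $row['value']);
    $insert->execute();
}
$result->free();   // free the unbuffered result before reusing the connection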
It has to be said: many people prefer to use command-line programs written in a number-crunching language like Java, C#, or Perl for this kind of database maintenance.
First of all, let me just say that I'm using the PHP framework Yii, so I'd like to stay within its defined set of SQL statements if possible. I know I could probably create one huge long SQL statement that would do everything, but I'd rather not go there.
OK, imagine I have a table Users and a table FavColors. Then I have a form where users can select their color preferences by checking one or more checkboxes from a large list of possible colors.
Those results are stored as multiple rows in the FavColors table like this (id, user_id, color_id).
Now imagine the user goes in and changes their color preference. In this scenario, what would be the most efficient way to get the new color preferences into the database?
Option 1:
Do a mass delete of all rows where user_id matches
Then do a mass insert of all new rows
Option 2:
Go through each current row to see what's changed, and update accordingly
If more rows need to be inserted, do that.
If rows need to be deleted, do that.
I like option one because it only requires two statements, but something just feels wrong about deleting a row just to potentially put almost the exact same data back in. There's also the issue of making the IDs auto-increment to higher values more quickly, and I don't know if that should be avoided whenever possible.
Option 2 will require a lot more programming work, but would prevent situations where I'd delete a row just to create it again. However, adding more load in PHP may not be worth the decrease in load for MySQL.
Any thoughts? What would you all do?
UPDATE is by far faster. When you UPDATE, the table records are just rewritten with new data; with a DELETE and re-INSERT, all of this work has to be done again from scratch.
When you DELETE, the indexes have to be updated (remember, you delete the whole row, not only the columns you need to modify) and data blocks may be moved (if you hit the PCTFREE limit). Also, deleting and re-adding changes the records' auto_increment IDs, so if those records have relationships, they would be broken or would need updating too. I'd go for UPDATE.
That's why you should prefer INSERT ... ON DUPLICATE KEY UPDATE instead of REPLACE.
The former is an UPDATE operation in the case of a key violation, while the latter is a DELETE / INSERT.
UPDATE: Here's an example:
INSERT INTO t1 (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=c+1;
For more details, read the documentation on INSERT ... ON DUPLICATE KEY UPDATE.
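One hedged way to apply this to the color-preferences question, assuming a unique key on (user_id, color_id), a PDO connection $pdo, and that at least one color is checked; all of these are assumptions for illustration, not part of the answer above.
<?php
// Sketch: upsert the checked colors, then delete the ones that were unchecked.
$userId   = 203;          // placeholder user
$colorIds = [1, 4, 7];    // colors currently checked in the form (assumed non-empty)

$upsert = $pdo->prepare(
    'INSERT INTO FavColors (user_id, color_id) VALUES (:user, :color)
     ON DUPLICATE KEY UPDATE color_id = color_id'   // no-op if the row already exists
);
foreach ($colorIds as $colorId) {
    $upsert->execute([':user' => $userId, ':color' => $colorId]);
}

// Remove previously saved colors that are no longer checked.
$in = implode(',', array_fill(0, count($colorIds), '?'));
$delete = $pdo->prepare("DELETE FROM FavColors WHERE user_id = ? AND color_id NOT IN ($in)");
$delete->execute(array_merge([$userId], $colorIds));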
Philip,
Have you tried using prepared statements? With prepared statements you can prepare one query and call it multiple times with different parameters. At the end of your loop, you will have executed all of them with a minimal amount of network latency. I have used prepared statements with PHP and it works great. A little more confusing than Java prepared statements.
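For what it's worth, here is a minimal mysqli sketch of that pattern: prepare once, execute inside the loop, commit once at the end so all the inserts go in together. The table, column names, $userId, and $colorIds are placeholders, not taken from the question.
<?php
// Sketch: prepare once, execute many times, commit once.
$db   = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt = $db->prepare('INSERT INTO fav_colors (user_id, color_id) VALUES (?, ?)');

$db->begin_transaction();
foreach ($colorIds as $colorId) {        // $userId and $colorIds assumed to come from the form
    $stmt->bind_param('ii', $userId, $colorId);
    $stmt->execute();
}
$db->commit();
$stmt->close();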