My use case:
I have multiple scripts inserting into a table in the order of several inserts per second. I am seeing performance degradation, so I think there would be performance benefits in "batching queries" and inserting several hundred rows every minute or so.
Question:
How would I go about doing this using mysqli? My current code uses a wrapper (pastebin), and looks like:
$array = array(); // BIG ARRAY OF VALUES (more than 100k rows' worth)
foreach ($array as $key => $value) {
    $db->q('INSERT INTO `player_items_attributes` (`column1`, `column2`, `column3`) VALUES (?, ?, ?)', 'iii', $value['test1'], $value['test2'], $value['test3']);
}
Notes:
I looked at using transactions, but it sounds like those would still hit the server, instead of queuing. I would prefer to use a wrapper (feel free to suggest one with similar functionality to what my current one offers), but if not possible I will try to build suggestions into the wrapper I use.
Sources:
Wrapper came from here
Edit:
I am trying to optimize table speed, rather than script speed. The table has more than 35 million rows and a few indexes.
The MySQL INSERT syntax allows for one INSERT query to insert multiple rows, like this:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
where each set of parenthesised values represents another row in your table. So, by working down the array you could create multiple rows in one query.
There is one major limitation: the total size of the query must not exceed the configured limit (max_allowed_packet). For 100k rows you'd probably have to break this down into blocks of, say, 250 rows, reducing your 100k queries to 400. You might be able to go further.
I'm not going to attempt to code this - you'd have to code something and try it in your environment.
Here's a pseudo-code version:
escape entire array // array_walk(), real_escape_string()
block_size = 250; // number of rows to insert per query
current_block = 0;
rows_array = [];
while (next_element <= number of rows) {
    create parenthesised set and push to rows_array // implode()
    current_block++
    next_element++
    if (current_block == block_size) {
        implode rows_array and append to query
        execute query
        set current_block = 0
        reset rows_array
        reset query
    }
}
if (there are any records left over) {
implode rows_array and append to query
execute the query for the last block
}
I can already think of a potentially faster implementation with array_map() - try it.
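As a rough sketch of the pseudo-code above (assuming a plain mysqli connection in $mysqli and the integer columns from the question; the chunk size, table and column names are yours to adjust):
$blockSize = 250; // rows per INSERT; tune this against max_allowed_packet
foreach (array_chunk($array, $blockSize) as $chunk) {
    $rows = array();
    foreach ($chunk as $value) {
        // integer columns, so a cast is enough; use $mysqli->real_escape_string() for string columns
        $rows[] = '(' . (int)$value['test1'] . ',' . (int)$value['test2'] . ',' . (int)$value['test3'] . ')';
    }
    $mysqli->query('INSERT INTO `player_items_attributes` (`column1`, `column2`, `column3`) VALUES ' . implode(',', $rows));
}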
Related
I'm trying to create a script to import about 10M records into a MySQL database.
When I did a loop with single queries, importing 2,000 records took 20 minutes.
So I'm trying to do this with transactions. The problem is that in my loop there are some SELECT queries that need to run first to fetch the values used to build the inserts. Only the last two queries (the INSERT and the UPDATE) could go in a transaction.
Something like this:
foreach($record as $rec) {
//select sth
//do sth with result
//second select sth
//do sth with second result
//prepare values from above results and $rec
// below part I'd like to do with transaction
//insert with new record
//update table
}
I know this is a little messy and not exact, but the real function is more complicated, so I decided to post just a "draft". I need advice, not complete code.
Regards
Transactions are for multiple statements that need to be treated as a single group that either entirely succeeds or entirely fails. It sounds like your issue has a lot more to do with performance than transactions. Unless there is a bit of information that you haven't included that involves groups of statements "which all must succeed at the same time", transactions are just a distraction.
There are a few ways to approach your problem depending on some things that aren't immediately obvious from your post.
-If your data source for the 10M records is a table in the same database that you are going to populate with the new records (via the inserts and updates at the end of your loop), then you might be able to do everything through a single database query. SQL is very expressive, and through joins and some of the built-in functions (SUBSTR(), UPPER(), REVERSE(), CASE...END, etc.) you might be able to do everything you want. This would require reading up on SQL and trying to reframe your goals in terms of set operations.
-If you are inserting records that are sourced from outside the database (like from a file), then I would organize your code like this:
//select sth
//do sth with result
//second select sth
//do sth with second result
//prepare values from above results so that $rec info can be added in later
foreach($record as $rec) {
//construct a big insert statement
}
//insert the new records by running the big insert statement
//update table
The advantage here is that you are only hitting the db with a few queries, instead of a few queries per $rec, so your performance will be better (since db calls have overhead). For 10M rows you may need to break the above up into a few chunks, since there is a limit to how big a single insert can be (see max_allowed_packet). I would suggest breaking the 10M into 5K or 10K chunks by adding another loop around the above that partitions off the chunks from the 10M, as sketched below.
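A rough PDO sketch of that structure, assuming the lookup values can be pulled once up front; the table, column and lookup names here are only placeholders:
// fetch the lookup data once, outside the loop
$lookup = $pdo->query('SELECT code, id FROM lookup_table')->fetchAll(PDO::FETCH_KEY_PAIR);

$chunkSize = 5000; // stay well under max_allowed_packet
foreach (array_chunk($records, $chunkSize) as $chunk) {
    $values = array();
    foreach ($chunk as $rec) {
        $name = $pdo->quote($rec['name']);
        $lookupId = isset($lookup[$rec['code']]) ? (int)$lookup[$rec['code']] : 0;
        $values[] = "($name, $lookupId)";
    }
    // one INSERT per chunk instead of one per record
    $pdo->exec('INSERT INTO target_table (name, lookup_id) VALUES ' . implode(',', $values));
}
// then run the UPDATE once at the end, rather than once per $rec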
A clearer answer could have been given if you added details about your data source, what transformations you want to do on the data, what the purpose of the
//select sth
//do sth with result
//second select sth
//do sth with second result
section is (within the context of how it adds information to your insert statements later), and what the prepare values section of your code does.
I have array like this:
$array = array("AAA,http://aaa.com,bbb,http://bbb.com,ccc,http://ccc.com");
How can I take this array and insert it into the database like this:
No  name  url
1   AAA   http://aaa.com
2   BBB   http://bbb.com
3   CCC   http://ccc.com
Using PHP and MySQL.
Thank you.
Assembling one INSERT statement with multiple rows is much faster in MySQL than one INSERT statement per row.
That said, it sounds like you might be running into string-handling problems in PHP, which is really an algorithm problem, not a language one. Basically, when working with large strings, you want to minimize unnecessary copying. Primarily, this means you want to avoid concatenation. The fastest and most memory-efficient way to build a large string, such as for inserting hundreds of rows at once, is to take advantage of the implode() function and array assignment.
$sql = array();
foreach( $data as $row ) {
$sql[] = '("'.mysql_real_escape_string($row['text']).'", '.$row['category_id'].')';
}
mysql_query('INSERT INTO table (text, category) VALUES '.implode(',', $sql));
The advantage of this approach is that you don't copy and re-copy the SQL statement you've so far assembled with each concatenation; instead, PHP does this once in the implode() statement. This is a big win.
If you have lots of columns to put together, and one or more are very long, you could also build an inner loop to do the same thing and use implode() to assign the values clause to the outer array.
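Applied to the array from the question, a sketch might look like this (it assumes a mysqli connection in $mysqli and a hypothetical `links` table; the single comma-separated string is split into name/url pairs first):
$array = array("AAA,http://aaa.com,bbb,http://bbb.com,ccc,http://ccc.com");
$parts = explode(',', $array[0]);

$rows = array();
foreach (array_chunk($parts, 2) as $pair) {
    list($name, $url) = $pair;
    $rows[] = "('" . $mysqli->real_escape_string(trim($name)) . "', '"
            . $mysqli->real_escape_string(trim($url)) . "')";
}
$mysqli->query('INSERT INTO `links` (`name`, `url`) VALUES ' . implode(',', $rows));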
Please try it, and good luck.
I need to insert 1,000-30,000 rows at a time (19 elements each) into a MySQL table from PHP using PDO prepared statements. I was asking myself whether it would be better to do many separate inserts or one big multi-row insert, like:
INSERT INTO table (a,b,c,...) VALUES (value0a, value0b, value0c,...), (value1a, value1b, value1c,...), ..., (value10000a, value10000b, value10000c,...)
versus executing each insert inside a transaction:
INSERT INTO table (a,b,c,...) VALUES (value0a, value0b, value0c,...);
INSERT INTO table (a,b,c,...) VALUES (value1a, value1b, value1c,...);
INSERT INTO table (a,b,c,...) VALUES (value2a, value2b, value2c,...);
...
INSERT INTO table (a,b,c,...) VALUES (value10000a, value10000b, value10000c,...);
It looks like a multi-row insert is better, so do I have to know how many rows I need to insert, create the (?,?,?,...) placeholders for them, and then bind the values in a loop?
Also, considering that PDOStatement::debugDumpParams() does not show parameter values, how do I echo the whole query as it will be executed?
When you prepare a statement, it is lexed once and the execution plan is ready. What's left is to fill in the data. This is much, much better for several reasons:
Lexing done once
Execution plan is ready
You won't have issues with max_allowed_packet, because if you send bulk inserts and the query gets too large, MySQL can refuse it
It's easier to use such a statement in a loop, provide data and execute
The issue of speed is related to your hard disk. Basically, if you start a transaction, issue 100 (or 200) inserts and then commit the transaction, you will see a huge increase in speed. That's how we achieve fast insert rates: by spending one I/O per commit and using a lot of the disk's bandwidth.
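A minimal sketch of that pattern with PDO (the table and three columns stand in for your 19; the batch size of 200 is arbitrary, per the suggestion above):
$stmt = $pdo->prepare('INSERT INTO table_name (a, b, c) VALUES (?, ?, ?)'); // lexed once

$pdo->beginTransaction();
$i = 0;
foreach ($rows as $row) {
    $stmt->execute(array($row['a'], $row['b'], $row['c']));
    if (++$i % 200 === 0) { // commit every 200 inserts: one flush to disk per batch
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit(); // commit whatever is left in the final batch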
How often do you do this?
If you will do this often (once a day, or several times a week), try to get a mix of "many rows per INSERT" and "many INSERT statements", so you have 5 to 10 inserts in a row.
A faster way to insert data into the table:
INSERT INTO your_tbl
(a,b,c)
VALUES
(value0a,value0b,value0c),
(value1a,value1b,value1c)
just from here: Which is faster: multiple single INSERTs or one multiple-row INSERT?
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
The time required for inserting a row is determined by the following
factors, where the numbers indicate approximate proportions:
Connecting: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting row: (1 × size of row)
Inserting indexes: (1 × number of indexes)
Closing: (1)
From this it should be obvious that sending one large statement will save you an overhead of 7 per INSERT statement. Reading further, the text also says:
If you are inserting many rows from the same client at the same time,
use INSERT statements with multiple VALUES lists to insert several
rows at a time. This is considerably faster (many times faster in some
cases) than using separate single-row INSERT statements.
It is always good to issue as few SQL queries as possible, so doing a single big insert is better: it reduces database interaction and saves a lot of processing time.
I was playing with PDO on PostgreSQL 9.2.4 and was trying to fetch data from a table having millions of rows. My query returns about 100,000 rows.
I do not use any of PDOStatement's fetch functions; I simply use the result from the PDO object itself and loop through it.
But it is getting slower and slower over time. At the beginning it was fetching about 200 rows per second, but the closer it gets to the end, the slower it becomes. Now, at row 30,000, it fetches only 1 row per second. Why is it getting slower?
I do this, it's pretty simple:
$dbh = new PDO("pgsql...");
$sql = "SELECT x, y FROM point WHERE name is NOT NULL and place IN ('area1', 'area2')";
$res = $dbh->query($sql);
$ins_sql = "INSERT INTO mypoints (x, y) VALUES ";
$ins_vals = [];
$ins_placeholders = [];
foreach($res as $row) {
$ins_placeholders[] = "(?,?)";
$ins_vals = array_merge($ins_vals, [$row['x'], $row['y']]);
printCounter();
}
// now build up one insert query using placeholders and values,
// to insert all of them in one shot into table mypoints
The printCounter function simply increments an int variable and prints it, so I can see how many rows have already been put into that array before I create my insert statement out of it. I use one-shot inserts to speed things up, which is better than doing 100,000 separate inserts.
But that foreach loop is getting slower over time. How can I increase the speed?
Is there a difference between fetch() and the simple loop over the PDOStatement with foreach?
When I start this PHP script, the query itself takes about 5-10 seconds, so this has nothing to do with how the table is set up or whether I need indexes.
I have other tables returning 1 million rows, and I'm not sure what the best way to fetch them is. I can raise PHP's memory_limit if needed, so the most important thing for me is SPEED.
Appreciate any help.
It's not likely that the slowness is related to the database, because after the $dbh->query() call, the query is finished and the resulting rows are all in memory (they are not in PHP variables yet, but they're in memory accessible at the pgsql module level).
The more likely culprit is the array_merge operation. The array becomes larger at every loop iteration, and the operation recreates the entire array each time.
You may want to do instead:
$ins_vals[] = [$row['x'], $row['y']];
Although personally, when concerned with speed, I'd use an even simpler flat structure:
$ins_vals[] = $x;
$ins_vals[] = $y;
Another unrelated point is that it seems to build a query with a huge number of placeholders, which is not how placeholders are normally used. To send large numbers of values to the server, the efficient way is to use COPY, possibly into a temporary table followed by server-side merge operations if it's not a plain insertion.
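For the COPY route, a sketch using the pgsql PDO driver's bulk-copy helper (this assumes the columns of mypoints are x and y, in that order):
$lines = array();
foreach ($dbh->query($sql) as $row) {
    $lines[] = $row['x'] . "\t" . $row['y']; // one tab-separated line per row
}
// pgsqlCopyFromArray() sends the rows via COPY, which is far cheaper than a
// statement carrying hundreds of thousands of placeholders
$dbh->pgsqlCopyFromArray('mypoints', $lines);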
I don't know why, but using the fetch() method instead and filling $ins_vals like this:
$ins_vals[] = $x;
$ins_vals[] = $y;
and using beginTransaction and commit now makes my script unbelievably fast.
Now it takes only about 1 minute to add my 100,000 points.
I think both array_merge and that "ugly" looping through the PDOStatement slowed down my script.
And why the heck did someone downvote my question? Are you punishing me for my lack of knowledge? Thanks.
OK, I wrote a class where I set the SQL and then add the values for each row with a method call. Whenever it reaches a specific limit, it starts a transaction, prepares the statement with as many placeholders as there are values, executes it with the array holding all the values, and then commits.
This seems to be fast enough; at least it doesn't get slower anymore.
For some reason it's faster to add values in a flat structure, as Daniel suggested. That's enough for me.
Sometimes it's good to have a function do one step of the insertion, because when the function returns, all the memory used inside it is freed, so your memory usage stays low.
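A rough sketch of such a class (the names are illustrative, not the poster's actual code, and it assumes every row has the same number of columns):
class BatchInserter
{
    private $pdo;
    private $sql;       // e.g. 'INSERT INTO mypoints (x, y) VALUES '
    private $rowWidth;  // columns per row
    private $limit;     // rows per flush
    private $values = array();
    private $rowCount = 0;

    public function __construct(PDO $pdo, $sql, $rowWidth, $limit = 1000)
    {
        $this->pdo = $pdo;
        $this->sql = $sql;
        $this->rowWidth = $rowWidth;
        $this->limit = $limit;
    }

    public function addRow(array $row)
    {
        foreach ($row as $v) {
            $this->values[] = $v; // flat structure, no array_merge
        }
        if (++$this->rowCount >= $this->limit) {
            $this->flush();
        }
    }

    public function flush()
    {
        if ($this->rowCount === 0) {
            return;
        }
        // build as many (?,?,...) groups as rows collected so far
        $rowPlaceholder = '(' . rtrim(str_repeat('?,', $this->rowWidth), ',') . ')';
        $placeholders = rtrim(str_repeat($rowPlaceholder . ',', $this->rowCount), ',');
        $this->pdo->beginTransaction();
        $stmt = $this->pdo->prepare($this->sql . $placeholders);
        $stmt->execute($this->values);
        $this->pdo->commit();
        $this->values = array();
        $this->rowCount = 0;
    }
}
// usage: $batch = new BatchInserter($dbh, 'INSERT INTO mypoints (x, y) VALUES ', 2);
//        $batch->addRow(array($row['x'], $row['y'])); ... and $batch->flush(); at the end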
I'm importing a CSV file into a MySQL DB. I haven't looked into bulk insert yet, but I was wondering: is it more efficient to construct one massive INSERT statement (using PHP) by looping through the values, or to insert each CSV row individually?
Inserting in bulk is much faster. I'll typically do something like this, which imports data 100 records at a time (the 100-record batch size is arbitrary).
$a_query_inserts = array();
$i_progress = 0;
foreach( $results as $a_row ) {
$i_progress++;
    // Values are interpolated unquoted here, which assumes numeric data; quote and escape string columns.
    $a_query_inserts[] = "({$a_row['Column1']}, {$a_row['Column2']}, {$a_row['Column3']})";
    if( count($a_query_inserts) >= 100 || $i_progress >= $results->rowCount() ) {
$s_query = sprintf("INSERT INTO Table
(Column1,
Column2,
Column3)
VALUES
%s",
implode(', ', $a_query_inserts)
);
db::getInstance()->query($s_query);
// Reset batch
$a_query_inserts = array();
}
}
There is also a way to load the file directly into the database, for example with MySQL's LOAD DATA INFILE.
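A hedged sketch of that route via mysqli (the path, table and column names are placeholders, and LOCAL requires local_infile to be enabled on both client and server):
$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true); // allow LOCAL before connecting
$mysqli->real_connect('localhost', 'user', 'pass', 'mydb');

$mysqli->query("LOAD DATA LOCAL INFILE '/path/to/file.csv'
    INTO TABLE Table
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (Column1, Column2, Column3)");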
I don't know the specifics of how PHP makes connections to MySQL, but every insert request is going to have some amount of overhead beyond the data for the insert itself. Therefore I would imagine a bulk insert would be much more efficient than repeated database calls.
It is difficult to give an answer without knowing at least two more elements:
1) Is your DB running on the same server where the PHP code runs?
2) How "big" is the file? I.e. average 20 csv records? 200? 20000?
In general looping through the csv file and firing a insert statement for each row (please use prepared statements, though, or your DB will spend time parsing the same string every single time) would be the more "traditional" approach and would be efficient enough unless you have a really slow connectiong between PHP and the DB.
Even in that case, if the CSV file runs to something like the 20,000-record end of that range, building it all into one statement would probably start running into the maximum statement length the SQL parser will accept (max_allowed_packet).