I'm a beginner in PHP/SQL (6 months), and I noticed that transactions are faster than plain "insert into" statements.
When I operate on huge amounts of data (in the range of 10-500k inserts), my script is slow.
The goal: I want the fastest way to save data into an SQLite *.db file.
My script looks like that:
$array = array(
    'ronaldo' => 'gay',
    'mario basler' => 'cool guy'
);
$db = new SQLite3('file.db');
$db->query('BEGIN');
foreach ($array as $kee => $val) {
    $db->query("INSERT INTO \"table\" (name, personality) VALUES ('$kee', '$val')");
}
$db->query('COMMIT');
Is this approach wrong?
What you are doing is absolutely correct, and it will speed up your interaction with the database. In SQLite, any command that changes the database automatically starts a transaction if one is not already in effect.
So if you do many inserts without explicitly starting a transaction, a separate transaction is created and committed for each operation. By starting one transaction yourself, you do all the operations in bulk under a single commit (see the sketch below).
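As a hedged sketch of that advice, combining one explicit transaction with a prepared statement (the table name people and the sample data are placeholders, not from the question):
// Minimal sketch: one explicit transaction around all inserts, plus a
// prepared statement so the INSERT is only parsed once.
$rows = array(
    'alice' => 'friendly',
    'bob'   => 'grumpy',
);
$db = new SQLite3('file.db');
$db->exec('BEGIN');
$stmt = $db->prepare('INSERT INTO people (name, personality) VALUES (:name, :personality)');
foreach ($rows as $name => $personality) {
    $stmt->bindValue(':name', $name, SQLITE3_TEXT);
    $stmt->bindValue(':personality', $personality, SQLITE3_TEXT);
    $stmt->execute();
    $stmt->reset();   // clear the statement for the next bind/execute cycle
}
$db->exec('COMMIT');
Binding values also avoids the SQL injection risk of interpolating $kee and $val directly into the query string.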
How to insert 40000 records fast into an sqlite database in an iPad
https://www.sqlite.org/lang_transaction.html
Related
I'm using Laravel 5.7 to fetch large amounts of data (around 500k rows) from an API server and insert it into a table (call it Table A) quite frequently (at least every six hours, 24/7) - however, it's enough to insert only the changes the next time we insert (but at least 60-70% of the items will change). So this table will quickly have tens of millions of rows.
I came up with the idea to make a helper table (call it Table B) to store all the new data into it. Before inserting everything into Table A, I want to compare it to the previous data (with Laravel, PHP) from Table B - so I will only insert the records that need to be updated. Again it will usually be around 60-70% of the records.
My first question: is the above-mentioned way the preferred way of doing it in this situation? (Obviously I want to make it happen as fast as possible.) I assume that searching for and updating the records in the table would take a lot more time and would keep the table busy / lock it. Is there a better way to achieve the same thing (meaning, to update the records in the DB)?
The second issue I'm facing is the slow insert times. Right now I'm using a local environment (16GB RAM, I7-6920HQ CPU) and MySQL is inserting the rows very slowly (about 30-40 records at a time). The size of one row is around 50 bytes.
I know it can be made a lot faster by fiddling around with InnoDB's settings. However, I'd also like to think that I can do something on Laravel's side to improve performance.
Right now my Laravel code looks like this (only inserting 1 record at a time):
foreach ($response as $key => $value)
{
    DB::table('table_a')->insert([
        'test1' => $value['test1'],
        'test2' => $value['test2'],
        'test3' => $value['test3'],
        'test4' => $value['test4'],
        'test5' => $value['test5'],
    ]);
}
$response is a type of array.
So my second question: is there any way to increase the insert speed to something like 50k records/second, both at the Laravel application layer (by doing batch inserts) and at the MySQL InnoDB level (by changing the config)?
Current InnoDB settings:
innodb_buffer_pool_size = 256M
innodb_log_file_size = 256M
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = normal
innodb_use_native_aio = true
MySQL version is 5.7.21.
If I forgot to tell/add anything, please let me know in a comment and I will do it quickly.
Edit 1:
The server that I'm planning to use will have SSD on it - if that makes any difference. I assume MySQL inserts will still count as I/O.
Disable autocommit and manually commit at end of insertion
According to the MySQL 8.0 docs (8.5.5 Bulk Data Loading for InnoDB Tables), you can increase INSERT speed by turning off autocommit:
When importing data into InnoDB, turn off autocommit mode, because it performs a log flush to disk for every insert. To disable autocommit during your import operation, surround it with SET autocommit and COMMIT statements:
SET autocommit=0;
... SQL import statements ...
COMMIT;
Another way to do it in Laravel is using database transactions:
DB::beginTransaction();
// Your inserts here
DB::commit();
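Laravel also offers a closure form; as far as I know it commits when the closure finishes and rolls back automatically if an exception is thrown inside it. A minimal sketch, reusing the $response array from the question:
DB::transaction(function () use ($response) {
    foreach ($response as $value) {
        DB::table('table_a')->insert([
            'test1' => $value['test1'],
            'test2' => $value['test2'],
            // ... remaining columns as in the question
        ]);
    }
});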
Use INSERT with multiple VALUES
Also, according to the MySQL 8.0 docs (8.2.5.1 Optimizing INSERT Statements), you can optimize INSERT speed by using multiple value lists in a single INSERT statement.
To do it with Laravel, you can just pass an array of values to the insert() method:
DB::table('your_table')->insert([
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
]);
According to the docs, it can be many times faster.
Read the docs
Both MySQL docs links that I put on this post have tons of tips on increasing INSERT speed.
Avoid using Laravel/PHP for inserting it
If your data source is (or can be) a CSV file, you can run it a lot faster using mysqlimport to import the data.
Using PHP and Laravel to import data from a CSV file adds overhead, unless you need to do some data processing before inserting.
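As a rough, hypothetical example (mysqlimport derives the target table from the file's base name, so the CSV would have to be named table_a.csv; the user, password and path below are placeholders):
// Hypothetical invocation; adjust credentials, delimiter and path to your setup.
exec("mysqlimport --local --user=youruser --password=yourpass "
    . "--fields-terminated-by=',' your_database /path/to/table_a.csv");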
Thanks @Namoshek, I had the same problem. The solution looks like this:
$users = array_chunk($data, 500, true);
foreach ($users as $key => $user) {
    Model::insert($user);
}
Depending on your data, you can also make use of array_push() and then insert.
Don't call insert() inside a foreach(), because it will execute n queries against the database when you have n rows of data.
First create an array of data matching the database column names, then pass that array to the insert() method.
This will execute only one query against the database, regardless of how many rows you have.
This is way, way faster.
$data_to_insert = [];

foreach ($response as $key => $value)
{
    array_push($data_to_insert, [
        'test1' => $value['test1'],
        'test2' => $value['test2'],
        'test3' => $value['test3'],
        'test4' => $value['test4'],
        'test5' => $value['test5'],
    ]);
}

DB::table('table_a')->insert($data_to_insert);
You need to do a multi-row insert, but you also need to chunk your inserts so you don't exceed your DB limits.
You can do this by chunking your array:
foreach (array_chunk($response, 1000) as $responseChunk)
{
    $insertableArray = [];
    foreach ($responseChunk as $value) {
        $insertableArray[] = [
            'test1' => $value['test1'],
            'test2' => $value['test2'],
            'test3' => $value['test3'],
            'test4' => $value['test4'],
            'test5' => $value['test5'],
        ];
    }
    DB::table('table_a')->insert($insertableArray);
}
You can increase the chunk size from 1000 until you approach your DB configuration limit. Make sure to leave some safety margin (e.g. 0.6 times your DB limit).
You can't go any faster than this using Laravel.
We've just built a system that rolls up its data at midnight. It must iterate through several combinations of tables in order to roll up the data it needs. Unfortunately the UPDATE queries are taking forever. We have 1/1000th of our forecasted user base, and it already takes 28 minutes to roll up our data daily with just our beta users.
Since the main lag is in the UPDATE queries, it may be hard to delegate servers to handle the data processing. What are some other options for optimizing millions of UPDATE queries? Is my scaling issue in the code below?
$sql = "SELECT ab_id, persistence, count(*) as no_x FROM $query_table ftbl
WHERE ftbl.$query_col > '$date_before' AND ftbl.$query_col <= '$date_end'
GROUP BY ab_id, persistence";
$data_list = DatabaseManager::getResults($sql);
if (isset($data_list)){
foreach($data_list as $data){
$ab_id = $data['ab_id'];
$no_x = $data['no_x'];
$measure = $data['persistence'];
$sql = "SELECT ab_id FROM $rollup_table WHERE ab_id = $ab_id AND rollup_key = '$measure' AND rollup_date = '$day_date'";
if (DatabaseManager::getVar($sql)){
$sql = "UPDATE $rollup_table SET $rollup_col = $no_x WHERE ab_id = $ab_id AND rollup_key = '$measure' AND rollup_date = '$day_date'";
DatabaseManager::update($sql);
} else {
$sql = "INSERT INTO $rollup_table (ab_id, rollup_key, $rollup_col, rollup_date) VALUES ($ab_id, '$measure', $no_x, '$day_date')";
DatabaseManager::insert($sql);
}
}
}
When addressing SQL scaling issues, it is always best to benchmark your problematic SQL. Benchmarking at the PHP level is fine in this case, since you're running your queries from PHP.
If your first query could potentially return millions of records, you may be better served running that query as a MySQL stored procedure. That will minimize the amount of data that has to be transferred between database server and PHP application server. Even if both are the same machine, you can still realize a significant performance improvement.
Some questions to consider that may help to resolve your issue follow:
How long do your SELECT queries take to process without the UPDATE or INSERT statements?
What is the percentage breakdown of your query time between the SELECTs on one hand and the INSERTs and UPDATEs on the other? It will be easier to help identify solutions with that info.
Is it possible that there are larger bottlenecks elsewhere, and that resolving those would fix your performance issues?
Is it necessary to iterate through your data at the PHP source-code level rather than the MySQL stored procedure level?
Is there a necessity to iterate procedurally through your records, or is it possible to accomplish the same thing through set-based operations?
Does your rollup_table have an index that covers the columns from the UPDATE query?
Also, the SELECT query run right before your UPDATE query appears to have an identical WHERE clause; that is a redundancy. If you can get away with evaluating that WHERE clause only once, you will shave a lot of time off your largest bottleneck.
If you're unfamiliar with writing MySQL stored procedures, the process is quite simple. See http://www.mysqltutorial.org/getting-started-with-mysql-stored-procedures.aspx for an example. MySQL has good documentation on this as well. A stored procedure is a program that runs within the MySQL database process, which may help to improve performance when dealing with queries that potentially return millions of rows.
Set-based database operations are often faster than procedural operations. SQL is a set-based language. You can update all rows in a database table with a single UPDATE statement, i.e. UPDATE customers SET total_owing_to_us = 1000000 updates all rows in the customers table, without the need to create a programmatic loop like you've created in your sample code. If you have 100,000,000 customer entries, the set-based update will be significantly faster than the procedural update. There are lots of useful resources online that you can read up about this. Here's a SO link to get started: Why are relational set-based queries better than cursors?.
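As a hedged sketch of what a set-based version of your loop could look like (this assumes $rollup_table has a UNIQUE key on (ab_id, rollup_key, rollup_date), which your code does not confirm), MySQL's INSERT ... SELECT ... ON DUPLICATE KEY UPDATE can collapse the per-row SELECT/UPDATE/INSERT into one statement:
// Sketch only: requires a UNIQUE key on (ab_id, rollup_key, rollup_date)
// so that ON DUPLICATE KEY UPDATE can find the existing rollup rows.
$sql = "INSERT INTO $rollup_table (ab_id, rollup_key, $rollup_col, rollup_date)
        SELECT ab_id, persistence, COUNT(*), '$day_date'
        FROM $query_table
        WHERE $query_col > '$date_before' AND $query_col <= '$date_end'
        GROUP BY ab_id, persistence
        ON DUPLICATE KEY UPDATE $rollup_col = VALUES($rollup_col)";
DatabaseManager::update($sql);
This moves the whole rollup into the database server, so no result set has to travel back to PHP at all.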
It seems like you are doing one insert or update at a time. Have you tried checking how much faster it would be to do one big insert or update, or to batch the queries as much as possible? Here is an example: http://www.stackoverflow.com/questions/3432/multiple-updates-in-mysql
I'm importing a CSV file into a MySQL DB. I haven't looked into bulk insert yet, but I was wondering: is it more efficient to construct a massive INSERT statement (using PHP) by looping through the values, or to do an individual insert for each CSV row?
Inserting in bulk is much faster. I'll typically do something like this, which imports data 100 records at a time (the 100-record batch size is arbitrary).
$a_query_inserts = array();
$i_progress = 0;

foreach ($results as $a_row) {
    $i_progress++;
    // Note: real values should be quoted/escaped appropriately before being concatenated.
    $a_query_inserts[] = "({$a_row['Column1']}, {$a_row['Column2']}, {$a_row['Column3']})";

    if (count($a_query_inserts) >= 100 || $i_progress >= $results->rowCount()) {
        $s_query = sprintf("INSERT INTO Table
            (Column1,
            Column2,
            Column3)
            VALUES
            %s",
            implode(', ', $a_query_inserts)
        );
        db::getInstance()->query($s_query);

        // Reset batch
        $a_query_inserts = array();
    }
}
There is also a way to load the file directly into the database.
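For completeness, a hedged sketch of that direct-load route using LOAD DATA (the path, table and column names are placeholders, LOCAL must be permitted by both client and server, and db::getInstance() is the same helper used above):
// Sketch: let MySQL parse the CSV itself instead of building INSERTs in PHP.
$s_query = "LOAD DATA LOCAL INFILE '/path/to/import.csv'
            INTO TABLE Table
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES
            (Column1, Column2, Column3)";
db::getInstance()->query($s_query);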
I don't know the specifics of how PHP makes connections to MySQL, but every insert request is going to have some amount of overhead beyond the data for the insert itself. Therefore I would imagine a bulk insert would be much more efficient than repeated database calls.
It is difficult to give an answer without knowing at least two more elements:
1) Is your DB running on the same server where the PHP code runs?
2) How "big" is the file? I.e. average 20 csv records? 200? 20000?
In general, looping through the CSV file and firing an INSERT statement for each row (please use prepared statements, though, or your DB will spend time parsing the same string every single time) would be the more "traditional" approach, and it would be efficient enough unless you have a really slow connection between PHP and the DB.
Even in that case, if the CSV file is more than 20 records long, you would probably start having problems with maximum statement length from the SQL parser.
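A hedged sketch of the prepared-statement route with PDO (the DSN, credentials, table and column names are placeholders, and $handle is assumed to be the CSV handle returned by fopen()):
// Prepare once, execute per row; a surrounding transaction avoids a commit per insert.
$pdo  = new PDO('mysql:host=localhost;dbname=your_db', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO your_table (col1, col2, col3) VALUES (?, ?, ?)');
$pdo->beginTransaction();
while (($row = fgetcsv($handle, 1000, ',')) !== false) {
    $stmt->execute(array($row[0], $row[1], $row[2]));
}
$pdo->commit();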
I currently have some code which needs to perform multiple updates per user for thousands of users, incrementing a counter depending on an action they've taken in order to track what actions are being performed. Each action consists of subactions which need to have the count updated too. These need to be tracked by day.
So I am storing "action":"actionName", "day":day, "count":count for each action per day (e.g. incoming from an outside web page, start game, stop game by exiting; concatenated with the game name for a lot of games).
Each day I get a few thousand rows (one per unique action) added which are updated a few hundred thousand times each day to increase the count.
The relevant code is as follows (creating array of actions not included).
$m = new Mongo();
$db = $m->actionsDB;
$collection = $db->action_count;
foreach ($arr as $action) {
    $collection->update(
        array("action" => $action, "day" => $day),
        array('$inc' => array("count" => 1)),
        array("upsert" => true)
    );
}
$collection->ensureIndex(array("action" => 1, "day" => -1));
An example of the series of updates made on an action and subactions would be:
startGame, 20110417;
startGameZork, 20110417;
startGameZorkWindows, 20110417
The problem seems to be that with this code running on the server, mongo commands in the shell get queued up.
Currently I'm unsure as to why; I guess there may be a performance issue with so many updates per second.
What I am wondering is how I can increase performance. I'm pretty new to Mongo, so I'm not entirely sure what options are available. I looked at PHP's batchInsert, but I can't see any mention of a batchUpdate (so that instead of updating one by one, I could build an array holding all the data I currently update and then do a batchUpdate in a single trip to the DB).
Mongo driver version is 1.2.0, so persistent connections are by default.
Edit: db.serverStatus() before, during and after on ~1600 updates per second (30 seconds). Test Data
There is no built-in batching for updates/upserts. You can only limit the docs to be updated by adjusting your query expression and adding some further filter for "emulating" a batch somehow. MongoDB won't help you here. Updates/Upserts are one or all.
If you have a chance to store your data in a file (JSON or CSV), you could try to insert the data using the command-line mongoimport utility.
In this way you can use the --upsert flag to update documents that already exist and insert the new ones.
For example from PHP:
exec("mongoimport --db <bdname> --collection <collection_name> --jsonArray --upsert --file $data_file");
I am building a PHP web application that lets a user upload an MS Access database (CSV export), which is then translated and migrated into a MySQL database.
The MS Access database consists of one table called t_product of 100k rows. This table is not designed well. As an example, the following query:
SELECT part_number, model_number FROM t_product
will return:
part_number    model_number
100            AX1000, AX1001, AX1002
101            CZ10, CZ220, MB100
As you can see, the model numbers are listed as comma-separated values instead of individual records in another table. There are many more issues of this nature. I'm writing a script to clean this data before importing it into the MySQL database. The script will also map the existing Access columns to a properly designed relational database.
My issue is that my script takes too long to complete. Here's simplified code to explain what I'm doing:
$handle = fopen("MSAccess.csv", "r");
// get each row from the csv
while ($data=fgetcsv($handle, 1000, ","))
{
mysql_query("INSERT INTO t_product (col1, col2 etc...) values ($data[0], $data[1], etc...");
$prodId = mysql_last_insert_id();
// using model as an example, there are other columns
// with csv values that need to be broken up
$arrModel = explode(',', $data[2]);
foreach ($arrModel as $modelNumber)
    mysql_query("INSERT INTO t_model (product_id, col1, col2 etc...) values ($prodId, '$modelNumber', etc...)");
}
The problem here is that each while-loop iteration makes a tremendous number of calls to the database. For every product record, I have to insert N model numbers, Y part numbers, X serial numbers, etc.
I started another approach where I stored the whole CSV in an array. I then build one batch query like:
$sql = "INSERT INTO t_product (col1, col2, etc...) values ";
foreach ($arrParam as $val)
    $sql .= " ($val[0], $val[1], $val[2]),";
But I ran into excessive memory errors with this approach. I increased the max memory limit to 64M and I'm still running out of memory.
What is the best way to tackle this problem?
Maybe I should write all my queries to a *.sql file first, then import the *.sql file into the mysql database?
This may be entirely not the direction you want to go, but you can generate the MySQL creation script directly from MS Access with the free MySQL Migration Toolkit
Perhaps you could allow the user to upload the Access db, and then have your PHP script call the Migration toolkit?
If you're going to try optimizing the code you already have, I would try aggregating the INSERTs to see if that helps. This should be easy to add to your code. Something like this (C# pseudocode; a hedged PHP sketch follows it):
int flushCount = 0;
while (!done)
{
// Build next query, concatenate to last set of queries
if (++flushCount == 5)
{
// Flush queries to database
// Reset query string to empty
flushCount = 0;
}
}
// Flush remaining queries to the database
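Translated into a hedged PHP sketch of the same idea for your loop (column lists are placeholders, and real values should still be escaped; it flushes a multi-row INSERT every 500 products):
$values = array();
while ($data = fgetcsv($handle, 1000, ",")) {
    // Collect one value tuple per product; mysql_real_escape_string guards the raw CSV data.
    $values[] = "('" . mysql_real_escape_string($data[0]) . "', '"
                     . mysql_real_escape_string($data[1]) . "')";
    if (count($values) >= 500) {
        mysql_query("INSERT INTO t_product (col1, col2) VALUES " . implode(', ', $values));
        $values = array();  // reset the batch
    }
}
if (count($values) > 0) {
    mysql_query("INSERT INTO t_product (col1, col2) VALUES " . implode(', ', $values));
}
Note that batching t_product this way means you no longer get mysql_insert_id() back per row for the t_model inserts, so you would need another way to link products to models (for example, inserting all products first and then selecting their ids).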
I decided to write all my queries into a .sql file. This gave me the opportunity to normalize the CSV file into a proper relational database. Afterwards, my PHP script called exec("mysql -h dbserver.com -u myuser -pmypass dbname < db.sql");
This solved my memory problems, and it was much faster than running multiple queries from PHP.