I'm using Laravel 5.7 to fetch large amounts of data (around 500k rows) from an API server and insert it into a table (call it Table A) quite frequently (at least every six hours, 24/7) - however, it's enough to insert only the changes the next time we insert (but at least 60-70% of the items will change). So this table will quickly have tens of millions of rows.
I came up with the idea to make a helper table (call it Table B) to store all the new data into it. Before inserting everything into Table A, I want to compare it to the previous data (with Laravel, PHP) from Table B - so I will only insert the records that need to be updated. Again it will usually be around 60-70% of the records.
My first question is whether the approach described above is the preferred way of doing this (obviously I want it to be as fast as possible). I assume that searching for and updating the records in the table would take a lot more time and would keep the table busy or lock it. Is there a better way to achieve the same result (meaning, to update the records in the DB)?
The second issue I'm facing is the slow insert times. Right now I'm using a local environment (16GB RAM, I7-6920HQ CPU) and MySQL is inserting the rows very slowly (about 30-40 records at a time). The size of one row is around 50 bytes.
I know it can be made a lot faster by fiddling around with InnoDB's settings. However, I'd also like to think that I can do something on Laravel's side to improve performance.
Right now my Laravel code looks like this (only inserting 1 record at a time):
foreach ($response as $key => $value)
{
    DB::table('table_a')
        ->insert(
            [
                'test1' => $value['test1'],
                'test2' => $value['test2'],
                'test3' => $value['test3'],
                'test4' => $value['test4'],
                'test5' => $value['test5'],
            ]);
}
$response is a type of array.
So my second question: is there any way to increase the insert speed to something like 50k records/second, both at the Laravel application layer (by doing batch inserts) and at the MySQL InnoDB level (by changing the config)?
Current InnoDB settings:
innodb_buffer_pool_size = 256M
innodb_log_file_size = 256M
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = normal
innodb_use_native_aio = true
MySQL version is 5.7.21.
If I forgot to tell/add anything, please let me know in a comment and I will do it quickly.
Edit 1:
The server that I'm planning to use will have SSD on it - if that makes any difference. I assume MySQL inserts will still count as I/O.
Disable autocommit and manually commit at end of insertion
According to the MySQL 8.0 docs (8.5.5 Bulk Data Loading for InnoDB Tables), you can increase INSERT speed by turning off autocommit:
When importing data into InnoDB, turn off autocommit mode, because it performs a log flush to disk for every insert. To disable autocommit during your import operation, surround it with SET autocommit and COMMIT statements:
SET autocommit=0;
... SQL import statements ...
COMMIT;
Another way to do this in Laravel is to use database transactions:
DB::beginTransaction();
// Your inserts here
DB::commit();
Use INSERT with multiple VALUES
Also according to MySQL 8.0 docs (8.2.5.1 Optimizing INSERT Statements) you can optimize INSERT speed by using multiple VALUES on a single insert statement.
To do it with Laravel, you can just pass an array of values to the insert() method:
DB::table('your_table')->insert([
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
]);
According to the docs, it can be many times faster.
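To make the difference concrete, here is a minimal sketch of the kind of statement a multi-row insert boils down to: a single INSERT with one placeholder group per row and a flat list of bindings. The helper name buildMultiRowInsert is hypothetical, and this is plain PHP for illustration only, not Laravel's actual implementation.

```php
<?php
// Hypothetical helper, for illustration only (not part of Laravel).
// Builds one multi-row INSERT with "?" placeholders plus a flat bindings array.
function buildMultiRowInsert(string $table, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert.');
    }
    // Column names come from the first row; every row is assumed to have the same keys.
    $columns = array_keys(reset($rows));
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO `%s` (`%s`) VALUES %s',
        $table,
        implode('`, `', $columns),
        implode(', ', array_fill(0, count($rows), $group))
    );
    // Flatten the row values in column order so they line up with the placeholders.
    $bindings = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $bindings[] = $row[$column];
        }
    }
    return [$sql, $bindings];
}

[$sql, $bindings] = buildMultiRowInsert('your_table', [
    ['column_a' => 'a1', 'column_b' => 'b1'],
    ['column_a' => 'a2', 'column_b' => 'b2'],
]);
```

Two rows produce one statement and one round trip instead of two, which is where most of the gain comes from.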
Read the docs
Both MySQL docs links that I put on this post have tons of tips on increasing INSERT speed.
Avoid using Laravel/PHP for inserting it
If your data source is (or can be) a CSV file, you can run it a lot faster using mysqlimport to import the data.
Using PHP and Laravel to import data from a CSV file adds overhead, unless you need to do some data processing before inserting.
Thanks @Namoshek, I had the same problem. The solution looks like this:
$users = array_chunk($data, 500, true);
foreach ($users as $key => $user) {
    Model::insert($user);
}
Depending on the data, you can also build the array with array_push() and then insert it.
Don't call insert() inside a foreach(), because it executes n queries against the database for n rows.
First build an array of rows whose keys match the database column names, then pass that array to insert().
This executes a single query no matter how many rows you have.
It is much faster.
$data_to_insert = [];
foreach ($response as $key => $value)
{
    array_push($data_to_insert, [
        'test1' => $value['test1'],
        'test2' => $value['test2'],
        'test3' => $value['test3'],
        'test4' => $value['test4'],
        'test5' => $value['test5'],
    ]);
}
DB::table('table_a')->insert($data_to_insert);
You need to do multi-row inserts, but also chunk them so that you don't exceed your DB limits.
You can do this by chunking your array:
foreach (array_chunk($response, 1000) as $responseChunk)
{
    $insertableArray = [];
    foreach ($responseChunk as $value) {
        $insertableArray[] = [
            'test1' => $value['test1'],
            'test2' => $value['test2'],
            'test3' => $value['test3'],
            'test4' => $value['test4'],
            'test5' => $value['test5'],
        ];
    }
    DB::table('table_a')->insert($insertableArray);
}
You can increase the chunk size of 1000 until you approach your DB configuration limit. Make sure to leave a safety margin (for example, 0.6 times your DB limit).
You can't go any faster than this using Laravel.
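As a rough sketch of how such a limit could be turned into a chunk size: one limit that applies to multi-row inserts sent as prepared statements is MySQL's cap of 65,535 placeholders per statement, so the number of rows per chunk depends on the number of columns. The helper name safeChunkSize is hypothetical, and max_allowed_packet is a separate limit that this calculation does not cover.

```php
<?php
// Hypothetical helper: largest chunk size that keeps a multi-row insert
// under a placeholder limit, scaled down by a safety margin.
function safeChunkSize(int $columnsPerRow, int $maxPlaceholders = 65535, float $margin = 0.6): int
{
    // Each row consumes one placeholder per column.
    $hardLimit = intdiv($maxPlaceholders, $columnsPerRow);
    // Apply the margin, but never return less than one row per chunk.
    return max(1, (int) floor($hardLimit * $margin));
}

// Five columns per row, as in the test1..test5 example above.
$chunkSize = safeChunkSize(5);
```

The result can be passed straight to array_chunk($response, $chunkSize) in the loop above.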
Related
I have a Laravel application which has a common operation where a user fetches around 300,000 rows from a database table called Sign with 3 columns. The table columns are described as follows: id(int-10), sign(varchar-16), status (int-10)
The table has around 300 million entries. When a user takes some entries, the status columns of these rows are changed to the id of the user. Note that the user always takes around 300,000 entries at a time.
I have increased the innodb_buffer_pool_size and innodb_log_file_size to 2GB and 1GB respectively. The system has 3.75GB of RAM.
Here is the code-
$collection = Sign::select('sign')
    ->where('status', 0)
    ->where(DB::raw('CHAR_LENGTH(sign)'), '=', 7)
    ->take(300000);
// write the signs in $collection to a file here
$collection->update(['status' => $user->id]);
In my case, the table data is fetched quite easily within less than 1s. The update statement used to take about 100-200s previously but recently I have upgraded my OS from Ubuntu 14 to 16 and after this update statement is taking about 500-600s.
Is there any way to make this process faster? Should I increase the RAM?
Instead of a select followed by an update, run the update query directly. In Laravel Eloquent:
$collection = Sign::where('status', 0)
->whereRaw('CHAR_LENGTH(sign) = 7')
->update(['status' => $user->id]);
In most databases a SELECT is generally fast because it only has to read the table. You can get more detail with EXPLAIN, e.g. EXPLAIN SELECT * FROM sign;
To get over the slowness of CHAR_LENGTH(sign) not using an index, generated columns provide a solution.
Here we create a sign_length calculated as the length of sign as a column:
ALTER TABLE sign ADD sign_length INT UNSIGNED AS (CHAR_LENGTH(sign))
, ADD INDEX status_sign_length(status,sign_length)
Then use:
// take() must come before update(): update() runs the query and returns the number of affected rows
$affected = Sign::where('status', 0)
    ->where('sign_length', 7)
    ->take(300000)
    ->update(['status' => $user->id]);
Note: Laravel isn't my strong suit, corrections welcome.
2G for the buffer_pool is dangerously high for a mere 3.75GB of RAM. Is the system swapping? If so, either lower the buffer_pool or increase the RAM. Swapping is terrible for MySQL.
Since the new OS may be taking more RAM for itself, the above statement may explain the slowdown.
Please provide the SQL generated by $collection->update(['status' => $user->id]); it is not obvious what it will be. For all I know, $collection is keeping a list of 300K ids of all the rows and creating an IN clause for the UPDATE.
UPDATEing rows is a lot more costly than SELECTing them. The former must keep a copy of the rows in case there is a crash necessitating a ROLLBACK.
What version of MySQL? There have been recent changes in the Optimizer for UPDATE.
If there is stuff between the SELECT and the UPDATE, you may need SELECT ... FOR UPDATE -- else another connection could grab the same rows, and make a mess of the data!
I download XML from external URL and parse it into mysql.
Rate::updateOrCreate([
'exchanger_id' => $exchangerId,
'signature_from_id' => $signatureFromId,
'signature_to_id' => $signatureToId
], [
'in' => $item->in,
'out' => $item->out,
'amount' => $item->amount
]);
The thing is XML contains many items, and I parse many sites, so it results into 20K queries for 20-25 URLS. Later on I'll parse about 300 URLS and the number of queries will rise.
How could I optimize this process? I mean the updateOrCreate part. If a row with exchanger_id, signature_from_id and signature_to_id exists I need to update it, otherwise create a new row. And repeat it for every xml item.
As far as I understand, Laravel makes at least 2 queries: first a select that checks whether the row exists, then the create/update.
Couldn't think about any batch examples :(
Update
I made a unique composite key for first three columns (exchanger_id, signature_from_id, signature_to_id) and downloaded this trait https://github.com/yadakhov/insert-on-duplicate-key
The number of queries became 26 (it was about 20000), but the time required to handle all this didn't change. What am I missing?
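For reference, a batched upsert of this kind comes down to one INSERT ... ON DUPLICATE KEY UPDATE statement per batch. The sketch below builds such a statement in plain PHP. The helper name buildBatchUpsert and the table name rates are assumptions for illustration, and this is not the trait's actual implementation.

```php
<?php
// Hypothetical helper: one INSERT ... ON DUPLICATE KEY UPDATE for a whole batch.
// Assumes a unique key exists over the columns that are NOT in $updateColumns.
function buildBatchUpsert(string $table, array $rows, array $updateColumns): array
{
    // Column names come from the first row; all rows are assumed to share them.
    $columns = array_keys($rows[0]);
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    // On a key collision, overwrite each listed column with the incoming value.
    $updates = array_map(function ($c) {
        return "`$c` = VALUES(`$c`)";
    }, $updateColumns);
    $sql = 'INSERT INTO `' . $table . '` (`' . implode('`, `', $columns) . '`) VALUES '
        . implode(', ', array_fill(0, count($rows), $group))
        . ' ON DUPLICATE KEY UPDATE ' . implode(', ', $updates);
    $bindings = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $bindings[] = $row[$column];
        }
    }
    return [$sql, $bindings];
}

// Made-up sample row using the column names from the question.
[$sql, $bindings] = buildBatchUpsert('rates', [
    ['exchanger_id' => 1, 'signature_from_id' => 2, 'signature_to_id' => 3,
     'in' => 1.0, 'out' => 2.5, 'amount' => 100],
], ['in', 'out', 'amount']);
```

If the total time stays the same after batching, the bottleneck may be somewhere other than the query count (downloading and parsing the XML, for example); timing each stage separately would confirm that.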
Why not do this instead, if your business case allows it?
(1) Store all the XML in bulk in some folder in your app.
(2) Create a cron job that does the processing for you and fires an event that you can capture when the processing is complete, so you can take the next step. Take a look at Laravel's task scheduling, and at queues and events in Laravel for some more advanced ideas.
I'm a beginner in PHP/SQL (6 months), and I noticed that transactions are faster than plain "insert into" statements.
When I operate on huge amounts of data (in the range of 10-500k inserts), my script is slow.
The goal: I want the fastest way to save data into an SQLite *.db file.
My script looks like that:
$array = array(
'ronaldo' => 'gay' ,
'mario basler' => 'cool guy'
);
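$db = new SQLite3('file.db');
$db->query('BEGIN;');
foreach ($array as $key => $val) {
    // escape the values before interpolating them into the SQL string
    $k = SQLite3::escapeString($key);
    $v = SQLite3::escapeString($val);
    $db->query("insert into \"table\" (name, personality) values ('$k', '$v')");
}
$db->query('COMMIT;');
Is this approach wrong?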
What you do is absolutely correct. It will speed up your interaction with the database. Any command that changes the database will automatically start a transaction if one is not already in effect.
So, if you do many inserts without starting a transaction explicitly, for each operation a transaction will be created. You create 1 transaction and do all the operations in bulk.
How to insert 40000 records fast into an sqlite database in an iPad
https://www.sqlite.org/lang_transaction.html
I have to run an update query on 800k rows and I'm looking for the best way to do it. All rows are updated with the same values except for one field (D in my example). This field can be 1 or 0. I use the update() method of Zend_Db.
I'm considering 3 methods:
Method 1: Update each row, one at a time (with a foreach).
Method 2: Use an IF in the update to set the value of the field.
Method 3: Split the rows into two groups (one with field = 1 and another with field = 0) and run two updates (UPDATE ... WHERE id IN (...)), one for each group.
Query looks like this :
$a_data = array(
'A' => foo,
'B' => 99,
'C' => 0,
'D' => (0 OR 1 ?)
);
$where['id IN (?)'] = $a_id;
$update = $this->_db->update($this->_name, $a_data, $where);
Which method would be the best way to do this? Thanks
For the record, 800k rows updated on a live production server isn't a good plan. Except being done at an actual mysql level, the chances of this update stopping your server are high.
Now, that being said, and assuming you're running MySql,
Method 1 isn't feasible, if for no other reason than that 800k rows means 800k queries. max_execution_time in php.ini will not allow the script to run that long. If you still want to try it, split the rows into batches of 50, 100, or 200 (depending on your server configuration) and run the batches with a pause between them: do a batch, wait a second, do a batch, wait a second, and so on.
Method 2 is specific to your particular problem, I guess, but it will be quicker.
Method 3: see the answer for Method 1, except that it's not 800k queries; the sizes depend on the ratio between your 0s and 1s. It's going to be 2 queries, each pretty large.
Usually, when there's a large batch update like this, I'd say, use mysql from a command line.
If this is a PHP update script that you're running, the best results come from splitting the rows and updating 50, 100, or however many at a time, although that is time-consuming (800,000 rows / 100 rows at a time = 8,000 batches, plus a one-second pause after every batch).
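A minimal sketch of how Method 3 and batching could be combined: group the ids by the value they should get in column D, then cut each group into fixed-size batches. The function name planUpdateBatches and the sample data are made up for illustration; each planned batch would then become one UPDATE ... WHERE id IN (...) call, with a pause between calls.

```php
<?php
// Hypothetical sketch: split ids by their target value for column D,
// then cut each group into fixed-size batches.
function planUpdateBatches(array $flagById, int $batchSize): array
{
    $groups = [0 => [], 1 => []];
    foreach ($flagById as $id => $flag) {
        $groups[$flag ? 1 : 0][] = $id;
    }
    $plan = [];
    foreach ($groups as $flag => $ids) {
        foreach (array_chunk($ids, $batchSize) as $batch) {
            // One entry per future UPDATE statement.
            $plan[] = ['D' => $flag, 'ids' => $batch];
        }
    }
    return $plan;
}

// Made-up example data: ids 1..1000, odd ids get D = 1, even ids get D = 0.
$flagById = [];
for ($id = 1; $id <= 1000; $id++) {
    $flagById[$id] = $id % 2;
}
$plan = planUpdateBatches($flagById, 100);
```

With 800,000 ids and a batch size of 100 the plan would contain 8,000 entries, so the batch size is a trade-off between the number of round trips and the size of each IN list.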
I currently have some code which needs to perform multiple updates per user for thousands of users, incrementing a counter depending on an action they've taken in order to track what actions are being performed. Each action consists of subactions which need to have the count updated too. These need to be tracked by day.
So I am storing "action":"actionName", "day":day, "count": count, for actions per day (e.g. incoming from outside web page, start game, stop game by exiting, concatenated with the game name for a lot of games).
Each day I get a few thousand rows (one per unique action) added which are updated a few hundred thousand times each day to increase the count.
The relevant code is as follows (creating array of actions not included).
$m = new Mongo();
$db = $m->actionsDB;
$collection = $db->action_count;
foreach ($arr as $action) {
    $collection->update(array("action" => $action, "day" => $day), array('$inc' => array("count" => 1)), array("upsert" => true));
}
$collection->ensureIndex(array("action" => 1, "day" => -1));
An example of the series of updates made on an action and subactions would be:
startGame, 20110417;
startGameZork, 20110417;
startGameZorkWindows, 20110417
The problem seems to be that with this code running on the server, mongo commands in the shell get queued up.
Currently I'm unsure as to why, I guess there may be a performance issue with so many updates per second.
What I am wondering is how can I increase performance? I'm pretty new to mongo, so not entirely sure what options are available. I looked at PHP's batchInsert but I can't see any mention of doing batchUpdate (so instead of updating, creating an array holding all the data I currently update then doing a batchUpdate in a single trip to the DB).
Mongo driver version is 1.2.0, so persistent connections are by default.
Edit: db.serverStatus() before, during and after on ~1600 updates per second (30 seconds). Test Data
There is no built-in batching for updates/upserts. You can only limit the docs to be updated by adjusting your query expression and adding some further filter for "emulating" a batch somehow. MongoDB won't help you here. Updates/Upserts are one or all.
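One way to reduce the number of update calls without driver-level batching is to accumulate the counts in PHP first and then send a single $inc per distinct action. This is a different technique from batching, and it assumes the actions can be buffered briefly before they are written. The function name aggregateIncrements is hypothetical.

```php
<?php
// Hypothetical sketch: fold repeated actions into one increment per distinct action.
function aggregateIncrements(array $actions, string $day): array
{
    $ops = [];
    // array_count_values() maps each distinct action to how often it occurred.
    foreach (array_count_values($actions) as $action => $count) {
        $ops[] = [
            'query'  => ['action' => (string) $action, 'day' => $day],
            'update' => ['$inc' => ['count' => $count]],
        ];
    }
    return $ops;
}

$ops = aggregateIncrements(
    ['startGame', 'startGameZork', 'startGame', 'startGame'],
    '20110417'
);
```

Each entry could then be passed to $collection->update($op['query'], $op['update'], array("upsert" => true)) as in the question, so four recorded actions result in two update calls here.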
If you can store your data in a file (JSON or CSV), you could try to insert the data using the command-line mongoimport utility.
That way you can use the --upsert flag to update documents that are already present and insert the ones that are new.
For example from PHP:
exec("mongoimport --db <dbname> --collection <collection_name> --jsonArray --upsert --file " . escapeshellarg($data_file));