I have a large set of results as an array from a CakePHP model for a CSV export. I have been formatting them using a loop as shown below. As the number of records grows, this is becoming too slow and giving timeout errors. Is there a better way to do this using either CakePHP's Hash utility or PHP array functions?
foreach ($people as $person) {
    array_push($results, array(
        'SchoolName' => $person['School']['name'],
        'SchoolRef'  => $person['School']['ref'],
        'firstName'  => $person['Person']['firstname'],
        'LastName'   => $person['Person']['lastname'],
        'Year1'      => $person['Person']['year_1'],
        'StudentID'  => $person['Person']['studentid'],
        'Email'      => $person['Person']['email']
    ));
}
If you're just outputting to CSV, why not try outputting directly from MySQL (or whichever database you're using).
Eg. http://ariejan.net/2008/11/27/export-csv-directly-from-mysql/
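A rough sketch of what that could look like from a CakePHP 2.x model; the table, column, and path names here are assumptions based on the question, and INTO OUTFILE writes the file on the database server and needs the FILE privilege:
// Rough sketch only; table, column, and path names are assumptions.
$this->Person->query("
    SELECT s.name, s.ref, p.firstname, p.lastname, p.year_1, p.studentid, p.email
    INTO OUTFILE '/tmp/people_export.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    FROM people AS p
    JOIN schools AS s ON s.id = p.school_id
");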
Alternatively, if the data doesn't change, you might be able to pre-generate the existing output. So, if you had 10,000 students the last time you produced the CSV, you could save that CSV and just append the new records. If records do change, you could add a hash of all fields to each record so you can tell which rows need regenerating.
Also, if the data doesn't have to be up-to-the-minute accurate, you could pre-generate the export on a daily basis (or whatever interval works for you).
However, without a clear indication of where you're at (in terms of record sizes and timeouts), and without a clear idea of where you'd like to be, it's difficult to make a specific recommendation.
I'm using Laravel 5.7 to fetch large amounts of data (around 500k rows) from an API server and insert it into a table (call it Table A) quite frequently (at least every six hours, 24/7) - however, it's enough to insert only the changes the next time we insert (but at least 60-70% of the items will change). So this table will quickly have tens of millions of rows.
I came up with the idea to make a helper table (call it Table B) to store all the new data into it. Before inserting everything into Table A, I want to compare it to the previous data (with Laravel, PHP) from Table B - so I will only insert the records that need to be updated. Again it will usually be around 60-70% of the records.
My first question is whether this above-mentioned way is the preferred way of doing it in this situation (obviously I want to make it happen as fast as possible). I assume that searching for and updating the records in the table would take a lot more time and would keep the table busy / lock it. Is there a better way to achieve the same thing (meaning to update the records in the DB)?
The second issue I'm facing is the slow insert times. Right now I'm using a local environment (16GB RAM, i7-6920HQ CPU) and MySQL is inserting the rows very slowly (about 30-40 records at a time). The size of one row is around 50 bytes.
I know it can be made a lot faster by fiddling around with InnoDB's settings. However, I'd also like to think that I can do something on Laravel's side to improve performance.
Right now my Laravel code looks like this (only inserting 1 record at a time):
foreach ($response as $key => $value)
{
    DB::table('table_a')
        ->insert([
            'test1' => $value['test1'],
            'test2' => $value['test2'],
            'test3' => $value['test3'],
            'test4' => $value['test4'],
            'test5' => $value['test5'],
        ]);
}
$response is an array.
So my second question: is there any way to increase the insert speed to something like 50k records/second - both on the Laravel application layer (by doing batch inserts) and on the MySQL InnoDB level (by changing the config)?
Current InnoDB settings:
innodb_buffer_pool_size = 256M
innodb_log_file_size = 256M
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = normal
innodb_use_native_aio = true
MySQL version is 5.7.21.
If I forgot to tell/add anything, please let me know in a comment and I will do it quickly.
Edit 1:
The server that I'm planning to use will have SSD on it - if that makes any difference. I assume MySQL inserts will still count as I/O.
Disable autocommit and manually commit at end of insertion
According to the MySQL 8.0 docs (8.5.5 Bulk Data Loading for InnoDB Tables):
You can increase the INSERT speed by turning off autocommit:
When importing data into InnoDB, turn off autocommit mode, because it performs a log flush to disk for every insert. To disable autocommit during your import operation, surround it with SET autocommit and COMMIT statements:
SET autocommit=0;
... SQL import statements ...
COMMIT;
Another way to do it in Laravel is using database transactions:
DB::beginTransaction();
// Your inserts here
DB::commit();
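A minimal sketch of how that could look with the insert loop from the question (column names reused from the question):
// Sketch: reuse the loop from the question, but commit once at the end
// so the log is flushed at commit instead of after every single INSERT.
DB::beginTransaction();

foreach ($response as $value) {
    DB::table('table_a')->insert([
        'test1' => $value['test1'],
        'test2' => $value['test2'],
        'test3' => $value['test3'],
        'test4' => $value['test4'],
        'test5' => $value['test5'],
    ]);
}

DB::commit();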
Use INSERT with multiple VALUES
Also according to the MySQL 8.0 docs (8.2.5.1 Optimizing INSERT Statements), you can optimize INSERT speed by using multiple VALUES lists in a single INSERT statement.
To do it with Laravel, you can just pass an array of values to the insert() method:
DB::table('your_table')->insert([
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
]);
According to the docs, it can be many times faster.
Read the docs
Both MySQL docs links that I put in this post have tons of tips on increasing INSERT speed.
Avoid using Laravel/PHP for inserting it
If your data source is (or can be) a CSV file, you can import it a lot faster using mysqlimport.
Using PHP and Laravel to import data from a CSV file adds overhead, unless you need to do some data processing before inserting.
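For example, a hypothetical invocation (mysqlimport derives the table name from the file name, so data destined for table_a would live in a file named table_a.csv):
mysqlimport --local --fields-terminated-by=',' --lines-terminated-by='\n' \
    --user=your_user -p your_database /path/to/table_a.csv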
Thanks @Namoshek, I had the same problem as well. The solution is like this:
$users = array_chunk($data, 500, true);

foreach ($users as $key => $user) {
    Model::insert($user);
}
Depending on the data, you can also build the array with array_push() first and then insert.
Don't call insert() inside a foreach(), because it will execute n queries against the database when you have n records.
First create an array of rows whose keys match the database column names, and then pass that array to the insert() function.
This will execute only one query against the database, regardless of how many records you have.
This is much, much faster.
$data_to_insert = [];

foreach ($response as $key => $value)
{
    array_push($data_to_insert, [
        'test1' => $value['test1'],
        'test2' => $value['test2'],
        'test3' => $value['test3'],
        'test4' => $value['test4'],
        'test5' => $value['test5'],
    ]);
}
DB::table('table_a')->insert($data_to_insert);
You need to do a multi-row insert, but you also need to chunk your inserts so you don't exceed your DB limits.
You can do this by chunking your array:
foreach (array_chunk($response, 1000) as $responseChunk)
{
    $insertableArray = [];

    foreach ($responseChunk as $value) {
        $insertableArray[] = [
            'test1' => $value['test1'],
            'test2' => $value['test2'],
            'test3' => $value['test3'],
            'test4' => $value['test4'],
            'test5' => $value['test5'],
        ];
    }

    DB::table('table_a')->insert($insertableArray);
}
You can increase the chunk size from 1000 until you approach your DB configuration limit. Make sure to leave some safety margin (e.g. 0.6 times your DB limit).
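For reference, the hard cap is usually MySQL's limit of 65,535 bound placeholders per prepared statement, so a rough, illustrative way to derive the chunk size is:
// Illustrative sketch only: each row binds one placeholder per column,
// and MySQL allows at most 65,535 placeholders in a single prepared statement.
$columnsPerRow = 5; // test1 .. test5
$chunkSize     = (int) floor((65535 * 0.6) / $columnsPerRow); // keep a safety margin

foreach (array_chunk($response, $chunkSize) as $responseChunk) {
    // build $insertableArray and call insert() as shown above
}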
You can't go any faster than this using Laravel.
I need to make an import method that takes the CSV file and imports everything in the database.
I've done the parsing with one of Laravel's CSV addons and it works perfectly giving me a big array of values set as:
[
    'col1_name' => 'col1 value',
    'col2_name' => 'col2 value',
    'col3_name' => 'col3 value',
    '...' => '...'
]
This is also perfect since all the column names fit my model which makes the database inserts a breeze.
However - a lot of the column values are strings that I'd like to turn into separate tables/relations. For example, one column contains the name of the item manufacturer, and I have the manufacturer table set up in my database.
My question is - what's the easy way to go through the imported CSV and swap the strings with the corresponding ID from the relationship table, making it compatible with my database design?
Something that would make the imported line:
[
'manufacturer' => 'Dell',
]
into:
[
'manufacturer' => '32',
]
I know I could just do a foreach loop comparing the needed values with values from the relationship models, but I'm sure there's an easier and cleaner way of doing it.
I don't think there's any "nice" way to do this - you'll need to look up each value for "manufacturer" - the question is, how many queries will you run to do so?
A consideration you need to make here is how many rows you will be importing from your CSV file.
You have a couple of options.
1) Querying 1 by 1
I'm assuming you're going to be looping through every line of the CSV file anyway, and then making a new model? In that case, you can add an extra database call in here:
$model->manufacturer_id = Manufacturer::whereName($colXValue)->first()->id;
(You'd obviously need to put in your own checks etc. here to make sure manufacturers exist)
This method is fine for relatively small datasets; however, if you're importing lots and lots of rows, it might end up sluggish, with a lot of arguably unnecessary database calls.
2) Mapping ALL your Manufacturers
Another option would be to create a local map of all your Manufacturers before you loop through your CSV lines:
$mappedManufacturers = Manufacturer::all()->pluck('id', 'name');
This will make $mappedManufacturers a map of manufacturers with the name as key and the id as value. This way, when you're building your model, you can do:
$model->manufacturer_id = $mappedManufacturers[$colXValue];
This method is also fine, unless you have tens of thousands of Manufacturers!
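Put together, a rough sketch of option 2 might look like this ($csvRows, the Item model, and the column names are placeholders, not from the question):
// Rough sketch of option 2; $csvRows, Item, and the column names are placeholders.
$mappedManufacturers = Manufacturer::all()->pluck('id', 'name');

foreach ($csvRows as $row) {
    $model = new Item(); // hypothetical model
    $model->name            = $row['name'];
    $model->manufacturer_id = $mappedManufacturers[$row['manufacturer']] ?? null;
    $model->save();
}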
3) Where in - then re-looping
Another option would be to build up a list of manufacturer names when looping through your CSV lines, going to the database with 1 whereIn query and then re-looping through your models to populate the manufacturer ID.
So in your initial loop through your CSV, you can temporarily set a property to store the name of the manufacturer, whilst adding the model to a collection:
$models = collect();

$model->..... = ....;
$model->manufacturer = $colXValue;

$models->push($model);
Then you'll end up with a collection of models. You then query the database for ONLY manufacturers which have appeared:
$manufacturers = Manufacturer::whereIn('name', $models->pluck('manufacturer'))->get()->keyBy('name')->toArray();
This will give you array of manufacturers, keyed by their name.
You then loop through your $models collection again, assigning the correct manufacturer id using the map:
$model->manufacturer_id = $manufacturers[$model->manufacturer]['id'];
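Pulling the option-3 fragments together, a rough end-to-end sketch (again, $csvRows, the Item model, and the column names are placeholders):
// Rough end-to-end sketch of option 3; model and column names are placeholders.
$models = collect();

foreach ($csvRows as $row) {
    $model = new Item(); // hypothetical model
    $model->name         = $row['name'];
    $model->manufacturer = $row['manufacturer']; // temporary property
    $models->push($model);
}

// One query for only the manufacturers that actually appear in the file.
$manufacturers = Manufacturer::whereIn('name', $models->pluck('manufacturer'))
    ->get()
    ->keyBy('name');

foreach ($models as $model) {
    $model->manufacturer_id = $manufacturers[$model->manufacturer]->id;
    unset($model->manufacturer); // drop the temporary property before saving
    $model->save();
}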
Hopefully this will give you some ideas of how you can achieve this. I'd say the solution mostly depends on your use case - if this was going to be a heavy-duty task, I'd definitely queue it and be tempted to use Option 1! :P
Let's say that I have array like the one I posted below and that I need to store it in my MySQL database:
array(
    "Weight" => "10",
    "Height" => "17",
    "Usage" => "35"
);
Preamble:
I will never update these values
I will never perform a query based on these values
Long story short I only need to store and display this array as it is. Actually I need to use these values to generate graphs. Now I see 2 possible options.
Option 1: even if I will never use a WHERE, ORDER BY, HAVING (...) condition on these values, I store each value separately in a dedicated column (weight, height, usage).
Option 2: I create a single column (stats) where I store a serialized version of the array; then, in order to generate my graphs, I unserialize each row before using it.
The question is: what's the best approach to store this array in terms of effectiveness and performance?
In my opinion the second approach is the best, but let's say that there are many rows and elements involved in the process. I don't understand whether it's faster and lighter to unserialize an array of 20 elements for 100 rows with PHP, or to read plain values stored in 20 columns, considering that I need to save a lot of them very frequently and simultaneously.
I will never update these values
I will never perform a query based on these values
The second you finalise your code having stored them as serialised values, you'll be asked to perform a query to update anything with a weight above ten.
Just store them in their own columns - not only will this future-proof the code, but it is easier to work with and will take up less drive space in the long run.
So, there's a field in the db in which I store serialized arrays.
$array = array('count1' => 10, 'count2' => 20, 'count3' => 4);
serialized:
a:3:{s:6:"count1";i:10;s:6:"count2";i:20;s:6:"count3";i:4;}
Would it be possible to pull count1+count2+count3 using a MySQL query? I guess I'm looking for something like PHP's explode(). Pretty sure this can't be done, but I thought I'd ask.
I need to pull the highest count1+count2+count3 rows and return the total count. Looping through each row and unserializing wouldn't work since there are TONS of rows.
If you need to access parts of your serialized data via SQL, you need to store them in separate columns.
While it might be possible to use techniques such as regular expressions to access those three values in this string, it would be extremely slow when used in a WHERE criterion as indexes would be useless - not to mention that it would be a huge mess, way worse than using goto in a programming language.
So the solution is to create a new column and then iterate over all rows, unserialize them, and store the sum into the new column. That might take a while, but you'll only need to do it once.
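A minimal sketch of that one-time backfill; the table, column, and credential names here are assumptions:
// One-time backfill sketch; table, column, and credential names are assumptions.
// It unserializes each stats blob once and stores the sum in a new total_count column.
$pdo = new PDO('mysql:host=localhost;dbname=your_db', 'db_user', 'db_password');

$update = $pdo->prepare('UPDATE your_table SET total_count = :total WHERE id = :id');
$rows   = $pdo->query('SELECT id, stats FROM your_table');

while ($row = $rows->fetch(PDO::FETCH_ASSOC)) {
    $counts = unserialize($row['stats']); // e.g. array('count1' => 10, 'count2' => 20, ...)
    $update->execute([
        ':total' => array_sum($counts),
        ':id'    => $row['id'],
    ]);
}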
Depending on your application it might be better to create three columns and store each value separately.
I'm trying to figure out a way to make a small "table" style system in PHP for about 10 data rows. Because it requires constant editing, I want to replace my MySQL system with something in PHP directly.
The data is 10 rows of:
id
first name
last name
I give the php file the id and want to pull out the first name and last name.
I tried using an associative array, but that turned into a coding mess as my syntax was all over the place.
How can I set this up properly so I can edit the data easily in a single place and get the first and last name of a row by its $id?
edit - example:
id fname lname
1 john ter
2 mark laken
3 peter lars
4 vlad morch
Basically, how do I set up that info above in PHP such that I can add new rows without too much trouble and the code will still work, and such that it is possible to output the fname and lname from a $_GET of an id value...
Hope that makes sense!
I'm not understanding why you wouldn't want to store constantly changing data in the database, but here is how I would hardcode it:
$data = array(
    'id01' => array(
        'firstName' => 'Eric',
        'lastName' => 'Smith',
    ),
    'id02' => array(
        'firstName' => 'John',
        'lastName' => 'Turner',
    ),
    ...
);
If you were returning this data in an AJAX call, I'd do it along these lines:
echo json_encode($data[$id]);
Of course you should also test if the value in $id is in your data array.
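For a plain (non-AJAX) page, the lookup could look like this minimal sketch, reusing the hardcoded $data array above:
// Minimal sketch of the lookup, assuming the hardcoded $data array above.
$id = isset($_GET['id']) ? $_GET['id'] : null;

if ($id !== null && isset($data[$id])) {
    echo $data[$id]['firstName'] . ' ' . $data[$id]['lastName'];
} else {
    echo 'Unknown id';
}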
I'm not totally sure I understand what you're looking for, but if you want to be able to edit something inline and save it on form input blur, you will have to look beyond PHP and into an AJAX solution. You will likely still want to back this with a database, as PHP scripts don't have a continuous runtime, so you can't read all the data into memory and change it directly in memory through user interaction. So what you'll do is read all the data from the DB into a form; then, using a little AJAX, you will be able to save the form data back to the database every time a value is changed.