I am inserting data into the database using the insert_batch() function, and I want to split the process into chunks. For example, if I want to create 10,000 serial numbers but only insert 1,000 rows at a time, the create process should run 10 times in a loop. How can I do that?
$serial_numbers = $this->serial_numbers_model->generate_serial_numbers($product_id, $partner_id, $serial_number_quantity, $serial_number_start);
$issued_date = date("Y-m-d H:i:s");
// These only need computing once, so keep them out of the loop
$first_serial_number = reset($serial_numbers);
$last_serial_number = end($serial_numbers);
$inserted_rows = 0;
$insert_data = array();
foreach ($serial_numbers as $sn) {
    $check_number = $this->serial_numbers_model->generate_check_number();
    $inserted_rows++;
    $insert_data[] = array(
        'partner_id' => $partner_id,
        'product_id' => $product_id,
        'check_number' => $check_number,
        'serial_number' => $sn,
        'issued_time' => $issued_date,
        'serial_number_status' => CREATE_STATUS
    );
}
$this->serial_numbers_model->insert_batch($insert_data);
Your serial_numbers_model->insert_batch() is probably just a wrapper around CodeIgniter's native insert_batch(). The code below uses the native one for clarity; replace it with yours as required.
// Track how many items are in your batch, and prepare an empty batch array
$count = 0;
$insert_data = [];

foreach ($serial_numbers as $sn) {
    // ... your code, prepare your data, etc ...
    $count++;
    $insert_data[] = array(
        // ... your data ...
    );

    // Do you have a batch of 1000 ready?
    if ($count === 1000) {
        // Yes - insert it
        $this->db->insert_batch('table', $insert_data);
        // $this->serial_numbers_model->insert_batch($insert_data);

        // Reset the count, and empty the batch, ready to start again
        $count = 0;
        $insert_data = [];
    }
}

// Watch out! If there were 1001 serial numbers, the first 1000 were handled,
// but the last one hasn't been inserted yet!
if (count($insert_data)) {
    $this->db->insert_batch('table', $insert_data);
}
Hello, I'm trying to use Eloquent in my code:
$nr_riga = 0;
foreach ($data_detail as $key => $row_detail) {
    $nr_riga = $key + 1;
    $new_orders_details->nr_riga = $nr_riga;
    $new_orders_details->codice_articolo = $row_detail['codice_articolo'];
    $new_orders_details->quantita = $row_detail['quantita'];
    $new_orders_details->prezzo = $row_detail['prezzo'];
    $new_orders_details->order_id = $new_orders_grid->id;
    $new_orders_details->save();
    // DB::table('orders_detail')->insert([
    //     'order_id' => $new_orders_details->order_id,
    //     'nr_riga' => $nr_riga,
    //     'codice_articolo' => $new_orders_details->codice_articolo,
    //     'quantita' => $new_orders_details->quantita,
    //     'prezzo' => $new_orders_details->prezzo,
    // ]);
}
This loop works both ways, but not equally: when I use $new_orders_details->save(); it only inserts a single row into the db, as if it weren't looping. DB::table('orders_detail')->insert does the job the way I want. How can I convert it to Eloquent and get the same result?
You have to create a new model instance inside the loop:
$new_orders_details = new OrderSDetail();
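Applied to the loop from the question (keeping its variable names), that would look roughly like this:
foreach ($data_detail as $key => $row_detail) {
    // a fresh model instance per iteration; otherwise save() keeps
    // updating the same row after the first insert
    $new_orders_details = new OrderSDetail();
    $new_orders_details->nr_riga = $key + 1;
    $new_orders_details->codice_articolo = $row_detail['codice_articolo'];
    $new_orders_details->quantita = $row_detail['quantita'];
    $new_orders_details->prezzo = $row_detail['prezzo'];
    $new_orders_details->order_id = $new_orders_grid->id;
    $new_orders_details->save();
}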
Since I cannot comment, can you try this solution?
$ordersDetail = new OrderSDetail();
$ordersDetail->insert([ /* your data here */ ]);
I think you can prepare the data in the loop and then make one batch insert using Model::insert($your_data). Saving inside a loop is not the best way to write data to the db. Like this:
$data = [];
foreach ($data_detail as $key => $row_detail) {
    // build one row array per detail line ($data[][...] would append a
    // new element for every field, producing five rows instead of one)
    $data[] = [
        'nr_riga' => $key + 1,
        'codice_articolo' => $row_detail['codice_articolo'],
        'quantita' => $row_detail['quantita'],
        'prezzo' => $row_detail['prezzo'],
        'order_id' => $new_orders_grid->id,
    ];
}
NewOrderDetails::insert($data);
This saves all your data in a single query while still using the model.
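One caveat: Model::insert() is a straight pass-through to the query builder, so Eloquent behaviour such as automatic created_at/updated_at timestamps and model events is skipped; add the timestamp columns to $data yourself if the table needs them.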
I use CodeIgniter, and when an insert_batch() does not fully work (the number of items inserted differs from the number of items given), I have to do the inserts again, using INSERT IGNORE to maximize the number that go through without raising errors for the rows that already exist. For the kind of data I'm inserting, strict agreement between the number of items given and the number put in the database isn't required; maximizing is the goal.
What would be the correct way of a) using insert_batch as much as possible, and b) when it fails, using a workaround, while minimizing the number of unnecessary requests?
Thanks
The correct way of inserting data using insert_batch() is:
CI_Controller:
public function add_monthly_record()
{
    $date = $this->input->post('date');
    $due_date = $this->input->post('due_date');
    $billing_date = $this->input->post('billing_date');
    $total_area = $this->input->post('total_area');
    $comp_id = $this->input->post('comp_id');
    $unit_id = $this->input->post('unit_id');
    $percent = $this->input->post('percent');
    $unit_consumed = $this->input->post('unit_consumed');
    $per_unit = $this->input->post('per_unit');
    $actual_amount = $this->input->post('actual_amount');
    $subsidies_from_itb = $this->input->post('subsidies_from_itb');
    $subsidies = $this->input->post('subsidies');

    $data = array();
    foreach ($unit_id as $id => $name) {
        $data[] = array(
            'date' => $date,
            'comp_id' => $comp_id,
            'due_date' => $due_date,
            'billing_date' => $billing_date,
            'total_area' => $total_area,
            'unit_id' => $unit_id[$id],
            'percent' => $percent[$id],
            'unit_consumed' => $unit_consumed[$id],
            'per_unit' => $per_unit[$id],
            'actual_amount' => $actual_amount[$id],
            'subsidies_from_itb' => $subsidies_from_itb[$id],
            'subsidies' => $subsidies[$id],
        );
    }

    $result = $this->Companies_records->add_monthly_record($data);

    // return from model
    $first_insert_id = $result[0];
    $total_affected_rows = $result[1];

    // using the last id
    if ($total_affected_rows) {
        $count = $total_affected_rows - 1;
        for ($x = 0; $x <= $count; $x++) {
            $id = $first_insert_id + $x;
            $invoice = 'EBR' . date('m') . '/' . date('y') . '/' . str_pad($id, 6, '0', STR_PAD_LEFT);
            $field = array(
                'invoice_no' => $invoice,
            );
            $this->Companies_records->add_monthly_record_update($field, $id);
        }
    }
    echo json_encode($result);
}
CI_Model:
public function add_monthly_record($data)
{
    $this->db->insert_batch('monthly_record', $data);
    $first_insert_id = $this->db->insert_id();
    $total_affected_rows = $this->db->affected_rows();
    return [$first_insert_id, $total_affected_rows];
}
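One caveat with this pattern: it assumes the batch produces consecutive auto-increment IDs starting at insert_id(). CodeIgniter's insert_batch() actually splits large batches into multiple INSERT statements of 100 rows each behind the scenes, so insert_id() may reflect only the last sub-batch, and concurrent writes can leave gaps; verify the generated invoice numbers against the real rows if that matters in your setup.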
As #q81 mentioned, you would split the batches (as you see fit, or depending on system resources) like this:
$insert_batch = array();
$maximum_items = 100;
$i = 1;
while ($condition == true) {
    // code to add data into $insert_batch
    // ...

    // insert the batch every n items
    if ($i == $maximum_items) {
        $this->db->insert_batch('table', $insert_batch); // insert the batch
        $insert_batch = array(); // empty the batch array
        $i = 0;
    }
    $i++;
}

// the last $insert_batch
if ($insert_batch) {
    $this->db->insert_batch('table', $insert_batch);
}
Edit: while insert_batch() already splits the batches internally, the reason why you get "number of items inserted different from the number of items given" might be that the allowed memory size is reached. This has happened to me too many times.
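For the INSERT IGNORE fallback the question mentions, a minimal sketch, assuming MySQL and CI3's query builder; get_compiled_insert() builds the SQL string without executing it, and $failed_rows (the rows from the batch that did not make it) is a placeholder name:
foreach ($failed_rows as $row) {
    $sql = $this->db->set($row)->get_compiled_insert('table');
    // MySQL-specific: silently skip rows that already exist
    $sql = preg_replace('/^INSERT INTO/', 'INSERT IGNORE INTO', $sql);
    $this->db->query($sql);
}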
I have a script that parses a CSV with a million rows in it into an array. I want to batch this with a cronjob: for example, every 100,000 rows I want to pause the script and then continue it again, to prevent memory leaks etc. My script currently looks like the code below. It's not relevant what it does, but how can I loop through it in batches from a cronjob? Can I just make a cronjob that calls this script every 5 minutes and remembers where the foreach loop was paused?
$csv = file_get_contents(CSV);
$array = array_map("str_getcsv", explode("\n", $csv));
$headers = $array[0];
$number_of_records = count($array);

for ($i = 1; $i < $number_of_records; $i++) {
    $params['body'][] = [
        'index' => [
            '_index' => INDEX,
            '_type' => TYPE,
            '_id' => $i
        ]
    ];

    // Set the right keys
    foreach ($array[$i] as $key => $value) {
        $array[$i][$headers[$key]] = $value;
        unset($array[$i][$key]);
    }

    // Loop fields
    $params['body'][] = [
        'Inrijdtijd' => $array[$i]['Inrijdtijd'],
        'Uitrijdtijd' => $array[$i]['Uitrijdtijd'],
        'Parkeerduur' => $array[$i]['Parkeerduur'],
        'Betaald' => $array[$i]['Betaald'],
        'bedrag' => $array[$i]['bedrag']
    ];

    // Every 100,000 documents, stop and send the bulk request
    if ($i % 100000 == 0) {
        $responses = $client->bulk($params);
        // erase the old bulk request
        $params = ['body' => []];
        // unset the bulk response when you are done, to save memory
        unset($responses);
    }
}

// Send the last batch if it exists (this has to happen after the loop,
// not inside it, or it fires on every iteration)
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
}
In the given code the script will always process from the beginning, since no pointer of any sort is kept. My suggestion would be to split the CSV file into pieces and let another script parse the pieces one by one (i.e. every 5 minutes), deleting each file afterwards.
$count = 0;
$fp = fopen(CSV, 'r');
$head = fgets($fp); // keep the header row; it is prepended to every batch file
$output = [$head];
while (($line = fgets($fp)) !== false) {
    $output[] = $line;
    if (count($output) == 10000) {
        // fgets() keeps the trailing newline, so join without a separator
        file_put_contents('batches/batch-' . $count . '.csv', implode('', $output));
        $count++;
        $output = [$head];
    }
}
fclose($fp);
// write out the remaining lines (the header alone doesn't count)
if (count($output) > 1) {
    file_put_contents('batches/batch-' . $count . '.csv', implode('', $output));
}
Now the original script can process a file every time:
// array_diff() preserves keys, so reindex with array_values()
// before grabbing the first entry
$files = array_values(array_diff(scandir('batches/'), ['.', '..']));
if (count($files) > 0) {
    $file = 'batches/' . $files[0];
    // PROCESS FILE
    unlink($file);
}
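As for the cron part: a plain crontab entry is enough to run the processing script on a schedule; a sketch, with the PHP binary path, script path, and log path all being placeholders:
*/5 * * * * /usr/bin/php /path/to/process_batch.php >> /var/log/csv-import.log 2>&1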
I have a text file (which is essentially a CSV without the extension) that has 150,000 lines in it. I need to remove duplicates by key, then insert the rows into the database. I'm using fgetcsv to read it line by line, but I don't want to do 150,000 queries. So this is what I came up with so far (keep in mind I'm using Laravel):
$count = 0;
$insert = [];
if (($handle = fopen("myHUGEfile.txt", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $count++;
        // See if this is the top row, which in this case holds the column headers
        if ($count == 1) continue;
        // Get the parts needed for the new part
        $quantity = $data[0];
        $part_number = $data[1];
        $manufacturer = $data[2];
        $new_part = [
            'manufacturer' => $manufacturer,
            'part_number' => $part_number,
            'stock' => $quantity,
            'price' => '[]',
            'approved' => 0,
        ];
        $insert[] = $new_part;
    }
    fclose($handle);
} else {
    throw new Exception('Could not open file for reading.');
}

// Remove duplicates
$newRows = [];
$parsedCount = 0;
foreach ($insert as $row) {
    $x = 0;
    foreach ($newRows as $n) {
        if (strtoupper($row['part_number']) === strtoupper($n['part_number'])) {
            $x++;
        }
    }
    if ($x == 0) {
        $parsedCount++;
        $newRows[] = $row;
    }
}

$parsed_rows = array_chunk($newRows, 1000, true);
$x = 0;
foreach ($parsed_rows as $chunk) {
    // Insert
    if (count($chunk) > 0 && DB::table('search_parts')->insert($chunk)) {
        $x++;
    }
}
echo $x . " chunks inserted.<br/>" . $count . " parts started with<br/>" . $parsedCount . " rows after duplicates removed.";
But it's very clunky. I have only tested it with a little over 1,000 rows, and it works on localhost, but I'm afraid that if I push it to production it won't be able to handle all 150,000 rows. The file is about 4 MB. Can someone show me a better, more efficient way to do this?
Right now, you're keeping the first duplicate record. If you're OK keeping the last dupe, you can just change
$insert[] = $new_part;
to
$insert[strtoupper($part_number)] = $new_part;
That way, your $insert array will only ever hold one value per $part_number. Your inserts will be a little slower, but you can drop all of the code which checks for duplicates, which looks very, very slow.
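In context, that one change lets the whole "Remove duplicates" section disappear; a sketch of the two lines that remain relevant:
// inside the read loop: later duplicates overwrite earlier ones
$insert[strtoupper($part_number)] = $new_part;
// after the loop: reindex, since the string keys aren't needed for the insert
$parsed_rows = array_chunk(array_values($insert), 1000);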
4 MB is not remotely a "huge" file. I'd just read the whole thing into an associative array keyed by part number, which inherently de-dupes, keeping the last row whenever a duplicate is encountered. Something like this, maybe:
$parts = [];
$lines = explode("\n", file_get_contents('file'));
array_shift($lines); // drop the header row
foreach ($lines as $line) {
    if (trim($line) === '') continue; // skip blank lines
    $part = str_getcsv($line);
    // keying by (uppercased) part number de-dupes as we go
    $parts[strtoupper($part[1])] = [
        'manufacturer' => $part[2],
        'part_number' => $part[1],
        'stock' => $part[0],
        'price' => '[]',
        'approved' => 0,
    ];
}
// $parts now contains the unique part list; insert it in chunks
// rather than one query per row
foreach (array_chunk($parts, 1000) as $chunk) {
    DB::table('search_parts')->insert($chunk);
}
If you don't want duplicates on a certain key (or on multiple keys), you can make it easy on yourself and just add a UNIQUE INDEX on that key to the table. This way, all you have to worry about is processing the file: when an insert hits a duplicate key, the database rejects that row and the import continues (with INSERT IGNORE or its equivalent).
It would also make things easier in the future, because you wouldn't have to modify your code to check additional columns; just modify the index.
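A sketch of what that could look like in Laravel; note that insertOrIgnore() only exists from Laravel 5.8 onwards (older versions would need a raw INSERT IGNORE query), and the migration snippet is an assumption about your schema:
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

// migration: enforce uniqueness at the database level
Schema::table('search_parts', function (Blueprint $table) {
    $table->unique('part_number');
});

// import: duplicate part numbers are now skipped instead of erroring
foreach (array_chunk($newRows, 1000) as $chunk) {
    DB::table('search_parts')->insertOrIgnore($chunk);
}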
I'm using simple_html_dom_helper to do some screen scraping and am encountering some errors.
The second foreach runs 4 times (since sizeof($pages) == 4), while it should only run once. I got this code from an example script where table.result-liste occurs several times on the page; in my case it occurs only once, so imho there is no need for a foreach. print_r($data) prints out the same thing 4 times, and there's no need for that.
Further down I try to do the same without the foreach, but it just prints no, so the response is somehow different and I'm not sure why.
foreach ($pages as $page)
{
    $p = $this->create_url($codes[0], $price, $page); // pass page number along
    $p_html = file_get_html($p);
    $row = $p_html->find("table[class=result-liste] tr");

    // RUNS OK BUT NO NEED TO DO IT FOUR TIMES.
    // CLASS RESULT-LISTE ONLY OCCURS ONCE ANYWAY
    foreach ($p_html->find("table[class=result-liste] tr") as $row)
    {
        // grab only rows where there is a link
        if ($row->find('td a'))
        {
            $d_price = $this->get_price($row->first_child());
            $d_propid = $this->get_prop_id($row->outertext);
            $data = array(
                "price" => $d_price,
                "prop_id" => $d_propid
            );
            print_r($data);
        }
    }

    // MY ATTEMPT TO AVOID THE SECOND FOREACH DOES NOT WORK ...
    $row = $p_html->find("table[class=result-liste] tr");
    if (is_object($row) && $row->find('td a')) print "yes ";
    else print "no ";
}
Even though table[class=result-liste] only occurs once on your page, this find statement is looking for the <tr> elements that are the table's rows. So unless your table has only one row, you will need the foreach:
$p_html->find("table[class=result-liste] tr")
Your code
foreach ($p_html->find("table[class=result-liste] tr") as $row)
{
    // grab only rows where there is a link
    if ($row->find('td a'))
    {
        $d_price = $this->get_price($row->first_child());
        $d_propid = $this->get_prop_id($row->outertext);
        $data = array(
            "price" => $d_price,
            "prop_id" => $d_propid
        );
        print_r($data);
    }
}
Replace the above code with my code:
$asRow = $p_html->find("table[class=result-liste] tr");
$row = $asRow[0];
// grab only rows where there is a link
if ($row->find('td a'))
{
    $d_price = $this->get_price($row->first_child());
    $d_propid = $this->get_prop_id($row->outertext);
    $data = array(
        "price" => $d_price,
        "prop_id" => $d_propid
    );
    print_r($data);
}
Try with this.
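A slightly tidier alternative, assuming your simple_html_dom version supports find()'s second argument (most builds do): passing an index returns that single element directly instead of an array, which is also why the is_object() check in the question failed.
// find($selector, 0) returns the first match (or null), not an array
$row = $p_html->find("table[class=result-liste] tr", 0);
if ($row && $row->find('td a')) print "yes ";
else print "no ";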