Large number of SQLite inserts using PHP - php

I have about 14000 rows of comma separated values that I am trying to insert into a sqlite table using PHP PDO, like so:
<?php
// create a PDO object
$dbh = new PDO('sqlite:mydb.sdb');
$lines = file('/csv/file.txt'); // import lines as array
foreach ($lines as $line) {
    $line_array = explode(',', $line); // create an array of comma-separated values in each line
    $values = '';
    foreach ($line_array as $l) {
        $values .= "'$l', ";
    }
    $values = substr($values, 0, -2); // get rid of the last comma and whitespace
    $query = "insert into sqlite_table values ($values)"; // plug the values into a query statement
    $dbh->query($query); // run the query
}
?>
This query takes a long time, and to run it without interruption, I would have to use PHP-CLI.
Is there a better (faster) way to do this?

You will see a good performance gain by wrapping your inserts in a single transaction. If you don't do this, SQLite treats each insert as its own transaction.
<?php
// create a PDO object
$dbh = new PDO('sqlite:mydb.sdb');
// Start transaction
$dbh->beginTransaction();
$lines = file('/csv/file.txt'); // import lines as array
foreach ($lines as $line) {
    $line_array = explode(',', $line); // create an array of comma-separated values in each line
    $values = '';
    foreach ($line_array as $l) {
        $values .= "'$l', ";
    }
    $values = substr($values, 0, -2); // get rid of the last comma and whitespace
    $query = "insert into sqlite_table values ($values)"; // plug the values into a query statement
    $dbh->query($query); // run the query
}
// commit transaction
$dbh->commit();
?>

Start a transaction before the loop and commit it after the loop. The way your code works now, it starts (and commits) a transaction on every insert.
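A minimal sketch of that pattern, reusing the file and table names from the question (the two-placeholder column list is only an assumption; match it to your real column count):
<?php
$dbh = new PDO('sqlite:mydb.sdb');
$dbh->beginTransaction(); // start before the loop
$stmt = $dbh->prepare('insert into sqlite_table values (?, ?)'); // placeholder count assumed
foreach (file('/csv/file.txt') as $line) {
    $stmt->execute(array_map('trim', explode(',', $line)));
}
$dbh->commit(); // commit once, after the loop
?>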

If you're looking for a bit more speed, use a prepared statement (prepare once, execute many times), so the SQL engine doesn't have to parse the text string each time.
$name = $age = '';
$insert_stmt = $db->prepare("insert into mytable (name, age) values (:name, :age)");
// bind by reference so each execute() picks up the current variable values
$insert_stmt->bindParam(':name', $name);
$insert_stmt->bindParam(':age', $age);
// do your loop here, e.g. with fgetcsv on your own file
$fh = fopen('/csv/file.txt', 'r');
while (($row = fgetcsv($fh)) !== false) {
    list($name, $age) = $row;
    $insert_stmt->execute();
}
fclose($fh);
It's counter-intuitive that you do the binding outside the loop, but this is one reason why this method is so fast: because the variables are bound by reference, you're basically saying "execute this pre-compiled query using whatever is currently in these variables", so the data doesn't even have to be moved around internally. You also want to avoid re-parsing the query, which is the problem with something like "insert into table (name) values ('$name')": every query sends the entire text string to the database to be re-parsed.
One more thing to speed it up -- wrap the whole loop in a transaction, then commit the transaction when the loop is finished.

From the SQLite FAQ:
Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe. For details, read about atomic commit in SQLite.
[...]
Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.
I'd say this last paragraph is what you need.
EDIT: Not sure about this, but I believe using sqlite_unbuffered_query() should do the trick.
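If you do go the pragma route, a minimal sketch of issuing it from PDO before the import (this trades durability for speed, exactly as the FAQ warns):
<?php
$dbh = new PDO('sqlite:mydb.sdb');
// don't wait for each write to reach the disk surface; a power loss
// mid-import can corrupt the file, so only use this for re-runnable bulk loads
$dbh->exec('PRAGMA synchronous = OFF');
$dbh->beginTransaction();
// ... run the inserts from the answers above ...
$dbh->commit();
?>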

Related

Table only populating with one value from array

I have an array from which I would like to populate table records, but unfortunately it will only populate the first record of the array. I suspect I have my increments declared incorrectly, but cannot find a combination that will work. In addition, I would like the last part of the ID in $mtcelogID = $siteNAME.'.'.$Maindate.'.'.$i; to increment.
$mtcelogARRAY = $objPHPExcel->setActiveSheetIndex(2)->rangeToArray('A8:A18');
$num_mtcelog = count($mtcelogARRAY); // Here get total count of row in that Excel sheet
for ($i = 0; $i <= $num_mtcelog; $i++) {
    $sql_mtcelog = "INSERT INTO `maintenance_log`(`mtcelogID`,`mtcelogTYPE`,`MaintenanceID`) VALUES (?,?,?)";
    $query_mtcelogARRAY = mysqli_prepare($link, $sql_mtcelog);
    $mtcelogID = $siteNAME.'.'.$Maindate.'.'.$i;
    mysqli_stmt_bind_param($query_mtcelogARRAY, "sss", $mtcelogID, $mtcelogARRAY[$i][0], $MaintenanceID);
    mysqli_stmt_execute($query_mtcelogARRAY);
    mysqli_stmt_close($query_mtcelogARRAY);
}
The above code returns this in my PHP table:
And my array looks like this:
Thanks in advance
I know you're using mysqli, but I'm going to leave this PDO answer here. If this code is a small maintenance script, there shouldn't be any trouble dumping mysqli. Notice that no binding is necessary; you just pass the values as an array to PDOStatement::execute(), with no worrying about how many s and i type specifiers you have. Also, foreach is a much more flexible and less verbose construct than for.
$pdo = new PDO("mysql:host=localhost;dbname=mydatabase", $username, $password);
$mtcelogARRAY = $objPHPExcel->setActiveSheetIndex(2)->rangeToArray('A8:A18');
$sql_mtcelog = "INSERT INTO `maintenance_log`(`mtcelogID`,`mtcelogTYPE`,`MaintenanceID`) VALUES (?,?,?)";
$stmt = $pdo->prepare($sql_mtcelog);
foreach ($mtcelogARRAY as $i=>$arr) {
$params = ["$siteNAME.$Maindate.$i", $arr[0], $MaintenanceID];
$stmt->execute($params);
}
The important thing is to prepare your statement outside the loop. One of the main goals of prepared statements is to reduce overhead; by preparing the statement repeatedly you are increasing overhead.

How to optimize the number of execute calls when inserting data with PDO?

There is a huge two-dimensional array containing 500k one-dimensional sub-arrays, and every sub-array contains 5 elements.
Now it is my job to insert all the data into a SQLite database.
function insert_data($array) {
    global $db;
    $dbh = new PDO("sqlite:{$db}");
    $sql = "INSERT INTO quote (f1,f2,f3,f4,f5) VALUES (?,?,?,?,?)";
    $query = $dbh->prepare($sql);
    foreach ($array as $item) {
        $query->execute(array_values($item));
    }
    $dbh = null;
}
I want to optimize the data insert process: the execute action will be run 500k times. How can I make it run just once?
The idea is to avoid running a separate transaction for each insert, because that will be very slow indeed. So just start and commit a transaction, say, every 10k records.
$dbh->beginTransaction();
$counter = 0;
foreach ($array as $item) {
    $query->execute(array_values($item));
    if ($counter++ % 10000 == 0) {
        $dbh->commit();
        $dbh->beginTransaction();
    }
}
$dbh->commit();
Another option is to write the array out to a CSV file and then just import it.
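A rough sketch of that route, assuming the sqlite3 command-line client is installed on the machine (the temp file path is arbitrary; $db and the quote table come from the question):
<?php
// dump the array to CSV with fputcsv ...
$fh = fopen('/tmp/quote.csv', 'w');
foreach ($array as $item) {
    fputcsv($fh, array_values($item));
}
fclose($fh);
// ... then let the sqlite3 CLI do the bulk load
$cmd = 'printf ".mode csv\n.import /tmp/quote.csv quote\n" | sqlite3 ' . escapeshellarg($db);
exec($cmd, $output, $status);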
If you are using a newer version of SQLite (3.7.11+), it supports batch inserts:
INSERT INTO quote (f1,f2,f3,f4,f5) VALUES
(?,?,?,?,?),
(?,?,?,?,?),
(?,?,?,?,?);
You can use this to chunk your array into groups, and do batch inserts this way.
As pointed out by Axalix, you should also wrap the whole operation in a transaction.
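A sketch of that chunking, reusing the table and column names from the question; the chunk size of 100 is arbitrary (100 rows x 5 placeholders = 500 bound values, which stays under SQLite's default 999-variable limit on older builds):
$dbh = new PDO("sqlite:{$db}");
$dbh->beginTransaction();
foreach (array_chunk($array, 100) as $chunk) {
    // one (?,?,?,?,?) group per row in this chunk
    $placeholders = implode(',', array_fill(0, count($chunk), '(?,?,?,?,?)'));
    $stmt = $dbh->prepare("INSERT INTO quote (f1,f2,f3,f4,f5) VALUES $placeholders");
    $params = array();
    foreach ($chunk as $item) {
        $params = array_merge($params, array_values($item));
    }
    $stmt->execute($params);
}
$dbh->commit();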

Seemingly identical sql queries in php, but one inserts an extra row

I generate the below query in two ways, but use the same function to insert into the database:
INSERT INTO person VALUES('','john', 'smith','new york', 'NY', '123456');
The method below results in CORRECT inserts, with no extra blank row in the SQL database:
foreach($_POST as $item)
$statement .= "'$item', ";
$size = count($statement);
$statement = substr($statement, 0, $size-3);
$statement .= ");";
The code below should generate a query identical to the one above (they echo identically), but when I use it, an extra blank row (with an id) is inserted into the database after the correct row with data, so two rows are inserted each time.
$mytest = "INSERT INTO person VALUES('','$_POST[name]', '$_POST[address]','$_POST[city]', '$_POST[state]', '$_POST[zip]');";
Because I need to run validations on posted items from the form, and need to do some manipulations before storing it into the database, I need to be able to use the second query method.
I can't understand how the two could be different. I'm using the exact same functions to connect and insert into the database, so the problem can't be there.
below is my insert function for reference:
function do_insertion($query) {
    $db = get_db_connection();
    if (!($result = mysqli_query($db, $query))) {
        #die('SQL ERROR: '. mysqli_error($db));
        write_error_page(mysqli_error($db));
    } #end if
}
Thank you for any insight/help on this.
Using your $_POST directly in your query opens you up to a lot of bad things; it's just bad practice. You should at least do something to clean your data before it goes to your database.
The $_POST variable can often contain additional values depending on the browser and the form submit. Have you tried doing a null/empty check in your foreach?
!~ Pseudo Code DO NOT USE IN PRODUCTION ~!
foreach ($_POST as $item)
{
    if (isset($item) && $item != "")
    {
        $statement .= "'$item', ";
    }
}
$statement = substr($statement, 0, -2); // trim the trailing comma and space
$statement .= ");";
Please read #tadman's comment about using bind_param and protecting yourself against SQL injection. For the sake of answering your question: it's likely your $_POST contains empty data that is being put into your query and resulting in the added row.
As #yycdev stated, you are at risk of SQL injection. Start by reading this and rewrite your code to properly protect your database. SQL injection is not fun and will produce many bugs.

MySQL insert using array

I have an array stored in a variable $contactid. I need to run this query to insert a row for each contact_id in the array. What is the best way to do this? Here is the query I need to run...
$contactid=$_POST['contact_id'];
$eventid=$_POST['event_id'];
$groupid=$_POST['group_id'];
$query="INSERT INTO attendance (event_id,contact_id,group_id) VALUES ('$eventid','$contactid','$groupid')";
mysql_query($query);
Use a foreach loop.
$query = "INSERT INTO attendance (event_id,contact_id,group_id) VALUES ";
foreach($contactid as $value)
{
$query .= "('{$eventid}','{$value}','{$groupid}'),";
}
mysql_query(substr($query, 0, -1));
The idea here is to concatenate your query string and make only one query to the database; each value set is separated by a comma.
Since no one has stated it yet: you actually cannot do this:
$query = '
INSERT INTO [Table] ([Column List])
VALUES ([Value List 1]);
INSERT INTO [Table] ([Column List])
VALUES ([Value List 2]);
';
mysql_query($query);
as this is blocked in the mysql_query code precisely to prevent SQL injection. You cannot have a semicolon within the query param given to mysql_query, with the following exception, taken from the manual comments:
The documentation claims that "multiple queries are not supported". However, multiple queries seem to be supported. You just have to pass flag 65536 as mysql_connect's 5th parameter (client_flags). This value is defined in /usr/include/mysql/mysql_com.h:
#define CLIENT_MULTI_STATEMENTS (1UL << 16) /* Enable/disable multi-stmt support */
Executed with multiple queries at once, the mysql_query function will return a result only for the first query. The other queries will be executed as well, but you won't have a result for them.
That is undocumented and unsupported behaviour, however, and easily opens your code to SQL injections. What you can do with mysql_query, instead, is
$query = '
INSERT INTO [Table] ([Column List])
VALUES ([Value List 1])
, ([Value List 2])
[...]
, ([Value List N])
';
mysql_query($query);
so you can actually insert multiple rows with one query, and with one insert statement. In this answer there's a code example for it which doesn't concatenate to a string in a loop, which is better than what's suggested in this thread.
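The linked example isn't reproduced here, but the idea is roughly this: collect the row groups in an array and implode them once, instead of concatenating inside the loop (values still need escaping, since mysql_* has no placeholders):
$rows = array();
foreach ($contactid as $value) {
    $rows[] = "('" . mysql_real_escape_string($eventid) . "','"
            . mysql_real_escape_string($value) . "','"
            . mysql_real_escape_string($groupid) . "')";
}
$query = "INSERT INTO attendance (event_id,contact_id,group_id) VALUES " . implode(', ', $rows);
mysql_query($query);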
However, disregarding all the above, you're probably still better off using a prepared statement, like
$stmt = $mysqli->prepare("INSERT INTO mytbl (fld1, fld2, fld3, fld4) VALUES (?, ?, ?, ?)"); // $mysqli is your connection
foreach ($myarray as $row)
{
    $stmt->bind_param('idsb', $row['fld1'], $row['fld2'], $row['fld3'], $row['fld4']);
    $stmt->execute();
}
$stmt->close();
Use something like the following. Please note that you shouldn't be using mysql_* functions anymore, and that your code is susceptible to injection.
for ($i = 0; $i < count($contactid); $i++) {
$query="INSERT INTO attendance (event_id,contact_id,group_id) VALUES ('$eventid','$contactid[$i]','$groupid')";
mysql_query($query);
}
I'm not sure running multiple queries is the best thing to do, so I won't recommend a for loop that runs one query for each element of the array. I would rather build the new elements into a single string that then gets passed to one query. If you can give us a short example of your DB structure and how you'd like it to look (i.e. how the array should go into the table), I could give you an example loop.
Cheers!
What about:
$contactIds = $_POST['contact_id'];
$eventIds = $_POST['event_id'];
$groupIds = $_POST['group_id'];
foreach ($contactIds as $key => $value)
{
    $currentContactId = $value;
    $currentEventId = $eventIds[$key];
    $currentGroupId = $groupIds[$key];
    $query = "INSERT INTO attendance (event_id,contact_id,group_id) VALUES ('$currentEventId','$currentContactId','$currentGroupId')";
    mysql_query($query);
}
Well, you could refactor that to insert everything in a single query, but you get the idea.

This code needs to loop over 3.5 million rows, how can I make it more efficient?

I have a csv file that has 3.5 million codes in it.
I should point out that this is only EVER going to be run once.
The csv looks like
age9tlg,
rigfh34,
...
Here is my code:
ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");
$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
            }
        }
    } else {
        if (!empty($line_of_text)) {
            mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
        }
    }
}
fclose($file_handle);
Is this code going to die part way through on me?
Will my memory and max execution time be high enough?
NB:
This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.
Update:
here is another possible implementation.
This one does it in bulk inserts of 2000 records
$file_handle = fopen("Weekly.csv", "r");
$i = 0;
$vals = array();
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                if ($i < 2000) {
                    $vals[] = "('$col', '')";
                    $i++;
                } else {
                    $vals = implode(', ', $vals);
                    mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
                    $vals = array();
                    $i = 0;
                }
            }
        }
    } else {
        if (!empty($line_of_text)) {
            if ($i < 2000) {
                $vals[] = "('$line_of_text', '')";
                $i++;
            } else {
                $vals = implode(', ', $vals);
                mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
                $vals = array();
                $i = 0;
            }
        }
    }
}
fclose($file_handle);
If I was to use this method, what is the highest value I could set it to insert at once?
Update 2
So, I've found I can use
LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv' INTO TABLE `action_6_weekly` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY ','(`code`)
but the issue now is that I was wrong about the CSV format; it is actually 4 codes and then a line break, so:
fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...
so I need to be able to specify two LINES TERMINATED BY values.
This question has been branched out to Here.
Update 3
Setting it to do bulk inserts of 20k rows, using
while (!feof($file_handle)) {
    $val[] = fgetcsv($file_handle);
    $i++;
    if ($i == 20000) {
        //do insert
        //set $i = 0;
        //$val = array();
    }
}
//do insert (for last few rows that dont reach 20k)
But it dies at this point because, for some reason, $val contains 75k rows. Any idea why?
Note that the above code is simplified.
I doubt this will be the popular answer, but I would have your PHP application run mysqlimport on the CSV file. Surely it is optimized far beyond what you will do in PHP.
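For example, roughly like this (user, password, and database name are placeholders; mysqlimport takes the table name from the file name, and this sketch assumes the file's fields map onto the table's columns):
<?php
// mysqlimport derives the table name from the file name,
// so copy/rename the CSV first
copy('Weekly.csv', '/tmp/action_6_weekly.csv');
$cmd = 'mysqlimport --local --fields-terminated-by=, '
     . '--user=USER --password=PASS your_database /tmp/action_6_weekly.csv';
exec($cmd, $output, $status);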
Is this code going to die part way through on me? Will my memory and max execution time be high enough?
Why don't you try and find out?
You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
Note that MySQL supports delayed and multiple row insertion:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
http://dev.mysql.com/doc/refman/5.1/en/insert.html
Make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes after you've done all the inserts).
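For instance, dropping the index up front and recreating it afterwards might look like this (the index name idx_code is made up for illustration):
mysql_query("ALTER TABLE `action_6_weekly` DROP INDEX `idx_code`"); // hypothetical index name
// ... run all of the inserts ...
mysql_query("ALTER TABLE `action_6_weekly` ADD INDEX `idx_code` (`code`)");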
Rather than create a new SQL statement in each iteration of the loop, prepare the SQL statement outside the loop and execute that prepared statement with parameters inside the loop. Depending on the database this can be heaps faster.
I've done the above when importing a large Access database into Postgres using Perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted Perl to enforce some rules when inserting.
You should accumulate the values and insert them into the database all at once at the end, or in batches every x records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.
Also, you should run this on the command line, where you won't need to worry about execution time limits.
The real answer though is evilclown's answer, importing to MySQL from CSV is already a solved problem.
I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
2 possible ways.
1) Batch the process, then have a scheduled job import the file while updating a status. This way, you can have a page that keeps checking the status and refreshes itself if the status is not yet 100%. Users will have a live update of how much has been done. But for this you need access to the OS to be able to set up the scheduled task, and the task will be running idle when there is nothing to import.
2) Have the page handle 1000 rows (or any N number of rows... you decide), then send JavaScript to the browser to refresh itself with a new parameter telling the script to handle the next 1000 rows. You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops.
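A very rough sketch of option 2, using an offset parameter; all names here are illustrative, and re-reading the file on every request is wasteful but keeps the sketch short:
<?php
$offset = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;
$batch  = 1000; // lines per request
$lines  = file('Weekly.csv');
$total  = count($lines);
foreach (array_slice($lines, $offset, $batch) as $line) {
    foreach (explode(',', $line) as $code) {
        $code = trim($code);
        if ($code !== '') {
            mysql_query("insert into `action_6_weekly` Values('" . mysql_real_escape_string($code) . "', '')");
        }
    }
}
if ($offset + $batch < $total) {
    echo "Processed " . ($offset + $batch) . " of $total lines...";
    echo "<script>location.href = '?offset=" . ($offset + $batch) . "';</script>";
} else {
    echo "Done.";
}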
