Insert loop from CSV bug using PHP PDO

I'm trying to insert data (160,000+ rows) using INSERT INTO and PHP PDO, but I have a bug.
When I launch the PHP script, I see more rows inserted in my database than there are lines in my CSV.
Can someone tell me if my loop is incorrect or something?
Here is the code I have:
$bdd = new PDO('mysql:host=<myhost>;dbname=<mydb>', '<user>', '<pswd>');

// I clean the table
$req = $bdd->prepare("TRUNCATE TABLE lbppan_ticket_reglements;");
$req->execute();

// I read and import the CSV file line by line
$handle = fopen('<pathToMyCsvFile>', "r");
while (($data = fgetcsv($handle, 0, ',')) !== FALSE) {
    $reqImport =
        "INSERT INTO lbppan_ticket_reglements
        (<my31Columns>)
        VALUES
        ('$data[0]','$data[1]','$data[2]','$data[3]','$data[4]','$data[5]','$data[6]','$data[7]','$data[8]',
         '$data[9]','$data[10]','$data[11]','$data[12]','$data[13]','$data[14]','$data[15]','$data[16]',
         '$data[17]','$data[18]','$data[19]','$data[20]','$data[21]','$data[22]','$data[23]','$data[24]',
         '$data[25]','$data[26]','$data[27]','$data[28]','$data[29]','$data[30]')";
    $req = $bdd->prepare($reqImport);
    $req->execute();
}
fclose($handle);
The script partly works, because the data does end up in the table, but I don't know why it misbehaves and inserts extra rows. I think that, due to the file size (18 MB), the script might crash and restart, inserting the same rows again.
I can't use LOAD DATA on the server I'm using.
Thanks for your help.

This is not an answer, but adding this much into comments is quite tricky.
Start by upping the maximum execution time.
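For example, something along these lines at the top of the script (a minimal sketch; 600 seconds and 512M are just values to try, not recommendations):
// Raise the limits for this one long-running import script.
ini_set('max_execution_time', 600); // or set_time_limit(600);
ini_set('memory_limit', '512M');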
If that does not solve your issue, start working your way through the code line by line and handle every exception you can think of. For example, you are truncating the table, BUT you say you end up with more data than expected after execution; could the truncate be failing?
try {
    $req = $bdd->prepare("TRUNCATE TABLE lbppan_ticket_reglements;");
    $req->execute();
} catch (\Exception $e) {
    exit($e->getMessage()); // Die immediately for ease of reading
}
Not the most graceful of try/catches, but it will allow you to easily spot a problem. You can also apply this to the subsequent query...
try {
    $req = $bdd->prepare($reqImport);
    $req->execute();
} catch (\Exception $e) {
    exit($e->getMessage());
}
and also stick in some diagnostics: are you actually inserting 160k rows? You could optionally echo out $i on each loop and see if you can spot any breaks or abnormalities.
$i = 0;
while (($data = fgetcsv($handle, 0, ',')) !== FALSE) {
    // ... your stuff
    $i++;
}
echo "Rows inserted " . $i . "\n\n";
Going beyond that, you can have the loop print out the SQL for you to look at manually; perhaps it is doing something weird and fruity.
Hope that helps.

Assuming $data[0] is the unique identifier, you can try this to spot the offending row(s):
$i = 0;
while (($data = fgetcsv($handle, 0, ',')) !== FALSE) {
    echo 'Row #' . ++$i . ' - ' . $data[0];
}
Since you are not using bound parameters in your prepared statements, it is very possible that one of the $data array items is causing a double insert or some other unknown issue.
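As a side note, here is a minimal sketch of what a parameterised version of the original loop could look like, reusing the <my31Columns> and connection placeholders from the question; the statement is prepared once and re-executed per row, and PDO handles the quoting of the CSV values:
// Sketch only: prepare once with 31 placeholders, execute per CSV row.
$bdd = new PDO('mysql:host=<myhost>;dbname=<mydb>', '<user>', '<pswd>');
$bdd->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$placeholders = implode(',', array_fill(0, 31, '?'));
$stmt = $bdd->prepare("INSERT INTO lbppan_ticket_reglements (<my31Columns>) VALUES ($placeholders)");

$handle = fopen('<pathToMyCsvFile>', 'r');
while (($data = fgetcsv($handle, 0, ',')) !== FALSE) {
    // Bind the 31 CSV fields directly; PDO takes care of escaping.
    $stmt->execute(array_slice($data, 0, 31));
}
fclose($handle);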

Use of Complex (curly) Syntax in external file

I am having a problem getting an SQL query to interpolate as I would want, and would be grateful for some help please.
Within the manual page for pg_query_params, there is a code example for pg_query() passing a variable using curly braces. This appeared to be exactly what I need for my task. So, my code is as follows:
$fh = fopen('/home/www/KPI-Summary.sql', "r")
    or die("Problem opening SQL file.\n");
$dbh = pg_connect("$connect")
    or die('Could not connect: ' . pg_last_error());
$j = 0;
while (($line = fgets($fh)) !== false) {
    $tmp[$j] = array();              // Initialise temporary storage.
    $result = pg_query($dbh, $line); // Process the line read.
    if (!$result) { echo "Error: query did not execute"; }
    ...
    while ($row = pg_fetch_row($result)) { // Read SQL result.
        $tmp[$j][2][] = $row;
    }
    $j++;
}
fclose($fh);
The SQL file contains several queries, one per line, like this:
SELECT count(*) from table WHERE value=0 AND msgid='{$arg[1]}';
However, currently, my variable is not being replaced by its contents -- and therefore although the query runs OK, it returns zero rows. What do I need to do to get the expected result? (Note: each SQL line varies, and the query parameters are not constant -- hence the use of variable(s) within the SQL file.)
OK, I have a solution (although it might not be the correct approach).
This works -- but it needs polish, I think. Suggestions regarding a better regexp would be very much appreciated.
$bar = 'VALUE-A';    // Can we replace simple variable names?
$arg[1] = 'VALUE-B'; // What about an array, such as $arg[1]?

function interpolate($matches) {
    global $bar;
    global $arg;
    if ($matches[2]) {
        $i = isset(${$matches[1]}[$matches[2]]) ? ${$matches[1]}[$matches[2]] : 'UNDEF';
    } else {
        $i = isset(${$matches[1]}) ? ${$matches[1]} : 'UNDEF';
    }
    return $i;
}

$fh = fopen('/home/www/file.sql', "r") or die("Failed.\n");
while (($line = fgets($fh)) !== false) {
    ...
    $line = preg_replace_callback('|\{\$([a-z]+)\[*(\d*)\]*}|i', "interpolate", $line);
    echo $line; // and continue with rest of code as above.
}
fclose($fh);
(Of course, the solution suggests that the question title is completely wrong. Is there any way to edit this?)
Did you use pg_escape_string?
$arg[1] = pg_escape_string($arg[1]);
$line="SELECT count(*) from table WHERE value=0 AND msgid='{$arg[1]}';";
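Alternatively (an assumption on my part, not from the original answers): since the manual page mentioned in the question is pg_query_params(), the value could be passed as a parameter instead of being interpolated into the SQL text, provided the SQL file is rewritten to use $1-style placeholders:
// Sketch: the SQL file uses $1 placeholders instead of {$arg[1]},
// e.g. SELECT count(*) FROM table WHERE value=0 AND msgid=$1;
$dbh = pg_connect($connect) or die('Could not connect: ' . pg_last_error());
$fh  = fopen('/home/www/KPI-Summary.sql', 'r') or die("Problem opening SQL file.\n");

while (($line = fgets($fh)) !== false) {
    // Parameters are sent separately, so no escaping or interpolation is needed.
    $result = pg_query_params($dbh, $line, array($arg[1]));
    if (!$result) {
        echo 'Error: ' . pg_last_error($dbh) . "\n";
    }
}
fclose($fh);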

How can I import a CSV file into a MySQL database more efficiently with PHP?

Let me explain: I have a Symfony2 project and I need to import users via a CSV file into my database. I have to do some work on the data before importing it into MySQL. I created a service for this and everything works fine, but it takes too much time to execute and slows down my server if I give it the entire file. My files usually have between 500 and 1500 rows, so I have to split each file into ~200-row chunks and import them one by one.
I need to handle related users that can be in the file and/or already in the database. Related users are usually a parent of a child.
Here is my simplified code:
$validator = $this->validator;
$members = array();
$children = array();
$mails = array();
$handle = fopen($filePath, 'r');
$datas = fgetcsv($handle, 0, ";");
while (($datas = fgetcsv($handle, 0, ";")) !== false) {
    $user = new User();
    // If there is a related user
    if ($datas[18] != "") {
        $user->setRelatedMemberEmail($datas[18]);
        $relation = array_search(ucfirst(strtolower($datas[19])), UserComiti::$RELATIONSHIPS);
        if ($relation !== false)
            $user->setParentRelationship($relation);
    } else {
        $user->setRelatedMemberEmail($datas[0]);
        $user->addRole("ROLE_MEMBER");
    }
    $user->setEmail($mail);
    $user->setLastName($lastName);
    $user->setFirstName($firstName);
    $user->setGender($gender);
    $user->setBirthdate($birthdate);
    $user->setCity($city);
    $user->setPostalCode($zipCode);
    $user->setAddressLine1($adressLine1);
    $user->setAddressLine2($adressLine2);
    $user->setCountry($country);
    $user->setNationality($nationality);
    $user->setPhoneNumber($phone);

    // Entity validation
    $listErrors = $validator->validate($user);

    // In case of errors
    if (count($listErrors) > 0) {
        foreach ($listErrors as $error) {
            $nbError++;
            $errors .= "Line " . $line . " : " . $error->getMessage() . "\n";
        }
    } else {
        if ($mailParent != null)
            $children[] = $user;
        else {
            $members[] = $user;
            $nbAdded++;
        }
    }
    foreach ($members as $user) {
        $this->em->persist($user);
        $this->em->flush();
    }
    foreach ($children as $child) {
        // If the related user is already in the DB
        $parent = $this->userRepo->findOneBy(array('username' => $child->getRelatedMemberEmail(), 'club' => $this->club));
        if ($parent !== false) {
            // Check if someone related to the related user already has the same last name and first name.
            // If so, we can guess that this user has already been created.
            $testName = $this->userRepo->findByParentAndName($child->getFirstName(), $child->getLastName(), $parent, $this->club);
            if (!$testName) {
                $child->setParent($parent);
                $this->em->persist($child);
                $nbAdded++;
            } else
                $nbSkipped++;
        }
        // Else, in case the related user is neither in the file nor in the database,
        // we create a fake one that will be able to update his profile later.
        else {
            $newParent = clone $child;
            $newParent->setUsername($child->getRelatedMemberEmail());
            $newParent->setEmail($child->getRelatedMemberEmail());
            $newParent->setFirstName('Unknown');
            $this->em->persist($newParent);
            $child->setParent($newParent);
            $this->em->persist($child);
            $nbAdded += 2;
            $this->em->flush();
        }
    }
}
It's not my whole service because I don't think the rest would help here, but if you need more information, ask me.
While I do not have the means to quantitatively determine the bottlenecks in your program, I can suggest a couple of guidelines that will likely significantly increase its performance.
Minimize the number of database commits you are making. A lot happens when you write to the database. Is it possible to commit only once at the end? (A batching sketch follows these points.)
Minimize the number of database reads you are making. Similar to the previous point, a lot happens when you read from the database.
If after considering the above points you still have issues, determine what SQL the ORM is actually generating and executing. ORMs work great until efficiency becomes a problem and more care needs to go into ensuring optimal queries are being generated. At this point, becoming more familiar with the ORM and SQL would be beneficial.
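For illustration, here is a minimal sketch of what batched writes could look like with Doctrine, applied to the $members loop from the question ($batchSize is an invented value; tune it to your data):
$batchSize = 100; // illustrative value
$i = 0;
foreach ($members as $user) {
    $this->em->persist($user);
    if ((++$i % $batchSize) === 0) {
        $this->em->flush(); // one write round trip for the whole batch
        $this->em->clear(); // detach managed entities to keep memory flat (sketch; adjust if you still need them)
    }
}
$this->em->flush(); // flush the remaining entities once at the end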
You don't seem to be working with too much data, but if you were, MySQL alone supports reading CSV files.
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.
https://dev.mysql.com/doc/refman/5.7/en/load-data.html
You may be able to access this MySQL-specific feature through your ORM, but if not, you would need to write some plain SQL to use it. Since you need to modify the data you are reading from the CSV, you would likely be able to do this very, very quickly by following these steps (a rough sketch follows the list):
Use LOAD DATA INFILE to read the CSV into a temporary table.
Manipulate the data in the temporary table and other tables as required.
SELECT the data from the temporary table into your destination table.
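A rough sketch of those three steps in plain SQL, with the table and column names invented for the example (users_import as the staging table, users as the destination):
-- 1) Bulk-load the raw CSV into a staging table (columns invented for the example).
CREATE TEMPORARY TABLE users_import (email VARCHAR(255), last_name VARCHAR(255), first_name VARCHAR(255), related_email VARCHAR(255));
LOAD DATA INFILE '/path/to/users.csv'
    INTO TABLE users_import
    FIELDS TERMINATED BY ';' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;
-- 2) Manipulate the staged data as required, e.g. normalise the emails.
UPDATE users_import SET email = LOWER(email);
-- 3) Copy the cleaned rows into the destination table.
INSERT INTO users (email, last_name, first_name)
    SELECT email, last_name, first_name FROM users_import;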
I know that it is a very old topic, but some time ago I created a bundle which can help import entities from CSV to a database. So maybe if someone sees this topic, it will be helpful for them.
https://github.com/jgrygierek/BatchEntityImportBundle
https://github.com/jgrygierek/SonataBatchEntityImportBundle

phpactiverecord ORM transactions: 30 sec execution time error

Well, I'm trying to insert several rows from a CSV file into a MySQL DB. My first attempt (a wrong approach) was to insert by creating a new object with $o = new Model();.
After reading and researching on the web, I saw that what I need is to use a transaction. Right now I'm using the phpactiverecord ORM and this is my code, but I'm still getting the 30 sec fatal error:
try {
    if (($handle = fopen("somefile.csv", "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            $someid = $data[4];
            Usuario::transaction(function () use ($someid) {
                Usuario::create(array("matricula" => $someid));
            });
        }
        fclose($handle);
    }
} catch (Exception $e) {
    // Catch block not shown in the original post; added so the try is valid PHP.
    echo $e->getMessage();
}
I think I'm coding the transaction the wrong way, but I don't see how to do it properly; I need some help. Actually the insert is working; what I need is for all of it to be inserted before the 30 sec error happens. My database is on GoDaddy, by the way.
Thanks.
EDIT -
This was solved with the set_time_limit function; it was not a transaction problem. Maybe this question can help someone else, so I will leave it up.
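For reference, here is a minimal sketch of that fix, combined with a single transaction wrapping the whole loop instead of one per row (my own variation on the Usuario::transaction() callback style used above, not the poster's final code):
set_time_limit(0); // lift the 30-second execution limit for this script

Usuario::transaction(function () {
    if (($handle = fopen("somefile.csv", "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            // All rows are created inside one transaction and committed together.
            Usuario::create(array("matricula" => $data[4]));
        }
        fclose($handle);
    }
});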

Bulk data insert SQL issue

I am importing a huge amount of data from a CSV file into my database, but the issue is that if there is an error in the SQL, my insertion stops, making my bulk insertion useless. I have to go back, delete the uploaded data, remove the entry that is causing the issue, and start again. I want the program to skip the offending row and continue inserting. I know I have to apply try and catch, and I have applied it in my algorithm, but I can't understand how to use it so that the insertion continues.
Here is my code:
$num = 35;   // number of columns
$dum = true; // a check
$sum = 0;    // count total entries
if (($handle = fopen($_FILES['file']['name'], 'r')) !== FALSE) {
    while (($data = fgetcsv($handle, 10000, ',')) !== FALSE) {
        if ($dum) {
            // searching the columns exact position in case they have been changed
            for ($qwe = 0; $qwe < $num; $qwe++) {
                if ($data[$qwe] == 'ID') {
                    $a = $qwe;
                }
                .
                .
                .
                .
                .
                else if ($data[$qwe] == 'PowerMeterSerial') {
                    $aa = $qwe;
                } else if ($data[$qwe] == 'Region') {
                    $ab = $qwe;
                } else if ($data[$qwe] == 'Questions') {
                    $ac = $qwe;
                }
            }
        }
        if (!$dum) {
            for ($qwe = 0; $qwe < $num; $qwe++) {
                if ($qwe == 7 || $qwe == 8) {
                    if ($qwe == 7) { $asd = $data[$qwe]; }
                    $data[$qwe] = date('Y-m-d h-i-s', strtotime($data[$qwe]));
                }
            }
            $data[57] = date('Y-m-d ', strtotime($asd));
            try {
                $sql = "INSERT INTO pm (ID, .......... Questions, dateofdata, Unsuccessful) VALUES ('$data[$a]','$data[$b]','$data[$c]','$data[$d]','$data[$e]','$data[$f]','$data[$g]','$data[$h]','$data[$i]','$data[$j]','$data[$k]','$data[$l]','$data[$m]','$data[$n]','$data[$o]','$data[$p]','$data[$q]','$data[$r]','$data[$s]','$data[$t]','$data[$u]','$data[$v]','$data[$w]','$data[$x]','$data[$y]','$data[$z]','$data[$aa]','$data[$ab]','$data[$ac]','$data[57]','$data[30]')";
                if (!mysql_query($sql, $con)) {
                    die('Error: ' . mysql_error() . $sql);
                }
            } catch (Exception $e) {
                echo '<br>';
                echo $sql;
            }
            $sum++;
        }
        $dum = false;
    }
}
?>
Kindly note there is no issue with the uploading algorithm or the SQL itself; it's when the input data does not match the column's data type that SQL generates an error, and that is what I am trying to handle with try and catch. Please help.
Change your die() command to a print(). You'll see what the error was, and the script will move on to the next line.
Given the structure of your code, I'm guessing it'll blow up any time you're inserting a string (particularly one with quotes inside it), causing SQL syntax errors. You MUST pass each text field from your CSV through mysql_real_escape_string() BEFORE you insert those values into the query string.
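Putting both suggestions together, a rough sketch of the insert step inside your while loop (keeping the old mysql_* API and the elided column list from the question):
// Escape every field before building the query, then log failures instead of dying.
$data = array_map('mysql_real_escape_string', $data);

$sql = "INSERT INTO pm (ID, .......... Questions, dateofdata, Unsuccessful) VALUES ('$data[$a]', .......... '$data[30]')";
if (!mysql_query($sql, $con)) {
    // Print the error and keep looping instead of stopping the whole import.
    print('Error: ' . mysql_error() . ' in: ' . $sql . '<br>');
    continue;
}
$sum++;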

This code needs to loop over 3.5 million rows, how can I make it more efficient?

I have a CSV file that has 3.5 million codes in it.
I should point out that this is only ever going to be run once.
The CSV looks like:
age9tlg,
rigfh34,
...
Here is my code:
ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");
$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
            }
        }
    } else {
        if (!empty($line_of_text)) {
            mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
        }
    }
}
fclose($file_handle);
Is this code going to die part way through on me?
Will my memory and max execution time be high enough?
NB:
This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.
Update:
Here is another possible implementation.
This one does the inserts in batches of 2000 records:
$file_handle = fopen("Weekly.csv", "r");
$i = 0;
$vals = array();
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                if ($i < 2000) {
                    $vals[] = "('$col', '')";
                    $i++;
                } else {
                    $vals = implode(', ', $vals);
                    mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
                    $vals = array();
                    $i = 0;
                }
            }
        }
    } else {
        if (!empty($line_of_text)) {
            if ($i < 2000) {
                $vals[] = "('$line_of_text', '')";
                $i++;
            } else {
                $vals = implode(', ', $vals);
                mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
                $vals = array();
                $i = 0;
            }
        }
    }
}
fclose($file_handle);
If I were to use this method, what is the highest value I could set it to insert at once?
Update 2
So, I've found I can use:
LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv' INTO TABLE `action_6_weekly` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY ','(`code`)
but the issue now is that I was wrong about the CSV format; it is actually 4 codes and then a line break, so:
fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...
so I need to be able to specify two LINES TERMINATED BY values.
This question has been branched out to Here.
Update 3
Setting it to do bulk inserts of 20k rows, using:
while (!feof($file_handle)) {
    $val[] = fgetcsv($file_handle);
    $i++;
    if ($i == 20000) {
        // do insert
        // set $i = 0;
        // $val = array();
    }
}
// do insert (for the last few rows that don't reach 20k)
but it dies at this point because, for some reason, $val contains 75k rows; any idea why?
Note that the above code is simplified.
I doubt this will be the popular answer, but I would have your PHP application run mysqlimport on the CSV file. Surely it is optimized far beyond what you will do in PHP.
is this code going to die part way through on me? will my memory and max execution time be high enough?
Why don't you try and find out?
You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
Note that MySQL supports delayed and multiple row insertion:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
http://dev.mysql.com/doc/refman/5.1/en/insert.html
Make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes after you've done all the inserts).
Rather than creating a new SQL statement on each iteration of the loop, try to prepare the SQL statement outside the loop, and execute that prepared statement with parameters inside the loop. Depending on the database this can be heaps faster.
I've done the above when importing a large Access database into Postgres using Perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted Perl to enforce some rules when inserting.
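In PHP terms, here is a minimal sketch of that prepare-once/execute-many pattern, using PDO rather than the mysql_* functions from the question purely for illustration (connection details are placeholders):
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // placeholder credentials
$stmt = $pdo->prepare("INSERT INTO `action_6_weekly` VALUES (?, '')"); // prepared once, outside the loop

$file_handle = fopen("Weekly.csv", "r");
while (($line_of_text = fgetcsv($file_handle)) !== FALSE) {
    foreach ($line_of_text as $col) {
        if (!empty($col)) {
            $stmt->execute(array($col)); // re-executed with a new parameter each time
        }
    }
}
fclose($file_handle);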
You should accumulate the values and insert them into the database all at once at the end, or in batches every x records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.
Also, you should run this on the command line, where you won't need to worry about execution time limits.
The real answer, though, is evilclown's answer: importing to MySQL from CSV is already a solved problem.
I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
Two possible ways:
1) Batch the process, then have a scheduled job import the file while updating a status. This way, you can have a page that keeps checking the status and refreshes itself if the status is not yet 100%. Users will have a live update of how much has been done. But for this you need access to the OS to be able to set up the scheduled task, and the task will be sitting idle when there is nothing to import.
2) Have the page handle 1000 rows (or any N number of rows; you decide), then send JavaScript to the browser to refresh itself with a new parameter telling the script to handle the next 1000 rows, as sketched below. You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops.
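A rough sketch of option 2, with the chunk offset passed back to the same script via a query parameter (all names and the 1000-row chunk size are invented for the example):
$chunkSize = 1000;
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

$file_handle = fopen("Weekly.csv", "r");
// Skip the rows handled by previous requests.
for ($skipped = 0; $skipped < $offset && fgetcsv($file_handle) !== FALSE; $skipped++);

$done = 0;
while ($done < $chunkSize && ($line_of_text = fgetcsv($file_handle)) !== FALSE) {
    // ... insert $line_of_text here, as in the examples above ...
    $done++;
}
fclose($file_handle);

if ($done === $chunkSize) {
    // More rows remain: ask the browser to call back with the next offset.
    $next = $offset + $chunkSize;
    echo "Processed rows up to $next...";
    echo "<script>window.location = '?offset=$next';</script>";
} else {
    echo "Import finished.";
}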
