I have a large text file with rows of data that need to be imported into the database. But the file contains around 300,000 rows and I can't get it to work, because the query seems to be too big.
$all_inserts = array();
$count = 0;
foreach($file as $line) {
    if($count > 0) {
        $modelnummer = trim(substr($line, 0, 4));
        $datum = trim(substr($line, 4, 8));
        $acc_nummer = trim(substr($line, 12, 4));
        $acc_volgnr = trim(substr($line, 16, 1));
        $prijs = trim(substr($line, 17, 5));
        $mutatiecode = trim(substr($line, 22, 1));
        $all_inserts[] = array($modelnummer, $datum, $acc_nummer, $acc_volgnr, $prijs, $this->quote($mutatiecode));
    }
    $count++;
}
$query = 'INSERT INTO accessoire_model_brommer (modelnummer, datum, acc_nummer, acc_volgnr, prijs, mutatiecode) VALUES ';
$rows = array();
foreach($all_inserts as $one_insert) {
    $rows[] = '(' . implode(',', $one_insert) . ')';
}
$query .= ' ' . implode(',', $rows);
$db->query($query);
I used the above code for smaller files and it works fine and fast, but it doesn't work for the bigger files. Does anyone know a better way to read and insert this file?
I also tried using one insert statement per row inside a transaction, but that doesn't work either.
Note: I don't know the exact limitations of PDO or SQL on query length.
If the query seems too big, perhaps you could split up the text file, so that instead of running one query with 300k rows you run 100 queries with 3,000 rows each.
Perhaps you could create a buffer, fill it with 3000 values and run the query. Empty the buffer, fill it with the next 3000 values and run the query again.
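A minimal sketch of that buffering idea, reusing the $all_inserts array and $db object from your snippet above (both assumed to behave as shown there):
// Sketch: run one INSERT per 3000 buffered rows instead of one huge query.
foreach (array_chunk($all_inserts, 3000) as $chunk) {
    $rows = array();
    foreach ($chunk as $one_insert) {
        $rows[] = '(' . implode(',', $one_insert) . ')';
    }
    $query = 'INSERT INTO accessoire_model_brommer '
           . '(modelnummer, datum, acc_nummer, acc_volgnr, prijs, mutatiecode) VALUES '
           . implode(',', $rows);
    $db->query($query);
}
3000 is an arbitrary buffer size here; tune it to whatever your database's query-size limit allows (for MySQL that's max_allowed_packet).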
Okay, so I regularly get around 100k-1M lines of text that I import into a database. The code that I use is as follows:
$lines = new SplFileObject('/home/file.txt');
while(!$lines->eof()) {
    $lines->next(); //Skipping first line
    $row = explode(',', $lines);
    for($i = 0; $i < 4; $i++){
        if(!isset($row[$i])){
            $row[$i] = null;
        }
    }
    $y = (float) $row[1];
    $z = (float) $row[2];
    $load_query = "INSERT IGNORE INTO new (datetime_gmt,field2,field3)
        VALUES ('".$row[0]."','".$y."','".$z."');";
    if(!$mysqli->query($load_query)){
        die("CANNOT EXECUTE".$mysqli->error."\n");
    }
}
$lines = null;
However, it takes waaayyy too long. Is there any faster way to do it, or am I stuck with this method?
PS. I don't want to use MySQL's "LOAD DATA INFILE".
As written, you're running an insert statement for every line. It'll be much faster if you compile a single multi-insert statement in the format INSERT INTO table (foo, bar) VALUES (1, 2), (3, 4), (5, 6), executed once at the end. Something along the lines of this, though it could be cleaned up more:
$lines = new SplFileObject('/home/file.txt');
$load_query = "INSERT IGNORE INTO new (datetime_gmt,field2,field3)
    VALUES ";
while(!$lines->eof()) {
    $lines->next(); //Skipping first line
    $row = explode(',', $lines);
    for($i = 0; $i < 4; $i++){
        if(!isset($row[$i])){
            $row[$i] = null;
        }
    }
    $y = (float) $row[1];
    $z = (float) $row[2];
    $load_query .= "('".$row[0]."','".$y."','".$z."'),";
}
if(!$mysqli->query(rtrim($load_query, ','))) {
    die("CANNOT EXECUTE".$mysqli->error."\n");
}
$lines = null;
Also, make sure the data is trusted. If the file can come from an outside user, appending values directly to the query string creates an SQL injection vector.
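If the file isn't trusted, a hedged alternative is to bind the values instead of concatenating them. This is only a sketch, assuming the same $mysqli connection, file and table as above (header-line handling omitted):
$lines = new SplFileObject('/home/file.txt');
$stmt = $mysqli->prepare(
    "INSERT IGNORE INTO new (datetime_gmt, field2, field3) VALUES (?, ?, ?)"
);
$mysqli->begin_transaction(); // one transaction keeps the per-row inserts reasonably fast
while (!$lines->eof()) {
    $line = trim($lines->fgets());
    if ($line === '') {
        continue; // skip blank/trailing lines
    }
    $row = explode(',', $line);
    $y = isset($row[1]) ? (float) $row[1] : 0.0;
    $z = isset($row[2]) ? (float) $row[2] : 0.0;
    $stmt->bind_param('sdd', $row[0], $y, $z);
    $stmt->execute();
}
$mysqli->commit();
This trades some of the multi-insert speed for safety; the single combined insert above is still the faster option when the data is known to be clean.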
I've got a script that I needed to change, since the data to be inserted into the db got too big to do in one go. So I created a loop that splits up the array into blocks of 6000 rows and then inserts them.
I don't know exactly whether the data is too big for the server to process at once or too big to upload, but at the moment I have both steps split up into these 6000-row blocks.
Code:
for ($j = 0; $j <= ceil($alength / 6000); $j++){
    $array = array_slice($arraysource, $j * 6000, 5999);
    $sql = "INSERT INTO Ranking (rank, name, score, kd, wins, kills, deaths, shots, time, spree) VALUES ";
    foreach($array as $a => $value){
        //transforming code for array
        $ra = $array[$a][0];
        $na = str_replace(",", ",", $array[$a][1]);
        $na = str_replace("\\", "\\\\", $na);
        $na = str_replace("'", "\'", $na);
        $sc = $array[$a][2];
        $kd = $array[$a][3];
        $wi = $array[$a][4];
        $ki = $array[$a][5];
        $de = $array[$a][6];
        $sh = $array[$a][7];
        $ti = $array[$a][8];
        $sp = $array[$a][9];
        $sql .= "('$ra',' $na ','$sc','$kd','$wi','$ki','$de','$sh','$ti','$sp'),";
    }
    $sql = substr($sql, 0, -1);
    $conn->query($sql);
}
$conn->close();
Right now it only inserts the first 5999 rows and nothing more, as if it only executed the loop once. No error messages.
Don't know if this'll necessarily help, but what about using array_chunk, array_walk, and checking error codes (if any)? Something like this:
function create_query(&$value, $key) {
    // builds one INSERT statement per row in place; destructive, it overwrites $value
    $value[1] = str_replace(",", ",", $value[1]);
    $value[1] = str_replace("\\", "\\\\", $value[1]);
    $value[1] = str_replace("'", "\'", $value[1]);
    $queryvalues = implode("','", $value);
    $value = "INSERT INTO Ranking (rank, name, score, kd, wins, kills, deaths, shots, time, spree) VALUES ('".$queryvalues."');";
}
$array = array_chunk($arraysource, 6000);
foreach($array as $key => $value){
    array_walk($value, 'create_query');
    // $value is now an array of INSERT statements; run them one by one and check the error codes
    foreach ($value as $statement) {
        if (!$conn->query($statement)) {
            printf("Errorcode: %d\n", $conn->errno);
        }
    }
}
$conn->close();
Secondly, have you considered using mysqli::multi_query? It lets you send several statements at once, but you'll have to stay under the maximum allowed packet size (max_allowed_packet).
Another tip would be to check the response of each query, which your code currently doesn't do.
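For reference, the packet limit can be read from the server and used to size the chunks. A rough sketch, assuming the same $conn mysqli connection as above:
// Ask the server for its max_allowed_packet (in bytes); every combined
// INSERT string sent in one go has to stay below this value.
$result = $conn->query("SHOW VARIABLES LIKE 'max_allowed_packet'");
if ($result) {
    $info = $result->fetch_assoc();
    $maxPacket = (int) $info['Value'];
    echo "max_allowed_packet is $maxPacket bytes\n";
}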
Thanks for the tips, but I figured it out. I didn't think about this ^^
It was the first line inside the foreach loop, which I didn't include in my question:
array_unshift($array[$a], $a + 1);
This adds an additional value, the "rank", in front of each user. But the rank numbers would repeat after each pass of the outer loop, and it can't import users with the same rank.
Now it works:
array_unshift($array[$a], $a + 1 + $j * 5999);
I have a file like this:
FG 09097612 DN 6575 HL 879797
BHC 09097613 DN 6576 HL 879798
FG 09097614 DN 6577 IOPPP 879799
FG 09097614 DN 6577 IOPPP 879800
Its structure never changes from line to line; it is always the same layout.
I would like to create an array, taking the first 2 characters as a variable "nation", then the next 8 characters as a variable "prize", then 2 more characters as "player", and so on, and create a record in the database for each line.
I am using this code (THE CODE IN THE EDIT ABOVE HAS CHANGED), but since the file is not a CSV delimited by commas or tabs, I don't know how to do it.
ini_set("auto_detect_line_endings", 1);
$current_row = 1;
$handle = fopen("upload/import.txt", "r");
while ( ($csv_data = fgetcsv($handle, 10000, "\t") ) !== FALSE )
{
    $number_of_fields = count($csv_data);
    if ($current_row == 1) {
    }
    else {
    }
    $current_row++;
}
fclose($handle); // moved outside the loop so the handle stays open while reading
I want to put these variables into a database record, each variable in its own column.
What do you recommend?
Obviously I cannot change the original txt file.
HOW DO I SAVE THIS IN THE DATABASE?
If I use this code (from one answer above):
$lines = file("upload/import.txt");
foreach($lines as $lineNum => $line ) {
$nation = trim(substr($line, 0, 4)); // get first four characters as nation and remove spaces
$prize = trim(substr($line, 4, 8)); // get 5th-12th characters as prize and remove spaces
$player = trim(substr($line, 12, 2)); // get 13th-14th characters as player and remove spaces
together with this code:
$dbhandle = odbc_connect("Driver={SQL Server Native Client 11.0};
Server=$myServer;Database=$myDB;", $myUser, $myPass)
or die("Couldn't connect to SQL Server on $myServer");
$query = "INSERT INTO TEST (nation) VALUES ('".$nation."')";
echo "<br>Inserted: ".$nation."<br>";
$result = odbc_exec($dbhandle, $query);
But it seems to me that this code is too heavy to run inside a foreach loop, isn't it?
1. In this situation the delimiter changes because of the formatting, so I suggest you treat each line as a plain string rather than as CSV.
FG 09097612 DN 6575 HL 879797
BHC 09097613 DN 6576 HL 879798
FG 09097614 DN 6577 IOPPP 879799
FG 09097614 DN 6577 IOPPP 879800
$lines = file("upload/import.txt");
foreach($lines as $lineNum => $line ) {
$nation = trim(substr($line, 0, 4)); // get first four characters as nation and remove spaces
$prize = trim(substr($line, 4, 8)); // get 5th-12th characters as prize and remove spaces
$player = trim(substr($line, 12, 2)); // get 13th-14th characters as player and remove spaces
}
2. Or, if you insist on using a CSV parser, you should collapse consecutive spaces into one before you actually use fgetcsv():
$tempfile = "tmp/temp".microtime()."csv"; // a temp folder where you have write authority, `microtime()` here is used to generate a unique filename.
$content = file_get_contents("upload/import.txt");
while(strpos($content, "  ") !== false) {
    // while consecutive spaces exist, collapse double spaces into single ones
    $content = str_replace("  ", " ", $content);
}
file_put_contents($tempfile, $content);
Then you can treat it as a normal csv file with space delimiter like this:
$handle = fopen($tempfile, "r");
$current_row = 1;
while ( ($csv_data = fgetcsv($handle, 10000, " ") ) !== FALSE )
{
$number_of_fields = count($csv_data);
// ...
}
fclose($handle);
After you finish this, delete the temp file like this:
unlink($tempfile);
And it is better to do the insert only once than to create and run an insert query inside the foreach loop, so instead of adding
$query = "INSERT INTO TEST (nation) VALUES ('".$nation."')";
$result = odbc_exec($dbhandle, $query);
on every iteration, which produces
INSERT INTO TEST (nation) VALUES ('FG');
INSERT INTO TEST (nation) VALUES ('BHC');
INSERT INTO TEST (nation) VALUES ('FG');
INSERT INTO TEST (nation) VALUES ('FG');
...
it is more reasonable to build one combined query like this:
$lines = file("upload/import.txt");
$dbhandle = odbc_connect("Driver={SQL Server Native Client 11.0};
Server=$myServer;Database=$myDB;", $myUser, $myPass)
or die("Couldn't connect to SQL Server on $myServer");
$query = "INSERT INTO TEST (nation) VALUES \n";
$row = array(); // query for each line
foreach($lines as $lineNum => $line ) {
    $nation = trim(substr($line, 0, 4)); // get first four characters as nation and remove spaces
    $prize = trim(substr($line, 4, 8)); // get 5th-12th characters as prize and remove spaces
    $player = trim(substr($line, 12, 2)); // get 13th-14th characters as player and remove spaces
    $row[] = "('".$nation."')";
}
$query .= implode(",\n",$row).";";
$result = odbc_exec($dbhandle, $query);
If you echo $query, you should get something like this:
INSERT INTO TEST (nation) VALUES
('FG'),
('BHC'),
('FG'),
('FG');
which is much lighter than running an insert query for every line.
PS: Please be aware that MySQL has a maximum query length (max_allowed_packet). If the query exceeds it you will get an error, something like 'MySQL server has gone away'. I don't have much experience with MS SQL Server, but there should be a similar limit.
In this situation, you should split the query in a proper way. For example, run and clear the query every 10000 lines like this:
$lines = file("upload/import.txt");
$dbhandle = odbc_connect("Driver={SQL Server Native Client 11.0};
Server=$myServer;Database=$myDB;", $myUser, $myPass) or die("Couldn't connect to SQL Server on $myServer");
$query = "INSERT INTO TEST (nation) VALUES \n";
$row = array(); // query for each line
foreach($lines as $lineNum => $line ) {
    $nation = trim(substr($line, 0, 4)); // get first four characters as nation and remove spaces
    $prize = trim(substr($line, 4, 8)); // get 5th-12th characters as prize and remove spaces
    $player = trim(substr($line, 12, 2)); // get 13th-14th characters as player and remove spaces
    $row[] = "('".$nation."')";
    if($lineNum % 10000 == 9999){
        // run and rebuild the query every 10000 lines
        $query .= implode(",\n",$row).";";
        // putting 'echo $query;' here would help you understand the design
        $result = odbc_exec($dbhandle, $query);
        // It is better to check here whether the query succeeded
        // The query has been run, so it has to be re-initialized, and so does $row
        $query = "INSERT INTO TEST (nation) VALUES \n";
        $row = array(); // rows for the next chunk
    }
}
if (!empty($row)) { // there may be no leftover rows if the line count is an exact multiple of 10000
    $query .= implode(",\n",$row).";";
    $result = odbc_exec($dbhandle, $query);
}
Answer to 'What I need now is to check whether the file contains lines with the same nation, in this example "FG", sum the $prize of every "FG" line and save only the total in prize rather than all the FG lines':
Of course you can do that. Since I don't know your table structure or how you would like to save your other data, I will just provide a sample with only nation and prize:
$lines = file("upload/import.txt");
$dbhandle = odbc_connect("Driver={SQL Server Native Client 11.0};
Server=$myServer;Database=$myDB;", $myUser, $myPass)
or die("Couldn't connect to SQL Server on $myServer");
$query = "INSERT INTO TEST (nation, prize) VALUES \n";
$row = array(); // data for each line
foreach($lines as $lineNum => $line ) {
    $nation = trim(substr($line, 0, 4)); // get first four characters as nation and remove spaces
    $prize = trim(substr($line, 4, 8)); // get 5th-12th characters as prize and remove spaces
    // sum the prize per nation; make sure prize is an integer in your file
    $row[$nation] = !empty($row[$nation]) ? $row[$nation] + (int)$prize : (int)$prize;
}
// Since there are not that many nations in the world, I don't think it is necessary to worry about the max query length here
$query_row = array(); // one value tuple per nation
foreach($row as $nation => $sum_prize){
    $query_row[] = "('".$nation."','".$sum_prize."')";
}
$query .= implode(",\n",$query_row).";";
$result = odbc_exec($dbhandle, $query);
Try this:
$handle = fopen("upload/import.txt", "r");
$current_row = 1;
while(!feof($handle)){
    $str = fgets($handle); //read one line of the file
    $str = str_replace("  ", " ", $str); //reduce consecutive spaces to single space
    $arr = explode(" ", $str);
    if ($current_row == 1) {
    }
    else {
    }
}
fclose($handle);
Now the array $arr will contain the strings {"FG", "09097612", "DN", "6575", "HL", "879797"} (after reading the first line). You can use this array to access the values and insert them into the DB.
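As a rough sketch of that last step, assuming the odbc connection $dbhandle and a TEST table with nation, prize and player columns as used elsewhere in this thread, the insert inside the while loop (before its closing brace) could look like:
// Sketch only: one insert per parsed line; see the earlier answers for
// batching several lines into a single INSERT if the file is large.
$nation = $arr[0];
$prize  = $arr[1];
$player = $arr[2];
$query  = "INSERT INTO TEST (nation, prize, player) "
        . "VALUES ('" . $nation . "','" . $prize . "','" . $player . "')";
odbc_exec($dbhandle, $query);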
EDIT:
I understand that you want to have an array called "nation" which will contain values {"FG","BHC","FG","FG"}, and the same for prize and other variables. Try this code:
$nation=array();
$prize=array();
$player=array();
$handle = fopen("upload/import.txt", "r");
$current_row = 1;
while(!feof($handle)){
    $str = fgets($handle); //read one line of the file
    $str = str_replace("  ", " ", $str); //reduce consecutive spaces to single space
    $arr = explode(" ", $str);
    //now insert values in respective arrays
    array_push($nation, $arr[0]);
    array_push($prize, $arr[1]);
    array_push($player, $arr[2]);
    //and so on
    if ($current_row == 1) {
    }
    else {
    }
}
fclose($handle);
Now you can use the arrays $nation, $prize and $player. You can create arrays for the other values in the same manner.
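For example, here is a sketch of turning those parallel arrays into one combined insert (again assuming the $dbhandle odbc connection and a TEST table with these columns, as seen elsewhere in this thread):
$values = array();
foreach ($nation as $i => $n) {
    $values[] = "('" . $n . "','" . $prize[$i] . "','" . $player[$i] . "')";
}
$query = "INSERT INTO TEST (nation, prize, player) VALUES " . implode(",\n", $values);
odbc_exec($dbhandle, $query);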
Hope this helps.
I have a file like this
FG 09097612 DN 6575 HL 879797
BHC 09097613 DN 6576 HL 879798
FG 09097614 DN 6577 IOPPP 879799
FG 09097614 DN 6577 IOPPP 879800
and I import it in mysql with
$lines = file("upload/import.txt");
$dbhandle = odbc_connect("Driver={SQL Server Native Client 11.0};
Server=$myServer;Database=$myDB;", $myUser, $myPass)
or die("Couldn't connect to SQL Server on $myServer");
$query = "INSERT INTO TEST (nation) VALUES \n";
$row = array(); // query for each line
foreach($lines as $lineNum => $line ) {
    $nation = trim(substr($line, 0, 4)); // get first four characters as nation and remove spaces
    $prize = trim(substr($line, 4, 8)); // get 5th-12th characters as prize and remove spaces
    $player = trim(substr($line, 12, 2)); // get 13th-14th characters as player and remove spaces
    $row[] = "('".$nation."')";
}
$query .= implode(",\n",$row).";";
$result = odbc_exec($dbhandle, $query);
What I need now is to check whether the file contains lines with the same nation, in this example "FG", and sum the $prize of every "FG" line, saving only the total in prize rather than a separate row for every FG line.
If you need to do this in PHP, I would save the rows in an array keyed by the nation, check each time whether that key already exists, and if so just increment the prize.
Something like this:-
<?php
$lines = file("upload/import.txt");
$dbhandle = odbc_connect("Driver={SQL Server Native Client 11.0};
Server=$myServer;Database=$myDB;", $myUser, $myPass)
or die("Couldn't connect to SQL Server on $myServer");
$query = "INSERT INTO TEST (nation, prize) VALUES \n";
$row = array(); // query for each line
foreach($lines as $lineNum => $line )
{
    $nation = trim(substr($line, 0, 4)); // get first four characters as nation and remove spaces
    $prize = trim(substr($line, 4, 8)); // get 5th-12th characters as prize and remove spaces
    if (array_key_exists($nation, $row))
    {
        $row[$nation]['prize'] += $prize;
    }
    else
    {
        $player = trim(substr($line, 12, 2)); // get 13th-14th characters as player and remove spaces
        $row[$nation] = array('nation'=>$nation, 'prize'=>$prize);
    }
}
// turn each summed row into a value tuple; note the & so array_walk modifies $row in place
array_walk($row, function(&$v, $k){ $v = "('".$v['nation']."','".$v['prize']."')"; });
$query .= implode(",\n",$row).";";
$result = odbc_exec($dbhandle, $query);
?>
However, if this was being done in MySQL I would be tempted to just have the nation as a unique key on the table and add ON DUPLICATE KEY UPDATE prize = prize + VALUES(prize) to the end of the insert query.
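For the MySQL route, the query would look roughly like this. This is only a sketch: it assumes a $mysqli connection and a UNIQUE index on the nation column, and it would not run as-is against the SQL Server/odbc connection used above.
// Illustrative values only; with a UNIQUE key on nation, duplicates fold
// their prize into the existing row instead of creating a new one.
$query = "INSERT INTO TEST (nation, prize) VALUES ('FG', 123), ('BHC', 456), ('FG', 789)
          ON DUPLICATE KEY UPDATE prize = prize + VALUES(prize)";
$mysqli->query($query);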
You could write really complicated php code to do this. Or, you could change your strategy. First, let me point out that you can just use load data infile to read the data from a file.
The new strategy is to read the data into a staging table and then copy it to the final table. Basically:
read data into test_staging table (using php or `load data infile`)
insert into test(nation, prize)
select nation, sum(prize)
from test_staging
group by nation;
drop table test_staging;
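A sketch of that flow using the odbc connection and $lines array from earlier in this thread; TEST_STAGING is an assumed staging table with a varchar nation column and a numeric prize column:
// Load the parsed lines into the staging table first...
foreach ($lines as $line) {
    $nation = trim(substr($line, 0, 4));
    $prize  = trim(substr($line, 4, 8));
    odbc_exec($dbhandle, "INSERT INTO TEST_STAGING (nation, prize) VALUES ('$nation', '$prize')");
}
// ...then let the database do the grouping and copy the totals across.
odbc_exec($dbhandle, "INSERT INTO TEST (nation, prize)
                      SELECT nation, SUM(prize) FROM TEST_STAGING GROUP BY nation");
odbc_exec($dbhandle, "DROP TABLE TEST_STAGING");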
So if you need it in PHP, I would suggest using a named array like this:
$insertLines = array();
foreach($lines as $lineNum => $line ) {
    $nation = trim(substr($line, 0, 4)); // get first four characters as nation and remove spaces
    $prize = trim(substr($line, 4, 8)); // get 5th-12th characters as prize and remove spaces
    if (!isset($insertLines[$nation])) {
        $insertLines[$nation] = 0;
    }
    $insertLines[$nation] += (int)$prize;
}
foreach($insertLines as $lineNation => $linePrice ) {
    // The creation of the insert into the database goes here.
}
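For the placeholder inside that second loop, the body might look something like this; it is just a sketch, reusing the $dbhandle odbc connection and the TEST table from earlier answers in this thread:
// One insert per nation; batching into a single combined INSERT, as shown
// earlier in the thread, also works here.
$query = "INSERT INTO TEST (nation, prize) VALUES ('" . $lineNation . "'," . (int)$linePrice . ")";
odbc_exec($dbhandle, $query);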
And please check the lengths.
I think the player is at positions 14 to 15?
$player = trim(substr($line, 13, 2));
I want to generate pairs of random numbers between 1234567890 and 9876543210 (10 digits each).
I made this code. It works fine and generates a pair of random numbers, BUT when I try to insert them into the database I get the same results multiple times. Let's say I get 1234567890 more than once. If I echo the insert statement I get different results, but when I run it against the database I get the same results.
$numbers = array(0,1,2,3,4,5,6,7,8,9);
srand(time());
$f = fopen('sql0.txt', 'w');
for($i = 0; $i < 100000; $i++)
{
    $r = NULL;
    $r2 = NULL;
    for($x = 0; $x < 10; $x++)
    {
        $n = rand(0,9);
        $r .= $numbers[$n];
    }
    for($x = 0; $x < 10; $x++)
    {
        $n1 = rand(0,9);
        $r2 .= $numbers[$n1];
    }
    echo("INSERT INTO ci_codes VALUES (NULL, '$r', '$r2', '0')<br>");
}
Do you need the INSERT statement inside your loop? That might be causing the trouble.
As others have mentioned, it is better to just generate the numbers with the PHP function and then run your MySQL query.
$r = mt_rand(0, 10000000);
$r2 = mt_rand(0, 10000000);
echo("INSERT INTO ci_codes VALUES (NULL, '$r', '$r2', '0'");