I'm creating a PHP script that imports some data from text files into a MySQL database. These text files are pretty large; an average file has about 10,000 lines, each of which corresponds to a new item I want in my database. (I won't be importing files very often.)
I'm worried that reading a line from the file and then doing an INSERT query, 10,000 times in a row, might cause some issues. Is there a better way for me to do this? Should I perform one INSERT query with all 10,000 values? Or would that be just as bad?
Maybe I can find a middle ground and insert something like 10 or 100 entries at once. Really my problem is that I don't know what good practice is. Maybe 10,000 queries in a row is fine and I'm just worrying for nothing.
Any suggestions?
Yes, it is; build one multi-row INSERT:
<?php
// Build one multi-row INSERT instead of 10,000 single-row queries.
$lines = file('file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$values = array();
foreach ($lines as $line) {
    // In real code, escape $line (e.g. with mysqli_real_escape_string) before embedding it.
    $values[] = "('" . $line . "')";
}

$query = "INSERT INTO table VALUES " . implode(',', $values);
echo $query;
http://sandbox.phpcode.eu/g/5ade4.php
This builds one single query, which is many times faster than the one-query-per-line style!
Use prepared statements, as suggested by the authors of High Performance MySQL. They save a lot of time, since the full SQL text does not have to be sent and parsed again for every row.
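A rough sketch of what that could look like for the file import above; the connection details and the items table/column names are placeholders, not taken from the question:

<?php
// Hedged sketch: prepare the statement once, then execute it per line, so the
// SQL text is sent and parsed a single time. Connection details and the
// items/value names are placeholders.
$mysqli = new mysqli('localhost', 'user', 'password', 'testdb');

$stmt  = $mysqli->prepare('INSERT INTO items (value) VALUES (?)');
$value = '';
$stmt->bind_param('s', $value);   // bound by reference, reused on every execute

foreach (file('file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $value) {
    $stmt->execute();
}
$stmt->close();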
I would do it in one large query with all the values at once. Just to be sure, though, make sure you run START TRANSACTION; before and COMMIT; afterwards, so that if something goes wrong during the execution of the query (which is possible, since it will most likely run for a fairly long time), the database will not be affected.
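Something along these lines, assuming $mysqli is an existing mysqli connection and $query is the single large INSERT built above (a sketch, not a drop-in):

// Hedged sketch: wrap the large INSERT in a transaction so a failure
// leaves the table untouched. $mysqli and $query are assumptions.
$mysqli->query('START TRANSACTION');

if ($mysqli->query($query)) {
    $mysqli->query('COMMIT');
} else {
    $mysqli->query('ROLLBACK');   // nothing is applied if the big insert fails
}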
I have a DB table which has approximately 40 columns, and the main goal is to insert records into the database as quickly as possible. I am using PHP for this.
The problem is that, to build the insert statement, I have to loop through a foreach. I am not sure if I am doing this correctly. Please suggest the best alternative. Here is the example:
/// to loop through the available data ///
$sqc = "";
for ($i = 1; $i <= 100; $i++) {
    if ($sqc == "") {
        $sqc = "('".$array_value["col1"]."'.. till .. '".$array_value["col40"]."')";
    } else {
        $sqc .= ",('".$array_value["col1"]."'.. till .. '".$array_value["col40"]."')";
    }
}
/// finally the sql query ///
$sql_query = "INSERT INTO table_name (`col1`,.. till.. ,`col40`) VALUES ".$sqc;
This concatenation of $sqc is taking a lot of time, and so is the insertion into the DB. Is there an alternative way of doing this? I need to find a way to speed this up by something like 100x. :(
Thank you
As suggested on the MySQL Optimizing INSERT Statements page, there are a few ways to do this:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.
See the MySQL guide: https://dev.mysql.com/doc/refman/5.7/en/insert-optimization.html
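For the text-file case, a hedged sketch of the LOAD DATA route might look like this; host, credentials, file, table and column names are all placeholders, and LOCAL loading must be enabled on both the server (local_infile=1) and the client:

<?php
// Hedged sketch: bulk-load a tab-separated text file with LOAD DATA LOCAL INFILE.
// All names below are placeholders, not taken from the question.
$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true);   // must be set before connecting
$mysqli->real_connect('localhost', 'user', 'password', 'testdb');

$sql = "LOAD DATA LOCAL INFILE 'data.txt'
        INTO TABLE table_name
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'
        (`col1`, `col2`, `col3`)";

if (!$mysqli->query($sql)) {
    die('Load failed: ' . $mysqli->error);
}
$mysqli->close();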
Just a small contribution, but you can avoid the concatenation by using parameter binding:
$stmt = mysqli_prepare($your_conn,
    "INSERT INTO table_name (`col1`,.. till.. ,`col40`) VALUES (?, ... till.. ?)");
mysqli_stmt_bind_param($stmt, 'ss.......s',
    $array_value["col1"], $array_value["col2"], .. till..,
    $array_value["col40"]);
mysqli_stmt_execute($stmt);
I am using the following code:
for ($x = 0; $x < count($response[0]); $x++) { //approx count 50-60
    $t = $response[0][$x];
    $query = "INSERT INTO tableX (time,title) VALUES ('$date','$t')";
    if ($query_run = mysqli_query($mysql_connect, $query)) {
        //Call some functions if the insertion happens
    }
}
mysqli_close($mysql_connect);
Title is a primary key in the table. I am going to call some functions only when the insertion is successful, i.e. when no existing title is supplied. The title and date are taken from a CSV file.
How can I improve the performance of this code? And when should I use unset to save memory or CPU cycles?
unset will not improve performance by any meaningful metric here.
What will improve performance is inserting more than one row per query.
Let's take your code as an example. This is just an example of how it could be done. If you need to run other functions after the insert, then you might want to insert, for example, 10 or 100 rows at a time instead of ALL rows.
$query = "INSERT INTO tableX (time,title) VALUES ";
$valueQuery = array();
for ($x = 0; $x < count($response[0]); $x++) { //approx count 50-60
    $t = $response[0][$x];
    // Escape $date and $t (e.g. mysqli_real_escape_string) before embedding them.
    $valueQuery[] = "('$date','$t')";
}
$query .= implode(", ", $valueQuery);
if ($query_run = mysqli_query($mysql_connect, $query)) {
    //Call some functions if the insertion happens
}
This question is based entirely on the wrong premises.
First of all, using unset will not save you a single CPU cycle anywhere; rather, it will consume them.
Besides, there is not much of a place to put unset here anyway.
Finally, there are no real performance issues with this code. If you are experiencing any, you should measure them, find the real bottleneck, and then fix it, instead of poking around and barking up random trees.
What should be your actual concern, instead of fictional performance issues, is that your code is wide open to SQL injection and syntax errors of all sorts.
You could improve the shown code by using prepared statements for your insert (1). As it is, in every iteration you first have to define the $query and then execute it, and on each execution the database also has to parse your statement first.
By preparing the statement before entering the loop you only define it once and can then just execute it with different values. The statement is also only parsed when it gets prepared, not on each execute.
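For illustration, a sketch of the loop from the question with the statement prepared once outside it (it reuses the $mysql_connect, $response, $date and tableX names shown above, but is not meant as a drop-in replacement):

// Hedged sketch: prepare once, execute per row; parameters are bound by reference.
$stmt = mysqli_prepare($mysql_connect, "INSERT INTO tableX (time, title) VALUES (?, ?)");
$t = '';
mysqli_stmt_bind_param($stmt, 'ss', $date, $t);

for ($x = 0; $x < count($response[0]); $x++) { //approx count 50-60
    $t = $response[0][$x];
    if (mysqli_stmt_execute($stmt)) {
        //Call some functions if the insertion happens
    }
}
mysqli_stmt_close($stmt);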
Here is a small article on how the garbage collector in PHP works: http://www.hackingwithphp.com/18/1/10, found in this question: Is there garbage collection in PHP?
PHP performs garbage collection at three primary junctures:
When you tell it to
When you leave a function
When the script ends
So maybe you should check how much of your program you can encapsulate into functions, though there is not much to see in your snippet.
(1)
I suggest prepared statements to improve efficiency. In my opinion it is best practice to reduce workload, here the unnecessary re-parsing of the query.
That said, I have no estimate of the actual performance improvement; as this case involves only 50 to 60 iterations, it would be minimal, if even noticeable.
I'm dealing with big volumes of data. It's a huge table on which I'm performing unions through a SQL statement from my PHP and sending the result over to my own localhost DB. I've got the thing sorted out, but I want to optimize it. It had to run overnight to merge around 83,000 rows.
$con = new PDO("pgsql:host=$host;port=$port;dbname=$dbname;user=$user;password=$password");
$con->beginTransaction(); // cursors require a transaction.
$stmt = $con->prepare($query);
$stmt->execute();
$innerStatement = $con->prepare("FETCH 1 FROM cursor1");
while($innerStatement->execute() && $row = $innerStatement->fetch(PDO::FETCH_ASSOC)) {
    insertDataToDB($row);
}
Question: will changing the line to "FETCH 1000 FROM cursor1" make it so that I'm fetching 1000 rows each time instead of one? Will that help performance?
I'm hoping this larger operation was a one-time thing. But in the future I will have to move smaller amounts of data... still, the query is rather heavy since it relies on timestamp comparisons; otherwise how would I know whether my DB is up to date or not?
Thank you.
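For what it's worth, the batched variant described in the question might look roughly like this; it is only a sketch and assumes cursor1 was declared by $query and that insertDataToDB takes one row, as above:

// Hedged sketch: fetch up to 1000 rows per round trip instead of one.
$batchStatement = $con->prepare("FETCH 1000 FROM cursor1");

do {
    $batchStatement->execute();
    $rows = $batchStatement->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        insertDataToDB($row);   // same per-row handler as above
    }
} while (count($rows) > 0);     // an empty batch means the cursor is exhausted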
This is an optimisation question regarding first principles. Imagine I am doing a big, heavy-lifting comparison: 30k files vs 30k database entries. Is it more efficient to run one big MySQL query into an array and then loop through the physical files checking against the array, or is it better to loop through the files and make single-row MySQL calls one at a time?
Here is some pseudo code to help explain:
//is this faster?
foreach ($recursiveFileList as $fullpath) {
    $Record = $db->queryrow("SELECT * FROM files WHERE fullpath='".$fullpath."'");
    //do some $Record logic
}

//or is this faster?
$BigList = array();
$db->query("SELECT * FROM files");
while ($Record = $db->rows()) {
    $BigList[$Record['fullpath']] = $Record;
}

foreach ($recursiveFileList as $fullpath) {
    if (isset($BigList[$fullpath])) {
        $Record = $BigList[$fullpath];
        //do some $Record logic
    }
}
Update: if you always know that your $recursiveFileList is 100% of the table, then doing one query per row would be needless overhead. In that case, just use SELECT * FROM files.
I wouldn't use either of the two styles you show.
The first style runs one separate SQL query for each individual fullpath. This causes some overhead for SQL parsing, optimization, etc. Keep in mind that MySQL does not have the capability of remembering the query optimization from one invocation of a similar query to the next; it analyzes and optimizes the query every time. The overhead is relatively small, but it adds up.
The second style fetches all rows from the table and sorts them out in the application layer. This has a lot of overhead, because typically your $recursiveFileList might match only 1%, 0.1%, or an even smaller portion of the rows in the table. I have seen cases where transferring excessive amounts of data over the network literally saturated a 1 Gbps network switch, and this put a ceiling on the requests per second for the application.
Use query conditions and indexes wisely to let the RDBMS examine and return only the matching rows.
The two styles you show are not the only options. What I would suggest is to use an IN() predicate to match multiple fullpath values in a single query.
$sql = "SELECT * FROM files WHERE fullpath IN ("
     . implode(",", array_fill(0, count($recursiveFileList), "?")) . ")";
$stmt = $pdo->prepare($sql);
$stmt->execute($recursiveFileList);
while ($row = $stmt->fetch()) {
    //do some $Record logic
}
Note I also use a prepared query with ? parameter placeholders, and then pass the array of fullpath values separately when I call execute(). PDO is nice for this, because you can just pass an array, and the array elements get matched up to the parameter placeholders.
This also solves the risk of SQL injection in this case.
I used to use
<?php
$sql = "insert into test (owner) values ('owen')";
$db->autocommit(false);
if (!$db->query($sql)) {
    $db->rollback();
} else {
    $db->commit();
}
$db->close();
?>
However, today I ran two insert PHP files against the same table, without any transaction handling. Each is as simple as:
<?php
$sql = "insert into test (owner) values ('owen')"; //the other php is the same but replaces 'owen' with 'huhu'
for ($i = 0; $i < 100 * 1000; $i++) {
    $db->query($sql);
}
$db->close();
?>
I ran the two PHP files in two different consoles and got 200,000 records without any error. Does that mean that manually using transactions is really not needed, since there were no conflicts?
You do not need transactions for this.
Transactions exist so that you can roll back half-finished changes to the database. These situations only occur when you have a set of multiple statements changing the database that might be interrupted in between. Then often only some of the statements have been executed, which can leave the database in a state that is not 'clean' from the application's point of view.
A simple and good example is a money transfer between two tables:
first the money is removed from one table
then it is added to a second table
If this process is interrupted in between, the money has vanished. That is not intended; you probably want to be able to roll back.
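A minimal sketch of such a transfer, using a mysqli-style $db handle like the one in the question; the accounts table and the amounts are made up for illustration:

// Hedged sketch: both statements succeed or neither does.
$db->autocommit(false);

$ok = $db->query("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
   && $db->query("UPDATE accounts SET balance = balance + 100 WHERE id = 2");

if ($ok) {
    $db->commit();     // both changes become visible together
} else {
    $db->rollback();   // neither change is kept, so no money vanishes
}
$db->autocommit(true);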
In your case, however, all statements are 'atomic', meaning they succeed or fail on their own, and the database's state is always 'clean'. It does not matter here whether a single client or multiple clients run the statements.