I am having issues running a PHP script which inserts data into MySQL. The error I get is "504 Gateway Time-out nginx". By the time the PHP page gets stuck with this timeout, 10,102 rows have been entered into the database. I'm planning to insert 160,000 rows in one run of the script.
I have made my code more efficient by using a prepared statement for the SQL.
The SQL is also set up in this structure:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
I have read the SO questions "PHP script times out" and "How to keep a php script from timing out because of a long mysql query".
I have tried adding the following to the start of my code, but it doesn't seem to make a difference:
set_time_limit(0);
ignore_user_abort(1);
Can anyone show me how to split the dataset into chunks and insert the data one chunk at a time?
I will show the section of code that inserts to MySQL below
// prepare and bind
$stmt = $link->prepare("INSERT INTO MyGuests (`eventID`,`location`,`date`,`barcode`,`runner`,`time`,`Run Points`,`Volunteer Points`,`Gender`, `Gender pos`) VALUES (?,?,?,?,?,?,?,?,?,?)");
$stmt->bind_param("isssssiisi", $eventID,$location,$date,$barcode,$runner,$time,$runpoints,$volpoints,$gender,$genderpos);
// set parameters and execute
for( $x=0; $x < count($array_runner); $x++ ){
$eventID=null;
$barcode=$array_barcode[$x];
$runner=$array_runner[$x];
$time=$array_time[$x];
$runpoints=$array_score[$x];
$volpoints=' ';
$gender=$array_gender[$x];
$genderpos=$array_gender_pos[$x];
$stmt->execute();
}
$stmt->close();
$link->close();
I am new to working with MySQL and am looking for some guidance with this problem.
set_time_limit() restarts the timeout counter when it is executed. It does not change the max_execution_time set in php.ini, so to have any useful effect you would have to call it inside the loop.
// prepare and bind
$stmt = $link->prepare("INSERT INTO MyGuests (`eventID`,`location`,`date`,`barcode`,`runner`,`time`,`Run Points`,`Volunteer Points`,`Gender`, `Gender pos`) VALUES (?,?,?,?,?,?,?,?,?,?)");
$stmt->bind_param("isssssiisi", $eventID,$location,$date,$barcode,$runner,$time,$runpoints,$volpoints,$gender,$genderpos);
// set parameters and execute
for( $x=0; $x < count($array_runner); $x++ ){
$eventID=null;
$barcode=$array_barcode[$x];
$runner=$array_runner[$x];
$time=$array_time[$x];
$runpoints=$array_score[$x];
$volpoints=' ';
$gender=$array_gender[$x];
$genderpos=$array_gender_pos[$x];
$stmt->execute();
// every 5000 times through the loop reset the timeout
if ( $x % 5000 == 0 ) {
set_time_limit(30);
}
}
$stmt->close();
$link->close();
Of course you can play with the value 5000 so it does the reset less often.
From the Manual:
When called, set_time_limit() restarts the timeout counter from zero. In other words, if the timeout is the default 30 seconds, and 25 seconds into script execution a call such as set_time_limit(20) is made, the script will run for a total of 45 seconds before timing out.
If you run a query inside a loop over such a large number of rows, it will almost certainly get stuck.
The best way I can suggest is to collect all the data to be inserted in a PHP string and then fire a single query to insert it.
Let me elaborate
$data_to_insert = ''; // will contain all the data to be inserted
$count = 1;
$eventID = 'NULL'; // SQL NULL literal, if it is null for all rows
for( $x=0; $x < count($array_runner); $x++ )
{
    if($count == 1) // checking if it is the first value to be inserted
    {
        $data_to_insert .= "(";
        $count = 2;
    }
    else // with second value onwards
    {
        $data_to_insert .= ",(";
    }
    // append (.=) rather than assign (=), otherwise earlier rows are overwritten
    $data_to_insert .= $eventID . ",";
    // escape the values, since they are concatenated into the SQL string
    $data_to_insert .= "'" . mysqli_real_escape_string($link, $array_barcode[$x]) . "',";
    $data_to_insert .= "'" . mysqli_real_escape_string($link, $array_runner[$x]) . "'";
    $data_to_insert .= ")";
}
// so in the last $data_to_insert should look like this
// $data_to_insert = (eventid1 , 'barcode1', 'runner1'), (eventid2 , 'barcode2', 'runner2') and so on...
Then fire the query
mysqli_query($link, "INSERT INTO MyGuests (`eventID`,`barcode`,`runner`) values " . $data_to_insert);
// which would look like
// INSERT INTO MyGuests (`eventID`,`barcode`,`runner`) values (eventid1 , 'barcode1', 'runner1'), (eventid2 , 'barcode2', 'runner2')
Note :
There might be some syntax error in my code, but you get the logic here.
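Since the question asks about chunks, here is a minimal sketch of how the single-query idea could be combined with chunking and placeholders. The helper name buildChunkedInserts and the chunk size are assumptions, not something from the original code; the table and column names must be trusted identifiers, because they are concatenated into the SQL.

```php
<?php
// Hypothetical helper: builds one multi-row INSERT (with ? placeholders) per chunk.
// $table and $columns must be trusted identifiers, never user input.
function buildChunkedInserts(string $table, array $columns, array $rows, int $chunkSize): array
{
    $colList = '`' . implode('`,`', $columns) . '`';
    $rowPlaceholder = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
    $statements = [];
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        $sql = "INSERT INTO `$table` ($colList) VALUES "
             . implode(',', array_fill(0, count($chunk), $rowPlaceholder));
        // Flatten the chunk's rows into one parameter list, in row order.
        $params = [];
        foreach ($chunk as $row) {
            foreach ($row as $value) {
                $params[] = $value;
            }
        }
        $statements[] = ['sql' => $sql, 'params' => $params];
    }
    return $statements;
}

// Usage sketch with mysqli (assumes $link is the open connection from the question
// and $rows is an array of [barcode, runner, time] arrays):
// foreach (buildChunkedInserts('MyGuests', ['barcode', 'runner', 'time'], $rows, 1000) as $s) {
//     $stmt = $link->prepare($s['sql']);
//     $stmt->bind_param(str_repeat('s', count($s['params'])), ...$s['params']);
//     $stmt->execute();
//     $stmt->close();
// }
```

Each chunk becomes one round trip to the server, so 160,000 rows at 1,000 rows per chunk would be 160 queries instead of 160,000, and no single query grows without bound.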
I've seen multiple threads discussing this, but the answers always reach totally different conclusions. In particular, I wonder whether it is really necessary to build my own prepared statement (with the right number of placeholders) in order to insert everything as a single query. I expected that if I call beginTransaction and endTransaction before and after my for loop, PDO/PHP would hold back the inserts until all the data is collected and send it as a single query once the script reaches the endTransaction line.
How would I need to rewrite a for loop like this one, which performs multiple inserts, to get the best performance? (It usually has between 1 and 300 rows, but it could also reach 2000 rows.)
for($i=0; $i<$baseCount; $i++)
{
$thLevel = $bases[$i]["ThLevel"];
$gold = $bases[$i]["Gold"];
$elixir = $bases[$i]["Elixir"];
$darkElixir = $bases[$i]["DarkElixir"];
$dateFound = $bases[$i]["TimeFound"];
$query = $db->prepare("INSERT INTO bot_attacks_searchresults (attack_id, available_gold, available_elixir, available_dark_elixir, date_found, opponent_townhall_level)
VALUES (:attack_id, :available_gold, :available_elixir, :available_dark_elixir, :date_found, :opponent_townhall_level)");
$query->bindValue(':attack_id', $attackId);
$query->bindValue(':available_gold', $gold);
$query->bindValue(':available_elixir', $elixir);
$query->bindValue(':available_dark_elixir', $darkElixir);
$query->bindValue(':date_found', $dateFound);
$query->bindValue(':opponent_townhall_level', $thLevel);
$query->execute();
}
Prepare the statement once. MySQL lexes it once, so any subsequent call to the query will be quick since it's already lexed and just needs parameters.
Start the transaction before the loop. This lets the database write all the rows to disk in one I/O operation at commit time. In the default (autocommit) mode, every INSERT is its own transaction, so 1 insert query = 1 I/O on the hard drive.
Create the loop, bind your parameters there and call the $query->execute();
Exit the loop and commit() the transaction.
Full code:
$db->beginTransaction();
$query = $db->prepare("INSERT INTO bot_attacks_searchresults (attack_id, available_gold, available_elixir, available_dark_elixir, date_found, opponent_townhall_level)
VALUES (:attack_id, :available_gold, :available_elixir, :available_dark_elixir, :date_found, :opponent_townhall_level)");
for($i = 0; $i < $baseCount; $i++)
{
$thLevel = $bases[$i]["ThLevel"];
$gold = $bases[$i]["Gold"];
$elixir = $bases[$i]["Elixir"];
$darkElixir = $bases[$i]["DarkElixir"];
$dateFound = $bases[$i]["TimeFound"]; // do not reassign $elixir here
$query->bindValue(':attack_id', $attackId);
$query->bindValue(':available_gold', $gold);
$query->bindValue(':available_elixir', $elixir);
$query->bindValue(':available_dark_elixir', $darkElixir);
$query->bindValue(':date_found', $dateFound);
$query->bindValue(':opponent_townhall_level', $thLevel);
$query->execute();
}
$db->commit();
Here's a very crude proof of concept:
<?php
$values = array();
for($i=0;$i<10;$i++)
{
$values[] = "($i)";
}
$values = implode(',', $values); // separator first; the reversed order was removed in PHP 8
$query = "INSERT INTO my_table VALUES $values";
echo $query;
?>
outputs INSERT INTO my_table VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
You would need to restructure this slightly to work with prepare (PHP is not my forte), but the principle is the same; i.e. you build the query inside the loop, but execute it only once.
I got this code
$i = -1;
$random_string = array();
while (sizeof($random_string) < 1600000) {
$i++;
$zmienna = generatePassword();
if (!in_array($zmienna, $random_string))
$random_string[$i] = $zmienna;
else
continue;
}
//print_r($random_string);
foreach ($random_string as $value) {
$sql = "INSERT INTO `kody`(`kod`) VALUES ('$value')";
mysql_query($sql, $con);
}
But it will take many hours to insert this into the database, or even to build the array. Does anyone know how to improve this code?
Well, in_array() is rather expensive. Use a hash instead of a simple array, and then you can use isset() instead of in_array().
Also, don't use things like sizeof() and count() as loop conditions. Instead, just use a simple for ($i = 0; $i < 1600000; ++$i) { ... } loop.
Depending on your web host permissions, another significant optimization would be to use fputcsv() to write your array to disk and then make use of MySQL's LOAD DATA INFILE to load the contents into your database, instead of generating 1.6 million queries.
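A minimal sketch of the fputcsv() plus LOAD DATA route described above. The function name writeCodesCsv and the file path are assumptions; the table and column names (`kody`, `kod`) come from the question. Whether LOAD DATA LOCAL INFILE is permitted depends on the client and server configuration.

```php
<?php
// Hypothetical helper: writes one code per CSV line and returns how many were written.
function writeCodesCsv(array $codes, string $path): int
{
    $fh = fopen($path, 'w');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $written = 0;
    foreach ($codes as $code) {
        // Separator, enclosure and escape are passed explicitly for clarity.
        fputcsv($fh, [$code], ',', '"', '\\');
        $written++;
    }
    fclose($fh);
    return $written;
}

// Usage sketch (assumes $pdo is a PDO MySQL connection opened with
// PDO::MYSQL_ATTR_LOCAL_INFILE => true and that the server allows local_infile):
// writeCodesCsv(array_keys($random_string), '/tmp/kody.csv');
// $pdo->exec("LOAD DATA LOCAL INFILE '/tmp/kody.csv' INTO TABLE `kody`
//             FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
//             LINES TERMINATED BY '\\n' (`kod`)");
```

The point of the design is that PHP only produces a flat file, and the server parses and inserts it in a single statement.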
Yes, use one query to insert all of them at once with an SQL multi-insert:
$values = "('" . implode( "'), ('", $random_string) . "')";
$sql="INSERT INTO `kody`(`kod`) VALUES " . $values;
mysql_query($sql,$con);
As drrcknlsn very correctly points out, in_array() is inefficient, as it performs a linear O(n) search on the array. Here is how you can fix that (which is a hash implementation):
while( sizeof($random_string) < 1600000) {
$i++;
$zmienna = generatePassword();
if( !isset( $random_string[$zmienna]))
$random_string[$zmienna] = $zmienna;
else
continue;
}
Now, you can use the above code to generate a single SQL query, and this should run much, much faster.
The problem is probably that it's trying to update the INDEX after each insert. Try using transactions. This will only update the INDEX once (after COMMIT is called). This will also let you ROLLBACK if something goes wrong.
mysql_query("SET AUTOCOMMIT=0");
mysql_query("START TRANSACTION");
foreach($random_string as $value)
{
$sql="INSERT INTO `kody`(`kod`) VALUES ('$value')";
mysql_query($sql,$con);
}
mysql_query("COMMIT");
Please bear with me on this question.
I'm looking to create a relatively large MySQL database that I want to use to do some performance testing. I'm using Ubuntu 11.04 by the way.
I want to create about 6 tables, each with about 50 million records. Each table will have about 10 columns. The data would just be random data.
However, I'm not sure how I can go about doing this. Do I use PHP and loop INSERT queries (bound to timeout)? Or if that is inefficient, is there a way I can do this via some command line utility or shell script?
I'd really appreciate some guidance.
Thanks in advance.
mysqlimport is what you want. Check this for full information. It's command line and very fast.
Command-line mode usually has the timeouts disabled, as that's a protection against taking down a webserver, which doesn't apply at the command line.
You can do it from PHP, though generating "random" data will be costly. How random does this information have to be? You can easily read from /dev/urandom and get "garbage", but it's not a source of "good" randomness (you'd want /dev/random for that, but it will block if there isn't enough entropy available to make good garbage).
Just make sure that you have keys disabled on the tables, as keeping those up-to-date will be a major drag on your insert operations. You can add/enable the keys AFTER you've got your data set populated.
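As a rough sketch of the "disable keys, load, re-enable keys" order described above: the helper name wrapWithDisabledKeys is made up for illustration, and the sketch assumes a MyISAM table, where ALTER TABLE ... DISABLE KEYS defers maintenance of non-unique indexes. On InnoDB the usual equivalent is to create the table without secondary indexes and add them with ALTER TABLE afterwards.

```php
<?php
// Hypothetical helper: returns the statements in the order they should be run.
// Assumption: a MyISAM table; DISABLE KEYS only affects non-unique indexes there.
function wrapWithDisabledKeys(string $table, array $insertStatements): array
{
    return array_merge(
        ["ALTER TABLE `$table` DISABLE KEYS"],
        array_values($insertStatements),
        ["ALTER TABLE `$table` ENABLE KEYS"]
    );
}

// Usage sketch (assumes $pdo is an open PDO connection and $inserts holds
// already-built multi-row INSERT statements):
// foreach (wrapWithDisabledKeys('ATable', $inserts) as $sql) {
//     $pdo->exec($sql);
// }
```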
If you do want to go the php way, you could do something like this:
<?php
//Edit Following
$millionsOfRows = 2;
$InsertBatchSize = 1000;
$table = 'ATable';
$RandStrLength = 10;
$timeOut = 0; //set 0 for no timeout
$columns = array('col1','col2','etc');
//Mysql Settings
$username = "root";
$password = "";
$database = "ADatabase";
$server = "localhost";
//Don't edit below
$letters = range('a','z');
$rows = $millionsOfRows * 1000000;
$colCount = count($columns);
$valueArray = array();
$con = @mysql_connect($server, $username, $password) or die('Error accessing database: '.mysql_error());
@mysql_select_db($database) or die ('Couldn\'t connect to database: '.mysql_error());
set_time_limit($timeOut);
for ($i = 0;$i<$rows;$i++)
{
$values = array();
for ($k = 0; $k<$colCount;$k++)
$values[] = RandomString();
$valueArray[] = "('".implode("', '", $values)."')";
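// flush after every full batch; testing $i + 1 keeps the first batch at
// $InsertBatchSize rows and makes sure the final batch is inserted too
if (($i + 1) % $InsertBatchSize == 0)
{
echo "--".(($i + 1) / $InsertBatchSize)."--";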
$sql = "INSERT INTO `$table` (`".implode('`,`',$columns)."`) VALUES ".implode(',',$valueArray);
mysql_query($sql);
echo $sql."<BR/><BR/>";
$valueArray = array();
}
}
mysql_close($con);
function RandomString ()
{
global $RandStrLength, $letters;
$str = "";
for ($i = 0;$i<$RandStrLength;$i++)
$str .= $letters[rand(0,25)];
return $str;
}
Of course you could just use a created dataset, like the NorthWind Database.
all you need to do is launch your script from command line like this:
php -q generator.php
it can then be a simple php file like this:
<?php
$fid = fopen("query.sql", "w");
fputs($fid, "create table a (id int not null auto_increment primary key, b int, c int);\n");
for ($i = 0; $i < 50000000; $i++){
fputs($fid, "insert into a (b,c) values (" . rand(0,1000) . ", " . rand(0,1000) . ");\n");
}
fclose($fid);
exec("mysql -u$user -p$password $db < query.sql");
Probably it is fastest to run multiple inserts in one query as:
INSERT INTO `test` VALUES
(1,2,3,4,5,6,7,8,9,0),
(1,2,3,4,5,6,7,8,9,0),
.....
(1,2,3,4,5,6,7,8,9,0)
I created a PHP script to do this. First I tried to construct a query holding 1 million inserts, but it failed. Then I tried with 100 thousand and it failed again. 50 thousand didn't work either. My next try was with 10,000 and it works fine. I guess I am hitting the transfer limit from PHP to MySQL (probably the max_allowed_packet setting). Here is the code:
<?php
set_time_limit(0);
ini_set('memory_limit', -1);
define('NUM_INSERTS_IN_QUERY', 10000);
define('NUM_QUERIES', 100);
// build query
$time = microtime(true);
$queries = array();
for($i = 0; $i < NUM_QUERIES; $i++){
$queries[$i] = 'INSERT INTO `test` VALUES ';
for($j = 0; $j < NUM_INSERTS_IN_QUERY; $j++){
$queries[$i] .= '(1,2,3,4,5,6,7,8,9,0),';
}
$queries[$i] = rtrim($queries[$i], ',');
}
echo "Building query took " . (microtime(true) - $time) . " seconds\n";
mysql_connect('localhost', 'root', '') or die(mysql_error());
mysql_select_db('store') or die(mysql_error());
mysql_query('DELETE FROM `test`') or die(mysql_error());
// execute the query
$time = microtime(true);
for($i = 0; $i < NUM_QUERIES; $i++){
mysql_query($queries[$i]) or die(mysql_error());
// verify all rows inserted
if(mysql_affected_rows() != NUM_INSERTS_IN_QUERY){
echo "ERROR: on run $i not all rows inserted (" . mysql_affected_rows() . ")\n";
exit;
}
}
echo "Executing query took " . (microtime(true) - $time) . " seconds\n";
$result = mysql_query('SELECT count(*) FROM `test`') or die(mysql_error());
$row = mysql_fetch_row($result);
echo "Total number of rows in table: {$row[0]}\n";
echo "Total memory used in bytes: " . memory_get_usage() . "\n";
?>
The result on my Win 7 dev machine are:
Building query took 0.30241012573242 seconds
Executing query took 5.6592788696289 seconds
Total number of rows in table: 1000000
Total memory used in bytes: 22396560
So for 1 million inserts it took about five and a half seconds. Then I ran it with these settings:
define('NUM_INSERTS_IN_QUERY', 1);
define('NUM_QUERIES', 1000000);
which is basically doing one insert per query. The results are:
Building query took 1.6551470756531 seconds
Executing query took 77.895285844803 seconds
Total number of rows in table: 1000000
Total memory used in bytes: 140579784
Then I tried to create a file with one insert per query in it, as suggested by @jancha. My code is slightly modified:
$fid = fopen("query.sql", "w");
fputs($fid, "use store;");
for($i = 0; $i < 1000000; $i++){
fputs($fid, "insert into `test` values (1,2,3,4,5,6,7,8,9,0);\n");
}
fclose($fid);
$time = microtime(true);
exec("mysql -uroot < query.sql");
echo "Executing query took " . (microtime(true) - $time) . " seconds\n";
The result is:
Executing query took 79.207592964172 seconds
That is about the same as executing the queries one by one through PHP. So the fastest way is probably to do multiple inserts in one query, and it shouldn't be a problem to use PHP to do the work.
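Rather than finding a working batch size by trial and error, the statements could be packed by size. This is a minimal sketch, assuming the limit being hit is MySQL's max_allowed_packet (readable with SHOW VARIABLES LIKE 'max_allowed_packet'); the function name packInserts is hypothetical.

```php
<?php
// Hypothetical helper: packs pre-built "(...)" value tuples into as few INSERT
// statements as possible while keeping each statement at or below $maxBytes.
// A single tuple that is too large on its own is still emitted as its own statement.
function packInserts(string $prefix, array $tuples, int $maxBytes): array
{
    $queries = [];
    $current = '';
    foreach ($tuples as $tuple) {
        if ($current === '') {
            $current = $prefix . $tuple;
        } elseif (strlen($current) + 1 + strlen($tuple) <= $maxBytes) {
            $current .= ',' . $tuple; // +1 accounts for the comma
        } else {
            $queries[] = $current;
            $current = $prefix . $tuple;
        }
    }
    if ($current !== '') {
        $queries[] = $current;
    }
    return $queries;
}

// Usage sketch: leave some headroom below the server's limit, for example
// $queries = packInserts('INSERT INTO `test` VALUES ', $tuples, 1000000);
```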
Do I use PHP and loop INSERT queries (bound to timeout)
Certainly, running long-duration scripts via a webserver-mediated request is not a good idea. But PHP can be compiled to run from the command line; in fact, most distributions of PHP come with this bundled.
There are lots of things you can do to make this run more efficiently; exactly which ones will vary depending on how you are populating the data set (e.g. once only, or lots of batch additions). However, for a single load, you might want to have a look at the output of mysqldump (note the disabling and enabling of indexes and the multi-row insert lines) and recreate this in PHP rather than connecting directly to the database from PHP.
I see no point in this question, and, especially, in raising a bounty for it.
as they say, "the best is the enemy of good"
You have asked this question ten days ago.
If you'd just go with whatever code you've got, you'd have your tables already and even done with your tests. But you lose so much time just in vain. It's above my understanding.
As for the method you've been asking for (just to keep away all these self-appointed moderators), there are some statements as a food for thought:
MySQL's own methods are considered more effective in general.
MySQL can insert all the data from one table into another using the INSERT ... SELECT syntax, so you will need to run only about 30 queries to get your 50 million records.
And sure, MySQL can copy whole tables as well.
Keep in mind that there should be no indexes at the time of table creation.
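To make the INSERT ... SELECT idea concrete, here is a small sketch that generates the doubling statements. The function name doublingStatements and the seed size are assumptions; it relies on MySQL allowing the target table to appear in the SELECT part, and caps the last step with LIMIT so the total lands on the target.

```php
<?php
// Hypothetical helper: each statement copies the table into itself, doubling the
// row count until $targetRows is reached. $columns should omit any auto-increment key.
function doublingStatements(string $table, array $columns, int $seedRows, int $targetRows): array
{
    if ($seedRows < 1) {
        throw new InvalidArgumentException('The table needs at least one seed row');
    }
    $colList = '`' . implode('`,`', $columns) . '`';
    $statements = [];
    $rows = $seedRows;
    while ($rows < $targetRows) {
        // Copy everything, or only what is still missing on the final step.
        $copy = min($rows, $targetRows - $rows);
        $statements[] = "INSERT INTO `$table` ($colList) SELECT $colList FROM `$table` LIMIT $copy";
        $rows += $copy;
    }
    return $statements;
}
```

Starting from a single row, reaching 50 million rows takes 26 such statements, which is in line with the "about 30 queries" estimate above.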
I just want to point you to http://www.mysqldumper.net/ which is a tool that allows you to backup and restore big databases with PHP.
The script has some mechanisms to circumvent the maximum execution time of PHP -> imo worth a look.
This is not a solution for generating data, but a great one for importing / exporting.
I have a csv file that has 3.5 million codes in it.
I should point out that this import is only EVER going to be run this once.
The csv looks like
age9tlg,
rigfh34,
...
Here is my code:
ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");
$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
$line_of_text = fgetcsv($file_handle);
if (is_array($line_of_text))
foreach ($line_of_text as $col) {
if (!empty($col)) {
mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
}
} else {
if (!empty($line_of_text)) {
mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
}
}
}
fclose($file_handle);
Is this code going to die part way through on me?
Will my memory and max execution time be high enough?
NB:
This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.
Update:
here is another possible implementation.
This one does it in bulk inserts of 2000 records
$file_handle = fopen("Weekly.csv", "r");
$vals = array();
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (!is_array($line_of_text)) {
        break; // fgetcsv() returns false at end of file or on error
    }
    foreach ($line_of_text as $col) {
        if (!empty($col)) {
            // add the value first, so the row that completes a batch is not dropped
            $vals[] = "('$col', '')";
            if (count($vals) == 2000) {
                mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
                $vals = array();
            }
        }
    }
}
fclose($file_handle);
// insert whatever is left over after the last full batch
if (count($vals) > 0) {
    mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
}
If I were to use this method, what is the highest number of rows I could set it to insert at once?
Update 2
so, ive found i can use
LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv' INTO TABLE `action_6_weekly` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY ','(`code`)
But the issue now is that I was wrong about the csv format:
it is actually 4 codes and then a line break,
so
fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...
So I need to be able to specify two LINES TERMINATED BY values.
this question has been branched out to Here.
Update 3
Setting it to do bulk inserts of 20k rows, using
while (!feof($file_handle)) {
$val[] = fgetcsv($file_handle);
$i++;
if($i == 20000) {
//do insert
//set $i = 0;
//$val = array();
}
}
//do insert (for the last few rows that don't reach 20k)
but it dies at this point because for some reason $val contains 75k rows. Any idea why?
note the above code is simplified.
I doubt this will be the popular answer, but I would have your php application run mysqlimport on the csv file. Surely it is optimized far beyond what you will do in php.
Is this code going to die part way through on me? Will my memory and max execution time be high enough?
Why don't you try and find out?
You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
Note that MySQL supports delayed and multiple row insertion:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
http://dev.mysql.com/doc/refman/5.1/en/insert.html
Make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes after you've done all the inserts).
Rather than creating a new SQL statement on each pass through the loop, try to prepare the SQL statement outside of the loop and execute that prepared statement with parameters inside the loop. Depending on the database, this can be heaps faster.
I've done the above when importing a large Access database into Postgres using perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted perl to enforce some rules when inserting.
You should accumulate the values and insert them into the database all at once at the end, or in batches every x records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.
Also, you should run this on the command line, where you won't need to worry about execution time limits.
The real answer, though, is evilclown's answer: importing to MySQL from CSV is already a solved problem.
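If the import does stay in PHP, the "accumulate and insert in batches" advice could be sketched with a generator, so that only one batch is held in memory at a time. The function name readCodeBatches is hypothetical; the sketch assumes the file layout from the question (codes separated by commas, with a trailing comma on each line).

```php
<?php
// Hypothetical helper: yields arrays of at most $batchSize non-empty codes.
function readCodeBatches(string $path, int $batchSize): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $batch = [];
    while (($fields = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        foreach ($fields as $code) {
            // Blank lines give a null field; trailing commas give an empty string.
            if ($code === null || $code === '') {
                continue;
            }
            $batch[] = $code;
            if (count($batch) === $batchSize) {
                yield $batch;
                $batch = [];
            }
        }
    }
    fclose($fh);
    if ($batch !== []) {
        yield $batch; // the final, partial batch
    }
}

// Usage sketch: each yielded batch becomes one multi-row INSERT, for example
// foreach (readCodeBatches('Weekly.csv', 2000) as $codes) { /* build and run the INSERT */ }
```

Counting codes rather than lines matters here: with four codes per line, counting 20,000 lines would collect roughly 80,000 values per batch.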
I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
2 possible ways.
1) Batch the process, then have a scheduled job import the file while updating a status. This way, you can have a page that keeps checking the status and refreshes itself if the status is not yet 100%. Users will have a live update of how much has been done. But for this you need access to the OS to be able to set up the scheduled task. And the task will be running idle when there is nothing to import.
2) Have the page handle 1000 rows (or any N number of rows... you decide), then send JavaScript to the browser to refresh itself with a new parameter that tells the script to handle the next 1000 rows. You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops.
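For the second approach, the bookkeeping for each request might look like the sketch below. Everything here (planSlice, the offset parameter, import.php, importRows) is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical helper: describes one step of a resumable import, i.e. which rows
// to handle in this request and where the next request should resume.
function planSlice(int $offset, int $sliceSize, int $totalRows): array
{
    $end = min($offset + $sliceSize, $totalRows);
    return [
        'from'    => $offset,
        'to'      => $end, // exclusive upper bound
        'next'    => $end < $totalRows ? $end : null,
        'percent' => $totalRows > 0 ? (int) floor($end * 100 / $totalRows) : 100,
    ];
}

// Usage sketch inside the page:
// $plan = planSlice((int) ($_GET['offset'] ?? 0), 1000, $totalRows);
// importRows($plan['from'], $plan['to']); // hypothetical import routine
// echo $plan['percent'] . '% done';
// if ($plan['next'] !== null) {
//     echo '<meta http-equiv="refresh" content="0;url=import.php?offset=' . $plan['next'] . '">';
// }
```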