Why does str_shuffle always generate similar patterns? - php

I am trying to generate 1500 authentication codes using the following code:
<?php
include "../include/top.php";
set_time_limit(0);
ini_set("memory_limit", "-1");
$end = 0;
while ($end < 1500)
{
    //generate an authentication code
    $string = "ABCDEFGHJKLMNPQRSTUVWXYZ123456789";
    $string = substr(str_shuffle($string), 5, 8);
    //check whether generated code already exist
    $query = "select count(*) from auth where code = '$string' ";
    $stmt = prepare($query);
    execute($stmt);
    $bind = mysqli_stmt_bind_result($stmt, $count);
    check_bind_result($bind);
    mysqli_stmt_fetch($stmt);
    mysqli_stmt_free_result($stmt);
    //If generated code does not already exist, insert it to Database table
    if ($count == 0)
    {
        echo $string."<br>";
        $query = "insert into auth (Code) values ('$string')";
        $stmt = prepare($query);
        execute($stmt);
        $end++;
    }
}
?>
It generated and inserted 1024 codes into the database and printed 667 codes in the browser within 15 seconds, then the browser kept loading without inserting or printing any further codes until I closed the browser window half an hour later.
After that, when opening any web page served by WAMP, the browser just keeps loading and never shows the content. That is, I need to restart WAMP after running this script before I can open any web pages.
I have tried this many times.
Why does the script not generate 1500 codes, and why does it always stop when it reaches the count of 667/1024?
UPDATE
As an experiment, I added an ELSE clause to the IF condition that prints "Code Already Exist". Running the script against an empty (truncated) copy of the same table, it printed and inserted 1024 codes and after that printed "Code Already Exist" continuously (around 700,000+ entries within 5 minutes and still going). Running the script again, with the table now holding those 1024 rows, it doesn't print or insert even a single code; instead it prints "Code Already Exist" indefinitely.
Another thing I observed is that the very first 1024 iterations of the WHILE loop pass the IF condition (if the table is empty), and all subsequent iterations fail it.

I don't think the randomiser in str_shuffle() is up to this.
If I run it once I get 1024 unique codes and then it just generates duplicates. If I then restart it, it will generate another 976 unique codes, giving a total of 2000 codes in the database.
I therefore assume that the randomiser used by str_shuffle() needs a reset to accomplish the generation of the required 1500 unique codes.
Try this minor modification; it will at least stop the execution after 15000 failed attempts at generating a unique code.
Basically I think you have to come up with a much better randomisation mechanism.
<?php
include "../include/top.php";
set_time_limit(0);
ini_set("memory_limit", "-1");
$end = 0;
$dups = 0;
while ($end < 1500 && $dups < 15000)
{
    //generate an authentication code
    $string = "ABCDEFGHJKLMNPQRSTUVWXYZ123456789";
    $string = substr(str_shuffle($string), 5, 8);
    //check whether generated code already exist
    $query = "select count(*) from auth where code = '$string' ";
    $stmt = prepare($query);
    execute($stmt);
    $bind = mysqli_stmt_bind_result($stmt, $count);
    check_bind_result($bind);
    mysqli_stmt_fetch($stmt);
    mysqli_stmt_free_result($stmt);
    //If generated code does not already exist, insert it to Database table
    if ($count == 0) {
        echo $string."<br>";
        $query = "insert into auth (Code) values ('$string')";
        $stmt = prepare($query);
        execute($stmt);
        $end++;
    } else {
        $dups++;
        echo "DUPLICATE for $string, Dup Count = $dups<br>";
    }
}
?>
As for why you need to restart: you set the PHP timeout to zero, so the script never times out and keeps the request busy until you kill it.

I don't see any specific coding errors. The use of str_shuffle() for creating an authentication code is peculiar because it prevents duplicated characters, which gives a much smaller range of possible values. So it may just be repeating the patterns.
Try something like this instead, so that you are shuffling the already-shuffled string:
$origstring = str_shuffle("ABCDEFGHJKLMNPQRSTUVWXYZ123456789");
while ($end < 1500 && $dups < 15000)
{
    $origstring = str_shuffle($origstring);
    $string = substr($origstring, 5, 8);
Or, use something like this to generate the string so that characters can repeat, creating a much larger range of possible values:
$characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$code = '';
for ($i = 0; $i < 8; $i++)
{
    $code .= $characters[mt_rand(0, 35)];
}
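If you are on PHP 7 or later, here is a minimal sketch of the same idea using random_int() instead of mt_rand(). The character set and 8-character length come from the question; the generateCode() helper name is just for illustration:
<?php
// Hypothetical helper: builds an 8-character code from the question's character set.
// random_int() (PHP 7+) draws from a CSPRNG, so repeated runs will not fall into
// the small set of permutations that str_shuffle() was producing.
function generateCode($length = 8)
{
    $characters = 'ABCDEFGHJKLMNPQRSTUVWXYZ123456789';
    $max = strlen($characters) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $characters[random_int(0, $max)];
    }
    return $code;
}

echo generateCode(); // e.g. "K7QDPX2M"
?>
Because characters may repeat, the space of possible codes is 33^8, so collisions among only 1500 rows are extremely unlikely; you would still keep the duplicate check (or a UNIQUE index on the column) as a safety net.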

You may have to fine-tune some variables in your php.ini configuration file. Find these:
; Maximum execution time of each script, in seconds
; http://php.net/max-execution-time
; Note: This directive is hardcoded to 0 for the CLI SAPI
max_execution_time = 600
And, though not mandatory, you can also change these:
suhosin.post.max_vars = 5000
suhosin.request.max_vars = 5000
After modification, restart your web server.
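If you cannot edit php.ini (for example on shared hosting), a sketch of the runtime equivalents, set at the top of the script; the values shown are only examples, and some hosts disallow overriding these at runtime:
<?php
// Raise the limits for this request only; other scripts are unaffected.
ini_set('max_execution_time', 600);   // seconds; set_time_limit(600) is equivalent
ini_set('memory_limit', '512M');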

Related

PHP While Loop with MySQL Has Inconsistent Execution Time, How Can I Fix This?

My problem is simple. On my website I'm loading several results from MySQL tables inside a while loop in PHP, and for some reason the execution time varies from reasonably short (0.13s) to confusingly long (11s) and I have no idea why. Here is a short version of the code:
<?php
$sql =
"SELECT * FROM test_users, image_uploads
WHERE test_users.APPROVAL = 'granted'
AND test_users.NAME = image_uploads.OWNER
".$checkmember."
".$checkselected."
ORDER BY " . $sortingstring . " LIMIT 0, 27
";
$result = mysqli_query($mysqli, $sql);
$data = "";
$c = 0;
$start = microtime(true);
while($value = mysqli_fetch_array($result)) {
$files_key = $value["KEY"];
$file_hidden = "no";
$inner_query = "SELECT * FROM my_table WHERE `KEY` = '".$files_key."' AND HIDDEN = '".$file_hidden."'";
$inner_result = mysqli_query($mysqli, $inner_query);
while ($row = mysqli_fetch_array($inner_result)) {
// getting all variables with row[n]
}
$sql = "SELECT * FROM some_other_table WHERE THM=? AND MEMBER=?";
$fstmt = $mysqli->prepare($sql);
$fstmt->bind_param("ss", $value['THM'], 'username');
$fstmt->execute();
$fstmt->store_result();
if($fstmt->num_rows > 0) {
$part0 = 'some elaborate string';
} else {
$part0 = 'some different string';
}
$fstmt->close();
// generate a document using the gathered data
include "../data.php"; // produces $partsMerged
// save to data string
$data .= $partsMerged;
$c++;
}
$time_elapsed_secs = substr(microtime(true) - $start, 0, 5);
// takes sometimes only 0.13 seconds
// and other times up to 11 seconds and more
?>
I was wondering where the problem could be.
Does it have to do with my DB connection, or is my code flawed? I didn't have this problem when I first implemented it, but for a few months now it has been behaving strangely. Sometimes it loads very fast; other times, as I said, it takes 11 seconds or even more.
How can I fix this?
There are a few ways to debug this.
Firstly, any dynamic variables that form part of your query (e.g. $checkmember): we have no way of knowing here whether these are the same or different each time you execute the query. If they are different, then each time you are executing a different query, so it goes without saying that it may take longer depending on which query is being run.
Regardless of the answer, try running the SQL through the MySQL command line and see how long that query takes.
If it's similar (i.e. not an 11-second range), then it's nothing to do with the actual query itself.
You also need to say what environment you're running this in: a web server, i.e. accessing the PHP script via a browser, or the command line.
There isn't enough information to answer your question, but you need to at least establish some of these things first.
The rule of thumb is that if your raw SQL executes on the MySQL command line in a similar amount of time on subsequent attempts, the problem is elsewhere (e.g. the connection to the web server via the browser). That can be monitored in the Network tab of your browser.
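To narrow down which part of the loop is slow, here is a rough sketch of a timing helper you could drop into the existing loop. The helper name and labels are made up; it assumes the variables from the question's loop are in scope where you call it:
<?php
// A small timing helper (hypothetical): wrap each stage of the loop so you can
// see which stage accounts for the occasional 11-second runs.
$timings = array();

function timed($label, $fn)
{
    global $timings;
    $start = microtime(true);
    $result = $fn();
    if (!isset($timings[$label])) {
        $timings[$label] = 0;
    }
    $timings[$label] += microtime(true) - $start;
    return $result;
}

// Inside the question's loop you would write, for example:
// $inner_result = timed('inner_query', function () use ($mysqli, $inner_query) {
//     return mysqli_query($mysqli, $inner_query);
// });

// After the loop:
// arsort($timings);
// print_r($timings);
Whichever label dominates on a slow run (the inner query, the prepared statement, or the included template) is where to look next.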

Fast inserts result in data loss

I have a strange bug here related to MySQL and PHP. I'm wondering if it could be a performance problem on our server's behalf too.
I have a class used to manage rebate promotional codes. The code is great, works fine and does exactly what it is supposed to do. The saveChanges() operation sends an INSERT or UPDATE depending on the state of the object; in the current context it will only insert, because I'm trying to generate coupon codes.
The class's saveChanges() goes like this (I know I shouldn't be using the old mysql extension, but I've got no choice due to architectural limitations of the software, so please don't complain about that part):
public function saveChanges($asNew = false){
    //Get the connection
    global $conn_panier;
    //Check if the rebate still exists
    if($this->isNew() || $asNew){
        //Check unicity if new
        if(reset(mysql_fetch_assoc(mysql_query('SELECT COUNT(*) FROM panier_rabais_codes WHERE code_coupon = "'.mysql_real_escape_string($this->getCouponCode(), $conn_panier).'"', $conn_panier))) > 0){
            throw new Activis_Catalog_Models_Exceptions_ValidationException('Coupon code "'.$this->getCouponCode().'" already exists in the system', $this, __METHOD__, $this->getCouponCode());
        }
        //Insert the new rebate code
        mysql_query($q = 'INSERT INTO panier_rabais_codes
            (
                `no_rabais`,
                `code_coupon`,
                `utilisation`,
                `date_verrou`
            )VALUES(
                '.$this->getRebate()->getId().',
                "'.mysql_real_escape_string(stripslashes($this->getCouponCode()), $conn_panier).'",
                '.$this->getCodeUsage().',
                "'.($this->getInvalidityDate() === NULL ? '0000-00-00 00:00:00' : date('Y-m-d G:i:s', strtotime($this->getInvalidityDate()))).'"
            )', $conn_panier);
        return (mysql_affected_rows($conn_panier) >= 1);
    }else{
        //Update the existing rebate
        mysql_query('UPDATE panier_rabais_codes
            SET
                `utilisation` = '.$this->getCodeUsage().',
                `date_verrou` = "'.($this->getInvalidityDate() === NULL ? '0000-00-00 00:00:00' : date('Y-m-d G:i:s', strtotime($this->getInvalidityDate()))).'"
            WHERE
                no_rabais = '.$this->getRebate()->getId().' AND code_coupon = "'.mysql_real_escape_string($this->getCouponCode(), $conn_panier).'"', $conn_panier);
        return (mysql_affected_rows($conn_panier) >= 0);
    }
}
So as you can see, the code itself is pretty simple and clean, and returns true if the insert succeeded, false if not.
The other portion of the code generates the codes using a random algorithm and goes like this:
while($codes_to_generate > 0){
    //Sleep to delay mysql choking on the input
    usleep(100);
    //Generate a random code
    $code = strtoupper('RC'.$rebate->getId().rand(254852, 975124));
    $code .= strtoupper(substr(md5($code), 0, 1));
    $rebateCode = new Activis_Catalog_Models_RebateCode($rebate);
    $rebateCode->setCouponCode($code);
    $rebateCode->setCodeUsage($_REQUEST['utilisation_generer']);
    try{
        if($rebateCode->saveChanges()){
            $codes_to_generate--;
            $generated_codes[] = $code;
        }
    }catch(Exception $ex){
    }
}
As you can see here, there are two things to note. The number of codes to generate only goes down, and the array of generated codes only gets filled, if I get a return of true from saveChanges(), so MySQL HAS to report that the information was inserted for this part to happen.
The other tidbit is the first line of the while loop:
//Sleep to delay mysql choking on the input
usleep(100);
Wtf? Well, this post is all about that. My code works flawlessly with small amounts of codes to generate, but if I ask MySQL to save more than a few codes at once, I have to throttle myself using usleep or MySQL drops some of these lines. It reports that there are affected rows but does not save them.
Under 100 lines I don't need throttling; beyond that I need to usleep depending on the number of lines to insert. It must be something simple, but I don't know what. Here is a summary of the line counts I tried to insert and the minimum usleep throttle I had to implement:
< 100 lines: none
< 300 lines: 2 ms
< 1000 lines: 5 ms
< 2000 lines: 10 ms
< 5000 lines: 20 ms
< 10000 lines: 100 ms
Thank you for your time
Are you sure that your codes are all being inserted and not updated? Updating a non-existent row does nothing, yet your update branch still returns true, because it checks mysql_affected_rows() >= 0.
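One way to check what is actually happening, as a sketch: log what MySQL reports for every call, using the old mysql extension and the connection from the question. The wrapper name and log file path are just examples:
<?php
// Hypothetical logging wrapper around the insert call: records the query,
// mysql_affected_rows() and mysql_error() so you can see whether MySQL
// really acknowledged each row or silently failed.
function logged_query($sql, $conn)
{
    $ok = mysql_query($sql, $conn);
    $line = date('Y-m-d H:i:s')
          . "\taffected=" . mysql_affected_rows($conn)
          . "\terror=" . mysql_error($conn)
          . "\t" . substr($sql, 0, 120) . "\n";
    file_put_contents('/tmp/rebate_inserts.log', $line, FILE_APPEND); // example path
    return $ok;
}
Comparing the number of "affected=1" lines in the log against SELECT COUNT(*) FROM panier_rabais_codes after a run would tell you whether the rows are lost on the MySQL side or never sent at all.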

Creating a very large MySQL Database from PHP Script

Please bear with me on this question.
I'm looking to create a relatively large MySQL database that I want to use to do some performance testing. I'm using Ubuntu 11.04 by the way.
I want to create about 6 tables, each with about 50 million records. Each table will have about 10 columns. The data would just be random data.
However, I'm not sure how I can go about doing this. Do I use PHP and loop INSERT queries (bound to timeout)? Or if that is inefficient, is there a way I can do this via some command line utility or shell script?
I'd really appreciate some guidance.
Thanks in advance.
mysqlimport is what you want. Check this for full information. It's command line and very fast.
Command-line mode usually has the timeouts disabled, as that's a protection against taking down a webserver, which doesn't apply at the command line.
You can do it from PHP, though generating "random" data will be costly. How random does this information have to be? You can easily read from /dev/urandom and get "garbage" quickly; /dev/random is the "higher quality" source, but it will block if there isn't enough entropy available to make good garbage.
Just make sure that you have keys disabled on the tables, as keeping those up-to-date will be a major drag on your insert operations. You can add/enable the keys AFTER you've got your data set populated.
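A sketch of that disable/enable pattern with the old mysql extension used elsewhere in this thread. `ATable` is the placeholder table name from the script below, and note that DISABLE KEYS only affects non-unique indexes on MyISAM tables; for InnoDB you would instead drop secondary indexes and re-add them afterwards:
<?php
// Skip index maintenance during the bulk load, then rebuild once at the end.
mysql_query("ALTER TABLE `ATable` DISABLE KEYS");

// ... run all of the batched INSERTs here ...

mysql_query("ALTER TABLE `ATable` ENABLE KEYS");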
If you do want to go the php way, you could do something like this:
<?php
//Edit Following
$millionsOfRows = 2;
$InsertBatchSize = 1000;
$table = 'ATable';
$RandStrLength = 10;
$timeOut = 0; //set 0 for no timeout
$columns = array('col1','col2','etc');
//Mysql Settings
$username = "root";
$password = "";
$database = "ADatabase";
$server = "localhost";
//Don't edit below
$letters = range('a','z');
$rows = $millionsOfRows * 1000000;
$colCount = count($columns);
$valueArray = array();
$con = #mysql_connect($server, $username, $password) or die('Error accessing database: '.mysql_error());
#mysql_select_db($database) or die ('Couldn\'t connect to database: '.mysql_error());
set_time_limit($timeOut);
for ($i = 0;$i<$rows;$i++)
{
$values = array();
for ($k = 0; $k<$colCount;$k++)
$values[] = RandomString();
$valueArray[] = "('".implode("', '", $values)."')";
if ($i > 0 && ($i % $InsertBatchSize) == 0)
{
echo "--".$i/$InsertBatchSize."--";
$sql = "INSERT INTO `$table` (`".implode('`,`',$columns)."`) VALUES ".implode(',',$valueArray);
mysql_query($sql);
echo $sql."<BR/><BR/>";
$valueArray = array();
}
}
// insert any remaining rows that did not fill a complete batch
if (count($valueArray) > 0)
{
    $sql = "INSERT INTO `$table` (`".implode('`,`',$columns)."`) VALUES ".implode(',',$valueArray);
    mysql_query($sql);
}
mysql_close($con);
function RandomString ()
{
global $RandStrLength, $letters;
$str = "";
for ($i = 0;$i<$RandStrLength;$i++)
$str .= $letters[rand(0,25)];
return $str;
}
Of course you could just use a created dataset, like the NorthWind Database.
All you need to do is launch your script from the command line like this:
php -q generator.php
It can then be a simple PHP file like this:
<?php
$fid = fopen("query.sql", "w");
fputs($fid, "create table a (id int not null auto_increment primary key, b int, c int);\n");
for ($i = 0; $i < 50000000; $i++){
    fputs($fid, "insert into a (b,c) values (" . rand(0,1000) . ", " . rand(0,1000) . ");\n");
}
fclose($fid);
exec("mysql -u$user -p$password $db < query.sql");
It is probably fastest to run multiple inserts in one query, such as:
INSERT INTO `test` VALUES
(1,2,3,4,5,6,7,8,9,0),
(1,2,3,4,5,6,7,8,9,0),
.....
(1,2,3,4,5,6,7,8,9,0)
I created a PHP script to do this. First I tried to construct a query that would hold 1 million inserts, but it failed. Then I tried with 100 thousand and it failed again; 50 thousand didn't work either. My next try was with 10 000 and it works fine. I guess I am hitting the transfer limit from PHP to MySQL (likely max_allowed_packet). Here is the code:
<?php
set_time_limit(0);
ini_set('memory_limit', -1);
define('NUM_INSERTS_IN_QUERY', 10000);
define('NUM_QUERIES', 100);
// build query
$time = microtime(true);
$queries = array();
for($i = 0; $i < NUM_QUERIES; $i++){
    $queries[$i] = 'INSERT INTO `test` VALUES ';
    for($j = 0; $j < NUM_INSERTS_IN_QUERY; $j++){
        $queries[$i] .= '(1,2,3,4,5,6,7,8,9,0),';
    }
    $queries[$i] = rtrim($queries[$i], ',');
}
echo "Building query took " . (microtime(true) - $time) . " seconds\n";
mysql_connect('localhost', 'root', '') or die(mysql_error());
mysql_select_db('store') or die(mysql_error());
mysql_query('DELETE FROM `test`') or die(mysql_error());
// execute the query
$time = microtime(true);
for($i = 0; $i < NUM_QUERIES; $i++){
    mysql_query($queries[$i]) or die(mysql_error());
    // verify all rows inserted
    if(mysql_affected_rows() != NUM_INSERTS_IN_QUERY){
        echo "ERROR: on run $i not all rows inserted (" . mysql_affected_rows() . ")\n";
        exit;
    }
}
echo "Executing query took " . (microtime(true) - $time) . " seconds\n";
$result = mysql_query('SELECT count(*) FROM `test`') or die(mysql_error());
$row = mysql_fetch_row($result);
echo "Total number of rows in table: {$row[0]}\n";
echo "Total memory used in bytes: " . memory_get_usage() . "\n";
?>
The results on my Win 7 dev machine are:
Building query took 0.30241012573242 seconds
Executing query took 5.6592788696289 seconds
Total number of rows in table: 1000000
Total memory used in bytes: 22396560
So for 1 mil inserts it took 5 and a half seconds. Then I ran it with these settings:
define('NUM_INSERTS_IN_QUERY', 1);
define('NUM_QUERIES', 1000000);
which is basically doing one insert per query. The results are:
Building query took 1.6551470756531 seconds
Executing query took 77.895285844803 seconds
Total number of rows in table: 1000000
Total memory used in bytes: 140579784
Then I tried to create a file with one insert per query in it, as suggested by @jancha. My code is slightly modified:
$fid = fopen("query.sql", "w");
fputs($fid, "use store;");
for($i = 0; $i < 1000000; $i++){
fputs($fid, "insert into `test` values (1,2,3,4,5,6,7,8,9,0);\n");
}
fclose($fid);
$time = microtime(true);
exec("mysql -uroot < query.sql");
echo "Executing query took " . (microtime(true) - $time) . " seconds\n";
The result is:
Executing query took 79.207592964172 seconds
Same as executing the queries through PHP one at a time. So the fastest way is probably to pack multiple inserts into one query, and it shouldn't be a problem to use PHP to do the work.
Do I use PHP and loop INSERT queries (bound to timeout)
Certainly, running long-duration scripts via a webserver-mediated request is not a good idea. But PHP can be compiled to run from the command line; in fact, most distributions of PHP come bundled with this.
There are lots of things you can do to make this run more efficiently; exactly which ones will vary depending on how you are populating the data set (e.g. once only, lots of batch additions). However, for a single load, you might want to have a look at the output of mysqldump (note the disabling and enabling of indexes, and the multi-row insert lines) and recreate that in PHP rather than connecting directly to the database from PHP.
I see no point in this question, and especially in raising a bounty for it.
As they say, "the best is the enemy of the good."
You asked this question ten days ago.
If you had just gone with whatever code you've got, you'd have your tables already and would even be done with your tests. Instead you've lost so much time in vain; it's beyond my understanding.
As for the method you've been asking for (just to keep away all these self-appointed moderators), here are some statements as food for thought:
MySQL's own methods are considered more effective in general.
MySQL can insert all data from one table into another (or into itself) using the INSERT ... SELECT syntax, so you would need to run only about 30 queries to get your 50 million records (see the sketch below).
And sure, MySQL can copy whole tables as well.
Keep in mind that there should be no indexes at the time of table creation.
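A sketch of that doubling idea, assuming a mysql connection is already open; the table and column names are placeholders:
<?php
// Each pass copies the whole table into itself, doubling the row count,
// so ~26 passes of INSERT ... SELECT reach 50 million rows from a single seed row.
mysql_query("INSERT INTO `big_table` (b, c) VALUES (1, 2)"); // seed row

for ($pass = 0; $pass < 26; $pass++) {
    mysql_query("INSERT INTO `big_table` (b, c) SELECT b, c FROM `big_table`")
        or die(mysql_error());
}
// 2^26 = 67,108,864 rows; trim with DELETE ... LIMIT if you need exactly 50 million.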
I just want to point you to http://www.mysqldumper.net/ which is a tool that allows you to backup and restore big databases with PHP.
The script has some mechanisms to circumvent the maximum execution time of PHP -> imo worth a look.
This is not a solution for generating data, but a great one for importing / exporting.

maximum execution time of 30 seconds exceeded php

When I run my script I receive the following error before it has processed all rows of data:
maximum execution time of 30 seconds exceeded
After researching the problem, I should be able to extend max_execution_time, which should resolve the problem.
But being in my PHP programming infancy, I would like to know if there is a more optimal way of writing my script below, so I do not have to rely on "get out of jail" cards.
The script is:
1. Taking a CSV file
2. Cherry-picking some columns
3. Trying to insert 10k rows of CSV data into a MySQL table
In my head I think I should be able to insert in chunks, but that is so far beyond my skill set that I do not even know how to write one line :\
Many thanks in advance
<?php
function processCSV()
{
global $uploadFile;
include 'dbConnection.inc.php';
dbConnection("xx","xx","xx");
$rowCounter = 0;
$loadLocationCsvUrl = fopen($uploadFile,"r");
if ($loadLocationCsvUrl <> false)
{
while ($locationFile = fgetcsv($loadLocationCsvUrl, 0, ','))
{
$officeId = $locationFile[2];
$country = $locationFile[9];
$country = trim($country);
$country = htmlspecialchars($country);
$open = $locationFile[4];
$open = trim($open);
$open = htmlspecialchars($open);
$insString = "insert into countrytable set officeId='$officeId', countryname='$country', status='$open'";
switch($country)
{
case $country <> 'Country':
if (!mysql_query($insString))
{
echo "<p>error " . mysql_error() . "</p>";
}
break;
}
$rowCounter++;
}
echo "$rowCounter inserted.";
}
fclose($loadLocationCsvUrl);
}
processCSV();
?>
First, in 2011 you do not use mysql_query. You use mysqli or PDO and prepared statements; then you do not need to figure out how to escape strings for SQL. You used htmlspecialchars, which is totally wrong for this purpose. Next, you could use a transaction to speed up many inserts. MySQL also supports multi-row inserts.
But the best bet would be to use the CSV storage engine. Read here: http://dev.mysql.com/doc/refman/5.0/en/csv-storage-engine.html. You can instantly load everything into SQL and then manipulate it there as you wish. The article also shows the LOAD DATA INFILE command.
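A minimal sketch of the prepared-statement-plus-transaction approach suggested above, using mysqli. The countrytable columns, the column indexes and the $uploadFile variable come from the question; the connection credentials are placeholders:
<?php
// Prepare once, execute per row, and commit everything in a single transaction.
// No htmlspecialchars needed: the prepared statement handles SQL escaping.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb'); // placeholder credentials

$stmt = $mysqli->prepare(
    "INSERT INTO countrytable (officeId, countryname, status) VALUES (?, ?, ?)"
);
$stmt->bind_param('sss', $officeId, $country, $open);

$mysqli->begin_transaction();
$fh = fopen($uploadFile, 'r');
while (($row = fgetcsv($fh, 0, ',')) !== false) {
    if ($row[9] === 'Country') {
        continue; // skip the header row, as the original switch() tried to do
    }
    $officeId = $row[2];
    $country  = trim($row[9]);
    $open     = trim($row[4]);
    $stmt->execute();
}
fclose($fh);
$mysqli->commit(); // one commit for all rows instead of one per INSERT
Note that begin_transaction() needs PHP 5.5+; on older versions you could issue $mysqli->query('START TRANSACTION') instead.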
Well, you could create a single query like this.
$query = "INSERT INTO countrytable (officeId, countryname, status) VALUES ";
$entries = array();
while ($locationFile = fgetcsv($loadLocationCsvUrl, ',')) {
// your code
$entries[] = "('$officeId', '$country', '$open')";
}
$query .= implode(', ', $enties);
mysql_query($query);
But this depends on how long your query will be and what the server limit is set to.
But as you can read in other posts, there are better ways for your requirements. Still, I thought I should share the approach you were already thinking about.
You can try calling the following function before inserting. This will set the time limit to unlimited instead of the 30 sec default time.
set_time_limit( 0 );

This code needs to loop over 3.5 million rows, how can I make it more efficient?

I have a csv file that has 3.5 million codes in it.
I should point out that this is only EVER going to be run this once.
The csv looks like
age9tlg,
rigfh34,
...
Here is my code:
ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");
$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
$line_of_text = fgetcsv($file_handle);
if (is_array($line_of_text))
foreach ($line_of_text as $col) {
if (!empty($col)) {
mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
}
} else {
if (!empty($line_of_text)) {
mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
}
}
}
fclose($file_handle);
Is this code going to die part way through on me?
Will my memory and max execution time be high enough?
NB:
This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.
Update:
Here is another possible implementation.
This one does it in bulk inserts of 2000 records:
$file_handle = fopen("Weekly.csv", "r");
$i = 0;
$vals = array();
while (!feof($file_handle)) {
$line_of_text = fgetcsv($file_handle);
if (is_array($line_of_text))
foreach ($line_of_text as $col) {
if (!empty($col)) {
if ($i < 2000) {
$vals[] = "('$col', '')";
$i++;
} else {
$vals[] = "('$col', '')"; // keep the current value so it is not lost
$vals = implode(', ', $vals);
mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
$vals = array();
$i = 0;
}
}
} else {
if (!empty($line_of_text)) {
if ($i < 2000) {
$vals[] = "('$line_of_text', '')";
$i++;
} else {
$vals[] = "('$line_of_text', '')"; // keep the current value so it is not lost
$vals = implode(', ', $vals);
mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
$vals = array();
$i = 0;
}
}
}
}
// insert any remaining values that did not fill a complete batch
if (!empty($vals)) {
mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
}
fclose($file_handle);
If I were to use this method, what is the highest value I could set it to insert at once?
Update 2
So, I've found I can use
LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv' INTO TABLE `action_6_weekly` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY ','(`code`)
but the issue now is that I was wrong about the CSV format.
It is actually 4 codes and then a line break, so:
fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...
So I need to be able to specify two LINES TERMINATED BY values.
This part of the question has been branched out here.
Update 3
Setting it to do bulk inserts of 20k rows, using
while (!feof($file_handle)) {
$val[] = fgetcsv($file_handle);
$i++;
if($i == 20000) {
//do insert
//set $i = 0;
//$val = array();
}
}
//do insert (for the last few rows that don't reach 20k)
but it dies at this point because for some reason $val contains 75k rows. Any idea why?
note the above code is simplified.
I doubt this will be the popular answer, but I would have your php application run mysqlimport on the csv file. Surely it is optimized far beyond what you will do in php.
is this code going to die part way
through on me? will my memory and max
execution time be high enough?
Why don't you try and find out?
You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
Note that MySQL supports delayed and multiple row insertion:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
http://dev.mysql.com/doc/refman/5.1/en/insert.html
make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes after you've done all the inserts)
Rather than creating a new SQL statement in each iteration of the loop, prepare the SQL statement outside the loop and execute that prepared statement with parameters inside the loop. Depending on the database this can be heaps faster.
I've done the above when importing a large Access database into Postgres using perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted perl to enforce some rules when inserting.
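A minimal sketch of that prepare-once, execute-many pattern with mysqli, using the action_6_weekly table from the question; the connection details are placeholders:
<?php
// Prepare the statement once, then bind and execute it for every code.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb'); // placeholder credentials

$stmt = $mysqli->prepare("INSERT INTO `action_6_weekly` VALUES (?, '')");
$stmt->bind_param('s', $code);

$file_handle = fopen('Weekly.csv', 'r');
while (($line = fgetcsv($file_handle)) !== false) {
    foreach ($line as $col) {
        if (!empty($col)) {
            $code = $col;
            $stmt->execute(); // reuses the parsed statement each time
        }
    }
}
fclose($file_handle);
$stmt->close();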
You should accumulate the values and insert them into the database all at once at the end, or in batches every x records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.
Also, you should run this on the command line, where you won't need to worry about execution time limits.
The real answer though is evilclown's answer, importing to MySQL from CSV is already a solved problem.
I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
Two possible ways:
1) Batch the process, then have a scheduled job import the file while updating a status. This way you can have a page that keeps checking the status and refreshes itself while the status is not yet 100%. Users will have a live update of how much has been done. But for this you need access to the OS to be able to set up the scheduled task, and the task will be sitting idle when there is nothing to import.
2) Have the page handle 1000 rows (or any N number of rows, you decide), then send JavaScript to the browser to reload the page with a new parameter telling the script to handle the next 1000 rows. You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops. A sketch of this approach follows.
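A rough sketch of option 2; the file name matches the question, while the script name, chunk size and query-string parameter are made up for illustration:
<?php
// chunked_import.php?offset=0 : processes CHUNK rows per request, then tells the
// browser to reload with the next offset until the whole file has been handled.
const CHUNK = 1000;

$offset = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;
$fh     = fopen('Weekly.csv', 'r');

// skip the rows already handled by previous requests
for ($i = 0; $i < $offset && fgetcsv($fh) !== false; $i++);

$processed = 0;
while ($processed < CHUNK && ($row = fgetcsv($fh)) !== false) {
    // ... insert $row here, e.g. with a prepared statement ...
    $processed++;
}
fclose($fh);

if ($processed === CHUNK) {
    $next = $offset + CHUNK;
    echo "Processed up to row $next...";
    echo "<script>location.href = '?offset=$next';</script>";
} else {
    echo "Import finished at row " . ($offset + $processed) . ".";
}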
