I want to run about 50,000 INSERT queries against a MySQL database.
For this I have two options:
1- Directly import the (.sql) file:
The following error occurs:
" You probably tried to upload too large file. Please refer to documentation for ways to workaround this limit. "
2- Use PHP code to insert these queries in chunks read from the (.sql) file.
Here is my code:
<?php
// Configure DB
include "config.php";
// Get file data
$file = file('country.txt');
// Set pointers & position variables
$position = 0;
$eof = 0;
while ($eof < sizeof($file))
{
    for ($i = $position; $i < ($position + 2); $i++)
    {
        if ($i < sizeof($file))
        {
            $flag = mysql_query($file[$i]);
            if (isset($flag))
            {
                echo "Insert Successfully<br />";
                $position++;
            }
            else
            {
                echo mysql_error() . "<br>\n";
            }
        }
        else
        {
            echo "<br />End of File";
            break;
        }
    }
    $eof++;
}
?>
But a memory size error occurs even though I have raised the memory limit from 128M to 256M and even 512M.
I think that if I could load a limited number of rows from the (.sql) file, say 1,000 at a time, and execute those queries, it might be possible to import all the records from the file into the DB.
But I have no idea how to track where the reading should start and end, or how to update the start and end positions, so that it does not fetch rows from the .sql file that were already processed.
Here is the code you need, now prettified! =D
<?php
include('config.php');

// Open the dump file for reading; @ suppresses the warning if it is missing
$file = @fopen('country.txt', 'r');
if ($file)
{
    while (!feof($file))
    {
        // Read and execute one statement per line
        $line = trim(fgets($file));
        $flag = mysql_query($line);
        if ($flag)   // mysql_query() returns false on failure
        {
            echo 'Insert Successfully<br />';
        }
        else
        {
            echo mysql_error() . '<br/>';
        }
        flush();
    }
    fclose($file);
}
echo '<br />End of File';
?>
Basically it's a less greedy version of your code: instead of opening the whole file in memory, it reads and executes small chunks (one-liners) of SQL statements.
Instead of loading the entire file into memory, which is what happens when using the file function, a possible solution would be to read it line by line, using a combination of fopen, fgets, and fclose -- the idea being to read only what you need, deal with the lines you have, and only then read the next couple of them.
Additionally, you might want to take a look at this answer: Best practice: Import mySQL file in PHP; split queries
There is no accepted answer yet, but some of the given answers might already help you...
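If you do need to split the work across several requests (so that previously executed lines are never re-read), a rough, untested sketch of that idea could persist the byte offset between runs; the position.txt file, the batch size of 1000, and the mysqli connection details below are placeholders, not part of your code:
<?php
// Sketch: resume processing country.txt across requests by remembering a byte offset.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');   // placeholder credentials

$offsetFile = 'position.txt';
$offset = file_exists($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

$fh = fopen('country.txt', 'r');
fseek($fh, $offset);                      // jump past everything already processed

$batch = 0;
while ($batch < 1000 && ($line = fgets($fh)) !== false) {
    $line = trim($line);
    if ($line === '') {
        continue;                         // skip blank lines
    }
    if (!$mysqli->query($line)) {
        echo $mysqli->error . "<br />\n";
    }
    $batch++;
}

file_put_contents($offsetFile, ftell($fh));   // remember where we stopped
echo feof($fh) ? 'Done' : "Processed $batch more lines";
fclose($fh);
?>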
Use the command-line client; it is far more efficient and should easily handle 50K inserts:
mysql -uUser -p <db_name> < dump.sql
I read recently about inserting lots of queries into a database too quickly. The article suggested using the sleep() (or usleep) function to delay a few seconds between queries so as not to overload the MySQL server.
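For illustration only, a throttled version of such a loop might look like this (the 50 ms pause and the $statements array are placeholders, not values from the article):
// Hypothetical throttling sketch: pause briefly after every statement so the
// server is never flooded. $statements is assumed to hold the INSERT strings.
foreach ($statements as $sql) {
    mysql_query($sql) or die(mysql_error());
    usleep(50000);   // 50 ms pause between queries (arbitrary value)
}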
Related
I have a PHP function that reads a table over ODBC (from an IBM AS400) and writes it to a text file on a daily basis. It works fine until the file grows beyond 1 GB or so; then it just stops at some row and does not write the rest.
function write_data_to_txt($table_new, $query)
{
    global $path_data;
    global $odbc_db, $date2;

    if (!($odbc_rs = odbc_exec($odbc_db, $query))) die("Error executing query $query");
    $num_cols = odbc_num_fields($odbc_rs);

    $path_folder = $path_data . $table_new . "/";
    if (!file_exists($path_folder)) mkdir($path_folder, 0777);

    $filename1 = $path_folder . $table_new . "_" . $date2 . ".txt";
    $comma = "|";
    $newline = chr(13) . chr(10);
    $handle = fopen($filename1, "w+");

    if (is_writable($filename1)) {
        $ctr = 0;
        while (odbc_fetch_row($odbc_rs))
        {
            // function for writing all fields
            // for ($i = 1; $i <= $num_cols; $i++)
            // {
            //     $data = odbc_result($odbc_rs, $i);
            //     if (!fwrite($handle, $data) || !fwrite($handle, $comma)) {
            //         print "Cannot write to file ($filename1)";
            //         exit;
            //     }
            // }
            // end of function writing all fields
            $data = odbc_result($odbc_rs, 1);
            fwrite($handle, $ctr . $comma . $data . $newline);
            $ctr++;
        }
        echo "Write Success. Row = $ctr <br><br>";
    }
    else
    {
        echo "Write Failed<br><br>";
    }
    fclose($handle);
}
There are no errors, just the success message, but there should be 3,690,498 rows (and still increasing), while I only get roughly 3,670,009 rows.
My query is an ordinary select like:
select field1 , field2, field3 , field4, fieldetc from table1
What I tried and what I assume:
I thought it was an fwrite limitation, so I tried not writing all the fields (just $ctr and the first column), but it still stops at the same row, so I assume it is not fwrite exceeding a limit.
I tried reducing the fields I select and then it completes!! So I assumed there is some limitation in ODBC.
I tried the same ODBC data source from SQL Server, selected all fields, and got the complete set of rows. So I assume it is not an ODBC limitation.
I even tried on a 64-bit machine, but it was even worse: it returned only roughly 3,145,812 rows. So I assume it is not about the 32/64-bit infrastructure.
I tried increasing memory_limit in php.ini to 1024M, but that did not work either.
Does anyone know if I need to set something in PHP for my ODBC connection?
I have an issue where cron runs a PHP script every 5 minutes to update a list.
However, the list fails to update about 5% of the time and ends up blank. I don't believe it's related to cron, because I only failed to generate the list manually about twice out of 100 tries.
What I believe it's related to is the site having 50+ people on it: it will then fail to generate, perhaps because the server is busy. I added a check to make sure it's not MySQL returning no rows (which seems impossible), but it still happens, which leads me to believe fwrite is failing.
<?
$fileHandle = fopen("latest.html", 'w');
$links = array();

$query1 = $db_conn -> query("SELECT * FROM `views` ORDER BY `date` DESC LIMIT 0,20");
while ($result1 = $db_conn -> fetch_row($query1))
{
    $result2 = $db_conn -> fetch_query("SELECT * FROM `title` WHERE `id` = '" . $result1['id'] . "'");
    array_push($links, "<a href='/title/" . $result2['title'] . "'>" . $result2['title'] . "</a>");
}

if (count($links) > 0)
    fwrite($fileHandle, implode(" • ", $links));
else
    echo "Didn't work!";

fclose($fileHandle);
?>
Could there be a slight chance the file is in use so it ends up not working and writing a blank list?
$fileHandle = "latest.html", 'w');
I'm going to assume you mean
$fileHandle = fopen("latest.html", 'w');
the 'w' here opens the file, places the cursor at the start and truncates the file to zero length.
If you check count($links) before doing this, you won't truncate the file when there is nothing to be written to it.
<?php
$links = "QUERY HERE AND HANDLE THE RESULTS (REMOVED)";
if (count($links) > 0)
{
    $fileHandle = fopen("latest.html", 'w');
    fwrite($fileHandle, implode(" • ", $links));
    fclose($fileHandle);
}
else
{
    echo "Didn't work!";
}
?>
Could there be a slight chance the file is in use so it ends up not working and writing a blank list?
Well, yes. We don't know what other code you run that manipulates latest.html, so we can't really profile it.
Here are some suggestions:
Fix the syntax error in your file handler creation
You can acquire an fopen('w') handle to a file that has an existing fopen('r') process going on, so be sure to use PHP's flock while writing to the file to ensure other processes don't corrupt your list (see the sketch after this list)
Check to see what your logs have to say
Write to a string, then fwrite the entire string, so you spend less time in your inner loop with your file handle open (especially in this case, where it doesn't seem that the string would be that long -- a list of links)
Try outputting your links (datestamped) to a separate file besides latest.html; in the 5% of cases when it fails, look back at the timestamped links and see how they compare. You can also include your query in that file so you can isolate whether the issue is something to do with the DB or with writing to latest.html -- this will be especially useful in the case where your query (which isn't shown) possibly returns no results.
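For the flock point in the list above, a minimal sketch (reusing the $links array from your code and keeping latest.html as the target) might look like this:
// Sketch: take an exclusive lock before truncating and writing the file.
$fh = fopen('latest.html', 'c');        // 'c' opens for writing without truncating
if ($fh && flock($fh, LOCK_EX)) {       // block until we hold the exclusive lock
    ftruncate($fh, 0);                  // only now is it safe to empty the file
    fwrite($fh, implode(" • ", $links));
    fflush($fh);
    flock($fh, LOCK_UN);
}
if ($fh) {
    fclose($fh);
}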
I think you are leaving yourself open to the possibility that the query is returning no data. The "removed" logic from your example may help shed light on what's going on. A good way of figuring this out is to write something to a log file, and check that log file after a few dozen iterations of your script. In the interest of having something in your latest.html file, I'd use file_put_contents over your current code.
<?php
$links = array();
$query = "SELECT links FROM tableA";
$result = mysql_query($query);          // run the query string, not the empty $links array
while ($row = mysql_fetch_row($result)) {
    $links[] = $row[0];
}
if (count($links) > 0) {
    file_put_contents('latest.html', implode(" * ", $links));
    file_put_contents('linkupdate.log', "got links: " . count($links) . "\n", FILE_APPEND);
} else {
    file_put_contents('linkupdate.log', "No links? [(" . mysql_errno() . ") " . mysql_error() . "]\n", FILE_APPEND);
}
?>
If we find no links, we won't overwrite the previous data file. If we encounter a MySQL error that might be causing the problem, it'll show up in the log output.
A read on the file shouldn't block a write, but switching to file_put_contents will help reduce the time the file is open and empty (there is some latency while you're performing the query and fetching the results).
Feel free to anonymize your query and post that as well - you definitely could have a problem with the result set since your code otherwise seems like it ought to work.
function cpanel_populate_database($dbname)
{
    // populate database
    $sql = file_get_contents(dirname(__FILE__) . '/PHP-Point-Of-Sale/database/database.sql');
    $mysqli->multi_query($sql);
    $mysqli->close();
}
The sql file is a direct export from phpMyAdmin and about 95% of the time runs without issue and all the tables are created and data is inserted. (I am creating a database from scratch)
The other 5% of the time, only the first table, or sometimes the first 4 tables, are created, but none of the other tables are (there are 30 tables).
I have decided NOT to use multi_query because it seems buggy, and to see whether the bug still occurs when using just mysql_query on each statement after splitting on semicolons. Has anyone run into issues like this?
Fast and effective
system('mysql -h #hostname# -u #username# -p #database# < #dump_file#');
I've seen similar issues when using multi_query with queries that can create or alter tables. In particular, I tend to get InnoDB 1005 errors that seem to be related to foreign keys; it's like MySQL doesn't completely finish one statement before moving on to the next, so the foreign keys lack a proper referent.
In one system, I split the problematic statements into their own files. In another, I have indeed run each command separately, splitting on semicolons:
function load_sql_file($basename, $db) {
    // Todo: Trim comments from the end of a line
    log_upgrade("Attempting to run the `$basename` upgrade.");
    $filename = dirname(__FILE__)."/sql/$basename.sql";
    if (!file_exists($filename)) {
        log_upgrade("Upgrade file `$filename` does not exist.");
        return false;
    }
    $file_content = file($filename);
    $query = '';
    foreach ($file_content as $sql_line) {
        $tsl = trim($sql_line);
        if ($sql_line and (substr($tsl, 0, 2) != '--') and (substr($tsl, 0, 1) != '#')) {
            $query .= $sql_line;
            if (substr($tsl, -1) == ';') {
                set_time_limit(300);
                $sql = trim($query, "\0.. ;");
                $result = $db->execute($sql);
                if (!$result) {
                    log_upgrade("Failure in `$basename` upgrade:\n$sql");
                    if ($error = $db->lastError()) {
                        log_upgrade("$error");
                    }
                    return false;
                }
                $query = '';
            }
        }
    }
    $remainder = trim($query);
    if ($remainder) {
        log_upgrade("Trailing text in `$basename` upgrade:\n$remainder");
        if (DEBUG) trigger_error('Trailing text in upgrade script: '.$remainder, E_USER_WARNING);
        return false;
    }
    log_upgrade("`$basename` upgrade successful.");
    return true;
}
I have never resorted to multi_query. When I needed something like that, I moved over to mysqli. Also, if you do not need any results from the query, passing the script to mysql_query will also work. You'll also get those errors if the exported statements are in an incorrect order that clashes with the tables required for foreign keys and the like.
I think the approach of breaking the SQL file into single queries would be a good idea, even if it's just for comparison purposes (to see if it solves the issue).
Also, I'm not sure how big your file is, but I've had a couple of cases where the file was incredibly big and splitting it into batches did the job.
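As a rough, untested sketch of that single-statement approach (it splits the dump on a semicolon followed by a newline, and therefore assumes that sequence never occurs inside quoted string data; $mysqli is the connection object from the question):
// Naive splitting sketch: run the dump one statement at a time.
// Assumes ";\n" never appears inside quoted string data in the dump.
$sql = file_get_contents(dirname(__FILE__) . '/PHP-Point-Of-Sale/database/database.sql');

foreach (explode(";\n", $sql) as $statement) {
    $statement = trim($statement);
    if ($statement === '') {
        continue;                         // skip blank fragments
    }
    if (!$mysqli->query($statement)) {
        error_log('Import failed: ' . $mysqli->error);
        break;                            // stop at the first failure
    }
}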
When I run my script I receive the following error before processing all rows of data.
Maximum execution time of 30 seconds exceeded
After researching the problem, I should be able to extend the max_execution_time, which should resolve the problem.
But being in my PHP programming infancy, I would like to know if there is a more optimal way of writing my script below, so I do not have to rely on "get out of jail" cards.
The script is:
1. Taking a CSV file
2. Cherry-picking some columns
3. Trying to insert 10k rows of CSV data into a MySQL table
In my head I think I should be able to insert in chunks, but that is so far beyond my skill set that I do not even know how to write one line :\
Many thanks in advance
<?php
function processCSV()
{
    global $uploadFile;
    include 'dbConnection.inc.php';
    dbConnection("xx", "xx", "xx");

    $rowCounter = 0;
    $loadLocationCsvUrl = fopen($uploadFile, "r");
    if ($loadLocationCsvUrl <> false)
    {
        // fgetcsv(handle, length, delimiter): length 0 means no line-length limit
        while ($locationFile = fgetcsv($loadLocationCsvUrl, 0, ','))
        {
            $officeId = $locationFile[2];
            $country  = htmlspecialchars(trim($locationFile[9]));
            $open     = htmlspecialchars(trim($locationFile[4]));

            $insString = "insert into countrytable set officeId='$officeId', countryname='$country', status='$open'";

            // Skip the CSV header row (where the column literally says 'Country')
            if ($country <> 'Country')
            {
                if (!mysql_query($insString))
                {
                    echo "<p>error " . mysql_error() . "</p>";
                }
            }
            $rowCounter++;
        }
        echo "$rowCounter inserted.";
    }
    fclose($loadLocationCsvUrl);
}
processCSV();
?>
First, in 2011 you do not use mysql_query; you use mysqli or PDO and prepared statements. Then you do not need to figure out how to escape strings for SQL. You used htmlspecialchars, which is totally wrong for this purpose. Next, you could use a transaction to speed up many inserts. MySQL also supports multi-row inserts.
But the best bet would be to use the CSV storage engine. Read here: http://dev.mysql.com/doc/refman/5.0/en/csv-storage-engine.html You can instantly load everything into SQL and then manipulate it there as you wish. The article also shows the LOAD DATA INFILE command.
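To make the prepared-statement and transaction point concrete, here is a minimal, untested PDO sketch; the DSN, the credentials, and the reuse of the question's column positions are assumptions:
// Sketch: one prepared statement, reused for every row, inside a single transaction.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass',
               array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));

$stmt = $pdo->prepare(
    'INSERT INTO countrytable (officeId, countryname, status) VALUES (?, ?, ?)'
);

$pdo->beginTransaction();
while (($row = fgetcsv($loadLocationCsvUrl)) !== false) {
    if (trim($row[9]) === 'Country') {
        continue;                          // skip the header row
    }
    // PDO escapes the values for us; no htmlspecialchars needed
    $stmt->execute(array($row[2], trim($row[9]), trim($row[4])));
}
$pdo->commit();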
Well, you could create a single query like this.
$query = "INSERT INTO countrytable (officeId, countryname, status) VALUES ";
$entries = array();
while ($locationFile = fgetcsv($loadLocationCsvUrl, 0, ',')) {
// your code
$entries[] = "('$officeId', '$country', '$open')";
}
$query .= implode(', ', $entries);
mysql_query($query);
But this depends on how long your query will be and what the server's max_allowed_packet limit is set to.
But as you can read in other posts, there are better ways for your requirements. I just thought I should share an approach along the lines you were already thinking about.
You can try calling the following function before inserting. This will set the time limit to unlimited instead of the 30 sec default time.
set_time_limit( 0 );
I have a CSV file that has 3.5 million codes in it.
I should point out that this is only EVER going to be run once.
The CSV looks like:
age9tlg,
rigfh34,
...
Here is my code:
ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");

$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
            }
        }
    } else {
        if (!empty($line_of_text)) {
            mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
        }
    }
}
fclose($file_handle);
Is this code going to die part way through on me?
Will my memory and max execution time be high enough?
NB:
This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.
Update:
here is another possible implementation.
This one does it in bulk inserts of 2000 records
$file_handle = fopen("Weekly.csv", "r");
$i = 0;
$vals = array();
while (!feof($file_handle)) {
$line_of_text = fgetcsv($file_handle);
if (is_array($line_of_text))
foreach ($line_of_text as $col) {
if (!empty($col)) {
if ($i < 2000) {
$vals[] = "('$col', '')";
$i++;
} else {
$vals = implode(', ', $vals);
mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
$vals = array();
$i = 0;
}
}
} else {
if (!empty($line_of_text)) {
if ($i < 2000) {
$vals[] = "('$line_of_text', '')";
$i++;
} else {
$vals = implode(', ', $vals);
mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
$vals = array();
$i = 0;
}
}
}
}
fclose($file_handle);
If I were to use this method, what is the highest value I could set it to insert at once?
Update 2
So, I've found I can use:
LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv' INTO TABLE `action_6_weekly` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY ','(`code`)
But the issue now is that I was wrong about the CSV format:
it is actually 4 codes and then a line break,
so
fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...
so I need to be able to specify two LINES TERMINATED BY values.
This question has been branched out to Here.
Update 3
Setting it to do bulk inserts of 20k rows, using
while (!feof($file_handle)) {
$val[] = fgetcsv($file_handle);
$i++;
if($i == 20000) {
//do insert
//set $i = 0;
//$val = array();
}
}
//do insert (for the last few rows that don't reach 20k)
But it dies at this point because, for some reason, $val contains 75k rows. Any idea why?
Note: the above code is simplified.
I doubt this will be the popular answer, but I would have your php application run mysqlimport on the csv file. Surely it is optimized far beyond what you will do in php.
Is this code going to die part way through on me? Will my memory and max execution time be high enough?
Why don't you try and find out?
You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
Note that MySQL supports delayed and multiple row insertion:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
http://dev.mysql.com/doc/refman/5.1/en/insert.html
make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes after you've done all the inserts)
rather than creating a new SQL statement on each pass of the loop, try to prepare the SQL statement outside of the loop, and execute that prepared statement with parameters inside the loop (a rough sketch follows below). Depending on the database this can be heaps faster.
I've done the above when importing a large Access database into Postgres using perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted perl to enforce some rules when inserting.
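A minimal PHP sketch of that prepare-once, execute-many pattern using mysqli (the table name and two-column layout are borrowed from the question; the connection credentials are placeholders, so treat this as an outline rather than a drop-in):
// Sketch: prepare once, execute many times with bound parameters.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');   // placeholder credentials
$code   = '';
$empty  = '';
$stmt   = $mysqli->prepare('INSERT INTO `action_6_weekly` VALUES (?, ?)');
$stmt->bind_param('ss', $code, $empty);                    // bound by reference

$fh = fopen('Weekly.csv', 'r');
while (($line = fgetcsv($fh)) !== false) {
    foreach ($line as $code) {          // each new value of $code is picked up at execute()
        if ($code !== '' && $code !== null) {
            $stmt->execute();
        }
    }
}
fclose($fh);
$stmt->close();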
You should accumulate the values and insert them into the database all at once at the end, or in batches every x records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.
Also, you should run this on the command line, where you won't need to worry about execution time limits.
The real answer though is evilclown's answer, importing to MySQL from CSV is already a solved problem.
I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
2 possible ways:
1) Batch the process, then have a scheduled job import the file while updating a status. This way, you can have a page that keeps checking the status and refreshes itself if the status is not yet 100%. Users will have a live update of how much has been done. But for this you need access to the OS to be able to set up the scheduled task, and the task will sit idle when there is nothing to import.
2) Have the page handle 1000 rows (or any N number of rows... you decide), then send some JavaScript to the browser to refresh the page with a new parameter telling the script to handle the next 1000 rows (a rough sketch is below). You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops.
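A bare-bones, hypothetical sketch of option 2, where the current offset travels back to the script as a GET parameter; import_chunk() and count_total_rows() are made-up helpers standing in for your own import and counting logic:
<?php
// Sketch of the self-refreshing approach: handle 1000 rows per request,
// then ask the browser to reload with the next offset.
// import_chunk() and count_total_rows() are hypothetical helpers.
$offset    = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$chunkSize = 1000;
$total     = count_total_rows();              // how many rows need importing in total

import_chunk($offset, $chunkSize);            // insert rows $offset .. $offset + 999

if ($offset + $chunkSize < $total) {
    $next = $offset + $chunkSize;
    echo "Imported $next of $total rows so far...";
    // Meta refresh keeps it working even without JavaScript
    echo "<meta http-equiv='refresh' content='1;url=?offset=$next'>";
} else {
    echo "Import complete: $total rows.";
}
?>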