Importing CSV with odd rows into MySQL - php

I'm faced with a problematic CSV file that I have to import to MySQL.
Either through the use of PHP and then insert commands, or straight through MySQL's load data infile.
I have attached a partial screenshot of how the data within the file looks:
The values I need to insert are below "ACC1000" so I have to start at line 5 and make my way through the file of about 5500 lines.
It's not possible to skip to each next line because for some Accounts there are multiple payments as shown below.
I have been trying to get to the next row by scanning the rows for the occurrence of "ACC"
if (strpos($data[$c], 'ACC') !== FALSE){
echo "Yep ";
} else {
echo "Nope ";
}
I know it's crude, but I really don't know where to start.

If you have a (foreign key) constraint defined in your target table such that records with a blank value in the type column will be rejected, you could use MySQL's LOAD DATA INFILE to read the first column into a user variable (which is carried forward into subsequent records) and apply its IGNORE keyword to skip those "records" that fail the FK constraint:
LOAD DATA INFILE '/path/to/file.csv'
IGNORE
INTO TABLE my_table
CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 4 LINES
(#a, type, date, terms, due_date, class, aging, balance)
SET account_no = #account_no := IF(#a='', #account_no, #a)

There are several approaches you could take.
1) You could go with #Jorge Campos suggestion and read the file line by line, using PHP code to skip the lines you don't need and insert the ones you want into MySQL. A potential disadvantage to this approach if you have a very large file is that you will either have to run a bunch of little queries or build up a larger one and it could take some time to run.
2) You could process the file and remove any rows/columns that you don't need, leaving the file in a format that can be inserted directly into mysql via command line or whatever.
Based on which approach you decide to take, either myself or the community can provide code samples if you need them.

This snippet should get you going in the right direction:
$file = '/path/to/something.csv';
if( ! fopen($file, 'r') ) { die('bad file'); }
if( ! $headers = fgetcsv($fh) ) { die('bad data'); }
while($line = fgetcsv($fh)) {
echo var_export($line, true) . "\n";
if( preg_match('/^ACC/', $line[0] ) { echo "record begin\n"; }
}
fclose($fh);
http://php.net/manual/en/function.fgetcsv.php

Related

how to have mysql ignore special character in names

I am trying to do a CSV bulk import of names into MYSQL, but during the process there is a special characters that halts the operation.
The character over the name - Pérez
Is there a way to have MYSQL to ignore this on upload?
My next step is to automate the upload via a web page where a customer can just upload the CSV file and hit submit, therefore hoping to work out these glitches.
I took the suggestion of the panel and recreated my table as UTF8-Default.
ERROR 1366: Incorrect string value: '\xE9rez' for column 'acct_owner' at row 1
SQL Statement:
I tried this and I still get the same error on that special character, plus now for some reason my auto-increment column does not increment, it just captures the data from the last_update column, therefore everything shifts left.
Well in case you want to insert/load csv data with special character you can try like this
LOAD DATA INFILE 'file.csv'
IGNORE INTO TABLE table
CHARACTER SET UTF8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
To resolve this I ended up rebuilding my DB tables, i then went ahead and prior to submitting my data I cycled through it with some code to remove the characters that I know are causing me problems. While might not be the best way it was certainly the easiest way i found. I initially had a cycle that removed any characters that were not standard A-Z or 0-9 but I discovered that caused me other problems as I needed to see ; and \ and even / in some of my details. Therefore I only concentrated on what I know was breaking the import to MYSQL.
Here is a snip of the code i used:
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
//Replace all the special characters to avoid any insert errors
$data = preg_replace('/\'/', '', $data);
$data = preg_replace('/é/', 'e', $data);
if($firstRow) { $firstRow = false; }
else {
$import="INSERT INTO ..... blah blah blah....

csv data import into mysql database using php

Hi I need to import a csv file of 15000 lines.
I m using the fgetcsv function and parsing each and every line..
But I get a timeout error everytime.
The process is too slow and data is oly partially imported.
Is there any way out to make the data import faster and more efficient?
if(isset($_POST['submit']))
{
$fname = $_FILES['sel_file']['name'];
$var = 'Invalid File';
$chk_ext = explode(".",$fname);
if(strtolower($chk_ext[1]) == "csv")
{
$filename = $_FILES['sel_file']['tmp_name'];
$handle = fopen($filename, "r");
$res = mysql_query("SELECT * FROM vpireport");
$rows = mysql_num_rows($res);
if($rows>=0)
{
mysql_query("DELETE FROM vpireport") or die(mysql_error());
for($i =1;($data = fgetcsv($handle, 10000, ",")) !== FALSE; $i++)
{
if($i==1)
continue;
$sql = "INSERT into vpireport
(item_code,
company_id,
purchase,
purchase_value)
values
(".$data[0].",
".$data[1].",
".$data[2].",
".$data[3].")";
//echo "$sql";
mysql_query($sql) or die(mysql_error());
}
}
fclose($handle);
?>
<script language="javascript">
alert("Successfully Imported!");
</script>
<?
}
The problem is everytime it gets stuck in between the import process and displays the following errors:
Error 1 :
Fatal Error: Maximum time limit of 30 seconds exceeded at line 175.
Error 2 :
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'S',0,0)' at line 1
This error I m not able to detect...
The file is imported oly partial everytime.. oly around 200 300 lines out of a 10000 lines..
You can build a batch update string for every 500 lines of csv and then execute it at once if you are doing the mysql inserts on each line. It'll be faster.
Another solution is to read the file with an offset:
Read the first 500 lines,
Insert them to the database
Redirect to csvimporter.php?offset=500
Return the 1. step and read the 500 lines starting with offset 500 this time.
Another solution would be setting the timeout limit to 0 with:
set_time_limit(0);
Set this at the top of the page:
set_time_limit ( 0 )
It will make the page run endlessly. However, that is not recommended but if you have no other option then cant help!
You can consult the documentation here.
To make it faster, you need to check your the various SQL you are sending and see if you have proper indexes created.
If you are calling user defined functions and these functions are referring to global variables, then you can minimize the time take even more by passing those variables to the function and change the code so that the function refers to those passed variables. Referring to global variables is slower than local variables.
You can make use of LOAD DATA INFILE which is a mysql utility, this is much faster than fgetcsv
more information is available on
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
simply use this # the beginning of your php import page
ini_set('max_execution_time',0);
PROBLEM:
There is a huge performance impact on the way you INSERT data into your table. For every one of your records you send an INSERT request to the server, 15000 INSERT requests that's huge!
SOLUTION::
Well you should group your data like the way mysqldump does. In your case you just need three insert statement not 15000 as below:
before the loop write:
$q = "INSERT into vpireport(item_code,company_id,purchase,purchase_value)values";
And inside the loop concatenate the records to the query as below:
$q .= "($data[0],$data[1],$data[2],$data[3]),";
Inside the loop check that the counter is equal to 5000 OR 10000 OR 15000 then insert data to the vpireprot table and then set the $q to INSERT INTO... again.
run the query and enjoy!!!
If this is a one-time exercise, PHPMyAdmin supports Import via CSV.
import-a-csv-file-to-mysql-via-phpmyadmin
He also notes the user of leveraging MySQL's LOAD DATA LOCAL INFILE. This is a very fast way to import data into a database table. load-data Mysql Docs link
EDIT:
Here is some pseudo-code:
// perform the file upload
$absolute_file_location = upload_file();
// connect to your MySQL database as you would normally
your_mysql_connection();
// execute the query
$query = "LOAD DATA LOCAL INFILE '" . $absolute_file_location .
"' INTO TABLE `table_name`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(column1, column2, column3, etc)";
$result = mysql_query($query);
Obviously, you need to ensure good SQL practices to prevent injection, etc.

optimizing Code for inserting 27000*2 keys from plain text file to DB

I need to insert data from a plain text file, explode each line to 2 parts and then insert to the database. I'm doing in this way, But can this programme be optimized for speed ?
the file has around 27000 lines of entry
DB structure [unique key (ext,info)]
ext [varchar]
info [varchar]
code:
$string = file_get_contents('list.txt');
$file_list=explode("\n",$string);
$entry=0;
$db = new mysqli('localhost', 'root', '', 'file_type');
$sql = $db->prepare('INSERT INTO info (ext,info) VALUES(?, ?)');
$j=count($file_list);
for($i=0;$i<$j;$i++)
{
$data=explode(' ',$file_list[$i],2);
$sql->bind_param('ss', $data[0], $data[1]);
$sql->execute();
$entry++;
}
$sql->close();
echo $entry.' entry inserted !<hr>';
If you are sure that file contains unique pairs of ext/info, you can try to disable keys for import:
ALTER TABLE `info` DISABLE KEYS;
And after import:
ALTER TABLE `info` ENABLE KEYS;
This way unique index will be rebuild once for all records, not every time something is inserted.
To increase speed even more you should change format of this file to be CSV compatible and use mysql LOAD DATA to avoid parsing every line in php.
When there are multiple items to be inserted you usually put all data in a CSV file, create a temporary table with columns matching CSV, and then do a LOAD DATA [LOCAL] INFILE, and then move that data into destination table. But as I can see you don't need much additional processing, so you can even treat your input file as a CSV without any additional trouble.
$db->exec('CREATE TEMPORARY TABLE _tmp_info (ext VARCHAR(255), info VARCHAR(255))');
$db->exec("LOAD DATA LOCAL INFILE '{$filename}' INTO TABLE _tmp_info
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'"); // $filename = 'list.txt' in your case
$db->exec('INSERT INTO info (ext, info) SELECT t.ext, t.info FROM _tmp_info t');
You can run a COUNT(*) on temp table after that to show how many records were there.
If you have a large file that you want to read in I would not use file_get_contents. By using it you force the interpreter to store the entire contents in memory all at once, which is a bit wasteful.
The following is a snippet taken from here:
$file_handle = fopen("myfile", "r");
while (!feof($file_handle)) {
$line = fgets($file_handle);
echo $line;
}
fclose($file_handle);
This is different in that all you are keeping in memory from the file at a single instance in time is a single line (not the entire contents of the file), which in your case will probably lower the run-time memory footprint of your script. In your case, you can use the same loop to perform your INSERT operation.
If you can use something like Talend. It's an ETL program, simple and free (it has a paid version).
Here is the magic solution [3 seconds vs 240 seconds]
ALTER TABLE info DISABLE KEYS;
$db->autocommit(FALSE);
//insert
$db->commit();
ALTER TABLE info ENABLE KEYS;

Prevent duplicates in MYSQL database from uploaded CSV

I am using the following script to upload records to my MYSQL database, the problem I can see is if a client record is uploaded and it already exists in the database and is duplicated.
I have seen lots of posts on here about people asking on how to remove duplicates from the csv file itself on upload, e.g if there are two instances of the name bob and the postcode lh456gl in the csv dont upload it, but what I want to know is if its possible to check the database for a record first before adding that record so not to insert a record that already is there.
So something like :
if exist namecolumn=$name_being_inserted and postcode=postcode_being_inserted then
do not add that record.
Is this even possible to do ?.
<?php
//database connect info here
//check for file upload
if(isset($_FILES['csv_file']) && is_uploaded_file($_FILES['csv_file']['tmp_name'])){
//upload directory
$upload_dir = "./csv";
//create file name
$file_path = $upload_dir . $_FILES['csv_file']['name'];
//move uploaded file to upload dir
if (!move_uploaded_file($_FILES['csv_file']['tmp_name'], $file_path)) {
//error moving upload file
echo "Error moving file upload";
}
//open the csv file for reading
$handle = fopen($file_path, 'r');
while (($data = fgetcsv($handle, 1000, ',')) !== FALSE) {
//Access field data in $data array ex.
$name = $data[0];
$postcode = $data[1];
//Use data to insert into db
$sql = sprintf("INSERT INTO test (name, postcode) VALUES ('%s','%s')",
mysql_real_escape_string($name),
mysql_real_escape_string($postcode)
);
mysql_query($sql) or (mysql_query("ROLLBACK") and die(mysql_error() . " - $sql"));
}
//delete csv file
unlink($file_path);
}
?>
There are two pure MySQL methods that I can think of that would deal with this issue. REPLACE INTO and INSERT IGNORE.
REPLACE INTO will overwrite the existing row whereas INSERT IGNORE will ignore errors triggered by duplicate keys being entered in the database.
This is described in the manual as:
If you use the IGNORE keyword, errors that occur while executing the
INSERT statement are treated as warnings instead. For example, without
IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY
value in the table causes a duplicate-key error and the statement is
aborted. With IGNORE, the row still is not inserted, but no error is
issued.
For INSERT IGNORE to work you will need to setup a UNIQUE key/index on one or more of the fields. Looking at your code sample though you do not have anything that could be considered unique in your insert query. What if there are two John Smiths in Wolverhampton? Ideally you would have something like an email address to define as unique.
Simply create a UNIQUE-key over name and postcode, then a row cannot be inserted when a row with both values for that fields already exists.
I would let the records to be inserted in the database and then, after inserting those records, just execute:
ALTER IGNORE TABLE dup_table ADD UNIQUE INDEX(a,b);
where a, and b are your columns where you don't want to have duplicates (key columns...you can have them more). You can wrap all that into transaction. So, just start transaction, insert all records (no matter if they are duplicates), execute command I wrote, commit transaction and then you can remove that (a, b) unique index to prepare it for the next import. Easy.

How to speed up processing a huge text file?

I have an 800mb text file with 18,990,870 lines in it (each line is a record) that I need to pick out certain records, and if there is a match write them into a database.
It is taking an age to work through them, so I wondered if there was a way to do it any quicker?
My PHP is reading a line at a time as follows:
$fp2 = fopen('download/pricing20100714/application_price','r');
if (!$fp2) {echo 'ERROR: Unable to open file.'; exit;}
while (!feof($fp2)) {
$line = stream_get_line($fp2,128,$eoldelimiter); //use 2048 if very long lines
if ($line[0] === '#') continue; //Skip lines that start with #
$field = explode ($delimiter, $line);
list($export_date, $application_id, $retail_price, $currency_code, $storefront_id ) = explode($delimiter, $line);
if ($currency_code == 'USD' and $storefront_id == '143441'){
// does application_id exist?
$application_id = mysql_real_escape_string($application_id);
$query = "SELECT * FROM jos_mt_links WHERE link_id='$application_id';";
$res = mysql_query($query);
if (mysql_num_rows($res) > 0 ) {
echo $application_id . "application id has price of " . $retail_price . "with currency of " . $currency_code. "\n";
} // end if exists in SQL
} else
{
// no, application_id doesn't exist
} // end check for currency and storefront
} // end while statement
fclose($fp2);
At a guess, the performance issue is because it issues a query for each application_id with USD and your storefront.
If space and IO aren't an issue, you might just blindly write all 19M records into a new staging DB table, add indices and then do the matching with a filter?
Don't try to invent the wheel, it's been done. Use a database to search through the file's content. You can looad that file into a staging table in your database and query your data using indexes for fast access if they add value. Most if not all databases have import/loading tools to get a file into the database relatively fast.
19M rows on DB will slow it down if DB was not designed properly. You can still use text files, if it is partitioned properly. Recreating multiple smaller files, based on certain parameters, storing in proper sorted way might work.
Anyway PHP is not the best language for file IO and processing, it is much slower than Java for this task, while plain old C would be one of the fastest for the job. PHP should be restricted to generated dynamic Web output, while core processing should be in Java/C. Ideally it should be Java/C service which generates output, and PHP using that feed to generate HTML output.
You are parsing the input line twice by doing two explodes in a row. I would start by removing the first line:
$field = explode ($delimiter, $line);
list($export_date, ...., $storefront_id ) = explode($delimiter, $line);
Also, if you are only using the query to test for a match based on your condition, don't use SELECT * use something like this:
"SELECT 1 FROM jos_mt_links WHERE link_id='$application_id';"
You could also, as Brandon Horsley suggested, buffer a set of application_id values in an array and modify your select statement to use the IN clause thereby reducing the number of queries you are performing.
Have you tried profiling the code to see where it's spending most of its time? That should always be your first step when trying to diagnose performance problems.
Preprocess with sed and/or awk ?
Databases are built and designed to cope with large amounts of data, PHP isn't. You need to re-evaluate how you are storing the data.
I would dump all the records into a database, then delete the records you don't need. Once you have done that, you can copy those records wherever you want.
As others have mentioned, the expense is likely in your database query. It might be faster to load a batch of records from the file (instead of one at a time) and perform one query to check multiple records.
For example, load 1000 records that match the USD currency and storefront at a time into an array and execute a query like:
'select link_id from jos_mt_links where link_id in (' . implode(',', application_id_array) . ')'
This will return a list of those records that are in the database. Alternatively, you could change the sql to be not in to get a list of those records that are not in the database.

Categories