I am trying to do a CSV bulk import of names into MySQL, but during the process a special character halts the operation.
The offending character is the accented é in the name Pérez.
Is there a way to have MySQL ignore this on upload?
My next step is to automate the upload via a web page where a customer can just upload the CSV file and hit submit, so I am hoping to work out these glitches first.
I took the suggestion of the panel and recreated my table with utf8 as the default character set.
ERROR 1366: Incorrect string value: '\xE9rez' for column 'acct_owner' at row 1
SQL Statement:
I tried this and I still get the same error on that special character. In addition, my auto-increment column no longer increments; it captures the data from the last_update column instead, so everything shifts left by one column.
Well, in case you want to insert/load CSV data containing special characters, you can try something like this:
LOAD DATA INFILE 'file.csv'
IGNORE INTO TABLE table
CHARACTER SET UTF8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
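If the target table or the client connection is still using a non-UTF-8 character set, the `Incorrect string value` error can persist even with `CHARACTER SET UTF8` in the statement. A sketch of the checks worth making (the table name here is a placeholder):

```sql
-- Hypothetical table name; convert the table and its columns to UTF-8.
ALTER TABLE accounts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Make the client connection use the same character set before loading.
SET NAMES utf8mb4;
```

Alternatively, if the CSV itself is Latin-1 encoded (a bare `\xE9` byte for é suggests exactly that), declaring `CHARACTER SET latin1` in the LOAD DATA statement lets MySQL transcode it on the way in.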
To resolve this I ended up rebuilding my DB tables. Then, prior to submitting my data, I cycled through it with some code to remove the characters that I knew were causing me problems. While it might not be the best way, it was certainly the easiest way I found. I initially had a loop that removed any characters that were not standard A-Z or 0-9, but I discovered that caused me other problems, as I needed to keep ; and \ and even / in some of my details. Therefore I only concentrated on what I knew was breaking the import into MySQL.
Here is a snippet of the code I used:
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    // Replace the special characters to avoid any insert errors
    $data = preg_replace('/\'/', '', $data);
    $data = preg_replace('/é/', 'e', $data);
    if ($firstRow) {
        $firstRow = false;
    } else {
        $import = "INSERT INTO ..... blah blah blah....
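Rather than listing each accented character by hand, a transliteration pass can cover most of them. This is only a sketch, assuming the CSV is UTF-8; iconv's //TRANSLIT behaviour is locale-dependent, so test it on your data first:

```php
<?php
// Sketch: transliterate accented characters to their ASCII
// equivalents before building the INSERT. Assumes the CSV is
// UTF-8; adjust the source encoding if your file is Latin-1.
setlocale(LC_CTYPE, 'en_US.UTF-8');

while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $data = array_map(function ($field) {
        // 'Pérez' becomes 'Perez'; characters iconv cannot map
        // are dropped because of the //IGNORE flag.
        return iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $field);
    }, $data);
    // ... build the INSERT from $data as before ...
}
```

That said, storing the table as UTF-8 and loading with the matching CHARACTER SET keeps the names intact instead of stripping the accents.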
Related
We are writing some PHP to move data from a Microsoft SQL database to a MySQL database. The Microsoft data contains user-input text, with line breaks and all sorts of wacky stuff.
The process we've been working with is as follows:
select from a Microsoft SQL database into php array (works fine)
processes this into a .csv by looping the fputcsv() function:
fputcsv($fh, $fields, $delimiter = ',', $enclosure = '"', $escape_char = "\\");
(the .csv file looks good when I open it with google sheets)
upload the .csv file to the MySQL server with the following code:
LOAD DATA LOCAL INFILE 'c:/filename.csv'
INTO TABLE tablename
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
The problem is that when step 3 is executed, the resulting MySQL table is all jumbled up: it appears to get confused by the line breaks in the data, treating each one as the start of a new row in the table.
I noticed the MySQL LOAD DATA command doesn't seem to take the escape character as an argument, and I can't find a way to give it one. Is this the issue? How can we make this work?
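For reference, LOAD DATA does accept an escape character through its FIELDS ... ESCAPED BY clause, and fields enclosed in quotes may contain literal line breaks. A sketch that mirrors the fputcsv() settings above (the path and table name are the placeholders from the question):

```sql
LOAD DATA LOCAL INFILE 'c:/filename.csv'
INTO TABLE tablename
FIELDS TERMINATED BY ','
       OPTIONALLY ENCLOSED BY '"'
       ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
```

With ENCLOSED BY '"' in effect, a newline inside a quoted field is read as part of the field value rather than as a row terminator, which is usually what fixes the "jumbled" rows.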
I receive a csv file that looks fairly straightforward.
I run this on it and it tells me it's ASCII:
echo mb_detect_encoding($fhandle , "auto");
However, when I run my import code, it doesn't work correctly:
$sql= "LOAD DATA LOCAL INFILE '". $fhandle ."' INTO TABLE sys6_impBet FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES
( AccNo_1,
MtgDate,
Code,
Venue,
Location,
Pool,
EventNo,
Gross_Sales,
Refunds,
Turnover,
Dividends,
Profit_Loss);" ;
It brings in the correct number of records but puts a NULL or 0 in every field of every record.
So it is reading the file, since it sees the records, but it won't get the values.
Here's a small sample:
AccNo_1,MtgDate,Code,Venue,Location,Pool,EventNo,Gross_Sales,Refunds,Turnover,Dividends,Profit_Loss
66096159,12/07/2015,Gallops,Penola,SA,Treble,0,279.00,0.00,279.00,"1,955.70","1,676.70"
66096159,12/07/2015,Gallops,Warrnambool,VIC,Treble,0,"1,048.00",0.00,"1,048.00","2,672.80","1,624.80"
66096718,12/07/2015,Gallops,Kalgoorlie,WA,Win,2,783.00,0.00,783.00,"1,174.50",391.50
66096718,12/07/2015,Gallops,Penola,SA,Win,6,204.00,0.00,204.00,"1,143.00",939.00
66096718,12/07/2015,Gallops,Sha Tin,HK,Win,4,197.00,0.00,197.00,"2,064.00","1,867.00"
Is it an encoding problem?
If I open the file in Notepad and save it back down with UTF-8 encoding, then the above code works and everything is imported.
But I can't do that for every file every day.
Any ideas I can try?
I have tried this, but it makes no difference:
$fhandle = mb_convert_encoding($fhandle, "UTF-8", "ASCII");
First, check the internal encoding used by PHP:
echo mb_internal_encoding();
Then set it to the required one:
mb_internal_encoding("UTF-8");
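To avoid the manual Notepad round-trip, the file contents can be re-encoded in PHP before the path is handed to LOAD DATA. A sketch, reusing `$fhandle` as the file path from the question; the candidate encoding list is an assumption, so adjust it to whatever your source system actually produces:

```php
<?php
// Read the raw file, detect its encoding, and rewrite it as UTF-8
// so LOAD DATA ... CHARACTER SET utf8 reads it correctly.
$raw = file_get_contents($fhandle);   // $fhandle holds the CSV path
$enc = mb_detect_encoding($raw, ['UTF-8', 'ISO-8859-1', 'Windows-1252'], true);
if ($enc !== 'UTF-8') {
    $raw = mb_convert_encoding($raw, 'UTF-8', $enc ?: 'Windows-1252');
    file_put_contents($fhandle, $raw);
}
```

Note that mb_detect_encoding() is a heuristic (it can report plain ASCII for files that are mostly ASCII), and a byte-order mark left by Excel's "UTF-8" export may also need stripping.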
I'm faced with a problematic CSV file that I have to import to MySQL.
Either through the use of PHP and then INSERT commands, or straight through MySQL's LOAD DATA INFILE.
I have attached a partial screenshot of how the data within the file looks:
The values I need to insert are below "ACC1000" so I have to start at line 5 and make my way through the file of about 5500 lines.
It's not possible to skip to each next line because for some Accounts there are multiple payments as shown below.
I have been trying to get to the next row by scanning the rows for the occurrence of "ACC":
if (strpos($data[$c], 'ACC') !== FALSE) {
    echo "Yep ";
} else {
    echo "Nope ";
}
I know it's crude, but I really don't know where to start.
If you have a (foreign key) constraint defined in your target table such that records with a blank value in the type column will be rejected, you could use MySQL's LOAD DATA INFILE to read the first column into a user variable (which is carried forward into subsequent records) and apply its IGNORE keyword to skip those "records" that fail the FK constraint:
LOAD DATA INFILE '/path/to/file.csv'
IGNORE
INTO TABLE my_table
CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 4 LINES
(@a, type, date, terms, due_date, class, aging, balance)
SET account_no = @account_no := IF(@a = '', @account_no, @a);
There are several approaches you could take.
1) You could go with @Jorge Campos' suggestion and read the file line by line, using PHP code to skip the lines you don't need and insert the ones you want into MySQL. A potential disadvantage of this approach, if you have a very large file, is that you will either have to run a bunch of little queries or build up a larger one, and it could take some time to run.
2) You could process the file and remove any rows/columns that you don't need, leaving the file in a format that can be inserted directly into MySQL via the command line or whatever.
Based on which approach you decide to take, either myself or the community can provide code samples if you need them.
This snippet should get you going in the right direction:
$file = '/path/to/something.csv';
if( ! $fh = fopen($file, 'r') ) { die('bad file'); }
if( ! $headers = fgetcsv($fh) ) { die('bad data'); }
while($line = fgetcsv($fh)) {
    echo var_export($line, true) . "\n";
    if( preg_match('/^ACC/', $line[0]) ) { echo "record begin\n"; }
}
fclose($fh);
http://php.net/manual/en/function.fgetcsv.php
I'm trying to get a CSV imported into a MySQL database, where each new line should represent a new row in the database.
Here is what I have so far in the CSV:
1one, 1two, 1three, 1four
2one, 2two, 2three, 2four
And in the application:
$handle = fopen($_FILES['filename']['tmp_name'], "r");
$data = fgetcsv($handle, 1000, ",");
$sql = "INSERT INTO tbl (col1, col2, col3, col4) VALUES (?, ?, ?, ?)";
$q = $c->prepare($sql);
$q->execute(array($data[0],$data[1],$data[2],$data[3]));
The problem is that only the first four values are being inserted, clearly due to the lack of a loop.
I can think of two options to solve this:
1) Do some "hacky" for loop that remembers the position of the index and then does n+1 on each of the inserted array values.
2) Realise that fgetcsv is not the function I need, and that there is something better for handling new lines!
Thanks!
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
//process each $data row
}
You may also wish to set auto_detect_line_endings to true in php.ini, to avoid issues with Mac created CSVs.
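Putting that loop together with the prepared statement from the question, a minimal sketch (assuming `$c` is the existing PDO connection):

```php
<?php
// Sketch: insert every CSV row, not just the first one.
$handle = fopen($_FILES['filename']['tmp_name'], "r");
$sql = "INSERT INTO tbl (col1, col2, col3, col4) VALUES (?, ?, ?, ?)";
$q = $c->prepare($sql);   // $c: PDO connection from the question

while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    // The sample rows have a space after each comma, so trim the values.
    $data = array_map('trim', $data);
    $q->execute(array($data[0], $data[1], $data[2], $data[3]));
}
fclose($handle);
```

Preparing the statement once outside the loop and executing it per row is the usual pattern; only the bound values change on each iteration.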
Why would you need a script for this? You can do this in 1 simple query:
LOAD DATA LOCAL INFILE '/data/path/to/file.csv' INTO TABLE your_db.and_table
FIELDS TERMINATED BY ', ' /* included the space here, because there's one in your example */
LINES TERMINATED BY '\n' /* or, on windows box, probably by '\r\n'*/
(`col1`, `col2`, `col3`, `col4`);
That's all there is to it, in this case; the MySQL manual provides more options that can be specified, like OPTIONALLY ENCLOSED BY, etc.
OK, as far as injection goes: while inserting, it is, to the best of my knowledge, impossible for it to be an issue. The data is at no point used to build a query from; MySQL just parses it as varchar data and inserts it (it doesn't execute any of it). The only operation it undergoes is a type cast to int or float, if that turns out to be required.
What could happen is that the data contains query strings that could do harm when you start selecting data from your table. You might be able to set your MySQL server to escape certain characters for the session, or you could just run a str_replace() over the data in your script.
Bottom line is: I'm not entirely sure, but the risk of injection should be, overall, rather small.
A bit more can be found here
When it comes to temp files, since you're using $_FILES['filename']['tmp_name'], you might want to use your own temp file: file_put_contents('myLoadFile.csv',file_get_contents($_FILES['filename']['tmp_name']));, and delete that file once you're done. It could well be that it's possible to use the tempfile directly, but I haven't tried that, so I don't know (and not going to try today :-P).
How do we deal with a field containing a comma when using LOAD DATA INFILE? I have this query:
$sql = "LOAD DATA LOCAL INFILE '{$file}' INTO TABLE sales_per_pgs
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@user_id, @account_code, @pg_code, @sales_value)
SET
user_id = @user_id,
account_code = @account_code,
product_group_code = @pg_code,
sales_value = REPLACE(@sales_value, ',', ''),
company_id = {$company_id},
year = {$year},
month = {$month}";
and a line from the csv looks like this:
139, pg89898, op89890, 1,000,000.00
where 1,000,000.00 is a sales value.
Currently, what is inserted in my database is only "1".
EDIT
The user downloads a form with columns like:
user id, account id, pg id, sales value
where the first three columns (user id, account id, pg id) were populated and the sales value column is blank, because the user has to fill it in manually... the user uses MS Excel to do that...
after the form is completed, he will upload it, which is where I am using the LOAD DATA INFILE command...
Your content should really look like:
"139", "pg89898", "op89890", "1,000,000.00"
Then you could add the following to the command:
ENCLOSED BY '"' ESCAPED BY "\\"
And you won't have an issue.
Also, something you could try if you don't have any paragraphs or strings with , in them:
FIELDS TERMINATED BY ', '
You will have to alter the CSV file that is being input or alter the output that generates the CSV file - sounds the same but it isn't.
You can modify the data coming in by encapsulating fields with quotes, and update your command so that it recognizes the encapsulation using a clause like ENCLOSED BY '"'
or
alter your output so that it formats the number as 1000000 rather than 1,000,000.
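If you control the side that writes the CSV, the second option can be a one-line normalization before the row is written. A sketch; the variable names are hypothetical:

```php
<?php
// Sketch: strip thousands separators from the sales column so
// '1,000,000.00' becomes '1000000.00' before the row is written
// (or before building the INSERT).
$sales = '1,000,000.00';              // example value from the form
$normalized = str_replace(',', '', $sales);
```

This mirrors what the REPLACE(@sales_value, ',', '') in the question's LOAD DATA statement attempts, but it runs before the field is split on commas, so the value reaches MySQL intact.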
I had the same problem and used just ENCLOSED BY '"', which fixed my issue, since I had mixed numbers and strings, which is exactly what ENCLOSED BY is for. From the manual:
If you specify OPTIONALLY, the ENCLOSED BY character is used only to enclose values from columns that have a string data type (such as CHAR, BINARY, TEXT, or ENUM).
In a CSV, commas separate "columns". Since your last value is 1,000,000.00, it is regarded as 3 different columns instead of just one (as intended).
You can either quote each value (column) or change the number format by removing the commas (,).
If your entire file is exactly as you wrote it, then maybe you could use FIELDS TERMINATED BY ', ' (comma + space), if and only if you don't have that string within any individual value. If you are using Linux (or any other Unix-like system) and your field separator is comma + space, you can use sed to replace this separator with something else:
sed 's/, /|/g' myfile.csv > myfile.txt
However, I would recommend what has already been said: modify your input file, enclosing each value in quotes or double quotes, and use FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'.
Remember that your field termination character must be unique, and must not be contained within any individual value.
As a workaround, try this one -
LOAD DATA INFILE
...
FIELDS TERMINATED BY ', '
...