I'm using PHP and MySQL LOAD DATA LOCAL INFILE to import about 200,000 lines of data from an external file. However, I get a character-encoding issue somewhere with the text "KOA".
On the Windows server, I got "KO" when reading this field from the database. On the Linux server, I got unreadable garbage instead.
Therefore, I inspected the bytes of the character "O" in "KOA" and found them to be \x4F\xC2\x9D instead of just \x4F... (\xC2\x9D is the UTF-8 encoding of U+009D, an invisible control character.)
I'm wondering if there is any way to parse the whole file and output the correct characters without manually changing the content of the file. Possibly some PHP function, or something from NPM?
First, open the data file in Notepad++,
then click Encoding > Convert to UTF-8 and save it.
Now upload the data to the MySQL database server.
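If you'd rather fix it from PHP instead of by hand, a minimal sketch along these lines might work; it assumes the junk is limited to stray C1 control characters such as the \xC2\x9D found above, and the file names are placeholders:

    <?php
    // Read the raw import file (placeholder path)
    $text = file_get_contents('data.txt');

    // \xC2\x9D is the UTF-8 encoding of U+009D, an invisible C1 control
    // character; strip the whole C1 range (U+0080 - U+009F)
    $clean = preg_replace('/\xC2[\x80-\x9F]/', '', $text);

    file_put_contents('data-clean.txt', $clean);

Then run LOAD DATA LOCAL INFILE against the cleaned file as before.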
I have a set of images exported from MSSQL as CSV. The file size is 1 GB. The datatype in MSSQL is image. When I try to import it into Postgres, with datatype bytea, an error occurs:
ERROR: invalid byte sequence for encoding "UTF8": 0xff
CONTEXT: COPY photo, line 1
When I look into the CSV file, the image data is in this form:
0xFFD8FFE000104A46494600010101006000600000FFE1...
My questions:
What datatype in PostgreSQL can be used to import this type of file?
How to retrieve image from this type of file using Postgres and PHP?
Solution that I tried:
I tried copying just three lines to a new CSV file and importing that into the photo table, and it succeeded. Weird; why does the error occur only when I import the whole CSV?
I have tried this https://stackoverflow.com/a/22211207/3602791 in my PHP using a sample image and it was a success, but when I tried to retrieve the three-line image data that I imported, it failed, saying that my image has an error.
http://pastebin.com/WrfjFqY6 is a sample line from the CSV: two columns, id and photo.
Anyone know how to solve this? Thanks in advance.
As yenyen notes in the comments, the issue was that the input was UCS-2 (probably really UTF-16) encoded.
UCS-2 is a two-byte-per-character encoding that contains null bytes. If you tell PostgreSQL the file is utf-8 then it'll see the input as garbage full of invalid utf-8 sequences. If you tell PostgreSQL it's a simple 1-byte encoding like latin1, PostgreSQL will see the zero (null) byte and realise it's not latin-1 after all.
The trick here is to examine the input file with an editor that can show the raw bytes, not just use a text editor that automagically reads the BOM and loads it as encoded text. If in doubt use a hex editor.
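If you want to re-encode such a file from PHP before running COPY, here is a minimal sketch; it assumes the input really is UTF-16LE with a byte order mark, and the file names are placeholders:

    <?php
    $raw = file_get_contents('photo.csv');

    // FF FE at the start of the file is the UTF-16LE byte order mark
    if (substr($raw, 0, 2) === "\xFF\xFE") {
        // drop the BOM, then convert the rest to UTF-8
        $raw = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }

    file_put_contents('photo-utf8.csv', $raw);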
I have two sites I'm developing (in PHP). They are using identical code to provide an XLS export (using PEAR excel) and they are running on the same local server. To rule out a problem with the actual data in the xls, I am just outputting a file with no data for now.
When I export from site A and save the file it's reported as 'ANSI' encoded within Notepad++. This file opens correctly in Excel.
When I export from site B, the file is reported as 'UTF-8' encoded and won't open in Excel. If I convert the file to ANSI, or to UTF-8 without BOM, in Notepad++, it opens just fine in Excel.
The same encoding difference is present between site A and B when I save an arbitrary page on the site, so I think it may be more fundamental than just how the Excel file is being generated (same encoding when exporting CSV/ODS formats). I've compared the http headers between site A and B during the export, they are functionally identical. Explicitly adding Charset=ISO-8859-1 to the header makes no difference. The apache virtual hosts are also functionally identical between sites. Both sites are using identical character encodings in their databases (but since I'm not exporting any data right now, this is irrelevant).
What else could be causing this which I haven't accounted for?
Thanks!
UPDATE
The Excel generation is a red herring; I've removed all of that and am simply outputting the download header and a test string. When saved, the file is still encoded differently between the sites. The code which generates the download file seems identical when I diff the various files...
I haven't been able to repeat the problem by creating a simplified test case. When I tried, both sites output files which are saved as ANSI - I don't understand what else could be going on.
The ANSI "mode" just uses the code page configured on your system to save the data; you cannot be sure the saved document will display correctly for others.
UTF-8 without BOM means UTF-8 without the byte order mark prepended (three bytes, EF BB BF, at the top of the file), which is probably what gives Excel a headache.
I always go with the without-BOM approach if I'm thinking about i18n.
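For reference, the BOM is just three literal bytes at the start of the file, so producing both variants from PHP is trivial; a tiny sketch:

    <?php
    $csv = "name,city\nJosé,Málaga\n";   // some UTF-8 content

    // UTF-8 with BOM: EF BB BF prepended to the bytes
    file_put_contents('with-bom.csv', "\xEF\xBB\xBF" . $csv);

    // UTF-8 without BOM: the bytes as-is
    file_put_contents('without-bom.csv', $csv);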
Thanks for all your input into this, it's much appreciated. In the end I tracked it down: a PHP source file was being included somewhere along the way which was encoded UTF-8 rather than ANSI (Windows-1252). I don't really understand why this causes a problem though, since that PHP include doesn't output anything; if that file carried a UTF-8 BOM, PHP would have sent those bytes as output before anything else, which might explain it. Very weird and very frustrating. I hope maybe someone else finds my pain useful.
I have a .txt file on our webserver which gets updated and replaced by the client's third party property software.
Does anyone have a script (PHP/MySQL) where I can read this file and import it into a table in my database? Ideally something using CodeIgniter, but standard PHP will work just fine too.
It is in this format:
"BranchID","PropertyID","PropertyName","Street","DisplayStreet","Postcode","PricePrefix","Price","Bedrooms","Receptions","Bathrooms","ParkingSpaces","Numeric5","Numeric6","Numeric7","Numeric8","Numeric9","AREA","TYPE","FURNISHED","CHILDREN","SMOKING","PETS","GARDEN","DSS","PARKING","cFacility1","cFacility2","cFacility3","cFacility4","cFacility5","cFacility6","cFacility7","cFacility8","cFacility9","cFacility10","Tenure","ShortDescription","MainDescription","AvailabilityCode","AvailabilityDate","FullAddress","PricePrefixPos"
My field names match these headers exactly.
You can use MySQL's LOAD DATA INFILE directly; see the MySQL Reference Manual.
This will save some scripting time and will be much faster than importing it via a PHP script.
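A rough sketch of what that could look like (the table name, credentials and file path are assumptions; check the manual for the exact options your server allows):

    <?php
    // Placeholder credentials and paths
    $db = new mysqli('localhost', 'user', 'pass', 'mydb');

    // Note: LOCAL INFILE may need to be enabled on both client and server
    $sql = "LOAD DATA LOCAL INFILE '/path/to/properties.txt'
            INTO TABLE properties
            FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES";   // skip the header row

    $db->query($sql) or die($db->error);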
You may also parse it with PHP.
It looks like a CSV.
I would use fgetcsv() to parse the file.
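A minimal sketch of that approach (paths, credentials and table name are placeholders; it leans on the fact that the file's first line matches your column names exactly):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

    $fh = fopen('/path/to/properties.txt', 'r');
    $headers = fgetcsv($fh);   // first line holds the column names

    // Build an INSERT with one placeholder per column
    $cols  = '`' . implode('`, `', $headers) . '`';
    $marks = implode(', ', array_fill(0, count($headers), '?'));
    $stmt  = $pdo->prepare("INSERT INTO properties ($cols) VALUES ($marks)");

    while (($row = fgetcsv($fh)) !== false) {
        $stmt->execute($row);
    }
    fclose($fh);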
I have a script that reads the contents of a remote CSV file, iterates over the lines, and adds the data items to a database. This file has on average about 3000 lines, and therefore 3000 products.
To make a few things clear:
I DO NOT have control over the data in the CSV file beforehand
I DO NOT have access to / control over the manner in which this CSV file is created
The CSV file is dynamically generated once a day, from data in a MySQL database
The problem:
My script only iterates over about 1300 lines then stops, no errors, nothing. All text is enclosed in double quotes, and generally the CSV file seems correctly formatted. The weird thing is this: If I download the CSV file, open it in Notepad++ and change the encoding to UTF-8 WITHOUT BOM, upload that to a test server and run my script on THAT file, I get the FULL 3000 items and all is fine.
So, I am assuming that the people generating this file need to insert the data as UTF-8? Because I cannot control that process, I would like to know if there is a fairly simple way to apply the UTF-8 WITHOUT BOM encoding to that file, or at least to read the file contents into a variable and re-encode that?
Many thanks
You can use iconv to change the encoding directly from PHP before you process your file.
Edit: the PHP version of iconv can be used to process the data in memory. If you want to re-encode the file itself before importing it, you'd have to use the Linux command iconv (assuming a LAMP server), for example via exec().
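A minimal sketch of the in-PHP route (the feed URL is a placeholder, and the source encoding is an assumption; inspect the real file to confirm it):

    <?php
    $data = file_get_contents('http://example.com/feed.csv');  // placeholder URL

    // Strip a UTF-8 BOM if one is present (the bytes EF BB BF)
    if (substr($data, 0, 3) === "\xEF\xBB\xBF") {
        $data = substr($data, 3);
    }

    // Re-encode, dropping any invalid byte sequences, before parsing
    $data = iconv('UTF-8', 'UTF-8//IGNORE', $data);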
Sounds like you are trying to do this directly from the other server. Why don't you fetch the entire file and save it to your own server, do any manipulation on that copy, and then do your processing?
I am looking for a way to open a CSV file, created with a PHP script, in Excel in such a way that Excel knows how to parse the file. Currently, when I double-click a CSV file created with PHP, Excel opens the content in a single column, so it does not parse each line. Also, if I do Ctrl-O in Excel and select the CSV file to be opened, Excel launches a wizard where I am able to select parsing and encoding options.
Are there any 'headers' or flag characters that I could prepend to the CSV output in PHP to let Excel know how to open the file? I know, for example, that in order for Excel to handle UTF-8 encoding, a U+FEFF character needs to be included as the first character in the CSV file, so maybe there is something similar for parsing?
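(For reference, emitting that U+FEFF character from PHP just means sending three bytes before any other output; a tiny sketch:)

    <?php
    header('Content-Type: text/csv; charset=utf-8');
    echo "\xEF\xBB\xBF";                  // U+FEFF encoded as UTF-8
    echo "Name,City\r\nJosé,Málaga\r\n";  // sample CSV content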
Thanks.
Beware that, depending on your Excel version, you may or may not get the CSV options dialog when opening the CSV file directly.
The latest versions of Excel I've tested need a special menu action to get this dialog.
So you should provide what Excel wants.
And what it wants is a tab-separated CSV (not commas; funnily enough, when Excel saves a CSV file it uses commas, but not in its auto-import), without quote characters, without carriage returns inside cells, and not in UTF-8.
Some say it needs some sort of UTF-16, I can't remember exactly; possibly the UTF-32LE BOM cited by Mark Baker. You will certainly have to transcode your characters.
Then do not forget to set the text/csv MIME type header.
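A rough sketch of an export along those lines (the Windows-1252 target, the sample rows and the file name are my assumptions):

    <?php
    header('Content-Type: text/csv; charset=windows-1252');
    header('Content-Disposition: attachment; filename="export.csv"');

    $rows = [
        ['ID', 'Name'],
        ['1',  'Café du Monde'],
    ];

    foreach ($rows as $row) {
        // No quoting: strip tabs and line breaks from cells instead
        $cells = array_map(function ($c) {
            return str_replace(["\t", "\r", "\n"], ' ', $c);
        }, $row);

        // Transcode out of UTF-8 as suggested above
        echo iconv('UTF-8', 'WINDOWS-1252//TRANSLIT', implode("\t", $cells)), "\r\n";
    }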
When I see this broken auto-import of CSV, without the dialog, in the new Excel, I wonder whether they wanted to discourage CSV usage completely :-)
Oh, and I saw somewhere in the past that there are some mysterious formatting commands you can use in a pure HTML table export that Excel understands much better than the CSV format.
You should search a little for that; it may be much simpler.