Re-encode an entire CSV file before parsing - using simple PHP? - php

I have a script that reads the contents of a remote CSV file, iterates over the lines, and adds the data items to a database. This file has on average about 3000 lines, and therefore 3000 products.
To make a few things clear:
I DO NOT have control over the data in the CSV file beforehand
I DO NOT have access to / control over the manner in which this CSV file is created
The CSV file is dynamically generated once a day, from data in a MySQL database
The problem:
My script only iterates over about 1300 lines then stops, no errors, nothing. All text is enclosed in double quotes, and generally the CSV file seems correctly formatted. The weird thing is this: If I download the CSV file, open it in Notepad++ and change the encoding to UTF-8 WITHOUT BOM, upload that to a test server and run my script on THAT file, I get the FULL 3000 items and all is fine.
So, I am assuming that the people generating this file need to insert the data as UTF-8? Because I cannot control that process, I would like to know if there is a fairly simple manner in which I can apply the UTF-8 WITHOUT BOM encoding to that file, or at least read the file contents into a variable and re-encode that?
Many thanks

You can use iconv to change the encoding directly from PHP before you process your file.
Edit: The PHP version of iconv can be used to process the data. If you want to re-encode the file before importing it, you'd have to use the Linux command iconv (assuming a LAMP server), for example via exec.
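As a rough sketch of the PHP-side approach: read the file contents into a string, strip a UTF-8 BOM if one is present, and convert to UTF-8 with iconv(). The question does not say what the source encoding is, so Windows-1252 here is an assumption, and the helper name is hypothetical.

```php
<?php
// Hypothetical helper: convert raw CSV bytes to UTF-8 without a BOM.
// ASSUMPTION: the source is Windows-1252; the question does not say.
function to_utf8_without_bom(string $raw, string $sourceEncoding = 'Windows-1252'): string
{
    // Drop a UTF-8 byte order mark (EF BB BF) if the data starts with one.
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        $raw = substr($raw, 3);
    }
    // Leave the data alone if it is already valid UTF-8.
    if (preg_match('//u', $raw) === 1) {
        return $raw;
    }
    $converted = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $sourceEncoding to UTF-8");
    }
    return $converted;
}
```

The returned string can then be written to a local file and parsed as usual. If the real encoding turns out to be something else, only the second argument needs to change.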

Sounds like you are trying to do this directly from the other server. Why don't you fetch the entire file and save it to your own server, do any manipulation on that copy, and then do your processing?
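A minimal sketch of that idea, assuming the feed can be fetched with copy() (a URL source requires allow_url_fopen). The helper name and the callback are hypothetical; any re-encoding step would go between the copy and the parse.

```php
<?php
// Hypothetical helper: copy the CSV to local storage, then parse the local copy.
// $source may be a URL (requires allow_url_fopen) or a file path.
function import_from_local_copy(string $source, callable $handleRow): int
{
    $local = tempnam(sys_get_temp_dir(), 'csv_');
    if ($local === false || !copy($source, $local)) {
        throw new RuntimeException("Could not copy $source");
    }
    $rows = 0;
    $handle = fopen($local, 'r');
    // Delimiter, enclosure and escape are passed explicitly; adjust to the feed.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $handleRow($row);
        $rows++;
    }
    fclose($handle);
    unlink($local);
    return $rows;
}
```

Comparing the returned row count with the expected 3000 gives a quick check that parsing did not stop early.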

Related

How to parse the character in PHP

I'm using PHP and MySQL LOAD DATA LOCAL INFILE to import about 200,000 lines of data from an external file. However, I get a character issue somewhere with the text KOA.
On a Windows server, I got KO when reading this field from the database:
On a Linux server, I got some unreadable code:
Therefore, I tried to encode the character O in KOA and found it to be \x4F\xC2\x9D instead of \x4F...
I'm wondering if there is any way to parse the whole file and output the correct characters without manually changing the content of the file?
Possibly some functions in PHP or using NPM?
First open the data file in Notepad++,
then click Encoding > Convert to UTF-8 and save it.
Now upload the data to the MySQL database server.
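If opening a 200,000-line file in an editor is impractical, the stray bytes could also be removed in PHP before the import. The bytes \xC2\x9D are the UTF-8 encoding of U+009D, a C1 control character. The sketch below strips the whole C1 range; it assumes the file is otherwise valid UTF-8 and that none of these characters are wanted, and the helper name is hypothetical.

```php
<?php
// Hypothetical helper: remove C1 control characters (U+0080 to U+009F),
// which includes the stray U+009D (bytes C2 9D) described in the question.
// ASSUMPTION: the input is valid UTF-8 and these characters are never wanted.
function strip_c1_controls(string $text): string
{
    $clean = preg_replace('/[\x{0080}-\x{009F}]/u', '', $text);
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}
```

Applied line by line while copying the file, this leaves accented letters such as é untouched because they lie outside the stripped range.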

Getting garbage value when downloading Word file from MySQL database using PHP

I am getting garbage value when downloading Word file from MySQL database using PHP. The garbage value like:
PK!0É(r¥[Content_Types].xml ¢( ´TÉnÂ0½Wê?D¾V‰¡‡ªªº[¤Ò0ö¬z“ÇlßI QÕB
This looks like a perfectly valid .docx file when viewed in a text editor. Note that all MS Office formats are some kind of binary format, and it's a non-trivial process to extract the text contents.
As for .docx: it's basically a bunch of several .xml-files that are zipped together - to see those contents just rename it to .zip, unpack it with your favorite zip-tool and view the contents - you won't be happy with that either :-(

Sanitise CSV content in PHP

I've built a bulk user import engine for my web application and it's working perfectly. I'm now sitting here asking myself, is it secure? After all, the content of this file is being pumped into my database!
Not being the wisest security nerd around I need a little advice here.
Users are not able to rename the file after it's uploaded.
When the file is uploaded, its name is instantly changed.
Files must be .csv and have a CSV-related MIME type for the upload to work.
The uploaded file is stored in a directory not accessible via the WWW and is deleted as soon as the import has completed, usually a few hundred milliseconds.
I'm opening the file and removing blank lines during the import
What about the actual content of the file? How can I sanitize the file to ensure it doesn't contain any executable code? I looked at the PHP manual and saw that as of PHP 4.3.5 fgetcsv() is binary safe, but being totally honest, I'm not 100% sure as to what that means.
I'm currently thinking about converting the CSV content into an array and creating a function that escapes the array content. Any other suggestions or is the above completely safe?
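You can try using array_walk() to run your database's escaping function over every value to be doubly sure everything is kosher. Note that mysql_escape_string() was removed in PHP 7.0; with mysqli the equivalent is mysqli_real_escape_string(), which needs an open connection.
// $mysqli is assumed to be an open mysqli connection.
function escape_sql(&$item, $key, $mysqli)
{
    $item = mysqli_real_escape_string($mysqli, $item);
}
array_walk($input_array, 'escape_sql', $mysqli);
If your array is multi-dimensional you can use array_walk_recursive(), which operates similarly.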
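An alternative to escaping each value by hand is a prepared statement, which is a different technique from the array_walk() approach: the CSV values are bound as parameters and never become part of the SQL text. The sketch below is only illustrative. It assumes PDO with an in-memory SQLite database (needs the pdo_sqlite extension) standing in for the real one, and the users table and its columns are hypothetical.

```php
<?php
// Sketch: insert CSV rows through a prepared statement.
// ASSUMPTIONS: an in-memory SQLite database (needs pdo_sqlite) stands in for the
// real one; the `users` table and its columns are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT, email TEXT)');

// Rows as they might come out of fgetcsv(); the second one is hostile on purpose.
$rows = [
    ['Alice', 'alice@example.com'],
    ["Robert'); DROP TABLE users;--", 'bobby@example.com'],
];

$stmt = $pdo->prepare('INSERT INTO users (name, email) VALUES (?, ?)');
foreach ($rows as $row) {
    $stmt->execute($row); // values are bound as data, never parsed as SQL
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn();
```

Both rows end up stored as plain text. This protects the database only; values still need escaping (for example with htmlspecialchars()) when they are later shown in a page.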

Best practice - exporting CSV

I am looking for the best way to export a CSV file with MySQL and PHP.
Currently I'm generating a CSV with INTO OUTFILE. It works, but I don't think it's a good way.
Isn't there a better way to offer a CSV download each time a user clicks the download button?
An INTO OUTFILE export works only once per file name, because the output file cannot be overwritten.
So I have to put a timestamp in the file name, save the file, and then pick the latest file from my directory.
This method looks a bit messy for downloading a CSV file from a server...
Has anyone got better solutions?
Thanks!
I think you are well off with exporting via INTO OUTFILE. The reason is that writing the content to the CSV file is done by the MySQL server. Doing it with a PHP script would be slower (first because it is a script, second because the data from the SQL server needs to be passed to the script) and would cost you more resources.
If the CSV file(s) become large, you should keep in mind that your script may still time out. You can counter this issue either by setting a higher value for the maximum running time of a script in the configuration or by having the CSV file created by an external process/script.
Maybe something like this:
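// $query is the SQL text; $unique is a unique output path (both assumed to be set).
shell_exec('echo ' . escapeshellarg($query) . ' | mysql > ' . escapeshellarg($unique));
$contents = file($unique);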
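For a download button that has to work on every click, another option is to skip the intermediate file and write the CSV straight to the response with fputcsv(). This is an alternative to INTO OUTFILE, not what the answer above recommends, and it trades some speed for not having to manage timestamped files. A minimal sketch, with a hypothetical helper name and made-up rows:

```php
<?php
// Hypothetical helper: write rows as CSV to any open stream.
function write_csv($stream, array $header, iterable $rows): void
{
    // Delimiter, enclosure and escape are passed explicitly.
    fputcsv($stream, $header, ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($stream, $row, ',', '"', '\\');
    }
}

// In a download script the target would be php://output, preceded by headers:
//   header('Content-Type: text/csv; charset=utf-8');
//   header('Content-Disposition: attachment; filename="export.csv"');
//   write_csv(fopen('php://output', 'w'), $header, $rowsFromDatabase);

// Demonstration against an in-memory stream, with made-up rows.
$buffer = fopen('php://memory', 'w+');
write_csv($buffer, ['id', 'name'], [[1, 'Widget'], [2, 'Gadget, large']]);
rewind($buffer);
$csv = stream_get_contents($buffer);
fclose($buffer);
```

Because the helper accepts any iterable, rows can be fetched from the database one at a time, which keeps memory use flat for large exports.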

Using Excel to view CSV file created in PHP

I am looking for a way to open a CSV file that was created with a PHP script in Excel, in such a way that Excel knows how to parse the file. Currently, when I double-click a CSV file created with PHP, Excel opens the content in a single column, so it does not parse each line. Also, if I press CTRL-O in Excel and select the CSV file to be opened, Excel launches a wizard where I am able to select parsing and encoding options.
Are there any 'headers' or flag characters that I could prepend to the CSV output in PHP to let Excel know how to open a file? I know, for example, that in order for Excel to handle UTF-8 encoding, a U+FEFF character needs to be included as the first character in the CSV file, so maybe there is something similar for parsing?
Thanks.
Beware that, depending on your Excel version, you may or may not get the CSV options dialog when opening the CSV file directly.
The latest versions of Excel I've tested need a special menu path to reach this dialog.
So you should provide what Excel wants.
And what it wants is a tab-separated CSV (not commas; funnily enough, when it saves a CSV file it uses commas, but not in its auto-import), without " and without carriage returns in cells, and not in UTF-8.
Some say it needs some sort of UTF-16, I can't remember exactly, certainly the UTF-16LE BOM cited by Mark Baker. You will certainly have to transcode your characters.
Then do not forget to set the text/csv MIME type header.
When I see this broken auto-import of CSV without a dialog in the new Excel, I wonder whether they wanted to discourage CSV usage completely :-)
Oh, and I saw somewhere in the past that there are some mysterious formatting commands you can use in a pure HTML table export that Excel understands much better than the CSV format.
You should search a little about it; it may be much simpler.
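The tab-separated, transcoded output described above could be produced roughly as follows. This is a sketch under assumptions: the input cells are UTF-8 strings, the target is UTF-16LE with its byte order mark, and the helper name is hypothetical. Behaviour differs between Excel versions, so it is worth testing against the versions your users have.

```php
<?php
// Hypothetical helper: build tab-separated text in UTF-16LE with a BOM.
// ASSUMPTION: input cells are UTF-8 strings; behaviour varies by Excel version.
function excel_tsv(array $rows): string
{
    $lines = [];
    foreach ($rows as $row) {
        $cells = [];
        foreach ($row as $cell) {
            // No quoting is used, so tabs and line breaks inside a cell become spaces.
            $cells[] = str_replace(["\t", "\r", "\n"], ' ', (string) $cell);
        }
        $lines[] = implode("\t", $cells);
    }
    $utf8 = implode("\r\n", $lines) . "\r\n";
    // FF FE is the byte order mark for UTF-16LE.
    return "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', $utf8);
}
```

The result would be sent after the MIME type header, for example with echo excel_tsv($rows).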
