I have a script that reads a *.CSV file and then exports its content to an MSSQL database. The script runs only via the CLI.
My problem is that this CSV file contains strings with national characters like ą, ó, ż, ź, ś. For example, I have the word pracowników, but in the CLI I see the word pracownikˇw.
My code
$handler = fopen($file, "r");
if ($handler !== false) {
while (($this->currentRow = fgetcsv($handler, 0, $this->csvDelimiter)) !== false) {
$row = $this->setHeaders(
$this->currentRow,
$this->config[$type]['columnMapping']
);
if ($row !== false) {
$this->dataImported[$type][] = $row;
}
}
fclose($handler);
}
What I tried
Using fgetcsv with setlocale or without - not working.
Replace fgetcsv with fgets and read each line via str_getcsv - not working.
Using utf8_encode for each row - not working.
Additional info
According to my PHP (PHP 5.3) and a few editors, this file is encoded in ANSI. I tried to decode it with iconv, but all special characters are always replaced with strange symbols like the one shown above.
Inside the loop over $this->currentRow, try applying this to each element that contains special characters:
echo mb_convert_encoding($data[$c],"HTML-ENTITIES","UTF-8");
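A possible explanation for the symptom above: byte 0xF3 is "ó" in Windows-1250 and "ˇ" in CP852, so the file may be Windows-1250 while the console displays CP852. The sketch below assumes exactly that (the source encoding is a guess, not something confirmed by the question) and converts each field of a row to UTF-8 with iconv before it goes to the database. The helper name rowToUtf8 is made up for illustration.

```php
<?php
// Hypothetical helper: convert every string field of one CSV row to UTF-8.
// Assumption: the file is Windows-1250 ("ANSI" on a Polish Windows system).
function rowToUtf8(array $row, $from = 'CP1250')
{
    $out = array();
    foreach ($row as $key => $value) {
        // Leave non-string values (e.g. null for empty lines) untouched.
        $out[$key] = is_string($value) ? iconv($from, 'UTF-8', $value) : $value;
    }
    return $out;
}

// "pracownik\xF3w" is the word "pracowników" as stored in Windows-1250.
$row = rowToUtf8(array("pracownik\xF3w", '42'));
echo bin2hex($row[0]), "\n";
```

If the converted value is stored correctly but still looks wrong on screen, the console codepage is the likely culprit rather than the data.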
Related
I have a database that holds stock levels for certain items that are supplied by different suppliers. Each supplier sends me a daily CSV file with their current stock levels. I am trying to update the stock levels into my database.
The problem I am having is that when I extract the data from the CSV and send it through queries, it does not work properly.
I have echoed the queries prior to sending them and the output is fine. Using phpMyAdmin, if I just paste the code as it is echoed, it works fine. This has led me to believe that it is an encoding problem.
Viewing the CSV file in cPanel File Manager I see there is an odd character at the beginning of the file. (I believe this is called a BOM.) If I delete this character and save the CSV file, then my code works perfectly and the database updates as expected.
Editing the file in cPanel File Manager, the Encoding opens as ansi_x3.110-1983. While manually deleting the character will fix the issue, it is not an option as I want this to be a fully automated daily process.
My code to open the file and extract the data from CSV:
// Open File
$csvData = fopen($file, "r");
if($csvData !== FALSE)
{
while(!feof($csvData))
{
$csvRow[] = fgetcsv($csvData, 100);
}
}
// Close file
fclose($csvData);
My code to build a simple search query
foreach($csvRow as $row)
{
$searchQuery = "SELECT * FROM supplier WHERE supplierItemCode = '".$row[0]."'";
$result = $conn->query($searchQuery);
echo "<br>".$searchQuery;
if($result->num_rows > 0)
{
// CODE NEVER REACHES HERE
}
}
As mentioned, if I simply paste the echo of $searchQuery into phpMyAdmin and run the query it works fine.
I have tried using fseek($csvData, 2) which successfully removes the BOM characters from the first row of data, but that is having no effect.
As suggested, I have tried using
$csvData = fopen($file, "r");
$BOM = null;
if($csvData !== FALSE)
{
$BOM = fread($csvData, 3);
if($BOM !== FALSE)
{
if($BOM != "\xef\xbb\xbf")
{
echo "<h5>BOM: ".$BOM; // This code is executed every time
fseek($csvData, 0);
}
}
//fseek($csvData, 2); // This was my earlier attempts without the above BOM filter
while(!feof($csvData))
{
$csvRow[] = fgetcsv($csvData, 100);
}
}
Using the BOM filter method produces this output.
As a further note, you'll notice that in my Update query output, there is a blank space in the SET quantity column. This space is not visible in the csv file.
This query is built with
$updateQuery = "UPDATE supplier SET ".$supplier." = '".$row[2]."' WHERE supplierItemCode = '".$row[0]."'";
Any suggestions on what exactly is causing this issue and how I can get around it?
Thanks in advance.
Try the following modification to the code that opens and reads the CSV file. It checks for the presence of the BOM and bypasses it if present:
$csvRow = [];
// Open file
$csvData = fopen($file, "r");
if ($csvData !== FALSE)
{
    // Read enough bytes to cover the longest BOM (UTF-32 uses 4 bytes)
    $BOM = (string) fread($csvData, 4);
    if ($BOM === "\x00\x00\xfe\xff" || $BOM === "\xff\xfe\x00\x00")
    {
        // Found UTF-32 BOM. Checked first because the UTF-32LE BOM
        // starts with the same two bytes as the UTF-16LE BOM.
        fseek($csvData, 4);
    }
    elseif (strncmp($BOM, "\xef\xbb\xbf", 3) === 0)
    {
        fseek($csvData, 3); // found UTF-8 BOM
    }
    elseif (strncmp($BOM, "\xfe\xff", 2) === 0 || strncmp($BOM, "\xff\xfe", 2) === 0)
    {
        fseek($csvData, 2); // found UTF-16 BOM
    }
    else
    {
        fseek($csvData, 0); // no BOM: start again from the first byte
    }
    // Stop on the first failed read instead of testing feof(),
    // which would append a trailing FALSE row
    while (($row = fgetcsv($csvData, 100)) !== FALSE)
    {
        $csvRow[] = $row;
    }
    // Close file (only if it has been successfully opened)
    fclose($csvData);
}
I finally got a solution to work. After doing a lot of investigating, I believed it was encoded in UTF-16, despite what the BOM characters may have been saying.
I just wrote a simple function to convert each CSV value I was passing to the SQL.
function Convert($str)
{
return mb_convert_encoding($str, "UTF-8", "UTF-16BE");
}
........
$updateQuery = "UPDATE supplier SET ".$supplier." = '".Convert($row[2])."' WHERE supplierItemCode = '".Convert($row[0])."'";
I'm not sure why the BOM was causing such an issue and why removing it entirely was not working. Thanks for everyone's help that led me to discover the encoding problem.
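For anyone landing here with a similar file, the two ideas in this thread (skip the BOM, then convert to UTF-8) can be combined into one step that runs on the raw file contents before any CSV parsing. This is only a sketch: the helper name toUtf8WithoutBom is made up, it trusts the BOM to pick the source encoding (which, as noted above, was not reliable for my file), and it uses iconv where the answer above used mb_convert_encoding.

```php
<?php
// Hypothetical helper: return the text as UTF-8 with any BOM removed.
function toUtf8WithoutBom($raw)
{
    // Longer signatures first: the UTF-32LE BOM begins with the UTF-16LE BOM.
    $boms = array(
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    );
    foreach ($boms as $bom => $encoding) {
        if (strncmp($raw, $bom, strlen($bom)) === 0) {
            $body = (string) substr($raw, strlen($bom));
            return $encoding === 'UTF-8' ? $body : iconv($encoding, 'UTF-8', $body);
        }
    }
    return $raw; // no BOM found: leave the data as it is
}

// "A1,5" encoded as UTF-16LE with a BOM in front.
$raw = "\xFF\xFE" . "A\x001\x00,\x005\x00";
// Delimiter, enclosure and escape are passed explicitly (the usual defaults).
$row = str_getcsv(toUtf8WithoutBom($raw), ',', '"', '\\');
print_r($row);
```

Converting the whole file once also avoids the stray NUL bytes that UTF-16 leaves in every field, which may be where the invisible "blank space" in the UPDATE query came from.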
I want to be able to read a csv file, decode it with PHP base64_decode() and then write that decoded data to a new file in the same format.
I tried reading the file line by line and then decoding it while it read the file but the data kept coming out corrupt or broken (containing symbols and random characters).
My csv file has only one column of base64 encoded strings with no delimiters. Each string is on its own row and there is only one string per row.
Like so:
ZXhhbXBsZUBlbWFpbC5jb20=
ZXhhbXBsZUBlbWFpbC5jb20=
ZXhhbXBsZUBlbWFpbC5jb20=
ZXhhbXBsZUBlbWFpbC5jb20=
etc...
I want my new file to be in the same format and the same data but it should be decoded.
like so:
example@email.com
example@email.com
example@email.com
example@email.com
etc...
This is how I am reading the data. I tried using trim() inside base64_decode to get rid of any possible white space or characters but it didn't help. I haven't got to the write to csv part yet because I need proper output.
// csv file is uploaded via a form, I move it to the uploads/ directory
$csv_file = $_FILES['file']['name'];
// filename will always be the user uploaded file
$file_name = $csv_file;
// open the file in read
if (($handle = fopen("uploads/".$file_name, "r")) !== FALSE) {
// read the file line by line
while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {
// display column of data
echo base64_decode($data[0]);
}
// close file
fclose($handle);
}
My expected output:
example@email.com
example@email.com
example@email.com
example@email.com
etc...
My actual output:
�XZ�˘��A͡����չ兡��������X\�\�[�\�PXZ�˘��\�\��YM�XZ�˘��G7FWfV�g&GF������6��email#example.com�]�[�ܙ[�XZ�˘��G6ӓ���#T�����6��#7C7##4�����6��ɽ���Ѽ��������兡��������ٜ̌LPXZ�˘��Aɕ�����������email#examplevV�W'6��CCT�����6��v�G7W���d�����6��v���v��&W$�����6��ݥ�����齝兡������wwwemail#exampleemail#exampleۙ�\�MLMP[����]]��NNۚ�XZ�˘��Aщɽݸ������兡������[٘[M�[����Aѡ������͕�٥���
�������ѡ����ѽ�������������[YX���ܝ
Got it working...just needed to auto-detect line endings.
// without this my code breaks, I'm assuming since my csv has no delimiter it was having issues finding the line endings
ini_set('auto_detect_line_endings', TRUE);
// Store each row in this array
$allRowsAsArray = array();
// open file
if (!$fp=fopen("uploads/".$csv_file,"r")) echo "The file could not be opened.<br/>";
// add each row from col into array
while (( $data = fgetcsv ( $fp , 0)) !== FALSE )
{
$allRowsAsArray[] = $data;
}
// decode array line by line, also add linebreaks back in
foreach($allRowsAsArray as $result) {
echo base64_decode($result[0])."\n";
}
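<?php
$file = new SplFileObject("data.csv");
while (!$file->eof()) {
    // fgetcsv() returns an array of fields, so decode the first (only) column
    $row = $file->fgetcsv();
    if (is_array($row) && isset($row[0])) {
        echo base64_decode($row[0]), "\n";
    }
}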
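If changing the auto_detect_line_endings setting is not an option, another way to get the same effect is to split the raw text on every line-ending style yourself. My guess is that the file used bare \r line endings, so the whole file was read as one line and decoded as a single string. The sketch below assumes one base64 string per line; decodeBase64Lines is a made-up helper name.

```php
<?php
// Hypothetical helper: decode text that holds one base64 string per line.
function decodeBase64Lines($text)
{
    // Split on any line-ending style: \r\n, \n, or a bare \r.
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $decoded = array();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines, e.g. after the final line ending
        }
        // Strict mode returns false for input that is not valid base64.
        $value = base64_decode($line, true);
        if ($value !== false) {
            $decoded[] = $value;
        }
    }
    return $decoded;
}

// Two rows separated by bare carriage returns.
$input = "ZXhhbXBsZUBlbWFpbC5jb20=\rZXhhbXBsZUBlbWFpbC5jb20=\r";
$decoded = decodeBase64Lines($input);
echo implode("\n", $decoded), "\n";
```

The decoded lines can then be written to the new file in one call, for example with file_put_contents and the same implode.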
I wrote a PHP script that connects to a distributor's server, downloads several inventory files, and creates a massive .csv file to import into WooCommerce. Everything works except for one thing: when I look at the exported .csv file, the "x" character in my "caliber" column is always converted to a garbled string beginning with "Ã".
updateInventoryFunctions.php:
function fixCalibers($number, $attributesList) {
$calibers = array (
# More calibers...
"9×23mm Winchester" => '9X23'
);
$pos = array_search($attributesList[$number], $calibers);
if ($pos !== false) {
$attributesList[$number] = $pos;
return $attributesList[$number];
} elseif ($attributesList[$number] == "40SW") {
$attributesList[$number] = '.40 S&W';
return $attributesList[$number];
} # More conditionals...
}
updateInventory.php:
# Code that connects to the distributor's server, downloads files, and places headers into the .csv file.
if (($localHandle = fopen("current_rsr_inventory.csv", "w")) !== false) {
# Code that defines arrays for future fixes and creates a multidimensional array of attributes...
foreach ($tempInventoryFile as &$line) {
$line = explode(";", $line);
# Code that fixes several inconsistencies from the distributor...
$matchingKey = array_search($line[0], $skuList);
$attributesList = array();
if ($matchingKey !== false) {
# Code that fixes more inconsistencies...
if ($attributesList[18] === "" || $attributesList[18] === null) {
array_splice($attributesList, 18, 1);
include_once "updateInventoryFunctions.php";
$attributesList[17] = fixCalibers(17, $attributesList);
} # More conditionals...
# Code that fixes more inconsistencies...
foreach ($attributesList as $attribute) {
$line[] = $attribute;
} // End foreach.
} // End if.
fputcsv($localHandle, $line);
} // End foreach.
} // End if.
# Code that closes files and displays success message...
The caliber "9×23mm Winchester" is displayed in the .csv file with the "×" replaced by a garbled sequence beginning with "Ã". I've tried placing single quotes around the array key and escaping the character "x". There are multiple instances of this mysterious switch.
Thanks in advance for any help!
This is an encoding issue. The UTF-8 bytes of the character "×" are being interpreted as ISO-8859-1. Specifying the output encoding as UTF-8, for example with header('Content-Type: text/html; charset=utf-8');, or manually selecting the encoding in your browser will solve this issue.
"×" is U+00D7, which UTF-8 encodes as the two bytes C3 97, and byte C3 in ISO-8859-1 is "Ã" (A with tilde).
Try putting this header at the top of your script:
header('Content-Type: text/html; charset=utf-8');
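To see the mechanism at byte level, here is a small sketch. It does not touch the inventory script; it only shows what happens when the UTF-8 bytes of "×" are treated as ISO-8859-1 and converted again, and that the damage is reversible. The variable names are just for illustration.

```php
<?php
// "×" (U+00D7) written as its two UTF-8 bytes, so this file's own
// encoding does not matter.
$times = "\xC3\x97";

// Misread those two bytes as ISO-8859-1 and convert them to UTF-8 again:
// each byte becomes its own character, starting with "Ã" (U+00C3).
$misread = iconv('ISO-8859-1', 'UTF-8', $times);

// Reversing the conversion restores the original bytes.
$repaired = iconv('UTF-8', 'ISO-8859-1', $misread);

echo bin2hex($times), "\n";    // c397
echo bin2hex($misread), "\n";  // c383c297
echo bin2hex($repaired), "\n"; // c397
```

So the data in the file is probably fine; whatever opens the file is guessing the wrong encoding.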
I have the following data being generated from a google spreadsheet rss feed.
いきます,go,5
きます,come,5
かえります,"go home, return",5
がっこう,school,5
スーパー,supermarket,5
えき,station,5
ひこうき,airplane,5
Using PHP I can do the following:
$url = 'http://google.com.....etc/etc';
$data = file_get_contents($url);
echo $data; // This prints all Japanese symbols
But if I use:
$url = 'http://google.com.....etc/etc';
$handle = fopen($url, 'r');
while($row = fgetcsv($handle)) {
print_r($row); // Outputs [0]=>,[1]=>'go',[2]=>'5', etc, i.e. the Japanese characters are skipped
}
So it appears the Japanese characters are skipped when using either fopen or fgetcsv.
My file is saved as UTF-8, it has the PHP header to set it as UTF-8, and there is a meta tag in the HTML head to mark it as UTF-8. I don't think it's the document itself, because it can display the characters through the file_get_contents method.
Thanks
I can't add a comment to Darien's answer, so I'm posting here.
I reproduced the problem, and changing the locale solved it.
You must install the Japanese locale on the server before trying to repeat this.
Ubuntu
Add a new row to the file /var/lib/locales/supported.d/local
ja_JP.UTF-8 UTF-8
And run command
sudo dpkg-reconfigure locales
Or
sudo locale-gen
Debian
Just execute "dpkg-reconfigure locales" and select the necessary locales (ja_JP.UTF-8).
I don't know how to do it on other systems; try searching for the keywords "locale-gen locale" for your server OS.
In the PHP file, add this line before opening the CSV file:
setlocale(LC_ALL, 'ja_JP.UTF-8');
This looks like it might be the same as PHP Bug 48507.
Have you tried changing your PHP locale setting prior to running the code and resetting it afterwards?
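A rough sketch of that "change it, read, then reset it" idea follows. It assumes the ja_JP.UTF-8 locale is installed on the server (setlocale simply returns false and changes nothing if it is not), and readCsvWithLocale is a made-up helper name. An in-memory stream stands in for the real feed URL.

```php
<?php
// Hypothetical helper: read all CSV rows with a temporary LC_CTYPE locale.
function readCsvWithLocale($handle, $locale)
{
    // Passing "0" returns the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');
    setlocale(LC_CTYPE, $locale);

    $rows = array();
    // Delimiter, enclosure and escape are passed explicitly (the usual defaults).
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }

    // Put the locale back so the rest of the script is unaffected.
    setlocale(LC_CTYPE, $previous);
    return $rows;
}

$handle = fopen('php://memory', 'w+');
fwrite($handle, "いきます,go,5\nかえります,\"go home, return\",5\n");
rewind($handle);
$rows = readCsvWithLocale($handle, 'ja_JP.UTF-8');
fclose($handle);
print_r($rows);
```

Only LC_CTYPE is switched here rather than LC_ALL, on the assumption that character classification is the part fgetcsv depends on; number and date formatting stay as they were.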
You might want to consider this library. I remember using it some time back, and it is much nicer than the built-in PHP functions for handling CSV files. がんばって!
Maybe converting the character encoding with iconv will help you:
http://php.net/manual/en/function.iconv.php
You can do that by hand not using fgetcsv and friends:
<?php
$file = file('http://google.com.....etc/etc');
foreach ($file as $row) {
$row = preg_split('/,(?!(?:[^",]|[^"],[^"])+")/', trim($row));
foreach ($row as $n => $cell) {
$cell = str_replace('\\"', '"', trim($cell, '"'));
echo "$n > $cell\n";
}
}
Alternatively you can opt in for a more fancy closures-savvy way:
<?php
$file = file('http://google.com.....etc/etc');
array_walk($file, function (&$row) {
$row = preg_split('/,(?!(?:[^",]|[^"],[^"])+")/', trim($row));
array_walk($row, function (&$cell) {
$cell = str_replace('\\"', '"', trim($cell, '"'));
});
});
foreach ($file as $row) foreach ($row as $n => $cell) {
echo "$n > $cell\n";
}
I'm creating a script that will read a csv file and display it on a textarea using fgetcsv.
$handle = @fopen($filePath, "r");
if ($handle)
{
while (($buffer = fgetcsv($handle, 1000,",")) !== false)
{
foreach($buffer as $buff){
echo $buff."\n";
}
}
}
The format of the csv is
"line1-content1","line1-content2"
"line2-content1","line2-content2"
Using fgetcsv, the content is displayed inside the textarea without the double quotes and commas. Can I format it so that it also displays the double quotes and commas?
Then upon saving it using fputcsv
$file_to_load = $_GET['filepath'];
$filePath = $dir.$file_to_load;
$trans = trim($_POST['txtarea']);
$keyarr = split("\n",$trans);
$fp = fopen($filePath, 'w');
foreach (array ($keyarr) as $fields) {
fputcsv($fp, $fields);
}
fclose($fp);
Looking on the csv file, it saved the csv but displays it like this
"line1-content1
","line1-content2
","line2-content1
","line2-content2"
It separates the "line1-content1" and "line1-content2" into two lines and put a comma after the end of every line.
Now I want to keep the formatting of #2. How will I code it?
Can you guide me into the right direction? Thanks!
Sounds like you want to display the actual raw CSV text, not the parsed data within the CSV. Instead of using fgetcsv(), just use fgets() and you'll get the text line without any parsing, preserving the quotes and commas.
As for fputcsv, it's going to write out what you pass into it, so make sure that whatever's coming back from the form is cleaned up (e.g. extra line breaks stripped out).
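If the textarea already holds the raw CSV text (because it was filled with fgets rather than fgetcsv), one way to do the clean-up is to normalize the line endings of the posted text and write it back as-is, without going through fputcsv at all. The sketch below assumes no field contains an embedded line break; normalizeCsvText is a made-up helper name.

```php
<?php
// Hypothetical helper: tidy raw CSV text posted from a textarea.
function normalizeCsvText($text)
{
    // Browsers usually submit textarea content with \r\n line endings.
    $lines = preg_split('/\r\n|\r|\n/', trim($text));
    $clean = array();
    foreach ($lines as $line) {
        $line = rtrim($line); // drop trailing whitespace such as a stray \r
        if ($line !== '') {
            $clean[] = $line; // drop blank lines
        }
    }
    return implode("\n", $clean) . "\n";
}

$posted = "\"line1-content1\",\"line1-content2\"\r\n\"line2-content1\",\"line2-content2\"\r\n";
$csvText = normalizeCsvText($posted);
echo $csvText;
```

The result can be saved with file_put_contents($filePath, $csvText), which keeps the quotes and commas exactly as they were typed.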