Hebrew letters ignored selectively by fgetcsv - php

I'm trying to read a CSV file in Hebrew in order to insert multiple posts to Wordpress.
I've saved the Excel sheet as CSV (comma delimited).
After some encoding manipulation in Sublime Text, I see the Hebrew content normally in any text editor.
However, when I read the contents of the file using fgetcsv, Hebrew letters are dropped selectively: the letters in a field that come after a number or a Latin letter show correctly, while Hebrew letters that come before the number or Latin letter are omitted from the output.
If I use file_get_contents and var_dump it, I get the entire content correctly, so it stands to reason that the problem lies with fgetcsv.
Code in functions.php:
function csv_to_array($filename='', $delimiter=',')
{
    if(!file_exists($filename) || !is_readable($filename)) {
        return FALSE;
    }
    $header = NULL;
    $data = array();
    if (($handle = fopen($filename, 'r')) !== FALSE)
    {
        while (($row = fgetcsv($handle, 1000, $delimiter)) !== FALSE)
        {
            if(!$header):
                $header = $row;
            else:
                $data[] = $row;
            endif;
        }
        fclose($handle);
    }
    return $data;
}
used:
if (isset($_FILES['events'])) {
extract($_FILES['events']);
$events = csv_to_array($tmp_name);

It's not very likely that the language that gave the world T_PAAMAYIM_NEKUDOTAYIM now has problems with Hebrew letters ;-).
Checking the encoding of the strings (var_dump might not be enough!) and Manvel's solution to this question might be of help to you:
The problem is that the function treats the data as UTF-8 (you can
check this using mb_detect_encoding) but does not convert it.
Therefore, it's necessary to convert explicitly from the initial
encoding (Windows-1251, i.e. CP1251) using iconv. But since
fgetcsv returns an array, I suggest writing a custom function:
function customfgetcsv(&$handle, $length, $separator = ';'){
    if(($buffer = fgets($handle, $length)) !== false) {
        return explode( $separator, iconv( "CP1251", "UTF-8", $buffer ) );
    }
    return false;
}
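A variation on the quoted function, as a minimal sketch rather than a definitive fix: it converts each line to UTF-8 and then parses it with str_getcsv, so a quoted field that contains the separator stays in one piece (a plain explode would split it). The helper name read_converted_csv_line and the source encoding 'CP1255' (Windows Hebrew) are assumptions for illustration; if the file is already UTF-8, no conversion is needed at all.

```php
<?php
// Hypothetical helper: read one CSV line, convert it to UTF-8, then parse it.
// $sourceEncoding is an assumption; use the encoding the file really has.
function read_converted_csv_line($handle, string $sourceEncoding = 'CP1255', string $separator = ',')
{
    $buffer = fgets($handle);
    if ($buffer === false) {
        return false; // end of file or read error
    }
    $utf8 = iconv($sourceEncoding, 'UTF-8', rtrim($buffer, "\r\n"));
    if ($utf8 === false) {
        return false; // the bytes were not valid in $sourceEncoding
    }
    // Enclosure and escape are passed explicitly (the long-standing defaults).
    return str_getcsv($utf8, $separator, '"', '\\');
}

// Usage with an in-memory stream standing in for the uploaded file.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "\xF9\xEC\xE5\xED,\"a,b\"\n"); // "שלום" in CP1255, then a quoted field
rewind($stream);
$fields = read_converted_csv_line($stream);
fclose($stream);
```

Because this reads line by line, a quoted field that contains an embedded newline would still be split; converting the whole file first and then using fgetcsv avoids that.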

Related

Irish accented characters get changed to weird characters while processing a CSV file in PHP (Yii 1.1 framework)

I know a lot of similar questions have already been asked in this community, but unfortunately nothing has worked for me.
I have a CSV sheet which I need to import into our system. The sheet is imported without any issue on Linux (when created with LibreOffice), even with Irish characters.
But the main problem starts in Windows and iOS environments with Excel (MS Excel), where the character encoding gets changed, and a few of the Irish characters like
Ž, Ŕ and many others are changed to different symbols.
P.S.: The CSV works fine if we create it with Numbers on iOS.
Below is the PHP method with which I'm reading the CSV sheet.
$path = CUploadedFile::getInstance($model, 'absence_data_file'); // Get the instance of selected file
$target = ['First Name', 'Last Name', 'Class', 'Year', 'From']; // Valid Header
public static function readCSV($path, $target) {
    $updated_header = array();
    $data = array();
    if ($path->type == 'text/csv' || $path->type == 'application/vnd.ms-excel' || $path->type == 'text/plain' || $path->type == 'text/tsv') {
        $fp = fopen($path->tempName, 'r');
        $encoding_type = mb_detect_encoding(file_get_contents($path->tempName));
        if ($fp !== FALSE) {
            $header = fgetcsv($fp);
            foreach ($header as $h) {
                $updated_header[] = $h;
            }
            $updated_header = array_map( 'trim', array_values($updated_header));
            if (array_diff($target, $updated_header)) {
                $errormessage = 'Invalid header format.';
                return $errormessage;
            } else {
                while ($ar = fgetcsv($fp)) {
                    $data[] = array_combine($updated_header, $ar);
                }
                $data['file_encoding'] = $encoding_type;
                return $data;
            }
        }
    } else {
        $errormessage = "Invalid File type, You can import CSV files only";
        return $errormessage;
    }
}
Sheet which I'm importing (Check the pic):
Printing the data (First Record)
I'm not sure about the Irish code page, but if it is Western European as you mentioned in your comment, I'm guessing your code page would be ISO-8859-1 or ISO-8859-14, and your line of code should be:
$encoding_type = mb_detect_encoding(file_get_contents($path->tempName), 'ISO-8859-1', true);
or simply the following, since you are sure the encoding is 'ISO-8859-1':
$encoding_type = 'ISO-8859-1';
The second parameter of mb_detect_encoding lists the candidate encodings and the third enables strict checking, so this call tests strictly against ISO-8859-1. If you want to try other code pages at the same time, you can pass a comma-separated list of encodings as the second parameter, e.g. 'UTF-8, ISO-8859-1'.
Note that detection alone changes nothing: you need to call mb_convert_encoding to actually get the file in your desired encoding. The following code converts from ISO-8859-1 to UTF-8:
$UTF8_text = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
If you insist on using fgetcsv, have a look at [mb_internal_encoding](https://www.php.net/manual/en/function.mb-internal-encoding.php), which sets the default internal encoding.
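The conversion step could look roughly like the sketch below: check whether the uploaded text is already valid UTF-8 and convert it only when it is not. The helper name csv_text_to_utf8 and the fallback encoding 'Windows-1252' are assumptions for illustration (the answer above guesses ISO-8859-1; either can be passed in), and the sample data is made up.

```php
<?php
// Hypothetical helper: return the CSV text as UTF-8.
// $fallbackEncoding is an assumption about what the spreadsheet program wrote.
function csv_text_to_utf8(string $content, string $fallbackEncoding = 'Windows-1252'): string
{
    // Drop a UTF-8 byte order mark if present.
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = substr($content, 3);
    }
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content; // already valid UTF-8, nothing to convert
    }
    return mb_convert_encoding($content, 'UTF-8', $fallbackEncoding);
}

// Made-up sample: "Seán,Ó Briain" as Windows-1252 bytes.
$legacy = "First Name,Last Name\nSe\xE1n,\xD3 Briain\n";
$utf8 = csv_text_to_utf8($legacy);
```

The converted text can then be written to a php://memory stream and read with fgetcsv, so the rest of the import code stays unchanged.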

Read .txt file with PHP

Using PHP, I have to read a txt file (UTF-8 format) that contains values to be inserted into a MySQL database, based on character length.
This is my script:
if (($handle = fopen(DIRECTORY_FILE.'test.txt', "r")) !== FALSE) {
    while ($data = fgets($handle)) {
        $mod = (int)substr($data, 0,7);
        $cod = (int)substr($data, 10,3);
        $descr = trim(substr($data,13,255));
        $descr2 = trim(substr($data,268,255));
        $seq = (int)substr($data, 523,4);
        $codord = (int)substr($data, 527,20);
    }
    fclose($handle);
}
This works well as long as there are no special characters.
I noticed that a degree symbol " ° " is counted as length 2, which shifts every field after it and causes an error in reading the txt file.
How can I ensure that the count is correct?
Thanks
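One likely explanation is that substr counts bytes, and "°" takes two bytes in UTF-8, whereas mb_substr counts characters. A minimal sketch of the difference, assuming the field offsets in the file are character positions; the helper name fixed_width_field and the sample strings are made up for illustration.

```php
<?php
// substr() counts bytes; mb_substr() counts characters in the given encoding.
$line = "25°C   rest";

$byBytes = substr($line, 0, 4);             // cuts after 4 bytes
$byChars = mb_substr($line, 0, 4, 'UTF-8'); // cuts after 4 characters

// Hypothetical helper for fixed-width fields measured in characters.
function fixed_width_field(string $line, int $start, int $length): string
{
    return trim(mb_substr($line, $start, $length, 'UTF-8'));
}

$field = fixed_width_field("ab°cd  ef", 2, 3);
```

If the offsets were instead defined in bytes by the program that wrote the file, substr would be the correct choice and the file layout itself would need checking.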

PHP Reading from a CSV file When Containing HTML Data

I want to read from a .csv file that is separated by semicolons(;) and text delimiter is double quotes(").
The problem is that the file has 2 fields that contain long HTML data which includes double quotes. When I open it in Excel, it's displayed correctly; however, when using the fgetcsv($file, 0, ";") function, it gives me messy data due to the double quotes in the HTML code.
Here's what I tried:
$file = fopen($file, "r");
if ($file) {
    while (($row = fgetcsv($file, 0, ";")) !== false) {
        if (empty($header)) {
            $header = $row;
            continue;
        }
        foreach ($row as $key=>$value) {
            $array[$header[$key]] = $value;
        }
        print_r($array);
    }
}
Just a note to those who will suggest that I use the strip_tags function: I can't do that, as I need the HTML content of the data. Besides, I'm not able to change how the data is put in the CSV; I can only read it.
Can someone help me overcome this issue?

PHP str_replace CSV Content

I am getting the contents of a CSV file and displaying them (it works).
if (($handle = fopen($url, 'r')) === false) {
    die('Error opening file');
}
$headers = fgetcsv($handle, 1024, ',');
$complete = array();
while ($row = fgetcsv($handle, 1024, ',')) {
    $complete[] = array_combine($headers, $row);
}
fclose($handle);
However, in this CSV file there is a field that has contents for example like this:
"123456,123456,123456,123456"
I think my code isn't processing it because of the double quotes, and that I need to convert them to single quotes. If that's the case, how would I integrate the following (I was thinking something like):
str_replace('"',"'", $url);
Look at the other parameters for fgetcsv().
By default the enclosure character is set to ", which means anything between quotes should be considered a single value. Replace that parameter with what you actually use as the enclosure character in the csv and it will work.
Something like (if your enclosure character is '):
while ($row = fgetcsv($handle, 1024, ',', "'")) {
Better than to read it wrong and try to fix it afterwards with str_replace.
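To illustrate how the enclosure parameter behaves, here is a small sketch using str_getcsv, which parses a single string in CSV format. The sample line is made up; with the enclosure set to the double quote, the quoted field comes back as one value, commas included.

```php
<?php
// With " as the enclosure, a quoted field keeps its commas.
$line = 'id,"123456,123456,123456,123456",name';

// Separator, enclosure and escape are passed explicitly.
$row = str_getcsv($line, ',', '"', '\\');
```

Here $row has three elements and the middle one is the full comma-containing string, so no str_replace on the quotes is needed.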

Some characters in CSV file are not read during PHP fgetcsv()

I am reading a CSV file with php. Many of the rows have a "check mark" which is really the square root symbol: √ and the php code is just skipping over this character every time it is encountered.
Here is my code (printing to the browser window in "CSV style" format so I can check that the lines break at the right place):
$file = fopen($uploadfile, 'r');
while (($line = fgetcsv($file)) !== FALSE) {
    foreach ($line as $key => $value) {
        if ($value) {
            echo $value.",";
        }
    }
    echo "<br />";
}
fclose($file);
As an interim solution, I am just finding and replacing the checkmarks with 1's manually, in Excel. Obviously I'd like a more efficient solution :) Thanks for the help!
fgetcsv() can mishandle characters outside standard ASCII, depending on the locale in effect, which would explain why your square root symbols are skipped. Rather than replacing the checkmarks manually, you could read the file into a string, do a str_replace() on those characters, and then parse it using fgetcsv(). You can turn a string into a file pointer (for fgetcsv) like this:
$fp = fopen('php://memory', 'w+'); // 'w+' opens the stream for reading and writing
fwrite($fp, (string)$string);
rewind($fp);
while (($line = fgetcsv($fp)) !== FALSE)
...
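Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. The sample data is made up, and replacing the check mark with '1' mirrors the manual workaround described in the question.

```php
<?php
// Replace the check mark before parsing, then parse from an in-memory stream.
$string = "name,done\nalice,√\nbob,\n"; // sample data, made up for illustration
$string = str_replace('√', '1', $string);

$fp = fopen('php://memory', 'w+');
fwrite($fp, $string);
rewind($fp);

$rows = [];
// Length 0 means no line length limit; separator, enclosure and escape are explicit.
while (($line = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
    $rows[] = $line;
}
fclose($fp);
```

In real use, $string would come from file_get_contents($uploadfile) instead of a literal.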
I had a similar problem with accented first characters of strings. I eventually gave up on fgetcsv and did the following, using fgets() and explode() instead (I'm guessing your CSV is comma separated):
$file = fopen($uploadfile, 'r');
while (($the_line = fgets($file)) !== FALSE) // <-- fgets
{
    $line = explode(',', $the_line); // <-- explode
    foreach ($line as $key => $value)
    {
        if ($value)
        {
            echo $value.",";
        }
    }
    echo "<br />";
}
fclose($file);
You should call setlocale, as written in the documentation:
Note:
Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
Before calling fgetcsv, add setlocale(LC_ALL, 'en_US.UTF-8'). In my case it was 'lt_LT.UTF-8'.
This behaviour has been reported as a PHP bug.
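A minimal sketch of the locale approach, assuming the file itself is UTF-8. The helper name use_utf8_ctype_locale and the candidate locale names are assumptions; which locales exist differs per system (on Linux, `locale -a` lists them), so the return value should be checked rather than assumed.

```php
<?php
// Hypothetical helper: try a list of UTF-8 locales and report which one was set.
function use_utf8_ctype_locale(array $candidates = ['en_US.UTF-8', 'C.UTF-8'])
{
    // setlocale() accepts an array and returns the first locale it could set, or false.
    return setlocale(LC_CTYPE, $candidates);
}

$locale = use_utf8_ctype_locale();
if ($locale === false) {
    // None of the candidates is installed; the current locale stays in effect.
    error_log('No UTF-8 locale available');
}

// Parse a UTF-8 sample from memory after the locale has been set.
$fp = fopen('php://memory', 'w+');
fwrite($fp, "שלום,x1\n"); // made-up sample row
rewind($fp);
$row = fgetcsv($fp, 0, ',', '"', '\\');
fclose($fp);
```

This sketch sets only LC_CTYPE instead of LC_ALL, so number and date formatting elsewhere in the application are left alone; that is a design choice, not something the answer above prescribes.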
