PHP convert encoding with Shift_JIS


I have a text file that contains the character "砡", and its encoding is Shift-JIS.
I'm using the function file_get_contents() in PHP (Laravel) to read this file and then return its content to the client as JSON.
$file = file_get_contents("/path/to/file/text");
$file = iconv("SJIS", "UTF-8//IGNORE", $file);
return response()->json(['content' => $file]);
However, the character "砡" isn't displayed correctly; it shows up as "x".
How do I fix it?

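Try "SJIS-win" instead of "SJIS". In mbstring, SJIS-win is the Windows variant of Shift_JIS (code page 932), which adds NEC and IBM extension characters that strict Shift_JIS does not include. With iconv(), this encoding is usually named CP932.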
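A minimal sketch of the conversion step using mbstring's mb_convert_encoding() instead of iconv(), assuming the mbstring extension is loaded; the helper name sjisWinToUtf8() is hypothetical. One side benefit over iconv() with //IGNORE: mb_convert_encoding() replaces input it cannot convert with a substitute character ("?" by default) rather than silently dropping it, so problems stay visible.

```php
<?php
// Hypothetical helper: decode Windows Shift_JIS (code page 932) bytes into UTF-8.
// 'SJIS-win' includes the vendor extension characters that plain 'SJIS' rejects.
function sjisWinToUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'SJIS-win');
}

// Usage in the Laravel controller from the question (path kept as a placeholder):
// $content = sjisWinToUtf8(file_get_contents('/path/to/file/text'));
// return response()->json(['content' => $content]);
```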

Related

PHP read a line from a csv file return wrong in charset

I have a CSV file. If I set the charset to ISO-8859-2 (Eastern Europe) in LibreOffice Calc, it renders the characters correctly, but because the server's locale is set to EN-UK, I cannot read the characters correctly in PHP. For example, it returns T�t instead of Tót.
I tried many things, like:
echo (mb_detect_encoding("T�t","ISO-8859-2","UTF-8"));
I know the character probably does not exist in UTF-8, but I tried anyway.
I also tried to set the correct charset in the header:
header('Content-Type: text/html; charset=iso-8859-2');
echo "T�th";
but it returns TÄĹźËth instead of Tóth.
Please help me solve this; thanks in advance.

I advise against setting the header to charset=iso-8859-2. It is usual to work with UTF-8. If the data is available in a different encoding, it should be converted to UTF-8 and then processed as CSV. The following example code can be kept this simple because the newline characters are the same in UTF-8 and ISO-8859-2.
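$fileName = "yourpath/Iso8859_2.csv";
$fp = fopen($fileName, "r");
if ($fp === false) {
exit("Cannot open $fileName");
}
// Compare with false so that a last line containing only "0" does not end the loop early.
while (($row = fgets($fp)) !== false) {
// Convert each ISO-8859-2 line to UTF-8 before parsing it as CSV.
$strUtf8 = mb_convert_encoding($row, 'UTF-8', 'ISO-8859-2');
$arr = str_getcsv($strUtf8);
var_dump($arr);
}
fclose($fp);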
The exact encoding of the CSV file must be known. mb_detect_encoding is not suitable for determining the encoding of a file.
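Reading line by line with fgets() and str_getcsv() will split a record whose quoted field contains a line break. A sketch of an alternative, assuming the iconv extension is available: attach PHP's convert.iconv.* stream filter so the bytes are converted to UTF-8 as they are read, and let fgetcsv() find the record boundaries. The function name readIso88592Csv() is hypothetical, and the encoding still has to be known in advance.

```php
<?php
// Hypothetical helper: read an ISO-8859-2 CSV file and return its rows as UTF-8 strings.
function readIso88592Csv(string $fileName): array
{
    $fp = fopen($fileName, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $fileName");
    }
    // Convert ISO-8859-2 bytes to UTF-8 while they are read from the stream.
    stream_filter_append($fp, 'convert.iconv.ISO-8859-2/UTF-8', STREAM_FILTER_READ);

    $rows = [];
    // The escape character is passed explicitly; '' disables PHP's backslash escaping.
    while (($row = fgetcsv($fp, null, ',', '"', '')) !== false) {
        $rows[] = $row;
    }
    fclose($fp);
    return $rows;
}
```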

PHP : converting an UCS-2 LE BOM string to UTF-8 stops working once i write the string to a file

I'm having a hard time doing something that should be simple:
I have a UCS-2 LE BOM encoded file that I am converting to UTF-8.
Notepad++ reports the file's encoding as UCS-2 LE BOM.
My conversion routine is simple:
I open the input file and create an output file.
I parse the input file and convert every line on the fly to UTF-8.
Once the conversion is done, I remove the input file.
Once the input file is removed, I rename the output file to the name of the input file.
Here is the code that does it:
public function convertCsvToUtf8(string $absolutePathToFile) : string {
$dotPosition = strrpos($absolutePathToFile, ".");
$absolutePathToNewFile = substr($absolutePathToFile, 0, $dotPosition)."-utf8.csv";
$res_input_file = fopen($absolutePathToFile, "r");
$res_output_file = fopen($absolutePathToNewFile, "w+");
while($input_string = fgets($res_input_file)){
$inputEncoding = mb_detect_encoding($input_string, mb_list_encodings(), true);
$output_string = iconv($inputEncoding, 'UTF-8', $input_string);
fputs($res_output_file, ($output_string));
}
fclose($res_input_file);
fclose($res_output_file);
unlink($absolutePathToFile);
rename($absolutePathToNewFile, $absolutePathToFile);
return $absolutePathToFile;
}
At first glance everything seems to be okay when I run it (except that the "°" is replaced by a weird character), but when I open the output file with Notepad++, the content no longer looks right.
I have no idea what is going on here.
Any help would be awesome!
Feel free to ask for more details!
Thanks in advance,
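One likely explanation, offered as an assumption rather than a confirmed diagnosis: in UTF-16LE (of which UCS-2 LE is a subset) a newline is the two bytes 0x0A 0x00. fgets() stops right after the 0x0A byte, so every following chunk starts with the leftover 0x00 and is decoded one byte out of step, and the first chunk still carries the 0xFF 0xFE BOM. Guessing the encoding per line with mb_detect_encoding() adds further uncertainty. A sketch that converts the whole file at once with an explicitly named encoding; the function name convertUtf16LeFileToUtf8() is hypothetical, and it reads the entire file into memory.

```php
<?php
// Hypothetical helper: convert a UTF-16LE (UCS-2 LE) file with BOM to UTF-8 in place.
function convertUtf16LeFileToUtf8(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Drop the UTF-16LE byte order mark (0xFF 0xFE) if present.
    if (substr($raw, 0, 2) === "\xFF\xFE") {
        $raw = substr($raw, 2);
    }
    // Decode the whole buffer at once so two-byte newlines are never split.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');

    // Write to a temporary file next to the original, then replace the original with it.
    $tmpPath = $path . '.utf8.tmp';
    if (file_put_contents($tmpPath, $utf8) === false) {
        throw new RuntimeException("Cannot write $tmpPath");
    }
    rename($tmpPath, $path);
    return $path;
}
```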

Encoding issue with PHP while writing in a .csv file

I'm working with a PHP array which contains some values parsed from a previous scraping process (using Simple HTML DOM Parser). I can print/echo the values of this array, which contain special characters such as é, à, è, etc., without problems. But here is the problem:
When I use fwrite to save the values in a .csv file, some characters are not saved correctly. For example, Székesfehérvár is displayed correctly in my PHP-generated HTML view but comes out broken in the .csv file that I generate with the PHP script below.
I've already set up several things in the PHP script:
The page I'm scraping seems to be UTF-8 encoded
My PHP script is also declared as UTF-8 in the header
I've tried a lot of iconv and mb_* conversion calls in different places in the code
Note that when I console.log my PHP array in JS using json_encode, the characters are also broken; maybe this is linked to the original encoding of the page I'm scraping?
Here's the part of the script that writes the values to the .csv file:
<?php
$data = array(
array("item1", "item2"),
array("item1", "item2"),
array("item1", "item2"),
array("item1", "item2")
// ...
);
//filename
$filename = 'myFileName.csv';
$string_txt = ""; //declares the content of the .csv as a string
foreach($data as $line) {
//builds one line of the .csv
$line_txt = "";
foreach($line as $item) {
//each line of the .csv consists of the values of the PHP subarray, tab separated
$line_txt .= $item . "\t";
}
//PHP endline constant, indicates the next line of the .csv
$line_txt .= PHP_EOL;
//add the line to the string which is the global content of the .csv
$string_txt .= $line_txt;
}
//writing the string in a .csv file
$file = fopen($filename, 'w+');
fwrite($file, $string_txt);
fclose($file);
I am currently stuck because I can't save values with accented characters correctly.

Put this line in your code
header('Content-Type: text/html; charset=UTF-8');
Hope this helps you!

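Try this:
$file = fopen('myFileName.csv', 'w');
foreach ($data as $row) {
// convert each UTF-8 value to ISO-8859-1, as utf8_decode() did (utf8_decode() is deprecated since PHP 8.2)
fputcsv($file, mb_convert_encoding($row, 'ISO-8859-1', 'UTF-8'));
}
fclose($file);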

Excel has problems displaying UTF-8 encoded CSV files; I've seen this before. But you can try a UTF-8 BOM. I tried it and it works for me. This simply means writing these three bytes once at the start of the file, before any content:
fwrite($file, chr(239) . chr(187) . chr(191)); // UTF-8 BOM, written once right after fopen()
For more info:
Encoding a string as UTF-8 with BOM in PHP
Alternatively, you can use the file import feature in Excel and make sure the file origin says 65001 : Unicode(UTF8). It should display your text properly and you will need to save it as an Excel file to preserve the format.
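A sketch of the BOM approach combined with fputcsv(), so quoting is handled for you. The function name writeCsvWithBom() is hypothetical; the bytes 0xEF 0xBB 0xBF are the UTF-8 BOM, written exactly once before the first row, and the assumption is that the input values are already UTF-8.

```php
<?php
// Hypothetical helper: write UTF-8 rows to a CSV file that starts with a UTF-8 BOM.
function writeCsvWithBom(string $filename, array $rows, string $separator = ','): void
{
    $file = fopen($filename, 'w');
    if ($file === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    // UTF-8 byte order mark; many Excel versions use it to recognise UTF-8.
    fwrite($file, "\xEF\xBB\xBF");
    foreach ($rows as $row) {
        // The escape character is passed explicitly; '' disables PHP's backslash escaping.
        fputcsv($file, $row, $separator, '"', '');
    }
    fclose($file);
}
```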

The solution (provided by @misorude):
When scraping HTML content from web pages, there is a difference between what's displayed in your debug output and what the script has really scraped. I had to use html_entity_decode to make PHP decode the actual HTML code I scraped, rather than relying on the browser's interpretation of it.
To check that the values were retrieved correctly before storing them somewhere, you can console.log them in JS:
PHP
//decoding numeric HTML entities that represent "Sóstói Stadion"
$b = html_entity_decode("S&#243;st&#243;i Stadion");
Javascript (to test):
<script>
var b = <?php echo json_encode($b) ;?>;
//print "Sóstói Stadion" correctly
console.log(b);
</script>
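Applied to the CSV problem, the scraped values can be decoded before they are written. A minimal sketch, assuming the scraped strings contain HTML entities such as &#233; and that the output should be UTF-8; decodeScrapedRows() is a hypothetical name.

```php
<?php
// Hypothetical helper: turn HTML entities in scraped values into real UTF-8 characters.
function decodeScrapedRows(array $rows): array
{
    return array_map(
        fn (array $row): array => array_map(
            // ENT_HTML5 recognises the full HTML5 named-entity table; numeric entities always decode.
            fn (string $value): string => html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8'),
            $row
        ),
        $rows
    );
}
```

The decoded rows can then be passed to whatever code writes the .csv file.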

Base64Decode to file - whats missing?

I have a base64-encoded string in a $_POST field, $_POST['nimage']. If I echo it directly as the src value of an img tag, I see the image just fine in the browser: echo "<img src='".$_POST['nimage']."'>";
Now, I'm obviously missing a step, because when I base64_decode the string and write it to a file on the server, trying to view the created file in the browser gives this error:
"The image 'xxxx://myserversomewhere.com/images/img1.jpg' cannot be displayed because it contains errors"
My decode and file write are:
$file = base64_decode($_POST['nimage']);
file_put_contents('images/'. $_POST['imgname'], $file);
which results in images/img1.jpg on the local server. What am I doing wrong in the decode here? Although the base64 output doesn't appear to be URL-encoded, I also tried calling urldecode() on it before base64_decode() just to be safe, with the same results.
The first part of the base64-encoded string is:
data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgICAgMCAgIDAwMDBAYEBAQEBAgGBgUGCQgKCgkICQkKDA8MCgsOCwkJDRENDg8QEBEQCgwSExIQEw8QEBD/2wBDAQMDAwQDBAgEBAgQCwkLEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBD/wAARCAF4AqsDAREAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD2gJt+XPJPUGv2A/NB2044oAdtY9M8ccCgB6r8+0jtSYDxEW4xz2qQFCnGOPQ0AAQDJIz9KAF8rI6/hQA9Y+SBgjHIqWA5Yxz2xUsBwUdAMdzSAcFGAB0NADgCVK/KB/OgB6BNzc49agse2OgX2BFZvcCRUO7g

The data you're decoding has a data URI header attached:
data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
The header is used by the browser to identify the file type and encoding, but it isn't part of the encoded data.
Strip the header (data:image/jpeg;base64,) from the data and base64 decode the rest before writing it to a file: you should be good to go.
$b64 = 'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...';
$dat = explode(',', $b64, 2);
// element 1 of array from explode() contains B64-encoded data
if (($fileData = base64_decode($dat[1])) === false) {
exit('Base64 decoding error.');
}
file_put_contents($someFileName, $fileData);
NB: Check the return value of your call to base64_decode() for false and abort somehow with a message. It will trap any problems with the decoding process (like not removing the header!).
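A slightly stricter sketch of the same idea, under the assumption that the posted value is always a base64 data URI: it checks the header with a regular expression, decodes in strict mode so characters outside the base64 alphabet produce false, and returns the MIME type so the caller could choose a file extension. decodeDataUri() is a hypothetical name.

```php
<?php
// Hypothetical helper: split a base64 data URI into its MIME type and raw bytes.
function decodeDataUri(string $uri): array
{
    // Expected shape: data:<type>/<subtype>;base64,<payload>
    if (!preg_match('#^data:([\w.+-]+/[\w.+-]+);base64,(.*)$#s', $uri, $m)) {
        throw new InvalidArgumentException('Not a base64 data URI');
    }
    // Strict mode: return false instead of silently skipping invalid characters.
    $bytes = base64_decode($m[2], true);
    if ($bytes === false) {
        throw new InvalidArgumentException('Base64 decoding error');
    }
    return ['mime' => $m[1], 'bytes' => $bytes];
}

// Usage (basename() keeps a posted file name from escaping the images/ directory):
// $img = decodeDataUri($_POST['nimage']);
// file_put_contents('images/' . basename($_POST['imgname']), $img['bytes']);
```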

PHP fputcsv with UTF-8 Problem

I'm trying to let my clients view some of the MySQL data in Excel. I have used PHP's fputcsv() function like this:
public function generate() {
setlocale(LC_ALL, 'ko_KR.UTF8');
$this->filename = date("YmdHis");
$create = $this->directory."Report".$this->filename.".csv";
$f = fopen("$create","w") or die("can't open file");
fwrite($f, "\xEF\xBB\xBF");
$i = 1;
$length = count($this->inputarray[0]);
fwrite($f, $this->headers."\n");
// print column titles
foreach($this->inputarray[0] as $key=>$value) {
$delimiter = ($i == $length) ? "\n\n" : ",";
fwrite($f, $key.$delimiter);
$i++;
}
// print actual rows
foreach($this->inputarray as $row) {
fputcsv($f, $row);
}
fclose($f);
}
My clients are Korean, and a good chunk of the MySQL database contains values in utf8_unicode_ci. Using the above function, I successfully generated a CSV file with correctly encoded data that opens fine in Excel on my machine (Win7 in English), but when I opened the file in Excel on the client's computer (Win7 in Korean), the characters were broken again. I tried taking the BOM (\xEF\xBB\xBF) out and commenting out the setlocale call, to no avail.
Can you help me figure this out?

If, as you say, your CSV file has "correctly encoded data" (i.e. it contains a valid UTF-8 byte stream), and assuming that the byte stream of the file on your client's machine is the same (e.g. it has not been corrupted in transit by a file transfer problem), then it sounds like the issue is that Excel on the client's machine is not interpreting the UTF-8 correctly. This might be because it's not supported, or because some option needs to be selected when importing to indicate the encoding. As such, you might try producing your file in a different encoding (using mb_convert_encoding or iconv).
If you get your client to export a CSV containing Korean characters then you'll be able to take a look at that file and determine the encoding that is being produced. You should then try using that encoding.

Try encoding the data as UTF-16LE, and ensure that the file has the appropriate BOM.
Alternatively, send your clients an Excel file rather than a CSV; then the encoding shouldn't be a problem.
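A sketch of the UTF-16LE suggestion: it builds the CSV text with fputcsv() in a php://temp stream, converts it with mb_convert_encoding(), and prepends the UTF-16LE BOM (0xFF 0xFE). The tab separator is an assumption based on commonly reported Excel behaviour with UTF-16 text files, not something stated in the answer; toUtf16LeCsv() is a hypothetical name.

```php
<?php
// Hypothetical helper: build separated text from UTF-8 rows and encode it as UTF-16LE with BOM.
function toUtf16LeCsv(array $rows, string $separator = "\t"): string
{
    $buffer = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // The escape character is passed explicitly; '' disables PHP's backslash escaping.
        fputcsv($buffer, $row, $separator, '"', '');
    }
    rewind($buffer);
    $utf8 = stream_get_contents($buffer);
    fclose($buffer);

    // 0xFF 0xFE is the UTF-16LE byte order mark.
    return "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
}

// Usage: file_put_contents('report.csv', toUtf16LeCsv($rows));
```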

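Try wrapping the text in each fwrite call with utf8_encode(). Note that utf8_encode() converts from ISO-8859-1, so it double-encodes data that is already UTF-8, and it is deprecated as of PHP 8.2.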
Then use what is suggested here: http://www.php.net/manual/en/function.fwrite.php#69566

We Keep Coding
