Encode file in UTF-16LE in PHP with Byte Order Mask (BOM) - php

I got a PHP that receive data in POST by my Javascript and I want to write these data in a CSV file. I need to encode this file in UTF-16LE.
What I try is :
1)
$data = $_POST['data'];
$data = iconv("UTF-8","UCS-2LE",$data);
The result when I open it in notepad++ is UCS-2 LE without Byte Order Mask.
2)
$data = $_POST['data'];
$data = mb_convert_encoding($data,"UTF-16LE","UTF-8");
The result is the same as the 1)
If I encode then manually in UTF-16LE with notepad++ I got the perfect result.
How can I get PHP to add a Byte Order Mask to the UTF-16 data?

If you want a BOM, you have to add it manually. For little endian, it is FFFE. So
$data = $_POST['data'];
$data = "\xFF\xFE".iconv("UTF-8","UCS-2LE",$data);
should do the trick...
Source: Wikipedia

Related

PHP read a line from a csv file return wrong in charset

I got a csv file, if I set the charset to ISO-8859-2(eastern europe) in Libre Calc, than it renders the characters correctly, but since the server's locale set to EN-UK.
I can not read the characters correctly, for example:
it returns : T�t insted of Tót.
I tried many things like:
echo (mb_detect_encoding("T�t","ISO-8859-2","UTF-8"));
I know probably the char does not exist in UTF-8 but I tried.
Also tried to setup the correct charset in the header:
header('Content-Type: text/html; charset=iso-8859-2');
echo "T�th";
but its returns : TÄĹźËth insted of Tóth.
Please help me solve this, thanks in advance
I advise against setting the header to charset=iso-8859-2'. It is usual to work with UTF-8. If the data is available with a different encoding, it should be converted to UTF-8 and then processed as CSV. The following example code could be kept as simple as the newline characters in UTF-8 and iso-8859-2 are the same.
$fileName = "yourpath/Iso8859_2.csv";
$fp = fopen($fileName,"r");
while($row = fgets($fp)){
$strUtf8 = mb_convert_encoding($row,'UTF-8','ISO-8859-2');
$arr = str_getcsv($strUtf8);
var_dump($arr);
}
fclose($fp);
The exact encoding of the CSV file must be known. mb_detect_encoding is not suitable for determining the encoding of a file.

PHP : converting an UCS-2 LE BOM string to UTF-8 stops working once i write the string to a file

I am currently having a hard time trying to do the simplest thing :
I have a UCS-2 LE BOM encoded file that I am converting to UTF-8.
Here is what Notepad++ says about the encoding :
My converting routine is simple :
I am opening the input file and creating an output file.
I am parsing the input file and converting everyline on-the-go to the UTF-8 format
Once the converting is done, I remove the input file
Once the input file is removed, I rename my output file to the name of the input file
Here is the code that does it :
public function convertCsvToUtf8(string $absolutePathToFile) : string {
$dotPosition = strrpos($absolutePathToFile, ".");
$absolutePathToNewFile = substr($absolutePathToFile, 0, $dotPosition)."-utf8.csv";
$res_input_file = fopen($absolutePathToFile, "r");
$res_output_file = fopen($absolutePathToNewFile, "w+");
while($input_string = fgets($res_input_file)){
$inputEncoding = mb_detect_encoding($input_string, mb_list_encodings(), true);
$output_string = iconv($inputEncoding, 'UTF-8', $input_string);
fputs($res_output_file, ($output_string));
}
fclose($res_input_file);
fclose($res_output_file);
unlink($absolutePathToFile);
rename($absolutePathToNewFile, $absolutePathToFile);
return $absolutePathToFile;
}
Here you can see an example of an execution :
So... everything seems to be okay at a first glance (expect the fact that the "°" is replaced by a weird character); but when I open the output file with Notepad++, here is a sample what I see :
I have no idea what is going on here.
Any help would be awesome !
Feel free to ask for more details !
Thanks in advance,

Encoding issue with PHP while writing in a .csv file

I'm working with a php array which contains some values parsed from a previous scraping process (using Simple HTML DOM Parser). I can normally print / echo the values of this array, which contains special chars é,à,è, etc. BUT, the problem is the following :
When I'm using fwrite to save values in a .csv file, some characters are not successfully saved. For example, Székesfehérvár is well displayed on my php view in HTML, but saved as Székesfehérvár in the .csv file which I generate with the php script above.
I've already set-up several things in the php script :
The page I'm scraping seems to be utf-8 encoded
My PHP script is also declared as utf-8 in the header
I've tried a lot of iconv and mb_encode methods in different places in the code
NOTE that when I'm make a JS console.log of my php array, using json_encode, the characters are also broken, maybe linked to the original encoding of the page I'm scraping?
Here's a part of the script, it is the part who is writing values in a .csv file
<?php
$data = array(
array("item1", "item2"),
array("item1", "item2"),
array("item1", "item2"),
array("item1", "item2")
// ...
);
//filename
$filename = 'myFileName.csv';
foreach($data as $line) {
$string_txt = ""; //declares the content of the .csv as a string
foreach($line as $item) {
//writes a new line of the .csv
$line_txt = "";
//each line of the .csv equals to the values of the php subarray, tab separated
$line_txt .= $item . "\t";
}
//PHP endline constant, indicates the next line of the .csv
$line_txt .= PHP_EOL;
//add the line to the string which is the global content of the .csv
$line_txt .= $string_txt;
}
//writing the string in a .csv file
$file = fopen($filename, 'w+');
fwrite($file, $string_txt);
fclose($file);
I am currently stuck because I can't save values with accentuated characters correctly.
Put this line in your code
header('Content-Type: text/html; charset=UTF-8');
Hope this helps you!
Try it
$file = fopen('myFileName.csv','w');
$data= array_map("utf8_decode", $data);
fputcsv($file,$data);
Excel has problems displaying utf8 encoded csv files. I saw this before. But you can try utf8 BOM. I tried it and works for me. This is simply adding these bytes at the start of your utf8 string:
$line_txt .= chr(239) . chr(187) . chr(191) . $item . "\t";
For more info:
Encoding a string as UTF-8 with BOM in PHP
Alternatively, you can use the file import feature in Excel and make sure the file origin says 65001 : Unicode(UTF8). It should display your text properly and you will need to save it as an Excel file to preserve the format.
The solution (provided by #misorude) :
When scraping HTML contents from webpages, there is a difference between what's displayed in your debug and what's really scraped in the script. I had to use html_entity_decode to let PHP interpret the true value of the HTML code I've scraped, and not the browser's interpretation.
To validate a good retriving of values before store them somewhere, you could try a console.log in JS to see if values are correctly drived :
PHP
//decoding numeric HTML entities who represents "Sóstói Stadion"
$b = html_entity_decode("Sóstói Stadion");
Javascript (to test):
<script>
var b = <?php echo json_encode($b) ;?>;
//print "Sóstói Stadion" correctly
console.log(b);
</script>

how to base64 encode a doc file

I am working on a php project that I need to send base64 encoded doc document to the server using CURL. Here is the code:
$c = file_get_contents("test.doc");
$encoded = base64_encode($c);
I found the resulting $encoded is invalid.
In this website: https://www.base64encode.org/
I uploaded the file and base64 encoded online, the resultant encoded string is correct.
I then tried to cut and paste the text from the doc file and encoded them online at the above website, the resultant string is different than the valid string I got. Therefore, I guess I could not just extract the text from the doc file and base64 encode them.
try this-
$c = file_get_contents("test.doc");
file_put_contents('temp.txt',$c);
$cc = file_get_contents('temp.txt');
if(strlen($cc)=='0'){
$cc = fopen("temp.txt","r");}
$encoded = base64_encode($cc);
unlink('temp.txt');

PHP convert encoding with Shift_JIS

I have a text file. It contains "砡" character and its encoding is Shift-JIS.
I using function file_get_contents() in PHP (Laravel) to read this file, then response in json for client.
$file = file_get_contents("/path/to/file/text");
$file = iconv("SJIS", "UTF-8//IGNORE", $file);
return response()->json(['content' => $file]);
However, this charater "砡" doesn't correctly display, it show to "x".
How do I fix it ?
Try "SJIS-win" instead of "SJIS".

Categories