After hours of searching, I can't find a way to force a file to be saved in UTF-8 encoding. If the string contains any character that only exists in UTF-8, the file is saved as UTF-8, but if every character exists in both ASCII and UTF-8, the file is saved as ASCII:
file_put_contents("test1.xml", "test"); // Saved as ASCII
file_put_contents("test2.xml", "test&"); // Saved as ASCII
file_put_contents("test3.xml", "tëst&"); // Saved as UTF-8
I can add a BOM to force a UTF-8 file, but the receiver of the document does not accept a BOM:
file_put_contents("utf8-force.xml", "\xEF\xBB\xBFtest&"); // Stored as UTF-8 because of the BOM
I checked the encoding with this simple code:
exec('file -I '.$file, $output);
print_r($output);
I assume the character & is a single byte in ASCII and a two-byte character in UTF-8, which would explain why the receiver of the file can't read it.
Is there a solution to force a file to UTF-8 without a BOM in PHP?
file_put_contents will not convert the encoding; it writes the bytes it is given.
You have to convert the string explicitly with mb_convert_encoding.
Try this:
$data = 'test';
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents("test1.xml", $data);
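Or you can try using a stream filter:
$in = fopen('test.xml', 'r');
$out = fopen('test-utf8.xml', 'w'); // example output file name
// The filter name has the form convert.iconv.<from>/<to>
stream_filter_append($in, 'convert.iconv.OLD-ENCODING/UTF-8');
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);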
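The point behind this answer can be checked directly: a string made only of ASCII characters has exactly the same bytes in UTF-8, so such a file is already valid BOM-less UTF-8, and a tool like file just reports the narrowest label that fits. A minimal sketch, assuming the mbstring extension is loaded; the scratch file is hypothetical and only there to show that the bytes are written unchanged.

```php
<?php
// Assumes the mbstring extension is available.
$ascii = "test&";          // ASCII-only content
$mixed = "t\xC3\xABst&";   // "tëst&" spelled out as explicit UTF-8 bytes

// Both strings are valid UTF-8; only the second contains a multi-byte sequence.
$asciiIsUtf8 = mb_check_encoding($ascii, 'UTF-8');
$mixedIsUtf8 = mb_check_encoding($mixed, 'UTF-8');

// "&" is the single byte 0x26 in ASCII and in UTF-8 alike.
$ampersandHex = bin2hex('&');

// file_put_contents stores the bytes as given and adds no BOM.
$path = tempnam(sys_get_temp_dir(), 'utf8'); // hypothetical scratch file
file_put_contents($path, $ascii);
$written = file_get_contents($path);
unlink($path);
```

If the receiver rejects the file, the cause is therefore more likely something other than the byte encoding, for example an unescaped & inside XML.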
Related
I need to convert a CSV file from UCS-2LE to UTF-8 encoding. So far I've tried the following:
$str = file_get_contents($file);
$str = mb_convert_encoding($str, 'UTF-8', 'UCS-2LE');
file_put_contents($newfile, $str);
But the problem is that PHP encodes the new file as UTF-8 with a BOM instead of plain UTF-8 (according to Notepad++).
Notepad++ also has an option to set the encoding to UTF-8 (without the BOM).
I don't understand why PHP adds a BOM to the UTF-8 output even though I explicitly asked for UTF-8 only.
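A likely explanation, offered as an assumption since the input file isn't shown: the UCS-2LE file itself starts with a BOM (the bytes FF FE, i.e. the character U+FEFF), and the conversion carries that character over into UTF-8 as EF BB BF. PHP is not adding anything; it is converting what is there. A minimal sketch that strips a leading UTF-8 BOM after converting, using a small made-up input string in place of the real CSV:

```php
<?php
// Assumes the mbstring extension is available.
// Made-up sample: a UCS-2LE BOM (FF FE) followed by "a,b" in UCS-2LE.
$ucs2 = "\xFF\xFE" . "a\x00,\x00b\x00";

$utf8 = mb_convert_encoding($ucs2, 'UTF-8', 'UCS-2LE');

// Drop a leading UTF-8 BOM if the conversion produced one.
$bom = "\xEF\xBB\xBF";
if (strncmp($utf8, $bom, 3) === 0) {
    $utf8 = substr($utf8, 3);
}
```

After this step $utf8 can be passed to file_put_contents as before.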
I have a .csv file encoded in UCS-2LE with a BOM. I need to make some changes to it with preg_replace, so I want to convert the file to UTF-8. However, when I convert it, all spaces disappear and all the words on the same line are stuck together.
My code is:
$content = file_get_contents( "myFile.csv" );
$content = mb_convert_encoding( $content, 'UCS-2LE', 'UTF-8');
What is the proper way to make the conversion so that I do not lose any spaces or characters?
Before converting - screenshot in Excel:
After converting the file:
You should change the second line to this:
$content = mb_convert_encoding($content, 'UTF-8', 'UCS-2LE');
The second argument is the target encoding (to); the third is the source encoding (from).
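The argument order can be seen in a small round trip. This is only an illustrative sketch with a made-up three-character string, assuming the mbstring extension is loaded:

```php
<?php
// mb_convert_encoding($string, $to, $from)
$utf8 = "a b";

// UTF-8 -> UCS-2LE: every character becomes two bytes, low byte first.
$ucs2 = mb_convert_encoding($utf8, 'UCS-2LE', 'UTF-8');
$ucs2Hex = bin2hex($ucs2);

// UCS-2LE -> UTF-8: the space (20 00) comes back as a single 0x20 byte.
$back = mb_convert_encoding($ucs2, 'UTF-8', 'UCS-2LE');
```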
I'm trying to read information from a text file, but when the content is in Hebrew, it shows "????" instead of the Hebrew word.
I can't change the file encoding, because ZaraRadio outputs the file, so I tried to convert its contents to UTF-8 this way:
$npf = "CurrentSong.txt";
$ans = file_get_contents($npf);
$ans = mb_convert_encoding($ans, "UTF-8", "auto");
But it's still not working.
Any suggestions?
Thanks.
Most likely auto will not work, because the file is in a single-byte encoding that auto-detection cannot identify. You don't say which encoding it uses, but ISO-8859-8 is a likely candidate:
$ans = mb_convert_encoding($ans, "UTF-8", "ISO-8859-8");
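To see what this conversion does at the byte level, here is a small sketch. It assumes the file really is ISO-8859-8 (the original post does not confirm this) and uses a hand-written four-byte sample in place of CurrentSong.txt:

```php
<?php
// Assumes the mbstring extension is available.
// Sample bytes: the Hebrew word "שלום" as it would appear in ISO-8859-8.
$raw = "\xF9\xEC\xE5\xED";

// Each single byte becomes a two-byte UTF-8 sequence.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-8');
$utf8Hex = bin2hex($utf8);
```

If the output is still garbled with the real file, the source encoding is probably something else, and the third argument needs to be changed accordingly.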
I tried to do:
file_put_contents ( $file_name, utf8_encode($data) ) ;
But when I check the file encoding from the shell with the Linux command 'file file_name',
I get: 'file_name: ASCII text'.
Does that mean utf8_encode didn't work? If so, what is the right way to convert from ASCII to UTF-8?
If your string doesn't contain any non-ASCII characters, then you likely won't see differences, since UTF-8 is backwards compatible with ASCII. Try writing, for example, the text "1000 さくら" and see what happens.
Please note that utf8_encode only converts a string encoded in
ISO-8859-1 to UTF-8. A more appropriate name for it would be
"iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do
not need this function. If your text is already in UTF-8, you do not
need this function. In fact, applying this function to text that is
not encoded in ISO-8859-1 will most likely simply garble that text.
If you need to convert text from any encoding to any other encoding,
look at iconv() instead.
See http://php.net/manual/en/function.utf8-encode.php
ASCII is a subset of UTF-8, so if a document is ASCII, then it is already UTF-8.
Found at: Convert ASCII TO UTF-8 Encoding
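Both points above (ASCII is unchanged, and utf8_encode is really an ISO-8859-1 to UTF-8 conversion) can be illustrated with mb_convert_encoding. This is a sketch with made-up sample strings, assuming the mbstring extension is loaded; mb_convert_encoding is used here instead of utf8_encode because utf8_encode is deprecated as of PHP 8.2.

```php
<?php
// ASCII-only text: converting from ISO-8859-1 to UTF-8 changes nothing.
$ascii = "test";
$asciiConverted = mb_convert_encoding($ascii, 'UTF-8', 'ISO-8859-1');

// Latin-1 text with "ë" (byte 0xEB): the byte becomes the UTF-8 pair C3 AB.
$latin1 = "t\xEBst";
$latin1Converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$latin1Hex = bin2hex($latin1Converted);
```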
Try this:
$data = mb_convert_encoding($data, 'UTF-8', 'ASCII');
file_put_contents ( $file_name, $data );
Or use this to change the file encoding:
$fd = fopen($file, 'r');
// The filter name has the form convert.iconv.<from>/<to>
stream_filter_append($fd, 'convert.iconv.ASCII/UTF-8');
stream_copy_to_stream($fd, fopen($output, 'w'));
Reference: How to write file in UTF-8 format?
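For a source encoding where the conversion actually changes bytes, the stream filter approach might look like the sketch below. It assumes the iconv extension is available, uses ISO-8859-1 as an example source encoding, and works on hypothetical scratch files created just for the demonstration.

```php
<?php
// Hypothetical scratch files standing in for the real input and output.
$src = tempnam(sys_get_temp_dir(), 'src');
$dst = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($src, "t\xEBst"); // "tëst" in ISO-8859-1

$in = fopen($src, 'r');
$out = fopen($dst, 'w');

// Filter name format: convert.iconv.<from>/<to>, applied while reading.
stream_filter_append($in, 'convert.iconv.ISO-8859-1/UTF-8', STREAM_FILTER_READ);
stream_copy_to_stream($in, $out);

fclose($in);
fclose($out);

$resultHex = bin2hex(file_get_contents($dst));
unlink($src);
unlink($dst);
```

The advantage over file_get_contents plus mb_convert_encoding is that the file is converted as it streams, so it never has to fit in memory as a whole.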
How can I convert Russian characters to UTF-8 in PHP, using mb_convert_encoding or any other method?
Did you try the following? I'm not sure whether it works, though.
mb_convert_encoding($str, 'UTF-8', 'auto');
$file = 'images/да так 1.jpg';//this is in UTF-8, needs to be system encoding (Russian)
$new_filename = mb_convert_encoding($file, "Windows-1251", "utf-8");//turn utf-8 to system encoding Windows-1251 (Russian)
Now your Russian files should open.
The Russian characters in your PHP source are already UTF-8.
What you need to do is put the name in the same encoding as your system encoding.
Or, if you need the opposite:
$new_filename = mb_convert_encoding($file, "utf-8", "Windows-1251");
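The two directions can be checked with a short round trip. This sketch assumes the mbstring extension is loaded and that the PHP source file itself is saved as UTF-8; the sample word is taken from the file name above.

```php
<?php
$utf8 = "да"; // UTF-8 in the source file: D0 B4 D0 B0

// UTF-8 -> Windows-1251: each Cyrillic letter becomes a single byte.
$cp1251 = mb_convert_encoding($utf8, 'Windows-1251', 'UTF-8');
$cp1251Hex = bin2hex($cp1251);

// Windows-1251 -> UTF-8 restores the original bytes.
$roundTrip = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');
```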