I need to convert a CSV file from UCS-2LE to UTF-8 encoding. So far I've tried the following:
$str = file_get_contents($file);
$str = mb_convert_encoding($str, 'UTF-8', 'UCS-2LE');
file_put_contents($newfile, $str);
The problem is that PHP encodes the new file as UTF-8 with a BOM instead of plain UTF-8 (according to Notepad++).
Notepad++ also has an option to set the encoding to UTF-8 without a BOM.
I don't understand why PHP adds a BOM when I explicitly asked for UTF-8 only.
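One likely explanation, offered as an assumption since the source file is not shown: a UCS-2LE file usually starts with the byte-order mark FF FE, and mb_convert_encoding converts that character (U+FEFF) like any other, which yields the UTF-8 BOM EF BB BF. A minimal sketch that strips it after conversion, assuming the mbstring extension is loaded; the helper name `ucs2leToUtf8NoBom` is hypothetical:

```php
<?php
// Hypothetical helper: convert UCS-2LE bytes to UTF-8 and drop a leading BOM.
function ucs2leToUtf8NoBom(string $bytes): string
{
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'UCS-2LE');

    // U+FEFF encoded in UTF-8 is the three bytes EF BB BF.
    if (strncmp($utf8, "\xEF\xBB\xBF", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }

    return $utf8;
}
```

With this helper, the snippet in the question would become `file_put_contents($newfile, ucs2leToUtf8NoBom(file_get_contents($file)));`.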
After hours of searching, I can't find a way to force a file to be saved as UTF-8. If the string contains any character that exists only in UTF-8, the file is saved as UTF-8, but if every character is available in both ASCII and UTF-8, the file is saved as ASCII:
file_put_contents("test1.xml", "test"); // Saved as ASCII
file_put_contents("test2.xml", "test&"); // Saved as ASCII
file_put_contents("test3.xml", "tëst&"); // Saved as UTF-8
I can add a BOM to force a UTF-8 file, but the receiver of the document does not accept a BOM:
file_put_contents("utf8-force.xml", "\xEF\xBB\xBFtest&"); // Stored as UTF-8 because of the BOM
I checked the encoding with this simple code:
exec('file -I '.$file, $output);
print_r($output);
I believe the character & is a single byte in ASCII but two bytes in UTF-8, so the receiver of the file can't read it.
Is there a solution to force a file to UTF-8 without a BOM in PHP?
file_put_contents will not convert the encoding.
You have to convert the string explicitly with mb_convert_encoding.
Try this, replacing 'OLD-ENCODING' with the actual source encoding:
$data = 'test';
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents("test1.xml", $data);
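Or you can try using a stream filter:
$data = 'test';
$file = fopen('test.xml', 'w');
// The filter name is convert.iconv.<from>/<to>; 'OLD-ENCODING' is a placeholder for the source encoding
stream_filter_append($file, 'convert.iconv.OLD-ENCODING/UTF-8', STREAM_FILTER_WRITE);
fwrite($file, $data);
fclose($file);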
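On the ASCII observation in the question: UTF-8 encodes every ASCII character as the same single byte, so a file containing only ASCII characters is already valid UTF-8, and `file -I` has nothing to tell the two apart. A minimal sketch to check this, assuming the mbstring extension is loaded:

```php
<?php
$ascii = "test&";
$accented = "t\xC3\xABst&"; // "tëst&" written as explicit UTF-8 bytes

// "&" is the single byte 0x26 in both ASCII and UTF-8.
$ampersandHex = bin2hex("&");

// A pure-ASCII string validates as both ASCII and UTF-8.
$asciiIsUtf8 = mb_check_encoding($ascii, 'UTF-8');
$asciiIsAscii = mb_check_encoding($ascii, 'ASCII');

// A string with a non-ASCII character validates only as UTF-8.
$accentedIsUtf8 = mb_check_encoding($accented, 'UTF-8');
$accentedIsAscii = mb_check_encoding($accented, 'ASCII');
```

If the receiver rejects test2.xml, the cause is therefore probably something other than the byte encoding of "&".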
I have a file that contains some Cyrillic characters. When I open it in Notepad++, I see that it has ANSI encoding. If I manually convert it to UTF-8 using Notepad++, everything is fine: I can use the file in my parsers and get results. But I want to do this programmatically, using PHP. This is what I tried after searching through SO and the documentation:
file_put_contents($file, utf8_encode(file_get_contents($file)));
In this case, when my algorithm parses the resulting files, it encounters letters such as "è", "í", "â". In other words, I get rubbish. I also tried this:
file_put_contents($file, iconv('WINDOWS-1252', 'UTF-8', file_get_contents($file)));
But it produces the very same rubbish. So how can I achieve programmatically what Notepad++ does? Thanks!
Notepad++ may report your encoding as ANSI but this does not necessarily equate to Windows-1252. 1252 is an encoding for the Latin alphabet, whereas 1251 is designed to encode Cyrillic script. So use
file_put_contents($file, iconv('WINDOWS-1251', 'UTF-8', file_get_contents($file)));
to convert from Windows-1251 to UTF-8 with iconv.
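To see why the wrong source encoding produces exactly the rubbish described in the question, here is a small sketch, assuming the iconv extension is available. The three sample bytes are my own choice: they spell "инв" in Windows-1251, but read as "èíâ" when interpreted as Windows-1252.

```php
<?php
// The bytes E8 ED E2 are "инв" in Windows-1251.
$bytes = "\xE8\xED\xE2";

// Wrong source encoding: the same bytes are read as the Latin letters è, í, â.
$wrong = iconv('WINDOWS-1252', 'UTF-8', $bytes);

// Correct source encoding: the Cyrillic letters survive the conversion.
$right = iconv('WINDOWS-1251', 'UTF-8', $bytes);

// Hex dumps make the difference visible regardless of terminal encoding.
$wrongHex = bin2hex($wrong);
$rightHex = bin2hex($right);
```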
I have a .csv file encoded in UCS-2LE with a BOM. I need to make some changes to it with preg_replace, so I want to convert the file to UTF-8. However, when I convert it, all spaces disappear and all the words on a line run together.
My code is:
$content = file_get_contents( "myFile.csv" );
$content = mb_convert_encoding( $content, 'UCS-2LE', 'UTF-8');
What is the proper way to make the conversion so that I do not lose any spaces or characters?
[Screenshot: the file in Excel before converting]
[Screenshot: the file after converting]
Change the second line to this:
$content = mb_convert_encoding($content, 'UTF-8', 'UCS-2LE');
The second argument is the target encoding (to); the third is the source encoding (from).
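A minimal sketch of the argument order, assuming the mbstring extension is loaded. The sample string "a b" is my own, written out as explicit UCS-2LE bytes:

```php
<?php
// "a b" as UCS-2LE: each character is two bytes, low byte first.
$ucs2 = "a\x00 \x00b\x00";

// Correct order: target encoding second, source encoding third.
$right = mb_convert_encoding($ucs2, 'UTF-8', 'UCS-2LE');

// Swapped order: the UCS-2LE bytes are misread as UTF-8 and re-encoded
// as UCS-2LE, so the original text is not recovered.
$wrong = mb_convert_encoding($ucs2, 'UCS-2LE', 'UTF-8');
```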
I am trying to use file_put_contents (and file_get_contents, for that matter) with a UTF-8 ¥, following this Stack Overflow post: "How to write file in UTF-8 format?", which uses:
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
That wasn't explained well, since it produces this error:
mb_convert_encoding(): Illegal character encoding specified
So 'OLD-ENCODING' was just a placeholder they were using.
The question I have is what encoding should I change this to? ASCII or ISO-8859-1? What encoding do most web hosts use? Does it matter?
When I open the file, I will get the symbol correctly, only if I have my notepad set with encoding UTF-8. If I open it with another character set it will show up with a "?".
Try it without the third parameter, so that the source defaults to the internal encoding:
$str = mb_convert_encoding($str, "UTF-8");
Or use "auto":
$str = mb_convert_encoding($str, "UTF-8", "auto");
More info and examples at:
http://php.net/manual/function.mb-convert-encoding.php
mb_convert_encoding($data, 'UTF-8', mb_detect_encoding($data));
This uses mb_detect_encoding to guess the source encoding.
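Detection is a guess, and mb_detect_encoding returns false when nothing matches, so it may be safer to pass an explicit candidate list and use strict mode. A hedged sketch, assuming the mbstring extension is loaded; the helper name `toUtf8` and the default candidate list are hypothetical and should be adjusted to the encodings you actually expect:

```php
<?php
// Hypothetical helper: detect the source encoding from a short candidate
// list (checked in order, strict mode) and convert the data to UTF-8.
function toUtf8(string $data, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    $from = mb_detect_encoding($data, $candidates, true);

    if ($from === false) {
        throw new RuntimeException('Could not detect the source encoding');
    }

    return mb_convert_encoding($data, 'UTF-8', $from);
}

// "\xA5" is ¥ in ISO-8859-1; "\xC2\xA5" is ¥ already encoded as UTF-8.
$fromLatin1 = bin2hex(toUtf8("\xA5"));
$fromUtf8 = bin2hex(toUtf8("\xC2\xA5"));
```

Listing UTF-8 first matters here: almost any byte sequence is valid ISO-8859-1, so that candidate works only as a fallback.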
How do I convert Russian characters to UTF-8 in PHP using mb_convert_encoding or any other method?
Did you try the following? Not sure if it works, though.
mb_convert_encoding($str, 'UTF-8', 'auto');
$file = 'images/да так 1.jpg'; // this is in UTF-8, but needs to be in the system encoding (Russian)
$new_filename = mb_convert_encoding($file, "Windows-1251", "UTF-8"); // convert UTF-8 to the system encoding, Windows-1251 (Russian)
Now your Russian files should open.
Your Russian characters in PHP are already UTF-8; what you need is the name in the same encoding as your system encoding.
Or, if you need the opposite:
$new_filename = mb_convert_encoding($file, "utf-8", "Windows-1251");