I'm trying to convert Hebrew characters from UTF-8 to ISO-8859-8-1 in order to save them into a file.
I have read about ten posts on this site, but no matter what I do, I always get question marks (???????) instead of Hebrew letters.
I tried iconv(), mb_convert_encoding(), utf8_decode(), all of them convert from UTF-8 to ISO-8859-8-1 but I keep getting '?????????' in the file.
mb_convert_encoding($fullRecord, 'ISO-8859-1', 'UTF-8');
iconv("UTF-8", "ISO-8859-1", $fullRecord);
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $fullRecord);
Even this post didn't help, because the solution there is in JavaScript:
Conversion from UTF8 to ASCII
I wish it could be in PHP...
I know that there are no Hebrew characters in ASCII, but I have an example file that shows it can be done. When I open the file in Notepad, it shows the Hebrew correctly and the file is reported as ANSI, so I guess it can be done somehow...
Can anyone please help?
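Try Windows-1255, the Windows (ANSI) code page for Hebrew. ISO-8859-1 is Latin-1 and has no Hebrew letters, which is why they come out as question marks: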
iconv("UTF-8", "windows-1255", $fullRecord);
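A minimal sketch of the suggestion above, assuming the iconv build in use knows the Windows-1255 charset. The sample string and the output path are hypothetical; the Hebrew text is written with \u{...} escapes (PHP 7+) so the example does not depend on how the script file itself is saved.

```php
<?php
// Hypothetical sample text: the Hebrew word "shalom" (4 letters) as UTF-8.
$fullRecord = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";

// Windows-1255 stores each Hebrew letter in a single byte.
$ansi = iconv('UTF-8', 'WINDOWS-1255', $fullRecord);
if ($ansi === false) {
    // iconv() returns false when it cannot perform the conversion.
    throw new RuntimeException('iconv could not convert to Windows-1255');
}

// Hypothetical output location; replace with the real target file.
$path = sys_get_temp_dir() . '/hebrew-ansi.txt';
file_put_contents($path, $ansi);

// Show the raw bytes that were written, as hex.
echo bin2hex($ansi), PHP_EOL;
```

Checking the byte count (one byte per letter here) is a quick way to confirm that no letter was replaced by a literal "?" (0x3F) during conversion.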
Related
I have variables containing Chinese words; their charset is GB2312. I want to convert them to UTF-8 so that I can save them to a MySQL table with UTF-8 encoding. How do I do that in PHP? I'm using PHP 7.
Here is what I have tried:
I have tried $myvar = iconv('gb2312', 'utf-8', $myvar); however, some of my variables come back empty if they contain certain characters (invalid UTF-8 chars maybe?).
I have tried $myvar = mb_convert_encoding($myvar, 'UTF-8', 'GB2312'); it works better than iconv, but when $myvar contains the characters I mentioned above, they turn into question marks (?).
Please help me, thanks.
Update
Here is an example of my Chinese string:
GB2312 (Expected result): 第3章︰林鴻
Using mb_convert_encoding, it becomes: 第3章?林?
Using iconv, it becomes empty
I have a file that contains some Cyrillic characters. When I open this file in Notepad++, I see that it has ANSI encoding. If I manually convert it to UTF-8 using Notepad++, then everything is fine: I can use this file in my parsers and get results. But what I want is to do it programmatically, using PHP. This is what I tried after searching through SO and the documentation:
file_put_contents($file, utf8_encode(file_get_contents($file)));
In this case, when my algorithm parses the resulting files, it encounters letters such as "è", "í", "â". In other words, I get rubbish. I also tried this:
file_put_contents($file, iconv('WINDOWS-1252', 'UTF-8', file_get_contents($file)));
But it produces the very same rubbish. So I really wonder how I can achieve programmatically what Notepad++ does. Thanks!
Notepad++ may report your encoding as ANSI but this does not necessarily equate to Windows-1252. 1252 is an encoding for the Latin alphabet, whereas 1251 is designed to encode Cyrillic script. So use
file_put_contents($file, iconv('WINDOWS-1251', 'UTF-8', file_get_contents($file)));
to convert from Windows-1251 to UTF-8 with iconv.
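To illustrate why the source charset matters, here is a small sketch that decodes the same bytes once as Windows-1252 and once as Windows-1251. The sample bytes are a hypothetical stand-in for the file contents (the Russian word "Привет" as a Windows-1251 file would store it).

```php
<?php
// Hypothetical sample: six bytes as they would appear in a Windows-1251 file.
$bytes = "\xCF\xF0\xE8\xE2\xE5\xF2";

// Wrong source charset: every byte is read as an accented Latin letter.
$wrong = iconv('WINDOWS-1252', 'UTF-8', $bytes);

// Right source charset: the same bytes are read as Cyrillic letters.
$right = iconv('WINDOWS-1251', 'UTF-8', $bytes);

echo $wrong, PHP_EOL; // accented Latin letters such as "è" and "â"
echo $right, PHP_EOL; // the intended Cyrillic word
```

Both calls succeed, because every byte is valid in both code pages; only the interpretation differs. That is why the wrong choice produces readable-looking rubbish rather than an error.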
I'm trying to read information from a text file; however, when the text is in Hebrew, it shows "????" instead of the Hebrew word.
I can't change the file encoding, because ZaraRadio outputs it, so I tried to convert the file contents to UTF-8 this way:
$npf = "CurrentSong.txt";
$ans = file_get_contents($npf);
$ans = mb_convert_encoding($ans, "UTF-8", "auto");
but it's still not working...
Any suggestions?
Thanks.
Most likely auto will not work, because the file is in a single-byte encoding, which cannot be detected reliably. You don't say which encoding it uses, but ISO-8859-8 is probably it:
$ans = mb_convert_encoding($ans, "UTF-8", "ISO-8859-8");
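A minimal sketch of the explicit conversion, assuming the file really is ISO-8859-8 (that is the answer's guess, not something the question confirms). The sample bytes are a hypothetical stand-in for file_get_contents("CurrentSong.txt").

```php
<?php
// Hypothetical stand-in for the file contents: four Hebrew letters,
// one byte each, as a single-byte Hebrew encoding would store them.
$ans = "\xF9\xEC\xE5\xED";

// Only convert when the bytes are not already valid UTF-8,
// so that running the conversion twice does not double-encode the text.
if (!mb_check_encoding($ans, 'UTF-8')) {
    // Name the source encoding explicitly instead of relying on "auto".
    $ans = mb_convert_encoding($ans, 'UTF-8', 'ISO-8859-8');
}

echo $ans, PHP_EOL;
```

The page that prints $ans still has to declare UTF-8 as its charset; otherwise the browser may misread the converted bytes.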
I'm doing a simple (I thought) directory listing of files, like so:
$files = scandir(DOCROOT.'files');
foreach($files as $file)
{
echo ' <li>'.$file.PHP_EOL;
}
The problem is that the file names contain Norwegian characters (æ, ø, å), and for some reason they come out as question marks. Why is this?
I can apparently fix(?) it by doing this before I echo it out:
$file = mb_convert_encoding($file, 'UTF-8', 'pass');
But it makes little sense to me why this helps, since pass should mean no character encoding conversion is performed, according to the docs... *confused*
Here is an example: http://random.geekality.net/files/index.php
It appears the file names are encoded in ISO Latin 1, but the page is interpreted by default as UTF-8. The characters do not come out as "question marks", but as Unicode replacement characters (�). That means the browser, which tries to interpret the byte stream as UTF-8, has encountered a byte that is invalid in UTF-8 and inserts the replacement character at that point instead. Switch your browser to ISO Latin 1 and see the difference (View > Encoding > ...).
So what you need to do is to convert the strings from ISO Latin 1 to UTF-8, if you designate your page to be UTF-8 encoded. Use mb_convert_encoding($file, 'UTF-8', 'ISO-8859-1') to do so.
Why it works if you specify the $from encoding as pass I can only guess. What you're telling mb_convert_encoding with that is to convert from pass to UTF-8. I guess that makes mb_convert_encoding take the mb_internal_encoding value as the $from encoding, which happens to be ISO Latin 1. I suppose it's equivalent to 'auto' when used as the $from parameter.
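The ISO Latin 1 to UTF-8 conversion suggested above could be sketched like this. The $files array is a hypothetical stand-in for the scandir() result, and the htmlspecialchars() call is an addition of this sketch, not part of the original code.

```php
<?php
// Hypothetical stand-in for scandir(): file names as ISO Latin 1 bytes
// (0xE5 = å, 0xE6 = æ, 0xF8 = ø).
$files = ["bl\xE5b\xE6r.txt", "s\xF8knad.pdf"];

// Convert every name from ISO-8859-1 to UTF-8 before output.
$names = array_map(
    function (string $file): string {
        return mb_convert_encoding($file, 'UTF-8', 'ISO-8859-1');
    },
    $files
);

foreach ($names as $name) {
    // Escape for HTML and state the charset the page is served in.
    echo ' <li>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</li>' . PHP_EOL;
}
```

Naming the source encoding explicitly avoids depending on whatever mb_internal_encoding happens to be on a given server.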
INT. PALO TORCIDO HIGH SCHOOL, CAFETER�A - DAY
Hi, I uploaded a .txt file to my server and read its contents with fopen/fread, and also tried file_get_contents just in case.
I can't seem to figure out how to encode the special characters...
In my HTML I have the charset set to UTF-8. I also tried a PHP header to declare UTF-8 encoding.
What is the proper way to handle files with letters that are not part of the English alphabet?
Try utf8_encode()
echo utf8_encode(file_get_contents('file.txt'));
This works if the *.txt file is encoded in Latin-1. If other encodings may be used too, detect the encoding with mb_detect_encoding() and convert it to UTF-8 with mb_convert_encoding().
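A rough sketch of the detect-then-convert approach, under the assumption that the input is either UTF-8 or Latin-1. The helper name toUtf8 and the candidate list are hypothetical. It uses mb_convert_encoding() rather than utf8_encode(), which is deprecated as of PHP 8.2. Detection of single-byte encodings is a guess at best, so keep the candidate list short and explicit.

```php
<?php
// Hypothetical helper: convert text to UTF-8, choosing the source
// encoding from a short list of candidates.
function toUtf8(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode rejects candidates in which the bytes are invalid.
    $from = mb_detect_encoding($text, $candidates, true);
    if ($from === false) {
        throw new InvalidArgumentException('None of the candidate encodings fit the input');
    }

    // Text detected as UTF-8 is returned unchanged.
    return $from === 'UTF-8' ? $text : mb_convert_encoding($text, 'UTF-8', $from);
}

// Hypothetical sample: Latin-1 bytes, where 0xCD is the accented "Í".
$raw = "CAFETER\xCDA";
echo toUtf8($raw), PHP_EOL;
```

If the set of possible encodings is known in advance (for example, all uploads come from one system), skipping detection and naming the source encoding directly is the more predictable choice.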