Encoding issue with Apache, displaying diamond characters in browser - php

Please help me set up an Apache server on CentOS. It looks like an encoding issue, but I have not been able to resolve it yet.
Instead of the rendered HTML, Chrome and Firefox display the HTML source (IE 9 works fine), with a � character after each "<" symbol.
http://pdf.gen.in/index1.htm
The second problem is with PHP. http://pdf.gen.in/index.php displays the PHP source code with similar diamond characters wherever a "<" character occurs. The PHP issue seems to be related to the first one.

Those files are encoded in UTF-16LE. For the static HTML page, you might be able to get it to work by setting the charset correctly in the MIME type (it's currently text/html; charset=UTF-8). I don't know how strong PHP's Unicode support is. Try using UTF-8 instead; it's generally better supported due to its partial overlap with ASCII.
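To illustrate the answer above: in UTF-16LE every ASCII character is followed by a zero byte, which is probably what shows up as � after each "<" when the page is labelled UTF-8. Here is a minimal sketch of converting such content to UTF-8 in PHP, assuming the mbstring extension is available. The function name utf16leToUtf8 and the sample bytes are made up for illustration.

```php
<?php
// Hypothetical helper: convert UTF-16LE bytes to UTF-8 (requires mbstring).
function utf16leToUtf8(string $bytes): string
{
    // Drop the UTF-16LE byte order mark (FF FE) if the data starts with one.
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2);
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// "<p>" in UTF-16LE with a BOM: each ASCII character is followed by a zero byte.
$sample = "\xFF\xFE<\x00p\x00>\x00";
echo utf16leToUtf8($sample), "\n"; // prints <p>
```

For a one-off fix, re-saving the files as UTF-8 in the editor is simpler than converting at runtime.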

You should use a decent text editor and always save PHP/HTML files as "UTF-8 without BOM".
Create a file named "test.php", paste the code below, and save it with "UTF-8 without BOM" encoding; it should then work fine.
<?php
phpinfo();
?>

Related

Bug with php file converted from ansi to utf-8

I have a few PHP script files encoded in ANSI. Now that I have converted my website to HTML5, I need everything in UTF-8 so that accents in these files are displayed correctly without any PHP conversion through iconv(). I used Notepad++ to set the encoding of my scripts to UTF-8 and saved the files. Most are fine and accents are displayed correctly; only the main script now blocks everything, and the server returns a white page without any error message, even with ini_set('error_reporting', 'E_ALL')!
When I change the encoding back to ANSI in Notepad++ and save the file without any other change, it works again (except that the accents are not displayed correctly without iconv()).
I also tried using a PHP script to change the encoding with ...$file = iconv('ISO-8859-1','UTF-8', $file);... but the result is exactly the same!
I wrote a short PHP script to look for high character values, but the highest values seem to be the usual French accents like é, è, etc., which are also present in other files and cause no problem. I removed other special characters, without any effect...
The problem is that the file is large, more than 4500 lines, and I'm not sure how to proceed. Has anyone had this problem, or does anyone have an idea?
The issue was the "£" (pound) character, which I used a lot as a delimiter in preg_match("£(...)£", "...", $string) and preg_replace calls.
For some reason these characters were not accepted after the conversion. I had to replace all of them, and only then did it work fine in UTF-8... Apparently they are not a problem now that the file is converted, so I can use them again.
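One possible explanation, offered as an assumption rather than something the poster confirmed: "£" is a single byte (0xA3) in ISO-8859-1 but two bytes (0xC2 0xA3) in UTF-8, so a pattern delimited by it no longer starts and ends with the same single byte after conversion. The sketch below only shows the byte lengths and an ASCII delimiter that is unaffected by file encoding; the subject string is made up.

```php
<?php
// The pound sign as raw bytes in both encodings.
$poundLatin1 = "\xA3";      // ISO-8859-1: one byte
$poundUtf8   = "\xC2\xA3";  // UTF-8: two bytes

printf("ISO-8859-1: %d byte, UTF-8: %d bytes\n", strlen($poundLatin1), strlen($poundUtf8));

// An ASCII delimiter such as ~ is the same byte in both encodings,
// so re-saving the file does not change the pattern.
$subject = 'prix: 42 euros';
$found = preg_match('~(\d+)~', $subject, $matches);
echo $found === 1 ? $matches[1] : 'no match', "\n"; // prints 42
```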

PHP Character Encoding Error: How they do it?

Problem:
I have a textarea that accepts XML as content and posts it to the server. It works fine when the content is all ASCII, but when the data is in Hebrew, simplexml_load_string fails to load it and reports invalid XML, because the encoding breaks the posted data.
What I did:
I have the HTML meta tag set to UTF-8, and I have the PHP header set so that the content is UTF-8.
I have MySQL set with 'SET NAMES utf8'.
print_r(iconv_get_encoding('all')); prints all three values as ISO-8859-1.
When I print $_POST, the Hebrew characters show fine in the browser (and in the browser's view source as well), but the function still fails.
When I change php.ini to set the iconv encoding to UTF-8, everything works again.
However:
The same server hosts hundreds of WordPress installations that run Hebrew websites, and they don't have this problem.
So my question is: why is my code failing while WordPress and other open source software work just fine with the same encoding? I did try setting iconv to UTF-8 in the first executable line, but nothing changed for me.
I'm not sure I have explained my problem well or that my question is clear; if not, please let me know. Thanks.
EDIT: I also tried the utf8_encode and utf8_decode functions, but they failed too.
You need to use mb_internal_encoding('UTF-8') to tell PHP what encoding you are using. This overrides the setting from php.ini.
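A minimal sketch of that call, assuming the mbstring extension is loaded. The Hebrew sample word is written as raw UTF-8 bytes so the example does not depend on how the script file itself is saved.

```php
<?php
// Override the php.ini value for this request (requires mbstring).
mb_internal_encoding('UTF-8');

// The Hebrew word "shalom" (4 letters) as raw UTF-8 bytes.
$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

echo mb_internal_encoding(), "\n"; // prints UTF-8
echo strlen($hebrew), "\n";        // prints 8 (bytes)
echo mb_strlen($hebrew), "\n";     // prints 4 (characters, using the internal encoding)
```

Calling it once at the top of a shared bootstrap file keeps the setting consistent across scripts regardless of the server's php.ini.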

Arabic characters and UTF-8 in aria2

I use aria2 over XML-RPC to download files, and when I want to start a download like this in PHP:
$client->aria2_addUri( array($url), array("dir"=>'/home/amir/دانلود') );
it creates a folder named شسÛب instead of دانلود. I posted a related question in the aria2 forums, and they said aria2 has no problem if the string is sent to it as UTF-8.
So I set a UTF-8 header and converted the string to UTF-8, but it does not work:
header('Content-type:application/json; charset=utf-8');
$dir_on_server = mb_convert_encoding($dir_on_server, 'UTF-8');
What do you think?
Try accessing the file or folder via the browser,
for example by writing a .htaccess file with the content "Options Indexes" so that your folders are listed. (I can even access them via HTTP.)
I created multiple files and folders with a script in which the GET value "file" or "folder" determines the name of the file or folder, and tried it with Japanese and Arabic characters. Although they are not shown correctly over FTP (in my case only file names like "?????"), they are displayed correctly when read by a script.
The problem might be the program you're using to access your FTP. WinSCP, for example, has UTF-8 set to "auto" by default, so forcing it might work. (Although I have to admit that it's not working on my side; maybe my Linux server does not support UTF-8 file names, which could also be the problem for you.)
PS:
Also make sure your PHP file is saved as UTF-8 without BOM, since you're using a constant UTF-8 string.
EDIT:
Also, if you still intend to use mb_convert_encoding, better add the optional parameter "from_encoding".
I tested this with japanese in a SHIFT-JIS encoded file:
$text = "A strange string to pass, maybe with some 日本語の characters.";
echo mb_convert_encoding($text, 'UTF-8');
and it does not display correctly although my browser is set to UTF-8, so it seems the source encoding is not always determined correctly when it is left out.
So this, for example, works for me:
$text = "A strange string to pass, maybe with some 日本語の characters.";
echo mb_convert_encoding($text, 'UTF-8', 'SJIS'); //from SJIS(SHIFT-JIS)
This little script is handy for finding the right optional parameter for your Arabic characters:
http://www.php.net/manual/de/function.mb-convert-encoding.php#97902
But converting won't be necessary if the file is already in UTF-8; it only makes sense if the file is in some Arabic encoding, so I don't think this really brings you any closer to the solution.
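The two points above (pass from_encoding explicitly, and skip the conversion when the text is already UTF-8) can be sketched as follows. This assumes mbstring is available; the helper name toUtf8 is hypothetical, and the caller has to know or guess the source encoding.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8.
function toUtf8(string $text, string $assumedSource): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8 (this includes plain ASCII)
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// The two kanji of "Nihon" (Japan) as raw Shift-JIS bytes.
$sjis = "\x93\xFA\x96\x7B";
echo bin2hex(toUtf8($sjis, 'SJIS')), "\n"; // prints e697a5e69cac
```

The validity check is only a heuristic: a byte sequence can be valid in more than one encoding, so the explicit source encoding remains the reliable part.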
EDIT2:
Tried a different FTP program: FileZilla displays my files and folders, which have Japanese names and the Arabic one, correctly. (I was using WinSCP 4.3.4 before.)

UTF-8 Encoding not working

I know there are a number of posts about UTF-8 encoding issues, but I am failing to convert a string to UTF-8.
I have a string "beløp" in PHP.
When I print it on screen in an iframe, it prints "bel�p".
After that I tried utf8_encode("beløp"); and got the output "bel�p".
Then I tried iconv("UTF-8", "ISO-8859-1", "beløp"); and got the output "bel ".
And finally I tried utf8_encode(utf8_decode("beløp")); and got the output "bel?p".
Please let me know where I'm going wrong and how I can fix it.
This
bel�p
is an indication that you are outputting a non-UTF-8 character in a UTF-8 context.
Make sure your file is encoded in UTF-8 (I don't know what editor you're using, but Notepad++ and Sublime Text have a "Save with encoding..." option) and that at the top of your HTML page there's
<meta charset="utf-8">
Hi, it's fixed. The problem was that my file was not encoded in UTF-8.
I fixed it by replacing "bel�p" with "beløp".
The reason your conversion does not work is that the original "beløp" text was not in ISO-8859-1. utf8_encode only works for conversions from that encoding. What could work for this type of issue is to use the mb_detect_encoding function (http://php.net/manual/en/function.mb-detect-encoding.php) to find out which encoding the text is originally in, and then use iconv to convert from the detected encoding to UTF-8. When this is done, you have to make sure, as mentioned in earlier comments, that UTF-8 is set as the encoding in the header.
Note that PHP's mb_detect_encoding is not very reliable and can make mistakes when detecting the encoding, especially if you do not have a large amount of text. To make sure all text is always displayed correctly, all processing must use the same encoding at all times. If you get the text from external sources or web services, you should always check the headers for the correct encoding before the text is processed.
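A minimal sketch of the detect-then-convert approach described above, assuming the mbstring and iconv extensions are available. The helper name toUtf8Guess and the candidate list are assumptions for illustration, and as noted, detection can guess wrong on short input.

```php
<?php
// Hypothetical helper: guess the source encoding from a short candidate list,
// then convert to UTF-8 with iconv. Returns the input unchanged on failure.
function toUtf8Guess(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode rejects candidates for which the bytes are invalid.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false || $detected === 'UTF-8') {
        return $text;
    }
    $converted = iconv($detected, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

// "beløp" as ISO-8859-1 bytes: ø is the single byte 0xF8.
$latin1 = "bel\xF8p";
echo bin2hex(toUtf8Guess($latin1)), "\n"; // prints 62656cc3b870
```

Keeping the candidate list short and ordered from strictest to most permissive (UTF-8 first) reduces wrong guesses, because ISO-8859-1 accepts any byte sequence.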

PHP fwrite function to write txt file in utf-8 encoding

I have made a form where a user writes a message in Arabic and submits it with a submit button. The message is saved in a database, and I need to create a .txt file on the server for another application, which shows something like this:
د پوليسو Ù¾Ø
I successfully used the fopen and fwrite functions to create my txt files.
When I open the file in Notepad, the Arabic text is shown correctly,
but when I open it in Eclipse I get something like this:
د پوليسو پر روزنيز مرکز توغندويي بريد وشو
Afterwards, when I save the txt file in Notepad with UTF-8 encoding, the unreadable text above changes to Arabic.
But I can't do that manually for every message.
I searched a lot on the internet and did the following:
I saved the script in UTF-8
I used the utf8_encode function
I also set ini_set('default_charset', 'UTF-8');
and this too: <meta http-equiv="Content-Type" content="text/html; charset=utf-8; encoding=utf-8" />
I changed the mode parameter in fopen to "wb", where b is for binary
I would be very glad of any solution; I have worked on this issue continuously for the last week. I know the problem is in the encoding, so how can I write UTF-8 encoded files using PHP?
If the text displays fine in one program but not another, that just means one program interprets the file correctly while the other doesn't. Most likely Notepad sets a UTF-8 BOM on the file when you save it again, so Eclipse now automatically recognizes that it's UTF-8 encoded. Without that, Eclipse assumes latin-1 or some other encoding as the default.
Two options:
change your Eclipse preferences to open files as UTF-8 by default
set a BOM on the file when writing it, see Encoding a string as UTF-8 with BOM in PHP
A BOM can be helpful for making programs recognize UTF-8 but can also cause problems in other programs that don't expect or want BOMs. Whether to use a BOM or not depends on your intended use and target audience.
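A minimal sketch of the second option, writing the UTF-8 BOM (the bytes EF BB BF) in front of the text. It assumes the text is already UTF-8 encoded; the function name writeUtf8WithBom and the temporary file are made up for illustration.

```php
<?php
// Hypothetical helper: write UTF-8 text to a file, preceded by a UTF-8 BOM.
function writeUtf8WithBom(string $path, string $utf8Text): bool
{
    $handle = fopen($path, 'wb');
    if ($handle === false) {
        return false;
    }
    // "\xEF\xBB\xBF" is the UTF-8 byte order mark.
    $ok = fwrite($handle, "\xEF\xBB\xBF" . $utf8Text) !== false;
    fclose($handle);
    return $ok;
}

// Sample: one Arabic letter (U+062F) as raw UTF-8 bytes, written to a temp file.
$path = tempnam(sys_get_temp_dir(), 'msg');
$written = writeUtf8WithBom($path, "\xD8\xAF");
echo $written ? "written\n" : "failed\n";
```

The text should be passed through unchanged; running utf8_encode on data that is already UTF-8 would encode it a second time.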
In Eclipse you need to set the encoding via the menu Edit > Set Encoding...
