PHP Character Encoding Error: How Do They Do It?

Problem:
I have a textarea that accepts XML as content and posts it to the server. It works fine as long as all characters are ASCII, but when we enter data in Hebrew, simplexml_load_string fails to load the data and reports invalid XML, because the encoding breaks the posted data.
What I did:
I have set the HTML meta tag for UTF-8, and I have set the PHP header so that the content is UTF-8.
I have MySQL set to 'SET NAMES utf8'.
When I run print_r(iconv_get_encoding('all'));, it prints all three values as ISO-8859-1.
When I print $_POST, the Hebrew characters display fine in the browser (in View Source as well), but the function still fails.
When I change php.ini to set the iconv encoding to UTF-8, everything works fine again.
However:
The same server hosts hundreds of WordPress installations that run Hebrew websites, and they don't have this problem.
So my question is: why does my code fail while WordPress, or any other open-source software, handles the encoding just fine? I did try setting the iconv encoding to UTF-8 on the first executable line, but nothing changed.
I'm not sure I have explained my problem well or that my question is clear; if not, please let me know. Thanks.
EDIT: I did try the utf8_encode and utf8_decode functions, but they failed too.

You need to use mb_internal_encoding('UTF-8') to tell PHP which encoding you are using. This overrides the corresponding php.ini setting for the running script.
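A minimal sketch of that advice, assuming PHP 7.1+ with the mbstring and SimpleXML extensions. It sets mbstring's internal encoding at runtime and checks that the posted XML really is valid UTF-8 before handing it to simplexml_load_string(), so an encoding mismatch shows up as a clear rejection rather than a confusing parse failure. The textarea name xml and the helper load_posted_xml() are hypothetical. Browsers submit forms in the charset of the page that contains the form, so if that page is not actually served as UTF-8, this check is where it would show up.

```php
<?php
// Sketch only: assumes PHP 7.1+ with mbstring and SimpleXML enabled.
// Override php.ini's mbstring default for this request.
mb_internal_encoding('UTF-8');

// Hypothetical helper: returns the parsed document, or null if the input is
// not valid UTF-8 or not well-formed XML.
function load_posted_xml(string $raw): ?SimpleXMLElement
{
    // Bytes that are not valid UTF-8 would make libxml reject the document.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        return null;
    }
    // Collect libxml errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($raw);
    return $xml === false ? null : $xml;
}

// 'xml' is a hypothetical textarea name; a Hebrew sample is used when nothing was posted.
$raw = $_POST['xml'] ?? '<note><body>שלום</body></note>';
$doc = load_posted_xml($raw);
if ($doc !== null) {
    echo (string) $doc->body, PHP_EOL;
}
```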

Related

Bug with PHP file converted from ANSI to UTF-8

I have a few PHP script files encoded in ANSI. Now that I have converted my website to HTML5, I need everything in UTF-8, so that accents in these files display correctly without any conversion through iconv(). I used Notepad++ to set the encoding of my scripts to UTF-8 and saved the files. Most of them are fine and the accents display correctly, but the main script now blocks everything: the server only returns a white page, without any error message, even with ini_set('error_reporting', 'E_ALL')!
When I change the encoding back to ANSI in Notepad++ and save the file without any other change, it works again (except that the accents do not display correctly without iconv()).
I also tried using a PHP script to change the encoding with ... $file = iconv('ISO-8859-1', 'UTF-8', $file); ... but the result is exactly the same!
I wrote a short PHP script to look for characters with high byte values, but the highest ones seem to be the usual French accents like é, è, etc., which are also present in other files and cause no problem. I also removed other special characters, without any effect.
The problem is that the file is large, more than 4,500 lines, and I'm not sure how to proceed to fix it. Has anyone had this problem, or does anyone have an idea?
The issue was with the "£" (pound) character: I used it a lot as a delimiter in preg_match("£(...)£", "...", $string) and in preg_replace patterns.
For some reason these characters were not accepted after the conversion. I had to replace all of them, and only then did the script work fine in UTF-8. Apparently they are no longer a problem now that the file is converted, so I can use them again.
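A likely explanation, sketched below: once the file is saved as UTF-8, "£" becomes two bytes (0xC2 0xA3), but PHP's preg_* functions take a single byte as the pattern delimiter. PHP then uses 0xC2 as the delimiter and treats the trailing 0xA3 as an unknown modifier, so the call fails and returns false. The sketch assumes the file is saved as UTF-8 and PHP 7+; "~" is just one possible ASCII delimiter. It also enables error display using the E_ALL constant: ini_set('error_reporting', 'E_ALL') probably stores the literal string rather than the constant's value, which may be why the white page showed no error message.

```php
<?php
// Sketch: assumes this file is saved as UTF-8 and PHP 7+.
// Pass the E_ALL constant; ini_set() does not expand the string 'E_ALL'.
error_reporting(E_ALL);
ini_set('display_errors', '1');   // show errors in the page while debugging

// In UTF-8, "£" (U+00A3) is two bytes: 0xC2 0xA3.
$pound = "\u{00A3}";
printf("bytes in £: %d (%s)\n", strlen($pound), bin2hex($pound)); // bytes in £: 2 (c2a3)

$subject = '£42£';

// With "£" as delimiter, PHP takes 0xC2 as the delimiter and then reads 0xA3
// as an unknown modifier, so preg_match() warns (suppressed here) and returns false.
$broken = @preg_match('£(\d+)£', $subject, $m);
var_dump($broken); // bool(false)

// A single-byte ASCII delimiter avoids the problem; /u makes PCRE treat the
// pattern and subject as UTF-8, so "£" inside the pattern is one character.
$ok = preg_match('~£(\d+)£~u', $subject, $m);
var_dump($ok, $m[1]); // int(1), then string(2) "42"
```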

Unicode to Character Converter In PHP

I need to convert Unicode to characters in PHP. I am using a MySQL database to store text; the text is in Unicode format with collation utf8_general_ci. When I retrieve and display the data, some special characters appear, like "मिनिसà¥à¤•à¤°à¥à¤Ÿà¤®à¤¾ करà¥à¤•à¥‡ नजर" for the text "मिनिस्कर्टमा कर्के नजर". This is Nepali text in Unicode format. I need it in character or ASCII format in PHP. I have tried utf8_encode and utf8_decode, but neither worked (decoding displays question marks ????, and encoding displays "à ¤®à ¤¿à ¤¨à ¤¿à ¤¸à ¥Âà ¤•à ¤°à ¥Âà ¤Ÿà ¤®à ¤¾ à ¤•à ¤°à ¥Âà ¤•à ¥‡ à ¤¨à ¤œà ¤°"). So how can I get the ASCII value, the character, or the Unicode value of each Unicode character from the MySQL database in PHP?
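The garbled output in the question is what UTF-8 bytes look like when they are reinterpreted as a single-byte encoding such as ISO-8859-1 or Windows-1252, which is also what utf8_encode() does to text that is already UTF-8. A short sketch reproducing this for one Devanagari letter, assuming the PHP file is saved as UTF-8 and the mbstring extension is available; mb_convert_encoding() stands in for utf8_encode(), which is deprecated as of PHP 8.2. The usual remedy is to keep every layer, including the MySQL connection charset, on UTF-8 rather than convert again.

```php
<?php
// Sketch: assumes this file is saved as UTF-8 and mbstring is enabled.
// "म" (U+092E) is stored as three UTF-8 bytes: E0 A4 AE.
$ma = 'म';
echo bin2hex($ma), PHP_EOL; // e0a4ae

// Treating those bytes as ISO-8859-1 (what utf8_encode() assumes) turns each
// byte into its own character: à (E0), ¤ (A4), ® (AE).
$double = mb_convert_encoding($ma, 'UTF-8', 'ISO-8859-1');
echo $double, PHP_EOL; // à¤®

// Undoing the extra conversion recovers the original letter.
$fixed = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($fixed === $ma); // bool(true)
```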
Change the collation to utf8_bin, and put <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> in the head of your pages. Hope it works.
OK, I got it. I used this PHP library and its utf8_chr_to_unicode_style function to convert each Unicode character to its code. Then I converted all the codes to the font code I needed (Preeti Nepali font code). That's all :).
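The library in that answer is not named, so its function is not reproduced here. As a sketch of the same idea using only mbstring (mb_str_split() needs PHP 7.4+, mb_ord() needs PHP 7.2+), this lists the Unicode code point of each character; the helper name code_points() is hypothetical. Mapping those code points onto a legacy font encoding such as Preeti would need a separate lookup table, which is not shown.

```php
<?php
// Sketch: list the Unicode code point of every character in a UTF-8 string.
// Hypothetical helper; requires mbstring (PHP 7.4+ for mb_str_split()).
function code_points(string $text): array
{
    $points = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // mb_ord() returns the code point of a single UTF-8 character.
        $points[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $points;
}

echo implode(' ', code_points('नजर')), PHP_EOL; // U+0928 U+091C U+0930
```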
Try the iconv function.
It converts strings between different encodings. Please try some of the examples at the link provided above. If you cannot figure it out, comment and I will try to research the subject further.
I have an issue related to the UTF-8 charset. I've been all around the web (well, not entirely) for quite a while now, and the best advice was, and still is, to set the header charset to "UTF-8".
However, I was developing my web application locally on my machine using XAMPP (and sometimes WAMP, so as to compare the two when debugging my code). Everything was working great =). But as soon as I uploaded it online, the result was not all that jazzy (the kind of errors you would get if you had set the headers to a different charset like "iso-8859-1").
Every header in my code has UTF-8 as the charset, but I still got the same "hieroglyphic thingies". Then you guys gave me the idea that the issue wasn't my code but the PHP configuration running it. It turned out my local machine was running PHP 5.5, while the cPanel host where I had uploaded my web application was running native PHP 5.3.
When I changed the PHP version in cPanel from the default Native PHP 5.3 to PHP 5.5, believe you me guys =), it worked like a charm, just as it did on my localhost.
NOTE: If you have the same problem I had, make sure your PHP version is 5.5. I'm posting this because I feel you guys. Cheers!
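One likely reason the PHP version mattered: PHP 5.4 changed the default encoding of htmlspecialchars() and htmlentities() from ISO-8859-1 to UTF-8, and PHP 5.6 changed the default value of default_charset to UTF-8. A sketch that states the charset explicitly instead of relying on version defaults, assuming the pages are produced by PHP; the helper name e() is hypothetical.

```php
<?php
// Sketch: declare UTF-8 explicitly instead of relying on version-specific defaults.
ini_set('default_charset', 'UTF-8');              // charset PHP adds to Content-Type
header('Content-Type: text/html; charset=UTF-8'); // explicit header (no effect in the CLI)

// Hypothetical escaping helper: always pass the charset to htmlspecialchars().
// Without ENT_SUBSTITUTE, invalid UTF-8 input yields an empty string.
function e(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

echo e('<p>Grüße & "quotes"</p>'), PHP_EOL;
// &lt;p&gt;Grüße &amp; &quot;quotes&quot;&lt;/p&gt;
```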

UTF-8 encode in PHP - Euro sign missing

I read an iCal file from an external server via cURL. However, I use UTF-8 in my output document, so the German umlauts Ä, Ö, Ü, etc. don't work (this shows up instead: �).
I correctly assumed the iCal file uses a different charset, and found that utf8_encode($value) partly solves my problem.
All the �s are gone and the proper ä, ö, ü characters are showing up. However, I discovered that a € sign, which was also displayed as �, doesn't show up at all now. The � is gone too.
How do I get my €-sign back? ;)
Thanks
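A common cause, assumed here rather than confirmed: the iCal feed is Windows-1252, not ISO-8859-1. The two agree on umlauts such as ä, ö, ü, which is why utf8_encode() (an ISO-8859-1 to UTF-8 converter) fixed those, but in Windows-1252 byte 0x80 is €, while in ISO-8859-1 it is an invisible control character. A sketch with hypothetical sample bytes, using mb_convert_encoding() from mbstring:

```php
<?php
// Sketch: assumes the feed is Windows-1252 (check the feed's Content-Type header).
// Hypothetical sample bytes: "Preis: 10 €, Größe: Ä" in Windows-1252.
$raw = "Preis: 10 \x80, Gr\xF6\xDFe: \xC4";

// What utf8_encode() does: 0x80 becomes U+0080, an invisible control character.
$latin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Converting from Windows-1252 maps 0x80 to U+20AC, the euro sign.
$cp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo $cp1252, PHP_EOL; // Preis: 10 €, Größe: Ä
```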

PHP urlencode for Chinese characters

I'm creating a PHP application that involves sending Chinese characters as URL parameters.
I have to send a query like:
http://xyz.com/?q=新
But the script at xyz.com won't automatically encode the Chinese character, so I need to explicitly send an encoded string as the parameter. It becomes:
http://xyz.com/?q=%E6%96%B0
The problem is, PHP won't encode the Chinese character properly.
I've tried urlencode() and rawurlencode(). But they give %D0%C2 (doesn't work for my purpose) instead of %E6%96%B0 (works well with xyz.com) as the output.
I'm using this website to create the latter encoded string.
I've also defined header('Content-Type: text/html; charset=gb2312'); to display chinese characters properly.
Is there anything I can do to urlencode the Chinese character properly?
Thanks!
PS: I'm a relatively new programmer and don't understand Chinese.
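urlencode() percent-encodes the bytes of the string as they are, so the result depends on the encoding the string is stored in, which here is gb2312, the same charset you specify in your header. %D0%C2 is 新 in gb2312; %E6%96%B0 is 新 in UTF-8. Switch your charset (the file encoding as well as the header) over to UTF-8 and you should fix this issue and still be able to display Simplified Chinese Han.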
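A sketch of both cases, assuming the string starts out as GB2312 bytes and the mbstring extension is available (mbstring names the GB2312 byte encoding EUC-CN). Since urlencode() only percent-encodes the bytes it receives, converting to UTF-8 first yields the form xyz.com expects; http_build_query() then assembles the query string.

```php
<?php
// Sketch: assumes the input arrives as GB2312 (EUC-CN) bytes; requires mbstring.
$gb = "\xD0\xC2";                      // "新" as GB2312 bytes
echo urlencode($gb), PHP_EOL;          // %D0%C2

// Convert to UTF-8 before encoding for the URL.
$utf8 = mb_convert_encoding($gb, 'UTF-8', 'EUC-CN');
echo urlencode($utf8), PHP_EOL;        // %E6%96%B0

// http_build_query() applies the same encoding to each parameter value.
$url = 'http://xyz.com/?' . http_build_query(['q' => $utf8]);
echo $url, PHP_EOL;                    // http://xyz.com/?q=%E6%96%B0
```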
In order to reproduce your problem I created a simple PHP file:
<?php
var_dump(urlencode('新'));
?>
First I saved the file as UTF-8 and got %E6%96%B0. Then I changed it to GB2312 and got %D0%C2.
The encoder at http://meyerweb.com/eric/tools/dencoder/ seems to use JavaScript, which encodes as UTF-8, and therefore also returns %E6%96%B0.
PS: When converting from GB2312 to UTF-8, some editors might break internationalized strings in your code, so please make sure to keep a copy of your file before converting!

Encoding issue with Apache, displaying diamond characters in browser

Please help me set up an Apache server on CentOS. It looks like an encoding issue, but I have not been able to resolve it yet.
Instead of rendering the HTML content, Chrome and Firefox display the HTML source; IE 9 works fine. A � character appears after each "<" symbol.
http://pdf.gen.in/index1.htm
The second problem is with PHP: http://pdf.gen.in/index.php displays its PHP source code, with similar diamond characters wherever it encounters a "<" character. The PHP issue seems related to the first one.
Those files are encoded as UTF-16LE. For the static HTML page, you might be able to get it to work by setting the charset correctly in the MIME type (it's currently text/html; charset=UTF-8). By default, however, PHP cannot parse UTF-16 source files: every ASCII character is followed by a NUL byte, so the "<?php" opening tag is never recognized and the file is sent as-is. Use UTF-8 instead; it is generally better supported because it overlaps with ASCII.
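If the files really are UTF-16LE, converting them once to UTF-8 should fix both the static page and the PHP script. A minimal sketch, assuming the mbstring extension; the helper name utf16le_to_utf8() and the file path in the usage comment are hypothetical, and it is wise to keep a backup before rewriting files.

```php
<?php
// Sketch: convert UTF-16LE bytes (optionally starting with the BOM FF FE) to UTF-8.
// Hypothetical helper; requires mbstring.
function utf16le_to_utf8(string $bytes): string
{
    if (substr($bytes, 0, 2) === "\xFF\xFE") {
        $bytes = substr($bytes, 2);   // drop the UTF-16LE byte-order mark
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// Usage (hypothetical path; back up the file first):
// file_put_contents('index.php', utf16le_to_utf8(file_get_contents('index.php')));
```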
You should use a decent text editor and always save PHP/HTML files with the encoding "UTF-8 without BOM".
Create a file named "test.php", paste the code below, save it with "UTF-8 without BOM" encoding, and it will work just fine.
<?php
phpinfo();
?>
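If an editor has already saved a file with a BOM, the three bytes EF BB BF before the opening tag are sent to the browser as output before any PHP code runs, which can also trigger "headers already sent" warnings. A small sketch that strips them, with a hypothetical helper strip_utf8_bom() and a hypothetical path in the usage comment:

```php
<?php
// Sketch: remove a leading UTF-8 byte-order mark, if present.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: returns the input without its first three bytes when
// they are the UTF-8 BOM, otherwise returns it unchanged.
function strip_utf8_bom(string $bytes): string
{
    return strncmp($bytes, UTF8_BOM, 3) === 0 ? substr($bytes, 3) : $bytes;
}

// Usage (hypothetical path): rewrite test.php without its BOM.
// $path = 'test.php';
// file_put_contents($path, strip_utf8_bom(file_get_contents($path)));
```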
