How to use Tamil characters in JSON with PHP
<?php
/* Author : Girija S
Date : 4/21/2011
Description: To Check the Special Chars when we pass in the json server
*/
$text = "தமிழ் அகராதி With the exception <br>of HTML 2.0's \", &, <, and >, these entities are 'all' <br>new<br/> in HTML 4.0 and may not be supported by old browsers. Support in recent browsers is good.The following table gives the character entity <p>reference, decimal character reference, and hexadecimal character reference for markup-significant</p> and internationalization characters\n, as well as the rendering of each in your browser. Glyphs of the characters are available at the Unicode Consortium.<p>This is some text in a paragraph.</p>";
$text = json_encode(utf8_encode($text));
echo $text;
$text = json_decode($text);
echo $text;
?>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Use this in the HTML head; it will solve the problem.
If you want to store the text in a database, you should use
mysql_query("set character_set_results='utf8'");
before your query. (character_set_results controls the charset of the results MySQL sends back; SET NAMES 'utf8' sets it for the data you send as well.)
I did this and it worked for my financial Tamil application.
<?php
// Working script: send the MIME type and character set in one Content-Type header
header('Content-Type: application/json; charset=utf-8');
$text = "தமிழ் அகராதி With the exception <br>of HTML 2.0's \", &, <, and >, these entities are 'all' <br>new<br/> in HTML 4.0 and may not be supported by old browsers. Support in recent browsers is good.The following table gives the character entity <p>reference, decimal character reference, and hexadecimal character reference for markup-significant</p> and internationalization characters\n, as well as the rendering of each in your browser. Glyphs of the characters are available at the Unicode Consortium.<p>This is some text in a paragraph.</p>";
// Encoding flags belong to json_encode(): JSON_UNESCAPED_UNICODE keeps the Tamil letters
// as UTF-8 instead of \uXXXX escapes, and the JSON_HEX_* flags escape <, >, ', " and &
echo $text = json_encode($text, JSON_PRETTY_PRINT | JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP | JSON_UNESCAPED_UNICODE);
echo "\n\n******************************************************************************\n\n";
// json_decode() takes no encoding flags (its second argument selects associative arrays)
echo $text = json_decode($text);
?>
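A minimal sketch of why the first script garbles the text and what the flags actually change, assuming the PHP file is saved as UTF-8 and the mbstring extension is available. Running text that is already UTF-8 through an ISO-8859-1 to UTF-8 conversion (which is what utf8_encode() does; that function is deprecated as of PHP 8.2) encodes it a second time, while JSON_UNESCAPED_UNICODE only changes how json_encode() writes the characters, not their value.

```php
<?php
$tamil = "தமிழ் அகராதி"; // already UTF-8 when the file is saved as UTF-8

// Default json_encode() escapes every non-ASCII character as \uXXXX
$escaped = json_encode($tamil);
// JSON_UNESCAPED_UNICODE writes the UTF-8 characters as they are
$raw = json_encode($tamil, JSON_UNESCAPED_UNICODE);

// Both forms decode back to the identical string
var_dump(json_decode($escaped) === $tamil);
var_dump(json_decode($raw) === $tamil);

// Treating UTF-8 bytes as ISO-8859-1 (the utf8_encode() behaviour) double-encodes them
$doubleEncoded = mb_convert_encoding($tamil, 'UTF-8', 'ISO-8859-1');
var_dump($doubleEncoded === $tamil);

echo $escaped, "\n", $raw, "\n";
```

The escaped form is plain ASCII and therefore survives any transport, while the unescaped form is easier to read; both are valid JSON for the same string.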
Related
I'm trying to convert a string from this: “Ã©” to this: “é”. It's a Latin-1 character, but I can't get it right. So far I've tried two functions, but neither of them gives me the right output.
$translation = 'Copà © rnico was Italian';
$translation = mb_convert_encoding($translation, 'utf-8', 'iso-8859-1'); //opt 1
$translation = iconv('utf-8', 'latin1', $translation); //opt 2
I'm getting this data from an API, so I don't know what's going on in the database.
This is the string in Spanish: Copérnico es italiano.
This is the data from the API: Copà © rnico is Italian
This is the result with $translation = bin2hex($translation);
436f70c38320c2a920726e69636f206973204974616c69616e
What's the right way to go? Greetings.
I had the same problem before, and this option
$translation = iconv('utf-8', 'latin1', $translation); //opt 2
worked very well.
Your problem is that `Copà © rnico was Italian` is not the same as `Copérnico was Italian`.
Because of the spaces, iconv sees two separate characters when you convert it: "à © " (two characters plus two spaces) is not the same as "é" (one valid UTF-8 character), so the bytes cannot be put back together into é.
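To illustrate the point about the spaces, here is a minimal sketch built from the bytes in the question's bin2hex() output (assumes the mbstring extension). Without the spaces, "Ã©" is simply é encoded twice, and one UTF-8 to ISO-8859-1 conversion removes the extra layer; with the spaces in between, the same conversion leaves a lone lead byte and the result is not valid UTF-8.

```php
<?php
// é is C3 A9 in UTF-8; read as ISO-8859-1 those bytes are "Ã" and "©",
// and re-encoding them to UTF-8 gives C3 83 C2 A9 (the "Ã©" mojibake)
$doubleEncoded = "Cop\xC3\x83\xC2\xA9rnico es italiano";
// Undo one layer: map each character back to its single ISO-8859-1 byte
$fixed = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');
var_dump(mb_check_encoding($fixed, 'UTF-8')); // bool(true): "Copérnico es italiano"

// The API string has spaces around the two characters (see the bin2hex output)
$fromApi = hex2bin('436f70c38320c2a920726e69636f206973204974616c69616e');
$broken = mb_convert_encoding($fromApi, 'ISO-8859-1', 'UTF-8');
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false): byte C3 is now followed by a space
```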
I am trying to convert Unicode escape sequences to text in PHP, but the string is a mixture of Unicode escapes and plain text, and it is not working.
I followed this question: Unicode character in PHP string
<?php
$unicodeChar = "{'singer': u'', 'name': u'\\\\u101c\\\\u1031\\\\u1011\\\\u1032\\\\u101c\\\\u103d\\\\u103e\\\\u1004\\\\u1037\\\\u103a\\\\u101c\\\\u102d\\\\u102f\\\\u1000\\\\u103a'}\\r\\n\\r\\n artist : Thar Gyi\\r\\n album : Sal Pone Ta Pone\\r\\n genre : R&B\\r\\n copyright : MyanmarSongs.NET\\r\\n track : 1\\r\\n title : Lay Htal Hlwint Lite";
echo json_decode('"'.$unicodeChar.'"');
echo mb_convert_encoding($unicodeChar, 'UTF-8', 'HTML-ENTITIES');
echo mb_convert_encoding($unicodeChar, 'UTF-8', 'UTF-16BE'); // shows nothing
?>
None of the above approaches works when the value is a mixture of Unicode escapes and text, as in my example. But it does work when the value is as simple as this:
$unicodeChar = '\u1000';
echo json_decode('"'.$unicodeChar.'"');
How can I achieve this?
Use the following code:
$unicodeChar = '\u1000';
echo json_decode('"'.$unicodeChar.'"');
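That only handles a string that consists of nothing but escapes. For the mixed case in the question, one option is to decode just the \uXXXX escapes with preg_replace_callback() and leave the rest of the text alone. A minimal sketch, assuming PHP 7.2+ with mbstring (for mb_chr()); decodeUnicodeEscapes() is a hypothetical name, and it does not touch the literal \r\n sequences in the question's data.

```php
<?php
// Hypothetical helper: replace every \uXXXX escape (preceded by one or more
// backslashes, as in the question's data) with the UTF-8 character it names.
// Everything else in the string is left untouched.
function decodeUnicodeEscapes(string $s): string
{
    return preg_replace_callback('/\\\\+u([0-9a-fA-F]{4})/', function (array $m): string {
        $char = mb_chr(hexdec($m[1]), 'UTF-8');
        // mb_chr() returns false for code points it cannot encode (e.g. lone surrogates)
        return $char === false ? $m[0] : $char;
    }, $s) ?? $s;
}

$mixed = "name: \\\\u101c\\\\u1031, artist : Thar Gyi";
echo decodeUnicodeEscapes($mixed), "\n";
```

Because the callback only fires on the escape pattern, ordinary words such as "artist : Thar Gyi" pass through unchanged.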
Basically I have this string:
Český, Deutsch, English (US), Español (ES), Français (France), Italiano, 日本語, 한국어, Polski, 中文(繁體)
And I want to convert it into all possible HTML entities (there might be Russian characters too!).
I've tried calling htmlspecialchars() and htmlentities() with different charsets, but they return empty strings...
$l = htmlentities("Český, Deutsch, English (US), Español (ES), Français (France), Italiano, 日本語, 한국어, Polski, 中文(繁體) €", ENT_COMPAT, "BIG5-HKSCS");
$l = htmlentities($l, ENT_COMPAT, "KOI8-R");
$l = htmlentities($l, ENT_COMPAT, "EUC-JP");
$l = htmlentities($l, ENT_COMPAT, "Shift_JIS");
$l = htmlentities($l, ENT_COMPAT, "Shift_JIS");
echo $l;
returns an empty string.
Any help?
Here's my "unutf8" function, which converts all UTF-8 characters into HTML entities of the form &#12345;
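function unutf8($str) {
    // The outer parentheses are the regex delimiters; each alternative matches one
    // complete multi-byte UTF-8 sequence (the 5- and 6-byte forms are obsolete)
    return preg_replace_callback("([\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3}|[\xF8-\xFB][\x80-\xBF]{4}|[\xFC-\xFD][\x80-\xBF]{5})",
        function($m) {
            $c = $m[0];
            // Lead byte: strip the leading 1 bits of its binary form, keeping the payload bits
            $out = bindec(ltrim(decbin(ord($c[0])),"1"));
            $l = strlen($c);
            for( $i=1; $i<$l; $i++) {
                // Each continuation byte (10xxxxxx) shifts in 6 more payload bits
                $out = ($out<<6) | bindec(ltrim(decbin(ord($c[$i])),"1"));
            }
            // Code points below 256 come back as a single ISO-8859-1 byte, not an entity
            if( $out < 256) return chr($out);
            return "&#".$out.";";
        },$str);
}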
It parses the string for valid UTF8 character sequences and converts the multi-byte sequence into the ordinal value of the character. It's very messy and I don't expect to win any awards for good coding with this, but it works.
Please note, however, that if you have unencoded (non-UTF-8) characters, you WILL run into problems. For example, if for some reason you have é©© in ISO-8859-1 (bytes E9 A9 A9), those bytes happen to form a valid three-byte UTF-8 sequence, and the result will be 驩 (&#39529;). Please make sure your string is valid UTF-8 before passing it to the function.
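The arithmetic described above can also be written with bit masks instead of the decbin()/ltrim() string trick. A minimal sketch, with a hypothetical function name, that handles only the 1- to 4-byte sequences of modern UTF-8 and assumes its input is one complete, valid sequence:

```php
<?php
// Decode one complete UTF-8 sequence (1 to 4 bytes) into its code point.
function utf8SequenceToCodePoint(string $seq): int
{
    $len = strlen($seq);
    // Payload bits kept from the lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx
    $leadMasks = [1 => 0x7F, 2 => 0x1F, 3 => 0x0F, 4 => 0x07];
    $cp = ord($seq[0]) & $leadMasks[$len];
    for ($i = 1; $i < $len; $i++) {
        // Each continuation byte 10xxxxxx contributes its low 6 bits
        $cp = ($cp << 6) | (ord($seq[$i]) & 0x3F);
    }
    return $cp;
}

// "€" is E2 82 AC in UTF-8: 0010 | 000010 | 101100 = 0x20AC = 8364
echo '&#' . utf8SequenceToCodePoint("\xE2\x82\xAC") . ";\n"; // &#8364;
```

Feeding it the bytes E9 A9 A9 from the note above yields 39529, which is why misplaced Latin-1 text turns into an unrelated CJK character.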
Use header() to set the charset in the HTTP header to UTF-8:
header('Content-Type: text/html; charset=utf-8');
Also, make sure your HTML document declares UTF-8 as well:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Don't go for complicated solutions; just follow these small and simple steps:
1) Call mysql_set_charset("utf8", $conn); in your connection configuration code.
or
2) Run mysql_query("SET NAMES 'UTF8'");
and then run your queries as usual.
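The mysql_* functions used above were removed in PHP 7.0. A minimal sketch of the same idea with mysqli, where connectUtf8() is a hypothetical helper and the connection arguments are placeholders. The PHP manual recommends mysqli::set_charset() over running SET NAMES through a query, because it also tells the client library which charset is in use (for example for real_escape_string()).

```php
<?php
// Hypothetical helper: open a mysqli connection that talks UTF-8 (utf8mb4).
function connectUtf8(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw exceptions on errors (the default since PHP 8.1)
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    $conn = new mysqli($host, $user, $pass, $db);
    // utf8mb4 covers all of Unicode; MySQL's older "utf8" stops at 3-byte characters
    $conn->set_charset('utf8mb4');
    return $conn;
}

// Usage (placeholder credentials):
// $conn = connectUtf8('localhost', 'app', 'secret', 'shop');
```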
Test string:
$s = "convert this: ";
$s .= "–, —, †, ‡, •, ≤, ≥, μ, ₪, ©, ® y ™, ⅓, ⅔, ⅛, ⅜, ⅝, ⅞, ™, Ω, ℮, ∑, ⌂, ♀, ♂ ";
$s .= "but, not convert ordinary characters to entities";
$encoded = mb_convert_encoding($s, 'HTML-ENTITIES', 'UTF-8');
Assuming your input string is UTF-8, this should encode almost everything into numeric entities.
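The HTML-ENTITIES pseudo-encoding used by mb_convert_encoding() is deprecated as of PHP 8.2. A sketch of an alternative, assuming UTF-8 input: mb_encode_numericentity() with a conversion map covering every code point from 0x80 upward, so ASCII (including <, > and &) stays as it is and everything else becomes a decimal entity.

```php
<?php
// Conversion map: [start, end, offset, mask]; U+0080..U+10FFFF become &#N;
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$s = "convert this: –, ©, ™, 日本語, Český but, not convert ordinary characters";
echo mb_encode_numericentity($s, $convmap, 'UTF-8'), "\n";
```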
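Well, htmlentities() doesn't do what you want here: it only converts characters that have named HTML entities, and it returns an empty string when the input is not valid in the charset you pass it, which is most likely why passing UTF-8 text as BIG5-HKSCS, KOI8-R, and so on gives you nothing. Fortunately someone has posted code on the PHP website that seems to translate multibyte characters properly.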
I worked on converting ASCII into HTML-encoded text (&#xxxx;): https://github.com/hellonearthis/ascii2web
I have the following character encoding issue: somehow I have managed to save data with different character encodings into my database (UTF-8). The code and outputs below show two sample strings and how they are output. One of them needs to be converted to UTF-8; the other already is.
How do/should I go about checking whether I should encode the string or not?
I need each string to be output correctly, so how do I check whether it is already UTF-8 or whether it needs to be converted?
I am using PHP 5.2 and MySQL MyISAM tables:
CREATE TABLE IF NOT EXISTS `entities` (
....
`title` varchar(255) NOT NULL
....
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
<?php
$text = $entity['Entity']['title'];
echo 'Original : ', $text."<br />";
echo 'UTF8 Encode : ', utf8_encode($text)."<br />";
echo 'UTF8 Decode : ', utf8_decode($text)."<br />";
echo 'TRANSLIT : ', iconv("ISO-8859-1", "UTF-8//TRANSLIT", $text)."<br />";
echo 'IGNORE TRANSLIT : ', iconv("ISO-8859-1", "UTF-8//IGNORE//TRANSLIT", $text)."<br />";
echo 'IGNORE : ', iconv("ISO-8859-1", "UTF-8//IGNORE", $text)."<br />";
echo 'Plain : ', iconv("ISO-8859-1", "UTF-8", $text)."<br />";
?>
Output 1:
Original : France Télécom
UTF8 Encode : France Télécom
UTF8 Decode : France T�l�com
TRANSLIT : France Télécom
IGNORE TRANSLIT : France Télécom
IGNORE : France Télécom
Plain : France Télécom
Output 2:
Original : Cond� Nast Publications
UTF8 Encode : Condé Nast Publications
UTF8 Decode : Cond?ast Publications
TRANSLIT : Condé Nast Publications
IGNORE TRANSLIT : Condé Nast Publications
IGNORE : Condé Nast Publications
Plain : Condé Nast Publications
Thanks for your time on this one. Character encoding and I don't get on very well!
UPDATE:
echo strlen($string)."|".strlen(utf8_encode($string))."|";
echo (strlen($string)!==strlen(utf8_encode($string))) ? $string : utf8_encode($string);
echo "<br />";
echo strlen($string)."|".strlen(utf8_decode($string))."|";
echo (strlen($string)!==strlen(utf8_decode($string))) ? $string : utf8_decode($string);
echo "<br />";
23|24|Cond� Nast Publications
23|21|Cond� Nast Publications
16|20|France Télécom
16|14|France Télécom
This may be a job for the mb_detect_encoding() function.
In my limited experience with it, it's not 100% reliable when used as a generic "encoding sniffer" - it checks for the presence of certain characters and byte values to make an educated guess - but in this narrow case (it only needs to distinguish between UTF-8 and ISO-8859-1) it should work.
<?php
$text = $entity['Entity']['title'];
echo 'Original : ', $text."<br />";
$enc = mb_detect_encoding($text, "UTF-8,ISO-8859-1", true); // strict: report UTF-8 only for valid UTF-8
echo 'Detected encoding '.$enc."<br />";
echo 'Fixed result: '.iconv($enc, "UTF-8", $text)."<br />";
?>
You may get misleading results for strings that contain no special characters (plain ASCII is valid in both encodings), but that is not a problem, because converting such a string changes nothing.
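Since ISO-8859-1 accepts every possible byte sequence, the only real question in this narrow case is whether a string is valid UTF-8, and mb_check_encoding() answers that without guessing. A minimal sketch, assuming every stored string is either UTF-8 or ISO-8859-1 as in the question; toUtf8() is a hypothetical name, and a Latin-1 string that happens to be valid UTF-8 (for example the two characters "Ã©") would be left unconverted.

```php
<?php
// Hypothetical helper: return $text as UTF-8, assuming it is either valid
// UTF-8 already or ISO-8859-1.
function toUtf8(string $text): string
{
    // Valid UTF-8 (including plain ASCII) is returned untouched
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise treat it as ISO-8859-1, where every byte is one character
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

echo toUtf8("Cond\xE9 Nast Publications"), "\n";   // Latin-1 input
echo toUtf8("France T\xC3\xA9l\xC3\xA9com"), "\n"; // already UTF-8
```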
I made a function that addresses all these issues. It's called Encoding::toUTF8().
<?php
$text = $entity['Entity']['title'];
echo 'Original : ', $text."<br />";
echo 'Encoding::toUTF8 : ', Encoding::toUTF8($text)."<br />";
?>
Output:
Original : France Télécom
Encoding::toUTF8 : France Télécom
Original : Cond� Nast Publications
Encoding::toUTF8 : Condé Nast Publications
You don't need to know what the encoding of your strings is, as long as you know it is either Latin1 (ISO-8859-1), Windows-1252, or UTF-8. The string can also contain a mix of them.
Encoding::toUTF8() will convert everything to UTF8.
I did it because a service was giving me a feed of data all messed up, mixing UTF8 and Latin1 in the same string.
Usage:
$utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);
$latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);
Download:
http://dl.dropbox.com/u/186012/PHP/forceUTF8.zip
I've included another function, Encoding::fixUTF8(), which will fix every UTF-8 string that looks garbled.
Usage:
$utf8_string = Encoding::fixUTF8($garbled_utf8_string);
Examples:
echo Encoding::fixUTF8("Fédération Camerounaise de Football");
echo Encoding::fixUTF8("Fédération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÂédÃÂération Camerounaise de Football");
echo Encoding::fixUTF8("Fédération Camerounaise de Football");
will output:
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Another way, maybe faster and less unreliable:
echo (strlen($str)!==strlen(utf8_decode($str)))
? $str //is multibyte, leave as is
: utf8_encode($str); //encode
It compares the length of the original string and the utf8_decoded string.
A string that contains a multibyte character has a strlen() that differs from the strlen() of the same text in a single-byte encoding.
For example:
strlen('Télécom')
should return 7 in Latin1 and 9 in UTF8
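A quick check of those numbers, writing both encodings of 'Télécom' as byte escapes so the result does not depend on how the file is saved (assumes mbstring). Note that a length difference alone does not prove a string is valid UTF-8: in the question's UPDATE, utf8_decode() also shortened the Latin-1 string (23 to 21), so mb_check_encoding() is the stricter test.

```php
<?php
$utf8   = "T\xC3\xA9l\xC3\xA9com"; // 'Télécom' in UTF-8 (é = C3 A9)
$latin1 = "T\xE9l\xE9com";         // 'Télécom' in ISO-8859-1 (é = E9)

echo strlen($utf8), ' ', strlen($latin1), "\n"; // 9 7 (bytes)
// Character counts agree once the right encoding is named
echo mb_strlen($utf8, 'UTF-8'), ' ', mb_strlen($latin1, 'ISO-8859-1'), "\n"; // 7 7
// Only the first string is valid UTF-8
var_dump(mb_check_encoding($utf8, 'UTF-8'), mb_check_encoding($latin1, 'UTF-8'));
```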
I made these two little functions, which work well for UTF-8 and ISO-8859-1 detection/conversion:
function detect_encoding($string)
{
//http://w3.org/International/questions/qa-forms-utf-8.html
if (preg_match('%^(?: [\x09\x0A\x0D\x20-\x7E] | [\xC2-\xDF][\x80-\xBF] | \xE0[\xA0-\xBF][\x80-\xBF] | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} | \xED[\x80-\x9F][\x80-\xBF] | \xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2} )*$%xs', $string))
return 'UTF-8';
//If you need to distinguish between UTF-8 and ISO-8859-1 encoding, list UTF-8 first in your encoding_list.
//if you list ISO-8859-1 first, mb_detect_encoding() will always return ISO-8859-1.
return mb_detect_encoding($string, array('UTF-8', 'ASCII', 'ISO-8859-1', 'JIS', 'EUC-JP', 'SJIS'));
}
function convert_encoding($string, $to_encoding, $from_encoding = '')
{
if ($from_encoding == '')
$from_encoding = detect_encoding($string);
if ($from_encoding == $to_encoding)
return $string;
return mb_convert_encoding($string, $to_encoding, $from_encoding);
}
If your database contains strings in two different charsets, what I would do, instead of plaguing all your application code with charset detection/conversion, is write a "one-shot" script that reads all of your tables' records and updates their strings to the correct format (I would pick UTF-8 if I were you). This way your code will be cleaner and simpler to maintain.
Just loop over the records in every table of your database and convert the strings like this:
//if the 3rd param is not specified the "from encoding" is detected automatically
$newString = convert_encoding($oldString, 'UTF-8');
I didn't try your samples here, but from past experience there is a quick fix for this. Right after connecting to the database, execute the following query BEFORE running any other queries:
SET NAMES UTF8;
This is SQL Standard compliant, and works well with other databases, like Firebird and PostgreSQL.
But remember, you need to ensure UTF-8 declarations in other places too for your application to work correctly. Here is a quick checklist:
All files should be saved as UTF-8 (preferably without a BOM [Byte Order Mark]).
Your HTTP server should send a UTF-8 charset in the Content-Type header. Use Firebug or Live HTTP Headers to inspect it.
If your server compresses and/or chunks the response, you may see the header content as chunked or gzipped. This is not a problem if you save your files as UTF-8 and declare the encoding in the HTML head using the proper meta tag.
Across the whole application (sockets, file system, databases...), don't forget to flag UTF-8 every time you can. Doing this when opening a database connection and the like saves you from having to encode/decode/debug all the time. Fix it at the root.
What database do you use?
You need to know the charset of the original string before you convert it to UTF-8. If it's ISO-8859-1 (Latin-1), utf8_encode() is the easiest way; otherwise you need to use either the iconv or the mbstring library to convert it, and both of these need to know the charset of the input in order to convert properly.
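A small illustration of why the source charset matters, assuming the mbstring extension: byte 0x80 is the euro sign in Windows-1252 but an invisible control character in ISO-8859-1, so converting with the wrong source charset (utf8_encode() always assumes ISO-8859-1, and it is deprecated as of PHP 8.2) silently gives a different result.

```php
<?php
$byte = "\x80";
// Same byte, two source charsets, two different UTF-8 results
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252'); // € (E2 82 AC)
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');   // U+0080 (C2 80)
echo bin2hex($asCp1252), ' ', bin2hex($asLatin1), "\n"; // e282ac c280
```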
Do you tell your database about charset when you insert/select data?