PHP - json_encode(string, JSON_UNESCAPED_UNICODE) not escaping czech chars - php

I'm selecting some data from a database and encoding it as JSON, but I have a problem with Czech characters like
á,í,ř,č,ž...
My file is UTF-8 encoded, my database is UTF-8 encoded, and I've set the header to UTF-8 as well. What else should I do?
My code:
header('Content-Type: text/html; charset=utf-8');
while($tmprow = mysqli_fetch_array($result)) {
$row['user'] = mb_convert_encoding($tmprow['user'], "UTF-8", "auto");
$row['package'] = mb_convert_encoding($tmprow['package'], "UTF-8", "auto");
$row['url'] = mb_convert_encoding($tmprow['url'], "UTF-8", "auto");
$row['rating'] = mb_convert_encoding($tmprow['rating'], "UTF-8", "auto");
array_push($response, $row);
}
$json = json_encode($response, JSON_UNESCAPED_UNICODE);
if(!$json) {
echo "error";
}
and part of the printed json: "package":"zv???tkanalouce"
EDIT: Without mb_convert_encoding() function the printed string is empty and "error" is printed.

With the code you've got in your example, the output is:
json_encode($response, JSON_UNESCAPED_UNICODE);
"package":"zv???tkanalouce"
You see the question marks because mb_convert_encoding introduced them. This happens when you use encoding detection ("auto" as the third parameter) and the detection cannot handle a character in the input; that character is then replaced with a question mark. An example line from your code:
$row['url'] = mb_convert_encoding($tmprow['url'], "UTF-8", "auto");
This also means that the data coming out of your database is not UTF-8 encoded because mb_convert_encoding($buffer, 'UTF-8', 'auto'); does not introduce question marks if $buffer is UTF-8 encoded.
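One way to confirm this is to check the raw strings before converting anything. The following is only a sketch, assuming PHP 7 or later with mbstring; the helper name findNonUtf8Fields and the sample row are made up for illustration, and the sample bytes assume the data arrives as ISO-8859-2.

```php
<?php
// Hypothetical helper: returns the keys of a row whose values are not valid UTF-8.
function findNonUtf8Fields(array $row): array
{
    $bad = [];
    foreach ($row as $key => $value) {
        if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $bad[] = $key;
        }
    }
    return $bad;
}

// Sample data (assumption): "package" holds ISO-8859-2 bytes, "user" is plain ASCII.
$row = ['user' => 'petr', 'package' => "zv\xED\xF8\xE1tka"];
print_r(findNonUtf8Fields($row));
```

If a column shows up in that list, the connection is not delivering UTF-8 for it.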
Therefore you need to find out which charset your database connection uses, because the database driver converts strings into the encoding of the connection.
The easiest fix is to tell the database link that you want UTF-8 strings and then just use them:
$mysqli = new mysqli("localhost", "my_user", "my_password", "test");
/* check connection */
if (mysqli_connect_errno()) {
printf("Connect failed: %s\n", mysqli_connect_error());
exit();
}
/* change character set to utf8 */
if (!$mysqli->set_charset("utf8")) {
printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
printf("Current character set: %s\n", $mysqli->character_set_name());
}
The previous code example just shows how to set the default client character set to UTF-8 with mysqli. It is taken from the manual; see also the existing material on this site about it, e.g. "PHP and MySQLi UTF8".
You can then greatly improve your code:
$response = $result->fetch_all(MYSQLI_ASSOC);
$json = json_encode($response, JSON_UNESCAPED_UNICODE);
if (FALSE === $json) {
throw new LogicException(
sprintf('Not json: %d - %s', json_last_error(), json_last_error_msg())
);
}
header('Content-Type: application/json');
echo $json;
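On PHP 7.3 or later the same check can also be written with JSON_THROW_ON_ERROR, so that json_encode() throws a JsonException instead of returning false. This is a minimal sketch, not part of the original answer; the wrapper name encodeRows and the sample rows are assumptions for illustration.

```php
<?php
// Hypothetical wrapper: encode rows as JSON and fail loudly on invalid UTF-8.
function encodeRows(array $rows): string
{
    return json_encode($rows, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

// Valid UTF-8 input is encoded with the Czech characters left unescaped.
$good = [['package' => "zv\u{00ED}\u{0159}\u{00E1}tka"]];
echo encodeRows($good), "\n";

try {
    // ISO-8859-2 bytes are not valid UTF-8, so this call throws.
    encodeRows([['package' => "zv\xED\xF8\xE1tka"]]);
} catch (JsonException $e) {
    echo 'Not JSON: ', $e->getMessage(), "\n";
}
```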

Related

Issues with showing french accents from my Database

I am connecting to a FileMaker DB through ODBC, and some data contains accents such as é or è. These characters currently appear as "?", which is a bit of a problem. Here is what my code looks like:
$connection = odbc_connect($dsn, $username, $password, SQL_CUR_USE_ODBC);
$sql = "SELECT * FROM Table1";
$res = odbc_exec($connection,$sql);
while ($row = odbc_fetch_array($res)){
$x++;
$values= ($x . ": Customer:". $row['Customer'] . "\n");
print($values);
}
odbc_free_result($res);
odbc_close($connection);
I tried a few things, such as adding 'charset=utf-8' in the header, but nothing seems to work so far. I'm pretty sure I need to include utf-8 somewhere, I just haven't found examples with odbc similar to my code online. Thanks!
You will need to connect using the correct encoding. You can determine the correct encoding with the following query:
SELECT hex(Customer) FROM Table1;
Match the hex code of the offending character with the target encodings, most likely latin1 and UTF-8. If you cannot identify the hex codes, then paste the output here and I will identify it for you.
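To illustrate what to look for in the hex output, here is a small sketch (assuming PHP 7+ with mbstring) that prints the bytes of é in the two most likely encodings. It does not touch the database; it only shows the byte patterns to compare against.

```php
<?php
// The same character, é (U+00E9), in the two most likely encodings.
$utf8   = "\u{00E9}";                                        // two bytes in UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // one byte in latin1

echo 'UTF-8:  ', strtoupper(bin2hex($utf8)), "\n";   // UTF-8:  C3A9
echo 'latin1: ', strtoupper(bin2hex($latin1)), "\n"; // latin1: E9
```

So a lone E9 in the query output points to latin1 (or Windows-1252), while C3A9 points to UTF-8.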
The ODBC driver is most likely returning the data as Windows-1252 (WIN1252).
Try this:
mb_convert_encoding($value,'UTF-8','Windows-1252');
I've used it for the opposite conversion, so it should work in this direction too. Let me know.
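If some values might already be UTF-8, converting them a second time would garble them. A hedged sketch of a guard follows; the helper name toUtf8 is hypothetical, and it assumes that anything which is not valid UTF-8 is Windows-1252.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not valid UTF-8 already.
// Assumption: any input that fails the UTF-8 check is Windows-1252.
function toUtf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already UTF-8 (or plain ASCII), leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
}

$fromDriver = "Andr\xE9";                // é as a single Windows-1252 byte
echo toUtf8($fromDriver), "\n";          // converted once
echo toUtf8(toUtf8($fromDriver)), "\n";  // second call leaves the string unchanged
```

In the loop from the question this would be applied to $row['Customer'] before printing.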
So try this:
Use the function mb_detect_encoding() to find out which encoding you are getting. If the function doesn't exist, try this code:
if ( !function_exists('mb_detect_encoding') ) {
function mb_detect_encoding ($string, $enc=null, $ret=null) {
static $enclist = array(
'UTF-8', 'ASCII',
'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4', 'ISO-8859-5',
'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9', 'ISO-8859-10',
'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'ISO-8859-16',
'Windows-1251', 'Windows-1252', 'Windows-1254',
);
$result = false;
foreach ($enclist as $item) {
$sample = iconv($item, $item, $string);
if (md5($sample) == md5($string)) {
if ($ret === NULL) { $result = $item; } else { $result = true; }
break;
}
}
return $result;
}
} // closes the function_exists() check
Source:
PHP

Avoid re-conversion of a UTF-8 String PHP

Well, I have a DB with a lot of ISO strings and another with UTF-8 (yes, I ruined everything), and now I'm writing a custom function that rewrites the whole DB so that everything is in UTF-8. The problem is the conversion of strings that are already UTF-8: the ? appears:
$field = $fila['Field'];
$acon = mysql_fetch_array(mysql_query("SELECT `$field` as content FROM `$curfila` WHERE id='$i'"));
$content = $acon['content'];
if(!is_numeric($content)) {
if($content != null) {
if(ip2long($content) === false) {
mb_internal_encoding('UTF-8');
if(mb_detect_encoding($content) === "UTF-8") {
$sanitized = utf8_decode($content);
if($sanitized != $content) {
echo 'Fila [ID ('.$i.')] <b>'.$field.'</b> => '.$sanitized.'<br>';
//mysql_query("UPDATE `$curfila` SET `$field`='$sanitized' WHERE id='$i'");
}
}
}
}
}
P.S.: I check all the columns and rows of all the tables in the DB. (I print everything before changing anything.)
So, how can I detect that?
I tried mb_detect_encoding, but it reports all the strings as UTF-8... So, which function can I use now?
Thanks in advance.

Big5 conversion to UTF-8 in PHP

How do I convert this: 灣
to \u7063 in PHP?
The reason I'm asking is that this Chinese character is somehow stored as \u7063 in MySQL (UTF-8 encoding), so it cannot be found in the DB when users search with the query '灣'.
Additional Information
My DB encoding is UTF-8, with Collation utf8_general_ci. PHP file was saved in UTF-8. I have tried the method suggested by Nambi, but it did not work, it returned ?? in console. See attached image.
Try this code:
function big52utf8($big5str) {
$blen = strlen($big5str);
$utf8str = "";
for($i=0; $i<$blen; $i++) {
$sbit = ord(substr($big5str, $i, 1));
//echo $sbit;
//echo "<br>";
if ($sbit < 129) {
$utf8str.=substr($big5str,$i,1);
} elseif ($sbit > 128 && $sbit < 255) {
$new_word = iconv("BIG5", "UTF-8", substr($big5str,$i,2));
$utf8str.=($new_word=="")?"?":$new_word;
$i++;
}
}
return $utf8str;
}
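For the \u7063 form asked about in the question, one possible approach (a sketch, not taken from the answer above) is to let json_encode() produce the escapes, since it escapes non-ASCII characters unless JSON_UNESCAPED_UNICODE is set. The helper name toUnicodeEscapes is hypothetical, and the example assumes PHP 7+ with mbstring; it also converts the whole Big5 string in one call instead of byte by byte.

```php
<?php
// Hypothetical helper: return the JSON-style \uXXXX form of a UTF-8 string.
function toUnicodeEscapes(string $utf8): string
{
    // json_encode() wraps the result in double quotes; strip them afterwards.
    return substr(json_encode($utf8), 1, -1);
}

$big5 = mb_convert_encoding("\u{7063}", 'BIG-5', 'UTF-8'); // simulate Big5 input
$utf8 = mb_convert_encoding($big5, 'UTF-8', 'BIG-5');      // whole-string conversion
echo toUnicodeEscapes($utf8), "\n";
```

The escaped string can then be used as the search term against the column that stores the \u form.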

swedish characters encoding failure when inserting data to mysql database from a webservice

I tried to insert Swedish characters from a web service.
My code looks like this:
Header("Content-Type: text/html; charset=UTF-8");
$xml = new XMLReader();
$xml->open("http://ws.aldoc.eu/ws/mekafrance/menu.alx");
$j= 4;
$id = 6;
$idp = 6;
while($xml->read()){
if ($xml->nodeType == XMLREADER::ELEMENT && $xml->localName == "Menuitem")
{
$product = $xml->expand();
$product = new SimpleXMLElement('<Menuitem>'.$xml->readInnerXML().'</Menuitem>');
$menucode = $product->menucode;
$stmt = $dbh->prepare("insert into `ps_category_lang`(`name`) values ( :name)");
$stmt->bindParam(':name', $name);
$name = mb_convert_encoding((string)$product->menu,"utf-8");
$stmt->execute();
}
}
When I look at the name field, it shows something like "AC/Klimatanlägg".
The field collation is utf8_general_ci, and so is the database's.
The file is UTF-8 encoded, and I set the header to UTF-8 as well.
I think your problem will go away if you open menu.alx with notepad++ and set the encoding on the entire xml file to UTF-8.

Values in UTF-8 being encoded as NULL in JSON

I have a set of keywords that are passed through via JSON from a DB (encoded UTF-8), some of which may have special characters like é, è, ç, etc. This is used as part of an auto-completer. Example:
array('Coffee', 'Cappuccino', 'Café');
I should add that the array as it comes from the DB would be:
array('Coffee', 'Cappuccino', 'Café');
But JSON encodes as:
["coffee", "cappuccino", null];
If I print these via print_r(), they show up fine on a UTF-8 encoded webpage, but café comes through as "café" if text/plain is used if I want to look at the array using print_r($array);exit();.
If I encode using utf8_encode() before encoding to JSON, it comes through fine, but what gets printed on the webpage is "café" and not "café".
Also strange, but json_last_error() is being seen as an undefined function, but json_decode() and json_encode() work fine.
Any ideas on how to get UTF-8 encoded data from the database to behave the same throughout the entire process?
EDIT: Here is the PHP function that grabs the keywords and makes them into a single array:
private function get_keywords()
{
global $db, $json;
$output = array();
$db->query("SELECT keywords FROM listings");
while ($r = $db->get_array())
{
$split = explode(",", $r['keywords']);
foreach ($split as $s)
{
$s = trim($s);
if ($s != "" && !in_array($s, $output)) $output[] = strtolower($s);
}
}
$json->echo_json($output);
}
The json::echo_json method just encodes, sets the header and prints it (for usage with Prototype)
EDIT: DB Connection method:
function connect()
{
if ($this->set['sql_connect'])
{
$this->connection = @mysql_connect( $this->set['sql_host'], $this->set['sql_user'], $this->set['sql_pass'])
OR $this->debug( "Connection Error", mysql_errno() .": ". mysql_error());
$this->db = @mysql_select_db( $this->set['sql_name'], $this->connection)
OR $this->debug( "Database Error", "Cannot Select Database '". $this->set['sql_name'] ."'");
$this->is_connected = TRUE;
}
return TRUE;
}
More Updates:
Simple PHP script I ran:
echo json_encode( array("Café") ); // ["Caf\u00e9"]
echo json_encode( array("Café") ); // null
The reason could be the current client character set. A simple solution could be to set it with
mysql_query('SET CHARACTER SET utf8')
before running the SELECT query.
Update (June 2014)
The mysql extension is deprecated as of PHP 5.5.0. It is now recommended to use mysqli. Also, upon further reading - the above way of setting the client set should be avoided for reasons including security.
I haven't tested it, but this should be an ok substitute:
$mysqli = new mysqli("localhost", "my_user", "my_password", "my_db");
if (!$mysqli->set_charset('utf8')) {
printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
printf("Current character set: %s\n", $mysqli->character_set_name());
}
or, in procedural style, passing the connection as a parameter:
$conn = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (!mysqli_set_charset($conn, "utf8")) {
# TODO - Error: Unable to set the character set
exit;
}
json_encode seems to be dropping strings that contain invalid characters. It is likely that your UTF-8 data is not arriving in the proper form from your database.
Looking at the examples you give, my wild guess would be that your database connection is not UTF-8 encoded and serves ISO-8859-1 characters instead.
Can you try a SET NAMES utf8; after initializing the connection?
I tried your code sample like this
[~]> cat utf.php
<?php
$arr = array('Coffee', 'Cappuccino', 'Café');
print json_encode($arr);
[~]> php utf.php
["Coffee","Cappuccino","Caf\u00e9"]
[~]>
Based on that I would say that if the source data is really UTF-8, then json_encode works just fine. If it's not, then that's where you get null. Why it's not, I cannot tell based on this information.
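The difference can be reproduced without a database. This is a small sketch assuming PHP 7 or later, where the failed call returns false for the whole array (the older version in the question produced null instead):

```php
<?php
$utf8   = array("Caf\u{00E9}"); // é as UTF-8 (bytes C3 A9)
$latin1 = array("Caf\xE9");     // é as ISO-8859-1 (byte E9), invalid as UTF-8

var_dump(json_encode($utf8));   // string '["Caf\u00e9"]'
var_dump(json_encode($latin1)); // bool(false)
echo json_last_error_msg(), "\n";
```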
Try sending your array through this function before doing json_encode():
<?php
function utf8json($inArray) {
static $depth = 0;
/* our return object */
$newArray = array();
/* safety recursion limit */
$depth++;
if ($depth >= 30) {
$depth--;
return false;
}
/* step through inArray */
foreach ($inArray as $key => $val) {
if (is_array($val)) {
/* recurse on the nested array, not on the whole input */
$newArray[$key] = utf8json($val);
} else {
/* encode string values; utf8_encode() assumes ISO-8859-1 input */
$newArray[$key] = utf8_encode($val);
}
}
/* leaving this level, so undo the depth increment */
$depth--;
/* return utf8 encoded array */
return $newArray;
}
?>
Taken from a comment on php.net at http://php.net/manual/en/function.json-encode.php.
The function basically loops though array elements, perhaps you did your utf-8 encode on the array itself?
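Since utf8_encode() is deprecated as of PHP 8.2, the same idea can be sketched with array_walk_recursive() and mb_convert_encoding(). This is not the code from the php.net comment; the helper name utf8Array is hypothetical, and it assumes the input strings are ISO-8859-1, which is the same assumption utf8_encode() makes.

```php
<?php
// Hypothetical helper: convert every string in a nested array to UTF-8.
// Assumption: all input strings are ISO-8859-1 encoded.
function utf8Array(array $data): array
{
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
        }
    });
    return $data;
}

$in = array('name' => "Caf\xE9", 'tags' => array("cr\xE8me", 42));
echo json_encode(utf8Array($in)), "\n";
```

Note that running this on data that is already UTF-8 would double-encode it, so it only helps when the connection really delivers ISO-8859-1.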
My solution for encoding UTF-8 data was:
$jsonArray = addslashes(json_encode($array, JSON_FORCE_OBJECT|JSON_UNESCAPED_UNICODE));
