I have a set of keywords that are passed through via JSON from a DB (encoded UTF-8), some of which may have special characters like é, è, ç, etc. This is used as part of an auto-completer. Example:
array('Coffee', 'Cappuccino', 'Café');
I should add that the array as it comes from the DB would be:
array('Coffee', 'Cappuccino', 'Café');
But JSON encodes as:
["coffee", "cappuccino", null];
If I print these via print_r(), they show up fine on a UTF-8 encoded webpage, but café comes through as "café" if text/plain is used if I want to look at the array using print_r($array);exit();.
If I encode using utf8_encode() before encoding to JSON, it comes through fine, but what gets printed on the webpage is "cafÃ©" and not "café".
Also, strangely, json_last_error() is reported as an undefined function, even though json_decode() and json_encode() work fine.
Any ideas on how to get UTF-8 encoded data from the database to behave the same throughout the entire process?
EDIT: Here is the PHP function that grabs the keywords and makes them into a single array:
private function get_keywords()
{
global $db, $json;
$output = array();
$db->query("SELECT keywords FROM listings");
while ($r = $db->get_array())
{
$split = explode(",", $r['keywords']);
foreach ($split as $s)
{
$s = trim($s);
if ($s != "" && !in_array($s, $output)) $output[] = strtolower($s);
}
}
$json->echo_json($output);
}
The json::echo_json method just encodes, sets the header and prints it (for usage with Prototype)
EDIT: DB Connection method:
function connect()
{
if ($this->set['sql_connect'])
{
$this->connection = @mysql_connect( $this->set['sql_host'], $this->set['sql_user'], $this->set['sql_pass'])
OR $this->debug( "Connection Error", mysql_errno() .": ". mysql_error());
$this->db = @mysql_select_db( $this->set['sql_name'], $this->connection)
OR $this->debug( "Database Error", "Cannot Select Database '". $this->set['sql_name'] ."'");
$this->is_connected = TRUE;
}
return TRUE;
}
More Updates:
Simple PHP script I ran:
echo json_encode( array("Café") ); // ["Caf\u00e9"]
echo json_encode( array("Café") ); // null
The reason could be the current client character set. A simple solution could be to set the client character set with
mysql_query('SET CHARACTER SET utf8')
before running the SELECT query.
Update (June 2014)
The mysql extension is deprecated as of PHP 5.5.0. It is now recommended to use mysqli. Also, upon further reading, the above way of setting the client character set should be avoided for reasons including security.
I haven't tested it, but this should be an ok substitute:
$mysqli = new mysqli("localhost", "my_user", "my_password", "my_db");
if (!$mysqli->set_charset('utf8')) {
printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
printf("Current character set: %s\n", $mysqli->character_set_name());
}
or, in procedural style, with the connection passed as a parameter:
$conn = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (!mysqli_set_charset($conn, "utf8")) {
# TODO - Error: Unable to set the character set
exit;
}
json_encode seems to be dropping strings that contain invalid characters. It is likely that your UTF-8 data is not arriving in the proper form from your database.
Looking at the examples you give, my wild guess would be that your database connection is not UTF-8 encoded and serves ISO-8859-1 characters instead.
Can you try a SET NAMES utf8; after initializing the connection?
I tried your code sample like this
[~]> cat utf.php
<?php
$arr = array('Coffee', 'Cappuccino', 'Café');
print json_encode($arr);
[~]> php utf.php
["Coffee","Cappuccino","Caf\u00e9"]
[~]>
Based on that I would say that if the source data is really UTF-8, then json_encode works just fine. If it's not, then that's where you get null. Why it's not, I cannot tell based on this information.
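To make that concrete, here is a minimal sketch that feeds json_encode() the same word in both encodings. The byte strings are written out explicitly so the result does not depend on how the script file itself is saved, and the variable names are only illustrative. It assumes a current PHP version with mbstring available; on versions before 5.5 the failing call produced null for the string instead of returning false.

```php
<?php
// "Café" as explicit bytes, independent of the file's own encoding.
$utf8   = "Caf\xC3\xA9"; // é encoded as UTF-8 (two bytes)
$latin1 = "Caf\xE9";     // é encoded as ISO-8859-1 (one byte)

// Valid UTF-8 input: é is escaped as \u00e9 and encoding succeeds.
$good = json_encode(array($utf8));

// A lone 0xE9 byte is not valid UTF-8, so encoding fails.
// Current PHP returns false here; old versions emitted null for the string.
$bad = json_encode(array($latin1));

// Converting from the real source encoding first makes the data encodable.
$fixed = json_encode(array(mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1')));

var_dump($good, $bad, $fixed);
```

If the second call fails like this with data straight from the database, the connection is most likely delivering something other than UTF-8.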
Try sending your array through this function before doing json_encode():
<?php
function utf8json($inArray) {
static $depth = 0;
/* our return object */
$newArray = array();
/* safety recursion limit */
$depth++;
if ($depth >= 30) {
$depth--;
return false;
}
/* step through inArray */
foreach ($inArray as $key => $val) {
if (is_array($val)) {
/* recurse on the nested array, not on the whole input */
$newArray[$key] = utf8json($val);
} else {
/* encode string values (assumes they are ISO-8859-1) */
$newArray[$key] = utf8_encode($val);
}
}
/* leaving this level, so later calls start from the right depth */
$depth--;
/* return utf8 encoded array */
return $newArray;
}
?>
Taken from a comment on php.net: http://php.net/manual/en/function.json-encode.php.
The function basically loops through the array elements; perhaps you did your UTF-8 encode on the array itself?
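One caveat: a blanket conversion like this assumes every string is ISO-8859-1. If some values are already UTF-8 they get encoded twice, which is what produces output like "cafÃ©". The sketch below shows the effect and a guarded alternative. The helper name to_utf8_once is hypothetical (it is not part of the php.net comment), and mb_convert_encoding() stands in for utf8_encode(), which does the same ISO-8859-1 to UTF-8 conversion.

```php
<?php
// Hypothetical helper: convert only when the bytes are not already valid
// UTF-8, assuming the only other possibility is ISO-8859-1.
function to_utf8_once($value) {
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$utf8   = "Caf\xC3\xA9"; // already UTF-8
$latin1 = "Caf\xE9";     // ISO-8859-1

// Blind conversion of data that is already UTF-8 double-encodes the é.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n";               // 436166c383c2a9

// The guarded version leaves UTF-8 alone and converts only the Latin-1 input.
echo bin2hex(to_utf8_once($utf8)), "\n";   // 436166c3a9
echo bin2hex(to_utf8_once($latin1)), "\n"; // 436166c3a9
```

The check is a heuristic: a short ISO-8859-1 string can happen to be valid UTF-8 as well, so fixing the encoding at the database connection is still the more reliable route.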
My solution to encode utf8 data was :
$jsonArray = addslashes(json_encode($array, JSON_FORCE_OBJECT|JSON_UNESCAPED_UNICODE));
Related
I am connecting to a FileMaker DB through ODBC, and some data contains accents such as é or è. These characters appear as "?" right now, which is a bit of a problem. Here is what my code looks like:
$connection = odbc_connect($dsn, $username, $password, SQL_CUR_USE_ODBC);
$sql = "SELECT * FROM Table1";
$res = odbc_exec($connection,$sql);
while ($row = odbc_fetch_array($res)){
$x++;
$values= ($x . ": Customer:". $row['Customer'] . "\n");
print($values);
}
odbc_free_result($res);
odbc_close($connection);
I tried a few things, such as adding 'charset=utf-8' in the header, but nothing seems to work so far. I'm pretty sure I need to include utf-8 somewhere, I just haven't found examples with odbc similar to my code online. Thanks!
You will need to connect using the correct encoding. You can determine the correct encoding with the following query:
SELECT hex(Customer) FROM Table1;
Match the hex code of the offending character with the target encodings, most likely latin1 and UTF-8. If you cannot identify the hex codes, then paste the output here and I will identify it for you.
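As a reference for that comparison, this small sketch prints the bytes of é in the two most likely encodings. It only assumes that mbstring is available; the variable names are illustrative.

```php
<?php
// é as a single ISO-8859-1 (latin1) byte.
$latin1 = "\xE9";

// The same character converted to UTF-8 becomes two bytes.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($latin1), "\n"; // e9
echo bin2hex($utf8), "\n";   // c3a9
```

So an E9 in the hex output where an é is expected points to latin1 (or Windows-1252), while C3A9 points to UTF-8.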
ODBC drivers often return data in the Windows-1252 (WIN1252) encoding.
Try this:
$value = mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
I've used it for the opposite conversion, so it should work in this direction too. Let me know.
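Here is a short sketch of why the source encoding named in that call matters. Windows-1252 and ISO-8859-1 agree on é, but they differ for bytes 0x80 to 0x9F, so the euro sign is a convenient test character. The sample value is made up for illustration.

```php
<?php
// Byte 0x80 is the euro sign in Windows-1252 but a control character
// in ISO-8859-1, so the two conversions give different results.
$raw = "5 \x80";

$fromCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
$fromLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

echo bin2hex($fromCp1252), "\n"; // 3520e282ac (ends with U+20AC, the euro sign)
echo bin2hex($fromLatin1), "\n"; // 3520c280   (ends with U+0080, a control character)
```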
Try this:
Use the function mb_detect_encoding(). If the function doesn't exist, try this code:
if ( !function_exists('mb_detect_encoding') ) {
function mb_detect_encoding ($string, $enc=null, $ret=null) {
static $enclist = array(
'UTF-8', 'ASCII',
'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4', 'ISO-8859-5',
'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9', 'ISO-8859-10',
'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'ISO-8859-16',
'Windows-1251', 'Windows-1252', 'Windows-1254',
);
$result = false;
foreach ($enclist as $item) {
$sample = iconv($item, $item, $string);
if (md5($sample) == md5($string)) {
if ($ret === NULL) { $result = $item; } else { $result = true; }
break;
}
}
return $result;
}
}
Source:
PHP
I'm selecting some data from a database and encoding it as JSON, but I've got a problem with Czech characters like
á,í,ř,č,ž...
My file is in utf-8 encoding, my database is also in utf-8 encoding, I've set header to utf-8 encoding as well. What else should I do please?
My code:
header('Content-Type: text/html; charset=utf-8');
while($tmprow = mysqli_fetch_array($result)) {
$row['user'] = mb_convert_encoding($tmprow['user'], "UTF-8", "auto");
$row['package'] = mb_convert_encoding($tmprow['package'], "UTF-8", "auto");
$row['url'] = mb_convert_encoding($tmprow['url'], "UTF-8", "auto");
$row['rating'] = mb_convert_encoding($tmprow['rating'], "UTF-8", "auto");
array_push($response, $row);
}
$json = json_encode($response, JSON_UNESCAPED_UNICODE);
if(!$json) {
echo "error";
}
and part of the printed json: "package":"zv???tkanalouce"
EDIT: Without mb_convert_encoding() function the printed string is empty and "error" is printed.
With the code you've got in your example, the output is:
json_encode($response, JSON_UNESCAPED_UNICODE);
"package":"zv???tkanalouce"
You see the question marks in there because they have been introduced by mb_convert_encoding. This happens when you use encoding detection ("auto" as third parameter) and that encoding detection is not able to handle a character in the input, replacing it with a question mark. Example line of code:
$row['url'] = mb_convert_encoding($tmprow['url'], "UTF-8", "auto");
This also means that the data coming out of your database is not UTF-8 encoded because mb_convert_encoding($buffer, 'UTF-8', 'auto'); does not introduce question marks if $buffer is UTF-8 encoded.
Therefore you need to find out which charset is used in your database connection because the database driver will convert strings into the encoding of the connection.
The easiest approach is to tell the database link that you want UTF-8 strings and then just use them:
$mysqli = new mysqli("localhost", "my_user", "my_password", "test");
/* check connection */
if (mysqli_connect_errno()) {
printf("Connect failed: %s\n", mysqli_connect_error());
exit();
}
/* change character set to utf8 */
if (!$mysqli->set_charset("utf8")) {
printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
printf("Current character set: %s\n", $mysqli->character_set_name());
}
The previous code example just shows how to set the default client character set to UTF-8 with mysqli. It has been taken from the manual; see also the material on this site about that, e.g. "PHP and MySQLi UTF8".
You can then greatly improve your code:
$response = $result->fetch_all(MYSQLI_ASSOC);
$json = json_encode($response, JSON_UNESCAPED_UNICODE);
if (FALSE === $json) {
throw new LogicException(
sprintf('Not json: %d - %s', json_last_error(), json_last_error_msg())
);
}
header('Content-Type: application/json');
echo $json;
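As a side note on JSON_UNESCAPED_UNICODE, the flag only changes how non-ASCII characters are written in the output. It does not make invalid input encodable. Here is a minimal sketch, using a made-up row with the UTF-8 bytes spelled out:

```php
<?php
// "zvířátka" as explicit UTF-8 bytes (í = C3 AD, ř = C5 99, á = C3 A1).
$row = array('package' => "zv\xC3\xAD\xC5\x99\xC3\xA1tka");

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($row);

// With the flag: the UTF-8 bytes are emitted as they are.
$raw = json_encode($row, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n"; // {"package":"zv\u00ed\u0159\u00e1tka"}
echo $raw, "\n";     // {"package":"zvířátka"}

// Both forms decode back to the same data.
var_dump(json_decode($escaped, true) === json_decode($raw, true)); // bool(true)
```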
Well, I have a DB with a lot of ISO strings and another with UTF-8 (yes, I ruined everything), and now I'm writing a custom function that rewrites the whole DB so that everything is in UTF-8. The problem is the conversion of strings that are already UTF-8: a "?" appears:
$field = $fila['Field'];
$acon = mysql_fetch_array(mysql_query("SELECT `$field` as content FROM `$curfila` WHERE id='$i'"));
$content = $acon['content'];
if(!is_numeric($content)) {
if($content != null) {
if(ip2long($content) === false) {
mb_internal_encoding('UTF-8');
if(mb_detect_encoding($content) === "UTF-8") {
$sanitized = utf8_decode($content);
if($sanitized != $content) {
echo 'Fila [ID ('.$i.')] <b>'.$field.'</b> => '.$sanitized.'<br>';
//mysql_query("UPDATE `$curfila` SET `$field`='$sanitized' WHERE id='$i'");
}
}
}
}
}
P.S.: I check all the columns and rows of all the tables of the DB. (I print everything before changing anything.)
So, how can I detect that?
I tried mb_detect_encoding, but it reports all the strings as UTF-8... So, which function can I use now?
Thanks in advance.
How do I convert this: 灣
to \u7063 in PHP?
The reason I'm asking is that somehow that Chinese character is stored as \u7063 in MySQL (UTF-8 encoding), so it cannot be found in the DB when users search with the query '灣'.
Additional Information
My DB encoding is UTF-8, with Collation utf8_general_ci. PHP file was saved in UTF-8. I have tried the method suggested by Nambi, but it did not work, it returned ?? in console. See attached image.
Try this code (reference here):
function big52utf8($big5str) {
$blen = strlen($big5str);
$utf8str = "";
for($i=0; $i<$blen; $i++) {
$sbit = ord(substr($big5str, $i, 1));
//echo $sbit;
//echo "<br>";
if ($sbit < 129) {
$utf8str.=substr($big5str,$i,1);
} elseif ($sbit > 128 && $sbit < 255) {
$new_word = iconv("BIG5", "UTF-8", substr($big5str,$i,2));
$utf8str.=($new_word=="")?"?":$new_word;
$i++;
}
}
return $utf8str;
}
How can I add a check in the PHP for the length of the $username passed? The site is UTF-8 but I believe JavaScript is using a different encoding. You can see in the comments where I tried different things in the PHP and they don't work.
What I tried and didn't work:
Changing Ajax (javascript) to pass variables by UTF-8 and not javascript encoding
strlen, mb_strlen in the PHP - both return incorrect values
MORE INFO
My Ajax sends a username to my PHP, which checks the SQL DB and returns available or not. I decided to try and do some extra checking in the PHP before checking the DB (like mb_strlen($username)). mb_internal_encoding("UTF-8"); is also set.
I was going to try and send the Ajax request in UTF-8 but didn't see a way to do that.
is UPPER being used correctly in the MySQL? - for UTF-8 stuff?
PHP BELOW ***********
// Only checks for the username being valid or not and returns 'taken' or 'available'
require_once('../defines/mainDefines.php'); // Connection variables
require_once('commonMethods.php');
require_once('sessionInit.php'); // start session, check for HTTP redirect to HTTPS
sleep(2); // Looks cool watching the spinner
$username = $_POST['username'];
//if (mb_strlen($username) < MIN_USERNAME_SIZE) echo 'invalid_too_short';
//if (mb_strlen($username, 'UTF-8') < 10) { echo ('invalid_too_short'); exit; }
//die ('!1!' . $username . '!2!' . mb_strlen($username) . '!3!' . strlen($username) . '!4!');
$dbc = mysqli_connect(DB_HOST, DB_READER, DB_READER_PASSWORD, DB_NAME) or die(DB_CONNECT_ERROR . DB_HOST . '--QueryDB--checkName.php');
$stmt = mysqli_stmt_init($dbc);
$query = "SELECT username FROM pcsuser WHERE UPPER(username) = UPPER(?)";
if (!mysqli_stmt_prepare($stmt, $query)) {
die('SEL:mysqli_prepare failed somehow:' . $query . '--QueryDB--checkName.php');
}
if (!mysqli_stmt_bind_param($stmt, 's', $username)) {
die('mysqli_stmt_bind_param failed somehow --checkName.php');
}
if (!mysqli_stmt_execute($stmt)) {
die('mysqli_stmt_execute failed somehow' . '--checkName.php');
}
mysqli_stmt_store_result($stmt);
$num_rows = mysqli_stmt_num_rows($stmt);
mysqli_stmt_bind_result($stmt, $row);
echo ($num_rows >= 1) ? 'taken' : 'available';
mysqli_stmt_close($stmt);
mysqli_close($dbc);
AJAX CODE BELOW
function CheckUsername(sNameToCheck) {
document.getElementById("field_username").className = "validated";
registerRequest = CreateRequest();
if (registerRequest === null)
alert("Unable to create AJAX request");
else {
var url= "https://www.perrycs.com/php/checkName.php";
var requestData = "username=" + escape(sNameToCheck); // data to send
registerRequest.onreadystatechange = ShowUsernameStatus;
registerRequest.open("POST", url, true);
registerRequest.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
registerRequest.send(requestData);
}
}
function ShowUsernameStatus() {
var img_sad = "graphics/signup/smiley-sad006.gif";
var img_smile = "graphics/signup/smiley-happy088.gif";
var img_checking = "graphics/signup/bluespinner.gif";
if (request.readyState === 4) {
if (request.status === 200) {
var txtUsername = document.getElementById('txt_username');
var fieldUsername = document.getElementById('field_username');
var imgUsername = document.getElementById('img_username');
var error = true;
var response = request.responseText;
switch (response) {
case "available":
txtUsername.innerHTML = "NAME AVAILABLE!";
error = false;
break;
case "taken":
txtUsername.innerHTML = "NAME TAKEN!";
break;
case "invalid_too_short":
txtUsername.innerHTML = "TOO SHORT!";
break;
default:
txtUsername.innerHTML = "AJAX ERROR!";
break;
} // matches switch
if (error) {
imgUsername.src = img_sad;
fieldUsername.className = 'error';
} else {
imgUsername.src = img_smile;
fieldUsername.className = 'validated';
}
} // matches ===200
} // matches ===4
}
TESTING RESULTS
This is what I get back when I DIE in the PHP and echo out as in the following (before and after making the Ajax change below [adding in UTF-8 to the request]...
PHP SNIPPET
die ('!1!' . $username . '!2!' . mb_strlen($username) . '!3!' . strlen($username) . '!4!');
TEST DATA
Username: David Perry
!1!David Perry!2!11!3!11!4!
Username: ܦ"~÷Û♦
!1!ܦ\"~��%u2666!2!9!3!13!4!
The first one works. The second one should work but it looks like the encoding is weird (understandable).
7 visible characters for the 2nd one. mb_strlen shows 9, strlen shows 13.
After reading Joeri Sebrechts's solution and the link they gave me, I looked up Ajax request parameters and someone had the following...
AJAX SNIPPET (changed from original code)
registerRequest.setRequestHeader("Content-Type","application/x-www-form-urlencoded; charset=UTF-8");
(I added in the charset=UTF-8 from an example I saw in an article).
UPDATE: Nov 27, 9:11pm EST
Ok, after much reading I believe I am encoding my JS wrong. I was using escape... as follows...
var requestData = "username=" + escape(sNameToCheck);
After looking at this website...
http://www.the-art-of-web.com/javascript/escape/
it helped me understand more of what's going on with each function and how they encode and decode. I should be able to do this...
var requestData = "username=" + encodeURIComponent(sNameToCheck);
in JS and in PHP I should be able to do this...
$username = rawurldecode($_POST['username']);
Doing that still gives me 8 characters for my weird example above instead of 7. It's close, but am I doing something wrong? If I cursor through the text on the screen it's 7 characters. Any ideas to help me understand this better?
FIXED/SOLVED!!!
Ok, thank you for your tips that led me in the right direction to make this work. My changes were as follows.
In the AJAX (I used to have escape(sNameToCheck);):
var requestData = "username=" + encodeURIComponent(sNameToCheck);
In the PHP (I used to have $username = $_POST['username'];):
$username = rawurldecode($_POST['username']);
if (get_magic_quotes_gpc()) $username = stripslashes($username);
I really hate magic_quotes... it's caused me about 50+ hours of frustration over form data in total because I forgot about it. As long as it works. I'm happy!
So, now the mb_strlen works and I can easily add this back in...
if (mb_strlen($username) < MIN_USERNAME_SIZE) { echo 'invalid_too_short'; exit; }
Works great!
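For anyone wondering why the switch away from escape() changes the length, here is a small sketch from the PHP side. JavaScript's legacy escape() sends characters above U+00FF as %uXXXX, which PHP's URL decoding does not understand, while encodeURIComponent() sends percent-encoded UTF-8 bytes. The two input strings below are what each function produces for ♦ (U+2666). The variable names are only illustrative.

```php
<?php
// What arrives when the browser used escape('♦'): PHP leaves %u2666 untouched,
// because "%u2" is not a valid two-digit hex escape.
$legacy = rawurldecode('%u2666');

// What arrives when the browser used encodeURIComponent('♦'):
// three percent-encoded bytes that decode to one UTF-8 character.
$standard = rawurldecode('%E2%99%A6');

echo mb_strlen($legacy, 'UTF-8'), "\n";   // 6 (the literal text "%u2666")
echo mb_strlen($standard, 'UTF-8'), "\n"; // 1
echo strlen($standard), "\n";             // 3 (bytes, not characters)
```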
PHP is a byte processor, it is not charset-aware. That has a number of tricky consequences.
strlen() returns the length in bytes, not the length in characters. This is because PHP's "string" type is actually an array of bytes. UTF-8 uses more than one byte per character for the 'special characters'. Therefore strlen() will only give you the right answer for a narrow subset of text (= plain English text).
mb_strlen() treats the string as actual characters, but assumes it's in the encoding specified via mbstring.internal_encoding, because the string itself is just an array of bytes and does not have metadata specifying its character set. If you are working with UTF-8 data and set internal_encoding to UTF-8 it will give you the right answer. If your data is not UTF-8 it will give you the wrong answer.
MySQL will receive a stream of bytes from PHP, and will parse it based on the database session's character set, which you set via the SET NAMES directive. Every time you connect to the database you must inform it what encoding your PHP strings are in.
The browser receives a stream of bytes from PHP, and will parse it based on the Content-Type charset HTTP header, which you control via the php.ini default_charset setting. The Ajax call will submit in the same encoding as the page it runs from.
In summary, the following page has advice on how to ensure all your data is treated as UTF-8. Follow it and your problem should resolve itself.
http://malevolent.com/weblog/archive/2007/03/12/unicode-utf8-php-mysql/
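The strlen() and mb_strlen() difference described above can be seen in a few lines. This is only a sketch with a made-up value; the encoding is passed explicitly so the result does not depend on any php.ini setting.

```php
<?php
// "café" as UTF-8: four characters, five bytes (é takes two bytes).
$name = "caf\xC3\xA9";

$bytes      = strlen($name);                  // counts bytes
$chars      = mb_strlen($name, 'UTF-8');      // counts characters, given the right encoding
$wrongGuess = mb_strlen($name, 'ISO-8859-1'); // wrong encoding: every byte counts as a character

echo $bytes, "\n";      // 5
echo $chars, "\n";      // 4
echo $wrongGuess, "\n"; // 5
```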
From a quick glance, you can clean this up:
if (request.status == 200) {
if (request.responseText == "available") {
document.getElementById("txt_username").innerHTML = "NAME AVAILABLE!";
document.images['img_username'].src=img_smile;
document.getElementById("continue").disabled = false;
document.getElementById("field_username").className = 'validated';
} else if (request.responseText == "taken") {
document.getElementById("txt_username").innerHTML = "NAME TAKEN!";
document.images['img_username'].src=img_sad;
document.getElementById("field_username").className = 'error';
} else if (request.responseText == "invalid_too_short") {
document.getElementById("txt_username").innerHTML = "TOO SHORT!";
document.images['img_username'].src=img_sad;
document.getElementById("field_username").className = 'error';
} else {
document.getElementById("txt_username").innerHTML = "AJAX ERROR!";
document.images['img_username'].src=img_sad;
document.getElementById("field_username").className = 'error';
}
}
to:
// I prefer triple equals
// Read more at http://javascript.crockford.com/style2.html
if (request.status === 200) {
// use variables!
var txtUsername = document.getElementById('txt_username');
var fieldUsername = document.getElementById('field_username');
var imgUsername = document.getElementById('img_username');
var response = request.responseText;
var error = true;
// you can do a switch statement here too, if you prefer
if (response === "available") {
txtUsername.innerHTML = "NAME AVAILABLE!";
document.getElementById("continue").disabled = false;
error = false;
} else if (response === "taken") {
txtUsername.innerHTML = "NAME TAKEN!";
} else if (response === "invalid_too_short") {
txtUsername.innerHTML = "TOO SHORT!";
} else {
txtUsername.innerHTML = "AJAX ERROR!";
}
// refactor error actions
if (error) {
imgUsername.src = img_sad;
fieldUsername.className = 'error';
} else {
imgUsername.src = img_smile;
fieldUsername.className = 'validated';
}
}