I am connecting to a FileMaker DB through ODBC, and some of the data contains accented characters such as é or è. These characters currently appear as "?", which is a problem. Here is what my code looks like:
$connection = odbc_connect($dsn, $username, $password, SQL_CUR_USE_ODBC);
$sql = "SELECT * FROM Table1";
$res = odbc_exec($connection,$sql);
while ($row = odbc_fetch_array($res)){
$x++;
$values= ($x . ": Customer:". $row['Customer'] . "\n");
print($values);
}
odbc_free_result($res);
odbc_close($connection);
I tried a few things, such as adding 'charset=utf-8' in the header, but nothing has worked so far. I'm pretty sure I need to specify UTF-8 somewhere; I just haven't found examples online that use ODBC the way my code does. Thanks!
You will need to connect using the correct encoding. You can determine the correct encoding with the following query:
SELECT hex(Customer) FROM Table1;
Match the hex code of the offending character with the target encodings, most likely latin1 and UTF-8. If you cannot identify the hex codes, then paste the output here and I will identify it for you.
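If the database side does not give you a usable hex dump, the same comparison can be done in PHP with bin2hex(). This is only a sketch: the helper name guess_encoding_of_e_acute() is made up for illustration, and it only recognises the byte patterns of a single "é".

```php
<?php
// Byte sequences for "é" in the two most likely encodings.
$latin1 = "\xE9";      // ISO-8859-1 / Windows-1252: one byte
$utf8   = "\xC3\xA9";  // UTF-8: two bytes

echo bin2hex($latin1), "\n"; // e9
echo bin2hex($utf8), "\n";   // c3a9

// Hypothetical helper: map a hex dump of a single "é" to an encoding name.
function guess_encoding_of_e_acute(string $hex): string
{
    $hex = strtolower($hex);
    if ($hex === 'c3a9') {
        return 'UTF-8';
    }
    if ($hex === 'e9') {
        return 'ISO-8859-1 or Windows-1252';
    }
    return 'unknown';
}

echo guess_encoding_of_e_acute(bin2hex($utf8)), "\n"; // UTF-8
```

In practice you would call bin2hex($row['Customer']) on a value that you know contains an accented character and look for e9 versus c3a9 in the output.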
The ODBC driver is probably returning the data in Windows-1252 (WIN1252).
Try converting each value:
mb_convert_encoding($value,'UTF-8','Windows-1252');
I've used it for the opposite conversion (UTF-8 to Windows-1252), so it should work this way too. Let me know.
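Applied to a fetched row, the conversion might look like the sketch below. It assumes the driver really does hand back Windows-1252 bytes (check that first); row_to_utf8() is a hypothetical helper, and the row is simulated so the example runs without a database.

```php
<?php
// Hypothetical helper: convert every string in a fetched row to UTF-8.
// Assumes the source encoding is Windows-1252; pass another one if it is not.
function row_to_utf8(array $row, string $from = 'Windows-1252'): array
{
    foreach ($row as $key => $value) {
        if (is_string($value)) {
            $row[$key] = mb_convert_encoding($value, 'UTF-8', $from);
        }
    }
    return $row;
}

// Simulated row, shaped like the result of odbc_fetch_array():
// "Café" with the é stored as the single byte 0xE9.
$row = ['Customer' => "Caf\xE9"];

$fixed = row_to_utf8($row);
echo bin2hex($fixed['Customer']), "\n"; // 436166c3a9
```

In the loop from the question you would call row_to_utf8($row) right after odbc_fetch_array($res) and print from the converted row.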
Use the function mb_detect_encoding() to find out which encoding the data is in. If the function doesn't exist (the mbstring extension is not loaded), try this fallback:
if ( !function_exists('mb_detect_encoding') ) {
function mb_detect_encoding ($string, $enc=null, $ret=null) {
static $enclist = array(
'UTF-8', 'ASCII',
'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4', 'ISO-8859-5',
'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9', 'ISO-8859-10',
'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'ISO-8859-16',
'Windows-1251', 'Windows-1252', 'Windows-1254',
);
$result = false;
foreach ($enclist as $item) {
$sample = iconv($item, $item, $string);
if (md5($sample) == md5($string)) {
if ($ret === NULL) { $result = $item; } else { $result = true; }
break;
}
}
return $result;
}
}
Source:
PHP
I have a problem of charset.
On localhost everything works fine, but now on remote server I see strange characters replacing others like à or è. I have read it's a charset issue and I think the problem can be my php.ini (I can't edit it).
To solve it I've tried many things:
I've set
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
on html,
ini_set('default_charset', 'UTF-8');
on php,
AddDefaultCharset utf-8
on my .htaccess file,
If I use utf8_encode on the strings, the letters are replaced by ã or similar; if I leave them alone, the letters show up as �.
Is there another way to solve this problem that I have not found yet?
Sorry, I forgot to say it: strings are retrieved from another site by a file_get_contents (I'm using a Yandex API)
Here's some code:
$yandex = 'https://dictionary.yandex.net/api/v1/dicservice.json/lookup?key=my_api_key&lang=it-it&text=attualità';
// get json from this page
$object = json_decode(file_get_contents($yandex));
$syns_array = array();
$type = '';
// if the word exists
if (!empty($object->def) && $object->def != FALSE && $object->def != NULL)
{
$type = $object->def[0]->tr[0]->pos;
$rows = $object->def[0]->tr;
// if there're synonyms
if (!empty($rows) && $rows != FALSE && $rows != NULL)
{
foreach ($rows as $row)
{
array_push($syns_array, $row->text);
// if there're more rows with syns
if (!empty($row->syn) && $row->syn !== FALSE && $row->syn !== NULL)
{
foreach ($row->syn as $syns_obj)
{
array_push($syns_array, $syns_obj->text);
}
}
}
}
}
// I echo my synonyms from the array
foreach($syns_array as $syn) {
echo $syn;
}
I forgot to say I was using mb_strtolower on those strings. Replacing it with strtolower solved the problem... Sorry
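A likely explanation (an assumption, since the server settings are not shown) is that mb_strtolower() falls back to mb_internal_encoding() when no encoding is passed, and that the remote server's internal encoding is not UTF-8. Passing the encoding explicitly avoids depending on php.ini. A minimal sketch:

```php
<?php
$word = "ATTUALIT\xC3\x80"; // "ATTUALITÀ" as UTF-8 bytes

// Explicit encoding: independent of the server's mbstring defaults.
$lower = mb_strtolower($word, 'UTF-8');
echo bin2hex($lower), "\n"; // 61747475616c6974c3a0  ("attualità")

// Wrong encoding, simulating a server whose internal encoding is Latin-1:
// the two bytes of "À" are treated as separate characters and get mangled.
$wrong = mb_strtolower($word, 'ISO-8859-1');
var_dump($wrong === $lower); // bool(false)
```

Plain strtolower() leaves the non-ASCII bytes alone on current PHP versions, which is why swapping it in made the symptom disappear, but it also means accented capitals are not lowercased.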
Well, I have one DB with a lot of ISO strings and another with UTF-8 (yes, I ruined everything), and now I'm writing a custom function that rewrites the whole DB so that everything is in UTF-8. The problem is the conversion of the strings that are already UTF-8... A ? appears:
$field = $fila['Field'];
$acon = mysql_fetch_array(mysql_query("SELECT `$field` as content FROM `$curfila` WHERE id='$i'"));
$content = $acon['content'];
if(!is_numeric($content)) {
if($content != null) {
if(ip2long($content) === false) {
mb_internal_encoding('UTF-8');
if(mb_detect_encoding($content) === "UTF-8") {
$sanitized = utf8_decode($content);
if($sanitized != $content) {
echo 'Fila [ID ('.$i.')] <b>'.$field.'</b> => '.$sanitized.'<br>';
//mysql_query("UPDATE `$curfila` SET `$field`='$sanitized' WHERE id='$i'");
}
}
}
}
}
PS: I check all the columns and rows of all the tables in the DB. (I display everything before changing anything.)
So, how can I detect that?
I tried mb_detect_encoding, but it reports all the strings as UTF-8... So which function can I use now?
Thanks in advance.
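One possible approach, sketched here under the assumption that the non-UTF-8 rows are ISO-8859-1, is to test validity with mb_check_encoding() instead of guessing with mb_detect_encoding(), and to convert only the values that are not valid UTF-8. The helper name ensure_utf8() is made up for this example.

```php
<?php
// Hypothetical helper: return the value as UTF-8, converting only when needed.
// $fallback is the assumed encoding of anything that is not valid UTF-8.
function ensure_utf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8 (or plain ASCII): leave it alone
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

$isoString  = "ni\xF1o";     // "niño" in ISO-8859-1
$utf8String = "ni\xC3\xB1o"; // "niño" in UTF-8

echo bin2hex(ensure_utf8($isoString)), "\n";  // 6e69c3b16f
echo bin2hex(ensure_utf8($utf8String)), "\n"; // 6e69c3b16f
```

This is a heuristic: a few ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so it is still worth printing the candidates before running any UPDATE, as the code in the question already does.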
How do I convert this: 灣
to \u7063 in PHP?
The reason I'm asking is somehow that chinese character is stored as \u7063 in mysql (utf-8 encoding) but I cannot search it in db when they search with query '灣'.
Additional Information
My DB encoding is UTF-8, with Collation utf8_general_ci. PHP file was saved in UTF-8. I have tried the method suggested by Nambi, but it did not work, it returned ?? in console. See attached image.
Try this code:
function big52utf8($big5str) {
$blen = strlen($big5str);
$utf8str = "";
for($i=0; $i<$blen; $i++) {
$sbit = ord(substr($big5str, $i, 1));
//echo $sbit;
//echo "<br>";
if ($sbit < 129) {
$utf8str.=substr($big5str,$i,1);
} elseif ($sbit > 128 && $sbit < 255) {
$new_word = iconv("BIG5", "UTF-8", substr($big5str,$i,2));
$utf8str.=($new_word=="")?"?":$new_word;
$i++;
}
}
return $utf8str;
}
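For the \uXXXX form asked about in the question, one option that needs no conversion table is json_encode(), which escapes non-ASCII characters as \uXXXX by default. This is a sketch that assumes the input is valid UTF-8; the variable names are illustrative.

```php
<?php
$char = "\xE7\x81\xA3"; // "灣" (U+7063) as UTF-8 bytes

// json_encode() wraps the string in double quotes; strip them with substr().
$escaped = substr(json_encode($char), 1, -1);
echo $escaped, "\n"; // \u7063

// The reverse direction: wrap the escape in quotes and decode it.
$decoded = json_decode('"' . $escaped . '"');
var_dump($decoded === $char); // bool(true)
```

The escaped form can then be used as the search term against the column that stores \u7063 literally, although storing real UTF-8 in the table would remove the need for this step.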
I have a set of keywords that are passed through via JSON from a DB (encoded UTF-8), some of which may have special characters like é, è, ç, etc. This is used as part of an auto-completer. Example:
array('Coffee', 'Cappuccino', 'Café');
I should add that the array as it comes from the DB would be:
array('Coffee', 'Cappuccino', 'Café');
But JSON encodes as:
["coffee", "cappuccino", null];
If I print these via print_r(), they show up fine on a UTF-8 encoded webpage, but café comes through as "cafÃ©" when text/plain is used and I look at the array with print_r($array);exit();.
If I encode using utf8_encode() before encoding to JSON, it comes through fine, but what gets printed on the webpage is "cafÃ©" and not "café".
Also strange, but json_last_error() is being seen as an undefined function, but json_decode() and json_encode() work fine.
Any ideas on how to get UTF-8 encoded data from the database to behave the same throughout the entire process?
EDIT: Here is the PHP function that grabs the keywords and makes them into a single array:
private function get_keywords()
{
global $db, $json;
$output = array();
$db->query("SELECT keywords FROM listings");
while ($r = $db->get_array())
{
$split = explode(",", $r['keywords']);
foreach ($split as $s)
{
$s = trim($s);
if ($s != "" && !in_array($s, $output)) $output[] = strtolower($s);
}
}
$json->echo_json($output);
}
The json::echo_json method just encodes, sets the header and prints it (for usage with Prototype)
EDIT: DB Connection method:
function connect()
{
if ($this->set['sql_connect'])
{
$this->connection = @mysql_connect( $this->set['sql_host'], $this->set['sql_user'], $this->set['sql_pass'])
OR $this->debug( "Connection Error", mysql_errno() .": ". mysql_error());
$this->db = @mysql_select_db( $this->set['sql_name'], $this->connection)
OR $this->debug( "Database Error", "Cannot Select Database '". $this->set['sql_name'] ."'");
$this->is_connected = TRUE;
}
return TRUE;
}
More Updates:
Simple PHP script I ran:
echo json_encode( array("Café") ); // ["Caf\u00e9"]
echo json_encode( array("Café") ); // null
The reason could be the current client character set. A simple solution could be to set it for the client with
mysql_query('SET CHARACTER SET utf8')
before running the SELECT query.
Update (June 2014)
The mysql extension is deprecated as of PHP 5.5.0. It is now recommended to use mysqli. Also, upon further reading, the above way of setting the client character set should be avoided for reasons including security.
I haven't tested it, but this should be an ok substitute:
$mysqli = new mysqli("localhost", "my_user", "my_password", "my_db");
if (!$mysqli->set_charset('utf8')) {
printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
printf("Current character set: %s\n", $mysqli->character_set_name());
}
or, in procedural style, passing the connection as a parameter:
$conn = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (!mysqli_set_charset($conn, "utf8")) {
# TODO - Error: Unable to set the character set
exit;
}
json_encode seems to be dropping strings that contain invalid characters. It is likely that your UTF-8 data is not arriving in the proper form from your database.
Looking at the examples you give, my wild guess would be that your database connection is not UTF-8 encoded and serves ISO-8859-1 characters instead.
Can you try a SET NAMES utf8; after initializing the connection?
I tried your code sample like this
[~]> cat utf.php
<?php
$arr = array('Coffee', 'Cappuccino', 'Café');
print json_encode($arr);
[~]> php utf.php
["Coffee","Cappuccino","Caf\u00e9"]
[~]>
Based on that I would say that if the source data is really UTF-8, then json_encode works just fine. If it's not, then that's where you get null. Why it's not, I cannot tell based on this information.
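To make the difference visible, here is a small sketch that feeds json_encode() the same word once as UTF-8 and once as ISO-8859-1 bytes. The byte strings are written out explicitly so the result does not depend on how the script file is saved. On current PHP versions the failure shows up as a false return value plus JSON_ERROR_UTF8 rather than a null element.

```php
<?php
$utf8   = "Caf\xC3\xA9"; // "Café" in UTF-8
$latin1 = "Caf\xE9";     // "Café" in ISO-8859-1: not valid UTF-8

$ok = json_encode([$utf8]);
echo $ok, "\n"; // ["Caf\u00e9"]

$bad = json_encode([$latin1]);
var_dump($bad);                                // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8); // bool(true)

// Converting to UTF-8 first makes the second case behave like the first.
$fixed = json_encode([mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1')]);
echo $fixed, "\n"; // ["Caf\u00e9"]
```

If the database connection is set to UTF-8 as suggested in the other answers, the conversion step should not be needed at all.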
Try sending your array through this function before doing json_encode():
<?php
function utf8json($inArray, $depth = 0) {
/* our return object */
$newArray = array();
/* safety recursion limit */
if($depth >= 30) {
return false;
}
/* step through inArray */
foreach($inArray as $key=>$val) {
if(is_array($val)) {
/* recurse on the nested array, not on $inArray itself */
$newArray[$key] = utf8json($val, $depth + 1);
} else {
/* encode string values, leave other types untouched */
$newArray[$key] = is_string($val) ? utf8_encode($val) : $val;
}
}
/* return utf8 encoded array */
return $newArray;
}
?>
Taken from a comment on php.net at http://php.net/manual/en/function.json-encode.php.
The function basically loops through the array elements; perhaps you ran utf8_encode on the array itself?
My solution to encode utf8 data was :
$jsonArray = addslashes(json_encode($array, JSON_FORCE_OBJECT|JSON_UNESCAPED_UNICODE));
Is there some way to detect if a string has been base64_encoded() in PHP?
We're converting some storage from plain text to base64 and part of it lives in a cookie that needs to be updated. I'd like to reset their cookie if the text has not yet been encoded, otherwise leave it alone.
Apologies for a late response to an already-answered question, but I don't think base64_decode($x,true) is a good enough solution for this problem. In fact, there may not be a very good solution that works against any given input. For example, I can put lots of bad values into $x and not get a false return value.
var_dump(base64_decode('wtf mate',true));
string(5) "���j�"
var_dump(base64_decode('This is definitely not base64 encoded',true));
string(24) "N���^~)��r��[jǺ��ܡם"
I think that in addition to the strict return value check, you'd also need to do post-decode validation. The most reliable way is if you could decode and then check against a known set of possible values.
A more general solution with less than 100% accuracy (closer with longer strings, inaccurate for short strings) is if you check your output to see if many are outside of a normal range of utf-8 (or whatever encoding you use) characters.
See this example:
<?php
$english = array();
foreach (str_split('az019AZ~~~!##$%^*()_+|}?><": Iñtërnâtiônàlizætiøn') as $char) {
echo ord($char) . "\n";
$english[] = ord($char);
}
echo "Max value english = " . max($english) . "\n";
$nonsense = array();
echo "\n\nbase64:\n";
foreach (str_split(base64_decode('Not base64 encoded',true)) as $char) {
echo ord($char) . "\n";
$nonsense[] = ord($char);
}
echo "Max nonsense = " . max($nonsense) . "\n";
?>
Results:
Max value english = 195
Max nonsense = 233
So you may do something like this:
if ( $maxDecodedValue > 200 ) {} //decoded string is Garbage - original string not base64 encoded
else {} //decoded string is useful - it was base64 encoded
You should probably use the mean() of the decoded values instead of the max(), I just used max() in this example because there is sadly no built-in mean() in PHP. What measure you use (mean,max, etc) against what threshold (eg 200) depends on your estimated usage profile.
In conclusion, the only winning move is not to play. I'd try to avoid having to discern base64 in the first place.
function is_base64_encoded($data)
{
if (preg_match('%^[a-zA-Z0-9/+]*={0,2}$%', $data)) {
return TRUE;
} else {
return FALSE;
}
};
is_base64_encoded("iash21iawhdj98UH3"); // true
is_base64_encoded("#iu3498r"); // false
is_base64_encoded("asiudfh9w=8uihf"); // false
is_base64_encoded("a398UIhnj43f/1!+sadfh3w84hduihhjw=="); // false
http://php.net/manual/en/function.base64-decode.php#81425
I had the same problem, I ended up with this solution:
if ( base64_encode(base64_decode($data)) === $data){
echo '$data is valid';
} else {
echo '$data is NOT valid';
}
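A slightly stricter variant of the same round-trip idea is sketched below. It uses the strict flag so that characters outside the base64 alphabet are rejected up front, and it compares with === against false so that a payload decoding to "0" is not mistaken for a failure. The function name looks_like_base64() is made up for this example.

```php
<?php
// Hypothetical helper: true if $data is canonical base64 for some byte string.
function looks_like_base64(string $data): bool
{
    if ($data === '') {
        return false; // treat the empty string as "not encoded"
    }
    $decoded = base64_decode($data, true); // strict: reject foreign characters
    if ($decoded === false) {
        return false;
    }
    // Canonical base64 re-encodes to exactly the same text.
    return base64_encode($decoded) === $data;
}

var_dump(looks_like_base64(base64_encode('hello')));                  // bool(true)
var_dump(looks_like_base64('This is definitely not base64 encoded')); // bool(false)
var_dump(looks_like_base64('node'));                                  // bool(true)
```

The last call shows the limit already discussed in this thread: any plain word whose length is a multiple of four and that only uses base64 characters passes, so this check says "could be base64", not "was base64 encoded".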
Better late than never: You could maybe use mb_detect_encoding() to find out whether the encoded string appears to have been some kind of text:
function is_base64_string($s) {
// first check if we're dealing with an actual valid base64 encoded string
if (($b = base64_decode($s, TRUE)) === FALSE) {
return FALSE;
}
// now check whether the decoded data could be actual text
$e = mb_detect_encoding($b);
if (in_array($e, array('UTF-8', 'ASCII'))) { // YMMV
return TRUE;
} else {
return FALSE;
}
}
UPDATE For those who like it short
function is_base64_string_s($str, $enc=array('UTF-8', 'ASCII')) {
return !(($b = base64_decode($str, TRUE)) === FALSE) && in_array(mb_detect_encoding($b), $enc);
}
We can combine three checks into one function to test whether a given string is valid base64 or not.
function validBase64($string)
{
$decoded = base64_decode($string, true);
// Check if there is no invalid character in string
if (!preg_match('/^[a-zA-Z0-9\/\r\n+]*={0,2}$/', $string)) {return false;}
// Decode the string in strict mode and check the response
if ($decoded === false) {return false;}
// Encode and compare it to original one
if (base64_encode($decoded) !== $string) {return false;}
return true;
}
I was about to build a base64 toggle in php, this is what I did:
function base64Toggle($str) {
if (!preg_match('~[^0-9a-zA-Z+/=]~', $str)) {
$check = str_split(base64_decode($str));
$x = 0;
foreach ($check as $char) if (ord($char) > 126) $x++;
if ($x/count($check)*100 < 30) return base64_decode($str);
}
return base64_encode($str);
}
It works perfectly for me.
Here are my complete thoughts on it: http://www.albertmartin.de/blog/code.php/19/base64-detection
And here you can try it: http://www.albertmartin.de/tools
Without the strict parameter, base64_decode() will not return FALSE if the input is not valid base64 encoded data. An alternative is imap_base64(), which returns FALSE if $text contains characters outside the Base64 alphabet.
imap_base64() Reference
Here's my solution:
if(empty(htmlspecialchars(base64_decode($string, true)))) {
return false;
}
It will return false if the decoded $string is invalid, for example: "node", "123", " ", etc.
$is_base64 = function(string $string) : bool {
$zero_one = ['MA==', 'MQ=='];
if (in_array($string, $zero_one)) return TRUE;
if (empty(htmlspecialchars(base64_decode($string, TRUE))))
return FALSE;
return TRUE;
};
var_dump('*** These yield false ***');
var_dump($is_base64(''));
var_dump($is_base64('This is definitely not base64 encoded'));
var_dump($is_base64('node'));
var_dump($is_base64('node '));
var_dump($is_base64('123'));
var_dump($is_base64(0));
var_dump($is_base64(1));
var_dump($is_base64(123));
var_dump($is_base64(1.23));
var_dump('*** These yield true ***');
var_dump($is_base64(base64_encode('This is definitely base64 encoded')));
var_dump($is_base64(base64_encode('node')));
var_dump($is_base64(base64_encode('123')));
var_dump($is_base64(base64_encode(0)));
var_dump($is_base64(base64_encode(1)));
var_dump($is_base64(base64_encode(123)));
var_dump($is_base64(base64_encode(1.23)));
var_dump($is_base64(base64_encode(TRUE)));
var_dump('*** Should these yield true? Might be edge cases ***');
var_dump($is_base64(base64_encode('')));
var_dump($is_base64(base64_encode(FALSE)));
var_dump($is_base64(base64_encode(NULL)));
Maybe it's not exactly what you asked for, but I hope it will be useful to somebody.
In my case the solution was to encode all data with json_encode and then base64_encode.
$encoded=base64_encode(json_encode($data));
This value can be stored or used however you need.
Then, to check whether a value is your encoded data rather than just a text string, simply use
function isData($test_string){
if(base64_decode($test_string,true)&&json_decode(base64_decode($test_string))){
return true;
}else{
return false;
}
}
or alternatively
function isNotData($test_string){
if(base64_decode($test_string,true)&&json_decode(base64_decode($test_string))){
return false;
}else{
return true;
}
}
Thanks to the authors of all the previous answers in this thread :)
Usually a text in base64 has no spaces.
I used this function, which worked fine for me. It treats the string as base64 when spaces make up less than 1 in 20 of its characters,
i.e. fewer than 1 space per 20 chars: ( spaces / strlen ) < 0.05
function normalizaBase64($data){
$spaces = substr_count ( $data ," ");
if ($data !== '' && ($spaces/strlen($data))<0.05)
{
return base64_decode($data);
}
return $data;
}
Your best option is:
$base64_test = mb_substr(trim($some_base64_data), 0, 76);
return (base64_decode($base64_test, true) === FALSE ? FALSE : TRUE);