I am reading an RSS feed: http://beersandbeans.com/feed/
The feed declares UTF-8, and I am using SimplePie to import the content. After I grab the content and store it in $content, I run the following:
<?php
header ('Content-type: text/html; charset=utf-8');
?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head><body>
<?php
echo $content;
echo $enc = mb_detect_encoding($content, "UTF-8,ISO-8859-1", true);
echo $content = mb_convert_encoding($content, "UTF-8", $enc);
echo $enc = mb_detect_encoding($content, "UTF-8,ISO-8859-1", true);
?>
</body></html>
This then produces:
..... Camping: 2,000isk/day for 5 days) = $89 .....
ISO-8859-1
..... Camping: Â Â 2,000isk/day for 5 days) = $89 .....
UTF-8
Why is it outputting the Â?
Try not specifying "UTF-8,ISO-8859-1" and see what encoding it gives you. It might be detecting ISO-8859-1 because it's the last one in that list, rather than the actual encoding of the string.
Set strict-mode to true in mb_detect_encoding(), see http://www.php.net/manual/de/function.mb-detect-encoding.php#102510
Also try http://www.php.net/manual/de/function.mb-convert-encoding.php instead of iconv()
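One possible explanation for the stray Â, assuming the feed text contains non-breaking spaces (U+00A0) that are already valid UTF-8: when the detection step reports ISO-8859-1, the conversion treats each byte of the two-byte sequence C2 A0 as a separate Latin-1 character and encodes both again. The sketch below reproduces that double encoding and shows a guard against it. The sample bytes and the helper name to_utf8_once() are assumptions for illustration, not data or code taken from the feed.

```php
<?php
// Assumption: the feed contains a UTF-8 non-breaking space (bytes C2 A0).
$nbsp = "\xC2\xA0";

// Treating valid UTF-8 as ISO-8859-1 re-encodes every byte separately:
// C2 becomes "Â" (C3 82) and A0 becomes a non-breaking space again (C2 A0).
$doubleEncoded = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');

// Hypothetical guard: only convert when the input is not already valid UTF-8.
function to_utf8_once($text, $assumedSource = 'ISO-8859-1')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

echo bin2hex($doubleEncoded), "\n";      // c382c2a0
echo bin2hex(to_utf8_once($nbsp)), "\n"; // c2a0
```

Checking validity first is only a heuristic: a Latin-1 string can happen to be valid UTF-8 as well, so knowing the real source encoding is still preferable to guessing.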
I know there are many questions about this problem and I've read most of them, including 'UTF-8 all the way through'.
Following those examples and hints, I reduced everything to this minimal example, which unfortunately still won't print a German umlaut ö after json_encoding an array:
(And here is the question: why? What else can I do?)
<?php
error_reporting(E_ALL);
header('Content-Type: text/html; charset=UTF-8');
?>
<!DOCTYPE html>
<html lang="de">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<?php
echo "<br>ini_get('default_charset') ". ini_get('default_charset')."<br>"; // nothing shown
// if (!ini_set('default_charset', 'utf-8')) { // won't work (I guess I'm not allowed to do that)
// echo "could not set default_charset to utf-8<br>";
// }
echo "Köln"; // yay! displays "Köln" as expected
$darr = Array();
$locationString = mb_convert_encoding("location", "UTF-8");
$darr[$locationString] = mb_convert_encoding("Köln", "UTF-8");
$json = json_encode($darr);
echo $json;
// output:
// {"plain":"K\u00f6ln","utf_encode":"K\u00c3\u00b6ln","utf_decode":"K"}
// dah? why?
$array = json_decode($json);
var_dump($array);
// ... even worse: "Köln"
phpinfo();
?>
</body>
</html>
relevant system info:
php 5.2.5 (yeah, I know. I can't change it)
from phpinfo():
default_charset no value
json
json support enabled
json version 1.2.1
mbstring
Multibyte Support enabled
Multibyte string engine libmbfl
mbstring.encoding_translation Off Off
Could this be my problem?
...and yes, the php file is encoded utf-8 (without BOM) in sublimeText. Submitted to server via FileZilla once as ASCII, once Binary, no change.
When encoding unicode data with json_encode you should use the JSON_UNESCAPED_UNICODE flag:
$json = json_encode($darr, JSON_UNESCAPED_UNICODE);
The above is available since php 5.4.0.
For older versions you can try this function instead (note that the anonymous function requires PHP 5.3 or later; on 5.2 the callback would have to be a named function):
function unicode_json_encode($arr) {
    // The convmap starts at 0x80, so it covers every multibyte character (above ASCII 127).
    // Those characters become numeric entities and are thereby "hidden" from json_encode.
    array_walk_recursive($arr, function (&$item, $key) {
        if (is_string($item)) {
            $item = mb_encode_numericentity($item, array(0x80, 0xffff, 0, 0xffff), 'UTF-8');
        }
    });
    // Encode, then turn the numeric entities back into the original characters.
    return mb_decode_numericentity(json_encode($arr), array(0x80, 0xffff, 0, 0xffff), 'UTF-8');
}
The above function was taken from the comments in json_encode page in php.net
You simply haven't told PHP not to escape the characters when it encodes the data as JSON.
From the manual:
JSON_UNESCAPED_UNICODE (integer)
Encode multibyte Unicode characters literally (default is to escape as \uXXXX). Available since PHP 5.4.0.
So:
$json = json_encode($darr, JSON_UNESCAPED_UNICODE);
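For what it's worth, the escaped output is not corrupt: \u00f6 is a valid JSON escape for ö, and json_decode() turns it back into the UTF-8 character. A minimal round-trip sketch, assuming the input string really is UTF-8 (written here as explicit bytes so the result does not depend on how the file is saved):

```php
<?php
// "Köln" as explicit UTF-8 bytes, independent of the file encoding.
$city = "K\xC3\xB6ln";

// Default behaviour: non-ASCII characters are escaped as \uXXXX.
$json = json_encode(array('location' => $city));

// Decoding restores the original UTF-8 bytes.
$decoded = json_decode($json, true);

echo $json, "\n";                // {"location":"K\u00f6ln"}
echo $decoded['location'], "\n"; // Köln
```

Any JSON parser, including the browser's, reads both forms the same way, so the escaping only matters when a human looks at the raw JSON.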
I am getting the lovely � box where Spanish characters (e.g. ñ, á) should be displayed. I have already made sure that my meta http-equiv is set to utf-8:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I have also made sure that the page header is set to UTF-8:
header('Content-type: text/html; charset=UTF-8');
Here are the beginning stages of my code thus far:
<?php
setlocale(LC_ALL, 'es_MX');
$datetime = strtotime($event['datetime']);
$date = date("M j, Y", $datetime);
$day = strftime("%A", $datetime);
$time = date("g:i", $datetime);
?>
<?= $day ?> <?= $time ?>
The above code is inside a while loop. I have read that the database collation can also be a factor, but I already have it set to utf8_general_ci. Besides, the only thing in that column is a DateTime, which consists of numbers and is not affected by collation anyway.
result: s�bado 8:00
Any help is greatly appreciated as always.
Things to consider in PHP/MySQL/UTF-8
The database tables and text columns should be set to UTF-8
HTML page Content-Type should be set to UTF-8
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
PHP should send a header informing the browser to expect UTF-8
header('Content-Type: text/html; charset=utf-8' );
The PHP-MySQL connection should be set to UTF-8
mysqli_query($conn, "SET CHARACTER_SET_CLIENT='utf8'");
mysqli_query($conn, "SET CHARACTER_SET_RESULTS='utf8'");
mysqli_query($conn, "SET CHARACTER_SET_CONNECTION='utf8'");
The default_charset setting in php.ini should be utf-8
If you do not have access to php.ini, use ini_set('default_charset', 'utf-8');
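None of the points above covers strings that PHP itself produces. One possible cause of "s�bado", assuming the es_MX locale on the server uses ISO-8859-1 (an assumption; the installed locale variants differ per system), is that strftime() returns the day name in that single-byte encoding while the page is declared as UTF-8. A sketch of converting such a string, with a hard-coded Latin-1 sample standing in for the real strftime() result:

```php
<?php
// Hypothetical sample: "sábado" as ISO-8859-1 bytes, standing in for the
// value strftime("%A", ...) would return under a Latin-1 locale.
// The lone byte 0xE1 is not valid UTF-8, so a UTF-8 page shows "�".
$dayLatin1 = "s\xE1bado";

// Convert the locale's bytes to UTF-8 before printing on a UTF-8 page.
$dayUtf8 = mb_convert_encoding($dayLatin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($dayUtf8), "\n"; // 73c3a16261646f
```

If a UTF-8 variant of the locale is installed, requesting it directly, for example setlocale(LC_ALL, 'es_MX.UTF-8'), may make the conversion unnecessary.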
I have suffered this problem for many years and I can't find any logic and I have tried all the solutions above.
One solution is to replace every accented character in the text with its HTML entity.
Here is a function I have used when all else has failed.
function span_accent($wordz)
{
    $wordz = str_replace( "Á","&Aacute;",$wordz);
    $wordz = str_replace( "É","&Eacute;",$wordz);
    $wordz = str_replace( "Í","&Iacute;",$wordz);
    $wordz = str_replace( "Ó","&Oacute;",$wordz);
    $wordz = str_replace( "Ú","&Uacute;",$wordz);
    $wordz = str_replace( "Ñ","&Ntilde;",$wordz);
    $wordz = str_replace( "Ü","&Uuml;",$wordz);
    $wordz = str_replace( "á","&aacute;",$wordz);
    $wordz = str_replace( "é","&eacute;",$wordz);
    $wordz = str_replace( "í","&iacute;",$wordz);
    $wordz = str_replace( "ó","&oacute;",$wordz);
    $wordz = str_replace( "ú","&uacute;",$wordz);
    $wordz = str_replace( "ñ","&ntilde;",$wordz);
    $wordz = str_replace( "ü","&uuml;",$wordz);
    $wordz = str_replace( "¿","&iquest;",$wordz);
    $wordz = str_replace( "¡","&iexcl;",$wordz);
    $wordz = str_replace( "€","&euro;",$wordz);
    $wordz = str_replace( "«","&laquo;",$wordz);
    $wordz = str_replace( "»","&raquo;",$wordz);
    $wordz = str_replace( "‹","&lsaquo;",$wordz);
    $wordz = str_replace( "›","&rsaquo;",$wordz);
    return $wordz;
}
Kindly check your file encoding. It must be UTF-8 or UTF-8 without BOM.
To change your file encoding, use Notepad++ (or any other editor that lets you change the file encoding). In the menu bar, choose Encoding, then choose UTF-8 or UTF-8 without BOM.
For the difference between UTF-8 and UTF-8 without BOM, see:
What's different between UTF-8 and UTF-8 without BOM?
Hope it can help. :)
Having a similar problem, I found the answer here.
Not Displaying Spanish Characters
The resolution was to change from UTF-8 to windows-1252.
(HTML) <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
(PHP) ini_set('default_charset', 'windows-1252');
My problem was reading Spanish characters from a CSV file. When I opened the file in Excel, the characters appeared fine. In my editor, the odd character was shown regardless of the intended character. This change seems to work for my requirements.
It's important to check that your code files are also encoded as UTF-8 (you can see this property in many text and code editors).
Because there is only one symbol (the black square), you are probably using ISO-8859-1 or ISO-8859-15.
Check whether the content is correct in the database table, for example with phpMyAdmin. If it is, make sure your PHP files are UTF-8 encoded; take a look at your IDE/editor configuration.
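Use UTF-8 or Windows-1252. Note that utf8mb4 is the name of MySQL's full UTF-8 character set; it is not a charset label for PHP's default_charset, the HTTP header, or the meta tag, so use UTF-8 there:
ini_set('default_charset', 'UTF-8');
or
header('Content-Type: text/html; charset=UTF-8');
then use the tag
<meta charset="UTF-8">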
I have a form, and in a textarea I want to display some text that contains Spanish characters encoded as HTML entities. The problem is that instead of the Spanish character it displays the HTML code. I'm using htmlentities to display it in the form. My code to display it is:
<?php echo htmlentities($string, ENT_QUOTES, "UTF-8") ?>
Any ideas, or should I just not use htmlentities in a form? Thanks!
EDIT
Let's say $string = '&aacute;'
When I just do <?php echo $string ;?> I get á
If I do <?php echo htmlentities($string, ENT_QUOTES, "UTF-8") ?> I get &aacute;
I'm so confused!
You can try explicitly adding content type at the top of your file as below
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If it's already encoded as HTML, then you need to decode it now. You can use html_entity_decode($string);
The string to be echoed in the form is &aacute; as returned from the database, not á
$string = '&aacute;'; // your string as fetched from database
echo html_entity_decode($string); // this will display á in the textarea
and before saving to the database you need to call
htmlentities($_POST['txtAreaName'], ENT_QUOTES, "UTF-8"); // returns `&aacute;`
If I understand you correctly, you need to use...
<meta charset="utf-8">
in your page header, and then...
<?php echo html_entity_decode($string, ENT_QUOTES); ?>
This will convert your HTML entities back to their proper characters
You might be looking for htmlspecialchars.
echo htmlspecialchars('<á>', ENT_COMPAT | ENT_HTML5, "UTF-8");
outputs &lt;á&gt;.
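The difference between these functions can be sketched as follows, assuming the stored value is the entity text &aacute; rather than the character itself (variable names are illustrative only):

```php
<?php
// Assumption: the value was stored as an HTML entity.
$stored = '&aacute;';

// htmlentities() escapes the ampersand again, so the browser
// renders the literal text "&aacute;" instead of the character.
$doubleEscaped = htmlentities($stored, ENT_QUOTES, 'UTF-8');

// Decode first, then escape only the HTML-special characters
// (<, >, &, quotes) before placing the text in the textarea.
$plain = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
$forTextarea = htmlspecialchars($plain, ENT_QUOTES, 'UTF-8');

echo $doubleEscaped, "\n";        // &amp;aacute;
echo bin2hex($forTextarea), "\n"; // c3a1 (the UTF-8 bytes of á)
```

Storing the raw UTF-8 text and escaping only on output with htmlspecialchars() avoids the decode step entirely; whether that fits depends on how the existing data was saved.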
I store a JSON string that contains some (Chinese?) characters in a MySQL database.
Example of what's in the database:
normal.text.\u8bf1\u60d1.rest.of.text
On my PHP page I just json_decode what I receive from MySQL, but it doesn't display correctly; it shows things like "½±è§�"
I've tried executing the "SET NAMES 'utf8'" query at the beginning of my file, but it didn't change anything.
I already have the following header on my webpage:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
And of course all my php files are encoded in UTF-8.
Do you have any idea how to display these "\uXXXX" characters nicely?
This seems to work fine for me, with PHP 5.3.5 on Ubuntu 11.04:
<?php
header('Content-Type: text/plain; charset="UTF-8"');
$json = '[ "normal.text.\u8bf1\u60d1.rest.of.text" ]';
$decoded = json_decode($json, true);
var_dump($decoded);
Outputs this:
array(1) {
[0]=>
string(31) "normal.text.诱惑.rest.of.text"
}
Unicode is not UTF-8!
$ echo -en '\x8b\xf1\x60\xd1\x00\n' | iconv -f unicodebig -t utf-8
诱惑
This is a strange "encoding" you have. I guess each character of the normal text is one byte long (US-ASCII)? Then you have to extract the \u.... sequences, turn each sequence into a two-byte character, and convert that character to UTF-8 with iconv("unicodebig", "utf-8", $character) (see iconv in the PHP documentation). This worked on my side:
$in = 'normal.text.\u8bf1\u60d1.rest.of.text';
// Callback: convert one matched \uXXXX sequence to its UTF-8 character.
function ewchar_to_utf8($matches) {
    $ewchar = $matches[1];
    $binwchar = hexdec($ewchar);
    // Build the two-byte big-endian representation of the code unit.
    $wchar = chr(($binwchar >> 8) & 0xFF) . chr($binwchar & 0xFF);
    return iconv("unicodebig", "utf-8", $wchar);
}
function special_unicode_to_utf8($str) {
    return preg_replace_callback('/\\\\u([[:xdigit:]]{4})/i', "ewchar_to_utf8", $str);
}
echo special_unicode_to_utf8($in);
Otherwise we need more information on how your string in the database is encoded.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
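That's a red herring. If you serve your page over HTTP and the response's Content-Type header specifies a charset, the header takes precedence over the meta tag. PHP sends a Content-Type header by default if you don't set one explicitly, and depending on the PHP version and server configuration its charset may be iso-8859-1 rather than UTF-8 (default_charset defaults to UTF-8 only since PHP 5.6).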
Try with this line:
<?php
header("Content-Type: text/html; charset=UTF-8");
I have a few problems with multi-language support.
My website uses the charset ISO-8859-1:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
When the fetched title or content is in Chinese, it is displayed as garbled text:
$doc = new DOMDocument;
if (@$doc->load($url) === false) return;
$title = $doc->getElementsByTagName("title")->item(0)->nodeValue;
$content = $doc->getElementsByTagName("content")->item(0)->nodeValue;
If I change my header to UTF-8 it works, but because of other scripts I can't do that. Any idea how to solve this?
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
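In your case, utf8_decode() will do for characters that exist in ISO-8859-1; any other character, including Chinese, is replaced with a question mark: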
$title = utf8_decode($title);
$content= utf8_decode($content);
For more complex conversions from one character set to another, one would usually use iconv() or mb_convert_encoding().
e.g.
$title = iconv("UTF-8", "iso-8859-1", $title);
$content = iconv("UTF-8", "iso-8859-1", $content);
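Note that ISO-8859-1 has no Chinese characters, so a plain conversion to that charset cannot preserve them. One alternative that keeps the page in ISO-8859-1 is to emit every non-ASCII character as a numeric HTML entity. A minimal sketch using mb_encode_numericentity(); the helper name to_numeric_entities() is made up for illustration, and the input is assumed to be UTF-8, which is what DOMDocument returns:

```php
<?php
// Hypothetical helper: turn every non-ASCII character of a UTF-8 string
// into a decimal numeric entity that any page charset can carry.
function to_numeric_entities($utf8)
{
    // Convmap: code points 0x80..0x10FFFF, offset 0, mask keeping all bits.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

// "诱惑" as explicit UTF-8 bytes (U+8BF1, U+60D1), preceded by ASCII text.
echo to_numeric_entities("abc \xE8\xAF\xB1\xE6\x83\x91"), "\n";
// abc &#35825;&#24785;
```

In the question's code this would be applied to $title and $content before echoing them. ASCII text passes through unchanged, so existing ISO-8859-1 scripts on the page are not affected.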
Chinese characters won't display correctly if your web page charset is iso-8859-1.
Pick UTF-8, gb2312, or big5,
then convert the text using mb_convert_encoding:
mb_detect_order(array('utf-8', 'big5', 'gb2312'));
$in_encoding = mb_detect_encoding($str);
if (!$in_encoding || $in_encoding=='EUC-CN' || $in_encoding=='BIG-5')
{
    // Pass the detected source encoding explicitly; fall back to the internal encoding if detection failed.
    $str = mb_convert_encoding($str, 'UTF-8', $in_encoding ? $in_encoding : mb_internal_encoding());
}