I am getting the lovely � box where Spanish characters should be displayed (e.g. ñ, á, etc.). I have already made sure that my meta http-equiv is set to utf-8:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I have also made sure that the page header is set to utf-8:
header('Content-type: text/html; charset=UTF-8');
Here are the beginning stages of my code so far:
<?php
setlocale(LC_ALL, 'es_MX');
$datetime = strtotime($event['datetime']);
$date = date("M j, Y", $datetime);
$day = strftime("%A", $datetime);
$time = date("g:i", $datetime);
?>
<?= $day ?> <?= $time ?>
The above code is in a where statement. I have read that the database collation can also be a factor, but I already have it set to UTF-8 General CI. Besides, the only thing in that column is a DateTime, which is just numbers and is not affected by collation.
result: s�bado 8:00
Any help is greatly appreciated as always.
Things to consider in PHP/MySQL/UTF-8
The database tables and text columns should be set to UTF-8
HTML page Content-Type should be set to UTF-8
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
PHP should send a header informing the browser to expect UTF-8
header('Content-Type: text/html; charset=utf-8' );
The PHP-MySQL connection should be set to UTF-8
mysqli_query($conn, "SET CHARACTER_SET_CLIENT='utf8'");
mysqli_query($conn, "SET CHARACTER_SET_RESULTS='utf8'");
mysqli_query($conn, "SET CHARACTER_SET_CONNECTION='utf8'");
php.ini has a default_charset setting; it should be utf-8
If you do not have access to php.ini, use ini_set('default_charset', 'utf-8');
I have suffered from this problem for many years. I can't find any logic in it, and I have tried all the solutions above.
One solution is to use HTML entity codes for all accented text.
Here is a function I have used when all else has failed.
function span_accent($wordz)
{
    $wordz = str_replace( "Á","&Aacute;",$wordz);
    $wordz = str_replace( "É","&Eacute;",$wordz);
    $wordz = str_replace( "Í","&Iacute;",$wordz);
    $wordz = str_replace( "Ó","&Oacute;",$wordz);
    $wordz = str_replace( "Ú","&Uacute;",$wordz);
    $wordz = str_replace( "Ñ","&Ntilde;",$wordz);
    $wordz = str_replace( "Ü","&Uuml;",$wordz);
    $wordz = str_replace( "á","&aacute;",$wordz);
    $wordz = str_replace( "é","&eacute;",$wordz);
    $wordz = str_replace( "í","&iacute;",$wordz);
    $wordz = str_replace( "ó","&oacute;",$wordz);
    $wordz = str_replace( "ú","&uacute;",$wordz);
    $wordz = str_replace( "ñ","&ntilde;",$wordz);
    $wordz = str_replace( "ü","&uuml;",$wordz);
    $wordz = str_replace( "¿","&iquest;",$wordz);
    $wordz = str_replace( "¡","&iexcl;",$wordz);
    $wordz = str_replace( "€","&euro;",$wordz);
    $wordz = str_replace( "«","&laquo;",$wordz);
    $wordz = str_replace( "»","&raquo;",$wordz);
    $wordz = str_replace( "‹","&lsaquo;",$wordz);
    $wordz = str_replace( "›","&rsaquo;",$wordz);
    return $wordz;
}
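If the goal is simply to emit HTML entities for accented characters, the built-in htmlentities() already covers the characters in the table above. This is only a minimal sketch, assuming the input is valid UTF-8; the helper name accents_to_entities is made up for illustration.

```php
<?php
// Hypothetical helper: let htmlentities() replace the hand-written lookup table.
// Assumes $text is valid UTF-8.
function accents_to_entities(string $text): string
{
    // ENT_QUOTES also escapes ' and "; the third argument names the input encoding.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// "\u{00E1}" is á written as an escape, so the example does not depend on
// the encoding this file happens to be saved in.
echo accents_to_entities("s\u{00E1}bado"), "\n";
```

Unlike the str_replace() table, htmlentities() also escapes <, > and &, so it should be applied to text, not to strings that already contain markup.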
Kindly check your file encoding. It must be UTF-8 or UTF-8 without BOM.
To change your file encoding, use Notepad++ (or any other editor that lets you change the file encoding). In the menu bar, choose Encoding, then choose UTF-8 or UTF-8 without BOM.
See this link for the difference between UTF-8 and UTF-8 without BOM:
What's different between UTF-8 and UTF-8 without BOM?
Hope it can help. :)
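To check from PHP whether a file was saved with a BOM, you can look at its first three bytes. A rough sketch; the helper names and the example path in the comment are assumptions, not part of any library.

```php
<?php
// Hypothetical helpers for spotting and removing a UTF-8 byte order mark.
const UTF8_BOM = "\xEF\xBB\xBF";

function has_utf8_bom(string $bytes): bool
{
    // Compare only the first three bytes against the BOM.
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Typical use (the path is an assumption): has_utf8_bom(file_get_contents('page.php'))
var_dump(has_utf8_bom(UTF8_BOM . 'hola'));
var_dump(strip_utf8_bom(UTF8_BOM . 'hola'));
```

A BOM in front of the opening PHP tag counts as output, which is one reason "without BOM" is usually the safer choice for PHP files that call header().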
Having a similar problem, I found the answer here.
Not Displaying Spanish Characters
The resolution was to change from UTF-8 to windows-1252.
(HTML) <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
(PHP) ini_set('default_charset', 'windows-1252');
My problem was reading Spanish characters from a CSV file. When I opened the file in Excel, the characters appeared fine. In my editor, the odd character was shown regardless of the intended character. This change seems to work for my requirements.
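An alternative to switching the whole page to windows-1252 is to convert the CSV data to UTF-8 as it is read. A minimal sketch, assuming the file really is Windows-1252; the function name is hypothetical.

```php
<?php
// Hypothetical helper: convert one line read from a Windows-1252 file to UTF-8.
// The source encoding is an assumption; mb_convert_encoding() cannot verify it.
function cp1252_to_utf8(string $line): string
{
    return mb_convert_encoding($line, 'UTF-8', 'Windows-1252');
}

// "\xF1" is ñ in Windows-1252.
echo cp1252_to_utf8("Espa\xF1a"), "\n";
```

With this approach the page, the header and default_charset can all stay on UTF-8.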
It's important to check that your source file is also encoded as UTF-8 (you can see this property in most text and code editors).
Because there is only one symbol (the black square), it's probable that you are using ISO-8859-1 or ISO-8859-15.
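One way to confirm this is to test whether the bytes are valid UTF-8. A small sketch with hand-written sample bytes (the sample strings are assumptions standing in for your real data):

```php
<?php
// A lone 0xE1 byte is "á" in ISO-8859-1 but is not a valid UTF-8 sequence,
// so a browser told to expect UTF-8 shows the replacement character instead.
$latin1 = "s\xE1bado";      // ISO-8859-1 bytes
$utf8   = "s\xC3\xA1bado";  // the same word encoded as UTF-8

var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// Dumping the bytes shows what is really in the string.
echo bin2hex($latin1), "\n";
```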
Can you check that the content is correct in the database table? Look at it with phpMyAdmin, for example. If it is, make sure your PHP files are UTF-8 encoded; take a look at your IDE/editor configuration.
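Use UTF-8 (or Windows-1252 if your data really is in that encoding). Note that utf8mb4 is the name of MySQL's full UTF-8 character set; it is not a valid charset name for default_charset, the Content-Type header or the meta tag, where the name is utf-8.
ini_set('default_charset', 'utf-8');
or
header('Content-Type: text/html; charset=utf-8');
then use the meta tag:
<meta charset="utf-8">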
Related
I have string that looks like this "v\u00e4lkommen till mig" that I get after doing utf8_encode() on the string.
I would like that string to become
v&auml;lkommen till mig
where the character
\u00e4 = ä = &auml;
How can I achieve this in PHP?
Do not use utf8_(de|en)code. They only convert between ISO-8859-1 and UTF-8. ISO-8859-1 does not provide the same characters as ISO-8859-15 or Windows-1252, which are the most commonly used encodings (besides UTF-8). Better to use mb_convert_encoding.
"v\u00e4lkommen till mig" > This string looks like a JSON-encoded string, which is already UTF-8 encoded. The Unicode code point of "ä" is U+00E4 >> \u00e4.
Example
<?php
header('Content-Type: text/html; charset=utf-8');
$json = '"v\u00e4lkommen till mig"';
var_dump(json_decode($json)); //It will return a utf8 encoded string "välkommen till mig"
What is the source of this string?
There is no need to replace the ä with its HTML representation &auml; if you print it in a UTF-8 encoded document and tell the browser which encoding is used. If it is necessary, use htmlentities:
<?php
$json = '"v\u00e4lkommen till mig"';
$string = json_decode($json);
echo htmlentities($string, ENT_COMPAT, 'UTF-8');
Edit: Since you want to keep HTML characters, and I now think your source string isn't quite what you posted (I think it is actual unicode, rather than containing \unnnn as a string), I think your best option is this:
$html = str_replace( array( '&lt;', '&gt;', '&amp;' ), array( '<', '>', '&' ), htmlentities( $whatever ) );
(note: no call to utf8_decode)
Original answer:
There is no direct conversion. First, decode it again:
$decoded = utf8_decode( $whatever );
then encode as HTML:
$html = htmlentities( $decoded );
and of course you can do it without a variable:
$html = htmlentities( utf8_decode( $whatever ) );
http://php.net/manual/en/function.utf8-decode.php
http://php.net/manual/en/function.htmlentities.php
To do this by regular expression (not recommended, likely slower, less reliable), you can use the fact that HTML supports &#xnnnn; constructs, where the nnnn is the same as your existing \unnnn values. So you can say:
$html = preg_replace( '/\\\\u([0-9a-f]{4})/i', '&#x$1;', $whatever );
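As a quick illustration of the regular-expression route, here is a sketch that wraps the same pattern in a function and then decodes the result. The function name is made up, and it only handles four-digit escapes (surrogate pairs are not combined), so json_decode is still the more robust choice.

```php
<?php
// Hypothetical helper: turn literal \uXXXX escapes into HTML numeric entities.
// Only four-digit escapes are handled; surrogate pairs are not combined.
function unicode_escapes_to_entities(string $text): string
{
    return preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x$1;', $text);
}

$raw = 'v\u00e4lkommen till mig';   // single quotes: the backslash stays literal
$entities = unicode_escapes_to_entities($raw);
echo $entities, "\n";               // v&#x00e4;lkommen till mig

// Optional second step: turn the entities into real UTF-8 characters.
echo html_entity_decode($entities, ENT_QUOTES, 'UTF-8'), "\n";
```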
The html_entity_decode worked for me.
$json = '"v\u00e4lkommen till mig"';
echo $decoded = html_entity_decode( json_decode($json) );
I'm storing text in a DB as UTF-8.
When a post is sent via JS to my API, symbols such as ö come back as "Ã¶"
My website html is declared as
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
My API output is sent out with a header declaring utf-8, like so:
$status_header = 'HTTP/1.1 '.$status.' '.self::getStatusCodeMessage($status);
header($status_header);
header('Content-type: ' . $content_type.'; charset=utf-8');
if ($body !== '') {
echo $body;
The only way I've managed to get round this is by using PHP on my output to do this:
private static function fixText($text) {
$replaceChars = array(
"â€œ" => "\"",
'â€¢' => '·',
"â€" => "\"",
"â€™" => "'",
'Ã¶' => 'ö',
'â€' => "'",
"Ã©" => "é",
"Ã«" => "ë",
"Â£" => "£"
);
foreach($replaceChars as $oldChar => $newChar) {
$text = str_replace($oldChar, $newChar, $text);
}
$text = iconv("UTF-8", "UTF-8//IGNORE", $text);
return $text;
}
Obviously this is not ideal as I have to keep adding more and more symbols to the map.
UPDATE:
A developer had sneakily added this code:
$document->text = mb_convert_encoding($document->text, mb_detect_encoding($document->text), "cp1252");
As a way to overcome old latin characters coming through damaged.
Seeing those funny characters means that you have double-encoded UTF-8 stored. You don't show how you are adding data to the database. If you use utf8_encode() on strings that are already UTF-8 encoded, this will be the result.
MongoDB only accepts UTF-8, but you should not encode the data yourself again if you are already getting UTF-8 sent through to you by the webserver.
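The effect is easy to reproduce. The sketch below uses mb_convert_encoding() in place of utf8_encode() (which is deprecated as of PHP 8.2) to encode an already-UTF-8 "ö" a second time, and then reverses that one extra step. It assumes the data went through exactly one extra round of encoding.

```php
<?php
// Reproduce double encoding: treat UTF-8 bytes as ISO-8859-1 and encode them again.
$original = "\u{00F6}";                                             // ö as UTF-8 (bytes C3 B6)
$double   = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');  // now displays as two characters

echo bin2hex($original), ' -> ', bin2hex($double), "\n";           // c3b6 -> c383c2b6

// Reversing the extra step recovers the original text.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original);                                  // bool(true)
```

Repairing stored data this way is only a stopgap; the real fix is to stop the second encoding where the data is written.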
Instead of:
header('Content-type: ' . $content_type.'; charset=utf-8');
Consider setting the default charset in php.ini:
default_charset=UTF-8
Actually, I have googled a lot and explored this forum too, but this is my second day and I could not find the solution.
My problem is that I want to convert the HTML codes
&#1576;&#1575;&#1582;
to their equivalent Unicode characters
خ ا ب
Actually, I do not want to convert all the HTML symbols to Unicode characters. I only want to convert the Arabic/Urdu HTML codes to Unicode characters. The range of these characters is from &#1563; to &#1785;. If there is no PHP function for this, how can I replace the codes with their equivalent Unicode characters in one go?
I think you're looking for:
html_entity_decode('&#1576;&#1575;&#1582;', ENT_QUOTES, 'UTF-8');
When you go from &#1576; to ب, that's called decoding. Doing the opposite is called encoding.
As for replacing only characters from &#1563; to &#1785;, maybe try something like this.
<?php
// Random set of entities, two are outside the 1563 - 1785 range.
$entities = '&#1563;&#60;&#1604;&#241;&#1784;&#1785;';
// Matches entities from 1500 to 1799, not perfect, I know.
preg_match_all('/&#1[5-7][0-9]{2};/', $entities, $matches);
$entityRegex = array(); // Will hold the entity code regular expression.
$decodedCharacters = array(); // Will hold the decoded characters.
foreach ($matches[0] as $entity)
{
// Convert the entity to human-readable character.
$unicodeCharacter = html_entity_decode($entity, ENT_QUOTES, 'UTF-8');
array_push($entityRegex, "/$entity/");
array_push($decodedCharacters, $unicodeCharacter);
}
// Replace all of the matched entities with the human-readable character.
$replaced = preg_replace($entityRegex, $decodedCharacters, $entities);
?>
That's as close as I can get to solving this. Hopefully, this helps a little. It's 5:00am where I am now, so I'm off to sleep! :)
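If the range needs to be exact, another option is to check the number inside each entity in a callback. This is only a sketch under the assumption that the entities are decimal (&#NNNN;); the function name and its default bounds (taken from the range in the question) are mine.

```php
<?php
// Hypothetical helper: decode only decimal entities whose code point lies in [$min, $max].
// The default bounds are the 1563-1785 range mentioned in the question.
function decode_entities_in_range(string $html, int $min = 1563, int $max = 1785): string
{
    return preg_replace_callback(
        '/&#(\d+);/',
        function (array $m) use ($min, $max): string {
            $code = (int) $m[1];
            // Entities outside the range are returned exactly as they were.
            return ($code >= $min && $code <= $max)
                ? html_entity_decode($m[0], ENT_QUOTES, 'UTF-8')
                : $m[0];
        },
        $html
    );
}

// The first three entities are in range; &#241; and &lt; are left untouched.
echo decode_entities_in_range('&#1576;&#1575;&#1582; &#241; &lt;'), "\n";
```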
Did you try the UTF-8 encoding in the HTML head?
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
try this
<?php
$trans_tbl = get_html_translation_table(HTML_ENTITIES);
foreach($trans_tbl as $k => $v)
{
$ttr[$v] = utf8_encode($k);
}
$text = 'بب....;خ';
$text = strtr($text, $ttr);
echo $text;
?>
For a MySQL solution, you can set the connection character set like this:
$mysqli = new mysqli($host, $user, $pass, $db);
if (!$mysqli->set_charset("utf8")) {
die("error");
}
I'm trying to move over some fish species information profiles from a bespoke CMS using latin1 charset to a WordPress customised (custom post type, with numerous meta fields) database which uses UTF-8.
On top of that, the old CMS uses some odd bbCode bits.
Basically, I'm looking for a function which will do this:
Take information from my old database with latin1_swedish_ci collation (and latin1 charset)
Convert all of the non-standard characters (we have characters from languages including, but not limited to, Croatian, Czech, Spanish, French and German) to HTML entities such as &aacute; (numeric ones like &#134; are fine too).
Convert all of the bbCode (see below) to HTML
Convert ' and " to HTML entities
Return the information with utf-8 charset to my new database
The bbCode to and from are:
$search = array( '[i]', '[/i]', '[b]', '[/b]', '[pl]', '[/pl]' );
$replace = array( '<i>', '</i>', '<strong>', '</strong>', '', '' );
The function that I've tried so far is:
$search = array( '[i]', '[/i]', '[b]', '[/b]', '[pl]', '[/pl]' );
$replace = array( '<i>', '</i>', '<strong>', '</strong>', '', '' );
function _convert($content) {
if(!mb_check_encoding($content, 'UTF-8')
OR !($content === mb_convert_encoding(mb_convert_encoding($content, 'UTF-32', 'UTF-8' ), 'UTF-8', 'UTF-32'))) {
$content = mb_convert_encoding($content, 'UTF-8');
if (mb_check_encoding($content, 'UTF-8')) {
return $content;
} else {
echo "<p>Couldn't convert to UTF-8.</p>";
}
}
}
function _clean($content) {
$content = _convert( $content );
/* edited out because otherwise all HTML appears as &lt;html&gt; rather than <html>
//$content = htmlentities( $content, ENT_QUOTES, "UTF-8" );
$content = str_replace( $search, $replace, $content );
return $content;
}
However this is stopping some fields from being imported to the new database and isn't replacing the bbCode.
If I use the following code, it mostly works:
$var = str_replace( $search, $replace, htmlentities( $row["var"], ENT_QUOTES, "UTF-8" ) );
However, certain fields containing what I think are Czech/Croatian characters don't appear at all.
Does anyone have any suggestions for how I can, in the order listed above, successfully convert the information from the "old format" to the new?
I would say that if you want to convert all your non-ASCII characters to entities, you won't need to do any latin1 to UTF-8 conversion whatsoever. If you run htmlentities on your data (passing the source encoding, e.g. 'ISO-8859-1'), the non-ASCII characters that have a named entity will be replaced with their corresponding entity code. Note that htmlspecialchars only escapes &, <, > and quotes, so it does not do this.
Basically, after this step, there shouldn't be any characters left that need conversion to UTF-8. Also, if you wanted to convert your latin1-encoded string into UTF-8, I strongly suspect utf8_encode will do just fine.
PS. When it comes to converting bbCode into HTML, I would recommend using regular expressions instead. For example, you could do it all in one line like this:
$html_data = preg_replace('#\[(/?[a-z]+)\]#i', '<$1>', $bb_code_data);
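Putting the steps from the question in order, a conversion function could look roughly like the sketch below. It assumes the old data really is ISO-8859-1 (latin1); the function name is hypothetical. The bbCode arrays are defined inside the function because variables declared outside a function are not visible inside it without `global`.

```php
<?php
// Hypothetical helper following the order of steps listed in the question.
// Assumes $latin1 really holds ISO-8859-1 bytes as read from the old database.
function convert_profile_field(string $latin1): string
{
    // bbCode tags and their replacements, as listed in the question.
    $search  = array('[i]', '[/i]', '[b]', '[/b]', '[pl]', '[/pl]');
    $replace = array('<i>', '</i>', '<strong>', '</strong>', '', '');

    // 1. latin1 -> UTF-8
    $utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    // 2. accented characters and quotes -> HTML entities
    $escaped = htmlentities($utf8, ENT_QUOTES, 'UTF-8');
    // 3. bbCode -> HTML, done last so the new tags are not escaped
    return str_replace($search, $replace, $escaped);
}

// "\xE9" is é in ISO-8859-1.
echo convert_profile_field("[i]Danio[/i] caf\xE9"), "\n";
```

Doing the bbCode replacement after htmlentities() avoids the problem mentioned in the question where the generated HTML tags end up escaped.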
I'm trying to replace string "Red Dwarf (TV Series 1988â€") - IMDb" to "Red Dwarf (TV Series 1988') - IMDb"
I have a translation table of these funny characters in an array. I tried to replace them using: str_replace but it did not work. Can anybody suggest a workaround on this? This is the snippet of the code:
function replaceFunnyChar( $input ){
$translation = array(
'â€™' => "'",
"â€\"" => '-',
'Ã©' => 'é',
'Ã¨' => 'è',
'â€œ' => '"',
'â€' => '"',
'â€˜' => "'",
'â' => 'ã',
'Ã"' => 'ä',
'â€"' => '–',
'Ä«' => 'ī',
'é˜´' => '阴',
'é™°' => '陰',
"é˜³" => "阳",
"é™½" => "陽",
'´' => "'",
'Ã¼' => 'ü',
"Ã,Ã'" => "'",
'â€¢' => '–'
);
foreach( $translation as $find => $replace ){
$output = str_replace($find, $replace, $input );
//$output = preg_replace("/" . $find . "/", $replace, $input );
}
return $output;
}
It is best to detect the encoding of the data you have (if you are scraping, then it is in the HTTP header, and overridden by the meta tag in the HTML), then you can use something such as Iconv to convert it: http://php.net/manual/en/book.iconv.php
If the data you get is UTF-8, you don't actually need to convert it. Just store it and make sure your DBMS is set up to support UTF-8. Then when displaying the data again, make sure you specify UTF-8 on your webpage.
If you are using Windows command line to show the characters, it is a little more complicated as Windows command line doesn't use UTF-8. Try Ubuntu or Mac OS X.
Also, if you already have the data but cannot download it again, then you need to make sure how you show the characters -- if shown on a webpage, then the web browser can further mess up the characters if it uses a different encoding than what it is supposed to be. You can also dump the bytes out, and replace the string using the byte sequence instead of quoted string as in the original code.
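To illustrate the byte-sequence idea, here is a minimal sketch with three sample mappings written as byte escapes. The function name and the choice of mappings are assumptions, so extend the map to whatever your dump of the bytes shows. Note also that each replacement has to be applied to the result of the previous one; as far as I can tell, the loop in the question always starts again from $input, so only the last replacement survives.

```php
<?php
// Hypothetical helper: replace known mojibake byte sequences in a single pass.
// Keys are written as byte escapes so they do not depend on this file's encoding.
function replace_mojibake(string $input): string
{
    $map = array(
        "\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2" => "'",        // mis-decoded right single quote
        "\xC3\xA2\xE2\x82\xAC\xE2\x80\x9C" => '-',        // mis-decoded en dash
        "\xC3\x83\xC2\xA9"                 => "\u{00E9}", // mis-decoded e-acute
    );
    // strtr() with an array tries the longest keys first and never rescans its own output.
    return strtr($input, $map);
}

echo replace_mojibake("Red Dwarf (TV Series 1988\xC3\xA2\xE2\x82\xAC\xE2\x80\x9C) - IMDb"), "\n";
```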
Off the top of my head, that's a decoding error; you can probably get rid of it by playing around with the charsets for a while.
Anyhow, you can also just drop every character above ASCII 127:
function _dropAsciOver127($entity){
if(($asciCode = ord($entity[0])) > 127){
return '';
}else{
return $entity[0];
}
}
$weird = 'Red Dwarf (TV Series 1988â€") - IMDb';
$cool = preg_replace_callback('/[^\w\d ]/i','_dropAsciOver127', $weird);
print $cool; // prints Red Dwarf (TV Series 1988") - IMDb
I think your problem is your charset, and a solution is to save the document as UTF-8 (without BOM) in your text editor. Otherwise, you can declare the charset in your page, like this:
HTML
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
PHP
header('Content-type: text/html; charset=utf-8');
Remember to send the header at the top of the page, before any output! If you are still having problems with the charset, then try changing it from UTF-8 to an ISO charset (e.g. ISO-8859-1) or something like that.
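Make sure of these things:
1: The table collation is UTF-8
2: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If it is still not working, try this before you add data to the database (note that the mysql_* functions were removed in PHP 7, where mysqli_set_charset() is the equivalent):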
mysql_set_charset('utf8');