I collect a string from an HTML form and pass it to an external SMS API that converts everything to UTF-8. I am having a hard time with special characters displaying incorrectly after the conversion. I have <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> set in the HTML head, and I have tried accept-charset="UTF-8" on the form element as well as various mb_detect_encoding() and iconv() calls, with no luck. For example:
$text = "¢";
$new = iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);
echo $new;
echo utf8_encode($new);
The cent sign always ends up looking like Â¢ after being re-encoded to UTF-8 by the external API I use, no matter what I try. This is just code I use for testing: if utf8_encode() echoed ¢ instead of Â¢, my problem would be solved. The result is fine in HTML. The problem is that it is being sent via SMS, so when recipients receive the text message the symbol looks like Â¢ instead of ¢.
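A minimal sketch of how this kind of double encoding can arise, assuming the form already delivers UTF-8 and the mbstring extension is available. The helper name to_utf8_once() is hypothetical, and whether the SMS API behaves this way is an assumption, not something the question confirms.

```php
<?php
// The cent sign as UTF-8 bytes (0xC2 0xA2), written as escapes so the
// example does not depend on the encoding this file is saved in.
$utf8 = "\xC2\xA2";

// Treating those bytes as ISO-8859-1 and converting again double-encodes them.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n"; // c382c2a2, which a UTF-8 reader shows as "Â¢"

// Hypothetical helper: convert only when the input is not already valid UTF-8.
function to_utf8_once(string $s, string $fallback = 'ISO-8859-1'): string
{
    return mb_check_encoding($s, 'UTF-8')
        ? $s
        : mb_convert_encoding($s, 'UTF-8', $fallback);
}

echo bin2hex(to_utf8_once($utf8)), "\n";  // c2a2 (left unchanged)
echo bin2hex(to_utf8_once("\xA2")), "\n"; // c2a2 (converted from ISO-8859-1)
```

If the API always re-encodes from ISO-8859-1, sending it single-byte input may be the only workaround; that depends on the API and is not verified here.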
I've got a problem with some special characters in PHP. I have a table in MySQL (utf8_hungarian_ci) that contains text with special characters like á, á, Ó, Ö, ö, ü, and I would like to show this text on my page. I've tested:
$text = htmlentities($text); //to convert the simple spec chars
$search = array("&otilde;","&Otilde;","&ucirc;","&Ucirc;");
$replace = array("&#337;","&#336;","&#369;","&#368;");
$text = str_replace($search, $replace, $text);
echo $text;
But this code works only if $text isn't set from the database. If I use this code and $text is selected from the database, it doesn't show any text, and if I only use:
echo $text; // without htmlentities() and the replacements
I get characters like this one: �
I know there have been some questions about this and I have tried the accepted answers, but it still doesn't work, so please help me if you have the time. Thank you anyway. A good day to you all!
Also try declaring UTF-8 in the HTTP header.
In your PHP file, add
header('Content-type: text/html; charset=utf-8');
in addition to specifying UTF-8 in your <meta> tag, so that the browser is told the encoding both ways. See if that fixes the issue.
The meta tag would look like this:
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
...
</head>
Edit:
If you have access to Apache configuration, see if AddDefaultCharset is set to another encoding.
Try using mysql_set_charset() (mysqli_set_charset() if you're using MySQLi).
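A small sketch of why the page can come out blank, assuming (the question does not confirm this) that the connection hands PHP ISO-8859-1 bytes while htmlentities() expects UTF-8. The sample bytes are illustrative.

```php
<?php
// "á" as a single ISO-8859-1 byte, which is not valid UTF-8 on its own.
$latin1 = "\xE1";

// With UTF-8 as the charset and neither ENT_SUBSTITUTE nor ENT_IGNORE set,
// invalid input makes htmlentities() return an empty string.
$fromLatin1 = htmlentities($latin1, ENT_QUOTES, 'UTF-8');
var_dump($fromLatin1); // string(0) ""

// The same character as valid UTF-8 bytes converts normally.
$fromUtf8 = htmlentities("\xC3\xA1", ENT_QUOTES, 'UTF-8');
var_dump($fromUtf8); // string(8) "&aacute;"
```

Setting the connection charset as suggested above should make the rows arrive as UTF-8, so the second case applies.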
Try putting this in your HTML head:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
(Also, you may need to save your file with UTF-8 file encoding.)
Secondly, you could use this to try to transliterate or remove the offending character that always prints out in your case:
$str_out = @iconv("ISO-8859-1", "UTF-8//TRANSLIT//IGNORE", $str_in);
This is a slightly generic answer, but please read this article I wrote on common character-encoding pitfalls in the PHP/MySQL stack, and if you still have problems let's try to work through them.
http://webmonkeyuk.wordpress.com/2011/04/23/how-to-avoid-character-encoding-problems-in-php/
I have the following text which I manually enter into the wordpress posts table
‚
I encode into utf-8 using:
$text = "‚";
$enc = mb_detect_encoding($text, "UTF-8,ISO-8859-1");
$hotelDescription = iconv($enc, "UTF-8", $text);
However, when WordPress echoes it, it displays
‚
Any ideas how I can output the correct characters?
You need to make sure the page displaying that string is rendered using UTF-8 encoding. The output you posted is the ISO-8859-1 interpretation of that UTF-8 string. Assuming the data is stored correctly in the database as UTF-8, ensure the page where this string is rendered has the following meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
Looks like you're missing the & before #226;
My PHP script parses a web site and pulls out an HTML DIV that looks like this (and saves it as a string)
<div id="merchantinfo">The following merchants: Nautica®, Brookstone®, Teds® ©2012 Blabla</div>
I store this as $merchantList (string).
However, when I output the data to the webpage
echo $merchantList;
The encoding gets messed up and displays as:
NauticaÂ®, BrookstoneÂ®, TedsÂ® Â©2012 Blabla
I tried adding the following to the display page:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
But that didn't do anything. Thanks.
EDIT:
For the question, the accepted answer is correct.
But I realized my actual issue was slightly different.
The initial parsing using DOMDocument::loadHTML had already mangled the UTF-8 encoding, causing the string to save as
<div id="merchantinfo">The following merchants: Nauticaî, Brookstoneî, Tedsî ©2012 Blabla</div>
This was solved by:
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
$dom->loadHTML($html);
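The fix above can be sketched end to end. Because the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated in recent PHP versions, this sketch swaps in mb_encode_numericentity(), which has a similar effect; it assumes the dom and mbstring extensions are loaded, and the sample markup is illustrative.

```php
<?php
// Sample markup with the registered and copyright signs as UTF-8 bytes.
$html = "<div id=\"merchantinfo\">Nautica\xC2\xAE \xC2\xA92012</div>";

// Conversion map [start, end, offset, mask]: turn every code point from
// U+0080 upward into a numeric entity so the markup becomes pure ASCII.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($html, $map, 'UTF-8');
echo $encoded, "\n";

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($encoded);

// DOMDocument hands text back as UTF-8.
$text = $dom->getElementsByTagName('div')->item(0)->textContent;
echo bin2hex($text), "\n";
```

Since the parser only ever sees ASCII, the result does not depend on how libxml guesses the input encoding.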
Use:
ini_set('default_charset', 'UTF-8');
And do not use iso-8859-1. Use UTF-8.
From the mojibake you posted the input string is utf-8, not iso-8859-1.
You just need to use the htmlspecialchars_decode() function. Example:
$string = '&quot;hello dude&quot;';
$decodechars = htmlspecialchars_decode($string);
echo $decodechars; // output: "hello dude"
I have some problems with character conversion in the header of my PHP page.
I have to develop a snippet of code that, by means of a web service (XML-RPC protocol), interfaces with another snippet of code written in Python.
This is the Python snippet's output:
Output={'metaTagKeyWords': '', 'metaTagTitle': '10% DISCOUNT FOR 3 NIGHTS','metaTagDescription': 'Questa \xc3\xa8 una prova: devo vedere che succede.\r\n\r\nProva prova.\r\n\r\nDaje.\r\n\r\nENGLISH VERSION !!!!\r\n'}
So I have to convert some characters: first of all \xc3\xa8, which is the UTF-8 byte sequence for "è", and secondly the "\r\n" characters.
I know how to proceed with the "\r\n" characters, but I don't know how to convert the UTF-8 bytes.
I have already tried something like this:
htmlentities($data[$META_TITLE_KEY], ENT_QUOTES, 'UTF-8')
But it didn't work.
Moreover, I had already tried converting the string to UTF-8 in Python (so that the entity would be u'\xc3' or something like that), but the results are pretty much the same.
Additional info: the converted text has to be used in the head of the PHP file, in the meta description tag.
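For reference, a sketch of what the attempted conversion produces when the incoming bytes really are UTF-8, which is an assumption here (the later edits suggest they may not be). The variable names and the shortened sample text are illustrative.

```php
<?php
// Sample value using the same bytes as the Python output (\xc3\xa8 is "è").
$raw = "Questa \xC3\xA8 una prova: devo vedere che succede.\r\n\r\nProva prova.\r\n";

// Guard: htmlentities() returns an empty string for invalid UTF-8 input
// when neither ENT_SUBSTITUTE nor ENT_IGNORE is passed.
if (!mb_check_encoding($raw, 'UTF-8')) {
    $raw = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

// Replace line breaks with spaces, then collapse runs of spaces.
$flat = str_replace(["\r\n", "\r", "\n"], ' ', $raw);
$flat = trim(preg_replace('/ {2,}/', ' ', $flat));

// Escape for use inside a content="..." attribute.
$meta = htmlentities($flat, ENT_QUOTES, 'UTF-8');
echo $meta, "\n"; // Questa &egrave; una prova: devo vedere che succede. Prova prova.
```

If this prints an empty string or mojibake with the real data, the bytes arriving in PHP are probably not the UTF-8 sequence shown in the Python output.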
EDIT1:
It seems that what we believed to be UTF-8 is actually Latin-1. So, if I change this part of the header:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
to
<meta http-equiv="content-type" content="text/html;charset=ISO-8859-1" />
it works.
But I need a UTF-8 charset, so I suppose I have to do something in the Python application logic (because when I go from the editor to the DB I encode something, while when I return from the DB to the editor I decode something).
Stay tuned for more info.
EDIT2:
Maybe some function that I use to save my data to the Postgres DBMS converts the data to Latin-1 and then to UTF-8. So, if I add this instruction:
d_meta[element] = codeDbToEditor(d_meta[element]).replace('\r\n', ' ').decode('latin-1')
everything seems to work.
Have I had the right "inspiration"?
$str="Hello Loréane";
echo utf8_encode($str);
Hope It Helps
I'm trying to read the source code of a web page that contains Arabic text, but all I am getting is this: &#1580;&#1575;&#1605;&#1593;&#1577; (which is not Arabic, only a group of characters).
If I reload the page on my localhost, I get the Arabic tags and text correctly.
But I really need to read that source code. Any suggestions or lines of code I can add?
<html dir=rtl>
<META http-equiv=Content-Type content=text/html;charset=windows-1256>
These are a few lines from the page that show the encoding used. The page is written using HTML and PHP.
The characters are merely escaped to HTML entities. The browser decodes them to "real characters" when it renders the page. You can decode them yourself using html_entity_decode:
html_entity_decode('&#1580;&#1575;&#1605;&#1593;&#1577;', ENT_COMPAT, 'UTF-8')
Note the last parameter, which sets the encoding the characters will be decoded to. Use whatever encoding you're working with internally, I'm just suggesting UTF-8 here.
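A runnable sketch of that decoding step, assuming the page source contains the Arabic word as decimal numeric character references (the references below are written out from the word's code points) and that the mbstring extension is available for the length check.

```php
<?php
// The word as decimal numeric character references, as it might appear in
// the page source.
$escaped = '&#1580;&#1575;&#1605;&#1593;&#1577;';

// Decode the references into UTF-8 text.
$decoded = html_entity_decode($escaped, ENT_COMPAT, 'UTF-8');

echo $decoded, "\n";                      // the Arabic word, as UTF-8
echo mb_strlen($decoded, 'UTF-8'), "\n";  // 5 characters
echo bin2hex($decoded), "\n";             // d8acd8a7d985d8b9d8a9
```

To display the result in a terminal or page, the output side also has to be set to UTF-8; otherwise the decoded bytes will look garbled again.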