I have a database with data in windows-1253 encoding.
I'm trying to convert it to UTF-8 with the iconv function and display it on a page, but I get characters like these: g óôçí åðüìåíç ôáéíßá ôïõ
Any thoughts?
This is the code I use
iconv(mb_detect_encoding($this->row["question"], mb_detect_order(), true),"UTF-8",htmlentities(stripslashes($this->row["question"])))
If you know the encoding is windows-1253, then simply try to use:
iconv('Windows-1253','UTF-8', $text);
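A minimal sketch of the order of operations, assuming the column really holds Windows-1253 bytes: convert first, then escape for HTML. The helper name greek_to_html is hypothetical. In the question's code, htmlentities() runs before iconv(), so the bytes are interpreted under the wrong charset before any conversion happens.

```php
<?php
// Hypothetical helper: convert Windows-1253 bytes to UTF-8, then escape for HTML.
function greek_to_html(string $raw): string
{
    // Convert first, using the known source encoding instead of detection.
    $utf8 = iconv('Windows-1253', 'UTF-8', $raw);
    if ($utf8 === false) {
        // iconv() returns false on failure; this sketch falls back to an empty string.
        return '';
    }
    // Escape only the HTML-special characters, declaring the charset explicitly.
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

// "\xE1\xE2\xE3" are the Windows-1253 bytes for the Greek letters alpha, beta, gamma.
echo greek_to_html("\xE1\xE2\xE3 <b>"), "\n";
```

Using htmlspecialchars() instead of htmlentities() is a design choice here: once the page is served as UTF-8, the Greek letters do not need to be turned into entities.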
I'm trying to automatically convert imported IPTC metadata from images to UTF-8 for storage in a database based on the PHP mb_ functions.
Currently it looks like this:
$val = mb_convert_encoding($val, 'UTF-8', mb_detect_encoding($val));
However, when mb_detect_encoding() is supplied an ASCII string (special characters in the Latin1-fields from 192-255) it detects it as UTF-8, hence in the following attempt to convert everything to proper UTF-8 all special characters are removed.
I tried writing my own method by looking for Latin1 values, and if none occurred I would let mb_detect_encoding decide what it is. But I stopped midway when I realized that I can't be sure that other encodings don't use the same byte values for other things.
So, is there a way to properly detect ASCII to feed to mb_convert_encoding as the source encoding?
Specifying a custom order, where ASCII is detected first, works.
mb_detect_encoding($val, 'ASCII,UTF-8,ISO-8859-15');
For completeness, the list of available encodings is at http://www.php.net/manual/en/mbstring.supported-encodings.php
You can specify the source encoding explicitly
$val = mb_convert_encoding($val, 'UTF-8', 'ASCII');
EDIT:
$val = mb_convert_encoding($val, 'UTF-8', 'auto');
If you do not want to worry about what encodings you will allow, you can add them all
$encoding = mb_detect_encoding($val, implode(',', mb_list_encodings()));
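Since byte-level detection is a heuristic, an alternative that avoids guessing is to validate the string as UTF-8 and only convert when validation fails. The sketch below assumes the non-UTF-8 input is ISO-8859-1; the helper name to_utf8 and the fallback parameter are hypothetical.

```php
<?php
// Hypothetical helper: keep valid UTF-8 as-is, otherwise convert from an assumed encoding.
function to_utf8(string $val, string $fallback = 'ISO-8859-1'): string
{
    // Pure ASCII is also valid UTF-8, so it passes this check unchanged.
    if (mb_check_encoding($val, 'UTF-8')) {
        return $val;
    }
    // Not valid UTF-8: treat the bytes as the assumed fallback encoding.
    return mb_convert_encoding($val, 'UTF-8', $fallback);
}

// "caf\xE9" is "café" in ISO-8859-1; the lone 0xE9 byte is not valid UTF-8.
echo to_utf8("caf\xE9"), "\n";
```

This only works when the set of possible source encodings is known in advance, which is usually the case for a fixed import pipeline.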
I'm having a problem with PHP's htmlentities and the é character. I know it's some sort of encoding issue I'm just overlooking, so hopefully someone can see what I'm doing wrong.
Running a straight htmlentities("é") does not return the correct code as expected (either &eacute; or &#233;). I've tried forcing the charset to be 'UTF-8' (using the charset parameter of htmlentities) but I get the same thing.
The ultimate goal is to have this character sent in an HTML email encoded in 'ISO-8859-1'. When I try to force it into that encoding, same issue. In the source of the email, you see é, and in the HTML view é.
Who can shed some light on my mistake?
// I assume that your page is utf-8 encoded
header("Content-type: text/html;charset=UTF-8");
$in_utf8encoded = "é à ù è ò";
// first you need to convert the string to the charset you want...
$in_iso8859encoded = iconv("UTF-8", "ISO-8859-1", $in_utf8encoded);
// ...in order to make htmlentities work with the same charset
$out_iso8859= htmlentities($in_iso8859encoded, ENT_COMPAT, "ISO-8859-1");
// then only to display in your page, revert it back to utf-8
echo iconv("ISO-8859-1", "UTF-8", $out_iso8859);
I have added htmlspecialchars for you to see that it is really encoded
http://sandbox.phpcode.eu/g/11ce7/4
<?PHP
echo htmlspecialchars(htmlentities("é", ENT_COMPAT | ENT_HTML401, "UTF-8"));
I suggest you take a look at http://php.net/html_entity_decode . You can use this in the following way:
$eacute = html_entity_decode('&eacute;',ENT_COMPAT,'iso-8859-1');
This way you don't have to care about the encoding of the php file.
I fixed it with
$string = htmlentities($string,ENT_QUOTES | ENT_SUBSTITUTE,"ISO-8859-1");
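The answers above share one idea: the charset passed to htmlentities() has to match the encoding of the bytes it receives. A small sketch of that, using explicit byte escapes so the result does not depend on how the PHP file itself is saved (the variable names are only illustrative):

```php
<?php
// UTF-8 bytes for "é" (U+00E9).
$utf8 = "\xC3\xA9";
// The same character as a single ISO-8859-1 byte.
$latin1 = "\xE9";

// Charset argument matches the actual bytes: both calls yield the named entity.
$fromUtf8 = htmlentities($utf8, ENT_QUOTES, 'UTF-8');
$fromLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// Mismatch: UTF-8 bytes read as ISO-8859-1, so each byte becomes its own entity.
$mismatch = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');

echo $fromUtf8, "\n", $fromLatin1, "\n", $mismatch, "\n";
```

The entity text is plain ASCII, so it can be placed in an ISO-8859-1 email body without further conversion.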
If you have stored the special characters as é, then you could use the following soon after making connection to the database.
mysql_set_charset('utf8', $dbHandler);
With this, you now don't need to use htmlentities while displaying data.
How do I convert a PHP value from windows-1257 to UTF-8? I tried many ways, but none of them worked. I have lttu�s and I want to convert it to littūs.
utf8_encode();
iconv_set_encoding("windows-1257", "UTF-8");
mb_convert_encoding()
Doesn't work. :(
Can anybody help me?
$encoded = iconv("CP1257", "UTF-8", $string);
Use mb_convert_encoding($data, 'UTF-8', 'ISO-8859-13'); ISO-8859-13 is close to Windows-1257 but not identical, so check the output on your own data.
Have you checked that the page you are using to display the converted string has the Encoding and CodePage set correctly?
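A short sketch of why the source encoding matters here, assuming the input really is Windows-1257 and that the local iconv build knows the name CP1257. utf8_encode() always treats its input as ISO-8859-1, which is emulated below with mb_convert_encoding(); the sample string is made up for illustration.

```php
<?php
// 0xFB is "ū" (U+016B) in Windows-1257 but "û" (U+00FB) in ISO-8859-1.
$raw = "litt\xFBs";

// Correct source encoding: yields UTF-8 "littūs".
$right = iconv('CP1257', 'UTF-8', $raw);

// Wrong source encoding (what utf8_encode() assumes): yields "littûs".
$wrong = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Tell the browser which encoding the page uses (only possible before any output).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo $right, "\n";
```

If the converted string is correct but the page still shows replacement characters, the declared page charset is the next thing to check.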
I have a UTF-8 encoded txt file and I want to import it into a latin1_general_ci table.
The problem is that some characters display as ? in the database and not as they are supposed to.
I tried mb_convert_encoding($str, "ISO-8859-1", "UTF-8"); but that didn't do anything.
What am I doing wrong?
Latin1 doesn't include all Unicode characters. You can use iconv() with //TRANSLIT option to transliterate unknown characters to their closest latin1 equivalents:
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text)
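A rough sketch of how this could be wrapped, with a check that reports whether a string contains anything outside Latin1 before importing it. Both helper names are hypothetical, and what //TRANSLIT substitutes for an unsupported character depends on the iconv implementation installed on the system.

```php
<?php
// Hypothetical helper: true if $text contains code points above U+00FF,
// which ISO-8859-1 cannot represent.
function has_non_latin1(string $text): bool
{
    return preg_match('/[^\x{00}-\x{FF}]/u', $text) === 1;
}

// Hypothetical helper: convert UTF-8 to ISO-8859-1, transliterating where iconv can.
function to_latin1(string $text): string
{
    $out = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $text);
    // iconv() returns false on failure; this sketch falls back to an empty string.
    return $out === false ? '' : $out;
}

// "caf\xC3\xA9" is "café" in UTF-8; every character exists in Latin1.
var_dump(has_non_latin1("caf\xC3\xA9"));
// "\xE2\x82\xAC" is the euro sign, which Latin1 does not contain.
var_dump(has_non_latin1("\xE2\x82\xAC"));
```

Logging the rows for which has_non_latin1() is true makes it easier to see which characters were transliterated or lost during the import.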
I use utf8_decode, it works for me.
You can convert the data to binary and then convert it back to latin1
insert into table values
(convert(binary convert(data using utf8) using latin1))