I have a UTF-8 encoded text file and I want to import it into a table that uses latin1_general_ci.
The problem is that some characters show up as ? in the database instead of the characters they are supposed to be.
I tried mb_convert_encoding($str, "ISO-8859-1", "UTF-8"); but that didn't do anything.
What am I doing wrong?
Latin1 doesn't include all Unicode characters. You can use iconv() with //TRANSLIT option to transliterate unknown characters to their closest latin1 equivalents:
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text)
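To illustrate the point above, here is a minimal sketch of such a conversion. The helper name toLatin1() is made up for this example, and the fallback to mb_convert_encoding() is an assumption for iconv builds that reject the //TRANSLIT suffix. What //TRANSLIT produces for characters outside Latin1 depends on the iconv implementation, so no output is shown for those.

```php
<?php
// Sketch: convert UTF-8 text to Latin-1, transliterating where iconv supports it.
// toLatin1() is a hypothetical helper name, not part of PHP.
function toLatin1(string $utf8): string
{
    // //TRANSLIT asks iconv to approximate characters that Latin-1 lacks.
    // Some iconv builds reject the suffix, in which case iconv() returns false.
    $converted = @iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);
    if ($converted !== false) {
        return $converted;
    }
    // Fallback: mbstring replaces unrepresentable characters with "?".
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// "é" exists in Latin-1, so it becomes the single byte 0xE9.
echo bin2hex(toLatin1("caf\u{00E9}")), "\n"; // 636166e9
```

Characters that have no Latin1 equivalent at all still cannot be stored; they end up approximated or replaced by ?, which is most likely what the question is seeing.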
I use utf8_decode, it works for me.
You can convert the data to binary and then convert it back to latin1:
insert into table values
(convert(binary convert(data using utf8) using latin1))
I'm trying to automatically convert imported IPTC metadata from images to UTF-8 for storage in a database based on the PHP mb_ functions.
Currently it looks like this:
$val = mb_convert_encoding($val, 'UTF-8', mb_detect_encoding($val));
However, when mb_detect_encoding() is given an ASCII string (with special characters in the Latin1 range 192-255), it detects it as UTF-8, so the subsequent attempt to convert everything to proper UTF-8 removes all the special characters.
I tried writing my own method that looks for Latin1 values and, if none occurred, lets mb_detect_encoding decide what it is. But I stopped midway when I realized that I can't be sure other encodings don't use the same byte values for other things.
So, is there a way to properly detect ASCII to feed to mb_convert_encoding as the source encoding?
Specifying a custom order, where ASCII is detected first, works.
mb_detect_encoding($val, 'ASCII,UTF-8,ISO-8859-15');
For completeness, the list of available encodings is at http://www.php.net/manual/en/mbstring.supported-encodings.php
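Because detection is always a guess, a related approach is to validate instead of detect. The sketch below is not the method from the answer above: it checks whether the value is already valid UTF-8 with mb_check_encoding() and only converts when it is not. The helper name normaliseToUtf8() and the fallback encoding ISO-8859-15 are assumptions chosen for illustration; use whatever legacy encoding your IPTC data actually comes in.

```php
<?php
// Sketch: normalise a string of unknown origin to UTF-8 without guessing.
// normaliseToUtf8() and the $fallback default are illustrative assumptions.
function normaliseToUtf8(string $val, string $fallback = 'ISO-8859-15'): string
{
    // Pure ASCII is also valid UTF-8, so both cases pass this check unchanged.
    if (mb_check_encoding($val, 'UTF-8')) {
        return $val;
    }
    // Anything else is treated as the assumed single-byte legacy encoding.
    return mb_convert_encoding($val, 'UTF-8', $fallback);
}

echo bin2hex(normaliseToUtf8("caf\xE9")), "\n";     // legacy input: 636166c3a9
echo bin2hex(normaliseToUtf8("caf\xC3\xA9")), "\n"; // already UTF-8: 636166c3a9
```

This avoids converting text that is already UTF-8 a second time, at the cost of assuming a single fallback encoding for everything else.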
You can specify the source encoding explicitly:
$val = mb_convert_encoding($val, 'UTF-8', 'ASCII');
EDIT:
$val = mb_convert_encoding($val, 'UTF-8', 'auto');
If you do not want to worry about which encodings to allow, you can add them all:
$encoding = mb_detect_encoding($val, implode(',', mb_list_encodings()));
I'm trying to convert Hebrew characters from UTF-8 to ISO-8859-8-1 in order to save them into a file.
I have read about ten posts on this site, but
no matter what I do, I always get question marks (???????) instead of Hebrew letters.
I tried iconv(), mb_convert_encoding() and utf8_decode() to convert from UTF-8 to ISO-8859-8-1, but I keep getting '?????????' in the file:
mb_convert_encoding($fullRecord, 'ISO-8859-1', 'UTF-8');
iconv("UTF-8", "ISO-8859-1", $fullRecord);
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $fullRecord);
Even this post didn't help because the solution there is in JavaScript:
Conversion from UTF8 to ASCII
I wish it could be in PHP...
I know that there are no Hebrew characters in ASCII, but I have an example file that shows it can be done. When I open the file in Notepad, it shows the Hebrew correctly and the file is ANSI, so I guess it can be done somehow...
Can anyone please help?
Try:
iconv("UTF-8", "windows-1255", $fullRecord);
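The attempts in the question target ISO-8859-1, which has no Hebrew letters, so every letter becomes a question mark. Here is a small sketch of the Windows-1255 conversion with error handling. The helper name hebrewToAnsi() is made up for this example, and the sample word is only an illustration.

```php
<?php
// Sketch: encode UTF-8 Hebrew text as Windows-1255, the code page that
// Notepad calls "ANSI" on a Hebrew Windows system.
// hebrewToAnsi() is a hypothetical helper name.
function hebrewToAnsi(string $utf8): string
{
    $bytes = iconv('UTF-8', 'WINDOWS-1255', $utf8);
    if ($bytes === false) {
        throw new RuntimeException('Conversion to Windows-1255 failed');
    }
    return $bytes;
}

$shalom = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}"; // the word "shalom" in Hebrew letters
echo bin2hex(hebrewToAnsi($shalom)), "\n";   // f9ece5ed: one byte per letter
```

The returned bytes can then be written out as they are, for example with file_put_contents().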
I have a database with data in windows-1253 encoding.
I'm trying to convert it to UTF-8 with the iconv function and display it on a page, but I get characters like these: g óôçí åðüìåíç ôáéíßá ôïõ
Any thoughts?
This is the code I use:
iconv(mb_detect_encoding($this->row["question"], mb_detect_order(), true),"UTF-8",htmlentities(stripslashes($this->row["question"])))
If you know the encoding is windows-1253, then simply try to use:
iconv('Windows-1253','UTF-8', $text);
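One likely cause of the garbled output is the order of operations in the question's code: htmlentities() runs on the raw Windows-1253 bytes before iconv() sees them, and on older PHP versions it treats those bytes as ISO-8859-1. The sketch below converts first and escapes afterwards with an explicit charset. The helper name greekToHtml() is made up, and using htmlspecialchars() instead of htmlentities() is my own choice here, not something the answer prescribes.

```php
<?php
// Sketch: decode Windows-1253 bytes first, then escape for HTML as UTF-8.
// greekToHtml() is a hypothetical helper name.
function greekToHtml(string $cp1253): string
{
    $utf8 = iconv('WINDOWS-1253', 'UTF-8', $cp1253);
    if ($utf8 === false) {
        throw new RuntimeException('Conversion from Windows-1253 failed');
    }
    // Escaping after conversion, with an explicit charset, leaves Greek letters intact.
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

// 0xF3 0xF4 0xE7 0xED are the bytes that show up as "óôçí" when read as Latin-1.
echo greekToHtml("\xF3\xF4\xE7\xED & co"), "\n";
```

The page itself also has to be served as UTF-8, otherwise the browser will misread the converted text again.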
I'm trying to extract text from a Word .DOC file with PHP. Everything seems OK; the only trouble is that I get something like
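&#1057;&#1059;&#1044;&#1054;&#1042;&#1040; &#1041;&#1059;&#1061;&#1043;&#1040;&#1051;&#1058;&#1045;&#1056;&#1030;&#1071;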
instead of Russian text. I've tried html_entity_decode and utf8_encode, but they didn't help. Is there a simple solution?
html_entity_decode should work if you pass the proper parameters (as of PHP 5.4 the charset already defaults to UTF-8, so there they are optional):
html_entity_decode($str, ENT_QUOTES, 'UTF-8')
This will convert the character references into UTF-8. Before PHP 5.4, the charset parameter's default value was ISO-8859-1. In that case the Cyrillic characters can't be converted, because the ISO 8859-1 character set doesn't contain them.
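As a quick check, here is a minimal sketch that assumes the extracted text consists of decimal character references. The sample string is hypothetical and stands in for the real extractor output.

```php
<?php
// Sketch: decode numeric character references into UTF-8 text.
$extracted = '&#1057;&#1059;&#1044;'; // hypothetical sample of extracted text
$decoded = html_entity_decode($extracted, ENT_QUOTES, 'UTF-8');

// Each Cyrillic letter takes two bytes in UTF-8.
echo bin2hex($decoded), "\n"; // d0a1d0a3d094
```

The page or file that receives the decoded text must itself be treated as UTF-8 for the letters to display correctly.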
I have a LONGTEXT column in my database that contains some special characters like Ã
How can I convert it to "à"? I've tried utf8_encode and utf8_decode, but they don't seem to work.
The document charset is UTF-8, and so is the LONGTEXT field.
It's not about encoding but about HTML entities: http://php.net/manual/fr/function.html-entity-decode.php
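A short sketch of what that could look like. Both sample strings are assumptions about what the column might hold, since the question does not show the raw stored value. The second case covers the possibility that already-garbled text was saved as entities.

```php
<?php
// Sketch: turn stored HTML entities back into UTF-8 characters.
// The sample strings are assumptions about what the column might contain.
$named = html_entity_decode('&agrave;', ENT_QUOTES, 'UTF-8');
echo bin2hex($named), "\n"; // c3a0, the UTF-8 bytes of "à"

// If the column instead holds "&Atilde;&nbsp;" (mojibake saved as entities),
// decode the entities, then map each character back to its Latin-1 byte.
$garbled = html_entity_decode('&Atilde;&nbsp;', ENT_QUOTES, 'UTF-8');
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
echo bin2hex($repaired), "\n"; // c3a0 again
```

Check the raw column value first (for example with HEX() in MySQL) to see which of the two cases applies before running either conversion on real data.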