PHP string array UTF-8 encoding fails

Everything is set to UTF-8 (file encoding, MySQL [however I don't use it], Apache, meta, mbstring etc...) but check this out:
$s="áéőúöüóűí";
echo $s; //works perfectly
echo $s[0]; // doesn't work. Prints out a single '?'.
I have tried almost everything. Any ideas? Thanks in advance!

This is absolutely correct behavior: $s[0] returns the first byte of the string, not the first character.
If you want the first letter of a multi-byte string rather than the first byte of a binary string, you have to use mb_substr():
mb_internal_encoding("UTF-8");
echo mb_substr($s,0,1);
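To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes PHP 7+ with the mbstring extension; the variable names are only illustrative, and the string is written with \u{...} escapes so the result does not depend on how the file is saved.

```php
<?php
// Assumes PHP 7+ with the mbstring extension enabled.
$s = "\u{00E1}\u{00E9}\u{0151}"; // "áéő" as UTF-8

$byteLength = strlen($s);              // counts bytes
$charLength = mb_strlen($s, 'UTF-8');  // counts characters

$firstByte = $s[0];                        // a single byte, only half of "á"
$firstChar = mb_substr($s, 0, 1, 'UTF-8'); // the complete character "á"

// Split into an array of characters when array-style access is needed.
$chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

echo $byteLength, " bytes, ", $charLength, " characters\n";
echo bin2hex($firstByte), " vs ", bin2hex($firstChar), "\n";
echo $chars[0], "\n";
```

Passing the encoding explicitly to each mb_* call is an alternative to calling mb_internal_encoding() once at the top of the script.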

You should use mb_* functions for multibyte strings. mb_substr() in your case.

And if you define $s[0]="á", does it work? I believe that in UTF-8 each of those special characters is stored as two bytes.
If UTF-8 text is displayed as ANSI, it is rendered like this:
Ã¡Ã©oÃºÃ¶Ã¼Ã³uÃ
You see that á becomes Ã¡.
So printing the first byte ($s[0]) would only output the "Ã" part, which is an incomplete character.
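The effect described here can be reproduced in a few lines. This is a sketch that assumes "ANSI" means ISO-8859-1 (Windows-1252 maps these two bytes the same way) and that mbstring is available.

```php
<?php
// Assumes PHP 7+ with mbstring; "ANSI" is taken to mean ISO-8859-1 here.
$utf8 = "\xC3\xA1"; // the two UTF-8 bytes of "á"

// Pretend each byte is its own ISO-8859-1 character and re-encode as UTF-8.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n"; // the original two bytes
echo $misread, "\n";       // shows as "Ã¡" on a UTF-8 terminal
```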

You have to make some changes in the database. Go to the table structure, where you can find a "Collation" column.
Click "Edit" in the right-side menu for the column you want to change.
If the column's collation is a latin1 one such as 'latin1_general_ci', change it to 'utf8_general_ci'.

Related

A simple comparison in utf8, wrong result?

This code prints "no", but it should print "ok". The UTF-8 encodings of the two strings also come out different:
$a="کیهان";
$b="كيهان";
echo utf8_encode($a)."==".utf8_encode($b)."<br>";
if(utf8_encode($a)==utf8_encode($b))
echo "ok";
else
echo "no";
and the result :
Ú©ÛÙØ§Ù==ÙÙÙØ§Ù
no
What is that ©?
Edit: $a was copied and pasted, $b was typed.

Your Unicode strings are different to begin with. Shown here with spaces to highlight the point:
$a="ک ی ه ن";
$b="ك ي ه ن";
EDIT: for curiosity's sake...
It seems that they display identically in the tab at the top of the file, which must use font features that combine the characters, but they display differently in the body of the code, where the text is actually shown back to front.
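Listing the code points makes the difference visible. The sketch below assumes the two strings are the Persian and Arabic spellings (the byte pattern of the garbled output in the question suggests this); codePoints() is a hypothetical helper, not a PHP built-in. It also shows where the © comes from: the first character of $a is stored as the bytes DA A9, and A9 is © in ISO-8859-1.

```php
<?php
// Assumes PHP 7+ with mbstring. codePoints() is an illustrative helper.
function codePoints(string $utf8): array
{
    // UCS-4BE stores every character as one 32-bit big-endian integer.
    $ucs4 = mb_convert_encoding($utf8, 'UCS-4BE', 'UTF-8');
    $result = [];
    foreach (unpack('N*', $ucs4) as $cp) {
        $result[] = sprintf('U+%04X', $cp);
    }
    return $result;
}

// Assumed contents of $a (copied text) and $b (typed text).
$a = "\u{06A9}\u{06CC}\u{0647}\u{0627}\u{0646}";
$b = "\u{0643}\u{064A}\u{0647}\u{0627}\u{0646}";

echo implode(' ', codePoints($a)), "\n";
echo implode(' ', codePoints($b)), "\n";
echo bin2hex(substr($a, 0, 2)), "\n"; // bytes of the first character of $a
```

Under that assumption only the first two letters differ, so the strings compare as unequal even though they look alike.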

EDIT:
Billy's completely right (+1) about why the strings are not equal. This answer may explain why you see garbage text after the conversion.
I'm guessing that your original encoding is not ISO-8859-1.
See the first comment in the docs.
Please note that utf8_encode only converts a string encoded in
ISO-8859-1 to UTF-8. A more appropriate name for it would be
"iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do
not need this function. If your text is already in UTF-8, you do not
need this function. In fact, applying this function to text that is
not encoded in ISO-8859-1 will most likely simply garble that text.
You may want iconv instead.

array_key_exists Cyrillic characters

Can't get this to work with Cyrillic characters:
if (array_key_exists($list['fname'], $data)) {
}
Array keys are Cyrillic characters
Please help

Do the Cyrillic characters work everywhere else? This looks like a character set mismatch: if I remember right, PHP defaults to a single-byte ("ANSI") charset. You need UTF-8.
In any case, put this at the top of that PHP file and see if it helps:
<?php
ini_set('default_charset', 'UTF-8');

If $list['fname'] is coming from MySQL, make sure you use the UTF-8 charset and utf8_general_ci as the collation. If it is hard-coded, save your PHP file as UTF-8.
You can also always use a hash of the text as the key.

How to use PHP htmlentities()?

In my project I currently use htmlentities() to filter data coming from the database:
echo htmlentities($variable_name);
I am in the USA and this works fine for me. My friend is in Brazil and for him some text characters don't show up correctly.
How can I use htmlentities() so it internationalizes properly?

The problem could be that the output is not encoded in UTF-8. According to the php docs for htmlentities, the function
takes an optional third argument
charset which defines character set
used in conversion. Presently, the
ISO-8859-1 character set is used as
the default.
So you can try calling
htmlentities($string, ENT_COMPAT, 'UTF-8');
instead, and that might fix the problem, since it's not the default character encoding.
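A small sketch of what the third argument changes, using a made-up sample string: the same UTF-8 bytes are passed twice, once declared as UTF-8 and once declared as ISO-8859-1.

```php
<?php
// Assumes PHP 7+. The sample text is only an illustration.
$text = "caf\u{00E9}"; // "café" as UTF-8 (é is the two bytes C3 A9)

// Declared as UTF-8: é is recognised as one character.
$asUtf8 = htmlentities($text, ENT_COMPAT, 'UTF-8');

// Declared as ISO-8859-1: each of the two bytes becomes its own entity.
$asLatin1 = htmlentities($text, ENT_COMPAT, 'ISO-8859-1');

echo $asUtf8, "\n";
echo $asLatin1, "\n";
```

Passing the charset explicitly keeps the result independent of the PHP version's default.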

While I suspect Keoki has it correct, another possible problem could be the font. If you use a special character that your friend's font doesn't contain, they'll just see the missing-character sign. In the webpage or whatever medium you are using to show the character, be sure that a font is set, as there are no guarantees that the default font will work.
If neither of these is the case, though, what is an example character that isn't showing up? Can you post the full code you are using?

You can also try iconv(), which converts a string to a requested character encoding:
http://www.php.net/manual/en/function.iconv.php

Convert foreign characters with accents

I'm trying to compare some text to the text in a database. In the database, any text with an accent is encoded as in HTML (e.g. &eacute;), so when I compare the database text to my string it doesn't match, because my string just contains é. When I use the PHP function htmlentities() to encode the string first, the é turns into &Atilde;&copy; (displayed as Ã©), which is weird. Using htmlspecialchars() doesn't encode the é at all.
How would you suggest I compare &eacute; to é, as well as all the other accented characters?

You need to send in the correct charset to htmlentities. It looks like you're using UTF-8, but the default is ISO-8859-1. Change it like this:
$encoded = htmlentities($text, ENT_COMPAT, 'UTF-8');
Another solution is to convert the text to ISO-8859-1 before encoding, but that may destroy information (ISO-8859-1 does not contain nearly as many characters as UTF-8). If you want to try that instead, do like this:
$encoded = htmlentities(utf8_decode($text));
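The comparison can also be done in the other direction, by decoding the stored entities instead of encoding the input. This is a sketch, assuming the input string is UTF-8; matchesStoredText() and the sample values are hypothetical.

```php
<?php
// Assumes PHP 7+. The function name and sample data are illustrative only.
function matchesStoredText(string $storedWithEntities, string $utf8Input): bool
{
    // Turn entities such as &aacute; back into UTF-8 characters, then compare bytes.
    $decoded = html_entity_decode($storedWithEntities, ENT_QUOTES, 'UTF-8');
    return $decoded === $utf8Input;
}

$stored = 'Mario Ju&aacute;rez';   // as saved in the database
$input  = "Mario Ju\u{00E1}rez";   // as typed by the user, in UTF-8

$sameAfterDecoding = matchesStoredText($stored, $input);
$sameAfterEncoding = htmlentities($input, ENT_COMPAT, 'UTF-8') === $stored;

var_dump($sameAfterDecoding, $sameAfterEncoding);
```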

I'm working on a French site and had the same problem. This is the function that I use:
function convert_accent($string)
{
return htmlspecialchars_decode(htmlentities(utf8_decode($string)));
}
What it does: it converts your string from UTF-8 to ISO-8859-1, then converts everything to HTML entities, even tags. But we want the tags back to normal, so htmlspecialchars_decode() converts them back. In the end you get a string with the accents converted and the tags untouched.
You can pass your email content through this function before sending it to the recipient.
Another issue you might face is that with this function the content from the database sometimes turns into "?". In that case, run this before your query:
mysql_query("SET NAMES `utf8`");
Whether you need to do this depends on the encoding of your table. I hope it helps.

The comparison depends on the charset and the collation you selected when you created the database or the tables. If you are saving strings with a lot of accents, as in Spanish, I suggest you use the utf8 charset and the collation that best fits the language (English, French or whatever) you're using.
The best thing about using the correct charset in the database is that you can save the string in a natural way. For example, I can store my name as is, "Mario Juárez", and I have no need for weird conversions.

Ran into similar issues recently. Followed Emil's answer and it worked fine locally but not on our dev/stage environments. I ended up using this and it worked all around:
$title = html_entity_decode(utf8_decode($item));
Thanks for leading me in the right direction!

Read ansi file and convert to UTF-8 string

Is there any way to do that with PHP?
The data to be inserted looks fine when I print it out.
But when I insert it in the database the field becomes empty.

$tmp = iconv('YOUR CURRENT CHARSET', 'UTF-8', $string);
or
$tmp = utf8_encode($string);
The strange thing is that you end up with an empty string in your DB. I could understand ending up with some garbage in your DB, but nothing at all (an empty string) is strange.
I just typed this in my console:
iconv -l | grep -i ansi
It showed me:
ANSI_X3.4-1968
ANSI_X3.4-1986
ANSI_X3.4
ANSI_X3.110-1983
ANSI_X3.110
MS-ANSI
These are possible values for YOUR CURRENT CHARSET.
As pointed out before, when your input string is already valid UTF-8, you don't need to convert anything.
Change UTF-8 to UTF-8//TRANSLIT when you don't want to omit characters but replace them with a look-alike (when they are not in the target character set).

"ANSI" is not really a charset. It's a short way of saying "whatever charset is the default on the computer that creates the data". So you have a double task:
1. Find out which charset the data is using.
2. Use an appropriate function to convert it into UTF-8.
For #2, I'm normally happy with iconv(), but utf8_encode() can also do the job if the source data happens to use ISO-8859-1.
Update
It looks like you don't know what charset your data is using. In some cases, you can figure it out if you know the country and language of the user (e.g., Spain/Spanish) through the default encoding used by Microsoft Windows in such territory.
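Why the exact source charset matters can be seen with a single byte. The sketch below assumes the file was written on a Western European Windows machine, where "ANSI" usually means Windows-1252; it also assumes the iconv and mbstring extensions and that the iconv build accepts the name CP1252.

```php
<?php
// Assumes PHP 7+ with iconv and mbstring; CP1252 is an assumed source charset.
$ansiByte = "\x80"; // the euro sign in Windows-1252

// Converting from the real source charset yields the euro sign (U+20AC).
$fromCp1252 = iconv('CP1252', 'UTF-8', $ansiByte);

// Treating the same byte as ISO-8859-1 (what utf8_encode() does) yields
// an invisible control character (U+0080) instead.
$fromLatin1 = mb_convert_encoding($ansiByte, 'UTF-8', 'ISO-8859-1');

echo bin2hex($fromCp1252), "\n";
echo bin2hex($fromLatin1), "\n";
```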

Be careful: iconv() can return false if the conversion fails.
I am also having a somewhat similar problem: some Chinese characters are mistaken for \n if the file is encoded in UNICODE, but not if it is UTF-8.
To get back to your problem, make sure the encoding of your file is the same as that of your database. Also, using utf8_encode() on text that is already UTF-8 can have unpleasant results. Try using mb_detect_encoding() to find the encoding of the file, but unfortunately this doesn't always work. There is no easy fix for character encoding from what I can see :(
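One way to guard against both problems (a false return value and double encoding) is sketched below. toUtf8() is a hypothetical helper, and the fallback charset is an assumption you have to supply yourself; the check only confirms that the bytes are valid UTF-8, not which charset they really came from.

```php
<?php
// Assumes PHP 7+ with iconv and mbstring. toUtf8() is an illustrative helper.
function toUtf8(string $bytes, string $fallbackCharset): string
{
    // Already valid UTF-8: leave it alone to avoid double encoding.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // iconv() returns false (and raises a notice) when the conversion fails.
    $converted = @iconv($fallbackCharset, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $fallbackCharset to UTF-8");
    }
    return $converted;
}

$alreadyUtf8 = toUtf8("\xC3\xA1", 'ISO-8859-1'); // "á" in UTF-8, returned unchanged
$fromLatin1  = toUtf8("\xE1", 'ISO-8859-1');     // "á" in ISO-8859-1, converted

echo bin2hex($alreadyUtf8), " ", bin2hex($fromLatin1), "\n";
```

The converted string can then be inserted into a database whose connection and columns are also set to UTF-8.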
