Weird char (�) appears after doing html_entity_decode - php

In a separate YML file I have:
flags: [<img src="/images/cms_bo/icons/english.png" alt="English"/>]
When I call this in my code, it is not interpreted as HTML, so I used html_entity_decode.
It works, but one strange character appears just before my image: �
<?php echo html_entity_decode($form['lang']->render()); ?>
All my files are UTF-8 encoded. Do you have any idea what I missed?
PS:
public static function getI18nCulturesForChoice()
{
return array_combine(self::getI18nCultures(), self::getI18nCulturesFlags());
}

Try using html_entity_decode($form['lang']->render(),ENT_QUOTES, "UTF-8");

Prior to PHP 5.4.0, the default character set for html_entity_decode was ISO-8859-1. If you're working with UTF-8 on such a version, you need to pass the third argument to tell the function to use UTF-8 instead of assuming ISO-8859-1.
This is blindly assuming you're using an older version of PHP.
If you are using a newer version of PHP, consider using iconv with the //IGNORE//TRANSLIT flags to try and remove any bad UTF-8 sequences before passing the string into html_entity_decode.
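A small sketch of why the third argument matters. The sample string is made up for illustration; the only assumption is that the page itself is served as UTF-8.

```php
<?php
// The entity &eacute; stands for "é" (U+00E9).
$encoded = 'caf&eacute;';

// Decoded as UTF-8, "é" takes two bytes (0xC3 0xA9).
$utf8 = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// Decoded as ISO-8859-1, "é" is the single byte 0xE9, which is not
// valid UTF-8 on its own and shows up as "�" on a UTF-8 page.
$latin1 = html_entity_decode($encoded, ENT_QUOTES, 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // 636166c3a9
echo bin2hex($latin1), "\n"; // 636166e9
```

Passing the charset explicitly keeps the result the same regardless of the PHP version or the default_charset setting.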

Maybe your file has a Byte Order Mark (BOM) set.
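If a BOM turns out to be the cause, it could be stripped before output along these lines. stripUtf8Bom() is a hypothetical helper name, not something from symfony or PHP, and the sample markup is made up.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF) if present.
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

$withBom = "\xEF\xBB\xBF" . '<img src="flag.png" alt="English"/>';
echo stripUtf8Bom($withBom), "\n";
```

Re-saving the YML file as "UTF-8 without BOM" in the editor is usually the cleaner fix; the helper only covers data you cannot re-save.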

Related

Can't decode GET parameter

I have a simple URL that passes two parameters: name and cellphone. But when I use special characters, the parameter can't be decoded. A ?? appears instead of the character.
I already tried urldecode($_GET['name']), rawurldecode, html_entity_decode and utf8_decode, but none of them worked.
I have the UTF-8 meta tag in my HTML and I also tried sending it as a header from PHP, but it didn't work.
The code is like this
<?php echo $_GET['name']; ?>
You simply have to use the correct function, which is utf8_encode:
<?php echo utf8_encode($_GET['name']); ?>
Output:
Consultório
The function utf8_encode:
This function converts the string data from the ISO-8859-1 encoding to UTF-8.
See the documentation here.
name=Consult%F3rio
This is the good old ISO-8859-1 encoding for Consultório of the early days of the web. If the decoded version renders incorrectly, it's very likely that your application is not using ISO-8859-1 at all, thus there's no benefit in using it there either. If your app is using UTF-8, the simplest solution would be to switch entirely to UTF-8:
Consult%C3%B3rio
This is basically what you get with any builtin PHP function when fed with UTF-8 data because they work at byte level:
var_dump(rawurlencode('Consultório')); // string(16) "Consult%C3%B3rio"
If this happens to be third-party data you can't control, please check Martin's answer.
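For the third-party case, here is a hedged sketch that converts only when the bytes are not already valid UTF-8. It assumes any non-UTF-8 input is ISO-8859-1, which is a guess about the sender, and it uses mb_convert_encoding() because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// $_GET['name'] would hold this after PHP has URL-decoded "Consult%F3rio".
$raw = urldecode('Consult%F3rio'); // contains the single byte 0xF3

// Convert only when the bytes are not already valid UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
$name = mb_check_encoding($raw, 'UTF-8')
    ? $raw
    : mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

echo $name, "\n";               // Consultório
echo rawurlencode($name), "\n"; // Consult%C3%B3rio
```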

PHP greek url convert

I have a URL like: domain.tld/Σχετικά_με_μας
[edit]
Reading the $_SERVER['REQUEST_URI'] I get to work with:
%CE%A3%CF%87%CE%B5%CF%84%CE%B9%CE%BA%CE%AC_%CE%BC%CE%B5_%CE%BC%CE%B1%CF%82
[/edit]
In PHP I need to convert it to HTML, I get pretty far with:
htmlentities(urldecode($navstring), ENT_QUOTES, 'UTF-8');
It results in:
&Sigma;&chi;&epsilon;&tau;&iota;&kappa;ά_&mu;&epsilon;_&mu;&alpha;&sigmaf;
but the 'ά' stays 'ά'. I need it converted to
&#940;
I'd really appreciate help. I need a universal solution, not a "string replace".
I have been playing around a little, and the following worked. Use mb_convert_encoding instead of htmlentities:
mb_convert_encoding(urldecode($navstring),'HTML-ENTITIES','UTF-8');
//string(90) "domain.tld/&Sigma;&chi;&epsilon;&tau;&iota;&kappa;&#940;_&mu;&epsilon;_&mu;&alpha;&sigmaf;"
See mb-convert-encoding
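The 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2. A possible alternative, sketched here as a swapped-in technique rather than what the answer used, is mb_encode_numericentity(). Note that it emits numeric entities for every non-ASCII character instead of named ones.

```php
<?php
$path = urldecode('%CE%A3%CF%87%CE%B5%CF%84%CE%B9%CE%BA%CE%AC'); // "Σχετικά"

// Conversion map: [first code point, last code point, offset, mask].
// This one turns every non-ASCII code point into a decimal entity.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$entities = mb_encode_numericentity($path, $map, 'UTF-8');
echo $entities, "\n"; // &#931;&#967;&#949;&#964;&#953;&#954;&#940;
```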
Information
All modern web browsers understand UTF-8 character encoding.
My advice would be :
Always know the character encoding of the data you are using.
Store your data with UTF-8.
Output data with UTF-8
The mbstring php extension doesn't just manipulate Unicode strings. It also converts multibyte strings between various character encodings.
Use the mb_detect_encoding() (ref) and mb_convert_encoding() (ref 2) functions to convert Unicode strings from one character encoding to another.
PHP needs to know!
You also need to tell PHP that you are working with UTF-8. To set the default value, you can do it in your php.ini file:
default_charset = "UTF-8";
That default value is added to the default Content-Type header returned by PHP unless you specified it with the header() function :
header('Content-Type: application/json;charset=utf-8');
Keep in mind
The default character set is used by a lot of functions in PHP such as :
htmlentities()
htmlspecialchars()
all the mbstring functions
...
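As a rough illustration of the points above, the encoding can also be stated in code so that the result does not depend on php.ini. The sample text is arbitrary.

```php
<?php
// Make the mbstring default explicit instead of relying on php.ini.
mb_internal_encoding('UTF-8');

$text = 'Σχετικά';

echo strlen($text), "\n";    // 14 (bytes)
echo mb_strlen($text), "\n"; // 7 (characters, using the internal encoding)

// Passing the charset explicitly avoids depending on default_charset.
$escaped = htmlspecialchars('<b>' . $text . '</b>', ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";         // &lt;b&gt;Σχετικά&lt;/b&gt;
```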

PHP string array UTF-8 encoding fails

Everything is set to UTF-8 (file encoding, MySQL [however I don't use it], Apache, meta, mbstring etc...) but check this out:
$s="áéőúöüóűí";
echo $s; //works perfectly
echo $s[0]; // doesn't work. Prints out a single '?'.
I have tried almost everything. Any ideas? Thanks in advance!
It is absolutely correct behavior.
If you want to get the first letter of a multi-byte string, not the first byte of a binary string, you have to use mb_substr():
mb_internal_encoding("UTF-8");
echo mb_substr($s,0,1);
You should use mb_* functions for multibyte strings. mb_substr() in your case.
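A short sketch of the byte/character difference, assuming the source file is saved as UTF-8:

```php
<?php
$s = 'áéőúöüóűí';

// $s[0] is the first byte (0xC3), which is only half of "á".
echo bin2hex($s[0]), "\n";               // c3

// mb_substr() counts characters when given the encoding.
echo mb_substr($s, 0, 1, 'UTF-8'), "\n"; // á
echo mb_strlen($s, 'UTF-8'), "\n";       // 9
echo strlen($s), "\n";                   // 18
```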
And if you define $s[0]="á", does it work? I believe that when encoded in UTF-8, each of those special characters is stored as two bytes.
If you display UTF-8 text as ANSI, it is rendered like this:
Ã¡Ã©Å‘ÃºÃ¶Ã¼Ã³Å±Ã­
You can see that á becomes Ã¡.
So printing the first byte ($s[0]) would only display the "Ã", which is an incomplete character.
You have to make some changes in the database. Go to the table structure,
where you can find a "Collation" column.
For the column you want to change, click Edit in the right-side menu.
The default collation is 'latin1_general_ci'; change it to 'utf8_general_ci'.

How to use PHP htmlentities()?

In my project I currently use htmlentities() to filter data coming from the database:
echo htmlentities($variable_name);
I am in the USA and this works fine for me. My friend is in Brazil and for him some text characters don't show up correctly.
How can I use htmlentities() so it internationalizes properly?
The problem could be that the output is not encoded in UTF-8. According to the PHP docs for htmlentities, the function
takes an optional third argument charset which defines character set used in conversion. Presently, the ISO-8859-1 character set is used as the default.
So you can try calling
htmlentities($string, ENT_COMPAT, 'UTF-8');
instead, and that might fix the problem, since it's not the default character encoding.
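To see what the charset argument changes, here is a minimal comparison with a made-up sample string, assuming the source file is saved as UTF-8:

```php
<?php
$text = 'São Paulo';

// Bytes interpreted as UTF-8: "ã" becomes one entity.
$asUtf8 = htmlentities($text, ENT_COMPAT, 'UTF-8');
echo $asUtf8, "\n";   // S&atilde;o Paulo

// Same bytes interpreted as ISO-8859-1: each byte becomes its own entity.
$asLatin1 = htmlentities($text, ENT_COMPAT, 'ISO-8859-1');
echo $asLatin1, "\n"; // S&Atilde;&pound;o Paulo
```

The second form is what a browser renders as "SÃ£o Paulo", which matches the kind of broken characters described in the question.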
While I suspect Keoki has it correct, another possible problem could be the font. If you use a special character that your friend's font doesn't contain, they'll just see the missing-character sign. In the webpage or whatever medium you are using to display the character, be sure that a font is set, as there are no guarantees that the default font will work.
If neither of these be the case though, what is an example character that isn't showing up? Can you post the full code you are using?
You can also try iconv to convert the string to the requested character encoding:
http://www.php.net/manual/en/function.iconv.php

htmlspecialchars(): Invalid multibyte sequence in argument

I am getting this error in my local site.
Warning (2): htmlspecialchars(): Invalid multibyte sequence in argument in [/var/www/html/cake/basics.php, line 207]
Does anyone knows, what is the problem or what should be the solution for this?
Thanks.
Be sure to specify the encoding to UTF-8 if your files are encoded as such:
htmlspecialchars($str, ENT_COMPAT, 'UTF-8');
The default charset for htmlspecialchars was ISO-8859-1 before PHP 5.4 (as of PHP 5.4 the default is UTF-8), which might explain why things go haywire when it meets multibyte characters.
I ran in to this error on production and found this great post about it -
http://insomanic.me.uk/post/191397106/php-htmlspecialchars-htmlentities-invalid
It appears to be a bug in PHP (for CentOS at least) that displays this error only when display_errors is Off!
You are feeding corrupted character data into the function, or not specifying the right encoding.
I had this issue a while ago, old behavior (prior to PHP 5.2.7 I believe) was to return the string despite corruption, but since that version it will throw this error instead.
My solution involved writing a script to feed my strings through iconv using the //IGNORE modifier to remove corrupted data.
(We had a corrupted database which had some strings in UTF-8, some in latin-1 usually with incorrectly defined character types on the columns).
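A comparable cleanup can be sketched with mb_scrub() (PHP 7.2+). This is a swapped-in technique, not the iconv script described above, and the mixed-encoding sample string is invented for illustration.

```php
<?php
// "\xE9" is "é" in Latin-1 and an invalid byte in UTF-8.
$mixed = "caf\xE9 and café";

var_dump(mb_check_encoding($mixed, 'UTF-8')); // bool(false)

// mb_scrub() replaces invalid sequences with the substitute character.
$clean = mb_scrub($mixed, 'UTF-8');

var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)

// The scrubbed string no longer makes htmlspecialchars() fail.
$escaped = htmlspecialchars($clean, ENT_QUOTES, 'UTF-8');
```

Scrubbing loses the damaged characters, so for a database with mixed encodings it is usually better to convert the Latin-1 rows properly and keep scrubbing as a last resort.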
Looking at the comment to Tatu's answer, I would start by looking at (and playing with) the contents of the $charset variable.
The correct code in order not to get any error is:
htmlentities($string, ENT_IGNORE, 'UTF-8') ;
Beside this you can also use str_replace to replace some bad characters to your needs and then use htmlentities function.
Have a look at this RSS feed: it replaced the HTML greater-than sign with a gt; tag, which might not look nice when reading the RSS feed. You can replace it with something like a "-" or ")" sign, etc.
Had the same problem because I was using substr on utf-8 string.
Error was infrequent and seemingly random. Error occurred only if string was cut on multibyte char!
mb_substr solved the problem :)
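The failure can be reproduced in a few lines. The sample word is arbitrary, and the source file is assumed to be saved as UTF-8.

```php
<?php
$title = 'Größe';

// Byte offset 3 falls in the middle of "ö" (bytes 0xC3 0xB6).
$cutBytes = substr($title, 0, 3);
$cutChars = mb_substr($title, 0, 3, 'UTF-8');

// Invalid UTF-8 makes htmlspecialchars() return an empty string.
var_dump(htmlspecialchars($cutBytes, ENT_QUOTES, 'UTF-8')); // string(0) ""
var_dump(htmlspecialchars($cutChars, ENT_QUOTES, 'UTF-8')); // string(4) "Grö"
```

This also explains why the error looks random: it only occurs when the cut position happens to land inside a multibyte character.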
That's actually one of the most frequent errors I get.
Sometimes I don't use the __() translation function, just plain German text containing äöü.
In that case it is especially important to mind the encoding of the files.
So make sure you save the files that contain special characters as UTF-8.
