Why does the htmlentities function not work properly? - php

I tried the htmlentities() function with PHP 5 with this code:
<?php
$string="Einstürzende Neubauten"; echo htmlentities($string);
?>
And it only displays two whitespaces (i.e. " "). Why is that? I tried replacing the "u with diaeresis" character with another one, and then it works. How can I get this to work too?

Specify the charset of your content, e.g.
$res = htmlentities($string, ENT_COMPAT, 'UTF-8');
For more information, see the manual entry for htmlentities().
Which PHP version are you using?
Maybe this could be a solution for you:
$string = mb_convert_encoding($string, "UTF-8");
// testing
var_dump($string);
$res = htmlentities ( $string, ENT_COMPAT, 'UTF-8');
// testing
var_dump($res);
See PHP manual
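As a minimal sketch of why the charset argument matters: the sample below writes the same text once as UTF-8 and once as ISO-8859-1, using byte escapes so the encoding does not depend on how the file is saved. The variable names are illustrative only. When the declared charset does not match the bytes, htmlentities() returns an empty string.

```php
<?php
// The same text in two encodings; the byte escapes make the encoding explicit.
$utf8   = "Einst\xC3\xBCrzende Neubauten"; // "ü" as UTF-8 (two bytes)
$latin1 = "Einst\xFCrzende Neubauten";     // "ü" as ISO-8859-1 (one byte)

// Charset matches the bytes: the umlaut becomes an entity.
$ok = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

// Charset does not match: \xFC is not valid UTF-8, so the result is empty.
$empty = htmlentities($latin1, ENT_COMPAT, 'UTF-8');

// Either name the real charset, or convert to UTF-8 first.
$fixedA = htmlentities($latin1, ENT_COMPAT, 'ISO-8859-1');
$fixedB = htmlentities(
    mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1'),
    ENT_COMPAT,
    'UTF-8'
);

var_dump($ok, $empty, $fixedA, $fixedB);
```

The second conversion path assumes the mbstring extension is available.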

I had the same problem when I upgraded PHP from 5.2 to 5.6. I wrote:
$res = htmlentities("Producción", ENT_IGNORE);
And I got
Produccin
but I solved it by adding this after connecting to the database:
mysqli_set_charset($idCon,'utf8');
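The dropped letter can be reproduced without a database. This is only a sketch, assuming the text arrived as ISO-8859-1 bytes (which is what a connection without a UTF-8 charset might deliver); the variable names are hypothetical. ENT_IGNORE silently discards the byte that is not valid UTF-8, while converting to UTF-8 first keeps the character.

```php
<?php
// "ó" as a single ISO-8859-1 byte, which is not valid UTF-8 on its own.
$latin1 = "Producci\xF3n";

// ENT_IGNORE silently discards the byte it cannot decode.
$dropped = htmlentities($latin1, ENT_IGNORE, 'UTF-8');

// Converting to UTF-8 first keeps the character.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$kept = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

var_dump($dropped, $kept);
```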

Related

code for removing utf8 characters does not work on php 5.3.3

I'm using this code:
$technical = iconv("UTF-8", "UTF-8//IGNORE", $technical);
It works well on PHP 7 and later, but it does not work on PHP 5.3.3.
It's the same code. Is there a replacement solution for version 5.3.3?
Try with this regex:
$regex = "/((?: [\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3}){1,100})| ./x";
$technical = preg_replace($regex, '$1', $technical);
Edit: since your character is a non-breaking space (0xA0), try this:
$technical = str_replace("\xc2\xa0",' ',$technical);
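A small sketch of the two ideas in this answer, assuming the mbstring extension is loaded. The sample string is hypothetical. mb_check_encoding() tells you whether a string is well-formed UTF-8 before you decide to clean it, and the non-breaking space has to be replaced as its two-byte UTF-8 sequence.

```php
<?php
// Hypothetical sample: valid UTF-8 text containing a non-breaking space (U+00A0).
$technical = "10\xC2\xA0kg";

// mb_check_encoding() reports whether the bytes are well-formed UTF-8.
$isValid  = mb_check_encoding($technical, 'UTF-8');
$isBroken = !mb_check_encoding("10\xFFkg", 'UTF-8'); // \xFF never occurs in UTF-8

// U+00A0 is encoded as the two bytes C2 A0, so both bytes are replaced together.
$clean = str_replace("\xC2\xA0", ' ', $technical);

var_dump($isValid, $isBroken, $clean);
```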

Handling UTF-8 string in PHP 5.3

We are using some third-party PHP library functions and are having some difficulty converting UTF-8 strings.
After some experimenting, this is what we have so far:
(1) The following prints the correct Unicode character (it is a single character) in the browser (we use Firefox):
$s = "\345\244\247";
echo $s;
大 <-- (prints the correct Unicode character)
(2) However, the library function returns something like this:
$s2 = "\\345\\244\\247";
echo $s2;
\345\244\247 <-- the output contains the literal escape sequences, so the Unicode character is not shown correctly
(3) So the question is: is there a PHP function that can convert $s2 to the correct Unicode form (like $s)?
Thanks.
The environment is PHP 5.3.
Something like http://ideone.com/Owl2a3 :
function _conv($oct) {
return chr(octdec($oct[1]));
}
$es = "\\345\\244\\247";
$es = preg_replace_callback('#\\\\(\d{3})#', '_conv', $es);
echo $es;
outputs 大
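The problem is that you're escaping the slashes!
Use this:
// Collapses doubled backslashes into single ones; it does not decode the octal escapes themselves.
$s2 = str_replace("\\\\", "\\", $s2);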
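As an alternative sketch to the callback approach, PHP's built-in stripcslashes() decodes C-style escapes, including three-digit octal sequences, so it can turn the escaped text back into bytes. Note that it also decodes other escapes such as \n or \x41, which may or may not be what the library output calls for.

```php
<?php
// Single quotes keep the backslashes literal, as in the library's return value.
$s2 = '\345\244\247';

// stripcslashes() decodes C-style escapes, including \ooo octal sequences.
$decoded = stripcslashes($s2);

echo $decoded, "\n";
```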

Write a PHP function which works for any language

I'm writing a function to clean text, which should work with or without UTF-8 characters.
I keep getting text like this:
Coventry Salary - �25,000 - �35,000
With this function the � is removed, but others remain.
I want to know if anyone has written a function which cleans the text.
function convertHTMLSpecialChars ( $str='' )
{
$str = htmlspecialchars ( $str );
$str = mb_convert_encoding($str, 'UTF-8', mb_detect_encoding($str));
$str = htmlspecialchars($str, ENT_NOQUOTES, 'UTF-8');
return $str;
}
This line:
$str = mb_convert_encoding($str, 'UTF-8', mb_detect_encoding($str));
just tries to detect the character set of $str; if it finds that $str contains
UTF-8 characters it will return "UTF-8", so the call effectively becomes:
$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
which doesn't help much.
In my opinion you should specify the character set of your string by hand,
for example ISO-8859-9 if it's Turkish, ISO-8859-7 if it's Greek, and so on.
Make sure the server outputs your page as UTF-8.
You can force it by using:
header ('Content-type: text/html; charset=utf-8');
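To illustrate naming the source encoding by hand, here is a sketch that assumes the pound sign in the salary text arrived as the single ISO-8859-1 byte A3. That is only a guess about the original data, and the variable names are hypothetical. The text is converted to UTF-8 first and escaped for HTML afterwards.

```php
<?php
// Assumption: the "£" arrived as the single ISO-8859-1 byte A3.
$raw = "Salary - \xA325,000";

// Name the source encoding explicitly instead of detecting it.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Escape for HTML only after the text is valid UTF-8.
$html = htmlspecialchars($utf8, ENT_NOQUOTES, 'UTF-8');

echo $html, "\n";
```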

PHP ent_quotes exception for <br> and <a>

I am trying to convert HTML to entities using PHP, but I need to exempt <br> and <a> tags.
Here's an example of my code:
<?php
$string[0] = "<a href='http://hidd3n.tk'>Needs to stay</a> Filler text in between
<br><br> <script src='http://malicious.com/'></script> NEEDS to go";
$string[1] = htmlentities($string[0], ENT_QUOTES, "UTF-8");
?>
Let me suggest using BBCode instead, which will be much safer.
EDIT:
Okay, I have worked out a way.
Use this function, which is safer than the previous one:
function convert_myhtml_entities($string){
// Encode everything first, so every tag becomes &lt;...&gt;
$string = htmlentities($string, ENT_NOQUOTES, "UTF-8");
// Then restore only <br>, <a ...> and </a> from their encoded form
// (\b keeps other tags that start with "a", such as <abbr>, encoded)
$string = preg_replace('/&lt;\s*br\s*(\/|)\s*&gt;/U','<br$1>',$string);
$string = preg_replace('/&lt;\s*a\b(.*)\s*&gt;/U','<a$1>',$string);
$string = preg_replace('/&lt;\s*\/\s*a\s*&gt;/U','</a>',$string);
return $string;
}
It has now been tested with the string above.
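A different technique, shown here only as a hedged sketch: PHP's built-in strip_tags() accepts a list of allowed tags, so it can keep <a> and <br> while removing everything else. Unlike the function above, it deletes the other tags instead of converting them to entities, and it does not inspect attributes, so an href on an allowed <a> tag is passed through unchanged.

```php
<?php
$input = "<a href='http://hidd3n.tk'>Needs to stay</a> Filler text in between"
       . "<br><br> <script src='http://malicious.com/'></script> NEEDS to go";

// Keep only <a> and <br>; every other tag is removed rather than encoded.
$filtered = strip_tags($input, '<a><br>');

echo $filtered, "\n";
```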

get utf8 urlencoded characters in another page using php

I have used rawurlencode on a utf8 word.
For example
$tit = 'தேனின் "வாசம்"';
$t = (rawurlencode($tit));
When I click the UTF-8 word ($t), I am redirected to another page using .htaccess, and I read the word with $_GET['word'];
The word displays as தேனினà¯_"வாசமà¯" not the actual word. How can I get the actual utf8 word. I have used the header charset=utf-8.
This was originally a comment, but it should have been an answer:
Is magic_quotes off? It would be weird if it was still on in 2011, but you should check and apply stripslashes if it is.
Did you use rawurldecode($_GET['word'])? And do you use UTF-8 encoding for your PHP file?
<?php
$s1 = <<<EOD
தேனினà¯_"வாசமà¯"
EOD;
$s2 = <<<EOD
தேனின் "வாசம்"
EOD;
$s1 = mb_convert_encoding($s1, "WINDOWS-1252", "UTF-8");
echo bin2hex($s1), "\n";
echo bin2hex($s2), "\n";
echo $s1, "\n", $s2, "\n";
Output:
e0aea4e0af87e0aea9e0aebfe0aea9e0af5f22e0aeb5e0aebee0ae9ae0aeaee0af22
e0aea4e0af87e0aea9e0aebfe0aea9e0af8d2022e0aeb5e0aebee0ae9ae0aeaee0af8d22
தேனின��_"வாசம��"
தேனின் "வாசம்"
You're probably just not showing the data as UTF-8 and you're showing it as ISO-8859-1 or similar.
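A minimal round-trip sketch, assuming the PHP file itself is saved as UTF-8. rawurlencode() percent-encodes the raw UTF-8 bytes, rawurldecode() restores them, and the Content-Type header tells the browser to interpret the output as UTF-8. The headers_sent() guard is only there so the sketch also runs where output has already started.

```php
<?php
$tit = 'தேனின் "வாசம்"';

// Percent-encode the raw UTF-8 bytes for use in a URL.
$encoded = rawurlencode($tit);

// PHP normally decodes $_GET values already; decoding by hand shows the round trip.
$decoded = rawurldecode($encoded);

// Declare UTF-8 so the browser does not fall back to ISO-8859-1 or similar.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo $decoded, "\n";
```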
