html_entity_decode not encoding to proper symbol


Maybe I have not fully understood how htmlentities and decoding work.
But I use
$text = htmlentities($variable, ENT_COMPAT | ENT_HTML5, 'ISO-8859-1', false);
to save some input into my database.
Afterwards there is one case where I have to put the content of the database into a PDF.
Whereas my browser renders e.g. &comma; as ',' and &rpar; as ')',
the PDF prints &comma; or &rpar; literally.
I figured out how to search and replace specific characters with str_replace,
but since there is the function html_entity_decode, I would rather use it to display the content.
So I do something like
$myContentFromMyDB = html_entity_decode($myContentFromMyDB);
But unfortunately I don't see any change in my PDF.
Am I mistaken about how the decoding works?
I thought I was right, also having a look at this page:
http://php.net/html_entity_decode
But somehow the conversion does not take place.
I also tried this
html_entity_decode($myContentFromMyDB, ENT_COMPAT, 'UTF-8');
but it did not work either.
Any idea?
Is it because I use ISO-8859-1 in the first place?
The thing is that I use code from somebody else and have to work with it. But I don't know whether there is a reason for using ISO-8859-1, so I don't like the thought of changing it...
Hope somebody can help me with that...
Cheers and thx
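For reference: html_entity_decode() uses the HTML 4.01 entity table unless told otherwise (any flags without a document-type flag select ENT_HTML401), and HTML 4.01 defines no &comma; or &rpar;, so those entities are left as they are. A minimal sketch, assuming the stored text was produced by the htmlentities() call shown in the question, that decodes with the same ENT_HTML5 flag and ISO-8859-1 charset used for encoding (the sample input string is hypothetical):

```php
<?php
// Encode the way the question does before saving to the database.
$input  = 'Hello, world (test)';
$stored = htmlentities($input, ENT_COMPAT | ENT_HTML5, 'ISO-8859-1', false);
echo $stored, PHP_EOL; // e.g. ',' becomes &comma; and ')' becomes &rpar; (as reported in the question)

// Default flags select the HTML 4.01 table, which has no &comma; or &rpar;,
// so those entities survive decoding unchanged.
$defaultDecode = html_entity_decode($stored);
echo $defaultDecode, PHP_EOL;

// Decoding with the same document type and charset used for encoding
// restores the original text.
$matchedDecode = html_entity_decode($stored, ENT_COMPAT | ENT_HTML5, 'ISO-8859-1');
echo $matchedDecode, PHP_EOL;
```

The second attempt in the question, html_entity_decode($myContentFromMyDB, ENT_COMPAT, 'UTF-8'), fails for the same reason: ENT_COMPAT without ENT_HTML5 still selects the HTML 4.01 table. The charset argument only controls the encoding of the decoded characters, so it should match whatever the PDF library expects.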

Related

using htmlentities with superglobal variables

I'm learning PHP from a book. The book says I should be careful when using superglobal variables, and that it's better to wrap them in htmlentities like this:
$came_from = htmlentities($_SERVER['HTTP_REFERER']);
So I wrote this code:
<?php
$came_from=htmlentities($_SERVER['HTTP_REFERER']);
echo $came_from;
?>
However, the output of the code above was the same as without htmlentities(); it didn't change anything at all. I thought it would change \ into something else. Did I use it wrong?

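By default, htmlentities() encodes characters using ENT_COMPAT (converts double quotes and leaves single quotes alone) and ENT_HTML401; since PHP 8.1 the default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, which also converts single quotes. Either way the document type is HTML 4.01, and since the backslash isn't part of the HTML 4.01 entity set (as far as I can see, anyway), it won't be converted.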
If you specify the ENT_HTML5 flag, you get a different result
php > echo htmlentities('abc\123');
abc\123
php > echo htmlentities('abc\123', ENT_HTML5);
abc&bsol;123
This is because backslash is part of the HTML5 spec. See http://dev.w3.org/html5/html-author/charref

Sorry, my previous answer was completely wrong; I confused it with something else. My apologies. Let me rephrase my answer:
htmlentities will convert special characters into their HTML entities. "<", for example, will be converted to "&lt;". Your browser automatically recognises this HTML entity and decodes it back to "<" when displaying the page, so you won't notice any difference.
The reason for this is to prevent problems when saving your document in an encoding other than UTF-8. Any characters that are not encoded might otherwise become garbled.
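A quick way to see when the escaping matters: a referer with no <, >, & or quote characters comes out of htmlentities() unchanged, which is why the output in the question looked identical, while one carrying markup does not. A minimal sketch; both URLs are hypothetical stand-ins for $_SERVER['HTTP_REFERER'], which is client-supplied and may be missing entirely (so real code would read it with ?? ''):

```php
<?php
// Hypothetical stand-ins for $_SERVER['HTTP_REFERER'] ?? ''.
$plainReferer   = 'http://example.com/page';
$hostileReferer = 'http://example.com/?q=<script>alert(1)</script>';

// Nothing in this URL has an HTML 4.01 entity, so the output is identical.
$plainEscaped = htmlentities($plainReferer, ENT_QUOTES, 'UTF-8');

// Markup characters are converted, so the browser shows them as text
// instead of interpreting them as a script tag.
$hostileEscaped = htmlentities($hostileReferer, ENT_QUOTES, 'UTF-8');

echo $plainEscaped, PHP_EOL;   // http://example.com/page
echo $hostileEscaped, PHP_EOL; // http://example.com/?q=&lt;script&gt;alert(1)&lt;/script&gt;
```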

Apostrophes and imagettftext()

I've been trying forever to figure out what's going on here. I'm trying to use imagettftext() to put text on an image I'm creating in PHP. I've got some text:
$line = "I'm using this string";
When I echo it out, it displays exactly the same. The last argument of imagettftext() is the text that gets placed on the image. So when I do this:
echo $line."</br>";
imagettftext($my_img, $font_size, 0, $x+4, (($font_size+$margin_top)*$line_number)+$new_shadow_addition, $shadow_colour, $font, $line);
It echoes out the line correctly but then when I look at the image, it displays it as
I□m using this string
It does this for every apostrophe. The string is correct, but somehow it gets encoded or decoded before imagettftext(). I tried converting it to pure UTF-8 before calling imagettftext, but it made no difference (it's currently in ASCII; I detected the encoding before using it).
It's not the font I'm using because I've tried several fonts.
Any ideas why this would be happening?
EDIT
For further information, I'm using simple_html_dom to crawl data from another page and then using that info for the image so I'm not sure if that would affect anything. It shouldn't because I've detected the encoding and the characters and nothing seems out of place.
This is driving me absolutely crazy, I've been revisiting this for three days now and it doesn't make sense. I've tried all UTF-8 decoding possibilities in PHP and anything else I can think of or find. I did a rawurlencode() on the string that I'm using and it's returning a %92 for the apostrophe character meaning it is an apostrophe, not a single quote or the %60 character. Any help would be greatly appreciated. Thank you.
EDIT
I've determined that this is related only to the apostrophe character (byte %92, which is the right single quotation mark in Windows-1252 rather than an ASCII character). I've tried %27 (the plain single quote) and that works fine. No other character I've seen causes the problem either, so it looks isolated to the apostrophe character.

Well I don't know WHY it was happening but I figured out a workaround in case anyone else has this problem (and if so, I feel your pain, super frustrating...).
I did this:
$line = rawurlencode($line);
$line = str_replace('%92', '%27', $line);
$line = rawurldecode($line);
It URL-encodes the string, finds the apostrophe characters (%92) and replaces them with a single quote character (%27). This is not exactly an answer to the question, but it's a solution to the problem. Hope this helps someone.
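The workaround amounts to swapping the single byte 0x92 for an ASCII apostrophe. Byte 0x92 is the right single quotation mark (U+2019) in Windows-1252, and on its own it is not valid UTF-8, which is the encoding the PHP manual specifies for imagettftext()'s text argument. A minimal sketch of an equivalent one-line replacement, plus an alternative that assumes (not confirmed in the post) the crawled page is Windows-1252 and converts it to UTF-8 with mbstring so the curly apostrophe is kept:

```php
<?php
// Sample text containing the raw Windows-1252 apostrophe byte.
$line = "I\x92m using this string";

// Same effect as the rawurlencode()/str_replace()/rawurldecode() round trip:
// replace the raw byte with an ASCII apostrophe.
$ascii = str_replace("\x92", "'", $line);
echo $ascii, PHP_EOL; // I'm using this string

// Alternative, assuming the source really is Windows-1252:
// convert the whole string to UTF-8 before passing it to imagettftext().
$utf8 = mb_convert_encoding($line, 'UTF-8', 'Windows-1252');
echo $utf8, PHP_EOL;
```

The conversion route also fixes any other Windows-1252 punctuation (curly double quotes, dashes) the crawler picks up, provided the font has glyphs for them.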

Passing code through the post variable?

I am coding a small template editor and the problem I am having is that code keeps getting converted into other characters, such as:
&lt;?php
$hello = &quot;hello&quot;;
?&gt;
and it writes exactly that to the file. I want to write the actual code, PHP and HTML.
How can I accomplish this?

In this specific case you should run the contents of the file through the html_entity_decode function.
Description from the documentation:
Convert HTML entities to their corresponding characters
$str = '&lt;?php';
echo html_entity_decode($str);
Outputs: <?php

Your issue is that your PHP is calling htmlspecialchars(). This converts characters that could be an issue (such as <>) into their HTML-safe version. You can resolve this by removing the htmlspecialchars() function (not recommended, as it's probably there for a reason) or calling html_entity_decode() on the code you want to save to a file.
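A minimal sketch of that round trip; the sample string is a stand-in for the submitted template. htmlspecialchars_decode() reverses what htmlspecialchars() did when given matching flags, while html_entity_decode() additionally decodes every other named entity:

```php
<?php
// Stand-in for the code typed into the editor.
$submitted = '<?php $hello = "hello"; ?>';

// What the editor currently ends up saving.
$encoded = htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');
echo $encoded, PHP_EOL; // &lt;?php $hello = &quot;hello&quot;; ?&gt;

// Reversing it with the same quote flags gives back the original code.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);
echo $decoded, PHP_EOL;
```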

I wouldn't necessarily recommend using html_entity_decode. It's usually a bad idea to fix incorrectly-encoded text by just reversing the encoding. Instead, figure out why it's encoded incorrectly in the first place.
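Following that advice, one common pattern is to store the code exactly as submitted and escape it only when echoing it back into the editor's HTML. A minimal sketch; saveTemplate(), renderEditor(), the "template" field name and the temp-file path are all hypothetical, and in the real editor $code would come from $_POST['template'] ?? '' rather than a literal:

```php
<?php
// Store the submitted code verbatim: HTML escaping does not belong in the file.
function saveTemplate(string $path, string $code): void
{
    if (file_put_contents($path, $code) === false) {
        throw new RuntimeException("Could not write $path");
    }
}

// Escape only for the HTML context, at output time.
function renderEditor(string $code): string
{
    return '<textarea name="template">'
        . htmlspecialchars($code, ENT_QUOTES, 'UTF-8')
        . '</textarea>';
}

$code = '<?php $hello = "hello"; ?>';
$path = tempnam(sys_get_temp_dir(), 'tpl');
saveTemplate($path, $code);
echo renderEditor(file_get_contents($path)), PHP_EOL;
```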

Issues with str_replace

First, the string is being pulled from an XML file.
There's a special character that I am trying to replace: '£'
When I use str_replace like so:
$ability1 = str_replace("£", "", $ability);
This is what var_dump shows:
string(138) "Argothian Pixies can't be blocked by artifact creatures.�Prevent all damage that would be dealt to Argothian Pixies by artifact creatures."
Once $ability1 is passed on and WordPress inserts it into the post, this is the result:
Argothian Pixies can’t be blocked by artifact creatures.
It deletes everything after the � character.
Why would £ be changed to � when it's supposed to be replaced with ""? I'm not quite sure what I'm missing.

Make sure the string uses the correct encoding: try encoding or decoding it to UTF-8, and then apply the str_replace.

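Maybe your string is in UTF-8? In that case you would have to do something like this:
$ability1 = utf8_decode($ability);               // UTF-8 -> ISO-8859-1 (deprecated as of PHP 8.2)
$ability1 = str_replace("\xA3", "", $ability1);  // "\xA3" is £ in ISO-8859-1
$ability1 = utf8_encode($ability1);              // ISO-8859-1 -> UTF-8 (deprecated as of PHP 8.2)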

How is the XML file encoded? I suspect it may be UTF-8, in which case you'll need to use a function such as utf8_decode() to handle it correctly in your code (assuming your code is saved in ANSI).
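The answers above all point at an encoding mismatch. One way it produces exactly this symptom, assuming (not confirmed in the question) that the XML data is UTF-8 while the "£" in the PHP source ended up as a single ISO-8859-1 byte: "£" is one byte (0xA3) in ISO-8859-1 but two bytes (0xC2 0xA3) in UTF-8, so replacing only 0xA3 leaves a lone 0xC2 byte, which is invalid UTF-8 and displays as �. A minimal sketch; the "\u{00A3}" escape names the UTF-8 pound sign regardless of how the source file is saved:

```php
<?php
// UTF-8 sample text with a pound sign between the two sentences.
$ability = "blocked by artifact creatures.\u{00A3}Prevent all damage";

// Searching for the ISO-8859-1 byte removes only half of the UTF-8 sequence.
$broken = str_replace("\xA3", "", $ability);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// Searching for the full UTF-8 character removes it cleanly.
$fixed = str_replace("\u{00A3}", "", $ability);
var_dump(mb_check_encoding($fixed, 'UTF-8')); // bool(true)
echo $fixed, PHP_EOL;
```

The invalid byte left behind would also explain why WordPress drops everything after it.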

Get source code with Chinese characters PHP

Well, I give up.
I've tried everything I could think of to retrieve data from a target website that has its information in Chinese (charset=GB2312, a simplified-Chinese encoding).
I've been using the simple_html_parser like always but it doesn't seem to return the Chinese characters, in fact all I get are some weird question marks embedded inside a rhomboid shape.
("�������ѯ�ؼ��֣�" Like so)
Declaring the encoding for the PHP file didn't do anything except get rid of some unwanted characters showing at the start of the page.
By declaring it I mean:
header('Content-Type: text/html; charset=GB2312');
I can't get any data that's written in Chinese, also tried file_get_contents with the same luck. I'm probably missing something obvious since I can't find any related discussion elsewhere.
Thanks in advance.

Have you tried converting the encoding with mb_convert_encoding or iconv, e.g.
$str = mb_convert_encoding($content, 'UTF-8', 'GB2312');
or
$str = iconv("GB2312", "UTF-8//IGNORE", $content);

Get it in whatever character set the source uses, then convert it to something usable locally, such as UTF-8. Then send it to the browser.
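A minimal sketch of that approach, assuming the page really is GB2312 and the mbstring extension is available. The function name gb2312ToUtf8() is hypothetical, and a hard-coded GB2312 byte string stands in for the result of file_get_contents() or the HTML parser so the sketch runs offline:

```php
<?php
// Convert bytes fetched from a GB2312 page to UTF-8 (target first, source last).
function gb2312ToUtf8(string $raw): string
{
    return mb_convert_encoding($raw, 'UTF-8', 'GB2312');
}

// The two characters "中文" encoded in GB2312.
$raw  = "\xD6\xD0\xCE\xC4";
$text = gb2312ToUtf8($raw);

// Tell the browser the output is UTF-8 (ignored when run from the CLI).
header('Content-Type: text/html; charset=UTF-8');
echo $text, PHP_EOL;
```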

set header('Content-Type: text/html; charset=utf-8');
It's working for me
