Auto php echo for HTMl special characters - php

I'm developing a php web app in Portuguese, but when I want to echo some word with special character, like:
ç õ â é
The echo prints the html equivalent. Is there any function that convertsthe special characters to their html equivalent?
Tks

I think you want to use htmlentities()
http://ch.php.net/manual/en/function.htmlentities.php

what is your page encoding? utf-8? and what is your file's encoding?
try to set the file's encoding to UTF-8 without BOM
and set page encoding to utf-8

Related

How to replace bullets with •

I am retrieving text data from a database which includes bullets and newlines. I have successfully removed the newlines and converted them to <br /> using the nl2br() function in PHP, but the bullets act weird and display "•" instead of "•" (see screenshot).
I have tried using htmlspecialchars() function in PHP but it still displays the same output.
I have used htmlentities() now instead of htmlspecialchars. I have solved my own problem but I hope this thread will help others in the future.
The Unicode character U+2022 (BULLET) is encoded in UTF-8 as the octets E2 80 A2. If your page contains these octets, and the page is incorrectly interpreted using a different character encoding, such as Windows-1252, the resulting page will display the three characters â, €, ¢.
To properly display the bullet character, you need to declare the correct character encoding for your document:
header ('Content-Type: text/html; charset=utf-8');
If it is not feasible to use the UTF-8 encoding, you can convert the string using htmlentities(), which should convert the bullet characters, and other undisplayable characters, into HTML character references (•):
$s = "Bullet \xe2\x80\xa2 character";
echo htmlentities ($s), "\n";
Or, if PHP's character encoding is not configured correctly:
$s = "Bullet \xe2\x80\xa2 character";
echo htmlentities ($s, ENT_NOQUOTES, 'utf-8'), "\n";

Convert the Chinese Characters From ISO-8859-1 To UTF-8

I got a system which previously the html encoding type was set as ISO-8859-1 and it caused all the Chinese characters store in the format of "&\#36830;&\#34915;&\#35033;".
So my question is, how can I convert the format above into Chinese word back in UTF-8?
For your information, I had tried with utf8_decode, iconv, but none of them work. :(
Thank you very much.
The current text encoding of that string is rather insubstantial. What you have there are HTML entities; they have little to do with the underlying "physical" encoding like ISO-8859 or UTF-8. What you want is to decode those HTML entities into a byte representation of the characters in a specific encoding, in this case to UTF-8. Therefore:
echo html_entity_decode('连衣裙', ENT_COMPAT, 'UTF-8');
// 连衣裙
You need to use:
utf8_encode($data);
and not decode,to convert your current ISO-8859-1 to UTF-8.
Some native PHP functions such as strtolower(), strtoupper() and ucfirst() do not always function correctly with UTF-8 strings. Possible solutions: convert to latin first or add the following line to your code:
setlocale(LC_CTYPE, 'C');
Make sure not to save your PHP files using a BOM (Byte-Order Marker) UTF-8 file marker (your browser might show these BOM characters between PHP pages on your site).
Just for your reference:
ISO-8859-1 => Albanian, Brazilian, Catalan, Danish, Dutch, English, Finnish, French, German, Portuguese, Norwegian, Spanish, Swedish
UTF-8 => Chinese (simplified), Chinese (traditional), Japanese, Persian
There are many tools that can convert character references to characters, and writing such a tool is rather straightforward, especially if you know the references are all decimal. So the answer really depends on the software environment.
For example, to do such a conversion for an individual HTML document, you could use the BabelPad editor: command Convert → Numeric Character References (NCR) → NCR to Unicode, and save the result as UTF-8.

Send polish characters in ISO-8859-1 environment with a html form

I have the following problem:
I have a webpage with a ISO-8859-1 charset.
Inside this page i have a form with the argument
accept-charset="UTF-8"
The form will be send with the post method
Now i want to submit polish letters mixed with latin letters
test ł,ą,ę,ś,ć,ż,ź,ó test
And want to display the result with PHP.
I tried several combinations of mb_convert_encoding, utf8_encode / utf8_decode, iconv and everything else. But the display is not right.
I just want to say to the browser: the page charset is iso-8859-1 but this small area is UTF-8
You cannot do anything with character encoding conversion here. Polish characters simply cannot be encoded in ISO-8859-1, full stop. If you do not want to change the encoding of your website to an encoding which can encode Polish and other characters (such as ISO-8859-2, UTF-8 or similar), all you can do is to represent these characters as HTML entities:
echo htmlentities($utf8Text, ENT_NOQUOTES, 'UTF-8');
You cannot make part of the page be encoded in a different encoding.

PHP, HTML and character encodings

I actually have a fairly simple question but I'm unable to find an answer anywhere. The PHP function html_entity_decode is supposed to "converts all HTML entities to their applicable characters from string."
So, since Ω is the HTML encoding for the Greek captical letter Omega, I'd expect that echo html_entity_decode('Ω', ENT_COMPAT, 'UTF-8'); would output Ω. But instaid, it outputs some strange characters which my browser can't recongize. Why is this?
Thanks,
Martijn
When you convert entities into UTF-8 characters like your last parameter specifies, your output encoding must be UTF-8 as well. Otherwise, in a single-byte encoding like ISO-8859-1, you will see double-byte characters as two broken single ones.
It's works fine:
http://codepad.viper-7.com/tb2LaW
Make sure your webpage encoding is UTF-8
If you have different encoding on webpage change this:
html_entity_decode('Ω', ENT_COMPAT, 'UTF-8');
^^^^^
header('Content-type: text/html;charset=utf-8');
mysql_set_charset("utf8", $conn);
Refer this URL:-
http://www.phpwact.org/php/i18n/charsets
php mysql character set: storing html of international content

PHP character encoding � sign instead of à

Hi this is a very strange error I have on some pages of this joomla website:
http://www.pcsnet.it/news
If you go into the details of a specific news the à character is correctly displayed.
Other accented characters seem not affected.
I've checked that UTF-8 encoding is default both in the MySql db and that the text files are in UTF-8 encoding.
Other ideas?
What is very interesting in your case is that it only affects the letter à! So it cannot be an encoding problem.
Here is my take on your problem. The letter à is encoded in two bytes in utf8. The first byte is xC3, which is à in latin-1, the second byte is... non breaking space! (The other accented letters, such as è are encoded by à followed by an other accented letter in latin-1, and they are not affected).
Therefore, my guess is that you have a script, somewhere, that removes, or replaces, the non-breaking space in latin-1, i.e., character xA0. The resulting lonely byte xC3 cannot be displayed properly, so the general placeholder � is displayed instead. (just load your page in latin-1, you will see that I am right).
Find that script that removes non-breaking spaces, and you'll be fine.
Are you by any chance using htmlentities, htmlspecialchars or html_entity_decode on the text you retrieved from the database ? If so, you need to force UTF8 in the third parameter because it's not the default charset of these functions.
Example: htmlentities('£hello', null, 'utf-8');
A � sign is usually an indication that the character the browser is trying to display is not available in the font used. It's probably not an à, if that works on other pages (using the same font).

Categories