PHP, HTML and character encodings

I have a fairly simple question, but I can't find an answer anywhere. The PHP function html_entity_decode is supposed to "convert all HTML entities to their applicable characters".
So, since &Omega; is the HTML entity for the Greek capital letter Omega, I'd expect that echo html_entity_decode('&Omega;', ENT_COMPAT, 'UTF-8'); would output Ω. But instead, it outputs some strange characters which my browser can't recognize. Why is this?
Thanks,
Martijn

When you convert entities into UTF-8 characters like your last parameter specifies, your output encoding must be UTF-8 as well. Otherwise, in a single-byte encoding like ISO-8859-1, you will see double-byte characters as two broken single ones.
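
A minimal sketch of what this answer describes, assuming a PHP build where html_entity_decode() accepts 'UTF-8' as its third argument (the variable names are illustrative, not from the question). It shows that the decoded Ω is the two bytes CE A9, so the page has to be declared as UTF-8 for the browser to read them as one character:

```php
<?php
// Declare the output encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Decode the named entity into its UTF-8 byte sequence.
$omega = html_entity_decode('&Omega;', ENT_COMPAT, 'UTF-8');

// Inspect the raw bytes: U+03A9 is encoded in UTF-8 as CE A9.
$hex = bin2hex($omega);

echo $omega, ' (bytes: ', $hex, ")\n";
```

If the page were served as ISO-8859-1 instead, those same two bytes would be shown as two separate characters.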

It works fine:
http://codepad.viper-7.com/tb2LaW
Make sure your webpage encoding is UTF-8.
If your webpage uses a different encoding, change the third argument ('UTF-8') to match it:
html_entity_decode('&Omega;', ENT_COMPAT, 'UTF-8');

header('Content-type: text/html;charset=utf-8');
mysql_set_charset("utf8", $conn);
See also:
http://www.phpwact.org/php/i18n/charsets
php mysql character set: storing html of international content

Related

How to replace bullets with •

I am retrieving text data from a database which includes bullets and newlines. I have successfully removed the newlines and converted them to <br /> using the nl2br() function in PHP, but the bullets act weird and display "â€¢" instead of "•" (see screenshot).
I have tried using htmlspecialchars() function in PHP but it still displays the same output.

I have used htmlentities() now instead of htmlspecialchars. I have solved my own problem but I hope this thread will help others in the future.

The Unicode character U+2022 (BULLET) is encoded in UTF-8 as the octets E2 80 A2. If your page contains these octets, and the page is incorrectly interpreted using a different character encoding, such as Windows-1252, the resulting page will display the three characters â, €, ¢.
To properly display the bullet character, you need to declare the correct character encoding for your document:
header ('Content-Type: text/html; charset=utf-8');
If it is not feasible to use the UTF-8 encoding, you can convert the string using htmlentities(), which should convert the bullet characters, and other undisplayable characters, into HTML character references (&bull;):
$s = "Bullet \xe2\x80\xa2 character";
echo htmlentities ($s), "\n";
Or, if PHP's default character encoding is not UTF-8, pass the encoding explicitly:
$s = "Bullet \xe2\x80\xa2 character";
echo htmlentities ($s, ENT_NOQUOTES, 'utf-8'), "\n";
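
To see where "â€¢" comes from, here is a small sketch that assumes the mbstring extension is available (the variable names are illustrative). It reinterprets the three UTF-8 bytes of the bullet as Windows-1252, which is roughly what a browser does with an undeclared page, and then escapes the same bytes with htmlentities():

```php
<?php
// U+2022 BULLET as its three UTF-8 bytes.
$bullet = "\xe2\x80\xa2";

// Reinterpret those bytes as Windows-1252, which is what a browser does
// when the page is not declared as UTF-8. Requires the mbstring extension.
$misread = mb_convert_encoding($bullet, 'UTF-8', 'Windows-1252');

// One character has turned into three: â, €, ¢.
$count = mb_strlen($misread, 'UTF-8');

// Escaping with an explicit charset produces a plain-ASCII reference instead.
$escaped = htmlentities($bullet, ENT_NOQUOTES, 'UTF-8');

echo $misread, ' (', $count, " characters)\n";
echo $escaped, "\n";
```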

UTF-8 and ÅÄÖ special characters PHP

Hi, have a look at this picture: http://ctrlv.in/175196. As you can see, the ÅÄÖ are replaced with �.
I have this at the very top of my PHP file: <meta http-equiv="content-type" content="text/html; charset=utf-8"></meta>
and when I look at the source it is indeed utf-8, so why don't they display properly?

When you see the UNICODE REPLACEMENT CHARACTER �, it means your text is being interpreted as UTF-8 (or another Unicode encoding), but one of the byte sequences in the file was not valid in this encoding.
In other words, the file is not UTF-8 encoded.
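
A quick way to check this, sketched under the assumption that the text was actually saved as ISO-8859-1 (the question does not say which encoding the file uses) and that the mbstring extension is available:

```php
<?php
// "ÅÄÖ" as saved by an editor using ISO-8859-1: one byte per letter.
// The source encoding is an assumption made for this example.
$latin1 = "\xC5\xC4\xD6";

// These bytes do not form valid UTF-8, so a browser told to expect UTF-8
// shows U+FFFD for each of them.
$isValidUtf8 = mb_check_encoding($latin1, 'UTF-8');

// Converting from the real source encoding yields valid UTF-8.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($isValidUtf8);
echo bin2hex($utf8), "\n";
```

The meta tag only tells the browser what to expect; it does not change the bytes that are actually sent.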

Try this:
iconv('windows-1250', 'utf-8', $your_variable);
If the data comes from an SQL query, call set_charset('utf8') before running the query.

In Notepad++, try saving the file as UTF-8 without BOM.

Auto php echo for HTMl special characters

I'm developing a php web app in Portuguese, but when I want to echo some word with special character, like:
ç õ â é
The echo prints the HTML equivalent. Is there any function that converts the special characters to their HTML equivalent?
Tks

I think you want to use htmlentities()
http://ch.php.net/manual/en/function.htmlentities.php

What is your page encoding? UTF-8? And what is your file's encoding?
Try setting the file's encoding to UTF-8 without BOM
and set the page encoding to UTF-8.

utf-8 to iso-8859-1 encoding problem

I'm trying to preview the latest post from an RSS feed on another website. The feed is UTF-8 encoded, whilst the website is ISO-8859-1 encoded. When displaying the title, I'm using:
$post_title = 'Blogging â€“ does it pay the bills?';
echo mb_convert_encoding($post_title, 'iso-8859-1','utf-8');
// returns: Blogging ? does it pay the bills?
// expected: Blogging - does it pay the bills?
Note that the hyphen I'm expecting isn't a normal minus sign but some big-ass uber dash. Well, a few pixels longer anyway. :) Not sure how else to describe it as my keyboard can't produce that character...

mb_convert_encoding does convert the bytes, but any character that has no equivalent in the target encoding is replaced with a substitute character (? by default). To get an approximation of the character instead, you need iconv with //TRANSLIT.
mb_internal_encoding( 'UTF-8' );
ini_set( 'default_charset', 'ISO-8859-1' );
$post_title = 'Blogging — does it pay the bills?'; // I used the actual m-dash here to best mimic your scenario
echo iconv( 'UTF-8', 'ISO-8859-1//TRANSLIT', $post_title );
Or, as others have said, just convert out-of-range characters to html entities.

I suspect you mean an Em Dash (—). ISO-8859-1 doesn't include this character, so you aren't going to have much luck converting it to that encoding.
You could use htmlentities(), but I'd suggest moving off ISO-8859-1 to UTF-8 for publication.

I suppose the following:
Your file is actually encoded with UTF-8
Your editor interprets the file with Windows-1252
The reason for that is that your EM DASH character (U+2014) is represented by â€“. That’s exactly what you get when you interpret the UTF-8 code word of that character (0xE28094) with Windows-1252 (0xE2=â, 0x80=€, 0x94=”). So you first need to fix your editor encoding.
And the reason for the ? in your output is that ISO 8859-1 doesn’t contain the EM DASH character.
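
Both the ? and the entity workaround mentioned in these answers can be reproduced with a short sketch. It assumes the mbstring extension is available and writes the em dash as UTF-8 bytes so the result does not depend on how the script file itself is saved:

```php
<?php
// U+2014 EM DASH as UTF-8 bytes, so the example does not depend on the
// encoding this file is saved in.
$title = "Blogging \xE2\x80\x94 does it pay the bills?";

// Make the substitute character explicit (0x3F is "?").
mb_substitute_character(0x3F);

// ISO-8859-1 has no em dash, so the conversion falls back to "?".
$latin1 = mb_convert_encoding($title, 'ISO-8859-1', 'UTF-8');

// An HTML character reference is pure ASCII and survives in any encoding.
$escaped = htmlentities($title, ENT_QUOTES, 'UTF-8');

echo $latin1, "\n";
echo $escaped, "\n";
```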

It's probably an em dash (U+2014), and what you're trying to do isn't converting the encoding, because the hyphen is a different character. In other words, you want to search for such characters and replace them manually.
Better yet, just switch the website to UTF-8. It largely coincides with Latin-1 and is more appropriate for a website in 2009.

utf-8 and htmlentities in RSS feeds

I'm writing some RSS feeds in PHP and struggling with character-encoding issues. Should I utf8_encode() before or after htmlentities() encoding? For example, I've got both ampersands and Chinese characters in a description element, and I'm not sure which of these is proper:
$output = utf8_encode(htmlentities($source)); or
$output = htmlentities(utf8_encode($source));
And why?

It's important to pass the character set to the htmlentities function, as the default is ISO-8859-1 (in PHP versions before 5.4; later versions default to UTF-8):
utf8_encode(htmlentities($source,ENT_COMPAT,'utf-8'));
You should apply htmlentities first so that utf8_encode can encode the entities properly.
(EDIT: I changed from my opinion before that the order didn't matter based on the comments. This code is tested and works well).

First: The utf8_encode function converts from ISO 8859-1 to UTF-8. So you only need this function, if your input encoding/charset is ISO 8859-1. But why don’t you use UTF-8 in the first place?
Second: You don’t need htmlentities. You just need htmlspecialchars to replace the special characters with character references. htmlentities would replace too many characters that can be encoded directly in UTF-8. It is important to use the ENT_QUOTES quote style so that single quotes are replaced as well.
So my proposal:
// if your input encoding is ISO 8859-1
htmlspecialchars(utf8_encode($string), ENT_QUOTES)
// if your input encoding is UTF-8
htmlspecialchars($string, ENT_QUOTES, 'UTF-8')
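
As a rough illustration of the UTF-8 case, here is a sketch with a hypothetical description string (not taken from the question) that contains an ampersand, a single quote and two Chinese characters:

```php
<?php
// Hypothetical description text: an ampersand, a single quote, and Chinese
// characters written as UTF-8 bytes (U+4E2D U+6587).
$source = "Tom & Jerry's \xE4\xB8\xAD\xE6\x96\x87";

// Only &, <, >, " and ' are escaped; the Chinese characters pass through.
$output = htmlspecialchars($source, ENT_QUOTES, 'UTF-8');

echo $output, "\n";
```

The multi-byte characters are left untouched, which is why the feed itself must then be declared as UTF-8.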

Don't use htmlentities()!
Simply use UTF-8 characters. Just make sure you declare encoding of the feed in HTTP headers (Content-Type:application/xml;charset=UTF-8) or failing that, in the feed itself using <?xml version="1.0" encoding="UTF-8"?> on the first line.

It might be easier to forget htmlentities and use a CDATA section. It works for the title section, which doesn't seem to support encoded HTML characters in Firefox's RSS viewer:
<title><![CDATA[News & Updates " > » ☂ ☺ ☹ ☃ Test!]]></title>

You want to do $output = htmlentities(utf8_encode($source));. This is because you want to convert your international characters into proper UTF-8 first, and then have ampersands (and possibly some of the UTF-8 characters as well) turned into HTML entities. If you do the entities first, then some of the international characters may not be handled properly.
If none of your international characters are going to be changed by utf8_encode, then it doesn't matter which order you call them in.

After much trial & error, I finally found a way to properly display a string from a utf8-encoded database value, through an xml file, to an html page:
$output = '<![CDATA['.utf8_encode(htmlentities($string)).']]>';
I hope this helps someone.
