How can I convert input to HTML Characters correctly - php

Let's say I'm including a file that contains HTML. The HTML has characters such as exclamation marks and Spanish accented letters (á, ó). The included text is displayed as replacement symbols instead of the correct characters. This happens in FF but not in IE (8).
I have tried the following functions:
htmlspecialchars, htmlentities, utf8_encode
include htmlentities("cont/file.php");
Sample file.php contents:
<div>Canción, “Song Name”</div>
Output:
Canci�n, �Song Name�

Your code does nothing but run the string "cont/file.php" through htmlentities(); the content of the file is not affected by that.
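A minimal sketch of that point, assuming nothing beyond stock PHP: the path string passes through htmlentities() unchanged, and escaping the file's content instead would turn the tags into visible text, which is probably not what the question wants either. The variable names are only for illustration.

```php
<?php
// The argument to include is just a path string; htmlentities() only ever sees that string.
$path = "cont/file.php";
$escapedPath = htmlentities($path);

// The path has no HTML-special characters, so it comes back unchanged.
var_dump($escapedPath === $path);

// Escaping the *content* converts the markup too, so the <div> would be shown literally.
$content = "<div>Canci\xC3\xB3n</div>"; // "Canción" written as UTF-8 bytes
$escapedContent = htmlentities($content, ENT_QUOTES, 'UTF-8');
echo $escapedContent, "\n";
```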

You should set the encoding to UTF-8 on the HTML page that displays this content. htmlentities isn't affecting this text at all.
I tried the same stuff with following code and it worked fine:
index.php
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>TODO supply a title</title>
</head>
<body>
<p>
TODO write content
<?php
include "test.php";
?>
</p>
</body>
</html>
test.php
<div>ääääääó</div>

Output an HTTP Content-Type header that specifies the character encoding you are using (UTF-8 is recommended) in the charset parameter.
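In PHP that could look roughly like the sketch below. The helper name content_type_header is made up for illustration; header() and headers_sent() are the standard PHP functions. The call has to happen before any output is sent.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an HTML response.
function content_type_header(string $charset = 'UTF-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only works before the first byte of output, so guard the call.
if (!headers_sent()) {
    header(content_type_header());
}
```

The charset named here should match the encoding the files are actually saved in; the header only declares the encoding, it does not convert anything.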

echo htmlentities(file_get_contents("cont/file.php")); is probably what you're asking for.
But, as mentioned before, you should not use htmlentities for this; use UTF-8 encoding instead.

This is what ended up working, combining your code and mine. The exact reason is hard to pin down, but it has something to do with parsing.
This is what the browsers showed (FF + IE):
alt text http://i77.photobucket.com/albums/j65/speedcoder/4-3-20101-22-31PM.png
Sample ('include' function not used, so output buffering is not needed):
<?php
$varr = '<div>ääääääó</div>';
echo utf8_encode($varr);
?>
This one didn't work for me:
<?php
include "test.php";
?>
When the sample above used an include file containing HTML, the characters were not converted, at least for me. I changed it to not use an include file and it worked with utf8_encode, but my code needs the include function, which didn't work.
The next sample uses include together with an output buffer, which lets the included code be executed and rendered before utf8_encode is applied.
My code scenario (in my case it has to use ob, since the include file also contains code that needs to be parsed first):
ob_start();
include ("cont/file.php");
$content = ob_get_contents();
ob_end_clean();
echo utf8_encode($content);
Thanks for helping me figure it out "Ondrej Slinták"!!!
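For anyone reading this later: utf8_encode is deprecated as of PHP 8.2, so here is a rough sketch of the same output-buffer approach using mb_convert_encoding. It assumes the mbstring extension is available and that the included file really is ISO-8859-1 (which is what utf8_encode assumes too). The helper name render_as_utf8 and the temporary demo file are hypothetical.

```php
<?php
// Hypothetical helper: runs an include file and returns its output converted to UTF-8.
// Assumption: the file's output is in $sourceEncoding (ISO-8859-1 by default).
function render_as_utf8(string $path, string $sourceEncoding = 'ISO-8859-1'): string
{
    ob_start();                  // capture whatever the include prints
    include $path;               // any PHP inside the file is executed first
    $content = ob_get_clean();   // fetch the buffer and stop buffering
    return mb_convert_encoding($content, 'UTF-8', $sourceEncoding);
}

// Demo with a temporary file that stores "ó" as the single ISO-8859-1 byte 0xF3.
$tmp = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($tmp, "<div>Canci\xF3n</div>");
$html = render_as_utf8($tmp);
unlink($tmp);

echo $html, "\n";
```

If the file is already saved as UTF-8, converting it again would double-encode the characters, so in that case a plain include plus a UTF-8 charset declaration should be enough.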

Related

Text in html file is displayed but html tags are being ignored

I have a file test.php containing, this:
<HTML>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<?php include('text.html'); ?>
</body>
</html>
The included file contains HTML-formatted text with various HTML tags.
When executing test.php, the included text is displayed, but the HTML tags do not format the output; they are visible as in the source. The UTF-8 special characters are also not displayed properly.
Example: �h3> ( 1880, Bucure_ti - 1970, Bucure_ti )<�/h3> <�P>
If I do not use include but I am simply pasting the content of the included file directly into the source code, the output is fine.
I also tried to use the below code I found on this site, instead of include, but the result was the same:
<?php
$f = fopen("bio.php", "r");
// Read line by line until end of file
while(!feof($f)) {
echo fgets($f) . "<br />";
}
fclose($f);
?>
What is wrong?
text.html and/or test.php are/is not encoded in UTF-8 as your meta tag is claiming. Encode them both in UTF-8. If you're still experiencing issues, encode them UTF-8 w/o BOM. (Byte-Order Mark)
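One way to check this from PHP itself, as a rough sketch: read the raw bytes and test whether they are valid UTF-8 and whether they start with a BOM. The helper name inspect_encoding is made up; mb_check_encoding requires the mbstring extension. The sample strings stand in for file contents you would get from file_get_contents.

```php
<?php
// Hypothetical helper: reports whether bytes are valid UTF-8 and whether they begin with a BOM.
function inspect_encoding(string $bytes): array
{
    return [
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
        'has_bom'    => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
    ];
}

$singleByte = "Bucure\xBAti";                 // one byte for the accented letter: not valid UTF-8
$utf8       = "Bucure\xC8\x99ti";             // the same word with the letter stored as UTF-8
$utf8Bom    = "\xEF\xBB\xBF" . "<h3>casa</h3>"; // valid UTF-8, but with a leading BOM

print_r(inspect_encoding($singleByte));
print_r(inspect_encoding($utf8));
print_r(inspect_encoding($utf8Bom));
```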

Convert utf8 without bom to utf 8

Files
index.php :
<?php
include_once 'index_a.php';
?>
index_a.php :
<html>
<head>
<title>test</title>
</head>
<body>
casa
</body>
</html>
Results
The first result is from the index.php and the second index_a.php.
Why do I get those quotes?
If I convert index_a.php to UTF-8 without BOM, the quotation marks do not appear, but I want the file to be encoded in UTF-8.
Your question doesn't make sense: a UTF-8 encoded file may (but shouldn't, as the byte ordering for UTF-8 is fixed) have a BOM. In both cases your file will be UTF-8 encoded, so you're done already. What happened here is that you've asked an XY question.
So, what you really want to know is: why do those quotes show up for a normal UTF8 encoded file without BOM, but not when there is a BOM, and the answer to that is that you're giving the browser HTML code that could be any version of HTML, and expect it to know which version you want rendered.
Without any knowledge of the document type, the browser may, or may not, treat any whitespace between tags as a single whitespace, or no whitespace, depending on the rendermode it guessed you wanted. So if you really don't want that " " then you shouldn't rely on the file encoding, you should make it explicit to the browser that what you're giving it to render is proper HTML. Add
<!doctype html>
at the top so that all browsers know this is a modern HTML5 content file and should be parsed accordingly, rather than falling back into an unpredictable quirks mode.
edit
http://jsbin.com/helikafuni/1/ shows proper HTML5 doctype and element use (you're using ancient HTML 4.01 syntax. It's time to read up on how HTML5 changed a lot of the rules and use those new rules instead).
If you want to change the encoding of your files, I would suggest using Notepad++!
After you installed it you can open your files in it and change the encoding like this:
(See point "Convert to UTF-8")
UPDATE:
This should work for you:
index.php:
<?php
include_once 'index_a.php';
?>
index_a.php:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>test</title>
</head>
<body>
casa
</body>
</html>

Encode accentuated characters in json using php

I have a query result that contains some accented characters, like:
CollectionTitle => Afleuréss
But when I write a JSON file with json_encode($Result_Array) and retrieve the result, it shows:
CollectionTitle => NULL
then i used array_map() :
$res[] = array_map('utf8_encode', $row);
But that gives me:
CollectionTitle => AfleurÃ©ss instead of CollectionTitle => Afleuréss
Please suggest a better way to resolve this issue.
Thanks
The second one is actually the correct one. The problem is your browser cannot detect the encoding and defaults to whatever the default is (probably ISO-8859-1). Switch your browser encoding and you'll see the right character appear.
Add to your HTML head:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
Note that you should have a proper HTML doctype because browsers default to non utf8. You can do a simple test, like I did, this works:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<?php
$title = "Jérôme";
echo $title."<br>";
But the place for the meta tag is in the head tag. The HTML document should look like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>An XHTML 1.0 Strict standard template</title>
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
</head>
<body>
<?php
$title = "Jérôme";
echo $title."<br>";
?>
That is standard.
json_encode only supports UTF-8, but the rest of your app is using Windows-1252. I don't suggest using utf8_encode as that converts from ISO-8859-1 to UTF-8. That only works 95% of the time for you because you are using Windows-1252, not ISO-8859-1*.
I don't know if it's possible for you but if you can, you should switch over to UTF-8 so you don't need this fragile conversion code anywhere.
*This is probably confusing. Browsers do not actually allow you to use ISO-8859-1 and instead treat it as Windows-1252. Same with MySQL, Latin1 means Windows-1252. Both are defaults. utf8_encode/decode of course use actual ISO-8859-1, so it's incompatible in the 0x80-0x9F range.
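A small sketch of that incompatibility, assuming the mbstring extension is loaded. Byte 0x93 is the left curly quote in Windows-1252 but an invisible control character in real ISO-8859-1. mb_convert_encoding with 'ISO-8859-1' as the source is used here as a stand-in for utf8_encode, since both apply the ISO-8859-1 mapping.

```php
<?php
$byte = "\x93"; // left double quotation mark in Windows-1252; a C1 control code in ISO-8859-1

// Treating the byte as ISO-8859-1 (the mapping utf8_encode applies) yields U+0093.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Treating it as Windows-1252 yields U+201C, the curly quote that was intended.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c293
echo bin2hex($asCp1252), "\n"; // e2809c
```

Accented letters such as é sit outside 0x80-0x9F and come out the same either way, which is why the mistake often goes unnoticed until curly quotes, the euro sign or a trademark symbol show up.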
In your second example / step:
$res[] = array_map('utf8_encode', $row);
It looks like you're trying to encode in UTF-8 something that is not ISO-8859-1.
You should detect / know what's the encoding coming from your Database, and transcode it properly to UTF-8 with iconv for example.
As an alternative, you should know:
What is the encoding in the Database?
What is the encoding of the PHP files?
What is the encoding in the HTML page? <meta charset="utf-8">
And if that's possible, move all of the above to UTF-8...
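Until everything is UTF-8, a conversion step before json_encode could look roughly like this. It is only a sketch: the helper name row_to_utf8 is made up, the Windows-1252 default is an assumption about what the database returns, and mbstring must be available.

```php
<?php
// Hypothetical helper: converts every string value in a row to UTF-8.
// Assumption: the database returns Windows-1252 bytes; pass the real encoding as $from.
function row_to_utf8(array $row, string $from = 'Windows-1252'): array
{
    return array_map(
        fn($value) => is_string($value) ? mb_convert_encoding($value, 'UTF-8', $from) : $value,
        $row
    );
}

$row = ['CollectionTitle' => "Afleur\xE9ss"]; // é stored as the single byte 0xE9

// Without conversion the bytes are not valid UTF-8, so json_encode() fails.
var_dump(json_encode($row));
echo json_last_error_msg(), "\n";

// After conversion the encode succeeds; JSON_UNESCAPED_UNICODE keeps é readable.
$json = json_encode(row_to_utf8($row), JSON_UNESCAPED_UNICODE);
echo $json, "\n";
```

Checking json_last_error() after a failed encode at least makes the encoding problem visible instead of silently producing an empty value.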

php vs htm, weird gaps

the normal one:
http://labvc.x10hosting.com/AT/site/home.htm
VS
the odd one:
http://labvc.x10hosting.com/AT/site/home.php
When I look at the code side by side, it's almost identical. The only thing that could cause that weird gap is the CSS, but they're using the same sheet.
ideas?
EDIT:
I checked and made minute changes to the code, look again at the source.
Both are EXACTLY the same. wtf is with this gap.
EDIT:
There's a pixel-wide character just before the XML declaration. How do I stop it from occurring?
There's a ï»¿ on the PHP one at the top of the page, which explains the gap IMHO.
Your PHP output has a double byte-order-mark at the head.
Inspecting your code with Firebug, I see this as the first line
ï»¿ï»¿<?xml version="1.0" encoding="UTF-8"?>
Now, all those funky characters there are just ISO-8859-1 decodings of the UTF-8 BOM (0xEF 0xBB 0xBF).
These are possibly added by your IDE/editor into the head of the PHP files themselves. Check your preferences and see what encoding is being used. If it's something like "UTF-8 + BOM" then switch it to just "UTF-8" and that should fix it.
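To make the BOM visible and to clean up output that already contains it, something like the following sketch may help. The constant UTF8_BOM and the helper strip_leading_boms are made-up names, and mbstring is assumed. Fixing the editor setting is still the real fix; stripping is only a workaround.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: removes every BOM found at the very start of a string.
function strip_leading_boms(string $bytes): string
{
    while (strncmp($bytes, UTF8_BOM, 3) === 0) {
        $bytes = substr($bytes, 3);
    }
    return $bytes;
}

// The three BOM bytes, read as ISO-8859-1, turn into the three visible characters ï » ¿.
$visible = mb_convert_encoding(UTF8_BOM, 'UTF-8', 'ISO-8859-1');
echo $visible, "\n";

// Output with a doubled BOM, as described above, cleaned up.
$output  = UTF8_BOM . UTF8_BOM . '<?xml version="1.0" encoding="UTF-8"?>';
$cleaned = strip_leading_boms($output);
echo $cleaned, "\n";
```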
The PHP page generates a "." before the doctype, which explains the extra space.
the html page
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
the php page
.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
delete any code above the doctype in the php page
If you're including any files at the top of the PHP file it could be that you actually have whitespace or a period AFTER the ending ?> in the included file
header.php
<?php
phpstuff...
?> <whitespace or period here>
home.php
<?php
include "header.php";
?>
This can be solved by never using ?> in pure PHP files. It's ok to leave it open like this:
<?php
phpstuff...
You have an extra bit of white space at the top of the PHP file's source code (before the xml declaration).

not being displayed properly

I am having problem in displaying the in my web page, after using utf8_decode() in PHP it gets displayed as �.
i have been using
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
I just noticed, all the other special characters, like ® , ™ etc are also not working.
Be sure that you've specified UTF-8 encoding in your HTML document's tag:
<meta content="text/html; charset=UTF-8" http-equiv="content-type" />
That's strange since utf8_encode(' ')===' '. Regardless of whether it's utf8 or latin1 encoded the byte-sequence for is the same.
Is the remaining string properly utf8 encoded?
edit: Why do you use utf8_decode() (converting utf8 encoded strings to latin1) in the first place when you're telling the browser that your page is utf8 encoded?
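A short sketch of why decoding hurts on a page declared as UTF-8, assuming mbstring is available. mb_convert_encoding from UTF-8 to ISO-8859-1 is used as a stand-in for utf8_decode, since both perform the same conversion. After the conversion, ® is a lone byte that is not valid UTF-8, so a browser reading the page as UTF-8 shows �. A character like ™ has no ISO-8859-1 equivalent at all, so it cannot survive the conversion.

```php
<?php
$utf8 = "\xC2\xAE"; // ® encoded as UTF-8 (two bytes)

// The same direction utf8_decode() converts: UTF-8 to ISO-8859-1.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($latin1), "\n";                   // ae
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```

Leaving the string as UTF-8 and dropping the decode call keeps the bytes consistent with the charset declared in the meta tag.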
Have you checked the encoding of the php file itself?
In some windows editors (like notepad++) you can have some utf-8 character problems when you check the wrong encoding for your file - even if you set your meta tag correctly.
In notepad++ you can change it in this section:
Change notepad++ file encoding http://img198.imageshack.us/img198/9081/notepadp.png
If you're not using notepad++, we'll need some more detailed information from your setup, like Operating System used, IDE, etc.
Also make sure you give the document a proper dtd definition by putting something like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
As the first line of html in your php file.
When you use utf8_decode, where is the string passed to this function loaded from? Are you loading data from a database? Do you have any included files? If so, check that they are all encoded as GmonC wrote. Try to echo the character somewhere in the page and see if it shows correctly. If not, try a clean .php file and see whether the problem still occurs. If it doesn't, then some included file could be the problem, because it could have a different encoding.
