Random HTML accent encoding issues - php

I have a problem with accented characters in HTML. My page sometimes loads with all characters correct and sometimes with the typical strange characters like Ã; refreshing the page is enough to flip between the two. It seems completely random, except that the first load after clearing the cache is always wrong.
Of course I have the meta line in the head:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
The file has a .php extension. I don't know if this is relevant, but I include the following two lines in the PHP section:
header("Content-Type: text/html;charset=UTF-8");
ini_set('default_charset', 'UTF-8');
Thanks

Those settings tell the browser what encoding you claim to be using, but they don't change the encoding of the data itself.
If your data is not UTF-8 encoded, you need to convert it in your code using something like the utf8_encode() function or the mb_convert_encoding() function.
You can use the function mb_detect_encoding() to find out what encoding your data is in, and then convert accordingly.
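A minimal sketch of that idea, under some assumptions: the helper name to_utf8() is made up for illustration, and the fallback encoding (Windows-1252) is a guess you would replace with whatever your data really uses. It swaps mb_detect_encoding() for mb_check_encoding(), because a yes/no validity check is easier to reason about than detection heuristics. The byte escapes keep the example independent of how the file itself is saved.

```php
<?php
// Hypothetical helper: returns $text as UTF-8, converting only when needed.
// $fallback is an ASSUMPTION about what non-UTF-8 input really is.
function to_utf8(string $text, string $fallback = 'Windows-1252'): string
{
    // Already valid UTF-8: leave it alone to avoid double encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise reinterpret the bytes as the fallback encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

$latin1 = "M\xFCnchen";      // "München" as Windows-1252 bytes
$utf8   = "M\xC3\xBCnchen";  // "München" as UTF-8 bytes

echo bin2hex(to_utf8($latin1)), "\n"; // 4dc3bc6e6368656e
echo bin2hex(to_utf8($utf8)), "\n";   // 4dc3bc6e6368656e
```

Both inputs end up as the same bytes, which is what you want before echoing into a page declared as UTF-8.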

Related

$_POST will convert from utf-8 to Ã¤ Ã¶ Ã¼ etc

I am new here, so I apologize if I am doing anything wrong.
I have a form which submits user input onto another page. User is expected to type ä, ö, é, etc... I have placed all of the following in the document:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
header('Content-Type:text/html; charset=UTF-8');
<form action="whatever.php" accept-charset="UTF-8">
I even tried:
ini_set('default_charset', 'UTF-8');
When the other page loads, I need to check what the user input with something like:
if ( $_POST['field'] == $check ) {
...
}
But if he inputs something like 'München', PHP will compare 'MÃ¼nchen' with 'München' and the comparison will never be TRUE even though it should. Since UTF-8 is specified everywhere, I am guessing that the server is converting to something else (Windows-1252, as I read in another thread) because it does not support or is not configured for UTF-8. I am using Apache on a local server before I deploy to production; I have not changed (and don't know how to change) any of the default settings. I've been working on Windows 7, editing with Notepad++ and encoding my files in ANSI. If I bin2hex('München') I get '4dc3bc6e6368656e'.
If I echo $_POST['field']; it displays 'München' correctly.
I have researched everywhere for an explanation, all I find is that I should include those tags/headings I already have.
Any help is much appreciated.
You are facing several different problems at the same time; let's start with the simplest one.
Problem 1) You say that echo $_POST['field']; displays it correctly. What do you mean by "display"? It can be displayed correctly in two cases:
either the field is in UTF-8, your page has been declared as UTF-8, and the browser is displaying it as UTF-8, or
the field is in Latin-1 and the browser has decided (through its auto-detection heuristics) that your page is in Latin-1.
So the fact that echo $_POST['field']; looks correct tells you nothing.
Problem 2) You are using
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
header('Content-Type:text/html; charset=UTF-8');
Is this PHP code? If it is, calling header() after the <meta> tag is an error, because headers must be set before any byte of output is sent. In that case the Content-Type header will not be set and PHP should generate a warning.
Problem 3) You are using
<form action="whatever.php" accept-charset="UTF-8">
Some browsers (mostly IE) ignore accept-charset if they can coerce the data into ASCII or ISO Latin-1. The data will then either be in UTF-8 but declared as ISO Latin-1, or be in ISO Latin-1 and sent as ISO Latin-1 (the second case is not yours).
Have a look at https://stackoverflow.com/a/8547004/449288 to see how to solve this problem.
Problem 4) Which strings are you comparing? For example, if you have
$city = "München";
$_POST['city'] == $city;
The result of this comparison depends on the encoding of the PHP file. If the file is encoded in ISO Latin-1 and $_POST correctly contains UTF-8 data, the == will compare different bytes and will return false.
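To make the byte mismatch concrete, here is a small sketch. It assumes the source file was saved as Windows-1252 ("ANSI") and the form value arrives as UTF-8; the variable names are invented, and the strings are written as byte escapes so the result does not depend on how this snippet itself is saved.

```php
<?php
$fromSourceFile = "M\xFCnchen";     // bytes a Windows-1252 file holds for "München"
$fromPost       = "M\xC3\xBCnchen"; // bytes a UTF-8 form submission holds

var_dump($fromPost == $fromSourceFile); // bool(false)
echo bin2hex($fromSourceFile), "\n";    // 4dfc6e6368656e
echo bin2hex($fromPost), "\n";          // 4dc3bc6e6368656e

// Converting the literal to UTF-8 first makes the comparison succeed.
$normalized = mb_convert_encoding($fromSourceFile, 'UTF-8', 'Windows-1252');
var_dump($fromPost === $normalized);    // bool(true)
```

Comparing bin2hex() output like this is usually the quickest way to see which side holds which encoding.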
Another solution that may be helpful is on the Apache side: you can place a directive called AddDefaultCharset in your configuration file (httpd.conf) or in .htaccess. It looks like this:
AddDefaultCharset utf-8
http://httpd.apache.org/docs/2.0/mod/core.html#adddefaultcharset
That will override any other default charsets.
I changed "mbstring.detect_order = pass" in my php.ini file and it worked.
I've used Unicode characters in my forms and files many times and have not had any problems so far.
Try these steps and check the result:
Remove header('Content-Type:text/html; charset=UTF-8'); from the code of your HTML form.
Write your form tag simply as <form action="whatever.php">, without accept-charset="UTF-8". (It's better to also specify the method for sending data in your form tag.)
In the target page (whatever.php), insert <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> again inside the <head> tag.
I have always set up my projects the way I describe here and have not had any problems with Unicode strings.
This is due to the character encoding of the PHP file(s).
The hardcoded München is stored with the character encoding of the source file(s), in this case ANSI and when that value is compared to the UTF-8 encoded value provided in the $_POST variable, the two will, quite naturally, differ.
The solution to your problem is one of:
1. Serve and process content with the same encoding as that of the source file(s), in this case likely to be windows-1252.
This would, for starters, include changing the content="text/html; charset=UTF-8" to content="text/html; charset=windows-1252" whenever serving HTML data.
2. Avoid all hardcoded values that could be affected by character encoding issues between UTF-8 and windows-1252, more or less only hardcode values that only includes English letters and numbers.
Any UTF-8 values would have to be read from a source that ensures they are UTF-8 encoded (for instance a database set to use UTF-8 as storage encoding as well as connection encoding).
3. Wrap all hardcoded assignments in utf8_encode(), for instance $value = utf8_encode ('München');
4. Change the encoding of the source file(s) to UTF-8.
This can be accomplished in any number of ways, a decent text editor will be able to do it or the outstanding libiconv can be used, especially for batch processing.
Either solution 1 or 4 would be my preferred solution, especially if multiple people are involved in the project.
As a side note, some text editors (notably Notepad++) have the option of using either UTF-8 or UTF-8 without BOM. The BOM (Byte Order Mark) is pointless in UTF-8 and will cause problems when writing headers in PHP (most often when doing a redirect). This is because the BOM sits right in front of the initial <?php, causing the server to send the BOM just as it would had there been any other character in front. The difference is that you would notice a visible character in front, whereas the BOM isn't displayed.
Rule of thumb: Always use UTF-8 without BOM.
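If you suspect a file or an imported string carries a BOM, a check along these lines can confirm it. This is only a sketch: strip_utf8_bom() is a made-up helper name, and it looks solely for the three-byte UTF-8 BOM (EF BB BF).

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) if present.
function strip_utf8_bom(string $contents): string
{
    if (strncmp($contents, "\xEF\xBB\xBF", 3) === 0) {
        return substr($contents, 3);
    }
    return $contents;
}

$withBom = "\xEF\xBB\xBFhello";

echo bin2hex(substr($withBom, 0, 3)), "\n"; // efbbbf
echo strip_utf8_bom($withBom), "\n";        // hello
```

For a PHP source file the real fix is still to re-save it without the BOM; stripping at runtime only helps for data you read in, such as an uploaded CSV.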

HTML - Mixing UTF-8 coming from MySQL database and special chars into HTML

I have a database where everything is defined in UTF-8 (charsets, collations, ...).
I have a PHP page that gets data from that database and displays it.
That PHP page contains some hardcoded text with special characters, like é, à, ...
My PHP page has meta charset defined to utf-8.
I call mysql_set_charset("utf8");
My PHP page is written on an editor that is configured to encode to utf-8 Unicode (Dreamweaver CS4, there is no other utf-8 option)
Anything coming from the database is ok, but...
I can't get the hardcoded special characters (é, à, ù, ...) to display properly.
I have the same problem when I use strip_tags(html_entity_decode($datafromdatabase)); on data coming from the database. Here it's really problematic.
What can I do to keep using UTF-8 while displaying the special characters properly, without having to use their HTML equivalents (&eacute;, &agrave;, ...)?
EDIT
The problem with the hardcoded characters came from the PHP page not having been saved with the appropriate encoding. I created a new document, copied and pasted the old code into it, and saved it over the old page. No more problems with hardcoded characters.
But I still have problems with strip_tags(html_entity_decode($datafromdatabase));
Using $datafromdatabase = htmlentities(strip_tags(html_entity_decode($datafromdatabase)), ENT_COMPAT, "UTF-8") does not solve the problem. I get strange characters starting with # for each é, à, ù in the text coming from the database (stored as &eacute;, ...)
It looks like the problem is in how your browser displays the characters rather than in how they are saved.
Check two things.
Issue a utf8 http header
header( 'Content-Type: text/html; charset=UTF-8' );
And make sure your html declaration is mentioning utf8
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
That's for html 4
If your document is properly encoded, this should do it.
The problem with the hardcoded characters came from the PHP page not having been saved with the appropriate encoding. I created a new document, copied and pasted the old code into it, and saved it over the old page. No more problems with hardcoded characters.
For the problem coming from strip_tags(html_entity_decode($datafromdatabase)); I had in fact to use strip_tags(html_entity_decode($datafromdatabase, ENT_QUOTES, "UTF-8"));
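A small sketch of why the third argument matters. The sample string is made up; the point is only that the charset passed to html_entity_decode() decides which bytes an entity such as &eacute; becomes. As far as I know, newer PHP versions default to UTF-8 here, so passing it explicitly mostly matters on older installations or when default_charset is set to something else.

```php
<?php
$stored = '<p>caf&eacute; &agrave; la carte</p>'; // hypothetical database value

// Decode entities to UTF-8 bytes, then drop the tags.
$text = strip_tags(html_entity_decode($stored, ENT_QUOTES, 'UTF-8'));
echo bin2hex($text), "\n"; // 636166c3a920c3a0206c61206361727465

// The same entity decoded to ISO-8859-1 is a single byte,
// which a UTF-8 page would show as a broken character.
echo bin2hex(html_entity_decode('&eacute;', ENT_QUOTES, 'ISO-8859-1')), "\n"; // e9
```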

Character encoding in PHP

I never had this problem before; it was usually my database or the HTML page. But now I think it's my PHP. I import text from a CSV file or from a textarea, and either way it goes wrong.
For example, é changes to Ã©. I used htmlentities to fix this, but it didn't work. The htmlentities function didn't return the entity for é but the entities for Ã©, so the real characters are already lost before htmlentities comes into play... So does that mean my PHP file has the wrong encoding or something?
I hope someone can help me out..
Thanks!
Chris
A file is usually ISO-8859-1 (Latin-1) or UTF-8. ISO-8859-1 is 1 byte per character, UTF-8 is 1-4 bytes per character. So if you get 2 characters where you expect one, you are reading UTF-8 and showing it as ISO-8859-1; if you get strange characters, you are reading ISO-8859-1 and showing it as UTF-8.
If you provide more details it will be easier to pinpoint, but in short, you have inconsistent charsets and need to convert one or the other so they're all the same. From what I can see, you're using ISO-8859-1 in your project but reading some UTF-8 from somewhere. Use utf8_decode($text) if that data really should be stored as UTF-8, or find the data and convert it manually.
EDIT: If you are using AJAX somewhere, you will ALWAYS get UTF-8 from it, and you'll have to decode it yourself with utf8_decode() if you want to keep using ISO-8859-1.
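The "2 characters where you expect one" symptom can be reproduced in a few lines. This is just an illustration with byte escapes, not the asker's actual data: it misreads the UTF-8 bytes of é as ISO-8859-1, which is exactly how Ã© comes about, and then undoes the damage.

```php
<?php
$utf8 = "\xC3\xA9"; // é encoded as UTF-8 (two bytes, one character)

// Treat those two bytes as two ISO-8859-1 characters and re-encode them as UTF-8.
$mojibake = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo mb_strlen($utf8, 'UTF-8'), "\n";     // 1
echo mb_strlen($mojibake, 'UTF-8'), "\n"; // 2
echo bin2hex($mojibake), "\n";            // c383c2a9, i.e. "Ã©"

// Reversing the wrong step restores the original bytes.
$repaired = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $utf8);            // bool(true)

// The opposite mistake: a lone ISO-8859-1 byte is not valid UTF-8 at all.
var_dump(mb_check_encoding("\xE9", 'UTF-8')); // bool(false)
```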
Try opening your PHP file and changing its encoding to UTF-8.
If that doesn't help, add this to your PHP:
header('Content-Type: text/html; charset=utf-8');
Or this to your html:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Take a look at PHP's iconv().

weird characters

So I'm parsing a web page with a parser that I created, and when I parse the page and echo the content out I get characters like these: â€. Why is it doing that? It's supposed to be ... or some other character like -- instead.
The weird characters are caused by encoding problems, your best bet is to encode them to UTF-8 (make sure your page is also in UTF-8) before you echo them.
You can use the function utf8_encode for that.
Here is a very complete answer on how to successfully do that: Detect encoding and make everything UTF-8
Usually those types of characters come from bad character encoding. Off the top of my head, your best option is to check the web page that you created for the meta tag that declares its character encoding. Something like this:
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
And making sure you supply the same character encoding on your end.
I got this solved with iconv("UTF-8","ISO-8859-1",$string). It does the job. Thanks, guys.
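For reference, this is roughly what that call does to the bytes. The input string is a made-up example written with byte escapes; only characters that exist in ISO-8859-1 can be converted this way.

```php
<?php
$utf8   = "caf\xC3\xA9";                       // "café" as UTF-8
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8); // same text, one byte per character

echo bin2hex($utf8), "\n";   // 636166c3a9
echo bin2hex($latin1), "\n"; // 636166e9
```

This only makes sense if the page you echo into is itself served as ISO-8859-1; on a UTF-8 page you would convert the other way around.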

Arabic characters corrupt on landing, fine after refresh - UTF8

I have a PHP page with mixed Latin and Arabic characters. The charset declaration tag is in the HTML code
and the file is saved as UTF-8. All the text is static and lives in the PHP file (it does not come from a DB or an external source).
When I browse the site, some pages randomly get corrupted in IE and FF and display nothing but question marks. After I refresh the page the text is displayed properly, though... I have been working with Arabic and Hebrew for a long time and this is the first time I have run into this issue. Can anybody think of a cause?
Chrome is always fine...
Turns out the script reference that was before the meta description was causing the problem. I moved
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
to be the first item after the opening head tag and this is no longer an issue. Thanks for all the comments..
P.S. I wasn't the one who coded this page; I'm only working on localizing it. That's why it didn't even occur to me that the meta tag coming after a script could make a difference...
Try to send appropriate header, something like this:
header("Content-Type: text/xml; charset=utf-8");
Try using utf8_encode() on your content:
http://php.net/manual/en/function.utf8-encode.php
If you have some text you want to store in a DB and display even if the page encoding is latin-1, there is a free tool that can convert Unicode to escaped HTML:
http://www.sprawk.com/tools/escapeUnicode
