PHP: Cyrillic characters not displayed correctly - php

Recently I switched hosting from one provider to the other and I have problems displaying Cyrillic characters. The characters which are read from the database are displayed correctly, but characters which are hardcoded in the php file aren't (they are displayed as question marks).
The files which contain the php source code are saved in utf-8 form. Help anybody?

Try placing a meta tag indicating the encoding in the head section:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

My PHP module was exactly with same problem all text was like "?????????????????"
And my code was written with Notepad++. I found the solution for the problem. The problem wasn't in the header charset or meta tag because the browser actually knows that it is the UTF-8 charset. I tried all encodings from the browser and the result was the same so I knew the problem is somewhere else, not in the browser character encoding at all.
I just opened the PHP module with Notepad++ and selected all code. After that in the Encoding menu I selected "Convert to UTF-8." After uploading to the server, everything worked like a charm.

put this after connecting database:
mysql_query("SET NAMES UTF8");
for mysqli
mysqli_query($connecDB,"SET NAMES UTF8");
and in the header of the page:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

The problem seems quite strange.
What's the form of these question marks? Is it black diamonds with questions or just plain question marks?
First of all double check if your files are really utf-8 encoded.
Try to add this header to your code (above all output)
header('Content-Type: text/html; charset=utf-8');
But I doubt it would help, as your database text already looks good.
Do you have any SET NAMES queries in your code? What charset it is set?

It had something to do with the encoding of the php files. The files were created using Windows Notepad and saved with utf-8 encoding.
When I used Notepad2 to open the files, the encoding of the files was "utf-8 with signature". When I changed encoding to "utf-8", the text displayed correctly.

The reason for your problem is often accidental re-encoding the script files by a programmer's editor. It isn't a good practice to hardcode strings which rely on encoding in your php files.
Try switching your browser's encoding to find what encoding is used for hardcoded text, it might help you address the issue. Also make sure to send proper http headers for each page:
header('Content-Type: text/html; charset=utf-8');
Optionaly you can insert meta tag in you HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

For me, the line that made the difference is the following:
$mysqli->set_charset("utf8")
Straight from the PHP documentation page.
The server was returning latin, after setting the charset to utf8 now works fine.

I have been fighting with this exact same problem as I'm trying to add a bit of french/german internationalization to a few controls on a widget.
Characters with accents that are stored in my db print fine as UTF-8. However, characters that are hardcoded into arrays in PHP files either display as the black diamond with a question mark inside or the little square box.
I've tried encoding/decoding the hardcoded strings from my php file every which way, but couldn't get the characters to display properly.
Since I have such a finite set of characters and am working strictly with HTML, I just added a bit of functionality to my intl class to substitute the characters for html entities.
I have these properties.
static $accentEntities = array('á' => 'á',
'É' => 'É',
'é' => 'é',
'í' => 'í',
'û' => 'û',
'ü' => 'ü');
static $accents = array();
static $entities = array();
I setup some my replacement arrays in my constructor...
foreach (self::$accentEntities as $char => $entity) {
self::$accents[] = $char;
self::$entities[] = $entity;
}
And then when I need one of my hardcoded strings in my class I just return it like so...
return str_replace(self::$accents,self::$entities,$str);
It's a totally ghetto solution... but for now, it works. I'd definitely like to hear the correct way to display accents/special characters that are hardcoded into a PHP file.

Related

Accented characters in mySQL table

I have some texts in French (containing accented characters such as "é"), stored in a MySQL table whose collation is utf8_unicode_ci (both the table and the columns), that I want to output on an HTML5 page.
The HTML page charset is UTF-8 (< meta charset="utf-8" />) and the PHP files themselves are encoded as "UTF-8 without BOM" (I use Notepad++ on Windows). I use PHP5 to request the database and generate the HTML.
However, on the output page, the special characters (such as "é") appear garbled and are replaced by "�".
When I browse the database (via phpMyAdmin) those same accented characters display just fine.
What am I missing here?
(Note: changing the page encoding (through Firefox's "web developer" menu) to ISO-8859-1 solves the problem... except for the special characters that appears directly in the PHP files, which become now corrupted. But anyway, I'd rather understand why it doesn't work as UTF-8 than changing the encoding without understanding why it works. ^^;)
I experienced that same problem before, and what I did are the following
1) Use notepad++(can almost adapt on any encoding) or eclipse and be sure in to save or open it in UTF-8 without BOM.
2) set the encoding in PHP header, using header('Content-type: text/html; charset=UTF-8');
3) remove any extra spaces on the start and end of my PHP files.
4) set all my table and columns encoding to utf8mb4_general_ci or utf8mb4_unicode_ci via PhpMyAdmin or any mySQL client you have. A comparison of the two encodings are available here
5) set mysql connection charset to UTF-8 (I use PDO for my database connection )
PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"
PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET utf8"
or just execute the SQL queries before fetching any data
6) use a meta tag <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
7) use a certain language code for French
<meta http-equiv="Content-language" content="fr" />
8) change the html element lang attribute to the desired language
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">
and will be updating this more because I really had a hard time solving this problem before because I was dealing with Japanese characters in my past projects
9) Some fonts are not available in the client PC, you need to use Google fonts to include it on your CSS
10) Don't end your PHP source file with ?>
NOTE:
but if everything I said above doesn't work, try to adjust your encoding depending on the character-set you really want to display, for me I set everything to SHIFT-JIS to display all my japanese characters and it really works fine. But using UFT-8 must be your priority
This works for me
Make your database utf8_general_ci
Save your files in N++ as UTF-8 without BOM
Put $mysqli->query('SET NAMES utf8'); after the connection to the database in your PHP file
Put < meta charset="utf-8" /> in your HTML-s
Works perfect.
If your php.ini default_charset is not set to UTF-8, you need to use a Content-type to define your data. Apply the following header at the top of your file(s) :
header("Content-type: text/html; charset=utf-8");
If you have still troubles with encoding, the cause may be one of the following:
a database server charset problem (check encoding of your server)
a database client charset problem (check encoding of your connection)
a database table charset problem (check encoding of your table)
a php default encoding problem (check default_encoding parameter in parameters.ini)
a multibyte missconfigured (see mb_string parameters in parameters.ini)
a <form> charset problem (check that it is sent as utf-8)
a <html> charset problem (where no enctype is set in your html file)
a Content-encoding: problem (where the wrong encoding is sent by Apache).
SET NAMES worked for me.
My issue was in one of my editing pages the field with the foreign characters would not display, on the production web pages there was no problem.
I know you already have an answer. That's great. But strangely none of these answers solved my issue. I'd like to share my answer for the benefit of the others who may encounter the same issues.
I also had the same problems as the OP, with regards to French accents in a multi-lingual application.
But I encountered this issue for the first time when I had to pass (French accented) data as segments in AJAX calls.
Yes, we must have the database set to work with UTF8. But the fact that AJAX calls had query strings (in my case segments, since I'm using CodeIgniter), I had to simply encode the French text.
To do this on the client-side, use the Javascript encodeURI() function with your data.
And to reverse it in PHP, just use urldecode($MyStr) where data was received as parameters.
Hope this helps.
Type something full French signs in your (php) file
Save that file as UTF-8
Paste line beneath into your website header
header('Content-Type: text/html; charset=utf-8');
Page (file) should look good.
If looks good go here for mysql behavior after (SET_NAMES).

Weird characters though using the right encoding

I work on a website that has different language interfaces, so far I use english and german.
when the german text is loaded, it shows weird characters like the following screenshot
though I use
header('Content-type: text/html; charset=utf-8');
and also in the html header
<META http-equiv="content-type" content="text/html; charset=utf-8">
what else can I do to solve it ?
Thanks
The content of the page needs to also be in UTF-8. Your content was probably made using MS Word, which uses Windows 1251 encoding. You need to re-save your document as UTF-8.
UTF-8 does not convert formats for you.
If those strings are saved in a file, the file has to be encoded in UTF-8 too.
If you're getting them from a database, they'll have to be stored as UTF-8 and you'll have to set the connection charset to utf-8.
You could also check whether your text is UTF-8 and if not, convert it with utf8_encode.

$_POST will convert from utf-8 to ä ö ü etc

I am new here, so I apologize if I am doing anything wrong.
I have a form which submits user input onto another page. User is expected to type ä, ö, é, etc... I have placed all of the following in the document:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
header('Content-Type:text/html; charset=UTF-8');
<form action="whatever.php" accept-charset="UTF-8">
I even tried:
ini_set('default_charset', 'UTF-8');
When the other page loads, I need to check what the user input with something like:
if ( $_POST['field'] == $check ) {
...
}
But if he inputs something like 'München', PHP will compare 'München' with 'München' and will never trigger TRUE even though it should. Since it is specified UTF-8 everywhere, I am guessing that the server is converting to something else (Windows-1252 as I read on another thread) because it does not support or is not configured to UTF-8. I am using Apache on a local server before I load into production; I have not changed (and don't know how to) any of the default settings. I've been working on a Windows 7, editing with Notepad++ enconding my files in ANSI. If I bin2hex('München') I get '4dc3bc6e6368656e'.
If I echo $_POST['field']; it displays 'München' correctly.
I have researched everywhere for an explanation, all I find is that I should include those tags/headings I already have.
Any help is much appreciated.
You are facing many different problems at the same, let's start with the simplest one.
Problem 1) You say that echo $_POST['field']; will display it correctly? What do you mean with "display"? It can be displayed correctly in two cases:
either the field is in UTF-8 and your page has been declared as UTF-8 and the browser is displaying it as UTF-8 or,
the field is in Latin-1 and the browser has decided (through the auto-detection heuristics) that your page is in Latin-1.
So, the fact that echo $_POST['field']; is correct tells you nothing.
Problem 2) You are using
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
header('Content-Type:text/html; charset=UTF-8');
Is this PHP code? If it is, it will be an error because the header must be set before sending out any byte. If you do this you will not set the Content-Type header and PHP should generate a warning.
Problem 3) You are using
<form action="whatever.php" accept-charset="UTF-8">
Some browsers (IE, mostly) ignore accept-charset if they can coerce the data to be sent in ASCII or ISO Latin-1. So the data will be in UTF-8 and declared as ISO Latin-1 or ISO Latin-1 and sent as ISO Latin-1 (but this second case is not your case).
Have a look at https://stackoverflow.com/a/8547004/449288 to see how to solve this problem.
Problem 4) Which strings are you comparing? For example, if you have
$city = "München"
$_POST['city'] == $city
The result of this code will depend on the encoding of the PHP file. If the file is encoded in ISO Latin-1 and the $_POST correctly contains UTF-8 data, the == will compare different bytes and will return false.
Another solution that may be helpful is in Apache, you can place a directive in your configuration file (httpd.conf) or .htacess called AddDefaultCharset. It looks like this:
AddDefaultCharset utf-8
http://httpd.apache.org/docs/2.0/mod/core.html#adddefaultcharset
That will override any other default charsets.
I changed "mbstring.detect_order = pass" in my php.ini file and i worked
I've used Unicode characters in my forms and file many times. I had not any problem up to now.
Try to do these steps and check the result:
Remove header('Content-Type:text/html; charset=UTF-8'); from your HTML form codes.
Use your form just like <form action="whatever.php"> without accept-charset="UTF-8". (It's better to insert the method of sending data in your form tag).
In target page (whatever.php), insert again <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> in a <head> tag.
I always did my project like what I mentioned here and I did not have any problem with Unicode strings.
This is due to the character encoding of the PHP file(s).
The hardcoded München is stored with the character encoding of the source file(s), in this case ANSI and when that value is compared to the UTF-8 encoded value provided in the $_POST variable, the two will, quite naturally, differ.
The solution to your problem is one of:
Serve and process content with the same encoding as that of the source file(s), in this case likely to be windows-1252.
This would, for starters, include changing the content="text/html; charset=UTF-8" to content="text/html; charset=windows-1252" whenever serving HTML data.
Avoid all hardcoded values that could be affected by character encoding issues between UTF-8 and windows-1252, more or less only hardcode values that only includes English letters and numbers.
Any UTF-8 values would have to be read from a source that ensures they are UTF-8 encoded (for instance a database set to use UTF-8 as storage encoding as well as connection encoding).
Wrap all hardcoded assignments in utf8_encode(), for instance $value = utf8_encode ('München');
Change the encoding of the source file(s) to UTF-8.
This can be accomplished in any number of ways, a decent text editor will be able to do it or the outstanding libiconv can be used, especially for batch processing.
Either solution 1 or 4 would be my preferred solution, especially if multiple people are involved in the project.
As a side-note, some text editors (notably Notepad++) has the option of using either UTF-8 or UTF-8 without BOM. The BOM (Byte Order Mark) is pointless in UTF-8 and will cause problems when writing headers in PHP (most often when doing a redirect). This is because the BOM is right in front of the initial <?php, causing the server to send the BOM just as it would had there been any other character in front. The difference is you'd note a character in front, but the BOM isn't displayed.
Rule of thumb: Always use UTF-8 without BOM.

Unicode and PHP - am I doing something wrong?

I'm using Kohana 3, which has full support for Unicode.
I have this as the first child of my <head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The Unicode character I am inserting into is é as in Café.
However, I am getting the triangle with a ? (as in could not decode character).
As far as I can tell in my own code, I am not doing any string manipulation on the text.
In fact, I have placed the accent straight into a view's PHP file and it is still not working.
I copied the character from this page: http://www.fileformat.info/info/unicode/char/00e9/index.htm
I've only just started examining PHP's Unicode limitations, so I could be doing something horribly wrong.
So, how do I display this character? Do I need to resort to the HTML entity?
Update
So this works
Caf<?php echo html_entity_decode('é', ENT_NOQUOTES, 'UTF-8'); ?>
Why does that work? If I copy the output accented e from that script and insert it into my document, it doesn't work.
View the http headers. You should see something like
Content-Type: text/html; charset=UTF-8
Browsers don't pay much attention to meta tags, if there was a real http header stating a different encoding.
update
Whatcha get from this?
echo bin2hex('é');
echo chr(0xc3) . chr(0xa9);
You should get c3a9é, otherwise I'd say file encoding issue.
I guess, you see �, the replacement character for invalid UTF-8 byte sequences. Your text is not UTF-8 encoded. Check your editor’s settings to control the encoding of the PHP file.
If you’re not sure about the encoding of your sources, you can enforce UTF-8 compatibilty as described here (German text): Force UTF-8.
You should never need entities except the basic ones.

Arabic characters corrupt on landing, fine after refresh - UTF8

I have an php page with mixed Latin and Arabic characters. The charset declaration tag is in the html code
and the file is saved as UTF-8. All the text is static and in the php file (does not come from a DB or an external source)
When I browse to the site some pages randomly get corrupt in IE and FF and display all question marks. After I refresh the page, text is displayed properly though... I have been working with Arabic and Hebrew for a long time and this is the first time I run in to this issue. Can anybody think of a cause?
Chrome is always fine...
Turns out the script reference that was before the meta description was causing the problem. I moved
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
to be the first item after the opening head tag and this is no longer an issue. Thanks for all the comments..
P.S I wasn't the one who code this page, and only working on localizing it, thats why I didn't even think that meta tag being after script would even make a difference...
Try to send appropriate header, something like this:
header("Content-Type: text/xml; charset=utf-8");
Try using UTF8_encode on your content:
http://php.net/manual/en/function.utf8-encode.php
If you have some text you want to store in a DB and display even if the page encoding is latin-1, there is a free tool that can convert Unicode to escaped HTML:
http://www.sprawk.com/tools/escapeUnicode

Categories