How to display Japanese characters on a php page? - php

I'm trying to display Japanese characters on a PHP page. No loading from the database, just stored in a language file and echo'ed out.
I'm running into a weird scenario. I have the page properly setup with UTF-8 and I test a sample page on my local WAMP server and it works.
The moment I tested it on our development and production servers, the characters didn't display properly.
This leads me to believe then that it's a setting in php.ini. But I haven't found much information about this so I'm not really sure if this is the issue.
Is there something fundamental I'm missing?
Thanks

Since you've stated that it is working in your development environment and not in your live, you might want to check Apache's AddDefaultCharset and set this to UTF-8, if it's not already.
I tend to make sure the following steps are checked
PHP Header is sent in UTF-8
Meta tag is set to UTF-8 (Content-Type)
Storage is set to UTF-8
Server output is set to UTF-8
That seems to work for me. Hope this helps.

You have to deliver the documents with the proper encoding declaration in the HTTP header field Content-Type.
In PHP you do this via the header function before the first data has been sent to the client, so preferably as one of the first statements:
<?php
header('Content-Type: text/html;charset=utf-8');
// the rest

Firstly, I'll assume the same client machine is used for both tests.
So, use Firebug or your tool-of-choice to check the HTTP response headers on your local server, and compare them with the headers generated by the other servers. You will no doubt find a difference.
Typically your server should be including a header like this in the response:
Content-Type: text/html; charset=UTF-8
If the headers on the two systems look pretty much the same, grab the body of both responses and load it up in a hex editor and look for encoding differences.
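To make the header comparison above a bit more concrete, here is a minimal sketch that pulls the charset out of a list of raw response header lines. The function name extractCharset and the two sample header arrays are made up for illustration; on a real setup you would feed it the array returned by get_headers() for each server.

```php
<?php
// Hypothetical helper (not from the thread): find the charset declared in a
// list of raw response header lines, e.g. the array returned by get_headers().
function extractCharset(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        // Matches e.g. "Content-Type: text/html; charset=UTF-8" (case-insensitive).
        if (preg_match('/^Content-Type:.*?charset=["\']?([\w.:-]+)/i', $line, $m)) {
            return strtoupper($m[1]);
        }
    }
    return null; // no Content-Type header declares a charset
}

// Made-up header sets standing in for the local and the remote server.
$local  = ['HTTP/1.1 200 OK', 'Content-Type: text/html; charset=UTF-8'];
$remote = ['HTTP/1.1 200 OK', 'Content-Type: text/html; charset=ISO-8859-1'];

var_dump(extractCharset($local));
var_dump(extractCharset($remote));
```

Against live servers the call would look something like extractCharset(get_headers($url)), keeping in mind that get_headers() returns false when the request fails.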

Try the following (worked for me on CentOS 6.8, PHP 5.6)
#1
Apache Config
/etc/httpd/conf/httpd.conf
AddDefaultCharset UTF-8
#2
PHP Config
/etc/php.ini:
default_charset = "utf-8" >> default_charset = "Shift_JIS"
Note : set error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT
#3
html head meta
<meta http-equiv="content-type" content="text/html; charset=Shift_JIS">

Related

Apache 2.4, PHP 5.6.17 Debian 8.3, How to remove encoding in default PHP header?

I am playing with a few hobby websites. Because I am a beginner in programming, I sometimes have problems with my scripts hogging hosting resources.
To avoid potential problems, I have decided to rent a budget dedicated server (Atom N2800 with 2 GB RAM).
I have installed Debian 8.3 (Polish language), then Apache 2.4 and PHP 5.6.
So far, everything works. An HTML file is displayed normally. However, when I change the extension to .php, the default header sent has UTF-8 encoding (thus ignoring the information in <meta> that the text is encoded in latin2). I can correct it by adding a header() call at the beginning, but fully rewriting a modified phpBB is beyond my abilities at the moment.
Examples:
http://37.187.105.171/1.html - file encoded in Latin2, no information about encoding in HTML header, so it uses <meta> information.
http://37.187.105.171/1.php - the same file with the extension changed to php - the header contains information about UTF-8 encoding.
Also, 1_h.php (can't post more than 2 links) is the same file with added: <?PHP header('Content-Type: text/html; charset=iso-8859-2'); ?> at start.
How can I remove encoding from the default PHP header?
change this line in php.ini file :
default_charset = "utf-8"
in php:
<?php header('Content-Type: text/html; charset=utf-8'); ?>
in html:
<meta http-equiv="content-type" content="text/html; charset=utf-8" >
default encoding:
in phpRoot/php.ini:
default_charset = "utf-8"

UTF-8 problems with PHP DOM on Debian server

I have a problem with UTF-8 strings in PHP on my Debian server.
Update in details
I've done a little more testing and the situation is now more specific. I updated the title and details to better fit the situation. Thanks for the responses and sorry that the problem wasn't described clearly. The following script works fine on my local Windows machine but not on my Debian server:
<?php
header("Content-Type: text/html; charset=UTF-8");
$string = '<html><head></head><body>UTF-8: ÄÖÜ<br /></body></html>';
$document = new DOMDocument();
@$document->loadHTML($string);
echo $document->saveHTML();
echo $string;
As expected on my local machine the output is:
UTF-8: ÄÖÜ
UTF-8: ÄÖÜ
On my server the output is:
UTF-8: ÄÖÜ
UTF-8: ÄÖÜ
I wrote the script in Notepad++ in UTF-8 without BOM and transferred it over SSH. As noticed by guido, the string itself is properly UTF-8 encoded. There seems to be a problem with PHP DOM or maybe libxml. And the reason must be some setting, since it is machine dependent.
Original question
I work locally with XAMPP on Windows and everything is fine. But when I deploy my project on the server UTF-8 strings get all messed up. In fact when I upload this test script
echo utf8_encode('UTF-8 test: ÄÖÜ');
I get "ÃÃÃ". Also when I connect with putty to the server I cannot write umlauts (ÄÖÜ) correctly in the shell. I have no idea if this issue is even PHP related.
Check for your apache's AddDefaultCharset setting.
On standard debian apache distributions, the setting can be modified in /etc/apache2/conf.d/charset.
Please verify that your file is byte-to-byte the same as on your local machine. FTP transfer in text mode could have messed it up. You may want to try binary one.
EDIT: answer for updated question:
<?php
header("Content-Type: text/html; charset=UTF-8");
$string = '<html><head>'
.'<meta http-equiv="content-type" content="text/html; charset=utf-8">'
.'</head><body>UTF-8: ÄÖÜ<br /></body></html>';
$document = new DOMDocument();
@$document->loadHTML($string);
echo $document->saveHTML();
echo $string;
?>
I suspect your input string may be already UTF-8. Try:
setlocale(LC_CTYPE, 'de_DE.UTF-8');
$s = "UTF-8 test: ÄÖÜ";
if (mb_detect_encoding($s, "UTF-8") == "UTF-8") {
echo "No need to encode";
} else {
$s = utf8_encode($s);
echo "Encoded string $s";
}
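A variant of the same idea, as a rough sketch: mb_check_encoding() tests whether the bytes are valid UTF-8, and mb_convert_encoding() does the conversion that utf8_encode() used to do. The function name ensureUtf8 is made up, and treating non-UTF-8 input as ISO-8859-1 is an assumption carried over from how utf8_encode() behaves.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the bytes are not already
// valid UTF-8. Assumption: anything else is ISO-8859-1, as utf8_encode() assumed.
function ensureUtf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8; converting again would double-encode
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$alreadyUtf8 = "UTF-8 test: \xC3\x84\xC3\x96\xC3\x9C"; // ÄÖÜ as UTF-8 bytes
$latin1      = "UTF-8 test: \xC4\xD6\xDC";             // ÄÖÜ as ISO-8859-1 bytes

echo ensureUtf8($alreadyUtf8), "\n";
echo ensureUtf8($latin1), "\n";
```

Both lines should come out identical, because the first string is left alone and the second one is converted exactly once.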
Are you explicitly sending a content-type header? If you omit it, it's likely that Apache is sending one for you. If the file is served with a Latin-1 encoding (by Apache) and the browser reads it as such, then your UTF-8 characters will be malformed.
Try this:
<?php
echo "Drop some UTF-8 characters here.";
Then this:
<?php
header("Content-Type: text/html; charset=UTF-8");
echo "Drop some UTF-8 characters here.";
The second should work, if the first doesn't. You may also want to save the file as a UTF-8-encoded file, if it's not already.
If your database characters are messed up, try setting the (My)SQL connection encoding.
Try changing the default charset on the server in your php.ini file:
default_charset = "UTF-8"
Also, make sure you are sending out the proper content type headers as UTF-8.
In my experience with utf-8, if you properly configure the php mbstring module and use the mbstring functions, and also make sure your database connection is using utf-8 then you won't have any problems.
The db part can be done for mysql with the query "SET NAMES 'utf8'"
I usually started an output buffer using mbstring to handle the buffer. This is what I use in production websites and it is a very solid approach. Then send the buffer when you have finished rendering your content.
Let me know if you would like the sample code for that.
Another easy trick to just see if it is the wrong headers being sent out by php or the webserver is to use the view->encoding menu on your browser and see if it is utf-8. If it's not and you switch it to utf-8 and everything looks ok then it is a problem with your headers or content type. If it is already utf-8 and the text is screwed up then it is something going wrong in your code or db connection. If you are using mysql make sure the tables and columns involved are also utf-8
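The output-buffer approach mentioned above might look roughly like this. It is only a sketch under the assumption that the mbstring extension is loaded; mb_output_handler is mbstring's built-in output callback, and the Japanese sample text is just a placeholder.

```php
<?php
// Sketch only; assumes the mbstring extension is available.
mb_internal_encoding('UTF-8'); // encoding of the strings inside the script
mb_http_output('UTF-8');       // encoding the handler converts the output to

header('Content-Type: text/html; charset=UTF-8');

ob_start('mb_output_handler'); // route all output through mbstring
echo "<p>日本語のテキスト</p>\n"; // placeholder content
ob_end_flush();                // send the buffered page when rendering is done
```

With both encodings set to UTF-8 the handler passes the text through unchanged; the point of the buffer is that the conversion happens in one place if the two ever differ.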
The cause of the problem was an old version of libxml (2.6.32.) on the server. On the development machine it was 2.7.3. I upgraded libxml to an unstable package resulting in version 2.7.8. The problems are now gone.
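For anyone comparing machines, a quick sketch to see which libxml version PHP is linked against and whether DOMDocument round-trips the umlauts. LIBXML_DOTTED_VERSION is a constant provided by PHP's libxml extension; the meta charset hint is the one suggested in the answer above, and the test string is just an example (the script file itself is assumed to be saved as UTF-8).

```php
<?php
// Report the libxml version PHP is linked against (needs the libxml/dom extensions).
echo 'libxml version: ', LIBXML_DOTTED_VERSION, "\n";

// Round-trip check. The meta charset declaration tells loadHTML() how to
// interpret the bytes; without a hint it may fall back to ISO-8859-1.
$html = '<html><head>'
      . '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
      . '</head><body>UTF-8: ÄÖÜ</body></html>';

$document = new DOMDocument();
$document->loadHTML($html);

// The DOM extension hands strings back as UTF-8.
$text = $document->getElementsByTagName('body')->item(0)->textContent;
var_dump($text === 'UTF-8: ÄÖÜ');
```

If the comparison comes out false on one machine and true on another, the libxml versions printed on the first line are the first thing to compare.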

Content-type not working in PHP

I have some issues with a PHP file that is not working properly. The Content-type does not get received by any browser at all. Firebug interprets the file as text/html instead of CSS. Here's the file:
<?php
header('Content-Type: text/css; charset=UTF-8');
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', 'On');
/* CSS goes on from here */
I tested to put a row with echo 'TEST'; before the header line, and was expecting to see the classic "headers already sent" error, but nothing appears!
Normal .css-files are working like a charm however.
What can I do to sort this out?
UPDATE:
I changed default_mimetype = "text/html" to default_mimetype = "text/css" in php.ini and all pages immediately got interpreted as CSS, so there must be a way to send CSS headers for just this file :)
The full file, as requested by John:
<?php
header('Content-Type: text/css; charset=UTF-8');
echo 'body {background-color: #000000; }';
?>
UPDATE #2:
Adding ini_set('default_mimetype', 'text/css'); to the PHP file fixes this file, but it doesn't solve the issue that causes this fault...
UPDATE #3:
Tested adding AddType text/css .css to both .htaccess and Apache config. Still no luck. Also tested to send headers separated from charset: header('Content-Type: text/css'); - Still no luck...
UPDATE #4:
Have reinstalled Apache+PHP at the server to see if the problem goes away, but no. Same old, same old...
Check your php.ini file for the output_buffering setting. If it's not set to "off" then PHP is automatically doing output buffering for you. Set that to off and echo something before the header command, and you should see the "classic error".
You shouldn't use the closing ?>. I know this is a controversial suggestion, but too many times people add a return and/or space after it, which gets output to the browser (before the header). There are very few cases where not using it would cause a problem.
Make sure your file editor doesn't save a BOM in your PHP file.
Try moving error_reporting / ini_set so that they are the first PHP statements (before the header() call). This way you will see all errors (if any). Don't forget to turn that off in production!
Silly remark, but make sure this file is interpreted as PHP (extension is .php or if not an .htaccess tell the server to interpret as PHP).
Everything else is fine with your code. If it still doesn't work, check your server logs. Maybe something else crashes the execution of this PHP file (invalid MIME or else)...
The reason is that the header function only works if it is called before any output is sent!
If you put an echo before it, the content type automatically becomes text/html.
Try printing some CSS code after the header to test whether it actually works.
Read the linked page for more info.
EDIT: did you change your post ? :-)
This is usually caused by a fatal error (i.e., a syntax error) that causes the script to abort before any of the code is executed (before display_errors can be set through ini_set() at runtime). Try changing display_errors in the PHP config file (php.ini).
Maybe the function header() is disabled in your configuration?
Test:
print ini_get('disable_functions');
It may be worth checking with curl to see what headers are actually being sent.
Try this from a command line and check for the "text/css":
curl -I http://example.com
Depending on the browser's request headers, PHP could also be sending the output gzipped using output buffering. In the PHP file, try this to check for ob_gzhandler.
print_r(ob_list_handlers());
If it's enabled, check for zlib.output_compression in your php.ini or Apache configuration.
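Several of the settings mentioned in this thread can be dumped in one go. The sketch below is just a convenience wrapper (the function name outputDiagnostics and the array keys are made up); it only uses ini_get(), ob_list_handlers() and headers_list().

```php
<?php
// Hypothetical helper: collect the settings that influence the Content-Type
// header and output handling, so two servers can be compared side by side.
function outputDiagnostics(): array
{
    return [
        'default_charset'         => ini_get('default_charset'),
        'default_mimetype'        => ini_get('default_mimetype'),
        'output_buffering'        => ini_get('output_buffering'),
        'zlib.output_compression' => ini_get('zlib.output_compression'),
        'disable_functions'       => ini_get('disable_functions'),
        'active_output_handlers'  => ob_list_handlers(), // e.g. ob_gzhandler
        'headers_queued'          => headers_list(),     // headers set so far
    ];
}

print_r(outputDiagnostics());
```

Running this on a machine where the header works and on one where it doesn't should narrow down which setting differs.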
I found the full file is indented.
    <?php
    header('Content-Type: text/css; charset=UTF-8');
    echo 'body {background-color: #000000; }';
    ?>
Because the indentation on line 1 outputted 4 spaces, therefore, the header will not work.
This sounds like your webserver is interpreting the script as a normal file. Does it have a .php extension and do other .php files work as expected?
Looks perfectly ok and the line with the echo should definitely generate a warning. Could it be that you're editing the wrong file?
You can try Content-Style-Type: text/css
See the excerpt below, from the linked page:
<META http-equiv="Content-Style-Type" content="text/css">
The default style sheet language may also be set with HTTP headers. The above META declaration is equivalent to the HTTP header:
Content-Style-Type: text/css
Edit:
At the link, it's mentioned to add AddType text/css .css in the Apache config file.
Maybe you can give it a try
Edit2
Look up for 'css' at this link. Someone had the same problem as you. Try sending the header without the charset
I recently had a hair-threatening problem with Firefox and XHTML 1.0 transitional. It worked fine with other browsers, and also with HTML 4.1.
To cut a long story short, PHP-generated JS and CSS files were still being reported by the headers as text/html, while in the HTML they were text/css and application/javascript; Firefox having been told the page was XHTML 1.0 became anal-retentive and refused to style the page. (I think the JS still worked but I fixed it anyway.)
Solution:
header('Content-type: text/css'); and header('Content-type: application/javascript');
Edit 3
There was a post about some forms not submitting any data because of a utility called AVG Linkscanner. Since you have reinstalled Apache + PHP, and I assume you didn't reinstall the OS, you could investigate this by turning some utilities/plug-ins off.
Wild guess : open your file with something which would display any BOM. If you see some strange characters before <?php you have your problem. Check your current editor options to save UTF-8 file and make it save them without BOM.
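If no editor that shows the BOM is at hand, the check can also be scripted. A minimal sketch, with made-up function names; the only fact it relies on is that the UTF-8 BOM is the three bytes EF BB BF at the very start of the file.

```php
<?php
// The UTF-8 byte order mark: three bytes at the very start of a file.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helpers (names made up for illustration).
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Simulated file contents; a real check would read the file from disk instead.
$withBom = UTF8_BOM . "<?php header('Content-Type: text/css');";
var_dump(hasUtf8Bom($withBom));
var_dump(hasUtf8Bom(stripUtf8Bom($withBom)));
```

For an actual file, something like hasUtf8Bom(file_get_contents($path)) would do, where $path points at the PHP file in question.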
Maybe there are issues with caching.
Try this:
header('Content-type: text/css');
header("Cache-Control: no-cache, must-revalidate");
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT");
echo 'body {background-color: #000000; }';
Works for me on an out of the box XAMPP installation and firefox - firebug reports correct content type.
For me, the reason it kept sending Content-Type: text/html was output_buffering = Off in php.ini. Setting it to 1 solved it.
In my case, I struggled for hours with my code header('text/javascript');, wondering why the response MIME type wasn't being sent. The correct PHP code is header('Content-type: text/javascript');. There is no proper error detection for this programming error in PHP, Apache, the browsers, or the Web Developer Tools.

$_GET encoding problem with cyrillic text

I'm trying this code (on my local web server)
<?php
echo 'the word is / думата е '.$_GET['word'];
?>
but I get corrupted result when enter ?word=проба
the word is / думата е ����
The document is saved as 'UTF-8 without BOM' and headers are also UTF-8.
I have tried urlencode() and urldecode() but the effect was same.
When upload it on web server, works fine...
What if you try sending a HTTP Content-type header, to indicate the browser which encoding / charset your page is generating ?
For instance, something like this might help :
header('Content-type: text/html; charset=UTF-8');
echo 'the word is / думата е '.$_GET['word'];
Of course, this is if you are generating HTML -- you probably are.
Considering there is a configuration setting at the server's level that defines which encoding is sent by default, maybe the default encoding on your server is OK -- while the one on your local server is not.
Sending such a header by yourself would solve the problem : it would make sure the encoding is always set properly.
I suppose you are using the Apache web server.
There is a common problem with Apache configuration - a line with "AddDefaultCharset" in the config should be commented out (add # at the beginning of the line, or replace the line with "AddDefaultCharset off") because it "overrides any encoding given in the files in meta http-equiv or xml encoding tags".
In my current installation (Apache2 @ Ubuntu Linux) the line is found in "/etc/apache2/conf.d/charset", but in other (Linux/Unix) setups it can be in "/etc/apache2/httpd.conf", or "/etc/apache/httpd.conf" (if you are using Apache 1). If you don't find it in these files you can search for it with "cd /etc/apache2 ; grep -r AddDefaultCharset *" (for Apache 2 @ Unix/Linux).
Take a look at Changing the server encoding. An excellent read!
Cheers!
If you receive $_GET from AJAX, make sure that your blablabla.js file is encoded in UTF-8. You can also use iconv("cp1251", "UTF-8", $_GET['word']); to convert $_GET['word'] to UTF-8 for display.
I just had the issue and it sometimes happens if you filter the GET variable with htmlentities(). It seems like this function converts cyrillic characters into weird stuff.
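One way around the htmlentities() problem, as a sketch: escape with htmlspecialchars() and pass the charset explicitly, so only the HTML special characters are touched and the Cyrillic bytes are left alone. The wrapper name escapeForHtml and the fallback word are made up for the example.

```php
<?php
// Hypothetical wrapper: escape only the HTML special characters and state the
// charset explicitly, so multi-byte UTF-8 text such as Cyrillic passes through.
function escapeForHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Falls back to a sample word when the script is run without ?word=...
$word = $_GET['word'] ?? 'проба';

header('Content-Type: text/html; charset=UTF-8');
echo 'the word is / думата е ', escapeForHtml($word), "\n";
```

On older PHP versions the default charset of htmlentities() was ISO-8859-1, which would explain the "weird stuff"; naming UTF-8 explicitly avoids depending on that default.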

International Fonts Display Issue with UTF-8

We have developed a PHP-MySQL application in two languages - English and Gujarati. The Gujarati language contains symbols that need unicode UTF-8 encoding for proper display.
The application runs perfectly on my windows based localhost and on my Linux based testing server on the web.
But when I transfer the application to the client's webserver (linux redhat based), the Gujarati characters are not displayed properly.
I have checked the raw data from both the databases (on my webserver and on the client's webserver) - it is exactly the same. However, the display is different. On my server the fonts are displayed perfectly, but when I try to access the client's copy of the app, the display of the Gujarati font is all wrong.
Since I am testing both these installation instances from the same machine and the same browser, the issue is not browser incompatibility or the code. I believe that there is some server setting that needs to be changed, which I am missing.
Can you help please.
Thanks
Vinayak
UPDATE
Thanks. We have tried the apache and php settings suggestions given by the SO community members. But the problem remains.
To breakdown the issue we have looked at the different stages that the data is passing through.
The two databases (at the client's end and at our end) are identical. There is no difference in them.
The next step in this app is that a query is run which recovers the data, creates an xml file and saves it.
This XML file is then displayed using a PHP page.
We have identified that the problem is that there is a difference in the XML file being created. The code used for creating the XML file is as below:
function fnCreateTestXML($testid)
{
$objQuery = new clsQuery();
$objTest = new clsTest();
$setnames = $objQuery->fnSelectRecords("tbl_testsets", "setnumber", "");
$queryresultstests = $objQuery->fnSelectRecords("tbl_tests", "", "");
if($queryresultstests)
{
foreach($queryresultstests as $queryresulttest)
{
foreach($setnames as $setname)
{
//Creating Root node test and set its attribute
$doc = new DomDocument('1.0','utf-8');
$root = $doc->createElement('test');
$root = $doc->appendChild($root);
//and so on
//Saving XML on the disk
$xml_create = $doc->saveXML();
$testname = "testsxml.xml";
$xml_string = $doc->save($testname);
Any ideas??
Thanks
Vinayak
The answer almost certainly lies in the headers being sent with the web pages. To diagnose issues like this, I've found it useful to install the firefox addon "Live HTTP Headers".
Install that addon, then turn it on and reload a page from the client's webserver, and from your own.
What you'll probably see is that the page served by your webserver has the header:
Content-Type: text/html; charset=UTF-8
Whereas when served by the client webserver it says:
Content-Type: text/html
The way I would recommend fixing this is for you to ensure that you explicitly set the header to specify utf-8 in every page of your application. This then insulates your application from future configuration changes on the client's end.
To do this, call
header('Content-type: text/html; charset=utf-8');
on each page before sending any data.
Since you've stated that it is working in your development environments and not on your client's, you might want to check the client's Apache AddDefaultCharset setting and set it to UTF-8, if it's not already. (Assuming that they're using Apache.)
I tend to make sure the following steps are checked
PHP Header is sent in UTF-8
Meta tag is set to UTF-8 (Content-Type)
Storage is set to UTF-8
Server output is set to UTF-8
Make sure your php code files are encoded in UTF8 with BOM (Byte Order Mark)
Make sure, that the response headers are correct - the Content-type should have UTF-8 in it.
Check the character set settings on the DB instance on the client's machine:
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Before executing any text fetching queries, try executing:
SET NAMES utf8;
SET CHARACTER SET utf8;
If that works, you might want to add the following to the my.cnf on the client's machine:
[mysqld]
collation_server=utf8_unicode_ci
character_set_server=utf8
default-character-set=utf8
default-collation=utf8_general_ci
collation-server=utf8_general_ci
Please use this meta tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Make sure to use this code in PHP: mysql_query("set character_set_results='utf8'");
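As an alternative to issuing SET NAMES by hand, the connection charset can be put into the connection itself. A minimal sketch, assuming PDO with the MySQL driver; the helper name buildMysqlDsn, the host and the database name are placeholders, and utf8mb4 is used here as an assumption (the thread itself uses utf8).

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that carries the connection
// charset. Host, database name and the utf8mb4 default are placeholders.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildMysqlDsn('localhost', 'example_db');
echo $dsn, "\n";

// Connecting is left commented out because it needs real credentials:
// $pdo = new PDO($dsn, $user, $password);
// With mysqli the equivalent call would be $mysqli->set_charset('utf8mb4');
```

Either way the idea is the same as SET NAMES: the client and the server agree on the encoding before any text is fetched.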
