echo and UTF-8 (PHP)

I installed Apache on my server (I wasn't using Apache before) and special characters started to display incorrectly.
So I converted every file to UTF-8 and configured MySQL to work with UTF-8, and everything worked fine. However, my Python app (which retrieves some information from the website) no longer works properly.
For example, I have a file "test.php" which returns either 0 or 1. The Python code then acts on that result.
But now my Python app doesn't receive "0", and I don't know what it gets from the website. I made the app send a GET request back to my site containing what it had received, and it sent me this: "???0".
What can I do? I tried changing the header to send the result as ISO-8859-1 (as it was before), but that isn't working either.

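It's the BOM (byte order mark). PHP outputs anything that precedes <?php verbatim, so the three BOM bytes (EF BB BF) at the start of the script are sent ahead of your "0"; that is the "???" you see. Remove the BOM from the script in Notepad++ (Menu -> Encoding -> Encode in UTF-8 without BOM).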
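If the script cannot be fixed right away, the Python side can also be made tolerant of a BOM. A minimal sketch, assuming the app has the response body as raw bytes; the function name parse_flag is hypothetical and not taken from the original app:

```python
def parse_flag(body: bytes) -> str:
    """Decode a response body, dropping a leading UTF-8 BOM if one is present."""
    # "utf-8-sig" behaves like "utf-8" but skips the bytes EF BB BF at the start.
    return body.decode("utf-8-sig").strip()


# With a BOM and without one, the decoded result is the same.
print(parse_flag(b"\xef\xbb\xbf0"))
print(parse_flag(b"0"))
```

This only hides the symptom on the client; removing the BOM from the PHP file is still the cleaner fix.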

Related

BOM being added to any return or die response

I'm using jQuery to retrieve a JSON response from an endpoint:
die(json_encode(array('success' => 3, 'message' => 'You must use at least 1 credit or more.')));
Whenever I check the JSON response in Chrome developer tools, a red dot shows that \ufeff is being added before the JSON. I've encoded the PHP file as UTF-8 in Notepad++, but the BOM character is still added in front of every response. If I return anything else or change the die, the BOM character still shows up in the response.
I've tried the same file on my localhost and it works absolutely fine, but on the server the character is added.
I'm at a loss as to what's causing the issue; any help would be greatly appreciated.
This is a 13-year-old issue.
There are workarounds (removing the BOM from all PHP files, calling ob_clean at script start), but the real solution is to have a PHP compiled with --enable-zend-multibyte or --enable-mbstring, or to wait until it is fixed by the PHP team.
As you sometimes have no control over the PHP version and compilation flags in hosted environments, I prefer removing BOMs from all PHP files to prevent this kind of issue. This will work on any server.
Your solution is to fix the output with JS. But for other uses, e.g. generating an image or other binary data via PHP, or sending headers, that approach cannot solve the problem.
It seems it was specifically an issue with this server's configuration, as it works on other servers. For the time being I've filtered the response to remove any BOM characters using JavaScript before parsing the JSON response.
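Removing the BOM from every PHP file, as suggested above, can be scripted instead of done by hand in an editor. A rough sketch in Python, assuming the project lives under a single directory; strip_bom and strip_bom_in_tree are hypothetical helper names:

```python
import codecs
from pathlib import Path


def strip_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from one file. Returns True if the file changed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_bom_in_tree(root: str, pattern: str = "*.php") -> list:
    """Strip the BOM from every matching file under root; return the changed paths."""
    return [p for p in sorted(Path(root).rglob(pattern)) if strip_bom(p)]
```

Calling strip_bom_in_tree(".") from the project root would return the files that were rewritten; trying it on a copy of the project first is a sensible precaution.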

Emoji from android to web

I have an Android app whose messages work with emoji. A saved message with emojis is displayed correctly on Android after being fetched from MySQL via JSON.
Now I want to display the same message with emojis in a web script.
I found the JS lib https://github.com/iamcal/js-emoji but can't make it work.
Does anyone have a ready-to-use implementation of it?
A sample db record looks like this:
Unii \uD83D\uDE02\uD83D\uDE03\uD83D\uDE2E\uD83D\uDE25\uD83D\uDE23\uD83D\uDE0F
These are Android emojis. How do I make them work on the web?
First of all, copying the files will not make it work ;) you also need to do some configuration:
first of all, download that repo
run npm install in the main directory
run bower install in the main directory
now we need to run a grunt task, but before that make sure that you have copied this - https://github.com/iamcal/emoji-data/tree/6daffc10d8e8fd06b80ec24c9bdcb65218f71563 to the emoji-data folder in downloaded-repo-location/build/emoji-data
also copy the content of that whole emoji-data (https://github.com/iamcal/emoji-data/tree/6daffc10d8e8fd06b80ec24c9bdcb65218f71563) to C:\js-emoji\build\emoji-data
now in demo.htm (which is placed in mainfolder/demo/demo.htm) change jquery linkage to an also make sure that this line is placed above ""
run "grunt" from the console.
check that downloaded-repo-root/lib/emoji.js has the emojis listed at line 520 ;)
open demo.htm in the browser
Basically, check the browser console for errors. The most common error is an empty emoji.prototype.data on line 519 of emoji.js, so make sure the grunt task finishes without errors.
Figured it out. The basic configuration from https://github.com/iamcal/js-emoji is enough to make the JS script work. The problem was the string encoding. Android uses "Unicode escape sequences" to store special characters in strings. It works great on mobile, but PHP has issues with it. Therefore we need to convert the Unicode escape sequences into a version that works with PHP. The converted version of the previous db record:
Unii \ud83d\ude02\ud83d\ude03\ud83d\ude2e\ud83d\ude25\ud83d\ude23\ud83d\ude0f
PHP conversion functions can be found at: How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?
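To illustrate what such a conversion does, here is a minimal sketch in Python rather than PHP. Each \uXXXX escape names one UTF-16 code unit, and a high/low surrogate pair such as \uD83D\uDE02 combines into a single emoji code point. The function name decode_unicode_escapes is hypothetical, and the sketch assumes the stored text contains the escapes as literal backslash sequences, as in the sample record:

```python
import re

# Matches one literal \uXXXX escape, as stored in the sample record.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(text: str) -> str:
    """Turn literal \\uXXXX escapes into characters, merging surrogate pairs."""
    # Step 1: replace every escape with the UTF-16 code unit it names.
    units = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: round-trip through UTF-16 so that each high/low surrogate pair
    # becomes a single code point (for example D83D + DE02 -> U+1F602).
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


record = r"Unii \uD83D\uDE02\uD83D\uDE03\uD83D\uDE2E\uD83D\uDE25\uD83D\uDE23\uD83D\uDE0F"
decoded = decode_unicode_escapes(record)
# Show the code points of the emojis that follow the "Unii " prefix.
print([f"U+{ord(ch):04X}" for ch in decoded[5:]])
```

Upper-case and lower-case hex digits are treated the same, so both forms of the record shown above decode to the same text.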

Ubuntu encoding of new files

I've been searching for a long time, but without any helpful result.
I'm developing a PHP project using Eclipse on an Ubuntu 11.04 VM. Everything works fine, and I never needed to look at the file encoding. But after deploying the project to my server, all content was shown with the wrong encoding. After a manual conversion to UTF-8 with Notepad++ my problems were solved.
Now I want to change it in my Ubuntu VM, too. And there's the problem. I've checked the preferences in Eclipse, but every property is set to UTF-8: general content types, workspace, project settings, everything ...
If I check the encoding in the terminal, it says "test_new.dat: text/plain; charset=us-ascii". All files are saved in ASCII format. If I create a new file in the terminal ("touch"), it's the same.
Then I tried to convert the files with iconv:
iconv -f US-ASCII -t UTF8 -o test.dat test_new.dat
But the encoding doesn't change. PHP files especially seem to be resistant. For some *.ini files in my project the conversion works?!
Any idea what to do?
Here are the locale settings of my Ubuntu:
LANG=de_DE.UTF-8
LANGUAGE=de_DE:en
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
I was also wondering about character encoding and found something that might be useful here.
When I create a new empty .txt file on my Ubuntu 12.04 and ask for its character encoding with "file -bi filename.txt", it shows me charset=binary. After opening it and writing something like "haha" inside, I saved it using "save as" and explicitly chose UTF-8 as the character encoding. Strangely, asking again did not show charset=UTF-8 but returned charset=us-ascii. It got even stranger when I did the whole thing again, but this time included a German-specific character (ä in this case) in the file and saved again (this time without "save as", I just pressed save). Now it said charset=UTF-8.
It therefore seems that at least gedit is checking the file and downgrading from UTF-8 to us-ascii if there is no need for UTF-8, since the file can be encoded using us-ascii.
Hope this helped a bit even though it is not php related.
Greetings
UTF-8 is compatible with ASCII. An ASCII text file is therefore also valid UTF-8, and a conversion from ASCII to UTF-8 is a no-op.
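This can be checked directly. The sketch below, in Python, compares the bytes produced by both encodings and imitates, very roughly, the kind of guess the file command reports. guess_charset is a hypothetical helper and is not how file is actually implemented:

```python
def guess_charset(data: bytes) -> str:
    """Very rough imitation of the charset guess printed by `file -bi`."""
    if not data:
        return "binary"        # an empty file carries no text to inspect
    if all(byte < 0x80 for byte in data):
        return "us-ascii"      # only 7-bit bytes: ASCII, and also valid UTF-8
    try:
        data.decode("utf-8")
        return "utf-8"         # contains multi-byte sequences that decode cleanly
    except UnicodeDecodeError:
        return "unknown"


# Pure ASCII text has exactly the same bytes in both encodings.
assert "haha".encode("ascii") == "haha".encode("utf-8")

for sample in (b"", "haha".encode("utf-8"), "hähä".encode("utf-8")):
    print(guess_charset(sample))
```

Under this reading nothing is being downgraded: a file that contains only ASCII characters has no bytes that would distinguish it from UTF-8, so a tool that inspects the bytes can only report the narrower charset.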

$_GET encoding problem with cyrillic text

I'm trying this code (on my local web server):
<?php
echo 'the word is / думата е '.$_GET['word'];
?>
but I get a corrupted result when I enter ?word=проба
the word is / думата е ����
The document is saved as 'UTF-8 without BOM' and headers are also UTF-8.
I have tried urlencode() and urldecode(), but the effect was the same.
When I upload it to the web server, it works fine...
What if you try sending an HTTP Content-type header to tell the browser which encoding / charset your page is generated in?
For instance, something like this might help:
header('Content-type: text/html; charset=UTF-8');
echo 'the word is / думата е '.$_GET['word'];
Of course, this is if you are generating HTML -- you probably are.
Since there is a server-level configuration setting that defines which encoding is sent by default, maybe the default encoding on your web server is OK while the one on your local server is not.
Sending such a header yourself would solve the problem: it would make sure the encoding is always set properly.
I suppose you are using the Apache web server.
There is a common problem with Apache configuration: a line with "AddDefaultCharset" in the config should be commented out (add # at the beginning of the line, or replace the line with "AddDefaultCharset off") because it "overrides any encoding given in the files in meta http-equiv or xml encoding tags".
In my current installation (Apache2 on Ubuntu Linux) the line is found in "/etc/apache2/conf.d/charset", but in other (Linux/Unix) setups it can be in "/etc/apache2/httpd.conf", or "/etc/apache/httpd.conf" (if you are using Apache 1). If you don't find it in these files, you can search for it with "cd /etc/apache2 ; grep -r AddDefaultCharset *" (for Apache 2 on Unix/Linux).
Take a look at Changing the server encoding. An excellent read!
Cheers!
If you receive $_GET from AJAX, make sure that your blablabla.js file is encoded in UTF-8. You can also use iconv("cp1251","utf8",$_GET['word']); to convert your $_GET['word'] to UTF-8.
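The reason a conversion like this can help is that the query string only carries percent-encoded bytes, and the same word has different bytes in UTF-8 and in cp1251. A small sketch in Python using the standard urllib.parse module; the assumption that the local setup sends cp1251 is an illustration only and would need checking against the actual request:

```python
from urllib.parse import quote, unquote

word = "проба"

# The same word, percent-encoded as it would travel in the URL in each charset.
as_utf8 = quote(word, encoding="utf-8")
as_cp1251 = quote(word, encoding="cp1251")
print(as_utf8)
print(as_cp1251)

# Decoding with the matching charset restores the word ...
restored = unquote(as_cp1251, encoding="cp1251")
print(restored == word)

# ... while decoding cp1251 bytes as UTF-8 yields U+FFFD replacement characters.
garbled = unquote(as_cp1251, encoding="utf-8", errors="replace")
print("\ufffd" in garbled)
```

Comparing the raw query string in the server's access log against these two forms is one way to tell which charset the browser actually used.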
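I just had this issue, and it sometimes happens if you filter the GET variable with htmlentities(). It seems like this function converts Cyrillic characters into weird stuff. (Before PHP 5.4, htmlentities() assumed ISO-8859-1 unless 'UTF-8' was passed as its third argument.)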

International Fonts Display Issue with UTF-8

We have developed a PHP-MySQL application in two languages: English and Gujarati. The Gujarati language contains symbols that need Unicode (UTF-8) encoding for proper display.
The application runs perfectly on my Windows-based localhost and on my Linux-based testing server on the web.
But when I transfer the application to the client's web server (Linux, Red Hat based), the Gujarati characters are not displayed properly.
I have checked the raw data from both databases (on my web server and on the client's web server) and it is exactly the same. However, the display is different. On my server the fonts are displayed perfectly, but when I access the client's copy of the app, the Gujarati font is displayed all wrong.
Since I am testing both installations from the same machine and the same browser, the issue is not browser incompatibility or the code. I believe that there is some server setting that I am missing.
Can you help, please?
Thanks
Vinayak
UPDATE
Thanks. We have tried the Apache and PHP settings suggested by the SO community members, but the problem remains.
To break down the issue, we have looked at the different stages that the data passes through.
The two databases (at the client's end and at our end) are identical. There is no difference between them.
The next step in this app is that a query is run which retrieves the data, creates an XML file and saves it.
This XML file is then displayed using a PHP page.
We have identified that the problem lies in a difference in the XML file being created. The code used for creating the XML file is below:
function fnCreateTestXML($testid)
{
    $objQuery = new clsQuery();
    $objTest = new clsTest();
    $setnames = $objQuery->fnSelectRecords("tbl_testsets", "setnumber", "");
    $queryresultstests = $objQuery->fnSelectRecords("tbl_tests", "", "");
    if ($queryresultstests)
    {
        foreach ($queryresultstests as $queryresulttest)
        {
            foreach ($setnames as $setname)
            {
                //Creating Root node test and set its attribute
                $doc = new DomDocument('1.0', 'utf-8');
                $root = $doc->createElement('test');
                $root = $doc->appendChild($root);
                //and so on
                //Saving XML on the disk
                $xml_create = $doc->saveXML();
                $testname = "testsxml.xml";
                $xml_string = $doc->save($testname);
            }
        }
    }
}
Any ideas??
Thanks
Vinayak
The answer almost certainly lies in the headers being sent with the web pages. To diagnose issues like this, I've found it useful to install the Firefox add-on "Live HTTP Headers".
Install that addon, then turn it on and reload a page from the client's webserver, and from your own.
What you'll probably see is that the page served by your webserver has the header:
Content-Type: text/html; charset=UTF-8
Whereas when served by the client webserver it says:
Content-Type: text/html
The way I would recommend fixing this is for you to ensure that you explicitly set the header to specify utf-8 in every page of your application. This then insulates your application from future configuration changes on the client's end.
To do this, call
header('Content-type: text/html; charset=utf-8');
on each page before sending any data.
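The same comparison can be made without a browser add-on. A minimal sketch in Python that extracts the charset from a Content-Type header value using the standard email.message module; charset_of is a hypothetical helper name:

```python
from email.message import Message


def charset_of(content_type: str):
    """Return the charset named in a Content-Type header value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# The two header values from the answer above.
print(charset_of("text/html; charset=UTF-8"))
print(charset_of("text/html"))
```

The header value itself could be fetched for each of the two servers with urllib.request.urlopen(url).headers.get("Content-Type"); a missing charset on the client's server would point to the situation described above.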
Since you've stated that it works in your development environments and not on your client's, you might want to check the AddDefaultCharset setting of the client's Apache and set it to UTF-8, if it isn't already. (Assuming that they're using Apache.)
I tend to make sure the following steps are checked
PHP Header is sent in UTF-8
Meta tag is set to UTF-8 (Content-Type)
Storage is set to UTF-8
Server output is set to UTF-8
Make sure your PHP code files are encoded in UTF-8 without a BOM (Byte Order Mark); a BOM at the start of a PHP file is sent to the browser ahead of any headers or output.
Make sure that the response headers are correct: the Content-type should have UTF-8 in it.
Check the character set settings on the DB instance on the client's machine:
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Before executing any text fetching queries, try executing:
SET NAMES utf8;
SET CHARACTER SET utf8;
If that works, you might want to add the following to the my.cnf on the client's machine:
[mysqld]
character_set_server=utf8
collation_server=utf8_unicode_ci
# Only for MySQL older than 5.5, where these option names were still accepted:
# default-character-set=utf8
# default-collation=utf8_general_ci
Please use this meta tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Make sure to run this code in PHP: mysql_query("set character_set_results='utf8'");
