the dreaded UTF-8 BOM - php

I read your answer about reformatting the PHP files used by includes, and it did remove the problem. My question is: we are working on a web site that needs to display different languages. Will this be a problem?
Thanks Conrad

The BOM is not necessary in UTF-8. It can be safely removed from your PHP scripts without losing UTF-8 support.
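As a rough illustration of why removal is safe, here is a minimal sketch in Python (chosen only for illustration; `strip_utf8_bom` is a hypothetical helper, not something from the answer). It drops the three BOM bytes EF BB BF from the start of a file's contents and leaves every other byte, including non-ASCII text, untouched.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical PHP source that an editor saved with a BOM.
original = "<?php echo 'Grüße';".encode("utf-8")
with_bom = codecs.BOM_UTF8 + original

cleaned = strip_utf8_bom(with_bom)
# Only the three BOM bytes are gone; the UTF-8 text is byte-for-byte intact.
print(len(with_bom) - len(cleaned), cleaned == original)
```

Because the file then starts directly with `<?php`, nothing is emitted before the script runs.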

Related

Encoding of Files for PHP project

I am developing a PHP project which is in HTML5. The following meta tag is used for all pages in my website.
<meta charset="utf-8">
I am coding on a Windows machine using NetBeans. I was not really aware of the encoding of the files. Since the code was working fine, I was not paying attention to this.
However, based on some of the questions on Stack Overflow, I came to understand more about encoding. I noticed that many PHP/JS/CSS files of my project are saved in UTF-8 encoding, whereas some are saved in ANSI encoding. (To understand this, I opened the file in Notepad, clicked on Save As and checked the default encoding shown.)
It seems the files in which I pasted some Unicode characters were autosaved in UTF-8 and all other files were saved in ANSI encoding (I guess it might be Windows-1252). All this happened even though I set the project preference to UTF-8 in NetBeans.
Is it required to also save those files (files which do not use Unicode) as UTF-8, since my HTML meta says UTF-8? (Note that there are no issues when I tested my website, but my testing was from a Windows machine.)
I am also curious to know how the browser renders the web page correctly even though some of the PHP files are saved in ANSI but served with meta UTF-8.
You wrote: "(to understand this, i opened the file in notepad, clicked on save as and checked the default encoding shown)."
This isn't an accurate way of checking the encoding of a file.
Files which contain only ASCII characters -- like most CSS and JavaScript source files! -- are valid in most text encodings. Notepad will call them "ANSI" because that's its default, but they're also perfectly valid as UTF-8. No conversion is necessary.
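The point about ASCII-only files can be checked at the byte level. The following is a minimal sketch in Python (used here only for illustration; `is_pure_ascii` and the sample stylesheet are made up for this example). It shows that ASCII-only bytes decode to the same text under UTF-8 and Windows-1252, and that the two encodings only diverge once a non-ASCII character appears.

```python
def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (0x00-0x7F)."""
    return all(b < 0x80 for b in data)

css = b"body { margin: 0; }"          # hypothetical ASCII-only stylesheet
same_text = css.decode("utf-8") == css.decode("cp1252")

# A non-ASCII character is where the two encodings diverge.
e_acute_1252 = "é".encode("cp1252")   # one byte: 0xE9
e_acute_utf8 = "é".encode("utf-8")    # two bytes: 0xC3 0xA9

print(is_pure_ascii(css), same_text)
print(is_pure_ascii(e_acute_utf8), e_acute_1252 == e_acute_utf8)
```

So only the files that actually contain non-ASCII characters need to be saved as UTF-8 to match the declared charset.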

Find source of BOM in Zend Framework 2

I realized that every response returned by my Zend Framework 2 application contains weird characters at the beginning. For example, when I copy the source code of any page returned by ZF2, I see these characters at the beginning of the file when I paste it in Notepad++ : . These seem to be 6 Byte Order Mark characters following each other.
I checked the encoding of my files, and every file I opened in Notepad++ was reported as UTF-8 w/o BOM.
Also, I checked other pages on my server from other sites, and there is no problem.
Could you please help me understand why there is such a thing at the beginning of each page of my site, even in the JSON data returned by my web services? What would be the quickest way to spot where these are printed from and how to get rid of them?
Thank you for your help.
I eventually found my answer here:
Elegant way to search for UTF-8 files with BOM?
I tried both ways described on the thread:
grep -rl $'\xEF\xBB\xBF' .
or using Total Commander.
It helped me find the files where the BOM character appears, and I was then able to convert these files to UTF-8 w/o BOM.
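For anyone without grep or Total Commander at hand, the same search can be sketched in a few lines of Python. This is only an illustrative sketch: `find_bom_files` and its `extensions` parameter are hypothetical names. Unlike the grep command above, which matches the BOM bytes anywhere in a file, it checks only the first three bytes.

```python
import codecs
import os

def find_bom_files(root: str, extensions=(".php",)):
    """Yield paths under root whose first three bytes are the UTF-8 BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if fh.read(3) == codecs.BOM_UTF8:
                    yield path

if __name__ == "__main__":
    # Scan the current directory, as the grep command does with ".".
    for path in find_bom_files("."):
        print(path)
```

Several included files that each start with a BOM would explain seeing several BOMs in a row at the top of the response.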

Setting up cpanel server for PHP UTF-8 characters [BOM error]

I've been having a bit of a headache getting my server working with UTF-8 characters. I have a list of UK towns, some of which have Gaelic characters, and some Scottish ones also have special characters, it seems.
To facilitate this, the developer put a BOM at the start of the PHP file. On his server, an Ubuntu setup, this worked fine, as apparently his PHP defaults to UTF-8. However, on my server it caused a "headers already sent" issue (obviously the BOM was being read as ordinary text instead of a BOM declaration).
I went into WHM and set default_charset in php.ini to UTF-8, and PHP info reads the default charset as UTF-8. I didn't want to edit httpd.conf to set Apache to UTF-8 across the entire server, as this could possibly cause problems elsewhere, so I set UTF-8 in the .htaccess file as follows:
AddDefaultCharset utf-8
However, the BOM was still being read as text and messing up the headers. With the BOM removed, the page and the site work fine, but the special characters are read as random ISO characters.
I don't have much experience with server settings or administration, and none with BOM or setting up and running PHP in UTF-8, so any help or leads would be a great help. No doubt I've done something extremely stupid and obvious.
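The "random ISO characters" symptom is what UTF-8 bytes look like when they are interpreted as ISO-8859-1. A minimal sketch in Python (for illustration only; the town name is an example value, not taken from the question) shows the same bytes read under both labels.

```python
town = "Dùn Èideann"                    # illustrative value, not from the thread
utf8_bytes = town.encode("utf-8")

# Same bytes, wrong label: every byte becomes its own character, so each
# accented letter (two bytes in UTF-8) turns into two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")

# Same bytes, right label: the original text comes back.
correct = utf8_bytes.decode("utf-8")

print(len(town), len(misread), correct == town)
```

This only demonstrates the mismatch itself. It does not say which layer on the server (PHP, Apache, or the page markup) is applying the wrong label.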

Kannada words are displaying as question marks in Firefox browser

I am trying to display Kannada words in the Mozilla Firefox browser on Ubuntu 12.04 LTS, with the data coming from MySQL.
I have used the collation utf8_general_ci and the call header('Content-type:text/html; charset=utf-32'); in my PHP code.
When I try to retrieve the words from the database and display them in Firefox, they are displayed as question marks...
Please help.
The character encoding that you declare in a header must be the same as the actual encoding. It seems that these differ radically (UTF-32 vs. possibly UTF-8). Find out the actual encoding and declare it.
Don’t use UTF-32 on web pages. Firefox was the last major browser that supported it, and the support was removed in 2011.
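To see how radically the two encodings differ, here is a minimal sketch in Python (for illustration only; `decodes_cleanly` is a hypothetical helper, and it assumes the page body is actually UTF-8, which the question does not confirm). It tries to decode the same bytes under the real and the declared charset.

```python
def decodes_cleanly(body: bytes, declared: str) -> bool:
    """True if body can be decoded with the declared charset without errors."""
    try:
        body.decode(declared)
        return True
    except UnicodeDecodeError:
        return False

kannada = "ಕನ್ನಡ"                      # the word "Kannada" in Kannada script
body = kannada.encode("utf-8")         # assumed actual encoding of the output

print(decodes_cleanly(body, "utf-8"))      # declaration matches the bytes
print(decodes_cleanly(body, "utf-32-le"))  # mismatched declaration
```

A failed decode proves a mismatch, but a successful one does not prove a match (ISO-8859-1, for example, accepts any byte sequence), so the actual encoding still has to be found out as the answer says.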

Space turns into capital 'i' on Arabic website

I am building an Arabic website using PHP. It is probably an encoding error, but spaces in the Arabic text turn into capital "i"s for some reason. I have included UTF-8 encoding in the website's main CSS, but the error still exists.
Note: This only happens when using Chrome on Windows OS.
After thorough research, it appeared that it wasn't a Charset issue. I changed the fonts around in some CSS tags and the "i" appeared in some fonts and disappeared in others. So, I chose one that worked. Thanks for everyone's help.
