How do I remove ï»¿ from the beginning of a file?

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿
PHP removes all whitespace, so a random ï»¿ in the middle of the code messes up the entire thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily.
I googled the problem, and there is clearly something wrong with the file encoding, which makes sense being as I've been shifting the files around to different Linux/Windows servers via ftp and rsync, with a range of text editors. I don't really know much about character encoding though, so help would be appreciated.
If it helps, the file is being saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.

Three words for you:
Byte Order Mark (BOM)
That's the representation for the UTF-8 BOM in ISO-8859-1. You have to tell your editor to not use BOMs or use a different editor to strip them out.
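A quick way to see where those three characters come from, sketched in Python (used here only because it is easy to run; none of this is from the original answer): U+FEFF encoded as UTF-8 is the bytes EF BB BF, and reading those bytes as ISO-8859-1 gives one visible character per byte.

```python
import codecs

# U+FEFF is the code point used as the byte order mark.
bom_bytes = "\ufeff".encode("utf-8")

# The standard library exposes the same three bytes as a constant.
assert bom_bytes == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Misreading those bytes as ISO-8859-1 (Latin-1) yields one character per byte.
misread = bom_bytes.decode("iso-8859-1")
print(misread)  # prints ï»¿
```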
To automate the BOM's removal, you can use awk as shown in this question.
As another answer says, the best option would be for PHP to interpret the BOM correctly; for that you can try mb_internal_encoding(), like this:
<?php
//Storing the previous encoding in case you have some other piece
//of code sensitive to encoding and counting on the default value.
$previous_encoding = mb_internal_encoding();
//Set the encoding to UTF-8, so when reading files it ignores the BOM
mb_internal_encoding('UTF-8');
//Process the CSS files...
//Finally, return to the previous encoding
mb_internal_encoding($previous_encoding);
//Rest of the code...
?>
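If changing the internal encoding does not make the bytes disappear, stripping them explicitly while merging is another option. The following is only a sketch in Python rather than PHP; the names strip_utf8_bom and merge_css are hypothetical, and it assumes the files are UTF-8 and small enough to read whole.

```python
import codecs
from pathlib import Path
from typing import Iterable


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def merge_css(paths: Iterable[str]) -> bytes:
    """Concatenate files as raw bytes, dropping each file's leading BOM."""
    parts = [strip_utf8_bom(Path(p).read_bytes()) for p in paths]
    return b"\n".join(parts)
```

Working on bytes means no other character in the files is re-encoded along the way; only the three BOM bytes at the start of each file are dropped.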

Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, replace the old file with this new file. And it will work, damn sure.

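In PHP, you can do the following to remove all control characters and all non-ASCII bytes, including the bytes in question. Note that this also strips line breaks and any legitimate non-ASCII text from the string.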
$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);
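To make the scope of that character class concrete, here is a small Python sketch (not PHP, and not from the original answer) that applies the same class to raw bytes and compares it with a narrower pattern that only matches a UTF-8 BOM at the very start. The function names are made up for illustration.

```python
import re

# Same character class as above: control bytes and every byte >= 0x80.
BROAD = re.compile(rb"[\x00-\x1F\x80-\xFF]")
# Narrower alternative: only a UTF-8 BOM at the very start of the data.
LEADING_BOM = re.compile(rb"\A\xEF\xBB\xBF")


def strip_broad(data: bytes) -> bytes:
    """Remove control bytes and all non-ASCII bytes anywhere in data."""
    return BROAD.sub(b"", data)


def strip_leading_bom(data: bytes) -> bytes:
    """Remove only a UTF-8 BOM at the start of data."""
    return LEADING_BOM.sub(b"", data)
```

On input that starts with a BOM and also contains an accented character, strip_broad drops the accented character and the trailing newline as well, while strip_leading_bom leaves both alone.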

For those with shell access, here is a little command to find all files containing the BOM in the public_html directory. Be sure to change the path to the correct one on your server.
Code:
grep -rl $'\xEF\xBB\xBF' /home/username/public_html
If you are comfortable with the vi editor, open the file in vi:
vi /path-to-file-name/file.php
Enter the command to remove the BOM:
:set nobomb
Save the file:
:wq
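Where grep with $'...' quoting is not available, roughly the same search can be sketched in Python. This is an illustration with hypothetical function names, and unlike the grep command it only reports files whose first three bytes are the BOM, not files that contain those bytes somewhere in the middle.

```python
import codecs
import os
from typing import List


def has_utf8_bom(path: str) -> bool:
    """True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def find_bom_files(root: str) -> List[str]:
    """Walk root recursively and list files that start with a UTF-8 BOM."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if has_utf8_bom(full):
                matches.append(full)
    return sorted(matches)
```

Calling find_bom_files("/home/username/public_html") would then return the list of affected paths, with the path adjusted as in the grep example.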

BOM is just a sequence of characters ($EF $BB $BF for UTF-8), so just remove them using scripts or configure the editor so it's not added.
From Removing BOM from UTF-8:
#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);
I am sure it translates to PHP easily.

I don't know PHP, so I don't know if this is possible, but the best solution would be to read the file as UTF-8 rather than some other encoding. The BOM is actually a ZERO WIDTH NO BREAK SPACE. This is whitespace, so if the file were being read in the correct encoding (UTF-8), then the BOM would be interpreted as whitespace and it would be ignored in the resulting CSS file.
Also, another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save it in won't do all the characters that you need. If PHP is then reading the file in the incorrect encoding, then it is very likely that other characters besides the BOM are being silently misinterpreted. Use UTF-8 everywhere, and these problems disappear.
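I can't show this in PHP, but as an illustration of "read it in the right encoding", Python has a codec named utf-8-sig that decodes UTF-8 and drops a leading BOM if there is one. A minimal sketch (the helper name is made up):

```python
def read_text_without_bom(path: str) -> str:
    """Decode a file as UTF-8, silently dropping a leading BOM if present."""
    # newline="" keeps the file's own line endings untouched.
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return handle.read()


def read_text_plain_utf8(path: str) -> str:
    """Decode a file as plain UTF-8; a leading BOM stays in the text as U+FEFF."""
    with open(path, encoding="utf-8", newline="") as handle:
        return handle.read()
```

With plain utf-8 the BOM survives decoding as the single character U+FEFF at the start of the string, so it still has to be dealt with; with utf-8-sig it is gone, and files without a BOM read the same either way.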

For me, this worked:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If I remove this meta, the ï»¿ appears again. Hope this helps someone...

You can use
vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'
Replacing with awk seems to work, but it is not in place.

grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

I had the same problem with the BOM appearing in some of my PHP files (ï»¿ï»¿).
If you use PhpStorm, you can set a hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu -> File -> Remove BOM.

In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.
See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.

Open the PHP file under question, in Notepad++.
Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.

Same problem, different solution.
One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). Looks like the code within these tags set the encoding, and was executed within PHP which resulted in the strange characters. Either way here's the solution:
# Original
$xml_string = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
# fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";

If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.
I personally use E Text Editor.
In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.
E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).

You can open the file in PhpStorm, right-click it, and click Remove BOM...

Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.
One for finding the BOM in a file and one for KILLING the damned BOM in the file. It works pretty fine and is easy to use.
Just create a .vbs file, and paste the following code in it.
You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.
' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Small helper that is meant to find the BOM
'
Const UTF8_BOM = "ï»¿"
Const UTF16BE_BOM = "þÿ"
Const UTF16LE_BOM = "ÿþ"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    MsgBox "UTF-8-BOM detected!"
ElseIf Left(t, 2) = UTF16BE_BOM Then
    MsgBox "UTF-16-BOM (Big Endian) detected!"
ElseIf Left(t, 2) = UTF16LE_BOM Then
    MsgBox "UTF-16-BOM (Little Endian) detected!"
Else
    MsgBox "No BOM detected!"
End If
If it tells you there is a BOM, create the second .vbs file with the following code and drag the suspicious file onto it.
' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Small helper that is meant to delete the BOM that was found
'
Const UTF8_BOM = "ï»¿"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
    MsgBox "BOM deleted!"
Else
    MsgBox "No UTF-8 BOM present!"
End If
The code is from Heiko Jendreck.

In PHPStorm, for multiple files and BOM not necessarily at the beginning of the file, you can search \x{FEFF} (Regular Expression) and replace with nothing.

Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.

Use Total Commander to search for all BOMed files:
Elegant way to search for UTF-8 files with BOM?
Open these files in some proper editor (that recognizes BOM) like Eclipse.
Change the file's encoding to ISO (right click, properties).
Cut ï»¿ from the beginning of the file, save
Change the file's encoding back to UTF-8
...and do not even think about using n...d again!

I had the same problem. It occurred because one of my PHP files was in UTF-8 (the most important one: the configuration file, which is included in all PHP files).
In my case, I had 2 different solutions which worked for me:
First, I changed the Apache configuration by using the AddDefaultCharset directive in the configuration files (or in .htaccess). This solution forces Apache to use the correct encoding.
AddDefaultCharset ISO-8859-1
The second solution was to change the bad encoding of the php file.

Copy the text of your filename.css file.
Close your css file.
Rename it filename2.css to avoid a filename clash.
In MS Notepad or Wordpad, create a new file.
Paste the text into it.
Save it as filename.css, selecting UTF-8 from the encoding options.
Upload filename.css.

This works for me!
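import codecs

def removeBOMs(fileName):
    # Known BOM byte sequences. UTF-32 comes before UTF-16 because
    # the UTF-32 LE BOM starts with the UTF-16 LE BOM.
    BOMs = [codecs.BOM_UTF8,        # EF BB BF
            codecs.BOM_UTF32_BE,    # 00 00 FE FF
            codecs.BOM_UTF32_LE,    # FF FE 00 00
            codecs.BOM_UTF16_BE,    # FE FF
            codecs.BOM_UTF16_LE,    # FF FE
            b'+/v',                 # UTF-7 (first three bytes)
            b'\xf7\x64\x4c',        # UTF-1
            b'\xdd\x73\x66\x73',    # UTF-EBCDIC
            b'\x0e\xfe\xff',        # SCSU
            b'\xfb\xee\x28',        # BOCU-1
            b'\x84\x31\x95\x33']    # GB-18030
    # Read raw bytes so the BOM is not decoded or altered on the way in.
    with open(fileName, 'rb') as inputFile:
        contents = inputFile.read()
    for BOM in BOMs:
        if contents.startswith(BOM):
            # Only a BOM at the very start of the file is removed.
            with open(fileName, 'wb') as newFile:
                newFile.write(contents[len(BOM):])
            break
    return None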

Check on your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".
Maybe it'll work.

Related

"php include" strange characters in the generator xml "'╗ ┐' ╗ ┐"

The structure of this XML is corrupted because of "include" connection database.
As you can see, there are strange characters in the first line of the file ('╗ ┐' ╗ ┐).
However, they do not appear on the web, since they only appear when I use cmd.exe to type the file. Here is a screenshot of the offending file:
Here's the URL of the file:
http://web.wipix.com.br/aniversariantes.xml
In my PHP file, I have two "includes" in the files connection.php (connection to database) AND "serialize.php" to generate the XML.
This only works if I take out the "includes" and use everything on one page only. How can I fix this?

That is a byte order mark (Unicode character U+FEFF), but it is being displayed in an incorrect encoding. Since your document claims to be encoded as ISO-8859-1, there should not be a byte order mark.
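To illustrate how the same three bytes can turn into box-drawing characters, here is a short Python sketch. It assumes the console was using code page 850, which is only a guess; the question does not say which code page cmd.exe had active. The helper name is hypothetical.

```python
def render_bytes(data: bytes, codec: str) -> str:
    """Decode raw bytes with the given single-byte code page."""
    return data.decode(codec)


utf8_bom = b"\xef\xbb\xbf"

# Assumption: a Western European DOS console (code page 850).
as_console = render_bytes(utf8_bom, "cp850")
# The same bytes as a browser would show them if it assumed ISO-8859-1.
as_latin1 = render_bytes(utf8_bom, "iso-8859-1")
print(as_console, as_latin1)
```

Under that assumption the second and third bytes come out as the two box-drawing corners seen in the question, while ISO-8859-1 shows the more familiar ï»¿.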

Probably your xml file is in UTF-8 format with BOM.
http://en.wikipedia.org/wiki/Byte_order_mark
Remove the offending BOM bytes or save your XML without a BOM using a text editor.
If the XML is dynamically generated, you have to modify the generation code.
Moreover, the BOM bytes seem to be badly encoded. Probably the XML was converted in a wrong way and the BOM bytes were mangled.

The odd stuff at the beginning could be a byte-order mark, but I'm not sure.
A byte-order mark is a byte sequence inserted at the beginning of a file to indicate its endianness, that is, whether the most significant byte comes first.
From your output, there are other weird characters (not text) in the file, so it is possible that the program inserted them in.
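As a small illustration of what "indicating endianness" means, here is a Python sketch (the function name is made up, and it only considers UTF-16): the same code point U+FEFF is stored as FE FF when the most significant byte comes first and as FF FE otherwise, so a reader can tell the order from the first two bytes.

```python
import codecs


def detect_utf16_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading BOM; UTF-16 only, by assumption."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little-endian"
    return "unknown"


# U+FEFF followed by the letter A, encoded in both byte orders.
big = "\ufeffA".encode("utf-16-be")
little = "\ufeffA".encode("utf-16-le")
```

In UTF-8 there is only one possible byte order, which is why a BOM there acts as a signature rather than an endianness marker.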

UTF-8 characters in fwrite

I'm trying to save HTML to a .html file,
This is working:
$html_file = "output.html";
$output_string="string with characters like ã or ì";
$fileHandle = fopen($html_file, 'w') or die("file could not be accessed/created");
fwrite($fileHandle, $output_string);
fclose($fileHandle);
When I check the output.html file, these special characters in my output_string are not read correctly.
My HTML file can't have a <head> tag with the charset information, this makes it work, but my output can't have any <html>, <head> or <body> tags.
I have tried stuff like
header('Content-type: text/plain; charset=utf-8');
I also tried utf8_encode() on the string before fwrite, but with no success so far.
If I read the output.html file in Notepad++ or Netbeans IDE, it shows the correct characters being saved; it's the browser that isn't reading them properly.
I'm pretty sure PHP is saving my file with the incorrect charset, because if I create HTML files in my computer with those special characters (even without any charset setting), these are read correctly.

Try to add a BOM (Byte Order Mark) to your file :
$output_string = "\xEF\xBB\xBF";
$output_string .= "string with characters like ã or ì";
$fileHandle = // ...

Yes, PHP is writing the file correctly, only the reading program doesn't know what character encoding it is and interprets the data with the wrong charset. If you cannot include meta information that convey the correct charset and if the file format itself (plain text) does not offer a way to specify the charset and if the reading application is not able to correctly guess the charset, then there's no solution.

Whatever editor you are using to write this code must have facility to set character-type as 'UTF-8'.
Set the character-type of the file in which you have written this code.
I am using an editor that allows to change the character encoding of file from the bottom. There must be something similar for the editor you are using.

If you need the string in UTF-8 regardless of the php-script-file-encoding (if it's a single-byte one), you should use the UTF-8 encoding of those characters:
$output_string = "string with characters like \xC3\xA3 or \xC3\xAC";
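If you need to look up the escape sequence for some other character, a few lines of Python can print the UTF-8 bytes in the \xNN form PHP accepts inside double-quoted strings. This is just a convenience sketch; the helper name is hypothetical.

```python
def php_hex_escape(text: str) -> str:
    """Render text as PHP-style \\xNN escapes of its UTF-8 bytes."""
    return "".join(f"\\x{byte:02X}" for byte in text.encode("utf-8"))


print(php_hex_escape("ã"))  # prints \xC3\xA3
print(php_hex_escape("ì"))  # prints \xC3\xAC
```

Case matters here: lowercase ì and uppercase Ì are different code points and so have different byte sequences.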

PHP fwrite function to write txt file in utf-8 encoding

I have made a form where a user writes his message in Arabic and submits it by a submit button. The message is saved in database and I need to create a .txt file on the server for some other application which shows something like this :
Ø¯ Ù¾ÙˆÙ„ÙŠØ³Ùˆ Ù¾Ø
I successfully used the fopen, fwrite functions to create my txt files.
When I open the file in notepad the Arabic text is shown correctly
but when I open it in eclipse I get something like this :
Ø¯ Ù¾ÙˆÙ„ÙŠØ³Ùˆ Ù¾Ø± Ø±ÙˆØ²Ù†ÙŠØ² Ù…Ø±Ú©Ø² ØªÙˆØºÙ†Ø¯ÙˆÙŠÙŠ Ø¨Ø±ÙŠØ¯ ÙˆØ´Ùˆ
Well afterwards when I save the txt file in notepad as utf-8 encoding the above unknown stuff changes to Arabic.
But I can't do that manually for every message.
I searched a lot on the internet and did these:
I saved the script in utf-8
I used utf8_encode function
I set this too ini_set('default_charset', 'UTF-8');
this too <meta http-equiv="Content-Type" content="text/html; charset=utf-8; encoding=utf-8" />
I change the parameter in fwrite to "wb" where b is for binary
I would be very glad of any solution to this problem; I have worked on this issue continuously for the last week. I know the problem is in the encoding, so how can I write UTF-8 encoded files using PHP?

If the text displays fine in one program but not another, that just means one program interprets the file correctly while the other doesn't. Most likely Notepad sets a UTF-8 BOM on the file when you save it again, so Eclipse now automatically recognizes that it's UTF-8 encoded. Without that, Eclipse assumes latin-1 or some other encoding as the default.
Two options:
change your Eclipse preferences to open files as UTF-8 by default
set a BOM on the file when writing it, see Encoding a string as UTF-8 with BOM in PHP
A BOM can be helpful for making programs recognize UTF-8 but can also cause problems in other programs that don't expect or want BOMs. Whether to use a BOM or not depends on your intended use and target audience.
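For comparison, this is what the second option looks like in Python (a sketch only, with a made-up helper name; in PHP you would prepend the three bytes yourself as described in the linked question). The utf-8-sig codec writes EF BB BF before the text, and the rest of the file is ordinary UTF-8.

```python
import codecs


def write_utf8_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8 preceded by a BOM so BOM-aware readers detect it."""
    # newline="" writes line endings exactly as they appear in text.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        handle.write(text)


def starts_with_utf8_bom(path: str) -> bool:
    """Check whether the file on disk begins with EF BB BF."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8
```

Whether the BOM helps or hurts still depends on the program that reads the file, as noted above.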

In eclipse you need to set your encoding in menu Edit > Set Encoding...

How to avoid echoing character 65279 in php?

I have encountered a similar problem described here (and in other places) -
where as on an ajax callback I get a xmlhttp.responseText that seems ok (when I alert it - it shows the right text) - but when using an 'if' statement to compare it to the string - it returns false.
(I am also the one who wrote the server-side code returning that string) - after much studying the string - I've discovered that the string had an "invisible character" as its first character. A character that was not shown. If I copied it to Notepad - then deleted the first character - it won't delete until pressing Delete again.
I did a charCodeAt(0) for the returned string in xmlhttp.responseText. And it returned 65279.
Googling it reveals that it is some sort of a UTF-8 control character that is supposed to set "big-endian" or "small-endian" encoding.
So, now I know the cause of the problem, but why is that character being echoed?
In the source php I simply use
echo 'the string'...
and it apparently somehow outputs [chr(65279)]the string...
Why? And how can I avoid it?

To conclude, and specify the solution:
Windows Notepad adds the BOM character (the 3 bytes: EF BB BF) to files saved with utf-8 encoding.
PHP doesn't seem to be bothered by it, unless you include one PHP file into another;
then things get messy and strings get displayed with character(65279) prepended to them.
You can edit the file with another text editor such as Notepad++ and use the encoding
"Encode in UTF-8 without BOM",
and this seems to fix the problem.
Also, you can save the other PHP file with ANSI encoding in Notepad, and this also seems to work (that is, in case you actually don't use any extended characters in the file, I guess...)

If you want to print a string that contains the ZERO WIDTH NO-BREAK SPACE char (e.g., by including an external non-PHP file), try the following code:
echo preg_replace("/\xEF\xBB\xBF/", "", $string);
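To tie the numbers together: character 65279 is U+FEFF, and its UTF-8 encoding is the three bytes matched by the pattern above. Here is a Python sketch of the same idea applied to an already decoded string, removing the character only when it is the first one (the function name is hypothetical):

```python
BOM_CHAR = "\ufeff"  # same as chr(65279)


def strip_leading_bom_char(text: str) -> str:
    """Drop a single leading U+FEFF from a decoded string, if present."""
    return text[1:] if text.startswith(BOM_CHAR) else text
```

Restricting the removal to the start of the string leaves any U+FEFF that appears later in the text untouched, which may or may not be what you want.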

If you are using Linux or Mac, here is an elegant solution to get rid of the BOM character (U+FEFF) in PHP.
If you are using WordPress (25% of Internet websites are powered by WordPress), chances are that a plugin or the active theme is introducing the BOM character because of a file that contains a BOM (maybe that file was edited in Windows). If that's the case, go to your wp-content/themes/ folder and run the following command:
grep -rl $'\xEF\xBB\xBF' .
This will search for files with BOM. If you have .php results in the list, then do this:
Rename the file to something like filename.bom.bak.php
Open the file in your editor and copy the content to the clipboard.
Create a new file and paste the content from the clipboard.
Save the file with the original name filename.php
If you are dealing with this locally, then eventually you'd need to re-upload the new files to the server.
If you don't have results after running the grep command and you are using WordPress, then another place to check for BOM files is the /wp-content/plugins folder. Go there and run the command again. Alternatively, you can deactivate all the plugins and then check whether the problem is solved as you activate them again one by one.
If you are not using WordPress, then go to the root of your project folder and run the command to find files with a BOM. If any file is found, follow the four-step procedure described above.

You can also remove the character in javascript with:
myString = myString.replace(String.fromCharCode(65279), "" );

I had this problem and changed my encoding to utf-8 without bom, Ansi, etc with no luck. My problem was caused by using a php include function in the html body. Moving the include function to above my html (above !DOCTYPE tag) resolved the issue.
After I knew my issue I tested include, include_once and require functions. All attempts to include a file from within the html body created the extra miscellaneous 𐃁 character at the spot where the PHP code would start.
I also tried to assign the result of the include to a variable ... i.e $result = include("myfile.txt"); with the same extra character being added
Please note that moving the include above the HTML would not remove the extra character from showing, however it removes it from my data and out of the content area.

In addition to the above, I just had this issue when pulling some data from a MySQL database (charset is set to UTF-8). The issue was the HTML tags: I allowed some basic ones like <p> and <a>, and when I displayed the data on the page, I saw the &#65279; character when looking through Dev Tools in Chrome.
So I removed the tags from the table, and that removed the &#65279; issue (and the blank line above where the text was to be displayed).
I just wanted to add to this, since my Rep isn't high enough to actually comment on the answer.
EDIT: Using VIM I was able to remove the BOM with :set nobomb and you can confirm the presence of the BOM with :set bomb? which will display either bomb or nobomb

I use "Dreamweaver CC 2015". By default it has this option enabled: "include BOM signature" or something like that. When you click the Save As option in the File menu, the window that appears shows "Unicode Options..", where you can disable the BOM option. Remember to change all your files like that. Or you can simply go to Preferences, disable the BOM option, and save all your files.

I'm using the PhpStorm IDE to develop php pages.
I had this problem and used this IDE option to remove any BOM characters, which solved it:
File -> Remove BOM
Try to find options like this in your IDE.

Probably something on the server. If you know it's there, I would just bypass it until solved.
myString = myString.substring(1)
Chops off the first character.

When using Atom, it shows up as whitespace at the start of the document, before <?php

A Linux solution to find and remove this character from a file is to use sed -i 's/\xEF\xBB\xBF//g' your-filename-here

My solution is to create a PHP file with this content:
<?php
header("Content-Type:text/html;charset=utf-8");
?>
Save it as ANSI, then have every other PHP file require/include it before any HTML or PHP code.
