Since yesterday, my server is adding some garbage characters to any script in PHP. If I look at my code with view source, I see some spaces and if I display a JSON string, it is considered invalid.
If I take this simple example:
<?PHP
echo "hello";
?>
It displays hello but in the source code I see a blank line before hello. The encoding of the file is in UTF8 without BOM (did it with Notepad++)
If I use file_get_contents to load the PHP file and then use rawurlencode before outputting the content, I get the following garbage before hello:
%EF%BB%BF%EF%BB%BF%EF%BB%BF
My first thought was that it was an encoding issue but I checked the concerned PHP file(s) and they are all in UTF8 without BOM. The only solution I have found is to remove this string of garbage each time before treating the content of a file.
I'm using Wordpress and the problem suddenly appeared yesterday while I had not modified any file.
Do you have any idea?
Thanks
Laurent
Related
In a webapp I place a <div id="xxx" contentEditable=true > for editing purpose. The encodeURIComponent(xxx.innerHTML) will be send via Ajax POST type to a server, where a PHP script creates a simple txt file from it which in turn can be downloaded from the user to store it locally or print it on screen. It works perfect so far, but … Yes, but, character encoding is a mess. All special characters like the german Ä are interpretated wrong. In this case as ä
I google for some days and I study PHP methods like iconv() and I know how to set up a browsers character encoding and also set a text editor for a correct correspondending decoding. But nothing helps, its still a messs, or becoming even weired.
So my question is : Where in this encoding/decoding roundtrip from the browser to a server and back to the browser I have to do what, to ensure that an Ä will still be an Ä ?
I answer my question, because it turns out to be another problem as stated above. The contenteditable is actually part of a section of html code. On the serverside with PHP I need to filter out the contenteditable text which I do via a DOMDocument like this:
$doc = new DOMDocument();
$doc->loadHTML($_POST["data"]);
then I access the elements and their textual content as usual.
Finally I save the text with
file_put_contents($txtFile, $plainText, LOCK_EX);
The saved text then was a mess as written above. Now it turns out that you need to tell the DOMDocument the character set wich loadHTML() has to interpretate. In this case UTF-8.
First I did it as recommended in PHP this way :
$doc = new DOMDocument('1.0', 'UTF-8');
But that doesn't help (I wonder). Then I found this answer in SO. And the final solution is this :
$doc->loadHTML('<?xml encoding="UTF-8">' . $_POST["data"]);
Though it works it is a trick. Finally the question is left over, how to do it the right way ? If somebedoy has the definite answer, he is very welcome.
You need to make sure that the content is encoded consistently throughout its roundtrip from user input to server-side storage and back to the browser again.
I would recommend using UTF-8. Check that your HTML document (which includes the contenteditable zone) is UTF-8 encoded, and that the XMLHttpRequest/Ajax request does not specify a different encoding when it sends the content to the server.
Check that your server-side application encodes the text file as UTF-8 also. And check that the HTTP response headers declare the file's encoding as UTF-8 when the file is requested and downloaded in the browser.
Somewhere along this path, the encoding differs, and that is what is causing the error. iconv converts between different encodings, which should not be necessary if everything is consistent.
Good luck!
I am sending an AJAX request expecting JSON response.
However, the returned JSON is preceded with a red dot\bullet which is causing a parse error.
Here is a screenshot from Postman:
The dot is not shown on Raw or Preview display, only on Pretty.
In Chrome Dev Tools Network tab it appears under Response. Preview is shown normally as if the dot isn't there.
As mentioned in a comment before: In Chrome, red dots usually represent non-printable special unicode characters.
Please check your server side code to prevent outputting those characters
If your files are encoding with UTF-8, better to encode them with UTF-8 without BOM. This can be easily done through notepad++. The steps are as follows,
Open your files in notepad++.
Go to Encoding option on the file menu.
Then select the option "convert to UTF-8 without BOM".
This may resolve your problem.
You need to clear you object buffer on server side.
I am using PHP as my server side language and I faced similar problem and the solution was cleaning my buffer using ob_clean();
I was faced red dots problem in my ajax response I tried a lot of solution but doest work for me after then I tried ob_clean() function I got the solution
I solved my problem using ob_clean() inside the constructor method
function __construct()
{
ob_clean();
}
i had the same problem and fixed this by converting the file from utf-8 to utf-8 without BOM
Problem:
I have a Textarea, that except XML as content and post to server. It works fine if all Ascii characters are there, but when we put data in hebrew then simplexml_load_string fail to load the data, prompting that invalid XML as data encoding breaks the data been posted.
What I did:
I have my HTML meta tag for UTF-8 is set, I do have php header set for content to be UTF-8
I have MySQL set to 'SET NAMES utf8.
When print_r(iconv_get_encoding('all')); it print all three values as ISO-8859-1.
When I print $_POST it shows hebrew characters fine on browser [on Browser view source as well], but still the function failed.
When I change php.ini to take iconv encoding as UTF-8 all works fine again.
However:
Same server does have 100s of Wordpress installation that run Hebrew website, and they don't have such problem.
So, my question is: Why my code is failing but wordpress or any other open source software works just fine with encoding. I did try to set iconv to utf-8 as first executable line, but nothing changed for me.
Not sure I explain my problem fine and my question is clear, if not please let me know. Thanks.
EDIT: I did try utf8_encode and utf8_decode function but they too failed.
You need to use mb_internal_encoding('UTF-8') to tell php what encoding you are using. With this you are overwriting the settings from php.ini.
Request you all to help me set up Apache server on Cent OS. It looks like some encoding issue, but I am not able to resolve it yet.
Instead of HTML content it displays HTML source in (chrome,firefox), IE 9 works fine. It displays � character after each "<" symbol.
http://pdf.gen.in/index1.htm
Second Problem is with PHP. It displays source code of PHP http://pdf.gen.in/index.php with similar diamond characters, wherever it encounters a "<" character. It seems like php issue is related to the first issue.
Those files are encoded with UTF-16LE. For the static HTML page, you might be able to get it to work by setting the charset correctly in the MIME type (it's currently text/html; charset=UTF-8). I don't know how strong PHP's Unicode support is. Try using UTF-8 instead, it's generally more well supported due to its partial overlap with ASCII.
You should use a decent text editor, and always set encoding of php/html to "UTF-8 without BOM".
Create a file named "test.php", paste below codes and save with "UTF-8 without BOM" encoding, then it will work just fine.
<?php
phpinfo();
?>
I have encountered a similar problem described here (and in other places) -
where as on an ajax callback I get a xmlhttp.responseText that seems ok (when I alert it - it shows the right text) - but when using an 'if' statement to compare it to the string - it returns false.
(I am also the one who wrote the server-side code returning that string) - after much studying the string - I've discovered that the string had an "invisible character" as its first character. A character that was not shown. If I copied it to Notepad - then deleted the first character - it won't delete until pressing Delete again.
I did a charCodeAt(0) for the returned string in xmlhttp.responseText. And it returned 65279.
Googling it reveals that it is some sort of a UTF-8 control character that is supposed to set "big-endian" or "small-endian" encoding.
So, now I know the cause of the problem - but... why does that character is being echoed?
In the source php I simply use
echo 'the string'...
and it apparently somehow outputs [chr(65279)]the string...
Why? And how can I avoid it?
To conclude, and specify the solution:
Windows Notepad adds the BOM character (the 3 bytes: EF BB BF) to files saved with utf-8 encoding.
PHP doesn't seem to be bothered by it - unless you include one php file into another -
then things get messy and strings gets displayed with character(65279) prepended to them.
You can edit the file with another text editor such as Notepad++ and use the encoding
"Encode in UTF-8 without BOM",
and this seems to fix the problem.
Also, you can save the other php file with ANSI encoding in notepad - and this also seem to work (that is, in case you actually don't use any extended characters in the file, I guess...)
If you want to print a string that contains the ZERO WIDTH NO-BREAK SPACE char (e.g., by including an external non-PHP file), try the following code:
echo preg_replace("/\xEF\xBB\xBF/", "", $string);
If you are using Linux or Mac, here is an elegant solution to get rid of the character in PHP.
If you are using WordPress (25% of Internet websites are powered by WordPress), the chances are that a plugin or the active theme are introducing the BOM character due a file that contains BOM (maybe that file was edited in Windows). If that's the case, go to your wp-content/themes/ folder and run the following command:
grep -rl $'\xEF\xBB\xBF' .
This will search for files with BOM. If you have .php results in the list, then do this:
Rename the file to something like filename.bom.bak.php
Open the file in your editor and copy the content in the clipbard.
Create a new file and paste the content from the clipboard.
Save the file with the original name filename.php
If you are dealing with this locally, then eventually you'd need to re-upload the new files to the server.
If you don't have results after running the grep command and you are using WordPress, then another place to check for BOM files is the /wp-content/plugins folder. Go there and run the command again. Alternatively, you can start deactivating all the plugins and then check if the problem is solved while you active the plugins again.
If you are not using WordPress, then go to the root of your project folder and run the command to find files with BOM. If any file is found, then run the four steps procedure described above.
You can also remove the character in javascript with:
myString = myString.replace(String.fromCharCode(65279), "" );
I had this problem and changed my encoding to utf-8 without bom, Ansi, etc with no luck. My problem was caused by using a php include function in the html body. Moving the include function to above my html (above !DOCTYPE tag) resolved the issue.
After I knew my issue I tested include, include_once and require functions. All attempts to include a file from within the html body created the extra miscellaneous 𐃁 character at the spot where the PHP code would start.
I also tried to assign the result of the include to a variable ... i.e $result = include("myfile.txt"); with the same extra character being added
Please note that moving the include above the HTML would not remove the extra character from showing, however it removes it from my data and out of the content area.
In addition to the above, I just had this issue when pulling some data from a MySQL database (charset is set to UTF-8) - the issue being the HTML tags, I allowed some basic ones like <p> and <a> when I displayed it on the page, I got the 𐃁 character looking through Dev Tools in Chrome.
So I removed the tags from the table and that removed the 𐃁 issue (and the blank line above the where the text was to be displayed.
I just wanted to add to this, since my Rep isn't high enough to actually comment on the answer.
EDIT: Using VIM I was able to remove the BOM with :set nobomb and you can confirm the presence of the BOM with :set bomb? which will display either bomb or nobomb
I use "Dreamweaver CC 2015", by default it has this option enabled: "include BOM signature" or something like that, when you click on save as option from file menu. In the window that apears, you can see "Unicode Options..". You can disable the BOM option. And remeber to change all your files like that. Or you can simply go to preferences and disable the BOM option and save all your files.
I'm using the PhpStorm IDE to develop php pages.
I had this problem and use this option of IDE to remove any BOM characters and problem solved:
File -> Remove BOM
Try to find options like this in your IDE.
Probably something on the server. If you know it's there, I would just bypass it until solved.
myString = myString.substring(1)
Chops off the first character.
When using atom it is a white space on the start of the document before <?php
A Linux solution to find and remove this character from a file is to use sed -i 's/\xEF\xBB\xBF//g' your-filename-here
My solution is create a php file with content:
<?php
header("Content-Type:text/html;charset=utf-8");
?>
Save it as ANSI, then other php file will require/include this before any html or php code