I wrote a simple PHP script that loads some binary data from the DB and sends it to the client.
$sql="select FightPlayEnd from ZMTXLogic.FightLog where ID=".addslashes($id);
$result=$db->query($sql);
if($db->num_rows($result)>0)
{
$row = mysql_fetch_assoc($result);
$nByteCount = mb_strlen($row["FightPlayEnd"], '8bit');
//echo $nByteCount;
header("Content-type:application/octet-stream");
header("Accept-Ranges:bytes");
header("Accept-Length:".$nByteCount);
header("Content-Disposition:attachment;filename=FightPlayEnd.bin");
header( "Content-type: application/octet-stream");
echo $row["FightPlayEnd"];
}
The problem is that the data IE receives is not the same as the original binary data: EF BB BF (as viewed in UltraEdit) is added
at the start and 0D 0A 0D 0A at the end. What is wrong with it?
0D 0A 0D 0A is just \r\n\r\n, that is, two linebreaks.
EF BB BF is a byte order mark. It signals UTF-8 encoding.
Edit (see comments):
Your script might be outputting more than it should (particularly those \r\n\r\n).
You need to clean the output buffer before you start outputting the data (ob_clean()) and exit right after your echo.
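A minimal sketch of that advice, assuming output buffering is active so that stray output (a BOM from an included file, trailing newlines) can still be discarded. The helper name sendBinaryDownload and its parameters are hypothetical, not part of the original script. It also sends Content-Length, which is the standard header for the body size.

```php
<?php
// Hypothetical helper: name and parameters are illustrative only.
function sendBinaryDownload($data, $filename)
{
    // Discard anything already sitting in the output buffer
    // (stray whitespace, a BOM, newlines from included files).
    // This only helps if output buffering is active.
    if (ob_get_level() > 0) {
        ob_clean();
    }

    header('Content-Type: application/octet-stream');
    header('Content-Length: ' . strlen($data)); // byte count of the payload
    header('Content-Disposition: attachment; filename="' . $filename . '"');

    echo $data;
}
```

In a real script, call exit immediately after the helper returns so that nothing after it (for example a trailing newline after a closing ?> tag) is appended to the download.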
Related
To prepare a download of some HTML contenteditable as a plain text file, I do the following:
Send the html contenteditable, which inherits other html elements, through Ajax to a server side script prepareDownload.php.
There I create a new DOMDocument : $doc = new DOMDocument();
Then I do : $doc->loadHTML('<?xml encoding="UTF-8">' . $_POST["data"]);
Then I am looking for text contents in certain elements and assemble it in $plainText
Finally I write $plainText to disk with : file_put_contents($txtFile, $plainText, LOCK_EX);
So far it works … but when I open the textfile the special characters like the German Ä are a mess.
To find out where the problem might come from, I placed some print_r() calls at several stages in the PHP script and looked at what comes back in the browser's console.
Up to the point where I write $plainText to disk with file_put_contents(), everything is perfect. Looking into the stored text file afterwards, the characters are a mess.
Now I assume that file_put_contents() misinterprets the given charset. But how do I tell file_put_contents() that it should interpret (not encode) it as UTF-8?
EDIT:
As a test to find out more, I replaced the explicit statement:
$doc->loadHTML('<?xml encoding="UTF-8">' . $_POST["data"])
with
$doc->loadHTML($_POST["data"])
The character ä in the file still looks weird, but different. The hexdump now looks like this:
0220: 20 76 69 65 6C 2C 20 65 72 7A C3 A4 68 6C 74 20 viel, erz..hlt
Now ä shows up as two dots (two bytes) and is hex C3 A4. What kind of encoding is this?
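For reference, C3 A4 is the UTF-8 encoding of ä (U+00E4), so the file in the second test probably is valid UTF-8 and the viewer is reading it with a different charset. A small sketch to check a byte sequence, using only preg_match with the /u modifier (which rejects invalid UTF-8); the variable names are illustrative:

```php
<?php
// The two bytes seen in the hexdump.
$bytes = "\xC3\xA4";

// preg_match() with the /u modifier fails on invalid UTF-8 input,
// so an empty pattern works as a simple validity check.
$isUtf8 = preg_match('//u', $bytes) === 1;

echo strlen($bytes), " bytes, valid UTF-8: ", $isUtf8 ? "yes" : "no", "\n";
// prints: 2 bytes, valid UTF-8: yes

// For comparison: ä in ISO-8859-1 is the single byte E4,
// which on its own is not valid UTF-8.
$latin1IsUtf8 = preg_match('//u', "\xE4") === 1;
```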
EDIT2: The issue was with how my Perl client was interpreting the output from PHP's json_encode which outputs Unicode code points by default. Putting the JSON Perl module in ascii mode (my $j = JSON->new()->ascii();) made things work as expected.
I'm interacting with an API written in PHP that returns JSON, using a client written in Perl which then submits a modified version of the JSON back to the same API. The API pulls values from a PostgreSQL database whose encoding is UTF8. What I'm running into is that the API returns a different character encoding, even though the value PHP receives from the database is proper UTF-8.
I've managed to reproduce what I'm seeing with a couple lines of PHP (5.3.24):
<?php
$val = array("Millán");
print json_encode($val)."\n";
According to the PHP documentation, string literals are encoded ... in whatever fashion [they are] encoded in the script file.
Here is the hex dumped file encoding (UTF-8 lower case a-acute = c3 a1):
$ grep ill test.php | od -An -t x1c
  24  76  61  6c  20  3d  20  61  72  72  61  79  28  22  4d  69
   $   v   a   l       =       a   r   r   a   y   (   "   M   i
  6c  6c  c3  a1  6e  22  29  3b  0a
   l   l 303 241   n   "   )   ;  \n
And here is the output from PHP:
$ php -f test.php | od -An -t x1c
  5b  22  4d  69  6c  6c  5c  75  30  30  65  31  6e  22  5d  0a
   [   "   M   i   l   l   \   u   0   0   e   1   n   "   ]  \n
The UTF-8 lower case a-acute has been changed to a "Unicode" lower case a-acute by json_encode.
How can I keep PHP/json_encode from switching the encoding of this variable?
EDIT: What's interesting is that if I change the string literal to utf8_encode("Millán") then things work as expected. The utf8_encode docs say that function only supports ISO-8859-1 input, so I'm a bit confused about why that works.
This is entirely based on a misunderstanding. json_encode encodes non-ASCII characters as Unicode escape sequences \u..... These sequences do not reference any physical byte encoding in any UTF encoding; they reference the character by its Unicode code point. U+00E1 is the Unicode code point for the character á. Any proper JSON parser will decode \u00e1 back into the character "á". There's no issue here.
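A short round trip illustrates the point. The sketch writes the UTF-8 bytes of á explicitly (\xC3\xA1) so it does not depend on how the script file itself is saved; the variable names are illustrative.

```php
<?php
// "Millán" with á given as its two UTF-8 bytes.
$original = "Mill\xC3\xA1n";

// json_encode() escapes the non-ASCII character as \u00e1 by default.
$json = json_encode(array($original));
echo $json, "\n"; // prints: ["Mill\u00e1n"]

// Decoding turns the escape back into the same UTF-8 bytes.
$decoded = json_decode($json);
$roundTripOk = ($decoded[0] === $original);
```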
Try the code below to solve the problem.
<?php
$val = array("Millán");
print json_encode($val, JSON_UNESCAPED_UNICODE);
Note: add the JSON_UNESCAPED_UNICODE flag to the json_encode call to keep the original characters instead of \u escapes. The flag is available as of PHP 5.4.0.
For Python, see "Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence".
How do I do something as simple as (in PHP) this code in C:
char buffer[5] = "testing";
FILE* file2 = fopen("data2.bin", "wb");
fwrite(buffer, sizeof buffer, 1, file2);
fclose(file2);
Whenever I try to write a binary file in PHP, it doesn't write in real binary.
Example:
$ptr = fopen("data2.bin", 'wb');
fwrite($ptr, "testing");
fclose($ptr);
I found on the internet that I need to use pack() to do this...
What I expected:
testing\9C\00\00
or
7465 7374 696e 679c 0100 00
What I got:
testing412
Thanks
You're making the classic mistake of confusing data with the representation of that data.
Let's say you have a text file. If you open it in Notepad, you'll see the following:
hello
world
This is because Notepad assumes the data is ASCII text. So it takes every byte of raw data, interprets it as an ASCII character, and renders that text to your screen.
Now if you go and open that file with a hex editor, you'll see something entirely different [1]:
68 65 6c 6c 6f 0d 0a 77 6f 72 6c 64 hello..world
That is because the hex editor instead takes every byte of the raw data, and displays it as a two-character hexadecimal number.
[1] Assuming Windows \r\n line endings and ASCII encoding.
So if you're expecting hexadecimal ASCII output, you need to convert your string to its hexadecimal encoding before writing it (as ASCII text!) to the file.
In PHP, what you're looking for is the bin2hex function which "Returns an ASCII string containing the hexadecimal representation of str." For example:
$str = "Hello world!";
echo bin2hex($str); // output: 48656c6c6f20776f726c6421
Note that the "wb" mode argument doesn't cause any special behavior. It guarantees binary output, not hexadecimal output. I cannot stress enough that there is a difference. The only thing the b really does is guarantee that line endings will not be converted by the library when reading/writing data.
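If the goal is instead to store a number as raw bytes, the way fwrite does with an int in C, pack() is the usual tool. A sketch, assuming the number in the question (412) should be written as a 32-bit little-endian integer, and writing to a temporary directory rather than the question's data2.bin:

```php
<?php
// 'V' = unsigned 32-bit integer, little-endian byte order.
$payload = "testing" . pack('V', 412);

// Path is an assumption for the example; any writable location works.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "data2.bin";

$ptr = fopen($path, 'wb');
fwrite($ptr, $payload);
fclose($ptr);

// bin2hex() shows the bytes that were written.
echo bin2hex(file_get_contents($path)), "\n";
// prints: 74657374696e679c010000
```

Here 412 is 0x19C, so the four bytes after "testing" are 9c 01 00 00.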
I am finishing my project with FusionCharts. I need to add a BOM signature to my dynamically generated XML, but I can't figure out how to add one using PHP.
My code looks like this:
$filename="a.xml";
$file= fopen("$filename", "w");
$_xml="<something/>";
fwrite($file, $_xml);
fclose($file);
In the FusionCharts documentation I found that for general PHP output I need to add
header ( 'Content-type: text/xml' );
echo pack ( "C3" , 0xef, 0xbb, 0xbf );
Can anyone help me with this?
Thank you,
You can use a BOM as a signature no matter how the Unicode text is transformed: UTF-16, UTF-8, or UTF-32. The exact bytes comprising the BOM will be whatever the Unicode character U+FEFF is converted into by that transformation format. In that form, the BOM serves to indicate both that it is a Unicode file, and which of the formats it is in.
If you want to, just pass a string (which is binary in PHP) that contains the BOM. Example strings:
Bytes        PHP String          Encoding Form
-----------  ------------------  ---------------------
00 00 FE FF  "\0\0\xFE\xFF"      UTF-32, big-endian
FF FE 00 00  "\xFF\xFE\0\0"      UTF-32, little-endian
FE FF        "\xFE\xFF"          UTF-16, big-endian
FF FE        "\xFF\xFE"          UTF-16, little-endian
EF BB BF     "\xEF\xBB\xBF"      UTF-8
See http://unicode.org/faq/utf_bom.html
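Applied to the file-writing code from the question, this could look like the sketch below. It assumes the XML itself is UTF-8, and it writes to the temporary directory only so the example runs anywhere; the $bom variable name is illustrative.

```php
<?php
// Location is an assumption for the example; the question used "a.xml".
$filename = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "a.xml";
$_xml = "<something/>";

// The UTF-8 BOM as a PHP string (see the table above).
$bom = "\xEF\xBB\xBF";

$file = fopen($filename, "wb");
fwrite($file, $bom);  // signature first
fwrite($file, $_xml); // then the document
fclose($file);

// Check the first three bytes of the written file.
$written = file_get_contents($filename);
echo bin2hex(substr($written, 0, 3)), "\n"; // prints: efbbbf
```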
This is driving me crazy.
I have this one php file on a test server at work which does not work.. I kept deleting stuff from it till it became
<?
print 'Hello';
?>
it outputs
ï»¿Hello
If I create a new file and copy/paste the same script into it, it works!
Why does this one file give me the strange characters all the time?
That's the BOM (Byte Order Mark) you are seeing.
In your editor, there should be a way to force saving without BOM which will remove the problem.
Found it: file -> encoding -> UTF8 with BOM, changed it to UTF-8 without BOM :-)
I should have asked before wasting time trying to figure it out :-)
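When the BOM sits in data you read at runtime rather than in the script file itself, it can be removed in code. A minimal sketch; the function name stripUtf8Bom is hypothetical:

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM if one is present.
function stripUtf8Bom($text)
{
    // strncmp() is binary-safe and compares only the first 3 bytes.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

echo stripUtf8Bom("\xEF\xBB\xBFHello"), "\n"; // prints: Hello
```

This does not help when the BOM is at the top of the PHP file, because it is sent before any code runs; saving the file without a BOM remains the fix there.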
Just in case, here is a list of bytes for BOM
Encoding      Representation (hexadecimal)
UTF-8         EF BB BF
UTF-16 (BE)   FE FF
UTF-16 (LE)   FF FE
UTF-32 (BE)   00 00 FE FF
UTF-32 (LE)   FF FE 00 00
UTF-7         2B 2F 76, and one of the following bytes: [ 38 | 39 | 2B | 2F ]†
UTF-1         F7 64 4C
UTF-EBCDIC    DD 73 66 73
SCSU          0E FE FF
BOCU-1        FB EE 28 optionally followed by FF†
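The list can be turned into a simple detector. The sketch below covers only the UTF-8, UTF-16 and UTF-32 rows; the function name detectBom is hypothetical. The UTF-32 signatures are checked before the UTF-16 ones because FF FE 00 00 starts with FF FE.

```php
<?php
// Hypothetical helper: returns the encoding name for a leading BOM,
// or null if none of the listed signatures matches.
function detectBom($bytes)
{
    // Order matters: longer signatures that share a prefix come first.
    $signatures = array(
        "UTF-32 (BE)" => "\x00\x00\xFE\xFF",
        "UTF-32 (LE)" => "\xFF\xFE\x00\x00",
        "UTF-8"       => "\xEF\xBB\xBF",
        "UTF-16 (BE)" => "\xFE\xFF",
        "UTF-16 (LE)" => "\xFF\xFE",
    );

    foreach ($signatures as $name => $signature) {
        // strncmp() is binary-safe, so NUL bytes are compared correctly.
        if (strncmp($bytes, $signature, strlen($signature)) === 0) {
            return $name;
        }
    }
    return null;
}

echo detectBom("\xEF\xBB\xBFHello"), "\n"; // prints: UTF-8
```

One caveat: UTF-16 (LE) text whose first character after the BOM is U+0000 is indistinguishable from UTF-32 (LE) by signature alone, so treat the result as a hint rather than proof.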