PHP export to binary Excel file - UTF-8 character encoding

I am using this simple function (taken from here) to export a PHP array into a simple binary Excel file. Writing a binary Excel file was a requirement.
public static function array_to_excel($input)
{
$ret = pack('ssssss', 0x809, 0x8, 0x0, 0x10, 0x0, 0x0);
foreach (array_values($input) as $lineNumber => $row)
{
foreach (array_values($row) as $colNumber => $data)
{
if (is_numeric($data))
{
$ret .= pack('sssssd', 0x203, 14, $lineNumber, $colNumber, 0x0, $data);
}
else
{
$len = strlen($data);
$ret .= pack('ssssss', 0x204, 8 + $len, $lineNumber, $colNumber, 0x0, $len) . $data;
}
}
}
$ret .= pack('ss', 0x0A, 0x00);
return $ret;
}
Calling it is simple:
Model_Utilities::array_to_excel($my_2d_array);
The function itself works well and makes it very easy to create a simple binary Excel file. The problem I have is with UTF-8 characters: I get strange characters like Ä¡ instead of the right ones. Is there a way to set the character encoding in my array_to_excel function?
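Strange pairs like Ä¡ are the typical symptom of UTF-8 bytes being read as a single-byte Windows code page. A minimal illustration, assuming Windows-1252 as that code page (the code page Excel actually applies depends on the system) and taking ġ (U+0121) purely as an example character:

```php
<?php
// Example character only: ġ (U+0121) takes two bytes in UTF-8
$utf8 = "\u{0121}";
echo bin2hex($utf8), "\n"; // c4a1

// Reinterpret those two bytes as Windows-1252 (one byte = one character),
// then convert to UTF-8 so the result can be displayed
echo mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252'), "\n"; // Ä¡
```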

EDIT:
After wading through hundreds of obfuscated Microsoft docs before locating the OpenOffice version of the XLS format spec, I managed to get something working.
However, it relies on the BIFF8 format since, as far as I can tell, BIFF5 (the format used by Excel 95) has no direct UTF-16 support.
function array_to_excel($input)
{
$cells = '';
foreach (array_values($input) as $lineNumber => $row)
{
foreach (array_values($row) as $colNumber => $data)
{
if (is_numeric($data))
{
$cells .= pack('sssssd', 0x203, 14, $lineNumber, $colNumber, 0x0, $data);
}
else
{
$data = mb_convert_encoding ($data, "UTF-16LE", "UTF-8");
$len = strlen($data) / 2; // UTF-16 code units (2 bytes each), so 2*$len always equals the data length
$cells .= pack('ssssssC', 0x204, 9+2*$len, $lineNumber, $colNumber, 0x0, $len, 0x1).$data;
}
}
}
return pack('s4', 0x809, 0x0004, 0x0600, // <- this selects BIFF8 format
0x10) . $cells . pack('ss', 0x0A, 0x00);
}
$table = Array (
Array ("Добрый день", "Bonne journée"),
Array ("tschüß", "こんにちは。"),
Array (30, 40));
$xls = array_to_excel($table);
file_put_contents ("sample.xls", $xls);
My (French) PC version of Excel 2007 managed to open the sample file in compatibility mode, Russian and Japanese included. There is no telling how this hack would work on other variants, though.
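The LABEL record written in the else branch above can be isolated into a helper for inspection. This is a sketch, not part of the original answer: the function name biff8_label is hypothetical, it uses pack's 'v' (always little-endian, as BIFF expects) instead of 's' (machine byte order), and it counts UTF-16 code units with strlen()/2 so the record size always matches the bytes written, even for characters outside the Basic Multilingual Plane.

```php
<?php
// Hypothetical helper: one BIFF8 LABEL record (0x0204) holding an
// uncompressed UTF-16LE string, same layout as array_to_excel above.
function biff8_label(int $row, int $col, string $utf8): string
{
    $utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
    $units = intdiv(strlen($utf16), 2); // UTF-16 code units, not code points
    return pack(
        'vvvvvvC',
        0x0204,         // LABEL record id
        9 + 2 * $units, // data size: row, col, XF (6) + length (2) + flags (1) + chars
        $row,
        $col,
        0x0,            // XF (cell format) index
        $units,         // character count
        0x01            // option flags: 16-bit (uncompressed) characters
    ) . $utf16;
}

$record = biff8_label(0, 1, 'tschüß');
print_r(unpack('vid/vsize/vrow/vcol/vxf/vlen/Cflags', $record));
```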
EDIT (bis) : from the file specs linked above:
Byte Strings (BIFF2-BIFF5)
All Excel file formats up to BIFF5 contain simple byte strings. The byte string consists of the length of the string
followed by the character array. The length is stored either as 8bit value or as 16bit value, depending on the current record. The string is not zero-terminated.
The encoding of the character array is dependent on the current record.
Record LABEL, BIFF3-BIFF5:
Offset  Size  Contents
0       2     Index to row
2       2     Index to column
4       2     Index to XF record
6       var.  Byte string, 16-bit string length
Unless you generate a much more complex file, I'm afraid BIFF5 is a no go.
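To see why a byte string cannot carry UTF-8 text meaningfully, here is a sketch that decodes a LABEL record laid out as in the table above, preceded by the usual 4-byte record header (the helper name parse_biff5_label is hypothetical). The length field counts bytes, so a UTF-8 string containing ü and ß reports 8 for a 6-letter word, and each of those bytes would be shown as a separate character.

```php
<?php
// Hypothetical helper: decode a BIFF2-BIFF5 LABEL record.
// Layout: 4-byte header (id, size), then row, column, XF index, 16-bit length, bytes.
function parse_biff5_label(string $record): array
{
    $fields = unpack('vid/vsize/vrow/vcol/vxf/vlen', $record);
    // Plain byte string: one byte per character, no UTF-16 option in BIFF5
    $fields['text'] = substr($record, 12, $fields['len']);
    return $fields;
}

// Build a record the way the question's function does (with 'v' for little-endian)
$text = 'tschüß'; // UTF-8 source text
$record = pack('vvvvvv', 0x0204, 8 + strlen($text), 3, 2, 0x0, strlen($text)) . $text;
$label = parse_biff5_label($record);
echo $label['len'], "\n"; // 8: ü and ß take two bytes each in UTF-8
```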

Related

How to convert a large binary string to a binary value and back

I need to take a large binary string (whose length will always be divisible by 8) ...
// 96-digit binary string
$str = '000000000011110000000000000000001111111111111111111111111111111111111111000000000000000000001111';
... then convert it to a binary value (to store in a MySQL database as type varbinary), and later convert it back again to recreate that string.
This is most likely NOT a duplicate question. Every posted Stack Overflow answer I could find is either broken (PHP 7 apparently changed how some of these functions work) or doesn't offer a solution to this specific problem. I've tried a few things, such as ...
// get binary value from binary string
$bin = pack('H*', base_convert($str, 2, 16));
// get binary string from binary value
$str2 = str_pad(base_convert(unpack('H*', $bin)[1], 16, 2), 96, 0, STR_PAD_LEFT);
... but this doesn't actually work.
My goal is to go back and forth between the given binary string and the smallest binary value. How is this best done?
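A quick check shows two reasons the attempt above fails: base_convert() treats the input as a number, so leading zero bits vanish, and, as the PHP manual warns, it may lose precision on large numbers because it falls back to a float once the value no longer fits in an integer. 96 bits is well beyond that.

```php
<?php
// 1. base_convert() works on numbers, so leading zero bits disappear
echo base_convert('00001111', 2, 16), "\n"; // f (one hex digit instead of "0f")

// 2. 96 bits do not fit in a PHP integer; base_convert() falls back to a float,
//    which cannot hold all 96 bits, so the low-order digits come out wrong
$ones = str_repeat('1', 96);
var_dump(base_convert($ones, 2, 16) === str_repeat('f', 24)); // bool(false)
```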
These functions convert bit strings to binary character strings and back.
function binStr2charStr(string $binStr) : string
{
$rest8 = strlen($binStr)%8;
if($rest8) {
$binStr = str_repeat('0', 8 - $rest8).$binStr;
}
$strChar = "";
foreach(str_split($binStr,8) as $strBit8){
$strChar .= chr(bindec($strBit8));
}
return $strChar;
}
function charStr2binStr(string $charStr) : string
{
$strBin = "";
foreach(str_split($charStr,1) as $char){
$strBin .= str_pad(decbin(ord($char)),8,'0', STR_PAD_LEFT);
}
return $strBin;
}
usage:
// 96-digit binary string
$str = '000000000011110000000000000000001111111111111111111111111111111111111111000000000000000000001111';
$strChars = binStr2charStr($str);
// "\x00<\x00\x00\xff\xff\xff\xff\xff\x00\x00\x0f"
//back
$strBin = charStr2binStr($strChars);
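An alternative sketch of the same round trip, working four bits at a time so that pack('H*') and bin2hex() handle the bytes. The names bits_to_bytes and bytes_to_bits are hypothetical, and like the answer above it assumes a non-empty input whose length is a multiple of 8. Since no step converts the whole string to a number, there is no precision limit and leading zeros survive.

```php
<?php
// Bit string -> raw bytes: each 4-bit group becomes exactly one hex digit
function bits_to_bytes(string $bits): string
{
    $hex = '';
    foreach (str_split($bits, 4) as $nibble) {
        $hex .= dechex(bindec($nibble)); // 0..15 -> single hex digit
    }
    return pack('H*', $hex); // high nibble first
}

// Raw bytes -> bit string: each hex digit becomes exactly four bits
function bytes_to_bits(string $bytes): string
{
    $bits = '';
    foreach (str_split(bin2hex($bytes)) as $hexDigit) {
        $bits .= str_pad(decbin(hexdec($hexDigit)), 4, '0', STR_PAD_LEFT);
    }
    return $bits;
}

echo bin2hex(bits_to_bytes('0100000101000010')), "\n"; // 4142
echo bytes_to_bits("\x0f"), "\n";                       // 00001111
```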

How to read/write 64-bit unsigned little-endian integers?

pack doesn't support them, so how to read and write 64-bit unsigned little-endian encoded integers?
You could treat 64-bit numbers as opaque 8-byte chunks of binary data and use the usual string manipulation functions to handle them (above all substr and the dot operator).
Whenever (and if) you need to perform arithmetic on them, you can use a couple of wrapper functions to encode/decode that chunk to its hexadecimal representation (in my implementation I call this intermediate type hex64) and use the external library you prefer to do the real work:
<?php
/* Save as test.php */
// hex64: 16 hex digits, most significant digit first
// chunk: 8 raw bytes, little-endian
function chunk_to_hex64($chunk) {
assert(strlen($chunk) == 8);
// Reverse the bytes (not the hex digits) so the most significant byte comes first
return bin2hex(strrev($chunk));
}
function hex64_to_chunk($hex64) {
assert(strlen($hex64) == 16);
return strrev(pack('H*', $hex64));
}
// Test code:
function to_hex64($number) {
// Left-pad with '0' to 16 hex digits (exact for integers, unlike floats)
return sprintf('%016x', $number);
}
for ($number = 0; $number < PHP_INT_MAX - 2932031007403; $number += 2932031007403) {
$hex64 = to_hex64($number);
$hex64_reencoded = chunk_to_hex64(hex64_to_chunk($hex64));
assert($hex64_reencoded == $hex64, "Result is $hex64_reencoded, expected $hex64");
}
$data = file_get_contents('test.php');
// Skip the last element because it may be shorter than 8 bytes
$chunks = array_slice(str_split($data, 8), 0, -1);
foreach ($chunks as $chunk) {
$chunk_reencoded = hex64_to_chunk(chunk_to_hex64($chunk));
assert($chunk_reencoded == $chunk, "Result is $chunk_reencoded, expected $chunk");
}
I wrote a helper class for packing/unpacking 64-bit unsigned ints.
The relevant bit is just two lines:
$ints = unpack("@$offset/Vlo/Vhi", $data);
$sum = Math::add($ints['lo'], Math::mul($ints['hi'], '4294967296'));
(Math::* is a simple wrapper around bcmath)
And packing:
$out .= pack('VV', (int)bcmod($args[$idx],'4294967296'), (int)bcdiv($args[$idx],'4294967296'));
It splits the 64-bit ints into two 32-bit ints, which 64-bit PHP should support :-)
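For reference, PHP 5.6.3 added 64-bit format codes to pack()/unpack(), including 'P' (unsigned 64-bit, little-endian). A sketch, assuming a 64-bit PHP build: because PHP integers are signed, values of 2^63 and above come back negative, and sprintf('%u') prints the unsigned value; arithmetic beyond that range still needs a library such as bcmath, as in the answer above.

```php
<?php
// Requires PHP >= 5.6.3 and a 64-bit build.
// 'P' = unsigned 64-bit integer, little-endian byte order
$bytes = pack('P', 0x0102030405060708);
echo bin2hex($bytes), "\n"; // 0807060504030201

$value = unpack('P', $bytes)[1];
var_dump($value === 0x0102030405060708); // bool(true)

// PHP integers are signed: all-ones comes back as -1
$big = unpack('P', str_repeat("\xff", 8))[1];
echo $big, "\n";                // -1
echo sprintf('%u', $big), "\n"; // 18446744073709551615

// Same value assembled from two 32-bit little-endian halves ('V')
$halves = unpack('Vlo/Vhi', $bytes);
var_dump((($halves['hi'] << 32) | $halves['lo']) === $value); // bool(true)
```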

How to detect UTF-16 encoding

I have to read a file and identify its encoding. I used mb_detect_encoding() to detect UTF-16 but got the wrong result. How can I detect UTF-16 encoding in PHP?
The PHP file is UTF-16 and my header is windows-1256 (because of Arabic):
header('Content-Type: text/html; charset=windows-1256');
$delimiter = '\t';
$matchesz = array(); // collects the matches of every line
$f= file("$fileName");
foreach($f as $dailystatmet)
{
$transactionData = str_replace("'", '', $dailystatmet);
preg_match_all("/('?\d+,\d+\.\d+)?([a-zA-Z]|[0-9]|)[^".$delimiter."]+/",$transactionData,$matches);
array_push($matchesz, $matches[0]);
}
$searchKeywords = array ("apple", "orange", 'mango');
$rowCount = count($matchesz);
for ($row = 1; $row < $rowCount; $row++) {
$myRow = $row;
$cell = $matchesz[$row];
foreach ($searchKeywords as $val) {
if (partialArraySearch($cell[$c_description], $val)) {
}
}}
function partialArraySearch($cell, $searchword)
{
if (strpos(strtoupper($cell), strtoupper($searchword)) !== false) {
return true;
}
return false;
}
The code above searches within the uploaded file. If the file is in UTF-8 the keywords match, but when the same file is UTF-16 or UTF-32 I get no results.
So how can I get the encoding of the uploaded file?
If anyone is still looking for a solution, I have implemented something like this in the "voku/portable-utf8" repository on GitHub: "UTF8::file_get_contents()".
The "file_get_contents"-wrapper will detect the current encoding via "UTF8::str_detect_encoding()" and will convert the content of the file automatically into UTF-8.
e.g.: from the PHPUnit tests ...
$testString = UTF8::file_get_contents(dirname(__FILE__) . '/test1Utf16pe.txt');
$this->assertContains('<p>Today’s Internet users are not the same users who were online a decade ago. There are better connections.', $testString);
$testString = UTF8::file_get_contents(dirname(__FILE__) . '/test1Utf16le.txt');
$this->assertContains('<p>Today’s Internet users are not the same users who were online a decade ago. There are better connections.', $testString);
My solution is to detect UTF-16 and convert the content to Latin-9 (ISO-8859-15):
preg_match_all('/\x00/',$content,$count);
if(count($count[0])/strlen($content)>0.4) {
$content = iconv('UTF-16', 'ISO-8859-15', $content);
}
In other words, I check the frequency of the byte 0x00. If it is higher than 0.4, the text probably consists of characters from the basic Latin range encoded in UTF-16: each character takes two bytes, and for these characters one of the two bytes is usually 00.
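Where the uploaded files carry a byte order mark, a simpler check is possible. A sketch, assuming the file starts with a BOM (Windows editors typically write one for UTF-16); detect_bom and to_utf8 are hypothetical names. Converting to UTF-8 rather than ISO-8859-15 also keeps Arabic text, which Latin-9 cannot represent.

```php
<?php
// Returns [encoding, BOM length] or null when no known BOM is present
function detect_bom(string $bytes): ?array
{
    // UTF-32LE must be tested before UTF-16LE: both start with FF FE
    $boms = [
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16LE' => "\xFF\xFE",
        'UTF-16BE' => "\xFE\xFF",
    ];
    foreach ($boms as $encoding => $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) { // binary-safe comparison
            return [$encoding, strlen($bom)];
        }
    }
    return null;
}

// Strip the BOM and convert to UTF-8; input without a BOM is returned unchanged
function to_utf8(string $bytes): string
{
    $bom = detect_bom($bytes);
    if ($bom === null) {
        return $bytes;
    }
    [$encoding, $length] = $bom;
    return mb_convert_encoding(substr($bytes, $length), 'UTF-8', $encoding);
}

// Usage: $content = to_utf8(file_get_contents($fileName));
```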

Export to Excel shows some columns empty

I am trying to export data from PHP & MySQL. Fields that contain a lot of data (around 1000 characters) come out empty; everything else works fine.
This is the code I am using right now. Please check it and let me know if it needs any modifications. I searched on Google, and many people said it is a cache/memory problem.
<?php
function xlsBOF() {
echo pack("ssssss", 0x809, 0x8, 0x0, 0x10, 0x0, 0x0);
return;
}
function xlsEOF() {
echo pack("ss", 0x0A, 0x00);
return;
}
function xlsWriteNumber($Row, $Col, $Value) {
echo pack("sssss", 0x203, 14, $Row, $Col, 0x0);
echo pack("d", $Value);
return;
}
function xlsWriteLabel($Row, $Col, $Value ) {
$L = strlen($Value);
echo pack("ssssss", 0x204, 8 + $L, $Row, $Col, 0x0, $L);
echo $Value;
return;
}
$sel_sql=mysql_query("select desc from table");
$myFile = date("m-d-Y").'_users.xls';
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Content-Type: application/force-download");
header("Content-Type: application/octet-stream");
header("Content-type: application/vnd.ms-excel");
header("Content-Type: application/download");
header("Content-Disposition: attachment;filename=".$myFile);
header("Content-Transfer-Encoding: binary ");
// XLS Data Cell
xlsBOF();
xlsWriteLabel(0,1,"desc");
$xlsRow = 1;
while(list($desc)=mysql_fetch_row($sel_sql)) {
xlsWriteLabel($xlsRow,1,"$desc");
$xlsRow++;
}
xlsEOF();
?>
Assuming you're outputting an xls file: in a BIFF format file, a cell is limited to 32,767 characters, but only 1,024 characters are displayed in the cell. All 32,767 characters are displayed in the formula bar.
EDIT
Reading the code that you've actually posted:
Excel has limits on the amount of data a cell can hold: for Excel BIFF 8 files, that limit is 32,767 characters. However, long strings are stored in the BIFF file across several blocks with continuation records. For BIFF 5 files (Excel 95) the limit is 2084 bytes per block; in BIFF 8 files (Excel 97 and above) the limit is 8228 bytes. Records that are longer than these limits must be split up into CONTINUE blocks.
This relatively simplistic writer can't split a record into multiple continuation records: it doesn't even use the BIFF 8 shared string table, or indicate which BIFF version it is writing (which means Excel will open it using lowest-common-denominator parameters). It simply tries to store the entire contents of the cell in a standard label block. To fix this, you would need to change your code to split the string values across continuation blocks (via a shared string table), or switch to a library that already handles splitting shared strings across multiple blocks.
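The generic part of that splitting can be sketched as follows. This is an illustration under stated assumptions, not a complete writer: biff_split_record is a hypothetical name, 8224 data bytes is the BIFF8 limit above minus the 4-byte record header, and 0x003C is the CONTINUE record id. Strings that straddle a block boundary (for example in the shared string table) follow extra rules in the spec that this sketch does not implement.

```php
<?php
// Hypothetical helper: emit a BIFF record, moving data beyond $maxData bytes
// into CONTINUE (0x003C) records. Default assumes the BIFF8 limit (8224 data bytes).
// Does NOT implement the extra rules for strings split across a boundary.
function biff_split_record(int $id, string $data, int $maxData = 8224): string
{
    if ($data === '') {
        return pack('vv', $id, 0); // empty record: header only
    }
    $out = '';
    foreach (str_split($data, $maxData) as $i => $chunk) {
        // The first block keeps the original id; the rest become CONTINUE records
        $recordId = ($i === 0) ? $id : 0x003C;
        $out .= pack('vv', $recordId, strlen($chunk)) . $chunk;
    }
    return $out;
}

// 0x00FC (SST) used only as an example record id; the payload is dummy data
$records = biff_split_record(0x00FC, str_repeat('x', 10000));
echo strlen($records), "\n"; // 10008: two 4-byte headers + 10000 data bytes
```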

Problem downloading data to Excel using PHP "pack" function

I have a PHP application that downloads MySQL table data to a spreadsheet on the client machine. I found some code in numerous places on the web that works fine with OpenOffice Calc on a machine running Red Hat. The problem appeared when I tried to download to MS Excel on a Windows PC rather than to an OpenOffice spreadsheet. It seems to be related to the length of the string inserted into a cell: if it is too long, Excel thinks the file is corrupt and won't load it. "Too long" seems to be about 255 characters, even though the MS Excel specs say 32,000 is the maximum length.
To investigate the problem further, I tried downloading the same data as a tab-separated values file and let Excel convert it to a spreadsheet. Using that method, there was no problem loading very long strings into the spreadsheet cells. So strings much longer than 255 characters can in fact be inserted into Excel cells, just not with the code I am using, even though that code works with OpenOffice. Using TSV files would not solve our problem, because the long strings contain carriage returns that we want to retain, and carriage returns are interpreted as row separators when a TSV file is loaded into a spreadsheet.
The PHP function that writes a string to a spreadsheet cell is:
function xlsWriteLabel($Row, $Col, $Value ) {
$L = strlen($Value);
echo pack("ssssss", 0x204, 8 + $L, $Row, $Col, 0x0, $L);
echo $Value;
return;
}
The other necessary code for transfers to spreadsheets in addition to the above function can be found at:
http://www.appservnetwork.com/modules.php?name=News&file=article&sid=8
I haven't found any explanation of the various arguments passed to the "pack" function in the above code, and I'm wondering whether changing one of the arguments in the function above could solve the problem.
So, if anyone has a solution to this problem, I'd be interested in hearing it.
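On the question about pack's arguments: they are the fields of a BIFF LABEL record. An annotated sketch (xls_label_record is a hypothetical name; it returns the bytes instead of echoing them, and uses 'v', which is always little-endian, where the original uses the machine-order 's'):

```php
<?php
// Hypothetical, annotated equivalent of xlsWriteLabel()
function xls_label_record(int $row, int $col, string $value): string
{
    $len = strlen($value); // length in bytes (8-bit text only)
    return pack(
        'vvvvvv',
        0x0204,   // record identifier: LABEL
        8 + $len, // size of the record data after the 4-byte header
        $row,     // zero-based row index
        $col,     // zero-based column index
        0x0,      // index of the XF (cell format) record; 0 = default
        $len      // 16-bit string length
    ) . $value;   // the characters themselves, not zero-terminated
}

print_r(unpack('vid/vsize/vrow/vcol/vxf/vlen', xls_label_record(2, 3, 'abc')));
```

None of these arguments raises the length limit. As far as I know, BIFF5-era label cells hold at most 255 characters, which would match the cutoff observed above; longer text needs the BIFF8 structures (shared strings and CONTINUE records).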
...unless your text has double quotes.
What you should use are proper delimiters from the ASCII character set:
Seq   Dec  Hex  Acro  Name
^\    28   1C   FS    File separator
^]    29   1D   GS    Group separator
^^    30   1E   RS    Record separator
^_    31   1F   US    Unit separator
I don't know PHP, but in Perl a print statement might look like:
print "Field1", chr(31), "Field2", chr(31), "Field3", chr(30);
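The same idea in PHP, following the names in the table above (US between fields, RS after each record). A sketch with hypothetical helper names, assuming the data never contains these two control characters; it does not show whether a given spreadsheet import accepts them as delimiters.

```php
<?php
// Hypothetical helpers: US (0x1F) separates fields, RS (0x1E) ends each record
function encode_records(array $rows): string
{
    $out = '';
    foreach ($rows as $row) {
        $out .= implode(chr(31), $row) . chr(30);
    }
    return $out;
}

function decode_records(string $data): array
{
    $records = explode(chr(30), $data);
    array_pop($records); // the text after the final RS is empty
    return array_map(function ($record) {
        return explode(chr(31), $record);
    }, $records);
}

// Line breaks and double quotes inside a field need no escaping
$rows = [['Field1', "line 1\nline 2"], ['Field3', 'say "hi"']];
var_dump(decode_records(encode_records($rows)) === $rows); // bool(true)
```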
I have written a simple function that generates a real binary Excel file.
First convert your content to an array (which is straightforward), then pass the array to the function below.
<?php
//array to excel
function arrayToExcel($array, $filename='List')
{
if (!is_array($array) or !is_array($array[0])) return false;
$xl = pack("ssssss", 0x809, 0x8, 0x0, 0x10, 0x0, 0x0); //beginning of excel file
$i = 0;
foreach ($array as $key => $val)
{
$j=0;
foreach ($val as $cell_key => $cell_val)
{
$cell_val = trim($cell_val);
$length = strlen($cell_val);
//checking cell value is a number and the length not equal to zero
if(preg_match('/^[0-9]*(\.[0-9]+)?$/', $cell_val) && $length!=0) {
$xl .= pack("sssss", 0x203, 14, $i, $j, 0x0); //writing number column
$xl .= pack("d", $cell_val);
}
else {
$xl .= pack("ssssss", 0x204, 8 + $length, $i, $j, 0x0, $length); //writing string column
$xl .= $cell_val;
}
$j++;
} //end of 2nd foreach
$i++;
} //end of first foreach
$xl .= pack("ss", 0x0A, 0x00); //end of excel file
$filename = $filename.".xls";
header("Pragma: public");
header('Content-Type: application/vnd.ms-excel');
header("Content-Disposition: attachment; filename=$filename");
header("Cache-Control: no-cache");
echo $xl;
} //end of arrayToExcel function
//eg: arrayToExcel($myarray, "contact_list");
?>
Surround your cell values with double quotes; that lets you import the data in TSV or CSV format without the carriage returns being interpreted as new rows. You could also use PHP's fputcsv function to export a TSV or CSV file that Excel can read.
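A sketch of the fputcsv route (rows_to_csv is a hypothetical name): fputcsv() wraps fields containing line breaks, delimiters or quotes in double quotes, so a multi-line value stays in one cell. It writes to php://memory here so the result can be inspected; for a download you could write to php://output instead, and pass "\t" as the delimiter for TSV.

```php
<?php
// Hypothetical helper: format rows with fputcsv() and return the text
function rows_to_csv(array $rows, string $delimiter = ','): string
{
    $fh = fopen('php://memory', 'r+');
    foreach ($rows as $row) {
        // Enclosure and escape passed explicitly (these are the usual defaults)
        fputcsv($fh, $row, $delimiter, '"', '\\');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

echo rows_to_csv([['id', "line 1\nline 2"]]);
// id,"line 1
// line 2"
```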
This is a slightly more efficient version of the functions you linked to, but I don't know whether the format they output is actually correct for Excel.
function xlsBOF(){
echo pack("s6", 0x809, 0x8, 0x0, 0x10, 0x0, 0x0);
}
function xlsEOF(){
echo pack("ss", 0x0A, 0x00);
}
function xlsWriteNumber($Row, $Col, $Value){
echo pack("s5d", 0x203, 14, $Row, $Col, 0x0, $Value);
}
function xlsWriteLabel($Row, $Col, $Value){
$L = strlen($Value);
echo pack("s6A*", 0x204, 8 + $L, $Row, $Col, 0x0, $L, $Value);
}
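The compact format strings produce the same bytes as the original calls: a repeat count such as s5 just means five consecutive s fields, and A* (no padding with *) appends the string unchanged. A quick check with arbitrary sample values:

```php
<?php
// Original two-step NUMBER record vs. the combined format string
$original = pack("sssss", 0x203, 14, 1, 2, 0x0) . pack("d", 3.5);
$compact  = pack("s5d", 0x203, 14, 1, 2, 0x0, 3.5);
var_dump($original === $compact); // bool(true)

// Original LABEL record (header, then echo $Value) vs. "A*"
$value = "some text";
$L = strlen($value);
$original = pack("ssssss", 0x204, 8 + $L, 1, 2, 0x0, $L) . $value;
$compact  = pack("s6A*", 0x204, 8 + $L, 1, 2, 0x0, $L, $value);
var_dump($original === $compact); // bool(true)
```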
