problems with lz4 between php and golang

I am trying to compress data with lz4_compress in PHP and decompress it with https://github.com/pierrec/lz4 in Go,
but it fails.
It seems that the lz4_compress output is missing the LZ4 frame header, and the block data is slightly different.
Please help me solve the problem.
<?php
echo base64_encode(lz4_compress("Hello World!"));
?>
output:
DAAAAMBIZWxsbyBXb3JsZCE=
package main

import (
    "bytes"
    "encoding/base64"
    "fmt"

    "github.com/pierrec/lz4"
)

func main() {
    // decode the PHP lz4_compress output
    a, _ := base64.StdEncoding.DecodeString("DAAAAMBIZWxsbyBXb3JsZCE=")
    fmt.Printf("%b\n", a)

    // compress the same string with the Go lz4 writer (frame format)
    buf := new(bytes.Buffer)
    w := lz4.NewWriter(buf)
    b := bytes.NewReader([]byte("Hello World!"))
    w.ReadFrom(b)
    fmt.Printf("%b\n", buf.Bytes())
}
output:
[1100 0 0 0 11000000 1001000 1100101 1101100 1101100 1101111 100000 1010111 1101111 1110010 1101100 1100100 100001]
[100 100010 1001101 11000 1100100 1110000 10111001 1100 0 0 10000000 1001000 1100101 1101100 1101100 1101111 100000 1010111 1101111 1110010 1101100 1100100 100001]

lz4.h explicitly says
lz4.h provides block compression functions. It gives full buffer control to user.
Decompressing an lz4-compressed block also requires metadata (such as compressed size). Each application is free to encode such metadata in whichever way it wants.
An additional format, called LZ4 frame specification (doc/lz4_Frame_format.md),
take care of encoding standard metadata alongside LZ4-compressed blocks. If your application requires interoperability, it's recommended to use it. A library is provided to take care of it, see lz4frame.h.
The PHP extension doesn't do that; it produces bare compressed blocks.
http://lz4.github.io/lz4/ explicitly lists the PHP extension as not interoperable (in the "Customs LZ4 ports and bindings" section).

Sounds good! Now try
echo -n DAAAAMBIZWxsbyBXb3JsZCE= | base64 -d
The first 4 bytes are 0C 00 00 00, which is the length of the string, and the rest is essentially Hello World! (preceded by a single LZ4 token byte, 0xC0). Therefore I think that when PHP sees that such a short input cannot be compressed, it stores the input as literals (try echo -n "Hello World!" | lz4c). But the problem is that this does not let you recognize such a case, or am I wrong?
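You can verify that layout from PHP itself. This is a minimal sketch, assuming the extension that provides lz4_compress() also exposes lz4_uncompress() as its counterpart (that function name is an assumption here); unpack('V', ...) reads the 4-byte little-endian length prefix:
<?php
// Output of the extension: <4-byte little-endian uncompressed length>
// followed by a raw LZ4 block, with no frame header.
$compressed = lz4_compress("Hello World!");

// Read the length prefix back.
$len = unpack('V', substr($compressed, 0, 4))[1];
echo "uncompressed length: $len\n";        // 12

// The remainder is the raw block: one token byte (0xC0) plus the
// 12 literal bytes, because such a short input cannot be compressed.
echo bin2hex(substr($compressed, 4)), "\n";

// Round-tripping inside PHP works, assuming lz4_uncompress() exists
// in this extension and expects exactly this prefix + block layout.
echo lz4_uncompress($compressed), "\n";    // Hello World!
?>
On the Go side, it is the block-level API of github.com/pierrec/lz4 (for example its UncompressBlock function, rather than the frame-level Writer/Reader used above) that should be able to handle such a bare block once the 4-byte length prefix is stripped off.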


Memory usage of php process

SUMMARY
Short recommendations (for more detailed information, see the answers)
To avoid memory leaks you can:
unset variables as soon as they become useless (see the sketch after this list)
use xdebug for a detailed report of memory consumption by functions and to find memory leaks
set memory_limit (for example to 5 MB) to avoid runaway memory allocation
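A minimal sketch of the first and last recommendations, with purely illustrative values:
<?php
// Illustrative only: cap the request memory and release a large
// variable as soon as it is no longer needed.
ini_set('memory_limit', '5M');      // fail fast instead of ballooning

$data = str_repeat('x', 1000000);   // ~1 MB string
echo memory_get_usage(), "\n";

unset($data);                       // give the memory back to the manager
echo memory_get_usage(), "\n";      // noticeably lower again
?>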
QUESTION
What can PHP use memory for, apart from libraries and variables?
I monitor the memory used by variables, and it is ~3 MB, with this code:
$vars = array_keys(get_defined_vars());
$cnt_vars = count($vars);
$allsize = 0;
for ($j = 0; $j < $cnt_vars; $j++) {
    try
    {
        $size = @serialize($$vars[$j]);
        $size = strlen($size);
    }
    catch (Exception $e) {
        $str = json_encode($$vars[$j]);
        $str = str_replace(array('{"','"}','":"','":'), '', $str);
        $size = strlen($str);
    }
    $vars[$j] = array(
        'size' => $size,
        'name' => $vars[$j]
    );
    $allsize += $size;
}
and libraries take ~18 MB (libcurl, etc.)
So in total that is about 21 MB, but
pmap -x (process)
shows that total memory consumption is kB: 314028 RSS: 74704 Dirty: 59672
so the real total consumption is ~74 MB.
I also see some large blocks with [anon] mapping in my pmap.
What is PHP using these blocks for?
php version: 5.5.9-1ubuntu4.14
php extensions:
root@webdep:~# php -m
[PHP Modules]
bcmath
bz2
calendar
Core
ctype
curl
date
dba
dom
ereg
exif
fileinfo
filter
ftp
gd
gettext
hash
iconv
json
libxml
mbstring
mcrypt
mhash
openssl
pcntl
pcre
PDO
pdo_pgsql
pgsql
Phar
posix
readline
Reflection
session
shmop
SimpleXML
soap
sockets
SPL
standard
sysvmsg
sysvsem
sysvshm
tokenizer
wddx
xml
xmlreader
xmlwriter
Zend OPcache
zip
zlib
[Zend Modules]
Zend OPcache
PHP is not the same as C or C++ code that compiles to a single binary. All your scripts are executed inside the Zend Virtual Machine, and most of the memory is consumed by the VM itself. That includes the memory used by loaded extensions, the shared libraries (.so files) used by the PHP process, and any other shared resources.
I don't remember the exact source, but somewhere I read that nearly 70% of total CPU cycles are consumed by PHP internals and only 30% go to your code (please correct me if I am wrong here). This is not directly related to memory consumption, but it should give an idea of how PHP works.
About anon blocks I found some details in another SO answer. The answer is about Java, but the same should apply to PHP as well.
Anon blocks are "large" blocks allocated via malloc or mmap -- see the
manpages. As such, they have nothing to do with the Java heap (other
than the fact that the entire heap should be stored in just such a
block).
Here's the actual link to the SO answer:
https://stackoverflow.com/a/1483482/1012809
Check this article for more details on anonymous memory pages (anon)
https://techtalk.intersec.com/2013/07/memory-part-2-understanding-process-memory/
Also check this Slide-share for more details on PHP memory management
http://www.slideshare.net/jpauli/understanding-php-memory
I would recommend disabling some unused extensions. That should save you some memory.
NOTE: this is not exactly an answer, but information requested by the OP; the comment field is too short for this... These are more tools for how to debug this kind of problem.
Xdebug's docs are pretty comprehensive; they explain how to use it far better than I could by copying their docs here. The script you gave is a bit fuzzy, so I did not do the trace myself, but it would give you line-by-line deltas of memory usage.
Basically, set xdebug.show_mem_delta to 1 with Xdebug enabled to generate the function trace, which you can then open in a text editor to see which part exactly is the one that leaks memory.
Then you can compare the initial (or mid-script) total memory to see how much it differs from the real memory usage you are seeing.
TRACE START [2007-05-06 14:37:26]
0.0003 114112 +114112 -> {main}() ../trace.php:0
Here the total memory would be the 114112.
If the difference is really big, you may want to use something like shell_exec() to get the real memory usage in between all lines, and output that, and then you can compare that output to Xdebug’s memory output to see where the difference happens.
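As a rough sketch of that idea (assuming a Linux host where ps is available; the helper name is made up):
<?php
// Hypothetical helper: resident set size (RSS) of this PHP process in kB,
// as seen by the kernel, to compare against the Zend memory manager's view.
function real_memory_kb()
{
    return (int) shell_exec('ps -o rss= -p ' . getmypid());
}

echo 'Zend MM:  ' . memory_get_usage(true) . " bytes\n";
echo 'Real RSS: ' . real_memory_kb() . " kB\n";
?>
Calling it before and after a suspicious block of code, next to Xdebug's numbers, shows whether the growth happens inside the script or elsewhere in the process.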
If the difference is there from the very first line of the script, the culprit could be a PHP extension. Check php -m to see if there are any fishy extensions.
First of all, make an array to investigate the memory it is taking:
$startMemory = memory_get_usage();
$array = range(1, 100000);
echo memory_get_usage() - $startMemory, ' bytes';
One integer is 8 bytes (on a 64-bit Unix machine and using the long type) and here we have 100000 integers, so you will obviously need 800000 bytes. That's something like 0.76 MB.
This array gives 14649024 bytes. That's 13.97 MB - eighteen times more than estimated.
Here is a quick summary of the memory usage of the different components involved:
| 64 bit | 32 bit
---------------------------------------------------
zval | 24 bytes | 16 bytes
+ cyclic GC info | 8 bytes | 4 bytes
+ allocation header | 16 bytes | 8 bytes
===================================================
zval (value) total | 48 bytes | 28 bytes
===================================================
bucket | 72 bytes | 36 bytes
+ allocation header | 16 bytes | 8 bytes
+ pointer | 8 bytes | 4 bytes
===================================================
bucket (array element) total | 96 bytes | 48 bytes
===================================================
total total | 144 bytes | 76 bytes
Again, for large, static arrays, if I use an SplFixedArray instead:
$startMemory = memory_get_usage();
$array = new SplFixedArray(100000);
for ($i = 0; $i < 100000; ++$i) {
$array[$i] = $i;
}
echo memory_get_usage() - $startMemory, ' bytes';
It results in 5600640 bytes.
That’s 56 bytes per element and thus much less than the 144 bytes per element a normal array uses. This is because a fixed array doesn’t need the bucket structure. So it only requires one zval (48 bytes) and one pointer (8 bytes) for each element, giving the observed 56 bytes.
Hope this will be helpful.
Nothing is wrong with the numbers you see, and you shouldn't add them up. This is just "tripling": you are seeing the different sections (read-only, executable, writable) of each library listed separately. Your number is correct.

Piecemeal bzcompression for large files in PHP

Creating bzip2 archived data in PHP is very easy thanks to its implementation in bzcompress. In my present application I cannot in all reason simply read the input file into a string and then call bzcompress or bzwrite. The PHP documentation does not make it clear whether successive calls to bzwrite with relatively small amounts of data will yield the same result as when compressing the whole file in one single swoop. I mean something along the lines of
$data = file_get_contents('/path/to/bigfile');
$cdata = bzcompress($data);
I tried out a piecemeal bzcompression using the routines shown below
function makeBZFile($infile, $outfile)
{
    $fp = fopen($infile, 'r');
    $bz = bzopen($outfile, 'w');
    while (!feof($fp))
    {
        $bytes = fread($fp, 10240);
        bzwrite($bz, $bytes);
    }
    bzclose($bz);
    fclose($fp);
}

function unmakeBZFile($infile, $outfile)
{
    $bz = bzopen($infile, 'r');
    while (!feof($bz))
    {
        $str = bzread($bz, 10240);
        file_put_contents($outfile, $str, FILE_APPEND);
    }
    bzclose($bz); // close the bzip2 handle as well
}

set_time_limit(1200);
makeBZFile('/tmp/test.rnd', '/tmp/test.bz');
unmakeBZFile('/tmp/test.bz', '/tmp/btest.rnd');
To test this code I did two things:
I used makeBZFile and unmakeBZFile to compress and then decompress a SQLite database - which is what I need to do eventually.
I created a 50 MB file filled with random data: dd if=/dev/urandom of=/tmp/test.rnd bs=50M count=1
In both cases I performed a diff original.file decompressed.file and found that the two were identical.
All very nice, but it is not clear to me why this is working. The PHP docs state that bzread(bzpointer, length) reads a maximum of length bytes of UNCOMPRESSED data. If my code above is working, it is because I am forcing the bzwrite and bzread size to 10240 bytes.
What I cannot see is just how bzread knows how to fetch length bytes of UNCOMPRESSED data. I checked out the format of a bzip2 file. I cannot see that there is anything there which helps easily establish the uncompressed data length for a chunk of the .bz file.
I suspect there is a gap in my understanding of how this works - or else the fact that my code above appears to perform a correct piecemeal compression is purely accidental.
I'd much appreciate a few explanations here.
To understand how decompression gets the length in bytes, you first have to understand the compression. It seems that you don't know anything about the compression algorithm.
BZIP2
The crucial algorithm of BZIP2 is the Burrows-Wheeler transform (BWT), which converts the original data into a form suitable for the subsequent coding. The current version applies a Huffman code. The compression algorithm processes the data in blocks that are totally independent of each other. Block sizes can be set in a range from 1-9 (100,000 - 900,000 bytes).
BZIP2 Data Structure
The compressed string starts with the two characters 'BZ', followed by 1 byte for the algorithm used. Immediately after that comes the identification of the block size, which is valid for the entire file (h1, h2, h3 up to h9). The parameter indicates the block size in units from 1-9 (100,000 - 900,000 bytes).
The actual original data are stored in blocks according to the selected size and are protected individually with a CRC32 checksum. Additionally, a 48-bit identifier introduces each block. This block structure allows a partial reconstruction of damaged files.
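You can look at that header from PHP with the standard bz2 extension. A minimal sketch; the block-size digit you see depends on the blocksize argument passed to bzcompress(), so don't expect 'h9' by default:
<?php
// Compress something and inspect the stream header.
$compressed = bzcompress(str_repeat('hello ', 1000));

// 'B', 'Z', 'h' (Huffman), then the block-size digit '1'..'9'.
echo substr($compressed, 0, 4), "\n";           // e.g. "BZh4"
echo bin2hex(substr($compressed, 0, 10)), "\n";
?>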
GZIP/BZIP2
Gzip and bzip2 are functionally equivalent. One advantage of GZIP is that it can compress a stream, a sequence where you can't look behind. This makes it the official compressor of HTTP streams. GZIP DEFLATE RFC 1951 Compressed Data Format Specification and GZIP RFC 1952 File Format Specification are published documents.
GZIP explained

why golang and php gzip results are different [duplicate]

Firstly, my Java version:
String str = "helloworld";
ByteArrayOutputStream localByteArrayOutputStream = new ByteArrayOutputStream(str.length());
GZIPOutputStream localGZIPOutputStream = new GZIPOutputStream(localByteArrayOutputStream);
localGZIPOutputStream.write(str.getBytes("UTF-8"));
localGZIPOutputStream.close();
localByteArrayOutputStream.close();
for(int i = 0;i < localByteArrayOutputStream.toByteArray().length;i ++){
System.out.println(localByteArrayOutputStream.toByteArray()[i]);
}
and output is:
31
-117
8
0
0
0
0
0
0
0
-53
72
-51
-55
-55
47
-49
47
-54
73
1
0
-83
32
-21
-7
10
0
0
0
Then the Go version:
str := "helloworld"

var gzBf bytes.Buffer
gzSizeBf := bufio.NewWriterSize(&gzBf, len(str))
gz := gzip.NewWriter(gzSizeBf)
gz.Write([]byte(str))
gz.Flush()
gz.Close()
gzSizeBf.Flush()
GB := (&gzBf).Bytes()
for i := 0; i < len(GB); i++ {
    fmt.Println(GB[i])
}
output:
31
139
8
0
0
9
110
136
0
255
202
72
205
201
201
47
207
47
202
73
1
0
0
0
255
255
1
0
0
255
255
173
32
235
249
10
0
0
0
Why?
At first I thought it might be caused by the different byte-reading methods of the two languages. But I noticed that 0 can never convert to 9. And the sizes of the []byte slices are different.
Have I written wrong code? Is there any way to make my Go program get the same output as the Java program?
Thanks!
The first thing is that the byte type in Java is signed: it has a range of -128..127, while in Go byte is an alias of uint8 and has a range of 0..255. So if you want to compare the results, you have to shift the negative Java values by 256 (add 256).
Tip: To display a Java byte value in an unsigned fashion, use byteValue & 0xff, which converts it to int using the 8 bits of the byte as the lowest 8 bits of the int. Or better: display both results in hex form so you don't have to care about signedness at all...
Even if you do the shift, you will still see different results. That might be due to different default compression levels in the different languages. Note that although the default compression level is 6 in both Java and Go, this is not specified, and different implementations are allowed to choose different values; it might also change in future releases.
And even if the compression level were the same, you might still encounter differences because gzip is based on LZ77 and Huffman coding, which uses a tree built on frequency (probability) to decide the output codes; if different input characters or bit patterns have the same frequency, the assigned codes might vary between them, and moreover multiple output bit patterns might have the same length and therefore a different one might be chosen.
If you want the same output, the only way would be (see notes below!) to use compression level 0 (no compression at all). In Go use the compression level gzip.NoCompression and in Java use Deflater.NO_COMPRESSION.
Java:
GZIPOutputStream gzip = new GZIPOutputStream(localByteArrayOutputStream) {
{
def.setLevel(Deflater.NO_COMPRESSION);
}
};
Go:
gz, err := gzip.NewWriterLevel(gzSizeBf, gzip.NoCompression)
But I wouldn't worry about the different outputs. Gzip is a standard, even if outputs are not the same, you will still be able to decompress the output with any gzip decoders whichever was used to compress the data, and the decoded data will be exactly the same.
Not that it matters, but your code is unnecessarily complex. Here are simplified, extended versions (they also set compression level 0 and convert the negative Java byte values):
Java version:
ByteArrayOutputStream buf = new ByteArrayOutputStream();
GZIPOutputStream gz = new GZIPOutputStream(buf) {
{ def.setLevel(Deflater.NO_COMPRESSION); }
};
gz.write("helloworld".getBytes("UTF-8"));
gz.close();
for (byte b : buf.toByteArray())
System.out.print((b & 0xff) + " ");
Go version:
var buf bytes.Buffer
gz, _ := gzip.NewWriterLevel(&buf, gzip.NoCompression)
gz.Write([]byte("helloworld"))
gz.Close()
fmt.Println(buf.Bytes())
NOTES:
The gzip format allows some extra fields (headers) to be included in the output.
In Go these are represented by the gzip.Header type:
type Header struct {
Comment string // comment
Extra []byte // "extra data"
ModTime time.Time // modification time
Name string // file name
OS byte // operating system type
}
And it is accessible via the Writer.Header struct field. Go sets and inserts them, while Java does not (leaves header fields zero). So even if you set compression level to 0 in both languages, the output will not be the same (but the "compressed" data will match in both outputs).
Unfortunately the standard Java does not provide a way/interface to set/add these fields, and Go does not make it optional to fill the Header fields in the output, so you will not be able to generate exact outputs.
An option would be to use a 3rd-party GZip library for Java which supports setting these fields. Apache Commons Compress is such an example: it contains a GzipCompressorOutputStream class which has a constructor that allows a GzipParameters instance to be passed. This GzipParameters is the equivalent of the gzip.Header structure. Only by using this would you be able to generate exact output.
But as mentioned, generating exact output has no real-life value.
From RFC 1952, the GZip file header is structured as:
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
Looking at the output you've provided, we have:
| Java | Go
ID1 | 31 | 31
ID2 | 139 | 139
CM (compression method) | 8 | 8
FLG (flags) | 0 | 0
MTIME (modification time) | 0 0 0 0 | 0 9 110 136
XFL (extra flags) | 0 | 0
OS (operating system) | 0 | 255
So we can see that Go is setting the modification time field of the header, and setting the operating system to 255 (unknown) rather than 0 (FAT file system). In other respects they indicate that the file is compressed in the same way.
In general these sorts of differences are harmless. If you want to determine if two compressed files are the same, then you should really compare the decompressed versions of the files though.
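Since the question title also mentions PHP versus Go: the same header inspection can be done in PHP. A minimal sketch using the zlib extension; the exact MTIME/XFL/OS bytes you get may differ from both the Java and Go outputs above, which is precisely the point:
<?php
$gz = gzencode("helloworld");

// RFC 1952 header: ID1 ID2 CM FLG MTIME(4 bytes) XFL OS
echo bin2hex(substr($gz, 0, 10)), "\n";

// Whatever the header and compression-level differences, the payload
// always round-trips.
var_dump(gzdecode($gz) === "helloworld");   // bool(true)
?>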

pdftk Error: Failed to open PDF file:

I am using the pdftk library to extract the form fields from a PDF. Everything is running fine except for one issue: I have a PDF file (pdf file link) which causes the error given below.
Error: Failed to open PDF file:
http://www.uscis.gov/sites/default/files/files/form/i-9.pdf
Done. Input errors, so no output created.
The command for this is:
root@ri8-MS-7788:/home/ri-8# pdftk http://192.168.1.43/form/i-9.pdf dump_data_fields
The same command works for all other forms.
Attempt 1
I have tried to decrypt the PDF to an unsecured version, but it produces the same error. Here is the command:
pdftk http://192.168.1.43/forms/i-9.pdf input_pw foopass output /var/www/forms/un-i-9.pdf
Update
This is my full function to handle this:
public function Formanalysis($pdfname)
{
    $pdffile = Yii::app()->getBaseUrl(true).'/uploads/forms/'.$pdfname;
    exec("pdftk ".$pdffile." dump_data_fields 2>&1", $output, $retval);
    // got an error for some pdfs if these are secure
    if (strpos($output[0], 'Error') !== false)
    {
        $unsafepdf = Yii::getPathOfAlias('webroot').'/uploads/forms/un-'.$pdfname;
        //echo "pdftk ".$pdffile." input_pw foopass output ".$unsafepdf;
        exec("pdftk ".$pdffile." input_pw foopass output ".$unsafepdf);
        exec("pdftk ".$unsafepdf." dump_data_fields 2>&1", $outputunsafe, $retval);
        return $outputunsafe;
        //$response = array('0' => 'error', 'error' => $output[0]);
        //return $response;
    }
    //if (strpos($output[0], 'Error') !== false) { echo "error to run"; } // this is the option to handle the error
    return $output;
}
PdfTk is a tool that was created by compiling an obsolete version of iText to an executable using the GNU Compiler for Java (GCJ) (PdfTk is not endorsed by iText Group NV).
I have examined your PDF and it uses two technologies that weren't supported by iText at the time PdfTk was created: XFA and compressed cross-reference tables.
The latter is what causes your problem. PdfTk expects your file to end like this:
xref
0 7
0000000000 65535 f
0000000258 00000 n
0000000015 00000 n
0000000346 00000 n
0000000146 00000 n
0000000397 00000 n
0000000442 00000 n
trailer
<</ID [<c8bf0ac531b0fc7b5b9ec5daf0296834><ec4dde54d00305ebbec62f3f6bbca974>]/Root 5 0 R/Size 7/Info 6 0 R>>
%iText-5.4.3
startxref
595
%%EOF
In this snippet startxref marks the byte offset of xref which is where the cross-reference table starts. This table contains the byte-offsets of all the objects in the PDF.
When you look at the PDF you refer to, you see that it ends like this:
64 0 obj
<</DecodeParms<</Columns 5/Predictor 12>>/Encrypt 972 0 R/Filter/FlateDecode/ID[<85C47EA3EFE49E4CB0F087350055FDDC><C3F1748360D0464FBA02D711DE864630>]/Info 970 0 R/Length 283/Root 973 0 R/Size 971/Type/XRef/W[1 3 1]>>stream
hÞìÒ±JQЙ·»7J¢©ÕØ(Xþ„ù »h%¤É¤¶”€mZ+;ÁN,,ÁÆ6 XÁ&‚("î½YŒI‘Bî‡áμ]ö1Áð÷³cfþ‹ûÐÚLî`z„Ýôœùw÷N×X?ÙkNv`hÁÒj¦G[œiÀå»›œ?b½Än…ÉëàÍþ gY—i7WW‡òj®îÍ°u¸Ò‡Ñ:óÆÛ™ñÎë&'×݈§ü†ù!ÿñ€ù%,\ácçÙ9˜ì±Þ€S¼Ãd—‰Áy~×.ø¶Åìþßn_˜$9Ôüw£X9#åxzçgRüüóÙwÝ¡œÄNJ©½’Ú+©½’R{%µWR{%ÿ·á”;`_ z6Ø
endstream
endobj
startxref
116
%%EOF
In this case, startxref still refers to where the first cross-reference table starts (it's a linearized PDF), but the cross reference table is stored inside an object, and that object is compressed (see the gibberish between the stream and endstream keywords).
Compressed cross-reference tables and compressed objects were introduced in PDF 1.5 (2003), but they aren't supported by PdfTk. You'll have to find a tool that can deal with such streams (e.g. a recent version of iText, which is the real stuff when compared to PdfTk), or you have to save your PDF as a PDF 1.4 before you treat it with PdfTk (but you'll lose the XFA, because XFA was also introduced in PDF 1.5).
Update:
Since you are asking about form fields, I'm adding the following attachment:
This screenshot was taken using iText RUPS (which proves that iText can open the document). To the right, you see that the same form is defined twice:
If you would walk down the tree under Fields, you'd find all the fields that are stored in the PDF using AcroForm technology. To the left, you can see the description of such a field:
If you look under XFA, you notice that the same form is also defined using the XML Forms Architecture. If you click on datasets, you see the XML description of the dataset in the lower panel:
All of this information can be accessed programmatically using iText (Java) or iTextSharp (C#). PdfTk is merely a tool based on a very old version of this technology.
This may be a bit of a workaround, but it should work for you. As @bruno said, this is an encrypted file. You should decrypt it before you use it with pdftk. For this I found a way to decrypt it: qpdf, a free open-source library that can decrypt PDFs, remove the owner and user passwords, and much more. You can find it here: Qpdf. Install it on your system and run this command:
qpdf --decrypt input.pdf output.pdf
Then use the output file in the pdftk command. It should work.
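Tying that back to the Formanalysis() approach from the question, here is a rough sketch of the same idea in PHP, assuming qpdf is installed and on the PATH; the helper name and paths are illustrative only:
<?php
// Hypothetical wrapper: let qpdf rewrite/decrypt the file first, then run
// pdftk dump_data_fields on the rewritten copy.
// $pdffile should be a local path; download it first if you only have a URL.
function dumpFieldsViaQpdf($pdffile)
{
    $fixed = '/tmp/' . uniqid('fixed-', true) . '.pdf';

    // --decrypt strips the encryption layer that trips up pdftk.
    exec('qpdf --decrypt ' . escapeshellarg($pdffile) . ' ' . escapeshellarg($fixed), $out, $ret);
    if ($ret !== 0) {
        return false;
    }

    exec('pdftk ' . escapeshellarg($fixed) . ' dump_data_fields 2>&1', $fields, $ret);
    unlink($fixed);

    return $ret === 0 ? $fields : false;
}
?>
If the file also needs its compressed cross-reference streams rewritten (the issue Bruno describes), qpdf's --object-streams=disable option should produce a classic xref table that older tools can read; verify that flag against your qpdf version before relying on it.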

Randomly appearing gzip headers

I have a long-running script in a shared hosting environment that outputs a bunch of XML.
Sometimes (only sometimes) a random GZIP header will appear in my output, and the output will be terminated.
For instance
0000000: 3c44 4553 435f 4c4f 4e47 3e3c 215b 4344 <DESC_LONG><![CD
0000010: 4154 415b 1fc2 8b08 0000 0000 0000 03c3 ATA[............
0000020: b3c3 8b57 c388 c38c 2b28 2d51 48c3 8bc3 ...W....+(-QH...
0000030: 8c49 5528 2e48 4dc3 8e4c c38b 4c4d c391 .IU(.HM..L..LM..
0000040: c3a3 0200 c291 4464 c383 1900 0000 0d0a ......Dd........
or
0000000: 3c2f 5052 4f44 5543 543e 0d0a 1fc2 8b08 </PRODUCT>......
0000010: 0000 0000 0000 03c3 b3c3 8b57 c388 c38c ...........W....
0000020: 2b28 2d51 48c3 8bc3 8c49 5528 2e48 4dc3 +(-QH....IU(.HM.
0000030: 8e4c c38b 4c4d c391 c3a3 0200 c291 4464 .L..LM........Dd
0000040: c383 1900 0000 0d0a ........
or
0000000: 3c4d 4544 4941 5f55 524c 3e2f 696d 6167 <MEDIA_URL>/imag
0000010: 6573 2f69 6d70 6f72 7465 642f 7374 6f63 es/imported/stoc
0000020: 6b5f 7072 6f64 3235 3339 365f 696d 6167 k_prod25396_imag
0000030: 655f 3531 3737 3439 3436 302e 6a70 673c e_517749460.jpg<
0000040: 2f4d 4544 4941 5f55 1fc2 8b08 0000 0000 /MEDIA_U........
0000050: 0000 03c3 b3c3 8b57 c388 c38c 2b28 2d51 .......W....+(-Q
0000060: 48c3 8bc3 8c49 5528 2e48 4dc3 8e4c c38b H....IU(.HM..L..
0000070: 4c4d c391 c3a3 0200 c291 4464 c383 1900 LM........Dd....
0000080: 0000 0d0a ....
The switch to GZIP does not seem to happen at any particular time or byte count; it can be after 1 MB of data or after 15 MB.
The compiled Blade template at the corresponding lines is as follows:
<DESC_LONG><![CDATA[<?php echo $product->display_name; ?>]]></DESC_LONG>
-
</PRICES>
</PRODUCT>
<?php foreach($product->models()->get() as $model): ?>
-
<MEDIA_URL>/images/imported/<?php echo $picture->local_name; ?></MEDIA_URL>
I am at my wits' end. I have tried the following:
Disable gzip on the server.
Run while(ob_get_level()){ ob_end_clean(); } before running the script.
In .htaccess I have tried SetEnv no-gzip 1, SetEnv no-gzip dont-vary and various permutations thereof.
When I visit other pages, no gzip encoding or headers appear, so I'm thinking this is something with the output size or output buffer.
Did you ever find out where these headers come from? I mean Apache or PHP?
You can simulate the XML generator script with something like:
echo file_get_contents('your_good_test.xml');
If you don't see any headers, I suggest debugging your XML generator. You can try calling header_remove(); before the output.
If you do see headers, you have to debug your web server. Try to disable gzip in Apache with a rewrite rule:
`RewriteRule . - [E=no-gzip:1]`
Whenever you have a proxy or load balancer (nginx, squid, haproxy) in the chain, you automatically get one more place to check.
Your gzipping is not related to the server output that returns your main XML body; otherwise the whole XML would be compressed.
These methods sometimes return GZIP because the source they take the items from is set to support gzip and is not being asked for plain content properly:
$product->display_name
$product->models()->get()
$picture->local_name
Look inside these.
- Check web calls for all places where headers are set.
- Temporarily disable compression for the database connection, if any.
Add CDATA tags in all places where binary data could be returned, to avoid terminating the building of the main XML body. Wait for an XML with binary data, save the binary data, unzip it and look at what is inside. :-)
This is more of a set of comments, but it is too long for the comment box.
First, this is very likely NOT an output buffer issue. Even though <![CDATA[ and ]]> are not within PHP tags, this doesn't mean the content doesn't pass through PHP's output buffer. To be clear, anything within a .php file will be placed in the PHP output buffer. The content within a .php file (including static content) is buffered outside of Apache and then passed back to Apache through this buffer when the script is finished. This means that your problem must lie within the code itself, which is a shot in the dark to solve without viewing the code.
My suggestions:
1) Do a search within the script to find any instances of gz functions (gzcompress, gzdeflate, gzdecode, etc.). I have seen scripts compress content if it was greater than a specific size and then decompress the content on the fly when taken from the DB. If that is the case, you are likely dealing with a faulty comparison operation: the logic within the compression and decompression conditions is slightly off, so it is failing to decompress SOME of the content (see the sketch after these suggestions).
2) Do a search within the script to see how this data is fetched. Is it all from a database? Does any of it come from a stream? Is any of it fetched remotely? These questions might not directly lead to an answer but are vital. It can safely be assumed that these variables are being set with data that is already compressed when it shouldn't be. It requires knowing where/why/how the compression is taking place in order to answer why it is not being decompressed.
3) It matters greatly that it is working as expected on one system but not the other. The only times I have seen this happen were always due to differences in configuration. What operating system was your local machine using? What's the difference in the local database (if any)? What extensions might be missing or present on one or the other, possibly causing a function to fall back on a different procedure on the two machines?
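To make suggestion 1 concrete, a safer decompression condition could key off the gzip magic bytes (1f 8b) instead of a size heuristic. A minimal sketch with a made-up helper name, assuming the zlib extension is available:
<?php
// Hypothetical helper: only gunzip values that really are gzip streams,
// identified by the 1f 8b magic bytes; return everything else untouched.
function maybe_gunzip($value)
{
    if (strlen($value) > 2 && substr($value, 0, 2) === "\x1f\x8b") {
        return gzdecode($value);
    }
    return $value;
}
?>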
EDIT:
Also, and this is a small chance, but are you dealing with data that originated from an SQL dump from a different server? You said it works on your localhost but not on a different host, so we know you're dealing with two machines. Was there a third at some point? If so, the data might have been compressed using a mismatched version or form of compression, or it might be an issue with encoding.
