php gzip match unix gzip - php

I am trying to use PHP to create a file and gzip it. How can I match the same compression levels, headers, and so on as the gzip utility run on Unix?
Using PHP:
ls -l
total 8
-rw-rw-r-- 1 owner owner 486 Jul 21 17:05 file.xml.gz
Using gzip on the Unix command line:
ls -l
total 8
-rw-rw-r-- 1 owner owner 479 Jul 21 17:05 file.xml.gz
In PHP:
$zip = gzencode($xml,2);
I have tried 0 through 9 as the compression level here. I have also tried
$zip = gzencode($xml,x,FORCE_DEFLATE)
again, where x is 0-9.
My problem is this:
I have a 3rd-party vendor that takes the gzipped file, unzips it and does fun things with it. The problem I am running into is that when I use PHP I get an error "cannot parse file.xm.gz", but when I use gzip on the CLI it works fine. I have no visibility into what the 3rd party is doing or why it's failing. Could it be something like carriage returns or spaces or something in the XML? I know it's a tough question to answer. Here's a snippet of the PHP that builds my XML.
$xml ='<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<localRoutes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
';
$xml.='<route>
<user type="string">' . $mac . '</user>
';
$xml.='<next type="regex">!^(.*$)!sip:#' . $ip . "</next>
</route>
";
$xml .= '</localRoutes>';

The compressed data is identical. What's missing is a field in the header indicating the original name of the file (e.g., probably file.xml here). This field is generated by the gzip utility, but the gzencode() PHP function doesn't have an original filename to work with, so it doesn't write this field.
I'm not aware of any way to make PHP generate this field with the zlib extension. Its absence is very unlikely to cause any problems, though.
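For completeness, the FNAME field can be written by hand if it ever matters: the gzip container (RFC 1952) is simple enough to assemble around gzdeflate() output. A sketch, where the payload and filename are illustrative rather than taken from the question:

```php
<?php
// Hand-built gzip stream with an FNAME (original filename) field,
// which gzencode() cannot emit. Payload and name are examples.
$xml  = '<localRoutes/>';
$name = 'file.xml';

$header  = "\x1f\x8b";        // gzip magic bytes
$header .= "\x08";            // CM: deflate
$header .= "\x08";            // FLG: FNAME bit set
$header .= pack('V', time()); // MTIME, little-endian
$header .= "\x00";            // XFL: no extra flags
$header .= "\x03";            // OS: Unix
$header .= $name . "\x00";    // NUL-terminated original filename

$body    = gzdeflate($xml, 9);       // raw deflate data, no wrapper
$trailer = pack('V', crc32($xml))    // CRC-32 of the uncompressed data
         . pack('V', strlen($xml));  // ISIZE (length mod 2^32)

file_put_contents('file.xml.gz', $header . $body . $trailer);
```

The resulting file decompresses with both gzdecode() and command-line gunzip, and `gunzip -Nl` will report the stored original name.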

You can't.
You don't need to.
First, gzip and zlib (which is what PHP uses) are different implementations of the deflate algorithm, so for large enough data they will not produce identical compressed output, even at the same compression level.
Second, as noted by @duskwuff, you will not be able to replicate the same gzip header unless you pull off the gzip header that PHP made and write your own. The modification dates in the headers will be different. The way you're doing it, one will have a file name and one will not (though you can invoke gzip with -n to omit the file name).
Third, there is no reason to try to make the results identical. All that matters is that both decompress to the same thing. Which they will.
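That last point is easy to verify: a round trip through gzdecode() recovers the original bytes regardless of the compression level used.

```php
<?php
// Round-trip check: what gzencode() compressed, gzdecode() restores exactly.
$xml = '<localRoutes/>';          // illustrative payload
$gz  = gzencode($xml, 9);
var_dump(gzdecode($gz) === $xml); // bool(true)
```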

Related

How to force file saving with ISO-8859-1 encoding instead of UTF-8

I have to enable XML file downloads in ISO-8859-1 (I know UTF-8 is much better, but our partner has strict encoding requirements and we cannot force them to change their policy).
Server background:
Google Chrome 71.0.3578.98 (Official Build) (64-bit)
Ubuntu 16.04
nginx
php 7.2
Symfony 4.0.15
Controller returns a response with proper charset:
return (new Response($xml->content(), Response::HTTP_CREATED, ['Content-Type' => $xml->contentType()]))
->setCharset($xml->charset());
It looks perfectly fine; at least in Chrome DevTools there is a correct response header.
But the problem is that the file is stored in the file system with UTF-8 encoding:
$ file --mime test.xml
test.xml: application/xml; charset=utf-8
and the XML file renders incorrectly after opening it in the browser:
<INSIGMA>
<AktuarMed>
<Person>
<Name>Hans MÃ¼ller</Name>
<Surname>MÃ¼ller</Surname>
<Forename>Hans</Forename>
</Person>
</AktuarMed>
</INSIGMA>
The surname has to be Müller, but it is displayed wrong. If I change the encoding of this file to the expected one, then it displays correctly:
$ iconv -f UTF-8 -t ISO-8859-1 test.xml > test2.xml && mv test2.xml test.xml
$ file --mime test.xml
test.xml: application/xml; charset=iso-8859-1
TL;DR: So the questions are:
Why is this file stored with UTF-8 encoding at all, if the server responds that the ISO-8859-1 charset should be used?
Do I need to send some extra headers to force downloading the file with the ISO-8859-1 charset? Or
Is it the default behaviour of the browser? Or
Is it the default behaviour of the operating system?
Where does this problem occur, and at which step should I look for a solution?
You can try the following:
1. Create an XML file that should act as your server response in gedit; in 'Save as', select the proper character encoding (ISO-8859-1) at the bottom of the dialog.
2. Put the file into nginx and make it accessible (public) at some URL (like http://localhost/sample-response.xml or something else).
3. Access it with your browser.
4. Ensure that it's correct on the client side (after saving with the browser).
5. Record the request-response log of the XML file download (with wireshark or tcpflow or something else) as plain text.
6. Now request the XML file generated by your PHP app and record its request-response log as in step 5.
7. Compare the request-response logs from steps 5 and 6 with a file comparison tool (meld, kdiff3 or something else).
After that you might see where the problem is.
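One likely culprit worth checking first: Symfony's Response::setCharset() only changes the Content-Type header; it does not transcode the body, so the bytes stay UTF-8 no matter what the header claims. A minimal sketch of converting the content itself, in plain PHP (in the controller, the converted string would be what gets passed to the Response):

```php
<?php
// setCharset() affects the header only; the body bytes must be converted too.
$content = '<Surname>Müller</Surname>';              // UTF-8 source string
$body = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $content);
// In ISO-8859-1, 'ü' is the single byte 0xFC instead of UTF-8's 0xC3 0xBC,
// so the converted string is one byte shorter here.
var_dump(strpos($body, "\xFC") !== false); // bool(true)
```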

PHP or Apache limiting Content-Length in HTTP Header?

I have to distribute a huge file to some people (pictures of a prom) via my Apache2/PHP server which is giving me some headaches: Chrome and Firefox both show a filesize of 2GB but the file is actually >4GB, so I started to track things down.
I am doing the following thing in my php script:
header("Content-Length: ".filesize_large($fn));
header("Actual-File-Size: ".filesize_large($fn)); //Debug
readfile($fn);
filesize_large() is returning the correct filesize for >4GB files as a string (yes, even on 32-bit PHP).
Now the interesting part; the actual HTTP header:
Content-Length: 2147483647
Actual-File-Size: 4236525700
So the filesize_large() method is working totally fine, but PHP or Apache somehow limits the value of Content-Length?! Why is that?
Apache/2.2.22 x86, PHP 5.3.10 x86, I am using SSL over https
Just so you guys believe me when I say filesize_large() is correct:
function filesize_large($filename)
{
return trim(shell_exec('stat -c %s '.escapeshellarg($filename)));
}
Edit:
Seems like PHP casts the content length to an integer when communicating with Apache2 over the SAPI interface on 32-bit systems. Sadly there is no workaround except not including the Content-Length header for files >2GB.
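The observed value is exactly PHP_INT_MAX on a 32-bit build, which is consistent with the cast clamping at 2^31 - 1 rather than wrapping. A small simulation of that behaviour (clamp_to_int32 is a made-up helper for illustration, not a PHP or Apache API):

```php
<?php
// Simulates the 32-bit SAPI behaviour: the size string survives in PHP, but
// the integer the header code works with tops out at 2^31 - 1.
function clamp_to_int32(string $size): int {
    return (int) min((float) $size, 2147483647.0);
}
echo clamp_to_int32('4236525700'); // 2147483647 -- the Content-Length observed
```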
Workaround (and actually a far better solution in the first place): Use mod_xsendfile
You have to use a 64-bit operating system in order to support long integers in the Content-Length header.
I would recommend using Vagrant for development.
Headers are based on strings, but the content length is based on an int. If you take a look here https://books.google.com/books?id=HTo_AmTpQPMC&pg=PA130&lpg=PA130&dq=ap_set_content_length%28r,+r-%3Efinfo.size%29;&source=bl&ots=uNqmcTbKYy&sig=-Wth33sukeEiSnUUwVJPtyHSpXU&hl=en&sa=X&ei=GP0SVdSlFM_jsATWvoGwBQ&ved=0CDEQ6AEwAw#v=onepage&q=ap_set_content_length%28r%2C%20r-%3Efinfo.size%29%3B&f=false
you will see an example of the ap_set_content_length() function, which is used to set the Content-Length response header. It accepts the file length from a system function. Try calling PHP's filesize() and you'll probably see the same result.
If you take a look at the ap_set_content_length declaration http://ci.apache.org/projects/httpd/trunk/doxygen/group__APACHE__CORE__PROTO.html#ga7ab393c56cf073ce7aadc3b7ca3db7b2
you will see that the length is declared as apr_off_t.
And here http://svn.haxx.se/dev/archive-2004-01/0871.shtml you can read that this type depends on compiler options, and it is 32-bit in your case.
I would recommend reading the source code of the Apache and PHP projects.

PHP - file_put_contents() doesn't preserve filenames in output

I would like to compress a .csv file on my server and put it into .gz (gzip) file using PHP.
I used file_put_contents() like below:
$input = "test.csv";
$output = $input.".gz";
file_put_contents("compress.zlib://$output", file_get_contents($input));
However, when I open the gzip file (using WinRAR / 7-Zip), the file extension is missing inside the .gz archive; it's just "test" (without the file extension).
It's not showing "test.csv" as I wanted. How do I fix it?
There is no information on any "filename" inside that compressed file. You're simply compressing the raw binary data of the input file and are dumping it into an output file. The .gz file has no meta information on how many files are contained within it or what their names are. That's what the TAR file format is for, to provide that kind of meta information. You should make a tarball, then compress it using gz into a .tar.gz.
I'm not sure how to do this using PHP other than running a shell command through exec.
You may want to look at ZIP as an alternative with native PHP support.
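For what it's worth, the tarball route doesn't require shelling out: PharData (part of PHP's bundled phar extension) can build the tar and gzip it, and the inner filename is preserved. A sketch using the question's filenames, with sample CSV content invented for the demo:

```php
<?php
// Build test.tar containing test.csv under its own name, then gzip it.
file_put_contents('test.csv', "a,b\n1,2\n"); // sample input for the demo
$tar = new PharData('test.tar');
$tar->addFile('test.csv');                   // stored as "test.csv" inside
$tar->compress(Phar::GZ);                    // writes test.tar.gz alongside
```

Opening the resulting test.tar.gz in 7-Zip (or with `tar tzf`) shows the entry name test.csv.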
Let's try this:
$input = "test.txt";
exec("gzip " . escapeshellarg($input));
It will work on a Linux server.
I'm not exactly sure what you're asking, but PHP already has a function for gzip compression, gzencode().
Use it like this:
<?php
$data = file_get_contents("bigfile.txt");
$gzdata = gzencode($data, 9);
$fp = fopen("bigfile.txt.gz", "w");
fwrite($fp, $gzdata);
fclose($fp);
?>
Your example works properly on PHP 5.3.10 at least:
-rw-rw-r-- 1 mats mats 8 Jul 17 13:05 test.csv
-rw-rw-r-- 1 mats mats 31 Jul 17 13:05 test.csv.gz
You're not hiding file extensions for known file types in your file explorer, are you?
This worked for me:
$input = "test.csv";
$output = $input.".xml.gz";
I know it's ugly, but when using gzopen and gzwrite, I've found this to be the only way to preserve the .xml extension inside the archive. This way, when I extract it, I get the .xml file.
Later on, once the file is created, you can rename it to remove the .xml part before the .gz extension.

gzcompress won't produce a valid zipped file?

Consider this:
$text = "hello";
$text_compressed = gzcompress($text, 6);
$success = file_put_contents('file.gz', $text_compressed);
When I try to open file.gz, I get errors. How can I open file.gz in a terminal without calling PHP? (Using gzuncompress() works just fine!)
I can't re-encode every file I created, since I now have almost a billion files encoded this way! So if there is a solution... :)
You need to use gzencode() instead.
Luckily for you, the fix is easy: just write a script that opens each of your files one by one, uses gzuncompress() to decompress the file, and then writes it back out with gzencode() instead of gzcompress(), repeating the process for all of the files.
Alternatively (since you said you didn't want to re-encode your files), you could decompress the existing files from the command line with a tool that understands zlib streams, such as pigz -d (plain gunzip/zcat will reject them, because the gzip header is missing).
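The re-encoding script suggested above is only a few lines. A sketch, where the glob pattern and the demo file are placeholders for however the real files are located:

```php
<?php
// Demo input: one file written the old way, with gzcompress().
file_put_contents('old.gz', gzcompress('hello', 6));

// Convert zlib-wrapped files (gzcompress output) to real gzip files in place.
foreach (glob('*.gz') as $file) {
    $raw = @gzuncompress(file_get_contents($file)); // fails on non-zlib data
    if ($raw === false) {
        continue;              // already gzip (or unreadable) -- skip it
    }
    file_put_contents($file, gzencode($raw, 6));
}
```

The gzuncompress() guard makes the script safe to re-run: files already converted to gzip format are skipped.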
As noted on the gzcompress() manual page:
This is not the same as gzip compression, which includes some header data. See gzencode() for gzip compression.
As said, you don't really have gzipped files. To open your files from a terminal you need a utility that understands raw zlib streams rather than gzip; pigz, for example, can decompress them.

PHP gzcompress vs gzopen/gzwrite

I'm writing a PHP script that generates gzipped files. The approach I've been using is to build up a string in PHP and gzcompress() the string before writing it out to a file at the end of the script.
Now I'm testing my script with larger files and running into memory allocation errors. It seems that the result string is becoming too large to hold in memory at one time.
To solve this I've tried to use gzopen() and gzwrite() to avoid allocating a large string in PHP. However, the gzipped file generated with gzwrite() is very different from the one I get with gzcompress(). I've experimented with different compression levels but it doesn't help. I've also tried using gzdeflate() and end up with the same results as gzwrite(), but still not similar to gzcompress(). It's not just the first two bytes (zlib header) that are different, it's the entire file.
What does gzcompress() do differently from these other gzip functions in PHP? Is there a way I can emulate the results of gzcompress() while incrementally producing the result?
The primary difference is that the gzwrite() function initializes zlib with the SYNC_FLUSH option, which pads the output to a 4-byte boundary (or is it 2?), plus a little extra (0x00 0x00 0xff 0xff 0x03).
If you are using these to create Zip files, beware that the default Mac Archive utility does NOT accept this format.
From what I can tell, SYNC_FLUSH is a gzip option and is not allowed in the PKZip/Info-ZIP format that all .zip files and their derivatives come from.
If you deflate a small file/text, resulting in a single deflate block, and compare it to the same text written with gzwrite, you'll see 2 differences, one of the bytes in the header of the deflate block is different by 1, and the end is padded with the above bytes. If the result is larger than one deflate block, the differences start piling up. It is hard to fix this, as the deflate stream block headers aren't even byte aligned. There is a reason everybody uses the zlib. Few people are brave enough to even attempt to rewrite that format!
To be precise about the formats: gzcompress() produces zlib-format (RFC 1950) data, while gzopen()/gzwrite() produce gzip-format (RFC 1952) files. The practical difference between the two is small: both wrap the same DEFLATE-compressed data, just with different headers and trailers.
It is possible that none of that will matter, though. Try creating a gzip file with gzopen/gzwrite and then decompressing it with the command-line gzip program. If that works, then using gzopen/gzwrite will work for you.
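Worth noting for modern readers: PHP 7.0 added a streaming API that answers the original question directly. deflate_init() with ZLIB_ENCODING_DEFLATE emits the same zlib wrapper as gzcompress(), but incrementally, so the whole string never has to sit in memory. A sketch, where the chunk list stands in for whatever actually produces the data:

```php
<?php
// Incremental zlib-format output, equivalent to gzcompress() of the
// concatenated chunks (PHP >= 7.0).
$ctx = deflate_init(ZLIB_ENCODING_DEFLATE, ['level' => 6]);
$fp  = fopen('out.zz', 'wb');
foreach (['chunk one ', 'chunk two ', 'chunk three'] as $chunk) { // placeholder data
    fwrite($fp, deflate_add($ctx, $chunk, ZLIB_NO_FLUSH));
}
fwrite($fp, deflate_add($ctx, '', ZLIB_FINISH)); // emit the zlib trailer
fclose($fp);
```

Swapping in ZLIB_ENCODING_GZIP would produce gzip-format output the same way.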
I ran into a similar problem once - basically there wasn't enough RAM allocated to PHP to do the business.
I ended up saving the string as a text file, then using exec() to gzip the file via the filesystem. It's not an ideal solution, but it worked for my situation.
Try increasing the memory_limit parameter in your php.ini file.
Both gzcompress() and gzopen() use the DEFLATE method for compressing blocks, but they produce different headers and trailers.
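The header difference is visible in the first two bytes of each function's output: gzencode() starts with the gzip magic 1f 8b, gzcompress() with a zlib CMF/FLG pair (78 9c at the default level), and gzdeflate() with neither.

```php
<?php
$data = str_repeat('hello world ', 10);
echo bin2hex(substr(gzencode($data), 0, 2)), "\n";   // 1f8b: gzip magic
echo bin2hex(substr(gzcompress($data), 0, 2)), "\n"; // 789c: zlib header
echo bin2hex(substr(gzdeflate($data), 0, 2)), "\n";  // raw deflate, no header
```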
