Extract a file from a ZIP string - php

I have a base64 string of a ZIP file that contains a single XML file.
Any ideas on how I could get the contents of the XML file without having to deal with files on disk?
I would very much like to keep the whole process in memory, as the XML is only 1-5 KB.
It would be annoying to have to write the zip, extract the XML and then load it up and delete everything.

I had a similar problem; I ended up doing it manually, following the ZIP specification:
https://www.pkware.com/documents/casestudies/APPNOTE.TXT
This extracts a single file (just the first one) with no error or CRC checks, and it assumes deflate was used.
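// zip in a string (for a base64 payload: $data = base64_decode($base64_string);)
$data = file_get_contents('test.zip');
// parse the 30-byte local file header of the first entry (little-endian fields)
$head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data,0,30));
// the file name follows the fixed-size header
$filename = substr($data,30,$head['namelen']);
// the compressed data follows the file name and the extra field; gzinflate() decodes raw deflate
$raw = gzinflate(substr($data,30+$head['namelen']+$head['exlen'],$head['csize']));
// first file uncompressed and ready to use (you can also parse $raw directly instead of writing it out)
file_put_contents($filename,$raw);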
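A slightly more defensive sketch of the same idea follows, assuming the payload has already been base64-decoded. It still reads only the first local file header, but it checks the signature, supports the "stored" (0) and "deflate" (8) methods, and verifies the CRC-32. The function name extract_first_zip_entry() is hypothetical, and entries written with a data descriptor (flag bit 3) are rejected because their local header may carry zero sizes.

```php
<?php
// Hypothetical helper: extract the first entry of a ZIP held in a string.
// Handles only the first local file header; methods 0 (stored) and 8 (deflate).
function extract_first_zip_entry(string $data): array
{
    if (strlen($data) < 30) {
        throw new RuntimeException('Data too short for a ZIP local file header');
    }
    $head = unpack(
        'Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen',
        substr($data, 0, 30)
    );
    if ($head['sig'] !== 0x04034b50) { // "PK\x03\x04"
        throw new RuntimeException('Not a ZIP local file header');
    }
    if ($head['flag'] & 0x08) {
        // Bit 3: sizes and CRC are stored in a data descriptor after the data, not here.
        throw new RuntimeException('Entries with a data descriptor are not supported');
    }
    $name = substr($data, 30, $head['namelen']);
    $body = substr($data, 30 + $head['namelen'] + $head['exlen'], $head['csize']);
    if ($head['meth'] === 0) {
        $raw = $body;            // stored, no compression
    } elseif ($head['meth'] === 8) {
        $raw = gzinflate($body); // raw deflate stream
    } else {
        throw new RuntimeException("Unsupported compression method {$head['meth']}");
    }
    if ($raw === false || crc32($raw) !== $head['crc']) {
        throw new RuntimeException('Decompression or CRC-32 check failed');
    }
    return [$name, $raw];
}
```

For the question's case it would be called as [$name, $xml] = extract_first_zip_entry(base64_decode($base64_string)); where $base64_string is your input.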

After some hours of research, I think it's surprisingly not possible to handle a ZIP without a temporary file:
My first attempt with php://memory does not work, because each php://memory handle is a separate stream: functions like file_get_contents() cannot read back what another handle wrote, and ZipArchive::open() cannot open it at all. In the comments there is a link to the PHP bug tracker about the missing documentation of this problem.
ZipArchive does have stream support via ::getStream(), but as stated in the manual, it only supports reading from an already opened archive. So you cannot build an archive on the fly with it.
The zip:// wrapper is also read-only (see "Create ZIP file with fopen() wrapper").
I also made some attempts with other PHP wrappers/protocols, like
file_get_contents("zip://data://text/plain;base64,{$base64_string}#test.txt")
$zip->open("php://filter/read=convert.base64-decode/resource={$base64_string}")
$zip->open("php://filter/read=/resource=php://memory")
but none of them work for me, even though there are similar examples in the manual. So you have to swallow the pill and create a temporary file.
Original Answer:
This only covers the temporary storage; I assume you can handle the ZIP extraction and XML parsing on your own.
Use the php://memory wrapper. Be aware that this is only useful for small files, because the data is stored in memory (obviously). Otherwise, use php://temp instead.
<?php
// the decoded content of your zip file
$text = 'base64 _decoded_ zip content';
// write the zip content to a php://memory stream
// (note: file_put_contents() opens and closes its own handle, so the data is gone afterwards)
$written = file_put_contents('php://memory', $text);
// bytes written to memory
var_dump($written);
// new instance of the ZipArchive
$zip = new ZipArchive;
// success of the archive reading (per the update above, this open() fails)
var_dump(true === $zip->open('php://memory'));
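Following the conclusion above (use a temporary file after all), a minimal sketch could look like this. The function name xml_from_base64_zip() and the entry name are assumptions; tempnam() and ZipArchive are standard PHP (ZipArchive needs the zip extension), and the finally block removes the file even if extraction fails.

```php
<?php
// Hypothetical helper: decode a base64 ZIP, read one entry via a temp file, clean up.
function xml_from_base64_zip(string $base64, string $entry): string
{
    $binary = base64_decode($base64, true); // strict: reject invalid base64
    if ($binary === false) {
        throw new InvalidArgumentException('Input is not valid base64');
    }
    $tmp = tempnam(sys_get_temp_dir(), 'zip');
    try {
        file_put_contents($tmp, $binary);
        $zip = new ZipArchive();
        if ($zip->open($tmp) !== true) { // libzip needs a real path
            throw new RuntimeException('Not a readable ZIP archive');
        }
        $xml = $zip->getFromName($entry);
        $zip->close();
        if ($xml === false) {
            throw new RuntimeException("Entry $entry not found");
        }
        return $xml;
    } finally {
        unlink($tmp); // always remove the temporary file
    }
}
```

If the archive always holds a single file whose name you don't know, $zip->getFromIndex(0) can replace getFromName().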

toster-cx had it right, and you should award him the points. Here is an example where the ZIP comes from a SOAP response as a byte array (binary), and the content is an XML file:
// $parameters stands for whatever arguments your "sendBill" call needs
$objResponse = $objClient->__soapCall("sendBill", array($parameters));
$fileData = unzipByteArray($objResponse->applicationResponse);
header("Content-type: text/xml");
echo $fileData;
function unzipByteArray($data){
/* the first entry is a directory */
$head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data,0,30));
$filename = substr($data,30,$head['namelen']);
// offset of the next local header: 30-byte header + name + extra field + compressed data
$if=30+$head['namelen']+$head['exlen']+$head['csize'];
/* the second entry is the actual file */
$head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data,$if,30));
$raw = gzinflate(substr($data,$if+$head['namelen']+$head['exlen']+30,$head['csize']));
/* you can create a loop and continue decompressing more files if there are any */
return $raw;
}
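The loop the comment hints at could look roughly like this hypothetical unzip_all_entries(). It walks the local file headers from the start of the buffer, skips directory entries (names ending in "/"), and stops at the first non-local-header signature, normally the central directory. Assumptions: no data descriptors, no ZIP64, no encryption, and methods 0 or 8 only.

```php
<?php
// Hypothetical generalisation of unzipByteArray(): walk every local file
// header and return [name => contents] for each file entry.
function unzip_all_entries(string $data): array
{
    $format = 'Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen';
    $files = [];
    $offset = 0;
    while ($offset + 30 <= strlen($data)) {
        $head = unpack($format, substr($data, $offset, 30));
        if ($head['sig'] !== 0x04034b50) {
            break; // reached the central directory ("PK\x01\x02") or trailing data
        }
        if ($head['flag'] & 0x08) {
            throw new RuntimeException('Entries with a data descriptor are not supported');
        }
        $name = substr($data, $offset + 30, $head['namelen']);
        $start = $offset + 30 + $head['namelen'] + $head['exlen'];
        $body = substr($data, $start, $head['csize']);
        $offset = $start + $head['csize']; // position of the next local header
        if (substr($name, -1) === '/') {
            continue; // directory entry, nothing to extract
        }
        if ($head['meth'] === 0) {
            $raw = $body;
        } elseif ($head['meth'] === 8) {
            $raw = gzinflate($body);
        } else {
            throw new RuntimeException("Unsupported compression method {$head['meth']}");
        }
        if ($raw === false) {
            throw new RuntimeException("Could not inflate $name");
        }
        $files[$name] = $raw;
    }
    return $files;
}
```

With the SOAP response above, unzip_all_entries($objResponse->applicationResponse) would map each file name to its contents.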

If you know the file name inside the .zip, just do this:
<?php
$xml = file_get_contents('zip://./your-zip.zip#your-file.xml');
If you have a plain string, just do this (note that compress.zlib:// decodes gzip data, not ZIP archives):
<?php
$xml = file_get_contents('compress.zlib://data://text/plain;base64,'.$base64_encoded_string);
[edit] Documentation is here: http://www.php.net/manual/en/wrappers.php
From the comments: if you don't have a base64 encoded string, you need to urlencode() it before using the data:// wrapper.
<?php
$xml = file_get_contents('compress.zlib://data://text/plain,'.urlencode($text));
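If the base64 payload is actually gzip data (what compress.zlib:// handles) rather than a ZIP archive, the wrappers aren't needed at all: decode the string and gunzip it in memory. A sketch with a hypothetical function name:

```php
<?php
// Hypothetical helper: base64 -> gunzip entirely in memory.
// gzdecode() handles the gzip format (RFC 1952), not ZIP archives.
function xml_from_base64_gzip(string $base64): string
{
    $binary = base64_decode($base64, true); // strict mode rejects invalid characters
    if ($binary === false) {
        throw new InvalidArgumentException('Input is not valid base64');
    }
    $xml = @gzdecode($binary); // suppress the warning; the result is checked instead
    if ($xml === false) {
        throw new RuntimeException('Input is not valid gzip data');
    }
    return $xml;
}
```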
[edit 2] Even if you already found a solution with a file, here is an (untested) solution I didn't see in your answer:
<?php
// untested: ZipArchive::open() expects a local path, so the data:// wrapper may be rejected
$zip = new ZipArchive;
$zip->open('data://text/plain,'.urlencode($base64_decoded_string));
$zip2 = new ZipArchive;
$zip2->open('data://text/plain;base64,'.urlencode($base64_string));

If you are running on Linux and administer the system, you could mount a small ramdisk using tmpfs. The standard file_get_contents()/file_put_contents() and ZipArchive functions will then work, except that they write to memory instead of disk.
To have it permanently available, the fstab entry is something like:
tmpfs /media/ramdisk tmpfs nodev,nosuid,noexec,nodiratime,size=2M 0 0
Set the size and location to suit you.
Using PHP to mount a ramdisk and remove it after use (if it even has the privileges) is probably less efficient than just writing to disk, unless you have a massive number of files to process in one go.
This is neither a pure PHP solution nor portable.
You will still need to remove the "files" after use, or have the OS clean up old files.
They will of course not persist across reboots or remounts of the ramdisk.

If you want to read the content of a file inside a ZIP, such as an XML file, take a look at this. I use it to count the words in a .docx file (which is a ZIP archive):
if (!function_exists('docx_word_count')) {
function docx_word_count($filename)
{
$zip = new ZipArchive();
if ($zip->open($filename) === true) {
// docProps/app.xml holds the document statistics, including <Words>
if (($index = $zip->locateName('docProps/app.xml')) !== false) {
$data = $zip->getFromIndex($index);
$zip->close();
$xml = new SimpleXMLElement($data);
// cast the SimpleXMLElement node to an integer
return (int) $xml->Words;
}
$zip->close();
}
return 0;
}
}

toster-cx's idea is also very useful for dealing with malformed ZIP files!
I had one with missing data in the local header, so I had to extract the central directory file header using his method:
// "PK\x01\x02" marks the first central directory file header (46 fixed bytes)
$CDFHoffset = strpos( $zipFile, "\x50\x4b\x01\x02" );
$CDFH = unpack( "Vsig/vverby/vverex/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen/vcomlen/vdisk/vintattr/Vextattr/Voffset", substr( $zipFile, $CDFHoffset, 46 ) );
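Going one step further, instead of searching for the first "PK\x01\x02" (which could in principle also occur inside compressed data), a sketch can locate the End of Central Directory record and read every central directory header from there. The central directory holds the sizes and CRC even when the local header doesn't, which is exactly the malformed or data-descriptor case. The function names are hypothetical; a single-disk, non-ZIP64 archive is assumed, and the EOCD search assumes the archive comment does not itself contain the signature.

```php
<?php
// Hypothetical helpers: list entries via the End Of Central Directory (EOCD)
// record, then extract one using the central directory's sizes and CRC.
function zip_entries(string $data): array
{
    $eocd = strrpos($data, "PK\x05\x06"); // EOCD = 22 bytes + optional comment, at the end
    if ($eocd === false) {
        throw new RuntimeException('End of central directory record not found');
    }
    $end = unpack('Vsig/vdisk/vcddisk/vdiskentries/ventries/Vcdsize/Vcdoffset/vcommentlen',
        substr($data, $eocd, 22));
    $format = 'Vsig/vverby/vverex/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize'
            . '/vnamelen/vexlen/vcomlen/vdisk/vintattr/Vextattr/Voffset';
    $entries = [];
    $pos = $end['cdoffset'];
    for ($i = 0; $i < $end['entries']; $i++) {
        $cd = unpack($format, substr($data, $pos, 46));
        if ($cd === false || $cd['sig'] !== 0x02014b50) { // "PK\x01\x02"
            throw new RuntimeException("Bad central directory header #$i");
        }
        $cd['name'] = substr($data, $pos + 46, $cd['namelen']);
        $entries[] = $cd;
        $pos += 46 + $cd['namelen'] + $cd['exlen'] + $cd['comlen'];
    }
    return $entries;
}

function zip_read_entry(string $data, array $cd): string
{
    // Re-read the local header: its name/extra lengths may differ from the central ones.
    $local = unpack('Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen',
        substr($data, $cd['offset'], 30));
    if ($local === false || $local['sig'] !== 0x04034b50) {
        throw new RuntimeException("Bad local header for {$cd['name']}");
    }
    $start = $cd['offset'] + 30 + $local['namelen'] + $local['exlen'];
    $body = substr($data, $start, $cd['csize']); // size taken from the central directory
    if ($cd['meth'] === 0) {
        $raw = $body;
    } elseif ($cd['meth'] === 8) {
        $raw = gzinflate($body);
    } else {
        throw new RuntimeException("Unsupported method {$cd['meth']}");
    }
    if ($raw === false || crc32($raw) !== $cd['crc']) {
        throw new RuntimeException("Cannot extract {$cd['name']}");
    }
    return $raw;
}
```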

Related

Use ob_start to avoid creating zip files and get their content

I have this code that creates a ".zip" file and inside it a ".xml" file obtained from a string.
As seen in the example below, I then read the file back and convert it to base64 and a hash.
The code is functional.
What I want now is to use "ob_start()" so that I don't have to create the ".zip" file. Could someone help me with a basic example? Greetings...
<?php
$content = '<?xml version="1.0"?><Catalog><Book id="bk101"><Author>Garghentini, Davide</Author><Title>XML Developers Guide</Title><Genre>Computer</Genre><Price>44.95</Price><PublishDate>2000-10-01</PublishDate><Description>An in-depth look at creating applicationswith XML.</Description></Book><Book id="bk102"><Author>Garcia, Debra</Author><Title>Midnight Rain</Title><Genre>Fantasy</Genre><Price>5.95</Price><PublishDate>2000-12-16</PublishDate><Description>A former architect battles corporate zombies,an evil sorceress, and her own childhood to become queenof the world.</Description></Book></Catalog>';
$route = './temp/';
$name = 'facturaElectronicaCompraVenta.xml.zip';
$file = "{$route}{$name}";
// CREATE ZIP
$zp = gzopen($file,'w9');
gzwrite($zp,$content);
gzclose($zp);
// READ ZIP
$fp = fopen($file,'rb');
$binary = fread($fp,filesize($file));
$res = [
'archivo' => base64_encode($binary),
'hashArchivo' => hash('sha256',$binary),
];
print_r($res);
First of all, the output buffering (ob_*) functions have nothing to do with files; they only capture the script's output (e.g., echo 'Hello, World!';).
If you want to keep using gzopen(), perhaps you can point it at a stream wrapper that isn't a physical file (I haven't investigated that option), but it looks easier to just switch to gzencode().
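A sketch of the gzencode() route, keeping the question's array keys (build_payload() is a hypothetical name). Like gzopen()/gzwrite(), gzencode() produces gzip data, so the result is a .gz stream rather than a ZIP archive, just as in the original code:

```php
<?php
// Hypothetical helper: same result as the question's code, without touching the disk.
function build_payload(string $content): array
{
    $binary = gzencode($content, 9); // level 9, like gzopen($file, 'w9')
    return [
        'archivo'     => base64_encode($binary),
        'hashArchivo' => hash('sha256', $binary),
    ];
}
```

The gzip header produced this way may not be byte-identical to the file written by gzwrite(), so the SHA-256 could differ from the file-based version even though the decompressed XML is the same.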

PharData offsetExists on filename prefixed with ".\"

I have a .tar.gz file downloaded from an external API which we have to implement. It contains images for an object.
I'm not sure how they managed to compress it this way, but the files are basically prefixed with the "current directory". It looks like this in WinRAR:
[screenshot: WinRAR listing]
And like this in 7-Zip; note the .tar at the first level and "." at the second level:
[screenshot: 7-Zip listing]
When calling
$file = 'archive.tar.gz';
$phar = new PharData($file, FilesystemIterator::CURRENT_AS_FILEINFO);
var_dump($phar->offsetGet('./12613_s_cfe3e73.jpg'));
I get the exception:
Cannot access phar file entry '/12613_s_cfe3e73.jpg' in archive '{...}/archive.tar.gz'
When requesting a file that does not exist, e.g.:
var_dump($phar->offsetGet('non-existent.jpg'));
or requesting it without the directory separator, e.g.:
var_dump($phar->offsetGet('12613_s_cfe3e73.jpg'));
I get an "Entry 12613_s_cfe3e73.jpg does not exist" exception.
It is not possible to get the archive formatted differently. Does anyone have an idea how to solve this?
I ended up using Archive_Tar. There must be something wrong in PHP's source code, although I don't think this is the "normal" way of packaging a .tar either.
Unfortunately I'm not very good at C, but it's probably in here (line 1214) or here.
This library seems to handle it just fine, using this example code:
require_once 'Archive/Tar.php'; // PEAR Archive_Tar (or load it through Composer's autoloader)
$file = 'archive.tar.gz';
$tar = new Archive_Tar($file);
foreach ($tar->listContent() as $entry) {
echo $entry['filename'] . '<br>';
}
Result:
./12613_s_f3b483d.jpg
./12613_s_cfe3e73.jpg
./1265717_s_db141dc.jpg
./1265717_s_af5de56.jpg
./1265717_s_b783547.jpg
./1265717_s_35b11f9.jpg
./1265716_s_83ef572.jpg
./1265716_s_9ac2725.jpg
./1265716_s_c5af3e9.jpg
./1265716_s_c070da3.jpg
./1265715_s_4339e8a.jpg
Note the filenames are still prefixed with "./" just like they are in WinRAR.
If you want to stick to using PharData, I suggest a more conservative, two-step approach, where you first decompress the gz and then extract all files of the tar to a target folder.
// decompress gz archive to get "/path/to/my.tar" file
$gz = new PharData('/path/to/my.tar.gz');
$gz->decompress();
// unarchive all files from the tar to the target path
$tar = new PharData('/path/to/my.tar');
$tar->extractTo('/target/path');
But it looks like you want to select individual files from the tar.gz archive directly, right?
It should work using fopen() with a stream wrapper (compress.zlib or phar) and selecting the individual file. Some examples:
// compress.zlib:// only gunzips a single stream; it cannot select a file inside a tar
$f = fopen("compress.zlib://http://some.website.org/my.gz", "r");
$f = fopen('phar:///path/to/my.tar.gz/file/in/archive', 'r');
$filecontent = file_get_contents('phar:///some/my.tar.gz/some/file/in/the/archive');
Streaming should also work when using iterators:
$rdi = new RecursiveDirectoryIterator('phar:///path/to/my.tar.gz', FilesystemIterator::SKIP_DOTS);
// LEAVES_ONLY yields files only, so file_get_contents() never receives a directory
$rii = new RecursiveIteratorIterator($rdi, RecursiveIteratorIterator::LEAVES_ONLY);
foreach ($rii as $splFileInfo){
echo file_get_contents($splFileInfo->getPathname());
}
The downside is that you have to buffer the stream and save it to a file yourself.
It's not a direct extraction to a target folder.
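As a different technique (neither PharData nor Archive_Tar), a small pure-PHP reader can gunzip the whole archive in memory and walk the 512-byte tar headers itself, stripping the leading "./" so both spellings of a name resolve. This is a sketch under assumptions: regular files only, no GNU long-name or pax extension headers, and the archive must fit in memory, which should be fine for a handful of images. The function name is hypothetical.

```php
<?php
// Hypothetical minimal tar.gz reader: returns [normalised name => contents].
function tar_gz_entries(string $gz): array
{
    $tar = gzdecode($gz);
    if ($tar === false) {
        throw new RuntimeException('Not valid gzip data');
    }
    $files = [];
    $offset = 0;
    $len = strlen($tar);
    while ($offset + 512 <= $len) {
        $header = substr($tar, $offset, 512);
        if (trim($header, "\0") === '') {
            break; // an all-zero block marks the end of the archive
        }
        $name = rtrim(substr($header, 0, 100), "\0");
        // POSIX ustar headers may split long paths into prefix + name
        if (substr($header, 257, 6) === "ustar\0") {
            $prefix = rtrim(substr($header, 345, 155), "\0");
            if ($prefix !== '') {
                $name = $prefix . '/' . $name;
            }
        }
        $size = octdec(trim(substr($header, 124, 12), "\0 ")); // octal ASCII
        $type = $header[156];
        $offset += 512; // file data starts right after the header block
        if ($type === '0' || $type === "\0") { // regular file
            $files[preg_replace('#^(\./)+#', '', $name)] = substr($tar, $offset, $size);
        }
        $offset += (int) ceil($size / 512) * 512; // data is padded to 512 bytes
    }
    return $files;
}
```

With the archive from the question, tar_gz_entries(file_get_contents('archive.tar.gz'))['12613_s_cfe3e73.jpg'] would then return that image's bytes.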

Reading XML file in parts

I am trying to read a XML file from the URL, with the help of XMLReader Iterators https://gist.github.com/hakre/5147685
$reader = new XMLReader();
$reader->open($filename);
$element = new XMLReaderNode($reader);
$it = new XMLElementIterator($reader, 'coupon');
$data = array();
$i = 0;
foreach($it as $index => $element) {
if( $i == 0 ) {
$xml = $element->asSimpleXML();
//print_r($xml->children());
foreach( $xml as $k=>$v ) {
$data[0][strtolower("{$k}")] = "{$v}";
}
}// End IF
}
print_r($data);
It's working fine with a small file, but it takes a long time to read the XML file from the URL.
Can I first download the file from the URL and then read it?
Is what I am doing the right way?
Is there any other alternative?
If I understand your question right, it just takes a long time to download the large file every time.
But you can cache the file locally by first downloading the XML from the HTTP URI and then storing it on disk.
This is very useful while you develop your software, because otherwise repeating the remote request every time you fetch the XML is needless overhead, and I assume the data does not change so often that each of your parsing tests needs the latest version of the XML.
I suggest doing something along the lines of the answer to "Download File to server from URL":
$filename = "http://someurl/file.xml";
$cachefile = "file.xml";
if (!is_readable($cachefile))
{
file_put_contents($cachefile, fopen($filename, 'r'));
}
$reader = new XMLReader();
$reader->open($cachefile);
This little example creates $cachefile in case it does not exist. If it does exist, it will not download it again.
So only the first load of that file takes longer. If the XML file is really large, you can also download it first with an HTTP client that supports resuming partial transfers, such as the wget or curl command-line utilities, so that if something goes wrong with the transfer, you don't have to download the whole file again.
You then just operate on your local copy. You wouldn't need to change your code then at all, just $filename would point to the local file instead.
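If the remote XML changes occasionally, the same caching idea can be given an expiry. A sketch with a hypothetical cached_copy() helper: it refreshes the cache when the file is older than $maxAge seconds, downloads to a temporary ".part" name first so an interrupted transfer never leaves a truncated cache file, and throws instead of silently writing an empty file when fopen() fails.

```php
<?php
// Hypothetical helper: return a local copy of $url, refreshed after $maxAge seconds.
function cached_copy(string $url, string $cachefile, int $maxAge = 3600): string
{
    $fresh = is_readable($cachefile) && (time() - filemtime($cachefile)) < $maxAge;
    if (!$fresh) {
        $in = fopen($url, 'rb');
        if ($in === false) {
            throw new RuntimeException("Cannot open $url");
        }
        $tmp = $cachefile . '.part';
        $written = file_put_contents($tmp, $in); // copies the stream chunk by chunk
        fclose($in);
        if ($written === false) {
            throw new RuntimeException("Cannot write $tmp");
        }
        rename($tmp, $cachefile); // replace the old cache only after a complete download
        clearstatcache(true, $cachefile);
    }
    return $cachefile;
}
```

The reader then opens the local copy: $reader->open(cached_copy($filename, $cachefile));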

Parsing a Zipped (GZ) JSON file in PHP

With help from the guys on Stackoverflow I can now Parse JSON code from a file and save a 'Value' into a database
However, the file I intend to read from is a massive 2 GB file. My web server will not hold this file, but it will hold a gzip-compressed (.gz) version of it, which is about 80 MB.
I believe there is a way to parse JSON from a gzipped (.gz) file. Can anybody help?
I have found the below function which I believe will do this (I think) but I don't know how to link it to my code
private function uncompressFile($srcName, $dstName) {
$sfp = gzopen($srcName, "rb");
$fp = fopen($dstName, "w");
while ($string = gzread($sfp, 4096)) {
fwrite($fp, $string, strlen($string));
}
gzclose($sfp);
fclose($fp);
}
My current PHP code is below and works. It reads a basic small file, JSON decodes it (The JSON is in a series of separate lines hence the need for FILE_IGNORE_NEW_LINES) and then takes a value and saves to MySQL database.
However I believe I need to somehow combine these two bits of code so I can read a ZIPPED file without exceeding my 100MB storage on my webserver
$json_filename="CIF_ALL_UPDATE_DAILY_toc-update-sun";
$trains = file($json_filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($trains as $train) {
$json=json_decode($train,true);
foreach ($json as $key => $value) {
$input=$value['main_train_uid'];
$q="INSERT INTO railstptest (main_train_uid) VALUES ('$input')";
$r=mysqli_query($mysql_link,$q);
}
}
if (is_null($json)) {
die("Json decoding failed with error: ". json_last_error());
}
mysqli_close($mysql_link);
Many Thanks
EDIT
Here is a short snippet of the JSON. There is a series of these:
I would only want to be getting a few key values. For example the value G90491 and P20328. A lot of the info I would not need
{"JsonAssociationV1":{"transaction_type":"Delete","main_train_uid":"G90491","assoc_train_uid":"G90525","assoc_start_date":"2013-09-07T00:00:00Z","location":"EDINBUR","base_location_suffix":null,"diagram_type":"T","CIF_stp_indicator":"O"}}
{"JsonAssociationV1":{"transaction_type":"Delete","main_train_uid":"P20328","assoc_train_uid":"P21318","assoc_start_date":"2013-08-23T00:00:00Z","location":"MARYLBN","base_location_suffix":null,"diagram_type":"T","CIF_stp_indicator":"C"}}
It may be possible to do stream extraction of the file and then use a stream JSON parser. ZipArchive has getStream, and someone created a streaming JSON parser for PHP.
You will have to write a listener that inserts the database values as they are found and discards unnecessary JSON so it does not consume memory.
$zip = new ZipArchive;
$zip->open("file.zip");
$parser = new JsonStreamingParser_Parser($zip->getStream("file.json"),
new DB_Value_Inserter);
$parser->parse();
Based on your question, you're working with gzip instead of zip. To get the stream you can use
fopen("compress.zlib://path/to/file.json", "r");
It's difficult to write the DB_Value_Inserter since you haven't provided the format of the JSON you need, but it seems like you can probably just override the Listener::value method and just write the string values you receive.
PHP has compression wrappers that can help with opening and reading lines from compressed files. One is for reading gzip files:
$gzipFile = 'CIF_ALL_UPDATE_DAILY_toc-update-sun.gz';
$trains = new SplFileObject("compress.zlib://{$gzipFile}", 'r');
$trains->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::READ_AHEAD
| SplFileObject::SKIP_EMPTY);
Because SplFileObject is iterable, you can keep your outer foreach loop the way it is. Of course, fgets() remains an alternative to using SplFileObject.
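Putting the compress.zlib:// wrapper together with the question's loop, a generator can stream main_train_uid values out of the .gz file one line at a time, so neither the 2 GB of JSON nor the full list of lines is ever held in memory. train_uids() is a hypothetical name, and it assumes one JSON object per line, as in the snippet above.

```php
<?php
// Hypothetical generator: yield each main_train_uid from a gzipped JSON-lines file.
function train_uids(string $gzipFile): Generator
{
    $lines = new SplFileObject("compress.zlib://{$gzipFile}", 'r');
    $lines->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::READ_AHEAD
        | SplFileObject::SKIP_EMPTY);
    foreach ($lines as $n => $line) {
        $record = json_decode($line, true); // decode one line at a time
        if (!is_array($record)) {
            throw new RuntimeException('Invalid JSON on line ' . ($n + 1) . ': ' . json_last_error_msg());
        }
        foreach ($record as $value) { // e.g. the "JsonAssociationV1" object
            if (isset($value['main_train_uid'])) {
                yield $value['main_train_uid'];
            }
        }
    }
}
```

Each yielded value can then be inserted with a prepared statement (mysqli::prepare() with bind_param('s', $uid)) rather than interpolating $input into the SQL string.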

String to Zipped Stream in php

I have a processing server with my database and a serving database to serve up files with a low bandwidth cost. On the processing server, php is not able to create files so everything must be done with streams and/or stay in memory before being sent over to another server for download. A few days ago I found out about the stream abstraction with 'php://memory' and that I can do something like
$fp=fopen('php://memory','w+');
fwrite($fp,"Hello world");
fseek($fp,0,SEEK_SET);
//make a ftp connection here with $conn_id
$upload = ftp_fput($conn_id,"targetpath/helloworld.txt",$fp,FTP_BINARY);
to make the file in memory and then allow me to ftp it over to my other server. This is exactly what I want, except I also want to zip the data before sending it -- preferably using only native parts of php like ziparchive and not additional custom classes for special stream manipulation. I know that I am very close with the following...
$zip = new ZipArchive();
if($zip->open('php://memory', ZIPARCHIVE::CREATE)) {
$zip->addFromString('testtext.txt','Hello World!');
$fp = $zip->getStream('test'); if(!$fp) print "no filepointer";
//make a ftp connection here with $conn_id
$upload = ftp_fput($conn_id,"targetpath/helloworld.zip",$fp,FTP_BINARY);
} else print "couldn't open a zip like that";
The point at which this fails is the call to getStream (which always returns false, although I think I am using it correctly). It appears that the zip is fine making the file in 'php://memory', but for some reason getStream still fails, although perhaps I don't sufficiently understand how ZipArchive makes zips...
How can I go from the string to the zipped filepointer so that I can ftp the zip over to my other server? Remember I can't make any files or else I would just make the zip file then ftp it over.
EDIT: based on skinnynerd's suggestions below I tried the following
$zip = new ZipArchive();
if($zip->open('php://memory', ZIPARCHIVE::CREATE)) {
$zip->addFromString('testtext.txt','Hello World!');
$zip->close();
$fp = fopen('php://memory','r+');
fseek($fp,0,SEEK_SET);
//connect to ftp
$upload = ftp_fput($conn_id,"upload/transfer/helloworld.zip",$fp,FTP_BINARY);
}
This does make a zip and send it over, but the zip is 0 bytes in size, so I don't think 'php://memory' works the way I thought... It actually fails at the close step: $zip->close() returns false, which makes me wonder if I can open zips in 'php://memory' at all. Does anyone know what I can try along these lines to get the zip?
$zip->getStream('test') is getting a stream to extract the file 'test' from the archive. Since there's no file 'test' in the archive, this fails. This is not the function you want to use.
As you said, what you want to do is send the finished archive to the ftp server. In this case, you would want to close the zip archive, and then reopen php://memory as a normal file (using fopen) to send it.
I don't know, but you may also be able to use $zip as a resource directly, without having to close and reopen the file.
And I think you can try to create a stream pipe directly to the FTP server:
<?php
$zip = new ZipArchive();
if($zip->open('ftp://user:password@ftp.host.com/upload/transfer/helloworld.zip', ZipArchive::CREATE))
{
$zip->addFromString('testtext.txt','Hello World!');
$zip->close();
}
else
print "couldn't open zip file on remote ftp host.";
Does it have to be a ZIP archive? Since you're trying to save bandwidth, it could be a gzip file too.
<?php
$ftp_credentials = "ftp://USER:PASSWORD@HOST/helloworld.gz";
$gz = gzencode("Hello World!", 9);
$options = array('ftp' => array('overwrite' => true));
$stream_context = stream_context_create($options);
file_put_contents($ftp_credentials, $gz, 0, $stream_context);
?>
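If it does have to be a ZIP archive, another option that avoids both temporary files and ZipArchive is to assemble a single-entry archive by hand (local file header, deflated data, central directory, end record) and write it to a php://memory handle. This is a minimal sketch, not a general ZIP writer: one entry, deflate only, no ZIP64, and a fixed 1980-01-01 timestamp; zip_string_to_stream() is a hypothetical name.

```php
<?php
// Hypothetical helper: build a one-file ZIP in memory and return a rewound stream.
function zip_string_to_stream(string $name, string $content)
{
    $body = gzdeflate($content, 9);  // raw deflate, as ZIP method 8 expects
    $crc  = crc32($content);
    $time = 0;                       // DOS time 00:00:00
    $date = (0 << 9) | (1 << 5) | 1; // DOS date 1980-01-01
    $local = pack('VvvvvvVVVvv', 0x04034b50, 20, 0, 8, $time, $date,
                  $crc, strlen($body), strlen($content), strlen($name), 0)
           . $name . $body;
    // central directory header; its local-header offset is 0 (first and only entry)
    $central = pack('VvvvvvvVVVvvvvvVV', 0x02014b50, 20, 20, 0, 8, $time, $date,
                    $crc, strlen($body), strlen($content), strlen($name),
                    0, 0, 0, 0, 0, 0)
             . $name;
    // end of central directory: 1 entry, directory size and offset, no comment
    $end = pack('VvvvvVVv', 0x06054b50, 0, 0, 1, 1, strlen($central), strlen($local), 0);
    $fp = fopen('php://memory', 'w+b');
    fwrite($fp, $local . $central . $end);
    rewind($fp); // ready to be read from the start
    return $fp;
}
```

The returned handle is already rewound, so it can be passed straight to ftp_fput($conn_id, "targetpath/helloworld.zip", $fp, FTP_BINARY).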
