I have to parse a lot (10,000+) of remote gzipped files. Each gzipped file should contain a CSV (possibly inside a folder). Right now I'm able to get the body, check its content type, and uncompress it, which leaves me with application/octet-stream.
The question is: what is this octet-stream, and how can I check for files or folders inside it?
/** @var $guzzle \Guzzle\Http\Client */
$guzzle = $this->getContainer()->get('guzzle');
$request = $guzzle->get($url);
try {
$body = $request->send()->getBody();
// Check for body content-type
if('application/x-gzip' === $body->getContentType()) {
$body->uncompress();
$body->getContentType(); // application/octet-stream
}
else {
// Log and skip current remote file
}
}
catch(\Exception $e) {
$output->writeln("Failed: {$guzzle->getBaseUrl()}");
throw $e;
}
The EntityBody object that stores the body can only guess the content type of local files. Use the Content-Type header of the response to get a more accurate value.
Something like this:
$response = $request->send();
$type = $response->getContentType();
A shell command along these lines should work for you:
shell_exec('gzip -d your_file.gz');
You can first decompress all your files into a particular directory and then read each file or perform whatever computation you need.
As a side note:
Take care where the command is run from (or use a switch to tell it "decompress to that directory").
You might want to take a look at escapeshellarg too ;-)
You should be able to use the built in gzuncompress function.
See http://php.net/manual/en/function.gzuncompress.php
Edit: Or other zlib functions depending on what data you are working with. http://php.net/manual/en/ref.zlib.php
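One detail worth checking here: gzuncompress() expects zlib-wrapped data, while files produced by gzip use the gzip format, which gzdecode() (PHP 5.4+) reads. A plain .gz also holds a single compressed stream, so there are no folders to look for unless the decompressed bytes turn out to be a tar archive. Below is a minimal sketch, assuming the body is a gzipped CSV already held in a string; the helper name gz_csv_rows is made up for illustration, and the line-by-line split does not handle quoted fields that span several lines.

```php
<?php
// Hypothetical helper: decode a gzip payload held in memory and parse it as CSV.
function gz_csv_rows($gzBytes)
{
    // gzdecode() reads the gzip format (what the gzip tool and .gz files use).
    $csv = @gzdecode($gzBytes);
    if ($csv === false) {
        return null; // not valid gzip data
    }
    $rows = array();
    // Assumption: one record per line, no embedded newlines inside quoted fields.
    foreach (preg_split('/\r\n|\n|\r/', trim($csv)) as $line) {
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }
    return $rows;
}
```

With the Guzzle code from the question, something like gz_csv_rows((string) $response->getBody()) would be the likely call, provided the body has not already been uncompressed.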
I am writing a scanner that will look for possibly hacked/malware files. One requirement is to check if a zip (or any compressed) file is password-protected using some PHP function.
I don't want to add any extra software requirements, so it should work on multiple servers using PHP 5.3+. (Yes, I know that 5.3 is old, but the process may need to run on older PHP installations.) If this detection is only available in newer PHP versions, I could have code that runs only on those versions.
I can use the file_get_contents() function to read the file's contents into a string. How do I check that string for an indication that the zip file is password-protected? Note that I don't want to uncompress the file, just check it for password-protection.
Thanks.
This code appears to work, but might be improved.
The process seems to involve two steps:
Use zip_open to open the file, which returns a resource. If no resource comes back, the zip couldn't be opened, so it might be password-protected.
Use zip_read to read the entries inside the zip. If that fails, the zip might be password-protected.
In either of those two cases, return true, indicating probable password on the zip file.
// try to open a zip file; if it fails, probably password-protected
function check_zip_password($zip_file = '') {
    /*
    open/read a zip file
    return true if passworded
    */
    if (!$zip_file) { // file not specified
        return false;
    }
    $zip = zip_open($zip_file); // open the file
    if (is_resource($zip)) { // file opened OK
        $zipfile = zip_read($zip); // try read of zip file contents
        if (!$zipfile) { // couldn't read inside, so passworded
            return true;
        }
        else
        { // file opened and read, so not passworded
            return false;
        }
    } else { // couldn't open the file, might be passworded
        return true;
    }
    return false; // file exists, but not password protected
}
Note that the code only determines that the files inside the zip can't be accessed, so they are probably password-protected. The code doesn't try to do any processing of files inside the zip.
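Since the question asks about checking a string obtained with file_get_contents(), another angle is to look at the ZIP headers directly. The ZIP format stores a "general purpose bit flag" in each local file header, and bit 0 of that flag marks the entry as encrypted. Here is a rough sketch that checks only the first entry; the function name is hypothetical, and it assumes the data starts with a local file header (no self-extractor stub or other prefix).

```php
<?php
// Hypothetical helper: inspect the first local file header of a ZIP held in a string.
// Returns true if the entry is flagged as encrypted, false if it is not,
// and null if the data does not start with a local file header.
function zip_first_entry_encrypted($data)
{
    if (strlen($data) < 30 || substr($data, 0, 4) !== "PK\x03\x04") {
        return null;
    }
    // The 16-bit general purpose bit flag sits at offset 6 (little-endian).
    $head = unpack('vflag', substr($data, 6, 2));
    // Bit 0 set means the entry is encrypted.
    return ($head['flag'] & 0x0001) === 0x0001;
}
```

This only reads a few bytes and never decompresses anything, but an archive can mix encrypted and unencrypted entries, so a thorough scanner would have to walk every header.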
I'm trying to save a file to php://output to send it as the response (it's an Excel file).
The problem is that PHP does not find the directory, although according to the documentation I should be able to access it.
I added this validation to my code:
$folderName = 'php://output';
if(!is_dir($folderName)){
throw new FileNotFoundException($folderName . " directory not found.");
}
$objWriter->save($filePath);
and the exception is thrown, returning:
"php://output directory not found.",
php://output is not a directory; it's an output stream. You use php://output to write stuff to the output buffer the same way echo or print does. For example, if you wanted to force the browser to display a PDF or an image straight away without saving it first, you would use php://output.
If you wanted to physically save the file in your filesystem then a proper path must be used.
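To illustrate the point that php://output is a stream you open and write to, here is a small sketch. The helper name send_to_output is invented for the example; it does nothing more than what echo would do.

```php
<?php
// Hypothetical helper: write a string to the client through php://output.
function send_to_output($bytes)
{
    $out = fopen('php://output', 'wb'); // a write-only stream, not a path on disk
    $written = fwrite($out, $bytes);
    fclose($out);
    return $written; // number of bytes written
}
```

In the question's code this would presumably mean dropping the is_dir() check and passing 'php://output' straight to $objWriter->save(), after sending the appropriate headers.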
I have a BASE64 string of a zip file that contains one single XML file.
Any ideas on how I could get the contents of the XML file without having to deal with files on the disk?
I would like very much to keep the whole process in the memory as the XML only has 1-5k.
It would be annoying to have to write the zip, extract the XML and then load it up and delete everything.
I had a similar problem, I ended up doing it manually.
https://www.pkware.com/documents/casestudies/APPNOTE.TXT
This extracts a single file (just the first one), no error/crc checks, assumes deflate was used.
// zip in a string
$data = file_get_contents('test.zip');
// magic
$head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data,0,30));
$filename = substr($data,30,$head['namelen']);
$raw = gzinflate(substr($data,30+$head['namelen']+$head['exlen'],$head['csize']));
// first file uncompressed and ready to use
file_put_contents($filename,$raw);
After some hours of research I think it's, surprisingly, not possible to handle a zip without a temporary file:
The first try with php://memory will not work, because it's a stream that cannot be read by functions like file_get_contents() or ZipArchive::open(). In the comments is a link to the PHP bug tracker about the lack of documentation of this problem.
ZipArchive has stream support via ::getStream(), but as stated in the manual, it only supports read operations on an opened file. So you cannot build an archive on the fly with that.
The zip:// wrapper is also read-only: Create ZIP file with fopen() wrapper
I also made some attempts with the other PHP wrappers/protocols like
file_get_contents("zip://data://text/plain;base64,{$base64_string}#test.txt")
$zip->open("php://filter/read=convert.base64-decode/resource={$base64_string}")
$zip->open("php://filter/read=/resource=php://memory")
but for me they don't work at all, even though there are examples like that in the manual. So you have to bite the bullet and create a temporary file.
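For what it's worth, the temporary-file route can be kept quite short. This is only a sketch under a few assumptions: the zip extension is loaded, the wanted XML is the first entry in the archive, and the function name is made up for illustration.

```php
<?php
// Hypothetical helper: read the first entry of a base64-encoded ZIP
// by way of a short-lived temporary file.
function first_entry_from_base64_zip($base64)
{
    $bytes = base64_decode($base64, true);
    if ($bytes === false) {
        return null; // not valid base64
    }
    $tmp = tempnam(sys_get_temp_dir(), 'zip');
    file_put_contents($tmp, $bytes);

    $contents = null;
    $zip = new ZipArchive();
    if ($zip->open($tmp) === true) {
        $data = $zip->getFromIndex(0); // false if there is no such entry
        $contents = ($data === false) ? null : $data;
        $zip->close();
    }
    unlink($tmp); // remove the temp file whether or not the open succeeded
    return $contents;
}
```

The file only lives for the duration of the call, which for a 1-5k XML is unlikely to matter much in practice.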
Original Answer:
This is just the way of temporary storing. I hope you manage the zip handling and parsing of xml on your own.
Use the php://memory wrapper. Be aware that this is only useful for small files, because the data is stored in memory. Otherwise use php://temp instead.
<?php
// the decoded content of your zip file
$text = 'base64 _decoded_ zip content';
// this will empty the memory and append your zip content
$written = file_put_contents('php://memory', $text);
// bytes written to memory
var_dump($written);
// new instance of the ZipArchive
$zip = new ZipArchive;
// success of the archive reading
var_dump(true === $zip->open('php://memory'));
toster-cx had it right; you should award him the points. Here is an example where the zip comes from a SOAP response as a byte array (binary) and the content is an XML file:
$objResponse = $objClient->__soapCall("sendBill",array(parameters));
$fileData=unzipByteArray($objResponse->applicationResponse);
header("Content-type: text/xml");
echo $fileData;
function unzipByteArray($data){
    /* the first entry is a directory */
    $head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data,0,30));
    $filename = substr($data,30,$head['namelen']);
    $if=30+$head['namelen']+$head['exlen']+$head['csize'];
    /* the second entry is the actual file */
    $head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data,$if,30));
    $raw = gzinflate(substr($data,$if+$head['namelen']+$head['exlen']+30,$head['csize']));
    /* you could loop here and keep decompressing if there were more files */
    return $raw;
}
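The loop mentioned in the last comment could look roughly like this. It is a sketch with the same limitations as the manual approach above (no CRC checks, no encryption), and it additionally assumes the local headers carry the real sizes, i.e. the archive was not written with data descriptors. The function name is hypothetical.

```php
<?php
// Hypothetical helper: map entry name => uncompressed bytes by walking the local file headers.
// Assumes no encryption, no data descriptors, and only "stored" (0) or "deflate" (8) entries.
function unzip_string_entries($data)
{
    $entries = array();
    $offset = 0;
    $format = 'Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen';
    // Every local file header starts with the signature "PK\x03\x04";
    // the loop stops at the first thing that is not one (e.g. the central directory).
    while (strlen($data) >= $offset + 30 && substr($data, $offset, 4) === "PK\x03\x04") {
        $head = unpack($format, substr($data, $offset, 30));
        $name = substr($data, $offset + 30, $head['namelen']);
        $start = $offset + 30 + $head['namelen'] + $head['exlen'];
        $raw = substr($data, $start, $head['csize']);
        if ($head['meth'] === 8) {
            $entries[$name] = gzinflate($raw); // deflate
        } elseif ($head['meth'] === 0) {
            $entries[$name] = $raw;            // stored as-is (also directories)
        }
        $offset = $start + $head['csize'];     // jump to the next header
    }
    return $entries;
}
```

Looking entries up by name avoids relying on the directory entry always coming first, which may not hold for every producer.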
If you know the file name inside the .zip, just do this:
<?php
$xml = file_get_contents('zip://./your-zip.zip#your-file.xml');
If you have a plain string, just do this:
<?php
$xml = file_get_contents('compress.zlib://data://text/plain;base64,'.$base64_encoded_string);
[edit] Documentation is there: http://www.php.net/manual/en/wrappers.php
From the comments: if you don't have a base64 encoded string, you need to urlencode() it before using the data:// wrapper.
<?php
$xml = file_get_contents('compress.zlib://data://text/plain,'.urlencode($text));
[edit 2] Even if you already found a solution with a file, there's a solution (to test) I didn't see in your answer:
<?php
$zip = new ZipArchive;
$zip->open('data://text/plain,'.urlencode($base64_decoded_string));
$zip2 = new ZipArchive;
$zip2->open('data://text/plain;base64,'.urlencode($base64_string));
If you are running on Linux and administer the system, you could mount a small ramdisk using tmpfs. The standard file_get_contents/file_put_contents and ZipArchive functions will then work, except that they write to memory instead of disk.
To have it permanently ready, the fstab entry is something like:
tmpfs /media/ramdisk tmpfs nodev,nosuid,noexec,nodiratime,size=2M 0 0
Set your size and location accordingly so it suits you.
Using php to mount a ramdisk and remove it after using it (if it even has the privileges) is probably less efficient than just writing to disk, unless you have a massive number of files to process in one go.
This is not a pure PHP solution, though, nor is it portable.
You will still need to remove the "files" after use, or have the OS clean up old files.
They will of course not persist over reboots or remounts of the ramdisk.
If you want to read the contents of a file inside a zip, such as an XML file, have a look at this. I use it to count words in a .docx (which is a zip):
if (!function_exists('docx_word_count')) {
function docx_word_count($filename)
{
$zip = new ZipArchive();
if ($zip->open($filename) === true) {
if (($index = $zip->locateName('docProps/app.xml')) !== false) {
$data = $zip->getFromIndex($index);
$zip->close();
$xml = new SimpleXMLElement($data);
return $xml->Words;
}
$zip->close();
}
return 0;
}
}
The idea from toster-cx is pretty useful for handling malformed zip files too!
I had one with missing data in the header, so I had to extract the central directory file header using his method:
$CDFHoffset = strpos( $zipFile, "\x50\x4b\x01\x02" );
$CDFH = unpack( "Vsig/vverby/vverex/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr( $zipFile, $CDFHoffset, 46 ) );
I have some simple code (based on tutorials found around the internet) to parse and display an XML file. However, I only know how to reference an XML file stored on my server, and I would like to be able to use an XML file that is returned to me from a POST.
Right now my code looks like this:
if( ! $xml = simplexml_load_file('test.xml') )
{
echo 'unable to load XML file';
}
else
{
foreach( $xml as $event)
{
echo 'Title: ';
echo "$event->title<br />";
echo 'Description: '.$event->info.'<br />';
echo '<br />';
}
}
Is there some way I can replace the simpleXML_load_file function with one that will allow me to point to the POST URL that returns the XML file?
Use simplexml_load_string instead of simplexml_load_file:
simplexml_load_string($_POST['a']);
If you get the URL of the file in the POST you can probably use the simplexml_load_file function with the URL, but if that doesn't work you can use file_get_contents in combination with simplexml_load_string:
//say $_POST['a'] == 'http://example.com/test.xml';
simplexml_load_file($_POST['a']); // <-- probably works
simplexml_load_string(file_get_contents($_POST['a'])); //<-- definitely works (probably what happens internally)
Also, fetching the contents of external files could be prohibited when PHP is running in safe mode.
If you are receiving a file that's been uploaded by the user, you can find it (the file) looking at the content of the $_FILES superglobal variable -- and you can read more about files uploads here (for instance, don't forget to call move_uploaded_file if you don't want the file to be deleted at the end of the request).
Then, you can work with this file the same way you already do with not-uploaded files.
If you are receiving an XML string, you can use simplexml_load_string on it.
And if you are only receiving the URL to a remote XML content, you have to :
download the file to your server
and, then, parse its content.
This can be done using simplexml_load_file, passing the URL as a parameter, if your server is properly configured (i.e. if allow_url_fopen is enabled).
Else, the download will have to be done using curl -- see curl_exec for a very basic example, and curl_setopt for the options you can use (you'll especially want to use CURLOPT_RETURNTRANSFER, to get the XML data as a string you can pass to simplexml_load_string).
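As a rough sketch of the curl route described above: the download and the parsing are kept in two small functions so the parsing part also works on an XML string received any other way. Both function names are invented for the example, and the loop assumes the same structure as the question's XML (child elements that each have a title).

```php
<?php
// Hypothetical helper: download a URL into a string with cURL.
function fetch_url($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    return curl_exec($ch); // the body as a string, or false on failure
}

// Hypothetical helper: collect the <title> of each child element of the root.
function event_titles($xmlString)
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return null; // not well-formed XML
    }
    $titles = array();
    foreach ($xml as $event) {
        $titles[] = (string) $event->title;
    }
    return $titles;
}
```

Usage would then be along the lines of event_titles(fetch_url($url)), checking for null before displaying anything.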
From http://www.developershome.com/wap/wapUpload/wap_upload.asp?page=php4:
If you do not want to save the uploaded file directly but to process it, the PHP functions file_get_contents() and fread() can help you. The file_get_contents() function returns a string that contains all data of the uploaded file:
if (is_uploaded_file($_FILES['myFile']['tmp_name']))
$fileData = file_get_contents($_FILES['myFile']['tmp_name']);
That will give you a handle on the raw text within that file. From there you will need to parse through the XML. Hope that helps!
Check out simplexml_load_string. You can then use cURL to do the post and fetch the result. An example:
<?php
$xml = simplexml_load_string($string_fetched_with_curl);
?>
I'm trying to write a script that will create a file on the server then use header() to redirect the user to that file. Then, after about 10 seconds I want to delete the file. I've tried this:
header('Location: '.$url);
flush();
sleep(10);
unlink($url);
But the browser just waits for the script to complete and then gets redirected, and by that time the file has been deleted. Is there some way to tell the browser "end of file" and then keep computing? Or maybe have PHP start another script, but not wait for that script to finish?
You might be better off having the PHP page serve the file. There is no need to create a temporary file and delete it in this case; just send out the data you intended to write to the temporary file. You will need to set the headers correctly so the browser can identify the type of file you are sending, e.g. Content-Type: text/xml for XML or image/jpeg for JPEGs.
This method also handles slow clients that take longer to download the file.
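A minimal sketch of serving the data directly might look like the following. The function name and its parameters are assumptions made for the example; the headers_sent() guard is only there so the function degrades quietly if output has already started.

```php
<?php
// Hypothetical helper: send generated content straight to the client, no temp file involved.
function serve_generated($contents, $contentType)
{
    if (!headers_sent()) {
        header('Content-Type: ' . $contentType);
        header('Content-Length: ' . strlen($contents));
    }
    echo $contents;
}
```

For example, serve_generated($xmlString, 'text/xml') in place of the redirect, sleep and unlink sequence.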
The only way I've discovered to do this so far is to provide the content length in the header. Try adding this:
header("Content-Length: 0");
before your flush();
http://us2.php.net/ignore_user_abort
Be very careful using this, you can pretty quickly kill a server by abusing it.
Alternatively.... instead of messing with dynamically generating files on the fly... why not make a handler like so:
tempFile.php?key={md5 hash}
tempFile.php then either queries a DB, memcache ( with additional prepended key ), or apc for the content.
You can try doing something like this:
<iframe src="<?=$url?>"></iframe>
....
<?
sleep(10);
unlink($url);
?>
Another option is to use curl: you load the file in the request and display it to the user.
A question: do you want to delete the file so that the user cannot keep it? I'm afraid that's impossible. When the user loads the file it is temporarily stored in their browser, so they can save it.
Another option: if you know the type of the file, you can generate a Content-Type header so the user downloads the file, and then delete it.
These are just simple ideas; I don't know which will work for you (if any :) ).
If you want to implement your original design, read this question about running a command in PHP that is "fire and forget": Asynchronous shell exec in PHP
As seen at \Symfony\Component\HttpFoundation\Response::send
/**
* Sends HTTP headers and content.
*
* @return Response
*
* @api
*/
public function send()
{
$this->sendHeaders();
$this->sendContent();
if (function_exists('fastcgi_finish_request')) {
fastcgi_finish_request();
} elseif ('cli' !== PHP_SAPI) {
// ob_get_level() never returns 0 on some Windows configurations, so if
// the level is the same two times in a row, the loop should be stopped.
$previous = null;
$obStatus = ob_get_status(1);
while (($level = ob_get_level()) > 0 && $level !== $previous) {
$previous = $level;
if ($obStatus[$level - 1]) {
if (version_compare(PHP_VERSION, '5.4', '>=')) {
if (isset($obStatus[$level - 1]['flags']) && ($obStatus[$level - 1]['flags'] & PHP_OUTPUT_HANDLER_REMOVABLE)) {
ob_end_flush();
}
} else {
if (isset($obStatus[$level - 1]['del']) && $obStatus[$level - 1]['del']) {
ob_end_flush();
}
}
}
}
flush();
}
return $this;
}
You're going about this the wrong way. You can create the file and serve it to them, and delete it in one step.
<?php
$file_contents = 'these are the contents of your file';
$random_filename = md5(time()+rand(0,10000)).'.txt';
$public_directory = '/www';
$the_file = $public_directory.'/'.$random_filename;
file_put_contents($the_file, $file_contents);
echo file_get_contents($the_file);
unlink($the_file);
?>
If you do it that way, the files get deleted immediately after the user sees them. Of course, this means that the file need not exist in the first place. So you could shorten the code to this:
<?php
$file_contents = 'these are the contents of your file';
echo $file_contents;
?>
It all depends on where you're getting the content you want to show them. If it's from a file, try:
<?php
$file_contents = file_get_contents($filename_or_url);
echo $file_contents;
?>
As for deleting files automatically, just set up a cron job that runs every minute (cron cannot schedule anything more often than that) and deletes all the files in your temp folder where time() - filemtime($filename) is greater than 5 minutes' worth of seconds.
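The script run by that cron job could be as small as the sketch below. The function name, the directory argument and the 300-second default are all assumptions for illustration; it only looks at regular files directly inside the given folder.

```php
<?php
// Hypothetical helper: delete regular files in $dir that were last modified
// more than $maxAge seconds ago. Returns the number of files removed.
function purge_old_files($dir, $maxAge = 300)
{
    $deleted = 0;
    $now = time();
    // glob() returns false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/*') ?: array() as $path) {
        if (is_file($path) && $now - filemtime($path) > $maxAge) {
            if (unlink($path)) {
                $deleted++;
            }
        }
    }
    return $deleted;
}
```

Five minutes is a guess at a safe margin; slow clients may need longer before their file disappears.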