PHP - check if file is archived without using handle resource

Is there a way to check whether content received from a remote server is a .zip archive? I found many solutions here, but all of them are for local files, so they rely on file handle resources (fopen, fgets, etc.).
Put simply, curl is used to get content from a remote server. Depending on the validation results on the remote server (which I don't have access to), it returns either a .zip archive or an error message in plain text. If a .zip archive is returned, the content I receive from curl looks like:
PK
A‘OCBÇ—= =
and many more symbols here....
Then I simply use the fwrite function to write the received content into a local file. Naturally, if no archive is returned, the content looks like:
some random message here
This is all I have. So, is there any way to check if returned content is a .zip archive? Sure, I could use a solution like:
if (stristr($content, "some random message here")) {//not zip archive}
but this is lame...

After some more research, I found out that zip archives should start with "PK", so I wrote the following code (in case someone has a similar problem in the future):
if (substr($content, 0, 2) == 'PK' && strlen($content)>100000) {echo "this is zip";}
This check is not the greatest choice (because you need to trust the remote server), but it's better than nothing. I also added a strlen check to verify that the returned content is "big enough" (over 100,000 bytes, roughly 100 KB) to be a zip archive.
If someone has a better solution that works without using a file handle resource, I'll be happy to hear it.
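A slightly stricter variant of the "PK" check could compare the first four bytes against the record signatures from the ZIP format, still without touching a file handle. This is only a sketch: isZipSignature is a hypothetical helper name, and it confirms the prefix only, not that the archive is complete or uncorrupted.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when $content starts with one of the ZIP record signatures.
 * (isZipSignature is a hypothetical helper name, not from the original post.)
 */
function isZipSignature(string $content): bool
{
    $signatures = [
        "PK\x03\x04", // local file header: a normal archive
        "PK\x05\x06", // end of central directory: an empty archive
        "PK\x07\x08", // marker used at the start of spanned archives
    ];

    // Compare only the first four bytes of the in-memory string.
    return in_array(substr($content, 0, 4), $signatures, true);
}

// Usage with the curl response body held in $content (sample bytes here):
$content = "PK\x03\x04" . str_repeat("\0", 26);
echo isZipSignature($content) ? "this is zip\n" : "not a zip\n";
```

A plain-text error message would have to begin with "PK" followed by those exact control bytes to pass, which makes the size check less important.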

If you are on a Linux or *BSD server you can do something like this:
$result = shell_exec('file cccr_logo.zip');
echo 'result:['.$result.']';
Which should spit out something like this:
result:[cccr_logo.zip: Zip archive data, at least v2.0 to extract ]
If you want to test against a standard $result, you can use the -b switch like so:
$result = shell_exec('file -b cccr_logo.zip');
echo 'result:['.$result.']';
And now a valid zip will be (no file name):
result:[Zip archive data, at least v2.0 to extract ]
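The file command needs the content on disk first. If PHP's fileinfo extension is available (an assumption about the server), a similar detection can be run directly on the string returned by curl. A minimal sketch, with detectMimeType as a hypothetical helper name:

```php
<?php
declare(strict_types=1);

/**
 * Detects the MIME type of an in-memory string with the fileinfo extension.
 * (detectMimeType is a hypothetical helper name.)
 */
function detectMimeType(string $content): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->buffer($content);

    // buffer() returns false on failure; fall back to a generic type.
    return $mime === false ? 'application/octet-stream' : $mime;
}

// Usage with the curl response body held in $content:
$content = 'some random message here';
if (detectMimeType($content) === 'application/zip') {
    echo "this is zip\n";
} else {
    echo "not a zip: " . detectMimeType($content) . "\n";
}
```

Like file, this relies on the libmagic database, so the exact type reported for unusual archives may differ between systems.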

Related

Get contents from gzipped file

I am trying to get the contents of a gzipped file that is returned to me after a batch operation request to the Mailchimp API. I expect to get only a JSON string as the response, but I also receive a bunch of numbers and random (?) strings.
This is what I do.
$gz = gzopen($response->response_body_url, "r");
$contents = trim(gzread($gz, 10000));
print_r($contents); //see output below
gzclose($gz);
This is what is returned to me.
0000777000000000000000000000000012705141572007721 5ustar
rootroot./05fa27ceab.json0000666000000000000000000000121212705141572012327
0ustar
rootroot[{"status_code":400,"operation_id":null,"response":"{\"type\":\"http://developer.mailchimp.com/documentation/mailchimp/guides/error-glossary/\",\"title\":\"Member
Exists\",\"status\":400,\"detail\":\"xxxx.xxxx#xxxx.xx is
already a list member. Use PUT to insert or update list
members.\",\"instance\":\"\"}"},{"status_code":400,"operation_id":null,"response":"{\"type\":\"http://developer.mailchimp.com/documentation/mailchimp/guides/error-glossary/\",\"title\":\"Member
Exists\",\"status\":400,\"detail\":\"xxxx2.xxxx2#xxxx2.xx is
already a list member. Use PUT to insert or update list
members.\",\"instance\":\"\"}"}]
What am I missing here? Why won't it work?

It looks like you may be dealing with a .tar.gz file instead of plain gzip. The easiest way to handle that is either with PHP's PharData class or by saving the file to disk and using a shell tool to extract it.
Here's an answer to a question on how to deal with .tar.gz files in PHP
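As a rough sketch of the PharData route, assuming the archive has already been downloaded to a local path ending in .tar.gz (PharData works on local files and relies on the file extension to recognise the format). readJsonEntries is a hypothetical helper name, and whether a given Mailchimp archive opens cleanly this way is untested here.

```php
<?php
declare(strict_types=1);

/**
 * Returns the contents of every .json entry in a .tar.gz archive,
 * keyed by entry file name. (readJsonEntries is a hypothetical helper name.)
 */
function readJsonEntries(string $archivePath): array
{
    $entries = [];
    $archive = new PharData($archivePath);

    // PharData is a recursive directory iterator over the archive entries.
    foreach (new RecursiveIteratorIterator($archive) as $file) {
        $name = $file->getFilename();
        if (substr($name, -5) === '.json') {
            $entries[$name] = $file->getContent();
        }
    }

    return $entries;
}

// Usage: download the batch result first, then read it from disk.
// $localPath = sys_get_temp_dir() . '/batch.tar.gz';
// copy($response->response_body_url, $localPath);
// foreach (readJsonEntries($localPath) as $name => $json) {
//     print_r(json_decode($json, true));
// }
```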

Php count number of pages on PDF file upon upload prior to saving file

I have a function that uploads a file to web storage. Prior to saving the file on the storage system, if the file is a PDF I would like to determine how many pages it has.
Currently I have the following:
$pdftext = file_get_contents($path);
$num = preg_match_all("/\/Page\W/", $pdftext, $dummy);
return $num;
Where $path is the temporary path that I use with fopen to open the document.
This function works at times but is not reliable. I know there's also this function:
exec('/usr/bin/pdfinfo '.$pdf_file.' | awk \'/Pages/ {print $2}\'', $output);
But this requires the file to be downloaded onto the server. Any ideas or suggestions to accomplish this?

PHP is a server-side language, meaning all processing happens on your server. There's no way for PHP to determine details of a file on the client side; it has neither knowledge of the file nor the required access to it.
So the answer to your question as it stands is: it's not possible. But you probably have a goal in mind for why you want to check this, and sharing that goal might help you get more constructive answers/suggestions.

As Oldskool already explained, this is not possible with PHP on the client side. You would have to upload the PDF file to the server and then determine the number of pages. There are libraries and command line tools that could accomplish this.
In case you don't want to upload the PDF file to the server (which seems to be the case here), you could use the pdf.js library. The client is then able to determine the number of pages in a PDF document on its own.
PDFJS.getDocument(data).then(function (doc) {
var numPages = doc.numPages;
});
There are other libraries as well, but I'm not certain about their browser support (http://www.electronmedia.in/wp/pdf-page-count-javascript/)
Now you just submit the number of pages from JavaScript to the PHP file that needs this information. To achieve this you simply use Ajax. In case you don't know Ajax, just google it; there are enough examples out there.
As a side note: always remember not to trust the client. The client is able to modify the page count and send a completely different one.

For those of you running Linux servers, this actually is possible. You need the pdfinfo command-line tool installed; then the call
$pages = exec('/usr/bin/pdfinfo '.escapeshellarg($pdf_file).' | awk \'/Pages/ {print $2}\'', $output);
returns the correct page count, where $pdf_file is the temporary path on the server upon upload. (escapeshellarg keeps a path containing spaces or shell metacharacters from breaking the command.)
The reason it wasn't working for me was that I didn't have pdfinfo installed.
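If awk is not available, the "Pages:" line can also be picked out of the pdfinfo output in PHP itself. A minimal sketch, assuming pdfinfo lives at /usr/bin/pdfinfo as in the command quoted in this thread; parsePageCount and countPdfPages are hypothetical helper names.

```php
<?php
declare(strict_types=1);

/**
 * Extracts the page count from pdfinfo's text output, or null if absent.
 * (parsePageCount and countPdfPages are hypothetical helper names.)
 */
function parsePageCount(string $pdfinfoOutput): ?int
{
    // pdfinfo prints one "Key:   value" pair per line, e.g. "Pages:   12".
    if (preg_match('/^Pages:\s+(\d+)/m', $pdfinfoOutput, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

function countPdfPages(string $pdfFile): ?int
{
    // escapeshellarg() guards against spaces and shell metacharacters in the path.
    $command = '/usr/bin/pdfinfo ' . escapeshellarg($pdfFile) . ' 2>/dev/null';
    $output = shell_exec($command);

    return is_string($output) ? parsePageCount($output) : null;
}

// Usage inside an upload handler (the field name "userfile" is an assumption):
// $pages = countPdfPages($_FILES['userfile']['tmp_name']);
```

Returning null rather than 0 lets the caller tell "pdfinfo failed or is missing" apart from a real count.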

How to determine file_get_contents structure before reading through the file?

I am trying to fetch the meta information from URL results passed after a search. I have been using the OpenGraph library and also PHP's native get_meta_tags function to retrieve the meta tags.
My problem occurs when I am reading through the contents of a URL that happens to have a .m4v extension. The program tries to read the contents of that file, but it is way too large (not to mention completely useless, as it is all junk) and my program refuses to let it go. Therefore, I am stuck until the program throws a timeout error and moves on.
Is there any way to stop reading the contents of the file if it is way too large? I tried file_get_contents() with the maxlen parameter, but it still seems to read through the entire page. How can I quickly determine if a file is structured with tags before I dive in to farm it for meta?

get_headers() is what you need; the response includes Content-Type and Content-Length headers that you might be interested in.
You might want to:
$headers=get_headers($url,1);
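One way to act on those headers is sketched below. It assumes that only text/html responses are worth parsing for meta tags and uses an arbitrary 1 MB size cap; isFetchableHtml and both of those choices are illustrative, not part of the original answer.

```php
<?php

/**
 * Decides from response headers whether a URL is worth downloading.
 * (isFetchableHtml and the 1 MB default are illustrative assumptions.)
 */
function isFetchableHtml(array $headers, int $maxBytes = 1048576): bool
{
    // Header names are case-insensitive, so normalise the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);

    // After redirects get_headers() may hold an array per header; use the last value.
    $last = static function ($value) {
        return is_array($value) ? end($value) : $value;
    };

    $type = strtolower((string) $last($headers['content-type'] ?? ''));
    $length = $last($headers['content-length'] ?? null);

    if (strpos($type, 'text/html') !== 0) {
        return false; // e.g. a video response has no meta tags to read
    }

    // A missing Content-Length is allowed; a known oversized body is not.
    return $length === null || (int) $length <= $maxBytes;
}

// Usage:
// $headers = get_headers($url, true);
// if ($headers !== false && isFetchableHtml($headers)) {
//     $tags = get_meta_tags($url);
// }
```

Servers are free to omit or misreport these headers, so a request timeout is still a sensible second line of defence.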

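Use PHP's filesize($yourFile) to find the file size in bytes (this works for local files; the http:// wrapper does not support stat(), so it will not work on a remote URL):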
$size = filesize($yourFile);
if ($size < 1000) {
$string = file_get_contents($yourFile);
}

Best Mime Type Method

What's the best way to determine the MIME type or file type of an upload, so that nothing malicious gets through and a bug doesn't get into your system?
In my example I need a way of screening uploads so that only .mp3 files are uploaded to the site. I know there is mime_content_type, but that gives odd results depending on how the file was made and what browser you use, seeing as it gets its data from the browser; at least that's how I understand it.
This is my code for identifying the file using its MIME type:
if(mime_content_type_new($_FILES["userfile"]) == 'audio/mpeg' ) { do stuff }
Then there is the option of using the Unix command line and interpreting the result:
$fileinfo = exec("file -b 'song.mp3'"); echo $fileinfo;
This outputs something like this:
Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1,
192 kbps, 44.1 kHz, Stereo
So we can search the output and check that it matches our file type:
$fileinfo = exec("file -b 'song.mp3'");
$filewewant = "MPEG";
$mpeg = stripos($fileinfo, $filewewant);
$filewewant = "layer III";
$mpeg3 = stripos($fileinfo, $filewewant);
if ($mpeg !== false && $mpeg3 !== false)
{ echo "success"; }
This way seems to work better, regardless of the named extension (e.g. if the file is renamed to .png), but it requires the file to be saved first and then inspected, and it doesn't work on Windows.
I've also been pointed at http://pear.php.net/package/MIME_Type
Does anyone have a better way of doing it? What is the correct way to identify what files are being uploaded to your server?

MIME types are (or should be) obtained by looking at the file's signature, a piece of data at the beginning of the file that indicates its type.
This is exactly what mime_content_type_new and your UNIX command are doing, so no issue there.
Not sure what you mean by a "better" way, you're doing it correctly.
If you are getting different MIME type results because of a browser issue, you should probably create an array of acceptable values and check it with the in_array() method.
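The in_array() idea might look like the sketch below. The list of MP3 MIME types is an assumption (different detectors report different strings), and isAllowedMime is a hypothetical helper name.

```php
<?php
declare(strict_types=1);

/**
 * Checks a detected MIME type against a whitelist.
 * (isAllowedMime is a hypothetical helper; the MP3 type list is an assumption.)
 */
function isAllowedMime(string $detected, array $allowed): bool
{
    // Drop parameters such as "; charset=binary" and normalise case.
    $mime = strtolower(trim(explode(';', $detected)[0]));

    return in_array($mime, $allowed, true);
}

$allowedMp3Types = ['audio/mpeg', 'audio/mp3', 'audio/x-mpeg'];

// Usage with a server-side detection of the uploaded temp file:
// $detected = (new finfo(FILEINFO_MIME))->file($_FILES['userfile']['tmp_name']);
// if ($detected !== false && isAllowedMime($detected, $allowedMp3Types)) {
//     /* do stuff */
// }
```

The value being checked should come from inspecting the file on the server, not from the type the browser sends with the upload.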
I wouldn't recommend leaving MIME type checks like that in the hands of client-side code, especially when security is a big issue. The client has access to the code so it is much easier to fool.
You could, however, do a check on both the client side and the server side. This will save you bandwidth from bad uploads, but still keep the system secure from malicious users.
Here's a nice tutorial on Javascript's FILE API and processing images with Javascript.
http://www.html5rocks.com/en/tutorials/file/dndfiles/
Cheers.

This may not be a bulletproof solution (it only works in new / current browsers), but the new JavaScript File API allows reading the MIME type without uploading the file.
For any server-side validation you have to upload the file, and you should validate uploads on the server.

extracting SWF gives compiler errors in adobe flash CS4

I have been given an SWF to edit a link in the AS code.
The fact is the SWF uses some XML that is generated (actually retrieved) by PHP code from a database.
menuXML.load("/sub/page/dynamic.php?genre=" + genre);
so the point is we can use the same SWF 'mainframe' and fill it with different animations/sources based on the link provided in dynamic.php?genre=###
Now, I've used Flash Decompiler Gold to extract all files in the SWF and can open it again in Adobe Flash to edit it. When done I press CTRL+ENTER and there are immediately 4 compiler errors:
1x < Unexpected 'if' encountered >
2x < Statement block must be terminated by '}' >
1x < Expected a field name after '.' operator. >
How can these errors be present when the original SWF works perfectly?
If I don't manage to solve this, I'll have to find out how to create a .php file for the SWF to use, which can select the proper resources (from a database, I guess) to show them (using ?genre=###)
Thanks!

I'm not sure I understand your problem, but it seems like you need to change the url passed to the load method. It also seems like your swf is Actionscript 2.0.
Decompilers sort of work, but the fla file you can generate with a decompiler will seldom be useful to generate the same swf back. Sometimes the code is illegal, and almost always the graphics are screwed.
I once had to make some simple code changes (like changing a few urls and other simple stuff) to a swf, for which we had no sources (they were lost and there was no backup...).
I used flasm for this and it worked fine (also it wasn't as hard as I first supposed).
Flasm is not a decompiler, but a disassembler. It takes your swf, parses the actionscript bytecode and generates a text file with assembly-like code. You can edit that code and re-assemble the swf. It doesn't touch graphics and animations, so it was what I needed, and perhaps could work for you.
I've made a little test and it worked fine.
I started with this code in a fla:
var xml:XML = new XML();
xml.ignoreWhite = true;
xml.onLoad = function(ok:Boolean):Void {
if(ok) {
_debug_txt.text = "ok";
_debug_txt.text = xml;
} else {
_debug_txt.text = "error";
}
};
xml.load("/sub/page/dynamic.php");
Next, I opened a cmd prompt (I'm on Windows), changed to the directory that contains the swf, and ran:
flasm -d test_flasm.swf > test_flasm.flm
This disassembles the swf into a text file test_flasm.flm. I have added flasm to my executables path, but you can just use the full path to the flasm.exe instead.
The relevant part of the .flm file looks like this:
setMember
push '/sub/page/dynamic.php', 1, 'xml'
getVariable
push 'load'
callMethod
Yours may vary, but if you look for the url, you'll find it. Next, I changed that url to:
setMember
push 'test.xml', 1, 'xml'
getVariable
push 'load'
callMethod
Then, I reassembled the swf using:
flasm -a test_flasm.flm
And now, test_flasm.swf loads "test.xml" instead of "/sub/page/dynamic.php".
Hope this helps.
