I'm using PHP with wget to download mp4 files from another server:
exec("wget -P files/ $http_url");
but I couldn't find any option to check whether the file downloaded correctly or not.
I tried to get the file's duration using getID3(), but it always returns a good value, even if the file did not download correctly:
// Check file duration
$file = $getID3->analyze($filepath);
echo $file['playtime_string']; // 15:00, always a good value
Is there any function to check that?
Thanks
First off I would try https instead. If the server(s) you're connecting to happen to support it, you get around this entire issue, because lost bytes are usually caused by flaky hardware or bad MTU settings on a router on their network. HTTP connections gracefully degrade, giving you as much of the file as the server could manage, whereas HTTPS connections simply fail when they lose bytes, because you can't decrypt non-intact packets.
Lazy IT people tend to get prodded to fix complete failures of https, but they get less pressure to diagnose and fix corner cases like missing bytes that only occur in larger transactions over http.
If https is not available, keep reading.
An HTTP server response may include a Content-Length header indicating the number of bytes in a particular transaction.
If the header is there, you should be able to see it by running wget directly, adding the -v flag.
If it's not there, I believe wget will report Length: unspecified followed by the content-type header's value.
If it tells you (and assuming the byte count is accurate), then you can just compare the byte count of the file you got with the one in the transaction.
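If the server does send the header, the comparison can be sketched in PHP like this. It assumes $http_url and $filepath are the same variables as in the question; also note that get_headers() issues a second request, so a file that changed between the two requests could in principle report a different length:

```php
<?php
$http_url = 'http://example.com/video.mp4'; // assumed: the source URL
$filepath = 'files/video.mp4';              // assumed: where wget saved it

// Ask the server for its headers only; the second argument makes
// get_headers() return an associative array.
$headers = get_headers($http_url, 1);

// Header names may vary in case; normalise before looking one up.
$headers  = array_change_key_case($headers, CASE_LOWER);
$expected = isset($headers['content-length'])
    ? (int) $headers['content-length']
    : null;

if ($expected === null) {
    echo "No Content-Length header; cannot verify by size.\n";
} elseif (filesize($filepath) === $expected) {
    echo "Byte counts match; the download looks complete.\n";
} else {
    echo "Size mismatch; the download is probably truncated.\n";
}
```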
If the server(s) you're contacting don't provide this header, you're left with less exact methods, like finding some player that will basically play the file until it ends, then seeing how long it took, and comparing that to the length listed in the ID3 tag (which is at the very beginning of the file). You won't get an exact match, though, because the time in the tag (if it's there) is only accurate to the second, meaning half a second could be gone from the end of the file and you wouldn't know.
Related
We are using ImageMagick for resizing/thumbnailing JPGs to a specific size. The source file is loaded via HTTP. It's working as expected, but from time to time some images are partially broken.
We already tried different software like GraphicsMagick or VIPS, but the problem is still there. It also only seems to happen if there are parallel processes. So the whole script is locked via semaphores, but that does not help either.
We found multiple similar problems, but all without any solution: https://legacy.imagemagick.org/discourse-server/viewtopic.php?t=22506
We also wonder why the behaviour is the same in all of these tools. We also tried different PHP versions. It seems to happen more often on source images with huge dimensions/filesizes.
Any idea what to do here?
Example 1 Example 2 Example 3
I would guess the source image has been truncated for some reason. Perhaps something timed out during the download?
libvips is normally permissive, meaning that it'll try to give you something even if the input is damaged. You can make it strict with the fail flag (i.e. fail on the first warning).
For example:
$ head -c 10000 shark.jpg > truncated.jpg
$ vipsthumbnail truncated.jpg
(vipsthumbnail:9391): VIPS-WARNING **: 11:24:50.439: read gave 2 warnings
(vipsthumbnail:9391): VIPS-WARNING **: 11:24:50.439: VipsJpeg: Premature end of JPEG file
$ echo $?
0
I made a truncated jpg file, then ran thumbnail. It gave a warning, but did not fail. If I run:
$ vipsthumbnail truncated.jpg[fail]
VipsJpeg: Premature end of input file
$ echo $?
1
Or in php:
$thumb = Vips\Image::thumbnail('truncated.jpg[fail]', 128);
Now there's no output, and there's an error code. I'm sure there's an imagemagick equivalent, though I don't know it.
There's a downside: thumbnailing will now fail if there's anything wrong with the image, and it might be something you don't care about, like invalid resolution.
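If you are calling libvips from PHP via the php-vips binding, the strict load can be wrapped in a try/catch. This is a sketch, assuming the Jcupitt\Vips package is installed via Composer and a file named truncated.jpg exists:

```php
<?php
require 'vendor/autoload.php'; // composer require jcupitt/vips

use Jcupitt\Vips;

try {
    // [fail] aborts the load on the first warning, e.g. a truncated file
    $thumb = Vips\Image::thumbnail('truncated.jpg[fail]', 128);
    $thumb->writeToFile('thumb.jpg');
} catch (Vips\Exception $e) {
    // e.g. "VipsJpeg: Premature end of input file"
    error_log('rejecting damaged source: ' . $e->getMessage());
}
```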
After some additional investigation we discovered that the source image was indeed already damaged. It was downloaded via a VPN connection which was not stable enough. Sometimes the download stopped, so the JPG was only half written.
I have the following setup:
Some files are dynamically generated depending on a few session parameters. Since they do not vary much, I allow caching in proxies/browsers. The files get an ETag on their way out, and at first glance the whole web application reacts correctly: files are served correctly depending on the session situation, and traffic is saved.
And then this erroneous behavior:
But on closer inspection, I found that in the case of a 304 for those dynamically generated files, Apache wrongly sends a "Connection: close" header instead of the normally sent "Connection: Keep-Alive". What it should do is simply not manipulate anything concerning "Connection".
I cannot find any point at which to pinpoint the cause of this behavior: nowhere in the Apache config files is anything written except one single line in one single file where Apache is instructed to send a keep-alive, which it does, as long as it is not sending a 304 response for a dynamically generated file. Nowhere in PHP do I instruct it to send anything other than keep-alives (and the latter only to try to counter the Connection: close).
Apache does not do this when it serves "normal" (non-dynamic) files (with 304 answers). So I assume that maybe the PHP kernel is the one interfering here without being asked. But then an added "Header set Connection 'Keep-Alive'" in the Apache config, which I also added to counter the closing of the connection, does not work either. Normally, when you put such a header-set rule (not of the "early" type) in the Apache config, the rule takes effect AFTER finalization of any subordinate work on the requested document (thus AFTER finalization of the PHP output). But in my case nothing happens, at least in the case of a 304 response. In all other cases everything works normally and correctly.
Since some other files also go over the line during a page request, I would appreciate getting Apache to stop these connection closures.
Does anybody have an idea what to do about this behavior?
P.S.: One day (and a good night's sleep) later, things are clearing up:
The culprit in this case was a shortsightedly (on my behalf) copied example snippet, which had "HTTP/1.>>>0<<< 304" (note the zero!) in it.
This protocol version number gets (correctly) post-processed by Apache (after everything else, including any Apache module's work, has finished), in that it decides not to send "Connection: Keep-Alive" over the wire, since that feature did not exist in HTTP/1.0.
The problem in this case was first getting to the point of realizing that everything inside PHP and the Apache modules worked correctly, so something in their outer environment must have been wrong, and then shifting the view to anything in the code that could possibly influence that outer environment (e.g. the protocol version).
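The bug described above can be sketched in PHP terms (the exact snippet isn't quoted here, so take this as illustrative): hard-coding the HTTP/1.0 status line is what makes Apache suppress keep-alive, while reusing the request's own protocol version avoids it.

```php
<?php
// Wrong: the copied snippet pinned the protocol to HTTP/1.0, so Apache
// (correctly) refuses to send "Connection: Keep-Alive" with the 304.
header('HTTP/1.0 304 Not Modified');

// Better: reuse the protocol version the client actually spoke...
header(($_SERVER['SERVER_PROTOCOL'] ?? 'HTTP/1.1') . ' 304 Not Modified');

// ...or let PHP pick the right status line itself (PHP >= 5.4):
http_response_code(304);
```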
I am having the following problem:
I am running a BIG memory process, but I have divided the memory load into smaller chunks, so there is no CPU time-out issue.
On the server I am creating .xml files of around 100 KB each, and 100+ of them will be created.
Now the main problem is that the browser shows a response time-out, and IE (just above the status bar) shows a .php file download message.
During this, the backend (server-side) process is still running and continuously creating .xml files in incremental order. So there is no issue with that.
I have following php.ini configuration.
max_execution_time = 10000 ; Maximum execution time of each script, in seconds
max_input_time = 10000 ; Maximum amount of time each script may spend parsing request data
memory_limit = 2000M ; Maximum amount of memory a script may consume (128MB)
; Maximum allowed size for uploaded files.
upload_max_filesize = 2000M
I am running my site in IE. And I am using ZSCE with PHP 5.3.
Can anybody point me in the right direction on this issue?
Edit:
I uploaded an image of the time-out; that is when IE asks to download the .php file.
Edit 2:
Let me briefly explain my execution flow:
I have one PHP file with objects of class hierarchies which start executing Function1() from each class hierarchy.
I have a class file.
First, let's say, Function1() is executed, which contains the logic for creating the XML files in chunks.
Second, let's say, Function2() is executed, which displays the output generated by Function1().
All is done in a class-hierarchy manner. So I can't terminate the execution of Function1() in between until it has finished. Only after that will Function2() be called.
Edit 3:
This is specially for #hakre.
As you asked some cross-questions, and I agree with some points, let me describe the issue in more detail.
First, I was loading 100+ MB XML files at a time, and that's why the memory in my local setup was filling up, stopping everything on the machine, and the CPU was using most of its resources.
I then divided these big XML files into small ones (meaning I now load a single XML file at a time and unload it after use). This saved me from the memory overload and CPU issue on the local setup.
Now my backend process runs with no CPU or memory issue, but the issue is the browser timeout. I even tried cURL, but with my current structure it does not seem to fit because of my class hierarchy. I have a set of classes in a hierarchy, and they all execute their Process functions first and then their Output functions. So unless and until the Process functions have executed, the Output functions do not come into the picture, and that's why the browser shows a timeout.
I even followed the instructions suggested by #vortex and had a little success, but not what I am looking for. The reason I could not implement cURL is that my Process function creates the required XML files in one go, so it takes too much time before any output reaches the browser. As the Process function takes that much time, no output can be sent to the client until it has completed.
cURL Output:
URL....: myurl
Code...: 200 (0 redirect(s) in 0 secs)
Content: text/html Size: -1 (Own: 433) Filetime: -1
Time...: 60.437 Start # 60.437 (DNS: 0 Connect: 0.016 Request: 0.016)
Speed..: Down: 7 (avg.) Up: 0 (avg.)
Curl...: v7.20.0
Contents of test.txt file
* About to connect() to mylocalhost port 80 (#0)
* Trying 127.0.0.1... * connected
* Connected to mylocalhost (127.0.0.1) port 80 (#0)
> GET myurl HTTP/1.1
Host: mylocalhost
Accept: */*
< HTTP/1.1 200 OK
< Date: Tue, 06 Aug 2013 10:01:36 GMT
< Server: Apache/2.2.21 (Win32) mod_ssl/2.2.21 OpenSSL/0.9.8o
< X-Powered-By: PHP/5.3.9-ZS5.6.0 ZendServer
< Set-Cookie: ZDEDebuggerPresent=php,phtml,php3; path=/
< Cache-Control: private
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Connection #0 to host mylocalhost left intact
* Closing connection #0
Disclaimer: an answer for this question was chosen based on the first small success with the selected answer. The solution from #hakre is also feasible when this type of question occurs. Right now no answer fixed my problem entirely, only a little. Hakre's answer is also more detailed, in case anyone is looking for more information about this type of issue.
Assuming you have made all the server-side modifications so you dodge a server timeout (I saw pretty much everything explained above), in order to dodge the browser timeout it is crucial that you do something like this:
<?php
set_time_limit(0);
error_reporting(E_ALL);
ob_implicit_flush(TRUE);
ob_end_flush();
I can tell you from experience that Internet Explorer doesn't have any issues as long as you output some content to it every now and then. I run a 30 GB database update every day (it takes around 2-4 hours), and Opera seems to be the only browser that ignores the content output.
If you don't set "ob_implicit_flush" you need to do an "ob_flush()" after every piece of content.
References
ob_implicit_flush
ob_flush
If you don't use ob_implicit_flush at the top of your script as I wrote earlier, you need to do something like:
<?php
echo 'dummy text or execution stats';
ob_flush();
within your execution loop
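Putting the snippets above together, the whole pattern can be sketched like this; create_xml_chunk() is a hypothetical stand-in for whatever Function1() does per file, not a real function from the question:

```php
<?php
set_time_limit(0);          // no server-side execution limit
error_reporting(E_ALL);
ob_implicit_flush(true);    // flush automatically after every output
while (ob_get_level() > 0) {
    ob_end_flush();         // remove any buffering layers in the way
}

// Hypothetical stand-in for one step of Function1()
function create_xml_chunk(int $i): void
{
    file_put_contents("chunk_$i.xml", "<chunk>$i</chunk>");
}

for ($i = 1; $i <= 100; $i++) {
    create_xml_chunk($i);
    // Any output, even whitespace, keeps the browser from timing out
    echo "generated file $i of 100<br>\n";
}
```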
1. I am running BIG memory process but have divided memory load into smaller chunks so no CPU time out issue.
Now that's a wild guess. How did you find out it was a CPU time-out issue in the first place? Did you even? If yes, what does your test give now? If not, how do you now test that this is not a time-out issue?
Although you state there won't be a certain issue, you don't prove it, and many questions are still open. That invites guessing, which is counter-productive for trouble-shooting (which is what you are doing here).
What you write here just means that you wrote code to chunk memory; however, this is not a test for CPU time-out issues. One part is writing code, the other part is testing. Don't mix the two. And don't draw wild assumptions; an issue exists only once a test shows it.
So much for your first point, just to show you that when doing troubleshooting you should look for facts (monitor, test, profile, step-debug), not run on assumptions. This is crucial, otherwise you will look in the wrong places and ask the wrong questions.
From what you describe of how the client (browser) behaves, this is not a time-out issue per se. The problem you've got is that the gap between the header response and the body response takes too long for your browser's taste. One browser assumes a time-out (such a boundary value has been triggered, and this looks more correct to me), and the other browser assumes something is coming, so why not save it.
So you merely have a processing issue here. Please consult the manual of your internet browsers (HTTP clients) for the configuration values you can change to alter this behavior. E.g. monitor with a curl request on the command line how long the request actually takes, then configure your browser not to time out when connecting to that server within the amount of time you just measured. For example, if you're using Internet Explorer: http://www.ehow.com/how_6186601_change-internet-timeout-options.html or if you're using Mozilla Firefox: http://forums.mozillazine.org/viewtopic.php?f=7&t=102322&start=0
As you didn't show any code on the server side, I assume you want to solve this problem with client settings. curl will help you measure the number of seconds such a request takes. Use the -v (verbose) switch to obtain detailed information about the request.
In case you don't want to solve this on the client, curl will still help you measure important data and easily reproduce any underlying server-related timing issue. So you should go for curl on the command line in any case, especially as looking into the response headers might reveal what triggers the (again) esoteric Internet Explorer behavior. Again, the -v switch reveals the request and response headers.
If you like to automate such tests with a PHP script, it's also possible with the PHP Curl Extension. This has been outlined in:
Php - Debugging Curl
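For example, a small sketch with the PHP cURL extension that measures the total request time; the URL is an assumption, substitute your own slow endpoint:

```php
<?php
$url = 'http://localhost/process.php';   // assumed: the slow endpoint

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the body
curl_setopt($ch, CURLOPT_TIMEOUT, 0);            // no client-side limit
curl_setopt($ch, CURLOPT_HEADER, true);          // keep response headers
curl_exec($ch);

// Total transaction time in seconds, comparable to what curl -v shows
$seconds = curl_getinfo($ch, CURLINFO_TOTAL_TIME);
curl_close($ch);

echo "request took {$seconds}s\n";
```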
The problem is with your web server, not the browser.
If you're using Apache, you need to adjust your Timeout value at httpd.conf or virtual hosts config.
You have 3 pages
Process - Creates the XML files and then updates a database value saying that the process is done
A PHP page that returns {true} or {false} based on the status of the process completion database value
An AJAX front end, polling page 2 every few seconds to check whether the process is done or not
Long Polling
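Page 2 of that scheme can be as small as this sketch. Here the completion flag is read from a file (the path is an assumption); a database column updated by the process page works the same way:

```php
<?php
// status.php: polled by the AJAX front end every few seconds.
header('Content-Type: application/json');

// The long-running process creates this flag file when it finishes
// (assumed path; a DB value set by page 1 is equivalent).
$done = file_exists('/tmp/process_done.flag');

echo json_encode(['done' => $done]);
```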
I have had this issue several times while reading a large CSV file and putting it into a database. I solved it by dividing the reading-and-inserting process into smaller parts. I created a new table to log how much data had been read and inserted, and the page then reloads itself and starts from that position. You can do the same by creating one XML file per attempt, then reloading the page and starting on the next one. This way the memory used by the browser is refreshed.
Hope it helps.
Is it possible to send some output to the browser from the script while it's still processing, even white space? If so, do it; it should reset the timeout counter.
If it's not possible, you have to increase the timeout of IE in the registry:
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings
You need ReceiveTimeout; if it's not there, create it as a DWORD and set the value in milliseconds.
What is a "CPU time out issue"?
The right way to solve the problem is to run the heavy stuff asynchronously, in a separate session group (not in the webserver's process tree).
Try including set_time_limit(0); in your PHP script.
The following links might help you.
http://php.net/manual/en/function.set-time-limit.php
http://php.net/manual/en/function.ignore-user-abort.php
I am importing public domain books from archive.org to my site, and have a php import script set up to do it. However, when I attempt to import the images and run
exec( "unzip $images_file_arg -d $book_dir_arg", $output, $status );
it will occasionally return a $status of 1. Is this OK? I have not had any problems with the imported images so far. I looked up the man page for unzip, but it didn't tell me much. Could this possibly cause problems? Do I have to check each picture individually, or am I safe?
EDIT: Oops. I should have checked the manpage straight away. It tells us what the error codes mean:
The exit status (or error level) approximates the exit codes defined by PKWARE and takes on the following values, except under VMS:

0: normal; no errors or warnings detected.
1: one or more warning errors were encountered, but processing completed successfully anyway. This includes zipfiles where one or more files was skipped due to an unsupported compression method or encryption with an unknown password.
2: a generic error in the zipfile format was detected. Processing may have completed successfully anyway; some broken zipfiles created by other archivers have simple work-arounds.
3: a severe error in the zipfile format was detected. Processing probably failed immediately.

(many more)
So, apparently some archives may have had files in them skipped, but unzip didn't break down; it just did all it could.
It really should work, but there are complications with certain filenames. Are any of them potentially tricky, with unusual characters? escapeshellarg is certainly something to look into. If you get a bad return status, you should be concerned, because it means unzip exited with some error or other. At the very least, I would suggest you log the filenames in those cases (error_log($filename)) and see if there is anything that might cause problems. unzip itself runs totally independently of PHP and will do everything fine if it's getting passed the right arguments by the shell and the files really are downloaded and ready to unzip.
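The quoting and logging suggested above can be sketched like this; $images_file and $book_dir are assumed to hold the raw (unquoted) paths, unlike the pre-quoted *_arg variables in the question:

```php
<?php
$images_file = 'images.zip';   // assumed: the downloaded archive
$book_dir    = 'book/';        // assumed: the extraction target

$cmd = sprintf(
    'unzip %s -d %s',
    escapeshellarg($images_file),   // protects against spaces, quotes, etc.
    escapeshellarg($book_dir)
);
exec($cmd, $output, $status);

if ($status === 1) {
    // warnings only: files may have been skipped, but unzip finished
    error_log("unzip warnings for $images_file");
} elseif ($status > 1) {
    // real error: treat this archive as suspect
    error_log("unzip failed ($status) for $images_file");
}
```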
Maybe you are better served by PHP's integrated ZipArchive class.
http://www.php.net/manual/de/class.ziparchive.php
Especially http://www.php.net/manual/de/function.ziparchive-extractto.php: it returns TRUE if extraction was successful, otherwise FALSE.
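For instance, a sketch using ZipArchive instead of exec(); $images_file and $book_dir stand in for the paths from the question:

```php
<?php
$images_file = 'images.zip';   // assumed: the downloaded archive
$book_dir    = 'book/';        // assumed: the extraction target

$zip = new ZipArchive();

// open() returns true on success, or an int error code on failure
if ($zip->open($images_file) === true) {
    if (!$zip->extractTo($book_dir)) {
        error_log("extraction failed for $images_file");
    }
    $zip->close();
} else {
    error_log("could not open $images_file");
}
```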
WARNING: This is a possible exploit. Do not run directly on your server if you're not sure what to do with this.
http://pastehtml.com/view/1b1m2r6.txt
I believe this was uploaded via an insecure upload script. How do I decode and decompress this code? Running it in the browser might execute it as a shell script, open up a port, or something.
I can do a base64 decode online, but I couldn't really decompress it.
So there's a string. It's gzipped and base64 encoded, and the code decodes the base64 and then uncompresses it.
When that's done, I am left with this:
<? eval(base64_decode('...')); ?>
Another layer of base64, which is 720440 bytes long.
Now, base64 decoding that, we have 506961 bytes of exploit code.
I'm still examining the code, and will update this answer when I have more understanding. The code is huge.
Still reading through the code; the (very well-done) exploit exposes these tools to the hacker:
TCP backdoor setup
unauthorised shell access
reading of all htpasswd, htaccess, password and configuration files
log wiping
MySQL access (read, write)
append code to all files matching a name pattern (mass exploit)
RFI/LFI scanner
UDP flooding
kernel information
This is probably a professional PHP-based server-wide exploit toolkit, and seeing as it's got a nice HTML interface and the whole lot, it could be easily used by a pro hacker, or even a script kiddie.
This exploit is called c99shell (thanks Yi Jiang) and it turns out to have been quite popular, being talked about and running for a few years already. There are many results on Google for this exploit.
Looking at Delan's decoded source, it appears to be a full-fledged backdoor providing a web interface that can be used to control the server in various ways. Telling fragments from the source:
echo '<center>Are you sure you want to install an IP:Port proxy on this
website/server?<br />
or
<b>Mass Code Injection:</b><br><br>
Use this to add PHP to the end of every .php page in the directory specified.
or
echo "<br><b>UDP Flood</b><br>Completed with $pakits (" .
round(($pakits*65)/1024, 2) . " MB) packets averaging ".
round($pakits/$exec_time, 2) . " packets per second \n";
or
if (!$fp) {echo "Can't get /etc/passwd for password-list.";}
I'd advise you to scrub that server and reinstall everything from scratch.
I know Delan Azabani has done this, but just so you actually know how he got the data out:
Just in case you're wondering how to decompress this, use base64 -d filename > output to decode base64 strings and gunzip file.name.gz to decompress gzipped data.
The trick is in recognising whether what you've got is base64 or gzip data and decoding the right bits.
This way it goes absolutely nowhere near a JS parser or PHP parser.
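The same two steps can be done from PHP while still never executing the payload; this sketch only writes the decoded result to disk for reading. Which gz* function applies depends on how the payload was compressed: gzuncompress() handles zlib-wrapped data, while raw deflate needs gzinflate() instead. The input filename is an assumption:

```php
<?php
// Decode one layer: base64, then zlib. Swap gzuncompress() for
// gzinflate() if the blob turns out to be raw deflate data.
function decode_stage(string $blob): string
{
    return gzuncompress(base64_decode($blob));
}

// encoded.txt is assumed to hold the base64 string from the exploit.
// Write the result out to read it; do NOT eval() or include it.
// file_put_contents('stage2.php', decode_stage(file_get_contents('encoded.txt')));
```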
First, replace the eval with an echo to see what code it would execute if you let it.
Send the output of that script to another file, say, test2.php.
In that file, do the same trick again. Run it, and it will output the complete malicious program (it's quite a beast), ~4k lines of hacker's delight.
This is code for a PHP shell.
To decode it:
Replace eval("?>". with print(
Run this:
php5 file.php > file2.php
Then replace eval with print again and run it in the browser: http://localhost/file2.php