php file_get_contents downloading too much data

I notice that when I use file_get_contents I seem to be using more bandwidth than I should. For example:
file_get_contents('https://example.com', false, $ctx, 0, 99000);
Will cause my network RX to jump up by about 1.6 MB (just using ifconfig and comparing before and after)... I would think it should only jump by about 99 KB, because I've specified that with the 99000?

file_get_contents is a rather buggy function in PHP. Consider using curl and following this solution:
how to set a maximum size limit to php curl downloads
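Along the lines of that linked answer, here is a rough, untested sketch of capping a cURL download with a write callback; the URL and the 99000 limit are just the values from the question:
// Sketch: abort a cURL transfer once roughly $maxBytes have arrived.
// Returning fewer bytes than were passed to the write callback makes
// curl_exec() stop with a write error, so little more than the limit is transferred.
$maxBytes = 99000;                      // limit from the question
$body     = '';

$ch = curl_init('https://example.com'); // placeholder URL
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) use (&$body, $maxBytes) {
    $body .= $data;
    if (strlen($body) >= $maxBytes) {
        return 0;                       // abort the transfer (may overshoot by one chunk)
    }
    return strlen($data);               // keep going
});
curl_exec($ch);                         // returns false once we abort
curl_close($ch);

echo strlen($body) . " bytes buffered\n";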

Related

check if file is fully downloaded using wget

I'm calling wget from PHP to download mp4 files from another server:
exec("wget -P files/ $http_url");
but I couldn't find any option to check whether the file downloaded correctly or not.
I tried to get the file's duration using getID3(), but it always returns a good value, even if the file did not download completely:
// Check file duration
$file = $getID3->analyze($filepath);
echo $file['playtime_string']; // 15:00 always good value
Is there any function to check that?
Thanks
First off, I would try https instead. If the server(s) you're connecting to happen to support it, you get around this entire issue, because lost bytes are usually caused by flaky hardware or bad MTU settings on a router on their network. HTTP connections gracefully degrade, giving you as much of the file as could be managed, whereas HTTPS connections just plain fail when they lose bytes, because you can't decrypt non-intact packets.
Lazy IT people tend to get prodded to fix complete failures over https, but they get less pressure to diagnose and fix corner cases like missing bytes that only occur on larger transfers over http.
If https is not available, keep reading.
An HTTP server response may include a Content-Length header indicating the number of bytes in a particular transaction.
If the header is there, you should be able to see it by running wget directly, adding the -v flag.
If it's not there, I believe wget will report Length: unspecified followed by the content-type header's value.
If it tells you (and assuming the byte count is accurate) then you can just compare the byte count of the file you got and the one in the transaction.
If the server(s) you're contacting don't provide this header, you're left with less exact methods, like finding some player that will basically play the file until it ends, seeing how long that took, and comparing it to the length listed in the tag (which is at the very beginning of the file). You're not going to get an exact match though, because the time in the tag (if it's there) is only accurate to the second, meaning half a second could be missing from the end of the file and you wouldn't know.
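Putting the Content-Length check above into PHP, a rough sketch (variable names mirror the question; the header may be missing, or an array after redirects):
// Sketch: compare the reported Content-Length (if any) with what wget saved.
$http_url = 'http://example.com/video.mp4';   // placeholder
$filepath = 'files/video.mp4';                // placeholder

$headers = get_headers($http_url, 1);         // associative response headers
$length  = isset($headers['Content-Length']) ? $headers['Content-Length'] : null;
if (is_array($length)) {
    $length = end($length);                   // last value wins after redirects
}

if ($length === null) {
    echo "No Content-Length reported; fall back to a duration/playability check.\n";
} elseif ((int) $length === filesize($filepath)) {
    echo "Download looks complete ($length bytes).\n";
} else {
    echo "Size mismatch: have " . filesize($filepath) . " of $length bytes.\n";
}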

PHP: Save Dynamic URL Image to Disk

Having trouble capturing the following dynamic image to disk; all I get is a ~1 KB file:
http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER
I have set up PHP cURL and it works just fine on static imagery, but it does not work for the above link. The same goes for the copy function and file_put_contents (file_get_contents): they all work fine for a static image. There are plenty of references on SO for usage of these PHP functions, so I will not get into details here. Just the copy command:
copy('http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER', 'precip5.png');
The behavior is the same on my Windows development box and my Linux staging box (precip5.png comes out at 760 bytes), so I can rule out OS issues. Again, all the PHP functions do exactly the same thing: generate a file, but an effectively empty one. The command line curl program also generates that same junk ~1 KB file.
So the issue seems to be the source, and the best I can tell is that it is a dynamic (streaming?) image.
Ideally, I would like this to be done in PHP or some command line utility like curl. I am trying to avoid adding a Java (ImageIO) dependency just for this... until I absolutely have to go there...
I am trying to understand the nature of the beast (the image) first ;-)...
The URL you are saving produces HTML output, not the image. You are missing the &print=1 parameter:
http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER&print=1
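For completeness, a minimal sketch of the same copy() call with the extra parameter appended (output filename as in the question):
// Sketch: append &print=1 so the script returns the PNG itself
// instead of an HTML wrapper page.
$url = 'http://water.weather.gov/precip/save.php'
     . '?timetype=RECENT&loctype=NWS&units=engl'
     . '&timeframe=current&product=observed&loc=regionER&print=1';

if (!copy($url, 'precip5.png')) {
    echo "Download failed\n";
}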

faster php copy() function ( for remote ) like more connection?

I copy some YouTube videos to my server with the copy() function, but YouTube gives approximately 100 KB/s with one connection (my server has a 100 Mbit/s connection).
Is it possible to use more than one connection and reach a faster speed with the copy() function or something else?
Well, if you want to download/copy multiple URLs from an external source concurrently, curl_multi_exec is what you are looking for. It runs several cURL transfers simultaneously, at the expense of network bandwidth (and perhaps a little CPU).
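A rough, untested sketch of that approach (URLs and output filenames are placeholders):
// Sketch: fetch several URLs concurrently with curl_multi_* and stream
// each response straight to disk.
$urls = ['http://example.com/a.mp4', 'http://example.com/b.mp4'];

$mh      = curl_multi_init();
$handles = [];
foreach ($urls as $i => $url) {
    $fp = fopen("video_$i.mp4", 'wb');
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write body to the file handle
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_multi_add_handle($mh, $ch);
    $handles[] = [$ch, $fp];
}

// Drive all transfers until none are still running.
do {
    curl_multi_exec($mh, $active);
    curl_multi_select($mh);   // block until there is activity instead of spinning
} while ($active > 0);

foreach ($handles as $pair) {
    list($ch, $fp) = $pair;
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
    fclose($fp);
}
curl_multi_close($mh);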
If you are looking to split a download, once again, curl has an option for that: byte ranges. From the curl manual:
RANGES
With HTTP 1.1 byte-ranges were introduced. Using this, a client can request
to get only one or more subparts of a specified document. Curl supports
this with the -r flag.
Get the first 100 bytes of a document:
curl -r 0-99 http://www.get.this/
Get the last 500 bytes of a document:
curl -r -500 http://www.get.this/
Curl also supports simple ranges for FTP files as well. Then you can only
specify start and stop position.
Get the first 100 bytes of a document using FTP:
curl -r 0-99 ftp://www.get.this/README
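The PHP equivalent of that -r flag is CURLOPT_RANGE; a small sketch, assuming the placeholder URL from the manual excerpt:
// Sketch: ask only for the first 100 bytes via an HTTP Range request,
// the PHP equivalent of `curl -r 0-99`.
$ch = curl_init('http://www.get.this/');
curl_setopt($ch, CURLOPT_RANGE, '0-99');        // bytes 0 through 99
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
$chunk = curl_exec($ch) ?: '';
curl_close($ch);

echo strlen($chunk) . " bytes received\n";      // 100 only if the server honours ranges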

php file_get_contents fails sometimes during cronjob

I am trying to run a php script via a cronjob and sometimes (about half the time) I get the following warning:
PHP Warning: file_get_contents(http://url.com): failed to open stream: HTTP request failed! in /path/myfile.php on line 285
The program continues to run after that, which makes me think it is not a timeout problem or a memory issue (the timeout is set to 10 minutes and memory to 128M), but the variable that I am storing the results of that function call in ends up empty. The weird part is that I am making several other calls to this same website with other URL parameters and they never have a problem. The only difference with this call is that the file it downloads is about 70 MB, while the others are all around 300 KB.
Also, I never get this warning if I SSH into the web server and run the php script manually, only when it is run from a cron.
I have also tried using cURL instead of file_get_contents but then I run out of memory.
Thanks, any help here would be appreciated.
Perhaps the remote server on URL.com is sometimes timing out or returning an error for that particular (large) request?
I don't think you should be trying to store 70 MB in a variable.
You can configure cURL to download directly to a file. Something like:
// Open a local file and stream the response body straight into it
$file = fopen('my.file', 'w');
$c = curl_init('http://url.com/whatever');
curl_setopt($c, CURLOPT_FILE, $file); // write to $file instead of buffering the body in memory
curl_exec($c);
curl_close($c);
fclose($file);
If nothing else, curl should provide you with much better errors about what's going wrong.
From another answer: double check that this issue isn't occurring some of the time because of the URL parameters you're using:
Note: If you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode() - http://docs.php.net/file%5Fget%5Fcontents
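For instance, something like this (hypothetical parameter names) keeps the query string safely encoded:
// Sketch: build the query string with http_build_query(), which URL-encodes
// each value, so spaces or special characters don't make the HTTP wrapper fail.
$params = ['q' => 'some value with spaces', 'id' => 42];
$url    = 'http://url.com/whatever?' . http_build_query($params);

$data = file_get_contents($url);
if ($data === false) {
    echo "Request failed\n";
}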

Does md5_file have a memory limit/timeout for remote files?

I've been trying to hash the contents of some zip files from a remote source using PHP's md5_file function:
md5_file($url);
I'm having a problem with a couple of URLs; I'm getting the following error:
Warning: md5_file($url): failed to open stream: HTTP request failed!
I think it's because the zip files are quite large in those cases.
But as yet I haven't been able to find much information or case studies for md5_file hashing remote files to confirm or refute my theory. It seems most people grab the files and hash them locally (which I can do if necessary).
So I suppose it's really out of curiosity: Does md5_file have any specific limits to how large remote files can be? Does it have a timeout which will stop it from downloading larger files?
Probably the simplest solution is to set the timeout yourself via:
ini_set('default_socket_timeout', 60); // 60 secs
Of course, if these files are big, another option is to use file_get_contents(), as you can specify a filesize limit. You don't want to assign its output to an intermediate variable; it's more efficient to wrap it like so:
$limit = 64 * 1024; // 64 being the number of KB to limit your retrieval
md5(file_get_contents($url, false, null, 0, $limit ));
Now you can create MD5s of parts of the file, and not worry if somebody tries to send you a 2 GB file. Of course, keep in mind it's only an MD5 for part of the file; if anything after that point changes, this breaks. If size isn't a concern, you don't have to set a filesize limit at all; just try it like so:
ini_set('default_socket_timeout', 60); // 60 secs
md5(file_get_contents($url));
Some hosting environments don't allow you to access remote files in this manner. I think that the MD5 function would operate a lot like the file() function would. Make sure you can access the contents of remote files with that command first. If not, you may be able to cURL your way to the file and its contents.
You could try set_time_limit(0); if the file is relatively large and you are not sure how much time it will take to download.
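If you do end up grabbing the files and hashing them locally, a rough sketch (placeholder URL; the timeout value is an assumption):
// Sketch: stream the remote zip to a temp file with cURL, then hash the local
// copy, so the whole download never has to sit in a PHP string.
$url = 'http://example.com/archive.zip';
$tmp = tempnam(sys_get_temp_dir(), 'md5');

$fp = fopen($tmp, 'wb');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 300);   // generous timeout for large files
$ok = curl_exec($ch);
curl_close($ch);
fclose($fp);

echo $ok ? md5_file($tmp) : 'download failed';
unlink($tmp);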
