PHP fopen and file_get_contents limited download speed, why? - php

I'm trying to retrieve a remote file (a 6MB text file) with PHP, and I noticed that with fopen the speed is limited to 100KB/s, while with file_get_contents it is 15KB/s.
However, with wget from the same server the speed is above 5MB/s.
What controls these speeds?
I checked the live speeds with nethogs.

wget is great on its own for mirroring sites; it can actually parse links from pages and download files.
file_get_contents doesn't send a "Connection" HTTP header, so the remote web server assumes by default that it's a keep-alive connection and doesn't close the TCP stream until a timeout expires, around 15 seconds (not a standard value; it depends on the server configuration).
A normal browser would consider the page fully loaded once the HTTP payload length reaches the length specified in the Content-Length response header. file_get_contents doesn't do this, and that's a shame.
SOLUTION
So, if you want the solution, here it is:
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
file_get_contents("http://www.something.com/somepage.html", false, $context);
Note that the header string must be in double quotes: in single quotes, '\r\n' would be sent literally instead of as a CRLF.
The point is simply to tell the remote web server to close the connection when the download is complete, as file_get_contents isn't intelligent enough to do it by itself using the Content-Length response header.
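The same context also works with fopen, which the question mentioned as well. A minimal sketch (the URL is a placeholder):

```php
<?php
// Sketch: pass the same stream context to fopen(); the URL is a placeholder.
$context = stream_context_create(array(
    'http' => array('header' => "Connection: close\r\n"), // double quotes so \r\n is a real CRLF
));
$fd = fopen("http://www.example.com/file.txt", "r", false, $context);
$data = stream_get_contents($fd);
fclose($fd);
```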

Related

How to properly get live stream feed from another server and return the same feed with PHP

I have a ZoneMinder media server that serves a live feed of CCTV cameras over HTTP. It responds with an image stream (not video).
Now I'm making a third-party web application that needs to display that live stream.
The problem is that I need to integrate the system without making any browser request to the ZoneMinder server directly; I want to do it from the third-party application's backend and serve the stream back to the browser.
I use PHP for the application and so far I get this code running:
$link = get_zoneminder_url_stream();
header("Content-Type: multipart/x-mixed-replace;boundary=ZoneMinderFrame");
$fd = fopen($link, "r");
while (!feof($fd)) {
    echo fread($fd, 1024 * 5);
    // flush PHP's output buffer (if one is active) and the web server's
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}
fclose($fd);
I get the expected live image stream from the ZoneMinder server. But I also get several problems.
When I start the live stream (executing the code above), the rest of my web application becomes unresponsive. No request to a PHP script on the server gets handled (it is always 'pending' in the network tab of the browser developer console), but the server can still handle GET requests for other files (such as assets).
I suspect that at this point the problem lies with PHP itself rather than the web server.
So my question is: what is the best way to make a live stream request to another server from PHP and pass the stream back as a response to the browser?
Thanks in advance
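One common cause of exactly this symptom (every other PHP request stuck 'pending' while one long-running script streams, yet static assets still load) is PHP's session file lock: the streaming script keeps the session open, so every other request from the same browser blocks inside session_start(). A sketch of releasing the lock before entering the loop, assuming the application uses PHP sessions:

```php
<?php
// Sketch: release the session lock before the long-running streaming loop,
// so other requests from the same client are not blocked on the session file.
session_start();                 // assuming the app uses PHP sessions
// ... read whatever session data the script needs here ...
session_write_close();           // release the session file lock

header("Content-Type: multipart/x-mixed-replace;boundary=ZoneMinderFrame");
$fd = fopen($link, "r");         // $link from get_zoneminder_url_stream()
while (!feof($fd)) {
    echo fread($fd, 1024 * 5);
    flush();
}
fclose($fd);
```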

FTP Timeout for file_get_contents & file_put_contents

I am trying to append to files on an FTP server using
file_put_contents("ftp://".$ftp_user_name.":".$ftp_user_pass."@".$ftp_server."/".$destFile, $outputStr, FILE_APPEND)
This works fine, but it takes a long time to generate a timeout on failure. I want to set the timeout for appending to the file on FTP. I had a look at stream_context_create(), which does support the FTP protocol, but I could not find an option for a connection timeout like the one it has for the HTTP protocol. What could be another way of setting a timeout for file_put_contents or file_get_contents?
Maybe this will help?
ini_set("default_socket_timeout", $seconds);
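default_socket_timeout covers the socket operations of the ftp:// wrapper, so this should make the failure surface sooner. A sketch with placeholder host, credentials, and path:

```php
<?php
// Sketch: cap how long the ftp:// wrapper waits before failing.
// Host, credentials, and path below are placeholders.
ini_set("default_socket_timeout", 10);   // seconds (default is 60)

$outputStr = "one more line\n";
$url = "ftp://user:pass@ftp.example.com/dest.txt";
$ok  = @file_put_contents($url, $outputStr, FILE_APPEND);
if ($ok === false) {
    // now fails within roughly 10 seconds instead of a minute
    error_log("FTP append timed out or failed");
}
```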

faster php copy() function ( for remote ) like more connection?

I am copying some YouTube videos to my server with the copy() function, but YouTube gives approximately 100KB/s with one connection (my server has a 100Mbit/s connection).
Is it possible to open more than one connection and reach a faster speed with copy() or something else?
Well, if you want to download / copy multiple URLs from an external source concurrently,
curl_multi_exec is what you are looking for.
It lets you run multiple cURL transfers at the same time,
at the expense of network bandwidth (and perhaps a little CPU).
If you are looking to split a single download,
once again, curl has range support (from the curl manual):
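A minimal sketch of the curl_multi approach (the URLs and output filenames are placeholders):

```php
<?php
// Sketch: download several URLs concurrently with curl_multi.
// URLs are placeholders.
$urls = array("http://example.com/a.mp4", "http://example.com/b.mp4");

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Drive all transfers until every handle is finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);      // wait for activity instead of busy-looping
} while ($running > 0);

foreach ($handles as $i => $ch) {
    file_put_contents("video_$i.mp4", curl_multi_getcontent($ch));
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```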
RANGES
With HTTP 1.1 byte-ranges were introduced. Using this, a client can request
to get only one or more subparts of a specified document. Curl supports
this with the -r flag.
Get the first 100 bytes of a document:
curl -r 0-99 http://www.get.this/
Get the last 500 bytes of a document:
curl -r -500 http://www.get.this/
Curl also supports simple ranges for FTP files. In that case you can only
specify the start and stop positions.
Get the first 100 bytes of a document using FTP:
curl -r 0-99 ftp://www.get.this/README
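The same byte-range mechanism is available from PHP's cURL extension via CURLOPT_RANGE, so in principle each part of a file could be fetched on its own connection and the parts concatenated afterwards. A sketch with a placeholder URL; whether this actually helps depends on the server honouring Range requests and on where the per-connection throttle sits:

```php
<?php
// Sketch: fetch one byte range of a remote file with CURLOPT_RANGE.
// The URL is a placeholder; the server must support Range requests.
$ch = curl_init("http://example.com/bigfile.bin");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_RANGE, "0-99");   // first 100 bytes
$firstChunk = curl_exec($ch);
curl_close($ch);
```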

file_get_contents returns empty string

I am hesitant to ask this question because it looks weird.
But anyway.
Just in case someone has encountered the same problem already...
The filesystem functions (fopen, file, file_get_contents) behave very strangely with the http:// wrapper:
it seemingly works: no errors are raised, and fopen() returns a resource;
it returns no data for all certainly-working URLs (e.g. http://google.com/):
file() returns an empty array, file_get_contents() returns an empty string, and fread() returns false;
for all intentionally wrong URLs (e.g. http://goog973jd23le.com/) it behaves exactly the same, save for a small [supposedly domain lookup] timeout, after which I get no error (while I should!) but an empty string.
allow_url_fopen is turned on.
curl (both the command-line and PHP versions) works fine, all other utilities and applications work fine, and local files open fine.
This error seems inapplicable because in my case it doesn't work for any URL or host.
php-fpm 5.2.11
Linux version 2.6.35.6-48.fc14.i686 (mockbuild@x86-18.phx2.fedoraproject.org)
I fixed this issue on my server (running PHP 5.3.3 on Fedora 14) by removing --with-curlwrappers from the PHP configure options and rebuilding it.
Sounds like a bug. But just for posterity, here are a few things you might want to debug.
allow_url_fopen: already tested
PHP under Apache might behave differently than PHP-CLI, and would hint at chroot/selinux/fastcgi/etc. security restrictions
local firewall: unlikely since curl works
user-agent blocking: this is quite common actually, websites block crawlers and unknown clients
transparent proxy from your ISP, which either mangles or blocks (PHP user-agent or non-user-agent could be interpreted as malware)
PHP stream wrapper problems
Anyway, first let's prove that PHP's stream handlers are functional:
<?php
if (!file_get_contents("data:,ok")) {
die("Houston, we have a stream wrapper problem.");
}
Then try to see if PHP makes real HTTP requests at all. First open netcat on the console:
nc -l 8000
And debug with just:
<?php
print file_get_contents("http://localhost:8000/hello");
And from here you can try to communicate with PHP, and see if anything returns when you vary the response. Enter an invalid response first into netcat. If there's no error thrown, your PHP package is borked.
(You might also try communicating over a "tcp://.." handle then.)
Next up is experimenting with the http stream wrapper parameters. Use http://www.example.com/ literally, which is known to work and never blocks user agents.
$context = stream_context_create(array("http" => array(
    "method" => "GET",
    "header" => "Accept: xml/*, text/*, */*\r\n",
    "ignore_errors" => false,
    "timeout" => 50,
)));
print file_get_contents("http://www.example.com/", false, $context, 0, 1000);
I think ignore_errors is very relevant here. But check out http://www.php.net/manual/en/context.http.php and specifically try to set protocol_version to 1.1 (you will get a chunked and possibly misinterpreted response, but at least we'll see if anything returns).
If even this remains unsuccessful, then try to hack the http wrapper.
<?php
ini_set("user_agent" , "Mozilla/3.0\r\nAccept: */*\r\nX-Padding: Foo");
This will not only set the User-Agent, but also inject extra headers. If there is a processing issue with constructing the request within the http stream wrapper, then this could eventually catch it.
Otherwise try disabling any Zend extensions, Suhosin, Xdebug, APC, and other core modules. There could be interference. Otherwise this is potentially an issue specific to the Fedora package. Try a new version and see if it persists on your system.
When you use the http stream wrapper, PHP creates an array for you called $http_response_header after file_get_contents() (or any of the other f* family of functions) is called. This contains useful info on the state of the response. Could you do a var_dump() of this array and see if it gives you any more info on the response?
It's a really weird error that you're getting. The only thing I can think of is that something else on the server is blocking the http requests from PHP, but then I can't see why cURL would still be ok...
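For reference, a sketch of dumping that array (placeholder URL); $http_response_header is populated in the calling scope after the wrapper call:

```php
<?php
// Sketch: $http_response_header appears in the calling scope
// after an http:// wrapper call. The URL is a placeholder.
$body = file_get_contents("http://www.example.com/");
var_dump($http_response_header);
// On success, the first element is typically a status line such as "HTTP/1.1 200 OK".
```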
Is http stream registered in your PHP installation? Look for "Registered PHP Streams" in your phpinfo() output. Mine says "https, ftps, compress.zlib, compress.bzip2, php, file, glob, data, http, ftp, phar, zip".
If there is no http, set allow_url_fopen to on in your php.ini.
My problem was solved by dealing with SSL verification:
$arrContextOptions = array(
"ssl" => array(
"verify_peer" => false,
"verify_peer_name" => false,
),
);
$context = stream_context_create($arrContextOptions);
$jsonContent = file_get_contents("https://www.yoursite.com", false, $context);
What does a test with fsockopen tell you?
Is the test isolated from other code?
I had the same issue in Windows after installing XAMPP 1.7.7. Eventually I managed to solve it by adding the following line to php.ini (while having allow_url_fopen = On):
extension=php_openssl.dll
Take the userland reimplementation of file_get_contents from PEAR's PHP_Compat (http://pear.php.net/reference/PHP_Compat-latest/__filesource/fsource_PHP_Compat__PHP_Compat-1.6.0a2CompatFunctionfile_get_contents.php.html), rename it, and test whether the error occurs with this rewritten function.

Strange http gzip issue

Here's a strange one:
I've got nginx reverse proxying requests to apache 2 with mod_php.
A user (using firefox 3.1b3) reported that recently, he's started getting sporadic "What should firefox do with this file?" popups during normal navigation. We haven't had any other reports of this issue, and haven't been able to reproduce it ourselves.
I checked Nginx and apache's logs. Nothing in the error logs, and they both show a normal HTTP 200 for the request.
I had him send me the downloaded file, and it's generated HTML, as it should be, except it has some leading and trailing bytes tacked on.
The opening byte sequence is the magic gzip header: 1F8B08
Here are the opening characters, C-escaped for convenience:
\x1F\x8B\x089608\r\n<!DOCTYPE HTML ...
and the file ends with:
...</html>\n\r\n0\r\n\r\n
When I fetch the same URL via wget, it starts with <!DOCTYPE HTML as expected; the mysterious opening and closing bytes are nowhere to be seen.
Has anyone ever seen anything similar to this? Could this be a FF 3.1b3 bug?
Never seen an issue exactly like that, but I did have an issue once with a transparent proxy that would claim to the web server that it could handle gzip compressed content when, in fact, it received the gzipped content from the server, stripped the gzip headers without decompressing it, and sent the result to the browser. The behavior we saw was what you describe: a save/open file dialog for what should have been a normal web page. In this case the browser in question was IE.
I'm not sure if your problem is related, but as an experiment, you could look at the requests between the proxy and Apache and see whether they are gzipped, or turn off gzip compression for requests in Apache and see if that fixes the issue. If so, then you probably have a problem with gzip handling in your proxy.
wget doesn't request a compressed response. Try:
curl --compressed <URL>
You could also add -v to print the response headers, and check that a sensible Content-Type is being returned.
