I need to retrieve a small amount of data from a very large remote XML file that I access via http. I only need a portion of the file at the beginning, but the files I am accessing can often be so large that downloading them all will cause a timeout. It seems like it should be possible with fsockopen to pull only as much as needed before closing the connection, but nothing I have tried has worked.
Below is a simplified version of what I have been trying. Can anyone tell me what I need to do differently?
<?php
$k = 0;
function socketopen($funcsite, $funcheader){
$fp = fsockopen ($funcsite, 80, $errno, $errstr, 5);
$buffer = NULL;
if ($fp) {
fwrite($fp, "GET " . $funcheader . " HTTP/1.0\r\nHost: " . $funcsite. "\r\n\r\n");
while (!feof($fp)) {
$buffer = fgets($fp, 4096);
echo $buffer;
if($k == 200){
break;
}
$k++;
}
fclose ($fp);
} else {
print "No Response:";
}
return ( html_entity_decode($buffer));
}
$site = "www.remotesite.com";
$header = "/bigdatafile.xml";
$data = socketopen($site, $header);
?>
This works fine, but it always opens and downloads the entire remote file. (I actually use a different conditional than the if($k == x) check, but that shouldn't matter.)
Any help greatly appreciated. -Jim
Any reason not to use file_get_contents() instead?
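$buffer = html_entity_decode(file_get_contents('http://www.remotesite.com/bigdatafile.xml', false, null, $offsetBytes, $maxlenBytes));
You just need to specify $offsetBytes and $maxlenBytes. Since you want the beginning of the file, use an offset of 0; the PHP manual notes that seeking to an offset is not supported with remote files.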
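As a rough illustration of the length argument, the sketch below keeps only the first bytes of a file. A temporary local file stands in for the remote URL (an assumption, so the snippet runs offline), and the 27-byte limit is arbitrary; with an http:// URL the same call shape should apply.

```php
<?php
// Stand-in for the remote file: a temporary local file (assumption for offline use).
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<?xml version="1.0"?><root>' . str_repeat('<item/>', 1000) . '</root>');

$maxlenBytes = 27; // how many bytes to keep from the start of the file
$head = file_get_contents($path, false, null, 0, $maxlenBytes);

echo $head, "\n";
unlink($path);
```

Only the first $maxlenBytes bytes end up in $head, however large the file is.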
Try this:
set_time_limit(0);
// offset 0 starts at the beginning; seeking is not supported on remote files
echo $buffer = html_entity_decode(file_get_contents('http://www.remotesite.com/bigdatafile.xml', false, null, 0, 4096));
With this code you could download the entire RSS feed:
if (!$xml = simplexml_load_file("http://remotesite.com/bigrss.rss"))
{
    throw new RuntimeException('Unable to load or parse feed');
}
else
{
    // file_put_contents() takes the filename first, then the data
    file_put_contents('mybigrss.rss', $xml->asXML());
}
But if you only want part of it, do the following:
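$limit = 512000; // maximum number of bytes to fetch
$s_handle = fopen('http://remotesite.com/bigrss.rss', 'r');
// a single fread() on a network stream returns at most one chunk (usually 8192 bytes)
$sourceData = stream_get_contents($s_handle, $limit);
fclose($s_handle);
// your code etc.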
Or read until EOF:
$source = '';
while (!feof($s_handle))
    $source .= fread($s_handle, 1024); // chunk size per read
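To combine the two ideas (read in chunks, but stop at a limit), something like the following sketch might work. The helper name read_prefix is made up for illustration, and php://memory stands in for the remote handle so the example runs without a network connection.

```php
<?php
/**
 * Read at most $limit bytes from an open stream, then stop.
 * A single fread() on a network stream may return fewer bytes than asked for,
 * so the loop keeps reading until the limit is reached or the stream ends.
 */
function read_prefix($handle, int $limit): string
{
    $data = '';
    while (strlen($data) < $limit && !feof($handle)) {
        $chunk = fread($handle, min(1024, $limit - strlen($data)));
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        $data .= $chunk;
    }
    return $data;
}

// php://memory stands in for the remote handle (assumption for an offline demo).
$s_handle = fopen('php://memory', 'w+');
fwrite($s_handle, str_repeat('x', 5000));
rewind($s_handle);

$sourceData = read_prefix($s_handle, 2048);
fclose($s_handle); // on a real socket, closing here abandons the rest of the transfer
echo strlen($sourceData), "\n";
```

With a real HTTP connection the server may already have sent more than you read, but closing the handle early should keep the script from waiting for the whole file.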
Related
I need to transfer files of any type or size over HTTP/GET in ~1k chunks. The resulting file hash needs to match the source file. This needs to be done in native PHP without any special tools. I have a basic strategy but I'm getting odd results. This proof of concept just copies the file locally.
CODE
<?php
$input="/home/lm1/Music/Ellise - Feeling Something Bad.mp3";
$a=pathinfo($input);
$output=$a["basename"];
echo "\n> ".md5_file($input);
$fp=fopen($input,'rb');
if ($fp) {
while(!feof($fp)) {
$buffer=base64_encode(fread($fp,1024));
// echo "\n\n".md5($buffer);
write($output,$buffer);
}
fclose($fp);
echo "\n> ".md5_file($output);
echo "\n";
}
function write($file,$buffer) {
// echo "\n".md5($buffer);
$fp = fopen($file, 'ab');
fwrite($fp, base64_decode($buffer));
fclose($fp);
}
?>
OUTPUT
> d31e102b1cae9c73bbf5a12615a8ea36
> 9f03f6c88ed61c07cb534922d6d31864
Thanks in advance.
fread already advances the file pointer position, so there's no need to keep track of it. The same goes for fwrite, so consecutive calls automatically append to the given file. Thus, you could simplify your approach to the following (code adapted from this answer on how to efficiently write a large input stream to a file):
$src = "a.test";
$dest = "b.test";
$fp_src = fopen($src, 'rb');
if ($fp_src) {
$fp_dest = fopen($dest, 'wb');
$buffer_size = 1024;
while(!feof($fp_src)) {
fwrite($fp_dest, fread($fp_src, $buffer_size));
}
fclose($fp_src);
fclose($fp_dest);
echo md5_file($src)."\n"; // 88e4af2f85080a280e7f00e50d96b7f7
echo md5_file($dest)."\n"; // 88e4af2f85080a280e7f00e50d96b7f7
}
If you want to keep both processes separated, you'd do:
$src = "a.test";
$dest = "b.test";
if (file_exists($dest)) {
unlink($dest); // So we don't append to an existing file
}
$fp = fopen($src,'rb');
if ($fp) {
while(!feof($fp)){
$buffer = base64_encode(fread($fp, 1024));
write($dest, $buffer);
}
fclose($fp);
}
function write($file, $buffer) {
$fp = fopen($file, 'ab');
fwrite($fp, base64_decode($buffer));
fclose($fp);
}
echo md5_file($src)."\n"; // 88e4af2f85080a280e7f00e50d96b7f7
echo md5_file($dest)."\n"; // 88e4af2f85080a280e7f00e50d96b7f7
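One plausible reason for the mismatched hashes in the question is that 'ab' appends to whatever an earlier run left behind, which is why the destination is unlinked first above. Here is a small sketch of that effect; the helper copy_appending and the temporary files are made up for the demo.

```php
<?php
// Copy $src to $dest in 1 KiB chunks, appending each chunk to $dest.
function copy_appending(string $src, string $dest): void
{
    $fp = fopen($src, 'rb');
    while (!feof($fp)) {
        $chunk = fread($fp, 1024);
        if ($chunk === false || $chunk === '') {
            break;
        }
        file_put_contents($dest, $chunk, FILE_APPEND);
    }
    fclose($fp);
}

$src  = tempnam(sys_get_temp_dir(), 'src');
$dest = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($src, random_bytes(3000));

unlink($dest);                 // start from a clean destination
copy_appending($src, $dest);
$firstRun = md5_file($dest) === md5_file($src);   // same content, same hash

copy_appending($src, $dest);   // second run without unlinking first
$secondRun = md5_file($dest) === md5_file($src);  // dest now holds the data twice

var_dump($firstRun, $secondRun);
unlink($src);
unlink($dest);
```

The first comparison should succeed and the second should fail, because the destination has doubled in size.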
As for how to stream files over HTTP, you might want to have a look at:
Streaming a large file using PHP
I want to retrieve email from Gmail's IMAP server, but the problem is that the server's responses are multiple lines long (as demonstrated here) and fgets only retrieves one line.
I've tried using fgets, fread, and socket_read, but none of them work, so either I'm using the wrong method or using the methods incorrectly. I also tried this tutorial, but it didn't work either. I would appreciate it if someone could help me with this.
Thanks, and I'm really sorry if this is an amateur question.
Code:
<?php
$stuff = fsockopen('ssl://imap.gmail.com',993);
$reply = fgets($stuff,4096);
echo 'connection: '.$reply.'<br/>';
$request = fputs($stuff,"a1 LOGIN MyUserName Password\r\n");
$receive = socket_read($stuff, 4096);
echo 'login: '.$receive.'<br/>';
$request = fputs($stuff,"a2 EXAMINE INBOX\r\n");
$reply = '';
while(!feof($stuff))
$reply .= fread($stuff, 4096);
echo $reply;
/*
$request = fputs($stuff,'a3 FETCH 1 BODY[]\r\n');
$reply = fgets($stuff);
echo $reply;
*/
?>
Max's answer below works. This is my implementation of it.
private function Response($instructionNumber)
{
    $response = '';
    $responseCode = null;
    while (($line = fgets($this->connection, self::responseSize)) !== false)
    {
        $response .= $line.'<br/>';
        // Stop at the tagged status line for this instruction, e.g. "a1 OK ..."
        if (preg_match('/^'.preg_quote($instructionNumber, '/').' (OK|NO|BAD)/', $line, $match))
        {
            $responseCode = $match[1];
            break;
        }
    }
    return array('code' => $responseCode,
                 'response' => $response);
}
Generally, you know to stop reading when you get the OK/BAD/NO response for the tag you sent. If you send a1 LOGIN ... you stop when you get a1 OK/BAD/NO ....
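A minimal sketch of that check, assuming tags like a1 and a2 as in the question. The function name tagged_status and the sample lines are made up for illustration.

```php
<?php
/**
 * Return "OK", "NO" or "BAD" if $line is the tagged completion line for $tag,
 * or null if it is any other line (for example an untagged "* ..." response).
 */
function tagged_status(string $line, string $tag): ?string
{
    $pattern = '/^' . preg_quote($tag, '/') . ' (OK|NO|BAD)\b/';
    return preg_match($pattern, $line, $m) ? $m[1] : null;
}

// Sample lines in the shape of an IMAP exchange (made up for illustration).
$lines = [
    '* FLAGS (\Answered \Flagged \Deleted \Seen \Draft)',
    '* 3 EXISTS',
    'a2 OK [READ-ONLY] EXAMINE completed',
];
foreach ($lines as $line) {
    $status = tagged_status($line, 'a2');
    echo ($status ?? 'more to read'), ': ', $line, "\n";
}
```

Anchoring the pattern to the start of the line keeps an untagged line that happens to contain "a2 OK" somewhere in its text from ending the read early.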
It's been a while since I wrote PHP, and I don't know that much about IMAP, but if it's anything like NNTP, your code would look a bit like this (wrote it in the SO editor, might be bugged) :
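$buffer = '';
function read_line($socket) {
    global $buffer;
    // Keep reading until the buffer holds at least one complete line
    while (($lineEnd = strpos($buffer, "\n")) === false) {
        $chunk = fread($socket, 1024);
        if ($chunk === false || ($chunk === '' && feof($socket))) {
            return false; // connection closed or read error
        }
        $buffer .= $chunk;
    }
    $line = rtrim(substr($buffer, 0, $lineEnd), "\r");
    $buffer = (string) substr($buffer, $lineEnd + 1); // drop the line and its "\n"
    return $line;
}
function send_line($socket, $line) {
    fwrite($socket, $line);
}
$socket = fsockopen('ssl://imap.gmail.com', 993);
$welcome = read_line($socket);
send_line($socket, "a1 LOGIN MyUserName Password\r\n");
$reply = read_line($socket);
send_line($socket, "a2 EXAMINE INBOX\r\n");
// IMAP ends a response with a status line that starts with the command's tag
// ("a2 OK ..."), not with NNTP's single dot, so read until that line shows up
while (($reply = read_line($socket)) !== false) {
    echo $reply.PHP_EOL;
    if (strpos($reply, 'a2 ') === 0) {
        break;
    }
}
echo "Done";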
The basic concepts are :
Always buffer all incoming data. PHP doesn't handle lines very well, so do the splitting yourself.
Don't randomly read everything, but know what to expect. You expect one welcome line, LOGIN has one reply, and EXAMINE INBOX keeps outputting data until its terminator arrives (a single dot in NNTP; in IMAP, the status line tagged a2), so immediately stop reading once you see that.
You'll most likely want a simple function to take care of the reading. You could even write another function to make it easy:
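function read_block($socket, $tag) {
    // Collect lines up to the status line tagged $tag (e.g. "a2 OK ...");
    // for a dot-terminated protocol such as NNTP, compare against '.' instead
    $block = '';
    while (($reply = read_line($socket)) !== false && strpos($reply, $tag.' ') !== 0) {
        $block .= $reply."\n";
    }
    return $block;
}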
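The "buffer everything and split it yourself" idea can also be tried without a socket. This is only a sketch: the class name LineBuffer and the sample chunks are made up, and it needs PHP 7.4 or later for typed properties and arrow functions.

```php
<?php
/**
 * Accumulates raw chunks and hands back only complete lines.
 * Whatever follows the last "\n" stays buffered until more data arrives.
 */
final class LineBuffer
{
    private string $pending = '';

    /** @return string[] complete lines, without their trailing "\r\n" or "\n" */
    public function push(string $chunk): array
    {
        $this->pending .= $chunk;
        $lines = explode("\n", $this->pending);
        $this->pending = array_pop($lines); // unfinished tail (may be empty)
        return array_map(fn ($l) => rtrim($l, "\r"), $lines);
    }
}

$buf = new LineBuffer();
print_r($buf->push("* OK Gimap rea"));       // no complete line yet
print_r($buf->push("dy\r\na1 OK done\r\n")); // two complete lines
```

Keeping the unfinished tail in the object means a line split across two reads is still returned whole once the rest of it arrives.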
This is what I'm trying to do:
$output = '';
$stream = popen("some-long-running-command 2>&1", 'r');
while (!feof($stream)) {
$meta = stream_get_meta_data($stream);
if ($meta['unread_bytes'] > 0) {
$line = fgets($stream);
$output .= $line;
}
echo ".";
}
$code = pclose($stream);
Looks like this code is not correct, since it gets stuck at the call to stream_get_meta_data(). What is the right way to check whether the stream has some data to read? The whole point here is to avoid locking at fgets().
The correct way to do this is with stream_select():
$stream = popen("some-long-running-command 2>&1", 'r');
while (!feof($stream)) {
$r = array($stream);
$w = $e = NULL;
if (stream_select($r, $w, $e, 1)) {
// there is data to be read
}
}
$code = pclose($stream);
One thing to note though (I'm not sure about this) is that it may be the feof() check that is "blocking" - it may be that the loop never ends because the child process does not close its STDOUT descriptor.
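Here is a fuller sketch of the same loop that actually consumes the data and ends on an empty read instead of relying on feof(). It assumes a Unix-like system, and "echo hello" is just a stand-in for the long-running command.

```php
<?php
// "echo hello" is a stand-in for the long-running command (assumption for the demo).
$stream = popen('echo hello', 'r');
$output = '';

while (true) {
    $r = [$stream];
    $w = $e = null;
    $ready = stream_select($r, $w, $e, 1); // wait up to 1 second
    if ($ready === false) {
        break;                 // select itself failed
    }
    if ($ready === 0) {
        echo ".";              // timeout: nothing to read yet, do other work
        continue;
    }
    $chunk = fread($stream, 8192);
    if ($chunk === false || $chunk === '') {
        break;                 // readable but empty: the child closed its output
    }
    $output .= $chunk;
}
$code = pclose($stream);
echo $output;
```

A stream at end-of-file is reported as readable, so the empty fread() is what ends the loop once the child exits.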
Hey everyone, I have written a script that downloads a zip file from a remote source, and then is supposed to extract the zip file to a directory. Below is the script:
<?php
$url = "http://example.com/some_file.zip";
download($url,'file.zip');
function download($url,$file_name = NULL){
if($file_name == NULL){ $file_name = basename($url);}
$url_stuff = parse_url($url);
$port = isset($url_stuff['port']) ? $url_stuff['port'] : 80;
$fp = fsockopen($url_stuff['host'], $port);
if(!$fp){ return false;}
$query = 'GET ' . $url_stuff['path'] . " HTTP/1.0\n";
$query .= 'Host: ' . $url_stuff['host'];
$query .= "\n\n";
fwrite($fp, $query);
while ($tmp = fread($fp, 8192)) {
$buffer .= $tmp;
}
preg_match('/Content-Length: ([0-9]+)/', $buffer, $parts);
$file_binary = substr($buffer, - $parts[1]);
if($file_name == NULL){
$temp = explode(".",$url);
$file_name = $temp[count($temp)-1];
}
if(!file_exists("packages")){ mkdir("packages", 0755);}
$file_open = fopen("packages/" . $file_name,'w');
if(!$file_open){ return false;}
fwrite($file_open,$file_binary);
$zip = zip_open(realpath("packages")."/".$file_name);
if ($zip) {
while ($zip_entry = zip_read($zip)) {
$fp = fopen("some_dir/".zip_entry_name($zip_entry), "w");
if(zip_entry_open($zip, $zip_entry, "r")) {
$buf = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
fwrite($fp,"$buf");
zip_entry_close($zip_entry);
fclose($fp);
}
}
zip_close($zip);
}
fclose($file_open);
return true;
}
?>
The issue that I have is that while the downloading of the remote file works flawlessly, I can't seem to extract it. The zip_read() and zip_close() return errors saying that it "expects parameter 1 to be resource, integer given...", which I have found means that the zip_open() was unable to extract and is returning an error code, which I have found to be "19" meaning "Zip File Function error: Not a zip archive". However, I know the file I am downloading is, in fact, a zip file. Can anyone explain this odd behavior and provide a fix? It would be much appreciated!
Quoting php.net: "zip_open() ... Returns a resource handle for later use with zip_read() and zip_close() or returns the number of error if filename does not exist or in case of other error."
This means you cannot test if ($zip) like that. Try
if ( is_resource($zip) ) {
// stuff
} else {
print "Zip_open() returned error $zip\n";
}
edit: Apart from that, you need to cut the response in 2 parts properly. You are relying heavily on the Content-Length parameter. You don't check if the preg_match actually matched. A lot of things can go wrong and you should check those things. Try splitting the content on the first empty line (explode on \r\n\r\n or something like that)
Besides, the fread() loop should check feof() rather than the return value; as written, it stops reading as soon as a read comes back empty for any reason. Copy and paste from php.net:
$contents = '';
while (!feof($handle)) {
    $contents .= fread($handle, 8192);
}
But we can go on and on here. Three main points have to be made:
read the fantastic manual (php.net)
check return values
don't assume you know things you don't
Those are related: you must look up the manual to see what return values you might encounter.
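For the "split on the first empty line" suggestion, a rough sketch could look like the following. The function name split_http_response and the sample response are made up for illustration.

```php
<?php
/**
 * Split a raw HTTP response into [headers, body] at the first blank line.
 * Returns null when no header/body separator is present.
 */
function split_http_response(string $raw): ?array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return null; // incomplete or malformed response
    }
    return [substr($raw, 0, $pos), (string) substr($raw, $pos + 4)];
}

// Made-up response for illustration; "PK\x03\x04" is how a zip file starts.
$raw = "HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\nPK\x03\x04";
$parts = split_http_response($raw);
if ($parts === null) {
    echo "No header/body separator found\n";
} else {
    [$headers, $body] = $parts;
    echo strlen($body), " body bytes\n";
}
```

Checking for null before writing the body to disk follows the "check return values" point above, and it does not depend on Content-Length being present or correct.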
I want to pass a string to a function that tacks the string onto a URL, fetches that URL, and returns the page to my server so I can manipulate it with JS.
Any ideas would be much appreciated.
Cheers.
If the URL fopen wrappers are enabled (the allow_url_fopen setting), you can use file_get_contents() to retrieve the page, and then insert JavaScript into the content before echoing it as output.
$content = file_get_contents('http://example.com/page.html');
if( $content !== FALSE ) {
// add your JS into $content
echo $content;
}
This of course won't affect the original page.
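One way the "add your JS into $content" step could be done is to place a script tag just before the closing body tag. This is only a sketch: the helper inject_script and the sample page are made up, and myjavascript.js is a placeholder file name.

```php
<?php
/**
 * Insert a <script> tag just before the closing </body> tag.
 * If the page has no </body>, the tag is appended at the end instead.
 */
function inject_script(string $html, string $src): string
{
    $tag = '<script src="' . htmlspecialchars($src, ENT_QUOTES) . '"></script>';
    $pos = strripos($html, '</body>');
    if ($pos === false) {
        return $html . $tag;
    }
    return substr($html, 0, $pos) . $tag . substr($html, $pos);
}

// Made-up page content; in practice this would come from file_get_contents().
$content = '<html><body><p>Hello</p></body></html>';
echo inject_script($content, 'myjavascript.js'), "\n";
```

Relative links and assets in the fetched page will still point at paths on the original site, so they may need rewriting separately.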
You should be able to use fopen() for what you want. It can accept URLs.
echo "<script type='text/javascript' src='myjavascript.js'></script>";
$handle = @fopen("http://www.example.com/", "r");
if ($handle) {
while (!feof($handle)) {
$buffer = fgets($handle, 4096);
echo $buffer;
}
fclose($handle);
}
Using CURL would probably be easiest but I prefer to do stuff myself. This will connect to a given address and return the contents of the page. It will also return the headers, though, so watch out for that:
function do_request ($host, $path, $data, $request, $specialHeaders=null, $type="application/x-www-form-urlencoded", $protocol="", $port="80")
{
$contentlen = strlen($data);
$req = "$request $path HTTP/1.1\r\nHost: $host\r\nContent-Type: $type\r\nContent-Length: $contentlen\r\n";
if (is_array($specialHeaders))
{
foreach($specialHeaders as $header)
{
$req.=$header;
}
}
$req.="Connection: close\r\n\r\n";
if ($data != null) {
$req.=$data;
}
$fp = fsockopen($protocol.$host, $port, $errno, $errstr);
if (!$fp) {
throw new Exception($errstr);
}
fputs($fp, $req);
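$buf = "";
// Read the whole response; the server closes the connection after "Connection: close"
while (!feof($fp)) {
    $line = @fgets($fp);
    if ($line === false) {
        break;
    }
    $buf .= $line;
}
fclose($fp);
return $buf;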
}