When parsing large files on the internet, or just grabbing the Open Graph tags of a website, is there a way to GET a webpage's first 1000 characters and then stop downloading anything else from the page?
When a file is several megabytes, it can take the server a while to parse it, especially when operating on many such files. Even more troublesome than bandwidth are CPU/RAM constraints: files that are too large are difficult to work with in PHP, and the server can run out of memory.
Here are some PHP commands that can open a webpage:
fopen
file_get_contents
include
fread
url_get_contents
curl_init
curl_setopt
parse_url
Can any of these be set to download a specific number of characters and then exit?
Something like this?
<?php
// Read (at most) the first 8192 bytes of the page, then stop
if ($handle = fopen("http://www.example.com/", "rb")) {
    echo fread($handle, 8192);
    fclose($handle);
}
Taken from the examples in the official php.net function documentation...
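If cURL is available, another option (a sketch, not a guaranteed recipe) is to ask the server for a byte range and abort the transfer from a write callback once enough data has arrived; the Range header may be ignored by some servers, which is why the callback enforces the limit as well:

<?php
// Sketch: fetch roughly the first 1000 bytes with cURL.
$limit = 1000;
$data  = '';

$ch = curl_init("http://www.example.com/");
curl_setopt($ch, CURLOPT_RANGE, "0-999"); // polite request; servers may ignore it
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) use (&$data, $limit) {
    $data .= $chunk;
    if (strlen($data) >= $limit) {
        return -1; // returning a length mismatch makes cURL abort the transfer
    }
    return strlen($chunk);
});
curl_exec($ch); // a "write error" here is expected when we abort on purpose
curl_close($ch);

echo substr($data, 0, $limit);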
Related
Let's say I'd like to download some information from a file on the internet within PHP, but I do not need the entire file. Therefore, loading the full file through
$my_file = file_get_contents("https://www.webpage.com/".$filename);
would use up more memory and resources than necessary.
Is there a way to download only e.g. the first 5kb of a file as plain text with PHP?
EDIT:
In the comments it was suggested to use e.g. the maxlen argument of file_get_contents or similar. But I noticed that the execution time of the call does not vary appreciably for different values of maxlen, which suggests that the function loads the full file and then just returns a substring to the variable.
Is there a way to make PHP download just the required amount of bytes and no more, to speed things up?
<?php
$fp = fopen("https://www.webpage.com/".$filename, "r");
$content = fread($fp, 5 * 1024); // read only the first 5 KB, then stop
fclose($fp);
?>
Note: Make sure allow_url_fopen is enabled.
PHP Doc: fopen, fread
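If the server honours range requests, another sketch (not every host supports this) is to send a Range header through a stream context, so only the first 5 KB ever leave the server:

<?php
// Sketch: ask the server for just the first 5 KB via an HTTP Range request.
$context = stream_context_create([
    'http' => ['header' => "Range: bytes=0-5119\r\n"],
]);

$content = file_get_contents("https://www.webpage.com/".$filename, false, $context);
$content = substr($content, 0, 5 * 1024); // trim in case the header was ignored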
I have a php script that needs to determine the size of a file on the file system after being manipulated by a separate php script.
For example, there exists a zip file that has a fixed size but gets an additional file of unknown size inserted into it based on the user that tries to access it. So the page that's serving the file is something like getfile.php?userid=1234.
So far, I know this:
filesize('getfile.php'); //returns the actual file size of the php file, not the result of script execution
readfile('getfile.php'); //same as filesize()
filesize('getfile.php?userid=1234'); //returns false, as it can't find the file matching the name with GET vars attached
readfile('getfile.php?userid=1234'); //same as filesize()
Is there a way to read the result size of the php script instead of just the php file itself?
From the filesize() documentation:

As of PHP 5.0.0, this function can also be used with some URL wrappers.

So something like

filesize('http://localhost/getfile.php?userid=1234');

should be enough.
Someone had posted an option for using curl to do this but removed their answer after a downvote. Too bad, because it's the one way I've gotten this to work. So here's their answer that worked for me:
$ch = curl_init('http://localhost/getfile.php?userid=1234');
// CURLOPT_RETURNTRANSFER was not part of the poster's answer, but it prevents
// the fetched file from being echoed by the requesting script
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);

$size = 0;
if (!curl_errno($ch)) {
    $info = curl_getinfo($ch);
    $size = $info['size_download'];
}
curl_close($ch);
echo $size;
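If the script being fetched sends a Content-Length header, a HEAD-style request can avoid downloading the body at all. A sketch (many dynamic PHP scripts don't emit Content-Length, in which case this returns -1 and you are back to the size_download approach above):

$ch = curl_init('http://localhost/getfile.php?userid=1234');
curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD-style request, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);

// -1 means the server did not report a Content-Length
$size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
curl_close($ch);
echo $size;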
The only way to get the size of the output is to run the script and then look. Depending on the input the result might differ, so for practical use the best thing to do is to estimate based on your knowledge; i.e. if you have a 5 MB file and add another 5 KB of user-specific content, it's still about 5 MB in the end.
To expand on Ivan's answer:
Your string is 'getfile.php', with or without GET parameters; it is being treated as a local file, so you are retrieving the size of the PHP file itself.
It is treated as a local file because it doesn't start with the http:// protocol. See http://us1.php.net/manual/en/wrappers.php for the supported protocols.
When using filesize() I got a warning:
Warning: filesize() [function.filesize]: stat failed for ...link... in ..file... on line 233
Instead of filesize() I found two working options to replace it:
1)
$headers = get_headers($pdfULR, 1);
$fileSize = $headers['Content-Length'];
echo $fileSize;
2)
echo strlen(file_get_contents($pdfULR));
Now it's working fine.
I have a file plain.cache which is a little over 10 MB, and I made a gzcompressed file gz.cache out of the original plain.cache file. Then I made two separate scripts which load each of the mentioned cache files, and was somewhat surprised that the page load speed of both was almost the same.

So my question is: am I right to conclude that the gzcompressed file does not benefit the load speed of the page in any way? My reading is that the gzuncompress I use in gz.php produces exactly the same string as when I read it from the plain file. Given all these statements, the general question is: how can I increase the load speed by compressing the file with gzcompress (if that is how it is done at all)?
The code of the files is as follows:
_makeCache.php, in which I make the gzcompressed version of the plain.cache file:
$str = file_get_contents("plain.cache");
$strCompressed = gzcompress($str, 9); // 9 = maximum compression level

$file = "gz.cache";
$fp = fopen($file, "w");
fwrite($fp, $strCompressed);
fclose($fp);
plain.php:
echo file_get_contents("plain.cache");
gz.php:
echo gzuncompress(file_get_contents("gz.cache"));
Your HTTP server is compressing plain.cache automatically on the fly, using gzip as well, and the client decompresses it, so you should see almost no difference.
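One way to check whether that is actually happening (a sketch, assuming the two scripts are reachable on localhost) is to request plain.php while advertising gzip support and look for a Content-Encoding header in the response:

$ch = curl_init('http://localhost/plain.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);   // include response headers in the output
curl_setopt($ch, CURLOPT_ENCODING, '');   // advertise gzip/deflate support
$response = curl_exec($ch);
curl_close($ch);

// If the web server compresses on the fly, this header will be present
echo (stripos($response, 'Content-Encoding: gzip') !== false)
    ? "server gzips plain.php on the fly\n"
    : "no transparent compression detected\n";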
Is file_get_contents() enough for downloading remote movie files located on a server?
I just think that perhaps storing large movie files in a string is harmful, according to the PHP docs.
Or do I need to use cURL? I don't know cURL.
UPDATE: these are big movie files, around 200 MB each.
file_get_contents() is a problem because it's going to load the entire file into memory in one go. If you have enough memory to support the operation (taking into account that if this is a web server, you may have multiple hits that generate this behavior simultaneously, and therefore each needs that much memory), then file_get_contents() should be fine. However, it's not the right way to do it; you should use a library specifically intended for this sort of operation. As mentioned by others, cURL will do the trick, or wget. You might also have good luck using fopen('http://someurl', 'r'), reading blocks from the file, and then dumping them straight to a local file that has been opened for writing.
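To illustrate the cURL route mentioned above, here is a minimal sketch (the URL and local filename are placeholders) that streams the response straight to disk instead of holding it in a PHP string:

$src = 'http://somewhere/test.avi'; // placeholder URL
$dst = 'test.avi';

$fp = fopen($dst, 'wb');
$ch = curl_init($src);
curl_setopt($ch, CURLOPT_FILE, $fp);            // write the body directly to the file handle
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if the host issues any
curl_exec($ch);
curl_close($ch);
fclose($fp);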
As @mopoke suggested, it could depend on the size of the file. For a small movie it may suffice. In general I think cURL would be a better fit, though. You have much more flexibility with it than with file_get_contents().
For the best performance you may find it makes sense to just use a standard Unix utility like wget. You should be able to call it with system("wget ...") or exec().
http://www.php.net/manual/en/function.system.php
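Something along these lines (a sketch; it assumes wget is installed, the web user may execute it, and the URL and destination path are placeholders):

$src = 'http://somewhere/test.avi'; // placeholder URL
$dst = '/tmp/test.avi';             // placeholder destination

// escapeshellarg() keeps an attacker-controlled URL from injecting shell syntax
exec('wget ' . escapeshellarg($src) . ' -O ' . escapeshellarg($dst), $output, $exitCode);

if ($exitCode !== 0) {
    echo "wget failed with exit code $exitCode\n";
}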
You can read a few bytes at a time using fread().
$src = "http://somewhere/test.avi";
$dst = "test.avi";

$f = fopen($src, 'rb');
$o = fopen($dst, 'wb');

// Copy the remote file in 2 KB chunks so it never has to fit in memory at once
while (!feof($f)) {
    if (fwrite($o, fread($f, 2048)) === FALSE) {
        break; // stop copying if the local write fails
    }
}

fclose($f);
fclose($o);
I would like to know the best way to save an image from a URL in php.
At the moment I am using
file_put_contents($pk, file_get_contents($PIC_URL));
which is not ideal. I am unable to use curl. Is there a method specifically for this?
Using file_get_contents is fine, unless the file is very large. In that case, you don't really need to be holding the entire thing in memory.
For a large retrieval, you could fopen the remote file, fread it, say, 32KB at a time, and fwrite it locally in a loop until all the file has been read.
For example:
$fout = fopen('/tmp/verylarge.jpeg', 'w');
$fin = fopen("http://www.example.com/verylarge.jpeg", "rb");
while (!feof($fin)) {
    $buffer = fread($fin, 32*1024);
    fwrite($fout, $buffer);
}
fclose($fin);
fclose($fout);
(Devoid of error checking for simplicity!)
Alternatively, you could forgo the URL wrappers and use a class like PEAR's HTTP_Request, or roll your own HTTP client code using fsockopen etc. This would enable you to do efficient things like send If-Modified-Since headers if you are maintaining a cache of remote files.
I'd recommend using Paul Dixon's strategy, but replacing fopen with fsockopen(). The reason is that some server configurations disallow URL access for fopen() and file_get_contents(). The setting may be found in php.ini and is called allow_url_fopen.
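A minimal fsockopen() sketch along those lines (plain HTTP only, no redirect or chunked-encoding handling; the host, path, and local filename are placeholders):

$host = 'www.example.com';   // placeholder host
$path = '/verylarge.jpeg';   // placeholder path

$fp = fsockopen($host, 80, $errno, $errstr, 30);
if (!$fp) {
    die("connect failed: $errstr ($errno)");
}

// Plain HTTP/1.0 request so the server closes the connection when the body ends
fwrite($fp, "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");

// Skip the response headers (everything up to the blank line)
while (($line = fgets($fp)) !== false && rtrim($line, "\r\n") !== '') {
    // the status line and headers could be inspected here
}

// Copy the body to disk in 32 KB chunks
$fout = fopen('/tmp/verylarge.jpeg', 'wb');
while (!feof($fp)) {
    fwrite($fout, fread($fp, 32 * 1024));
}

fclose($fout);
fclose($fp);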