I want to get the size of a remote file, which can be done with this code:
$headers = get_headers("http://addressoffile", 1);
$filesize= $headers["Content-Length"];
But I do not know the file's address directly; I only have an address that redirects to the original file.
For example, I have the address http://somedomain.com/files/34
When I put this address into the browser's URL bar, or use file_get_contents('myfile.pdf',"http://somedomain.com/files/34");, it starts downloading the original file.
But if I use the code above to calculate the file size with the address http://somedomain.com/files/34, it returns a size of 0.
Is there any way to get the address that http://somedomain.com/files/34 redirects to?
Or any other solution for calculating the size of the redirected (original) file?
If you want to get a remote file's size, you need to think about it a different way. In this post I am going to show how to get a remote file's size from its header information without downloading the file. I will show two examples: one using the get_headers function and another using cURL. The get_headers approach is very simple and works for everybody; the cURL approach is more advanced and powerful. You can use either one, but I recommend the cURL approach. Here we go.
get_headers Approach:
/**
 * Get Remote File Size
 *
 * @param string $url remote file URL
 * @return int file size in bytes
 */
function remote_file_size($url){
    # Get all header information
    $data = get_headers($url, true);
    # Look up validity
    if (isset($data['Content-Length']))
        # Return file size
        return (int) $data['Content-Length'];
}
Usage:
echo remote_file_size('http://www.google.com/images/srpr/logo4w.png');
cURL Approach:
/**
 * Remote File Size Using cURL
 * @param string $url
 * @return int|void
 */
function remotefileSize($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
    curl_exec($ch);
    $filesize = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    if ($filesize) return $filesize;
}
Usage:
echo remotefileSize('http://www.google.com/images/srpr/logo4w.png');
If the site redirects via the Location header, you can use:
// get the redirect url
$headers = get_headers("http://somedomain.com/files/34", 1);
$redirectUrl = $headers['Location'];
// get the filesize
$headers = get_headers($redirectUrl, 1);
$filesize = $headers["Content-Length"];
Please note that this code should not be used as-is in production, since there are no checks for existing array keys and no error handling.
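As a minimal sketch of what those checks might look like (the function name redirected_file_size is made up for illustration; note that get_headers() can return an array for a header that appears more than once when redirects are involved):
function redirected_file_size($url) {
    # Fetch the headers of the given URL
    $headers = get_headers($url, 1);
    if ($headers === false) return null; // request failed
    # Follow an explicit Location header if present
    if (isset($headers['Location'])) {
        $location = $headers['Location'];
        // With multiple redirects, Location is an array; the last entry is the final target
        $target = is_array($location) ? end($location) : $location;
        $headers = get_headers($target, 1);
        if ($headers === false) return null;
    }
    if (isset($headers['Content-Length'])) {
        $length = $headers['Content-Length'];
        return (int) (is_array($length) ? end($length) : $length);
    }
    return null; // size not reported by the server
}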
The cURL approach is useful because get_headers is disabled on some servers. But if your URL may be either HTTP or HTTPS, you need this:
<?php
function remotefileSize($url)
{
    // returns size in bytes
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
    curl_exec($ch);
    $filesize = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    if ($filesize) return $filesize;
}

$url = "https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png";
echo remotefileSize($url);
?>
[Edited for better explanation and code included]
Hi! I have a PHP script on my web server that logs in to my heat pump's web interface, nibeuplink.com, gets all my temperature readings and so forth, and returns them in JSON format.
freeboard.io is a free service for visualizing data, so I'm building a freeboard.io dashboard for my heat pump values. In freeboard.io I can add any JSON data as a data source, so I have added the link to my PHP script. It fetches the data once, but after that it seems to use some kind of cached values, so they are not updated with new values from the script. freeboard.io uses a GET request to fetch the URL. If I run the PHP script in a normal web browser and refresh it, the values are updated, and also immediately updated in freeboard.io. freeboard.io has a setting to automatically update the data source every 5 seconds.
It seems that something triggers the script correctly when it is fetched from my web browser, but not when it is fetched from freeboard.io, which issues a GET request every 5 seconds to get new data.
In freeboard I can add headers to the GET request; is there a header that would help me discard any cached data?
I hope that explains my problem better.
Is there anything I can add to the beginning of my code to always force an override of any cached data?
<?php
/*
* read nibe heatpump values from nibeuplink status web page and return them in json format.
* based on: https://www.symcon.de/forum/threads/25663-Heizung-Nibe-F750-Nibe-Uplink-auslesen-auswerten
* to get the code which is required as parameter, log into nibe uplink, open status page of your heatpump, and check url:
* https://www.nibeuplink.com/System/<code>/Status/Overview
*
* usage: nibe.php?email=<email>&password=<password>&code=<code>
*/
// to add additional debug output to the resulting page:
$debug = false;
date_default_timezone_set('Europe/Helsinki');
$date = time();
// Create temp file to store cookies
$ckfile = tempnam ("/tmp", "CURLCOOKIE");
// URL to login page
$url = "https://www.nibeuplink.com/LogIn";
// Get Login page and its cookies and save cookies in the temp file
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile); // Stores cookies in the temp file
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
// Now you have the cookie, you can POST login values
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, "Email=".$_GET['email']."&Password=".$_GET['password']);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile); // Uses cookies from the temp file
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Tells cURL to follow redirects
$output = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, "https://www.nibeuplink.com/System/".$_GET['code']."/Status/ServiceInfo");
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_POST, 0);
$result = curl_exec($ch);
$pattern = '/<h3>(.*?)<\/h3>\s*<table[^>]*>.+?<tbody>(.+?)<\/tbody>\s*<\/table>/s';
if ($debug) echo "pattern: <xmp>".$pattern."</xmp><br>";
$pattern2 = '/<tr>\s*<td>(.+?)<span[^>]*>[^<]*<\/span>\s*<\/td>\s*<td>\s*<span[^>]*>([^<]*)<\/span>\s*<\/td>\s*<\/tr>/s';
if ($debug) echo "pattern2: <xmp>".$pattern2."</xmp><br>";
preg_match_all($pattern, $result, $matches);
// build json format from matches
echo '{';
$first = true;
foreach ($matches[1] as $i => $title) {
    echo ($first ? '"' : ',"').trim($title).'":{';
    $content = $matches[2][$i];
    preg_match_all($pattern2, $content, $values);
    $nestedFirst = true;
    foreach ($values[1] as $j => $field) {
        echo ($nestedFirst ? '"' : ',"').trim($field).'":"'.$values[2][$j].'"';
        $nestedFirst = false;
    }
    echo "}";
    $first = false;
}
echo ",\"time\":{\"Last fetch\":\"$date\"}";
echo "}";
if ($debug) {
    echo "<pre><xmp>";
    print_r($matches);
    echo "<br><br>";
    echo $result;
    echo "</xmp></pre>";
}
?>
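If the stale values really are caused by HTTP caching somewhere between freeboard.io and the script, one hedged option (not part of the original script) is to send no-cache response headers at the very top of the PHP file, before any output:
// Sketch only: ask clients and proxies not to cache this response.
// These header() calls must run before anything is echoed by the script above.
header('Cache-Control: no-store, no-cache, must-revalidate, max-age=0');
header('Pragma: no-cache');
header('Expires: Thu, 01 Jan 1970 00:00:00 GMT');
// Optional, since the script returns JSON:
header('Content-Type: application/json');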
You can make an AJAX call to the PHP script to refresh that part of the web page. I don't understand exactly what you mean regarding freeboard.io, i.e. are you talking about fetching data from a database, where only newly added records should be fetched when the database changes? If you mean it in that sense, you can use a cookie to track new records added to the database, and only when new records are found make an AJAX call to the PHP script to run your algorithm on the full fetched dataset.
I am building a system where one of my sites fetches documents from another.
On the first site I am using cURL to make a request to get the wanted file.
I am using the solution from Download file from URL using CURL:
function collect_file($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_AUTOREFERER, false);
    curl_setopt($ch, CURLOPT_REFERER, "http://example.com");
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $result = curl_exec($ch);
    curl_close($ch);
    echo $result;
    return($result);
}

function write_to_file($text,$new_filename){
    $fp = fopen($new_filename, 'w');
    fwrite($fp, $text);
    fclose($fp);
}
$curlUrl = 'http://site2.com/file-depository/14R4NP8JkoIHwIyjnexSUmyJibdpHs5ZpFs3NLFCxcs54kNhHj';
$new_file_name = "testfile-new.png";
$temp_file_contents = collect_file($curlUrl);
write_to_file($temp_file_contents,$new_file_name);
I am testing by downloading an image. If I use a direct URL in $curlUrl, for instance http://site2.com/file-depository/image.png, it works perfectly.
What I am doing is that the URL http://site2.com/file-depository/14R4NP8JkoIHwIyjnexSUmyJibdpHs5ZpFs3NLFCxcs54kNhHj is parsed and checked against a database to match the requested document; once a document is matched, I need to provide it in the cURL response.
I have tried many ways to read the file, but every time I get a file on the other end that is only 1 KB in size (45 KB expected), and when trying to open it I get an "unknown file type" error.
On the second site, once the URL is validated, here is what I have:
$file = readfile('some-image.png');
echo $file;
I am guessing part of the information which belongs to the file is missing, but I can't figure it out; any pointers appreciated!
I have replaced
function write_to_file($text,$new_filename){
$fp = fopen($new_filename, 'w');
fwrite($fp, $text);
fclose($fp);
}
with file_put_contents($new_file_name, trim($temp_file_contents));
Please note the trim(); the issue was that I was apparently collecting some empty space in front of the file content.
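As a side note, and only a hedged sketch of my own rather than part of the accepted fix: the tiny corrupted file can also come from how the second site returns the document, because readfile() already writes the file to the output and returns the number of bytes read, so echoing that return value adds stray output. The second site could serve the matched document roughly like this (the path and MIME type are placeholders):
// Sketch: stream the matched file back to the cURL client with proper headers.
$path = 'some-image.png'; // the document matched from the database
header('Content-Type: image/png');
header('Content-Length: ' . filesize($path));
readfile($path); // writes the file to output; do not echo its return value
exit;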
I'm trying to grab a photo from Google Place Photos using cURL and save it on my server.
The request format as per the Google API documentation is like this:
https://maps.googleapis.com/maps/api/place/photo?maxwidth=400&photoreference=CoQBegAAAFg5U0y-iQEtUVMfqw4KpXYe60QwJC-wl59NZlcaxSQZNgAhGrjmUKD2NkXatfQF1QRap-PQCx3kMfsKQCcxtkZqQ&sensor=true&key=AddYourOwnKeyHere
So I tried this function:
function download_image1($image_url, $image_file){
    $fp = fopen($image_file, 'w+');
    $ch = curl_init($image_url);
    // curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // enable if you want
    curl_setopt($ch, CURLOPT_FILE, $fp); // output to file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1000); // some large value to allow curl to run for a long time
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
    // curl_setopt($ch, CURLOPT_VERBOSE, true); // enable this line to see debug prints
    curl_exec($ch);
    curl_close($ch); // closing curl handle
    fclose($fp); // closing file handle
}
download_image1($photo, "test.jpg");
...where $photo holds the request URL.
This is not working; it saves an empty image with header errors, probably because the request URL is not the actual URL of the photo. Also, from the request URL it's not possible to know which image extension I'm going to get (jpg, png, gif, etc.), so that's another problem.
Any help on how to save the photos appreciated.
EDIT: I get the header error "Can't read file header" in my image viewer software when I try to open the image. The script itself doesn't show any errors.
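(On the extension part of the question, one hedged option is to ask cURL for the Content-Type it received and map it to an extension; the mapping below is only an illustration and would need to run before curl_close($ch) inside download_image1:)
// Sketch: derive a file extension from the Content-Type reported by cURL.
$contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE); // e.g. "image/jpeg"
$map = array('image/jpeg' => 'jpg', 'image/png' => 'png', 'image/gif' => 'gif');
$extension = isset($map[$contentType]) ? $map[$contentType] : 'jpg'; // default to jpg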
I found a solution here:
http://kyleyu.com/?q=node/356
It gives a very useful function to return the actual URL after redirection:
function get_furl($url)
{
    $furl = false;
    // First check response headers
    $headers = get_headers($url);
    // Test for 301 or 302
    if (preg_match('/^HTTP\/\d\.\d\s+(301|302)/', $headers[0]))
    {
        foreach ($headers as $value)
        {
            if (substr(strtolower($value), 0, 9) == "location:")
            {
                $furl = trim(substr($value, 9, strlen($value)));
            }
        }
    }
    // Set final URL
    $furl = ($furl) ? $furl : $url;
    return $furl;
}
So you pass the Google Place Photo request URL to this function, and it returns the actual URL of the photo after the redirection, which can then be used with cURL. It also explains that the curl option curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); doesn't always work.
https://stackoverflow.com/a/23540352/2979237
We can shorten the above code by passing 1 as the second parameter of the get_headers() function, like $headers = get_headers($url, 1);, which returns an associative array.
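A minimal sketch of that shortened version (my own; when several redirects occur, the Location entry is an array whose last element is the final URL):
function get_furl($url)
{
    $headers = get_headers($url, 1); // associative array of response headers
    if ($headers !== false && isset($headers['Location'])) {
        $location = $headers['Location'];
        // With multiple redirects, Location is an array; take the last one
        return is_array($location) ? end($location) : $location;
    }
    return $url; // no redirect detected
}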
Does anyone know how to check the media types of audio and video files from external URLs?
function get_url_mime_type($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_exec($ch);
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    return $contentType;
}
To get a file's MIME type, use the following code:
<?php
$finfo = new finfo(FILEINFO_MIME, "/usr/share/misc/magic"); // return mime type a.k.a. mimetype extension
/* get mime-type for a specific file */
$filename = "/usr/local/something.txt";
echo $finfo->file($filename);
?>
To get the extension of the file:
$tokens = explode('.', $filename);
$extension = $tokens[count($tokens)-1];
To get the MIME type of an external file, you can either use a cURL HEAD request to read the header (which may not be accurate, as the web server may not respond with the correct answer), or use getID3.
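A rough sketch of the getID3 route (my assumption: the remote file is downloaded to a temporary local file first, since getID3 analyzes local files; the include path depends on how getID3 is installed):
require_once 'getid3/getid3.php'; // adjust to your getID3 install path

// Download the remote file to a temporary local file, then analyze it
$tmp = tempnam(sys_get_temp_dir(), 'media');
file_put_contents($tmp, file_get_contents($url));

$getID3 = new getID3;
$info = $getID3->analyze($tmp);
unlink($tmp);

echo isset($info['mime_type']) ? $info['mime_type'] : 'unknown';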
This question also asks a similar thing; take a look at it.
Reference: finfo_open()
I couldn't successfully convert a doubly shortened URL to its expanded URL using the function below, which I got from here:
function doShortURLDecode($url) {
    $ch = @curl_init($url);
    @curl_setopt($ch, CURLOPT_HEADER, TRUE);
    @curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $response = @curl_exec($ch);
    preg_match('/Location: (.*)\n/', $response, $a);
    if (!isset($a[1])) return $url;
    return $a[1];
}
I got into trouble when the expanded URL I got was itself another shortened URL, which in turn has its own expanded URL.
How do I get the final expanded URL after it has run through both URL-shortening services?
Since t.co uses HTML redirection through JavaScript and/or a <meta> redirect, we need to grab its contents first, then extract the bit.ly URL from it and perform an HTTP header request to get the final location. This method does not rely on cURL being enabled on the server and uses only native PHP5 functions:
Tested and working!
function large_url($url)
{
    $data = file_get_contents($url); // t.co uses HTML redirection
    $url = strtok(strstr($data, 'http://bit.ly/'), '"'); // grab bit.ly URL
    stream_context_set_default(array('http' => array('method' => 'HEAD')));
    $headers = get_headers($url, 1); // get HTTP headers
    return (isset($headers['Location'])) // check if Location header set
        ? $headers['Location'] // return Location header value
        : $url; // return bit.ly URL instead
}
// DEMO
$url = 'http://t.co/dd4b3kOz';
echo large_url($url);
Finally found a way to get the final URL of a doubly shortened URL. The best way is to use the LongURL API for it.
I am not sure if it is the correct way, but I am at last getting the final URL I needed as output :)
Here's what I did:
<?php
function TextAfterTag($input, $tag)
{
    $result = '';
    $tagPos = strpos($input, $tag);
    if (!($tagPos === false))
    {
        $length = strlen($input);
        $substrLength = $length - $tagPos + 1;
        $result = substr($input, $tagPos + 1, $substrLength);
    }
    return trim($result);
}

function expandUrlLongApi($url)
{
    $format = 'json';
    $api_query = "http://api.longurl.org/v2/expand?" .
        "url={$url}&response-code=1&format={$format}";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $api_query);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_HEADER, false);
    $fileContents = curl_exec($ch);
    curl_close($ch);

    // Strip the JSON braces, split on commas, and take the text after the
    // first colon as the expanded URL
    $s1 = str_replace("{", " ", $fileContents);
    $s2 = str_replace("}", " ", $s1);
    $s2 = trim($s2);
    $s3 = explode(",", $s2);
    $s4 = TextAfterTag($s3[0], ':');
    $s4 = stripslashes($s4);
    return $s4;
}

echo expandUrlLongApi('http://t.co/dd4b3kOz');
?>
The output I get is:
"http://changeordie.therepublik.net/?p=371#proliferation"
The above code works.
The code that @cryptic shared is also correct, but I could not get the result on my server (maybe because of some configuration issue).
If anyone thinks it could be done some other way, please feel free to share it.
Perhaps you should just use CURLOPT_FOLLOWLOCATION = true and then determine the final URL you were directed to.
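A minimal sketch of that idea (my own; the function name is made up, and note that it only follows real HTTP redirects, not the JavaScript/<meta> redirect that t.co uses):
function resolve_final_url($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);            // header-only request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_exec($ch);
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // URL after all redirects
    curl_close($ch);
    return $finalUrl;
}

echo resolve_final_url('http://t.co/dd4b3kOz');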
In case the problem is not a JavaScript redirect as in t.co, or a <META http-equiv="refresh" ...> one, this resolves Stack Exchange URLs like https://stackoverflow.com/q/62317 fine:
public function doShortURLDecode($url) {
    $ch = @curl_init($url);
    @curl_setopt($ch, CURLOPT_HEADER, TRUE);
    @curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $response = @curl_exec($ch);
    // Strip unexpected characters from the raw response before matching
    $cleanresponse = preg_replace('/[^A-Za-z0-9\- _,.:\n\/]/', '', $response);
    preg_match('/Location: (.*)[\n\r]/', $cleanresponse, $a);
    if (!isset($a[1])) return $url;
    return parse_url($url, PHP_URL_SCHEME).'://'.parse_url($url, PHP_URL_HOST).$a[1];
}
It cleans the response of any special characters that can occur in the cURL output before cutting out the result URL (I ran into this problem on a PHP 7.3 server).