Is there a way to get the size of a remote file http://my_url/my_file.txt without downloading the file?
Found something about this here:
Here's the best way (that I've found) to get the size of a remote
file. Note that HEAD requests don't get the actual body of the request,
they just retrieve the headers. So making a HEAD request to a resource
that is 100MB will take the same amount of time as a HEAD request to a
resource that is 1KB.
<?php
/**
* Returns the size of a file without downloading it, or -1 if the file
* size could not be determined.
*
* #param $url - The location of the remote file to download. Cannot
* be null or empty.
*
* #return The size of the file referenced by $url, or -1 if the size
* could not be determined.
*/
function curl_get_file_size( $url ) {
// Assume failure.
$result = -1;
$curl = curl_init( $url );
// Issue a HEAD request and follow any redirects.
curl_setopt( $curl, CURLOPT_NOBODY, true );
curl_setopt( $curl, CURLOPT_HEADER, true );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_USERAGENT, get_user_agent_string() );
$data = curl_exec( $curl );
curl_close( $curl );
if( $data ) {
$content_length = "unknown";
$status = "unknown";
if( preg_match( "/^HTTP\/1\.[01] (\d\d\d)/", $data, $matches ) ) {
$status = (int)$matches[1];
}
if( preg_match( "/Content-Length: (\d+)/", $data, $matches ) ) {
$content_length = (int)$matches[1];
}
// http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
if( $status == 200 || ($status > 300 && $status <= 308) ) {
$result = $content_length;
}
}
return $result;
}
?>
Usage:
$file_size = curl_get_file_size( "http://stackoverflow.com/questions/2602612/php-remote-file-size-without-downloading-file" );
Try this code
function retrieve_remote_file_size($url){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE);
$data = curl_exec($ch);
$size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
curl_close($ch);
return $size;
}
As mentioned a couple of times, the way to go is to retrieve the information from the response header's Content-Length field.
However, you should note that
the server you're probing not necessarily implements the HEAD method(!)
there's absolutely no need to manually craft a HEAD request (which, again, might not even be supported) using fopen or alike or even to invoke the curl library, when PHP has get_headers() (remember: K.I.S.S.)
Use of get_headers() follows the K.I.S.S. principle and works even if the server you're probing does not support the HEAD request.
So, here's my version (gimmick: returns human-readable formatted size ;-)):
Gist: https://gist.github.com/eyecatchup/f26300ffd7e50a92bc4d (curl and get_headers version)
get_headers()-Version:
<?php
/**
* Get the file size of any remote resource (using get_headers()),
* either in bytes or - default - as human-readable formatted string.
*
* #author Stephan Schmitz <eyecatchup#gmail.com>
* #license MIT <http://eyecatchup.mit-license.org/>
* #url <https://gist.github.com/eyecatchup/f26300ffd7e50a92bc4d>
*
* #param string $url Takes the remote object's URL.
* #param boolean $formatSize Whether to return size in bytes or formatted.
* #param boolean $useHead Whether to use HEAD requests. If false, uses GET.
* #return string Returns human-readable formatted size
* or size in bytes (default: formatted).
*/
function getRemoteFilesize($url, $formatSize = true, $useHead = true)
{
if (false !== $useHead) {
stream_context_set_default(array('http' => array('method' => 'HEAD')));
}
$head = array_change_key_case(get_headers($url, 1));
// content-length of download (in bytes), read from Content-Length: field
$clen = isset($head['content-length']) ? $head['content-length'] : 0;
// cannot retrieve file size, return "-1"
if (!$clen) {
return -1;
}
if (!$formatSize) {
return $clen; // return size in bytes
}
$size = $clen;
switch ($clen) {
case $clen < 1024:
$size = $clen .' B'; break;
case $clen < 1048576:
$size = round($clen / 1024, 2) .' KiB'; break;
case $clen < 1073741824:
$size = round($clen / 1048576, 2) . ' MiB'; break;
case $clen < 1099511627776:
$size = round($clen / 1073741824, 2) . ' GiB'; break;
}
return $size; // return formatted size
}
Usage:
$url = 'http://download.tuxfamily.org/notepadplus/6.6.9/npp.6.6.9.Installer.exe';
echo getRemoteFilesize($url); // echoes "7.51 MiB"
Additional note: The Content-Length header is optional. Thus, as a general solution it isn't bullet proof!
Php function get_headers() works for me to check the content-length as
$headers = get_headers('http://example.com/image.jpg', 1);
$filesize = $headers['Content-Length'];
For More Detail : PHP Function get_headers()
Sure. Make a headers-only request and look for the Content-Length header.
one line best solution :
echo array_change_key_case(get_headers("http://.../file.txt",1))['content-length'];
php is too delicius
function urlsize($url):int{
return array_change_key_case(get_headers($url,1))['content-length'];
}
echo urlsize("http://.../file.txt");
I'm not sure, but couldn't you use the get_headers function for this?
$url = 'http://example.com/dir/file.txt';
$headers = get_headers($url, true);
if ( isset($headers['Content-Length']) ) {
$size = 'file size:' . $headers['Content-Length'];
}
else {
$size = 'file size: unknown';
}
echo $size;
The simplest and most efficient implementation:
function remote_filesize($url, $fallback_to_download = false)
{
static $regex = '/^Content-Length: *+\K\d++$/im';
if (!$fp = #fopen($url, 'rb')) {
return false;
}
if (isset($http_response_header) && preg_match($regex, implode("\n", $http_response_header), $matches)) {
return (int)$matches[0];
}
if (!$fallback_to_download) {
return false;
}
return strlen(stream_get_contents($fp));
}
Since this question is already tagged "php" and "curl", I'm assuming you know how to use Curl in PHP.
If you set curl_setopt(CURLOPT_NOBODY, TRUE) then you will make a HEAD request and can probably check the "Content-Length" header of the response, which will be only headers.
Try the below function to get Remote file size
function remote_file_size($url){
$head = "";
$url_p = parse_url($url);
$host = $url_p["host"];
if(!preg_match("/[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*/",$host)){
$ip=gethostbyname($host);
if(!preg_match("/[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*/",$ip)){
return -1;
}
}
if(isset($url_p["port"]))
$port = intval($url_p["port"]);
else
$port = 80;
if(!$port) $port=80;
$path = $url_p["path"];
$fp = fsockopen($host, $port, $errno, $errstr, 20);
if(!$fp) {
return false;
} else {
fputs($fp, "HEAD " . $url . " HTTP/1.1\r\n");
fputs($fp, "HOST: " . $host . "\r\n");
fputs($fp, "User-Agent: http://www.example.com/my_application\r\n");
fputs($fp, "Connection: close\r\n\r\n");
$headers = "";
while (!feof($fp)) {
$headers .= fgets ($fp, 128);
}
}
fclose ($fp);
$return = -2;
$arr_headers = explode("\n", $headers);
foreach($arr_headers as $header) {
$s1 = "HTTP/1.1";
$s2 = "Content-Length: ";
$s3 = "Location: ";
if(substr(strtolower ($header), 0, strlen($s1)) == strtolower($s1)) $status = substr($header, strlen($s1));
if(substr(strtolower ($header), 0, strlen($s2)) == strtolower($s2)) $size = substr($header, strlen($s2));
if(substr(strtolower ($header), 0, strlen($s3)) == strtolower($s3)) $newurl = substr($header, strlen($s3));
}
if(intval($size) > 0) {
$return=intval($size);
} else {
$return=$status;
}
if (intval($status)==302 && strlen($newurl) > 0) {
$return = remote_file_size($newurl);
}
return $return;
}
Here is another approach that will work with servers that do not support HEAD requests.
It uses cURL to make a request for the content with an HTTP range header asking for the first byte of the file.
If the server supports range requests (most media servers will) then it will receive the response with the size of the resource.
If the server does not response with a byte range, it will look for a content-length header to determine the length.
If the size is found in a range or content-length header, the transfer is aborted. If the size is not found and the function starts reading the response body, the transfer is aborted.
This could be a supplementary approach if a HEAD request results in a 405 method not supported response.
/**
* Try to determine the size of a remote file by making an HTTP request for
* a byte range, or look for the content-length header in the response.
* The function aborts the transfer as soon as the size is found, or if no
* length headers are returned, it aborts the transfer.
*
* #return int|null null if size could not be determined, or length of content
*/
function getRemoteFileSize($url)
{
$ch = curl_init($url);
$headers = array(
'Range: bytes=0-1',
'Connection: close',
);
$in_headers = true;
$size = null;
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2450.0 Iron/46.0.2450.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_VERBOSE, 0); // set to 1 to debug
curl_setopt($ch, CURLOPT_STDERR, fopen('php://output', 'r'));
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function($curl, $line) use (&$in_headers, &$size) {
$length = strlen($line);
if (trim($line) == '') {
$in_headers = false;
}
list($header, $content) = explode(':', $line, 2);
$header = strtolower(trim($header));
if ($header == 'content-range') {
// found a content-range header
list($rng, $s) = explode('/', $content, 2);
$size = (int)$s;
return 0; // aborts transfer
} else if ($header == 'content-length' && 206 != curl_getinfo($curl, CURLINFO_HTTP_CODE)) {
// found content-length header and this is not a 206 Partial Content response (range response)
$size = (int)$content;
return 0;
} else {
// continue
return $length;
}
});
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function($curl, $data) use ($in_headers) {
if (!$in_headers) {
// shouldn't be here unless we couldn't determine file size
// abort transfer
return 0;
}
// write function is also called when reading headers
return strlen($data);
});
$result = curl_exec($ch);
$info = curl_getinfo($ch);
return $size;
}
Usage:
$size = getRemoteFileSize('http://example.com/video.mp4');
if ($size === null) {
echo "Could not determine file size from headers.";
} else {
echo "File size is {$size} bytes.";
}
Most answers here uses either CURL or are basing on reading headers. But in some certain situations you can use a way easier solution. Consider note on filesize()'s docs on PHP.net. You'll find there a tip saying: "As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to Supported Protocols and Wrappers to determine which wrappers support stat() family of functionality".
So, if your server and PHP parser is properly configured, you can simply use filesize() function, fed it with full URL, pointing to a remote file, which size you want to get, and let PHP do the all magic.
Try this: I use it and got good result.
function getRemoteFilesize($url)
{
$file_headers = #get_headers($url, 1);
if($size =getSize($file_headers)){
return $size;
} elseif($file_headers[0] == "HTTP/1.1 302 Found"){
if (isset($file_headers["Location"])) {
$url = $file_headers["Location"][0];
if (strpos($url, "/_as/") !== false) {
$url = substr($url, 0, strpos($url, "/_as/"));
}
$file_headers = #get_headers($url, 1);
return getSize($file_headers);
}
}
return false;
}
function getSize($file_headers){
if (!$file_headers || $file_headers[0] == "HTTP/1.1 404 Not Found" || $file_headers[0] == "HTTP/1.0 404 Not Found") {
return false;
} elseif ($file_headers[0] == "HTTP/1.0 200 OK" || $file_headers[0] == "HTTP/1.1 200 OK") {
$clen=(isset($file_headers['Content-Length']))?$file_headers['Content-Length']:false;
$size = $clen;
if($clen) {
switch ($clen) {
case $clen < 1024:
$size = $clen . ' B';
break;
case $clen < 1048576:
$size = round($clen / 1024, 2) . ' KiB';
break;
case $clen < 1073741824:
$size = round($clen / 1048576, 2) . ' MiB';
break;
case $clen < 1099511627776:
$size = round($clen / 1073741824, 2) . ' GiB';
break;
}
}
return $size;
}
return false;
}
Now, test like these:
echo getRemoteFilesize('http://mandasoy.com/wp-content/themes/spacious/images/plain.png').PHP_EOL;
echo getRemoteFilesize('http://bookfi.net/dl/201893/e96818').PHP_EOL;
echo getRemoteFilesize('https://stackoverflow.com/questions/14679268/downloading-files-as-attachment-filesize-incorrect').PHP_EOL;
Results:
24.82 KiB
912 KiB
101.85 KiB
To cover the HTTP/2 request, the function provided here https://stackoverflow.com/a/2602624/2380767 needs to be changed a bit:
<?php
/**
* Returns the size of a file without downloading it, or -1 if the file
* size could not be determined.
*
* #param $url - The location of the remote file to download. Cannot
* be null or empty.
*
* #return The size of the file referenced by $url, or -1 if the size
* could not be determined.
*/
function curl_get_file_size( $url ) {
// Assume failure.
$result = -1;
$curl = curl_init( $url );
// Issue a HEAD request and follow any redirects.
curl_setopt( $curl, CURLOPT_NOBODY, true );
curl_setopt( $curl, CURLOPT_HEADER, true );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_USERAGENT, get_user_agent_string() );
$data = curl_exec( $curl );
curl_close( $curl );
if( $data ) {
$content_length = "unknown";
$status = "unknown";
if( preg_match( "/^HTTP\/1\.[01] (\d\d\d)/", $data, $matches ) ) {
$status = (int)$matches[1];
} elseif( preg_match( "/^HTTP\/2 (\d\d\d)/", $data, $matches ) ) {
$status = (int)$matches[1];
}
if( preg_match( "/Content-Length: (\d+)/", $data, $matches ) ) {
$content_length = (int)$matches[1];
} elseif( preg_match( "/content-length: (\d+)/", $data, $matches ) ) {
$content_length = (int)$matches[1];
}
// http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
if( $status == 200 || ($status > 300 && $status <= 308) ) {
$result = $content_length;
}
}
return $result;
}
?>
If you using laravel 7 <=
use Illuminate\Support\Facades\Http;
Http::head($url)->header('Content-Length');
Related
I am writing a PHP program that downloads a pdf from a backend and save to a local drive. Now how do I check whether the file exists before downloading?
Currently I am using curl (see code below) to check and download but it still downloads the file which is 1KB in size.
$url = "http://wedsite/test.pdf";
$path = "C:\\test.pdf;"
downloadAndSave($url,$path);
function downloadAndSave($urlS,$pathS)
{
$fp = fopen($pathS, 'w');
$ch = curl_init($urlS);
curl_setopt($ch, CURLOPT_FILE, $fp);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
echo $httpCode;
//If 404 is returned, then file is not found.
if(strcmp($httpCode,"404") == 1)
{
echo $httpCode;
echo $urlS;
}
fclose($fp);
}
I want to check whether the file exists before even downloading. Any idea how to do it?
You can do this with a separate curl HEAD request:
curl_setopt($ch, CURLOPT_NOBODY, true);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
When you actually want to download you can use set NOBODY to false.
Call this before your download function and it's done:
<?php function remoteFileExists($url) {
$curl = curl_init($url);
//don't fetch the actual page, you only want to check the connection is ok
curl_setopt($curl, CURLOPT_NOBODY, true);
//do request
$result = curl_exec($curl);
$ret = false;
//if request did not fail
if ($result !== false) {
//if request was ok, check response code
$statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
if ($statusCode == 200) {
$ret = true;
}
}
curl_close($curl);
return $ret;
}
?>
Since you are using HTTP to fetch a resource on the internet, what you really want to check is that the return code is a 404.
On some PHP installations, you can just use file_exists($url) out of the box. This does not work in all environments, however. http://www.php.net/manual/en/wrappers.http.php
Here is a function much like file_exists but for URLs, using curl:
<?php function curl_exists()
$file_headers = #get_headers($url);
if($file_headers[0] == 'HTTP/1.1 404 Not Found') {
$exists = false;
}
else {
$exists = true;
}
} ?>
source: http://www.php.net/manual/en/function.file-exists.php#75064
Sometimes the CURL extension isn't installed with PHP. In that case you can still use the socket library in the PHP core:
<?php function url_exists($url) {
$a_url = parse_url($url);
if (!isset($a_url['port'])) $a_url['port'] = 80;
$errno = 0;
$errstr = '';
$timeout = 30;
if(isset($a_url['host']) && $a_url['host']!=gethostbyname($a_url['host'])){
$fid = fsockopen($a_url['host'], $a_url['port'], $errno, $errstr, $timeout);
if (!$fid) return false;
$page = isset($a_url['path']) ?$a_url['path']:'';
$page .= isset($a_url['query'])?'?'.$a_url['query']:'';
fputs($fid, 'HEAD '.$page.' HTTP/1.0'."\r\n".'Host: '.$a_url['host']."\r\n\r\n");
$head = fread($fid, 4096);
$head = substr($head,0,strpos($head, 'Connection: close'));
fclose($fid);
if (preg_match('#^HTTP/.*\s+[200|302]+\s#i', $head)) {
$pos = strpos($head, 'Content-Type');
return $pos !== false;
}
} else {
return false;
}
} ?>
source: http://www.php.net/manual/en/function.file-exists.php#73175
An even faster function can be found here:
http://www.php.net/manual/en/function.file-exists.php#76246
In the first example above $file_headers[0] may contain more than or something other than 'HTTP/1.1 404 Not Found', e.g:
HTTP/1.1 404 Document+%2Fdb%2Fscotbiz%2Freports%2FR20131212%2Exml+not+found
So it's important to use some other test, e.g., regex, as '==' is not reliable.
Is it possible to make curl, access a url and the result as a file resource? like how fopen does it.
My goals:
Parse a CSV file
Pass it to fgetcsv
My obstruction: fopen is disabled
My chunk of codes (in fopen)
$url = "http://download.finance.yahoo.com/d/quotes.csv?s=USDEUR=X&f=sl1d1t1n&e=.csv";
$f = fopen($url, 'r');
print_r(fgetcsv($f));
Then, I am trying this on curl.
$curl = curl_init();
curl_setopt($curl, CURLOPT_VERBOSE, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
curl_setopt($curl, CURLOPT_URL, $url);
$content = #curl_exec($curl);
curl_close($curl);
But, as usual. $content already returns a string.
Now, is it possible for curl to return it as a file resource pointer? just like fopen? Using PHP < 5.1.x something. I mean, not using str_getcsv, since it's only 5.3.
My error
Warning: fgetcsv() expects parameter 1 to be resource, boolean given
Thanks
Assuming that by fopen is disabled you mean "allow_url_fopen is disabled", a combination of CURLOPT_FILE and php://temp make this fairly easy:
$f = fopen('php://temp', 'w+');
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_FILE, $f);
// Do you need these? Your fopen() method isn't a post request
// curl_setopt($curl, CURLOPT_POST, true);
// curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
curl_exec($curl);
curl_close($curl);
rewind($f);
while ($line = fgetcsv($f)) {
print_r($line);
}
fclose($f);
Basically this creates a pointer to a "virtual" file, and cURL stores the response in it. Then you just reset the pointer to the beginning and it can be treated as if you had opened it as usual with fopen($url, 'r');
You can create a temporary file using fopen() and then fwrite() the contents into it. After that, the newly created file will be readable by fgetcsv(). The tempnam() function should handle the creation of arbitrary temporary files.
According to the comments on str_getcsv(), users without access to the command could try the function below. There are also various other approaches in the comments, make sure you check them out.
function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = '\\', $eol = '\n') {
if (is_string($input) && !empty($input)) {
$output = array();
$tmp = preg_split("/".$eol."/",$input);
if (is_array($tmp) && !empty($tmp)) {
while (list($line_num, $line) = each($tmp)) {
if (preg_match("/".$escape.$enclosure."/",$line)) {
while ($strlen = strlen($line)) {
$pos_delimiter = strpos($line,$delimiter);
$pos_enclosure_start = strpos($line,$enclosure);
if (
is_int($pos_delimiter) && is_int($pos_enclosure_start)
&& ($pos_enclosure_start < $pos_delimiter)
) {
$enclosed_str = substr($line,1);
$pos_enclosure_end = strpos($enclosed_str,$enclosure);
$enclosed_str = substr($enclosed_str,0,$pos_enclosure_end);
$output[$line_num][] = $enclosed_str;
$offset = $pos_enclosure_end+3;
} else {
if (empty($pos_delimiter) && empty($pos_enclosure_start)) {
$output[$line_num][] = substr($line,0);
$offset = strlen($line);
} else {
$output[$line_num][] = substr($line,0,$pos_delimiter);
$offset = (
!empty($pos_enclosure_start)
&& ($pos_enclosure_start < $pos_delimiter)
)
?$pos_enclosure_start
:$pos_delimiter+1;
}
}
$line = substr($line,$offset);
}
} else {
$line = preg_split("/".$delimiter."/",$line);
/*
* Validating against pesky extra line breaks creating false rows.
*/
if (is_array($line) && !empty($line[0])) {
$output[$line_num] = $line;
}
}
}
return $output;
} else {
return false;
}
} else {
return false;
}
}
There is a redirect to server for information and once response comes from server, I want to check HTTP code to throw an exception if there is any code starting with 4XX. For that I need to know how can I get only HTTP code from header? Also here redirection to server is involved so I afraid curl will not be useful to me.
So far I have tried this solution but it's very slow and creates script time out in my case. I don't want to increase script time out period and wait longer just to get an HTTP code.
Thanks in advance for any suggestion.
Your method with get_headers and requesting the first response line will return the status code of the redirect (if any) and more importantly, it will do a GET request which will transfer the whole file.
You need only a HEAD request and then to parse the headers and return the last status code. Following is a code example that does this, it's using $http_response_header instead of get_headers, but the format of the array is the same:
$url = 'http://example.com/';
$options['http'] = array(
'method' => "HEAD",
'ignore_errors' => 1,
);
$context = stream_context_create($options);
$body = file_get_contents($url, NULL, $context);
$responses = parse_http_response_header($http_response_header);
$code = $responses[0]['status']['code']; // last status code
echo "Status code (after all redirects): $code<br>\n";
$number = count($responses);
$redirects = $number - 1;
echo "Number of responses: $number ($redirects Redirect(s))<br>\n";
if ($redirects)
{
$from = $url;
foreach (array_reverse($responses) as $response)
{
if (!isset($response['fields']['LOCATION']))
break;
$location = $response['fields']['LOCATION'];
$code = $response['status']['code'];
echo " * $from -- $code --> $location<br>\n";
$from = $location;
}
echo "<br>\n";
}
/**
* parse_http_response_header
*
* #param array $headers as in $http_response_header
* #return array status and headers grouped by response, last first
*/
function parse_http_response_header(array $headers)
{
$responses = array();
$buffer = NULL;
foreach ($headers as $header)
{
if ('HTTP/' === substr($header, 0, 5))
{
// add buffer on top of all responses
if ($buffer) array_unshift($responses, $buffer);
$buffer = array();
list($version, $code, $phrase) = explode(' ', $header, 3) + array('', FALSE, '');
$buffer['status'] = array(
'line' => $header,
'version' => $version,
'code' => (int) $code,
'phrase' => $phrase
);
$fields = &$buffer['fields'];
$fields = array();
continue;
}
list($name, $value) = explode(': ', $header, 2) + array('', '');
// header-names are case insensitive
$name = strtoupper($name);
// values of multiple fields with the same name are normalized into
// a comma separated list (HTTP/1.0+1.1)
if (isset($fields[$name]))
{
$value = $fields[$name].','.$value;
}
$fields[$name] = $value;
}
unset($fields); // remove reference
array_unshift($responses, $buffer);
return $responses;
}
For more information see: HEAD first with PHP Streams, at the end it contains example code how you can do the HEAD request with get_headers as well.
Related: How can one check to see if a remote file exists using PHP?
Something like:
$ch = curl_init();
$httpcode = curl_getinfo ($ch, CURLINFO_HTTP_CODE );
You should try the HttpEngine Class.
Hope this helps.
--
EDIT
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $your_agent_variable);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $your_referer);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$output = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpcode ...)
The solution you found looks good. If the server is not able to send you the http headers in time your problem is that the other server is broken or under very heavy load.
Is there a way in php to check remote files size and date/time on the server ?
try for date
function GetRemoteLastModified( $uri )
{
// default
$unixtime = 0;
$fp = fopen( $uri, "r" );
if( !$fp ) {return;}
$MetaData = stream_get_meta_data( $fp );
foreach( $MetaData['wrapper_data'] as $response )
{
// case: redirection
if( substr( strtolower($response), 0, 10 ) == 'location: ' )
{
$newUri = substr( $response, 10 );
fclose( $fp );
return GetRemoteLastModified( $newUri );
}
// case: last-modified
elseif( substr( strtolower($response), 0, 15 ) == 'last-modified: ' )
{
$unixtime = strtotime( substr($response, 15) );
break;
}
}
fclose( $fp );
return $unixtime;
}
and for file size
function remotefilesize($remoteFile){
$ch = curl_init($remoteFile);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); //not necessary unless the file redirects (like the PHP example we're using here)
$data = curl_exec($ch);
curl_close($ch);
if ($data === false) {
echo 'cURL failed';
exit;
}
$contentLength = 'unknown';
$status = 'unknown';
if (preg_match('/^HTTP\/1\.[01] (\d\d\d)/', $data, $matches)) {
$status = (int)$matches[1];
}
if (preg_match('/Content-Length: (\d+)/', $data, $matches)) {
$contentLength = (int)$matches[1];
}
echo 'HTTP Status: ' . $status . "\n";
echo 'Content-Length: ' . $contentLength;
}
Please check for explaination. I have got above functions from these links.
http://php.net/manual/en/function.filemtime.php
http://php.net/manual/en/function.filesize.php
You could use filemtime() and filesize().
Be aware of this BTW:
As of PHP 5.0.0, this function can also be used with some URL wrappers.
Refer to List of Supported Protocols/Wrappers for a listing of which wrappers
support stat() family of functionality.
What I'd like to do is find out what is the last/final URL after following the redirections.
I would prefer not to use cURL. I would like to stick with pure PHP (stream wrappers).
Right now I have a URL (let's say http://domain.test), and I use get_headers() to get specific headers from that page. get_headers will also return multiple Location: headers (see Edit below). Is there a way to use those headers to build the final URL? or is there a PHP function that would automatically do this?
Edit: get_headers() follows redirections and returns all the headers for each response/redirections, so I have all the Location: headers.
function getRedirectUrl ($url) {
stream_context_set_default(array(
'http' => array(
'method' => 'HEAD'
)
));
$headers = get_headers($url, 1);
if ($headers !== false && isset($headers['Location'])) {
return $headers['Location'];
}
return false;
}
Additionally...
As was mentioned in a comment, the final item in $headers['Location'] will be your final URL after all redirects. It's important to note, though, that it won't always be an array. Sometimes it's just a run-of-the-mill, non-array variable. In this case, trying to access the last array element will most likely return a single character. Not ideal.
If you are only interested in the final URL, after all the redirects, I would suggest changing
return $headers['Location'];
to
return is_array($headers['Location']) ? array_pop($headers['Location']) : $headers['Location'];
... which is just if short-hand for
if(is_array($headers['Location'])){
return array_pop($headers['Location']);
}else{
return $headers['Location'];
}
This fix will take care of either case (array, non-array), and remove the need to weed-out the final URL after calling the function.
In the case where there are no redirects, the function will return false. Similarly, the function will also return false for invalid URLs (invalid for any reason). Therefor, it is important to check the URL for validity before running this function, or else incorporate the redirect check somewhere into your validation.
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect.
*
* #param string $url
* #return string
*/
function get_redirect_url($url){
$redirect_url = null;
$url_parts = #parse_url($url);
if (!$url_parts) return false;
if (!isset($url_parts['host'])) return false; //can't process relative URLs
if (!isset($url_parts['path'])) $url_parts['path'] = '/';
$sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return false;
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while(!feof($sock)) $response .= fread($sock, 8192);
fclose($sock);
if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
if ( substr($matches[1], 0, 1) == "/" )
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else {
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* #param string $url
* #return array
*/
function get_all_redirects($url){
$redirects = array();
while ($newurl = get_redirect_url($url)){
if (in_array($newurl, $redirects)){
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect.
*
* #param string $url
* #return string
*/
function get_final_url($url){
$redirects = get_all_redirects($url);
if (count($redirects)>0){
return array_pop($redirects);
} else {
return $url;
}
}
And, as always, give credit:
http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
While the OP wanted to avoid cURL, it's best to use it when it's available. Here's a solution which has the following advantages
uses curl for all the heavy lifting, so works with https
copes with servers which return lower cased location header name (both xaav and webjay's answers do not handle this)
allows you to control how deep you want you go before giving up
Here's the function:
function findUltimateDestination($url, $maxRequests = 10)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRequests);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
//customize user agent if you desire...
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
curl_setopt($ch, CURLOPT_URL, $url);
curl_exec($ch);
$url=curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close ($ch);
return $url;
}
Here's a more verbose version which allows you to inspect the redirection chain rather than let curl follow it.
function findUltimateDestination($url, $maxRequests = 10)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
//customize user agent if you desire...
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
while ($maxRequests--) {
//fetch
curl_setopt($ch, CURLOPT_URL, $url);
$response = curl_exec($ch);
//try to determine redirection url
$location = '';
if (in_array(curl_getinfo($ch, CURLINFO_HTTP_CODE), [301, 302, 303, 307, 308])) {
if (preg_match('/Location:(.*)/i', $response, $match)) {
$location = trim($match[1]);
}
}
if (empty($location)) {
//we've reached the end of the chain...
return $url;
}
//build next url
if ($location[0] == '/') {
$u = parse_url($url);
$url = $u['scheme'] . '://' . $u['host'];
if (isset($u['port'])) {
$url .= ':' . $u['port'];
}
$url .= $location;
} else {
$url = $location;
}
}
return null;
}
As an example of redirection chain which this function handles, but the others do not, try this:
echo findUltimateDestination('http://dx.doi.org/10.1016/j.infsof.2016.05.005')
At the time of writing, this involves 4 requests, with a mixture of Location and location headers involved.
xaav answer is very good; except for the following two issues:
It does not support HTTPS protocol => The solution was proposed as a comment in the original site: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
Some sites will not work since they will not recognise the underlying user agent (client browser)
=> This is simply fixed by adding a User-agent header field: I added an Android user agent (you can find here http://www.useragentstring.com/pages/useragentstring.php other user agent examples according you your need):
$request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
Here's the modified answer:
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect.
*
* #param string $url
* #return string
*/
function get_redirect_url($url){
$redirect_url = null;
$url_parts = #parse_url($url);
if (!$url_parts) return false;
if (!isset($url_parts['host'])) return false; //can't process relative URLs
if (!isset($url_parts['path'])) $url_parts['path'] = '/';
$sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return false;
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while(!feof($sock)) $response .= fread($sock, 8192);
fclose($sock);
if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
if ( substr($matches[1], 0, 1) == "/" )
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else {
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* #param string $url
* #return array
*/
function get_all_redirects($url){
$redirects = array();
while ($newurl = get_redirect_url($url)){
if (in_array($newurl, $redirects)){
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect.
*
* #param string $url
* #return string
*/
function get_final_url($url){
$redirects = get_all_redirects($url);
if (count($redirects)>0){
return array_pop($redirects);
} else {
return $url;
}
}
Added to code from answers #xaav and #Houssem BDIOUI: 404 Error case and case when URL with no response. get_final_url($url) in that cases return strings: 'Error: 404 Not Found' and 'Error: No Responce'.
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect,
* or 'Error: No Responce',
* or 'Error: 404 Not Found'
*
* #param string $url
* #return string
*/
function get_redirect_url($url)
{
$redirect_url = null;
$url_parts = #parse_url($url);
if (!$url_parts)
return false;
if (!isset($url_parts['host']))
return false; //can't process relative URLs
if (!isset($url_parts['path']))
$url_parts['path'] = '/';
$sock = #fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return 'Error: No Responce';
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?' . $url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while (!feof($sock))
$response .= fread($sock, 8192);
fclose($sock);
if (stripos($response, '404 Not Found') !== false)
{
return 'Error: 404 Not Found';
}
if (preg_match('/^Location: (.+?)$/m', $response, $matches))
{
if (substr($matches[1], 0, 1) == "/")
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else
{
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* #param string $url
* #return array
*/
function get_all_redirects($url)
{
$redirects = array();
while ($newurl = get_redirect_url($url))
{
if (in_array($newurl, $redirects))
{
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect,
* or 'Error: No Responce'
* or 'Error: 404 Not Found',
*
* #param string $url
* #return string
*/
function get_final_url($url)
{
$redirects = get_all_redirects($url);
if (count($redirects) > 0)
{
return array_pop($redirects);
} else
{
return $url;
}
}
After hours of reading Stackoverflow and trying out all custom functions written by people as well as trying all the cURL suggestions and nothing did more than 1 redirection, I managed to do a logic of my own which works.
$url = 'facebook.com';
// First let's find out if we just typed the domain name alone or we prepended with a protocol
if (preg_match('/(http|https):\/\/[a-z0-9]+[a-z0-9_\/]*/',$url)) {
$url = $url;
} else {
$url = 'http://' . $url;
echo '<p>No protocol given, defaulting to http://';
}
// Let's print out the initial URL
echo '<p>Initial URL: ' . $url . '</p>';
// Prepare the HEAD method when we send the request
stream_context_set_default(array('http' => array('method' => 'HEAD')));
// Probe for headers
$headers = get_headers($url, 1);
// If there is a Location header, trigger logic
if (isset($headers['Location'])) {
// If there is more than 1 redirect, Location will be array
if (is_array($headers['Location'])) {
// If that's the case, we are interested in the last element of the array (thus the last Location)
echo '<p>Redirected URL: ' . $headers['Location'][array_key_last($headers['Location'])] . '</p>';
$url = $headers['Location'][array_key_last($headers['Location'])];
} else {
// If it's not an array, it means there is only 1 redirect
//var_dump($headers['Location']);
echo '<p>Redirected URL: ' . $headers['Location'] . '</p>';
$url = $headers['Location'];
}
} else {
echo '<p>URL: ' . $url . '</p>';
}
// You can now send get_headers to the latest location
$headers = get_headers($url, 1);