Read a remote file in PHP

I want to show the contents of a remote file (a file on another server) on my website.
I used the following code; readfile() works fine on the current server:
<?php
echo readfile("editor.php");
But when I tried to get a remote file:
<?php
echo readfile("http://example.com/php_editor.php");
It showed the following error:
301 moved
The document has moved here 224
I am getting this error with remote files only; local files display with no problem.
Is there any way to fix this?
Thanks!

Option 1 - Curl
Use cURL and set the CURLOPT_FOLLOWLOCATION option to true:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch); // run the request once and keep the result
if($result === FALSE) {
echo "Error: " . curl_error($ch);
} else {
echo $result;
}
curl_close($ch);
?>
Option 2 - file_get_contents
According to the PHP documentation, file_get_contents() follows up to 20 redirects by default, so you can use that function as well. On failure, file_get_contents() returns FALSE; otherwise it returns the entire file. Note that reading URLs this way requires allow_url_fopen to be enabled in php.ini.
<?php
$string = file_get_contents("http://www.example.com");
if($string === FALSE) {
echo "Could not read the file.";
} else {
echo $string;
}
?>

Related

retrieve file information from url using php

I am trying to retrieve information about a file from the URL containing the file. How can I get the information about the file before downloading it to my server?
I need file information like file size, file type, etc.
I have found code to validate and download the file, but how do I get information from it before actually downloading the file to the server?
<?php
function is_url_exist($url)
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$status = ($code == 200); // initialise $status so the check below is always defined
curl_close($ch);
if ($status)
{
$name = "abc.png";
if (file_put_contents("uploads/$name", file_get_contents($url)))
echo "file uploaded";
else
echo "error check upload link";
}
}
$url = "http://theonlytutorials.com/wp-content/uploads/2015/06/blog-logo1.png";
is_url_exist($url);
?>
You can get all the information about a remote file with the get_headers() function. Try the following code to find out the type, content length, etc.:
$url = "http://theonlytutorials.com/wp-content/uploads/2015/06/blog-logo1.png";
$headers = get_headers($url,1);
print_r($headers);
To learn more about get_headers(), see http://php.net/manual/en/function.get-headers.php
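If you want the same details without PHP's HTTP wrapper, a cURL HEAD request exposes them through curl_getinfo(). A minimal sketch (reusing the URL from the question; adjust as needed):
<?php
$url = "http://theonlytutorials.com/wp-content/uploads/2015/06/blog-logo1.png";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo anything
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_exec($ch);
$size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if the server didn't say
$type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
curl_close($ch);
echo "Size: $size bytes, type: $type\n";
?>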

Downloading 16 Mb web page with curl and parsing content between <body> tags

I'm building a PHP application which makes a curl request to a number of different URLs. It then attempts to parse the string of data returned by curl to extract everything in the <body> </body> tags. This works absolutely fine for 99% of URLs.
However, one such URL is a page, which takes some time to load in a browser. Upon inspection I realised that the markup for the page is 16 Mb.
The settings I have for curl are as follows:
$ch = curl_init();
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$data = curl_exec($ch);
if (!$data) {
echo 'ERROR: Curl has reported an error: ' . curl_error($ch) . "\n";
}
return $data;
The error message I added for the !$data condition is not output, so my assumption is that there are no errors from curl itself. I attempted to change CURLOPT_CONNECTTIMEOUT to 120 seconds (as opposed to 5), but this doesn't fix the issue.
When $data is returned to my script:
if ($data) {
$body = '';
preg_match("/<body[^>]*>(.*?)<\/body>/is", $data, $body);
if (empty($body)) {
echo 'WARNING: nothing found in <body> tag: ' . "\n";
var_dump($body);
} else {
// Writing to file occurs here...
// This bit works ok when $body is available.
}
}
It's showing me the warning message "WARNING: nothing found in <body> tag:" and the output from var_dump($body) is an empty array:
array(0) {
}
Does anyone know how I can further debug this? I'm not sure where the error is originating. I have manually saved a copy of the web page and there are indeed opening and closing <body> tags with lots of HTML in between.
My assumption is this is some problem due to the file size. The "average" file size on this application is about 1 Mb, and my script works perfectly with these files.
I am running this on a server from the cli, i.e. php download.php not through a browser.
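One thing worth ruling out (a debugging sketch, not a confirmed diagnosis): on a 16 Mb input, preg_match() can abort and return false once PCRE exceeds its backtrack limit, which also leaves the matches array empty. preg_last_error() will tell you whether that happened:
$matched = preg_match("/<body[^>]*>(.*?)<\/body>/is", $data, $body);
if ($matched === false && preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
    // pcre.backtrack_limit (default 1000000) was exceeded on the large input;
    // raising it may allow the match to complete:
    ini_set('pcre.backtrack_limit', '100000000');
}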

cURL not getting HTML source of URL

I am trying to make a simple web crawler with PHP and I am having issues getting the HTML source of a given URL. I am currently using cURL to get the source.
My code:
$url = "http://www.nytimes.com/";
function url_get_contents($Url) {
if (!function_exists('curl_init')) {
die('CURL is not installed!');
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
if ($output === false) { die(curl_error($ch)); }
curl_close($ch);
return $output;
}
echo url_get_contents($url);
?>
Right now nothing gets echoed and there aren't any errors, so it is a bit of a mystery. Any suggestions or fixes will be appreciated.
Edit: I added
if ($output === false) { die(curl_error($ch)); }
to the middle of the function and it ended up giving me an error (finally!):
Could not resolve host: www.nytimes.com
I still do not really know what the problem is. Any ideas?
Thanks
Turns out that it was not a cURL problem.
My host server (an Ubuntu VM) was working off a "host-only" network adapter, which blocked access to all IPs and domains outside its host machine, making it impossible for cURL to connect to URLs.
Once it was changed to a "bridged" network adapter, I had access to the outside world.
Hope this helps.
Variable case mismatch ($url vs. $Url). Change:
function url_get_contents($Url) {
to
function url_get_contents($url) {

Correct PHP way to check if external image exists?

I know that there are at least 10 identical questions with answers, but none of them seems to work for me flawlessly. I'm trying to check whether an internal or external image exists (i.e. is the image URL valid?).
fopen($url, 'r') fails unless I use @fopen():
Warning: fopen(http://example.com/img.jpg) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in file.php on line 21
getimagesize($img) fails when image doesn't exist (PHP 5.3.8):
Warning: getimagesize() [function.getimagesize]: php_network_getaddresses: getaddrinfo failed
cURL fails because it isn't supported by some servers (although it's present almost everywhere).
file_exists() fails because it doesn't work with external URLs and can't possibly check whether we're dealing with an image.
The four methods that are the most common answers to such questions are all flawed. What would be the correct way to do this?
"getimagesize($img) fails when image doesn't exist": I am not sure you understand what you want...
FROM THE PHP DOCS:
The getimagesize() function will determine the size of any given image file and return the dimensions along with the file type and a height/width text string to be used inside a normal HTML IMG tag, and the corresponding HTTP content type.
On failure, FALSE is returned.
Example
$img = array("http://i.stack.imgur.com/52Ha1.png","http://example.com/img.jpg");
foreach ( $img as $v ) {
echo $v, @getimagesize($v) ? " = OK \n" : " = Not valid \n";
}
Output
http://i.stack.imgur.com/52Ha1.png = OK
http://example.com/img.jpg = Not valid
getimagesize works just fine
PHP 5.3.19
PHP 5.4.9
Edit
@Paul: but your question is essentially saying "How do I handle this so I won't get an error when there's an error condition?" And the answer to that is "you can't", because all these functions will trigger an error when there is an error condition. So (if you don't want the error) you suppress it. None of this should matter in production, because you shouldn't be displaying errors anyway ;-) – DaveRandom
This code is actually meant to check a file in general... but it does work for images!
$url = "http://www.myfico.com/Images/sample_overlay.gif";
$header_response = get_headers($url, 1);
if ( strpos( $header_response[0], "404" ) !== false )
{
// FILE DOES NOT EXIST
}
else
{
// FILE EXISTS!!
}
function checkExternalFile($url)
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$retCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
return $retCode;
}
$fileExists = checkExternalFile("http://example.com/your/url/here.jpg");
// $fileExists >= 400 = not found
// $fileExists == 200 = found.
If you're using PHP >= 5.0.0 you can pass an additional parameter to fopen() to specify context options for HTTP, among them whether to ignore failure status codes.
$contextOptions = array( 'http' => array('ignore_errors' => true));
$context = stream_context_create($contextOptions);
$handle = fopen($url, 'r', false, $context);
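With ignore_errors set, fopen() succeeds even on a 404, so you still have to inspect the status line yourself. One way (a small sketch continuing the example above) is via stream_get_meta_data(), whose wrapper_data entry holds the response headers:
$meta = stream_get_meta_data($handle);
$statusLine = $meta['wrapper_data'][0];      // e.g. "HTTP/1.1 404 Not Found"
if (preg_match('{HTTP/\S+\s(\d{3})}', $statusLine, $m) && $m[1] == 200) {
    // the image exists and $handle can be read
}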
Use fsockopen, connect to the server, send a HEAD request and see what status comes back.
The only time you need to be aware of problems is when the domain doesn't exist.
Example code:
$file = "http://example.com/img.jpg";
$path = parse_url($file);
$fp = @fsockopen($path['host'], isset($path['port']) ? $path['port'] : 80);
if( !$fp) echo "Failed to connect... Either server is down or host doesn't exist.";
else {
fputs($fp,"HEAD ".$file." HTTP/1.0\r\n"
."Host: ".$path['host']."\r\n\r\n");
$firstline = fgets($fp);
list(,$status,$statustext) = explode(" ",$firstline,3);
if( $status == 200) echo "OK!";
else "Status ".$status." ".$statustext."...";
}
You can use the PEAR HTTP_Request2 package for this. You can find it here.
Here is an example. The example expects that you have installed or downloaded the HTTP_Request2 package properly. It uses the old-style socket adapter, not cURL.
<?php
require_once 'HTTP/Request2.php';
require_once 'HTTP/Request2/Adapter/Socket.php';
$request = new HTTP_Request2 (
$your_url,
HTTP_Request2::METHOD_GET,
array('adapter' => new HTTP_Request2_Adapter_Socket())
);
switch($request->send()->getResponseCode()) {
case 404 :
echo 'not found';
break;
case 200 :
echo 'found';
break;
default :
echo 'needs further attention';
}
I found try/catch to be the best solution for this. It works fine for me.
try{
list($width, $height) = getimagesize($h_image->image_url);
}
catch (Exception $e)
{
}
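Be aware that getimagesize() signals failure with a warning and a FALSE return value, not an exception, so unless an error handler converts warnings to exceptions the catch block above will never run. A variant that checks the return value instead:
$size = @getimagesize($h_image->image_url); // suppress the warning on failure
if ($size !== false) {
    list($width, $height) = $size;
}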
I know you wrote "without curl", but still, somebody may find this helpful:
function curl_head($url) {
$ch = curl_init($url);
//curl_setopt($ch, CURLOPT_USERAGENT, 'Your user agent');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1); # get headers
curl_setopt($ch, CURLOPT_NOBODY, 1); # omit body
//curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 1); # do SSL check
//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); # verify domain within cert
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); # follow "Location" redirs
//curl_setopt($ch, CURLOPT_TIMEOUT_MS, 700); # dies after 700ms
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
print_r(curl_head('https://www.example.com/image.jpg'));
You will see something like HTTP/1.1 200 OK or HTTP/1.1 404 Not Found in the returned header data. You can also do multiple parallel requests with curl multi; a sketch follows.
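For the curl multi variant mentioned above, here is a minimal sketch that checks several image URLs in parallel (the URLs are placeholders):
$urls = array('https://www.example.com/a.jpg', 'https://www.example.com/b.jpg');
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $u) {
    $ch = curl_init($u);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$u] = $ch;
}
do {
    curl_multi_exec($mh, $running);  // drive all transfers
    curl_multi_select($mh);          // wait for activity instead of busy-looping
} while ($running > 0);
foreach ($handles as $u => $ch) {
    echo $u, ' => ', curl_getinfo($ch, CURLINFO_HTTP_CODE), "\n";
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);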
There are multiple steps; there is no single solution:
Validate the URL
Check whether the file is available (can be done together with step 3)
Download the image into a tmp file
Use getimagesize() to check the size of the image
For this kind of work you can catch the errors and handle them to arrive at your answer. In this case you could even suppress errors, because it's expected that the check might fail, so you handle the failure deliberately.
It's not possible to do a 100% check without downloading the actual image, so steps 1 and 2 are required, while 3 and 4 are optional for a more definitive answer. A sketch of these steps is shown below.
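A minimal sketch of those steps, assuming a temporary file is acceptable (the function name isValidRemoteImage is just an illustration):
function isValidRemoteImage($url) {
    // Step 1: validate the URL
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Steps 2 + 3: fetch the file (fails if it isn't available)
    $data = @file_get_contents($url);
    if ($data === false) {
        return false;
    }
    // Step 4: write to a tmp file and let getimagesize() decide
    $tmp = tempnam(sys_get_temp_dir(), 'img');
    file_put_contents($tmp, $data);
    $info = @getimagesize($tmp);
    unlink($tmp);
    return $info !== false;
}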

Get the filesize of a js file on another domain using php

How do I get the file size of a js file on another website? I am trying to create a monitor to check that a js file exists and that it is more than 0 bytes.
For example on bar.com I would have the following code:
$filename = 'http://www.foo.com/foo.js';
echo $filename . ': ' . filesize($filename) . ' bytes';
You can use a HTTP HEAD request.
<?php
$url = "http://www.neti.ee/img/neti-logo.gif";
$head = get_headers($url, 1);
echo $head['Content-Length'];
?>
Notice: this is not a real HEAD request, but a GET request that PHP parses for its Content-Length. Unfortunately the PHP function name is quite misleading. This may be sufficient for small js files, but for bigger files use a real HTTP HEAD request with cURL, so the server only sends the headers rather than the whole file.
For that case, use the code provided by Jakub.
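Alternatively, if you'd rather stay with get_headers(), you can make it issue a real HEAD request by changing the default stream context first (a sketch, using the foo.js URL from the question):
stream_context_set_default(array('http' => array('method' => 'HEAD')));
$head = get_headers('http://www.foo.com/foo.js', 1);
echo $head['Content-Length'];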
Just use cURL; here is a perfectly good example:
Ref: http://www.php.net/manual/en/function.filesize.php#92462
<?php
$remoteFile = 'http://us.php.net/get/php-5.2.10.tar.bz2/from/this/mirror';
$ch = curl_init($remoteFile);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); //not necessary unless the file redirects (like the PHP example we're using here)
$data = curl_exec($ch);
curl_close($ch);
if ($data === false) {
echo 'cURL failed';
exit;
}
$contentLength = 'unknown';
$status = 'unknown';
if (preg_match('/^HTTP\/1\.[01] (\d\d\d)/', $data, $matches)) {
$status = (int)$matches[1];
}
if (preg_match('/Content-Length: (\d+)/', $data, $matches)) {
$contentLength = (int)$matches[1];
}
echo 'HTTP Status: ' . $status . "\n";
echo 'Content-Length: ' . $contentLength;
?>
Result:
HTTP Status: 302
Content-Length: 8808759
Another solution. http://www.php.net/manual/en/function.filesize.php#90913
This is just a two-step process:
Fetch the js file and store it in a variable
Check whether the length of the js file is greater than 0
That's it!
Here is how you can do it in PHP
<?php
$data = file_get_contents('http://www.foo.com/foo.js');
if (strlen($data) > 0):
    echo "yay";
else:
    echo "nay";
endif;
?>
Note: you can use an HTTP HEAD request as suggested by Uku, but if you also need the file's content you would have to fetch it again. :(
