I use this code to get info about webpages and files before downloading them.
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_NOBODY, 1); // HEAD request: fetch headers only, no body
curl_exec($curl);

if (curl_getinfo($curl, CURLINFO_HTTP_CODE) == 200)
{
    echo $info['active'] = true;
    echo $info['url']    = $url;
    echo $info['size']   = curl_getinfo($curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    echo $info['type']   = curl_getinfo($curl, CURLINFO_CONTENT_TYPE);
}
else
{
    echo 'not active';
}
curl_close($curl);
It works for files or URLs like these:
www.example.com/film.mp4
www.example.com/film.php
but it does not work when the URL has no file extension; it returns 'not active' for:
www.example.com/film
www.example.com/film/test
How can I fix it?
UPDATE:
CURLINFO_HTTP_CODE returns a 403 error for those URLs.
This is not a problem with cURL; the website you are trying to pull data from is returning the 403 when you specify a URL without a file.
Try loading those pages yourself in a browser to see.
403 Forbidden - The server understood the request, but is refusing to fulfill it.
If you are getting a 403 error with a HEAD request, and your URL is that of a directory (not a file name), then that means that the server refuses to expose the directory contents, and there isn't a default page (index.html, index.php, etc.) to display. This effectively means there is nothing to cURL at that location, and you can treat it as a 404 error (unless you need to do some extra authentication not described in your question).
Ref: HTTP status codes
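As a rough illustration (a sketch, not part of the original answer), the check from the question could treat a 403 on an extensionless URL the same way as a 404:
<?php
// Minimal sketch: a HEAD request that treats 403 (directory with no index
// page) and 404 the same way. $url is assumed to be set already.
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_NOBODY, 1);          // HEAD request, no body
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  // don't echo the response
curl_exec($curl);
$code = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($code == 200) {
    echo 'active';
} elseif ($code == 403 || $code == 404) {
    echo 'not active';      // nothing usable at this location
} else {
    echo 'status: ' . $code;
}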
I know there are ways to verify if a URL returns a 404 or not.
I have been using the following function and it has been working fine, but my problem is that I want to verify a URL on a domain that redirects me to a subdomain depending on the language used in my region.
function page_404($url) {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false);

    /* Get the HTML or whatever is linked in $url. */
    $response = curl_exec($handle);

    /* Check for 404 (file not found). */
    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);

    /* If the document has loaded successfully without any redirection or error */
    if ($httpCode >= 200 && $httpCode < 300) {
        echo $httpCode . "<br/>";
        return false;
    } else {
        echo $httpCode . "<br/>";
        return true;
    }
}
For example:
https://example.com/video/123456
I'm redirected to the following URL:
https://es.example.com/video/123456
That means the response is an HTTP code "301"; my function detects it as a redirection and therefore tells me that the video does not exist, but in fact it exists, it is just that the domain redirected me to that subdomain.
If I change $httpCode < 300 to $httpCode < 303, it works.
But the problem is that when this site receives an invalid URL, it redirects me to its main page, so I never receive a 404 code; it serves me a 301 or 303 instead.
What can I do? I hope I explained it well.
You can tell cURL to follow all redirects, and return the result from the final redirection. Use:
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
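A minimal sketch of that approach applied to the function above (the CURLOPT_MAXREDIRS cap is my own addition, to guard against endless chains):
function page_404($url) {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true); // chase every 3xx
    curl_setopt($handle, CURLOPT_MAXREDIRS, 10);        // give up on endless chains
    curl_exec($handle);

    // Status code of the FINAL page, after all redirects were followed
    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);

    return !($httpCode >= 200 && $httpCode < 300); // true means "looks like a 404"
}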
If you follow the redirects yourself instead, you would want to make the function recursive, since you can redirect to a page that redirects to a page that ... well, you get the idea. And you want to know if the final page exists. And you have no idea up front how many redirects it will take to get there.
You would want a conditional after:
if ($httpCode >= 200 && $httpCode < 300) {
Something like this:
} elseif ($httpCode >= 301 && $httpCode <= 302) {
(This assumes that the redirect codes are 301 and 302; there may be others that I'm not including, so adjust this accordingly.) Then in here, grab the URL you're being redirected to, and have the function call itself with that URL. It will do this for each redirect.
However, if you do it this way, you may want to add a second parameter so you know how many times you've called this, something like:
function page_404($url, $iteration = 1)
So when you call it later on, you do so this way:
page_404($url, $iteration + 1);
Then, at the very beginning, do a check to make sure you don't end up in an infinite redirect:
if($iteration > 10) {
echo "Too many redirects";
return (some error);
}
Most browsers will puke if they encounter a URL that redirects 10 or 15 times, so this is probably a fairly safe number, and a safe behavior. Otherwise, you could end up redirecting forever if you hit a misconfigured URL.
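Putting those pieces together, a sketch of the manual, recursive version described above (it uses CURLINFO_REDIRECT_URL to grab the redirect target; the codes checked and the cap of 10 are the assumptions discussed):
function page_404($url, $iteration = 1) {
    // Stop if we are stuck in a redirect loop
    if ($iteration > 10) {
        echo "Too many redirects";
        return true; // treat as "does not exist"
    }

    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false);
    curl_exec($handle);

    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    $redirect = curl_getinfo($handle, CURLINFO_REDIRECT_URL); // where the 3xx points
    curl_close($handle);

    if ($httpCode >= 200 && $httpCode < 300) {
        return false; // the page exists
    } elseif ($httpCode >= 301 && $httpCode <= 302) {
        return page_404($redirect, $iteration + 1); // follow it ourselves
    }
    return true; // 4xx, 5xx, or a network failure (code 0)
}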
We are executing the cURL call below from PHP.
$url = $fullurl;
if (isset($url)) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($ch);
    $headers = curl_getinfo($ch);
    curl_close($ch);

    $check_url_status = $headers['http_code'];
    if ($check_url_status == 200) {
        $ress = "Link Works";
    } else {
        $ress = "Broken Link";
    }
}
What other HTTP status codes should we consider when checking whether the URL is a broken/dead link?
Remember the 5 HTTP status code classes: 1xx informational (e.g. 100 Continue, 101 Switching Protocols), 2xx OK, 3xx redirect, 4xx client error, 5xx server error.
If your cURL client follows the redirections (3xx), I think you can just test that the status code is <= 299. Any other status code means a "broken link".
Depending on how deep your test goes, you can also consider these cases (a short sketch follows the list):
401 Unauthorized / 403 Forbidden: the resource needs authentication. It does not mean the link is broken, but that authorized clients may see it while others will not.
204 No Content: the resource is accessible but does not return any content. Some analytics resources return 204, but the visual result will be a broken image or a link to an empty page.
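A rough sketch of that advice in code (the grouping below, and which codes count as "working", are my judgment calls, not a standard; note also that some servers answer HEAD requests differently than GET, as the first question on this page shows):
function link_status($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // a HEAD request is enough here
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // resolve 3xx before judging
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code >= 200 && $code <= 299) {
        return "Link Works";            // includes 204 No Content
    }
    if ($code == 401 || $code == 403) {
        return "Needs Authentication";  // not necessarily broken
    }
    return "Broken Link";               // other 4xx/5xx, or 0 on network failure
}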
If your goal is to change how a broken link is displayed, you can use JavaScript to manage it client-side, but that can be limited to your own domain. See this question.
I want to handle a 403 error without using server-side error redirection methods (i.e., without using an .htaccess file). Sometimes the server returns a 403 (forbidden access) error message. So, is it possible to have a PHP script that handles this 403 error message?
For example, before showing an error page, I would like to obtain the server status when my specific PHP page runs and, without making a redirection, simply display a custom message on that page.
Some solutions for you.
Check for URL errors and make sure an actual web page is specified. It is a common reason for a website to return the 403 Forbidden error: the URL points to a directory instead of a web page. The check can be done using the HttpRequest class in PHP (from the pecl_http extension); you can use http_get() to perform the GET request.
<?php
$response = http_get("URL", array("timeout"=>1), $info);
print_r($info);
?>
Output:
array (
'effective_url' => 'URL',
'response_code' => 403,
.
and so on
)
What is important for you is response_code, which you can then act on.
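For example (a sketch, still assuming the pecl_http extension used above; the message text is just a placeholder):
<?php
$response = http_get("URL", array("timeout" => 1), $info);

if ($info['response_code'] == 403) {
    // Show a custom message instead of the server's error page, no redirect
    echo "Sorry, you are not allowed to view this page.";
} else {
    echo $response;
}
?>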
Alternatively, use cURL:
function http_response($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);         // HEAD request: headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return instead of echoing
    $head = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (!$head)
    {
        return FALSE; // the request itself failed
    }
    return $httpCode;
}
$errorcode = http_response("URL"); // 200 on success, a different code otherwise
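And a possible way to use it for the custom message (a sketch; the URL and wording are placeholders):
$code = http_response("URL");

if ($code === 403) {
    // Render the custom message inline instead of redirecting anywhere
    echo "Access to this resource is forbidden. Please contact the administrator.";
} elseif ($code === FALSE) {
    echo "The request itself failed (bad host, network error, ...).";
}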
We have an application that depends on a remote service. I have been asked to implement some code whereby, if the remote web server is down (due to maintenance or glitches), I display an appropriate message.
The issue at hand is that when the remote server is down for maintenance, they usually redirect to another page. So how do I go about implementing, in PHP, a robust function that can tell whether a particular URL is up and running as opposed to being redirected to a dummy page?
Thank You
Just check the response text. If the response contains text that appears on the page you are redirected to, then it is surely in maintenance mode.
If the remote web server is down you can check it too. see https://stackoverflow.com/questions/9144825/php-check-if-a-site-is-down/9145124#9145124
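One way to detect that kind of maintenance redirect with cURL (a sketch; it assumes the maintenance page lives at a different URL than the one you request, and that trivial differences such as a trailing slash don't occur):
function is_up($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_exec($ch);

    $code  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // URL after redirects
    curl_close($ch);

    // "Up" only if we got a 2xx AND were not bounced to a different page
    return $code >= 200 && $code < 300 && $final === $url;
}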
Just check the HTTP return code. This is possible with curl for instance:
CURLINFO_HTTP_CODE
http://de2.php.net/manual/en/function.curl-getinfo.php
<?php
$success = 0;
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
// Check if any error occurred
if (!curl_errno($ch))
{
    if (curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200)
        $success = 1;
}
// close cURL resource, and free up system resources
curl_close($ch);
?>
When I curl the following
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_PORT, "8081");
curl_setopt($ch, CURLOPT_URL, "http://192.168.0.14:8081/comingEpisodes/");
curl_setopt($ch, CURLOPT_USERPWD, "user:pass");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$curl_response = curl_exec($ch);
curl_close($ch);
echo $curl_response;
?>
The page is returned, however the images aren't. I located the problem: 192.168.0.14 is my local host, and I am calling a page from an app that runs off port 8081. cURL seems to drop the port and change 192.168.0.14 to localhost, and therefore the images are no longer linked to the right place. How do I make sure that the port remains so the images keep working? Thanks
EDIT: I think the /comingEpisodes after the port is also part of the problem.
Unless you're building a 100% proxy, you're dumping the contents of the cURL pull into a browser. The resources are now resolved relative to the page that the cURL results are dumped into, not relative to the originating cURL request.
Basically, if you visit http://localhost and the above code resides in index.php, that page requests the :8081/comingEpisodes contents and dumps them within the context of the originating http://localhost. The browser then resolves all the found content against http://localhost, not as if it came from the cURL request.
You could rewrite all the content links within the document, before it is output, to something like "proxy.php?retrieve=old_url" and then make all of those calls go through the same cURL context, but that's the basis of a web proxy.
End-User Intermediary End-Website
(http://localhost) (localhost/index.php) (http://192.168.0.14:8081/comingEpisodes/)
------------------ --------------------- ------------------------------------------
Initial visit--------->
cURL Request------------->
Page Content (html basically)
Echoed back to user<------
Content<---------------
Finds <img> etc.------>
/comingEpisodes/img1.jpg // 404 error, it's actually on :8081
// that localhost has no idea about
// because it's being hidden using cURL
VERY SIMPLE DEMO
<?php
//
// Very Dummied-down proxy
//
// Either get the url of the content they need, or use the default "page root"
// when none is supplied. This is not robust at all, as this really only handles
// relative urls (e.g. src="images/foo.jpg", something like src="http://foo.com/"
// would become src="index.php?proxy=http://foo.com/" which makes the below turn
// into "http://www.google.com/http://foo.com/")
$_target = 'http://www.google.com/' . (isset($_GET['proxy']) ? $_GET['proxy'] : '');
// Build the cURL request to get the page contents
$cURL = curl_init($_target);
try
{
// setup cURL to your liking
curl_setopt($cURL, CURLOPT_RETURNTRANSFER, 1);
// execute the request
$page = curl_exec($cURL);
// Forward along the content type (so images, files, etc. are all understood correctly)
$contentType = curl_getinfo($cURL, CURLINFO_CONTENT_TYPE);
header('Content-Type: ' . $contentType);
// close curl, we're done.
curl_close($cURL);
// test against the content type. If it is HTML then we need to re-parse
// the page to add our proxy intercept in the URL so the visitor keeps using
// our cURL request above for EVERYTHING it needs from this site.
if (strstr($contentType,'text/html') !== false)
{
//
// It's html, replace all the references to content using URLs
//
// First, load our DOM parser
$html = new DOMDocument();
$html->formatOutput = true;
@$html->loadHTML($page); // was getting parse errors; @ suppresses them (a leading # would comment the line out entirely)
// simple demo, look for image references and change them
foreach ($html->getElementsByTagName('img') as $img)
{
// take a typical image:
// <img src="logo.jpg" />
// and make it go through the proxy (so it uses cURL again:
// <img src="index.php?proxy=logo.jpg" />
$img->setAttribute('src', sprintf('%s?proxy=%s', $_SERVER['PHP_SELF'], urlencode($img->getAttribute('src'))));
}
// finally dump it to client with the urls changed
echo $html->saveHTML();
}
else
{
// Not HTML, just dump it.
echo $page;
}
}
// just in case, probably want to do something with this.
catch (Exception $ex)
{
}