In PHP, how can I determine if any remote file (accessed via HTTP) exists?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, true);    // we want the response headers
curl_setopt($ch, CURLOPT_NOBODY, true);    // HEAD request - we don't need the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);   // follow up to 10 redirections - avoids loops
$data = curl_exec($ch);
curl_close($ch);

if (!$data) {
    echo "Domain could not be found";
} else {
    // collect every status line (one per redirect) and keep the last one
    preg_match_all("/HTTP\/1\.[01]\s(\d{3})/", $data, $matches);
    $code = end($matches[1]);
    if ($code == 200) {
        echo "Page Found";
    } elseif ($code == 404) {
        echo "Page Not Found";
    }
}
Modified version of code from here.
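For what it's worth, cURL can also report the final status code directly, which avoids the regex entirely. A minimal sketch of the same HEAD request using curl_getinfo() (the URL is the same placeholder as above):

// Sketch: read the status code from curl_getinfo() instead of
// regex-matching the raw response headers.
$ch = curl_init("http://www.example.com/");
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 on DNS/connect failure
curl_close($ch);

echo ($code == 200) ? "Page Found" : "Page Not Found (HTTP $code)";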
I like curl or fsockopen to solve this problem. Either one can provide header data regarding the status of the file requested. Specifically, you would be looking for a 404 (File Not Found) response. Here is an example I've used with fsockopen:
http://www.php.net/manual/en/function.fsockopen.php#39948
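For reference, here is a minimal fsockopen sketch along the lines of the manual comment linked above (the helper name head_status is made up for illustration): open a socket on port 80, write a HEAD request by hand, and read the status code back from the first response line.

// Sketch: issue a HEAD request manually over a socket and parse
// the HTTP status code out of the first response line.
function head_status($host, $path = '/') {
    $fp = @fsockopen($host, 80, $errno, $errstr, 5); // 5-second timeout
    if (!$fp) {
        return false; // DNS or connection failure
    }
    fwrite($fp, "HEAD $path HTTP/1.1\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $statusLine = fgets($fp); // e.g. "HTTP/1.1 404 Not Found"
    fclose($fp);
    return $statusLine ? (int) substr($statusLine, 9, 3) : false;
}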
This function will return the response code (the last one in case of redirection), or false in case of a DNS or other error. If one argument (the URL) is supplied, a HEAD request is made. If a second argument is given, a full request is made and the content, if any, of the response is stored by reference in the variable passed as the second argument.
function url_response_code($url, &$contents = null)
{
    $context = null;
    if (func_num_args() == 1) {
        // only the URL was supplied, so a HEAD request is enough
        $context = stream_context_create(array('http' => array('method' => 'HEAD')));
    }
    $contents = @file_get_contents($url, false, $context);
    $code = false;
    if (isset($http_response_header)) {
        // keep the status line of the last response (after any redirects)
        foreach ($http_response_header as $header) {
            if (strpos($header, 'HTTP/') === 0) {
                list(, $code) = explode(' ', $header);
            }
        }
    }
    return $code;
}
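A short usage example (the URLs are placeholders):

// HEAD request: only the URL is passed
$code = url_response_code('http://www.example.com/missing.jpg');
if ($code == 404) {
    echo 'Not found';
}

// Full request: the response body is stored in $body by reference
$code = url_response_code('http://www.example.com/page.html', $body);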
I recently was looking for the same info. Found some really nice code here: http://php.assistprogramming.com/check-website-status-using-php-and-curl-library.html
function Visit($url) {
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 300) {
        return true;
    } else {
        return false;
    }
}

if (Visit("http://www.site.com")) {
    echo "Website OK";
} else {
    echo "Website DOWN";
}
Use cURL, and check whether the request went through successfully.
http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/
Just a note that these solutions will not work on a site that does not give an appropriate response for a page not found. E.g. I just had a problem testing for a page on a site that simply loads its main page whenever it gets a request it cannot handle, so the site will nearly always return a 200 response, even for non-existent pages.
Some sites will show a custom error on a standard page and still not send a 404 header.
There is not much you can do in these situations unless you know the expected content of the page and test that the expected content exists, or test for some expected error text within the page; and that is all getting a bit messy...
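As a rough illustration of that content-based fallback, here is a hedged sketch (the function name, URL, and marker string are all made up): fetch the page and test for a string you know should appear on a real page.

// Sketch of a content-based check for sites that return 200 even
// for missing pages; the marker string is purely illustrative.
function page_really_exists($url, $expectedMarker) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    curl_close($ch);
    // the page "exists" only if the body contains the content we expect
    return $body !== false && strpos($body, $expectedMarker) !== false;
}

if (page_really_exists('http://www.example.com/article', 'Article title')) {
    echo "Page Found";
}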
Related
Suppose I have one URL which is supposed to represent an image, i.e. if I enter the same URL in an address bar and hit it, the image should display in a browser window.
If the URL doesn't have any image present at it, the check should return false; otherwise it should return true.
How should this be done in an efficient and reliable way using PHP?
I use this little guy:
function remoteFileExists($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1);       // HEAD request only
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);  // fail on HTTP codes >= 400
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);
    curl_close($ch);                           // don't leak the handle
    return $result !== false;
}
Use like:
if (remoteFileExists('https://www.google.com/images/srpr/logo11w.png')) {
    echo 'Yay! Photo is there.';
} else {
    echo 'Photo no home.';
}
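Note that a 2xx status only proves something answers at that URL, not that it is an image. If you need that guarantee, one option (a sketch, with a made-up function name) is to also inspect the Content-Type response header:

// Sketch: HEAD the URL and confirm the server claims it is an image.
function remoteImageExists($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $ok = curl_exec($ch) !== false;
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE); // e.g. "image/png"
    curl_close($ch);
    return $ok && is_string($contentType) && strpos($contentType, 'image/') === 0;
}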
There are two options:
You can use cURL, as explained here: How can one check to see if a remote file exists using PHP?
Use PHP's get_headers(): http://php.net/manual/en/function.get-headers.php (note that file_exists() does not work on HTTP URLs)
Example:
$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
if ($file_headers[0] == 'HTTP/1.1 404 Not Found') {
    $exists = false;
} else {
    $exists = true;
}
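One caveat with comparing the whole status line: servers may answer with HTTP/1.0 instead of HTTP/1.1, or with a redirect first, so the exact string varies. A slightly more robust sketch pulls just the numeric code out of the status line:

// Sketch: extract the numeric status code instead of comparing
// the full status line, which varies between servers.
$file = 'http://www.domain.com/somefile.jpg'; // placeholder URL
$file_headers = @get_headers($file);
$exists = false;
if ($file_headers !== false &&
    preg_match('~^HTTP/\d+\.?\d*\s+(\d{3})~', $file_headers[0], $m)) {
    $exists = ($m[1] == 200);
}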
Try this:
$ch = curl_init("https://www.google.com/images/srpr/logo11w.png");
curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD request
curl_exec($ch);
$retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($retcode == 200)
    echo 'File Exists';
I am currently attempting to configure a cURL & PHP function found online that, when called, checks whether the HTTP response code is in the 200-300 range to determine if the web page is up. This works when run against an individual website with the code below (not the function itself, but the if statements etc.). The function returns true or false depending on the HTTP response code:
$page = "www.google.com";
$page = gzdecode($page);
if (Visit($page))
{
echo $page;
echo " Is OK <br>";
}
else
{
echo $page;
echo " Is DOWN <br>";
}
However, when running against an array of URLs stored within the script through the use of a foreach loop, it reports every web page in the list as down, despite the code being the same bar the added loop, of course.
Does anyone know what the issue surrounding this may be?
Edit - adding the Visit function
My bad, sorry; I wasn't thinking fully.
The Visit function is the following:
function Visit($url) {
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_SSLVERSION, 3);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 310) return true;
    else return false;
}
The foreach loop, as mentioned, looks like this:
foreach ($Urls as $URL) {
    $page = $URL;
    $page = gzdecode($page);
    if (Visit($page))
The if statement for the Visit part is the same as before.
$page = $URL;
$page = gzdecode($page);
Why are you trying to uncompress the non-compressed URL? Assuming you really meant to uncompress the content returned from the URL, why would the remote server compress it when you've told it that the client does not support compression? And why are you fetching the entire page just to see the headers?
The code you've shown us here has never worked.
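In other words, drop the gzdecode() call and pass the URL straight to Visit(). A sketch of the corrected loop (assuming $Urls holds plain URL strings):

// Corrected loop: pass the URL string straight to Visit();
// gzdecode() was corrupting it, since a URL is not gzip data.
foreach ($Urls as $URL) {
    if (Visit($URL)) {
        echo $URL . " Is OK <br>";
    } else {
        echo $URL . " Is DOWN <br>";
    }
}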
I am pulling some content into a text file and then using cURL or file_get_contents to display it.
Here it works perfectly fine:
http://www.dev.phosting.eu/
but here it returns 404:
http://dev5.gozenhost.com/index.php/shortcodes/114-testing
even though the file is accessible:
http://dev5.gozenhost.com/media/plg_system_yjsg/yjsgparsed/raw-githubusercontent-com/yjsgframework/demo-docs/master/shortcodes/Icons.txt
$getContent returns the accessible link above, and this is the cURL code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $getContent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if (empty($data)) {
    $content = 'Error processing url. ' . $httpCode;
} else if ($httpCode >= 200 && $httpCode < 300) {
    if ($local) {
        $content = $data;
    } else {
        $content = yjsg_clean_html($data);
        JFile::write($filepath, $content);
    }
} else {
    $content = 'Error processing url. ' . $httpCode;
}
I mean, all the files are in the right places and accessible.
The funny thing is that if I use cURL or file_get_contents to access someone else's site it works fine; if I am accessing a file on my own domain it fails. Again, only on CloudLinux.
Does anyone know what the issue is and a possible fix?
Thank you!
I need a way to check if a tweet exists. I have a link to a tweet like https://twitter.com/darknille/status/355651101657280512 . I would prefer a fast way to check (without retrieving the body of the page, just a HEAD request), so I tried something like this:
function if_curl_exists($url)
{
    $resURL = curl_init();
    curl_setopt($resURL, CURLOPT_URL, $url);
    curl_setopt($resURL, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($resURL, CURLOPT_HEADERFUNCTION, 'curlHeaderCallback');
    curl_setopt($resURL, CURLOPT_FAILONERROR, 1);
    $x = curl_exec($resURL);
    //var_dump($x);
    echo $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE);
    curl_close($resURL);
    if ($intReturnCode != 200 && $intReturnCode != 302 && $intReturnCode != 304) {
        return false;
    } else {
        return true;
    }
}
or like this
function if_curl_exists_1($url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, true); // HEAD request
    $result = curl_exec($curl);
    $ret = false;
    if ($result !== false) {
        // if the request was OK, check the response code
        echo $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        if ($statusCode == 200) {
            $ret = true;
        }
    }
    curl_close($curl);
    return $ret;
}
but both of these return null from curl_exec(), so there is no HTTP status code to check.
The other way is to use the Twitter API, e.g. GET statuses/show/:id https://dev.twitter.com/docs/api/1.1/get/statuses/show/%3Aid , but there is no special return value if a tweet doesn't exist, as said here: https://dev.twitter.com/discussions/8802
I need advice on the fastest way to check; I am doing this in PHP.
You probably have to set the return transfer flag:
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
If the request returns a 30x status, you probably have to add the follow location flag as well:
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
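Putting both suggestions together, a sketch of the second function with those two options added (the name if_tweet_exists is made up, and this is untested against Twitter itself):

// Sketch: the HEAD-request checker with CURLOPT_RETURNTRANSFER and
// CURLOPT_FOLLOWLOCATION added, per the suggestions above.
function if_tweet_exists($url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, true);          // HEAD request
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return instead of echoing
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // follow 30x redirects
    $result = curl_exec($curl);
    $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $result !== false && $statusCode == 200;
}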
You can use @get_headers(). It returns an array in which the first item contains the response status line:
$response = @get_headers($url);
print_r($response[0]);
if ($response[0] == 'HTTP/1.0 404 Not Found') {
    echo 'Not Found';
} else {
    echo 'Found';
}
If you just enter the URLs into the browser you can see that both work; cdon works even without JavaScript. Have they blocked cURL somehow?
I'm trying to build a scraper to benefit legal movies online, which would benefit them a whole lot; it seems stupid to block scrapers in general, IMHO. Although I'm far from sure that's what's going on here! It might just be an error somewhere...
// Works
get_file1('http://sfanytime.com/sv-SE/Sokresultat/?field=all&q=The+Matrix', '/', 'sfanytime.html');
// Saves a blank 0 KB file
get_file1('http://downloads.cdon.com/index.phtml?action=search&search_terms=The+Matrix', '/', 'cdon.html');

function get_file1($file, $local_path, $newfilename) {
    $out = fopen($newfilename, 'wb');
    if ($out === FALSE) {
        return false;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_FILE, $out);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_URL, $file);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_exec($ch);
    $error = curl_error($ch);
    if (strlen($error) > 0) {
        echo "<br>Error is : " . $error;
        return false;
    }
    curl_close($ch);
    return true;
}
You should change the line
curl_setopt($ch, CURLOPT_FAILONERROR, true);
...to...
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
CURLOPT_FAILONERROR will cause a "silent fail" - which, from what you say, is not what you want. I have replaced it with CURLOPT_FOLLOWLOCATION, because when I visit the second URL, I get redirected to a "choose your country" type page, which is a response with an empty body - which is why you get an empty file.
There is no problem with your code as such, simply a problem with the way you handle the response from the second URL. You don't see an error because, technically, there wasn't one.
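If you want to detect that situation explicitly, one sketch is to check the status code and body length after the transfer, using curl_getinfo() as in the examples above:

// Sketch: distinguish "request failed" from "request succeeded
// but the body was empty".
$ch = curl_init('http://downloads.cdon.com/index.phtml?action=search&search_terms=The+Matrix');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$body = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($body !== false && $status >= 200 && $status < 300 && strlen($body) == 0) {
    echo "Request succeeded, but the response body was empty";
}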