I have a function that I use to test whether a URL is valid before I store it in my db.
function url_exists($url)
{
    ini_set("default_socket_timeout", "5");
    set_time_limit(5);
    $f = fopen($url, "r");
    $r = fread($f, 1000);
    fclose($f);
    return strlen($r) > 1;
}
if( !url_exists($test['urlRedirect']) ) { ... }
It works great; however, one of my users reported an issue today, and when I tested, the following URL was indeed flagged as invalid:
http://www.artleaguehouston.org/charge-grant-survey
So I tried removing the page name and using only the domain, and still got the error. What is it about this domain that my script chokes on?
You're trying to eat soup with a Swiss Army knife there!
PHP supports URL wrappers in file_exists() (note that the http:// wrapper does not support stat(), so for HTTP URLs this check may always report false; the cURL approach below is more reliable):
if (file_exists("http://www.artleaguehouston.org/charge-grant-survey")) {
    // URL returns a good status code for your IP and User Agent "PHP/x.x.x"
}
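If you want to stay with the stream wrappers instead (the asker's original fopen() approach, or file_get_contents()), the likely culprit is the server rejecting PHP's default "PHP/x.x.x" user agent. A minimal sketch, assuming that is the cause (the user agent string is just an example):
// Assumption: the remote host blocks PHP's default user agent, so send a browser-like one
// for all URL wrapper calls (fopen, file_get_contents, ...).
ini_set('user_agent', 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');

$data = @file_get_contents('http://www.artleaguehouston.org/charge-grant-survey');
if ($data !== false) {
    // The URL could be opened and returned content
}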
cURL:
$ch = curl_init('http://www.artleaguehouston.org/charge-grant-survey');
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0'
);
curl_exec($ch);
$statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($statusCode == 200) {
    // Site up and good status code
}
(Mostly taken from How can one check to see if a remote file exists using PHP?, just to give correct credit.)
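Not part of the original answer, but if the goal is a drop-in replacement for the asker's url_exists() helper, a rough sketch based on the cURL snippet above could look like this (the timeout, the redirect handling, and the accepted status range are assumptions on my part):
function url_exists($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD-style request, don't download the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // assumption: treat redirected URLs as valid
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);            // assumption: 5 second cap, adjust to taste
    curl_setopt($ch, CURLOPT_USERAGENT,
        'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0'
    );
    curl_exec($ch);
    $statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $statusCode >= 200 && $statusCode < 400;
}
Counting 3xx responses as "exists" is a judgment call; narrow the range to 200-299 if you only want direct hits.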
Related
As mentioned above, the PHP file_get_contents() function, and even the fopen()/fread() combination, gets stuck and times out when trying to read this simple image URL:
http://pics.redblue.de/artikelid/GR/1140436/fee_786_587_png
but the same image is easily loaded by browsers. What's the catch?
EDITED:
As requested in the comments, here is the function I used to get the data:
function customRead($url)
{
    $contents = '';
    $handle = fopen($url, "rb");
    $dex = 0;
    while (!feof($handle)) {
        if ($dex++ > 100) {
            // Bail out so a stalled stream cannot loop forever
            echo "\nbreaking due to too many calls...\n";
            break;
        }
        $contents .= fread($handle, 2048);
    }
    fclose($handle);
    return $contents;
}
I also tried simply this:
echo file_get_contents('http://pics.redblue.de/artikelid/GR/1140436/fee_786_587_png');
Both give the same issue.
EDITED:
As suggested in the comments, I used cURL:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.1 Safari/537.11');
$res = curl_exec($ch);
$rescode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$err = curl_error($ch); // read the error before closing the handle, otherwise it is always empty
curl_close($ch);
echo "\n\n\n[DATA:";
echo $res;
echo "]\n\n\n[CODE:";
print_r($rescode);
echo "]\n\n\n[ERROR:";
echo $err;
echo "]\n\n\n";
This is the result:
[DATA:]
[CODE:0]
[ERROR:]
If you don't get the remote data with file_get_contents(), you can try it with cURL, as it can provide error messages via curl_error(). If you get nothing, not even an error, then something on your server is blocking outgoing connections. You may also want to try running curl from the command line over SSH on that server; I'm not sure whether that makes any difference, but it's worth a try. If you still get nothing, consider contacting the server admin (if that isn't you) or the hosting provider.
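As a rough diagnostic sketch (not part of the answer above), you can surface cURL's error number and message explicitly; note that both must be read before curl_close(), and the timeouts here are arbitrary values chosen for illustration:
$ch = curl_init('http://pics.redblue.de/artikelid/GR/1140436/fee_786_587_png');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // arbitrary connect timeout for illustration
curl_setopt($ch, CURLOPT_TIMEOUT, 20);        // arbitrary overall timeout for illustration
$res = curl_exec($ch);
if ($res === false) {
    // Read the error before closing the handle, otherwise it is lost
    echo 'cURL error ' . curl_errno($ch) . ': ' . curl_error($ch) . "\n";
}
curl_close($ch);
A connection-refused or timeout error here points at outgoing connections being blocked; a DNS error points at name resolution on the server.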
I'm attempting to use cURL to download an external image file. When used from the command line, cURL correctly reports the response header content-type=image/png. When I attempt to use cURL in PHP, however, it returns content-type=text/html.
When attempting to save the file using cURL in PHP, with the CURLOPT_BINARYTRANSFER option set to 1, in conjunction with fopen/fwrite, the result is a corrupt file.
The only cURL flag I'm using is -A, to send a user agent with the request, which I've also done in PHP by calling curl_setopt($ch, CURLOPT_USERAGENT, ...).
The only thing I can think of that would cause this is that perhaps some additional request headers are sent by command-line cURL that aren't accounted for by the standard PHP functions?
For reference:
CLI
curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I http://find.icaew.com/data/imgs/736c476534ddf7b249d806d9aa7b9ee8.png
PHP
private function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1);
    $response = array(
        'html' => curl_exec($ch),
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'contentLength' => curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD),
        'contentType' => curl_getinfo($ch, CURLINFO_CONTENT_TYPE)
    );
    curl_close($ch);
    return $response;
}
public function parseImage() {
    $imageSrc = pq('img.firm-logo')->attr('src');
    if (!empty($imageSrc)) {
        $newFile = '/Users/firstlast/Desktop/Hashery/test01/imgdump/' . $this->currentListingId . '.png';
        $curl = $this->curl('http://find.icaew.com' . $imgSrc);
        if ($curl['http_code'] == 200) {
            if (file_exists($newFile)) unlink($newFile);
            $fp = fopen($newFile, 'x');
            fwrite($fp, $curl['html']);
            fclose($fp);
            return $this->currentListingId;
        } else {
            return 0;
        }
    } else {
        return 0;
    }
}
When I mentioned content-type=text/html, I meant that the call to $this->curl() results in the contentLength and contentType entries of the returned $response array having the values -1 and text/html respectively.
I can imagine this is quite an obscure question, so I've attempted to provide as much context as possible about what is going on and what I'm trying to achieve. Any help in understanding why this is the case, and what I can do to resolve it and achieve my goal, would be greatly appreciated.
If you know exactly what you are getting, then file_get_contents() is much simpler.
A URL can be used as a filename with this function
http://php.net/manual/en/function.file-get-contents.php
Also, it is helpful to go through the user comments on php.net, as they contain many examples, as well as potential issues and tricks for using the function.
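For the image case above, a minimal sketch of that approach (the user agent, timeout, and destination path are placeholders I picked, not values from the answer):
$context = stream_context_create(array(
    'http' => array(
        'user_agent' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3',
        'timeout'    => 10, // placeholder timeout in seconds
    ),
));
$image = file_get_contents('http://find.icaew.com/data/imgs/736c476534ddf7b249d806d9aa7b9ee8.png', false, $context);
if ($image !== false) {
    file_put_contents('/tmp/736c476534ddf7b249d806d9aa7b9ee8.png', $image); // placeholder destination
}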
I am currently attempting to configure a cURL & PHP function found online that, when called, checks whether the HTTP response code is in the 200-300 range to determine if a web page is up. This works when run against an individual website with the code below (not the function itself, but the if statements etc.). The function returns true or false depending on the HTTP response code:
$page = "www.google.com";
$page = gzdecode($page);
if (Visit($page))
{
echo $page;
echo " Is OK <br>";
}
else
{
echo $page;
echo " Is DOWN <br>";
}
However, when running against an array of URLs stored within the script using a foreach loop, it reports every web page in the list as down, even though the code is the same apart from the added loop.
Does anyone know what the issue may be surrounding this?
Edit - adding the Visit function.
My bad, sorry, I wasn't fully thinking.
The Visit function is the following:
function Visit($url){
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_SSLVERSION, 3);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 310) return true;
    else return false;
}
The foreach loop as mentioned looks like this:
foreach ($Urls as $URL)
{
    $page = $URL;
    $page = gzdecode($page);
    if (Visit($page))
The if/else around the Visit() call is the same as before.
$page = $URL;
$page = gzdecode($page);
Why are you trying to uncompress the non-compressed URL string? Assuming you really meant to uncompress the content returned from the URL, why would the remote server compress it when you've told it that the client does not support compression? And why are you fetching the entire page just to see the headers?
The code you've shown us here has never worked.
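If the aim is only to check the headers without downloading every page, a rough sketch of what the loop could look like (gzdecode() is dropped entirely, since the URL string is not compressed data, and the timeout is an assumption):
function isUp($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD-style request: headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);            // assumed timeout
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code >= 200 && $code < 310;
}

foreach ($Urls as $URL) {
    echo $URL . (isUp($URL) ? " Is OK <br>" : " Is DOWN <br>");
}
Some servers mishandle HEAD requests, so if everything suddenly reports as down, drop CURLOPT_NOBODY and accept downloading the body.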
I'm checking for the presence of an XML sitemap on different URLs. If I supply a URL example.com/sitemap.xml, and it has a 301 to www.example.com/sitemap.xml, I get a 301, obviously. If www.example.com/sitemap.xml doesn't exist, I won't see the 404. So, if I get a 301, I execute another cURL request to see if a 404 is returned for www.example.com/sitemap.xml. But, for some reason, I get random 404 and 303 status codes.
private function check_http_status($domain, $file){
    $url = $domain . "/" . $file;
    $curl = new Curl();
    $curl->url = $url;
    $curl->nobody = true;
    $curl->userAgent = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)';
    $curl->execute();
    $retcode = $curl->httpCode();
    if ($retcode == 301 || $retcode == 302){
        $url = "www." . $domain . "/" . $file;
        $curl = new Curl();
        $curl->url = $url;
        $curl->nobody = true;
        $curl->userAgent = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)';
        $curl->execute();
        $retcode = $curl->httpCode();
    }
    return $retcode;
}
Have a look at the list of response codes returned - http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
Usually a web browser will handle these automatically, but as you are doing things manually with cURL, you need to understand what each response means. A 301 or 302 means that you should use the alternative URL supplied in the response to access the resource. This may be as simple as adding www to the request, but it may also be more complex, such as a redirect to a different domain altogether.
A 303 means that you should retrieve the resource at the indicated location with a GET request (it is typically sent in response to a POST).
Well, when you receive a 301 or 302 you should use the location found in the response, not just assume another location and try that.
As you can see in this example, the response from the server contains the new location of the file. Use that for your next request:
http://en.wikipedia.org/wiki/HTTP_301#Example
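A minimal sketch of reading that location in PHP (CURLINFO_REDIRECT_URL needs PHP 5.3.7+ with a reasonably recent libcurl; the URL is a placeholder):
$ch = curl_init('http://example.com/sitemap.xml'); // placeholder URL
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($retcode == 301 || $retcode == 302) {
    // The server tells you exactly where the resource lives now; request that, don't guess "www."
    $newUrl = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
}
curl_close($ch);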
"followLocation" works very well. Here is how I implemented it:
$url = "http://www.YOURSITE.com//"; // Assign you url here.
$ch = curl_init(); // initialize curl.
curl_setopt($ch, CURLOPT_URL, $url); // Pass the URL as the option/target.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // 0 will print html. 1 does not.
curl_setopt($ch, CURLOPT_HEADER, 0); // Please curl, inlude the header in the output.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // ..and yes, follow what the server sends as part of the HTTP header.
$response_data = curl_exec($ch); // execute curl with the target URL.
$http_header = curl_getinfo($ch); // Gets information about the last transfer i.e. our URL
// Print the URLs that are not returning 200 Found.
if($http_header['http_code'] != "200") {
echo " <b> PAGE NOT FOUND => </b>"; print $http_header['http_code'];
}
// print $http_header['url']; // Print the URL sent back in the header. This will print the page to wich you were redirected.
print $url; // this will print the original URLs that you are trying to access
curl_close($ch); // we are done with curl; so let's close it.
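One small addition worth considering (not in the snippet above): cap the number of redirects cURL will follow, so a misconfigured site cannot send the request in circles, and read back the URL you actually ended up at. The cap of 10 is an arbitrary choice:
$ch = curl_init("http://www.YOURSITE.com/");             // placeholder URL, as above
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);                 // stop after 10 hops (arbitrary cap)
curl_exec($ch);
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);   // the URL the redirects actually resolved to
curl_close($ch);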
I am using the Twitter API to display the statuses of a user. However, in some cases (like today), Twitter goes down and takes all the APIs with it. Because of this, my application fails and continuously displays the loading screen.
I was wondering if there is a quick way (using PHP or JS) to query Twitter and see if it (and the API) is up. I'm thinking it could be an easy response of some sort.
Thanks in advance,
Phil
Request http://api.twitter.com/1/help/test.xml or test.json. Check to make sure you get a 200 HTTP response code.
If you requested XML the response should be:
<ok>true</ok>
The JSON response should be:
"ok"
JSONP!
You can have a function like this, declared in the head or before including the script tag shown below:
var isTwitterWorking = false;
function testTwitter(status) {
    if (status === "ok") {
        isTwitterWorking = true;
    }
}
And then
<script src="http://api.twitter.com/1/help/test.json?callback=testTwitter"></script>
Demo (might take a while, Twitter's API seems to be slow here)
function visit($url) {
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 300)
        return true;
    else
        return false;
}
// Examples
if (visit("http://www.twitter.com"))
    echo "Website OK" . "\n"; // site is online
else
    echo "Website DOWN"; // site is offline / shows no response
I hope this helps you.