PHP + cURL HTTP response headers

I am currently attempting to configure a cURL & PHP function I found online that, when called, checks whether the HTTP response code is in the 200-300 range to determine if a web page is up. This works when run against an individual website with the code below (not the function itself, but the if statements etc.). The function returns true or false depending on the HTTP response code:
$page = "www.google.com";
$page = gzdecode($page);
if (Visit($page))
{
echo $page;
echo " Is OK <br>";
}
else
{
echo $page;
echo " Is DOWN <br>";
}
However, when running against an array of URLs stored within the script, iterated with a foreach loop, it reports every web page in the list as down, even though the code is the same apart from the added loop.
Does anyone know what the issue may be?
Edit - adding Visit function
My bad, sorry; I wasn't thinking it through.
The Visit function is the following:
function Visit($url){
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSLVERSION, 3);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if($httpcode >= 200 && $httpcode < 310) return true;
    else return false;
}
The foreach loop as mentioned looks like this:
foreach ($Urls as $URL)
{
    $page = $URL;
    $page = gzdecode($page);
    if (Visit($page))
The if statement for the Visit call is the same as before.

$page = $URL;
$page = gzdecode($page);
Why are you trying to uncompress the non-compressed URL? Assuming you really meant to uncompress the content returned from the URL, why would the remote server compress it when you've told it that the client does not support compression? And why are you fetching the entire page just to see the headers?
The code you've shown us here has never worked.
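On that last point: a HEAD request checks the status code without transferring the body at all. Here is a minimal sketch of the loop done that way, using CURLOPT_NOBODY (the isUp() name and the URL list are illustrative, not the asker's code; note that a bare hostname like "www.google.com" should get a scheme prepended before being handed to cURL, and that a few servers answer HEAD differently than GET):
function isUp($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // count a redirect to a live page as "up"
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code >= 200 && $code < 300;
}

$urls = array("www.google.com", "www.example.com"); // illustrative list
foreach ($urls as $url) {
    // prepend a scheme if the list stores bare hostnames
    $full = preg_match('#^https?://#', $url) ? $url : "http://" . $url;
    echo $full . (isUp($full) ? " Is OK" : " Is DOWN") . "<br>";
}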

Related

php file_get_contents() get stuck in loading an image

As mentioned above, the PHP file_get_contents() function, or even the fopen()/fread() combination, gets stuck and times out when trying to read this simple image URL:
http://pics.redblue.de/artikelid/GR/1140436/fee_786_587_png
but the same image is easily loaded by browsers. What's the catch?
EDITED:
as requested in the comments, here is the function I used to get the data:
function customRead($url)
{
    $contents = '';
    $handle = fopen($url, "rb");
    $dex = 0;
    while (!feof($handle)) {
        if ($dex++ > 100) {
            // bail out so a stalled stream cannot loop forever
            echo "\nbreaking due to too many calls...\n";
            break;
        }
        $contents .= fread($handle, 2048);
    }
    fclose($handle);
    return $contents;
}
I also tried simply this:
echo file_get_contents('http://pics.redblue.de/artikelid/GR/1140436/fee_786_587_png');
Both give the same issue
EDITED:
As suggested in the comments, I used cURL:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.1 Safari/537.11');
$res = curl_exec($ch);
$rescode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_error($ch); // must be read before curl_close(), otherwise it is always empty
curl_close($ch);
echo "\n\n\n[DATA:";
echo $res;
echo "]\n\n\n[CODE:";
print_r($rescode);
echo "]\n\n\n[ERROR:";
echo $error;
echo "]\n\n\n";
this is the result:
[DATA:]
[CODE:0]
[ERROR:]
If you don't get the remote data with file_get_contents, you can try it with cURL, as it can provide error messages via curl_error. If you get nothing, not even an error, then something on your server is blocking outgoing connections. You may even want to try running curl from the command line over SSH. I'm not sure if that makes any difference, but it's worth a try. If you still get nothing, consider contacting the server admin (if you're not the admin) or the hosting provider.
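If the cURL attempt still returns code 0 with no error text, turning on verbose output shows exactly where the request stalls (DNS lookup, connect, or transfer). A minimal sketch, reusing the same image URL; CURLOPT_STDERR redirects the verbose log into a stream we can print:
$ch = curl_init('http://pics.redblue.de/artikelid/GR/1140436/fee_786_587_png');
$log = fopen('php://temp', 'w+');            // capture the verbose log in memory
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);     // log every step of the request
curl_setopt($ch, CURLOPT_STDERR, $log);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // fail fast if the connect phase hangs
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
$res = curl_exec($ch);
rewind($log);
echo stream_get_contents($log);              // shows DNS, connect, request and response headers
fclose($log);
curl_close($ch);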

Download an Excel file with PHP and Curl

I have a repetitive task that I do daily: log in to a web portal, click a link that pops open a new window, and then click a button to download an Excel spreadsheet. It's a time-consuming task that I would like to automate.
I've been doing some research with PHP and cURL, and while it seems like it should be possible, I haven't found any good examples. Has anyone ever done something like this, or do you know of any tools that are better suited for it?
Are you familiar with the basics of HTTP requests? Like, do you know the difference between a POST and a GET request? If what you're doing amounts to nothing more than GET requests, then it's actually super simple and you don't need to use cURL at all. But if "clicking a button" means submitting a POST form, then you will need cURL.
One way to check this is by using a tool such as Live HTTP Headers and watching what requests happen when you click on your links/buttons. It's up to you to figure out which variables need to get passed along with each request and which URLs you need to use.
But assuming that there is at least one POST request, here's a basic script that will post data and get back whatever HTML is returned.
<?php
if ($ch = curl_init()) {
    $data = 'field1=' . urlencode('somevalue');
    $data .= '&field2[]=' . urlencode('someothervalue');
    $url = 'http://www.website.com/path/to/post.asp';
    $userAgent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    $html = curl_exec($ch);
    curl_close($ch);
} else {
    $html = false;
}
// write code here to look through $html for
// the link to download your excel file
?>
Try this:
$ch = curl_init();
$csrf_token = $this->getCSRFToken($ch); // gets the CSRF token from the website, if you need one
$ch = $this->signIn($ch, $csrf_token);  // sign-in function you must implement; it returns the handle
curl_setopt($ch, CURLOPT_HTTPGET, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 300); // in case the file is large
curl_setopt($ch, CURLOPT_URL, "https://your-URL/anything");
$return = curl_exec($ch);
// the important part
$destination = "files.xlsx";
if (file_exists($destination)) {
    unlink($destination);
}
$file = fopen($destination, "w+");
fputs($file, $return);
if (fclose($file)) {
    echo "downloaded";
}
curl_close($ch);
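A variation worth noting: instead of buffering the whole spreadsheet in $return and writing it out afterwards, cURL can stream the response straight to disk with CURLOPT_FILE, which matters for large files. A minimal sketch, assuming $ch has already been signed in as above (the URL stays the placeholder from the answer):
$destination = "files.xlsx";
$file = fopen($destination, "w");          // cURL writes the body here as it arrives
curl_setopt($ch, CURLOPT_URL, "https://your-URL/anything");
curl_setopt($ch, CURLOPT_FILE, $file);     // stream to the file instead of returning a string
curl_setopt($ch, CURLOPT_TIMEOUT, 300);
$ok = curl_exec($ch);
$error = curl_error($ch);                  // read before closing the handle
fclose($file);
curl_close($ch);
echo $ok ? "downloaded" : "failed: " . $error;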

file_get_contents returns 404 when URL is opened with the Browser and URL is valid

I get the following Error:
Warning:
file_get_contents(https://www.readability.com/api/content/v1/parser?url=http://www.redmondpie.com/ps1-and-ps2-games-will-be-playable-on-playstation-4-very-soon/?utm_source=dlvr.it&utm_medium=twitter&token=MYAPIKEY)
[function.file-get-contents]: failed to open stream: HTTP request
failed! HTTP/1.1 404 NOT FOUND in
/home/DIR/htdocs/readability.php
on line 23
With some echoes I checked the URL built by the function, and it is fine and valid; when I make the same request from my browser it works.
The thing is that I get the error above with file_get_contents, and I really don't understand why.
The URL is valid and the function is not blocked by the free hosting service (so I don't need cURL).
If someone could spot the error in my code, I would appreciate it!
Thanks...
Here is my code:
<?php
class jsonRes{
    public $url;
    public $author;
    public $image;
    public $excerpt;
}
function getReadable($url){
    $api_key = 'MYAPIKEY';
    if (isset($url) && !empty($url)) {
        // I tried changing to http, no 'www' etc... -THE URL IS VALID/The browser opens it normally-
        $requesturl = 'https://www.readability.com/api/content/v1/parser?url=' . urlencode($url) . '&token=' . $api_key;
        $response = file_get_contents($requesturl); // * here the code FAILS! *
        $g = json_decode($response);
        $article_link = $g->url;
        $article_author = '';
        if ($g->author != null) {
            $article_author = $g->author;
        }
        $article_image = '';
        if ($g->lead_image_url != null) {
            $article_image = $g->lead_image_url;
        }
        $article_excerpt = $g->excerpt;
        $toJSON = new jsonRes();
        $toJSON->url = $article_link;
        $toJSON->author = $article_author;
        $toJSON->image = $article_image;
        $toJSON->excerpt = $article_excerpt;
        $retJSONf = json_encode($toJSON);
        return $retJSONf;
    }
}
?>
Sometimes a website will block crawlers (requests from remote servers) from getting to its pages.
What people do to work around this is spoof a browser's headers, i.e. pretend to be Mozilla Firefox instead of the sneaky PHP web scraper they are.
This is a function which uses the cURL library to do just that.
function get_data($url) {
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    if (!$html) {
        echo "<br />cURL error number:" . curl_errno($ch);
        echo "<br />cURL error:" . curl_error($ch);
        curl_close($ch);
        exit;
    } else {
        curl_close($ch);
        return $html;
    }
} // End of cURL function
One would then call it as below:
$response = get_data($requesturl);
cURL offers many more options for fetching remote content and for error checking than file_get_contents does. If you want to customize it further, check out the list of cURL options here - Abridged list of cURL options

Check if Twitter API is down (or the whole site is down in general)

I am using the Twitter API to display the statuses of a user. However, in some cases (like today), Twitter goes down and takes all the APIs with it. Because of this, my application fails and continuously displays the loading screen.
I was wondering if there is a quick way (using PHP or JS) to query Twitter and see if it (and the API) is up. I'm thinking it could be an easy response of some sort.
Thanks in advance,
Phil
Request http://api.twitter.com/1/help/test.xml or test.json. Check to make sure you get a 200 HTTP response code.
If you requested XML the response should be:
<ok>true</ok>
The JSON response should be:
"ok"
JSONP!
You can have some function like this, declared in the head or before including the next script tag below:
var isTwitterWorking = false;
function testTwitter(status) {
    if (status === "ok") {
        isTwitterWorking = true;
    }
}
And then
<script src="http://api.twitter.com/1/help/test.json?callback=testTwitter"></script>
Demo (might take a while, Twitter's API seems to be slow here)
function visit($url) {
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 300)
        return true;
    else
        return false;
}
// Examples
if (visit("http://www.twitter.com"))
    echo "Website OK" . "\n"; // site is online
else
    echo "Website DOWN"; // site is offline / no response
I hope this helps you.

Check if a remote page exists using PHP?

In PHP, how can I determine if any remote file (accessed via HTTP) exists?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10); // follow up to 10 redirections - avoids loops
$data = curl_exec($ch);
curl_close($ch);
if (!$data) {
    echo "Domain could not be found";
} else {
    preg_match_all("/HTTP\/1\.[01]\s(\d{3})/", $data, $matches);
    $code = end($matches[1]);
    if ($code == 200) {
        echo "Page Found";
    } elseif ($code == 404) {
        echo "Page Not Found";
    }
}
Modified version of code from here.
I like curl or fsockopen to solve this problem. Either one can provide header data regarding the status of the file requested. Specifically, you would be looking for a 404 (File Not Found) response. Here is an example I've used with fsockopen:
http://www.php.net/manual/en/function.fsockopen.php#39948
This function will return the response code (the last one, in case of redirection), or false in case of a DNS or other error. If one argument (the URL) is supplied, a HEAD request is made. If a second argument is given, a full request is made, and the content, if any, of the response is stored by reference in the variable passed as the second argument.
function url_response_code($url, &$contents = null)
{
    $context = null;
    if (func_num_args() == 1) {
        // no content wanted, so a HEAD request is enough
        $context = stream_context_create(array('http' => array('method' => 'HEAD')));
    }
    $contents = @file_get_contents($url, false, $context); // "@" suppresses the warning on failure
    $code = false;
    if (isset($http_response_header)) {
        foreach ($http_response_header as $header) {
            if (strpos($header, 'HTTP/') === 0) {
                list(, $code) = explode(' ', $header);
            }
        }
    }
    return $code;
}
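Usage would look something like this (URLs hypothetical):
$code = url_response_code('http://www.example.com/');        // HEAD request, status code only
$code = url_response_code('http://www.example.com/', $body); // full GET; content stored in $body
echo $code;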
I recently was looking for the same info. Found some really nice code here: http://php.assistprogramming.com/check-website-status-using-php-and-curl-library.html
function Visit($url){
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 300) {
        return true;
    } else {
        return false;
    }
}
if (Visit("http://www.site.com")) {
    echo "Website OK";
} else {
    echo "Website DOWN";
}
Use cURL, and check if the request went through successfully.
http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/
Just a note that these solutions will not work on a site that does not give an appropriate response for a page not found. For example, I just had a problem testing for a page on a site that simply loads its main page whenever it gets a request it cannot handle, so the site gives a 200 response even for non-existent pages.
Some sites will show a custom error on a standard page and still not send a 404 header.
There is not much you can do in these situations unless you know the expected content of the page: you can test that the expected content exists, or test for some expected error text within the page, and it all gets a bit messy...
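A minimal sketch of that last-resort approach, assuming you know a marker string that appears only on the real page (the URL, marker, and pageReallyExists() name are all hypothetical):
function pageReallyExists($url, $expectedMarker) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // a 200 alone proves nothing on such sites; require the marker text too
    return $code == 200 && $html !== false && strpos($html, $expectedMarker) !== false;
}

// hypothetical usage
var_dump(pageReallyExists("http://www.site.com/some-page", "Product details"));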
