I found this function that does an AWESOME job (IMHO): http://nadeausoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
The only problem I have is that it doesn't work for https://. Anny ideas what I need to do to make this work for https? Thanks!
Quick fix, add this in your options:
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false)
Now you have no idea what host you're actually connecting to, because cURL will not verify the certificate in any way. Hope you enjoy man-in-the-middle attacks!
Or just add it to your current function:
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_SSL_VERIFYPEER => false // Disabled SSL Cert checks
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
I was trying to use CURL to do some https API calls with php and ran into this problem. I noticed a recommendation on the php site which got me up and running: http://php.net/manual/en/function.curl-setopt.php#110457
Please everyone, stop setting CURLOPT_SSL_VERIFYPEER to false or 0. If
your PHP installation doesn't have an up-to-date CA root certificate
bundle, download the one at the curl website and save it on your
server:
http://curl.haxx.se/docs/caextract.html
Then set a path to it in your php.ini file, e.g. on Windows:
curl.cainfo=c:\php\cacert.pem
Turning off CURLOPT_SSL_VERIFYPEER allows man in the middle (MITM)
attacks, which you don't want!
Another option like Gavin Palmer answer is to use the .pem file but with a curl option
download the last updated .pem file from https://curl.haxx.se/docs/caextract.html and save it somewhere on your server(outside the public folder)
set the option in your code instead of the php.ini file.
In your code
curl_setopt($ch, CURLOPT_CAINFO, $_SERVER['DOCUMENT_ROOT'] . "/../cacert-2017-09-20.pem");
NOTE: setting the cainfo in the php.ini like #Gavin Palmer did is better than setting it in your code like I did, because it will save a disk IO every time the function is called, I just make it like this in case you want to test the cainfo file on the fly instead of changing the php.ini while testing your function.
One important note, the solution mentioned above will not work on local host, you have to upload your code to server and then it will work. I was getting no error, than bad request, the problem was I was using localhost (test.dev,myproject.git). Both solution above work, the solution that uses SSL cert is recommended.
Go to https://curl.haxx.se/docs/caextract.html, download the latest cacert.pem. Store is somewhere (not in public folder - but will work regardless)
Use this code
".$result;
//echo "Path:".$_SERVER['DOCUMENT_ROOT'] . "/ssl/cacert.pem";
// this is for troubleshooting only ?>
Upload the code to live server and test.
Related
I'm getting the following error when running a script. The error message is as follows...
Warning: file_get_contents() [function.file-get-contents]: https:// wrapper is disabled in the server configuration by allow_url_fopen=0 in /home/satoship/public_html/connect.php on line 22
I know this is a server issue but what do I need to do to the server in order to get rid of the above warning?
#blytung Has a nice function to replace that function
<?php
$url = "http://www.example.org/";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
$contents = curl_exec($ch);
if (curl_errno($ch)) {
echo curl_error($ch);
echo "\n<br />";
$contents = '';
} else {
curl_close($ch);
}
if (!is_string($contents) || !strlen($contents)) {
echo "Failed to get contents.";
$contents = '';
}
echo $contents;
?>
If you do not have the ability to modify your php.ini file, use cURL:
PHP Curl And Cookies
Here is an example function I created:
function get_web_page( $url, $cookiesIn = '' ){
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => true, //return headers in addition to content
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLINFO_HEADER_OUT => true,
CURLOPT_SSL_VERIFYPEER => true, // Validate SSL Cert
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_COOKIE => $cookiesIn
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$rough_content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header_content = substr($rough_content, 0, $header['header_size']);
$body_content = trim(str_replace($header_content, '', $rough_content));
$pattern = "#Set-Cookie:\\s+(?<cookie>[^=]+=[^;]+)#m";
preg_match_all($pattern, $header_content, $matches);
$cookiesOut = implode("; ", $matches['cookie']);
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['headers'] = $header_content;
$header['content'] = $body_content;
$header['cookies'] = $cookiesOut;
return $header;
}
NOTE: In revisiting this function I noticed that I had disabled SSL checks in this code. That is generally a BAD thing even though in my particular case the site I was using it on was local and was safe. As a result I've modified this code to have SSL checks on by default. If for some reason you need to change that, you can simply update the value for CURLOPT_SSL_VERIFYPEER, but I wanted the code to be secure by default if someone uses this.
Use this code in your php script (first lines)
ini_set('allow_url_fopen',1);
Edit your php.ini, find allow_url_fopen and set it to allow_url_fopen = 1
Using relative instead of absolute file path solved the problem for me.
I had the same issue and setting allow_url_fopen=on
did not help. This means for instance :
use
$file="folder/file.ext";
instead of
$file="https://website.com/folder/file.ext";
in
$f=fopen($file,"r+");
THIS IS A VERY SIMPLE PROBLEM
Here is the best method for solve this problem.
Step 1 : Login to your cPanel (http://website.com/cpanel OR http://cpanel.website.com).
Step 2 : SOFTWARE -> Select PHP Version
Step 3 : Change Your Current PHP version : 5.6
Step 3 : HIT 'Set as current' [ ENJOY ]
I am trying to read the content of a website using cURL to compare some data. I accomplished to receive the content of the webpage with cURL but when I want to extract some data out of the content is it not working. I parse the content with DOMDocument but it seems that characters like & and € and so on does not get converted in a good way, so it crashes. that is why I put htmlentities with it but that also does not work.
This is one of the errors i receive:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 37 in URL on line 40
Can anyone suggest me what I should do different?
This is how I get the content of a website:
function get_web_page( $url )
{
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET", //set request type post or get
CURLOPT_POST =>false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_COOKIEFILE =>"cookie.txt", //set cookie file
CURLOPT_COOKIEJAR =>"cookie.txt", //set cookie jar
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => false, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
$html = get_web_page("url of a website");
And this is how i tought i should parse it:
$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html["content"], 'HTML-ENTITIES', 'UTF- 8'));
foreach($dom->getElementsByTagName('div') as $div){
echo $div->nodeValue."<br>";
}
But actually I am looking for a value from a specific div with a class, only that value do you know how I am able to get that ?
I use SimpleHTMLDom, it is quite easy and well documented.
You can even find a bunch of questions here in StackOverflow
I'm working within the XAMPP environment on a windows 7 64-bit machine. I have Apache 2.4 service installed. The issue I'm having has baffled me for about a day now.
My php files have all executed as expected up to this point. Recently, I've created a file which begins with the following:
function get_web_page($url,$attempt=1){
if($attempt <4){
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 30, // timeout on connect
CURLOPT_TIMEOUT => 30, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
if($err == 0){
return $content;
}else{
return get_web_page( $url, $attempt + 1 );
}
}else{
return FALSE;
}
}
A simple function to retrieve a web page, and it doesn't echo anything, either.
But when I visit this page in a browser (which at this point ONLY defines a function and nothing else), it prints to the page everything following the first instance of "=>" (without quotes). I don't understand why this is. All of my other php files in the same directory behave as expected.
Please help me understand why this is happening and what steps I should take to resolve it.
Look at the source of the page given to your browser and you'll probably see the entire php source in plaintext. It's only rendering what's after the first => because that's likely the first closing > it finds after the opening < in <?php. The first part doesn't render because your browser thinks it's inside some strange HTML tag.
Check your apache config, because it's not routing requests for *.php pages through the PHP interpreter.
I want to make a little script that returns me a result depending of how much a ip has been blacklisted.
Result must be like 23/100 means that 23 has blacklisted that ip or 45/100 2/100 ... and so on.
First of all i fetch trough CURL from http://whatismyipaddress.com/blacklist-check sending a post request some data :
<?php
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page($url,$argument1)
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_POST => 1,
CURLOPT_POSTFIELDS => "LOOKUPADDRESS=".$argument1,
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
echo "<pre>";
$result = get_web_page("http://whatismyipaddress.com/blacklist-check","75.122.17.117");
// print_r($result['content']);
// in $result['content'] we have the whole pag
// Creating xpath and fill it with data
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTMLFile($result['content']); // loads your html
$xpath = new DOMXPath($doc);
// Get that table
$value = $xpath->evaluate("string(/html/body/div/div/div/table/text())");
echo "Table with blacklists: [$value]\n"; // prints your location
die;
?>
Now what i want is to parse the data with XPATH /html/body/div/div/div/table/text() and where i see the image (!) mark it as blacklisted, otherwise do nothing.
Can anyone help me?
I also observed that vewing the (!) image requires a token, i might switch to another site, but i like that particular website because it has all the websites.
Thank you!
definitely you need this :)
Simple DOM Parser
I want to use PHP to get the URL of the page to which the following address redirects:
http://peacecorpsjournals.com/journal/6731
The script should return the following URL to which the URL above redirects:
http://ghanakimsuri.blogspot.com/
One way (of many) to do this is to open the URL with fopen, then use stream_get_meta_data to grab the headers. This is a quick snippet I grabbed from something I wrote a while back:
$fh = fopen($uri, 'r');
$details = stream_get_meta_data($fh);
foreach ($details['wrapper_data'] as $line) {
if (preg_match('/^Location: (.*?)$/i', $line, $m)) {
// There was a redirect to $m[1]
}
}
Note you can have multiple redirections, and they can be relative as well as absolute.
You can do this using cURL.
<?php
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => true, // return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
//$header['errno'] = $err;
// $header['errmsg'] = $errmsg;
//$header['content'] = $content;
print($header[0]);
return $header;
}
$thisurl = "http://www.example.com/redirectfrom";
$myUrlInfo = get_web_page( $thisurl );
echo $myUrlInfo["url"];
?>
Code found here: http://forums.devshed.com/php-development-5/curl-get-final-url-after-inital-url-redirects-544144.html
I've found this resource to be the most complete, thought-out approach and explanation. The code isn't the shortest snippit, but you'll end up being able to track multiple redirects with a couple lines like this:
$result = get_all_redirects('http://bit.ly/abc123');
print_r($result);
I found out that you may simply use the following code to get the redirect URL on a simple redirection. This will not work on recursive redirections.
$headers = get_headers("https://graph.facebook.com/me/picture?access_token=__token__", 1);
$image_url = $headers['Location'];
** The example above is to capture the Facebook profile image url from the Graph API call, which is issued along with HTTP 302 header.