curl unable to download webpages - php

I am trying to open the homepages of websites and extract the title and description from their HTML markup using cURL with PHP. This works to an extent, but there are many websites I am unable to open. My code is here:
function curl_download($Url){
if (!function_exists('curl_init')){
die('Sorry cURL is not installed!');
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Url);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
// $url is any url
$source=curl_download($url);
$d=new DOMDocument();
$d->loadHTML($source);
$title=$d->getElementsByTagName("title")->item(0)->textContent;
$domx = new DOMXPath($d);
$desc=$domx->query("//meta[@name='description']")->item(0);
$description=$desc->getAttribute('content');
?>
This code works fine for most websites, but there are many that it cannot open at all. What could be the reason?
When I fetch the headers of those websites with the get_headers function, it works fine, but the same sites cannot be opened with cURL. Two of these websites are blogger.com and live.com.

Replace:
$output = curl_exec($ch);
with
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSLVERSION, 3);
$output = curl_exec($ch);
if (!$output) {
echo curl_error($ch);
}
and see why Curl is failing.
It's a good idea to always check the result of function calls to see if they succeeded or not, and to report when they fail. While a function may work 99.999% of the time, you need to report the times it fails, and why, so the underlying cause can be identified and fixed, if possible.
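The same advice applies to the DOM calls in the question: item(0) returns null when a page has no title or no description meta tag, so each result needs a check before it is used. A minimal sketch, assuming a hypothetical helper named extract_title_and_description (not part of the original code) that takes the HTML body as a string:

```php
<?php
// Hypothetical helper, for illustration only: returns the title and the
// meta description, or null for whichever one the page does not provide.
function extract_title_and_description(string $html): array
{
    $empty = ['title' => null, 'description' => null];
    if ($html === '') {
        // loadHTML() rejects an empty string, so bail out early.
        return $empty;
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return $empty;
    }

    // item(0) is null when no matching element exists.
    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $xpath = new DOMXPath($doc);
    $metaNode = $xpath->query("//meta[@name='description']")->item(0);

    return [
        'title' => $titleNode ? trim($titleNode->textContent) : null,
        'description' => $metaNode instanceof DOMElement
            ? $metaNode->getAttribute('content')
            : null,
    ];
}
```

A caller can then test each value for null and report which page was missing it, instead of hitting a fatal error on a null object.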

Related

How to make a call to an .aspx page over https from a PHP script on my localhost with XAMPP?

I am trying to send an SMS from my localhost with XAMPP installed.
The requested page is served over https and is an .aspx page.
I am getting the error "HTTP Error 400. The request is badly formed." or, in some cases, only a blank page.
Details are as follows:
$url = 'https://www.ismartsms.net/iBulkSMS/HttpWS/SMSDynamicAPI.aspx';
$postArgs = 'UserId='.$username.
'&Password='.$password.
'&MobileNo='.$destination.
'&Message='.$text.
'&PushDateTime='.$PushDateTime.
'&Lang='.$Lang;
function getSslPage($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
$response = getSslPage($all);
echo "<pre>";
print_r($response); exit;
I tried every solution and combination I could find on the internet but could not resolve this. The API developers do not have an example PHP script.
I tried the httpful PHP library and the file_get_contents function, but both return an empty page. I also tried every combination of curl_setopt options.
I need to call this URL without any POST data and see the response from it.
Instead I get a blank page.
Please note that when I open the URL with all the details in a browser, it works fine.
Can anybody help me with this?
Thank you,
Usman
First, urlencode your data as follows:
$postArgs = 'UserId='. urlencode($username).
'&Password='.urlencode($password).
'&MobileNo='.urlencode($destination).
'&Message='.urlencode($text).
'&PushDateTime='.urlencode($PushDateTime).
'&Lang='.urlencode($Lang);
After that, there are two possible solutions. One is to use GET:
curl_setopt($ch, CURLOPT_URL, $url . "?" . $postArgs);
The second option is to use the POST method:
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postArgs);
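As an alternative to calling urlencode on each field by hand, PHP's built-in http_build_query can encode a whole array in one call. A minimal sketch, assuming the field names from the question; the values below are made-up placeholders, not real credentials:

```php
<?php
// Field names follow the question; every value here is a placeholder.
$params = [
    'UserId'       => 'demo user',
    'Password'     => 'p&ss=word',
    'MobileNo'     => '96890000000',
    'Message'      => 'Hello & welcome',
    'PushDateTime' => '',
    'Lang'         => '0',
];

// http_build_query() URL-encodes each key and value and joins them with '&',
// so characters such as '&', '=' and spaces cannot break the request.
$postArgs = http_build_query($params);
```

The resulting $postArgs string can be appended to the URL after a "?" for GET, or passed to CURLOPT_POSTFIELDS for POST, exactly as in the two options above.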

Unable to use file_get_contents(), returns nothing

I'm trying to get some data from a website that is not mine, using this code.
<?
$text = file_get_contents("https://ninjacourses.com/explore/4/");
echo $text;
?>
However, nothing is being echoed, and the string length is 0.
I've used this method before and it worked without a problem, but with this website it is not working at all.
Thanks!
I managed to get the contents using curl like this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, "https://ninjacourses.com/explore/4/");
$result = curl_exec($ch);
curl_close($ch);
cURL lets you request a URL from your code and get its HTML response back. cURL means "client URL"; it allows you to connect to other URLs and use their responses in your code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, "https://ninjacourses.com/explore/4/");
$result = curl_exec($ch);
curl_close($ch);
I think these are useful for you: curl-with-php and another.

PHP cURL returning inaccurate information

I'm doing a simple curl on this address: https://github.com/users/davidhariri/contributions_calendar_data
When I grab the result with this function:
function fetch_data($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close($ch);
print_r($result);
return $result;
}
The strings are correct, but the ints (the contributions) are wrong.
Results from curl
[...["2014/01/04",0],["2014/01/05",0],["2014/01/06",0],["2014/01/07",1],["2014/01/08",0]]
Results from just navigating to the address
[...["2014/01/04",0],["2014/01/05",0],["2014/01/06",1],["2014/01/07",5],["2014/01/08",5]]
Something during the curl process might be transforming ints to binary and back again? I have no idea what's happening here.
Check whether you are logged in to the site in your browser. A logged-in session can return different results from the anonymous request that cURL makes.

scraping a secure page https in php

I'm trying to crawl a secure page (https) such as Google with cURL,
but I seem to get no data back from my crawler.
PHP function:
function getDOM($url){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_RANGE, '0-100');
$content = curl_exec($ch);
curl_close($ch);
echo $url."<br>";
echo $content;
$dom = new simple_html_dom();
$dom->load($content);
if($dom){
return $dom;
}
return null;
}
getDOM("https://www.google.co.uk/search?sugexp=chrome,mod=14&sourceid=chrome&ie=UTF-8&q=crawling%20https#hl=en&gs_nf=1&pq=site:stackoverflow.com%20crawling%20https%20php&cp=6&gs_id=s&xhr=t&q=stackoverflow&pf=p&sclient=psy-ab&oq=stacko&aq=0&aqi=g4&aql=&gs_l=&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=8baefeb740f734a5&biw=1280&bih=685");
Is there anything I can do to crawl an https page? I don't seem to have this problem with normal pages.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
Add this to your code. It turns off certificate verification, so any certificate is accepted; that should be fine for your use, but it is not a good idea in general.

See what CURL sends from a PHP script

I'm having difficulty querying a web form using cURL from a PHP script. I suspect that I'm sending something the web server does not like. To see what cURL really sends, I'd like to view the whole message that goes to the web server.
How can I set-up CURL to give me the full output?
I did
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
but that only gives me part of the header. The message content is not shown.
Thanks for all the answers! After all, they tell that It's not possible. I went down the road and got familiar with Wireshark. Not an easy task but definitely worth the effort.
Have you tried CURLINFO_HEADER_OUT?
Quoting the PHP manual for curl_getinfo:
CURLINFO_HEADER_OUT - The request string sent. For this to work, add
the CURLINFO_HEADER_OUT option to the handle by calling curl_setopt()
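A minimal sketch of that suggestion, assuming a hypothetical wrapper named fetch_with_request_headers (the name and the returned array shape are illustrative, not from the manual). Note that CURLINFO_HEADER_OUT reports the request line and headers only, not the POST body:

```php
<?php
// Hypothetical wrapper, for illustration only: performs a request and
// returns the body together with the request headers that cURL sent.
function fetch_with_request_headers(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Ask cURL to remember the outgoing request line and headers.
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);

    $body = curl_exec($ch);
    // Only available after curl_exec(); not a string if nothing was sent.
    $sent = curl_getinfo($ch, CURLINFO_HEADER_OUT);

    return [
        'body'            => $body === false ? null : $body,
        'request_headers' => is_string($sent) ? $sent : '',
        'error'           => curl_error($ch),
    ];
}
```

Printing the request_headers entry next to the data passed to CURLOPT_POSTFIELDS gives a reasonably complete picture of the outgoing request without a packet sniffer.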
If you want the content, can't you just log it? I am doing something similar for my API calls:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, self::$apiURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_POST, count($dataArray));
curl_setopt($ch, CURLOPT_POSTFIELDS, $dataString);
$logger->info("Sending " . $dataString);
self::$results = curl_exec($ch);
curl_close($ch);
$decoded = json_decode(self::$results);
$logger->debug("Received " . serialize($decoded));
Or try sending cURL's verbose output to an open file handle ($fp here):
curl_setopt($ch, CURLOPT_STDERR, $fp);
I would recommend using curl_getinfo.
<?php
curl_exec($ch);
$info = curl_getinfo($ch);
if ( !empty($info) && is_array($info) ) {
print_r( $info );
} else {
throw new Exception('Curl Info is empty or not an array');
};
?>
