Curl in PHP retrieve HTML into variable not working - php

My code snippet (this comes after earlier cURL calls that log in and store cookies):
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "https://www.example.com");
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729)");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie.txt');
$result = curl_exec($curl);
if ($result === FALSE) {
    echo "cURL Error: " . curl_error($curl);
}
curl_close($curl);
print 'result: ' . $result;
The displayed result just prints headers; the actual source HTML does not appear to be saved in the $result variable. I know the data is there, because when I view source on the retrieved page, everything is present. If it makes a difference, the page I'm requesting is XML.
Thank you

Replace
print 'result: ' . $result;
with
echo htmlspecialchars($result);
If you want the <tags> themselves to show up in the browser, you need to escape them like this.
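A minimal sketch of what the escaping does, run on a made-up XML string rather than the real response:

```php
<?php
// htmlspecialchars() turns markup characters into HTML entities, so the
// browser displays the tags as text instead of interpreting them.
$xml = '<user id="1">Alice</user>';
echo htmlspecialchars($xml);
// prints: &lt;user id=&quot;1&quot;&gt;Alice&lt;/user&gt;
```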

You've set this option:
curl_setopt($curl, CURLOPT_HEADER, 1);
That is why you're getting headers in the output.
You're printing the result to the browser and the browser sees XML tags and does XML stuff with them. The data is there because you see it when you View->Source. If you remove the headers your browser may complain about a missing stylesheet and display the raw XML. But as you've already proved to yourself, you have the data.

To elaborate on the answer above, you can remove:
curl_setopt($curl, CURLOPT_HEADER, 1);
and you will no longer receive headers in $result. (CURLOPT_VERBOSE does not affect the returned data; it only writes debug information to STDERR, so you can also drop it once you are done debugging.) Of course, since you're accessing a different site than example.com, this may not be the complete (or even correct) solution.
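If the headers are still wanted for debugging but the body is needed on its own, one common pattern (sketched here against the placeholder URL, not verified against the original site) is to keep CURLOPT_HEADER on and split the response at CURLINFO_HEADER_SIZE:

```php
<?php
$curl = curl_init('https://www.example.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, true); // headers are prepended to the body
$response = curl_exec($curl);
if ($response !== false) {
    // CURLINFO_HEADER_SIZE is the byte length of the received header block,
    // so everything after that offset is the body.
    $headerSize = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
    $headers = substr($response, 0, $headerSize);
    $body    = substr($response, $headerSize);
}
curl_close($curl);
```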

Related

Parsing any webpage using CURL on PHP

Is it possible to write a PHP function that returns the HTML string of any possible link the same way the browser does? Examples of links: "http://google.com", "", "mywebsite.com", "somesite.com/.page/nn/?s=b#85452", "lichess.org"
What I've tried:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$data = curl_exec($curl);
if (curl_errno($curl)) {
    echo 'Curl error: ' . curl_error($curl);
}
echo $data;
curl_close($curl);
Sadly, for some links this code returns a blank page because of SSL or other issues, but for other links it works.
Is there any alternative to cURL? I just don't understand why PHP cannot retrieve arbitrary HTML out of the box.
cURL may fail on SSL sites if you're running an older version of PHP or OpenSSL, so make sure your OS and PHP version are up to date. Also note that setting CURLOPT_SSLVERSION to 3 forces SSLv3, which most servers have long since disabled; removing that line lets cURL negotiate a modern TLS version.
You may also opt to use file_get_contents(), which accepts URLs (when allow_url_fopen is enabled) and is generally a simpler alternative if you just want to make plain GET requests.
$html = file_get_contents('https://www.google.com/');
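If the remote server rejects PHP's default user agent or needs a timeout, file_get_contents() can still be configured through a stream context. A sketch, where the header value and URL are only examples:

```php
<?php
// Build a stream context carrying HTTP options for file_get_contents().
$context = stream_context_create([
    'http' => [
        'method'  => 'GET',
        'header'  => "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n",
        'timeout' => 20, // seconds
    ],
]);
$html = file_get_contents('https://www.example.com/', false, $context);
if ($html === false) {
    echo "Request failed\n";
}
```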

Fetch URL from instagram bio without logging in

I want to fetch a URL from a bio with PHP.
URL: https://www.instagram.com/sukhcha.in/ (it can be anyone's profile)
I tried using simple_html_dom, but it always shows an HTTPS error while fetching the HTML from the URL.
As advised in my comment, you should use cURL, because it supports the HTTPS protocol:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 0); // Timeout (0 : no timeout)
curl_setopt($ch, CURLOPT_HEADER, false); // Do not download header
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'); // creates user-agent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not output content
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirections
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // do not check HTTPS host (very important, if you set it to true, it probably won't work)
curl_setopt($ch, CURLOPT_URL, 'https://www.instagram.com/sukhcha.in/');
$content = curl_exec($ch);
?>
Then you have to use XPath on your $content variable to extract the part you want.
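A sketch of that XPath step, run here on inline HTML because Instagram's real markup changes often; the og:description meta tag is an assumption about where the bio text lives:

```php
<?php
// $content would normally hold the HTML fetched by cURL above.
$content = '<html><head><meta property="og:description" content="My bio: https://example.com"/></head><body></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely well-formed
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//meta[@property="og:description"]/@content');
if ($nodes->length > 0) {
    $bio = $nodes->item(0)->nodeValue;
    // pull the first URL out of the bio text
    if (preg_match('#https?://\S+#', $bio, $m)) {
        echo $m[0]; // prints: https://example.com
    }
}
```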
You can use cURL to get the data:
$url = 'https://weather.com/weather/tenday/l/USMO0460:1:US';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded'));
$curl_response = curl_exec($curl);
Debug the data:
echo '<pre>';
print_r($curl_response);
echo '</pre>';
Close the handle:
curl_close($curl);

CURL page to Lynda.com

I am a member of Lynda.com, and I want to fetch an HTML page from their site and save it to my disk. The problem is that whenever I try to fetch a page via cURL, I get the non-member page (it asks me to sign up). I can't understand why I can't get the member page. :(
My code:
get_remote_file_to_cache();

function get_remote_file_to_cache()
{
    $the_site = "http://www.lynda.com/AIR-3-0-tutorials/Flex-4-6-and-Mobile-Apps-New-Features/90366-2.html";
    $curl = curl_init();
    $fp = fopen("cache/temp_file.html", "w");
    curl_setopt($curl, CURLOPT_URL, $the_site);
    curl_setopt($curl, CURLOPT_COOKIE, '/cookie.txt');
    curl_setopt($curl, CURLOPT_FILE, $fp);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $http_headers = array(
        'Host: www.lynda.com',
        'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2',
        'Accept: */*',
        'Accept-Language: en-us,en;q=0.5',
        'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Connection: keep-alive'
    );
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $http_headers);
    curl_exec($curl);
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if ($httpCode == 404) {
        touch('cache/404_err.txt');
    } else {
        $contents = curl_exec($curl);
        fwrite($fp, $contents);
    }
    curl_close($curl);
}
I am on Windows 7, running this on WAMP.
One thing I am not sure about is whether the "cookie.txt" file is actually being read (I'm not sure the path is correct, so I put cookie.txt in the server root as well as in the directory I'm running this script from).
Thanks in advance!
----------- Found some code via the online manual ---------
// $url = page to POST data to
// $ref_url = tell the server which page you came from (spoofing)
// $login = true will make a clean cookie file.
// $proxy = proxy data
// $proxystatus = do you use a proxy? true/false
function curl_grab_page($url, $ref_url, $data, $login, $proxy, $proxystatus)
{
    if ($login == 'true') {
        $fp = fopen("ryanCookie.txt", "w");
        fclose($fp);
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEJAR, "ryanCookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "ryanCookie.txt");
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_TIMEOUT, 40);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    if ($proxystatus == 'true') {
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_REFERER, $ref_url);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); // overrides the user agent set above
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    ob_start();
    return curl_exec($ch); // execute the curl command
    // Note: the lines below never run, because return exits the function first.
    ob_end_clean();
    curl_close($ch);
    unset($ch);
}
echo curl_grab_page("https://www.lynda.com/login/login.aspx", "http://www.lynda.com/", "simple_username=*******&simple_password=*******", "true", "null", "false")."done!";
echo curl_grab_page("https://www.lynda.com/login/login.aspx", "http://www.lynda.com/", "simple_username=*******&simple_password=*******", "true", "null", "false")."done!";
But it still does not work :(
This is the page where I got the above code: http://php.net/manual/en/function.curl-setopt.php
You need to understand how the internet and HTTP work. When you access a website, it usually gives you cookies to track your status, and you start out as a non-logged-in visitor. After you hit the login button, the server updates your status to logged-in and stores that status, either in a server-side session or in your browser using cookies.
Back to your question: since you want to access a member page, you first need to learn how lynda.com's login works. The steps below are general:
1. Load the login page and get the form information
2. Inject your login info into the form data and send the form back to the server
3. Store the cookies received from the server
4. Load the member page (don't forget to include the cookie information from step 3) and fetch the HTML
For more information, you can look at this resources:
http://www.codingforums.com/showthread.php?t=252335
http://simpletest.sourceforge.net/en/browser_documentation.html
https://gist.github.com/3697293
Maybe you need to send an Authorization header containing your username and password for the site in the HTTP request headers.
To get the member page you need to log in on the website. To do that, you need to:
visit login page
make the same request as your browser would do to submit login credentials
fetch the member page
Alternatively, you could try to extract cookies from your browser after login and use them in curl with curl_setopt($ch, CURLOPT_COOKIE, 'a=b;c=d');, but this might not work as the website can also use IP or session check.
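The steps above can be sketched with a single handle and a shared cookie jar; every URL and form-field name here is a placeholder, since the real ones have to be read out of the site's actual login form:

```php
<?php
$jar = tempnam(sys_get_temp_dir(), 'cookies');

// Steps 1-2: POST the credentials; the jar stores the session cookie.
$ch = curl_init('https://www.example.com/login'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // write received cookies here
curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // and send them on later requests
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
    'username' => 'me',     // placeholder field names; inspect the real
    'password' => 'secret', // login form to find what the site expects
]));
curl_exec($ch);

// Step 3: reuse the same handle for the member page, so the session
// cookie from the jar is sent automatically.
curl_setopt($ch, CURLOPT_URL, 'https://www.example.com/member-page');
curl_setopt($ch, CURLOPT_POST, false);
$memberHtml = curl_exec($ch);
curl_close($ch);
```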

PHP cURL does not return values from a URL

I am using the function below to retrieve two URLs; the second one works when I retrieve Google or other websites, but the first one does not get any response. However, when I enter the URL in a browser I see the response. Can you guide me to fix this issue?
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$response=file_get_contents_curl('http://XX.XX.XX.XX:YYYYY/Request?Mob=999999999&Message=Rr+vodafone+999999999+10+1234&myTxId=10');
echo "<br/>respnse is...".$response;
$response=file_get_contents_curl('http://www.google.com');
echo "<br/>website respond is...".$response;
Edit:
The issue was with the port number: port 8090, which I was trying to access, was not enabled on my server. Once I enabled it for outgoing connections, it worked. Thanks everyone for the support.
Can you try:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0');
To find the error, add this before curl_close():
echo "Curl Error: " . curl_error($ch);
If no error is reported, do this:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
then
print_r($result);
exit;

issues getting json content

Hi, I'm trying to get the content of a JSON file, but I'm having a lot of trouble doing it. My code is:
<?PHP
$url = 'http://www.taringa.net/api/efc6d445985d5c38c5515dfba8b74e74/json/Users-GetUserData/apptastico';
$ch = curl_init();
$timeout = 0; // set to zero for no timeout
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
var_dump($file_contents);
?>
If I put the address in the browser I get the content without any issue, but if I try to get it from PHP or other code I have problems. What can I do? I tried file_get_contents() too, but I get nothing but errors.
It appears that they are checking user agents and blocking PHP. Set this before using file_get_contents:
ini_set('user_agent', 'Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1');
If they are checking user agents, they may be doing so to prevent people from doing this kind of thing.
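Put together, the suggestion looks like this (the user-agent string is only an example, and the URL is a placeholder):

```php
<?php
// Set the user agent that file_get_contents() and the other stream
// functions will send with HTTP requests.
ini_set('user_agent', 'Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1');
$json = file_get_contents('http://www.example.com/data.json'); // placeholder URL
```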
Have you read a bit about cURL on php.net? Here is how it works, adapted from php.net:
$url = 'http://www.taringa.net/api/efc6d445985d5c38c5515dfba8b74e74/json/Users-GetUserData/apptastico';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, FALSE); // set to true to see the header
curl_setopt($ch, CURLOPT_NOBODY, FALSE); // show the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$content = curl_exec($ch);
curl_close($ch);
echo $content;
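Since the endpoint returns JSON, $content would normally be decoded before use; a sketch with inline data standing in for the live response:

```php
<?php
// $content would be the string returned by curl_exec() above.
$content = '{"user": {"name": "apptastico", "posts": 42}}';

$data = json_decode($content, true); // true => associative arrays
if (json_last_error() !== JSON_ERROR_NONE) {
    die('Invalid JSON: ' . json_last_error_msg());
}
echo $data['user']['name']; // prints: apptastico
```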
