curl problem, can't download full web page - php

With this code I'm trying to download this web page: http://www.kayak.com/s/...
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,'http://www.kayak.com/s/search/air?ai=kayaksample&do=y&ft=ow&ns=n&cb=e&pa=1&l1=ZAG&t1=a&df=dmy&d1=4/10/2010&depart_flex=exact&r1=y&l2=LON&t2=a&d2=11/10/2010&return_flex&r2=y');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_REFERER,"http://www.google.com");
$content = curl_exec ($ch);
echo $content;
You can see the demo at: http://www.pointout.org/test.php
As you can see the part with prices is missing.
What could be wrong?

This is not going to work the way you think it will. The prices are not in the initial HTML response that you get. Instead, JavaScript on the page loads them with AJAX requests after the page has loaded, and cURL does not execute JavaScript.

Related

php curl not retrieving as expected

I have the following code to capture the html code of a given url:
$url = "https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=77212&cvm=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_CAINFO, '/etc/ssl/certs/cacert.pem');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
echo "$url\n\n";
die($html);
For some reason the result of the following url is not as expected:
"https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=77212&cvm=true"
Instead of the HTML, the result is a giant meaningless string.
I have successfully used the same code with other pages on the same domain.
I can assure you that the desired page's content is not loaded by any JS/AJAX method (I tested this by loading the page with JavaScript disabled).
My question is:
Is there any cURL option that I should set to correct this error?
My whole site depends on capturing these pages.
Any help would be truly appreciated.
The response is base64-encoded. All you need to do is decode it back to plain text like this:
echo base64_decode($html);
and you will see the HTML.
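If the same code also fetches pages that come back as ordinary HTML, decoding unconditionally would corrupt those. The following is a minimal sketch, not a definitive check: `decodeIfBase64` is a hypothetical helper name, and it assumes that a body which survives a strict decode and re-encodes to the same text really was base64.

```php
<?php
// Hypothetical helper: decode a response body only when it looks like base64.
function decodeIfBase64(string $body): string
{
    // Ignore line breaks and surrounding whitespace in the encoded text.
    $compact = preg_replace('/\s+/', '', $body);

    // Strict mode returns false when a character outside the base64
    // alphabet appears (for example the "<" of an HTML tag).
    $decoded = base64_decode($compact, true);
    if ($decoded === false || $decoded === '') {
        return $body;
    }

    // Round-trip check: re-encoding must reproduce the input exactly.
    if (base64_encode($decoded) !== $compact) {
        return $body;
    }

    return $decoded;
}
```

Short plain-text bodies made only of letters and digits can still pass this check, so treat it as a heuristic for this one site rather than a general detector.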

jibberish returned when using CURLOPT_URL

I'm trying to grab a page's data using CURLOPT_URL. To do so I've used the code below, which works fine for other pages (except where the page uses relative paths to its CSS/JS, in which case those don't load).
function grab_page($site){
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_TIMEOUT, 40000000);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_URL, $site);
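// CURLOPT_RETURNTRANSFER already returns the body, so no output buffering is needed.
$html = curl_exec($ch);
// Close the handle before returning; code after a return statement never runs.
curl_close($ch);
return $html;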
}
echo grab_page("$page_to_get");
But when I load the page I get a whole page of gibberish, and the same when I view the source.
Looking at the source of the page in my browser, they seem to be using charset=utf-8. I'm not sure if that has anything to do with it, though. Any ideas?
Calling:
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
will fix it if the encoding is known to be gzip. Alternatively, as you stated,
curl_setopt($ch, CURLOPT_ENCODING, "");
should make cURL negotiate the encoding itself by advertising every encoding it supports (why this is not the default is beyond me).
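To see why the output looks like gibberish, it can help to decode a compressed body by hand. This is a different technique from setting CURLOPT_ENCODING, shown only as a sketch: `decodeGzipIfNeeded` is a hypothetical helper, and it assumes the zlib extension is loaded and that the server used gzip rather than another encoding.

```php
<?php
// Hypothetical helper: undo gzip compression on a body that was fetched
// without CURLOPT_ENCODING.
function decodeGzipIfNeeded(string $body): string
{
    // Every gzip stream starts with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body;
    }

    // gzdecode() returns false if the data turns out not to be valid gzip.
    $decoded = gzdecode($body);

    return $decoded === false ? $body : $decoded;
}
```

In practice, letting cURL handle it through CURLOPT_ENCODING is simpler, because cURL then also sends the matching Accept-Encoding request header.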

cURL can't follow redirection

My cURL function cannot follow the redirection of Facebook's external link redirector, l.php, and I have no idea what's wrong.
Here is the code that I'm working on, with the lines I've tried commented out, and an example link (http://www.facebook.com/l.php?u=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DGvhFyNLK66A%26feature%3Dyoutu.be&h=xAQFD_3svAQFKxF5YrtqNQ5cL3lIQxo0uaC9PoB7qAvG7Yw&enc=AZPxNZ8P5q54FREC37UC_MP02pwh2DOmsI5bbFkoQm5VUPUlYeNzQASjarRjhTtcedRkmM3mDjK7J_r_P5pRpYhL)
function connect($u) {
$ch= curl_init();
curl_setopt($ch, CURLOPT_URL, $u);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_HEADER, true);
//curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
//curl_setopt($ch, CURLOPT_REFERER, 'spie');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//curl_setopt($ch, CURLOPT_AUTOREFERER, true );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
//curl_setopt($ch, CURLOPT_VERBOSE, true);
//curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
$source=curl_exec($ch);
curl_close($ch);
return $source;
}
thank you..
I first thought this was a redirect issue with cURL (safe mode enabled, for instance), but it actually comes from how the Facebook redirector works.
There is no Location: header, so curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); won't help you with it.
The Facebook link page actually redirects you using Javascript:
<script type="text/javascript">document.location.replace("http:\/\/www.youtube.com\/watch?v=GvhFyNLK66A&feature=youtu.be");</script>
cURL cannot analyse the content of the page or execute JavaScript, so this is expected behaviour. If you still want to do this, you'll need to parse the content of the page, grab the URL from the JavaScript, and issue a new cURL request to this URL.
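The parsing step could look roughly like the sketch below. `extractJsRedirect` is a hypothetical helper, and it assumes the page uses exactly the `document.location.replace("...")` form shown above with a double-quoted string whose escapes are also valid JSON. Any other redirect style would need its own pattern.

```php
<?php
// Hypothetical helper: pull the target URL out of a
// document.location.replace("...") call, or return null if none is found.
function extractJsRedirect(string $html): ?string
{
    // Capture the double-quoted argument, allowing backslash escapes inside it.
    $pattern = '/document\.location\.replace\(\s*"((?:[^"\\\\]|\\\\.)*)"\s*\)/';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }

    // The argument escapes "/" as "\/", which is also valid JSON,
    // so json_decode() can unescape it.
    $url = json_decode('"' . $m[1] . '"');

    return is_string($url) ? $url : null;
}
```

The returned URL can then be passed to a second call of the connect() function from the question.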
Apparently only HTTP redirects are supported by cURL with the '--location' option.
Reference: https://everything.curl.dev/http/redirects#non-http-redirects

How do I use cURL & PHP to spoof the referrer?

I'm trying to learn cURL with PHP to spoof the referrer to a website.
With the following script I expected to accomplish this...but it seems to not work.
Any ideas/suggestion where I am going wrong??
Or do you know of any tutorials that could help me figure this out?
Thanks!
Jessica
<?php
$host = "http://mysite.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $host);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, false);
curl_setopt($ch, CURLOPT_REFERER, "http://google.com");
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$result = curl_exec($ch);
curl_close($ch);
?>
You won't be able to see the result in the web server's analytics because the site is probably using JavaScript to collect its analytics, and cURL won't execute JavaScript. All cURL does is fetch the content of the page as if it were a text file. It won't run any scripts or load anything else.
To be more clear, if you have an HTML tag like
<img src="path/to/image/image.jpg" />
cURL will treat it as a line of text. It won't load image.jpg from the server. The same goes for JS if there is a
<script type="text/javascript" src="analytics.js"></script>
Normally the browser will load analytics.js and run it, but cURL won't.
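To make this concrete, the sketch below lists the files a browser would request after the initial page load, none of which cURL fetches on its own. `listSubresources` is a hypothetical helper, and the regular expression is a rough approximation for quoted `src` attributes, not a real HTML parser.

```php
<?php
// Hypothetical helper: list the src values of <img> and <script> tags.
// A browser would request each of these; cURL only sees them as text.
function listSubresources(string $html): array
{
    $pattern = '/<(?:img|script)\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']/i';
    preg_match_all($pattern, $html, $matches);

    // $matches[1] holds the captured src values in document order.
    return $matches[1];
}
```

If the analytics script is in that list, the visit will only be counted when something actually downloads and runs that script.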

Trying to log into a site with the cURL extension of PHP

Basically, I'm trying to log into a site. I've got it logging in, but the site redirects to another part of the site, and upon doing so, it redirects my browser as well.
For example:
It successfully logs into http://example.com/login.php
But then my browser goes to http://mysite.com/site.php?page=loggedin
I just want it to return the contents of the page, not be redirected to it.
How would I do this?
As requested, here is my code
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $loginURL);
//Some setopts
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_HEADER, false);
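// CURLOPT_FOLLOWREDIRECT is not a PHP cURL option; CURLOPT_FOLLOWLOCATION above already covers it.
// The option name is spelled CURLOPT_REFERER (single "R"), matching the HTTP header.
curl_setopt($ch, CURLOPT_REFERER, $referrer);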
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
echo $output;
Figured it out. The webpage was echoing a meta refresh, and since I was echoing the output, my browser followed.
Removed the echo $output; and it no longer does that.
I feel kind of dumb for not recognizing that in the beginning.
Thanks everyone.
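For anyone who still needs to display the fetched page, a rough sketch of two options follows. Both helper names are hypothetical. `findMetaRefreshUrl` assumes the `http-equiv` attribute comes before `content` inside the tag, and `showAsText` escapes the markup so the browser prints it instead of acting on it.

```php
<?php
// Hypothetical helper: return the target of a <meta http-equiv="refresh">
// tag, or null if the page has none.
function findMetaRefreshUrl(string $html): ?string
{
    $pattern = '/<meta\b[^>]*http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
             . 'content\s*=\s*["\']\s*\d+\s*;\s*url\s*=\s*([^"\'>]+)/i';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }

    return trim($m[1]);
}

// Hypothetical helper: escape fetched HTML so the browser displays it as
// text and does not follow any meta refresh it contains.
function showAsText(string $html): string
{
    return '<pre>' . htmlspecialchars($html, ENT_QUOTES, 'UTF-8') . '</pre>';
}
```

Calling `echo showAsText($output);` instead of `echo $output;` would show the login response without the browser being redirected.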
Using cURL, you have to find the redirect and follow it, then return that page's content. I'm not sure why your browser would be redirecting unless the login page returns some unusual header code.
Set CURLOPT_FOLLOWLOCATION to false:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
This might help you.
