I'm trying to parse a shopping site webpage using CURL in PHP.
the url is: http://computers.pricegrabber.com/printers/HP-Officejet-Pro-8600-Plus-All-One-Wireless-Inkjet-Printer/m916995235.html/zip_code=97045/sort_type=bottomline
Here's the code I use.
function getWebsiteCURL($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
echo getWebsiteCURL("http://computers.pricegrabber.com/printers/HP-Officejet-Pro-8600-Plus-All-One-Wireless-Inkjet-Printer/m916995235.html/zip_code=97045/sort_type=bottomline");
It works, but I couldn't get the full HTML code.
Anybody have any idea why?
TIA.
This is often caused by connection timeout.
Try setting the opt:
CURLOPT_TIMEOUT => 120
curl cannot interpret Javascript, you can see what curl will see if you disable javascript in a browser and navigate to the page. If you need to interpret Javascript then I would use a headless browser like phantomjs. In PHP you could use PHP PhantomJS.
Related
I have the following code to capture the html code of a given url:
$url = "https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=77212&cvm=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_CAINFO, '/etc/ssl/certs/cacert.pem');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
echo "$url\n\n";
die($html);
For some reason the result of the following url is not as expected:
"https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=77212&cvm=true"
Instead of the code, the result is a giant meaningless string.
I've have successfully used the same code with other pages of the same domain.
I can assure that the desired page's content is not loaded by any js/ajax method (i did the test loading the page when disabling javascript).
My question is:
There is any cUrl option that i should set to correct this error?
My whole site depends on capturing this pages.
Any help would be truly appreciated.
That is base64 encoded, all you need to do is decode it back to plain text like this
echo base64_decode($html);
and you will see HTML
Currently I have page say page1.php. Now in a certain event, I just want to run another link say http://example.com without actually refreshing this page. The link is a kind of script which updates my database. I tried using shell_exec('php '.$url); where $url='http://example.com' however it showed me an error that could not open file so I suppose shell_exec works only for internal files present on the server. Is there a way to do this directly or I have to go with AJAX? Thanks in advance
Try using curl to send the request to the server with php.
$url = 'http://example.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_exec($ch);
curl_close($ch);
Alternatively you could try file_get_contents
file_get_contents('http://example.com');
I would do this front-end and I would use JSONP: much clean and safer IMHO.
I use the following command in some old scripts:
curl -Lk "https:www.example.com/stuff/api.php?"
I then record the header into a variable and make comparisons and so forth. What I would really like to do is convert the process to PHP. I have enabled curl, openssl, and believe I have everything ready.
What I cannot seem to find is a handy translation to convert that command line syntax to the equivalent commands in PHP.
I suspect something in the order of :
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// What goes here so that I just get the Location and nothing else?
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Get the response and close the channel.
$response = curl_exec($ch);
curl_close($ch);
The goal being $response = the data from the api “OK=1&ect”
Thank you
I'm a little confused by your comment:
// What goes here so that I just get the Location and nothing else?
Anyway, if you want to obtain the response body from the remote server, use:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
If you want to get the headers in the response (i.e.: what your comment might be referring to):
curl_setopt($ch, CURLOPT_HEADER, 1);
If your problem is that there is a redirection between the initial call and the response, use:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
I'm trying to learn cURL with PHP to spoof the referrer to a website.
With the following script I expected to accomplish this...but it seems to not work.
Any ideas/suggestion where I am going wrong??
Or do you know of any tutorials that could help me figure this out?
Thanks!
Jessica
<?php
$host = "http://mysite.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $host);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, false);
curl_setopt($ch, CURLOPT_REFERER, "http://google.com");
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$result = curl_exec($ch);
curl_close($ch);
?>
You wont be able to see the result in webserver's analytics because it might probably using a javascript to get the analytics and curl wont run/execute the javascript. All Curl will do is get the content of the page as it like it is a text file. It wont run any of the scripts or anything.
To be more clear if you have an html tag like
<img src="path/to/image/image.jpg" />
The curl will treat it as a line of text. it wont load the image.jpg from the server. The same goes with the js if their is a
<script type="text/javascript" src="analytics.js"></script>
Normally the browser will load that analytics.js and run it, but the curl wont.
Basically, I'm trying to log into a site. I've got it logging in, but the site redirects to another part of the site, and upon doing so, it redirects my browser as well.
For example:
It successfully logs into http://example.com/login.php
But then my browser goes to http://mysite.com/site.php?page=loggedin
I just want it to return the contents of the page, not be redirected to it.
How would I do this?
As requested, here is my code
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $loginURL);
//Some setopts
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_FOLLOWREDIRECT, FALSE);
curl_setopt($ch, CURLOPT_REFERRER, $referrer);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
echo $output;
Figured it out. The webpage was echoing a meta refresh, and since I was echoing the output, my browser followed.
Removed the echo $output; and it no longer does that.
I feel kind of dumb for not recognizing that in the beginning.
Thanks everyone.
Using cURL you have to find the redirect and follow it, then return that page's content. I'm not sure why your browser would be redirecting unless you have some weird header code that you are returning from the login page.
set CURLOPT_FOLLOWLOCATION to false.
curl_setopt($ch , CURLOPT_FOLLOWLOCATION , FALSE);
this might help you.