I am using cURL to retrieve pages for a small search engine project I am working on, but on some pages it isn't retrieving the entire page.
The function that I have setup is:
public function grabSourceCode($url) {
    // Fetch the page source with cURL
    $ch = curl_init();
    $timeout = 50;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);          // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);   // seconds to wait for the connection
    curl_setopt($ch, CURLOPT_USERAGENT, 'NameBot/0.2');   // identify the crawler
    $source_code = curl_exec($ch);
    curl_close($ch);
    return $source_code;
}
and I am retrieving the page using:
$Crawler->grabSourceCode('https://sedo.com/search/searchresult.php4?keyword=cats&language_output=e&language=e')
On some pages I get everything back, but on this page I only get part of it.
I have tried using file_get_contents() but that has the same results.
It seems to be an issue with dynamic loading of the page: when I run the browser with JavaScript blocked, it shows the same results as the cURL function.
Is there any way to do this in PHP, or would I have to look at another language, such as JavaScript?
Thanks, Daniel
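One quick check before concluding it is a dynamic-loading problem is to confirm the transfer itself finished cleanly. Below is a minimal diagnostic sketch that reuses the same options as grabSourceCode() above and also reports cURL errors, the HTTP status, and the number of bytes received:

function grabSourceCodeDebug($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 50);
    curl_setopt($ch, CURLOPT_TIMEOUT, 50);               // cap the whole transfer, not just the connect phase
    curl_setopt($ch, CURLOPT_USERAGENT, 'NameBot/0.2');
    $body = curl_exec($ch);
    if ($body === false) {
        echo 'cURL error: ' . curl_error($ch) . PHP_EOL;
    } else {
        echo 'HTTP status: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
        echo 'Bytes received: ' . strlen($body) . PHP_EOL;
    }
    curl_close($ch);
    return $body;
}

If the status is 200 and the byte count matches what the browser downloads with JavaScript disabled, the missing content really is injected client-side and cURL alone cannot retrieve it.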
Related
I am using a cURL request to hit the HasOffers conversion URL from my server, but it is not working. When I call the same URL from a browser, it works. Can they be blocking cURL requests? I don't understand why this happens; is there some port blocking issue?
Below is php code to call url using curl request.
<?php
function curl_get_contents($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);      // skip SSL certificate verification
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);       // return the body as a string
    curl_setopt($ch, CURLOPT_HEADER, false);              // exclude headers from the output
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$url="http://paravey.go2cloud.org/aff_l?offer_id=12&aff_id=1000";
$contents = curl_get_contents($url);
echo $contents;
?>
Please help me. Thanks in advance.
The URL you are curling is a pixel-tracking URL:
http://paravey.go2cloud.org/aff_l?offer_id=12&aff_id=1000
The aff_l endpoint looks for a cookie with session information (which is why it works in the browser).
If you want to create conversions with server side code, you will need to store the session identifier (the transaction_id) in your system and use the aff_lsr endpoint to send that data to HasOffers to trigger a conversion.
The URL for this would look like this:
http://paravey.go2cloud.org/aff_lsr?transaction_id=VALUE
where VALUE is the session identifier you have stored.
I would ask the HasOffers support team if you have more issues with this.
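For instance, a minimal server-side sketch of that flow might look like the following; the $transactionId value is a placeholder, and it reuses the curl_get_contents() helper from the question:

// Fire a server-side conversion against the aff_lsr endpoint.
// $transactionId is assumed to have been captured and stored earlier in your system.
$transactionId = 'abc123';   // placeholder for illustration only
$url = 'http://paravey.go2cloud.org/aff_lsr?transaction_id=' . urlencode($transactionId);
echo curl_get_contents($url);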
I am trying to get a CodeIgniter (CI) session from an external file. I have a page in CI that dumps the current session. When I access it directly, it works as expected; however, when I access it via cURL, it returns nothing. I believe the CI session is lost when sending the request using cURL.
My question is: how do I send this session data together with my cURL request?
The code I am using is below.
$url = "http://localhost/cdmcl/dashboard/getsession";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
echo $data;
You need to set CURLOPT_COOKIEFILE so that cURL reads and sends cookies from a file (and CURLOPT_COOKIEJAR if you also want new cookies written back to it); without that, the session cookie never reaches the server.
So, with the CodeIgniter cURL library you would write something like this:
$this->curl->option(CURLOPT_COOKIEFILE,'cookies_1.txt');
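With plain cURL (as in the code from the question), the equivalent sketch would be to point both cookie options at a local file; 'cookies_1.txt' here is just an example path that PHP must be able to write to:

$url = "http://localhost/cdmcl/dashboard/getsession";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies_1.txt');  // send cookies stored in this file
curl_setopt($ch, CURLOPT_COOKIEJAR,  'cookies_1.txt');  // write any new cookies back to it
$data = curl_exec($ch);
curl_close($ch);
echo $data;

Note that the session cookie has to exist in that file in the first place, for example from an earlier login request made with the same cookie jar.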
I'm trying to create a program using PHP where I can load a full webpage and navigate the site while staying on a different domain. The problem I'm having is that I can't load things like stylesheets and images because they are relative links. I need a way to turn the relative links into absolute links.
Right now I can get just plain HTML from the page using this handy bit of code:
echo file_get_contents('http://tumblr.com');
I can't use an iframe to display the webpage.
Your code should work, but you must make sure allow_url_fopen is enabled in php.ini before running it.
echo file_get_contents('http://othersiteurl.com');
You may also use cURL. Example:
function get_data($url, $timeout = 5) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
Slightly modified code from: https://davidwalsh.name/curl-download
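Neither approach fixes the relative links by itself. A rough sketch of one way to rewrite them against a base URL, using DOMDocument (the make_links_absolute() name is made up here, and only simple relative paths are handled):

function make_links_absolute($html, $base) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);                       // suppress warnings from sloppy markup
    foreach ($doc->getElementsByTagName('*') as $node) {
        foreach (array('href', 'src') as $attr) {
            if (!$node->hasAttribute($attr)) {
                continue;
            }
            $value = $node->getAttribute($attr);
            if (preg_match('#^(https?:)?//#i', $value) || strpos($value, 'data:') === 0) {
                continue;                         // already absolute or inline data
            }
            $node->setAttribute($attr, rtrim($base, '/') . '/' . ltrim($value, '/'));
        }
    }
    return $doc->saveHTML();
}

echo make_links_absolute(get_data('http://tumblr.com'), 'http://tumblr.com');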
I am visiting a site that already provides the following functionality: when a browser visits the URL, it automatically prompts a window to download the attached file (a video). In my script I need to download this attached file from the site to disk using PHP. I tried to use cURL:
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
I also tried something similar with file_get_contents(), but it doesn't work for me either; it only gets me the HTML of that site. Do you have any idea how to save this file to disk using PHP?
One standard way that developers do this in PHP is to use the header() function. After calling header(), the page can output the video contents, which will prompt the user to download.
See this example: PHP header attach AVI-file
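As a rough sketch of that approach, the following assumes the get_data() helper from the question plus an example file name and URL, not the actual ones:

// Send attachment headers, then stream the fetched bytes so the visitor's
// browser offers a download prompt. 'video.avi' and the URL are placeholders.
$video = get_data('http://example.com/path/to/video');
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="video.avi"');
header('Content-Length: ' . strlen($video));
echo $video;

If the goal is to store the file on the server instead of pushing it to a browser, writing the fetched bytes with file_put_contents('video.avi', $video) is enough, provided the original request really returns the file bytes and not just an HTML page that triggers the download.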
I am trying to program a web bot using PHP/cURL, but I am facing a problem handling a specific page that loads some of its content dynamically. To explain more:
When I try to download the page using PHP/cURL, some content is missing. I then discovered that this content is loaded after the page itself has loaded, which is why cURL does not pick it up.
Can anyone help me?
My sample code is:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $reffer);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $redirect);
curl_setopt($ch, CURLOPT_COOKIEFILE, ABSOLUTE_PATH."Cookies/cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, ABSOLUTE_PATH."Cookies/cookies.txt");
$result = curl_exec($ch);
What URL are you trying to load? It could be that the page you're requesting has one or more AJAX requests that load content in after the fact. I don't think that cURL can accommodate runtime-loaded information via AJAX or other XHR requests.
You might want to look at something like PhantomJS, which is a headless WebKit browser which will execute the page fully and return the dynamically assembled DOM.
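One way to wire that in from PHP, as a sketch, is to shell out to the phantomjs binary; save_page.js is assumed to be a small PhantomJS script of your own that opens its first argument and prints page.content to stdout:

// Hypothetical sketch: capture the fully rendered HTML via PhantomJS.
// Assumes phantomjs is on the PATH and save_page.js prints page.content.
function get_rendered_html($url) {
    return shell_exec('phantomjs save_page.js ' . escapeshellarg($url));
}

$html = get_rendered_html('http://example.com');   // example URL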
Because the page uses JavaScript to load the content, you are not going to be able to do this via cURL. Check out this page for more information on the problem: http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html