I am trying to crawl a page, but only a loading GIF is retrieved, not the page content.
$url = "https://www.truecaller.com";
$request = $url;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$request);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$data = curl_exec($ch);
print_r($data);
curl_close($ch);
Is there any way to retrieve the full page?
There is a reason for that.
cURL is not a browser, so it has no ability to run JavaScript.
cURL also does not care what the response is: it returns whatever the link you give it serves. If it gets a GIF it returns the GIF; if it is a document, a video, or anything else, it returns that instead.
So what is happening here is that cURL takes the response it gets the moment it hits the page. The loading GIF is what the page serves first, and the remaining content is then loaded based on a JavaScript condition. Since cURL cannot execute that JavaScript, the only response you get is the loading GIF.
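You can check exactly what cURL received by asking it; a small sketch using the URL from the question:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.truecaller.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
// report what the server actually sent back for this request
echo curl_getinfo($ch, CURLINFO_CONTENT_TYPE) . "\n";
echo curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
curl_close($ch);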
If you want to load the full page content, there is a full WebKit browser without an interface that lets programmers get the same results a real browser gets: PhantomJS, the scriptable headless browser.
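As a rough sketch of that route (assuming the phantomjs binary is installed and on the PATH, and using /tmp/render.js as a throwaway helper script name), PhantomJS can be driven from PHP like this:
<?php
// write a small PhantomJS script that prints the fully rendered DOM
$script = <<<'JS'
var page = require('webpage').create();
page.open('https://www.truecaller.com', function (status) {
    if (status === 'success') {
        console.log(page.content); // the DOM after JavaScript has run
    }
    phantom.exit();
});
JS;
file_put_contents('/tmp/render.js', $script);
// hand the rendering off to the headless browser and collect its output
$html = shell_exec('phantomjs /tmp/render.js');
print_r($html);
?>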
I see you have already tried adding a delay to your cURL call, but the fact is that cURL is not the right tool for this job. I would investigate http://phantomjs.org/, which will allow you to capture the page more robustly.
As hassan added below, this site has an API, so that is also an option. Thanks, hassan.
I want to get page content using cURL, but when I get it, the page's CSS styles are not loaded properly, and I don't know why.
error_reporting(E_ALL);
ini_set('display_errors', 1);
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_URL,"http://some-page.com/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$result=curl_exec ($ch);
curl_close ($ch);
echo $result;
The second thing: can I use cURL to get a fully working page, the way it works when the page is embedded in an iframe? Right now all the links point at the wrong place, so when I try to go to a subpage it does not work.
The CSS (and probably some JavaScript as well) doesn't load because it is referenced with absolute or relative paths that have no meaning on your domain.
You're going to need to find all these links in the cURL response and replace them with valid URLs pointing at the original domain (and probably rewrite regular HTML links and other assets too); see the sketch below for a lighter-weight variant of the same idea.
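A minimal sketch of that lighter-weight variant: instead of rewriting every link individually, inject a <base> tag so the browser resolves all relative paths against the original site. It assumes the fetched HTML contains a plain <head> tag with no attributes.
$origin = "http://some-page.com/";
// insert a <base> tag right after <head> so relative URLs (CSS, JS, images)
// resolve against the original domain; naive, assumes a bare <head> tag
$result = str_ireplace('<head>', '<head><base href="' . $origin . '">', $result);
echo $result;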
Your best bet is probably to configure Apache to act as a reverse proxy so it'll do all this for you. See ProxyPass.
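If you go the reverse-proxy route, the Apache side might look like this (the /mirror/ prefix is hypothetical, and mod_proxy plus mod_proxy_http must be enabled):
ProxyPass        "/mirror/" "http://some-page.com/"
ProxyPassReverse "/mirror/" "http://some-page.com/"
Note that ProxyPassReverse only fixes redirect headers such as Location; rewriting URLs inside the HTML itself still needs something like mod_proxy_html or the string replacement sketched above.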
I am using a PHP script (currentimage.php) to get a snapshot from my CCTV IP camera, which works fine:
<?php
while (@ob_end_clean()); // discard any open output buffers
header('Content-type: image/jpeg');
// create curl resource
$ch = curl_init();
$useragent = 'Mozilla/5.0'; // assumption: $useragent is defined elsewhere in the full script
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, 'http://192.168.0.20/Streaming/channels/1/picture');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // so curl_exec() returns the image instead of printing it
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');
// $output contains the image data
$output = curl_exec($ch);
echo $output;
// close curl resource to free up system resources
curl_close($ch);
?>
and another PHP/HTML page to display the image:
<img id="myImage" src="currentimage.php">
but I can't get it to refresh the image in the background every 30 seconds.
I am trying to do it with JavaScript, but with no success:
<script>
function refresh_image() {
    document.getElementById("myImage").setAttribute("src", "currentimage.php");
}
setInterval(function(){ refresh_image(); }, 30000);
</script>
What am I doing wrong? I would appreciate any help.
I have tried this concept of changing the displayed image via timed requests to a PHP script, and it works fine. A new request is fired at the image-fetching script whenever the URL changes, which I achieved by appending a version number as a query parameter to the URL.
var version = 1;
function refresh_image() {
    // the changing query string makes the browser treat each request as a new URL
    document.getElementById("myImage").setAttribute("src", "currentimage.php?ver=" + version);
    version++;
}
setInterval(function(){
    refresh_image();
}, 2000);
I suspect the main problem here is that the browser caches the image file and will not reload it if you set the attribute to the same URL again. I've found a way around this, though:
document.getElementById("myImage").setAttribute("src", "currentimage.php?version=1");
Save the version number and increment it every time you make the request. This way the browser treats each request as a new URL and reloads the image (you can verify this in the network tab of your browser's developer tools).
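A complementary fix, offered here only as a sketch, is to have currentimage.php itself forbid caching by sending anti-caching headers before the image data, so even an unchanged URL is re-fetched:
<?php
// sketch: send anti-caching headers before the JPEG so the browser
// re-fetches the image even if the URL stays the same
header('Content-Type: image/jpeg');
header('Cache-Control: no-store, no-cache, must-revalidate, max-age=0');
header('Pragma: no-cache');
// ... then fetch and echo the camera snapshot exactly as before ...
?>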
I am visiting a site that already provides the following functionality: when a user's browser visits this URL, it automatically prompts a window to download the attached file (a video). In my script I need to download this attached file from the site to disk using PHP. I tried to use cURL:
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
I also tried file_get_contents similarly, but it doesn't work for me either; it only gets me the HTML of that site. Do you have any idea how to save this file to disk using PHP?
One standard way that developers do this in PHP is to use the header function. After calling header, the page can output the video contents, which will prompt the user to download the file.
See this example: PHP header attach AVI-file
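For illustration, a minimal sketch of that header approach ($videoPath and the filename are hypothetical; the real example is behind the link above):
<?php
// stream a local video file to the visitor; the Content-Disposition
// header is what triggers the browser's download prompt
$videoPath = '/path/to/video.avi'; // hypothetical local path
header('Content-Type: video/x-msvideo');
header('Content-Disposition: attachment; filename="video.avi"');
header('Content-Length: ' . filesize($videoPath));
readfile($videoPath);
exit;
?>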
I am using cURL to access Instagram's API on a webpage I am building. The functionality works great; however, page load time suffers. For instance, consider this DOM structure:
Header
Article
Instagram Photos (retrieved via cURL)
Footer
When the page loads, the footer will not render until the Instagram photos have been fully loaded via cURL. Below is the cURL function being called:
function fetchData($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
$result = fetchData("https://api.instagram.com/v1/media/search?lat={$lat}&lng={$lng}&distance={$distance}&access_token={$accessToken}");
$result = json_decode($result);
So the rest of the DOM is displayed only after this function has run. If I move the function call below the footer, it does not work.
Is there anything I can do to load the entire webpage first and have the cURL request sent on top of the loaded site (so it does not cause a lag or hold-up)?
UPDATE: Is the best solution to load it after the footer, and then append it to another area with js?
You can cache the resulting JSON in a locally saved file and set up a cronjob that runs every minute to refresh that cache file. This makes your page load much faster. The downside is that your cache is updated even when you have no visitors, and the Instagram data can be up to a minute stale.
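A rough sketch of that setup (update_cache.php and the cache path are hypothetical names; fetchData() and the query variables are the ones from the question):
<?php
// update_cache.php -- run from cron, e.g. once a minute:
// * * * * * php /path/to/update_cache.php
$result = fetchData("https://api.instagram.com/v1/media/search?lat={$lat}&lng={$lng}&distance={$distance}&access_token={$accessToken}");
file_put_contents('/path/to/cache/instagram.json', $result);
?>
The page then reads the cached file instead of hitting the API on every request:
$result = json_decode(file_get_contents('/path/to/cache/instagram.json'));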
I am trying to program a web bot using PHP/cURL, but I am facing a problem handling a specific page that loads some of its content dynamically. To explain more:
When I try to download the page using PHP/cURL, some of the content is missing. I then discovered that this content is loaded after the page itself has loaded, which is why cURL does not capture it.
Can anyone help me?
My sample code is:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $reffer);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $redirect);
curl_setopt($ch, CURLOPT_COOKIEFILE, ABSOLUTE_PATH."Cookies/cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, ABSOLUTE_PATH."Cookies/cookies.txt");
$result = curl_exec($ch);
What URL are you trying to load? It could be that the page you're requesting fires one or more AJAX requests that load content in after the fact. cURL cannot accommodate information loaded at runtime via AJAX or other XHR requests, though there is one workaround sketched below.
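If you can find the underlying XHR endpoint in your browser's developer tools (network tab), you can often call it with cURL directly. A sketch, where /ajax/content is a purely hypothetical endpoint:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/ajax/content'); // hypothetical endpoint spotted in dev tools
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// many endpoints use this header to distinguish XHR calls from normal page loads
curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-Requested-With: XMLHttpRequest'));
$fragment = curl_exec($ch);
curl_close($ch);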
You might want to look at something like PhantomJS, a headless WebKit browser that will execute the page fully and return the dynamically assembled DOM.
Because the page uses JavaScript to load the content, you are not going to be able to do this via cURL. Check out this page for more information on the problem: http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html