cURL grab HTML content - PHP

I have a webpage which requires a login.
I am using cURL to build the HTTP authentication request. It works, but I am not able to grab all the content from the page: all the images are missing.
How can I grab the images as well?
<?php
// Create cURL resource
$URL = "http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis_CC";
// Init cURL
$ch = curl_init();
// Set destination URL and HTTP authentication options
curl_setopt($ch, CURLOPT_URL, $URL);                // Load in the destination URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // Plain HTTP request, not SSL
curl_setopt($ch, CURLOPT_USERPWD, "guest:test");    // Pass the user name and password
// Grab the URL and pass it to the browser
$content = curl_exec($ch);
$result = curl_getinfo($ch); // returns an array, so use print_r() rather than echo
// Close cURL resource and free up system resources
curl_close($ch);
echo $content;
print_r($result);
?>
I'm getting this warning message:
Warning: curl_error(): 2 is not a valid cURL handle resource in C:\xampp\htdocs\LiveServices\LoginTest.php on line 24

cURL doesn't get images or any other 'content'; it just gets the raw HTML page. Are you saying you are missing <img /> tags that are present on the original page?
cURL also doesn't parse any CSS or JavaScript, so if the content is modified by those, it won't come through. For example, you may be unable to get an element's background-image unless you do more scraping, that is, fetch the associated CSS file and parse that.

The main issue I have is that I cannot see the HTML, so I cannot be sure what the problem is. Having said that, two things occur to me.
The first thing to check is whether the image paths are relative. If they appear in the form ../xyz/foo.jpg or foo.jpg, you will either need to rewrite each image's src to the full URL or add a <base> tag to the HTML.
For parsing the HTML, use the Simple HTML DOM library, as it is faster than rolling your own parser.
The second possibility is that the images also require the user to be logged in. If that is the case, you would have to download all the images yourself, and either embed them in the content after base64-encoding them or store them temporarily on your server.
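Both suggestions can be sketched in a few lines. This is a rough sketch, not a tested solution for this Nagios setup: the Nagvis URL and the guest:test credentials are taken from the question, and the base-tag regex assumes an ordinary <head> element.

```php
<?php
// Fix 1: inject a <base> tag right after <head> so every relative
// src/href resolves against the original server instead of yours.
function add_base_tag(string $html, string $baseUrl): string
{
    return preg_replace('/<head([^>]*)>/i',
        '<head$1><base href="' . htmlspecialchars($baseUrl) . '">', $html, 1);
}

// Fix 2: if the images sit behind the same login, fetch each one with
// the same credentials and embed it as a base64 data URI.
function fetch_as_data_uri(string $url, string $userPwd): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($ch, CURLOPT_USERPWD, $userPwd);
    $data = curl_exec($ch);
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE) ?: 'image/png';
    curl_close($ch);
    return $data === false ? null
        : 'data:' . $type . ';base64,' . base64_encode($data);
}

// e.g. echo add_base_tag($content, 'http://10.123.22.38/nagios/nagvis/nagvis/');
```

The base-tag route is far cheaper since the browser then fetches the images itself; the data-URI route is only needed when the images require the same authentication.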

Here is some of the HTML. The images that I want to get:
<img id="backgroundImage" style="z-index: 0;" src="/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png"/>
<a href="/nagios/cgi-bin/extinfo.cgi?type=2&host=business_processes&service=NLThirdPartyLive" target="_self">
And a lot of JavaScript.
I tried to use the Simple HTML DOM library, but the output is just 'Array' - nothing useful.
require("/simplehtmldom/simple_html_dom.php");
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'WhateverBrowser1.45');
curl_setopt($ch, CURLOPT_URL, 'http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis_CC');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // Plain HTTP request, not SSL
curl_setopt($ch, CURLOPT_USERPWD, "guest:test");    // Pass the user name and password
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$result = curl_exec($ch);
$html = str_get_html($result);
// Note: find() returns an array of matching elements, which is why echoing
// it directly prints "Array"; ask for a single element instead:
$ret = $html->find('table[class=header_table]', 0);
echo $ret;
echo $result;

Related

php curl not retrieving as expected

I have the following code to capture the HTML of a given URL:
$url = "https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=77212&cvm=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_CAINFO, '/etc/ssl/certs/cacert.pem');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0); // note: disables certificate verification, so CURLOPT_CAINFO above has no effect
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
echo "$url\n\n";
die($html);
For some reason the result of the following URL is not as expected:
"https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=77212&cvm=true"
Instead of the code, the result is a giant, meaningless string.
I have successfully used the same code with other pages of the same domain.
I can assure you that the desired page's content is not loaded by any JS/AJAX method (I tested this by loading the page with JavaScript disabled).
My question is: is there any cURL option that I should set to correct this error?
My whole site depends on capturing these pages.
Any help would be truly appreciated.
That response is base64-encoded; all you need to do is decode it back to plain text, like this:
echo base64_decode($html);
and you will see the HTML.
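A small self-contained version of that fix. It assumes only what the answer states, that this endpoint returns its payload base64-encoded; strict-mode decoding falls back to the raw body for pages of the same domain that are not encoded:

```php
<?php
// Decode the body if it is valid base64, otherwise return it unchanged.
// The strict flag makes base64_decode() return false on non-base64 input
// instead of silently skipping invalid characters.
function decode_if_base64(string $body): string
{
    $decoded = base64_decode($body, true);
    return $decoded !== false ? $decoded : $body;
}

// $html = curl_exec($ch);         // as in the question's code
// echo decode_if_base64($html);   // prints the decoded HTML
```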

How to properly display MJPEG snapshot from CCTV in PHP

I have an IP CCTV camera which supports MJPEG. I would like to display the image in HTML (from a PHP script) together with other data.
I know that I can get a snapshot from the camera using:
http://username:password@<servername>/Streaming/channels/1/picture
However, when I use this simple HTML code, the image is not always displayed:
<html>
<body>
<IMG id="myImage" SRC='http://username:password@192.168.0.20/Streaming/channels/1/picture'>
</body>
</html>
On Microsoft Edge, it is not even possible to log in to the camera.
On Firefox it works fine.
On Chrome the login succeeds, but the image is not displayed.
So the question is how I can get the image in HTML, or better in PHP, to display it on the page. I prefer PHP because I want to add more data to the page, like temperature etc.
It would also be nice to refresh the image, but that can be done with AJAX later.
Use a second PHP script to retrieve the image and point the <img> tag at that. The failure is possibly because the page you're displaying the image on uses HTTPS while the camera only supports HTTP. Proxying through PHP makes everything appear to come from the same source, and PHP will handle the HTTP authentication, which avoids the browser compatibility problems.
<IMG id="myImage" SRC='currentimage.php'>
currentimage.php:
<?php
$crl = curl_init('http://<servername>/Streaming/channels/1/picture');
curl_setopt($crl, CURLOPT_HEADER, 0); // headers must not be mixed into the JPEG data
curl_setopt($crl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($crl, CURLOPT_USERPWD, 'username:password');
curl_setopt($crl, CURLOPT_RETURNTRANSFER, TRUE);
$contents = curl_exec($crl);
curl_close($crl);
header("Content-Type: image/jpeg");
echo $contents;
?>
To save on server load and allow it to handle more traffic, you could also cache the last-retrieved image to a file for 5-10 seconds, but that's a separate problem to solve.
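That caching idea can be sketched roughly like this; the cache path and the expiry default are arbitrary choices, and the camera URL and credentials are the placeholders from the question:

```php
<?php
// Serve the cached snapshot while it is younger than $maxAge seconds,
// otherwise re-fetch from the camera with HTTP basic auth.
function cached_snapshot(string $url, string $userPwd,
                         string $cacheFile, int $maxAge = 10): string
{
    // Cache hit: return the stored image without touching the camera.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile);
    }
    // Cache miss: fetch a fresh snapshot.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($ch, CURLOPT_USERPWD, $userPwd);
    $img = curl_exec($ch);
    curl_close($ch);
    if ($img !== false && $img !== '') {
        file_put_contents($cacheFile, $img, LOCK_EX);
        return $img;
    }
    // On failure, fall back to a stale cached copy if one exists.
    return is_file($cacheFile) ? file_get_contents($cacheFile) : '';
}

// Usage in currentimage.php:
// header('Content-Type: image/jpeg');
// echo cached_snapshot('http://<servername>/Streaming/channels/1/picture',
//                      'username:password', sys_get_temp_dir() . '/cctv_last.jpg');
```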
This code seems to work fine for me:
<?php
while (@ob_end_clean()); // discard any open output buffers so only the image bytes are sent
header('Content-type: image/jpeg');
// Create cURL resource
$useragent = 'Mozilla/5.0'; // any browser-like user agent string
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, 'http://<servername>/Streaming/channels/1/picture');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the image instead of printing it immediately
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');
// $output contains the output string
$output = curl_exec($ch);
echo $output;
// Close cURL resource to free up system resources
curl_close($ch);
?>

get (dynamic loading page) contents using PHP/CURL?

I am trying to program a web bot using PHP/cURL, but I am facing a problem handling a specific page that loads some of its contents dynamically. To explain more: when I download the page using PHP/cURL, some contents are missing. I discovered that these contents are loaded after the page itself has loaded, which is why cURL does not capture them.
Can anyone help me?
My sample code is:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $reffer);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $redirect);
curl_setopt($ch, CURLOPT_COOKIEFILE, ABSOLUTE_PATH."Cookies/cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, ABSOLUTE_PATH."Cookies/cookies.txt");
$result = curl_exec($ch);
What URL are you trying to load? It could be that the page you're requesting fires one or more AJAX requests that load content in after the fact. I don't think cURL can accommodate runtime-loaded information via AJAX or other XHR requests.
You might want to look at something like PhantomJS, which is a headless WebKit browser that will execute the page fully and return the dynamically assembled DOM.
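Before reaching for a headless browser, it is sometimes enough to call the XHR endpoint itself with cURL and skip the embedding page entirely. A sketch, where the endpoint URL and its JSON response shape are hypothetical; find the real one in your browser's network inspector:

```php
<?php
// Fetch a dynamically-loaded fragment by calling its AJAX endpoint
// directly instead of the page that embeds it.
function fetch_xhr(string $endpoint): ?array
{
    $ch = curl_init($endpoint);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Some endpoints only answer requests that look like XHR:
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['X-Requested-With: XMLHttpRequest']);
    $body = curl_exec($ch);
    curl_close($ch);
    // Many such endpoints return JSON rather than HTML.
    return $body === false ? null : json_decode($body, true);
}

// $data = fetch_xhr('http://example.com/ajax/items?page=2'); // hypothetical URL
```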
Because the page uses javascript to load the content, you are not going to be able to do this via cURL. Check out this page for more information on the problem: http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html

When I get a page with a cURL request, how to navigate that page if paths are relative?

This is probably an easy question but I can't find the answer... I have a PHP script named 'send.php' which makes a cURL request to open an external web page. It outputs the external page to the browser. All completely by the books.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_exec($ch);
All it does is posts some POST data to a processing script on an external site and then displays on the browser whatever that external script would display normally; ie, a confirmation message, thank you, etc.
Problem is: my 'send.php' is still the URL that appears in the navigation bar. So if I click around on that page and the links use relative paths, the browser appends those relative paths to my current path, which of course leads to a 404. Additionally, if there are more form fields on the page and the action path is an empty string, it will try to post those submissions back to send.php on my server, which then generates errors.
How can I make it so it will still send the post data and output the result of the processing script but still allow the user to navigate the output page as they normally would? Or if it's a multi-page form, they can continue filling out page 2 as if they were just on that site?
Thanks in advance
Update: Solved by adding these lines to the above code:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
$url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
// Inject a <base> tag so the browser resolves relative links and form
// actions against the external site (assumes a plain <head> tag).
$response = str_ireplace('<head>', "<head><base href=\"$url\" />", $response);
echo $response;
You can get the URL that cURL resolved to (if you're using FOLLOWLOCATION) with curl_getinfo and CURLINFO_EFFECTIVE_URL. You can prepend this URL to all relative paths. As for how to tell whether a path is relative: if it starts with a '/', it's absolute with respect to the domain, which makes it "relative" only to the domain; if it starts with a scheme, it's fully absolute and may even lead to a different domain.
As to how to actually find the URLs, you could use DOMDocument::loadHTML and DOMXPath to find all anchor tags (or links, if you like). Think about how much money Google engineers get paid for site scraping and URL following; this is probably not the simplest thing in the world to do optimally.
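A sketch of that DOMDocument/DOMXPath approach. Note that resolve_url() here is a deliberately simplified resolver (no ../ handling), not a full RFC 3986 implementation:

```php
<?php
// Resolve a possibly-relative path against a base URL (simplified).
function resolve_url(string $base, string $path): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $path)) {
        return $path;                       // already absolute
    }
    $p = parse_url($base);
    $root = $p['scheme'] . '://' . $p['host']
          . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($path !== '' && $path[0] === '/') {
        return $root . $path;               // domain-relative
    }
    $dir = rtrim(dirname($p['path'] ?? '/'), '/');
    return $root . $dir . '/' . $path;      // path-relative (no ../ handling)
}

// Rewrite every <a href> and <img src> to an absolute URL.
function absolutize_links(string $html, string $effectiveUrl): string
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);                 // silence warnings on messy HTML
    $xp = new DOMXPath($doc);
    foreach ($xp->query('//a[@href] | //img[@src]') as $node) {
        $attr = $node->nodeName === 'a' ? 'href' : 'src';
        $node->setAttribute($attr, resolve_url($effectiveUrl, $node->getAttribute($attr)));
    }
    return $doc->saveHTML();
}
```

In send.php you would then pass the CURLINFO_EFFECTIVE_URL value as $effectiveUrl before echoing the response.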
I may be reaching here, but something like this might work:
$url = "http://example.com";
$url2 = "http://www.example.com";
$url3 = "https://example.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$text = curl_exec($ch);
curl_close($ch);
// Strip any absolute form of the site's own URL from the links,
// then prepend the base URL so every href becomes absolute.
$text = str_replace("href=\"$url", "href=\"", $text);
$text = str_replace("href=\"$url2", "href=\"", $text);
$text = str_replace("href=\"$url3", "href=\"", $text);
$text = str_replace("href=\"", "href=\"$url", $text);
echo $text;
Is there some reason you can't just have your page post directly to the other server from the client side? Why use cURL at all when you just want to redirect your user to the other page?
<form action="https://other.server.com/url" method="post">
<!-- if the data has been previously collected and isn't being entered right now by the user... -->
<?php foreach ($postdata as $key => $val) { ?>
<input type="hidden" name="<?= $key; ?>" value="<?= $val; ?>">
<?php } ?>
</form>

Download file attached to header with Curl and Php

I'm connecting to a website daily to collect some statistics; the website runs .NET, to make things extra difficult. What I would like to do is mechanize this process.
I go to http://www.thesite.com:8080/statistics/Login.aspx?ReturnUrl=%2Fstatistics%2Fdataexport.ashx%3FReport%3D99 (the return URL is /statistics/dataexport.ashx?Report=99, decoded).
Login.aspx displays a form in which I enter my user/pass, and when the form is submitted, dataexport.ashx starts to download the file directly. The file delivered is always named statistics.csv.
I have experimented with this for a few days now. Are there any resources, or does anyone have some kind of hint about what I should try next?
Here is some of my code.
<?php
// INIT CURL
$ch = curl_init();
// SET URL FOR THE POST FORM LOGIN
curl_setopt($ch, CURLOPT_URL, $url);
// ENABLE HTTP POST
curl_setopt ($ch, CURLOPT_POST, 1);
// SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
$viewstate = urlencode('/wEPDwUKM123123daE2MGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFGG1fTG9naW4kTG9naW5JbWFnZUJ1dHASdasdRvbij2MVoasdasdYibEXm/eSdad4hS');
$eventval = urlencode('/wEWBAKMasd123LKJJKfdAvD8gd8KAoCt878OED00uk0pShTQHkXmZszVXtBJtVc=');
curl_setopt ($ch, CURLOPT_POSTFIELDS, "__VIEWSTATE=$viewstate&__EVENTVALIDATION=$eventval&UserName=myuser&Password=mypassword");
// IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
# Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
# not to print out the results of its query.
# Instead, it will return the results as a string return value
# from curl_exec() instead of the usual true/false.
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
// FOLLOW REDIRECTS AND READ THE HEADER
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, true);
// EXECUTE REQUEST (FORM LOGIN)
$store = curl_exec ($ch);
// print the result
print_r($store);
// CLOSE CURL
curl_close ($ch);
?>
Thanks
Trikks
You also need to set CURLOPT_COOKIEFILE to send the cookies along with the next request. Another thing, if I remember correctly, is that ASPX sets a unique value for variables like __VIEWSTATE on every request, so you cannot reuse a captured one. See if these two pointers help.
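Combining the two pointers gives a flow like the sketch below: request the login page first to read the current __VIEWSTATE and __EVENTVALIDATION values, then POST them back with the credentials, sharing one cookie jar across both requests (CURLOPT_COOKIEJAR writes it, CURLOPT_COOKIEFILE sends it back). The URLs and field names follow the question; the extraction regex is a simplification and may need adjusting to the actual form markup:

```php
<?php
// Extract a hidden ASP.NET field's value from the login form HTML.
function hidden_field(string $html, string $name): string
{
    return preg_match('/id="' . preg_quote($name, '/') . '" value="([^"]*)"/', $html, $m)
        ? $m[1] : '';
}

function download_statistics(string $loginUrl, string $user, string $pass): string
{
    $jar = sys_get_temp_dir() . '/aspx_cookies.txt';

    // Step 1: GET the login form to obtain fresh per-request tokens.
    $ch = curl_init($loginUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar);
    $form = curl_exec($ch);

    // Step 2: POST the tokens plus credentials; following the redirect
    // lands on dataexport.ashx, which returns statistics.csv.
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
        '__VIEWSTATE'       => hidden_field($form, '__VIEWSTATE'),
        '__EVENTVALIDATION' => hidden_field($form, '__EVENTVALIDATION'),
        'UserName'          => $user,
        'Password'          => $pass,
    ]));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $csv = curl_exec($ch);
    curl_close($ch);
    return $csv;
}

// $csv = download_statistics(
//     'http://www.thesite.com:8080/statistics/Login.aspx?ReturnUrl=%2Fstatistics%2Fdataexport.ashx%3FReport%3D99',
//     'myuser', 'mypassword');
```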
