Bypassing loading screen when getting HTML content with curl

We are using curl to get a response from a third-party webserver. Here is the code snippet:
$url = "https://book.some-site.com/cgi-bin/booking-form.cgi";
$uagent = "Opera/9.80 (Windows NT 6.1; WOW64) Presto/2.12.388 Version/12.14";
$ch = curl_init( $url );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_setopt($ch, CURLOPT_USERAGENT, $uagent);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
Everything works fine until we hit a loading screen on one of the pages. We get the following response from the webserver: "...We are processing your request...Your search results will display shortly." This is a loading/waiting screen; after that we get nothing.
When working in a browser, the actual response is displayed after the loading screen.
Any ideas how to get the actual response and bypass the loading screen?
Thanks in advance.

Usually, when a website has a loading screen, then shows the results without redirecting you to a new page, it means they loaded the results via Ajax. So the HTML page loads with nothing but a "hey, it's loading" message, and then some JavaScript runs that downloads the actual content from a different page. You'll need to investigate their JS code and then load the page that they load via Ajax.
You might look into enabling "logging XMLHttpRequests" in your web browser's developer tools to make it easier to figure out what page they're loading via Ajax.
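As a rough starting point for that investigation, the sketch below lists URL-like string literals found in the inline scripts of the loading page. The function name `extract_script_urls` is hypothetical and the regular expressions are only a heuristic: scripts loaded from external files, or URLs assembled at runtime, will not show up, so the browser's network log remains the more reliable source.

```php
<?php
// Hypothetical helper: list URL-like string literals found inside inline
// <script> blocks, as candidates for the Ajax endpoint the page calls.
function extract_script_urls(string $html): array
{
    $urls = [];
    // Capture the body of every inline <script> element.
    if (!preg_match_all('#<script\b[^>]*>(.*?)</script>#is', $html, $scripts)) {
        return $urls;
    }
    foreach ($scripts[1] as $js) {
        // Quoted literals that start with "/" or "http(s)://".
        if (preg_match_all('#(["\'])((?:https?://|/)[^"\'\s]+)\1#i', $js, $m)) {
            foreach ($m[2] as $candidate) {
                $urls[] = $candidate;
            }
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($urls));
}
```

Once a candidate endpoint is identified, it can be requested with the same cURL options as the loading page, ideally with a shared cookie file so the server sees the same session.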

Related

PHP: Fetch remote page AFTER JS events have loaded

I'm using a simple method of loading a remote webpage that works fine mostly:
$output = file_get_contents($item['URL']);
$html->loadHTML($output);
After which I can search for tags by type or name or ID, but the problem is that the main content I want is generated after the fact by JS in the last second. When loading in a browser, you don't notice it, but when trying to get it with file_get_contents, I get the page as it exists before the last minute JS runs.
Here's part of the code that loads what I want so you can see what I mean, but it's pretty straightforward: the page I get isn't the "complete" page.
<script type="text/javascript">ImageMachine.prototype.ImageMachine_Generate_Thumbnail = function (thumbnail_image, main_image, closeup_image, type_code) {
var thumbnail,
img;
I tried using CURL too, but no luck.
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Connection: keep-alive";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $item['URL']);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header_str);
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_USERAGENT, $this_time);
$output = curl_exec($ch);
curl_close($ch);
#$html->loadHTML($output);
Is there a way to get the whole thing? I want the same page a browser or user would see if they load the page.

How to use PHP CURL to bypass cross domain

I need PHP to submit parameters from one domain to another. JavaScript is not an option in my situation. I'm now trying to use CURL with PHP, but have not been successful in getting around the cross-domain restriction.
From domain_A, I have a page with the following PHP with CURL script:
if (_iscurl()){
echo "<p>CURL is enabled</p>";
$url = "http://domain_B/process.php?id=123&amt=100&jsonp=?";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT,10);
curl_setopt($ch, CURLOPT_USERAGENT , "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)");
curl_setopt($ch, CURLOPT_URL, $url );
$return = curl_exec($ch);
curl_close($ch);
echo "<p>Finished operations</p>";
}
else{
echo "CURL is disabled";
}
?>
I am not getting any results, so I am assuming that the PHP CURL script is not successful. Any ideas to fix this?
Thanks

Well, it's a bit late, but I'm adding this answer for future readers who might face a similar issue. This problem sometimes arises when sending a PHP cURL request from a domain hosted over HTTP to a domain hosted over HTTPS (HTTP over SSL).
Just add the code snippet below before executing the cURL request.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);

Setting CURLOPT_RETURNTRANSFER to false makes curl_exec() print the response directly instead of returning it, so $return only holds true or false. Set it to true (or 1):
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
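Putting the advice above together, a minimal sketch of the request could look like this. The helper names `build_request_url` and `fetch_body` are made up for illustration, and the `jsonp=?` parameter from the question is left out on the assumption that it is a JavaScript-side placeholder the server does not need.

```php
<?php
// Hypothetical helper: build the target URL from a base and a parameter array.
function build_request_url(string $base, array $params): string
{
    // http_build_query() percent-encodes each value and joins pairs with "&".
    return $base . '?' . http_build_query($params, '', '&');
}

// Hypothetical helper: fetch a URL and return the body, or throw on failure.
function fetch_body(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    $body = curl_exec($ch);
    if ($body === false) {
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $message);
    }
    curl_close($ch);
    return $body;
}
```

A call such as `fetch_body(build_request_url('http://domain_B/process.php', ['id' => 123, 'amt' => 100]))` then either returns the response text or raises an exception carrying the cURL error message, which avoids the silent "no results" outcome described in the question.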

How to get page content using cURL?

I would like to scrape the content of this Google search result page using curl.
I've been trying setting different user agents, and setting other options but I just can't seem to get the content of that page, as I often get redirected or I get a "page moved" error.
I believe it has something to do with the fact that the query string gets encoded somewhere but I'm really not sure how to get around that.
//$url is the same as the link above
$ch = curl_init();
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, 0);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
curl_setopt ($ch,CURLOPT_TIMEOUT,120);
curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
echo curl_exec ($ch);
What do I need to do to get my php code to show the exact content of the page as I would see it on my browser? What am I missing? Can anyone point me to the right direction?
I've seen similar questions on SO, but none with an answer that could help me.
EDIT:
I tried to just open the link using the Selenium WebDriver, that gives the same results as cURL. I am still thinking that this has to do with the fact that there are special characters in the query string which are getting messed up somewhere in the process.

This is how:
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page( $url )
{
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET", //set request type post or get
CURLOPT_POST =>false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_COOKIEFILE =>"cookie.txt", //set cookie file
CURLOPT_COOKIEJAR =>"cookie.txt", //set cookie jar
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
Example
// Read a web page and check for errors:
$result = get_web_page( $url );
if ( $result['errno'] != 0 ) {
    // error: bad url, timeout, redirect loop ...
}
if ( $result['http_code'] != 200 ) {
    // error: no page, no permissions, no service ...
}
$page = $result['content'];

For a realistic approach that emulates human behavior as closely as possible, you may want to add a referer to your curl options. You may also want to set CURLOPT_FOLLOWLOCATION. Trust me, whoever said that cURLING Google results is impossible, is a complete dolt and should throw his/her computer against the wall in hopes of never returning to the internetz again.
Everything that you can do "IRL" with your own browser can all be emulated using PHP cURL or libCURL in Python. You just need to do more cURLS to get buff. Then you will see what I mean. :)
$url = "http://www.google.com/search?q=".urlencode($strSearch)."&hl=en&start=0&sa=N";
$ch = curl_init();
curl_setopt($ch, CURLOPT_REFERER, 'http://www.example.com/1');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
// Only the search term is URL-encoded; encoding the whole URL would break it
curl_setopt($ch, CURLOPT_URL, $url);
$response = curl_exec($ch);
curl_close($ch);

Try this:
$url = "http://www.google.com/search?q=".urlencode($strSearch)."&hl=en&start=0&sa=N";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
// Only the search term is URL-encoded; encoding the whole URL would break it
curl_setopt($ch, CURLOPT_URL, $url);
$response = curl_exec($ch);
curl_close($ch);
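When building a search URL like this, only the query values should be percent-encoded. Running the entire URL through urlencode() also escapes the ":" and "/" characters (producing "http%3A%2F%2F..."), which cURL can no longer treat as a URL. A minimal sketch using http_build_query(), with the hypothetical helper name `build_search_url` and the same parameters as the snippet above:

```php
<?php
// Hypothetical helper: build a Google search URL with a safely encoded query term.
function build_search_url(string $term): string
{
    $params = [
        'q'     => $term, // encoded by http_build_query(), spaces become "+"
        'hl'    => 'en',
        'start' => 0,
        'sa'    => 'N',
    ];
    return 'http://www.google.com/search?' . http_build_query($params, '', '&');
}
```

The returned string can be passed to CURLOPT_URL unchanged.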

I suppose you have noticed that your link is actually an HTTPS link.
It seems that your cURL options do not include any kind of SSL handling; maybe this is your problem.
Why don't you try a non-HTTPS link to see what happens (e.g. Google Custom Search Engine)?

Get content with cURL in PHP
Your server must support the cURL functions; enable the extension in your server configuration (the original setup used httpd.conf in the Apache folder).
function UrlOpener($url)
{
    global $output;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $output = curl_exec($ch);
    curl_close($ch);
    echo $output;
}
To get content from the Google cache with cURL, you can use this URL: http://webcache.googleusercontent.com/search?q=cache:Put your url
Sample: http://urlopener.mixaz.net/

PHP CURL - Problems storing and using cookies when scraping

I've been trying to write a script that retrieves Google Trends results for a given keyword. Please note I'm not trying to do anything malicious; I just want to be able to automate this process and run it a few times every day.
After investigating the Google trends page I discovered that the information is available using the following URL:
http://www.google.com/trends/trendsReport?hl=en-GB&q=keyword&cmpt=q&content=1
You can request that information multiple times with no issues from a browser, but if you try in "privacy mode", after 4 or 5 requests the following is displayed:
An error has been detected You have reached your quota limit. Please
try again later.
This makes me think that cookies are required. So I have written my script as follows:
$cookiefile = $siteurl . '/wp-content/plugins/' . basename(dirname(__FILE__)) . '/cookies.txt';
$url = 'http://www.google.com/trends/trendsReport?hl=en-GB&q=keyword&cmpt=q&content=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$x='error';
while (trim($x) != '' ){
$html=curl_exec($ch);
$x=curl_error($ch);
}
echo "test cookiefile contents = ".file_get_contents($cookiefile)."<br />";
echo $html;
However I just can't get anything written to my cookies file. So I keep on getting the error message. Can anyone see where I'm going wrong with this?

I'm pretty sure your cookie file should exist before you can use it with curl.
Try:
$h = fopen($cookiefile, "x+");
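A slightly more defensive variant is sketched below, under two assumptions worth checking against the question's code. First, the cookie file has to be a filesystem path: $cookiefile there is built from $siteurl, which looks like a URL. Second, cURL writes the cookie jar when the handle is closed or released, so reading the file before that point can show it as empty. The helper name `ensure_cookie_file` is hypothetical.

```php
<?php
// Hypothetical helper: make sure the cookie jar exists and is writable,
// and return its absolute filesystem path.
function ensure_cookie_file(string $path): string
{
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0700, true)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }
    // touch() creates the file if it is missing and leaves existing content alone,
    // unlike fopen() with mode "x+", which fails when the file already exists.
    if (!touch($path) || !is_writable($path)) {
        throw new RuntimeException("Cookie file is not writable: $path");
    }
    return realpath($path);
}
```

For the plugin in the question, a call like `ensure_cookie_file(__DIR__ . '/cookies.txt')` would give a path suitable for both CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE.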

CURL login & submit another post

In order to learn PHP my boss asked me to do some sort of project. I've done so far a To Do List & Reminder (www.frontpagewebdesign.com/newfolder) but what I'm trying to do right now is sending SMS notifications.
Because I cannot afford to buy a SMS gateway for such a small project, I decided to use my account on this website: www.sms-gratuite.ro. My trouble is the automatic Login and SMS sending with CURL.
I followed a tutorial and this is what I've done so far:
<?php
$form_vars = array();
//array for SMS sending form values
$username = '****#****.com';
$password = '********';
$loginUrl = 'http://sms-gratuite.ro/page/autentificare';
$postUrl='http://sms-gratuite.ro/page/acasa';
$form_vars['to'] = "076xxxxxxx";
//my own phone number
$form_vars['mesaj'] = "test";
//SMS text
$encoded_form_vars = http_build_query($form_vars);
$user_agent="Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
//init curl
$ch = curl_init();
//Set the URL to work with
curl_setopt($ch, CURLOPT_URL, $loginUrl);
// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);
//Set the post parameters (mail and parola are the IDs of the form input fields)
curl_setopt($ch, CURLOPT_POSTFIELDS, 'mail='.$username.'&parola='.$password);
//Handle cookies for the login
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
//execute the request (the login)
$store = curl_exec($ch);
//check if the login was successful by finding a string on the resulting page
if(strpos($store, "Trimite mesaj")===FALSE)
echo "logged in";
else
echo "not logged";
//set the landing url
curl_setopt($ch, CURLOPT_URL, 'http://sms-gratuite.ro/page/autentificare');
$referer='';
curl_setopt($ch, CURLOPT_URL, $postUrl);
//curl_setopt($ch, CURLOPT_HTTPHEADER,array("Expect:"));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'to='.$form_vars['to'].'&mesaj='.$form_vars['mesaj']);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
$result = curl_exec($ch);
if(strpos($result, "Mesajul a fost trimis")===FALSE)
echo "<br>sms sent";
else
echo "<br>sms not sent";
curl_close($ch);
?>
I don't have any errors but it surely doesn't work. First of all the login fails. Is this form a particular one and the curl cannot handle it?

You could put the result of every curl request into a different file, say page1.html, page2.html, etc. This way you can open them in a browser and see exactly which page you got in return for each request.
You need to make exactly the same request as a browser would. There are browser add-ons like HttpFox (if you are using Firefox) that can show you all the fields that were sent, all cookies, and everything else related. You can compare those lists to what your curl request sends to find the missing pieces.
Try these steps and comment with any further errors you get, preferably in detail.
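The first suggestion can be sketched as a small helper that writes each response body to its own numbered file. The name `save_response` and the choice of directory are assumptions for illustration.

```php
<?php
// Hypothetical helper: write one response body to pageN.html for inspection
// and return the path of the file that was written.
function save_response(string $dir, int $step, string $html): string
{
    $file = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . "page{$step}.html";
    if (file_put_contents($file, $html) === false) {
        throw new RuntimeException("Could not write $file");
    }
    return $file;
}
```

In the script from the question, this could be called as `save_response(__DIR__, 1, $store)` after the login request and `save_response(__DIR__, 2, $result)` after the SMS request, so both pages can be opened in a browser and compared with what the site shows when used manually.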
