Crawling a second page answer; curl - php

I'm crawling a few websites and everything is working fine,
but there is one specific website that makes a few "redirects" before landing on the page I want.
For example, a request to
http://www.example.com/?day=01/01/2016&action=search_prices
goes to http://www.example.com/search/default.aspx, which takes a few seconds to run the search and then shows the results on that same page.
Is there an easy way to handle this? Any hint or clue would be awesome.
My code right now is simple (almost all the sites I crawled so far returned JSON):
function get_web_page( $url ){
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_HEADER => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "spider", // who am i
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_HTTPHEADER => array('HeaderName: HeaderValue'),
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
        CURLOPT_SSL_VERIFYPEER => false // Disabled SSL Cert checks
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    return $header;
}
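CURLOPT_FOLLOWLOCATION only follows HTTP-level redirects (3xx responses with a Location header). If the intermediate page instead sends the browser onward with a <meta http-equiv="refresh"> tag, which is purely an assumption about this site, the crawler has to find the next URL itself. The sketch below shows one way to do that; find_meta_refresh_url() is a hypothetical helper, not part of cURL, and it will not help if the results are loaded by JavaScript.

```php
<?php
// Hypothetical helper: return the target of a <meta http-equiv="refresh"> tag,
// or null when the HTML contains none.
function find_meta_refresh_url(string $html): ?string
{
    // Collect every <meta ...> tag, regardless of attribute order.
    if (!preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    foreach ($tags[0] as $tag) {
        // Only tags declaring http-equiv="refresh" are of interest.
        if (!preg_match('/http-equiv\s*=\s*["\']?refresh\b/i', $tag)) {
            continue;
        }
        // The content attribute looks like "5; url=/search/default.aspx".
        if (preg_match('/content\s*=\s*(["\'])\s*\d+\s*;\s*url\s*=\s*(.*?)\1/i', $tag, $m)) {
            return trim($m[2], " \t'\"");
        }
    }
    return null;
}
```

With the get_web_page() function from the question, this could be called as find_meta_refresh_url($page['content']), where $page is the returned array. A relative result would still have to be resolved against the current URL before fetching it.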

Related

cURL error 35 SSL Connect Error on hostgator live server

I have searched a lot for this issue on the internet, including Stack Overflow, but have not been able to find a definite solution. I'm using the following code to make a cURL request to an HTTPS website.
<?php
/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
 * array containing the HTTP server response header fields and content.
 */
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_HEADER => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "spider", // who am i
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
        CURLOPT_SSL_VERIFYPEER => false, // Disabled SSL Cert checks
        CURLOPT_SSL_VERIFYHOST => false
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    return $header;
}
echo '<pre>';
print_r(get_web_page('https://savedeo.com/download?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DbVl3om0-GFE'));
die;
?>
This code works perfectly on my localhost (XAMPP) but does not work on HostGator shared hosting. Can anybody tell me what the exact issue is, or what I am doing wrong here?
Add 'CURLOPT_VERBOSE => true' to your options array to get more detailed output; this should show you the cause.
If you have access to the firewall settings, make sure the port for the connection is actually open.
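By default cURL writes its verbose output to STDERR, which is often not visible on shared hosting. One way to capture it, sketched below, is to point CURLOPT_STDERR at an in-memory stream and read it back afterwards. The function name fetch_with_verbose_log() and the 10-second timeouts are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical helper: fetch a URL and return cURL's verbose log with the result.
function fetch_with_verbose_log(string $url): array
{
    // cURL writes its verbose output to this in-memory stream instead of STDERR.
    $log = fopen('php://temp', 'w+');

    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,    // illustrative timeout on connect
        CURLOPT_TIMEOUT        => 10,    // illustrative timeout on response
        CURLOPT_VERBOSE        => true,  // log connection and TLS handshake details
        CURLOPT_STDERR         => $log,  // send that log to our stream
    ));

    $content = curl_exec($ch);
    $result = array(
        'content' => $content,
        'errno'   => curl_errno($ch),
        'errmsg'  => curl_error($ch),
    );
    unset($ch); // release the handle before closing the log stream

    // Read back everything cURL logged during the transfer.
    rewind($log);
    $result['verbose'] = stream_get_contents($log);
    fclose($log);

    return $result;
}
```

Printing $result['verbose'] for the failing URL should show how far the connection got, for example whether the failure happens during the TLS handshake.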

how to include https site using php

I tried to include iframe videos in my PHP page, like this one (VK video), so that my PHP file does exactly the same job as that video page, but I haven't gotten far. I tried file_get_contents, but I got a warning with HTTPS videos. Is there any way to include HTTP and HTTPS pages inside my PHP file?
Note: my English is not good, sorry.
Note: I am a beginner in PHP.
Like this:
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_HEADER => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "spider", // who am i
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
        CURLOPT_SSL_VERIFYPEER => false // Disabled SSL Cert checks
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    echo $content;
}
get_web_page( 'http://vk.com/video_ext.php?oid=250349454&id=169859391&hash=af2327b1f2db9f87&hd=1' );

cURL returning "Object not found!"

I'm having a problem with cURL: it returns either no result or "Object not found!".
I know the site provides its own API, but I need to grab more information than the API offers.
How can I solve this problem?
Here's the code:
<?php
$URL = "http://myanimelist.net/manga.php?q=nisekoi";
// Init cURL
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_HEADER => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "spider", // who am i
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    return $header;
}
echo get_web_page($URL)['content'];
?>
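The array returned by get_web_page() already carries diagnostic fields from curl_getinfo(), such as http_code and url, next to errno and errmsg. A message like "Object not found!" may simply be the body of an HTTP error page (an assumption, since the question does not show the status code), so checking those fields before echoing the content is a reasonable first step. summarize_fetch() below is a hypothetical helper that only formats what is already in the array.

```php
<?php
// Hypothetical helper: turn the array returned by get_web_page() into a
// one-line summary that separates transport errors from HTTP errors.
function summarize_fetch(array $header): string
{
    // A non-zero errno means cURL itself failed (DNS, connect, TLS, timeout).
    if (!empty($header['errno'])) {
        return sprintf('cURL error %d: %s', $header['errno'], $header['errmsg']);
    }

    $code = isset($header['http_code']) ? (int) $header['http_code'] : 0;
    $url  = isset($header['url']) ? $header['url'] : '(unknown)';

    // 4xx and 5xx responses still have a body, often a short error page.
    if ($code >= 400) {
        return sprintf('HTTP %d from %s', $code, $url);
    }
    return sprintf('OK (HTTP %d) from %s', $code, $url);
}
```

Calling summarize_fetch(get_web_page($URL)) shows whether the request failed in cURL, came back with an error status, or succeeded at a different final URL than expected.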

Get result of redirection as the client (curl)

I'm trying to get the "true" URL behind a redirection. I have no trouble getting the URL with cURL; my problem is that the link cURL returns is only valid for the server's IP address.
Now I'm looking for a way to get the real URL "as the client", without Java or Flash, and to be honest I have no idea how to do it.
My current code:
function get_url( $url ) {
    $res = array();
    $options = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER => false,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT => $_SERVER['HTTP_USER_AGENT'],
        CURLOPT_AUTOREFERER => true,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT => 120,
        CURLOPT_MAXREDIRS => 10,
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $res['content'] = $content;
    return $res;
}
This is not possible through cURL. The server probably generates the link based on the IP address of whoever makes the request, and you cannot replace your server's IP address with the client's through cURL. So try the other methods you mentioned in the question.

cURL HTTPS follow redirection

I had the code below working for an HTTP URL: it follows the redirect and then passes back the URL of the page it ended up on.
// Follow URL
private function follow_url($url) {
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_HEADER => true, // return headers
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "spider", // who am i
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $output = $header["url"];
    return $output;
}
I am now trying to get it to work with HTTPS, but it does not follow the redirect; it stops at the URL I passed in.
Is there anything I can do to fix this?
Add the following options (note that they turn off certificate and host name verification):
CURLOPT_SSL_VERIFYHOST => false,
CURLOPT_SSL_VERIFYPEER => false,
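Turning both checks off makes the request work but removes protection against man-in-the-middle attacks. If the underlying problem is that cURL cannot find a CA bundle on the server (an assumption; the question does not show the error), an alternative is to keep verification on and point CURLOPT_CAINFO at a bundle. The sketch below assumes PHP 7.1 or later; verified_redirect_options() and follow_url_verified() are hypothetical names, and the bundle path is whatever PEM file of trusted CAs is available on your host.

```php
<?php
// Hypothetical helper: options that follow redirects while keeping TLS checks on.
function verified_redirect_options(?string $caBundle = null): array
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_MAXREDIRS      => 10,   // stop after 10 redirects
        CURLOPT_SSL_VERIFYPEER => true, // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // verify the host name matches the certificate
    );
    if ($caBundle !== null) {
        // Assumption: $caBundle is the path to a PEM file of trusted CAs.
        $options[CURLOPT_CAINFO] = $caBundle;
    }
    return $options;
}

// Hypothetical helper: return the final URL after redirects, or false on failure.
function follow_url_verified(string $url, ?string $caBundle = null)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, verified_redirect_options($caBundle));
    $ok = curl_exec($ch);
    // The URL cURL ended up on after following all redirects.
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    return $ok === false ? false : $finalUrl;
}
```

If follow_url_verified() returns false, curl_error() on the handle (or the verbose log) should say whether certificate verification is what failed.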
