Can't HTML Scrape Site Because Of SSL Error - php

I am working on a scraping script. It works on most websites but I cannot access a specific SSL site.
Here is my code:
if (!extension_loaded('openssl')){
// not occurring
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.chase.com/');
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
if($result === false)
{
$err = curl_error($ch);
//$err = SSL read: error:00000000:lib(0):func(0):reason(0), errno 10054
}
$result is always FALSE, and it shows this error message:
SSL read: error:00000000:lib(0):func(0):reason(0), errno 10054
It works on other websites that use SSL. I also checked phpinfo(): cURL and OpenSSL are both enabled. I am using WAMP. Any ideas?

You need to set a User-Agent. I tested with and without one, and setting it fixes the issue. It appears Chase requires a User-Agent header in the request.
So add this:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US)');
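Putting that together, here is a minimal sketch of the whole request with the User-Agent applied and the error message captured. The function name fetch_with_user_agent and the 10 second connect timeout are illustrative assumptions, not part of the original answer. Unlike the question's code, this sketch leaves certificate verification at its defaults.

```php
<?php
// Hypothetical helper: fetch a URL with an explicit User-Agent and
// return the body together with any cURL error message.
function fetch_with_user_agent($url, $userAgent)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // assumed value

    $body = curl_exec($ch);
    // curl_exec() returns false on failure; curl_error() says why.
    $error = ($body === false) ? curl_error($ch) : '';

    return array('body' => $body, 'error' => $error);
}
```

Calling fetch_with_user_agent('https://www.chase.com/', $ua) with the User-Agent string above returns an array whose 'body' entry is the page (or false) and whose 'error' entry is the cURL error text, if any.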

I solved the issue by using the following PHP library:
https://github.com/rmccue/Requests
[Use this library on a Linux-based server; it may not work on XAMPP or WAMP.]

Related

curl_init is enabled but not working

I am trying to open a URL using curl_init, but I am not getting the right response. I cannot share the exact URL for security reasons.
Below is my code:
$ch = curl_init("Site URL");
print_r($ch);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$response = curl_exec($ch);
echo "Below is Responce <br/>";
echo $response;
print_r($response);
Below is the output
Resource id #2Below is Responce
No error is shown, even though I used error_reporting(1) and error reporting is enabled in my web hosting PHP settings.
I also checked the error logs, but found nothing.
The code above works from other servers, but not from my actual production server.
Could you please help me find the cause?
Here is working code:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://xyz.curlinittest.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$data = curl_exec($ch);
echo $data;
curl_close($ch);
?>
Output: Hello World
Your third-party server was rejecting requests based on the User-Agent.
Set cURL to identify itself as Firefox and it works.
My code turned out to be correct, so no code change was needed. I raised a ticket with my hosting provider, and they confirmed it was a server issue caused by port 80 being blocked. I hope this helps others: port blocking is another possible cause when curl_init appears not to work.
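A quick way to narrow this kind of problem down is to test the outbound port directly and to print what cURL itself reports, since the question's code never called curl_error(). The sketch below is only an illustration; the helper names can_connect and curl_failure_summary are made up for this example.

```php
<?php
// Hypothetical helper: report whether a TCP connection to $host:$port
// can be opened from this server within $timeout seconds.
function can_connect($host, $port, $timeout = 5.0)
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() emits on failure.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Hypothetical helper: summarise the state of a finished cURL transfer.
function curl_failure_summary($ch)
{
    return sprintf(
        'errno=%d error="%s" http_code=%d',
        curl_errno($ch),
        curl_error($ch),
        curl_getinfo($ch, CURLINFO_HTTP_CODE)
    );
}
```

If can_connect() returns false for port 80 or 443 on the production server but true elsewhere, the block is most likely at the network or hosting level rather than in the PHP code.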

CURL PHP handshake failure SSL

I have the following code :
<?php
$cookie_file_path = "cookie.txt"; //
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'theurl');
curl_setopt($ch, CURLOPT_POSTFIELDS,'blocPnr_textField_labelNom='.urlencode('www').'&blocPnr_textField_labelPnr='.urlencode('xxx').'&blocPnr_valider=Submit');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSLVERSION,3);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "theurl");
$page = curl_exec($ch);
var_dump($page);
echo 'error:' . curl_error($ch);
?>
It gives me the following error:
bool(false) error:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure
I can't figure out where the problem comes from. I searched for similar error messages on Google and Stack Overflow but haven't found a solution.
You're trying to use version 3 of the SSL protocol which is either refused or unsupported by the server. The POODLE attack pushed a lot of system administrators to drop support for SSLv3 and its usage is not so widespread anymore (and definitely not recommended).
When you get SSL handshake errors, try different versions of SSL/TLS until one works (preferably the most secure one). If in doubt, CURL_SSLVERSION_DEFAULT works in most cases:
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_DEFAULT);
It seems that formulaire.sncf.com supports TLSv1.0. You could also force that protocol version:
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_0);
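The "try different versions until one works" step can be sketched as a small loop. This is an illustration only: the helper name probe_ssl_versions, the candidate list, and the timeout are assumptions, and it sends a HEAD request, which some servers may treat differently from a normal GET or POST.

```php
<?php
// Hypothetical helper: try each SSL/TLS setting in turn and record
// either true (the request succeeded) or the cURL error message.
function probe_ssl_versions($url, array $versions)
{
    $results = array();
    foreach ($versions as $label => $version) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_SSLVERSION, $version);
        curl_setopt($ch, CURLOPT_NOBODY, true);        // HEAD request only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);  // assumed value
        $ok = curl_exec($ch);
        $results[$label] = ($ok === false) ? curl_error($ch) : true;
    }
    return $results;
}

// Candidates ordered from most to least preferred.
$candidates = array(
    'default' => CURL_SSLVERSION_DEFAULT,
    'TLSv1.2' => CURL_SSLVERSION_TLSv1_2,
    'TLSv1.0' => CURL_SSLVERSION_TLSv1_0,
);
```

Running print_r(probe_ssl_versions('theurl', $candidates)); shows which settings the server accepts; pick the first entry that reports true.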

php curl run send proxy ip

I wrote a cURL script, but it gets blocked once the website detects it as a bot.
How can I prevent this?
I tried the code below:
$ch = curl_init();
$proxy = "10.128.60.40:3128";
// needed to disable SSL checks for this site
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,0);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,0);
curl_setopt($ch,CURLOPT_VERBOSE, 0);
curl_setopt($ch,CURLOPT_AUTOREFERER, false);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, "$proxy");
but I still get a response saying I am blocked because of an automated script.
How can I send requests from a dynamic IP to avoid this issue?
You should use one of the following:
1) Anonymous proxies (they die quickly, and you need to parse proxy lists to find new ones)
OR
2) Tor: https://www.torproject.org
OR
3) Be less aggressive: put sleep(1); between requests in your code
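Option 3 can be sketched as a loop that pauses for a slightly random interval between requests. The helper names polite_delay_us and fetch_all_politely, and the 1 to 3 second default range, are assumptions made for this example; $fetch is any callable that takes a URL and returns the page.

```php
<?php
// Hypothetical helper: pick a random pause, in microseconds, between
// $minSeconds and $maxSeconds so requests are not evenly spaced.
function polite_delay_us($minSeconds, $maxSeconds)
{
    $min = (int) round($minSeconds * 1000000);
    $max = (int) round($maxSeconds * 1000000);
    return mt_rand($min, $max);
}

// Hypothetical helper: call $fetch on each URL, pausing between calls.
function fetch_all_politely(array $urls, $fetch, $minSeconds = 1.0, $maxSeconds = 3.0)
{
    $pages = array();
    foreach (array_values($urls) as $i => $url) {
        if ($i > 0) {
            usleep(polite_delay_us($minSeconds, $maxSeconds));
        }
        $pages[$url] = call_user_func($fetch, $url);
    }
    return $pages;
}
```

Slowing down does not change the IP address the site sees, so it only helps when the block is triggered by request rate.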

Is this site blocking/ignoring my HTTP requests?

I'm only able to fetch from a site when I use cURL with a proxy. cURL without a proxy and file_get_contents() return nothing (cURL HTTP code "0" and curl_error() "Empty reply from server"). I'm able to fetch other sites just fine without a proxy.
Aside from being blocked, is there any other possible explanation of why I can only access this site via proxy?
Did you set a User-Agent in cURL? Sometimes websites will block you if your User-Agent isn't set or if your HTTP request looks suspicious.
To set your User-Agent in PHP:
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
Is this from your workplace or something? Many companies disable remote URL access for file_get_contents() (the allow_url_fopen setting) on shared PHP installs, as it's quite risky.
The site probably has User-Agent detection. You can fake that in your cURL call; with file_get_contents() it only works if you pass a stream context that sets the user_agent option. Another method sites use is to display content only once a cookie has been set, so simple site scrapers never see the data.
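For reference, file_get_contents() can send a User-Agent when it is given a stream context whose 'http' options include user_agent. The sketch below assumes allow_url_fopen is enabled; the helper name user_agent_context and the timeout are made up for this example.

```php
<?php
// Hypothetical helper: build a stream context that sends a User-Agent
// header with file_get_contents() requests.
function user_agent_context($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'user_agent' => $userAgent,
            'timeout'    => 40, // seconds; assumed value
        ),
    ));
}

$context = user_agent_context('Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
// Pass the context as the third argument:
// $html = file_get_contents('http://example.com/', false, $context);
```

This does not handle cookies, so a site that requires a cookie before showing content still needs the cURL approach.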
Try this:
function curl_scrape($url, $data, $proxy, $proxystatus)
{
    // Create (or empty) the cookie file so each run starts clean
    $fp = fopen("cookie.txt", "w");
    fclose($fp);

    // Reuse the visitor's User-Agent when running under a web server,
    // otherwise fall back to a fixed browser string
    $agent = isset($_SERVER['HTTP_USER_AGENT'])
        ? $_SERVER['HTTP_USER_AGENT']
        : "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_TIMEOUT, 40);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    if ($proxystatus == 'on')
    {
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }

    $result = curl_exec($ch); // response text, or FALSE on failure
    curl_close($ch);          // release the handle before returning
    return $result;
}
I'm guessing I was truly blocked. Using proxy now and it works fine.

Payment Gateway using cURL with SSL?

I am processing credit cards using a payment gateway. To POST the data to their servers, I am using cURL in PHP. I have an SSL certificate issued to my domain, to ensure all POST'ed data is encrypted. Because the SSL certificate is already installed, do I still need to use the SSL options for cURL? If so, which of the options do I need to set given my setup?
I have tried the following code unsuccessfully:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"https://secure.paymentgateway.com/blah.php");
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_CAINFO, getcwd().'/cert/ca.crt');
curl_setopt($ch, CURLOPT_SSLCERT, getcwd().'/cert/mycert.pem');
curl_setopt($ch, CURLOPT_SSLCERTPASSWD, 'password');
curl_setopt($ch, CURLOPT_POST, $count);
curl_setopt($ch,CURLOPT_POSTFIELDS,"variables...");
$output = curl_exec($ch);
echo $output;
curl_close($ch);
Well, you have already disabled certificate verification with curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);, which I don't recommend. This opens you up to man-in-the-middle attacks.
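As a rough sketch of the alternative, the options below keep verification switched on. Note that the certificate installed on your own domain protects visitors connecting to your site; the outbound cURL connection is protected by the gateway's certificate, which is what these options check. The helper name verified_post_options and the 'amount' field are placeholders invented for this example, not part of any gateway's API.

```php
<?php
// Hypothetical helper: cURL options for an HTTPS POST that keeps
// certificate verification switched on.
function verified_post_options($url, array $fields, $caBundle = null)
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => true, // check the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // check the certificate matches the host
    );
    if ($caBundle !== null) {
        // Only needed when the system CA store is missing or outdated.
        $options[CURLOPT_CAINFO] = $caBundle;
    }
    return $options;
}

$ch = curl_init();
curl_setopt_array($ch, verified_post_options(
    'https://secure.paymentgateway.com/blah.php',
    array('amount' => '10.00') // placeholder field
));
```

CURLOPT_SSLCERT and CURLOPT_SSLCERTPASSWD are only relevant if the gateway explicitly asks for a client certificate.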
Here's a simple tutorial that might help you:
http://developer.paypal-portal.com/pdn/board/message?board.id=ipn&message.id=12754#M12754
