I have a PHP scraper that works perfectly on my local machine, but when I uploaded it to my VPS (Ubuntu 16.04) it stopped being able to get data from the website. Instead, it shows this error message:
"curl: (56) GnuTLS recv error (-54): Error in the pull function"
I updated OpenSSL, cURL, and GnuTLS, but still no luck. Running the same request with curl from the command line shows the same error, so it must be something to do with cURL/GnuTLS rather than my PHP code. I saw some people hit the same error message when using Git and fixed it, but that solution isn't working in my case. Is there any way to fix this?
Here is the PHP function I use to get data from the website:
function get_html($url)
{
    $agent = 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Mobile Safari/537.36';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.9', // the original value was missing its leading language tag
        'Connection: keep-alive'
    ));
    curl_setopt($ch, CURLOPT_URL, $url);
    $pageContent = curl_exec($ch);
    curl_close($ch);
    return $pageContent;
}
Thanks in advance.
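For what it's worth, curl error 56 with the GnuTLS backend generally means the connection was cut while TLS data was being read, which points at the handshake or the server dropping the connection rather than at the PHP code. A diagnostic sketch, under the assumption that the site supports TLS 1.2: surface the exact error and try pinning the protocol version, which sometimes sidesteps backend-specific negotiation quirks.

$ch = curl_init('https://example.com/'); // hypothetical URL; substitute the site being scraped
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Pin the TLS version; GnuTLS-backed builds sometimes negotiate differently than OpenSSL ones.
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_2);
// Force HTTP/1.1 in case the server mishandles keep-alive during the transfer.
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
$html = curl_exec($ch);
if ($html === false) {
    // curl_errno()/curl_error() expose the underlying transfer error, e.g. (56).
    printf("cURL error %d: %s\n", curl_errno($ch), curl_error($ch));
}
curl_close($ch);

If none of that helps, many reports of this exact error on Ubuntu come down to the cURL library itself and are resolved by installing a curl build linked against OpenSSL instead of GnuTLS.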
Related
I have a PHP script that periodically performs an HTTP request to a remote API server.
I use cURL to perform this task. My code looks like the following example:
$url = 'http://apiserver.com/services/someservice.php?apikey=' . $my_key;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept-Encoding: gzip, deflate',
    'Accept: */*',
    'Connection: keep-alive',
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:48.0) Gecko/20100101 Firefox/48.0'
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
$data = curl_exec($ch);
curl_close($ch);
$json_data = json_decode($data, TRUE);
Recently I noticed that it no longer fetches new data; instead it receives a cached version. I tried adding the following cURL options to the code:
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
This did not solve the problem. I still receive the same cached response.
If I add an extra parameter to the API URL, like "&time=" . time(), that fixes the problem, but I don't want to add extra parameters to the URL.
What can I do to solve the problem?
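One thing worth trying before resorting to URL tricks: if the stale data is coming from an intermediate proxy cache, request headers can ask for a fresh copy. A minimal sketch of that approach, reusing the $url and $my_key from the question:

$url = 'http://apiserver.com/services/someservice.php?apikey=' . $my_key;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Ask any cache along the way for a fresh response instead of a stored one.
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Cache-Control: no-cache',
    'Pragma: no-cache'
));
$data = curl_exec($ch);
curl_close($ch);
$json_data = json_decode($data, true);

If the response is still stale after this, the caching is most likely happening on the API server itself, in which case no client-side option will help.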
I wrote a script to request information from a remote website. I debugged everything locally and deployed it to the server.
Everything ran smoothly on localhost, but once on the server, curl_exec wasn't able to connect to the target host. I debugged with another URL and it worked, so I assume no server-side configuration is needed. My guess is that the target host is somehow denying a response to the request; I just don't know how or why.
This is the code I use to make the request.
$ch = curl_init();
$http_headers = array(
    // Only one User-Agent header should be sent; the array originally listed two.
    'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124',
    'Connection: keep-alive'
);
curl_setopt($ch, CURLOPT_URL, $targetURL);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_HEADER, true); // include response headers in the output (was set twice, to 0 and then true)
curl_setopt($ch, CURLOPT_HTTPHEADER, $http_headers);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
What can I do to emulate a 'normal' browser request and avoid being denied by the target host? Any tips appreciated.
Regards
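Before emulating anything, it helps to see how the request is failing. A sketch along those lines, assuming $targetURL is set as in the question: report the transfer error if the connection never completes, and the HTTP status code if the host answers with a refusal.

$ch = curl_init($targetURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Advertise and transparently decode compressed responses, as a browser would.
curl_setopt($ch, CURLOPT_ENCODING, '');
// Some hosts filter requests that arrive without a Referer; this value is hypothetical.
curl_setopt($ch, CURLOPT_REFERER, 'https://www.google.com/');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124');
$response = curl_exec($ch);
if ($response === false) {
    echo 'Transfer failed: ' . curl_error($ch) . "\n"; // DNS, TLS, or timeout problems show up here
} else {
    // A 403 or 429 here would mean the host is actively refusing this client.
    echo 'HTTP status: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
}
curl_close($ch);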
Even though I have set curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true), cURL doesn't want to follow redirects; it only shows the "301 Moved" page. I tried it with multiple sites.
The strange thing is that it works on localhost, but when I upload it to my webspace it refuses to work.
Is it possible that my web hosting provider made some tweaks so that it doesn't work? I've never seen such a thing :(
Here's the code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://google.com');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.5',
    'Accept-Encoding: gzip, deflate',
    'Connection: keep-alive'
));
$result = curl_exec($ch);
curl_close($ch);
I had a similar issue, and it was due to cURL executing a GET immediately after receiving the redirect header. To fix this, I specified CURLOPT_CUSTOMREQUEST.
Example:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
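Another cause worth checking on shared hosting: older PHP versions refuse to enable CURLOPT_FOLLOWLOCATION at all when safe_mode or open_basedir is in effect, which would explain code that works locally and fails on the host. In that situation redirects can be followed by hand. A minimal sketch of that workaround:

// Follow redirects manually when CURLOPT_FOLLOWLOCATION is blocked by the host.
$url = 'http://google.com';
$maxRedirects = 5; // arbitrary cap to avoid redirect loops
do {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // CURLINFO_REDIRECT_URL holds the resolved Location target (PHP 5.3.7+).
    $next = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    curl_close($ch);
    if ($code >= 300 && $code < 400 && $next) {
        $url = $next; // keep following the chain
    } else {
        break; // final response reached
    }
} while (--$maxRedirects > 0);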
Why do I get an empty source? When I uncomment
//curl_setopt($ch, CURLOPT_URL, 'www.onet.pl');
and comment out
'Accept-Encoding: gzip,deflate,sdch',
everything works fine for www.onet.pl. Why doesn't it work for www.ebok.pl?
<?php
$COOKIEFILE = $_SERVER['DOCUMENT_ROOT'] . '/../data/config/ebok.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, $COOKIEFILE);
curl_setopt($ch, CURLOPT_COOKIEFILE, $COOKIEFILE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-type: application/x-www-form-urlencoded',
    'Accept: text/html,application/xhtml+xml,application/xml',
    'Accept-Encoding: gzip,deflate,sdch',
    'Accept-Language: pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4',
    'Connection: keep-alive',
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36'
));
//curl_setopt($ch, CURLOPT_URL, 'http://www.ebok.pl');
curl_setopt($ch, CURLOPT_URL, 'https://ssl.plusgsm.pl/ebok-web/');
//curl_setopt($ch, CURLOPT_URL, 'www.onet.pl');
echo curl_exec($ch);
?>
I need to log in to this page.
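As for the log-in itself, once the page loads, the usual pattern is to POST the form fields while reusing the same cookie jar so the session cookie persists. A rough sketch; the endpoint and field names below are hypothetical and must be read from the real login form:

$loginUrl = 'https://ssl.plusgsm.pl/ebok-web/login'; // hypothetical; use the form's actual action URL
$ch = curl_init($loginUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, $COOKIEFILE);  // same jar as above, so the session carries over
curl_setopt($ch, CURLOPT_COOKIEFILE, $COOKIEFILE);
curl_setopt($ch, CURLOPT_POST, true);
// Field names are placeholders; copy them from the form's <input> elements.
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    'username' => 'my_user',
    'password' => 'my_pass'
)));
$afterLogin = curl_exec($ch);
curl_close($ch);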
If something isn't working right, there's almost always a way to find out why. In curl's case, curl_error will tell you.
If I change your curl_exec line to instead say
$result = curl_exec($ch);
if ($result === false) die(curl_error($ch));
I get this:
[cHao#hydra-vm ~]$ php curl.php
error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac%
Looks like this has something to do with the server doing SSL/TLS handshaking incorrectly, but I don't know enough about curl or SSL to say for sure.
Reason #15 to always check the result from functions that can return errors.
Either way, if I force the SSL version to TLS v1.0, like so:
curl_setopt($ch, CURLOPT_SSLVERSION, 4); // 4 == CURL_SSLVERSION_TLSv1_0
then it works for me. Generic TLSv1 fails with the same error as above, versions higher than 1.0 give me an error about "unsupported protocol", SSLv3 says "alert handshake failure", and SSLv2 simply isn't supported on my machine.
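Where the build provides them, the same setting reads better with cURL's named constants; a quick sketch:

// Equivalent to the magic number 4 above, but self-documenting.
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_0);
// If the server ever gets fixed, prefer a newer protocol:
// curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_2);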
I wrote a PHP script that cURLs URLs to get the page HTML. The full page content comes back about 50% of the time; the rest of the time only part of the content is returned and the script fails to terminate. I'm not getting any errors...
$headers = array(
    'Accept-Language: en-US,en;q=0.8',
    'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
    'Content-Type: application/x-www-form-urlencoded',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.youtube.com/channel/UCkN6ktadXpZl_viwRCSEGUQ');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
// curl_getinfo() must run after curl_exec() (and before curl_close()) to see transfer data.
$curl_info = curl_getinfo($ch);
curl_close($ch);
print $response;
print_r($curl_info);
Run on CLI:
php script_name.php
If run 10 times or so, you will see that it fails to complete at least a few times, with no warnings or errors...
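One thing that would at least bound the hang, assuming the stall happens mid-transfer rather than at connect time: CURLOPT_CONNECTTIMEOUT only limits the connection phase, while CURLOPT_TIMEOUT caps the whole request. A sketch with a total timeout and a simple retry loop:

$maxAttempts = 3; // arbitrary retry budget
$response = false;
for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
    $ch = curl_init('http://www.youtube.com/channel/UCkN6ktadXpZl_viwRCSEGUQ');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // limits the connect phase only
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);       // limits the entire transfer, so a stall cannot hang forever
    $response = curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);
    if ($response !== false) {
        break; // success
    }
    if ($errno !== 28) { // 28 == CURLE_OPERATION_TIMEDOUT; retry timeouts, give up on anything else
        break;
    }
}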
Ubuntu had performed a bunch of system updates. I was working with the code when the updates finished. After a system restart, the issue went away completely. Go figure.