PHP Curl Randomly Hangs

I wrote a PHP script that uses cURL to fetch URLs and get the page HTML. The page content comes back about 50% of the time; the rest of the time only part of the content is returned and the script fails to terminate. I'm not getting any errors...
$headers = array(
'Accept-Language: en-US,en;q=0.8',
'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
'Content-Type: application/x-www-form-urlencoded',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.youtube.com/channel/UCkN6ktadXpZl_viwRCSEGUQ');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
$curl_info = curl_getinfo($ch); // getinfo must run after curl_exec() (and before curl_close()) to report on the transfer
curl_close($ch);
print $response;
print_r($curl_info);
Run on CLI:
php script_name.php
If run 10 times or so, you will see that it fails to complete at least a few times, with no warnings or errors...

Ubuntu had performed a bunch of system updates. I was working with the code when the updates finished. After a system restart this issue went completely away. Go figure.
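Even with the environment fixed, it is worth capping how long a transfer may run so a stalled request errors out instead of hanging the script. A minimal sketch of the same request with hard limits (the 5- and 30-second values are arbitrary choices):
<?php
$ch = curl_init('http://www.youtube.com/channel/UCkN6ktadXpZl_viwRCSEGUQ');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);  // max seconds to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // max seconds for the entire transfer
$response = curl_exec($ch);
if ($response === false) {
    // A timed-out or otherwise failed transfer is reported here instead of hanging.
    fwrite(STDERR, 'cURL error: ' . curl_error($ch) . PHP_EOL);
}
curl_close($ch);
print $response;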

Related

Google Places Nearby API returns only 6 results with CURL PHP

I am using the Google Places API, and when I request the provided URL in the Chrome browser (on a Mac) it returns more than 20 results.
But when I request the same URL with PHP cURL, it returns only 6 results.
I couldn't get it working. Please help.
$ch = curl_init();
$url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=31.4814662,74.411793&radius=20000&keyword='.urlencode('ac technician').'&key=API_KEY';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
$response = curl_exec($ch);
echo "#####".$response; exit();
I couldn't reproduce this, as your code returns 15 results for me for this same API call. I read a bit, and it seems the API returns only 5 results after many calls.
However, if you really do get more results in a browser than in a cURL call, I would mimic the headers sent by the browser, starting with the User-Agent. Check whether adding this works for you:
$headers = array();
$headers[] = 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
This mimics the User-Agent header that a Chrome browser on Linux would send.
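For reference, here is the same request with that User-Agent header folded in; API_KEY is a placeholder and the location/keyword values come from the question above:
<?php
$url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json'
     . '?location=31.4814662,74.411793&radius=20000'
     . '&keyword=' . urlencode('ac technician')
     . '&key=API_KEY';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
));
$response = curl_exec($ch);
curl_close($ch);
$data = json_decode($response, true);
echo count($data['results']) . " results returned\n"; // Nearby Search keeps its hits under the "results" key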

GnuTLS recv error (-54): Error in the pull function

I have a PHP scraper that works perfectly on my local machine. But when I uploaded it to my VPS (Ubuntu 16.04), it is not able to get data from the website. Instead, it shows this error message:
"curl: (56) GnuTLS recv error (-54): Error in the pull function"
I updated OpenSSL, cURL, and GnuTLS; still no luck. I tried to perform the cURL request from the command line and it showed the same error, so it must be something to do with cURL/GnuTLS. I saw some people get the same error message when using Git and fix it, but that solution isn't working in my case. Is there any way to fix it?
Here is the PHP function I use to get data from website:
function get_html($url)
{
$agent= 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Mobile Safari/537.36';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Language: en-US;q=0.8,en;q=0.7',
'Connection: keep-alive'
));
curl_setopt($ch, CURLOPT_URL,$url);
$pageContent = curl_exec($ch);
curl_close($ch);
return $pageContent;
}
Thanks in advance.
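A useful first step when debugging this from PHP is to surface the exact cURL error and confirm which TLS backend the cURL extension was built against (GnuTLS vs. OpenSSL). A minimal diagnostic sketch; the URL is a placeholder for whatever page the scraper fetches:
<?php
$info = curl_version();
echo 'cURL ' . $info['version'] . ' built against ' . $info['ssl_version'] . PHP_EOL;

$ch = curl_init('https://example.com/'); // placeholder target
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$body = curl_exec($ch);
if ($body === false) {
    // Errno 56 with a GnuTLS message confirms the failure is in the TLS layer,
    // not in the PHP code itself.
    echo 'Error ' . curl_errno($ch) . ': ' . curl_error($ch) . PHP_EOL;
}
curl_close($ch);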

PHP Curl disable cache

I have a PHP script that periodically performs an HTTP request to a remote API server.
I use cURL to perform this task. My code looks like the following example:
$url='http://apiserver.com/services/someservice.php?apikey='.$my_key;
$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept-Encoding: gzip, deflate',
'Accept: */*',
'Connection: keep-alive',
'User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:48.0) Gecko/20100101 Firefox/48.0'
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
$data=curl_exec($ch);
curl_close($ch);
$json_data = json_decode($data,TRUE);
Recently I noticed that it no longer fetches new data; it is using a cached version instead. I tried adding the following cURL flags to the code:
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
This did not solve the problem; I still receive the same cached response.
If I add an extra parameter to the API URL, like "&time=".time(), that fixes the problem, but I don't want to add extra parameters to the URL.
What can I do to solve the problem?
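One approach, assuming the stale responses come from an intermediate cache or proxy rather than from the API itself, is to send cache-bypassing request headers instead of a cache-busting query parameter. A sketch of the same request with those headers added:
<?php
$url = 'http://apiserver.com/services/someservice.php?apikey=' . $my_key;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Cache-Control: no-cache', // asks HTTP/1.1 caches not to serve a stored copy
    'Pragma: no-cache',        // same for legacy HTTP/1.0 caches
    'Accept: */*',
));
$data = curl_exec($ch);
curl_close($ch);
$json_data = json_decode($data, true);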

PHP cURL doesn't follow redirects even if the flag is set

Even though I have set curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true), cURL doesn't want to follow redirects; it only shows the "301 Moved" page. I tried it with multiple sites.
The strange thing is that it works on localhost, but when I upload it to my web space it refuses to work.
Is it possible that my web hosting provider made some tweak that prevents it from working? I've never seen such a thing :(
Here's the code:
$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://google.com');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Accept-Encoding: gzip, deflate',
'Connection: keep-alive'
));
$result = curl_exec($ch);
curl_close($ch);
I had a similar issue, and it was due to cURL executing a GET immediately after receiving the redirect header. To fix this I specified CURLOPT_CUSTOMREQUEST.
Example:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
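Another common cause on shared hosting is that older PHP versions reject CURLOPT_FOLLOWLOCATION when open_basedir is set. In that case the redirects can be followed by hand; a sketch, assuming the host blocks FOLLOWLOCATION (the helper name and hop limit are arbitrary):
<?php
function fetch_following_redirects($url, $maxHops = 10)
{
    for ($hop = 0; $hop < $maxHops; $hop++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $body = curl_exec($ch);
        $next = curl_getinfo($ch, CURLINFO_REDIRECT_URL); // empty when the response is not a redirect
        curl_close($ch);
        if ($next === '' || $next === false || $next === null) {
            return $body; // final, non-redirect response
        }
        $url = $next;     // follow the Location target ourselves
    }
    return false;         // gave up: too many redirects
}

echo fetch_following_redirects('http://google.com');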

php - curl and source

Why do I get an empty response? When I uncomment
//curl_setopt($ch, CURLOPT_URL, 'www.onet.pl');
and comment out
'Accept-Encoding: gzip,deflate,sdch',
everything works fine for www.onet.pl. Why doesn't it work for www.ebok.pl?
<?php
$COOKIEFILE = $_SERVER['DOCUMENT_ROOT'] . '/../data/config/ebok.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, $COOKIEFILE);
curl_setopt($ch, CURLOPT_COOKIEFILE, $COOKIEFILE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Content-type: application/x-www-form-urlencoded',
'Accept: text/html,application/xhtml+xml,application/xml',
'Accept-Encoding: gzip,deflate,sdch',
'Accept-Language: pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4',
'Connection: keep-alive',
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36')
);
//curl_setopt($ch, CURLOPT_URL, 'http://www.ebok.pl');
curl_setopt($ch, CURLOPT_URL, 'https://ssl.plusgsm.pl/ebok-web/');
//curl_setopt($ch, CURLOPT_URL, 'www.onet.pl');
echo curl_exec($ch);
?>
I need to log in to this page.
If something isn't working right, there's almost always a way to find out why. In curl's case, curl_error will tell you.
If I change your curl_exec line to instead say
$result = curl_exec($ch);
if ($result === false) die(curl_error($ch));
I get this:
[cHao#hydra-vm ~]$ php curl.php
error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac%
Looks like this has something to do with the server doing SSL/TLS handshaking incorrectly, but I don't know enough about curl or SSL to say for sure.
Reason #15 to always check the result from functions that can return errors.
Either way, if I force the SSL version to TLS v1.0, like so:
curl_setopt($ch, CURLOPT_SSLVERSION, 4);
then it works for me. Generic TLSv1 fails with the same error as above, versions higher than 1.0 give me an error about "unsupported protocol", SSLv3 says "alert handshake failure", and SSLv2 simply isn't supported on my machine.
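For readability, the magic number 4 can be swapped for the named constant, which recent PHP versions define with the same value:
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_0); // same as the literal 4 above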
