PHP Curl disable cache - php

I have a PHP script that periodically performs an HTTP request to a remote API server.
I use cURL to perform this task. My code looks like the following example:
$url='http://apiserver.com/services/someservice.php?apikey='.$my_key;
$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept-Encoding: gzip, deflate',
'Accept: */*',
'Connection: keep-alive',
'User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:48.0) Gecko/20100101 Firefox/48.0'
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
$data=curl_exec($ch);
curl_close($ch);
$json_data = json_decode($data,TRUE);
Recently I noticed that it no longer fetches new data; it returns a cached version instead. I tried adding the following cURL options to the code:
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
This did not solve the problem. I still receive the same cached response.
If I add an extra parameter to the API URL, like "&time=".time(), that fixes the problem, but I don't want to add extra parameters to the URL.
What can I do to solve the problem?
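One option that may be worth trying, short of changing the URL, is to send request headers that ask caches not to serve a stored copy. The following is a minimal sketch, not a confirmed fix: whether the API server or any proxy in between honors these headers is an assumption, and the helper name with_no_cache_headers is hypothetical.

```php
<?php
// Hypothetical helper: append request headers that ask caches to
// revalidate instead of serving a stored response.
// Caches are not obliged to honor them.
function with_no_cache_headers(array $headers): array
{
    return array_merge($headers, array(
        'Cache-Control: no-cache',
        'Pragma: no-cache',
    ));
}

$headers = with_no_cache_headers(array(
    'Accept: */*',
    'Connection: keep-alive',
));

// Usage with an existing handle (requires the curl extension):
// curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
```

If the response is still stale with these headers in place, the caching is probably happening somewhere that ignores request headers, and a changing query parameter may remain the only client-side workaround.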

Related

PHP cURL doesn't follow redirects even if the flag is set

Even though I have set curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true), cURL doesn't follow redirects; it only shows the "301 Moved" page. I tried it with multiple sites.
The strange thing is that it works on localhost, but when I upload it to my web space it refuses to work.
Is it possible that my web hosting provider changed a setting that prevents it from working? I have never seen such a thing :(
Here's the code:
$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://google.com');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Accept-Encoding: gzip, deflate',
'Connection: keep-alive'
));
$result = curl_exec($ch);
curl_close($ch);
I had a similar issue, and it was due to cURL issuing a GET immediately after receiving the redirect header. To fix this, I specified CURLOPT_CUSTOMREQUEST.
Example:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
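If the host does not let cURL follow redirects on its own (some older PHP configurations reportedly refuse CURLOPT_FOLLOWLOCATION when open_basedir is set; treat that as an assumption to verify on the host), one fallback is to read the Location header and issue the next request manually. The sketch below shows only the header-parsing step; extract_location is a hypothetical helper, and the sample header block is made up for illustration.

```php
<?php
// Hypothetical helper: pull the Location header out of a raw response
// header block, e.g. one captured with CURLOPT_HEADER set to 1.
// Returns null when no Location header is present.
function extract_location(string $rawHeaders): ?string
{
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up sample of what a 301 response header block can look like.
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: http://www.google.com/\r\n"
        . "Content-Length: 0\r\n\r\n";

echo extract_location($sample), "\n";
```

The returned URL could then be passed to a second request, ideally with a cap on the number of hops so that a redirect loop cannot run forever.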

php - curl and source

Why do I get an empty source? When I uncomment
//curl_setopt($ch, CURLOPT_URL, 'www.onet.pl');
and comment out
'Accept-Encoding: gzip,deflate,sdch',
everything works fine for www.onet.pl. Why doesn't it work for www.ebok.pl?
<?php
$COOKIEFILE = $_SERVER['DOCUMENT_ROOT'] . '/../data/config/ebok.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, $COOKIEFILE);
curl_setopt($ch, CURLOPT_COOKIEFILE, $COOKIEFILE);
curl_setopt($ch,CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch,CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded',
'Accept: text/html,application/xhtml+xml,application/xml',
'Accept-Encoding: gzip,deflate,sdch',
'Accept-Language: pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4',
'Connection: keep-alive',
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36')
);
//curl_setopt($ch, CURLOPT_URL, 'http://www.ebok.pl');
curl_setopt($ch, CURLOPT_URL, 'https://ssl.plusgsm.pl/ebok-web/');
//curl_setopt($ch, CURLOPT_URL, 'www.onet.pl');
echo curl_exec($ch);
?>
I need to log in to this page.
If something isn't working right, there's almost always a way to find out why. In cURL's case, curl_error will tell you.
If I change your curl_exec line to instead say
$result = curl_exec($ch);
if ($result === false) die(curl_error($ch));
I get this:
[cHao@hydra-vm ~]$ php curl.php
error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac%
It looks like this has something to do with the server doing the SSL/TLS handshake incorrectly, but I don't know enough about cURL or SSL to say for sure.
Reason #15 to always check the result from functions that can return errors.
Either way, if I force the SSL version to TLS v1.0, like so:
curl_setopt($ch, CURLOPT_SSLVERSION, 4);
then it works for me. Generic TLSv1 fails with the same error as above, versions higher than 1.0 give me an error about "unsupported protocol", SSLv3 says "alert handshake failure", and SSLv2 simply isn't supported on my machine.
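The habit of checking the result of curl_exec can be wrapped in a small function. This is a minimal sketch; the name exec_or_fail is hypothetical, and throwing a RuntimeException is just one possible way to surface the error.

```php
<?php
// Hypothetical wrapper: run the transfer and throw if cURL reports a failure,
// so the error message from curl_error() is never silently lost.
function exec_or_fail($ch)
{
    $result = curl_exec($ch);
    if ($result === false) {
        throw new RuntimeException(curl_error($ch), curl_errno($ch));
    }
    return $result;
}

// Usage sketch (requires the curl extension and a configured handle):
// try {
//     echo exec_or_fail($ch);
// } catch (RuntimeException $e) {
//     die('cURL failed: ' . $e->getMessage());
// }
```

With a wrapper like this, an SSL handshake failure shows up as an exception message rather than as an empty page.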

browser headers for curl request

We have an assignment to distinguish authentic requests from robots. I am sending a cURL request to the site, but it returns an invalid image file (I know because it works when I view it in my browser). It somehow knows my request is not authentic. Is there a field I'm overlooking here? I'm trying to mimic a browser request exactly.
$header_arr = array(
'0' =>'Host: www.myittest.com',
'1' =>'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0',
'2' =>'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'3' =>'Accept-Language: en-US,en;q=0.5',
'4' =>'Accept-Encoding: gzip, deflate',
'5' =>'Connection: keep-alive',
);
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header_arr);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 6);
$raw=curl_exec($ch);
You have requested gzip/deflate encoding but haven't made cURL aware of it, so it doesn't decode the image. Adding this should fix it:
curl_setopt($ch, CURLOPT_ENCODING, '');
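To check whether a response body is still compressed, one can look at its first two bytes: gzip streams start with 0x1f 0x8b. The sketch below assumes the zlib extension is available; looks_gzipped is a hypothetical helper, and the sample bytes stand in for a real image.

```php
<?php
// Hypothetical helper: gzip streams begin with the magic bytes 0x1f 0x8b.
function looks_gzipped(string $body): bool
{
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

// Stand-in for image data: the 8-byte PNG signature.
$png = "\x89PNG\r\n\x1a\n";

// What a server sends when it honors "Accept-Encoding: gzip".
$compressed = gzencode($png);

var_dump(looks_gzipped($compressed));           // body as received without decoding
var_dump(looks_gzipped(gzdecode($compressed))); // body after decoding
```

If the raw result of curl_exec passes this check, the body was delivered compressed and never decoded, which would explain an image file that viewers reject.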

PHP Curl Randomly Hangs

I wrote a PHP script that curls URLs to get the page html. The page content comes back about 50% of the time and the rest of the time only part of the content is returned and the script fails to terminate. I'm not getting any errors...
$headers = array(
'Accept-Language: en-US,en;q=0.8',
'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
'Content-Type: application/x-www-form-urlencoded',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.youtube.com/channel/UCkN6ktadXpZl_viwRCSEGUQ');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
// Collect transfer info after the request has run; before curl_exec() it is empty.
$curl_info = curl_getinfo($ch);
curl_close($ch);
print $response;
print_r($curl_info);
Run on CLI:
php script_name.php
If you run it 10 times or so, you will see that it fails to complete at least a few times, with no warnings or errors...
Ubuntu had performed a bunch of system updates. I was working with the code when the updates finished. After a system restart this issue went completely away. Go figure.
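Independently of the cause, a script like this can be kept from hanging indefinitely by bounding the whole transfer and not just the connection phase. The following is a sketch under assumptions: the 30-second limit is an arbitrary illustrative value, and CURLOPT_TIMEOUT is added here on top of the options in the question.

```php
<?php
// Assumed values for illustration; tune them for the real workload.
$options = array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_CONNECTTIMEOUT => 5,   // seconds allowed for establishing the connection
    CURLOPT_TIMEOUT        => 30,  // seconds allowed for the entire request
);

$ch = curl_init('http://www.youtube.com/channel/UCkN6ktadXpZl_viwRCSEGUQ');
$applied = curl_setopt_array($ch, $options);

// $response = curl_exec($ch);
// If the limit is hit, curl_exec() returns false and curl_error($ch)
// describes the timeout instead of the script stalling silently.
```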

file_get_contents()/curl getting unexpected page

I'm doing some scraping with php. I've been extracting data including link to the next relevant page so the whole thing is automatic. The problem is that I seem to be getting a page which is slightly modified compared to what I would expect using that URL in my browser (for e.g. the dates are different).
I've tried using cURL and file_get_contents, but both get the wrong page.
At the moment I am using:
$url = "http://www.example.com";
// Create the handle once, passing the URL directly.
$ch = curl_init($url);
$timeout = 5;
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$temp = curl_exec($ch);
curl_close($ch);
What is going on here?
UPDATE:
I've tried imitating a browser using the following code but still unsuccessful. I find this bizarre.
function get_url_contents($url){
$crl = curl_init();
$timeout = 10;
$header=array(
'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-us,en;q=0.5',
'Accept-Encoding: gzip,deflate',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Keep-Alive: 115',
'Connection: keep-alive',
);
curl_setopt ($crl, CURLOPT_HTTPHEADER, $header);
curl_setopt ($crl, CURLOPT_URL,$url);
curl_setopt ($crl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt ($crl, CURLOPT_AUTOREFERER, FALSE);
curl_setopt ($crl, CURLOPT_FOLLOWLOCATION, FALSE);
$ret = curl_exec($crl);
curl_close($crl);
return $ret;
}
Further update:
It seems that the site is using my location to decide what to serve. Is there a locale option?
It can be many things:
The server may render pages differently based on the cookies and headers sent.
The server may render pages differently based on pre-existing conditions and state on the server.
You may have a proxy in between that modifies the content based on the user agent; since your request does not identify itself the way your browser does, the proxy may be sending back different content.
These are just a few of the things that could happen!
