PHP cURL File Download over HTTP2 - php

I have a PHP 7.4 script that downloads a zip file using cURL. Both servers run:
Apache/2.4.51 (Fedora)
Fedora 35
OpenSSL version 1.1.11
If I use CURL_HTTP_VERSION_1_0, everything works. With CURL_HTTP_VERSION_2_0 it does not. Apache on the server I am calling has the h2 protocol enabled. Below are the pertinent lines of code.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0); // this is where I change to ver 2
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20061024 BonEcho/2.0");
$html = curl_exec($ch);
The error I get using CURL_HTTP_VERSION_2_0 is: Curl Error: transfer closed with 4 bytes remaining to read
Also, from the same box the script runs on, I can successfully call the server with curl on the CLI using --http2.
What else should I try? Is there other info I should post to help answer?
EDIT: Is it possible the Content-Length header is being incorrectly set on the sending side?
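Since the command-line curl works with --http2 but the PHP script does not, one thing that may be worth ruling out is that PHP's curl extension is linked against a libcurl build without HTTP/2 support. This is only a diagnostic sketch, not a confirmed cause of the error above; supportsHttp2() is a hypothetical helper name, while curl_version() and CURL_VERSION_HTTP2 come from PHP's curl extension.

```php
<?php
// Hypothetical helper: true if a libcurl feature bitmask includes HTTP/2.
function supportsHttp2(int $features): bool
{
    // CURL_VERSION_HTTP2 is a feature flag defined by PHP's curl extension.
    return ($features & CURL_VERSION_HTTP2) !== 0;
}

// Report what the libcurl used by PHP (not the CLI curl binary) was built with.
$info = curl_version();
printf(
    "libcurl %s, SSL %s, HTTP/2: %s\n",
    $info['version'],
    $info['ssl_version'],
    supportsHttp2($info['features']) ? 'yes' : 'no'
);
```

If this prints "HTTP/2: yes", the problem lies elsewhere (for example in the response itself), and comparing the response headers of the HTTP/1.0 and HTTP/2 transfers would be a reasonable next step.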

Related

Curl and file_get_contents timeouts in PHP (command line curl with same URL works normally)

I'm trying to retrieve the contents of a URL: https://www.cyber.gov.au/.
If I use wget or curl from the command line, all is fine. The response is almost instant.
$ wget https://www.cyber.gov.au/
--2020-11-17 08:47:12-- https://www.cyber.gov.au/
Resolving www.cyber.gov.au (www.cyber.gov.au)... 92.122.153.122, 92.122.153.201
Connecting to www.cyber.gov.au (www.cyber.gov.au)|92.122.153.122|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41951 (41K) [text/html]
Saving to: ‘index.html’
index.html 100%[=========================================>] 40.97K --. KB/s in 0.002s
2020-11-17 08:47:13 (18.8 MB/s) - ‘index.html’ saved [41951/41951]
However, when I try to connect to the same URL through PHP curl, it times out with the message:
Operation timed out after 5001 milliseconds with 0 bytes received
I've reduced this to a test case:
$handle = curl_init('https://www.cyber.gov.au/');
curl_setopt($handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($handle, CURLOPT_TIMEOUT, 5);
curl_setopt($handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36');
$output = curl_exec($handle);
echo $output;
curl_close($handle);
I also tried with various combinations of these additional curl settings, with no change:
curl_setopt($handle, CURLOPT_FRESH_CONNECT, true);
curl_setopt($handle, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4); // Also tried specifying v6
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, 0);
It doesn't seem to be the DNS resolution time:
echo curl_getinfo($handle, CURLINFO_NAMELOOKUP_TIME); // 0.012 seconds
I've tried this on different machines, with different versions of PHP (7.2.12 and 7.4.10), and I get the same behaviour. Other URLs, both HTTP and HTTPS, work as expected. I get the same on CLI PHP as through Apache. Trying file_get_contents() gives a similar result: it just times out. Adding verbose curl logging didn't provide any more information.
curl --version gives curl 7.47.0 and curl 7.58.0 on the machines I tested on.
Can anyone spot what's going on or point me in the right direction to find out more about the problem?

Issue replicating PHP CURL request like browser

I am having some issues with a browser request that I am trying to replicate with curl. I am currently working on a university project and am stuck.
I am trying to replicate a browser request to the following URL: http://vm.tiktok.com/e9VDx8/ When I visit the page in my browser I am redirected to a page with a video and some other content. When I try using CURL I am being shown a 404 page not found error. My curl request looks like the following.
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_USERAGENT, $USER_AGENT);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt( $ch, CURLOPT_COOKIEJAR, realpath('./cookies.txt') );
curl_setopt( $ch, CURLOPT_COOKIEFILE, realpath('./cookies.txt') );
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_URL, $url);
$result = curl_exec($ch);
I have looked at the headers from the original URL in the browser and tried to copy paste them into curl but still I get the 404 page. If I copy the browser request as a curl request from chrome developer tools and run it in terminal it works fine.
curl "http://vm.tiktok.com/e9VDx8/" -H "Connection: keep-alive" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8" -H "Accept-Encoding: gzip, deflate" -H "Accept-Language: en-US,en;q=0.9,fr-CA;q=0.8,fr;q=0.7" -H "Cookie: _ga=GA1.2.213365735.1552156986; _gid=GA1.2.1717226934.1552319684; tt_webid=6667489497775638018" --compressed
Any help would be really appreciated. I am stumped.
It turns out that I figured this issue out minutes after posting for help. Earlier in my script I clean up the URL to make sure there are no invalid characters and such. While doing so I converted the URL to lower case, which caused the issue, since the URLs are case sensitive.
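For anyone who still wants to normalise URLs before requesting them, a minimal sketch of one way to do it is to lowercase only the scheme and host, which are case insensitive, and leave the path, query and fragment untouched. normalizeUrl() is a hypothetical helper, not part of the original script, and it deliberately ignores user:password components.

```php
<?php
// Hypothetical helper: lowercase only the scheme and host of a URL.
// Path, query and fragment keep their original case.
function normalizeUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }

    $result = strtolower($parts['scheme']) . '://' . strtolower($parts['host']);
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}
```

With this, a short link such as http://vm.tiktok.com/e9VDx8/ keeps its mixed-case path even if the host part arrives in upper case.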

PHP and cURL: same code but unable to fetch content on existing server conf

I'm running into an awkward issue when trying to fetch content from a URL.
The issue is that the URL returns no response and the HTTP "request header" size is 0.
I've run into a similar issue on the same website before and solved it (with some help on StackOverflow).
Previously, the issue was due to their SSL certificate being misconfigured and the solution was to set both CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER to false. However, it seems that this time, even with these two cURL parameters being set to false, cURL is still having problems fetching from the URL.
other notes:
i.) The web content can be fetched with a regular web browser, even with incognito mode.
ii.) The CURLOPT_HTTPHEADER that I set below is the exact headers sent by the web browser. Whether I set this parameter or not, cURL doesn't return anything (and the "request header" is still zero).
I've later tested the code on a LEMP server with Ubuntu 14.04 (using simply a stock server image provided by a web host) and the code works. So this seems to be a server issue and not a code issue.
the following are the configurations:
CentOS 6 LEMP server with PHP 5.3.3 (doesn't work):
curl:
curl 7.19.7
SSL Version NSS/3.14.0.0
openssl:
OpenSSL Library Version OpenSSL 1.0.0-fips 29 Mar 2010
CentOS 5 LAMP server with PHP 5.4.40 (doesn't work. this configuration is from the server setup by the shared web hosting provider. I've no control over it.):
curl:
curl 7.38.0
SSL Version OpenSSL/0.9.8b
openssl:
OpenSSL Library Version OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Ubuntu 14.04 LEMP server with PHP 5.5.9 (works. this is from the stock server image provided by a VPS web host):
curl:
curl 7.35.0
SSL Version OpenSSL/1.0.1f
openssl:
OpenSSL 1.0.1f 6 Jan 2014
Since I need to make this code work in a shared hosting environment, I wonder if anyone can tell me what causes the difference, so I can make a change or request one if possible.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 20);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
curl_setopt($ch, CURLOPT_HTTPHEADER,
array(
'Host: www.example.com',
'User-Agent=Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0',
'Accept: application/json, text/javascript, */*; q=0.01',
'Accept-Language: en-US,en;q=0.5',
'DNT: 1',
'X-Requested-With: XMLHttpRequest',
'Referer: https://www.example.com/path/to/example.html',
'Connection: keep-alive',
));
$url = 'https://www.example.com/api/v1/example';
curl_setopt($ch, CURLOPT_URL, $url);
$output = curl_exec($ch);
print_r( curl_getinfo($ch) );
?>
To summarize. It does not work with SSL versions NSS/3.14.0.0 and OpenSSL/0.9.8b but works with OpenSSL/1.0.1f.
Looking at the SSLLabs report it can be seen that the host requires at least TLS 1.1, i.e. no support for TLS 1.0 or even lower. Support for TLS 1.1 was added to OpenSSL with version 1.0.1, i.e. it is not available with OpenSSL 0.9.8b. NSS 3.14.0 has support for TLS 1.1 included (just added with version 3.14) but as can be seen from this bug curl on RHEL6/CentOS6 will use TLS 1.0 only by default.
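To check which of these cases a given server falls into, a rough sketch is to read the SSL backend string that PHP's libcurl reports and compare it against OpenSSL 1.0.1. opensslHasTls11() is a hypothetical helper; it only understands "OpenSSL/x.y.z" strings and returns null for other backends such as NSS, where the library version alone does not settle the question. Type declarations are left out so that it also runs on the old PHP 5.x versions listed in the question.

```php
<?php
// Hypothetical helper: given libcurl's SSL backend string (for example
// "OpenSSL/0.9.8b"), report whether it is an OpenSSL release that can
// speak TLS 1.1, i.e. version 1.0.1 or later.
// Returns null when the backend is not OpenSSL (for example NSS).
function opensslHasTls11($sslVersion)
{
    if (!preg_match('~^OpenSSL/(\d+\.\d+\.\d+)~', $sslVersion, $m)) {
        return null;
    }
    return version_compare($m[1], '1.0.1', '>=');
}

// Usage on the machine in question:
// $info = curl_version();
// var_dump($info['ssl_version'], opensslHasTls11($info['ssl_version']));
```

On a shared host where the result is false, the SSL library itself would have to be upgraded by the provider; no curl option in the script can add a protocol the library does not implement.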

Retrieving RSS Feed from iTunes using cURL and PHP

Been trying to get some PHP cURL code to work that gets the RSS feed from iTunes when you give it the podcast URL. Here is the code:
$inputString = "curl -A 'iTunes/12.1.1.4 (Windows; U; Microsoft Windows 7 Home Premium Edition Service Pack 1 (Build 7601) DPI/96' -s 'https://itunes.apple.com/podcast/id530114975'";
$input = shell_exec($inputString);
$dom = new DOMDocument();
$html = $dom->loadHTML($input);
The cURL call when executed using shell_exec returns a blank string.
When I call the loadHTML function it gives the following error, which is pretty obvious given the cURL call doesn't return anything.....
Warning: DOMDocument::loadHTML(): Empty string supplied as input in C:\php scripts\itunesFeedExtractor.php on line 130
Now, I got the PHP cURL code from somewhere else, haven't used it before, and tried to modify it to match my computer's setup. I've changed the Windows version, service pack and build number. (I don't know why the DPI/96 is needed, so I left it alone.)
You'd be better off using the PHP curl extension:
$ch = curl_init("https://itunes.apple.com/podcast/id530114975");
curl_setopt($ch, CURLOPT_USERAGENT, "iTunes/12.1.1.4 (Windows; U; Microsoft Windows 7 Home Premium Edition Service Pack 1 (Build 7601) DPI/96");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
But if you really want to use the shell_exec method, make sure curl is in your path - you can check by running the curl command from cmd / a terminal
Well, I got it working by adding two more curl_setopt() options. The full code now reads:
$ch = curl_init("https://itunes.apple.com/podcast/id530114975");
curl_setopt($ch, CURLOPT_USERAGENT, "iTunes/12.1.1.4 (Windows; U; Microsoft Windows 7 Home Premium Edition Service Pack 1 (Build 7601) DPI/96");
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
Cheers.....

PHP cURL methods time out on some URLs, but command line always works

When I attempt to use PHP's cURL methods for SOME URLs, it times out. When I use the commandline for the same URL, it works just fine.
I am using AWS and have a t2.medium box running the php-55 apache libraries from yum.
Here is my PHP code:
function curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36');
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 2);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept-Language: en-us'
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
$fh = fopen('/home/ec2-user/curllog', 'w');
curl_setopt($ch, CURLOPT_STDERR, $fh);
$a = curl_exec($ch);
curl_close($ch);
fclose($fh);
$headers = explode("\n",$a);
var_dump($headers);
var_dump($a);
exit;
return $result;
}
So here is a call that works just fine:
curl('http://www.google.com');
And this returns the data for the homepage of google.
However, I try another URL:
curl('http://www.trulia.com/profile/agent-1391347/overview');
And I get this in the curllog:
[ec2-user@central Node]$ cat ../curllog
* Hostname was NOT found in DNS cache
* Trying 23.0.160.99...
* Connected to www.trulia.com (23.0.160.99) port 80 (#0)
> GET /profile/agent-1391347/overview HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
Host: www.trulia.com
Accept: */*
Accept-Language: en-us
* Operation timed out after 10002 milliseconds with 0 bytes received
* Closing connection 0
If I run this from the command line:
curl -s www.trulia.com/profile/agent-1391347/overview
It IMMEDIATELY returns (within 1 second) with NO output. This is expected. However when I run this:
curl -sL www.trulia.com/profile/agent-1391347/overview
It returns the page properly, just as I would want.
So, what is wrong with my curl?
PHP 5.5.20
Here is the cURL bit from my phpinfo():
curl
cURL support => enabled
cURL Information => 7.38.0
Age => 3
Features
AsynchDNS => Yes
CharConv => No
Debug => No
GSS-Negotiate => No
IDN => Yes
IPv6 => Yes
krb4 => No
Largefile => Yes
libz => Yes
NTLM => Yes
NTLMWB => Yes
SPNEGO => Yes
SSL => Yes
SSPI => No
TLS-SRP => No
Protocols => dict, file, ftp, ftps, gopher, http, https, imap, imaps, ldap, ldaps, pop3, pop3s, rtsp, scp, sftp, smtp, smtps, telnet, tftp
Host => x86_64-redhat-linux-gnu
SSL Version => NSS/3.16.2 Basic ECC
ZLib Version => 1.2.7
libSSH Version => libssh2/1.4.2
I have checked your curl() function and it seems fine; nothing in it needs to change. All you need to do is pass the URL as it is as the parameter. There is no need to change HTTPS to HTTP:
curl('http://www.trulia.com/profile/agent-1391347/overview');
Reason:
You already told curl not to verify the SSL certificate:
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
Let me know if you need any explanation.
The verbose output shows a clear timeout problem:
Operation timed out after 10002 milliseconds with 0 bytes received
This signals a problem with your network setup. Such problems are harder to locate: the cause can be on your own end (e.g. in the context of the webserver or the PHP executable) or on the other end. Both are possible to a certain extent; however, the server accepts both requests even though they have different request headers, so it is more likely related to the execution context, which is also how you describe it in general.
Check whether any restrictions in the security or other networking layers apply to requests made via PHP. For example, try a different server image if you're not that into system administration and troubleshooting. From what is shared in your question, it is hard to say what exactly causes your timeout.
Try increasing the timeout values in the following lines:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
Those are pretty short timeout values. CURLOPT_TIMEOUT in particular limits the entire execution time, so try larger values:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
You have two timeout settings:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
The first one, CURLOPT_CONNECTTIMEOUT, is the maximum amount of time allowed for making the connection to the server.
You can remove your own limit by setting it to 0.
That is:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
But that is not a good approach in a production environment: with 0, libcurl falls back to its built-in connection timeout of 300 seconds, so a stalled connection attempt can hang for minutes.
Now for CURLOPT_TIMEOUT.
From the PHP documentation:
The maximum number of seconds to allow cURL functions to execute.
Set it to a higher value:
curl_setopt($ch, CURLOPT_TIMEOUT, 20); // 20 Seconds.
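The two settings can be kept together and the timeout case told apart from other failures. The sketch below is only an illustration: timeoutOptions() and isTimeout() are hypothetical helper names, and the values 15 and 30 in the usage comment are the example values suggested earlier in this thread, not recommendations. Scalar type declarations are omitted so that it also runs on the PHP 5.5 setup from the question.

```php
<?php
// Hypothetical helper: build both timeout options for curl_setopt_array().
function timeoutOptions($connectSeconds, $totalSeconds)
{
    return [
        CURLOPT_CONNECTTIMEOUT => $connectSeconds, // connection phase only
        CURLOPT_TIMEOUT        => $totalSeconds,   // whole transfer, connect included
    ];
}

// Hypothetical helper: true if a curl_errno() value means "operation timed out".
function isTimeout($errno)
{
    return $errno === CURLE_OPERATION_TIMEDOUT; // libcurl error code 28
}

// Usage inside the curl() function from the question:
// curl_setopt_array($ch, timeoutOptions(15, 30));
// $a = curl_exec($ch);
// if ($a === false && isTimeout(curl_errno($ch))) {
//     error_log('Timed out: ' . curl_error($ch));
// }
```

Checking curl_errno() this way at least shows whether a longer timeout changed the failure mode, or whether the request still stalls with 0 bytes received.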
