Curl giving 503 Service Unavailable in response - php

My URL "www.example.com" works in the browser, but when I request the same URL with cURL I get a 503 Service Unavailable response.
I used the following code:
$url = 'http://www.example.com';
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, $url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($curl_handle, CURLOPT_TIMEOUT, 0);
curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl_handle, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, TRUE);
$JsonResponse = curl_exec($curl_handle);
$http_code = curl_getinfo($curl_handle);
print_r($http_code);die;

I'm pretty sure the remote server requires specific HTTP headers (cookies for example), like a session token or a language preference.
You have to analyze the HTTP traffic sent from your browser to the remote server and find the required HTTP headers yourself. I recommend a tool like Fiddler.
An example:
GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: foo=bar
Connection: keep-alive
Assuming the remote server requires clients to send a cookie named foo, it will probably respond with a 503 or 400 error if you omit it. To get a successful response, you have to send the cookie from cURL as well, so that your request looks like one from a regular client.
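A minimal sketch of replaying a captured browser request from PHP cURL. The cookie foo=bar and the header values come from the example above; the helper names buildCookieString and fetchLikeBrowser are hypothetical, and the headers that actually matter have to come from your own traffic capture.

```php
<?php
// Build the value for CURLOPT_COOKIE ("name=value; name2=value2") from an array.
function buildCookieString(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: request $url with the headers seen in the browser capture.
function fetchLikeBrowser(string $url, array $cookies): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '', // advertise every encoding this libcurl can decode
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0',
        CURLOPT_COOKIE         => buildCookieString($cookies),
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ['status' => $status, 'body' => $body];
}
```

Usage would look like $result = fetchLikeBrowser('http://www.example.com', ['foo' => 'bar']); and then checking $result['status']. Removing headers one at a time from this baseline is one way to find out which of them the server really requires.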

Related

getting content of page with 'br' encoding and decoding it by php curl

I want to get the content of this page with PHP cURL:
My cURL sample:
function curll($url, $headers = null) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($headers) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    }
    curl_setopt($ch, CURLOPT_ENCODING, '');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0');
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $response = curl_exec($ch);
    $res['headerout'] = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    $res['rescode'] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($response === false) {
        $res['content'] = $response;
        $res['error'] = array(curl_errno($ch), curl_error($ch));
        return $res;
    }
    $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $res['headerin'] = substr($response, 0, $header_size);
    $res['content'] = substr($response, $header_size);
    return $res;
}
response:
array (size=4)
'headerout' => string 'GET /wallets HTTP/1.1
Host: www.cryptocompare.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: br
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Upgrade-Insecure-Requests: 1
' (length=327)
'rescode' => string '200' (length=3)
'content' => boolean false
'error' =>
array (size=2)
0 => int 23
1 => string 'Unrecognized content encoding type. libcurl understands deflate, gzip content encodings.' (length=88)
The response encoding is br and the response content is false.
I am aware that requesting gzip or deflate would get me content. However, the content I am after is only served with br encoding.
I read on this page that cURL 7.57.0 added Brotli support. I currently have version 7.59.0 installed, but cURL still fails when it receives content in br encoding.
How can I get the content of a br-encoded page and decompress it with PHP cURL?
I had the exact same issue: one server could only return Brotli, and the cURL version bundled with my PHP didn't support Brotli. I had to use a PHP extension: https://github.com/kjdev/php-ext-brotli
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'URL');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output_brized = curl_exec($ch);
$output_ok = brotli_uncompress($output_brized);
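If the same code also has to cope with servers that answer in gzip, the decoding step can be picked from the response's Content-Encoding value. A rough sketch, assuming the php-ext-brotli extension above supplies brotli_uncompress() and the zlib extension supplies gzdecode(); decodeBody is a hypothetical helper name.

```php
<?php
// Decode a raw (not auto-decoded) response body according to its Content-Encoding value.
function decodeBody(string $body, string $contentEncoding): string
{
    switch (strtolower(trim($contentEncoding))) {
        case 'br':
            if (!function_exists('brotli_uncompress')) {
                throw new RuntimeException('Brotli body received but php-ext-brotli is not loaded');
            }
            $decoded = brotli_uncompress($body);
            break;
        case 'gzip':
            $decoded = gzdecode($body);
            break;
        default:
            // Empty, "identity", or an encoding this sketch does not handle:
            // return the body unchanged.
            return $body;
    }
    if ($decoded === false) {
        throw new RuntimeException('Could not decode body as ' . $contentEncoding);
    }
    return $decoded;
}
```

This only makes sense when CURLOPT_ENCODING is left unset, so that cURL hands back the raw bytes instead of trying (and failing) to decode them itself.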
I checked: with PHP 7.4.9 on Windows and its bundled cURL 7.70.0, setting the CURLOPT_ENCODING option to '' (as you did) made the bundled cURL send one additional header, accept-encoding: deflate, gzip, which lists the content encodings the bundled cURL can decode. If I omitted this option, only two headers were sent: Host: www.google.com and accept: */*.
Indeed, searching the PHP source code (https://github.com/php/php-src/search?q=CURLOPT_ENCODING) for CURLOPT_ENCODING turned up nothing that sets a default value or changes the value on the PHP side. PHP passes the option value to cURL unaltered, so what I am observing is the default behavior of my bundled cURL version.
I then discovered that cURL has supported Brotli since version 7.57.0 (https://github.com/curl/curl/blob/bf1571eb6ff24a8299da7da84408da31f0094f66/docs/libcurl/symbols-in-versions), released in November 2017 (https://github.com/curl/curl/blob/fd1ce3d4b085e7982975f29904faebf398f66ecd/docs/HISTORY.md), but it has to be compiled with the --with-brotli flag (https://github.com/curl/curl/blob/9325ab2cf98ceca3cf3985313587c94dc1325c81/configure.ac), which was probably not done for my PHP build.
Unfortunately, there is no curl_getopt() function to read the default value of an option. However, phpinfo() is useful here: I got a BROTLI => No line, which confirms that my version was not compiled with Brotli support. Check your phpinfo() output to find out whether your bundled cURL supports Brotli. If it doesn't, use my solution. If it does, more investigation is needed to find out whether this is a bug or a misuse.
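The same check can be made from code instead of reading phpinfo(). A small sketch, assuming your PHP build defines the CURL_VERSION_BROTLI constant (older builds may not, in which case this simply reports "no"); curlSupportsBrotli is a hypothetical helper name.

```php
<?php
// Report whether the libcurl that PHP is linked against was built with Brotli.
function curlSupportsBrotli(): bool
{
    // Without the curl extension or the feature constant, treat the answer as "no".
    if (!function_exists('curl_version') || !defined('CURL_VERSION_BROTLI')) {
        return false;
    }
    $info = curl_version();
    // 'features' is a bitmask of CURL_VERSION_* flags.
    return ($info['features'] & CURL_VERSION_BROTLI) !== 0;
}

echo curlSupportsBrotli() ? "Brotli: yes\n" : "Brotli: no\n";
```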
If you want to know what your cURL sent, use a proxy like Charles or Fiddler, or use cURL's verbose mode.
Additionally, for the sake of completeness, the HTTP/1.1 spec (https://www.rfc-editor.org/rfc/rfc2616#page-102) says:
If an Accept-Encoding field is present in a request, and if the
server cannot send a response which is acceptable according to the
Accept-Encoding header, then the server SHOULD send an error response
with the 406 (Not Acceptable) status code.
If no Accept-Encoding field is present in a request, the server MAY
assume that the client will accept any content coding.
So, if your PHP version behaved like mine, the website should have received an Accept-Encoding header that did not contain br. It should therefore NOT have replied with br content; it should have replied with gzip or deflate content or, if it could not do so, with a 406 Not Acceptable instead of a 200.
If you are using Cloudflare, you can try disabling Brotli compression in the Cloudflare settings.

Curl different response with browser/postman , PHP

I am trying to crawl Twitter search using cURL. Last month it worked, but now I get a 302 HTTP response, while the browser and Postman return 200 OK.
This is my cURL code:
$param = "?f=tweets&q=+LAPOR1708&src=typd&max_position=".$scrollCursor;
$url = "https://twitter.com/i/search/timeline".$param;
$ch = curl_init();
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPHEADER, ["Accept: text/html"]); // must be set before curl_exec() to take effect
$result=curl_exec($ch);
dd(curl_getinfo($ch));
curl_close($ch);
The curl_getinfo output and the Postman response were attached as images (not reproduced here).
A 302 response is a redirect.
Postman automatically follows redirects.
cURL does not.
This is normal. You should follow the redirect.
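Following the redirect could look roughly like this. It is a sketch only: fetchFollowingRedirects and isRedirectStatus are hypothetical helper names, and the limit of 5 redirects is an arbitrary choice.

```php
<?php
// True for the status codes that normally carry a Location header to follow.
function isRedirectStatus(int $status): bool
{
    return in_array($status, [301, 302, 303, 307, 308], true);
}

// Hypothetical helper: fetch $url and let cURL follow up to $maxRedirects redirects.
function fetchFollowingRedirects(string $url, int $maxRedirects = 5): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,          // off by default in cURL
        CURLOPT_MAXREDIRS      => $maxRedirects, // guard against redirect loops
    ]);
    $body = curl_exec($ch);
    $info = curl_getinfo($ch);
    return [
        'status'            => $info['http_code'],
        'final_url'         => $info['url'],            // last effective URL
        'redirects'         => $info['redirect_count'],
        'still_redirecting' => isRedirectStatus($info['http_code']),
        'body'              => $body,
    ];
}
```

If still_redirecting is true after the call, the final response was itself a redirect that cURL did not follow, for example because the redirect limit was reached.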
Twitter’s Terms of Service prohibit crawling in this manner. You should use the official developer API to retrieve search results.

PHP curl headers(?) issue

There is an addon for firefox called httprequester. (https://addons.mozilla.org/en-US/firefox/addon/httprequester/)
When I use the addon to send a GET request with a specific cookie, everything works fine.
Request header:
GET https://store.steampowered.com/account/
Cookie: steamLogin=*removed because of obvious reasons*
Response header:
200 OK
Server: Apache
... (continued, not important)
And then I am trying to do the same thing with cURL:
$ch = curl_init("https://store.steampowered.com/account/");
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: steamLogin=*removed because of obvious reasons*"));
curl_setopt($ch, CURLINFO_HEADER_OUT, 1);
$response = curl_exec($ch);
$request_header = curl_getinfo($ch, CURLINFO_HEADER_OUT);
echo "<pre>$request_header</pre>";
echo "<pre>$response</pre>";
Request header:
GET /account/ HTTP/1.1
Host: store.steampowered.com
Accept: */*
Cookie: steamLogin=*removed because of obvious reasons*
Response header:
HTTP/1.1 302 Moved Temporarily
Server: Apache
... (continued, not important)
I don't know if it has anything to do with my problem, but I noticed that the first lines of the request headers are different:
GET https://store.steampowered.com/account/
and
GET /account/ HTTP/1.1
Host: store.steampowered.com
My problem is that I get a 200 HTTP code with the addon and a 302 with cURL, even though I'm sending (or trying to send) the same request.
The page is redirecting, so you must follow the redirect:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
If I understand your problem correctly, cURL is not following the redirect. It doesn't do that by default; you need to set an option:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
With this, cURL is able to follow the redirects.
To send the cookie with the request, use CURLOPT_COOKIE, which takes name=value pairs without the "Cookie: " prefix. You may also need to pass a user agent, which has its own option:
curl_setopt($ch, CURLOPT_COOKIE, "steamLogin=*removed because of obvious reasons*");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0");
I think your addon sends the browser's user agent string by default. If you add a user agent string to your cURL request, I believe your problem will be resolved!
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
"Cookie: steamLogin=*removed because of obvious reasons*",
"User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
));

HTTP headers sent by PHP

I am using cURL and file_get_contents to find the basic differences between a request made from server-side code and an (organic) browser request.
I am requesting a phpinfo() page both ways and found that the output differs between the two cases.
For example, when I am using a browser the PHPINFO shows this:
_SERVER["HTTP_CACHE_CONTROL"] no-cache
This info is missing when I am requesting the same page through PHP.
My CURL:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/phpinfo.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0");
curl_setopt($ch, CURLOPT_INTERFACE, $testIP);
$output = curl_exec($ch);
curl_close($ch);
My file_get_contents:
$opts = array(
'socket' => array('bindto' => 'xxx.xx.xx.xx:0'),
'method' => 'GET',
'user_agent ' => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0", // this doesn't work
'header' => array('Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*\/*;q=0.8')
);
My goal:
To make a PHP request look identical to a browser request.
One way a server can detect that you are PHP code rather than a browser is by checking your cookies. Make one request to the server with PHP cURL, then send the cookies you received with your next request.
See here:
http://docstore.mik.ua/orelly/webprog/pcook/ch11_04.htm
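The "request once, then send the cookies back" idea can be sketched by hand as follows. Both function names are hypothetical, and the parsing is deliberately simple: it keeps only name=value and ignores attributes such as Path or Expires.

```php
<?php
// Collect name => value pairs from the Set-Cookie lines of a raw response header block.
function parseSetCookies(string $rawHeaders): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n/', $rawHeaders) as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only "name=value"; attributes follow the first ";".
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') === false) {
            continue;
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = $value;
    }
    return $cookies;
}

// Turn the collected cookies into a Cookie request header line for the next request.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}
```

The raw headers are available when the first request is made with CURLOPT_HEADER enabled, and the resulting line can be passed to the second request through CURLOPT_HTTPHEADER. cURL can also do this bookkeeping itself when CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE point at the same file.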
Another way a server can tell that you are a robot (PHP code) is by checking the Referer HTTP header.
You can learn more here:
http://en.wikipedia.org/wiki/HTTP_referer
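On the file_get_contents side, note that the options array in the question puts method, user_agent and header at the top level, and the key 'user_agent ' has a trailing space. PHP's HTTP stream wrapper reads these settings from under an 'http' key, which would explain the "this doesn't work" comment. A sketch of the nesting, where buildContextOptions is a hypothetical helper and '0:0' is a placeholder bind address:

```php
<?php
// Build stream context options with the HTTP settings nested under the "http" wrapper key.
function buildContextOptions(string $userAgent, string $bindTo): array
{
    return [
        'socket' => ['bindto' => $bindTo],
        'http'   => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // note: no trailing space in the key
            'header'     => [
                'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            ],
        ],
    ];
}

$options = buildContextOptions(
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0',
    '0:0' // placeholder: lets the system choose the IP and port
);
$context = stream_context_create($options);
// $html = file_get_contents('http://www.example.com/phpinfo.php', false, $context);
```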

cURL gets less cookies than FireFox! How to fix it?

How can I make cURL get all the cookies?
I thought maybe Firefox gets different cookies as the page loads, or the page has some built-in JavaScript that sets cookies after it is loaded, or maybe it redirects to other pages that set other cookies, but I don't know how to make cURL do the same thing. I set cURL to follow redirects, but still no success. cURL does set some cookies, but not all of them.
The following is the code I use in PHP:
$url = 'https://www.example.com';
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_COOKIESESSION, true);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($handle, CURLOPT_COOKIEJAR, "cookies.txt");
curl_setopt($handle, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($handle, CURLOPT_AUTOREFERER, true);
curl_setopt($handle, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$htmlContent = curl_exec($handle);
Following is from Live HTTP header in Firefox
https://www.example.com
GET /index.ext HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: JSESSIONID=3E85C5D0436D160D0623C085F68DC50E.catalog2; __utma=137925942.1883663033.1299196810.1299196810.1299198374.2; __utmz=137925942.1299196810.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); citrix_ns_id=0pQdumY48kxToPcBPS/QQC+w2vAA1; __utmc=137925942
HTTP/1.1 200 OK
Date: Fri, 04 Mar 2011 01:20:30 GMT
Server: Apache/2.2.15
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html;charset=UTF-8
I only get JSESSIONID with curl
Please help!
Possibly the page you are loading pulls in other content that actually sets the cookies, and since you are only reading one page you don't get them, or some cookies are set through JavaScript.
Try using a Firefox user agent with cURL and see if you get the same number of cookies. You should.
Use a network sniffer or a proxy to compare the requests and responses; there are differences for sure. Post the requests and responses here if you still can't find them.
If faking the user agent on the cURL side does not work, try the opposite: install a Firefox extension that fakes the user agent and set it to the one used by cURL. If that works, it may be some passive browser fingerprinting (such as p0f by lcamtuf) relying on network timing, and you may have a hard time working around it. That would be extremely surprising, though!
I figured it out. It was actually JavaScript that set cookies after the page was loaded:)
Thanks everybody
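To see which cookies cURL never received, the Cookie header captured in the browser can be compared with the names cURL collected. A small sketch with hypothetical helper names and shortened placeholder values; names that show up only on the browser side are candidates for cookies set by JavaScript (the __utm* names are the ones Google Analytics scripts typically set).

```php
<?php
// Split a Cookie request header value ("a=1; b=2") into name => value pairs.
function parseCookieHeader(string $value): array
{
    $cookies = [];
    foreach (explode(';', $value) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue;
        }
        [$name, $val] = explode('=', $pair, 2);
        $cookies[$name] = $val;
    }
    return $cookies;
}

// Names the browser sent that cURL never received.
function missingCookieNames(array $browserCookies, array $curlCookies): array
{
    return array_keys(array_diff_key($browserCookies, $curlCookies));
}

// Placeholder values; only the names matter for the comparison.
$browser = parseCookieHeader('JSESSIONID=x; __utma=x; __utmz=x; citrix_ns_id=x; __utmc=x');
$curl    = ['JSESSIONID' => 'x'];
print_r(missingCookieNames($browser, $curl));
```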
