So if I browse http://www.example.com/user1.jpg I see the user's picture.
But if I make the same request with cURL via PHP from my localhost web server (so the same IP), it throws 401 Unauthorized.
I even tried changing the user agent, still with no success.
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'http://example.com/user1.jpg',
    CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0'
));
$resp = curl_exec($curl);
echo $resp;
curl_close($curl);
What could be wrong?
I analyzed the headers with the Fiddler tool and saw 3 GET requests. The first two were 401 Unauthorized; the third was accepted without my typing any credentials (probably SSO is implemented).
It was using the NTLM authentication protocol, so running curl from the CLI with --ntlm -u username:password did the job for me.
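For completeness, the PHP equivalent of that CLI call is just a couple of extra cURL options; a minimal sketch, assuming the same placeholder URL and credentials as above:
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'http://example.com/user1.jpg',
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_HTTPAUTH => CURLAUTH_NTLM, // ask for NTLM instead of the default Basic
    CURLOPT_USERPWD => 'username:password', // same credentials as on the CLI
));
$resp = curl_exec($curl);
curl_close($curl);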
There's a specific website I want to get the source code from with PHP cURL.
Visiting this website with a browser from my computer works without any problems.
But when I want to access this website with my PHP script, the website recognizes that this is an automated request and shows an error message.
This is my PHP script:
<?php
$url = "https://www.example.com";
$user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.1 Safari/605.1.15";
$header = array('http' => array('user_agent' => $user_agent)); // note: this stream-context-style array is never passed to cURL
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
echo $data;
?>
The user agent is the same one I'm using with the browser. I'm using a local server with MAMP PRO, which means I'm using the same IP address for both browser access and PHP script access.
I already tried my PHP script with many different headers and options, but nothing worked.
There must be something that makes a PHP script's access look different from a browser's access to the web server hosting the website I want to access. But what? Do you have an idea?
EDIT
I found out that it's working with this cURL command:
curl 'https://www.example.com/' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' -H 'accept-language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7'
If I type this in e.g. the Terminal, it's showing the correct source code.
I converted it to a PHP script as follows:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
$headers = array();
$headers[] = 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3';
$headers[] = 'Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
curl_close($ch);
echo $result;
?>
Unfortunately, this way it's still showing the error message.
This means there must be something that makes a command-line access look different from a browser access to the web server hosting the website I want to access. But what is it?
There is no difference between a cURL request and the request that a browser makes, apart from the HTTP headers it sends and the fact that a browser has JavaScript running on the client.
The only thing that identifies an HTTP client is its headers -- typically the user agent string -- and seeing as you have set the user agent to exactly the same as the browser's, there must be other checks in place.
By default, cURL doesn't send an Accept header, whereas browsers request pages with this header to indicate their capabilities. I expect the web server is checking for something like this.
Chrome Developer Tools (in the Network tab) allows you to copy the whole request as a cURL command, including all the headers that were sent from Chrome, for testing in the terminal.
Try to match all the headers exactly from within your PHP, and I'm sure the web server will not be able to identify you as a script.
You should try to mimic a real browser by forging a "real" HTTP request. Add more headers than just User-Agent, such as Accept, Accept-Language, and Accept-Encoding. You probably also need to accept (and handle correctly) cookies.
If the targeted website uses JavaScript to detect a real browser, that is another challenge.
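A minimal sketch of such a forged request, assuming a writable cookie-jar path; the header values are the ones from the question:
$ch = curl_init('https://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7',
));
curl_setopt($ch, CURLOPT_ENCODING, ''); // sends an Accept-Encoding header with everything curl supports, and decodes the response
curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt');  // store cookies the site sets
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt'); // send them back on subsequent requests
$html = curl_exec($ch);
curl_close($ch);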
I am using cURL and file_get_contents to find out the basic difference between a server request for a page and a browser request (organic).
I am requesting a phpinfo page both ways and found that they give different output.
For example, when I am using a browser the PHPINFO shows this:
_SERVER["HTTP_CACHE_CONTROL"] no-cache
This info is missing when I am requesting the same page through PHP.
My cURL:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/phpinfo.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0");
curl_setopt($ch, CURLOPT_INTERFACE, $testIP); // $testIP holds the outgoing IP address to bind to
$output = curl_exec($ch);
curl_close($ch);
My file_get_contents:
$opts = array(
    'socket' => array('bindto' => 'xxx.xx.xx.xx:0'),
    'http' => array(
        // note: in the original, 'method', 'user_agent ' (with a trailing space), and 'header' sat at the top level instead of under 'http', so they were ignored
        'method' => 'GET',
        'user_agent' => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0",
        'header' => array('Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
    )
);
$context = stream_context_create($opts);
$output = file_get_contents("http://www.example.com/phpinfo.php", false, $context);
My goal:
To make a PHP request look identical to a browser request.
One possible way for the server to detect that you are PHP code and not a browser is to check your cookies. With PHP cURL, make a request to the server once and inject the cookie you get into your next request.
Check here:
http://docstore.mik.ua/orelly/webprog/pcook/ch11_04.htm
Another way the server can tell you are a robot (PHP code) is to check the Referer HTTP header.
You can learn more here:
http://en.wikipedia.org/wiki/HTTP_referer
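A minimal sketch combining both ideas, with placeholder URLs and file names: hit the site once to collect its cookies, then replay them together with a plausible Referer.
// First request: collect whatever cookies the server sets.
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_exec($ch);
curl_close($ch);
// Second request: send the stored cookies plus a Referer header.
$ch = curl_init('http://www.example.com/phpinfo.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt($ch, CURLOPT_REFERER, 'http://www.example.com/');
$output = curl_exec($ch);
curl_close($ch);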
I'm using cURL to generate HTTP requests to log in to my Twitter account, and I'm trying to follow people with it. One of the problems I'm currently facing is that the follow button doesn't seem to be a form, but just a button that sends the following request:
GET https://twitter.com/i/user/follow
Status: HTTP/1.1 200 OK
Request Headers
Accept image/webp,*/*;q=0.8
Accept-Encoding gzip,deflate,sdch
Accept-Language en-US,en;q=0.8,ar;q=0.6
User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36
What I found weird was that no ID for the user to be followed was being passed, so I have no idea how it is communicated. I thought maybe through the referer, so I tried mimicking the request from my browser by visiting a profile and then visiting that URL, but I got a 405 error.
Does anyone have any idea how Twitter currently sends requests to follow people?
This is handled using cookies. Somehow you didn't capture the Cookie header from the request. Twitter tracks the login using a cookie; that's why the simple GET request works after login and doesn't work (405 error) when you are not logged in. The steps are:
Log in using cURL and store the cookie in a file. Just capture the login request the same way you captured the follow request.
$ch = curl_init('https://twitter.com/login'); // placeholder login URL; capture the real one from your browser
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'user=username&pass=password&someparam=xyz');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'c:/temp/cookie.txt');
curl_exec($ch);
curl_close($ch); // don't forget to close curl
Now make the HTTP GET call for the follow request that you posted in the description. This time, use the cookie that you stored in the previous call.
curl_setopt($ch, CURLOPT_COOKIEFILE, 'c:/temp/cookie.txt');
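A minimal sketch of that second call, reusing the cookie file from the login step (the URL is the one from the question):
$ch = curl_init('https://twitter.com/i/user/follow');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'c:/temp/cookie.txt'); // replay the login cookies
$response = curl_exec($ch);
curl_close($ch);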
I need to get data from a webpage on a server which uses the https protocol (i.e. https://site.com/page). Here's the PHP code I've been using:
$POSTData = array('');
$context = stream_context_create(array(
    'http' => array(
        //'ignore_errors' => true,
        'user_agent' => "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
        // note: this header also sets a User-Agent; when one is present in 'header', it overrides the 'user_agent' option above
        'header' => "User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/522.11.1 (KHTML, like Gecko) Version/3.0.3 Safari/522.12.1",
        'request_fulluri' => true,
        'method' => 'POST',
        'content' => http_build_query($POSTData),
    )
));
$pageHTML = file_get_contents("https://site.com/page", false, $context);
echo $pageHTML;
However, this doesn't seem to work; it gives a Warning: file_get_contents with no information about the error. What might be the cause, and how do I work around it to connect to the server and get the page?
EDIT: Thanks to everyone who answered. My problem was that I was using an HTTP proxy, which I had removed from the code here so that it wouldn't confuse you, as I thought it couldn't possibly be the problem. To make the code load an HTTPS page via an HTTP proxy, I modified the stream_context_create I used like this:
stream_context_create(array(
'https' => array(
//...etc
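As an aside, when file_get_contents fails like this, the real HTTP error can be seen by enabling the 'ignore_errors' option (already present, commented out, in the code above) and inspecting $http_response_header; a minimal debugging sketch:
$context = stream_context_create(array(
    'http' => array('ignore_errors' => true), // return the body even on a 4xx/5xx status
));
$pageHTML = file_get_contents("https://site.com/page", false, $context);
var_dump($http_response_header); // status line and headers of the last request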
Have a look at cURL if you haven't already. With cURL you can remotely access a webpage/API/file and have it downloaded to your server. The curl_setopt() function allows you to specify whether or not to verify the certificate of the remote server:
$file = fopen("some/file/directory/file.ext", "w");
$ch = curl_init("https://site.com/page");
curl_setopt($ch, CURLOPT_FILE, $file);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); //false to disable the cert check if needed
$data = curl_exec($ch);
curl_close($ch);
fclose($file);
Something like that will allow you to connect to an HTTPS server and then download the file that you want. If you know the server has a valid certificate (i.e. you aren't developing on a server that doesn't have a valid certificate) then you can leave out the curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); line, as cURL will attempt to verify the certificate by default.
cURL also has the curl_getinfo() function that will give you details about the most recently processed transfer that will help you debug the program.
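For instance, a minimal debugging sketch using the $ch handle from above, placed before curl_close():
$data = curl_exec($ch);
if ($data === false) {
    echo 'cURL error: ' . curl_error($ch); // human-readable description of the failure
}
$info = curl_getinfo($ch); // associative array with details of the last transfer
echo 'HTTP status: ' . $info['http_code'];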
I'm trying to get the contents of an SSL ASPX page using cURL; here's my code:
$ourFileName = "cookieFIle.txt";
$ourFileHandle = fopen($ourFileName, 'w') or die("can't open file");
fclose($ourFileHandle);
function curl_get($url) {
$ch = curl_init();
$options = array(
CURLOPT_HEADER => 1,
CURLOPT_URL => $url,
CURLOPT_USERPWD => 'XXX:XXX',
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_HTTPAUTH => CURLAUTH_ANY,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13',
CURLOPT_HTTPHEADER => $header,
CURLOPT_COOKIESESSION, true,
CURLOPT_COOKIEFILE, "cookieFIle.txt",
CURLOPT_COOKIEJAR, "cookieFIle.txt"
);
curl_setopt_array($ch, $options);
$return = curl_exec($ch);
curl_close($ch);
return $return;
}
echo curl_get('https://somepage.com/intranet/loginprocess.aspx');
And whenever I run the code I receive this:
Header:
HTTP/1.1 401 Unauthorized
Content-Length: 1656
Content-Type: text/html
Server: Microsoft-IIS/6.0
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
X-UA-Compatible: IE=EmulateIE8
Date: Sat, 16 Nov 2013 19:05:18 GMT
Message:
You are not authorized to view this page
You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.
The login and password are 100% correct, and the URL is too. OpenSSL is installed on the Raspberry Pi I'm using, and cURL is enabled in php.ini.
The loginprocess.aspx redirects you to studenthome.aspx after authorisation is complete, but I think the problem is in the authorisation itself.
You are trying to connect with Basic authentication, while the server is requesting integrated Windows authentication (NTLM).
Use the option CURLAUTH_NTLM.
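A minimal sketch of the change in the question's $options array (the DOMAIN\ prefix is an assumption; some IIS servers require domain-qualified credentials):
CURLOPT_HTTPAUTH => CURLAUTH_NTLM, // request NTLM explicitly instead of CURLAUTH_ANY
CURLOPT_USERPWD => 'DOMAIN\\XXX:XXX', // credentials, possibly domain-qualified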
You should not set cookiesession to true. From the manual:
TRUE to mark this as a new cookie "session". It will force libcurl to ignore all cookies it is about to load that are "session cookies" from the previous session. By default, libcurl always stores and loads all cookies, independent if they are session cookies or not. Session cookies are cookies without expiry date and they are meant to be alive and existing for this "session" only.
Your code also has typos. It should read like this:
CURLOPT_HTTPHEADER => $header,
CURLOPT_COOKIEFILE => "cookieFIle.txt",
CURLOPT_COOKIEJAR => "cookieFIle.txt"
);