I am using cURL and file_get_contents to find out the basic difference between a server request for a page and a browser request (organic).
I am requesting a phpinfo() page both ways and found that the output differs between the two cases.
For example, when I am using a browser the PHPINFO shows this:
_SERVER["HTTP_CACHE_CONTROL"] no-cache
This info is missing when I am requesting the same page through PHP.
My CURL:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/phpinfo.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0");
curl_setopt($ch, CURLOPT_INTERFACE, $testIP);
$output = curl_exec($ch);
curl_close($ch);
My file_get_contents:
$opts = array(
'socket' => array('bindto' => 'xxx.xx.xx.xx:0'),
'http' => array(
'method' => 'GET',
// pitfall: the HTTP options must be nested under an 'http' key, and the
// option key must be exactly 'user_agent' -- a trailing space, as in
// 'user_agent ', silently fails
'user_agent' => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0",
'header' => array('Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
)
);
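For completeness, a minimal sketch of how these options would actually be applied, which the snippet above omits (the URL is a placeholder):
$context = stream_context_create($opts);
$output = file_get_contents("http://www.example.com/phpinfo.php", false, $context);
echo $output;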
My goal:
To make a PHP request look identical to a browser request.
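As for the one concrete difference observed above: the HTTP_CACHE_CONTROL entry appears because browsers send a Cache-Control: no-cache request header (typically on a refresh). A minimal sketch that adds it to the cURL request explicitly, reusing the question's own setup:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/phpinfo.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0");
// send the header the browser was sending, so the target's phpinfo()
// reports _SERVER["HTTP_CACHE_CONTROL"] for this request too
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Cache-Control: no-cache'));
$output = curl_exec($ch);
curl_close($ch);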
One possible way for a server to detect that you are PHP code rather than a browser is to check your cookies. With PHP cURL, request the page once, then inject the cookie you receive into your next request.
Check here: http://docstore.mik.ua/orelly/webprog/pcook/ch11_04.htm
Another way a server can tell you are a robot (PHP code) is to check the Referer HTTP header.
You can learn more here: http://en.wikipedia.org/wiki/HTTP_referer
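A minimal sketch of both ideas with ext/curl, using a cookie jar so the cookie from the first request is replayed automatically (the URL and cookie file path are placeholders):
$cookieJar = '/tmp/cookies.txt'; // placeholder path, must be writable
$ch = curl_init('http://www.example.com/phpinfo.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);  // write received cookies here
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar); // read cookies from here
curl_setopt($ch, CURLOPT_REFERER, 'http://www.example.com/'); // plausible referer
curl_exec($ch);           // first request: collects cookies
$output = curl_exec($ch); // second request: sends them back
curl_close($ch);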
Related
So if I'm browsing http://www.example.com/user1.jpg I see the user's picture.
But if I make the cURL request via PHP from my localhost web server (so the same IP), it throws 401 Unauthorized.
I even tried changing the user agent, still with no success.
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_URL => 'http://example.com/user1.jpg',
CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0'
));
$resp = curl_exec($curl);
echo $resp;
curl_close($curl);
What can be wrong?
I used the Fiddler tool to analyze the headers and saw 3 GET requests. The first two were 401 Unauthorized; the third was accepted without typing credentials (probably SSO is implemented).
It was using the NTLM authentication protocol, so running curl from the CLI with --ntlm -u username:password did the job for me.
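For reference, a sketch of the PHP equivalent (credentials are placeholders):
$curl = curl_init('http://example.com/user1.jpg');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HTTPAUTH, CURLAUTH_NTLM);      // negotiate NTLM auth
curl_setopt($curl, CURLOPT_USERPWD, 'username:password'); // placeholder credentials
$resp = curl_exec($curl);
curl_close($curl);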
I am trying to echo site data. For 95% of sites, file_get_contents or cURL works just fine, but for a few sites it never works, whatever I try. I tried defining a proper user agent and setting SSL verification to false, but nothing worked.
Test site where it fails with Forbidden: https://norskbymiriams.dk/
wget is also unable to copy SSL sites, even though it is compiled with SSL support (checked with wget -V).
I tried the following code; none of it worked for this particular site.
file_get_contents
$list_url = "https://norskbymiriams.dk/";
$html = file_get_contents($list_url);
echo $html;
curl
$handle=curl_init('https://norskbymiriams.dk');
curl_setopt($handle, CURLOPT_HEADER, true);
curl_setopt($handle, CURLOPT_VERBOSE, true);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36");
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
$content = curl_exec($handle);
echo $content;
Any help would be great.
Some websites analyse requests extremely well. If there is a single thing that makes the web server think you are a crawling bot, it might return 403.
I would try this: make a request from a browser, check all of its request headers, and place them in my cURL request (to simulate a real browser).
My cURL request would look like this:
curl 'https://norskbymiriams.dk/' \
-H 'Upgrade-Insecure-Requests: 1' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36' \
--compressed
Please try it; it works.
You can make a request in Chrome, for example, and use the Network tab in Developer Tools to inspect the page request. If you right-click on it, you will see "Copy as cURL".
Then test each header separately in your actual cURL request, see which one is the missing link, add it, and continue your crawling. A PHP version of the request above is sketched below.
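As a rough PHP translation of the command-line request (a sketch assuming ext/curl; setting CURLOPT_ENCODING to an empty string is the equivalent of --compressed):
$handle = curl_init('https://norskbymiriams.dk/');
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($handle, CURLOPT_ENCODING, ''); // accept any supported compression
curl_setopt($handle, CURLOPT_HTTPHEADER, array(
'Upgrade-Insecure-Requests: 1',
'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
));
$content = curl_exec($handle);
curl_close($handle);
echo $content;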
There is an addon for firefox called httprequester. (https://addons.mozilla.org/en-US/firefox/addon/httprequester/)
When I use the addon to send a GET request with a specific cookie, everything works fine.
Request header:
GET https://store.steampowered.com/account/
Cookie: steamLogin=*removed because of obvious reasons*
Response header:
200 OK
Server: Apache
... (continued, not important)
And then I am trying to do the same thing with cURL:
$ch = curl_init("https://store.steampowered.com/account/");
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: steamLogin=*removed because of obvious reasons*"));
curl_setopt($ch, CURLINFO_HEADER_OUT, 1);
$response = curl_exec($ch);
$request_header = curl_getinfo($ch, CURLINFO_HEADER_OUT);
echo "<pre>$request_header</pre>";
echo "<pre>$response</pre>";
Request header:
GET /account/ HTTP/1.1
Host: store.steampowered.com
Accept: */*
Cookie: steamLogin=*removed because of obvious reasons*
Response header:
HTTP/1.1 302 Moved Temporarily
Server: Apache
... (continued, not important)
I don't know if it has anything to do with my problem, but one thing I noticed is that the first lines of the request headers differ:
GET https://store.steampowered.com/account/
and
GET /account/ HTTP/1.1
Host: store.steampowered.com
My problem is that I get a 200 HTTP code with the addon and a 302 with cURL, even though I'm sending (or trying to send) the same request.
The page is doing a redirect, so you must follow it:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
If I understand your problem correctly, the issue is that cURL is not following the redirect. It doesn't do that by default; you need to set an option:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
With this, cURL is able to follow redirects.
To set the cookies on the request, use CURLOPT_COOKIE, and set the user agent separately with CURLOPT_USERAGENT rather than cramming it into the cookie string:
curl_setopt($ch, CURLOPT_COOKIE, "steamLogin=*removed because of obvious reasons*");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0");
I think your addon sends the browser's user agent string by default. If you add a user agent string to your cURL request, I believe your problem will be resolved:
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
"Cookie: steamLogin=*removed because of obvious reasons*",
"User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
));
I need to get data from a webpage on a server which uses the https protocol (i.e. https://site.com/page). Here's the PHP code I've been using:
$POSTData = array('');
$context = stream_context_create(array(
'http' => array(
//'ignore_errors' => true,
'user_agent' => "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
'header' => "User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/522.11.1 (KHTML, like Gecko) Version/3.0.3 Safari/522.12.1",
'request_fulluri' => true,
'method' => 'POST',
'content' => http_build_query($POSTData),
)
));
$pageHTML = file_get_contents("https://site.com/page", FALSE, $context);
echo $pageHTML;
However, this doesn't seem to work; it gives out a Warning: file_get_contents with no information on the error. What might be the cause, and how do I work around it to connect to the server and get the page?
EDIT: Thanks to everyone who answered, my problem was that I was using an HTTP proxy, which I removed from the code so that it wouldn't confuse you, as I thought it couldn't possibly have been the problem. To make the code load an HTTPS page via an HTTP proxy, I modified the stream_context_create I used like this:
stream_context_create(array(
'https' => array(
//...etc
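For illustration only, a fuller sketch of what such a proxy context might look like (the proxy address is a hypothetical placeholder; note that PHP's documented context option key is 'http' even for HTTPS URLs, so the 'https' key is specific to this reported setup):
$context = stream_context_create(array(
'https' => array(
'proxy' => 'tcp://127.0.0.1:8080', // placeholder proxy address
'request_fulluri' => true,         // send the full URI, as proxies expect
'method' => 'POST',
'content' => http_build_query(array('')),
)
));
$pageHTML = file_get_contents("https://site.com/page", false, $context);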
Have a look at cURL if you haven't already. With cURL you can remotely access a webpage/API/file and have it downloaded to your server. The curl_setopt() function allows you to specify whether or not to verify the certificate of the remote server:
$file = fopen("some/file/directory/file.ext", "w"); // the mode must be a quoted string
$ch = curl_init("https://site.com/page");
curl_setopt($ch, CURLOPT_FILE, $file);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); //false to disable the cert check if needed
$data = curl_exec($ch);
curl_close($ch);
fclose($file);
Something like that will allow you to connect to an HTTPS server and then download the file that you want. If you know the server has a valid certificate (i.e. you aren't developing on a server that doesn't have a valid certificate) then you can leave out the curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); line, as cURL will attempt to verify the certificate by default.
cURL also has the curl_getinfo() function that will give you details about the most recently processed transfer that will help you debug the program.
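For example, a quick sketch of pulling debug details after the transfer (these are standard ext/curl calls; $ch must still be open):
$info = curl_getinfo($ch); // call before curl_close()
echo $info['http_code'];   // e.g. 200, 302, 401...
echo $info['total_time'];  // seconds the transfer took
echo curl_error($ch);      // last error message, if any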
I am transferring an object array. I have a cURL client (submitter) on my own server and a listening script on someone else's server, which is not under my control. I think they are blocking the incoming cURL requests, because when I test with a normal HTML <form> it works, but not via cURL.
So I think they have put some restriction on cURL.
So my questions here are:
Can a server restrict/block incoming cURL requests?
If so, can I trick/change the HTTP header (User-Agent) in my initiating cURL script?
Or are there any other possible explanations?
Thanks!
If you are still facing the problem, then do the following.
1.
$config['useragent'] = 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0';
curl_setopt($curl, CURLOPT_USERAGENT, $config['useragent']);
curl_setopt($curl, CURLOPT_REFERER, 'https://www.domain.com/');
2.
$dir = dirname(__FILE__);
$config['cookie_file'] = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';
curl_setopt($curl, CURLOPT_COOKIEFILE, $config['cookie_file']);
curl_setopt($curl, CURLOPT_COOKIEJAR, $config['cookie_file']);
NOTE: You need a cookies folder in the script's directory.
3.
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
If doing these doesn't solve the problem, then give the sample input/output/error/etc., so that a more precise solution can be provided.
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)';
$curl=curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, $agent);
On the server side, requests can be blocked by recognizing the header fields (Referer, Cookie, User-Agent, and so on) in the HTTP request, the IP address, or the access frequency. In most cases, requests generated by a machine have something that distinguishes them from human requests, for example a missing Referer and cookies, or a higher access frequency, and rules can be written to deny such requests.
Given the above, you can try your best to simulate real requests by filling in the header fields, using a random and slower request frequency, and using more IP addresses (which admittedly sounds like an attack). A sketch of this appears below.
Generally, if you use a lower frequency, avoid putting a heavy load on their server, and follow their access rules, they will seldom block your requests.
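A minimal sketch of the "fill the headers, slow down" idea (the URLs are placeholders and the delay range is arbitrary):
$urls = array('http://example.com/page1', 'http://example.com/page2'); // placeholders
foreach ($urls as $url) {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0');
curl_setopt($ch, CURLOPT_REFERER, 'http://example.com/'); // plausible referer
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
));
$html = curl_exec($ch);
curl_close($ch);
sleep(rand(2, 5)); // random, human-like pause between requests
}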
A server cannot block only cURL requests, because they are just HTTP requests. So changing the User-Agent of your cURL request can solve your problem, as the server will think you are connecting through the browser presented in the UA.
Example of a cURL GET call in PHP that reads a remote file into a variable. The solution was found somewhere on Stack Overflow; it's not mine.
By the way, you need to be able to execute PHP code from within HTML for this example. If you want to do so, edit /etc/apache2/mods-enabled/mime.conf and add the following line at the end of the file, before the closing </IfModule> tag:
AddType application/x-httpd-php .html .htm
Verified and tested with Apache 2.4.23 and PHP 5.6.17-1 under Debian. I chose to execute PHP in an HTML file because of faster development.
Example code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<?php
$host = "https://tgftp.nws.noaa.gov/data/observations/metar/decoded/CYHU.TXT";
$agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $host);
curl_setopt($curl, CURLOPT_USERAGENT, $agent);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
$ftp_result = curl_exec($curl); // a single curl_exec() is enough to issue the request
curl_close($curl);
print_r($ftp_result);
//and the big work commencing,
//extracting text ...
$zelocation="";
$zedatetime="";
$zewinddirection="";
$zewindspeed="";
$zeskyconditions="";
$zetemp="";
$zehumidity="";
?>
</body>
</html>
I faced the same issue when I was trying to log in to a website using cURL: the server was rejecting my request until I sent the User-Agent header and the cookies returned when entering the login page. You can use this cURL library if you are not familiar with cURL.
$curl = new Curl();
$curl->setHeaders('user-agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0');
// Disable SSL verification
$curl->setOpt(CURLOPT_SSL_VERIFYPEER, '0');
$curl->post($url, $data);
$response = $curl->getRawResponse();
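For reference, a rough equivalent using plain ext/curl (a sketch; $url and $data are assumed to be defined as above):
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // disable SSL verification, as above
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($data));
$response = curl_exec($ch);
curl_close($ch);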