I'm trying to access a file from a distant server with
$fp = fsockopen($ip, $port, $errno, $errstr, 1);
fputs($fp, "GET /randomfiletoget.html HTTP/1.0\r\nUser-Agent: Mozilla\r\n\r\n");
But the server requires HTTP authentication.
How should I pass the login/password to access the file?
Thanks!
To potential downvoters: I wrote "authentification" on purpose. Change Host to your domain name. With a valid username and password, it should return the HTML output.
function authentification($url, $username, $password){
$headers = array(
// HTTP header lines use "Name: value", not "Name=value"
"Host: example.com",
"User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-us,en;q=0.5",
"Accept-Encoding: gzip,deflate",
"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
"Date: ".date(DATE_RFC822)
);
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($curl, CURLOPT_USERPWD, "$username:$password");
return curl_exec($curl);
}
As webarto mentioned in his comment, you should look into cURL; it supports basic HTTP auth among many other handy features.
For one, you should send a Host header too. Today's hosting environments need it in 99% of cases.
Basic HTTP authentication is done with headers and Base64 encoding. This Wikipedia article will tell you everything. Or use PHP's cURL functions.
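To make the header-plus-Base64 idea concrete, here is a minimal sketch that builds the raw request for the fsockopen approach from the question. The function name buildBasicAuthRequest and its parameters are made up for illustration; only the Authorization: Basic format itself is standard.

```php
<?php
// Build a raw HTTP/1.0 request that carries Basic credentials.
// $host, $path, $user and $pass are illustrative parameters.
function buildBasicAuthRequest(string $host, string $path, string $user, string $pass): string
{
    // Basic auth is "user:password", Base64-encoded, sent in an Authorization header.
    $credentials = base64_encode($user . ':' . $pass);

    return "GET {$path} HTTP/1.0\r\n"
        . "Host: {$host}\r\n"
        . "User-Agent: Mozilla\r\n"
        . "Authorization: Basic {$credentials}\r\n"
        . "Connection: close\r\n"
        . "\r\n";
}

// Usage with the socket from the question (not executed here):
// $fp = fsockopen($ip, $port, $errno, $errstr, 1);
// fwrite($fp, buildBasicAuthRequest('example.com', '/randomfiletoget.html', 'login', 'secret'));
```

Base64 is an encoding, not encryption, so over plain HTTP the credentials travel in readable form.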
Related
I am trying to log in to a remote site using cURL (before doing some data scraping).
Using the following code, I produce a cookie.txt file that contains the following:
# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
#HttpOnly_www.xxx.com FALSE / TRUE 0 xxxv5 h_r4hXtn-gNAilZwhvHjYdE3Vr4HewhxtGrxja57LbW03-M9MLNqZSeiW7lQ2wRT9lZypNsAiX0gS0Ev1PrvNkGLmwL3B8ZmyOUMLYbTYbSW0y_aPGrIFlEp4skDzh0GJGIGtFHisCmQjEMlu0CJr0UEw2rCT9jbjzg0IyOnFYxNffaMPo229NZWV7HDfCK5M1_y6MPNvW_Kt-h4qTy8YmqGbfBwKxB-bulV78MSXU9ZWz_DVvdu6jXfPiHwCBDMV8FFBLaXm5rqYgNzvbsq8JLe1xkTPn1PNJhyizUa-hlwB6ev8HNwIwBpzs7406l6mL3VgyrDJpay6bHNoMtjh4fLwI7KapFANhFHfn57mg4
#HttpOnly_www.xxx.com FALSE / TRUE 0 ASP.NET_SessionId txakhdi15oeqxyfq53f44dts
When I log in to the website manually, the cookie names are the same, so I think the login request is being made (otherwise the cookies would not be created). But when I output
echo 'HELLO html1 = '.$html1;
I see the page telling me I have entered the wrong username and password.
Code as follows:
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$username = 'xxx';
$password = 'xxx';
// echo 'STARTING';
//login form action url
$url="https://www.xxxx.com/Login";
$postinfo = "username=".$username."&password=".$password;
$cookie_file_path = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
//set the cookie the site has for certain features, this is optional
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS,5); // return into a variable
// curl_setopt($ch, CURLOPT_UPLOAD, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST" );
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
// set content length
$headers[] = 'Content-length: 0';
$headers[] = 'Transfer-Encoding: chunked';
curl_setopt($ch, CURLOPT_HTTPHEADER , $headers);
$html1 = curl_exec($ch);
echo 'HELLO html1 = '.$html1;
I cannot show the site for security reasons (which may be a killer).
Can anyone point me in the right direction?
First off, this won't work: ini_set('display_startup_errors', 1);
The startup phase has already finished before userland PHP code starts to run, so the setting is applied too late. It must be set in the php.ini config file. (Not strictly true, but close enough: on Windows you can use registry hacks to enable it, you can set it in .user.ini files, and so on. More info here: http://php.net/manual/en/configuration.php )
Second, the obvious error here is that you don't urlencode $username and $password in $postinfo = "username=".$username."&password=".$password;
If the username or password contains any character with a special meaning in urlencoded format, you'll send the wrong credentials and won't get logged in (this includes &, =, #, spaces, and many other characters). A fixed version would look like $postinfo = "username=".urlencode($username)."&password=".urlencode($password);
Third, don't use CURLOPT_CUSTOMREQUEST for POST requests; just use CURLOPT_POST.
Fourth, your Content-length header is outright lying. The correct length is actually 'Content-length: '.strlen($postinfo), which with your code is definitely not 0. But you shouldn't set this header at all: curl will set it for you if you don't, and unlike you, curl won't get the size calculation wrong, so get rid of the entire line.
Fifth, this line is also wrong:
$headers[] = 'Transfer-Encoding: chunked';
Your curl code here is NOT using chunked transfers, and if it were, curl would send that header automatically, so get rid of it.
Sixth, don't just call curl_setopt. If there's an error setting any of your options, curl_setopt returns bool(false). You should watch out for such errors, use curl_error to extract the error message, and throw an exception if one occurs, instead of silently ignoring curl_setopt errors as your code does right now. Use something like
function ecurl_setopt($ch, int $option, $value){
    if(!curl_setopt($ch, $option, $value)){
        throw new \RuntimeException('curl_setopt failed!: '.curl_error($ch));
    }
}
If fixing all of these problems is not enough to log in, you're not giving us enough information to help you any further. What does the browser's HTTP login request look like? Or what is the login URL?
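Pulling the points above together, a login request might be sketched roughly like this. The function names buildLoginBody and postLogin are hypothetical, and the field names username and password are taken from the question rather than from the real login form, which may well use different ones.

```php
<?php
// Encode the login fields; http_build_query() urlencodes every value.
function buildLoginBody(string $username, string $password): string
{
    return http_build_query(['username' => $username, 'password' => $password]);
}

// Hypothetical wrapper: POST the body and return the response HTML.
function postLogin(string $url, string $body, string $cookieFile): string
{
    $ch = curl_init();
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,   // no CURLOPT_CUSTOMREQUEST needed
        CURLOPT_POSTFIELDS     => $body,  // cURL works out Content-Length itself
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
    ];
    // Fail loudly instead of silently ignoring a rejected option.
    if (!curl_setopt_array($ch, $options)) {
        throw new RuntimeException('curl_setopt_array failed: ' . curl_error($ch));
    }
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('curl_exec failed: ' . curl_error($ch));
    }
    return $html;
}

// Usage (not executed here):
// echo postLogin('https://www.xxxx.com/Login', buildLoginBody($username, $password), 'cookie.txt');
```

No custom Content-length or Transfer-Encoding header is set; cURL adds whatever the request actually needs.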
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$username = 'xxx';
$password = 'xxx';
//login form action url
$url="https://www.xxxx.com/Login";
$postinfo = array("username"=>$username,"password"=>$password);
$cookie_file_path = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
curl_setopt($ch,CURLOPT_COOKIEFILE,$cookie_file_path);
curl_setopt($ch,CURLOPT_COOKIEJAR,$cookie_file_path);
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
$html = curl_exec($ch);
echo $html;
The code above should work.
If there is still an issue, check the permissions on the cookie.txt file.
Also, if the form sends any hidden data along with the POST, you can inspect it with the Live HTTP Headers plugin for Firefox.
It is not as simple as reading the HTML page using curl. You need to supply a POST value for the submit button. If any JavaScript runs before the form's ACTION script is invoked, that has to be looked at as well.
Usually you get better results if you use Selenium. See http://www.seleniumhq.org/
EDIT1:
If the server is rejecting your POST string, try: curl_setopt($handle, CURLOPT_POSTFIELDS, http_build_query($data));
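One way to find the submit-button value and any hidden fields is to fetch the login page first and read the form's inputs. The sketch below assumes the login form is plain HTML (no JavaScript-generated fields); extractFormFields is a hypothetical helper, and __VIEWSTATE appears only as an example of the kind of hidden field some ASP.NET pages carry.

```php
<?php
// Collect name => value pairs for every named <input> inside a <form>.
function extractFormFields(string $html): array
{
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//form//input[@name]') as $input) {
        // getAttribute() returns '' when the input has no value attribute.
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Usage (not executed here): overwrite the credential fields, keep the rest.
// $data = array_merge(extractFormFields($loginPageHtml),
//                     ['username' => $username, 'password' => $password]);
// curl_setopt($handle, CURLOPT_POSTFIELDS, http_build_query($data));
```

If the fields are filled in by JavaScript, this will not see them, which is where a browser-driving tool such as Selenium comes in.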
I apologize in advance for my English. I have a small problem.
I want to get the final effective URL of this page:
streamuj.tv/video/00e276bf5841bf77c8de?streamuj=original&authorize=ac13bb77d3d863ca362315b9b4dcdf3e
When I open the link in a browser, it takes me to an .flv file.
But when I request it through PHP, I get s3.streamuj.tv/unauthorized.flv
When I try it through this: getlinkinfo.com/info?link=http%3A%2F%2Fwww.streamuj.tv%2Fvideo%2F00e276bf5841bf77c8de%3Fstreamuj%3Doriginal%26authorize%3Dac13bb77d3d863ca362315b9b4dcdf3e&x=49&y=11
everything is fine, and it reports:
s4.streamuj.tv:8080/vid/d0fe77e1020b6414a16aa5316c759add/58aaf1dd/00e276bf5841bf77c8de_hd.flv?start=0
My PHP CODE:
<?php
session_start();
include "simple_html_dom.php";
$proxy = array("189.3.93.114:8080");
$proxyNum = 0;
$proxy = explode(':', $proxy[$proxyNum]);
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://www.streamuj.tv/video/00e276bf5841bf77c8de?streamuj=original&authorize=ac13bb77d3d863ca362315b9b4dcdf3e');
curl_setopt($curl, CURLOPT_FILETIME, true);
curl_setopt($curl, CURLOPT_NOBODY, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_PROXY, $proxy[0]);
curl_setopt($curl, CURLOPT_PROXYPORT, $proxy[1]);
$header = curl_exec($curl);
$info = curl_getinfo($curl);
curl_close($curl);
$u1 = $info['url'];
echo "u1: $u1<br>";
$u2 = str_replace("flv?start=0","flv",$u1);
echo $u2;
?>
Where is the problem? Why does it return unauthorized.flv?
Solution
The server was checking client legitimacy via the User-Agent HTTP header.
Sending a custom User-Agent solved the problem.
curl_setopt($curl, CURLOPT_HTTPHEADER, array( 'user-agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2950.0 Iron Safari/537.36' ));
Original post:
Most likely the generated .flv URL does not point to a static location. It probably depends on a session ID plus cookie, an IP check, or both.
Without knowing which headers the request has to carry, you probably won't get a relevant response from cURL.
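For reference, the User-Agent fix from the solution can be sketched as below. It uses CURLOPT_USERAGENT instead of a hand-written header line and reads the final URL with CURLINFO_EFFECTIVE_URL. The function names buildProbeOptions and finalUrl are illustrative, and the sketch leaves out the proxy from the question.

```php
<?php
// Options for a body-less probe that follows redirects with a browser-like User-Agent.
function buildProbeOptions(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_NOBODY         => true,   // headers only, as in the question
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

// Return the URL cURL ended up at after following all redirects.
function finalUrl(string $url, string $userAgent): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildProbeOptions($url, $userAgent));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

// Usage (not executed here):
// echo finalUrl($videoUrl, 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36');
```

Whether a given server also checks cookies or the client IP is not something this sketch can answer; it only covers the User-Agent check described above.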
I'm using cURL with PHP to connect to my university's admin website and provide a mobile-friendly interface to my information, such as grades. Basically, I have a form where I enter my login details; my code uses these credentials to connect to the admin website and returns the information I want, which I then display in a mobile-friendly layout.
The problem is that the connection doesn't work and I get a blank page, except when I have logged in to my account from another device before attempting to connect with my program. That is, if I accessed my account normally this morning (and have logged out since), my code will work, but tomorrow it won't unless I first access my account normally.
I've been studying the connection process with Chrome's developer tools again and again, and I don't know where I'm going wrong. My only suspicion is that the first page loaded with the submitted credentials returns a 302 Found and cURL does not follow the redirect. However, that first page returns a connection cookie, which I assumed was the only thing needed to be logged in correctly. Maybe the created cookie is only registered in the server's database on the second page, so that it is accepted afterwards...
Here is my code:
$lien = 'https://isa.epfl.ch/imoniteur_ISAP/!logins.tryToConnect';
$login = $_POST['login'];
$password = $_POST['passwd'];
$postfields = array(
'ww_x_username' => $login,
'ww_x_password' => $password,
'ww_x_urlAppelant' => ''
);
$path_cookie = 'cookie.txt';
if (!file_exists(realpath($path_cookie))) touch($path_cookie);
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $lien);
curl_setopt($curl, CURLOPT_COOKIESESSION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $postfields);
curl_setopt($curl, CURLOPT_COOKIEJAR, realpath($path_cookie));
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
$host = array(
0 => 'Host: '. parse_url("http://isa.epfl.ch", PHP_URL_HOST),
1 => 'Referer: https://isa.epfl.ch/imoniteur_ISAP/!logins.htm',
2 => 'Origin: https://isa.epfl.ch'
);
curl_setopt($curl, CURLOPT_HTTPHEADER, $host);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13');
$return = curl_exec($curl);
curl_close($curl);
unset($curl);
I hope someone can help me! I'll be happy to give more details if necessary!
Thanks
Romain
I've solved the issue! It was pretty simple, maybe so simple that I didn't see the solution earlier...
If someone wants more explanation, I'll be happy to give it.
I need PHP to submit parameters from one domain to another. JavaScript is not an option in my situation. I'm now trying to use cURL with PHP, but have not been successful in getting around the cross-domain restriction.
From domain_A, I have a page with the following PHP cURL script:
if (_iscurl()){
echo "<p>CURL is enabled</p>";
$url = "http://domain_B/process.php?id=123&amt=100&jsonp=?";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT,10);
curl_setopt($ch, CURLOPT_USERAGENT , "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)");
curl_setopt($ch, CURLOPT_URL, $url );
$return = curl_exec($ch);
curl_close($ch);
echo "<p>Finished operations</p>";
}
else{
echo "CURL is disabled";
}
?>
I am not getting any results, so I am assuming that the PHP cURL script is not successful. Any ideas on how to fix this?
Thanks
Well, it's a bit late, but I'm adding this answer for future readers who might face a similar issue. This issue sometimes arises when we send a PHP cURL request from a domain hosted over HTTP to a domain hosted over HTTPS (HTTP over SSL).
Just add the code snippet below before executing cURL.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
With CURLOPT_RETURNTRANSFER set to false, curl_exec does not return the response; it prints it directly. Set it to true (or 1):
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
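A small sketch of the same request with the response captured and errors surfaced, so that "no results" turns into a readable message. Note that _iscurl() is not a PHP built-in; the version here is a guess at what the question's helper does, and buildProcessUrl and fetchBody are hypothetical names. The jsonp=? parameter is dropped because it looks like a jQuery convention that means nothing to a server-side request.

```php
<?php
// Hypothetical implementation of the _iscurl() helper used in the question.
function _iscurl(): bool
{
    return function_exists('curl_init');
}

// Build the target URL from its parameters, urlencoding each value.
function buildProcessUrl(string $base, array $params): string
{
    return $base . '?' . http_build_query($params);
}

// Fetch a URL and return the body, throwing with cURL's own message on failure.
function fetchBody(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
    }
    return $body;
}

// Usage (not executed here):
// if (_iscurl()) {
//     echo fetchBody(buildProcessUrl('http://domain_B/process.php', ['id' => 123, 'amt' => 100]));
// }
```

The same-origin policy is enforced by browsers, so a request made by PHP on the server is not subject to it; if this still fails, the exception message should say why.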
Recently I moved my scraping code with Curl to CodeIgniter. I'm using Curl CI library from http://philsturgeon.co.uk/code/codeigniter-curl. I put the scraping process in a controller and then I found the execution time of my scraping is slower than the one I built in plain PHP.
It took 12 seconds for CodeIgniter to output the result, whereas it only takes 6 seconds in plain PHP. Both timings include the parsing step with the HTML DOM parser.
Here's my Curl code in CodeIgniter:
function curl($url, $postdata=false)
{
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
$this->curl->create($url);
$this->curl->ssl(false);
$options = array(
'URL' => $url,
'HEADER' => 0,
'AUTOREFERER' => true,
'FOLLOWLOCATION' => true,
'TIMEOUT' => 60,
'RETURNTRANSFER' => 1,
'USERAGENT' => $agent,
'COOKIEJAR' => dirname(__FILE__) . "/cookie.txt",
'COOKIEFILE' => dirname(__FILE__) . "/cookie.txt",
);
if($postdata)
{
$this->curl->post($postdata, $options);
}
else
{
$this->curl->options($options);
}
return $this->curl->execute();
}
Non-CodeIgniter (plain PHP) code:
function curl($url ,$binary=false,$post=false,$cookie =false ){
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt ($ch, CURLOPT_URL, $url );
curl_setopt ($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
if($cookie){
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . "/cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . "/cookie.txt");
}
if($binary)
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
if($post){
// initialise the string before appending to it
$post_array_string1 = '';
foreach($post as $key=>$value)
{
$post_array_string1 .= $key.'='.$value.'&';
}
$post_array_string1 = rtrim($post_array_string1,'&');
//set the url, number of POST vars, POST data
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_array_string1);
}
return curl_exec ($ch);
}
Does anyone know why the CodeIgniter cURL version is slower? Or could it be because of the simple_html_dom parser?
I'm not sure I know the exact answer for this, but I have a few observations about Curl & CI as I use it extensively.
Check for the state of DNS caches/queries.
I noticed a substantial speedup when code was uploaded to a hosted staging server from my dev desktop. It was traced to a DNS issue that was solved by rebooting a bastion host... You can sometimes check this by using IP addresses instead of hostnames.
Phil's 'library' is really just a wrapper.
All he's really done is map CI-style functions to the PHP Curl library. There's almost nothing else going on. I spent some time poking around (I forget why) and it was really unremarkable. That said, there may well be some general CI overhead - you might see what happens in another similar framework (Fuel, Kohana, Laravel, etc).
Check your reverse lookup.
Some APIs do reverse DNS checks as part of their security scanning. Sometimes hostnames or other headers are badly set in buried configs and can cause real headaches.
Use Chrome's Postman extension to debug REST APIs.
No comment, it's brilliant - https://github.com/a85/POSTMan-Chrome-Extension/wiki and you have fine grained control of the 'conversation'.
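To check whether DNS is where the time goes, the timing fields that curl_getinfo() returns after a transfer can be broken into phases. This is a rough sketch: summarizeTimings is a hypothetical helper, and the breakdown is only meaningful for a single request without redirects.

```php
<?php
// Split a cURL transfer into phases using the array returned by curl_getinfo($ch).
function summarizeTimings(array $info): array
{
    return [
        'dns'      => $info['namelookup_time'],
        'connect'  => $info['connect_time'] - $info['namelookup_time'],
        'wait'     => $info['starttransfer_time'] - $info['pretransfer_time'],
        'download' => $info['total_time'] - $info['starttransfer_time'],
        'total'    => $info['total_time'],
    ];
}

// Usage with the plain PHP version (not executed here):
// curl_exec($ch);
// print_r(summarizeTimings(curl_getinfo($ch)));
```

A large 'dns' value in one environment and not the other would point at name resolution rather than at CodeIgniter.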
I would have to know more about the CI library and whether it does any extra work on the gathered data, but I would try renaming your method to something other than the library name. I have had issues with the Facebook library where calling it from a method named facebook caused problems. With $this->curl, it could be ambiguous whether you mean the library or the method.
Also, try adding the debug profiler and see what it comes up with. Add this either in the constructor or in the method:
$this->output->enable_profiler(TRUE);
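Outside the profiler, the fetch and the parse can also be timed separately in both versions to see which one accounts for the extra seconds. A minimal sketch using plain PHP; timeCall is a hypothetical helper, not part of CodeIgniter.

```php
<?php
// Run a callable and return its result plus the elapsed wall-clock seconds.
function timeCall(callable $fn): array
{
    $start = hrtime(true);                     // monotonic clock, in nanoseconds
    $result = $fn();
    $seconds = (hrtime(true) - $start) / 1e9;
    return [$result, $seconds];
}

// Usage inside the controller (not executed here):
// [$html, $fetchSeconds] = timeCall(fn() => $this->curl($url));
// [$dom, $parseSeconds]  = timeCall(fn() => str_get_html($html));
```

If the parse time is the same in both setups, the difference lies in the request itself rather than in simple_html_dom.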