I'm using cURL to get some data from a remote server. The response is in JSON format.
This is my code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER , 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($ch, CURLOPT_URL, 'http://www.myaddress.com/mypage.php');
curl_setopt($ch, CURLOPT_POSTFIELDS, array("id" => $id));
$return = curl_exec($ch);
curl_close($ch);
If I access the link in the browser the page loads OK, but if I access it through cURL it returns a 404 error.
I can guess at a few things the server side might be checking that would produce the error.
1) As stated in other answers, be sure to set all the necessary headers. You can inspect the headers your browser sends with a tool such as Firebug,
or fetch the headers the server sends back with PHP's get_headers function.
To set a header, use
curl_setopt($ch, CURLOPT_HTTPHEADER, array("HeaderName: HeaderValue"));
2) When you open a page in the browser (other than by submitting a form with the POST method), it makes a GET request rather than a POST, so if the server side checks $_GET, your POST data will not be considered.
3) If you are sure that it should be a POST request (say, it is a form submit), then the following can be a problem: some forms have hidden fields that are also checked on the server, and if they are not set, an error can be returned. So you should look at the source code of the form and add them (if there are any) to your POST parameters.
4) If you are submitting a form, be sure to send the submit button's name and value as well, because, like hidden fields, this can also be checked.
5) Cookies can be a problem as well, because a browser handles them by default and cURL does not. To be able to set and read cookies, use this code:
// set cookie
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
// use cookie
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
Here, $cookie_file is the path to the cookie file. I do not know about Linux or Mac, but on Windows be sure to use an absolute path to the cookie file.
6) Also, you can set the referer by
curl_setopt($ch, CURLOPT_REFERER, 'http://www.myaddress.com/mypage.php');
EDIT: For an AJAX request you might want to add an X-Requested-With header with the value XMLHttpRequest.
It's possible that the server checks the HTTP headers; that is the case most of the time.
So send the same HTTP headers as your browser does; you can check them with Firebug:
curl_setopt($ch, CURLOPT_HTTPHEADER, array('SomeName: SomeValue'));
Probably the browser is sending something that your cURL code is not. You can use any of the tools other folks have suggested: Firebug, Wireshark, Fiddler, etc.
What you need to do is add missing pieces to your request to match the browser as closely as possible in the cURL request until the remote page responds with a 200.
I notice you're doing a POST. In many cases what happens with your browser is you visit a page with a GET request. A session is initialized on the remote site and a cookie is saved in your browser with the session id.
This cookie then needs to be supplied along with subsequent POST requests. PHP cURL has many options to support cookies. There may be other requirements such as CSRF tokens and so forth.
Again, reverse-engineering is the key.
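To make the GET-then-POST idea a bit more concrete, here is a minimal sketch. The function names buildHeaderLines and getThenPost are made up for illustration, and it assumes the form posts back to the same URL it was loaded from, which may not be true for your site. Note that passing an array to CURLOPT_POSTFIELDS sends multipart/form-data, while a string from http_build_query sends a normal urlencoded form body, which is closer to what a simple browser form does.

```php
<?php
// Hypothetical helper: turn ['Name' => 'Value'] into the "Name: Value"
// strings that CURLOPT_HTTPHEADER expects.
function buildHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Hypothetical helper: GET the page first so the server can set its session
// cookie, then POST to the same URL on the same handle so the cookie is sent.
function getThenPost(string $url, array $fields, array $headers, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies, enable cookie handling
        CURLOPT_COOKIEJAR      => $cookieFile, // written when the handle is closed or freed
        CURLOPT_HTTPHEADER     => buildHeaderLines($headers),
    ]);
    curl_exec($ch); // step 1: plain GET, response body ignored

    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $body   = curl_exec($ch); // step 2: POST, session cookie included
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return ['status' => $status, 'body' => $body];
}
```

You would call it as getThenPost($url, array('id' => $id), array('X-Requested-With' => 'XMLHttpRequest'), $cookie_file) and keep adding headers or fields until the status comes back as 200.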
I'm trying to login to an external webpage using a php script with cURL. I'm new to cURL, so I feel like I'm missing a lot of pieces. I found a few examples and modified them to allow access to https pages. Ultimately, my goal is to be able to login to the page and download a .csv by following a specified link once logged in. So far, what I have is a script that tests logging in to the page; the script is shown below:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.websiteurl.com/login');
curl_setopt($ch, CURLOPT_POSTFIELDS,'Email='.urlencode($login_email).'&Password='.urlencode($login_pass).'&submit=1');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "https://www.websiteurl.com/login");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
$output = curl_exec($ch);
I have a few questions. First, is there a reason this does not redirect on its own? The only way for me to view the contents of the page is to
echo $output
even though CURLOPT_RETURNTRANSFER and CURLOPT_FOLLOWLOCATION are both set to True.
Second, the URL for the page stays at "localhost/folderName/test.php" instead of directing to the actual website. Can anyone explain why this happens? Because the script doesn't actually redirect to a logged in webpage, I can't seem to do anything that I need to do.
Does my issue have to do with cookies? My cookies.txt file is in the same folder that my .php script is. (I'm using wampServer btw). Should it be located elsewhere?
Once I'm able to fix these two issues, it seems that all I need to be able to do is to redirect to the link that start the download process for the .csv file.
Thanks for any help, much appreciated!
Answering part of your question:
From http://php.net/manual/en/function.curl-setopt.php :
CURLOPT_RETURNTRANSFER TRUE to return the transfer as a string of the
return value of curl_exec() instead of outputting it out directly.
In other words - doing exactly what you described. It's returning the response to a string and you echo it to see it. As requested...
----- EDIT-----
As for the second part of your question - when I change the last three lines of the script to
$output = curl_exec($ch);
header('Location:'.$website);
echo $output;
The address of the page as displayed changes to $website - which in my case is the variable I use to store my equivalent of your 'https://www.websiteurl.com/login'
I am not sure that is what you wanted to do - because I'm not sure I understand what your next steps are. If you were getting redirected by the login site, wouldn't the new address be part of the header that is returned? And wouldn't you need to extract that address in order to perform the next request (wget or whatever) in order to download the file you wanted to get?
To read it from the headers, you need to set CURLOPT_HEADER to TRUE.
You can also get the URL where you ended up from
$last_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
(see "cURL, get redirect url to a variable").
The same link also has a useful script for completely parsing the header information (returned when CURLOPT_HEADER == true). It's in the answer by nico limpica.
Bottom line: CURL gets the information that your browser would have received if you had pointed it to a particular site; that doesn't mean your browser behaves as though you pointed it to that site...
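For the header-parsing part, a rough sketch of my own (not the script from the linked answer) could look like this. The function name parseLastHeaderBlock is hypothetical. It assumes you already have the raw header text as a string, and that when redirects are followed the header blocks are separated by blank lines, so only the last block is parsed.

```php
<?php
// Hypothetical helper: parse the final block of raw HTTP response headers
// into an array keyed by lower-cased header name.
function parseLastHeaderBlock(string $rawHeaders): array
{
    // Normalise line endings, then split into blocks separated by blank lines.
    $normalised = str_replace("\r\n", "\n", trim($rawHeaders));
    $blocks     = preg_split("/\n\n+/", $normalised);
    $lastBlock  = end($blocks);

    $headers = [];
    foreach (explode("\n", $lastBlock) as $i => $line) {
        if ($i === 0) {
            $headers['status_line'] = $line; // e.g. "HTTP/1.1 200 OK"
            continue;
        }
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $headers;
}
```

With CURLOPT_HEADER set to TRUE the headers come back in front of the body, so something like substr($output, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE)) should give you the raw header text to pass in. If a header appears more than once, this sketch keeps only the last value.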
So how can I check using codeigniter if the client is curl, and then return something different for it?
You can fake the user-agent when using cURL, so it's pointless depending on the user-agent sent when you KNOW it's a cURL request.
For example: I recently wrote an app which gets the pagerank of a url from google. Now Google doesn't like this, so it allows only a certain user agent to access its pagerank servers. Solution? Spoof the user-agent using cURL and Google will be none the wiser.
Moral of the story: cURL user agents are JUST NOT reliable.
If you still want to do this, then you should be able to get the passed user agent just like normal
$userAgent=$_SERVER['HTTP_USER_AGENT'];
EDIT A quick test proved this:
dumpx.php:
<?php
$url="http://localhost/dump.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
if (isset($_GET['u']) && $_GET['u'] == 'y') {
curl_setopt($ch, CURLOPT_USERAGENT, "booyah!");
}
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
//curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
curl_setopt ($ch, CURLOPT_HEADER, 0);
$exec=curl_exec ($ch);
?>
dump.php:
<?php
var_dump($_SERVER);
?>
Case 1: http://localhost/dumpx.php?u=y
'HTTP_USER_AGENT' => string 'booyah!' (length=7)
Case 2: http://localhost/dumpx.php?u=n
No $_SERVER['HTTP_USER_AGENT']
This proves that there is no default user agent for cURL when used from PHP like this: it will just not pass one in the request header.
If you want to detect bots you cannot rely on the user agent. Best practices are:
Check that your visitor runs JS (though not all human users do).
Check that your visitor loads the additional files linked from the web page (CSS, images, etc.).
Check visitor timing. Humans usually don't load 10 pages per second.
cURL stands for Client URL Library, and the whole point of it is to be able to make requests that are identical to what a client would make.
The only thing you can do is detect the information that is part of the request, such as the IP address, HTTP Request Headers, cookies/session id cookie, URL (path/page), and any post/get data. If the person using curl to make the request is doing it from an expected IP address and is supplying any expected header/cookie/token/URL/post/get values, then you would not be able to distinguish a curl request from a browser making the request.
You can spoof or set a custom user agent header when using cURL, so it wouldn't be reliable.
Otherwise, you can do this:
if (strpos(strtolower((string) $this->input->server('HTTP_USER_AGENT', true)), 'curl') !== false)
{
// Is using cURL
}
This would only occur if the cURL request contained curl in the user agent header.
As far as I know, there is no default user agent set when doing a curl request.
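If you still want a check outside of CodeIgniter, a plain PHP version might look like the sketch below. The function name looksLikeScriptedClient is made up, and treating a missing User-Agent as "probably a script" is my own assumption based on the test above. As already said, anyone can spoof the header, so take the result as a hint at best.

```php
<?php
// Hypothetical helper: guess whether a request came from a script.
// Returns true when the User-Agent is missing/empty or mentions "curl".
function looksLikeScriptedClient(?string $userAgent): bool
{
    if ($userAgent === null || trim($userAgent) === '') {
        return true; // nothing sent at all, like Case 2 in the test above
    }
    return stripos($userAgent, 'curl') !== false;
}

// The key is simply absent when the client sends no User-Agent header.
$userAgent  = $_SERVER['HTTP_USER_AGENT'] ?? null;
$isScripted = looksLikeScriptedClient($userAgent);
```

A request sent with a spoofed agent such as "booyah!" or a copied browser string passes this check as a normal visitor, which is exactly why it is not reliable.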
I've been banging my head against a wall for a few hours now - and it's probably something really obvious I've missed!
I'm trying to connect to a payment service provider (PSP) using CURL, post data and follow the post so the user actually ends up on the PSP's site.
Using the following:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://psp.com/theirpage');
curl_setopt($ch, CURLOPT_REFERER, "http://mysite.com/mypage");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$params);
curl_setopt($ch, CURLOPT_POST, 1);
$result=curl_exec($ch);
curl_close($ch);
This successfully connects, verifies the data I've passed, but instead of redirecting the user to the PSP, it just loads the HTML on my site. Safe mode is off, and open_basedir is blank.
What am I doing wrong?
cURL performs the redirect internally, and it won't have any effect on the user viewing the output of your cURL script. Keep in mind that the payment was made by your server, NOT the user's computer, so expecting the session to work for the user is incorrect. cURL 'is the browser'.
If you just want a redirect after the payment is made via cURL, you will have to do it via header() or with some JS such as window.location.
The cURL request is being made from your server, and as such your server is receiving the response page. There's no way to initiate the request from the server and have the client receive the response. Either return the HTML to the user from your site (as you're doing), or make the request from the client's browser using JavaScript. Hope that helps.
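One way to make the request from the client's browser is to output a form that posts itself. This is only a sketch: renderAutoPostForm is a hypothetical name, the form id "psp" is arbitrary, and it assumes the PSP accepts an ordinary form POST from the browser with the same parameters you were sending through cURL.

```php
<?php
// Hypothetical helper: build an HTML form that submits itself to the PSP,
// so the POST comes from the user's browser instead of from your server.
function renderAutoPostForm(string $action, array $fields): string
{
    $inputs = '';
    foreach ($fields as $name => $value) {
        $inputs .= sprintf(
            '<input type="hidden" name="%s" value="%s">',
            htmlspecialchars((string) $name, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
        );
    }

    // The noscript button is a fallback for users without JavaScript.
    return sprintf(
        '<form id="psp" method="post" action="%s">%s'
        . '<noscript><button type="submit">Continue</button></noscript></form>'
        . '<script>document.getElementById("psp").submit();</script>',
        htmlspecialchars($action, ENT_QUOTES, 'UTF-8'),
        $inputs
    );
}
```

You would then echo renderAutoPostForm('https://psp.com/theirpage', $params) where $params is an array of field names and values. Keep in mind that anything placed in the form is visible to the user, so this does not suit parameters that have to stay secret.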
I have the following code to login into an external site application (asp.net app) from a local site login form (written in php):
<?php
$curl_connection = curl_init('www.external.com/login.aspx');
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT,
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
// Post data array
$post_data['LoginControl$UserName'] = 'ExampleUName';
$post_data['LoginControl$Password'] = 'ExamplePWord';
// Add form fields into an array to get ready to post
foreach ($post_data as $key => $value)
{
$post_items[] = $key . '=' . $value;
}
$post_string = implode ('&', $post_items);
// Tell cURL which string to post
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);
// Execute and post
$result = curl_exec($curl_connection);
?>
I get directed to the login form of the external site instead of being directed to the application logged in. I think the problem is that I need to pass the viewstate values through, but I'm not sure how to go about doing that.
I don't have control over the external application. But we want users to be able to login to the application through our website, to maintain branding etc.
I've posted a couple of other threads recently about the use of php cURL, but I'm at the stage now where I think the viewstate is the problem ...
Thanks, Mark.
This seems to be a real problem when trying to scrape ASP.NET pages.
The pages contain a hidden field named "__VIEWSTATE" which contains a base64-encoded set of values holding some or all of the page state when the page was sent. It usually also contains the SHA1 of the viewstate.
What this means is that your post must contain everything in the __VIEWSTATE or it will fail.
I have been able to post a simple login page that has only 2 fields but not a more complex page in which the author has chosen to put the entire page state in the viewstate.
As yet I have not been able to come up with a solution.
Change:
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
To:
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
You also need to set up a cookie file, take a look at CURLOPT_COOKIEFILE
CURLOPT_COOKIEFILE:
The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
CURLOPT_COOKIE:
The contents of the "Cookie: " header to be used in the HTTP request. Note that multiple cookies are separated with a semicolon followed by a space (e.g., "fruit=apple; colour=red")
CURLOPT_COOKIEJAR:
The name of a file to save all internal cookies to when the connection closes.
#see http://www.php.net/manual/en/function.curl-setopt.php
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookiefile.txt');
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookiefile.txt');
Don't expect it to work without encoding the __VIEWSTATE string in php using
rawurlencode($viewstate);
I've encountered the same problem recently, so I'll just leave my approach here in case someone else stumbles on this thread looking for an answer too.
I solved this by preceding every POST request with a GET request to the same URL, and scraping all the input fields out of the response to that GET into an array of key-value pairs. Then I replaced some values in that array (the login field values, for example) and sent the whole thing back in the subsequent POST. This way my POST request contained all the valid __VIEWSTATE, __EVENTVALIDATION and yada-yada data generated for that particular URL too.
This way the site allowed me to log in and visit subdomains normally.
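The scraping step could be sketched roughly like this. It is not the exact code I used: extractInputFields is a hypothetical name, it relies on PHP's DOM extension, and it only looks at input elements, so select and textarea fields would need extra handling.

```php
<?php
// Hypothetical helper: collect name => value pairs for every <input>
// in the HTML returned by the initial GET request.
function extractInputFields(string $html): array
{
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Sample HTML standing in for the response to the GET request.
$html = '<form><input type="hidden" name="__VIEWSTATE" value="abc+/=">'
      . '<input type="text" name="LoginControl$UserName" value=""></form>';

$fields = extractInputFields($html);
$fields['LoginControl$UserName'] = 'ExampleUName'; // overwrite the login field
$post_string = http_build_query($fields);          // URL-encodes keys and values
```

Using http_build_query takes care of the encoding mentioned above, since characters like + / and = in the __VIEWSTATE value get percent-encoded. The resulting string can be passed to CURLOPT_POSTFIELDS.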
I'm using a web-service from a provider who is being a little too helpful in anticipating my needs. They have given me a HTML snippet to paste on my website, for users to click on to trigger their services. I'd prefer to script this process, so I've got a php script which posts a cURL request to the same url, as appropriate. However, this provider is keeping tabs on my session, and interprets each new request as an update of the first one, rather than each being a unique request.
I've contacted the provider regarding my issue, and they've gone so far as to inform me that their system is working as intended, and that it's impossible for me to avoid using the same ASP.NET session for each subsequent cURL request. While my favored option would be to switch to a different vendor, that doesn't appear to be an option right now. Is there a reliable way to get a new ASP.NET session with each cURL request?
I've tried the following set of CURLOPT's, to no avail:
//initialize curl
$ch = curl_init($url);
//build a string out of the post_vars
$post_str = http_build_query($post_vars);
//set the necessary curl options
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_str);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "UZ_".uniqid());
curl_setopt($ch, CURLOPT_REFERER, CURRENT_SITE_URL."index.php?newsession=".uniqid());
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Pragma: no-cache", "Cache-Control: no-cache"));
//execute the call to the backend script, retrieve the results
$xmlstr = curl_exec($ch);
If cURL isn't helping much, why not try other methods to call the services from your script, like php's file() function, or file_get_contents().
If you do not see any difference at all, then the service provider might be using your IP to track your requests. Try using a proxy as a test.
A normal ASP.NET session is tracked by a cookie called ASP.NET_SessionId. This cookie is sent in the response to your first request. So as long as your cURL requests don't send this ASP.NET cookie back, your requests will have no connection to each other. Use the curl -c option to see what cookies are flying between you and them. Overriding this cookie with a cookie file should work if you confirm that a normal ASP.NET session is being used here.
It is quite poor for a service to use session (http has much cleaner ways of maintaining state which ReST exploits) so I wouldn't completely rule out the vendor switch option.
Well given the options you are using, it seems you have covered your basics. Can you find out how their sessions are setup?
If you know how they set up a session, i.e. what they use (whether it is the IP or something else), then you can figure out a workaround. Another option is trying to store the cookies in a different cookie file:
CURLOPT_COOKIEFILE - The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
But if all they do is check cookies, your current code should work. If you can figure out what the cookie's name is, you can pass a custom cookie that is blank with the request to see if that works. But if you can get information out of them on how their sessions work, that would be best.
Use these two lines to handle the session:
curl_setopt($ch, CURLOPT_COOKIEJAR, "path/to/cookies.txt"); // cookies.txt should be writable
curl_setopt($ch, CURLOPT_COOKIEFILE, "path/to/cookies.txt");
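If the goal is the opposite, a new session for every request, one thing to try is a throwaway cookie file per call. This is a sketch under the assumption that the provider tracks the session purely by cookie (if they use your IP it will not help). The names newCookieJar and postWithFreshSession are hypothetical.

```php
<?php
// Hypothetical helper: create an empty, uniquely named cookie file.
function newCookieJar(): string
{
    $path = tempnam(sys_get_temp_dir(), 'cj_');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary cookie file');
    }
    return $path;
}

// Hypothetical helper: POST with a cookie file that no other request shares,
// so no earlier session cookie can be sent along.
function postWithFreshSession(string $url, array $postVars)
{
    $jar = newCookieJar();
    $ch  = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($postVars),
        CURLOPT_COOKIEFILE     => $jar, // empty file, so no cookies are sent at first
        CURLOPT_COOKIEJAR      => $jar, // cookies set during redirects stay in this file only
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    unset($ch);   // make sure the handle is freed and the jar flushed before deleting it
    unlink($jar); // throw the session away
    return $response;
}
```

Each call to postWithFreshSession($url, $post_vars) starts with no cookies, so the server should hand out a new ASP.NET_SessionId every time if cookies are all it looks at.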