php curl to website without captcha - php

Please bear with me as I am completely new to php curl and its intricacies. I've picked up some tips here but am still stuck (for days) so hope someone can really help!
When I curl to this url http://agentnet.propertyguru.com.sg/ex_login?w=1&redirect=/ex_home, there is a key difference between viewing it in my web browser and via curl: a captcha field (together with an error message) appears when the page is fetched via curl. There is no captcha or error message when it is viewed in the browser. How do I make curl produce the same result as a browser?
Here's my simple code snippet.
$loginUrl = 'http://agentnet.propertyguru.com.sg/ex_login?w=1&redirect=/ex_home';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $loginUrl);
$cookie = realpath('cookie.txt'); // 'FSPrompt-6496=completed;' is stored in this file
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.0; rv:30.0) Gecko/20100101 Firefox/30.0');
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$request_headers = array();
$request_headers[] = 'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$request_headers[] = 'Accept-Language:en-US,en;q=0.5';
$request_headers[] = 'Connection: Keep-Alive';
curl_setopt($ch, CURLOPT_HTTPHEADER, $request_headers);
$msg = curl_exec($ch);

This site requires a cookie before it lets you log in.
When you access /ex_login without a cookie, it redirects you to /distil_identify_cookie.html?uid=
At /distil_identify_cookie.html?uid=…, the browser has to save a cookie value, and then it is redirected back to the first login page.
On the login page you now have a valid cookie, so no further cookie initialization is needed.
So you have to update your script to save the cookie correctly. Guzzle is a great library for building an HTTP client.
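A minimal sketch of what "save the cookie correctly" could look like with plain cURL. The helper name buildCookieOptions and the cookie file location are assumptions for illustration, not part of the original script. Note that if the intermediate page sets its cookie with JavaScript rather than a Set-Cookie header, this will not be enough, because cURL does not run scripts.

```php
<?php
// Hypothetical helper: returns cURL options that follow redirects while
// storing and replaying cookies, so a cookie-setting redirect can complete.
function buildCookieOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect to the cookie page and back
        CURLOPT_MAXREDIRS      => 10,          // guard against redirect loops
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here and enable the cookie engine
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle is closed
    ];
}

// Usage (performs a real request, so it is left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, buildCookieOptions($loginUrl, __DIR__ . '/cookie.txt'));
// $html = curl_exec($ch);
// curl_close($ch);
```

Keeping the options in one array makes it easy to reuse the same cookie settings for the later login POST.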

Related

cURL/PHP opening as logged user fails

I'm opening a page as logged user, and it kind of seems to work, except the website has some sort of a protection system. If I do this normally, I'll get the page I want, but if I do it with cURL, I'll get 'Welcome back user (userid)' and a link to the page I requested. Once I click the link, I'll get where I want to be. Now I tried faking the referer and checking the data that gets sent to the page, there's nothing special there. When I click the link, I simply get redirected to the page I wanted in the first place. My question is why doesn't this code get me there as well:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL , "http://www.site.com/sell/index");
curl_setopt($ch, CURLOPT_REFERER, 'http://www.site.com');
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_FAILONERROR, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, false);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
$response = curl_exec($ch);
curl_close($ch);
echo $response;
Just before I do this, I perform the login procedure and grab the cookie. I do get to open the page as a logged-in user; I just can't seem to reach it without clicking the link.
PS. The same thing would happen if I logged in, open the page I wanted, closed browser and opened it again. So I'm thinking it has to do with referer?
CURLOPT_COOKIEJAR only tells cURL where to save the cookies it receives in the response; it does not send anything. That's why it is not working for you. Instead, use CURLOPT_COOKIEFILE so that cURL sends the stored cookies with the request:
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
Also, use an absolute path (/var/tmp/cookie.txt) instead of a relative path.
Now, Be Happy!
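As a rough illustration of the absolute-path advice, here is a small sketch. The helper name cookieFilePath and the choice of the system temp directory are assumptions; any writable directory would do. It builds the path by hand because realpath() returns false when the file does not exist yet.

```php
<?php
// Hypothetical helper: returns an absolute path to a cookie file, creating
// the file if needed. realpath() alone returns false for a missing file.
function cookieFilePath(string $dir, string $name = 'cookie.txt'): string
{
    $path = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
    if (!file_exists($path)) {
        touch($path); // create an empty file so cURL can read and write it
    }
    return $path;
}

$cookieFile = cookieFilePath(sys_get_temp_dir());

// The same absolute path can then be used for sending and saving cookies:
// curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
// curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
```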

PHP cURL cookies blocking

<?php
$ebay_user_id = "id"; // Please set your Ebay ID
$ebay_user_password = "password"; // Please set your Ebay Password
$cookie_file_path = dirname(__FILE__).'/cookie.txt'; // Please set your Cookie File path
$LOGINURL = "http://signin.ebay.com/aw-cgi/eBayISAPI.dll?SignIn";
$agent = "Mozilla/4.0 (compatible;)";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$LOGINURL);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
$result = curl_exec ($ch);
curl_close ($ch);
$LOGINURL = "http://signin.ebay.com/aw-cgi/eBayISAPI.dll";
$POSTFIELDS = 'MfcISAPICommand=SignInWelcome&siteid=0&co_partnerId=2&UsingSSL=0&ru=&pp=&pa1=&pa2=&pa3=&i1=-1&pageType=-1&userid='. $ebay_user_id .'&pass='. $ebay_user_password;
$reffer = "http://signin.ebay.com/aw-cgi/eBayISAPI.dll?SignIn";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$LOGINURL);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$POSTFIELDS);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_REFERER, $reffer);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
$result = curl_exec ($ch);
curl_close ($ch);
print $result; ?>
I'm really new to cURL.
I'm using this code to log in to eBay.
The problem is with cookies: eBay tells me they are being blocked by something.
The message it shows: Your web browser settings are blocking cookies.
I tested with Firefox and tried other browsers too, and got the same issue.
I have confirmed that my browser settings accept cookies.
I have also checked that there is content inside the cookie.txt file, which means the file can be accessed correctly.
So what is causing this? Is the code I'm using correct?
Thanks, everyone, for the help.
Try changing the user agent to something like:
'Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0.1'
Edit: actually, I believe the problem is that you need to query the sign-in page first.
First visit "http://signin.ebay.com/aw-cgi/eBayISAPI.dll?SignIn";
this will set the cookies. Then sign in as you have.
You can try it in a browser: navigate to the eBay sign-in page,
clear your cookies, and then sign in.
You will get the browser-not-supporting-cookies error.
You need to understand that doing an HTTP request with cURL through PHP has nothing to do with your browser. The website you are accessing doesn't care what browser you use to run the PHP script. The actual request is made by your server, not by your browser.
On the other hand, if eBay's engineers are smart they'd block this. You probably aren't supposed to do things like this; that's what the eBay APIs are for.
A little tip: use an HTTP client library. Doing things like this in plain cURL is a pita and produces some very bad, unreadable code.
Check https://github.com/guzzle/guzzle for example.

Automatic login to Facebook with cron/cURL

I've seen so many solutions for this but haven't been able to implement any of them successfully. I have created an App in Facebook and can successfully use FQL to retrieve data. I will be pulling in event information from pages and groups that I am part of. I will only parse the event information of those who explicitly register with my App/website.
I'm using the Facebook PHP SDK. The issue is that I want to create a cron task to retrieve event information periodically, but I don't know how to allow the cron task to log in automatically.
I've seen that there are real-time updates, but as far as I know, they don't show events.
I tried to use cURL but it just brought up a blank screen. The code was:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://login.facebook.com/login.php?');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'facebook_cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'facebook_cookies.txt');
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$email = 'xxxxxxxxxxx';
$pass = 'xxxxxxxxxxx';
curl_setopt($ch, CURLOPT_POSTFIELDS, 'email='.urlencode($email).'&pass='.urlencode($pass).'&login=Login');
$result = curl_exec($ch);
echo $result;
curl_close($ch);
That was taken from another SO question: Remote login to facebook account
Maybe you don't need to log in at all. Couldn't an app access_token solve your problem?

Difference between cURL and web browser?

I am trying to retrieve a web page from the following url:
http://www.medicare.gov/find-a-doctor/provider-results.aspx?searchtype=OHP&specgrpids=922&loc=43615&pref=No&gender=Unknown&dist=25&lat=41.65603&lng=-83.66676
It works when I paste it into a browser, but when I run it through cURL, I receive a page with the following error: "One or more query string parameters of requested url are invalid or has unexpected value, please correct and retry."
It doesn't seem to make a difference if I provide a different userAgent or referrer. There is a redirect, so I use CURLOPT_FOLLOWLOCATION.
Here is my code:
$ch = curl_init($page);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
Any thoughts on why a request like this will work in the browser and not with cURL?
Your browser is sending cookies that cURL is not. Check the cookies you are sending to the site using browser tools or Fiddler; you'll need to pass the same ones.
The problem was with cookies. This particular site needed an ASP.NET_SessionId cookie set in order to respond. I added the following to my cURL request:
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
// CURLOPT_COOKIE takes only name=value pairs; path and domain are Set-Cookie attributes and are not sent in requests
curl_setopt($ch, CURLOPT_COOKIE, 'ASP.NET_SessionId=ho1pqwa0nb3ys3441alenm45');
I don't know if any session id will work, but I tried a couple of random ones and they all worked.
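If more than one cookie has to be sent this way, a small sketch like the following could build the string. The helper name buildCookieHeader is an assumption for illustration. CURLOPT_COOKIE expects name=value pairs separated by a semicolon and a space; attributes such as path and domain belong to Set-Cookie response headers and are not part of a request.

```php
<?php
// Hypothetical helper: turns ['name' => 'value', ...] into the
// "name=value; name2=value2" string that CURLOPT_COOKIE expects.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

$cookieHeader = buildCookieHeader(['ASP.NET_SessionId' => 'ho1pqwa0nb3ys3441alenm45']);
// curl_setopt($ch, CURLOPT_COOKIE, $cookieHeader);
```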

PHP, cURL post to login to WordPress

I am working on a project for a client which needs an automatic login from a link click.
I'm using a handshake page to do this with the following code:
$username = "admin";
$password = "blog";
$url = "http://wordpressblogURL/";
$cookie = "cookie.txt";
$postdata = "log=" . $username . "&pwd=" . $password . "&wp-submit=Log%20In&redirect_to=" . $url . "blog/wordpress/wp-admin/&testcookie=1";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url . "blog/wordpress/wp-login.php");
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_REFERER, $url . "blog/wordpress/wp-login.php");
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch, CURLOPT_POST, 1);
$result = curl_exec ($ch);
curl_close($ch);
echo $result;
exit;
This works fine. It logs me in great.
The problem is that I believe WordPress keys off of the URL.
To elaborate, my handshake page (which logs me in) is in the "blog" directory and my WordPress application is in the "wordpress" directory which sits inside the "blog" directory. The URL in the browser says ..blog/handshake.php. However, it has the Admin section of WordPress in the browser window. WordPress Admin links now do not function correctly, because the URL is in the ../blog directory when it needs to be in the ..blog/wordpress/wp-admin directory.
Is there a way in cURL to make it so that the URL in the browser reflects the actual page?
Should I be using fsockopen instead?
Kalium got this right -- paths in the WordPress interface are relative, causing the administration interface to not work properly when accessed in this manner.
Your approach is concerning in a few ways, so I'd like to make a few quick recommendations.
Firstly, I would try to find a way to avoid hard-coding the $username and $password variables. Think about how easy this is to break -- if the password is updated via the administration interface, for instance, the hard-coded value in your code will no longer be correct, and your "auto-login" will now fail. Furthermore, if someone somehow compromises the site and gains access to handshake.php -- well, now they've got the username and password for your blog.
It looks like your WordPress installation rests on the same server as the handshake script you've written, given the path to /blog is relative (in your sample code). Accordingly, I'd suggest trying to mimic the session they validate against in your parent application's login. I've done this several times in the past -- just can't recall the specifics. So, for instance, your login script would not only set your login credentials, but also set the session keys required for WordPress authentication.
This process will involve digging through a lot of WordPress's code, but that's the beauty of open source! Instead of using cURL and hard-coding values, try to simply integrate WordPress's authentication mechanism into your application's login mechanism. I'd start by looking at the source for wp-login.php and going from there.
If all else fails and you're determined to not try to mesh your session authentication mechanism with that of WordPress, then you could immediately fix your problem (without fixing the more concerning aspects of your approach) with these changes to your code:
First, add the following cURL option:
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // Enables session support
Then, add this after closing the cURL handler:
curl_close($ch);
// Instead of echoing the result, redirect to the administration interface, now that the valid, authenticated session has been established
header('location: blog/wordpress/wp-admin/');
die();
So, in this less than ideal solution you'd use cURL to authenticate the user, and then rather than attempt to hijack the administration interface into that current page, redirect them to the regular administration interface.
I hope this helps! Let me know if you need more help / the solution isn't clear.
Here is the code that worked for me:
The key change is that I removed the parameter called "testcookie" from my post string.
Note: add your website instead of "mywordpress" and username and password in the below code
$curl = curl_init();
//---------------- generic cURL settings start ----------------
$header = array(
"Referer: https://mywordpress/wp-login.php",
"Origin: https://mywordpress",
"Content-Type: application/x-www-form-urlencoded",
"Cache-Control: no-cache",
"Pragma: no-cache",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"
);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15');
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_COOKIESESSION, true);
curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookies.txt');
//---------------- generic cURL settings end ----------------
$url = 'https://mywordpress/wp-login.php';
curl_setopt($curl, CURLOPT_URL, $url);
$post = 'log=username&pwd=password&wp-submit=Log+In&redirect_to=https%3A%2F%2Fmywordpress%2Fwp-admin%2F';
curl_setopt($curl, CURLOPT_POST, TRUE);
curl_setopt($curl, CURLOPT_POSTFIELDS, $post);
$output = curl_exec($curl);
curl_close ($curl);
echo $output;
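Hand-encoding the POST string is easy to get wrong, so here is a small sketch that lets PHP do the encoding with http_build_query(). The field values are the same placeholders used in the code above, not real credentials.

```php
<?php
// Placeholder values; replace with the real site and credentials.
$fields = [
    'log'         => 'username',
    'pwd'         => 'password',
    'wp-submit'   => 'Log In',
    'redirect_to' => 'https://mywordpress/wp-admin/',
];

// http_build_query() URL-encodes every key and value and joins them with "&".
$post = http_build_query($fields);
// curl_setopt($curl, CURLOPT_POSTFIELDS, $post);
```

This also takes care of passwords that contain characters such as "&" or "=", which would otherwise break the request body.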
Check the HTML source. It sounds like WP's links may be relative. Instead of making this process even more complicated than it already is, however, I suggest you perform the login, hand the user whatever cookies are required, and redirect them.
Otherwise you're coding a proxy, piece by piece.
If your script doesn't perform all the functions you need in a single execution, you may need to parse out the cookie values, store them in a file, and then resend on the next execution. Check out the CURLOPT_COOKIEFILE option.
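For the "parse out the cookie values" step, a minimal sketch could read the file that CURLOPT_COOKIEJAR writes. The helper name parseCookieJar is an assumption for illustration; the file layout is the tab-separated Netscape cookie format that cURL uses.

```php
<?php
// Hypothetical helper: parses the Netscape-format text that CURLOPT_COOKIEJAR
// writes and returns the cookies as ['name' => 'value', ...].
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // HttpOnly cookies are stored with a "#HttpOnly_" prefix on the domain.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        }
        // Skip blank lines and comments.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value.
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Usage: $cookies = parseCookieJar(file_get_contents('cookie.txt'));
```

In many cases this parsing is unnecessary, since pointing CURLOPT_COOKIEFILE at the same file on the next run makes cURL resend the cookies itself.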
Use Zend Framework's Cookies class to manage them for you. I have used this in the past for crawling secure sections of a web site using cURL.
