I'm trying to simulate an HTTP POST from Firefox's Live HTTP Replay by using cURL. I believe the remote site has some sort of validation that checks where the request is coming from: if the request comes from their own domain, it is fine. When I run my PHP cURL script, I can see from the Live HTTP headers that I'm making a GET request rather than a POST. Besides that, the Host, which is expected to be www.aliexpress.com, has changed to localhost.
If I use the Live HTTP Replay, it works fine. So I copied the header data and tried to reproduce it with cURL, but to no avail. E.g.:
http://www.aliexpress.com/cross-domain/shoppingcart/index.html
POST /cross-domain/shoppingcart/index.html HTTP/1.1
Host: www.aliexpress.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
X-Requested-With: XMLHttpRequest
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: http://www.aliexpress.com/store/product/Wireless-N-Wifi-Repeater-802-11N-B-G-Network-Router-Range-Expander-300M-2dBi-Antennas-US/701252_523523529.html
Cookie: ali_apache_id=113.210.130.113.1374818286515.884332.4; ali_apache_track=mt=1|ms=|mid=my1023002521; xman_f=NkltWLLI3tebQbeQzQLiNBd2/KPKX0D81t0DghVMEl/frYuA+aVHnWGevMXWTEPqdLRqlKLbExYQkL61WPSt7Tr0LrdqOLLGM8yY5cBFOvY79qV9R5iTGSd44oPoKZruCpupEK9UBNSiOIf7Go1TN1AiM0ArpkHYTZ4rigCwLp5l2IEPYmFC8UzRnLivCFmLxbDuEewB52ulEop1Y9xtdEr88bjnwci1PldcvTxCmVDiOnm6rRfbnVfMAWaSWIkqQrnOEfwq2B4B/OER9K9IH7EHAMadb9IiOdMo3yavyt4DGWquCAq1izTtU8GE2mRmvi+PZ8WmR+PNOM3zYU4eaWM7uEevjmV2S7kTtlElmJGqxaT5RpSLcxiRxxbYJToejY36QxDf0MIIKTaaJTacVg==; aep_history=product_selloffer%5E%0Aproduct_selloffer%09709591781%091035163509%09523523529; __utma=3375712.263559759.1374818300.1375458795.1375606693.4; __utmz=3375712.1374818300.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); ali_beacon_id=113.210.130.113.1374818286515.884332.4; cna=+/14CsYcs08CAXGC0nFA22WC; xman_us_f=x_locale=en_US&no_popup_today=n&x_user=MY|dexter|wong|ifm|142465266&last_popup_time=1374818565653; u_info=qunLQLajxN+hFYWqPKiksew8tjAelFSu8cj+oG6e7nY=; intl_common_forever=wWZJ4jwSXakD7oylp5nnW9Nwmq8zgJYiqow0UyNV6PEUhc4f0KJghQ==; CoreID6=n; JSESSIONID=F86BD98D5E0CF42A7BE875F648DFA640; ali_apache_tracktmp=W_signed=Y; acs_usuc_t=acs_rt=9af4ec13bf134eb298193f9ac69395dc; acs_t=/iJST0zcbQeKUIQrTj1tDurMpZLQIdanO+zAZzyDlS+blTn+Rwd4skbiIdbQbEDO; __utmc=3375712; xman_t=rr/A0xwWzcNjpVptbDP061VCJ0dhjFwJPMn/JxOmi4eSjWlXq98nD8HBxnqOtR8ZIUClJqCqI39uwIkL6/R0WYQtBiqtFb8R0KGmzoiHDZ4R6dMhSZeEC5Am8y6iywMSG5My1MlUAhuWQI6/EPBlSYOWa8V/3IiNJnCOUd/Wm4DWQt6YHxS12kJbrUZxu2M7HeOquFa8Ga+yB/P0DT0Z9EhRum3S3uBC2+rFkh50z+91raLJiWJ0PV9NqHup3sPpAstiWlmem8QfBps0tFSx7tZn9WkllmyNJsTUYWO0cuxr0gpjWPU72Bb6fsroRovgRZ6xeqDah+WT94rnU2jrRybsL+7JDXmYPYC0GOTHjsSsloHSyGTvoD+FNyS3jGQPoP8KL7NXi+Dq+FrAqOETg3OH2oJp/h7nH5CWcsdojLHTngkABhNnB0ky/YRS8dV0s0oukEDPt+iXVEjQBBsIjAmtVX2fYx8KGRiRNiff/4rehQ4GDZzk2kdfJHItnUSk694SnpAgB6PrkNpGvu8adLjy8W6GuXk2XzujhsCSNkQ+3/uNpEbqoAimkCW+6KjJujJCPYIGevineVzSjMih7eDWpP/5TbgWtyhWKv3F5QbKZzibUq6w/YnerorvCHNPcssWgl0lswk=; __utmb=3375712.4.10.1375606693; 
xman_us_t=x_lid=my1023002521&sign=y&x_user=RmoP5to3fHwR+VNOC9lIAD7BpyTVa0YBflCR3S4eFIU=&need_popup=y
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Content-Length: 93
productId=523523529&standard=&quantity=7&country=MY&company=CPAM&cartfrom=main_store&skuAttr=
I've tried to reproduce the raw header (above) with the following:
<?php
// create a new cURL resource
$fields = array(
'productId' => 523523529,
'standard' => '',
'quantity' => 8,
'country' => 'MY',
'company' => 'CPAM',
'cartfrom' => 'main_store',
'skuAttr' => ''
);
foreach($fields as $key=>$value) { $fields_string .= $key.'='.$value.'&'; }
rtrim($fields_string, '&');
echo $fields_string . "<br/>";
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Host: www.aliexpress.com",
"Content-Type: application/x-www-form-urlencoded",
"Content-length: ". "93",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate",
"X-Requested-With: XMLHttpRequest")); //proceeding with the login.
curl_setopt($ch, CURLOPT_URL, urlencode("http://www.aliexpress.com/cross-domain/shoppingcart/index.html"));
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
//The encoded url below is referring to the login form for aliexpress.com
curl_setopt($ch, CURLOPT_REFERER, "http%3A%2F%2Fwww.aliexpress.com%2Fstore%2Fproduct%2FDual-sim-I9300-S3-MTK6589-quad-core-android-phone-1G-RAM-4G-ROM-4-7-inch%2F901666_1035163509.html%3FpromotionId%3D210526801");//This tells the server where were you directed from.
curl_setopt($ch,CURLOPT_POST, count($fields));
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields_string);
//curl_setopt($ch, CURLOPT_COOKIESESSION, true);//indicates that this is a new session, i assume this forces the server to assign a new session?
//curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);//follows the redirection that is supplied by the server
curl_setopt($ch, CURLOPT_HEADER, true);
//curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);//THIS IS VERY IMPORTANT! This one of the most common option that is used because this simply means that
//the response from the server is returned as a string rather than output directly.
curl_setopt($ch, CURLOPT_UNRESTRICTED_AUTH, true);//This means to keep sending the login information(username and password) when there is a redirection
$str = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
?>
Thanks for your help!
Regards,
Dexter
Try setting CURLOPT_POST to true:
curl_setopt($ch, CURLOPT_POST, 1);
I think you should modify your code a bit:
remove the urlencode() call when setting the value of CURLOPT_URL
remove the Host and Content-Length headers from CURLOPT_HTTPHEADER
use http_build_query() to build your $fields_string
For debugging purposes, I set CURLOPT_RETURNTRANSFER to true and var_dump() the response.
My working code:
$fields = array(
'productId' => 523523529,
'standard' => '',
'quantity' => 8,
'country' => 'MY',
'company' => 'CPAM',
'cartfrom' => 'main_store',
'skuAttr' => ''
);
$fields_string = http_build_query($fields);
echo $fields_string . "<br/>";
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/x-www-form-urlencoded",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate",
"X-Requested-With: XMLHttpRequest")); //proceeding with the login.
curl_setopt($ch, CURLOPT_URL, "http://www.aliexpress.com/cross-domain/shoppingcart/index.html");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
//The encoded url below is referring to the login form for aliexpress.com
curl_setopt($ch, CURLOPT_REFERER, "http://www.aliexpress.com/store/product/Dual-sim-I9300-S3-MTK6589-quad-core-android-phone-1G-RAM-4G-ROM-4-7-inch/901666_1035163509.html?promotionId=210526801");//This tells the server where were you directed from.
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);//THIS IS VERY IMPORTANT! This one of the most common option that is used because this simply means that
//the response from the server is returned as a string rather than output directly.
curl_setopt($ch, CURLOPT_UNRESTRICTED_AUTH, true);//This means to keep sending the login information(username and password) when there is a redirection
$str = curl_exec($ch);
var_dump($str);
var_dump(curl_error($ch));
// close cURL resource, and free up system resources
curl_close($ch);
And the response:
HTTP/1.1 200 OK
Date: Sun, 04 Aug 2013 11:51:31 GMT
Server: Apache
P3P: CP="CAO PSA OUR"
Content-Language: en-US
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
X-XSS-protection: 1;mode=block
Content-Length: 56
Content-Type: plain/text;charset=utf-8
Set-Cookie: ali_apache_id=1.54.42.221.1375617091161.869918.6; path=/; domain=.aliexpress.com; expires=Wed, 30-Nov-2084 01:01:01 GMT
Set-Cookie: JSESSIONID=6EB1295945C27F8A2F788587D4C0E0A7; Path=/
Set-Cookie: ali_apache_track=; Domain=.aliexpress.com; Expires=Fri, 22-Aug-2081 15:05:38 GMT; Path=/
Set-Cookie: ali_apache_tracktmp=; Domain=.aliexpress.com; Path=/
Set-Cookie: acs_usuc_t=acs_rt=8fdfad47f53b46d489d0a905a5a9fb7c; Domain=.aliexpress.com; Path=/
Set-Cookie: xman_t=ZwO1ZDjGpaou2015+mejeWnS90vHjsN3YIDxbrXYOz/mbbJeIZM3q7Pw6ZGTygK2; Domain=.aliexpress.com; Path=/; HttpOnly
Set-Cookie: acs_t=2nqPb5i+QB7aDai5FXRM12xDJghxP4qjmcwPjwaXQ4SI6eV7eGpxjRGNjukEXuEW; Domain=.aliexpress.com; Path=/; HttpOnly
Set-Cookie: xman_f=MC/MUpjkYCKP+PRcAK43k9eQrTR+PE1rldMoChEUHVVlAUcYwh10BKJ0lxWlsPe4p+pYIPC/Vy4wIHJK8fiy4koUaF68CAolRC6UH7q0nmU5HcqWzgyjnA==; Domain=.aliexpress.com; Expires=Fri, 22-Aug-2081 15:05:38 GMT; Path=/; HttpOnly
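Two of the fixes from the answer above can be checked without touching the network. This is only a minimal sketch and the variable names are illustrative. It shows that urlencode() on a whole URL escapes the scheme and the slashes, so nothing is left that can be recognised as a host, and that http_build_query() produces the body without the trailing & that the manual loop leaves behind (rtrim() returns the trimmed string, it does not modify its argument).

```php
<?php
$url = 'http://www.aliexpress.com/cross-domain/shoppingcart/index.html';

// urlencode() is meant for individual query values, not for a whole URL.
$encodedUrl = urlencode($url);
echo $encodedUrl, PHP_EOL;

// parse_url() finds the host in the plain URL but not in the encoded one.
$hostBefore = parse_url($url, PHP_URL_HOST);
$hostAfter  = parse_url($encodedUrl, PHP_URL_HOST);

$fields = [
    'productId' => 523523529,
    'standard'  => '',
    'quantity'  => 8,
    'country'   => 'MY',
    'company'   => 'CPAM',
    'cartfrom'  => 'main_store',
    'skuAttr'   => '',
];

// Manual concatenation as in the question.
$manual = '';
foreach ($fields as $key => $value) {
    $manual .= $key . '=' . $value . '&';
}
rtrim($manual, '&'); // return value discarded, so $manual still ends with '&'

// http_build_query() joins and URL-encodes the pairs in one step.
$body = http_build_query($fields);
echo $body, PHP_EOL;
echo strlen($body), PHP_EOL;
```

With the body built this way there is no need to send a Content-Length header by hand; cURL computes it from the string passed to CURLOPT_POSTFIELDS.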
Okay, I have a website which I need to parse.
First, I open the debugger in Firefox by hitting F12, look at the Network tab, load the website, and read the first root GET request, like
Domain => website.com
File => /
I take all the request headers from there and write them into a PHP array manually, then in code I call
curl_setopt($curl, CURLOPT_HTTPHEADER, $headerArray);
and also other options, then call
curl_exec();
While inspecting the Network tab in Firefox, I see that the request headers look like the defaults, and none of the headers I wrote into the array were sent. There is a similar problem with CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR: cookies are written to the cookie file on the server, but the next request carries different cookies instead of the ones previously saved in the cookie file.
Actual request headers in browser's inspector:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Cache-Control: max-age=0
Connection: keep-alive
Cookie: _ga=GA1.1.1951751996.1563984714; _gid=GA1.1.1564173251.1563984714; _userGUID=0:jyhg490v:AIQdD2Qpm9rmbla1U93mK2a45CFRe49c; jv_enter_ts_2VumZAPpbr=1563984717382; jv_visits_count_2VumZAPpbr=1; .....
Host: localhost
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0
PHP Code:
<?php
$headers = ['Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3',
'Cache-Control: max-age=0',
'Connection: keep-alive',
'Cookie: visid_incap_1987259....',
'Host: website.com',
'TE: Trailers',
'Upgrade-Insecure-Requests: 1',
'User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'];
$curl = curl_init("https://www.website.com/");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_COOKIEFILE, dirname(__FILE__)."/cookies.txt");
curl_setopt($curl, CURLOPT_COOKIEJAR, dirname(__FILE__)."/cookies.txt");
echo curl_exec($curl);
?>
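You will not be able to see the headers sent by cURL in the browser dev tools. All cURL requests are executed on the server side; the dev tools only show the browser's own request to your PHP script, which is why you see Host: localhost. Your headers are sent successfully. You can check it like this (set the option before curl_exec() and read the value after it):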
curl_setopt($curl, CURLINFO_HEADER_OUT, true);
$sentHeaders = curl_getinfo($curl, CURLINFO_HEADER_OUT);
print_r($sentHeaders);
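To make the ordering explicit, here is a minimal sketch. fetchSentHeaders() and parseHeaderBlock() are hypothetical helper names, not part of cURL. The first one needs network access and is only defined here; the second one is demonstrated on a hand-written sample of what CURLINFO_HEADER_OUT typically returns.

```php
<?php
/**
 * Run a request and return the request header block cURL actually sent.
 * Needs network access; not called in this sketch.
 */
function fetchSentHeaders(string $url, array $headers): string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLINFO_HEADER_OUT, true);    // must be set before curl_exec()
    curl_exec($curl);
    $sent = curl_getinfo($curl, CURLINFO_HEADER_OUT); // only filled in after curl_exec()
    return is_string($sent) ? $sent : '';
}

/** Turn a raw header block into a name => value array. */
function parseHeaderBlock(string $raw): array
{
    $parsed = [];
    foreach (preg_split('/\r\n|\n/', trim($raw)) as $i => $line) {
        if ($i === 0 || strpos($line, ':') === false) {
            continue; // skip the request line ("GET / HTTP/1.1") and empty lines
        }
        [$name, $value] = explode(':', $line, 2);
        $parsed[trim($name)] = trim($value);
    }
    return $parsed;
}

// Offline sample; a real value would come from fetchSentHeaders().
$sample = "GET / HTTP/1.1\r\nHost: www.website.com\r\nAccept: */*\r\n\r\n";
$sentHeaders = parseHeaderBlock($sample);
print_r($sentHeaders);
```

Comparing this array with the one passed to CURLOPT_HTTPHEADER shows which headers really went out, independent of anything the browser displays.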
I read this: (https://sheet.zoho.com/help/api/v2/#authorization)
Note: I know for a fact the URL works. If I copy and paste it into the browser, everything is fine.
How do I get this (authorization) code on my server side?
I'm trying this on my server (php-curl):
<?php
$uri = 'www.xxx.com/zoho_return.php';
$scope = 'ZohoSheet.dataAPI.UPDATE,ZohoSheet.dataAPI.READ';
$clientid = '1000.XXXXXXXXXXXXXXX';
$zoho_client_secret = 'XXXXXXXXXXXXXXXXXXXXX';
$accestype = 'offline';
$ch = curl_init();
$url = 'https://accounts.zoho.com/oauth/v2/auth?scope=' .
$scope . '&client_id=' . $clientid . '&response_type=code&access_type=' .
$accestype . '&redirect_uri=' . $uri . '';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
$html = curl_exec($ch);
$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );
curl_close($ch);
echo 'redirectURL: '.$redirectURL.'<br><br>';
echo 'header: '.$html;
Response on chrome:
redirectURL: https://accounts.zoho.com/oauth/v2/auth?scope=ZohoSheet.dataAPI.UPDATE,ZohoSheet.dataAPI.READ&client_id=1000.XXXXXXXXXXXXXXX&response_type=code&access_type=online&redirect_uri=www.xxx.com/zoho_return.php
header: HTTP/1.1 302 Found Server: ZGS Date: Fri, 26 Oct 2018 22:48:43 GMT Content-Length: 0 Connection: keep-alive Set-Cookie: a8c61fa0dc=8db261d30d9c85a68e92e4f91ec8079a; Path=/; Secure; HttpOnly X-Content-Type-Options: nosniff X-XSS-Protection: 1 Set-Cookie: iamcsr=108a1f8a-29cf-4408-bbaf-113f8c42a3d7;path=/;Secure;priority=high Pragma: no-cache Cache-Control: no-cache Expires: Thu, 01 Jan 1970 00:00:00 GMT X-Frame-Options: SAMEORIGIN Location: https://accounts.zoho.com/signin?servicename=AaaServer&serviceurl=%2Foauth%2Fv2%2Fauth%3Fscope%3DZohoSheet.dataAPI.UPDATE%252CZohoSheet.dataAPI.READ%26client_id%1000.XXXXXXXXXXXXXXX%26response_type%3Dcode%26access_type%3Donline%26redirect_uri%3Dhttp%253A%252F%252Fxxx.com%252Fzoho_return.php Strict-Transport-Security: max-age=15768000
Zoho CRM API v2 supports only the Authorization Code Grant, which works in the browser, as you mentioned.
To get the access token directly from valid credentials you would have to use the Password Grant, which Zoho does not support.
You need to do some research on OAuth 2.0.
Simply put, the answer is "You cannot get the Authorization Code from your PHP code".
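In practice this means the PHP side only builds the consent URL and sends the user's browser there; the code then arrives as a query parameter on the redirect URI. A minimal sketch of those two steps follows. The function names are hypothetical, the https:// scheme on the redirect URI is an assumption (the question's value has no scheme), and the error parameter follows the generic OAuth 2.0 convention rather than anything confirmed for Zoho.

```php
<?php
/** Build the consent-page URL that the user has to open in a browser. */
function buildAuthUrl(string $clientId, string $redirectUri, array $scopes): string
{
    // http_build_query() URL-encodes every value, including the redirect URI.
    $query = http_build_query([
        'scope'         => implode(',', $scopes),
        'client_id'     => $clientId,
        'response_type' => 'code',
        'access_type'   => 'offline',
        'redirect_uri'  => $redirectUri,
    ]);
    return 'https://accounts.zoho.com/oauth/v2/auth?' . $query;
}

/** In the redirect handler, read the code from the query parameters. */
function extractAuthCode(array $queryParams): ?string
{
    if (isset($queryParams['error'])) {
        return null; // the user declined or the request was rejected
    }
    return isset($queryParams['code']) ? (string) $queryParams['code'] : null;
}

$authUrl = buildAuthUrl(
    '1000.XXXXXXXXXXXXXXX',
    'https://www.xxx.com/zoho_return.php', // assumed scheme
    ['ZohoSheet.dataAPI.UPDATE', 'ZohoSheet.dataAPI.READ']
);
echo $authUrl, PHP_EOL;
```

The page that starts the flow would send header('Location: ' . $authUrl), and zoho_return.php would call extractAuthCode($_GET) once the user has logged in and approved the request.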
I am new to screen scraping and cURL. I am planning to create a website, similar to http://www.skyscanner.com.my/, that will allow a user to submit the origin, destination and date to the http://airasia.com website. The website then returns the flight schedule and ticket price to the user. The following is my code so far:
code:
<?php
$post_data['Origin']=$_POST['origin'];
$post_data['Destination']=$_POST['destination'];
$post_data['From']=$_POST['departDate'];
$post_data['To']=$_POST['returnDate'];
foreach ($post_data as $key => $value)
{
$post_items[] = $key . '=' . $value;
}
$post_string = implode ('&', $post_items);
$curl_connection = curl_init('https://booking.airasia.com/search.aspx');
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, False);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);
$result = curl_exec($curl_connection);
print_r(curl_getinfo($curl_connection));
echo curl_errno($curl_connection) . '-' .
curl_error($curl_connection);
curl_close($curl_connection);
echo $result;
?>
The above does not return any result from AirAsia, so I need some guidance to continue my task. Thank you.
UPDATE
This Worked:
$request = array();
$request[] = "Host: mobile.airasia.com";
$request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$request[] = "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0";
$request[] = "Accept-Language: en-US,en;q=0.5";
$request[] = "Connection: keep-alive";
$request[] = "Cache-Control: no-cache";
$request[] = "Pragma: no-cache";
$post = 'hash=61582ddd1b6ab8782ad63f1a6c6c1e46&trip-type=round-trip&origin=PEK&destination=SGN&date-depart-d=25&date-depart-my=2015-04&date-return-d=30&date-return-my=2015-04&passenger-count=1&child-count=0&infant-count=0&currency=MYR&depart-sellkey=&return-sellkey=&depart-details-index=&return-details-index=&depart-faretype=&return-faretype=&action=search&btnSearch=Search';
$url = 'https://mobile.airasia.com/en/search';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_ENCODING, ""); // accept every encoding cURL supports and decode it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true); // keep the request header for curl_getinfo()
curl_setopt($ch, CURLOPT_HEADER, true); // include the response header in $data
$data = curl_exec($ch);
if (curl_errno($ch)) {
    $data .= 'Retrieve Base Page Error: ' . curl_error($ch);
} else {
    $info = rawurldecode(var_export(curl_getinfo($ch), true));
    // Split the response header (which carries the cookies) from the body:
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $responseHeader = substr($data, 0, $skip);
    $data = substr($data, $skip);
    echo $data;
}
Request Header.
POST /en/search HTTP/1.1
Accept-Encoding: deflate, gzip
Host: mobile.airasia.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 366
Content-Type: application/x-www-form-urlencoded
Response Header:
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Cache-control: no-cache="set-cookie"
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Mon, 20 Apr 2015 04:02:14 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
Server: redishot
Set-Cookie: locale=en; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure
Set-Cookie: currency=MYR; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure
Set-Cookie: PHPSESSID=p8mjtiiga4615pnhuu6vl1htiqkqsn7v; path=/; HttpOnly
Set-Cookie: AWSELB=CDFDE3A70C862943856FF6079178A94249700C674BDFF1E117C02BF52443FE13448AB71BEA2EA3F41C01293A39C3579A0A03905034DA565F71B4820BD1807C5558B22ED5E0;PATH=/;MAX-AGE=1800
Vary: Accept-Encoding
transfer-encoding: chunked
Connection: keep-alive
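The header/body split used in the update can be tried offline. The sketch below uses a hand-written response shaped like the one above, with a made-up body; splitResponse() and extractCookies() are hypothetical helper names. In real use the header size would come from curl_getinfo($ch, CURLINFO_HEADER_SIZE).

```php
<?php
/** Split a raw response (fetched with CURLOPT_HEADER enabled) at the given header size. */
function splitResponse(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

/** Collect the name => value part of every Set-Cookie line in a header block. */
function extractCookies(string $header): array
{
    $cookies = [];
    $pattern = '/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi';
    if (preg_match_all($pattern, $header, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $cookies[$match[1]] = $match[2];
        }
    }
    return $cookies;
}

// Offline sample: two of the cookies from the response above, body invented.
$header = "HTTP/1.1 200 OK\r\n"
    . "Set-Cookie: locale=en; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure\r\n"
    . "Set-Cookie: currency=MYR; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure\r\n"
    . "\r\n";
$raw = $header . '<html>body</html>';

[$head, $body] = splitResponse($raw, strlen($header));
print_r(extractCookies($head));
echo $body, PHP_EOL;
```

When CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE are set, cURL tracks cookies by itself, so parsing Set-Cookie by hand like this is mostly useful for debugging.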
This is the data you need from the returned HTML
Flight No.
Flight Time
Cost
.
Results","position":1}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7 317</div><div class=flight-time>02:15<br>12:10</div><div class=flight-info><div class=box><div class=total-price>MYR 2,091.42</div>
Results","position":2}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7 317</div><div class=flight-time>02:15<br>12:55</div><div class=flight-info><div class=box><div class=total-price>MYR 2,106.26</div>
Results","position":3}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7 317</div><div class=flight-time>02:15<br>15:50</div><div class=flight-info><div class=box><div class=total-price>MYR 2,483.82</div>
end of update
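Pulling the three fields out of that HTML could look roughly like the sketch below. It assumes the markup keeps exactly the shape shown in the snippets above (unquoted class attributes, flight-no followed directly by flight-time), which may well have changed since. extractFlights() is a hypothetical name, and a regular expression is used only because the fragments are so regular; a DOM parser would be more robust on full pages.

```php
<?php
/** Extract flight number, the two listed times and the price from the result HTML. */
function extractFlights(string $html): array
{
    $pattern = '~<div class=flight-no>([^<]+)</div>'
        . '<div class=flight-time>([^<]+)<br>([^<]+)</div>'
        . '.*?<div class=total-price>([^<]+)</div>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $flights = [];
    foreach ($matches as $m) {
        $flights[] = [
            'flight' => trim($m[1]),
            'times'  => [trim($m[2]), trim($m[3])],
            'price'  => trim($m[4]),
        ];
    }
    return $flights;
}

// Two of the rows quoted above, used as offline sample input.
$html = '<div class="smallFont farelist no-discount "><div class=flight-no>D7 317</div>'
    . '<div class=flight-time>02:15<br>12:10</div><div class=flight-info><div class=box>'
    . '<div class=total-price>MYR 2,091.42</div>'
    . '<div class="smallFont farelist no-discount "><div class=flight-no>D7 317</div>'
    . '<div class=flight-time>02:15<br>12:55</div><div class=flight-info><div class=box>'
    . '<div class=total-price>MYR 2,106.26</div>';

$flights = extractFlights($html);
print_r($flights);
```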
Do not prepend the "?" in the post data.
This is the format in the php manual:
$post = 'key1=value1&key2=value2&key3=value3';
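The reason is that "?" only separates the path from the query string in a URL; inside a POST body it becomes part of the first field name. A small offline illustration, using parse_str() as a stand-in for the server's form parser (an assumption, the real server may parse differently):

```php
<?php
$post = 'key1=value1&key2=value2&key3=value3';

parse_str($post, $clean);          // the body as the manual describes it
parse_str('?' . $post, $prefixed); // the same body with a leading "?"

print_r(array_keys($clean));
print_r(array_keys($prefixed));
```

In the second case the server no longer sees a field called key1, so whatever value the form expected there is missing.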
You cannot curl http://booking.airasia.com/search.aspx because it requires JavaScript.
You have to use the mobile site. When you use a browser to view the HTTP request and response headers, do it with JavaScript disabled in the browser.
Use:
https://mobile.airasia.com/en/search
The problem is that the mobile site is not functioning correctly right now and says to try later, so I could not get any further.
Regarding the post
This is what is being posted by the search:
Content-Type: application/x-www-form-urlencoded
Content-Length: 366
hash=26edce4024c5611451a2a95a74e2bf01
&trip-type=round-trip
&origin=KUL
&destination=OOL&date-depart-d=20
&date-depart-my=2015-04&date-return-d=25
&date-return-my=2015-04
&passenger-count=1
&child-count=0&infant-count=0
&currency=MYR
&depart-sellkey=
&return-sellkey=
&depart-details-index=
&return-details-index=
&depart-faretype=
&return-faretype=
&action=search
&btnSearch=Search
Because their form is application/x-www-form-urlencoded, you are almost building $post_string correctly. You can also pass an array as the post data, but if the CURLOPT_POSTFIELDS value is an array, the Content-Type header will be set to multipart/form-data, which should be OK.
Because it is application/x-www-form-urlencoded, the keys and values must be URL-encoded. Encode each one separately, not the joined string, or the & and = separators get encoded too. http_build_query() does this for you:
$post_string = http_build_query($post_data);
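The difference between encoding each value and encoding the joined string is easy to see offline. In this sketch the field names come from the question, while the values (airport codes and a date format) are placeholders chosen for illustration:

```php
<?php
$post_data = [
    'Origin'      => 'KUL',
    'Destination' => 'OOL',
    'From'        => '20/04/2015', // hypothetical date format
    'To'          => '25/04/2015',
];

$encodedPairs = [];
$rawPairs = [];
foreach ($post_data as $key => $value) {
    $encodedPairs[] = urlencode($key) . '=' . urlencode($value);
    $rawPairs[] = $key . '=' . $value;
}

// Correct: every key and value is encoded, the separators stay intact.
$perValue = implode('&', $encodedPairs);

// Wrong: the separators are encoded as well, so the server sees one long key.
$wholeString = urlencode(implode('&', $rawPairs));

echo $perValue, PHP_EOL;
echo $wholeString, PHP_EOL;
```

http_build_query($post_data) gives the same result as $perValue for flat arrays like this one.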
To get the cookies, you do not need, and possibly never will need:
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, False);
Remove it.
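You will be getting a redirect and may need the cookie jar (COOKIEJAR is where cookies are saved, COOKIEFILE is where they are read from):
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, "/tmp/cookie.txt");
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, "/tmp/cookie.txt");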
You may need to set the request header to match the Browser Request:
Create an array to put the Request Header Key Values
Fill in the Request array with exactly what is in the Request header of your upload.
EXAMPLE:
$request = array();
$request[] = "Host: www.example.com";
$request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$request[] = "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0";
$request[] = "Accept-Language: en-US,en;q=0.5";
$request[] = "Connection: keep-alive";
$request[] = "Cache-Control: no-cache";
$request[] = "Pragma: no-cache";
Add to curl:
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
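If the headers are easier to maintain as a name => value map, a small conversion step produces the "Name: value" lines that CURLOPT_HTTPHEADER expects. toHeaderLines() is a hypothetical helper, not a cURL function. Host is left out of the sample map on purpose, since cURL derives it from the URL.

```php
<?php
/** Convert a name => value map into "Name: value" lines for CURLOPT_HTTPHEADER. */
function toHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$request = toHeaderLines([
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'User-Agent'      => 'MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0',
    'Accept-Language' => 'en-US,en;q=0.5',
    'Connection'      => 'keep-alive',
    'Cache-Control'   => 'no-cache',
    'Pragma'          => 'no-cache',
]);
print_r($request);
```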
Checking the headers in a browser
I use the Firefox Inspector or the Chrome Developer Tools.
I go to the Network tab.
In Firefox I go to Settings and turn on "Enable Persistent logs".
In Chrome I click "Preserve log" on the Network tab.
Then I use the browser to go wherever I want curl to go.
Now I can see every request and response, including redirects, and compare them with the saved headers.
Inspect: Step by Step
right click select Inspect Element
Select the Network tab
Refresh the page
Select Documents (chrome) or HTML (firefox)
Clear the list
Post your upload
Select the upload Request in the list of Requests
I use Firefox with a user agent switcher set to an old Motorola user agent to retrieve the headers and HTML. Then I use the same user agent in curl's HTTPHEADER:
$request[] = 'User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0';
It is possible, though not likely, that the above caused the error when I tried.
Your Query String $post_string is right but you are missing to prepend it with ? before sending curl. Try following:
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, "?".$post_string);
I'm trying to log in to the Kiala website. Kiala is a shipping company, and I would like to get a new shipping token with PHP. Since there is no API for this, I tried cURL. I don't have much experience with cURL, and I can't get it to save the cookies to the jar. I've tried many things, but I've reached the point where I want to tear my hair out. I need these cookies to make further requests.
I've made a dummy account for testing
website: http://www.kialaverzendservice.be/
login form: http://www.kialaverzendservice.be/login.required.action?os_destination=%2Fsender%2Fstart.action
email: kialatest#mailinator.com
pwd: test123
I get the following header
HTTP/1.1 200 OK
Date: Thu, 10 Oct 2013 09:46:37 GMT
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=ISO-8859-15
Vary: Accept-Encoding
Set-Cookie: berkano-seraph-login=eGV5ZcKEZXhlwoZlwoJkfGJ5Y31jemTCgGJ7YsKGYsKCYnhifmLCgmLChmN6YsKGY3dmwoNiwoZjwoNjeGh+YnhjfWJ5Yn1mwoFmwoFm; Expires=Fri, 10-Oct-2014 09:46:37 GMT; Path=/
Set-Cookie: kiala-c2c-language=nl; Expires=Tue, 28-Oct-2081 13:00:44 GMT; Path=/
Transfer-Encoding: chunked
My simplified PHP code is below. I'm able to log in, but the cookies in the header are not saved to my cookie jar file. By the way, I'm on localhost (WAMP), but I don't think it matters.
loginToKiala();
function loginToKiala(){
$url = 'http://kialaverzendservice.be/sender/start.action';
//POST vars
$fields = array(
'os_username' => urlencode('kialatest#mailinator.com'),
'os_password' => urlencode('test123'),
'os_cookie'=>urlencode('true')//remember me
);
//url-ify the data for the POST
$fields_string='';
foreach($fields as $key=>$value) { $fields_string .= $key.'='.$value.'&'; }
rtrim($fields_string, '&');
$ch=curl_init();
$cookie_file = './cookies.txt';
if (! file_exists($cookie_file) || ! is_writable($cookie_file))
{
echo 'Cookie file missing or not writable.';
exit;
}
//set the url, number of POST vars, POST data
curl_setopt($ch,CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_POST, count($fields));
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
//curl_setopt($ch,CURLOPT_AUTOREFERER, true);
//curl_setopt($ch, CURLOPT_REFERER, 'http://www.kialaverzendservice.be/');//set referer for first request
//curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
curl_setopt($ch, CURLOPT_HEADER, 1);
//curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_exec ($ch); // execute the curl command
curl_close ($ch);
unset($ch);
}
any help is appreciated!
First of all you're not using CURLOPT_POST correctly. cURL is expecting TRUE, FALSE, 1 or 0. The function you have there, count($fields) will (probably) return a number higher than one.
Do this instead:
curl_setopt( $ch, CURLOPT_POST, 1 );
At least, if I understand correctly that you want to use POST instead of GET.
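Putting that together with the code from the question, a cleaned-up login request might look like the sketch below. It is only an outline under a few assumptions: the URL and field names are copied from the question, buildLoginBody() is a hypothetical helper, and loginToKiala() needs network access, so it is defined but not called here. http_build_query() replaces the manual loop, so the separate urlencode() calls are no longer needed.

```php
<?php
/** Encode the login fields once; http_build_query() handles the URL-encoding. */
function buildLoginBody(string $user, string $password): string
{
    return http_build_query([
        'os_username' => $user,
        'os_password' => $password,
        'os_cookie'   => 'true', // "remember me", as in the question
    ]);
}

/** Send the login request and return the raw response (needs network access). */
function loginToKiala(string $body, string $cookieFile)
{
    $ch = curl_init('http://kialaverzendservice.be/sender/start.action');
    curl_setopt($ch, CURLOPT_POST, true);              // a boolean switch, not a field count
    curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // written when the handle is released
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // read back on later requests
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    $response = curl_exec($ch);
    unset($ch); // releasing the handle lets cURL flush the cookie jar
    return $response;
}

$loginBody = buildLoginBody('kialatest#mailinator.com', 'test123');
echo $loginBody, PHP_EOL;
```

When calling it, passing an absolute path such as __DIR__ . '/cookies.txt' is probably safer than './cookies.txt', because a relative path depends on the working directory of the web server process.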
I have this cURL code in php.
curl_setopt($ch, CURLOPT_URL, trim("http://stackoverflow.com/questions/tagged/java"));
curl_setopt($ch, CURLOPT_PORT, 80); //ignore explicit setting of port 80
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_setopt($ch, CURLOPT_HTTPHEADER, $v);
curl_setopt($ch, CURLOPT_VERBOSE, true);
The contents of HTTPHEADER are:
Proxy-Connection: Close
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1017.2 Safari/535.19
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: __qca=blabla
Connection: Close
Each of them is an individual item in the array $v.
When I upload the file to my host and run the code, what I get is:
400 Bad request
Your browser sent an invalid request.
But when I run it on my system using command-line PHP, what I get is this, followed by the full page.
< HTTP/1.1 200 OK
< Vary: Accept-Encoding
< Cache-Control: private
< Content-Type: text/html; charset=utf-8
< Content-Encoding: gzip
< Date: Sat, 03 Mar 2012 21:50:17 GMT
< Connection: close
< Set-Cookie: buncha cokkies; path=/; HttpOnly
< Content-Length: 22151
<
* Closing connection #0
.
This doesn't happen only on stackoverflow; it also happens on 4shared, but it works on google and others.
Thanks for any help.
Your empty CURLOPT_ENCODING argument is causing the issue. If you don't want gzip/deflate, simply omit the header.
I also see you're defining encoding both in your curl_setopt() and in the HTTP_HEADER array.
You should use native curl_setopt() commands when possible. CURLOPT_USERAGENT is one you can move out of your HTTP_HEADER array.
But as Andrew Marshall mentioned, screen-scraping isn't something you should be doing; especially since they have an API.
EDIT
Here's the sample script I'm using:
<?php
$v = Array(
'Proxy-Connection: Close',
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1017.2 Safari/535.19',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.8',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Cookie: __qca=blabla',
'Connection: Close'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, trim("http://stackoverflow.com/questions/tagged/java"));
//curl_setopt($ch, CURLOPT_PORT, 80); //ignore explicit setting of port 80
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
//curl_setopt($ch, CURLOPT_ENCODING, "");
curl_setopt($ch, CURLOPT_HTTPHEADER, $v);
curl_setopt($ch, CURLOPT_VERBOSE, true);
echo curl_exec($ch);
?>
Now I'm running this via command-line, but the net effect is the same. I removed the Accept-Encoding in the $v array simply so I could get un-compressed output.
The one thing we haven't established is your PHP and libcurl versions. For me, this is PHP 5.3.2 with libcurl 7.12.1. This can be important. You can find your libcurl version either by php -i | grep -i curl on the command line, or phpinfo() via a web-based script on your server.
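The advice about preferring native options can be applied mechanically. The sketch below is one possible way to do it, with hypothetical function names. It takes a header list like $v, pulls out User-Agent so it can go through CURLOPT_USERAGENT, and drops Accept-Encoding so that encoding is not declared in two places. applyHeaders() needs a cURL handle and is only defined, not called.

```php
<?php
/**
 * Separate headers that have a native cURL option from the rest.
 * Returns ['userAgent' => string|null, 'headers' => string[]].
 */
function splitNativeHeaders(array $headerLines): array
{
    $userAgent = null;
    $rest = [];
    foreach ($headerLines as $line) {
        if (stripos($line, 'User-Agent:') === 0) {
            $userAgent = trim(substr($line, strlen('User-Agent:')));
        } elseif (stripos($line, 'Accept-Encoding:') === 0) {
            continue; // dropped, as in the sample script above
        } else {
            $rest[] = $line;
        }
    }
    return ['userAgent' => $userAgent, 'headers' => $rest];
}

/** Apply the split result to a cURL handle. */
function applyHeaders($ch, array $split): void
{
    if ($split['userAgent'] !== null) {
        curl_setopt($ch, CURLOPT_USERAGENT, $split['userAgent']);
    }
    curl_setopt($ch, CURLOPT_HTTPHEADER, $split['headers']);
}

$v = [
    'Proxy-Connection: Close',
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1017.2 Safari/535.19',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding: gzip,deflate,sdch',
    'Accept-Language: en-US,en;q=0.8',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Cookie: __qca=blabla',
    'Connection: Close',
];

$split = splitNativeHeaders($v);
print_r($split);
```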
It seems some header is breaking the expected request pattern on some sites. The easiest way to fix this would be to remove the headers one by one and test.
I think it should be the encoding one.
It seems the "Host" header is missing:
Host: stackoverflow.com