I'm trying to simulate a real browser request using cURL with rotating proxies. I searched around, but none of the answers I found worked.
Here is the code:
$url= 'https://www.stubhub.com/';
$proxy = '1.10.185.133:30207';
$userAgent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36';
$curl = curl_init();
curl_setopt( $curl, CURLOPT_URL, trim($url) );
curl_setopt($curl, CURLOPT_REFERER, trim($url));
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE );
curl_setopt( $curl, CURLOPT_CONNECTTIMEOUT, 0 );
curl_setopt( $curl, CURLOPT_TIMEOUT, 0 );
curl_setopt( $curl, CURLOPT_AUTOREFERER, TRUE );
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
$cacert='C:/xampp/htdocs/cacert.pem';
curl_setopt( $curl, CURLOPT_CAINFO, $cacert );
curl_setopt($curl, CURLOPT_COOKIEFILE,__DIR__."/cookies.txt");
curl_setopt ($curl, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookies.txt');
curl_setopt($curl, CURLOPT_MAXREDIRS, 5);
curl_setopt( $curl, CURLOPT_USERAGENT, $userAgent );
//Headers
$header = array();
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Accept-Language: cs,en-US;q=0.7,en;q=0.3";
$header[] = "Accept-Encoding: utf-8";
$header[] = "Connection: keep-alive";
$header[] = "Host: www.gumtree.com";
$header[] = "Origin: https://www.stubhub.com";
$header[] = "Referer: https://www.stubhub.com";
curl_setopt( $curl, CURLOPT_HEADER, $header );
curl_setopt($curl, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($curl, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($curl, CURLOPT_PROXY, $proxy);
curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
$data = curl_exec( $curl );
$info = curl_getinfo( $curl );
$error = curl_error( $curl );
$all = array( 'data' => $data, 'info' => $info, 'error' => $error );
echo '<pre>';
print_r($all);
echo '</pre>';
Here is what I get when I run the script:
Array
(
[data] => HTTP/1.1 200 OK
HTTP/1.0 405 Method Not Allowed
Server: nginx
Content-Type: text/html; charset=UTF-8
Accept-Ranges: bytes
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: private, no-cache, no-store, must-revalidate
Surrogate-Control: no-store, bypass-cache
Content-Length: 9411
X-EdgeConnect-MidMile-RTT: 203
X-EdgeConnect-Origin-MEX-Latency: 24
Date: Sat, 03 Nov 2018 17:15:56 GMT
Connection: close
Strict-Transport-Security: max-age=31536000; includeSubDomains
[info] => Array
(
[url] => https://www.stubhub.com/
[content_type] => text/html; charset=UTF-8
[http_code] => 405
[header_size] => 487
[request_size] => 608
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 38.484
[namelookup_time] => 0
[connect_time] => 2.219
[pretransfer_time] => 17.062
[size_upload] => 0
[size_download] => 9411
[speed_download] => 244
[speed_upload] => 0
[download_content_length] => 9411
[upload_content_length] => -1
[starttransfer_time] => 23.859
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 1.10.186.132
[certinfo] => Array
(
)
[primary_port] => 42150
[local_ip] => 192.168.1.25
[local_port] => 59320
)
[error] =>
)
I also get a reCAPTCHA page, which says:
Due to high volume of activity from your computer, our anti-robot software has blocked your access to stubhub.com. Please solve the puzzle below and you will immediately regain access.
When I visit the website in any browser, the page is displayed.
But with the above script, it's not.
So what am I missing to make the cURL request look like a real browser request and not be detected as a bot?
Or, if there is an API/library that could do it, please mention it.
Would Guzzle or similar fix this issue?
"So what am I missing to make the curl request like a real browser request"
My guess is they are using a simple cookie check. There are more sophisticated methods that can recognize automation such as cURL with a high degree of reliability, especially when coupled with lists of proxy IP addresses or IPs of known offenders.
Your first step is to intercept the outgoing browser request using pcap or something similar, then try and replicate it using cURL.
One other simple thing to check is whether your cookie jar has been seeded with some telltale. I routinely do that too, since most scripts on the Internet are just copy-pastes and don't pay much attention to these details.
The thing that would definitely make you bounce off any of my systems is that you're sending a Referer, but you don't appear to have actually visited the first page. You're essentially saying "well met again" to a server that is seeing you for the first time. You might have saved a cookie from that first encounter, and the cookie has since been invalidated (actually marked "evil") by some other action. At least in the beginning, always replicate the visiting sequence from a clean slate.
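A minimal sketch of that clean-slate sequence, assuming the same cookies.txt jar as the question and a hypothetical second URL: visit the landing page once with no Referer so the jar gets seeded naturally, then reuse the same handle and jar for the follow-up request that does carry one.

$jar = __DIR__ . '/cookies.txt';
$ua  = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36';

$curl = curl_init('https://www.stubhub.com/');
curl_setopt_array($curl, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_USERAGENT      => $ua,
    CURLOPT_COOKIEFILE     => $jar,   // read cookies (empty on the very first run)
    CURLOPT_COOKIEJAR      => $jar,   // persist whatever the server sets
));
curl_exec($curl);                     // first visit: no Referer at all

// Second request: only now is a Referer plausible, and the jar holds real cookies.
curl_setopt($curl, CURLOPT_URL, 'https://www.stubhub.com/find/concert-tickets/'); // hypothetical next page
curl_setopt($curl, CURLOPT_REFERER, 'https://www.stubhub.com/');
$html = curl_exec($curl);
curl_close($curl);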
You might try and adapt this answer, also cURL-based. Always verify actual traffic using a MitM SSL-decoding proxy.
Now, the real answer - what do you need that information for? Can you get it somewhere else? Can you ask for it explicitly, maybe reach an agreement with the source site?
Related
I am sending a POST request via PHP cURL to a REST API that uses XML. When I use Postman or Advanced REST Client, I get an XML response to my POST request. However, when I use PHP and cURL I do not seem to be able to get the XML responses back. What do I need to do to get them? Eventually I need to retrieve a token that I can then use to process inserts, updates and gets through this API via XML.
Here is the code that I am currently using:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => '',
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 0,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => 'POST',
CURLOPT_HTTPHEADER => array(
'xxxxxx-Username: xxx',
'xxxxxx-Password: xxx',
'content-type: application/xml'
),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
and currently I am getting a blank page. I have tried quite a few solutions, like the following:
//header("Content-Type: text/xml");
//header('Content-type: application/xml');
//$decoded = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $response);
//echo $decoded;
//echo $response;
//print_r($response);
// set up your xml result
$xml = new SimpleXMLElement($response, LIBXML_NOCDATA);
// loop through the results
$cnt = count($xml->Result);
for($i=0; $i<$cnt; $i++){
echo 'XML : First Name: = ';
}
but nothing seems to give me back what I get from Postman or Advanced REST Client, which for this particular command is the following:
<?xml version="1.0" encoding="UTF-8"?>
<AuthInfo>
<token/>
<AuthStatus>
<Id>503</Id>
<Description>There's no proapi manager running with the given company code: crmapp</Description>
</AuthStatus>
</AuthInfo>
I understand that at this stage there is an issue with my URL that I need to fix, but I should still be able to receive that error back as XML.
Can anyone please help me get this XML response back so that I can progress my interface?
Thank you in advance,
Adri
Thanks again, Professor. Here is the full debug output with the latest versions of PHP and cURL.
Verbose debug info
* Trying xxx.xx.xxx.xxx:443...
* Connected to xxxxx-xx-xx.xxxxxxxx.com.au (xxx.xx.xxx.xxx) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: D:/Adri/PHP/MoW/famac/cacert.pem
* CApath: D:/Adri/PHP/MoW/famac/cacert.pem
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=*.prontohosted.com.au
* start date: Jun 2 00:00:00 2020 GMT
* expire date: Sep 4 00:00:00 2022 GMT
* subjectAltName: host "xxxxx-xx-xx.xxxxxxxx.com.au" matched cert's "*.xxxxxxxx.com.au"
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
> GET /xxxxx/rest/xxx.xxx/login HTTP/1.1
Host: xxxxx-xx-xx.xxxxxxxx.com.au
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.38 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.38
Accept: */*
Accept-Encoding: deflate, gzip
xxxxxx-Username: xxx
xxxxxx-Password: xxx
Content-Type: application/xml
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Tue, 09 Nov 2021 11:34:57 GMT
< Server: Apache
< Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Security-Policy: img-src 'self' *.xxxxx.net *.xxxxx.com.au https://www.google.com https://*.googleapis.com/ www.google-analytics.com stats.g.doubleclick.net http://*.xxxxx-xxxxx.com *.twitter.com *.twimg.com data: blob: https://*.google.com https://*.gstatic.com https://*.googleapis.com; frame-src * blob:; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.xxxxx.net *.xxxxx.com.au https://*.google.com www.google-analytics.com *.twitter.com *.twimg.com https://*.googleapis.com https://jawj.github.io https://*.gstatic.com; connect-src 'self' wss: blob: *.twitter.com www.google-analytics.com stats.g.doubleclick.net; base-uri 'none'; style-src 'self' 'unsafe-inline' *.twitter.com *.twimg.com https://*.google.com *.googleapis.com https://*.gstatic.com; font-src 'self' data: https://*.googleapis.com https://fonts.gstatic.com; child-src * blob:; object-src 'none'; default-src 'self' blob:
< X-Permitted-Cross-Domain-Policies: master-only
< Content-Type: text/html; charset=UTF-8
< Content-Length: 994
* The requested URL returned error: 404
* Closing connection 0
Info
stdClass Object
(
[url] => https://xxxxx-xx-xx.xxxxxxxx.com.au/xxxxx/rest/xxx.xxx/login
[content_type] => text/html; charset=UTF-8
[http_code] => 404
[header_size] => 1271
[request_size] => 350
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.232624
[namelookup_time] => 0.029367
[connect_time] => 0.05058
[pretransfer_time] => 0.162497
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => 994
[upload_content_length] => 0
[starttransfer_time] => 0.232609
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => xxx.xx.xxx.xxx
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => xxx.xxx.x.xxx
[local_port] => 52711
[http_version] => 2
[protocol] => 2
[ssl_verifyresult] => 0
[scheme] => HTTPS
[appconnect_time_us] => 162464
[connect_time_us] => 50580
[namelookup_time_us] => 29367
[pretransfer_time_us] => 162497
[redirect_time_us] => 0
[starttransfer_time_us] => 232609
[total_time_us] => 232624
)
Can you please let me know what you think of this? While I am no longer getting the previous error, I still seem unable to receive the XML response back. :(
Thank you in advance, Adri
The curl function I use is as follows. It has extra debugging information in the output, and the default settings can easily be overridden at runtime by supplying a different $options argument. I'm not suggesting this is the answer, but with a better set of options configured and better debug info you should get closer.
function curl( $url=NULL, $options=NULL, $headers=false ){
$cacert='c:/wwwroot/cacert.pem';
$vbh = fopen('php://temp', 'w+');
/*
Download a copy of CACERT.pem from
https://curl.haxx.se/docs/caextract.html
save to webserver and modify the $cacert variable
to suit - ensuring that the path you choose is
readable.
*/
$res=array(
'response' => NULL,
'info' => array( 'http_code' => 100 ),
'headers' => NULL,
'errors' => NULL
);
if( is_null( $url ) ) return (object)$res;
session_write_close();
/* Initialise curl request object - these should be OK as-is */
$curl=curl_init();
if( parse_url( $url,PHP_URL_SCHEME )=='https' ){
curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, true );
curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 2 );
curl_setopt( $curl, CURLOPT_CAINFO, $cacert );
curl_setopt( $curl, CURLOPT_CAPATH, $cacert );
}
/* Define standard options */
curl_setopt( $curl, CURLOPT_URL,trim( $url ) );
curl_setopt( $curl, CURLOPT_AUTOREFERER, true );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_FAILONERROR, true );
curl_setopt( $curl, CURLOPT_HEADER, false );
curl_setopt( $curl, CURLINFO_HEADER_OUT, false );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $curl, CURLOPT_BINARYTRANSFER, true );
curl_setopt( $curl, CURLOPT_CONNECTTIMEOUT, 20 );
curl_setopt( $curl, CURLOPT_TIMEOUT, 60 );
curl_setopt( $curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.38 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.38' );
curl_setopt( $curl, CURLOPT_MAXREDIRS, 10 );
curl_setopt( $curl, CURLOPT_ENCODING, '' );
/* enhanced debug */
curl_setopt( $curl, CURLOPT_VERBOSE, true );
curl_setopt( $curl, CURLOPT_NOPROGRESS, true );
curl_setopt( $curl, CURLOPT_STDERR, $vbh );
/* Assign runtime parameters as options to override defaults if needed. */
if( isset( $options ) && is_array( $options ) ){
foreach( $options as $param => $value ) curl_setopt( $curl, $param, $value );
}
/* send any headers with the request that are needed */
if( $headers && is_array( $headers ) ){
curl_setopt( $curl, CURLOPT_HTTPHEADER, $headers );
}
/* Execute the request and store responses */
$res=(object)array(
'response' => curl_exec( $curl ),
'info' => (object)curl_getinfo( $curl ),
'errors' => curl_error( $curl )
);
rewind( $vbh );
$res->verbose=stream_get_contents( $vbh );
fclose( $vbh );
curl_close( $curl );
return $res;
}
Then, to use it:
$url='https://www.example.com/api/';
$args=array();
$headers=array(
'xxxxxx-Username: xxx',
'xxxxxx-Password: xxx',
'Content-Type: application/xml'
);
$res=curl( $url, $args, $headers );
if( $res->info->http_code==200 ){
#cool - use $res->response in further processing
print_r( $res->response );
}else{
# useful information will be displayed here...
printf('<h1>Verbose debug info</h1><pre>%s</pre>',print_r($res->verbose,true));
printf('<h1>Info</h1><pre>%s</pre>',print_r($res->info,true));
}
Update to show how to send POST data
You use the $options parameter to supply different runtime configuration to the curl request, like so:
$url='https://www.example.com/api/';
$args=array(
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => $send_body
);
$headers=array(
'xxxxxx-Username: xxx',
'xxxxxx-Password: xxx',
'Content-Type: application/xml'
);
$res=curl( $url, $args, $headers );
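Once the request actually reaches the API, the XML body the asker is after can be parsed with SimpleXMLElement. A minimal sketch, assuming the <AuthInfo> structure shown earlier and a $res returned by the curl() helper above:

$res = curl( $url, $args, $headers );
if( $res->info->http_code==200 ){
    // Parse the XML body (structure as shown in the question: AuthInfo/AuthStatus).
    $xml = new SimpleXMLElement( $res->response, LIBXML_NOCDATA );
    $token  = (string)$xml->token;
    $status = (string)$xml->AuthStatus->Id;
    $desc   = (string)$xml->AuthStatus->Description;
    printf( 'AuthStatus %s: %s (token: %s)', $status, $desc, $token );
}else{
    printf( '<h1>Verbose debug info</h1><pre>%s</pre>', print_r( $res->verbose, true ) );
}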
I'm trying to get some data from this website: https://stubhub.com.
1- With file_get_contents:
$url= 'https://www.stubhub.com';
$html = file_get_contents($url);
echo $html;
I get:
Warning: file_get_contents(https://stubhub.com): failed to open stream: HTTP request failed! HTTP/1.0 405 Method Not Allowed
2- With CURL:
$url= 'https://www.stubhub.com';
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_REFERER, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$html = curl_exec($curl);
$response = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
var_dump($html);
var_dump($response);
But I get:
bool(false) int(0)
I tried to add some headers like User-Agent and proxy:
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201');
$proxy = '185.135.226.159:23500';
curl_setopt($curl, CURLOPT_PROXY, $proxy);
But again I get the same result.
I have allow_url_fopen=On, so what's wrong?
function curl( $url=NULL, $options=NULL ){
$cacert='c:/wwwroot/cacert.pem'; # <----- download your own copy and configure this path
$vbh = fopen('php://temp', 'w+');
$res=array(
'response' => NULL,
'info' => array( 'http_code' => 100 ),
'headers' => NULL,
'errors' => NULL
);
if( is_null( $url ) ) return (object)$res;
session_write_close();
/* Initialise curl request object */
$curl=curl_init();
if( parse_url( $url,PHP_URL_SCHEME )=='https' ){
curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, true );
curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 2 );
curl_setopt( $curl, CURLOPT_CAINFO, $cacert );
}
/* Define standard options */
curl_setopt( $curl, CURLOPT_URL,trim( $url ) );
curl_setopt( $curl, CURLOPT_AUTOREFERER, true );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_FAILONERROR, true );
curl_setopt( $curl, CURLOPT_HEADER, false );
curl_setopt( $curl, CURLINFO_HEADER_OUT, false );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $curl, CURLOPT_BINARYTRANSFER, true );
curl_setopt( $curl, CURLOPT_CONNECTTIMEOUT, 20 );
curl_setopt( $curl, CURLOPT_TIMEOUT, 60 );
curl_setopt( $curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36' );
curl_setopt( $curl, CURLOPT_MAXREDIRS, 10 );
curl_setopt( $curl, CURLOPT_ENCODING, '' );
curl_setopt( $curl, CURLOPT_VERBOSE, true );
curl_setopt( $curl, CURLOPT_NOPROGRESS, true );
curl_setopt( $curl, CURLOPT_STDERR, $vbh );
/* Assign runtime parameters as options */
if( isset( $options ) && is_array( $options ) ){
foreach( $options as $param => $value ) curl_setopt( $curl, $param, $value );
}
/* Execute the request and store responses */
$res=(object)array(
'response' => curl_exec( $curl ),
'info' => (object)curl_getinfo( $curl ),
'errors' => curl_error( $curl )
);
rewind( $vbh );
$res->verbose=stream_get_contents( $vbh );
fclose( $vbh );
curl_close( $curl );
return $res;
}
$url='https://www.stubhub.com/';
$res = curl( $url );
if( $res->info->http_code==200 ){
printf('<pre>%s</pre>',print_r( $res->info,true ));
printf('<pre>%s</pre>',print_r( $res->verbose,true ));
}
This will output:
stdClass Object
(
[url] => https://www.stubhub.com/
[content_type] => text/html
[http_code] => 200
[header_size] => 1304
[request_size] => 214
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.609
[namelookup_time] => 0.25
[connect_time] => 0.265
[pretransfer_time] => 0.39
[size_upload] => 0
[size_download] => 1194
[speed_download] => 1960
[speed_upload] => 0
[download_content_length] => 1194
[upload_content_length] => -1
[starttransfer_time] => 0.609
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 23.43.75.46
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => 192.168.0.56
[local_port] => 5042
)
* Trying 23.43.75.46...
* TCP_NODELAY set
* Connected to www.stubhub.com (23.43.75.46) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
CAfile: c:/wwwroot/cacert.pem
CApath: none
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=California; L=San Francisco; O=Stubhub, Inc.; OU=Technology; CN=www.stubhub.com
* start date: Jun 11 00:00:00 2018 GMT
* expire date: Jan 9 12:00:00 2020 GMT
* subjectAltName: host "www.stubhub.com" matched cert's "www.stubhub.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert ECC Secure Server CA
* SSL certificate verify ok.
> GET / HTTP/1.1
Host: www.stubhub.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept: */*
Accept-Encoding: deflate, gzip
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: text/html
< Expires: Thu, 01 Jan 1970 00:00:01 GMT
< Cache-Control: private, no-cache, no-store, must-revalidate
< Surrogate-Control: no-store, bypass-cache
< Content-Encoding: gzip
< X-EdgeConnect-MidMile-RTT: 163
< X-EdgeConnect-Origin-MEX-Latency: 24
< X-Akamai-Transformed: 9 624 0 pmb=mTOE,1mRUM,1
< Date: Sat, 20 Oct 2018 16:25:57 GMT
< Content-Length: 1194
< Connection: keep-alive
< Vary: Accept-Encoding
< Set-Cookie: DC=lvs31;Path=/;Domain=stubhub.com;Expires=Sat, 20-Oct-2018 16:55:56 GMT;Max-Age=1800
< Set-Cookie: akacd_PCF_Prod=1540053357~rv=98~id=53e183ee10a83152497c9102c8c7dee7; path=/; Expires=Sat, 20 Oct 2018 16:35:57 GMT
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Set-Cookie: _abck=10D08E1267D29C2EDBEA32445BD116805C7A3616AB3500001557CB5B9AD22713~-1~e+BGOJkoD/UwtPOWH75YXUSo6Kzyd7sF6nTkkw89JfE=~-1~-1; expires=Sun, 20 Oct 2019 16:25:57 GMT; max-age=31536000; path=/; domain=.stubhub.com
< Set-Cookie: bm_sz=7C06CFF7557E22DEC7855EC89DF628B0~QAAQFjZ6XGg5goBmAQAAIypMkhVJRZxwtVU8097T7Q8Z2TcGPZR0XRtAVFY3TBHGsR4EW51MqZlCAyk3cMPDJEmukVvLunM36/5Kn1gtoxarUtgkqBvlfudWZBJb2xc1rHdnMhdsAXoHWLaGt0NwROSXckDe48kkqu2Kw3suRgrWcqDlj7Y1akARK8OYnoa6; Domain=.stubhub.com; Path=/; Expires=Sat, 20 Oct 2018 20:25:56 GMT; Max-Age=14399; HttpOnly
<
* Connection #0 to host www.stubhub.com left intact
To access the actual response body you would process $res->response: load it into DOMDocument or whatever else you intend to do with it... good luck
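For example, a minimal sketch of that last step, assuming the page contains ordinary links; loadHTML() is wrapped with libxml_use_internal_errors() because real-world markup is rarely valid:

$res = curl( 'https://www.stubhub.com/' );
if( $res->info->http_code==200 ){
    libxml_use_internal_errors( true );      // silence warnings from messy HTML
    $dom = new DOMDocument();
    $dom->loadHTML( $res->response );
    libxml_clear_errors();

    $xpath = new DOMXPath( $dom );
    foreach( $xpath->query( '//a[@href]' ) as $a ){
        printf( "%s => %s\n", trim( $a->textContent ), $a->getAttribute( 'href' ) );
    }
}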
I am trying to send a push notification, but I am running into a problem.
$url = 'https://android.googleapis.com/gcm/send';
$fields = array(
'registration_ids' => $id,
'data' => $load,
);
$headers = array(
'Content-Type: application/json',
'Authorization: key=' . GOOGLE_API_KEY,
);
// Open connection
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
//curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11');
// Disabling SSL certificate support temporarily
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($fields, true));
// Execute post
$result = curl_exec($ch);
$httpcode = curl_getinfo($ch);
When I look at the cURL info I see the http_code is 400. I have tried everything, but I am still having this problem and the push notification is not working.
Can you guys please help me? I am stuck here.
Array
(
[url] => https://android.googleapis.com/gcm/send
[content_type] => text/plain; charset=UTF-8
[http_code] => 400
[header_size] => 406
[request_size] => 698
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.22549
[namelookup_time] => 0.028427
[connect_time] => 0.030052
[pretransfer_time] => 0.189248
[size_upload] => 382
[size_download] => 41
[speed_download] => 181
[speed_upload] => 1694
[download_content_length] => -1
[upload_content_length] => 382
[starttransfer_time] => 0.225382
[redirect_time] => 0
[certinfo] => Array
(
)
[primary_ip] => 172.217.6.234
[primary_port] => 443
[local_ip] => 162.243.229.189
[local_port] => 54327
[redirect_url] =>
)
Error 400 means Bad Request. In your code, the $headers array uses GOOGLE_API_KEY, which is not defined in your script. You also have not defined $id and $load, from what I can see.
Firstly, you will need a Google Cloud Messaging API key, which you can obtain from the Google Developers Console.
Please note that the method you are employing requires a client-side app installed on the mobile device. An alternative would be to send a web-push notification, which pushes the notification to Chrome (or any other compatible web browser) as long as it is running and has any tab open. If this is an option you think is worth looking at, I suggest the web-push library on GitHub.
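To illustrate the point about the undefined values, a minimal sketch of what would need to exist before the cURL section runs; the key, registration ID, and payload keys below are placeholders, not real values:

// Placeholder values - substitute your own server API key and device tokens.
define( 'GOOGLE_API_KEY', 'AIza...your-server-key...' );

$id = array(                  // one or more device registration IDs
    'device-registration-id-goes-here',
);

$load = array(                // arbitrary key/value payload your client app understands
    'title'   => 'Hello',
    'message' => 'Push notification test',
);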
I'm trying to scrape this URL with cURL: xxxx.fr. But I cannot get at the page's HTML code; both the header and the body come back empty.
The HTTP code returned is 200.
I tried other URLs (on different domains) and it works like a charm.
I also tried different User-Agent and Referer values.
Do you know what is wrong? Or at least, can someone try this code on their own server and let me know if they get the same issue?
Thank you
Below is my code:
$url = 'http://www.xxxx.fr';
$header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: timeout=5, max=100";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = ""; // BROWSERS USUALLY LEAVE BLANK
$curl = curl_init ();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0");
curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($curl, CURLOPT_REFERER, "http://www.google.fr");
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLINFO_HEADER_OUT, 1);
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, getcwd().'/cookies.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, getcwd().'/cookies.txt');
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$curlData = curl_exec($curl);
$infos = curl_getinfo($curl);
print_r($infos);
curl_close ( $curl );
echo "<hr>Page:<br />";
echo htmlentities($curlData);
and here is the result from the print_r($infos):
Array (
[url] => http://www.xxxx.fr
[content_type] => text/html
[http_code] => 200
[header_size] => 625
[request_size] => 465
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.032535
[namelookup_time] => 0.001488
[connect_time] => 0.002581
[pretransfer_time] => 0.002639
[size_upload] => 0
[size_download] => 10234
[speed_download] => 314553
[speed_upload] => 0
[download_content_length] => -1
[upload_content_length] => 0
[starttransfer_time] => 0.032088
[redirect_time] => 0
[certinfo] => Array ( )
[primary_ip] => xxx
[primary_port] => 80
[local_ip] => xxx
[local_port] => 37319
[redirect_url] =>
[request_header] => GET / HTTP/1.1 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0 Host: www.xxxx.fr Accept-Encoding: gzip,deflate Referer: http://www.google.fr Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Cache-Control: max-age=0 Connection: keep-alive Keep-Alive: timeout=5, max=100 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Accept-Language: en-us,en;q=0.5
)
//EDIT
htmlentities($curlData) returns an empty string because the source encoding is not UTF-8 (see this link).
This should work:
htmlentities($curlData, ENT_QUOTES, 'ISO-8859-1');
As of PHP 5.4, htmlspecialchars() no longer uses ISO-8859-1 as its default encoding; it uses UTF-8. You might expect htmlspecialchars() to simply skip non-UTF-8 byte sequences or translate them to a 'not found' character. In fact, it returns a blank string: no error is generated, no error code is returned, no exception is raised; you just get a blank string back whenever invalid UTF-8 sequences are passed in.
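A minimal sketch of the same fix without hard-coding the charset, assuming the mbstring extension is available: detect the source encoding, normalise to UTF-8, and only then escape.

// Detect the charset of the scraped page and convert it to UTF-8
// so htmlentities() never sees an invalid byte sequence.
$encoding = mb_detect_encoding( $curlData, array( 'UTF-8', 'ISO-8859-1', 'Windows-1252' ), true );
if( $encoding !== 'UTF-8' ){
    $curlData = mb_convert_encoding( $curlData, 'UTF-8', $encoding ?: 'ISO-8859-1' );
}
echo htmlentities( $curlData, ENT_QUOTES, 'UTF-8' );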
I'm trying to scrape a website, but it always says "Empty reply from server".
Can anyone look at the code and tell me what I am doing wrong?
Here is the code:
function spider($url){
$header = array(
"Host" => "www.example.net",
//"Accept-Encoding:gzip,deflate,sdch",
"Accept-Language:en-US,en;q=0.8",
"Cache-Control:max-age=0",
"Connection:keep-alive","Content-Length:725","Content-Type:application/x-www-form-urlencoded",
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
,"X-Requested-With:XMLHttpRequest"
);
$cookie = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0); // return headers 0 no 1 yes
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return page 1:yes
curl_setopt($ch, CURLOPT_TIMEOUT, 200); // http request time-out, 200 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow redirects, need this if the URL changes
curl_setopt($ch, CURLOPT_MAXREDIRS, 2); //if http server gives redirection response
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7");
curl_setopt($ch, CURLOPT_COOKIEJAR, realpath( $cookie)); // cookies storage / here the changes have been made
curl_setopt($ch, CURLOPT_COOKIEFILE, realpath( $cookie));
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // false for https
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS,"view=ViewDistrict¶m=7&uniqueid=1397991494188&PHPSESSID=f134vrnv7glosgojvf4n1mp7o2&page=http%3A%2F%2Fwww.example.com%2Fxhr.php");
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_REFERER, "http://www.example.com/");
$data = curl_exec($ch); // execute the http request
$info = curl_getinfo($ch);
curl_close($ch); // close the connection
return $data;
}
Here is the function call:
echo spider("http://www.example.net/");
Edit
Array
(
[url] => http://www.example.net/
[content_type] => text/html
[http_code] => 301
[header_size] => 196
[request_size] => 840
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 1
[total_time] => 61.359
[namelookup_time] => 0
[connect_time] => 0.281
[pretransfer_time] => 0.281
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => 178
[upload_content_length] => 0
[starttransfer_time] => 60.593
[redirect_time] => 0.766
[certinfo] => Array
(
)
[redirect_url] =>
)
Empty reply from server
This is the curl_getinfo() output now.
I also updated my POST data; it is now:
curl_setopt($ch, CURLOPT_POSTFIELDS,"view=ViewDistrict&param=7&uniqueid=".time(). rand(101,500)."&PHPSESSID=f134vrnv7glosgojvf4n1mp7o2&page=http%3A%2F%2Fexample.com%2Fxhr.php");
and I also removed "X-Requested-With:XMLHttpRequest" from the headers.
Have you tried removing this from the headers?
X-Requested-With:XMLHttpRequest
My guess is that your problem is in this line:
curl_setopt(
$ch,
CURLOPT_POSTFIELDS,
"view=ViewDistrict¶m=7&uniqueid=1397991494188&PHPSESSID=f134vrnv7glosgojvf4n1mp7o2&page=http%3A%2F%2Fwww.example.com%2Fxhr.php"
);
Notice that you're passing a value for PHPSESSID. I'm guessing you copied & pasted a URL from a visit to the site, right? That session ID was probably valid when you visited the site, but the odds of it being valid now are pretty slim. And if the server doesn't like the session ID, chances are it's not going to give you any data.