I'm trying to simulate a real browser request using cURL with rotating proxies. I searched around, but none of the answers I found worked.
Here is the code:
$url= 'https://www.stubhub.com/';
$proxy = '1.10.185.133:30207';
$userAgent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36';
$curl = curl_init();
curl_setopt( $curl, CURLOPT_URL, trim($url) );
curl_setopt($curl, CURLOPT_REFERER, trim($url));
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE );
curl_setopt( $curl, CURLOPT_CONNECTTIMEOUT, 0 );
curl_setopt( $curl, CURLOPT_TIMEOUT, 0 );
curl_setopt( $curl, CURLOPT_AUTOREFERER, TRUE );
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
$cacert='C:/xampp/htdocs/cacert.pem';
curl_setopt( $curl, CURLOPT_CAINFO, $cacert );
curl_setopt($curl, CURLOPT_COOKIEFILE,__DIR__."/cookies.txt");
curl_setopt ($curl, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookies.txt');
curl_setopt($curl, CURLOPT_MAXREDIRS, 5);
curl_setopt( $curl, CURLOPT_USERAGENT, $userAgent );
//Headers
$header = array();
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Accept-Language: cs,en-US;q=0.7,en;q=0.3";
$header[] = "Accept-Encoding: utf-8";
$header[] = "Connection: keep-alive";
$header[] = "Host: www.gumtree.com";
$header[] = "Origin: https://www.stubhub.com";
$header[] = "Referer: https://www.stubhub.com";
curl_setopt( $curl, CURLOPT_HEADER, $header );
curl_setopt($curl, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($curl, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($curl, CURLOPT_PROXY, $proxy);
curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
$data = curl_exec( $curl );
$info = curl_getinfo( $curl );
$error = curl_error( $curl );
$all = array( 'data' => $data, 'info' => $info, 'error' => $error );
echo '<pre>';
print_r($all);
echo '</pre>';
Here is what I get when I run the script:
Array
(
[data] => HTTP/1.1 200 OK
HTTP/1.0 405 Method Not Allowed
Server: nginx
Content-Type: text/html; charset=UTF-8
Accept-Ranges: bytes
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: private, no-cache, no-store, must-revalidate
Surrogate-Control: no-store, bypass-cache
Content-Length: 9411
X-EdgeConnect-MidMile-RTT: 203
X-EdgeConnect-Origin-MEX-Latency: 24
Date: Sat, 03 Nov 2018 17:15:56 GMT
Connection: close
Strict-Transport-Security: max-age=31536000; includeSubDomains
[info] => Array
(
[url] => https://www.stubhub.com/
[content_type] => text/html; charset=UTF-8
[http_code] => 405
[header_size] => 487
[request_size] => 608
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 38.484
[namelookup_time] => 0
[connect_time] => 2.219
[pretransfer_time] => 17.062
[size_upload] => 0
[size_download] => 9411
[speed_download] => 244
[speed_upload] => 0
[download_content_length] => 9411
[upload_content_length] => -1
[starttransfer_time] => 23.859
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 1.10.186.132
[certinfo] => Array
(
)
[primary_port] => 42150
[local_ip] => 192.168.1.25
[local_port] => 59320
)
[error] =>
)
I also get a reCAPTCHA page, which says:
Due to high volume of activity from your computer, our anti-robot software has blocked your access to stubhub.com. Please solve the puzzle below and you will immediately regain access.
When I visit the website in any browser, the page is displayed.
But with the above script, it's not.
So what am I missing to make the cURL request look like a real browser request and not be detected as a bot?
Or, if there is an API/library that could do it, please mention it.
Would Guzzle or similar fix this issue?
"So what am I missing to make the curl request like a real browser request"
My guess is they are using a simple cookie check. There are more sophisticated methods that can recognize automation such as cURL with a high degree of reliability, especially when coupled with lists of proxy IP addresses or IPs of known offenders.
Your first step is to intercept the outgoing browser request using pcap or something similar, then try and replicate it using cURL.
One other simple thing to check is whether your cookie jar has been seeded with some telltale. I routinely do that too, since most scripts on the Internet are just copy-pastes and don't pay much attention to these details.
The thing that would definitely make you bounce off any of my systems is that you're sending a Referer, but you don't appear to have actually visited the first page. You're essentially saying "well met again" to a server that is seeing you for the first time. You might have saved a cookie from that first encounter, and the cookie has since been invalidated (actually marked "evil") by some other action. At least in the beginning, always replicate the visiting sequence from a clean slate.
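A minimal sketch of that clean-slate sequence, assuming the same cookies.txt jar as the question and a hypothetical second URL: visit the landing page once with no Referer so the jar gets seeded naturally, then reuse the same handle and jar for the follow-up request that does carry one.

$jar = __DIR__ . '/cookies.txt';
$ua  = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36';

$curl = curl_init('https://www.stubhub.com/');
curl_setopt_array($curl, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_USERAGENT      => $ua,
    CURLOPT_COOKIEFILE     => $jar,   // read cookies (empty on the very first run)
    CURLOPT_COOKIEJAR      => $jar,   // persist whatever the server sets
));
curl_exec($curl);                     // first visit: no Referer at all

// Second request: only now is a Referer plausible, and the jar holds real cookies.
curl_setopt($curl, CURLOPT_URL, 'https://www.stubhub.com/find/concert-tickets/'); // hypothetical next page
curl_setopt($curl, CURLOPT_REFERER, 'https://www.stubhub.com/');
$html = curl_exec($curl);
curl_close($curl);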
You might try and adapt this answer, also cURL-based. Always verify actual traffic using a MitM SSL-decoding proxy.
Now, the real answer - what do you need that information for? Can you get it somewhere else? Can you ask for it explicitly, maybe reach an agreement with the source site?
Related
I am sending a POST request via PHP cURL to a REST API that uses XML. When I use Postman or Advanced REST Client, I get an XML response to my POST request. However, when I use PHP and cURL I do not seem to be able to get the XML responses back. What do I need to do to get them? Eventually I need to retrieve a token that I can then use to process inserts, updates and gets through this API via XML.
Here is the code that I am currently using:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => '',
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 0,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => 'POST',
CURLOPT_HTTPHEADER => array(
'xxxxxx-Username: xxx',
'xxxxxx-Password: xxx',
'content-type: application/xml'
),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
and currently I am getting a blank page. I have tried quite a few solutions, like the following:
//header("Content-Type: text/xml");
//header('Content-type: application/xml');
//$decoded = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $response);
//echo $decoded;
//echo $response;
//print_r($response);
// set up your xml result
$xml = new SimpleXMLElement($response, LIBXML_NOCDATA);
// loop through the results
$cnt = count($xml->Result);
for($i=0; $i<$cnt; $i++){
echo 'XML : First Name: = ';
}
but nothing seems to give me back what I get from Postman or Advanced REST Client, which for this particular command is the following:
<?xml version="1.0" encoding="UTF-8"?>
<AuthInfo>
<token/>
<AuthStatus>
<Id>503</Id>
<Description>There's no proapi manager running with the given company code: crmapp</Description>
</AuthStatus>
</AuthInfo>
I understand that at this stage there is an issue with my URL that I need to fix, but I should still be able to receive that error back as XML.
Can anyone please help me get this XML response back so that I can progress my interface?
Thank you in advance,
Adri
Thanks again, Professor. Here is the full debug output with the latest versions of PHP and cURL.
Verbose debug info
* Trying xxx.xx.xxx.xxx:443...
* Connected to xxxxx-xx-xx.xxxxxxxx.com.au (xxx.xx.xxx.xxx) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: D:/Adri/PHP/MoW/famac/cacert.pem
* CApath: D:/Adri/PHP/MoW/famac/cacert.pem
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=*.prontohosted.com.au
* start date: Jun 2 00:00:00 2020 GMT
* expire date: Sep 4 00:00:00 2022 GMT
* subjectAltName: host "xxxxx-xx-xx.xxxxxxxx.com.au" matched cert's "*.xxxxxxxx.com.au"
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
> GET /xxxxx/rest/xxx.xxx/login HTTP/1.1
Host: xxxxx-xx-xx.xxxxxxxx.com.au
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.38 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.38
Accept: */*
Accept-Encoding: deflate, gzip
xxxxxx-Username: xxx
xxxxxx-Password: xxx
Content-Type: application/xml
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Tue, 09 Nov 2021 11:34:57 GMT
< Server: Apache
< Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Security-Policy: img-src 'self' *.xxxxx.net *.xxxxx.com.au https://www.google.com https://*.googleapis.com/ www.google-analytics.com stats.g.doubleclick.net http://*.xxxxx-xxxxx.com *.twitter.com *.twimg.com data: blob: https://*.google.com https://*.gstatic.com https://*.googleapis.com; frame-src * blob:; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.xxxxx.net *.xxxxx.com.au https://*.google.com www.google-analytics.com *.twitter.com *.twimg.com https://*.googleapis.com https://jawj.github.io https://*.gstatic.com; connect-src 'self' wss: blob: *.twitter.com www.google-analytics.com stats.g.doubleclick.net; base-uri 'none'; style-src 'self' 'unsafe-inline' *.twitter.com *.twimg.com https://*.google.com *.googleapis.com https://*.gstatic.com; font-src 'self' data: https://*.googleapis.com https://fonts.gstatic.com; child-src * blob:; object-src 'none'; default-src 'self' blob:
< X-Permitted-Cross-Domain-Policies: master-only
< Content-Type: text/html; charset=UTF-8
< Content-Length: 994
* The requested URL returned error: 404
* Closing connection 0
Info
stdClass Object
(
[url] => https://xxxxx-xx-xx.xxxxxxxx.com.au/xxxxx/rest/xxx.xxx/login
[content_type] => text/html; charset=UTF-8
[http_code] => 404
[header_size] => 1271
[request_size] => 350
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.232624
[namelookup_time] => 0.029367
[connect_time] => 0.05058
[pretransfer_time] => 0.162497
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => 994
[upload_content_length] => 0
[starttransfer_time] => 0.232609
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => xxx.xx.xxx.xxx
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => xxx.xxx.x.xxx
[local_port] => 52711
[http_version] => 2
[protocol] => 2
[ssl_verifyresult] => 0
[scheme] => HTTPS
[appconnect_time_us] => 162464
[connect_time_us] => 50580
[namelookup_time_us] => 29367
[pretransfer_time_us] => 162497
[redirect_time_us] => 0
[starttransfer_time_us] => 232609
[total_time_us] => 232624
)
Can you please let me know what you think of this? While I am no longer getting the previous error, I still seem unable to receive the XML response back. :(
Thank you in advance, Adri
The curl function I use is as follows. It has extra debugging information in the output, and the default settings can easily be overridden at runtime by supplying a different $options argument. I'm not suggesting this is the answer, but with a better set of options configured and better debug info you should get closer.
function curl( $url=NULL, $options=NULL, $headers=false ){
$cacert='c:/wwwroot/cacert.pem';
$vbh = fopen('php://temp', 'w+');
/*
Download a copy of CACERT.pem from
https://curl.haxx.se/docs/caextract.html
save to webserver and modify the $cacert variable
to suit - ensuring that the path you choose is
readable.
*/
$res=array(
'response' => NULL,
'info' => array( 'http_code' => 100 ),
'headers' => NULL,
'errors' => NULL
);
if( is_null( $url ) ) return (object)$res;
session_write_close();
/* Initialise curl request object - these should be OK as-is */
$curl=curl_init();
if( parse_url( $url,PHP_URL_SCHEME )=='https' ){
curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, true );
curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 2 );
curl_setopt( $curl, CURLOPT_CAINFO, $cacert );
curl_setopt( $curl, CURLOPT_CAPATH, $cacert );
}
/* Define standard options */
curl_setopt( $curl, CURLOPT_URL,trim( $url ) );
curl_setopt( $curl, CURLOPT_AUTOREFERER, true );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_FAILONERROR, true );
curl_setopt( $curl, CURLOPT_HEADER, false );
curl_setopt( $curl, CURLINFO_HEADER_OUT, false );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $curl, CURLOPT_BINARYTRANSFER, true );
curl_setopt( $curl, CURLOPT_CONNECTTIMEOUT, 20 );
curl_setopt( $curl, CURLOPT_TIMEOUT, 60 );
curl_setopt( $curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.38 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.38' );
curl_setopt( $curl, CURLOPT_MAXREDIRS, 10 );
curl_setopt( $curl, CURLOPT_ENCODING, '' );
/* enhanced debug */
curl_setopt( $curl, CURLOPT_VERBOSE, true );
curl_setopt( $curl, CURLOPT_NOPROGRESS, true );
curl_setopt( $curl, CURLOPT_STDERR, $vbh );
/* Assign runtime parameters as options to override defaults if needed. */
if( isset( $options ) && is_array( $options ) ){
foreach( $options as $param => $value ) curl_setopt( $curl, $param, $value );
}
/* send any headers with the request that are needed */
if( $headers && is_array( $headers ) ){
curl_setopt( $curl, CURLOPT_HTTPHEADER, $headers );
}
/* Execute the request and store responses */
$res=(object)array(
'response' => curl_exec( $curl ),
'info' => (object)curl_getinfo( $curl ),
'errors' => curl_error( $curl )
);
rewind( $vbh );
$res->verbose=stream_get_contents( $vbh );
fclose( $vbh );
curl_close( $curl );
return $res;
}
Then, to use it:
$url='https://www.example.com/api/';
$args=array();
$headers=array(
'xxxxxx-Username: xxx',
'xxxxxx-Password: xxx',
'Content-Type: application/xml'
);
$res=curl( $url, $args, $headers );
if( $res->info->http_code==200 ){
#cool - use $res->response in further processing
print_r( $res->response );
}else{
# useful information will be displayed here...
printf('<h1>Verbose debug info</h1><pre>%s</pre>',print_r($res->verbose,true));
printf('<h1>Info</h1><pre>%s</pre>',print_r($res->info,true));
}
Update to show how to send POST data
You use the $options parameter to supply different runtime configuration to the curl request, like so:
$url='https://www.example.com/api/';
$args=array(
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => $send_body
);
$headers=array(
'xxxxxx-Username: xxx',
'xxxxxx-Password: xxx',
'Content-Type: application/xml'
);
$res=curl( $url, $args, $headers );
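Once the request actually reaches the API, the XML body the asker is after can be parsed with SimpleXMLElement. A minimal sketch, assuming the <AuthInfo> structure shown earlier and a $res returned by the curl() helper above:

$res = curl( $url, $args, $headers );
if( $res->info->http_code==200 ){
    // Parse the XML body (structure as shown in the question: AuthInfo/AuthStatus).
    $xml = new SimpleXMLElement( $res->response, LIBXML_NOCDATA );
    $token  = (string)$xml->token;
    $status = (string)$xml->AuthStatus->Id;
    $desc   = (string)$xml->AuthStatus->Description;
    printf( 'AuthStatus %s: %s (token: %s)', $status, $desc, $token );
}else{
    printf( '<h1>Verbose debug info</h1><pre>%s</pre>', print_r( $res->verbose, true ) );
}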
I'm trying to get some data from this website: https://stubhub.com.
1- With file_get_contents:
$url= 'https://www.stubhub.com';
$html = file_get_contents($url);
echo $html;
I get:
Warning: file_get_contents(https://stubhub.com): failed to open stream: HTTP request failed! HTTP/1.0 405 Method Not Allowed
2- With CURL:
$url= 'https://www.stubhub.com';
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_REFERER, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$html = curl_exec($curl);
$response = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
var_dump($html);
var_dump($response);
But I get:
bool(false) int(0)
I tried to add some headers like User-Agent and proxy:
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201');
$proxy = '185.135.226.159:23500';
curl_setopt($curl, CURLOPT_PROXY, $proxy);
But again I get the same result.
I have allow_url_fopen=On, so what's wrong?
function curl( $url=NULL, $options=NULL ){
$cacert='c:/wwwroot/cacert.pem'; # <----- download your own copy and configure this path
$vbh = fopen('php://temp', 'w+');
$res=array(
'response' => NULL,
'info' => array( 'http_code' => 100 ),
'headers' => NULL,
'errors' => NULL
);
if( is_null( $url ) ) return (object)$res;
session_write_close();
/* Initialise curl request object */
$curl=curl_init();
if( parse_url( $url,PHP_URL_SCHEME )=='https' ){
curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, true );
curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 2 );
curl_setopt( $curl, CURLOPT_CAINFO, $cacert );
}
/* Define standard options */
curl_setopt( $curl, CURLOPT_URL,trim( $url ) );
curl_setopt( $curl, CURLOPT_AUTOREFERER, true );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_FAILONERROR, true );
curl_setopt( $curl, CURLOPT_HEADER, false );
curl_setopt( $curl, CURLINFO_HEADER_OUT, false );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $curl, CURLOPT_BINARYTRANSFER, true );
curl_setopt( $curl, CURLOPT_CONNECTTIMEOUT, 20 );
curl_setopt( $curl, CURLOPT_TIMEOUT, 60 );
curl_setopt( $curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36' );
curl_setopt( $curl, CURLOPT_MAXREDIRS, 10 );
curl_setopt( $curl, CURLOPT_ENCODING, '' );
curl_setopt( $curl, CURLOPT_VERBOSE, true );
curl_setopt( $curl, CURLOPT_NOPROGRESS, true );
curl_setopt( $curl, CURLOPT_STDERR, $vbh );
/* Assign runtime parameters as options */
if( isset( $options ) && is_array( $options ) ){
foreach( $options as $param => $value ) curl_setopt( $curl, $param, $value );
}
/* Execute the request and store responses */
$res=(object)array(
'response' => curl_exec( $curl ),
'info' => (object)curl_getinfo( $curl ),
'errors' => curl_error( $curl )
);
rewind( $vbh );
$res->verbose=stream_get_contents( $vbh );
fclose( $vbh );
curl_close( $curl );
return $res;
}
$url='https://www.stubhub.com/';
$res = curl( $url );
if( $res->info->http_code==200 ){
printf('<pre>%s</pre>',print_r( $res->info,true ));
printf('<pre>%s</pre>',print_r( $res->verbose,true ));
}
This will output:
stdClass Object
(
[url] => https://www.stubhub.com/
[content_type] => text/html
[http_code] => 200
[header_size] => 1304
[request_size] => 214
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.609
[namelookup_time] => 0.25
[connect_time] => 0.265
[pretransfer_time] => 0.39
[size_upload] => 0
[size_download] => 1194
[speed_download] => 1960
[speed_upload] => 0
[download_content_length] => 1194
[upload_content_length] => -1
[starttransfer_time] => 0.609
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 23.43.75.46
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => 192.168.0.56
[local_port] => 5042
)
* Trying 23.43.75.46...
* TCP_NODELAY set
* Connected to www.stubhub.com (23.43.75.46) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
CAfile: c:/wwwroot/cacert.pem
CApath: none
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=California; L=San Francisco; O=Stubhub, Inc.; OU=Technology; CN=www.stubhub.com
* start date: Jun 11 00:00:00 2018 GMT
* expire date: Jan 9 12:00:00 2020 GMT
* subjectAltName: host "www.stubhub.com" matched cert's "www.stubhub.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert ECC Secure Server CA
* SSL certificate verify ok.
> GET / HTTP/1.1
Host: www.stubhub.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept: */*
Accept-Encoding: deflate, gzip
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: text/html
< Expires: Thu, 01 Jan 1970 00:00:01 GMT
< Cache-Control: private, no-cache, no-store, must-revalidate
< Surrogate-Control: no-store, bypass-cache
< Content-Encoding: gzip
< X-EdgeConnect-MidMile-RTT: 163
< X-EdgeConnect-Origin-MEX-Latency: 24
< X-Akamai-Transformed: 9 624 0 pmb=mTOE,1mRUM,1
< Date: Sat, 20 Oct 2018 16:25:57 GMT
< Content-Length: 1194
< Connection: keep-alive
< Vary: Accept-Encoding
< Set-Cookie: DC=lvs31;Path=/;Domain=stubhub.com;Expires=Sat, 20-Oct-2018 16:55:56 GMT;Max-Age=1800
< Set-Cookie: akacd_PCF_Prod=1540053357~rv=98~id=53e183ee10a83152497c9102c8c7dee7; path=/; Expires=Sat, 20 Oct 2018 16:35:57 GMT
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Set-Cookie: _abck=10D08E1267D29C2EDBEA32445BD116805C7A3616AB3500001557CB5B9AD22713~-1~e+BGOJkoD/UwtPOWH75YXUSo6Kzyd7sF6nTkkw89JfE=~-1~-1; expires=Sun, 20 Oct 2019 16:25:57 GMT; max-age=31536000; path=/; domain=.stubhub.com
< Set-Cookie: bm_sz=7C06CFF7557E22DEC7855EC89DF628B0~QAAQFjZ6XGg5goBmAQAAIypMkhVJRZxwtVU8097T7Q8Z2TcGPZR0XRtAVFY3TBHGsR4EW51MqZlCAyk3cMPDJEmukVvLunM36/5Kn1gtoxarUtgkqBvlfudWZBJb2xc1rHdnMhdsAXoHWLaGt0NwROSXckDe48kkqu2Kw3suRgrWcqDlj7Y1akARK8OYnoa6; Domain=.stubhub.com; Path=/; Expires=Sat, 20 Oct 2018 20:25:56 GMT; Max-Age=14399; HttpOnly
<
* Connection #0 to host www.stubhub.com left intact
To access the actual response body you would process $res->response: load it into DOMDocument or whatever else you intend to do with it... good luck
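For example, a minimal sketch of that last step, assuming the page contains ordinary links; loadHTML() is wrapped with libxml_use_internal_errors() because real-world markup is rarely valid:

$res = curl( 'https://www.stubhub.com/' );
if( $res->info->http_code==200 ){
    libxml_use_internal_errors( true );      // silence warnings from messy HTML
    $dom = new DOMDocument();
    $dom->loadHTML( $res->response );
    libxml_clear_errors();

    $xpath = new DOMXPath( $dom );
    foreach( $xpath->query( '//a[@href]' ) as $a ){
        printf( "%s => %s\n", trim( $a->textContent ), $a->getAttribute( 'href' ) );
    }
}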
I am trying to send a push notification, but I am running into a problem.
$url = 'https://android.googleapis.com/gcm/send';
$fields = array(
'registration_ids' => $id,
'data' => $load,
);
$headers = array(
'Content-Type: application/json',
'Authorization: key=' . GOOGLE_API_KEY,
);
// Open connection
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
//curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11');
// Disabling SSL certificate support temporarily
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($fields, true));
// Execute post
$result = curl_exec($ch);
$httpcode = curl_getinfo($ch);
When I look at the cURL info I see the http_code is 400. I have tried everything, but I am still having this problem and the push notification is not working.
Can you guys please help me? I am stuck here.
Array
(
[url] => https://android.googleapis.com/gcm/send
[content_type] => text/plain; charset=UTF-8
[http_code] => 400
[header_size] => 406
[request_size] => 698
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.22549
[namelookup_time] => 0.028427
[connect_time] => 0.030052
[pretransfer_time] => 0.189248
[size_upload] => 382
[size_download] => 41
[speed_download] => 181
[speed_upload] => 1694
[download_content_length] => -1
[upload_content_length] => 382
[starttransfer_time] => 0.225382
[redirect_time] => 0
[certinfo] => Array
(
)
[primary_ip] => 172.217.6.234
[primary_port] => 443
[local_ip] => 162.243.229.189
[local_port] => 54327
[redirect_url] =>
)
Error 400 means Bad Request. In your code, the $headers array uses GOOGLE_API_KEY, which is not defined in your script. You also have not defined $id and $load, from what I can see.
Firstly, you will need a Google Cloud Messaging API key, which you can obtain from the Google Developers Console.
Please note that the method you are employing requires a client-side app installed on the mobile device. An alternative would be to send a web-push notification, which pushes the notification to Chrome (or any other compatible web browser) as long as it is running and has any tab open. If this is an option you think is worth looking at, I suggest the web-push library on GitHub.
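To illustrate the point about the undefined values, a minimal sketch of what would need to exist before the cURL section runs; the key, registration ID, and payload keys below are placeholders, not real values:

// Placeholder values - substitute your own server API key and device tokens.
define( 'GOOGLE_API_KEY', 'AIza...your-server-key...' );

$id = array(                  // one or more device registration IDs
    'device-registration-id-goes-here',
);

$load = array(                // arbitrary key/value payload your client app understands
    'title'   => 'Hello',
    'message' => 'Push notification test',
);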
I'm trying to scrape this URL with cURL: xxxx.fr. But I cannot get at the page's HTML code; both the header and the body come back empty.
The HTTP code returned is 200.
I tried other URLs (on different domains) and it works like a charm.
I also tried different User-Agent and Referer values.
Do you know what is wrong? Or at least, can someone try this code on their own server and let me know if they get the same issue?
Thank you
Below is my code:
$url = 'http://www.xxxx.fr';
$header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: timeout=5, max=100";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = ""; // BROWSERS USUALLY LEAVE BLANK
$curl = curl_init ();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0");
curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($curl, CURLOPT_REFERER, "http://www.google.fr");
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLINFO_HEADER_OUT, 1);
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, getcwd().'/cookies.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, getcwd().'/cookies.txt');
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$curlData = curl_exec($curl);
$infos = curl_getinfo($curl);
print_r($infos);
curl_close ( $curl );
echo "<hr>Page:<br />";
echo htmlentities($curlData);
and here is the result from the print_r($infos):
Array (
[url] => http://www.xxxx.fr
[content_type] => text/html
[http_code] => 200
[header_size] => 625
[request_size] => 465
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.032535
[namelookup_time] => 0.001488
[connect_time] => 0.002581
[pretransfer_time] => 0.002639
[size_upload] => 0
[size_download] => 10234
[speed_download] => 314553
[speed_upload] => 0
[download_content_length] => -1
[upload_content_length] => 0
[starttransfer_time] => 0.032088
[redirect_time] => 0
[certinfo] => Array ( )
[primary_ip] => xxx
[primary_port] => 80
[local_ip] => xxx
[local_port] => 37319
[redirect_url] =>
[request_header] => GET / HTTP/1.1 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0 Host: www.xxxx.fr Accept-Encoding: gzip,deflate Referer: http://www.google.fr Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Cache-Control: max-age=0 Connection: keep-alive Keep-Alive: timeout=5, max=100 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Accept-Language: en-us,en;q=0.5
)
//EDIT
htmlentities($curlData) returns an empty string because the source encoding is not UTF-8 (see this link).
This should work:
htmlentities($curlData, ENT_QUOTES, 'ISO-8859-1');
As of PHP 5.4, htmlspecialchars() no longer uses ISO-8859-1 as its default encoding; it uses UTF-8. You might expect htmlspecialchars() to simply skip non-UTF-8 byte sequences or translate them to a 'not found' character. In fact, it returns a blank string: no error is generated, no error code is returned, no exception is raised; you just get a blank string back whenever invalid UTF-8 sequences are passed in.
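A minimal sketch of the same fix without hard-coding the charset, assuming the mbstring extension is available: detect the source encoding, normalise to UTF-8, and only then escape.

// Detect the charset of the scraped page and convert it to UTF-8
// so htmlentities() never sees an invalid byte sequence.
$encoding = mb_detect_encoding( $curlData, array( 'UTF-8', 'ISO-8859-1', 'Windows-1252' ), true );
if( $encoding !== 'UTF-8' ){
    $curlData = mb_convert_encoding( $curlData, 'UTF-8', $encoding ?: 'ISO-8859-1' );
}
echo htmlentities( $curlData, ENT_QUOTES, 'UTF-8' );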
I'm trying to scrape a website, but it always says "Empty reply from server".
Can anyone look at the code and tell me what I am doing wrong?
Here is the code:
function spider($url){
$header = array(
"Host" => "www.example.net",
//"Accept-Encoding:gzip,deflate,sdch",
"Accept-Language:en-US,en;q=0.8",
"Cache-Control:max-age=0",
"Connection:keep-alive","Content-Length:725","Content-Type:application/x-www-form-urlencoded",
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
,"X-Requested-With:XMLHttpRequest"
);
$cookie = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0); // return headers 0 no 1 yes
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return page 1:yes
curl_setopt($ch, CURLOPT_TIMEOUT, 200); // http request time-out, 200 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow redirects, need this if the URL changes
curl_setopt($ch, CURLOPT_MAXREDIRS, 2); //if http server gives redirection response
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7");
curl_setopt($ch, CURLOPT_COOKIEJAR, realpath( $cookie)); // cookies storage / here the changes have been made
curl_setopt($ch, CURLOPT_COOKIEFILE, realpath( $cookie));
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // false for https
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS,"view=ViewDistrict¶m=7&uniqueid=1397991494188&PHPSESSID=f134vrnv7glosgojvf4n1mp7o2&page=http%3A%2F%2Fwww.example.com%2Fxhr.php");
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_REFERER, "http://www.example.com/");
$data = curl_exec($ch); // execute the http request
$info = curl_getinfo($ch);
curl_close($ch); // close the connection
return $data;
}
Here is the function call:
echo spider("http://www.example.net/");
Edit
Array
(
[url] => http://www.example.net/
[content_type] => text/html
[http_code] => 301
[header_size] => 196
[request_size] => 840
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 1
[total_time] => 61.359
[namelookup_time] => 0
[connect_time] => 0.281
[pretransfer_time] => 0.281
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => 178
[upload_content_length] => 0
[starttransfer_time] => 60.593
[redirect_time] => 0.766
[certinfo] => Array
(
)
[redirect_url] =>
)
Empty reply from server
This is the curl_getinfo() output now.
I also updated my POST data; it is now:
curl_setopt($ch, CURLOPT_POSTFIELDS,"view=ViewDistrict&param=7&uniqueid=".time(). rand(101,500)."&PHPSESSID=f134vrnv7glosgojvf4n1mp7o2&page=http%3A%2F%2Fexample.com%2Fxhr.php");
and I also removed "X-Requested-With:XMLHttpRequest" from the headers.
Have you tried removing this from the headers?
X-Requested-With:XMLHttpRequest
My guess is that your problem is in this line:
curl_setopt(
$ch,
CURLOPT_POSTFIELDS,
"view=ViewDistrict¶m=7&uniqueid=1397991494188&PHPSESSID=f134vrnv7glosgojvf4n1mp7o2&page=http%3A%2F%2Fwww.example.com%2Fxhr.php"
);
Notice that you're passing a value for PHPSESSID. I'm guessing you copied & pasted a URL from a visit to the site, right? That session ID was probably valid when you visited the site, but the odds of it being valid now are pretty slim. And if the server doesn't like the session ID, chances are it's not going to give you any data.