I tried to use file_get_contents and cURL to get the content of a website. I also tried to open the same site using Lynx and could not get the content either. I got a 406 Not Acceptable; it seems that the site checks whether I'm using a browser. Is there a workaround?
It probably expects the user agent to be a web browser. You can set this easily using cURL:
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
Where $useragent is the string you want to use for a user agent. Try it with some common ones for the major browsers and see if that helps. This page lists some common user agents.
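If you use file_get_contents rather than cURL, the same idea can be expressed with a stream context: the http wrapper's user_agent option sets the User-Agent request header. This is a minimal sketch; the helper name browser_context is hypothetical, and which user-agent string a given site accepts is an assumption you have to test.

```php
<?php
// Hypothetical helper: build a stream context that sends a browser-like User-Agent.
function browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent as the User-Agent request header
        ],
    ]);
}

// Usage (performs a network request, so it is left commented out):
// $html = file_get_contents('http://example.com/', false, browser_context('Mozilla/5.0 ...'));
```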
// make a call to the webpage to get his handicap
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.golfspain.com/portalgolf/HCP/handicap_resul.aspx?sLic=CB00693474");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 60);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE );
curl_setopt($ch, CURLOPT_REFERER, "http://google.com" );
curl_setopt($ch, CURLOPT_HEADER, TRUE );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13');
$header = array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Language: en-us;q=0.8,en;q=0.6'
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
$html = curl_exec($ch);
curl_close($ch);
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
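Because this snippet sets CURLOPT_HEADER to TRUE, $html holds the response headers followed by the body, so the headers also end up in what DOMDocument parses. One way to separate them is to read the total header size with curl_getinfo($ch, CURLINFO_HEADER_SIZE) before curl_close and split the string there; with CURLOPT_FOLLOWLOCATION that size covers the header blocks of every redirect as well. A sketch, where split_response is a hypothetical helper name:

```php
<?php
// Hypothetical helper: split a raw cURL response (CURLOPT_HEADER = TRUE) into headers and body.
// $headerSize is expected to come from curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function split_response(string $raw, int $headerSize): array
{
    $headers = substr($raw, 0, $headerSize);
    $body    = substr($raw, $headerSize);
    return [$headers, $body];
}

// Usage with the snippet above (curl_getinfo must run before curl_close):
// $html = curl_exec($ch);
// $size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
// curl_close($ch);
// [$headers, $body] = split_response($html, $size);
// libxml_use_internal_errors(true); // keep malformed-HTML warnings out of the output
// $doc = new DOMDocument();
// $doc->loadHTML($body);
```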
Maybe you have to set some more HTTP headers like a 'real' browser. With cURL:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13');
$header = array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Language: en-us;q=0.8,en;q=0.6'
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
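CURLOPT_HTTPHEADER expects a flat list of "Name: value" strings. If you keep the browser headers in an associative array instead, it is easier to add or drop one at a time while you find out which ones the site actually checks. A small sketch; header_lines is a hypothetical helper name:

```php
<?php
// Hypothetical helper: turn ['Accept' => '...'] into ['Accept: ...'] for CURLOPT_HTTPHEADER
// (the same list also works as the 'header' option of an http stream context).
function header_lines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$header = header_lines([
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Accept-Language' => 'en-us;q=0.8,en;q=0.6',
]);
// curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
```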
I'm trying to load Facebook content with cURL, but it shows the login page instead of my timeline (even though I'm already signed in). Is this related to cookies? What am I doing wrong? There are no errors in this code...
$ch = curl_init('https://www.facebook.com');
curl_setopt($ch, CURLOPT_POST, true );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true );
$header = array();
$header[] = "Accept-Language: pt-br,pt;q=0.5";
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_HEADER, false );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt($ch, CURLOPT_VERBOSE,true);
$data = curl_exec( $ch );
echo $data;
I hope you can help me! Thank you
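PHP is server-side: your server runs your PHP code and then returns the output to the browser. The cURL request therefore comes from your server, not from your browser, so it does not carry the cookies of your signed-in browser session, and Facebook treats it as a visitor who is not logged in. To show the Facebook timeline, you need to use an iframe, for example.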
I am having a problem with PHP file_get_contents. I am trying to fetch information from the following URL, but I get a captcha page instead.
$link = 'http://www.wayfair.com/a/product_review_page/get_update_reviews_json?_format=json&product_sku=KUS1523&page_number=5&sort_order=relevance&filter_rating=&filter_tag=&item_per_page=5';
$Page_information = file_get_contents($link);
print_r($Page_information);
I also tried to get the page with PHP cURL, but the same captcha page is displayed.
$cookie='cookie.txt';
if(!file_exists($cookie)){
$fh = fopen($cookie, "w");
fwrite($fh, "");
fclose($fh);
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_URL, "http://www.wayfair.com/a/product_review_page/get_update_reviews_json?_format=json&product_sku=KUS1523&page_number=5&sort_order=relevance&filter_rating=&filter_tag=&item_per_page=5");
curl_setopt($ch, CURLOPT_BINARYTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_COOKIE,1);
curl_setopt($ch, CURLOPT_COOKIEJAR,$cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE,$cookie);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
$result11 = curl_exec($ch);
print_r($result11);
If you analyze the headers sent by a browser with cookies and JavaScript disabled, you should see the bare minimum that is sent. Some, perhaps all, of them might be required; you can set them with the context argument of file_get_contents.
/* set the options for the stream context */
$args=array(
'http'=>array(
'method' => "GET",
'header' => array(
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Host: www.wayfair.com'
// no 'Accept-Encoding: gzip, deflate' here: the http:// wrapper does not decompress
// the response, so a gzipped body would come back as raw compressed bytes
)
)
);
/* create the context */
$context=stream_context_create( $args );
$link = 'http://www.wayfair.com/a/product_review_page/get_update_reviews_json?_format=json&product_sku=KUS1523&page_number=5&sort_order=relevance&filter_rating=&filter_tag=&item_per_page=5';
/* Get the response from remote url (false = don't search the include path) */
$res = file_get_contents( $link, false, $context );
/* process the response */
print_r( $res );
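When file_get_contents goes through the http:// wrapper, PHP fills the local variable $http_response_header with the status line and headers of the response. Checking the status code is a quick way to tell whether the server refused the request (for example with 403 or 406) rather than returning data; note that on an error status file_get_contents returns false unless the http context sets 'ignore_errors' => true. A minimal sketch; status_code is a hypothetical helper, and because redirects add one status line per response it keeps the last one.

```php
<?php
// Hypothetical helper: extract the final HTTP status code from $http_response_header.
function status_code(array $responseHeaders): ?int
{
    $code = null;
    foreach ($responseHeaders as $line) {
        // Status lines look like "HTTP/1.1 200 OK"; after redirects there is one per response.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage after the request above:
// $res = file_get_contents($link, false, $context);
// $code = status_code($http_response_header ?? []);
```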
$url = "http://www.wayfair.com/a/product_review_page/get_update_reviews_json?_format=json&product_sku=KUS1523&page_number=5&sort_order=relevance&filter_rating=&filter_tag=&item_per_page=5";
$cookie = getcwd().DIRECTORY_SEPARATOR.'cookie.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_URL, $url);
// CURLOPT_BINARYTRANSFER is omitted: it has no effect in modern PHP.
// CURLOPT_COOKIE is omitted too: it expects a "name=value" string, and 1 would send "Cookie: 1".
// The cookie jar/file options below handle cookies instead.
curl_setopt($ch, CURLOPT_COOKIEJAR,$cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE,$cookie);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
// added: send a browser-like User-Agent
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36");
$result11 = curl_exec($ch);
curl_close($ch);
print_r($result11);
Try this.
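Since this endpoint is supposed to return JSON (_format=json), one cheap way to detect that you received the captcha page instead is to try decoding the response. A sketch, assuming the real payload is a JSON object or array; looks_like_json is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true if $body decodes as a JSON object/array,
// false for HTML such as a captcha page (or for a failed request).
function looks_like_json($body): bool
{
    if (!is_string($body) || $body === '') {
        return false; // curl_exec and file_get_contents return false on failure
    }
    $decoded = json_decode($body, true);
    return json_last_error() === JSON_ERROR_NONE && is_array($decoded);
}

// Usage:
// $result11 = curl_exec($ch);
// if (!looks_like_json($result11)) { /* probably blocked or served a captcha */ }
```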
Even though I have set curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true), cURL doesn't follow redirects; it only shows the "301 Moved" page. I tried it with multiple sites.
The strange thing is that it works on localhost, but when I upload it to my webspace it refuses to work.
Is it possible that my web hosting provider changed some setting so that it doesn't work? I've never seen anything like this :(
Here's the code:
$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://google.com');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Accept-Encoding: gzip, deflate',
'Connection: keep-alive'
));
$result = curl_exec($ch);
curl_close($ch);
I had a similar issue, and it was due to cURL sending a GET immediately after receiving the redirect header. To fix this, I specified CURLOPT_CUSTOMREQUEST.
Example:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
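If CURLOPT_FOLLOWLOCATION is ignored only on the hosting server, it is worth checking the return value of that curl_setopt call: on older PHP versions it fails (with a warning) when open_basedir or safe_mode is active, which may or may not apply to your host. A common workaround is to follow redirects yourself: turn FOLLOWLOCATION off and CURLOPT_HEADER on, read the Location header, and request that URL in turn. The sketch covers only the header-parsing step; location_from_headers is a hypothetical helper, and a relative Location value would still need to be resolved against the current URL.

```php
<?php
// Hypothetical helper: return the last Location header in a raw header block, or null.
// Header names are matched case-insensitively.
function location_from_headers(string $rawHeaders): ?string
{
    $location = null;
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        if (preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            $location = trim($m[1]);
        }
    }
    return $location;
}

// Manual-redirect step (network calls, so shown as comments):
// var_dump(curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true)); // false means the option was refused
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
// curl_setopt($ch, CURLOPT_HEADER, true);
// $raw = curl_exec($ch);
// $next = location_from_headers(substr($raw, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE)));
```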
URL:
You can see the URL here (I put it in a pastebin because it is quite long).
cURL and headers:
$header=array();
$header[]="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[]="Accept-Encoding: gzip, deflate";
$header[]="Accept-Language: en-US,en;q=0.5";
$header[]="Connection: keep-alive";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7');
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_exec($ch);
Result:
Error 400--Bad Request
From RFC 2068 Hypertext Transfer Protocol -- HTTP/1.1:
10.4.1 400 Bad Request
The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.
When I open the URL directly in a browser (without cURL), it displays fine.
There is a problem with your URL; chances are it was computed incorrectly.
If you're generating that long URL in your script, make sure it's the right one.
The reason I think so: if you delete parts of it, say down to https://wftc3.e-travel.com/plnext/garuda-indonesia/Override.action, you will see that accessing this page also ends in a 400 error.
I hope this helps.
Edit: the following code works, so the problem is probably in $url.
<?php
$url = "https://wftc3.e-travel.com/plnext/garuda-indonesia/Override.action?SITE=CBEECBEE&LANGUAGE=ID&EMBEDDED_TRANSACTION=FlexPricerAvailability&ENCT=1&ENC=BE37D8CC9CE37D25AB7C16ED9DC32B9AB70052C76CC99D6460FB6990C37619D08A86D19D12DB2E39B1C2572C7C97A890E4D0CA079A35CB075FC284C469128F210A361D6121DEF1E64C3E153CCF855158302AA41136F317A8F143E4A2DDFEF68DB413BD337613EF92A4809626A4E3CB107A5666859C612C388539292CD16A1FF421C143F7D74A504845EBC98B1476E79EB32DFB32E46B43ADF0B514FB472B0258C41F696441043714660B3493F3395E8329B93A0C4E7EA3E1E466025EAE4AE2562754B6324C4C4CFEDC3CA4548A17AAFA500FC0A331F1FB1B770FE91E31B88D5C391DA66B00A5AD8F83D02BBB962F35B6BAAAAE34984EF07693352467B3AC7C1F62D8AA70B791C71CAC7AD4E4744563A096471F0CF79FF28425EFD39C2A58D0D52F279632268E3FBB1217DB8DF5A61181D466B62CA13CC7044EC9E90A550DC3CFD3A28EBC4FB9FC451D0C34BA94E7CF46B5FF1C9E1ADECA8EAE477B1112AA911E8826C07311B033F5D3AD39F26814A7399072805235049E856C9BC9E9C0819AC596471F0CF79FF28425EFD39C2A58D0D5A7B93A6BBADB799F3B3B95A975D76E523BD3C538C3B91D308FC57D8F84ACD46D57A25DAD2528B4486D0DC651B85D1DD27680F4762A813920C0D7DBF02676A659F6479A9F3A48B7202F10A44379467113DC817B3F3908F5F21D13389934D53CCDC787523D2953A5401152E090735051220AEA4FC852F0A20BCD957F8F2BDA35C0C9AC95AF6075C2923C1FA881D3D3C0484BA6740A4CCF8CBB8FB1E0C2FC9C1B0E35764D079EA758ED28405DDE81CB538B69A5C75758DA45C03F45EEF0D75B6E714ABED40ED5E467D99F4CBBFEAAAC28EB7BD54AA4E28454445AE822E53EDC3DAD7634FDBC01F64DB410A450D54EECBE4F9BB8BB8FA9E2B8CFE4CD88019D3BCAE97A041BDB6C72AD195FB68212FEC44CE587C5CA13B74686FD62FEE6AA43C4FA3DE765A4EAA2034043E2CB24F5BD0B48F771F51FDD8197D0C0B6DF85DD8D5EBED594F80B56C7963080333C519C67B88961921D48431AF465CA9A94060E8DC600EE3BDBAFEB22FBE9E105C6F386A0580A6F6E7FC39BC0D3F12A253DD73581FC7C6FA4EF58293CEE6869B817EE1F1D4801A3282C9F857924158FB6D2EFC7A02057155B69A8271F69B754AE7D978062AD01AC449A3D598CFB37921ECC4932CFE4A19A891C29A0B1C234E3950520529F97DDD2FA5793ADEF0FC1D327C3E38C77455FF12AF99DF582CB6BE66F9FA601DF00AD3EBE281CCC3B9BA63A47860E793A6D5002486A06345EB691B25214
91F8694797EE0EFEC1C90C082B815D23EE0E46E4CA6A9EB06EB1483FC07C1D7B17818AA8B20F16223C113ACBB81B628BE6EDDA4E96D559E7A7BA9A1BCD31FB3FEEAE509704B54B426646A42CA6F7C75E85BBB32FA49E60102E76D13F7961343025E44CF14705EF7424EC3578B294BF87D34DB49040CBCEABC06B466033A4AB5BEF9660F69B68BFB71206446F8A8EDECD068C8EAE159840BE226495914996D001BC6872525FB8D5A43A545CCF106EA9E823CEEE64F6955AAF3340E15DE72ED4D1865D63C9D85C3B0CC627381311163D08103D86C0392C1FDCD7065892EF3519C6A802940125B7D6C167C10E3D4750BC762DE1F10A15C0C8FD23A77E1E1310AE9CE073CA809773C21794EC1F190E868C513B83CB35EDBDDC31297078D472BE9A37C2F70A1BE31C0A5042E89214851AC675AEB34690ADED5AFA187CE56AE5D3270B07B6986EF5FEF05AB2C4BF44F5281CC3779E98EAE5F090AA07928D4FFEC8A893799F5BB3BA57ED47422E7532DB4F570F7E2CF8F2C9FF76CE87DBEA84738C535794EFD373A1080D7E12CB3B7C37AA566D663E54CCCBDCFE9E7970B61AB40F02A528F2107E9DEBD6B0795D766FB16AA71E0BF1091F2F897BBE39B1E11B3B610B5DF0CF98ABDE6A9B1D5C5784144D68A4629FDD409B7D6349D888162741633A718ED89B555EB147B67A79B06055E02BFBCEC56CD768BEFD38391AE4B7F13CD3AC6D9FADE73C1C2E313B83FDE3FAB3D60BE111D43EFA7565D5614427F7F0CBD7913C0E9496FA2978868C1D983C14212C987D6B0E38BEF1701B4120E6147BB88E776E1C05574475A7E44F4D11963189DB5BA6EEDB6E514D543BA8CA23A216AF3C5E876E99BCBD46F3B066A5BCE4FDBAA0CA012DFEF2A256652B8DE8AF04A0C27E58379BB1768602DBE55717B38AF3EDB8570FD4A9CC80D7D27A41AA2AF727C833A46583C3955E5BD0CE289BAF1F9AFD9415619A00EE2E965A46AE7891A4F3A303F5E44183DD542F13";
$header=array();
$header[]="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[]="Accept-Encoding: gzip, deflate";
$header[]="Accept-Language: en-US,en;q=0.5";
$header[]="Connection: keep-alive";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7');
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
$x = curl_exec($ch);
die(($x));
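Since the diagnosis here is a malformed URL, it can help to sanity-check a long, generated URL before handing it to cURL: a stray line break or space (easy to pick up when building or pasting a long query string) is enough to make the request fail. A minimal sketch using parse_url; url_problems is a hypothetical helper, and passing these checks does not guarantee the server will accept the URL.

```php
<?php
// Hypothetical helper: list obvious problems with a URL before requesting it.
function url_problems(string $url): array
{
    $problems = [];
    if (preg_match('/\s/', $url)) {
        $problems[] = 'contains whitespace (e.g. a line break inside the query string)';
    }
    $parts = parse_url($url);
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        $problems[] = 'missing scheme or host';
    }
    return $problems;
}

// Usage: if (url_problems($url)) { print_r(url_problems($url)); }
```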
THIS HAS BEEN SOLVED - SEE ANSWER AT THE END OF THIS POST
I am trying to retrieve data from a remote server using PHP / cURL.
If I put the following URL into a browser, the data comes back correctly.
http://realm103.c7.castle.wonderhill.com/api/map.json?user%5Fid=5245274&x=375&y=375&timestamp=1310554325&%5Fsession%5Fid=5b2070a46a083a33e053d60dbc2d062e&dragon%5Fheart=098d2deb0a37f18c97428d636c456572f9bade24&version=3
However, when I try to access it with PHP / cURL, it just times out (error code 28).
$json = curl($jsonurl, $realm['intRealmID'], $realm['strRealmServer']);
function curl($url, $realm, $realmServer){
$header = array();
$header[] = 'Host: realm'.strval($realm).'.'.$realmServer.'.castle.wonderhill.com';
$header[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$header[] = 'Accept-Language: en-us,en;q=0.5';
$header[] = 'Accept-Encoding: gzip,deflate';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Connection: keep-alive';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close($ch); // close before returning; after "return" this line would never run
return $result;
}
Does anybody have any idea why it works from the browser but not via cURL? Thanks
ADDITIONAL INFO
While cURL isn't working for the URL above, it works just fine for the URL below. The only difference is the server the data is being requested from. The data itself and the POST are identical.
http://realm4.c5.castle.wonderhill.com/api/map.json?user%5Fid=1053774&x=375&y=375&timestamp=1310616808&%5Fsession%5Fid=5b2070a46a083a33e053d60dbc2d062e&dragon%5Fheart=f35f476facab91f0e901eaf2209a0c8a9b9bedcc&version=3
ANSWER
I finally got back to this and found that the referrer was the problem. The server expected to see no referrer in the request header; when one was present, the request was blocked. That behaviour was probably not consistent across all servers at the time, but it is now. Removing the referrer from the request header and leaving everything else the same now works.
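In cURL terms, sending no referrer means not setting CURLOPT_REFERER and leaving CURLOPT_AUTOREFERER off, since that option adds a Referer header when following redirects. libcurl also removes a header it would otherwise send if you pass that header in CURLOPT_HTTPHEADER with nothing after the colon. A sketch that applies this to the header list from the question; without_referer is a hypothetical helper name.

```php
<?php
// Hypothetical helper: drop any Referer line from a CURLOPT_HTTPHEADER list and append "Referer:",
// which tells libcurl not to send a Referer header at all.
function without_referer(array $headers): array
{
    $kept = array_filter($headers, function (string $h): bool {
        return stripos($h, 'referer:') !== 0;
    });
    $kept = array_values($kept);
    $kept[] = 'Referer:';
    return $kept;
}

// Usage inside the curl() function from the question:
// curl_setopt($ch, CURLOPT_AUTOREFERER, false);
// curl_setopt($ch, CURLOPT_HTTPHEADER, without_referer($header));
```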
The biggest difference between your cURL function and requesting the information directly is the CURLOPT_HEADER property; I would first try removing it from the code.
Try this:
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('your url');
Alternatively, you can use the file_get_contents function with a remote URL, but many hosts don't allow this.
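Whether file_get_contents may open http:// URLs is controlled by the allow_url_fopen php.ini setting, which is what hosts typically switch off. A sketch of checking it at runtime and choosing between the two approaches; can_fetch_urls is a hypothetical helper, and get_data is the cURL function above.

```php
<?php
// Hypothetical helper: true if file_get_contents() may open remote URLs (allow_url_fopen is on).
function can_fetch_urls(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Usage: fall back to cURL when the host has disabled URL wrappers.
// $content = can_fetch_urls() ? file_get_contents($url) : get_data($url);
```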
$userAgent = 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0';
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
Some other options I use:
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
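Try this; the socket context option bindto sends the request from a specific local IP address (replace 192.168.0.107 with an address of your own machine; port 0 lets the OS choose the port):
$ctx = stream_context_create( array(
'socket' => array(
'bindto' => '192.168.0.107:0',
)
));
$c= file_get_contents('http://php.net', false, $ctx);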