THIS HAS BEEN SOLVED - SEE ANSWER AT THE END OF THIS POST
I am trying to retrieve data from a remote server using PHP/cURL.
If I put the following URL into a browser, the data comes back correctly.
http://realm103.c7.castle.wonderhill.com/api/map.json?user%5Fid=5245274&x=375&y=375&timestamp=1310554325&%5Fsession%5Fid=5b2070a46a083a33e053d60dbc2d062e&dragon%5Fheart=098d2deb0a37f18c97428d636c456572f9bade24&version=3
However, when I try to access it with PHP/cURL, it just times out (error code 28, CURLE_OPERATION_TIMEDOUT).
$json = curl($jsonurl, $realm['intRealmID'], $realm['strRealmServer']);
function curl($url, $realm, $realmServer){
$header = array();
$header[] = 'Host: realm'.strval($realm).'.'.$realmServer.'.castle.wonderhill.com';
$header[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$header[] = 'Accept-Language: en-us,en;q=0.5';
$header[] = 'Accept-Encoding: gzip,deflate';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Connection: keep-alive';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close($ch); // must run before return, otherwise it is unreachable
return $result;
}
Does anybody have any idea why it works from the browser but not via cURL? Thanks.
ADDITIONAL INFO
Whilst cURL isn't working for the URL above, it works just fine for the URL below. The only difference is the server the data is being requested from; the data itself and the POST are identical.
http://realm4.c5.castle.wonderhill.com/api/map.json?user%5Fid=1053774&x=375&y=375&timestamp=1310616808&%5Fsession%5Fid=5b2070a46a083a33e053d60dbc2d062e&dragon%5Fheart=f35f476facab91f0e901eaf2209a0c8a9b9bedcc&version=3
ANSWER
I finally got back to this and found that the referrer was the problem. The server expected to see no Referer header in the request; when one was present, the request was blocked. That behaviour probably was not consistent across all servers at the time, but it is now. Removing the referrer from the request headers and leaving everything else the same now works.
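A minimal sketch of the question's function with that fix applied, assuming nothing else about the server changed. It sends the same headers but omits CURLOPT_AUTOREFERER and never sets CURLOPT_REFERER, so no Referer header is sent; it also lets CURLOPT_ENCODING advertise gzip/deflate instead of hard-coding Accept-Encoding, and closes the handle before returning. The helper names `realm_curl_options()` and `curl_fetch_realm()` are hypothetical.

```php
<?php
// Hypothetical helper: the cURL options for a realm request.
// Deliberately no CURLOPT_REFERER / CURLOPT_AUTOREFERER, so no Referer is sent.
function realm_curl_options(string $url, $realm, string $realmServer): array
{
    $header = [
        'Host: realm' . $realm . '.' . $realmServer . '.castle.wonderhill.com',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-us,en;q=0.5',
        'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Connection: keep-alive',
    ];

    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0',
        CURLOPT_HTTPHEADER     => $header,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING       => '',   // advertise gzip/deflate and decode automatically
        CURLOPT_TIMEOUT        => 20,   // seconds
    ];
}

// Hypothetical wrapper: performs the request and always closes the handle.
function curl_fetch_realm(string $url, $realm, string $realmServer)
{
    $ch = curl_init();
    curl_setopt_array($ch, realm_curl_options($url, $realm, $realmServer));
    $result = curl_exec($ch);   // string on success, false on failure
    curl_close($ch);
    return $result;
}
```

Building the options as an array keeps the "no referrer" decision in one place, so it is easy to check that nothing adds the header back.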
The biggest difference between your cURL function and requesting the information directly is the CURLOPT_HTTPHEADER option (the custom header array); I would first try removing it from the code.
Try this:
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('your url');
Alternatively, you can fetch remote URLs with the file_get_contents function, but many hosts don't allow this (it requires allow_url_fopen to be enabled).
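A minimal sketch of that alternative, assuming allow_url_fopen is enabled on your host. A stream context created with stream_context_create() carries the User-Agent and timeout that cURL options would otherwise set. The helper names `make_http_context()` and `fetch_with_fopen()` are hypothetical.

```php
<?php
// Hypothetical helper: an HTTP stream context with a browser-like
// User-Agent and a read timeout, for use with file_get_contents().
function make_http_context(string $userAgent, float $timeout)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => $timeout,   // seconds
        ],
    ]);
}

// Hypothetical wrapper: returns the body, or false if remote fopen is disabled
// or the request fails.
function fetch_with_fopen(string $url, string $userAgent)
{
    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return false;   // host does not allow remote URLs in file functions
    }
    return file_get_contents($url, false, make_http_context($userAgent, 10.0));
}
```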
$userAgent = 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0';
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
Some other options I use:
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
Try this:
$ctx = stream_context_create( array(
'socket' => array(
'bindto' => '192.168.0.107:0',
)
));
$c= file_get_contents('http://php.net', 0, $ctx);
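The `bindto` socket option chooses the local address the outgoing connection is made from; port `0` lets the operating system pick the source port. A sketch, assuming an IPv4 address that is actually assigned to your machine (`192.168.0.107` is only the answer's example). For cURL, the CURLOPT_INTERFACE option plays a similar role and accepts an interface name, IP address or host name. Both helper names are hypothetical.

```php
<?php
// Hypothetical helper: a stream context that binds outgoing sockets to
// $localIp (IPv4), letting the OS choose the source port via ":0".
function bound_context(string $localIp)
{
    return stream_context_create([
        'socket' => ['bindto' => $localIp . ':0'],
    ]);
}

// Hypothetical cURL counterpart: CURLOPT_INTERFACE selects the outgoing address.
function bound_curl_handle(string $url, string $localIp)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_INTERFACE, $localIp);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return $ch;
}

// Usage (not executed here): file_get_contents('http://php.net', false, bound_context('192.168.0.107'));
```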
Related
I have seen some similar cases on Stack Overflow, but it seems I have errors in my PHP code and cannot display the page correctly. The page I am trying to get is a resource from Pentaho BI (version 7.1.0.0.12). I have tried many, many things, but nothing works.
First, I authenticate using 'Cookie-Based Authentication' (the method provided by Pentaho). Information: https://help.pentaho.com/Documentation/7.1/0R0/070/010/00A
To get the cookie, I perform an HTTP POST with PHP cURL. That works well; I am able to get the cookie from Pentaho.
Please check the code below:
$post_data['j_username'] = "suzy";
$post_data['j_password'] = "password";
// http_build_query() URL-encodes the fields and joins them with '&'
$post_string = http_build_query($post_data);
$curl_connection = curl_init('http://10.10.10.215:8080/pentaho/j_spring_security_check');
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30); // seconds, not milliseconds
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_HEADER, true); // keep response headers in $result so Set-Cookie can be read
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);
curl_setopt($curl_connection, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded'));
$result = curl_exec($curl_connection);
curl_close($curl_connection);
// $sessionID[1] receives the value of the first JSESSIONID cookie in the response headers
preg_match('/JSESSIONID=([^;\s]+)/', $result, $sessionID);
$cookie = $sessionID[1];
So the variable $cookie contains the session ID (JSESSIONID) that I should use to access the resource from Pentaho.
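Another way to capture the session cookie is to let cURL pass each response header line to a callback via CURLOPT_HEADERFUNCTION, instead of searching the response text. A sketch; the helper name `make_cookie_collector()` is hypothetical, and it is an assumption that Pentaho returns the session as a `JSESSIONID` Set-Cookie header.

```php
<?php
// Hypothetical helper: returns a CURLOPT_HEADERFUNCTION callback that records
// each "Set-Cookie: name=value; ..." header as $cookies[name] = value.
function make_cookie_collector(array &$cookies): callable
{
    return function ($ch, string $headerLine) use (&$cookies): int {
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/i', $headerLine, $m)) {
            $cookies[$m[1]] = $m[2];
        }
        // cURL requires the callback to return the number of bytes it handled.
        return strlen($headerLine);
    };
}

// Usage with the login request (JSESSIONID name assumed):
// $cookies = [];
// curl_setopt($curl_connection, CURLOPT_HEADERFUNCTION, make_cookie_collector($cookies));
// curl_exec($curl_connection);
// $cookie = $cookies['JSESSIONID'] ?? null;
```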
Then I perform an HTTP GET with PHP cURL to get the page (resource) from Pentaho.
This is the part that doesn't work. What I can see is that PHP is "connected" to Pentaho using the URL and the cookie I requested previously, and the call returns the whole page, but when I display the page in the browser it throws a lot of errors (JS, CSS errors and more).
Please check the code below:
$url = "http://10.10.10.215:8080/pentaho/api/repos/%3Apublic%3ASteel%20Wheels%3ADashboards%3ACTools_dashboard.wcdf/generatedContent";
$ch = curl_init();
$headers = array();
$headers[] = 'Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap';
$headers[] = 'Connection: Keep-Alive';
$headers[] = 'Cache-Control: no-cache';
$headers[] = 'Pragma: no-cache';
$headers[] = 'Accept-Language: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4';
// Send the session cookie obtained from the login request
$headers[] = 'Cookie: JSESSIONID='.$sessionID[1];
$user_agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
// Plain GET: no body, so no Transfer-Encoding or Content-Type request headers
curl_setopt($ch, CURLOPT_HTTPGET, true);
// Advertise gzip/deflate and decode the response automatically
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$response = curl_exec($ch);
curl_close($ch);
echo $response;
die();
To clarify: I am able to get the page, but with all of the errors I mentioned.
I have also tried to load this content into an IFRAME, but couldn't get it to work. Is there any way to do it?
Any information you can add is welcome, as is any code I can check.
I know this question has been dealt with on a few occasions, but none of the fixes seem to work for my particular problem.
I am trying to grab any page from http://www.lewmar.com, but somehow they are managing to block all attempts. My latest script is as follows:
function curl_get_contents($url)
{
$ch = curl_init();
$browser_id = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0"; // CURLOPT_USERAGENT takes the value only, without "User-Agent: "
$ip = $_SERVER["SERVER_ADDR"];
curl_setopt($ch, CURLOPT_USERAGENT, $browser_id);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_REFERER, $ip);
$headers = array();
$headers[] = 'Cache-Control: max-age=0';
$headers[] = 'Connection: keep-alive';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$headers[] = 'Accept-Language: en-US,en;q=0.5';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$url = 'http://www.lewmar.com';
$contents = curl_get_contents($url);
echo strlen($contents);
I have tried to replicate most of the headers, and the site doesn't seem to check for JavaScript support, yet I still can't get anything returned.
Does anyone have any idea how they might be recognizing cURL and blocking it?
Cheers
When you first visit that site, it checks whether you have a cookie. If you don't, it sends you one along with a redirect to the same page. You don't have anything in your code to store cookies, so you end up going round in a circle; cURL gives up after 20 redirects. Solution: enable cookies!
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies2.txt');
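A sketch of the asker's function with the cookie engine switched on, assuming one file is used both to read and to write cookies so that the cookie set on the first response is sent back on the redirect. The helper names and the temporary-file location are illustrative, and CURLOPT_MAXREDIRS is set explicitly so a redirect loop fails quickly with an error you can log.

```php
<?php
// Hypothetical helper: cURL options that fetch $url with cookies enabled.
// The same file is used as cookie file (read) and cookie jar (written).
function cookie_fetch_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0',
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,           // stop a redirect loop early
        CURLOPT_COOKIEFILE     => $cookieFile,  // enables the cookie engine and reads cookies
        CURLOPT_COOKIEJAR      => $cookieFile,  // cookies are written when the handle is released
    ];
}

// Hypothetical wrapper around the options above.
function curl_get_contents_with_cookies(string $url)
{
    $cookieFile = tempnam(sys_get_temp_dir(), 'cj');
    $ch = curl_init();
    curl_setopt_array($ch, cookie_fetch_options($url, $cookieFile));
    $data = curl_exec($ch);
    if ($data === false) {
        // e.g. too many redirects if the site still loops
        error_log('cURL error: ' . curl_error($ch));
    }
    curl_close($ch);
    return $data;
}
```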
For an on-site booking process, I am now using REST API calls to get the booking data. The problem is that when I set the form URL to:
$url = 'https://book.api.ean.com/ean-services/rs/hotel/v3/res?
minorRev=99
&cid=55505
&sig=1893d9f7e3e9fbd3f8a36f43cd61287d
&apiKey=1bn8n4or4tjajq23fe4l6m18lp
&customerUserAgent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0
&customerIpAddress=223.30.152.118
&customerSessionId=e80df6de9008af772cfb48a389465415
&locale=en_US
&currencyCode=USD
&hotelId=106347
&arrivalDate=10/30/2015
&departureDate=11/01/2015
&supplierType=E
&rateKey=469e1aff-49de-4944-a64d-25d96ccde3aa
&roomTypeCode=200127420
&rateCode=200706716
&chargeableRate=257.20
&room1=2,5,7
&room1FirstName=test
&room1LastName=testers
&room1BedTypeId=23
&room1SmokingPreference=NS
&email=test#yourSite.com
&firstName=tester
&lastName=testing
&homePhone=2145370159
&workPhone=2145370159
&creditCardType=CA
&creditCardNumber=5401999999999999
&creditCardIdentifier=123
&creditCardExpirationMonth=11
&creditCardExpirationYear=2015
&address1=travelnow
&city=Bellevue
&stateProvinceCode=WA
&countryCode=US
&postalCode=98004';
When I post the data manually, I get a valid response, but when I use cURL to post to the URL above, I get an error.
My cURL code is:
$header[] = "Accept: application/json";
$header[] = "Accept-Encoding: gzip";
$header[] = "Content-length: 0";
$ch = curl_init();
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
$verbose = fopen('php://temp', 'rw+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);
$result = curl_exec($ch);
After posting the data, I get this response:
{"HotelRoomReservationResponse":{"EanWsError":{"itineraryId":-1,"handling":"UNRECOVERABLE","category":"EXCEPTION","exceptionConditionId":-1,"presentationMessage":"TravelNow.com cannot service this request.","verboseMessage":"Exception Caught: null"},"customerSessionId":"8ab1d482-f968-49d2-a429-a1cbab748fe5"}}
I get that error repeatedly. How can I get the correct data?
Your problem is that you are passing the parameters in the URL; they need to be given in the request body. See the top of this page: http://developer.ean.com/docs/book-reservation/examples/rest-reservation/
I'm not sure how you do this in PHP, but you can use -d on the command line.
I'm trying to download the URL http://es.extpdf.com/nagore-pdf.html using the following code, but I get a status code of 0. When I access it through http://web-sniffer.net/, it shows a 301 redirect. My code seems to work fine for other 301-redirected URLs.
What could be the problem?
<?php
print disavow_download_url("http://es.extpdf.com/nagore-pdf.html");
function disavow_download_url($url) {
$custom_headers = array();
$custom_headers[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$custom_headers[] = "Pragma: no-cache";
$custom_headers[] = "Cache-Control: no-cache";
$custom_headers[] = "Accept-Language: en-us;q=0.7,en;q=0.3";
$custom_headers[] = "Accept-Charset: utf-8,windows-1251;q=0.7,*;q=0.7";
$ch = curl_init();
$useragent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1";
curl_setopt($ch, CURLOPT_USERAGENT, $useragent); // set user agent
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
//curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $custom_headers);
// these two are for HTTPS (they disable certificate verification)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_TIMEOUT, 10); //timeout in seconds
$txResult = curl_exec($ch);
$statuscode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
print "statuscode=$statuscode\n";
print "result=$txResult\n";
}
The URL is accessible from the USA, but not from your region. It worked for web-sniffer because their server is hosted in the USA (or in some other region that extpdf allows).
I used a US proxy with cURL and it returned data.
curl_setopt($ch, CURLOPT_PROXY, "100.9.90.1:3128"); // change IP, Port
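A status code of 0 means no HTTP response was received at all, so on its own it cannot distinguish a regional block from a timeout or a DNS failure. A sketch that also returns curl_errno() and curl_error(), with an optional proxy as in the answer above; the helper name `fetch_with_diagnostics()` is hypothetical and the proxy address is something you must supply.

```php
<?php
// Hypothetical helper: fetch $url and report why no HTTP status was received.
// CURLINFO_HTTP_CODE is 0 whenever no HTTP response arrived.
function fetch_with_diagnostics(string $url, ?string $proxy = null): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);   // "host:port"
    }
    $body = curl_exec($ch);
    $info = [
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'errno'  => curl_errno($ch),   // 0 means no transport-level error
        'error'  => curl_error($ch),   // human-readable reason, e.g. a timeout
        'body'   => $body,             // false when the transfer failed
    ];
    curl_close($ch);
    return $info;
}
```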
I'm doing some scraping with PHP. I've been extracting data, including the link to the next relevant page, so the whole thing is automatic. The problem is that I seem to be getting a page that is slightly different from what I see when I open the same URL in my browser (e.g. the dates are different).
I've tried using cURL and file_get_contents, but both return the wrong page.
At the moment I am using:
$url = "http://www.example.com";
$timeout = 5;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$temp = curl_exec($ch);
curl_close($ch);
What is going on here?
UPDATE:
I've tried imitating a browser using the following code, but I'm still unsuccessful. I find this bizarre.
function get_url_contents($url){
$crl = curl_init();
$timeout = 10;
$header=array(
'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-us,en;q=0.5',
'Accept-Encoding: gzip,deflate',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Keep-Alive: 115',
'Connection: keep-alive',
);
curl_setopt($crl, CURLOPT_HTTPHEADER, $header);
curl_setopt($crl, CURLOPT_ENCODING, ''); // decode the gzip/deflate response requested above
curl_setopt ($crl, CURLOPT_URL,$url);
curl_setopt ($crl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt ($crl, CURLOPT_AUTOREFERER, FALSE);
curl_setopt ($crl, CURLOPT_FOLLOWLOCATION, FALSE);
$ret = curl_exec($crl);
curl_close($crl);
return $ret;
}
Further update:
It seems that the site is using my location to discriminate. Is there a locale option?
It can be many things:
The server may render pages differently based on the cookies and headers sent.
The server may render pages differently based on existing preconditions and state on the server.
You may have a proxy in between that modifies the content based on the user agent; since you don't send a specific User-Agent (such as a browser's), the proxy may send back different content.
These are just a few of the things that could happen!