Cannot get XML output through cURL - php

I am using PHP cURL to fetch XML output from a URL. Here is what my code looks like:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.mydomain.com?querystring');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, "username:password");
$store = curl_exec($ch);
echo $store;
curl_close($ch);
But, instead of returning the XML it just shows my 404 error page. If I type the URL http://www.mydomain.com?querystring in the web browser I can see the XML in the browser.
What am I missing here? :(
Thanks.

Some website owners check request headers, such as User-Agent, to make sure the request comes from a web browser and not a bot (or cURL). You should try adding curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'); and see if that fixes the problem. That will send a user-agent string. The site may also check for the existence of cookies or other things.
To display the XML in a web page, escape it with htmlentities() so the browser does not interpret the tags. You might want to wrap it inside an HTML <pre> element as well.
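A minimal sketch of both suggestions, assuming the fetched XML arrives as a string. The helper names fetch_with_user_agent and xml_for_html are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper: fetch a URL while sending a browser-like User-Agent.
function fetch_with_user_agent(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    // Returns the body as a string on success, or false on failure.
    return curl_exec($ch);
}

// Hypothetical helper: escape XML so the browser shows the markup as text.
function xml_for_html(string $xml): string
{
    return '<pre>' . htmlentities($xml, ENT_QUOTES, 'UTF-8') . '</pre>';
}
```

Usage would look something like echo xml_for_html($store); once $store holds the response body.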

Related

Difference between cURL and web browser?

I am trying to retrieve a web page from the following url:
http://www.medicare.gov/find-a-doctor/provider-results.aspx?searchtype=OHP&specgrpids=922&loc=43615&pref=No&gender=Unknown&dist=25&lat=41.65603&lng=-83.66676
It works when I paste it into a browser, but when I run it through cURL, I receive a page with the following error: "One or more query string parameters of requested url are invalid or has unexpected value, please correct and retry."
It doesn't seem to make a difference if I provide a different userAgent or referrer. There is a redirect, so I use CURLOPT_FOLLOWLOCATION.
Here is my code:
$ch = curl_init($page);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
Any thoughts on why a request like this will work in the browser and not with cURL?
Your browser is sending cookies that cURL is not. Check the cookies you are sending to the site using browser tools or Fiddler; you'll need to pass the same ones.
The problem was with cookies. This particular site needed an ASP.NET_SessionId cookie set in order to respond. I added the following to my cURL request:
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIE, 'ASP.NET_SessionId=ho1pqwa0nb3ys3441alenm45; path=/; domain=www.medicare.gov');
I don't know if any session ID will work, but I tried a couple of random ones and they all worked.
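An alternative to hard-coding a session ID is to let cURL's cookie engine pick up whatever the server sets. The sketch below assumes the site issues the session cookie on the first response and then redirects; that behaviour is a guess, and the helper names are hypothetical.

```php
<?php
// Hypothetical helper: build options that switch on cURL's cookie engine.
function session_cookie_options(string $cookieFile = ''): array
{
    $options = [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // An empty string enables the cookie engine without reading a file.
        CURLOPT_COOKIEFILE     => $cookieFile,
    ];
    if ($cookieFile !== '') {
        // Persist cookies to disk so later runs can reuse the session.
        $options[CURLOPT_COOKIEJAR] = $cookieFile;
    }
    return $options;
}

// Hypothetical helper: fetch a page so that cookies set before a redirect
// are sent again on the follow-up request.
function fetch_with_cookies(string $url, string $cookieFile = '')
{
    $ch = curl_init($url);
    curl_setopt_array($ch, session_cookie_options($cookieFile));
    return curl_exec($ch);
}
```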

cURL retrieve only URL address

Using PHP and cURL, I'd like to check whether I can log in to a website using the provided user credentials. For that I'm currently retrieving the entire page and then using a regex to look for keywords that might indicate the login didn't work.
The url itself contains the string "errormessage" if a wrong username/password has been entered. Is it possible to only use curl to get the url address, without the contents to speed it up?
Here's my curl PHP code:
function curl_get_request($referer, $submit_url, $ch)
{
global $cookie_path;
// sends a request via curl to the string specifics listed
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
curl_setopt($ch, CURLOPT_URL, $submit_url);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_path);
return $result = curl_exec ($ch);
}
Also, if somebody has a better idea on how to handle a problem like this, please let me know!
What you should do is check the URL each time there is a redirect. Most redirects are going to be done with the proper HTTP headers. If that is the case, see this answer:
PHP: cURL and keep track of all redirections
Basically, turn off automatic redirection following, and check the HTTP status code for 301 or 302. If you get one of those, you can continue to follow the redirection if needed, or exit from there.
If instead, the redirection is happening client side, you will have to parse the page with a DOM parser.
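A rough sketch of the server-side case, assuming the login failure shows up as a 301 or 302 whose target contains "errormessage". The function names are hypothetical; CURLINFO_HTTP_CODE and CURLINFO_REDIRECT_URL are standard curl_getinfo options.

```php
<?php
// Hypothetical helper: make one request without following redirects and
// report the status code plus the URL the server wanted to redirect to.
function inspect_redirect(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_exec($ch);
    return [
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'target' => (string) curl_getinfo($ch, CURLINFO_REDIRECT_URL),
    ];
}

// Hypothetical helper: decide whether a redirect points at the error page.
function is_failed_login(int $status, string $target): bool
{
    return in_array($status, [301, 302], true)
        && strpos($target, 'errormessage') !== false;
}
```

Because the redirect is not followed, the error page itself is never downloaded.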

Show curl redirection but don't follow

I have the following code:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://www.site.com/check.php?id=1");
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)");
$curlData = curl_exec($curl);
curl_close($curl);
echo $curlData;
The script on the remote site performs a certain check and, depending on the result, redirects to a small 15x15 GIF image.
At the moment I have CURLOPT_FOLLOWLOCATION set to 1, which means cURL follows the redirection to the GIF, and when I echo $curlData I get the binary content of the image, which is not what I want.
Is it possible to have cURL show where the script tries to redirect me without actually following the redirect, so I can tell which GIF image it redirects to instead of echoing the GIF content?
Thanks,
Easily! Don't set CURLOPT_FOLLOWLOCATION, and then read the Location header from the response.
Edit: So, a bit more detail. Set CURLOPT_HEADER to true so that the response headers are included in the string returned by curl_exec. The headers are the lines of the response just after the status line, separated with \r\n. You'll need to split these lines and look for the one prefixed with Location:. This is a string parsing exercise, nothing terribly exciting or tricky. You can use curl_getinfo with the CURLINFO_HEADER_SIZE flag to discover the total length of the header portion of the response.
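The parsing described above might look like the sketch below. The helper names extract_location and redirect_target are invented for illustration; the response headers are only present in the returned string when CURLOPT_HEADER is enabled.

```php
<?php
// Hypothetical helper: pull the Location header out of a raw header block.
function extract_location(string $rawHeaders): ?string
{
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Hypothetical helper: request a URL without following redirects and
// return where the server tried to send us, or null if it did not redirect.
function redirect_target(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);            // include headers in the result
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    $response = curl_exec($ch);
    if ($response === false) {
        return null;
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    return extract_location(substr($response, 0, $headerSize));
}
```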

php: Get url content (json) with cURL

I want to access https://graph.facebook.com/19165649929?fields=name (obviously it's also accessible with "http") with cURL to get the file's content. More specifically, I need the "name" (it's JSON).
Since allow_url_fopen is disabled on my webserver, I can't use file_get_contents! So I tried it this way:
<?php
$page = 'http://graph.facebook.com/19165649929?fields=name';
$ch = curl_init();
//$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
//curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, $page);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
?>
With that code I get a blank page! When I use another page, like http://www.google.com it works like a charm (I get the page's content). I guess facebook is checking something I don't know... What can it be? How can I make the code work? Thanks!
Did you double-post this? See:
php: Get html source code with cURL
In that thread we found that your problem was being unable to resolve the host, and this was the solution:
//$url = "https://graph.facebook.com/19165649929?fields=name";
$url = "https://66.220.146.224/19165649929?fields=name";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: graph.facebook.com'));
$output = curl_exec($ch);
curl_close($ch);
Note that the Facebook Graph API requires authentication before you can view any of these pages.
You basically have two options for this. Either you log in as an application (which you've registered beforehand) or as a user. See the API documentation to find out how this works.
My recommendation for you is to use the official PHP-SDK. You'll find it here. It does all the session and cURL magic for you and is very easy to use. Take the examples which are included in the package and start to experiment.
Good luck.
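Once a request does succeed, the "name" field still has to be pulled out of the JSON. A minimal sketch, assuming the response is a JSON object with a top-level "name" key; the helper names are hypothetical and the exact response shape is an assumption.

```php
<?php
// Hypothetical helper: return the response body as a string instead of
// printing it, which is what curl_exec does without CURLOPT_RETURNTRANSFER.
function fetch_body(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}

// Hypothetical helper: read the top-level "name" from a JSON document.
function extract_name(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['name']) || !is_string($data['name'])) {
        return null;   // invalid JSON, an error payload, or no name field
    }
    return $data['name'];
}
```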

send cURL post to be received exactly like form post

I am trying to send simple entries in a form using PHP cURL so the remote server that the entries go to receives them in exactly the same manner as if sent from the form. So far, the remote server accepts post from the form but not when sent by this PHP code. fopen and fsockopen etc. are set to off by the host (Yahoo) that I use so cURL seems the best alternative.
$URL="http://remote_server.cgi";
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.8) Gecko/20100722 Ant.com Toolbar 2.0.1 Firefox/3.6.8 ( .NET CLR 3.5.30729) AutoPager/0.6.1.22';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, $URL);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $_POST);
curl_exec ($ch);
curl_close ($ch);
The remote server will not accept the entries when sent this way.
What can be done to make the entries be received the same as if sent by the form?
<form action="http://remote_server.cgi" method="POST">
The best way to do this IMO, is to use an HTTP proxy to inspect the working request. Fiddler, Charles and Firebug will all do the trick. Look at all of the headers that are included in working submissions to see what you might be missing.
You are probably missing the Referer header and the Content-Type header that the form sends. Normally the server checks only these two:
$headers = array('Content-type: application/x-www-form-urlencoded');
$referer = "Form url goes here";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_REFERER, $referer);
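One more difference worth checking: when CURLOPT_POSTFIELDS receives an array, cURL sends the body as multipart/form-data, whereas a plain HTML form sends application/x-www-form-urlencoded. The sketch below encodes the fields as a string first; the function names are hypothetical, and whether this is what the remote server objects to is an assumption.

```php
<?php
// Hypothetical helper: encode fields the way a plain HTML form does.
function form_body(array $fields): string
{
    return http_build_query($fields, '', '&');
}

// Hypothetical helper: POST the fields as a URL-encoded string.
function post_like_form(string $url, array $fields, string $referer)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    // A string body is sent as application/x-www-form-urlencoded;
    // an array body would be sent as multipart/form-data.
    curl_setopt($ch, CURLOPT_POSTFIELDS, form_body($fields));
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    return curl_exec($ch);
}
```

In the question's code, this would mean passing form_body($_POST) rather than $_POST itself.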
