cURL getting contents of URL containing "ü" (U+00FC, %C3%BC) - PHP

I am trying to get information about groceries: title, image, price, etc.
All other URLs work fine and the cURL response is exactly as expected.
The problem I am having is with URLs that contain accented Latin / non-ASCII characters like ü or è.
I've tried everything I can think of, but there is probably a simple solution I am missing:
stringtest.php?url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g
stringtest.php?url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g
stringtest.php?url=http%3A%2F%2Fwww.sainsburys.co.uk%2Fshop%2Fgb%2Fgroceries%2Fdesserts%2Fg%C3%BC-lemon-pots-3x45g
This is my code for testing cURL:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
<?php
$url = $_GET['url'];
echo curlUrl($url);

function curlUrl($url) {
    $ch = curl_init();
    $timeout = 5;
    $cookie_file = "/tmp/cookie/cookie1.txt";
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
?>
<form action="stringtest.php" method="get" id="process">
<input type="text" name="url" placeholder="Url" autofocus>
<input type="submit">
</form>
</body>
</html>
The result I get from cURL is Sainsbury's 404 page claiming the page isn't found.
Copying http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g from the URL bar results in the URL-encoded version of ü (%C3%BC) being copied, as expected. When entering the URL in the browser, both ü and %C3%BC reach the actual product page, so why does Sainsbury's return a 404 when cURL'd?
I've tried various things, such as urldecode() and using the exact headers the browser sends, but to no avail.
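One thing worth checking is how the multibyte character is encoded before the URL is handed to cURL. Below is a minimal sketch of a helper that percent-encodes only the path segments; the function name is my own, and it assumes the input URL always has a scheme and host:

```php
// Hedged sketch: normalize the path portion of a URL so that multibyte
// characters such as ü are always sent percent-encoded. The function name
// is my own, not part of any library.
function encodePath(string $url): string {
    $parts = parse_url($url);
    // Decode first so already-encoded segments are not double-encoded,
    // then re-encode each path segment with rawurlencode (RFC 3986).
    $path = implode('/', array_map(
        fn($segment) => rawurlencode(rawurldecode($segment)),
        explode('/', $parts['path'] ?? '')
    ));
    $result = $parts['scheme'] . '://' . $parts['host'] . $path;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    return $result;
}

echo encodePath('http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-lemon-pots-3x45g');
// http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g
```

Passing either the raw ü form or the already-encoded %C3%BC form through this normalizer yields the same byte-exact URL, so cURL always requests the same resource the browser does.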

This seems like an issue with the Sainsbury's website itself.
The server returns a 404 when you don't send a valid cookie.
Did you try reloading?
I tried
stringtest.php?url=http://www.sainsburys.co.uk/shop/gb/groceries/desserts/gü-chocolate-ganache-pots-3x45g
and it worked with a valid cookie.
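If the missing cookie is indeed the cause, one possible approach is to prime the cookie jar with a request to the site's front page before fetching the product URL through the same handle. This is a sketch only; the cookie-file path is illustrative:

```php
// Sketch: prime the cookie jar by requesting the front page first, then
// fetch the product URL through the same handle so the session cookie
// collected in the jar is sent along. The cookie path is illustrative.
$cookieFile = '/tmp/cookie1.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);

// First request only collects cookies
curl_setopt($ch, CURLOPT_URL, 'http://www.sainsburys.co.uk/');
curl_exec($ch);

// Second request re-sends them along with the product URL
curl_setopt($ch, CURLOPT_URL, 'http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g');
$html = curl_exec($ch);
curl_close($ch);
```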

If you try:
wget http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g
The response is:
http://www.sainsburys.co.uk/shop/gb/groceries/bakery
Resolving www.sainsburys.co.uk (www.sainsburys.co.uk)... 109.94.142.1
Connecting to www.sainsburys.co.uk (www.sainsburys.co.uk)|109.94.142.1|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.sainsburys.co.uk/webapp/wcs/stores/servlet/gb/groceries/bakery?langId=44&storeId=10151&krypto=xbYM3SJja%2F1mDOxJIVlKl9vZN6zjdlTL4MSiHOKiUMQoum9OkLwoTv6wj27CjUXwqM4%2BsteXag0O%0AQOWiHuS8onFdmoVLWlJyZ7hXaMhcMW9MIMMAsnPdWTPEzSEnOP5a&ddkey=http:AjaxAutoCompleteDisplayView [following]
--2014-10-07 11:56:11-- http://www.sainsburys.co.uk/webapp/wcs/stores/servlet/gb/groceries/bakery?langId=44&storeId=10151&krypto=xbYM3SJja%2F1mDOxJIVlKl9vZN6zjdlTL4MSiHOKiUMQoum9OkLwoTv6wj27CjUXwqM4%2BsteXag0O%0AQOWiHuS8onFdmoVLWlJyZ7hXaMhcMW9MIMMAsnPdWTPEzSEnOP5a&ddkey=http:AjaxAutoCompleteDisplayView
Reusing existing connection to www.sainsburys.co.uk:80.
HTTP request sent, awaiting response... 200 OK
To follow the redirect in curl, use the -L flag:
curl -L http://www.sainsburys.co.uk/shop/gb/groceries/desserts/g%C3%BC-lemon-pots-3x45g

Related

Why do I get a 403 when sending this data to an API via POST?

We have an API that works with Bearer authentication (https://fromero-marca-blanca.deno.dev/api/cuestionario), which should return the following:
{
"res": "OK",
"payload": {
"name": "Hello"
}
}
As you can see here: https://reqbin.com/he9hier5
This is my PHP code:
$url = "http://fromero-marca-blanca.deno.dev/api/cuestionario";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$headers = array(
"Authorization: Bearer d8TPqowoqP7CGAzVCy3SJykcZ83fVWl0",
"Content-Type: application/json",
);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
$data = '{"name": "Hello"}';
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
$resp = curl_exec($curl);
curl_close($curl);
var_dump($resp);
But when I execute this file, I get:
string(430) "Redirecting you to https://fromero-marca-blanca.deno.dev:443/api/cuestionario"
And then, I get a 403 error (forbidden).
What am I doing wrong? I have even tried to copy the code that https://reqbin.com/ generates, and nothing; I keep getting a forbidden.
EDIT: I have just been told by the API programmer that I will not be able to access the service from the browser, as it does not have CORS enabled. I know what CORS is, but would this prevent me from doing what I am trying to do?
I didn't have any trouble testing the request with Insomnia either, so the request itself is fine, as your own tests verified.
However, your problem is the actual response from the server - in your case this is:
<HTML><HEAD>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>Redirecting</TITLE>
<META HTTP-EQUIV="refresh" content="1; url=https://fromero-marca-blanca.deno.dev:443/api/cuestionario">
</HEAD>
<BODY onLoad="location.replace('https://fromero-marca-blanca.deno.dev:443/api/cuestionario'+document.location.hash)">
Redirecting you to https://fromero-marca-blanca.deno.dev:443/api/cuestionario</BODY></HTML>
and this is enough to trigger a redirect when you var_dump or echo the body in a browser - therefore I'd recommend using the proper URL (https) directly, since the request is redirected to it anyway:
$url = "https://fromero-marca-blanca.deno.dev/api/cuestionario";
This resolves your issue.
As a workaround for certificate errors, you can temporarily disable SSL verification:
//for debug only!
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
If you want a permanent solution, declare the location of a certificate file in your php.ini (the curl.cainfo directive).
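For completeness, the per-handle alternative to the php.ini setting is CURLOPT_CAINFO. This is a sketch; the bundle path below is an assumption, so use whatever CA bundle your system ships or download cacert.pem from the curl project:

```php
$curl = curl_init('https://fromero-marca-blanca.deno.dev/api/cuestionario');
// Keep verification on and tell cURL where the CA bundle lives.
// The path below is an assumption; substitute your distribution's bundle.
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curl, CURLOPT_CAINFO, '/etc/ssl/certs/ca-certificates.crt');
```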

How to log out of a website that sends back a logout-confirmation page, using PHP and cURL

I am trying to log out of a website using cURL. When the logout button is clicked, this website sends back a page asking whether I want to log out, with two buttons, "OK" and "Cancel". I used cURL to get this data:
$headers = array(
"GET $geturl HTTP/1.1",
"Host: " . "$ip",
"User-Agent:Mozilla/5.0(X11;Linuxx86_64;rv:45.0)Gecko/20100101Firefox/45.0",
"Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*,q=0.8",
"Referer: " . $referer,
"Cookie: JSESSIONID=" . $session_id,
"Connection: keep-alive",
"Content-Length: 64",
'"etag": W/"102-1257495352000"',
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_TIMEOUT, 15); //timeout after 15 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
curl_setopt($ch, CURLINFO_REDIRECT_URL, true);
curl_setopt($ch, CURLOPT_POSTREDIR, 3);
curl_setopt($ch, CURLOPT_COOKIESESSION, $session_id);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, "logout=logout");
$result1 = curl_exec($ch);
The output contains two forms:
<form name="myLogout" action="logout.jsp" target="main" method="post">
<input name="logout" type="hidden" value="logout">
<input class="yesno" class="button" type="Submit" value=" OK ">
</form>
</td>
<td width="20"></td>
<td>
<form action="start.jsp" target="main">
<input class="yesno" type="Submit" value=" Cancel ">
</form>
The session then waits around 15 seconds for a POST from the "OK" button.
So I am sending another POST request with the same cURL options as above, except with CURLOPT_CUSTOMREQUEST changed from "GET" to "POST", but I get either null or "Bad Request".
Can someone please help with this?
Here I will do what I can to explain this.
All you have to do to submit that form, or any form, is to emulate what it is doing.
OK, what do forms do?
Simply, they send a request to the server using one of two methods, GET or POST. GET is the same as using a URL in the browser (which is why, when you submit a GET form, the query parameters change in the URL and you get a new page).
So in theory you just make a POST request to logout.jsp (the form's action) with the form's data, logout=logout.
Think in terms of having built that form yourself:
<form name="myLogout" action="logout.jsp" target="main" method="post">
<input name="logout" type="hidden" value="logout">
<input class="yesno" class="button" type="Submit" value=" OK ">
</form>
If I had built this form, I would have a page at action="logout.jsp" that uses the $_POST array, etc. (let's assume it's PHP):
<?php
if (isset($_POST['logout'])) {
    session_destroy();
    header('Location: https://www.example.com');
    exit();
} else {
    // some error message or redirect to 404; this should never happen.
    // maybe send a message to the Internet police with your IP, just kidding
}
So I would just look for the logout in the post array, then destroy the session, then redirect to my homepage.
So in curl you just need to
curl_setopt($ch, CURLOPT_COOKIE, 'JSESSIONID=' . $session_id); // CURLOPT_COOKIESESSION expects a boolean, so send the cookie itself instead
curl_setopt($ch, CURLOPT_URL, '{www.example.com}/logout.jsp');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'logout=logout');
And then the other standard CURL stuff.
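Putting those pieces together, a minimal end-to-end sketch might look like this; the host name and cookie-file path are placeholders, and the real URL comes from the form's action:

```php
// Placeholders: replace the host and cookie path with your real values.
$cookieFile = '/tmp/cookies.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/logout.jsp');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'logout=logout');
// Reuse the jar that already holds the JSESSIONID from the login request
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$response = curl_exec($ch);
curl_close($ch);
```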
Some things to consider:
Now this may or may not work; there are a lot of variables I just don't know about, so I can't say it will work 100%. Sites can do things with cookies and JavaScript that go beyond what you can do with cURL alone (you'd need something like PhantomJS or another headless-browser scraper). However, this HTML is pretty simple (e.g. no randomly generated IDs), so I don't think it's that advanced.
One thing to do: in a browser, go to that page while logged in. Press F12 to open the developer tools, find the Network panel, and enable recording (or "persist"). Log out, then inspect the request the form made to the server. This is what you need to replicate.
In conclusion, I've done too much scraping over the years. Now we have a .NET guy to handle it; it's a bit better suited to what we needed than PHP.

cURL not posting data to URL

I have a simple form which I want to convert into a PHP backend system.
This form has an action that submits to a URL; the URL only accepts the request when data with the name postdata and the correct information (URL-encoded XML) is submitted.
Works: note that the input name is postdata and the value contains the URL-encoded XML, and it works perfectly.
<form name="nonjava" method="post" action="https://url.com">
<input name="postdata" id="nonjavapostdata" type="hidden" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;" />
<button type="submit" name="submit">Button</button>
</form>
However, I want to achieve the same result by moving the postdata element into PHP, as a lot of information is passed through that way.
Please note: the link is HTTPS, but it wasn't working locally, so I had to disable verification within cURL.
<?php
$url = 'https://super.com';
$xmlData = '<?xml version="1.0" encoding="utf-8"?>
<postdata
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<api>2</api>
</postdata>';
$testing = urlencode($xmlData);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "postdata=" . $testing);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_VERBOSE, true);
$response = curl_exec($ch);
// Then, after your curl_exec call:
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$header = substr($response, 0, $header_size);
$body = substr($response, $header_size);
echo $header_size . '<br>';
echo htmlspecialchars($header) . '<br>';
echo htmlspecialchars($body) . '<br><br>';
$http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
echo "$http_status for $url <br>";
?>
I keep getting a 302 (the URL isn't finding the data, so it redirects to a not-found page).
302 is NOT an error; it just means you need to go to the redirected URL. A browser does that automatically for you; in PHP code, you have to do it yourself. Look at the Location header of the 302 response.
I looked for a way to tell cURL to follow redirects; many languages' HTTP libraries support that, and this thread suggests cURL does too: Is there a way to follow redirects with command line cURL
Your code works as it is supposed to; I am getting POST data:
Array ( [postdata] => <?xml version="1.0" encoding="utf-8"?> <postdata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/20
01/XMLSchema"> <api>2</api> </postdata> )
as a result. The problem is on the receiving side; it isn't handling the data correctly.
You need to follow the 302 redirect in curl. This is relatively easy to do by adding the location flag. curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
This is the equivalent of doing curl -L at the CLI. You can see an example here in the php documentation.

Not getting a response from web service - not sure what I'm doing wrong

This is my first shot at getting something back from a web service. What I'm expecting is something to the effect of 'Authorization Failed'. The URL is one in our test environment and the XML being sent is correct, but I'm not getting a response and don't know what I'm doing wrong.
The service is REST; the headers have to pass an encoded authorization (the one in this example is correct), and the content type is set to XML.
When I use the same parameters to test it in the Advanced Rest Client in Chrome it connects and gives me a response.
Also, if there's a better way to create the XML, I'm all for that - this is just an example I found and started with. Code is below
<?php
if (!isset($_POST['firstname'])) {
?>
<form name="ppost" method="post" action="<?=$_SERVER['PHP_SELF']?>">
<input type="text" name="firstname" />
<input type="submit" name="Submit" value="Submit" />
</form>
<?php
} // end if, form not posted
else {
extract($_POST);
$inputdata = '
<ReqGetWebUserInfo>
<OrgId>598</OrgId>
<OrgUnitId>598</OrgUnitId>
<MasterCustomerId>'.$firstname.'</MasterCustomerId>
<SubCustomerId>0</SubCustomerId>
</ReqGetWebUserInfo>';
echo '<pre>'.$inputdata.'</pre>';
$url = "https://gsusacustom.ebiz.uapps.net/GSUSARestWebService/PersonifyWcfSvc.svc/GetWebUserInfo";
$headers = array(
'Authorization: Basic dG1hc2d1bmRhbTpwYXNzd29yZDE=',
'Content-Type: application/xml;charset=utf-8',
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url ); // THE URL TO FETCH - CAN ALSO BE SET IN THE CURL_INIT
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_HEADER, 0); // DON'T INCLUDE HEADER IN THE OUTPUT
curl_setopt($ch, CURLOPT_POST, 1); // TRUE FOR A REGULAR HTTP POST
curl_setopt($ch, CURLOPT_POSTFIELDS, $inputdata); // THE DATA POST FROM THE FORM
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close ($ch);
print $response;
} // end else, form submitted and processed
?>
Thanks in advance.
My guess is that you didn't handle the HTTPS connection your URL uses. Try this option:
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER,false);
If the error still persists, do the following to dig deeper.
Run the code with all error reporting enabled:
error_reporting(E_ALL);
Run cURL with the debugging option enabled:
curl_setopt($ch, CURLOPT_VERBOSE, 1);
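To keep the verbose output instead of letting it scroll past on STDERR, you can redirect it into a stream and inspect it afterwards. A small sketch; the target URL here is just a placeholder:

```php
$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
// Send the verbose log to a memory stream instead of STDERR
$log = fopen('php://temp', 'w+');
curl_setopt($ch, CURLOPT_STDERR, $log);
curl_exec($ch);
curl_close($ch);
rewind($log);
$trace = stream_get_contents($log); // the full request/response transcript
echo $trace;
```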

Grabbing HTML From a Page That Has Blocked CURL

I have been asked to grab a certain line from a page, but it appears that the site has blocked cURL requests.
The site in question is http://www.habbo.com/home/Intricat
I tried changing the UserAgent to see if they were blocking that but it didn't seem to do the trick.
The code I am using is as follows:
<?php
$curl_handle=curl_init();
//This is the URL you would like the content grabbed from
curl_setopt($curl_handle, CURLOPT_USERAGENT, "Mozilla/5.0");
curl_setopt($curl_handle,CURLOPT_URL,'http://www.habbo.com/home/Intricat');
//This is the amount of time in seconds until it times out, this is useful if the server you are requesting data from is down. This way you can offer a "sorry page"
curl_setopt($curl_handle,CURLOPT_CONNECTTIMEOUT,2);
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,1);
$buffer = curl_exec($curl_handle);
//Close the cURL handle to free its resources
curl_close($curl_handle);
// Change the message below as you wish; keep your message within the quotes.
if (empty($buffer))
{
print "Sorry, It seems our weather resources are currently unavailable, please check back later.";
}
else
{
print $buffer;
}
?>
Any ideas on another way I can grab a line of code from that page if they've blocked CURL requests?
EDIT: Running curl -i from my server, it appears that the site sets a cookie first?
You are not very specific about the kind of block you're talking about. The website in question, http://www.habbo.com/home/Intricat, first of all checks whether the browser has JavaScript enabled:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Script-Type" content="text/javascript">
<script type="text/javascript">function setCookie(c_name, value, expiredays) {
var exdate = new Date();
exdate.setDate(exdate.getDate() + expiredays);
document.cookie = c_name + "=" + escape(value) + ((expiredays == null) ? "" : ";expires=" + exdate.toGMTString()) + ";path=/";
}
function getHostUri() {
var loc = document.location;
return loc.toString();
}
setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '179.222.19.192', 10);
setCookie('DOAReferrer', document.referrer, 10);
location.href = getHostUri();</script>
</head>
<body>
<noscript>This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your
browser.
</noscript>
</body>
</html>
As cURL has no JavaScript support, you either need to use an HTTP client that has it, or you need to mimic that script and create the cookie and new request URI on your own.
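Mimicking the script by hand could look like the sketch below: parse the setCookie('NAME', 'VALUE', ...) calls out of the first response, then repeat the request with a matching Cookie header. The helper name is my own, and the regex is tailored to the exact snippet above:

```php
// Hedged sketch: extract the cookies the inline script would have set.
// Only calls whose second argument is a quoted literal are matched.
function extractJsCookies(string $html): array {
    $cookies = [];
    if (preg_match_all("/setCookie\\('([^']+)',\\s*'([^']*)'/", $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $cookies[] = $m[1] . '=' . $m[2];
        }
    }
    return $cookies;
}

// Usage sketch (untested against the live site):
// $ch = curl_init('http://www.habbo.com/home/Intricat');
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $first = curl_exec($ch);
// curl_setopt($ch, CURLOPT_COOKIE, implode('; ', extractJsCookies((string)$first)));
// $page = curl_exec($ch);  // second request carries the expected cookies
// curl_close($ch);
```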
Go in with your browser and copy the exact headers that are being sent;
the site won't be able to tell that you are using cURL, because the request will look exactly the same.
If cookies are used, attach them as headers.
This is a cut-and-paste from a cURL class I wrote quite a few years back; I hope you can pick some gems out of it for yourself.
function get_url($url)
{
curl_setopt ($this->ch, CURLOPT_URL, $url);
curl_setopt ($this->ch, CURLOPT_USERAGENT, $this->user_agent);
curl_setopt ($this->ch, CURLOPT_COOKIEFILE, $this->cookie_name);
curl_setopt ($this->ch, CURLOPT_COOKIEJAR, $this->cookie_name);
if(!is_null($this->referer))
{
curl_setopt ($this->ch, CURLOPT_REFERER, $this->referer);
}
curl_setopt ($this->ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt ($this->ch, CURLOPT_HEADER, 0);
if($this->follow)
{
curl_setopt ($this->ch, CURLOPT_FOLLOWLOCATION, 1);
}
else
{
curl_setopt ($this->ch, CURLOPT_FOLLOWLOCATION, 0);
}
curl_setopt ($this->ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($this->ch, CURLOPT_HTTPHEADER, array("Accept: text/html,text/vnd.wap.wml,*.*"));
curl_setopt ($this->ch, CURLOPT_SSL_VERIFYPEER, FALSE); // this line makes it work under https
$try=0;
$result="";
while( ($try<=$this->retry_attempts) && (empty($result)) ) // force a retry up to the configured number of attempts
{
$try++;
$result = curl_exec($this->ch);
$this->response=curl_getinfo($this->ch);
// $response['http_code'] 4xx is an error
}
// set refering URL to current url for next page.
if($this->referer_to_last) $this->set_referer($url);
return $result;
}
I know this is a very old post, but since I had to answer the same question myself today, I'm sharing this for people who come across it; it may be of use to them. I'm also fully aware the OP asked about cURL specifically, but - just like me - there may be people interested in a solution whether it uses cURL or not.
The page I wanted to get with cURL blocked it. If the block is not because of JavaScript but because of the user agent (that was my case, and setting the agent in cURL didn't help), then wget could be a solution:
wget -O output.txt --no-check-certificate --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" "http://example.com/page"