Test if string is URL encoded in PHP - php

How can I test if a string is URL encoded?
Which of the following approaches is better?
Search the string for characters which would be encoded but aren't; if any exist, then it's not encoded, or
Use something like this which I've made:
function is_urlEncoded($string){
    $test_string = $string;
    while(urldecode($test_string) != $test_string){
        $test_string = urldecode($test_string);
    }
    return urlencode($test_string) == $string;
}
$t = "Hello World > how are you?";
if(is_urlEncoded($t)){
    print "Was Encoded.\n";
}else{
    print "Not Encoded.\n";
    print "Should be ".urlencode($t)."\n";
}
The above code works, but not in instances where the string has been doubly encoded, as in these examples:
$t = "Hello%2BWorld%2B%253E%2Bhow%2Bare%2Byou%253F";
$t = "Hello+World%2B%253E%2Bhow%2Bare%2Byou%253F";

I have one trick:
You can do this to prevent double encoding.
Always decode first, then encode again:
$string = urldecode($string);
Then encode again:
$string = urlencode($string);
Done this way, we avoid double encoding :)

Here is something I just put together.
if ( urlencode(urldecode($data)) === $data){
echo 'string urlencoded';
} else {
echo 'string is NOT urlencoded';
}

You'll never know for sure if a string is URL-encoded or if it was supposed to have the sequence %2B in it. Instead, it probably depends on where the string came from, i.e. if it was hand-crafted or from some application.
"Is it better to search the string for characters which would be encoded but aren't, and if any exist then it's not encoded?"
I think this is a better approach, since it would take care of things that have been done programmatically (assuming the application would not have left a non-encoded character behind).
One thing that will be confusing here... Technically, the % "should be" encoded if it will be present in the final value, since it is a special character. You might have to combine your approaches to look for should-be-encoded characters as well as validating that the string decodes successfully if none are found.

I think there's no foolproof way to do it. For example, consider the following:
$t = "A+B";
Is that a URL-encoded "A B", or does it need to be encoded to "A%2BB"?
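To make the ambiguity concrete, here is a small sketch using only the built-in urlencode() and urldecode(). The variable names $decoded and $encoded are just illustrative. Both readings of "A+B" survive a round trip, so neither can be ruled out by inspection alone:

```php
<?php
$t = "A+B";

// Reading 1: $t is already encoded, so "+" stands for a space.
$decoded = urldecode($t);

// Reading 2: $t is raw text, so the "+" itself must be escaped.
$encoded = urlencode($t);

// Each reading is self-consistent: re-encoding the decoded form and
// decoding the encoded form both give back the original string.
var_dump(urlencode($decoded) === $t);
var_dump(urldecode($encoded) === $t);
```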

Well, the term "URL encoded" is a bit vague; perhaps a simple regex check will do the trick:
$is_encoded = preg_match('~%[0-9A-F]{2}~i', $string);

What about:
if (urldecode(trim($url)) == trim($url)) { $url_form = 'decoded'; }
else { $url_form = 'encoded'; }
This will not work with double encoding, but that is out of scope anyway, I suppose?

There's no reliable way to do this, as there are strings which stay the same through the encoding process, i.e. is "abc" encoded or not? There's no clear answer. Also, as you've encountered, some characters have multiple encodings... But...
Your decode-check-encode-check scheme fails because some characters may be encoded in more than one way. However, a slight modification to your function should be fairly reliable: just check whether decoding modifies the string; if it does, it was encoded.
It won't be foolproof, of course, as "10+20=30" will return true (+ gets converted to space) even though it's just arithmetic. I suppose this is what your scheme is attempting to counter; I'm sorry to say that I don't think there's a perfect solution.
HTH.
Edit:
As I mentioned in my own comment (just reiterating here for clarity), a good compromise would probably be to check for invalid characters in your URL (e.g. space), and if there are any, it's not encoded. If there are none, try to decode and see if the string changes. This still won't handle arithmetic like the example above (which is impossible), but it'll hopefully be sufficient.
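A minimal sketch of that compromise might look like the following. The helper name looks_url_encoded() is hypothetical, and the character set in step 1 assumes the string would have been produced by PHP's urlencode(), which leaves only letters, digits, "-", "_" and "." unescaped and writes spaces as "+". It remains a heuristic, not a reliable test.

```php
<?php
// Heuristic only: true when $s *looks* like urlencode() output.
// looks_url_encoded() is a hypothetical name, not a PHP built-in.
function looks_url_encoded(string $s): bool
{
    // Step 1: a character that urlencode() would always escape
    // (e.g. a space or ">") means the string was not encoded.
    if (preg_match('/[^A-Za-z0-9\-_.+%]/', $s)) {
        return false;
    }
    // Step 2: a "%" that is not followed by two hex digits
    // is not a valid escape sequence.
    if (preg_match('/%(?![0-9A-Fa-f]{2})/', $s)) {
        return false;
    }
    // Step 3: no invalid characters found, so check whether
    // decoding actually changes the string.
    return urldecode($s) !== $s;
}

var_dump(looks_url_encoded("Hello World > how are you?"));
var_dump(looks_url_encoded("Hello+World+%3E+how+are+you%3F"));
var_dump(looks_url_encoded("abc"));
var_dump(looks_url_encoded("A+B"));
```

The last call still reports true for "A+B", which is the unavoidable ambiguity: a literal plus sign cannot be told apart from an encoded space.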

@user187291's code works and only fails when + is not encoded.
I know this is a very old post, but this worked for me:
$is_encoded = preg_match('~%[0-9A-F]{2}~i', $string);
if($is_encoded) {
$string = urlencode(urldecode(str_replace(['+','='], ['%2B','%3D'], $string)));
} else {
$string = urlencode($string);
}

Send a variable that flags whether to decode, since you are already getting the data from a URL:
?path=folder/new%20file.txt&decode=1

In my case I wanted to check if a complete URL is encoded, so I already knew that the URL must contain the string https://, and what I did was to check if the string had the encoded version of https:// in it (https%3A%2F%2F) and if it didn't, then I knew it was not encoded:
//make sure $completeUrl is encoded
if (strpos($completeUrl, urlencode('https://')) === false) {
// not encoded, need to encode it
$completeUrl = urlencode($completeUrl);
}
In theory, this solution can be used with any string containing characters that get encoded, as long as you know that part of the string (https:// in this example) will always exist in what you are trying to check.

I am using the following test to see if strings have been urlencoded:
if(urlencode($str) != str_replace(['%','+'], ['%25','%2B'], $str))
If a string has already been urlencoded, the only characters that will be changed by double encoding are % (which starts every encoded character sequence) and + (which replaces spaces). Change them back and you should have the original string.
Let me know if this works for you.

I found a way.
Example URL: https://example.com/xD?foo=bar&uri=https%3A%2F%2Fexample.com%2FxD
You need to find out whether $_GET['uri'] is encoded or not:
// [^&]* stops at the next parameter instead of capturing the rest of the URL
if (isset($_GET['uri'])
    && preg_match('/[?&]uri=([^&]*)/', $_SERVER['REQUEST_URI'], $r)
    && urldecode($r[1]) === $r[1]) {
    // Code here runs if the URL is not encoded
}

private static boolean isEncodedText(String val, String... encoding) throws UnsupportedEncodingException
{
String decodedText = URLDecoder.decode(val, TransformFetchConstants.DEFAULT_CHARSET);
if(encoding != null && encoding.length > 0){
decodedText = URLDecoder.decode(val, encoding[0]);
}
String encodedText = URLEncoder.encode(decodedText);
return encodedText.equalsIgnoreCase(val) || !decodedText.equalsIgnoreCase(val);
}

Related

Cookie and string comparison won't match

I've got a problem: I store a string in $_COOKIE['restaurant_name'], for example "MMM skanu". When I try comparing it, the strings seem to be different:
if ($_COOKIE['restaurant_name'] == "MMM skanu")
{
// always false
}
but when I try to print it, for example with echo $_COOKIE['restaurant_name'];, I see it's printing the same string "MMM skanu". I tried using the strval() function, but it's still the same. How do I parse or convert this cookie to a string? I can also see in my Google Chrome cookies that restaurant_name = %20MMM%20skanu%20; does that have anything to do with it?
Here I'm decoding any encoding like '%20' using the built-in function urldecode. This function decodes encoded characters, e.g. turning %20 into a space character, so "what%20" becomes "what " after decoding.
Using trim, I'm removing any extra whitespace; for example, "%20what" becomes " what" after decoding, and trim removes the leading space.
$restaurant_name = trim(urldecode($_COOKIE['restaurant_name']));
if($restaurant_name == "MMM skanu"){
// do something
}

preg_replace replacing &not in string to funny character

For some reason, when preg_replace sees &not in a string, it replaces it with ¬:
$url= "http://something?blah=2&you=3&rate=22&nothing=1";
echo preg_replace("/&rate=[0-9]*/", "", $url) . "<br/>";
But the output is as follows:
http://something?blah=2&you=3¬hing=1 // Current result
http://something?blah=2&you=3&nothing=1 // Expected result
Any ideas why this is happening and how to prevent it?
& has special meaning when used in URIs. Your URI contains &not, which is a valid HTML entity on its own. When the output is rendered as HTML, it's converted to ¬, hence the trouble. Escape the ampersands properly as &amp; (so &nothing becomes &amp;nothing) to avoid this problem. If your data is fetched from elsewhere, you can use htmlspecialchars() to do this automatically.
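As a rough illustration of the htmlspecialchars() route, the sketch below reuses the URL from the question; the variable name $clean is just illustrative. The replacement itself leaves a literal "&nothing" in the string, and escaping only at output time keeps the browser from reading "&not" as an entity:

```php
<?php
$url = "http://something?blah=2&you=3&rate=22&nothing=1";

// The regex result is correct; the string still contains "&nothing".
$clean = preg_replace("/&rate=[0-9]*/", "", $url);

// Escape for HTML only when printing, so "&" is sent as "&amp;".
echo htmlspecialchars($clean) . "<br/>";
```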
Use &amp; in place of &,
because your &not has a special meaning.
Use this URL:
http://something?blah=2&amp;you=3&amp;rate=22&amp;nothing=1
and then do your replace accordingly.

converting url separators with slash

I have a category named like this:
$name = 'Construction / Real Estate';
Those are two different categories, and I am displaying results from database
for each of them. But before that, I have to send the user to a URL just for that category.
Here is the problem: if I do something like this,
echo "<a href='site.com/category/{$name}'> $name </a>";
The URL will become
site.com/category/Construction%20/%20Real%20Estate
I am trying to remove the %20 and turn them into /, so I did str_replace('%20', '/', $name);
But that becomes something like this:
site.com/category/Construction///Real/Estate
^ ^ and ^ those are the problems.
Since it is one word, I want it to appear as Construction/RealEstate only.
I could do this with at least 10 lines of code, but I was hoping there is a regex or a simple PHP way to fix it.
You have a string for human consumption, and based on that string you want to create a URL.
To avoid any characters messing up your HTML, or being abused for an XSS attack, you need to escape the human-readable string in the context of HTML using htmlspecialchars():
$name = 'Construction / Real Estate';
echo "<h1>".htmlspecialchars($name)."</h1>";
If that name should go into a URL, it must also be escaped:
$url = "site.com/category/".rawurlencode($name);
If any URL should go into HTML, it must be escaped for HTML:
echo "<a href='".htmlspecialchars($url)."'>";
Now the problem with slashes in URLs is that they are most likely not accepted as a regular character even if they are escaped in the URL. And any space character also does not fit into a URL nicely, although they work.
And then there is that black magic of search engine optimization.
For whatever reason, you should convert your category string before you inject it as part of the URL. Do that BEFORE you encode it.
As a general rule, lowercase characters are better, spaces should be dashes instead, and the slash probably should be a dash too:
$urlname = strtr(mb_strtolower($name), array(" " => "-", "/" => "-"));
And then again:
$url = "site.com/category/".rawurlencode($urlname);
echo "<a href='".htmlspecialchars($url)."'>";
In fact, using htmlspecialchars() is not really enough. Escaping output that goes into an HTML attribute differs from escaping output that becomes an element's content. If you have a look at the escaper class from Zend Framework 2, you realize that escaping an HTML attribute value is a lot more complicated.
No, there is nothing you can do to make it easier. The only chance is to use a function that does everything you need to make things easier for you, but you still need to apply the correct escaping everywhere.
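Putting the steps from this answer together, one possible sketch is shown below. The helper name category_link() is hypothetical, collapsing repeated dashes is an extra step not mentioned in the answer, and mb_strtolower() assumes the mbstring extension is available. It still relies on htmlspecialchars() for the attribute, with the caveat described above.

```php
<?php
// category_link() is a hypothetical helper; "site.com/category/"
// is the URL prefix used in the question.
function category_link(string $name): string
{
    // 1. Convert the human-readable name BEFORE encoding it:
    //    lowercase, spaces and slashes become dashes.
    $slug = strtr(mb_strtolower($name), array(" " => "-", "/" => "-"));

    // Extra step (assumption): collapse "---" into a single "-".
    $slug = preg_replace('/-+/', '-', $slug);

    // 2. Escape the slug for use inside a URL path.
    $url = "site.com/category/" . rawurlencode($slug);

    // 3. Escape both the URL and the visible text for HTML.
    //    ENT_QUOTES matters because the attribute uses single quotes.
    return "<a href='" . htmlspecialchars($url, ENT_QUOTES) . "'>"
        . htmlspecialchars($name, ENT_QUOTES) . "</a>";
}

echo category_link('Construction / Real Estate');
```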
You can use a simple solution like this:
$s = "site.com/category/Construction%20/%20Real%20Estate";
$s = str_replace('%20', '', $s);
echo $s; // site.com/category/Construction/RealEstate
Perhaps, you want to use urldecode() and remove the whitespace afterwards?

urlencode to lower case in PHP

In PHP, when url encoding using urlencode(), the outputted characters are in upper case:
echo urlencode('MyString'.chr(31));
//returns 'MyString%1F'
I need to get PHP to give me back 'MyString%1f' for the above example, without lowercasing any other part of the string, in order to be consistent with other platforms. Is there any way I can do this without having to run through the string one character at a time, working out if I need to change the casing each time?
Why would you want to do this at all? F or f, it shouldn't make any difference, as percent-encoding is meant to be case-insensitive. The only case I could think of would be when creating hashes; however, personally I would then convert the whole string to either uppercase or lowercase, i.e. treat it as case-insensitive.
Anyways, if you really need to do this, then it should be relatively easy using preg_replace_callback:
$original = 'MyString%1F%E2%FOO%22';
$modified = preg_replace_callback('/%[0-9A-F]{2}/', function(array $matches)
{
    return strtolower($matches[0]);
},
$original);
var_dump($modified);
This should give you:
string(21) "MyString%1f%e2%FOO%22"

Automatic addition of trailing slash to urlencoded urls

I am very confused about the following:
echo("<a href='http://".urlencode("www.test.com/test.php?x=1&y=2")."'>test</a><br>");
echo("<a href='http://"."www.test.com/test.php?x=1&y=2"."'>test</a>");
The first link gets a trailing slash added (that's causing me problems)
The second link does not.
Can anyone help me to understand why.
Clearly it appears to be something to do with urlencode, but I can't find out what.
Thanks
c
You should not be using urlencode() to echo URLs, unless they contain some non standard characters.
The example provided doesn't contain anything unusual.
Example
$query = 'hello how are you?';
echo 'http://example.com/?q=' . urlencode($query);
// Outputs http://example.com/?q=hello+how+are+you%3F
I used it here because the $query variable may contain spaces, question marks, etc. I cannot use the question mark as-is because it denotes the start of a query string, e.g. index.php?page=1.
In fact, that example would be better off just being output rather than echo'd.
Also, when I tried your example code, I did not get a trailing slash; in fact I got
<a href='http://www.test.com%2Ftest.php%3Fx%3D1%26y%3D2'>test</a>
string urlencode ( string $str )
This function is convenient when encoding a string to be used in a query part of a URL, as a convenient way to pass variables to the next page.
urlencode() is not used properly in your case.
Plus, echo doesn't usually come with (); it should be echo "<a href='http [...]</a>";
You should use urlencode() for parameters only! Example:
echo 'http://example.com/index.php?some_link='.urlencode('some value containing special chars like whitespace');
You can use this to pass URLs, etc. to your URL.
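As a sketch of the "parameters only" rule, the built-in http_build_query() can encode each value for you while leaving the host and path alone. The parameter values and the variable names $params and $url below are only illustrative; the host and path are taken from the question.

```php
<?php
// Only the query values get encoded; the host and path stay readable.
$params = array('x' => 1, 'y' => 2, 'q' => 'hello how are you?');

// The separator is passed explicitly so the result does not depend
// on the arg_separator.output ini setting.
$url = 'http://www.test.com/test.php?' . http_build_query($params, '', '&');

// Escape the finished URL for HTML when placing it in an attribute.
echo "<a href='" . htmlspecialchars($url, ENT_QUOTES) . "'>test</a>";
```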
