How to properly sanitize a URL as a link in PHP?

I have a site where users can share a link to their homepage, such as http://example.com/user. Currently, I am using the PHP function filter_var($_POST['url'], FILTER_VALIDATE_URL) to validate the URL before adding it to the database using a prepared statement.
However, I realize that the PHP filter function accepts input such as http://example.com/<script>alert('XSS');</script>, which could be used for cross-site scripting. To counter that, I use htmlspecialchars on the URL within the <a> tag and rawurlencode on the href attribute of the tag.
But rawurlencode causes the / in the URL to be converted to %2F, which breaks the link. I am thinking of doing a preg_replace to turn every %2F back into /. Is this the way to sanitize the URL for display as a link?
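A minimal sketch of where the question is heading, assuming the goal is to print a stored homepage URL as a link: validate it, allow only http/https, then HTML-escape the whole URL instead of URL-encoding it. The function name render_user_link() is hypothetical, and the http/https whitelist is an assumption you may want to adjust.

```php
<?php
// Hypothetical helper: validate a user-supplied homepage URL and render it as a link.
function render_user_link(string $url): string
{
    // Syntactic check only; FILTER_VALIDATE_URL does not make a URL safe to output.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return '';
    }
    // Allow only web schemes, so values like "javascript:..." are rejected.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return '';
    }
    // HTML-escape (not URL-encode) the whole URL for both the attribute and the text.
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $safe . '">' . $safe . '</a>';
}

echo render_user_link('http://example.com/user'), PHP_EOL;
// <a href="http://example.com/user">http://example.com/user</a>
```

Escaping with htmlspecialchars() leaves / and : intact, so there is nothing to convert back afterwards.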

This is outdated now:
I am using the PHP function filter_var($_POST['url'], FILTER_VALIDATE_URL) to validate the URL before adding it to database using prepared statement.
Instead of FILTER_VALIDATE_URL, you can use the following trick:
$url = "your URL";
$validation = '/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i';
// Note: the pattern has no trailing $ anchor, so only the scheme, host and optional port are checked
if (!preg_match($validation, $url)) {
    echo 'Not a valid URL';
}
I think it may work for you. All the best :)
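A quick check of the pattern above (copied unchanged, only written as a single-quoted string): since it has no trailing $ anchor, it still matches the question's XSS example, so it does not remove the need to escape the URL on output. The test URLs are illustrative.

```php
<?php
// Pattern from the answer above, unchanged apart from the single quotes.
$validation = '/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i';

// Only the scheme://host[:port]/ prefix has to match, so markup in the path is accepted.
var_dump(preg_match($validation, "http://example.com/<script>alert('XSS');</script>")); // int(1)

// A bare host without a scheme is rejected.
var_dump(preg_match($validation, 'example.com/user')); // int(0)
```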

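After sanitizing the URL with your XSS-related code, just change the %2F sequences back using
str_replace('%2F', '/', $result) after your code but before the filter_var() call, and it will change them back to the original character, so your script can go on. (rawurlencode() emits upper-case escapes and str_replace() is case-sensitive, so search for %2F rather than %2f.)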
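A small sketch of this suggestion on a hypothetical URL. It shows that rawurlencode() produces upper-case escapes (so the search string has to be %2F), and that the scheme's colon stays encoded as %3A, so restoring / alone may not give a working absolute link.

```php
<?php
$url = 'http://example.com/user';

// rawurlencode() emits upper-case hex escapes: "/" becomes "%2F" and ":" becomes "%3A".
$encoded = rawurlencode($url);
echo $encoded, PHP_EOL; // http%3A%2F%2Fexample.com%2Fuser

// str_replace() is case-sensitive, so the search string must be "%2F".
$restored = str_replace('%2F', '/', $encoded);
echo $restored, PHP_EOL; // http%3A//example.com/user
```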

Do not allow urls with tags.
A user inserting a tag into a URL means it's probably malicious.
Having "homepages" containing tags is just wrong.
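One way to apply this rule is sketched below; has_markup() is a hypothetical helper that flags a submitted URL containing < or >, either literally or percent-encoded. It is a coarse input filter, not a replacement for escaping on output.

```php
<?php
// Hypothetical helper: true if the URL contains angle brackets, raw or percent-encoded.
function has_markup(string $url): bool
{
    // Check the decoded form too, so %3Cscript%3E is caught as well as <script>.
    return strpbrk($url, '<>') !== false
        || strpbrk(rawurldecode($url), '<>') !== false;
}

var_dump(has_markup('http://example.com/user'));                 // bool(false)
var_dump(has_markup('http://example.com/%3Cscript%3Ealert(1)')); // bool(true)
```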

Related

Prevent XSS attack via URL (PHP)

I am trying to avoid an XSS attack via the URL.
URL: http://example.com/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29
I have tried
var_dump(filter_var('http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29', FILTER_VALIDATE_URL));
and other URL validation using regex, but none of it worked.
The above link shows all the information, but my CSS and some JavaScript functions don't work.
Please suggest the best possible solution.
Try using FILTER_SANITIZE_SPECIAL_CHARS instead:
$url = 'http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29';
// Original
echo $url, PHP_EOL;
// Sanitise
echo sanitiseURL($url), PHP_EOL;
// Sanitise + URL encode
echo sanitiseURL($url, true), PHP_EOL;
Output
http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29
http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/&#34;ns=&#34;alert(0x0000DC)
http%3A%2F%2F10.0.4.2%2FonlineArcNew%2Fhtml%2Fterms_conditions_1.php%2F%26%2334%3Bns%3D%26%2334%3Balert%280x0000DC%29
Function Used
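function sanitiseURL($url, $encode = false) {
    // Decode %XX sequences, then HTML-encode ' " < > & and control characters
    $url = filter_var(urldecode($url), FILTER_SANITIZE_SPECIAL_CHARS);
    // Reject anything that is not a syntactically valid URL after sanitising
    if (! filter_var($url, FILTER_VALIDATE_URL)) {
        return false;
    }
    // Optionally URL-encode the whole sanitised string
    return $encode ? urlencode($url) : $url;
}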
If you're using MVC, then try to decode all of the values before routing, and use strip_tags() to get rid of these nasties. And, as the docs say, case should not affect anything.
If not, create a utility function and do the same while retrieving the variables from the URI. But I am by no means an XSS expert, so this might be just part of the trick.
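A rough sketch of such a utility function, assuming the path has already been taken from the request URI (for example from $_SERVER['REQUEST_URI']); clean_path_segments() is a hypothetical name. Note that strip_tags() removes tags but leaves quotes in place, which fits the caveat that this is only part of the solution.

```php
<?php
// Hypothetical helper: split a URI path, decode each segment, and strip any tags.
function clean_path_segments(string $path): array
{
    // Split first, so an encoded %2F inside a segment does not create a new segment.
    $segments = array_filter(explode('/', $path), static fn (string $s): bool => $s !== '');

    return array_values(array_map(
        static fn (string $s): string => strip_tags(rawurldecode($s)),
        $segments
    ));
}

echo implode(' | ', clean_path_segments('/page/%3Cscript%3Ealert(1)%3C%2Fscript%3E')), PHP_EOL;
// page | alert(1)
```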
From Janis Peisenieks
Step 1: Escape Output Provided by Users
If you want to include data within a page that’s been provided by users, escape the output. And, in this simplified list, we’re going to stick with one simple escape operation: HTML encode any <, >, &, ‘, “. For example, PHP provides the htmlspecialchars() function to accomplish this common task.
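In PHP, this escape step could look like the sketch below. The wrapper name e() is hypothetical; passing ENT_QUOTES explicitly makes sure single quotes are encoded too, which the default flags did not do before PHP 8.1.

```php
<?php
// Hypothetical wrapper around htmlspecialchars() for the characters listed in Step 1.
function e(string $value): string
{
    // ENT_QUOTES encodes both ' and "; UTF-8 is assumed as the page encoding.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

echo e("<'\"&>"), PHP_EOL; // &lt;&#039;&quot;&amp;&gt;
```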
Step 2: Always Use XHTML
Read through OWASP’s XSS prevention strategies, and it becomes apparent that protecting against injection requires much more effort if you use unquoted attributes in your HTML. In contrast, in quoted attributes, escaping data becomes the same process needed to escape data for content within tags, the escape operation we already outlined above. That’s because the only troublemaker in terms of sneaking in structurally significant content within the context of a quoted attribute is the closing quote.
Obviously, your markup doesn’t have to be XHTML in order to contain quoted attributes. However, shooting for and validating against XHTML makes it easy to test if all of the attributes are quoted.
Step 3: Only Allow Alphanumeric Data Values in CSS and JavaScript
You need to limit the data you allow from users that will be output within CSS and JavaScript sections of the page to alphanumeric (e.g., a regex like [a-zA-Z0-9]+) types, and make sure they are used in a context in which they truly represent values. In JavaScript this means user data should only be output within quoted strings assigned to variables (e.g., var userId = "ALPHANUMERIC_USER_ID_HERE";). In CSS this means that user data should only be output within the context of a property value (e.g., p { color: #ALPHANUMERIC_USER_COLOR_HERE; }). This might seem draconian, but, hey, this is supposed to be a simple XSS tutorial.
Now, to be clear, you should always validate user data to make sure it meets your expectations, even for data that’s output within tags or attributes, as in the earlier examples. However, it’s especially important for CSS and JavaScript regions, as the complexity of the possible data structures makes it exceedingly difficult to prevent XSS attacks.
Common data you might want users to be able to supply to your JavaScript, such as Facebook, YouTube, and Twitter IDs, can all be used whilst accommodating this restriction. And CSS color attributes and other styles can be integrated, too.
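A minimal PHP sketch of the alphanumeric restriction from this step, assuming the value is later placed inside a quoted JavaScript string; alnum_or_null() is a hypothetical helper. The pattern uses \A and \z rather than ^ and $, because $ also matches before a trailing newline.

```php
<?php
// Hypothetical helper: return the value only if it is strictly alphanumeric.
function alnum_or_null(string $value): ?string
{
    return preg_match('/\A[a-zA-Z0-9]+\z/', $value) === 1 ? $value : null;
}

$userId = alnum_or_null('abc123');
if ($userId !== null) {
    // Safe to place inside a quoted JavaScript string.
    echo 'var userId = "' . $userId . '";', PHP_EOL; // var userId = "abc123";
}
```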
Step 4: URL-Encode URL Query String Parameters
If user data is output within a URL parameter of a link query string, make sure to URL-encode the data. Again, using PHP as an example, you can simply use the urlencode() function. Now, let’s be clear on this and work through a couple of examples, as I’ve seen much confusion concerning this particular point.
Must URL-encode
The following example outputs user data that must be URL-encoded because it is used as a value in the query string.
<a href="http://site.com?id=USER_DATA_HERE_MUST_BE_URL_ENCODED">
Must Not URL-Encode
When the user-supplied data forms the entire URL (for example, the whole href value of a link), it should be escaped with the standard escape function (HTML encode any <, >, &, ‘, “), not URL-encoded. URL-encoding it would lead to malformed links.
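The two cases can be sketched in PHP as follows; the sample values and the "profile"/"homepage" link texts are hypothetical. In the query-string case the value is URL-encoded and the finished href is then HTML-escaped as well, since a URL inside an attribute is still attribute content.

```php
<?php
// Hypothetical user-supplied values.
$userId   = 'a&b c';
$homepage = 'http://example.com/user?tab=1&sort=asc';

// Must URL-encode: the user data is a single query-string value.
$href1 = 'http://site.com?id=' . urlencode($userId);
$link1 = '<a href="' . htmlspecialchars($href1, ENT_QUOTES, 'UTF-8') . '">profile</a>';

// Must not URL-encode: the user data is the entire URL, so only HTML-escape it.
$link2 = '<a href="' . htmlspecialchars($homepage, ENT_QUOTES, 'UTF-8') . '">homepage</a>';

echo $link1, PHP_EOL; // <a href="http://site.com?id=a%26b+c">profile</a>
echo $link2, PHP_EOL; // <a href="http://example.com/user?tab=1&amp;sort=asc">homepage</a>
```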

Check for multiple patterns with preg_replace?

I am looking for a solution for validating submitted posts. I want to check if someone submits (within the post):
An iframe for embedding a YouTube or Vimeo video, replacing the width used in the iframe with the correct one
A URL, replaced by an HTML clickable link
An image URL, replaced by an HTML image tag
I was able to find the correct regexes for each of these requirements, but using 3 separate preg_replace calls causes interference. For example, detecting a URL will also detect the URL inside the iframe.
I have searched for a solution to this, both on Stack Overflow and on the rest of the internet. But I am not an expert, so perhaps someone could help me out or point me to the right tutorial/website/how-to...
What you can do is first match the iframes with preg_match, and then replace them with a placeholder.
Then you can do the replacements for urls/images. Then, replace the iframe placeholders back with the iframes you matched earlier.
You can generate unique sequential placeholders by using preg_replace_callback, so that you get to run some code to increment a $placeholder_id for each replacement.
This is a general strategy that can often greatly simplify complex parsing.
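A minimal sketch of the placeholder strategy, limited to iframes and plain URLs. The function name, the %%IFRAMEn%% placeholder format and both regexes are assumptions; a fuller version would also handle image URLs and the width replacement.

```php
<?php
// Hypothetical: protect iframes with placeholders, linkify bare URLs, then restore iframes.
function linkify_preserving_iframes(string $text): string
{
    $iframes = [];

    // 1. Swap every iframe for a unique, sequential placeholder.
    $text = preg_replace_callback('#<iframe\b.*?</iframe>#is', function (array $m) use (&$iframes): string {
        $id = count($iframes);
        $iframes[$id] = $m[0];
        return "%%IFRAME{$id}%%";
    }, $text);

    // 2. Linkify plain URLs; the iframe src URLs are no longer in the text.
    $text = preg_replace_callback('#\bhttps?://[^\s<>"\']+#i', function (array $m): string {
        $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $url . '">' . $url . '</a>';
    }, $text);

    // 3. Put the original iframes back in place of their placeholders.
    return preg_replace_callback('/%%IFRAME(\d+)%%/', fn (array $m): string => $iframes[(int) $m[1]], $text);
}

echo linkify_preserving_iframes('See http://example.com and <iframe src="https://www.youtube.com/embed/x"></iframe>'), PHP_EOL;
```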
You can simply pass an array of URL patterns to preg_replace() like this:
$pattern_array = array(
    '/somepattern/',
    '/someotherpattern/',
    '/yetanotherpattern/',
);
$replacement_array = array(
    'somereplacement',
    'someotherreplacement',
    'yetanotherreplacement'
);
$result = preg_replace($pattern_array, $replacement_array, $subject_string);
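Note that with arrays, preg_replace() applies each pattern in turn to the output of the previous one, so a later pattern can still match text that an earlier one produced or left behind. A tiny illustration with made-up patterns:

```php
<?php
// Each pattern runs on the result of the previous replacement: 'a' -> 'b' -> 'c'.
$result = preg_replace(['/a/', '/b/'], ['b', 'c'], 'a');
echo $result, PHP_EOL; // c
```

This is why arrays alone may not prevent the URL pattern from matching inside an iframe that another pattern has just produced.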

Do I need to encode my $_GET URL?

When I first submit my search form via $_GET, it returns results as expected, but when using pagination and submitting it again for page X, I see that it converts a portion of my URL and fails.
Here is the before and after URL portion that is changing:
// Before
min_score=1&max_score=10&not_scored=1
// After
min_score=1&max_score=10%AC_scored=1
It's encoding the part after 10&. How can I prevent this from happening?
The reason is that &not gets interpreted by the browser as the &not; entity. Strict mode or any DOCTYPE might help.
And &not; simply gets substituted as ¬, which in turn becomes %AC in request URLs.
Besides urlencode() on the individual values you should additionally apply htmlspecialchars() on the whole URL before you add it into the <a> tag.
Always write URLs in your HTML with &amp; instead of &.
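A sketch combining both suggestions, assuming the pagination link points at a hypothetical search.php: http_build_query() builds and URL-encodes the query string, and htmlspecialchars() turns each & into &amp; so the browser cannot read &not as an entity.

```php
<?php
$params = ['min_score' => 1, 'max_score' => 10, 'not_scored' => 1];

// Build the query string with an explicit "&" separator (values are URL-encoded).
$query = http_build_query($params, '', '&'); // min_score=1&max_score=10&not_scored=1

// HTML-escape the whole URL for the attribute: "&not_scored" becomes "&amp;not_scored".
$href = htmlspecialchars('search.php?' . $query, ENT_QUOTES, 'UTF-8');
echo '<a href="' . $href . '">Next page</a>', PHP_EOL;
// <a href="search.php?min_score=1&amp;max_score=10&amp;not_scored=1">Next page</a>
```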

wrong url extraction

I am getting input from users on my site through a textarea. The input may contain an <a> tag.
I want to extract the URL from the input.
$res = get_magic_quotes_gpc() ? stripslashes($data) : $data;
$res = mysql_real_escape_string($res); // php 4.3 and higher
preg_match('#href\s*?=\s*?[\'"]?([^\'"]*)[\'"]?#i', $res, $captures);
$href = $captures[1];
example
If the input string is this:
$data = 'any string <a href="http://www.example.com">Any Anchor</a>';
the extracted output becomes
"\"http://www.example.com""
I checked the output after each line; the two extra double quotes appear after
mysql_real_escape_string($res);
mysql_real_escape_string should only AND ALWAYS be used when passing user values into MySQL queries. Don't use it for anything else, use the right escaping function for the right task.
Here, I don't think you need to use an escape function at all. Your regular expression looks fine, I'm confident it will work if you remove the escaping function.
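For instance, running the same regex on a hypothetical raw input string, with no escaping applied, captures the URL cleanly:

```php
<?php
// Hypothetical raw input, with no escaping applied.
$data = 'any string <a href="http://www.example.com">Any Anchor</a>';

// Same pattern as in the question.
preg_match('#href\s*?=\s*?[\'"]?([^\'"]*)[\'"]?#i', $data, $captures);
$href = $captures[1] ?? null;
echo $href, PHP_EOL; // http://www.example.com
```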
Also, don't use get_magic_quotes_gpc if you can avoid it. I could explain why, but I suppose the fact that magic quotes have been deprecated since PHP 5.3 (and removed in PHP 5.4) is evidence enough. If your host does not allow you to disable them, I would consider switching to a more savvy host.
Why don't you try processing the input using XPath to find the a elements and then extract the href attribute value? I did something similar and used XPath to process input, and it worked a treat. It saves you having to write very complex regular expressions if you would like to account for other tags later on.
Hope this helps.
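A minimal sketch of the XPath approach using PHP's DOM extension (assumed to be enabled); the input string is hypothetical. libxml errors are collected internally because loadHTML() may warn about malformed markup in user input.

```php
<?php
// Hypothetical user input containing an anchor.
$data = 'any string <a href="http://www.example.com">Any Anchor</a>';

// Collect libxml warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($data);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select every <a> element that has an href attribute.
$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//a[@href]') as $anchor) {
    $hrefs[] = $anchor->getAttribute('href');
}

echo implode(PHP_EOL, $hrefs), PHP_EOL; // http://www.example.com
```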

URL with query string validation using PHP

I need a PHP validation function for URLs with a query string (parameters separated with &). Currently, I have the following function for validating URLs:
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?#)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
echo preg_match($pattern, $url);
This function correctly validates input like
google.com
www.google.com
http://google.com
http://www.google.com ...etc
But this won't validate the URL when it comes with parameters (a query string), e.g.
http://google.com/index.html?prod=gmail&act=inbox
I need a function that accepts both types of URL inputs. Please help. Thanks in advance.
A simple filter_var
if(filter_var($yoururl, FILTER_VALIDATE_URL))
{
echo 'Ok';
}
might do the trick, although there are problems with URLs that lack the scheme:
http://codepad.org/1HAdufMG
You can work around the issue by placing http:// in front of URLs that lack it.
As suggested by @DaveRandom, you could do something like:
$parsed = parse_url($url);
if (!isset($parsed['scheme'])) $url = "http://$url";
before feeding the filter_var() function.
Overall, it's still a simpler solution than some extra-complicated regex, though.
It also has these flags available:
FILTER_FLAG_PATH_REQUIRED (used with FILTER_VALIDATE_URL): requires the URL to contain a path part.
FILTER_FLAG_QUERY_REQUIRED (used with FILTER_VALIDATE_URL): requires the URL to contain a query string.
http://php.net/manual/en/function.parse-url.php
Some might think this is not 100% bulletproof,
but you can give it a try as a start.
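A sketch that puts the pieces of this answer together: prepend http:// when parse_url() finds no scheme, then let filter_var() validate, optionally with FILTER_FLAG_QUERY_REQUIRED. The function name validate_url_with_query() and its $requireQuery parameter are hypothetical.

```php
<?php
// Hypothetical helper combining parse_url() and filter_var().
function validate_url_with_query(string $url, bool $requireQuery = false): bool
{
    // Inputs such as "google.com" or "www.google.com" have no scheme for parse_url().
    if (parse_url($url, PHP_URL_SCHEME) === null) {
        $url = 'http://' . $url;
    }

    // Optionally insist on a query string.
    $flags = $requireQuery ? FILTER_FLAG_QUERY_REQUIRED : 0;

    return filter_var($url, FILTER_VALIDATE_URL, $flags) !== false;
}

var_dump(validate_url_with_query('www.google.com'));                                          // bool(true)
var_dump(validate_url_with_query('http://google.com/index.html?prod=gmail&act=inbox', true)); // bool(true)
var_dump(validate_url_with_query('http://google.com', true));                                 // bool(false)
```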
