This is my first post at stackoverflow.
I need to ask a few simple :D questions about sanitizing input in PHP, and I would be really grateful to anyone who could help :)
1) When I run get_magic_quotes_gpc() it returns false, which means magic quotes are turned off. Is this correct?
2) Should I sanitize any user-entered string using stripslashes(), htmlentities() and strip_tags() when magic quotes are turned off?
3) Even though magic quotes are turned off, when I enter characters such as \ or some other special character, my program does not filter them out. Why is that?
4) I then modified my program to call a function that cleans the string before it is processed. Even though the string is cleaned, it still shows those unwanted characters. Is there anything wrong with the sanitizeString() function?
Below is my code, related to question 3)
(The program is supposed to convert Fahrenheit into Celsius or vice versa )
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<form action="TemperatureConverter.php" method="post">
<label>Fahrenheit</label><input type="text" name="f" size="10"/><br>
<label>Celsius</label><input type="text" name="c" size="10"/><br>
<input type="submit" name="submit" value="SUBMIT">
</form>
</body>
</html>
<?php
$f='';
$c='';
if(isset($_POST["f"])){
$f= sanitizeString($_POST["f"]);
}
if(isset($_POST["c"])){
$c=sanitizeString($_POST["c"]);
}
if($f!=""){
$c=(5/9)*($f-32);
echo $f.' Fahrenheit is equal to '.$c.' Celsius ';
}
else if($c!=""){
$f=$c*(9/5)+32;
echo $c.' Celsius is equal to '.$f.' Fahrenheit ';
}
function sanitizeString($str){
$str= stripslashes($str);
$str= htmlentities($str);
$str= strip_tags($str);
return $str;
}
I hope I have posted my code in a way that follows the rules of Stack Overflow. If not, sorry. :(
In your example, since you know the input should be a number, it is best to simply check for that rather than add further filtering.
For example:
if (isset($_POST["f"])) {
    $inFahrenheit = trim($_POST['f']); // remove any leading/trailing spaces
    // keep the trimmed value only if it is numeric
    if (is_numeric($inFahrenheit)) $f = $inFahrenheit;
}
The above code validates that the input is numeric. Since you are expecting a number anything else is invalid and can be ignored.
Other questions:
1) Yes, it means the setting is turned off.
2) Not all of those filters are required. There is no need to allow HTML values if the input should be a number. The filter extension (http://www.php.net/manual/en/book.filter.php) would be a start.
3) Magic quotes only escape certain characters. The feature was deprecated in PHP 5.3 and removed in PHP 5.4, so you should not rely on it.
4) These functions only ensure that characters are escaped properly. For example, an & gets converted to &amp;. There is still an & in the output, but it now serves a different purpose.
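As a rough sketch of the validate-first approach described above, the filter extension can check the number directly. The helper names parseTemperature() and fahrenheitToCelsius() are hypothetical and not part of the original program; the field name f comes from the question's form.

```php
<?php
// Hypothetical helper: returns a float for valid numeric input, or null otherwise.
function parseTemperature($raw): ?float
{
    if (!is_string($raw)) {
        return null; // missing field or unexpected type (e.g. an array)
    }
    // FILTER_VALIDATE_FLOAT returns false when the string is not a valid float.
    $value = filter_var(trim($raw), FILTER_VALIDATE_FLOAT);
    return $value === false ? null : $value;
}

// Hypothetical helper: plain arithmetic on an already validated number.
function fahrenheitToCelsius(float $f): float
{
    return ($f - 32) * 5 / 9;
}

$f = parseTemperature($_POST['f'] ?? null);
if ($f !== null) {
    // $f is a float here, so no HTML can reach the output.
    echo $f . ' Fahrenheit is equal to ' . fahrenheitToCelsius($f) . ' Celsius';
}
```

Anything that is not a number is simply rejected, so stripslashes(), htmlentities() and strip_tags() are not needed for this form.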
There are endless poorly written, outdated PHP tutorials out there that basically suggest that sanitization is a magic process that automatically fixes your data to avoid any potential vulnerability. Many developers accept that as fact and apply the recommended functions without even checking the documentation to find out what they really do. As a result, they not only write vulnerable applications but also corrupt legitimate user data in the process.
My advice:
Read the docs for any function you use for the first time
Understand what problem you need to solve
Think whether the function does something to solve that problem
For instance:
strip_tags — Strip HTML and PHP tags from a string
Example #1 strip_tags() example
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
You have a temperature conversion tool. Does it make sense to remove HTML tags from Fahrenheit degrees?
But imagine you have a site for posting HTML snippets. Now you have HTML, so it makes sense to use HTML functions on it, doesn't it? But why would you want to remove HTML from an HTML snippet? You'd make your site useless! The problem you need to solve is to inject those snippets into the page and have them displayed as raw HTML source rather than rendered. To do so you need to, e.g., convert every < symbol into &lt;.
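A minimal sketch of that last point, assuming a made-up snippet in $snippet: htmlspecialchars() keeps every character of the snippet and only changes how the browser reads it.

```php
<?php
// Hypothetical user-submitted snippet that should be shown as source, not rendered.
$snippet = '<p>Hello & welcome</p>';

// Escape the characters that have a special meaning in HTML.
$escaped = htmlspecialchars($snippet, ENT_QUOTES, 'UTF-8');

echo '<pre>' . $escaped . '</pre>';
// $escaped is now: &lt;p&gt;Hello &amp; welcome&lt;/p&gt;
```

Nothing is removed from the user's data; decoding the entities gives back the original snippet.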
Related
I'm trying to output the name of a project i.e. "David's Project" in a form, if a user does not correctly input all data in the form, to save the user having to input the name again.
If I var_dump $name I see David's project. But if I echo $name I see David&#39;s Project. I realise that ' (single quote) becomes &#39;, but I have tried using ENT_NOQUOTES and ENT_COMPAT to avoid encoding the single quote and neither works.
$name = trim(filter_input(INPUT_POST, 'name0', FILTER_SANITIZE_STRING));
<form method="post" class="form" />
Title: <input type="text" name="name0" value="<?php echo
htmlspecialchars($name, ENT_NOQUOTES); ?>">
Am I doing something wrong, or should ENT_NOQUOTES work? I tried using str_replace to replace ' with \' but this didn't work either.
The only way round this I have found is to use this:
htmlspecialchars_decode(htmlspecialchars($name, ENT_NOQUOTES));
Is that acceptable?
Sorry I realise this is probably a really stupid question but I just can't get my head around it.
Thanks for any replies.
You can accept a simple answer if it solves your problem BUT you should really understand that what you have delved into is a much larger issue you or someone has created for you.
Databases should not contain HTML encoded characters unless they are specifically meant for storing HTML. I highly doubt this is the case as it very rarely is.
Someone is inserting HTML into your database (html encoding data on insert). This means if you ever want to use a mobile app that is not HTML based, or a command line, or anything at all that might use the data and isn't HTML based, you are going to run into a weird problem where the HTML encoded characters have to be removed on output. This is typically kind of the backwards way to do it and can often cause issues.
You rarely need to "sanitize" your inputs. If anything, you should reject input that is not allowed OR simply escape it in the proper way while inserting it into the database. Sanitizing is only a thing in very special circumstances, which you don't appear to have right now. You're simply inputting and outputting text.
You should pretty much never change user input.
My suggestion, if possible, is to fix your INSERT code first so it isn't html encoding data. This html encoding should happen when you output the data TO AN HTML FORMAT. You would use htmlspecialchars() to do this.
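A hedged sketch of "encode on output, not on input" for the form in the question. The function name renderNameField() is hypothetical; the field name name0 comes from the question. The value is kept exactly as typed and escaped once, at the point where it is written into HTML.

```php
<?php
// Hypothetical helper: builds the input element with the raw value escaped once.
function renderNameField(string $rawName): string
{
    // ENT_QUOTES also encodes ' and ", which matters inside an attribute value.
    $safe = htmlspecialchars($rawName, ENT_QUOTES, 'UTF-8');
    return '<input type="text" name="name0" value="' . $safe . '">';
}

// Raw value as the user typed it (no FILTER_SANITIZE_STRING, no encoding on input).
$name = "David's Project";

echo renderNameField($name);
```

The browser shows David's Project in the field, and the stored or re-submitted value never contains entities.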
Do you have to escape or sanitise output that will be in a <textarea>?
It seems that if I sanitise it using htmlentities(), the actual &...; character replacements show up.
Well, you have to:
<?php
$content = "</textarea><script>alert('hi!')</script>";
?>
<textarea>
<?php echo $content; ?>
</textarea>
Yes, you need to sanitize. Use htmlspecialchars($str, ENT_QUOTES) instead.
If that output was initially provided by the user or any untrusted source (i.e. not directly from your code) then it needs to be sanitized to prevent against XSS attacks.
You need to consider whether the output is editable by the user or not. If it is not and it is trusted output (for example, predefined text that YOU wrote), you obviously don't need to. Otherwise, yes. The HTML character replacement is quite normal, and you don't have to worry about it: when the page is rendered in the user's browser, the original characters are displayed.
Notice that the > and < characters could be used, if not sanitized, to inject other HTML code, in particular the <script> tag, which can run JavaScript.
Always escape all occurrences of < and > (with &lt; and &gt;) within the textarea's content. Otherwise one could provide the following content (example) to "escape" the textarea and inject HTML code:
</textarea><script src="http://malicious.code.is/us.js"></script>
This could result in the following code:
<textarea id="text"></textarea><script src="http://malicious.code.is/us.js"></script></textarea>
The second </textarea> at the end would be ignored and the script tag before would be executed.
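As a small sketch of the escaping described above (the function name renderTextarea() is hypothetical), the content is passed through htmlspecialchars() before it is placed between the textarea tags:

```php
<?php
// Hypothetical helper: wraps untrusted content in a textarea, escaped for HTML.
function renderTextarea(string $content): string
{
    $safe = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
    return '<textarea id="text">' . $safe . '</textarea>';
}

// The attack string from the example above.
$content = "</textarea><script>alert('hi!')</script>";

$html = renderTextarea($content);
echo $html;
```

The browser decodes the entities for display, so the user still sees and edits the original characters, but the markup can no longer close the textarea early.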
Just using htmlspecialchars() is NOT enough. It still leaves you vulnerable to certain multibyte character attack vectors (even when using htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')).
Perhaps look at a library like HTMLPurifier to give you a more complete solution.
Here is a pretty good summary of XSS protection in PHP.
http://www.bytetouch.com/blog/programming/protecting-php-scripts-from-cross-site-scripting-xss-attacks/
I am about to make a char counting function which counts input from a tinyMce textarea.
Server-side validation with code like this:
$string = "is<isvery interesting <thatthis willbe stripped";
$stripped = strip_tags($string);
$count = strlen($stripped); // This will return 2
You might notice that $string contains no tag at all, yet strip_tags() strips everything from the first less-than sign onward.
Is this a bug or a feature?
This has been documented:
Because strip_tags() does not actually validate the HTML, partial or
broken tags can result in the removal of more text/data than expected.
http://php.net/manual/en/function.strip-tags.php
strip_tags is actually quite dumb. It strips everything that even remotely looks like an HTML tag, that is, anything starting with < followed by an alphanumeric character, up to the closing > or as far as it can get.
In this context the observed behavior is a bug. However, strip_tags is not a tool for correcting errors in input HTML. Its purpose is to strip things away so that the remainder is safe to embed in web pages. When in doubt, it strips more, which is a good thing.
I have a system set up for users to submit their articles into my database. Since it will just be HTML, I don't want to expect them to know to type <br /> every time there's a newline, so I am using the PHP function nl2br() on the input.
I'm also providing an article modification tool, which will bring their articles back into the form (this is a different page, however) and allow them to edit it. In doing this, the <br /> elements were appearing also (with newlines still). To remedy the elements appearing (which I had expected, anyway) I added preg_replace('/<br(\s+)?\/?>/i', "\n", mysql_result($result,$i,"content")) which I had found in another question on this site. It does the job of removing the <br /> elements, but since it is replacing them with newlines, and the newlines would have remained originally anyway, every time the post is edited, more and more newlines will be added, spacing out the paragraphs more and more each time. This is something a user won't understand.
As an example, say I enter the following into the article submission form:
Hello, this is my article.
I am demonstrating a new line here.
This will convert to:
Hello, this is my article.<br />
I am demonstrating a new line here.
Notice that, even though the newline character was converted, there is still a newline in the text. In the editing form, the <br /> will be converted back to newline and look like this:
Hello, this is my article.

I am demonstrating a new line here.
Because the <br /> was converted to a newline, but there was already a newline. So I guess what I'm expecting is for it to originally be converted to something like this:
Hello, this is my article.<br />I am demonstrating a new line here.
I'm wondering ... is there a way to stop the nl2br() function from maintaining the original newlines? Might it have to do with the Windows \r\n character?
The function you're using, nl2br, inserts <br /> tags before newlines but does not replace the newlines themselves. If you want to replace \n with <br /> you just need to use str_replace. Like so:
$string = str_replace("\n","<br />",$string);
There is absolutely no need for regex in this situation.
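Since the question also asks about the Windows \r\n sequence, here is a slightly extended sketch of the same str_replace idea. The function name newlinesToBreaks() is hypothetical. Passing an array of search strings handles \r\n first, so a Windows line ending yields a single <br /> rather than a tag plus a stray \r.

```php
<?php
// Hypothetical helper: replaces every line ending with <br /> and keeps no newline.
function newlinesToBreaks(string $text): string
{
    // Order matters: "\r\n" must be replaced before the lone "\r" and "\n".
    return str_replace(["\r\n", "\r", "\n"], "<br />", $text);
}

echo newlinesToBreaks("Hello, this is my article.\r\nI am demonstrating a new line here.");
```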
It seems like the problem you described is not a bug, but a feature of nl2br. You could just write your own function for it, like:
<?php
function NlToBr($inString)
{
return preg_replace("%\n%", "<br>", $inString);
}
?>
I found this one in the comments of the documentation of the nl2br-function in the PHP Manual: http://php.net/manual/de/function.nl2br.php. If the one I posted did not work for you, there should be plenty more where it came from.
(Or just use the function from the other Answer that was just posted, I guess that should work, too)
This should fix it:
preg_replace('/<br(\s+)?\/?>(?!\s*\n)/i', "\n", mysql_result($result,$i,"content"))
You cannot simply remove the breaks, because they might be on the same line. This regex will replace all breaks with newline but not those that are followed by the newline.
It will leave the <br>\n in the text. Additional regex will get rid of them:
preg_replace('/<br(\s+)?\/?>/i', "", $res)
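The two regex steps above can be combined into one function, sketched here under the assumption that the stored content was produced by nl2br(). The name breaksToNewlines() is hypothetical.

```php
<?php
// Hypothetical helper: turns stored <br /> markup back into plain newlines for editing.
function breaksToNewlines(string $html): string
{
    // Step 1: a break NOT followed by a newline becomes a newline.
    $text = preg_replace('/<br(\s+)?\/?>(?!\s*\n)/i', "\n", $html);
    // Step 2: any remaining break already has its newline, so just drop the tag.
    return preg_replace('/<br(\s+)?\/?>/i', '', $text);
}

$original = "Hello, this is my article.\nI am demonstrating a new line here.";
$stored   = nl2br($original);            // contains "<br />" followed by "\n"
$editable = breaksToNewlines($stored);   // same text as $original again
```

Because the round trip returns the original text, repeated edits no longer add extra blank lines.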
I have one problem regarding the data insertion in PHP.
In my site there is a message system.
So when my inbox loads it gives one JavaScript alert.
I searched through my site and finally found that someone had sent me a message with the text below.
<script>
alert(5)
</script>
So how can I restrict the script code being inserted in my database?
I am running on PHP.
There is no problem with JavaScript code being stored in the database. The actual problem is with non-HTML content being taken from the database and displayed to the user as if it were HTML. The correct approach would be to make sure your rendering code treats text as text, not as HTML.
In PHP, this would be done by calling htmlspecialchars on the inbox contents when displaying the inbox (possibly along with nl2br and maybe turning links to <a> tags).
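A minimal sketch of that rendering step, assuming the message body is stored unmodified. The function name renderMessage() is hypothetical. The text is escaped first and only then given line breaks, so the <br /> tags added by nl2br() are the only real markup in the result.

```php
<?php
// Hypothetical helper: renders a stored message as text inside an HTML page.
function renderMessage(string $rawBody): string
{
    // Escape first, then add line breaks; the reverse order would escape the <br /> tags.
    return nl2br(htmlspecialchars($rawBody, ENT_QUOTES, 'UTF-8'));
}

// The message from the question, exactly as stored in the database.
$body = "<script>\nalert(5)\n</script>";

echo renderMessage($body);
```

The inbox then shows the script source as plain text instead of running it.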
Avoid using strip_tags for text content: as a user, I might want to type a message like:
... and to create a link, use <a href="...">your-text-here</a> ...
strip_tags would eliminate the tag, while htmlspecialchars would make the text appear as it was typed.
You should not restrict it to be inserted into the database (if StackOverflow would restrict it, we would not be able to post code examples here!)
Instead, control how you display it. For instance, add htmlentities() or htmlspecialchars() to your echo call.
This is called XSS. There are numerous threads about it on SO.
How to prevent XSS with HTML/PHP?
What are the best practices for avoid xss attacks in a PHP site?
XSS Attacks Prevention
Is preventing XSS and SQL Injection as easy as does this…?
You should use strip_tags. If you still want to allow some HTML, then add a whitelist in the second parameter.
I should add a really big caveat here. If you leave any tags in a strip_tags whitelist, you can still be susceptible to JavaScript injection. Assume you're allowing the seemingly innocuous tags <strong> and <em>:
strip_tags will still allow all attributes, including event handlers like <strong onmouseover="window.location.href='http://mydodgysite.com'">this</strong>.
You have a couple of serious options:
strip_tags with no whitelist. Safe, but doesn't allow for any formatting, and may cause problems with strings like this: "x<y, but y>4" --> "x4"
htmlentities. Use this when displaying the data on the screen (not on the data before you put it in the database). It's safe, but doesn't allow for formatting.
A different markup system than HTML, for example: Markdown, Wiki markup, BB Code. Requires rendering to convert back to HTML, but it's mostly safe and can be quite flexible.
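To illustrate the caveat about whitelists, here is a short sketch with a made-up input string. It compares strip_tags() with an allowed-tags list against htmlspecialchars() on the same input.

```php
<?php
// Made-up input: an allowed tag carrying an event handler attribute.
$input = '<strong onmouseover="alert(1)">this</strong>';

// The <strong> tag is allowed, so strip_tags keeps it together with its attributes.
$whitelisted = strip_tags($input, '<strong><em>');

// htmlspecialchars leaves no live markup at all, at the cost of losing the formatting.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $whitelisted, "\n", $escaped, "\n";
```

The first result still carries the onmouseover handler as live markup, which is why a whitelist alone is not a safe way to allow formatting.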
User input should be escaped before outputting it.
Whenever you're displaying something a user submitted, run it through htmlspecialchars() first. This'll turn HTML code into safe output.
Take a look at the htmlspecialchars() function. It converts < > ' " and & to their HTML entity equivalents, meaning <script> will become &lt;script&gt;
You can use strip_tags(). The second argument of this function will allow you to list an explicit list of which tags are allowable:
// Allow <p> and <a>, <script> will be stripped
echo strip_tags($text, '<p><a>');
You may also consider htmlspecialchars(), which converts characters like < into &lt;, causing the browser to interpret them as text rather than code:
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
If I understand you right, you're just looking for two simple commands:
// str_replace takes (search, replace, subject)
$message = str_replace("<", "&lt;", $message);
$message = str_replace(">", "&gt;", $message);