Show content from another site and security - php

I need to load a user-supplied URL and display a div with my own content after the content of that website.
Implementing this would be trivial:
$c = file_get_contents($url);
echo $c . $myDivCode;
However, wouldn't this open my server to all kinds of security issues, such as XSS?
If so, what would be the best way to handle this, given that I would like to display the content of the user-supplied URL as faithfully as possible (i.e. run all the safe scripts)?

The best way would probably be to display the site in an iframe, like this:
echo '<iframe src="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '"></iframe>';
This way the user's browser loads the page directly from the URL, without your server proxying it.
However, if you output the remote content from your own server instead, your site will always be vulnerable to XSS unless you remove scripts and HTML completely.
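To make the iframe suggestion a bit more concrete, here is a minimal sketch. The helper name render_sandboxed_iframe() and the choice of sandbox="allow-scripts" are my own assumptions, not something from the question; the idea is simply to accept only http(s) URLs and to escape the value before it goes into the src attribute.

```php
<?php
// Hypothetical helper: returns iframe markup for a user-supplied URL,
// or null when the URL is rejected.
function render_sandboxed_iframe(string $url): ?string
{
    // Accept only syntactically valid absolute URLs.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    // Allow only http and https; this rejects javascript:, data: and similar schemes.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    // Escape the URL so it cannot break out of the src attribute.
    $safeUrl = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    // Assumption: sandbox="allow-scripts" lets the framed page run its scripts
    // while treating it as a separate, isolated origin.
    return '<iframe src="' . $safeUrl . '" sandbox="allow-scripts"></iframe>';
}

echo render_sandboxed_iframe('https://example.com/?a=1&b=2');
```

Keep in mind that some sites refuse to be framed at all (for example via an X-Frame-Options header), so this will not display every URL.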

Of course you are open to XSS exploits.
To prevent XSS attacks, you have to validate and properly escape all data, and not allow HTML or JavaScript code from that URL to be inserted into your page.
Use htmlspecialchars() to convert HTML special characters into HTML entities, so that characters like < and > that mark the beginning and end of a tag are rendered as text. You can also use strip_tags() to allow only some tags, but be aware that the function does not strip out harmful attributes such as onclick or onload.
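A small sketch of the difference described above, using a made-up sample string: strip_tags() with <a> allowed keeps the onclick attribute, while htmlspecialchars() turns the whole thing into inert text.

```php
<?php
// Sample input (made up for illustration).
$html = '<a href="#" onclick="alert(1)">click</a><script>alert(2)</script>';

// strip_tags() with <a> allowed removes the script tags
// but leaves the onclick attribute on the link untouched.
$stripped = strip_tags($html, '<a>');

// htmlspecialchars() converts every tag into entities, so nothing is executed.
$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

echo $stripped, PHP_EOL;
echo $escaped, PHP_EOL;
```

So the first result is still dangerous if it is printed into a page, and the second is safe but shows the markup as text.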

Related

Strip php tags from user-created code

I have a webpage where users can edit, create, store, access and debug their HTML code online. For security purposes I am already using a sandboxed iframe to display the user content, but there is one more thing I've thought about: what if the user-created code contains a PHP tag (my server-side language is PHP) that, for instance, makes an SQL query and reads or deletes all of my databases, or does anything else that is harmful?
Is it possible? If yes - how can I defend myself against this type of attack? I do know that php has an option to strip all tags with certain tagnames, so I can strip all the php tags, but then again - can the users still not use other hacks, such as document.write() the php tag or something like that?
Thank you in advance
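The PHP tag (<?php ?>) isn't an HTML tag (like <html>, <strong> and so on), and the browser never executes PHP. So as long as you only output the user's code and never run it through the PHP interpreter (for example with include or eval), that would not really be possible.
On the other hand, they could insert JavaScript (XSS). You can escape the output, or filter the input.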
For filtering the input see: http://php.net/strip_tags
You can pass the tags you want to keep as a parameter like this:
strip_tags($input, '<br>');
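As a rough illustration of that call (the sample input is invented), strip_tags() keeps the allowed <br>, drops the other HTML and PHP tags, but leaves the text that was inside the script element:

```php
<?php
// Invented sample: allowed markup, a PHP tag and a script element.
$input = '<p>Line one<br>line two</p><?php phpinfo(); ?><script>alert(1)</script>';

// Keep only <br>; every other HTML tag and the PHP tag are removed.
$filtered = strip_tags($input, '<br>');

echo $filtered, PHP_EOL;
```

Note that only the tags are removed, so the script's source text remains in the output as plain text.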

New Way To Prevent XSS Attacks

I have an entertainment website, and I have thought of a new method to prevent XSS attacks. I have created the following word list:
alert(, javascript, <script>,<script,vbscript,<layer>,
<layer,scriptalert,HTTP-EQUIV,mocha:,<object>,<object,
AllowScriptAccess,text/javascript,<link>, <link,<?php, <?import,
I assumed that, because my site is about entertainment, a normal (non-malicious) user would not use such words in a comment. So I have decided to remove all of the above comma-separated words from the user-submitted string. I need your advice: do I still need to use a tool like HTML Purifier after doing this?
Note: I am not using htmlspecialchars() because it would also convert the tags generated by my rich text editor (CKEditor), so the user's formatting would be lost.
Using a black list is a bad idea as it is simple to circumvent. For example, you are checking for and presumably removing <script>. To circumvent this, a malicious user can enter:
<scri<script>pt>
Your code will strip out the middle <script>, leaving the outer pieces to join into an intact <script> that is saved to the page.
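Here is a minimal sketch of that bypass. The function naive_blacklist_filter() is a hypothetical stand-in for the kind of filter described in the question, assuming it removes the listed words with a single str_replace() pass:

```php
<?php
// Hypothetical stand-in for the blacklist filter from the question:
// it removes every literal occurrence of "<script>" in one pass.
function naive_blacklist_filter(string $input): string
{
    return str_replace('<script>', '', $input);
}

// The payload hides one "<script>" inside another.
$payload = '<scri<script>pt>alert(1)</script>';

// Removing the inner tag joins "<scri" and "pt>" into a working tag.
echo naive_blacklist_filter($payload), PHP_EOL;
// prints: <script>alert(1)</script>
```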
If you need to enter HTML and your users do not, then prevent them from entering HTML. You need a separate method, accessible only to you, for entering articles with HTML.
This approach misunderstands what the HTML-injection problem is, and is utterly ineffective.
There are many, many more ways to put scripting in HTML than the above list, and many ways to evade the filter by using escaped forms. You will never catch all potential "harmful" constructs with this kind of naive sequence blacklisting, and if you try you will inconvenience users with genuine comments. (eg banning use of words beginning with on...)
The correct way to prevent HTML-injection XSS is:
use htmlspecialchars() when outputting content that is supposed to be normal text (which is the vast majority of content);
if you need to allow user-supplied HTML markup, whitelist the harmless tags and attributes you wish to allow, and enforce that using HTMLPurifier or another similar library.
This is a standard and well-understood part of writing a web application, and is not difficult to implement.
Why not just make a function that reverts the changes htmlspecialchars() made for the specific tags you want to be available, such as <b><i><a> etc?
Hacks to circumvent your list aside, it's always better to use a whitelist than a blacklist.
In this case, you would already have a clear list of tags that you want to support, so just whitelist tags like <em>, <b>, etc, using some HTML purifier.
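You can try
htmlentities()
echo htmlentities("<b>test word</b>");
output: &lt;b&gt;test word&lt;/b&gt;
strip_tags()
echo strip_tags("<b>test word</b>");
output: test word
mysql_real_escape_string() (for SQL only; the mysql_* functions were removed in PHP 7, so prefer prepared statements)
or try a simple function
function clean_string($str) {
    // Remove tags first, then escape whatever special characters remain.
    // Magic quotes were removed in PHP 8 and addslashes() is no substitute
    // for prepared statements, so this function only handles HTML output.
    return htmlspecialchars(strip_tags($str), ENT_QUOTES, 'UTF-8');
}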

Is htmlspecialchars required if you are not outputting html?

I have a script that registers users based on their user input. This uses prepared statements plus whitelists to prevent sql injection. But I am struggling to understand the prevention of XSS.
From what I understand, you only need to prevent XSS if you are outputting HTML onto the page? What does this mean???
I'm guessing that with this register page it doesn't apply because I am not outputting HTML to the web page? Is that right?
If I was to prevent XSS, do I use htmlspecialchars?
Generally correct. If any returned values show up on the page, or if you insert information into the database for later retrieval and display (like user profile information), you will want to use htmlspecialchars when you display it.
For me, when I do my user registration, if they fail to enter a correct value in an input field, I redisplay the page with the values they entered. In this case, I have it encoded with htmlspecialchars.
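A small sketch of what that redisplay could look like. The helper text_input() is hypothetical (my own name, not from the answer); it just escapes the submitted value before putting it back into the value attribute:

```php
<?php
// Hypothetical helper: builds a text input pre-filled with a previously submitted value.
function text_input(string $name, string $submitted): string
{
    // Escape both values; ENT_QUOTES matters here because they end up inside quoted attributes.
    $safeName  = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $safeValue = htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');

    return '<input type="text" name="' . $safeName . '" value="' . $safeValue . '">';
}

// A hostile value that tries to close the attribute and inject a script.
echo text_input('username', '"><script>alert(1)</script>'), PHP_EOL;
// prints: <input type="text" name="username" value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```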
If at any point you plan on redisplaying information from the DB in a web page (as mentioned with profiles and the like), you should use htmlspecialchars.
Better safe than sorry, I always say: never trust user input.
Basically, XSS happens when you take the user's input unsanitized and display it in your web page.
For example: A user inputs
<script>alert('hello you are hacked');</script>
in a text box, and you show it on your web page after registration like this:
Hello, $username
This suddenly gets turned into
Hello, <script>alert('hello you are hacked');</script>
This is one form of XSS.
An efficient way to prevent XSS is:
echo htmlspecialchars($varname, ENT_QUOTES, 'UTF-8');
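Applied to the example above, a quick sketch (the variable names are only for illustration) looks like this:

```php
<?php
// The hostile "username" from the example above.
$username = "<script>alert('hello you are hacked');</script>";

// Escape at output time; the browser then shows the markup as text instead of running it.
$greeting = 'Hello, ' . htmlspecialchars($username, ENT_QUOTES, 'UTF-8');

echo $greeting, PHP_EOL;
// prints: Hello, &lt;script&gt;alert(&#039;hello you are hacked&#039;);&lt;/script&gt;
```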
From what I understand, you only need to prevent XSS if you are
outputting HTML onto the page? What does this mean???
XSS is an attack carried out by the server outputting HTML (in practice, Javascript) to the client when it did not mean to do so (and obviously when that HTML was specially crafted and supplied by a hostile user).
I'm guessing that with this register page it doesn't apply because I am
not outputting HTML to the web page? Is that right?
If you are not outputting anything that comes from user input you are safe.
If I was to prevent XSS, do I use htmlspecialchars?
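Yes, that is sufficient when the value is output as HTML text or inside a quoted attribute (with ENT_QUOTES). Values placed in other contexts, such as inside a <script> block or as the URL in an href, need different handling.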

What is a function that will allow output with HTML and avoid XSS attacks

I am looking for a way or function that will let me display data from my MySQL database. Users are allowed to post articles, and I use mysql_real_escape_string to avoid SQL injection before inserting their posts into the DB.
For testing purposes, I write my post in a text area with tags like <b> <a> <i> <li>.
Later I will use an editor like the one here on Stack Overflow to help users with their posts.
However, I am aware of XSS, and that just echoing straight from the DB may lead to XSS attacks. So, for my tests, I chose to output the content with htmlentities or htmlspecialchars. Neither of them shows the post's HTML formatting correctly.
Therefore I used strip_tags, but as far as I know and have read, it is not safe.
What function could I use that will let me output the data correctly, just like here, and prevent XSS?
If you want the HTML to display correctly, you have to print it as HTML, as you received it.
To avoid XSS, though, try to remove JavaScript (script tags) and don't allow images to be loaded from external resources.

Is it possible to execute some tags within a textarea as leaving the rest as plaintext?

I am developing a php-based web application in which there is a text area within which user can type whatever he/she wants and the content later gets displayed on another page after being stored in a database. The scenario is that the user can type in HTML tags. But as far as functionality constraints are concerned, I wish to allow the user to execute some tags such as <a>, <div> etc., leaving the rest of the tags to be displayed as plaintext.
I previously posted this question:
Prevent HTML data from being posted into form textboxes
But the answers only covered strip_tags() and htmlspecialchars(), which respectively strip the HTML completely and display the remaining plain text, or display everything as plain text, with no option to make an exception for any tag. Please help. Cheers.
You can look at HTML Purifier. This is a library specially designed for this.
It seems it can handle any form of xss attack. See also the comparison page.
As mentioned in the last post, strip_tags() is the answer. If you read the manual page for strip_tags(), you will see that you can tell it which tags to allow, which is exactly what you want.
Check the documentation for strip_tags and you'll see that the second (optional) argument is a string (or, since PHP 7.4, an array) of allowable tags.
Edit: Misunderstood it. Never mind D: More sleep is needed methinks. Looks like you should just run htmlspecialchars and convert the required tags back with a regex.
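A rough sketch of that escape-then-restore idea follows. The function name escape_with_allowed_tags() and the default tag list are my own assumptions. It only restores bare tags without attributes, so something like <a href="..."> would still need a real library such as HTML Purifier.

```php
<?php
// Hypothetical helper: escape everything, then turn a fixed set of
// attribute-less tags back into real HTML.
function escape_with_allowed_tags(string $input, array $allowed = ['b', 'i', 'em', 'strong']): string
{
    // Step 1: make the whole string inert.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Step 2: match only escaped opening or closing tags from the allow-list,
    // with nothing between the tag name and the closing bracket.
    $names   = implode('|', array_map('preg_quote', $allowed));
    $pattern = '#&lt;(/?)(' . $names . ')&gt;#i';

    return preg_replace($pattern, '<$1$2>', $escaped);
}

echo escape_with_allowed_tags('<b>hi</b><script>alert(1)</script>'), PHP_EOL;
// prints: <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

Because tags with attributes never match the pattern, something like <b onclick="..."> stays escaped rather than being restored.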
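Get PHP's translation table, add the tags you want to keep, then call strtr():
$table = get_html_translation_table(HTML_SPECIALCHARS);
// Map each allowed tag to itself; strtr() tries the longest keys first,
// so whole tags win over the single < and > entries.
$table['<allowed_tag>'] = '<allowed_tag>';
$table['<another_allowed_tag>'] = '<another_allowed_tag>';
echo strtr($str, $table);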
I haven't tested but it should work.
