We have a CMS editor where PHP is allowed to be used inside it; however, we need to restrict access to some commands such as file_get_contents, file(), and global.
Can someone help me with a boolean response regex for that? The text from the template is stored in a string.
I know, probably not an ideal method for this but it's all I can come up with for now :)
What you want to do is pretty much impossible. It is really hard to protect yourself against attacks if you allow people to execute code on your machine.
Here is the try I had on it: Sandbox. Source code.
What it does is basically maintain a large list of blacklisted functions for filesystem access, shell access, and so on. (I allowed some functions for reading the filesystem, like show_source, that should not be allowed if you want to use it for something real.)
It also tries to protect against more hidden attacks like $func = 'unlink'; $func(__FILE__); by turning it into $func = 'unlink'; ${'___xyz'.!$___xyz=Sandbox::checkVarFunction($func)}(__FILE__), and so on.
PS: Still, you probably don't want to allow people to run PHP code on your site. The risk is just far too big. Instead I would allow people to use a templating language inside the editor. A good candidate would be Twig, because it has a built-in sandbox which allows you to restrict usage to certain tags, functions, ...
It's going to be very hard to protect yourself perfectly.
As I see it, you have a few options:
Search for predefined strings which are not allowed in your content (like file_get_contents) and display an error message saying that the user cannot save because of this. This will however lead to "hacks" where you'll end up searching for all possible characters, like (), which can be valid in some cases.
Use token_get_all and try to parse the content as PHP. You can then loop through the whole source code, token by token, and see if you find a token you do not accept.
Write your own language or DSL for this. This language should only be capable of doing exactly what you want. Depending on your requirements, this can be the easiest and most maintainable way to go.
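The token_get_all option could be sketched roughly like this. The helper name findForbiddenTokens and the blacklist passed to it are made up for illustration; only token_get_all and token_name are real PHP functions. Treat it as a starting point, not a sandbox: it will not catch variable functions such as $func = 'unlink'; $func(__FILE__); and it also flags method calls that happen to share a blacklisted name.

```php
<?php
// Hypothetical helper (not from the original answers): lists forbidden
// constructs found in a template string that mixes HTML and PHP.
function findForbiddenTokens(string $template, array $forbiddenFunctions): array
{
    // PHP function names are case-insensitive, so compare in lowercase.
    $forbidden = array_map('strtolower', $forbiddenFunctions);
    $found = [];

    foreach (token_get_all($template) as $token) {
        if (!is_array($token)) {
            continue; // one-character tokens such as ";" or "(" are plain strings
        }
        [$id, $text] = $token; // each array token is [id, text, line]
        $name = token_name($id);

        if ($name === 'T_GLOBAL') {
            $found[] = 'global';
        } elseif ($name === 'T_STRING' || $name === 'T_NAME_FULLY_QUALIFIED') {
            // PHP 8 reports "\file" as one token; strip the leading backslash.
            $identifier = strtolower(ltrim($text, '\\'));
            if (in_array($identifier, $forbidden, true)) {
                $found[] = $identifier;
            }
        }
    }

    return array_values(array_unique($found));
}
```

You would call it with the template text and refuse to save when the returned array is not empty. Text outside <?php ... ?> is reported as inline HTML, so the word "file" in ordinary content is not flagged.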
You can use preg_match for this. Use:
if (preg_match('#\b(file_get_contents|file)\s*\(|\bglobal\b#i', $text))
Related
I wish to store certain pieces of code in database tables as templates but I am unsure as to whether they are going to create problems or not. I keep reading mixed messages from various different people in different posts and I am just not happy that I am clear on this subject.
I have already worked out that you cannot really echo/ print PHP into a webpage. Obviously you can echo strings of HTML but it becomes awkward when you try to do it with PHP code. The only way I have managed to do this is through eval which is apparently bad in most cases... so I am using another method to implement the templates (i.e. writing a php file to be used as an include file)
The main question I am asking is: is there really a problem with storing the PHP code strings (which include SQL statements) inside text type fields (mediumtext, longtext etc) in tables? Could those SQL statements ever do anything like execute actual actions or would they just remain as text strings?
Just to clarify, the reason I am storing strings of code is because they are templates to be used should the web administrator wish to allocate them to a specific area (div) of the pages.
Use SMARTY or Twig template engine. This will neatly solve your problem and you will not need to store anything in the database. It will also keep your PHP code completely separate from your HTML.
Another option is to use
I can see the need for code in the database, for instance if you have multiple sites and want to do source control between them without using any 3rd party software. I would store it in a database, then write the code to an actual physical page, then run the PHP from that page...
Do not do this. If your database is ever compromised and someone injects malicious PHP, it may be executed. You should store the templates as files and call them when needed.
And you actually can echo/print PHP. You would do it using eval.
The eval() language construct is very dangerous because it allows execution of arbitrary PHP code. Its use thus is discouraged. If you have carefully verified that there is no other option than to use this construct, pay special attention not to pass any user provided data into it without properly validating it beforehand.
I have seen that some web sites have a PHP script to redirect the user to another web page if they try to access JavaScript files directly. How is that done?
Unfortunately, as the previous answers have stated, you can't prevent one from seeing the contents of a JS file; if you could do that, how is the browser supposed to parse it?
The next best thing to do is to "Obfuscate" it using something like http://www.javascriptobfuscator.com/default.aspx
This will convert your human-readable code into a bunch of character codes and obscure function names. The Obfuscator I linked to generates a unique ID and bases its encryption on that ID, making it harder to decrypt.
However, this isn't fool-proof, and someone who really wants to get at your JS, for whatever reason, will do it. Anything you really don't want users to have access to should be done server-side. ;)
No, that's not possible.
There are plenty of ways to get at JS files. Nothing helps to protect them.
Javascript is meant to be client side. That means it always gets executed on the browser which is local and thus can not be hidden.
Tumblr and other blogging websites allow people to post embed codes for videos from YouTube and all the other video networks.
But how do they keep only the Flash object code and remove any other HTML or scripts? They even have automated code that informs you when something is not a valid video code.
Is this done using regular expressions? And is there a PHP class to do that?
Thanks
Generally speaking, using regex is not a good way to deal with HTML : HTML is not regular enough for regular expressions : there are too many variations permitted in the standards... And browsers even accept HTML that's not valid !
In PHP, as your question is tagged as php, a great solution that exists to filter user input is the HTMLPurifier tool.
A couple of interesting things are :
It allows you specify which specific tags are allowed
For each tag, you can define which specific attributes are allowed
Basically, the idea is to only keep what you specify (white-list), instead of trying to remove bad stuff using a black-list (which will never be quite complete).
And if you only specify a list of tags and attributes that can do no harm, only those will be kept -- and the risks of injections are lowered a lot.
Quoting HTMLPurifier's home page :
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
Yes, another great thing is that the code you get as output is valid.
Of course, this will only allow you to clean / filter / purify the HTML input ; it will not allow you to validate that the URL used by the user is both :
correct ; i.e. points to a real content
"OK" as defined by your website ; i.e. for example no nudity, ...
About the second point, there's not much one can do about it : the best solution will be to either :
Have a moderator accept / reject the contents before they're put online
Give the website's users a way to flag some content as inappropriate, so a moderator takes actions.
Basically, to check the content itself of the video, there is not much choice but have a human being say "ok" or "not ok".
About the first point, though, there's hope : some services that host content have APIs that you might want / be able to use.
For instance, Youtube provides an API -- see Developer's Guide: PHP.
In your case, the Retrieving a specific video entry section looks promising : if you send an HTTP request to an URL that looks like this :
http://gdata.youtube.com/feeds/api/videos/videoID
(Replacing "videoID" by the ID of the video, of course)
You'll get some ATOM feed if the video is valid ; and "Invalid id" if it's not
This might help you validate at least some URL to contents -- even if you'll have to develop some specific code for each possible content-hosting service that your users like...
Now, to extract the identifier of the video from your HTML string... If you're thinking about using regex, you are wrong ;-)
The best solution to extract a portion of data from an HTML string is generally to :
Load the HTML using a DOM parser ; DOMDocument::loadHTML is generally pretty helpful, here
Go through the document using DOM methods ; either, depending on your situation :
DOMDocument::getElementsByTagName, if you need to iterate over all elements that have a specific tag name ; might be great to iterate over all <object> or <embed> tags, for instance
Or, if you need something more complex, you could do an XPath query, using the DOMXPath class and its DOMXPath::query method.
And using DOM will also allow you to modify the HTML document using a standard API -- which might help, in case you want to add some message next to the video, or any other thing like that.
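As a rough sketch of the DOM approach described above: the helper name extractYoutubeIds, the list of accepted hosts and the assumption that a video ID is 11 characters from [A-Za-z0-9_-] are mine, not something stated by YouTube or by this answer. It only looks at the src attribute of <embed> tags.

```php
<?php
// Hypothetical helper: collects YouTube video IDs from <embed> tags
// found in a user-supplied HTML fragment.
function extractYoutubeIds(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // User HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $ids = [];
    foreach ($doc->getElementsByTagName('embed') as $node) {
        $src  = $node->getAttribute('src');
        $host = strtolower((string) parse_url($src, PHP_URL_HOST));
        $path = (string) parse_url($src, PHP_URL_PATH);

        // Assumption: only these hosts are accepted.
        if (!in_array($host, ['www.youtube.com', 'youtube.com'], true)) {
            continue;
        }
        // Assumption: the ID is the 11-character segment after /v/ or /embed/.
        if (preg_match('#^/(?:v|embed)/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])#', $path, $m)) {
            $ids[] = $m[1];
        }
    }

    return $ids;
}
```

Each ID you get back can then be checked against the hosting service's API, and anything that did not come from an accepted host is simply ignored.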
Take a look at htmlpurifier to start.
http://htmlpurifier.org/
I have implemented an algorithm for this for the company i work for. It works just fine. BUT, it was quite complicated to implement.
I would definitely check out HTMLPurifier to see if that works in an easy way for you. If you insist on doing it the old-school-way like I did, this is the basic steps:
1. First off: get friendly with stripos().
2. You have to write a recursive function to identify the start and stop tags for the widget, covering all combinations of <embed></embed> or <embed/> (self-closing) or <object></object> ... or <object><param>...<embed/></object>.
3. After this, you have to parse out all attributes and params.
4. Now, all <object> tags should have <param> tags as child elements. You have to parse all of these to get the data you need for finally generating a new embed or object tag. The params and attributes that hold the width, height and data source are especially important.
5. Now, you don't know if the attributes are enclosed in single or double quotes, so your code has to be lenient in this way. Also, you don't know if the code is valid or well formed. So it should be able to handle nested embed/object tags, embed tags that are not closed correctly, etc. As it is user-generated content, you can't really know and trust the input. You will see that there are lots of combinations.
6. If you manage to parse the embedded element with all its attributes (or the object element and its child params), the whitelisting of domains is easy...
My code ended up being about 800 lines, which is quite large, and it was filled with recursive methods for finding the correct start and end tags, etc. My algorithm also removed all the SEO text that is often included in the cut-and-paste embed code, like links back to the site holding the widget.
It's a good exercise, but if I were you, I wouldn't start down this road.
Recommendation: try to find something ready-made and open source!
This will never be safe. Browsers have those funny little functionalities that help people display content of their pages even if html is messy. There are endless opportunities to get something through :)
check here to see the tip of the iceberg
What you need to do is use a single input for just a link and additional inputs for width and height, and filter those. THEN generate the object tag yourself.
This might be safe.
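A minimal sketch of that idea, with several assumptions of mine: the function name buildYoutubeEmbed is invented, the size limits are arbitrary, the video ID is assumed to be 11 characters from [A-Za-z0-9_-], and the output is an <iframe> rather than the Flash <object> tag mentioned above. Nothing the user typed is copied into the output except the validated ID and two integers.

```php
<?php
// Hypothetical helper: builds embed markup from three separately
// validated inputs, or returns null when any of them is rejected.
function buildYoutubeEmbed(string $url, $width, $height): ?string
{
    // Assumption: only these hosts are accepted.
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    if (!in_array($host, ['www.youtube.com', 'youtube.com'], true)) {
        return null;
    }

    // Pull the "v" parameter out of the query string.
    parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
    $id = $query['v'] ?? '';
    if (!is_string($id) || !preg_match('/^[A-Za-z0-9_-]{11}$/D', $id)) {
        return null;
    }

    // Width and height must be plain integers within (arbitrary) limits.
    $limits = ['options' => ['min_range' => 100, 'max_range' => 1280]];
    $w = filter_var($width, FILTER_VALIDATE_INT, $limits);
    $h = filter_var($height, FILTER_VALIDATE_INT, $limits);
    if ($w === false || $h === false) {
        return null;
    }

    return sprintf(
        '<iframe width="%d" height="%d" src="https://www.youtube.com/embed/%s" frameborder="0" allowfullscreen></iframe>',
        $w,
        $h,
        $id
    );
}
```

A null return is your cue to show the "this is not a valid video" message.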
http://php.net/manual/en/function.strip-tags.php
and allow certain tags?
The most simple and elegant solution: Allowing HTML and Preventing XSS # shiflett.org.
Using all sorts of "HTML purifier" is more than pointless. Sorry but I don't get people who like to use these bloated libraries when a much simpler solution is in hand.
If you're looking to make your site "safe" from vulnerabilities, a white list approach is the (only) way to go. I would recommend safely escaping all user generated content, and white listing only markup you know is safe and works on your site. This means not only <B> tags, but also the flash embeddings.
For example, if you want to allow any youtube to be embedded, write a validation RegEx that looks for the embed code they generate. Refuse to accept any others (or simply display it as escaped markup). This is testable. Forget all this parsing nonsense.
If you also want to add vimeo videos, then look at the embed code they provide and accept that as well.
Ugh? I know this seems like a pain, but in reality it's much easier to write than some algorithm that tries to detect "bad" content in some sort of generic fashion.
After getting the simple version of the algorithm working, you could go back and make it nicer. You could "provisionally" accept content with URLs, scripts, etc. that don't pass your white list, and have an admin process to add approved regexes to your output escaping routine. This way legitimate users aren't left out in the cold, but you don't open yourself up to attacks of this nature.
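The white-list check from this answer might look something like the sketch below. The pattern is only my guess at one embed format; you would copy the real one from the code each video site actually hands out. The names ALLOWED_EMBED_PATTERNS and renderUserMarkup are made up.

```php
<?php
// Hypothetical white list: one fully anchored pattern per embed format you accept.
const ALLOWED_EMBED_PATTERNS = [
    'youtube' => '#^<iframe width="\d{2,4}" height="\d{2,4}" src="https://www\.youtube\.com/embed/[A-Za-z0-9_-]{11}"></iframe>$#D',
];

// Returns the markup untouched when it matches a white-listed shape exactly,
// otherwise returns it escaped so it is displayed as text.
function renderUserMarkup(string $input): string
{
    $candidate = trim($input);
    foreach (ALLOWED_EMBED_PATTERNS as $pattern) {
        if (preg_match($pattern, $candidate) === 1) {
            return $candidate;
        }
    }

    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}
```

Because each pattern is anchored at both ends, anything appended to a valid embed code makes the whole string fall through to the escaped branch. Adding Vimeo is one more entry in the array.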
I have a whole range of jQuery code. How do I stop users from seeing the actual code, or how can I encrypt the .js file?
Please suggest opensource
Thanks
Jean
[edit]
I don't want users to know how I have coded or copy my code
[edit]
Once I use the base62 encode, can it be reverse engineered?
Check out packer by Dean Edwards. It has the ability to encode your JS. You have to let your JS be world readable, otherwise a browser couldn't download it.
You cannot prevent your users from being able to see the source code of a Javascript file : it's executed by the user's browser, which means it must be readable on the client side.
The "best" you can do is minify/obfuscate it ; see for instance the YUI Compressor, which exists to minify JS files (so they are smaller, and can be transferred faster), but also has some obfuscating functionalities.
It will make your Javascript code harder to read/understand -- but someone really motivated will still be able to read it ; well, it will take some time and a bit of work, but it'll still be possible.
You can use google closure compiler
http://code.google.com/closure/
The Closure Compiler compiles JavaScript into compact code and obfuscates it. The code can still be read, but it will be hard to trace and will take time.
Try to pack the code with the packer:
http://dean.edwards.name/packer/
This is not code encryption, but it obfuscates the code.
There is not really much point in encrypting your js file, everyone knows you can view the source code of anyone's website. I believe there are encryptors out there for javascript, but users will have to download the decryptor module to decrypt it. Also since the browser does need to interpret the code, it would probably not be that hard to circumvent.
You could obfuscate the code, but I would do this using a minification technique, and more for performance reasons than for hiding the code. Some obfuscators are more intrusive than others, but again, the code could be re-formatted, albeit the original variable names will not be recoverable.
You just can't encrypt JavaScript that runs on the client machine. Browsers need the unencrypted code in order to execute it!
This is the first thing I found, but it looks like it might do the job:
http://www.vincentcheung.ca/jsencryption/instructions.html
As others have mentioned though, the browser has to be able to decrypt the code, so the user would also be able to (although it may be some work to do so).
You should look at obfuscation too, which will make the code much harder to reverse engineer.
http://www.javascriptobfuscator.com/Default.aspx
I run a website (sorta like a social network) that I wrote myself. I allow the members to send comments to each other. I take the comment and call this line before saving it in the db:
$com = htmlentities($com);
When I want to display it; I call this piece of code..
$com = html_entity_decode($com);
This works out well most of the time. It allows the users to copy/paste youtube/imeem embed code and send each other videos and songs. It also allows them to upload images to photobucket and copy/paste the embed code to send picture comments.
The problem I have is that some people are basically putting in javascript code there as well that tends to do nasty stuff such as open up alert boxes, change location of webpage and things like that.. I am trying to find a good solution to solving this problem once and for all.. How do other sites allow this kind of functionality?
Thanks for your feedback
First: htmlentities or just htmlspecialchars should be used for escaping strings that you embed into HTML. You shouldn't use it for escaping string when you insert them into a SQL query - Use mysql_real_escape_string (For MySql) or better yet - use prepared statements, which have bound parameters. Make sure that magic_quotes are turned off or disabled otherwise, when you manually escape strings.
Second: You don't unescape strings when you pull them out again. Eg. there is no mysql_real_unescape_string. And you shouldn't use stripslashes either - If you find that you need to, then you probably have magic_quotes turned on - turn them off instead, and fix the data in the database before proceeding.
Third: What you're doing with html_entity_decode completely nullifies the intended use of htmlentities. Right now, you have absolutely no protection against a malicious user injecting code into your site (You're vulnerable to cross site scripting aka. XSS). Strings that you embed into a HTML context, should be escaped with htmlspecialchars (or htmlentities). If you absolutely have to embed HTML into your page, you have to run it through a cleaning-solution first. strip_tags does this - in theory - but in practise it's very inadequate. The best solution I currently know of, is HtmlPurifier. However, whatever you do, it is always a risk to let random user embed code into your site. If at all possible, try to design your application such that it isn't needed.
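To make the third point concrete, here is a minimal sketch of the display side. The function name renderStoredComment and the wrapping <div> are just for illustration. The comment is stored as the user typed it and escaped once, at output time, with no html_entity_decode anywhere.

```php
<?php
// Hypothetical display helper: escape the stored comment when writing it
// into the page, so any markup in it is shown as text instead of executed.
function renderStoredComment(string $storedComment): string
{
    return '<div class="comment">'
        . htmlspecialchars($storedComment, ENT_QUOTES, 'UTF-8')
        . '</div>';
}
```

With this in place a pasted <script> block shows up as visible text. Embed codes show up as text too, which is why you need a cleaning step or a white list if you want those to keep working.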
I so hope you are scrubbing the data before you send it to the database. It sounds like you are a prime target for a SQL injection attack. I know this is not your question, but it is something that you need to be aware of.
Yes, this is a problem. A lot of sites solve it by only allowing their own custom markup in user fields.
But if you really want to allow HTML, you'll need to scrub out all "script" tags, and that alone is not sufficient to prevent JS execution in user-entered code: event-handler attributes such as onclick and javascript: URLs have to go as well. I believe there are libraries available that do this.
This is how Stack Overflow does it, I think, over at RefactorMyCode.
You may want to consider Zend Filter, it offers a lot more than strip_tags and you do not have to include the entire Zend Framework to use it.