Here's something I've thought about for a while.
I am creating an application where my users will upload their own custom themes, which means that there's going to be a good opportunity for anyone with basic PHP/XSS/whatever skills to cause a lot of headaches.
I would like to run any uploaded files in a sort-of sandboxed, closed environment that only has access to the stuff (variables) that I want and nothing else.
Would this be good practice and how would it be done?
To allow arbitrary HTML/JavaScript safely, each user must have their own subdomain. If each user has their own subdomain, then a user's JavaScript will be restricted to their own sandbox because of the Same Origin Policy. If you only want to allow "safe HTML" then HTML Purifier is an option, and then you can use a single domain.
Allowing custom PHP is a bit more hazardous. "Shared hosting" providers rely upon suPHP which forces the php script to run as a specific user. This would require every user to have their own account on your system. This method of defense has been around for a while. It isn't perfect but it does the trick.
Another possible solution for custom themes is to use a templating engine, which can prevent templates from getting full access to PHP. Some popular options for this:
smarty: it doesn't have the best security track record, but if you keep it up to date you probably won't have a problem. It needs to be configured to disallow native PHP.
twig is a relatively new engine from the makers of Symfony Framework. This means it has a decent developer base and since it ships with Symfony, it's also been tested in the wild. Twig does not allow any PHP functions to be called, unless you specifically create a twig function/filter for them.
As you don't want to grant your users access to PHP, you should use a template engine that supports sandboxing. Twig is a prominent example here.
global scope will always be accessible.
but object-oriented concepts provide a lot. what you can't do is hide global stuff. what you can do is not make it visible in the first place.
but executing unreviewed 3rd party code is a tricky thing. i would recommend some sort of process isolation here if possible. that means you open a process using popen or something similar; in combination with suphp you can run it as a restricted linux user. that is entirely possible, and secure with the correct security measures in place.
a good approach to run the code within the same program is to use the templating pattern. it's a bit impractical for classes because whole files get loaded that can inject hazardous code. but you can create custom functions in php from code. the code does not get executed unless the function is called. you can also extend a class to a variable name, which is then user supplied code. however this is almost impossible to make safe.
when it comes to html code, it is way easier. html tidy is a good start. there are good solutions to allow only special tags.
javascript can be "secured" the way old facebook fbml applications did it, which includes server side rewrites, dynamic variable names etc. it's quite complicated.
in my opinion the best way to allow external customizations is to allow external stylesheets. just load them from an external origin and there is not really a security concern.
edit: of course you can parse any code and limit it to certain statements or deny certain statements, but this is very tricky and for php a very heavy constraint. it's probably better to switch to some higher level algorithmic language or go client side with javascript.
What you want to do is really risky. You should never allow your users to upload PHP files. That's why you don't find many PHP fiddlers around the net (though now there are some).
Also JS is dangerous in some indirect ways and pretty much nobody allows you to upload it (with the notable exception of Tumblr).
What you should do is adopt some kind of templating engine, and sanitize the templates the users upload, to remove scripts.
Since security is an issue, try to check security advisories like Secunia when choosing the templating engine.
Looking at building my first PHP application from the ground up. So pardon the noob question.
Is it proper to use auto_prepend_file to manage common variables like a default db connection or run session_start() or is there a more favorable way to define "application" variables and add session management to every page?
I'm asking because I came across a warning in netbeans when using a variable defined in an include. Searching google I saw a few posts stating that defining the variables in an include was bad practice; so what's the good practice in PHP?
Thanks
Most modern php application layouts do not have the required resources loaded in the code.
Instead, most often there is an autoloader that parses the requested resource (class name) and loads the correct file. Then most things are encapsulated in objects and classes.
The most common standard for this is PSR-0 (since superseded by PSR-4).
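To make the autoloader idea concrete, here is a minimal sketch. The helper name classToPath and the src directory are assumptions for illustration, and the mapping is simplified (real PSR-0 also turns underscores in class names into directory separators).

```php
<?php
// Hypothetical helper: map a namespaced class name to a file path.
// Simplified compared to real PSR-0/PSR-4 loaders.
function classToPath(string $class, string $baseDir): string
{
    $relative = str_replace('\\', DIRECTORY_SEPARATOR, ltrim($class, '\\'));
    return rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $relative . '.php';
}

// PHP calls this closure the first time an unknown class is referenced,
// so no manual require lines are needed in the application code.
spl_autoload_register(function (string $class): void {
    $file = classToPath($class, __DIR__ . '/src'); // "src" is an assumed layout
    if (is_file($file)) {
        require $file;
    }
});
```

In practice you would probably let Composer generate the autoloader rather than write one by hand.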
Configs are mostly stored in config files of various formats, like xml. Then there is often an object that is used to read those configs. From this object the configuration is then obtained to be used in certain places like database connections.
Also, code that needs to run is mostly not called directly; instead it attaches itself to certain points in the program.
Most php frameworks have a thing called "hooks" or "events". Basically it's nothing else but a simple list with event names and for each entry a list of functions that should be executed.
When some part of the code "fires" it uses a helper class that walks through the entries of the list and executes those as well.
You ask yourself, can't you have loops there? The simple answer is, yes.
The whole idea behind all this stuff is that you have to change no existing code anywhere if you want to bring new code into your application.
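A hook list like the one described can be sketched in a few lines. The class name Hooks and the event name page.render are made up for illustration; real frameworks add priorities, stopping propagation and so on.

```php
<?php
// Hypothetical minimal hook registry: event name => list of callbacks.
final class Hooks
{
    /** @var array<string, callable[]> */
    private array $listeners = [];

    // Attach a callback to an event name.
    public function on(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    // Call every listener for $event in registration order and collect the results.
    public function fire(string $event, ...$args): array
    {
        $results = [];
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $results[] = $listener(...$args);
        }
        return $results;
    }
}

$hooks = new Hooks();
$hooks->on('page.render', fn (string $title) => strtoupper($title));
$hooks->on('page.render', fn (string $title) => strlen($title));
$results = $hooks->fire('page.render', 'home');
```

New behaviour is added by registering another listener, without touching the code that fires the event.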
Is that good practice? I honestly don't know.
If a project exceeds a certain size and multiple people are programming on it, some standard may be necessary. And the approach of not modifying existing code has proven itself in practice.
Regarding auto_prepend_file, that is something that I would not do.
I may do it if I have no other way. For example, if I want to execute some code that protects my application from ddos or security injections. And I just do not want to mess with the application itself.
But if I design something from the start, I would not do it.
Why? Maybe I want to switch to a new webserver, or even execute my program in the command line. Then I have a problem if I defined my auto prepending in apache...
Or maybe I have some code where I do not want that at all? Just one file within my application where I just do not want it because I do not need it and it takes up resources or is a security risk?
I often write an application where I have for example the database username and password directly in the function that establishes the link.
Why? Why not? Because I do not want to have it available on a global scale. If it's in the function code, it's harder for other, possibly insecure code, to access it.
The most common approach is to have a config file and just require it somewhere in your application.
Also most modern applications do not have different php files that get loaded by the webserver, so there is no need for having the same code at multiple places.
Instead most modern applications have a single php file (mostly index.php) that serves as a so-called "bootstrap" file. The webserver rewrites every request, except those for static resources like images, to that file, and everything else, like deciding what content to show for the requested url, is handled in the application.
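A very small sketch of such a bootstrap file, assuming the webserver already rewrites requests to index.php. The route table and the dispatch function are hypothetical names, not part of any framework.

```php
<?php
// Hypothetical route table: URL path => handler returning the response body.
$routes = [
    '/'      => fn (): string => 'Home page',
    '/about' => fn (): string => 'About us',
];

// Resolve a request URI to a response body; unknown paths get a 404 message.
function dispatch(array $routes, string $requestUri): string
{
    // Strip the query string so "/about?x=1" matches "/about".
    $path = parse_url($requestUri, PHP_URL_PATH) ?: '/';
    $handler = $routes[$path] ?? null;
    return $handler ? $handler() : '404 Not Found';
}

// In index.php this would be: echo dispatch($routes, $_SERVER['REQUEST_URI']);
```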
I want to build a report builder into a web app of mine. The user collects data through other parts of the site, and then should be able to generate "reports" in which he/she can use said data in a document-style fashion. I want the user to be able to use basic math functionality, get/set their own variables, etc. I figure why reinvent the wheel? If I were to allow the user to write the report with something like Twig Template Engine and only enable certain extensions for them to use, does this seem reasonably secure? Twig templates already remove any php found in the markup, and there aren't too many powerful functions that you can use, other than basic string alterations, etc. Let me know your thoughts.
Twig has a fairly powerful sandbox extension that does exactly what you're describing. With a sufficiently stringent security policy, I can't see any problems here.
If Twig does what you need, why not? It's pretty well done, has a sandbox mode and can compile the templates. By contrast, when you offer PHP from within PHP it is hard to separate what users may touch from what they may not, so using a template engine sounds good to me.
As a web developer I am using PHP and I know that I have to worry about security, but when you use a framework there is a lot of code and design that you rely on but that you didn't code or design. For instance, I am using CakePHP.
So in this case, with frameworks, how much should I worry about security?
You should always continue respecting the basic principles of security :
don't trust the user
never trust the user
Which kinda means :
filter / validate everything that comes to your application
escape any output.
Using a framework doesn't change much about that, except that :
Output to the database often goes through some layer of the framework, which should deal with escaping
Frameworks often provide filtering / validation solutions ; use them ;-)
Frameworks often have some guidelines ; read them.
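In plain PHP, the two rules above (validate what comes in, escape what goes out) look roughly like this. The function names readAge and e are hypothetical helpers; a framework such as CakePHP ships its own equivalents, which you should prefer.

```php
<?php
// Validate input: accept only an integer in an expected range, else null.
function readAge($raw): ?int
{
    $age = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    return $age === false ? null : $age;
}

// Escape output: neutralise HTML metacharacters before echoing user data.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```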
As a sidenote : you said this :
there is a lot of code and design that you rely on but that you didn't code or design
Considering you are using a well-known framework that lots of people use, this code has probably been more tested/reviewed than any code you could write ;-)
That's an advantage of open-source, actually : you are not the only one responsible for the code, and lots of eyes have seen it -- which means lots of hands have enhanced it.
There are a lot of things to consider when dealing with security in an application. As Pascal said, it is a good idea to use a popular framework that has had a number of people looking at it.
I see a few areas of concern in regards to CakePHP.
The first issue is the end user. You should expect someone to do something foolish on every page you build. Some examples of this are:
A person clicking the submit button rapidly over and over. This may skew or mess up your system in a way if you're not careful. The solution for this is not based on the framework, but rather your coding methodology and testing.
SQL Injection and other bad things. Any field on a page can be potentially abused, therefore every form element must be sanitized. CakePHP has simple methods to take care of these security issues. http://book.cakephp.org/view/153/Data-Sanitization
Clean URL's are very important. You should never design a system that allows a user to access integer primary keys directly. For instance, if you have a site that has /show_user/2098 then someone can simply type in show_user/2097 to see someone else's account. CakePHP allows you to incorporate slugs or UUID's quite easily, to prevent this from happening.
Second, you must be concerned with attacks dealing with the code and permissions itself. For example:
Never use eval() or system() in your code from data that may come from the end user. There have been applications in the past written in perl that have been hijacked because of this issue.
The folder structure and permissions is important in regards to security. Users should never have access to get into a writable directory. With CakePHP the folder structure is designed so that you can point apache directly to app/webroot. This means the tmp directory is outside of the apache path, making the system a bit more secure.
Third, you should be concerned with the protection of your administration pages and who has permissions to access what.
CakePHP has an Auth and an Acl component that allows you to choose what users get access to which pages. This makes use of custom Cake Sessions which can be stored in a database, by using PHP or written to the file system.
I would suggest reading up on some of the important components and being sure you set them up properly, to ensure you have built an application without security flaws. Take a look at some of these elements as you research further: http://book.cakephp.org/view/170/Core-Components
I suggest you check out ESAPI: http://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API#tab=PHP
It is not a framework per se, but does contain a lot of tools for the problems Pascal mentions.
I just inherited a 70k line PHP codebase that I now need to add enhancements onto. I've seen worse, at least this codebase uses an MVC architecture and is object oriented. However, there is no templating system and many classes are deprecated - only being called once. I think my method might be the following:
Find all the files on the live server that have not been touched in 48 hours and make them candidates for deletion (luckily there is a live server).
Implement a template system (Smarty) and try to find duplicate code in the templates.
A lot of the methods have copied and pasted code ... I don't know how much I want to mess with it.
My questions are: Are there steps that I should take or you would take? What is your method for dealing with this? Are there tools to help find duplicate PHP code?
Find all the files on the live server that have not been touched in 48 hours and make them candidates for deletion (luckily there is a live server)
By "touched" I'm assuming you'll stat the file to see if it's been accessed by any part of the system. I'd go a month and a half on this rather than 48 hours. In older PHP code bases you'll often find there's a bunch of code lying around that gets called via a local cron job once a week or once a month, or a third party is calling it remotely as a pseudo-service on a regular basis. By waiting 6 weeks you'll be more likely to catch any and all files that are being called.
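If you want to script that check, something along these lines might do. It is only a sketch: staleFiles is a hypothetical helper, and access times are unreliable on filesystems mounted with noatime or relatime, so treat the output as candidates and nothing more.

```php
<?php
// List files under $dir whose last access time is older than $days days.
// Caveat: with noatime/relatime mounts the access time may not be updated
// on every read, so the result is a list of candidates, not a verdict.
function staleFiles(string $dir, int $days, ?int $now = null): array
{
    $cutoff = ($now ?? time()) - $days * 86400;
    $stale = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getATime() < $cutoff) {
            $stale[] = $file->getPathname();
        }
    }
    sort($stale);
    return $stale;
}

// Example: $candidates = staleFiles('/path/to/app', 42); // 42 days = 6 weeks
```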
Implement a template system (Smarty) and try to find duplicate code in the templates.
Why? Serious question, is there a reason to implement a template system? (non-PHP savvy designers, developers who get you into trouble by including too much logic in the Views, or you're the one creating templates, and you know you work much faster in smarty than in PHP). If not, avoid it and just use PHP.
Also, how realistic is it to implement a pure smarty template system? I'd give favorable odds that old PHP systems like this are going to have a ton of "business logic" mixed in with their views that can't be implemented in pure smarty, and if you allow mixed PHP/Smarty your developers will use PHP every time.
A lot of the methods have copied and pasted code ... I don't know how much I want to mess with it.
I don't know of any code analysis tools that will do this out of the box, but it should be possible to whip something up with the tokenizer functions.
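As a rough idea of what "whipping something up" could look like: the sketch below reduces a piece of PHP source to a fingerprint that ignores whitespace, comments and variable names, so pasted code with renamed variables hashes the same. The function name fingerprint is made up, and the normalisation is deliberately crude, so matches are only candidates for manual review.

```php
<?php
// Reduce PHP source to a fingerprint that ignores formatting and variable names.
function fingerprint(string $code): string
{
    $normalized = '';
    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            $normalized .= $token;       // single-character token such as ; or {
            continue;
        }
        [$id, $text] = $token;
        if (in_array($id, [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT, T_OPEN_TAG], true)) {
            continue;                    // formatting only, not logic
        }
        // Every variable becomes the same placeholder, so renames do not matter.
        $normalized .= ($id === T_VARIABLE ? '$v' : $text) . ' ';
    }
    return md5($normalized);
}
```

You would run this per function or per fixed-size window of lines and group the pieces that share a fingerprint.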
What You Should Really Do
I don't want to dissuade you or demoralize you, but why do you want to clean up this code? Right now it's doing what it's supposed to do. Stupidly, but it's doing it. Every re-factoring project is going to put current, undocumented, possibly business critical functionality at risk and at the end of that work you have an application that's doing the exact same thing. It's 70k lines of what sounds like shoddy code that only you care about fixing, no matter what other people are telling you their priorities are. If their priority was clean code, their code would already be clean. One person can't change a culture. Unless there's a straightforward business case for that code to be cleaned (open sourcing the project as a business strategy?), that legacy code isn't going anywhere.
Here's a different set of priorities to consider with legacy PHP applications
Is there a singleton database object or pair of objects that allows developers to easily set up separate connections for read (slave) and write (master)? A lot of legacy PHP applications will instantiate multiple connections to the same database in a single page call, which is a performance nightmare.
Is there a straight forward way for developers to avoid SQL injection? Give this to them for new code (parameterized SQL), and consider fixing legacy SQL to use this new method, but also consider security steps you can take on the network level.
Get a test framework of some kind wrapped around all the legacy code and treat it as a black-box. Use those tests to create a centralized API developers can use in place of the myriad function calls and copy/paste code they've been using.
Develop a centralized system for configuration values. Most legacy PHP code is some awful combination of defines and class constants, which means any config change means a code push, which means potential DOOM.
Develop a lint that's hooked into the source control system to enforce code sanity for all new code, not just for style, but to make sure that business logic stays out of the view, that the SQL is being constructed in a safe way, that those old copy/paste libraries aren't being used, etc.
Develop a sane, trackable build and/or push system and stop people from hacking on code live in production
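For the parameterized SQL item in the list above, a minimal PDO sketch. The in-memory SQLite database, the users table and the findUserByName function are assumptions for the demo (it needs the pdo_sqlite extension); the same prepare/execute pattern applies to MySQL and other drivers.

```php
<?php
// In-memory SQLite stands in for the real database (an assumption for the demo).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice'), ('bob')");

// The query text is fixed; user input travels separately as a bound parameter,
// so it can never change the structure of the statement.
function findUserByName(PDO $pdo, string $name): ?array
{
    $stmt = $pdo->prepare('SELECT id, name FROM users WHERE name = :name');
    $stmt->execute([':name' => $name]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}
```

A classic injection string such as ' OR '1'='1 is simply treated as a name that does not exist.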
I don't know of any specific tools, but I have worked on re-factoring some fairly large PHP projects.
I would recommend a templating system, either Smarty or a strict PHP system that is clearly explained to anybody working on the project.
Take discrete, manageable sections and re-factor on a regular basis (e.g., this week, I'm going to re-write this). Don't bite off more than you can chew and don't plan to do a full rewrite.
Also, I do regular code searches (I use Eclipse and search through the files in my project) on suspect functions and files. Some people are too scared to make big changes, but I would rather err on the bold side rather than accept messy and poorly organized code. Just be prepared to test, test, test!
You need to identify a solid reason for refactoring. Removing duplicate code is not really a very good one; it needs to be coupled with a real desired improvement, such as reducing memory footprint (useful if the webservers are struggling).
Once you have that in mind, now you can start refactoring. And make sure you have a version-control repository, too. Just don't check in broken code.
Don't be too hasty about single-use classes. A lot of small PHP frameworks work like that. Often they could be abstracted better, though. A lot of PHP code also doesn't understand data layer abstraction, with the result that there is SQL code littered through the business logic or even the display code. This problem is often coupled with no custom database handler, which is a problem if you suddenly have to teach it about replication, or caching. This is the same abstraction problem from the other direction.
One very practical step: once you start abstracting repeated code away, you'll find reasons to have multiple files open. If you're using a shell and a Unix editor, then screen will help you immensely.
I wrote a small PHP application that I'd like to distribute. I'm looking for best practices so that it can be installed on most webhosts with minimal hassle.
Briefly: It's a simple tool that lets people download files once they log in with a password.
So my questions are:
1) How should I handle configuration values? I'm not using a database, so a configuration file seems appropriate. I know that other php apps (e.g. Wordpress) use defines, but they are global and there is potential that the names will conflict. (Global variables also have the same problem, obviously.) I looked at the "ini" file mechanism built into PHP. It only allows comments at the top - so you can't annotate each setting easily - and you can't validate syntax with "php -f". Other options?
2) How to handle templating? The application needs to pump out a form. Possibly with an error message. (e.g. "Sorry, wrong password.") I have a class variable with the HTML form, but also allow an external template file to be used instead (specified in the config). I do some trivial search and replace - e.g. %SCRIPT% to the name of the script, %STATUS% to hold the error message. This feels a bit like reinventing the wheel, but including a templating system like Smarty is overkill. (Plus they may already have a templating system.) Other options?
3) i18n - There are only 3 message strings, and gettext doesn't seem to be universally installed. Is it such a bad idea just to make these three strings parameters in the config file?
4) How to best integrate with other frameworks? My app is a single class. So, I thought I could just include a php script that showed how the class was called. It would be a starting point for people who had to integrate it into another framework, but also be fine as-is for those not interested in customizing. Reasonable?
5) GET/POST parameters - Is it bad form for a class to be looking at $_GET and $_POST? Should all values be passed into my class during construction?
Thanks.
Configuration
You can use a php file like this:
<?php
return array(
'option1' => 'foobar',
'option2' => 123,
//and so on...
);
?>
And in the main class just use:
$config = (array) include 'path/to/config/file';
And if you plan to mostly distribute your class as a component in other applications, then simply put config array/object as a parameter in your class' constructor and leave the details to the user.
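A sketch of the constructor variant, reusing the option names from the config file above. The class name Downloader and the defaults are hypothetical.

```php
<?php
// Hypothetical class: defaults live in the class, overrides come from the caller.
final class Downloader
{
    private array $config;

    public function __construct(array $config = [])
    {
        $defaults = ['option1' => 'foobar', 'option2' => 123];
        // Caller-supplied values win; keys the class does not know are dropped.
        $this->config = array_intersect_key($config, $defaults) + $defaults;
    }

    // Return a config value, or null for an unknown key.
    public function get(string $key)
    {
        return $this->config[$key] ?? null;
    }
}

// Typical wiring: $app = new Downloader((array) include 'path/to/config/file');
$app = new Downloader(['option2' => 456, 'typo_option' => true]);
```

Nothing ends up in the global namespace this way, which avoids the name clashes you were worried about with defines.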
Templating
For such a simple application the method you described should be enough. Remember that one can always extend your class and override your output method with their own.
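For reference, the %SCRIPT% / %STATUS% replacement can be done with strtr() in a few lines. The function name renderTemplate is made up, and escaping every value with htmlspecialchars() is my assumption about what you want for values like the error message.

```php
<?php
// Replace %NAME% placeholders with HTML-escaped values.
// strtr() substitutes in a single pass, so a value that itself
// contains "%STATUS%" is not expanded a second time.
function renderTemplate(string $template, array $values): string
{
    $pairs = [];
    foreach ($values as $name => $value) {
        $pairs['%' . strtoupper($name) . '%'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    return strtr($template, $pairs);
}

$html = renderTemplate(
    '<form action="%SCRIPT%"><p>%STATUS%</p></form>',
    ['script' => 'download.php', 'status' => 'Sorry, wrong password.']
);
```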
I18N
As mentioned before, for 3 variables anything more than storing them as config is just overkill.
Integration
Comment each public method (or even better, also the protected and private ones) with an explanation of what it does and what parameters it needs. If you combine that with an example, it should be enough for most users.
GET vs POST
Your class uses passwords and you even think of sending them via GET? ;)
Think of browser history, referer headers etc - your users' passwords would be visible there.
Can config be local to class instances? Or could you create a little class that you could create an instance of to query for config values? Also, prefixing any global vars with your application's name should go some way to stop clashes.
If your templating is really simple, just write a short templater. It'll be easier than trying to fend off problems people get with any 3rd party templater. It might also simplify licensing issues. If you start worrying about what they already have, you'll never release anything. There are too many combinations.
For 3 strings? Yeah do those the same way you're handling config.
Good comments throughout with an intro explaining how you use the class.
I don't think so. If it bothers you, you could use default arguments to use given arguments first, then search for GET/POST values if none are provided (though that might be a security risk)
There are other things to take into consideration. Lots of people are on shared hosts and as a result, don't have control over their php.ini or their php version. You need to make sure you're only using features that are as commonplace as possible.
One example is that shorttags aren't enabled on some hosts (you have to use <?php ... ?> and <?php echo "..."?> instead of <? ... ?> or <?= "..." ?>) which can be a royal PITA.
In addition to Krzysztof's good advice:
Use <?php only
If you use functions that can be disabled, use function_exists() to ensure they're available. @missing_function() makes PHP die silently without any error logged.
You can't rely on things that can be disabled/changed via php.ini. Use ini_get() to adapt to different settings.
If magic_quotes are enabled, strip slashes only from your copy of the input – don't modify global arrays! Security of some lame code may rely on these slashes being present.
Expect that users will mindlessly copy&paste code from your documentation/website.
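A few of the points above in code form, as a sketch. The helper names cleanInput and uploadsEnabled are hypothetical; note that get_magic_quotes_gpc() no longer exists in PHP 8, which is exactly why the function_exists() guard is there.

```php
<?php
// Return a copy of the input with slashes removed if magic quotes are on.
// The global array passed in is never modified.
function cleanInput(array $input): array
{
    // Guard first: the function is deprecated in PHP 7.4 and gone in PHP 8.
    $magic = function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc();
    if (!$magic) {
        return $input;
    }
    return array_map(function ($value) {
        return is_array($value) ? cleanInput($value) : stripslashes($value);
    }, $input);
}

// Adapt to php.ini instead of assuming a setting.
function uploadsEnabled(): bool
{
    return filter_var(ini_get('file_uploads'), FILTER_VALIDATE_BOOLEAN);
}

$post = cleanInput($_POST);   // work on the copy, leave $_POST alone
```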