I am considering allowing users to enter their own REGEXP pattern, which will be used in a MySQL query. I am well aware of SQL injection risks, and I know there is no way of passing the regexp pattern value as an argument to a prepared statement.
Is there a watertight way to safely allow users to provide their own regexp pattern for use in the SQL query, or should I abandon this idea?
I am working with PHP, by the way, so I am basically asking if it is possible in PHP to make sure the pattern is valid, genuine and harmless.
Good question! In my experience, users are generally not 'educated' enough to enter 'real' regular expressions. The 'standard' user already considers the asterisk really complicated. Patterns like /^([^[0-9]+\s[a-z]*)/i are very rarely used by the 'average' user. You will probably be better advised to provide the user with simpler meta-characters as wildcards, unless you want to use the tool as an administrator tool for people who know what they are doing.
I haven't done this for MySQL, but I once did something similar for another DBMS.
What I did was:
Determine which version/dialect of regex the DB actually uses (there are several flavors; you could go for a restricted one like POSIX basic).
Try to compile the regex in the application first (in my case it was Java). If it throws an exception at this point, you know it's bad.
Execute it on the DB.
Additionally, you could:
First execute it with a "sandbox" user (one using a dummy schema, with no rights to do anything dangerous) and see what happens.
Restrict it some more (validate it against a whitelist of allowed characters), e.g. don't allow dangerous things like single quotes, semicolons, etc. I imagine PHP/MySQL allows various encodings, so you'll have to deal with that too.
That being said, I don't think you can completely remove the risk. Edit: especially if you want to avoid denial-of-service attacks too.
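A minimal sketch of the "compile it in the application first" and whitelist steps above, in PHP. The function name isAcceptableUserPattern, the 64-character limit and the exact character whitelist are assumptions made for illustration. Note that PHP's preg_* functions use PCRE, which is not the same dialect as MySQL's REGEXP, so a pattern that compiles here may still be rejected or behave differently in the database. This only screens out obviously bad input and does nothing about slow patterns.

```php
<?php
// Hypothetical helper: pre-screen a user-supplied pattern before it goes near the database.
function isAcceptableUserPattern(string $pattern, int $maxLength = 64): bool
{
    // Reject empty or overly long patterns (the limit is an arbitrary assumption).
    if ($pattern === '' || strlen($pattern) > $maxLength) {
        return false;
    }
    // Whitelist: letters, digits, space and a small set of regex metacharacters.
    // Quotes, semicolons and backslashes are deliberately not allowed.
    if (preg_match('/^[A-Za-z0-9 .*+?^$|()\[\]{}-]+\z/', $pattern) !== 1) {
        return false;
    }
    // Try to compile it with PCRE. '~' cannot occur in the pattern, so it is a safe delimiter.
    // preg_match() returns false (and raises a warning, suppressed here) on a compile error.
    return @preg_match('~' . $pattern . '~', '') !== false;
}

var_dump(isAcceptableUserPattern('^ab+c$'));    // well-formed, whitelisted characters only
var_dump(isAcceptableUserPattern("a' OR 1=1")); // contains a quote and '='
var_dump(isAcceptableUserPattern('(ab'));       // unbalanced parenthesis, fails to compile
```

Only the first call is accepted. A pattern that passes should still be run under a restricted database user, as suggested above.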
I know this topic has been covered to death but I would like some feedback from the community regarding security within our web application.
We have a standard LAMP stack web app which contains a large number of database queries which are executed using mysqli_query. These queries are not parameterized at the moment, but there is some naive escaping of the inputs using addslashes.
I have been tasked with making this system safer as we will be penetration tested very shortly. The powers above know that parameterized queries are the way to go to make the system safer however they don't want to invest the time and effort into re-writing all the queries in the application and also changing the framework we have to make them all work correctly.
So basically I'm asking what my options are here?
I've run mysqli_real_escape_string over the inputs. I've set up a filter which doesn't allow words like SELECT, WHERE, UNION to be passed in, which I guess makes it safer. I know mysqli_query only allows one query to be run at once, so there's some security there (against concatenating updates onto the end of selects).
Do I have any other options here?
Edit: I should probably add that if anyone is able to provide an example of an attack which is completely unavoidable without parameterized queries that would also be helpful. We have a query which looks like this:
SELECT
pl.created,
p.LoginName,
pl.username_entered,
pl.ip_address
FROM loginattempts pl
LEFT JOIN people p ON p.PersonnelId = pl.personnel_id
WHERE p.personnelid = $id
AND pl.created > $date1
AND pl.created < $date2
I've substituted a UNION query into $id (a UNION SELECT * FROM p WHERE 1 = 1 sort of thing), and I can prevent that by not allowing SELECT/UNION, but I'm sure there are countless other types of attack which I can't think of. Can anyone suggest a few more?
Update
I've convinced the powers that be above me that we need to rewrite the queries to parameterized statements. They estimate it will take a few months maybe but it has to be done. Win. I think?
Update2
Unfortunately I've not been able to convince the powers that be that we need to re-write all of our queries to parameterized ones.
The strategy we have come up with is to test every input as follows:
If the user-supplied input passes is_int, then cast it as such.
Same for real numbers.
Run mysqli_real_escape_string over the character data.
Change all the parameters in the queries to quoted strings i.e.
WHERE staffName = ' . $blah . '
In accordance with this answer we are 100% safe as we are not changing the character set at any time and we are using PHP5.5 with latin1 character set at all times.
Update 3
This question has been marked as a duplicate; however, in my mind the question is still not fully answered. As per update no. 2, we have found some strong opinion that the mysqli_real_escape_string function can prevent attacks and is apparently "100% safe". No good counter-argument has since been provided (i.e. a demonstration of an attack which can defeat it when used correctly).
check every single user input for datatype and, where applicable, with regular expressions (golden rule is: never EVER trust user input)
use prepared statements
seriously: prepared statements :)
it's a lot of work especially if your application is in bad shape (like it seems to be in your case) but it's the best way to have a decent security level
the other way (which i'm advising against) could be virtual patching using mod_security or a WAF to filter out injection attempts but first and foremost: try to write robust applications
(virtual patching might seem to be a lazy way to fix things but takes actually a lot of work and testing too and should really only be used on top of an already strong application code)
Do I have any other options here?
No. No external measure, like ones you tried to implement, has been proven to be of any help. Your site is still vulnerable.
I've run mysqli_real_escape_string over the inputs
Congratulations, you just reinvented the notorious magic_quotes feature, which proved to be useless and has now been removed from the language.
JFYI, mysqli_real_escape_string has nothing to do with SQL injections at all.
Also, by combining it with the existing addslashes() call, you are spoiling your data by doubling the number of slashes in it.
I've setup a filter which I guess makes it safer.
It is not. SQL injection is not about adding some words.
Also, this approach is called "blacklisting", and it has proven to be essentially unreliable. A blacklist is inherently incomplete, no matter how many "suggestions" you can get.
I know mysqli_query only allows one query to be run at once so there's some security there
There is not. SQL injection is not about adding another query.
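To illustrate why neither escaping nor a keyword filter helps with the unquoted $id in the question's query, here is a small sketch. It uses addslashes() (which the question already uses) because that needs no database connection; mysqli_real_escape_string likewise only touches quotes, backslashes and a few control characters, so the payload below gives it nothing to escape either. The function names and the payload are made up for illustration.

```php
<?php
// Hypothetical keyword blacklist like the one described in the question.
function passesKeywordFilter(string $input): bool
{
    return preg_match('/\b(SELECT|UNION|WHERE)\b/i', $input) !== 1;
}

// Builds the WHERE clause the way the question does: value interpolated without quotes.
function buildWhere(string $id): string
{
    return 'WHERE p.personnelid = ' . addslashes($id);
}

$payload = '0 OR 1=1';                   // no quotes, no blacklisted keywords
var_dump(passesKeywordFilter($payload)); // bool(true): the filter lets it through
echo buildWhere($payload), "\n";         // WHERE p.personnelid = 0 OR 1=1
```

With that clause in place the query no longer restricts rows to a single personnel id. Casting $id to an integer, or binding it as a parameter, removes this particular hole.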
Why did I close this question as a duplicate for "How can I prevent SQL-injection in PHP?"?
Because these questions are mutually exclusive, and cannot coexist on the same site.
If we agree that the only proper answer is using prepared statements, then a question that asks "How can I protect myself without using prepared statements?" makes very little sense.
At the same time, if the OP manages to force us to give the positive answer they desperately want, it will make the other question obsolete. Why use prepared statements if everything is all right without them?
Additionally, this particular question is too localized as well. It seeks not insight but an excuse, and an excuse for nobody but the OP personally: one that lets them use an approach that has proven to be insecure. That is up to them, but it renders this question essentially useless for the community.
I am trying to figure out which functions are best to use in different cases when inputting data, as well as outputting data.
When I allow a user to input data into MySQL, what is the best way to secure the data to prevent SQL injections and/or any other type of injections or hacks someone could attempt?
When I output the data as regular html from the database what is the best way to do this so scripts and such cannot be run?
At the moment I basically only use
mysql_real_escape_string();
before inputting the data to the database, this seems to work fine, but I would like to know if this is all I need to do, or if some other method is better.
And at the moment I use
stripslashes(nl2br(htmlentities()))
(most of the time anyways) for outputting data. I find these work fine for what I usually use them for, however I have run into a problem with htmlentities, I want to be able to have some html tags output respectively, for example:
<ul></ul><li></li><bold></bold>
etc, but I can't.
any help would be great, thanks.
I agree with mikikg that you need to understand SQL injection and XSS vulnerabilities before you can try to secure applications against these types of problems.
However, I disagree with his assertions to use regular expressions to validate user input as a SQL injection preventer. Yes, do validate user input insofar as you can. But don't rely on this to prevent injections, because hackers break these kinds of filters quite often. Also, don't be too strict with your filters -- plenty of websites won't let me log in because there's an apostrophe in my name, and let me tell you, it's a pain in the a** when this happens.
There are two kinds of security problems you mention in your question. The first is a SQL injection. This vulnerability is a "solved problem." That is, if you use parameterized queries, and never pass user supplied data in as anything but a parameter, the database is going to do the "right thing" for you, no matter what happens. For many databases, if you use parameterized queries, there's no chance of injection because the data isn't actually sent embedded in the SQL -- the data is passed unescaped in a length prefixed or similar blob along the wire. This is considerably more performant than database escape functions, and can be safer. (Note: if you use stored procedures that generate dynamic SQL on the database, they might also have injection problems!)
The second problem you mention is the cross site scripting problem. If you want to allow the user to supply HTML without entity escaping it first, this problem is an open research question. Suffice to say that if you allow the user to pass some kinds of HTML, it's entirely likely that your system will suffer an XSS problem at some point to a determined attacker. Now, the state of the art for this problem is to "filter" the data on the server, using libraries like HTMLPurifier. Attackers can and do break these filters on a regular basis; but as of yet nobody has found a better way of protecting the application from these kinds of things. You may be better off only allowing a specific whitelist of HTML tags, and entity encoding anything else.
This is one of the most problematic tasks today :)
You need to know how SQL injection and other attack methods work. There is a very detailed explanation of each method at https://www.owasp.org/index.php/Main_Page, and also a whole security framework for PHP.
Using the security libraries that come with a framework, such as CodeIgniter or Zend, is also a good choice.
Next, use regular expressions as much as you can and tie the pattern rules to the specific input format.
Use prepared statements or the active record class of your framework.
Always cast your input with (int)$_GET['myvar'] if you really need numeric values.
There are many other rules and methods to secure your application, but one golden rule is "never trust user input".
In your php configuration, magic_quotes_gpc should be off. So you won't need stripslashes.
For SQL, take a look at PDO's prepared statements.
And for your custom tags, as there are only three of them, you can do a preg_replace call after the call to htmlentities to convert those back before you insert them into the database.
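A rough sketch of the "escape everything, then convert a few tags back" idea. The function name and the tag list (ul, li, b) are assumptions for illustration, and htmlspecialchars is used in place of htmlentities. Only bare tags without attributes are restored, so anything like an onclick attribute stays escaped.

```php
<?php
// Escape everything, then restore a fixed whitelist of bare tags (no attributes allowed).
function escapeAllowingBasicTags(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // Only exact forms such as <ul>, </ul>, <li>, </li>, <b>, </b> are turned back into tags.
    return preg_replace('#&lt;(/?)(ul|li|b)&gt;#', '<$1$2>', $escaped);
}

echo escapeAllowingBasicTags('<b>hi</b><script>alert(1)</script>'), "\n";
// <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

This can equally be applied at output time rather than before storing, which keeps the raw text in the database.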
I need to let my users use asterisks (*) as wildcards in search.
Is it secure to convert the asterisks to % and use LIKE in the SQL query?
I know that user-supplied regexps can result in regular expressions that take forever to evaluate.
I don't think that is possible in this case, but are there any other security issues with doing this?
Wildcards in like expressions can cause changes in query execution that make the RDBMS use full-table scans instead of using indexes. This may slow down the query when there is a lot of data. I would recommend checking user's input for presence of at least a few non-wildcard characters in front of the first asterisk.
Also note that if you convert * to %, and use LIKE, you'd need to take care of _ as well, otherwise it would match any single character, not just the underscore.
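A small sketch of the two points above: escaping LIKE's own wildcards before converting *, and requiring a few literal characters before the first asterisk. The function names and the minimum of three characters are assumptions, and the escaping assumes the default LIKE escape character (backslash). The resulting string should still be passed to the query as a bound parameter.

```php
<?php
// Convert a user search term with '*' wildcards into a LIKE pattern.
// Assumes the default LIKE escape character, backslash.
function wildcardToLike(string $term): string
{
    // Escape the escape character first, then LIKE's own wildcards.
    $escaped = str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $term);
    return str_replace('*', '%', $escaped);
}

// True if at least $min non-wildcard characters precede the first '*'.
function hasLiteralPrefix(string $term, int $min = 3): bool
{
    return strcspn($term, '*') >= $min;
}

echo wildcardToLike('foo_bar*'), "\n"; // foo\_bar%
```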
If all you are doing is a simple replace like so
str_replace('*','%',$query)
then I don't foresee any security concerns, which I believe is what you are concerned about. You're not going to open up any SQL Injection possibilities or anything (at least not by doing this replacement, you may still have security concerns if you aren't escaping the input).
However, as some other users have pointed out, you will open up some performance issues. What will happen if I search for just *? Am I going to get your entire table back? The best way (if you don't want to use a database search engine) is to do some input validation. Most likely, if you want to prevent a full table scan on your query, you will want to restrict the user to doing only leading or trailing wildcards.
Good:
*foo
bar*
Not as good:
*foo*
ba*r
It will really depend on how many rows you have and how much you trust your users to provide actual input though.
The real question is whether it is safe to let the user decide part of the query.
Even in a simple case, passing user data to the database is unsafe.
Allowing user input to be passed directly to the database may be dangerous, but as long as you filter it and use your database's escaping strategy (i.e. mysql_real_escape_string(); prepared statements or any ORM will do that for you), it should be safe enough.
However, it may lead to performance problems. A simple EXPLAIN on your query should show you how many rows are scanned by your RDBMS engine.
The best way to implement such a feature is to use a search engine. (Isn't that what you're trying to do?)
There are many choices out there to help you implement this.
You may take a look at Sphinx, Solr, Xapian or even Lucene. They are all excellent choices.
They basically allow you to "index" your content to do full-text search while increasing performance.
They can also give you some powerful functionality such as OR, AND, LIKE, MINUS, etc. comparators/operators.
You may then be interested in this question: Choosing a stand-alone full-text search server: Sphinx or SOLR?
I'm using the ezSQL PHP class for MySQL queries. Since all of my queries pass through the $ezsql->query() function, I thought it would be a good idea to implement a method to block common SQL injection techniques from $ezsql->query().
For example, the most common one is probably 1=1. So this regular expression should be able to block all variations of that:
preg_match('/(?:"|\')?(\d)(?:"|\')?=(?:"|\')?\1(?:"|\')?/',$query);
This would block "1"="1", '1'=1, 1=1, etc.
Is this a good idea? If so, what are some other common patterns?
Edit: Forgot to mention, I do use validation and sanitation. This is just an extra precaution.
Is this a good idea?
No. For two reasons:
You're doing it wrong (yes you just failed with your bare approach of a SQL blacklist). And no, I won't tell you how you could improve that because of 2:
It's a blacklist approach. You should not use a blacklist approach inside the database class itself. That's no added precaution, it's just useless. A blacklist could be added additionally at the request level of the webserver, for example.
Instead use an existing blacklist, don't re-invent the wheel. If you want to learn how to develop your own SQL blacklist layer, help with the development of such existing components. This sort of security is not out-of-the-box so that you can just throw in a question like yours and you can actually expect concrete answers. Take care.
Is this a good idea?
Definitely NO.
Every time I see such a suggestion on an internet forum, I wonder: what if the software this forum runs on followed such a pattern? The poor inventor would be unable to post their solution, because the software would block the post!
extra precautions wouldn't hurt. Better safe than sorry.
As I pointed out above, it evidently does hurt. A database that cannot process some odd portions of data is nonsense.
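The point can be checked directly against the pattern from the question. In this sketch the wrapper function name and the two sample queries are made up for illustration; the regular expression itself is the question's, unchanged.

```php
<?php
// Wraps the blacklist pattern from the question (unchanged) in a hypothetical helper.
function looksLikeInjection(string $query): bool
{
    return preg_match('/(?:"|\')?(\d)(?:"|\')?=(?:"|\')?\1(?:"|\')?/', $query) === 1;
}

// A harmless statement storing a forum post that merely talks about injection.
$legit = "INSERT INTO posts (body) VALUES ('Why does 1=1 always match?')";
// An injected condition with spaces around '=', which the pattern does not anticipate.
$attack = "SELECT * FROM users WHERE id = 0 OR 1 = 1";

var_dump(looksLikeInjection($legit));  // bool(true): the legitimate query is blocked
var_dump(looksLikeInjection($attack)); // bool(false): the injected query passes
```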
Besides, I do believe that only knowledge can make you safe.
Not random moves out of some vague ideas but sane and reasonable actions.
As long as you escape and quote the data that goes into the query, and as long as you set the proper encoding for the escaping function, there is no reason to worry.
As long as you are using prepared statements to add your data to the query, there is no reason to worry.
As long as you are filtering SQL identifiers and keywords against a hardcoded whitelist, there is no reason to worry.
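For the identifier case, which prepared statements cannot cover, a hardcoded whitelist might look like the following sketch. The function name is hypothetical and the column list is only an example borrowed from the login-attempts query earlier on the page.

```php
<?php
// Hypothetical helper: only column names from a hard-coded list ever reach the SQL text.
function orderByClause(string $requested, array $allowed, string $default): string
{
    $column = in_array($requested, $allowed, true) ? $requested : $default;
    return 'ORDER BY `' . $column . '`';
}

$allowed = ['created', 'username_entered', 'ip_address'];
echo orderByClause('ip_address', $allowed, 'created'), "\n";            // ORDER BY `ip_address`
echo orderByClause('created; DROP TABLE x', $allowed, 'created'), "\n"; // ORDER BY `created`
```

The user's string is never concatenated into the query; it is only used to pick one of the known-good names.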
I am building a new web-app, LAMP environment... I am wondering if preg_match can be trusted for user's input validation (+ prepared stmt, of course) for all the text-based fields (aka not HTML fields; phone, name, surname, etc..).
For example, for a classic 'email field', if I check the input like:
$email_pattern = "/^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)" .
"|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}" .
"|[0-9]{1,3})(\]?)$/";
$email = $_POST['email'];
if(preg_match($email_pattern, $email)){
//go on, prepare stmt, execute, etc...
}else{
//email not valid! do nothing except warn the user
}
can I sleep easy against SQL/XSS injection?
I write the regexps to be as restrictive as they can be.
EDIT: as already said, I do use prepared statements already, and this behavior is just for text-based fields (like phone, emails, name, surname, etc..), so nothing that is allowed to contain HTML (for HTML fields, I use HTMLpurifier).
Actually, my mission is to let the input value pass only if it matches my regexp whitelist; else, return it back to the user.
p.s:: I am looking for something without mysql_real_escape_string; the project will probably switch to PostgreSQL in the near future, so I need a validation method that is cross-database ;)
Whether or not a regular expression suffices for filtering depends on the regular expression. If you're going to use the value in SQL statements, the regular expression must in some way disallow ' and ". If you want to use the value in HTML output and are afraid of XSS, you'll have to make sure your regex doesn't allow <, > and ".
Still, as has been repeatedly said, you do not want to rely on regular expressions, and please, for the love of $deity, don't! Use mysql_real_escape_string() or prepared statements for your SQL statements, and htmlspecialchars() for your values when printed in HTML context.
Pick the sanitising function according to its context. As a general rule of thumb, it knows better than you what is and what isn't dangerous.
Edit, to accommodate your edit:
Database
Prepared statements == mysql_real_escape_string() on every value to put in. Essentially exactly the same thing, short of having a performance boost in the prepared statements variant, and being unable to accidentally forget using the function on one of the values. Prepared statements are what's securing you against SQL injection, rather than the regex, though. Your regex could be anything and it would make no difference to the prepared statement.
You cannot and should not try to use regexes to accommodate a 'cross-database' architecture. Again, typically the system knows better what is and isn't dangerous for it than you do. Prepared statements are good and if those are compatible with the change, then you can sleep easy. Without regexes.
If they're not and you must, use an abstraction layer to your database, something like a custom $db->escape() which in your MySQL architecture maps to mysql_real_escape_string() and in your PostgreSQL architecture maps to a respective method for PostgreSQL (I don't know which that would be off-hand, sorry, I haven't worked with PostgreSQL).
HTML
HTML Purifier is a good way to sanitise your HTML output (providing you use it in whitelist mode, which is the setting it ships with), but you should only use it where you absolutely need to preserve HTML. Calling purify() is quite costly, because it parses the whole input and rewrites it thoroughly according to a powerful set of rules. So, if you don't need HTML to be preserved, you'll want to use htmlspecialchars(). But then, again, at this point, your regular expressions would have nothing to do with your escaping, and could be anything.
Security sidenote
Actually, my mission is to let pass the input value only if it match my regexp-white-list; else, return it back to the user.
This may not be true for your scenario, but just as general information: The philosophy of 'returning bad input back to the user' runs risk of opening you to reflected XSS attacks. The user is not always the attacker, so when returning things to the user, make sure you escape it all the same. Just something to keep in mind.
For SQL injection, you should always use proper escaping like mysql_real_escape_string. The best is to use prepared statements (or even an ORM) to prevent omissions.
You already did those.
The rest depends on your application's logic. You may filter HTML along with validation because you need correct information, but I don't do validation to protect from XSS, I only do business validation*.
General rule is "filter/validate input, escape output". So I escape what I display (or transmit to third-party) to prevent HTML tags, not what I record.
* Still, a person's name or email address shouldn't contain < >
Validation is to do with making input data conform to the expected values for your particular application.
Injections are to do with taking a raw text string and putting it into a different context without suitable Escaping.
They are two completely separate issues that need to be looked at separately, at different stages. Validation needs to be done when input is read (typically at the start of the script); escaping needs to be done at the instant you insert text into a context like an SQL string literal, HTML page, or any other context where some characters have out-of-band meanings.
You shouldn't conflate these two processes and you can't handle the two issues at the same time. The word ‘sanitization’ implies a mixture of both, and as such is immediately suspect in itself. Inputs should not be ‘sanitized’, they should be validated as appropriate for the application's specific needs. Later on, if they are dumped into an HTML page, they should be HTML-escaped on the way out.
It's a common mistake to run SQL- or HTML-escaping across all the user input at the start of the script. Even ‘security’-focused tutorials (written by fools) often advise doing this. The result is invariably a big mess — and sometimes still vulnerable too.
With the example of a phone number field, whilst ensuring that a string contains only numbers will certainly also guarantee that it could not be used for HTML-injection, that's a side-effect which you should not rely on. The input stage should only need to know about telephone numbers, and not which characters are special in HTML. The HTML template output stage should only know that it has a string (and thus should always call htmlspecialchars() on it), without having to have the knowledge that it contains only numbers.
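A minimal sketch of keeping the two stages separate. The function names and the phone format (an optional leading plus and 6 to 15 digits) are assumptions for illustration, not a recommended phone number rule.

```php
<?php
// Input stage: knows only what a phone number looks like in this (hypothetical) application.
function isValidPhone(string $input): bool
{
    return preg_match('/^\+?[0-9]{6,15}\z/', $input) === 1;
}

// Output stage: knows only that it is putting a string into HTML.
function renderCell(string $value): string
{
    return '<td>' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</td>';
}

var_dump(isValidPhone('+441632960961')); // digits with a leading plus
echo renderCell('<script>'), "\n";       // <td>&lt;script&gt;</td>
```

renderCell() escapes whatever it is given, whether or not the value happened to pass isValidPhone() earlier.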
Incidentally, that's a really bad e-mail validation regex. Regex isn't a great tool for e-mail validation anyway; to do it properly is absurdly difficult, but this one will reject a great many perfectly valid addresses, including any with + in the username, any in .museum or .travel or any of the IDNA domains. It's best to be liberal with e-mail addresses.
NO.
NOOOO.
NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO.
DO. NOT. USE. REGEX. FOR. THIS. EVER.
RegEx to Detect SQL Injection
Java - escape string to prevent SQL injection
You still want to escape the data before inserting it into a database. Although validating the user input is a smart thing to do, the best protection against SQL injection is prepared statements (which automatically escape data) or escaping the data using the database's native escaping functionality.
There is the php function mysql_real_escape_string(), which I believe you should use before submitting into a mysql database to be safe. (Also, it is easier to read.)
If you are good with regular expressions: yes.
But reading your email validation regexp, I'd have to answer no.
The best approach is to use the filter functions to get the user input relatively safely, and to keep your PHP up to date in case something broken is found in these functions.
Once you have your raw input, you have to process it depending on what you do with the data: remove \n and \r for email and HTTP headers, remove HTML tags to display it to users, use parameterized queries to use it with a database.
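As a sketch of the filter-function approach for the e-mail field, plus the \r and \n removal for header values. The wrapper function names are hypothetical; filter_var and FILTER_VALIDATE_EMAIL are PHP built-ins.

```php
<?php
// Returns the address if PHP's built-in filter accepts it, or null otherwise.
function validatedEmail(string $input): ?string
{
    $result = filter_var($input, FILTER_VALIDATE_EMAIL);
    return $result === false ? null : $result;
}

// Remove CR and LF so a value cannot start a new header line.
function stripHeaderBreaks(string $value): string
{
    return str_replace(["\r", "\n"], '', $value);
}

var_dump(validatedEmail('user+tag@example.com')); // accepted, including the '+'
var_dump(validatedEmail('not an email'));         // NULL
```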