why making a user registration must be regular? - php

I'm programming a user registration. However, I found a charactor limiting to 'username' from sample codes, such as just '.', ''', '-' are accepted, no space or other blank, etc.
Are those restrictions necessary?
I'm using MySQL+PHP. If I adopt the following several ways:
change the collation of the column to 'utf8_general_ci';
pull in the function 'mysql_escape_string' or 'mysql_real_escape_string' to PHP;
create a relation table about username <-> userID (the 'username' is what the client input, userID is a INT number.). As well as just use 'userID' in the database, but 'username' only display in HTMLs.
Do I really need a regular expression?
Thank you for your help.
PS: I'm a Chinese, so Chinese characters are required.

Since this is a web app, apart from using mysql_real_escape_string, I would also recommend stripping or disallowing anything that can construct HTML. Generally forbidding "<" and ">" is enough.
You really wouldn't want some user to enter their name as:
<script src=http://malicious/script.js></script>
The alternative solution is to use htmlspecialchars when outputting data to your page.

Those restrictions are not neccessary, however you must ensure, that username is a valid and unique string (with or without regex)
when it comes to stripping vulnerable characters from strings and mysql injection I would advice to use Mysqli extension for prepared statements, which takes care of escaping and you don't have to escape every string manually

it's up to you, but using mysql_real_escape_string should be fine in my opinion (and I'm pretty paranoid)

Related

PHP allowing special characters for name but still keeping security?

I was wondering what would be the best way to allow users with names that contain special characters to be able to register to website witout 'pre-converting' them into non-special character names before input, but still to keep my website secure (like to make it unable or to avoid registering with a name like "-.lčćo+'90'žž++'-.." or something like that) ?
Thanks a lot.
Assuming you're storing the user data in a database, simply make sure you're storing the data with a Unicode character encoding (as opposed to ASCII, which doesn't support special characters, or at least not as many as Unicode), secure against SQL injection (look up PDO and prepared statements - here's a good tutorial), and you should be good.

Zend\escap data before inserting into database

I'm adding some xss protection to the website I'm working on, the platform is zendFrameWork 2 and therefor I'm using Zend\escaper. from zend documentation i knew that:
Zend\Escaper is meant to be used only for escaping data that is to be
output, and as such should not be misused for filtering input data.
For such tasks, the Zend\Filter component, HTMLPurifier.
but what are the riskes if i escaped the data before inserting it into the database, am i so wrong to do that? please explane to me as im somehow new to this topic.
thanks
When encoding data before storing it you will have to decode it before you can do anything sensible with it before outputting it. That's why I'd not do it.
Let's say you have an international application and you want to store the escaped value of a form field which might contain any NON-ASCII characters those might become escaped into HTML-Entities. So what if you have to quantify the content of that field? Like counting the characters? You will always have to de-escape the content before counting it. and then you have to re-escape it again. Much work done but nothing gained.
The same applies to search-operations in your database. You will have to escape the search-phrase the same way then your input for the database to understand what you are looking for.
I'd use one character-set throughout the application and database (I prefer UTF-8, beware of the MySQL-Connection....) and only escape content on output. Thant way I can then do whatever I like with the data and are on the safe side on output. And escaping is done in my view-layer automaticaly so I don't even have to think about it every time I handle data as it works automaticaly. That way you can't forget it.
That does not prevent me from filtering and sanitizing the input. And it doesn't prevent me from escaping the database-content using the appropriate database-escaping mechanisms like mysqli_real_escape_string or similar or using prepared statements!
But that's just my opinion, others might think otherwise!
"Output" here refers to the web page. A form field ( HTML tag) is an INPUT (from the webpage), any text is an OUTPUT (to the webpage). You need to ensure any output (to the webpage) does not contain dangerous characters that could be used to forge XSS attack vectors.
This said, if you have DANGEROUS_INPUT_X given by the user and then
$NOT_DANGEROUS_ANYMORE = ZED.HtmlPurifier(DANGEROUS_INPUT_X)
DBSave($NOT_DANGEROUS_ANYMORE)
and somewhere else
$OUTPUT = DBLoad($NOT_DANGEROUS_ANYMORE)
echo $OUTPUT
you should be fine, as long as you do not apply any additional encoding/decoding to this output. It will be displayed in the way it is saved, that was safe.
I would suggest to look at output encoding more than validation: HtmlPurifier cleans the HTML, while you could accept any kind of bad characters if you ensure your output is encoded in the page.
Here https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet some general rules, here the PHP example
echo htmlspecialchars($DANGEROUS_INPUT_X_NOW_OUTPUT, ENT_QUOTES, "UTF-8");
Remember to set the Character Set and be consistent with the same one throughout your pages/scripts/binaries and in the database as well.

Regular Expression Replace for Contact Form

In php I have the following reqular expression:
$regexp = "/^([-a-z0-9.,!#'?_-\s])+$/i";
Im trying to validate my websites contact form (specifically the message field) to ensure no nasty code has been entered. The problem I am having is that certain normal punctuation and characters I need to allow, but I'm worried they could be used to insert malicious code.
For any character not obeying the expression above, I would like to replace it to make it safe. Two questions:
How do I do the replacement?
What should I replace the character with? For example I am not allowing parenthesis ( ). would it be best practice to replace like this "(" ")" or maybe \( \)?
EDIT
The data will be sent to an email address and saved to a database
Mmh why don't you just allow every character to be inserted in the contact form, converting them all with htmlentities as soon as they reach the php script after form submit? That way your users will be able to say what they want, and you won't have any problem with "malicious code" :)
And do not forget to use a proper database wrapper (PDO)
or at least escape when inserting into the database.
– knittl
EDIT: added Knittl's quote to stress it again :)
Use the filter extension. More specifically, use the filter_input() function with a sanitizing filter. For example:
$message = filter_input(INPUT_POST, 'message', FILTER_SANITIZE_STRING);
This will make sure that tags are stripped out of the message and that it is safer to handle.
However, it does not mean that you should treat it as 100% safe. You still need to take precautions when saving the message to the database (such as using the database driver's escape method, and removing unwanted/unneeded/suspicious stuff from the message), as well as making sure that it is safe to output to the client.

How to store <textarea> data, plus escape and return data

What's the best route for storing data in MySQL. With MySQL should I just use, TEXT as my field type?
As well when using mysql_real_escape_string() with return'ed values \r\n .
But should I be running the htmlentities() on it after that?
And then when I return data to the screen I should use, NL2BR()?
Just trying to figure out the best route here for storing this information.
Thank you for your help!
TEXT or TINYTEXT or anything similar should be fine for storing ASCII data from the user. If you don't need a lot of space you may think about VARCHAR
i think that mysql_real_escape_string() escapes characters that may compromise the security of an SQL query (single quote, double quote, etc.) but doesn't do much more than that.
htmlentities() converts reserved html characters like < and > into their html encoded equivalent, < and > respectively. These characters are not dangerous for SQL queries so you probably do not need to escape them unless you want to display the HTML tag entered by the user as text, and not let it be interpreted as HTML.
NL2BR() is probably not necessary either.
Most importantly, your decision on when to use each of these functions will depend on your end application. You may need / want some but not others ( though you should definitely use mysql_real_escape_string() )
Really depends on what you are trying to store. For things such as usernames, passwords, etc... then you can use varchar. But if your storing long text such as news posts or html data, then you can use TEXT or LONG TEXT (Depending on how long it is).
You should ALWAYS use mysql_real_escape_string() when inserting into the DB. If you're outputting HTML from the DB, you may wan to run htmlentities or html_specialchars to ensure that you aren't outputting user injected javascript that could redirect your users to hacker websites and such.
One other idea is that you could escape your data using htmlentities before inserting into the DB, but it's your choice.
NL2BR is great for forcing all \r\n to tags instead.
So, it seems like your on the right track...

What is the correct/safest way to escape input in a forum?

I am creating a forum software using php and mysql backend, and want to know what is the most secure way to escape user input for forum posts.
I know about htmlentities() and strip_tags() and htmlspecialchars() and mysql_real_escape_string(), and even javascript's escape() but I don't know which to use and where.
What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):
A title of a post (which will also be the basis of the URL permalink).
The content of a forum post limited to basic text input.
The content of a forum post which allows html.
I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why.
Thanks!
When generating HTLM output (like you're doing to get data into the form's fields when someone is trying to edit a post, or if you need to re-display the form because the user forgot one field, for instance), you'd probably use htmlspecialchars() : it will escape <, >, ", ', and & -- depending on the options you give it.
strip_tags will remove tags if user has entered some -- and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
Once you've got what the user did input in the form (ie, when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful : they escape data for SQL
You might also want to take a look at prepared statements, which might help you a bit ;-)
with mysqli - and with PDO
You should not use anything like addslashes : the escaping it does doesn't depend on the Database engine ; it is better/safer to use a function that fits the engine (MySQL, PostGreSQL, ...) you are working with : it'll know precisely what to escape, and how.
Finally, to display the data inside a page :
for fields that must not contain HTML, you should use htmlspecialchars() : if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
for fields that can contain HTML... This is a bit trickier : you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it will let attributes of the allowed tags)
You might want to take a look at a tool called HTMLPUrifier : it will allow you to specify which tags and attributes should be allowed -- and it generates valid HTML, which is always nice ^^
This might take some time to compute, and you probably don't want to re-generate that HTML each time is has to be displayed ; so you can think about storing it in the database (either only keeping that clean HTML, or keeping both it and the not-clean one, in two separate fields -- might be useful to allow people editing their posts ? )
Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions !
mysql_real_escape_string() escapes everything you need to put in a mysql database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.
Anything else can be done with htmlspecialchars() to remove HTML from the input and urlencode() to put things in a format for URL's.
There are two completely different types of attack you have to defend against:
SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still
Cross-Site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definite way to defend against this.
Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities of smuggling JavaScript into HTML. If you decided to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while removing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML and JavaScript and CSS completely.
The answer to this post is a good answer
Basically, using the pdo interface to parameterize your queries is much safer and less error prone than escaping your inputs manually.
I have a tendency to escape all characters that would be problematic in page display, Javascript and SQL all at the same time. It leaves it readable on the web and in HTML eMail and at the same time removes any problems with the code.
A vb.NET Line Of Code Would Be:
SafeComment = Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
HttpUtility.HtmlEncode(Trim(strInput)), _
":", ":"), "-", "-"), "|", "|"), _
"`", "`"), "(", "("), ")", ")"), _
"%", "%"), "^", "^"), """", """), _
"/", "/"), "*", "*"), "\", "\"), _
"'", "'")
First of all, general advice: don't escape variables literally when inserting in the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason to not do this explicitly is because it is only a matter of time then before you forget it just once.
If you're inserting plain text in the database, don't try to clean it on insert, but instead clean it on display. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are correct, which isn't necessarily a given.
If you're dealing with rich text (html), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that the user can see their "cleaned up" html and correct the cleaned up version.
Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.
I second Joeri, do not roll your own, go here to see some of the the many possible XSS attacks
http://ha.ckers.org/xss.html
htmlentities() -> turns text into html, converting characters to entities. If using UTF-8 encoding then use htmlspecialchars() instead as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output regardless of type or origin unless I intend it to be html. There is only a tiny performance cost and it is easier than trying to work out what needs escaping and what doesn't.
strip_tags() - turns html into text by removing all html tags. Use this to ensure that there is nothing nasty in your input as a adjunct to escaping your output.
mysql_real_escape_string() - escapes a string for mysql and is your defence against SQL injections from little Bobby tables (better to use mysqli and prepare/bind as escaping is then done for you and you can avoid lots of messy string concatenations)
The advice given obve re avoiding HTML input unless it is essential and opting for BBCode or similar (make your own up if needs be) is very sound indeed.

Categories