Cleansing external XML data before sending to MySQL database

I'm working with a number of XML feeds to retrieve data (from an external source). I will be retrieving the data, then sending this to my own MySQL database, so that I can then manipulate it how I wish.
I'm just hoping for some advice on best practice in terms of this process please. I'd like to make this as automated as possible, but I'm cautious of sending unvalidated XML data from an external source straight to my own database.
I will be putting in place a few standard validations to escape strings, etc, but should I be looking to 'cleanse' every piece of data (automatically) before committing to my own DB?
Should I perhaps validate each piece of data against its own set of rules before it makes its way to my database?
I hope that's clear enough. I'd love to hear some opinions if possible please.

There are two things you should worry about: (1) SQL injection and (2) cross-site scripting.
The first one is simple: just use prepared statements with mysqli or PDO.
For cross-site scripting, you can either clean the data before you put it in the database or when you retrieve it. Personally, I like to do the second: just call htmlspecialchars() before you echo something and you should be safe.
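To illustrate the "validate each piece of data against its own set of rules" idea from the question, here is a minimal sketch. The feed layout (items with id, title and price), the function name parseFeedItems() and the rules themselves are assumptions made up for the example, not anything from the original feeds. Items that fail a rule are set aside rather than repaired, and accepted values are kept raw, ready to be bound to a prepared statement.

```php
<?php
// Hypothetical feed layout (an assumption): <items><item><id/><title/><price/></item>...</items>

/**
 * Parse a feed and split its items into accepted rows and rejected items.
 * Values are checked against rules; apart from trimming the title they are not rewritten.
 */
function parseFeedItems(string $xml): array
{
    // Collect libxml errors quietly instead of emitting warnings on bad XML.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        throw new InvalidArgumentException('Feed is not well-formed XML');
    }

    $accepted = [];
    $rejected = [];
    foreach ($doc->item as $item) {
        // Each field has its own rule; filter_var() returns false on failure.
        $id = filter_var((string) $item->id, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
        $title = trim((string) $item->title);
        $price = filter_var((string) $item->price, FILTER_VALIDATE_FLOAT);

        $isValid = $id !== false
            && $title !== '' && strlen($title) <= 255
            && $price !== false && $price >= 0;

        if ($isValid) {
            $accepted[] = ['id' => $id, 'title' => $title, 'price' => $price];
        } else {
            // Keep the offending item for logging or manual review.
            $rejected[] = $item->asXML();
        }
    }

    return ['accepted' => $accepted, 'rejected' => $rejected];
}

$feed = <<<'XML'
<items>
  <item><id>1</id><title>&lt;b&gt;Widget&lt;/b&gt;</title><price>9.99</price></item>
  <item><id>abc</id><title>Broken</title><price>-5</price></item>
</items>
XML;

$result = parseFeedItems($feed);
```

Note that the first title still contains markup after parsing. That is fine for storage: it only needs htmlspecialchars() at the point where it is echoed into a page.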

Related

Is it good to use htmlspecialchars() before Inserting into MySQL?

I am a little confused on this. I have been reading about htmlspecialchars() and I am planning to use it on the textareas submitted via POST to prevent XSS attacks. I understand that htmlspecialchars() is usually used to generate the HTML output that is sent to the browser. But what I am not sure about is:
1) Is it a safe practice to apply htmlspecialchars() to the user input data before I insert it into MySQL? I am already using PDO prepared statements with parameterized values to prevent SQL injection.
2) Or do I really not need to worry about applying htmlspecialchars() to inserted values (provided they are parameterized), and only use htmlspecialchars() when I fetch results from MySQL and display them to users?

As others have pointed out, #2 is the correct answer. Leave it "raw" until you need it, then escape appropriately.
To elaborate on why (and I will repeat/summarise the other posts), let's take scenario 1 to its logical extreme.
What happens when someone enters " ' OR 1=1 <other SQL injection> -- ". Now maybe you decide that because you use SQL you should encode for SQL (maybe because you didn't use parameterised statements). So now you have to mix (or decide on) SQL & HTML encoding.
Suddenly your boss decides he wants an XML output too. Now to keep your pattern consistent you need to encode for that as well.
Next CSV - oh no! What if there are quotes and commas in the text? More escaping!
Hey - how about a nice interactive, AJAX interface? Now you probably want to start sending JSON back to the browser so now {, [ etc. all need to be taken into consideration. HELP!!
So clearly, store the data as given (subject to domain constraints of course) and encode appropriate to your output at the time you need it. Your output is not the same as your data.
I hope this answer is not too patronising. Credit to the other respondents.
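A small sketch of the point above: one raw value, encoded differently depending on where it is going. The sample string and the helper name toCsvLine() are made up for illustration.

```php
<?php
// The value exactly as the user typed it; this is what gets stored.
$raw = 'Tom & "Jerry", <b>friends</b>';

// HTML output: neutralise characters that are special in markup.
$html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// JSON output: let json_encode() handle quotes and other escapes.
$json = json_encode(['comment' => $raw]);

// CSV output: fputcsv() takes care of quoting fields with commas or quotes.
function toCsvLine(array $fields): string
{
    $handle = fopen('php://memory', 'r+');
    fputcsv($handle, $fields, ',', '"', '\\');
    rewind($handle);
    $line = stream_get_contents($handle);
    fclose($handle);

    return rtrim($line, "\n");
}

$csv = toCsvLine(['42', $raw]);
```

Each encoder is applied at the last moment, and the stored value never changes. Adding a new output format later means adding one more encoder, not re-processing the database.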

I need a better way to stop hackers from inserting fatal data into my database

I currently have a guest book on my site. It is very basic. Recently I have been doubting whether my current method of sanitizing the input added to the database is secure enough. Here is the current snippet of the code that collects the data and sanitizes it:
$name=$_POST['name'];
$c_name=mysql_real_escape_string(htmlspecialchars($name));
$detail=mysql_real_escape_string(htmlspecialchars($_POST['detail']));
Then obviously I would send the data to the server and place them in the necessary tables. Is this way efficient or are there any security holes I should be aware of? Thanks!

You should never use htmlspecialchars() on data you are sending to the database; it should only be used on output. Other than that, I recommend you read "Making Wrong Code Look Wrong" so that you can keep your variables straight.
... making robust code by literally inventing conventions that make errors stand out on the screen.

Validating user input?

I am very confused over something and was wondering if someone could explain.
In PHP I validate user input, so htmlentities() and mysql_real_escape_string() are applied before inserting into the database. I don't apply them to everything, as I prefer regular expressions when I can, although I find them hard to work with. Obviously I will use mysql_real_escape_string() since the data is going into the database, but I'm not sure whether I should be using htmlentities() only when getting data from the database and displaying it on a web page. Applying it beforehand alters the data a person entered, so it no longer keeps its original form, which may cause problems if I want to use that data for something else later on.
So for example, I have a guestbook with three fields: name, subject and message. Obviously the fields can contain anything, including malicious code in JS tags. What confuses me is this: say I am a malicious person and I submit the form with some malicious JS code inside script tags. Now I have malicious, useless data in my database. Using htmlentities() when outputting it to the web page (the guestbook) means it is not a problem there, because htmlentities() has converted it to its safe equivalent, but at the same time I have useless malicious code in the database that I would rather not have.
So after saying all this, my question is: should I accept the fact that some data in the database may be malicious, useless data, and that as long as I use htmlentities() on output everything will be OK, or should I be doing something else as well?
I have read so many books that talk about filtering data on receiving it and escaping it on output so the original form is kept, but they only ever give examples like ensuring a field is an int using functions already built into PHP. I have never found anything about something like a guestbook, where you want users to be able to type anything they want: how would you filter such data, apart from using mysql_real_escape_string() to ensure it does not break the DB query?
Could someone please finally clear up this confusion for me and tell me what I should be doing and what is best practice?
Thanks to anyone who can explain.
Cheers!

This is a long question, but I think what you're actually asking boils down to:
"Should I escape HTML before inserting it into my database, or when I go to display it?"
The generally accepted answer to this question is that you should escape the HTML (via htmlspecialchars) when you go to display it to the user, and not before putting it into the database.
The reason is this: a database stores data. What you are putting into it is what the user typed. When you call mysql_real_escape_string, it does not alter what is inserted into the database; it merely avoids interpreting the user's input as SQL statements. htmlspecialchars does the same thing for HTML; when you print the user's input, it will avoid having it interpreted as HTML. If you were to call htmlspecialchars before the insert, you are no longer being faithful.
You should always strive to have the maximum-fidelity representation you can get. Since storing the "malicious" code in your database does no harm (in fact, it saves you some space, since escaped HTML is longer than unescaped!), and you might in the future want that HTML (what if you use an XML parser on user comments, or some day let trusted users have a subset of HTML in their comments, or some such?), why not let it be?
You also ask a bit about other types of input validation (integer constraints, etc). Your database schema should enforce these, and they can also be checked at the application layer (preferably on input via JS and then again server side).
On another note, the best way to do database escaping with PHP is probably to use PDO, rather than calling mysql_real_escape_string directly. PDO has more advanced functionality, including type checking.
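For what it's worth, here is a rough sketch of that flow with PDO: store what the user typed through a prepared statement, and escape only when printing. An in-memory SQLite database stands in for MySQL purely so the snippet runs on its own (that is my assumption, as are the guestbook table and column names). With MySQL only the DSN and credentials would differ.

```php
<?php
// Assumption: SQLite in memory as a stand-in. A MySQL DSN would look something like
// 'mysql:host=localhost;dbname=example;charset=utf8mb4' (hypothetical values).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE guestbook (id INTEGER PRIMARY KEY, name TEXT NOT NULL, message TEXT NOT NULL)');

// Hostile-looking input, stored exactly as typed.
$name = "Robert'); DROP TABLE guestbook;--";
$message = '<script>alert("xss")</script>';

// Placeholders keep the values out of the SQL text, so nothing is interpreted as SQL.
$insert = $pdo->prepare('INSERT INTO guestbook (name, message) VALUES (:name, :message)');
$insert->execute([':name' => $name, ':message' => $message]);

// Reading it back gives the original, unmodified data.
$row = $pdo->query('SELECT name, message FROM guestbook')->fetch(PDO::FETCH_ASSOC);

// Escape only at the point of HTML output.
$safeMessage = htmlspecialchars($row['message'], ENT_QUOTES, 'UTF-8');
```

The table survives the "DROP TABLE" name, and the script tag only becomes harmless text at the moment it is rendered.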

mysql_real_escape_string() is all you need for the database operations. It'll ensure that a malicious user can't embed something into data that'll "break" your queries.
htmlentities() and htmlspecialchars() come into play when you're working with sending stuff to the client/browser. If you want to clean up potentially hostile HTML, you'd be better off using HTMLPurifier, which will strip the data to the bedrock and hose it down with bleach and rebuild it properly.

There's no reason to worry about having malicious JavaScript code in the database if you're escaping the HTML when it comes out. Just make sure you always do escape anything that comes out of the DB.

Securing PHP forms for beginners? Resources?

I successfully built my first html/PHP form that passes variables between multiple pages using the _POST global variable and then emails me the results using the mail() function.
I'm sure this form is incredibly insecure as it is now and vulnerable to all manner of exploits, and I want to know how to patch up the holes. However, I'm pretty much a beginner at PHP.
Can you recommend any simple-to-follow tutorials for securing PHP forms?

The first, second and third most important thing you need to do when securing your code is to assume ALL data your code handles is somehow meant to steal your data and sabotage your server. Even data you have personally hard-coded into the scripts! :P
Make sure every piece of data is validated and verified before you use it. Use the intval and floatval functions to verify numbers, regular expressions to verify text fields (usernames, passwords, etc...), and always try to use Parameterized Statements when doing SQL queries.
And keep user input away from includes and shell commands altogether. If you need to do includes and shell commands based on user input, use switch and/or if statements on the actual user input and execute static commands based on them. And if that doesn't work either, validate, verify and sanitize the input extremely thoroughly before using it... then cross your fingers and hope all the good exploiters are looking the other way :)
Most importantly; be very very very paranoid. People ARE out to get you! :)
... then find yourself a relaxing hobby, so you don't go crazy xD
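A quick sketch of the advice above, with made-up page names, file paths and username rules (all assumptions for illustration). The user's value only selects between hard-coded paths; it never becomes part of the path itself.

```php
<?php
// Map user input onto static include paths (hypothetical pages).
function resolvePage(string $requested): string
{
    switch ($requested) {
        case 'home':
            return 'pages/home.php';
        case 'contact':
            return 'pages/contact.php';
        default:
            // Anything unexpected falls through to a fixed default.
            return 'pages/not_found.php';
    }
}

// Regular expression check for a text field: 3 to 20 letters, digits or underscores.
// \A and \z anchor the whole string, so a trailing newline does not sneak through.
function isValidUsername(string $username): bool
{
    return preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $username) === 1;
}

// intval() always yields an integer, whatever junk follows the digits.
$quantity = intval('42; DROP TABLE users');
```

One caveat: intval() coerces rather than rejects, so "42; DROP TABLE users" quietly becomes 42. If you would rather refuse such input outright, filter_var() with FILTER_VALIDATE_INT returns false instead.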

I would recommend using PHP Sessions instead of passing variables between forms -- that's one way of securing your input data.
check out this for a start

If you are new to forms with php, this site might be interesting for you: myphpform.com
Another site that gives an overview about possible attacks: phpsec.org

Some small security tips that helped me with my first PHP apps:
If you receive data on one web page and access it on other pages of the same site, throw them in $_SESSION[]. Never pass them in hidden form fields via POST or GET.
If textual data received from the user is displayed as part of a web page or mailed as an HTML mail, always strip_tags() the data before showing/mailing it (to counter XSS attacks).
All data that is received from the user and then needs to be stored in a SQL database, needs to be escaped to counter SQL injection attacks (i.e. mysql_real_escape_string for mysql or use a DB abstraction).

I'm learning PHP on my own and I've become aware of the strip_tags() function. Is this the only way to increase security?

I'm new to PHP and I'm following a tutorial here:
Link
It's pretty scary that a user can write php code in an input and basically screw your site, right?
Well, now I'm a bit paranoid and I'd rather learn security best practices right off the bat than try to cram them in once I have some habits in me.
Since I'm brand new to PHP (literally picked it up two days ago), I can learn pretty much anything easily without getting confused.
What other way can I prevent shenanigans on my site? :D

There are several things to keep in mind when developing a PHP application, and strip_tags() only helps with one of them. Actually strip_tags(), while effective, might even do more than needed: converting possibly dangerous characters with htmlspecialchars() may even be preferable, depending on the situation.
Generally it all comes down to two simple rules: filter all input, escape all output. Now you need to understand what exactly constitutes input and output.
Output is easy, everything your application sends to the browser is output, so use htmlspecialchars() or any other escaping function every time you output data you didn't write yourself.
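To make the difference between the two functions concrete, here is a small comparison on a made-up input string. It is only meant to show what each function does to the text, not to settle which one suits a given site.

```php
<?php
$input = '<b>Hello</b> <script>alert(1)</script>';

// strip_tags() deletes the markup, so part of what the user typed is gone for good.
$stripped = strip_tags($input);

// htmlspecialchars() keeps every character but makes the browser show it as text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// The escaped form can be turned back into the original; the stripped form cannot.
$roundTrip = htmlspecialchars_decode($escaped, ENT_QUOTES);
```

Here $stripped ends up as "Hello alert(1)", while $escaped displays the original text literally in the browser.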
Input is any data not hardcoded in your PHP code: things coming from a form via POST, from a query string via GET, from cookies. All of those must be filtered in the most appropriate way depending on your needs. Even data coming from a database should be considered potentially dangerous; especially on a shared server, you never know if the database was compromised elsewhere in a way that could affect your app too.
There are different ways to filter data: white lists to allow only selected values, validation based on the expected input format and so on. One thing I never suggest is trying to fix the data you get from users: have them play by your rules, and if you don't get what you expect, reject the request instead of trying to clean it up.
Special attention, if you deal with a database, must be paid to SQL injection: that kind of attack relies on you not properly constructing the query strings you send to the database, so that the attacker can forge them to execute malicious instructions. You should always use an escaping function such as mysql_real_escape_string() or, better, use prepared statements with the mysqli extension or PDO.
There's more to say on this topic, but these points should get you started.
HTH
EDIT: to clarify, by "filtering input" I mean decide what's good and what's bad, not modify input data in any way. As I said I'd never modify user data unless it's output to the browser.
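A minimal sketch of "decide what's good and what's bad, don't modify": the function below returns the names of the fields that failed and leaves the values untouched. The field names, the allowed sizes and the quantity range are hypothetical.

```php
<?php
// Returns a list of invalid field names; an empty list means the request is acceptable.
function validateOrder(array $input): array
{
    $errors = [];

    // White list: only these exact values are allowed.
    $allowedSizes = ['small', 'medium', 'large'];
    if (!isset($input['size']) || !in_array($input['size'], $allowedSizes, true)) {
        $errors[] = 'size';
    }

    // Format check: a whole number between 1 and 100, otherwise reject.
    $quantity = filter_var(
        $input['quantity'] ?? '',
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 100]]
    );
    if ($quantity === false) {
        $errors[] = 'quantity';
    }

    return $errors;
}

$good = validateOrder(['size' => 'medium', 'quantity' => '5']);
$bad = validateOrder(['size' => 'huge', 'quantity' => '12abc']);
```

If the returned list is not empty, the request is refused and the user is asked to correct it. Nothing is silently "repaired" along the way.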

strip_tags is not really the best thing to use; it doesn't protect you in all cases.
HTML Purifier (http://htmlpurifier.org/) is a really good option for processing incoming data. It still will not cater for all use cases by itself, but it's definitely a good starting point.

I have to say that the tutorial you mentioned is a little misleading about security:
It is important to note that you never want to directly work with the $_GET & $_POST values. Always send their value to a local variable, & work with it there. There are several security implications involved with the values when you directly access (or output) $_GET & $_POST.
This is nonsense. Copying a value to a local variable is no more safe than using the $_GET or $_POST variables directly.
In fact, there's nothing inherently unsafe about any data. What matters is what you do with it. There are perfectly legitimate reasons why you might have a $_POST variable that contains ; rm -rf /. This is fine for outputting on an HTML page or storing in a database, for example.
The only time it's unsafe is when you're using a command like system or exec. And that's the time you need to worry about what variables you're using. In this case, you'd probably want to use something like a whitelist, or at least run your values through escapeshellarg.
Similarly with sending queries to databases, sending HTML to browsers, and so on. Escape the data right before you send it somewhere else, using the appropriate escaping method for the destination.
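As a rough illustration of "escape right before you send it somewhere else", here is the same value prepared for two destinations. Nothing is executed in this sketch; the echo command is only a placeholder for whatever static command you would really run.

```php
<?php
$userValue = '; rm -rf /';

// Destination: an HTML page. Nothing in this value is special in HTML,
// so htmlspecialchars() returns it unchanged.
$forHtml = htmlspecialchars($userValue, ENT_QUOTES, 'UTF-8');

// Destination: a shell command line. escapeshellarg() quotes the value so the
// shell treats it as one literal argument instead of a second command.
$forShell = 'echo ' . escapeshellarg($userValue);

// The unescaped version is the dangerous one: the shell would see two commands.
$unsafeShell = 'echo ' . $userValue;
```

The value itself is never "cleaned". Each destination gets its own escaping at the point of use, and the stored copy stays as it was.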

strip_tags removes every piece of HTML. More sophisticated solutions are based on whitelisting (i.e. allowing specific HTML tags). A good whitelisting library is HTML Purifier: http://htmlpurifier.org/
And of course, on the database side of things, use functions like mysql_real_escape_string or pg_escape_string.

Well, I'm probably wrong, but... in all the literature I've read, people say it's much better to use htmlspecialchars.
It's also pretty much necessary to cast input data (to int, for example, if you are sure it's a user id).
And for later, when you start using a database: use mysql_real_escape_string instead of mysql_escape_string to prevent SQL injection (some old books still mention mysql_escape_string).
