Best way to store articles in a database? (php and sql) - php

I want to store articles in a database, but I cannot find much information on the best way to do this, and from what I have read, opinion seems split on how to do it effectively. Many people suggest an approach and others then point out SQL injection issues, and I cannot find much recent material on the topic.
Here is the HTML of an article:
<div id="main">
<article>
<header>
<h3> Title </h3>
<time pubdate="pubdate"> 2011-07-22 </time>
</header>
<p> Article Text </p>
</article>
</div>
Ideally, I guess it would be best to store the chunk of HTML that makes up each article in the database, but there seem to be a lot of problems with this. As I said, I can't find many posts on this particular topic, and as someone new to PHP and databases I want to get some input on the best approach before I proceed.

Whenever I store a large amount of user text, I just base64-encode it. Before you display it, make sure to run it through htmlspecialchars, which keeps any HTML in it from being interpreted, so htmlspecialchars(base64_decode($content)) works fine for displaying.
If you are using BBCode for formatting, make sure to run htmlspecialchars before you start formatting your BBCode.
This isn't the only way; you can sanitize input without base64-encoding it, but I see no reason not to, especially when nobody needs to read the database directly.
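A minimal sketch of the round trip described above. The helper names encodeForStorage and renderStored are made up for illustration. Keep in mind that base64 text is roughly a third larger than the original and cannot be searched with LIKE or a full-text index.

```php
<?php
// Hypothetical helpers illustrating the base64 approach; the names are not from any library.

// Encode user text for storage; the result contains only A-Z, a-z, 0-9, '+', '/' and '='.
function encodeForStorage(string $text): string
{
    return base64_encode($text);
}

// Decode stored text and escape it so the browser shows it as plain text.
function renderStored(string $stored): string
{
    $decoded = base64_decode($stored, true); // strict mode returns false on invalid input
    if ($decoded === false) {
        return '';
    }
    return htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
}

$stored   = encodeForStorage('<b>Hello</b> & welcome');
$rendered = renderStored($stored);
echo $rendered, "\n";
```

The escaping happens only at display time, so the decoded value in the database is still exactly what the user typed.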

Storing it in a SQL database is fine, but you can and must protect against SQL injection in your code,
i.e., by cleaning all user input before sending it to the database.
PHP Manual on SQL injection

I think the best method is to store just plain text, but that usually is not an option when you want extra formatting. You can convert the HTML tags to BBCode or similar tags, which can help against SQL injection; however, if you escape the HTML content it is as safe as any other content. So run mysql_real_escape_string on whatever data you put into the database and you will be fine (on PHP 7 and later, where the mysql_* functions no longer exist, use a prepared statement instead).
However, the best practice would be to store the HTML together with the article text as an HTML file that you serve when the user requests the article, and to store only the plain text in the database for indexing and search purposes. This is ideal because you do not need the HTML for searching anyway, and storing only plain text in the database also helps against SQL attacks. When the user requests the article, read the HTML file for that article, which contains the formatted text, and serve that.
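Here is a rough sketch of how the plain-text copy for the search column could be derived from the article HTML. The function name htmlToSearchText is an assumption for illustration, and the resulting text should still be inserted through a parameterized query like any other value.

```php
<?php
// Hypothetical helper: reduce an article's HTML to plain text for a search/index column.
function htmlToSearchText(string $html): string
{
    $text = strip_tags($html);                                            // drop all tags
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');   // &amp; becomes &
    return trim((string) preg_replace('/\s+/', ' ', $text));              // collapse whitespace
}

$html = "<article><header><h3> Title </h3></header>\n<p> Article &amp; Text </p></article>";
$searchText = htmlToSearchText($html);
echo $searchText, "\n";
```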

Use Lucene or Sphinx, with Lucene either through Zend_Lucene or through Solr. They make indexing the articles faster, and you can also run full-text searches on them. Using Lucene or Solr to index and search in these cases is pretty much standard procedure, and it will let you scale to millions of articles.
Sphinx is a daemon that runs "in parallel" to the MySQL daemon. To use Sphinx, you can use the PECL sphinx extension.
If you want to go with Lucene, you can try Zend_Lucene or Solr, which is actually a Tomcat distribution with a web app that exposes Lucene as a web service, so you can access it in a standard way, independently of the language.
Choosing either of them is OK. You can index by full text (content), by category, or by whatever else you need.

The safest way to prevent SQL injection here is to use a prepared statement.
$stmt = $con->prepare("INSERT INTO Articles (Title, Date, Article) VALUES (?, ?, ?)");
$stmt->bind_param("sss", $title, $currentDate, $articleBody);
The question marks are placeholders for the values you will pass. "sss" says that each of the 3 variables is a string. bind_param binds the variables by reference, so you can assign them afterwards and then execute the prepared statement:
$title = $_POST['title']; // array keys must be quoted strings
$currentDate = date("Y-m-d H:i:s");
$articleBody = $_POST['article'];
$stmt->execute();
This makes sure that no malicious SQL can be injected into your database.
hope this helps!
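For anyone who wants to try this without a MySQL server, here is a self-contained variant of the same idea. It swaps mysqli for PDO and uses an in-memory SQLite database, which is my assumption and not part of the answer above; it needs the pdo_sqlite extension. The table layout mirrors the Articles table from the answer.

```php
<?php
// Assumption: the pdo_sqlite extension is available. With a MySQL DSN the calls are the same.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE Articles (Title TEXT, Date TEXT, Article TEXT)');

// A title that would break a query built by string concatenation.
$title       = "Robert'); DROP TABLE Articles;--";
$currentDate = date('Y-m-d H:i:s');
$articleBody = '<p> Article Text </p>';

// The placeholders keep the values separate from the SQL text.
$stmt = $pdo->prepare('INSERT INTO Articles (Title, Date, Article) VALUES (?, ?, ?)');
$stmt->execute([$title, $currentDate, $articleBody]);

// The value comes back verbatim as data, and the table still exists.
$storedTitle = $pdo->query('SELECT Title FROM Articles')->fetchColumn();
$rowCount    = (int) $pdo->query('SELECT COUNT(*) FROM Articles')->fetchColumn();
```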

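Store your article as TEXT :) Just pass it through this PHP function first to prevent injection attacks:
// Prevent MySQL injection attacks.
// Note: the mysql_* functions were removed in PHP 7.0 and get_magic_quotes_gpc()
// in PHP 8.0, so this only runs on older PHP versions; use prepared statements on newer ones.
function cleanQuery($string) {
    if (get_magic_quotes_gpc()) { // prevents duplicate backslashes
        $string = stripslashes($string);
    }
    // mysql_escape_string() is deprecated; the _real_ variant respects the connection charset
    return mysql_real_escape_string($string);
}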

Related

PHP and MySQL - Should I validate/sanitize my data when pulling it from my database before displaying to user?

I validate and sanitize all my data before inserting it into the database. Would it be considered good practice or redundant to validate it again when pulling it from the database before displaying it?
This boils down to how much to trust your own code. At one extreme, I could forgo the validation completely if I knew that only I would use the client-side interface and would never make a mistake. At the other, I could validate data in every class in case I'm working with others and they forget to do their job properly. But what is generally good practice in this particular case?
Input validation should be a yes/no proposition. You should not modify input and save it.
You should use htmlentities after pulling from the DB and before showing. It is better to clean data just before using it, at the point where a failure would happen. This is why prepared statements work so well: there is no external code you rely on.
Say you forget to sanitize 1 field in 1 form. When you output that data to other users, you have no way to see that mistake from the code that does the output (assuming it's not in the same file).
The less code between the sanitizing and the end result is better.
Now that is not to say save everything and validate it later. Take an email for example, you should validate that for the proper format before saving.
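As a small illustration of validation as a yes/no check that leaves the input untouched, something like the following could work. The function name isValidEmail is made up; filter_var with FILTER_VALIDATE_EMAIL is PHP's built-in check.

```php
<?php
// Hypothetical helper: answers yes or no and never alters the input.
function isValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

$good = isValidEmail('user@example.com');
$bad  = isValidEmail('not an email');
var_dump($good, $bad);
```

If the check fails, reject the form and ask the user to correct it rather than trying to repair the value.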
But for other things you don't want to modify user input. Take a file upload. Some people change the filename to sanitize it, replace spaces, etc. This is fine, but I prefer to create my own filename and show users the original file name, while the name I use on the server is a hash of their username and the name of the file. They never know this, and I get clean filenames.
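A sketch of that filename scheme, assuming SHA-256 as the hash (the answer does not say which one it uses) and a made-up helper name storedFileName:

```php
<?php
// Hypothetical helper: derive the on-disk name from the username and the original name.
// Assumption: SHA-256; any hash with a hex digest would give equally clean names.
function storedFileName(string $username, string $originalName): string
{
    // 64 hex characters, so the result is always safe to use as a filename.
    return hash('sha256', $username . '/' . $originalName);
}

$original = 'My Holiday Photo (1).jpg';
$onDisk   = storedFileName('alice', $original);

// Keep $original in the database for display, and escape it when showing it.
$label = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');
```

Because the same user and name always hash to the same value, re-uploading a file under the same name overwrites the earlier copy; add a timestamp or random salt if that is not wanted.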
Once you start modifying user data, it becomes a chore to maintain it. You may have to un-modify it so users can edit it, and so on, which means you end up doing far more work than if you just clean it when outputting it.
Take for example the simple act of replacing a user's \n line breaks with <br> tags. The user types into a text field, then you convert it to HTML and save it. Security aside, when the user wants to edit the data you have to replace the <br> tags with \n again. The security problem is that you have now decided raw HTML in that field is OK and will output the field as-is, which lets someone add their own HTML. So by modifying user data we have created more work for ourselves, and at output time we are assuming the data was cleaned before it was inserted, without being able to see how it was cleaned.
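The alternative, storing the raw text and converting line breaks only at output, could look roughly like this. The helper name renderMultiline is an assumption for illustration.

```php
<?php
// Hypothetical helper: escape first, then turn newlines into <br /> tags.
// The order matters: escaping after nl2br() would also escape the <br /> tags.
function renderMultiline(string $raw): string
{
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

$raw  = "First line\n<script>alert(1)</script>";
$html = renderMultiline($raw);
echo $html, "\n";
```

The stored value keeps its \n characters, so it can go straight back into a textarea for editing.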
So the answer is it depends on the data and what sanitation you are doing.
Hope that makes sense.
I guess there is no need to validate or sanitize the data from the DB, as you are doing it before inserting.
An attacker always plays with the data he is sending to the server and just analyzes the data coming back as a response. Attackers play with the input, not with the output. So just secure your data before sending it to the server or DB.

How do I sanitize data from users before sending it to mySQL?

I am making a forum at this moment.
I would like to sanitize my input data (that is, the posts from users) before sending it to the MySQL database.
I already have been searching some functions to do that, but I'm not sure if I have used enough of them and if they're all secure enough. Any suggestions are welcome.
Here is the code I have:
$message=$_POST['answer'];
$message=nl2br($message); //adds breaks to my text
$message=stripslashes($message); //removes backslashes (needed for links and images)
$message=strip_tags($message, '<p><a><b><i><strong><em><code><sub><sup><img>'); //people can only use tags inside 2nd param
$message = mysql_real_escape_string($message); //removes mysql statements i think (not sure)
edit: Please tell me if I should add some tags to the strip_tags function. Maybe I have forgotten some.
Try using PDO instead. It has good parameter-binding functions, which really improve security. Here are some examples: http://php.net/manual/pl/pdostatement.bindvalue.php
PDO is included by default in PHP 5, so it is available pretty much everywhere these days.
If you want to allow limited HTML in the forum (as suggested by the way you are using strip_tags()), use HTMLPurifier; otherwise you are vulnerable to JavaScript in the attributes of those tags.
By the way, right now you are stripping the <br> tags you've added.
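Both points are easy to reproduce with plain PHP. This is only a demonstration of strip_tags() behavior with made-up sample input, not a fix:

```php
<?php
// strip_tags() filters tag names only; attributes on allowed tags pass through untouched.
$input  = '<a href="#" onclick="alert(1)">click</a><script>alert(2)</script>';
$result = strip_tags($input, '<a>');
echo $result, "\n";

// <br> is not in the allowed list, so the breaks added by nl2br() are removed again.
$withBreaks = nl2br("line one\nline two");
$stripped   = strip_tags($withBreaks, '<p><a><b><i><strong><em><code><sub><sup><img>');
echo $stripped, "\n";
```

The onclick handler survives in the first result, which is why an attribute-aware filter such as HTMLPurifier is suggested above.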
When you save to the DB:
$message=strip_tags($message, '<p><a><b><i><strong><em><code><sub><sup><img>'); //people can only use tags inside 2nd param
$message = mysql_real_escape_string($message); //escapes the string for use in a MySQL query
When you output:
$message=nl2br($message); //adds breaks to my text
$message=stripslashes($message); //removes backslashes (needed for links and images)
Also, use htmlspecialchars when you write into HTML input elements such as text inputs or textareas.
Note: Don't reinvent the wheel. Learn a PHP framework such as CodeIgniter, which provides secure ways to manage data.

PHP prevent HTML in form text field?

I have a text form field that users may enter notes into. I then use a PHP/MySQL database to store these entries. How do I prevent somebody from entering HTML into the text field?
You're probably looking for strip_tags
Don't do anything to the text; just store it as they enter it.
The reason is that you may want to add content that looks like HTML but actually isn't. For example:
I was updating the site erlier and i had to add a few < br > tags to let the content move down a touch.
What you should be doing is storing the content as it is in the database, making sure that you escape the data against SQL injection, and then escaping the content with htmlentities on output to the browser, like so:
<div id="notes">
<?php echo htmlentities($row['note']) ?>
</div>
This way the HTML tags have no effect on the actual DOM because they are escaped. The generated HTML source should look like:
I was updating the site erlier and i had to add a few &lt; br &gt; tags to let the content move down a touch.
and the user would actually see the < br > as plain text.
Hope this helps.
If you're also planning to store the data in your database, you need to clean the input using mysql_real_escape_string() to prevent SQL injection (http://en.wikipedia.org/wiki/SQL_injection).
1. Filter input
First, filter your input:
Input filtering is one of the cornerstones of any application security, independently of the language or environment.
2. Use PDO
Next, use PDO prepared statements to make your SQL queries safe:
A prepared statement is a precompiled SQL statement that can be executed multiple times by sending just the data to the server. It has the added advantage of automatically making the data used in the placeholders safe from SQL injection attacks.
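For step 1, a filtering pass over the submitted form could be sketched as follows. The field names topic_id and note, the 10000-byte limit, and the function name validateNoteInput are all assumptions for illustration.

```php
<?php
// Hypothetical validator: returns a list of problems, or an empty array if the input is fine.
function validateNoteInput(array $input): array
{
    $errors = [];

    // Accept only a positive integer for the (assumed) topic_id field.
    $id = filter_var($input['topic_id'] ?? null, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    if ($id === false) {
        $errors[] = 'topic_id must be a positive integer';
    }

    // Require non-empty text under an arbitrary size limit; the text itself is not modified.
    $note = $input['note'] ?? '';
    if (!is_string($note) || trim($note) === '' || strlen($note) > 10000) {
        $errors[] = 'note must be between 1 and 10000 bytes of text';
    }

    return $errors;
}

$ok     = validateNoteInput(['topic_id' => '42', 'note' => 'Remember <b>this</b>']);
$broken = validateNoteInput(['topic_id' => '1 OR 1=1', 'note' => '']);
```

Input that passes still goes into the database through a prepared statement (step 2) and is escaped on output.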

PHP htmlentities() on input before DB insert, instead of on output

I wonder if there's any downside or bad practice in doing the following procedure:
$user_input -> htmlentities($user_input) -> mysql_escape($user_input) -> insert $user_input into DB
Select $user_input from DB -> echo $user_input
instead of doing the following:
$user_input -> mysql_escape($user_input) -> insert $user_input into DB
Select $user_input from DB -> echo htmlentities($user_input)
As we display the same $user_input in a lot of places, it feels more efficient to do it on input instead. Are there any downsides, bad practices, or exploitability in doing it this way?
Cheers!
Good replies to the question from:
@Matt: In general, to keep things readable and maintainable, try to store it as close to the original, unfiltered content as possible. It depends on two things:
Is any other person/program going to reference this data?
Does the data need to be easily editable?
@Sjoerd: There is a downside if you want to display the data as something else than HTML, e.g. a CSV download, PDF, etc.
It depends on two things:
Is any other person/program going to reference this data?
Does the data need to be easily editable?
The advantage of method one is that, in the case that the data is used in one place, and htmlentities() would be called every time, you'd be saving this step.
However, this would only leave a notable improvement if the HTML data is very large. In general, to keep things readable and maintainable, try to store it as close to the original, unfiltered content as possible.
In fact, you might find that HTML is the wrong thing to store anyway. It might be better to store something like Markdown and simply convert it to HTML when viewed.
I'd advise against it. If you ever need that data for anything other than displaying it as HTML (display in a console, send in a text email, write to a log, etc.), you'll have to convert it back.
A good practice is to apply such transformations only at the last moment. Use mysql_escape before inserting into the database, use htmlentities (or htmlspecialchars) before displaying as HTML. That way you always know where your escape functions should be. If they're not there, you can easily tell you're doing something wrong. You also know that data in the database is always clean and you don't need to remember if you encoded it, what with and how to turn it back.
There is a downside if you want to display the data as something else than HTML, e.g. a CSV download, PDF, etc.
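To make the CSV point concrete, here is a small sketch that takes one unmodified stored value and escapes it differently per output format. The sample string is made up.

```php
<?php
// One stored value, kept exactly as the user entered it.
$stored = 'Tom & Jerry <3 "cartoons"';

// HTML output: escape with htmlspecialchars() at the last moment.
$html = '<td>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</td>';

// CSV output: fputcsv() applies CSV quoting rules; HTML entities would be wrong here.
$fh = fopen('php://memory', 'w+');
fputcsv($fh, [$stored, 'second column'], ',', '"', '\\');
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

echo $html, "\n", $csv;
```

Had the value been stored with htmlentities() already applied, the CSV file would contain &amp; and &quot; unless it was decoded again first.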

How can I store PHP code inside of a mysql table

I am building a small PHP/MySQL script that will act something like a WordPress blog, but it will just be a small site for my eyes only, for storing PHP code snippets. I will have categories, and then pages with sample code that I display with a JavaScript syntax highlighter. Instead of storing my PHP code snippets in files, I want to save them to a MySQL DB. So what is the best way to save PHP code into MySQL and to get it out of MySQL to show on the page?
My end result will look something like this screenshot: http://img2.pict.com/c1/c4/69/2516419/0/800/screenshot2b193.png
Update:
I just wasn't sure if I needed to do something special to the code before sending it to mysql since it has all different kinds of characters in it
Just store in a text field, as is. Not much more beyond that.
If you're not using some kind of database abstraction layer, just call mysql_real_escape_string on the text.
Do you want to be able to search the PHP code? If so, I recommend using the MyISAM table type, as it supports full-text indexes (InnoDB only supports them from MySQL 5.6 onward). Your choices for column type when it comes to a full-text index are char, varchar and text. I would go with text, as your code snippets might get too long for the other types.
Another point worth mentioning: make sure you properly escape all PHP code (or any value, for that matter) before you insert it. The best way to do this is by using parameterized queries.
Unless I'm missing part of the problem, you should be safe storing it as a TEXT field in a MySQL database. Just make absolutely sure you sanitize the code snippets, as PHP code in particular is quite likely to contain the characters that will escape out of an SQL string. (If you're already using an SQL framework, odds are the framework is doing this for you.)
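A minimal end-to-end sketch of storing a snippet and showing it on a page. It assumes PDO with an in-memory SQLite database so it runs without a MySQL server (the pdo_sqlite extension is required), and the table name snippets is made up.

```php
<?php
// Assumption: pdo_sqlite is available; with MySQL only the DSN and column types change.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE snippets (id INTEGER PRIMARY KEY, title TEXT, code TEXT)');

// A snippet full of characters that are awkward in both SQL and HTML.
$code = '<?php echo "It\'s a test"; ?>';

// Storing: a parameterized query needs no manual escaping of the snippet.
$insert = $pdo->prepare('INSERT INTO snippets (title, code) VALUES (:title, :code)');
$insert->execute([':title' => 'Echo example', ':code' => $code]);

// Displaying: fetch the snippet unchanged, then escape it for HTML so it is shown, not run.
$select = $pdo->prepare('SELECT code FROM snippets WHERE title = :title');
$select->execute([':title' => 'Echo example']);
$fetched = $select->fetchColumn();
$display = '<pre><code>' . htmlspecialchars($fetched, ENT_QUOTES, 'UTF-8') . '</code></pre>';
echo $display, "\n";
```

A JavaScript syntax highlighter can then work on the escaped text inside the code element.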
Store as text (varchar) in the database.
Use cascading style sheet (css) to format code.
http://qbnz.com/highlighter/
Try this:
mysql select ...
eval('?>' . $row['phpcode'] . '<?php ');
