I'm creating my own blog in PHP and want to know your opinions on how I should format my post content.
Currently I store the post content as plain text, retrieve it when needed, and then wrap each line in <p> tags. I did this so that if I ever want to change how the text is formatted, I won't face the dilemma of having to strip all the <p> tags out of the posts in the DB.
The problem with this method is that if I want to add extra formatting, such as lists, those would also be wrapped in <p> tags, which is not correct.
How would you do this? Would you store the text as plain text in the DB, or would you add the HTML formatting and store that in the DB too?
I'd prefer not to store unnecessary HTML in the DB, but I'm not sure how to avoid it.
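For reference, here is a minimal sketch of the approach described in the question: plain text in the DB, with each line wrapped in <p> at display time. The helper name linesToParagraphs() is hypothetical. The sketch also shows the problem: a line meant as a list item becomes just another paragraph.

```php
<?php
// Hypothetical helper: plain text from the DB, <p> tags added on output.
function linesToParagraphs(string $text): string
{
    $html = '';
    // \R matches any newline sequence (\n, \r\n or \r).
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // blank lines do not produce empty paragraphs
        }
        // Escape the user's text so it cannot inject markup.
        $html .= '<p>' . htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . "</p>\n";
    }
    return $html;
}

// The "- a list item" line is wrapped in <p> like everything else.
echo linesToParagraphs("First paragraph\n\n- a list item");
```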
I think the best way is to keep the HTML in the DB. If you don't use HTML, you'll have far too much work parsing the text yourself.
See how it's done in other blog tools. I know that Joomla, for example, keeps all HTML in the DB. I know Joomla isn't a blog tool :) but still...
WordPress stores HTML in the DB. You say you are concerned about storing 'unnecessary' HTML in the DB. What makes it unnecessary? I think it is the opposite. Your posts may contain headings, bold, or italic text. If you store them as plain text, how do you save this formatting? How are you saving the lists you mentioned?
I see it as better practice to store raw user input in the database and format it on output, caching the result if needed. That way you can easily change how you parse things without having to regex-replace anything inside the database. You can also store the raw input in one column and the formatted HTML in another.
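One way to read "format on output, caching the result" is to treat the formatted HTML column as a cache of the raw column. The sketch below assumes hypothetical columns body_raw, body_html and html_version, plus a FORMATTER_VERSION constant that you bump whenever your formatting rules change. The formatter itself is only a placeholder.

```php
<?php
// Hypothetical: bump this whenever the output formatting rules change.
const FORMATTER_VERSION = 2;

// Placeholder formatter: escape the raw text, then turn newlines into <br />.
function formatRaw(string $raw): string
{
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

/**
 * $row mimics a DB row with hypothetical columns body_raw, body_html, html_version.
 * The cached HTML is rebuilt from body_raw when it is missing or stale.
 */
function renderPost(array &$row): string
{
    if ($row['body_html'] === null || $row['html_version'] !== FORMATTER_VERSION) {
        $row['body_html'] = formatRaw($row['body_raw']);
        $row['html_version'] = FORMATTER_VERSION;
        // A real app would write both columns back to the database here.
    }
    return $row['body_html'];
}

$row = ['body_raw' => "Hi <you>\nbye", 'body_html' => '<p>old</p>', 'html_version' => 1];
echo renderPost($row);
```

Changing the formatter then only means bumping the version number. Stale rows are regenerated from the raw text, so no regex-replace inside the database is needed.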
I assume that you are formatting your raw text with the Markdown or the Textile syntax?
If you store HTML in your DB, you will be just a few clicks away from your current situation:
you can use strip_tags() to remove the HTML formatting, and in case of bigger changes you can run HTML Tidy on your code to remap tags and classes.
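Here is a small sketch of the strip_tags() route mentioned above, with a made-up sample string. strip_tags() removes the tags but leaves entities such as &amp; in place, so html_entity_decode() is added in case you need real plain text.

```php
<?php
// Sketch: getting plain text back out of stored HTML.
$storedHtml = '<p>Hello <b>world</b> &amp; friends</p>';

// Remove the tags first, then decode the entities that remain.
$plain = html_entity_decode(strip_tags($storedHtml), ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $plain;
```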
Related
On Stack Overflow I've found questions about storing either BBCode or HTML in the database, but what about storing both? For example, I would create a posts DB table with two columns: body_bbcode and body_html.
In body_bbcode I would store the original post submitted by a user (forum member), and in body_html I would store the parsed (HTML) version of that post.
So, for displaying forum posts I would use body_html, but for editing and quoting (replying with a quote) I would use body_bbcode.
The reason I want to do this is that the parser uses regex, and without body_html it would need to convert at least 15 forum posts per topic page. Correct me if I'm wrong, but couldn't that cause performance issues?
On the other hand, I haven't seen anyone do this, so I'm wondering what the disadvantages of this approach are, besides taking up more space in the database.
I'm also thinking of adding a new column (for example body_text) to store a plain-text version for search purposes, so that the tags themselves aren't searched.
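As a sketch of the three-column idea in the question, the code below fills body_bbcode, body_html and body_text at save time. The column names come from the question. The functions bbcodeToHtml() and buildPostColumns() are hypothetical, and the parser handles only [b] and [i], purely for illustration.

```php
<?php
// Hypothetical minimal BBCode parser: supports only [b] and [i].
function bbcodeToHtml(string $bbcode): string
{
    // Escape first so users cannot inject raw HTML, then translate the tags.
    $html = htmlspecialchars($bbcode, ENT_QUOTES, 'UTF-8');
    $html = preg_replace('~\[b\](.*?)\[/b\]~s', '<strong>$1</strong>', $html);
    $html = preg_replace('~\[i\](.*?)\[/i\]~s', '<em>$1</em>', $html);
    return nl2br($html);
}

// Builds the values for the three columns described in the question.
function buildPostColumns(string $bbcode): array
{
    $html = bbcodeToHtml($bbcode);
    return [
        'body_bbcode' => $bbcode, // original, used for editing and quoting
        'body_html'   => $html,   // parsed once, used for display
        // Tags stripped and entities decoded, so searches match words, not markup.
        'body_text'   => html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8'),
    ];
}

print_r(buildPostColumns('[b]Tom & Jerry[/b] are [i]back[/i]'));
```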
A well-designed BBCode regex will not hinder performance in any meaningful way.
Do not create "duplicate" columns for BBCode text and HTML text.
A major problem you'll run into with your suggested approach is that you will inevitably change your HTML (e.g., add a class to links, change the iframe dimensions of YouTube embeds, etc.). Then you're stuck trying to update the data in the HTML column, which is problematic.
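To illustrate the point about changing your HTML, here is a sketch that parses BBCode links on display instead of storing the result. The class name post-link and the function renderUrls() are hypothetical. Because the markup lives in code, changing the class means editing one constant rather than rewriting stored HTML. Arrow functions require PHP 7.4 or later.

```php
<?php
// Hypothetical link class. Changing it here changes every rendered post.
const LINK_CLASS = 'post-link';

function renderUrls(string $bbcode): string
{
    // Escape first, so the only HTML in the output is what we generate below.
    $escaped = htmlspecialchars($bbcode, ENT_QUOTES, 'UTF-8');
    // Only http(s) URLs are turned into links; anything else is left as text.
    return preg_replace_callback(
        '~\[url\](https?://[^\s\[\]"<>]+)\[/url\]~i',
        fn (array $m) => '<a class="' . LINK_CLASS . '" href="' . $m[1] . '">' . $m[1] . '</a>',
        $escaped
    );
}

echo renderUrls('See [url]https://example.com/a?x=1&y=2[/url]');
```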
What is the most secure way to save data from a textarea that contains <pre><code> text? Using strip_tags removes all the tags from the text.
Is it safe to use this:
strip_tags($input, '<pre><code><other accepted tags except script,php,...');
or should I do other things too?
Save it as it is.
When you take that data back out of the database and put it into a web page, call htmlspecialchars on it first to escape it so that it looks like normal text on the page.
If you want the user to be able to input actual markup but only want to allow certain tags, then you've got a different problem, and you want something like HTML Purifier.
Either way, the input or database layer is not the right place to be worrying about output formatting concerns.
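Escaping on output, as suggested above, is a single call at display time. This sketch assumes the page is served as UTF-8, and the stored string is a made-up example.

```php
<?php
// The textarea contents are stored exactly as typed; escaping happens on output.
$stored = '<pre><code><b>hi</b></code></pre>';

// ENT_QUOTES escapes both quote types; UTF-8 is assumed to be the page encoding.
$safe = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $safe; // the browser shows the tags as text instead of rendering them
```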
If you are saving the contents of the textarea to a MySQL database, you should escape them with mysqli_real_escape_string() before saving the data.
You can also remove <script> tags from the posted data with a regular expression, e.g. using preg_replace().
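A minimal sketch of the preg_replace() idea follows. The function name removeScriptTags() is hypothetical. Treat this as illustration only: a regex like this is not a complete XSS filter. Event attributes such as onclick, javascript: URLs and malformed tags all get through, which is why a dedicated sanitizer such as HTML Purifier is usually recommended instead.

```php
<?php
// Hypothetical helper: drops <script>...</script> blocks only.
// NOT a complete XSS filter; attributes like onclick are left untouched.
function removeScriptTags(string $html): string
{
    // i = case-insensitive, s = let "." match newlines inside the script body
    return preg_replace('~<script\b[^>]*>.*?</script\s*>~is', '', $html);
}

echo removeScriptTags("<p>Hi</p><SCRIPT type=\"text/javascript\">\nalert(1)\n</script><p>Bye</p>");
```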
So I'm using Markdown to format text input from users:
http://michelf.com/projects/php-markdown/
But I'm doing this destructively, so the text is turned into HTML before the database update. Can I transform it back into Markdown when displaying it on the screen? The reason is that I want to allow the user to edit that text, and I need it in its original form...
You should have two columns in your database: the original input (Markdown syntax) and the HTML generated from it.
When the page is loaded, you pull the HTML.
If the user wants to edit, you pull the original Markdown, and when the edit is complete you save the new Markdown and overwrite the HTML stored in the database.
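The save step of this two-column approach can be sketched as follows. The column names body_markdown and body_html and the function savePost() are hypothetical. The converter is passed in as a callable. With PHP Markdown installed via Composer, it could be [\Michelf\Markdown::class, 'defaultTransform']. Here a trivial stand-in is used so the sketch runs on its own.

```php
<?php
// Hypothetical: body_markdown is what the user edits, body_html is regenerated on save.
function savePost(array $post, string $markdown, callable $toHtml): array
{
    $post['body_markdown'] = $markdown;      // original kept for the edit form
    $post['body_html'] = $toHtml($markdown); // overwritten on every edit
    // A real app would now UPDATE both columns in the database.
    return $post;
}

// Stand-in converter; NOT real Markdown, just enough to make the sketch runnable.
$fakeMarkdown = fn (string $md): string =>
    '<p>' . htmlspecialchars($md, ENT_QUOTES, 'UTF-8') . "</p>\n";

$post = savePost(['id' => 1], 'Hello *world*', $fakeMarkdown);
echo $post['body_html'];
```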
Have you tried http://milianw.de/projects/markdownify/ ?
However, I should note that you should generally not store display formats in your database. It's worth considering storing the markdown in the DB and converting it to HTML on demand.
Which method is the most efficient for translating large amounts of text/web pages that include HTML? I want to translate the text but keep the HTML.
Also, should I keep the words in a database or an array?
When you say "translating", do you mean from one language to another? If so, you can use regular expressions to capture the data between the opening and closing tags of your HTML without losing the markup. I'm not sure, however, why you would want to store your data in a database, unless you are going to retrieve it at a later point.
If this is for on-the-fly translation, it will generally be faster to keep your data in memory: in your array, or simply update the HTML while you loop through the data and eliminate the need for an array altogether.
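Here is a sketch of translating only the text between tags, using an in-memory PHP array as the dictionary. The function translateHtml() is hypothetical. Splitting on tags with a regex is a simplification that assumes well-formed markup with no < or > inside attribute values. Note also that strtr() replaces substrings, not whole words.

```php
<?php
// Hypothetical helper: translates text nodes, leaves tags and attributes alone.
function translateHtml(string $html, array $dictionary): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the tags themselves in the result array.
    $parts = preg_split('~(<[^>]*>)~', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($part !== '' && $part[0] !== '<') {
            $parts[$i] = strtr($part, $dictionary); // text between tags: translate
        }
    }
    return implode('', $parts);
}

echo translateHtml('<p title="Hello">Hello <b>world</b></p>', ['Hello' => 'Hola', 'world' => 'mundo']);
```

The title attribute stays untranslated because it is inside a tag. Whether that is desirable depends on your pages.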
Hey guys, I'm building a web app where users can log in, post and read articles, and comment on things.
I'm giving them a form to post an article, where they provide its title, description, and text.
Leaving validation and SQL injection aside (I've already handled those), I need help with displaying the article, which is stored in a MySQL database as TEXT.
I take the article text from a textarea and display it in a <p> tag, but then, obviously, the newline characters entered by the user are lost. Meanwhile, the <pre> tag makes it ugly by producing a wide, scrollable display. Which tag is appropriate for this purpose? And is taking an article through a textarea even correct?
I'm a learner, and this is my first time building a web app with articles and comments sections, so any suggestions are most welcome. Thank you in advance.
I'd recommend one of two approaches:
1. Use plain text:
If you don't want users to be able to put any HTML in the contents, show them a simple HTML textarea. When the user presses Enter for a new line, it is stored as \n in your database. When you want to print the article, escape it with htmlspecialchars() and then use nl2br($article_contents), which inserts an HTML line break before each newline (\n).
2. Rich text:
If you want users to put HTML content in articles, it is easiest to use a text editor such as TinyMCE. TinyMCE makes it easy for your users to do simple HTML formatting like headings, bold, italic, paragraph alignment, colors, and images. Then, on the PHP side, use the strip_tags function to allow only certain tags, which removes things like <script> tags. Be aware that strip_tags() does not touch the attributes of the tags you allow (e.g. onclick or style), so on its own it does not fully protect against XSS. For example:
strip_tags($article_contents, "<u><b><i><font><span><p>");
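The attribute caveat for the rich-text option can be seen directly. In this sketch, with a made-up input string, the allowed <p> keeps its onclick handler. The <script> tags are removed, but their text content is left behind. For user-submitted rich text, a dedicated sanitizer such as HTML Purifier is the safer choice.

```php
<?php
// Made-up input: an allowed tag with an event handler, plus a script block.
$input = '<p onclick="steal()">Hi</p><script>steal()</script>';

// Only <p> is allowed; its attributes pass through unchanged.
$filtered = strip_tags($input, '<p>');
echo $filtered;
```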
Proposed Answer:
Use <span></span>
Block-level tags like <p></p> and <div></div> take up as much horizontal space as they can, while an inline <span></span> takes up only as much as it needs to hold its contents, so it might be more suitable for you.
Let me know if that worked for you.
In PHP you can use the nl2br function, which inserts an HTML <br /> tag before every newline character. http://php.net/nl2br
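A sketch of nl2br() used on plain-text articles follows, with a made-up sample string. Escaping with htmlspecialchars() first keeps user text from being treated as HTML. nl2br() then adds <br /> before each newline, and the newline characters themselves are kept.

```php
<?php
// Made-up article text as it might come out of the TEXT column.
$articleContents = "Line one\nLine two has a <tag>";

// Escape first, then insert <br /> before each newline.
$html = nl2br(htmlspecialchars($articleContents, ENT_QUOTES, 'UTF-8'));
echo $html;
```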