Store both BBCode and HTML version in database? - php

On Stack Overflow I've found questions about storing BBCode or HTML in the database, but what about storing both? For example, I would create a posts table with two columns: body_bbcode and body_html.
In body_bbcode I would store the original post as submitted by the user (a forum member), and in body_html I would store the parsed (HTML) version of that post.
So, for displaying forum posts I would use body_html, but for editing and quoting (replying with a quote) I would use body_bbcode.
The reason I want to do this is that the parser uses regex, and without body_html it would need to convert at least 15 forum posts per topic page. Correct me if I'm wrong, but couldn't that cause performance issues?
On the other hand, I haven't seen anyone do it this way, so I'm wondering what the disadvantages of this approach are, besides taking up more space in the database.
I am also thinking of adding a third column, for example body_text, to store a plain-text version for search purposes, so that the tags themselves aren't searched.

"The reason I want to do this is that the parser uses regex, and without body_html it would need to convert at least 15 forum posts per topic page. Correct me if I'm wrong, but couldn't that cause performance issues?"
A well-designed BBCode regex will not hinder performance in any meaningful way.
Do not create "duplicate" columns for the BBCode text and the HTML text.
A major problem with your suggested approach is that you will inevitably change your HTML output (e.g., add a class to links, change the iframe dimensions of YouTube embeds, etc.). Then you're stuck trying to update the data already stored in the HTML column, which would be problematic.
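As a rough illustration of parsing at display time, here is a minimal sketch of a regex-based renderer. The function name render_bbcode, the three supported tags, and the post-link class are assumptions made for this example, not part of any particular forum package.

```php
<?php
// Hypothetical minimal BBCode renderer: escape first, then convert a few tags.
function render_bbcode(string $bbcode): string
{
    // Escape the raw post so any HTML typed by the user is shown as text.
    $html = htmlspecialchars($bbcode, ENT_QUOTES, 'UTF-8');

    // Pattern => replacement pairs. Changing the markup later (for example
    // the class on links) only means editing this array, not stored data.
    $rules = [
        '/\[b\](.*?)\[\/b\]/is' => '<strong>$1</strong>',
        '/\[i\](.*?)\[\/i\]/is' => '<em>$1</em>',
        '/\[url=(https?:\/\/[^\]\s]+)\](.*?)\[\/url\]/is'
            => '<a class="post-link" href="$1">$2</a>',
    ];

    $html = preg_replace(array_keys($rules), array_values($rules), $html);

    // Turn newlines into <br> so line breaks in the post survive.
    return nl2br($html);
}
```

Because only the BBCode is stored, a change to the generated markup takes effect for every post on the next page view.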

Related

Is it ok to get keywords and description from database?

This isn't really a coding question, but I wanted to ask which approach is more efficient and which one I should use.
I am making a tutorials website, and each tutorial is displayed on a page, blog.php, where all the data comes from the database. I have two ways to fill in the meta keywords tag and the meta description tag.
I was thinking of adding two new columns to the blogs table, keywords and description. The meta tags would be filled from those columns, and the database would get the keywords and description from user input (whoever wrote the blog). I know how I would do this, but is it efficient? I heard search engines have a harder time reading stuff from the database, so I wanted to make sure.
Should I use that, or do you recommend using jQuery to get text from the title tag and build the meta tags that way? I was hoping to use PHP to make it dynamic, but if jQuery and JavaScript are the better choice, please tell me which is better!
Thanks!
Search engines never read stuff from the database.
They read only HTML generated by your script.
Storing keywords and descriptions in the database is all right.
The only thing you have to change in your setup is the database design.
There shouldn't be a keywords field in the blog table. Instead, there should be a keywords table and a keywords_blog lookup table that links keywords to tutorials.
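A minimal sketch of the output side, assuming the description and the list of keywords have already been fetched from the tables described above. The function name render_meta_tags and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: build the two meta tags from values loaded from the
// database (the description column and the rows joined via keywords_blog).
function render_meta_tags(string $description, array $keywords): string
{
    // Escape both values so quotes in user input cannot break the attribute.
    $desc = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');
    $keys = htmlspecialchars(implode(', ', $keywords), ENT_QUOTES, 'UTF-8');

    return '<meta name="description" content="' . $desc . '">' . "\n"
         . '<meta name="keywords" content="' . $keys . '">';
}
```

The search engine only ever sees the resulting HTML, so where the values came from makes no difference to it.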
Search engines don't read from your database, only you do! What search engines have sometimes struggled with is dynamic pages, i.e. a page like blog.php where the content changes via the query string, such as blog.php?id=1.
What many people do in this situation is use human-readable URLs along with URL rewriting, so your URLs might be
/blog/what-i-did-today
/blog/why-x-sucks
or similar. These would all be served by blog.php (or index.php or whatever), and then you can easily allow your bloggers to add their own keywords and descriptions via your database.
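One possible way to derive such URL segments from a post title is sketched below. The function name slugify is an assumption, the rewrite rule that maps the URL back to blog.php is not shown, and only ASCII letters and digits are kept.

```php
<?php
// Hypothetical helper: turn a post title into a URL-friendly slug.
function slugify(string $title): string
{
    $slug = strtolower(trim($title));
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Drop hyphens left over at either end (e.g. from trailing punctuation).
    return trim($slug, '-');
}

echo slugify('What I did today'); // what-i-did-today
```

The slug would typically be stored alongside the post so that blog.php can look the post up by it.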
But do you really need to write your own? There is a ton of software out there that you can just download and install that does this already. Do you need to re-invent the wheel?

Store data using a txt file

This question might seem strange but I have been searching for an answer for a long time and I couldn't find any.
Let's suppose you have a blog with many post entries, just like any other blog. Each post can have simple user comments, with no like buttons or any other feature that would require data management. The question is: can I store the user comments in a text file? Each post would be associated with one text file that holds its comments, so if I have n posts I'll have n text files.
I know I can do this, but I have never seen it anywhere else and no one is talking about it. To me this seems better than storing all comments from all posts in a single MySQL table, but I don't know what makes it so bad that no one has implemented it yet.
Storing comments in text files associated with the corresponding post? Let's see if it's a good idea.
Adding new comments is easy: write new text to the file. But what about the format of your data? CSV? Then you would have to parse it before rendering.
Paging. If you have a lot of comments you may consider adding paging navigation. It can be done easily, sure, but you would need to open the file and read all the records to extract, say, 20.
Approving comments. Someone posts a new comment and you store it with a pending status. Then, in the admin panel, you need to find those pending comments and process them accordingly: save or remove. Do you think that's convenient with text files? The same applies if a user decides to remove their own comment.
Reading files will be slower than querying a database if you have many comments and many posts.
Scalability. One day you decide to extend your comments functionality to let one comment respond to another. How would you do that with text files? Or take the example from the comments by nico: "In 6 months time, when you will want to add a rating field to the comments... you'll have a big headache. Or, just run a simple ALTER query".
This is just for a start. Someone may add more.
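To make the paging point concrete, here is a sketch of what fetching one page looks like with a text file, assuming one comment per line. The function name read_comment_page and the file layout are assumptions for illustration.

```php
<?php
// Hypothetical helper: return one page of comments from a text file that
// stores one comment per line.
function read_comment_page(string $path, int $page, int $perPage = 20): array
{
    // file() loads every line into memory, even though only $perPage
    // of them end up being returned.
    $all = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($all === false) {
        return [];
    }

    return array_slice($all, ($page - 1) * $perPage, $perPage);
}
```

With a database table, the same request is a single query with a LIMIT and OFFSET, and the server only returns the rows that are needed.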
Well, there are good reasons why this isn't done. I can't possibly name them all, but the first things that come to mind:
Efficiency
Flexibility
Databases are much more efficient and flexible than plain text files. You can index, search and assign keys to individual comments and edit and delete any comments based on their key.
Furthermore, you'd get a huge pile of text files if the blog is quite big. While that in itself is not a problem, if you save them all in one directory, it can grow out of proportion and really increase the access time needed to find and open a specific text file.

Best approach to create a tag cloud from input text

I was wondering what would be the best approach to generate a tag cloud from an input text (while the user is typing it). For example, if the user types a story containing the keywords "sci-fi, technology, effects", the tag cloud will be formed from each of these keywords, ordered by relevance according to their frequency across all stories. The tag cloud will be displayed in descending order using the same font size; it's not the display algorithm but the search algorithm that I need to implement.
I'm using MySQL and PHP.
Should I stick to the MATCH...AGAINST clause? Should I implement a tags table?
More details
I have a MySQL table containing a lot of stories. When a user is typing one of their own, I want to display a tag cloud containing the words from the input text that occur most frequently in the set of stories saved in my database.
The tag cloud will only be used to show the user the relevance of the words they have entered in their own story, according to how frequently those words occur in all stories entered by all users.
I think the first thing you need to do is more clearly define the purpose of your tagging system. Do you want to simply build tags based on the words that occur most frequently within the text? This strikes me as something designed with search rankings in mind.
...Or do you want your content to be better organized, with the tag cloud as a way of providing a better user experience and creating more distinct relationships between pieces of content (i.e., both of these are tagged sci-fi, so display them in the sci-fi category)?
If the former is the case, you might not need to do anything but:
Explode the text by a delimiter like a single space: explode(' ', $content);
Have a list (possibly in a config file or within the script itself) of frequently occurring words that you want to exclude from being tags (and, or, this, the, etc.). You could just lift them from pages like these: http://www.esldesk.com/vocabulary/pronouns , http://www.english-grammar-revolution.com/list-of-conjunctions.html
Then you just need to decide how many times a word has to occur (either percentage or numeric), and store those tags in a table that shows the connection between tags and content.
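The three steps above can be sketched roughly as follows. The function name extract_tags, the $stopWords list, and the $minCount threshold are assumptions for this example, and it splits with preg_split rather than a plain explode(' ', ...) so that punctuation is dropped.

```php
<?php
// Hypothetical helper: count candidate tags in a piece of text.
function extract_tags(string $content, array $stopWords, int $minCount = 2): array
{
    // Split on anything that is not a letter, digit or hyphen (keeps "sci-fi").
    $words = preg_split('/[^a-z0-9\-]+/', strtolower($content), -1, PREG_SPLIT_NO_EMPTY);

    // Remove the excluded words (and, or, this, the, ...).
    $words = array_diff($words, $stopWords);

    // Count occurrences and keep only words that appear often enough.
    $counts = array_count_values($words);
    $counts = array_filter($counts, fn($n) => $n >= $minCount);

    // Most frequent first.
    arsort($counts);
    return $counts;
}
```

The resulting word => count pairs are what would then be stored in the table that links tags to content.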
To implement the "as the user is typing" part you just need to use a bit of jQuery's Ajax functionality to repeatedly call your script that builds the tag list (e.g., on keydown).
The other option (better user experience) will incorporate a lot of the same elements, but you'll have to think about it a bit more. Some things I would consider:
Do you want to restrict to certain tags (perhaps you don't want to allow just anyone to create new tags)?
How you will deal with synonyms
If you will support multiple languages
If you want a preference towards suggesting existing tags (which might be close) over suggesting new ones
Once you've fully defined the logic and user experience you can come back to the search algorithm. MATCH and AGAINST are good options but you may find that a simple LIKE will do it for you.
Good luck = )
If you want the tag cloud to be generated as the user is typing it, you can do it in two ways.
Directly update the tag cloud from the input text
Send the input text to the backend (in real time using Ajax/Comet), which then saves it, calculates the word frequency and returns the data from which you generate the cloud.
I would go with the former, using a jQuery plugin such as http://plugins.jquery.com/plugin-tags/tag-cloud

Is it possible to auto-categorize posts in forums or BBS?

If I have a forum that uses tags to categorize posts, is it possible to automatically add tags based on the contents and titles after posts are created?
Thank you very much
The simplest way to do this would be to have a table of known tags. Iterate over each word in the post, and if the word is in the tag table, add it to the list. To make this slightly more effective, you could store each tag in both its display and stemmed versions (e.g., algorithms and algorithm). Then compare the stemmed words in the post with the stemmed tag names. See Porter's stemming algorithm for a simple way to do that (for English words).
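A minimal sketch of that lookup is shown below. The names naive_stem and match_known_tags are hypothetical, and naive_stem only strips a trailing "s" as a crude stand-in for a real Porter stemmer, which PHP does not provide out of the box.

```php
<?php
// Crude stand-in for a real stemmer: lowercase and strip one trailing "s".
function naive_stem(string $word): string
{
    return preg_replace('/s$/', '', strtolower($word));
}

// Hypothetical helper: return the known tags whose stem occurs in the post.
function match_known_tags(string $post, array $knownTags): array
{
    // Map each stemmed tag to its display form.
    $byStem = [];
    foreach ($knownTags as $tag) {
        $byStem[naive_stem($tag)] = $tag;
    }

    $found = [];
    $words = preg_split('/[^a-z0-9]+/i', $post, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $stem = naive_stem($word);
        if (isset($byStem[$stem])) {
            // Use the tag as the key so each tag is added only once.
            $found[$byStem[$stem]] = true;
        }
    }

    return array_keys($found);
}
```

In practice the $knownTags array would come from the tag table, with the stemmed form stored next to the display form rather than recomputed on every request.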
A more effective solution would be to use something like TF-IDF and associate a vector with each tag. Create a vector for the new post and compare it to each tag vector using cosine similarity. Whichever tags are above a certain threshold would be added to the post. I've never used it for auto-tagging, but in my experience it is a very effective matching tool when dealing with non-spammy data (i.e., people aren't trying to cheat or fool the system).
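The comparison step could look roughly like this. It is a simplified sketch: term_vector and cosine_similarity are hypothetical names, and the vectors hold raw term counts only, so the IDF weighting of full TF-IDF is left out.

```php
<?php
// Build a term => count vector from text (raw counts, no IDF weighting).
function term_vector(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($words);
}

// Cosine similarity between two term => weight vectors.
// Returns 0.0 for unrelated texts and close to 1.0 for identical ones.
function cosine_similarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        $dot += $weight * ($b[$term] ?? 0);
    }

    $normA = sqrt(array_sum(array_map(fn($w) => $w * $w, $a)));
    $normB = sqrt(array_sum(array_map(fn($w) => $w * $w, $b)));

    // An empty vector has no direction, so treat it as unrelated.
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }

    return $dot / ($normA * $normB);
}
```

A post would receive every tag whose vector scores above a chosen threshold; the threshold value itself has to be tuned on real data.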
Both of these methods assume that you already have some sort of tag dictionary built to start things off. You could guess at tag names by looking at which uncommon words (need a frequency table for that) are used frequently in the post.
Try this auto-tagging PHP code:
http://www.dangrossman.info/2008/04/07/auto-tagging-content-with-open-calais/
There are a number of ways to go about this. Simple keyword matching or TF-IDF, as konforce suggests, are viable options. Others include:
Use Yahoo's term extraction webservice to extract significant terms from the text.
Use the Google Prediction API. Train it on a corpus of already tagged posts, then ask it to predict the tags of new posts.

PHP MySQL custom blog, formatting post

I'm creating my own blog in PHP and want to know your opinions on how I should format my post content.
Currently I store the post content as plain text, fetch it when necessary, then wrap each line in P tags. I did this in case I wanted to change the way I format my text in the future, so that I'd be spared the dilemma of having to remove all the P tags from the posts in the DB.
The problem I have with this method is that if I want to add extra formatting, e.g. lists, those would also be wrapped in P tags, which is not correct.
How would you do this? Would you store the text as plain text in the DB, or would you add the HTML formatting and store that in the DB too?
I'd prefer not to store unnecessary HTML in the DB, but I'm not sure of a way around it.
I think the best way would be to keep the html in the db. You would have too much to work with parsing the text if you don't use html.
See how it's done in other blog tools. I know that Joomla, for example, keeps all HTML in the DB. I know Joomla isn't a blog tool :) but still...
WordPress stores HTML in the DB. You say you are concerned about storing 'unnecessary' HTML in the DB. What makes it unnecessary? I think it is the opposite. You may have headings or bold or italic text in your post. If you store it as plain text, how do you save this formatting? How are you saving the lists you mentioned?
I see it as a better practice to store raw user input in the database, and format it on output, caching the result if it is needed. That way you can change the way you are parsing things easily without having to regex-replace anything inside the database. You can also store the raw input in one column, and the formatted HTML in another one.
I assume that you are formatting your raw text with the Markdown or the Textile syntax?
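A minimal sketch of "store raw, format on output, cache the result" is given below. The names format_post and cached_format_post are hypothetical, the formatter only turns blank-line-separated blocks into paragraphs, and the cache is a plain array standing in for whatever cache store is actually used.

```php
<?php
// Hypothetical formatter: escape the raw text, then wrap each block that is
// separated by one or more blank lines in <p> tags.
function format_post(string $raw): string
{
    $escaped = htmlspecialchars(trim($raw), ENT_QUOTES, 'UTF-8');
    $blocks = preg_split('/\R{2,}/', $escaped, -1, PREG_SPLIT_NO_EMPTY);

    $paragraphs = array_map(fn($block) => '<p>' . nl2br($block) . '</p>', $blocks);
    return implode("\n", $paragraphs);
}

// Format once per distinct raw text and reuse the result afterwards.
function cached_format_post(string $raw, array &$cache): string
{
    $key = md5($raw);
    if (!isset($cache[$key])) {
        $cache[$key] = format_post($raw);
    }
    return $cache[$key];
}
```

If the formatting rules change, only format_post is edited and the cache is cleared; the raw text in the database stays untouched.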
If you store HTML in your DB, you will be just a few clicks away from your current situation:
you can use strip_tags() to remove the HTML formatting, and in case of bigger changes, you can run HTML Tidy on your code to remap tags and classes.
