I am coding a forum system using PHP.
I am currently storing a thread's ID, title, author, views, and other attributes in an SQL database, and then storing the thread body (the HTML and BBCode) in text files inside a folder named after the thread ID.
In practice it's really simple to grab the database values and then grab the thread body from the text file, but I was wondering if this is the 'proper way'? I personally have no problems doing this, but if it turns out it is massively inefficient and I should instead store both the thread body HTML and BBCode in the database, then I will change.
However, to me it seems wrong to store such a (very possibly) huge, multi-line string with lots of different characters in a database - I was taught that databases are more for short field 'values' than for website content.
I would just like a definitive answer to this because it's been bugging me for ages as to whether I've been doing it properly.
Does anyone know how popular forum systems store threads?
Added
Thanks for the answers - so it's best to store thread content in the database. What field type should I use?
Also what about replies? Another table which has the thread ID and comment ID then the comment body?
I'm new to this database stuff so thanks for your help.
Confluence (a commercial wiki) stores the entire page content within a single column.
The reasons to store large text in the database are:
There's (hopefully) no disconnect between the value and the record(s) the text is associated with
There are technologies like Full Text Search (FTS) that make finding specific strings in large amounts of text much easier
Simplified backup & restore process
It would be best practice to store the thread in the database, since that will allow you to scale and search more easily.
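For what it's worth, here is a rough sketch of what that might look like in MySQL via PDO - the database name, table names, and column names are just examples, and the thread body goes in a MEDIUMTEXT column:
<?php
// Rough schema sketch - "forum" database, table and column names are illustrative only.
$pdo = new PDO('mysql:host=localhost;dbname=forum;charset=utf8', 'user', 'password');

// Thread metadata plus the body (BBCode/HTML) in a MEDIUMTEXT column.
$pdo->exec("CREATE TABLE IF NOT EXISTS threads (
    id      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title   VARCHAR(255) NOT NULL,
    author  VARCHAR(100) NOT NULL,
    views   INT UNSIGNED NOT NULL DEFAULT 0,
    body    MEDIUMTEXT NOT NULL,
    created DATETIME NOT NULL
) ENGINE=InnoDB");

// Replies get their own table, keyed by the thread they belong to.
$pdo->exec("CREATE TABLE IF NOT EXISTS replies (
    id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    thread_id INT UNSIGNED NOT NULL,
    author    VARCHAR(100) NOT NULL,
    body      MEDIUMTEXT NOT NULL,
    created   DATETIME NOT NULL,
    FOREIGN KEY (thread_id) REFERENCES threads(id)
) ENGINE=InnoDB");
?>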
If you want to continue using files to store the content, I would recommend looking at something along the lines of GridFS. It basically just chunks up files and stores them in a NoSQL store (MongoDB).
I know that DotNetNuke and the AspDotNetStorefront use a database to store such data. These aren't forums, but a content management system and a shopping cart with content management capabilities.
I've also experimented with several forums (Such as YAF) and all of those use databases as well. Personally, I'd stick with a DB for the HTML, and any image/content files should be stored on the disk with a reference to their location in HTML.
Perhaps the strongest argument for storing in the DB: it's a heck of a lot easier to search text fields with a LIKE clause than to search for a string in a text file.
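To illustrate - a sketch only, assuming the threads table above and an existing $pdo connection, with an arbitrary search term:
<?php
// Simple substring search with LIKE:
$stmt = $pdo->prepare("SELECT id, title FROM threads WHERE body LIKE :term");
$stmt->execute(array(':term' => '%some phrase%'));

// With a FULLTEXT index on body (ALTER TABLE threads ADD FULLTEXT(body);
// MyISAM, or InnoDB on MySQL 5.6+), MATCH ... AGAINST is usually much faster on large tables:
$stmt = $pdo->prepare("SELECT id, title FROM threads WHERE MATCH(body) AGAINST(:term)");
$stmt->execute(array(':term' => 'some phrase'));
?>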
Also, with free forum software out there, can I ask why you're writing a new one from scratch? I realize there are probably good reasons, but just in case it's something you hadn't thought of yet...
Added
Most of my references were .NET code. Here's an open source forum written in PHP: http://www.phorum.org/
This is an aside as the question has already been accepted, however you should check out phpbb3 (http://www.phpbb.com/). Very robust php forum. May save you some development time :D
I agree with the other answers; storing all the data in your database simplifies scaling, backup/restore, allows you to query the data, and so on.
If you're concerned with performance, you could implement a cache for the page content. I know phpBB does this by keeping a serialized array in a text file with an expiration timestamp. It could also be done using memcached or otherwise.
Storing the data in a database gives you the most flexibility and convenience; most problems related to serving the data to the end user can be handled by caching.
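Roughly what that sort of file-based cache can look like (the function names and cache path are my own - this is not phpBB's actual API):
<?php
// Sketch of a file-based cache with an expiration timestamp.
function cache_get($key) {
    $file = __DIR__ . '/cache/' . md5($key) . '.cache';
    if (!is_file($file)) {
        return null;
    }
    $entry = unserialize(file_get_contents($file));
    if ($entry['expires'] < time()) {
        return null; // stale entry, rebuild it
    }
    return $entry['data'];
}

function cache_put($key, $data, $ttl = 300) {
    $file = __DIR__ . '/cache/' . md5($key) . '.cache';
    file_put_contents($file, serialize(array('expires' => time() + $ttl, 'data' => $data)));
}
?>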
Related
I'm developing a website that will be available in different languages. It is a LAMP (Linux, Apache, MySQL, PHP) setup, and it makes use of Smarty, mostly for the template engine.
The way we currently translate is with a self-written Smarty plugin, which recognizes certain tags in the HTML files and finds the corresponding tag in a previously defined language file.
The HTML could look as follows:
<p>Hi, welcome to $#gamedesc;!</p>
And the language file could look like this:
gamedesc:Poing 2009$;
welcome:this is another tag$;
Which would then output
<p>Hi, welcome to Poing 2009!</p>
This system is very basic, but it is pretty hard to control if, for example, I would like to keep track of what has been translated so far, or give certain users the rights to translate only certain tags.
I've been looking at some alternative ways to approach this, by either replacing the text-file with XML files which could store some extra meta-data, or by perhaps storing all the texts in the database, and retrieving it there.
My question is, what would be the best way to make this system both maintainable and perform well with high user-traffic? Are there perhaps any (lightweight) plugins I could take a look at?
You could give gettext a shot. It is the way it is done in most C/C++ Linux applications, and it is available as a PHP extension too. The idea is not very different from what you're already doing, but there are tools that ease the maintenance of translations (e.g. poedit).
For user rights to translations, gettext won't be of much help, I think you'll need to do it on your own or look at some frameworks if they have smarter solutions.
Maybe taking a look at the gettext library could give you some hints: http://php.net/manual/en/book.gettext.php. Hope it helps!
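For reference, basic gettext usage from PHP looks roughly like this - the locale name, domain, and paths are just examples:
<?php
// Rough example of the gettext extension; locale names and paths vary by system.
putenv('LC_ALL=nl_NL.utf8');
setlocale(LC_ALL, 'nl_NL.utf8');

// Compiled .mo files live in ./locale/nl_NL/LC_MESSAGES/messages.mo (built with poedit or msgfmt).
bindtextdomain('messages', __DIR__ . '/locale');
textdomain('messages');

echo _('Hi, welcome to the game!'); // prints the Dutch translation if one exists
?>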
You will need a table in your database that you can use to store strings of text, each with a composite ID. The composite ID will be made up of a language ID and a text node ID.
You will need to give the user a chance to select a preferred language. You should make sure that you either have a default "this has not been translated" string for every language you use, or a default language that your entire site can be viewed in.
For every bit of text within your website, rather than storing the text within the page, you just assign it an ID.
When serving the page, look up the text node ID and preferred language ID and load that string of text, or the string for the default.
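A bare-bones version of that lookup might look something like the following; the table and column names are assumptions, not a standard:
<?php
// Sketch: fetch a string by text node ID and language ID, falling back to a default
// language. Assumes a PDO connection in $pdo and a table
// translations(text_node_id, language_id, body).
function translate($pdo, $textNodeId, $languageId, $defaultLanguageId = 1) {
    $stmt = $pdo->prepare("SELECT body FROM translations
                           WHERE text_node_id = :node AND language_id = :lang");
    $stmt->execute(array(':node' => $textNodeId, ':lang' => $languageId));
    $text = $stmt->fetchColumn();

    if ($text === false && $languageId != $defaultLanguageId) {
        // Nothing translated for this language yet - fall back to the default.
        return translate($pdo, $textNodeId, $defaultLanguageId, $defaultLanguageId);
    }
    return ($text !== false) ? $text : '[this has not been translated]';
}
?>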
In our project, http://pkp.sfu.ca/ojs, we use XML files to store translation key-value pairs. Browse our code: http://github.com/pkp/pkp-lib/blob/master/classes/i18n/PKPLocale.inc.php
We use that class to read the XML files for each locale and in our code we use Locale::translate('locale.key.name');. Similar to gettext, but using an XML file for easier updating.
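Purely as an illustration (this is not our actual PKPLocale code, and the element and attribute names are made up), reading such key-value pairs from an XML file in PHP could look like:
<?php
// Assumes a locale file full of <message key="locale.key.name">Translated text</message> entries.
$messages = array();
$xml = simplexml_load_file(__DIR__ . '/locale/en_US/locale.xml');
foreach ($xml->message as $node) {
    $messages[(string) $node['key']] = (string) $node;
}

$key = 'locale.key.name';
echo isset($messages[$key]) ? $messages[$key] : $key; // fall back to the key itself
?>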
Looking around at web stuff today I came across this website: http://translateth.is/
It looks simple to use... copy-paste in some JavaScript.
Just a quick question: I know how I would build a CMS using a database, but why would you want to create a CMS with XML?
What are the pros and cons of using XML? Also, if I were to build a CMS with XML, would I need a database as well, or does XML just remove the need for one?
I haven't seen a CMS without a database in a while.
I think most of those were developed because "a long time ago" you didn't always get access to a database when purchasing/renting webspace.
You might be interested in storing your data in a format that can change over time. XML definitely allows that - being able to define your own tags at will is somewhat akin to being able to add and remove columns without migrating data.
XML can remove the usage of a database - but as the size of the XML file grows, lookup and search become ever more costly. For a personal content management system - especially one where you are looking at the beginning of a file in your most common use case - it could be an acceptable solution.
Making a CMS like this would be something like using TiddlyWiki, which is a single html file that hosts an entire wiki.
For even slightly larger scale CMS, I would immediately opt for a database - probably SQLite for smaller scale, because it's the thing to do nowadays.
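To give an idea of how little setup that takes, a minimal sketch of SQLite through PDO (the file, table, and column names are arbitrary):
<?php
// SQLite through PDO needs nothing but a file path - no server to set up.
$db = new PDO('sqlite:' . __DIR__ . '/cms.sqlite');
$db->exec("CREATE TABLE IF NOT EXISTS pages (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE,
    body TEXT
)");

$stmt = $db->prepare("SELECT body FROM pages WHERE slug = ?");
$stmt->execute(array('about'));
echo $stmt->fetchColumn();
?>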
So, currently I'm organizing my blog based on filename: to create a post, I enter the name of the file. Instead of storing the post in the database, I store it in a PHP file. Each time I create a post, a new row is created in the table with the filename and a unique ID. To reference the post (e.g. for comments) I get the name of the current file, then search the entries table for a matching filename. The post ID of the comment matches the ID of that post.
Obviously this isn't the standard way of organizing a blog, but I do it this way for a few reasons:
Clean URLs (even cleaner than mod_rewrite can provide, from what I've read)
I always have a hard copy of the post on my machine
Easier to remember the URL of a specific post (kind of part of clean URLs)
Now I know that the standard way would be to store each post in the database. I know how to do this, but clean URLs are the main problem. So now to my questions:
Is there anything WRONG with the way I'm doing it now, or could any problems arise from it in the future?
Can the same level of clean URLs that I get now be achieved with mod_rewrite? If so, links are appreciated.
I will be hosting this on a web host. Do only certain web-hosts provide access to the necessary files for mod_rewrite, or is it generally standard on all web-hosts?
Thanks so much guys!
P.S. To be clear, I don't plan on using a blogging engine.
As cletus said, this is similar to Movable Type. There's nothing inherently wrong with storing your data in files.
One thing that comes to mind is: how much are you storing in the files? Just the post content, or does each PHP file contain a copy of the entire design of the page as opposed to using a base template? How difficult would it be to change the design later on? This may or may not be a problem.
What exactly are you looking for in terms of clean URLs? Rewrite rules are quite powerful and flexible. By using mod_rewrite in conjunction with a main PHP file that answers all requests, you can pretty much have any URL format you want, including user-friendly URLs without obscure ID numbers or even file extensions.
Edit:
Here is how it would work with mod_rewrite and a main PHP file that processes requests (a rough sketch follows below):
Web server passes all requests (e.g., /my-post-title) to, say, index.php
index.php parses the request path ("my-post-title")
Look up "my-post-title" in the database's "slug" or "friendly name" (whatever you want to call it) column and locates the appropriate row that way
Retrieve the post from the database
Apply a template to the post data
Return the completed page to the client
This is essentially how systems like Drupal and WordPress work.
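A rough sketch of that front controller (the rewrite rules, database name, "posts" table, and "slug" column are examples, not a prescription):
<?php
// index.php - sketch of the front controller described above.
// Assumes an .htaccess along these lines sends every non-file request here:
//
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^(.*)$ index.php [L,QSA]

$slug = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/'); // e.g. "my-post-title"

$pdo  = new PDO('mysql:host=localhost;dbname=blog;charset=utf8', 'user', 'password');
$stmt = $pdo->prepare("SELECT title, body FROM posts WHERE slug = :slug");
$stmt->execute(array(':slug' => $slug));
$post = $stmt->fetch(PDO::FETCH_ASSOC);

if (!$post) {
    header('HTTP/1.0 404 Not Found');
    exit('Post not found');
}

// Apply whatever template you like to $post and echo the result.
include 'template.php';
?>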
Also, regarding how Movable Type works, it's been a while since I've used it so I might be wrong, but I believe it stores all posts in the database. When you hit the publish button, it generates plain HTML files by pulling post data from the database and inserting it into a template. This is incredibly efficient when your site is under heavy load - there are no scripts running when a visitor opens up your website, and the server can keep up with heavy visitation when it only needs to serve up static files.
So obviously you've got a lot of options when figuring out how your solution should work. The one you proposed sounds fine, though you might want to give careful consideration to how you'll maintain a large number of posts in individual files, particularly if you want to change the design of the entire site later on. You might want to consider a templating engine like Smarty, and just store post data (no layout tags) in your individual files, for instance. Or just use some basic include() statements in your post files to suck in headers, footers, nav menus, etc.
What you're describing is kind of like how Movable Type works. The issues you'll need to cover are:
Syndication: RSS/Atom;
Sitemap: for Google;
Commenting; and
Tagging and filtering content.
It's not unreasonable not to use a database. If I were to do that I'd be using a templating engine like Smarty that does a better job of caching the results than PHP will out of the box.
I want to use Markdown for my website's commenting system but I have stumbled upon the following problem: What should I store in the database - the original comment in Markdown, the parsed comment in HTML, or both?
I need the HTML version for viewing and the Markdown version if the user needs to edit his comment. If I store the Markdown version, I have to parse the comments at runtime. If I store the HTML version, I need to convert the comment back to Markdown when the user needs to edit it (I found Markdownify for this but it isn't flawless). If I store both versions, I'm doubling the used space.
What would be the best option? Also, how does Stack Overflow handle this?
Store both. It goes against the rules for database normalization, but I think it's worth it for the speed optimisation in this case - parsing large amounts of text is a very slow operation.
You only need to store it twice, but you might need to serve it thousands of times, so it's a space-time trade-off.
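In practice that just means writing two columns on save and reading whichever one you need; a sketch, with made-up column names, an assumed $pdo connection, and a placeholder render_markdown() helper standing in for whatever parser you use:
<?php
// Keep the Markdown source for editing and the rendered HTML for display.
$stmt = $pdo->prepare("INSERT INTO comments (post_id, body_markdown, body_html)
                       VALUES (:post, :md, :html)");
$stmt->execute(array(
    ':post' => $postId,
    ':md'   => $markdown,                  // shown again when the user edits
    ':html' => render_markdown($markdown), // shown on the page
));
?>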
Store the original markdown and parse at runtime. There are a few problems with storing the converted version in the database.
If a user wants to edit their comment, you have to convert the parsed HTML back into the original Markdown
Space in database (always go by the rule that if you don't need to store it, don't)
Changes made to the Markdown parser would have to be re-run on every comment in the database, instead of just showing up at runtime.
Just render the Markdown to HTML at runtime.
If your site runs into performance issues, the Markdown will be one of the last things you'll look into tweaking. And even then, I doubt it'll make sense.
Just take a look at the realtime JavaScript renderer that SO uses. It's fast.
Edit:
Sorry, I should've been more clear. I meant just render in PHP. You'll save yourself a lot of headache -- and you probably have more important things to worry about.
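A minimal sketch of rendering at display time in PHP, using Parsedown purely as an example parser (any PHP Markdown implementation slots in the same way; the $comment variable is assumed to come from the database elsewhere):
<?php
// Render stored Markdown to HTML at display time.
require 'Parsedown.php';

$parsedown = new Parsedown();
$parsedown->setSafeMode(true); // escape raw HTML in user comments

echo $parsedown->text($comment['body_markdown']);
?>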
As an exercise in web design and development, I am building my website from the ground up, using PHP, MySQL, JavaScript and no frameworks. So far, I've been following a model-view-controller design. However, there is one hurdle that I am quickly approaching that I'm not sure how I'm going to solve, but I'm sure it's been addressed before with varying degrees of success.
On my website, I'm going to have a resume and an "about me" bio section. These probably won't be changing very often.
For my resume, I think that XML that can be rendered into HTML (or any other format) is the best option, and in that case, I could even build a "resume manager" using PHP that can edit the underlying XML. A resume also seems like it could be built on top of MySQL, as well, and generated into XML or HTML or whatever output format I choose.
However, I'm not sure how to store my about me/bio. My initial idea was a plain text document that can be read in, parsed, and the line breaks converted to paragraphs. That said, I'm not sold on that being the best idea. My other idea was using MySQL, but I think that might be overkill for a single page. What I do know, however
What techniques have you used when storing text for a page that will not change very often? How did they work out for you - what problems or successes did you have?
Like McWafflestix said, use HTML if you want to output HTML. Simplest case within PHP:
<?php
// Render whatever shared header markup you have
create_header_stuff();

// Pull in the static bio content
include('static_about.html');

// Render the shared footer
create_footer_stuff();
?>
and in static_about.html something like
<div id="about">
...
</div>
Cheers,
Just use a static page if the information won't change very often. Static HTML gives you more control over the display format.
Generally treating infrequently changing information the same as frequently changing information works well if you add one other component: caching.
Whatever solution you decide on for the back end, store the output in a cache and then check to see if the data has changed. Version numbers or modified dates work well here. If it hasn't changed, just serve the cached data. If it has changed, rebuild the content, cache it, and display it.
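A quick sketch of that check using file modification times (the file names and the render_about() helper are made up; swap in a version number or a DB modified date if the content lives in MySQL):
<?php
// Serve a cached copy of the rendered page unless the source has changed since.
$source = __DIR__ . '/content/about.txt';
$cache  = __DIR__ . '/cache/about.html';

if (is_file($cache) && filemtime($cache) >= filemtime($source)) {
    readfile($cache); // still fresh - serve as-is
} else {
    $html = render_about($source); // however you build the page
    file_put_contents($cache, $html);
    echo $html;
}
?>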
As far as structure goes, I tend to use text blobs in a database if there is any chance the data will need to become more dynamic later. XML is a great format for communicating between services and as an intermediate step, but I tend to use a database under all my projects because eventually I end up using it for other things anyway.