Storing post bodies in database or files? - php

I'm learning web-centric programming by writing myself a blog, using PHP with a MySQL database backend. This should replace my current (Drupal based) blog.
I've decided that a post should contain some data: id, userID, title, content, time-posted. That makes a nice schema for a database table. I'm having issues deciding how I want to organize the storage of content, though.
I could either:
(1) Use a file-based system. The content column would then hold a URL to a locally stored file, which I'd then read, format, and display.
(2) Store the entire contents of the post in content, i.e. put it into the database.
If I went with (1), searching the contents of posts would be slightly problematic - I'd be limited to metadata searching, or I'd have to read the contents of each file when searching (although I don't know how much of a problem that'd be - grep -ir "string" . isn't too slow...). However, images (if any) would be referenced by a URL, so referencing content would at least be an internally consistent methodology, and I'd easily be able to reuse the content, as text files are ridiculously easy to work with compared to an SQL database file.
Going with (2), though, I could use a LONGTEXT column. The content would then need to be sanitised before I tried to put it into the tuple, and I'd be limited by size (although it's unlikely that I'd write a 4GB blog post ;). Searching would be easy.
I don't (currently) see which way would be (a) easier to implement, (b) easier to live with.
Which way should I go / how is this normally done? Any further pros / cons for either (1) or (2) would be appreciated.

For the 'current generation', implementing a database is pretty much your safest bet. As you mentioned, it's pretty standard, and you outlined all of the fun stuff. Most SQL instances have a fairly powerful FULLTEXT (or equivalent) search.
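To make the FULLTEXT point concrete, here is a minimal sketch, assuming post bodies live in the database in a posts table with title and content columns (table, columns, credentials, and the index are assumptions, not anything from the question; InnoDB only supports FULLTEXT indexes as of MySQL 5.6):

```php
<?php
// Minimal sketch of MySQL full-text search over post bodies stored in the
// database, via PDO with a prepared statement. All names are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=blog;charset=utf8mb4', 'user', 'pass');

// One-time setup, e.g. in a migration:
//   ALTER TABLE posts ADD FULLTEXT INDEX ft_post (title, content);

$stmt = $pdo->prepare(
    'SELECT id, title
       FROM posts
      WHERE MATCH(title, content) AGAINST (:q IN NATURAL LANGUAGE MODE)'
);
$stmt->execute([':q' => 'search terms']);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['id'] . ': ' . $row['title'] . "\n";
}
```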
You'll probably have just as much architecture to write for either of the two approaches you outlined, especially if you want one to reach feature parity with the other.
The up-and-coming technology is a key/value store, commonly referred to as NoSQL. With this, you can store your content and metadata into separate individual documents, but in a structured way that makes searching and retrieval quite fast. Some common NoSQL engines are mongo, CouchDB, and redis (among others).
Ultimately this comes down to personal preference, along with a few use-case considerations. You didn't really outline what is important to you as far as conveniences and your application. Any one of these would be just fine for a personal or development blog. Building an entire platform with multiple contributors is a different conversation.

13 years ago I tried your option 1 (having external files for text content) - not with a blog, but with a CMS. And I ended up shoveling it all back into the database for easier handling. It's much easier to do global replaces on the database than at the text-file level. With large numbers of posts you run into trouble with directory sizes and access speed, or you have to manage subdirectory schemes etc. etc. Stick to the database-only approach.
There are tools that make working with text files easier than MySQL's built-in functions do, but with a command-line client like mysql and mysqldump you can easily extract any text to the file-system level, work on it with standard tools, and re-load it into the database. What MySQL really lacks is built-in support for regex search/replace, but even for that you'll find a patch if you're willing to recompile MySQL.
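As a rough illustration of that round trip, here's a hedged PHP sketch that dumps each post's content to a text file and loads the edited files back; the posts table, column names, and export/ directory are assumptions:

```php
<?php
// Hedged sketch of the "round trip" described above: dump each post's content
// to a text file, edit with standard tools, then load the files back.
$pdo = new PDO('mysql:host=localhost;dbname=blog;charset=utf8mb4', 'user', 'pass');

// Export: one file per post.
foreach ($pdo->query('SELECT id, content FROM posts') as $row) {
    file_put_contents("export/post-{$row['id']}.txt", $row['content']);
}

// ... run sed/grep/your editor of choice on export/*.txt ...

// Re-import the (possibly edited) files.
$update = $pdo->prepare('UPDATE posts SET content = :content WHERE id = :id');
foreach (glob('export/post-*.txt') as $file) {
    preg_match('/post-(\d+)\.txt$/', $file, $m);
    $update->execute([':content' => file_get_contents($file), ':id' => (int) $m[1]]);
}
```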

Related

PHP - getting HTML articles from file or database?

There are several questions here about file vs. database, but I'm still not sure what to use and why I should use it in my case.
I have a site with quite a lot of HTML articles (between a couple of hundred and a few thousand words long). In the database (MySQL) I have a version without the markup for the search index, and my question is what to do with the markup version – keep it in the database as well, or in separate HTML files on the server (everything else being equal)?
Are there any obvious pros or cons with either approach or is it just a matter of taste?
I wouldn't say it is a matter of taste. If you are using a MySQL database anyway, then I would strongly recommend keeping the markup version in the database as well, for these reasons:
You would only need one system (the database) instead of two (database and filesystem), which is much more consistent; keeping a filesystem and a database in sync requires extra programming work.
You could serve the HTML pages for download or direct view independently of the file system, and you can shape the URL as you like, e.g. insert the date and title.
Assuming your markup is valid XHTML, you could use MySQL's XML path functions to search for specific contents (keywords, article headers, etc.). Alternatively, you could use regular expressions for such operations. Thus, the database approach basically gives you more power over your material.
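As a rough sketch of that XML-search idea: MySQL's ExtractValue() function (available since 5.1) can pull node text out of stored markup; the articles table and markup column below are assumptions for illustration.

```php
<?php
// Sketch: search article headings inside XHTML stored in the database,
// using MySQL's ExtractValue() XPath function. Names are made up.
$pdo = new PDO('mysql:host=localhost;dbname=site;charset=utf8mb4', 'user', 'pass');

// Find articles whose first-level heading mentions a keyword.
$stmt = $pdo->prepare(
    "SELECT id, title
       FROM articles
      WHERE ExtractValue(markup, '//h1') LIKE :kw"
);
$stmt->execute([':kw' => '%database%']);
$matches = $stmt->fetchAll(PDO::FETCH_ASSOC);
```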
Nevertheless, here are some pros for the combined database/filesystem solution:
The performance could be slightly better (depending on your load).
If your database is down for any reason, the articles will still be delivered by the webserver.
In the short term (!), it could be less programming work.

Database vs XML File for minimal content

I am building a website for a client. The landing page has 4 areas of customizable content. The content is minimal, it's mainly just a reference to an image, an associated link, and an order...so 3 fields.
I am already using a lot of MySQL tables for the other CMS-related aspects, but for this one use, I am wondering if a database table really is the best option. The table would have only 4 records and 3 columns. It's not going to be written to very often, just read from as the landing page loads.
Would I be better off sticking with a MySQL table for storing this minimal amount of information, since it will fit into the [programming] workflow easily enough? Or would using an XML file that stores the information be a better way to go?
UPDATE: The end user (who knows nothing about databases) will be going through a web interface I create to choose one of the 4 items they want to update, then uploading an image from their computer, then selecting the link from a list of pages on the site (or offsite). The XML file or Database table will store the location of the image on the server and the link to wrap it in.
A database is the correct solution for storing dynamic data. However, I agree it sounds like MySQL is overkill for this situation; it also means one more thing to administer and manage. But using a flat file like XML is a bad idea too. Luckily SQLite is just the thing.
Let these snippets from the SQLite site encourage you:
Rather than using fopen() to write XML or some proprietary format into disk files used by your application, use an SQLite database instead. You'll avoid having to write and troubleshoot a parser, your data will be more easily accessible and cross-platform, and your updates will be transactional.
Because it requires no configuration and stores information in ordinary disk files, SQLite is a popular choice as the database to back small to medium-sized websites.
Read more here.
As for PHP code to interface with the database, use either PDO or the specific SQLite extension.
PDO is more flexible so I'd go with that and it should work out of the box - "PDO and the PDO_SQLITE driver is enabled by default as of PHP 5.1.0." (reference) Finding references and tutorials is super easy and just a search away.
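Here is a minimal sketch of what that looks like with PDO and SQLite for the four landing-page slots; the database file, table, and columns are assumptions, not anything from the question:

```php
<?php
// Minimal PDO + SQLite sketch for four landing-page slots. Names are made up.
$pdo = new PDO('sqlite:' . __DIR__ . '/landing.sqlite');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE IF NOT EXISTS slots (
    position INTEGER PRIMARY KEY,   -- 1..4, the display order
    image    TEXT NOT NULL,         -- path to the uploaded image
    link     TEXT NOT NULL          -- URL the image links to
)');

// Update one slot from the admin interface.
$stmt = $pdo->prepare('REPLACE INTO slots (position, image, link) VALUES (:pos, :img, :link)');
$stmt->execute([':pos' => 2, ':img' => 'uploads/banner2.jpg', ':link' => '/about']);

// Read all four slots when rendering the landing page.
$slots = $pdo->query('SELECT * FROM slots ORDER BY position')->fetchAll(PDO::FETCH_ASSOC);
```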
If you use PHP, the SimpleXML library could give you what you need, and using XML in this fashion for something like what you describe might be the simplest way to go. You don't have to worry about any MySQL configuration or about the client messing something up. Restoring and maintaining an XML file might be easier too. The worry would be future scalability if you plan to grow the site much. That's my 2 cents anyway.
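For comparison, a hedged SimpleXML sketch of the same four slots; the file name and element/attribute names are assumptions:

```php
<?php
// Sketch of the SimpleXML alternative: four <slot> elements in one XML file.
// slots.xml might look like:
// <slots>
//   <slot position="1"><image>uploads/a.jpg</image><link>/page-a</link></slot>
//   ...
// </slots>

$xml = simplexml_load_file('slots.xml');

// Read all slots for the landing page.
foreach ($xml->slot as $slot) {
    echo $slot['position'] . ': ' . $slot->image . ' -> ' . $slot->link . "\n";
}

// Update slot 2, then write the file back.
foreach ($xml->slot as $slot) {
    if ((string) $slot['position'] === '2') {
        $slot->image = 'uploads/banner2.jpg';
        $slot->link  = '/about';
    }
}
$xml->asXML('slots.xml');
```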

If I have the choice, should webpage contents be saved on the file system or in MySQL?

I am in the planning stages of writing a CMS for my company. I find myself having to make the choice between saving page contents in a database or in folders on a file system. I have learned that PHP performs admirably well reading and writing to file systems, way better in fact than running SQL queries. But when it comes to saving pages and their data on a file system, there'll be a lot more involved than just reading and writing. Since pages will be drawn using a PHP class, the data for each page will be just data, no HTML. Therefore a parser for the files would have to be written. Also, I doubt that all the data from a page would be saved in just one file; it would rather be saved in one directory, with content boxes and data in separate files.
All this would be done so much easier with MySQL, so what I want to ask you experts:
Will all the extra dilly-dallying with file-system saving outweigh its speed and resource advantage over MySQL?
Thanks for your time.
Go for MySQL. I'd say the only time you should think about using the file system is when you are storing files (BLOBs) of several megabytes; databases (at least the ones you typically use with a PHP website) are generally less performant when storing that kind of data. For the rest I'd say: always use a relational database. (Assuming you are dealing with data that has relations, of course; if it is random data there is not much benefit in using a relational database ;-)
Addition: If you define your own file structure, and even your own way of cross-referencing files, you've already started building a 'database' yourself. That is not bad in itself -- it might be loads of fun! -- but you probably will not get the performance benefits you're looking for unless your situation is radically different from the other 80% of 'standard' websites on the web (a couple of pages with text and images on them). (If you are building google/youtube/flickr/facebook ... you've got a different situation and developing your own unique storage solution starts making sense.)
Things to consider:
race conditions on file writes if two users edit the same piece of content (a locking sketch follows below)
distributing files across multiple servers as the CMS grows; latency on replication will cause data-integrity problems
search performance: grepping files across multiple directories will be very slow
too many files in the same directory will hurt server performance, especially on Windows
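On the race-condition point, here's a hedged sketch of the kind of locking a file-based CMS would need; flock() is advisory, so every writer must use it, and the path and function name here are made up for illustration:

```php
<?php
// If two users save the same content file at once, writes can interleave
// unless you lock the file. flock() provides an advisory exclusive lock.
function save_content(string $path, string $content): bool
{
    $fp = fopen($path, 'c');          // open for writing, create if missing, don't truncate yet
    if ($fp === false) {
        return false;
    }
    if (!flock($fp, LOCK_EX)) {       // block until we hold an exclusive lock
        fclose($fp);
        return false;
    }
    ftruncate($fp, 0);                // now it's safe to replace the contents
    fwrite($fp, $content);
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
    return true;
}

save_content('content/page-about.txt', '<p>Updated body…</p>');
```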
Assuming you have a low-traffic, single-server environment here…
If you expect to ever have to manage those entries outside of the CMS, my opinion is that it's much, much easier to do so with existing tools than with database access tools.
For example, there's huge value in being able to use awk, grep, sed, sort, uniq, etc. on textual data. Proxying that through a database makes this hard but not impossible.
Of course, this is just opinion based on experience.
Storing data on the filesystem may be faster for large blobs that are always accessed as one piece of information. When implementing a CMS, you typically don't only have to deal with such blobs but also with structured information that has internal references (like content fields belonging to a certain page that has links to other pages...). SQL databases provide an easy way to access structured information; files on your filesystem do not (except of course simple hierarchical structures that can be represented with folders).
So if you wanted to store the structured data of your cms in files, you'd have to use a file format that allows you to save the internal references of your data, e.g. XML. But that means that you would have to parse those files, which is not only a lot of work but also makes the process of accessing the data slow again.
In short, use MySQL
Use a database and you get lots of important properties "for free" from the beginning, instead of reinventing them in suboptimal ways if you go the filesystem way. If you don't want to be constrained to MySQL only, you can make use of e.g. the database abstraction layer of the Doctrine project.
Additionally you have tools like phpMyAdmin for easy lookup or manipulation of your data, versus just a text editor.
Keep in mind that the results of your database queries can almost always be cached in memory or even in the filesystem, so you get the benefit of easier management with well-known tools and similar performance.
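A minimal sketch of that caching idea, assuming a posts table and a serialize()-based file cache (the cache path, TTL, and query are assumptions):

```php
<?php
// Cache a query result in a file for a few minutes so the page doesn't
// hit MySQL on every request. All names and values are illustrative.
function cached_posts(PDO $pdo, int $ttl = 300): array
{
    $cacheFile = sys_get_temp_dir() . '/posts.cache';

    // Serve from cache while it is fresh.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return unserialize(file_get_contents($cacheFile));
    }

    // Otherwise query the database and refresh the cache file.
    $posts = $pdo->query('SELECT id, title, content FROM posts ORDER BY id DESC')
                 ->fetchAll(PDO::FETCH_ASSOC);

    file_put_contents($cacheFile, serialize($posts), LOCK_EX);
    return $posts;
}
```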
When it comes to minor modifications of website contents (e.g. fixing a typo or updating external links), I find it much easier to connect to the server using SSH and use various tools (text editors, grep, etc.) on files, rather than having to use the CMS interface to update each file manually (our CMS has such an interface).
Yet there are several questions to analyze and answer, as mentioned above: do you plan for scalability, concurrent modification of data, etc.?
No, it will not be worth it.
And there is no advantage to using the filesystem over a database unless you are the only user on the system (in which case the advantage would be lost anyway). As soon as the transactions start rolling in and updates cascade to multiple pages and multiple files, you will regret that you didn't use the database from the beginning :)
If you are set on using caching, experiment with some of the existing frameworks first. You will learn a lot from it. Maybe you can steal an idea or two for your CMS?

flat-file database php application

I'm creating an app that will rely on a database, and I have every intention of using a flat-file db. Are there any serious reasons to stay away from this?
I'm using mimesis (http://mimesis.110mb.com)
it's simpler than using mySQL, which I have to admit I have little experience with.
I'm wondering about the security of the db, but the files are stored as PHP and it seems to be a solid database solution.
I really like the ease of backing up and transporting the databases, which I have found harder with mySQL. I see that everyone seems to prefer the mySQL way - and it likely is faster when it comes to queries - but other than that, is there any reason to stay away from flat-file dbs and (finally) properly learn mySQL?
edit
Just to let people know,
I ended up going with mySQL, and am using the CodeIgniter framework. Still like the flat file db, but have now realized that it's way more complex for this project than necessary.
Use SQLite, you get a database with many SQL features and yet it's only a single file.
Greetings, I'm the creator of Mimesis. Relational databases and SQL are important in situations where you have massive amounts of data that need to be handled. Are flat files superior to relational databases? Well, you could ask Google, as their entire archiving system works with flat files, and it's the most popular search engine on Earth. Does Mimesis compare to their system? Likely not.
Mimesis was created to solve a particular niche problem. I only use free websites for my online endeavors. Plenty of free sites offer the ability to use PHP. However, they don't provide free SQL database access. Therefore, I needed to create a database that would store data, implement locking, and work around file permissions. These were the primary design parameters of Mimesis, and it succeeds on all of those.
If you need an idea of Mimesis's speed: if you navigate to the first page, it will tell you what country you're viewing the site from. This free database is taken from the site ip2nation.com and ported into a Mimesis ffdb. It has hundreds if not thousands of entries.
Furthermore, the hit counter on the main page has already tracked over 7000 visitors. These are UNIQUE visits, which means that the script has to search the database to see if the IP address that's visiting already exists, and also performs a count of the total IPs.
If you've noticed, the main page loads up pretty quickly, and it has two fairly intensive Mimesis database scripts running on the backend. The way Mimesis stores data is designed to speed up read, write, and translation procedures. Most ffdb example scripts or other ffdb scripts out there use a simple CSV file or some other such structure for storing data. Mimesis actually interprets binary data at some levels to augment its functionality. Mimesis is somewhat of a hybrid between a flat-file database and a relational database.
Most other ffdb scripts involve rewriting the COMPLETE file every time an update is made. Mimesis does not do this, it rewrites only the structural file and updates the actual row contents. So that even if an error does occur you only lose new data that's added, not any of the older data. Mimesis also maintains its history. Unless the table is refreshed the data that rows had previously is still contained within.
I could keep going on about all the features, but this isn't intended as a "Mimesis is the greatest database ever" rant. Rather, it's intended to open people's eyes to the fact that SQL isn't the ONLY technology available, and that flat files, when given proper development paradigms, are superior to a relational database, taking into account that they are more specialized.
Long live flat files and the coders who brave the headaches that follow.
The answer is "Fine" if you only NEED a flat-file structure. One test: Would a single simple spreadsheet handle all needs? If not, you need a relational structure, not a flat file.
If you're not sure, perhaps you can start flat-file. SQLite is a great app for getting started.
It's not good to learn you made the wrong choice if you only figure it out too far along in the process. But if you understand the importance of a relational structure, and upsize early on if needed, then you'll be fine.
"I really like the ease of backing up and transporting the databases, which I have found harder with mySQL."
Use SQLite as mentioned in another answer. There is only one file to backup, or set up periodic dumps of the MySQL databases to SQL files. This is a relatively simple thing to do.
"I see that everyone seems to prefer the mySQL way - and it likely is faster when it comes to queries"
Speed is definitely a consideration. Databases tend to be a lot faster, because the data is organized better.
"other than that is there any reason to stay away from flat-file dbs and (finally) properly learn mysql?"
There sure are plenty of reasons to use a database solution, but there are arguments to be made for flat files. It is always good to learn things other than what you "usually" use.
Most decisions depend on the application. How many concurrent users are you going to have? Do you need transaction support?
Wanted to inform that Mimesis has moved from the original URL to http://mimesis.site11.com/
Furthermore, I am shifting the focus of Mimesis from an ffdb to a key-value store. It's more sensible given the types of information I'm storing and the methods I use to retrieve it. There was also a grave error present in the coding of Mimesis (which I've since fixed). However, I'm still in the testing phase of the new key-value store type. I've also been side-tracked by other things. Locking has also been changed from using file creation to directory creation as the mutex mechanism.
Interoperability. MySQL can be interfaced with from basically any language that counts. Mimesis is unlikely to be usable outside PHP.
This becomes significant the moment you try to use profilers, or modify data from the outside.
You might also look at http://lukeplant.me.uk/resources/flatfile/ for the PHP Flatfile Package.
The issue with going flat-file is that in order to adapt to further development you have to alter a significant amount of code to improve the foundation of the system, whereas a pure SQL system would require little to no modification to move forward.

MySQL vs File Databases

So I'm going to be working on a home-made blog system in PHP, and I was wondering which way of storing data is the fastest. I could go in the MySQL direction, or I could go with my own little way of doing it, which is storing all of the information (encoded in JSON) in files.
Which way would be the fastest, MySQL or JSON files?
For a small, single user 'database', a file system would likely be quicker - as the size and complexity grows, a database server like MySQL or SQL Server is hard to beat.
I would definitely choose a DB option (as you need to be able to search and index stuff). But that does not mean you need a fully realized separate DB service.
MySQL is definitely the more scalable solution.
But the downside is you need to set up and maintain a separate service.
On the other hand, there are DBs that are file-based and still give you access with standard SQL - SQLite (sqlite.org) jumps to mind. You get the advantages of SQL but you do not need to maintain a separate service. The disadvantage is that they are not as scalable.
I would choose a MySQL database - simply because it's easier to manage.
JSON is not really a format for storage; it's for sending data to JavaScript. If you want to store data in files, look into XML or serialized PHP (which I suspect is what you are after, rather than JSON).
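To show the difference in practice, here's a small hedged sketch writing one made-up post to a file as serialized PHP and as JSON (the paths and data are illustrative only):

```php
<?php
// serialize() round-trips PHP values exactly; json_encode() works too but
// was designed for exchanging data rather than as a native PHP storage format.
$post = ['id' => 1, 'title' => 'Hello', 'content' => 'First post', 'posted' => time()];

// Serialized PHP
file_put_contents('posts/1.ser', serialize($post));
$fromSer = unserialize(file_get_contents('posts/1.ser'));

// JSON
file_put_contents('posts/1.json', json_encode($post));
$fromJson = json_decode(file_get_contents('posts/1.json'), true);
```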
Forgive me if this doesn't answer your question very directly, but since it is a home-cooked blog system, is it really worth spending time right now thinking about which storage backend is faster?
You're not going to be looking at 10,000 concurrent users from day 1, and it doesn't sound like it will need to scale to any meaningful degree in the foreseeable future.
Why not just stick with MySQL as a sensible choice rather than a fast one? If you really want some sense that you designed for speed, maybe bolt SQLite on instead.
Since you are thinking you may not have the need for a complex relational structure, this might be a fun opportunity to try something more down the middle.
Check out CouchDB, it is a document-based, schema free database (yet still indexable). The database is made of documents that contain named fields (think key-value pairs).
Have fun....
Though I don't know for certain, it seems to me that a MySQL database would be a lot faster, especially as the amount of data gets larger and larger.
Also, using MySQL with PHP is super easy, especially if you use an abstraction class like ezSQL. ezSQL makes working with a database really simple and I think you'd be creating more unnecessary work for yourself by going the home-brewed JSON direction.
I've done both. I like files for very simple problems and databases for complicated problems.
For file solutions, note these problems as the number of files increases:
1) Much more disk space is used than you might expect, because even tiny files use up a whole block. Blocks are fairly large on filesystems which support large drives.
2) Most filesystems get very slow when the number of files in a directory gets very large. My solution to this (assuming the names of the files are reasonably spread out across the alphabet) is to create a directory consisting of the first two letters of the filename. Thus, the file "animal.txt" would be found at an/animal.txt. This works surprisingly well. If your filenames are not reasonably well-distributed across the alphabet, use some sort of hashing function to create the directories. Sounds a little crazy, but this can work very, very well, and I've used it for very fast solutions with tens of thousands of files.
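A small sketch of that fan-out scheme; the directory layout and function name are made up for illustration:

```php
<?php
// Put each file in a subdirectory named after the first two characters of
// its name, falling back to an md5 prefix when names aren't well distributed.
function shard_path(string $baseDir, string $filename, bool $useHash = false): string
{
    $key    = $useHash ? md5($filename) : strtolower($filename);
    $bucket = substr($key, 0, 2);            // e.g. "an" for "animal.txt"
    $dir    = $baseDir . '/' . $bucket;

    if (!is_dir($dir)) {
        mkdir($dir, 0775, true);              // create the bucket on demand
    }
    return $dir . '/' . $filename;
}

echo shard_path('data', 'animal.txt');        // data/an/animal.txt
```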
But the file solutions really only fit sometimes. Unless you have a great reason to go with files, use a database.
This is really cool. It's a PHP class that controls a flat-file database with queries http://www.fsql.org/index.php
For blogs, I recommend caching the pages, because blogs usually have only static content. This way, the queries only get run once, when the cache is built. You can update the cached pages when a new blog post is added.
