MySQL vs File Databases

MySQL vs File Databases - php

So I'm going to be working on a home made blog system in PHP and I was wondering which way of storing data is the fastest. I could go in the MySQL direction, or I could go with my own little way of doing it which is storing all of the information (encoded in JSON) in files.
Which way would be the fastest, MySQL or JSON files?

For a small, single user 'database', a file system would likely be quicker - as the size and complexity grows, a database server like MySQL or SQL Server is hard to beat.

I would definately choose a DB option (as you need to be able to search and index stuff). But that does not mean you need a fully realized separate DB service.
MySQL is definitely the more scalable solution.
But the downside is you need to set up and maintain a separate service.
On the other hand there are DBs that are file based and still give you access with standard SQL (SQLite SQLite.org) jumps to mind. You get the advantages of SQL but you do not need to maintain a separate service. The disadvantage is that they are not as scalable.

I would choose a MySQL database - simply because it's easier to manage.
JSON is not really a format for storage, it's for sending data to JavaScripts. If you want to store data in files look into XML or Serialized PHP (which I suspect is what you are after, rather than JSON).

Forgive me if this doesn't answer your question very directly, but since it is a homecooked blog system is it really worth spending time thinking about what storage backend right now is faster?
You're not going to be looking at 10,000 concurrent users from day 1, it doesn't sound like it will need to scale to any maningful degree in the foreseeable future.
Why not just stick with MySQL as a sensible choice rather than a fast one? If you really want some sense that you designed for speed maybe bolt sqlite on instead.

Since you are thinking you may not have the need for a complex relational structure, this might be a fun opportunity to try something more down the middle.
Check out CouchDB, it is a document-based, schema free database (yet still indexable). The database is made of documents that contain named fields (think key-value pairs).
Have fun....

Though I don't know for certain, it seems to me that a MySQL database would be a lot faster, especially as the amount of data gets larger and larger.
Also, using MySQL with PHP is super easy, especially if you use an abstraction class like ezSQL. ezSQL makes working with a database really simple and I think you'd be creating more unnecessary work for yourself by going the home-brewed JSON direction.

I've done both. I like files for very simple problems and databases for complicated problems.
For file solutions, note these problems as the number of files increases:
1) Much more disk space is used than you might expect, because even tiny files use up a whole block. Blocks are fairly large on filesystems which support large drives.
2) Most filesystems get very slow when the number of files in a directory gets very large. My solution to this (assuming the names of the files are reasonably spread out across the alphabet) is to create a directory consisting of the first two letters of the filename. Thus, the file, "animal.txt" would be found at an/animal.txt. This works surprisingly well. If your filenames are not reasonable well-distributed across the alphabet, use some sort of hashing function to create the directories. Sounds a little crazy, but this can work very, very well, and I've used it for very fast solutions with tens of thousands of files.
But the file solutions really only fit sometimes. Unless you have a great reason to go with files, use a database.

This is really cool. It's a PHP class that controls a flat-file database with queries http://www.fsql.org/index.php
For blogs, I recommend caching the pages because blogs usually only have static content. This way, the queries only get run once while caching. You can update the cached pages when a new blog post is added.

Related

Why do forums store posts in a database?

From looking at the way some forum softwares are storing data in a database (eg. phpBB uses MySQL databases for storing just about everything) I started to wonder why they do it that way? Couldn't it be just as fast and efficient to use.. maybe xsl with xslt to store forum topics and posts? Or to at least store the posts in a topic?

There are loads of reasons why they use databases and not flat files. Here are a few off the top of my head.
Referential integrity
Indexes and efficient searching
SQL Joins
Here are a couple more posts you can look at for more information :
If i can store my data in text files and easily can deal with these files, why should i use database like Mysql, oracle etc
Why use MySQL over flatfiles?
Why use SQL database?

But this is exactly what databases have been designed and optimized for, storage and retrieval of data. Using a database allows the forum designer to focus on their problem and not worry about implementing storage as well. It wouldn't make sense to ignore all the work that has been done in the database world and instead implement your own solution. It would take more time, be more buggy, and not run as quickly.

Database engines handle all the problems of concurrency. Imagine that, two users try to write in your forum at the same time. If you store the post in files, the first attempt will lock the file so the second has to wait for the first to finish.
Otherwise if you want to search, it's much faster to do it in database than scanning all the files.
So basically, it's not a good idea to store data wich can be modified by useres simultaneously, and searching is much more efficient in database.

Simply, easy access to data. It's a lot easier to find posts between a date, created by a user, or with certain keywords. You could do all of the above with flat file storage, but this would be IO intensive and slow. If you had the idea of storing each post in its own file, you'd then have the problem of running out of disk space, not because of lack of capacity, but because you'd have consumed all the available inodes.
Software such as this usually has a static caching feature - pages that don't change are written out to static HTML files, and those are served instead of hitting the database.
Mixing static caching with relational DB storage provides the best of both worlds.

Configuration storage setup [file vs. database]

I see programmers putting a lot of information into databases that could otherwise be put in a file that holds arrays. Instead of arrays, they'll use many tables of SQL which, I believe, is slower.
CitrusDB has a table in the database called "holiday". This table consists of just one date column called "holiday_date" that holds dates that are holidays. The idea is to let the user add holidays to the table. Citrus and the programmers I work with at my workplace will prefer to put all this information in tables because it is "standard".
I don't see why this would be true unless you are allowing the user, through a user interface, to add holidays. I have a feeling there's something I'm missing.

Sometimes you want to design in a bit of flexibility to a product. What if your product is released in a different country with different holidays? Just tweak the table and everything will work fine. If it's hard coded into the application, or worse, hard coded in many different places through the application, you could be in a world of pain trying to get it to work in the new locale.
By using tables, there is also a single way of accessing this information, which probably makes the program more consistent, and easier to maintain.
Sometimes efficiency/speed is not the only motivation for a design. Maintainability, flexibility, etc are very important factors.

The main advantage I have found of storing 'configuration' in a database, rather than in a property file, or a file full of arrays, is that the database is usually centrally stored, whereas a server may often be split across a farm of several, or even hundreds of servers.
I have implemented, in a corporate environment, such a solution, and the power of being able to change configuration at a single point of access, knowing that it will immediately be propagated to all servers, without the concern of a deployment process is actually very powerful, and one that we have come to rely on quite heavily.

The actual dates of some holidays change every year. The flexibility to update the holidays with a query or with a script makes putting it in the database the easiest way. One could easily implement a script that updates the holidays each year for their country or region when it is stored in the database.

Theoretically, databases are designed and tuned to provide faster access to data than doing a disk read from a file. In practice, for small to mid-sized applications this difference is minuscule. Best practices, however, are typically oriented at larger scale. By implementing best practices on your small application, you create one that is capable of scaling up.
There is also the consideration of the accessibility of the data in terms of other aspects of the project. Where is most of the data in a web-based application? In the database. Thus, we try to keep ALL the data in the database, or as much as is feasible. That way, in the future, if you decide that now you need to join the holiday dates again a list of events (for example), all the data is in a single place. This segmenting of disparate layers creates tiers within your application. When each tier can be devoted to exclusive handling of the roles within its domain (database handles data, HTML handles presentation, etc), it is again easier to change or scale your application.
Last, when designing an application, one must consider the "hit by a bus principle". So you, Developer 'A', put the holidays in a PHP file. You know they are there, and when you work on the code it doesn't create a problem. Then.... you get hit by a bus. You're out of commission. Developer 'B' comes along, and now your boss wants the holiday dates changed - we don't get President's Day off any more. Um. Johnny Next Guy has no idea about your PHP file, so he has to dig. In this example, it sounds a little trivial, maybe a little silly, but again, we always design with scalability in mind. Even if you KNOW it isn't going to scale up. These standards make it easier for other developers to pick up where you left off, should you ever leave off.

The answer lays in many realms. I used to code my own software to read and write to my own flat-file database format. For small systems, with few fields, it may seem worth it. Once you learn SQL, you'll probably use it for even the smallest things.
File parsing is slow. String readers, comparing characters, looking for character sequences, all take time. SQL Databases do have files, but they are read and then cached, both more efficiently.
Updating & saving arrays require you to read all, rebuild all, write all, save all, then close the file.
Options: SQL has many built-in features to do many powerful things, from putting things in order to only returning x through y results.
Security
Synchronization - say you have the same page accessed twice at the same time. PHP will read from your flatfile, process, and write at the same time. They will overwrite each other, resulting in dataloss.
The amount of features SQL provides, the ease of access, the lack of things you need to code, and plenty other things contribute to why hard-coded arrays aren't as good.

The answer is it depends on what kind of lists you are dealing with. It seems that here, your list consists of a small, fixed set of values.
For many valid reasons, database administrators like having value tables for enumerated values. It helps with data integrity and for dealing wtih ETL, as two examples for why you want it.
At least in Java, for these kinds of short, fixed lists, I usually use Enums. In PHP, you can use what seems to be a good way of doing enums in PHP.
The benefit of doing this is the value is an in-memory lookup, but you can still get data integrity that DBAs care about.

If you need to find a single piece of information out of 10, reading a file vs. querying a database may not give a serious advantage either way. Reading a single piece of data from hundreds or thousands, etc, has a serious advantage when you read from a database. Rather than load a file of some size and read all the contents, taking time and memory, querying from the database is quick and returns exactly what you query for. It's similar to writing data to a database vs text files - the insert into the database includes only what you are adding. Writing a file means reading the entire contents and writing them all back out again.
If you know you're dealing with very small numbers of values, and you know that requirement will never change, put data into files and read them. If you're not 100% sure about it, don't shoot yourself in the foot. Work with a database and you're probably going to be future proof.

This is a big question. The short answer would be, never store 'data' in a file.
First you have to deal with read/write file permission issues, which introduces security risk.
Second, you should always plan on an application growing. When the 'holiday' array becomes very large, or needs to be expanded to include holiday types, your going to wish it was in the DB.
I can see other answers rolling in, so I'll leave it at that.

Generally, application data should be stored in some kind of storage (not flat files).
Configuration/settings can be stored in a KVP storage (such as Redis) then access it via REST API.

If I have the choice, should webpage contents be saved on the file system or in MySQL?

I am in the planning stages of writing a CMS for my company. I find myself having to make the choice between saving page contents in a database or in folders on a file system. I have learned that PHP performs admirably well reading and writing to file systems, way better in fact than running SQL queries. But when it comes to saving pages and their data on a file system, there'll be a lot more involved than just reading and writing. Since pages will be drawn using a PHP class, the data for each page will be just data, no HTML. Therefore a parser for the files would have to be written. Also I doubt that all the data from a page will be saved in just one file, it would rather be saved in one directory, with content boxes and data in separated files.
All this would be done so much easier with MySQL, so what I want to ask you experts:
Will all the extra dilly dally with file system saving outweigh it's speed and resource advantage over MySQL?
Thanks for your time.

Go for MySQL. I'd say the only time you should think about using the file system is when you are storing files (BLOBS) of several megabytes, databases (at least the ones you typically use with a php website) are generally less performant when storing that kind of data. For the rest I'd say: always use a relational database. (Assuming you are dealing with data dat has relations of course, if it is random data there is not much benefit in using a relational database ;-)
Addition: If you define your own file-structure, and even your own way of cross referencing files you've already started building a 'database' yourself, that is not bad in itself -- it might be loads of fun! -- but you probably will not get the performance benefits you're looking for unless your situation is radically different than the other 80% of 'standard' websites on the web (a couple of pages with text and images on them). (If you are building google/youtube/flickr/facebook ... you've got a different situation and developing your own unique storage solution starts making sense)

things to consider
race-condition in file write if two user editing same piece of content
distribute file across multiple servers if CMS growth, latency on replication will cause data integrity problem
search performance, grep on files on multiple directory will be very slow
too many files in same directory will cause server performance especially in windows

Assuming you have a low-traffic, single-server environment here…
If you expect to ever have to manage those entries outside of the CMS, my opinion is that it's much, much easier to do so with existing tools than with database access tools.
For example, there's huge value in being able to use awk, grep, sed, sort, uniq, etc. on textual data. Proxying that through a database makes this hard but not impossible.
Of course, this is just opinion based on experience.
S

Storing Data on the filesystem may be faster for large blobs that are always accessed as one piece of information. When implementing a CMS, you typically don't only have to deal with such blobs but also with structured information that has internal references (like content fields belonging to a certain page that has links to other pages...). SQL-Databases provide an easy way to access structured information, files on your filesystem do not (except of course simple hierarchical structures that can be represented with folders).
So if you wanted to store the structured data of your cms in files, you'd have to use a file format that allows you to save the internal references of your data, e.g. XML. But that means that you would have to parse those files, which is not only a lot of work but also makes the process of accessing the data slow again.
In short, use MySQL

Use a database and you have lots of important properties from the beginning "for free" without inventing them in some suboptimal ways if you go the filesystem way. If you don't want to be constrained to MySQL only you can make use of e.g. the database abstraction layer of the doctrine project.
Additionally you have tools like phpMyAdmin for easy lookup or manipulation of your data versus the texteditor.
Keep in mind that the result of your database queries can almost always be cached in memory or even in the filesystem so you have the benefit of easier management with well known tools and similar performance.

When it comes to minor modifications of website contents (eg. fixing a typo or updating external links), I find it much easier to connect to the server using SSH and use various tools (text editors, grep etc.) on files, rather than I having to use CMS interface to update each file manually (our CMS has such interface).
Yet there are several questions to analyze and answer, mentioned above - do you plan for scalability, concurrent modification of data etc.

No, it will not be worth it.
And there is no advantage to using the filesystem over a database unless you are the only user on the system (in which the advantage would be lost anyway). As soon as the transactions start rolling in and updates cascades to multiple pages and multiple files you will regret that you didn't used the database from the beginning :)
If you are set on using caching, experiment with some of the existing frameworks first. You will learn a lot from it. Maybe you can steal an idea or two for your CMS?

reading from MySQL is faster or reading from a file is faster?

HI
I got a doubt I have seen reading mysql data is slower in case of large tables...I have done lots of optimization but cant get through..
what I am thinking is will it give a better speed if I store data in a piece of file??
off course each data will be a separate file. so millions of data = millions of file. I agree it will consume the disk space... but what about reading process?? is it faster??
I am using PHP to read file...

Reading one file = fast.
Reading many / big files = slow.
Reading singular small entries from database = waste of I/O.
Combining many entries within the database = faster than file accesses.

As long as your tables are properly indexed and as long as you are using those indices (that's right), using a relational DB (like mysql) is going to be much faster, more robust, flexible (insert many buzzwords here), etc.
To examine why your queries' performance does not match your expectations, you can use the explain clause with your selects (http://dev.mysql.com/doc/refman/5.1/en/explain.html).

To answer the topic, yes.
By which I mean that there are so many (unmentioned) factors that it's impossible to unequivocally state that one will be faster than the other every time.

It depends on what kind of data you're storing. Structured data is usually much faster and more flexible/powerful to read using SQL, since that's exactly what its made for. If you want to search, filter, sort or group by a certain attribute, the index structures and optimizations of a DBS are appropriate.
However, when using a DB for storing large files (BLOBs), which contain unstructured data in the sense that you are not going to search, filter, sort or group by any part of the files, then these files just blow up the database size and make it slow. There is an interesting study by Microsoft on this topic (just have to find the link yet). This study is the reason why Microsoft introduced the External BLOB storage in their SQLServer, which basically means what you asked: The BLOBs are saved in files outside the database, because they measured that access is much faster that way.
When storing files (e.g., pictures, videos, documents...) you often have some metadata on the file which you want to be able to use with a structured query language like SQL, while the actual files don't necessarily need to be saved in the database.

Reading from a dbms (MySQL is one) is faster in most cases, because they have built in cache that will keep the data in memory, so next time you try to read the same data, you will not have to wait on the incredible slow hard drive.
A dbms is essentially reading from your hard drive + a cache to speed things up (+ some data sorting algorithms). Remember, your database is stored on your hard drive :)

It depends on a lot of factors, not least of which is what kind of file system you're using. MySQL uses files for storage anyway, so read speed isn't the issue -- the biggest factor will be how fast MySQL can find your data, compared to how fast it can be looked up in your filesystem.
Generally, though, MySQL is quite good about finding data quickly -- after all, that's its purpose in life. So unless you have a really good reason why the FS should be much faster, stick with the DB and check your indexes and such.

By choosing a custom file storage system you will lose the benefits of using a relational database. Also your code might not be easy maintainable.
Nonetheless, there are many who believe that relational databases offer too much complexity at the cost of speed. Have a look at the NoSQL entry in wikipedia and read about possible alternatives.

flat-file database php application

I'm creating and app that will rely on a database, and I have all intention on using a flat file db, is there any serious reasons to stay away from this?
I'm using mimesis (http://mimesis.110mb.com)
it's simpler than using mySQL, which I have to admit I have little experience with.
I'm wondering about the security of the db. but the files are stored as php and it seems to be a solid database solution.
I really like the ease of backing up and transporting the databases, which I have found harder with mySQL. I see that everyone seems to prefer the mySQL way - and it likely is faster when it comes to queries but other than that is there any reason to stay away from flat-file dbs and (finally) properly learn mysql ?
edit
Just to let people know,
I ended up going with mySQL, and am using the CodeIgniter framework. Still like the flat file db, but have now realized that it's way more complex for this project than necessary.

Use SQLite, you get a database with many SQL features and yet it's only a single file.

Greetings, I'm the creator of Mimesis. Relational databases and SQL are important in situations where you have massive amounts of data that needs to be handled. Are flat files superior to relation databases? Well, you could ask Google, as their entire archiving system works with flat files, and its the most popular search engine on Earth. Does Mimesis compare to their system? Likely not.
Mimesis was created to solve a particular niche problem. I only use free websites for my online endeavors. Plenty of free sites offer the ability to use PHP. However, they don't provide free SQL database access. Therefore, I needed to create a database that would store data, implement locking, and work around file permissions. These were the primary design parameters of Mimesis, and it succeeds on all of those.
If you need an idea of Mimesis's speed, if you navigate to the first page it will tell you what country you're viewing the site from. This free database is taken from the site ip2nation.com and ported into a Mimesis ffdb. It has hundreds if not thousands of entries.
Furthermore, the hit counter on the main page has already tracked over 7000 visitors. These are UNIQUE visits, which means that the script has to search the database to see if the IP address that's visiting already exists, and also performs a count of the total IPs.
If you've noticed the main page loads up pretty quickly and it has two fairly intensive Mimesis database scripts running on the backend. The way Mimesis stores data is done to speed up read and write procedures and also translation procedures. Most ffdb example scripts or other ffdb scripts out there use a simple CVS file or other some such structure for storing data. Mimesis actually interprets binary data at some levels to augment its functionality. Mimesis is somewhat of a hybrid between a flat file database and a relational database.
Most other ffdb scripts involve rewriting the COMPLETE file every time an update is made. Mimesis does not do this, it rewrites only the structural file and updates the actual row contents. So that even if an error does occur you only lose new data that's added, not any of the older data. Mimesis also maintains its history. Unless the table is refreshed the data that rows had previously is still contained within.
I could keep going on about all the features, but this isn't intended as a "Mimesis is the greatest database ever" rant. Moreso, its intended to open people's eyes to the fact that SQL isn't the ONLY technology available, and that flat files, when given proper development paradigms are superior to a relational database, taking into account they are more specialized.
Long live flat files and the coders who brave the headaches that follow.

The answer is "Fine" if you only NEED a flat-file structure. One test: Would a single simple spreadsheet handle all needs? If not, you need a relational structure, not a flat file.
If you're not sure, perhaps you can start flat-file. SQLite is a great app for getting started.
It's not good to learn you made the wrong choice, if you figure it out too far along in the process. But if you understand the importance of a relational structure, and upsize early on if needed, then you are fine.

I really like the ease of backing up
and transporting the databases, which
I have found harder with mySQL.
Use SQLite as mentioned in another answer. There is only one file to backup, or set up periodic dumps of the MySQL databases to SQL files. This is a relatively simple thing to do.
I see that everyone seems to prefer
the mySQL way - and it likely is
faster when it comes to queries
Speed is definitely a consideration. Databases tend to be a lot faster, because the data is organized better.
other than that is there any reason to
stay away from flat-file dbs and
(finally) properly learn mysql ?
There sure are plenty of reasons to use a database solution, but there are arguments to be made for flat files. It is always good to learn things other than what you "usually" use.
Most decisions depend on the application. How many concurrent users are you going to have? Do you need transaction support?

Wanted to inform that Mimesis has moved from the original URL to http://mimesis.site11.com/
Furthermore, I am shifting the focus of Mimesis from an ffdb to a key-value store. It's more sensible Given the types of information I'm storing and the methods I use to retrieve it. There was also a grave error present in the coding of Mimesis (which I've since fixed). However, I'm still in the testing phase of the new key-value store type. I've also been side-tracked by other things. Locking has also been changed from the use of file creation to directory creation as the mutex mechanism.

Interoperability. MySQL can be interfaced by basically any language that counts. Mimesis is unlikely to be usable outside PHP.
This becomes significant the moment you try to use profilers, or modify data from the outside.

You might also look at http://lukeplant.me.uk/resources/flatfile/ for the PHP Flatfile Package.

The issue with going flatfile is that in order to adjust the situation for further development you have to alter a significant amount of code in order to improve the foundation of the system. Whereas if it was a pure SQL system it would require little to no modification to proceed in the future.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.