I'm managing a website that is built from the ground-up in PHP and uses AJAX extensively to load pages within the site.
Currently, the 'admin' interface I created stores the HTML for each page, the page's title, custom CSS for that page, and some other attributes in JSON in a file directory (e.g. example.com/hi would have its data stored in example.com/json/hi/data.json).
Is this more efficient (performance-wise) than using a database (MySQL) to store all of the content?
EDIT: I understand that direct filesystem access is more efficient than using a database, but I'm not just getting the contents of a file; I need to use json_decode to parse the JSON into an object/array first to work with it. Is the performance hit from that negligible?
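Concretely, the pattern in question is roughly this; a minimal timing sketch (the path and iteration count are made up, not from my actual site):

```php
<?php
// Rough micro-benchmark: raw file read vs. read + json_decode.
// Path and iteration count are illustrative only.
$path = __DIR__ . '/json/hi/data.json';
$iterations = 10000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $raw = file_get_contents($path);
}
$readOnly = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $data = json_decode(file_get_contents($path), true); // decode to assoc array
}
$readAndDecode = microtime(true) - $start;

printf("read only: %.4fs, read + decode: %.4fs\n", $readOnly, $readAndDecode);
```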
Your approach doesn't sound bad; in fact I used a similar solution before and it did the job pretty well. But you might want to consider more powerful storage as soon as your admin grows or the need to separate things arises. There are two ways:
SQL: using a relational database like PostgreSQL or MySQL
NoSQL: which seems to be more of your interest, and here is why.
Considering you're using AJAX for communication and JSON for storage, there's MongoDB: a NoSQL approach to storage that uses JSON syntax for querying. There is even an Interactive Tutorial for it!
When it comes to performance, neither is clearly faster; both engines are usually written in C or C++ for best performance. Nevertheless, for a structure as simple as you describe, there's no faster way than direct file access.
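For illustration, querying that kind of page document from PHP might look like the following sketch; it assumes the mongodb extension plus the mongodb/mongodb Composer library, and the database, collection, and field names are made up:

```php
<?php
// Sketch only: requires the mongodb extension and `composer require mongodb/mongodb`.
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$pages  = $client->cms->pages; // hypothetical database "cms", collection "pages"

// Store (or update) a page document: JSON-like all the way down.
$pages->updateOne(
    ['slug' => 'hi'],
    ['$set' => ['title' => 'Hi there', 'html' => '<p>...</p>', 'css' => '']],
    ['upsert' => true]
);

// Fetch it back for rendering.
$page = $pages->findOne(['slug' => 'hi']);
echo $page['title'];
```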
That depends on what you use that data for.
If you only serve 'static' files, with almost the same data all the time, then even static HTML files are recommended for caching.
If, on the other hand, you process the data and display it in multiple forms (searches, custom statistics, etc.), then it is much better to store it in some kind of DB.
Related
Instead of eval() I am investigating the pros and cons of creating .php files on the fly from PHP code.
Mainly because the generated code should be available to other visitors and for a long period of time, not only for the current session. The generated .php files are created using functions dedicated to that and only that, under highly controlled conditions (no user input will ever reach those code files).
So, performance-wise, how much load is put on the webserver when creating .php files for instant execution using include() later elsewhere, compared to updating a database record and querying a database on every visit?
The generated files will be updated (overwritten) quite frequently, though far less frequently than they will be executed.
What are the other pros/cons? Does the possibility of one user overwriting the code files at the same time as others are executing them call for complicated concurrency handling? A mutex? Is it next to impossible to overwrite the files while visitors are constantly "viewing" (executing) them?
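For concreteness, the overwrite pattern I have in mind is roughly the following sketch (names are made up): write the new code to a temp file and rename() it over the live one. On POSIX filesystems rename() is atomic, so a concurrent include() should see either the old file or the new one, never a torn write. (Opcache adds its own caching layer on top of this, but that is a separate concern.)

```php
<?php
// Sketch: atomic overwrite via temp file + rename() (POSIX semantics assumed).
function overwriteCodeFile(string $target, string $phpCode): void
{
    $tmp = $target . '.' . uniqid('tmp', true);
    file_put_contents($tmp, $phpCode, LOCK_EX);
    rename($tmp, $target); // atomic swap on the same filesystem
}

// Hypothetical generated file:
overwriteCodeFile(__DIR__ . '/generated/page_42.php', "<?php\necho 'hello';\n");
```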
PS. I am not interested in alternative methods/solutions for reaching "the same" goal, like:
Cached and/or saved output buffers, as an alternative, are out of the question, mainly because the output from the generated PHP code is highly dynamic and context-sensitive.
Storing the code as variables in a database and creating dynamic PHP code that can do what is requested based on stored data, mainly because I don't want to use a database as the backend for this feature. I don't ever need to search the data, or query it for aggregation, ranking, or any other data collection or manipulation.
Memcached, APC, etcetera. It's not a caching feature I want.
A stand-alone (non-PHP) server with a custom compiled binary running in memory. Not what I am looking for here, although this alternative has crossed my mind.
EDIT:
Got many questions about what "type" of code is generated. Without getting into details I can say: it's very context-sensitive code. The code is not based on direct user input, but on input in the form of choices, positions and flags, like "closed" objects in relation to other objects. Most code parts are related to each other in many different, but very controlled, ways (similar to linked lists, genetic cells in AI code, etcetera), so querying a database is out of the question. One code file will include one or more others, and so on.
I do the same thing in an application. It generates static PHP code from data in a MySQL database. I store the code in memcached and use eval() to execute it. Only when something changes in the MySQL database do I regenerate the PHP. It saves an awful lot of MySQL reads.
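A minimal sketch of that pattern, assuming the Memcached extension; the key, server, and generator function are made up:

```php
<?php
// Sketch only: cache generated PHP code in memcached and eval() it.

// Hypothetical stub; the real generator would read from MySQL.
function generateCodeFromDatabase(int $pageId): string
{
    return "echo 'page {$pageId}';"; // note: no opening <?php tag for eval()
}

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$code = $mc->get('page_code_42');
if ($code === false) {          // cache miss: regenerate and store
    $code = generateCodeFromDatabase(42);
    $mc->set('page_code_42', $code);
}
eval($code);
```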
I am building a website for a client. The landing page has 4 areas of customizable content. The content is minimal, it's mainly just a reference to an image, an associated link, and an order...so 3 fields.
I am already using a lot of MySQL tables for the other CMS-related aspects, but for this one use, I am wondering if a database table really is the best option. The table would have only 4 records and 3 columns. It's not going to be written to very often, just read from as the landing page loads.
Would I be better off sticking to a MySQL table for storing this minimal amount of information since it will fit into the [programming] workflow easy enough? Or would using an XML file that stores the information be a better way to go?
UPDATE: The end user (who knows nothing about databases) will be going through a web interface I create to choose one of the 4 items they want to update, then uploading an image from their computer, then selecting the link from a list of pages on the site (or offsite). The XML file or Database table will store the location of the image on the server and the link to wrap it in.
A database is the correct solution for storing dynamic data. However, I agree that MySQL sounds like overkill for this situation; it also means one more thing to administer and manage. But using a flat file like XML is a bad idea too. Luckily, SQLite is just the thing.
Let these snippets from the SQLite site encourage you:
Rather than using fopen() to write XML or some proprietary format into disk files used by your application, use an SQLite database instead. You'll avoid having to write and troubleshoot a parser, your data will be more easily accessible and cross-platform, and your updates will be transactional.
Because it requires no configuration and stores information in ordinary disk files, SQLite is a popular choice as the database to back small to medium-sized websites.
Read more here.
As for PHP code to interface with the database, use either PDO or the specific SQLite extension.
PDO is more flexible so I'd go with that and it should work out of the box - "PDO and the PDO_SQLITE driver is enabled by default as of PHP 5.1.0." (reference) Finding references and tutorials is super easy and just a search away.
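For instance, a minimal PDO + SQLite sketch for the four landing-page slots (file name, table, and sample data are made up):

```php
<?php
// Sketch: store the four landing-page slots in a single SQLite file via PDO.
$db = new PDO('sqlite:' . __DIR__ . '/landing.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE IF NOT EXISTS slots (
    position INTEGER PRIMARY KEY,
    image    TEXT NOT NULL,
    link     TEXT NOT NULL
)');

// Upsert one of the four slots (REPLACE works on the primary key).
$stmt = $db->prepare('REPLACE INTO slots (position, image, link) VALUES (?, ?, ?)');
$stmt->execute([1, '/uploads/banner1.jpg', '/about']);

// Read all slots back when the landing page loads.
foreach ($db->query('SELECT position, image, link FROM slots ORDER BY position') as $row) {
    // render $row['image'] wrapped in a link to $row['link'] ...
}
```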
If you use PHP, the SimpleXML library could give you what you need, and using XML in this fashion for something like you describe might be the simplest way to go. You don't have to worry about any MySQL configuration or about the client messing something up. Restoring and maintaining an XML file might be easier too. The worry might be future scalability if you plan to grow the site much. That's my 2 cents anyway.
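For comparison, the XML route with SimpleXML might look like this sketch (file layout and names are made up):

```php
<?php
// Sketch: read and update the four slots from an XML file with SimpleXML.
// Assumed layout of landing.xml:
// <slots>
//   <slot position="1"><image>/uploads/banner1.jpg</image><link>/about</link></slot>
//   ...
// </slots>
$xml = simplexml_load_file(__DIR__ . '/landing.xml');

foreach ($xml->slot as $slot) {
    $position = (int) $slot['position'];   // attribute access
    $image    = (string) $slot->image;     // child element access
    $link     = (string) $slot->link;
    // render the image wrapped in the link ...
}

// Updating: change the object, then write the file back out.
$xml->slot[0]->image = '/uploads/new-banner.jpg';
$xml->asXML(__DIR__ . '/landing.xml');
```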
From looking at the way some forum software stores data in a database (e.g. phpBB uses MySQL for storing just about everything), I started to wonder why they do it that way. Couldn't it be just as fast and efficient to use, say, XML with XSLT to store forum topics and posts? Or at least to store the posts within a topic?
There are loads of reasons why they use databases and not flat files. Here are a few off the top of my head.
Referential integrity
Indexes and efficient searching
SQL Joins
Here are a couple more posts you can look at for more information:
If i can store my data in text files and easily can deal with these files, why should i use database like Mysql, oracle etc
Why use MySQL over flatfiles?
Why use SQL database?
But this is exactly what databases have been designed and optimized for, storage and retrieval of data. Using a database allows the forum designer to focus on their problem and not worry about implementing storage as well. It wouldn't make sense to ignore all the work that has been done in the database world and instead implement your own solution. It would take more time, be more buggy, and not run as quickly.
Database engines handle all the problems of concurrency. Imagine two users trying to write to your forum at the same time. If you store posts in files, the first attempt will lock the file, so the second has to wait for the first to finish.
Also, if you want to search, it's much faster to do it in a database than by scanning all the files.
So basically, it's not a good idea to use flat files for data that can be modified by users simultaneously, and searching is much more efficient in a database.
Simply, easy access to data. It's a lot easier to find posts between a date, created by a user, or with certain keywords. You could do all of the above with flat file storage, but this would be IO intensive and slow. If you had the idea of storing each post in its own file, you'd then have the problem of running out of disk space, not because of lack of capacity, but because you'd have consumed all the available inodes.
Software such as this usually has a static caching feature - pages that don't change are written out to static HTML files, and those are served instead of hitting the database.
Mixing static caching with relational DB storage provides the best of both worlds.
I am in the planning stages of writing a CMS for my company. I find myself having to make the choice between saving page contents in a database or in folders on a file system. I have learned that PHP performs admirably well reading and writing to file systems, way better in fact than running SQL queries. But when it comes to saving pages and their data on a file system, there'll be a lot more involved than just reading and writing. Since pages will be drawn using a PHP class, the data for each page will be just data, no HTML, so a parser for the files would have to be written. Also, I doubt that all the data for a page would be saved in just one file; it would rather be saved in one directory, with content boxes and data in separate files.
All this would be done so much easier with MySQL, so what I want to ask you experts:
Will all the extra dilly-dally of file-system saving outweigh its speed and resource advantage over MySQL?
Thanks for your time.
Go for MySQL. I'd say the only time you should think about using the file system is when you are storing files (BLOBs) of several megabytes; databases (at least the ones you typically use with a PHP website) are generally less performant at storing that kind of data. For the rest I'd say: always use a relational database. (Assuming you are dealing with data that has relations, of course; if it is random data there is not much benefit in using a relational database ;-)
Addition: if you define your own file structure, and even your own way of cross-referencing files, you've already started building a 'database' yourself. That is not bad in itself -- it might be loads of fun! -- but you probably will not get the performance benefits you're looking for unless your situation is radically different from the other 80% of 'standard' websites on the web (a couple of pages with text and images on them). (If you are building google/youtube/flickr/facebook ... you've got a different situation, and developing your own unique storage solution starts making sense.)
Things to consider:
race conditions in file writes if two users edit the same piece of content (see the flock() sketch after this list)
distributing files across multiple servers as the CMS grows; replication latency will cause data-integrity problems
search performance: grepping files across multiple directories will be very slow
too many files in the same directory will hurt server performance, especially on Windows
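As mentioned in the first item, a minimal flock() sketch for guarding concurrent writes (function name and paths are illustrative):

```php
<?php
// Sketch: exclusive lock around a content-file write to avoid torn updates.
function savePageContent(string $path, string $content): bool
{
    $fp = fopen($path, 'c'); // create if missing, do not truncate yet
    if ($fp === false) {
        return false;
    }
    if (!flock($fp, LOCK_EX)) { // blocks until we hold the exclusive lock
        fclose($fp);
        return false;
    }
    ftruncate($fp, 0); // now it is safe to truncate and rewrite
    fwrite($fp, $content);
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
    return true;
}
```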
Assuming you have a low-traffic, single-server environment here…
If you expect to ever have to manage those entries outside of the CMS, my opinion is that it's much, much easier to do so with existing tools than with database access tools.
For example, there's huge value in being able to use awk, grep, sed, sort, uniq, etc. on textual data. Proxying that through a database makes this hard but not impossible.
Of course, this is just opinion based on experience.
Storing Data on the filesystem may be faster for large blobs that are always accessed as one piece of information. When implementing a CMS, you typically don't only have to deal with such blobs but also with structured information that has internal references (like content fields belonging to a certain page that has links to other pages...). SQL-Databases provide an easy way to access structured information, files on your filesystem do not (except of course simple hierarchical structures that can be represented with folders).
So if you wanted to store the structured data of your cms in files, you'd have to use a file format that allows you to save the internal references of your data, e.g. XML. But that means that you would have to parse those files, which is not only a lot of work but also makes the process of accessing the data slow again.
In short: use MySQL.
Use a database and you get lots of important properties 'for free' from the beginning, instead of reinventing them in suboptimal ways as you would on the filesystem route. If you don't want to be constrained to MySQL only, you can use e.g. the database abstraction layer of the Doctrine project.
Additionally, you have tools like phpMyAdmin for easy lookup or manipulation of your data, versus just a text editor.
Keep in mind that the result of your database queries can almost always be cached in memory or even in the filesystem so you have the benefit of easier management with well known tools and similar performance.
When it comes to minor modifications of website contents (e.g. fixing a typo or updating external links), I find it much easier to connect to the server over SSH and use various tools (text editors, grep, etc.) on the files, rather than having to use the CMS interface to update each file manually (our CMS has such an interface).
Yet there are several questions to analyze and answer, mentioned above: do you plan for scalability, concurrent modification of data, etc.?
No, it will not be worth it.
And there is no advantage to using the filesystem over a database unless you are the only user on the system (in which case the advantage would be lost anyway). As soon as the transactions start rolling in and updates cascade to multiple pages and multiple files, you will regret that you didn't use the database from the beginning :)
If you are set on using caching, experiment with some of the existing frameworks first. You will learn a lot from it. Maybe you can steal an idea or two for your CMS?
I'm planning a PHP website architecture. It will be a small website with few visitors and small set of data. The data is modified exclusively by a single user (administrator).
To make things easier, I don't want to bother with a real database or XML data. I am thinking about storing all data in several files through PHP serialization. So, for example, if there are several categories, I will store an array containing a Category class instance for each category.
Are there any pitfalls using PHP serialization in those circumstances?
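For reference, the approach being asked about looks roughly like this sketch (class, fields, and paths are made up; PHP 8 syntax):

```php
<?php
// Sketch: persist an array of Category objects with serialize()/unserialize().
class Category
{
    public function __construct(
        public string $name,
        public string $slug
    ) {}
}

$categories = [
    new Category('News', 'news'),
    new Category('Articles', 'articles'),
];

// Save: one file holds the whole array (LOCK_EX guards concurrent writes).
file_put_contents(__DIR__ . '/data/categories.dat', serialize($categories), LOCK_EX);

// Load: restrict unserialization to the expected class as a safety measure.
$loaded = unserialize(
    file_get_contents(__DIR__ . '/data/categories.dat'),
    ['allowed_classes' => [Category::class]]
);
```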
Use databases -- it is not that difficult, and any extra time spent will be well invested in learning to use them.
The pitfalls I see are as Yehonatan mentioned:
1. Maintenance and adding functionality.
2. No easy way to query or look at data.
3. Very insecure -- take a look at "hackthissite.org". A lot of the beginning examples involve hacking sites where someone put the data hard-coded in files.
4. Serialization will work for one array, meaning one table. If you have to do anything like having parent categories that must match up with other data, it's not going to work so well.
The real pitfalls come with maintenance and adding functionality.
It is a very good way to learn, but you will appreciate databases more after the lessons.
I tried to implement PHP serialization to store website data. For those who want to do the same thing, here's feedback from a project started a few months ago and heavily modified since:
Pros:
It was very easy to load and save data. I don't have to write SQL queries, optimize them, etc. The code is shorter (with parametrized SQL queries, it may grow a lot).
The deployment does not require additional effort. We don't care about what is supported on the web server: if there is just PHP with no additional extensions, database servers, etc., the website will still work. SQLite is a good thing, but it cannot be installed on some servers, and it also requires a PHP extension.
We don't have to care about updating a database server, nor about which database server to use (thus avoiding the scenario where the customer wants to migrate from Microsoft SQL Server to Oracle, etc.).
We can add more properties to the objects without having to break everything (just like we can add other columns to the database).
Cons:
Like Kerry said in his answer, there is "no easy way to query or look at data". This means that any business intelligence/statistics use cases are impossible or require a huge amount of work, and even some basic scenarios become extremely complicated. Let's say we store products and we want to know how many products there are. Instead of just writing select count(1) from Products, in my case it requires creating a PHP file just for that, loading all the data, then counting the items, sometimes adding stuff manually (see the sketch after this list).
Some changes required implementing data migration, which was painful and took more work than just executing an SQL query.
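To make the first con concrete, here is what the counting example above turns into with serialized files (the path is made up):

```php
<?php
// The file-based equivalent of "select count(1) from Products".
$products = unserialize(file_get_contents(__DIR__ . '/data/products.dat'));
echo count($products);
```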
To conclude, I would recommend using PHP serialization for storing data of a small website modified by a single person only if all the following conditions are true:
The deployment context is unknown and there is a chance of getting a server that supports only basic PHP with no extensions,
Nobody cares about business intelligence or similar usages of the information,
There will be no changes to the requirements with large impact on the data structure.
I would say use a small database like SQLite if you don't want to go through setting up a full DB server. However, I will also say that serializing an array and storing it in a text file is pretty dang fast. I've had to serialize an array with a few thousand records (a dump from a database) and used that as a temp database when our DB server was being rebuilt for a few days.