I am designing a web application that will do three things:
1) store some data
2) make these data available for users to view
3) from time to time add/remove/change some data
Looks pretty simple, but I would like to minimize usage of server resources by avoiding MySQL and PHP. My main goal is to deliver a plain HTML file to the user: posts1.html (posts2.html, posts3.html..., where 1, 2, 3 are the page numbers of the data).
Normally, I would create a posts.php file which would query the database, but my data change only three to five times a day, so that would be a huge waste.
Instead, I thought about caching these data, which would spare a lot of server resources, but that would still involve some PHP code.
My other idea is to create a script that regenerates all the HTML files after every change in the database and then replaces the old ones. But what if someone requests a page that is being replaced at that very moment? It may cause errors, the user may get an incomplete file, etc.
However, there is one solution: I could store the generated HTML files in two directories (A and B) and use .htaccess to switch between them, something like this:
RewriteEngine On
# during even hours serve from /A, during odd hours from /B
RewriteCond %{TIME_HOUR} [02468]$
RewriteRule ^posts(\d+)\.html$ /A/posts$1.html [L]
RewriteRule ^posts(\d+)\.html$ /B/posts$1.html [L]
That would give me enough time to update all the files.
I would love to hear what you think about it and what you would do.
If you don't want to use a full-blown MySQL server, use SQLite. It's part of PHP and very lightweight. Then add caching where appropriate. Your other approaches sound like a waste of time to me: too much effort for too little gain. SQLite plus caching is tried and tested.
Besides, you should not worry about wasting resources unless you are running short of them. Your application doesn't sound like it needs scaling at this point, so build the simplest thing that will work.
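For illustration, a minimal sketch of the SQLite route using PHP's PDO driver; the posts.db file, the posts table, and the page size are all assumptions:

<?php
// Open (or create) a file-based SQLite database - no server process needed.
$db = new PDO('sqlite:' . __DIR__ . '/posts.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Fetch one page of posts; the table layout is a placeholder.
$perPage = 20;
$page = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$stmt = $db->prepare('SELECT title, body FROM posts ORDER BY id DESC LIMIT :lim OFFSET :off');
$stmt->bindValue(':lim', $perPage, PDO::PARAM_INT);
$stmt->bindValue(':off', ($page - 1) * $perPage, PDO::PARAM_INT);
$stmt->execute();
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);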
If you have to take the static-pages approach, then put all those files into a symlinked folder. Create a script that generates the static pages into a new folder (either via cron or a manual trigger) and then switches the symlink from the old folder to the new one. This way you don't have to worry about people hitting your site while it's generating content.
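A rough sketch of that swap in PHP, assuming the web server serves from /var/www/current and relying on rename() over an existing symlink being atomic on POSIX filesystems; generate_pages() is a hypothetical stand-in for your page builder:

<?php
// Build the new static pages into a fresh, timestamped folder.
$new = '/var/www/builds/' . date('YmdHis');
mkdir($new, 0755, true);
generate_pages($new); // hypothetical: writes posts1.html, posts2.html, ...

// Point a temporary symlink at the new folder, then atomically
// rename it over the live one, so no request ever sees a half-built site.
@unlink('/var/www/current.tmp');
symlink($new, '/var/www/current.tmp');
rename('/var/www/current.tmp', '/var/www/current');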
You could use SQLite with ADOdb (or any other supported database) and implement caching. See the ADOdb compatibility list: http://phplens.com/lens/adodb/docs-adodb.htm#drivers. The caching feature is really powerful, and ADOdb is well known and well documented.
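A minimal sketch of ADOdb's result caching, with the include path, driver name, database path, and cache lifetime all assumed:

<?php
include 'adodb/adodb.inc.php';         // path to your ADOdb installation (assumption)
$ADODB_CACHE_DIR = '/tmp/adodb_cache'; // where cached result sets are written

$db = NewADOConnection('sqlite3');
$db->Connect('/path/to/posts.db');     // for SQLite the file path is the connection

// Serve this result from the file cache for up to an hour
// before actually hitting the database again.
$rs = $db->CacheExecute(3600, 'SELECT * FROM posts ORDER BY id DESC');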
I've just finished a basic PHP file that lets indie game developers / application developers store user data, handle user logins, self-deleting variables, etc. It all revolves around storage.
I've made systems like this before, but I always hit the max_user_connections issue, which I personally can't change at the moment, as I use a friend's hosting, and free hosting providers often limit max_user_connections anyway. This time, I've made the system fully text-file based (each file holding JSON structures).
The system works fine currently, as it's being tested by only me and another 4-5 users per second. The PHP script basically opens a text file (based on query arguments), uses json_decode to convert the contents into the relevant PHP structures, then alters the data and writes it back to the file. Again, this works fine at the moment, as there are few users on the system, but I believe that if two users attempted to alter a single file at the same time, the one who writes last would overwrite the data the previous user wrote.
Using SQL databases always seemed to handle queries quite slowly, even basic queries. Should I try to implement some form of server-side caching system, or possibly a file-write stacking system? Or should I just attempt to bump up max_user_connections and make it fully SQL based?
Are there limits to the number of users that can READ text files per second?
I know game / application / web developers must create optimized PHP storage solutions all the time, but what are the best practices in dealing with traffic?
It seems most hosting companies set max_user_connections to a fairly low number to begin with. Is there any way to alter this from within the PHP file?
Here's the current PHP file, if you wish to view it:
https://www.dropbox.com/s/rr5ua4175w3rhw0/storage.php
And here's a forum topic showing the queries:
http://gmc.yoyogames.com/index.php?showtopic=623357
I did plan to release the PHP file, so developers could host it on their own site, but I would like to make it work as well as possible, before doing this.
Many thanks for any help provided.
Dan.
I strongly suggest you not re-invent the wheel. There are many options available for persistent storage. If you don't want to use SQL consider trying out any of the popular "NoSQL" options like MongoDB, Redis, CouchDB, etc. Many smart people have spent many hours solving the problems you are mentioning already, and they are hard at work improving and supporting their software.
Scaling a MySQL database service is outside the scope of this answer, but if you want to throttle up what your database service can handle you need to move out of a shared hosting environment in any case.
"but I believe if two users attempted to alter a single file at the same time, the person who writes to it last will overwrite the data that the previous user wrote to it."
- that is for sure: without locking, the last write wins, and two simultaneous writes can even interleave and corrupt the file. You can guard against this with PHP's flock().
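A minimal sketch of a locked read-modify-write in PHP; the file name and the modification are placeholders:

<?php
$fp = fopen('userdata.json', 'c+'); // read/write, create if missing, no truncate
if (flock($fp, LOCK_EX)) {          // block until we hold an exclusive lock
    $raw  = stream_get_contents($fp);
    $data = $raw === '' ? array() : json_decode($raw, true);

    $data['coins'] = 100;           // hypothetical modification

    ftruncate($fp, 0);              // replace the old contents in place
    rewind($fp);
    fwrite($fp, json_encode($data));
    fflush($fp);
    flock($fp, LOCK_UN);            // release so the next writer can proceed
}
fclose($fp);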
"Are there limits to the number of users that can READ text files per second?"
- no, but it is pointless to open the same file multiple times just to read it. Content like that should be cached, for example in memory or behind a content delivery network.
"I know game / application / web developers must create optimized PHP storage solutions all the time, but what are the best practices in dealing with traffic?"
- usually a database will do a better job than files, starting with the fact that frequently-run SELECTs are served from RAM, while frequently-read .txt files are not. As @oliakaoil said, read about the DB differences and see what you need.
I have a website that lets each user create a webpage (to advertise his product). Once a page is created, it will never be modified again.
Now, my question: is it better to keep the page content (only a few parts are editable) in a MySQL database and generate the page with queries every time it is accessed, or to create a static webpage containing all the info and store it on the server?
If I store every page on disk, I may end up with around 200,000 files.
If I store each page in the MySQL database, I would have to run a query each time a page is requested, and with around 200,000 entries and 5-6 queries/second I think the website will be slow...
So what's better?
MySQL will be able to handle the load if you design the tables properly (normalized and indexed). But if the content of a page doesn't change after creation, it's better to cache the page statically. You can organize the files into buckets (folders) so that no single folder holds too many files; see the sketch below.
Remember to cache only the content areas and not the templates, unless each user has complete control over how his or her page looks.
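One possible bucketing scheme, assuming pages are identified by a numeric ID; the cache root and two-level layout are assumptions:

<?php
// Derive a two-level bucket from a hash of the page ID, e.g.
// page 12345 -> cache/82/7c/12345.html, so no directory grows too large.
function cache_path($pageId) {
    $hash = md5((string)$pageId);
    return sprintf('cache/%s/%s/%d.html',
        substr($hash, 0, 2), substr($hash, 2, 2), $pageId);
}

$path = cache_path(12345);
if (!is_dir(dirname($path))) {
    mkdir(dirname($path), 0755, true);
}
file_put_contents($path, $renderedHtml); // $renderedHtml: the generated page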
200,000 files writable by the Apache process is not a good idea.
I recommend using a database.
Database imports/exports are easier, to say nothing of the difference in maintenance costs.
Databases use caching and, if nothing has changed, can return the last result without running the query again. (This doesn't hold; thanks, JohnP.)
If you want to redesign your webpage some time later, you should be using MySQL to store the pages, as you can't really change static pages after generating them (unless you dig into regexps).
About the time issue: it's not an issue if you set your indexes up right.
If the data is small to moderate, prefer static hardcoding, i.e. putting the data directly in the HTML; but if it is huge, computational, or dynamic and changing, you have no option but to use a database connection.
I believe that a proper caching technique with the right attributes (a long expiry time) would be better than static pages or retrieving everything from MySQL every time.
Static content is usually a good thing if you have a lot of traffic, but 5-6 queries a second is not hard for the database at all, so with your current load it doesn't matter.
You can spread the static files to different directories by file name and set up rewrite rules in your web server (mod_rewrite on Apache, basic location matching with regexp on Nginx and similar on other web servers). That way you won't even have to invoke the PHP interpreter.
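For example, a hedged Apache sketch that maps a flat URL onto two-letter subdirectories; the URL scheme and directory layout are assumptions:

RewriteEngine On
# /p/acme-widgets is served from /pages/ac/acme-widgets.html
RewriteRule ^p/(([a-z]{2})[a-z0-9-]*)$ /pages/$2/$1.html [L]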
A database and proper caching. 200,000 pages times, what, 5 KB? That's 1 GB. Easy to keep in RAM. Besides, 5-6 queries per second is easy for a database. Program first, then benchmark.
// insert quip about premature optimisation
I am in the planning stages of writing a CMS for my company. I find myself having to choose between saving page contents in a database or in folders on a file system. I have learned that PHP performs admirably well reading and writing to file systems, way better in fact than running SQL queries. But when it comes to saving pages and their data on a file system, there will be a lot more involved than just reading and writing. Since pages will be drawn using a PHP class, the data for each page will be just data, no HTML, so a parser for the files would have to be written. Also, I doubt that all the data for a page would be saved in just one file; it would rather be saved in one directory, with content boxes and data in separate files.
All of this would be so much easier with MySQL, so here is what I want to ask you experts:
Will all the extra dilly-dallying with file-system saving outweigh its speed and resource advantage over MySQL?
Thanks for your time.
Go for MySQL. I'd say the only time you should think about using the file system is when you are storing files (BLOBs) of several megabytes; databases (at least the ones you typically use with a PHP website) are generally less performant at storing that kind of data. For the rest I'd say: always use a relational database. (Assuming you are dealing with data that has relations, of course; if it is random data, there is not much benefit in using a relational database. ;-)
Addition: if you define your own file structure, and even your own way of cross-referencing files, you've already started building a 'database' yourself. That is not bad in itself (it might be loads of fun!), but you probably will not get the performance benefits you're looking for unless your situation is radically different from the other 80% of 'standard' websites on the web (a couple of pages with text and images on them). (If you are building google/youtube/flickr/facebook... you've got a different situation, and developing your own unique storage solution starts making sense.)
Things to consider:
race conditions on file writes if two users edit the same piece of content
distributing files across multiple servers as the CMS grows; replication latency will cause data-integrity problems
search performance: grepping files across many directories is very slow
too many files in the same directory will hurt server performance, especially on Windows
Assuming you have a low-traffic, single-server environment here…
If you expect to ever have to manage those entries outside of the CMS, my opinion is that it's much, much easier to do so with existing tools than with database access tools.
For example, there's huge value in being able to use awk, grep, sed, sort, uniq, etc. on textual data. Proxying that through a database makes this hard but not impossible.
Of course, this is just opinion based on experience.
S
Storing data on the filesystem may be faster for large blobs that are always accessed as one piece of information. When implementing a CMS, you typically don't only have to deal with such blobs but also with structured information that has internal references (like content fields belonging to a certain page that links to other pages...). SQL databases provide an easy way to access structured information; files on your filesystem do not (except, of course, for simple hierarchical structures that can be represented with folders).
So if you wanted to store the structured data of your CMS in files, you'd have to use a file format that lets you save the internal references of your data, e.g. XML. But that means you would have to parse those files, which is not only a lot of work but also makes accessing the data slow again.
In short: use MySQL.
Use a database and you get lots of important properties "for free" from the beginning, instead of reinventing them in suboptimal ways, as you would if you went the filesystem way. If you don't want to be constrained to MySQL only, you can make use of e.g. the database abstraction layer of the Doctrine project.
Additionally, you have tools like phpMyAdmin for easy lookup or manipulation of your data, versus a text editor.
Keep in mind that the results of your database queries can almost always be cached in memory or even in the filesystem, so you get the benefit of easier management with well-known tools and similar performance.
When it comes to minor modifications of website contents (e.g. fixing a typo or updating external links), I find it much easier to connect to the server over SSH and use various tools (text editors, grep, etc.) on the files, rather than having to use the CMS interface to update each file manually (our CMS has such an interface).
Yet there are several questions to analyze and answer, mentioned above: do you plan for scalability, concurrent modification of data, etc.?
No, it will not be worth it.
And there is no advantage to using the filesystem over a database unless you are the only user on the system (in which case the advantage would be lost anyway). As soon as the transactions start rolling in and updates cascade to multiple pages and multiple files, you will regret not having used the database from the beginning :)
If you are set on using caching, experiment with some of the existing frameworks first. You will learn a lot from it. Maybe you can steal an idea or two for your CMS?
First of all, the website I run is hosted, and I don't have access to install anything interesting like memcached.
I have several web pages displaying HTML tables. The data for these tables is generated using expensive and complex MySQL queries. I've optimized the queries as far as I can and put indexes in place to improve performance. The problem is that if my site gets high traffic, the MySQL server gets hammered and struggles.
Interestingly, the data within the MySQL tables doesn't change very often. In fact, it changes only after a certain 'event' that takes place every few weeks.
So what I have done now is this:
Save the generated HTML table to a file
When the URL is accessed, check whether a saved file exists
If the file is older than one hour, run the query and save a new file; otherwise, output the file
This ensures that for the vast majority of requests the page loads very fast, and the data is at most one hour old. For my purposes this isn't too bad.
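In code, that scheme looks roughly like this; the cache path and render_table() are placeholders:

<?php
$cache = 'cache/table.html';
$ttl   = 3600; // one hour

if (is_file($cache) && time() - filemtime($cache) < $ttl) {
    readfile($cache);           // fresh enough: serve the saved HTML
} else {
    $html = render_table();     // hypothetical: runs the expensive queries
    file_put_contents($cache, $html, LOCK_EX);
    echo $html;
}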
What I would really like is a guarantee that if any data changes in the database, the cache file is deleted. This could be done by finding every script that runs a modifying query against the table and adding code to remove the cache file, but that's flimsy: all future changes would also have to remember this mechanism.
Is there an elegant way to do this?
I don't have anything but vanilla PHP and MySQL (recent versions) - I'd like to play with memcached, but I can't.
Ok - serious answer.
If you have any sort of database abstraction layer (hopefully you do), you could maintain a field in the database for the last time anything was updated, and manage it from a single point in your abstraction layer.
e.g. (pseudocode): On any update set last_updated.value = Time.now()
Then compare this to the time of the cached file at runtime to see if you need to re-query.
If you don't have an abstraction layer, create a wrapper function for every SQL update call that does this, and always use the wrapper for future functionality; a sketch follows.
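A rough PDO-based sketch of that wrapper; the one-row meta table and its last_updated column are assumptions:

<?php
// Route all writes through this wrapper so the bookkeeping can't be forgotten.
function db_update(PDO $db, $sql, array $params = array()) {
    $stmt = $db->prepare($sql);
    $stmt->execute($params);
    $db->exec('UPDATE meta SET last_updated = NOW()'); // assumed one-row table
    return $stmt->rowCount();
}

// At render time: if the cache file predates last_updated, re-query.
$lastUpdated = strtotime($db->query('SELECT last_updated FROM meta')->fetchColumn());
$stale = !is_file($cacheFile) || filemtime($cacheFile) < $lastUpdated;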
"There are only two hard things in Computer Science: cache invalidation and naming things." —Phil Karlton
Sorry, doesn't help much, but it is sooooo true.
You have most of the bases covered, but a last_modified field and a cron job might help.
There's no way of deleting files from within MySQL itself; Postgres would give you that facility (through server-side procedural languages), but MySQL won't.
You can cache your output to a string using PHP's output buffering functions. Google it and you'll find a nice collection of websites explaining how this is done.
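A small sketch of that technique; the cache file name is a placeholder:

<?php
ob_start();                // start capturing everything echoed below

// ... generate and echo the HTML table here ...

$html = ob_get_contents(); // grab the captured output as a string
ob_end_flush();            // and still send it to the browser
file_put_contents('cache/table.html', $html, LOCK_EX);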
I'm wondering, however: how do you know that the data expires after an hour? Or are you assuming the data won't change dramatically enough in 60 minutes to warrant constant page regeneration?
So I'm going to be working on a homemade blog system in PHP, and I was wondering which way of storing data is the fastest. I could go in the MySQL direction, or I could go with my own little way of doing it, which is storing all of the information (encoded as JSON) in files.
Which way would be the fastest, MySQL or JSON files?
For a small, single-user 'database', a file system would likely be quicker; as the size and complexity grow, a database server like MySQL or SQL Server is hard to beat.
I would definitely choose a DB option (as you need to be able to search and index stuff). But that does not mean you need a fully realized, separate DB service.
MySQL is definitely the more scalable solution.
But the downside is you need to set up and maintain a separate service.
On the other hand, there are DBs that are file-based and still give you access through standard SQL; SQLite (SQLite.org) jumps to mind. You get the advantages of SQL, but you do not need to maintain a separate service. The disadvantage is that they are not as scalable.
I would choose a MySQL database - simply because it's easier to manage.
JSON is not really a format for storage; it's for sending data to JavaScript. If you want to store data in files, look into XML or serialized PHP (which I suspect is what you are after, rather than JSON).
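For instance, a small sketch of the serialized-PHP option, with the file name as a placeholder:

<?php
$post = array('title' => 'Hello', 'body' => 'First post');

// Write the native PHP serialization of the structure...
file_put_contents('posts/1.dat', serialize($post), LOCK_EX);

// ...and read it back later without writing any custom parser.
$post = unserialize(file_get_contents('posts/1.dat'));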
Forgive me if this doesn't answer your question very directly, but since it is a home-cooked blog system, is it really worth spending time right now thinking about which storage backend is faster?
You're not going to be looking at 10,000 concurrent users from day one, and it doesn't sound like it will need to scale to any meaningful degree in the foreseeable future.
Why not just stick with MySQL as a sensible choice rather than a fast one? If you really want some sense that you designed for speed, maybe bolt SQLite on instead.
Since you are thinking you may not have the need for a complex relational structure, this might be a fun opportunity to try something more down the middle.
Check out CouchDB; it is a document-based, schema-free database (yet still indexable). The database is made of documents that contain named fields (think key-value pairs).
Have fun....
Though I don't know for certain, it seems to me that a MySQL database would be a lot faster, especially as the amount of data gets larger and larger.
Also, using MySQL with PHP is super easy, especially if you use an abstraction class like ezSQL. ezSQL makes working with a database really simple, and I think you'd be creating more unnecessary work for yourself by going the home-brewed JSON direction.
I've done both. I like files for very simple problems and databases for complicated problems.
For file solutions, note these problems as the number of files increases:
1) Much more disk space is used than you might expect, because even tiny files use up a whole block, and blocks are fairly large on filesystems that support large drives.
2) Most filesystems get very slow when the number of files in a directory gets very large. My solution to this (assuming the names of the files are reasonably spread out across the alphabet) is to create a directory from the first two letters of the filename; thus the file animal.txt would be found at an/animal.txt. This works surprisingly well. If your filenames are not reasonably well distributed across the alphabet, use some sort of hashing function to create the directories. It sounds a little crazy, but it can work very, very well, and I've used it for very fast solutions with tens of thousands of files. A sketch follows.
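A minimal sketch of that layout; the data/ root and the hash fallback are assumptions:

<?php
// animal.txt -> data/an/animal.txt; hash-bucket names that don't
// start with two letters so every file still lands somewhere sane.
function file_path($name) {
    $prefix = strtolower(substr($name, 0, 2));
    if (!preg_match('/^[a-z]{2}$/', $prefix)) {
        $prefix = substr(md5($name), 0, 2);
    }
    return "data/$prefix/$name";
}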
But the file solutions really only fit sometimes. Unless you have a great reason to go with files, use a database.
This is really cool: a PHP class that implements a flat-file database with queries: http://www.fsql.org/index.php
For blogs, I recommend caching the pages, because blogs usually have mostly static content. That way the queries only run once, when the cache is filled; you can update the cached pages whenever a new blog post is added.