First Some Background
I'm planning out the architecture for a new PHP web application and trying to make it as easy as possible to install. As such, I don't care what web server the end user is running so long as they have access to PHP (setting my requirement at PHP5).
But the app will need some kind of database support. Rather than working with MySQL, I decided to go with an embedded solution. A few friends recommended SQLite - and I might still go that direction - but I'm hesitant since it needs additional modules in PHP to work.
Remember, the aim is ease of installation ... most lay users won't know what PHP modules their server has or even how to find their php.ini file, let alone enable additional tools.
My Current Objective
So my current leaning is to go with a filesystem-based data store. The "database" would be a folder, each "table" would be a specific subfolder, and each "row" would be a file within that subfolder. For example:
/public_html
    /application
        /database
            /table
                1.data
                2.data
            /table2
                1.data
                2.data
There would be other files in the database as well to define schema requirements, relationships, etc. But this is the basic structure I'm leaning towards.
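As a rough sketch of what reading and writing a "row" might look like in this layout (the serialization format and helper names below are purely illustrative):

    // Illustrative only: one serialized PHP array per "row" file.
    function row_path($db, $table, $id) {
        return "$db/$table/$id.data";
    }

    function read_row($db, $table, $id) {
        $path = row_path($db, $table, $id);
        return is_file($path) ? unserialize(file_get_contents($path)) : null;
    }

    function write_row($db, $table, $id, array $row) {
        if (!is_dir("$db/$table")) {
            mkdir("$db/$table", 0775, true);   // create the "table" folder on demand
        }
        // LOCK_EX is a minimal guard against two writers hitting the same row.
        file_put_contents(row_path($db, $table, $id), serialize($row), LOCK_EX);
    }

    write_row('application/database', 'table', 1, array('title' => 'First row'));
    print_r(read_row('application/database', 'table', 1));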
I've been pretty happy with the way Microsoft built their Office Open XML file format (.docx/.xlsx/etc). Each file is really a ZIP archive of a set of XML files that define the document.
It's clean, easy to parse, and easy to understand.
I'd like to actually set up my directory structure so that /database is really a ZIP archive that resides on the server - a single, portable file.
But as the data store grows in size, won't this begin to affect performance on the server? Will PHP need to read the entire archive into memory to extract it and read its constituent files?
What alternatives could I use to implement this kind of file structure but still make it as portable as possible?
SQLite has been enabled by default since PHP 5, so nearly all PHP 5 users should have it.
I think there will be tons of problems with the zip approach; for example, adding a file to a relatively large zip archive is very time-consuming, and I expect horrible concurrency and locking issues.
Reading zip files requires a PHP extension anyway, unless you go with a pure-PHP solution. The downside is that most pure-PHP solutions WILL want to read the whole zip into memory, and will also be way slower than something written in C and compiled, like PHP's zip extension.
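For what it's worth, if the zip extension is available, a single entry can be read without unpacking the whole archive to disk; the archive path and entry name below just mirror the layout from the question:

    // Requires ext/zip; returns the contents of one entry as a string.
    $zip = new ZipArchive();
    if ($zip->open('application/database.zip') === true) {
        $row = $zip->getFromName('table/1.data');
        $zip->close();
    }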
I'd choose another approach, or make SQLite/MySQL a requirement. If you use PDO in PHP, you can allow the user to choose SQLite or MySQL, and your code is no different as far as issuing queries goes. I think 99%+ of webhosts out there support MySQL anyway.
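A minimal sketch of that PDO approach (the DSNs, table, and variable names are just placeholders):

    // Only the DSN differs between backends; the query code stays the same.
    $dsn = $useSqlite
        ? 'sqlite:' . __DIR__ . '/app.db'
        : 'mysql:host=localhost;dbname=app';
    $pdo = new PDO($dsn, $dbUser, $dbPass);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmt = $pdo->prepare('SELECT title FROM posts WHERE id = ?');
    $stmt->execute(array($postId));
    $title = $stmt->fetchColumn();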
Using a real database will also help your performance. It's worth loading the extra modules (and most PHP installations have at least the MySQL module, and probably SQLite as well), because those modules are written in C, run much faster than PHP, and have been optimized for speed. Using SQLite will help keep your web app portable, if you're willing to deal with SQLite's BS.
Zip archives are great for data exchange. They aren't great for fast access, though, and they're awful for rewriting content. Both of these are extremely important for a database used by a web application.
Your proposed solution also has some specific performance issues -- the list of files in a zip archive is stored internally as a "flat" list, so accessing a file by name takes O(n) time in the number of entries in the archive.
Related
I have a PHP, database-driven website that uses a lot of Flash for user interaction.
I need to make it multilingual - something like 20+ languages.
The site is quite large and gets a lot of users every day.
Another developer I work with says we should store the translations in local files, e.g. /lang/english.php, /lang/german.php, etc.
I was thinking that since the database is on the same dedicated server there should not be a slowdown. Which way do you think will be faster?
I don't know if it's an option, but you could also use gettext().
That way your translations are stored in local files (faster than a database), and you have the advantage that there are programs like Poedit (takes some getting used to...) that you or a translator can use to automatically generate the translation files, so it's a bit easier to maintain than PHP files.
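A minimal sketch of the gettext setup (it assumes the gettext extension and compiled .mo files under ./locale/<lang>/LC_MESSAGES/, which is where Poedit's output would go):

    // Pick the locale, point gettext at the translation files, and translate.
    $locale = 'de_DE.utf8';                      // hypothetical locale name
    putenv('LC_ALL=' . $locale);
    setlocale(LC_ALL, $locale);
    bindtextdomain('messages', __DIR__ . '/locale');
    textdomain('messages');

    echo _('Welcome');                           // the German string, if translated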
Local files are a LOT faster than DB content (although you can save the DB output in a local cache, such as files, or even memcache or APC), and probably not that easy to translate, but they will help you with basic speed of implementation too. You should take a look at:
http://framework.zend.com/manual/en/zend.translate.html
You can use just this part of the framework, and it will give you a HUGE boost; it supports DB-based translations as well as local files (a lot of adapters).
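For illustration, the smallest possible Zend_Translate usage, here with the 'array' adapter (the gettext/csv/tmx adapters load local files instead of inline arrays; the strings are made up):

    require_once 'Zend/Translate.php';

    // Adapter, translation data, locale.
    $translate = new Zend_Translate('array', array('Welcome' => 'Willkommen'), 'de');
    echo $translate->_('Welcome');   // "Willkommen"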
UPDATE:
Thanks Corbin, you are right; it's better to have the direct link.
Say we want to develop a photo site.
Would it be faster to upload and download images to and from MongoDB than to store and serve them from disk, since MongoDB can save images and files in chunks and store metadata alongside them?
So for a photo-sharing website, would it be better (faster) to store the images in MongoDB or on a typical server hard disk?
I'm thinking of using PHP with CodeIgniter, by the way, if that changes the performance considerations.
Lightweight web servers (lighttpd, nginx) do a pretty good job of serving content from the filesystem. Since the OS acts as a caching layer they typically serve content from memory which is very fast.
If you want to serve images from MongoDB, the web server has to run some sort of script (Python, PHP, Ruby... via FastCGI of course, you can't start a new process for each image), which has to fetch data from MongoDB each time the image is requested. So it's going to be slower. The benefits are automatic replication and failover if you use replica sets. If you need that and are clever enough to achieve it with the filesystem, then go with that option. If you need a quick implementation that's reliable, then MongoDB might be the faster way to get there. But if your site becomes popular, sooner or later you will have to switch to the filesystem implementation.
BTW: you can mix these two approaches - store the image in MongoDB to get instant reliability, and then replicate it to the filesystem of a couple of servers to gain speed.
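A rough sketch of that mixed approach with the legacy PHP Mongo driver (the database name, field names, and paths are made up):

    // Write the upload to GridFS first (durable, replicated via replica sets)...
    $mongo = new MongoClient();
    $grid  = $mongo->selectDB('photos')->getGridFS();
    $id    = $grid->storeFile($_FILES['photo']['tmp_name'], array(
        'filename' => $_FILES['photo']['name'],
    ));

    // ...then mirror it to the filesystem so the web server can serve it
    // as a plain static file.
    $file = $grid->findOne(array('_id' => $id));
    file_put_contents('/var/www/static/photos/' . $file->getFilename(), $file->getBytes());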
Some test results.
Oh, one more thing: coupling the metadata with the image seems nice until you realize the generated HTML and the image download are two separate HTTP requests, so you have to query Mongo twice - once for the metadata and once for the image.
The MongoDB documentation on when to use GridFS for storing files suggests you should. It also sounds fast and reliable, and it's great for backups and replication. Hope that helps.
Several benchmarks have shown MongoDB is approximately 6 times slower for file storage (via GridFS) versus using the regular old filesystem. (One compared apache, nginx, and mongo)
However, there are strong reasons to use MongoDB for file storage despite it being slower -- #1 free backup from Mongo's built-in sharding/replication. This is a HUGE time saver. #2 ease of admin, storing metadata, not having to worry about directories, permissions, etc. Also a HUGE time saver.
Our photo back-end was written years ago as a huge gob of spaghetti code that did all kinds of stuff (check or create the user dir, check or create date dirs, check for name collisions, set perms), and a whole other mess handled backups.
We've recently changed everything over to Mongo. In our experience, Mongo is a bit slower (it may be 6 times slower, but it doesn't feel like it), and anyway, so what? All that spaghetti is out the window, and the new Mongo + photo code is much smaller, tighter, and simpler in its logic. We're never going back to the filesystem.
http://www.lightcubesolutions.com/blog/?p=209
You definitely do not want to download images directly from MongoDB. Even going through GridFS will be (slightly) slower than reading a simple file from disk. But you shouldn't serve them straight from disk either; neither option is appropriate for delivering image content at high throughput. You'll always need a server-side caching layer for static content between your origin/source (be it Mongo or the filesystem) and your users.
So with that in mind, you are free to pick whatever works best for you, and MongoDB's GridFS provides quite a few features for free that you'd otherwise have to build yourself when working directly with files.
PHP 5.3 has a new feature called PHAR, similar to JAR in Java. It's basically an archive of PHP files. What are its advantages? I can't understand how it can be helpful in the web scenario.
Is there any use other than "ease of deployment" - deploying an entire application by just copying one file?
There are tremendous benefits for open source projects (in no particular order).
Easier deployment means easier adoption. Imagine: You install a CMS, forum, or blog system on your website by dragging it into your FTP client. That's it.
Easier deployment means easier security. Updating to the latest version of a software package will be much less complicated if you have only one file to worry about.
Faster deployment. If your webhost doesn't give you shell access, you don't need to unzip before uploading, which cuts out per-file transfer overhead.
Innate compartmentalization. Files that are part of the package are clearly distinguished from additions or customizations. You know you can easily replace the archive but you need to backup your config and custom templates (and they aren't all mixed together).
Easier libraries. You don't need to figure out how to use the PEAR installer, or find out whether this or that library has a nested directory structure, or whether you have to include X, Y, or Z (in that order?). Just upload, include archive, start coding.
Easier to maintain. Not sure whether updating a library will break your application? Just replace it. Broken? Revert one file. You don't even need to touch your application.
What you see is what you get. Chances are, someone is not going to go to the trouble of fudging with an archive, so if you see one installed on a system you maintain, you can be fairly confident that it doesn't have someone's subtly buggy random hacks thrown in. And a hash can quickly tell you what version it is or whether it's been changed.
Don't pooh-pooh making it easier to deploy things. It won't make any difference for homegrown SaaS, but for anyone shipping or installing PHP software packages it's a game-changer.
In theory it should also improve loading speed. If you have a lot of files that need to be included, replacing them with a single include will save you time on file-opening operations.
In my experience, loosely packaged PHP source files sitting in a production environment invite tinkering with live code when a fix is needed. Deploying in a .phar file discourages this behaviour and helps reinforce better practices, i.e. build and test in a local environment, then deploy to production.
The advantage is mainly ease of deployment. You deploy an entire application by just copying one file.
Libraries can also be used without being expanded.
Any tool that works on a single file "suddenly" works with all files of an application at once.
E.g. transport: You can upload the entire application through a single input/file element without additional steps.
E.g. signing an application: checksum/sign the file -> checksum/signature for the whole application.
...
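For what it's worth, a rough sketch of building a phar and then using it without expanding it (the file and directory names are just illustrative; building requires phar.readonly = 0 in php.ini):

    // build.php - package everything under src/ into a single archive.
    $phar = new Phar('myapp.phar');
    $phar->buildFromDirectory(__DIR__ . '/src');
    $phar->setStub(Phar::createDefaultStub('index.php'));

    // Elsewhere: consume a file from inside the archive via the phar:// wrapper.
    require 'phar://myapp.phar/lib/SomeLibrary.php';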
I am looking for a super-lightweight open-source database engine (it could be a library that mimics one) to be packaged as part of a tiny PHP script distributed to people without sudo access. Basic CRUD, with no need for any complicated implementations like string search, etc.
I found txtSQL (it uses flat files, which I believe is the way to go), but I'm hesitant to use it given when it was last updated (2005-03).
Suggestions anyone?
SQLite gives you a platform-independent file format, is heavily regression-tested, and is widely used. It is also available in PHP via the SQLite3 class.
SQLite is about as light as you can get, and it keeps everything in a single database file.
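For reference, a minimal sketch with the built-in SQLite3 class (the file and table names are arbitrary):

    // The whole "database" lives in one file next to the script.
    $db = new SQLite3(__DIR__ . '/app.sqlite');
    $db->exec('CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)');

    $stmt = $db->prepare('INSERT INTO notes (body) VALUES (:body)');
    $stmt->bindValue(':body', 'hello', SQLITE3_TEXT);
    $stmt->execute();

    $result = $db->query('SELECT id, body FROM notes');
    while ($row = $result->fetchArray(SQLITE3_ASSOC)) {
        echo $row['id'] . ': ' . $row['body'] . "\n";
    }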
The setup is as follows:
A Drupal project, one SVN repo with trunk/qa/production-ready branches, vhosts for every branch, and a post-commit hook that copies files from the repository to the docroots.
The problem is this: a Drupal website often relies not only on source code but also on DB data (node types, their settings, etc.).
I'm looking for a solution to make these changes versionable - not by 'diffing' all the data in the database, but something more like fixtures in unit tests.
Fixture-like scripts with SQL data and files for content, which would be versionable and applied after the main post-commit hook.
Is there anything written for this purpose, or would it be easy to adapt some kind of build tool (like Apache Ant) or unit-testing framework? It would also be great if this tool knew about Drupal, so that in scripts I could do things like variable_set() and drupal_execute().
Any ideas? Or should I start coding right now instead of asking this? :)
It sounds like you've already got some infrastructure there that you've written.
So I'd start coding! There isn't anything that I'm aware of that's especially good for this at the moment. And if there is, I imagine it would take some effort to get it going with your existing infrastructure. So starting to code seems the way to go.
My approach to this is to use sql patch files (files containing the sql statements to upgrade the db schema/data) with a version number at the start of the filename. The database then contains a table with config info in (you may already have this) that includes info on which version the database is at.
You can then take a number of approaches to automatically apply the patch. One would be a script that you call from the postcommit that checks the version the database is at, and then checks to see if the latest version you have a patch for is newer than the version the db is at, and applies it/them (in order) if so.
The DB patch should always finish by updating the aforementioned version number in the config table.
This approach can be extended to include the ability to set up a new database based on a full dump file and then applying any necessary patches to it to upgrade it as well.
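A bare-bones sketch of such a patch runner (the config table, column names, and file layout are placeholders, and each patch file is assumed to hold a single SQL statement):

    // Patches are named like 0001-add-field.sql, 0002-seed-terms.sql, ...
    $pdo     = new PDO('mysql:host=localhost;dbname=drupal', $dbUser, $dbPass);
    $current = (int) $pdo->query("SELECT value FROM config WHERE name = 'schema_version'")
                         ->fetchColumn();

    $patches = glob(__DIR__ . '/patches/*.sql');
    sort($patches);

    foreach ($patches as $patch) {
        $version = (int) basename($patch);       // leading number in the filename
        if ($version > $current) {
            $pdo->exec(file_get_contents($patch));
            $pdo->exec("UPDATE config SET value = $version WHERE name = 'schema_version'");
            $current = $version;
        }
    }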
I did a presentation on this at a recent conference (slideshare link) - I would STRONGLY suggest that you use a site-specific custom module whose .install file contains versioned 'update' functions that do the heavy lifting for database schema changes and settings/configuration changes.
It's definitely superior to keeping .sql files around, because Drupal keeps track of which ones have run and gives you a batch-processing mechanism for anything that requires long-running bulk operations on lots of data.
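A minimal sketch of what one of those update functions might look like, in a hypothetical site-specific module's .install file (Drupal 6-style naming; the actual changes are made-up examples):

    // mysite_deploy.install - Drupal records the highest N that has run,
    // so each update function executes exactly once per site.
    function mysite_deploy_update_6001() {
      $ret = array();
      // Configuration changes can use the normal APIs.
      variable_set('site_frontpage', 'node/welcome');
      // Schema/data changes go through update_sql() so the result is reported.
      $ret[] = update_sql("UPDATE {node} SET promote = 0 WHERE type = 'page'");
      return $ret;
    }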
My approach to this is to use sql patch files (files containing the sql statements to upgrade the db schema/data) with a version number at the start of the filename.
I was thinking of a file (XML or something) with the needed DB structure, and a tool that applies the necessary changes.
And yes, after more research I agree: it will be easier to code it than to adapt some other solution. Though some routines from the SimpleTest Drupal module will be helpful, I think.
You might want to check out the book Refactoring Databases.
The advice I heard from one of the authors is to have a script that will upgrade the database from version to version rather than building up from scratch each time.
Previously: Drupal Source Control Strategy?