Scalable way to store files on server (PHP)? - php

I'm creating my first web application - a really simplistic online text editor.
What I need to do is find the best way to store text based files - a lot of them.
These text files can exceed 10,000 words (text words, not computer words). In essence, I want the text documents to be unlimited in size.
I was thinking about storing the text files in my MySQL database - but thought there was a better way.
Instead I'm planning on storing the text files in an XML-based format in a directory on my server.
The rows in the database define the name of the XML-based text file and the user who created the text, along with basic metadata.
An ID is generated using a V4 GUID generator, which gives the text an id, and the text is stored in the "/store" directory on my server. The text definitions on my server contain this id, and the Android app I'm developing gets the contents of the text file by retrieving the text definition and then downloading the text to the local device using the GUID in the text definition.
I think this is a botch job. How can I improve this system?
There have been cases of GUIDs colliding.
I don't want this to happen. A "slim" possibility isn't good enough - I need to make sure there is absolutely no chance of a GUID collision.
I was planning on checking the database for texts that have the same id before storing the text with a particular id. However, I believe that with over 20,000 pieces of text in my database this would take a long time and put unneeded stress on the server.
How can I make GUID safe?
What happens when a GUID collides?
The server backend is going to be written in PHP.

You've got several questions here, so I'll try to answer them all.
Is XML with GUID the best way to do this?
"Best" is usually subjective. This is certainly one way to do it, but you're probably adding unneeded overhead. If it's just text you're storing, why not put it in the database in a TEXT-type column (MEDIUMTEXT or LONGTEXT in MySQL; varchar(MAX) is the SQL Server equivalent)?
Are GUID collisions possible?
Yes, but the chance of that happening is small. Ridiculously small. There are much bigger things to worry about.
How can I make GUIDs safe?
Stop worrying about them.
What happens when a GUID collides?
This depends on how you're using them. In this case, the old data stored in the location indicated by the GUID would probably be overwritten by the new data.
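If the overwrite case is the real worry, one way to rule it out is to let the filesystem refuse to create a file that already exists. Here is a minimal sketch of that idea. The helper names uuid4() and storeText() and the ".xml" suffix are my own assumptions for illustration; only the "/store" style directory comes from the question.

```php
<?php
// Sketch only: uuid4() and storeText() are hypothetical helpers, not library functions.

// Build a random (version 4) UUID from 16 cryptographically secure random bytes.
function uuid4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set the version nibble to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set the RFC 4122 variant bits
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Write $text to a new file named after a fresh UUID and return that UUID.
// fopen() mode 'x' fails if the file already exists, so a collision leads
// to a retry with a new id instead of overwriting the older document.
function storeText(string $dir, string $text): string
{
    for ($attempt = 0; $attempt < 5; $attempt++) {
        $id = uuid4();
        $handle = @fopen("$dir/$id.xml", 'x');
        if ($handle === false) {
            continue; // name taken (or directory not writable): try another id
        }
        fwrite($handle, $text);
        fclose($handle);
        return $id;
    }
    throw new RuntimeException('Could not create a uniquely named file');
}
```

A UNIQUE index on the id column gives the same kind of guarantee on the database side: the insert fails on a duplicate, and you generate a new id and try again.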

Well, I don't know if I'd use a GUID. I would probably just use the auto_increment key on the DB table and name the files after it, because unless you have deleted records from the DB without cleaning up the filesystem, the names will always be unique. I don't know if the GUID is a requirement on the Android side though.

There's nothing wrong with using MySQL to store the documents!
What is storing them in XML going to provide you with? Adding an additional format layer will only increase the processing time when they are to be read and formatted.
Placing them as files on disk would be no different from storing them in an RDBMS, and in the longer term it would probably cause you further issues (file access, disk seeks, locking and race conditions come to mind).


Store image location in database or generate them on the fly based on id

I've been refactoring some code and throwing away some old spaghetti. I am now faced with the following issue:
I have tv episodes which have a screenshot source file and 4 thumbnails. The current code generates the paths during the creation of the thumbnails and also when they are loaded. So the actual path to the image is never stored anywhere. It is generated based on the database id of the episode (using md5 hashes).
This quickly became a mess. Now I've decided to store the path to the src and all 4 sizes in a simple JSON array and plug it into the database.
The question is whether this has any significant downsides? The entire json string is always between 500 and 550 chars.
Or should I stick to the on the fly generation of the paths and figure out a more maintainable way of doing so?
I think either way is valid, but I find md5 easier to handle, as you don't have to deal with JSON deserialization and variable extraction; you simply create the hash and the file path.
The trade-off may come down to computing several md5 hashes versus storing and decoding several JSON strings.
Just choose the one you like more.
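If you stay with on-the-fly generation, one way to keep it maintainable is to put the rule in a single function that both the thumbnail generator and the loader call. A rough sketch follows; thumbnailPath(), the size names and the /screenshots layout are assumptions of mine, not the asker's actual scheme.

```php
<?php
// Sketch only: the function name, size names and directory layout are hypothetical.

// Return the path of one image variant for an episode, derived from its database id.
function thumbnailPath(int $episodeId, string $size): string
{
    $allowed = ['src', 'small', 'medium', 'large', 'xlarge'];
    if (!in_array($size, $allowed, true)) {
        throw new InvalidArgumentException("Unknown size: $size");
    }
    $hash = md5((string) $episodeId);
    // Use the first two hex characters as a subdirectory to spread files out.
    return sprintf('/screenshots/%s/%s_%s.jpg', substr($hash, 0, 2), $hash, $size);
}
```

With the rule in one place, changing the layout later means editing one function (and moving the files) rather than hunting through the code for every spot that builds a path.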

Database for Content - OK to store HTML?

Basic question is - is it safe to store HTML in a database if I restrict who can submit to it?
I have a pretty simple question. I provide video tutorials and other content. Without spending months writing a proper BBCode parser, I would need to store the HTML so I can have it look exactly the way I want when I grab it from the database.
Basically I plan to store all information in the database about a tutorial series and each episode. I would like to have some formatting for the descriptions for both so I can add multiple paragraphs, ordered and unordered lists, links to required resources, and so on.
I am using PHP and creating my own database. I am using phpMyAdmin to store the information in the table right now. I will use a user with read only rights when I pull the information in the PHP code.
What is the best way to do this? Thank you!
Like others have pointed out there's nothing dangerous about storing HTML in the DB. But when you display it you need to know the HTML is safe. Seeing as you're the only one editing the HTML I see no problem.
However, I wouldn't store HTML at all. If all you need are headings, paragraphs, lists, links, images etc I'd say Markdown is a perfect fit. The benefit with Markdown is that it looks just like normal text (ie you could send your articles as e-mails or save them as txt-documents), it takes up a lot less space than HTML and you don't have to change it once HTML gets updated.
From the security point of view it is not less secure to store your HTML in a database than storing it anywhere else - if you are the only author of that HTML. But then again if other people can author HTML in your website then it doesn't matter where you store it - only how you sanitize it and how and where you display it.
Now whether or not it is an efficient way to store HTML is a completely different matter. If I were you I would use some decent templating system and store HTML in files.
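Storing HTML code is fine. But if it is not from a trusted source, you need to check it and allow only a secure subset of markup. The HTML Tidy library can help clean up the markup, though for filtering out unsafe markup a dedicated sanitizer such as HTML Purifier is the safer choice.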
Also, you need to allow for future changes in the website design, so do not use too much markup, only basic tags. To make it look the way you want, use global CSS rules and semantically named classes in the markup.
But even better is to use Markdown or another wiki-like syntax. There are nice JS editors for Markdown with real-time preview (like the one here at Stack Overflow), and you can avoid HTML altogether.
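For text that does not come from a trusted author, the simplest safe default is to escape it on output so that any markup is shown as text rather than interpreted. A minimal sketch, where renderUntrusted() is a made-up name:

```php
<?php
// Sketch only: renderUntrusted() is a hypothetical helper.

function renderUntrusted(string $text): string
{
    // Escape &, <, >, " and ' so the browser shows markup instead of running it,
    // then insert <br /> before newlines to keep line breaks visible.
    return nl2br(htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'));
}
```

Calling renderUntrusted('<script>alert(1)</script>') returns the string &lt;script&gt;alert(1)&lt;/script&gt;, which the browser displays literally. This throws away all formatting, so it only suits plain-text fields; allowing a subset of tags needs a real sanitizer.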
My initial answer to "should I store html in a db" is generally no. Sure it's safe if you know what you're storing, but are you really considering best practices when you ask only that question? The true answer is "It depends".
I'm sure there are things like WordPress that store HTML in a database. However, as a professional website designer, I like to remember the Separation of Concerns principle. How reusable is HTML stored in your database for a mobile app? Is your back end now in charge of display as well as data? Do you have many implementation possibilities for a front end, or are you now stuck with whatever the back end portrays? What if you want it in a different color and you've stacked ul within ul within ul? How easy is the CSS styling now? How easy is it to change or update that HTML?
I could be wrong, but even Sitecore and Kentico may store an html template in a database somewhere, but the data associated with that html template is a model, not directly on the html template.
So, when you are considering this question, you may want to store your models one place and your templates another, that way when you say "hey, lets build a mobile app" you can grab your data and go, rather than creating yet another table to store the same data.
I made a really big mistake by storing text data in MongoDB GridFS with compression and using mongodump for daily backups. The GridFS store holds 1GB of text files, but after each backup memory usage rises, sometimes by 1GB a day; after one month it was using 20GB of memory because of how this backup is made.
In MongoDB you should take a snapshot of the data folder rather than run mongodump. The likely reason is that mongodump copies unused data from disk into memory and then makes the BSON dump. In my case, text that had not been used for a long time should never have been loaded into memory. I think this is how the backup works, because even now my MongoDB uses 200MB of RAM, and after running mongodump it can rise to 3GB.
So I think the best solution is to use a filesystem for storing HTML files, as even a RAID controller like the PERC H700 has many useful caching features, including read-ahead. But it has some limitations, such as network access, and in my experience some data became corrupted over time and I needed to run chkdsk to repair it, as many GB of data were added or removed daily. You should also consider using proper RAID features such as write-through, which prevents data loss on power failure.
SQLite is not designed to be used with extremely big data, so you shouldn't use it, and it lacks many caching features.
A less-than-perfect solution is to use MariaDB, or a custom caching script in Node.js that can use memcached or a Linux ramdisk with maybe 1GB of hot cache. Using an internal Node.js caching mechanism can produce many memory leaks after some time. So I can use it for network connections while I/O uses filesystem locks, and the most-used "hot" files can be programmed to be cached in RAM or just left as is.

Is it wise to store base64 encoded images inside a database?

I'm making an Android application which takes a photo and pushes the image (as a base64 encoded string) to a PHP script. From there I'll be storing data about the image inside a MySQL database.
Would it be wise to store the image inside the database (since it's passed as a base64 string), or would it be better to convert it back to an image and store it on the filesystem?
A base64 encoded image takes too much space (about 33% more than the binary equivalent).
MySQL offers binary types (BLOB, MEDIUMBLOB); use them.
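Either way, the base64 string can be turned back into raw bytes before it is stored. A small sketch of that step, which also shows the size difference; decodeUploadedImage() is a name I made up, and the random bytes just stand in for a real image.

```php
<?php
// Sketch only: decodeUploadedImage() is a hypothetical helper.

// Turn the base64 string sent by the app back into raw bytes.
// Strict mode makes base64_decode() return false on invalid characters.
function decodeUploadedImage(string $base64): string
{
    $binary = base64_decode($base64, true);
    if ($binary === false) {
        throw new InvalidArgumentException('Payload is not valid base64');
    }
    return $binary;
}

$binary  = random_bytes(300);        // stand-in for real image bytes
$encoded = base64_encode($binary);   // what the app would send
echo strlen($binary), ' -> ', strlen($encoded), "\n"; // prints "300 -> 400"
```

The decoded bytes can then go into a BLOB column or a file, whichever you choose.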
Alternatively, most people prefer to store in the DB only a key to a file, which the filesystem will store more efficiently, especially if it's a big image. That's the solution I prefer for the long term. I usually use a SHA1 hash of the file content to form the path to the file, so that nothing is stored twice and it's easy to find the record from the file if I want to. (I use a three-level file tree, the first two levels being made from the first two characters and characters 3 and 4 of the hash respectively, so that no directory has too many direct children.) This is similar to the logic of git's storage.
The advantage of storing them in the DB is that backups are easier to manage, especially while your project is small. The database will offer you a cache, but so will your server and the client; it's hard to decide a priori which will be fastest, and the difference won't be big (I assume you don't have too many concurrent writes).
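A rough sketch of that hash-based layout, to make it concrete. The function names are hypothetical, and using the full hash as the file name at the third level is my assumption; the answer only specifies the first two levels.

```php
<?php
// Sketch only: contentPath() and storeByContent() are hypothetical names.

// Map file content to a path such as "a9/99/a9993e36...": two directory levels
// from hash characters 1-2 and 3-4, then the full hash as the file name.
function contentPath(string $content): string
{
    $hash = sha1($content);
    return substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . $hash;
}

// Store $content under $root and return the relative path to save in the DB.
// Identical content maps to the same path, so it is only written once.
function storeByContent(string $root, string $content): string
{
    $relative = contentPath($content);
    $full = $root . '/' . $relative;
    if (!is_file($full)) {
        $dir = dirname($full);
        if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
            throw new RuntimeException("Cannot create $dir");
        }
        file_put_contents($full, $content, LOCK_EX);
    }
    return $relative;
}
```

The returned relative path (or just the hash) is what goes into the database row.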
I've done it both ways, and every time I come back to code where I stored binary data in a MySQL table I always switch it to filesystem with a pointer in the MySQL table.
When it comes to performance, you're going to be much better off going to the FS as pulling multiple large BLOBs from a MySQL server will tend to saturate its pipe quickly. Usually it's a pipe you don't want clogged.
You could always save the base64_encode($image) in a file and only store the file path in the database, then use fopen() to get the encoded image.
My apologies if I didn't understand the question correctly.
"wise" is pretty subjective, I think. I think it would be wise from a "keep people from directly linking to my images" perspective. Also, it may be helpful as far as if you decide you need to change up dir structures etc.. it might make it easier on you (but this really depends on how you wrote your scripts to begin with..) but other than that... offhand I can't really think of any benefits to doing this.

choose the better option 1) Using files or 2) data direct to database

Well, I am learning web development and currently working with PHP and MySQL. I am a total newbie to database concepts. To be honest, I am not sure when to create a table or when to create a database. I need some suggestions and help from you.
I have a few doubts I'd like cleared up. I am not very familiar with this, but how much security does PHP's file handling provide? Is there any harm in using PHP's file functions?
I want to save some data that the user has entered into a text file on the server. The data might be like some message or something like his information, I wanted to save the data in a file and then save its directory path in the database. Then, when retrieving the data, I would just get the file path from the database and read the data from the text file. Is this a good or bad idea? Or should I save the user data in the database itself?
Similarly, I also want to save the paths of images or pictures in the database and then just put the path in an <img> tag. I have no one here to help me with these questions, so any help is greatly appreciated.
Kindly let me know which way I should choose.
For images and other file-bound resources, it makes sense to store the image on the file system and the path to it in the database. After all, file systems are great for storing files. You can store the file as a binary field in the database, so that's certainly an option. There are pros and cons either way. Personally, I prefer to keep the files on the file system.
I'm not sure where you're going with this:
the data might be like some message or something like his information, I wanted to save the data in a file and then save its directory path in the database
Is there a particular reason why this data needs to be in a file? Or are you just not sure how to store it in a database? If the data is structured and consistently organized then I can't imagine a reason not to keep it in a relational database. (And even if it isn't as structured, I'd probably still look into a database over the file system for this sort of thing.)
Images and other non-relational resources are generally file-bound, so keep them in files. But if you're just storing text in a text file for no other reason than that's what you've always done, I'd recommend using a database.
PHP provides only as much security as the underlying filesystem. But putting files on disk and saving the path in the db is the traditional method.
Files in a database are generally not the best solution. But that's mostly because people talk about storing binaries (e.g. images, zip files, etc...) which would be an opaque blob as far as the database is concerned.
With text files, especially small ones, it'd still be at least somewhat useable by the DB, even if only via SUBSTR() type matching/searching, so this is one case where storing in the DB could make sense.
There is a good rule of thumb here:
If it is something the DB "understands" (such as text), store it in the DB - you might later want fulltext indexing, search, text transformation, whatever.
If not (e.g. images), store it in files.
As with all rules of thumb, this might or might not fit your index finger.

Efficiently storing data

I am trying to create a world application using jQuery (JS) and PHP. I originally tried doing this by using a MySQL database, which didn't work well - the server got overloaded with database queries and crashed.
This time I want to store the data in a text file... maybe use JSON to parse it? How would I do this? The three main things I want are:
The x and y positions are given from the JS. So, in order:
User loads page and picks username
User moves character, the jQuery gets the x and y position
The username, x and y position are sent to a PHP page in realtime using jQuery's $.post()
The PHP page has to find some way to store it efficiently without crashing the database.
The PHP page sends back ALL online users' names and x and y coordinates to jQuery
jQuery moves the character; everyone sees the animation.
Storing the data in a file instead of the MySQL database isn't an option if you want to improve performance. MySQL stores its data in files too, but it uses techniques such as caching and indexes to improve performance.
The fastest way to save and retrieve data on a server is to use RAM as storage. Redis, for example, does that: it keeps all the data in RAM and can back it up to the hard drive to prevent data loss.
However, I don't think the main problem here is MySQL itself. You are probably using it in an inappropriate way, but I can't say exactly, since I don't know how many read and write requests your users generate, what the structure of your tables is, etc.
Text files are not the best performing things on Earth. Use a key-value store like Redis (it has a PHP client) to store them. It should be able to take a lot more beating than the MySQL server.
You can store the data in a text file in CSV (Comma separated values) format.
For example, consider your requirements.
This text file can be stored and read at any time, and you can use the explode() function to separate the values.
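To make that concrete, here is a rough sketch assuming one line per update in the form username,x,y. The line format and both function names are my own guesses at what was meant, not something given in the question.

```php
<?php
// Sketch only: the "username,x,y" line format and both function names are assumptions.

// Append one position update as a CSV line; LOCK_EX stops concurrent
// requests from interleaving their writes.
function savePosition(string $file, string $user, int $x, int $y): void
{
    file_put_contents($file, "$user,$x,$y\n", FILE_APPEND | LOCK_EX);
}

// Read the file back and keep the latest position seen for each user.
function loadPositions(string $file): array
{
    $positions = [];
    if (!is_file($file)) {
        return $positions; // nothing saved yet
    }
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        [$user, $x, $y] = explode(',', $line);
        $positions[$user] = ['x' => (int) $x, 'y' => (int) $y];
    }
    return $positions;
}
```

This assumes usernames contain no commas or newlines, so they would need validating first. The file also grows with every move and is re-read in full on each request, which is why the other answers point towards an in-memory store for realtime use.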
