I'm making an Android application which takes a photo and pushes the image (as a base64 encoded string) to a PHP script; from there I'll be storing data about the image inside a MySQL database.
Would it be wise to store the image inside the database (since it's passed as a base64 string), or would it be better to convert it back to an image and store it on the filesystem?
A base64 encoded image takes too much space (about 33% more than the binary equivalent).
MySQL offers binary column types (BLOB, MEDIUMBLOB); use them.
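The roughly 33% figure follows from how base64 works: every 3 input bytes become 4 output characters. A small sketch to check it, where the 300,000-byte buffer is just a stand-in for a photo and `base64_overhead` is a name made up for this example (it assumes a non-zero size):

```php
<?php
// Relative size increase caused by base64-encoding $rawBytes bytes.
function base64_overhead(int $rawBytes): float
{
    // base64 emits 4 output characters for every 3 input bytes (rounded up).
    $encoded = 4 * intdiv($rawBytes + 2, 3);
    return $encoded / $rawBytes - 1.0;
}

$raw = random_bytes(300000);        // stand-in for a ~300 KB photo
$encoded = base64_encode($raw);

printf(
    "raw: %d bytes, base64: %d bytes, overhead: %.1f%%\n",
    strlen($raw),
    strlen($encoded),
    100 * base64_overhead(strlen($raw))
);
```

Decoding the posted string with `base64_decode()` before storing it, in a BLOB or on disk, avoids paying that overhead at rest.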
Alternatively, most people prefer to store in the DB only a key to a file, since the filesystem stores the file itself more efficiently, especially if it's a big image. That's the solution I prefer for the long term. I usually use a SHA1 hash of the file content to form the path to the file, so that nothing is stored twice and it's easy to find the record from the file if I want to. I use a three-level file tree: the first two levels are made from the first two characters and from characters 3 and 4 of the hash, so that no directory has too many direct children. Git's object storage follows similar logic.
The advantage of storing them in the DB is that backups are easier to manage, especially while your project is small. The database will offer you a cache, but so will your server and the client; it's hard to decide a priori which will be fastest, and the difference won't be big (I'm assuming you don't have many concurrent writes).
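The hash-based file layout described above could look roughly like this. It is a sketch, not the exact code I use: the function names and the base directory are assumptions, and error handling is left out.

```php
<?php
// Build a storage path from the SHA-1 of the file content:
// level 1 = hash characters 1-2, level 2 = characters 3-4, file name = the rest.
function content_path(string $bytes): string
{
    $hash = sha1($bytes);   // 40 hex characters
    return substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . substr($hash, 4);
}

// Store the bytes under $baseDir and return the relative path to keep in the DB.
function store_image(string $baseDir, string $bytes): string
{
    $relative = content_path($bytes);
    $target = $baseDir . '/' . $relative;
    if (!is_dir(dirname($target))) {
        mkdir(dirname($target), 0755, true);   // creates both directory levels
    }
    if (!file_exists($target)) {               // identical content is stored only once
        file_put_contents($target, $bytes);
    }
    return $relative;
}
```

With the base64 string from the app, a call might be `store_image('/path/to/images', base64_decode($posted))`, where both the directory and `$posted` are placeholders.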
I've done it both ways, and every time I come back to code where I stored binary data in a MySQL table, I switch it to the filesystem with a pointer in the MySQL table.
When it comes to performance, you're going to be much better off going to the FS as pulling multiple large BLOBs from a MySQL server will tend to saturate its pipe quickly. Usually it's a pipe you don't want clogged.
You could always save the base64_encode($image) in a file and only store the file path in the database, then use fopen() to get the encoded image.
My apologies if I didn't understand the question correctly.
"wise" is pretty subjective, I think. I think it would be wise from a "keep people from directly linking to my images" perspective. Also, it may be helpful as far as if you decide you need to change up dir structures etc.. it might make it easier on you (but this really depends on how you wrote your scripts to begin with..) but other than that... offhand I can't really think of any benefits to doing this.
Related
As far as I know, in a database context a BLOB (Binary Large OBject) is nothing more than the stored binary form of some data. It can reserve gigabytes of space and can store virtually any type of data. But what is it actually used for?
My major is Computer Vision and I'm fairly new to databases and web development. Currently I'm working on a sentiment analysis project and want to collect a large dataset for it, i.e. a huge number of images, and I also want to keep a record of whether an image has been used for the analysis or not. I thought storing the images in a database with a separate column for the access record was the most organized and systematic approach. But everyone I talked with recommends not storing the image as a BLOB in the database, and instead keeping just its URL or name there with the images in a dedicated folder.
Moreover, since a BLOB is just the binary encoding of a file, how would we decode it into an image file? I found code like the following to convert a BLOB value into an image:
echo '<img src="data:image/png;base64,' . base64_encode($image->getimageblob()) . '" />';
echo '<img src="data:image/jpeg;base64,' . base64_encode($image->getimageblob()) . '" />';
But this code is specific to the file extension (and personally I haven't been successful with any of it). Different formats surely use different encodings, so one snippet cannot handle images of every format. My dataset is about the visual content of an image, not its extension, so it contains images in various formats. How can one deal with them using a BLOB?
So the approach of storing just the names in the database and the images in a dedicated folder sounds good, but then what is the use of the database itself? Couldn't we have some renaming mechanism for images via PHP and store them directly in that folder? Why use a database when we can rename images like img_1_accesses_5.png and split the image name to get the ID and the number of times it was accessed?
If a BLOB can store virtually every type of data, why is using one considered so horrible that everyone recommends against it? What is the problem if we directly insert images into the database as BLOBs? And finally, if it is suitable for images, how do we deal with it?
So my question is: how can BLOBs be used effectively, and for which purposes are they suitable?
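On the point about extension-specific snippets: the data URI only needs the right MIME type, and that can be read from the first bytes of the blob itself instead of being hard-coded. A minimal sketch, assuming only PNG, JPEG and GIF matter (both function names are made up for illustration):

```php
<?php
// Guess the MIME type of raw image bytes from their leading "magic" bytes.
// Only three common formats are covered; returns null for anything else.
function sniff_image_mime(string $bytes): ?string
{
    if (strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0) {
        return 'image/png';
    }
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return 'image/jpeg';
    }
    if (strncmp($bytes, 'GIF87a', 6) === 0 || strncmp($bytes, 'GIF89a', 6) === 0) {
        return 'image/gif';
    }
    return null;
}

// Build an <img> tag with a data URI, whatever the original file extension was.
function blob_to_img_tag(string $bytes): string
{
    $mime = sniff_image_mime($bytes) ?? 'application/octet-stream';
    return '<img src="data:' . $mime . ';base64,' . base64_encode($bytes) . '" />';
}
```

If PHP's fileinfo extension is available, `(new finfo(FILEINFO_MIME_TYPE))->buffer($bytes)` recognises many more formats than this hand-written check.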
Quick and dirty answer
The simple answer is: BLOBs smaller than 256KB are more efficiently handled by a database, while a filesystem is more efficient for those greater than 1MB. Of course, this will vary between different databases and filesystems.
There is a Microsoft technical report here: Compare blob and ntfs filesystem. The report is quite old (2006), but I don't think much has changed since then.
Imagine you want to read a file which is stored in a BLOB. You have to send a request to your database software, which then reads the BLOB data that is itself stored on the filesystem. Instead of reading directly from the filesystem, you go through a two-step process. So as your files get bigger, BLOBs will slow down your database a lot. And we all know that speed is key for a database.
Hope that helps.
I've been refactoring some code and throwing away some old spaghetti. I am now faced with the following issue:
I have tv episodes which have a screenshot source file and 4 thumbnails. The current code generates the paths during the creation of the thumbnails and also when they are loaded. So the actual path to the image is never stored anywhere. It is generated based on the database id of the episode (using md5 hashes).
This quickly became a mess. Now I decided I store the path to the src and all 4 sizes in a simple json array and plug it into the database.
The question is whether this has any significant downsides? The entire json string is always between 500 and 550 chars.
Or should I stick to the on the fly generation of the paths and figure out a more maintainable way of doing so?
I think either way is valid, but I find md5 easier to handle, as you don't have to deal with JSON deserialization and variable extraction; you simply create the hash and the file path.
The trade-off may come down to computing several md5 hashes versus storing several pieces of JSON data.
Just choose the one you like more.
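If you stay with on-the-fly generation, keeping the rule in one function that both the thumbnail creator and the loader call avoids the duplication. A sketch only: the size labels, the .jpg extension and the directory layout are all assumptions, since the question doesn't give them.

```php
<?php
// Hypothetical size labels; the question only says there are 4 thumbnails.
const THUMB_SIZES = ['small', 'medium', 'large', 'xlarge'];

// Derive every image path for an episode from its database id alone.
function episode_image_paths(int $episodeId): array
{
    $hash = md5((string) $episodeId);
    $dir = substr($hash, 0, 2) . '/' . $hash;   // two-character bucket, then the full hash
    $paths = ['src' => $dir . '/src.jpg'];
    foreach (THUMB_SIZES as $size) {
        $paths[$size] = $dir . '/' . $size . '.jpg';
    }
    return $paths;
}
```

The same array can be passed through `json_encode()` if you later decide to store the paths in the database after all.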
I am currently involved in a project to create a website which allows users to share and rate images, for their creative media course.
I am trying to find ways to save images to a MySQL database. I know I can save images as blobs, but this won't work as I plan on only allowing users to save high-res images. Therefore, I've tried to find out how to store images in a directory/server folder and store references to the images in the database. An added complication to the matter is that the reference must be saved automatically in a MySQL database table.
Does anyone know how to go about this, or can you point me in the right direction?
Thanks!
I've actually built a similar website (mass image uploader) so I can speak from experience.
Keeping track of the files
Save the image file as-is on disk and save the path to the file in the database. This part should be pretty straightforward.
One disadvantage is that you need a database lookup for every image, but if your table is well optimized (indexes) this should be no real problem.
There are many advantages, such as your files become easily referable and you can add meta data to your files (like number of views).
Filenames
Now, saving files, lots of files, is not immediately straightforward.
If you don't care at all about filenames just generate a random hash like:
$filename = md5(uniqid()); // generate a random hash, mileage may vary
This gets rid of all kinds of filename-related issues like duplicate filenames, unsupported characters etc.
If you want to preserve the filename, store the filename in the database.
If you want your filename on disk to also be somewhat human readable I would go for a mixed approach: partly hash, partly original filename. You will need to filter unsupported characters (like /), and perhaps transliterate similar characters (like é -> e and ß -> ss). Foreign languages such as Chinese and Hebrew can give interesting results, so be aware of that. You could also encode any foreign character (like base64_encode) but that doesn't do much for readability.
Finally, be aware of path length constraints. Filenames and file paths cannot be infinitely long. Windows traditionally limits the full path to 260 characters (MAX_PATH), and most filesystems limit a single filename to about 255 characters.
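The mixed approach (part random, part original filename) might look like the sketch below. The function name, the 8-character prefix, the ASCII whitelist and the 100-character cap are arbitrary choices for illustration; it replaces unsupported characters with an underscore instead of transliterating them.

```php
<?php
// Build an on-disk name: short random prefix + sanitized original name.
function disk_filename(string $original): string
{
    $prefix = bin2hex(random_bytes(4));   // 8 hex characters, guards against duplicate names
    // Replace every run of characters outside a conservative ASCII whitelist
    // (this includes "/" and "\") with a single underscore.
    $safe = preg_replace('/[^A-Za-z0-9._-]+/', '_', $original);
    $safe = trim($safe, '._');            // no leading/trailing dots or underscores
    if ($safe === '') {
        $safe = 'file';                   // the name had no usable characters
    }
    return $prefix . '_' . substr($safe, 0, 100);   // stays well under 255 characters
}
```

The untouched original filename can still go into the database for display.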
Buckets
You should definitely consider using buckets because OSes (and humans) don't like folders with thousands of files.
If you're using hashes you already have a convenient bucket scheme available.
If your hash is 0aa1ea9a5a04b78d4581dd6d17742627
Your bucket(s) can be: 0/a/a/1/e/a9a5a04b78d4581dd6d17742627. In this case there are 5 nested buckets, which means you can expect to have one file in each bucket after 16^5 (~1 million) files. How many levels of buckets you need is up to you.
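Turning a hash into that bucket path takes only a few lines. A sketch, with `bucket_path` being a name made up here and `$levels` assumed to be at least 1:

```php
<?php
// Split the first $levels characters of a hash into nested one-character buckets.
function bucket_path(string $hash, int $levels = 5): string
{
    $buckets = str_split(substr($hash, 0, $levels));   // one directory per character
    return implode('/', $buckets) . '/' . substr($hash, $levels);
}

echo bucket_path('0aa1ea9a5a04b78d4581dd6d17742627'), "\n";
// prints 0/a/a/1/e/a9a5a04b78d4581dd6d17742627
```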
Mime-type
It's also good to keep track of the original file extension / MIME type. If you only have one kind of MIME type (like TIFF) then you don't need to worry about it. Most file formats can be detected easily from the file content, but you don't want to have to rely on that. PNG files, for example, have the letters PNG near the very start of the file (open one with a text editor to see it).
Relative path vs absolute path
I would also recommend saving the relative path to the files, not the absolute path. This makes maintenance much easier.
So save:
0/a/a/1/e/a9a5a04b78d4581dd6d17742627
instead of:
/var/www/wwwdata/images/0/a/a/1/e/a9a5a04b78d4581dd6d17742627
I'm creating my first web application - a really simplistic online text editor.
What I need to do is find the best way to store text based files - a lot of them.
These text files can be past 10,000 words in size (text words not computer words.) in essence I want the text documents to be limitless in size.
I was thinking about storing the text files in my MySQL database - but thought there was a better way.
Instead I'm planning on storing the text files in XML based format in a directory on my server.
The rows in the database define the name of the xml based text file and the user who created the text along with basic metadata.
An ID is generated using a V4 GUID generator , which gives the text an id and stores the text in the "/store" directory on my server. The text definitions in my server contain this id, and the android app I'm developing gets the contents of the text file by retrieving the text definition and then downloading the text to the local device using the GUID in the text definition.
I suspect this is a botch job. How can I improve this system?
There have been cases of GUIDs colliding.
I don't want this to happen. A "slim" possibility isn't good enough - I need to make sure there is absolutely no chance of a GUID collision.
I was planning on checking the database for texts that have the same id before storing the text with a particular id - however, I believe that with over 20,000 pieces of text in my database this would take a long time and put unneeded stress on the server.
How can I make GUID safe?
What happens when a GUID collides?
The server backend is going to be written in PHP.
You've got several questions here, so I'll try to answer them all.
Is XML with GUID the best way to do this?
"Best" is usually subjective. This is certainly one way to do it, but you're probably adding unneeded overhead. If it's just text you're storing, why not put it in the SQL with varchar(MAX)?
Are GUID collisions possible?
Yes, but the chance of that happening is small. Ridiculously small. There are much bigger things to worry about.
How can I make GUIDs safe?
Stop worrying about them.
What happens when a GUID collides?
This depends on how you're using them. In this case, the old data stored in the location indicated by the GUID would probably be overwritten by the new data.
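If the overwrite scenario is what worries you, one cheap safeguard is to create the file in exclusive mode, so that an existing file is never replaced. A sketch, assuming the texts live as files named after the GUID; the function names and the .txt extension are placeholders:

```php
<?php
// Generate a random (version 4) GUID from 16 random bytes.
function guid_v4(): string
{
    $b = random_bytes(16);
    $b[6] = chr((ord($b[6]) & 0x0f) | 0x40);   // set the version to 4
    $b[8] = chr((ord($b[8]) & 0x3f) | 0x80);   // set the RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($b), 4));
}

// Write $text to a new file named after a fresh GUID and return that GUID.
// fopen() mode "x" fails if the file already exists, so a collision cannot overwrite data.
function store_text(string $dir, string $text): string
{
    do {
        $id = guid_v4();
        $path = $dir . '/' . $id . '.txt';
        $handle = @fopen($path, 'x');
    } while ($handle === false && file_exists($path));   // retry only on a real collision
    if ($handle === false) {
        throw new RuntimeException('Cannot create a file in ' . $dir);
    }
    fwrite($handle, $text);
    fclose($handle);
    return $id;
}
```

On the database side, a primary key or unique index on the id column gives the same guarantee, and an indexed lookup stays fast at 20,000 rows.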
Well, I don't know if I'd use a GUID. I would probably just use the auto_increment key on the DB table and name the files after it, because unless you have deleted records from the DB without cleaning up the filesystem, the names will always be unique. I don't know if the GUID is a requirement on the Android side, though.
There's nothing wrong with using MySQL to store the documents!
What is storing them in XML going to provide you with? Adding an additional format layer will only increase the processing time when they are to be read and formatted.
Placing them as files on disk would be no different than storing them in an RDBMS and in the longer-term probably cause you further issues down the line. (File access, disk-seek, locking, race conditions come to mind).
The best way to store images into MySQL is by storing the image location as a character string.
If you need to manipulate the image, then, the best way is to copy the image as a binary.
How can one store images in binary form, and how can we retrieve them? I don't know anything about this technique. Please tell me how to do this.
Don't store images in the database. Store them in the filesystem, then store their relative paths in the database.
I've written some blog posts on this (and have some data from SQL Server):
http://www.atalasoft.com/cs/blogs/loufranco/archive/2007/12/03/images-in-databases-part-i-what-to-store.aspx
http://www.atalasoft.com/cs/blogs/loufranco/archive/2007/12/04/images-in-databases-part-ii-web-images-are-random-access.aspx
http://www.atalasoft.com/cs/blogs/loufranco/archive/2009/10/26/more-on-images-in-databases.aspx
Basically,
Small images are ok to put in a blob
Large images are much better to put on the filesystem
Images in a blob are much easier to manage (transactions, backup, simpler code, access control)
Images on the filesystem will perform much better
Think about pulling some meta-data out of the image and storing in separate columns for filtering and sorting purposes.
Almost every professional enterprise system that needs to deal with a lot of large blobs has some way of putting them on the filesystem. The latest SQL Server even has a field type that will do it automatically (and then it's as easy to program and manage as a blob)
You can use the BLOB data type. Although I agree with @Ignacio Vazquez-Abrams, there are times when storing the image in the DB is best. I have done so in the past with great results. As long as the files are not large, this is a good solution.