I'm building the database structure for a portal and I have some doubts related to elements that I decided aren't going to be stored in the database, typically media and specifically images.
Suppose that we have contents and every content could have a main image. Also, there is a slideshow with featured contents that need big images from the contents. An intuitive idea is leave the DB without this task and store the images with a name convention. Then, in the code (php), I could check if the file exists and then act as desired (asking to upload the image for the slideshow, showing a default image or a map instead of the needed main image...). The other extreme is storing the filename in the database, and other option is use the file name convention but store in the database a boolean instead of checking for the existence in the code.
I'm interested on the subjective perspective, but I would really like to know if there are best practices for this situation based on technical and objective reasons, or simply for practical reasons...
Store the image filename in the database with each content record. This is the most flexible option because you can easily change the selected image by updating the database record.
Suppose you add some sort of backend/admin area to manage the content. To change the content's main image you can show a dropdown of files in the images folder (and a file upload option) and easily update the record to the chosen image.
If you want a slideshow of content images, you can simply select the image filenames from the table and output <img /> tags pointing to the images.
If you do it without the database, by using a naming convention e.g. content-image-{contentId}.jpg then to change the image you would need to be renaming/deleting files and you would need to cater for different image file extensions.
I do not store images in the database. Instead I store them in a separate folder on disk and maintain a table with name, size, mimetype etc.
My practical reasons for not storing them in the database:
I use mysqldump and then a editor if I want to make changes in the db structure. That is easier without all the binary data inside the dump.
My database server runs on a fast 128GB SSD SATA 600 disk for performance. The space is limited. The images folder is mounted from a NAS storage, that is 12TB in size.
When a browser needs a images, it is not loaded with the html, but in a separate request. When delivering html there is no need that the image comes from a lighting fast storage device.
Related
I have a simple image upload form. When someone uploads an image, it is for a football pool, so there always is a $poolid that goes with the image they upload.
Right now, I am naming the uploaded image using the poolid. So for example, if someone uploads an image, it might get named P0714TYER7EN.png.
All the app will ever do is, when it outputs the football pool's page, it will check to see if an image exists for that pool and if so, it will show it. It checks like this:
if (file_exists("uploads/".$poolid.".png")) { //code to show it }
My first thought when planning this was to add a field called "image" in my MYSQL database's table for all the pool information (called pools) and I would store a value of either the image name (P0714TYER7EN.png) or empty if there wasn't one uploaded. Then I would check that field in the database to determine if an image exists or not.
But I realized I don't really need to store anything in the database because I can simply use the PHP file_exists check above to know if there is an image or not.
In other words, it would seem redundant to have a field in the database.
Everything works doing it this way (i.e. NOT having a field in the database) but I'm wondering if this is bad practice for any reason?
If anyone feels that I should absolutely still have a field in the database, please share your thoughts. I just want to do it the proper way.
Thank you.
The approach could depend a lot on what exactly you're trying to do. Seems like the options you would have is:
File System Only
Benefits would be the speed of accessing static files of an image and use of it in your HTML directly which makes it a more simple solution. Also if you're comfortable with using these functions it will be faster to finish.
Drawbacks would be that you're limited to using file_exists and similar. Any code to manage files this way has to be very specific and static. You also can not search or perform operations efficiently on this. In general relying on the file system alone is not a best practice from my experience.
Database Only
Benefits, you can use Blob type as a column with meta data like owner, uploader, timestamp, etc. in the same row. This makes checking for existing files faster as well as any searching or other operations fast and efficient.
Drawbacks, you can't serve files statically using a CDN or even a cookie-less subdomain or other strategies for page performance. You also have to use PHP and MySQL to generate then serve any images via code rather than just referring to the image file directly.
Hybrid
Benefits, basically the same benefits as both above. You can have your metadata in MySQL with a MD5 hash and location of the file available as well. Your PHP then renders the page with a direct link to the file rather than processing the Blob to an image. You could use this in conjunction with a CDN by prefixing or storing the CDN location as well.
Drawbacks, if you manually changed names of files on the server you'd have to rely on a function matching hashes to detect this, though this would also affect a File System Only that needs to detect a duplicate file potentially.
TLDR; the Hybrid approach is what you'll see most software use like WordPress or others and I believe would be considered a best practice while file system only is a bit of a hack.
Note: Database only could be a best approach in specific situations where you want database clustering and replication of images directly in your database rather than to a file system (especially if the file system is restricted access or unable to be modified for any reason, then you have full flexibility on the DB).
You can also use the blob datatypes from mysql. There you can save the image as binary data next to the data about the football pool.
So when you want to load an football pool you simple fire an sql statement and check if it returns a result, if so load the image from the database and display the data, otherwise throw an error.
If you have very frequent access you can simply put the images into a seperate table and load the image independent of the data about the football pool. Additional set some cache headers on the image and put it in a seperate file, this way you could simply save the primary key of the images in football table. Then you want to display the web page you simply load another document, pass it the primary key of the image, there the image will be loaded, or if the browser has it in cache, will load it from cache without querying the database.
This way you also have a better consistency of data and images.
Your uploading an image to specific folder and that too with poolid which will be unique. It should work just fine.
Problem :
The code you have written works great. But the problem is, for the first time if the image loaded is .png and second time loaded file in jpeg or jpg then file exists wont check that and hence it may fail.
Caution :
If you have already taken a caution to check that the image uploaded must and should be png than the file_exists will work great.
Alternate Solution :
In case if your not checking for the image type to be .png then I highly advice you to take a boolean image column in your table by is_image_uploaded or something which can be set once you upload the file every time.
This makes sure that in case next time you wan to upload the image then you can directly go and check in your database table and see that if is_image_uploaded column is set or not. If not set then upload or else ignore or do whatever you want
Lets say I'm building a image gallery using PHP, where users would be able to upload their photos.
Every user would have 1 folder on server side with all their images there.
Now lets say I need to provide information in browser. Users would be able to browse images and should see lots of information about them, like image size, image dimensions, even EXIF information etc.
I could do this in 2 ways:
Save all information about image into database when uploading image.
Use PHP functions to browse through folder, and get information from every image.
I have something like file manager class, that can do all manipulations with files on server side, like deleteDir, deleteFile, countItems, getFileSize, getDirSize.
And it would be easy to only write one more class that would inspect images, and then I could just upload images, and get their information right from the folders without a need for relation database.
And now the question you all been waiting for is: ... :)
What would be faster, first or second solution? Lets say that site gets loads of traffic.
What solution would be better if I want it to be fast, and not to stress server to much?
actually, I got this situation like yours, this is my solution:
Save all information about image into database when uploading image.
Why?
I tested 2 ways:
Using php to get the image info for 1000 times.
Getting image info from database for 1000 times.
And the result is :
Getting image info from database is faster and faster.
Last but not least:
What would you do if you want to do a image info analystics?
If you save all info in database ,you can easily get them and analyse them ,but if you using php to get the info? it's hard to image.
So, just save all information about image into database when uploading image.
Good luck.
storing it in the database once
reading the data from the database and store it in cache,
redoing things always costs especially if it happens all the time
Depending on the size of these images, you probably want to show thumbnails instead of the original when people are browsing, which means you need to generate them. I would generate the thumbnail on upload and grab all the file info. Then save the file info in the database and put the original and thumbnail in the file system. If you get a lot of traffic, throw memcache on there too.
Storing data in separate places has a way of creating maintenance headache. I would just serialize the metadata for images in each folder and dump it to a file there. If you use gzip compression on the file, retrieval and storage should be very fast.
I have no idea how the big websites save the pictures on their servers. Could any one tell me how do they save the pictures that are uploaded by the users in their database?
I was thinking, maybe they would just save the file(the picture) in some path and just save that path in the databse is that right?
But I want to do it this way. Is this right? For example, a website named www.photos.com. When a user uploads a picture I would create a folder of the user name and save those pictures in that folder.
I believe we can create a directory using php file concepts. So when a new user uploads his picture or file, I want to create a directory with his name.
Example: if user name is john, I would create a directory like this on photos.com www.photos.com/john/ and then save all his pictures to this directory when he uploads a picture. Is this the right way to do this?
I have no one here that has good knowledge of saving the files to servers so please let me know how to do this? I want to do it the correct and secure way.
All big websites don't save pictures to the database they store them in the disk.
They save a reference to the picture's position in a table. And then link from there.
Why? Performance.
Pulling heavy content from a database is a huge performance bottleneck. And databases don't scale horizontally that well, so it would mean even a bigger problem. All big sites use static content farms to deal with static content such as images. That's servers who won't care less about your identity.
How do they keep the pictures really private you might ask? They don't.
The picture's link is, in itself, the address and the password. Let's take Facebook, for example. If I store a private picture on my account you should not be able to open it. But, as long as you have the correct address you can.
This picture is private. Notice the filename
10400121_87110566301_7482172_n.jpg
(facebook changes the url from time to time so the link may be broken)
It's non sequential. The only way to get the picture is to know it's address.
Based on a previous user photo you can't guess the next one.
It has a huge entropy so even if you start taking random wild guesses you'll have an extensive amount of failures and, if you do get to a picture, you won't be able to, from there, realize the owners identity which, in itself, is protection in anonymity.
Edit (why you should not store images in a "username" folder:
After your edit it became clear that you do intent to put files on disk and not on the database. This edit covers the new scenario.
Even though your logic (create a folder per user) seams more organized it creates problems when you start having many users and many pictures. Imagine that your servers have 1T disk space. And lets also imagine that 1T is more or less accurate with the load the server can handle.
Now you have 11 users, assume they start uploading at the same time and each will upload more than 100GB of files. When they reach 91GB each the server is full and you must start storing images on a different server. If that user/folder structure is followed you would have to select one of the users and migrate all of his data to a different server. Also, it makes a hard-limit on a user who can't upload more than 1T in files.
Should I store all files in the same folder, then?
No, big-sites generally store files in sequential folders (/000001/, /000002/, etc) having an x defined number of files per folder. This is mainly for file-system performance issues.
More on how many files in a directory is too many?
It is usually a bad idea to store images in your database (if your site is popular). Database is, traditionally, one of main bottlenecks in most any application out there. No need to load it more than necessary. If images are in the filesystem, many http servers (nginx, for example) will serve them most efficiently.
The biggest social network in Russia, Vkontakte does exactly this: store images in the filesystem.
Another big social network implemented a sophisticated scalable blob storage. But it's not available to the public, AFAIK.
Summary of this answer: don't store blobs in the database.
is this the right way to do
Yes.
The only thing I'd suggest to use not name but id.
www.photos.com/albums/1234/ would be okay for starter.
Image management may best be achieved by physically uploading images to the server and then recording file location and image details in a database. Subsequently, a Search Form could be configured to permit the user to do a text search, part number search, or other queries. A PHP script could be written to produce a valid HTML image tag based on data found in the table.
uploading images into a MySQLâ„¢ BLOB field is such a bad idea such image data is generally problematic if the images are much larger than thumbnails. If the images are large, you can end up having to copy/paste one SQL INSERT statement at a time (into phpMyAdmin). If the images are large and the SQL INSERT statement is broken into two lines by your text editor, you'll never be able to restore the image.
What are some ideas out there for storing images on web servers. Im Interacting with PHP and MySQL for the application.
Question 1
Do we change the name of the physical file to a000000001.jpg and store it in a base directory or keep the user's unmanaged file name, i.e 'Justin Beiber Found dead.jpg'? For example
wwroot/imgdir/a0000001.jpg
and all meta data in a database, such as FileName and ReadableName, Size, Location, etc.
I need to make a custom Filemanager and just weighing out some pros and cons of the underlying stucture of how to store the images.
Question 2
How would I secure an Image from being downloaded if my app/database has not set it to be published/public?
In my app I can publish images, or secure them from download, if I stored the image in a db table I could store it as a BLOB and using php prevent the user from downloading it. I want to be able to do the same with the image if it was store in the FileSystem, but im not sure if this is possible with PHP and Files in the system.
Keeping relevant file names can be good for SEO, but you must also make sure you don't duplicate.
In all cases I would rename files to lowercase and replace spaces by underscores (or hyphens)
Justin Beiber Found dead.jpg => justin_beiber_finally_dead.jpg
If the photo's belongs to an article or something specific you can perhaps add the article ID to the image, i.e. 123_justin_beiber_found_dead.jpg. Alternatively you can store the images in an article specific folder, i.e. /images/123/justin_beiber_found_dead.jpg.
Naming the files like a0000001 removes all relevance to the files and adds no value whatsoever.
Store (full) filepaths only in the database.
For part 2;
I'm not sure what the best solution here is, but using the filesystem, I think you will have to configure apache to serve all files in a particular directory by PHP. In PHP you can then check if the file can be published and then spit it out. If not, you can serve a dummy image. This however is not very efficient and will be much heavier on apache.
I'm creating a blog with a featured image on each post. I have a dilemma, I'm unsure what to do with my image data...
Should I insert image data into my MYSQL database using BLOB?
Or should I just create an uploader which makes a directory into the users images folder and upload the photo that way...then just reference it directly in the image field when adding a Blog Post?
Is there a standardised way?
Kind regards,
adam
Upload the files to your server and save the location of the file in your database. Less strain on your DB and your HTTP daemon is better at serving images than MySQL.
The general approach is not to store files in DB, unless you understand why do you need it to be stored there. So, since you are not sure, it's much simplier storing them in upload folder.
But, just in case you decide you need storing files (no matter images or some other) in DB, you have to declare BLOB field and then save it using some BLOB-supporting DB mechanism. 'PHP's MySQLi extension: Storing and retrieving blobs' is a good example of how it can be made
You should store images in folder. Click on below link from where you can get idea how to crop different-different size images and store images name in to database table:
How can I upload images in a normal insert form (MySql)? after upload the image should have three versions of different sizes and different names
convert the image data to base64. This can be done within PHP:
<?
$image=file_get_contents("image.png");
$image=base64_encode($image);
?>
Storing images in a DB is a good idea for secure images.
Always store images, music files etc in system files on disk, and then store the urls to them in the database. That will make it
1) faster
2) easier to configure security settings
3) better in any ways I can imagine
Disadvantage
If file system is corrupted you will have hard time recovering.
You can also use third party Image hosting sites too, you can use Amazon S3 or Mosso Cloud Files.
Problem with file system is it is difficult to scale.
Facebook uses cassandra to store images.
Since it is blog you can store images in filesystem.
Both are valid approaches.
They have different advantages/disadvantages.
Storing it in the database means you need to add extra code to change the image to a representation which will fit inside a INSERT/UPDATE statement (base64 is one approach, and requires equivalent decode, but you could just use mysql_real_escape_string()). Although you can't query the image directly (other than finding exact matches) it may reduce the number of seek and I/O operations required to retrieve the data compared with looking up the path in the database then retrieving the file.
It's also a lot simpler to set up replication of a database compared with setting up replication of the database AND the filesystem if you run on multiple nodes. And there's the issue og keeping filesystem and database backups synchronized.
OTOH, using a filesystem makes your data tables much smaller, and therefore faster to retrieve records from.
which makes a directory into the users images folder
You certainly don't want to allow users to upload content directly into your webserver's document tree - regardless of which route you take, the data should be stored in a location not directly accessible by the webserver but accessible by your code.