I will try to keep this simple without lots of extra information. I have been investigating MongoDB and I believe it will work well for my next project. There is one thing I am fuzzy about though: storing and retrieving files (chunks) from GridFS.
Let's take a CMS for example. If I wanted to display (output via the browser) an image, I would go to my MySQL database, find the key, and pull the metadata, which would include the file's location on the filesystem; then I would display an image tag with that path as its src. I know that I can store image/video/etc. files in Mongo beautifully, and I can retrieve the binary data, but if I wanted to display that file (push it to the browser), would I have to write the contents to a temporary file and then echo my img tag pointing at it? That can't be efficient.
I feel like I'm missing something. For this circumstance, is MongoDB any better?
For clarification: I'm using PHP and Apache on a typical LAMP stack (development, not production) and working on a platform to enable creative collaboration between artists. So I would have several artists collaborating on the same files, and I would like to be able to search inside those files, index them, keep all the metadata together, and employ sharding. It really seems like MongoDB is the way to go.
Thanks!
Michael
If you want to store binary files in the database, you can serve them directly without writing them to a filesystem first. All you have to do is write a PHP script (or whatever) to send the data in response to an incoming HTTP request, with the proper media type in the response HTTP header.
So if we're talking about images, in your HTML you'd do something like this:
<img src="/your_img_script.php?image=1234" />
Then just write your_img_script.php to look up image ID 1234 and dump the data out with an image/jpeg (or whatever) content type header.
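For illustration, here's a minimal sketch of that script, assuming the images live in GridFS and you're using the official mongodb/mongodb PHP library (the database name, metadata field, and error handling are just placeholders):

<?php
// your_img_script.php -- hypothetical sketch using the mongodb/mongodb library.
require 'vendor/autoload.php';

$bucket = (new MongoDB\Client('mongodb://localhost:27017'))
    ->selectDatabase('cms')          // assumed database name
    ->selectGridFSBucket();

try {
    $id     = new MongoDB\BSON\ObjectId($_GET['image']);
    $stream = $bucket->openDownloadStream($id);
    $file   = $bucket->getFileDocumentForStream($stream);

    // Assumes the content type was stored in the file's metadata at upload time.
    header('Content-Type: ' . ($file->metadata->contentType ?? 'image/jpeg'));
    fpassthru($stream);
} catch (MongoDB\GridFS\Exception\FileNotFoundException $e) {
    http_response_code(404);
}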
Related
I have thousands of images (each less than 16 MB) stored as raw binary data in MongoDB, with their metadata (date, time, location, etc., from a small satellite) in JSON-like BSON documents. I have to build a REST API that can query the images together with their metadata. The following points need to be taken into consideration.
Users will query the metadata via the REST API, based on time, location, etc.
The server will receive the client's request, run the query, perform the image processing, and return the images.
Image processing will be done on the server side.
Requested images will travel from the server to the client through the REST API in response to GET requests.
Tools used: MongoDB as the database.
Questions
Which server-side programming language is more feasible: PHP, Python, or Node.js?
How could I do the image processing in this scenario? With which libraries in PHP, Python, or Node.js?
Which technology is best for building a REST API over MongoDB that handles binary data and images?
How will the images travel from server to client, i.e. as binary data, and how will they be rendered on the client side?
NodeJS and Python are equally suited for basic image processing. I would base the server-side language decision more on the expertise of your team and/or your current environment.
With Python, PIL (or its maintained fork, Pillow) is the primary image library to use. For NodeJS I've used https://github.com/aheckmann/gm, with which basic image processing was not difficult.
(and 4) In terms of retrieving images from an API, I typically use a standard REST/CRUD setup to get the metadata and put the image itself one level deeper by adding .../image? to the endpoint. For example:
GET ../picture/<id>
Would return the picture metadata with the image url included.
GET ../picture/<id>/image?<processing params>
Would return the image itself.
If you are building a web application, using the <img> tag with the correct image URL is sufficient for displaying the image.
I would also recommend storing the images on the filesystem directly (not in the database) unless you have a specific reason to store them in the database. It tends to simplify both your storage code and your retrieval code, since you don't need to deal with sending a BLOB to the database. Nginx's upload module (https://www.nginx.com/resources/wiki/modules/upload/), for example, lets you set the file upload location, so you literally just need to get the filename from the headers and copy/rename the file to wherever you want to store it permanently. This also lets you easily remap a URL to a location on the filesystem. The biggest benefit is that it lets the web server worry about the upload and download, and you just have to keep track of the filename in your code.
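To make the endpoint split concrete, here's a hypothetical, framework-free PHP sketch (the ID format, content type, and storage path are assumptions):

<?php
// Hypothetical sketch of the metadata/image endpoint split, in plain PHP.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if (preg_match('#^/picture/(\w+)$#', $path, $m)) {
    // GET /picture/<id>: metadata, with the image URL one level deeper.
    header('Content-Type: application/json');
    echo json_encode(['id' => $m[1], 'imageUrl' => "/picture/{$m[1]}/image"]);
} elseif (preg_match('#^/picture/(\w+)/image$#', $path, $m)) {
    // GET /picture/<id>/image: the binary itself, straight off the filesystem.
    header('Content-Type: image/jpeg');     // the real type should come from metadata
    readfile("/data/images/{$m[1]}.jpg");   // assumed storage path
} else {
    http_response_code(404);
}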
Taking your questions one by one:
Node.js is better, as it has more than 15,000 modules available.
You can do the image processing using Node.js modules such as sharp, Jimp, and many more.
MongoDB is a document-oriented NoSQL database (Big Data ready). It stores data in a JSON-like format and allows users to run SQL-like queries against it, and Node.js is well suited for this purpose.
For transferring images from server to client there are a number of modules as well, WebcamJS being one example.
This is going to sound like an odd request.
I have a PHP script pulling an MP3 stream from SoundCloud and repeating the stream with the correct headers to allow WinAmp to play the file. But WinAmp only shows the local URL the script is running from. Before anyone asks, I am injecting ID3v1 into the file before echoing it.
Is there any way to provide WinAmp with the meta data from php?
Just to clarify, you are effectively proxying an MP3 file from SoundCloud, and you want to embed metadata into it?
Winamp will pick up ID3 tags in an HTTP-served MP3 file. However, if you are using ID3v1, those tags don't exist until the very end of the file. If you want the file to be identified without the player having to download the whole thing, you must use ID3v2 tags, which are typically located at the beginning of the file. (I actually recommend using both ID3v1 and ID3v2 for broader player compatibility, but almost everything supports ID3v2, so it is your choice.)
Now, there is another method, but if you use it the metadata won't be saved in the file when it is downloaded. You can use SHOUTcast-style metadata. Basically, Winamp and other clients (like VLC) send a request header, Icy-MetaData: 1. This tells the server that the client supports SHOUTcast-style metadata. In your server response, you would insert metadata every 8KB or so. Basically, you want the reverse of what I have detailed here: https://stackoverflow.com/a/4914538/362536
In the end, simply adding ID3v2 tags will solve your problem in the best way, but I wanted to mention the alternative option in case you needed it for something else.
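If it helps, here's a hypothetical sketch using the getID3 library's tag writer (getid3_writetags) to stamp both ID3v1 and ID3v2 onto your local copy before you echo it; the file path and tag values are placeholders:

<?php
// Hypothetical sketch: write ID3v1 + ID3v2 with getID3 before serving the file.
require_once 'getid3/getid3.php';
require_once 'getid3/write.php';

$writer = new getid3_writetags();
$writer->filename   = '/tmp/proxied.mp3';            // assumed local copy of the stream
$writer->tagformats = array('id3v1', 'id3v2.3');     // both, for player compatibility
$writer->tag_data   = array(
    'title'  => array('Track Title'),                // placeholder metadata
    'artist' => array('Artist Name'),
);

if ($writer->WriteTags()) {
    header('Content-Type: audio/mpeg');
    readfile($writer->filename);
}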
I need a simple code snippet to upload images to MySQL using PHP... short! And is it possible to upload HTML and CSS files to MySQL? The reason is complicated, but all answers are appreciated! EDIT: Say I have 1000 users, and they each have their own layout for their page. So inside their MySQL record would be an HTML file, a CSS file (possibly), and image(s).
I am a big fan of using the filesystem for storing physical files; I've yet to see any solid reason why they are better off in a database.
To automate this process you could have a shell script called through exec
exec("/home/some/path/my_filesystem_creator.sh ".escapeshellarg($args));
or PHP's native mkdir or anything really. If you went for a structure like:
/common/
/userdirs/1/
/userdirs/2/
essentially all you would need to do is create a user dir and copy the default versions of their site assets (images/CSS/HTML, etc.) into it.
This should be easy enough to manage
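For instance, a hypothetical sketch in plain PHP (the paths are made up):

// Hypothetical sketch: create a per-user directory and seed it with defaults.
$userId  = 42;                                   // assumed numeric user id
$userDir = '/var/www/userdirs/' . (int)$userId;

if (!is_dir($userDir)) {
    mkdir($userDir, 0755, true);                 // recursive, like `mkdir -p`
    foreach (glob('/var/www/common/defaults/*') as $file) {
        copy($file, $userDir . '/' . basename($file));
    }
}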
Are you asking how to store a file in the database?
http://www.php-mysql-tutorial.com/wikis/mysql-tutorials/uploading-files-to-mysql-database.aspx
Or do you need to know how to upload a file to your web server in order to display it in a PHP/MySQL website?
Your page would be faster if you generated a directory on your filesystem for each user and stored their CSS/JS/image files there.
The reason for this is that when you want to output your images to the browser, you would need to establish a separate DB connection for each file (since each one is a separate HTTP request to a PHP script that selects the image).
You might want to take a look at http://mysqldump.azundris.com/archives/36-Serving-Images-From-A-Database.html and http://hashmysql.org/index.php?title=Storing_files_in_the_database before doing that. Storing files in mysql is generally considered a bad idea.
Just use different CSS rules for each user. Create the CSS dynamically through PHP based on user-specific variables. For example, if they have a div with an avatar or some other personal image, just create a class that uses variables for images, and then you really only need one or two files at most to do the whole thing. I would use a heredoc, but you could just use quotation marks to integrate the PHP.
PHP creates your CSS:
.useravatar { background: url(<?= $baseurl . $useridpic ?>); }
In the HTML, the div just needs the class 'useravatar' and never needs to be changed.
Here's a bit of history first: I recently finished an application that allows me to upload images and store them in a directory; it also stores the information about each file in a database. The database stores the location and name and gives each file an ID (auto_increment).
Okay, so what I'm doing now is allowing people to insert images into posts. I'm throwing a few ideas around on the best way to do this: since the application I designed allows people to move files around, I don't want images in posts to break if an image is moved to a different directory (hence the storing of IDs).
What I'm thinking of doing is when linking to images, instead of linking to the file directly, I link it like so:
<img src="/path/to/functions.php?method=media&id=<IMG_ID_HERE>" alt="" />
So it takes the ID, searches the database, then from there determines the mime type and what not, then spits out the image.
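For illustration, the media branch of functions.php would be something like this minimal sketch (the table and column names are invented):

<?php
// Hypothetical media branch of functions.php; assumes an `images` table
// with columns id, path, and mime_type.
$pdo  = new PDO('mysql:host=localhost;dbname=cms', 'user', 'pass');
$stmt = $pdo->prepare('SELECT path, mime_type FROM images WHERE id = ?');
$stmt->execute([(int)($_GET['id'] ?? 0)]);

if ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    header('Content-Type: ' . $row['mime_type']);
    readfile($row['path']);
} else {
    http_response_code(404);
}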
So really, my question is: Is this the most efficient way?
Note that on a single page there could be from 3 to 30 images, all making a call to this function.
Doing that should be fine as long as you are aware of the memory limits configured in both PHP and the web server. (Though you'll run into those limits merely by receiving the uploaded file in the first place.)
Otherwise, if you're strict about this being just for images, it could prove more efficient to go with Mike B's approach. Set aside a static area, just drop the images off in there, and record those locations in the records for their associated posts. It's less work and less to worry about... and I'm willing to bet your web server is better at serving files than most developers' custom application code will be.
Normally I would recommend keeping the src of an image static (instead of pointing it at a PHP script), but if you're allowing users to move files around the filesystem, you need a way to track them.
Some form of caching would help reduce the number of database calls required to fetch the filesystem location of each image. Should be pretty easy to put an indefinite TTL on the cache and invalidate upon the image being moved.
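For example, a hypothetical read-through cache using APCu (the function and key names are made up):

// Hypothetical read-through cache for filesystem locations using APCu.
function imagePath(PDO $pdo, int $id): ?string {
    $key  = "img_path_$id";
    $path = apcu_fetch($key, $hit);
    if (!$hit) {
        $stmt = $pdo->prepare('SELECT path FROM images WHERE id = ?');
        $stmt->execute([$id]);
        $path = $stmt->fetchColumn() ?: null;
        apcu_store($key, $path);       // no TTL: valid until explicitly invalidated
    }
    return $path;
}
// When a file is moved, call apcu_delete("img_path_$id") to invalidate.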
I don't think you should worry about that, what you have planned sounds fine.
But if you want to go out of your way to minimise requests or whatever, you could instead do the following: when someone embeds an image in a post, replace the anchor tag with some special character sequence, like [MYIMAGE=1234] or something. Then when a page with one or more posts is viewed, search through all the posts to find all the [MYIMAGE=] sequences, query the database to get all of the images' locations, and then output the posts with the [MYIMAGE=] sequences replaced with the appropriate anchor tags. You might or might not want to make sure users cannot directly add [MYIMAGE=] tags to their submitted content.
The way you have suggested will work, and it's arguably the nicest solution, but I should warn you that I've tried something similar before and it completely fell apart under load. The database seemed to be keeping up, but the script would start to time out and the image wouldn't arrive. That was probably down to some particular server configuration, but it's worth bearing in mind.
Depending on how much access you have to the server it's running on, you could just create a symlink whenever the user moves a file. It's a little messy but it'll be fast and reliable, and will also handle collisions if a user moves a file to where another one used to be.
Use the format proposed by Hammerite, and use [MYIMAGE=1234] tags (or something similar).
You can then fetch the id-path mappings before display, and replace the [MYIMAGE] tags with proper tags which link to images directly. This will yield much better performance than outputting images using php.
You could even bypass the database completely, and simply use image paths like (for example) /images/hash(IMAGEID).jpg.
(If there are different file formats, use [MYIMAGE=1234.png], so you can append png/jpg/whatever without a database call)
If the need arises to change the image locations, output method, or anything else, you only need to change the method where [MYIMAGE] tags are converted to full file paths.
You have a forum (vBulletin) that has a bunch of images. How easy would it be to have a page that visits a thread, steps through each page, and forwards the images to the user (via AJAX or whatever)? I'm not asking about filtering (that's easy, of course).
doable in a day? :)
I have a site that uses codeigniter as well - would it be even simpler using it?
Assuming this is to be carried out on the server, curl + regexp are your friends... and yes, doable in a day.
There are also some open-source HTML parsers that might make this cleaner.
It depends on where your scraping script runs.
If it runs on the same server as the forum software, you might want to access the database directly and check for image links there. I'm not familiar with vbulletin, but probably it offers a plugin api that allows for high level database access. That would simplify querying all posts in a thread.
If, however, your script runs on a different machine (or, in other words, is unrelated to the forum software), it would have to act as a http client. It could fetch all pages of a thread (either automatically by searching for a NEXT link in a page or manually by having all pages specified as parameters) and search the html source code for image tags (<img .../>).
Then a regular expression could be used to extract the image urls. Finally, the script could use these image urls to construct another page displaying all these images, or it could download them and create a package.
In the second case the script actually acts as a "spider", so it should respect things like robots.txt or meta tags.
When doing this, make sure to rate-limit your fetching. You don't want to overload the forum server by requesting many pages per second. Simplest way to do this is probably just to sleep for X seconds between each fetch.
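A rough sketch of that client approach in PHP (the forum URL, pagination scheme, and delay are assumptions):

// Hypothetical scraper: fetch each page of a thread and pull out <img> URLs.
$lastPage = 5;   // assumed page count, e.g. discovered from the NEXT link
$urls = [];
for ($page = 1; $page <= $lastPage; $page++) {
    $html = file_get_contents("https://forum.example.com/showthread.php?t=1234&page=$page");
    if (preg_match_all('/<img[^>]+src="([^"]+)"/i', $html, $m)) {
        $urls = array_merge($urls, $m[1]);
    }
    sleep(2);   // rate-limit so the forum server isn't hammered
}
$urls = array_unique($urls);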
Yes, doable in a day.
Since you already have a working CI setup I would use it.
I would use the following approach:
1) Make a model in CI capable of:
logging in to vbulletin (images are often added as attachments and you need to be logged in before you can download them). Use something like snoopy.
collecting the URL of the "last page" button using preg_match(), parsing that URL with parse_url() and parse_str(), and generating links from page 1 to the last page (see the sketch at the end of this answer)
collecting html from all generated links. Still using snoopy.
finding all images in html using preg_match_all()
downloading all images. Still using snoopy.
moving each downloaded image from a tmp directory into another directory, renaming it imagename_01, imagename_02, etc. if the same imagename already exists.
saving the image name and precise byte size in a DB table, so you can avoid downloading the same image more than once.
2) Make a method in a controller that collects all images
3) Set up a cronjob that collects images at regular intervals. wget -O /tmp/useless.html http://localhost/imageminer/collect should do nicely
4) Write the code that outputs pretty HTML for the end user, using the DB table to get the images.
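As referenced in step 1, here's a hypothetical sketch of the link generation with parse_url() and parse_str() (the thread URL is made up):

// Hypothetical sketch of step 1's link generation.
$lastUrl = 'showthread.php?t=1234&page=17';  // href scraped from the "last page" button
parse_str(parse_url($lastUrl, PHP_URL_QUERY), $query);

$links = [];
for ($p = 1; $p <= (int) $query['page']; $p++) {
    $links[] = "showthread.php?t={$query['t']}&page={$p}";
}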