I was thinking of using ImageMagick to process images uploaded by users in various ways (creating new images that are scaled, have drop shadows, etc.), but I've been worried about the speed. I don't want the user staring at a loading GIF forever.
So I started looking around to see how other sites do it and I found http://www.redbubble.com. Users upload artwork and almost instantly there are tons of variations of the image in the shop, processed in various ways. What do they use to process and generate images so fast?
It's relatively hard and inconvenient to maintain client-side image processing (it would be some kind of Flash app similar to www.picnik.com, with limited functionality).
I see use of Ruby, nginx, remote XHR calls, JSON, etc. That suggests delayed_job/Resque might be used to schedule asynchronous image processing with ImageMagick, with JSON/XHR calls used to check the status. Processed images are requested from ih*.redbubble.net (which points to edgecastcdn.net), and it seems they produce them on the fly and let the CDN cache them until the user changes the image or it expires from the cache.
They have ~800k monthly visitors, and you don't want to put that load on your app/web servers, so there is either delayed_job or Resque behind the scenes, or the ih* (image host?) servers produce images on the fly (there are 4 of them, but who knows how many sit behind a virtual host/proxy configuration).
All upload requests go to Amazon (EC2; that might be a load-balanced IP), and originals are stored on Amazon S3. They can scale by requesting more EC2 instances on demand.
Hope that gives you an idea of what's behind it. Back to your question: there is no client-side image processing, ImageMagick is used, and there is a chance they generate the images on the fly.
I use Jelastic to host a PHP application. Editors can upload pictures through the application that are stored in the file system, within the document root, and served on the frontend as e.g. http://example.com/uploads/123/picture.jpeg
For the NGINX PHP application server, I have enabled vertical scaling but run a single node, i.e. no horizontal scaling.
Picture uploads are not reliable. When I update picture #1 through my PHP admin interface, then update another one, picture #1 has changed back to the old picture.
My question: Are picture uploads sync'ed across multiple cloudlets on a single node? What will happen if I scale horizontally to multiple nodes?
My question: Are picture uploads sync'ed across multiple cloudlets on a single node?
I think there is a terminology problem here.
Cloudlet: A composite resource unit composed of RAM and CPU usage. 1 Cloudlet = 128MB RAM and approx. 200MHz CPU. A server (Jelastic refers to this as a 'node') typically uses multiple cloudlets; e.g. it may use several GB RAM and/or several GHz CPU at any given moment.
More details at http://kb.layershift.com/introducing-cloudlets
Each node is a self-contained (virtual) server, with its own filesystem. So if you have a single NGINX PHP application server, it doesn't matter if it uses 1 or 100 cloudlets (remember, this is only a measurement of RAM and CPU consumption!), it has 1 filesystem and all of the files that you successfully write there will be available for any subsequent requests.
What will happen if I scale horizontally to multiple nodes?
Right, you have to be careful here. If your application is writing to the local filesystem, you have a problem when dealing with multiple horizontally scaled servers. This is a very typical scaling problem that every application must deal with.
If we're simply talking about static resources (e.g. images), one of the best and simplest ways to handle this issue is to upload all of them to a single server. For example, if you have 4 NGINX PHP servers - let's say they load balance your-application.com - you might make one of those servers (or perhaps a completely separate environment) images.your-application.com.
So you perform the uploads to images.your-application.com, and reference that directly in your HTML when you wish to display those uploaded images.
Remember, images.your-application.com is only responsible for serving the actual images; so it's really lightweight and should handle a decent volume simply with vertical scaling - which is completely automatic on Jelastic.
When you need to scale images.your-application.com, the easy way is to use a CDN service (CloudFlare, Incapsula, etc.). This will leave images.your-application.com handling only the uploads and the small amount of download traffic which is not already cached at the CDN.
I had the same issue; please read this Jelastic tutorial.
In summary, Jelastic has a script which helps you with synchronization; you just have to execute the script and indicate the folders you want to sync across all nodes.
Then, every time you upload a file to those folders, within a matter of seconds or minutes the files will be available on all nodes; the time depends on the file size.
I am building a web application that allows users to upload audio files, music in particular. Most of the time, I expect each song to be several minutes long and the file to be approximately 3-10 MB in size. However, I would like to accept audio uploads up to about 100 MB, possibly allowing for over an hour of audio. I am currently using a combination of FFmpeg, SoX, and LAME to convert from 7 possible formats to MP3 and perform audio modifications including equalization, trimming, and fading. The files are then stored and linked in the database.
My current strategy is to handle the entire process in one HTTP file upload request using PHP on the backend, in which I perform the following functions (a rough sketch follows the list):
Validation
Transcode audio into multiple versions (using shell through PHP)
Store the original and transcoded versions in a temp directory
Upload all audio files to Amazon S3 for permanent storage
Commit the ID of each file to a database, linking them to the user
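For illustration, a minimal sketch of that single-request flow in PHP might look like the following. The paths, ffmpeg options, table name, and the uploadToS3() helper are all assumptions made up for this example, not my actual code:

    <?php
    // Sketch of the single-request flow: validate, transcode, store, record.
    $tmpDir = sys_get_temp_dir() . '/audio_' . uniqid();
    mkdir($tmpDir);

    // 1. Validation: whitelist of extensions and a 100 MB cap.
    $upload = $_FILES['audio'];
    $ext    = strtolower(pathinfo($upload['name'], PATHINFO_EXTENSION));
    if (!in_array($ext, ['wav', 'flac', 'ogg', 'aiff', 'mp3', 'm4a', 'wma'], true)
        || $upload['size'] > 100 * 1024 * 1024) {
        http_response_code(400);
        exit('Invalid upload');
    }
    $original = "$tmpDir/original.$ext";
    move_uploaded_file($upload['tmp_name'], $original);

    // 2./3. Transcode to MP3 with ffmpeg into the temp directory (options illustrative only).
    $mp3 = "$tmpDir/transcoded.mp3";
    exec(sprintf('ffmpeg -y -i %s -codec:a libmp3lame -qscale:a 2 %s 2>&1',
        escapeshellarg($original), escapeshellarg($mp3)), $out, $status);
    if ($status !== 0) {
        http_response_code(500);
        exit('Transcoding failed');
    }

    // 4. Upload original + transcoded versions to S3 (hypothetical helper wrapping the AWS SDK).
    $originalKey   = uploadToS3($original);
    $transcodedKey = uploadToS3($mp3);

    // 5. Commit both keys to the database, linked to the user (user id assumed to be in the session).
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'secret');
    $stmt = $pdo->prepare('INSERT INTO audio_files (user_id, original_key, mp3_key) VALUES (?, ?, ?)');
    $stmt->execute([$_SESSION['user_id'], $originalKey, $transcodedKey]);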
This works very similarly to an image processing system I have already set up. However, while images can complete this whole process in just a few seconds, audio can take a lot longer. At most, audio could take about 5-10 minutes to be processed and stored.
My questions are:
For audio processing, would it be better to fork the transcoding off to a background process that writes its state to the database, and poll it every few seconds to update the webpage, vs. doing it all in one HTTP request?
With the intention of scaling in the future, would it be advisable to do all processing on a single server instance, leaving the frontend web instances free to replicate / be destroyed?
If yes, would this require cross-domain file uploading directly to that server? (Anyone know if this is how youtube or the big sites do it?)
Thanks!
If I understand your system correctly, your best approach is probably something more like this:
In your web front-end, store the audio and create a "task" indicating that the audio needs to be processed.
Run a background task that pulls tasks and does the processing. At the end of the task, the user can be notified (if necessary) and database state can be updated or whatever.
Your tasks should be written so that if they fail partway through, they can be re-executed from the start without causing problems. You can run multiple background tasks and web front-ends in this architecture.
A good way to write tasks is using a message-passing system like AMQP. RabbitMQ is a popular broker for this, and there are cheap hosted services that will run it for you. You can, of course, also build your own on top of any database, but this may require polling.
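As a minimal sketch of that pattern, assuming PHP with the php-amqplib library and a RabbitMQ broker on localhost (the queue name and payload are made up for illustration):

    <?php
    require 'vendor/autoload.php';

    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    $conn    = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    $channel = $conn->channel();
    $channel->queue_declare('transcode_jobs', false, true, false, false); // durable queue

    // Web front-end: store the upload, then just enqueue a task and return immediately.
    $msg = new AMQPMessage(
        json_encode(['audio_id' => 123]),
        ['delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT]
    );
    $channel->basic_publish($msg, '', 'transcode_jobs');

    // Background worker (a separate long-running process): pull tasks and process them.
    $channel->basic_qos(null, 1, null); // one unacknowledged task at a time per worker
    $channel->basic_consume('transcode_jobs', '', false, false, false, false, function ($msg) {
        $job = json_decode($msg->body, true);
        // ... run ffmpeg/LAME, upload the results, update the database status row ...
        $msg->delivery_info['channel']->basic_ack($msg->delivery_info['delivery_tag']);
    });
    while (count($channel->callbacks)) {
        $channel->wait();
    }

Because the worker only acknowledges a message once it has finished, a task that dies partway through gets redelivered - which is why, as mentioned above, tasks should be safe to re-execute from the start.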
Finally, you might find it faster and more efficient to use a service like Zencoder to do your transcoding, because they can parallelize the work and probably handle more input formats, but it may not be compatible with your processing.
You definitely want to throw the audio processing to a background process.
Depending on the scalability involved, you might need a machine dedicated to the processing. You might also want to look into other resources you can offload audio work to (like PCIe cards and such).
Sorry to say I know nothing about cross-domain file uploading or how the big dogs do it (YouTube, SoundCloud, etc.).
My application requires downloading many images from a server (each image is about 10 KB). Currently I'm simply downloading each of them with an independent AsyncTask, without any optimization.
Now I'm wondering what the common practice is for transferring these images. For example, I'm thinking about keeping zipped images on the server, then sending the zipped file for the user's device to unzip. In this case, is it better to combine the zip files into one big zip file for the user to download?
Or there's better solution? Thanks in advance!
EDIT:
It seems combining zip files is a good idea, but I feel it may take too long for the user to wait for downloading and unzipping all the images. So I may put ten or twenty images in each zip file, so the user can see some downloaded ones while waiting for more to come. Having multiple AsyncTasks fired together can be faster, right? But they won't finish at the same time, even given the same file size and the same download address?
Since latency is often the largest problem with mobile connections, reducing the number of connections you have to open is a great way to optimize the loading times. Sending a zip file with all the images sounds like a very good idea, and is probably worth the time implementing.
The images are probably already compressed (GIF, JPG, PNG), so you will not reduce the file size, but you will reduce the number of connections, which is a good idea for mobile. If it is always the same set of images you can use some sprite technique (sending one bigger image file containing all the images at different x/y offsets; in HTML you can use the background with an offset to show the right image).
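If the server side happens to be PHP, bundling a batch of images into one zip only takes a few lines with the built-in ZipArchive class; this is just a sketch, and the paths and batch size are placeholders:

    <?php
    // Bundle a batch of already-compressed images into a single zip so the
    // client opens one connection instead of dozens.
    $images = array_slice(glob('/var/www/images/*.{jpg,png,gif}', GLOB_BRACE), 0, 20);

    $zipPath = sys_get_temp_dir() . '/bundle_' . uniqid() . '.zip';
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        exit('Could not create zip');
    }
    foreach ($images as $file) {
        // JPG/PNG/GIF are already compressed, so the zip is mainly a container here.
        $zip->addFile($file, basename($file));
    }
    $zip->close();

    header('Content-Type: application/zip');
    header('Content-Length: ' . filesize($zipPath));
    readfile($zipPath);

The Android client then downloads this one file and unzips it locally, which keeps the connection count down as described above.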
I saw this topic in the sidebar, but from the comments it seems you're also asking about patching.
The best approach is to make sure the user knows what to do with the download: you want the user to download file X and get output Y for a particular purpose. It also appears to be common practice to download resources in chunks when they are not native to the Android app and cannot fit in the APK.
A comparable example is the JDIC apps, which use a popular Japanese dictionary resource used in tandem for English translations. JDIC apps like WWWJDIC use online downloads for the extremely large reference files that would otherwise suffer from the bad latency (mentioned before) of Google's servers. It also looks bad to have an app over 200 MB on Google Play unless it is 3D, where that is justifiable. If your images cannot be compressed without extremely long loading times in the app itself, you may need to consider this option. The only downside is having to require an online connection (also mentioned before).
Also, you could use 7-Zip and program Android to self-extract it to a location: http://www.wikihow.com/Use-7Zip-to-Create-Self-Extracting-excutables
On another note, it would be optimal for the user to perform routine update checks in the app while having a one-time download on initial startup. You can then optionally use an AsyncTask so that your files are downloaded to the app and used after a restart, or however you want it, so you really only need one AsyncTask. The benefit is that the user stays in sync with the app and may only need to check once. The downside is that the user may not always be able to update and may need to use 4G or LTE, but that is a minor concern if they can use WiFi whenever they want.
Currently I am looking to move my websites' images to a storage service. I have two websites, developed in PHP and ASP.NET.
Using the Amazon S3 service we can host all our images and videos to serve web pages. But there are some limitations when using S3 to serve images:
1. If the website needs thumbnails in different sizes derived from the original image, it is tough. We would also need to subscribe to EC2, and though data transfer from S3 to EC2 is free, it takes time to transfer the data before the resize operation can run.
2. Uploading a number of files in zip format and unzipping them within S3 is not possible, so we cannot reduce the number of uploads.
3. Downloading multiple files from S3 in one go is not possible, which matters in case we want to shift to another provider.
4. Image names are case sensitive in S3, so an image will not load if its name does not exactly match the request.
Among all these, the first one is the most important, since image resizing is a general requirement.
Which provider is best suited to achieve my goal? Can I move to Google App Engine just for the purpose of image hosting, or is there another vendor who can provide the above services?
I've stumbled upon a nice company called Cloudinary that provides a CDN image storage service - they also provide a variety of on-the-fly image manipulations (cropping will mainly concern you, since you were talking about different sized thumbnails).
I'm not sure how they compete with other companies like MaxCDN in site speed enhancement, but from what I can see they have more options when it comes to image manipulation.
S3 is really slow and also not distributed. CloudFront in comparison is also one of the slowest and most expensive CDNs you can get. The only advantage is that if you're already using other AWS services you'll get one bill.
I blogged about different CDNs and ran some tests:
http://till.klampaeckel.de/blog/archives/100-Shopping-for-a-CDN.html
As for the setup, I'd suggest something that uses origin-pull. So you host the images yourself and the CDN requests a copy of it the first time it's requested.
This would also mean you can use a script to "dynamically" generate the images because they'll be pulled only once or so. Just have to set appropriate cache headers. The images would then be cached until you purge the CDN's cache.
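As a sketch of that origin-pull idea, the origin could be a small PHP script along these lines (assuming the Imagick extension; the query parameters, paths, and cache lifetime are made up for the example), and the CDN simply caches whatever it returns:

    <?php
    // Origin script, e.g. /img.php?file=photo.jpg&w=300 - the CDN pulls this once, then serves its cached copy.
    $file  = basename($_GET['file'] ?? '');        // basename() avoids path traversal
    $width = min((int) ($_GET['w'] ?? 300), 2000); // clamp to a sane maximum
    $path  = '/var/www/originals/' . $file;

    if ($file === '' || !is_file($path)) {
        http_response_code(404);
        exit;
    }

    $img = new Imagick($path);
    $img->thumbnailImage($width, 0); // 0 keeps the aspect ratio

    header('Content-Type: ' . $img->getImageMimeType());
    header('Cache-Control: public, max-age=31536000'); // long cache lifetime; purge the CDN when the image changes
    echo $img->getImageBlob();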
HTH
I've just come across CloudFlare - from what I understand from their site, you shouldn't need to make any changes to your website. Apparently all you need to do is change your DNS settings. Even provides a free option.
If you're using EC2, then S3 is your best option. The "best practice" is to simply pre-render the image in all sizes and upload with different names. I.e.:
/images/image_a123.large.jpg
/images/image_a123.med.jpg
/images/image_a123.thumb.jpg
This practice is in use by Digg, Twitter (once upon a time, maybe not with twimg...), and a host of other companies.
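A rough sketch of that pre-render-and-upload approach in PHP, assuming the Imagick extension and the AWS SDK for PHP; the bucket name, widths, and key layout are placeholders:

    <?php
    require 'vendor/autoload.php';

    use Aws\S3\S3Client;

    $s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

    $original = '/tmp/image_a123.jpg';
    $sizes    = ['large' => 1024, 'med' => 512, 'thumb' => 128]; // placeholder widths

    foreach ($sizes as $label => $width) {
        $img = new Imagick($original);
        $img->thumbnailImage($width, 0); // 0 keeps the aspect ratio
        $variant = "/tmp/image_a123.$label.jpg";
        $img->writeImage($variant);

        // Each pre-rendered size gets its own key, e.g. images/image_a123.large.jpg
        $s3->putObject([
            'Bucket'     => 'my-image-bucket',
            'Key'        => "images/image_a123.$label.jpg",
            'SourceFile' => $variant,
            'ACL'        => 'public-read',
        ]);
    }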
It may not be ideal, but it's the fastest and simplest way to do it. In terms of switching to another provider, you'll likely not do that anyway because of the amount of work involved in transferring all of the files. Whether you've got 1,000,000 images or 3,000,000 images, that is still an enormous amount of data to move.
Fortunately, S3 has an import/export service. You can send them an empty hard drive and they'll format it and download your data to it for a small fee.
In terms of your concern about case sensitivity, you won't find a provider that doesn't have case sensitivity. If your code is written properly, you'll normalize all names to uppercase or lowercase, or use some sort of base 64 ID system that takes care of case for you.
All in all, S3 is going to give you the best "bang for your buck", and it has CloudFront support if you want to speed it up. Not using S3 because of reasons 3 and 4 is nonsense, as they'll likely apply anywhere you go.
I have a general question about this.
When you have a gallery, sometimes people need to upload thousands of images at once. Most likely, it would be done through a .zip file. What is the best way to go about uploading this sort of thing to a server? Many times, servers have timeouts etc. that need to be accounted for. I am wondering what kinds of things I should be looking out for, and what the best way is to handle a large number of images being uploaded.
I'm guessing that you would allow a user to upload a zip file (assuming the timeout does not affect you), and this zip file is uploaded to a specific directory - let's assume in this case a directory is created for each user in the system. You would then unzip the file on the server and scan the user's folder for any directories containing .jpg, .png or .gif files (etc.) and then import them into a table accordingly, I'm guessing labelled by folder name.
What kind of server side troubles could I run into?
I'm aware that there may be many issues. Even general ideas would be good so I can research further. Thanks!
Also, I would be programming in Ruby on Rails, but I think this question applies across any language.
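To make the intended flow concrete, here is a rough sketch (shown in PHP purely for illustration, since the question is language-agnostic; the directory layout, table, and variable names are made up):

    <?php
    // Unzip the upload into the user's directory, then scan it for images and record them.
    $userDir = "/var/www/uploads/user_{$userId}"; // one directory per user (made-up layout)
    $zipPath = $_FILES['archive']['tmp_name'];

    $zip = new ZipArchive();
    if ($zip->open($zipPath) === true) {
        $zip->extractTo($userDir);
        $zip->close();
    }

    // Import every image found, labelled by the folder it was in.
    $pdo  = new PDO('mysql:host=localhost;dbname=gallery', 'dbuser', 'secret');
    $stmt = $pdo->prepare('INSERT INTO photos (user_id, album, path) VALUES (?, ?, ?)');

    $it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($userDir));
    foreach ($it as $file) {
        if (preg_match('/\.(jpe?g|png|gif)$/i', $file->getFilename())) {
            $stmt->execute([$userId, basename(dirname($file->getPathname())), $file->getPathname()]);
        }
    }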
There's no reason why you couldn't handle this kind of thing with a web application. There are a couple of excellent components that would be useful for this:
Uploadify (based on jQuery/Flash)
Plupload (from Moxiecode, the TinyMCE people)
The reason they're useful is that, in the first instance, a Flash component handles the uploads, so you can select groups of files from the file browser window (assuming no one is going to individually select thousands of images..!), and with Plupload, drag and drop is supported too, along with more platforms.
Once you've got your interface working, the server side stuff just needs to be able to handle individual uploads, associating them with some kind of user account, and from there it should be pretty straightforward.
With regards to server-side issues, that's really a big question, depending on how many people will be using the application at the same time, the size of the images, and any processing that takes place afterwards. Remember, the files are kept in a temporary location while the script is processing them, and are either deleted upon completion or copied to a final storage location by your script, so space/memory overheads/timeouts could be an issue.
If the images are massive in size, say RAW or TIFF, then this kind of thing could still work with chunked uploads, but implementing some kind of FTP upload might be easier. It's a bit of a vague question, but there should be plenty here to get you going ;)
For that many images it has to be a serious app, which gives you the liberty to suggest a piece of software running on the client (something like Yahoo Mail/Picasa does) that will take care of 'managing' the upload of images (network interruptions, resume support, etc.).
For the server side, you could process these one at a time (assuming your client is sending them that way), thus keeping it simple.
Take a peek at http://gallery.menalto.com
They have a dozen methods for uploading pictures into galleries.
You can choose the ones which suit you.
Either have a client app, or some Ajax code that sends the images one by one, preventing timeouts. Alternatively, if this is not available to the public, FTP still works...
I'd suggest a client application (maybe written in AIR or Titanium) or telling your users what FTP is.
deviantArt.com, for example, offers FTP as an upload method for paying subscribers, and it works really well.
Flickr instead has its own app for this: the "Flickr Uploadr".