I've been reading a lot about dynamic image manipulation, storage and content delivery. The company I work for already uses AWS for some of its services.
The application I'm working on stores document images in an S3 bucket (among other places), and I need to display them on demand.
The first version of this application stored the images locally and performed the image manipulation on demand on the same server.
Document storage has since grown and a lot of images are being stored, all through the web application. This means that one user may upload, say, 100+ images, and the server needs to process them as fast as it can.
That's why the images are uploaded to an EC2 instance and streamed to an S3 bucket internally. That's how we save the original image in the first place; no thumbnails are created at this point, to speed up the upload.
Then a different user may want to preview these images or see them at original size, which is why I need to resize them dynamically. I will use CloudFront to cache the images after they are resized, and here comes the issue.
The workflow is like this:
1. The user requests an image from the CDN
2.a CloudFront serves the cached image
2.b CloudFront requests the image from a custom origin if it is not cached
3. The origin server queries S3 for the image
4.a If the image size exists on S3
5. Return the image to CloudFront, which caches it and returns it to the user
4.b If the image size does not exist on S3
5. Generate the image size from the original S3 image
6. Save the new size to S3
7. Return the new size to CloudFront, which caches it and returns it to the user
The custom origin is responsible for creating the missing image size and saving it to S3. CloudFront can then either serve its cached copy or request the new size, which now exists on S3.
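To make the origin's part of that workflow (steps 3 to 7) concrete, here is a minimal sketch in Python rather than PHP. Everything in it is an assumption for illustration: a plain dict stands in for the S3 bucket, the `sized_key` naming scheme is made up, and `resize` is a hypothetical callable that a real server would back with an image library.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the S3 bucket: maps object key -> image bytes.
Store = Dict[str, bytes]


def sized_key(original_key: str, size: Tuple[int, int]) -> str:
    """Build the key under which a resized copy is stored (naming is an assumption)."""
    width, height = size
    return f"{width}x{height}/{original_key}"


def fetch_or_create(
    store: Store,
    original_key: str,
    size: Tuple[int, int],
    resize: Callable[[bytes, Tuple[int, int]], bytes],
) -> bytes:
    """Return the requested size, generating and saving it if it is missing."""
    key = sized_key(original_key, size)
    if key in store:                    # 4.a: the size already exists
        return store[key]
    original = store[original_key]      # 4.b: fall back to the original image
    resized = resize(original, size)    # 5: generate the new size
    store[key] = resized                # 6: save it for later requests
    return resized                      # 7: hand it back to the CDN
```

With this shape the resizer only runs on the first request for a given size; later requests that miss the CDN cache are served from the stored copy.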
I think this is possible, as I have already read a lot of documentation about it, but I still haven't found an account from someone who has done this before.
Does this look like a good way of handling the image manipulation? Has anyone seen any documentation about how to do this?
I'm a PHP developer, but I might be able to implement a non-PHP solution in favor of performance on the image server.
If you are open to non-PHP based solutions, https://github.com/bcoe/thumbd is a good option since it already integrates with S3, etc. However, you will need to know the sizes you need ahead of time. I would recommend such an approach rather than generating sizes on the fly, since it means faster response times for your users: they will not have to wait while the new size is being generated. Storage on S3 is very cheap, so creating multiple sizes will not cost you much either.
Related
I have a quiz site that creates images from a set of source images. The result images are stored in S3 and I'm not concerned about them. My question is about the source images: is S3 or EFS better for storing them for this purpose? I am using PHP to create the result images.
Here's a general rule for you: Always use Amazon S3 unless you have a reason to do otherwise.
Why?
It has unlimited storage
The data is replicated for resilience
It is accessible from anywhere (given the right permissions)
It has various cost options
It can be accessed by AWS Lambda functions
The alternative is a local disk (EBS) or a shared file system (EFS). They are more expensive, can only be accessed from the compute resources that attach or mount them (typically EC2) and take some amount of management. However, they have the benefit that they act as directly-attached storage devices, so your code can reference them directly without having to upload/download.
So, if your code needs the files locally, then EFS would be a better choice. But if your code can handle S3 (download from it, use the files, upload the results), then S3 is a better option.
Given your source images will (presumably) be at a higher resolution than those you are creating, and that once processed, they will not need to be accessed regularly after a while (again, presumably), I would suggest that the lower cost of S3 and the archiving options available there mean it would be best for you. There's a more in-depth answer here:
AWS EFS vs EBS vs S3 (differences & when to use?)
I have a bunch of images in Amazon S3 that I need to physically rotate. I currently do this by downloading the image to my server, rotating it using GD and overwriting it back to S3.
This process takes ~5 secs per image. I was wondering if there is any AWS API or such that can do this rotation directly in S3, preferably as in a batch mode?
I would appreciate it if anyone who has any experience with that kind of stuff can give me any pointers!
There is no way to rotate an image 'on' S3. Any method you employ is going to have to read the file from S3, do the rotation, and write it back to S3.
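As a rough illustration of that read, transform, write-back loop, here is a small Python sketch. It assumes `s3` behaves like a boto3 S3 client (only `get_object` and `put_object` are used) and that `transform` is a hypothetical function you supply, for example one that rotates the image bytes with an imaging library. Nothing here runs "inside" S3; the bytes still travel to wherever this code runs.

```python
from typing import Callable, Iterable


def rewrite_object(s3, bucket: str, key: str, transform: Callable[[bytes], bytes]) -> None:
    """Download one object, transform its bytes in memory, and overwrite it in place.

    `s3` is assumed to behave like a boto3 S3 client.
    `transform` is a hypothetical callable, e.g. one that rotates an image.
    """
    response = s3.get_object(Bucket=bucket, Key=key)
    data = response["Body"].read()      # read the whole object into memory
    # Note: a real version would probably also pass ContentType here,
    # since this call does not carry over the original object's metadata.
    s3.put_object(Bucket=bucket, Key=key, Body=transform(data))


def rewrite_many(s3, bucket: str, keys: Iterable[str], transform: Callable[[bytes], bytes]) -> None:
    """Apply the same transform to a batch of keys, one after another."""
    for key in keys:
        rewrite_object(s3, bucket, key, transform)
```

Because no temporary file is written, most of the time per image is network transfer, which is why running this close to S3 (EC2 or Lambda) tends to help.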
If the server you are doing it on now is not an EC2 instance, then it's worth a try to do it there, as the latency will be reduced quite a bit. Lambda is another option for you in that it will run within the AWS infrastructure, so network overhead will be reduced.
Not quite sure what your constraints might be, but if you're preparing the images for a web page - you could rotate them client-side using CSS. That would prevent the additional calls to S3, and eliminate processing load on your application server.
img {
  transform: rotate(90deg);
}
I have a Laravel PHP app where a user is going to upload an image. This image is going to be converted into a number of different sizes as required around the application, and then each image is going to be uploaded to AWS S3.
When the user uploads the image, PHP places it in /tmp until the request has completed, and removes it then if it hasn't been renamed. I am planning on pushing the job of converting and uploading the versions to a queue. What is the best way to ensure that the image stays in /tmp long enough to be converted and then uploaded to S3?
Secondly where should I save the different versions so that I can access them to upload them to s3 and then remove them from the server(preferably automatically)?
I would create a new directory and work in it. The tmp folder is flushed every now and then, depending on your system.
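A minimal sketch of that idea, written in Python for brevity (the same steps map onto PHP's file functions). The function names and the `work_root` directory are assumptions for illustration: the upload is moved into a private per-job directory as soon as it arrives, the queued job writes its resized versions next to it, and the whole directory is removed once the S3 uploads have finished.

```python
import shutil
import tempfile
from pathlib import Path


def claim_upload(tmp_path: str, work_root: str) -> Path:
    """Move an uploaded file out of the system temp area into a private job directory."""
    job_dir = Path(tempfile.mkdtemp(prefix="job-", dir=work_root))
    target = job_dir / Path(tmp_path).name
    shutil.move(tmp_path, str(target))
    return target


def finish_job(claimed_file: Path) -> None:
    """Remove the job directory (original plus generated sizes) once uploads are done."""
    shutil.rmtree(claimed_file.parent, ignore_errors=True)
```

The queued worker would receive the path returned by `claim_upload`, and call `finish_job` as its last step so cleanup happens automatically.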
As for the different sizes, I would create a separate bucket for each size, which you can access with whatever constant you use to store the image (e.g. email, user id, etc.).
I have another issue with Amazon, related to file uploads. I am using jqueryFileUpload and the Amazon APIs to upload files to Amazon S3. I have succeeded in uploading, but it involves a trick.
I had to store the image on my server and then move it to S3 from there using the putObjectFile method of S3. The plugin comes with great functions to crop/resize images, and I have been using them for a long time. Now that I have integrated the plugin with AWS, I am facing performance issues with uploads. Uploads take longer than normal, and this raises questions about using AWS S3 over the traditional way.
I had to make changes to my UploadHandler.php file to make it work. These are the changes: I added a part of the AWS upload code to the file, from line 735 to 750.
// Upload the original and its thumbnail to S3 (fragment from inside UploadHandler.php).
$bucket = "elasticbeanstalk-2-66938761981";
$s3 = new S3(awsAccessKey, awsSecretKey);
$response = $s3->putObjectFile($file_path, $bucket, $file->name, S3::ACL_PUBLIC_READ);
$thumbResponse = $s3->putObjectFile('files/thumbnail/'.$file->name, $bucket, 'images/'.$file->name, S3::ACL_PUBLIC_READ);
// putObjectFile() returns a boolean; flag the file if the main upload failed.
if (!$response) {
    $file->error = "<strong>Something went wrong while uploading your file... sorry.</strong>";
}
return $file;
Here is a link to s3 class on git.
The normal upload to my current server (not Amazon) takes 15 secs for an image, but the same image takes around 23 secs via Amazon S3, and I am not able to figure out a better solution. I have to store the image on my server before uploading to S3, as I am not sure whether I can process it on the fly and upload directly to S3. Can anyone suggest the right way to approach the problem? Is it possible to resize the images to different sizes in memory and upload directly to S3, avoiding the overhead of saving them to our server? If yes, can anyone guide me in the right direction?
Thank you for the attention.
I believe the roughly 8 secs of difference is the overhead of creating versions of the image in different sizes.
You may take different approaches to get rid of the resizing overhead at upload time. The basic idea is to let the uploading script finish execution and return the response, and do the resizing in a separate script.
I'd like to suggest the following approaches:
Approach 1. Don't resize during the upload! Create resized versions on the fly only when they are requested for the first time, and cache the generated images to serve them directly for later requests. I saw a few mentions of Amazon CloudFront as a solution in some other threads on Stack Overflow.
Approach 2. Invoke the code for creating resized versions as a separate asynchronous request after the original image has been uploaded. There will be a delay before the scaled versions are available, so write the necessary code to show placeholder images on the website until they are. You will have to figure out some way to tell whether a scaled version is available yet (for example, check whether the file exists, or set a flag in the database). Some ways of making asynchronous cURL requests are suggested here if you would like to try it out.
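Here is a rough Python sketch of Approach 2, only to show the moving parts. All names are hypothetical: an in-process queue and a set stand in for a real job queue and a database flag, `make_sizes` is whatever routine creates the scaled versions, and the URL paths are made up.

```python
import queue
import threading

PLACEHOLDER_URL = "/img/placeholder.png"   # hypothetical placeholder path

ready = set()            # keys whose scaled versions exist (stand-in for a DB flag)
jobs = queue.Queue()     # stand-in for a real job queue


def worker(make_sizes) -> None:
    """Background loop: build scaled versions, then mark the key as ready."""
    while True:
        key = jobs.get()
        if key is None:              # sentinel: stop the worker
            jobs.task_done()
            break
        make_sizes(key)              # hypothetical resizing routine
        ready.add(key)
        jobs.task_done()


def handle_upload(key: str) -> None:
    """Called by the upload script: enqueue the work and return immediately."""
    jobs.put(key)


def thumbnail_url(key: str) -> str:
    """Serve the placeholder until the scaled version is available."""
    return f"/thumbs/{key}" if key in ready else PLACEHOLDER_URL
```

The upload request only pays for `handle_upload`; pages call `thumbnail_url` and switch from the placeholder to the real thumbnail once the worker has marked the key as ready.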
I think both approaches will have about the same level of complexity.
Some other approaches are suggested as answers for this other question.
I am currently looking to move my websites' images to a storage service. I have two websites, developed in PHP and ASP.NET.
Using the Amazon S3 service we can host all our images and videos to serve web pages. But there are some limitations to using S3 when we want to serve images:
1. If the website needs thumbnails in different sizes generated from the original image, it is tough. We would also need to subscribe to EC2. Though data transfer from S3 to EC2 is free, the transfer takes time before the image resize operation can start.
2. It is not possible to upload a number of files as a zip archive and unzip it in S3 to reduce the number of uploads.
3. Downloading multiple files from S3 at once is not possible, which matters if we want to move to another provider.
4. Image names are case sensitive in S3, so an image will not load if its name does not match the request.
Among all these, the first one is the most important, since image resizing is a general requirement.
Which provider is best suited to my goal? Can I move to Google AppEngine only for the purpose of image hosting, or is there another vendor who can provide the above services?
I've stumbled upon a nice company called Cloudinary that provides a CDN image storage service. They also provide a variety of ways to manipulate images on the fly (cropping will mainly concern you, as you were talking about different sized thumbnails).
I'm not sure how they compare with other companies like maxcdn for site speed enhancement, but from what I can see they have more options when it comes to image manipulation.
S3 is really slow and also not distributed. CloudFront, in comparison, is also one of the slowest and most expensive CDNs you can get. The only advantage is that if you're already using other AWS services, you'll get one bill.
I blogged about different CDNs and ran some tests:
http://till.klampaeckel.de/blog/archives/100-Shopping-for-a-CDN.html
As for the setup, I'd suggest something that uses origin-pull: you host the images yourself and the CDN requests a copy of an image the first time it's requested.
This would also mean you can use a script to "dynamically" generate the images, because they'll be pulled only once or so. You just have to set appropriate cache headers. The images would then be cached until they expire or you purge the CDN's cache.
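As an example of what "appropriate cache headers" could look like, here is a small Python sketch that builds the response headers an origin script might send with a generated image. The function name and the one-day default lifetime are assumptions; pick a lifetime that matches how often your images change.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds: int = 86400, now: float = None) -> dict:
    """Build headers telling the CDN (and browsers) how long to keep an image."""
    now = time.time() if now is None else now
    return {
        # Lifetime in seconds, usable by shared caches such as a CDN.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Absolute expiry time as an HTTP date, for older caches.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```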
HTH
I've just come across CloudFlare - from what I understand from their site, you shouldn't need to make any changes to your website. Apparently all you need to do is change your DNS settings. Even provides a free option.
If you're using EC2, then S3 is your best option. The "best practice" is to simply pre-render the image in all sizes and upload with different names. I.e.:
/images/image_a123.large.jpg
/images/image_a123.med.jpg
/images/image_a123.thumb.jpg
This practice is in use by Digg, Twitter (once upon a time, maybe not with twimg...), and a host of other companies.
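A tiny Python sketch of that naming scheme, so that the upload code and the page templates derive the same keys. The helper names are made up; the size labels are the ones from the example paths above.

```python
from pathlib import PurePosixPath

SIZES = ("large", "med", "thumb")   # size labels taken from the example names above


def variant_key(original_key: str, size: str) -> str:
    """Insert the size label before the file extension."""
    path = PurePosixPath(original_key)
    return str(path.with_name(f"{path.stem}.{size}{path.suffix}"))


def all_variant_keys(original_key: str) -> list:
    """List every pre-rendered key to upload for one original image."""
    return [variant_key(original_key, size) for size in SIZES]
```

For instance, `variant_key("/images/image_a123.jpg", "thumb")` gives `/images/image_a123.thumb.jpg`.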
It may not be ideal, but it's the fastest and simplest way to do it. In terms of switching to another provider, you'll likely not do that because of the amount of work needed to transfer all of the files anyway. Whether you've got 1,000,000 images or 3,000,000 images, you've still got many megabytes of files.
Fortunately, S3 has an import/export service. You can send them an empty hard drive and they'll format it and download your data to it for a small fee.
In terms of your concern about case sensitivity, you won't find a provider that doesn't have case sensitivity. If your code is written properly, you'll normalize all names to uppercase or lowercase, or use some sort of base 64 ID system that takes care of case for you.
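A minimal sketch of the normalization idea in Python, assuming lowercase is the chosen convention and using a hypothetical helper name. The point is simply to run every name through the same function both when storing an image and when building the URL that requests it.

```python
def normalize_key(name: str) -> str:
    """Normalize an image name so stored keys and requested keys always match."""
    return name.strip().lower()


# Both sides go through the same function, so the case used by the caller no longer matters.
stored_key = normalize_key("Images/Photo_A123.JPG")
requested_key = normalize_key("images/photo_a123.jpg")
```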
All in all, S3 is going to give you the best "bang for your buck", and it has CloudFront support if you want to speed it up. Not using S3 because of reasons 3 and 4 is nonsense, as they'll likely apply anywhere you go.