Image caching vs. image processing in PHP and S3 - php

Here is the thing. Right now I have an e-commerce web site where people can upload a lot of pictures for their products. All the images are stored on Amazon S3. When we need a thumbnail or something, I check S3 to see if one is available. If not, I generate one, send it to S3 and display it in the browser. Every different thumbnail size gets stored on S3, and checking thumbnail availability on every request costs money. I'm afraid I'll pay a lot once the site starts to get more attention (if it does...).
Thinking about alternatives, I considered keeping only the original images on S3 and processing the images on the fly on every request. I imagine I would pay for that in CPU usage, but I haven't run any benchmarks to see how far I can go. The upside is that I wouldn't spend money making requests and storing more images on S3, and I could cache everything in the user's browser. I know that's not entirely safe, which is why I'm bringing this question here.
What do you think? How do you think I could solve this?

I would resize at the time of upload and store all versions in S3.
For example, if you have a larger image (1200x1200, ~200 KB) and create 3 resized versions (300x300, 120x120, and 60x60), you only add about 16%, or 32 KB (for my test image, YMMV). Let's say you need to store a million images; that is roughly 30 GB more, or about $4.50 extra a month. Flickr reportedly had 2 billion images (in 2007); that is ~$9k extra a month, not too bad if you are that big.
Another major advantage is you will be able to use Amazon's CloudFront.
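For reference, here is a minimal sketch of the resize-at-upload approach using Imagick and the AWS SDK for PHP; the bucket name, key layout and size list are assumptions, so adapt them to your setup.
<?php
// Sketch: resize once at upload time and store every version in S3.
// Assumes the Imagick extension and the AWS SDK for PHP (composer require aws/aws-sdk-php).
require 'vendor/autoload.php';

use Aws\S3\S3Client;

function uploadWithThumbnails(string $localPath, string $baseKey): void
{
    $sizes = [300, 120, 60];                          // assumed thumbnail sizes
    $s3 = new S3Client(['region' => 'us-east-1', 'version' => 'latest']);

    // Upload the original as-is.
    $s3->putObject([
        'Bucket'      => 'my-bucket',                 // assumed bucket name
        'Key'         => "{$baseKey}/original.jpg",
        'SourceFile'  => $localPath,
        'ContentType' => 'image/jpeg',
    ]);

    // Create and upload each resized version.
    foreach ($sizes as $size) {
        $img = new Imagick($localPath);
        $img->thumbnailImage($size, $size, true);     // fit inside a $size x $size box
        $img->setImageFormat('jpeg');

        $tmp = tempnam(sys_get_temp_dir(), 'thumb');
        $img->writeImage($tmp);
        $img->clear();

        $s3->putObject([
            'Bucket'      => 'my-bucket',
            'Key'         => "{$baseKey}/{$size}x{$size}.jpg",
            'SourceFile'  => $tmp,
            'ContentType' => 'image/jpeg',
        ]);
        unlink($tmp);
    }
}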

If you're proxying from S3 to your clients (which it sounds like you're doing), consider two optimizations:
At upload time, resize the images at once and upload as a package (tar, XML, whatever)
Cache these image packages on your front end nodes.
The 'image package' (sketched below) will reduce the number of PUT/GET/DELETE operations, which aren't free in S3. If you have 4 image sizes, you'll cut the number of operations down by a factor of 4.
The cache will further reduce S3 traffic, since I figure the workflow is usually: see a thumbnail -> click it for the larger image.
On top of that, you can implement a 'hot images' cache that is actively pushed to your web nodes so it's pre-cached if you're using a cluster.
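As a rough illustration of the 'image package' idea, here is a sketch that bundles the resized versions into a single tar and stores it with one PUT, using PharData and the AWS SDK for PHP; the bucket and key layout are assumptions.
<?php
// Sketch: bundle all sizes of one image into a single tar and do one S3 PUT.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

function uploadImagePackage(string $imageId, array $resizedFiles): void
{
    // $resizedFiles: e.g. ['300x300' => '/tmp/a-300.jpg', '120x120' => '/tmp/a-120.jpg']
    $tarPath = sys_get_temp_dir() . "/{$imageId}.tar";
    $tar = new PharData($tarPath);
    foreach ($resizedFiles as $label => $path) {
        $tar->addFile($path, "{$label}.jpg");         // store each size under a readable name
    }

    $s3 = new S3Client(['region' => 'us-east-1', 'version' => 'latest']);
    $s3->putObject([
        'Bucket'     => 'my-bucket',                  // assumed bucket name
        'Key'        => "packages/{$imageId}.tar",
        'SourceFile' => $tarPath,
    ]);
    unlink($tarPath);
}
Your front end nodes would then fetch the package once, unpack it, and keep the individual sizes in their local cache.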
Also, I don't recommend using Slicehost<->S3. The transit costs are going to kill you. You should really use EC2 to save a ton of bandwidth (money!).
If you aren't proxying, but handing your clients S3 URLs for the images, you'll definitely want to preprocess all of your images. Then you don't have to check for them; you just pass the URLs to your clients.
Re-processing the images every time is costly. You'll find that if you can assume all images are already resized, the amount of work on your web nodes goes down and everything speeds up. This is especially true since you aren't firing off multiple S3 requests.

Keep a local cache of:
Which images are in S3
A cache of the most popular images
Then in both cases you have a local reference. If an image isn't in the local image cache, you can check the local index to see whether it is in S3. This saves S3 traffic for your most popular items and saves latency when checking S3 for an item that isn't in the local cache.
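A minimal sketch of that local index using APCu as the cache; the key naming and TTL are assumptions, and any local store (Memcached, Redis, a small table) works the same way.
<?php
// Sketch: local index of which thumbnails already exist in S3, backed by APCu.
function thumbnailExistsInS3(Aws\S3\S3Client $s3, string $bucket, string $key): bool
{
    $cacheKey = "s3-exists:{$key}";                   // assumed key naming scheme

    if (apcu_exists($cacheKey)) {
        return (bool) apcu_fetch($cacheKey);          // answered from the local index, no S3 call
    }

    $exists = $s3->doesObjectExist($bucket, $key);    // one S3 HEAD request on a cache miss
    apcu_store($cacheKey, $exists, 3600);             // remember the answer for an hour
    return $exists;
}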

Related

converting image to all sizes on upload vs resizing on request via php

So I have a platform that allows users to upload a fair number of pictures. At the moment, my server resizes and saves all image versions individually to my CDN (so I can pick the best option to reduce load time when a user requests to view one), but it seems very wasteful in terms of server storage.
The images are being converted into resolutions of 1200px, 500px, 140px, 40px and 24px.
What I'm wondering is, would it be more efficient to just save the file at 1200px, then serve it via PHP at the requested size using something like ImageMagick? Would there be any major trade-offs and if so, is it worth it?
What I'm doing right now:
https://v1x-3.hbcdn.net/user/filename-500x500.jpg
An example of what I could do:
https://v1x-3.hbcdn.net/image.php?type=user&file=filename&resolution=500
Cheers.
No it's not, because:
you have a small number of sizes
if you don't use caching (generating the image only on the first request), you can DDoS yourself, since image processing is a CPU-heavy process
you have to do extra work if you use a CDN like Cloudflare for HTTP caching
On-the-fly resizing makes sense if you need a lot of image sizes, for example an API that supports many Android/iOS devices: if, say, the iPhone 3 only supports a 320x320 image and you don't have users with such a device, your server never creates that size.
Advice:
During image generation, apply optimization; it reduces file size with an imperceptible loss of quality.
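As an illustration of that optimization step, a sketch using Imagick that strips metadata and re-encodes at a moderate quality; the quality value of 82 is an assumption to tune for your images.
<?php
// Sketch: optimize a generated image before saving it (Imagick).
function optimizeJpeg(string $src, string $dest, int $quality = 82): void
{
    $img = new Imagick($src);
    $img->stripImage();                               // drop EXIF/ICC metadata
    $img->setImageCompressionQuality($quality);       // assumed quality setting
    $img->setImageFormat('jpeg');
    $img->writeImage($dest);
    $img->clear();
}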

Slowness found when base 64 image select and encode from database

I am working with the Ionic framework, currently designing a posts page with text and images. Users can post their data and images, and everything should be secure.
So I use base64 encoding and save the image in the database.
encodeURIComponent($scope.image)
Each time a user makes a request, I select the rows from the table, decode the images and display them along with the text.
decodeURIComponent($scope.image)
with HTML "data:image/jpeg;base64,_______" conversion.
It works fine, but takes much more time than I expected. Also, the images are 33% bigger, and the whole thing feels bulky.
Then I decided to move to Cordova's file upload plugin, but I realized that maintaining files that way is risky and complicated. I also tried to save binary data in the database, but failed.
Selecting the text without the base64 data dramatically reduces the time. Would it be possible to select each image individually in a separate HTTP call, after selecting and displaying the other columns? Is this the right mechanism for handling secure images?
As a rule of thumb, don't save files in the database.
What does the mysql manual have to say about it?
http://dev.mysql.com/doc/refman/5.7/en/miscellaneous-optimization-tips.html
With Web servers, store images and other binary assets as files, with
the path name stored in the database rather than the file itself. Most
Web servers are better at caching files than database contents, so
using files is generally faster. (Although you must handle backups and
storage issues yourself in this case.)
Don't save base64-encoded files in a database at all.
It works fine, but takes much more time than I expected. Also, the images
are 33% bigger, and the whole thing feels bulky.
As you discovered, there is unwanted overhead in encoding/decoding, plus extra space used up, which means extra data transferred back and forth as well.
As @mike-m mentioned, Base64 encoding is not a compression method. Why Base64 encoding is used is also answered by a link @mike-m posted: What is base 64 encoding used for?.
In short, there is nothing to gain and much to lose by base64-encoding images before storing them on the file system, be it S3 or otherwise.
What about gzip or other forms of compression that don't involve base64? Again, the answer is that there is nothing to gain and much to lose. For example, I just gzipped a 1,941,980-byte JPEG image and saved 4,000 bytes; that's a 0.2% saving.
The reason is that images are already stored in compressed formats. They cannot be compressed much further.
When you store images without compression they can be delivered directly to browsers and other clients and they can be cached. If they are compressed (or base64 encoded) they need to be decompressed by your app.
Modern browsers are able to display base64 images embedded in the HTML, but then they cannot be cached and the data is about 30% larger than it needs to be.
Is this an exception to the norm?
Users can post their data and images, and everything should be secure.
I presume you mean that a user can download images that belong to him or are shared with him. This can easily be achieved by saving the files outside the web root and storing only the path in the database. The file is then sent to the client (after doing the required checks) with fpassthru, as sketched below.
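A minimal sketch of that serving pattern, assuming a session is already started, a hypothetical userCanAccessImage() check, a hypothetical getImagePathFromDatabase() lookup, and a storage directory outside the web root:
<?php
// Sketch: serve a private image stored outside the web root with fpassthru().
// userCanAccessImage(), getImagePathFromDatabase() and the storage path are assumptions.
$storageDir = '/var/app/private-images';

$imageId = (int) ($_GET['id'] ?? 0);
$path    = getImagePathFromDatabase($imageId);        // hypothetical lookup of the stored path

if ($path === null || !userCanAccessImage($_SESSION['user_id'] ?? 0, $imageId)) {
    http_response_code(403);
    exit;
}

$fullPath = $storageDir . '/' . $path;
header('Content-Type: image/jpeg');
header('Content-Length: ' . filesize($fullPath));
$fp = fopen($fullPath, 'rb');
fpassthru($fp);                                       // stream the file straight to the client
fclose($fp);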
What about when I grow to 100,000 users?
How do they take care of the image files? Performance-wise, with that many
users it seems to me I need 100,000 folders for 100,000 users, plus their
subfolders. When a large number of users browse the same root folder, how
does the file system handle each unique folder?
Use a CDN or use a file system that's specially suited for this like BTRFS
Databases have good search facilities, good thread-safe connections and good session management. Does this change when large operations are involved?
Yes, indeed. Use the database to the fullest by saving all the information about the file, including its path, in the database. Then save the file itself in the file system. You get the best of both worlds.
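A short sketch of that split, assuming a hypothetical images table and a storage directory outside the web root:
<?php
// Sketch: store the file on disk and only its path + metadata in the database.
// Directory, table and column names are assumptions.
function storeUpload(PDO $pdo, array $file, int $userId): int
{
    $dir  = '/var/app/private-images';
    $name = bin2hex(random_bytes(16)) . '.jpg';       // don't trust the client-supplied name
    move_uploaded_file($file['tmp_name'], "{$dir}/{$name}");

    $stmt = $pdo->prepare(
        'INSERT INTO images (user_id, path, original_name, size) VALUES (?, ?, ?, ?)'
    );
    $stmt->execute([$userId, $name, $file['name'], $file['size']]);

    return (int) $pdo->lastInsertId();                // the id the application refers to from now on
}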
Since it's just personal files, you could store them in S3.
To be safe about file uploads, just check the file's MIME type before uploading to whatever storage you choose.
http://php.net/manual/en/function.mime-content-type.php
just run a quick check on the uploaded file:
$mime = mime_content_type($file_path);
if($mime == 'image/jpeg') return true;
no big deal!
Keeping files in the database is bad practice; it should be your last resort. S3 is great for many use cases, but it's expensive for high usage, and local files should be used only for intranets and non-public apps.
In my opinion, go S3.
Amazon's SDK is easy to use and you get 1 GB of free storage for testing.
You could also use your own server, just keep it out of your database.
Solution for storing images on filesystem
Let's say you have 100,000 users and each one of them has 10 pics. How do you handle storing them locally?
Problem: a Linux filesystem starts to choke once a single directory holds tens of thousands of images, so you should design the file structure to avoid that.
Solution:
Make the folder path be floor(userID / 1000) * 1000, followed by the userID.
That way the images of the user with id 989787 will be stored in the folder
989000/989787/img1.jpeg
989000/989787/img2.jpeg
989000/989787/img3.jpeg
and there you have it, a way of storing images for a million users that doesn't break the unix filesystem.
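For clarity, a quick sketch of that folder-sharding scheme in PHP; the base directory is an assumption.
<?php
// Sketch: compute the sharded storage path for a user's image.
function userImagePath(int $userId, string $filename, string $baseDir = '/var/www/images'): string
{
    $bucket = intdiv($userId, 1000) * 1000;           // 989787 -> 989000
    return sprintf('%s/%d/%d/%s', $baseDir, $bucket, $userId, $filename);
}

echo userImagePath(989787, 'img1.jpeg');              // /var/www/images/989000/989787/img1.jpeg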
How about storage sizes?
Last month I had to compress 1.3 million JPEGs for the e-commerce site I work on. When uploading images, compress them using Imagick with its optimization flags and 80% quality. That strips the data you can't see and optimizes your storage. Since our images vary from 40x40 (thumbnails) to 1500x1500 (zoom images), we average around 700x700; times 1.3 million images, that filled around 120 GB of storage.
So yeah, it's possible to store it all on your filesystem.
When things start to get slow, you hire a CDN.
How will that work?
The CDN sits in front of your image server. Whenever the CDN is asked for a file it doesn't have in its storage (a cache miss), it copies it from your image server. Later, when the CDN gets the same request again, it delivers the image from its own cache.
This way no code is needed to migrate to CDN image delivery; all you need to do is change the URLs on your site and hire a CDN. The same works for an S3 bucket.
It's not a cheap service, but it's waaaaay cheaper than CloudFront, and when you get to the point of needing it, you can probably afford it.
I would suggest you stick with the base64 string; you can use the LZ-string compression technique to reduce the string size. I've been using it and it's working pretty well.
I don't know how close this is to your question, but I hope it helps you out.
Here is LZ compression technique : https://github.com/pieroxy/lz-string/

Images upload with PHP: how to optimize space, bandwidth and performance

I'm planning to develop an area where users can upload pictures. I know how to upload a picture to the server using PHP, but the question is what the best practice is for building a well-performing system.
The idea is to display thumbs on different pages, and I would like to know whether it's better to save two different images (thumb + original) on the server, or to save just the original and create all the thumbs on the fly. Thumb + original means more space on the server, whereas the "thumbs on the fly" option most likely means more server load.
I found a couple of good scripts for resizing and cropping on the fly, but I'm not sure it's a good idea to use them, especially if the web site gets a few thousand visitors per day (or maybe more in the future, to be optimistic/pessimistic).
Absolutely generate and save the thumbnails on disk. Storage is cheap.
You can generate some thumbnails and save them on disk but in the long term that's problematic due to different devices needing different sizes, different formats, etc.
If you are already saving the uploaded images to S3, Azure Storage, or Google Cloud, I recommend using an on-the-fly image processing service like imglab or Cloudinary.
With these services you can generate many different kinds of crops and serve them in different (modern) formats like WebP or AVIF, so you don't need to generate them beforehand. SEO will be improved with this option too.
Additionally, images will be behind a global CDN, so users will get them quickly regardless of their location.

Importing, resizing and uploading millions of images to Amazon S3

We are using PHP with CodeIgniter to import millions of images from hundreds of sources, resizing them locally and then uploading the resized version to Amazon S3. The process is however taking much longer than expected, and we're looking for alternatives to speed things up. For more details:
A lookup is made in our MySQL database table for images which have not yet been resized. The result is a set of images.
Each image is imported individually using cURL, and temporarily hosted on our server during processing. They are imported locally because the library doesn't allow resizing/cropping of external images. According to some tests, the speed difference when importing from different external sources has been between 80 and 140 seconds (for the entire process, using 200 images per test), so the external source can definitely slow things down.
The current image is resized using the image_moo library, which creates a copy of the image.
The resized image is uploaded to Amazon S3 using a CodeIgniter S3 library
The S3 URL for the new resized image is then saved in the database table, before starting with the next image
The process is taking 0.5-1 second per image, meaning all current images would take a month to resize and upload to S3. The major problem with that is that we are constantly adding new sources for images, and expect to have at least 30-50 million images before the end of 2011, compared to current 4 million at the start of May.
I have noticed one answer in StackOverflow which might be a good complement to our solution, where images are resized and uploaded on the fly, but since we don't want any unnecessary delay when people visit pages, we need to make certain that as many images as possible are already uploaded. Besides this, we want multiple size formats of the images, and currently only upload the most important one because of this speed issue. Ideally, we would have at least three size formats (for example one thumbnail, one normal and one large) for each imported image.
Someone suggested making bulk uploads to S3 a few days ago - any experience in how much this could save would be helpful.
Replies to any part of the question would be helpful if you have experience with a similar process. Part of the code (simplified):
$newpic = $picloc . '-' . $width . 'x' . $height . '.jpg';
$pic = $this->image_moo
    ->load($picloc . '.jpg')
    ->resize($width, $height, TRUE)
    ->save($newpic, 'jpg');

if ($this->image_moo->errors) {
    // Do stuff if something goes wrong, for example if the image no longer
    // exists - this doesn't happen very often so is not a great concern
} else {
    if (S3::putObject(
        S3::inputFile($newpic),
        'someplace',
        str_replace('./upload/', '', $newpic),
        S3::ACL_PUBLIC_READ,
        array(),
        array(
            "Content-Type" => "image/jpeg",
        )
    )) {
        // save URL to resized image in database, unlink files etc, then start next image
    }
}
Why not add some wrapping logic that lets you define ranges or groups of images and then run the script several times on the server? If you can have four of these processes running at the same time on different sets of images, it'll finish four times faster!
If you're stuck trying to get through a really big backlog at the moment you could look at spinning up some Amazon EC2 instances and using them to further parallelize the process.
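As a hedged sketch of that range/group splitting, a CLI worker could take a worker index and only claim rows whose id falls in its partition; the table and column names here are assumptions.
<?php
// Sketch: resize_worker.php - run N copies, each handling its own slice of images.
// Usage: php resize_worker.php <worker_id> <worker_count>
// Table and column names (images, id, resized) are assumptions for illustration.
[$script, $workerId, $workerCount] = $argv;

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT id, source_url FROM images WHERE resized = 0 AND MOD(id, :count) = :worker'
);
$stmt->execute(['count' => (int) $workerCount, 'worker' => (int) $workerId]);

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Fetch, resize and upload $row exactly as in the existing pipeline,
    // then mark it as resized so no other worker picks it up again.
    processImage($row);                               // hypothetical wrapper around the existing steps
}
Launching four copies, php resize_worker.php 0 4 through php resize_worker.php 3 4, gives the four-way split described above.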
I suggest you split your script into 2 scripts which run concurrently. One would fetch remote images to local storage, doing so for any/all images that have not yet been processed or cached locally. Since the remote sources add a fair bit of delay to your requests, you will benefit from constantly fetching remote images rather than only fetching as you process each one.
Concurrently, you use a second script to resize any locally cached images and upload them to Amazon S3. Alternatively, you can split this part of the process as well, using one script for resizing to a local file and another to upload any resized files to S3.
The first part (fetch remote source image) would greatly benefit from running multiple concurrent instances like James C suggests above.
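A rough sketch of the fetch half of that split, which streams not-yet-fetched source images to a local staging directory with cURL; the paths, table and flag column are assumptions.
<?php
// Sketch: fetch_worker.php - pull remote source images to local disk so the
// resize/upload script never waits on the network. Names are assumptions.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$rows = $pdo->query(
    'SELECT id, source_url FROM images WHERE fetched = 0 LIMIT 500'
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    $dest = "/var/spool/images/{$row['id']}.jpg";     // local staging area for the resize script

    $ch = curl_init($row['source_url']);
    $fp = fopen($dest, 'wb');
    curl_setopt($ch, CURLOPT_FILE, $fp);              // stream straight to disk
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);

    if ($ok) {
        $pdo->prepare('UPDATE images SET fetched = 1 WHERE id = ?')->execute([$row['id']]);
    }
}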

PHP/JS - Create thumbnails on the fly or store as files

For an image hosting web application:
For my stored images, is it feasible to create thumbnails on the fly using PHP (or whatever), or should I save 1 or more different sized thumbnails to disk and just load those?
Any help is appreciated.
Save thumbnails to disk. Image processing takes a lot of resources and, depending on the size of the image, might exceed PHP's default memory limit. It is less of a concern if you have your own server with only your application running, but it still takes a lot of CPU power and memory to resize images. If you're considering creating thumbnails on the fly anyway, you don't have to change much: on the first request, create the thumbnail from the source file and save it to disk, and on subsequent requests just read it off the disk.
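A minimal sketch of that first-request pattern using GD; the thumbnail directory and fixed 150px width are assumptions.
<?php
// Sketch: create the thumbnail on the first request, serve it from disk afterwards.
function thumbnailPath(string $sourcePath, int $width = 150): string
{
    $thumbPath = '/var/www/thumbs/' . $width . '_' . basename($sourcePath); // assumed layout

    if (!file_exists($thumbPath)) {
        // First request: generate the thumbnail and save it to disk.
        $src    = imagecreatefromjpeg($sourcePath);
        $ratio  = $width / imagesx($src);
        $height = (int) round(imagesy($src) * $ratio);

        $thumb = imagecreatetruecolor($width, $height);
        imagecopyresampled($thumb, $src, 0, 0, 0, 0, $width, $height, imagesx($src), imagesy($src));
        imagejpeg($thumb, $thumbPath, 85);
        imagedestroy($src);
        imagedestroy($thumb);
    }

    return $thumbPath;                                // subsequent requests just read this file
}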
I use phpThumb, as it's the best of both worlds. You can create thumbnails on the fly, but it automatically caches the images to speed up future requests. It creates a nice wrapper around the GD and ImageMagick libraries. Worth a look!
It would be much better to cache the thumbnails. Generating them on the fly would be very taxing on the system.
It depends on the usage pattern of the site, but, basically, how many times do you expect each image to be viewed?
In the case of thumbnails, they're most likely to be around for quite a while (the image is uploaded once and never changed, so the thumbnail doesn't change either), so it's generally worthwhile to generate when the full image is uploaded and store them for later. Unless the site is completely dead, they'll be viewed many (hundreds or thousands of) times over their lifetime and disk is a lot cheaper than latency these days. This also becomes more significant as load on the server increases, of course.
Conversely, for something like stock charts that get updated every hour (if not more frequently), that would be a situation where you'd do better to create them on the fly, so as to avoid wasting CPU time on constantly generating images which no user will ever see.
Or, if you want to get fancy, you can optimize to handle either access pattern by generating the images on the fly the first time they're needed and then showing the pre-generated one afterwards, up until the data it's generated from changes, at which point you delete it so that it will be regenerated the next time it's needed. But that would be overkill for something as static as thumbnails, IMO.
Check out the GD library and ImageMagick.
