I am building a web application that allows the user to upload high-res images (in the ballpark of 10 MB). After an image is uploaded, the application creates a medium-sized and a thumbnail-sized version of it. It seems to take upwards of 100 MB of memory allocated to PHP just to resize to the medium image. I don't have a lot of experience with this type of scalability; will the site easily crash? Will I need web servers with 16 GB of memory just to handle the load of the resizing? Are there alternatives? Any information would be greatly appreciated!
Thank you!
You could create a queue of images to be resized and ensure that only x number of images are being resized at any given time. x would depend on the amount of available memory.
If you resize the images in real time as soon as they are uploaded, you are bound to run into a situation where more images are being resized than your memory can hold, which would cause a crash.
Instead, as the images are uploaded, add them to a DB. Then have a PHP script which fetches x images from the DB and forks a new process for each of these images to resize it. As and when a process reports completion to the parent, the parent deletes the image's entry from the queue and fetches another. Wash, rinse, repeat.
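A minimal sketch of such a worker, assuming a hypothetical image_queue table, placeholder DB credentials, and a resize_image() helper you would implement with GD or ImageMagick (pcntl_fork() is only available in CLI PHP):

<?php
// Sketch only: fetch up to $maxWorkers queued images, fork one child per
// image to resize it, and delete each queue row as its child finishes.
$maxWorkers = 4; // tune to available memory; each resize may need ~100 MB

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$rows = $pdo->query("SELECT id, path FROM image_queue LIMIT $maxWorkers")
            ->fetchAll(PDO::FETCH_ASSOC);
$pdo  = null; // close before forking so children don't inherit the connection

$children = [];
foreach ($rows as $row) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // Child process: do the heavy work, then exit.
        resize_image($row['path']); // hypothetical helper (GD/ImageMagick)
        exit(0);
    }
    $children[$pid] = $row['id'];
}

// Parent: as each child reports completion, delete its queue entry.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
while ($children) {
    $pid = pcntl_wait($status);
    if (isset($children[$pid])) {
        $pdo->prepare('DELETE FROM image_queue WHERE id = ?')
            ->execute([$children[$pid]]);
        unset($children[$pid]);
    }
}

Running this from cron (or in a loop) keeps the number of simultaneous resizes bounded by $maxWorkers, no matter how fast uploads arrive.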
Run benchmarks.
Do processing in the background.
Do processing during off-peak hours.
Use lighter libraries.
I'm developing with IPS 4 and have a profile popup.
The profile cover takes a long time to load because the cover's dimensions and file size are large. So I've decided to make a PHP API which resizes images to the needed size and then serves the resized image.
Is this a good idea to make the cover load faster?
You need to populate a 436x85 box with user-provided pictures.
My own digital camera has an 18 MP sensor that produces 4896x3672 pictures weighing around 7 MB when compressed as JPEG. Imagine you display, say, a dozen profiles per page. That's 84 MB worth of network transfer (more than a typical MP3-encoded music album) for a single page. JPEG compression achieves roughly a 1/10 ratio, so you can assume around 840 MB of RAM just to hold the decoded pictures. And then you have the overhead of the browser resampling the pictures in real time.
On the other hand, a 436x85 JPEG uses about 8 to 22 KB (depending on quality settings).
So if you use the raw pictures uploaded by users, of course it's not going to be fast.
Conclusion: always resize pictures yourself. And please do it only once, it's a heavy process even for your server.
Yes, it is a good idea to store not only the original image but the resized ones too, because otherwise every user requesting the page downloads the big image, which is basically a waste of transfer and makes the user wait, leading to a poor user experience.
You should make a script which resizes and saves newly uploaded images on your server and use those instead of the big originals. But also don't forget that resizing is really CPU-heavy, so it would be a good idea to queue this action rather than doing it instantly during the user's request.
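A minimal sketch of such a script using GD; the file names and the 800 px target width are assumptions, and in practice you would call it from the queued job rather than inside the upload request:

<?php
// Sketch only: scale a JPEG down to a maximum width, keeping the aspect ratio.
function save_resized_jpeg(string $src, string $dest, int $maxWidth): void
{
    [$w, $h] = getimagesize($src);
    $ratio = min(1, $maxWidth / $w);   // never upscale
    $newW  = (int) round($w * $ratio);
    $newH  = (int) round($h * $ratio);

    $srcImg = imagecreatefromjpeg($src);
    $dstImg = imagecreatetruecolor($newW, $newH);
    imagecopyresampled($dstImg, $srcImg, 0, 0, 0, 0, $newW, $newH, $w, $h);
    imagejpeg($dstImg, $dest, 85);     // JPEG quality 85

    imagedestroy($srcImg);
    imagedestroy($dstImg);
}

save_resized_jpeg('uploads/original.jpg', 'covers/resized.jpg', 800);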
I'm in the process of making a website for a school group, and the website will have a lot of pictures of its activities. I'm using Lightbox, so the pictures display as a slideshow, and I changed the dimensions of the pictures so the "current" picture in the slideshow isn't as big as the original. I still see about a 5-second delay when opening a picture or going to the next one. I'm wondering if there's a way to achieve a faster load time, or another method I didn't consider.
I'm using XHTML, CSS, and PHP for my site.
Sorry for posting this as an answer, but I can't write comments yet...
I do it the other way: after upload, I resample the picture into a big version and a thumbnail.
Of course you can resample the picture every time, but that is probably the reason you have to wait so long, and it is a lot of work for the server if you have many visitors at the same moment.
So for me the best way is to decide that the big image is 640x480 max, and to save the picture at that size right after upload, of course resampled at the same aspect ratio.
Edit: From your post I can't tell whether you resize/resample the image, where you do it (in HTML by setting height and width, or in PHP), and how often.
Let's say you have a picture on the server whose dimensions are 8000x6000; its file size could be something like 10 MB.
Now let's say you want to display this image in a web page and do it like this:
<img src="largeImage.jpg" width="800" height="600"/>
The browser will download the large image (10 MB), which takes a whole lot of time, and will then resize it to 800x600 in memory to display it on the web page (this consumes memory and CPU time). Total time: 25 seconds.
Now suppose you resize this image to 800x600 and put this resized image on the server. You then display the image with
<img src="smallImage.jpg" width="800" height="600"/>
The small image will look identical to the user visiting your web page, but the browser will only have to download a small image, which will be something like 100 KB large (100 times less than the large image). The time taken to download the image will be divided by 100 as well (0.25 seconds), and the browser won't have to load and resize a huge image in memory (less memory, less CPU time). Your image will be visible almost instantly.
There are many tools (I use Irfanview myself) which are able to take a large collection of images and resize them all at once. You should do that.
I'm building an image gallery which presents a few images on the front page. The images are larger than the size actually displayed on the front page, which leads me to the following question:
If cache is not an option, what would be better:
1. Use PHP to shrink the image and send it to the client.
2. Send the original full-size image and let the client shrink it (with simple width and height attributes).
I tend to think that the second is a better solution, but I'd like to hear more opinions.
Thanks!
Edit:
When people upload the images, I create thumbnails for them to be displayed when browsing the site.
The "cache is not an option" reason:
The discussed images are 5 "featured" images on the front page which will not stay the same for more than an hour max, so isn't it a waste to create another copy of every uploaded image just for that?
Essentially, it depends on
What's the original-to-desired width/height ratio? It's not a big deal serving a 500x500 image and showing it as 250x250, but wasting bandwidth on 1920x1080 images is. Also, mobile devices might not have enough resources available to actually display the webpage if you serve too many big images.
What do you have more of: bandwidth or CPU power? Can you make sure nobody uses your on-the-fly resizer as DOS target?
Generally solutions with a cache, even a very temporary one, are much better though.
Regarding the edit to the question:
The discussed images are 5 "featured" images on the front page which will not stay the same for more than an hour max, so isn't it a waste to create another copy of every uploaded image just for that?
It is. But you could simply create a separate folder for these thumbnails and set up a cron job to wipe files older than an hour. Here's an example I use on my site (set to 30 minutes):
*/15 * * * * find /var/www/directory/ -mmin +30 -exec rm -f {} \; >/dev/null 2>&1
Given 'enough' CPU resources, I would prefer to shrink images before sending them, to go easy on people with bad connections and mobile devices.
Another option and my preferred strategy would be to keep smaller versions of the images and then use them. If the images are uploaded at some point, then create a smaller version of the image on upload.
It kind of depends on your flow, but I would resize them on the fly and save the thumb. So if the thumb exists, serve it; if not, resize on the fly and serve that (while saving the thumb).
Then in a cronjob you can remove old images.
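A minimal sketch of that serve-or-generate flow, assuming an images/ directory of originals, a thumbs/ cache directory, and a make_thumb() helper (for example the GD routine sketched earlier); it would be requested as something like thumb.php?img=foo.jpg:

<?php
// Sketch only: serve the cached thumb if it exists, otherwise create it first.
// The cron job can then expire stale files in thumbs/.
$name  = basename($_GET['img'] ?? '');    // basename() blocks path traversal
$orig  = __DIR__ . '/images/' . $name;
$thumb = __DIR__ . '/thumbs/' . $name;

if (!is_file($orig)) {
    http_response_code(404);
    exit;
}

if (!is_file($thumb)) {
    make_thumb($orig, $thumb, 200, 200);  // hypothetical helper; target size is an assumption
}

header('Content-Type: image/jpeg');
readfile($thumb);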
How about option 3: don't resize the images in the process that's supposed to serve them to the client. Make a background process do the resizing, and send the thumbnails if they have been resized, or the full images if not yet. This has the advantage that you can throttle the resizing process independently of user requests.
We are using PHP with CodeIgniter to import millions of images from hundreds of sources, resizing them locally and then uploading the resized version to Amazon S3. The process is however taking much longer than expected, and we're looking for alternatives to speed things up. For more details:
A lookup is made in our MySQL database table for images which have not yet been resized. The result is a set of images.
Each image is imported individually using cURL and temporarily hosted on our server during processing. They are imported locally because the library doesn't allow resizing/cropping of external images. According to some tests, the speed difference when importing from different external sources has been between 80-140 seconds (for the entire process, using 200 images per test), so the external source can definitely slow things down.
The current image is resized using the image_moo library, which creates a copy of the image.
The resized image is uploaded to Amazon S3 using a CodeIgniter S3 library.
The S3 URL for the new resized image is then saved in the database table before starting with the next image.
The process is taking 0.5-1 second per image, meaning all current images would take a month to resize and upload to S3. The major problem with that is that we are constantly adding new sources for images, and expect to have at least 30-50 million images before the end of 2011, compared to current 4 million at the start of May.
I have noticed one answer in StackOverflow which might be a good complement to our solution, where images are resized and uploaded on the fly, but since we don't want any unnecessary delay when people visit pages, we need to make certain that as many images as possible are already uploaded. Besides this, we want multiple size formats of the images, and currently only upload the most important one because of this speed issue. Ideally, we would have at least three size formats (for example one thumbnail, one normal and one large) for each imported image.
Someone suggested making bulk uploads to S3 a few days ago - any experience in how much this could save would be helpful.
Replies to any part of the question would be helpful if you have experience with a similar process. Part of the code (simplified):
$newpic = $picloc.'-'.$width.'x'.$height.'.jpg';

// Resize the source image and save the result as a new local file
$pic = $this->image_moo
    ->load($picloc.'.jpg')
    ->resize($width, $height, TRUE)
    ->save($newpic, 'jpg');

if ($this->image_moo->errors) {
    // Do stuff if something goes wrong, for example if the image no longer
    // exists - this doesn't happen very often so it is not a great concern
} else {
    // Upload the resized image to S3 with a public-read ACL
    if (S3::putObject(
        S3::inputFile($newpic),
        'someplace',
        str_replace('./upload/', '', $newpic),
        S3::ACL_PUBLIC_READ,
        array(),
        array(
            "Content-Type" => "image/jpeg",
        )
    )) {
        // Save URL to resized image in database, unlink files etc.,
        // then start the next image
    }
}
Why not add some wrapping logic that lets you define ranges or groups of images and then run the script several times on the server? If you can have four of these processes running at the same time on different sets of images, it'll finish four times faster!
If you're stuck trying to get through a really big backlog at the moment you could look at spinning up some Amazon EC2 instances and using them to further parallelize the process.
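A minimal sketch of that wrapping logic, assuming the images table has an auto-increment id and a resized flag, and that resize_and_upload() wraps the existing image_moo + S3 code from the question; each invocation is given its own id range so several copies can run in parallel:

<?php
// Sketch only. Usage: php resize_worker.php <first_id> <last_id>
[, $firstId, $lastId] = $argv;

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT id, local_path FROM images
      WHERE resized = 0 AND id BETWEEN :first AND :last
      ORDER BY id'
);
$stmt->execute([':first' => (int) $firstId, ':last' => (int) $lastId]);

foreach ($stmt as $row) {
    resize_and_upload($row['local_path'], $row['id']); // hypothetical wrapper
}

You could then start, for example, four workers covering different id ranges (php resize_worker.php 1 1000000 &, php resize_worker.php 1000001 2000000 &, and so on), either on the same box or on separate EC2 instances.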
I suggest you split your script into two scripts which run concurrently. One would fetch remote images to a local source, doing so for any/all images that have not yet been processed or cached locally. Since the remote sources add a fair bit of delay to your requests, you will benefit from constantly fetching remote images rather than only fetching as you process each one.
Concurrently, you use a second script to resize any locally cached images and upload them to Amazon S3. Alternatively, you can split this part of the process as well, using one script for resizing to a local file and another to upload any resized files to S3.
The first part (fetching the remote source images) would greatly benefit from running multiple concurrent instances, as James C suggests above.
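A minimal sketch of the separate fetcher script, assuming a source_url column and a fetched flag on the same images table (all names are assumptions); the resize/upload script then only ever touches local files:

<?php
// Sketch only: download remote originals into the local upload directory so
// the resize/upload script never has to wait on the network.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$rows = $pdo->query('SELECT id, source_url FROM images WHERE fetched = 0 LIMIT 200');

foreach ($rows as $row) {
    $local = './upload/' . $row['id'] . '.jpg';

    $ch = curl_init($row['source_url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $data = curl_exec($ch);
    curl_close($ch);

    if ($data !== false && file_put_contents($local, $data) !== false) {
        $pdo->prepare('UPDATE images SET fetched = 1 WHERE id = ?')
            ->execute([$row['id']]);
    }
}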
For an image hosting web application:
For my stored images, is it feasible to create thumbnails on the fly using PHP (or whatever), or should I save one or more different-sized thumbnails to disk and just load those?
Any help is appreciated.
Save thumbnails to disk. Image processing takes a lot of resources and, depending on the size of the image, might exceed PHP's default memory limit. It is less of a concern if you have your own server with only your application running, but it still takes a lot of CPU power and memory to resize images. If you're considering creating thumbnails on the fly anyway, you don't have to change much: upon the first request, create the thumbnail from the source file, save it to disk, and on subsequent requests just read it off the disk.
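As a rough guide to that memory concern: GD decodes the whole image into memory, so the cost depends on pixel dimensions rather than file size. A common back-of-the-envelope estimate (the 1.7 overhead factor is empirical, not an official figure):

<?php
// Sketch only: roughly width * height * 4 bytes (RGBA) plus overhead.
function estimated_decode_bytes(string $file): int
{
    [$width, $height] = getimagesize($file);
    return (int) ($width * $height * 4 * 1.7); // 1.7 = empirical fudge factor
}

// A 4896x3672 photo needs on the order of 120 MB to decode,
// no matter how small its JPEG file is on disk.
echo estimated_decode_bytes('uploads/photo.jpg') . " bytes\n";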
I use phpThumb, as it's the best of both worlds. You can create thumbnails on the fly, but it automatically caches the images to speed up future requests. It creates a nice wrapper around the GD and ImageMagick libraries. Worth a look!
It would be much better to cache the thumbnails. Generating them on the fly would be very taxing on the system.
It depends on the usage pattern of the site, but, basically, how many times do you expect each image to be viewed?
In the case of thumbnails, they're most likely to be around for quite a while (the image is uploaded once and never changed, so the thumbnail doesn't change either), so it's generally worthwhile to generate when the full image is uploaded and store them for later. Unless the site is completely dead, they'll be viewed many (hundreds or thousands of) times over their lifetime and disk is a lot cheaper than latency these days. This also becomes more significant as load on the server increases, of course.
Conversely, for something like stock charts that get updated every hour (if not more frequently), that would be a situation where you'd do better to create them on the fly, so as to avoid wasting CPU time on constantly generating images which no user will ever see.
Or, if you want to get fancy, you can optimize to handle either access pattern by generating the images on the fly the first time they're needed and then showing the pre-generated one afterwards, up until the data it's generated from changes, at which point you delete it so that it will be regenerated the next time it's needed. But that would be overkill for something as static as thumbnails, IMO.
Check out the GD library and ImageMagick.
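For example, with the Imagick extension (the PHP bindings for ImageMagick) a thumbnail takes only a few lines; the file names here are placeholders:

<?php
// Sketch only: width 200, height 0 means the height is scaled to keep the ratio.
$img = new Imagick('uploads/original.jpg');
$img->thumbnailImage(200, 0);
$img->writeImage('thumbs/original-200.jpg');
$img->clear();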