I've been searching for a while but haven't found what I wanted, so here is my problem:
Using PHP, I want to create a very big image file, let's say 20000 gigapixels, then add a small image at a specific location on this big image. My computer doesn't have enough RAM to load the entire image and manipulate pixels that way, so I think I need to access the image data on the hard disk and manipulate it there. Does anyone know how to do this?
thanks for helping me out :)
ImageMagick supports operations on very large files. I don't see support for this in the PHP/ImageMagick API, but you could call out (exec) to the command-line program and use one of its disk-caching or streaming options.
There is some documentation for dealing with large files here: www.imagemagick.org.
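For example, here is a rough sketch of what that call-out could look like. The file names, offsets and memory limits are placeholders, and it assumes ImageMagick's convert binary is on the server's PATH:

$x = 120000;
$y = 80000;

// -limit memory / -limit map make ImageMagick spill pixel data to disk
// once the limits are exceeded, instead of holding it all in RAM.
$cmd = sprintf(
    'convert -limit memory 256MiB -limit map 512MiB huge.tif small.png ' .
    '-geometry %+d%+d -composite huge-out.tif',
    $x, $y
);
exec($cmd, $output, $status);
if ($status !== 0) {
    die('ImageMagick failed: ' . implode("\n", $output));
}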
What would you do with an image that size? You couldn't serve it to a browser, and even if you did manage to load it into the server, it would take up all the server's resources, so you wouldn't be able to use the server for anything else in the meantime.
The short answer is that handling an image of that kind of scale as a single file in RAM is out of the question unless you've got an extremely powerful machine dedicated to it, and nothing else. At 20k x 20k pixels, even a simple greyscale image (one byte per pixel) is going to take 400 MB. Scale that up to any useful colour depth, and you're talking about gigabytes of RAM just to hold the graphic, and that's before we even start thinking about actually doing stuff with it.
I guess the solution is to look at what other people do, given the same problem.
Real applications that use images of that scale (e.g. mapping apps or large panorama photos) store their image as a series of much smaller blocks. Each block is a smaller image in its own right. They'd also usually have separate sets of blocks for each zoom level. Handling a single massive image file is implausible for any realistic server environment, but smaller chunks make it easy to handle for both the browser and the server. The server just sends the user the blocks that are in the current view; when the user scrolls or zooms, more blocks get sent.
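To make the "write into a block" idea concrete, here is a rough PHP/GD sketch. It assumes a made-up layout of 256x256 PNG tiles named tiles/tile_{col}_{row}.png that already exist on disk; a real tiling scheme (zoom levels, file format, edge handling) would differ:

const TILE = 256;

function pasteIntoTiles($smallPath, $dstX, $dstY) {
    $small = imagecreatefrompng($smallPath);
    $w = imagesx($small);
    $h = imagesy($small);

    // Visit only the tiles the small image overlaps.
    for ($col = intdiv($dstX, TILE); $col <= intdiv($dstX + $w - 1, TILE); $col++) {
        for ($row = intdiv($dstY, TILE); $row <= intdiv($dstY + $h - 1, TILE); $row++) {
            $tilePath = "tiles/tile_{$col}_{$row}.png";
            $tile = imagecreatefrompng($tilePath);

            // Copy just the overlapping region of $small into this tile.
            imagecopy(
                $tile, $small,
                max(0, $dstX - $col * TILE),   // dest x within the tile
                max(0, $dstY - $row * TILE),   // dest y within the tile
                max(0, $col * TILE - $dstX),   // src x within the small image
                max(0, $row * TILE - $dstY),   // src y within the small image
                min($w, ($col + 1) * TILE - $dstX) - max(0, $col * TILE - $dstX),
                min($h, ($row + 1) * TILE - $dstY) - max(0, $row * TILE - $dstY)
            );
            imagepng($tile, $tilePath);
            imagedestroy($tile);
        }
    }
    imagedestroy($small);
}

Only a handful of 256x256 tiles are ever in memory at once, no matter how big the overall image is.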
Your question mentions adding a smaller image at a specific location on the big one. Again, looking at how others do this, Google Maps and others handle this kind of thing using a layering system. The layers are built up and sent to the browser separately.
I know that doesn't directly answer the question, but I hope it gives you some options to think about.
Just keep a simple file, not an image, and store the pixel data in it in any custom format. PHP has an fseek function, which lets you jump to any location in the file, so you can calculate the needed offset and perform a read/write there. If you have an image of size W x H, and each pixel takes 3 bytes, then the offset of pixel (X, Y) in the file will be (W * Y + X) * 3.
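A quick sketch of that idea, assuming a headerless file of 3-byte RGB pixels that has already been created at its full W * H * 3 size (the file name and width below are made up):

$W = 100000;   // image width in pixels (example value)

// Write one RGB pixel at (X, Y); the offset is (W * Y + X) * 3 as above.
function writePixel($fp, $W, $x, $y, $r, $g, $b) {
    fseek($fp, ($W * $y + $x) * 3);
    fwrite($fp, chr($r) . chr($g) . chr($b));
}

// Read one RGB pixel back as array($r, $g, $b).
function readPixel($fp, $W, $x, $y) {
    fseek($fp, ($W * $y + $x) * 3);
    return array_values(unpack('C3', fread($fp, 3)));
}

$fp = fopen('bigimage.raw', 'r+b');
writePixel($fp, $W, 12345, 6789, 255, 0, 0);   // paint one pixel red
fclose($fp);

Note that on a 32-bit PHP build the offset will overflow once it passes 2 GB, so for really large files this only works on a 64-bit build.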
I need to display ±250 JPEG images (which are maps with no georeference). Each JPEG is about 7 MB in file size. When I make tiles from a JPEG with MapTiler, the total tile structure grows to 40 MB. I don't like this because I don't have that much space on the server. I wonder if there will be a significant decrease in performance if I create the tiles on demand with PHP (i.e. create and serve them when needed). Of course it will make the server work harder, but will it be that bad?
There will be quite a decrease in performance if you move this server-side and do it in PHP. Really, the best option would be to get more server space or use something like S3 to host the images, which would do it cheaply and simply.
I've been on a project for the past few days and hit a problem displaying large quantities of images (20+ GB total, ~1-2 GB per directory) in a gallery on one area of the site. The site is built on the Bootstrap framework. I've been trying to make massive carousels that ultimately do not function fluidly due to the combined size of /images. Question A: in this situation, should I store the images in a database and do I/O from there? Is that faster than keeping them in the /images folder on the front end?
And B) in my PHP script I need to set directories to variables, iterate through them, and display the images in <li> elements, but how do I put controls on memory usage so as not to overload the browser? Any additions, suggestions, or alternatives would be greatly appreciated. I'm looking for the most direct means to the end here.
Though the question is a little generic, here are some thoughts regarding your two questions:
A) No, performance pulling images from a database would most likely be worse than pulling straight from the file system. In general, it is not a good idea to store images or other binary data in databases unless you absolutely have to, because databases can't do much with this information and you are just adding an extra layer on top of the file system that doesn't need to be there. You would, however, want to store paths to images in your database, potentially along with other characteristics such as image dimensions, thumbnail paths, keywords, etc. Then your application would read the entries for the images to return the correct paths to the images.
B) You will almost certainly want to implement some sort of paging if you are displaying many hundreds or thousands of photos. If the final display must be a carousel, you will want to investigate the JavaScript that drives it to determine how you could hook in a function that retrieves more results from your PHP application via an AJAX call when it reaches the end or near end of the current listing of images. If you are having problems with the browser crashing due to too many images, you will also want to remove images from the first part of the list of <li>s when you load new ones so that it keeps the DOM under control.
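A minimal sketch of what the server side of that AJAX call could look like; the images table, its id/path columns, and the connection details are all hypothetical:

$perPage = 50;
$page    = isset($_GET['page']) ? max(0, (int) $_GET['page']) : 0;

$pdo  = new PDO('mysql:host=localhost;dbname=gallery', 'user', 'pass');
$stmt = $pdo->prepare('SELECT id, path FROM images ORDER BY id LIMIT :lim OFFSET :off');
$stmt->bindValue(':lim', $perPage, PDO::PARAM_INT);
$stmt->bindValue(':off', $page * $perPage, PDO::PARAM_INT);
$stmt->execute();

// The carousel's JavaScript appends these paths as new <li> items and can
// drop old ones from the top of the list to keep the DOM small.
header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));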
A) It's a bad idea to store that much binary data in a database. Even if the DB allows it, you shouldn't: it also means much higher memory consumption, because your data sits in the database's memory space and is then copied into PHP's memory space for you to handle, which eats up twice the memory, plus the overhead of running a database server, querying it, and so on. So no, a database is slower here; accessing the filesystem directly is faster, and if you also put Varnish or another front-end cache in front, you'll be able to serve content much faster still.
What I would do is store the files on the filesystem. The best servers for static serving like that are G-WAN or nginx, but do your reading and decide for yourself what suits you best. The point is: stay away from Apache, and preferably host all those static files on a separate server running a lightweight HTTP server.
Pro tip: save multiple copies of the same image at scaled-down sizes, for example one at 50% and another at 25% of the original size. That way you can send the small thumbnails first for quick browsing, then serve the 50% or 100% version when a user decides to view an image, depending on their screen size. You save bandwidth and memory, and you also save mobile users a big 3G bill.
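A short sketch of generating those 50% and 25% copies up front with GD (imagescale needs PHP 5.5+; the paths are just examples):

$src = imagecreatefromjpeg('photos/original.jpg');
$w   = imagesx($src);

foreach (array(50, 25) as $pct) {
    // imagescale keeps the aspect ratio when only a width is given
    $scaled = imagescale($src, (int) ($w * $pct / 100));
    imagejpeg($scaled, "photos/original-{$pct}.jpg", 80);   // 80 = JPEG quality
    imagedestroy($scaled);
}
imagedestroy($src);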
B) This is where it makes some sense to use a database: you can index all the directories into a database and use it to store the location of each image in the FS, perhaps some tags, and maybe even the number of views, etc.
In the frontend, implement a script that fetches, for example, 50 thumbnails per page; the user can scroll around using some fancy jQuery, and when you need to fetch more, simply get a new result set with 50 more thumbs, and so on.
That way you'll save memory and bandwidth, and the users will thank you for such a lightweight browsing experience!
Another tip:
If you want to be able to handle more traffic, you might want to consider using a CDN. There are many CDN services that aren't as expensive as Amazon S3; a simple search will give you tons of resources!
Happy hacking!
I want to create multiple thumbnails using the GD library in PHP, and I already have a script to do this. The question is what is better for me: is it better to create thumbnails on the fly, or to create a physical file on my server each time I want a thumb? And why?
Please consider the time cost, storage capacity, and other disadvantages of both.
When you create the thumbnail depends on a couple of factors (which I'll get into), but you should never discard the output of an operation like this (unless you'll never use it again), as it's really expensive.
Anyway your two main choices for "when to generate the thumbnail" are:
When it's first requested. This is common, and it means that you don't generate thumbnails that are never used, but it does mean that if you have a page full of first-time thumbnails, the server might become overwhelmed with PHP processes generating them.
I had a similar issue with Sorl+Django where I was generating 100+ thumbnails per request for the first few requests after uploading and it basically made the entire server hang for 20 minutes. Not good.
Generate all required thumbnails when you upload. Because uploads already take a long time, this spreads the processing out quite a lot. You can also pull it out of process (i.e. use another script to process uploads, perhaps not even in PHP).
The obvious downside is you're using up disk space that you otherwise might not need to use up... But unless you're talking about hundreds of thousands of thumbnails, a small percentage of unused ones probably won't break the bank.
Of course, if disk space is an issue, there might be an argument for pushing the thumbnail up to a CDN at the same time as you process it.
One note when you save the thumbnails: it's fairly common to want to resize them at some point down the line, or perhaps to want two small variants. I find it really useful to make the filenames very specific, so if the original image is image.jpg, the 200x200 version is image-200x200.jpg.
Neither/both - don't generate the thumbnails till you need them - but keep the files you generate.
That way you'll minimise the amount of work needed and have a self-repairing system.
C.
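A rough sketch of that neither/both approach, combined with the size-in-the-filename convention mentioned above; the paths and quality setting are placeholders, and imagescale needs PHP 5.5+:

// Return the path to an image-200x200.jpg style thumbnail, generating (and
// keeping) the file on the first request only.
function thumbPath($original, $w, $h) {
    $info  = pathinfo($original);
    $thumb = $info['dirname'] . '/' . $info['filename'] . "-{$w}x{$h}." . $info['extension'];

    if (!file_exists($thumb)) {              // pay the GD cost only once
        $src    = imagecreatefromjpeg($original);
        $scaled = imagescale($src, $w, $h);
        imagejpeg($scaled, $thumb, 85);
        imagedestroy($scaled);
        imagedestroy($src);
    }
    return $thumb;   // "self-repairing": delete the file and it gets rebuilt on the next request
}

echo '<img src="' . thumbPath('uploads/image.jpg', 200, 200) . '">';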
GD is really resource-heavy, so you should look at whether you can use ImageMagick instead (which also has clearer syntax).
You will definitely be better off caching the created thumbnails after the first run (regardless of whether you use GD or ImageMagick) and serving them from the cache. If you are worried about storage, clear old files out of the cache now and then.
Always cache (i.e. write out to disk) the results of GD operations. They are too expensive in both processor time and memory to be done on the fly every time. This becomes increasingly true the more visitors/hits you have.
A certain site I know recently upgraded their bandwidth from 2.5 TB monthly to 3.5 TB.
The reason is they went over the 2.5 TB limit recently. They're complaining that they don't know how to bring the bandwidth usage down.
One thing I haven't seen them consider is the fact that JPEGs and other images displayed on the site (and it is an image-heavy site) can contain metadata: where the picture was taken and such.
The fact of the matter is, this information is of no importance whatsoever on that site. It's never going to be used. Yet it still adds to the bandwidth, since it increases the file size of every image by anywhere from a few bytes to a few kilobytes.
On a site that uses more than 2.5 TB per month, stripping several thousand images of their metadata should decrease the bandwidth usage by at least a few gigabytes per month, I think, if not more.
So is there a way to do this in PHP? And also, for the already existing files, does anybody know a good automatic metadata remover? I know of JPEG & PNG Stripper, but that's not very good... Might be useful for initial cleaning though...
It's trivial with GD:
$quality = 85;                             // 0-100; lowering this also increases compression
$img = imagecreatefromjpeg("myimg.jpg");   // GD keeps only the pixels; EXIF is dropped on decode
imagejpeg($img, "newimg.jpg", $quality);   // write a fresh JPEG without the metadata
imagedestroy($img);
This won't transfer the EXIF data. I don't know how much bandwidth it will actually save, but you could use the code above to increase the compression of the images. That would save a lot of bandwidth, although it might not be very popular.
I seriously doubt image metadata is the root of all evil here.
Some questions to take into consideration:
How is the webserver configured?
Does it issue HTTP 304 responses properly? (See the sketch after this list.)
Isn't there some kind of hand-made caching/streaming of data through PHP scripting that prevents said data from being cached by the browser? (In which case, URL rewriting and HTTP redirections should be considered.)
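For the 304 point, here is a minimal sketch of what a PHP script that serves images itself would need to do (a static file served directly by the webserver gets this behaviour for free; the path is just an example):

$file  = 'images/photo.jpg';
$mtime = filemtime($file);
$since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : '';

if ($since && strtotime($since) >= $mtime) {
    header('HTTP/1.1 304 Not Modified');   // the browser already has this version
    exit;
}

header('Content-Type: image/jpeg');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
readfile($file);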
Check out Smush.it! It will strip all unnecessary info from an image. They have an API you can use to crunch the images.
Note: by design, it may change the file type on you. This is on purpose: if another file type can display the same image at the same quality with fewer bytes, it will give you a new file.
I think you need to profile this. You might be right about it saving a few GB, but that's relatively little out of 2.5 TB of bandwidth. You need real data about what is being served most and should work on that. If you do find it is images that push your bandwidth usage so high, you should first check your caching headers and 304 responses; you might also want to investigate using something like Amazon S3 to serve your images. I have managed to reduce bandwidth costs a lot by doing this.
That said, if the EXIF data is really making that much of a difference then you can use the GD library to copy a jpeg image using the imagejpeg function. This won't copy EXIF data.
Emil H's answer probably addresses the question best.
But I wanted to add that this will almost certainly not save you as much as you may think. This type of metadata takes up very little space; I would think that
Re-compressing the images to a smaller file size, and
Cropping or resizing to reduce the resolution of the images
are both going to have a much greater effect. With point one alone you could probably drop bandwidth by 50%, and with both you could drop it by 80%; that is, if you are willing to sacrifice some image size.
If not, you could always have the default view at a smaller size, with an 'enlarge' link. Most people just browsing will see the smaller image, and only those who want the largest size will click to enlarge it, so you'll still get almost all the bandwidth saving. This is what Flickr does, for example.
Maybe some sort of hex data manipulation would help here. I'm facing the same problem and investigating some sort of automated solution.
Just wondering if that can be done; if it's possible, I'll write a PHP class for this.
It might be smart to do all the image manipulation on the client side (using a Java applet, as Facebook does) and then, when the image is compressed, resized, and fully stripped of unnecessary pixels and content, it can be uploaded at its optimal size, saving you bandwidth and server-side processing! (At the cost of initial development.)
For an image hosting web application:
For my stored images, is it feasible to create thumbnails on the fly using PHP (or whatever), or should I save one or more different-sized thumbnails to disk and just load those?
Any help is appreciated.
Save thumbnails to disk. Image processing takes a lot of resources and, depending on the size of the image, might exceed the default allowed memory limit for PHP. It is less of a concern if you have your own server with only your application running, but it still takes a lot of CPU power and memory to resize images. If you're considering creating thumbnails on the fly anyway, you don't have to change much: on the first request, create the thumbnail from the source file, save it to disk, and on subsequent requests just read it off the disk.
I use phpThumb, as it's the best of both worlds. You can create thumbnails on the fly, but it automatically caches the images to speed up future requests. It creates a nice wrapper around the GD and ImageMagick libraries. Worth a look!
It would be much better to cache the thumbnails. Generating them on the fly would be very taxing on the system.
It depends on the usage pattern of the site, but, basically, how many times do you expect each image to be viewed?
In the case of thumbnails, they're most likely to be around for quite a while (the image is uploaded once and never changed, so the thumbnail doesn't change either), so it's generally worthwhile to generate when the full image is uploaded and store them for later. Unless the site is completely dead, they'll be viewed many (hundreds or thousands of) times over their lifetime and disk is a lot cheaper than latency these days. This also becomes more significant as load on the server increases, of course.
Conversely, for something like stock charts that get updated every hour (if not more frequently), that would be a situation where you'd do better to create them on the fly, so as to avoid wasting CPU time on constantly generating images which no user will ever see.
Or, if you want to get fancy, you can optimize to handle either access pattern by generating the images on the fly the first time they're needed and then showing the pre-generated one afterwards, up until the data it's generated from changes, at which point you delete it so that it will be regenerated the next time it's needed. But that would be overkill for something as static as thumbnails, IMO.
Check out the GD library and ImageMagick.