I am working with a large number of pages (letters) that are identical except for the address and a few other minor details. I believe what slows down the PDF creation the most is the logo image that I include on every page (even though it is fairly small).
I'm hoping to speed up the process some more by caching the logo, i.e. by loading the file once, storing it in a variable, and having TCPDF use that instead of loading the image every time. TCPDF can load a "PHP image data stream", and the example given is this:
$imgdata = base64_decode('iVBORw0KGgoAAAANSUhEUgAAABwAAAASCAMAAAB/2U7WAAAABlBMVEUAAAD///+l2Z/dAAAASUlEQVR4XqWQUQoAIAxC2/0vXZDrEX4IJTRkb7lobNUStXsB0jIXIAMSsQnWlsV+wULF4Avk9fLq2r8a5HSE35Q3eO2XP1A1wQkZSgETvDtKdQAAAABJRU5ErkJggg==');
$pdf->Image('#'.$imgdata);
However, I have no idea how to create an image stream like this from a file.
My logo is a small (4kB) PNG file. If I use readfile($file) and pass the result to $pdf->Image() with the '#' in front, it errors out with something about the cache folder, which is already set to chmod 777 (it's a test server; I'll sort out proper permissions on the live server). I believe I also tried base64_encode(), which didn't work either.
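For what it's worth, the likely culprit with readfile(): it writes the file straight to the output buffer and returns the number of bytes read, so '#'.readfile($file) passes a byte count rather than the image data. A minimal sketch of loading the data once instead, assuming the '#' prefix from the example above (newer TCPDF releases document '@' for raw image data) and a hypothetical logo path:

// Load the logo's raw bytes once, outside the page loop.
// file_get_contents() returns the data itself; no base64 round-trip is
// needed (the example above only decodes base64 because the bytes were
// embedded in source code as text).
$logoData = file_get_contents('/path/to/logo.png'); // path is an assumption

foreach ($letters as $letter) {               // $letters stands in for your data
    $pdf->AddPage();
    $pdf->Image('#' . $logoData, 10, 10, 30); // x, y, width in user units
    // ... render the address and the rest of the letter ...
}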
Any thoughts on how to do this?
PS: I've already noticed that the more pages I include in the PDF, the slower it gets, so I'll find a good middle ground (probably 200-250 pages per file instead of the current 500).
Thanks!
I posted the same question in the TCPDF forum on SourceForge (sourceforge forum post), and the author of TCPDF answered.
He said that images are cached internally; however, if the images need processing, he suggests using the XObject() template system (see example 62 on the TCPDF site).
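For reference, a rough sketch of that template approach, using TCPDF's startTemplate()/endTemplate()/printTemplate() methods from example 62 (the logo path, sizes, and coordinates are assumptions):

// Draw the logo into a reusable XObject template once...
$tpl = $pdf->startTemplate(30, 30);          // template width/height in user units
$pdf->Image('/path/to/logo.png', 0, 0, 30);  // hypothetical logo path
$pdf->endTemplate();

// ...then stamp the template onto every page.
foreach ($letters as $letter) {
    $pdf->AddPage();
    $pdf->printTemplate($tpl, 10, 10, 30);   // x, y, width
    // ... rest of the letter ...
}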
It took me a while to get it working (still not sure why it didn't work for me at first), but once I had it looking exactly like my original version using Image(), I ran a few tests with about 3,000 entries divided into PDF files of 500 pages each.
There was no speed gain at all between XObject() and Image(), and XObject() actually appeared to make the resulting files just a tiny bit larger (2.5kB in a 1.2MB file).
While this doesn't directly answer my original question (how to create a PHP data stream that can be used directly in TCPDF via Image('#'.$image)), it tells me what I really needed to know: the image is already cached, and caching via XObject() provides no advantage in my situation.
Related
I'm currently rewriting a website that needs each image in a lot of different sizes. In the past I handled this by creating thumbnail images for all sizes at upload time. But now I have doubts about its performance, because the design has changed and half of my images are no longer the right size. So I'm considering two solutions:
Keep doing this and add a button to the backend to regenerate all the images. The problem is that I would always need to know every size needed by every part of the site.
Only upload the full-size image, and when displaying it, use a src attribute like src="thumbs.php?img=my-image-path/image.jpg&width=120&height=120". The script creates the thumb and displays it; it also checks whether the thumb already exists, and if so, just displays it without recreating it. Every 5 days, a cron task launches a script that deletes all the thumbs (to be sure only the useful ones are kept).
I think the second solution is better, but I'm a little concerned that PHP has to be called every time an image is shown; even when the thumb already exists, it's still PHP that serves it...
Thanks for your advice.
Based on the original question and subsequent comments, it sounds like on-demand generation would suit you, since you don't seem to have a demanding environment where download time to the end client must be absolutely minimized.
It seems you already have a grasp of the option of giving your <img> tags a src value pointing at a PHP script, with that script either serving up a cached thumbnail if it exists, or generating it on the fly, caching it, and then serving it up. So let me give you another option.
Generally speaking, using PHP to serve up static resources is not a great idea as you begin to scale your site, because:
It requires the additional overhead of invoking PHP to serve these requests, something a basic web server like Apache or Nginx is much more optimized for. This means your site will handle less traffic per server, because extra memory, CPU, etc. go into serving this static content.
It makes it hard to move those static resources into a single repository outside the servers (such as a CDN). This means you have to duplicate your files on each and every web server powering the site.
As such, my suggestion would be to still serve the images as static files via the web server, but generate thumbnails on the fly if they are missing. To achieve this, create a custom redirect rule or 404 handler on the web server, so that requests in your thumbnail directory which don't match an existing thumbnail are redirected to a PHP script that generates the thumbnail and serves up the image (without the browser even knowing). Future requests for that thumbnail are then served as a static image; a sketch follows below.
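As a hedged sketch of that setup (the rewrite rule, directory layout, thumbnail naming scheme, and the use of plain GD are all assumptions, not a specific framework's API):

<?php
// generate_thumb.php -- runs only on a cache miss. Requests are routed
// here by an Apache rule in the thumbs directory's .htaccess, e.g.:
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^(.*)$ /generate_thumb.php?file=$1 [L]

$file = isset($_GET['file']) ? basename($_GET['file']) : '';
if (!preg_match('/^(.+)_(\d+)x(\d+)\.jpg$/', $file, $m)) {
    header('HTTP/1.1 404 Not Found');   // unknown naming scheme
    exit;
}
list(, $name, $w, $h) = $m;

$source = __DIR__ . '/../images/' . $name . '.jpg'; // original upload (assumed layout)
$target = __DIR__ . '/' . $file;                    // cached thumbnail

// Resize with GD and store the result where the web server will find
// it as a static file on the next request.
$src = imagecreatefromjpeg($source);
$dst = imagecreatetruecolor((int)$w, (int)$h);
imagecopyresampled($dst, $src, 0, 0, 0, 0,
                   (int)$w, (int)$h, imagesx($src), imagesy($src));
imagejpeg($dst, $target, 85);

// Serve this first request ourselves; the browser never notices.
header('Content-Type: image/jpeg');
readfile($target);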
This scales quite nicely: if you later need to move your static images to a single server (or CDN), you can use an origin-pull mechanism that fetches missing content from your main servers, which will auto-generate it via the same mechanism just described.
Use the second option if you don't have much storage, and the first if you don't have much CPU.
Or you can combine them: generate and store the image on the first request to the PHP thumbnail generator, and on subsequent requests just give back the cached image.
With this solution you'll have only the necessary images, and if you want, you can occasionally delete the older ones.
I am making a web application that needs to show 3 types of thumbnails to a user. Now I might end up with a lot of thumbnail files on the server for a lot of users.
This makes me wonder: is generating thumbnails on the fly a better option than storing them?
Speed vs Storage vs Logic - Which one to go for?
Has anyone here ever faced such a dilemma? Let me know!
I am using CodeIgniter and its inbuilt Image Library for generating thumbnails.
I would go with: generate when needed, store afterwards.
Link to the image using a URL like /img/42/400x300.jpg. Through rewrite rules, you can fire up a PHP script should the image not exist. That script can then generate the requested image in the requested size and store it in the public web folder, where the web server can serve it directly the next time.
That gives you the best of both worlds: the image is not generated until needed, it is only generated once and it even makes it very flexible to work with different image sizes on the fly.
If you're worried about storage space, you can add a regular clean-up job which removes old images, or perhaps analyses your access log files and removes images which were not accessed for some time; a sketch of such a job follows.
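A minimal sketch of such a clean-up job, run from cron (the directory, the 30-day threshold, and the use of modification time as a stand-in for access time are all assumptions; atime is often disabled on servers):

<?php
// cleanup_thumbs.php -- delete thumbnails that have not been touched
// in 30 days; they will simply be regenerated on the next request.
$dir    = '/var/www/thumbs';   // assumed thumbnail directory
$maxAge = 30 * 24 * 3600;      // 30 days in seconds

foreach (glob($dir . '/*.jpg') as $file) {
    if (time() - filemtime($file) > $maxAge) {
        unlink($file);
    }
}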
My comment as an answer: (why not :)
My personal thoughts on this: if you're anticipating a lot of users, go with storage, as the load of creating dynamic thumbnails for every one of these users on every page load is going to hurt the server. Better yet, create each thumbnail dynamically the first time it's ever viewed and then store it.
You may also take advantage of browser caching to save load and bandwidth. (marginal but every little helps)
So some background: what I'm doing is creating a gallery that dynamically shows thumbnails of all the pictures in a server directory (it caches the thumbnails, don't worry). When a user clicks a thumbnail, a loading GIF is displayed until the image is ready, and then the image is displayed. The actual pictures are very large and might take a considerable amount of time to download to a user's computer.
What I would like to do is show the percentage of the picture that has been downloaded while the loading GIF is playing.
I realize there are other questions like this, and from what research I've done so far, I also realize this might not be able to be accomplished without some server-side tricks.
From what I have come across so far, I've gathered (and I could be wrong, so please correct me if I am) that the client-side code knows how many bytes have been received, but not how large the file is.
So is there a possible configuration, using some PHP/JavaScript tricks, so that the client-side JavaScript can load an image from a web server directory and calculate the downloaded percentage?
Possibly the PHP code sending an extra header to the client with the file size? Or even opening a second request to the web server just for the file size? And how could you get the currently downloaded byte count?
You can use XMLHttpRequest2 to load the data and hook onto its progress events. The loaded data is turned into base64 and put into a data URI. Once loading has finished, you can assign the constructed URI as the image's source.
More info can be found here: http://blogs.adobe.com/webplatform/2012/01/13/html5-image-progress-events/
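One server-side detail worth noting: progress events can only report a meaningful percentage when the response carries a Content-Length header. A minimal PHP sketch of serving the picture with one (the directory and the basename() whitelist are assumptions):

<?php
// image.php -- serve a picture with an explicit Content-Length so the
// client's progress events have a total to divide by.
$name = isset($_GET['img']) ? basename($_GET['img']) : '';
$path = '/var/www/pictures/' . $name;   // assumed picture directory
if ($name === '' || !is_file($path)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}
header('Content-Type: image/jpeg');     // adjust to the actual type
header('Content-Length: ' . filesize($path));
readfile($path);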
I am currently working on a PHP application which is run from the command line to optimize a folder of images.
The PHP application is more of a wrapper for other image optimizers; it simply iterates over the directory, grabs all the images, and runs each image through the appropriate program to get the best result.
Below are the programs I will be using and what each will be used for...
imagemagick to determine the file type and convert non-animated GIFs to PNG
gifsicle to optimize animated GIFs
jpegtran to optimize JPG images
pngcrush to optimize PNG images
pngquant to optimize PNG images to PNG8 format
pngout to optimize PNG images to PNG8 format
My problem: with 1-10 images everything runs smoothly and fairly fast; however, once I run it on a larger folder with 10 or more images, it becomes really slow. I don't really see a good way around this, but one thing that would help is to avoid re-processing images that have already been optimized. If I have a folder with 100 images, optimize that folder, then add 5 new images and re-run the optimizer, it has to optimize all 105 images. My goal is to have it optimize only the 5 newer images, since the previous 100 have already been optimized. That alone would greatly improve performance when new images are added to the image folder.
I realize the simple solution would be to copy or move the images to a new folder after processing them. My problem with that is that these images are used on websites, so they are generally hard-linked in a website's source code, and changing the path to the images would complicate that and could sometimes break it.
Some ideas I have had: write some kind of text-file database in the image folders listing all the images that have already been processed, so that when the application runs, it only touches images not already in that file. Another idea was to change the file name to include some kind of identifier showing it has been optimized. A third idea is to move each optimized file to a final destination folder once it is optimized. Ideas 2 and 3 are not good though, because they would break all image path links in the websites' source code.
So if you can think of a decent solution to this problem, please share it.
Meta data
You could put a flag in the meta info of each image after it is optimized. First check for that flag and only proceed if it's not there. You can use exif_read_data() to read the data; a sketch of reading and writing such a flag follows.
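A hedged sketch of that idea: PHP can read EXIF natively, but it cannot write it, so writing here shells out to exiftool (assumed to be installed); the flag value and the choice of the ImageDescription tag are assumptions:

<?php
// Returns true if we previously stamped this JPG as optimized.
function isOptimized($file) {
    $exif = @exif_read_data($file);
    return $exif !== false
        && isset($exif['ImageDescription'])
        && $exif['ImageDescription'] === 'optimized';
}

// Stamp the flag after a successful optimization pass.
function markOptimized($file) {
    // -overwrite_original keeps exiftool from leaving a backup copy
    exec('exiftool -overwrite_original -ImageDescription=optimized '
        . escapeshellarg($file));
}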
The above is for JPGs. Metadata for PNGs is also possible; take a look at this question, and this one.
I'm not sure about GIFs, but you could convert them to PNGs and then add metadata... although I'm pretty sure they have their own meta info, since metadata extraction tools support GIFs.
Database Support
Another solution would be to store information about the images in a MySQL database. This way, as you tweak your optimizations you could keep track of when and which optimization was tried on which image. You could pick which images to optimize according to any parameters of your choosing. You could build an admin panel for this. This method would allow easy experimentation.
You could also combine the above two methods.
Maximum File Size
Since this is about saving space, you could have the program only work on images larger than a certain file size. Ideally, after running the compressor once, all the images would be below this size, and afterwards only newly added images that are too big would be touched. I don't know how practical this is to implement, since it requires that the compressor can get any image below some arbitrary file size. You could make the maximum file size dependent on image size...
The easiest way would most likely be to look at the last-modified time of each image. If an image was changed after the last run of your script, you have to run it on that particular image.
The timestamp of when the script was last run can easily be saved in a short text file, as sketched below.
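A minimal sketch of that approach (the marker file name and glob pattern are assumptions; optimize() stands in for the wrapper's own per-image routine, and GLOB_BRACE may be unavailable on some systems):

<?php
$marker  = __DIR__ . '/.last_optimized';
$lastRun = is_file($marker) ? (int)file_get_contents($marker) : 0;

// Only touch images modified since the previous run.
foreach (glob('/path/to/images/*.{jpg,png,gif}', GLOB_BRACE) as $image) {
    if (filemtime($image) > $lastRun) {
        optimize($image);
    }
}

// Record this run for next time.
file_put_contents($marker, time());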
A thought that comes to mind is to mix the simple solution with a more complicated one. When you optimize an image, move it to a separate folder. When the original image folder is accessed, have your .htaccess file capture those requests and route them to a script which checks whether that same image exists in the optimized folder; if not, optimize it, move it, then proceed.
I know I said simple solution; this is a slightly complicated one, but the nice part is that it provides a scalable approach to your issue.
Edit: One more thing
I like the idea of a MySQL database because you can add a level of security (not all images can be viewed by everyone), if that's a need of course. It also makes your hard-coded links problem less of a problem: all links point to a single file which retrieves the images from the DB, and the only things that change are generated GET variables. This way your project becomes significantly more scalable and design changes become easier.
Sorry this is late, but since there is a way to address this issue without creating any files, storing any data of any kind, or keeping track of anything, I thought I'd share how I address things like this.
Goal
Set up an idempotent solution that efficiently optimizes images without dependencies that require keeping track of current status.
Why
This allows for a truly portable solution that can work in a new environment, an environment that somehow lost its tracker, or an environment that is sensitive about which files you can actually save there.
Diagnose
Although metadata might be the first source you'd think to check for this information, in some cases it will not be available, and metadata is by nature arbitrary: like comments, it can come and go without affecting the image in any way. We want something more concrete, something that is a definite descriptor of the asset at hand. Ideally, you would want to identify whether an image has been optimized by reviewing the image itself, based on its characteristics.
Strategy
When you optimize an image, you provide all sorts of options in order to reach the final optimized state. These are the very traits you will later check to conclude whether or not the image has in fact been optimized.
Example
Let's say we have a function in our script called optimize($path = ''), and let's assume that part of our optimization does the following:
$ convert /path/to/image.jpg -depth 8 -quality 87 -colors 255 -colorspace sRGB ...
Note that these options are ones you chose to specify; they are applied to the image and become properties that can be reviewed later...
$ identify -verbose /path/to/image.jpg
Image: /path/to/image.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Mime type: image/jpeg
  Geometry: 1250x703+0+0
  Colorspace: sRGB        <<<<<<
  Depth: 8-bit            <<<<<<
  Channel depth:
    Red: 8-bit
    Green: 8-bit
    Blue: 8-bit
  Channel statistics:
    Pixels: 878750
    Red:
      ...
    Green:
      ...
    Blue:
      ...
  Image statistics:
    Overall:
      ...
  Rendering intent: Perceptual
  Gamma: 0.454545
  Transparent color: none
  Interlace: JPEG
  Compose: Over
  Page geometry: 1250x703+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  Quality: 87             <<<<<<
  Properties:
    ...
  Artifacts:
    ...
  Number pixels: 878750
As you can see, the output quite literally has everything I would want to know to determine whether or not I should optimize this image, and it costs nothing in terms of a performance hit.
Conclusion
When you iterate through a list of files in a folder, you can do so as many times as you like without worrying about over-optimizing the images or keeping track of anything. Simply filter for the extensions you want to optimize (e.g. .jpg, .png) and skip the rest (e.g. .bmp), then check each file's stats to see whether it already has the attributes your function would apply. If it has the same values, skip it; if not, optimize, as in the sketch below.
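A hedged sketch of that check in PHP, keyed to the convert call above (%z and %Q are real identify format escapes for depth and quality; the target values, paths, and optimize() are assumptions):

<?php
// Skip images whose properties already match what optimize() would set.
function needsOptimizing($path) {
    $out = shell_exec('identify -format "%z %Q" ' . escapeshellarg($path));
    if ($out === null) {
        return true;   // could not inspect; err on the side of optimizing
    }
    list($depth, $quality) = explode(' ', trim($out));
    return (int)$depth !== 8 || (int)$quality !== 87; // targets from the convert call
}

foreach (glob('/path/to/images/*.jpg') as $image) {
    if (needsOptimizing($image)) {
        optimize($image);
    }
}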
Advanced
If you want to get extremely efficient, check each attribute of the image that you plan on optimizing, and in your optimization run apply only the options that have not yet been applied.
Note
This technique is meant to show an example of how you can accurately determine whether an image needs to be optimized. The options listed above are not the complete set of elements that can be chosen; there are a variety of options available, and you can apply and check for as many as you want.
I have a very large image generated on the fly with PHP and output to the browser (it's 5000px wide and 1000-2000px tall; it's a plot of the daily user activity on my site).
The problem is that the plot has grown too big, and the PHP script now dies with memory-exhausted errors (though the generated PNG itself is quite small), so I can't get the image at all.
Is there way to output this large image in multiple parts somehow using GD in PNG format?
(ps: the host where I run the site uses safe mode, so I can't modify the configuration and I think they're using the default PHP installation.)
EDIT1: It's an admin script. No users see it except me.
EDIT2: and example image can be seen here: http://users.atw.hu/calmarius/trash/wtfb2/x.png
(I also have the option to group the tracks by IP address.)
Every user+IP pair has its own 24-hour track on the plot, and every green mark denotes a user activity. As you can see, this image could be output track by track; there is no need to generate and output the whole thing at once.
This website will be an online strategy game, and I want to use this graph in the future to make detecting multi-accounts easier (users who try to gain an advantage over those with only one account by registering multiple accounts). But that is a different problem.
I'm using a PHP script because I'm too lazy to export the request log from the database, download it, and feed the data to a program that would make the plot for me. ;)
Set the memory limit to unlimited before processing the image.
ini_set('memory_limit', '-1');
It'd help to say how you're generating the image (GD library, ImageMagick) and how you're outputting it. Are you saving the file to a directory and then using readfile() to output it? If so, an fopen/fread/echo combination is about 50-60% faster than readfile() for outputting files to the browser. Are you using gzip compression? What's the time limit on PHP execution? What's the exact error message you're getting?
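For reference, a minimal sketch of that fopen/fread/echo pattern (the file path and the 8 KB chunk size are assumptions):

<?php
// Stream the PNG in small chunks instead of loading it in one go.
$path = '/path/to/plot.png';   // hypothetical output file
header('Content-Type: image/png');
header('Content-Length: ' . filesize($path));

$fp = fopen($path, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192);     // 8 KB at a time keeps memory use flat
    flush();
}
fclose($fp);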