I've tried Zend PDF and it works pretty well, but I have a troublesome problem: when I add an image to a page with drawImage(), it always appears pixelated, regardless of position and dimensions.
There are many lines in the Zend image classes that I don't understand. Has anyone else encountered this problem? How can I fix it?
(I cannot post my code today, but it's very simple and I don't think my question is specific to it.)
Remember that PDF is inherently a print medium (300 dpi and higher), while your average .jpg is intended for screen viewing (72-100 dpi). If you don't supply source images with approximately the same resolution as the document you're inserting them into, the PDF display engine has to do all kinds of scaling to make things fit: once when you insert the image, to stretch it to the size you want, and then downscaling again to make the PDF fit on your screen.
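In practice, that means supplying a source image with several times more pixels than the rectangle you draw it into, so the viewer only ever scales down. A minimal sketch with Zend PDF, assuming a hypothetical high-resolution file photo-large.jpg and arbitrary coordinates:

<?php
require_once 'Zend/Pdf.php';

$pdf  = new Zend_Pdf();
$page = new Zend_Pdf_Page(Zend_Pdf_Page::SIZE_A4);

// A 200x200 pt target at 300 dpi needs roughly 830x830 source pixels
// (200 pt / 72 pt per inch * 300 dpi), so feed in a large image and
// let the viewer downscale it, rather than stretching a small one.
$image = Zend_Pdf_Image::imageWithPath('photo-large.jpg');

// drawImage() takes the target rectangle (x1, y1, x2, y2) in PDF points.
$page->drawImage($image, 100, 500, 300, 700);

$pdf->pages[] = $page;
$pdf->save('output.pdf');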
This question is slightly long, so I'll try to be clear.
On a website I co-develop, I created a drawing application that sends data about the lines drawn on the <canvas> element to the server, which creates an image and saves it at multiple sizes; this works. The standard save is 320x212, with a smaller version at 240x176.
However, some users on my site use a pretty antiquated device/browser for the application. It also has a smaller viewport, so the image is smaller, and server-side I'm currently multiplying the coordinates to compensate when saving the image at the standard size. But this has the side effect of random unfilled parts showing up in the image that were filled in on the canvas.
(Image of the scaled-up result, source: socialcu.be. On the canvas for the smaller device's viewport, the entire bottom portion was green, while my method of multiplying the coordinates to compensate caused many unfilled portions to appear in the scaled-up image.)
So my first question is: would building the image as a vector image, scaling it up (while still a vector), and then saving it as a raster image fix that issue?
And secondly (if the answer to the first is yes), what would be the best way of doing this in PHP? I've heard of Cairo, but the information on it (namely tutorials for the PHP package) is quite lacking. Ideally, is there a way to do this in Imagick, or a tutorial on how to use Cairo?
Cairo is definitely lacking documentation on the PHP site, so it will be much more difficult to learn. I found a couple of links through Google regarding Cairo; there is a blog post about someone's experience with it here: Getting started with Cairo
Referenced in that post are a couple of walkthroughs by Michael Maclean.
There are quite a few examples of drawing with Imagick on the Imagick website, and judging from the examples available online, Imagick has the better support community (for now). Imagick will output a raster image, though.
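To illustrate the vector-first idea with Imagick: ImagickDraw commands stay resolution-independent until you rasterize them, so you can scale the drawing context and render once at the final size. A minimal sketch, assuming hypothetical line data captured on a 240x176 viewport and rendered at the standard 320x212:

<?php
// Scale factors from the small viewport to the standard size.
$scaleX = 320 / 240;
$scaleY = 212 / 176;

$draw = new ImagickDraw();
$draw->scale($scaleX, $scaleY);              // scale the whole draw context
$draw->setStrokeColor(new ImagickPixel('green'));
$draw->setStrokeWidth(4);
$draw->setFillColor(new ImagickPixel('green'));

// Hypothetical captured shapes, still in small-viewport coordinates.
$draw->line(10, 150, 230, 150);
$draw->polygon(array(
    array('x' => 0,   'y' => 150),
    array('x' => 240, 'y' => 150),
    array('x' => 240, 'y' => 176),
    array('x' => 0,   'y' => 176),
));

$image = new Imagick();
$image->newImage(320, 212, new ImagickPixel('white'));
$image->drawImage($draw);                    // rasterize once, at full size
$image->setImageFormat('png');
$image->writeImage('drawing-320x212.png');

Because the shapes are scaled before rasterization, adjacent shapes stay adjacent, avoiding the unfilled seams you get when multiplying integer pixel coordinates after the fact.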
I am using the HTML2PDF converter to export a web page to a PDF file. The issue I have is that the resulting PDF is too large (more than 1 MB). I want to reduce it, so here is what the page basically contains:
2 images (100 KB both)
1 Courier Bulgarian font - added
3 tables with a lot of inline styles of each cell
Could these things lead to the large size of the output PDF? And could anyone share some experience and best practices with the library for getting a smaller PDF as a result?
Thanks in advance.
If you have access to Adobe Acrobat, it will let you peek inside the PDF and see which areas use what percentage of space. This would tell you how much of the space is taken up by images, fonts, etc. If you make a sample PDF available, I'll be happy to take a look at it.
How much space different objects use in a PDF really depends on how the PDF was written and how efficiently it uses different compression algorithms. For example, images in a PDF can be ZIP, JPEG, or even JPEG 2000 compressed; the question is what HTML2PDF does with your images.
Fonts can be big as well - depending on what the size of the original font is.
Page content (your tables with inline styles etc.) is written in a condensed textual format (for example, 1 0 0 rg is the instruction that sets the fill color for text and line art to red). Efficient PDF writers will normally write all of these textual instructions and then ZIP-compress them, which means they won't take up much space in the file.
So you'd really need to take a look at the PDF file itself to see where the space is used. That will allow you to start looking at the library to see if it can be made more efficient.
To start with...
Have a separate page for printing, distinct from the one for display.
When printing, use a print.css stylesheet and include as few classes as possible in those three tables; keep only the required classes in the CSS.
Use compressed image formats for printing, e.g. if you display a BMP, use a PNG for printing.
Keep the tables simple (plain TR/TD), make them less colorful, and don't use inline CSS.
This may help you produce a more compact PDF.
I have managed to find what causes the large size of my output PDF files: it was the fonts.
For my task I used Courier Bulgarian (italic, normal, and bold), which means I embedded three additional fonts.
What's more, helvetica is the default font for this PHP library, and although I am not using it, it is embedded in the PDFs too.
If you run into the same issue, open all files that reference "helvetica" and replace the "helvetica" font with the one you are actually using.
I've been doing some speed optimization on my site using Page Speed and it gives recommendations like so:
Optimizing the following images could reduce their size by 35.3KiB (21% reduction).
Losslessly compressing http://example.com/.../some_image.jpg could save 25.3KiB (55% reduction).
How are the calculations done to get the file size reduction numbers? How can I determine if an image is optimized in PHP?
As I understand it, they seem to base this on the image quality (so saving at 60% quality in Photoshop or similar is considered optimized?).
What I would like to do is once an image is uploaded, check if the image is fully optimized, if not, then optimize it using a PHP image library such as GD or ImageMagick. If I'm right about the number being based on quality, then I will just reduce the quality as needed.
How can I determine if an image is fully optimized in the standards that Page Speed uses?
Chances are they are simply using a standard compression level or working from some very simple rules to estimate image compression/quality. It isn't exactly what you were after, but what I often use on uploaded images and other dynamic content is a class called SimpleImage:
http://www.white-hat-web-design.co.uk/blog/resizing-images-with-php/
This will give you the options to resize and adjust compression, and I think even change the image type (by which I mean .jpg to .png or .gif, anything you like). I worked in SEO, and page optimization was a huge part of my job: I generally tried to make images exactly the size they needed to be, no smaller or bigger, and to compress JS and CSS. That's really all most people need to worry about.
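If you'd rather skip the helper class, plain GD can re-save an uploaded JPEG at a chosen quality, which covers the lossy side of what Page Speed is suggesting. A minimal sketch; the file names and the quality value of 85 are my assumptions, not anything Page Speed prescribes:

<?php
$src = 'uploads/photo.jpg';   // hypothetical input
$tmp = 'uploads/photo.tmp.jpg';

// Decode and re-encode the JPEG at quality 85 (0 = worst, 100 = best).
$image = imagecreatefromjpeg($src);
imagejpeg($image, $tmp, 85);
imagedestroy($image);

// Only replace the original if recompression actually saved space.
clearstatcache();
if (filesize($tmp) < filesize($src)) {
    rename($tmp, $src);
} else {
    unlink($tmp);
}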
It looks like you could use the PageSpeed Insights API to get the compression scores: https://developers.google.com/speed/docs/insights/v1/reference, though I'm guessing you'd want to run a quality check/compression locally, rather than sending everything through this API.
It also looks like they've got a standalone image optimizer available at http://code.google.com/p/page-speed/wiki/DownloadPageSpeed?tm=2, though this appears to be a Windows binary. They've got an SDK there as well, though I haven't looked into what it entails, or how easy it would be to integrate into an existing site.
I am currently working on a PHP application which is run from the command line to optimize a folder of images.
The PHP application is more of a wrapper for other image optimizers: it simply iterates over the directory, grabs all the images, and runs each one through the appropriate program to get the best result.
Below are the programs I will be using and what each will be used for...
imagemagick to determine the file type and convert non-animated GIFs to PNG
gifsicle to optimize animated GIF images
jpegtran to optimize JPG images
pngcrush to optimize PNG images
pngquant to optimize PNG images to PNG8 format
pngout to optimize PNG images to PNG8 format
My problem: with 1-10 images, everything runs smoothly and fairly fast; however, once I run it on a larger folder of images, it becomes really slow. I do not really see a good way around this, but one thing that would help is to avoid re-processing images that have already been optimized. So if I have a folder with 100 images, I optimize that folder, then add 5 new images and re-run the optimizer, it has to optimize 105 images. My goal is to have it only optimize the 5 newer images, since the previous 100 have already been optimized. This alone would greatly improve performance when new images are added to the image folder.
I realize the simple solution would be to copy or move the images to a new folder after processing them; my problem with that is that these images are used on websites, so the paths are generally hard-coded into a website's source code, and changing the path to the images would complicate that and possibly break things.
Some ideas I have had: write some kind of text-file database to the image folders listing all the images that have already been processed, so when the application runs, it only processes images that are not already in that file. Another idea was to change the file name to include some kind of identifier showing it has been optimized; a third idea is to move each optimized file to a final destination folder once it is optimized. Ideas 2 and 3 are not good, though, because they would break all the image path links in the websites' source code.
So, if you can think of a decent solution to this problem, please share it.
Metadata
You could put a flag in the meta information of each image after it is optimized. First check for that flag and only proceed if it's not there. You can use exif_read_data() to read the data; writing it could work like this.
The above is for JPGs. Metadata for PNGs is also possible; take a look at this question, and this one.
I'm not sure about GIFs, but you could definitely convert them to PNGs and then add metadata... although I'm pretty sure they have their own meta info, since metadata extraction tools support GIFs.
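As one concrete way to implement the flag idea (an illustration, not the only option): store a marker string in the JPEG's comment metadata with Imagick and look for it with exif_read_data() before optimizing. The marker string is hypothetical:

<?php
define('OPTIMIZED_FLAG', 'optimized-v1'); // hypothetical marker

// True if the JPEG already carries our marker in a comment segment.
function isFlagged($path) {
    $exif = @exif_read_data($path);
    if ($exif !== false && isset($exif['COMMENT'])) {
        foreach ((array) $exif['COMMENT'] as $comment) {
            if (strpos($comment, OPTIMIZED_FLAG) !== false) {
                return true;
            }
        }
    }
    return false;
}

// Write the marker into the image's comment metadata.
function flagImage($path) {
    $image = new Imagick($path);
    $image->commentImage(OPTIMIZED_FLAG);
    $image->writeImage($path);
}

if (!isFlagged('photo.jpg')) {
    // ... run the optimizer on photo.jpg ...
    flagImage('photo.jpg');
}

One caveat: writing the comment through Imagick re-encodes the JPEG, so a tool that edits only the comment segment would preserve the optimizer's output better.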
Database Support
Another solution would be to store information about the images in a MySQL database. This way, as you tweak your optimizations you could keep track of when and which optimization was tried on which image. You could pick which images to optimize according to any parameters of your choosing. You could build an admin panel for this. This method would allow easy experimentation.
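A minimal sketch of what that tracking might look like; the schema, table, and column names are hypothetical:

<?php
// Assumed schema:
//   CREATE TABLE optimized_images (
//       path         VARCHAR(255) PRIMARY KEY,
//       optimizer    VARCHAR(64),
//       optimized_at DATETIME
//   );
$pdo = new PDO('mysql:host=localhost;dbname=images', 'user', 'pass');

$check  = $pdo->prepare('SELECT 1 FROM optimized_images WHERE path = ?');
$record = $pdo->prepare(
    'REPLACE INTO optimized_images (path, optimizer, optimized_at)
     VALUES (?, ?, NOW())'
);

foreach (glob('/var/www/images/*.{jpg,png,gif}', GLOB_BRACE) as $path) {
    $check->execute(array($path));
    if ($check->fetchColumn()) {
        continue; // already optimized, skip it
    }
    // ... run the appropriate optimizer for $path here ...
    $record->execute(array($path, 'jpegtran')); // record which tool was used
}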
You could also combine the above two methods.
Maximum File Size
Since this is for saving space, you could have the program only work on images that are larger than a certain file size. Ideally, after running the compressor once, all the images would be below this file size, and after that only newly added images that are too big would be touched. I don't know how practical this is to implement, since it would require that the compressor get every image below some arbitrary file size. You could make the maximum file size dependent on the image dimensions.
The easiest way would most likely be to look at the time of the last change for each image. If an image was changed after the last run of your script, you have to run it on this particular image.
The timestamp of when the script was run could easily be saved in a short text file.
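A minimal sketch of that approach; the marker file name is arbitrary:

<?php
$marker  = '/var/www/images/.last-optimized'; // hypothetical marker file
$lastRun = file_exists($marker) ? (int) file_get_contents($marker) : 0;

foreach (glob('/var/www/images/*.jpg') as $path) {
    if (filemtime($path) > $lastRun) {
        // ... image is new or changed since the last run: optimize it ...
    }
}

// Remember when this run happened, for next time.
file_put_contents($marker, time());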
A thought that comes to mind is to mix the simple solution with a more complicated one. When you optimize an image, move it to a separate folder. When the original image folder is accessed, have your .htaccess file capture those requests and route them to a script that checks whether the same image exists in the optimized folder; if not, it optimizes the image, moves it, then proceeds.
I know I said simple solution; this is a slightly complicated one, but the nice part is that it provides a scalable approach to your issue.
Edit: One more thing
I like the idea of a MySQL database because you can add a layer of security (not all images can be viewed by everyone), if that's a need of course. It also makes your links problem (the hard-coded one) much less of a problem, since all links go through a single script that retrieves the images from the DB, and the only things that change are the generated GET variables. This way your project becomes significantly more scalable, and design changes become easier.
Sorry this is late, but since there is a way to address this issue without creating any files, storing any data, or keeping track of anything, I thought I'd share how I handle things like this.
Goal
Set up an idempotent solution that efficiently optimizes images, without dependencies that require keeping track of their current status.
Why
This allows for a truly portable solution that can work in a new environment, an environment that somehow lost its tracker, or an environment that is sensitive as to what files you can actually save in there.
Diagnose
Although metadata might be the first source you'd think to check for this information, in some cases it will not be available, and metadata is by nature arbitrary: like comments, it can come and go without affecting the image in any way. We want something more concrete, a definite descriptor of the asset at hand. Ideally, you want to identify whether an image has been optimized by reviewing the image's own characteristics.
Strategy
When you optimize an image, you provide various options to reach the final optimized state. These are the very traits you will later check to conclude whether or not the image has in fact been optimized.
Example
Let's say we have a function in our script called optimize($path = ''), and let's assume that part of our optimization does the following:
$ convert /path/to/image.jpg -depth 8 -quality 87 -colors 255 -colorspace sRGB ...
Note that these options are ones you chose to specify; they are applied to the image and are properties that can be reviewed later...
$ identify -verbose /path/to/image.jpg
Image: /path/to/image.jpg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Mime type: image/jpeg
Geometry: 1250x703+0+0
Colorspace: sRGB <<<<<<
Depth: 8-bit <<<<<<
Channel depth:
Red: 8-bit
Green: 8-bit
Blue: 8-bit
Channel statistics:
Pixels: 878750
Red:
...
Green:
...
Blue:
...
Image statistics:
Overall:
...
Rendering intent: Perceptual
Gamma: 0.454545
Transparent color: none
Interlace: JPEG
Compose: Over
Page geometry: 1250x703+0+0
Dispose: Undefined
Iterations: 0
Compression: JPEG
Quality: 87 <<<<<<
Properties:
...
Artifacts:
...
Number pixels: 878750
As you can see here, the output quite literally has everything I would want to know to determine whether or not I should optimize this image, and it costs next to nothing in terms of a performance hit.
Conclusion
When you iterate through a list of files in a folder, you can do so as many times as you like without worrying about over-optimizing the images or keeping track of anything. You simply filter the files by extension, keeping only the types you intend to optimize (e.g. .bmp, .jpg, .png), then check each file's stats to see if it already possesses the attributes your function would apply in the first place. If it has the same values, skip it; if not, optimize.
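A minimal PHP sketch of that filter, shelling out to ImageMagick's identify; the target values mirror the hypothetical convert options above:

<?php
// True if the JPEG already matches our target optimization traits.
// %Q is the JPEG quality and %z the depth (standard identify format codes).
function alreadyOptimized($path) {
    $out = shell_exec('identify -format "%Q %z" ' . escapeshellarg($path));
    list($quality, $depth) = explode(' ', trim($out));
    return (int) $quality <= 87 && (int) $depth <= 8;
}

foreach (glob('/path/to/images/*.jpg') as $path) {
    if (alreadyOptimized($path)) {
        continue; // traits already match, skip
    }
    // ... run convert with the chosen options on $path here ...
}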
Advanced
If you want to be extremely efficient, you would check each attribute of the image that you plan to optimize, and in your optimization run apply only the options that have not yet been applied.
Note
This technique is obviously meant to show an example of how you can accurately determine whether or not an image needs to be optimized. The actual options listed above are not the complete scope of what can be chosen; there are a variety of options available, and you can apply and check for as many as you want.
I run a site with lots of small images (www.iconfinder.com) and would like to develop a feature that can compare and recognize images. A user should be able to upload an image (icon) and then the site will respond with information about the image if it's in the database.
What is a good approach to finding similar (or identical) images? I know I can compare the MD5 hashes of the two images, but I also want to be able to find matches if they are scaled.
This is a good start if you are interested in looking at doing it in PHP:
http://www.intelliot.com/blog/2008/03/sorted-directory-listing-image-resizing-comparison-and-similarity-in-php/
There probably aren't a lot of languages LESS suited to this task than PHP. You should really look for an image comparison library with a C compatible API and figure out how to glue that into your PHP application.
Identical images can be checked with an MD5 sum, but detecting whether somebody has uploaded a scaled image that displays the same thing as another is very hard; it requires digital image processing.
One approach is to scale all images down to a certain width (say 100px), then check the color at a few coordinates. If another image matches a large portion of them (say 80%), it might be the same image.
But if the image is lighter... this won't work.
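For what it's worth, here is a minimal GD sketch of the downscale-and-sample idea described above; the 100px width, the 8x8 sample grid, and the 80% threshold are all arbitrary choices:

<?php
// Downscale an image to a fixed width, preserving aspect ratio.
function thumb($path, $width = 100) {
    $src = imagecreatefromstring(file_get_contents($path));
    imagepalettetotruecolor($src); // normalize palette images
    $h = (int) round(imagesy($src) * $width / imagesx($src));
    return imagescale($src, $width, $h);
}

// Fraction of sampled points whose colors roughly match.
function similarity($pathA, $pathB, $grid = 8, $tolerance = 32) {
    $a = thumb($pathA);
    $b = thumb($pathB);
    if (imagesy($a) !== imagesy($b)) {
        return 0.0; // different aspect ratios: treat as different images
    }
    $matches = 0;
    for ($i = 0; $i < $grid; $i++) {
        for ($j = 0; $j < $grid; $j++) {
            $x  = (int) (imagesx($a) * ($i + 0.5) / $grid);
            $y  = (int) (imagesy($a) * ($j + 0.5) / $grid);
            $ca = imagecolorat($a, $x, $y);
            $cb = imagecolorat($b, $x, $y);
            $diff = abs((($ca >> 16) & 0xFF) - (($cb >> 16) & 0xFF))  // red
                  + abs((($ca >> 8) & 0xFF) - (($cb >> 8) & 0xFF))    // green
                  + abs(($ca & 0xFF) - ($cb & 0xFF));                 // blue
            if ($diff < $tolerance) {
                $matches++;
            }
        }
    }
    return $matches / ($grid * $grid);
}

if (similarity('upload.png', 'existing.png') > 0.8) {
    echo "Probably the same image\n";
}

As noted, this is brittle: a brightness change shifts every sampled color, so the tolerance test fails everywhere even though the images show the same thing.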