I was able to install WebP support for ImageMagick, but I'm missing some precise commands.
The basics are covered by:
$im = new Imagick();
$im->pingImage($src);
$im->readImage($src);
$im->resizeImage($width, $height, Imagick::FILTER_CATROM, 1, true);
$im->setImageFormat("webp");
$im->writeImage($dest);
But I'm missing lots of fine-tuning options described in the ImageMagick command-line documentation here:
http://www.imagemagick.org/script/webp.php
Specifically:
How do I set the compression quality? (I tried setImageCompressionQuality and it does not work, i.e. the output is always the same size.)
How do I set the "method" (from 0 to 6)?
Thanks
EDIT: I followed @emcconville's advice below (thanks!) and neither the method nor the compression worked. So I'm starting to suspect my compilation of ImageMagick.
I tried using the command line:
convert photo.jpg -resize 1170x1170\> -quality 50 photo.webp
When changing the quality value of 50, the resulting file was always the same size. So there must be something wrong with my ImageMagick...
How do I set the "method" (from 0 to 6)?
Try this...
$im = new Imagick();
$im->pingImage($src);
$im->readImage($src);
$im->resizeImage($width, $height, Imagick::FILTER_CATROM, 1, true);
$im->setImageFormat("webp");
$im->setOption('webp:method', '6');
$im->writeImage($dest);
How do I set the compression quality? (I tried setImageCompressionQuality and it does not work, i.e. the output is always the same size.)
Imagick::setImageCompressionQuality seems to work for me, but note that webp:lossless becomes enabled if the value is 100 or greater (see the source). You can test toggling lossless to see how that impacts results.
$img->setImageFormat('webp');
$img->setImageCompressionQuality(50);
$img->setOption('webp:lossless', 'true');
Edit from comments
Try testing the image conversion to webp directly with the cwebp utility.
cwebp -q 50 photo.jpg -o photo.webp
This will also write some statistical information to stdout, which can help debug what's happening.
Saving file 'photo.webp'
File: photo.jpg
Dimension: 1170 x 1170
Output: 4562 bytes Y-U-V-All-PSNR 55.70 99.00 99.00 57.47 dB
block count: intra4: 91
intra16: 5385 (-> 98.34%)
skipped block: 5357 (97.83%)
bytes used: header: 86 (1.9%)
mode-partition: 2628 (57.6%)
Residuals bytes |segment 1|segment 2|segment 3|segment 4| total
macroblocks: | 0%| 0%| 0%| 98%| 5476
quantizer: | 45 | 45 | 43 | 33 |
filter level: | 14 | 63 | 8 | 5 |
Also remember that for some subjects, lowering the compression quality doesn't always guarantee a file-size decrease. But those are extreme edge cases.
Related
I've been able to get EXIF data by using exif_read_data(). According to the EXIF documentation provided in the PHP docs, there should be an ImageNumber tag (I understand it's not guaranteed), but I haven't been able to see anything like that in my test image (an unedited JPG from a Nikon D5100). The same image seems to carry shutter-count information, according to online shutter-count websites.
I'd really appreciate it if you could shed some light on what I'm possibly doing wrong in getting this number. Or is there some other place or method where they store the shutter count in the image meta?
EDIT:
Here's the code I tried; I'm trying to get ImageNumber, which apparently isn't available. But online tools show the shutter count for the same image. I'd like to get the same result using PHP (or even another language). Any help is appreciated.
$exif_data = exif_read_data($_FILES['fileToUpload']['tmp_name']);
print_r($exif_data);
As your example file shows, this is specific to Nikon's MakerNote, and within that, specific to the D5100 model. Using ExifTool in verbose mode shows the structure:
> exiftool -v DSC_8725.JPG
...
JPEG APP1 (65532 bytes):
ExifByteOrder = MM
+ [IFD0 directory with 11 entries]
| 0) Make = NIKON CORPORATION
| 1) Model = NIKON D5100
...
| 9) ExifOffset (SubDirectory) -->
| + [ExifIFD directory with 41 entries]
...
| | 16) MakerNoteNikon (SubDirectory) -->
| | + [MakerNotes directory with 55 entries]
...
| | | 38) ShotInfoD5100 (SubDirectory) -->
| | | + [BinaryData directory, 8902 bytes]
...
| | | | ShutterCount = 41520
JPEG explained, see segment APP1:
Exif explained, see tag 0x927c:
Nikon's MakerNote explained, see tag 0x0091:
ShotInfoD5100 explained, see index 801
MakerNotes are proprietary: how data is stored there is up to each manufacturer. Documentation is rare - mostly hobbyists reverse-engineer that information - which is why only selected software can read it at all, and only for selected models. At this point you may realize that dozens of manufacturers with dozens of models exist, all of whose bytes you would have to interpret differently - which is a lot of work! As per exif_read_data()'s ChangeLog, PHP (as of 7.2.0) nowhere claims to support Nikon's MakerNote at all.
You have to either parse the MakerNote yourself or find PHP code/library/software which already did that for you. As a last resort you could execute non-PHP software (such as ExifTool) to get what you want.
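To give a feel for what "parsing it yourself" involves, here is a minimal sketch (in Go rather than PHP, just to illustrate the byte-level work) that only locates the Exif APP1 segment inside a JPEG - the MakerNote itself sits several nested directories deeper, as the ExifTool dump above shows:

```go
package main

import (
	"bytes"
	"fmt"
)

// findExifAPP1 walks the JPEG marker stream and returns the offset of
// the APP1 payload when it carries the "Exif\x00\x00" signature.
// Real MakerNote parsing would then have to walk IFD0 -> ExifIFD ->
// MakerNote -> ShotInfo, with vendor-specific byte layouts.
func findExifAPP1(data []byte) (int, bool) {
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 { // SOI marker
		return 0, false
	}
	i := 2
	for i+4 <= len(data) && data[i] == 0xFF {
		marker := data[i+1]
		// The segment length is big-endian and includes its own two bytes.
		segLen := int(data[i+2])<<8 | int(data[i+3])
		if marker == 0xE1 && i+10 <= len(data) &&
			bytes.Equal(data[i+4:i+10], []byte("Exif\x00\x00")) {
			return i + 4, true
		}
		i += 2 + segLen
	}
	return 0, false
}

func main() {
	// Synthetic JPEG: SOI, a tiny APP0 segment, then an Exif APP1 segment.
	jpg := []byte{
		0xFF, 0xD8, // SOI
		0xFF, 0xE0, 0x00, 0x04, 0x00, 0x00, // APP0, length 4
		0xFF, 0xE1, 0x00, 0x08, 'E', 'x', 'i', 'f', 0x00, 0x00, // APP1
	}
	off, ok := findExifAPP1(jpg)
	fmt.Println(off, ok)
}
```

This only finds the haystack; interpreting the Nikon-specific bytes inside is exactly the reverse-engineered part that tools like ExifTool encode for you.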
I'm uploading images to my website and optimising them by removing EXIF data using ImageMagick's stripImage() function.
$img = new Imagick($image);
$img->stripImage();
$img->writeImage($image);
$img->destroy();
It works quite well, I get reduced file sizes as expected. However, if I open the image in Photoshop, photoshop reads the image as having a resolution of 1 Pixel per Inch.
It seems that stripImage is removing the EXIF data that photoshop uses to determine resolution.
How do I prevent this behaviour while stripping everything else?
The short answer is that I don't think you can selectively remove parts of the EXIF data with ImageMagick, so you will probably need to extract the original density, clear the EXIF, then put back whatever Photoshop needs... I'm not sure, though, whether Photoshop uses the standard density in the JPEG header or a value from the EXIF data.
Anyway, to work out the answer, you can get all density type settings in an image with EXIFtool like this:
exiftool "-*resolution*" image.jpg
X Resolution : 72
Y Resolution : 72
Resolution Unit : inches
Exiftool is described here. Or, as you know and love ImageMagick, you can use identify like this:
identify -verbose IMG_3942.JPG | grep -i reso
Resolution: 72x72
exif:ResolutionUnit: 2
exif:thumbnail:ResolutionUnit: 2
exif:thumbnail:XResolution: 72/1
exif:thumbnail:YResolution: 72/1
exif:XResolution: 72/1
exif:YResolution: 72/1
You can also set the density with ImageMagick after you have stripped the EXIF data like this:
# Strip EXIF then add in resolution
convert IMG_3942.JPG -strip -density 180x180 180.jpg
# Check what happened
exiftool "-*resolution*" 180.jpg
Resolution Unit : inches
X Resolution : 180
Y Resolution : 180
You can also modify the EXIF data with libexif if you look here.
Firstly, my Java version:
String str = "helloworld";
ByteArrayOutputStream localByteArrayOutputStream = new ByteArrayOutputStream(str.length());
GZIPOutputStream localGZIPOutputStream = new GZIPOutputStream(localByteArrayOutputStream);
localGZIPOutputStream.write(str.getBytes("UTF-8"));
localGZIPOutputStream.close();
localByteArrayOutputStream.close();
for (int i = 0; i < localByteArrayOutputStream.toByteArray().length; i++) {
    System.out.println(localByteArrayOutputStream.toByteArray()[i]);
}
and output is:
31
-117
8
0
0
0
0
0
0
0
-53
72
-51
-55
-55
47
-49
47
-54
73
1
0
-83
32
-21
-7
10
0
0
0
Then the Go version:
var gzBf bytes.Buffer
gzSizeBf := bufio.NewWriterSize(&gzBf, len(str))
gz := gzip.NewWriter(gzSizeBf)
gz.Write([]byte(str))
gz.Flush()
gz.Close()
gzSizeBf.Flush()
GB := (&gzBf).Bytes()
for i := 0; i < len(GB); i++ {
fmt.Println(GB[i])
}
output:
31
139
8
0
0
9
110
136
0
255
202
72
205
201
201
47
207
47
202
73
1
0
0
0
255
255
1
0
0
255
255
173
32
235
249
10
0
0
0
Why?
At first I thought it might be caused by the two languages' different byte-reading methods, but I noticed that 0 can never convert to 9, and the sizes of the []byte outputs are different.
Have I written wrong code? Is there any way to make my Go program get the same output as the Java program?
Thanks!
First thing is that the byte type in Java is signed, it has a range of -128..127, while in Go byte is an alias of uint8 and has a range of 0..255. So if you want to compare the results, you have to shift negative Java values by 256 (add 256).
Tip: To display a Java byte value in an unsigned fashion, use byteValue & 0xff, which converts it to int using the 8 bits of the byte as the lowest 8 bits of the int. Or better: display both results in hex form so you don't have to care about signedness...
Even if you do the shift, you will still see different results. That might be due to different default compression levels in the two languages. Note that although the default compression level is 6 in both Java and Go, this is not specified; different implementations are allowed to choose different values, and it might also change in future releases.
And even if the compression level would be the same, you might still encounter differences because gzip is based on LZ77 and Huffman coding which uses a tree built on frequency (probability) to decide the output codes and if different input characters or bit patterns have the same frequency, assigned codes might vary between them, and moreover multiple output bit patterns might have the same length and therefore a different one might be chosen.
If you want the same output, the only way (but see the notes below!) is to use compression level 0 (not to compress at all). In Go use the compression level gzip.NoCompression, and in Java use Deflater.NO_COMPRESSION.
Java:
GZIPOutputStream gzip = new GZIPOutputStream(localByteArrayOutputStream) {
{
def.setLevel(Deflater.NO_COMPRESSION);
}
};
Go:
gz, err := gzip.NewWriterLevel(gzSizeBf, gzip.NoCompression)
But I wouldn't worry about the different outputs. Gzip is a standard, even if outputs are not the same, you will still be able to decompress the output with any gzip decoders whichever was used to compress the data, and the decoded data will be exactly the same.
Here are simplified, extended versions. Not that it matters, but your code is unnecessarily complex; you could simplify it like this (these versions also set compression level 0 and convert negative Java byte values):
Java version:
ByteArrayOutputStream buf = new ByteArrayOutputStream();
GZIPOutputStream gz = new GZIPOutputStream(buf) {
{ def.setLevel(Deflater.NO_COMPRESSION); }
};
gz.write("helloworld".getBytes("UTF-8"));
gz.close();
for (byte b : buf.toByteArray())
System.out.print((b & 0xff) + " ");
Go version:
var buf bytes.Buffer
gz, _ := gzip.NewWriterLevel(&buf, gzip.NoCompression)
gz.Write([]byte("helloworld"))
gz.Close()
fmt.Println(buf.Bytes())
NOTES:
The gzip format allows some extra fields (headers) to be included in the output.
In Go these are represented by the gzip.Header type:
type Header struct {
Comment string // comment
Extra []byte // "extra data"
ModTime time.Time // modification time
Name string // file name
OS byte // operating system type
}
It is accessible via the embedded Writer.Header struct field. Go sets and inserts these header fields, while Java does not (it leaves them zero). So even if you set the compression level to 0 in both languages, the outputs will not be the same (but the "compressed" data will match in both outputs).
Unfortunately the standard Java does not provide a way/interface to set/add these fields, and Go does not make it optional to fill the Header fields in the output, so you will not be able to generate exact outputs.
An option would be to use a 3rd-party gzip library for Java which supports setting these fields. Apache Commons Compress is one example: it contains a GzipCompressorOutputStream class with a constructor that accepts a GzipParameters instance. This GzipParameters is the equivalent of the gzip.Header structure. Only by using this would you be able to generate identical output.
But as mentioned, generating exact output has no real-life value.
From RFC 1952, the GZip file header is structured as:
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
Looking at the output you've provided, we have:
| Java | Go
ID1 | 31 | 31
ID2 | 139 | 139
CM (compression method) | 8 | 8
FLG (flags) | 0 | 0
MTIME (modification time) | 0 0 0 0 | 0 9 110 136
XFL (extra flags) | 0 | 0
OS (operating system) | 0 | 255
So we can see that Go is setting the modification time field of the header, and setting the operating system to 255 (unknown) rather than 0 (FAT file system). In other respects they indicate that the file is compressed in the same way.
In general these sorts of differences are harmless. If you want to determine if two compressed files are the same, then you should really compare the decompressed versions of the files though.
The following script is supposed to create an image with lower quality and therefore a smaller file size (kB); instead it creates an image with lower quality but a bigger file size.
In my test the original is about 300kB; using 90% quality the output is almost double, and using 100% quality the output is more than 1MB...
<?php
$quality = 90;
$path = '/var/www/TEST/';
$inputSrc = $path . 'original.jpg';
$outputSrc = $path . 'after' . $quality . '.jpg';
$handler = imagecreatefromjpeg($inputSrc);
imagejpeg($handler, $outputSrc, $quality);
I assume the issue is related to a poor imagejpeg implementation...
Is there any way to work around this?
Is ImageMagick a better solution?
Thanks
Update
I was curious, so I gave ImageMagick a try, and unfortunately I got similar results (slightly better).
Full test results:
Original size: 294.6Kb
GD (imagejpeg) 90%: 581.7Kb
GD (imagejpeg) 100%: 1.1Mb
ImageMagick 90%: 431.7Kb
ImageMagick 100%: 780.9kb
Update 2
I did some more tests with GIMP, and it looks like, in order to obtain a file with a size very similar to the original, you have to check the option "use quality settings from original image".
Now I'm even more confused... since when I select that setting, GIMP automatically changes the output quality to 74% (for the example image).
I was assuming that a JPEG quality value lower than 100% decreases the image quality at every save... but I'm starting to think I'm wrong here.
Update 3
With ImageMagick it is not necessary to specify the quality: if you leave it empty, ImageMagick will use the same quality detected in the input image.
So the example image is detected as quality 69 and the output file is 326kB. That is the best result so far.
Here is the image I'm using:
I had a little look at this. You can work backwards in ImageMagick and, rather than define the quality and see what size results, you can define the size and see what quality results. So, for a concrete example, you can say you want the output file not to exceed 100kB, like this:
convert MShRR.jpg -define jpeg:extent=100k out.jpg
and you get 99kB like this:
-rw-r--r--# 1 mark staff 294608 14 Jan 09:36 MShRR.jpg
-rw-r--r--# 1 mark staff 99989 14 Jan 09:44 out.jpg
To my eyes, the resulting image is a little posterised:
You can often add a tiny amount of blur to disguise this, as follows:
convert MShRR.jpg -blur x0.5 -define jpeg:extent=100k out.jpg
YMMV - Your Mileage May Vary !!!
I need to scan an uploaded PDF to determine if the pages within are all portrait or if there are any landscape pages. Is there someway I can use PHP or a linux command to scan the PDF for these pages?
(Updated answer -- scroll down...)
You can use either pdfinfo (part of either the poppler-utils or the xpdf-tools) or identify (part of the ImageMagick toolkit).
identify:
identify -format "%f Page %s: Width: %W -- Height: %H\n" T-VD7.PDF
Example output:
T-VD7.PDF Page 0: Width: 595 -- Height: 842
T-VD7.PDF Page 1: Width: 595 -- Height: 842
T-VD7.PDF Page 2: Width: 1191 -- Height: 842
[...]
T-VD7.PDF Page 11: Width: 595 -- Height: 421
T-VD7.PDF Page 12: Width: 595 -- Height: 842
Or a bit simpler:
identify -format "%s: %Wx%H\n" T-VD7.PDF
gives:
0: 595x842
1: 595x842
2: 1191x842
[...]
11: 595x421
12: 595x842
Note how identify uses a zero-based page-counting mechanism!
Pages are 'landscape' if their width is bigger than their height. They are neither, if width and height are equal.
The advantage is that identify lets you tweak the output format quite easily and very extensively.
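The width/height rule above is a single comparison; spelled out as a tiny helper (in Go, just to make the classification explicit):

```go
package main

import "fmt"

// orientation classifies a page by the rule above: landscape when the
// width is bigger than the height, portrait when smaller, and neither
// when the page is exactly square.
func orientation(width, height int) string {
	switch {
	case width > height:
		return "landscape"
	case width < height:
		return "portrait"
	default:
		return "neither" // square page
	}
}

func main() {
	fmt.Println(orientation(595, 842))  // prints portrait
	fmt.Println(orientation(1191, 842)) // prints landscape
}
```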
pdfinfo:
pdfinfo input.pdf | grep "Page.*size:"
Example output:
Page size: 595.276 x 841.89 pts (A4)
pdfinfo is definitely faster and also more precise than identify when it comes to multi-page PDFs. (The 13-page PDF I tested this with took identify 31 seconds to process, whereas pdfinfo needed less than half a second...)
Be warned: by default, pdfinfo reports the size of the first page only. To get sizes for all pages (as you may know, there are PDFs which use mixed page sizes as well as mixed orientations), you have to modify the command:
pdfinfo -f 3 -l 13 input.pdf | grep "Page.*size:"
Output now:
Page 1 size: 595.276 x 841.89 pts (A4)
Page 2 size: 595.276 x 841.89 pts (A4)
Page 3 size: 1191 x 842 pts (A3)
[....]
Page 12 size: 595 x 421 pts (A5)
Page 13 size: 595.276 x 841.89 pts (A4)
This will print the sizes of page 3 (-f, first to report) through page 13 (-l, last to report).
Scripting it:
pdfinfo \
  -f 1 \
  -l 1000 \
  Vergleich-VD7.PDF \
| grep "Page.*size:" \
| while read Page _pageno size _width x _height rest; do
    [ "$(echo "${_width} / 1" | bc)" -gt "$(echo "${_height} / 1" | bc)" ] \
      && echo "Page $_pageno is landscape..." \
      || echo "Page $_pageno is portrait..." ; \
  done
(The bc trick is required because the shell's -gt comparison works only with integers. Dividing by 1 with bc truncates the possible real values to integers...)
Result:
Page 1 is portrait...
Page 2 is portrait...
Page 3 is landscape...
[...]
Page 12 is landscape...
Page 13 is portrait...
Update: Using the 'right' pdfinfo to discover page rotations...
My initial answer tooted the horn of pdfinfo. Serenade X says in a comment that his/her problem is to discover rotated pages.
OK now, here is some additional info which is not yet widely known and therefore has not really been absorbed by all pdfinfo users yet...
As I mentioned, there are two different pdfinfo utilities around:
the one which comes as part of the xpdf-utils package (on some platform also named xpdf-tools).
the one which comes as part of the poppler-utils package (on some platforms also named poppler-tools; sometimes it is not separated out as a package but is part of the main poppler package).
Poppler's pdfinfo output
So here is a sample output from Poppler's pdfinfo command. The tested file is a 2-page PDF where the first page is in portrait A4 and the second page is in landscape A4 format:
kp@mbp:~$ pdfinfo -f 1 -l 2 a4portrait+landscape.pdf
Producer: GPL Ghostscript 9.05
CreationDate: Thu Jul 26 14:23:31 2012
ModDate: Thu Jul 26 14:23:31 2012
Tagged: no
Form: none
Pages: 2
Encrypted: no
Page 1 size: 595 x 842 pts (A4)
Page 1 rot: 0
Page 2 size: 842 x 595 pts (A4)
Page 2 rot: 0
File size: 3100 bytes
Optimized: no
PDF version: 1.4
Do you see the lines saying Page 1 rot: 0 and Page 2 rot: 0?
Do you notice the lines saying Page 1 size: 595 x 842 pts (A4) and Page 2 size: 842 x 595 pts (A4) and the differences between the two?
XPDF's pdfinfo output
Now let's compare this to the output of XPDF's pdfinfo:
kp@mbp:~$ xpdf-pdfinfo -f 1 -l 2 a4portrait+landscape.pdf
Producer: GPL Ghostscript 9.05
CreationDate: Thu Jul 26 14:23:31 2012
ModDate: Thu Jul 26 14:23:31 2012
Tagged: no
Pages: 2
Encrypted: no
Page 1 size: 595 x 842 pts (A4)
Page 2 size: 842 x 595 pts (A4)
File size: 3100 bytes
Optimized: no
PDF version: 1.4
You may notice one more difference, if you look closely enough. I won't point my finger to it, and will keep my mouth shut for now... :-)
Poppler's pdfinfo correctly reports rotation of page 2
Next, I rotate the second page of the file by 90 degrees using pdftk (I don't have Adobe Acrobat around):
pdftk \
a4portrait+landscape.pdf \
cat 1 2E \
output a4portrait+landscape---page2-landscaped-by-pdftk.pdf
Now Poppler's pdfinfo reports this:
kp@mbp:~$ pdfinfo -f 1 -l 2 a4portrait+landscape---page2-landscaped-by-pdftk.pdf
Creator: pdftk 1.44 - www.pdftk.com
Producer: itext-paulo-155 (itextpdf.sf.net-lowagie.com)
CreationDate: Thu Jul 26 14:39:47 2012
ModDate: Thu Jul 26 14:39:47 2012
Tagged: no
Form: none
Pages: 2
Encrypted: no
Page 1 size: 595 x 842 pts (A4)
Page 1 rot: 0
Page 2 size: 842 x 595 pts (A4)
Page 2 rot: 90
File size: 1759 bytes
Optimized: no
PDF version: 1.4
As you can see, the line Page 2 rot: 90 tells us what we are looking for. XPDF's pdfinfo would essentially report the same info about the changed file as it does about the original one. Of course, it would still correctly capture the changed Creator:, Producer: and *Date: infos, but it would miss the rotated page...
Also note this detail: page 2 originally was designed as a landscape page, which can be seen from the Page 2 size: 842 x 595 pts (A4) info part. However, it shows up in the current PDF as a portrait page, as can be seen by the Page 2 rot: 90 part.
Also note that there are 4 different values that could appear for the rotation info:
0 (no rotation),
90 (rotation to the East, or 90 degrees clockwise),
180 (rotation to the South, tumbled page image, upside-down, or 180 degrees clockwise),
270 (rotation to the West, or 90 degrees counter-clockwise, or 270 degrees clockwise).
Some Background Info
Poppler (developed by The Poppler Developers) is a fork of XPDF (developed by Glyph & Cog LLC) that happened around 2005. (As one important reason for forking, the Poppler developers at the time gave this: Glyph & Cog didn't always provide timely bugfixes for security-related problems...)
Anyway, for a very long time the Poppler fork kept the associated command-line utilities, their command-line parameters and syntax, as well as the format of their output, compatible with the original (XPDF/Glyph & Cog LLC) ones.
Existing Poppler tools gaining additional features over competing XPDF tools
However, more recently they started to add additional features. Off the top of my head:
pdfinfo now also reports the rotation status of each page (starting with Poppler v0.19.0, released March 1st, 2012).
pdffonts now also reports the font encoding for each font (starting with Poppler v0.19.1, released March 15th, 2012).
Poppler tools getting more siblings
The Poppler tools also provide some extra commandline utilities which are not in the original XPDF package (some of which have been added only quite recently):
pdftocairo - utility for creating PNG, JPEG, PostScript, EPS, PDF, SVG (using Cairo)
pdfseparate - utility to extract PDF pages
pdfunite - utility to merge PDF files
pdfdetach - utility to list or extract embedded files from a PDF
pdftohtml - utility to convert PDF files to HTML
identify, which comes with ImageMagick, will give you the width and height of a given PDF file (it also requires Ghostscript to be installed on the system).
$ identify -format "%g\n" FILENAME.PDF
1417x1106+0+0
Where 1417 is the width, 1106 is the height, and you (for this purpose) can ignore the +0+0.
Edit: Sorry, I was referring to Mike B's comment on the original question - as he said, once you know the width and height you can determine whether you have a portrait or landscape image (if height > width then portrait, else landscape).
Also, the \n added to the -format argument (as suggested by Kurt Pfeifle) will put each page on its own line. He also mentions the %W and %H format parameters; all the possible format parameters can be found here (there are a lot of them).