exif_read_data - Incorrect APP1 Exif Identifier Code - php

I have a problem with some of my photos when I try to read their EXIF data.
My code is below:
$exif_date = exif_read_data($file_path, 'IFD0');
With some images I get this warning:
Message: exif_read_data(001.jpg) [function.exif-read-data]: Incorrect APP1 Exif Identifier Code
My question is: how can I avoid this warning? Can I somehow check whether APP1 is correct before calling exif_read_data()?
Thanks for help.

For the quick answer, take a look at the last rows of this post.
I think some code is still missing. I came exactly across the same problem and after searching I found multiple websites related to this problem:
http://drupal.org/node/556970
a bug report with 2 solutions:
simply put an @ in front of exif_read_data
check $imageinfo['APP1'] if it contains Exif
After reading dcro's answer here, I found out that the second parameter of getimagesize() returns such an $imageinfo array. Now I tested one of my images with the following code:
<?php
getimagesize("test.jpg", $info);
var_dump($info);
?>
This returned the following:
array(1) {
["APP1"]=>
string(434) "http://ns.adobe.com/xap/1.0/<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Exempi + XMP Core 4.1.1">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:type>Image</dc:type>
<dc:format>image/jpeg</dc:format>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"
}
By the way, this doesn't look like Exif; it looks more like XMP. The funny part is that exiftool, for example, still finds some Exif data (orientation, for example). In the XMP specification I found that XMP and Exif data can sit side by side in one file (page 18). Further searching revealed that there are scripts like this one to extract Exif from XMP.
Anyway, since
getimagesize() does not give me usable information about the Exif in my picture and
the stated script shows that in my image the Exif data is not embedded into the XMP data and
simply suppressing the exif_read_data() warning works,
I will still use the @exif_read_data($file_path) solution.

You can use PHP's getimagesize() function to extract the APP markers from the file and then verify that the APP1 marker actually contains EXIF data (the content of that marker should start with 'Exif').
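A minimal sketch of that check, assuming a JPEG file and a loaded exif extension. The function names app1LooksLikeExif() and readExifIfPresent() are made up for illustration. As far as I know, getimagesize() keeps only the first segment of each kind, so an image whose first APP1 segment holds XMP will be treated as having no Exif, even if a later segment carries some.

```php
<?php
// True when the APP1 segment reported by getimagesize() starts with "Exif".
function app1LooksLikeExif(array $info): bool
{
    return isset($info['APP1']) && strncmp($info['APP1'], 'Exif', 4) === 0;
}

// Hypothetical wrapper: only calls exif_read_data() when APP1 looks like Exif.
// Returns false when the file is not a readable image or has no Exif APP1.
function readExifIfPresent(string $path)
{
    $info = [];
    if (@getimagesize($path, $info) === false) {
        return false;
    }
    return app1LooksLikeExif($info) ? exif_read_data($path, 'IFD0') : false;
}

// Usage:
// $exif = readExifIfPresent($file_path);
```

The trade-off is that this skips files which would still yield some data with the warning suppressed, so it fits best when a clean log matters more than recovering every tag.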

Related

ImageMagick with PHP text overflowing PDF to JPG conversion

I'm trying to convert a PDF file to JPG using ImageMagick with PHP and CakePHP. The PDF itself is in perfect shape and looks exactly as it should, but the text in the image generated from it always overflows the edges of the page.
So far I've tried tweaking the generation code with no success, and I've read a lot of the PHP docs (http://php.net/manual/pt_BR/book.imagick.php).
Here is the conversion code:
$image = new Imagick();
$image->setResolution(300,300);
$image->setBackgroundColor('white');
$image->readImage($workfile);
$image->setGravity(Imagick::GRAVITY_CENTER);
$image->setOption('pdf:fit-to-page',true);
$image->setImageFormat('jpeg');
$image->setImageCompression(imagick::COMPRESSION_JPEG);
$image->setImageCompressionQuality(60);
$image->scaleImage(1200,1200, true);
$image->mergeImageLayers(Imagick::LAYERMETHOD_FLATTEN);
$image->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE);
$image->writeImage(WWW_ROOT . 'files' . DS . 'Snapshots' . DS . $filename);
Here are the results:
https://imgur.com/a/ISBmDMv
The first image is the PDF before the conversion and the second one is the image generated from the PDF, where the text on the right side overflows.
So, why is this happening? Alternatives for any of the technologies used (Ghostscript, ImageMagick, etc.) are also welcome!
Thanks everyone!
It's very hard to say why you see the result you do without seeing the original PDF file, rather than a picture of it.
The most likely explanation is that your original PDF file uses a font but does not embed that font in the PDF. When Ghostscript comes to render it to an image, it must then substitute 'something' in place of the missing font. If the metrics (e.g. spacing) of the substituted font do not precisely match the metrics of the missing font, the rendered text will be misplaced or incorrectly sized. Of course, since it's not using the same font, the shapes of the characters won't match either.
This can result in several different kinds of problems, but what you show is pretty typical of one such class of problem. Although you haven't mentioned it, I can also see several places in the document where text overwrites as well, which is another symptom of exactly the same problem.
If this is the case, then the Ghostscript back channel transcript will have told you that it was unable to find a font and is substituting a named font for the missing one. I can't tell you whether ImageMagick stores that anywhere; my guess would be that it doesn't. However, you can copy the command line from the ImageMagick delegates.xml file and use it to run Ghostscript yourself, and then you will be able to see whether that's what is happening.
If this is what is happening, then you must do one of the following:
Create your PDF file with the fonts embedded (this is good practice anyway)
Supply Ghostscript with a copy of the missing font as a substitute
Live with the text as it is
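To look at the Ghostscript transcript from PHP, something like the sketch below may help. It assumes the gs binary is on the PATH and uses the standard -dNOPAUSE, -dBATCH and -sDEVICE=nullpage switches rather than the exact command ImageMagick runs. The helper names ghostscriptCheckCommand() and fontRelatedLines() are made up, and because the wording of Ghostscript's font messages varies between versions, the filter only looks for the word "font".

```php
<?php
// Builds a Ghostscript command that interprets the PDF without writing an image
// (nullpage device), so only the transcript is produced. Assumes `gs` is on PATH.
function ghostscriptCheckCommand(string $pdfPath): string
{
    return 'gs -dNOPAUSE -dBATCH -sDEVICE=nullpage ' . escapeshellarg($pdfPath) . ' 2>&1';
}

// Keeps only the transcript lines that mention fonts (deliberately loose match).
function fontRelatedLines(array $transcript): array
{
    return array_values(array_filter(
        $transcript,
        fn (string $line): bool => stripos($line, 'font') !== false
    ));
}

// Usage (runs Ghostscript, so it is left commented out):
// exec(ghostscriptCheckCommand($workfile), $transcript);
// print_r(fontRelatedLines($transcript));
```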

PHP - Check if pdf contains given text - TcpdfFpdi / pdftk / fpdi

I have a PDF document and I want to check whether a specific text occurs in it (tags that I put in while generating the PDF). However, using these libraries (TcpdfFpdi, pdftk or fpdi) I couldn't figure out whether it's possible or how to do it.
$str = "{hello}";
$pdf = new TcpdfFpdi();
$pdf->setSourceFile($filePath);
$pdf->searchForText($str); // something like this which returns boolean
If I try dd(file_get_contents($filePath)) without any library, it returns a very long output that doesn't seem to contain the text I want, so I think it's better to use one of those libraries.
Just an idea…
It's not an actual PHP solution, but you could use a tool like pdftotext, which I know from this post (where a PDF file is converted into a string to count its words): https://superuser.com/a/221367/535203
You can install it, play around with the command, and call it from within your PHP application.
As far as I remember (it's been a long time since I used pdftotext), the output text is not exactly the PDF's content, but for finding a few tags in it, it's at least worth a try.
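Calling it from PHP could look roughly like this. It is a sketch that assumes the pdftotext binary is installed and on the PATH; passing "-" as the output file makes it print the text to stdout. The helper names pdfToText() and textContainsTag() are made up.

```php
<?php
// Extracts the text layer with pdftotext; "-" sends the text to stdout.
// Returns an empty string when the command produces no output.
function pdfToText(string $pdfPath): string
{
    $output = shell_exec('pdftotext ' . escapeshellarg($pdfPath) . ' -');
    return is_string($output) ? $output : '';
}

// Plain substring search on the extracted text.
function textContainsTag(string $text, string $tag): bool
{
    return strpos($text, $tag) !== false;
}

// Usage:
// $found = textContainsTag(pdfToText($filePath), '{hello}');
```

A tag that pdftotext splits across two lines would be missed by a plain substring search, so short tags without spaces are the safer choice.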

Metadata extraction from PNG images

How can I extract metadata from an image the way this website does? I have used the exiv2 library, but it gives only limited data compared to this website. Is there a more advanced library?
I have already tried the hachoir-metadata Python library.
Also, how does Windows extract the details of an image (the ones we see under Properties)?
PNG files are made up of blocks, most of which, in an average PNG, are IDAT blocks containing compressed pixel data. All PNGs start with an IHDR block and end with an IEND block. Because PNG is flexible in this way, it can be extended by defining new types of blocks; this is how Animated PNG (APNG) works. All browsers can show the first frame, but browsers that understand the block types used in APNG can show the animation.
There are many places where text data can live in a PNG image, and even more places where metadata can live. Here is a very convenient summary. You mentioned the "Description tag", which can only live in text blocks, so that is what I'll be focusing on.
The PNG standard contains three different types of text blocks: tEXt (Latin-1 encoded, uncompressed), zTXt (compressed, also Latin-1), and finally iTXt, which is the most useful of the three, as it can contain UTF-8 encoded text and can be either compressed or uncompressed.
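To make the block layout concrete, here is a rough sketch in PHP that walks the blocks of a PNG held in memory and collects the tEXt ones. Each block is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The function name pngTextChunks() is made up, zTXt and iTXt are skipped, and the CRC is not verified.

```php
<?php
// Collects tEXt blocks (keyword => text) from raw PNG bytes.
// Skips zTXt/iTXt and does not verify CRCs; returns [] for non-PNG input.
function pngTextChunks(string $png): array
{
    $texts = [];
    if (substr($png, 0, 8) !== "\x89PNG\r\n\x1a\n") {
        return $texts; // missing PNG signature
    }
    $offset = 8;
    $total = strlen($png);
    while ($offset + 8 <= $total) {
        $length = unpack('N', substr($png, $offset, 4))[1];
        $type = substr($png, $offset + 4, 4);
        $data = substr($png, $offset + 8, $length);
        // A tEXt block is "keyword", a NUL byte, then the text.
        if ($type === 'tEXt' && strpos($data, "\0") !== false) {
            [$keyword, $text] = explode("\0", $data, 2);
            $texts[$keyword] = $text;
        }
        if ($type === 'IEND') {
            break;
        }
        $offset += 12 + $length; // length + type + data + CRC
    }
    return $texts;
}

// Usage:
// print_r(pngTextChunks(file_get_contents('/tmp/itxt.png')));
```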
So, your question becomes, "what is a convenient way to extract the text blocks?"
At first, I thought pypng could do this, but it cannot:
tEXt/zTXt/iTXt
Ignored when reading. Not generated.
Luckily, Pillow has support for this - humorously it was added only one day before you asked your original question!
So, without further ado, let's find an image containing an iTXt chunk: this example ought to do.
>>> from PIL import Image
>>> im = Image.open('/tmp/itxt.png')
>>> im.info
{'interlace': 1, 'gamma': 0.45455, 'dpi': (72, 72), 'Title': 'PNG', 'Author': 'La plume de ma tante'}
According to the source code, tEXt and zTXt are also covered.
For the more general case, looking over the other readers, the JPEG and GIF ones also seem to have good coverage of their formats, so I would recommend PIL for this. That's not to say that the maintainers of hachoir-metadata wouldn't appreciate a pull request adding text block support though! :-)
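I found this code buried in a Pillow pull request:
from PIL import Image, PngImagePlugin
img = Image.open("input.png")  # hypothetical input file
filenew = "output.png"  # hypothetical output file
info = PngImagePlugin.PngInfo()  # container for the text chunks to write
info.add_text("foo", "bar")  # add a tEXt chunk with key "foo" and value "bar"
img.save(filenew, "png", pnginfo=info)  # write the image together with the text chunks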
You can try this pre-alpha solution by Daniel Chesterton. I am not sure whether it is exactly what you want or only part of the solution, but I believe you can sort it out by playing with it.
https://github.com/dchesterton/image

Part of EXIF data read by exif_read_data() is corrupted

When I read the EXIF data from a raw file with exif_read_data() a lot of the data gets corrupted. Or so I think.
The file I'm trying to read is a DNG Raw file from a Pentax K-x camera.
Here is a demo: http://server.patrikelfstrom.se/exif/?file=_IGP6211.DNG
(I've added a standard JPEG from a Canon EOS 1000D as comparison)
I get no errors on this site and it seems to include data that exif_read_data() doesn't return.
http://regex.info/exif.cgi
And the corrupt data I'm talking about is: ...”¯/ѳf/ÇZ/íÔ.ƒ.9:./<ñ.TÛ¨.zâh!o†!™˜...
And: UndefinedTag:0xC65A
The server is running PHP version 5.5.3
Just because the data isn't human readable doesn't mean it's garbage.
Those values that you're seeing are private EXIF fields which are left up to the implementer to determine. They could be binary data, they could be text, they could be anything. This listing can help you determine what some of those values are.
For example, tag 0xC634 is DNGPrivateData which is data specifically for programs that deal with DNG files.
You can map the undefined tags to what they most likely are using this file:
https://github.com/peterhudec/image-metadata-cruncher/blob/master/includes/exif-mapping.php
It looks like your script is dying on 0xc634 => 'SR2Private'
Looking at http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Pentax.html, it seems to be used to store information about the camera's flash. I don't know for sure, but it is probably not important information, and probably not meant to be viewed as text.
I would probably just make a list of the keys it seems to die on, loop through the EXIF data, check whether each key starts with UndefinedTag:, and either rename the key to the mapped one or unset the item:
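$bad_keys = array('0xc634', /* ... */ '0xc723');
foreach ( $exif as $key => $value ) {
    if ( strtolower( substr( $key, 0, 13 ) ) == 'undefinedtag:' ) {
        // the tag number follows the 13-character "UndefinedTag:" prefix
        $tag = strtolower( substr( $key, 13 ) );
        // either rename the key using the file with the map of undefined tags,
        // or unset it if it's one that seems to be corrupt
        if ( in_array( $tag, $bad_keys, true ) ) {
            unset( $exif[$key] );
        }
    }
}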

php get original image type [duplicate]

I have a server that people can upload files to. The problem is that some of the filenames are mangled (they don't have any extension), so I cannot immediately determine the file type. This question has two parts. First, for the files that do have proper filenames, what is the best way to determine whether or not a file is an image? (Just a big long if/else if list?) Second, for the files that don't have extensions, how can I determine whether they are images?
You can use exif_imagetype()
<?php
$type = exif_imagetype($image);
where $type is one of the following values:
IMAGETYPE_GIF
IMAGETYPE_JPEG
IMAGETYPE_PNG
IMAGETYPE_SWF
IMAGETYPE_PSD
IMAGETYPE_BMP
IMAGETYPE_TIFF_II (intel byte order)
IMAGETYPE_TIFF_MM (motorola byte order)
IMAGETYPE_JPC
IMAGETYPE_JP2
IMAGETYPE_JPX
IMAGETYPE_JB2
IMAGETYPE_SWC
IMAGETYPE_IFF
IMAGETYPE_WBMP
IMAGETYPE_XBM
IMAGETYPE_ICO
From the manual:
When a correct signature is found, the appropriate constant value will be returned otherwise the return value is FALSE. The return value is the same value that getimagesize() returns in index 2 but exif_imagetype() is much faster.
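As a rough sketch of how that could be used for files without an extension, the helper below (the name imageMimeBySignature() is made up) turns the detected constant into a MIME type with image_type_to_mime_type(). It assumes the exif extension is loaded. The @ is there because exif_imagetype() raises a notice for files too short to hold a signature.

```php
<?php
// Returns the MIME type when the file starts with a known image signature,
// otherwise null. Requires the exif extension for exif_imagetype().
function imageMimeBySignature(string $path): ?string
{
    $type = @exif_imagetype($path);
    return $type === false ? null : image_type_to_mime_type($type);
}

// Usage:
// $mime = imageMimeBySignature($uploadedFile);
// if ($mime === null) { /* not a recognised image */ }
```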
You can use getimagesize
It does not require the GD image library and it returns the same information about the image type.
http://it2.php.net/manual/en/function.getimagesize.php
If you have the GD2 extension enabled, you could just use that to load the file as an image, then if it returns invalid you can catch the error and return FALSE, otherwise return TRUE.
You have two options here: one is simple and pre-built with some shortfalls, the other is complex and requires math.
PHP's fileinfo can be used to detect file types based on the file's actual header information. For instance, I just grabbed your Gravatar:
But the actual content of the file is this:
‰PNG
IHDR szzô
IDATX…­—OL\UÆZÀhëT)¡ c•1T:1‘Š‘.Ú(]4†A“ÒEY˜à.................................
So, even without the file name I could detect it quite obviously. This is what the PHP Fileinfo extension will do. Most PNG and JPG files tend to have this header in them, but this is not so for every single file type.
That being said, fileinfo is dead simple to use, from the manual:
$fi = new finfo(FILEINFO_MIME,'/usr/share/file/magic');
$mime_type = $fi->buffer(file_get_contents($file));
Your other option is more complex and it depends on your own personal ambitions, you could generate a histogram and profile files based on their content.
Something like this looks like a GIF file:
And something like this looks like a TIFF file:
From there you'd need to generate a model over multiple types of files for what the histogram of each type should be, and then use that to guess. This is a good method to use for files that don't really have those "magic headers" that can be read easily. Keep in mind, you'll need to learn some math and how to model an average histogram function and match them against files.
You can try to load the image into PHP's GD library, and see if it works.
$file = file_get_contents('file');
$img = imagecreatefromstring($file);
if($img === FALSE){
// file is NOT an image
}
else{
// file IS an image
}
Look at ImageMagick's identify: http://www.imagemagick.org/script/identify.php
The php wrapper is here: http://www.php.net/manual/en/function.imagick-identifyimage.php
Or if you just want to validate that it's an image (and don't care about the meta data): http://www.php.net/manual/en/function.imagick-valid.php
exif_imagetype() might work
make sure you have exif enabled.
Try looking at exif_imagetype
If you need a fast solution, use imagesx() and imagesy(). There is also a fast way to check large image file dimensions, by reading just a small amount of data from the file header. Explained in more detail in the following url:
http://hungred.com/useful-information/php-fastest-image-width-height/
You can use the Fileinfo extension:
http://www.php.net/manual/en/function.finfo-file.php
finfo_file() uses magic bytes and does not have to load the whole image into memory. The result is a string with the corresponding MIME type, e.g.:
text/html
image/gif
application/vnd.ms-excel
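A short sketch of an "is this an image?" check built on that, assuming the fileinfo extension is enabled. The function name isImageByMagic() is made up, and the test simply looks for an image/ prefix in the reported MIME type.

```php
<?php
// True when the MIME type detected from the file's magic bytes is image/*.
// Requires the fileinfo extension.
function isImageByMagic(string $path): bool
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);
    return is_string($mime) && strncmp($mime, 'image/', 6) === 0;
}

// Usage:
// if (isImageByMagic($uploadedFile)) { /* treat as image */ }
```

Because the decision comes from the file content, it works the same for files with mangled names and for files with a trustworthy extension.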
The type of the image can typically be inferred from the header information of the file.
For the first question, if the extension is known, you could use the PHP function in_array() to check it against a list of image extensions.
