PDF Verification in PHP


What is the best way in PHP to determine if a PDF is filled out correctly? The source PDF is a faxed form that contains handwritten data. Is an image comparison an option? If the form is filled out on a computer, I know I can use pdftotext to verify that the fields are completed or not. I just don't know how to verify handwritten data.

For handwritten data, an image comparison may well be an option. See, for example, the following answer for a basic idea of how to start tackling this task:
Imagemagick : “Diff” an Image
However, the job may be much more difficult when faxed images come into play. (We all know how bad the quality of a fax can be. Faxes are also frequently skewed by a small degree, and they may be slightly scaled compared to the original. Not to forget that their resolution is 204x196 dpi, which adds a bit of distortion. And lastly: how do you get the faxed form back into PHP? This might involve another step of scanning in the paper, which again will not necessarily add quality to the result.)
Still, ImageMagick may be able to handle all this: it can -deskew images, it can reduce or completely remove -noise, and it can -distort, -scale and -repage images, and much more...
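As a rough sketch of how the clean-up and comparison steps above might be driven from PHP, the following builds two ImageMagick command lines: one to normalise the faxed page, one to measure how far it differs from a reference image. The function names, file names and target size are hypothetical, and the 40% deskew threshold is only a commonly used starting value; a real fax would need tuning.

```php
<?php
// Sketch only: builds command strings, it does not run them.
// Assumes the ImageMagick "convert" and "compare" tools are installed.

/**
 * Command that converts a scan to grayscale, deskews and despeckles it,
 * and forces it to the reference dimensions ("!" ignores aspect ratio).
 */
function buildCleanupCommand(string $scan, string $out, int $width, int $height): string
{
    return sprintf(
        'convert %s -colorspace Gray -deskew 40%% -despeckle -resize %dx%d! +repage %s',
        escapeshellarg($scan),
        $width,
        $height,
        escapeshellarg($out)
    );
}

/**
 * Command that reports the RMSE difference between two same-sized images.
 * "null:" discards the difference image; the metric goes to stderr.
 */
function buildCompareCommand(string $reference, string $candidate): string
{
    return sprintf(
        'compare -metric RMSE %s %s null:',
        escapeshellarg($reference),
        escapeshellarg($candidate)
    );
}
```

The resulting strings could be passed to exec(). Comparing a cropped form field against the same region of a blank form is one way to guess whether something was written there, but it cannot tell whether the handwriting is correct.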

Related

php - Improving zbar recognition of scanned qr codes

I'm trying to read scanned qr codes from php, running zbarimg via exec. Working not-too-bad.
The issue is that it seems to choke on scanning artifacts like these small dots:
I've been trying to get rid of the white dots syndrome by fiddling around with Imagick - changing brightness/contrast/sharpness seems to make them stand out less but some, like this one, are still unreadable.
Is there a way to remove the white dots / improve zbarimg's recognition?
Edit:
One thing I forgot to point out:
What strikes me as weird is that scanning the QR code with a smartphone camera reads the code successfully in an instant without a single issue, which leads me to think this "fixing up" shouldn't even be needed.
Am I just using zbar the wrong way?
Or do mobile OSes just use a different, better algorithm? I tried using a zxing wrapper for PHP as well, but it gave even fewer results than zbar.

In terms of cleaning up the image you have shown us, the obvious approach would be to use cellular automata, although for best results you would want to modify the behaviour to encompass the sharpening and thresholding you are already applying using other filters. You might consider setting the size of the cell to, say, one 25th of the QR code block resolution rather than mapping cells 1:1 to pixels in the underlying image. Really, you should be applying your thresholding via a histogram-based approach (assuming that you can isolate the QR code in the image).
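To make the cellular-automaton idea concrete, here is a minimal sketch of one generation of a majority-rule automaton on an already thresholded image. It assumes the image has been loaded into a 2D array of 0/1 values (1 = dark); the function name is hypothetical, and it works per pixel rather than with the larger cell size suggested above.

```php
<?php
// Sketch: one generation of a majority-rule cellular automaton.
// $grid is a rectangular 2D array of ints, 1 = dark pixel, 0 = light pixel.

function majorityStep(array $grid): array
{
    $rows = count($grid);
    $cols = $rows > 0 ? count($grid[0]) : 0;
    $next = $grid;

    for ($y = 0; $y < $rows; $y++) {
        for ($x = 0; $x < $cols; $x++) {
            $dark = 0;
            $seen = 0;
            // Visit the 3x3 neighbourhood, the cell itself included.
            for ($dy = -1; $dy <= 1; $dy++) {
                for ($dx = -1; $dx <= 1; $dx++) {
                    $ny = $y + $dy;
                    $nx = $x + $dx;
                    if ($ny < 0 || $ny >= $rows || $nx < 0 || $nx >= $cols) {
                        continue; // neighbours outside the image are ignored
                    }
                    $seen++;
                    $dark += $grid[$ny][$nx];
                }
            }
            // The cell adopts the majority value; a tie keeps the old value.
            if ($dark * 2 > $seen) {
                $next[$y][$x] = 1;
            } elseif ($dark * 2 < $seen) {
                $next[$y][$x] = 0;
            }
        }
    }
    return $next;
}
```

An isolated white dot inside a dark module is outvoted by its neighbours and filled in after a single step. Running too many generations would also round off the corners of the modules, so the number of steps is something to experiment with.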
I'm not aware of an implementation in PHP, but there is at least one OpenCV interface for PHP.
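For the histogram-based thresholding mentioned above, one widely used choice is Otsu's method. The answer does not name a specific algorithm, so picking Otsu is an assumption here. This sketch assumes the QR region has already been read into a flat array of gray levels from 0 to 255; the function name is hypothetical.

```php
<?php
// Sketch: Otsu's method picks the threshold that maximises the
// between-class variance of a 256-bin grayscale histogram.
// $grayValues is a flat array of ints in the range 0..255.

function otsuThreshold(array $grayValues): int
{
    $histogram = array_fill(0, 256, 0);
    foreach ($grayValues as $value) {
        $histogram[$value]++;
    }

    $total = count($grayValues);
    $sumAll = 0;
    foreach ($histogram as $level => $count) {
        $sumAll += $level * $count;
    }

    $sumDark = 0;
    $countDark = 0;
    $bestVariance = -1.0;
    $bestThreshold = 0;

    for ($t = 0; $t < 256; $t++) {
        // Pixels with a level <= $t form the "dark" class.
        $countDark += $histogram[$t];
        $sumDark += $t * $histogram[$t];
        $countLight = $total - $countDark;
        if ($countDark === 0 || $countLight === 0) {
            continue; // one class is empty, no variance to measure
        }
        $meanDark = $sumDark / $countDark;
        $meanLight = ($sumAll - $sumDark) / $countLight;
        $variance = $countDark * $countLight * ($meanDark - $meanLight) ** 2;
        if ($variance > $bestVariance) {
            $bestVariance = $variance;
            $bestThreshold = $t;
        }
    }
    return $bestThreshold;
}
```

Pixels at or below the returned level would be treated as dark and everything above it as light, which adapts the cut-off to each scan instead of relying on a fixed brightness/contrast setting.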

Remove distortion from image using php

I am a beginner at PHP image processing. I have an image, and I have to remove the distortion from that image and read the date. How can I do that in PHP?

There simply isn't a one-size-fits-all answer. If you could write some amazingly complicated code that identifies the distortion applied and relates it back, pixel by pixel, to what the original was meant to be, you would make buckets of money getting past all the CAPTCHAs that use distortion for exactly that purpose: to make sure a human is at the wheel, not a bit of code.
In theory, you could fairly easily write a bit of code to apply the opposite of that exact distortion; it looks like nothing more than a repeated drag has been applied to that image. But that will work for that EXACT image only, and will likely just make the next image even more distorted.

How should I generate an isometric image of a Minecraft skin in PHP?

I'm trying to generate 3D isometric views of players' heads, but I'm not sure what kind of support PHP has for this type of operation, or of any external libraries that may be better suited.
Basically I need to take a net like this (here is a diagram showing what each portion is mapped to) and make a 3D head from it. I also need to include the 'head accessory' portions, which should be slightly larger/offset from the actual head.
Does anyone know how I should go about this?

Well, first of all, it will be a complex job in my view.
The http://www.minecraftwiki.net/images/0/01/Skinzones.png file you mentioned is flat, but you have to convert it to an isometric 3D look, so you have to distort the images.
For example look at the images below
So you can see that the 3D box image is created from pieces of other images. The logic is to add perspective to the flat images and join them, but as the result is still 2D we will call it image distortion.
Unfortunately, the GD library that comes bundled with PHP is not advanced enough to let you do such things.
You have to use some other library such as ImageMagick; this link is a tutorial on using its distort functions: http://www.imagemagick.org/Usage/distorts/
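Before any distort call can be made, you need to know where the corners of each flat face should land in the 2D output. The sketch below shows one conventional isometric projection (axes drawn at 30 degrees) and uses it to place the top face of a cube. The function names and the face size of 8 are assumptions for illustration only.

```php
<?php
// Sketch: conventional isometric projection of a 3D point to 2D.
// Screen y grows downwards, so a larger z (height) moves the point up.

function isoProject(float $x, float $y, float $z): array
{
    $angle = deg2rad(30);
    return [
        ($x - $y) * cos($angle),       // screen x
        ($x + $y) * sin($angle) - $z,  // screen y
    ];
}

/**
 * Projected corners of the top face of a cube with the given edge length,
 * in the order: back, right, front, left.
 */
function topFaceCorners(float $size): array
{
    return [
        isoProject(0, 0, $size),
        isoProject($size, 0, $size),
        isoProject($size, $size, $size),
        isoProject(0, $size, $size),
    ];
}
```

Calling topFaceCorners(8) gives four [x, y] pairs. Pairing the corners of the flat face in the skin file with these target points yields the control points that an affine distortion needs; the two visible side faces can be placed the same way by projecting their corners.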
The second big thing is the processing of the images. You can process the images live, but that will consume a lot of resources on the server, so it is suggested that you use pre-processed images and not process them every time.
To generate the isometric image you have to write the code yourself, and it may need alteration for each character image depending on the size of the image. But once you have written the code, it will be easy.
My suggestion is to write your own code once, then alter it for every character, save the processed images in a sprite, and use them when you add play functionality.
Check out this link as well:
http://www.fmwconcepts.com/imagemagick/index.php

given two images, determine whether one is edited from the other (and which is the original)

Suppose there is an image on the web without a watermark, and someone downloads it and makes some edits to it, such as adding a watermark. Is it possible to write a script in PHP to compare these two images? When I submit the two images to the script, it should be able to tell which is the original image and which is the manipulated one.
I read Google's webmaster page, which says:
Google often finds multiple copies of the same image online. We use many different signals to identify the original source of the image
This is the main concern of my question.
One more doubt: are there any metadata tags inside an image? If so, how can I read them, and is it possible to edit them? Is there any (non-visual) information inside an image that cannot be edited?

Anything within the image can be edited (it is, after all, just a collection of bytes), and it's definitely trivial for someone to add a watermark to an image, or simply change the contrast ever-so-slightly, to make it a very different file from the original. There are several other non-destructive changes that would make image files look completely different to a naive comparison algorithm (e.g., scaling, changing filetypes and compression, changing brightness, rotation, etc.).
Advanced image processing algorithms, however, can still often identify similarities between images that have been manipulated in ways like those above. There are many algorithms to do this, and honestly you could spend thousands of hours trying to roll an algorithm like this yourself. These sorts of algorithms are referred to as "content-based image retrieval."
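To give a feel for how such similarity checks can survive small edits, here is a sketch of a very simple perceptual hash (often called an "average hash") plus a Hamming distance. This technique is not named in the answer and is far cruder than a real content-based retrieval engine. The sketch assumes each image has already been shrunk to a small grayscale thumbnail (for example 8x8) and flattened into an array of gray levels; the function names are hypothetical.

```php
<?php
// Sketch: average hash over a flat array of gray levels (e.g. 64 values
// from an 8x8 thumbnail). Each bit records whether a pixel is brighter
// than the thumbnail's mean, so uniform brightness changes cancel out.

function averageHash(array $gray): string
{
    $mean = array_sum($gray) / count($gray);
    $bits = '';
    foreach ($gray as $value) {
        $bits .= $value > $mean ? '1' : '0';
    }
    return $bits;
}

/** Number of positions at which two equal-length bit strings differ. */
function hammingDistance(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Hashes must have the same length.');
    }
    $distance = 0;
    $length = strlen($a);
    for ($i = 0; $i < $length; $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }
    return $distance;
}
```

A small distance suggests the two files show the same picture. It does not say which of the two is the original.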
You might be better off calling into an engine that has already been developed to do exactly this. Here are some possibilities:
TinEye has a RESTful API that you can use, described here.
You could scrape the response from Google's Search by Image results using this technique.
You could use any of the number of suggestions within this slightly older StackOverflow post.
Good luck!

Photos taken by digital cameras usually have exif data embedded.
You can get the data with the exif_read_data function in PHP.
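A minimal sketch of reading a few EXIF fields follows. It assumes the exif extension is enabled; the helper name and the choice of fields are illustrative. Keep in mind that EXIF data can be edited or stripped, so it is a hint at best.

```php
<?php
// Sketch: pull a handful of EXIF fields that hint at an image's origin.
// Returns an empty array if the extension or the data is unavailable.

function readExifSummary(string $path): array
{
    if (!function_exists('exif_read_data') || !is_readable($path)) {
        return [];
    }

    // exif_read_data() emits a warning and returns false for files
    // without readable EXIF data, hence the suppression and the check.
    $data = @exif_read_data($path);
    if ($data === false) {
        return [];
    }

    $summary = [];
    foreach (['Make', 'Model', 'Software', 'DateTimeOriginal'] as $key) {
        if (isset($data[$key])) {
            $summary[$key] = $data[$key];
        }
    }
    return $summary;
}
```

A Software entry naming an image editor, or camera fields that are missing from one copy but present in the other, may suggest which copy was processed later.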
As for identifying similar images, here are some useful resources:
TinEye
SO Q on image similarity
The comments on Resig's article

You could submit both images to ImageEdited and see which one has been edited. Even if the exif data's missing, it tells when an image has been created with a program.

How to avoid Optimizing images that are already optimized with PHP?

I am currently working on a PHP application which is run from the command line to optimize a folder of images.
The PHP application is more of a wrapper for other image optimizers: it simply iterates over the directory, grabs all the images, and runs each image through the appropriate program to get the best result.
Below are the Programs that I will be using and what each will be used for...
imagemagick to determine file type and convert non-animated gif's to png
gifsicle to optimize Animated Gif images
jpegtran to optimize jpg images
pngcrush to optimize png images
pngquant to optimize png images to png8 format
pngout to optimize png images to png8 format
My problem: with 1-10 images, everything runs smoothly and fairly fast; however, once I run it on a larger folder with 10 or more images, it becomes really slow. I do not really see a good solution to this, but one thing that would help is to avoid re-processing images that have already been optimized. Say I have a folder with 100 images, I optimize that folder, then add 5 new images and re-run the optimizer. It then has to optimize 105 images; my goal is to have it optimize only the 5 newer images, since the previous 100 have already been optimized. This alone would greatly improve performance when new images are added to the image folder.
I realize the simple solution would be to copy or move the images to a new folder after processing them. My problem with that simple solution is that these images are used on websites, so the image paths are generally hard-coded into a website's source code, and changing the paths would complicate that and possibly break it.
Some ideas I have had: first, write some kind of text-file database to the image folders that lists all the images that have already been processed, so when the application is run, it only processes images that are not in that file already. Second, change the file name to include some kind of marker showing that the file has been optimized. Third, move each optimized file to a final destination folder once it is optimized. Ideas 2 and 3 are not good, though, because they will break all the image paths in the websites' source code.
So if you can think of a decent solution to this problem, please share.

Meta data
You could put a flag in the metadata of each image after it is optimized. First check for that flag and only proceed if it's not there. You can use exif_read_data() to read the data. Writing it may be possible like this.
The above is for JPGs. Metadata for PNGs is also possible; take a look at this question, and this one.
I'm not sure about GIFs, but you could definitely convert them to PNGs and then add metadata, although I'm pretty sure they have their own metadata, since metadata extraction tools support GIFs.
Database Support
Another solution would be to store information about the images in a MySQL database. This way, as you tweak your optimizations you could keep track of when and which optimization was tried on which image. You could pick which images to optimize according to any parameters of your choosing. You could build an admin panel for this. This method would allow easy experimentation.
You could also combine the above two methods.
Maximum File Size
Since this is for saving space, you could have the program only work on images that are larger than a certain file size. Ideally, after running the compressor once, all the images would be below this file size, and after that only newly added images that are too big would be touched. I don't know how practical this is in terms of implementation, since it would require that the compressor gets every image below some arbitrary file size. You could make the maximum file size dependent on the image dimensions.

The easiest way would most likely be to look at the time of the last change for each image. If an image was changed after the last run of your script, you have to run it on this particular image.
The timestamp of when the script was last run could easily be saved in a short text file.
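A minimal sketch of this timestamp approach might look like the following. The function names, the stamp file name and the list of extensions are all assumptions for illustration.

```php
<?php
// Sketch: find images modified after the last recorded optimizer run.

/** Paths of image files in $dir whose mtime is later than $lastRun. */
function imagesChangedSince(string $dir, int $lastRun): array
{
    $changed = [];
    foreach (scandir($dir) as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if (!is_file($path) || !in_array($ext, ['jpg', 'jpeg', 'png', 'gif'], true)) {
            continue; // skip directories and non-image files
        }
        if (filemtime($path) > $lastRun) {
            $changed[] = $path;
        }
    }
    return $changed;
}

/** Timestamp of the previous run, or 0 if no stamp file exists yet. */
function readLastRun(string $stampFile): int
{
    return is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;
}

/** Record the given timestamp as the time of the latest run. */
function writeLastRun(string $stampFile, int $time): void
{
    file_put_contents($stampFile, (string) $time);
}
```

Because optimizing a file changes its modification time, the stamp should be written with time() only after the whole run has finished; otherwise every optimized image would look "changed" on the next run. A file added during the same second the stamp is written could be missed, which is a limitation of this simple comparison.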

A thought that comes to mind is to mix the simple solution with a more complicated one. When you optimize an image, move it to a separate folder. When a request is made into the original image folder, have your .htaccess file capture it and route it to a script that checks whether the same image exists in the optimized folder; if not, optimize, move, then proceed.
I know I said simple solution, and this is a slightly complicated one, but the nice part is that it provides a scalable approach to your issue.
Edit: One more thing
I like the idea of a MySQL database because you can add a level of security (not all images can be viewed by everyone), if that's a need of course. But it also makes your link problem (the hard-coded paths) less of a problem, since all links point to a single file that retrieves the images from the DB, and the only things that change are the generated GET variables. This way your project becomes significantly more scalable, and a design change becomes easier.

Sorry this is late, but there is a way to address this issue without creating any files, storing any data of any kind, or keeping track of anything, so I thought I'd share how I address things like this.
Goal
Set up an idempotent solution that efficiently optimizes images without dependencies that require keeping track of its current status.
Why
This allows for a truly portable solution that can work in a new environment, an environment that somehow lost its tracker, or an environment that is sensitive as to what files you can actually save in there.
Diagnose
Although metadata might be the first source you'd think to check for this information, in some cases it will not be available, and the nature of metadata itself is arbitrary: like comments, it can come and go without affecting the image in any way. We want something more concrete, something that is a definite descriptor of the asset at hand. Ideally you would want to identify whether an image has been optimized or not, and the way to do that is to inspect the image and decide based on its characteristics.
Strategy
When you optimize an image, you are providing different options of all sorts in order to reach the final state of optimization. These are the very traits you will also check to come to the conclusion of whether or not it had been in fact optimized.
Example
Let's say we have a function in our script called optimize(path = ''), and let's assume that part of our optimization does the following:
$ convert /path/to/image.jpg -depth 8 -quality 87 -colors 255 -colorspace sRGB ...
Note that these options are ones that you choose to specify, they will be applied to the image and are properties that can be reviewed later...
$ identify -verbose /path/to/image.jpg
Image: /path/to/image.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Mime type: image/jpeg
  Geometry: 1250x703+0+0
  Colorspace: sRGB <<<<<<
  Depth: 8-bit <<<<<<
  Channel depth:
    Red: 8-bit
    Green: 8-bit
    Blue: 8-bit
  Channel statistics:
    Pixels: 878750
    Red:
      ...
    Green:
      ...
    Blue:
      ...
  Image statistics:
    Overall:
      ...
  Rendering intent: Perceptual
  Gamma: 0.454545
  Transparent color: none
  Interlace: JPEG
  Compose: Over
  Page geometry: 1250x703+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  Quality: 87 <<<<<<
  Properties:
    ...
  Artifacts:
    ...
  Number pixels: 878750
As you can see here, the output quite literally has everything I would want to know to determine whether or not I should optimize this image, and it costs nothing in terms of a performance hit.
Conclusion
When you are iterating through a list of files in a folder, you can do so as many times as you like without worrying about over-optimizing the images or keeping track of anything. You would simply filter the file list down to the extensions you want to optimize (e.g. .bmp, .jpg, .png), then check each file's stats to see whether it already possesses the attributes your function would apply. If it has the same values, skip it; if not, optimize it.
Advanced
If you want to get extremely efficient, you would check each attribute of the image that you plan on optimizing, and when you run the optimization you would pass only the options that have not already been applied to the image.
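The check described above can be sketched in PHP by parsing the text that identify -verbose prints and comparing it with the values your optimizer would set. The function names and the three tracked attributes are assumptions for illustration; obtaining the text (for example via shell_exec) is left out.

```php
<?php
// Sketch: decide which convert options an image still needs, based on
// the text output of "identify -verbose".

/** Extract Colorspace, Depth and Quality from identify's output. */
function parseIdentify(string $output): array
{
    $found = [];
    foreach (preg_split('/\R/', $output) as $line) {
        if (!preg_match('/^\s*(Colorspace|Depth|Quality):\s*(\S+)/', $line, $m)) {
            continue;
        }
        if (isset($found[$m[1]])) {
            continue; // keep the first occurrence only
        }
        // "8-bit" becomes "8" so it can be compared with the target value.
        $found[$m[1]] = $m[1] === 'Depth' ? (string) (int) $m[2] : $m[2];
    }
    return $found;
}

/** Convert options whose target value the image does not already have. */
function missingOptions(array $current, array $target): array
{
    $flags = [
        'Colorspace' => '-colorspace',
        'Depth'      => '-depth',
        'Quality'    => '-quality',
    ];
    $todo = [];
    foreach ($target as $key => $wanted) {
        if (($current[$key] ?? null) !== (string) $wanted) {
            $todo[] = $flags[$key] . ' ' . $wanted;
        }
    }
    return $todo;
}
```

An empty result means the image can be skipped; otherwise the returned options are the only ones that need to be passed to convert. For JPEGs, the Quality value that identify reports is an estimate derived from the file, so treating it as an exact match is itself an assumption.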
Note
This technique is obviously meant as an example of how you can accurately determine whether or not an image needs to be optimized. The options I have listed above are not the complete set that can be chosen. There are a variety of available options to choose from, and you can apply and check for as many as you want.
