ImageMagick with PHP text overflowing PDF to JPG conversion

ImageMagick with PHP text overflowing PDF to JPG conversion - php

I'm trying now to convert a PDF file to JPG, using ImageMagick with PHP and CakePHP. The PDF is in perfect shape and it's right the way it should be, but the image generated from the PDF is always overflowing the borders of the file.
Until now, I've tried tweaking the code for the generation with no sucess, reading a lot from the PHP docs (http://php.net/manual/pt_BR/book.imagick.php).
Here are the convertion code:
$image = new Imagick();
$image->setResolution(300,300);
$image->setBackgroundColor('white');
$image->readImage($workfile);
$image->setGravity(Imagick::GRAVITY_CENTER);
$image->setOption('pdf:fit-to-page',true);
$image->setImageFormat('jpeg');
$image->setImageCompression(imagick::COMPRESSION_JPEG);
$image->setImageCompressionQuality(60);
$image->scaleImage(1200,1200, true);
$image->mergeImageLayers(Imagick::LAYERMETHOD_FLATTEN);
$image->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE);
$image->writeImage(WWW_ROOT . 'files' . DS . 'Snapshots' . DS . $filename);
Here are the results:
https://imgur.com/a/ISBmDMv
The first image is the PDF before the conversion and the second one, the image generated from the PDF where the right side text overflows.
So, why this is happening? And if someone got some alternative for any tech used (the GhostScript, ImageMagick, etc) is also welcome!
Thanks everyone!

Its very hard to say why you see the result you do, without seeing the original PDF file, rather than a picture of it.
The most likely explanation is that your original PDF file uses a font, but does not embed that font in the PDF. When Ghostscript comes to render it to an image it must then substitute 'something' in place of the missing font. If the metrics (eg spacing) of the substituted font do not match precisely the metrics of the missing font, then the rendered text will be misplaced/incorrectly sized. Of course since its not using the same font it also won't match the shapes of the characters either.
This can result in several different kinds of problems, but what you show is pretty typical of one such class of problem. Although you haven't mentioned it, I can also see several places in the document where text overwrites as well, which is another symptom of exactly the same problem.
If this is the case then the Ghostscript back channel transcript will have told you that it was unable to find a font and is substituting a named font for the missing one. I can't tell you if Imagemagick stores that anywhere, my guess would be it doesn't. However you can copy the command line from the ImagMagick profile.xml file and then use that to run Ghostscript yourself, and then you will be able to see if that's what is happening.
If this is what is happening then you must either;
Create your PDF file with the fonts embedded (this is good practice anyway)
Supply Ghostscript with a copy of the missing font as a substitute
Live with the text as it is

Related

PHP - Check if pdf contains given text - TcpdfFpdi / pdftk / fpdi

I have a pdf document and I want to check if a specific text occurs (which are tags that I put in while generating the pdf) in the document, however using these libraries (tcpdfFpdi, pdftk or fdpi) I couldn't figure out if it's possible or how to do it.
$str = "{hello}";
$pdf = new TcpdfFpdi();
$pdf->setSourceFile($filePath);
$pdf->searchForText($str); // something like this which returns boolean
If I try without any library to dd(file_get_contents($filePath)), it returns a very long output and doesn't seem to contain the file I want so I think it's better to use one of those libraries.

Just an idea…
It's no actual PHP solution but you could use tools like pdftotext which I know from this post (where a PDF file is converted into a string to count its words): https://superuser.com/a/221367/535203
You can install it and play around with that command and call it from within your PHP application.
As far as I remember (long time ago since I used pdftotext) the output text is not exaclty the PDF's content but to search a few tags in it it's at least a good try.

Can't use custom fonts to PHP Imagick with Pango / another solution for arabic ligatures

We have an Image processing microservice that created rich images with text on top of it and we are in the process of adding arabic locale to our website
While translating some of the content to Arabic our translator told us that the text in generated images not rendering correctly in arabic - they appear with no ligatures, so every character is separated.
We currently use ImageMagick (6.8.9-9) with PHP and the text generation used ImagickDraw->annotateImage that worked fine until we hit that ligatures problem.
I googled a little and found "pango" that solves that problem and also allows some sort of "html" syntax to define multiple settings for one text line which is pretty cool.
Problem is that I didn't find a way to use custom font files
This is the code:
$img = new \Imagick;
$img->newImage($width, $height, new ImagickPixel('transparent'));
$img->setBackgroundColor(new ImagickPixel('transparent'));
$img->setFont($fontFile); // didn't work
$img->setPointSize($fontSize);
$img->setOption("font", $fontFile); // didn't work
$img->newPseudoImage($width, $height, "pango:<span font='".$fontName."' foreground='".$textColor."'>".$text."</span>"); // font='".$fontName."' - also didn't work
I also tried to install the font in ubuntu OS and use only the font name but that didn't worked either, as well as combining all the options above or some of them.
I found that question:
PHP Imagick don't use custom font > convert does. But he said he used "caption" instead of pango - which doesn't provide that cool "html" syntax and does not solve the Arabic issue.
Please - if you know any solution I will love you forever!
Thanks.

Metadata extraction from PNG images

How to extract metadata from a image like this website? I have used exev2 library but it gives only limited data as compared to this website. Is there some more advanced library?
I have already tried hacoir-metadata Python library.
Also how does Windows extract details of image (the one we see from properties)?

PNG files are made up of blocks, most of which are IDAT blocks which contain compressed pixel data in an average PNG. All PNG's start with a IHDR block and end with an IEND block. Since PNG is a very flexible standard in this way, it can be extended by making up new types of blocks--this is how animated Animated PNG works. All browsers can see the first frame, but browsers which understand the types of blocks used in APNG can see the animation.
There are many places that text data can live in a PNG image, and even more places metadata can live. Here is a very convenient summary. You mentioned the "Description tag", which can only live in text blocks, so that it was I'll be focusing on.
The PNG standard contains three different types of text blocks: tEXt (Latin-1 encoded, uncompressed), zTXt (compressed, also Latin-1), and finally iTXt, which is the most useful of all three as it can contain UTF-8 encoded text and can either be compressed or decompressed.
So, your question becomes, "what is a convenient way to extract the text blocks?"
At first, I thought pypng could do this, but it cannot:
tEXt/zTXt/iTXt
Ignored when reading. Not generated.
Luckily, Pillow has support for this - humorously it was added only one day before you asked your original question!
So, without further ado, let's find an image containing an iTXt chunk: this example ought to do.
>>> from PIL import Image
>>> im = Image.open('/tmp/itxt.png')
>>> im.info
{'interlace': 1, 'gamma': 0.45455, 'dpi': (72, 72), 'Title': 'PNG', 'Author': 'La plume de ma tante'}
According to the source code, tEXt and zTXt are also covered.
For the more general case, looking over the other readers, the JPEG and GIF ones also seem to have good coverage of those formats as well - so I would recommend PIL for this. That's not to say that the maintainers of hacoir-metadata wouldn't appreciate a pull request adding text block support though! :-)

I found this code buried in a Pillow pull request
from PIL import PngImagePlugin
info = PngImagePlugin.PngInfo() # read PNG data
info.add_text("foo", "bar") # write PNG data
img.save(filenew, "png", pnginfo=info)

You can try this pre-alpha solution by Daniel Chesterton. I am not sure is it just what you want or is it a part of the wanted solution, but I believe you can sort it out by playing with it.
https://github.com/dchesterton/image

Add round corners to a jpeg file

I am trying to add round corners to a jpeg file, but the problem is that after adding round corners, I am getting a black background color. Somehow I am not able to change it to any other color (white, transparent, red). It just simply shows black background where the image has rounded corners.
The code that I am using is:
<?php
$image = new Imagick('example.jpg');
$image->setBackgroundColor("red");
$image->setImageFormat("jpg");
$image->roundCorners(575,575);
$image->writeImage("rounded.jpg");
header('Content-type: image/jpeg');
echo $image;
?>
I cannot use png as the jpeg files are huge, about 5 MB, so if I used png, the file size would go up to 26 MB, even though the png adds transparent round corners.
Also the IMagick version that i am using is:
ImageMagick 6.6.2-10 2010-06-29 Q16 http://www.imagemagick.org
Also the output(image generated) will get printed so I don't know if css will work over here.
Sorry, I am trying to actually create a new jpeg file with rounded corners from an already existing jpeg file that doesn't have round corners this is actually a photograph taken from a camera, so there are multiple/too many colors so I can't use gif as well.
Also my site will only just generate the round corner image then afterwards it will get downloaded using a FTP program by the admin of the site and then using a system software will get printed, so in short my website will not be printing the image but rather just generate it

Try this:
<?php
$input = 'example.jpg';
$size = getimagesize($input);
$background = new Imagick();
$background->newImage($size[0], $size[1], new ImagickPixel('red'));
$image = new Imagick($input);
$image->setImageFormat("png");
$image->roundCorners(575,575);
$image->compositeImage($background, imagick::COMPOSITE_DSTATOP, 0, 0);
$image->writeImage("rounded.jpg");
?>

I may get downvoted, but I say let css deal with the corners and take some load off of your server :)
CSS rounded corners.

JPG doesn't have a transparent color(s) (alpha channels) in its palette.
The output image must use either PNG or GIF (or another image format that supports alpha channels).
setImageBackgroundColor is another option if you want an opaque background.
EDIT
Your comment reminds me that you could try to use the command line; shell_exec() will run a command line argument from PHP. The command in the ImageMagick API you'll need to start with is convert example.jpg, and then you can pass flags with the various parameters you want.
Since ImageMagick is already installed, it will work right away. You may need to point your system PATH to the ImageMagick directory where all of the executables are.
There's plenty of questions and forums dedicated to rounded corners with this method so I'll leave that up to you.
Here's a helpful tip though - there is a silly confusion with the convert command, since Windows also has a convert.exe that is rarely used, but will confuse your command line, so make sure you're calling the right convert. ;) To test if it's working, try convert example.jpg example.gif (which should convert your example to a gif).
To get output from your command line, finish all commands with 2>&1 which will pipe cmd output back into PHP.

ImageMagick failing to convert to JPG

We recently installed the latest version of ImageMagick onto our Linux server. I seem to be having issues performing the most basic of tasks.
I am running this command line:
/usr/bin/convert /location/to/source/design.ai /location/to/save/output.jpg
Unfortunatly is saves design.jpg as an illustrator file (if I rename the file to output.ai it opens). Even if I do this:
/usr/bin/convert /location/to/source/design.ai -rotate 90 /location/to/save/design.jpg
It rotates the file and saves again as an illustrator document. This happens with all filetypes (e.g. png, bmp, etc...)
It appears ImageMagick cannot figure out what I want it converted to and just saves as the same file type.
Any ideas on fixing this?
Regards:
John

(Yes, McKay is properly right. This question would be better placed at serverfault.)
But I have an idea. By doing 'convert' only one gets a hint at the bottom:
To specify a particular image format, precede the filename
with an image format name and a colon (i.e. ps:image) or specify the
image type as the filename suffix (i.e. image.ps).
Perhaps convert gets confused by the path given.
So you could try this:
convert /location/to/source/design.ai output.jpg
or
convert /location/to/source/design.ai jpg:/location/to/save/output.jpg
Regards
Sigersted

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.