I'm using Concrete5, and I'm trying to display thumbnails for various uploaded files. While some of these might be images, the majority are PDFs.
I'm currently using:
<?php
$file = File::getByID($fID);
$imageHelper = Core::make('helper/image');
try {
$imageHelper->outputThumbnail($file, 200, 200);
} catch(InvalidArgumentException $e) { ?>
<img src='https://placehold.it/200x200'>
<?php } ?>
I'd much prefer to somehow create a smaller thumbnail of PDF files, for example by using Ghostscript in the background. In the built-in file manager, at least a PDF icon is displayed. That would not be ideal, but it is still better than displaying nothing to signify that we're dealing with a PDF.
How can I access the built-in thumbnails? And, more importantly, how can I properly overwrite them for certain file-types when they are uploaded?
EDIT:
I came across $file->getThumbnailURL('type'); and created a type for my own purposes. How would you automatically generate such a thumbnail when a file is uploaded? I can likely figure out how to generate the file with plain PHP, but storing it in Concrete5 is something I'm unsure about.
In the end, here's how I did it.
I started off by creating a new thumbnail type in the configure method of my package's controller, as follows:
use Concrete\Core\File\Image\Thumbnail\Type\Type;
...
public function configure($pkg) {
...
$thumbnailType = new Type();
$thumbnailType->setName(tc('ThumbnailTypeName', 'PDF Thumbnails'));
$thumbnailType->setHandle('pdfthumbnails');
$thumbnailType->setWidth(200);
$thumbnailType->setHeight(200);
$thumbnailType->save();
}
Then I created a class mypackage/src/document_processing/pdfthumbnails.php with the following contents:
namespace Concrete\Package\Mypackage\Src\DocumentProcessing;
use Core;
use File;
use Concrete\Core\File\Image\Thumbnail\Type\Type;
class Pdfthumbnails {
public function processPDFThumbnails($fv) {
$fi = Core::make('helper/file');
$fvObj = $fv->getFileVersionObject();
$ext = $fi->getExtension($fvObj->getFilename());
$file = $fvObj->getFile();
if ($ext == 'pdf') {
$type = Type::getByHandle('pdfthumbnails');
$basetype = $type->getBaseVersion();
$thumbpath = $basetype->getFilePath($fvObj);
$fsl = $file->getFileStorageLocationObject()->getFileSystemObject();
$fre = $fvObj->getFileResource();
// this requires sufficient permissions..
// depending on your setup, reconsider 0777
$target = 'application/files'.$thumbpath;
if (!is_dir(dirname($target))) {
mkdir(dirname($target), 0777, true);
}
// quote each complete path once so spaces and shell metacharacters are safe
exec('gs -o '.escapeshellarg($target).' -dPDFFitPage -sDEVICE=png16m -g200x200 -dLastPage=1 -f '.escapeshellarg('application/files/'.$fre->getPath()));
}
}
}
And then I hooked into the on_file_version_add event in my package's controller:
use Concrete\Package\Mypackage\Src\DocumentProcessing\Pdfthumbnails;
...
public function on_start() {
Events::addListener('on_file_version_add', array(new Pdfthumbnails(), 'processPDFThumbnails'));
}
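The Ghostscript call in the listener above can be isolated into a small function so that the command string is easy to inspect and test. This is only a sketch: the function name buildGsThumbnailCommand and its parameters are hypothetical and not part of Concrete5; the gs flags are the ones already used in the answer.

```php
<?php
// Hypothetical helper (not a Concrete5 API): builds the Ghostscript
// command that renders the first page of a PDF into a square PNG.
function buildGsThumbnailCommand(string $pdfPath, string $pngPath, int $size = 200): string
{
    if ($size < 1) {
        throw new InvalidArgumentException('Thumbnail size must be positive.');
    }
    // Each path is quoted as a whole, so spaces and shell
    // metacharacters in file names cannot break the command.
    return sprintf(
        'gs -o %s -dPDFFitPage -sDEVICE=png16m -g%dx%d -dLastPage=1 -f %s',
        escapeshellarg($pngPath),
        $size,
        $size,
        escapeshellarg($pdfPath)
    );
}

// Possible usage (assumes the gs binary is installed and on the PATH):
// exec(buildGsThumbnailCommand('in.pdf', 'out.png'), $output, $exitCode);
// if ($exitCode !== 0) { /* fall back to a generic PDF icon */ }
```

Checking the exit code gives the listener a chance to fall back to a placeholder when Ghostscript is missing or the PDF is damaged.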
This appears to be possible inside C5 after all, using file inspectors:
Any time a file is imported into Concrete5 (which happens through an instance of the File Importer class) it may be run through an optional file Inspector, which is a PHP class that can perform additional operations on files of a certain type when they're uploaded or rescanned
More information and implementation examples on file inspectors can be found in the C5 documentation.
In this Concrete5 forum discussion, someone seems to have used this feature to build exactly what you want to build, a thumbnail generator for PDFs using ImageMagick.
That user's example code does two things. First, it registers a new custom file inspector with the running C5 instance. Then, your custom inspector library is added to the project.
I have a series of PDF files on my shared hosting web server, and I'm writing a PHP script to catalogue them on screen. I've added metadata to the PDF files: Document Title, Author and Subject. The filename is composed of the Author and Title, so I can construct the catalogue text from that. However, I want to display the contents of the 'Subject' metadata field as well.
Because I'm using shared hosting, I cannot install any extra PHP extensions. They have the free version of PDFLib but this doesn't include any functions to load the PDF file or to extract metadata.
This is the script so far which just displays a list of the filenames...
function catalogue($folder){
$files = preg_grep('/^([^.])/', scandir($folder));
foreach($files as $file){
echo($file.'<br/>');
}
}
So, I've not made much progress :(
I've tried PDF_open_pdi_document() but this is not part of the installed PDFLib extension. I've tried PDF_pcos_get_string() but all I get with...
PDF_pcos_get_string($file,0,'author');
...is...
pdf_pcos_get_string(): supplied resource is not a valid pdf object resource
...and I can find literally ZERO help on the web for this function. Literally nothing!
I am running PHP 7.4 on the shared hosting.
Metadata aren't encrypted like the PDF, so you can use file_get_contents, find the pattern for the subject (<</Subject) and extract it using either a regex or a simple combination of strpos/substr.
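A minimal sketch of that idea, assuming the PDF is unencrypted and stores its Info dictionary as plain literal strings such as /Subject (some text). The function name extractPdfInfoField is hypothetical. It will not find hex strings, values inside compressed object streams, or strings containing unescaped nested parentheses.

```php
<?php
// Hypothetical helper: pulls one literal-string entry (e.g. Subject)
// out of the raw bytes of a PDF. Returns null when nothing matches.
function extractPdfInfoField(string $pdfBytes, string $field): ?string
{
    // Matches "/Field (value)" where value may contain escaped
    // characters such as \( and \) but no unescaped parentheses.
    $pattern = '/\/' . preg_quote($field, '/') . '\s*\(((?:\\\\.|[^\\\\()])*)\)/s';
    if (preg_match($pattern, $pdfBytes, $m) !== 1) {
        return null;
    }
    // Undo the backslash escapes for parentheses and backslashes.
    return preg_replace('/\\\\([()\\\\])/', '$1', $m[1]);
}

// Possible usage:
// $subject = extractPdfInfoField(file_get_contents('article.pdf'), 'Subject');
```

Looking fields up by name avoids relying on their position in the file, which can differ between PDF producers.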
Thank you #drdlp. I've used file_get_contents() to load in the PDF and extract and display the metadata.
function catalogue($folder){
$files = preg_grep('/^([^.])/', scandir($folder));
foreach($files as $file){
$page = file_get_contents($folder.'/'.$file);
$metadata = preg_match_all('/\/[^\(]*\(([^\/\)]*)/',$page,$matches);
$author = $matches[1][0];
$subject = $matches[1][4];
$title = $matches[1][5];
echo($title.'/'.$subject.'/'.$author.'<br>');
}
}
However, this is very slow for 40 odd PDF articles in a folder.
How can I speed this up?
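One possible approach, sketched here under the assumption that the PDFs change rarely, is to parse each file once and keep the result in a small JSON cache keyed by file name and modification time. The names cachedCatalogue and $extract are hypothetical; $extract stands for whatever function reads the metadata from a single file.

```php
<?php
// Hypothetical helper: returns metadata for every PDF in $folder,
// calling $extract only for files that are new or have changed.
function cachedCatalogue(string $folder, string $cacheFile, callable $extract): array
{
    $cache = is_file($cacheFile)
        ? (json_decode(file_get_contents($cacheFile), true) ?: [])
        : [];
    $entries = [];
    foreach (glob($folder . '/*.pdf') ?: [] as $path) {
        $name = basename($path);
        $mtime = filemtime($path);
        // Re-parse only when there is no entry or the file was modified.
        if (!isset($cache[$name]) || $cache[$name]['mtime'] !== $mtime) {
            $cache[$name] = ['mtime' => $mtime, 'meta' => $extract($path)];
        }
        $entries[$name] = $cache[$name];
    }
    // Entries for deleted files are dropped because only $entries is saved.
    file_put_contents($cacheFile, json_encode($entries));
    return array_map(function ($entry) {
        return $entry['meta'];
    }, $entries);
}
```

With this in place only the first request after an upload pays the cost of reading the whole PDF.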
I've begun experimenting with pdf.js for which I can load all the basic details from files first (filename etc) and then update them with Javascript after the page has loaded.
However, I clearly don't know enough about Javascript to make this work. This is what I have so far and I am very stuck. I've imported pdf.js from mozilla.github.io/pdf.js/build/pdf.js...
function pdf_metadata(file_url,id){
var pdfjsLib = window['pdfjs-dist/build/pdf'];
pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';
var loadingTask = pdfjsLib.getDocument(file_url);
loadingTask.promise.then(function(pdf) {
pdf.getMetadata().then(function(details) {
console.log(details);
document.getElementById(id).innerHTML=details;
}).catch(function(err) {
console.log('Error getting meta data');
console.log(err);
});
});
}
The line console.log(details); outputs an object to the console. From there I have no idea how to extract any data at all. Therefore document.getElementById(id).innerHTML=details; displays nothing.
This is the object which is output to the console.
We have a project where we merge different pdfs to create a catalog.
Right now it's running on myokyawhtun/pdfmerger, which runs fine, but it does not keep links set in acrobat.
We have tried the different libraries we found. They have to be pure PHP: on this webspace we cannot install applications or call them from the command line via shell_exec or similar, so Ghostscript (gs) is not an option. Even if we just import the PDF files via FPDI and resave them, the hyperlinks get lost.
Is there any (pure PHP) library out there which can retain links inside the files? Or are there some special settings that we missed?
We have tried:
setasign/fpdi
iio/libmergepdf
jurosh/pdf-merge
Example code for the current lib (myokyawhtun/pdfmerger):
require('vendor/myokyawhtun/pdfmerger/tcpdf/tcpdf.php');
require('vendor/myokyawhtun/pdfmerger/tcpdf/tcpdi.php');
require('vendor/myokyawhtun/pdfmerger/PDFMerger.php');
$pdf = new \PDFMerger\PDFMerger;
foreach($sourcePdfs as $file)
{
$pdf->addPDF($pdfDir.'/source/'.$file);
}
$pdf->merge('download', 'Download.pdf');
All the mentioned libraries use FPDI under the hood, which simply does not support content outside of a page's content stream, such as links or any other annotation type.
We (the authors of FPDI) also offer non-free products which work on another level and which allow you to keep all annotations, including links and forms, when you concatenate the documents. This is possible with the SetaPDF-Merger component:
$merger = new SetaPDF_Merger();
foreach($sourcePdfs as $file) {
$merger->addFile($pdfDir . '/source/' . $file);
}
$merger->merge();
$document = $merger->getDocument();
$document->setWriter(new SetaPDF_Core_Writer_Http('Download.pdf'));
$document->save()->finish();
I'm using a generic PHP-based CMS, and I wanted to create a script that reads a PDF, creates a thumbnail and caches it. There were lots of different answers, and I had a fair few problems with different versions of Imagick, but this is the script that worked for me.
Some people might find it useful, and maybe someone could advise me on whether it is optimised?
<?php
$loc = *the file location*;
$pdf = *the file name*;
$format = "jpg";
$dest = "$loc$pdf.$format";
if (file_exists($dest))
{
$im = new imagick();
$im->readImage($dest);
header( "Content-Type: image/jpeg" );
echo $im;
exit;
}
else
{
$im = new imagick($loc.$pdf.'[0]');
$im->setImageFormat($format);
$width = $im->getImageheight();
$im->cropImage($width, $width, 0, 0);
$im->scaleImage(110, 167, true);
$im->writeImage($dest);
header( "Content-Type: image/jpeg" );
echo $im;
exit;
}
?>
Leverage PHP and ImageMagick to create PDF thumbnails
http://stormwarestudios.com/articles/leverage-php-imagemagick-create-pdf-thumbnails/
In this article, we discuss using PHP and ImageMagick to generate thumbnails from a given PDF, storing them in a temporary (or “cache”) directory, and serving them up to the web.
One of our more recent clients made a request to display PDF thumbnails published through the Joomla CMS that we’d deployed for them.
The requirement was fairly simple, but the execution was a little more involved. After installing ImageMagick, ImageMagick PHP bindings (which incidentally aren’t working, and a workaround was devised), and sleuthing some code, the following solution was determined:
<?php
function thumbPdf($pdf, $width)
{
try
{
$tmp = 'tmp';
$format = "png";
$source = $pdf.'[0]';
$dest = "$tmp/$pdf.$format";
if (!file_exists($dest))
{
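// quote every argument so request values cannot inject shell commands
$exec = "convert -scale ".escapeshellarg($width)." ".escapeshellarg($source)." ".escapeshellarg($dest);
exec($exec);
}
$im = new Imagick($dest);
// send a full MIME type (image/png), not just the format name
header("Content-Type: image/$format");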
echo $im;
}
catch(Exception $e)
{
echo $e->getMessage();
}
}
$file = $_GET['pdf'];
$size = $_GET['size'];
if ($file && $size)
{
thumbPdf($file, $size);
}
?>
The above code assumes that you’ve provided appropriate permissions to the temporary directory (usually chmod 755 or chmod 777, depending on your level of courage), that you’ve saved the above code snippet in a file called thumbPdf.php, and placed this somewhere visible on your web server.
After obtaining parameters from GET, the code checks the destination temporary directory, and if the desired image is not present, it uses ImageMagick’s convert program to generate the PDF thumbnail, sized down to the appropriate proportion, and saves the image in the temporary directory. Finally, it reloads the thumbnail into an ImageMagick PHP object, and outputs the content to the browser.
Invoking the above code is done fairly easily; simply call the PHP script from inside an image tag, like so:
<img src="/path/to/thumbPdf.php?pdf=your.pdf&size=200" />
The above code would generate a thumbnail from the first page of “your.pdf”, sized 200 pixels wide by an appropriately-proportioned height.
Good luck, and happy webmastering!
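Because the pdf and size values in the script above come straight from the query string, it may be worth validating them before building the convert command. The following is a sketch only: buildConvertCommand, $pdfDir and $cacheDir are hypothetical names, and it assumes the PDFs live in one known directory on the server.

```php
<?php
// Hypothetical helper: validates the request values and returns the
// ImageMagick command line for rendering page 1 of the PDF as a PNG.
function buildConvertCommand(string $pdfName, string $width, string $pdfDir, string $cacheDir): string
{
    // Accept only a plain integer between 1 and 9999 for the width.
    if (preg_match('/^[1-9][0-9]{0,3}$/', $width) !== 1) {
        throw new InvalidArgumentException('Invalid size.');
    }
    // basename() strips directory parts, so "../" cannot leave $pdfDir.
    $name = basename($pdfName);
    if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'pdf') {
        throw new InvalidArgumentException('Only PDF files are allowed.');
    }
    $source = $pdfDir . '/' . $name . '[0]';   // [0] selects the first page
    $dest = $cacheDir . '/' . $name . '.png';
    return sprintf(
        'convert -scale %d %s %s',
        (int) $width,
        escapeshellarg($source),
        escapeshellarg($dest)
    );
}

// Possible usage inside thumbPdf.php (directory names are assumptions):
// exec(buildConvertCommand($_GET['pdf'], $_GET['size'], __DIR__ . '/pdfs', __DIR__ . '/tmp'));
```

Rejecting bad input with an exception keeps the exec call from ever seeing a value that was not checked.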
I know it's been discussed here:
Should I use a PHP extension for ImageMagick or just use PHP's Exec() function to run the terminal commands?
And to quote drew101:
You would benefit a lot using the PHP extensions instead of using exec
or similar functions. Built in extensions will be faster and use less
memory as you will not have to spawn new processes and read the output
back. The image objects will be directly available in PHP instead of
having to read file output, which should make the images easier to
work with.
If you have a busy site, creating lots of processes to edit images may
start to slow things down and consume additional memory.
If you have not installed the Imagick PHP extension for some reason, you can use Ghostscript to generate a thumbnail of a PDF, as in the example below:
exec('gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=jpeg -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -sOutputFile=xyz.jpg xyz.pdf');
After searching Google and SO, I found this little bit of code for creating thumbnails of PDF documents using ImageMagick.
The trouble for me is in implementing it into my WordPress theme. I think that I'm getting stuck on the path to cache that the script needs for temporary files.
I'm using it as described in the article:
<img src="http://localhost/multi/wp-content/themes/WPalchemy-theme/thumbPdf.php?pdf=http://localhost/multi/wp-content/uploads/2012/03/sample.pdf&size=200" />
which must be right (maybe... but I assume I am correct to use the full URL to the actual file), because when I open that URL I am taken to a page that shows the following error:
Unable to read the file: tmp/http://localhost/multi/wp-content/uploads/2012/03/sample.pdf.png
Now tmp is defined in the thumbPdf.php script, but I am confused as to what its value should be. Is it a URL or a path? Like in timthumb.php, can I make it relative to the thumbPdf.php script? (I tried ./cache, which is the setting in timthumb, and made sure to have a /cache folder in my theme root, to no avail.) Also, FYI, I put a /tmp folder in my root and still get the same error.
So how do I configure tmp to make this work?
http://stormwarestudios.com/articles/leverage-php-imagemagick-create-pdf-thumbnails/
function thumbPdf($pdf, $width)
{
try
{
$tmp = 'tmp';
$format = "png";
$source = $pdf.'[0]';
$dest = "$tmp/$pdf.$format";
if (!file_exists($dest))
{
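// quote every argument so request values cannot inject shell commands
$exec = "convert -scale ".escapeshellarg($width)." ".escapeshellarg($source)." ".escapeshellarg($dest);
exec($exec);
}
$im = new Imagick($dest);
// send a full MIME type (image/png), not just the format name
header("Content-Type: image/$format");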
echo $im;
}
catch(Exception $e)
{
echo $e->getMessage();
}
}
$file = $_GET['pdf'];
$size = $_GET['size'];
if ($file && $size)
{
thumbPdf($file, $size);
}
I have seen this answer:
How do I convert a PDF document to a preview image in PHP?
and am about to go try it next
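The error message tells you everything you need.
Unable to read the file: tmp/http://localhost/multi/wp-content/uploads/2012/03/sample.pdf.png
The script currently tries to read the thumbnail from a tmp/ folder on the server, and because $pdf is a full URL, the resulting path does not exist.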
$tmp = 'tmp';
$format = "png";
$source = $pdf.'[0]';
//$dest = "$tmp/$pdf.$format";
$dest = "$pdf.$format";
Remember that, security-wise, this doesn't look very good: someone could exploit an ImageMagick bug to achieve very nasty things by giving your script a malformed external source PDF. You should at least check that the file comes from an allowed source, for example that the request originates from the same host.
The best way to work with ImageMagick is to always save the generated image and only generate a new one if it doesn't already exist. Some ImageMagick operations are quite heavy on large files, so you don't want to burden your server.
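One way to apply both pieces of advice is sketched below. Instead of checking the request's origin, it swaps in a stricter rule: only files that resolve to a location inside one allowed directory are accepted. The function names resolveAllowedPdf and thumbnailCachePath, and the directory layout, are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the real path of $requested if it lies
// inside $allowedDir, or null for anything else (URLs, "../", missing files).
function resolveAllowedPdf(string $requested, string $allowedDir): ?string
{
    $base = realpath($allowedDir);
    $path = realpath($allowedDir . '/' . $requested);
    if ($base === false || $path === false) {
        return null;
    }
    // The resolved path must start with the allowed directory.
    if (strpos($path, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $path;
}

// Hypothetical helper: a cache file name that changes when the PDF or
// the requested width changes, so stale thumbnails are never served.
function thumbnailCachePath(string $pdfPath, int $width, string $cacheDir): string
{
    $key = sha1($pdfPath . '|' . $width . '|' . filemtime($pdfPath));
    return $cacheDir . '/' . $key . '.png';
}

// Possible usage:
// $pdf = resolveAllowedPdf($_GET['pdf'], __DIR__ . '/pdfs');
// if ($pdf !== null) {
//     $dest = thumbnailCachePath($pdf, 200, __DIR__ . '/cache');
//     if (!file_exists($dest)) { /* generate the thumbnail once */ }
// }
```

Hashing the path keeps the cache file name free of slashes and other characters taken from the request.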
I am aware of the basics, like what a function, a class and a method are. However, I am totally confused about what exactly the code below does to read the image. I read somewhere that we have to read the image in binary format first, and I am confused about how PHP reads the image and loads it. I would like to know what each line in this class does and what is actually happening in the code.
Code :
class Image {
function __construct($filename) {
//read the image file to a binary buffer
$fp = fopen($filename, 'rb') or die("Image $filename doesn't exist");
$buf = '';
while(!feof($fp))
$buf .= fgets($fp, 4096);
//create image
imagecreatefromstring($buf);
}
}
When I instantiate the Image object with $image = new Image("pic.jpg"); it does not print the image in HTML. What does the variable $image actually hold, and what should I do if I want to print that image in HTML?
Update :
FYI: I understand PHP and HTML. I was trying to learn OOP in PHP and came across an article with this particular code, which I did not understand clearly, so I thought of asking you guys. I highly appreciate your responses, and I would be thankful if you could explain the code instead of asking me to try different code.
My question is purely for learning purposes; I am not implementing this anywhere.
In HTML, all you need to do is refer to the file in an <img> tag.
<img src="/path/to/image/image.jpg" width="600" height="400" alt="Image Name" />
The source needs to be the URL of the image, relative to your webserver root directory.
As for the code you posted, it would be completely unnecessary for HTML usage, and it is also unnecessary for standard image use within PHP, as there are direct functions to load an image from a file, for instance imagecreatefromjpeg() for JPEG files.
As it stands, the constructor of your Image class takes a filename, opens that file and reads the entire contents as binary data into the string $buf, up to 4096 bytes at a time. Then it calls imagecreatefromstring($buf) to convert the file data into an image resource, which can then be used further within PHP with the GD image handling functions.
As I say, none of this is particularly relevant if all you wish to do is display an existing image within HTML. Those commands are designed for image manipulation, inspection and creation.
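The chunked read described in this answer can be sketched as a standalone function. This version uses fread rather than the fgets call from the question, since fread returns fixed-size binary chunks regardless of line breaks; the function name readInChunks is hypothetical.

```php
<?php
// Hypothetical helper: reads a whole file into a string in binary mode,
// one chunk at a time, the way the constructor in the question does.
function readInChunks(string $filename, int $chunkSize = 4096): string
{
    $fp = @fopen($filename, 'rb');   // 'rb' = read, binary-safe
    if ($fp === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    $buf = '';
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false) {
            break;                   // stop on a read error
        }
        $buf .= $chunk;
    }
    fclose($fp);
    return $buf;
}

// The result is byte-for-byte what file_get_contents() returns:
// $same = readInChunks('pic.jpg') === file_get_contents('pic.jpg');
```

The loop is mainly useful for seeing what happens under the hood; for loading an image, a single file_get_contents call does the same job.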
Your $image would contain an instance of the Image Class.
Your constructor will try to open $filename. If that's not possible, the script will die/terminate with an error message. If $filename can be opened, the file will be read into the $buf variable and a GD image resource will be created.
The code is suboptimal for a number of reasons:
The GD resource created by imagecreatefromstring is not assigned to a property of the Image class. This makes the entire process pointless, because the resource is lost as soon as the constructor returns.
Calling die terminates the script, and there is no way around this. It would be better to throw an Exception and let the developer decide whether to let the script terminate or to catch and handle the situation.
Reading a file with fopen and fgets works, but file_get_contents is the preferred way to read the contents of a file into a string. It will use memory mapping techniques, if supported by your OS, to enhance performance.
You should not do work in the constructor. It is harmful to testability.
A better approach would be to use
class Image
{
protected $_gdResource;
public function loadFromFile($fileName)
{
$this->_gdResource = imagecreatefromstring(
file_get_contents($fileName)
);
if(FALSE === $this->_gdResource) {
throw new InvalidArgumentException(
'Could not create GD Resource from given filename. ' .
'Make sure filename is readable and contains an image type ' .
'supported by GD'
);
}
}
// other methods working with $_gdResource …
}
Then you can do
$image = new Image; // to create an empty Image instance
$image->loadFromFile('pic.jpg'); // to load a GD resource
PHP's imagecreate* functions return a resource. If you want to send it to the client, you'll have to set the appropriate headers and then send the raw image:
header('Content-Type: image/jpeg');
imagejpeg($img);
See the GD and Image Functions documentation for details.
class Image
{
public $source = '';
function __construct($filename)
{
$fp = fopen($filename, 'rb') or die("Image $filename doesn't exist");
$buf = '';
while(!feof($fp))
{
$buf .= fgets($fp, 4096);
}
$this->source = imagecreatefromstring($buf);
}
}
$image = new Image('image.jpg');
/* use $image->source for image processing */
header('Content-Type: image/jpeg');
imagejpeg($image->source);
If you just want to display the image, all of the above is irrelevant. You just need to write out an HTML image tag, e.g.
echo '<img src="pic.jpg" />';
That's it.
The code that you have given takes a very long and inconvenient way to load an image for manipulation using the GD library; that's almost certainly not what you wanted to do, but if you did, then you could use imagecreatefromjpeg instead.