Extract PDF metadata field using PHP - php

I have a series of PDF files on my shared hosting webserver, and I'm writing a PHP script to catalogue them on screen. I've added metadata to the PDF files: Document Title, Author and Subject. The filename is composed of the Author and Title, so I can construct the catalogue text from that. However, I want to display the contents of the 'Subject' metadata field as well.
Because I'm using shared hosting, I cannot install any extra PHP extensions. They have the free version of PDFLib but this doesn't include any functions to load the PDF file or to extract metadata.
This is the script so far which just displays a list of the filenames...
function catalogue($folder){
    $files = preg_grep('/^([^.])/', scandir($folder));
    foreach($files as $file){
        echo($file.'<br/>');
    }
}
So, I've not made much progress :(
I've tried PDF_open_pdi_document() but this is not part of the installed PDFLib extension. I've tried PDF_pcos_get_string() but all I get with...
PDF_pcos_get_string($file,0,'author');
...is...
pdf_pcos_get_string(): supplied resource is not a valid pdf object resource
...and I can find literally ZERO help on the web for this function. Literally nothing!
I am running PHP 7.4 on the shared hosting.

Metadata isn't encrypted like the rest of the PDF, so you can read the file with file_get_contents, find the pattern for the subject (<</Subject) and extract the value using either a regex or a simple combination of strpos/substr.
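A minimal sketch of that idea, under stated assumptions: the Info dictionary is stored uncompressed and unencrypted, and the value is a plain literal string in parentheses (not a hex string or UTF-16 text). The function name pdf_info_field is hypothetical, not part of PHP or PDFLib.

```php
<?php
// Hypothetical helper: returns the value of an Info-dictionary key such as
// 'Subject', or null if it is not present as a plain literal string.
function pdf_info_field(string $pdfBytes, string $key): ?string
{
    // Match "/Key (value)", allowing backslash-escaped characters in the value.
    $pattern = '/\/' . preg_quote($key, '/') . '\s*\(((?:\\\\.|[^\\\\)])*)\)/s';
    if (preg_match($pattern, $pdfBytes, $m) !== 1) {
        return null;
    }
    // Undo the escaping of parentheses and backslashes inside the value.
    return preg_replace('/\\\\([()\\\\])/', '$1', $m[1]);
}
```

It could be called as pdf_info_field(file_get_contents($path), 'Subject'). Matching on the key name avoids relying on the position of a match within the file.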

Thank you @drdlp. I've used file_get_contents() to load in the PDF and extract and display the metadata.
function catalogue($folder){
    // Skip dot-files such as "." and ".."
    $files = preg_grep('/^([^.])/', scandir($folder));
    foreach($files as $file){
        // scandir() returns bare filenames, so prepend the folder
        $page = file_get_contents($folder.'/'.$file);
        preg_match_all('/\/[^\(]*\(([^\/\)]*)/',$page,$matches);
        // These positions depend on how the PDFs were written
        $author = $matches[1][0];
        $subject = $matches[1][4];
        $title = $matches[1][5];
        echo($title.'/'.$subject.'/'.$author.'<br>');
    }
}
However, this is very slow for the 40-odd PDF articles in the folder.
How can I speed this up?
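One possible approach, sketched here as an assumption rather than a tested fix, is to cache the extracted values so each PDF is parsed only when it changes. The function name cached_metadata, the JSON cache file and the $extract callback are all hypothetical; $extract stands for whatever code parses a single PDF.

```php
<?php
// Hypothetical cache wrapper: $extract is any callable that parses one PDF
// and returns an array of its metadata.
function cached_metadata(string $pdfPath, string $cachePath, callable $extract): array
{
    $cache = is_file($cachePath)
        ? (json_decode(file_get_contents($cachePath), true) ?: [])
        : [];
    $mtime = filemtime($pdfPath);

    // Reuse the stored entry while the PDF has not been modified.
    if (isset($cache[$pdfPath]) && $cache[$pdfPath]['mtime'] === $mtime) {
        return $cache[$pdfPath]['data'];
    }

    // Otherwise parse the file once and remember the result.
    $data = $extract($pdfPath);
    $cache[$pdfPath] = ['mtime' => $mtime, 'data' => $data];
    file_put_contents($cachePath, json_encode($cache), LOCK_EX);
    return $data;
}
```

With this in place, only the first page load after an upload pays the cost of reading the whole PDF. The cache file needs to live somewhere the script can write to.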
I've begun experimenting with pdf.js for which I can load all the basic details from files first (filename etc) and then update them with Javascript after the page has loaded.
However, I clearly don't know enough about Javascript to make this work. This is what I have so far and I am very stuck. I've imported pdf.js from mozilla.github.io/pdf.js/build/pdf.js...
function pdf_metadata(file_url,id){
    var pdfjsLib = window['pdfjs-dist/build/pdf'];
    pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';
    var loadingTask = pdfjsLib.getDocument(file_url);
    loadingTask.promise.then(function(pdf) {
        pdf.getMetadata().then(function(details) {
            console.log(details);
            document.getElementById(id).innerHTML=details;
        }).catch(function(err) {
            console.log('Error getting meta data');
            console.log(err);
        });
    });
}
The line console.log(details); outputs an object to the console. From there I have no idea how to extract any data at all. Therefore document.getElementById(id).innerHTML=details; displays nothing.
This is the object which is output to the console.

Related

image in OpenTBS not changing

I'm trying to show a series of pictures and comments in a document with OpenTBS. The pictures are hosted on a local webserver. The data is in an array.
In the resulting document the text lines are rendered as expected, but the sample image is not changed.
When I copy paste the url location in my browser it shows a picture without problem.
With setting "$NoErr = false;" there is no error message.
What am I doing wrong?
My template:
[imgs; block=begin]
<a sample image>[imgs.url;ope=changepic]
Location: [imgs.url]
Description: [imgs.txt]
[imgs; block=end]
In my PHP code (among other things):
$imgs = array();
$imgs[] = array('url'=>'http://192.168.0...', 'txt'=>'Sample 1');
$imgs[] = array('url'=>'http://192.168.0...', 'txt'=>'Sample 2');
$OOo->MergeBlock('imgs', $imgs);
$OOo->Show(OPENTBS_DOWNLOAD, 'file.docx');
Update: same problem when I change the url to some publicly available images on the web.
OpenTBS uses the following 3 functions in order to insert a picture into the current document:
file_exists()
filesize()
file_get_contents()
While the function file_get_contents() usually works for a URL, the two other functions, file_exists() and filesize(), return false even though the PHP documentation says they can support the http protocol.
So the behavior you see probably comes from file_exists() returning false for your URL.
The workaround I suggest is to download the file as a temporary file, then insert it into the document.
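A rough sketch of that workaround, assuming file_get_contents() is allowed to open URLs on the server. The helper name fetch_to_temp is hypothetical and not part of OpenTBS; keeping the file extension is a precaution in case the library inspects it.

```php
<?php
// Hypothetical helper (not part of OpenTBS): copies an image to a local
// temporary file and returns the local path, or null if it cannot be read.
function fetch_to_temp(string $url): ?string
{
    $data = @file_get_contents($url);
    if ($data === false) {
        return null;
    }
    // Keep the original extension in case the library inspects it.
    $ext = pathinfo((string) parse_url($url, PHP_URL_PATH), PATHINFO_EXTENSION);
    $path = sys_get_temp_dir() . '/' . uniqid('img_', true)
        . ($ext !== '' ? '.' . $ext : '');
    return file_put_contents($path, $data) === false ? null : $path;
}

// Usage sketch: swap each remote URL for a local copy before MergeBlock().
// foreach ($imgs as &$img) {
//     $img['url'] = fetch_to_temp($img['url']) ?? $img['url'];
// }
// unset($img);
```

Since the template also prints [imgs.url] as the location, you may prefer to store the local path under a separate key and point the changepic field at that key instead. The temporary files can be deleted once the document has been sent.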

Concrete5: set File thumbnail to generated image (e.g. for PDFs)

I'm using Concrete5, and I'm trying to display thumbnails for various uploaded files. While some of these might be images, the majority are PDFs.
I'm currently using:
<?php
$file = File::getByID($fID);
$imageHelper = Core::make('helper/image');
try {
    $imageHelper->outputThumbnail($file, 200, 200);
} catch(InvalidArgumentException $e) { ?>
    <img src='https://placehold.it/200x200'>
<?php } ?>
I'd much prefer to somehow create a smaller thumbnail of PDF files, for example by using ghostscript in the background. In the built-in file manager, at least a PDF icon is displayed. That would be a non-optimal option, but still better than not displaying anything to signify that we're dealing with a PDF.
How can I access the built-in thumbnails? And, more importantly, how can I properly overwrite them for certain file-types when they are uploaded?
EDIT:
I came across $file->getThumbnailURL('type'); and created a type for my own purposes. How would you automatically generate such a thumbnail when a file is uploaded? I can likely figure out how to generate the file with plain PHP, but storing it in Concrete5 is something I'm unsure about.
In the end, here's how I did it.
I started off by creating a new thumbnail type in the configure method of my package's controller, as follows:
use Concrete\Core\File\Image\Thumbnail\Type\Type;
...
public function configure($pkg) {
    ...
    $thumbnailType = new Type();
    $thumbnailType->setName(tc('ThumbnailTypeName', 'PDF Thumbnails'));
    $thumbnailType->setHandle('pdfthumbnails');
    $thumbnailType->setWidth(200);
    $thumbnailType->setHeight(200);
    $thumbnailType->save();
}
Then I created a class mypackage/src/document_processing/pdfthumbnails.php with the following contents:
namespace Concrete\Package\Mypackage\Src\DocumentProcessing;
use Core;
use File;
use Concrete\Core\File\Image\Thumbnail\Type\Type;
class Pdfthumbnails {
    public function processPDFThumbnails($fv) {
        $fi = Core::make('helper/file');
        $fvObj = $fv->getFileVersionObject();
        $ext = $fi->getExtension($fvObj->getFilename());
        $file = $fvObj->getFile();
        if ($ext == 'pdf') {
            $type = Type::getByHandle('pdfthumbnails');
            $basetype = $type->getBaseVersion();
            $thumbpath = $basetype->getFilePath($fvObj);
            $fsl = $file->getFileStorageLocationObject()->getFileSystemObject();
            $fre = $fvObj->getFileResource();
            // this requires sufficient permissions..
            // depending on your setup, reconsider 0777
            mkdir('application/files'.dirname($thumbpath), 0777, true);
            exec('gs -o application/files'.escapeshellarg($thumbpath).' -dPDFFitPage -sDEVICE=png16m -g200x200 -dLastPage=1 -f application/files/'.escapeshellarg($fre->getPath()));
        }
    }
}
And then I hooked into the on_file_version_add event in my package's controller:
use Concrete\Package\Mypackage\Src\DocumentProcessing\Pdfthumbnails;
...
public function on_start() {
    Events::addListener('on_file_version_add', array(new Pdfthumbnails(), 'processPDFThumbnails'));
}
This appears to be possible inside C5 after all, using file inspectors:
Any time a file is imported into Concrete5 (which happens through an instance of the File Importer class) it may be run through an optional file Inspector, which is a PHP class that can perform additional operations on files of a certain type when they're uploaded or rescanned
More information and implementation examples on file inspectors can be found in the C5 documentation.
In this Concrete5 forum discussion, someone seems to have used this feature to build exactly what you want to build, a thumbnail generator for PDFs using ImageMagick.
That user's example code does two things. First, it registers a new custom file inspector with the running C5 instance. Then, your custom inspector library is added to the project.

Unzipped image doesn't display in browsers

I have the following code:
$ipaFile= '/path/file.ipa';
$iconFilePath = "Payload/myapp.app/AppIcon40x40@2x.png"; // the path to my image file once the ipa file is unzipped.
$iconFile = "AppIcon40x40@2x.png";
$iconSaveFile = '/path/';
$zip = new ZipArchive();
if ($zip->open($ipaFile) === TRUE) {
    if($zip->locateName($iconFilePath) !== FALSE) {
        if($iconData = $zip->getFromName($iconFilePath)) {
            // PHP variable names are case-sensitive: $iconFile, not $IconFile
            file_put_contents($iconSaveFile.$iconFile, $iconData);
        }
    }
    $zip->close();
}
This code successfully pulls an image out of an ipa (zipped) file and puts it where I want it to be. The image displays properly in image viewing programs. However, when I want to view the image in a browser, the browser tells me that the image cannot be displayed because it contains errors.
Doing further research, I gather that the file is somehow not being unzipped correctly. There are a lot of issues regarding images not displaying in browsers, but there are a wide variety of reasons and I'm just not sure which one is mine. I've tested a variety of ways to try and fix the problem (fread method of getting file, using PHP's image functions, using headers, etc) and I can't seem to make it work. Any help would be appreciated, thanks.
For anyone wondering: when Xcode (Apple) compiles an app, it modifies the PNG files within it, and a standard browser will not render them as is.
http://echoone.com/filejuicer/formats/ipa
https://theiphonewiki.com/wiki/IPA_File_Format
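As far as I know, the modification in question is Apple's PNG optimisation, which places a non-standard CgBI chunk ahead of the usual IHDR chunk. Under that assumption, a small check like the one below can tell you whether an extracted icon is affected. The function name is hypothetical, and the check only detects such a file; it does not convert it back to a standard PNG.

```php
<?php
// Hypothetical check: true if the bytes look like a PNG whose first chunk
// is Apple's CgBI chunk instead of the standard IHDR chunk.
function is_apple_optimised_png(string $bytes): bool
{
    $signature = "\x89PNG\r\n\x1a\n";
    if (strncmp($bytes, $signature, 8) !== 0) {
        return false; // not a PNG at all
    }
    // Each chunk starts with a 4-byte length followed by a 4-byte type,
    // so the first chunk type sits at offset 12.
    return substr($bytes, 12, 4) === 'CgBI';
}
```

For example, is_apple_optimised_png($iconData) could be called right after getFromName() to decide whether the image needs converting before it is served to a browser.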

Actionscript - AS 2.0 - PHP copy file from server to server THEN run function

Basic Idea: I have a flash file that takes screenshots with a click of a button, sending the data to a PHP file, and then the user gets to save a PNG image. The images that are merged together (via PHP) require that they reside on the same server as the PHP, otherwise they do not merge and the final PNG shows up blank.
My solution so far: I have two PHP files, and I just need to find a way to merge them. The screenshot one, and one that copies a file from one server to another. This is my cheat work around to bring the image to reside on the same server, THEN run the screenshot php.
The Server-to-Server PHP Code:
<?PHP
$inputfile = FOPEN("https://www.google.com/intl/en_com/images/srpr/logo3w.png", "r");
$outputfile = FOPEN("transferedfile.gif", "w");
ECHO "File opened...";
$data = '';
WHILE (!FEOF($inputfile)) {
    $data .= FREAD($inputfile, 8192);
}
ECHO "Data read...";
FWRITE($outputfile, $data);
ECHO "transfered data";
FCLOSE ($inputfile);
FCLOSE ($outputfile);
ECHO "Done.";
?>
So as you can see, it pulls Google's logo and saves it as "transferedfile.gif" to the directory the PHP resides on. I can get this PHP code to work by saving this as whateverIWant.php on my webserver, and visiting it directly, but I need to in place of Google's logo (in this example) put a value that will be dynamically changing via flash.
So basically, in the flash file I'll have a dynamic variable whose URL value will change. Say I define that variable in flash as var imageToGet; somehow I need to pass that variable into this PHP. That's one step... here's the AS 2.0 code:
My Actionscript (2.0) Code:
button.onRelease = function ():Void {
    sendImageToServer();
    ScreenShot.save(_root, "screenshot.png", 0, 0, 100, 140);
};
The sendImageToServer() function isn't made yet. This is where I'm stuck. I would need the sendImageToServer() function to send var imageToGet as the image to get, THEN run the ScreenShot.save() function after the transfer is done (i.e. once FCLOSE ($outputfile); is complete).
In Summary: A movie clip on the stage will have a dynamic image loaded into it. Once a button is pressed, it would need to copy that dynamic image to the local server, and then run the screenShot function. I believe once I have this figured out, I should be able to do everything else, such as saving as a unique name, saving multiple files, etc. But I just need to be pushed in the right direction :)
Thanks so much everyone @ StackOverflow. You've been nothing but awesome to me thus far!
EDIT -- I've found a good starting point!!
I found a good starting point, and am answering my own question in case someone else stumbles upon this. I used these two codes as a starting point, and I think I'm on the right track…
In Flash: I simply made a dynamic textbox with the instance name of traceText
In Actionscript (2.0):
var send:LoadVars = new LoadVars;
var receive:LoadVars = new LoadVars;
send.toPHP = "asd123";
receive.onLoad = function(){
    encrypted = this.toFlash;
    traceText.text = encrypted;
}
send.sendAndLoad("test.php",receive,"POST");
In "test.php" file:
$fromFlash = $_POST['toPHP'];
$encrypted = $fromFlash;
$toFlash = "&toFlash=";
$toFlash .= $encrypted;
echo $toFlash;
What this ended up doing was sending the variable to PHP and then back again. Which is perfect for what I needed. For now, I should be good! Hope this helps anyone that needs it.
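Building on that round trip, the PHP side could copy the requested image and then report back, so that Flash only takes the screenshot once the reply arrives. This is a sketch under assumptions: the function names and the POST field name imageToGet are hypothetical, and the server must allow PHP to open remote URLs.

```php
<?php
// Hypothetical helper: copies a remote image to a local file, accepting
// only http(s) URLs so the script cannot be pointed at local paths.
function copy_remote_image(string $url, string $destination): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!in_array($scheme, ['http', 'https'], true)) {
        return false;
    }
    return @copy($url, $destination);
}

// Builds the reply in the same "&toFlash=..." form used by test.php.
function flash_response(bool $ok): string
{
    return '&toFlash=' . ($ok ? 'done' : 'failed');
}

// Usage sketch inside the script that Flash posts to:
// $url = $_POST['imageToGet'] ?? '';
// echo flash_response(copy_remote_image($url, 'transferedfile.png'));
```

On the ActionScript side, receive.onLoad would then check this.toFlash and call ScreenShot.save() only when the value is "done".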

Suggested php code to read file rating set by Adobe Bridge CS3

Background: I have been attempting to read the rating that is assigned in Adobe Bridge CS3 using the Creative Commons Metadata toolkit for PHP, without success. I am using shared hosting, so I do not have an opportunity to recompile PHP with different modules.
Is php code available that could be used to read the rating that is embedded in the .jpg file? I have read that this is an xmp (xml) formatted section within the file.
I'm posting my solution here in case someone else has a similar problem and reads this thread. Here is what I found:
Windows Vista adds the rating to the exif section embedded in the file.
Adobe Bridge adds another section to the jpg file that contains data formatted in xml. The xml + data structure is referred to as the xmp file.
I hadn't yet processed the file with Adobe Bridge, which is why I was unable to read the xmp data with the Metadata toolkit.
Using the Creative Commons Metadata toolkit I was able to read the ratings using the following code. This code is part of a Drupal module, so some of the referenced variables are Drupal settings: variable_get() is a Drupal function to read a variable from a persistent data store.
include_once("PHP_JPEG_Metadata_Toolkit_1.11/JPEG.php");
include_once("PHP_JPEG_Metadata_Toolkit_1.11/Photoshop_File_Info.php");
include_once("PHP_JPEG_Metadata_Toolkit_1.11/EXIF.php");
include_once("PHP_JPEG_Metadata_Toolkit_1.11/XMP.php");
$photodir = variable_get('rotate_images_sourcefiles_dir',"sites/default/files/imageNodes");
$rating_threshold = variable_get('rotate_images_rating_threshold',3);
$allImages=dir($photodir);
$filenames = scandir($photodir);
foreach($filenames as $filename){
    $rating = null;
    $info = pathinfo($filename);
    // Entries such as "." have no extension, so check before reading it
    if (isset($info['extension']) && strtolower($info['extension'])=="jpg"){
        // First try to get the rating from the EXIF data, this is where it is stored by Windows Vista
        $exif = get_EXIF_JPEG( $photodir . "/" . $filename );
        $rating = $exif[0][18246]['Data'][0] ?? null;
        $jpeg_header = get_jpeg_header_data($photodir . "/" . $filename );
        // If no rating was found in the EXIF data, it may be in the Adobe format xmp section
        if ($rating == null){
            if($jpeg_header != false){
                $xmp = get_XMP_text($jpeg_header);
                $xmpArray = read_XMP_array_from_text($xmp);
                // 'attributes' must be quoted; a bare word is treated as an undefined constant
                $rating = $xmpArray[0]['children'][0]['children'][0]['attributes']['xap:Rating'];
            }
        }
    }
}
I did need to modify the EXIF_Tags.php file in the metadata toolkit by adding an additional entry to the EXIF Tags array. I reported this to the author, but I don't believe he is maintaining the module any longer. The source is on sourceforge, but I don't know how to post a patch. So you may need to make the change to EXIF.php yourself to make the code work.
'EXIF' => array (
// Exif IFD
18246 => array ( 'Name' => "Rating",
'Type' => "Numeric" ),
Theoretically if you use fgets you should be able to read it. It would be helpful if you know where this section begins in terms of bytes into the file.
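A sketch of reading the section directly, without the toolkit, assuming the XMP packet is stored as plain text inside the JPEG and the rating uses the conventional xap: or xmp: prefix. The function name is hypothetical. Rather than tracking byte offsets with fgets, it searches the loaded bytes for the packet's opening and closing tags.

```php
<?php
// Hypothetical helper: returns the XMP rating found in raw JPEG bytes,
// or null when no XMP packet or rating is present.
function xmp_rating_from_bytes(string $bytes): ?int
{
    $start = strpos($bytes, '<x:xmpmeta');
    if ($start === false) {
        return null;
    }
    $end = strpos($bytes, '</x:xmpmeta>', $start);
    if ($end === false) {
        return null;
    }
    $xmp = substr($bytes, $start, $end - $start);

    // Accept the attribute form (xap:Rating="3") and the element form
    // (<xap:Rating>3</xap:Rating>), with either the xap or xmp prefix.
    if (preg_match('/\bx[am]p:Rating(?:="|>)(-?\d+)/', $xmp, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}
```

It could be called as xmp_rating_from_bytes(file_get_contents($photodir . "/" . $filename)). A full XML parser would be more robust if the files come from several different tools.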
