I'm using pdfparser to extract text from a PDF file. It works for PDF files in older versions, but not for files in newer versions.
My PDF is version 1.7.
<?php
include 'vendor/autoload.php';
// Parse pdf file and build necessary objects.
$parser = new Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('sample.pdf');
// Retrieve all pages from the pdf file.
$pages = $pdf->getPages();
// Loop over each page to extract text.
$content = array();
foreach ($pages as $page) {
    $content[] = $page->getTextArray();
}
// Print once, after all pages have been collected.
echo "<pre>";
print_r($content);
I experienced the same behaviour!
I now use a tool to check the PDF version before I try to parse the file. If it is not 1.4, I convert it to 1.4 and then parse it.
Here is a PHP library for that, if needed: https://github.com/xthiago/pdf-version-converter
Code example:
use Symfony\Component\Filesystem\Filesystem;
use Xthiago\PDFVersionConverter\Converter\GhostscriptConverter;
use Xthiago\PDFVersionConverter\Converter\GhostscriptConverterCommand;
use Xthiago\PDFVersionConverter\Guesser\RegexGuesser;
function searchablePdfParser($systemPath) {
    // Work on a temporary copy, because the file might need to be converted.
    // getPathWithIdAndTimestamp() is my own helper (not shown here).
    $tempPath = getPathWithIdAndTimestamp($systemPath) . 'tmp.pdf';
    copy($systemPath, $tempPath);
    // Check whether the copy needs to be converted, and convert it if required.
    $guesser = new RegexGuesser();
    $pdfVersion = $guesser->guess($tempPath); // returns something like '1.4'
    if ($pdfVersion != '1.4') {
        $command = new GhostscriptConverterCommand();
        $filesystem = new Filesystem();
        $converter = new GhostscriptConverter($command, $filesystem);
        $converter->convert($tempPath, '1.4');
    }
    // Parse the copy: the untouched file if it was already 1.4, the converted one otherwise.
    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile($tempPath);
    $text = $pdf->getText();
    unlink($tempPath);
    if (strlen($text) < 30) {
        return '';
    }
    return $text;
}
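If adding a library only for the version check is not an option, the version can usually be read straight from the file header, which starts with a marker such as %PDF-1.7. The following is a minimal sketch under that assumption: detectPdfVersion() is a hypothetical helper (not part of pdfparser or the converter library), it only looks at the first 1024 bytes, and it ignores a /Version entry in the document catalog that may override the header.

```php
<?php
/**
 * Hypothetical helper: read the PDF version from the file header.
 * Returns a string such as "1.7", or null when no "%PDF-x.y" marker is found.
 */
function detectPdfVersion(string $path): ?string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null; // file missing or unreadable
    }
    // Assumption: the header marker sits within the first 1024 bytes.
    $head = fread($handle, 1024);
    fclose($handle);
    if ($head !== false && preg_match('/%PDF-(\d\.\d)/', $head, $matches)) {
        return $matches[1];
    }
    return null;
}
```

The result could then replace the guesser call in the function above, for example by converting only when detectPdfVersion($tempPath) does not return '1.4'.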
I am having issues getting an image into my PDF. I have a form with two upload fields, and the user can upload either a PDF or an image. When a file is uploaded, it is saved in my temp folder without an extension. I pass my class two things: $fileData, an array that maps each file's path to its MIME type, and $inputArray, an array of other data from the form (name, address, etc.).
My code looks like this:
private $tplidx;
public function __construct($fileData, $inputArray) {
$pdf = new \FPDI();
$count = 10;
$pdf->AddPage('P');
$pdf->SetFontSize(20);
foreach($inputArray as $input) {
$pdf->SetXY(50, $count);
$pdf->Write(1, $input);
$count = $count + 10;
}
foreach($fileData as $name => $extension){
if($extension == "application/pdf") {
$pagecount = $pdf->setSourceFile($name);
for($i=0; $i<$pagecount; $i++){
$pdf->AddPage();
$this->tplidx = $pdf->importPage($i+1);
$pdf->useTemplate($this->tplidx, 10, 10, 200);
}
} else {
$pdf->AddPage();
$filetype = explode("/",$extension);
$pdf->Image($name.'.'.$filetype[1],30,120,25);
}
}
$pdf->Output('test.pdf', 'F');
}
The first foreach writes the form inputs to a page; this works fine.
The second foreach checks whether the file is a PDF and, if it is, adds its pages to the new PDF. This also works fine.
My problem is in the else branch. Because I am appending the file's extension, I get the error
Can't open image file
If I remove the extension part, I get the error
Image file has no extension and no type was specified
Is there any way to solve this issue?
Thanks
You can pass the image type in the $type parameter of the Image() method.
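Since the uploaded file is stored without an extension, the type can be derived from the MIME type the form already provides. This is only a sketch: fpdfImageType() is a hypothetical helper, and it assumes the type values listed in the FPDF documentation for Image() (JPG, JPEG, PNG, GIF).

```php
<?php
/**
 * Hypothetical helper: map an upload's MIME type to the $type
 * argument of FPDF's Image(). Returns null for unsupported types.
 */
function fpdfImageType(string $mime): ?string
{
    $map = [
        'image/jpeg' => 'JPEG',
        'image/png'  => 'PNG',
        'image/gif'  => 'GIF',
    ];
    return $map[strtolower(trim($mime))] ?? null;
}

// Possible use inside the else branch of the question's loop (not executed here):
// $type = fpdfImageType($extension);
// if ($type !== null) {
//     // Arguments: file, x, y, width, height (0 = keep aspect ratio), type
//     $pdf->Image($name, 30, 120, 25, 0, $type);
// }
```

Note that the file path is passed without an appended extension, so the "Can't open image file" error from the question should no longer be triggered by a wrong path.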
I am splitting a PDF into separate single-page files using FPDF and FPDI. Everything works fine, except that the links inside the PDF no longer work: they are removed from the split pages.
split_pdf("test.pdf", 'splitedpdf/');
function split_pdf($filename, $end_directory = false)
{
require_once('fpdf/fpdf.php');
require_once('fpdi/fpdi.php');
$end_directory = $end_directory ? $end_directory : './';
$new_path = preg_replace('/[\/]+/', '/', $end_directory.'/'.substr($filename, 0, strrpos($filename, '/')));
if (!is_dir($new_path))
{
// Will make directories under end directory that don't exist
// Provided that end directory exists and has the right permissions
mkdir($new_path, 0777, true);
}
$pdf = new FPDI();
$pagecount = $pdf->setSourceFile($filename); // How many pages?
// Split each page into a new PDF
for ($i = 1; $i <= $pagecount; $i++) {
$new_pdf = new FPDI();
$new_pdf->AddPage();
$new_pdf->setSourceFile($filename);
$new_pdf->useTemplate($new_pdf->importPage($i));
try {
$new_filename = $end_directory.str_replace('.pdf', '', $filename).'_'.$i.".pdf";
$new_pdf->Output($new_filename, "F");
echo "Page ".$i." split into ".$new_filename."<br />\n";
} catch (Exception $e) {
echo 'Caught exception: ', $e->getMessage(), "\n";
}
}
// $pdf->close();
}
FPDI is not able to handle any dynamic content like links, form fields, or any other annotation type. There's an extension which supports at least links (only compatible with FPDI 1.4.4 + FPDF_TPL 1.2.3).
If you need to extract the pages including all attached annotations, you may check out the SetaPDF-Merger component (not free!).
I found this script, http://d.danylevskyi.com/node/7, which I used as a starting point for the code below.
The goal is to be able to save a user picture:
<?php
define('DRUPAL_ROOT', getcwd());
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
$uid = 99;
$account = user_load($uid);
// get image information
$image_path = 'public://avatars/upload/b8f1e69e83aa12cdd3d2babfbcd1fe27_4.gif';
$image_info = image_get_info($image_path);
// create file object
$file = new StdClass();
$file->uid = $uid;
$file->uri = $image_path;
$file->filemime = $image_info['mime_type'];
$file->status = 0; // Yes! Set status to 0 in order to save temporary file.
$file->filesize = $image_info['file_size'];
// standard Drupal validators for user pictures
$validators = array(
'file_validate_is_image' => array(),
'file_validate_image_resolution' => array(variable_get('user_picture_dimensions', '85x85')),
'file_validate_size' => array(variable_get('user_picture_file_size', '30') * 1024),
);
// here all the magic :)
$errors = file_validate($file, $validators);
if (empty($errors)) {
file_save($file);
$edit['picture'] = $file;
user_save($account, $edit);
}
?>
A picture is created in sites/default/files/pictures/ with the name picture-99-1362753611.gif
Everything seems correct in the file_managed table except that:
the filename field is empty
the uri field shows public://avatars/upload/b8f1e69e83aa12cdd3d2babfbcd1fe27_4.gif
the status field is set to 0 (temporary)
The picture field in the users table gets updated with the fid of the above mentioned entry.
I would guess that the file_managed table should store the final file (in sites/default/files/pictures) instead of the original file's info, and that the users table should link to that entry too.
Any idea how I can achieve that? I am quite new to the Drupal API. Thank you.
Edit:
I understand that I am giving the original file to the file_save and user_save functions. But which one actually creates the file in sites/default/pictures/ ?
Try adding the following to your code:
$file->filename = drupal_basename($image_path);
$file->status = FILE_STATUS_PERMANENT;
$file = file_save($file); // Use this instead of your current file_save
Does that help?
------------------ EDIT ------------------
If you want to save a copy of the file in a new location, you can replace the third line above with something like
// Save the file to the root of the files directory.
$file = file_copy($file, 'public://');
My concept is this: there are 10 PDF files on a website. A user can select some of them and then choose "merge" to create a single PDF file that contains the pages of the selected files. How can I do this with PHP?
Below is a PHP snippet that merges PDFs by calling Ghostscript.
$fileArray= array("name1.pdf","name2.pdf","name3.pdf","name4.pdf");
$datadir = "save_path/";
$outputName = $datadir."merged.pdf";
$cmd = "gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=$outputName ";
//Add each pdf file to the end of the command
foreach($fileArray as $file) {
$cmd .= $file." ";
}
$result = shell_exec($cmd);
I forgot the link from where I found it, but it works fine.
Note: You should have gs (on linux and probably Mac), or Ghostscript (on windows) installed for this to work.
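Because the list of files comes from a user's selection, it seems safer to shell-escape every path before it reaches the command line. A minimal sketch of that idea follows; buildGsMergeCommand() is a hypothetical helper that reuses the Ghostscript flags from the snippet above and only builds the command string, leaving execution to the caller.

```php
<?php
/**
 * Hypothetical helper: build the Ghostscript merge command with
 * every path passed through escapeshellarg().
 */
function buildGsMergeCommand(array $inputFiles, string $outputFile): string
{
    if ($inputFiles === []) {
        throw new InvalidArgumentException('At least one input file is required.');
    }
    $cmd = 'gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile='
        . escapeshellarg($outputFile);
    // Append each input file as its own escaped argument.
    foreach ($inputFiles as $file) {
        $cmd .= ' ' . escapeshellarg($file);
    }
    return $cmd;
}

// Example call (not executed here):
// shell_exec(buildGsMergeCommand(['name1.pdf', 'name2.pdf'], 'save_path/merged.pdf'));
```

Escaping also keeps file names that contain spaces from being split into several arguments.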
I suggest PDFMerger from GitHub. It is as easy as this:
include 'PDFMerger.php';
$pdf = new PDFMerger;
$pdf->addPDF('samplepdfs/one.pdf', '1, 3, 4')
->addPDF('samplepdfs/two.pdf', '1-2')
->addPDF('samplepdfs/three.pdf', 'all')
->merge('file', 'samplepdfs/TEST2.pdf'); // REPLACE 'file' WITH 'browser', 'download', 'string', or 'file' for output options
I've done this before. I had a pdf that I generated with fpdf, and I needed to add on a variable amount of PDFs to it.
So I already had an fpdf object and page set up (http://www.fpdf.org/)
And I used fpdi to import the files (http://www.setasign.de/products/pdf-php-solutions/fpdi/)
FPDI is added by having the PDF class extend it:
class PDF extends FPDI
{
}
$pdffile = "Filename.pdf";
$pagecount = $pdf->setSourceFile($pdffile);
for($i=0; $i<$pagecount; $i++){
$pdf->AddPage();
$tplidx = $pdf->importPage($i+1, '/MediaBox');
$pdf->useTemplate($tplidx, 10, 10, 200);
}
This basically imports each page of the PDF as a template and places it, much like an image, into your other PDF. It worked amazingly well for what I needed it for.
$cmd = "gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=".$new." ".implode(" ", $files);
shell_exec($cmd);
A simplified version of Chauhan's answer
Both the accepted answer and even the FPDI homepage seem to give botched or incomplete examples. Here's mine, which works and is easy to implement. As expected, it requires the FPDF and FPDI libraries:
FPDF: http://www.fpdf.org/en/download.php
FPDI: https://www.setasign.com/products/fpdi/downloads
require('fpdf.php');
require('fpdi.php');
$files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf'];
$pdf = new FPDI();
// iterate over array of files and merge
foreach ($files as $file) {
$pageCount = $pdf->setSourceFile($file);
for ($i = 0; $i < $pageCount; $i++) {
$tpl = $pdf->importPage($i + 1, '/MediaBox');
$pdf->addPage();
$pdf->useTemplate($tpl);
}
}
// output the pdf as a file (http://www.fpdf.org/en/doc/output.htm)
$pdf->Output('F','merged.pdf');
I had a similar problem in my software. We wanted to merge several PDF files into one and submit it to an external service. We had been using the FPDI approach shown in Christa's answer.
However, our input PDFs could be in a version higher than 1.7. We decided to evaluate the commercial FPDI add-on, but it turned out that some of the documents scanned by our office copier had malformed indexes, which crashed the add-on. So we decided to use the Ghostscript approach from Chauhan's answer.
But then we got some strange metadata in the output PDF's properties.
Finally, we combined the two solutions: Ghostscript merges and downgrades the PDFs, and FPDI sets the metadata. We don't know yet how it will cope with PDFs that use advanced formatting, but for the scans we use it works just fine. Here's an excerpt from our class:
class MergedPDF extends \FPDI
{
private $documentsPaths = array();
public function Render()
{
$outputFileName = tempnam(sys_get_temp_dir(), 'merged');
// merge files and save resulting file as PDF version 1.4 for FPDI compatibility
$cmd = "/usr/bin/gs -q -dNOPAUSE -dBATCH -dCompatibilityLevel=1.4 -sDEVICE=pdfwrite -sOutputFile=$outputFileName";
foreach ($this->getDocumentsPaths() as $pdfpath) {
$cmd .= " $pdfpath ";
}
$result = shell_exec($cmd);
$this->SetCreator('Your Software Name');
$this->setPrintHeader(false);
$numPages = $this->setSourceFile($outputFileName);
for ($i = 1; $i <= $numPages; $i++) {
$tplIdx = $this->importPage($i);
$this->AddPage();
$this->useTemplate($tplIdx);
}
unlink($outputFileName);
$content = $this->Output(null, 'S');
return $content;
}
public function getDocumentsPaths()
{
return $this->documentsPaths;
}
public function setDocumentsPaths($documentsPaths)
{
$this->documentsPaths = $documentsPaths;
}
public function addDocumentPath($documentPath)
{
$this->documentsPaths[] = $documentPath;
}
}
The usage of this class is as follows:
$pdf = new MergedPDF();
$pdf->setTitle($pdfTitle);
$pdf->addDocumentPath($absolutePath1);
$pdf->addDocumentPath($absolutePath2);
$pdf->addDocumentPath($absolutePath3);
$tempFileName = tempnam(sys_get_temp_dir(), 'merged');
$content = $pdf->Render();
file_put_contents($tempFileName, $content);
I ran into a similar issue, and this works fine for me; give it a try. It can handle different page orientations across the PDFs.
// array to hold list of PDF files to be merged
$files = array("a.pdf", "b.pdf", "c.pdf");
$pageCount = 0;
// initiate FPDI
$pdf = new FPDI();
// iterate through the files
foreach ($files AS $file) {
// get the page count
$pageCount = $pdf->setSourceFile($file);
// iterate through all pages
for ($pageNo = 1; $pageNo <= $pageCount; $pageNo++) {
// import a page
$templateId = $pdf->importPage($pageNo);
// get the size of the imported page
$size = $pdf->getTemplateSize($templateId);
// create a page (landscape or portrait depending on the imported page size)
if ($size['w'] > $size['h']) {
$pdf->AddPage('L', array($size['w'], $size['h']));
} else {
$pdf->AddPage('P', array($size['w'], $size['h']));
}
// use the imported page
$pdf->useTemplate($templateId);
$pdf->SetFont('Helvetica');
$pdf->SetXY(5, 5);
$pdf->Write(8, 'Generated by FPDI');
}
}
This worked for me on Windows:
Download PDFtk (free) from https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
Drop the PDFtk folder into the root of C:
Add the following to your PHP code, where $file1 is the location and name of the first PDF file, $file2 is the location and name of the second, and $newfile is the location and name of the destination file:
$file1 = 'c:\\www\\folder1\\folder2\\file1.pdf';
$file2 = 'c:\\www\\folder1\\folder2\\file2.pdf';
$newfile = 'c:\\www\\folder1\\folder2\\file3.pdf';
// pdftk syntax: pdftk <input files> cat output <destination file>
$command = 'cmd /c C:\\pdftk\\bin\\pdftk.exe '
    . escapeshellarg($file1) . ' ' . escapeshellarg($file2)
    . ' cat output ' . escapeshellarg($newfile);
$result = exec($command);
I created an abstraction layer over FPDI (might accommodate other engines).
I published it as a Symfony2 bundle depending on a library, and as the library itself.
The bundle
The Library
usage:
public function handlePdfChanges(Document $document, array $formRawData)
{
$oldPath = $document->getUploadRootDir($this->kernel) . $document->getOldPath();
$newTmpPath = $document->getFile()->getRealPath();
switch ($formRawData['insertOptions']['insertPosition']) {
case PdfInsertType::POSITION_BEGINNING:
// prepend
$newPdf = $this->pdfManager->insert($oldPath, $newTmpPath);
break;
case PdfInsertType::POSITION_END:
// Append
$newPdf = $this->pdfManager->append($oldPath, $newTmpPath);
break;
case PdfInsertType::POSITION_PAGE:
// insert at page n: PdfA={p1; p2; p3}, PdfB={pA; pB; pC}
// insert(PdfA, PdfB, 2) will render {p1; pA; pB; pC; p2; p3}
$newPdf = $this->pdfManager->insert(
$oldPath, $newTmpPath, $formRawData['insertOptions']['pageNumber']
);
break;
case PdfInsertType::POSITION_REPLACE:
// does nothing. overrides old file.
return;
break;
}
$pageCount = $newPdf->getPageCount();
$newPdf->renderFile($mergedPdfPath = "$newTmpPath.merged");
$document->setFile(new File($mergedPdfPath, true));
return $pageCount;
}
myokyawhtun's solution worked best for me (using PHP 5.4)
You will still get an error, though. I resolved it with the following change:
On line 269 of fpdf_tpl.php, I changed the function parameters to:
function Image($file, $x=null, $y=null, $w=0, $h=0, $type='', $link='',$align='', $resize=false, $dpi=300, $palign='', $ismask=false, $imgmask=false, $border=0) {
I also made this same change on line 898 of fpdf.php