Get content of PDF file in PHP

I have a FlipBook jQuery page and many e-books (in PDF format) to display on it. I need to keep these PDFs hidden, so I would like to read their content with PHP and display it in my FlipBook jQuery page (serving the content in parts instead of handing out the whole PDF).
Is there any way I can get the whole content of a PDF file with PHP?
I need to separate the content by page.

You can use PDF Parser (a PHP PDF library) to extract the text and
metadata from PDFs.
PDF Parser library: https://github.com/smalot/pdfparser
Usage guide: https://github.com/smalot/pdfparser/blob/master/doc/Usage.md
Documentation: https://github.com/smalot/pdfparser/tree/master/doc
Sample Code:
<?php
// Include Composer autoloader if not already done.
include 'vendor/autoload.php';
// Parse pdf file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');
$text = $pdf->getText();
echo $text;
?>
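Since the question asks for the content split by page: the object returned by parseFile() also exposes getPages(), and each page has its own getText(). Below is a minimal sketch of that idea. The helper name pageTexts is hypothetical (it is not part of the library), and it takes the already-parsed document as an argument so it can be tried without touching the parsing code above.

```php
<?php
/**
 * Hypothetical helper: collect the text of every page,
 * keyed by 1-based page number.
 *
 * $pdf is assumed to be the document object returned by
 * \Smalot\PdfParser\Parser::parseFile().
 */
function pageTexts($pdf): array
{
    $texts = [];
    // getPages() returns the page objects in document order.
    foreach (array_values($pdf->getPages()) as $index => $page) {
        $texts[$index + 1] = $page->getText();
    }
    return $texts;
}

// Usage (assumes smalot/pdfparser is installed via Composer):
// include 'vendor/autoload.php';
// $parser = new \Smalot\PdfParser\Parser();
// $texts = pageTexts($parser->parseFile('document.pdf'));
// echo $texts[1]; // text of the first page
```

Each entry can then be sent to the FlipBook page separately, so the PDF itself never has to be exposed.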
Regarding the other part of your question:
How to convert your PDF pages into images:
You need the Imagick PHP extension (ImageMagick) and Ghostscript, which ImageMagick uses to render PDFs.
<?php
$im = new imagick('file.pdf[0]');
$im->setImageFormat('jpg');
header('Content-Type: image/jpeg');
echo $im;
?>
The [0] selects page 1; the page index is zero-based.
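To avoid off-by-one mistakes with that index, the selector can be built from a 1-based page number. This is only a sketch: both function names are hypothetical, the 150 DPI default is an arbitrary assumption, and the rendering function needs the imagick extension plus Ghostscript to actually run.

```php
<?php
// Hypothetical helper: build the "file.pdf[N]" selector that Imagick
// understands from a 1-based page number.
function pdfPageSelector(string $pdfPath, int $pageNumber): string
{
    if ($pageNumber < 1) {
        throw new InvalidArgumentException('Page numbers start at 1.');
    }
    return sprintf('%s[%d]', $pdfPath, $pageNumber - 1);
}

// Hypothetical helper: render one page and return the JPEG bytes.
// Requires the imagick extension and Ghostscript.
function renderPdfPageAsJpeg(string $pdfPath, int $pageNumber, int $dpi = 150): string
{
    $im = new Imagick();
    // The resolution has to be set before the PDF is read.
    $im->setResolution($dpi, $dpi);
    $im->readImage(pdfPageSelector($pdfPath, $pageNumber));
    $im->setImageFormat('jpeg');
    $bytes = $im->getImageBlob();
    $im->clear();
    return $bytes;
}

// Usage:
// header('Content-Type: image/jpeg');
// echo renderPdfPageAsJpeg('file.pdf', 1);
```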

Related

Text extraction from a PDF/A

Do you know of any library that lets me extract the text of a PDF/A file so I can read it in PHP?
I have tried many libraries, but none of them has been able to read the content.
I need help.
You could try PDF Parser, an open-source library available on GitHub.
It would look something like this, but check the documentation for further details:
<?php
// lot of lines
// Parse pdf file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');
$text = $pdf->getText();
echo $text;
?>

PHP QR Code scan from PDF file

I have a PDF file with a QR code. I uploaded the file to a server folder named "tmp", and now I need to scan and decode this QR code via PHP.
I found this library:
include_once('lib/QrReader.php');
$qrcode = new QrReader('/var/tmp/qrsample.png');
$text = $qrcode->text(); //return decoded text from QR Code
print $text;
But this works only for PNG/JPEG files.
Is there any way to scan a PDF? Or is there a way to convert the PDF to PNG on the fly, only when I need it?
Thank you!
First, convert your PDF into an image with Imagick, then use your library to decode the QR code from it:
include_once('lib/QrReader.php');
//the PDF file
$pdf = 'mypdf.pdf';
//retrieve the first page of the PDF as image
$image = new imagick($pdf.'[0]');
//pass it to the QRreader library
$qrcode = new QrReader($image, QrReader::SOURCE_TYPE_RESOURCE);
$text = $qrcode->text(); //return decoded text from QR Code
print $text;
//clear the resources used by Imagick
$image->clear();
You would need to either convert the PDF to a supported image type or find a QR code reading library that supports PDFs. IMO the first option is easier. A quick search leads me to http://www.phpgang.com/how-to-convert-pdf-to-jpeg-in-php_498.html for a PDF-to-image converter.
Presumably the QR code is embedded in the PDF as an image. You could use a command-line tool such as pdfimages to extract the image first, then run your QRReader library on the extracted image. You might need a bit of trial and error to establish which image is the QR code if there is more than one image in the PDF.
See extract images from PDF with PHP for more detail.

Reading PDF files in PHP or JS, then extracting the contents, by text ideally

My task is to read PDF files after they have been uploaded to the database or to a folder.
The question is: how can I read PDF files in PHP or JS (jQuery, AJAX)?
I then want to retrieve the data and inject it into form fields.
There is a lot of information on doing this with text files, but PDF seems complicated. Is there a PHP class for that? I'm not used to classes in PHP, but with some pointers I could work it out.
Thanks a lot for the help!
Have a great one!
I managed to do this using http://www.pdfparser.org/
I needed the specifications from a PDF file as raw text. This is the code I used:
<?php
include 'pdfparser-master/vendor/autoload.php';
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('specs.pdf');
$text = $pdf->getText();
echo $text;
?>

Keep only the first page of a PDF document with PHP

I have many PDFs that are generated and uploaded to my server.
The problem is they contain the same page three times (3 pages in total with the same content).
My goal is to edit the PDF with PHP so that it contains only one page.
Is there any library that allows me to simply load a PDF and keep only the first page?
Thank you!
Using FPDI, you can create a function to extract the first page of a PDF file:
function first_page($path) {
    $pdf = new FPDI();
    $pdf->AddPage();
    $pdf->setSourceFile($path);
    $pdf->useTemplate($pdf->importPage(1));
    return $pdf;
}
Then output the extracted PDF as you would do with FPDF:
// Extract first page from /path/to/my.pdf
// and output it to browser with filename "MyPDF".
first_page('/path/to/my.pdf')->Output('MyPDF', 'I');
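The same idea can be sketched for FPDI 2, where the class lives in the setasign\Fpdi namespace. This variant also sizes the new page to match the imported one, which the function above does not do. The helper name importFirstPage is hypothetical, and it assumes it is given a fresh \setasign\Fpdi\Fpdi instance.

```php
<?php
// Hypothetical helper. $pdf is assumed to be a fresh
// \setasign\Fpdi\Fpdi instance (FPDI 2, installed via Composer).
function importFirstPage($pdf, string $sourcePath)
{
    $pdf->setSourceFile($sourcePath);
    $templateId = $pdf->importPage(1);
    // Match the new page to the imported page's size and orientation.
    $size = $pdf->getTemplateSize($templateId);
    $pdf->AddPage($size['orientation'], [$size['width'], $size['height']]);
    $pdf->useTemplate($templateId);
    return $pdf;
}

// Usage (assumes setasign/fpdi and FPDF are installed):
// require 'vendor/autoload.php';
// importFirstPage(new \setasign\Fpdi\Fpdi(), '/path/to/my.pdf')
//     ->Output('I', 'MyPDF.pdf');
```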
FPDF (http://www.fpdf.org/) and mPDF (http://www.mpdf1.com/mpdf/index.php) are great libraries for working with PDF files. I only have experience creating PDFs with them, but I assume one of these libraries can solve your problem.
Edit: Here is an example with FPDF:
https://gist.github.com/maccath/3981205

Bloated PDF created by TCPDF

In a web app developed in PHP, we generate quotations and invoices (very simple, single-page documents) using the TCPDF library.
The library works great, but it seems to generate very large PDF files. In our case, the files are around 4 MB (give or take a few KB).
How can we reduce the size of the PDF files generated by TCPDF?
Here is the code snippet I am using:
ob_start();
include('quote_view_bag_pdf.php'); //This file is valid HTML file with PHP code to insert data from DB
$quote = ob_get_contents(); //Capture the content of 'quote_view_bag_pdf.php' file and store in variable
ob_end_clean();
//Code to generate PDF file for this Quote
//This line is to fix a few errors in tcpdf
$k_path_url='';
require_once('tcpdf/config/lang/eng.php');
require_once('tcpdf/tcpdf.php');
// create new PDF document
$pdf = new TCPDF();
// remove default header/footer
$pdf->setPrintHeader(false);
$pdf->setPrintFooter(false);
// add a page
$pdf->AddPage();
// print HTML-formatted text
$pdf->writeHtml($quote, true, 0, true, 0); //Insert Variables contents here.
//Build Out File Name
$pdf_out_file = "pdf/Quote_".$_POST['quote_id']."_.pdf";
//Close and output PDF document
$pdf->Output($pdf_out_file, 'F');
$pdf->Output($pdf_out_file, 'I');
I hope this code fragment gives some idea.
You need to see what it is putting inside the PDF. Is it embedding lots of images or fonts?
You can examine the contents with many PDF tools. If you have Acrobat 9.0, there is a blog article showing how to do this at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects
Finally I have managed to solve the problem.
The problem was that I had mistakenly inserted a link to an email address in the web page that was being rendered to PDF. Just removing this link brought the size of the generated PDF down to 260 KB!
Thanks everyone who tried to help me out in solving this problem.
Current TCPDF versions include font subsetting by default, which dramatically reduces PDF size.
Check the TCPDF website at http://www.tcpdf.org and consult the official forum for further information.
