TL;DR: I have problems with PHP generating PDFs longer than one page.
Hello again. My goal is to create a script that will basically get all the important data and create an A4-format PDF invoice/document for printing/mailing/archiving. Generating the PDF document is fine as long as the content does not overflow the page.
I want the invoice pages to be outlined with a border, it should contain:
the needed stuff required for the invoice to be valid
billed products/other information
place for supplier/customer signature and stamp or other data
All the pages HAVE to contain a header (company logo) and a footer (page # of # - Invoice/Document ID - Date and Time - office ID - Printer ID, assigned personnel, whatever someone can ask), as well as a border around the document body (under the header, above the footer).
Everything is fine as long as the document size is not bigger than
$pageSize-$pageMargins-$header-$footer-$invoiceDataBlock-$signaturesBlock
which leaves only about 10 cm for the actual invoiced items. If the document is bigger, I currently create an attachment for the invoice manually using a spreadsheet editor.
The question is: What can I do to create a multipage PDF document that has no problems like invoiced items overlaying the header/footer? I need to know when to continue on the next page. How do I know this? What is the best way to accomplish this task?
Thank you in advance!
I've used both FPDF and TCPDF to generate multi-page invoice files. They are roughly the same in terms of how they work. (I started with FPDF, then switched to TCPDF when I needed to include Unicode characters, which FPDF didn't support at the time.)
As Eugen suggested, you can hand-roll your own headers and footers more easily than using the functions built in to either FPDF or TCPDF.
My strategy for making sure I don't overwrite footers is simply to be careful with the data included on the invoice. When adding new SKUs, I test long names to make sure they will fit in their field in the invoice PDF. For items that must be variable-length, I put unknown content onto its own line to reduce possible impact:
Domain registration (2 years)
↳ example.com
As I generate each page of the invoice, I keep track of how many lines I've used. I know I can safely fit 20 lines of items on a page, and I know my longest single item takes 2 lines, so when I reach the 20-line limit I start a new page. 15 items means one page; 25 items means two pages. The line counter goes up with each item, and every time it hits the 20-line limit, the script generates the next page and resets the counter.
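The counting scheme above can be sketched roughly as follows. This is only an illustration in Python (the same loop ports directly to PHP); the function name `paginate_by_lines` and the `(description, line_count)` item shape are assumptions made for the example, not part of FPDF or TCPDF.

```python
def paginate_by_lines(items, max_lines_per_page=20):
    """Group (description, line_count) items into pages within a line budget."""
    pages = [[]]
    used = 0  # lines consumed on the current page
    for description, line_count in items:
        # Start a new page if this item would exceed the budget
        # (but never leave a page completely empty).
        if used + line_count > max_lines_per_page and pages[-1]:
            pages.append([])
            used = 0
        pages[-1].append(description)
        used += line_count
    return pages


# 25 single-line items spill onto a second page.
invoice_items = [("Item %d" % i, 1) for i in range(25)]
pages = paginate_by_lines(invoice_items)
print([len(page) for page in pages])
```

Checking the budget before adding each item means a two-line item never gets split across the page boundary.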
Note that I'm not including any code in this answer because you didn't include any code in your question. If you'd like help with implementation, I suspect that will be grounds for an additional question. :-)
Use TCPDF. It has a very handy SetY() / GetY() pair of functions that lets you know where on the page you are. You can use this to decide when to do a page break.
Hint: Do not use the Header/Footer capabilities - they are clunky. Draw your own headers/footers.
Edit
Following the discussion below, here are some details. To avoid overlaying, you have 2 possibilities:
Use getStringHeight() and calculate
Use Transactions
The first approach rests on the fact that, of all the objects you typically use when generating a PDF, a text flow is the only one whose height you cannot tell beforehand. getStringHeight() gives you a good enough estimate, so you know before adding the element whether it will fit on the page (leaving enough room at the bottom for the footer). So basically you extend your drawing loop to calculate the height of each element and test whether you need to start a new page first. This also allows for a sort of keep-together: for example, if the space remaining after a section title is too small, start a new page before the title to keep section title and section body together.
The second approach is even easier: in TCPDF you can use transactions similar to those of a database. Start a transaction, draw, and if the result is not to your liking, roll back; otherwise commit. We found this to be quite a performance hog and ultimately decided against it for long textual reports, but a 2-page invoice is a very different beast.
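A rough sketch of the height-based loop, written in Python only to show the decision logic. The block heights are plain numbers here; in TCPDF they would come from getStringHeight() for text and from known sizes for everything else. The name `place_blocks` and the `(name, height, keep_with_next)` tuple are made up for this example.

```python
def place_blocks(blocks, page_top, page_bottom):
    """Assign each (name, height, keep_with_next) block a page and a y position.

    page_top is the y just below the header; page_bottom is the y just
    above the footer. Units are whatever the document uses (e.g. mm).
    """
    placements = []
    page, y = 1, page_top
    for i, (name, height, keep_with_next) in enumerate(blocks):
        needed = height
        # Keep-together: a title also needs room for the block after it.
        if keep_with_next and i + 1 < len(blocks):
            needed += blocks[i + 1][1]
        # Break first if it will not fit (unless the page is still empty).
        if y + needed > page_bottom and y > page_top:
            page += 1
            y = page_top
        placements.append((name, page, y))
        y += height
    return placements


blocks = [
    ("title", 10, True),
    ("body", 200, False),
    ("title2", 10, True),
    ("body2", 30, False),
]
for placement in place_blocks(blocks, page_top=40, page_bottom=260):
    print(placement)
```

In this example the second title would still fit on page 1 by itself, but it moves to page 2 because its body would not fit with it.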
Related
Is it possible, using wkhtmltopdf, to define dynamic margins?
I'm generating a pdf for an invoice, on the first page I have the BVR in the footer so the user can cut it out and pay with it. On pages besides the first one, I have no footer.
The problem is that, when there is enough content to fill the second page completely, the page break still occurs at the footer margin set for the first page, leaving a third of that page empty.
Is it possible to define (with JavaScript or some other way) a dynamic margin size for the first page, and then remove the bottom margin on all the others?
Nope. You can create a feature request at https://github.com/wkhtmltopdf/wkhtmltopdf/issues, but I don't think this is coming any time soon because the underlying technology (webkit) has stabilised (doesn't change often) and doesn't really need this feature.
As a workaround, you could examine the content and, if it is long enough to run past the first page, split the HTML into two files and convert them individually, each with its own margins. Then you can join the resulting PDFs, with pdftk for example.
I'm developing a WebApp in which I take an invoice converted from PDF to HTML, then parse the invoice lines.
I have a div in my main window which displays the contents.
But when I display the contents from the invoice in that div, all the contents appear overlapped.
In the converted invoice there is no table, only divs with absolute positioning. I can't do it any other way, at least with this approach, because that's how the converter works.
So, as a solution I'm converting from "div to table", trying to decide when there is a change of row or not, based on the top parameter from the corresponding div.
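One way to make the "change of row" decision is to sort the divs by their top value and start a new row whenever the top jumps by more than a small tolerance. The sketch below is in Python just to show the idea (it ports to PHP with usort and a loop); the `(top, left, text)` tuples and the 2-pixel tolerance are assumptions, since the real values depend on the converter's output.

```python
def group_into_rows(cells, tolerance=2):
    """Group absolutely positioned cells into table rows.

    cells is a list of (top, left, text) tuples. Cells whose top is within
    `tolerance` of the first cell in a row are treated as the same row.
    Returns a list of rows, each a list of texts ordered left to right.
    """
    rows = []
    row_top = None
    for top, left, text in sorted(cells):
        if row_top is None or top - row_top > tolerance:
            rows.append([])  # top jumped: this cell starts a new row
            row_top = top
        rows[-1].append((left, text))
    # Within each row, order the cells by their left coordinate.
    return [[text for _, text in sorted(row)] for row in rows]


cells = [
    (101, 300, "2"),
    (100, 50, "Widget"),
    (120, 50, "Gadget"),
    (100, 400, "9.99"),
    (121, 300, "1"),
]
print(group_into_rows(cells))
```

The tolerance matters because converters often place cells of the same visual row a pixel or two apart.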
However, besides the invoice data, I also have the invoice header, and I'm having difficulty deciding whether a given div belongs to the same table or not.
But so far, I think the solution passes through making 3 tables, one for the company logo, one for the header, and one for the data.
But I need all these tables to appear in the correct positions and with the correct sizes.
At the moment I'm not allowed to paste invoice examples, and as I'm stuck at an early stage (close to the algorithm stage), I don't think any examples of my code or of the invoices would help anyone understand the situation better.
But I promise to update this with examples soon.
As an alternative solution I could parse the PDF myself, but I haven't found a way to do it so far.
I'm using PHP to make the WebApp and verypdf pdf2html to make the conversion.
I know that with so little information it is hard to get help.
Any ideas are welcome.
How about trying to cure the overlapping itself? For example, you could strip all the styling information from the DIVs after the PDF is converted into DIVs. Then you can apply your own styles.
It might be useful to know if all the invoices are in the same format/arrangement, or not.
A friend of mine works at a newspaper and asked me this on Monday, and I couldn't confirm whether it was possible or not.
I know it's possible to merge 2 PDFs using PHP (I've seen many other questions about that already answered), but what I'm not sure of is whether I can merge a half-page PDF to fill a space in another PDF.
Imagine the following:
I have PDF1, a half-page PDF, and then I have PDF2, a 3-page PDF.
On the first page of PDF2 I have an empty space to fit PDF1.
Can I do this? How?
I can't give you specific source code, but I can explain how to do it at the very low level. Also, what you're looking for is similar to what's called impositioning in the publishing industry.
You start out the same way as merging, which means pulling in pages from another document. You must bring in all dependencies of the page recursively. Watch out for infinite loops, though: reference cycles do exist in PDF, so you must keep track of visited objects. Don't use recursive functions, because PDF references can be very deep and your call stack will easily overflow. Implement the traversal with an explicit stack on the heap instead (depth-first search is fine).
The key to stamping PDF on PDF is to turn the source Page object into an XObject form (not to be confused with AcroForms or fillable form fields). An XObject form is very similar to a Page object, with the following exceptions:
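A minimal sketch of such a traversal, in Python for brevity. The `references` dictionary stands in for whatever your PDF library gives you when you ask an object for the indirect references it contains; that lookup, and the name `collect_dependencies`, are placeholders for this example.

```python
def collect_dependencies(start_id, references):
    """Depth-first walk over object references using an explicit stack.

    references maps an object id to the list of object ids it refers to.
    The visited set guards against cycles; the list-based stack lives on
    the heap, so very deep reference chains cannot overflow the call stack.
    """
    visited = set()
    order = []
    stack = [start_id]
    while stack:
        obj_id = stack.pop()
        if obj_id in visited:
            continue  # already copied, or part of a cycle
        visited.add(obj_id)
        order.append(obj_id)
        stack.extend(references.get(obj_id, []))
    return order


# Object 3 points back to 1 and object 4 points back to 2: two cycles.
references = {1: [2, 3], 2: [4], 3: [1], 4: [2]}
print(collect_dependencies(1, references))
```

Each object is visited exactly once, no matter how many other objects point to it.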
The /Type /Page becomes /Type /XObject /Subtype /Form.
The page MediaBox and CropBox together become /BBox in the form. But be careful, both of them can be inherited via the page tree, so you must look for inherited attributes.
The page Rotate (also inheritable) becomes Matrix, which is a transformation (rotation) matrix instead of an angle.
The page's Resources, Group and Metadata can be brought in unchanged and added to the form object.
The page Contents stream must be transferred to the form. However, the page Contents is an external object, and may be an array, which means you need to merge the pieces. The XObject form is a stream object.
All other attributes are tricky, and you might want to ignore them if you are unsure.
Once this is done, all you have to do is paint the XObject form on the new page. You have to generate a unique name for the XObject and add it to the page's Resources. Painting itself is a cm operator followed by a Do operator, just like painting an image. If you need to crop the original content, then you also need to set a clipping path before Do.
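For illustration, here is roughly what the painting operators look like in the target page's content stream, generated by a small Python helper. The helper name, the resource name `Fm1`, and the coordinates are made up for the example. Wrapping the sequence in q/Q (save and restore graphics state) is an addition here, so the matrix and the clip do not leak into whatever is drawn afterwards.

```python
def paint_form_xobject(name, tx, ty, scale=1.0, clip=None):
    """Build content-stream operators that paint a form XObject.

    name  -- the XObject's name in the page's /Resources (without the slash)
    tx,ty -- where to place the form's origin, in page units
    scale -- uniform scale factor applied to the form
    clip  -- optional (x, y, width, height) clipping rectangle in page units
    """
    ops = ["q"]  # save graphics state
    if clip is not None:
        x, y, w, h = clip
        # re = rectangle path, W = use it as clip, n = end path without painting
        ops.append(f"{x:g} {y:g} {w:g} {h:g} re W n")
    ops.append(f"{scale:g} 0 0 {scale:g} {tx:g} {ty:g} cm")  # place and scale
    ops.append(f"/{name} Do")  # paint the XObject
    ops.append("Q")  # restore graphics state
    return "\n".join(ops)


print(paint_form_xobject("Fm1", 36, 400, scale=0.5, clip=(36, 400, 300, 200)))
```

Because the clip is set before cm, its rectangle is given in the target page's coordinates, not the form's.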
Needless to say, this is far from trivial, and there are lots of pitfalls. I have implemented this and I can tell you it really works, but it's harder than it seems. You must have a very good low level PDF library, and a very thorough understanding of the PDF specs.
I haven't discussed some of the other details, such as color management (what if you paint DeviceRGB on managed CMYK), PDF/A, PDF/X, transferring annotation and form fields, etc.
If this is beyond you, you should be looking for an open-source impositioning library, because it does pretty much the same. Impositioning means placing two or more pages on a blank sheet of paper, with the purpose of printing a book or a flyer. I do have a commercial solution as well.
I've got a report that can generate over 30,000 records if given a large enough date range. From the HTML side of things, a resultset this large is not a problem since I implement a pagination system that limits the viewable results to 100 at a given time.
My real problem occurs once the user presses the "Get PDF" button. When this happens, I essentially re-run the portion of the report that prints the data (the results of the report itself are stored in a 'save' table so there's no need to re-run the data-gathering logic), and store the results in a variable called $html. Keep in mind that this variable now contains 30,000 records of data plus the HTML needed to format it correctly on the PDF. Once I've got this HTML string created, I pass it to TCPDF to try and generate the PDF file for the user. However, instead of generating the PDF file, it just craps out without an error message (the 'Generating PDF...' dialog disappears and the system acts like you never asked it to do anything).
Through tests, I've discovered that the problem lies in the size of the $html variable being passed in. If the report is under 3K records, it works fine. If it's over that, the HTML side of the report will print but not the PDF.
Helpful Info
PHP 5.3
TCPDF for PDF generation (also tried PS2PDF)
Script Memory Limit: 500 MB
How would you guys handle this scale of data when generating a PDF of this size?
Here is how I solved this issue: I noticed that some of the strings that I was having in my HTML output had some slight encoding issues - I ran htmlentities on those particular strings as I was querying the database for them and that cleared the problem.
I don't know if this is what was causing your problem, but my experience was very similar: when I tried to output a large HTML table, with about 80,000 rows, TCPDF would display the page header but nothing table-related. This behaviour was the same with different sets of data and different table structures.
After many attempts I started adding my own pagination: every 15 table rows, I would break the page and add a new table to the following page. That's when I noticed that every once in a while I would get blank pages among a lot of full and correct ones. That's when I realised that there must be a problem with those particular subsets of data, and discovered the encoding issue. It may be that you had something similar and TCPDF was not making it clear what your problem was.
Are you using the writeHTML method?
I went through the performance recommendations here: http://www.tcpdf.org/performances.php
It says "Split large HTML blocks in smaller pieces;".
I found that if my blocks of HTML went over 20,000 characters the PDF would take well over 2 minutes to generate.
I simply split my html up into the blocks and called writeHTML for each block and it improved dramatically. A file that wouldn't generate in 2 minutes before now takes 16 seconds.
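The splitting step might look something like this. It is a sketch in Python of the chunking logic only; in PHP, each resulting fragment would be passed to its own writeHTML() call. It assumes the report is a single table whose rows are already available as separate `<tr>...</tr>` strings, and the chunk size of 100 rows is an arbitrary starting point to tune.

```python
def html_table_chunks(rows, rows_per_chunk=100):
    """Yield self-contained <table> fragments of at most rows_per_chunk rows.

    rows is a list of '<tr>...</tr>' strings. Each yielded fragment opens
    and closes its own table, so it can be rendered independently.
    """
    for start in range(0, len(rows), rows_per_chunk):
        chunk = rows[start:start + rows_per_chunk]
        yield "<table>" + "".join(chunk) + "</table>"


rows = [f"<tr><td>Record {i}</td></tr>" for i in range(250)]
chunks = list(html_table_chunks(rows))
print(len(chunks), [chunk.count("<tr>") for chunk in chunks])
```

Splitting on row boundaries, rather than at a fixed character count, keeps every fragment valid HTML.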
TCPDF seems to be implemented entirely in PHP. You may get better performance from a compiled library like PDFlib or a command-line app like htmldoc. The latter probably has the best chance of generating a large PDF.
Also, are you breaking the output PDF into multiple pages? I.e. does TCPDF know to take a single HTML document and cut it into multiple pages, or are you generating multiple HTML files for it to combine into a single PDF document? That may also help.
I would break the PDF into parts, just like pagination.
1) Have "Get PDF" button on every paginated HTML page and allow downloading of records from that HTML page only.
2) Limit the maximum number of records that can be downloaded at once. If the limit is reached, split the PDF and let the user download multiple PDFs.
O.K. so I'm developing a website to feature my fiction writings. I'm putting all of my documents into XML files, pulling and parsing them from the server with PHP and displaying them on the page. You can visit the page here for an example.
As the background image implies, what I would like to do is take the text and split it into two columns (with the text from the first spilling into the second), then paginate the overflow so that no scrolling is necessary. In other words, I'd like the text to read like a book, with the paging based on how long the body of the XML document is.
I would like for this to be done on the server side using PHP or something similar. Is there a way I can do this with an xsl stylesheet or a server-side script? I've been looking everywhere and can't seem to find anything.
Any help is appreciated.
Mr. Mutant
This is a surprisingly hard problem in general, and it's one you'll have no end of trouble with if you try to do it on the server. The problem with paginating HTML text is that where the page breaks go is entirely contingent on the client. The server doesn't know the client's screen resolution, font selection, or window size, and apart from the text itself those are the variables the problem depends on.
I'd be surprised if at this point there weren't some jQuery library that just does this, but when I had to implement it myself about 7 years ago, here's the approach I took:
Create a div for each column. Each one contains the entirety of the document text. Style the divs with a fixed line height. Put the column divs at the bottom of the document's z-order. Now you can lay out the rest of the page, leaving holes of known size in the layout that the divs can show through, and by manipulating the vertical position of each div you can control which line is the first to appear inside a given hole.
You can then let the client manipulate the font size, and as long as you recalculate the height of the holes and then reposition the divs properly, it will all magically work.
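The repositioning arithmetic is simple once the line height is fixed. Here is a sketch of it in Python (on a real page this would run in client-side script); the function names and the pixel units are assumptions for the example, and it assumes every hole shows a whole number of lines.

```python
import math


def column_offsets(page_index, columns, lines_per_column, line_height_px):
    """Return the vertical offset (px) for each column div on a given page.

    Every div holds the full text. Shifting a div up by N lines makes
    line N the first one visible through that column's hole.
    """
    first_line = page_index * columns * lines_per_column
    return [
        -(first_line + column * lines_per_column) * line_height_px
        for column in range(columns)
    ]


def page_count(total_lines, columns, lines_per_column):
    """Number of pages needed to show total_lines of text."""
    return math.ceil(total_lines / (columns * lines_per_column))


# Two columns of 30 lines each, with a 20 px line height.
print(column_offsets(0, 2, 30, 20))
print(column_offsets(1, 2, 30, 20))
print(page_count(200, 2, 30))
```

When the font size changes, only `lines_per_column` and `line_height_px` need recomputing before calling these again.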
There may be ways of doing this in HTML5 that are easier; I would definitely look into that.