dompdf memory issues - php

I'm using DOMPDF to generate about 500 reports from one script. It's running out of memory after about 10-15 PDFs have been generated.
In debugging, it looks like it's loading about 8MB every time it reaches the font-loading code, but that seems like something the font caching should handle.
Any ideas of what's going wrong here? I'd like to post a simple code snippet, but most of it is abstracted into multiple layers, so it's not just a simple copy/paste.

If you're using dompdf 0.6 beta, the memory error is the result of an infinite loop that dompdf enters when rendering tables. This is a known issue that I haven't been able to resolve.
Relevant URLs:
http://code.google.com/p/dompdf/issues/detail?id=34
http://code.google.com/p/dompdf/issues/detail?id=91
(The error you see is: PHP Fatal error: Allowed memory size of 268435456 bytes exhausted)

First, if this is for anything remotely commercial, just get Prince XML. It's substantially better and faster than any other HTML-to-PDF solution (and I've looked at them all). The cost will quickly be recouped in saved developer time.
Second, the quickest solution is probably to render each report in a separate process, which sidesteps any memory-leak problems. If this is running from the command line, make the outer loop something like a shell script that starts a new process for each report. If it's run from the web, fork a process per report, assuming you're on an OS that supports that.
Take a look at Convert HTML + CSS to PDF with PHP?.

As indicated by cletus, the quickest solution for you with DOMPDF is probably going to be rendering each report in a separate process. You can write a master script that calls a child script (using exec) which performs the actual rendering. As you can see in this discussion on the DOMPDF support group, it does seem to have the potential to provide a bit of a boost in performance.
It's difficult to say what's going on otherwise regarding memory usage without some kind of example that demonstrates the problem. I don't believe there is much optimization of DOMPDF and the underlying CPDF rendering engine for multiple instances in a single script. So the font is probably being loaded into memory each time, even though it could use a static variable to cache that data.
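Here's a minimal sketch of that master/child split, assuming a hypothetical child script render_report.php that takes a report ID and does the actual dompdf rendering:

    <?php
    // Hypothetical master script: each report is rendered in its own PHP
    // process, so dompdf's memory is fully released between reports.
    $reportIds = range(1, 500); // placeholder list of report IDs

    foreach ($reportIds as $id) {
        $cmd = sprintf(
            'php %s %s',
            escapeshellarg(__DIR__ . '/render_report.php'), // assumed child script
            escapeshellarg((string) $id)
        );
        exec($cmd, $output, $exitCode);
        if ($exitCode !== 0) {
            error_log("Report $id failed with exit code $exitCode");
        }
    }

Each child process starts with a clean memory footprint, so font data and any leaked allocations are discarded when it exits.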

Related

PHP file() vs fopen()+fgets() performance debate

I am in the process of rewriting some scripts that parse machine-generated logs from Perl to PHP.
The files range from 20MB to 400MB.
I am trying to decide whether I should use file() or the fopen()+fgets() combo to go through the file, for better performance.
Here is the basic run-through:
I check the file size before opening it, and if the file is larger than 100MB (a pretty rare case, but it does happen from time to time) I go the fopen()+fgets() route, since I only bumped the memory limit for the script to 384MB and any file larger than 100MB has a chance of causing a fatal error. Otherwise, I use file().
I am only going through the file once, from beginning to end, line by line, in both methods.
Here is the question: is it worth keeping the file() part of the code to deal with the small files? I don't know exactly how file() works in PHP (I use the FILE_SKIP_EMPTY_LINES option as well): does it map the file into memory directly, or does it read it into memory line by line as it goes? I ran some benchmarks; performance is pretty close, the average difference is about 0.1s on a 40MB file, and file() has the advantage over fopen()+fgets() about 80% of the time (out of 200 tests on the same file set).
Dropping the file() part would certainly save me some system memory, and considering I have 3 instances of the same script running at the same time, it could save me 1GB of memory on a 12GB system that's also hosting the database and other things. But I don't want to hurt the script's performance either, since around 10k of these logs come in per day and a 0.1s difference really adds up.
Any suggestions would help, and TIA!
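For reference, a minimal sketch of the size-based branching described above; the threshold, log path, and parseLine() helper are placeholders:

    <?php
    // Sketch of the branching described in the question: stream large files,
    // slurp small ones. The threshold, path, and parseLine() are placeholders.
    const SIZE_THRESHOLD = 100 * 1024 * 1024; // 100MB

    function parseLine(string $line): void
    {
        // Placeholder for the real parsing logic.
    }

    $path = '/var/log/machine.log'; // hypothetical log file

    if (filesize($path) > SIZE_THRESHOLD) {
        // Large file: read line by line to stay under the memory limit.
        $fh = fopen($path, 'r');
        while (($line = fgets($fh)) !== false) {
            parseLine($line);
        }
        fclose($fh);
    } else {
        // Small file: load it all at once, skipping empty lines.
        foreach (file($path, FILE_SKIP_EMPTY_LINES) as $line) {
            parseLine($line);
        }
    }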
I would suggest sticking with one mechanism, like foreach (new \SplFileObject('file.log') as $line). Split your input files and process them in parallel, 2-3 processes per CPU core. Bonus: run them at a lower priority than the database on the same system. In PHP, this would mean spawning off N copies of the script at once, where each copy has its own file list or directory. Since you're talking about a rewrite and I/O performance is an issue, consider other platforms with more capabilities here, e.g. Java 7 NIO, Node.js asynchronous I/O, C# TPL.
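A minimal sketch of that single mechanism (the flags and the processLine() call are illustrative):

    <?php
    // SplFileObject streams the file line by line, so memory use stays flat
    // whether the file is 20MB or 400MB.
    $file = new \SplFileObject('file.log');
    $file->setFlags(\SplFileObject::DROP_NEW_LINE | \SplFileObject::SKIP_EMPTY);

    foreach ($file as $line) {
        // processLine($line); // placeholder for the real parsing logic
    }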

how to reduce memory usage in image gd or speed up the process to deallocate memory faster

I have an image GD script which is currently using around 9MB of memory.
I get a lot of traffic, so at times it uses up a huge amount of RAM on my server.
Is there any way to reduce memory usage for image GD?
Or at least make the script run faster so that it releases the memory it is using sooner.
I have tried changing image quality, it had no effect.
I also tried changing image pixel size, it reduced the memory usage, but not much.
Thanks.
It's impossible to tell without seeing the code, but unless it contains major mistakes, the answer is probably no.
What you might be able to do is use the external imagemagick binary instead - it runs outside PHP's script memory limit - but that is an entirely different technology, and would require you to rewrite your code.
I assume you are already caching GD's results so not every request causes it to run?
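For illustration, a hedged sketch of that caching idea: GD only runs on a cache miss, and its resources are freed explicitly (the paths and sizes are placeholders):

    <?php
    // Serve a cached thumbnail if it exists; otherwise generate it once with GD,
    // write it to disk, and free GD's memory immediately.
    $source = 'photos/original.jpg';      // hypothetical source image
    $cached = 'cache/thumb_200x200.jpg';  // hypothetical cache path

    if (!file_exists($cached)) {
        $src = imagecreatefromjpeg($source);
        $dst = imagecreatetruecolor(200, 200);
        imagecopyresampled($dst, $src, 0, 0, 0, 0, 200, 200, imagesx($src), imagesy($src));
        imagejpeg($dst, $cached, 85);

        // Release GD's image buffers instead of waiting for the request to end.
        imagedestroy($src);
        imagedestroy($dst);
    }

    header('Content-Type: image/jpeg');
    readfile($cached);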
Try to avoid running image GD on the fly if you are concerned about memory limits.
It's hard to solve the problem without seeing code, but I can make a suggestion.
Have a different process handle the images. For example, if you want to resize images, don't resize them every time a user accesses a page; instead, run a cron job or scheduled task that periodically resizes all the images that need it and saves the results, so there is less overhead per request (see the sketch after this answer).
If you provide more code you will get better help.
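A hedged sketch of that batch approach, assuming hypothetical uploads/ and resized/ directories and a fixed target size:

    <?php
    // Cron script: resize any new images ahead of time so web requests never
    // invoke GD. Directories and the 800x600 target size are placeholders.
    $sourceDir = 'uploads/';
    $targetDir = 'resized/';

    foreach (glob($sourceDir . '*.jpg') as $path) {
        $target = $targetDir . basename($path);
        if (file_exists($target)) {
            continue; // already resized on a previous run
        }

        $src = imagecreatefromjpeg($path);
        $dst = imagecreatetruecolor(800, 600);
        imagecopyresampled($dst, $src, 0, 0, 0, 0, 800, 600, imagesx($src), imagesy($src));
        imagejpeg($dst, $target, 85);

        imagedestroy($src);
        imagedestroy($dst);
    }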

PHP script: How big is too big?

I'm developing a webapp in PHP, and the core library is 94kb in size at this point. While I think I'm safe for now, how big is too big? Is there a point where the script's size becomes an issue, and if so can this be ameliorated by splitting the script into multiple libraries?
I'm using PHP 5.3 and Ubuntu 10.04 32bit in my server environment, if that makes any difference.
I've googled the issue, and everything I can find pertains to PHP upload size only.
Thanks!
Edit: To clarify, the 94kb file is a single file that contains all my data access and business logic, and a small amount of UI code that I have yet to extract to its own file.
Do you mean you have one file that is 94KB in size, or that your whole library is 94KB in total?
Regardless, as long as you aren't piling everything into one file and you're organizing your library into different files, your file sizes should remain manageable.
If a single PHP file is starting to hit a few hundred KB, you have to think about why that file is getting so big and refactor the code to make sure that everything is logically organized.
I've used PHP applications that probably included several megabytes worth of code; the main thing if you have big programs is to use a code caching tool such as APC on your production server. That will cache the compiled (to byte code) PHP code so that it doesn't have to process every file for every page request and will dramatically speed up your code.
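If you go that route, enabling APC is mostly a php.ini matter; the values below are illustrative, not a recommendation:

    ; Enable the APC opcode cache (illustrative settings)
    apc.enabled = 1
    apc.shm_size = 64M   ; shared memory available for cached byte code
    apc.stat = 1         ; set to 0 in production to skip file stat checks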

Parse very large xml files with PHP

I'm working on a PHP project, and I need to parse a large XML file (>240MB) from a URL. I used XMLReader; it works on localhost, but it's not working on shared hosting (BlueHost), where it shows a 404 error: http://webmashing.com/meilleures-des/cronjob?type=sejours
Does this task need a dedicated server? If yes, please give me a suggestion.
By the way, would splitting the XML file help?
XMLReader is a pull parser, so it doesn't load the entire file into memory as you parse it, so splitting the file will have no effect other than adding complexity to your code. However, if you're holding all the details that you parse in your script, that will take up a lot of memory.
However, you should be getting some error or message from running the script on your shared hosting to identify what the problem is. Was their version of PHP built with --enable-libxml? Are you getting a memory allocation error?
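A minimal XMLReader sketch of that streaming approach; the element name sejour and the feed URL are assumptions for illustration:

    <?php
    // Stream the feed record by record so memory stays flat even for a 240MB file.
    $reader = new XMLReader();
    $reader->open('http://example.com/large-feed.xml'); // placeholder URL

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'sejour') {
            // Load only this one record into memory and process it.
            $record = simplexml_load_string($reader->readOuterXml());
            // ... handle $record, then let it go out of scope ...
        }
    }

    $reader->close();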
You could use a SAX (Simple API for XML) parser, which is also a good solution for reading a huge XML file.
It will not load the whole file into memory, which prevents the memory exhaustion problem. Yes, it will take time to read such a huge file.
You may need to check whether your PHP has the libxml/libxml2 modules installed, using the phpinfo() function.
But it's better if you can go with XMLReader, as it is faster and saves memory. You can check peak memory usage with memory_get_peak_usage().
And read the file row by row, unsetting each row from the array once you're done with that particular row.
Guessing it's a memory-related issue (check your memory and execution time limits).
For what it's worth, I have used vtd-xml (the Java implementation) to parse files over 500MB with success (low memory footprint and fast, maybe the fastest execution time).

PDFLib in PHP hogging resources and not flushing to file

I just inherited a PHP project that generates large PDF files and usually chokes after a few thousand pages and several gigs of server memory. The project was using PDFLib to generate these files 'in memory'.
I was tasked with fixing this, so the first thing I did was send PDFLib output to a file instead of building it in memory. The problem is, it still seems to be building the PDFs in memory, and much of that memory never seems to be returned to the OS. Eventually, the whole thing chokes and dies.
When I task the program with building only snippets of the large PDFs, it seems that the data is not fully flushed to the file on end_document(). I get no errors, yet the PDF is not readable and opening it in a hex editor makes it obvious that the stream is incomplete.
I'm hoping that someone has experienced similar difficulties.
Solved! I needed to call PDF_delete_textflow() on each textflow, as they are given document scope and don't go away until the document is closed, which never happened, since all available memory was exhausted before that point.
You have to make sure that you are closing each page as well as closing the document. This is done by calling end_page_ext at the end of every written page.
Additionally, if you are importing pages from another PDF you have to call close_pdi_page after each imported page and close_pdi_document when you're done with each imported document.
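A hedged sketch of that cleanup pattern using PDFlib's procedural API: write straight to a file, delete each textflow as soon as it is placed, and end every page. The page size, font options, and $pages contents are placeholders:

    <?php
    $pages = ["First page text...", "Second page text..."]; // placeholder content

    $p = PDF_new();
    PDF_begin_document($p, "/tmp/report.pdf", ""); // write to a file, not in memory

    foreach ($pages as $text) {
        PDF_begin_page_ext($p, 595, 842, ""); // A4 page in points

        $tf = PDF_create_textflow($p, $text, "fontname=Helvetica fontsize=12 encoding=unicode");
        PDF_fit_textflow($p, $tf, 50, 50, 545, 792, "");
        PDF_delete_textflow($p, $tf); // release the document-scoped textflow right away

        PDF_end_page_ext($p, ""); // close every page you open
    }

    PDF_end_document($p, "");
    PDF_delete($p);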