Speed up wkhtmltopdf in Laravel to print thousands of pages - php

I am using Laravel 8 and wkhtmltopdf to convert dynamically generated HTML to PDF files. The problem is that the process is too slow. Each job is split into at least 20 chunks, and each chunk is a job in a batch that then gets queued according to these instructions. Using supervisord, I am spawning 20 PHP processes. When I run the document generator, all 20 PHP processes get to work, and they launch 20 wkhtmltopdf processes. Everything is happening in the same docker container.
Increasing resources from 1 CPU and 1GB RAM to 2 CPUs and 4GB of RAM nearly doubled the speed. Sadly, going from that to 4 CPUs and 8GB of RAM has not brought any measurable speed gain.
Current speed is 4 files per second. I have tens of thousands to process. Where can I get more performance? What can I change without adding more hardware resources?
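For reference, a minimal sketch of the kind of setup described above -- chunked work dispatched as a Laravel 8 batch, each job shelling out to wkhtmltopdf -- assuming a hypothetical GeneratePdfChunk job and a $chunks array prepared by the document generator (neither is from the original post):
<?php
// Illustrative only: a chunked-PDF job plus the batch dispatch. In a real app
// the job class would live in app/Jobs; $chunks is assumed to be an array of
// ['html' => ..., 'path' => ...] entries produced by the document generator.

use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Facades\Bus;

class GeneratePdfChunk implements ShouldQueue
{
    use Batchable, Dispatchable, Queueable;

    public $html;
    public $outputPath;

    public function __construct(string $html, string $outputPath)
    {
        $this->html = $html;
        $this->outputPath = $outputPath;
    }

    public function handle(): void
    {
        // Write the chunk's HTML to a temp file and let wkhtmltopdf render it.
        // With 20 queue workers under supervisord, up to 20 of these
        // wkhtmltopdf processes run concurrently.
        $htmlFile = tempnam(sys_get_temp_dir(), 'chunk_') . '.html';
        file_put_contents($htmlFile, $this->html);

        exec(sprintf(
            'wkhtmltopdf --quiet %s %s',
            escapeshellarg($htmlFile),
            escapeshellarg($this->outputPath)
        ));

        unlink($htmlFile);
    }
}

// Dispatch every chunk as one job inside a single batch.
Bus::batch(array_map(function ($chunk) {
    return new GeneratePdfChunk($chunk['html'], $chunk['path']);
}, $chunks))->dispatch();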

Related

Best practices to reduce the RAM load on the server

I recently saw the stats on my VPS dashboard, and I wonder why RAM usage is so high when the number of visitors hasn't been booming.
Here are the product details:
OS: CentOS 7 64-bit
CPU cores: 3
Total CPU speed: 4800 MHz
Memory: 2 GB
The number of online visitors averages about 30 people, which takes up about 50% of the available memory. So I estimate that if the number of online visitors reached 60, RAM would be overloaded.
Is this a reasonable level, or do I need to set up a strategy to prevent the site from going down?
Additional information: the site is NOT built with WordPress or anything like it. All suggestions and opinions are greatly appreciated, thank you.
The subject is too broad.
Manage your cache.
Make use of virtual memory.
Use pointers and references when programming: even if you are not using C++, you can still pass large objects and files by reference (see the sketch after this list).
Kill unused processes.
Uninstall unnecessary applications, including startup applications if you know what you are doing.
Do not open many windows at once.
Prefer text and command-line utilities over graphical interfaces.
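A minimal PHP sketch of the pass-by-reference point above (the function name and data are made up for illustration):
<?php
// Passing a large array by reference (&) lets the function modify it without
// triggering the copy that a by-value modification would cause; PHP objects
// are already passed by handle, so this mainly matters for big arrays/strings.

function addTimestamps(array &$rows)
{
    foreach ($rows as &$row) {
        $row['processed_at'] = time();
    }
    unset($row); // break the reference left behind by foreach
}

$rows = array_fill(0, 100000, array('value' => 42)); // stand-in for a large dataset
addTimestamps($rows);
echo memory_get_peak_usage(true) . " bytes peak\n";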

Apache/PHP only uses 50% of CPU in Windows on VMWare

In Windows Server 2012 with 2 CPUs, I have Apache 2.4 with PHP 5.6, and when I generate a PDF document with DOMPDF, a cumulative 50% of the total CPU power is used. No matter what I do, I cannot get the total over 50%. I tried opening a bunch of windows and creating multiple series of PDF docs at the same time.
Each individual CPU will be less than 50%, and if one spikes up, the other spikes down at the same time. It seems like Windows is limiting the Apache service to 50% of the CPU. Is there somewhere to change this?
Edit: my application is already utilizing both CPUs, just not to their full capacity, and after 60 seconds of load the utilization moves to 100%. I think it is not anything to do with threading... maybe an environment setting?
It's not a Windows limitation but the program design itself. I think it is related to CPU cores (for example, the machine has 4 cores and the program uses just 2, which is literally 50%).
As far as I know you cannot do anything about this; the work cannot be split across more cores without proper program design.
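For what it's worth, a hedged sketch of what "splitting across processes" could look like: a single PHP/DOMPDF request is single-threaded and can saturate at most one core, so the only way to load both CPUs is to run several PHP processes at once (generate_pdf.php is a hypothetical worker script, not part of the original setup):
<?php
// Launch several independent PHP processes so the OS can spread them over
// both CPUs; one process per document to render.

$documents = array('report1', 'report2', 'report3', 'report4');
$handles = array();

foreach ($documents as $doc) {
    // Each render runs in its own OS process.
    $handles[] = popen('php generate_pdf.php ' . escapeshellarg($doc), 'r');
}

// Wait for every worker to finish and collect its output.
foreach ($handles as $h) {
    stream_get_contents($h);
    pclose($h);
}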

PHP 5.5, under what circumstances will PHP cause very high committed memory

I am trying to figure out a situation where PHP is not consuming a lot of memory but instead causes a very high Committed_AS result.
Take this munin memory report for example:
As soon as I kick off our Laravel queue (10~30 workers), committed memory goes through the roof. We have 2 GB of memory + 2 GB of swap on this VPS instance, and so far there is about 600 MB of unused memory (that's about 30% free).
If I understand Committed_AS correctly, it's meant to be a 99.9% guarantee of no out-of-memory issue given the current workload, and it seems to suggest we need to triple our VPS memory just to be safe.
I tried reducing the number of queues from 30 to around 10, but as you can see, the green line is still quite high.
As for the setup: Laravel 4.1 on PHP 5.5 with OPcache enabled. The upstart script we use spawns instances like the following:
instance $N
exec start-stop-daemon --start --make-pidfile --pidfile /var/run/laravel_queue.$N.pid --chuid $USER --chdir $HOME --exec /usr/bin/php artisan queue:listen -- --queue=$N --timeout=60 --delay=120 --sleep=30 --memory=32 --tries=3 >> /var/log/laravel_queue.$N.log 2>&1
I have seen a lot of cases where high swap usage implies insufficient memory, but our swap usage is low, so I am not sure what troubleshooting step is appropriate here.
PS: we didn't have this problem prior to Laravel 4.1 and our VPS upgrade; here is an image to prove that.
Maybe I should rephrase my question as: how exactly is Committed_AS calculated, and how does PHP factor into it?
Updated 2014.1.29:
I have a theory about this problem: since the Laravel queue worker actually uses PHP's sleep() while waiting for a new job from the queue (in my case beanstalkd), the high Committed_AS estimate would be due to the relatively low workload combined with relatively high memory consumption.
This makes sense, as I see Committed_AS ~= avg. memory usage / avg. workload. While PHP sleep()s, little to no CPU is used, yet whatever memory the process consumes is still reserved. The result is the server thinking: hey, you use so much memory (on average) even when load is minimal (on average), so you'd better be prepared for higher load (but in this case, higher load doesn't result in a higher memory footprint).
If anyone would like to test this theory, I will be happy to award the bounty to them.
Two things you need to understand about Committed_AS:
It is an estimate.
It indicates how much memory you would need in a worst-case scenario (plus the swap). It depends on your server's workload at the time: if you have a lower workload, then Committed_AS will be lower, and vice versa.
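If you want to check these numbers yourself, here is a small sketch (not part of the original answer) that reads Committed_AS and CommitLimit from /proc/meminfo on Linux:
<?php
// Reads the current commit estimate and the kernel's commit limit (Linux only).
$meminfo = file_get_contents('/proc/meminfo');

preg_match('/^CommitLimit:\s+(\d+) kB/m', $meminfo, $limit);
preg_match('/^Committed_AS:\s+(\d+) kB/m', $meminfo, $committed);

printf(
    "Committed_AS: %.0f MB of CommitLimit: %.0f MB\n",
    $committed[1] / 1024,
    $limit[1] / 1024
);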
If this wasn't an issue with the prior iteration of the framework queue and provided you haven't pushed any new code changes to the production environment, then you will want to compare the two iterations. Maybe spin up another box and run some tests. You can also profile the application with xdebug or zend_debugger to discover possible causal factors with the code itself. Another useful tool is strace.
All the best, you're going to need it!
I have recently found the root cause of this high committed memory problem: the PHP 5.5 OPcache settings.
It turns out that setting opcache.memory_consumption = 256 causes each PHP process to reserve much more virtual memory (visible in the VIRT column of the top command), which results in Munin estimating the potential committed memory to be much higher.
The number of Laravel queue workers we have running in the background only exaggerates the problem.
By setting opcache.memory_consumption to the recommended 128 MB (we really weren't using all those 256 MB effectively), we cut the estimated value in half. Coupled with the recent RAM upgrade on our server, the estimate is now around 3 GB, which is much more reasonable and within our total RAM limit.
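A small sketch of how actual OPcache usage can be compared against the configured size (this is how you would confirm the 256 MB was never filled); it assumes OPcache is loaded in the SAPI you run it from:
<?php
// Compare how much of the reserved OPcache memory is actually in use.
// Run this in the same SAPI as your workers (for CLI workers, with
// opcache.enable_cli=1); otherwise opcache_get_status() returns false.
$status = opcache_get_status(false); // false = skip per-script details

$usage = $status['memory_usage'];
$total = $usage['used_memory'] + $usage['free_memory'] + $usage['wasted_memory'];

printf(
    "OPcache: %.1f MB used of %.1f MB reserved (%.1f%% wasted)\n",
    $usage['used_memory'] / 1048576,
    $total / 1048576,
    $usage['current_wasted_percentage']
);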
Committed_AS is the size that the kernel has actually promised to processes. Queues run independently and have nothing to do with PHP or Laravel. In addition to what Rijndael said, I recommend installing New Relic, which can be used to track down the problem.
Tip: I've noticed a huge reduction in server load with the Nginx-HHVM combination. Give it a try.

PHP high memory usage

We have an old Facebook app, written in native PHP, that runs smoothly.
This month we decided to rewrite it in Zend Framework 2. Yesterday, after switching to the new app, it crashed our server with lots of out-of-memory errors, so we switched back to the old app.
I installed Xdebug to profile the app. Using the memory_get_peak_usage() function, I noticed high memory usage.
In the old app, a static page uses only 1 MB of memory, but the new one uses approximately 7-8 MB on the same page.
Here are the top two rows from webgrind:
Function                                   Invocation Count   Total Self Cost   Total Inclusive Cost
Composer\Autoload\ClassLoader->loadClass   224                23.31             47.20
Composer\Autoload\ClassLoader->findFile    224                9.57              10.23
I also tried Apache's ab tool:
ab -n 50 -c 5 -C PHPSESSID=SESSIONID http://myhost.com
Result is:
Percentage of the requests served within a certain time (ms)
50% 368
66% 506
75% 601
80% 666
90% 1073
95% 1812
98% 2278
99% 2278
100% 2278 (longest request)
All these results are from the production server, not localhost.
Is 7-8 MB for a single page normal? If not, how can I reduce it? Should I look into ZF2 or Composer?
I can give code samples if you need. Thank you.
When you migrate a solution from native PHP to Zend, you must be aware of the way Zend works.
Zend is composed of a lot of classes, and memory use increases when you use objects instead of native/lightweight structures.
To improve memory use, review your code and do the following:
Wrap some code in functions; it helps the garbage collector remove unused objects from memory.
Don't store large lists of objects in arrays before printing them; just print on the fly (see the sketch below).
Limit the creation of objects (calls to 'new') in loops.
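A minimal sketch of the "print on the fly" point (the table, column names and PDO credentials are made-up examples):
<?php
// Emit each row as soon as it is fetched instead of accumulating everything
// in an array and rendering at the end; only one row is in memory at a time.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$stmt = $pdo->query('SELECT id, name FROM products');

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo htmlspecialchars($row['name']), "<br>\n"; // print on the fly
    // ...rather than $rows[] = $row; and looping over $rows later.
}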
Hope this helps.
I spent a day figuring out the problem. I tried Xdebug and XHProf; there was no problem in the code.
We switched back to 2.0.0 and the problem was solved. I don't know what is wrong with the newer versions; for now we are sticking with 2.0.0.
Overall memory usage is around 4 MB, with no crashes.
composer.json:
"zendframework/zendframework": "2.0.0",

Increasing PHP memory_limit. At what point does it become insane?

In a system I am currently working on, there is one process that loads a large amount of data into an array for sorting/aggregating/whatever. I know this process needs optimising for memory usage, but in the short term it just needs to work.
Given the amount of data loaded into the array, we keep hitting the memory limit. It has been increased several times, and I am wondering: is there a point where increasing it becomes generally a bad idea, or is it only a matter of how much RAM the machine has?
The machine has 2GB of RAM and the memory_limit is currently set at 1.5GB. We can easily add more RAM to the machine (and will anyway).
Have others encountered this kind of issue? and what were the solutions?
The configuration of PHP's memory_limit when PHP runs as an Apache module serving web pages has to take into consideration how many Apache processes you can have running at the same time on the machine -- see the MaxClients configuration option for Apache.
If MaxClients is 100 and you have 2,000 MB of RAM, a very quick calculation shows that you should not use more than 20 MB for the memory_limit value (because 20 MB * 100 clients = 2 GB of RAM, i.e. the total amount of memory your server has).
And this is without considering that there are probably other things running on the same server, like MySQL, the system itself, ... and that Apache is probably already using some memory for itself.
Of course, this is also a worst-case scenario that assumes each PHP page is using the maximum amount of memory it can.
In your case, if you need such a big amount of memory for only one job, I would not increase the memory_limit for PHP running as an Apache module.
Instead, I would launch that job from the command line (or via a cron job), and specify a higher memory_limit specifically in this one and only case.
This can be done with the -d option of php, like:
$ php -d memory_limit=1GB temp.php
string(3) "1GB"
Considering, in this case, that temp.php only contains:
var_dump(ini_get('memory_limit'));
In my opinion, this is way safer than increasing the memory_limit for the PHP module for Apache -- and it's what I usually do when I have a large dataset, or some really heavy stuff I cannot optimize or paginate.
If you need to define several values for the PHP CLI execution, you can also tell it to use another configuration file, instead of the default php.ini, with the -c option:
php -c /etc/phpcli.ini temp.php
That way, you have:
/etc/php.ini for Apache, with a low memory_limit, low max_execution_time, ...
and /etc/phpcli.ini for batches run from the command line, with virtually no limits.
This ensures your batches will be able to run -- and you'll still have security for your website (memory_limit and max_execution_time being security measures).
Still, if you have the time to optimize your script, you should; for instance, in the kind of situation where you have to deal with lots of data, pagination is a must-have ;-)
Have you tried splitting the dataset into smaller parts and processing only one part at a time?
If you fetch the data from a file on disk, you can use the fread() function to load smaller chunks, or some sort of unbuffered database query in the case of a database.
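A rough sketch of the fread() approach (file path and chunk size are assumptions; for line-oriented data, fgets() or fgetcsv() keep records intact):
<?php
// Process a large file one chunk at a time so the whole dataset never has to
// fit into memory at once.
$handle = fopen('/path/to/large-dataset.txt', 'rb');
if ($handle === false) {
    die("Cannot open dataset\n");
}

while (!feof($handle)) {
    $chunk = fread($handle, 1024 * 1024); // 1 MB at a time
    // ...sort/aggregate this chunk, write intermediate results, then drop it...
    unset($chunk);
}

fclose($handle);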
I haven't checked up on PHP since v3.something, but you could also use a form of cloud computing: a 1 GB dataset seems big enough to be processed on multiple machines.
Given that you know there are memory issues with your script that need fixing and you are only looking for short-term solutions, I won't address the ways to go about profiling and solving your memory issues. It sounds like you're going to get to that.
So, I would say the main things you have to keep in mind are:
Total memory load on the system
OS capabilities
PHP is only one small component of the system. If you allow it to eat up a vast quantity of your RAM, then the other processes will suffer, which could in turn affect the script itself. Notably, if you are pulling a lot of data out of a database, then your DBMS might require a lot of memory in order to create result sets for your queries. As a quick fix, you might want to identify any queries you are running and free the results as soon as possible to give yourself more memory for a long job run.
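As a hedged illustration of freeing results early (the table and credentials are made up; shown with mysqli, which lets you release a buffered result set explicitly):
<?php
// Aggregate into a small structure while iterating, then free the result set
// as soon as it has been consumed so the memory is available for the rest of
// the long-running job.
$mysqli = new mysqli('localhost', 'user', 'secret', 'app');

$result = $mysqli->query('SELECT category, amount FROM big_table');
$totals = array();

while ($row = $result->fetch_assoc()) {
    $key = $row['category'];
    $totals[$key] = isset($totals[$key]) ? $totals[$key] + $row['amount'] : $row['amount'];
}

$result->free(); // release the buffered rows immediately
unset($result);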
In terms of OS capabilities, you should keep in mind that 32-bit systems, which you are likely running on, can only address up to 4GB of RAM without special handling. Often the limit can be much less depending on how it's used. Some Windows chipsets and configurations can actually have less than 3GB available to the system, even with 4GB or more physically installed. You should check to see how much your system can address.
You say that you've increased the memory limit several times, so obviously this job is growing larger and larger in scope. If you're up to 1.5 GB, then even installing 2 GB more RAM sounds like it will just be a short reprieve.
Have others encountered this kind of issue? and what were the solutions?
I think you probably already know that the only real solution is to break down and spend the time to optimize the script soon, or you'll end up with a job that will be too big to run.
