I have a PHP 5.4/MySQL website with 5 million hits per day, running on a Linux server with nginx and PHP-FPM. The database is located on a separate server.
I've noticed that at peak times my web server load gets up to 15, instead of the normal 4 for a quad-core processor. I've profiled my PHP application with Xdebug and XHProf and saw that 90% of the CPU work is done by the htmlspecialchars() function in the Twig templates I use to display data. There are sometimes from 100 to 1000 htmlspecialchars() calls per page. I've tried to reduce unnecessary escaping, but it still cannot be avoided.
Is there any way I can reduce the CPU usage of the htmlspecialchars() function? Maybe there is some kind of caching in PHP for this? Or is there another way?
Don't use Twig. Just use PHP files with this code:
<?php
// Load a PHP file and use it as a template
function template($tpl_file, $vars = array()) {
    $dir = '/usr/local/app/view/' . $tpl_file . '.php';
    if (file_exists($dir)) {
        // Make variables from the array easily accessible in the view
        extract($vars);
        // Start collecting output in a buffer
        ob_start();
        require($dir);
        // Get the contents of the buffer
        $applied_template = ob_get_contents();
        // Discard the buffer without sending it to the client
        ob_end_clean();
        return $applied_template;
    }
}
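For illustration, a minimal usage sketch (the view name, the variables, and the e() escaping helper are made up for this example; escaping is still done with htmlspecialchars(), just only where you actually need it):
// Hypothetical helper so the views stay readable
function e($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Render /usr/local/app/view/article.php with two variables
echo template('article', array(
    'title' => 'Hello & welcome',
    'body'  => $body_text,
));

// Inside view/article.php the extracted variables are available directly,
// e.g. echo '<h1>' . e($title) . '</h1>';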
So my problem stems from trying to generate large PDF files while being hit by the memory limit / execution timeouts in PHP. The data is too great in volume to simply extend these limits, so that solution is out of the question.
I have a background shell task running which handles all of this rendering and then alerts the user once the PDF has been completed.
In theory I would have a loop within this shell which would take in a chunk of data and render it to file, then take the next chunk and do the same. Once there is no more data to render, the file would be written and completed, ready to be served. This way PHP's memory limit would not be hit, as only a manageable chunk is ever loaded.
I am currently using the CakePDF (v.3.5) plugin for CakePHP 3 (v.3.5.13) but am struggling to find a solution which allows rendering some data and then adding more data to the same PDF.
Has anyone managed this before, or is it out of scope of the plugin? Would another solution be to create multiple PDF files and then merge them together after all the separate PDFs have been created?
This is more of a theoretical question about whether this would work and whether anyone has managed it before. I don't have much code to show, but if more detail is required then give me a shout and I will try to get something for you or some example code!
Thanks
I don't have direct experience with that version of CakePdf, but under CakePHP 2.x I use the wkhtmltopdf engine, which takes an .html output to produce the PDF.
If your shell generates such .html in chunks, it is easy to append.
Of course wkhtmltopdf is likely to put some load on the machine to produce the PDF, but since it's a binary, it happens outside of PHP's memory/time constraints.
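A rough sketch of that approach (the paths, the $chunks variable, and the render_chunk_to_html() helper are assumptions for illustration, not part of CakePdf):
$html_file = TMP . 'report.html';   // TMP is CakePHP's temp directory constant
$pdf_file  = TMP . 'report.pdf';

foreach ($chunks as $chunk) {
    // Render each portion of the data to HTML with your normal view logic,
    // then append it to one growing file
    $html = render_chunk_to_html($chunk);   // hypothetical helper
    file_put_contents($html_file, $html, FILE_APPEND);
}

// One external call; the heavy rendering happens outside PHP's memory/time limits
exec('wkhtmltopdf ' . escapeshellarg($html_file) . ' ' . escapeshellarg($pdf_file));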
That's certainly out of the scope of the plugin; it's built around the idea of rendering a single view to a single file. The interface doesn't support chunked creation of a single file, and if I'm not mistaken, none of the supported engines support that either, at least not in a straightforward and efficient manner when it comes to large documents.
There are certainly lots of ways to do this: creating multiple PDFs and merging/concatenating them afterwards might be one of them; generating the source content in chunks and passing it to a PDF renderer that can handle lots of content efficiently might be another; and surely there might also be libraries out there that explicitly support chunked creation of PDFs...
I thought I would post what I ended up doing for anyone in the future.
I used CakePDF to generate smaller PDFs, which I stored in a tmp directory. These all stay under PHP's execution time and memory limits, as I don't believe altering those provides a good solution. In this step I also saved the names of all of the PDFs generated, for use in the next step.
The code for this looked something like:
$tmp_file_list = array();
$is_last_pdf = false;
while (!$is_last_pdf) {
    // Generate the PDF in here with a portion of the data
    $CakePdf = new CakePdf();
    $CakePdf->template('page', 'default');
    $CakePdf->viewVars(compact('data', 'other_stuff'));
    // Save the file name (the tmp path this chunk was written to) to the array
    $tmp_file_list[] = $file_name;
    // Update the $is_last_pdf variable
    $is_last_pdf = check_for_more_data();
}
From this I used Ghostscript from within the Shell to merge all of the PDF files; the code for this looked something like this:
$output_path = 'output.pdf';
$file_list = '';
// Create a string of all the files to merge
foreach ($tmp_file_list as $file) {
    $file_list .= escapeshellarg($file) . ' ';
}
// Execute Ghostscript to merge all the files into the `output.pdf` file
exec('gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=' . escapeshellarg($output_path) . ' ' . $file_list);
All of the code here was in the Shell file responsible for creating the PDF.
Hope this helps someone :)
Maybe I'm asking the impossible, but I want to clone a stream multiple times: a sort of multicast emulation. The idea is to write a 1300-byte buffer into a .sock file every 0.002 seconds (instead of IP:port, to avoid overhead) and then to read the same .sock file from other scripts multiple times.
Doing it through a regular file is not doable. It works only within the same script that generates the buffer file and then echoes it; the other scripts will misread it badly.
This works perfectly with the script that generates the chunks:
$handle = @fopen($url, 'rb');
$buffer = 1300;
while (1) {
    $chunk = fread($handle, $buffer);
    $handle2 = fopen('/var/tmp/stream_chunck.tmp', 'w');
    fwrite($handle2, $chunk);
    fclose($handle2);
    readfile('/var/tmp/stream_chunck.tmp');
}
BUT the output of another script that reads the chunks:
while (1) {
    readfile('/var/tmp/stream_chunck.tmp');
}
is messy. I don't know how to synchronize the chunk-reading process, and I thought that sockets could work a miracle.
It works only within the same script that generates the buffer file and then echos it. The other scripts will misread it badly
Using a single file without any sort of flow control shouldn't be a problem - tail -F does just that. The disadvantage is that the data will just accumulate indefinitely on the filesystem as long as a single client has an open file handle (even if you truncate the file).
But if you're writing chunks, then write each chunk to a different file (using an atomic write mechanism, sketched after the reader loop below), and then everyone can read it by polling for available files....
do {
    while (!file_exists("$dir/$prefix.$current_chunk")) {
        clearstatcache();
        usleep(1000);
    }
    process(file_get_contents("$dir/$prefix.$current_chunk"));
    $current_chunk++;
} while (!$finished);
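On the writer side, the atomic write mentioned above can be a minimal sketch like this (same $dir/$prefix naming as the reader loop; the temp-then-rename step ensures a reader never sees a half-written chunk):
// Write the chunk to a temporary file in the same directory first...
$tmp = tempnam($dir, 'chunk_tmp_');
file_put_contents($tmp, $chunk);
// ...then atomically rename it to the name the readers are polling for
rename($tmp, "$dir/$prefix.$current_chunk");
$current_chunk++;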
Equally, you could do this with a database, which should have slightly lower overhead for the polling and simplifies the garbage collection of old chunks.
But this is all about how to make your solution workable - it doesn't really address the problem you are trying to solve. If we knew what you were trying to achieve then we might be able to advise on a more appropriate solution - e.g. if it's a chat application, video broadcast, something else....
I suspect a more appropriate solution would be to use a multi-processing, single-memory-model server - and when we're talking about PHP (which doesn't really do threading very well) that means an event-based/asynchronous server. There's a bit more involved than simply calling socket_select(), but there are some good scripts available which do most of the complicated stuff for you.
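To make that idea a bit more concrete, here is a very rough sketch of a fan-out server built around stream_select() (the socket path, chunk size, and source URL are made-up values, and real error handling is omitted):
<?php
// A rough fan-out server: one process reads the source stream once and copies
// every chunk to all connected reader scripts - a crude multicast emulation.
// (Remove any stale socket file before binding in real code, and add proper
// error handling / end-of-stream handling on the source.)
$source  = fopen('http://example.com/stream', 'rb');                 // hypothetical source
$server  = stream_socket_server('unix:///var/tmp/stream.sock', $errno, $errstr);
$clients = array();

while (true) {
    $read   = array_merge(array($server, $source), $clients);
    $write  = null;
    $except = null;

    // Block until the source has data or a reader connects/disconnects
    if (stream_select($read, $write, $except, null) < 1) {
        continue;
    }

    foreach ($read as $stream) {
        if ($stream === $server) {
            // A new reader script connected: add it to the broadcast list
            $clients[] = stream_socket_accept($server);
        } elseif ($stream === $source) {
            // New data from the source: copy the chunk to every reader
            $chunk = fread($source, 1300);
            foreach ($clients as $key => $client) {
                if (@fwrite($client, $chunk) === false) {
                    fclose($client);
                    unset($clients[$key]);        // drop dead connections
                }
            }
        } else {
            // Readers never send data; a readable client with nothing to read
            // has disconnected, so remove it from the list
            if (fread($stream, 1300) === '' || feof($stream)) {
                $key = array_search($stream, $clients, true);
                if ($key !== false) {
                    unset($clients[$key]);
                }
                fclose($stream);
            }
        }
    }
}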
I have a script called "image.php" that is used to count impressions and then print the image.
This script is called in this way:
<img src="path/image.php?id=12345" />
And it's used very often by my users; I see thousands of requests per day.
So I am trying to understand the best way to output the image at the end of this script:
Method 1 (currently in use):
header("Content-type: $mime"); //$mime is found with getimagesize function
readfile("$image_url");
exit;
Method 2 (pretty sure this is the slowest):
header("Content-type: $mime");
echo file_get_contents("$image_url");
exit;
Method 3:
header('Location: '.$image_url);
exit();
Is method 3 better / faster than method 1?
OK, first of all, Method 3 is way faster, since it just redirects to the original file.
The first two methods need file access to read the file, and they also don't use the browser cache!
Also, when you store the rendered images, it's better to let Apache handle your static files.
Apache is way faster than PHP and it uses the right browser caching (3 or 4 times faster wouldn't be a surprise).
What happens is that when you request a static file, Apache sends the Last-Modified header.
If your client requests the same image again, it sends the If-Modified-Since header with that same date. If the file hasn't changed, your server responds with a 304 Not Modified header without any data, which saves you a lot of IO operations (besides the ETag header, which is also used).
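If you do keep serving the image through PHP, you can reproduce part of that behaviour yourself. A minimal sketch, assuming $image_path is the local file behind $image_url:
$mtime = filemtime($image_path);
$last_modified = gmdate('D, d M Y H:i:s', $mtime) . ' GMT';

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    && strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
    // The browser already has the current version: answer with 304 and no body
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Content-type: ' . $mime);
header('Last-Modified: ' . $last_modified);
readfile($image_path);
exit;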
For your impression count of the image, you could create a cronjob that parses your Apache access logs, so the end user won't even notice it. But in your case it's easier to count the impressions in your script and then redirect.
Essentially, what readfile() does is read the file directly into the output buffer, while file_get_contents() loads the whole file into memory (a string). So when you output the result, the data is copied from memory into the output buffer, making it roughly twice as slow as readfile().
I know that ob_start turns on output buffering, but I don't fully understand what it means. To me it means that it just stops outputting the script data.
Is this true? How does the browser get the data in this case? Do I have to use ob_end_flush() to turn it off at the end?
Since ob_gzhandler compresses web pages, how do browsers handle these pages?
I have seen ob_start("gzhandler") in code; since ob_gzhandler compresses web pages, what does ob_start("gzhandler") mean, and how do the two functions work together?
All help appreciated!
Output buffering means that instead of writing your output directly to the stdout stream, it is instead written to a buffer.
Then when the script finishes (or when you call ob_end_flush()), the contents of that buffer are written to stdout.
Using ob_gzhandler transforms the contents of the buffer before writing it to stdout, such that it is gzip compressed. (Browsers which support gzip compression reverse this on the opposite end, decompressing the content.)
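For illustration, two tiny self-contained sketches of those points (not tied to any particular application code):
// Example 1 - plain buffering: nothing reaches the browser until the flush
ob_start();
echo 'Hello, ';
echo 'world!';
ob_end_flush();   // the buffered output is written to stdout here

// Example 2 (a separate script) - compressed buffering: the callback gzips
// the buffer before it is sent and sets Content-Encoding for browsers that
// advertise gzip support; they decompress it transparently
ob_start('ob_gzhandler');
echo str_repeat('highly compressible text ', 1000);
ob_end_flush();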
OK, let me explain it like this.
This is only one of the uses of the buffering system, but I think it's kinda cool.
First, I want you to look at this animation:
Operating System Start
When you have a PHP script with a step-based structure like this, for example, you may write:
Connection established to database server..
Database selected : my_database
Data query started
Data query ended (found:200 rows)
...
etc. But if you don't use output buffering and flushing, you will only see these lines once the whole script execution has ended. If the thought is "I want to see what my script is doing, and when!", then:
First, you need to set implicit_flush to "on" in your php.ini file and restart your Apache server to see any of this.
Second, you need to start output buffering ("ob" for short) with ob_start(), and then
place echo statements anywhere in your code, followed by ob_flush() calls, to see your script running in real time.
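A minimal sketch of that flow (the sleep() calls just stand in for slow work; depending on your setup you may also need to disable output compression/buffering in the server config for the lines to appear one by one):
ob_start();

echo "Connection established to database server..\n";
ob_flush();     // push the buffer to PHP's output layer
flush();        // ...and ask the web server to send it to the browser now
sleep(1);       // simulate slow work

echo "Database selected : my_database\n";
ob_flush();
flush();
sleep(1);

echo "Data query ended (found:200 rows)\n";
ob_end_flush(); // send whatever is left and stop buffering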
Later, it is also used for file-based static content caching, like this:
place ob_start() at the start of your page (or the start of the content you want to capture);
just before flushing, call $my_var = ob_get_contents(); to copy all the HTML output the server creates for the client into $my_var (it has to be called while the buffer is still open);
place ob_end_flush() at the end of your page (or the end of the content you want to capture) and then use $my_var as you want. Mostly it's saved to a file, and by checking the file's last modification date, it's used as a static cache.
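A hedged sketch of that caching pattern (the cache path and the 60-second lifetime are made-up values for illustration):
$cache_file = '/var/tmp/page_cache.html';
$cache_life = 60;   // seconds

// Serve the cached copy if it is still fresh
if (file_exists($cache_file) && time() - filemtime($cache_file) < $cache_life) {
    readfile($cache_file);
    exit;
}

ob_start();
// ... build the page as usual (database queries, templates, echo, ...) ...
$my_var = ob_get_contents();   // capture the generated HTML
ob_end_flush();                // still send it to the current visitor

file_put_contents($cache_file, $my_var);   // save it for the next visitors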
I hope I could light some bulbs in your mind.
I am using the PHPSavant templating system for a project, and I am not sure how to use ob_start with it.
I have tried this before.. for example:
page_header.php:
    ob_start();
page_footer.php:
    ob_end_flush();
But because I am now using a templating system, I am not sure where to put these functions.
$template = new Savant3();
$template->some_var = $some_value;
$template->display('default_template');
The default_template contains all of the markup and populates its sections using some variables (set on the $template object). Should I be using ob_start and ob_end_flush where my HTML code is, or include them in each and every PHP file which calls this template?
Any ideas? thanks.
You don't have to force a flush; when the PHP script terminates, the buffer is flushed.
Put ob_start() at the beginning of your script; that's the best place. In fact, you might want to force GZIP compression, which will greatly speed up your page display. It seems most servers have GZIP disabled, but you can force it on in your PHP via:
ob_start('ob_gzhandler');
I guess that the display() method actually outputs the template, so that's the one you should wrap with ob_start and ob_end_flush. However, I don't really see the advantage of wrapping a single function call in ob_start()/ob_end_flush().
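If the intent is to capture the rendered template rather than echo it straight away, a minimal sketch (assuming display() echoes its output, as guessed above) would be:
ob_start();
$template->display('default_template');   // Savant echoes the rendered HTML
$html = ob_get_clean();                    // grab it and close the buffer

// $html can now be cached, post-processed, or echoed later
echo $html;
Savant may also offer a fetch-style method that returns the rendered output directly, which would avoid the buffering altogether; check its API before reaching for ob_start.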