Compress Magento HTML Code - php

I'm trying to compress HTML code generated by Magento with this:
Observer.php
public function alterOutput($observer)
{
$lib_path = Mage::getBaseDir('lib').'/Razorphyn/html_compressor.php';
include_once($lib_path);
//Retrieve html body
$response = $observer->getResponse();
$html = $response->getBody();
$html=html_compress($html);
//Send Response
$response->setBody($html);
}
html_compressor.php:
function html_compress($string){
global $idarray;
$idarray=array();
//Replace PRE and TEXTAREA tags
$search=array(
'#(<)\s*?(pre\b[^>]*?)(>)([\s\S]*?)(<)\s*(/\s*?pre\s*?)(>)#', //Find PRE Tag
'#(<)\s*?(textarea\b[^>]*?)(>)([\s\S]*?)(<)\s*?(/\s*?textarea\s*?)(>)#' //Find TEXTAREA
);
$string=preg_replace_callback($search,
function($m){
$id='<!['.uniqid().']!>';
global $idarray;
$idarray[]=array($id,$m[0]);
return $id;
},
$string
);
//Remove blank useless space
$search = array(
'#( |\t|\f)+#', // Shorten multiple whitespace sequences
'#(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+#', //Remove blank lines
'#^(\s)+|( |\t|\0|\r\n)+$#' //Trim Lines
);
$replace = array(' ',"\\1",'');
$string = preg_replace($search, $replace, $string);
//Replace IE COMMENTS, SCRIPT, STYLE and CDATA tags
$search=array(
'#<!--\[if\s(?:[^<]+|<(?!!\[endif\]-->))*<!\[endif\]-->#', //Find IE Comments
'#(<)\s*?(script\b[^>]*?)(>)([\s\S]*?)(<)\s*?(/\s*?script\s*?)(>)#', //Find SCRIPT Tag
'#(<)\s*?(style\b[^>]*?)(>)([\s\S]*?)(<)\s*?(/\s*?style\s*?)(>)#', //Find STYLE Tag
'#(//<!\[CDATA\[([\s\S]*?)//]]>)#', //Find commented CDATA
'#(<!\[CDATA\[([\s\S]*?)]]>)#' //Find CDATA
);
$string=preg_replace_callback($search,
function($m){
$id='<!['.uniqid().']!>';
global $idarray;
$idarray[]=array($id,$m[0]);
return $id;
},
$string
);
//Remove blank useless space
$search = array(
'#(class|id|value|alt|href|src|style|title)=(\'\s*?\'|"\s*?")#', //Remove empty attribute
'#<!--([\s\S]*?)-->#', // Strip comments except IE
'#[\r\n|\n|\r]#', // Strip break line
'#[ |\t|\f]+#', // Shorten multiple whitespace sequences
'#(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+#', //Remove blank lines
'#^(\s)+|( |\t|\0|\r\n)+$#' //Trim Lines
);
$replace = array(' ','',' ',' ',"\\1",'');
$string = preg_replace($search, $replace, $string);
//Replace unique id with original tag
$c=count($idarray);
for($i=0;$i<$c;$i++){
$string = str_replace($idarray[$i][0], "\n".$idarray[$i][1]."\n", $string);
}
return $string;
}
My main concers are two:
Is this a heavy(or good) solution?
Is there a way to optimize this?
Has it really got sense to compress a Magento HTML page(taken resource and time vs real benefit)?

I will not comment or review your code. Deciphering regexes (in any flavor) is not my favorite hobby.
Yes, compressing HTML makes sense if you aim to provide professional services.
If I look at a HTML code of someone's site with lots of nonsense blank spaces and user-useless comments inside and the site disrespects Google's PageSpeed Insights Rules and does not help in making the web faster and eco-friendly then it says to me: be aware, don't trust, certainly don't give them your credit card number
My advice:
read answers to this question: Stack Overflow: HTML minification?
read other developer's code, this is the one I use: https://github.com/kangax/html-minifier
benchmark, e.g. run Google Chrome > Developer tools > Audits
test a lot if your minifier does not unintentionally destroy the pages

There is really no point in doing this if you have gzip compression (and you should have it) enabled. It's a waste of CPU cycles really. You should rather focus on image optimization, reducing number of http requests and setting proper cache headers.

Related

PHP Looping Through Replacing Tags

I'm trying to do custom tags for links, colour and bullet points on a website so [l]...[/l] gets replaced by the link inside and [li]...[/li] gets replaced by a bullet point list.
I've got it half working but there's a problem with the link descriptions, heres the code:
// Takes in a paragraph, replaces all square-bracket tags with HTML tags. Calls the getBetweenTags() method to get the text between the square tags
function replaceTags($text)
{
$tags = array("[l]", "[/l]", "[list]", "[/list]", "[li]", "[/li]");
$html = array("<a style='text-decoration:underline;' class='common_link' href='", "'>" . getBetweenTags("[l]", "[/l]", $text) . "</a>", "<ul>", "</ul>", "<li>", "</li>");
return str_replace($tags, $html, $text);
}
// Tages in the start and end tag along with the paragraph, returns the text between the two tags.
function getBetweenTags($tag1, $tag2, $text)
{
$startsAt = strpos($text, $tag1) + strlen($tag1);
$endsAt = strpos($text, $tag2, $startsAt);
return substr($text, $startsAt, $endsAt - $startsAt);
}
The problem I'm having is when I have three links:
[l]http://www.example1.com[/l]
[l]http://www.example2.com[/l]
[l]http://www.example3.com[/l]
The links get replaced as:
http://www.example1.com
http://www.example1.com
http://www.example1.com
They are all hyperlinked correctly i.e. 1,2,3 but the text bit is the same for all links.
You can see it in action here at the bottom of the page with the three random links. How can i change the code to make the proper URL descriptions appear under each link - so each link is properly hyperlinked to the corresponding page with the corresponding text showing that URL?
str_replace does all the grunt work for you. The problem is that:
getBetweenTags("[l]", "[/l]", $text)
doesn't change. It will match 3 times but it just resolves to "http://www.example1.com" because that's the first link on the page.
You can't really do a static replacement, you need to keep at least a pointer to where you are in the input text.
My advise would be to write a simple tokenizer/ parser. It's actually not that hard. The tokenizer can be really simple, find all [ and ] and derive tags. Then your parser will try to make sense of the tokens. Your token stream can look like:
array(
array("string", "foo "),
array("tag", "l"),
array("string", "http://example"),
array("endtag", "l"),
array("string", " bar")
);
Here is how I would use preg_match_all instead personally.
$str='
[l]http://www.example1.com[/l]
[l]http://www.example2.com[/l]
[l]http://www.example3.com[/l]
';
preg_match_all('/\[(l|li|list)\](.+?)(\[\/\1\])/is',$str,$m);
if(isset($m[0][0])){
for($x=0;$x<count($m[0]);$x++){
$str=str_replace($m[0][$x],$m[2][$x],$str);
}
}
print_r($str);

Remove HTML Entity if Incomplete

I have an issue where I have displayed up to 400 characters of a string that is pulled from the database, however, this string is required to contain HTML Entities.
By chance, the client has created the string to have the 400th character to sit right in the middle of a closing P tag, thus killing the tag, resulting in other errors for code after it.
I would prefer this closing P tag to be removed entirely as I have a "...read more" link attached to the end which would look cleaner if attached to the existing paragraph.
What would be the best approach for this to cover all HTML Entity issues? Is there a PHP function that will automatically close off/remove any erroneous HTML tags? I don't need a coded answer, just a direction will help greatly.
Thanks.
Here's a simple way you can do it with DOMDocument, its not perfect but it may be of interest:
<?php
function html_tidy($src){
libxml_use_internal_errors(true);
$x = new DOMDocument;
$x->loadHTML('<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />'.$src);
$x->formatOutput = true;
$ret = preg_replace('~<(?:!DOCTYPE|/?(?:html|body|head))[^>]*>\s*~i', '', $x->saveHTML());
return trim(str_replace('<meta http-equiv="Content-Type" content="text/html;charset=utf-8">','',$ret));
}
$brokenHTML[] = "<p><span>This is some broken html</spa";
$brokenHTML[] = "<poken html</spa";
$brokenHTML[] = "<p><span>This is some broken html</spa</p>";
/*
<p><span>This is some broken html</span></p>
<poken html></poken>
<p><span>This is some broken html</span></p>
*/
foreach($brokenHTML as $test){
echo html_tidy($test);
}
?>
Though take note of Mike 'Pomax' Kamermans's comment.
why you don't take the last word in the paragraph or content and remove it, if the word is complete you remove it , if is not complete you also remove it, and you are sure that the content still clean, i show you an example for what code will be look like :
while($row = $req->fetch(PDO::FETCH_OBJ){
//extract 400 first characters from the content you need to show
$extraction = substr($row->text, 0, 400);
// find the last space in this extraction
$last_space = strrpos($extraction, ' ');
//take content from the first character to the last space and add (...)
echo substr($extraction, 0, $last_space) . ' ...';
}
just remove last broken tag and then strip_tags
$str = "<p>this is how we do</p";
$str = substr($str, 0, strrpos($str, "<"));
$str = strip_tags($str);

strip_tags disallow some tags

Based on the strip_tags documentation, the second parameter takes the allowable tags. However in my case, I want to do the reverse. Say I'll accept the tags the script_tags normally (default) accept, but strip only the <script> tag. Any possible way for this?
I don't mean somebody to code it for me, but rather an input of possible ways on how to achieve this (if possible) is greatly appreciated.
EDIT
To use the HTML Purifier HTML.ForbiddenElements config directive, it seems you would do something like:
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.ForbiddenElements', array('script','style','applet'));
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
http://htmlpurifier.org/docs
HTML.ForbiddenElements should be set to an array. What I don't know is what form the array members should take:
array('script','style','applet')
Or:
array('<script>','<style>','<applet>')
Or... Something else?
I think it's the first form, without delimiters; HTML.AllowedElements uses a form of configuration string somewhat common to TinyMCE's valid elements syntax:
tinyMCE.init({
...
valid_elements : "a[href|target=_blank],strong/b,div[align],br",
...
});
So my guess is it's just the term, and no attributes should be provided (since you're banning the element... although there is a HTML.ForbiddenAttributes, too). But that's a guess.
I'll add this note from the HTML.ForbiddenAttributes docs, as well:
Warning: This directive complements %HTML.ForbiddenElements,
accordingly, check out that directive for a discussion of why you
should think twice before using this directive.
Blacklisting is just not as "robust" as whitelisting, but you may have your reasons. Just beware and be careful.
Without testing, I'm not sure what to tell you. I'll keep looking for an answer, but I will likely go to bed first. It is very late. :)
Although I think you really should use HTML Purifier and utilize it's HTML.ForbiddenElements configuration directive, I think a reasonable alternative if you really, really want to use strip_tags() is to derive a whitelist from the blacklist. In other words, remove what you don't want and then use what's left.
For instance:
function blacklistElements($blacklisted = '', &$errors = array()) {
if ((string)$blacklisted == '') {
$errors[] = 'Empty string.';
return array();
}
$html5 = array(
"<menu>","<command>","<summary>","<details>","<meter>","<progress>",
"<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
"<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
"<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
"<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
"<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
"<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
"<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
"<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
"<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
"<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
"<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
"<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
"<title>","<head>","<html>"
);
$list = trim(strtolower($blacklisted));
$list = preg_replace('/[^a-z ]/i', '', $list);
$list = '<' . str_replace(' ', '> <', $list) . '>';
$list = array_map('trim', explode(' ', $list));
return array_diff($html5, $list);
}
Then run it:
$blacklisted = '<html> <bogus> <EM> em li ol';
$whitelist = blacklistElements($blacklisted);
if (count($errors)) {
echo "There were errors.\n";
print_r($errors);
echo "\n";
} else {
// Do strip_tags() ...
}
http://codepad.org/LV8ckRjd
So if you pass in what you don't want to allow, it will give you back the HTML5 element list in an array form that you can then feed into strip_tags() after joining it into a string:
$stripped = strip_tags($html, implode('', $whitelist)));
Caveat Emptor
Now, I've kind've hacked this together and I know there are some issues I haven't thought out yet. For instance, from the strip_tags() man page for the $allowable_tags argument:
Note:
This parameter should not contain whitespace. strip_tags() sees a tag
as a case-insensitive string between < and the first whitespace or >.
It means that strip_tags("<br/>", "<br>") returns an empty string.
It's late and for some reason I can't quite figure out what this means for this approach. So I'll have to think about that tomorrow. I also compiled the HTML element list in the function's $html5 element from this MDN documentation page. Sharp-eyed reader's might notice all of the tags are in this form:
<tagName>
I'm not sure how this will effect the outcome, whether I need to take into account variations in the use of a shorttag <tagName/> and some of the, ahem, odder variations. And, of course, there are more tags out there.
So it's probably not production ready. But you get the idea.
First, see what others have said on this topic:
Strip <script> tags and everything in between with PHP?
and
remove script tag from HTML content
It seems you have 2 choices, one is a Regex solution, both the links above give them. The second is to use HTML Purifier.
If you are stripping the script tag for some other reason than sanitation of user content, the Regex could be a good solution. However, as everyone has warned, it is a good idea to use HTML Purifier if you are sanitizing input.
PHP(5 or greater) solution:
If you want to remove <script> tags (or any other), and also you want to remove the content inside tags, you should use:
OPTION 1 (simplest):
preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);
OPTION 2 (more versatile):
<?php
$html = "<p>Your HTML code</p><script>With malicious code</script>"
$dom = new DOMDocument();
$dom->loadHTML($html);
$script = $dom->getElementsByTagName('script');
$remove = [];
foreach($script as $item)
{
$item->parentNode->removeChild($item);
}
$html = $dom->saveHTML();
Then $html will be:
"<p>Your HTML code</p>"
This is what I use to strip out a list of forbidden tags, can do both removing of tags wrapping content and tags including content, Plus trim off leftover white space.
$description = trim(preg_replace([
# Strip tags around content
'/\<(.*)doctype(.*)\>/i',
'/\<(.*)html(.*)\>/i',
'/\<(.*)head(.*)\>/i',
'/\<(.*)body(.*)\>/i',
# Strip tags and content inside
'/\<(.*)script(.*)\>(.*)<\/script>/i',
], '', $description));
Input example:
$description = '<html>
<head>
</head>
<body>
<p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p>
<script type="application/javascript">alert('Hello world');</script>
</body>
</html>';
Output result:
<p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p>
I use the following:
function strip_tags_with_forbidden_tags($input, $forbidden_tags)
{
foreach (explode(',', $forbidden_tags) as $tag) {
$tag = preg_replace(array('/^</', '/>$/'), array('', ''), $tag);
$input = preg_replace(sprintf('/<%s[^>]*>([^<]+)<\/%s>/', $tag, $tag), '$1', $input);
}
return $input;
}
Then you can do:
echo strip_tags_with_forbidden_tags('<cancel>abc</cancel>xpto<p>def></p><g>xyz</g><t>xpto</t>', 'cancel,g');
Output: 'abcxpto<p>def></p>xyz<t>xpto</t>'
echo strip_tags_with_forbidden_tags('<cancel>abc</cancel> xpto <p>def></p> <g>xyz</g> <t>xpto</t>', 'cancel,g');
Outputs: 'abc xpto <p>def></p> xyz <t>xpto</t>'

How to minify php page html output?

I am looking for a php script or class that can minify my php page html output like google page speed does.
How can I do this?
CSS and Javascript
Consider the following link to minify Javascript/CSS files: https://github.com/mrclay/minify
HTML
Tell Apache to deliver HTML with GZip - this generally reduces the response size by about 70%. (If you use Apache, the module configuring gzip depends on your version: Apache 1.3 uses mod_gzip while Apache 2.x uses mod_deflate.)
Accept-Encoding: gzip, deflate
Content-Encoding: gzip
Use the following snippet to remove white-spaces from the HTML with the help ob_start's buffer:
<?php
function sanitize_output($buffer) {
$search = array(
'/\>[^\S ]+/s', // strip whitespaces after tags, except space
'/[^\S ]+\</s', // strip whitespaces before tags, except space
'/(\s)+/s', // shorten multiple whitespace sequences
'/<!--(.|\s)*?-->/' // Remove HTML comments
);
$replace = array(
'>',
'<',
'\\1',
''
);
$buffer = preg_replace($search, $replace, $buffer);
return $buffer;
}
ob_start("sanitize_output");
?>
Turn on gzip if you want to do it properly. You can also just do something like this:
$this->output = preg_replace(
array(
'/ {2,}/',
'/<!--.*?-->|\t|(?:\r?\n[ \t]*)+/s'
),
array(
' ',
''
),
$this->output
);
This removes about 30% of the page size by turning your html into one line, no tabs, no new lines, no comments. Mileage may vary
This work for me.
function minify_html($html)
{
$search = array(
'/(\n|^)(\x20+|\t)/',
'/(\n|^)\/\/(.*?)(\n|$)/',
'/\n/',
'/\<\!--.*?-->/',
'/(\x20+|\t)/', # Delete multispace (Without \n)
'/\>\s+\</', # strip whitespaces between tags
'/(\"|\')\s+\>/', # strip whitespaces between quotation ("') and end tags
'/=\s+(\"|\')/'); # strip whitespaces between = "'
$replace = array(
"\n",
"\n",
" ",
"",
" ",
"><",
"$1>",
"=$1");
$html = preg_replace($search,$replace,$html);
return $html;
}
I've tried several minifiers and they either remove too little or too much.
This code removes redundant empty spaces and optional HTML (ending) tags. Also it plays it safe and does not remove anything that could potentially break HTML, JS or CSS.
Also the code shows how to do that in Zend Framework:
class Application_Plugin_Minify extends Zend_Controller_Plugin_Abstract {
public function dispatchLoopShutdown() {
$response = $this->getResponse();
$body = $response->getBody(); //actually returns both HEAD and BODY
//remove redundant (white-space) characters
$replace = array(
//remove tabs before and after HTML tags
'/\>[^\S ]+/s' => '>',
'/[^\S ]+\</s' => '<',
//shorten multiple whitespace sequences; keep new-line characters because they matter in JS!!!
'/([\t ])+/s' => ' ',
//remove leading and trailing spaces
'/^([\t ])+/m' => '',
'/([\t ])+$/m' => '',
// remove JS line comments (simple only); do NOT remove lines containing URL (e.g. 'src="http://server.com/"')!!!
'~//[a-zA-Z0-9 ]+$~m' => '',
//remove empty lines (sequence of line-end and white-space characters)
'/[\r\n]+([\t ]?[\r\n]+)+/s' => "\n",
//remove empty lines (between HTML tags); cannot remove just any line-end characters because in inline JS they can matter!
'/\>[\r\n\t ]+\</s' => '><',
//remove "empty" lines containing only JS's block end character; join with next line (e.g. "}\n}\n</script>" --> "}}</script>"
'/}[\r\n\t ]+/s' => '}',
'/}[\r\n\t ]+,[\r\n\t ]+/s' => '},',
//remove new-line after JS's function or condition start; join with next line
'/\)[\r\n\t ]?{[\r\n\t ]+/s' => '){',
'/,[\r\n\t ]?{[\r\n\t ]+/s' => ',{',
//remove new-line after JS's line end (only most obvious and safe cases)
'/\),[\r\n\t ]+/s' => '),',
//remove quotes from HTML attributes that does not contain spaces; keep quotes around URLs!
'~([\r\n\t ])?([a-zA-Z0-9]+)="([a-zA-Z0-9_/\\-]+)"([\r\n\t ])?~s' => '$1$2=$3$4', //$1 and $4 insert first white-space character found before/after attribute
);
$body = preg_replace(array_keys($replace), array_values($replace), $body);
//remove optional ending tags (see http://www.w3.org/TR/html5/syntax.html#syntax-tag-omission )
$remove = array(
'</option>', '</li>', '</dt>', '</dd>', '</tr>', '</th>', '</td>'
);
$body = str_ireplace($remove, '', $body);
$response->setBody($body);
}
}
But note that when using gZip compression your code gets compressed a lot more that any minification can do so combining minification and gZip is pointless, because time saved by downloading is lost by minification and also saves minimum.
Here are my results (download via 3G network):
Original HTML: 150kB 180ms download
gZipped HTML: 24kB 40ms
minified HTML: 120kB 150ms download + 150ms minification
min+gzip HTML: 22kB 30ms download + 150ms minification
All of the preg_replace() solutions above have issues of single line comments, conditional comments and other pitfalls. I'd recommend taking advantage of the well-tested Minify project rather than creating your own regex from scratch.
In my case I place the following code at the top of a PHP page to minify it:
function sanitize_output($buffer) {
require_once('min/lib/Minify/HTML.php');
require_once('min/lib/Minify/CSS.php');
require_once('min/lib/JSMin.php');
$buffer = Minify_HTML::minify($buffer, array(
'cssMinifier' => array('Minify_CSS', 'minify'),
'jsMinifier' => array('JSMin', 'minify')
));
return $buffer;
}
ob_start('sanitize_output');
Create a PHP file outside your document root. If your document root is
/var/www/html/
create the a file named minify.php one level above it
/var/www/minify.php
Copy paste the following PHP code into it
<?php
function minify_output($buffer){
$search = array('/\>[^\S ]+/s','/[^\S ]+\</s','/(\s)+/s');
$replace = array('>','<','\\1');
if (preg_match("/\<html/i",$buffer) == 1 && preg_match("/\<\/html\>/i",$buffer) == 1) {
$buffer = preg_replace($search, $replace, $buffer);
}
return $buffer;
}
ob_start("minify_output");?>
Save the minify.php file and open the php.ini file. If it is a dedicated server/VPS search for the following option, on shared hosting with custom php.ini add it.
auto_prepend_file = /var/www/minify.php
Reference: http://websistent.com/how-to-use-php-to-minify-html-output/
you can check out this set of classes:
https://code.google.com/p/minify/source/browse/?name=master#git%2Fmin%2Flib%2FMinify , you'll find html/css/js minification classes there.
you can also try this: http://code.google.com/p/htmlcompressor/
Good luck :)
You can look into HTML TIDY - http://uk.php.net/tidy
It can be installed as a PHP module and will (correctly, safely) strip whitespace and all other nastiness, whilst still outputting perfectly valid HTML / XHTML markup. It will also clean your code, which can be a great thing or a terrible thing, depending on how good you are at writing valid code in the first place ;-)
Additionally, you can gzip the output using the following code at the start of your file:
ob_start('ob_gzhandler');
First of all gzip can help you more than a Html Minifier
With nginx:
gzip on;
gzip_disable "msie6";
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
With apache you can use mod_gzip
Second: with gzip + Html Minification you can reduce the file size drastically!!!
I've created this HtmlMinifier for PHP.
You can retrieve it through composer: composer require arjanschouten/htmlminifier dev-master.
There is a Laravel service provider. If you're not using Laravel you can use it from PHP.
// create a minify context which will be used through the minification process
$context = new MinifyContext(new PlaceholderContainer());
// save the html contents in the context
$context->setContents('<html>My html...</html>');
$minify = new Minify();
// start the process and give the context with it as parameter
$context = $minify->run($context);
// $context now contains the minified version
$minifiedContents = $context->getContents();
As you can see you can extend a lot of things in here and you can pass various options. Check the readme to see all the available options.
This HtmlMinifier is complete and safe. It takes 3 steps for the minification process:
Replace critical content temporary with a placeholder.
Run the minification strategies.
Restore the original content.
I would suggest that you cache the output of you're views. The minification process should be a one time process. Or do it for example interval based.
Clear benchmarks are not created at the time. However the minifier can reduce the page size with 5-25% based on the your markup!
If you want to add you're own strategies you can use the addPlaceholder and the addMinifier methods.
I have a GitHub gist contains PHP functions to minify HTML, CSS and JS files → https://gist.github.com/taufik-nurrohman/d7b310dea3b33e4732c0
Here’s how to minify the HTML output on the fly with output buffer:
<?php
include 'path/to/php-html-css-js-minifier.php';
ob_start('minify_html');
?>
<!-- HTML code goes here ... -->
<?php echo ob_get_clean(); ?>
If you want to remove all new lines in the page, use this fast code:
ob_start(function($b){
if(strpos($b, "<html")!==false) {
return str_replace(PHP_EOL,"",$b);
} else {return $b;}
});
Thanks to Andrew.
Here's what a did to use this in cakePHP:
Download minify-2.1.7
Unpack the file and copy min subfolder to cake's Vendor folder
Creates MinifyCodeHelper.php in cake's View/Helper like this:
App::import('Vendor/min/lib/Minify/', 'HTML');
App::import('Vendor/min/lib/Minify/', 'CommentPreserver');
App::import('Vendor/min/lib/Minify/CSS/', 'Compressor');
App::import('Vendor/min/lib/Minify/', 'CSS');
App::import('Vendor/min/lib/', 'JSMin');
class MinifyCodeHelper extends Helper {
public function afterRenderFile($file, $data) {
if( Configure::read('debug') < 1 ) //works only e production mode
$data = Minify_HTML::minify($data, array(
'cssMinifier' => array('Minify_CSS', 'minify'),
'jsMinifier' => array('JSMin', 'minify')
));
return $data;
}
}
Enabled my Helper in AppController
public $helpers = array ('Html','...','MinifyCode');
5... Voila!
My conclusion: If apache's deflate and headers modules is disabled in your server your gain is 21% less size and 0.35s plus in request to compress (this numbers was in my case).
But if you had enable apache's modules the compressed response has no significant difference (1.3% to me) and the time to compress is the samne (0.3s to me).
So... why did I do that? 'couse my project's doc is all in comments (php, css and js) and my final user dont need to see this ;)
You can use a well tested Java minifier like HTMLCompressor by invoking it using passthru (exec).
Remember to redirect console using 2>&1
This however may not be useful, if speed is a concern. I use it for static php output
The easiest possible way would be using strtr and removing the whitespace.
That being said don't use javascript as it might break your code.
$html_minify = fn($html) => strtr($html, [PHP_EOL => '', "\t" => '', ' ' => '', '< ' => '<', '> ' => '>']);
echo $html_minify(<<<HTML
<li class="flex--item">
<a href="#"
class="-marketing-link js-gps-track js-products-menu"
aria-controls="products-popover"
data-controller="s-popover"
data-action="s-popover#toggle"
data-s-popover-placement="bottom"
data-s-popover-toggle-class="is-selected"
data-gps-track="top_nav.products.click({location:2, destination:1})"
data-ga="["top navigation","products menu click",null,null,null]">
Products
</a>
</li>
HTML);
// Response (echo): <li class="flex--item">Products</li>
This work for me on php apache wordpress + w total cache + amp
put it on the top of php page
<?Php
if(substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')){
ob_start ('ob_html_compress');
}else{
ob_start();
}
function ob_html_compress($buf) {
$search = array('/\>[^\S ]+/s', '/[^\S ]+\</s', '/(\s)+/s', '/<!--(.|\s)*?-->/');
$replace = array('>', '<', '\\1', '');
$buf = preg_replace($search, $replace, $buf);
return $buf;
return str_replace(array("\n","\r","\t"),'',$buf);
}
?>
then the rest of ur php or html stuff

Inserting multiple links into text, ignoring matches that happen to be inserted

The site I'm working on has a database table filled with glossary terms. I am building a function that will take some HTML and replace the first instances of the glossary terms with tooltip links.
I am running into a problem though. Since it's not just one replace, the function is replacing text that has been inserted in previous iterations, so the HTML is getting mucked up.
I guess the bottom line is, I need to ignore text if it:
Appears within the < and > of any HTML tag, or
Appears within the text of an <a></a> tag.
Here's what I have so far. I was hoping someone out there would have a clever solution.
function insertGlossaryLinks($html)
{
// Get glossary terms from database, once per request
static $terms;
if (is_null($terms)) {
$query = Doctrine_Query::create()
->select('gt.title, gt.alternate_spellings, gt.description')
->from('GlossaryTerm gt');
$glossaryTerms = $query->rows();
// Create whole list in $terms, including alternate spellings
$terms = array();
foreach ($glossaryTerms as $glossaryTerm) {
// Initialize with title
$term = array(
'wordsHtml' => array(
h(trim($glossaryTerm['title']))
),
'descriptionHtml' => h($glossaryTerm['description'])
);
// Add alternate spellings
foreach (explode(',', $glossaryTerm['alternate_spellings']) as $alternateSpelling) {
$alternateSpelling = h(trim($alternateSpelling));
if (empty($alternateSpelling)) {
continue;
}
$term['wordsHtml'][] = $alternateSpelling;
}
$terms[] = $term;
}
}
// Do replacements on this HTML
$newHtml = $html;
foreach ($terms as $term) {
$callback = create_function('$m', 'return \'<span>\'.$m[0].\'</span>\';');
$term['wordsHtmlPreg'] = array_map('preg_quote', $term['wordsHtml']);
$pattern = '/\b('.implode('|', $term['wordsHtmlPreg']).')\b/i';
$newHtml = preg_replace_callback($pattern, $callback, $newHtml, 1);
}
return $newHtml;
}
Using Regexes to process HTML is always risky business. You will spend a long time fiddling with the greediness and laziness of your Regexes to only capture text that is not in a tag, and not in a tag name itself. My recommendation would be to ditch the method you are currently using and parse your HTML with an HTML parser, like this one: http://simplehtmldom.sourceforge.net/. I have used it before and have recommended it to others. It is a much simpler way of dealing with complex HTML.
I ended up using preg_replace_callback to replace all existing links with placeholders. Then I inserted the new glossary term links. Then I put back the links that I had replaced.
It's working great!

Categories