I'm using the buffer sanitizer, as seen in a PHP manual comment, but having trouble with double newlines in textareas.
When pulling a string out from my database, containing double/triple/quadruple newlines, and putting it into a textarea, the newlines are reduced to only a single newline.
Therefore: Is it possible to have the function exclude all output between <pre>, <textarea> and </pre>, </textarea>?
Having seen the question "How to minify php html output without removing IE conditional comments?", I think I need to use preg_match, but I'm not sure how to work it into this function.
The function I'm using is:
function sanitize_output($buffer) {
$search = array(
'/\>[^\S ]+/s', // strip whitespaces after tags, except space
'/[^\S ]+\</s', // strip whitespaces before tags, except space
'/(\s)+/s' // shorten multiple whitespace sequences
);
$replace = array(
'>',
'<',
'\\1'
);
$buffer = preg_replace($search, $replace, $buffer);
return $buffer;
}
ob_start("sanitize_output");
And yeah I'm using both this sanitizer and GZIP to get the smallest size possible.
Here is an implementation of the function mentioned in the comments:
function sanitize_output($buffer) {
// Searching textarea and pre
preg_match_all('#\<textarea.*\>.*\<\/textarea\>#Uis', $buffer, $foundTxt);
preg_match_all('#\<pre.*\>.*\<\/pre\>#Uis', $buffer, $foundPre);
// replacing both with <textarea>$index</textarea> / <pre>$index</pre>
$buffer = str_replace($foundTxt[0], array_map(function($el){ return '<textarea>'.$el.'</textarea>'; }, array_keys($foundTxt[0])), $buffer);
$buffer = str_replace($foundPre[0], array_map(function($el){ return '<pre>'.$el.'</pre>'; }, array_keys($foundPre[0])), $buffer);
// your stuff
$search = array(
'/\>[^\S ]+/s', // strip whitespaces after tags, except space
'/[^\S ]+\</s', // strip whitespaces before tags, except space
'/(\s)+/s' // shorten multiple whitespace sequences
);
$replace = array(
'>',
'<',
'\\1'
);
$buffer = preg_replace($search, $replace, $buffer);
// Replacing back with content
$buffer = str_replace(array_map(function($el){ return '<textarea>'.$el.'</textarea>'; }, array_keys($foundTxt[0])), $foundTxt[0], $buffer);
$buffer = str_replace(array_map(function($el){ return '<pre>'.$el.'</pre>'; }, array_keys($foundPre[0])), $foundPre[0], $buffer);
return $buffer;
}
There is always room for optimization, but this works.
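As an alternative to swapping placeholders in and out, the same idea can be sketched with preg_split: cut the buffer at every complete <textarea> or <pre> block, minify only the pieces in between, and glue everything back together. This is only a sketch; minify_outside_blocks() is a made-up name, and it assumes the protected blocks are not nested.

```php
<?php
// Hypothetical helper (not from the answers above): minify everything
// except complete <textarea>...</textarea> and <pre>...</pre> blocks.
function minify_outside_blocks(string $buffer): string
{
    // One capture group + PREG_SPLIT_DELIM_CAPTURE: the protected blocks are
    // kept in the result and always land at the odd indexes.
    $parts = preg_split(
        '#(<textarea\b[^>]*>.*?</textarea>|<pre\b[^>]*>.*?</pre>)#is',
        $buffer,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($parts === false) {
        return $buffer; // regex failure: hand the page back unchanged
    }

    // Same three rules as the original sanitize_output()
    $search  = ['/\>[^\S ]+/s', '/[^\S ]+\</s', '/(\s)+/s'];
    $replace = ['>', '<', '\\1'];

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) { // even index: markup outside the protected blocks
            $parts[$i] = preg_replace($search, $replace, $part);
        }
    }
    return implode('', $parts);
}

$html = "<div>\n\n  <p>a   b</p>\n<textarea>x\n\n\ny</textarea>\n</div>";
$min  = minify_outside_blocks($html);
```

It can be registered the same way as before, with ob_start('minify_outside_blocks'). Because nothing is replaced and restored, there is no risk of a placeholder colliding with real page content.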
There is a simple solution for PRE that does not work for TEXTAREA: replace the spaces with &nbsp;, then use nl2br() to replace the newlines with BR elements before outputting the values. It's not elegant but it works:
<pre><?php
echo(nl2br(str_replace(' ', '&nbsp;', htmlspecialchars($value))));
?></pre>
Unfortunately, it cannot be used for TEXTAREA because the browsers display <br /> as text.
Maybe this will give you the result you need.
But in general I do not recommend this kind of sanitizing; it is not good for performance. These days there is no real need to strip whitespace characters from HTML output.
function sanitize_output($buffer) {
$ignoreTags = array("textarea", "pre");
# find tags that must be ignored and replace it with a placeholder
$tmpReplacements = array();
foreach($ignoreTags as $tag){
preg_match_all("~<$tag.*?>.*?</$tag>~is", $buffer, $match);
if($match && $match[0]){
foreach($match[0] as $key => $value){
if(!isset($tmpReplacements[$tag])) $tmpReplacements[$tag] = array();
$index = count($tmpReplacements[$tag]);
$replacementValue = "<tmp-replacement>{$tag}-{$index}</tmp-replacement>"; // tag name keeps textarea and pre placeholders distinct
$tmpReplacements[$tag][$index] = array($value, $replacementValue);
$buffer = str_replace($value, $replacementValue, $buffer);
}
}
}
$search = array(
'/\>[^\S ]+/s', // strip whitespaces after tags, except space
'/[^\S ]+\</s', // strip whitespaces before tags, except space
'/(\s)+/s' // shorten multiple whitespace sequences
);
$replace = array(
'>',
'<',
'\\1'
);
$buffer = preg_replace($search, $replace, $buffer);
# re-insert previously ignored tags
foreach($tmpReplacements as $tag => $rows){
foreach($rows as $values){
$buffer = str_replace($values[1], $values[0], $buffer);
}
}
return $buffer;
}
function nl2ascii($str){
// encode line breaks as numeric character references so the minifier cannot collapse them
return str_replace(array("\n","\r"), array("&#10;","&#13;"), $str);
}
$StrTest = "test\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\rtest";
ob_start("sanitize_output");
?>
<textarea><?php echo nl2ascii($StrTest); ?></textarea>
<textarea><?php echo $StrTest; ?></textarea>
<pre style="border: 1px solid red"><?php echo nl2ascii($StrTest); ?></pre>
<pre style="border: 1px solid red"><?php echo $StrTest; ?></pre>
<?php
ob_flush();
raw output
<textarea>test
test</textarea>
<textarea>test
test</textarea>
<pre style="border: 1px solid red">test
test</pre>
<pre style="border: 1px solid red">test
test</pre>
This is my version of the sanitize HTML. I have commented the code, so it should be clear what it is doing.
function comprimeer($html = '', $arr_tags = ['textarea', 'pre']) {
$arr_found = [];
$arr_back = [];
$arr_temp = [];
// for each tag, collect every "<tag ...>content</tag>" block
// preg_match_all() puts all full matches in $arr_temp[0]
foreach ($arr_tags as $tag) {
if(preg_match_all('#\<' . $tag . '.*\>.*\<\/' . $tag . '\>#Uis', $html, $arr_temp)) {
// the tag is present
foreach($arr_temp[0] as $key => $item) {
// keep every matched block of this tag
$arr_found[$tag][] = $item;
// make a numbered replacement: <tag>0</tag>, <tag>1</tag>, ...
$arr_back[$tag][] = '<' . $tag . '>' . $key . '</' . $tag . '>';
}
// replace all the present tags with the numbered ones
$html = str_replace((array) $arr_found[$tag], (array) $arr_back[$tag], $html);
}
} // end foreach
// clean the html
$arr_search = [
'/\>[^\S ]+/s', // strip whitespaces after tags, except space
'/[^\S ]+\</s', // strip whitespaces before tags, except space
'/(\s)+/s' // shorten multiple whitespace sequences
];
$arr_replace = [
'>',
'<',
'\\1'
];
$clean = preg_replace($arr_search, $arr_replace, $html);
// put the kept items back
foreach ($arr_tags as $tag) {
if(isset($arr_found[$tag])) {
// the tag was present replace them back
$clean = str_replace($arr_back[$tag], $arr_found[$tag], $clean);
}
} // end foreach
// give the cleaned html back
return $clean;
} // end function
Related
I want to scramble words on a page in WordPress (shuffle the letters of each word, leaving the first and last letter in place). For example, Team becomes Taem and Blame becomes Bamle. I am using str_replace with the the_content filter to achieve this:
function replace_text_wps($text){
$textr=wp_filter_nohtml_kses( $text );
$rtext= (explode(" ",$textr));
$rep=array();
foreach($rtext as $r)
{
//echo $r;
if (strlen($r)>3)
{
if(ctype_alpha($r)){
$first=substr($r,0,1);
$last=substr($r,-1);
$middle=substr($r,1,-1);
$rep[$r]=$first.str_shuffle($middle).$last;
}
}
}
$text = str_replace(array_keys($rep), $rep, $text);
return $text;
}
add_filter('the_content', 'replace_text_wps',99);
The issue I am facing is when I run str_replace it also changes text in links and classes of HTML. I just want to change the text not html.
For example, if I change the word Content,
<a class='elementor content'>Content Here</a> becomes <a class='elementor conentt'>Conentt Here</a>
Can someone provide a good solution for this?
If you really have to use str_replace…
Use preg_split to split between HTML tags and plain text:
function my_text_filter($text) {
$out = "";
$parts = preg_split('/(<[^>]+>)/', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE); // -1 = no limit (passing null is deprecated since PHP 8.1)
foreach ($parts as $part) {
if ($part && '<' === $part[0] && '>' === substr($part, -1)) {
$out .= $part; // Is a HTML tag, skip!
continue;
}
$out .= replace_text_wps($part);
}
return $out;
}
add_filter('the_content', 'my_text_filter', 99);
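To see what the split does without WordPress in the way, here is a small stand-alone sketch. transform_text_only() is a hypothetical name, and strtoupper stands in for the word scrambler so the result is predictable. Note the limits of the pattern: it assumes no attribute value contains a literal ">" character.

```php
<?php
// Hypothetical helper: apply $transform to the text between HTML tags only.
function transform_text_only(string $html, callable $transform): string
{
    // Tags are captured as delimiters, so they land at the odd indexes and
    // the plain text between them at the even indexes.
    $parts = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return $html; // regex failure: leave the content untouched
    }
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = $transform($part);
        }
    }
    return implode('', $parts);
}

$html   = "<a class='elementor content'>Content Here</a>";
$result = transform_text_only($html, 'strtoupper');
echo $result;
// prints: <a class='elementor content'>CONTENT HERE</a>
```

The class attribute is untouched because the whole opening tag is one delimiter piece and is never passed to the callback.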
I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their description. It uses a PHP PREG_REPLACE function.
The issue is that it also replaces acronyms inside <pre> tags, which I use to present source code.
Could you modify this expression so that it won't replace acronyms contained inside <pre> tags (not only directly inside, but at any depth)? Is it possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);
You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
See demo on regex101.com
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
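Output (only the ASCII outside the <pre> blocks is wrapped):
<pre>ASCII
Sometext
Moretext</pre>More text 
<acronym title="American Standard Code for Information Interchange">ASCII</acronym>
More text<pre>More
lines
ASCII
lines</pre>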
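One thing the demo glosses over: $acronym is interpolated straight into the pattern, so an acronym containing regex metacharacters or the delimiter (for example "I/O") would break it. A hedged sketch of a safer wrapper follows; wrap_acronym() is a made-up name, not something the Acronyms plugin provides.

```php
<?php
// Hypothetical wrapper around the SKIP/FAIL pattern shown above.
function wrap_acronym(string $text, string $acronym, string $fulltext): string
{
    // Escape regex metacharacters and the "/" delimiter in the acronym.
    $quoted = preg_quote($acronym, '/');
    $re     = '/(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b' . $quoted . '\b/';
    $title  = htmlspecialchars($fulltext, ENT_QUOTES);

    // A callback avoids "$1" or "\1" inside the description being read as
    // backreferences in a replacement string.
    $result = preg_replace_callback(
        $re,
        function (array $m) use ($title): string {
            return '<acronym title="' . $title . '">' . $m[0] . '</acronym>';
        },
        $text
    );
    return $result ?? $text; // null means a regex error: return the input
}

$out = wrap_acronym('<pre>I/O in code</pre> I/O in prose', 'I/O', 'input/output');
echo $out;
// prints: <pre>I/O in code</pre> <acronym title="input/output">I/O</acronym> in prose
```

The \b anchors only behave as expected when the acronym starts and ends with a word character, which is an assumption of this sketch.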
It is also possible to use preg_split and keep the code block as a group, only replace the non-code block part then combine it back as a complete string:
function replace($s) {
return str_replace('"', '&quot;', $s); // do something with `$s`
}
$text = 'Your text goes here...';
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE); // -1 = no limit (passing null is deprecated since PHP 8.1)
$text = "";
$x = 0;
foreach ($parts as $v) {
if (trim($v) === "") {
$text .= $v;
continue;
}
if ($v[0] === '<' && substr($v, -1) === '>') {
if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
$x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
}
$text .= $v; // this is a HTML tag…
} else {
$text .= !$x ? replace($v) : $v; // process or skip…
}
}
return $text;
Taken from here.
I am using a function to compress HTML on the fly. Here it is:
function compress_page($buffer) {
$search = array(
'/\>[^\S ]+/s', /*strip whitespaces after tags, except space*/
'/[^\S ]+\</s', /*strip whitespaces before tags, except space*/
'/(\s)+/s', /*shorten multiple whitespace sequences*/
);
$replace = array(
'>',
'<',
'\\1',
);
$buffer = preg_replace($search, $replace, $buffer);
return $buffer;
}
The function works, but after implementing it, German characters are no longer displayed; they show up as "�". Can you please help me find the problem?
I tried other ways to minify HTML but got the same problem.
This may happen because the regexes lack the Unicode (u) flag.
Anyway, I wrote this code to minify:
function sanitize_output($buffer, $type = null) {
$search = array(
'/\>[^\S ]+/s', // strip whitespaces after tags, except space
'/[^\S ]+\</s', // strip whitespaces before tags, except space
'/(\s)+/s', // shorten multiple whitespace sequences
'/<!--(.|\s)*?-->/', // Remove HTML comments
'#/\*(.|\s)*\*/#Uu' // Remove JS comments
);
$replace = array(
'>',
'<',
' ',
'',
''
);
if( $type == 'html' ){
// Remove quotes around short attribute values
$search[] = '#(\w+=)(?:"|\')((\S|\.|\-|/|_|\(|\)|\w){1,8})(?:"|\')#u';
$replace[] = '$1$2';
// Remove spaces between tags
$search[] = '#(>)\s+(<)#mu';
$replace[] = '$1$2';
}
$buffer = str_replace( PHP_EOL, '', preg_replace( $search, $replace, $buffer ) );
return $buffer;
}
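For reference, here is a minimal sketch of what "adding the Unicode flag" looks like when applied to the three whitespace rules from the question. compress_page_utf8() is a hypothetical name, and the sketch assumes the page is UTF-8 encoded. Without the u modifier PCRE works on single bytes, and under some locales \s may match one byte of a multi-byte character, which would explain the "�" symbols.

```php
<?php
// Hypothetical UTF-8 aware variant of compress_page() from the question.
function compress_page_utf8(string $buffer): string
{
    $search = [
        '/\>[^\S ]+/su', // strip whitespace after tags, except space
        '/[^\S ]+\</su', // strip whitespace before tags, except space
        '/(\s)+/su',     // shorten multiple whitespace sequences
    ];
    $replace = ['>', '<', '\\1'];

    $result = preg_replace($search, $replace, $buffer);

    // With the u modifier preg_replace() returns null when the input is not
    // valid UTF-8, so fall back to the unminified page instead of a blank one.
    return $result ?? $buffer;
}

$page = "<p>Grüße  aus\n\n Köln</p>\n<p>Straße</p>";
$min  = compress_page_utf8($page);
```

The null fallback matters with output buffering: returning null from the ob_start() callback would otherwise send an empty page.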
After research, I found this solution. This will minify full html in one line.
function pt_html_minyfy_finish( $html ) {
$html = preg_replace('/<!--(?!\s*(?:\[if [^\]]+]|!|>))(?:(?!-->).)*-->/s', '', $html);
$html = str_replace(array("\r\n", "\r", "\n", "\t"), '', $html);
// collapse runs of spaces: replace two spaces with one until none remain
while ( stristr($html, '  '))
$html = str_replace('  ', ' ', $html);
return $html;
}
Hope this will help someone!
I am using the code below to output content from a category, but the content has bold tags, which in turn make my entire text bold. What would be the easiest way to remove the bold text in my code? Any help would be greatly appreciated, as I am using this to learn.
<p><?php $content = get_the_content();
if (mb_strlen($content) > 700) {
$content = mb_substr($content, 0, 700);
// make sure it ends in a word by chomping at last space
$content = mb_substr($content, 0, mb_strrpos($content, " ")).'...<br /><span class="landing_latest_articles_read_more">Read More</span>';
}
echo $content; ?></p>
Use strip_tags,
or this might work:
$string = preg_replace("#<b>|</b>#", "", $string); // "#" as delimiter so the "/" in </b> needs no escaping
Here is a function like strip_tags, only it removes only the tags (with attributes) specified:
<?php
function strip_only($str, $tags) {
if(!is_array($tags)) {
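// check $tags (not $str) for '>' to decide whether it is a "<a><b>" style list
$tags = (strpos($tags, '>') !== false ? explode('>', str_replace('<', '', $tags)) : array($tags));
if(end($tags) == '') array_pop($tags);
}
// \b after the tag name stops "b" from also matching <br>, <body> or <blockquote>
foreach($tags as $tag) $str = preg_replace('#</?'.preg_quote($tag, '#').'\b[^>]*>#is', '', $str);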
return $str;
}
?>
So you would use it like this:
<p><?php $content = get_the_content();
if (mb_strlen($content) > 700) {
$content = mb_substr($content, 0, 700);
// make sure it ends in a word by chomping at last space
$content = mb_substr($content, 0, mb_strrpos($content, " ")).'...<br /><span class="landing_latest_articles_read_more">Read More</span>';
$content = strip_only($content, '<b>'); //you want to remove <b> tag
}
echo $content; ?></p>
This works; I tried it here.
If you only wish to remove bold tags:
$content = preg_replace('/<[\/]?b>/i', '', $content);
Though you'd have to be sure that it is only <b> tags making things bold and not font tags.
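Putting the pieces together, a rough sketch that strips both <b> and <strong> and only then truncates, so the cut can never land in the middle of a bold tag. strip_bold() and excerpt() are hypothetical names; the sketch assumes the mbstring extension that the question's code already relies on, and it does not handle styling done through font tags or CSS.

```php
<?php
// Hypothetical helper: remove <b> and <strong> tags but keep their text.
function strip_bold(string $html): string
{
    // \b after the tag name keeps <br>, <body>, <blockquote> etc. intact.
    return preg_replace('#</?(?:b|strong)\b[^>]*>#i', '', $html) ?? $html;
}

// Hypothetical helper: strip bold tags first, then cut at the last space
// before $limit characters.
function excerpt(string $html, int $limit = 700): string
{
    $text = strip_bold($html);
    if (mb_strlen($text) <= $limit) {
        return $text;
    }
    $cut       = mb_substr($text, 0, $limit);
    $lastSpace = mb_strrpos($cut, ' ');
    if ($lastSpace !== false) {
        $cut = mb_substr($cut, 0, $lastSpace); // end on a whole word
    }
    return $cut . '...';
}

$short = excerpt('<b>one two three</b>', 9);
echo $short;
// prints: one two...
```

In the template from the question, excerpt(get_the_content()) would replace the manual mb_substr calls, with the "Read More" span appended afterwards.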