PHP htmlentities allow <b> and <i> only - php

Using htmlentities() is there a way I can set to allow only <b> and <i> to convert into bold and italic text? I know there was one way of doing this, but i have forgotten.

It's pretty easy
<?php
$string = htmlentities($text);
$string = str_replace(array("<i>", "<b>", "</i>", "</b>"), array("<i>", "<b>", "</i>", "</b>"), $string);

I use a helper function:
# Sanitizer function - removes forbidden tags, including script tags
function strip_tags_attributes( $str,
$allowedTags = array('<a>','<b>','<blockquote>','<br>','<cite>','<code>','<del>','<div>','<em>','<ul>','<ol>','<li>','<dl>','<dt>','<dd>','<img>','<ins>','<u>','<q>','<h3>','<h4>','<h5>','<h6>','<samp>','<strong>','<sub>','<sup>','<p>','<table>','<tr>','<td>','<th>','<pre>','<span>'),
$disabledEvents = array('onclick','ondblclick','onkeydown','onkeypress','onkeyup','onload','onmousedown','onmousemove','onmouseout','onmouseover','onmouseup','onunload') )
{
if( empty($disabledEvents) ) {
return strip_tags($str, implode('', $allowedTags));
}
return preg_replace('/<(.*?)>/ies', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . implode('|', $disabledEvents) . ")=[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($str, implode('', $allowedTags)));
}
For your example, remove everything except <b> and <i> from the $allowedTags array.

Related

How can I avoid adding href to an overlapping keyword in string?

Using the following code:
$text = "أطلقت غوغل النسخة المخصصة للأجهزة الذكية العاملة بنظام أندرويد من الإصدار “25″ لمتصفحها الشهير كروم.ولم تحدث غوغل تطبيق كروم للأجهزة العاملة بأندرويد منذ شهر تشرين الثاني العام الماضي، وهو المتصفح الذي يستخدمه نسبة 2.02% من أصحاب الأجهزة الذكية حسب دراسة سابقة. ";
$tags = "غوغل, غوغل النسخة, كروم";
$tags = explode(",", $tags);
foreach($tags as $k=>$v) {
$text = preg_replace("/\b{$v}\b/u","$0",$text, 1);
}
echo $text;
Will give the following result:
I love PHP">love PHP</a>, but I am facing a problem
Note that my text is in Arabic.
The way is to do all in one pass. The idea is to build a pattern with an alternation of tags. To make this way work, you must before sort the tags because the regex engine will stop at the first alternative that succeeds (otherwise 'love' will always match even if it is followed by 'php' and 'love php' will never be matched).
To limit the replacement to the first occurence of each word you can remove tag from the array once it has been found and you test if it is always present in the array inside the replacement callback function:
$text = 'I love PHP, I love love but I am facing a problem';
$tagsCSV = 'love, love php, facing';
$tags = explode(', ', $tagsCSV);
rsort($tags);
$tags = array_map('preg_quote', $tags);
$pattern = '/\b(?:' . implode('|', $tags) . ')\b/iu';
$text = preg_replace_callback($pattern, function ($m) use (&$tags) {
$mLC = mb_strtolower($m[0], 'UTF-8');
if (false === $key = array_search($mLC, $tags))
return $m[0];
unset($tags[$key]);
return '<a href="index.php?s=news&tag=' . rawurlencode($mLC)
. '">' . $m[0] . '</a>';
}, $text);
Note: when you build an url you must encode special characters, this is the reason why I use preg_replace_callback instead of preg_replace to be able to use rawurlencode.
If you have to deal with an utf8 encoded string, you need to add the u modifier to the pattern and you need to replace strtolower with mb_strtolower)
the preg_split way
$tags = explode(', ', $tagsCSV);
rsort($tags);
$tags = array_map('preg_quote', $tags);
$pattern = '/\b(' . implode('|', $tags) . ')\b/iu';
$items = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$itemsLength = count($items);
$i = 1;
while ($i<$itemsLength && count($tags)) {
if (false !== $key = array_search(mb_strtolower($items[$i], 'UTF-8'), $tags)) {
$items[$i] = '<a href="index.php?s=news&tag=' . rawurlencode($tags[$key])
. '">' . $items[$i] . '</a>';
unset($tags[$key]);
}
$i+=2;
}
$result = implode('', $items);
Instead of calling preg_replace multiple times, call it a single time with a regexp that matches any of the tags:
$tags = explode(",", tags);
$tags_re = '/\b(' . implode('|', $tags) . ')\b/u';
$text = preg_replace($tags_re, '$0', $text, 1);
This turns the list of tags into the regexp /\b(love|love php|facing)\b/u. x|y in a regexp means to match either x or y.

Bold First Line of Output

I need to add <b> tags around the first line of text, but not contained in a paragraph tag - rather just all the text before the first <br>.
It's a PHP string being echoed, so it seems to me pretty simple but I'm not sure how I'd get just that section of text, bold it, and then continue with the rest as normal.
$str = "First Line<br>SecondLine<br>Third Line<br>";
echo $str;
//output:
<b>First Line</b><br>SecondLine<br>ThirdLine<br>";
substr and strpos to the rescue!
$firstBreak = strpos($str, '<br>');
if($firstBreak === false) {
$str = "<b>$str</b>";
} else {
$str = '<b>' . substr($str, 0, $firstBreak) . '</b>' . substr($str, $firstBreak);
}
Try:
$first_line = explode('<br>', $str)[0];
$new_str = str_replace($first_line,'<b>'.$first_line.'</b>',$str);

New line to paragraph function

I have this interesting function that I'm using to create new lines into paragraphs. I'm using it instead of the nl2br() function, as it outputs better formatted text.
function nl2p($string, $line_breaks = true, $xml = true) {
$string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);
// It is conceivable that people might still want single line-breaks
// without breaking into a new paragraph.
if ($line_breaks == true)
return '<p>'.preg_replace(array("/([\n]{2,})/i", "/([^>])\n([^<])/i"), array("</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'), trim($string)).'</p>';
else
return '<p>'.preg_replace(
array("/([\n]{2,})/i", "/([\r\n]{3,})/i","/([^>])\n([^<])/i"),
array("</p>\n<p>", "</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'),
trim($string)).'</p>';
}
The problem is that whenever I try to create a single line break, it inadvertently removes the first character of the paragraph below it. I'm not familiar enough with regex to understand what is causing the problem.
Here is another approach that doesn't use regular expressions. Note, this function will remove any single line-breaks.
function nl2p($string)
{
$paragraphs = '';
foreach (explode("\n", $string) as $line) {
if (trim($line)) {
$paragraphs .= '<p>' . $line . '</p>';
}
}
return $paragraphs;
}
If you only need to do this once in your app and don't want to create a function, it can easily be done inline:
<?php foreach (explode("\n", $string) as $line): ?>
<?php if (trim($line)): ?>
<p><?=$line?></p>
<?php endif ?>
<?php endforeach ?>
The problem is with your match for single line breaks. It matches the last character before the line break and the first after. Then you replace the match with <br>, so you lose those characters as well. You need to keep them in the replacement.
Try this:
function nl2p($string, $line_breaks = true, $xml = true) {
$string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);
// It is conceivable that people might still want single line-breaks
// without breaking into a new paragraph.
if ($line_breaks == true)
return '<p>'.preg_replace(array("/([\n]{2,})/i", "/([^>])\n([^<])/i"), array("</p>\n<p>", '$1<br'.($xml == true ? ' /' : '').'>$2'), trim($string)).'</p>';
else
return '<p>'.preg_replace(
array("/([\n]{2,})/i", "/([\r\n]{3,})/i","/([^>])\n([^<])/i"),
array("</p>\n<p>", "</p>\n<p>", '$1<br'.($xml == true ? ' /' : '').'>$2'),
trim($string)).'</p>';
}
I also wrote a very simple version:
function nl2p($text)
{
return '<p>' . str_replace(['\r\n', '\r', '\n'], '</p><p>', $text) . '</p>';
}
#Laurent's answer wasn't working for me - the else statement was doing what the $line_breaks == true statement should have been doing, and it was making multiple line breaks into <br> tags, which PHP's native nl2br() already does.
Here's what I managed to get working with the expected behavior:
function nl2p( $string, $line_breaks = true, $xml = true ) {
// Remove current tags to avoid double-wrapping.
$string = str_replace( array( '<p>', '</p>', '<br>', '<br />' ), '', $string );
// Default: Use <br> for single line breaks, <p> for multiple line breaks.
if ( $line_breaks == true ) {
$string = '<p>' . preg_replace(
array( "/([\n]{2,})/i", "/([\r\n]{3,})/i", "/([^>])\n([^<])/i" ),
array( "</p>\n<p>", "</p>\n<p>", '$1<br' . ( $xml == true ? ' /' : '' ) . '>$2' ),
trim( $string ) ) . '</p>';
// Use <p> for all line breaks if $line_breaks is set to false.
} else {
$string = '<p>' . preg_replace(
array( "/([\n]{1,})/i", "/([\r]{1,})/i" ),
"</p>\n<p>",
trim( $string ) ) . '</p>';
}
// Remove empty paragraph tags.
$string = str_replace( '<p></p>', '', $string );
// Return string.
return $string;
}
Here's an approach that comes with a reverse method to replace paragraphs back to regular line breaks and vice versa.
These are useful to use when building a form input. When saving a users input you may want to convert line breaks to paragraph tags, however when editing the text in a form, you may not want the user to see any html characters. Then we would replace the paragraphs back to line breaks.
// This function will convert newlines to HTML paragraphs
// without paying attention to HTML tags. Feed it a raw string and it will
// simply return that string sectioned into HTML paragraphs
function nl2p($str) {
$arr=explode("\n",$str);
$out='';
for($i=0;$i<count($arr);$i++) {
if(strlen(trim($arr[$i]))>0)
$out.='<p>'.trim($arr[$i]).'</p>';
}
return $out;
}
// Return paragraph tags back to line breaks
function p2nl($str)
{
$str = preg_replace("/<p[^>]*?>/", "", $str);
$str = str_replace("</p>", "\r\n", $str);
return $str;
}
Expanding upon #NaturalBornCamper's solution:
function nl2p( $text, $class = '' ) {
$string = str_replace( array( "\r\n\r\n", "\n\n" ), '</p><p>', $text);
$string = str_replace( array( "\r\n", "\n" ), '<br />', $string);
return '<p' . ( $class ? ' class="' . $class . '"' : '' ) . '>' . $string . '</p>';
}
This takes care of both double line breaks by converting them to paragraphs, and single line breaks by converting them to <br />
Just type this between your lines:
echo '<br>';
This will give you a new line.

str_ireplace not working with German characters

I am using the following function to search for words and color them inside a text. It works perfectly except for German characters (ä, ë, ß, etc). I already tried to encode to utf, decode, checked my meta tags and everything else like that but the problem is not the encoding as they show correctly on the site, they're just not "colored" by this function:
function highlight($keyword, $input, $linktext, $color){
$text = $input;
$word = $keyword;
$text = str_ireplace(" ".$word, ' <span id="">' . $word . '</span>', $text);
$iteration = 1;
while (true) {
$text = preg_replace('/<span.id="">' . $word . '<\/span>/imsxU', '<span style="background:'.$color.'" class="keyword" id="link' .
$iteration . "\" onclick=\"setLink2('$keyword','$linktext',$iteration)\">" . $word . '</span>', $text, 1, $count);
if (!$count) {
break;
}
$y++;
$iteration++;
}
return $text;
}
Any idea of how can I achieve this? I also tried to replace them but the German words should apear as they are on the text so that's a no go =/
as str_ functions in PHP do not support UTF, you have to use the mb_ extension. In your case, replace str_ireplace with mb_eregi_replace

How replace all spaces inside HTML elements with using preg_replace?

I need replace spaces with inside HTML elements.
Example:
<table atrr="zxzx"><tr>
<td>adfa a adfadfaf></td><td><br /> dfa dfa</td>
</tr></table>
should become
<table atrr="zxzx"><tr>
<td>adfa a adfadfaf></td><td><br /> dfa dfa</td>
</tr></table>
If you're working with php, you can do
$content = str_replace(' ', ' ', $content);
use regex to catch data between tags
(?:<\/?\w+)(?:\s+\w+(?:\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+)?)+\s*|\s*)\/?>([^<]*)?
then replace ' ' with ' '
also to catch before and after html :
^([^<>]*)<?
>([^<>]*)$
Edit:
here you go....
<?php
$data="dasdad asd a <table atrr=\"zxzx\"><tr><td>adfa a adfadfaf></td><td><br /> dfa dfa</td></tr></table> asdasd s ";
$exp="/((?:<\\/?\\w+)(?:\\s+\\w+(?:\\s*=\\s*(?:\\\".*?\\\"|'.*?'|[^'\\\">\\s]+)?)+\\s*|\\s*)\\/?>)([^<]*)?/";
$ex1="/^([^<>]*)(<?)/i";
$ex2="/(>)([^<>]*)$/i";
$data = preg_replace_callback($exp, function ($matches) {
return $matches[1] . str_replace(" ", " ", $matches[2]);
}, $data);
$data = preg_replace_callback($ex1, function ($matches) {
return str_replace(" ", " ", $matches[1]) . $matches[2];
}, $data);
$data = preg_replace_callback($ex2, function ($matches) {
return $matches[1] . str_replace(" ", " ", $matches[2]);
}, $data);
echo $data;
?>
it works... slightly modified but it would work without modifications (but i dont think youd understand the code ;) )
Since tokenizing HTML with regular expressions can be quite complicated (especially when allowing SGML quirks), you should use an HTML DOM parser like the one of PHP’s DOM library. Then you can query the DOM, get all text nodes and apply your replacement function on it:
$doc = new DOMDocument();
$doc->loadHTML($str);
$body = $doc->getElementsByTagName('body')->item(0);
mapOntoTextNodes($body, function(DOMText $node) { $node->nodeValue = str_replace(' ', ' ', $node->nodeValue); });
The mapOntoTextNodes function is a custom function I had defined in How to replace text URLs and exclude URLs in HTML tags?

Categories