I am trying to convert HTML to entities using PHP, but I need to exclude <br> and <a> tags.
Here's an example of my code:
<?php
$string[0] = "<a href='http://hidd3n.tk'>Needs to stay</a> Filler text in between
<br><br> <script src='http://malicious.com/'></script> NEEDS to go";
$string[1] = htmlentities($string[0], ENT_QUOTES, "UTF-8");
?>
I suggest using BBCode instead, which is much safer.
EDIT:
Okay, I have worked out a way.
Use this function, which is safer than the previous one:
function convert_myhtml_entities($string){
$string = htmlentities($string, ENT_NOQUOTES, "UTF-8");
$string = preg_replace('/&lt;\s*br\s*(\/|)\s*&gt;/U','<br$1>',$string);
$string = preg_replace('/&lt;\s*a(.*)\s*&gt;/U','<a$1>',$string);
$string = preg_replace('/&lt;\s*\/\s*a\s*&gt;/U','</a>',$string);
return $string;
}
It has now been tested with the string above.
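A stricter variant of the same idea can be sketched as follows. This is only a sketch under assumptions: the function name escape_except_br_and_links is made up for illustration, and it restores <a> tags only when they carry a single href attribute pointing at an http:// or https:// URL without an ampersand, so javascript: links and extra attributes stay escaped.

```php
<?php
// Hypothetical helper (not from the answer above): escape everything,
// then restore only <br> and plain <a href="http(s)://..."> tags.
function escape_except_br_and_links(string $input): string
{
    // ENT_QUOTES encodes both quote styles (' becomes &#039;).
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Restore <br>, <br/> and <br />.
    $escaped = preg_replace('~&lt;br\s*/?&gt;~i', '<br>', $escaped);

    // Restore <a href='...'> or <a href="..."> only for http/https URLs.
    // [^&\s]+ stops at the first escaped character, so nothing can break out.
    $escaped = preg_replace_callback(
        '~&lt;a\s+href=(?:&quot;|&#0?39;)(https?://[^&\s]+)(?:&quot;|&#0?39;)\s*&gt;~i',
        function (array $m): string {
            return '<a href="' . $m[1] . '">';
        },
        $escaped
    );

    // Restore closing </a> tags.
    return preg_replace('~&lt;/a&gt;~i', '</a>', $escaped);
}
```

Called on the string from the question, this should keep the link and the line breaks while the <script> tag stays escaped. URLs with query strings containing & are left escaped by this sketch, which is the price of the conservative pattern.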
Related
I have a string like below:
<pre title="language-markup">
<code>
<div title="item_content item_view_content" itemprop="articleBody">
abc
</div>
</code>
</pre>
Inside the <code></code> tag I want to replace all < and > characters with &lt; and &gt;. How should I do this?
Example: <code> &lt;div&gt; </code>.
Please tell me if you have any ideas. Thanks all.
Try the solution below:
$textToScan = '<pre title="language-markup">
<code>
<div title="item_content item_view_content" itemprop="articleBody">
abc
</div>
</code>
</pre>';
// the regex pattern (i = case-insensitive, s = dot also matches newlines)
$search = "~<code>(.*?)</code>~is";
// first look for all CODE tags and their content
preg_match_all($search, $textToScan, $matches);
//print_r($matches);
// now replace all the CODE tags and their content with a htmlspecialchars() content
foreach($matches[1] as $match){
$replace = htmlspecialchars($match);
// now replace the previously found CODE block
$textToScan = str_replace($match, $replace, $textToScan);
}
// output result
echo $textToScan;
output:
<pre title="language-markup">
<code>
&lt;div title=&quot;item_content item_view_content&quot; itemprop=&quot;articleBody&quot;&gt;
abc
&lt;/div&gt;
</code>
</pre>
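The same result can probably be reached in one pass with preg_replace_callback, which avoids calling str_replace on the whole document (that call would also rewrite any identical text outside the <code> tags). A minimal sketch, assuming the tags are written exactly as <code> and </code> with no attributes; the function name escape_code_blocks is made up for illustration:

```php
<?php
// Hypothetical helper: escape only the text between <code> and </code>.
function escape_code_blocks(string $html): string
{
    return preg_replace_callback(
        '~(<code>)(.*?)(</code>)~is',
        function (array $m): string {
            // $m[1] and $m[3] are the tags themselves, $m[2] is the content.
            return $m[1] . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8') . $m[3];
        },
        $html
    );
}
```

Anything outside the <code> tags is returned unchanged.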
Don't. Use htmlspecialchars; it exists for exactly that purpose:
echo htmlspecialchars("<a href='test'>Test</a>");
Output of your HTML code
<pre title="language-markup"><code>
<div title="item_content item_view_content"
itemprop="articleBody">abc</div></code></pre>
Another example based on your comment
<code>
<?php
echo htmlspecialchars('html here');?>
</code>
Use either htmlspecialchars() or htmlentities()
$string = "<html></html>";
// Do this
$encodedString = htmlentities($string);
// or
$encodedString = htmlspecialchars($string);
The difference between these two functions is that htmlentities() encodes every character that has a named HTML entity equivalent, while htmlspecialchars() encodes only the special characters (&, <, >, and quotes, depending on the flags).
Below are some quotes from PHP.net
From the PHP documentation for htmlentities:
This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.
From the PHP documentation for htmlspecialchars:
Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.
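A small comparison may make the difference concrete. This is just an illustrative sketch; the sample text is made up, and the é is written as UTF-8 bytes so the snippet does not depend on the file encoding:

```php
<?php
// "café <b>" with the é given as explicit UTF-8 bytes.
$text = "caf\xC3\xA9 <b>";

// Only &, <, > and quotes are converted; the é is left alone.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Every character with a named entity is converted, including é.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // café &lt;b&gt;
echo $entities, "\n"; // caf&eacute; &lt;b&gt;
```

For UTF-8 pages htmlspecialchars() is usually enough, since the accented characters can be sent as they are.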
OK, I tried to fix my problem and succeeded; this is the code that resolved it. You can use my way or Chetan Ameta's way below my answer:
function replaceString($string)
{
preg_match_all('/<code>(.*?)<\/code>/', $string, $matches);
$result = [];
foreach ($matches[1] as $key => $match) {
$result[$key] = str_replace(['<', '>'], ['&lt;', '&gt;'], $match);
}
return str_replace($matches[1], $result, $string);
}
$string = '<pre title="language-markup"><code><div title="item_content item_view_content" itemprop="articleBody">abc</div></code></pre>';
echo replaceString($string);
I like this place. Thanks to all who helped me; I'm so grateful. Thanks again.
I'm looking for a regex that will be able to replace all javascript: links (links whose href starts with javascript:) with a warning. I've been having a play but no success so far! I've always been bad with regex; can someone point me in the right direction? I have this so far:
Edit: People are saying not to use regex. The HTML will be the output of a Markdown parser with all HTML tags in the Markdown stripped, so I know that all links in the output will be formatted as stated above; regex should therefore be a good tool in this particular situation. I am not allowing users to enter pure HTML. SO has done something very similar: try creating a javascript link, and it will be removed.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = "/ href\\s*?=\\s*?[\"']\\s*?(javascript)\\s*?(:).*?([\"']) /is";
$replacement = "\"javascript: alert('Javascript links have been blocked');\"";
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>
<form method="post">
<input type="text" name="jsfilter" />
<button type="submit">Submit</button>
</form>
The right regex should be:
$pattern = '/href="javascript:[^"]+"/';
$replacement = 'href="javascript:alert(\'Javascript links have been blocked\')"';
Use strip_tags and htmlSpecialChars() to display user generated content. If you want to let users use specific tags, refer to BBcode.
You should handle both single and double quotes, whitespace, etc.:
$html = preg_replace( '/href\s*=\s*"javascript:[^"]+"/i' , 'href="#"' , $html );
$html = preg_replace( '/href\s*=\s*\'javascript:[^\']+\'/i' , 'href=\'#\'' , $html );
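Both quote styles can probably be covered by one pattern with a backreference. A minimal sketch, assuming the href value is always quoted; the function name neutralize_js_links is made up for illustration, and this does not catch obfuscated forms such as entity-encoded or unquoted javascript: URLs:

```php
<?php
// Hypothetical helper: replace quoted javascript: hrefs with href="#".
function neutralize_js_links(string $html): string
{
    return preg_replace(
        // (["\']) captures the opening quote; \1 requires the same quote to close.
        '/href\s*=\s*(["\'])\s*javascript:.*?\1/is',
        'href="#"',
        $html
    );
}

echo neutralize_js_links('<a href="javascript:alert(\'x\')">JS Link</a>');
// <a href="#">JS Link</a>
```

Ordinary http links are left as they are because the pattern only matches values that start with javascript:.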
Try this code. I think it should help.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = '/a href="javascript:(.*?)"/i';
$replacement = 'a href="javascript: alert(\'Javascript links have been blocked\');"';
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>
I have a string:
[COLOR=gray]A bunch of text.[/COLOR]
And I would like to write a preg_replace that removes everything between "[COLOR=gray]" and "[/COLOR]" -- if it's possible to remove those tags as well, that's great, otherwise I can do a simple replace afterward.
$str = 'dfgdfg[COLOR=gray]A bunch of text.[/COLOR]dfgdfgdfgfg';
$str1 = preg_replace('/\[COLOR=gray\].*\[\/COLOR\]/',"",$str);
echo $str1;
Or, if COLOR is not always gray:
$str = 'dfgdfg[COLOR=gray]A bunch of text.[/COLOR]dfgdfgdfgfg';
$str1 = preg_replace('/\[COLOR=\w+\].*\[\/COLOR\]/',"",$str);
echo $str1;
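One thing to watch with the patterns above: .* is greedy, so if the string contains two [COLOR] blocks, everything from the first opening tag to the last closing tag is removed, including the text in between. A sketch of a non-greedy variant (the function name strip_color_blocks is made up for illustration; the s flag also lets the text span several lines):

```php
<?php
// Hypothetical helper: remove each [COLOR=...]...[/COLOR] block separately.
function strip_color_blocks(string $text): string
{
    // .*? stops at the first [/COLOR] instead of the last one.
    return preg_replace('/\[COLOR=\w+\].*?\[\/COLOR\]/is', '', $text);
}

echo strip_color_blocks('a[COLOR=gray]x[/COLOR]b[COLOR=red]y[/COLOR]c');
// abc
```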
How do I convert a string that has a - or + sign to a html friendly string?
I mean converting those characters to HTML notation, like a space is &nbsp; and so on...
PS: htmlentities doesn't work. I still see the -/+.
Try this
$string = str_replace('+', '&#43;', $string); // Convert + sign
$string = str_replace('-', '&#45;', $string); // Convert - sign
I don't think there are entities for these symbols; see: http://www.w3schools.com/tags/ref_entities.asp
I tested with
$str = "- and +"; echo htmlentities($str);
and didn't get entities. According to: http://us.php.net/manual/en/function.htmlentities.php
I would expect them to be encoded if there was encoding available.
No idea what you want to accomplish. But this escapes selected characters to html entities:
$html = preg_replace_callback('/[+-]/', function ($m) { return '&#' . ord($m[0]) . ';'; }, $html); // the /e modifier no longer exists in PHP 7+
As far as I am aware, - and + are fine in HTML, and don't have an entity equivalent. See http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
Are you sure you're not thinking of URL encoding?
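If URL encoding is what is actually needed, a quick sketch of the two built-in functions (the sample value is made up):

```php
<?php
$value = '1 + 2 - 3';

// urlencode(): for query strings; spaces become +, a literal + becomes %2B.
$query = urlencode($value);

// rawurlencode(): RFC 3986 style; spaces become %20.
$path = rawurlencode($value);

echo $query, "\n"; // 1+%2B+2+-+3
echo $path, "\n";  // 1%20%2B%202%20-%203
```

In both cases the - is left alone, because it is a safe character in URLs.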
Specify that you want it to use unicode as follows:
htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
Have a look at the 2nd comment on this page:
http://www.php.net/manual/en/function.htmlentities.php#100388
This will enable more encoding characters.
If you just want to encode some, then this is a little lighter weight:
<?php
$ent = array(
'+'=>'&#43;',
'-'=>'&#45;'
);
echo strtr('+ and -', $ent);
?>
I need to strip all <br /> and all 'quotes' (") and all 'ands' (&) and replace them with a space only ...
How can I do this? (in PHP)
I have tried this for the <br />:
$description = preg_replace('<br />', '', $description);
But it returned <> in place of every <br />...
Thanks
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
http://php.net/manual/en/function.strip-tags.php
You can use str_replace like this:
str_replace("<br/>", " ", $orig );
preg_replace etc uses regular expressions and that may not be what you want.
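For what it's worth, the <> result in the question is most likely down to delimiters: preg_replace treats the first character of the pattern as the delimiter, and < ... > is accepted as a bracket-style pair, so the pattern that actually runs is just "br /". A short sketch of the difference (the variable names are made up):

```php
<?php
$input = 'a<br />b';

// '<' and '>' act as the delimiters here, so only "br /" is removed.
$broken = preg_replace('<br />', '', $input);

// With explicit / delimiters the whole tag is matched.
$fixed = preg_replace('/<br \/>/', ' ', $input);

echo $broken, "\n"; // a<>b
echo $fixed, "\n";  // a b
```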
If str_replace() isn't working for you, then something else must be wrong, because
$string = 'A string with <br/> & "double quotes".';
$string = str_replace(array('<br/>', '&', '"'), ' ', $string);
echo $string;
outputs
A string with double quotes .
Please provide an example of your input string and what you expect it to look like after filtering.
To manipulate HTML it is generally a good idea to use a DOM-aware tool instead of plain text manipulation tools (think, for example, what will happen if you encounter variants like <br/>, <br /> with more than one space, or even <br> or <BR/>, which although illegal are sometimes used). See for example here: http://sourceforge.net/projects/simplehtmldom/
To remove all permutations of br:
<br> <br /> <br/> <br >
check out the user contributed strip_only() function in
http://www.php.net/strip_tags
The "Use the DOM instead of replacing" caveat is always correct, but if the task is really limited to these three characters, this should be o.k.
Try this:
$description = preg_replace('/<br \/>/iU', '', $description);
$string = "Test<br>Test<br />Test<br/>";
$string = preg_replace( "/<br>|\n|<br( ?)\/>/", " ", $string );
echo $string;
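If only the tag variants matter (not newlines), a single case-insensitive pattern should cover <br>, <br/>, <br />, <br > and their uppercase forms. A minimal sketch; the function name br_to_space is made up for illustration, and it does not handle <br> tags that carry attributes:

```php
<?php
// Hypothetical helper: replace every simple <br> variant with a space.
function br_to_space(string $html): string
{
    // \s* allows optional whitespace, /? allows the optional slash.
    return preg_replace('~<br\s*/?>~i', ' ', $html);
}

echo br_to_space('Test<br>Test<br />Test<br/>Test<BR >');
// Test Test Test Test (with a trailing space)
```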
This worked for me, to remove <br/>:
(&gt; is recognised whereas > isn't)
$temp2 = str_replace('&lt;','', $temp);
// echo ($temp2);
$temp2 = str_replace('/&gt;','', $temp2);
// echo ($temp2);
$temp2 = str_replace('br','', $temp2);
echo ($temp2);