Need help with preg_replace interpreting {variables} with parameters - php

I want to replace
{youtube}Video_ID_Here{/youtube}
with the embed code for a youtube video.
So far I have
preg_replace('/{youtube}(.*){\/youtube}/iU',...)
and it works just fine.
But now I'd like to be able to interpret parameters like height, width, etc. So could I have one regex for this whether is does or doesn't have parameters? It should be able to inperpret all of these below...
{youtube height="200px" width="150px" color1="#eee" color2="rgba(0,0,0,0.5)"}Video_ID_Here{/youtube}
{youtube height="200px"}Video_ID_Here{/youtube}
{youtube}Video_ID_Here{/youtube}
{youtube width="150px" showborder="1"}Video_ID_Here{/youtube}

Try this:
function createEmbed($videoID, $params)
{
// $videoID contains the videoID between {youtube}...{/youtube}
// $params is an array of key value pairs such as height => 200px
return 'HTML...'; // embed code
}
if (preg_match_all('/\{youtube(.*?)\}(.+?)\{\/youtube\}/', $string, $matches)) {
foreach ($matches[0] as $index => $youtubeTag) {
$params = array();
// break out the attributes
if (preg_match_all('/\s([a-z0-9]+)="([^\s]+?)"/', $matches[1][$index], $rawParams)) {
for ($x = 0; $x < count($rawParams[0]); $x++) {
$params[$rawParams[1][$x]] = $rawParams[2][$x];
}
}
// replace {youtube}...{/youtube} with embed code
$string = str_replace($youtubeTag, createEmbed($matches[2][$index], $params), $string);
}
}
this code matches the {youtube}...{/youtube} tags first and then splits out the attributes into an array, passing both them (as key/value pairs) and the video ID to a function. Just fill in the function definition to make it validate the params you want to support and build up the appropriate HTML code.

You probably want to use preg_replace_callback, as the replacing can get quite convoluted otherwise.
preg_replace_callback('/{youtube(.*)}(.*){\/youtube}/iU',...)
And in your callback, check $match[1] for something like the /(width|showborder|height|color1)="([^"]+)"/i pattern. A simple preg_match_all inside a preg_replace_callback keeps all portions nice & tidy and above all legible.

I would do it something like this:
preg_match_all("/{youtube(.*?)}(.*?){\/youtube}/is", $content, $matches);
for($i=0;$i<count($matches[0]);$i++)
{
$params = $matches[1][$i];
$youtubeurl = $matches[2][$i];
$paramsout = array();
if(preg_match("/height\s*=\s*('|\")([0-9]+px)('|\")/i", $params, $match)
{
$paramsout[] = "height=\"{$match[2]}\"";
}
//process others
//setup new code
$tagcode = "<object ..." . implode(" ", $paramsout) ."... >"; //I don't know what the code is to display a youtube video
//replace original tag
$content = str_replace($matches[0][$i], $tagcode, $content);
}
You could just look for params after "{youtube" and before "}" but you open yourself up to XSS problems. The best way would be look for a specific number of parameters and verify them. Don't allow things like < and > to be passed inside your tags as someone could put do_something_nasty(); or something.

I'd not use regex at all, since they are notoriously bad at parsing markup.
Since your input format is so close to HTML/XML in the first place, I'd rely on that
$tests = array(
'{youtube height="200px" width="150px" color1="#eee" color2="rgba(0,0,0,0.5)"}Video_ID_Here{/youtube}'
, '{youtube height="200px"}Video_ID_Here{/youtube}'
, '{youtube}Video_ID_Here{/youtube}'
, '{youtube width="150px" showborder="1"}Video_ID_Here{/youtube}'
, '{YOUTUBE width="150px" showborder="1"}Video_ID_Here{/youtube}' // deliberately invalid
);
echo '<pre>';
foreach ( $tests as $test )
{
try {
$youtube = SimpleXMLYoutubeElement::fromUserInput( $test );
print_r( $youtube );
}
catch ( Exception $e )
{
echo $e->getMessage() . PHP_EOL;
}
}
echo '</pre>';
class SimpleXMLYoutubeElement extends SimpleXMLElement
{
public static function fromUserInput( $code )
{
$xml = #simplexml_load_string(
str_replace( array( '{', '}' ), array( '<', '>' ), strip_tags( $code ) ), __CLASS__
);
if ( !$xml || 'youtube' != $xml->getName() )
{
throw new Exception( 'Invalid youtube element' );
}
return $xml;
}
public function toEmbedCode()
{
// write code to convert this to proper embode code
}
}

Related

Parse url with pattern in PHP?

How to determine, using regexp or something else in PHP, that following urls match some patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
Wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone analyze whether this code is good enough? And also explain why there is so many slashes in this part $to_replace = ['/\\\\\*/']; (regarding the replacement, found exactly such solution on the Internet).
If you know the format beforehand you can use preg_match. For example in the first example, you know %node can only be numbers. Matching multiples should be as as easy as we did it earlier, just store the regex in the array:
$patterns = array(
'node/%node' => '|node/[0-9]+$|',
'node/%node/news' => '|node/[0-9]+/news|',
'album/%album/shadowbox/%photo' => '|album/[0-9]+/shadowbox/[0-9]+|',
'media/photo' => '|media/photo|',
'blogs' => '|blogs|',
'news' => '|news|',
'node/%node/players' => '|node/[0-9]+/players|',
);
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
preg_match($regex, $url, $results);
if (!empty($results)) {
echo "<pre>" . $pattern . "</pre>";
}
}
Notice how I added the question mark $ to end of the first rule, this will insure that it doesn't break into the second rule.
Here is the generic solution to the solution above
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get strings at *
// In this case ['/node/','/hello/']
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can pretty much convert the code above into a function and pass parameters to check against the array case. The first step is regex to convert all variables into * and the explode by *. Finally loop over this array and keep comparing to the url to see if the pattern matches using simple string comparison.
As long as the pattern is fixed, you can use preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.

Remove all attributes from PHP string but keep basic markdown tags [duplicate]

How can I use php to strip all/any attributes from a tag, say a paragraph tag?
<p class="one" otherrandomattribute="two"> to <p>
Although there are better ways, you could actually strip arguments from html tags with a regular expression:
<?php
function stripArgumentFromTags( $htmlString ) {
$regEx = '/([^<]*<\s*[a-z](?:[0-9]|[a-z]{0,9}))(?:(?:\s*[a-z\-]{2,14}\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)(\s*\/?>[^<]*)/i'; // match any start tag
$chunks = preg_split($regEx, $htmlString, -1, PREG_SPLIT_DELIM_CAPTURE);
$chunkCount = count($chunks);
$strippedString = '';
for ($n = 1; $n < $chunkCount; $n++) {
$strippedString .= $chunks[$n];
}
return $strippedString;
}
?>
The above could probably be written in less characters, but it does the job (quick and dirty).
Strip attributes using SimpleXML (Standard in PHP5)
<?php
// define allowable tags
$allowable_tags = '<p><a><img><ul><ol><li><table><thead><tbody><tr><th><td>';
// define allowable attributes
$allowable_atts = array('href','src','alt');
// strip collector
$strip_arr = array();
// load XHTML with SimpleXML
$data_sxml = simplexml_load_string('<root>'. $data_str .'</root>', 'SimpleXMLElement', LIBXML_NOERROR | LIBXML_NOXMLDECL);
if ($data_sxml ) {
// loop all elements with an attribute
foreach ($data_sxml->xpath('descendant::*[#*]') as $tag) {
// loop attributes
foreach ($tag->attributes() as $name=>$value) {
// check for allowable attributes
if (!in_array($name, $allowable_atts)) {
// set attribute value to empty string
$tag->attributes()->$name = '';
// collect attribute patterns to be stripped
$strip_arr[$name] = '/ '. $name .'=""/';
}
}
}
}
// strip unallowed attributes and root tag
$data_str = strip_tags(preg_replace($strip_arr,array(''),$data_sxml->asXML()), $allowable_tags);
?>
Here is one function that will let you strip all attributes except ones you want:
function stripAttributes($s, $allowedattr = array()) {
if (preg_match_all("/<[^>]*\\s([^>]*)\\/*>/msiU", $s, $res, PREG_SET_ORDER)) {
foreach ($res as $r) {
$tag = $r[0];
$attrs = array();
preg_match_all("/\\s.*=(['\"]).*\\1/msiU", " " . $r[1], $split, PREG_SET_ORDER);
foreach ($split as $spl) {
$attrs[] = $spl[0];
}
$newattrs = array();
foreach ($attrs as $a) {
$tmp = explode("=", $a);
if (trim($a) != "" && (!isset($tmp[1]) || (trim($tmp[0]) != "" && !in_array(strtolower(trim($tmp[0])), $allowedattr)))) {
} else {
$newattrs[] = $a;
}
}
$attrs = implode(" ", $newattrs);
$rpl = str_replace($r[1], $attrs, $tag);
$s = str_replace($tag, $rpl, $s);
}
}
return $s;
}
In example it would be:
echo stripAttributes('<p class="one" otherrandomattribute="two">');
or if you eg. want to keep "class" attribute:
echo stripAttributes('<p class="one" otherrandomattribute="two">', array('class'));
Or
Assuming you are to send a message to an inbox and you composed your message with CKEDITOR, you can assign the function as follows and echo it to the $message variable before sending. Note the function with the name stripAttributes() will strip off all html tags that are unnecessary. I tried it and it work fine. i only saw the formatting i added like bold e.t.c.
$message = stripAttributes($_POST['message']);
or
you can echo $message; for preview.
I honestly think that the only sane way to do this is to use a tag and attribute whitelist with the HTML Purifier library. Example script here:
<html><body>
<?php
require_once '../includes/htmlpurifier-4.5.0-lite/library/HTMLPurifier/Bootstrap.php';
spl_autoload_register(array('HTMLPurifier_Bootstrap', 'autoload'));
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,b,a[href],i,br,img[src]');
$config->set('URI.Base', 'http://www.example.com');
$config->set('URI.MakeAbsolute', true);
$purifier = new HTMLPurifier($config);
$dirty_html = "
<a href=\"http://www.google.de\">broken a href link</a
fnord
<x>y</z>
<b>c</p>
<script>alert(\"foo!\");</script>
Anzahl besuchter Seiten
<img src=\"www.example.com/bla.gif\" />
<a href=\"http://www.google.de\">missing end tag
ende
";
$clean_html = $purifier->purify($dirty_html);
print "<h1>dirty</h1>";
print "<pre>" . htmlentities($dirty_html) . "</pre>";
print "<h1>clean</h1>";
print "<pre>" . htmlentities($clean_html) . "</pre>";
?>
</body></html>
This yields the following clean, standards-conforming HTML fragment:
broken a href linkfnord
y
<b>c
<a>Anzahl besuchter Seiten</a>
<img src="http://www.example.com/www.example.com/bla.gif" alt="bla.gif" /><a href="http://www.google.de">missing end tag
ende
</a></b>
In your case the whitelist would be:
$config->set('HTML.Allowed', 'p');
HTML Purifier is one of the better tools for sanitizing HTML with PHP.
You might also look into html purifier. True, it's quite bloated, and might not fit your needs if it only conceirns this specific example, but it offers more or less 'bulletproof' purification of possible hostile html. Also you can choose to allow or disallow certain attributes (it's highly configurable).
http://htmlpurifier.org/

str_replace not replacing correctly

I have the following, simple code:
$text = str_replace($f,''.$u.'',$text);
where $f is a URL, like http://google.ca, and $u is the name of the URL (my function names it 'Google').
My problem is, is if I give my function a string like
http://google.ca http://google.ca
it returns
Google" target="_blank">Google</a> Google" target="_blank">Google</a>
Which obviously isn't what I want. I want my function to echo out two separate, clickable links. But str_replace is replacing the first occurrence (it's in a loop to loop through all the found URLs), and that first occurrence has already been replaced.
How can I tell str_replace to ignore that specific one, and move onto the next? The string given is user input, so I can't just give it a static offset or anything with substr, which I have tried.
Thank you!
One way, though it's a bit of a kludge: you can use a temporary marker that (hopefully) won't appear in the string:
$text = str_replace ($f, '' . $u . '',
$text);
That way, the first substitution won't be found again. Then at the end (after you've processed the entire line), simply change the markers back:
$text = str_replace ('XYZZYPLUGH', $f, $text);
Why not pass your function an array of URLs, instead?
function makeLinks(array $urls) {
$links = array();
foreach ($urls as $url) {
list($desc, $href) = $url;
// If $href is based on user input, watch out for "javascript: foo;" and other XSS attacks here.
$links[] = '<a href="' . htmlentities($href) . '" target="_blank">'
. htmlentities($desc)
. '</a>';
}
return $links; // or implode('', $links) if you want a string instead
}
$urls = array(
array('Google', 'http://google.ca'),
array('Google', 'http://google.ca')
);
var_dump(makeLinks($urls));
If i understand your problem correctly, you can just use the function sprintf. I think something like this should work:
function urlize($name, $url)
{
// Make sure the url is formatted ok
if (!filter_var($url, FILTER_VALIDATE_URL))
return '';
$name = htmlspecialchars($name, ENT_QUOTES);
$url = htmlspecialchars($url, ENT_QUOTES);
return sprintf('%s', $url, $name);
}
echo urlize('my name', 'http://www.domain.com');
// my name
I havent test it though.
I suggest you to use preg_replace instead of str_replace here like this code:
$f = 'http://google.ca';
$u = 'Google';
$text='http://google.ca http://google.ca';
$regex = '~(?<!<a href=")' . preg_quote($f) . '~'; // negative lookbehind
$text = preg_replace($regex, ''.$u.'', $text);
echo $text . "\n";
$text = preg_replace($regex, ''.$u.'', $text);
echo $text . "\n";
OUTPUT:
Google Google
Google Google

is there a PHP library that handles URL parameters adding, removing, or replacing?

when we add a param to the URL
$redirectURL = $printPageURL . "?mode=1";
it works if $printPageURL is "http://www.somesite.com/print.php", but if $printPageURL is changed in the global file to "http://www.somesite.com/print.php?newUser=1", then the URL becomes badly formed. If the project has 300 files and there are 30 files that append param this way, we need to change all 30 files.
the same if we append using "&mode=1" and $printPageURL changes from "http://www.somesite.com/print.php?new=1" to "http://www.somesite.com/print.php", then the URL is also badly formed.
is there a library in PHP that will automatically handle the "?" and "&", and even checks that existing param exists already and removed that one because it will be replaced by the later one and it is not good if the URL keeps on growing longer?
Update: of the several helpful answers, there seems to be no pre-existing function addParam($url, $newParam) so that we don't need to write it?
Use a combination of parse_url() to explode the URL, parse_str() to explode the query string and http_build_query() to rebuild the querystring. After that you can rebuild the whole url from its original fragments you get from parse_url() and the new query string you built with http_build_query(). As the querystring gets exploded into an associative array (key-value-pairs) modifying the query is as easy as modifying an array in PHP.
EDIT
$query = parse_url('http://www.somesite.com/print.php?mode=1&newUser=1', PHP_URL_QUERY);
// $query = "mode=1&newUser=1"
$params = array();
parse_str($query, $params);
/*
* $params = array(
* 'mode' => '1'
* 'newUser' => '1'
* )
*/
unset($params['newUser']);
$params['mode'] = 2;
$params['done'] = 1;
$query = http_build_query($params);
// $query = "mode=2&done=1"
Use this:
http://hu.php.net/manual/en/function.http-build-query.php
http://www.addedbytes.com/php/querystring-functions/
is a good place to start
EDIT: There's also http://www.php.net/manual/en/class.httpquerystring.php
for example:
$http = new HttpQueryString();
$http->set(array('page' => 1, 'sort' => 'asc'));
$url = "yourfile.php" . $http->toString();
None of these solutions work when the url is of the form:
xyz.co.uk?param1=2&replace_this_param=2
param1 gets dropped all the time
.. which means it never works EVER!
If you look at the code given above:
function addParam($url, $s) {
return adjustParam($url, $s);
}
function delParam($url, $s) {
return adjustParam($url, $s);
}
These functions are IDENTICAL - so how can one add and one delete?!
using WishCow and sgehrig's suggestion, here is a test:
(assuming no anchor for the URL)
<?php
echo "<pre>\n";
function adjustParam($url, $s) {
if (preg_match('/(.*?)\?/', $url, $matches)) $urlWithoutParams = $matches[1];
else $urlWithoutParams = $url;
parse_str(parse_url($url, PHP_URL_QUERY), $params);
if (strpos($s, '=') !== false) {
list($var, $value) = split('=', $s);
$params[$var] = urldecode($value);
return $urlWithoutParams . '?' . http_build_query($params);
} else {
unset($params[$s]);
$newQueryString = http_build_query($params);
if ($newQueryString) return $urlWithoutParams . '?' . $newQueryString;
else return $urlWithoutParams;
}
}
function addParam($url, $s) {
return adjustParam($url, $s);
}
function delParam($url, $s) {
return adjustParam($url, $s);
}
echo "trying add:\n";
echo addParam("http://www.somesite.com/print.php", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1&fee=0", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1&fee=0&", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?mode=1", "mode=3"), "\n";
echo "\n", "now trying delete:\n";
echo delParam("http://www.somesite.com/print.php?mode=1", "mode"), "\n";
echo delParam("http://www.somesite.com/print.php?mode=1&newUser=1", "mode"), "\n";
echo delParam("http://www.somesite.com/print.php?mode=1&newUser=1", "newUser"), "\n";
?>
and the output is:
trying add:
http://www.somesite.com/print.php?mode=3
http://www.somesite.com/print.php?mode=3
http://www.somesite.com/print.php?newUser=1&mode=3
http://www.somesite.com/print.php?newUser=1&fee=0&mode=3
http://www.somesite.com/print.php?newUser=1&fee=0&mode=3
http://www.somesite.com/print.php?mode=3
now trying delete:
http://www.somesite.com/print.php
http://www.somesite.com/print.php?newUser=1
http://www.somesite.com/print.php?mode=1
You can try this:
function removeParamFromUrl($query, $paramToRemove)
{
$params = parse_url($query);
if(isset($params['query']))
{
$queryParams = array();
parse_str($params['query'], $queryParams);
if(isset($queryParams[$paramToRemove])) unset($queryParams[$paramToRemove]);
$params['query'] = http_build_query($queryParams);
}
$ret = $params['scheme'].'://'.$params['host'].$params['path'];
if(isset($params['query']) && $params['query'] != '' ) $ret .= '?'.$params['query'];
return $ret;
}

How can I remove attributes from an html tag?

How can I use php to strip all/any attributes from a tag, say a paragraph tag?
<p class="one" otherrandomattribute="two"> to <p>
Although there are better ways, you could actually strip arguments from html tags with a regular expression:
<?php
function stripArgumentFromTags( $htmlString ) {
$regEx = '/([^<]*<\s*[a-z](?:[0-9]|[a-z]{0,9}))(?:(?:\s*[a-z\-]{2,14}\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)(\s*\/?>[^<]*)/i'; // match any start tag
$chunks = preg_split($regEx, $htmlString, -1, PREG_SPLIT_DELIM_CAPTURE);
$chunkCount = count($chunks);
$strippedString = '';
for ($n = 1; $n < $chunkCount; $n++) {
$strippedString .= $chunks[$n];
}
return $strippedString;
}
?>
The above could probably be written in less characters, but it does the job (quick and dirty).
Strip attributes using SimpleXML (Standard in PHP5)
<?php
// define allowable tags
$allowable_tags = '<p><a><img><ul><ol><li><table><thead><tbody><tr><th><td>';
// define allowable attributes
$allowable_atts = array('href','src','alt');
// strip collector
$strip_arr = array();
// load XHTML with SimpleXML
$data_sxml = simplexml_load_string('<root>'. $data_str .'</root>', 'SimpleXMLElement', LIBXML_NOERROR | LIBXML_NOXMLDECL);
if ($data_sxml ) {
// loop all elements with an attribute
foreach ($data_sxml->xpath('descendant::*[#*]') as $tag) {
// loop attributes
foreach ($tag->attributes() as $name=>$value) {
// check for allowable attributes
if (!in_array($name, $allowable_atts)) {
// set attribute value to empty string
$tag->attributes()->$name = '';
// collect attribute patterns to be stripped
$strip_arr[$name] = '/ '. $name .'=""/';
}
}
}
}
// strip unallowed attributes and root tag
$data_str = strip_tags(preg_replace($strip_arr,array(''),$data_sxml->asXML()), $allowable_tags);
?>
Here is one function that will let you strip all attributes except ones you want:
function stripAttributes($s, $allowedattr = array()) {
if (preg_match_all("/<[^>]*\\s([^>]*)\\/*>/msiU", $s, $res, PREG_SET_ORDER)) {
foreach ($res as $r) {
$tag = $r[0];
$attrs = array();
preg_match_all("/\\s.*=(['\"]).*\\1/msiU", " " . $r[1], $split, PREG_SET_ORDER);
foreach ($split as $spl) {
$attrs[] = $spl[0];
}
$newattrs = array();
foreach ($attrs as $a) {
$tmp = explode("=", $a);
if (trim($a) != "" && (!isset($tmp[1]) || (trim($tmp[0]) != "" && !in_array(strtolower(trim($tmp[0])), $allowedattr)))) {
} else {
$newattrs[] = $a;
}
}
$attrs = implode(" ", $newattrs);
$rpl = str_replace($r[1], $attrs, $tag);
$s = str_replace($tag, $rpl, $s);
}
}
return $s;
}
In example it would be:
echo stripAttributes('<p class="one" otherrandomattribute="two">');
or if you eg. want to keep "class" attribute:
echo stripAttributes('<p class="one" otherrandomattribute="two">', array('class'));
Or
Assuming you are to send a message to an inbox and you composed your message with CKEDITOR, you can assign the function as follows and echo it to the $message variable before sending. Note the function with the name stripAttributes() will strip off all html tags that are unnecessary. I tried it and it work fine. i only saw the formatting i added like bold e.t.c.
$message = stripAttributes($_POST['message']);
or
you can echo $message; for preview.
I honestly think that the only sane way to do this is to use a tag and attribute whitelist with the HTML Purifier library. Example script here:
<html><body>
<?php
require_once '../includes/htmlpurifier-4.5.0-lite/library/HTMLPurifier/Bootstrap.php';
spl_autoload_register(array('HTMLPurifier_Bootstrap', 'autoload'));
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,b,a[href],i,br,img[src]');
$config->set('URI.Base', 'http://www.example.com');
$config->set('URI.MakeAbsolute', true);
$purifier = new HTMLPurifier($config);
$dirty_html = "
<a href=\"http://www.google.de\">broken a href link</a
fnord
<x>y</z>
<b>c</p>
<script>alert(\"foo!\");</script>
Anzahl besuchter Seiten
<img src=\"www.example.com/bla.gif\" />
<a href=\"http://www.google.de\">missing end tag
ende
";
$clean_html = $purifier->purify($dirty_html);
print "<h1>dirty</h1>";
print "<pre>" . htmlentities($dirty_html) . "</pre>";
print "<h1>clean</h1>";
print "<pre>" . htmlentities($clean_html) . "</pre>";
?>
</body></html>
This yields the following clean, standards-conforming HTML fragment:
broken a href linkfnord
y
<b>c
<a>Anzahl besuchter Seiten</a>
<img src="http://www.example.com/www.example.com/bla.gif" alt="bla.gif" /><a href="http://www.google.de">missing end tag
ende
</a></b>
In your case the whitelist would be:
$config->set('HTML.Allowed', 'p');
HTML Purifier is one of the better tools for sanitizing HTML with PHP.
You might also look into html purifier. True, it's quite bloated, and might not fit your needs if it only conceirns this specific example, but it offers more or less 'bulletproof' purification of possible hostile html. Also you can choose to allow or disallow certain attributes (it's highly configurable).
http://htmlpurifier.org/

Categories