php Swear code doesn't quite work - php

I have this code below which works ok ish.
$swearWords = file("blacklist.txt");
foreach ($swearWords as $naughty)
{
$post = str_ireplace(rtrim($naughty), "<b><i>(oops)</i></b>", $post);
}
The problem is with words that contain thee swear words..
for instant "Scunthorpe" has a bad word within it. this code changes it to S(oops)horpe.
Any ideas how i can fix this ? do I need to

You can replace your str_replace() with a preg_replace that ignores words that have leading and/or trailing letters, so a swear word is only replaced if its standing alone:
$post = "some Scunthorpe text";
$newpost = $post;
$swearWords = file("blacklist.txt");
foreach ($swearWords as $naughty)
{
$naughty = preg_quote($naughty, '/');
$newpost = preg_replace("/([^a-z]+{$naughty}[^a-z]*|[^a-z]+{$naughty}[^a-z]+)/i", "<b><i>(oops)</i></b>", $newpost);
}
if ($newpost) $post = $newpost;
else echo "an error occured during regex replacement";
Note that it still allows swear words like "aCUNT", "soFUCKINGstupid", ... i don't know how you could even handle that.

Swear and profanity filters are notoriously bad at catching "false positives".
The easiest way of dealing with these, in dictionary terms is to use a whitelist (in a similar way to your blacklist). A list of words that contain matches, but that are essentially allowed.
It's worth you reading this: How do you implement a good profanity filter which details the pro's and cons.

This oughta do it:
$swearWords = file("blacklist.txt");
$post_words = preg_split("/\s+/", $post);
foreach ($swearWords as $naughty)
{
foreach($post_words as &$word)
{
if(stripos($word, $naughty) !== false)
{
$word = "<b><i>(oops)</i></b>";
}
}
}
$post = implode(' ', $post_words);
So what's happening? It loads in your swear words, then loops through these. It then loops through all the words in the post, and checks if the current swearword exists in the currently looked at word. If it does, it removes it replaces it with your 'oops'.
Note that this will remove any whitespace formatting, so check this suits your situation first (do you care about tab characters or multiple sequential spaces?)

Related

PHP mention system with usernames with space

I wanted to know if it's possible to make a PHP mention system with usernames with space ?
I tried this
preg_replace_callback('##([a-zA-Z0-9]+)#', 'mentionUser', htmlspecialchars_decode($r['content']))
My function:
function mentionUser($matches) {
global $db;
$req = $db->prepare('SELECT id FROM members WHERE username = ?');
$req->execute(array($matches[1]));
if($req->rowCount() == 1) {
$idUser = $req->fetch()['id'];
return '<a class="mention" href="members/profile.php?id='.$idUser.'">'.$matches[0].'</a>';
}
return $matches[0];
It works, but not for the usernames with space...
I tried to add \s, it works, but not well, the preg_replace_callback detect the username and the other parts of the message, so the mention don't appear...
Is there any solution ?
Thanks !
I know you said that you just removed the ability to add a space, but I still wanted to post a solution. To be clear, I don't necessarily think you should use this code, because it probably is just easier to keep things simple, but I think it should work still.
Your major problem is that almost every mention will incur two lookups because #bob johnson went to the store could be either bob or bob johnson and there's no way to determine that without going to the databases. Caching will greatly reduce this problem, luckily.
Below is some code that generally does what you are looking for. I made a fake database using just an array for clarity and reproducibility. The inline code comments should hopefully make sense.
function mentionUser($matches)
{
// This is our "database" of users
$users = [
'bob johnson',
'edward',
];
// First, grab the full match which might be 'name' or 'name name'
$fullMatch = $matches['username'];
// Create a search array where the key is the search term and the value is whether or not
// the search term is a subset of the value found in the regex
$names = [$fullMatch => false];
// Next split on the space. If there isn't one, we'll have an array with just a single item
$maybeTwoParts = explode(' ', $fullMatch);
// Basically, if the string contained a space, also search only for the first item before the space,
// and flag that we're using a subset
if (count($maybeTwoParts) > 1) {
$names[array_shift($maybeTwoParts)] = true;
}
foreach ($names as $name => $isSubset) {
// Search our "database"
if (in_array($name, $users, true)) {
// If it was found, wrap in HTML
$ret = '<span>#' . $name . '</span>';
// If we're in a subset, we need to append back on the remaining string, joined with a space
if ($isSubset) {
$ret .= ' ' . array_shift($maybeTwoParts);
}
return $ret;
}
}
// Nothing was found, return what was passed in
return '#' . $fullMatch;
}
// Our search pattern with an explicitly named capture
$pattern = '##(?<username>\w+(?:\s\w+)?)#';
// Three tests
assert('hello <span>#bob johnson</span> test' === preg_replace_callback($pattern, 'mentionUser', 'hello #bob johnson test'));
assert('hello <span>#edward</span> test' === preg_replace_callback($pattern, 'mentionUser', 'hello #edward test'));
assert('hello #sally smith test' === preg_replace_callback($pattern, 'mentionUser', 'hello #sally smith test'));
Try this RegEx:
/#[a-zA-Z0-9]+( *[a-zA-Z0-9]+)*/g
It will find an at sign first, and then try to find one or more letter or numbers. It will try to find zero or more inner spaces and zero or more letters and numbers coming after that.
I am assuming the username only contains A-Za-z0-9 and space.

How can I str_replace partially in PHP in a dynamic string with unknown key content

Working in WordPress (PHP). I want to set strings to the database like below. The string is translatable, so it could be in any language keeping the template codes. For the possible variations, I presented 4 strings here:
<?php
$string = '%%AUTHOR%% changed status to %%STATUS_new%%';
$string = '%%AUTHOR%% changed status to %%STATUS_oldie%%';
$string = '%%AUTHOR%% changed priority to %%PRIORITY_high%%';
$string = '%%AUTHOR%% changed priority to %%PRIORITY_low%%';
To make the string human-readable, for the %%AUTHOR%% part I can change the string like below:
<?php
$username = 'Illigil Liosous'; // could be any unicode string
$content = str_replace('%%AUTHOR%%', $username, $string);
But for status and priority, I have different substrings of different lengths.
Question is:
How can I make those dynamic substring be replaced on-the-fly so that they could be human-readable like:
Illigil Liosous changed status to Newendotobulous;
Illigil Liosous changed status to Oldisticabulous;
Illigil Liosous changed priority to Highlistacolisticosso;
Illigil Liosous changed priority to Lowisdulousiannosso;
Those unsoundable words are to let you understand the nature of a translatable string, that could be anything other than known words.
I think I can proceed with something like below:
<?php
if( strpos($_content, '%%STATUS_') !== false ) {
// proceed to push the translatable status string
}
if( strpos($_content, '%%PRIORITY_') !== false ) {
// proceed to push the translatable priority string
}
But how can I fill inside those conditionals efficiently?
Edit
I might not fully am clear with my question, hence updating the query. The issue is not related to array str_replace.
The issue is, the $string that I need to detect is not predefined. It would come like below:
if($status_changed) :
$string = "%%AUTHOR%% changed status to %%STATUS_{$status}%%";
else if($priority_changed) :
$string = "%%AUTHOR%% changed priority to %%PRIORITY_{$priority}%%";
endif;
Where they will be filled dynamically with values in the $status and $priority.
So when it comes to str_replace() I will actually use functions to get their appropriate labels:
<?php
function human_readable($codified_string, $user_id) {
if( strpos($_content, '%%STATUS_') !== false ) {
// need a way to get the $status extracted from the $codified_string
// $_got_status = ???? // I don't know how.
get_status_label($_got_status);
// the status label replacement would take place here, I don't know how.
}
if( strpos($_content, '%%PRIORITY_') !== false ) {
// need a way to get the $priority extracted from the $codified_string
// $_got_priority = ???? // I don't know how.
get_priority_label($_got_priority);
// the priority label replacement would take place here, I don't know how.
}
// Author name replacement takes place now
$username = get_the_username($user_id);
$human_readable_string = str_replace('%%AUTHOR%%', $username, $codified_string);
return $human_readable_string;
}
The function has some missing points where I currently am stuck. :(
Can you guide me a way out?
It sounds like you need to use RegEx for this solution.
You can use the following code snippet to get the effect you want to achieve:
preg_match('/%%PRIORITY_(.*?)%%/', $_content, $matches);
if (count($matches) > 0) {
$human_readable_string = str_replace("%%PRIORITY_{$matches[0]}%%", $replace, $codified_string);
}
Of course, the above code needs to be changed for STATUS and any other replacements that you require.
Explaining the RegEx code in short it:
/
The starting of any regular expression.
%%PRIORITY_
Is a literal match of those characters.
(
The opening of the match. This is going to be stored in the third parameter of the preg_match.
.
This matches any character that isn't a new line.
*?
This matches between 0 and infinite of the preceding character - in this case anything. The ? is a lazy match since the %% character will be matched by the ..
Check out the RegEx in action: https://regex101.com/r/qztLue/1

Replacing words with tag links in PHP

I have a text ($text) and an array of words ($tags). These words in the text should be replaced with links to other pages so they don't break the existing links in the text. In CakePHP there is a method in TextHelper for doing this but it is corrupted and it breaks the existing HTML links in the text. The method suppose to work like this:
$text=Text->highlight($text,$tags,'\1',1);
Below there is existing code in CakePHP TextHelper:
function highlight($text, $phrase, $highlighter = '<span class="highlight">\1</span>', $considerHtml = false) {
if (empty($phrase)) {
return $text;
}
if (is_array($phrase)) {
$replace = array();
$with = array();
foreach ($phrase as $key => $value) {
$key = $value;
$value = $highlighter;
$key = '(' . $key . ')';
if ($considerHtml) {
$key = '(?![^<]+>)' . $key . '(?![^<]+>)';
}
$replace[] = '|' . $key . '|ix';
$with[] = empty($value) ? $highlighter : $value;
}
return preg_replace($replace, $with, $text);
} else {
$phrase = '(' . $phrase . ')';
if ($considerHtml) {
$phrase = '(?![^<]+>)' . $phrase . '(?![^<]+>)';
}
return preg_replace('|'.$phrase.'|i', $highlighter, $text);
}
}
You can see (and run) this algorithm here:
http://www.exorithm.com/algorithm/view/highlight
It can be made a little better and simpler with a few changes, but it still isn't perfect. Though less efficient, I'd recommend one of Ben Doom's solutions.
Replacing text in HTML is fundamentally different than replacing plain text. To determine whether text is part of an HTML tag requires you to find all the tags in order not to consider them. Regex is not really the tool for this.
I would attempt one of the following solutions:
Find the positions of all the words. Working from last to first, determine if each is part of a tag. If not, add the anchor.
Split the string into blocks. Each block is either a tag or plain text. Run your replacement(s) on the plain text blocks, and re-assemble.
I think the first one is probably a bit more efficient, but more prone to programmer error, so I'll leave it up to you.
If you want to know why I'm not approaching this problem directly, look at all the questions on the site about regex and HTML, and how regex is not a parser.
This code works just fine. What you may need to do is check the CSS for the <span class="highlight"> and make sure it is set to some color that will allow you to distinguish that it is high lighted.
.highlight { background-color: #FFE900; }
Amorphous - I noticed Gert edited your post. Are the two code fragments exactly as you posted them?
So even though the original code was designed for highlighting, I understand you're trying to repurpose it for generating links - it should, and does work fine for that (tested as posted).
HOWEVER escaping in the first code fragment could be an issue.
$text=Text->highlight($text,$tags,'\1',1);
Works fine... but if you use speach marks rather than quote marks the backslashes disappear as escape marks - you need to escape them. If you don't you get %01 links.
The correct way with speach marks is:
$text=Text->highlight($text,$tags,"\\1",1);
(Notice the use of \1 instead of \1)

Example code for a comment parser

Anyone know of any sample php (ideally codeigniter) code for parsing user submitted comments. TO remove profanity and HTML tags etc?
Try strip_tags to get rid of any html submitted. You can use htmlspecialchars to escape the tags if you just want to ensure that no html is displayed in the comments - as per Matchu's example, less unintended effects will happen with it than with strip_tags.
For a word filter, depending on how indepth you want to go, there are many examples on the web, from simple to complex. Here's the code from Jake Olefsky's example (the simple one linked previously):
<?
//This is totally free to use by anyone for any purpose.
// BadWordFilter
// This function does all the work. If $replace is 1 it will replace all bad words
// with the wildcard replacements. If $replace is 0 it will not replace anything.
// In either case, it will return 1 if it found bad words or 0 otherwise.
// Be sure to fill the $bads array with the bad words you want filtered.
function BadWordFilter(&$text, $replace)
{
//fill this array with the bad words you want to filter and their replacements
$bads = array (
array("butt","b***"),
array("poop","p***"),
array("crap","c***")
);
if($replace==1) { //we are replacing
$remember = $text;
for($i=0;$i<sizeof($bads);$i++) { //go through each bad word
$text = eregi_replace($bads[$i][0],$bads[$i][5],$text); //replace it
}
if($remember!=$text) return 1; //if there are any changes, return 1
} else { //we are just checking
for($i=0;$i<sizeof($bads);$i++) { //go through each bad word
if(eregi($bads[$i][0],$text)) return 1; //if we find any, return 1
}
}
}
//this will replace all bad words with their replacements. $any is 1 if it found any
$any = BadWordFilter($wordsToFilter,1);
//this will not repace any bad words. $any is 1 if it found any
$any = BadWordFilter($wordsToFilter,0);
?>
Many more examples of this can be found easily on the web.

Need a regex to add css class to first and last list item

UPDATE:
Thank you all for your input. Some additional information.
It's really just a small chunk of markup (20 lines) I'm working with and had aimed to to leverage a regex to do the work.
I also do have the ability to hack up the script (an ecommerce one) to insert the classes as the navigation is built. I wanted to limit the number of hacks I have in place to keep things easier on myself when I go to update to the latest version of the software.
With that said, I'm pretty aware of my situation and the various options available to me. The first part of my regex works as expected. I posted really more or less to see if someone would say, "hey dummy, this is easy just change this....."
After coming close with a few of my efforts, it's more of the principle at this point. To just know (and learn) a solution exists for this problem. I also hate being beaten by a piece of code.
ORIGINAL:
I'm trying to leverage regular expressions to add a CSS a class to the first and last list items within an ordered list. I've tried a bunch of different ways but can't produce the results I'm looking for.
I've got a regular expression for the first list item but can't seem to figure a correct one out for the last. Here is what I'm working with:
$patterns = array('/<ul+([^<]*)<li/m', '/<([^<]*)(?<=<li)(.*)<\/ul>/s');
$replace = array('<ul$1<li class="first"','<li class="last"$2$3</ul>');
$navigation = preg_replace($patterns, $replace, $navigation);
Any help would be greatly appreciated.
Jamie Zawinski would have something to say about this...
Do you have a proper HTML parser? I don't know if there's anything like hpricot available for PHP, but that's the right way to deal with it. You could at least employ hpricot to do the first cleanup for you.
If you're actually generating the HTML -- do it there. It looks like you want to generate some navigation and have a .first and .last kind of thing on it. Take a step back and try that.
+1 to generating the right html as the best option.
But a completely different approach, which may or may not be acceptable to you: you could use javascript.
This uses jquery to make it easy ...
$(document).ready(
function() {
$('#id-of-ul:firstChild').addClass('first');
$('#id-of-ul:lastChild').addClass('last');
}
);
As I say, may or may not be any use in this case, but I think its a valid solution to the problem in some cases.
PS: You say ordered list, then give ul in your example. ol = ordered list, ul = unordered list
You wrote:
$patterns = array('/<ul+([^<]*)<li/m','/<([^<]*)(?<=<li)(.*)<\/ul>/s');
First pattern:
ul+ => you search something like ullll...
The m modifier is useless here, since you don't use ^ nor $.
Second pattern:
Using .* along with s is "dangerous", because you might select the whole document up to the last /ul of the page...
And well, I would just drop s modifier and use: (<li\s)(.*?</li>\s*</ul>) with replace: '$1class="last" $2'
In view of above remarks, I would write the first expression: <ul.*?>\s*<li
Although I am tired of seeing the Jamie Zawinski quote each time there is a regex question, Dustin is right in pointing you to a HTML parser (or just generating the right HTML from the start!): regexes and HTML doesn't mix well, because HTML syntax is complex, and unless you act on a well known machine generated output with very predictable result, you are prone to get something breaking in some cases.
I don't know if anyone cares any longer, but I have a solution that works in my simple test case (and I believe it should work in the general case).
First, let me point out two things: While PhiLho is right in that the s is "dangerous", since dots may match everything up to the final of the document, this may very well be what you want. It only becomes a problem with not well formed pages. Be careful with any such regex on large, manually written pages.
Second, php has a special meaning of backslashes, even in single quotes. Most regexen will perform well either way, but you should always double-escape them, just in case.
Now, here's my code:
<?php
$navigation='<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li>Water</li>
</ul>';
$patterns = array('/<ul.*?>\\s*<li/',
'/<li((.(?<!<li))*?<\\/ul>)/s');
$replace = array('$0 class="first"',
'<li class="last"$1');
$navigation = preg_replace($patterns, $replace, $navigation);
echo $navigation;
?>
This will output
<ul>
<li class="first">Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li class="last">Water</li>
</ul>
This assumes no line feeds inside the opening <ul...> tag. If there are any, use the s modifier on the first expression too.
The magic happens in (.(?<!<li))*?. This will match any character (the dot) that is not the beginning of the string <li, repeated any amount of times (the *) in a non-greedy fashion (the ?).
Of course, the whole thing would have to be expanded if there is a chance the list items already have the class attribute set. Also, if there is only one list item, it will match twice, giving it two such attributes. At least for xhtml, this would break validation.
You could load the navigation in a SimpleXML object and work with that. This prevents you from breaking your markup with some crazy regex :)
As a preface .. this is waaay over-complicating things in most use-cases. Please see other answers for more sanity :)
Here is a little PHP class I wrote to solve a similar problem. It adds 'first', 'last' and any other classes you want. It will handle li's with no "class" attribute as well as those that already have some class(es).
<?php
/**
* Modify list items in pre-rendered html.
*
* Usage Example:
* $replaced_text = ListAlter::addClasses($original_html, array('cool', 'awsome'));
*/
class ListAlter {
private $classes = array();
private $classes_found = FALSE;
private $count = 0;
private $total = 0;
// No public instances.
private function __construct() {}
/**
* Adds 'first', 'last', and any extra classes you want.
*/
static function addClasses($html, $extra_classes = array()) {
$instance = new self();
$instance->classes = $extra_classes;
$total = preg_match_all('~<li([^>]*?)>~', $html, $matches);
$instance->total = $total ? $total : 0;
return preg_replace_callback('~<li([^>]*?)>~', array($instance, 'processListItem'), $html);
}
private function processListItem($matches) {
$this->count++;
$this->classes_found = FALSE;
$processed = preg_replace_callback('~(\w+)="(.*?)"~', array($this, 'appendClasses'), $matches[0]);
if (!$this->classes_found) {
$classes = $this->classes;
if ($this->count == 1) {
$classes[] = 'first';
}
if ($this->count == $this->total) {
$classes[] = 'last';
}
if (!empty($classes)) {
$processed = rtrim($matches[0], '>') . ' class="' . implode(' ', $classes) . '">';
}
}
return $processed;
}
private function appendClasses($matches) {
array_shift($matches);
list($name, $value) = $matches;
if ($name == 'class') {
$value = array_filter(explode(' ', $value));
$value = array_merge($value, $this->classes);
if ($this->count == 1) {
$value[] = 'first';
}
if ($this->count == $this->total) {
$value[] = 'last';
}
$value = implode(' ', $value);
$this->classes_found = TRUE;
}
return sprintf('%s="%s"', $name, $value);
}
}

Categories