PHP: Replacing swear words with phrases

So I get how to replace certain words with other ones. What I'm trying to figure out is how to take a word and replace it with a phrase and eliminate all other input.
For example:
bad word is 'dog'
user inputs -> 'You smell like a dog.'
instead of it replacing 'dog' with 'rainbow' or something, I want it to echo something like: 'You are a potty mouth'.
Here's what I have for code:
<?php
$find = array('dog', 'cat', 'bird');
$replace = 'You are a potty mouth.';
if (isset ($_POST['user_input'])&&!empty($_POST['user_input'])) {
$user_input = $_POST['user_input'];
$user_input_new = str_ireplace($find, $replace, $user_input);
echo $user_input_new;
}
?>
With this code it echoes: 'You smell like a You are a potty mouth..'
I'm sure this is a repost and I apologize. Everything I've been able to find is documentation on how to replace only parts of strings, not entire ones.

In this case you can simply check whether the user input contains a "bad word" and, if it does, echo "You are a potty mouth."
You would want to use strpos() (or stripos() if the check should be case-insensitive), e.g.:
if( strpos($_POST['user_input'],'dog')!==FALSE ) {
echo('You are a potty mouth');
}
If you have an array of "bad words", you'll want to loop through them and check whether any of them occurs in the user input.
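A minimal sketch of that loop, assuming the $find array and $replace message from the question. The helper name contains_bad_word is made up for illustration, and stripos() is used instead of strpos() so that the check is case-insensitive, like the str_ireplace() call in the question.

```php
<?php
// Hypothetical helper: true if any listed word occurs anywhere in $input.
function contains_bad_word($input, array $badWords) {
    foreach ($badWords as $word) {
        // stripos() is the case-insensitive variant of strpos()
        if (stripos($input, $word) !== false) {
            return true;
        }
    }
    return false;
}

$find = array('dog', 'cat', 'bird');
$replace = 'You are a potty mouth.';
$user_input = 'You smell like a DOG.'; // stands in for $_POST['user_input']

// Echo the whole replacement message, or the untouched input
echo contains_bad_word($user_input, $find) ? $replace : $user_input;
```

Note that this is a plain substring check, so an input such as "dogma" would also be flagged.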

I've been looking at the same issue recently. Here's a script I was working on to filter certain words. It's still a work in progress, but it can output either the user's message or a custom message. Hope it helps or points you in the right direction.
define("MIN_SAFE_WORD_LIMIT", 3);
$safe = true;
$whiteList = array();
$blackList = array();
$text = 'Test words fRom a piece of text.';
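$blCount = count($blackList);
$lowerText = strtolower($text);
for($i=0; $i<$blCount; $i++) {
// The whitelist entry is optional; without this guard an undefined index is read
$white = isset($whiteList[$i]) ? strtolower($whiteList[$i]) : '';
if((strlen($blackList[$i]) >= MIN_SAFE_WORD_LIMIT) && strpos($lowerText, strtolower($blackList[$i])) !== false && ($white === '' || strpos($lowerText, $white) === false)) {
$safe = false;
}
}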
if(!$safe) {
// Unsafe, flag for action
echo 'Unsafe';
} else {
echo $text;
}

You don't want to replace the bad words, but the whole string, so you should just match and if matched set the whole string to your replacement string.
Also, as pointed out in the comments the words can be part of another, valid, word so if you want to take that into account, you should match only whole words.
This simple example uses word boundaries in a regular expression to match your words (in the example this would be in a loop, looping over your bad words array):
foreach ($find as $your_word)
{
// pass the delimiter to preg_quote() so a "/" inside a word is escaped as well
$search = '/\b' . preg_quote($your_word, '/') . '\b/i';
if (preg_match($search, $_POST['user_input']) === 1)
{
// a match is found, echo or set it to a variable, whatever you need
echo $replace;
// break out of the loop
break;
}
}

Here's an alternative solution: match the words and replace each one with asterisks of the same length. This won't match words like Scunthorpe, as it uses word boundaries. You can also pass a third parameter to reveal the first letters of the word, so you know which word was used without seeing it in full.
<?php
$badwords = array('*c word', '*f word','badword','stackoverflow');
function swear_filter($str,$badwords,$reveal=null) {
//Alternatively load from file
//$words = join("|", array_filter(array_map('preg_quote',array_map('trim', file('badwords.txt')))));
$words = join("|", array_filter(array_map('preg_quote',array_map('trim', $badwords))));
// The /e modifier was removed in PHP 7, so a callback builds the replacement instead
$keep = ($reveal !== null && is_numeric($reveal)) ? (int) $reveal : 0;
return preg_replace_callback("/\b($words)\b/ui", function ($m) use ($keep) {
return substr($m[1], 0, $keep) . str_repeat("*", max(0, strlen($m[1]) - $keep));
}, $str);
}
$str="There was a naughty Peacock from Scunthorpe and it said a badword, on stackoverflow";
//There was a naughty Peacock from Scunthorpe and it said a b******, on s************
echo swear_filter($str,$badwords,1);
//There was a naughty Peacock from Scunthorpe and it said a *******, on *************
echo swear_filter($str,$badwords);
?>

Related

PHP strpos() as a way to implement a swear word filter

I wrote a short function that checks whether user input contains any of the bad words I predefined in the $bad_words array. I don't need to replace them; I just want to ban the input if there are any. The code seems to work as it should: in the example below it detects the quoted string badword and the function returns true.
My question: is this a good way to use foreach and strpos()? Is there perhaps a better way to check whether $input contains one of the $bad_words elements, or is it fine as written?
function checkswearing($input)
{
$input = preg_replace('/[^0-9A-Za-z -]/', '', $input);//clean, temporary $input that just contains letters, digits, hyphens and spaces
$bad_words = array('badword', 'reallybadword', 'some other bad words');//bad words array
foreach($bad_words as $bad_word)
{//so here I'm using a foreach loop with strpos() to check if $input contains one of the bad words or not
if (strpos($input, $bad_word) !== false)
return true;//if there is one - no reason to check further bad words
}
return false;//$input is clean!
}
$input = 'some input text, might contain a "badword" and I\'d like to check if it does or not';
if (checkswearing($input))
echo 'Oh dear, my ears!';
else
{
echo 'You are so polite, so let\'s proceed with the rest of the code!';
(...)
}
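One possible alternative, offered as a sketch rather than a definitive answer: build a single case-insensitive pattern from the array and let preg_match() do the search. The function name contains_swearing is hypothetical; the word list is the one from the question. Unlike the strpos() loop, the \b word boundaries mean that a bad word embedded in a longer word is not flagged.

```php
<?php
// Hypothetical variant of checkswearing() using one regular expression.
function contains_swearing($input, array $bad_words) {
    // Escape each word so characters such as "." or "*" are matched literally
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $bad_words);
    // \b = word boundary, i = case-insensitive
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_match($pattern, $input) === 1;
}

$bad_words = array('badword', 'reallybadword', 'some other bad words');
var_dump(contains_swearing('might contain a "BadWord" here', $bad_words)); // bool(true)
var_dump(contains_swearing('nothing to see here', $bad_words));            // bool(false)
```

Whether whole-word matching is what you want depends on the filter: it avoids false positives on longer words, but it also lets through a bad word that has been glued to other letters.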

Search variable content for specific matches

I have the following code in my project:
$title = "In this title we have the word GUN";
$needed_words = array('War', 'Gun', 'Shooting');
foreach($needed_words as $needed_word) {
if (preg_match("/\b$needed_word\b/", $title)) {
$the_word = "ECHO THE WORD THATS FIND INSIDE TITLE";
}
}
I want to check whether $title contains one of 15 predefined words.
For example, if $title contains one of the words "War", "Gun" or "Shooting", I want to assign the word that was found to $the_word.
Thanks in advance for your time!
Try this:
$makearray=array('war','gun','shooting');
$title='gun';
if(in_array($title,$makearray))
{
$if_included='the value you want to give';
echo $if_included;
}
Note: this only works if $title is exactly equal to one of the values in the array; otherwise it will not match.
The best approach would be to use regular expressions, as they are the most flexible and give you more control over the words you want to match. To make sure that the string contains words like gun (but also guns) or shoot (but also shooting), you can do the following:
$words = array(
'war',
'gun',
'shoot'
);
$pattern = '/(' . implode(')|(', $words) . ')/i';
$if_included = (bool) preg_match($pattern, "Some text - here");
var_dump($if_included);
This matches more than it should. For example, it will also return true if the string contains "warning" (because it starts with "war"). You can improve this by adding constraints to certain patterns. For example:
$words = array(
'war(?![a-z])', // now it will match "war", but not "warning"
'gun',
'shoot'
);
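To get back to the original question (assigning the word that was found to $the_word), here is a minimal sketch. The helper name find_needed_word is an assumption made for illustration; it uses whole-word, case-insensitive matching and returns the text exactly as it appears in the title.

```php
<?php
// Hypothetical helper: returns the first listed word found in $title, or null.
function find_needed_word($title, array $words) {
    // Escape each word so it is matched literally inside the pattern
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    // One capture group around the alternation; \b restricts it to whole words
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    if (preg_match($pattern, $title, $m) === 1) {
        return $m[1]; // the matched text, in the case used in $title
    }
    return null;
}

$title = "In this title we have the word GUN";
$the_word = find_needed_word($title, array('War', 'Gun', 'Shooting'));
echo $the_word; // GUN
```

Dropping the \b anchors would also match inside longer words (for example "war" in "warning"), which is the trade-off described above.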

Function which searches for a word in a text and highlights all the words which contain it

This function searches for words (from the $words array) inside a text and highlights them.
function highlightWords(Array $words, $text){ // Loop through array of words
foreach($words as $word){ // Highlight word inside original text
$text = str_replace($word, '<span class="highlighted">' . $word . '</span>', $text);
}
return $text; // Return modified text
}
Here is the problem:
Let's say $words = array("car", "drive");
Is there a way for the function to highlight not only the word car, but also words that contain the letters "car", such as cars, carmania, etc.?
Thank you!
What you want is a regular expression, more specifically preg_replace or preg_replace_callback (the callback version is recommended in your case).
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
// define your word list
$toHighlight = array("car","lane");
Because you need a regular expression to search your words and you might want or need variation or changes over time, it's bad practice to hard code it into your search words. Hence it's best to walk over the array with array_map and transform the searchword into the proper regular expression (here just enclosing it with / and adding the "accept everything until punctuation" expression)
$searchFor = array_map('addRegEx',$toHighlight);
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . $word . '[^ ,\,,.,?,\.]*/';
}
Next, you want to replace each word you found with its highlighted version, which means you need a dynamic replacement: use preg_replace_callback instead of the regular preg_replace so that it calls a function for every match it finds and uses the return value as the replacement. Here we enclose the found word in span tags:
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
$result = preg_replace_callback($searchFor,'highlight',$searchString);
print $result;
yields
The <span class='highlight'>car</span> is driving in the <span class='highlight'>carpark</span>, he's not holding to the right <span class='highlight'>lane</span>.
So just paste these code fragments after the other to get the working code, obviously. ;)
edit: the complete code below was altered a bit = placed in routines for easy use by original requester. + case insensitivity
complete code:
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
$toHighlight = array("car","lane");
$result = customHighlights($searchString,$toHighlight);
print $result;
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . $word . '[^ ,\,,.,?,\.]*/i';
}
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
function customHighlights($searchString,$toHighlight){
// define your word list
$searchFor = array_map('addRegEx',$toHighlight);
$result = preg_replace_callback($searchFor,'highlight',$searchString);
return $result;
}
I haven't tested it, but I think this should do it:
$text = preg_replace('/\b(\w*' . preg_quote($word, '/') . '\w*)\b/', '<span class="highlighted">$1</span>', $text);
This looks for the string inside a complete bounded word and then puts the span around the whole word using preg_replace and regular expressions.
function replace($format, $string, array $words)
{
foreach ($words as $word) {
$string = \preg_replace(
sprintf('#\b(?<string>[^\s]*%s[^\s]*)\b#i', \preg_quote($word, '#')),
\sprintf($format, '$1'), $string);
}
return $string;
}
// courtesy of http://slipsum.com/#.T8PmfdVuBcE
$string = "Now that we know who you are, I know who I am. I'm not a mistake! It
all makes sense! In a comic, you know how you can tell who the arch-villain's
going to be? He's the exact opposite of the hero. And most times they're friends,
like you and me! I should've known way back when... You know why, David? Because
of the kids. They called me Mr Glass.";
echo \replace('<span class="red">%s</span>', $string, [
'mistake',
'villain',
'when',
'Mr Glass',
]);
Since it's using an sprintf format for the surrounding string, you can change your replacement accordingly.
Excuse the 5.4 syntax

Finding string and replacing with same case string

I need help with spinning articles. I want to find text and replace it with synonymous text while keeping the case the same.
For example, I have a dictionary like:
hello|hi|howdy|howd'y
I need to find all hello and replace with any one of hi, howdy, or howd'y.
Assume I have a sentence:
Hello, guys! Shouldn't you say hello to me when I say HELLO?
After my operation it will be something like:
hi, guys! Shouldn't you say howd'y to me when I say howdy?
Here, I lost the case. I want to maintain it! It should actually be:
Hi, guys! Shouldn't you say howd'y to me when I say HOWDY?
My dictionary size is about 5000 lines
hello|hi|howdy|howd'y
go|come
salaries|earnings|wages
shouldn't|should not
...
I'd suggest using preg_replace_callback with a callback function that examines the matched word to see if (a) the first letter is not capitalized, or (b) the first letter is the only capitalized letter, or (c) the first letter is not the only capitalized letter, and then replace with the properly modified replacement word as desired.
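A rough sketch of that idea, under stated assumptions: the helper names match_case and spin_word are invented for illustration, the replacement is given in lowercase, and only a single find/replace pair is handled rather than a whole dictionary.

```php
<?php
// Hypothetical helper: copy the case pattern of $found onto $replacement.
function match_case($found, $replacement) {
    if (ctype_upper($found)) {
        return strtoupper($replacement);   // HELLO -> HI
    }
    if (ctype_upper($found[0])) {
        return ucfirst($replacement);      // Hello -> Hi
    }
    return $replacement;                   // hello -> hi
}

// Hypothetical helper: replace every whole-word $find with $replacement.
function spin_word($text, $find, $replacement) {
    $pattern = '/\b' . preg_quote($find, '/') . '\b/i';
    return preg_replace_callback($pattern, function ($m) use ($replacement) {
        // $m[0] is the matched word exactly as it appears in the text
        return match_case($m[0], $replacement);
    }, $text);
}

echo spin_word('Hello, guys! Say hello when I say HELLO.', 'hello', 'hi');
// Hi, guys! Say hi when I say HI.
```

One limitation worth knowing: ctype_upper() returns false for strings containing non-letters, so a word such as SHOULDN'T would fall through to the ucfirst() branch.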
You can find your string and do two tests:
$outputString = 'hi';
// Test for all-uppercase first: an all-uppercase word also equals its own ucfirst()
if ( $foundString == strtoupper($foundString) ) {
$outputString = strtoupper($outputString);
} else if ( $foundString == ucfirst($foundString) ) {
$outputString = ucfirst($outputString);
} else {
// do not modify string's case
}
Here's a solution for retaining the case (upper, lower or capitalized):
// Assumes $replace is already lowercase
function convertCase($find, $replace) {
if (ctype_upper($find) === true)
return strtoupper($replace);
else if (ctype_upper($find[0]) === true)
return ucfirst($replace);
else
return $replace;
}
$find = 'hello';
$replace = 'hi';
// Find the word in all cases that it occurs in
while (($pos = stripos($input, $find)) !== false) {
// Extract the word in its current case
$found = substr($input, $pos, strlen($find));
// Replace all occurrences of this case
$input = str_replace($found, convertCase($found, $replace), $input);
}
You could try the following function. Be aware that it will only work with ASCII strings, as it uses some of the useful properties of ASCII upper and lower case letters. However, it should be extremely fast:
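function preserve_case($old, $new) {
$mask = strtoupper($old) ^ $old;
// Pad the mask with its last byte when $new is longer, then trim it to the length of $new
$mask .= str_repeat(substr($mask, -1), max(0, strlen($new) - strlen($old)));
return strtoupper($new) | substr($mask, 0, strlen($new));
}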
echo preserve_case('Upper', 'lowercase');
// Lowercase
echo preserve_case('HELLO', 'howdy');
// HOWDY
echo preserve_case('lower case', 'UPPER CASE');
// upper case
echo preserve_case('HELLO', "howd'y");
// HOWD'Y
This is my PHP version of the clever little Perl function from "How do I substitute case insensitively on the LHS while preserving case on the RHS?"

Find occurrence of a set of words

I have a pattern with a small list of words that are illegal to use as nicknames set in a pattern variable like this:
$pattern = 'webmaster|admin|webadmin|sysadmin';
Using preg_match, how can I make sure that nicknames consisting of exactly one of these words are forbidden, while registering something like "admin2" or "thesysadmin" is still allowed?
This is the expression I have so far:
preg_match('/^['.$pattern.']/i','admin');
// Should not be allowed
Note: Using a \b didn't help much.
What about not using regex at all ?
And working with explode and in_array ?
For instance, this would do :
$pattern = 'webmaster|admin|webadmin|sysadmin';
$forbidden_words = explode('|', $pattern);
It explodes your pattern into an array, using | as separator.
And this :
$word = 'admin';
if (in_array($word, $forbidden_words)) {
echo "<p>$word is not OK</p>";
} else {
echo "<p>$word is OK</p>";
}
will get you
admin is not OK
Whereas this (same code ; only the word changes) :
$word = 'admin2';
if (in_array($word, $forbidden_words)) {
echo "<p>$word is not OK</p>";
} else {
echo "<p>$word is OK</p>";
}
will get you
admin2 is OK
This way, no need to worry about finding the right regex, to match full-words : it'll just match exact words ;-)
Edit : one problem might be that the comparison will be case-sensitive :-(
Working with everything in lowercase will help with that :
$pattern = strtolower('webmaster|admin|webadmin|sysadmin'); // just to be sure ;-)
$forbidden_words = explode('|', $pattern);
$word = 'aDMin';
if (in_array(strtolower($word), $forbidden_words)) {
echo "<p>$word is not OK</p>";
} else {
echo "<p>$word is OK</p>";
}
Will get you :
aDMin is not OK
(I saw the 'i' flag in the regex only after posting my answer ; so, had to edit it)
Edit 2 : and, if you really want to do it with a regex, you need to know that :
^ marks the beginning of the string
and $ marks the end of the string
So, something like this should do :
$pattern = 'webmaster|admin|webadmin|sysadmin';
$word = 'admin';
if (preg_match('#^(' . $pattern . ')$#i', $word)) {
echo "<p>$word is not OK</p>";
} else {
echo "<p>$word is OK</p>";
}
$word = 'admin2';
if (preg_match('#^(' . $pattern . ')$#i', $word)) {
echo "<p>$word is not OK</p>";
} else {
echo "<p>$word is OK</p>";
}
Parentheses are probably not necessary, but I like using them, to isolate what I wanted.
And, you'll get the same kind of output :
admin is not OK
admin2 is OK
You probably don't want to use [ and ] : they mean "any character that is between us", and not "the whole string that is between us".
And, as the reference : manual of the preg syntax ;-)
So, the forbidden words can be part of their username but not the whole thing?
In .NET, the pattern would be:
Allowed = Not RegEx.Match("admin", "^(webmaster|admin|webadmin|sysadmin)$")
The "^" matches the beginning of the string, the "$" matches the end, so it's looking for an exact match on one of those words. I'm a bit fuzzy on the corresponding PHP syntax.
