Stop Words function - php

I have this function that returns true if one of the bad words listed in $stopwords is found in $string:
function stopWords($string, $stopwords) {
$stopwords = explode(',', $stopwords);
$pattern = '/\b(' . implode('|', $stopwords) . ')\b/i';
if(preg_match($pattern, $string) > 0) {
return true;
}
return false;
}
It seems to work fine.
The problem is that when $stopwords is empty (so no bad words are specified), it always returns true, as if the empty value were recognized as a bad word (I think this is the issue, but it may be something else).
Can anyone help me sort out this issue?
Thanks

I would use in_array():
function stopWords($string, $stopwords) {
return in_array($string, explode(',',$stopwords));
}
This is faster than the regexp, but it only matches when the whole string is exactly one of the stop words.
EDIT: to match any word in the string
function stopWords($string, $stopwords) {
$wordsArray = explode(' ', $string);
$stopwordsArray = explode(',',$stopwords);
return count(array_intersect($wordsArray, $stopwordsArray)) > 0; // true when at least one stop word is present
}

Pass $stopwords as an array:
function stopWords($string, $stopwords) {
//Fail in safe mode, if $stopwords is no array
if (!is_array($stopwords)) return true;
//Empty $stopwords means all is OK
if (sizeof($stopwords)<1) return false;
....

If $stopwords is empty, then explode(',', $stopwords) returns an array containing a single empty string, and $pattern becomes /\b()\b/i. The empty group matches at any word boundary, so your function returns true for any string that contains a word.
The easiest way to fix it is to add an if statement to check whether the array is empty or not.
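A minimal sketch of that check, assuming $stopwords is still passed as a comma-separated string as in the question. The name stopWordsGuarded is made up for this example, and the trimming and preg_quote steps are additions that the original function does not have.

```php
<?php
// Hypothetical variant of the question's stopWords(); the name is illustrative.
function stopWordsGuarded(string $string, string $stopwords): bool
{
    // Split on commas, trim whitespace, and drop empty entries.
    $words = array_filter(array_map('trim', explode(',', $stopwords)), 'strlen');

    // No stop words specified: nothing can match.
    if (count($words) === 0) {
        return false;
    }

    // Escape regex metacharacters in each word before joining them.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);

    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_match($pattern, $string) === 1;
}

var_dump(stopWordsGuarded('some harmless text', ''));         // bool(false)
var_dump(stopWordsGuarded('some harmless text', 'bad,text')); // bool(true)
```

Filtering out empty entries also covers input such as "bad,,text" or a trailing comma, which would otherwise put an empty alternative into the pattern.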

You can put a condition like this:
if (!empty ($stopwords)) { your code} else {echo ("no bad words");}
And then ask the user or application to input some bad words.

Related

find bad word using strpos from string

I have the following PHP code to find a bad word in a string.
It stops on the first bad word found and returns true.
The bad words are provided as a comma-separated list that is converted to an array.
$paragraph = "We have fur to sell";
$badWords = "sis, fur";
$badWordsArray = explode(",", $badWords);
function strpos_arr($string, $array, $offset=0) { // Find array values in string
foreach($array as $query) {
if(strpos($string, $query, $offset) !== false) return true; // stop on first true result for efficiency
}
return false;
}
strpos_arr($paragraph, $badWordsArray);
The issue is that it also returns true if the bad word is part of another word.
I prefer using strpos.
Please also suggest a more efficient way to find bad words, if there is one.
Try this, with a regular expression:
$paragraph = "We have fur to sell";
$badWords = "sis, fur";
$badWordsArray = preg_split('/\s*,\s*/', $badWords, -1, PREG_SPLIT_NO_EMPTY);
var_dump($badWordsArray);
function searchBadWords($string, $array, $offset=0) { // Find array values in string
foreach ($array as $query) {
if (preg_match('/\b' . preg_quote($query, '/') . '\b/i', $string)) return true; // stop on first true result for efficiency
}
return false;
}
var_dump(searchBadWords($paragraph, $badWordsArray));
Explanation:
First, we want to correctly split our $badWords string:
$badWordsArray = preg_split('/\s*,\s*/', $badWords, -1, PREG_SPLIT_NO_EMPTY);
This way strings like "sis, fur", "sis , fur" and even "sis , , fur" are all split correctly into array('sis', 'fur').
Then we perform a regexp search for the exact word using the \b metacharacter, which means a word boundary in regular-expression terms: the position between a word character and a non-word character.
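A short sketch of how the word boundary behaves, using the word "fur" from the question; the sample sentences are made up for illustration.

```php
<?php
// \b only matches between a word character (letter, digit, underscore)
// and a non-word character or the start/end of the string.
$pattern = '/\bfur\b/i';

var_dump(preg_match($pattern, 'We have fur to sell')); // int(1): whole word
var_dump(preg_match($pattern, 'We sell FUR, cheap'));  // int(1): the comma is a non-word character
var_dump(preg_match($pattern, 'We have furniture'));   // int(0): "fur" is followed by "n"
```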
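Just pad both the string and each search word with spaces, so a word only matches when it stands between two spaces. Note that a bad word directly followed by punctuation will not be found this way.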
$paragraph = "We have fur to sell";
$badWords = "sis, fur";
$badWordsArray = explode(",", $badWords);
function strpos_arr($string, $array, $offset=0) { // Find array values in string
$string = " ".$string." ";
foreach($array as $query) {
$query = " ".trim($query)." "; // trim first: explode(",", "sis, fur") yields " fur" with a leading space
if(strpos($string, $query, $offset) !== false) return true; // stop on first true result for efficiency
}
return false;
}
strpos_arr($paragraph, $badWordsArray);

Where is the bug in this code?

I have some simple code that doesn't work correctly. I have a file like this:
David
Jordan
Steve
and this simple PHP code:
$file = new SplFileObject("file.txt");
while (!$file->eof()) {
$array[]=$file->fgets();
}
$string = 'Hi , I\'M David';
if(strposa($string, $array)){
echo 'true';
} else {
echo 'false';
}
function strposa($haystack, $needle, $offset=0) {
if(!is_array($needle)) $needle = array($needle);
foreach($needle as $query) {
if(strpos($haystack, $query, $offset) !== false) return true; // stop on first true result
}
return false;
}
but this code doesn't work correctly. If
$string = 'Hi , I\'M David';
it returns false, but when $string is changed to:
$string = 'Hi , I\'M Steve';
it returns true!
Finally, I found three ways to fix this.
way 1 => use rtrim function:
$array[]=rtrim($file->fgets());
way 2 => use str_replace function :
$array=str_replace("\r\n","",$array);
or
$array[]=str_replace("\r\n","",$file->fgets());
way 3 => use file function :
$array = file("file.txt", FILE_IGNORE_NEW_LINES);
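The string returned by $file->fgets() includes the newline character \n at the end of each line, so strpos() searches for "David\n", which is not in $string, and returns false. "Steve" presumably matches only because the last line of the file has no trailing newline.
Strip the newline from the fgets() result with rtrim() or trim().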
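To see the effect of the trailing newline in isolation, here is a small sketch. The $lines array stands in for what fgets() would return, assuming the last line of the file has no trailing newline.

```php
<?php
// Lines as fgets() would return them for the three-line file
// (assumption: the last line has no trailing newline).
$lines = ["David\n", "Jordan\n", "Steve"];
$string = "Hi , I'M David";

var_dump(strpos($string, $lines[0]));        // bool(false): "David\n" is not in the string
var_dump(strpos($string, rtrim($lines[0]))); // int(9): "David" starts at offset 9

// Cleaning every entry once keeps the search loop unchanged.
$clean = array_map('rtrim', $lines);
```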

search for a substring which returns true if it is at the end

I would like to search for a substring in php so that it will be at the end of the given string.
Eg
on string 'abd def' if I search for def it would be at the end, so return true. But if I search for abd it will return false since it is not at the end.
Is it possible?
You could use preg_match for this:
$str = 'abd def';
$result = (preg_match("/def$/", $str) === 1);
var_dump($result);
An alternative that requires neither splitting by a separator nor regular expressions: test whether the last x characters equal the test string, where x is the length of the test string:
$string = "abcdef";
$test = "def";
if(substr($string, -(strlen($test))) === $test)
{
/* logic here */
}
Assuming whole words:
$match = 'def';
$words = explode(' ', 'abd def');
if (array_pop($words) == $match) {
...
}
Or using a regex:
if (preg_match('/def$/', 'abd def')) {
...
}
This approach works regardless of whether the match is a whole word:
$match = 'def';
$words = 'abd def';
$location = strrpos($words, $match); // Find the rightmost location of $match
$matchlength = strlen($match); // How long is $match
/* If the rightmost location + the length of what's being matched
* is equal to the length of what's being searched,
* then it's at the end of the string
*/
if ($location !== false && $location + $matchlength === strlen($words)) { // strrpos() returns false when $match is absent
...
}
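Have a look at the strrchr() function. Note that strrchr() only uses the first character of the needle, so this check fails when that character occurs again inside the match (for example, needle 'ded' against 'abcded'). Try it like this: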
$word = 'abcdef';
$niddle = 'def';
if (strrchr($word, $niddle) == $niddle) {
echo 'true';
} else {
echo 'false';
}
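The suffix checks above can be wrapped in a small helper. This is only a sketch: the name endsWith is made up, and the empty-needle rule is a choice made here. On PHP 8.0 and later the built-in str_ends_with() does the same job.

```php
<?php
// Hypothetical helper: true if $haystack ends with $needle.
function endsWith(string $haystack, string $needle): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true; // treat every string as ending with the empty string
    }
    // Compare the last $len bytes of the haystack with the needle.
    return substr($haystack, -$len) === $needle;
}

var_dump(endsWith('abd def', 'def')); // bool(true)
var_dump(endsWith('abd def', 'abd')); // bool(false)
var_dump(endsWith('abcded', 'ded'));  // bool(true)
```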

Filter a set of bad words out of a PHP array

I have a PHP array of about 20,000 names. I need to filter through it and remove any name that contains the word job, freelance, or project.
Below is what I have so far. It cycles through the array and adds each clean item to a new clean array. I need help matching the "bad" words, though.
$data1 = array('Phillyfreelance' , 'PhillyWebJobs', 'web2project', 'cleanname');
// freelance
// job
// project
$cleanArray = array();
foreach ($data1 as $name) {
# if a term is matched, we remove it from our array
if(preg_match('~\b(freelance|job|project)\b~i',$name)){
echo 'word removed';
}else{
$cleanArray[] = $name;
}
}
Right now it only matches whole words: if "freelance" is a name in the array, that item is removed, but something like ImaFreelancer is not. I need to remove anything that contains one of the matching words anywhere.
A regular expression is not really necessary here — it'd likely be faster to use a few stripos calls. (Performance matters on this level because the search occurs for each of the 20,000 names.)
With array_filter, which only keeps elements in the array for which the callback returns true:
$data1 = array_filter($data1, function($el) {
return stripos($el, 'job') === FALSE
&& stripos($el, 'freelance') === FALSE
&& stripos($el, 'project') === FALSE;
});
Here's a more extensible / maintainable version, where the list of bad words can be loaded from an array rather than having to be explicitly denoted in the code:
$data1 = array_filter($data1, function($el) {
$bad_words = array('job', 'freelance', 'project');
$word_okay = true;
foreach ( $bad_words as $bad_word ) {
if ( stripos($el, $bad_word) !== FALSE ) {
$word_okay = false;
break;
}
}
return $word_okay;
});
I'd be inclined to use the array_filter function and change the regex to not match on word boundaries
$data1 = array('Phillyfreelance' , 'PhillyWebJobs', 'web2project', 'cleanname');
$cleanArray = array_filter($data1, function($w) {
return !preg_match('~(freelance|project|job)~i', $w);
});
Use of the preg_match() function and some regular expressions should do the trick; this is what I came up with and it worked fine on my end:
<?php
$data1=array('JoomlaFreelance','PhillyWebJobs','web2project','cleanname');
$cleanArray=array();
$badWords='/(job|freelance|project)/i';
foreach($data1 as $name) {
if(!preg_match($badWords,$name)) {
$cleanArray[]=$name;
}
}
echo(implode(',', $cleanArray)); // separator first; the reversed argument order was removed in PHP 8
?>
Which returned:
cleanname
Personally, I would do something like this:
$badWords = ['job', 'freelance', 'project'];
$names = ['JoomlaFreelance', 'PhillyWebJobs', 'web2project', 'cleanname'];
// Escape characters with special meaning in regular expressions.
$quotedBadWords = array_map(function($word) {
return preg_quote($word, '/');
}, $badWords);
// Create the regular expression.
$badWordsRegex = implode('|', $quotedBadWords);
// Filter out any names that match the bad words.
$cleanNames = array_filter($names, function($name) use ($badWordsRegex) {
return preg_match('/' . $badWordsRegex . '/i', $name) === 0; // preg_match() returns 0 for no match; false only signals an error
});
This should be what you want:
if (!preg_match('/(freelance|job|project)/i', $name)) {
$cleanArray[] = $name;
}

Stop Words into a string

I want to create a function in PHP that will return true when it finds that in the string there are some bad words.
Here is an example:
function stopWords($string, $stopwords) {
if(the words in the stopwords variable are found in the string) {
return true;
}else{
return false;
}
}
Please assume that $stopwords variable is an array of values, like:
$stopwords = array('fuc', 'dic', 'pus');
How can I do that?
Thanks
Use the strpos function.
// the function assumes the $stopwords to be an array of strings that each represent a
// word that should not be in $string
function stopWords($string, $stopwords)
{
// input parameters validation excluded for brevity..
// take each of the words in the $stopwords array
foreach($stopwords as $badWord)
{
// if the $badWord is found in the $string the strpos will return non-FALSE
if(strpos($string, $badWord) !== FALSE)
return TRUE;
}
// if the function hasn't returned TRUE yet it must be that no bad words were found
return FALSE;
}
Use regular expressions:
\b matches a word boundary, use it to match only whole words
use flag i to perform case-insensitive matches
Match each word like so:
function stopWords($string, $stopwords) {
foreach ($stopwords as $stopword) {
$pattern = '/\b' . $stopword . '\b/i';
if (preg_match($pattern, $string)) {
return true;
}
}
return false;
}
$stopwords = array('fuc', 'dic', 'pus');
$bad = stopWords('confucius', $stopwords); // false: "fuc" is only part of a longer word
$bad = stopWords('what the Fuc?', $stopwords); // true: whole word, matched case-insensitively
A shorter version, inspired by an answer to the question "determine if a string contains one of a set of words in an array", uses implode to create one big expression:
function stopWords($string, $stopwords) {
$pattern = '/\b(' . implode('|', $stopwords) . ')\b/i';
return preg_match($pattern, $string) > 0;
}
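The implode version above inserts the stop words into the pattern unescaped, so a word containing a regex metacharacter changes the meaning of the pattern, and an empty array produces an empty group. A sketch of the same idea with both cases handled; the name stopWordsQuoted is made up, and $stopwords is assumed to be an array of strings.

```php
<?php
// Hypothetical name; $stopwords is assumed to be an array of strings.
function stopWordsQuoted(string $string, array $stopwords): bool
{
    if (count($stopwords) === 0) {
        return false; // an empty alternation would match at any word boundary
    }
    // Escape regex metacharacters so each word is matched literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $stopwords);

    return preg_match('/\b(' . implode('|', $quoted) . ')\b/i', $string) === 1;
}

var_dump(stopWordsQuoted('what the Fuc?', ['fuc', 'dic', 'pus'])); // bool(true)
var_dump(stopWordsQuoted('confucius', ['fuc', 'dic', 'pus']));     // bool(false)
var_dump(stopWordsQuoted('axb', ['a.b']));                         // bool(false): the dot is literal
```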
function stopWords($string, $stopwords) {
$words=explode(' ', $string); //splits the string into words and stores it in an array
foreach($stopwords as $stopword)//loops through the stop words array
{
if(in_array($stopword, $words)) {//if the current stop word exists
//in the words contained in $string then exit the function
//immediately and return true
return true;
}
}
//else if none of the stop words were in $string then return false
return false;
}
I'm assuming here that $stopwords is an array to begin with. It should be if it's not.
