I have this function that returns true if any of the bad words listed in $stopwords is found in the string:
function stopWords($string, $stopwords) {
$stopwords = explode(',', $stopwords);
$pattern = '/\b(' . implode('|', $stopwords) . ')\b/i';
if(preg_match($pattern, $string) > 0) {
return true;
}
return false;
}
It seems to work fine.
The problem is that when $stopwords is empty (so no bad words are specified), it always returns true, as if the empty value were recognized as a bad word (I think this is the issue, but it may be something else).
Can anyone help me sort out this issue?
Thanks
I would use in_array():
function stopWords($string, $stopwords) {
return in_array($string, explode(',',$stopwords));
}
This is cheaper than the regexp, but it only returns true when $string as a whole equals one of the stop words.
EDIT: to match any word in the string
function stopWords($string, $stopwords) {
$wordsArray = explode(' ', $string);
$stopwordsArray = explode(',',$stopwords);
return count(array_intersect($wordsArray, $stopwordsArray)) > 0; // true if at least one word is a stop word
}
Pass $stopwords as an array:
function stopWords($string, $stopwords) {
// Fail safe: if $stopwords is not an array, treat the string as bad
if (!is_array($stopwords)) return true;
// An empty $stopwords means everything is OK
if (count($stopwords) < 1) return false;
....
If $stopwords is an empty string, then explode(',', $stopwords) returns an array containing a single empty string, so $pattern becomes /\b()\b/i. The empty group matches at any word boundary, which is why your function returns true whenever $string contains at least one word character.
The easiest way to fix it is to add an if statement that checks whether the list of stop words is empty before building the pattern.
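As a minimal sketch of that check, assuming $stopwords is still passed as a comma-separated string as in the question: the version below splits the list, drops empty entries, and returns false early when nothing is left. The use of preg_split and preg_quote here is an illustrative choice, not part of the original function.

```php
<?php
// Hypothetical rewrite of stopWords(); $stopwords is a comma-separated string.
function stopWords(string $string, string $stopwords): bool
{
    // Split on commas, ignore surrounding whitespace, and drop empty entries.
    $words = preg_split('/\s*,\s*/', trim($stopwords), -1, PREG_SPLIT_NO_EMPTY);

    // No stop words configured: nothing can match.
    if (empty($words)) {
        return false;
    }

    // Escape regex metacharacters so each word is matched literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);

    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_match($pattern, $string) === 1;
}

var_dump(stopWords('We have fur to sell', ''));         // bool(false)
var_dump(stopWords('We have fur to sell', 'sis, fur')); // bool(true)
```

The early return keeps the empty alternation /\b()\b/i from ever being built.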
You can add a condition like this:
if (!empty($stopwords)) { /* your code */ } else { echo "no bad words"; }
And then ask the user or application to supply some bad words.
I have the following PHP code to find a bad word in a string.
It stops at the first bad word found and returns true.
The bad words are provided as a comma-separated list that is converted to an array.
$paragraph = "We have fur to sell";
$badWords = "sis, fur";
$badWordsArray = explode(",", $badWords);
function strpos_arr($string, $array, $offset=0) { // Find array values in string
foreach($array as $query) {
if(strpos($string, $query, $offset) !== false) return true; // stop on first true result for efficiency
}
return false;
}
strpos_arr($paragraph, $badWordsArray);
The issue is that it also returns true if the bad word is part of another word.
I would prefer to keep using strpos.
Please also suggest a more efficient way to find bad words, if there is one.
Try this, with a regular expression:
$paragraph = "We have fur to sell";
$badWords = "sis, fur";
$badWordsArray = preg_split('/\s*,\s*/', $badWords, -1, PREG_SPLIT_NO_EMPTY);
var_dump($badWordsArray);
function searchBadWords($string, $array, $offset=0) { // Find array values in string
foreach ($array as $query) {
if (preg_match('/\b' . preg_quote($query, '/') . '\b/i', $string)) return true; // stop on first true result for efficiency
}
return false;
}
var_dump(searchBadWords($paragraph, $badWordsArray));
Explanation:
First, we want to split the $badWords string correctly:
$badWordsArray = preg_split('/\s*,\s*/', $badWords, -1, PREG_SPLIT_NO_EMPTY);
This way, strings like "sis, fur", "sis , fur" and even "sis , , fur" are all split into array('sis', 'fur').
Then we search for the exact word with a regular expression using the \b meta-character. It denotes a word boundary, that is, the position between a word character and a non-word character.
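To illustrate the difference a word boundary makes, here is a small sketch comparing a plain strpos() search with a \b-anchored preg_match(). The sample sentences are made up for the illustration.

```php
<?php
// "fur" stands alone in some sentences but is embedded in "furniture" in another.
$pattern = '/\b' . preg_quote('fur', '/') . '\b/i';

// A substring search cannot tell the two cases apart.
var_dump(strpos('We have furniture to sell', 'fur') !== false); // bool(true)

// The word-boundary pattern only matches the standalone word.
var_dump(preg_match($pattern, 'We have fur to sell'));       // int(1)
var_dump(preg_match($pattern, 'We have furniture to sell')); // int(0)
var_dump(preg_match($pattern, 'Fur, leather and wool'));     // int(1)
```

Punctuation such as the comma after "Fur" counts as a non-word character, so the boundary still holds there.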
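Just include spaces in your search string. Trim each bad word first, because explode(",", "sis, fur") leaves a leading space on " fur". Note that this only finds words delimited by spaces, not words followed by punctuation.
$paragraph = "We have fur to sell";
$badWords = "sis, fur";
$badWordsArray = explode(",", $badWords);
function strpos_arr($string, $array, $offset=0) { // Find array values in string
$string = " ".$string." ";
foreach($array as $query) {
$query = " ".trim($query)." ";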
if(strpos($string, $query, $offset) !== false) return true; // stop on first true result for efficiency
}
return false;
}
strpos_arr($paragraph, $badWordsArray);
I have some simple code that doesn't work correctly. I have a file like this:
David
Jordan
Steve
and this simple PHP code:
$file = new SplFileObject("file.txt");
while (!$file->eof()) {
$array[]=$file->fgets();
}
$string = 'Hi , I\'M David';
if(strposa($string, $array)){
echo 'true';
} else {
echo 'false';
}
function strposa($haystack, $needle, $offset=0) {
if(!is_array($needle)) $needle = array($needle);
foreach($needle as $query) {
if(strpos($haystack, $query, $offset) !== false) return true; // stop on first true result
}
return false;
}
But this code doesn't work correctly.
If
$string = 'Hi , I\'M David';
it returns false, but when $string is changed to:
$string = 'Hi , I\'M Steve';
it returns true!
Finally, I found three ways to fix this.
Way 1: use the rtrim function:
$array[]=rtrim($file->fgets());
Way 2: use the str_replace function:
$array=str_replace("\r\n","",$array);
or
$array[]=str_replace("\r\n","",$file->fgets());
Way 3: use the file function:
$array = file("file.txt", FILE_IGNORE_NEW_LINES);
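The string returned by $file->fgets() includes the newline character \n at the end of each line. That is why strpos() returns false for "David": the needle is actually "David\n". "Steve" presumably matches because it is the last line of the file and has no trailing newline.
Strip the newline from the result of fgets() with trim() or rtrim().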
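A small sketch of the effect, using an in-memory array in place of the file. The assumption that only the last line lacks a trailing newline is mine, chosen to match the behaviour described in the question.

```php
<?php
// Simulated lines as fgets() would return them; the last line has no trailing newline.
$lines = ["David\n", "Jordan\n", "Steve"];
$string = "Hi , I'M David";

// The needle is "David\n", which does not occur in $string.
var_dump(strpos($string, $lines[0]) !== false); // bool(false)

// Strip trailing whitespace (including \r and \n) from every line.
$names = array_map('rtrim', $lines);
var_dump(strpos($string, $names[0]) !== false); // bool(true)
```

Using rtrim() rather than str_replace("\r\n", ...) also covers files that use a bare \n as the line ending.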
I would like to check in PHP whether a substring occurs at the end of a given string.
For example:
In the string 'abd def', if I search for def, it is at the end, so the result should be true. But if I search for abd, the result should be false, since it is not at the end.
Is this possible?
You could use preg_match for this:
$str = 'abd def';
$result = (preg_match("/def$/", $str) === 1);
var_dump($result);
An alternative that requires neither splitting by a separator nor regular expressions. It tests whether the last x characters equal the test string, where x is the length of the test string:
$string = "abcdef";
$test = "def";
if(substr($string, -(strlen($test))) === $test)
{
/* logic here */
}
Assuming whole words:
$match = 'def';
$words = explode(' ', 'abd def');
if (array_pop($words) == $match) {
...
}
Or using a regex:
if (preg_match('/def$/', 'abd def')) {
...
}
This approach does not depend on whole words, as long as the case where $match is not found at all is handled:
$match = 'def';
$words = 'abd def';
$location = strrpos($words, $match); // Find the rightmost location of $match (false if absent)
$matchlength = strlen($match); // How long is $match
/* If $match was found and the rightmost location + the length of what's being matched
* is equal to the length of what's being searched,
* then it's at the end of the string
*/
if ($location !== false && $location + $matchlength == strlen($words)) {
...
}
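Take a look at the strrchr() function. Note that strrchr() only uses the first character of the needle, so this works for the example but can fail when that character appears again later in the needle (for example 'dad'). Try it like this:
$word = 'abcdef';
$needle = 'def';
if (strrchr($word, $needle) == $needle) {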
echo 'true';
} else {
echo 'false';
}
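The approaches above can be wrapped in a small helper. The sketch below is one possible version based on substr_compare(); the name endsWith is hypothetical, and on PHP 8.0 or later the built-in str_ends_with() does the same job.

```php
<?php
// Hypothetical helper; on PHP 8.0+ the built-in str_ends_with() can be used instead.
function endsWith(string $haystack, string $needle): bool
{
    $length = strlen($needle);
    if ($length === 0) {
        return true; // every string ends with the empty string
    }
    if ($length > strlen($haystack)) {
        return false; // the needle cannot fit
    }
    // Compare the needle with the last $length bytes of the haystack.
    return substr_compare($haystack, $needle, -$length) === 0;
}

var_dump(endsWith('abd def', 'def')); // bool(true)
var_dump(endsWith('abd def', 'abd')); // bool(false)
```

The two guards handle the edge cases (empty or overlong needle) that the substr() and strrpos() variants need to treat specially.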
I have a PHP array of about 20,000 names. I need to filter it and remove any name that contains the word job, freelance, or project.
Below is what I have so far. It cycles through the array and adds each clean item to a new array. I need help matching the "bad" words, though. Please help if you can.
$data1 = array('Phillyfreelance' , 'PhillyWebJobs', 'web2project', 'cleanname');
// freelance
// job
// project
$cleanArray = array();
foreach ($data1 as $name) {
# if a term is matched, we remove it from our array
if(preg_match('~\b(freelance|job|project)\b~i',$name)){
echo 'word removed';
}else{
$cleanArray[] = $name;
}
}
Right now it only matches whole words: if "freelance" is a name in the array, that item is removed, but something like ImaFreelancer is not. I need to remove anything that contains the matching words at all.
A regular expression is not really necessary here; it would likely be faster to use a few stripos calls. (Performance matters at this level because the search runs for each of the 20,000 names.)
With array_filter, which only keeps elements in the array for which the callback returns true:
$data1 = array_filter($data1, function($el) {
return stripos($el, 'job') === FALSE
&& stripos($el, 'freelance') === FALSE
&& stripos($el, 'project') === FALSE;
});
Here's a more extensible / maintainable version, where the list of bad words can be loaded from an array rather than having to be explicitly denoted in the code:
$data1 = array_filter($data1, function($el) {
$bad_words = array('job', 'freelance', 'project');
$word_okay = true;
foreach ( $bad_words as $bad_word ) {
if ( stripos($el, $bad_word) !== FALSE ) {
$word_okay = false;
break;
}
}
return $word_okay;
});
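To actually load the list from outside the callback, the closure can import it with use. This is a sketch only: the function name removeNamesContaining is hypothetical, and it assumes the list contains no empty strings.

```php
<?php
// Hypothetical helper: the bad-word list is passed in rather than hard-coded.
// Assumes $badWords contains no empty strings.
function removeNamesContaining(array $names, array $badWords): array
{
    $kept = array_filter($names, function ($name) use ($badWords) {
        foreach ($badWords as $badWord) {
            if (stripos($name, $badWord) !== false) {
                return false; // contains a bad word: drop it
            }
        }
        return true; // no bad word found: keep it
    });

    // array_filter() preserves the original keys, so reindex from 0.
    return array_values($kept);
}

$data1 = array('Phillyfreelance', 'PhillyWebJobs', 'web2project', 'cleanname');
$clean = removeNamesContaining($data1, array('job', 'freelance', 'project'));
print_r($clean); // only 'cleanname' remains
```

Returning from the loop as soon as one bad word is found keeps the per-name cost low for the 20,000 entries.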
I'd be inclined to use the array_filter function and change the regex to not match on word boundaries
$data1 = array('Phillyfreelance' , 'PhillyWebJobs', 'web2project', 'cleanname');
$cleanArray = array_filter($data1, function($w) {
return !preg_match('~(freelance|project|job)~i', $w);
});
Use of the preg_match() function and some regular expressions should do the trick; this is what I came up with and it worked fine on my end:
<?php
$data1=array('JoomlaFreelance','PhillyWebJobs','web2project','cleanname');
$cleanArray=array();
$badWords='/(job|freelance|project)/i';
foreach($data1 as $name) {
if(!preg_match($badWords,$name)) {
$cleanArray[]=$name;
}
}
echo implode(',', $cleanArray); // separator first; the reversed argument order is not accepted by PHP 8
?>
Which returned:
cleanname
Personally, I would do something like this:
$badWords = ['job', 'freelance', 'project'];
$names = ['JoomlaFreelance', 'PhillyWebJobs', 'web2project', 'cleanname'];
// Escape characters with special meaning in regular expressions.
$quotedBadWords = array_map(function($word) {
return preg_quote($word, '/');
}, $badWords);
// Create the regular expression.
$badWordsRegex = implode('|', $quotedBadWords);
// Filter out any names that match the bad words.
$cleanNames = array_filter($names, function($name) use ($badWordsRegex) {
return preg_match('/' . $badWordsRegex . '/i', $name) === 0; // preg_match returns 0 when there is no match, false only on error
});
This should be what you want:
if (!preg_match('/(freelance|job|project)/i', $name)) {
$cleanArray[] = $name;
}
I want to create a function in PHP that returns true when the string contains any bad words.
Here is an example:
function stopWords($string, $stopwords) {
if(the words in the stopwords variable are found in the string) {
return true;
}else{
return false;
}
}
Please assume that $stopwords variable is an array of values, like:
$stopwords = array('fuc', 'dic', 'pus');
How can I do that?
Thanks
Use the strpos function.
// the function assumes the $stopwords to be an array of strings that each represent a
// word that should not be in $string
function stopWords($string, $stopwords)
{
// input parameters validation excluded for brevity..
// take each of the words in the $stopwords array
foreach($stopwords as $badWord)
{
// if the $badWord is found in the $string the strpos will return non-FALSE
if(strpos($string, $badWord) !== FALSE)
return TRUE;
}
// if the function hasn't returned TRUE yet it must be that no bad words were found
return FALSE;
}
Use regular expressions:
\b matches a word boundary; use it to match only whole words.
Use the i flag to perform case-insensitive matches.
Match each word like so:
function stopWords($string, $stopwords) {
foreach ($stopwords as $stopword) {
$pattern = '/\b' . preg_quote($stopword, '/') . '\b/i'; // escape regex metacharacters in the word
if (preg_match($pattern, $string)) {
return true;
}
}
return false;
}
$stopwords = array('fuc', 'dic', 'pus');
$bad = stopWords('confucius', $stopwords); // false: 'fuc' is only part of a longer word
$bad = stopWords('what the Fuc?', $stopwords); // true: whole word, matched case-insensitively
A shorter version, inspired by an answer to the question "determine if a string contains one of a set of words in an array", uses implode to build one combined expression:
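function stopWords($string, $stopwords) {
if (empty($stopwords)) return false; // an empty list would yield /\b()\b/i, which matches almost anything
// Escape regex metacharacters in each word before joining
$quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $stopwords);
$pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
return preg_match($pattern, $string) > 0;
}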
function stopWords($string, $stopwords) {
$words=explode(' ', $string); //splits the string into words and stores it in an array
foreach($stopwords as $stopword)//loops through the stop words array
{
if(in_array($stopword, $words)) {//if the current stop word exists
//in the words contained in $string then exit the function
//immediately and return true
return true;
}
}
//else if none of the stop words were in $string then return false
return false;
}
I'm assuming here that $stopwords is an array to begin with. If it isn't, it should be converted to one first.
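The in_array() version above compares case-sensitively and splits only on spaces, so "Fuc?" would not be recognised. A possible variant, sketched here under the assumption that words consist of ASCII word characters, lower-cases both sides and splits on non-word characters. The name containsStopWord is hypothetical.

```php
<?php
// Hypothetical variant: tokenises on non-word characters and ignores case.
// Assumes ASCII text; \W without the u modifier does not handle multibyte letters.
function containsStopWord(string $string, array $stopwords): bool
{
    // Split on runs of non-word characters so punctuation does not stick to words.
    $words = preg_split('/\W+/', strtolower($string), -1, PREG_SPLIT_NO_EMPTY);
    $stopwords = array_map('strtolower', $stopwords);

    // Any overlap between the two lists means a stop word was used as a whole word.
    return count(array_intersect($words, $stopwords)) > 0;
}

$stopwords = array('fuc', 'dic', 'pus');
var_dump(containsStopWord('What the Fuc?', $stopwords)); // bool(true)
var_dump(containsStopWord('confucius', $stopwords));     // bool(false)
```

Like the \b-based versions, this only flags whole words, so a stop word embedded in a longer word is not reported.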