Check for word in string - php

What is the best way to search for a word in a string?
preg_match("/word/",$string)
stripos($string,"word")
Or is there a better way?

One benefit of using a regular expression for this job is the ability to use \b (the regex word boundary) and other pattern variations. If you are only looking for that sequence of letters anywhere in a string, stripos is likely to be a little faster.
$tests = array("word", "worded", "This also has the word.", "Words are not the same", "Word capitalized should match");
foreach ($tests as $string)
{
echo "Testing \"$string\": Regexp:";
echo preg_match("/\bword\b/i", $string) ? "Matched" : "Failed";
echo " stripos:";
echo stripos($string, "word") !== false ? "Matched": "Failed";
echo "\n";
}
Results:
Testing "word": Regexp:Matched stripos:Matched
Testing "worded": Regexp:Failed stripos:Matched
Testing "This also has the word.": Regexp:Matched stripos:Matched
Testing "Words are not the same": Regexp:Failed stripos:Matched
Testing "Word capitalized should match": Regexp:Matched stripos:Matched

Like it says in the Notes for preg_match:
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.

If you are simply looking for a substring, stripos() or strpos() and friends are much better than using the preg family of functions.

For simple string matching the PHP string functions offer more performance. Regex is more heavyweight and therefore has lower performance.
Having said that, in most cases, the performance difference is small enough to go unnoticed, unless you're looping over an array with hundreds of thousands of elements or more.
Of course, as soon as you start needing "cleverer" matching, regex becomes the only game in town.

There is also substr_count($haystack, $needle), which returns the number of substring occurrences. It has the added bonus that you do not have to worry about a match at position 0 being treated as false, as you do with stripos(), although that is not a problem if you use strict comparison (!== false). Note that substr_count() is case-sensitive.
http://php.net/manual/en/function.substr-count.php
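The position-0 pitfall mentioned above can be sketched as follows. This is only an illustration using built-in functions; the sample haystack and needle are made up for the example.

```php
<?php
// Hypothetical sample data: the needle sits at position 0 of the haystack.
$haystack = 'word at the start';
$needle   = 'word';

// stripos() returns int(0) here, which loose comparison treats as false.
$pos = stripos($haystack, $needle);
$looseCheck  = ($pos != false);   // false: the match is missed
$strictCheck = ($pos !== false);  // true: the match is detected

// substr_count() returns a count, so 0 really does mean "not found".
$count = substr_count($haystack, $needle);

var_dump($pos, $looseCheck, $strictCheck, $count);
```

Only the strict comparison and the count report the match correctly when it starts at the very first character.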

Related

How to check if array elements exist in a string

I have a list of words in an array. What is the fastest way to check if any of these words exist in a string?
Currently, I am checking for the array elements one by one with stripos in a foreach loop. I am curious whether there is a faster method, like what we can do with str_replace using an array.
Regarding your additional comment: you could split your string into single words using explode() or preg_split() and then check the resulting array against the array of needles using array_intersect(). That way all the work is done only once.
<?php
$haystack = "Hello Houston, we have a problem";
$haystacks = preg_split("/\b/", $haystack);
$needles = array("Chicago", "New York", "Houston");
$intersect = array_intersect($haystacks, $needles);
$count = count($intersect);
var_dump($count, $intersect);
I could imagine that array_intersect() is pretty fast. But it depends what you really want (matching words, matching fragments, ..)
my personal function:
function wordsFound($haystack,$needles) {
return preg_match('/\b('.implode('|',$needles).')\b/i',$haystack);
}
//> Usage:
if (wordsFound('string string string',array('words'))) {
// at least one of the words was found
}
Note 1: if you work with UTF-8 strings containing non-ASCII characters, you need to replace \b with the corresponding UTF-8-aware word boundary.
Note 2: be sure to use only a-z0-9 characters in $needles (thanks to MonkeyMonkey); otherwise you need to run each needle through preg_quote() first.
Note 3: this function is case-insensitive thanks to the i modifier.
In general, regular expressions are slower than basic string functions like stripos(). But I think it really depends on the situation. If you really need maximum performance, I suggest running some tests with real-world data.
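As a rough sketch of the preg_quote() advice above, the function could be written so that each needle is escaped before the pattern is built. The name wordsFoundQuoted() is hypothetical and not part of the original answer, and the sketch does not address the UTF-8 word boundary caveat.

```php
<?php
// Hypothetical helper (not from the original answer): a variant of wordsFound()
// that escapes each needle so regex metacharacters are matched literally.
function wordsFoundQuoted(string $haystack, array $needles): bool
{
    if ($needles === []) {
        return false; // an empty alternation would match everywhere
    }
    $quoted = array_map(function ($needle) {
        return preg_quote($needle, '/'); // also escapes the / delimiter
    }, $needles);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(wordsFoundQuoted('Hello Houston, we have a problem', ['Chicago', 'houston']));
var_dump(wordsFoundQuoted('Houstonian', ['Houston']));
var_dump(wordsFoundQuoted('price is 3x50 today', ['3.50']));
```

The first call matches regardless of case; the second does not, because of the word boundary; the third does not, because the dot in the needle is treated literally rather than as "any character".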

Find if a String is Contained in Delimiters in PHP

I have a string like this:
$string = '[Canada] [United States]';
I need to detect something like this
if($string contains Canada) {
// Do stuff for Canada
}
You could pull out all text between [ and ] and then see if it is found in the resulting array.
preg_match_all('/\[(.*?)\]/', $string, $matches);
if (in_array('Canada', $matches[1])) {
echo 'Hello, Canada!';
}
If you just used substring searching functions, you'd run the risk that the string you searched for was the substring of a larger block between brackets.
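To illustrate that risk, here is a small sketch with a made-up input in which "Canada" only appears as part of a longer bracketed entry. The sample string is an assumption chosen for the example.

```php
<?php
// Hypothetical input: "Canada" appears only as part of a longer bracketed entry.
$string = '[Canada Goose] [United States]';

// Plain substring search reports a hit even though no entry is exactly "Canada".
$substringHit = strpos($string, 'Canada') !== false;

// Extracting the bracketed entries first allows an exact comparison.
preg_match_all('/\[(.*?)\]/', $string, $matches);
$exactHit = in_array('Canada', $matches[1], true);

var_dump($substringHit, $exactHit);
```

The substring check is true while the exact check is false, which is the difference between the two approaches.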
If you are doing simple text matching and don't want the overhead of the regular expression engine, the strpos function may be what you are looking for.
See the PHP Manual page for strpos for more details.
if(strpos($string,"Canada")!==FALSE){
// Do stuff for Canada
}
strpos() is preferred over strstr() here: the manual states that when you only want to determine whether the needle occurs in the haystack, strpos() is faster and less memory intensive.
Use strstr().

How to distinguish between a SHA1 string and a date-time string?

Basically I have two options:
$one = "fdfeb16f096983ada02db49d46a8154475d700ae";
$two = "2011-12-28 05:20:01";
I need some sort of regex so that I can detect whether the string follows the pattern in $one or the pattern in $two.
Detect if the string is sha1 or datetime.
What would be the best way to determine this?
Thanks
If you are ABSOLUTELY sure those are the ONLY two options, I would go with strlen, and not some kind of marvelous regexp.
Even if those are not the only two options (user messed up), I would still go with strlen, and then check specifically for each format, if it is what you expect it to be.
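One way this "length first, then verify the format" idea might look is sketched below. The function name classifyString() and its return labels are hypothetical; the sketch assumes the hash is hex-encoded and the date uses the exact "Y-m-d H:i:s" layout from the question.

```php
<?php
// Hypothetical helper: classify a string as 'sha1', 'datetime', or 'unknown'.
function classifyString(string $value): string
{
    $length = strlen($value);

    // A hex-encoded SHA1 hash is exactly 40 hexadecimal characters.
    // Note: ctype_xdigit() also accepts uppercase A-F.
    if ($length === 40 && ctype_xdigit($value)) {
        return 'sha1';
    }

    // "YYYY-MM-DD HH:MM:SS" is exactly 19 characters.
    if ($length === 19) {
        $parsed = DateTime::createFromFormat('Y-m-d H:i:s', $value, new DateTimeZone('UTC'));
        // Re-formatting rejects values that were silently rolled over, such as month 13.
        if ($parsed !== false && $parsed->format('Y-m-d H:i:s') === $value) {
            return 'datetime';
        }
    }

    return 'unknown';
}

var_dump(classifyString('fdfeb16f096983ada02db49d46a8154475d700ae'));
var_dump(classifyString('2011-12-28 05:20:01'));
var_dump(classifyString('2011-13-28 05:20:01'));
```

The length check is cheap and separates the two cases immediately; the format check afterwards catches input that merely happens to have the right length.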
I don't know how to apply regexes in PHP, but assuming the actual regexes are enough...
/^[a-f0-9]{40}$/
Match exactly 40 characters, each of which is a-f or 0-9. The ^ and $ match the beginning and end of the string, so nothing else can be in it.
/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$/
Match strings with the date pattern you have.
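$sha1Pattern = '/^[a-f0-9]{40}$/';
$dateTimePattern = '/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$/';
if (preg_match($sha1Pattern, $string)) {
echo "$string looks like a SHA1 hash";
} elseif (preg_match($dateTimePattern, $string)) {
echo "$string looks like a date-time";
}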
Try using the preg_match() function.
http://php.net/manual/en/function.preg-match.php
Conditionals with regex to match each of the options. For the second case:
if(preg_match('|^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}$|',$two))
{
echo "Matching two";
}
I don't see a clear pattern for number one, but you could do elseif(s) to detect other potential cases.

Searching for words in a string

What's the best way to search a string in php and find a case insensitive match?
For example:
$SearchString = "This is a test";
From this string, I want to find the word test, or TEST or Test.
Thanks!
EDIT
I should also mention that I want to search the string and if it contains any of the words in my blacklist array, stop processing it. So an exact match of "Test" is important, however, the case is not
If you want to match whole words, for example to forbid "FU" but not "fun", you can use a regular expression with \b, which marks the start and end of a word.
So if you search for "\bfu\b", it is not going to match "fun".
If you add an "i" after the closing delimiter, the search is case-insensitive.
If you have a list of words like "fu", "foo" and "bar", your pattern can look like:
"#\b(fu|foo|bar)\b#i", or you can use a variable:
if(preg_match("#\b{$needle}\b#i", $haystack))
{
return FALSE;
}
Edit: added a multi-word example with character escaping, as requested in the comments:
/* load the list somewhere */
$stopWords = array( "word1", "word2" );
/* escape regex special characters, including the # delimiter */
foreach($stopWords as $row_nr => $current_word)
{
$stopWords[$row_nr] = preg_quote($current_word, '#');
}
/* create a pattern of all words (using # instead of / as the delimiter, since / can appear in URLs) */
$pattern = "#\b(" . implode('|', $stopWords) . ")\b#i";
/* execute the search ($images holds the text being checked) */
if(!preg_match($pattern, $images))
{
/* no stop words */
}
You can do one of a few things, but I tend to use one of these:
You can use stripos()
if (stripos($SearchString,'test') !== FALSE) {
echo 'I found it!';
}
You can convert the string to one specific case, and search it with strpos()
if (strpos(strtolower($SearchString),'test') !== FALSE) {
echo 'I found it!';
}
I do both and have no preference - one may be more efficient than the other (I suspect the first is better) but I don't actually know.
As a couple of more horrible examples, you could:
Use a regex with the i modifier
Do if (count(explode('test',strtolower($searchString))) > 1)
stripos, I would assume. Presumably it stops searching when it finds a match, and I guess internally it converts to lower (or upper) case, so that's about as good as you'll get.
http://us3.php.net/manual/en/function.preg-match.php
Depends if you want to just match
In this case you would do:
$SearchString= "This is a test";
$pattern = '/test/i';
preg_match($pattern, $SearchString);
I wasn't reading the question properly. As stated in other answers, stripos or a preg_match function will do exactly what you're looking for.
I originally offered the stristr function as an answer, but you actually should NOT use this if you're just looking to find a string within another string, as it returns the rest of the string in addition to the search parameter.

Is this efficient coding for anti-spam?

if(strpos($string, "A Bad Word") != false){
echo 'This word is not allowed';
}
if(strpos($string, "A Bad Word") != false){
echo 'This word is not allowed';
}
Okay, so I am trying to check the submit data to see if there are inappropriate words. Instead of making 5 instances, is there a more efficient way?
I'm sure there's a more clever way to do this in general.
If you just want to be more concise, then it's probably best to loop over some bad words, instead of adding repetitive, almost identical, conditionals (ifs):
<?PHP
$banned = array('bad','words','like','these');
$looksLikeSpam = false;
foreach($banned as $naughty){
if (strpos($string,$naughty) !== false){
$looksLikeSpam=true;
break; // one hit is enough
}
}
if ($looksLikeSpam){
echo "You're GROSS! Just... ew!";
die();
}
Edit: Also, note that in your question code you test strpos() != false. You really want !==, since strpos() returns 0 if the string starts with the bad word, say, PENIS, and 0 == false under loose comparison, so the check would miss it. See where I'm going here?
Also, you probably want to use stripos() to be case-insensitive (unless you only care if people SHOUT offensive words) :-)
Yes, you could make an array of badwords and build a regex out of it. This would also make handling case-insensitivity easy.
$badwords = array('staircase', 'tuna', 'pillow');
$badwords_regex = '/' . implode('|', $badwords) . '/i';
$contains_badwords = preg_match($badwords_regex, $text);
No, it's crap. There is a whole branch of computing science concerning string searching algorithms. Heck, Knuth even dedicated half of TAOCP Volume 3 to it.
Boyer-Moore is a good algorithm for finding a single needle and is used in many applications; for searching for multiple needles in a haystack at once, Aho-Corasick is the usual choice.
You need to be careful with word boundaries, or else people will complain about not being able to enter words like "shuttlecock".
I hope you (or your client) realises that automatic "naughty word" filtering does not remove the need for moderating. There are lots of ways to be offensive without using any of the supposedly naughty words. Even deciding what is or is not offensive depends on the cultural context.
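The word-boundary concern can be sketched like this, reusing the harmless sample words from the regex answer. The helper name containsBannedWord() is hypothetical, and this is only one possible way to write the check.

```php
<?php
// Hypothetical helper: true if $text contains any banned word as a whole word.
function containsBannedWord(string $text, array $banned): bool
{
    foreach ($banned as $word) {
        // preg_quote() keeps regex metacharacters in a word from changing the pattern.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            return true; // stop at the first hit
        }
    }
    return false;
}

$banned = ['staircase', 'tuna', 'pillow'];

// A plain substring search flags "Unfortunately" because it contains "tuna".
var_dump(stripos('Unfortunately it rained', 'tuna') !== false);

// The word-boundary version only flags the standalone word.
var_dump(containsBannedWord('Unfortunately it rained', $banned));
var_dump(containsBannedWord('I had TUNA for lunch', $banned));
```

The substring search produces a false positive on "Unfortunately", while the word-boundary check accepts it and still catches the standalone word in any case.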
You could combine them as a single regular expression and then use preg_grep() to confirm their existence
Use an array of values and iterate over the array, checking the submitted word each time. If a match is found break out of the loop and return true.
You might use PHP in_array function rather than a loop, if you're checking one word. A regex would be better if you're checking a whole sentence though.
http://us2.php.net/manual/en/function.in-array.php
$bad_word_array=array('weenis','dolt','wanker');
$passed=in_array($suspected_word,$bad_word_array);
