What's the best way to search a string in php and find a case insensitive match?
For example:
$SearchString = "This is a test";
From this string, I want to find the word test, or TEST or Test.
Thanks!
EDIT
I should also mention that I want to search the string and, if it contains any of the words in my blacklist array, stop processing it. So matching the exact word "Test" is important; its case, however, is not.
If you want to match whole words, so that "FU" is forbidden but "fun" is not, you can use regular expressions with \b, which marks the start or end of a word.
So a search for "\bfu\b" is not going to match "fun".
If you add an "i" after the closing delimiter, the search is case-insensitive.
If you have a list of words like "fu", "foo" and "bar", your pattern can look like:
"#\b(fu|foo|bar)\b#i", or you can use a variable:
if(preg_match("#\b{$needle}\b#i", $haystack))
{
return FALSE;
}
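As a quick illustration of the word-boundary behaviour described above, here is a minimal sketch. The sample sentences are made up for the example and are not from the original post.

```php
<?php
// Hypothetical sample inputs, chosen only to illustrate \b and the i modifier.
$pattern = '#\bfu\b#i';

// "FU" stands alone as a word (different case): the pattern matches.
$withWord = preg_match($pattern, 'That was FU, honestly');

// "fu" appears only as the start of "fun": no word boundary after it, so no match.
$insideWord = preg_match($pattern, 'That was fun, honestly');

var_dump($withWord);    // int(1)
var_dump($insideWord);  // int(0)
```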
Edit: added a multi-word example with character escaping, as requested in the comments:
/* load the list somewhere */
$stopWords = array( "word1", "word2" );
/* escape regex special characters, including the # delimiter */
foreach($stopWords as $row_nr => $current_word)
{
$stopWords[$row_nr] = preg_quote($current_word, '#');
}
/* create a pattern of all words (using # instead of / as the delimiter, as / can be used in URLs) */
$pattern = "#\b(" . implode('|', $stopWords) . ")\b#i";
/* execute the search */
if(!preg_match($pattern, $images))
{
/* no stop words */
}
You can do one of a few things, but I tend to use one of these:
You can use stripos()
if (stripos($searchString,'test') !== FALSE) {
echo 'I found it!';
}
You can convert the string to one specific case, and search it with strpos()
if (strpos(strtolower($searchString),'test') !== FALSE) {
echo 'I found it!';
}
I do both and have no preference - one may be more efficient than the other (I suspect the first is better) but I don't actually know.
As a couple of more horrible examples, you could:
Use a regex with the i modifier
Do if (count(explode('test',strtolower($searchString))) > 1)
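The strict !== FALSE comparison in the stripos() and strpos() snippets above is not optional. A small sketch of why, using a made-up string where the needle sits at the very start:

```php
<?php
$searchString = 'Test string';   // hypothetical input: the needle is at offset 0

$pos = stripos($searchString, 'test');

var_dump($pos);            // int(0)
var_dump($pos == false);   // bool(true): a loose comparison mistakes "found at 0" for "not found"
var_dump($pos !== false);  // bool(true): the strict comparison reports the match correctly
```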
stripos, I would assume. Presumably it stops searching when it finds a match, and I guess internally it converts to lower (or upper) case, so that's about as good as you'll get.
http://us3.php.net/manual/en/function.preg-match.php
It depends on whether you just want to test for a match.
In this case you would do:
$SearchString = "This is a test";
$pattern = '/test/i';
preg_match($pattern, $SearchString);
I wasn't reading the question properly. As stated in other answers, stripos or a preg_match function will do exactly what you're looking for.
I originally offered the stristr function as an answer, but you actually should NOT use this if you're just looking to find a string within another string, as it returns the rest of the string in addition to the search parameter.
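To make the difference concrete, here is a short comparison of the two functions on a made-up haystack:

```php
<?php
$haystack = 'This is a TEST string';   // hypothetical input

$rest = stristr($haystack, 'test');  // the tail of the haystack, starting at the match
$pos  = stripos($haystack, 'test');  // just the offset of the match

var_dump($rest);  // string(11) "TEST string"
var_dump($pos);   // int(10)
```

Both return false when there is no match, so either works in a condition, but stripos() avoids building a new string.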
I have an array full of patterns that I need matched. Is there any way to do that other than a for() loop? I'm trying to do it in the least CPU-intensive way, since I will be doing dozens of these every minute.
A real-world example: I'm building a link status checker, which will check links to various online video sites to ensure that the videos are still live. Each domain has several "dead keywords"; if these are found in the HTML of a page, that means the file was deleted. These are stored in the array. I need to match the contents of the array against the HTML output of the page.
First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:
$matches = false;
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
$matches = true;
}
}
You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.
I would recommend at least grouping your patterns using parentheses, like:
foreach ($patterns as $pattern)
{
$grouped_patterns[] = "(" . $pattern . ")";
}
$master_pattern = implode("|", $grouped_patterns);
But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
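If you do go the combined route, the grouping idea above could be wrapped in a small helper. This is only a sketch: combinePatterns() is a hypothetical name, the keyword fragments are invented, and it assumes the fragments are written without delimiters and never contain the ~ character.

```php
<?php
/**
 * Hypothetical helper: joins delimiter-less regex fragments into one pattern.
 * Assumes no fragment contains "~", which is used as the delimiter here.
 */
function combinePatterns(array $fragments, string $modifiers = ''): string
{
    $grouped = array();
    foreach ($fragments as $fragment) {
        // Non-capturing groups keep each fragment self-contained
        // without adding numbered captures to the result.
        $grouped[] = '(?:' . $fragment . ')';
    }
    return '~' . implode('|', $grouped) . '~' . $modifiers;
}

// Invented "dead keyword" fragments for illustration.
$pattern = combinePatterns(array('file (?:removed|deleted)', 'video\s+not\s+found'), 'i');

var_dump(preg_match($pattern, 'Sorry, this File Deleted by owner'));  // int(1)
var_dump(preg_match($pattern, 'Now playing'));                        // int(0)
```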
Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.
So doing this:
foreach ($strings_to_match as $string_to_match)
{
if (strpos($page, $string_to_match) !== false)
{
// etc.
break;
}
}
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
// etc.
break;
}
}
and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().
// assuming you have something like this
$patterns = array('a','b','\w');
// converts the array into a regex friendly or list
$patterns_flattened = implode('|', $patterns);
if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
}
// PS: that's off the top of my head, I didn't check it in a code editor
If your patterns don't contain many whitespaces, another option would be to eschew the arrays and use the /x modifier. Now your list of regular expressions would look like this:
$regex = "/
pattern1| # search for occurrences of 'pattern1'
pa..ern2| # wildcard search for occurrences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern', whitespace is escaped
mypat # Note that the last pattern does NOT have a pipe char
/x";
With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.
This would avoid the looping through the array.
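Here is a runnable variant of the same idea, as a sketch only. The keywords are invented, and the i modifier is added on the assumption that page text should be matched case-insensitively. Note that the pattern delimiter must not appear inside the comments.

```php
<?php
// Hypothetical "dead keywords"; with the x modifier, literal spaces
// have to be written as [ ] (or escaped with a backslash).
$regex = '/
    file[ ]not[ ]found   |   # plain phrase, spaces kept via character classes
    video\s+removed      |   # any run of whitespace between the two words
    error[ ]40[34]           # "error 403" or "error 404"; no trailing pipe
/xi';

var_dump(preg_match($regex, 'Sorry: File Not Found'));  // int(1)
var_dump(preg_match($regex, 'All good here'));          // int(0)
```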
If you're merely searching for the presence of a string in another string, use strpos as it is faster.
Otherwise, you could just iterate over the array of patterns, calling preg_match each time.
If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
What about doing a str_replace() on the HTML you get, using your array, and then checking whether the result is equal to the original HTML? This would be very fast:
$sites = array(
'you_tube' => array('dead', 'moved'),
...
);
foreach ($sites as $site => $deadArray) {
// get $html
if ($html == str_replace($deadArray, '', $html)) {
// video is live
}
}
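A possible variation on this idea: str_replace() accepts an optional fourth argument that receives the number of replacements made, so the two strings do not have to be compared afterwards. The page content and keyword list below are made up for the sketch.

```php
<?php
// Hypothetical page content and keyword list, for illustration only.
$html      = '<p>Sorry, this video has been moved.</p>';
$deadWords = array('dead', 'moved');

// The return value is ignored here; only the replacement count matters.
str_replace($deadWords, '', $html, $count);

$isLive = ($count === 0);

var_dump($count);   // int(1): "moved" occurs once
var_dump($isLive);  // bool(false)
```

Like str_replace() itself, this is case-sensitive; str_ireplace() takes the same arguments if case should be ignored.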
You can combine all the patterns from the list to single regular expression using implode() php function. Then test your string at once using preg_match() php function.
$patterns = array(
'abc',
'\d+h',
'[abc]{6,8}\-\s*[xyz]{6,8}',
);
$master_pattern = '/(' . implode(')|(', $patterns) . ')/';
if(preg_match($master_pattern, $string_to_check))
{
//do something
}
Of course there could be even less code using implode() inline in "if()" condition instead of $master_pattern variable.
Sadly I have to ask this question but after noodling on this problem the whole morning, I give up. Searching online, man pages, documents, none of it seems to give me a conclusive answer to what I try to do.
Looking for a regular expression for the PHP function preg_match to match a string against a pattern. Now that pattern is what gives me headaches.
The pattern should express the following: string starts with "_MG_" or "IMG_" or "DSC_", followed by four digits, followed by an optional "-N" where N is another digit. For example, "IMG_0123" or "DSC_9876-3" are valid. Everything else should be rejected.
I came up with various patterns, but none of them seems to work. For example, I tried
(_MG_|IMG_|DSC_)[0-9]{4}(-[0-9])?
and this in different variations with ( ) and apostrophes around various sub-expressions and using ? vs {0,1} and whatnot. (I experimented using grep, but got no matches still.) Yes, I know I need to add "/.../" for PHP, but here I left it out for readability's sake.
Can I even express this in a single expression, or will I have to call the matching function several times? If several matches are required, I might be better off writing a small parser for this particular string matching myself.
Thanks!
EDIT: Here is the code that I'm working with
// Iterate over all images in this gallery folder.
if ($h = opendir($dir)) {
while (($f = readdir($h)) !== false) {
// Skip images whose name doesn't match the requirement.
if (0 == preg_match("/(_MG_|IMG_|DSC_)[0-9]{4}(-[0-9]){0,1}/", $f)) {
continue;
}
...
}
}
And this also allows image names like "_MG_7020-1-2.jpg" or "_MG_7444-5-6.2.jpg" or "IMG_6543_2_4_tonemapped.jpg" but that's not what I want to allow.
<?php
$array = array('IMG_0123', 'DSC_9876-3', '_MG_1234', 'DSC_fail');
foreach($array as $arr) {
if(preg_match("/^(_MG_|IMG_|DSC_)[0-9]{4}(-[0-9])?$/", $arr)) {
echo $arr . ' => TRUE <br />';
} else {
echo $arr . ' => FALSE <br />';
}
}
?>
The above works as expected for me.
I ran this as well:
<?php
$matches = array();
preg_match('/(_MG_|IMG_|DSC_)[0-9]{4}(-[0-9])?/','IMG_0123-3',$matches );
var_dump($matches);
Output:
array(3) {
[0]=>
string(10) "IMG_0123-3"
[1]=>
string(4) "IMG_"
[2]=>
string(2) "-3"
}
Seems ok, unless I'm missing something, or unless what you're referring to is that preg_match returns false if not all your matchers () match.
Note the return type for preg_match from the php doc:
preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time because preg_match() will stop searching after the first match. preg_match_all() on the contrary will continue until it reaches the end of subject. preg_match() returns FALSE if an error occurred.
So you may be looking to really use preg_match_all() in fact
According to this refiddle, you seem to have it solved just fine. You can use their "unit" test functionality to add further "should" and "should not" match scenarios. Granted, that refiddle uses JavaScript's regex engine, but I find the two to be effectively identical until you get into backreferences and lookarounds.
Here is your original pattern with start and end of string anchor as well as some edits to reduce the pattern length.
Code: (Demo)
var_export(
preg_grep(
'/^(?:DSC|[_I]MG)_\d{4}(?:-\d)?$/',
$array
)
);
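To see what the anchors change for the file names mentioned in the question, here is a minimal sketch. It assumes the names carry an extension, which is stripped with pathinfo() before matching so that the $ anchor can apply to the base name.

```php
<?php
// File names taken from the question's description, plus two valid ones.
$files = array(
    'IMG_0123.jpg',
    'DSC_9876-3.jpg',
    '_MG_7020-1-2.jpg',
    'IMG_6543_2_4_tonemapped.jpg',
);

$accepted = array();
foreach ($files as $file) {
    // Match only the base name, so the extension does not defeat the $ anchor.
    $name = pathinfo($file, PATHINFO_FILENAME);
    if (preg_match('/^(?:_MG_|IMG_|DSC_)[0-9]{4}(?:-[0-9])?$/', $name) === 1) {
        $accepted[] = $file;
    }
}

print_r($accepted);  // only IMG_0123.jpg and DSC_9876-3.jpg remain
```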
I want to check, case-insensitively, whether one of the search keywords 'cli', 'ent' or 'cl' exists in the string 'client'. I used the preg_match function with the pattern '\bclient\b', but it does not give the correct result: I get a "match not found" error.
Can anyone please help?
Thanks
I wouldn't use regular expressions for this, it's extra overhead and complexity where a regular string function would suffice. Why not go with stripos() instead?
$str = 'client';
$terms = array('cli','ent','cl');
foreach($terms as $t) {
if (stripos($str,$t) !== false) {
echo "$t exists in $str";
break;
}
}
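Try the pattern /cli?|ent/i
Explanation:
cli? matches the first part: "cl", optionally followed by "i" (the ? makes the i optional).
| means "or", so the pattern matches "cl", "cli" or "ent". The trailing i modifier makes the match case-insensitive.
\b is a word boundary. With it, the pattern would not match "cli" inside "client", so you need to remove the \b.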
It's been several years since I have used regular expressions, and I was hoping I could get some help on something I'm working on. You know how google's search is quite powerful and will take stuff inside quotes as a literal phrase and things with a minus sign in front of them as not included.
Example: "this is literal" -donotfindme site:examplesite.com
This example would search for the phrase "this is literal" on pages that do not include the word donotfindme, restricted to the website examplesite.com.
Obviously I'm not looking for something as complex as Google I just wanted to reference where my project is heading.
Anyway, I first wanted to start with the basics which is the literal phrases inside quotes. With the help of another question on this site I was able to do the following:
(this is php)
$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches);
But this outputs "this" instead of the desired this, and doesn't work at all for multiple phrases in quotes. Could someone lead me in the right direction?
I don't necessarily need code; even a really nice place with tutorials would probably do the job.
Thanks!
Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:
$pattern = '/"(.*)"/';
and then $matches will be an array of length 2 that contains the text between the quotes in element 1. (It'll still contain the full text matched in element 0) In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
will select all quoted strings which have two lowercase words separated by a single space, followed by anything. Then $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
For finding multiple phrases, you'll need to pick out one at a time with preg_match(). There's an optional "offset" parameter you can pass, which indicates where in the string it should start searching, and to find multiple matches you should give the position right after the previous match as the offset. See the documentation for details.
You could also try searching Google for "regular expression tutorial" or something like that, there are plenty of good ones out there.
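As an alternative to stepping through the string with the offset parameter, preg_match_all() collects every match in one call. The sketch below also swaps .* for [^"]+ (a substitution, not part of the answer above), because a greedy .* would run from the first quote to the last one. The search string is made up.

```php
<?php
// Hypothetical search input with two quoted phrases.
$search = 'hello "this is literal" and "another phrase" -donotfindme';

// [^"]+ cannot run past a closing quote, so each phrase is matched separately.
preg_match_all('/"([^"]+)"/', $search, $matches);

print_r($matches[1]);  // the two phrases, without their quotes
```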
Sorry, but my php is a bit rusty, but this code will probably do what you request:
$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches[1]);
$matches[1] will contain the first captured subexpression; $matches[0] contains the full text matched by the pattern.
See preg_match in the PHP documentation for specifics about subexpressions.
I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.
Here is a complete answer covering all the kinds of search terms (literal, minus, quotes, ...), with replacements (for Google visitors, at least).
Maybe it should not be done with regular expressions alone, though.
A single huge, complex regular expression would be hard for you or other developers to work on and extend,
and this approach might even turn out to be faster.
It might still need a lot of improvement, but at least here is a complete working solution in a class. There is a bit more in here than the question asked for, but it illustrates the reasons behind some choices.
class mySearchToSql extends mysqli {
protected function filter($what) {
if (isset($what)) {
//echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug
//Split into different desires
preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i',$what,$split);
//echo '<pre>'.var_export($split,1).'</pre>';//debug
//Surround with SQL
array_walk($split[1],'self::sur',array('`Field` LIKE "%','%"'));
array_walk($split[2],'self::sur',array('`Desc` REGEXP "[[:<:]]','[[:>:]]"'));
array_walk($split[3],'self::sur',array('`Desc` NOT LIKE "%','%"'));
//echo '<pre>'.var_export($split,1).'</pre>';//debug
//Add AND or OR
$this ->where($split[3])
->where(array_merge($split[1],$split[2]), true);
}
}
protected function sur(&$v,$k,$sur) {
if (!empty($v))
$v=$sur[0].$this->real_escape_string($v).$sur[1];
}
function where($s,$OR=false) {
if (empty($s)) return $this;
if (is_array($s)) {
$s=(array_filter($s));
if (empty($s)) return $this;
if($OR==true)
$this->W[]='('.implode(' OR ',$s).')';
else
$this->W[]='('.implode(' AND ',$s).')';
} else
$this->W[]=$s;
return $this;
}
function showSQL() {
echo $this->W? 'WHERE '. implode(L.' AND ',$this->W).L:'';
}
}
Thanks for all stackoverflow answers to get here!
You're in luck because I asked a similar question regarding string literals recently. You can find it here: Regex for managing escaped characters for items like string literals
I ended up using the following for searching for them and it worked perfectly:
(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1
This regex differs from the others as it properly handles escaped quotation marks inside the string.
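For anyone wanting to try that expression from PHP, here is a small sketch. It uses nowdoc strings so that no backslash needs doubling; the / delimiters and the sample input are assumptions added for the example.

```php
<?php
// Nowdoc strings keep every backslash literal, so the regex is pasted unchanged
// between the / delimiters.
$pattern = <<<'REGEX'
/(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1/
REGEX;

// Hypothetical input: a double-quoted string containing escaped quotes,
// followed by a single-quoted one.
$subject = <<<'TEXT'
say "he said \"hi\"" and 'x'
TEXT;

preg_match_all($pattern, $subject, $matches);

// Group 1 holds the quote character, group 2 the text between the quotes.
print_r($matches[2]);
```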