I want to know if we can replace if(preg_match('/boo/', $anything) and preg_match('/poo/', $anything))
with a regex..
$anything = 'I contain both boo and poo!!';
for example..
From what I understand of your question, you're looking for a way to check if BOTH 'poo' and 'boo' exist within a string using only one regex. I can't think of a more elegant way than this;
preg_match('/(boo.*poo)|(poo.*boo)/', $anything);
This is the only way I can think of to ensure both patterns exist within a string regardless of order. Of course, if you knew they would always appear in the same order, that would make it simpler =]
EDIT
After reading through the post linked to by MisterJ in his answer, it would seem that a simpler regex could be:
preg_match('/(?=.*boo)(?=.*poo)/', $anything);
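The lookahead form extends to any number of terms. Below is a minimal sketch, assuming the terms are literal strings; the helper name containsAllWords is hypothetical and not part of the original answer. Each term is passed through preg_quote() so that characters such as "." are matched literally.

```php
<?php
// Hypothetical helper: true only if every word occurs somewhere in $subject,
// in any order. One lookahead is generated per word.
function containsAllWords(array $words, string $subject): bool
{
    $pattern = '/^';
    foreach ($words as $word) {
        // preg_quote() escapes regex metacharacters and the '/' delimiter.
        $pattern .= '(?=.*' . preg_quote($word, '/') . ')';
    }
    // The 's' modifier lets '.' match newlines, so multi-line input works.
    $pattern .= '/s';

    return preg_match($pattern, $subject) === 1;
}

var_dump(containsAllWords(['boo', 'poo'], 'I contain both boo and poo!!')); // bool(true)
var_dump(containsAllWords(['boo', 'poo'], 'I contain only boo'));           // bool(false)
```

Anchoring with ^ means the lookaheads are evaluated once from the start of the string rather than retried at every position.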
By using a pipe:
if(preg_match('/boo|poo/', $anything))
You can use the logical OR, as mentioned by @sroes:
if(preg_match('/(boo)|(poo)/', $anything))
The problem there is that you don't know which one matched.
This will match "I contain boo", "I contain poo" and "I contain boo and poo".
If you only want to match "I contain boo and poo", the problem is much harder; see "Regular Expressions: Is there an AND operator?". It seems that you will have to stick with the PHP test.
To take conditions literally
if(preg_match('/[bp]oo.*[bp]oo/', $anything))
You can achieve this by altering your regular expression, as others have pointed out in other answers. However, if you want to use an array instead, so you do not have to list a long regex pattern, then use something like this:
// Default matches to false
$matches = false;
// Set the pattern array
$pattern_array = array('boo','poo');
// Loop through the patterns to match
foreach($pattern_array as $pattern){
// Test if the string is matched
if(preg_match('/'.$pattern.'/', $anything)){
// Set matches to true
$matches = true;
}
}
// Proceed if matches is true
if($matches){
// Do your stuff here
}
Alternatively, if you are only trying to match strings then it would be much more efficient if you were to use strpos like so:
// Default matches to false
$matches = false;
// Set the strings to match
$strings_to_match = array('boo','poo');
foreach($strings_to_match as $string){
if(strpos($anything, $string) !== false){
// Set matches to true
$matches = true;
}
}
Where a plain substring check is enough, prefer string functions such as strpos over regular expressions, as they are generally faster!
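Note that the loops above set $matches as soon as any one of the strings is found. If every string must be present, as in the original question, the check can be inverted to fail on the first miss. A minimal sketch follows; the helper name containsAllStrings is hypothetical.

```php
<?php
// Hypothetical helper: true only if every needle is a substring of $haystack.
function containsAllStrings(array $needles, string $haystack): bool
{
    foreach ($needles as $needle) {
        if (strpos($haystack, $needle) === false) {
            return false; // one missing needle is enough to fail
        }
    }
    return true;
}

$anything = 'I contain both boo and poo!!';
var_dump(containsAllStrings(['boo', 'poo'], $anything)); // bool(true)
```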
I have made this regex:
(?<=span class="ope">)?[a-z0-9]+?\.(pl|com|net\.pl|tk|org|org\.pl|eu)|$(?=<\/span>)$
It does match the strings like: example.pl, example12.com, something.eu but it will also match the dontwantthis.com.
My question is: how can I avoid matching a string if it contains the dontwantthis string?
You're probably following your regex with a loop to cycle through matches. In this case, it's probably easiest to just check for the presence of the dontwantthis substring and continue if it's there. Trying to implement it in regex is just asking for trouble.
It seems that you are extracting content from span elements using a regular expression. Now, despite all the reasons why this is not such a good idea...
... just keep the expression you have. Then, if you have a match, filter out the matched entries that should be rejected.
$match = extractContentFromHtml($html); // use regex here, return false if no match
if ($match && validMatch($match)) {
// do something
}
where validMatch(string) should check if the value exists in some array, for example.
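The filtering step can be sketched roughly as follows. This is only an illustration: the helper name rejectUnwanted, the sample HTML and the simplified pattern are assumptions, not the asker's actual code.

```php
<?php
// Hypothetical helper: drop every match that contains a rejected substring.
function rejectUnwanted(array $matches, array $rejected): array
{
    $kept = array_filter($matches, function ($match) use ($rejected) {
        foreach ($rejected as $needle) {
            if (strpos($match, $needle) !== false) {
                return false; // contains a rejected substring
            }
        }
        return true;
    });
    return array_values($kept); // reindex from 0
}

$html = '<span class="ope">example.pl</span><span class="ope">dontwantthis.com</span>';
// Simplified stand-in for the pattern in the question (an assumption).
preg_match_all('/[a-z0-9]+\.(?:pl|com|eu)/', $html, $found);
$domains = rejectUnwanted($found[0], ['dontwantthis']);
// $domains is ['example.pl']
```

Keeping the rejection list outside the regex means new unwanted names can be added without touching the pattern.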
I have an array full of patterns that I need matched. Is there any way to do that, other than a for() loop? I'm trying to do it in the least CPU-intensive way, since I will be doing dozens of these every minute.
A real-world example: I'm building a link status checker, which will check links to various online video sites to ensure that the videos are still live. Each domain has several "dead keywords"; if these are found in the HTML of a page, that means the file was deleted. These are stored in the array. I need to match the contents of the array against the HTML output of the page.
First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:
$matches = false;
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
$matches = true;
}
}
You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.
I would recommend at least grouping your patterns using parenthesis like:
foreach ($patterns as $pattern)
{
$grouped_patterns[] = "(" . $pattern . ")";
}
$master_pattern = implode("|", $grouped_patterns);
But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
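For reference, a combined pattern could be built along these lines. This is a sketch that assumes the individual patterns are stored without delimiters; the helper name buildMasterPattern and the sample keywords are made up for illustration. It uses non-capturing groups, which keep each pattern self-contained without creating extra capture groups.

```php
<?php
// Hypothetical helper: combine delimiter-free patterns into one alternation.
function buildMasterPattern(array $patterns): string
{
    $grouped = [];
    foreach ($patterns as $pattern) {
        // (?:...) groups without capturing, so an inner '|' stays contained.
        $grouped[] = '(?:' . $pattern . ')';
    }
    // Add the delimiters once, around the whole alternation.
    return '/' . implode('|', $grouped) . '/';
}

$master = buildMasterPattern(['file (?:removed|deleted)', 'video not found']);
// $master is '/(?:file (?:removed|deleted))|(?:video not found)/'
$isDead = preg_match($master, 'Sorry, this file deleted by the owner') === 1;
```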
Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.
So doing this:
foreach ($strings_to_match as $string_to_match)
{
if (strpos($page, $string_to_match) !== false)
{
// etc.
break;
}
}
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
// etc.
break;
}
}
and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().
// assuming you have something like this
$patterns = array('a','b','\w');
// converts the array into a regex friendly or list
$patterns_flattened = implode('|', $patterns);
if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
}
// PS: that's off the top of my head, I didn't check it in a code editor
If your patterns don't contain much whitespace, another option would be to eschew the arrays and use the /x modifier. Now your list of regular expressions would look like this:
$regex = "/
pattern1| # search for occurrences of 'pattern1'
pa..ern2| # wildcard search for occurrences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern'; the space is kept because it sits in a character class
mypat # Note that the last pattern does NOT have a pipe char
/x";
With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.
This would avoid the looping through the array.
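A runnable variant of the same idea is sketched below. The "dead keyword" phrases are invented for illustration, and the i modifier is added on the assumption that case should not matter. Note that comments inside an /x pattern must not contain the delimiter character or the PHP string quote.

```php
<?php
// Illustrative dead-keyword patterns (assumptions, not from the original).
$regex = '/
    file[ ]not[ ]found   |  # literal spaces must sit in a character class
    video\ removed       |  # ...or be escaped with a backslash
    error\s+404             # last alternative: no trailing pipe
/xi';

$pageIsDead = preg_match($regex, 'Sorry: Video removed by the uploader') === 1;
var_dump($pageIsDead); // bool(true)
```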
If you're merely searching for the presence of a string in another string, use strpos as it is faster.
Otherwise, you could just iterate over the array of patterns, calling preg_match each time.
If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
What about doing a str_replace() on the HTML you get, using your array, and then checking whether the original HTML is equal to the result? This would be very fast:
$sites = array(
'you_tube' => array('dead', 'moved'),
...
);
foreach ($sites as $site => $deadArray) {
// get $html
if ($html == str_replace($deadArray, '', $html)) {
// video is live
}
}
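A small variation on this idea: str_replace() accepts an optional fourth argument, passed by reference, that receives the number of replacements performed, so the string comparison can be skipped. The helper name isLive below is hypothetical.

```php
<?php
// Hypothetical helper: a page counts as live if none of the dead keywords occur.
function isLive(string $html, array $deadKeywords): bool
{
    // $count receives the total number of replacements made.
    str_replace($deadKeywords, '', $html, $count);
    return $count === 0;
}

var_dump(isLive('<p>This video has been moved</p>', ['dead', 'moved'])); // bool(false)
var_dump(isLive('<p>Now playing</p>', ['dead', 'moved']));               // bool(true)
```

Like the original, this is a plain substring test, so a keyword such as 'dead' would also be found inside a longer word.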
You can combine all the patterns from the list into a single regular expression using PHP's implode() function, then test your string in one go with preg_match().
$patterns = array(
'abc',
'\d+h',
'[abc]{6,8}\-\s*[xyz]{6,8}',
);
$master_pattern = '/(' . implode(')|(', $patterns) . ')/';
if(preg_match($master_pattern, $string_to_check))
{
//do something
}
Of course there could be even less code using implode() inline in "if()" condition instead of $master_pattern variable.
I'm trying to use preg_match in an IF statement and return false if a string contains templated functions that are not allowed.
Here are some example templated functions allowed:
{function="nl2br($value.field_30)"}
{function="substr($value.field_30,0,250)"}
{function="addslashes($listing.photo.image_title)"}
{function="urlencode($listing.link)"}
{function="AdZone(1)"}
These are mixed in with html etc.
Now I'd like this preg_match statement to return true if regex matches the code format but didn't contain one of the allowed function keywords:
if (preg_match('(({function=)(.+?)(nl2br|substr|addslashes|urlencode|AdZone)(.+?)\})',$string)) {
// found a function not allowed
} else {
// string contains only allowed functions or doesn't contain functions at all
}
Does anyone know how to do this?
Not quite sure what you're trying to do here, but if I were to make a regexp that matched a list of words (or function names, as the case may be), I'd do something like
// add/remove allowed stuff here
$allowed = array( 'nl2br', 'substr', 'addslashes' );
// make the array into a branching pattern
$allowed_pattern = implode('|', $allowed);
// the entire regexp (a little stricter than yours)
$pattern = "/\{function=\"($allowed_pattern)\((.*?)\)\"\}/";
if( preg_match($pattern, $string, $matches) ) {
# string DOES contain an allowed function
# The $matches things is optional, but nice. $matches[1] will be the function name, and
# $matches[2] will be the arguments string. Of course, you could just do a
# preg_replace_callback() on everything instead using the same pattern...
} else {
# No allowed functions found
}
The $allowed array makes it easier to add/remove allowed function names, and the regexp is stricter about the curly brackets, quotes and general syntax, which is probably a good idea.
But first of all, flip the if..else branches, or use a !. preg_match is meant for, well, matching stuff in the string, not for matching stuff that isn't in there. So you can't really get it to return true for something that isn't there
Still, as Álvaro mentioned, regexps probably aren't the best way to go about this, and it is pretty risky to have functions exposed like that, no matter the rest of the code. If you just needed to match words it should work fine, but since it's function calls with arbitrary arguments... well. I can't really recommend it :)
Edit: First time around, I used preg_quote on the imploded string, but that of course just escapes the pipe characters, and then the pattern won't work. So skip preg_quote, but then just be sure that function names don't contain anything that might mess up the final pattern (e.g. run each function name through preg_quote before imploding the array)
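The quoting step from the edit could look roughly like this. It is a sketch only; the helper name buildAllowedPattern is hypothetical, and the pattern is the same one used in the answer above.

```php
<?php
// Hypothetical helper: build the allow-list pattern, quoting each name first.
function buildAllowedPattern(array $allowed): string
{
    // Quote each name individually so the '|' separators stay unescaped.
    $quoted = array_map(function ($name) {
        return preg_quote($name, '/');
    }, $allowed);

    return '/\{function="(' . implode('|', $quoted) . ')\((.*?)\)"\}/';
}

$pattern = buildAllowedPattern(['nl2br', 'substr', 'addslashes']);
preg_match($pattern, '{function="substr($value.field_30,0,250)"}', $matches);
// $matches[1] is 'substr', $matches[2] is '$value.field_30,0,250'
```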
I figured out how to check an OR case, preg_match( "/(word1|word2|word3)/i", $string );. What I can't figure out is how to match an AND case. I want to check that the string contains ALL the terms (case-insensitive).
It's possible to do an AND match in a single regex using lookahead, eg.:
preg_match('/^(?=.*word1)(?=.*word2)(?=.*word3)/i', $string)
however, it's probably clearer and maybe faster to just do it outside regex:
preg_match('/word1/i', $string) && preg_match('/word2/i', $string) && preg_match('/word3/i', $string)
or, if your target strings are as simple as word1:
stripos($string, 'word1')!==FALSE && stripos($string, 'word2')!==FALSE && stripos($string, 'word3')!==FALSE
I can think of a situation in your question where an AND match may cause problems, namely overlapping terms:
words = "abcd","cdef","efgh"
and all of them have to match in the string:
string = "abcdefgh"
In that case, maybe you should not be using a regular expression.
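A quick check of how two single-regex approaches behave on overlapping terms like these, using the example values above: independent lookaheads each scan the whole string, so overlaps are fine, while an ordered pattern consumes characters and cannot reuse them.

```php
<?php
$string = 'abcdefgh';

// Each lookahead starts from the beginning, so overlapping terms all match.
$lookahead = preg_match('/^(?=.*abcd)(?=.*cdef)(?=.*efgh)/', $string);

// Here "abcd" consumes the "cd" that "cdef" would need, so this fails.
$ordered = preg_match('/abcd.*cdef.*efgh/', $string);

var_dump($lookahead); // int(1)
var_dump($ordered);   // int(0)
```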
If you know the order that the terms will appear in, you could use something like the following:
preg_match("/(word1).*(word2).*(word3)/i", $string);
If the order of terms isn't defined, you will probably be best using 3 separate expressions and checking that they all matched. A single expression is possible but likely complicated.
preg_match("/word1.*word2.*word3/i", $string);
This works, but the terms must appear in the stated order. You could of course alternate over every ordering:
preg_match("/(word1.*word2.*word3|word1.*word3.*word2|word2.*word3.*word1|word2.*word1.*word3|word3.*word2.*word1|word3.*word1.*word2)/i", $string);
But that's pretty horrendous and you'd have to be crazy to do it! It would be nicer to just use strpos($haystack, $needle) in a loop, or multiple regex matches.