How to match full words? - php

I use a simple preg_match_all to find the occurrence of a list of words in a text.
$pattern = '/(word1|word2|word3)/';
$num_found = preg_match_all( $pattern, $string, $matches );
But this also matches substrings inside longer words, such as abcword123. I need it to find word1, word2 and word3 only when they occur as full words. Note that this doesn't always mean they're separated by spaces on both sides; the neighbouring character could be a comma, semicolon, period, exclamation mark, question mark, or other punctuation.

If you only need to check whether the whole string is exactly "word1", "word2", "word3" etc., then in_array is the simpler choice. Regexes are very powerful, but they cost more CPU than a plain comparison, so avoid them whenever a simpler function does the job.
$words = array ("word1", "word2", "word3" );
$found = in_array ($string, $words);
See PHP: in_array - Manual for more information on in_array.
And if you want to use regex only, try
$pattern = '/^(word1|word2|word3)$/';
$num_found = preg_match_all( $pattern, $string, $matches );
And if you want to find the words inside a longer text such as "this statement has word1 in it", then use the word-boundary assertion "\b", like
$pattern = '/\b(word1|word2|word3)\b/';
$num_found = preg_match_all( $pattern, $string, $matches );
More on it here: PHP: Escape sequences - Manual (search for \b).

Try:
$pattern = '/\b(word1|word2|word3)\b/';
$num_found = preg_match_all( $pattern, $string, $matches );

You can use \b to match word boundaries. So you want to use /\b(word1|word2|word3)\b/ as your regex.
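To see how \b behaves next to punctuation, here is a minimal sketch. The helper name build_word_pattern, the sample $text and the use of preg_quote for a word list coming from an array are assumptions for illustration, not part of the question.

```php
<?php
// Hypothetical helper: builds a whole-word pattern from a list of words.
function build_word_pattern(array $words): string
{
    // preg_quote escapes regex metacharacters; '/' is the delimiter used below.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);

    return '/\b(?:' . implode('|', $escaped) . ')\b/';
}

$pattern = build_word_pattern(['word1', 'word2', 'word3']);
$text = 'abcword123, word1; word2! Is word3? xword2';

$num_found = preg_match_all($pattern, $text, $matches);

echo $num_found, "\n";   // 3
print_r($matches[0]);    // word1, word2, word3
```

abcword123 and xword2 are skipped because a letter or digit sits directly next to the word, so there is no boundary at that position.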

Related

How to split repeating pattern?

I have a string that repeats a pattern. I have a regular expression that matches the pattern, but I would like to split them instead.
$target = 'a1v33a33v55a2v43';
I would like to split them into a1v33, a33v55, and a2v43. Basically, I want to split the string into an array of ['a1v33', 'a33v55', 'a2v43'].
I've tried the following code, but it only matches the pattern. How can I split them instead?
$target = 'a1v33a33v55a2v43';
$pattern = '/(a[0-9]+v[0-9]+)*$/im';
preg_match($pattern, $target, $match);
echo '<pre>';
print_r($match);
You can use preg_split too:
$result = preg_split('~(?=a)~i', $target, -1, PREG_SPLIT_NO_EMPTY);
Use preg_match_all with '/a[0-9]+v[0-9]+/i':
$target = 'a1v33a33v55a2v43';
$pattern = '/a[0-9]+v[0-9]+/i';
preg_match_all($pattern, $target, $match);
print_r($match);
The /(a[0-9]+v[0-9]+)*$/im pattern matches zero or more consecutive substrings of the form a[0-9]+v[0-9]+, up to the end of the string or line ($), so the whole run comes back as a single match. Once we remove the quantified group and the end-of-line/string anchor, we can match the individual tokens.
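As a rough side-by-side of the two suggestions above (the variable names $tokens and $pieces are just for this sketch), both approaches give the same array for this input:

```php
<?php
$target = 'a1v33a33v55a2v43';

// Match each token on its own: no outer group, no end anchor.
preg_match_all('/a[0-9]+v[0-9]+/i', $target, $m);
$tokens = $m[0];

// Split in front of every "a" using a zero-width lookahead;
// PREG_SPLIT_NO_EMPTY drops the empty piece before the first "a".
$pieces = preg_split('/(?=a)/i', $target, -1, PREG_SPLIT_NO_EMPTY);

print_r($tokens);   // a1v33, a33v55, a2v43
print_r($pieces);   // a1v33, a33v55, a2v43
```

The preg_match_all version also validates the shape of each token, while the preg_split version only cuts before each "a", whatever follows it.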

Why is my regex rejecting apostrophes?

I'm making a regex which should match everything like this: [[First example]] or [[I'm an example]].
Unfortunately, it doesn't match [[I'm an example]] because of the apostrophe.
Here it is:
preg_replace_callback('/\[\[([^?"`*%#\\\\:<>]+)\]\]/iU', ...)
Plain apostrophes (') are allowed by the character class, so I really do not understand why it doesn't work.
Any ideas?
EDIT: Here is what happens before this regex is applied
// This matches something [[[like this]]]
$contents = preg_replace_callback('/\[\[\[(.+)\]\]\]/isU',function($matches) {
return '<blockquote>'.$matches[1].'</blockquote>';
}, $contents);
// This matches something [[like that]] but doesn't work with apostrophes/quotes once
// the first preg_replace_callback has done its job
$contents = preg_replace_callback('/\[\[([^?"`*%#\\\\:<>]+)\]\]/iU', ..., $contents);
Try this:
$string = '[[First example]]';
$pattern = '/\[\[(.*?)\]\]/';
preg_match ( $pattern, $string, $matchs );
var_dump ( $matchs );
You can use this regex:
\[\[.*?]]
PHP code:
$re = '/\[\[.*?]]/';
$str = "not match this but [[Match this example]] and not this";
preg_match_all($re, $str, $matches);
Btw, if you want to capture the content within brackets you have to use capturing groups:
\[\[(.*?)]]
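A small sketch of the capturing-group version inside preg_replace_callback. The sample $contents and the `<em>` wrapper are made up here, since the question's callback is elided. It also runs the question's own character class against the raw text; that check can help tell whether the input was changed before the regex ran.

```php
<?php
$contents = "See [[First example]] and [[I'm an example]].";

// The character class from the question, applied to the raw text.
$strict = '/\[\[([^?"`*%#\\\\:<>]+)\]\]/iU';
$count = preg_match_all($strict, $contents, $found);

// Lazy (.*?) variant: captures everything up to the nearest ]].
$result = preg_replace_callback(
    '/\[\[(.*?)\]\]/',
    function (array $m): string {
        // $m[0] is the whole match, $m[1] the text inside the brackets.
        return '<em>' . $m[1] . '</em>'; // hypothetical replacement markup
    },
    $contents
);

echo $count, "\n";    // 2
echo $result, "\n";   // See <em>First example</em> and <em>I'm an example</em>.
```

On this unmodified input both patterns accept the apostrophe.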

PHP preg_match everything except commas?

$regex = "/(.+),(.+);/";
$input = "somestring, 234, sometring5";
preg_match_all($regex, $input, $matches, PREG_SET_ORDER);
I've tried to make it like this:
$regex = "/(.^,+),(.^,+);/";
$input = "somestring, 234, sometring5";
preg_match_all($regex, $input, $matches, PREG_SET_ORDER);
But it doesn't work. I thought that ^, means "anything except commas", so why doesn't it?
I want to group the values by commas, but the commas are characters themselves and the pattern picks them up. How can I avoid this?
You could just split the string on , and trim the result:
$matches = array_map('trim', explode(',', $input));
Here's a modified version of your RegEx, with some explanations for each modifier.
And for those not willing to visit the link:
A RegEx to match all words in a sentence is /([a-zA-Z0-9]+)/
[a-zA-Z0-9] means match any letter or digit (a-z, A-Z and 0-9)
+ means match it one or more times, as many as possible
The g modifier used in other regex flavours (the /g at the end) means match as many as possible inside the string. PHP does not accept it, because preg_match_all already returns every match instead of stopping at the first one.
$regex = "/([^,]+)/";
$input = "somestring, 234, sometring5";
preg_match_all($regex, $input, $matches, PREG_SET_ORDER);
will get you every run of characters between the commas (including the leading spaces, which trim() can remove).
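For comparison, a short sketch of the negated-class approach next to the explode approach from the other answer (the names $fields and $split are only for this example):

```php
<?php
$input = "somestring, 234, sometring5";

// [^,]+ matches one or more characters that are not a comma.
preg_match_all('/[^,]+/', $input, $matches);
$fields = array_map('trim', $matches[0]);

// The same result without a regex.
$split = array_map('trim', explode(',', $input));

print_r($fields);   // somestring, 234, sometring5
```

Inside square brackets ^ negates the class; outside them, as in the question's (.^,+), it is a start-of-string anchor, which is why that attempt never matches.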

PHP preg_match matches

I'm trying to get all the matches from string:
$string = '[RAND_15]d4trg[RAND_23]';
with preg_match like this:
$match = array();
preg_match('#\[RAND_.*]#', $string, $match);
but after that $match array looks like this:
Array ( [0] => [RAND_15]d4trg[RAND_23] )
What should I do to get both occurrences as 2 separate elements in $match array? I would like to get result like this:
$match[0] = [RAND_15];
$match[1] = [RAND_23];
Use ...
$match = array();
preg_match_all('#\[RAND_.*?]#', $string, $match);
... instead. The ? after the * quantifier makes it 'lazy', so it matches the shortest possible substring. Without it the pattern will try to cover the maximum distance possible, and technically, [RAND_15]d4trg[RAND_23] does match the pattern.
Another way is to restrict the set of characters to match with a negated character class:
$match = array();
preg_match_all('#\[RAND_[^]]*]#', $string, $match);
This way we won't have to turn the quantifier into a lazy one, as the [^]] character class will stop matching at the first ] symbol.
Still, to catch all the matches you should use preg_match_all instead of preg_match.
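The three variants can be compared directly on the string from the question (the result variable names are just for this sketch):

```php
<?php
$string = '[RAND_15]d4trg[RAND_23]';

// Greedy: .* runs to the last ] in the string.
preg_match_all('#\[RAND_.*]#', $string, $greedy);

// Lazy: .*? stops at the first ] it can.
preg_match_all('#\[RAND_.*?]#', $string, $lazy);

// Negated class: [^]]* cannot step over a ] at all.
preg_match_all('#\[RAND_[^]]*]#', $string, $negated);

print_r($greedy[0]);    // [RAND_15]d4trg[RAND_23]
print_r($lazy[0]);      // [RAND_15], [RAND_23]
print_r($negated[0]);   // [RAND_15], [RAND_23]
```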

preg_match and a troublesome regex

Could someone tell me why
//string
$content = 'random ${one.var} ${two.var} random';
//match
preg_match('/(?:(?<=\$\{))([\w.]+){1}(?=\})/i', $content, $matches);
is returning
print_r($matches);
//output
array(
[0]=>one.var
[1]=>one.var
);
What I want is
array(
[0]=>one.var
[1]=>two.var
);
Both the whole regex (index 0) and the inner capture group (index 1) match the same thing, so that part of the output makes sense. You probably want preg_match_all, which captures all matches:
preg_match_all('/(?<=\$\{)[\w.]+(?=\})/i', $content, $matches);
You should use preg_match_all to perform a global regex search. Also, I think you can simplify the pattern this way:
preg_match_all('/\$\{(.*?)\}/', $content, $matches);
Just try preg_match_all():
http://php.net/preg_match_all: Searches subject for all matches
http://php.net/preg_match: Searches subject for a match
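To make the difference concrete, here is a sketch using a capture group instead of the lookarounds (the variable names $first, $all and $names are assumptions for this example):

```php
<?php
$content = 'random ${one.var} ${two.var} random';

// preg_match stops after the first match:
// index 0 is the whole match, index 1 the capture group.
preg_match('/\$\{([\w.]+)\}/', $content, $first);

// preg_match_all keeps going; with the default PREG_PATTERN_ORDER,
// $all[1] holds the captured name from every match.
preg_match_all('/\$\{([\w.]+)\}/', $content, $all);
$names = $all[1];

print_r($first);   // ${one.var}, one.var
print_r($names);   // one.var, two.var
```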
