I have an array of patterns:
$patterns= array(
'#http://(www\.)?domain-1.com/.*#i',
'#http://(www\.)?domain-2.com/.*#i',
...
);
and I have a long string that contains text and multiple URLs. I want to match the first URL that occurs in the string. So far I have only tried this:
foreach ($patterns as $pattern) {
preg_match($pattern, $the_string, $match);
echo '<pre>'; print_r($match); echo '</pre>';
}
It returns empty arrays for the patterns that have no match, and arrays containing a URL for the others, but the order of the results depends on the order of the $patterns array.
How can I find whichever match of these patterns occurs first in the string?
You basically have three options:
1. Match a general URL pattern and then run that URL against the patterns you've got. If none match, continue with the second result from the general pattern.
2. Run all your patterns with the PREG_OFFSET_CAPTURE flag to get the offset each pattern matched at, find the lowest offset, and return that result.
3. Combine your various patterns into a single pattern. Be aware that there are limits to the length of a pattern (64K in compiled form).
Option 2:
<?php
$text = "hello world http://www.domain-2.com/foo comes before http://www.domain-1.com/bar";
$patterns= array(
'#http://(www\.)?domain-1.com/[^\s]*#i',
'#http://(www\.)?domain-2.com/[^\s]*#i',
);
$match = null;
$offset = null;
foreach ($patterns as $pattern) {
if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
if ($offset === null || $matches[0][1] < $offset) {
$offset = $matches[0][1];
$match = $matches[0][0];
}
}
}
var_dump($match);
Beware that I changed your demo patterns: I replaced .* (anything) with [^\s]* (anything but whitespace) to prevent the pattern from matching more than it is supposed to.
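Option 3 from the list above could be sketched roughly as follows. This is only an illustration: the $hosts array and the $combined variable are assumptions made for the example, and it only works this simply because the demo patterns differ in nothing but the host name. Because the regex engine scans the string from left to right, a single combined pattern returns the earliest match without any offset bookkeeping.

```php
<?php
// Sample text reused from the Option 2 example above.
$text = "hello world http://www.domain-2.com/foo comes before http://www.domain-1.com/bar";

// Hypothetical list of hosts, extracted from the demo patterns.
$hosts = array('domain-1.com', 'domain-2.com');

// Escape each host so that "." is matched literally.
$alternatives = array_map(function ($host) {
    return preg_quote($host, '#');
}, $hosts);

// One pattern with an alternation over all hosts.
$combined = '#http://(?:www\.)?(?:' . implode('|', $alternatives) . ')/\S*#i';

// preg_match stops at the leftmost match in $text.
$first = preg_match($combined, $text, $m) ? $m[0] : null;
var_dump($first); // string(27) "http://www.domain-2.com/foo"
```

With a very long list of hosts the compiled pattern could hit the length limit mentioned above, in which case Option 2 is the safer route.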
I guess you're looking for this:
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match)) {
echo '<pre>'; print_r($match); echo '</pre>';
break;
}
}
UPDATE:
Then I think you should work with offsets like this:
$matches = array();
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match, PREG_OFFSET_CAPTURE)) {
$matches[$match[0][1]] = $match[0][0];
}
}
ksort($matches); // sort by offset so the earliest match comes first
echo reset($matches);
I can't think of any way except evaluating all the patterns one at a time, and grabbing the earliest match:
$earliestPos = strlen($the_string) + 1;
$earliestMatch = false;
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match)) {
$myMatch = $match[0];
$myMatchPos = strpos($the_string, $myMatch); // haystack first, then needle
if ($myMatchPos < $earliestPos ) {
$earliestPos = $myMatchPos;
$earliestMatch = $myMatch ;
}
}
}
if ($earliestMatch ) {
echo $earliestMatch;
}
How to determine, using regexp or something else in PHP, that following urls match some patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
Wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone check whether this code is good enough? And also explain why there are so many backslashes in this part: $to_replace = ['/\\\\\*/']; (I found this replacement approach on the Internet.)
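On the backslash question: there are two layers of escaping, PHP's single-quoted string syntax and then the regex engine. The snippet below is a small sketch to make that visible; the variable name $literal and the test strings are made up for the illustration. In a single-quoted string each \\ collapses to one backslash, so the regex engine receives \\\* which means "a literal backslash followed by a literal asterisk". That is exactly what preg_quote() turns a * into.

```php
<?php
// The same literal as in the script above.
$literal = '/\\\\\*/';

// After PHP has processed the string, 6 characters remain: / \ \ \ * /
echo $literal, "\n";
echo strlen($literal), "\n";

// preg_quote() escapes "*" as "\*", which is what the pattern looks for.
$quoted = preg_quote('node/*', '/');
var_dump(preg_match($literal, $quoted)); // matches the escaped asterisk
var_dump(preg_match($literal, 'node/*')); // no backslash, so no match
```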
If you know the format beforehand you can use preg_match. In the first example, for instance, you know %node can only consist of digits. Matching multiple patterns is just as easy: store the regexes in an array:
$patterns = array(
'node/%node' => '|node/[0-9]+$|',
'node/%node/news' => '|node/[0-9]+/news|',
'album/%album/shadowbox/%photo' => '|album/[0-9]+/shadowbox/[0-9]+|',
'media/photo' => '|media/photo|',
'blogs' => '|blogs|',
'news' => '|news|',
'node/%node/players' => '|node/[0-9]+/players|',
);
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
preg_match($regex, $url, $results);
if (!empty($results)) {
echo "<pre>" . $pattern . "</pre>";
}
}
Notice how I added the end-of-string anchor $ to the end of the first rule; this ensures that it doesn't also match URLs meant for the second rule.
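The unanchored rules can still overlap, for example |news| also matches node/11111/news. A possible refinement, assuming the URL is always a bare path with no leading slash or query string, is to anchor every rule at both ends. The helper name matchingPatterns is hypothetical and only a subset of the rules is shown:

```php
<?php
// Same idea as above, but every regex is anchored with ^ and $.
$patterns = array(
    'node/%node'         => '|^node/[0-9]+$|',
    'node/%node/news'    => '|^node/[0-9]+/news$|',
    'node/%node/players' => '|^node/[0-9]+/players$|',
    'news'               => '|^news$|',
);

// Hypothetical helper: returns the names of all rules that match $url.
function matchingPatterns(array $patterns, $url)
{
    $hits = array();
    foreach ($patterns as $pattern => $regex) {
        if (preg_match($regex, $url) === 1) {
            $hits[] = $pattern;
        }
    }
    return $hits;
}

print_r(matchingPatterns($patterns, 'node/11111/news'));
```

With both anchors in place, each URL in this example matches at most one rule, so the order of the array no longer matters.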
Here is a generic version of the solution above:
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get strings at *
// In this case ['/node/','/hello/']
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can pretty much convert the code above into a function and pass parameters to check against the array case. The first step uses a regex to convert all variables into *, and the result is then exploded by *. Finally, loop over this array and keep comparing each piece against the URL, using simple string comparison, to see whether the pattern matches.
As long as the pattern is fixed, you can use preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.
How to make preg find all possible solutions for regular expression pattern?
Here's the code:
<?php
$text = 'Amazing analyzing.';
$regexp = '/(^|\\b)([\\S]*)(a)([\\S]*)(\\b|$)/ui';
$matches = array();
if (preg_match_all($regexp, $text, $matches, PREG_SET_ORDER)) {
foreach ($matches as $match) {
echo "{$match[2]}[{$match[3]}]{$match[4]}\n";
}
}
?>
Output:
Am[a]zing
an[a]lyzing.
Output that i need:
[A]mazing
Am[a]zing
[A]nalyzing.
an[a]lyzing.
You have to use lookbehind/lookahead zero-length assertions (instead of a normal pattern, which consumes the characters around what you are looking for): http://www.regular-expressions.info/lookaround.html
Lookaround assertions won't help, for two reasons:
Since they are zero-length, they won't return characters that you need.
As Avinash Raj noted, PHP lookbehind doesn't allow *.
This yields the output that you need:
<?php
$text = 'Amazing analyzing.';
foreach (preg_split('/\s+/', $text) as $word)
{
$matches = preg_split('/(a)/i', $word, 0, PREG_SPLIT_DELIM_CAPTURE);
for ($match = 1; $match < count($matches); $match += 2)
{
$prefix = join(array_slice($matches, 0, $match));
$suffix = join(array_slice($matches, $match+1));
echo "{$prefix}[{$matches[$match]}]{$suffix}\n";
}
}
?>
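A different way to get the same kind of output, sketched here as an alternative rather than a correction, is to ask preg_match_all for the offset of every occurrence and rebuild each line with substr. The function name bracketEach is made up for this example, and the sketch assumes a single-byte letter.

```php
<?php
// Hypothetical helper: one output line per occurrence of $letter in each word.
function bracketEach($text, $letter)
{
    $lines = array();
    $regex = '/' . preg_quote($letter, '/') . '/i';
    foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Each entry of $m[0] is array(matched text, byte offset).
        preg_match_all($regex, $word, $m, PREG_OFFSET_CAPTURE);
        foreach ($m[0] as $hit) {
            $char = $hit[0];
            $pos  = $hit[1];
            $lines[] = substr($word, 0, $pos) . '[' . $char . ']' . substr($word, $pos + 1);
        }
    }
    return $lines;
}

echo implode("\n", bracketEach('Amazing analyzing.', 'a')), "\n";
```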
This function filters emails from text and returns the matched patterns:
function parse($text, $words)
{
$resultSet = array();
foreach ($words as $word){
$pattern = 'regex to match emails';
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE );
$this->pushToResultSet($matches);
}
return $resultSet;
}
In a similar way I want to match bad words in the text and return them as $resultSet.
Here is the code to filter bad words:
$badwords = array('shit', 'fuck'); // Here we can use all bad words from database
$text = 'Man, I shot this f*ck, sh/t! fucking fu*ker sh!t f*cking sh\t ;)';
echo "filtered words <br>";
echo $text."<br/>";
$words = explode(' ', $text);
foreach ($words as $word)
{
$bad= false;
foreach ($badwords as $badword)
{
if (strlen($word) >= strlen($badword))
{
$wordOk = false;
for ($i = 0; $i < strlen($badword); $i++)
{
if ($badword[$i] !== $word[$i] && ctype_alpha($word[$i]))
{
$wordOk = true;
break;
}
}
if (!$wordOk)
{
$bad= true;
break;
}
}
}
echo $bad ? 'beep ' : ($word . ' '); // Here $bad words can be returned and replace with *.
}
This replaces bad words with beep.
But I want to push the matched bad words to $this->pushToResultSet() and return them, as in the first snippet for email filtering.
Can I do this with my bad-word filtering code?
Roughly converting David Atchley's answer to PHP, does this work as you want it to?
$blocked = array('fuck','shit','damn','hell','ass');
$text = 'Man, I shot this f*ck, damn sh/t! fucking fu*ker sh!t f*cking sh\t ;)';
$matched = preg_match_all("/(".implode('|', $blocked).")/i", $text, $matches);
$filter = preg_replace("/(".implode('|', $blocked).")/i", 'beep', $text);
var_dump($filter);
var_dump($matches);
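To get closer to the $resultSet shape from the question, the same alternation can be run with PREG_OFFSET_CAPTURE. The following is a minimal sketch, not the asker's actual class: the function name findBlockedWords and the 'word'/'offset' keys are assumptions, and it only catches exact spellings, not obfuscated ones like sh!t.

```php
<?php
// Hypothetical helper: returns every blocked word found in $text with its offset.
function findBlockedWords($text, array $blocked)
{
    // Escape the words so regex metacharacters in them are taken literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $blocked);
    $pattern = '/(?:' . implode('|', $quoted) . ')/i';

    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $resultSet = array();
    foreach ($matches[0] as $hit) {
        $resultSet[] = array('word' => $hit[0], 'offset' => $hit[1]);
    }
    return $resultSet;
}

$hits = findBlockedWords('Well, damn. What the hell?', array('damn', 'hell'));
print_r($hits);
```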
JSFiddle for working example.
Yes, you can match bad words (saving for later), replace them in the text and build the regex dynamically based on an array of bad words you're trying to filter (you might store it in DB, load from JSON, etc.). Here's the main portion of the working example:
var blocked = ['fuck','shit','damn','hell','ass'],
matchBlocked = new RegExp("("+blocked.join('|')+")", 'gi'),
text = $('.unfiltered').text(),
matched = text.match(matchBlocked),
filtered = text.replace(matchBlocked, 'beep');
Please see the JSFiddle link above for the full working example.
I have this:
$pattern = 'dev/25{LASTNUMBER}/P/{YYYY}';
$var = 'dev/251/P/2014';
In this situation {LASTNUMBER} = 1. How can I get this value from $var?
There can be more variables in the pattern; they are always wrapped in {}.
The pattern can be different, for example:
$pattern = '{LASTNUMBER}/aa/bb/P/{OtherVar}';
In this situation $var will be 1/aa/bb/p/some and I want to get 1.
I need to get {LASTNUMBER}, given the pattern and the resulting value.
OK, maybe it is not possible :) or very, very hard.
Use a regex:
if (preg_match('~dev/25([0-9])/P/[0-9]{4}~', $var, $m)) {
$lastnum = $m[1];
}
$parts = explode("/", $var); // split the actual value, not the pattern
if (isset($parts[1])) {
return substr($parts[1], -1);
}
will be faster than regex :)
You probably need this:
<?php
$pattern = 'dev/251/P/2014';
preg_match_all('%dev/25(.*?)/P/[\d]{4}%sim', $pattern, $match, PREG_PATTERN_ORDER);
$match = $match[1][0];
echo $match; // echo's 1
?>
If you need to loop through the results you can use:
<?php
$pattern = <<< EOF
dev/251/P/2014
dev/252/P/2014
dev/253/P/2014
dev/254/P/2014
dev/255/P/2014
EOF;
preg_match_all('%dev/25(.*?)/P/[\d]{4}%sim', $pattern , $match, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($match[1]); $i++) {
echo $match[1][$i]; //echo's 12345
}
?>
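Since the question asks for arbitrary patterns with {NAME} placeholders, one possible generic approach is to turn the pattern itself into a regex with named groups. This is a sketch under assumptions: the function name extractVars is invented here, placeholders are assumed to be valid regex group names (letters, digits, underscore, not starting with a digit), and each value is assumed not to contain a slash.

```php
<?php
// Hypothetical helper: returns array(placeholder name => value), or null if no match.
function extractVars($pattern, $value)
{
    // Odd indexes hold placeholder names, even indexes hold literal text.
    $parts = preg_split('/\{(\w+)\}/', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    foreach ($parts as $i => $part) {
        $regex .= ($i % 2 === 1)
            ? '(?P<' . $part . '>[^/]+)'   // placeholder becomes a named group
            : preg_quote($part, '~');      // literal text is escaped
    }

    if (!preg_match('~^' . $regex . '$~', $value, $m)) {
        return null;
    }
    // Drop the numeric keys and keep only the named captures.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(extractVars('dev/25{LASTNUMBER}/P/{YYYY}', 'dev/251/P/2014'));
print_r(extractVars('{LASTNUMBER}/aa/bb/P/{OtherVar}', '1/aa/bb/P/some'));
```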
I am trying to test whether a string made up of multiple words ends with any of the values from an array. The following is what I have so far. I am stuck on how to check that the string is longer than the array value being tested and that the value is present at the end of the string.
$words = trim(preg_replace('/\s+/',' ', $string));
$words = explode(' ', $words);
$words = count($words);
if ($words > 2) {
// Check if $string ends with any of the following
$test_array = array();
$test_array[0] = 'Wizard';
$test_array[1] = 'Wizard?';
$test_array[2] = '/Wizard';
$test_array[4] = '/Wizard?';
// Stuck here
if ($string is longer than $test_array and $test_array is found at the end of the string) {
Do stuff;
}
}
By end of string do you mean the very last word? You could use preg_match
preg_match('~/?Wizard\??$~', $string, $matches);
echo "<pre>".print_r($matches, true)."</pre>";
I think you want something like this:
if (preg_match('/\/?Wizard\??$/', $string)) { // ...
If it has to be an arbitrary array (and not the one containing the 'wizard' strings you provided in your question), you could construct the regex dynamically:
$words = array('wizard', 'test');
foreach ($words as &$word) {
$word = preg_quote($word, '/');
}
$regex = '/(' . implode('|', $words) . ')$/';
if (preg_match($regex, $string)) { // ends with 'wizard' or 'test'
Is this what you want (no guarantee for correctness, couldn't test)?
foreach( $test_array as $testString ) {
$searchLength = strlen( $testString );
$sourceLength = strlen( $string );
if( $sourceLength > $searchLength && substr( $string, $sourceLength - $searchLength ) === $testString ) {
// ...
}
}
I wonder if some regular expression wouldn't make more sense here.
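For completeness, the non-regex check can also be wrapped in a small function. This is a sketch using substr_compare with a negative offset instead of substr; the function name endsWithAny is hypothetical, and the "strictly longer" condition follows the requirement stated in the question.

```php
<?php
// Hypothetical helper: true if $string is longer than, and ends with, one of $suffixes.
function endsWithAny($string, array $suffixes)
{
    foreach ($suffixes as $suffix) {
        $len = strlen($suffix);
        // A negative offset makes substr_compare start $len bytes before the end.
        if ($len > 0 && strlen($string) > $len
            && substr_compare($string, $suffix, -$len) === 0) {
            return true;
        }
    }
    return false;
}

$test_array = array('Wizard', 'Wizard?', '/Wizard', '/Wizard?');
var_dump(endsWithAny('Run the Setup /Wizard?', $test_array));
var_dump(endsWithAny('Wizard', $test_array));
```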