I have a problem with a regular expression. I have a text that is read from a file. The text can contain one or more IDs separated by commas.
I also have a list of IDs and want to check whether any of these IDs occurs in my text, so I tried to use an OR operator:
$idString = '2561,3,261,6,540,33,3105,2085,38,42,1066,49,3377,53,3161,91,356,3179,3695,3184,370,123,3451,124,3710,2188,141,404,1435,160,1443,432,435,440,1721,3261,2498,205,3282,476,482,3301,486,749,3309,243,3059,759,2046,4,262,785,534,541,3360,34,3106,2086,39,43,50,3378,54,1337,61,1351,3157,3162,360,3696,3185,631,3450,3200,666,1436,673,1444,3748,3262,2499,206,3279,3283,470,477,483,3302,490,755,760,2047,2562,1029,263,23,542,35,3107,2087,40,552,553,1321,47,51,3379,55,1338,3163,361,3697,3186,633,3452,639,143,3223,1445,3749,1450,3263,2500,207,3284,478,484,3303,2559,264,1297,22,543,36,44,57,1339,3389,62,3164,3677,362,3180,634,144,1685,1446,430,700,208,3286,479,1249,485,3306,2558,255,265,524,30,288,46,2095,63,2375,3165,403,1447,3242,696,1724,3557,3304,1770,3066,2563,266,544,2338,555,3131,3166,2204,415,1448,1239,3288,480,3305,754,267,545,3370,2378,3152,3170,648,147,679,1449,2537,753,2546,505,2564,3335,268,535,537,539,546,549,65,69,3167,148,3244,744,3068,2565,269,286,547,292,1334,1340,3659,3168,383,153,1705,3267,3060,2566,270,271,3099,548,1660,398,154,1706,2511,746,3332,2568,272,3148,422,3269,752,768,273,3381,3153,3199,155,468,784,274,3093,325,1657,3319,510,3329,3333,275,1432,2230,441,1722,773,3338,276,3641,2108,491,3339,277,2398,107,3181,2245,757,3346,2100,619,1760,2050,3351,2103,667,19,3372,2534,1064,351,1726,2394,2508,2538,2104,3147,2083,2097,2042,2096,2165,2049,2525,2526,1774,2392,2080,2043,2542,2547,2129,2540,2536,2190,2226,2569,2572,2373,2507';
$idString = str_replace(',', '|', $idString);
$text = '1453,2018';
if (preg_match('/' . $idString . '/', $text)) {
echo 'yes' . PHP_EOL;
} else {
echo 'no' . PHP_EOL;
}
I expect nothing to match, because the IDs 1453 and 2018 are not in my lookup string, but it matches. I think that's because the ID 3 matches inside 1453, which is not correct for my use case.
This is easy to work around using arrays. You shouldn't use regular expressions when plain arrays will do, but it seems this is not your real problem, just an MCVE for a different one.
You should use word boundaries (\b); otherwise a number like 4 is found inside 1453. The third argument of preg_match() receives the matches, which helps to analyze what is going on.
preg_match('/\b(?:' . $idString . ')\b/', $text, $match)
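A minimal sketch of the difference, assuming a shortened ID list (the short list and the names $loose, $looseMatch and $strict are made up for illustration):

```php
<?php
// Shortened, hypothetical ID list; the real one is built the same way.
$idString = str_replace(',', '|', '2561,3,261,4');
$text = '1453,2018';

// Without boundaries the alternative "4" matches inside "1453".
$loose = preg_match('/' . $idString . '/', $text, $looseMatch);
echo $loose, ' ', $looseMatch[0] ?? '', PHP_EOL;

// With \b an ID only matches as a whole number, so nothing is found.
$strict = preg_match('/\b(?:' . $idString . ')\b/', $text, $strictMatch);
echo $strict, PHP_EOL;
```

Printing the match array is usually the quickest way to see which alternative caused an unexpected hit.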
preg_match() takes the pattern first and the subject second: preg_match($pattern, $subject). Change it as follows; this worked for me.
<?php
$idString = '2561,3,261,6,540,33,3105,2085,38,42,1066,49,3377,53,3161,91,356,3179,3695,3184,370,123,3451,124,3710,2188,141,404,1435,160,1443,432,435,440,1721,3261,2498,205,3282,476,482,3301,486,749,3309,243,3059,759,2046,4,262,785,534,541,3360,34,3106,2086,39,43,50,3378,54,1337,61,1351,3157,3162,360,3696,3185,631,3450,3200,666,1436,673,1444,3748,3262,2499,206,3279,3283,470,477,483,3302,490,755,760,2047,2562,1029,263,23,542,35,3107,2087,40,552,553,1321,47,51,3379,55,1338,3163,361,3697,3186,633,3452,639,143,3223,1445,3749,1450,3263,2500,207,3284,478,484,3303,2559,264,1297,22,543,36,44,57,1339,3389,62,3164,3677,362,3180,634,144,1685,1446,430,700,208,3286,479,1249,485,3306,2558,255,265,524,30,288,46,2095,63,2375,3165,403,1447,3242,696,1724,3557,3304,1770,3066,2563,266,544,2338,555,3131,3166,2204,415,1448,1239,3288,480,3305,754,267,545,3370,2378,3152,3170,648,147,679,1449,2537,753,2546,505,2564,3335,268,535,537,539,546,549,65,69,3167,148,3244,744,3068,2565,269,286,547,292,1334,1340,3659,3168,383,153,1705,3267,3060,2566,270,271,3099,548,1660,398,154,1706,2511,746,3332,2568,272,3148,422,3269,752,768,273,3381,3153,3199,155,468,784,274,3093,325,1657,3319,510,3329,3333,275,1432,2230,441,1722,773,3338,276,3641,2108,491,3339,277,2398,107,3181,2245,757,3346,2100,619,1760,2050,3351,2103,667,19,3372,2534,1064,351,1726,2394,2508,2538,2104,3147,2083,2097,2042,2096,2165,2049,2525,2526,1774,2392,2080,2043,2542,2547,2129,2540,2536,2190,2226,2569,2572,2373,2507';
$idString = str_replace(',', '|', $idString);
$text = '1453,2018';
// Pattern first, subject second; \b stops an ID such as 3 or 4 from matching inside 1453.
if (preg_match('/\b(?:' . $idString . ')\b/', $text)) {
echo 'yes' . PHP_EOL;
} else {
echo 'no' . PHP_EOL;
}
?>
You can see what your regex matches by outputting the matches, e.g.:
if (preg_match('/' . $idString . '/', $text, $matches)) {
echo 'yes' . PHP_EOL;
print_r($matches);
} else {
echo 'no' . PHP_EOL;
}
You'd have to adapt your regex to match against whole words only... for example like this:
if (preg_match('/\b(' . $idString . ')\b/', $text)) {
https://regex101.com/r/M1Pieb/2/
Or you could avoid regex altogether (recommended, it's getting a bit crazy) by using explode:
$idString = '2561,3,261,6,540,33,3105,2085,38,42,1066,49,3377,53,3161,91,356,3179,3695,3184,370,123,3451,124,3710,2188,141,404,1435,160,1443,432,435,440,1721,3261,2498,205,3282,476,482,3301,486,749,3309,243,3059,759,2046,4,262,785,534,541,3360,34,3106,2086,39,43,50,3378,54,1337,61,1351,3157,3162,360,3696,3185,631,3450,3200,666,1436,673,1444,3748,3262,2499,206,3279,3283,470,477,483,3302,490,755,760,2047,2562,1029,263,23,542,35,3107,2087,40,552,553,1321,47,51,3379,55,1338,3163,361,3697,3186,633,3452,639,143,3223,1445,3749,1450,3263,2500,207,3284,478,484,3303,2559,264,1297,22,543,36,44,57,1339,3389,62,3164,3677,362,3180,634,144,1685,1446,430,700,208,3286,479,1249,485,3306,2558,255,265,524,30,288,46,2095,63,2375,3165,403,1447,3242,696,1724,3557,3304,1770,3066,2563,266,544,2338,555,3131,3166,2204,415,1448,1239,3288,480,3305,754,267,545,3370,2378,3152,3170,648,147,679,1449,2537,753,2546,505,2564,3335,268,535,537,539,546,549,65,69,3167,148,3244,744,3068,2565,269,286,547,292,1334,1340,3659,3168,383,153,1705,3267,3060,2566,270,271,3099,548,1660,398,154,1706,2511,746,3332,2568,272,3148,422,3269,752,768,273,3381,3153,3199,155,468,784,274,3093,325,1657,3319,510,3329,3333,275,1432,2230,441,1722,773,3338,276,3641,2108,491,3339,277,2398,107,3181,2245,757,3346,2100,619,1760,2050,3351,2103,667,19,3372,2534,1064,351,1726,2394,2508,2538,2104,3147,2083,2097,2042,2096,2165,2049,2525,2526,1774,2392,2080,2043,2542,2547,2129,2540,2536,2190,2226,2569,2572,2373,2507';
$idStrings = explode(',', $idString);
$values = ['1453', '2018'];
$matchedValue = null;
foreach ($values as $value) {
if (in_array($value, $idStrings)) {
$matchedValue = $value;
break;
}
}
if ($matchedValue !== null) {
echo 'yes: ' . $matchedValue;
} else {
echo 'no';
}
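If both sides are plain comma-separated lists, the loop can also be collapsed into a single set intersection. A rough sketch, assuming shortened hypothetical inputs (1453 is added to the ID list only so that there is a hit to show):

```php
<?php
// Hypothetical shortened inputs; both are plain comma-separated ID lists.
$idString = '2561,3,261,4,1453';
$text = '1453,2018';

// IDs that appear in both lists, compared as whole strings.
$common = array_intersect(explode(',', $text), explode(',', $idString));

echo $common ? 'yes: ' . implode(',', $common) : 'no', PHP_EOL;
```

Unlike the loop, this collects every matching ID rather than stopping at the first one.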
Related
I am trying to check whether a word occurs in a string but is neither the first nor the last word. If so, I want to remove the spaces before and after the word and replace them with underscores.
Input:
$str = 'This is a cool area';
Output:
$str = 'This is a_cool_area';
I want to check that the word 'cool' is inside the string but is not the first or last word. If it is, remove the surrounding spaces and replace them with '_'.
You can use preg_replace to do this job, using this regex:
/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i
which looks for the word, surrounded by at least one word character on either side (to prevent matching at the beginning or ending of the sentence). Usage in PHP:
$str = 'This is a cool area';
$word = 'cool';
$str = preg_replace('/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i', '_$1_', $str);
echo $str . "\n";
$str = ' Cool areas are cool ';
$str = preg_replace('/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i', '_$1_', $str);
echo $str . "\n";
Output:
This is a_cool_area
Cool areas are cool
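function checkWord($str, $word)
{
    $arr = explode(' ', $str);
    // Search only the inner words so the first and last word never match.
    $key = array_search($word, array_slice($arr, 1, -1));
    if ($key === false) {
        return $str;
    }
    // $key is relative to the slice, so in $arr it is the index of the word before $word.
    $joined = implode('_', array_slice($arr, $key, 3));
    // Put the joined piece back so the rest of the sentence is kept.
    array_splice($arr, $key, 3, [$joined]);
    return implode(' ', $arr);
}
echo checkWord('This is a cool area', 'cool'); // This is a_cool_area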
Using the following code:
$text = "أطلقت غوغل النسخة المخصصة للأجهزة الذكية العاملة بنظام أندرويد من الإصدار “25″ لمتصفحها الشهير كروم.ولم تحدث غوغل تطبيق كروم للأجهزة العاملة بأندرويد منذ شهر تشرين الثاني العام الماضي، وهو المتصفح الذي يستخدمه نسبة 2.02% من أصحاب الأجهزة الذكية حسب دراسة سابقة. ";
$tags = "غوغل, غوغل النسخة, كروم";
$tags = explode(",", $tags);
foreach($tags as $k=>$v) {
$text = preg_replace("/\b{$v}\b/u","$0",$text, 1);
}
echo $text;
Will give the following result:
I love PHP">love PHP</a>, but I am facing a problem
Note that my text is in Arabic.
The way to do it is to handle everything in one pass. The idea is to build a pattern with an alternation of tags. For this to work, you must first sort the tags so that longer tags come before their prefixes, because the regex engine stops at the first alternative that succeeds (otherwise 'love' would always match, even when followed by 'php', and 'love php' would never be matched).
To limit the replacement to the first occurrence of each tag, remove the tag from the array once it has been found, and test inside the replacement callback whether it is still present in the array:
$text = 'I love PHP, I love love but I am facing a problem';
$tagsCSV = 'love, love php, facing';
$tags = explode(', ', $tagsCSV);
rsort($tags);
$tags = array_map('preg_quote', $tags);
$pattern = '/\b(?:' . implode('|', $tags) . ')\b/iu';
$text = preg_replace_callback($pattern, function ($m) use (&$tags) {
$mLC = mb_strtolower($m[0], 'UTF-8');
if (false === $key = array_search($mLC, $tags))
return $m[0];
unset($tags[$key]);
return '<a href="index.php?s=news&tag=' . rawurlencode($mLC)
. '">' . $m[0] . '</a>';
}, $text);
Note: when you build a URL you must encode special characters. This is why I use preg_replace_callback instead of preg_replace: it lets me call rawurlencode.
If you have to deal with a UTF-8 encoded string, you need to add the u modifier to the pattern and replace strtolower with mb_strtolower.
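To see why the order of the alternatives matters, here is a small sketch (the sample sentence and the variable names $short and $long are only for illustration):

```php
<?php
$text = 'I love PHP';

// Shorter alternative first: "love" succeeds and "love php" is never tried.
preg_match('/\b(?:love|love php)\b/iu', $text, $short);

// Longer alternative first: the full tag is matched.
preg_match('/\b(?:love php|love)\b/iu', $text, $long);

echo $short[0], PHP_EOL; // love
echo $long[0], PHP_EOL;  // love PHP
```

rsort() happens to produce the second ordering for these tags, because a longer string sorts after its own prefix.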
the preg_split way
$tags = explode(', ', $tagsCSV);
rsort($tags);
$tags = array_map('preg_quote', $tags);
$pattern = '/\b(' . implode('|', $tags) . ')\b/iu';
$items = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$itemsLength = count($items);
$i = 1;
while ($i<$itemsLength && count($tags)) {
if (false !== $key = array_search(mb_strtolower($items[$i], 'UTF-8'), $tags)) {
$items[$i] = '<a href="index.php?s=news&tag=' . rawurlencode($tags[$key])
. '">' . $items[$i] . '</a>';
unset($tags[$key]);
}
$i+=2;
}
$result = implode('', $items);
Instead of calling preg_replace multiple times, call it a single time with a regexp that matches any of the tags:
// Split the comma-separated string and trim the space after each comma.
$tags = array_map('trim', explode(',', $tags));
$tags_re = '/\b(' . implode('|', $tags) . ')\b/u';
$text = preg_replace($tags_re, '$0', $text, 1);
This turns the list of tags into the regexp /\b(love|love php|facing)\b/u. x|y in a regexp means to match either x or y.
I am creating a findSpellings function that has two parameters $word and $allWords. $allwords is an array that has mis-spellings of words that could sound similar to the $word variable. What I am trying to accomplish is to print out all words that are similar to the $word based on the soundex function. I am having trouble printing out the array with words. My function that I have is below. Any help would be greatly appreciated:
<?php
$word = 'stupid';
$allwords = array(
'stupid',
'stu and pid',
'hello',
'foobar',
'stpid',
'supid',
'stuuupid',
'sstuuupiiid',
);
function findSpellings($word, $allWords){
while(list($id, $str) = each($allwords)){
$soundex_code = soundex($str);
if (soundex($word) == $soundex_code){
//print '"' . $word . '" sounds like ' . $str;
return $word;
return $allwords;
}
else {
return false;
}
}
}
print_r(findSpellings($word, $allWords));
?>
if (soundex($word) == $soundex_code){
//print '"' . $word . '" sounds like ' . $str;
return $word;
return $allwords;
}
You can't have two returns in a row; the first return exits the function, so the second one is never reached.
You could just do something like this:
if (soundex($word) == $soundex_code){
//print '"' . $word . '" sounds like ' . $str;
$array = array('word' => $word, 'allWords' => $allWords);
return $array;
}
And then just retrieve the values out of $array like so:
$filledArray = findSpellings($word, $allWords);
echo "You typed " . $filledArray['word'] . "<br/>";
echo "Were you looking for one of the following words?<br/>";
foreach($filledArray['allWords'] as $value)
{
echo $value;
}
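If the goal is simply to collect every candidate that sounds like $word, one possible shape is to filter the list instead of returning from inside the loop. A sketch under that assumption (the type hints and the shortened word list are mine, not from the question):

```php
<?php
// Returns every candidate whose soundex code equals that of $word.
function findSpellings(string $word, array $allWords): array
{
    $target = soundex($word);
    $matches = array_filter($allWords, function ($candidate) use ($target) {
        return soundex($candidate) === $target;
    });
    return array_values($matches); // reindex from 0
}

// Shortened, hypothetical word list.
$allWords = ['stupid', 'hello', 'foobar', 'stpid', 'stuuupid'];
print_r(findSpellings('stupid', $allWords));
```

This also avoids each(), which no longer exists in PHP 8, and the $allwords / $allWords mismatch (PHP variable names are case-sensitive).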
I'm having trouble finding a correct regex to achieve what I want.
I have a sentence like this:
Hi, my name is Stan, you are welcome, hello.
and I would like to transform it like this:
[hi|hello|welcome], my name is [stan|jack] you are [hi|hello|welcome] [hi|hello|welcome].
Right now my regex is only half working: some words are not replaced, and the replacements delete some neighboring characters.
Here is my test code:
<?php
$test = 'Hi, my name is Stan, you are welcome, hello.';
$words = array(
array('hi', 'hello', 'welcome'),
array('stan', 'jack'),
);
$result = $test;
foreach ($words as $group) {
if (count($group) > 0) {
$replacement = '[' . implode('|', $group) . ']';
foreach ($group as $word) {
$result = preg_replace('#([^\[])' . $word . '([^\]])#i', $replacement, $result);
}
}
}
echo $test . '<br />' . $result;
Any help will be appreciated
The regex you are using is overcomplicated. You simply need a regex substitution with an alternation inside ordinary parentheses ():
<?php
$test = 'Hi, my name is Stan, you are welcome, hello.';
$words = array(
array('hi', 'hello', 'welcome'),
array('stan', 'jack'),
);
$result = $test;
foreach ($words as $group) {
if (count($group) > 0) {
$imploded = implode('|', $group);
$replacement = "[$imploded]";
$search = "($imploded)";
$result = preg_replace("/$search/i", $replacement, $result);
}
}
echo $test . '<br />' . $result;
Your regular expression:
'#([^\[])' . $word . '([^\]])#i'
matches one character before and one character after $word as well. Because those characters are part of the match, they are replaced too. So your replacement string needs to reference these parts as well:
'$1' . $replacement . '$2'
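A short sketch of the effect, assuming a single word and a made-up input string (the names $lost and $kept are only for illustration):

```php
<?php
$pattern = '#([^\[])hello([^\]])#i';
$replacement = '[hi|hello]';
$input = 'Oh, hello there';

// The neighbouring characters are part of the match, so they are lost.
$lost = preg_replace($pattern, $replacement, $input);
echo $lost, PHP_EOL; // Oh,[hi|hello]there

// Putting the captured neighbours back keeps the surrounding text intact.
$kept = preg_replace($pattern, '$1' . $replacement . '$2', $input);
echo $kept, PHP_EOL; // Oh, [hi|hello] there
```

Note that this pattern still requires one character on each side, so a word at the very start or end of the string (such as the leading "Hi") is not matched.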
preg_replace() accepts arrays for the pattern and replacement parameters, so there is no need to iterate with a loop.
$s = array("/(hi|hello|welcome)/i", "/(stan|jack)/i");
$r = array("[hi|hello|welcome]", "[stan|jack]");
preg_replace($s, $r, $str);
or dynamically
$test = 'Hi, my name is Stan, you are welcome, hello.';
$s = array("hi|hello|welcome", "stan|jack");
// create_function() was removed in PHP 8; closures do the same job.
$r = array_map(function ($a) { return "[$a]"; }, $s);
$s = array_map(function ($a) { return "/($a)/i"; }, $s);
echo preg_replace($s, $r, $test);
//[hi|hello|welcome], my name is [stan|jack], you are [hi|hello|welcome], [hi|hello|welcome].
Example user input that should be denied:
House for sale
Car for rent
WTB iphone with cheap price
How do I make my code deny inputs like those above?
$title = array('rent','buy','sale','sell','wanted','wtb','wts');
$user_title = stripslashes($_POST['title']);
if (in_array($user_title, $title)) {
$error = '<p class="error">Do not include ' . $user_title . ' on your title</p>';
}
If a denied word should only count when it appears as a complete word, not as part of another word, you can use a regex-based solution with word boundaries:
// array of denied words.
$deniedWords = array('rent','buy','sale','sell','wanted','wtb','wts');
// run preg_quote on each array element..as it may have a regex meta-char in it.
$deniedWords = array_map('preg_quote',$deniedWords);
// construct the pattern as /(\bbuy\b|\bsell\b...)/i
$pat = '/(\b'.implode('\b|\b',$deniedWords).'\b)/i';
// use preg_match_all to find all matches
if(preg_match_all($pat,$user_title,$matches)) {
// $matches[1] has all the found word(s), join them with comma and print.
$error = 'Do not include ' . implode(',',$matches[1]);
}
You can use stripos():
$title = array('rent','buy','sale','sell','wanted','wtb','wts');
$user_title = stripslashes($_POST['title']);
foreach($title as $word)
{
if (stripos($user_title, $word) !== false)
{
$error = '<p class="error">Do not include ' . $word . ' on your title</p>';
break;
}
}
You can also use regex:
if (preg_match("/(rent|buy|sale|sell|wanted|wtb|wts)/is", $user_title)) {
$error = '<p class="error">Do not include ' . $user_title . ' on your title</p>';
}
You can utilize explode() in order to separate the words in $user_title and check each one to ensure it does not exist in $title.
$invalidWords = '';
$words = explode(' ', stripslashes($_POST['title']));
foreach($words as $word) {
if (in_array(strtolower($word), $title)) { // lower-case first so 'WTB' matches 'wtb'
$invalidWords .= ' ' . $word;
}
}
if (!empty($invalidWords)) {
echo '<p class="error">Do not include the following words in your title: ' . $invalidWords . '</p>';
}
RegEx is probably best, but off-hand I cannot easily figure out the expression required in order for you to be able to output all of the invalid words in a list to the user.
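For listing every denied word at once, one option is to split the title into words and intersect them with the deny list. A rough sketch, assuming a made-up title in place of $_POST['title'] (the names $userTitle and $invalidWords are illustrative):

```php
<?php
$title = ['rent', 'buy', 'sale', 'sell', 'wanted', 'wtb', 'wts'];
$userTitle = 'WTB iphone with cheap price, or a car for rent'; // hypothetical input

// Lower-case the title, then split on anything that is not a letter or digit.
$words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($userTitle), -1, PREG_SPLIT_NO_EMPTY);

// Keep only the words that are on the deny list, without duplicates.
$invalidWords = array_unique(array_intersect($words, $title));

if ($invalidWords) {
    echo 'Do not include the following words in your title: ' . implode(', ', $invalidWords), PHP_EOL;
}
```

Because whole words are compared, a title containing "parent" would not trigger the "rent" rule.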