How to search phrase with soundex using 1 word? - php

I need to find the phrase 'small side table' when the user searches for a single word such as 'table', or a mistyped variant like 'tablr' or 'tables'.
I tried MATCH AGAINST and LIKE '%%', but when I simulate a typo they find nothing. SOUNDEX only matches if I pass the whole phrase, e.g. soundex('small side tablr'), but I only have the single word 'tables' to search with.
$search="tables";
$data = $connection->query("SELECT title from `posts` where soundex(`title`) = soundex('$search') ");

I would suggest using MySQL's regular expression functions to search the text.
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-instr
In your specific case (finding the phrase 'small side table'), suppose the user enters one of the keywords 'table', 'tablr', 'tables' or 'rable'. In the PHP code we then need to turn the search keyword into a suitable pattern, for example:
$search = "table"; // or 'tablr' or 'tables' or 'rable'
// define a pattern that allows misspellings in the last letters
$regex_pattern = substr($search, 0, 4).".{1,2}"; // now the pattern is 'tabl.{1,2}'
// then add an alternative pattern that allows a misspelled first letter
$regex_pattern .= "|.".substr($search, 1); // now the pattern is 'tabl.{1,2}|.able'
// further patterns can be added as needed
$data = $connection->query("SELECT title FROM `posts` WHERE REGEXP_LIKE(`title`, '$regex_pattern') ");
You will have to adjust the search pattern to suit your needs (and learn a little about regular expressions if you haven't used them before: https://regexone.com/).
You can try out MySQL's regular expression functions in an online SQL playground here:
https://onecompiler.com/mysql/3yukq8apv
Note that regular expressions can be slow, especially with complex search patterns, so you have to consider how many misspelling patterns you want to support.
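The pattern-building steps above could be wrapped in a small helper. This is only a sketch: the function name buildTypoPattern is made up for illustration, and preg_quote() escapes for PCRE, which I am assuming is close enough to the ICU syntax used by MySQL 8 for plain words. The loop runs the pattern through preg_match() in PHP purely to show what it would accept. In the real query it would be safer to pass the pattern as a bound parameter than to interpolate it into the SQL string.

```php
<?php
// Hypothetical helper: builds the same two alternatives as the answer above.
function buildTypoPattern(string $word): string
{
    // Tolerate a wrong or extra letter after the first four characters.
    $lastLetters = preg_quote(substr($word, 0, 4), '/') . '.{1,2}';
    // Tolerate a wrong first letter.
    $firstLetter = '.' . preg_quote(substr($word, 1), '/');

    return $lastLetters . '|' . $firstLetter;
}

foreach (['table', 'tablr', 'tables', 'rable'] as $search) {
    $pattern = buildTypoPattern($search);
    $found = preg_match('/' . $pattern . '/i', 'small side table') === 1;
    echo $search, ' => ', $pattern, ' => ', $found ? 'match' : 'no match', "\n";
}
```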

Related

Using different names for subpatterns of the same number with preg_replace_callback

I'm having a hard time getting my head around what exactly is being numbered in my regex subpatterns. I'm being given the PHP warning:
PHP Warning: preg_replace_callback(): Compilation failed: different names for subpatterns of the same number are not allowed
When attempting the following:
$input = "A string that contains [link-ssec-34] and a [i]word[/i] here";
$matchLink = "\[link-ssec-(0?[1-9]|[1-9][0-9]|100)\]";
$matchItalic = "\[i](.+)\[\/i]";
$output = preg_replace_callback(
"/(?|(?<link>$matchLink)|(?<italic>$matchItalic))/",
function($m) {
if(isset($m['link'])){
$matchedLink = substr($m['link'][0], 1, -1);
//error_log('m is: ' . $matchedLink);
$linkIDExplode = explode("-",$matchedLink);
$linkHTML = createSubSectionLink($linkIDExplode[2]);
return $linkHTML;
} else if(isset($m['italic'])){
// TO DO
}
},
$input);
If I remove the named capture groups, like so:
"/(?|(?:$matchLink)|(?:$matchItalic))/"
There's no warnings, and I get matches fine but can't target them conditionally in my function. I believe I'm following correct procedure for naming capture groups, but PHP is saying they're using the same subpattern number, which is where I'm lost as I'm not sure what's being numbered. I'm familiar with addressing subpatterns using $1, $2, etc. but don't see the relevancy here when used with named groups.
Goal
In case I'm using completely the wrong technique, I should include my goal. I was originally using preg_replace_callback() to replace tagged strings that matched a pattern like so:
$output = preg_replace_callback(
"/\[link-ssec-(0?[1-9]|[1-9][0-9]|100)\]/",
function($m) {
$matchedLink = substr($m[0], 1, -1);
$linkIDExplode = explode("-",$matchedLink);
$linkHTML = createSubSectionLink($linkIDExplode[2]);
return $linkHTML;
},
$input);
The requirement has grown to needing to match multiple tags in the same paragraph (my original example included the next one, [i]word[/i]). Rather than parsing the entire string from scratch for each pattern, I'm trying to look for all the patterns in a single sweep of the paragraph/string, in the belief that it will be less taxing on the system. Researching it led me to believe that using named capture groups in a branch reset was the best means of targeting matches with conditional statements. Perhaps I'm walking down the wrong trail with this one, but I would appreciate being directed to a better method.
Result Desired
$input = "A string that contains [link-ssec-34] and a [i]word[/i] here";
$output = "A string that contains <a href='linkfromdb.php'>Link from Database</a> and a <span class='italic'>word</span> here."
With the potential to add further patterns as needed in the format of square brackets encompassing a word or being self-contained.
To answer your question about the warning:
PHP Warning: preg_replace_callback(): Compilation failed: different names for subpatterns of the same number are not allowed
Your pattern defines named capture groups inside a branch reset group, (?|...). In a branch reset, every alternative (separated by |) restarts its group numbering from the same number.
That means the named group link gets number 1 in the first alternative, and italic also gets number 1 in the second.
Since both groups share the same number, they are only allowed to have the same name:
#(?|(?<first>one)|(?<first>two))#
would be allowed.
#(?|(?<first>one)|(?<second>two))#
throws this warning.
Without fully understanding what I've done (but I will look into it now), I did some trial and error based on @bobblebubble's comment and got the following to produce the desired result. I can now use conditional statements targeting named capture groups to decide what action to take with matches.
I changed the regex to the following:
$matchLink = "\[link-ssec-(0?[1-9]|[1-9][0-9]|100)\]"; // matches [link-ssec-N]
$matchItalic = "\[i](.+)\[\/i]"; // matches [i]word[/i]
$output = preg_replace_callback(
"/(?<link>$matchLink)|(?<italic>$matchItalic)/",
function($m) { etc...
Hopefully it's also an efficient way, in terms of overhead, of matching multiple regex patterns with callbacks in the same string.
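For reference, a fuller sketch of the approach without the branch reset might look like the following. The helper demoSubSectionLink() and the names replaceTags, id and text are made up for illustration (the real createSubSectionLink() is not shown in the question), and the italic part uses a lazy .+? so that two [i] tags in one string stay separate. One thing to watch, as far as I know: preg_replace_callback() passes a group that did not take part in the match as an empty string when a later group matched, so isset() alone may not tell the branches apart. The sketch compares against '' instead.

```php
<?php
// Hypothetical stand-in for the asker's createSubSectionLink().
function demoSubSectionLink(string $id): string
{
    return "<a href='section.php?id=" . $id . "'>Section " . $id . "</a>";
}

function replaceTags(string $input): string
{
    // Plain alternation: every named group gets its own number.
    $pattern = '/(?<link>\[link-ssec-(?<id>0?[1-9]|[1-9][0-9]|100)\])'
             . '|(?<italic>\[i\](?<text>.+?)\[\/i\])/';

    return preg_replace_callback($pattern, function (array $m): string {
        // A non-empty 'italic' group means the second alternative matched.
        if (($m['italic'] ?? '') !== '') {
            return "<span class='italic'>" . $m['text'] . "</span>";
        }
        // Otherwise the link alternative matched.
        return demoSubSectionLink($m['id']);
    }, $input);
}

echo replaceTags("A string that contains [link-ssec-34] and a [i]word[/i] here"), "\n";
```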

Regex solution to find a regex pattern and parse it.

I am trying to write a simple router for PHP. And I am facing some problem. Example of the routes are as follows.
$route = [];
$route['index'] = "/";
$route['home'] = "/home";
$route['blog'] = "/blog/[a-z]";
$route['article'] = "/article/id/[\d+]/title/[\w+]";
Now if we take the last example, I would like the regex only to look for patterns such as [\d+] and [\w+] that is it. I will use explode() to actually cross check if URL contains /blog/, /id/ and /title/. I don't want regex's help with that, but only to detect the patterns and match it.
For example, suppose a given $URL was dev.test/blog/id/11/title/politics.
I would need something like: preg_match($route['url'], $URL)
So preg_match() would know that after /article/id/ there is a pattern asking for a digit only; if the digit is found it continues parsing, otherwise it fails and returns 0.
I don't know enough about regex to handle this complex problem.
Your question is a little unclear, but if you only want to capture the digit and word parts of the target string, use parentheses to capture sub-matches and the (?:xxx) non-capturing group, which checks for the pattern but does not add it to the matches array. Note that [\d+] is a character class that matches a single digit or a literal +; to match one or more digits, write \d+ without the square brackets. Something like:
$route['article'] = "/(?:\/article\/id\/)(\d+)(?:\/title\/)(\w+)/";
This adds only the matched digits and word to your matches array. With preg_match($route['article'], $URL, $matches) you'll find them in:
$matches[1] and $matches[2].
See http://www.regular-expressions.info/tutorial.html for an outstanding tutorial on regexes, by the way.
If you aren't sure of the values of 'article', 'id', and 'title' in advance, then you will probably at least need to be sure of the number of directories given in the URL. That means that as long as you know the positions of the digit and word entries, you could use
$route['article'] = "/(?:\/\w+\/\w+\/)(\d+)(?:\/\w+\/)(\w+)/";
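A small runnable sketch of the capturing idea, assuming the path part of the URL has already been separated from the host. The function name matchArticleRoute and the returned array keys are made up for illustration, and ~ is used as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper: returns the captured parts, or null if the path does not fit.
function matchArticleRoute(string $path): ?array
{
    // \d+ = one or more digits, \w+ = one or more word characters.
    if (preg_match('~^/article/id/(\d+)/title/(\w+)$~', $path, $m) === 1) {
        return ['id' => $m[1], 'title' => $m[2]];
    }
    return null;
}

var_dump(matchArticleRoute('/article/id/11/title/politics'));
var_dump(matchArticleRoute('/article/id/abc/title/politics'));
```

The second call returns null because abc is not a run of digits.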

PHP preg_replace Expertise Sought

I'm creating some custom BBcode for a forum. I'm trying to get the regular expression right, but it has been eluding me for two days. Any expert advice is welcome.
The input (e.g. sample forum post):
[quote=Bob]I like Candace. She is nice.[/quote]
I agree, she is very nice. I like Ashley, too, and especially [Ryan] when he's drinking.
Essentially, I want to encase any names (from a specified list) in [user][/user] BBcode... except, of course, those being quoted, because doing that causes some terrible parsing errors. Below is an example of how I want the output to be.
The desired output:
[quote=Bob]I like [user]Candace[/user]. She is nice.[/quote]
I agree, she is very nice. I like [user]Ashley[/user], too, and especially [[user]Ryan[/user]] when he's drinking.
My current code:
$searchArray = array(
'/(?i)(Ashley|Bob|Candace|Ryan|Tim)/'
);
$replaceArray = array(
"[user]\\0[/user]"
);
$text = preg_replace($searchArray, $replaceArray, $input);
$input is of course set to the post contents (i.e. the first example listed above). How can I achieve the results I want? I don't want the regex to match when a name is preceded by an equals sign (=), but putting a [^=] in front of the names in the regex will make it match any non-equals sign character (i.e. spaces), which then messes up the formatting.
Update
The problem is that by using \1 instead of \0 it is omitting the first character before the names (because anything but = is matched). The output results in this:
[quote=Bob]I like[user]Candace[/user]. She is nice.[/quote]
I agree, she is very nice. I like[user]Ashley[/user], too, and especially [user]Ryan[/user]] when he's drinking.
You were on the right track with the [^=] idea. You can put it in its own capture group in front of the names and, instead of \\0 (the full match), use \\1 and \\2, i.e. the first and second capture groups:
$searchArray = array(
'/(?i)([^=])(Ashley|Bob|Candace|Ryan|Tim)/'
);
$replaceArray = array(
"\\1[user]\\2[/user]"
);
$text = preg_replace($searchArray, $replaceArray, $input);
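An alternative sketch uses a negative lookbehind, (?<!=), which checks that the name is not preceded by an equals sign without consuming a character, so the replacement does not need to put anything back and a name at the very start of the post still matches. The function name tagUsers and the added \b word boundaries are my own additions, not part of the original code.

```php
<?php
// Hypothetical helper: wraps listed names in [user] tags unless preceded by '='.
function tagUsers(string $input): string
{
    // (?<!=) is a zero-width check; \b keeps the names from matching inside longer words.
    $pattern = '/(?<!=)\b(Ashley|Bob|Candace|Ryan|Tim)\b/i';
    return preg_replace($pattern, '[user]$1[/user]', $input);
}

$input = "[quote=Bob]I like Candace. She is nice.[/quote]\n"
       . "I agree, she is very nice. I like Ashley, too, and especially [Ryan] when he's drinking.";

echo tagUsers($input), "\n";
```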

Matching multiples

I need to match the following using preg_match()
cats and dogs
catsAndDogs
i like cats and dogs
and so on, so I simply stripped out the spaces, lowercased the result, and used that as my pattern in preg_match():
'/catsanddogs/i'
But now I need to match the following too:
cats+and+dogs
cats_and_dogs
cats+and_Dogs
So is there a quick and easy way to do several string replacements at once, other than nesting the calls? I want to end up with the same pattern to match against.
Thanks.
Try this expression: '/cats([+_-])?and([+_-])?dogs/i'
Edit: I just saw that you don't want a + after the "and" when you already have a + before the "and". If that's right, then you should use this expression:
'/cats(\+and\+|\+and_|_and_|and)dogs/i'
I would go with @ITroubs's answer in this situation. However, you can do multiple character/string replacements with strtr as follows:
$trans = array(' ' => '','+' => '','-' => '', '_' => '');
$str = 'cats+and_dogs';
echo strtr($str, $trans); // prints: catsanddogs
Read the documentation carefully before use.
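A related sketch, assuming the goal is still to reduce every variant to the single form catsanddogs: strip all separators in one preg_replace() call, lowercase the result, and then do a plain substring check. The function names are made up for illustration.

```php
<?php
// Hypothetical helper: removes whitespace, '+', '_' and '-' and lowercases the rest.
function normalizeForMatch(string $text): string
{
    return strtolower(preg_replace('/[\s+_-]+/', '', $text));
}

// Hypothetical helper: true if the normalized text contains 'catsanddogs'.
function mentionsCatsAndDogs(string $text): bool
{
    return strpos(normalizeForMatch($text), 'catsanddogs') !== false;
}

foreach (['cats and dogs', 'catsAndDogs', 'cats+and_Dogs', 'cats or dogs'] as $sample) {
    echo $sample, ' => ', mentionsCatsAndDogs($sample) ? 'match' : 'no match', "\n";
}
```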
OK, I don't think you have defined your matching rules well, but here is my simplified version:
(c|C)ats([+_ ][aA]|A)nd([+_ ][dD]|D)ogs
You probably want to be case-insensitive, since you used /i in your pattern, but I would like to chip in with another approach.
The main difference compared to the other answers is how the word boundaries are handled. I use [+_ ][aA]|A, so that the regex will match 'cats and' or 'catsAnd', but not 'catsand'. The bottom line is that when there is no separator between the words, I only match camel-case text.

Google Style Regular Expression Search

It's been several years since I have used regular expressions, and I was hoping I could get some help on something I'm working on. You know how Google's search is quite powerful: it takes text inside quotes as a literal phrase and treats terms with a minus sign in front of them as excluded.
Example: "this is literal" -donotfindme site:examplesite.com
This example would search the website examplesite.com for the phrase "this is literal" on pages that don't include the word donotfindme.
Obviously I'm not looking for something as complex as Google I just wanted to reference where my project is heading.
Anyway, I first wanted to start with the basics which is the literal phrases inside quotes. With the help of another question on this site I was able to do the following:
(this is php)
$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches);
But this outputs "this" (with the quotes) instead of the desired this, and it doesn't work at all for multiple phrases in quotes. Could someone lead me in the right direction?
I don't necessarily need code; even a really nice place with tutorials would probably do the job.
Thanks!
Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:
$pattern = '/"(.*)"/';
and then $matches will be an array of length 2 that contains the text between the quotes in element 1. (It'll still contain the full text matched in element 0) In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
will select all quoted strings which have two lowercase words separated by a single space, followed by anything. Then $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
For finding multiple phrases, you'll need to pick out one at a time with preg_match(). There's an optional "offset" parameter you can pass, which indicates where in the string it should start searching, and to find multiple matches you should give the position right after the previous match as the offset. See the documentation for details.
You could also try searching Google for "regular expression tutorial" or something like that, there are plenty of good ones out there.
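As an alternative to looping with an offset, preg_match_all() collects every match in one call. The sketch below also swaps .* for [^"]* so that each match stops at the next quote instead of running on to the last quote in the string. The function name quotedPhrases is made up for illustration.

```php
<?php
// Hypothetical helper: returns the text of every double-quoted phrase.
function quotedPhrases(string $search): array
{
    // [^"]* cannot cross a quote, so each phrase is matched separately.
    preg_match_all('/"([^"]*)"/', $search, $matches);
    return $matches[1]; // group 1 of every match, without the quotes
}

print_r(quotedPhrases('hello "this" is "regular expressions"'));
```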
Sorry, but my php is a bit rusty, but this code will probably do what you request:
$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches[1]);
$matches[1] will contain the first captured subexpression; $matches[0] contains the full matched text.
See preg_match in the PHP documentation for specifics about subexpressions.
I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.
Here is a complete answer covering all the kinds of search terms (plain words, minus-prefixed words, quoted phrases), with the SQL replacements (for visitors arriving from Google, at least).
It probably should not be done with regular expressions alone, though. One huge, super-complex regular expression would be hard for you or other developers to maintain and extend, and this approach might even turn out to be faster.
It may still need a lot of improvement, but here is a complete solution in a class. There is a bit more in here than was asked in the question, but it illustrates the reasons behind some of the choices.
class mySearchToSql extends mysqli {
    protected $W = array(); // collected WHERE fragments

    protected function filter($what) {
        if (isset($what)) {
            //echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug
            // Split into plain words [1], quoted phrases [2] and -excluded words [3]
            preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i', $what, $split);
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            // Surround with SQL
            array_walk($split[1], array($this, 'sur'), array('`Field` LIKE "%', '%"'));
            // Note: MySQL 8+ (ICU regex) does not support [[:<:]] and [[:>:]]; use \\b there
            array_walk($split[2], array($this, 'sur'), array('`Desc` REGEXP "[[:<:]]', '[[:>:]]"'));
            array_walk($split[3], array($this, 'sur'), array('`Desc` NOT LIKE "%', '%"'));
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            // Add AND or OR
            $this->where($split[3])
                 ->where(array_merge($split[1], $split[2]), true);
        }
    }

    // Wraps a non-empty value in the given SQL prefix/suffix, escaping it first
    protected function sur(&$v, $k, $sur) {
        if (!empty($v))
            $v = $sur[0].$this->real_escape_string($v).$sur[1];
    }

    function where($s, $OR = false) {
        if (empty($s)) return $this;
        if (is_array($s)) {
            $s = array_filter($s); // drop empty entries
            if (empty($s)) return $this;
            if ($OR == true)
                $this->W[] = '('.implode(' OR ', $s).')';
            else
                $this->W[] = '('.implode(' AND ', $s).')';
        } else
            $this->W[] = $s;
        return $this;
    }

    function showSQL() {
        // The original used an undefined constant L here; PHP_EOL is assumed to be the intended line break
        echo $this->W ? 'WHERE '.implode(PHP_EOL.' AND ', $this->W).PHP_EOL : '';
    }
}
Thanks for all stackoverflow answers to get here!
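To see what the splitting regex in filter() produces without needing a database connection, here is a standalone sketch that reuses the same pattern. The function name parseSearch and the array keys are made up for illustration. Note that because the first alternative excludes hyphens, a hyphenated word would be split at the hyphen.

```php
<?php
// Hypothetical helper: splits a search string into plain terms, quoted phrases and exclusions.
function parseSearch(string $query): array
{
    preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i', $query, $split);

    // Each group array has one slot per match; drop the empty slots.
    $nonEmpty = function (array $items): array {
        return array_values(array_filter($items, function ($s) {
            return $s !== '';
        }));
    };

    return [
        'terms'    => $nonEmpty($split[1]),
        'phrases'  => $nonEmpty($split[2]),
        'excluded' => $nonEmpty($split[3]),
    ];
}

print_r(parseSearch('"this is literal" -donotfindme site:examplesite.com'));
```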
You're in luck because I asked a similar question regarding string literals recently. You can find it here: Regex for managing escaped characters for items like string literals
I ended up using the following for searching for them and it worked perfectly:
(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1
This regex differs from the others as it properly handles escaped quotation marks inside the string.
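A quick way to try that pattern from PHP is sketched below. A nowdoc is used so the backslashes reach the regex engine exactly as written, ~ is an arbitrary delimiter, and the \" from the original is written as a plain " (the two are equivalent inside a regex). The constant and function names are made up for illustration.

```php
<?php
// The pattern from above, kept in a nowdoc so no PHP-level escaping is involved.
const QUOTED_PATTERN = <<<'RE'
~(?<!\\)(?:\\\\)*("|')((?:\\.|(?!\1)[^\\])*)\1~
RE;

// Hypothetical helper: returns the contents of every quoted string literal.
function extractQuoted(string $text): array
{
    preg_match_all(QUOTED_PATTERN, $text, $matches);
    return $matches[2]; // group 2 is the text between the quotes
}

// Sample text: a "x \"y\" z" b 'c'
$sample = 'a "x \"y\" z" b \'c\'';
print_r(extractQuoted($sample));
```

The escaped quotes inside the first literal are kept as part of its contents rather than ending the match.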
