Searching keywords (from a matrix) in a string (around 500 chars) - PHP

Hey, basically what I am trying to do is automatically assign tags to a user input string. I have 5 tags to be assigned, and each tag will have around 10 keywords. A string can only be assigned one tag. To assign a tag to a string, I need to search for words matching the keywords of all five tags.
Example:
TAGS: Keywords
Drink: Beer, whiskey, drinks, drink, pint, peg.....
Fitness: gym, yoga, massage, exercise......
Apparels: men's shirt, shirt, dress......
Music: classical, western, sing, salsa.....
Food: meal, grilled, baked, delicious.......
User String: Take first step to reach your fitness goals, Pay Rs 199 for Aerobics, Yoga, Kick Boxing, Bollywood Dance and more worth Rs 1000 at The very Premium F Chisel Bounce, Koramangala.
Now I need to decide on a tag for the above string, and I need a time-efficient algorithm for this problem. I don't know how to go about matching keywords in strings, but I do have an idea for deciding the tag. I was thinking of maintaining a count array with one entry per tag, and increasing the respective tag's count each time one of its keywords is matched. If at any point the count for a tag reaches 5, we can stop and pick that tag; this saves us from searching the whole string.
Please give any advice you have on this. I will be using PHP, just so you know.
Thanks

Interesting topic! What you are looking for is something similar to latent semantic indexing. There is a question about it here.

If the number of tags and keywords is small, I would save myself from writing a complex algorithm and simply do:
$tags = array(
'drink' => array('beer', 'whiskey', ...),
...
);
$string = strtolower('Take first step ...'); // substr_count() is case-sensitive, so compare in lowercase
$bestTag = '';
$bestTagCount = 0;
foreach ($tags as $tag => $keywords) {
$count = 0;
foreach ($keywords as $keyword) {
$count += substr_count($string, strtolower($keyword));
}
if ($count > $bestTagCount) {
$bestTagCount = $count;
$bestTag = $tag;
}
}
var_dump($bestTag);
The algorithm is pretty obvious, but only suited for a small number of tags/keywords.
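Below is a minimal sketch that combines this loop with the early-stop idea from the question (stop as soon as one tag collects 5 hits). It also switches to case-insensitive whole-word matching, so "gym" cannot match inside "gymnasium". The function name pickTag and the $threshold parameter are illustrative. The keyword lists contain only the keywords the question actually spells out, not the full lists.

```php
<?php
// Sketch: pick one tag per string by counting whole-word keyword hits
// (case-insensitive), stopping early once a tag reaches $threshold hits,
// as the question suggests. Names are illustrative, not from any library.
function pickTag(string $text, array $tags, int $threshold = 5): string
{
    $bestTag = '';
    $bestCount = 0;
    foreach ($tags as $tag => $keywords) {
        $count = 0;
        foreach ($keywords as $keyword) {
            // \b...\b: "gym" must not match inside "gymnasium"
            $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
            $count += preg_match_all($pattern, $text);
            if ($count >= $threshold) {
                return $tag; // early exit: enough evidence for this tag
            }
        }
        if ($count > $bestCount) {
            $bestCount = $count;
            $bestTag = $tag;
        }
    }
    return $bestTag; // '' when no keyword matched at all
}

$tags = [
    'drink'    => ['beer', 'whiskey', 'drinks', 'drink', 'pint', 'peg'],
    'fitness'  => ['gym', 'yoga', 'massage', 'exercise'],
    'apparels' => ["men's shirt", 'shirt', 'dress'],
    'music'    => ['classical', 'western', 'sing', 'salsa'],
    'food'     => ['meal', 'grilled', 'baked', 'delicious'],
];
$userString = 'Take first step to reach your fitness goals, Pay Rs 199 for Aerobics, '
    . 'Yoga, Kick Boxing, Bollywood Dance and more worth Rs 1000 at The very Premium '
    . 'F Chisel Bounce, Koramangala.';
echo pickTag($userString, $tags); // fitness
```

Note that the early exit favours whichever tag reaches the threshold first, even if a later tag would have scored higher. That is the trade-off the question accepts in exchange for not scanning everything.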

If you don't mind using an external API, you should try one of these:
http://www.zemanta.com/
http://www.opencalais.com/
Benjamin Nowack: Linked Data Entity Extraction with Zemanta and OpenCalais
To give an example, Zemanta will return the following tags (among other things) for your User String:
Bollywood, Kickboxing, Koramangala, Aerobics, Boxing, Sports, India, Asia
Open Calais will return
Sports, Hospitality Recreation, Health, Recreation, Human behavior, Kick, Yoga, Chisel, Aerobics, Meditation, Indian philosophy, Combat sports, Aerobic exercise, Exercise

Related

Extract a full substring from a partial substring (needle)

As you can see below, I'm attempting to extract the complete substring of an exploded array by using just a few characters to match the substring.
$keyword = array('Four Wheel', 'Power', 'Trailer');
function customSearch($keyword, $featurelistarray){
$key = ''; //possibly reset output
foreach($featurelistarray as $key => $arrayItem){
if( stristr( $arrayItem, $keyword ) ){
$termname = $key;
}
}
}
The array ($featurelistarray) comprises vehicle options: four wheel drive, four wheel disc brakes, power windows, power door locks, floor mats, trailer tow package, and many more.
The point is to list all the options for a given category, using the $keyword array to define the category.
I would also like to alphabetize the results. Thank you for the help!
To explain further, $featurelistarray is exploded from a CSV field, which contains a long list of options.
$featurelist=$csvdata['Options'];
$featurelistarray=explode(',',$featurelist);
$termname = $featurelistarray[0];
As you can see, $termname is assigned the first position of the exploded array. This was the original code for these features, but I need more control for $termname.
It seems to me you are trying to perform database operations without a database. I'd suggest transforming the input into some kind of database.
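If you stay with the exploded CSV array for now, this sketch does what the question asks: it collects every option containing a category keyword and sorts each group alphabetically. The function name groupOptions is hypothetical. The sample option string reuses the options listed in the question.

```php
<?php
// Hypothetical helper: for each category keyword, collect the options that
// contain it (case-insensitive) and sort them alphabetically.
function groupOptions(array $keywords, array $options): array
{
    $groups = [];
    foreach ($keywords as $keyword) {
        $groups[$keyword] = [];
        foreach ($options as $option) {
            $option = trim($option); // explode(',') leaves leading spaces
            if ($option !== '' && stripos($option, $keyword) !== false) {
                $groups[$keyword][] = $option;
            }
        }
        sort($groups[$keyword], SORT_STRING | SORT_FLAG_CASE);
    }
    return $groups;
}

$featurelist = 'four wheel drive, four wheel disc brakes, power windows, '
    . 'power door locks, floor mats, trailer tow package';
$featurelistarray = explode(',', $featurelist);
$groups = groupOptions(['Four Wheel', 'Power', 'Trailer'], $featurelistarray);
print_r($groups);
```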

PHP: check user input against text file?

I have the following street names and house numbers in a text file:
Albert Dr: 4116-4230, 4510, 4513-4516
Bergundy Pl: 1300, 1340-1450
David Ln: 3400, 4918, 4928, 4825
Garfield Av: 5000, 5002, 5004, 5006, 8619-8627, 9104-9113
....
This data represents the boundary data for a local neighborhood (i.e., what houses are inside the community).
I want to make a PHP script that will take a user's input (in the form of something like "4918 David Lane" or "3000 Bergundy") search this list, and return a yes/no response whether that house exists within the boundaries.
What would be an efficient way to parse the input (regex?) and compare it to the text list?
Thanks for the help!
It's better to store this info in a database, so that you don't have to parse the data out of a text file. Regexes are also poorly suited to checking whether a number falls within a range, so you will need some general-purpose code as well.
But... if you want to do it with regexes (and see why it's not a good idea):
To look up the numbers for a street, use
David Ln:(.*)
To then get the numbers use
[^,]*
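Here is a sketch of those two regexes applied in PHP. The function name numbersFor and the sample $data are illustrative, with the lines copied from the question. It uses [^,]+ rather than [^,]*, because the starred version also produces empty matches at every comma. Ranges such as 4116-4230 still come back as plain strings, which is the point of the warning above.

```php
<?php
// Sketch of the regex route: "Street:(.*)" finds the street's line and
// "[^,]+" splits out the entries. Range checks still need ordinary PHP code.
function numbersFor(string $data, string $street): array
{
    // /m makes ^ and $ match at line boundaries inside the whole file
    $pattern = '/^' . preg_quote($street, '/') . ':(.*)$/m';
    if (!preg_match($pattern, $data, $m)) {
        return [];
    }
    preg_match_all('/[^,]+/', $m[1], $parts);
    return array_map('trim', $parts[0]);
}

$data = "Albert Dr: 4116-4230, 4510, 4513-4516\n"
      . "Bergundy Pl: 1300, 1340-1450\n"
      . "David Ln: 3400, 4918, 4928, 4825\n";
print_r(numbersFor($data, 'Albert Dr'));
```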
You could simply read the file into a string. Then break it into lines, so you get an array like Array(Line 1 => array(), Line 2 => array(), etc.). After that, you can explode each line on ':'. Finally, you simply need to search the array. It's not the fastest way, but it may be faster than regex.
You should seriously consider using a database, or rethink how your file is structured.
Try something like this: put your street names inside test.txt. Once you can read the details from the text file, just compare them with the values submitted in your form.
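$filename = 'test.txt';
$name = array();
if (file_exists($filename)) {
    if ($handle = fopen($filename, 'r')) {
        while (($file = fgets($handle)) !== FALSE) {
            // "Street Name: 4116-4230, 4510" -> [1] street name, [2] number list
            if (!preg_match('#^(.*?):(.*)$#', trim($file), $match)) {
                continue; // skip blank or malformed lines
            }
            $array = explode(',', $match[2]);
            foreach ($array as $val) {
                $name[trim($match[1])][] = trim($val);
            }
        }
        fclose($handle);
    }
}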
As mentioned, using a database to store street numbers related to your street names would be ideal. Still, one way you could implement this with your text file is to create a 2D array: store the street names in the first array and the valid street numbers in their respective arrays.
Parse the file line by line in a loop. Parse the street name and store in array, then use a nested loop to parse all of the numbers (for ones in a range like 1414-1420, you can use an additional loop to get each number in the range) and build the next array in the initial street name array element. When you have your 2D array, you can do a simple nested loop to check it for a match.
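Here is a little code for you (a runnable PHP version of my pseudo-code, with the parsing steps done inline using explode()):
$addresses = array();
$handle = fopen('test.txt', 'r');
while (($line = fgets($handle)) !== false) {
    if (strpos($line, ':') === false) {
        continue; // skip blank or malformed lines
    }
    // "Albert Dr: 4116-4230, 4510" -> street name, then the number list
    list($street, $numbers) = explode(':', $line, 2);
    $street = trim($street);
    $addresses[$street] = array();
    foreach (explode(',', $numbers) as $part) {
        $part = trim($part);
        if (strpos($part, '-') !== false) {
            // a range such as 4116-4230: add every number in it
            list($min, $max) = explode('-', $part);
            $addresses[$street] = array_merge($addresses[$street], range((int) $min, (int) $max));
        } else {
            $addresses[$street][] = (int) $part;
        }
    }
}
fclose($handle);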
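The remaining steps are parsing input like "4918 David Lane" and checking it against the parsed data. Below is a sketch under two assumptions. First, each range is kept as a [min, max] pair instead of being expanded into every number. Second, only the first word of the street name is compared, so "David Lane" and "Bergundy" find "David Ln" and "Bergundy Pl". That second rule is a hypothetical simplification, and real input would need more careful normalisation. isInside and $boundaries are illustrative names, with sample data taken from the question.

```php
<?php
// Sketch: answer "is 4918 David Lane inside the boundary?" from parsed data.
// Assumption: each street maps to [min, max] pairs; a single house number is
// stored as [n, n].
function isInside(string $input, array $boundaries): bool
{
    // A leading house number followed by the street name, e.g. "4918 David Lane"
    if (!preg_match('/^\s*(\d+)\s+(.+?)\s*$/', $input, $m)) {
        return false;
    }
    $number = (int) $m[1];
    // Hypothetical simplification: compare only the first word of the street
    $wanted = explode(' ', strtolower($m[2]))[0];
    foreach ($boundaries as $street => $ranges) {
        if (explode(' ', strtolower($street))[0] !== $wanted) {
            continue;
        }
        foreach ($ranges as $range) {
            list($min, $max) = $range;
            if ($number >= $min && $number <= $max) {
                return true;
            }
        }
    }
    return false;
}

$boundaries = [
    'Albert Dr'   => [[4116, 4230], [4510, 4510], [4513, 4516]],
    'Bergundy Pl' => [[1300, 1300], [1340, 1450]],
    'David Ln'    => [[3400, 3400], [4918, 4918], [4928, 4928], [4825, 4825]],
];
var_dump(isInside('4918 David Lane', $boundaries));
```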
It's better if you store your streets in a separate table with IDs, and store the numbers in another table, with one row per range or single number plus the street id.
For example:
streets:
ID, street
-----------
1, Albert Dr
2, Bergundy Pl
3, David Ln
4, Garfield Av
...
houses:
street_id, house_min, house_max
-----------------
1, 4116, 4230
1, 4510, 4510
1, 4513, 4516
2, 1300, 1300
2, 1340, 1450
...
For rows that hold a single house number rather than a range, set both min and max to the same value.
You can write a script that parses your txt file and saves all the data to the DB. That should be as easy as a few loops, explode() with different delimiters, and some insert queries.
Then, with the first query, you get the street id:
SELECT id FROM streets WHERE street LIKE '%[street name]%'
After that, run a second query to find out whether that house number exists on that street:
SELECT COUNT(*)
FROM houses
WHERE street_id = [street_id]
AND [house_num] BETWEEN house_min AND house_max
Replace the [...] placeholders with real values, and don't forget to escape them (or, better, use prepared statements) to prevent SQL injection.
Or you can even run just one query using a JOIN.
Also you should make sure that your given house number is integer, not float.
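A sketch of the single JOIN query, run through PDO with bound parameters instead of hand-escaped values. The int type hint enforces the "integer, not float" point. For illustration it uses an in-memory SQLite database, which needs the pdo_sqlite extension. The query itself is plain SQL that should also work on MySQL. houseExists is a hypothetical name, and the rows come from the tables above.

```php
<?php
// Sketch: one JOIN query with bound parameters (no manual escaping needed).
function houseExists(PDO $db, string $street, int $houseNum): bool
{
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM houses h
         JOIN streets s ON s.id = h.street_id
         WHERE s.street LIKE :street
           AND :num BETWEEN h.house_min AND h.house_max'
    );
    $stmt->bindValue(':street', '%' . $street . '%', PDO::PARAM_STR);
    $stmt->bindValue(':num', $houseNum, PDO::PARAM_INT);
    $stmt->execute();
    return (int) $stmt->fetchColumn() > 0;
}

// In-memory SQLite database, for illustration only
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE streets (id INTEGER PRIMARY KEY, street TEXT)');
$db->exec('CREATE TABLE houses (street_id INTEGER, house_min INTEGER, house_max INTEGER)');
$db->exec("INSERT INTO streets VALUES (1, 'Albert Dr'), (2, 'Bergundy Pl')");
$db->exec('INSERT INTO houses VALUES (1, 4116, 4230), (1, 4510, 4510), (1, 4513, 4516), '
    . '(2, 1300, 1300), (2, 1340, 1450)');
var_dump(houseExists($db, 'Albert', 4200));
```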

Reliable and effective custom search & replace function - preg or str replace

In a few different guises I've asked about this "filter" on here and WPSE. I'm now taking a different approach to it, and I'd like to make it solid and reliable.
My situation:
When I create a post in my WordPress CMS, I want to run a filter which searches for certain terms and replaces them with links.
I have the terms that I want to search for in two arrays: $glossary_terms and $species_terms.
$species_terms is a list of scientific names of fishes, such as Apistogramma panduro.
$glossary_terms is a list of fishkeeping glossary terms such as abdomen, caudal-fin and Gram's Method.
There are a few nuances worth noting:
Speed is not an issue, as I will be running this filter in the background rather than when a user visits the page or when an author submits/edits a species profile or post.
Some of the post content being filtered may contain HTML with these terms in, like <img src="image.jpg" title="Apistogramma panduro male" />. Obviously these shouldn't be replaced.
Species are often referred to with an abbreviated Genus, so instead of Apistogramma panduro, you'll often see A. panduro. This means I need to search & replace all of the species terms as an abbreviation too - Apistogramma panduro, A. panduro, Satanoperca daemon, S. daemon etc.
If caudal-fin and caudal both exist in the glossary terms, caudal-fin should be replaced first.
I was contemplating simply adding a preg_replace which searched for the terms, but only with a space on the left, (i.e. ( )term) and a space, comma, exclamation, full-stop or hyphen on the right (i.e. term(, . ! - )) but that won't help me to not break the image HTML.
Example content
<br />
It looks very similar to fishes of the <i>B. foerschi</i> group/complex but its breeding strategy, adult size and observed behaviour preclude its inclusion in that assemblage.
Instead it appears to be a member of the <i>B. coccina</i> group which currently includes <i>B. brownorum</i>, <i>B. burdigala</i>, <i>B. coccina</i>, <i>B. livida</i>, <i>B. miniopinna</i>, <i>B. persephone</i>, <i>B. tussyae</i>, <i>B. rutilans</i> and <i>B. uberis</i>.
Of these it's most similar in appearance to <i>B. uberis</i> but can be distinguished by its noticeably shorter dorsal-fin base and overall blue-greenish (vs. green/reddish) colouration.
Members of this group are characterised by their small adult size (< 40 mm SL), a uniform red or black base body colour, the presence of a midlateral body blotch in some species and the fact they have 9 abdominal vertebrae compared with 10-12 in the other species groups. In addition all are obligate peat swamp dwellers (Tan and Ng, 2005).<br />
^^^ This example here has had the correct links manually inserted. The filter shouldn't break these links!
It looks very similar to fishes of the B. foerschi group/complex but its breeding strategy, adult size and observed behaviour preclude its inclusion in that assemblage.
Instead it appears to be a member of the B. coccina group which currently includes B. brownorum, B. burdigala, B. coccina, B. livida, B. miniopinna, B. persephone, B. tussyae, B. rutilans and B. uberis.
Of these it's most similar in appearance to B. uberis but can be distinguished by its noticeably shorter dorsal-fin base and overall blue-greenish (vs. green/reddish) colouration.
Members of this group are characterised by their small adult size (< 40 mm SL), a uniform red or black base body colour, the presence of a midlateral body blotch in some species and the fact they have 9 abdominal vertebrae compared with 10-12 in the other species groups. In addition all are obligate peat swamp dwellers (Tan and Ng, 2005).
^^^ Same example pre-formatting.
[caption id="attachment_542" align="alignleft" width="125" caption="Amazonas Magazine - now in English!"]<img class="size-thumbnail wp-image-542" title="Amazonas English" src="/wp-content/uploads/2011/12/Amazonas-English-1-288x381.jpg" alt="Amazonas English" width="125" height="165" />[/caption]
Edited by Hans-Georg Evers, the magazine 'Amazonas' has been widely-regarded as among the finest regular publications in the hobby since its launch in 2005, an impressive achievement considering it's only been published in German to date. The long-awaited English version is just about to launch, and we think a subscription should be top of any serious fishkeeper's Xmas list...
The magazine is published on a bi-monthly basis and the English version launches with the January/February 2012 issue with distributors already organised in the United States, Canada, the United Kingdom, South Africa, Australia, and New Zealand. There are also mobile apps available which allow digital subscribers to read on portable devices.
It's fair to say that there currently exists no better publication for dedicated hobbyists with each issue featuring cutting-edge articles on fishes, invertebrates, aquatic plants, field trips to tropical destinations plus the latest in husbandry and breeding breakthroughs by expert aquarists, all accompanied by excellent photography throughout.
U.S. residents can subscribe to the printed edition for just $29 USD per year, which also includes a free digital subscription, with the same offer available to Canadian readers for $41 USD or overseas subscribers for $49 USD. Please see the Amazonas website for further information and a sample digital issue!
Alternatively, subscribe directly to the print version here or digital version here.
^^^ This will likely only have a few Glossary terms in rather than any species links.
Example terms
$species_terms
339 => 'Aulonocara maylandi maylandi',
340 => 'Aulonocara maylandi kandeensis',
341 => 'Aulonocara sp. "walteri"',
342 => 'Aulonocara sp. "stuartgranti maleri"',
343 => 'Aulonocara stuartgranti',
344 => 'Benthochromis tricoti',
345 => 'Boulengerochromis microlepis',
346 => 'Buccochromis lepturus',
347 => 'Buccochromis nototaenia',
348 => 'Betta brownorum',
349 => 'Betta foerschi',
350 => 'Betta coccina',
351 => 'Betta uberis'
As you can see above, the general format for these scientific names is "Genus species", but can often include "sp." or "aff." (for species which aren't officially described) and "Genus species subspecies" formats.
$glossary_terms
1 => 'abdomen',
2 => 'caudal',
3 => 'caudal-fin',
4 => 'caudal-fin peduncle',
5 => 'Gram\'s Method'
If anyone can come up with a filter which meets all these conditions and requirements, I'd like to offer a bounty.
Thanks in advance,
I think it's better to use DOMDocument functionality than regexps. Here is a working prototype:
// Each dynamically constructed regexp will contain at most 70 subpatterns
define('GROUPS_PER_REGEXPS', 70);
$speciesTerms = array(
339 => '(?:Aulonocara|A\.) maylandi maylandi',
340 => '(?:Aulonocara|A\.) maylandi kandeensis',
344 => '(?:Benthochromis|B\.) tricoti',
345 => '(?:Boulengerochromis|B\.) microlepis',
);
function matchTerms($text) {
// Globals are not good. I left it for the simplicity
global $speciesTerms;
$result = array();
// Split the terms into chunks of at most GROUPS_PER_REGEXPS, so each regexp
// stays below the subpattern limit (each() was removed in PHP 8)
foreach (array_chunk($speciesTerms, GROUPS_PER_REGEXPS, true) as $chunk) {
// Maps capturing group identifiers to term ids
$termMapping = array();
// Dynamically construct regexp
$groups = array();
$c = 1;
foreach ($chunk as $termId => $termPattern) {
// Match word boundaries, so we don't capture "B. tricotisomeramblingstring"
$groups[] = '(\b' . $termPattern . '\b)';
$termMapping[$c++] = $termId;
}
$regexp = '/' . implode('|', $groups) . '/m';
preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);
for ($i = 1; $i < $c; $i++) {
foreach ($matches[$i] as $matchData) {
// matchData[0] holds matched string, e.g. Benthochromis tricoti
// matchData[1] holds offset, e.g. 15 ('' and -1 when this group didn't match)
if ($matchData[0] !== '') {
$result[] = array(
'text' => $matchData[0],
'offset' => $matchData[1],
'id' => $termMapping[$i],
);
}
}
}
}
// Sort by offset in descending order, so splitting the text node from the
// end keeps the earlier offsets valid
usort($result, function($a, $b) {
return $b['offset'] - $a['offset'];
});
return $result;
}
// loadHTML() is an instance method; calling it statically fails in PHP 8
$doc = new DOMDocument();
$doc->loadHTML($html);
// Stack will be used to avoid recursive functions
$stack = new SplStack;
$stack->push($doc);
while (!$stack->isEmpty()) {
$node = $stack->pop();
if ($node->nodeType == XML_TEXT_NODE && $node->parentNode instanceof DOMElement) {
// $node represents text node
// and it's inside a tag (second condition in the statement above)
// Check that this text is not wrapped in <a> tag
// as we don't want to wrap it twice
if ($node->parentNode->tagName != 'a') {
$matches = matchTerms($node->wholeText);
foreach ($matches as $match) {
// Create new link element in the DOM
$link = $doc->createElement('a', $match['text']);
$link->setAttribute('href', 'species/' . $match['id']);
$link->setAttribute('class', 'link_species');
// Save the text after the link
$remainingText = $node->splitText($match['offset'] + strlen($match['text']));
// Save the text before the link
$linkText = $node->splitText($match['offset']);
// Replace $linkText with $link node
// i.e. 'B. tricoti' becomes '<a href="species/344" class="link_species">B. tricoti</a>'
$node->parentNode->replaceChild($link, $linkText);
}
}
}
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $childNode) {
$stack->push($childNode);
}
}
}
$body = $doc->getElementsByTagName('body');
echo $doc->saveHTML($body->item(0));
Implementation details
I've only shown how to replace species terms; glossary terms work the same way. Links take the form "species/$id". Abbreviations are handled correctly. DOMDocument is a very reliable parser: it can deal with broken markup and is fast.
?: in a regexp makes a subpattern non-capturing, so it is not counted as a capturing group (see the documentation on subpatterns). Without counting the subpatterns correctly, we can't retrieve the termId. The idea is that we build one big regexp by joining all the regexps in the $speciesTerms array, separated by a pipe |. The final regexp for the first two species would be (spaces added for clarity):
( (?:Aulonocara|A\.) maylandi maylandi ) | ( (?:Aulonocara|A\.) maylandi kandeensis )
The first parenthesized part is the first capturing group, | is the alternation, and the second parenthesized part is the second capturing group.
So, the text "Examples: Aulonocara maylandi maylandi, A. maylandi kandeensis" will give the following matches:
$matches[1] = array('Aulonocara maylandi maylandi') // Captured by the first group
$matches[2] = array('A. maylandi kandeensis') // Captured by the second group
We can clearly say that all elements in matches[1] are referring to the species Aulonocara maylandi maylandi or A. maylandi maylandi which has id = 339.
In short: Use (?:) if you're using subpatterns in $speciesTerms.
UPDATE
Each dynamically created regexp has a limit on the maximum number of subpatterns, defined as a constant at the top. This avoids running into the PCRE limit on the number of subpatterns in a regexp.
Important notes:
If you have a lot of terms, you should rewrite matchTerms, because a regexp has a limit on the number of subpatterns. In this case it's best to prebuild an array of regexps out of every N terms.
matchTerms generates the regexps on every call; obviously this could be done only once.
It's possible to use advanced regexps in speciesTerms
strlen => mb_strlen if you're using multibyte encodings
Supplied $html will be wrapped in a <body> tag (unless it's already wrapped)
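The $speciesTerms patterns above are written by hand. Here is a sketch of generating them from the question's plain $species_terms list, with the abbreviated genus added automatically. It also includes a way to order glossary terms longest-first, so that "caudal-fin" is tried before "caudal" in the same alternation. Both helper names are hypothetical. One caveat: with the trailing \b used in matchTerms, entries that end in a quote character (such as Aulonocara sp. "walteri") will never match, because there is no word boundary after a quote followed by a space or punctuation.

```php
<?php
// Hypothetical helper: turn a plain "Genus species [subspecies]" entry into
// a pattern that also matches the abbreviated genus ("A. panduro").
function buildSpeciesPattern(string $name): string
{
    $parts = explode(' ', $name, 2);
    if (count($parts) < 2) {
        return preg_quote($name, '/'); // single word: nothing to abbreviate
    }
    list($genus, $rest) = $parts;
    // (?: ) keeps the genus alternation non-capturing, as the answer requires
    return '(?:' . preg_quote($genus, '/') . '|' . preg_quote($genus[0] . '.', '/') . ') '
        . preg_quote($rest, '/');
}

// Longest terms first, so "caudal-fin peduncle" and "caudal-fin" are tried
// before "caudal" when the terms are joined into one alternation.
function sortLongestFirst(array $terms): array
{
    uasort($terms, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $terms; // keys (term ids) are preserved by uasort
}

// array_map() with a single array keeps the term ids as keys
$speciesTerms = array_map('buildSpeciesPattern', [
    339 => 'Aulonocara maylandi maylandi',
    351 => 'Betta uberis',
]);
print_r($speciesTerms);
```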
It would be better to parse the HTML rather than trying to use regular expressions. Regex is good when you have something specific you want to match, but gets quirky when you're trying to NOT match certain things.
Using http://simplehtmldom.sourceforge.net/ :
function addLinks(&$p, $species, $terms) {
// much easier to say "not in an anchor tag" with parsed content than with regex
if ($p->tag != 'a') {
// pull out existing elements so they aren't replaced
$children = array();
$x = 0;
foreach ($p->children as &$e) {
$children[] = $e->outertext;
$e->outertext = '---child-'.$x.'---';
$x++;
}
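foreach($species as $s) {
$p->innertext = str_replace(
$s,
// hypothetical link format; the original species anchor markup was lost
'<a href="species/'.strtolower(str_replace(' ','-',$s)).'">'.$s.'</a>',
$p->innertext);
}
foreach($terms as $t) {
$p->innertext = str_replace(
$t,
'<a href="glossary/'.
strtolower($t[0]).'/'.
strtolower(str_replace(' ','-',$t)).'">'.$t.'</a>',
$p->innertext);
}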
// restore previous child elements
foreach($children as $x => $e) {
$p->innertext = str_replace('---child-'.$x.'---', $e, $p->innertext);
}
foreach ($p->children() as &$e) {
addLinks($e, $species, $terms);
}
}
}
$html = new simple_html_dom();
// you may have to wrap $content in a div. not exactly sure how partial content is handled
$html->load($content);
addLinks($html, $species_terms, $glossary_terms);
$content = $html->save();
I haven't used simple_html_dom a whole lot, but that should get you pointed in the right direction.

How to remove only entirely duplicate values from an array?

I have a website where my database is set up with different artists and song titles within the same row, where it might look like this:
artist: The Monkees, title: I'm A Believer
artist: The Monkees, title: Daydream Believer
artist: The Hollies, title: The Air That I Breathe
artist: The Hollies, title: Bus Stop
artist: The Beatles, title: Hello, Goodbye
artist: The Beatles, title: Yellow Submarine
And I have an autocomplete widget set up with my site's search form that is fed a json_encoded array filled with 'artist' values.
The first problem is that if a user were to begin typing "the" into the search form, values would come up like this:
The Monkees
The Monkees
The Hollies
The Hollies
The Beatles
The Beatles
So I used the array_unique function to remove duplicate values, but it seems that even if a value has one duplicate word (this case being "the"), it is removed entirely, so only the first value is returned:
The Monkees
Where the output I would like to have would be:
The Monkees
The Hollies
The Beatles
So, what might be another way I can remove these duplicate values and display them the way I would like?
EDIT:
Here is my source code:
<?php
include 'includes/config.php';
$return_arr = array();
$term = ($_GET['term']);
if ($con)
{
$artist = mysql_query("SELECT * FROM songs WHERE artist LIKE '%$term%' LIMIT 0, 5");
while ($row = mysql_fetch_array($artist, MYSQL_ASSOC)) {
$row_array['value'] = strtolower($row['artist']);
array_push($return_arr,$row_array);
}
}
mysql_close($con);
echo json_encode(array_unique($return_arr));
?>
array_unique compares elements as strings ((string) $a === (string) $b), so differences in case and whitespace are taken into consideration. Since all of those values seem to be strings, that's a likely reason why array_unique is not working the way you would expect.
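One more thing is worth checking in the question's code. $return_arr holds arrays like ['value' => 'the monkees'], not strings. Because array_unique() casts each element to a string, every row becomes "Array" and only the first survives, which matches the symptom described. Here is a sketch that deduplicates on the 'value' string instead. uniqueValues is a hypothetical name.

```php
<?php
// Sketch: deduplicate autocomplete rows on their 'value' string, since
// array_unique() would turn every row into the string "Array".
function uniqueValues(array $rows): array
{
    $seen = [];
    foreach ($rows as $row) {
        if (!isset($seen[$row['value']])) {
            $seen[$row['value']] = $row; // keep the first occurrence
        }
    }
    return array_values($seen); // re-index so json_encode emits a JSON array
}

$return_arr = [
    ['value' => 'the monkees'], ['value' => 'the monkees'],
    ['value' => 'the hollies'], ['value' => 'the hollies'],
    ['value' => 'the beatles'], ['value' => 'the beatles'],
];
echo json_encode(uniqueValues($return_arr));
```

Alternatively, SELECT DISTINCT artist in the query removes the duplicates at the source. That matters here because LIMIT 0, 5 is applied before any PHP-side deduplication.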
Your database structure makes it pretty difficult to weed out duplicates. I would suggest refactoring it into a table of artists and a table of songs, where songs simply reference the id of artist. This will give you a better chance of being able to keep your artist list unique.
Also, one thing I would do for your autocomplete is set it up to ignore certain strings ('a', 'an', 'the'). These are known as stopwords, and ignoring them makes search results more relevant by not searching on common words.
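Here is a sketch of stripping those stopwords from the autocomplete term before it reaches the query. The function name is hypothetical, and the default list is just the three words mentioned above.

```php
<?php
// Sketch: drop common stopwords from the search term, so typing "the" alone
// does not match every "The ..." artist.
function stripStopwords(string $term, array $stopwords = ['a', 'an', 'the']): string
{
    $words = preg_split('/\s+/', strtolower(trim($term)), -1, PREG_SPLIT_NO_EMPTY);
    $kept = array_filter($words, function ($w) use ($stopwords) {
        return !in_array($w, $stopwords, true);
    });
    return implode(' ', $kept);
}

echo stripStopwords('The Monkees');
```

If the result is empty (the user typed only "the"), the caller can simply skip the query rather than search for an empty string.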

PHP/mysql array search algorithm

I'd like to be able to use php search an array (or better yet, a column of a mysql table) for a particular string. However, my goal is for it to return the string it finds and the number of matching characters (in the right order) or some other way to see how reasonable the search results are, so then I can make use of that info to decide if I want to display the top result by default or give the user options of the top few.
I know I can do something like
$citysearch = mysql_query(" SELECT city FROM $table WHERE city LIKE '$city' ");
but I can't figure out a way to determine how accurate it is.
The goal would be:
a) find "Milwaukee" if the search term were "milwakee" or something similar.
b) if the search term were "west", return things like "West Bend" and "Westmont".
Anyone know a good way to do this?
You should check out full text searching in MySQL. Also check out Zend's port of the Apache Lucene project, Zend_Search_Lucene.
More searching led me to the Levenshtein distance and then to similar_text, which proved to be the best way to do this.
similar_text("input string", "match against this", $pct_accuracy);
compares the strings and stores the similarity percentage in $pct_accuracy. The Levenshtein distance counts how many single-character deletions, insertions, or replacements it takes to get from one string to the other, and each operation can be weighted differently (e.g. you can make it cost more to replace a character than to delete one). It's apparently faster but less accurate than similar_text. Other posts I've read elsewhere have mentioned that for strings of fewer than 10000 characters, there's no functional difference in speed.
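A quick comparison of the two functions on the question's misspelling. The weighted levenshtein() call passes insertion, replacement, and deletion costs in that order. The second pair of strings ("milwaukie") is made up, chosen so that the weighting visibly changes the result.

```php
<?php
// similar_text(): number of shared characters, plus a percentage by reference
$common = similar_text('milwakee', 'milwaukee', $percent); // 8 chars, ~94.1%
// levenshtein(): minimum number of single-character edits (unit costs)
$plain = levenshtein('milwakee', 'milwaukee');             // 1 (insert "u")
// Weighted: replacement costs 3, so delete + insert (1 + 1) is cheaper
$replaceHeavy = levenshtein('milwaukee', 'milwaukie', 1, 3, 1); // 2
printf("%d chars, %.1f%%, distance %d, weighted %d\n", $common, $percent, $plain, $replaceHeavy);
```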
I ended up using a modified version of something I found to make it work. This ends up saving the top 3 results (except in the case of an exact match).
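// $allcities: result of a query returning (id, city name) rows
$input = strtolower($_POST["searchcity"]); // city names are lowercased below too
$accuracy = 0;
$runner1acc = 0;
$runner2acc = 0;
$closest = $closestid = $runner1 = $runner1id = $runner2 = $runner2id = '';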
while ($cityarr = mysql_fetch_row($allcities)) {
$cityname = $cityarr[1];
$cityid = $cityarr[0];
$city = strtolower($cityname);
$diff = similar_text($input, $city, $tempacc);
// check for an exact match
if ($tempacc == '100') {
// closest word is this one (exact match)
$closest = $cityname;
$closestid = $cityid;
$accuracy = 100;
break;
}
if ($tempacc >= $accuracy) { // more accurate than current leader
$runner2 = $runner1;
$runner2id = $runner1id;
$runner2acc = $runner1acc;
$runner1 = $closest;
$runner1id = $closestid;
$runner1acc = $accuracy;
$closest = $cityname;
$closestid = $cityid;
$accuracy = $tempacc;
}
if (($tempacc < $accuracy)&&($tempacc >= $runner1acc)) { // new 2nd place
$runner2 = $runner1;
$runner2id = $runner1id;
$runner2acc = $runner1acc;
$runner1 = $cityname;
$runner1id = $cityid;
$runner1acc = $tempacc;
}
if (($tempacc < $runner1acc)&&($tempacc >= $runner2acc)) { // new 3rd place
$runner2 = $cityname;
$runner2id = $cityid;
$runner2acc = $tempacc;
}
}
echo "Input word: $input\n<BR>";
if ($accuracy == 100) {
echo "Exact match found: $closestid $closest\n";
} elseif ($accuracy > 70) { // for high accuracies, assumes that it's correct
echo "We think you meant $closestid $closest ($accuracy)\n";
} else {
echo "Did you mean:<BR>";
echo "$closestid $closest? ($accuracy)<BR>\n";
echo "$runner1id $runner1 ($runner1acc)<BR>\n";
echo "$runner2id $runner2 ($runner2acc)<BR>\n";
}
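The runner-up bookkeeping above can also be done by scoring every city and sorting. This sketch assumes the cities have already been fetched into an id => name array. closestMatches is a hypothetical name, and the sample cities come from the question.

```php
<?php
// Sketch: rank every candidate by similar_text() percentage and keep the
// best $n, instead of tracking runner-up variables by hand.
function closestMatches(string $input, array $cities, int $n = 3): array
{
    $input = strtolower($input);
    $scores = [];
    foreach ($cities as $id => $name) {
        similar_text($input, strtolower($name), $pct);
        $scores[$id] = $pct;
    }
    arsort($scores); // highest percentage first, keys (city ids) preserved
    return array_slice($scores, 0, $n, true);
}

$cities = [1 => 'Milwaukee', 2 => 'West Bend', 3 => 'Westmont'];
print_r(closestMatches('milwakee', $cities));
```

The same thresholds as above (100 for an exact match, above 70 for "we think you meant") can then be applied to the first entry.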
This can be very complicated, and I am not personally aware of any good 3rd party libraries although I'm sure they exist. Others may be able to suggest some canned solutions, though.
I have written something similar from scratch a few times in the past. If you go down that route, it is probably not something you'd want to do in PHP by itself as every query would involve getting all of the records and performing your calculations on them. It will almost certainly involve creating a set of index tables that meet your specifications.
For instance, you would have to come up with rules for how you imagine that "Milwaukee" could end up spelled "milwakee." My solution to this was to do vowel compression and duplication compression (not sure if these are actually search terms). So, milwaukee would be indexed as:
milwaukee
m_lw__k__
m_lw_k_
When the search query came in for "milwaukee", I would run the same process on the text input, and then run a search on the index table for:
SELECT cityId,
COUNT(*)
FROM myCityIndexTable
WHERE term IN ('milwaukee', 'm_lw__k__', 'm_lw_k_')
GROUP BY cityId
When the search query came in for "milwakee", I would run the same process on the text input, and then run a search on the index table for:
SELECT cityId,
COUNT(*)
FROM myCityIndexTable
WHERE term IN ('milwakee', 'm_lw_k__', 'm_lw_k_')
GROUP BY cityId
In the case of Milwaukee (spelled correctly), it would return "3" for the count.
In the case of Milwakee (spelled incorrectly), it would return "1" for the count. It matches only m_lw_k_: the literal spelling differs, and it would not match the m_lw__k__ pattern as it only has one vowel in the middle.
If you sort the results based on the count, you would end up meeting one of your rules, that "Milwaukee" would end up being sorted higher as a possible match than "Milwakee."
If you want to build this system in a generic way (as hinted by your use of $table in the query) then you'd probably need another mapping table somewhere in there to map your terms to the appropriate table.
I'm not suggesting this is the best (or even a good) way to go about this, just something I've done in the past that might prove useful to you if you plan to try and do this without a third party solution.
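Here is a sketch of generating the three index keys described above: the word itself, the vowel-compressed form, and the duplication-compressed form. indexTerms is a hypothetical name. It treats only a, e, i, o, u as vowels, which is an assumption.

```php
<?php
// Sketch of the answer's index keys: the word, a "vowel compressed" form
// (each vowel -> "_") and a "duplication compressed" form (runs of the same
// character collapsed to one).
function indexTerms(string $word): array
{
    $word = strtolower($word);
    $vowels = preg_replace('/[aeiou]/', '_', $word);
    // (.)\1+ matches a character followed by repeats of itself
    $collapsed = preg_replace('/(.)\1+/', '$1', $vowels);
    return [$word, $vowels, $collapsed];
}

print_r(indexTerms('Milwaukee'));
```

Intersecting the keys of the input with the keys stored for a city gives the same overlap count that the COUNT(*) query computes.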
The most maddening result with LIKE is this one: "%man" will return every woman in the file!
For listing, a not-too-bad solution may be to keep shortening the search needle. In your case, a match will come up once your search string is as short as "milwa".
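Here is a sketch of the shortening-needle idea done in PHP over an array of city names. It drops one character at a time from the end of the term until some city contains it, mimicking LIKE '%needle%'. The function name and the $minLength cut-off are assumptions; the cut-off stops very short, noisy needles.

```php
<?php
// Sketch: shorten the needle from the end until some city contains it
// (case-insensitive), like repeating LIKE '%needle%' with shorter needles.
function shrinkingNeedleSearch(string $needle, array $cities, int $minLength = 3): array
{
    for ($len = strlen($needle); $len >= $minLength; $len--) {
        $prefix = substr($needle, 0, $len);
        $hits = array_values(array_filter($cities, function ($city) use ($prefix) {
            return stripos($city, $prefix) !== false;
        }));
        if ($hits) {
            return $hits;
        }
    }
    return [];
}

print_r(shrinkingNeedleSearch('milwakee', ['Milwaukee', 'West Bend', 'Westmont']));
```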
