Small scope/huge frustration mystery: buggy array behavior in php - php

So crazy! I have a bug that's 100% reproducible, it happens in only a few lines of code, yet I cannot for the life of me determine what the problem is.
My project is a workout maker, and the mystery involves two functions:
get_pairings: It makes a set of $together_pairs (easy) and $mixed_pairs (annoying), and combines them into $all_pairs, used to make the workout.
make_mixed_pairs: this has different logic depending on whether it's a partner vs solo workout. Both cases return a set of $mixed_pairs (in the same exact format), called by the function above.
The symptoms/clues:
The case of the solo workout is fine, $all_pairs will only contain $mixed_pairs (because as it's defined, $together_pairs are only for partner workouts)
In the case of a partner workout, when I combine the two sets in get_pairings(), $all_pairs only successfully gets the first set I give it! (If I swap those lines at step 2 and add $together_pairs first, $all_pairs contains only those. If I do $mixed_pairs first, $all_pairs contains only that).
Then if I uncomment that second-to-last line in make_mixed_pairs() just for troubleshooting to see what happens, then $all_pairs does successfully include exercises from both sets!
That suggests the problem is something I'm doing wrong in making the arrays in make_mixed_pairs(), but I confirmed that the resulting format is identical in both cases.
Anyone see what else I could be missing? I've been narrowing down this bug for 4 hours so far- I can't make it any smaller, and I can't see what's wrong :(
Update: I updated the for loop in make_mixed_pairs() to stop at $mixed_pair_count - 1 (instead of just $mixed_pair_count), and now I sometimes get one single 'together_pair' mixed in the $all_pairs results; the same damn one each time, weirdly. Though it's not 'fixed', because again when I change the order that I add the two sets in get_pairings, when I add $together_pairs first, then $all_pairs is ENTIRELY those- it's so strange...
Here are the functions: first get_pairings (relevant part is right before and after step 2):
/**
* Used in make_workout.php: take the user's available resources, and return valid exercises
*/
function get_pairings($exercises, $count, $outdoor_partner_workout)
{
// 1. Prep our variables, and put exercises into the appropriate buckets
$mixed_exercises = array();
$together_pairs = array();
$mixed_pairs = array();
$all_pairs = array();
$selected_pairs = array();
// Sort the valid exercises: self_pairing exercises go as they are, with extra
// array for consistent formatting. Mixed ones go into $mixed_exercises array
// for more specialized pairing in make_mixed_pairs
foreach($exercises as $exercise)
{
if ($exercise['self_pairing'])
{
$pair = array($exercise);
array_push($together_pairs, [$pair]);
}
else
{
$this_exercise = array($exercise);
array_push($mixed_exercises, $this_exercise);
}
}
// Now get the mixed_pairs
$mixed_pairs = make_mixed_pairs($mixed_exercises, $outdoor_partner_workout);
// 2. combine together into one set, and select random pairs for the workout
// Add both sets to the array of all pairs (to pick from afterward)
$all_pairs += $mixed_pairs;
$all_pairs += $together_pairs;
// Now let's choose at random our desired # of pairs, and save them in $selected_pairs
$pairing_keys = array_rand($all_pairs, $count);
foreach($pairing_keys as $key)
{
array_push($selected_pairs, $all_pairs[$key]);
}
// Finally, shuffle it so we don't always see the self-pairs first
shuffle($selected_pairs);
return $selected_pairs;
}
And the other one- make_mixed_pairs: there are two cases, the first is complicated (and shows the bug) and the second is simple (and works):
/**
* Used by get_pairings: in case of a partner workout that has open space (where
* one person can travel to a point while the other does an exercise til they return)
* we'll pair exercises in a special way. (If not, fine to grab random pairs)
*/
function make_mixed_pairs($mixed_exercises, $outdoor_partner_workout)
{
$mixed_pairs = array();
// When it's an outdoor partner workout, we want to pair travelling with stationary
// put them into arrays and then we'll make pairs using one from each
if ($outdoor_partner_workout)
{
$mixed_travelling = array();
$mixed_stationary = array();
foreach($mixed_exercises as $exercise)
{
if ($exercise[0]['travelling'])
{
array_push($mixed_travelling, $exercise);
}
else
{
array_push($mixed_stationary, $exercise);
}
}
shuffle($mixed_travelling);
shuffle($mixed_stationary);
// determine the smaller set, and pair exercises that many times
$mixed_pair_count = min(count($mixed_travelling), count($mixed_stationary));
for ($i=0; $i < $mixed_pair_count; $i++)
{
$this_pair = array($mixed_travelling[$i], $mixed_stationary[$i]);
array_push($mixed_pairs, $this_pair); // problem is adding them here- we get only self_pairs
}
}
// Otherwise we can just grab pairs from mixed_exercises
else
{
// shuffle the array so it's in random order, then chunk it into pairs
shuffle($mixed_exercises);
$mixed_pairs = array_chunk($mixed_exercises, 2);
}
// $mixed_pairs = array_chunk($mixed_exercises, 2); // when I replace it with this, it works
return $mixed_pairs;
}

Oh for Pete's sake: I mentioned this to a friend, who told me that the array union operator (+) is flaky in PHP, and that I should use array_merge instead.
I replaced these lines:
$all_pairs += $together_pairs;
$all_pairs += $mixed_pairs;
with this:
$all_pairs = array_merge($together_pairs, $mixed_pairs);
And now it all works
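For anyone who lands here with the same symptom, here is a minimal sketch of why the swap helps. The sample values are made up; the point is only that both arrays are plain lists, so they share the keys 0, 1, 2, and so on.

```php
<?php
// Hypothetical stand-ins for the two sets of pairs. Both are plain lists,
// so PHP indexes each of them 0, 1, 2, ...
$mixed_pairs    = array('mixed A', 'mixed B');
$together_pairs = array('together A', 'together B', 'together C');

// The + operator is a union by KEY: for any key present in both arrays,
// the left-hand value is kept and the right-hand value is dropped.
$union = array();
$union += $mixed_pairs;     // adds keys 0 and 1
$union += $together_pairs;  // keys 0 and 1 already exist, so only key 2 is added

// array_merge renumbers integer keys, so every element survives.
$merged = array_merge($mixed_pairs, $together_pairs);

print_r($union);
print_r($merged);
```

This would also account for the update in the question: with one fewer mixed pair, one more key is free, so a single together pair gets through, and it is always the same one because $together_pairs is never shuffled.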

Related

PHP similar_text is WAY too slow

I am trying to build an 'analysis' feature for my translation software. A translator will be able to analyze a project which checks for similarities in a glossary.
On a project with 10,000 rows (each row contains a source text between 1-500 characters) with a glossary containing 25,000 terms, my current analysis algorithm takes a RIDICULOUS amount of time. I need to get this down to a couple of minutes maximum.
My algorithm looks something like this (I removed code that doesn't affect performance):
foreach($rows as $row){ //10,000 rows
$source = $row["source"];
$matchPercent = $glossary->findMatch($source); //This line of code is extremely slow
$matchPercents[$matchPercent]++;
}
//Now I have an array of all the matching percentages and how many rows fall into each percentage match
public function findMatch($source)
{
$highestMatchPercent = 0;
foreach ($this->terms as $term) { //25,000 terms
$matchPercent = 0;
similar_text(strtolower($source), strtolower($term), $matchPercent);
$matchPercent = floor($matchPercent);
if ($matchPercent > $highestMatchPercent) $highestMatchPercent = $matchPercent;
if ($highestMatchPercent == 100) return $highestMatchPercent; //Added efficiency
}
return $highestMatchPercent;
}
How can I achieve similar results and speed this process up?
I've tried levenshtein, but its maximum character limit is a problem.
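One possible direction, offered as a sketch rather than a tested fix: similar_text computes its percentage as 2 * common / (len1 + len2) * 100, and the number of common characters can never exceed the shorter string's length. That gives a cheap upper bound for skipping terms that cannot beat the current best. The function name find_best_match and the sample terms are hypothetical, and the terms are assumed to be lowercased once up front instead of on every comparison.

```php
<?php
/**
 * Hypothetical variant of findMatch(): returns the same value as the
 * original loop, but skips terms whose best possible score is too low.
 */
function find_best_match(string $source, array $lowercasedTerms): int
{
    $source = strtolower($source);   // lowercase once, not once per term
    $sourceLen = strlen($source);
    $best = 0;

    foreach ($lowercasedTerms as $term) {
        $termLen = strlen($term);
        if ($sourceLen + $termLen === 0) {
            continue;
        }
        // Upper bound on similar_text's percentage for this pair of lengths.
        $upperBound = 200 * min($sourceLen, $termLen) / ($sourceLen + $termLen);
        if (floor($upperBound) <= $best) {
            continue; // cannot beat the current best, skip the expensive call
        }
        similar_text($source, $term, $percent);
        $best = max($best, (int) floor($percent));
        if ($best === 100) {
            break;
        }
    }
    return $best;
}

// Lowercase the glossary once, outside the 10,000-row loop.
$terms = array_map('strtolower', array('Glossary', 'Translation memory', 'Term'));
echo find_best_match('translation Memory', $terms), "\n";
```

This does not make similar_text itself any faster; how much it saves depends on how varied the string lengths are and how early a good match turns up.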

Filter a php array to only elements that have a matching value to $name in $data

I have a php script getting all folders in a posts folder and making them into a list.
I have a $postinfo_str variable assigned to a json file for each folder which I am using to store post date and category/tag info etc in.
I also have a $pagetitle variable assigned to a title.php include file for each folder. So say I am on a "June 2018" archive page, the text in that file will be "June 2018". If I am on say a "Tutorials" category page, that will be the text in the title.php.
In the json file, I have:
{
"Arraysortdate": "YYYYMMDD",
"Month": "Month YYYY",
"Category": ["cat1", "cat2", "etc"]
}
I am ordering the array newest to oldest using krsort with Arraysortdate as key.
How do I filter the array using $pagetitle as input, finding if there is a match in $postinfo_str, and if there isn't, remove that folder from the array?
All I can seem to find regarding array sorting treats the info in $postinfo_str as the array itself, so that $pagetitle is the input and the output is the matching text from $postinfo_str. What I want instead is for the output to be only the folders whose $postinfo_str contains text matching the input ($pagetitle).
Here is the code I have. Keep in mind this is flat file; I do not want a database to achieve this. See the comments if you want an explanation.
<?php
$BASE_PATH = '/path/to/public_html';
// initial array containing the dirs
$dirs = glob($BASE_PATH.'/testblog/*/posts/*', GLOB_ONLYDIR);
// new array with date as key
$dirinfo_arr = [];
foreach ($dirs as $cdir) {
// get current page title from file
$pagetitle = file_get_contents("includes/title.php");
// get date & post info from file
$dirinfo_str = file_get_contents("$cdir/includes/post-info.json");
$dirinfo = json_decode($dirinfo_str, TRUE);
// add current directory to the info array
$dirinfo['dir'] = $cdir;
// add current dir to new array where date is the key
$dirinfo_arr[$dirinfo['Arraysortdate']] = $dirinfo;
}
// now we sort the new array
krsort($dirinfo_arr);
foreach($dirinfo_arr as $key=>$dir) {
$dirpath = $dir['dir'];
$dirpath = str_replace('/path/to/public_html/', '', $dirpath);
?>
<!-- HTML HERE SUCH AS -->
TEXT <br>
<?php
};
?>
I have difficulties following your problem description. Your code example is slightly confusing. It appears to load the same global includes/title.php for each directory. Meaning, the value of $pagetitle should be the same every iteration. If this is intended, you should probably move that line right outside the loop. If the file contains actual php code, you should probably use
$pagetitle = include 'includes/title.php';
or something similar. If it doesn't, you should probably name it title.txt. If it is not one global file, you should probably add the path to the file_get_contents/include as well. (However, why wouldn't you just add the title in the json struct?)
I'm under the assumption that this happened by accident when trying to provide a minimal code example (?) ... In any case, my answer won't be the perfect answer, but it hopefully can be adapted once understood ;o)
If you only want the elements in your array that fulfill certain properties, you have essentially two choices:
don't put those elements in to begin with (this is mostly your code)
foreach ($dirs as $cdir) {
// get current page title from file
$pagetitle = file_get_contents("includes/title.php");
// get date & post info from file
$dirinfo_str = file_get_contents("$cdir/includes/post-info.json");
$dirinfo = json_decode($dirinfo_str, TRUE);
// add current directory to the info array
$dirinfo['dir'] = $cdir;
// ------------ NEW --------------
$filtercat = 'cat1';
if(!in_array($filtercat, $dirinfo['Category'])) {
continue;
}
// -------------------------------
// add current dir to new array where date is the key
$dirinfo_arr[$dirinfo['Arraysortdate']] = $dirinfo;
}
array_filter the array afterwards, by providing an anonymous function
// ----- before cycling through $dirinfo_arr for output
$filtercat = 'cat1';
$filterfunc = function($dirinfo) use ($filtercat) {
return in_array($filtercat, $dirinfo['Category']);
};
$dirinfo_arr = array_filter($dirinfo_arr, $filterfunc);
You should read up on anonymous functions and how to pass local variables to them (the use clause), to ease the pain. Maybe your use case is better suited to array_reduce, which is similar, except that you determine the output of your "filter" yourself.
$new = array_filter($array, $func), is just a fancy way of writing:
$new = [];
foreach($array as $key => $value) {
if($func($value)) {
$new[$key] = $value;
}
}
update 1
in my code samples, you could replace in_array($filtercat, $dirinfo['Category']) with in_array($pagetitle, $dirinfo) - if you want to match on anything that's in the json-struct (base level) - or with ($pagetitle == $dirinfo['Month']) if you just want to match the month.
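A minimal sketch of that substitution, assuming $pagetitle should match either the Month or one of the Category entries. The sample data and directory names are made up for illustration.

```php
<?php
// Hypothetical sample of $dirinfo_arr, keyed by Arraysortdate as in the question.
$dirinfo_arr = array(
    '20180615' => array('Month' => 'June 2018', 'Category' => array('Tutorials'), 'dir' => 'posts/a'),
    '20180520' => array('Month' => 'May 2018',  'Category' => array('News'),      'dir' => 'posts/b'),
    '20180601' => array('Month' => 'June 2018', 'Category' => array('News'),      'dir' => 'posts/c'),
);

// trim() because a title read with file_get_contents() may end in a newline.
$pagetitle = trim("June 2018\n");

// Keep a post if the page title equals its month or is one of its categories.
$matches = array_filter($dirinfo_arr, function ($dirinfo) use ($pagetitle) {
    return $dirinfo['Month'] === $pagetitle
        || in_array($pagetitle, $dirinfo['Category'], true);
});

krsort($matches); // newest first; array_filter preserves the date keys
print_r(array_column($matches, 'dir'));
```

Setting $pagetitle to "News" instead would keep posts/b and posts/c, since the same callback covers both archive and category pages.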
update 2
I understand, that you're probably just starting with php or even programming, so the concept of some "huge database" may be frightening. But tbh, the filesystem is - from a certain point of view - a database as well. However, it usually is quite slow in comparison, it also doesn't provide many features.
In the long run, I would strongly suggest using a database. If you don't like the idea of putting your data in "some database server", use sqlite. However, there is a learning curve involved if you have never had to deal with databases before. In the long run it will be time well spent, because it simplifies so many things.

Creating a different array related to different times a loop happens php

Hi there. I'm having a problem creating arrays under certain conditions in PHP; I'll try to explain. Here's my code:
for ($i = 1; $i < $tamanho_array_afundamento; $i++) {
if ($array_afundamento[$i] - $array_afundamento[$i - 1] > 1) {
$a = $array_afundamento[$i - 1];
// double quotes so that $a and $prevNum are interpolated into the SQL
$con->query("CREATE TABLE IF NOT EXISTS afunda_$a
SELECT (L1_forma_tensao_max + L1_forma_tensao_min)/2 as L1_forma_tensao, (L2_forma_tensao_max + L2_forma_tensao_min)/2 as L2_forma_tensao, (L3_forma_tensao_max + L3_forma_tensao_min)/2 as L3_forma_tensao
FROM afundamento
WHERE id > $prevNum AND id < $a");
$tabelas_intervalos_afunda = $con->query("SELECT * FROM afunda_$a");
while ($row = $tabelas_intervalos_afunda->fetch(PDO::FETCH_ASSOC)) {
$array_forma_onda_fase1_afund[] = $row['L1_forma_tensao'];
$array_forma_onda_fase2_afund[] = $row['L2_forma_tensao'];
$array_forma_onda_fase3_afund[] = $row['L3_forma_tensao'];
}
$prevNum = $a;
}
}
So as you can see, I have an if statement inside a for loop. What I want to do is create
one set of:
{
$array_forma_onda_fase1_afund[] = $row['L1_forma_tensao'];
$array_forma_onda_fase2_afund[] = $row['L2_forma_tensao'];
$array_forma_onda_fase3_afund[] = $row['L3_forma_tensao'];
}
every time the if statement runs. I tried replacing this in the original code with:
{
$array_forma_onda_fase1_afund_$a[] = $row['L1_forma_tensao'];
$array_forma_onda_fase2_afund_$a[] = $row['L2_forma_tensao'];
$array_forma_onda_fase3_afund_$a[] = $row['L3_forma_tensao'];
}
Since $a changes every time the if statement is entered, I would get a different set of these arrays each time. But PHP doesn't accept this syntax, and the result wouldn't be very clean anyway, though I'd be pleased if I could get it working.
But my goal is to get:
{
$array_forma_onda_fase1_afund_1[] = $row['L1_forma_tensao'];
$array_forma_onda_fase2_afund_1[] = $row['L2_forma_tensao'];
$array_forma_onda_fase3_afund_1[] = $row['L3_forma_tensao'];
}
{
$array_forma_onda_fase1_afund_2[] = $row['L1_forma_tensao'];
$array_forma_onda_fase2_afund_2[] = $row['L2_forma_tensao'];
$array_forma_onda_fase3_afund_2[] = $row['L3_forma_tensao'];
}
...
where the last number represents the array retrieved the n-th time the if statement ran. Does anyone have a tip for this?
Thanks in advance! Would appreciate any help.
EDIT
As asked, here is the real-world context:
I have a table from which I need to take all the data that falls inside a given interval. The problem is that my data is a sine function whose amplitude may change any number of times (the data is entered by the user). Whenever the amplitude goes inside that interval, I need to perform some operations, such as finding the lowest value reached while the data was inside the interval, plus some other parameters, for each interval separately (that's why I created all those tables), and count how many times it happened.
So, in order to perform one of those operations, I need an array with the data for each time the user's data goes into that interval (given by the limits of the CREATE query).
If I wasn't clear, please just tell me!
EDIT 2
Here's the image of part of the table i'm working with:
http://postimg.org/image/5vegnk043/
When the sine enters the interval I need, the L1_RMS column shows it, and that is when I need to collect the data until it leaves the interval again. But this may happen as many times as the user's table dictates, and bear in mind that I need all the intervals kept separate so I can deal with the data of each one.
Physics uh?
You can do what you wanted with the arrays, it's not pretty, but it's possible.
You can dynamically name your arrays with _$a at the end using variable variables, such as:
${"array_forma_onda_fase3_afund_" . $a}[] = "fisica é medo";
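A short sketch of that line in context, next to an alternative that keeps everything in one array keyed by $a. The $intervals data below is a hypothetical stand-in for the rows fetched from each afunda_$a table, and $fase1_by_interval is a name invented for this example.

```php
<?php
// Hypothetical rows standing in for the result sets of two separate intervals.
$intervals = array(
    12 => array(array('L1_forma_tensao' => 1.0), array('L1_forma_tensao' => 0.8)),
    40 => array(array('L1_forma_tensao' => 0.9)),
);

$fase1_by_interval = array();
foreach ($intervals as $a => $rows) {
    foreach ($rows as $row) {
        // Variable variable: creates $array_forma_onda_fase1_afund_12, then _40, ...
        ${"array_forma_onda_fase1_afund_" . $a}[] = $row['L1_forma_tensao'];

        // Alternative: a single array with $a as the first-level key.
        $fase1_by_interval[$a][] = $row['L1_forma_tensao'];
    }
}

print_r($array_forma_onda_fase1_afund_12);
print_r($fase1_by_interval);
echo count($fase1_by_interval), " intervals\n";
```

The nested array tends to be easier to work with afterwards: you can loop over it, and count() gives the number of intervals directly, whereas the dynamically named variables have to be rediscovered by name.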

Logic Issue - how many / which small boxes in a big box - PHP/MySQL

I've got a problem, and I'll try to describe it in the simplest terms possible.
Using a combination of PHP and MySQL I need to fulfil the following logic problem, this is a simplified version of what is required, but in a nutshell, the logic is the same.
Think boxes. I have lots of small boxes, and one big box. I need to be able to fill the large box using lots of little boxes.
So lets break this down.
I have a table in MySQL which has the following rows
Table: small_boxes
id | box_size
=============
1 | 100
2 | 150
3 | 200
4 | 1000
5 | 75
..etc
This table can run up to the hundreds, with some boxes being the same size
I now have a requirement to fill one big box, as an example, of size 800, with all the combinations of small_boxes as I find in the table. The big box can be any size that the user wishes to fill.
The goal here is not efficiency, for example, I don't really care about going slightly under, or slightly over, just showing the different variations of boxes that can possibly fit, within a tolerance figure.
So if possible, I'd like to understand how to tackle this problem in PHP/MySQL. I'm quite competent at both, but the problem lies in how I approach this.
Examples would be fantastic, but I'd happily settle for a little info to get me started.
You should probably look into the glorious Knapsack problem
https://codegolf.stackexchange.com/questions/3731/solve-the-knapsack-problem
Read up on this.
Hopefully you've taken algebra 2..
Here is some PHP code that might help you out:
http://rosettacode.org/wiki/Knapsack_problem/0-1#PHP
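If you only need to list the combinations that land within a tolerance (rather than optimise a value, as the classic knapsack does), a plain recursive search may be enough to get started. This is a rough sketch: find_combinations and its parameters are hypothetical names, each box is used at most once, and a target of 450 is used because the five sample rows from the question cannot reach 800.

```php
<?php
/**
 * Hypothetical helper: list every combination of boxes whose total size
 * falls within $tolerance of $target. $boxes maps id => box_size.
 */
function find_combinations(array $boxes, int $target, int $tolerance): array
{
    arsort($boxes);                 // largest first, keeps the ids as keys
    $ids = array_keys($boxes);
    $results = array();

    $search = function (int $start, int $sum, array $chosen) use (&$search, &$results, $boxes, $ids, $target, $tolerance) {
        // Record any non-empty selection whose total is inside the window.
        if ($chosen && $sum >= $target - $tolerance && $sum <= $target + $tolerance) {
            $results[] = $chosen;
        }
        for ($i = $start; $i < count($ids); $i++) {
            $next = $sum + $boxes[$ids[$i]];
            if ($next > $target + $tolerance) {
                continue;           // this box overshoots, try a smaller one
            }
            $search($i + 1, $next, array_merge($chosen, array($ids[$i])));
        }
    };
    $search(0, 0, array());

    return $results;
}

// Box sizes taken from the table in the question.
$boxes = array(1 => 100, 2 => 150, 3 => 200, 4 => 1000, 5 => 75);
print_r(find_combinations($boxes, 450, 25));
```

The search is exponential in the worst case, so with hundreds of rows you would probably want to cap the number of results or the recursion depth.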
Thanks to maxhd and Ugo Meda for pointing me in the right direction!
As a result I've come to something very close to what I need. I'm not sure if this even falls into the "Knapsack problem", or whichever variation thereof, but here's the code I've come up with. Feel free to throw me any constructive criticism!
In order to try and get some different variants of boxes inside the knapsack, I've removed the largest item on each main loop iteration, again, if there's a better way, let me know :)
Thanks!
class knapsack {
private $items;
private $knapsack_size;
private $tolerance = 15; //Todo : Need to make this better, perhaps a percentage of knapsack
private $debug = 1;
public function set_knapsack_size($size){
$this->knapsack_size = $size;
}
public function set_items($items){
if(!is_array($items)){
return false;
}
//Array of array("id"=>..., "size"=>...) entries, ordered by largest first
$this->items = $items;
}
public function set_tolerance($tolerance){
$this->tolerance = $tolerance;
}
private function remove_large_items(){
//Loop through each of the items making sure we can use this sized item in the knapsack
foreach($this->items as $list_id=>$list){
//Lets look ahead one, and make sure it isn't the last largest item, we will keep largest for oversize.
if($list["size"] > $this->knapsack_size && (isset($this->items[$list_id+1]) && $this->items[$list_id+1]["size"] > $this->knapsack_size)){
unset($this->items[$list_id]);
}else{
//If we ever get here, simply return true as we can start to move on
return true;
}
}
return true;
}
private function append_array($new_data,$array){
if(isset($array[$new_data["id"]])){
$array[$new_data["id"]]["qty"]++;
}else{
$array[$new_data["id"]]["qty"] = 1;
}
return $array;
}
private function process_items(&$selected_items,$knapsack_current_size){
//Loop the list of items to see if we can fit it in the knapsack
foreach($this->items as $list){
//If we can fit the item into the knapsack, lets add it to our selected_items, and move onto the next item
if($list["size"] <= $knapsack_current_size){
$this->debug("Knapsack size is : ".$knapsack_current_size." - We will now take ".$list["size"]." from it");
$selected_items = $this->append_array($list,$selected_items);
$knapsack_current_size -= $list["size"];
//Lets run this method again, start recursion
$knapsack_current_size = $this->process_items($selected_items,$knapsack_current_size);
}else{
//Lets check if we can fit a slightly bigger item into the knapsack, so we can eliminate really small items, within tolerance
if(($list["size"] <= $knapsack_current_size + $this->tolerance) && $knapsack_current_size > 0){
$this->debug("TOLERANCE HIT : Knapsack size is : ".$knapsack_current_size." - We will now take ".$list["size"]." from it");
$selected_items = $this->append_array($list,$selected_items);
$knapsack_current_size -= $list["size"];
}
}
//Lets see if we have to stop the recursion
if($knapsack_current_size < 0){
return $knapsack_current_size;
}
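}
//Return the remaining size so the recursive caller does not receive null
return $knapsack_current_size;
}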
private function debug($message=""){
if(!$this->debug){
return false;
}
echo $message."\n";
}
public function run(){
//If any of the variables have not been set, return false
if(!is_array($this->items) || !$this->knapsack_size){
return false;
}
//Lets first remove any items that may be too big for the knapsack
$this->remove_large_items();
//Lets now check if we still have items in the array, just incase the knapsack is really small
if(count($this->items) == 0){
return false;
}
//Now that we have a good items list, and we have no items larger than the knapsack, lets move on.
$variants = array();
foreach($this->items as $list_id=>$list){
$this->debug();
$this->debug("Finding variants : ");
$selected_items = array();
$this->process_items($selected_items,$this->knapsack_size);
$variants[] = $selected_items;
//Remove the largest variant, so we get a new set of unique results
unset($this->items[$list_id]);
}
return $variants;
}
}
$products = array(
array("id"=>1,"size"=>90),
array("id"=>2,"size"=>80),
array("id"=>3,"size"=>78),
array("id"=>4,"size"=>66),
array("id"=>5,"size"=>50),
array("id"=>6,"size"=>42),
array("id"=>7,"size"=>36),
array("id"=>8,"size"=>21),
array("id"=>9,"size"=>19),
array("id"=>10,"size"=>13),
array("id"=>11,"size"=>7),
array("id"=>12,"size"=>2),
);
$knapsack = new knapsack();
$knapsack->set_items($products);
$knapsack->set_knapsack_size(62);
$result = $knapsack->run();
var_dump($result);

PHP/mysql array search algorithm

I'd like to be able to use PHP to search an array (or better yet, a column of a MySQL table) for a particular string. However, my goal is for it to return the string it finds and the number of matching characters (in the right order), or some other way to see how reasonable the search results are, so that I can use that info to decide whether to display the top result by default or give the user the top few as options.
I know I can do something like
$citysearch = mysql_query(" SELECT city FROM $table WHERE city LIKE '$city' ");
but I can't figure out a way to determine how accurate it is.
The goal would be:
a) find "Milwaukee" if the search term were "milwakee" or something similar.
b) if the search term were "west", return things like "West Bend" and "Westmont".
Anyone know a good way to do this?
You should check out full text searching in MySQL. Also check out Zend's port of the Apache Lucene project, Zend_Search_Lucene.
More searching led me to the Levenshtein distance and then to similar_text, which proved to be the best way to do this.
similar_text("input string", "match against this", $pct_accuracy);
compares the strings and then saves the accuracy, as a percentage, in the variable passed as the third argument. The Levenshtein distance counts how many single-character delete, insert, or replace operations it takes to get from one string to the other, with an allowance for weighting each operation differently (e.g. you can make it cost more to replace a character than to delete one). It's apparently faster but less accurate than similar_text. Other posts I've read elsewhere have mentioned that for strings of fewer than 10000 characters, there's no functional difference in speed.
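To make the difference between the two measures concrete, here is a small side-by-side sketch. The candidate list is made up from the city names in the question, and $scores is a name invented for this example.

```php
<?php
$input = 'milwakee';
$candidates = array('Milwaukee', 'Westmont', 'West Bend');

$scores = array();
foreach ($candidates as $city) {
    $lower = strtolower($city);
    // Number of single-character edits needed; lower means closer.
    $distance = levenshtein($input, $lower);
    // Count of matching characters; the percentage is returned by reference.
    $common = similar_text($input, $lower, $percent);
    $scores[$city] = array('distance' => $distance, 'common' => $common, 'percent' => $percent);
    printf("%-10s distance=%d common=%d percent=%.1f\n", $city, $distance, $common, $percent);
}
```

For "Milwaukee" the input is one insertion away, so the distance is 1, while similar_text finds 8 matching characters out of the 17 in both strings combined.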
I ended up using a modified version of something I found to make it work. This ends up saving the top 3 results (except in the case of an exact match).
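$input = strtolower($_POST["searchcity"]); // lowercase, since each city name is lowercased below
$accuracy = 0;
$runner1acc = 0;
$runner2acc = 0;
// initialise names and ids so the first shuffle-down does not read undefined variables
$closest = $runner1 = $runner2 = '';
$closestid = $runner1id = $runner2id = null;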
while ($cityarr = mysql_fetch_row($allcities)) {
$cityname = $cityarr[1];
$cityid = $cityarr[0];
$city = strtolower($cityname);
$diff = similar_text($input, $city, $tempacc);
// check for an exact match
if ($tempacc == '100') {
// closest word is this one (exact match)
$closest = $cityname;
$closestid = $cityid;
$accuracy = 100;
break;
}
if ($tempacc >= $accuracy) { // more accurate than current leader
$runner2 = $runner1;
$runner2id = $runner1id;
$runner2acc = $runner1acc;
$runner1 = $closest;
$runner1id = $closestid;
$runner1acc = $accuracy;
$closest = $cityname;
$closestid = $cityid;
$accuracy = $tempacc;
}
if (($tempacc < $accuracy)&&($tempacc >= $runner1acc)) { // new 2nd place
$runner2 = $runner1;
$runner2id = $runner1id;
$runner2acc = $runner1acc;
$runner1 = $cityname;
$runner1id = $cityid;
$runner1acc = $tempacc;
}
if (($tempacc < $runner1acc)&&($tempacc >= $runner2acc)) { // new 3rd place
$runner2 = $cityname;
$runner2id = $cityid;
$runner2acc = $tempacc;
}
}
echo "Input word: $input\n<BR>";
if ($accuracy == 100) {
echo "Exact match found: $closestid $closest\n";
} elseif ($accuracy > 70) { // for high accuracies, assumes that it's correct
echo "We think you meant $closestid $closest ($accuracy)\n";
} else {
echo "Did you mean:<BR>";
echo "$closestid $closest? ($accuracy)<BR>\n";
echo "$runner1id $runner1 ($runner1acc)<BR>\n";
echo "$runner2id $runner2 ($runner2acc)<BR>\n";
}
This can be very complicated, and I am not personally aware of any good 3rd party libraries although I'm sure they exist. Others may be able to suggest some canned solutions, though.
I have written something similar from scratch a few times in the past. If you go down that route, it is probably not something you'd want to do in PHP by itself as every query would involve getting all of the records and performing your calculations on them. It will almost certainly involve creating a set of index tables that meet your specifications.
For instance, you would have to come up with rules for how you imagine that "Milwaukee" could end up spelled "milwakee." My solution to this was to do vowel compression and duplication compression (not sure if these are actually search terms). So, milwaukee would be indexed as:
milwaukee
m_lw__k__
m_lw_k_
When the search query came in for "milwaukee", I would run the same process on the text input, and then run a search on the index table for:
SELECT cityId,
COUNT(*)
FROM myCityIndexTable
WHERE term IN ('milwaukee', 'm_lw__k__', 'm_lw_k_')
GROUP BY cityId
When the search query came in for "milwakee", I would run the same process on the text input, and then run a search on the index table for:
SELECT cityId,
COUNT(*)
FROM myCityIndexTable
WHERE term IN ('milwaukee', 'm_lw_k__', 'm_lw_k_')
GROUP BY cityId
In the case of Milwaukee (spelled correctly), it would return "3" for the count.
In the case of Milwakee (spelled incorrectly), it would return "2" for the count (since it would not match the m_lw__k__ pattern as it only had one vowel in the middle).
If you sort the results based on the count, you would end up meeting one of your rules, that "Milwaukee" would end up being sorted higher as a possible match than "Milwakee."
If you want to build this system in a generic way (as hinted by your use of $table in the query) then you'd probably need another mapping table somewhere in there to map your terms to the appropriate table.
I'm not suggesting this is the best (or even a good) way to go about this, just something I've done in the past that might prove useful to you if you plan to try and do this without a third party solution.
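The vowel and duplication compression described above could be sketched roughly as follows. The function name index_terms is hypothetical, and the sketch assumes plain ASCII input with only a, e, i, o, u treated as vowels.

```php
<?php
/**
 * Hypothetical helper: build the index terms described above for one word.
 * Returns the lowercased word, its vowel-masked form, and that form with
 * runs of repeated characters collapsed.
 */
function index_terms(string $word): array
{
    $plain = strtolower($word);
    $masked = preg_replace('/[aeiou]/', '_', $plain);      // vowel compression
    $collapsed = preg_replace('/(.)\1+/', '$1', $masked);  // duplication compression
    return array($plain, $masked, $collapsed);
}

print_r(index_terms('Milwaukee')); // milwaukee, m_lw__k__, m_lw_k_
print_r(index_terms('milwakee'));  // milwakee, m_lw_k__, m_lw_k_
```

The same function would be run once per city when building the index table and once on each incoming search term before the IN (...) query.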
The most maddening result with LIKE is a pattern such as "%man": it will also return every "woman" in the table!
For a listing, a not-too-bad solution may be to keep shortening the search needle. In your case a match will come up once the search string is as short as "milwa".
