mysql LIKE = imprecise search - php

I'm using a simple query for my search:
SELECT * FROM table WHERE field LIKE '%term%'
If I have a field containing "Company Name 123" and I search for "Company 123", the result is empty.
How can I improve this? It only finds a match if the words of the term appear in sequence.

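Replace spaces with %:
// Turn "Company 123" into "Company%123" so any text may appear between the words
$newTerm = str_replace(' ', '%', $term);
// mysql_* was removed in PHP 7, so mysqli is used here; escape before embedding in SQL
$newTerm = mysqli_real_escape_string($conn, $newTerm);
$sql = "SELECT * FROM table WHERE field LIKE '%$newTerm%'";
$r = mysqli_query($conn, $sql);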
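A slightly more defensive sketch of the same idea, assuming you bind the pattern through a prepared statement instead of interpolating it. The helper name `likeAllWordsInOrder()` is hypothetical. It escapes any `\`, `%` or `_` the user typed so they match literally (`\` is MySQL's default LIKE escape character), then joins the words with `%`:

```php
<?php
// Hypothetical helper: build a LIKE pattern that matches the words of $term
// in the given order, with anything (or nothing) between them.
function likeAllWordsInOrder(string $term): string
{
    // Split on runs of whitespace and drop empty pieces
    $words = preg_split('/\s+/', trim($term), -1, PREG_SPLIT_NO_EMPTY);

    // Escape LIKE's special characters so user input is matched literally;
    // the backslash is replaced first so the escapes added afterwards survive
    $escaped = array_map(function (string $w): string {
        return str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $w);
    }, $words);

    return '%' . implode('%', $escaped) . '%';
}

// Usage with a mysqli prepared statement (assumes a connection in $conn):
// $stmt = $conn->prepare('SELECT * FROM `table` WHERE field LIKE ?');
// $pattern = likeAllWordsInOrder($term);
// $stmt->bind_param('s', $pattern);
// $stmt->execute();
```

The words must still appear in the given order: "123 Company" would not match "Company Name 123".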

You need to put a % between "Company" and "123" for it to match. You might also want to look into MySQL's full-text search functions.

Try replacing the spaces with %:
$searchtext = str_replace(' ', '%', $searchtext);

You could:
- split your search term into words and build a query with a lot of ANDs out of it (or ORs if you just want to find one of the parts); ugly, but I've seen this a lot of times
- replace ' ' (space) with % (that's a wildcard) in your term (the way to go)

Related

Find most popular strings in a table

I am using the MyISAM engine with full-text indexing to store a list of strings.
These strings can be a single word or a sentence.
If I want to know how many times the string "hello" appears in my table, I do:
SELECT COUNT(*) Total
FROM String s
WHERE
MATCH (s.name) AGAINST ('hello')
I would like to create a similar report, but for all strings. The result should be a list of the TOP-N strings that are the most common in this table (the top ones are most probably "the", "a", "to", etc.).
Exact match case is pretty obvious:
SELECT name AS String, COUNT(*) AS Total
FROM String
GROUP BY name
ORDER BY Total DESC
LIMIT *some number*
But it counts only whole strings.
Is there any way to achieve my desired result?
Thanks.
I guess there is no easy way to do this. I would create a "statistic table" for this purpose only: one column for the words themselves, one column for the number of occurrences (with the primary key on the first column, of course).
Fill it with a procedural block (PL/SQL in Oracle; a stored procedure in MySQL) that scans all strings and splits them into words.
If a word is not found in the statistic table, insert a new row.
If the word is found in the statistic table, increase the value in the second column.
This can run for a pretty long time, but once the first run is finished you only have to process new strings on insert, perhaps with a trigger (assuming you want to use it regularly, not just once).
Hope this helps, I have no simpler answer.
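The statistic-table idea can be sketched in PHP as a minimal in-memory version, using a hypothetical `addToWordStats()` helper. In MySQL, each increment would typically be a single upsert such as `INSERT INTO word_stats (word, total) VALUES (?, 1) ON DUPLICATE KEY UPDATE total = total + 1`, where `word_stats` is a hypothetical table with `word` as its primary key. Calling the helper only for newly inserted strings gives the incremental update described above:

```php
<?php
// Hypothetical helper: split $text into lowercase words and increment
// the per-word counters in $stats (word => number of occurrences).
function addToWordStats(array &$stats, string $text): void
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // New word: start at 1; known word: increase the counter
        $stats[$word] = ($stats[$word] ?? 0) + 1;
    }
}

$stats = [];
// Initial (long) run over all existing strings
foreach (['Hello world', 'hello again'] as $existing) {
    addToWordStats($stats, $existing);
}
// Later, only the newly inserted string has to be processed
addToWordStats($stats, 'Goodbye world');
```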
I think using LIKE will work:
select name, count(*) as total from String where name like '%hello%' group by name order by total desc
Let me know.
I didn't find any solution using SQL and my full-text index, but I managed to get the desired result by fetching all of the strings from the DB and processing them in the PHP backend:
// Get all strings from the DB
$queryResult = $db->query("SELECT name as String FROM String");
// Collect them into an array (initialised so implode() also works when there are no rows)
$stringArray = [];
while($row = $queryResult->fetch_array(MYSQLI_ASSOC)) {
$stringArray[] = $row['String'];
}
// "Glue" all these strings into one huge string
$text = implode(" ", $stringArray);
// Make everything lowercase
$textLowercase = strtolower($text);
// Find all words (anything other than a-z, including digits and accented letters, acts as a separator)
$result = preg_split('/[^a-z]/', $textLowercase, -1, PREG_SPLIT_NO_EMPTY);
// Filter out single characters and some unwanted stop words
$result = array_filter($result, function($x){
return !preg_match("/^(.|the|and|of|to|it|in|or|is|a|an|not|are)$/",$x);
});
// Count the number of occurrences of each word
$result = array_count_values($result);
// Sort by count, highest first
arsort($result);
// Select the TOP-N words, where N is $amount (an integer set beforehand)
$result = array_slice($result, 0, $amount);
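Gluing every row into one huge string keeps all the text in memory more than once. A variant (a sketch, not the answer's own code) counts words string by string instead; the function name `topWords()` and the stop-word list passed to it are assumptions, and `$strings` can be an array or a generator of strings:

```php
<?php
// Hypothetical helper: count words across $strings one string at a time
// and return the $n most frequent ones as word => count.
function topWords(iterable $strings, int $n, array $stopWords = []): array
{
    $stop = array_flip($stopWords); // constant-time lookups
    $counts = [];
    foreach ($strings as $string) {
        $words = preg_split('/[^a-z]+/', strtolower($string), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            // Skip single letters and stop words
            if (strlen($word) < 2 || isset($stop[$word])) {
                continue;
            }
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent first
    return array_slice($counts, 0, $n, true);
}

$top = topWords(['Hello world', 'hello again, world', 'the end'], 2, ['the', 'and', 'of']);
```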

PHP SQL ignore all characters before "-" like statement

I'm trying to select an MSSQL row, but I want to ignore the characters in a certain column that come before a certain character.
This is the column format:
|5555-55555|
I want to ignore everything before the dash "-" and only check whether my variable $search appears after it. Then that's the row I want. I would prefer an exact match. Is this possible, or do I have to create a new column with just the number after the dash?
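$search = '55555';
$query .= "WHERE $columnName LIKE '%$search%'";
Anchor the pattern on the dash instead; this will match any *-55555:
$search = '55555';
// Don't quote the column name: '$columnName' would be compared as a string literal
$query .= "WHERE $columnName LIKE '%-$search'";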
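Another option, which compares only the part after the first dash and does not treat `%` or `_` in `$search` as wildcards, is sketched below. It assumes SQL Server: the T-SQL condition `SUBSTRING(col, CHARINDEX('-', col) + 1, LEN(col)) = ?` does the comparison, and the hypothetical PHP helper `partAfterDash()` mirrors the same rule so you can check it against sample values. The table name `items` is a placeholder, and the column name must come from a whitelist because identifiers cannot be bound as parameters:

```php
<?php
// T-SQL query comparing only the text after the first dash (exact match).
// `items` is a placeholder table; $column must come from a whitelist.
function buildAfterDashQuery(string $column): string
{
    return "SELECT * FROM items WHERE "
        . "SUBSTRING($column, CHARINDEX('-', $column) + 1, LEN($column)) = ?";
}

// PHP mirror of the same rule, handy for checking sample values.
function partAfterDash(string $value): string
{
    $pos = strpos($value, '-');
    // CHARINDEX returns 0 when there is no dash, so the whole value is compared
    return $pos === false ? $value : substr($value, $pos + 1);
}

// Usage with the sqlsrv extension (assumes an open connection in $conn):
// $stmt = sqlsrv_query($conn, buildAfterDashQuery('code'), [$search]);
```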

how do i get PHP / REGEXP to match all keywords, instead of any?

I have a PHP / MySQL search function that uses one or more keywords in the value $keywords_search. The problem is that it matches ANY of the keywords: if someone searches for "milk beer", it returns results that contain just "milk" or just "beer". How do I change it so that the results must contain ALL keywords in the string?
The following is the code:
$query[] = "((a.name REGEXP '( )*(" . str_replace(' ', ')*( )*(', $keywords_search) . ")( )*')
OR (a.description REGEXP '( )*(" . str_replace(' ', ')*( )*(', $keywords_search) . ")( )*'))";
Any help is greatly appreciated!
You'd be better off putting your keywords into a normalized child table, which makes such searches trivial. MySQL's regex matching doesn't allow you to specify how many of a series of alternations should be matched, it just tells you if ANY of them matched.
For example:
SELECT id, COUNT(DISTINCT keyword) AS cnt
FROM keywords
WHERE keyword IN ('a', 'b', 'c')
GROUP BY id
HAVING (cnt = 3)
would bring up only those records where any particular id has all three keywords present.
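With a normalized child table, the `IN (...)` list and the expected count can be generated from the search string. A sketch with a hypothetical `buildAllKeywordsQuery()` helper that returns SQL with `?` placeholders plus the values to bind; `keywords(id, keyword)` is the child table from the example above. Duplicate search words are removed so the expected count matches `COUNT(DISTINCT keyword)`:

```php
<?php
// Hypothetical helper: SQL returning the ids that have ALL the search words
// in the keywords(id, keyword) child table, plus the values to bind.
function buildAllKeywordsQuery(string $search): array
{
    // Lowercase and de-duplicate so the expected count matches COUNT(DISTINCT ...)
    $words = array_values(array_unique(
        preg_split('/\s+/', strtolower(trim($search)), -1, PREG_SPLIT_NO_EMPTY)
    ));
    if ($words === []) {
        throw new InvalidArgumentException('Search string contains no words');
    }
    $placeholders = implode(', ', array_fill(0, count($words), '?'));
    $sql = "SELECT id FROM keywords WHERE keyword IN ($placeholders) "
         . "GROUP BY id HAVING COUNT(DISTINCT keyword) = ?";
    // One value per IN placeholder, then the number of words for HAVING
    return [$sql, array_merge($words, [count($words)])];
}

[$sql, $params] = buildAllKeywordsQuery('Milk beer milk');
// With PDO: $stmt = $pdo->prepare($sql); $stmt->execute($params);
```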
With the regex version and an un-normalized string field, you'd have to split the keywords into multiple separate regexes and chain them with AND, e.g.:
SELECT ...
FROM ...
WHERE field REGEXP 'keyword1' AND field REGEXP 'keyword2' AND ...
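You could do something like this with your keywords:
// e.g. "milk beer" -> array('milk', 'beer'); escape each keyword before using it in real code
$keywords = explode(' ', $keywords_search);
$keywordQuery = [];
foreach ($keywords as $keyword) {
// each keyword may appear in either the name OR the description...
$keywordQuery[] = "((a.name REGEXP '( )*({$keyword})( )*') OR (a.description REGEXP '( )*({$keyword})( )*'))";
}
// ...but ALL keywords must be present
$query[] = implode(' AND ', $keywordQuery);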
Better than REGEXP: if you can, use MySQL's full-text search instead (see "Boolean Full-Text Searches" in the MySQL manual).
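For the Boolean full-text route, each word can be prefixed with `+` so that every word is required. A sketch, assuming a `FULLTEXT` index on `(name, description)` (the column list in `MATCH()` has to correspond to a full-text index); the helper name and `your_table` are hypothetical. It strips the characters that act as Boolean-mode operators, so user input cannot change the query's logic. Very short words and stopwords are not indexed, so they may not behave as expected:

```php
<?php
// Hypothetical helper: turn "milk beer" into "+milk +beer" for
// MATCH(...) AGAINST(? IN BOOLEAN MODE), so that every word is required.
function booleanAllWords(string $search): string
{
    // Replace Boolean-mode operator characters typed by the user with spaces
    $clean = preg_replace('/[+\-<>()~*"@]/', ' ', $search);
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map(function (string $w): string {
        return '+' . $w;
    }, $words));
}

// `your_table` is a placeholder; bind $against to the single placeholder
$sql = 'SELECT * FROM your_table AS a '
     . 'WHERE MATCH(a.name, a.description) AGAINST(? IN BOOLEAN MODE)';
$against = booleanAllWords('milk beer');
```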

How to search the random words from a phrase using PHP?

My original pname in the table 'english' is "The Digital Santa Monica Mug". If users search for "Digital Mug", it does not return the product whose pname contains "Digital" and "Mug".
I am using this query:
select *
from english
where((pname like '%$val%'
or desp1 like '%$search%'
or pid like '%$search%' $key_value)
and warehouse=0 and cid !=49)
group by pid;
Use pname like '%".implode('%', explode(' ', $val))."%' instead of pname like '%$val%'.
In this case the order matters: "Digital Mug" will give you a result, but "Mug Digital" won't.
Use full-text search for that.
It doesn't work for "The Digital Santa Monica Mug" because a search for "Digital Mug" becomes '%Digital Mug%', which only matches values containing the exact phrase "Digital Mug", with any words before and after it.
E.g.: The Digital Mug Paradise
Such a text would be matched.
So try MySQL FULL TEXT SEARCH for that.
Either do what The C Man advised (split the search phrase and search for every word), or use full-text search.
For the "splitting words" method, I'd advise you to:
- use regular expressions for splitting, something like preg_match_all('#[a-zA-Z0-9]+#', $text, $words); (you don't need to search for symbols like "$", do you?)
- write a function that generates the WHERE clause for you.
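Note that `preg_match_all()` stores the full matches in `$words[0]`, not in `$words` itself. A small sketch of the splitting step, wrapped in a hypothetical helper:

```php
<?php
// Hypothetical helper: extract alphanumeric words, ignoring symbols like "$".
function splitWords(string $text): array
{
    preg_match_all('#[a-zA-Z0-9]+#', $text, $matches);
    return $matches[0]; // index 0 holds the text of each full match
}
```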
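A function for generating the WHERE clause might look like this:
function generateFilter(array $fields, array $words) {
// prepare $word for putting into SQL statement
foreach ( $words as &$word ) {
// ensure that LIKE's special characters are matched literally
// (the escape character itself must be escaped first)
$word = str_replace('\\', '\\\\', $word);
$word = str_replace('%', '\\%', $word);
$word = str_replace('_', '\\_', $word);
// prevent SQL injections (mysql_real_escape_string() was removed in PHP 7;
// there, use mysqli_real_escape_string($link, $word) or prepared statements)
$word = mysql_real_escape_string($word);
}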
unset($word);
// generate filter
$filter = array();
foreach ( $words as $word ) {
$wordFilter = array();
foreach ( $fields as $field ) {
$wordFilter[] = "{$field} like '%$word%'";
}
$filter[] = implode(' or ', $wordFilter);
}
$filter = '(' . implode(') and (', $filter) . ')';
return $filter;
}
$filter = generateFilter(
array('name', 'surname', 'address'),
array('john', 'doe')
);
echo $filter;
Result:
(name like '%john%' or surname like '%john%' or address like '%john%') and
(name like '%doe%' or surname like '%doe%' or address like '%doe%')
If you use prepared statements (which is highly advised), this function would be a bit more complicated, as the resulting string would contain placeholders for the values, while $words would go into an array of values that have to be bound to the prepared statement.
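A sketch of that prepared-statement variant, under these assumptions: the helper name `generateFilterWithParams()` and the `people` table are hypothetical, the field names come from your own code (never from user input, since identifiers cannot be bound), and the returned values are bound in placeholder order, e.g. with PDO's `execute()`. LIKE wildcards are still escaped so that `%` and `_` typed by the user match literally:

```php
<?php
// Returns [whereClause, params]; every word must appear in at least one field.
function generateFilterWithParams(array $fields, array $words): array
{
    $filter = [];
    $params = [];
    foreach ($words as $word) {
        // Escape LIKE wildcards (backslash first), then wrap in % for "contains"
        $pattern = '%' . str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $word) . '%';
        $wordFilter = [];
        foreach ($fields as $field) {
            $wordFilter[] = "{$field} like ?";
            $params[] = $pattern; // one bound value per placeholder
        }
        $filter[] = implode(' or ', $wordFilter);
    }
    return ['(' . implode(') and (', $filter) . ')', $params];
}

[$where, $params] = generateFilterWithParams(['name', 'surname'], ['john', 'doe']);
// $stmt = $pdo->prepare("SELECT * FROM people WHERE $where");
// $stmt->execute($params);
```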
The "splitting words" method works for small strings and small amounts of data. If you have huge amounts of data and/or large strings, consider using full-text search. It does not require splitting the search phrase, though it has some limitations: it needs a full-text index on the columns used for searching (IIRC, you can create an index on multiple columns and then use full-text search on all indexed columns at the same time, i.e., you don't have to search every column separately), it has a minimum keyword length, and it might give non-strict results, e.g., sometimes only 3 of 5 keywords might appear in a result. On the other hand, it gives a relevance score for every result: results that are closer to the search terms have higher relevance, which is useful for sorting results by relevance.
While creating an index may seem like "extra work" for you, it allows the DBMS to perform the search faster than without an index.
You could split the input value into separate words. To do this, use:
$term_array = explode(" ", $val);
Now $term_array holds the words separately, and you can run queries on the words individually. For example, you could run the query twice, once for each word. However, doing this would result in duplicates (and likely some unnecessary results). You could probably think of a query using the separated words that would yield better results, though.
To construct the query:
$split = explode(" ", $val);
// any of the words may match (OR); note the closing %' after the last word
$qry_pname = "pname LIKE '%".implode("%' or pname LIKE '%", $split)."%'";
$qry = "
SELECT *
FROM english
WHERE(( $qry_pname
or desp1 like '%$search%'
or pid like '%$search%' $key_value)
and warehouse=0 and cid !=49)
group by pid;
";

Compare words, also need to look for plurals and ing?

I have two lists of words, say LIST1 and LIST2. I want to compare LIST1 against LIST2 to find the duplicates, but it should also find the plural and the -ing form of each word. For example:
Suppose LIST1 has the word "account", and LIST2 has the words "accounts" and "accounting". When I compare them, the result should show two matches for the word "account".
I am doing it in PHP and have the lists in MySQL tables.
You can use a technique called Porter stemming to map each list entry to its stem, then compare the stems. An implementation of the Porter stemming algorithm in PHP can be found here or here.
What I would do is compare your word directly to LIST2 and, at the same time, remove your word from every word you're comparing it with, looking for a leftover "ing", "s" or "es" that denotes an -ing or plural form (this should be accurate enough). If not, you'll have to write an algorithm for making plurals out of words, as it's not as simple as adding an S.
Duplicate Ending List: s, es, ing
LIST1: Gas, Test
LIST2: Gases, Tests, Testing
Now compare LIST1 to LIST2. In the same comparison loop, do a direct comparison of the items, and another one in which the word from LIST1 is removed from the current word you're looking at in LIST2. Then just check whether the result is in the Duplicate Ending List.
Hope that makes sense.
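The suffix comparison above could look roughly like this in PHP. It is a sketch with a hypothetical `findDuplicates()` helper that compares case-insensitively and only strips the base word from the front of each candidate; because it only knows the listed endings, irregular forms are missed:

```php
<?php
// Hypothetical helper: for each word in $list1, collect the words in $list2
// that equal it or equal it plus one of the allowed endings.
function findDuplicates(array $list1, array $list2, array $endings = ['s', 'es', 'ing']): array
{
    $result = [];
    foreach ($list1 as $base) {
        $baseLower = strtolower($base);
        foreach ($list2 as $candidate) {
            $candLower = strtolower($candidate);
            if ($candLower === $baseLower) {
                $result[$base][] = $candidate; // direct match
            } elseif (strpos($candLower, $baseLower) === 0) {
                // Remove the base word from the front and check what is left over
                $leftover = substr($candLower, strlen($baseLower));
                if (in_array($leftover, $endings, true)) {
                    $result[$base][] = $candidate;
                }
            }
        }
    }
    return $result;
}

$matches = findDuplicates(
    ['Gas', 'Test', 'account'],
    ['Gases', 'Tests', 'Testing', 'accounts', 'accounting']
);
```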
The problem with that is that, in English at least, not all plurals are formed with standard endings, and neither are present participles. You can make an approximation by using all words +'ing' and +'s', but that will give false positives and negatives.
You can handle it directly in MySQL if you wish (in MySQL, + is numeric addition, so use CONCAT() to append the endings):
SELECT DISTINCT l2.word
FROM LIST1 l1, LIST2 l2
WHERE l1.word = l2.word OR CONCAT(l1.word, 's') = l2.word OR CONCAT(l1.word, 'ing') = l2.word;
This function will output the plural of a word.
http://www.exorithm.com/algorithm/view/pluralize
Something similar could be written for gerunds and present participles (ing forms)
You might consider using the Doctrine Inflector class in conjunction with a stemmer for this.
Here's the algorithm at a high level:
1. Split the search string on spaces and process the words individually
2. Lowercase the search word
3. Strip special characters
4. Singularize, replacing the differing portion with a wildcard ('%')
5. Stem, replacing the differing portion with a wildcard ('%')
Here's the function I put together:
/**
* Use inflection and stemming to produce a good search string to match subtle
* differences in a MySQL table.
* Depends on Inflector::singularize() and a stem_english() stemmer being available.
*
* @param string $sInputString The string you want to base the search on
* @param string $sSearchTable The table you want to search in
* @param string $sSearchField The field you want to search
* @return string The generated SELECT query
*/
function getMySqlSearchQuery($sInputString, $sSearchTable, $sSearchField)
{
$aInput = explode(' ', strtolower($sInputString));
$aSearch = [];
foreach($aInput as $sInput) {
$sInput = str_replace("'", '', $sInput);
//--------------------
// Inflect
//--------------------
$sInflected = Inflector::singularize($sInput);
// Replace the part of the inflected string where it differs from the input string
// with a % (wildcard) for the MySQL query; XOR yields "\0" wherever the strings
// agree, so strspn() returns the length of the common prefix
$iPosition = strspn($sInput ^ $sInflected, "\0");
if($iPosition < strlen($sInput)) {
$sInput = substr($sInflected, 0, $iPosition) . '%';
}
//--------------------
// Stem
//--------------------
$sStemmed = stem_english($sInput);
// Replace the part of the stemmed string where it differs from the input string
// with a % (wildcard) for the MySQL query
$iPosition = strspn($sInput ^ $sStemmed, "\0");
if($iPosition < strlen($sInput)) {
$aSearch[] = substr($sStemmed, 0, $iPosition) . '%';
} else {
$aSearch[] = $sInput;
}
}
$sSearch = implode(' ', $aSearch);
return "SELECT * FROM $sSearchTable WHERE LOWER($sSearchField) LIKE '$sSearch';";
}
I ran it with several test strings:
Input String: Mary's Hamburgers
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'mary% hamburger%';
Input String: Office Supplies
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'offic% suppl%';
Input String: Accounting department
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'account% depart%';
Probably not perfect, but it's a good start anyway! Where it falls down is when multiple matches are returned: there's no logic to determine the best match. That's where things like MySQL full-text search and Lucene come in. Thinking about it a little more, you might be able to use the Levenshtein distance to rank multiple results with this approach!
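The Levenshtein idea could be sketched with PHP's built-in `levenshtein()` function, used here to sort the rows a LIKE query returned by their distance to the user's input. The helper name `rankByLevenshtein()` is hypothetical, and `levenshtein()` works on bytes, so multibyte characters count as several edits:

```php
<?php
// Hypothetical helper: order candidate matches by Levenshtein distance
// to the user's input (smaller distance = closer match), case-insensitively.
function rankByLevenshtein(string $input, array $candidates): array
{
    $needle = strtolower($input);
    usort($candidates, function (string $a, string $b) use ($needle): int {
        return levenshtein($needle, strtolower($a)) <=> levenshtein($needle, strtolower($b));
    });
    return $candidates;
}

$ranked = rankByLevenshtein('Marys Hamburgers', [
    'Mary Hamburger Hut',
    'Harry Hamburgers',
    "Mary's Hamburgers",
]);
```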
