SQL Find similar word and order by matching - php

How can I find similar words in SQL? For example, I have these words:
hell, llo, hl, lh
I want to find the words that match "hello" or "helloworld",
ordered by the most matching letters.
I tried LIKE, but it limited the results.

You may have some luck with fulltext indexes

I think you are looking for something like this:
SELECT words.*
FROM words
WHERE 'hello' LIKE CONCAT('%', word, '%')
ORDER BY
LENGTH(word) DESC
Please note that this query won't be very fast, as it can't make use of an index.
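The reversed LIKE can be tried without a MySQL server. Here is a minimal sketch in Python using the built-in sqlite3 module, which is an assumption made purely for illustration: SQLite writes CONCAT as || and LENGTH as length, and the helper name matching_words is made up. The table holds the sample words from the question.

```python
import sqlite3

def matching_words(search, words):
    """Return the stored words that occur inside `search`, longest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
    # Same idea as the MySQL query:
    #   WHERE 'hello' LIKE CONCAT('%', word, '%') ORDER BY LENGTH(word) DESC
    rows = conn.execute(
        "SELECT word FROM words "
        "WHERE ? LIKE '%' || word || '%' "
        "ORDER BY length(word) DESC, word",
        (search,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

print(matching_words("hello", ["hell", "llo", "hl", "lh"]))  # ['hell', 'llo']
```

Note that a stored word containing % or _ would act as a wildcard in this pattern, so such characters would need escaping in real data.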

Try using the LOCATE function, which returns the position of the first occurrence of a needle in a haystack (0 if it is not found). You could test each of your terms with IF(LOCATE(...) > 0, 1, 0) and add up the results:
SELECT * FROM tablename
ORDER BY
(
IF(LOCATE('hell', field_to_search) > 0, 1, 0)
+ IF(LOCATE('llo', field_to_search) > 0, 1, 0)
+ IF(LOCATE('hl', field_to_search) > 0, 1, 0)
+ IF(LOCATE('lh', field_to_search) > 0, 1, 0)
) DESC
If field_to_search contains (case insensitively) all four terms, it will have 4 in that order by field, and zero if it doesn't match any etc. This won't limit your results like WHERE field_to_search LIKE '%hell%' etc.
Using the IF(LOCATE structure will also allow you to give different weighting to different search terms, e.g. if it matches hell you might give it 4 points, llo might give it 3. So it could theoretically match hl and lh but not hell or llo and still come up above a term that matches the top two (if you weight the bottom two higher than the top etc).
Documentation: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate
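To see the weighting idea in action, here is a rough sketch in Python with the built-in sqlite3 module. This is an illustration under assumptions, not MySQL itself: SQLite has no LOCATE or IF, so the sketch uses instr(haystack, needle), which takes its arguments in the opposite order and is case sensitive. The function name rank_rows and the sample rows are made up.

```python
import sqlite3

def rank_rows(rows, weighted_terms):
    """Order `rows` by the summed weights of the search terms each one contains."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (field_to_search TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?)", [(r,) for r in rows])
    # One "(instr(...) > 0) * weight" term per search term; MySQL would write
    # IF(LOCATE(term, field_to_search) > 0, weight, 0) instead.
    score = " + ".join("(instr(field_to_search, ?) > 0) * ?" for _ in weighted_terms)
    params = [p for term, weight in weighted_terms for p in (term, weight)]
    result = conn.execute(
        f"SELECT field_to_search, {score} AS score FROM tablename "
        "ORDER BY score DESC, field_to_search",
        params,
    ).fetchall()
    conn.close()
    return result

terms = [("hell", 4), ("llo", 3), ("hl", 1), ("lh", 1)]
print(rank_rows(["hello", "hl", "world"], terms))
# [('hello', 7), ('hl', 1), ('world', 0)]
```

Rows that match nothing still come back with a score of 0, which is the "won't limit your results" behaviour described above.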

Related

Copy column B to column A (sorta)

I'm trying to figure out a safe way (if possible) to take the "url" column and copy it into the "company" column ... but I don't just want to copy the whole thing - let me try to explain.
I want to copy the company name, for example apple, microsoft, etc. (from the path), and place it into the company column (to the left). I have 5000+ rows that need to be done, and done safely. They all have the same file path structure, starting with "../../images...."
Could I use something like UPDATE with SET?
UPDATE table SET company = url
Thanks for any feedback! I really appreciate it!
You can pick out the N left-most "fields" in a string separated by a character of your choice.
SELECT SUBSTRING_INDEX(url, '/', 4)
FROM mytable
LIMIT 10;
Returns:
../../images/apple
etc.
Then use a -1 to get the right-most field of that result to get the last one.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 4), '/', -1)
FROM mytable
LIMIT 10;
Returns:
apple
Once you are happy with the expression, use it in an UPDATE:
UPDATE mytable SET company =
SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 4), '/', -1);
See https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_substring-index
In MySQL 8 you can use the REGEXP_SUBSTR function to find a substring with a regular expression. Something like:
SELECT REGEXP_SUBSTR(url, '[^/]+', 14) FROM mytable;
This finds the first run of characters that are not a /, starting at character 14 (the part right after your leading ../../images/ string).
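Since the update has to be done safely, it may help to preview what the nested SUBSTRING_INDEX expression returns for a few paths before running it. The following is only a sketch in Python that mimics the expression outside the database; the function name company_from_url and the file names in the sample paths are hypothetical.

```python
def company_from_url(url, field=4, sep="/"):
    """Mimic SUBSTRING_INDEX(SUBSTRING_INDEX(url, sep, field), sep, -1)."""
    parts = url.split(sep)
    # SUBSTRING_INDEX(url, sep, field) keeps at most the first `field` parts;
    # the outer call with -1 then keeps only the last of those parts.
    return parts[:field][-1]

print(company_from_url("../../images/apple/logo.png"))  # apple
print(company_from_url("../../images"))                 # images
```

The second call shows the case to watch for: a row with fewer than four fields does not fail, it silently yields the last field it has, so checking the SELECT output for odd values before the UPDATE is worthwhile.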

SQL Query for closest match at the beginning of a string

I am currently using MySQL and PHP.
I am looking for a query that will take a number and find the closest match for the beginning of a set of digits. For example, I have the numbers 019235678910, 026725678910 and 026825678910, and my table looks like this.
Table - Destination
Name      Number
Watford   01923
Oxford    026
Romford   026
Crawford  0267
Topford   02672
So when I pass 019235678910 the result would be Watford, 026725678910 would be Topford and 026825678910 would be Oxford and Romford.
I'm also not sure whether MySQL can do this directly or whether it would need to work in conjunction with PHP.
Here is one way to get all of them:
-- YOURVALUE is the quoted search string, e.g. '026725678910'
select d.*
from Destination d join
(select length(Number) as maxlen, Number
from Destination
where YOURVALUE like concat(Number, '%')
order by maxlen desc
limit 1
) dsum
on d.Number = dsum.Number
Because you are looking for initial sequences, there is only one maximum match on the numbers (hence the limit 1 works).
By the way, the field called number is clearly a character field. Personally, I think it bad practice to call a character field "number" -- something called cognitive dissonance.
SELECT Name, Number
FROM Destination
WHERE LEFT('026725678910', LENGTH(Number)) = Number
or perhaps
WHERE '026725678910' LIKE CONCAT(Number, '%')
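For anyone who wants to check the expected results from the question, here is a small sketch in Python with the built-in sqlite3 module. It is an illustration under assumptions rather than the exact MySQL from the answers: SQLite writes CONCAT as ||, and the sketch keeps the longest prefix with a max(length(...)) subquery instead of ORDER BY ... LIMIT 1. The function name closest_destinations is made up.

```python
import sqlite3

DESTINATIONS = [
    ("Watford", "01923"),
    ("Oxford", "026"),
    ("Romford", "026"),
    ("Crawford", "0267"),
    ("Topford", "02672"),
]

def closest_destinations(dialled):
    """Return the names whose Number is the longest prefix of `dialled`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Destination (Name TEXT, Number TEXT)")
    conn.executemany("INSERT INTO Destination VALUES (?, ?)", DESTINATIONS)
    rows = conn.execute(
        "SELECT Name FROM Destination "
        "WHERE ? LIKE Number || '%' "       # Number is a prefix of the input
        "AND length(Number) = ("            # keep only the longest such prefix
        "  SELECT max(length(Number)) FROM Destination WHERE ? LIKE Number || '%'"
        ") ORDER BY Name",
        (dialled, dialled),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

print(closest_destinations("019235678910"))  # ['Watford']
print(closest_destinations("026725678910"))  # ['Topford']
print(closest_destinations("026825678910"))  # ['Oxford', 'Romford']
```

Without the length condition, the plain prefix test would also return Oxford, Romford and Crawford for 026725678910, because every shorter prefix matches too.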

MySQL Full-text search - search for short words

I have a problem. I made a simple search engine which searches by brand and model of car. For reasons of query performance and a lot of data in database, I decided to use full-text search. It's ok, but now I come across the problem:
I would like to find all cars with brand "Audi" and with model "Q7". For now, I have this SQL query, but it doesn't work right, because of word length "Q7":
SELECT `a`.`id`, `a`.`title`, `a`.`askprice`, `a`.`description`, `a`.`picture`
FROM (`mm_ads` as a)
WHERE `a`.`category` = '227'
AND `a`.`askprice` >= '0'
AND `a`.`askprice` <= '144000'
AND (MATCH(a.title) AGAINST ('+audi +q7' IN BOOLEAN MODE ))
GROUP BY `a`.`id`
ORDER BY `a`.`id` ASC
LIMIT 30
I don't have access to the MySQL config file, so I can't set ft_min_word_len to 2. For now the value is 3. Is there any other way to deal with that?
Here is another problem:
I would like to get all cars brand "BMW" and model "116". For example, I have a car named BMW, 1, 116i. My SQL query is:
SELECT `a`.`id`, `a`.`title`, `a`.`askprice`, `a`.`description`, `a`.`picture`
FROM (`mm_ads` as a)
WHERE `a`.`category` = '227'
AND `a`.`askprice` >= '0'
AND `a`.`askprice` <= '144000'
AND (MATCH(a.title) AGAINST ('+bmw +116' IN BOOLEAN MODE))
GROUP BY `a`.`id`
ORDER BY `a`.`id` ASC
LIMIT 30
The search returns 0 rows. Why? Both input strings ("BMW", "116") meet the minimum length of 3. What am I doing wrong?
Regards, Mario
I had a similar issue when dealing with MATCH ... AGAINST (regarding text length), and my answer was to check the length of the string first (strlen) and switch between LIKE and MATCH ... AGAINST for shorter words. Not what I would call graceful, but it was all I could do since I too had no access to the config.
As for the second question, are you sure the default isn't 4? I recall I couldn't search on the term "art" in my case. 3 letters. I had to go with LIKE on everything below 4 chars.
Unless you have access to the config file and can change it I fear there is very little to do.
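The length switch described above could look roughly like this. It is a sketch in Python rather than PHP, and everything in it is an assumption for illustration: the function name title_conditions, the %s placeholder style (as used by common Python MySQL drivers), and the default minimum word length of 4.

```python
def title_conditions(terms, min_word_len=4):
    """Build (sql_fragment, params) for a search on the `title` column.

    Terms of at least `min_word_len` characters go into one boolean-mode
    MATCH ... AGAINST; shorter terms fall back to LIKE.
    """
    long_terms = [t for t in terms if len(t) >= min_word_len]
    short_terms = [t for t in terms if len(t) < min_word_len]
    clauses, params = [], []
    if long_terms:
        clauses.append("MATCH(title) AGAINST (%s IN BOOLEAN MODE)")
        params.append(" ".join("+" + t for t in long_terms))
    for term in short_terms:
        clauses.append("title LIKE %s")
        params.append("%" + term + "%")
    return " AND ".join(clauses), params

print(title_conditions(["audi", "q7"]))
# ('MATCH(title) AGAINST (%s IN BOOLEAN MODE) AND title LIKE %s', ['+audi', '%q7%'])
```

The fragment is meant to be appended to the WHERE clause with the parameters bound by the driver. Terms containing boolean-mode operators or LIKE wildcards would still need cleaning first, which the sketch does not do.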
A change to ft_min_word_len requires a server restart and a full rebuild of the full text index.
As found here
Try this:
for this search: "bmw 116i"
(MATCH(a.title) AGAINST ('+bmw +116i "bmw 116i"' IN BOOLEAN MODE ))
not the best solution but might help...

PHP mysql search queries

I'm trying to create a search engine for an inventory based site. The issue is that I have information inside bbtags (like in [b]test[/b] sentence, the test should be valued at 3, whereas sentence should be valued at 1).
Here is an example of an index:
My test sentence, my my (has a SKU of TST-DFS)
The Database:
| Product | word     | relevancy |
| 1       | my       | 3         |
| 1       | test     | 1         |
| 1       | sentence | 1         |
| 1       | TST-DFS  | 10        |
But how would I match TST-DFS if the user typed in TST DFS? I would like that SKU to have a relevancy of say 8, instead of the full 10..
I have heard that the FULL TEXT search feature in MySQL would help, but I can't seem to find a good way to do it. I would like to avoid things like UNIONS, and to keep the query as optimized as possible.
Any help with coming up with a good system for this would be great.
Thanks,
Max
But how would I match TST-DFS if the user typed in TST DFS?
I would like that SKU to have a relevancy of say 8, instead of the full 10..
If I got the question right, the answer is actually easy.
Well, if you forge your query a little before sending it to mysql.
Ok, let's say we have $query and it contains TST-DFS.
Are we gonna focus on word spans?
I suppose we should, as most search engines do, so:
$ok=preg_match_all('#\w+#',$query,$m);
Now if that pattern matched... $m[0] contains the list of words in $query.
This can be fine-tuned to your SKU, but matching against full words in a AND fashion is pretty much what the user presumes is happening. (as it happens over google and yahoo)
Then we need to cook a $expr expression that will be injected into our final query.
if(!$ok) { // the search string contains no word characters
    $expr="false";
} else { // the words of the search are now in $m[0]
    $expr='';
    foreach($m[0] as $word) {
        if($expr)
            $expr.=" AND "; // put an AND in between "LIKE" subexpressions
        $s_word=addslashes($word); // I put a s_ to remind me the variable
                                   // is safe to include in a SQL statement, that's me
        $expr.="word LIKE '%$s_word%'";
    }
}
Now $expr should look like "word LIKE '%TST%' AND word LIKE '%DFS%'"
With that value, we can build the final query:
$s_expr="($expr)";
$s_query=addslashes($query); // a prepared statement or mysqli_real_escape_string() is safer
$s_fullquery=
"SELECT Product,word,if((word LIKE '$s_query'),relevancy,relevancy-2) as relevancy ".
"FROM some_index ".
"WHERE word LIKE '$s_query' OR $s_expr";
Which shall read, for "TST-DFS":
SELECT Product,word,if((word LIKE 'TST-DFS'),relevancy,relevancy-2) as relevancy
FROM some_index
WHERE word LIKE 'TST-DFS' OR (word LIKE '%TST%' AND word LIKE '%DFS%')
As you can see, in the first SELECT line, if the match is partial, mysql will return relevancy-2
In the third one, the WHERE clause, if the full match fails, $s_expr, the partial match query we cooked in advance, is tried instead.
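The whole approach can be tried end to end with a small sketch. This one is in Python with the built-in sqlite3 module instead of PHP and MySQL, which is an assumption made for illustration: it uses bound parameters instead of addslashes and CASE WHEN instead of IF. The function name search_index is made up, and the table is filled with the rows from the question.

```python
import re
import sqlite3

def search_index(conn, query):
    """Return (Product, word, relevancy) rows; partial matches lose 2 points."""
    words = re.findall(r"\w+", query)  # same pattern as the preg_match_all call
    if not words:
        return []  # nothing searchable, the counterpart of $expr = "false"
    partial = " AND ".join("word LIKE ?" for _ in words)
    sql = (
        "SELECT Product, word, "
        "CASE WHEN word LIKE ? THEN relevancy ELSE relevancy - 2 END AS relevancy "
        "FROM some_index "
        f"WHERE word LIKE ? OR ({partial})"
    )
    params = [query, query] + [f"%{w}%" for w in words]
    return conn.execute(sql, params).fetchall()

# Hypothetical index table holding the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_index (Product INTEGER, word TEXT, relevancy INTEGER)")
conn.executemany(
    "INSERT INTO some_index VALUES (?, ?, ?)",
    [(1, "my", 3), (1, "test", 1), (1, "sentence", 1), (1, "TST-DFS", 10)],
)

print(search_index(conn, "TST DFS"))  # [(1, 'TST-DFS', 8)]
print(search_index(conn, "TST-DFS"))  # [(1, 'TST-DFS', 10)]
```

One thing the sketch leaves alone: an underscore in the search string counts as a word character but is also a LIKE wildcard, so it would match any single character.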
I like to lower case everything and strip out special characters (like in a phone number or credit card I take everything out on both sides that isn't a number)
Rather than try to create your own FTS solution, you could try to fit the MySQL FTS engine to your requirements. What I've seen done is create a new table to store your FTS data. Create a column for each different piece of data that you want to have a different relevance. For your sku field you could store the raw sku, with spaces, underscores, hyphens and any other special character intact. Then store a stripped down version with all these things removed. You may also want to store a version with leading zeros removed, as people often leave things like that out. You can store all these variations in the same column. Store your product name in another column, and the product description in another column. Create a separate index on each column. Then when you do your search, you can search each column individually, and multiply the rank of the results based on how important you think that column is. So you could multiply sku results by 10, title by 5 and leave description results as is. You may have to do a little experimentation to get the results you want, but it may ultimately be simpler than creating your own index.
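Generating the SKU variations to store could be as simple as the following sketch. It is written in Python for illustration only; the function name sku_variants and the exact set of rules (strip non-alphanumerics, then strip leading zeros) are assumptions based on the description above.

```python
import re

def sku_variants(sku):
    """Return the distinct forms of a SKU to store in the full-text column."""
    raw = sku.strip()
    stripped = re.sub(r"[^0-9A-Za-z]", "", raw)  # drop spaces, hyphens, underscores...
    no_leading_zeros = stripped.lstrip("0")      # people often leave these out
    variants = []
    for candidate in (raw, stripped, no_leading_zeros):
        if candidate and candidate not in variants:
            variants.append(candidate)
    return variants

print(sku_variants("TST-DFS"))  # ['TST-DFS', 'TSTDFS']
print(sku_variants("00-123"))   # ['00-123', '00123', '123']
```

If the user's input is stripped the same way before searching, "TST DFS" becomes TSTDFS and finds the stored stripped variant.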
Create a keywords table. Something along the lines of:
integer keywordId (autoincrement) | varchar keyword | int pointValue
Assign all possible keywords, skus, etc, into this table. Create another table, a post-keywords bridge, (assuming postId is the id you've assigned in your original table) along the lines of:
integer keywordId | integer postId
Once you have this, you can easily add keywords to each post as it is inserted. To calculate the total point value for a given post, a query such as the following should do the trick:
SELECT sum(pointValue) FROM keywordPostsBridge kpb
JOIN keywords k ON k.keywordId = kpb.keywordId
WHERE kpb.postId = YOUR_INTENDED_POST
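Here is a quick way to try the two-table layout. The sketch uses Python with the built-in sqlite3 module, which is an assumption for illustration; the sample keywords and the bridge rows are made up, and COALESCE is added so that a post without keywords scores 0 instead of NULL.

```python
import sqlite3

def post_score(conn, post_id):
    """Sum the point values of all keywords attached to one post."""
    row = conn.execute(
        "SELECT COALESCE(SUM(k.pointValue), 0) "
        "FROM keywordPostsBridge kpb "
        "JOIN keywords k ON k.keywordId = kpb.keywordId "
        "WHERE kpb.postId = ?",
        (post_id,),
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keywords (
    keywordId INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT,
    pointValue INTEGER
);
CREATE TABLE keywordPostsBridge (keywordId INTEGER, postId INTEGER);
INSERT INTO keywords (keyword, pointValue)
    VALUES ('my', 3), ('test', 1), ('sentence', 1), ('TST-DFS', 10);
-- post 1 has 'my', 'test' and 'TST-DFS'; post 2 has only 'sentence'
INSERT INTO keywordPostsBridge (keywordId, postId)
    VALUES (1, 1), (2, 1), (4, 1), (3, 2);
""")

print(post_score(conn, 1))  # 14
print(post_score(conn, 2))  # 1
```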
I think the solution is quite straightforward unless I missed something.
Basically, run two searches: one is an exact match, the other is a LIKE match or regex match.
Join the two result sets together (LIKE match left join exact match). Then, for example:
final_relevancy = (IFNULL(like_relevancy, 0) + IFNULL(exact_relevancy, 0) * 3) / 4
I didn't try this myself though. Just an idea.
I would add a column that is stripped of all special characters and misspellings, and then upcased (or create a function that compares on text that has been stripped and upcased). That way your relevancy will be consistent.
/*
q and q1 - your table
this query takes too many resources,
turn it into an update query (a scheduled task, or call it on save if you are developing a new system)
*/
SELECT
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
/* chain more REPLACE calls for other junk characters,
or create a custom function,
or, if you have full db access, install this: https://launchpad.net/mysql-udf-regexp
*/
THEN REPLACE(REPLACE( word, '-', ' ' ), '#', ' ')
ELSE word
END word ,
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
THEN 8
ELSE relevancy
END relevancy
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q
UNION
SELECT *
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q1
This is the page code that shows the query result.
I did not use functions here, although using them would make the work easier.
<html>
<head>
</head>
<body>
<?php
//author S_A_KHAN
//date 10/02/2013
// mysqli is used throughout; the old mysql_* functions were removed in PHP 7
$dbconnect=mysqli_connect("127.0.0.1","root");
if (!$dbconnect)
{
    die ('unable to connect: '.mysqli_connect_error());
}
else
{
    echo "connection successfully <br>";
}
$data_base=mysqli_select_db($dbconnect,"connect");
if ($data_base==FALSE){
    die ('unable to select database: '.mysqli_error($dbconnect));
}
else
{
    echo "connection successfully done<br>";
    // bind the id as a parameter instead of concatenating $_GET into the SQL
    $stmt=mysqli_prepare($dbconnect,"select * from user where id = ?");
    $search=isset($_GET["search"]) ? $_GET["search"] : "";
    mysqli_stmt_bind_param($stmt,"s",$search);
    mysqli_stmt_execute($stmt);
    $QueryResult=mysqli_stmt_get_result($stmt);
    echo "<table width='100%' border='1'>\n";
    echo "<tr><th bgcolor=gray>Id</th><th bgcolor=gray>Name</th></tr>\n";
    while ($Row = mysqli_fetch_row($QueryResult)) {
        // escape the values before printing them into the page
        echo "<tr><td bgcolor=tan>".htmlspecialchars((string) $Row[0])."</td>";
        echo "<td bgcolor=tan>".htmlspecialchars((string) $Row[1])."</td></tr>";
    }
    echo "</table>\n";
}
?>
</body>
</html>

MySQL a real LIKE statement

I'm helping a friend build a dictionary-of-sorts for a project he's working on. Part of the project is to create a Search functionality. The database is in MySQL, backend in php.
Now, running our simple query was a piece of cake:
SELECT *,
(
(CASE WHEN word LIKE '%$query%' THEN 1 ELSE 0 END) +
(CASE WHEN defin LIKE '%$query%' THEN 1 ELSE 0 END)
) AS relev
FROM dictionary
WHERE word LIKE '%$query%'
OR defin LIKE '%$query%'
ORDER BY relev DESC;
It produced good results; for example, inputting "fire" gave us fire, firemen, firetruck, on fire, etc. However, we also want room for error: We want the mistake "prnk" to give us prank, prink and also pink, or the word "mule" to also suggest the word "mole".
Quite surprisingly, we weren't able to find any information on it. The relevance system is entirely superficial because we don't need actual relevance (just an overall pointer), but we do need something (and that's why we went for the LIKE statement and not the MATCH...AGAINST statement, where we found no way to sort by relevance).
The database only consists of three things: id, word, defin. Simple as that, as that was the required complexity (or simplicity.)
Thanks to anyone in advance.
Try testing if the word sounds like one in the dictionary, so something along the lines of:
SELECT *,
(
(CASE WHEN word LIKE '%$query%' THEN 1 ELSE 0 END) +
(CASE WHEN defin LIKE '%$query%' THEN 1 ELSE 0 END) +
(CASE WHEN LEFT(SOUNDEX(word), 4) = LEFT(SOUNDEX('$query'), 4) THEN 1 ELSE 0 END) +
(CASE WHEN LEFT(SOUNDEX(defin), 4) = LEFT(SOUNDEX('$query'), 4) THEN 1 ELSE 0 END)
) AS relev
FROM dictionary
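WHERE word LIKE '%$query%'
OR defin LIKE '%$query%'
OR LEFT(SOUNDEX(word), 4) = LEFT(SOUNDEX('$query'), 4)
OR LEFT(SOUNDEX(defin), 4) = LEFT(SOUNDEX('$query'), 4)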
ORDER BY relev DESC;
Regarding the prank...
http://webarto.com/80/did-you-mean-api
$q = "prnk";
$dym = new DYM;
$spell = $dym->check($q);
if(!empty($spell)){
echo $spell; // prank
}
(not really API, not really reliable, but it's working in less than 0.5s)
For mule/mole part try finding Levenshtein implementation for SQL...
http://www.artfulsoftware.com/infotree/queries.php?&bw=1280#552 (link not working but Google it)
http://php.net/manual/en/function.levenshtein.php
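To give an idea of what a Levenshtein-based suggestion looks like, here is a minimal sketch in Python (PHP's built-in levenshtein() would do the distance part for you). The function suggest, the max_distance cut-off of 2 and the tiny word list are assumptions for illustration; a real dictionary would need a faster way to narrow down candidates first.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]

def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, candidate), candidate) for candidate in dictionary)
    return [candidate for distance, candidate in scored if distance <= max_distance]

print(levenshtein("mule", "mole"))                         # 1
print(suggest("prnk", ["prank", "prink", "pink", "fire"]))  # ['pink', 'prank', 'prink']
```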
Try SOUNDS LIKE:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#operator_sounds-like
Take a look at this: Soundex (Sounds Like) function in mysql
Apart from Soundex and Levenshtein, you could also look into Metaphone or Double Metaphone, even though the latter two don't have built-in support in MySQL. PHP supports Metaphone though, see metaphone, and there are a few Double Metaphone implementations floating around as well (e.g. http://swoodbridge.com/DoubleMetaPhone/).
