Using levenshtein in MySQL search for one result

I am trying to do a search on my MySQL database to get the row that contains the most similar value to the one searched for.
Even if the closest result is very different, I'd still like to return it (later on I do a string comparison and add the 'unknown' to the learning pool).
I would like to search my table 'responses' via the 'msg1' column and get one result, the one with the lowest levenshtein score, as in the one that is the most similar out of the whole column.
This sort of thing:
SELECT * FROM people WHERE levenshtein('$message', 'msg1') ORDER BY ??? LIMIT 1
I don't quite grasp the concept of levenshtein here. As you can see, I am searching the whole table, sorting it by ??? (the function's score?) and then limiting it to one result.
I'd then like to set $reply to the value in column "reply" from the single row that I get.
Help would be greatly appreciated, as I can't find many examples of what I'm looking for. I may be doing this completely wrong; I'm not sure.
Thank you!

You would do:
SELECT p.*
FROM people p
ORDER BY levenshtein('$message', msg1) ASC
LIMIT 1;
If you want a threshold (to limit the number of rows for sorting), then use a WHERE clause. Otherwise, you just need ORDER BY.
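MySQL has no built-in levenshtein() function, so the query above assumes one has been installed as a stored function or UDF. Where that is not available, a rough alternative is to fetch the rows and rank them with PHP's own levenshtein(). This is only a sketch for small tables: closestRow() is a hypothetical helper, and the sample rows stand in for the result of a plain SELECT msg1, reply query.

```php
<?php
// Hypothetical helper: return the row whose msg1 is closest to $message,
// using PHP's built-in levenshtein() instead of a MySQL function.
function closestRow(array $rows, string $message): ?array
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($rows as $row) {
        $distance = levenshtein($message, $row['msg1']);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $row;
        }
    }
    return $best; // null when there are no rows at all
}

// Sample data standing in for rows fetched from the table (assumed values).
$rows = [
    ['msg1' => 'hello there', 'reply' => 'hi'],
    ['msg1' => 'good night', 'reply' => 'sleep well'],
];
$match = closestRow($rows, 'helo there');
$reply = $match['reply'];
```

Like ORDER BY ... LIMIT 1, this always returns the nearest row, however distant it is, which matches the requirement in the question.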

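Try this (msg1 is left unquoted so it refers to the column rather than the literal string 'msg1'; note that a threshold of 0 matches only identical strings, so raise it to allow near matches):
SELECT * FROM people WHERE levenshtein('$message', msg1) <= 0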

Related

How to get a SUM of an attribute in Sphinx?

I have Sphinx Search running in production, performing keyword searches, accessed through the official sphinxapi.php. Now I need to output the sum of an attribute called price along with the search results, similar to the SQL query "SELECT SUM(t.price) from table_name t WHERE condition". This data is supposed to be displayed on a web page like "Showing 1 - 10 out of 12345 results, total cost is $67890". As the documentation says, the SUM() function is available when used with GROUP BY. However, the documentation does not provide enough details on implementation, and googling and searching Stack Overflow didn't help much either.
Questions:
How should I group the search result?
Can it be performed with 1 Sphinx request, or do I have to get the search results first and then query Sphinx again to get the sum of found documents?
Please advise. An example will be really helpful. Thank you.

You will need to run a second query. The sum is wanted over the WHOLE result set, whereas with normal grouping the aggregation is run per group. In your SQL example, there is an implicit GROUP BY '1' that aggregates all rows into a single group.
So you would need to use grouping to do the same in Sphinx.
http://sphinxsearch.com/docs/current.html#clustering
Using the aggregation function is relatively easy with setSelect, but I am not sure SetGroupBy has a syntax to group all rows, so you will have to emulate it.
//all the normal setup needed for a normal query goes here
$cl->SetLimits($offset,$limit);
$cl->AddQuery($query, $index);
//add the group query
$cl->setSelect("1 as one, SUM(price) as sum_price");
$cl->setGroupBy("one",SPH_GROUPBY_ATTR); //don't care about sorting
$cl->setRankingMode(SPH_RANK_NONE); //no point in actually ranking results
$cl->SetLimits(0,1);
$cl->AddQuery($query, $index);
//run both queries at once...
$results = $cl->RunQueries();
var_dump($results);
//$results[0] contains the normal text query results; use its total_found
//$results[1] contains just the SUM() data
This also shows how to set the two up as multi-queries!
http://sphinxsearch.com/docs/current.html#multi-queries
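Reading the sum back out of the second result could look roughly like this. It is a sketch that assumes the default (non-array) result layout of sphinxapi.php, where 'matches' is keyed by document ID and computed attributes sit under 'attrs'; extractSumPrice() is a hypothetical helper, and the sample array is made-up data in that assumed shape.

```php
<?php
// Hypothetical helper: pull SUM(price) out of the grouped query's result.
// Assumes 'matches' is keyed by document ID and each match has an 'attrs'
// array holding the aliases from setSelect (here: one, sum_price).
function extractSumPrice(array $result): float
{
    if (empty($result['matches'])) {
        return 0.0; // no matching documents, so nothing to sum
    }
    $firstMatch = reset($result['matches']); // the single group row
    return (float) ($firstMatch['attrs']['sum_price'] ?? 0);
}

// Made-up result in the assumed layout, standing in for $results[1].
$groupResult = [
    'total_found' => 1,
    'matches' => [
        42 => ['weight' => 1, 'attrs' => ['one' => 1, 'sum_price' => 67890]],
    ],
];
$totalCost = extractSumPrice($groupResult);
```

It is worth checking the var_dump() output above against this layout before relying on it.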

MySQL Select efficient first and last row

I want to get two rows from my table in a MySQL database. These two rows must be the first one and the last one after ordering. To achieve this I wrote two queries:
SELECT dateBegin, dateTimeBegin FROM worktime ORDER BY dateTimeBegin ASC LIMIT 1;
SELECT dateBegin, dateTimeBegin FROM worktime ORDER BY dateTimeBegin DESC LIMIT 1;
I decided not to fetch the entire set and pick the first and last rows in PHP, to avoid possibly very large arrays. My problem is that I have two queries and I do not really know how efficient this is. I wanted to combine them, for example with UNION, but then I would still have to order an unsorted list twice, which I also want to avoid, because the second sort does exactly the same work as the first.
I would like to order once and then select the first and last value of this ordered list, but I do not know a more efficient way than the one with two queries. I know the performance benefit will not be gigantic, but the lists are growing, and since I execute this part for several tables, I need the most efficient way to do this.
I found a couple of similar topics, but none of them addressed this particular performance question.
Any help is highly appreciated.

(This is both an "answer" and a rebuttal to errors in some of the comments.)
INDEX(dateTimeBegin)
will facilitate SELECT ... ORDER BY dateTimeBegin ASC LIMIT 1 and the corresponding row from the other end, using DESC.
MAX(dateTimeBegin) will find only the max value for that column; it will not directly find the rest of the columns in that row. That would require a subquery or JOIN.
INDEX(... DESC) -- The DESC is ignored by MySQL (before version 8.0, which added descending indexes). This is almost never a drawback, since the optimizer is willing to go either direction through an index. The case where it does matter is ORDER BY x ASC, y DESC, which cannot use INDEX(x, y), nor INDEX(x ASC, y DESC) when DESC is ignored. This is a MySQL deficiency. (Other than that, I agree with Gordon's 'answer'.)
( SELECT ... ASC )
UNION ALL
( SELECT ... DESC )
won't provide much, if any, performance advantage over two separate selects. Pick the technique that keeps your code simpler.
You are almost always better off having a single DATETIME (or TIMESTAMP) field than splitting out the DATE and/or TIME. SELECT DATE(dateTimeBegin), dateTimeBegin ... works simply, and "fast enough". See also the function DATE_FORMAT(). I recommend dropping the dateBegin column and adjusting the code accordingly. Note that shrinking the table may actually speed up the processing more than the cost of DATE(). (The diff will be infinitesimal.)
Without an index starting with dateTimeBegin, any of the techniques would be slow, and get slower as the table grows in size. (I'm pretty sure it can find both the MIN() and MAX() in only one full pass, and do it without sorting. The pair of ORDER BYs would take two full passes, plus two sorts; 5.6 may have an optimization that almost eliminates the sorts.)
If there are two rows with exactly the same min dateTimeBegin, which one you get will be unpredictable.
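The MIN()/MAX() route with subqueries mentioned above could be sketched as follows. The SQL uses only the table and column from the question; the in-memory SQLite connection and the sample rows are assumptions made purely so the snippet runs on its own, and firstAndLast() is a hypothetical helper. Against MySQL you would pass your existing PDO connection instead.

```php
<?php
// Hypothetical helper: fetch the full rows holding the smallest and largest
// dateTimeBegin in one statement, using scalar subqueries for MIN and MAX.
function firstAndLast(PDO $pdo): array
{
    $sql = "SELECT dateTimeBegin
            FROM worktime
            WHERE dateTimeBegin = (SELECT MIN(dateTimeBegin) FROM worktime)
               OR dateTimeBegin = (SELECT MAX(dateTimeBegin) FROM worktime)
            ORDER BY dateTimeBegin";
    return $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

// Demo setup (assumption): in-memory SQLite with three sample rows.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE worktime (dateTimeBegin TEXT)");
$pdo->exec("INSERT INTO worktime VALUES ('2015-02-10 12:00:00')");
$pdo->exec("INSERT INTO worktime VALUES ('2015-01-01 08:00:00')");
$pdo->exec("INSERT INTO worktime VALUES ('2015-03-01 09:30:00')");

$rows = firstAndLast($pdo);
$first = $rows[0]['dateTimeBegin'];
$last = $rows[count($rows) - 1]['dateTimeBegin'];
```

If several rows share the minimum or maximum value, this returns all of them, so the result can contain more than two rows.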

Your queries are fine. What you want is an index on worktime(dateTimeBegin). MySQL should be smart enough to use this index for both the ASC and DESC sorts. If you test it out, and it is not, then you'll want two indexes: worktime(dateTimeBegin asc) and worktime(dateTimeBegin desc).
Whether you run one query or two is up to you. One query (connected by UNION ALL) is slightly more efficient, because you have only one round-trip to the database. However, two might fit more easily into your code, and the difference in performance is unimportant for most purposes.

sphinx search math operation between fields

Since I'm moving to the Sphinx search engine to improve my website's performance, I'm trying to translate the old MySQL queries to the new Sphinx language.
The point is to sort results based on a math operation between the votes on my posts and the points given for each vote (ranging from 1 to 5).
So for example, if I got 3 votes for a post, with vote 1 = 5 points, vote 2 = 3 points and vote 3 = 2 points, my table will contain a field named votes with the integer 3 (votes=3) and a field named points with the integer 5+3+2 (points=10).
The final rating for such a post will therefore be points/votes; in this example it will be 10/3 = 3.333...
Assuming I'm using the Sphinx API to get a list of top rated posts in DESCENDING order, this is the old MySQL query I had in my PHP script:
mysql_query("SELECT * FROM table ORDER BY points/votes DESC LIMIT $start,$stop");
I tried to build a Sphinx query, but it is not working and always gives 0 results. Please read all the // commented lines, which describe all the attempts I made.
require("sphinxapi.php");
$cl = new SphinxClient;
$index = "index";
$cl->setServer("localhost", 9312);
$cl->SetMatchMode(SPH_MATCH_FULLSCAN);
//$cl->SetSortMode(SPH_SORT_EXTENDED, 'IDIV(points,votes) DESC'); //not working
//$cl->SetSortMode(SPH_SORT_EXTENDED, '(points DIV votes) DESC'); //not working
//$cl->SetSortMode(SPH_SORT_EXTENDED, 'points/votes DESC'); //not working
//$cl->SetSortMode(SPH_SORT_EXTENDED, '(points/votes) DESC'); //not working
$cl->setLimits($start,$stop,$max_matches=1000);
$query = "";
Would you please help me find out what's wrong? Thanks.

You will need to use SPH_SORT_EXPR
$cl->SetSortMode(SPH_SORT_EXPR, '(points/votes) DESC');

Firstly, you need points and votes to be attributes, NOT fields. Attributes are stored in the index and can be used for sorting etc. Arithmetic can only be performed on numeric attributes (not strings).
The correct syntax for SPH_SORT_EXPR (assuming you've already got the attributes) would be
$cl->SetSortMode(SPH_SORT_EXPR, 'points/votes');
SPH_SORT_EXPR is ALWAYS descending, so you don't need DESC on the end.
But rather than have Sphinx calculate that ratio every single time, you would probably be better off calculating it during sql_query and storing it as a single numeric attribute. TIP: store it as an integer, not a float. Integers are more efficient to sort by.
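One way to turn the ratio into an integer before it is stored is to scale it first, so 3.333 becomes 3333. This is only a sketch: scaledRating() and the scale factor of 1000 are hypothetical choices, and the same arithmetic could equally be done inside sql_query.

```php
<?php
// Hypothetical helper: points/votes scaled to an integer so that it can be
// stored and sorted as an integer attribute rather than a float.
function scaledRating(int $points, int $votes, int $scale = 1000): int
{
    if ($votes <= 0) {
        return 0; // unrated posts sort last and avoid division by zero
    }
    return intdiv($points * $scale, $votes);
}

$rating = scaledRating(10, 3); // the example from the question
```

The scale only sets how many decimal places of the ratio survive; the sort order is the same as for points/votes up to that precision.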

Understanding COUNT() as `count`

I'm currently learning how to build a site in PHP and MySQL. However, I fail to understand COUNT() as count and wouldn't mind some further explanation.
I get the principles of COUNT, 0 || 1, and how it returns all the values that pertain to that query.
But I don't see how COUNT as count works. Anyhow, this is the code I'm writing, so we have a working example, and it is where I first became perplexed.
"SELECT COUNT(id) as count, id
FROM user
WHERE email='$email' AND password='".md5($password)."'"

That is what is called an alias, which is sometimes used to show a more appealing column header to users or to the calling code.
SELECT COUNT(`id`) as `count`....
will print
count
--------
5
The alias stands as the column header instead of the raw expression. See the SQLFiddle to see the difference.
From the fiddle you can see that without the alias the column header looks something like this:
count(*)
--------
5

With COUNT() you can count the rows returned in a result set. See also the official MySQL documentation about COUNT():
Databases are often used to answer the question, “How often does a certain type of data occur in a table?” For example, you might want to know how many pets you have, or how many pets each owner has, or you might want to perform various kinds of census operations on your animals.
Counting the total number of animals you have is the same question as “How many rows are in the pet table?” because there is one record per pet. COUNT(*) counts the number of rows, so the query to count your animals looks like this:
SELECT COUNT(*) FROM pet;
The part with AS count means that this column will get a name which you can use e.g. in PHP. See also this explanation on w3schools:
You can give a table or a column another name by using an alias. This can be a good thing to do if you have very long or complex table names or column names.
An alias name could be anything, but usually it is short.

as count is just an alias. You can use as for any field or expression selected. It means you change the name of the column being returned in your dataset.
SELECT `field` as another_name
So:
SELECT COUNT(*) as `count`
This just renames the column from COUNT(*) to count, making it easier to work with wherever you are manipulating your result set.
It also makes for easier access within your current query. Many would do the following with long table names:
SELECT * FROM `table_with_ridiculous_name` as twrn WHERE twrn.id = 1

If you ran this sql:
SELECT COUNT(id), id ....
You would get (after doing a *_fetch_assoc) $row['COUNT(id)'], because without an alias the column is named after the expression itself. That key is awkward to type and breaks as soon as the expression changes.
Returning it as count allows you to get to it in the resulting array by using $row['count'].
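A small runnable illustration of reading the aliased column from PHP follows. The in-memory SQLite database and the two sample rows are assumptions made only so the snippet is self-contained; with MySQL the same SELECT and the same $row['count'] access apply.

```php
<?php
// Demo setup (assumption): in-memory SQLite instead of the real MySQL table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT)");
$pdo->exec("INSERT INTO user (email) VALUES ('a@example.com')");
$pdo->exec("INSERT INTO user (email) VALUES ('b@example.com')");

// The alias decides the array key that PHP sees in the fetched row.
$row = $pdo->query("SELECT COUNT(id) AS count FROM user")
           ->fetch(PDO::FETCH_ASSOC);

// Cast because some drivers and PHP versions return numbers as strings.
$total = (int) $row['count'];
```

Picking an alias that is not also a function name, such as total, avoids the need for backticks in MySQL.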

Array in php mysql query

When I use arrays in a MySQL query it's really slow. Are there any tricks that make this faster?
e.g:
SELECT *
FROM posts
WHERE type IN ('1','2','5')
ORDER BY id ASC
takes much longer than other queries.

If type is an integer column, then remove the apostrophes:
WHERE type IN (1, 2, 5)
If not, change the type column to an integer type.

There are a number of things that speed up queries. I'd consider looking up the following:
Normalizing. Perhaps the structure of your tables isn't the most efficient?
Indexing. This will improve query times if you KNOW what you want to search on.

EXPLAIN will tell you why your query is slow. Based on that you can make adjustments to your query or your table structure to solve the problem. In this case, use it like this:
EXPLAIN SELECT * FROM posts WHERE type IN (1,2,5) ORDER BY id ASC

Here you need to look at two things.
First, ('1','2','5') is a list of strings, not integers; it should be (1,2,5).
Second, you can add an index on the type field.
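When the list comes from a PHP array, the unquoted integer form can be built like this. It is a sketch: buildTypeQuery() is a hypothetical helper, and it relies on every value being cast with intval() before it is placed in the SQL text, which is what makes inlining the values safe here.

```php
<?php
// Hypothetical helper: build the IN (...) list from a PHP array, forcing
// every value to an integer so the list is unquoted and safe to inline.
function buildTypeQuery(array $types): string
{
    $ints = array_map('intval', array_values($types));
    if ($ints === []) {
        throw new InvalidArgumentException('At least one type is required.');
    }
    $list = implode(', ', $ints);
    return "SELECT * FROM posts WHERE type IN ($list) ORDER BY id ASC";
}

$sql = buildTypeQuery(['1', '2', '5']);
echo $sql;
// SELECT * FROM posts WHERE type IN (1, 2, 5) ORDER BY id ASC
```

Prefixing the resulting statement with EXPLAIN shows whether an index on type is actually used.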
