I have a table that contains 3 text fields, and an ID one.
The table exists solely to get a collection of post IDs based on the relevance of a user search.
Problem is I lack the Einsteinian intellect necessary to warp the SQL continuum to get the desired results -
SELECT `id` FROM `wp_ss_images` WHERE `keywords` LIKE '%cute%' OR `title` LIKE '%cute%' OR `content` LIKE '%cute%'
Is this really enough to get a relevant-to-least-relevant list, or is there a better way?
Minding of course databases could be up to 20k rows, I want to keep it efficient.
Here is an update - I've gone the fulltext route -
EXAMPLE:
SELECT `id` FROM `wp_ss_images` WHERE MATCH (`keywords`,`title`,`content`) AGAINST ('+cute +dog' IN BOOLEAN MODE);
However, it seems to be just grabbing all entries with any of the words. How can I refine this to show relevance by occurrences?
To get a list of results based on the relevance of the number of occurrences of keywords in each field (meaning cute appears in all three fields first, then in 2 of the fields, etc.), you could do something like this:
SELECT id
FROM (
SELECT id,
(keywords LIKE '%cute%') + (title LIKE '%cute%') + (content LIKE '%cute%') total
FROM wp_ss_images
) t
WHERE total > 0
ORDER BY total DESC
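You could concatenate the fields and search them in one pass, which is shorter to write than searching them individually (note that in MySQL, CONCAT returns NULL if any of the fields is NULL):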
SELECT `id` FROM `wp_ss_images` WHERE CONCAT(`keywords`,`title`,`content`) LIKE '%cute%'
This doesn't help with the 'greatest to least' part of your question though.
I have a SQL database with music songs. Each song of course has an artist, an album and a genre. They also have a general 'popularity' counter, which was obtained from an external source. However, I want to give users the opportunity to vote on the songs as well. In the end, the search results should be ordered on this popularity, as well as the accuracy of the results with the original query.
The current query I use is as follows:
SELECT *
FROM p2pm_tracks
WHERE
`artist` LIKE '%$searchquestion%' OR
`genres` LIKE '%$searchquestion%' OR
`trackname` LIKE '%$searchquestion%' OR
`album_name` LIKE '%$searchquestion%'
ORDER BY `popularity` DESC
LIMIT $startingpoint, $resultsperpage
I struggle with the following:
Users search for something. I look in all fields: song title, artist, album and genre. However, a search query usually contains (parts of) several of these fields.
For instance, a user might search for Opening Philip Glass.
In this case, the first word is the name of the song, and the second and third words are the artist name.
Another example:
If I split the query on spaces, the correct tracks are found. However, if another track that matches only one of these words has a higher popularity, it will be returned before the one that actually accurately matches the search query.
I still want to sort the results in a way that things that match bigger parts of the query at once are at the top. How can I do that using SQL?
I have the static popularity and want to create a new one. Therefore, I want to use the average of all votes on a certain track (these votes are stored in another table), except in the cases where there are no votes yet. How can I construct a SQL query that does this?
My application is built in PHP, but I would like to do as much as possible of this in SQL, preferably in as few queries as possible to reduce latency.
Any help would be appreciated.
You can add a weight for every column in your search results.
Here's the code:
SELECT *,
CASE WHEN `artist` LIKE '%$searchquestion%' THEN 1 ELSE 0 END AS artist_match,
CASE WHEN `genres` LIKE '%$searchquestion%' THEN 1 ELSE 0 END AS genres_match,
CASE WHEN `trackname` LIKE '%$searchquestion%' THEN 1 ELSE 0 END AS trackname_match,
CASE WHEN `album_name` LIKE '%$searchquestion%' THEN 1 ELSE 0 END AS album_name_match
FROM p2pm_tracks
WHERE
`artist` LIKE '%$searchquestion%' OR
`genres` LIKE '%$searchquestion%' OR
`trackname` LIKE '%$searchquestion%' OR
`album_name` LIKE '%$searchquestion%'
ORDER BY
`artist_match` DESC,
`genres_match` DESC,
`trackname_match` DESC,
`album_name_match` DESC,
`popularity` DESC
LIMIT $startingpoint, $resultsperpage
This query will gather the results related to:
the artist FIRST,
THEN the genre,
THEN the track's title,
THEN the album's name,
THEN the popularity of the song
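The weighting above treats the whole search string as a single pattern. For a multi-word query like "Opening Philip Glass", one possible approach is to split the query on spaces, give each row one point per word that appears in any field, and sort by that count before popularity. Below is a rough sketch of that idea; it runs on SQLite through Python so it is self-contained, the sample rows and the function name `search_tracks` are made up, and bound parameters stand in for `$searchquestion` so user input is never pasted into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE p2pm_tracks (id INTEGER PRIMARY KEY, artist TEXT, genres TEXT, "
    "trackname TEXT, album_name TEXT, popularity INTEGER)"
)
# Hypothetical sample data for illustration only.
conn.executemany(
    "INSERT INTO p2pm_tracks VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Philip Glass", "Minimalism", "Opening", "Glassworks", 40),
        (2, "Some Band", "Rock", "Grand Opening", "Debut", 90),
        (3, "Philip Glass", "Minimalism", "Floe", "Glassworks", 30),
    ],
)

FIELDS = ("artist", "genres", "trackname", "album_name")


def search_tracks(conn, query, limit=10, offset=0):
    words = query.split()
    if not words:
        return []
    # One 0/1 term per word: 1 if the word appears in at least one field.
    per_word = "(" + " OR ".join(f"{f} LIKE ?" for f in FIELDS) + ")"
    score = " + ".join([per_word] * len(words))
    sql = (
        "SELECT id, words_matched FROM ("
        f"  SELECT id, popularity, {score} AS words_matched FROM p2pm_tracks"
        ") AS t WHERE words_matched > 0 "
        "ORDER BY words_matched DESC, popularity DESC LIMIT ? OFFSET ?"
    )
    # Placeholders appear word by word, once per field, so repeat each pattern.
    params = [f"%{w}%" for w in words for _ in FIELDS]
    return conn.execute(sql, params + [limit, offset]).fetchall()


results = search_tracks(conn, "Opening Philip Glass")
print(results)
```

With this sample data the track matching all three words comes first even though another track has a higher popularity. The sketch does not escape `%` or `_` inside the user's words, which you would want to handle in real code.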
To optimize this query, you should avoid using "LIKE" and use "FULLTEXT SEARCH" instead.
The optimized code will be:
SELECT *,
CASE WHEN MATCH (artist) AGAINST ('$searchquestion') THEN 1 ELSE 0 END AS artist_match,
CASE WHEN MATCH (genres) AGAINST ('$searchquestion') THEN 1 ELSE 0 END AS genres_match,
CASE WHEN MATCH (trackname) AGAINST ('$searchquestion') THEN 1 ELSE 0 END AS trackname_match,
CASE WHEN MATCH (album_name) AGAINST ('$searchquestion') THEN 1 ELSE 0 END AS album_name_match
FROM p2pm_tracks
WHERE
MATCH (artist) AGAINST ('$searchquestion') OR
MATCH (genres) AGAINST ('$searchquestion') OR
MATCH (trackname) AGAINST ('$searchquestion') OR
MATCH (album_name) AGAINST ('$searchquestion')
ORDER BY
`artist_match` DESC,
`genres_match` DESC,
`trackname_match` DESC,
`album_name_match` DESC,
`popularity` DESC
LIMIT $startingpoint, $resultsperpage
And make sure that the table's engine supports FULLTEXT indexes (MyISAM, or InnoDB as of MySQL 5.6) and that you created a FULLTEXT index for each column list you pass to MATCH.
The code for your MySQL table should look like:
CREATE TABLE p2pm_tracks (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
artist VARCHAR(255) NOT NULL,
trackname VARCHAR(255) NOT NULL,
...
...
FULLTEXT (artist),
FULLTEXT (trackname)
) ENGINE=MyISAM;
For more info, check the following:
- http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html
- http://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html
If you're looking for something more advanced, then look into Solr (based on Lucene), Sphinx, ElasticSearch (based on Lucene) etc.
MySQL is not that good at searching text :(
What you could try to do is take a look at full text search functionality (http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html)
With the MATCH ... AGAINST function you get a relevance value that you can order by.
SELECT p2pm_tracks.*,
MATCH (artist, genres) AGAINST ('some words') AS relevance,
MATCH (artist) AGAINST ('some words') AS artist_relevance
FROM p2pm_tracks
WHERE MATCH (artist, genres) AGAINST ('some words')
ORDER BY relevance DESC
Please don't use LIKE; it's very slow. You can use full-text search in MySQL, but you cannot tell it which column is more important.
Better solution is mysql with sphinx.
Hmm, matching your first example is difficult in SQL; I'm not sure there is a function for it.
What you need is something like this function in PHP:
http://php.net/manual/function.similar-text.php
Or you select only by average vote in your SQL query and calculate how "good" the results match in PHP with the similar_text function.
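For the second part of the question (use the average of the votes, but fall back to the static popularity when a track has no votes yet), a LEFT JOIN with COALESCE is one way to do it in a single query. This is only a sketch: the votes table `p2pm_votes` and its columns `track_id` and `vote` are hypothetical because the question does not name them, the rows are made up, and it assumes votes and popularity are on the same scale. It runs on SQLite through Python to stay self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE p2pm_tracks (id INTEGER PRIMARY KEY, trackname TEXT, popularity REAL);
    -- Hypothetical votes table; the question does not give its name or columns.
    CREATE TABLE p2pm_votes (track_id INTEGER, vote REAL);
    INSERT INTO p2pm_tracks VALUES (1, 'Opening', 40), (2, 'Floe', 70), (3, 'Closing', 55);
    INSERT INTO p2pm_votes VALUES (1, 80), (1, 90), (3, 10);
    """
)

# AVG() over zero joined votes is NULL, so COALESCE falls back to popularity.
SQL = """
SELECT t.id, COALESCE(AVG(v.vote), t.popularity) AS score
FROM p2pm_tracks AS t
LEFT JOIN p2pm_votes AS v ON v.track_id = t.id
GROUP BY t.id, t.popularity
ORDER BY score DESC
"""

rows = conn.execute(SQL).fetchall()
print(rows)
```

Track 2 has no votes, so it keeps its static popularity, while tracks 1 and 3 are scored by their vote averages.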
I am searching from 3 tables currently (will search in more after sorting this out). This query brings all the results in the order of the tables listed in query. Whereas I want to get the most relevant search results first.
(Select name, url, text, 'behandlinger_scat' AS `table` from behandlinger_scat where name LIKE '%KEYWORD%' OR text LIKE '%KEYWORD%')
UNION
(Select name, url, text, 'hudsykdommer_scat' AS `table` from hudsykdommer_scat where name LIKE '%KEYWORD%' OR text LIKE '%KEYWORD%')
UNION
(Select name, url, text, 'om_oss' AS `table` from om_oss where name LIKE '%KEYWORD%' OR text LIKE '%KEYWORD%')
Any help would be appreciated.
You can use a method to order by the points you dynamically give the results, as in this example (you will need to alias your tables so SQL will understand what column you're referring to):
ORDER BY
CASE WHEN name LIKE table.keywords THEN 100 ELSE 0 END +
CASE WHEN name LIKE table2.keywords THEN 10 ELSE 0 END +
CASE WHEN text LIKE table2.keywords THEN 1 ELSE 0 END
DESC
This is merely an example, but the concept is the following:
You decide how many "points" each "match" will receive (e.g. a name that matches the keyword gets 100 points, a text match a little less). Each row then accumulates points according to its matches, and the row with the most points shows first.
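Applied to the three-table UNION from the question, the points idea could look roughly like the sketch below. The point values (100 for a name match, 1 for a text match) are just the ones from the example above, the sample rows are invented, and it runs on SQLite through Python so it can be tried without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
TABLES = ("behandlinger_scat", "hudsykdommer_scat", "om_oss")
for t in TABLES:
    conn.execute(f"CREATE TABLE {t} (name TEXT, url TEXT, text TEXT)")
# Hypothetical sample rows for illustration only.
conn.execute("INSERT INTO behandlinger_scat VALUES ('Laser', '/laser', 'Treats acne scars')")
conn.execute("INSERT INTO hudsykdommer_scat VALUES ('Acne', '/acne', 'About acne')")
conn.execute("INSERT INTO om_oss VALUES ('About us', '/om-oss', 'Our clinic')")


def search_all(conn, keyword):
    # Each branch computes its own points; the final ORDER BY sorts the union.
    part = (
        "SELECT name, url, '{t}' AS source, "
        "(CASE WHEN name LIKE :p THEN 100 ELSE 0 END"
        " + CASE WHEN text LIKE :p THEN 1 ELSE 0 END) AS points "
        "FROM {t} WHERE name LIKE :p OR text LIKE :p"
    )
    sql = " UNION ALL ".join(part.format(t=t) for t in TABLES)
    sql += " ORDER BY points DESC"
    return conn.execute(sql, {"p": f"%{keyword}%"}).fetchall()


hits = search_all(conn, "acne")
print(hits)
```

Because the points are computed per row, results from different tables are interleaved by relevance instead of coming out in the order the tables are listed.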
I'm building a search on my site and I noticed it doesn't work when you enter more than one word into the search. Here's the gist of the query:
SELECT * FROM `blog` WHERE `content` LIKE '%$keyword%' OR `title` LIKE '%$keyword%' ORDER BY `id` DESC
The weird thing is that when I test the query in phpMyAdmin it returns the expected results. On my website, however, no results are found.
I tried replacing spaces in the keyword with %s, but that didn't change anything.
The problem is that LIKE does pattern matching rather than actually searching for keywords. You should create a FULLTEXT index on your database columns and use WHERE MATCH (columns) AGAINST ('keywords'). That will properly search for all keywords in any order and be a lot faster anyway.
I just tried this in my database, and using LIKE in the query is more than 66 times as fast as using MATCH with a fulltext index. I'm using two tables which are "connected" to each other. One is tags and the other one is products.
So what I did was add a fulltext index to the tag column in the tags table and perform the match against that column. The query then joins the products and spits out some data about the item. That took about 4 seconds with ~3000 products and ~3000 tags.
I then tried it by first exploding the search string on whitespace, and then imploding the result with %' OR tags.tag LIKE '%. This took about 0.06 seconds with the same number of products and tags.
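The explode/implode approach can also be written with bound parameters instead of pasting the words into the SQL string, which sidesteps quoting problems with user input. Here is a minimal sketch against the `blog` table from the question; the sample rows and the function name `search_blog` are made up, and it uses SQLite through Python purely so it runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
# Hypothetical posts for illustration only.
conn.executemany(
    "INSERT INTO blog VALUES (?, ?, ?)",
    [
        (1, "Hello world", "First post"),
        (2, "Cooking", "A recipe for pasta"),
        (3, "Travel", "World tour notes"),
    ],
)


def search_blog(conn, keyword):
    words = keyword.split()
    if not words:
        return []
    # One "(content LIKE ? OR title LIKE ?)" group per word, joined with OR.
    clause = " OR ".join("(content LIKE ? OR title LIKE ?)" for _ in words)
    params = [f"%{w}%" for w in words for _ in range(2)]
    sql = f"SELECT id FROM blog WHERE {clause} ORDER BY id DESC"
    return [row[0] for row in conn.execute(sql, params)]


ids = search_blog(conn, "world pasta")
print(ids)
```

Joining the groups with AND instead of OR would require every word to appear somewhere in the post.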
Like when the user writes the article title in the input field, I want to search existing articles to see if there are similar ones.
For example:
SQL search query like stackoverflow
I want to find the most relevant articles related to this title.
I know it's something like:
WHERE article_title LIKE 'word'
but how do I handle multiple keywords?
Use a fulltext index, which'd be something like:
SELECT ... FROM ... WHERE MATCH (fieldname) AGAINST ('keyword keyword keyword');
Or hack up the query to look like
SELECT ... FROM ... WHERE (fieldname LIKE '%keyword%' OR fieldname LIKE '%keyword%' etc...)
Of the two, the fulltext version will be faster, as it can use an index. The LIKE '%...%' version will be very expensive, as a wildcard search of that sort cannot use indexes at all. The downside is that before MySQL 5.6 fulltext indexes are only available on MyISAM tables; InnoDB supports them from 5.6 onward.
You need to have full text search for that.
Make sure the table you want to search uses an engine that supports fulltext indexes (MyISAM, or InnoDB as of MySQL 5.6).
Have the following table
Table articles
--------------
id integer autoincrement PK
title varchar(255)
contents text
fulltext index on (title, contents)
And use a query like:
SELECT id
, MATCH(title, contents) AGAINST ('$title_of_article_thats_being_edited')
as relevance
FROM articles
WHERE MATCH(title, contents) AGAINST ('$title_of_article_thats_being_edited')
ORDER BY relevance DESC
Note that SO refines the list when you enter tags.
WHERE article_title LIKE '%word1%word2%'
will return all rows in which article_title contains 'word1' and 'word2' in this particular order
I have a programme listing database with all the information needed for one programme packed into one table (I should have split programmes and episodes into their own tables). Since there are multiple episodes for any given show, I want the main page to display just the titles, in ascending order, for a chosen letter. I know how to do the basic query, but this is all I know:
SELECT DISTINCT title FROM programme_table WHERE title LIKE '$letter%'
I know that works; I use it. But I am using dynamic image loading that requires a series number to return the image in full, so how do I keep the title distinct but also load the series number for that title?
I hope I have been clear.
Thanks for any help
Paul
You can replace the DISTINCT keyword with a GROUP BY clause.
SELECT
title
, series_number
FROM
programme_table
WHERE title LIKE '$letter%'
GROUP BY
title
, series_number
There are currently two other valid options:
The option suggested by Mohammad is to use a HAVING clause instead of the WHERE clause. This is actually less optimal:
The WHERE clause is used to restrict records, and is also used by the query optimizer to determine which indexes and tables to use. HAVING is a "filter" on the grouped result set, applied after GROUP BY, so MySQL cannot use it to optimize the query.
So HAVING is a lot less optimal and you should only use it when you cannot use WHERE to get your results.
quosoo points out that the DISTINCT keyword applies to all listed columns in the query. This is true, but generally people do not recommend it (in some specific cases there is a performance difference). The MySQL optimizer, however, produces the same query for both, so in general there is no performance difference.
Update
Although MySQL does apply the same optimization to both queries, there is actually a difference: when DISTINCT is used in combination with a LIMIT clause, MySQL stops as soon as it finds enough unique rows. So
SELECT DISTINCT
title
, series_number
FROM
programme_table
WHERE
title LIKE '$letter%'
is actually the best option.
select title,series_number from programme_table group by title,series_number having title like '$letter%';
The DISTINCT keyword actually works on a list of columns, so if you just add the series to your query it should return a set of unique title and series combinations:
SELECT DISTINCT title, series FROM programme_table WHERE title LIKE '$letter%'
Hey, thanks for that, but I have about 1000 entries with the same series, so it would single out the series as well, rendering about 999 programmes useless and not shown.
I did, however, find a way to make it unique and show the series number:
SELECT * FROM four a INNER JOIN (SELECT title, MIN(series) AS MinPid FROM four WHERE title LIKE '$letter%' GROUP BY title) b ON a.title = b.title AND a.series = b.MinPid
Hopefully it helps anyone in the future and thank you for the replies :)
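For anyone landing here later: if all you need is one row per title together with its lowest series number, the inner GROUP BY query may be enough on its own, since joining back to the full table can return several rows per title when multiple episodes share that lowest series. A small sketch of the difference, with made-up rows and a hypothetical `episode` column, run on SQLite through Python so it is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "four" is the table name from the post; the episode column is hypothetical.
conn.execute("CREATE TABLE four (title TEXT, series INTEGER, episode INTEGER)")
conn.executemany(
    "INSERT INTO four VALUES (?, ?, ?)",
    [
        ("Alpha", 1, 1), ("Alpha", 1, 2), ("Alpha", 2, 1),
        ("Avon", 3, 1), ("Beta", 1, 1),
    ],
)


def titles_with_first_series(conn, letter):
    # One row per title, carrying the lowest series number for that title.
    sql = """
        SELECT title, MIN(series) AS first_series
        FROM four
        WHERE title LIKE ?
        GROUP BY title
        ORDER BY title
    """
    return conn.execute(sql, (letter + "%",)).fetchall()


def joined_rows(conn, letter):
    # The join from the post: every row whose series equals the title's minimum.
    sql = """
        SELECT a.title, a.series, a.episode
        FROM four a
        INNER JOIN (SELECT title, MIN(series) AS MinPid FROM four
                    WHERE title LIKE ? GROUP BY title) b
          ON a.title = b.title AND a.series = b.MinPid
        ORDER BY a.title, a.episode
    """
    return conn.execute(sql, (letter + "%",)).fetchall()


grouped = titles_with_first_series(conn, "A")
joined = joined_rows(conn, "A")
print(grouped)
print(joined)
```

In this sample "Alpha" appears once in the grouped result but twice in the joined one, because two episodes share series 1.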