Search algorithms or tool for searching from database - php

I have this database table:
Column Type
source text
news_id int(12)
heading text
body text
source_url tinytext
time timestamp
news_pic char(100)
location char(128)
tags text
time_created timestamp
hits int(10)
Now I am looking for an algorithm or tool to search this table, which contains news data, for a keyword. The keyword should be matched against heading, body and tags, and the number of hits on each news item should be taken into account to give the best results.

MySQL already has the tool you need built-in: full-text search. I'm going to assume you know how to interact with MySQL using PHP. If not, look into that first. Anyway ...
1) Add a full-text index covering the fields you want to search. In natural-language mode the column list in match() has to correspond to a single full-text index, so create one combined index rather than three separate ones:
alter table TABLE_NAME add fulltext(heading, body, tags);
2) Use a match ... against statement to perform a full-text search:
select * from TABLE_NAME where match(heading, body, tags) against ('SEARCH_STRING');
Obviously, substitute your table's name for TABLE_NAME and your search string for SEARCH_STRING in these examples.
I don't see why you'd want to search the number of hits, as it's just an integer (and an integer column can't be part of a full-text index anyway). You could sort by number of hits, however, by adding an order by clause to your query:
select * from TABLE_NAME where match(heading, body, tags) against ('SEARCH_STRING') order by hits desc;
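The query above takes the search string from user input, so it should be bound as a parameter rather than pasted into the SQL. A minimal sketch of that idea, written in Python for illustration (the question uses PHP, where prepared statements serve the same purpose). The function name build_news_search and the table name news are hypothetical, the %s placeholder assumes a driver such as PyMySQL or mysql-connector-python, and the query assumes a single full-text index covering heading, body and tags:

```python
import re


def build_news_search(table, keyword):
    """Return (sql, params) for a full-text search ordered by hits.

    Hypothetical helper: the caller would pass the result to
    cursor.execute(sql, params) on a MySQL connection.
    """
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    sql = (
        f"SELECT * FROM {table} "
        "WHERE MATCH(heading, body, tags) AGAINST (%s) "
        "ORDER BY hits DESC"
    )
    # The search string travels separately from the SQL text.
    return sql, (keyword,)


sql, params = build_news_search("news", "election results")
print(sql)
print(params)
```

Only the search string is bound here; the table name is checked against a strict pattern because identifiers cannot be passed as parameters.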

Related

Advanced search in mysql column with row of words separated by comma

Hello everyone. As the title says, I am looking for an alternative to, or a more advanced use of, "LIKE".
I have a column which contains a list of words, e.g. "keyword1,keyword2,another_keyword", and when I use
$sql = mysql_query("SELECT * FROM table WHERE `column` LIKE '%keyword1%' ");
it struggles to find matches. This example works, but when I search for shorter strings it has problems and sometimes finds nothing at all.
I tried putting a space after the commas and it helped, but if there is a proper way to search a column in this format I would be happy to hear it.
You could move the keywords into a separate table.
Or you can use the SET field type, if the list of your keywords doesn't change.
Storing a comma-separated list of words is a bad idea. With LIKE it is hard to match an exact word inside such a list. Instead, you can add a new table related to your current table and store each word in its own row, together with the id of the row it belongs to, like this:
table1
id title
1 test1
2 test2
keywords_table
table1_id word
1 word1
1 word2
1 word3
and the query will be
select t.*
from table1 t
join keywords_table k
on(t.id = k.table1_id)
where k.word = 'your_keyword'
If you can't alter your structure you can use find_in_set()
SELECT * FROM table WHERE find_in_set('your_keyword',`column`) > 0
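To illustrate the normalized layout, here is a small sketch that splits comma-separated values into one row per word and then looks rows up with an exact-match join. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table names, sample data and the find_by_keyword helper are illustrative assumptions, not part of the original schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywords_table (
        table1_id INTEGER NOT NULL REFERENCES table1(id),
        word      TEXT NOT NULL,
        PRIMARY KEY (word, table1_id)
    );
""")

# Hypothetical legacy rows: (id, title, comma-separated keywords).
legacy_rows = [
    (1, "test1", "word1,word2,word3"),
    (2, "test2", "word2, word30"),
]

for row_id, title, csv_words in legacy_rows:
    conn.execute("INSERT INTO table1 (id, title) VALUES (?, ?)", (row_id, title))
    # Split on commas, trim whitespace, and drop empty or duplicate entries.
    words = {w.strip() for w in csv_words.split(",") if w.strip()}
    conn.executemany(
        "INSERT INTO keywords_table (table1_id, word) VALUES (?, ?)",
        [(row_id, w) for w in sorted(words)],
    )


def find_by_keyword(keyword):
    """Return (id, title) for every row tagged with exactly this keyword."""
    rows = conn.execute(
        "SELECT t.id, t.title FROM table1 t "
        "JOIN keywords_table k ON t.id = k.table1_id "
        "WHERE k.word = ? ORDER BY t.id",
        (keyword,),
    )
    return rows.fetchall()


print(find_by_keyword("word3"))  # [(1, 'test1')]
print(find_by_keyword("word2"))  # [(1, 'test1'), (2, 'test2')]
```

Note that searching for word3 does not return the row tagged word30, which is the kind of false match a LIKE '%word3%' search on the comma-separated column would produce.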
try something like this:
SELECT * FROM tablename
WHERE column LIKE '%keyword1%'
OR column LIKE '%keyword2%';
For more info, see: Using SQL LIKE and IN together
MySQL allows you to perform a full-text search based on very complex queries in the Boolean mode along with Boolean operators. This is why the full-text search in Boolean mode is suitable for experienced users.
First, you have to add a FULLTEXT index on the column you want to search. The statement below adds a new search_column together with its index:
ALTER TABLE table_name ADD search_column TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, ADD FULLTEXT search_column (search_column);
Then run the following query to search:
SELECT * FROM table WHERE MATCH(search_column) AGAINST("keyword1")
For more info, see: https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

Select from 3 possible columns, order by occurrences / relevance

I have a table that contains 3 text fields, and an ID one.
The table exists solely to get collection of ID's of posts based on relevance of a user search.
Problem is I lack the Einsteinian intellect necessary to warp the SQL continuum to get the desired results -
SELECT `id` FROM `wp_ss_images` WHERE `keywords` LIKE '%cute%' OR `title` LIKE '%cute%' OR `content` LIKE '%cute%'
Is this really enough to get a relevant-to-least-relevant list, or is there a better way?
Minding of course databases could be up to 20k rows, I want to keep it efficient.
Here is an update - I've gone the fulltext route -
EXAMPLE:
SELECT `id` FROM `wp_ss_images` WHERE MATCH (`keywords`,`title`,`content`) AGAINST ('+cute +dog' IN BOOLEAN MODE);
However, it seems to be just grabbing all entries with any of the words. How can I refine this to show relevance by occurrences?
To get a list of results based on the relevance of the number of occurrences of keywords in each field (meaning cute appears in all three fields first, then in 2 of the fields, etc.), you could do something like this:
SELECT id
FROM (
SELECT id,
(keywords LIKE '%cute%') + (title LIKE '%cute%') + (content LIKE '%cute%') total
FROM wp_ss_images
) t
WHERE total > 0
ORDER BY total DESC
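The counting query can be tried out locally. The sketch below runs it against an in-memory SQLite database through Python's sqlite3 module, used here only as a stand-in for MySQL; in both engines a comparison such as keywords LIKE '%cute%' evaluates to 1 or 0, which is what makes the sum work. The sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_ss_images ("
    "id INTEGER PRIMARY KEY, keywords TEXT NOT NULL, "
    "title TEXT NOT NULL, content TEXT NOT NULL)"
)
# Invented sample data: the word 'cute' appears in 0, 3, 1 and 2 fields.
conn.executemany(
    "INSERT INTO wp_ss_images VALUES (?, ?, ?, ?)",
    [
        (1, "dog,park", "A dog outside", "Nothing matching here"),
        (2, "cute,dog", "Cute puppy", "A very cute dog"),
        (3, "cat", "Cute kitten", "Sleeping cat"),
        (4, "cute,cat", "Kitten", "So cute"),
    ],
)

query = """
    SELECT id, total
    FROM (
        SELECT id,
               (keywords LIKE :pat) + (title LIKE :pat) + (content LIKE :pat) AS total
        FROM wp_ss_images
    ) t
    WHERE total > 0
    ORDER BY total DESC, id
"""
# Each LIKE contributes 1 when the field contains the word, 0 otherwise.
ranked = conn.execute(query, {"pat": "%cute%"}).fetchall()
print(ranked)  # [(2, 3), (4, 2), (3, 1)]
```

This ranks by the number of fields that contain the word, not by how many times the word occurs inside a field, and it still scans every row because of the leading wildcard.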
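You could concatenate the fields and run a single LIKE against the result, which is shorter to write than searching them individually. CONCAT_WS with a space separator is used here so that a NULL column does not turn the whole expression into NULL and a match cannot span two fields:
SELECT `id` FROM `wp_ss_images` WHERE CONCAT_WS(' ', `keywords`,`title`,`content`) LIKE '%cute%'
This doesn't help with the 'greatest to least' part of your question though.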

Efficient search query

I have a table in a database with a structure like this
Keywords
id int(11)
U_id int(11)
keywords text
create_date int(11)
U_id is a foreign key, id is the primary key
The keywords field is a list of words created by users separated by commas
I was wondering if someone could suggest an efficient query to search such a table.
You should change your database design so that you have a table called user_keyword and store each keyword in a separate row. You can then index this table and search it easily and efficiently:
WHERE keyword = 'foo'
If you can't modify the database then you can use FIND_IN_SET but it won't be very efficient:
WHERE FIND_IN_SET('foo', keywords)
Separate the keywords into their own table, "connect" it to the old table via a FOREIGN KEY, index it, and you'll be able to search for exact keywords or keyword prefixes efficiently.
For example:
id U_id keywords create_date
1 - A,B,C -
Becomes:
PARENT_TABLE:
id U_id create_date
1 - -
CHILD_TABLE:
id keyword
1 A
1 B
1 C
Provided there is an index on keyword, the following query should be efficient:
SELECT * FROM PARENT_TABLE
WHERE id IN (SELECT id FROM CHILD_TABLE WHERE keyword = ...)
---EDIT---
Based on Johan's comments below, it appears that InnoDB uses what is known as "index-organized tables" under Oracle or "clusters" under most other databases. Provided you don't need to query "from parent to child" (i.e. "give me all keywords for given id"), the PRIMARY KEY on CHILD_TABLE should be:
{keyword, id}
Since the keyword is the first field in the composite index, WHERE keyword = ... (or WHERE keyword LIKE 'prefix%') can use this index directly.
If you are using MyISAM, you can create a fulltext index on the keywords field. Then search using:
select * from keywords k where match(k.keywords) against('test');
Of course CSV in a database is just about the worst thing you can do. You should put keywords in a separate table. Make sure to use InnoDB for all tables.
Table tags
-------------
id integer auto_increment primary key
keyword_id integer foreign key references keywords(id)
keyword varchar(40)
Now you can select using:
SELECT k.* FROM keywords k
INNER JOIN tags t ON (t.keyword_id = k.id)
WHERE t.keyword LIKE 'test' -- case-insensitive comparison with the default collation
Much much faster than CSV.
You have 2 options:
1) Restructure your database: create an extra table called Keywords that includes a U_id column as a foreign key mapped to your user table. That way you can insert each keyword as its own row in the Keywords table and then search it using something like:
SELECT * FROM Keywords WHERE keyword LIKE '%KEYWORD%'
2) Fetch the keywords field, split the keywords into an array using your preferred language, and then search the array.
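The second option, splitting the field in application code, might look like the following sketch. Python is used for illustration; the function names and sample rows are hypothetical, and the rows stand for (id, keywords) pairs already fetched from the table:

```python
def parse_keywords(csv_field):
    """Split a comma-separated keywords field into trimmed, lower-cased words."""
    return [w.strip().lower() for w in csv_field.split(",") if w.strip()]


def rows_with_keyword(rows, keyword):
    """Return the ids of rows whose keyword list contains an exact match."""
    wanted = keyword.strip().lower()
    return [row_id for row_id, csv_field in rows if wanted in parse_keywords(csv_field)]


# Hypothetical (id, keywords) pairs as they might come back from the database.
rows = [
    (1, "php, mysql,search"),
    (2, "mysqli,php"),
    (3, "Search , sorting"),
]

print(rows_with_keyword(rows, "mysql"))   # [1]
print(rows_with_keyword(rows, "search"))  # [1, 3]
```

This gives exact matches (mysql does not match mysqli), but every row has to be fetched and parsed, so it only suits small tables.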

SQL search query like stackoverflow

Like when the user writes the article title in the input field, I want to search existing articles to see if there are similar ones.
For eg.
SQL search query like stackoverflow
I want to find the most relevant articles related to this title.
I know it's something like:
WHERE article_title LIKE 'word'
but how do I handle multiple keywords?
Use a fulltext index, which'd be something like:
SELECT ... FROM ... WHERE MATCH (fieldname) AGAINST ('keyword keyword keyword');
Or hack up the query to look like
SELECT ... FROM ... WHERE (fieldname LIKE '%keyword%' OR fieldname LIKE '%keyword%' etc...)
Of the two, the fulltext version will be faster, as it can use an index. The LIKE '%...%' version will be very expensive, as a wildcard search with a leading % cannot use an ordinary index at all. One caveat: before MySQL 5.6, fulltext indexes were only available on MyISAM tables; InnoDB supports them from 5.6 onward.
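For the LIKE variant with several keywords, the query can be generated from the words of the title, with one bound pattern per word. The sketch below also sums the individual matches to get a rough ranking. It runs against SQLite through Python's sqlite3 module purely as a stand-in for MySQL; the articles table, its sample rows and the find_similar_titles helper are assumptions made for the example:

```python
import sqlite3


def find_similar_titles(conn, title, limit=5):
    """Rank stored titles by how many words of `title` they contain."""
    # Ignore very short words; a real stop-word list would be better.
    words = [w for w in title.lower().split() if len(w) > 2]
    if not words:
        return []
    # One "(article_title LIKE ?)" term per word; each evaluates to 0 or 1.
    score = " + ".join("(article_title LIKE ?)" for _ in words)
    sql = (
        "SELECT article_title, matched FROM ("
        f"SELECT article_title, {score} AS matched FROM articles"
        ") AS scored WHERE matched > 0 "
        "ORDER BY matched DESC, article_title LIMIT ?"
    )
    # Note: % and _ inside a word are not escaped in this sketch.
    params = [f"%{w}%" for w in words] + [limit]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, article_title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO articles (article_title) VALUES (?)",
    [
        ("SQL search query like stackoverflow",),
        ("Efficient search query",),
        ("Full text search - tag system problem",),
        ("Sorting arrays in PHP",),
    ],
)

results = find_similar_titles(conn, "search query in SQL")
for row in results:
    print(row)
```

With the sample data this lists the title matching all three words first, then the one matching two, then the one matching one. It remains a full scan per query, so it illustrates the shape of the query rather than a fast implementation.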
You need to have full text search for that.
Make sure the table you want to search uses an engine that supports fulltext indexes: MyISAM, or InnoDB on MySQL 5.6 and later.
Have the following table
Table articles
--------------
id integer auto_increment PK
title varchar(255)
contents text
fulltext index on (title, contents)
And use a query like:
SELECT id
, MATCH(title, contents) AGAINST ('$title_of_article_thats_being_edited')
as relevance
FROM articles
WHERE MATCH(title, contents) AGAINST ('$title_of_article_thats_being_edited')
ORDER BY relevance DESC
Note that SO refines the list when you enter tags.
WHERE article_title LIKE '%word1%word2%'
will return all rows in which article_title contains 'word1' and 'word2' in this particular order

Full text search - tag system problem

I store tags in a varchar(255) column, in this format:
",keyword1,keyword2,keyword3,key word 324,keyword1234,"
(each keyword must start and end with a comma, i.e. comma, keyword123, comma)
-
I can find keyword3 with this SQL query:
select * from table1 where tags like '%,keyword3,%'
CREATE TABLE IF NOT EXISTS `table1` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`tags` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `tags` (`tags`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=2242 ;
INSERT INTO `table1` (`id`, `tags`) VALUES
(2222, ',keyword,'),
(2223, ',word is not big,'),
(2224, ',keyword3,'),
(2225, ',my keys,'),
(2226, ',hello,keyword3,thanks,'),
(2227, ',hello,thanks,keyword3,'),
(2228, ',keyword3,hello,thanks,'),
(2239, ',keyword3 but dont find,'),
(2240, ',dont find keyword3,'),
(2241, ',dont keyword3 find,');
(returns 2224,2226,2227,2228)
-
I need to change this LIKE query into a FULL TEXT search.
select * from table1 where match (tags) against (",keyword3," in boolean mode)
This SQL command also finds 2239, 2240 and 2241, but I don't want to match %keyword3% or a bare keyword3.
http://prntscr.com/137u9
Any ideas on how to find only ,keyword3, ?
thank you
You can't use full text search alone for this - it searches only for words. Here are a few different alternatives you could use:
You can use a full text search to quickly find candidate rows and then afterwards use a LIKE, as you are already doing, to filter out any false matches from the full text search.
You can use FIND_IN_SET.
You can normalize your database - store only one keyword per row.
INSERT INTO `table1` (`id`, `tag`) VALUES
(2222, 'keyword'),
(2223, 'word is not big'),
(2224, 'keyword3'),
(2225, 'my keys'),
(2226, 'hello'), -- // 2226 has three rows with one keyword in each.
(2226, 'keyword3'),
(2226, 'thanks'),
(2227, 'hello'),
-- etc...
Of those I'd recommend normalizing your database if it is at all possible.
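The first alternative, a word-level search followed by an exact filter, can be sketched outside the database to show why the second step is needed. In the Python sketch below the word-boundary regex only imitates what a full-text match on keyword3 would return, and has_exact_tag plays the role of the LIKE '%,keyword3,%' filter; both helper names are hypothetical, and the rows are a subset of the sample data from the question:

```python
import re

# Subset of the sample rows from the question: (id, tags).
rows = [
    (2222, ",keyword,"),
    (2224, ",keyword3,"),
    (2226, ",hello,keyword3,thanks,"),
    (2239, ",keyword3 but dont find,"),
    (2240, ",dont find keyword3,"),
    (2241, ",dont keyword3 find,"),
]


def word_match(tags_field, word):
    """Imitate a word-level full-text hit: the word appears anywhere as a whole word."""
    return re.search(rf"\b{re.escape(word)}\b", tags_field) is not None


def has_exact_tag(tags_field, tag):
    """True only if one comma-delimited entry is exactly `tag`."""
    return tag in [t.strip() for t in tags_field.strip(",").split(",")]


# Step 1: cheap candidate search (what the full-text index would give you).
candidates = [row for row in rows if word_match(row[1], "keyword3")]
# Step 2: exact filter to drop entries such as 'keyword3 but dont find'.
exact = [row_id for row_id, tags in candidates if has_exact_tag(tags, "keyword3")]

print([row_id for row_id, _ in candidates])  # [2224, 2226, 2239, 2240, 2241]
print(exact)                                 # [2224, 2226]
```

With one tag per row, as in the normalized layout, the second step disappears because an equality comparison is already exact.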
First of all FULL TEXT is intended to be used for text searches. So there are limitations to what you can do with it. To do what you want you need to check the Boolean Mode specifications and see if the " operator can help you, but even with this your searches may not be 100% accurate. You would need to impose a word format for your keywords (preferably no word delimiters inside them like ).
Is there a reason for storing all the tags in one row?
I would store each "tag" in a row then do as andreas suggests and do something like this:
SELECT * FROM table1 WHERE tag IN('keyword0', 'keyword1', 'etc.')
If you need, for some reason, to return all the tags in one row, you could store them individually and GROUP_CONCAT them together.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
