Full text search - tag system problem - php

I store tags in a VARCHAR(255) column, in this format:
",keyword1,keyword2,keyword3,key word 324,keyword1234,"
(each keyword must start and end with a comma: commakeyword123comma)
-
I can find keyword3 with a SQL query like this:
select * from table1 where tags like '%,keyword3,%'
CREATE TABLE IF NOT EXISTS `table1` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`tags` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `tags` (`tags`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=2242 ;
INSERT INTO `table1` (`id`, `tags`) VALUES
(2222, ',keyword,'),
(2223, ',word is not big,'),
(2224, ',keyword3,'),
(2225, ',my keys,'),
(2226, ',hello,keyword3,thanks,'),
(2227, ',hello,thanks,keyword3,'),
(2228, ',keyword3,hello,thanks,'),
(2239, ',keyword3 but dont find,'),
(2240, ',dont find keyword3,'),
(2241, ',dont keyword3 find,');
(The LIKE query above returns 2224, 2226, 2227 and 2228.)
-
I have to replace this LIKE query with a FULLTEXT search:
select * from table1 where match (tags) against (",keyword3," in boolean mode)
This query also finds 2239, 2240 and 2241, but I don't want to match %keyword3% or a bare keyword3.
http://prntscr.com/137u9
Any ideas on how to match only ,keyword3, ?
Thank you.

You can't use full text search alone for this - it searches only for words. Here are a few different alternatives you could use:
You can use a full text search to quickly find candidate rows and then afterwards use a LIKE, as you are already doing, to filter out any false matches from the full text search.
You can use FIND_IN_SET.
You can normalize your database - store only one keyword per row.
INSERT INTO `table1` (`id`, `tag`) VALUES
(2222, 'keyword'),
(2223, 'word is not big'),
(2224, 'keyword3'),
(2225, 'my keys'),
(2226, 'hello'), -- 2226 has three rows with one keyword in each.
(2226, 'keyword3'),
(2226, 'thanks'),
(2227, 'hello'),
-- etc...
Of those I'd recommend normalizing your database if it is at all possible.
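The normalized layout can be sketched as follows. This is only an illustration, not the answerer's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name item_tags and the helper names are assumptions made up for the example. The rows are the ones from the question, split into one tag per row.

```python
import sqlite3

# Rows taken from the question: (id, comma-wrapped tag string).
ROWS = [
    (2222, ",keyword,"),
    (2223, ",word is not big,"),
    (2224, ",keyword3,"),
    (2225, ",my keys,"),
    (2226, ",hello,keyword3,thanks,"),
    (2227, ",hello,thanks,keyword3,"),
    (2228, ",keyword3,hello,thanks,"),
    (2239, ",keyword3 but dont find,"),
    (2240, ",dont find keyword3,"),
    (2241, ",dont keyword3 find,"),
]


def build_tag_table(rows):
    """Create a hypothetical item_tags table holding one (item_id, tag) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item_tags ("
        " item_id INTEGER NOT NULL,"
        " tag TEXT NOT NULL,"
        " PRIMARY KEY (item_id, tag))"
    )
    for item_id, tags in rows:
        # Drop the leading/trailing commas, then split into individual tags.
        for tag in tags.strip(",").split(","):
            conn.execute(
                "INSERT INTO item_tags (item_id, tag) VALUES (?, ?)",
                (item_id, tag),
            )
    return conn


def ids_with_tag(conn, tag):
    """Return the ids whose tag is exactly equal to the given tag."""
    cur = conn.execute(
        "SELECT item_id FROM item_tags WHERE tag = ? ORDER BY item_id", (tag,)
    )
    return [row[0] for row in cur]


conn = build_tag_table(ROWS)
print(ids_with_tag(conn, "keyword3"))  # [2224, 2226, 2227, 2228]
```

Because the comparison is a plain equality on one tag per row, 2239, 2240 and 2241 are not returned, and an ordinary index on the tag column can serve the lookup.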

First of all, FULLTEXT is intended for text searches, so there are limits to what you can do with it. To do what you want, check the Boolean mode specification and see whether the " (phrase) operator can help; even then, your searches may not be 100% accurate. You would also need to impose a word format on your keywords (preferably with no word delimiters inside them).

Is there a reason for storing all the tags in one row?
I would store each "tag" in a row then do as andreas suggests and do something like this:
SELECT * FROM table1 WHERE tag IN('keyword0', 'keyword1', 'etc.')
If you need, for some reason, to return all the tags in one row, you could store them individually and GROUP_CONCAT them together.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

Related

MySQL - How to check if a value exists before appending to a TEXT Field?

On duplicated record, I want to update the record by appending a string to the TEXT column in table, on the condition that the appending value does not already exist in that TEXT column.
I have come so far with my query
INSERT INTO events (event_id, event_types)
VALUES ("1", "partyEvent")
ON DUPLICATE KEY UPDATE event_types = CONCAT(event_types, ",testEvent")
Is there such a check in MySQL, or is it necessary to fetch the record and do the comparison myself in PHP?
It looks like event_types is a denormalized field, containing a comma-separated sequence of text strings. With respect, this is a notorious database design antipattern. The next programmer to work on your code will be very unhappy indeed.
I'll answer your question even though it pains me.
First of all, how can you tell whether a particular string occurs within a comma-separated set of text strings? FIND_IN_SET() can do this.
FIND_IN_SET('testEvent', event_types)
returns a value greater than zero if 'testEvent' shows up in the column.
So we can use it in your event_types = clause. If FIND_IN_SET comes up with a positive number, you want event_types = event_types, that is, an unchanged value. If not, you want what you have in your question. How to do this? Use IF(condition,trueval,falseval). Try this:
...UPDATE event_types = IF(FIND_IN_SET('testEvent', event_types) > 0,
event_types,
CONCAT(event_types, ',', 'testEvent'))
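The rule described above (leave the value unchanged when the item is already present, append it otherwise) can be sketched outside the database. This is a plain-Python illustration, not MySQL's implementation: the function names are made up for the example, and the emulation ignores collation, so unlike MySQL under a case-insensitive collation it compares case-sensitively.

```python
def find_in_set(needle, csv):
    """Return the 1-based position of needle in a comma-separated string, or 0.

    A simplified emulation of FIND_IN_SET for illustration only.
    """
    if csv == "":
        return 0
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


def append_if_absent(csv, value):
    """Append value to the comma-separated string unless it is already an item."""
    if find_in_set(value, csv) > 0:
        return csv  # already present: keep the column unchanged
    return csv + "," + value


print(append_if_absent("partyEvent", "testEvent"))            # partyEvent,testEvent
print(append_if_absent("partyEvent,testEvent", "testEvent"))  # partyEvent,testEvent
```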
There's a much better way however. Make a new table called event_types defined like this.
CREATE TABLE event_types (
event_id INT(11) NOT NULL,
event_type VARCHAR(50) NOT NULL,
PRIMARY KEY (event_id, event_type)
)
This has a compound primary key, meaning it cannot have duplicate event_type values for any particular event_id.
Then you will add a type to an event using this query:
INSERT IGNORE INTO event_types (event_id, event_type)
VALUES (1, 'testEvent');
The IGNORE tells MySQL to be quiet if there's already a duplicate.
If you must have your event types comma-separated for processing by some program, this aggregate query with GROUP_CONCAT() will produce them.
SELECT e.event_id, GROUP_CONCAT(t.event_type ORDER BY t.event_type) event_types
FROM events e
LEFT JOIN event_types t ON e.event_id = t.event_id
GROUP BY e.event_id
You can find all the events with a particular type like this.
SELECT event_id FROM event_types WHERE event_type='testEvent'
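To see the compound key and the duplicate-ignoring insert working together, here is a small sketch. It runs against an in-memory SQLite database as a stand-in for MySQL, so two things differ and are assumptions of the example: SQLite spells the statement INSERT OR IGNORE rather than INSERT IGNORE, and the helper names add_type and types_for are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_types ("
    " event_id INTEGER NOT NULL,"
    " event_type TEXT NOT NULL,"
    " PRIMARY KEY (event_id, event_type))"
)


def add_type(conn, event_id, event_type):
    """Add a type to an event; a duplicate (event_id, event_type) is skipped."""
    # SQLite's equivalent of MySQL's INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO event_types (event_id, event_type) VALUES (?, ?)",
        (event_id, event_type),
    )


def types_for(conn, event_id):
    """Return the event's types, rebuilt from a comma-separated aggregate."""
    row = conn.execute(
        "SELECT group_concat(event_type) FROM event_types WHERE event_id = ?",
        (event_id,),
    ).fetchone()
    # group_concat yields NULL when there are no rows; sort in Python so the
    # result does not depend on the aggregate's internal ordering.
    return sorted(row[0].split(",")) if row[0] else []


add_type(conn, 1, "partyEvent")
add_type(conn, 1, "testEvent")
add_type(conn, 1, "testEvent")  # duplicate: rejected by the primary key, no error
print(types_for(conn, 1))  # ['partyEvent', 'testEvent']
```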
Pro tip: Comma separated: bad. Normalized: good.
Don't worry, we've all made this design mistake once or twice.

MySQL Combining FULLTEXT with a LIKE Fallback

I'm building my app to use a single search table for searching all different object types ie: posts, pages, products etc., based on this article.
My table layout looks like so:
CREATE TABLE IF NOT EXISTS myapp_search_index (
id int(11) unsigned NOT NULL,
language_id int(11) unsigned NOT NULL,
`type` varchar(24) COLLATE utf8_unicode_ci NOT NULL,
object_id int(11) unsigned NOT NULL,
`text` text COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (id,language_id),
FULLTEXT KEY `text.fdx` (`text`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
My search query looks like so:
$items = $db->escape($query);
$query = $db->query("
SELECT *,
SUM(MATCH(text) AGAINST('+{$items}' IN BOOLEAN MODE)) as score
FROM {$db->prefix}search_index
WHERE MATCH(text) AGAINST('+{$items}' IN BOOLEAN MODE)
GROUP BY language_id, type, object_id
ORDER BY score DESC
LIMIT " . (int)$start . ", " . (int)$limit . "
");
This works great except where we run into fulltext limitations like stop words and min word length.
For instance I have 2 entries in the table for my About Us page, one holds the page title, and one holds the content of the page.
Running the query about us returns no results as about is a stop word, and us is less than the minimum 4 letters.
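The effect can be illustrated with a few lines of Python. This is only a sketch of the filtering idea, not MySQL's actual tokenizer: the stop word set is a tiny made-up subset, and the minimum length of 4 simply mirrors the limit mentioned above.

```python
# Tiny illustrative subset of a stop word list; the real built-in list is much longer.
STOPWORDS = {"about", "the", "and", "for"}
MIN_WORD_LEN = 4  # assumed to mirror the 4-letter minimum mentioned above


def indexable_terms(query, stopwords=STOPWORDS, min_len=MIN_WORD_LEN):
    """Return the words of a query that would survive the two filters."""
    terms = []
    for word in query.lower().split():
        if len(word) < min_len or word in stopwords:
            continue  # too short, or a stop word: never reaches the index
        terms.append(word)
    return terms


print(indexable_terms("about us"))        # []
print(indexable_terms("about our team"))  # ['team']
```

With nothing left to search for, the full-text query has no term that can match, which is why the fallback is needed.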
So, my thought was to create a conditional fallback query using a traditional LIKE parameter as such:
if (!$query->num_rows):
$query = $db->query("
SELECT *
FROM {$db->prefix}search_index
WHERE text LIKE '%{$items}%'
GROUP BY language_id, type, object_id
ORDER BY id DESC
LIMIT " . (int)$start . ", " . (int)$limit . "
");
endif;
And once again this works fine. My About Us page now comes up just fine in the results.
But what I'd like is to run this all in one query and maintain the score somehow.
Is this possible?
EDIT:
Ok so in response to Michael's answer and comments I've changed my query to this:
SELECT *,
SUM(MATCH(text) AGAINST('{$search}' IN BOOLEAN MODE)) as score
FROM {$db->prefix}test_index
WHERE (
MATCH(text) AGAINST('{$search}' IN BOOLEAN MODE)
AND text LIKE '%{$search}%')
OR text LIKE '%{$search}%'
GROUP BY language_id, type, object_id
ORDER BY score DESC
I set up a test table with 100K rows, 50K of which do contain my lorem ipsum search term.
This queries the entire table and returns results in 0.6379 microseconds without any query caching as of yet.
Can anyone tell me if this seems like a fair compromise?
Play around with natural language mode too with multi-word:
SELECT id,prod_name, match( prod_name )
AGAINST ( '+harpoon +article' IN NATURAL LANGUAGE MODE) AS relevance
FROM testproduct
ORDER BY relevance DESC
We often just go with Solr integration, throwing JSON, CSV and text files at it.
There is not a way to elegantly combine fulltext search and LIKE together to get more results.
This is because the two predicates would have to be combined with an OR, which would in turn mean a full table scan (or full index scan if a suitable BTREE exists) is required to test the LIKE expression. All rows would have to be evaluated, which would remove any optimization you're getting from the fulltext search.
In the opposite situation, combining MATCH and LIKE using AND instead of OR -- in cases where the fulltext match returns insufficiently precise matches -- works great because the optimizer uses the fulltext index to find all possible matching rows, then filters the identified rows against the LIKE expression. (Fulltext indexes are almost always preferred by the optimizer, when other possible query plans exist.) Unfortunately, that's the opposite of what you need.

Advanced search in a MySQL column containing words separated by commas

Hello everyone. As the title says, I am looking for an alternative to, or a more advanced use of, LIKE.
I have a column that contains a list of words, e.g. "keyword1,keyword2,another_keyword", and I query it with
$sql = mysql_query("SELECT * FROM table WHERE `column` LIKE '%keyword1%' ");
This example works, but the query is unreliable: when I search for shorter strings it has problems, and sometimes it does not find anything.
Putting a space after the commas helped, but I would be happy if there were a proper way to search a column in this format.
You may move keywords into individual table.
Or you can use SET field type, if the list of your keywords don't change.
Storing a comma-separated list of words is a bad idea: with LIKE it is hard to find an exact word in the list. Instead, you can add a new table related to your current table and store each word in its own row with the associated id, like this:
table1
id title
1 test1
2 test2
kewords_table
table1_id word
1 word1
1 word2
1 word3
and query will be
select t.*
from table1 t
join kewords_table k
on(t.id = k.table1_id)
where k.word = 'your_keyword'
If you can't alter your structure you can use find_in_set()
SELECT * FROM table WHERE find_in_set('your_keyword',`column`) > 0
try something like this:
SELECT * FROM tablename
WHERE column LIKE '%keyword1%'
OR column LIKE '%keyword2%';
For more info, see: Using SQL LIKE and IN together
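One caveat with a plain substring LIKE is that it also matches longer keywords that merely contain the search term. If the table structure cannot change, a common workaround is to wrap both the column and the keyword in commas before comparing. The sketch below shows the difference using an in-memory SQLite database as a stand-in for MySQL. The table name items, its sample rows (apart from the first list, which comes from the question) and the helper names are assumptions made for the example. SQLite concatenates with ||, where MySQL would use CONCAT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, keywords TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO items (id, keywords) VALUES (?, ?)",
    [
        (1, "keyword1,keyword2,another_keyword"),  # list from the question
        (2, "keyword10,other"),                    # illustrative row
        (3, "other,keyword1"),                     # illustrative row
    ],
)


def substring_match(conn, keyword):
    """Plain LIKE '%keyword%': matches any row containing the text anywhere."""
    sql = "SELECT id FROM items WHERE keywords LIKE '%' || ? || '%' ORDER BY id"
    return [r[0] for r in conn.execute(sql, (keyword,))]


def whole_item_match(conn, keyword):
    """Wrap column and keyword in commas so only complete list items match."""
    sql = (
        "SELECT id FROM items "
        "WHERE ',' || keywords || ',' LIKE '%,' || ? || ',%' ORDER BY id"
    )
    return [r[0] for r in conn.execute(sql, (keyword,))]


print(substring_match(conn, "keyword1"))   # [1, 2, 3]
print(whole_item_match(conn, "keyword1"))  # [1, 3]
```

Note that _ and % inside a keyword are still treated as LIKE wildcards here, so this remains a workaround, not a substitute for a separate keywords table.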
MySQL allows you to perform a full-text search based on very complex queries in the Boolean mode along with Boolean operators. This is why the full-text search in Boolean mode is suitable for experienced users.
First, you have to add a FULLTEXT index to that particular column:
ALTER TABLE table_name ADD search_column TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, ADD FULLTEXT search_column (search_column);
Run the following query to search:
SELECT * FROM table WHERE MATCH(search_column) AGAINST("keyword1")
for more info see here : https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

What's wrong with these SQL statements?

Problem 1: Using the SQL CREATE TABLE statement, create a table, MOVSTARDIR, with attributes for the movie number, star number, and director number and the 4 acting awards. The primary key is the movie number, star number and director number (all 3), with referential integrity enforced. The director number is the director for that movie, and the star must have appeared in that movie.
Load MOVSTARDIR (from existing tables) using INSERT INTO.
My answer:
CREATE TABLE MOVSTARDIR
(MVNUM SHORT NOT NULL, STARNUM SHORT NOT NULL, DIRNUM SHORT NOT NULL, BESTF TEXT, BESTM TEXT, SUPM TEXT, SUPF TEXT)
ALTER TABLE MOVSTARDIR
ADD PRIMARY KEY (MVNUM,STARNUM,DIRNUM)
INSERT INTO MOVSTARDIR
SELECT MOVIE.MVNUM,STAR.STARNUM,DIRECTOR.DIRNUM... BESTF,BESTM,SUPM,SUPF
FROM MOVSTAR, DIRECTOR, MOVIE
WHERE MOVSTAR.MVNUM=MOVIE.MVNUM
AND MOVIE.DIRNUM=DIRECTOR.DIRNUM
It's giving me an error saying something is wrong with the "create table" statement, and it highlights the word "alter" in the SQL statement. Also, how do I add referential integrity?
Problem 2: List the directors in MOVSTARDIR with the total awards won from the 4 award categories included in the table. List the director name (not number), and the count in each of the 4 categories and the sum for all 4 categories. Group the report by the director name (i.e. one line per director, each director appears once), and order it by the sum (descending). Only show lines where the sum is more than 3.
SELECT DISTINCT DIRNAME, COUNT(BESTF) AS BESTFE, COUNT(BESTM) AS BESTML,
COUNT(SUPM) AS SUPML, COUNT(SUPF) AS SUPFE,
(COUNT(BESTM) + COUNT(BESTF) + COUNT(SUPM) + COUNT(SUPF)) AS TOTAL
FROM MOVSTARDIR, DIRECTOR
WHERE MOVSTARDIR.DIRNUM=DIRECTOR.DIRNUM
AND ((BESTM IS NOT NULL) OR (BESTF IS NOT NULL) OR (SUPM IS NOT NULL)
OR (SUPF IS NOT NULL))
GROUP BY DIRNAME
HAVING (COUNT(BESTM) + COUNT(BESTF) + COUNT(SUPM) + COUNT(SUPF)) > 3
ORDER BY (COUNT(BESTM) + COUNT(BESTF) + COUNT(SUPM) + COUNT(SUPF)) DESC
The problem with this is that it lists all records, not just wins.
If the database is needed, I can send it by email.
For Problem 1:
If you are using mysql, the query for create should be as follows
CREATE TABLE `MOVSTARDIR` (
`MVNUM` SMALLINT NOT NULL ,
`STARNUM` SMALLINT NOT NULL ,
`DIRNUM` SMALLINT NOT NULL ,
`BESTF` TEXT NOT NULL ,
`BESTM` TEXT NOT NULL ,
`SUPM` TEXT NOT NULL ,
`SUPF` TEXT NOT NULL
);
You're missing the semicolon after each of the statements, causing Access to treat the entire text as one statement.
Your tags show MySQL, SQL Server and SQL. The syntax of the SQL can vary according to the RDBMS.
Assuming you are using MySQL, these are the issues with your query.
a. Data type - There is no SHORT in MySQL. You can use SMALLINT.
b. You need to add a semicolon after each SQL statement.
Even if you are using another RDBMS, you need to refer to the corresponding SQL manual and verify that you specify the exact data types.
Access doesn't allow running a batch of queries, only one at a time.
So run the CREATE TABLE first, then the ALTER, and so on.

Search algorithms or tool for searching from database

I have this database table:
Column Type
source text
news_id int(12)
heading text
body text
source_url tinytext
time timestamp
news_pic char(100)
location char(128)
tags text
time_created timestamp
hits int(10)
Now I was searching for an algorithm or tool to perform a search for a keyword in this table which contains news data. Keyword should be searched in heading,body,tags and number of hits on the news to give best results.
MySQL already has the tool you need built-in: full-text search. I'm going to assume you know how to interact with MySQL using PHP. If not, look into that first. Anyway ...
1) Add a full-text index covering the fields you want to search. For a natural-language search, the column list in match() must exactly match the column list of a FULLTEXT index, so create one index over all three text columns:
alter table TABLE_NAME add fulltext(heading, body, tags);
2) Use a match ... against statement to perform a full-text search:
select * from TABLE_NAME where match(heading, body, tags) against ('SEARCH_STRING');
Obviously, substitute your table's name for TABLE_NAME and your search string for SEARCH_STRING in these examples.
I don't see why you'd want to search the number of hits, as it's just an integer (and an integer column cannot be part of a full-text index). You could sort by number of hits, however, by adding an order by clause to your query:
select * from TABLE_NAME where match(heading, body, tags) against ('SEARCH_STRING') order by hits desc;
