Optimizing MySQL full-text search - PHP

I want to implement a full-text search on my site, with pagination. My database has 50,000+ rows per table. I have altered my table and added an index on (title, content, date). The table is updated constantly; there is also an id column that auto-increments, and the latest date is always at the end of the table.
date varchar(10)
title text
content text
but the whole query takes 1.5+ seconds. I have read many articles via Google; some say that limiting the indexed field's word length can make the search faster, but a TEXT column cannot be given a length like that (I tried ALTER TABLE table_1 CHANGE `title` `title` TEXT(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL, which did not work):
date varchar(10)
title text(500)
content text(1000)
So, excluding Sphinx and third-party scripts, how can I optimize the full-text search with SQL alone? The query code is here:
(SELECT
title,content,date
FROM table_1
WHERE MATCH (title,content,date)
AGAINST ('+$Search' IN BOOLEAN MODE))
UNION
(SELECT
title,content,date
FROM table_2
WHERE MATCH (title,content,date)
AGAINST ('+$Search' IN BOOLEAN MODE))
ORDER BY date DESC
Thanks.

Based on the question's follow-up comments, you have a BTREE index on your columns rather than a FULLTEXT index.
For MATCH (title, content) AGAINST (...) to work, you would need:
CREATE FULLTEXT INDEX index_name ON tbl_name (title,content);
I'm not sure it'll accept the date field in there (the latter is probably not relevant anyway).
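As a sketch of applying this to the two tables from the question (assuming they really are named table_1 and table_2, and that the index name ft_title_content is free to use):
-- One FULLTEXT index per table, covering the columns used in MATCH():
CREATE FULLTEXT INDEX ft_title_content ON table_1 (title, content);
CREATE FULLTEXT INDEX ft_title_content ON table_2 (title, content);
-- Verify that Index_type now shows FULLTEXT rather than BTREE:
SHOW INDEX FROM table_1;
-- The MATCH() column list must then match the index definition exactly:
SELECT title, content, date
FROM table_1
WHERE MATCH (title, content) AGAINST ('+$Search' IN BOOLEAN MODE);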

Here is a comprehensive plan for optimizing MySQL for FULLTEXT indexing as thoroughly as possible.
The first thing you should do is get rid of the stopword list.
This has annoyed people over the years, many being unaware that over 600 words are excluded from a FULLTEXT index by default.
Here is a tabular view of those stopwords.
There are two ways to bypass this.
Bypass Option 1) Create a custom stopword list.
You can actually submit to mysql a list of your preferred stopwords. Here is the default:
mysql> show variables like 'ft%';
+--------------------------+----------------+
| Variable_name            | Value          |
+--------------------------+----------------+
| ft_boolean_syntax        | + -><()~*:""&| |
| ft_max_word_len          | 84             |
| ft_min_word_len          | 4              |
| ft_query_expansion_limit | 20             |
| ft_stopword_file         | (built-in)     |
+--------------------------+----------------+
5 rows in set (0.00 sec)
OK, now let's create our stopword list. I usually set the English articles as the only stopwords:
echo "a" > /var/lib/mysql/stopwords.txt
echo "an" >> /var/lib/mysql/stopwords.txt
echo "the" >> /var/lib/mysql/stopwords.txt
Next, add the options to /etc/my.cnf, also allowing 1-letter, 2-letter, and 3-letter words:
[mysqld]
ft_min_word_len=1
ft_stopword_file=/var/lib/mysql/stopwords.txt
Finally, restart mysql
service mysql restart
If you have any tables with FULLTEXT indexes already in place, you must drop those FULLTEXT indexes and create them again.
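As a sketch of that rebuild, assuming a MyISAM table named articles with a FULLTEXT index named ft_idx over (title, content) (both names are examples, not from the question):
-- Drop and recreate the FULLTEXT index so it is rebuilt with the new
-- ft_min_word_len and stopword settings:
ALTER TABLE articles DROP INDEX ft_idx;
ALTER TABLE articles ADD FULLTEXT ft_idx (title, content);
-- For MyISAM tables, REPAIR TABLE ... QUICK also rebuilds the indexes in place:
REPAIR TABLE articles QUICK;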
Bypass Option 2) Recompile the source code
The filename is storage/myisam/ft_static.c. Just alter the C structure that holds the 600+ words so that it is empty. Have fun recompiling!
Now that the FULLTEXT config is solidified, here is another major aspect to consider:
Write properly refactored queries so that the MySQL Query Optimizer works right!
What I am now mentioning is really undocumented: whenever you perform queries that do JOINs and the WHERE clause contains the MATCH function for FULLTEXT searching, it tends to make the MySQL Query Optimizer treat the query like a full table scan when it comes to searching the columns involved in the FULLTEXT index. If you plan to query a table using a FULLTEXT index, ALWAYS refactor your query to have the FULLTEXT search return only keys in a subquery and connect those keys to your main table, as sketched below. Otherwise, the FULLTEXT index will put the MySQL Query Optimizer in a tailspin.
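A minimal sketch of that refactoring, assuming a table articles with an integer primary key id, a date column, and a FULLTEXT index on (title, content) (all names are placeholders, not from the question):
-- Let the FULLTEXT search return only primary keys in a derived table,
-- then join those keys back to the base table for the remaining columns.
SELECT a.id, a.title, a.content, a.date
FROM (
    SELECT id
    FROM articles
    WHERE MATCH (title, content) AGAINST ('+search_term' IN BOOLEAN MODE)
) AS hits
INNER JOIN articles AS a ON a.id = hits.id
ORDER BY a.date DESC
LIMIT 0, 20;   -- pagination as in the original question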

For further ideas regarding full-text search optimization in MySQL, see How to optimize MySQL Boolean Full-Text Search? (Or what to replace it with?) - C#

Related

SQL query is very slow for certain parameters (MySQL)

I am making a PHP backend API which executes a query on MySQL database. This is the query:
SELECT * FROM $TABLE_GAMES WHERE
($GAME_RECEIVERID = '$userId' OR $GAME_OTHERID = '$userId')
ORDER BY $GAME_ID LIMIT 1
Essentially, I'm passing $userId as a parameter and getting the row with the smallest $GAME_ID value. It returns a result in less than 100 ms for users that have around 30,000 matching rows in the table. However, I have since added new users that have fewer than 100 matching rows, and the query is painfully slow for them, taking around 20-30 seconds every time.
I'm puzzled as to why the query is so much slower in situations where it is supposed to return a small number of rows, and extremely fast when it returns a huge number of rows, especially since I have ORDER BY.
I have read about parameter sniffing, but as far as I know that's a SQL Server thing, and I'm using MySQL.
EDIT
Here is the SHOW CREATE statement:
CREATE TABLE `games` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `SenderID` int(11) NOT NULL,
  `ReceiverID` int(11) NOT NULL,
  `OtherID` int(11) NOT NULL,
  `Timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`ID`)
) ENGINE=MyISAM AUTO_INCREMENT=17275279 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Here is the output of EXPLAIN:
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | games | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |    19.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
I tried a prepared statement, but I'm still getting the same result.
Sorry for poor formatting, I'm still noob at this.
You need to use EXPLAIN to analyse the performance of the query.
i.e.
EXPLAIN SELECT * FROM $TABLE_GAMES WHERE
($GAME_RECEIVERID = '$userId' OR $GAME_OTHERID = '$userId')
ORDER BY $GAME_ID LIMIT 1
EXPLAIN provides information about how the SELECT will be executed (the execution plan).
It is a great tool for identifying slowness in a query. Based on the information it returns, you can create indexes for the columns used in the WHERE clause:
CREATE INDEX index_name ON table_name (column_list)
This would definitely increase the performance of the query.
Your query is slow because it cannot find a matching record fast enough. With users where a lot of rows match, the chances of finding a record to return are much higher, all other things being equal.
That behavior appears when $GAME_RECEIVERID and $GAME_OTHERID aren't part of an index, prompting MySQL to use the index on $GAME_ID because of the ordering. However, since newer players have not played the early games, there are literally millions of rows that won't match, but have to be checked nonetheless.
Unfortunately, this is bound to get worse even for old users, as your database grows. Ideally, you will add indexes on $GAME_RECEIVERID and $GAME_OTHERID - something like:
ALTER TABLE games
ADD INDEX receiver (ReceiverID),
ADD INDEX other (OtherID)
PS: Altering a 17-million-row table is going to take a while, so make sure to do it during a maintenance window or similar if this is used in production.
Is this the query after the interpolation? That is, is this what MySQL will see?
SELECT * FROM GAMES
WHERE RECEIVERID = '123'
OR OTHERID = '123'
ORDER BY ID LIMIT 1
Then this will run fast, regardless:
SELECT *
FROM GAMES
WHERE ID = LEAST(
( SELECT MIN(ID) FROM GAMES WHERE RECEIVERID = '123' ),
( SELECT MIN(ID) FROM GAMES WHERE OTHERID = '123' )
);
But, you will need both of these:
INDEX(RECEIVERID, ID),
INDEX(OTHERID, ID)
Your version of the query is scanning the table until it finds a matching row. My version will
make two indexed lookups;
fetch the other columns for the one row.
It will be the same fast speed regardless of how many rows there are for USERID.
(Recommend switching to InnoDB.)
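One caveat about the LEAST() version above, offered as an assumption to verify rather than something from the answer: if a user appears only as RECEIVERID or only as OTHERID, one of the MIN(ID) subqueries returns NULL, and LEAST(NULL, x) is NULL, so no row comes back. A defensive variant that still uses both indexes:
SELECT g.*
FROM GAMES AS g
WHERE g.ID = (
    -- MIN() ignores NULLs, so a user with rows on only one side still matches.
    SELECT MIN(id) FROM (
        SELECT MIN(ID) AS id FROM GAMES WHERE RECEIVERID = '123'
        UNION ALL
        SELECT MIN(ID)       FROM GAMES WHERE OTHERID    = '123'
    ) AS candidates
);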

Advanced search in a MySQL column with words separated by commas

Hello everyone, as the topic says, I am looking for an alternative to, or a more advanced use of, "LIKE".
I have a column which contains a list of words, e.g. "keyword1,keyword2,another_keyword", and when I use
$sql = mysql_query("SELECT * FROM table WHERE `column` LIKE '%keyword1%' ");
it hardly finds it. E.g. this example works, but when I try to find shorter strings it has problems and sometimes does not find anything.
I tried putting whitespace after the commas and it helped, but if there is a way to search for a match with this kind of column, I would be happy.
You may move the keywords into an individual table.
Or you can use the SET field type, if the list of your keywords doesn't change; see the sketch below.
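A small sketch of the SET-column idea (table, column, and keyword names are made up for illustration):
-- A SET column stores up to 64 predefined keywords per row.
CREATE TABLE tagged_rows (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    tags SET('keyword1', 'keyword2', 'another_keyword') NOT NULL DEFAULT ''
);
INSERT INTO tagged_rows (tags) VALUES ('keyword1,another_keyword');
-- FIND_IN_SET matches one member exactly, unlike LIKE '%keyword1%':
SELECT * FROM tagged_rows WHERE FIND_IN_SET('keyword1', tags) > 0;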
Storing a comma-separated list of your words is a bad idea; for example, using LIKE in your scenario makes it hard to find the exact word in the comma-separated list. Instead you can add a new table which relates to your current table and store each word in a new row with the associated identity, like:
table1

id   title
1    test1
2    test2

kewords_table

table1_id   word
1           word1
1           word2
1           word3
and query will be
select t.*
from table1 t
join kewords_table k
on(t.id = k.table1_id)
where k.word = 'your_keyword'
If you can't alter your structure, you can use find_in_set():
SELECT * FROM table WHERE find_in_set('your_keyword',`column`) > 0
try something like this:
SELECT * FROM tablename
WHERE column LIKE '%keyword1%'
OR column LIKE '%keyword2%';
for more info see here: Using SQL LIKE and IN together
MySQL allows you to perform a full-text search based on very complex queries in Boolean mode, along with Boolean operators. This is why full-text search in Boolean mode is suited to experienced users.
First you have to add a FULLTEXT index to that particular column:
ALTER TABLE table_name ADD search_column TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, ADD FULLTEXT search_column (search_column);
Then run the following query to search:
SELECT * FROM table WHERE MATCH(search_column) AGAINST("keyword1")
for more info see here : https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html
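As a rough sketch of the Boolean-mode operators that the linked page documents (the table and column names follow the example above):
-- +word  must be present, -word must be absent,
-- word*  matches a prefix, "a b" matches an exact phrase.
SELECT *
FROM table_name
WHERE MATCH (search_column)
      AGAINST ('+keyword1 -keyword2 "exact phrase" pref*' IN BOOLEAN MODE);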

MySQL Like function

On my HTML page, the user has the option to either enter a text string, tick checkbox options, or do both. This data is then placed inside a MySQL query which displays the data.
The fact that the user is allowed to enter a string means that I am using the LIKE function in the mysql query.
Correct me if I am wrong, but I believe the LIKE function can slow the query down a lot.
In relation to the above statement, I would like to know whether an empty string in the LIKE function would make a difference, so for example:
select * from hello;
select * from hello where name like "%%";
If it does make a significant difference (I believe this database will keep growing larger), what are your ideas on how to deal with this?
My first idea was that I would have two queries:
one with the LIKE functionality,
and one without the LIKE functionality. Depending on what the user enters, the correct query will be called.
So for example, if the user leaves the search box empty, the LIKE function is not needed; the page therefore sends a null value, and an if statement selects the other option (without the LIKE functionality) when it sees the null value.
Is there a better way of doing this?
In general, the LIKE function will be slow unless it begins with a fixed string and the column has an index. If you do LIKE 'foo%', it can use the index to find all rows that begin with foo, because MySQL indexes use B-trees. But LIKE '%foo' cannot make use of an index, because B-trees only optimize looking for prefixes; this has to do a sequential scan of the entire table.
And even when you use the version with a prefix, the performance improvement depends on how much that prefix reduces the number of rows that have to be searched. If you do LIKE 'foo%bar', and 90% of your rows begin with foo, this will still have to scan 90% of the table to test whether they end with bar.
Since LIKE '%%' doesn't have a fixed prefix, it will perform a full scan of the table, even though there isn't actually anything to search for. It would be best if your PHP script tested whether the user provided a search string, and omitted the LIKE test if there's nothing to search for.
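A small sketch of the difference, assuming the question's hello table and an index on name (the index name is made up):
CREATE INDEX idx_hello_name ON hello (name);
EXPLAIN SELECT * FROM hello WHERE name LIKE 'foo%';   -- can use a range scan on idx_hello_name
EXPLAIN SELECT * FROM hello WHERE name LIKE '%foo';   -- full scan, the leading wildcard defeats the index
EXPLAIN SELECT * FROM hello WHERE name LIKE '%%';     -- full scan, matches every non-NULL name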
I believe the LIKE function can slow the query down a lot
I would expect that not to be the case. How hard would it be to test it?
Regardless of which version of the query you run, the DBMS still has to examine every row in the table. That will require some extra work by the CPU, but for large tables, disk I/O will be the limiting factor. LIKE '%%' will discard rows with null values, hence potentially reducing the amount of data the DBMS needs to retain in the result set / transfer to the client, which may be a significant saving.
As Barbar says, providing an expression without a leading wildcard will allow the DBMS to use an index (if one is available) which will have a big impact on performance.
It's hard to tell from your question (you didn't provide much in the way of example queries/data, nor any detail of what the application does), but the solution to your problem might be full-text indexing (sketched below).
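As a sketch of that alternative, again assuming the question's hello table (MyISAM, or InnoDB from MySQL 5.6 onward):
-- FULLTEXT matches whole words rather than arbitrary substrings.
ALTER TABLE hello ADD FULLTEXT ft_name (name);
SELECT * FROM hello WHERE MATCH (name) AGAINST ('searchterm');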
Using the World sample database from the MySQL software distribution, I first ran a simple EXPLAIN on queries with and without WHERE clauses that have no filtering effect:
mysql> explain select * from City;
mysql> explain select * from City where true;
mysql> explain select * from City where Name = Name;
In these first three cases, the result is as follows:
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | City  | ALL  | NULL          | NULL | NULL    | NULL | 4080 |       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
While for the last query, I got the following:
mysql> explain select * from City where Name like "%%";
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | City  | ALL  | NULL          | NULL | NULL    | NULL | 4080 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
You can see that for this particular query, the where condition was not optimized away.
I also performed a couple of measurements, to check whether there would indeed be a noticeable difference, but:
the table having only 4080 rows, I used a self cross join to produce longer computation times;
I used HAVING clauses to cut down on display overhead (1).
Measurement results:
mysql> select c1.Name, c2.Name from City c1, City c2 where concat(c1.Name,c2.Name) = concat(c1.Name,c2.Name) having c1.Name = "";
Empty set (5.22 sec)
The above query, as well as one with true or c1.Name = c1.Name, performed essentially the same, within less than a 0.1 sec margin.
mysql> reset query cache;
mysql> select c1.Name, c2.Name from City c1, City c2 where concat(c1.Name,c2.Name) like "%%" having c1.Name = "";
Empty set (13.80 sec)
This one also took around the same amount of time when run several times (in between query cache resets) (2).
Clearly the query optimizer doesn't see an opportunity for the latter case. The conclusion is that you should avoid using that clause as much as possible, even if it doesn't change the result set.
(1): since HAVING-clause filtering happens after the data has been consolidated by the query, I assumed it shouldn't change the actual query computation load ratio.
(2): interestingly, I initially tried a simple where c1.Name like "%%", and got around 5.0 sec timing results, which led me to try a more elaborate clause. I don't think that result changes the overall conclusion; it could be that in that very specific case, the filtering actually has a beneficial effect. Hopefully a MySQL guru will explain that result.

Possible way to find Username from 1,000,000 Users entries

I have a DB of 100,000 users in MySQL. In that DB I have the columns ID, username, Fname, Lname, etc.
When the URL is www.example.com/Jim or www.example.com/123 (where Jim is the username and 123 is the ID in the users table),
I am using the MySQL query: select * from users where ID = 123 OR username = 'Jim'
I am executing above query in PHP.
The output of the above query is:
+-----+----------+-------+--------+
| ID  | Username | fname | lname  |
+-----+----------+-------+--------+
| 123 | jim      | Jim   | Jonson |
+-----+----------+-------+--------+
My problem is that it takes a huge amount of time to select by username or ID in the DB.
I have used the following query:
SELECT * FROM `users` USE INDEX (UsersIndexId) WHERE id = 123
Is this the right way to use the index?
EXPLAIN SELECT * FROM `users` WHERE ID = 327
Output:
+----+-------------+-------+-------+----------------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys        | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+----------------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | users | const | PRIMARY,UsersIndexId | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+----------------------+---------+---------+-------+------+-------+
I suggest you take a look at this: How MySQL Uses Indexes
Quoting from the first paragraph:
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially.
That should help speed up your search.
(Edit: Updated the link to a newer version of the SQL docs)
PS: More specifically, column indexes might be what you want.
You can find more info about adding indexes here: Create Index Syntax
To complete @Kjartan's answer, you can try the following:
ALTER TABLE users ADD INDEX id_i (`ID`);
ALTER TABLE users ADD INDEX username_i (`Username`);
Your queries should be faster.
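With both indexes in place, a quick way to check whether they are actually being used for the OR condition (the index_merge union plan is what you would hope to see; the exact plan depends on the optimizer and your data):
EXPLAIN SELECT * FROM users WHERE ID = 123 OR username = 'Jim';
-- With separate indexes on ID and Username, MySQL can combine them with an
-- index_merge (union) plan instead of scanning the whole table.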

Detecting spammers with MySQL

I see an ever-increasing number of users signing up on my site just to send duplicate SPAM messages to other users. I've added some server-side code to detect duplicate messages with the following MySQL query:
SELECT count(content) as msgs_sent
FROM messages
WHERE sender_id = '.$sender_id.'
GROUP BY content having count(content) > 10
The query works well, but now they're getting around this by changing a few characters in their messages. Is there a way to detect this with MySQL, or do I need to look at each grouping returned from MySQL and then use PHP to determine the percentage of similarity?
Any thoughts or suggestions?
Fulltext Match
You could look at implementing something similar to the MATCH example here:
mysql> SELECT id, body, MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root') AS score
-> FROM articles WHERE MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)
So for your example, perhaps:
SELECT id, MATCH (content) AGAINST ('your string') AS score
FROM messages
WHERE MATCH (content) AGAINST ('your string') > 1;
Note that to use these functions, your content column would need a FULLTEXT index.
What is score in this example?
It is a relevance value. It is computed through the process described below:
Every correct word in the collection and in the query is weighted
according to its significance in the collection or query.
Consequently, a word that is present in many documents has a lower
weight (and may even have a zero weight), because it has lower
semantic value in this particular collection. Conversely, if the word
is rare, it receives a higher weight. The weights of the words are
combined to compute the relevance of the row.
From the documentation page.
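A hedged sketch of how this could be wired into the duplicate-spam check from the question (the index name, the example sender_id, and the relevance threshold are assumptions to tune, not values from the answer):
-- One-time setup: messages.content needs a FULLTEXT index
-- (MyISAM, or InnoDB from MySQL 5.6 onward).
ALTER TABLE messages ADD FULLTEXT ft_content (content);
-- For each new message, look for earlier messages from the same sender
-- that score as highly similar, instead of requiring exact duplicates:
SELECT id,
       MATCH (content) AGAINST ('text of the incoming message') AS score
FROM messages
WHERE sender_id = 42
  AND MATCH (content) AGAINST ('text of the incoming message') > 5
ORDER BY score DESC
LIMIT 10;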
