Finding words in a huge MySQL database - PHP

So, I've never worked with a database this huge. We are talking about 200,000,000+ words that I want to be able to search through. How should I approach this? Using a normal WHERE statement would take 10+ minutes. Should I split up the database or something?
Any help would be great!

MySQL FULLTEXT indexes are quite useful when searching for words. You have to define FULLTEXT indexes on the fields that contain the relevant text/character data. Then you can use
SELECT * FROM table WHERE MATCH (text_index_field) AGAINST ('what you need to look for');

You should use MySQL FULLTEXT indexing.
Use ALTER TABLE to create a FULLTEXT index on your desired column.
And from http://dev.mysql.com/doc/refman/5.1/en/alter-table.html:
Full-text indexes can be used only with MyISAM tables. (In MySQL 5.6 and up, they can also be used with InnoDB tables.) Full-text indexes can be created only for CHAR, VARCHAR, or TEXT columns.
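As a minimal sketch (the table name articles and column name body are placeholders for your own schema), creating and then querying the index looks like this:

ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);

SELECT * FROM articles
WHERE MATCH (body) AGAINST ('what you need to look for');

Note that building a FULLTEXT index on a table with hundreds of millions of rows can take a long time and a fair amount of disk space, so plan for that up front.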

Related

PHP/MySQL: match / against escaping stopword

Here's my query:
SELECT * from description WHERE (match(description) AGAINST ( '+will +smith' in boolean mode))
I'm aware that "will" is a stopword; that's why I'm getting an empty result.
How can I still use both words for this query? Do I need to escape it in some kind of way?
There isn't a way to "escape" a stopword for a given search. Think of it this way: when creating the fulltext index, MySQL skips words that are stopwords. That is, those words are not stored in the fulltext index. So you can't subsequently escape the word in a given search and have it magically appear in the fulltext index, since it wasn't included when the index was created.
Assuming you are using fulltext search with InnoDB, the solution is apparently to define your own table storing stopwords. Then you can put a customized set of words into the table, and use the configuration variable innodb_ft_server_stopword_table to make your instance of MySQL use your custom table before creating your fulltext index. This way, the word you want to be indexed will be included as it builds the fulltext index.
See https://dev.mysql.com/doc/refman/8.0/en/fulltext-stopwords.html
But this is a global variable, so it will affect all fulltext index creation on all tables on that MySQL instance. I suppose you could set the innodb_ft_server_stopword_table to your custom table, build your fulltext index, and then set the option back to its usual value. But that would be tricky, because anytime you rebuild your fulltext index (for instance during an alter table or optimize table), it would revert to the default stopwords.
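As a sketch of that approach (the database name mydb and table name my_stopwords are hypothetical; per the linked documentation the stopword table must be an InnoDB table with a single VARCHAR column named value, created before the fulltext index):

CREATE TABLE mydb.my_stopwords (value VARCHAR(30)) ENGINE = InnoDB;
-- custom stopword list that deliberately does NOT contain 'will'
INSERT INTO mydb.my_stopwords (value) VALUES ('a'), ('an'), ('the');
SET GLOBAL innodb_ft_server_stopword_table = 'mydb/my_stopwords';
-- the index is built using the custom list, so 'will' gets indexed
ALTER TABLE description ADD FULLTEXT (description);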

Is there a way to compress a MySQL column where values repeat very often?

I have an InnoDB table with a VARCHAR column that contains tens of thousands of instances of the same text. Is there a way to compact it on the fly in order to save space? Is some kind of INDEX enough?
Can't InnoDB see that the values are the same, and use less space by internally assigning them some ID or whatever?
If the task is as simple as it seems, then what you are looking for is normalisation.
In simple terms, make this column hold foreign keys to another table that stores the actual values. New values get a row in that other table; when a value already exists there, you reuse its key instead of storing the text again. With this relation between the tables, a huge amount of space is saved in your original table.
I suggest you read up on redundancy and normalisation.
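A minimal sketch of that normalisation, with hypothetical table and column names (items.label standing in for your repetitive VARCHAR column):

-- lookup table that holds each distinct text value exactly once
CREATE TABLE labels (
    label_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    label    VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_label (label)
) ENGINE = InnoDB;

-- the original table now stores a small integer instead of repeating the text
ALTER TABLE items
    ADD COLUMN label_id INT UNSIGNED NOT NULL,
    ADD CONSTRAINT fk_items_label FOREIGN KEY (label_id) REFERENCES labels (label_id);

-- reading the text back is a simple join
SELECT i.*, l.label
FROM items AS i
JOIN labels AS l ON l.label_id = i.label_id;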
Hope it solves your problem.
You can use the MySQL ENUM data type. It stores the values as indexes, but on SELECT you see the text value.
Here is the documentation:
http://dev.mysql.com/doc/refman/5.7/en/enum.html
A downside is that not all databases support the ENUM type, so you may find that a problem if some day you decide to switch databases.
There are also some other limitations, pointed out here:
http://dev.mysql.com/doc/refman/5.7/en/enum.html#enum-limits
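For illustration only (the table and value list are hypothetical), an ENUM column is declared with its full set of allowed strings up front:

CREATE TABLE posts (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    status ENUM('draft', 'published', 'archived') NOT NULL  -- stored internally as 1, 2, 3
) ENGINE = InnoDB;

INSERT INTO posts (status) VALUES ('published');
SELECT status FROM posts;  -- returns the text 'published', not the internal index

Keep in mind that the list of allowed values is fixed at table creation time; adding a new value later means running an ALTER TABLE.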

MySql - how can you create a unique constraint on a combination of two values in two columns

I have a problem creating the index described in the answer to this question: sql unique constraint on a 2 columns combination
I am using MySQL, and I get a syntax error. My version of the query is as follows:
CREATE UNIQUE INDEX ON friends (LEAST(userID, friendID), GREATEST(userID, friendID));
The LEAST and GREATEST functions are available in MySQL, but maybe the syntax should be different?
I also tried an ALTER TABLE version, but it did not work either.
In MySQL (prior to 8.0.13, which added functional key parts), you can't use functions as the values for indexes.
The documentation does not explicitly state this; however, it is a basic characteristic of an index to only support "fixed" data:
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows.
Generally, this "fixed" data is an individual column/field; with string-fields (such as varchar or text) you can have a prefix-index and not the entire column. Check out CREATE INDEX for more info on that.
The unique index that you're trying to create in your example will only ever have a single record; that's not really a beneficial index, since it doesn't help for searching the entire table. However, if you index your table on (userID, friendID), a SELECT statement that uses the LEAST() and GREATEST() functions can still be optimized thanks to that index, so it may be what you're after in this case.
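As a workaround not covered by the answer above (so treat it as a sketch, and it assumes userID and friendID are INT columns), MySQL 5.7+ lets you materialise LEAST/GREATEST into generated columns and put the unique index on those; MySQL 8.0.13+ also accepts functional key parts directly:

-- MySQL 5.7+: generated columns plus a unique index on them
ALTER TABLE friends
    ADD COLUMN pair_lo INT AS (LEAST(userID, friendID)) STORED,
    ADD COLUMN pair_hi INT AS (GREATEST(userID, friendID)) STORED,
    ADD UNIQUE INDEX uq_friend_pair (pair_lo, pair_hi);

-- MySQL 8.0.13+ alternative: functional key parts (each expression in its own parentheses)
CREATE UNIQUE INDEX uq_friend_pair ON friends ((LEAST(userID, friendID)), (GREATEST(userID, friendID)));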

table with about 1 million unique keywords utf_unicode performance

I have a table with 1 million unique keywords in all languages, stored in utf_unicode format. Lately I have been having problems with selects, with each one taking up to 1 second. This is really causing a slowdown in the queries.
The structure of the keyword table is (keyword_id, keyword, dirty): keyword_id is the primary key, keyword has a unique index, and dirty has a simple index. keyword is a VARCHAR with 20 chars max, and dirty is a boolean.
The problems occur when selecting with keyword in the WHERE clause. How can I speed this table up?
I am using MySQL with PHP.
SAMPLE QUERY
SELECT k_id
FROM table
WHERE keyword = "movies"
Have you considered using a MEMORY table instead of MyISAM? In my experience it goes 10 times faster than MyISAM. You'll just need another table to rebuild from if the server crashes. Also, instead of VARCHAR use CHAR(20). This will make the table a fixed-row format and MySQL will be able to find its result much faster.
If your keywords are unique and you aren't doing any similarity/LIKE queries, then you can create a hash index. That would guarantee a single-row lookup.
The minor disadvantage is that a hash index may take up more space than a regular (B-tree based) index.
References:
https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html
https://dev.mysql.com/doc/refman/8.0/en/create-index.html
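Tying the two suggestions together, a rough sketch (the table names keyword_cache and keywords are hypothetical; explicit HASH indexes only apply to MEMORY/NDB tables, the cache has to be repopulated after a server restart, and its size is capped by max_heap_table_size):

CREATE TABLE keyword_cache (
    keyword_id INT UNSIGNED NOT NULL,
    keyword    CHAR(20)     NOT NULL,
    PRIMARY KEY (keyword_id),
    UNIQUE KEY uq_keyword (keyword) USING HASH
) ENGINE = MEMORY;

-- populate from the on-disk keyword table
INSERT INTO keyword_cache (keyword_id, keyword)
SELECT keyword_id, keyword FROM keywords;

-- the equality lookup now hits the hash index
SELECT keyword_id FROM keyword_cache WHERE keyword = 'movies';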

phpMyAdmin: how to make a table field support full-text search with an SQL statement?

I have a database that already has 300,000+ rows in it. Now I want to add a full-text search function. I want to update my table field myassac.title to support full-text search. How do I do that in phpMyAdmin with an SQL statement? Thanks.
As long as the myassac table is a MyISAM table (InnoDB only supports fulltext indexes in MySQL 5.6 and later), it's a simple matter of
ALTER TABLE myassac ADD FULLTEXT (title);
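Once the index exists, the column can be searched with MATCH ... AGAINST, for example (the search string is just a placeholder):

SELECT * FROM myassac WHERE MATCH (title) AGAINST ('your search terms');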
