I'm counting the occurrence of the exact word "value". I have used this code, and the result is this:
description count
value value 2
valueses 1
val 0
I want the results to look like this, where a word that merely contains what I'm searching for as a substring is not counted. In this case the count for valueses is 0.
description count
value value 2
valueses 0
val 0
Any solutions would be much appreciated, thanks.
MySQL has regular-expression support built-in. That could be pretty useful for weeding out false-positives if you can chop up the strings first by your own criteria for the word boundaries:
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
If it were me, I'd want to make a stored function to apply boundaries and do reg-ex checks on the chunks all-at-once.
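To illustrate the word-boundary idea outside the database, here is a minimal Python sketch. The helper name count_exact_word and the sample rows are made up for this example, and \b is Python's word boundary, which is only a stand-in for whatever boundary criteria you settle on in MySQL.

```python
import re

def count_exact_word(text: str, word: str) -> int:
    """Count occurrences of `word` in `text` as a whole word only."""
    # \b matches at a word boundary, so "valueses" does not count as "value"
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))

# Sample descriptions taken from the question
rows = ["value value", "valueses", "val"]
counts = {row: count_exact_word(row, "value") for row in rows}
```

With the rows from the question, only "value value" gets a non-zero count.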
I have a table with a column that stores a random string like this:
example_id = qwhs77gt65g7*
Now some of the data in that column has an asterisk (*) while other values don't have one.
I need to select those that have one. I'm using this query:
SELECT example_id FROM example_tbl WHERE example_id LIKE '%*%'
Now this is usually not a problem, but I'm querying millions of rows, and as I understand it the LIKE operator is affecting my performance. It takes hours to complete the query.
My question is: what's the alternative to the LIKE operator?
PS: the asterisk is always at the end of the string. I don't know if that can help.
Since you mention that the asterisk is always at the end, you can try
WHERE example_id LIKE '%*'
This will find any values that end with "*".
OR
Another option is to search for a substring in the columns of the table.
One way to achieve this is to use the instr() function, which takes 3 parameters.
Syntax: instr( rank, string, sub_string )
rank: Integer expression giving the position of the first character in the string from which the substring search begins.
string: The text to search in.
sub_string: The substring you are looking for. instr() returns 0 if it does not find a match.
Now, how do you apply this to the table? As instr() is an x3 function, it is simple to apply.
e.g.: Filter[ZCT] Where instr(1,ALLOTEDTO,"Ram") <> 0
where 1 is the start position to look for the substring, ALLOTEDTO is the column name which holds the string, and the last parameter is the substring itself. This will give you all records from the table where the ALLOTEDTO column contains the substring "Ram",
which is equivalent to:
Select * from ZCT where ALLOTEDTO like '%Ram%'
Note: instr() is case sensitive, so always use the upper-case or lower-case function with it.
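For readers who want to see the described semantics in isolation, here is a small Python sketch. This instr is a hypothetical stand-in written for this example (1-based positions, 0 when nothing is found), not the real function, and it only mirrors the behaviour described above.

```python
def instr(rank: int, string: str, sub_string: str) -> int:
    """Hypothetical stand-in: 1-based position of sub_string, or 0 if absent."""
    # str.find() is 0-based and returns -1 on failure, so shift by one
    return string.find(sub_string, rank - 1) + 1

# The search is case sensitive, so normalise both sides first
found_exact = instr(1, "Ram Kumar", "Ram")
found_wrong_case = instr(1, "ram kumar", "Ram")
found_normalised = instr(1, "ram kumar".upper(), "Ram".upper())
```

The last call shows why the note recommends upper-casing (or lower-casing) both the column and the search term.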
So I'm doing a random MATCH AGAINST for phrases/words.
WHERE
MATCH (keywords.keyword) AGAINST ('$keyword*' IN BOOLEAN MODE)
I've got these phrases in the database:
how are you?
how will you?
When the search/$keyword is identical to one of the phrases, the other one still gets selected from time to time.
An identical match should be just that, identical, right?
Is it because the search is matching a single word and not the entire string/phrase?
I can't see how else to solve this. Any help is greatly appreciated! Thanks.
You need to learn about stop words and minimum word length. These are key parameters that control what "words" get indexed.
The stop word list consists of common words, such as "are", that are ignored in the index (and in searching).
The minimum word length is the length below which a word is not indexed. By default it is 3 for InnoDB (innodb_ft_min_token_size = 3) or 4 for MyISAM (ft_min_word_len = 4).
Both of these can be overridden, but you have to rebuild the index.
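To see why two different phrases can end up looking the same to the index, here is a rough Python simulation. The stop word set and minimum length below are illustrative assumptions, not MySQL's actual configuration, and the tokenizer is a simplification of what the server does.

```python
import re

# Illustrative values only: check your server's real stop word list and settings
STOP_WORDS = {"how", "are", "will"}
MIN_TOKEN_SIZE = 3

def indexed_tokens(phrase, stop_words=STOP_WORDS, min_size=MIN_TOKEN_SIZE):
    """Return the words that would survive stop-word and length filtering."""
    words = re.findall(r"[a-z0-9_]+", phrase.lower())
    return [w for w in words if len(w) >= min_size and w not in stop_words]

tokens_a = indexed_tokens("how are you?")
tokens_b = indexed_tokens("how will you?")
```

Under these assumed settings both phrases reduce to the same single token, which would explain why either row can come back for either search.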
I am in the process of learning MySQL and querying, and right now working with PHP to begin with.
For learning purposes I chose a small anagram solver kind of project to begin with.
I found a very old English language word list on the internet freely available to use as the DB.
I tried querying, find in set and full-text search matching but failed.
How can I:
Match a result letter by letter?
For example, let's say that I have the letters S-L-A-O-G to match against the database entry.
Since I have a large database which surely contains many words, I want to have in return of the query:
lag
goal
goals
slag
log
... and so on.
Without having any other results which might have a letter used twice.
How would I solve this with SQL?
Thank you very much for your time.
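$str_search = 'SLAOG';
// Select candidate words made only of the given letters
$sql = "SELECT word
FROM table_name
WHERE word REGEXP '^[{$str_search}]+$'"; // '^[SLAOG]+$'
// Filter the results in PHP afterwards
// Loop START (runs once for each fetched $row)
$valid = TRUE;
for ($i = 0; $i < strlen($row->word); $i++) {
    $h = substr($row->word, $i, 1); // current letter of the found word
    // Count the letter in the found word and in the search string (case-insensitive)
    preg_match_all("/{$h}/i", $row->word, $arr_matches);
    preg_match_all("/{$h}/i", $str_search, $arr_matches2);
    if (count($arr_matches[0]) > count($arr_matches2[0])) {
        $valid = FALSE; // Amount doesn't add up
        break;
    }
}
// $valid is TRUE only if no letter is used more often than in $str_search
// Loop END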
Basically, run a REGEXP with the given letters and then filter the result based on how many occurrences of each letter the word has compared with the search string.
The REGEXP checks each word, from beginning to end, against the set of given letters. This may return more rows than you need, but it works as a good first filter nonetheless.
The loop part filters out words where a letter is used more times than in the search string. I run preg_match_all() on each letter of the found word, against both the found word and the search string, to get the number of occurrences, and compare them with count().
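The same occurrence check can be sketched in Python with a letter counter. This is an illustration of the filtering step only; the function name can_build and the candidate list are made up for the example.

```python
from collections import Counter

def can_build(word: str, letters: str) -> bool:
    """True if `word` uses each letter no more often than `letters` provides."""
    available = Counter(letters.lower())
    needed = Counter(word.lower())
    return all(available[ch] >= n for ch, n in needed.items())

# Candidates as they might come back from the REGEXP query
candidates = ["lag", "goal", "goals", "slag", "log", "gag", "lasso"]
matches = [w for w in candidates if can_build(w, "SLAOG")]
```

Here "gag" and "lasso" pass the REGEXP (only allowed letters) but are dropped because they use a letter twice.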
If you want a quick and dirty solution....
Split the word you're trying to get anagrams for into individual letters. Assign each letter an individual prime number value, and multiply them all together; eg:
C - 2
A - 3
T - 5
For a total of 30
Then step through your dictionary list, and do the same operation on each word in that. If your target word's value is divisible exactly by the dictionary word's value, then you know that the dictionary word has only letters that occur in your target word.
You can speed it up by pre-calculating the dictionary values, and then querying for just the right values:
SELECT * FROM dictionary WHERE ($searchWordTotal % wordTotal) = 0
(searchWordTotal is the total for the word you're looking for, and wordTotal is the one from the database)
I should get around to writing this properly one of these days....
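A minimal Python sketch of the prime-product idea follows. The letter-to-prime mapping (a=2, b=3, c=5, ...) is an arbitrary choice for this example and differs from the C/A/T values above; any fixed mapping of letters to distinct primes works the same way.

```python
# First 26 primes, assigned to a..z (an arbitrary but fixed mapping)
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def word_value(word: str) -> int:
    """Multiply the primes of all letters in `word` (non-letters are ignored)."""
    total = 1
    for ch in word.lower():
        if "a" <= ch <= "z":
            total *= PRIMES[ord(ch) - ord("a")]
    return total

def fits(dictionary_word: str, search_word: str) -> bool:
    """True if the search word's value is exactly divisible by the dictionary word's."""
    return word_value(search_word) % word_value(dictionary_word) == 0

words = ["goal", "slag", "gag", "lasso"]
hits = [w for w in words if fits(w, "slaog")]
```

Repeated letters are handled automatically, because a letter used twice contributes its prime twice. Note that the products grow quickly, so a fixed-width integer column may overflow for long words.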
Since you only want words made of the given letters and no others, but you don't need to use all the letters, I suggest logic like this:
* take your candidate word,
* for each letter in your match set, do a string replace of its first occurrence in the candidate,
* replacing it with nothing (an empty string),
* then finally wrap all that in a string-length check to see if there are any characters left.
You can do all that in SQL, but a little procedure will probably look more familiar to most coders.
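As a small procedure, the steps above could look like this in Python. It is only a sketch; the names leftover and uses_only_given_letters are invented for the example.

```python
def leftover(candidate: str, letters: str) -> str:
    """Remove the first occurrence of each given letter from the candidate."""
    remaining = candidate.lower()
    for ch in letters.lower():
        # count=1 removes only the first occurrence, so duplicates survive
        remaining = remaining.replace(ch, "", 1)
    return remaining

def uses_only_given_letters(candidate: str, letters: str) -> bool:
    """True if nothing is left after removing the given letters."""
    return len(leftover(candidate, letters)) == 0

ok = uses_only_given_letters("goal", "SLAOG")
not_ok = uses_only_given_letters("gag", "SLAOG")
```

For "gag" one "g" is left over after the replacements, so the word is rejected.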
I have a list of words in a dropdown and I have a single word that is looking for a suitable partner (the user chooses it).
To make this easier for the user (because the list can be very long and the process has to be fast), I want to suggest a possible option.
I have already looked into how I can change the selected word.
I want to find the alphabetically "nearest" option, but I have no idea how I could find out which word is the nearest neighbor.
I already googled with all the words I could think of to find a solution, but I couldn't find anything.
Does someone have an idea how I can do it?
The Levenshtein function computes the 'closeness' of two strings. You could rank the words you have relative to the user's string and return the one with the lowest value.
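Here is a minimal sketch of that ranking in Python, using the standard dynamic-programming form of the Levenshtein distance. The option list is made up, and nearest assumes the list is not empty.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def nearest(word: str, options: list) -> str:
    """Return the option with the lowest distance to `word`."""
    return min(options, key=lambda option: levenshtein(word, option))

best = nearest("aple", ["apple", "banana", "cherry"])
```

The same logic ports directly to JavaScript if the ranking has to run in the browser.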
Have a look at this library; it contains fuzzy string matching functions for JavaScript, including stemming, Levenshtein distance and metaphones: http://code.google.com/p/yeti-witch/
If by alphabetically you mean matching letters read from the left, the answer is easy. Simply go through every letter of the word and compare it with the ones in the select drop down. The word that shares the longest starting substring is your "nearest".
The simplest (and probably fastest) thing in JavaScript is to find (by binary search) where the word would go in a sorted array of your option words, using the < and > string operators.
For more advanced and precise results, use Levenshtein distance.
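The binary-search idea can be sketched in Python with the bisect module. The tie-break between the two neighbours (longest shared prefix) is an assumption of this example, as are the function names; the option list must be sorted and non-empty.

```python
import bisect

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def nearest_alphabetical(word: str, sorted_options: list) -> str:
    """Pick the neighbour of the insertion point that shares the longest prefix."""
    i = bisect.bisect_left(sorted_options, word)
    # The only candidates are the entries just before and just after position i
    neighbours = sorted_options[max(i - 1, 0):i + 1]
    return max(neighbours, key=lambda option: common_prefix_len(word, option))

options = ["apple", "apricot", "banana", "cherry"]
pick = nearest_alphabetical("apri", options)
```

Only two comparisons are needed after the search, which keeps this fast even for long dropdown lists.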
I have been thinking about this all day and can't seem to figure out a memory-efficient and speedy way.
The problem is:
for example, I have these letters:
e f j l n r r t t u w x (12 letters)
I am looking for this word
TURTLE (6 letters)
How do I find all the possible words in the full range (12 letters) with PHP?
( Or with python, if that might be a lot easier? )
Things I've tried:
Using permutations: I generated all possible strings using a permutation algorithm, put them in an array (only the ones 6 chars long) and did an in_array to check whether any of them matches one of the words in my array of valid words (in this case containing TURTLE, but sometimes two or three words).
This calculation costs a lot of memory and time, especially with 6+ characters to get permutations of.
Creating a regex (I am bad at this): I wanted to create a regex to check if 6 of the 12 (input) characters are in a word from the "valid array". The problem is, we don't know which letter of the 12 will be in the starting position, or the positions of the other letters.
An example of this would be:
http://drawsomethingwords.net/
I hope you can help me with this problem, as I would really like to fix this.
Thanks for all of your time :)
I've encountered similar problems when writing a crossword editor (e.g., find all words of length 5 with a 'B' in second position). Basically it comes down to:
Process a word list and organize words by length (i.e., a list of all words of length 2, length 3, length 4, etc). The reason is that you often know the length of the word(s) that you wish to search for. If you want to search for words of unknown length, you can repeat a search again for a different word list.
Insert each separate word list into a ternary search tree, which makes searching for words a lot faster. Each node in the tree contains a character and you can descend the tree to search for words. There are also specialized data structures such as a trie, which I have not (yet) explored.
Now for your problem, you could use the search tree to write a search function such as
function findWords($tree, $letters) {
// ...
}
where tree is the search tree containing the words of the length that you wish to search for and letters is a list of valid characters. In your example, letters would be the string efjlnrrttuwx.
The search tree allows you to search for words, one character at a time, and you can keep track of characters that you have encountered so far. As long as these characters are in the list of valid letters, you keep searching. Once you've encountered a leaf node in the search tree, you have found an existing word which you can add to the result. If you encounter a character which is not in letters (or it has already been used), you can skip that word and continue the search elsewhere in the search tree.
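The descent described above can be sketched as follows. Note that this sketch uses a plain dictionary-based trie rather than a ternary search tree, simply because it is shorter to write; the names build_trie and find_words and the word list are assumptions of the example.

```python
from collections import Counter

END = "$"  # marker key meaning "a word ends at this node"

def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def find_words(trie, letters):
    """Return all words in the trie that can be spelled from `letters`."""
    available = Counter(letters)
    results = []

    def descend(node, prefix):
        if END in node:
            results.append(prefix)
        for ch, child in node.items():
            if ch == END:
                continue
            if available[ch] > 0:       # letter still unused?
                available[ch] -= 1      # use it on the way down
                descend(child, prefix + ch)
                available[ch] += 1      # give it back when backtracking

    descend(trie, "")
    return sorted(results)

# Word list of length 6, as suggested above (sample words only)
six_letter_words = ["turtle", "turret", "letter", "funnel", "tuxedo"]
found = find_words(build_trie(six_letter_words), "efjlnrrttuwx")
```

Branches are abandoned as soon as a letter is missing or used up, so the search never enumerates permutations of the input letters.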
My crossword editor Palabra contains an implementation of the above steps (a part is done in Python but mostly in C). It works fast enough for Ubuntu's default word list containing roughly 70K words.
There are probably better ways, but this is just off the top of my head:
I assume you have a database of words (i.e. a dictionary). Add fields a-z to the database table. Write a script that counts each letter in the word and writes the counts into the a-z fields as integers. For example, for balloon, the table would look like:
id  name     a  b  ...  l  ...  n  ...  o
1   balloon  1  1  ...  2  ...  1  ...  2
Then when the user enters a word, you calculate how many of each character are in that word and match that up with the database.
// User enters 'zqlamonrlob'
// You count the letters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 1 0 0 0 0 0 0 0 0 0 2 1 1 2 0 1 1 0 0 0 0 0 0 0 1
// Query the database
$sql = "SELECT `name` FROM `my_table` WHERE `a` <= {$count['a']} AND `b` <= {$count['b']} ...";
That would get you a list of words that use some or all of the letters that the user entered.
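To try the letter-count columns end to end, here is a sketch using Python's built-in sqlite3 module with an in-memory database. The table layout follows the description above; the sample words and helper names are assumptions, and a real setup would use your own database and word list.

```python
import sqlite3
import string
from collections import Counter

LETTERS = string.ascii_lowercase

def letter_counts(word):
    """Return 26 integers: how often each of a..z occurs in `word`."""
    counts = Counter(ch for ch in word.lower() if ch in LETTERS)
    return [counts[ch] for ch in LETTERS]

conn = sqlite3.connect(":memory:")
columns = ", ".join(f'"{ch}" INTEGER' for ch in LETTERS)
conn.execute(f"CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, {columns})")

# Pre-calculate the counts once per dictionary word
placeholders = ", ".join("?" for _ in LETTERS)
for word in ["balloon", "moral", "llama", "zebra"]:
    conn.execute(f"INSERT INTO my_table VALUES (NULL, ?, {placeholders})",
                 [word] + letter_counts(word))

def find_candidates(conn, letters):
    """Select words whose per-letter counts all fit within `letters`."""
    where = " AND ".join(f'"{ch}" <= ?' for ch in LETTERS)
    rows = conn.execute(f"SELECT name FROM my_table WHERE {where} ORDER BY name",
                        letter_counts(letters))
    return [name for (name,) in rows]

found = find_candidates(conn, "zqlamonrlob")
```

Binding the counts as parameters avoids building the numbers into the SQL string by hand.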
Here's a regex, just to show it can (but not necessarily should) be done:
preg_match('/^(?:t()|u()|r()|t()|l()|e()|.)+$\1\2\3\4\5\6/i', 'efjlnrrttuwx')
matches.
How does it work? Empty capturing parentheses always match if the preceding letter matches. The backreferences at the end of the regex make sure that each of the characters has participated in the match. Therefore,
preg_match('/^(?:t()|u()|r()|t()|l()|e()|.)+$\1\2\3\4\5\6/i', 'efjlnrrtuwx')
(correctly) will not match because there is only one t in the string but the regex needs two different ts.
The problem is that of course the regex engine has to check many permutations to arrive at this conclusion. While a successful match may be quite fast (175 steps of the regex engine in the first case), an unsuccessful match attempt can be expensive (3816 steps in the second case).
I think you need to approach this problem from the opposite direction.
Loop through your list of words, testing the words with the specified number of characters, to see if the word characters are in the specified character set.