MySQL MATCH AGAINST query with special characters in the search phrase - php

I have the following MATCH ... AGAINST query, which searches records from a database table based on a search phrase.
SELECT * FROM My_Table WHERE MATCH (catchall) AGAINST ('"horse"' IN BOOLEAN MODE)
This query works properly. However, when the search phrase contains special characters such as '(', it simply skips them.
If I search for "(horse)", I get the same result as for "horse".
SELECT * FROM My_Table WHERE MATCH (catchall) AGAINST ('"(horse)"' IN BOOLEAN MODE)
Does this mean a MATCH ... AGAINST query doesn't work with special characters, or am I missing something? Please suggest. Thanks.
I tried removing IN BOOLEAN MODE from the query, but that didn't help.

From the documentation:
Parentheses group words into subexpressions. Parenthesized groups can be nested.
If you want to treat parentheses as word characters, there are two possibilities:
If you want to change the set of characters that are considered word
characters, you can do so in two ways. Suppose that you want to treat
the hyphen character ('-') as a word character. Use either of these
methods:
Modify the MySQL source: In myisam/ftdefs.h, see the true_word_char()
and misc_word_char() macros. Add '-' to one of those macros and
recompile MySQL.
Modify a character set file: This requires no recompilation. The
true_word_char() macro uses a “character type” table to distinguish
letters and numbers from other characters. You can edit the
contents in one of the character set XML files to specify
that '-' is a “letter.” Then use the given character set for your
FULLTEXT indexes.
After making the modification, you must rebuild the indexes for each
table that contains any FULLTEXT indexes.
A third way would be to not use MATCH ... AGAINST at all and use LIKE instead, but this might get complicated (if you want to use the other full-text search operators such as + and -) and slow down your query.

Related

Search for special chars in mysql full text search

I am writing a search function in PHP and allowing BOOLEAN search, but when I enter text containing characters like #, the query fails.
For example, when I search for #everyone, it throws an error.
I tried to solve this by adding double quotes, but it doesn't work as expected: the search for #everyone now runs, but it returns rows containing both everyone and #everyone.
I would like to know how to search for words containing special characters in MySQL full-text search.
Here's my query (simplified):
SELECT * FROM messages WHERE MATCH(body) AGAINST ('#everyone' IN BOOLEAN MODE)
By default, MySQL does not treat '#' as a valid word character. If you want it to be treated as one, review the documentation on the subject.
After you have made the changes, you will need to rebuild your index.
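To see why #everyone and everyone return the same rows, here is a rough Python model of how the full-text parser splits text into words. It is only a sketch: it assumes word characters are letters, digits and underscores, and it ignores stopwords, minimum word length and apostrophe handling. The function name tokenize is made up for illustration.

```python
import re

def tokenize(text):
    """Rough model: keep runs of letters, digits and underscores, drop the rest."""
    return re.findall(r"\w+", text.lower())

print(tokenize("#everyone"))  # ['everyone']
print(tokenize("(horse)"))    # ['horse']
```

Under this model the '#' never reaches the index, so quoting the phrase cannot bring it back; only changing the set of word characters (and rebuilding the index) can.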

Slightly complex regex to match a negative look behind followed by an exact phrase

So I have the following regex:
(?<!\.)\b([\w\#\-]+) *\b(IN|NOT IN|LIKE|NOT LIKE|BETWEEN|REGEXP|NOT|IS|XOR)+\b *
which I'm looking to help me match some SQL code.
However, it looks like I'm going to have a problem with the phrases in the second bracket e.g. 'NOT IN' and 'NOT LIKE'
I need a regex that will either match or not match (no partial matches like the way my current regex works).
customers.id NOT IN (SELECT MAX(customers_service.customer_id)) should not match at all
customers.id NOT LIKE (SELECT MAX(customers_service.customer_id)) should not match at all
id NOT IN (SELECT MAX(customers_service.customer_id)) should match
id IN (SELECT MAX(customers_service.customer_id)) should match
I was using RegexBuddy to check and I get matches for No. 1 and No. 2 using my regex.
Also,
id NOT IN (SELECT MAX(customers_service.customer_id)) only matches id NOT, as opposed to id NOT IN
id NOT LIKE (SELECT MAX(customers_service.customer_id)) only matches id NOT, as opposed to id NOT LIKE
I'd like to modify this regex to capture the condition of the negative look behind, and also the exact phrases in the second bracket, or match nothing at all (no partials).
How can I get this done?
First, \b does not match the beginning or end of a word. That's how it's always described, but it's a lie. What \b matches is a position that's followed by a word character but not preceded by one--(?=\w)(?<!\w)--or preceded by a word character and not followed by one--(?<=\w)(?!\w). If those conditions are not exactly what you want to match, you're probably better off not using \b at all.
The names you're trying to match apparently can contain # and - as well as the standard "word" characters (letters, digits and underscores), so word boundaries are useless. In general, to make sure you match a complete word, you would use a negative lookbehind and a negative lookahead:
(?<![\w#-])[\w#-]+(?![\w#-])
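A small Python check of both points, using a made-up sample string: \b finds no boundary between a space and '#', while the lookaround version picks out whole names.

```python
import re

text = "a #id-x b"

# \b needs a word character on exactly one side of the position;
# between ' ' and '#' there is none, so this pattern cannot match.
print(re.search(r"\b#id-x\b", text))  # None

# Lookarounds that spell out which characters count as part of a name.
name = re.compile(r"(?<![\w#-])[\w#-]+(?![\w#-])")
names = name.findall(text)
print(names)  # ['a', '#id-x', 'b']
```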
In your case, you also want to make sure the preceding character isn't ., and you know the following character has to be whitespace, so that part of your regex would be:
(?<![.\w#-])[\w#-]+\s+
The bigger problem is that this can also match things you don't want it to--i.e., keywords like NOT and IN. I suggest two remedies. First, tighten up the regex for the keywords so compound keywords like NOT IN and NOT LIKE are treated as atomic units:
(?:NOT(?:\s+(?:IN|LIKE))?|IN|LIKE|BETWEEN|REGEXP|IS(?:\s+NOT)?|XOR)\b
Second, use that in a lookahead to make sure the first word you match is not (part of) a keyword. Here's the full regex, split into two lines for readability:
(?<![.\w#-])(?!(?:NOT(?:\s+(?:IN|LIKE))?|IN|LIKE|BETWEEN|REGEXP|IS(?:\s+NOT)?|XOR)\b)[\w#-]+\s+
(?:NOT(?:\s+(?:IN|LIKE))?|IN|LIKE|BETWEEN|REGEXP|IS(?:\s+NOT)?|XOR)\b\s*
You can make it easier to maintain by defining a subroutine group for the keywords. Here's how that might look as a PHP string literal:
'~
(?(DEFINE)(?<KEYWORD>
(?:NOT(?:\s+(?:IN|LIKE))?|IN|LIKE|BETWEEN|REGEXP|IS(?:\s+NOT)?|XOR)\b
))
(?<![.\w#-])(?!(?&KEYWORD))[\w#-]+\s+(?&KEYWORD)\s*
~ix'
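The four test strings from the question can be checked with a short Python sketch. Python's re module has no (?(DEFINE)...) or (?&NAME) subroutine calls, so this version assumes plain string concatenation as a stand-in for the KEYWORD group; the names KEYWORD, PATTERN and find_condition are illustrative only.

```python
import re

# Keyword alternation, with compound keywords treated as atomic units.
KEYWORD = r"(?:NOT(?:\s+(?:IN|LIKE))?|IN|LIKE|BETWEEN|REGEXP|IS(?:\s+NOT)?|XOR)\b"

# Name not preceded by '.', not itself a keyword, followed by a keyword.
PATTERN = re.compile(
    r"(?<![.\w#-])(?!" + KEYWORD + r")[\w#-]+\s+" + KEYWORD + r"\s*",
    re.IGNORECASE,
)

def find_condition(sql):
    """Return the matched 'name KEYWORD ' text, or None if nothing matches."""
    m = PATTERN.search(sql)
    return m.group(0) if m else None

SUBQUERY = "(SELECT MAX(customers_service.customer_id))"

print(find_condition("customers.id NOT IN " + SUBQUERY))    # None
print(find_condition("customers.id NOT LIKE " + SUBQUERY))  # None
print(find_condition("id NOT IN " + SUBQUERY))              # 'id NOT IN '
print(find_condition("id IN " + SUBQUERY))                  # 'id IN '
```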
Your wording's a bit confusing, but as I understand, the negative lookbehind is working as you'd expect.
For the "partial match" problem, you just have to order your keywords by decreasing length:
(?<!\.)\b([\w\#\-]+) *\b(NOT LIKE|BETWEEN|REGEXP|NOT IN|LIKE|NOT|IN|IS|XOR)+\b *
This way it attempts to capture "more complete" keywords before settling for shorter ones.
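The effect of alternation order is easy to reproduce in Python. The helper below is a hypothetical convenience that sorts a keyword list longest-first before joining it; it assumes the keywords contain no regex metacharacters.

```python
import re

def first_match(pattern, text):
    """Return the text matched at the start of the string."""
    return re.match(pattern, text).group(0)

# Alternatives are tried left to right; the first one that succeeds wins.
print(first_match(r"NOT|NOT IN", "NOT IN (1, 2)"))  # NOT
print(first_match(r"NOT IN|NOT", "NOT IN (1, 2)"))  # NOT IN

def alternation(keywords):
    """Join keywords with '|', longest first."""
    return "|".join(sorted(keywords, key=len, reverse=True))

print(alternation(["NOT", "IN", "NOT IN"]))  # NOT IN|NOT|IN
```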
Edit
I see what's going on, now. In the case of
customers.id NOT IN (SELECT MAX(customers_service.customer_id))
the reason there's a match is that NOT is being matched by (?<!\.)\b([\w\#\-]+), and IN is being matched as the operator. In other words, it thinks NOT is a column name.
The only way to get around this is to add a constraint. For example, if you know the string always begins with a table/column identifier, then do this:
^\s*([\w\#\-]+) *\b(NOT LIKE|BETWEEN|REGEXP|NOT IN|LIKE|NOT|IN|IS|XOR)+\b *
No need for a lookbehind nor a word boundary, this way.
If you cannot make that constraint though, then it's tricky, if not completely impractical (since you'd basically have to build an SQL parser out of regex). The key is to give your regular expression some way of distinguishing identifiers from operators; otherwise it can't tell. If you know all your identifiers are lower-case, that might work for your purposes, though flimsy.
Ok then. So after much "regexing", here's the regex that did the trick for me:
(?<=\s)(?!(?:not|is)(?=\s))([\w\#\-]+)(?=\s) (?<=\s)(NOT LIKE|NOT IN|IS NOT|BETWEEN|REGEXP|LIKE|XOR|NOT|IN|IS)(?=\s)
Of course in my preg function I would use a case-insensitive pattern modifier.
I had to find the other pieces from other questions I posted here on StackOverflow.
Cheers.

Why when I create a fulltext index does my match / against query return no results?

I've created a FULLTEXT index ...
ALTER TABLE pads ADD FULLTEXT search (Keywords, ProgramName, English45)
ProgramName is a VARCHAR; however, even if I leave it out of the index I still get no results. In my list of indexes, Cardinality is 1 for this index.
Here's the query I'm using.
select PadID from Pads WHERE MATCH(keywords,ProgramName,English45)
AGAINST('games')
However, this is my goal.
select PadID from Pads WHERE MATCH(keywords,ProgramName,English45)
AGAINST('games') AND RemovemeDate = '2001-01-01 00:00:00'
ORDER BY VersionAddDate DESC
Here are my Pads table fields.
I need my query to return rows where the word occurs in any of the three fields.
I guess you can try
MATCH(keywords,ProgramName,English45)
AGAINST('games' in boolean mode)
default search behavior
A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators. The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match. Full-text searches are natural language searches if no modifier is given.
boolean search
A boolean search interprets the search string using the rules of a special query language. The string contains the words to search for. It can also contain operators that specify requirements such that a word must be present or absent in matching rows, or that it should be weighted higher or lower than usual. Common words such as “some” or “then” are stopwords and do not match if present in the search string. The IN BOOLEAN MODE modifier specifies a boolean search.
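The 50% rule quoted above for natural language searches can be pictured with a small Python model. This is only a sketch of the rule as stated, not of MySQL's actual implementation; the sample rows and the function name is_too_common are invented for illustration.

```python
import re

def is_too_common(word, rows):
    """True if `word` occurs in at least half of the rows (the quoted threshold)."""
    word = word.lower()
    hits = sum(1 for row in rows if word in re.findall(r"\w+", row.lower()))
    return hits * 2 >= len(rows)

rows = ["Arcade games pack", "Card games", "Photo editor"]

print(is_too_common("games", rows))  # True: 2 of 3 rows
print(is_too_common("photo", rows))  # False: 1 of 3 rows
```

If most rows in a table mention games, a natural language search for that word may therefore return nothing, which is one reason to try the boolean mode query suggested above.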

mysql match against query

I am using a MySQL MATCH ... AGAINST query in my search, like this:
MATCH(film_name) AGAINST ('the vacation' IN BOOLEAN MODE).
Previously I used this:
film_name like '%the vacation%'
I am getting the right results now with MATCH and AGAINST. The problem is that with LIKE I could put the % sign before and after the search string, so a row was returned whenever the search string appeared anywhere within the value. Please tell me how to write MATCH(film_name) AGAINST ('the vacation' IN BOOLEAN MODE) so that it also behaves like LIKE '%...%'.
Suppose the film name is 'rocketsingh'.
If I run film_name like '%rocket%', it shows me the result,
but if I run MATCH(film_name) AGAINST ('rocket' IN BOOLEAN MODE), it does not show any result. Please suggest what to do.
MATCH supports a trailing wildcard (a prefix match) but not a leading one. In boolean mode, rocket* matches words that begin with rocket, so it would find rocketsingh. Because single words are indexed starting from their first characters, a leading wildcard cannot be served by the index in the usual way: you can't retrieve *vacation directly from the index, because the leftmost characters are the most significant part of the index key.
The answer is: you can't. The MATCH AGAINST operator matches words, not strings. There is a difference between the two. Also note that your example will match vacation, the, vacation the, the something something vacation, and others. See the MySQL documentation on boolean full-text searches for the kinds of searches you can do.
You should stick to your first option with LIKE if you don't want word searches.
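The difference between substring matching and word matching can be modelled in a few lines of Python. This is a simplified sketch, not MySQL's behaviour in full: it assumes words are runs of letters, digits and underscores, and prefix=True stands in for the trailing * operator of boolean mode. The function names are made up.

```python
import re

def like_contains(value, term):
    """Model of value LIKE '%term%': a plain substring test."""
    return term.lower() in value.lower()

def word_match(value, term, prefix=False):
    """Model of a full-text word match; prefix=True mimics a trailing *."""
    words = re.findall(r"\w+", value.lower())
    term = term.lower()
    if prefix:
        return any(w.startswith(term) for w in words)
    return term in words

film = "rocketsingh"

print(like_contains(film, "rocket"))            # True
print(word_match(film, "rocket"))               # False
print(word_match(film, "rocket", prefix=True))  # True
print(word_match(film, "singh", prefix=True))   # False
```

The last line is the case that only LIKE can handle: the term sits in the middle or at the end of a word.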

problem with '-' and '&' string in mysql

I have a search engine in PHP. A normal search works fine: the search text is 'company', and the database has 'company' in the field...
The problem is that when the search text is &company or -company and the data is &company or -company, there is no match. Why?
problem with the - and & string...
Try putting your search terms in quotes. This should help mysql know you mean those characters literally in fulltext search:
SELECT * FROM tablename WHERE MATCH (company) AGAINST ('"&company"' IN BOOLEAN MODE)
SELECT * FROM tablename WHERE MATCH (company) AGAINST ('"-company"' IN BOOLEAN MODE)
If you are using full-text search, & and - are reserved characters. I could not find a nice solution to this problem. What I did was remove the special character and then run the full-text search. For example, if they are looking for At&t, I run a search for "AT" "T"; but if noise words such as At and A are among the terms, you will not get any results.
Another solution is to detect when they are requesting a special character and run a LIKE '%&Company%' search instead of a full-text search, but this will affect the performance of the query.
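The detect-and-fall-back idea could look roughly like this in Python. It is a sketch under several assumptions: the table and column names are taken from the example queries above, the %s placeholders follow the parameter style of common Python MySQL drivers, and "special character" is taken to mean anything other than letters, digits, underscores and whitespace. The function name build_search is hypothetical.

```python
import re

def build_search(term):
    """Return (sql, params); fall back to LIKE when the term has special characters."""
    if re.fullmatch(r"[\w\s]+", term):
        sql = ("SELECT * FROM tablename "
               "WHERE MATCH (company) AGAINST (%s IN BOOLEAN MODE)")
        return sql, (term,)
    # Escape LIKE wildcards so % and _ typed by the user are matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = "SELECT * FROM tablename WHERE company LIKE %s"
    return sql, ("%" + escaped + "%",)

print(build_search("company"))
print(build_search("&company"))
```

Passing the term as a bound parameter rather than concatenating it into the SQL string also avoids quoting problems with user input.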
