I'm using Solr to search my data, and I have been searching with a query like
title:*something here*
This works fine without spaces. But if I search for
seva sa
even though I have the value below
seva samithi
it searches for either "seva" or "sa". Can anyone suggest a proper way to search in Solr?
You're performing a wildcard search. Wildcard terms are not analyzed, so they bypass the regular analysis chain: the only thing that will match "*seva sa*" is a single token that contains the whole string, in the exact case. That would probably be either a StrField or a field indexed with a KeywordTokenizer instead of the regular tokenizer.
A better solution, if you want to match any content within one of the words, might be to use an NGramFilter, so that each token is also indexed in its shorter forms.
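To illustrate the n-gram idea outside Solr, here is a minimal Python sketch. It is not Solr's implementation: the function names, the gram sizes (2 to 4) and the sample titles are assumptions made up for this example, and query terms are looked up as-is, so they must fall within the indexed gram lengths.

```python
def ngrams(token, min_n=2, max_n=4):
    """Return every substring of token whose length is between min_n and max_n."""
    token = token.lower()
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams


def build_index(titles):
    """Map each n-gram to the set of titles that contain it inside some word."""
    index = {}
    for title in titles:
        for word in title.split():
            for gram in ngrams(word):
                index.setdefault(gram, set()).add(title)
    return index


def search(index, query):
    """Return the titles that match every whitespace-separated query term."""
    results = None
    for term in query.lower().split():
        hits = index.get(term, set())
        results = hits if results is None else results & hits
    return results or set()


# Hypothetical sample data, chosen only to mirror the question.
index = build_index(["seva samithi", "seva mandal", "samithi"])
print(search(index, "seva sa"))
```

Only "seva samithi" contains both "seva" and a word with "sa" in it, so it is the only hit. The price is a larger index, since every word is stored many times over.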
Use (seva sa), assuming OR is your default operator.
Try this tutorial:
http://www.solrtutorial.com/solr-query-syntax.html
From your query I can see that you are using a wildcard. There are other ways to search in Solr; by default the standard query parser splits the query on whitespace and combines the terms with the default operator, which is why you are getting the results you posted.
Related
I'm looking for a search solution. There are a few products with names like:
USB Kingston 8GB
USB Kingmax 8GB
USB Transcend 8GB
USB Sandisk 4GB
I'm using a MySQL database, and I've tried full-text search:
SELECT * FROM PRODUCTS WHERE MATCH(productName) AGAINST ('usb 8g')
I also tried Sphinx, but I didn't get any results when typing "usb 8g". With "usb 8gb" it worked.
I also need it to return the correct results when the user types 'ubs 8gb'.
Is there any solution that auto-recognizes this, like Google?
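To match part of a word in a full-text query, use the truncation operator *, which is only available in boolean mode (the % wildcard works with LIKE, not with MATCH ... AGAINST).
Have not tested, but it should work like:
SELECT * FROM PRODUCTS WHERE MATCH(productName) AGAINST ('usb 8g*' IN BOOLEAN MODE)
P.S. Please sanitize user input before sending it to the SQL statement.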
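As a rough sketch of the sanitizing step, the Python helper below builds a boolean-mode search string from free text. The helper name is made up for this example, and it assumes the query runs IN BOOLEAN MODE, where a trailing * is the truncation operator and a leading + makes a word required. The server's minimum word length setting may still cause short words to be ignored.

```python
import re


def build_boolean_query(user_input):
    """Turn free text into a boolean-mode search string such as '+usb* +8g*'."""
    # Keep only word characters, so operators typed by the user (+ - " * etc.) are dropped.
    words = re.findall(r"\w+", user_input.lower())
    return " ".join("+" + word + "*" for word in words)


# Hypothetical usage with a DB-API cursor (the placeholder style depends on the driver):
# cursor.execute(
#     "SELECT * FROM PRODUCTS WHERE MATCH(productName) AGAINST (%s IN BOOLEAN MODE)",
#     (build_boolean_query("usb 8g"),),
# )
print(build_boolean_query("usb 8g"))  # +usb* +8g*
```

Passing the string as a bound parameter, rather than concatenating it into the SQL, is what actually protects against injection; the regex only keeps stray operators out of the search expression.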
On Sphinx, for this specific situation, using say min_prefix_len=2 and expand_keywords=1 would work. This makes partial word matches possible, i.e. '8g' will match '8gb'; in effect the query becomes '8g*'. There is also a wildcard on the end of 'usb', so it is also matching 'usb*'. That shouldn't really affect anything, as you are unlikely to have many other words beginning with those characters.
Ultimately it's a tradeoff on how 'fuzzy' to make the search, as this could introduce all sorts of side effects. It is difficult to think of a good example, but something like searching for 'case' would then match 'casebook', and case and casebook are completely different things.
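The effect of prefix expansion can be imitated in a few lines of Python. This is only a sketch of the matching behaviour, not of how Sphinx works internally; the function name and the sample sentence are made up for this example.

```python
def prefix_match(query, text):
    """True if every query term is a prefix of some word in text (case-insensitive)."""
    words = text.lower().split()
    return all(
        any(word.startswith(term) for word in words)
        for term in query.lower().split()
    )


products = ["USB Kingston 8GB", "USB Kingmax 8GB", "USB Transcend 8GB", "USB Sandisk 4GB"]
print([p for p in products if prefix_match("usb 8g", p)])

# The side effect mentioned above: 'case' also matches 'casebook'.
print(prefix_match("case", "casebook for students"))
```

The first three products match 'usb 8g', while the 4GB one does not.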
I need to query my database to find records that fall between two dates (this is simple enough). However, I need to refine the search so that it only finds records where the email matches a certain format. Basically, I need to delete any row that falls between two dates and has an email of the form
x.xxxxxXXXXX#xxxxxxxx.xxx
That is, I need to look for email addresses that start with a letter followed by a full stop and have 5 digits before the # sign. Is this possible with MySQL, and if so, how? If not, how could I search for these email addresses with PHP?
You need to use regular expressions. MySQL 5.1 supports these: documentation page. This can also be done in PHP using preg_match.
Your regular expression could look like: [a-zA-Z]\.[a-zA-Z]+[0-9]{5}#.+
You could also use LIKE, if you find it useful. Example:
LIKE '_.__________#______.___'
However, it will not detect numbers.
In that case you have to use regex
docs
Regex should be like this: (changed from example above)
[a-zA-Z]\.[a-zA-Z]+[0-9]{5}#[_a-z0-9\-]+\.[a-zA-Z]+
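A quick way to try such a pattern before running a DELETE is to test it on a few sample strings. The sketch below uses Python's re module with the first pattern suggested above; the sample addresses and the function name are made up, and the '#' is kept because the question writes addresses that way.

```python
import re

# Pattern from the answer above; fullmatch() requires the whole address to match.
EMAIL_PATTERN = re.compile(r"[a-zA-Z]\.[a-zA-Z]+[0-9]{5}#.+")


def is_target_address(address):
    """True for addresses shaped like x.xxxxxXXXXX#xxxxxxxx.xxx."""
    return EMAIL_PATTERN.fullmatch(address) is not None


for sample in ["j.smith12345#example.com",      # matches
               "john.smith12345#example.com",   # more than one letter before the dot
               "j.smith1234#example.com"]:      # only four digits
    print(sample, is_target_address(sample))
```

Note that MySQL's REGEXP matches anywhere in the string unless the pattern is anchored with ^ and $, so anchoring is worth adding there as well.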
How do you make a search for "alien vs predator" also return results containing the string "alienS vs predator", with the "S"?
Example: http://www.torrentz.com/search?q=alien+vs+predator
How have they implemented this?
Is this advanced search engine stuff?
This is known as Word Stemming. When the text is indexed, words are "stemmed" to their "roots". So fighting becomes fight, skiing becomes ski, runs becomes run, etc. The same thing is done to the text that a user enters at search time, so when the search terms are compared to the values in the index, they match.
The Lucene project supports this. I wouldn't consider it an advanced feature, especially given the expectations that Google has set.
Checking for plurals is a form of stemming. Stemming is a common feature of search engines and other text matching. See the Wikipedia page: http://en.wikipedia.org/wiki/Stemming for a host of algorithms to perform stemming.
Typically when one sets up a search engine to search for text, one will construct a query that's something like:
SELECT * FROM TBLMOVIES WHERE NAME LIKE '%ALIEN%'
This means that the substring ALIEN can appear anywhere in the NAME field, so you'll get back strings like ALIENS.
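To see the substring behaviour in action, here is a small self-contained sketch using Python's built-in sqlite3 module as a stand-in for the real database. The table contents are made up for this example, and other database engines may differ in details such as case sensitivity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBLMOVIES (NAME TEXT)")
conn.executemany(
    "INSERT INTO TBLMOVIES (NAME) VALUES (?)",
    [("ALIEN VS PREDATOR",), ("ALIENS VS PREDATOR",), ("PREDATOR",)],
)

# The pattern is passed as a bound parameter rather than pasted into the SQL.
rows = conn.execute(
    "SELECT NAME FROM TBLMOVIES WHERE NAME LIKE ? ORDER BY NAME",
    ("%ALIEN%",),
).fetchall()
names = [row[0] for row in rows]
print(names)
```

Both the ALIEN and the ALIENS title come back, but so would any other name that merely contains those letters, and a leading % usually prevents the database from using an index.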
When words are indexed, they are indexed by their root form. For example, "aliens", "alien", "alien's" and "aliens'" are all stored as "alien".
And when words are searched, the search engine also searches only for the root form "alien".
One widely used way to do this is the Porter Stemming Algorithm. You can download an implementation for your favorite language here - http://tartarus.org/~martin/PorterStemmer/
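As a toy illustration of "index and search by root form", here is a deliberately naive Python sketch. It is not the Porter algorithm: it only strips possessives and a trailing plural "s", and the function names are made up for this example.

```python
def naive_stem(word):
    """Very rough stemmer: lowercases, then strips possessives and a plural 's'."""
    word = word.lower().rstrip("'")            # "aliens'" -> "aliens"
    if word.endswith("'s"):                    # "alien's" -> "alien"
        word = word[:-2]
    # Leave short words ("vs") and double-s endings ("glass") alone.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def normalize(text):
    """Apply the same stemming to indexed text and to the user's query."""
    return [naive_stem(word) for word in text.split()]


print(normalize("alienS vs predator"))
print(normalize("alien vs predator"))
```

Because both the stored title and the query go through the same function, the two strings above end up with identical terms and therefore match.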
This is a basic feature of a search engine, rather than just a program that matches your query with a set of pre-defined results.
If you have the time, this is a great read, all about different algorithms, and how they are implemented.
You could try using soundex() as a fuzzy match on your strings. If you save the soundex with the title and then compare that index against a substring using LIKE 'XXX%', you should have a decent match. The longer the substring, the closer the strings have to be in order to match.
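For anyone curious what a soundex code looks like, below is a Python sketch of the classic four-character American Soundex. It is an illustration only; MySQL's SOUNDEX() can return longer strings and may differ in details, so the values here should not be assumed to equal what the database stores.

```python
# Consonant groups and the digit each group maps to.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Classic Soundex: first letter plus three digits, padded with zeros."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                  # h and w do not separate equal codes
        code = CODES.get(ch, "")      # vowels map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


print(soundex("alien"), soundex("aliens"))
```

"alien" and "aliens" share the prefix A45, which is the kind of match a LIKE 'A45%' comparison on a stored soundex column would pick up.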
see docs: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex
I have a particular problem and need to know the best way to go about solving it.
I have a PHP string that can contain a number of keywords (tags actually). For example:
"seo, adwords, google"
or
"web development, community building, web design"
I want to create a pool of keywords that are related, so all seo, online marketing related keywords or all web development related keywords.
I want to check the keyword / tag string against these pools of keywords and if for example seo or adwords is contained within the keyword string it is matched against the keyword pool for online marketing and a particular piece of content is served.
I wish to know the best way of coding this. I'm guessing some kind of hash table or array but not sure the best way to approach it.
Any ideas?
Thanks
Jonathan
Three approaches come to my mind, although I'm sure there could be more. Of course in any case I would store the values in a database table (or config file, or whatever depending on your application) so it can be edited easily.
1) Easiest: Convert the list into a regular expression of the form "keyword1|keyword2|keyword3" and see if the input matches.
2) Medium: Add the words to a hashtable, then split the input into words (you may have to use regular expression replacing to remove punctuation) and try to find each word of input in the hashtable.
3) Hardest: This may not work depending on your exact situation, but if all the possible content can be indexed by a search solution (like Apache SOLR, for example) then your list of keywords could be used as a search string and you could return results above a particular level of relevance.
It's hard to know exactly which solution would work best without knowing more about your source data. A large number of keywords may jam up a regular expression, but if it's a short list then it might work great. If your inputs are long then #2 won't work so well because you have to test each and every input word. As always your mileage may vary, so I would start with the easiest solution I thought would work and see if the performance is acceptable.
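Approach 2 could look roughly like the Python sketch below (the same idea carries over to a PHP associative array). The pool names and their contents are made up from the examples in the question, and the tag string is split on commas rather than into single words, so multi-word tags such as "web design" stay intact.

```python
# Hypothetical pools; in practice these would come from a table or config file.
POOLS = {
    "online marketing": {"seo", "adwords", "google"},
    "web development": {"web development", "web design", "community building"},
}

# Reverse lookup table: keyword -> name of the pool it belongs to.
KEYWORD_TO_POOL = {
    keyword: pool for pool, keywords in POOLS.items() for keyword in keywords
}


def match_pools(tag_string):
    """Return the sorted names of all pools hit by a comma-separated tag string."""
    tags = [tag.strip().lower() for tag in tag_string.split(",")]
    return sorted({KEYWORD_TO_POOL[tag] for tag in tags if tag in KEYWORD_TO_POOL})


print(match_pools("seo, adwords, google"))
print(match_pools("web development, community building, web design"))
```

Each lookup is a single hash access, so the cost grows with the number of tags in the input rather than with the size of the pools.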
I am using MySQL fulltext and PHP (CodeIgniter) to search a database containing RSS items. The problem is that some of these items' titles use underscores instead of spaces. Since MySQL considers underscores part of a word, these items will never be matched in the search unless the user types the exact title, including underscores.
Server is shared so I don't have access to MySQL Server System Variables.
Can this behavior be changed in some other way?
Can this maybe be done through the search query itself?
I know I could just replace all underscore occurrences in the DB by spaces, but this would compromise the original integrity of those titles though. Just wondering if there's another way of doing this.
Instead of replacing underscores in the original title field, you can use a separate field dedicated to fulltext searches.
This allows you to replace underscores, and also to aggregate keywords into this field (category names, authors, tags, etc.) to improve the relevance of search results.
We have used this many times with success to get rid of HTML tags in content that were interfering with search.
I don't think this can be done without access to the server. The only way I have ever seen to do it is the first comment on this MySQL manual page ("How I added '-' to the list of word characters"). It requires stopping the server and changing internal configuration.
Your best bet is probably creating a second column with removed underscores, and to search that.
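Filling such a second column can be done in application code whenever an item is saved. Here is a minimal sketch in Python (the same few lines translate directly to PHP); the function name and the extra keywords are assumptions made up for this example.

```python
def build_search_text(title, extra_keywords=()):
    """Build the value for a dedicated search column; the original title is untouched."""
    parts = [title.replace("_", " ")]
    parts.extend(extra_keywords)      # e.g. category names, authors, tags
    return " ".join(parts)


# The original title is stored as-is; only the search column gets the cleaned text.
print(build_search_text("Some_RSS_Item_Title", ["news", "tech"]))
```

The fulltext index then goes on the search column only, and queries use MATCH ... AGAINST on that column while the displayed title keeps its underscores.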