How can I search in PHP when a character is missing or mistyped?

I'm looking for a search solution. There are a few products with these names:
USB Kingston 8GB
USB Kingmax 8GB
USB Transcend 8GB
USB Sandisk 4GB
I'm using a MySQL database, and I've tried full-text search:
SELECT * FROM PRODUCTS WHERE MATCH('productName') AGAINST ('usb 8g').
I also tried Sphinx, but I didn't get any results when typing "usb 8g". With "usb 8gb", it worked.
I also need it to return the correct results when a user types 'ubs 8gb'.
Is there a solution that recognizes such input automatically, like Google does?

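You have to use a wildcard to match part of a word. In MySQL full-text search the wildcard is * and it only works in boolean mode; the % character is the wildcard for LIKE, not for MATCH ... AGAINST. The column name must not be quoted either.
I have not tested it, but it should work like this:
SELECT * FROM PRODUCTS WHERE MATCH(productName) AGAINST ('usb 8g*' IN BOOLEAN MODE)
Note that words shorter than the minimum token length (ft_min_word_len, 4 by default for MyISAM; innodb_ft_min_token_size, 3 by default for InnoDB) are not indexed at all.
P.S. Please sanitize user input before sending it to an SQL statement.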

On Sphinx, for this specific situation, using, say, min_prefix_len=2 and expand_keywords=1 would work. This makes partial-word matches possible, i.e. '8g' will match '8gb'; in effect the query becomes '8g*'. There is also a wildcard on the end of 'usb', so it also matches 'usb*'. That shouldn't really affect anything, as you are unlikely to have many other words beginning with those characters.
Ultimately it's a tradeoff in how 'fuzzy' to make the search, as this could introduce all sorts of side effects. It's difficult to think of a good example, but something like searching for 'case' would then match 'casebook', and case and casebook are completely different things.
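
To make the tradeoff concrete, here is a rough Python sketch of what prefix expansion does to a query. It is not Sphinx code; `prefix_search` and its `min_prefix_len` argument are hypothetical names that only mimic the idea that keywords of at least that length are treated as 'keyword*'.

```python
def keyword_matches(keyword, word, min_prefix_len):
    """Treat the keyword as 'keyword*' once it is long enough, else match exactly."""
    if len(keyword) >= min_prefix_len:
        return word.startswith(keyword)
    return word == keyword


def prefix_search(query, documents, min_prefix_len=2):
    """Return the documents in which every query keyword matches some word."""
    keywords = query.lower().split()
    results = []
    for doc in documents:
        words = doc.lower().split()
        if all(
            any(keyword_matches(kw, w, min_prefix_len) for w in words)
            for kw in keywords
        ):
            results.append(doc)
    return results


products = [
    "USB Kingston 8GB",
    "USB Kingmax 8GB",
    "USB Transcend 8GB",
    "USB Sandisk 4GB",
]

# '8g' now behaves like '8g*' and finds the three 8GB products.
print(prefix_search("usb 8g", products))

# The side effect: 'case' also finds 'casebook'.
print(prefix_search("case", ["Phone case", "Law casebook"]))
```

Raising `min_prefix_len` in this sketch makes short keywords match exactly again, which is the same dial the Sphinx setting gives you.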

Related

MongoDB full-text search eats memory when I search for "com" and other short words

Thank you for reading this. I have a collection with a full-text index. The index size is 809.7 MB (according to MongoDB Compass), but when I search for "com" or other short words, the memory fills up (8 GB of memory).
It is a sharded setup.
Does anyone know why this is?
What are your indexes? Short words sound like they are not the first, leftmost characters of the field. Do you have a wildcard in front of the word? If so, it is a very inefficient search.
If I understand correctly, your text search must then touch every document.
Perhaps you have no alternative, but the way to do a faster query is to:
a. match against the index
b. search on the beginning letters, i.e. with the ^ symbol, as searching the first letters is much more efficient than searching anywhere in the string
If this is not possible, and text searching is going to be a major component of your application, you could consider some strategies:
* create key search words as part of the data input that can be used by the text query process
* limit the pool of possible documents in some way, perhaps by date range, topic, etc. Ultimately you would probably want to index these fields and include them in your text query.
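
The difference between an anchored search and a search anywhere in the string can be illustrated in plain Python. This is not how MongoDB implements its indexes; it is only a sketch, using a sorted list as a stand-in for an ordered index, and `prefix_lookup` and `substring_scan` are made-up names.

```python
import bisect


def prefix_lookup(sorted_terms, prefix):
    """Binary-search a sorted list for the terms that start with `prefix`.

    Only the matching slice is touched, which is why an anchored (^) search
    can use an ordered index.
    """
    lo = bisect.bisect_left(sorted_terms, prefix)
    # "\uffff" sorts after any ordinary character, so it serves as an upper bound.
    hi = bisect.bisect_left(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:hi]


def substring_scan(terms, fragment):
    """An unanchored search has to inspect every single term."""
    return [t for t in terms if fragment in t]


terms = sorted(["combine", "come", "comet", "income", "welcome", "zebra"])

print(prefix_lookup(terms, "com"))   # anchored: a narrow slice of the index
print(substring_scan(terms, "com"))  # unanchored: a full scan
```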

PHP: searching with search terms for similar text on webpage

I'm busy with a program that needs to find similar text on a webpage. In SQL we have 400,000 search terms. For example, the search terms can be ‘San Miguel Pale Pilsen’, ‘Schaumburger Bali’ and ‘Rizmajer Cortez’.
Currently I check each word on the webpage against the database. For each word on the webpage I send a SELECT query with a LIKE '%...%' pattern. For each result I use similar_text in PHP. If the word and the search term don't contain the same number of words, some extra words from the webpage are added to make them equal.
(And yes, I know that it isn't smart.)
The problem is that it takes a lot of time and the server has to work hard for it.
What is the best and fastest way to find similar text on a webpage?
The LIKE operator will always be slow if you start the pattern with a % wildcard, because that prevents MariaDB from using an index.
Considering that you need to find words at any location in the VARCHAR column, the best solution is to implement bona fide full-text search. See MariaDB's Full-Text Index Overview.
Searches will become orders of magnitude faster, and the approach scales much better.
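
The idea behind a full-text index can be sketched in a few lines of Python: instead of one LIKE query per word on the page, the search terms are tokenized once into an inverted index, and the page is then checked against it in a single pass. This is only an illustration under simplifying assumptions (ASCII tokens, exact word matches, no similarity scoring); the function names are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into ASCII alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(search_terms):
    """Map each token to the ids of the search terms that contain it."""
    index = defaultdict(set)
    for term_id, term in enumerate(search_terms):
        for token in tokenize(term):
            index[token].add(term_id)
    return index


def find_terms_on_page(page_text, search_terms, index):
    """Return the search terms whose tokens all occur on the page."""
    page_tokens = set(tokenize(page_text))
    candidates = set()
    for token in page_tokens:
        candidates |= index.get(token, set())
    return [
        search_terms[i]
        for i in sorted(candidates)
        if set(tokenize(search_terms[i])) <= page_tokens
    ]


search_terms = ["San Miguel Pale Pilsen", "Schaumburger Bali", "Rizmajer Cortez"]
index = build_index(search_terms)
page = "We tasted a San Miguel Pale Pilsen and a Rizmajer Cortez last night."
print(find_terms_on_page(page, search_terms, index))
```

Only the candidates found through the index need a more expensive similarity check afterwards, which is where most of the saving comes from.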

PHP/SQL: Multiple fuzzy keyword search based on likeness (Advanced SQL Search)

Current Situation:
I am currently running a keyword search using multiple keywords in PHP and SQL. The field I'm applying the search to is the title field, which is a VARCHAR(250) field.
A user can input a single keyword, e.g. "apple", or multiple keywords, e.g. "apple banana yellow". The first option is trivial. For the second option, my current algorithm works like this:
a. Try to find items that match the exact entire string "apple banana yellow" in the title. Order the results by index id.
b. If no more results matching the exact entire string are found, or if none are found in the first place, search for all titles containing either "apple", "banana", or "yellow". Order the results by index id.
The algorithm is very basic but, funnily enough, works pretty well.
What I'm looking for:
However I am now looking to implement a smarter search algorithm without having to rely on external paid scripts like Amazon services. I'm looking for a way to implement the following:
* fuzzy search (I've read about SOUNDEX and levenshtein, which may make this possible)
* smarter keyword search (don't just return items that match either ALL words or JUST A SINGLE WORD, but maybe also items matching 2 or 3 words first)
* order by relevance/likeness (order by how closely the title matches the search, not just by index id)
* (Bonus: maybe even implement search for exact strings, like using " " on Google to find exactly the words between the quotation marks)
What is the best way to get started with such a search? I am using InnoDB for MySQL.
Assuming MySQL, you can add a FULLTEXT index. Then there are a number of functions that will allow you to do basic searches that meet all the needs you list: https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
You end up using syntax like:
SELECT * FROM table_name WHERE MATCH(column_with_fulltext_index_on_it)
AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE)
To see the match score:
SELECT column_with_fulltext_index_on_it, MATCH(column_with_fulltext_index_on_it)
AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE) AS score FROM table_name WHERE MATCH(column_with_fulltext_index_on_it)
AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE)
There can be a bit of a learning curve to understand how to tweak the MATCH clause for your needs, but your examples seem pretty basic (except the smarter search).
Also good to note: there are system settings that control the min/max length of the words/tokens that get indexed. You can read https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html to get a deeper understanding of the indexing options. Percona is a good resource as well: https://www.percona.com/blog/2013/02/26/myisam-vs-innodb-full-text-search-in-mysql-5-6-part-1/ (typically more human-digestible than the MySQL docs).
If you need to do more complex searches, you can look at adding other technologies like Solr, but I've always recommended getting the basics working with what you've got. Only adopt a new technology if you hit a brick wall, or if you have good metrics on the existing solution and know the new technology will improve on them (speed, storage space, quality of results, etc.). If you can't quantify it, stick to the basics until you can.
Here's a good tutorial: http://www.w3resource.com/mysql/mysql-full-text-search-functions.php
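
For the exact-phrase and "all words" cases, MySQL's boolean mode may be a better fit than natural language mode: +word makes a word mandatory, "a phrase" matches the exact phrase, and words without an operator are optional but raise the rank of rows that contain them. As a rough sketch (in Python rather than PHP, and with a hypothetical helper name `to_boolean_query`), user input could be turned into such an expression like this, stripping the operator characters so users cannot inject their own:

```python
import re

# Characters that act as operators in MySQL boolean-mode full-text search.
_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def to_boolean_query(user_input, require_all=True):
    """Build an AGAINST(... IN BOOLEAN MODE) expression from raw user input.

    Double-quoted parts are kept as exact phrases; everything else is split
    into single words. With require_all=True every part gets a leading '+'.
    """
    parts = []
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', user_input):
        if phrase:
            cleaned = _OPERATORS.sub(" ", phrase).split()
            if cleaned:
                parts.append('"' + " ".join(cleaned) + '"')
        else:
            parts.extend(_OPERATORS.sub(" ", word).split())
    prefix = "+" if require_all else ""
    return " ".join(prefix + part for part in parts)


print(to_boolean_query("apple banana yellow"))
print(to_boolean_query('"apple banana" yellow', require_all=False))
```

The resulting string is meant to be bound as a parameter of a prepared statement, e.g. AGAINST (? IN BOOLEAN MODE), never concatenated into the SQL text.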

Does neo4jphp (or REST) support search queries with both white spaces and wildcards?

Does neo4jphp (REST, indexing on Lucene) support search queries with both white spaces and wildcards?
Actually, I am running the following query:
$testindex->query('name:"jim grand udu*"'); // here Lucene indexes Neo4j nodes by the property "name"
but it does not match anything, even if exact matches are available. It seems the * loses its power here and is just passed as a literal character. If I use it with a single-word term, it works:
$testindex->query('name:jim*'); // This works
It seems * loses its meaning inside quotes, but white spaces don't work unless I use quotes, so it seems they can't be used together. Any help would be appreciated. I can't find the solution in the documentation of neo4jphp or Neo4j's REST API. I know this is possible in Lucene and the Neo4j Java API using WildcardQuery. Thanks!
Try this (it works for me on Neo4j 2.0.0-M06):
GET http://localhost:7474/db/data/index/node/node_auto_index?query=name:Ke*nu~%20AND%20name:R*ves~
i.e. name:[first_string] AND name:[second_string] ...
Search for "Apache Lucene - Query Parser Syntax" for more details.
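
Building that kind of query from a multi-word input can be sketched as follows (Python, hypothetical helper names). It assumes the same index URL layout as the GET request above and does not escape Lucene special characters in the terms, so treat it as a starting point rather than a finished client.

```python
from urllib.parse import quote


def build_lucene_query(field, text):
    """Split the input on whitespace and AND the terms together.

    Each word becomes its own field:term clause, so a trailing * on a word
    keeps its wildcard meaning instead of ending up inside a quoted phrase.
    """
    return " AND ".join(f"{field}:{term}" for term in text.split())


def build_index_url(base_url, index_name, field, text):
    """Build the REST URL for an index query, percent-encoding the spaces."""
    query = build_lucene_query(field, text)
    return f"{base_url}/db/data/index/node/{index_name}?query={quote(query, safe=':*~')}"


print(build_lucene_query("name", "jim grand udu*"))
print(build_index_url("http://localhost:7474", "node_auto_index", "name", "jim udu*"))
```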

Need suggestion of alternative to Fulltext search

I am in need of a lightweight, fast search solution.
Today I use full-text search in boolean mode, where every search word is mandatory in the results.
The function is fast, works, and meets the requirements.
BUT some of the full-text limitations, http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html, have turned out to be a problem. The site is on a hosted server and I'm not allowed to change the MySQL settings (e.g. the minimum word length).
E.g.
the search must be able to find red, 11 and ab.cd, which today's full-text solution can't.
http://sphinxsearch.com/ is what you're looking for,
though you have to understand that the shorter the words you want to find, the bigger the indexes you need.
Use Lucene, it's very often implemented with MySQL and it'll be both faster and more featureful.
Using the built-in FTS engine is relatively bad practice, especially since it doesn't work with the slightly more reliable InnoDB engine.
The only thing that comes to mind would be to base your search on the number of occurrences you can find. Your actual indexing method could vary, depending on what the DB offers.
Assuming DB size isn't an issue, a (very) basic approach would be to break the text to be searched (say, a post on Stack Overflow) into individual words, normalize them (remove plurals, strip 'logic' words such as "and", etc.), then insert each word as a new record, together with the ID that identifies your indexed resource.
Count the instances of the ID and order by count: a higher number means more relevant.
Not exactly my field though, so tread carefully! =]
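
The steps above could look roughly like this in Python, with an in-memory list standing in for the (word, resource ID) table. The stop-word list and the plural stripping are deliberately crude assumptions, and all names are made up for the example.

```python
from collections import Counter

# A tiny, assumed list of 'logic' words to drop during normalization.
STOP_WORDS = {"and", "or", "the", "a", "an", "of", "to", "in"}


def normalize(text):
    """Lowercase, trim punctuation, drop stop words, crudely strip plurals."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")
        if not word or word in STOP_WORDS:
            continue
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]  # very naive plural handling
        words.append(word)
    return words


def build_word_table(resources):
    """Return one (word, resource_id) row per word occurrence."""
    return [(w, rid) for rid, text in resources.items() for w in normalize(text)]


def search(query, word_table):
    """Count matching rows per resource ID; a higher count ranks first."""
    wanted = set(normalize(query))
    counts = Counter(rid for word, rid in word_table if word in wanted)
    return [rid for rid, _ in counts.most_common()]


resources = {
    1: "Red shoes and red hats",
    2: "A red car, model ab.cd 11",
    3: "Blue shoes",
}
table = build_word_table(resources)
print(search("red", table))       # resource 1 mentions 'red' twice
print(search("ab.cd 11", table))  # short and dotted words are found too
```

Because the words are stored as-is, short terms such as red, 11 and ab.cd are not subject to any minimum word length.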
I'd recommend you try distance searching: Levenshtein
Or search for "N-gram fulltext indexing".
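
As a minimal sketch of distance searching, the following Python computes the Levenshtein distance and uses it to map a mistyped keyword onto the closest word in a known vocabulary before running the real search. The `correct` helper and the `max_distance` threshold are assumptions made for the example, and the vocabulary must not be empty. PHP ships a built-in levenshtein() function that does the distance part.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def correct(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or the input if nothing is close."""
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word


vocabulary = ["usb", "kingston", "kingmax", "transcend", "sandisk", "8gb", "4gb"]
print(correct("ubs", vocabulary))
print(correct("kingstn", vocabulary))
```

Note that plain Levenshtein counts a swapped pair of letters as two edits, so a threshold of 1 would be too strict for typos like 'ubs'.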
I haven't mucked around with it, but I read up on the theory of full-text searching (with MySQL at least) a little while back.
If memory serves me correctly, you can use full-text search for what you want, but you need to reconfigure it (and, I think, recompile) to get it to work with shorter search words. I think the default minimum is 4 characters. You'll want to change it to 2 characters, with a few other options thrown in, and test the results you get.
Someone correct me if this is incorrect. I would rather not send him after a red herring.
