My web application will have several users, each with their own content uploaded to it. Each uploaded item has a title, a description and tags (keywords). I can write a search script to search for content or a user name, but when the keywords are entered with a spelling mistake it doesn't return any results. For example, if there is a user named "Michael" in the database and the search query is "Micheal", I should get "Did you mean to search for 'Michael'", in other words a search suggestion.
This suggestion should also cover the content uploaded by users. A user may title their content "Michael's activities May 2011", and suggestions should be generated for the individual words.
You could use SOUNDEX to search for similar-sounding names, like this:
SELECT * FROM users WHERE SOUNDEX(name) = SOUNDEX(:input)
or like this:
SELECT * FROM users WHERE name SOUNDS LIKE :input
(which is completely equivalent)
Edit: if you need to use an algorithm other than Soundex, as Martin Hohenberg suggested, you would need to add an extra column to your table called, for example, sound_equivalent. (This is actually a more efficient solution, as this column can be indexed.) The query would then be:
SELECT * FROM users WHERE sound_equivalent = :input_sound_equivalent
The content of the sound_equivalent column can then be generated with a PHP algorithm and inserted into the table along with the rest of the user's data.
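A minimal sketch of the precomputed-column idea, written in Python with SQLite rather than PHP and MySQL so that it runs on its own. The `soundex` function below is a common American Soundex variant and its keys are not guaranteed to equal what MySQL's SOUNDEX() returns, which is exactly why the stored key and the query key must come from the same function. The index name, the sample names and the `suggest_names` helper are assumptions made for illustration.

```python
import sqlite3

# Standard American Soundex digit groups.
_CODES = {}
for _digit, _letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word):
    """Return a 4-character Soundex key for a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (key + "000")[:4]


# Hypothetical users table with the extra, indexed sound_equivalent column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, sound_equivalent TEXT)")
conn.execute("CREATE INDEX idx_users_sound ON users (sound_equivalent)")
for name in ["Michael", "Sarah", "Robert"]:
    # The key is computed once, at insert time, next to the other user data.
    conn.execute("INSERT INTO users VALUES (?, ?)", (name, soundex(name)))


def suggest_names(conn, query):
    """Return stored names whose phonetic key equals the key of the query."""
    rows = conn.execute(
        "SELECT name FROM users WHERE sound_equivalent = ?", (soundex(query),)
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `suggest_names(conn, "Micheal")` looks up the misspelled query by its key, so the lookup is a plain indexed equality comparison.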
You can also use PHP's pspell extension to get suggestions when a search returns no results.
You could create a database of the most common words (such as dog, house, city, numbers, water, internet). It doesn't need to be big (fewer than 10,000 words).
Then, when you explode the search term, check the "word" database for words LIKE the search terms and echo out the suggestions.
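The word-list approach can be sketched as follows. This is Python rather than PHP, and it swaps the SQL LIKE lookup for the standard library's `difflib.get_close_matches`, which ranks candidates by similarity; the word list, the 0.7 cutoff and the `suggest` helper are illustrative assumptions, not part of the answer above.

```python
import difflib

# Hypothetical word list; in practice it would come from the "word" table
# plus the user names, titles and tags already stored in the application.
KNOWN_WORDS = ["michael", "activities", "may", "house", "water", "internet"]


def suggest(query, known_words=KNOWN_WORDS, cutoff=0.7):
    """Map each unknown word of the query to its closest known word, if any."""
    suggestions = {}
    for word in query.lower().split():
        if word in known_words:
            continue  # spelled correctly, nothing to suggest
        matches = difflib.get_close_matches(word, known_words, n=1, cutoff=cutoff)
        if matches:
            suggestions[word] = matches[0]
    return suggestions
```

A non-empty result can be turned into a "Did you mean ...?" line by substituting the suggested words back into the original query.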
Related
I support a public blog to which users can publish posts. Some users have more than a thousand different texts, and they might not remember that they have already published a given text. I would like to help users avoid publishing duplicates.
Comparing texts for exact equality is not good enough: the user might have changed the text a little, changed the formatting, copied it from a different program, etc. So I need a quick estimate of whether a similar text already exists in the database.
My technology stack includes PHP, MySQL and Redis. How can I solve my problem using those or other instruments?
PHP has a function called similar_text, which you can use to calculate the number of matching characters or the similarity as a percentage.
http://php.net/manual/en/function.similar-text.php
You could then check if the given text is within a certain margin of older blog posts.
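A rough sketch of the margin check, in Python. It uses `difflib.SequenceMatcher` as a stand-in for PHP's similar_text (the two do not compute identical scores), and the 90 percent threshold, the whitespace and case normalization, and the `find_similar` helper are assumptions chosen for illustration. Comparing a new post against every existing one this way is slow for long texts, so it suits a per-user check more than a site-wide one.

```python
import difflib
import re


def normalize(text):
    """Collapse whitespace and lowercase so formatting changes matter less."""
    return re.sub(r"\s+", " ", text).strip().lower()


def similarity_percent(a, b):
    """Similarity of two texts as a percentage between 0 and 100."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return 100.0 * matcher.ratio()


def find_similar(new_text, existing_posts, threshold=90.0):
    """Return (post_id, score) pairs for posts at or above the threshold.

    existing_posts is a hypothetical mapping of post id to post text.
    """
    hits = []
    for post_id, text in existing_posts.items():
        score = similarity_percent(new_text, text)
        if score >= threshold:
            hits.append((post_id, score))
    return sorted(hits, key=lambda hit: -hit[1])
```

If `find_similar` returns anything, the blog could show those posts to the user before accepting the new one.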
If you don't want to check for similarity in text you could try to tag the posts based on tags of the original blog or subject of the blog. And then show the users the posts they made with similar tags.
You can use MySQL's MATCH ... AGAINST on a full-text indexed column.
As an example:
SELECT table.*,
MATCH(userText) AGAINST ('this is user input') AS relevancy
FROM table
ORDER BY relevancy DESC;
This gives you results ordered by relevancy.
Don't forget to add a FULLTEXT index on the userText column.
I have a search on my website and I'm trying to show a list of the most popular search terms on my site. It sort of works, but it isn't matching the strings closely enough.
This is what I'm currently using:
$sql = "SELECT * FROM db WHERE r_name LIKE '%".$searchname."%' OR r_number LIKE '%".$searchname."%'";
However, if one user searches for, say, Game Name and another searches for Game Name (Reviews), it adds two entries to my database. Is there a way to do a similarity test before adding the entry?
Yes, there is a way, but it requires you to fiddle with it a little. There is a function called SOUNDEX, which matches strings that sound alike. It might not be a perfect solution to your question, but it might get you started in the right direction.
SELECT * FROM db WHERE SOUNDEX( db.r_name ) LIKE SOUNDEX( '{$searchname}' );
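Note that SOUNDEX encodes the whole string, beginning with its first letter. So if you have an entry 'lowercasexd', it should match similar-sounding spellings of that one word, but I would not expect it to match a longer phrase such as 'what is lowercasexd?'.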
Be aware that this type of search takes a while to run compared to '=' searches on indexed columns (on my database it processes about 5-6k entries per second), so it is NOT recommended for anything big. If you want a near-perfect solution, I suggest you read about Google's search mechanism and look at some search engine source code, if your project is significant enough.
I want to build a search engine with SQL and PHP. I currently use the following query:
SELECT * FROM info WHERE name LIKE '%$q%' LIMIT 10
But I want to select the rows whose 'name' starts with $q, not the ones that contain $q.
Simply remove the first wildcard (%):
SELECT * FROM info WHERE name LIKE 'X%' LIMIT 10
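A small sketch of the same prefix query with the search term bound as a parameter instead of being pasted into the SQL string. It is written in Python against an in-memory SQLite database so that it is runnable; the sample rows, the `escape_like` helper and the `prefix_search` function are assumptions for illustration, and the ESCAPE clause is written in SQLite's string syntax, which differs slightly from MySQL's.

```python
import sqlite3


def escape_like(term):
    """Escape LIKE wildcards so that user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def prefix_search(conn, q, limit=10):
    """Return up to `limit` names that start with q."""
    pattern = escape_like(q) + "%"  # trailing wildcard only: "starts with"
    rows = conn.execute(
        "SELECT name FROM info WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (pattern, limit),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data for the info table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (name TEXT)")
conn.executemany(
    "INSERT INTO info VALUES (?)",
    [("John",), ("Joanna",), ("Banjo",), ("Major",)],
)
```

With this data, `prefix_search(conn, "Jo")` skips "Banjo" and "Major", which only contain the term. A pattern without a leading wildcard also lets the database use an ordinary index on the column.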
If you want to search fields that hold multiple names, and the name you are looking for might be in the middle of the field, then it might be better to use Full Text Search (FTS). For example, if the name field in the OP refers to a full name and could contain "John Doe" and you want to search for "Doe", then FTS is probably going to be better (more accurate and faster) than using a LIKE operator. With FTS, the individual words in the text are indexed, so searches can be very fast, and it allows more complex searches, such as finding two words (or names) in a field that are not adjacent (to pick one example at random).
The specific database vendor is not mentioned in the OP, but if FTS is supported, then it might be a good choice for what you are attempting. Some FTS information for SQL Server and for MySQL is easily found with a search.
I want to create an autosuggest for a fulltext search with AJAX, PHP & MySQL.
I am looking for the right way to implement the backend. While the user is typing, the input field should offer suggestions. Suggestions should be generated from text entries in a table.
Some information about these entries: they are stored as full text, generated from PDFs of 3-4 pages each. There are no more than 100 entries for now, and this will reach a maximum of 2000 in the next few years.
When the user starts to type, the word being typed should be completed with a word stored in the DB, sorted by occurrences in descending order. The next step is to suggest combinations with other words which have a high occurrence in the entries matching the first word. You can compare it to Google's autosuggest.
I am thinking about 3 different ways to implement this:
1) Generate an index via a cron job, which counts occurrences of words and combinations overnight. The user searches on this index.
2) Do a live search within the entries with a 'LIKE "%search%"' condition. Then look for the word after the match and GROUP the results by occurrence.
3) Create a log file of all user searches and look for good combinations as in 1), so the search gets more intelligent with each search action.
What is the best way to start with this? The search should be fast.
Is there a better possibility I did not think about?
I'd use MySQL's MATCH() AGAINST() (http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html), e.g.:
SELECT *
FROM table
WHERE MATCH(column) AGAINST('search')
ORDER BY MATCH(column) AGAINST('search') DESC
Another advantage is that you could further tweak the importance of the words being searched for (if necessary), like:
MATCH(column) AGAINST('>important <lessimportant' IN BOOLEAN MODE)
Or say that certain words of the search term are required, whilst others must not be present in the result, e.g.:
MATCH(column) AGAINST('+required -prohibited' IN BOOLEAN MODE)
I think idea no. 1 is the best. By the way, don't forget to eliminate stopwords from the autosuggest (an, the, by, ...).
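Idea no. 1 could be sketched roughly like this, in Python for brevity. A nightly job would call `build_index` over all entries and store the counts; the stopword list, the tokenizer and the function names are assumptions made for illustration. Because stopwords are dropped before pairs are counted, two words separated only by a stopword are treated as adjacent.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real one would be longer and language-specific.
STOPWORDS = {"a", "an", "the", "by", "of", "and", "in", "to"}


def tokenize(text):
    """Lowercase words of the text, with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def build_index(documents):
    """Count single words and 'word followed by word' pairs across all entries."""
    word_counts = Counter()
    next_counts = defaultdict(Counter)
    for doc in documents:
        words = tokenize(doc)
        word_counts.update(words)
        for first, second in zip(words, words[1:]):
            next_counts[first][second] += 1
    return word_counts, next_counts


def complete(prefix, word_counts, limit=5):
    """Words starting with prefix, most frequent first (ties alphabetical)."""
    prefix = prefix.lower()
    matches = [(w, c) for w, c in word_counts.items() if w.startswith(prefix)]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in matches[:limit]]


def next_words(word, next_counts, limit=5):
    """Words that most often follow the given word."""
    followers = next_counts.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(limit)]
```

With at most 2000 entries the counts fit comfortably in memory or in two small tables, and the AJAX endpoint then only needs a prefix lookup on the stored words.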
I'm making a search engine which creates a query depending on what's chosen on the search page. Since the query is limited I cannot just include anything in it, so I have to do the rest with IF clauses and while loops over the query results. For example, there's no age field in the table, only a birthdate field, so I use an IF statement to check whether the age is correct and then print out the username(s) in a while loop.
This works alright, but I also need to add two more fields to the search: County and City. Right now I'm working on the County selection of the page. What I can't figure out by myself is the logic for printing out the users that fit all the required fields without having 1000 IF ELSE's.
I thought of SELECTing and filtering out all the correct zip codes to the county/region chosen, then put it in an array, and then validate the output query while-loop against it, but that didn't work so well either.
In my database I have 3 tables which look like this:
county_table
id, name_of_county
municipial_table
id, county_id, municipial_name
zip_code_table
zip, zip_place_name, municipial_id
These are pre-made for my country. So, given the zip code of the user, I will have to do two different SELECT queries to connect it to the county_table (zip->municipial->county).
So basically, what I'm trying to say is: I want the search engine to output the users that have the correct data, depending on whether the age, region and city fields are selected. The checks need to be independent, and not like:
if($age>X){
if($county==Y){
if($city==Z){
-OUTPUT RESULTS HERE-
} } }
Now, the problem with this is: what if one of the fields is not requested in the search? Say, the age? The county number? The city? I think what I need are non-nested, independent IF blocks, but I'm not sure how to set them up correctly.
Help is very much appreciated, thank you a lot.
Why re-invent the wheel? Unless you're making a search engine for educational purposes, use something that has already been tested and optimized. See this related question.
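If you do build it yourself, the independent, non-nested checks the question asks about can be sketched as building the WHERE clause one optional filter at a time and letting the database do the filtering. This is Python with SQLite placeholders rather than PHP. The `users` table with `username`, `birthdate` (stored as an ISO date string) and `zip` columns is an assumption, as is treating the municipality as the "city"; the three lookup tables are the ones from the question.

```python
import sqlite3
from datetime import date


def years_ago(today, years):
    """The same calendar day `years` years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def build_user_query(min_age=None, county_id=None, municipial_id=None, today=None):
    """Return (sql, params); each filter is added only if it was requested."""
    today = today or date.today()
    sql = (
        "SELECT u.username FROM users u "
        "JOIN zip_code_table z ON z.zip = u.zip "
        "JOIN municipial_table m ON m.id = z.municipial_id"
    )
    conditions, params = [], []
    if min_age is not None:
        # At least min_age years old: born on or before the cutoff date.
        conditions.append("u.birthdate <= ?")
        params.append(years_ago(today, min_age).isoformat())
    if county_id is not None:
        conditions.append("m.county_id = ?")
        params.append(county_id)
    if municipial_id is not None:
        conditions.append("m.id = ?")
        params.append(municipial_id)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY u.username", params
```

Each `if` block is independent of the others, so leaving a field empty on the search page simply omits its condition, and the joins resolve zip to municipality to county in a single query.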