PHP/SQL Check if a similar entry already exists in the database

PHP/SQL Check if a similar entry already exists in the database - php

I have a search on my website and im trying to show a list of the most popular search terms on my site it sort of works but it isn't matching the strings close enough.
This is what i'm currently using:
$sql = "SELECT * FROM db WHERE r_name LIKE '%".$searchname."%' OR r_number like '%".$searchname."%'
However if a user searches for say Game Name and another searches for Game Name (Reviews) it will add 2 entries into my database, Is there a way to do a similarity test before entering the entry ?

yes, there is a way but it requires you to fiddle with it a little. there is a search called
SOUNDEX, which will match things that are very close. It might not be a perfect solution to your question, but it is definitely something that might get you started in the right direction.
SELECT * FROM db WHERE SOUNDEX( db.r_name ) LIKE SOUNDEX( '{$searchname}' );
I believe that if you have an entry 'lowercasexd' , and do soundex like('what is lowercasexd?'), it will find the entry that's associated with 'lowercasexd'.
Be aware that this type of search take a little while to run compare to '=' searches on indexed databases(on my database it does about 5-6k entries per second) so it is NOT recommended for anything big. If you want a near-perfect solution, I suggest you read about google's search mechanism, and look up some search engine source code if your project is significant enough.

Related

faster way for Search in multiple databases

I am working on big eCommerce shopping website. I have around 40 databases. i want to create search page which show 18 result after searching by title in all databases.
(SELECT id_no,offers,image,title,mrp,store from db1.table1 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
UNION ALL (SELECT id_no,offers,image,title,mrp,store from db3.table3 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
UNION ALL (SELECT id_no,offers,image,title,mrp,store from db2.table2 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
LIMIT 18
currently i am using the above query its working fine for 4 or more character keyword search like laptop nokia etc but takes 10-15 sec for processes but for query with keyword less than 3 characters it takes 30-40sec or i end up with 500 internal server error. Is there any optimized way for searching in multiple databases. I generated two index primary and full text index with title
Currently my search page is in php i am ready to code in python or any
other language if i gets good speed

You can use the sphixmachine:http://sphinxsearch.com/. This is powerfull search for database. IMHO Sphinx this best decision
for search in your site.

FULLTEXT is not configured (by default) for searching for words less than three characters in length. You can configure that to handle shorter words by setting a ...min_token_size parameter. Read this. https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html You can only do this if you control the MySQL server. It won't be possible on shared hosting. Try this.
FULLTEXT is designed to produce more false-positive matches than false-negative matches. It's generally most useful for populating dropdown picklists like the ones under the location field of a browser. That is, it requires some human interaction to choose the correct record. To expect FULLTEXT to be able to do absolutely correct searches is probably a bad idea.
You simply cannot use AND column LIKE '%whatever%' if you want any reasonable performance at all. You must get rid of that. You might be able to rewrite your python program to do something different when the search term is one or two letters, and thereby avoid many, but not all, LIKE '%a%' and LIKE '%ab%' operations. If you go this route, create ordinary indexes on your title columns. Whatever you do, don't combine the FULLTEXT and LIKE searches in a single query.
If this were my project I'd consider using a special table with columns like this to hold all the short words from the title column in every row of each table.
id_pk INT autoincrement
id_no INT
word VARCHAR(3)
Then you can use a query like this to look up short words
SELECT a.id_no,offers,image,title,mrp,store
FROM db1.table1 a
JOIN db1.table1_shortwords s ON a.id_no = s.id_no
WHERE s.word = '$searchkey'
To do this, you will have to preprocess the title columns of your other tables to populate the shortwords tables, and put an index on the word column. This will be fast, but it will require a special-purpose program to do the preprocessing.
Having to search multiple tables with your UNION ALL operation is a performance problem. You will be able to improve performance dramatically by redesigning your schema so you need search only one table.
Having to search databases on different server machines is a performance problem. You may be able to rig up your python program to search them in parallel: that is, to somehow use separate tasks to search each one, then aggregate the results. Each of those separate search tasks requires its own connection to the data base, so this is not a cheap or simple solution.
If this system faces the public web, you will have to redesign it sooner or later, because it will never perform well enough as it is now. (Sorry to be the bearer of bad news.) Many system designers like to avoid redesigning systems after they become enormous. So, if I were you I would get the redesign done.

If your focus is on searching, then bend the schema to facilitate searching rather than the other way around.
Collect all the strings to search for in a single table. Whereas a UNION of 40 tables does work, it will be ~40 times as slow as having the strings collected together.
Use FULLTEXT when the words are long enough, use some other technique when they are not. (This addresses your 3-char problem; see also the Answer discussing innodb_ft_min_token_size. You are using InnoDB, correct?)
Use + and boolean mode to say that a word is mandatory: MATCH(col) AGAINST("+term" IN BOOLEAN MODE)
Do not add on a LIKE clause unless there is a good reason.

performance issue from 5 queries in one page

As i am a junior PHP Developer growing day by day stuck in a performance problem described here:
I am making a search engine in PHP ,my database has one table with 41 column and million's of rows obviously it is a very large dataset. In index.php i have a form for searching data.When user enters search keyword and hit submit the action is on search.php with results.The query is like this.
SELECT * FROM TABLE WHERE product_description LIKE '%mobile%' ORDER BY id ASC LIMIT 10
This is the first query.After result shows i have to run 4 other query like this:
SELECT DISTINCT(weight_u) as weight from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country_unit) as country_unit from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country) as country from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(hs_code) as hscode from TABLE WHERE product_description LIKE '%mobile%'
These queries are for FILTERS ,the problem is this when i submit search button ,all queries are running simultaneously at the cost of Performance issue,its very slow.
Is there any other method to fetch weight,country,country_unit,hs_code speeder or how can achieve it.
The same functionality is implemented here,Where the filter bar comes after table is filled with data,How i can achieve it .Please help
Full Functionality implemented here.
I have tried to explain my full problem ,if there is any mistake please let me know i will improve the question,i am also new to stackoverflow.

Firstly - are you sure this code is working as you expect it? The first query retrieves 10 records matching your search term. Those records might have duplicate weight_u, country_unit, country or hs_code values, so when you then execute the next 4 queries for your filter, it's entirely possible that you will get values back which are not in the first query, so the filter might not make sense.
if that's true, I would create the filter values in your client code (PHP)- finding the unique values in 10 records is going to be quick and easy, and reduces the number of database round trips.
Finally, the biggest improvement you can make is to use MySQL's fulltext searching features. The reason your app is slow is because your search terms cannot use an index - you're wild-carding the start as well as the end. It's like searching the phonebook for people whose name contains "ishra" - you have to look at every record to check for a match. Fulltext search indexes are designed for this - they also help with fuzzy matching.

I'll give you some tips that will show useful in many situations when querying a large dataset, or mostly any dataset.
If you can list the fields you want instead of querying for '*' is a better practice. The weight of this increases as you have more columns and more rows.
Always try to use the PK's to look for the data. The more specific the filter, the less it will cost.
An index in this kind of situation would come pretty handy, as it will make the search more agile.
LIKE queries are generally pretty slow and resource heavy, and more in your situation. So again, the more specific you are, the better it will get.
Also add, that if you just want to retrieve data from this tables again and again, maybe a VIEW would fit nicely.
Those are just some tips that came to my mind to ease your problem.
Hope it helps.

MYSQL like + group

Well I'm having a problem mainly caused by bad structure in database. I'm coding this for a company whose code is quite messy and the table is very large so I don't think it's an option to fix the structure.
Anyway, my issue is that I'm trying to somehow group a value that won't be alone in the string...
They are storing values separated with commas... So it would be like
field: "category" value: 'var1, var2, var3'
And I will search using this query:
SELECT name, category
FROM companies
WHERE (MATCH(name, category) AGAINST ('$search' IN BOOLEAN MODE)
OR category LIKE '$search%')
It would match with for example var2 (it's not limited to 3 variables though, can be solo or many more) and I'd split it manually in PHP, no problem. Although I will not get enough matches, I want e.g. 10 matches by different searches. To be more specific I'm making an autosuggest feature, which means I will for example want to match "moto%" with motorbike, motor alone or whatever but I keep getting the same values, like there'd be a couple of 100 of results that contains "motorbike" and I don't know how to filter them, as I'm not able to use GROUP BY due to bad db structure...
I found this: T-SQL - GROUP BY with LIKE - is this possible?
It SEEMED as something that would be a solution, but as far as I've tried I could not get it work with what I wanted.
So I'm wondering which solutions there are... If there are ABSOLUTELY no way of working this around I might probably have to fix the db structure (but this really has to be the last option)

Start taking steps to make database structure proper. Make an extra table and fill it with split values.
Then you can use proper queries to select the data you need. Both you and next developer will have less troubles with this project in the future, not mentioning queries speed gain.

I am not sure why i cannot write a comment, but maybe you can try this:
SELECT name, category FROM companies WHERE category LIKE '$search%' or LOCATE('search', category)>0;
That would look if in category appears any of your 'search' value.

I would have to agree that you should make the database right. It'll save you much trouble and time later. However, using SELECT DISTINCT may fix your immediate issue.

How can I search for multiple terms in multiple table columns?

I have a table that lists people and all their contact info. I want for users to be able to perform an intelligent search on the table by simply typing in some stuff and getting back results where each term they entered matches at least one of the columns in the table. To start I have made a query like
SELECT * FROM contacts WHERE
firstname LIKE '%Bob%'
OR lastname LIKE '%Bob%'
OR phone LIKE '%Bob%' OR
...
But now I realize that that will completely fail on something as simple as 'Bob Jenkins' because it is not smart enough to search for the first an last name separately. What I need to do is split up the the search terms and search for them individually and then intersect the results from each term somehow. At least that seems like the solution to me. But what is the best way to go about it?
I have heard about fulltext and MATCH()...AGAINST() but that sounds like a rather fuzzy search and I don't know how much work it is to set up. I would like precise yes or no results with reasonable performance. The search needs to be done on about 20 columns by 120,000 rows. Hopefully users wouldn't type in more than two or three terms.
Oh sorry, I forgot to mention I am using MySQL (and PHP).
I just figured out fulltext search and it is a cool option to consider (is there a way to adjust how strict it is? LIMIT would just chop of the results regardless of how well it matched). But this requires a fulltext index and my website is using a view and you can't index a view right? So...

I would suggest using MATCH / AGAINST. Full-text searches are more advanced searches, more like Google's, less elementary.
It can match across multiple tables and rank them to how many matches they have.
Otherwise, if the word is there at all, esp. across multiple tables, you have no ranking. You can do ranking server-side, but that is going to take more programming/time.
Depending on what database you're using, the ability to do cross columns can become more or less difficult. You probably don't want to do 20 JOINs as that will be a very slow query.
There are also engines such as Sphinx and Lucene dedicated to do these types of searches.

BOOLEAN MODE
SELECT * FROM contacts WHERE
MATCH(firstname,lastname,email,webpage,country,city,street...)
AGAINST('+bob +jenkins' IN BOOLEAN MODE)
Boolean mode is very powerful. It might even fulfil all my needs. I will have to do some testing. By placing + in front of the search terms those terms become required. (The row must match 'bob' AND 'jenkins' instead of 'bob' OR 'jenkins'). This mode even works on non-indexed columns, and thus I can use it on a view although it will be slower (that is what I need to test). One final problem I had was that it wasn't matching partial search terms, so 'bob' wouldn't find 'bobby' for example. The usual % wildcard doesn't work, instead you use an asterisk *.

Database structure for saving search results

I currently work for a social networking website.
My boss recently had the idea to show search results by random instead of normal results (registration date). The problem with that is simple and obvious: if you go from one page to another, it's going to show you different results each time as the list is randomized each time.
I had the idea to store results in database+cookies something like this:
Cookie containing a serialized version of the $_POST request (needed if we want to do a re-sort)
A table which would serve as the base for the search id => searches (id,user_id, creation_date)
A table which would store the results and their order => searches_results (search_id, order, user_id)
Flow chart would look like something like that:
After each searches I store the "where" into a cookie or session
Then I erase the previous search in "searches"
Then I delete previous results in "searches_results"
Then I insert a row into "searches" for the key
Then I insert each user row into "searches_results"
And finally I redirect the user to somethink like ?search_id=[search_key]
There is a big flaw here : performances .... it is definetly possible to make the system OR down OR very slow.
Any idea what would be the best to structure this ?

What if instead of ordering randomly, you ordered by some function where the order is known and repeatable, just non-obvious? You could seed such a function with some data from the search query to make it be even less obvious that it repeats. This way, you can page back and forth through your results and always get what you expect. Music players use this sort of function for their shuffle feature (so that if you click back, you get the previous song, and if you click next again, you're back where you started). I'm sure you can divine some function to accomplish this... bitwise XORing ID values with some constant (from the query) and then sorting by the resulting number might be sufficient. I chose XOR arbitrarily because it's a trivially simple function that will get you repeatable and non-obvious results.

Hum maybe, but doesn't the xor operator only say if it is an OR exclusive ? I mean, there is no mathematical operation here, as far as I know of tho.

Sorry, I know this doesn't help, but I don't understand why your boss would want this?
I know that if I search for a person on a social network, then I want the results to be ordered by relevance and relevance only. I think that randomized results would just be frustrating for the user, but maybe that's just me.
For example, if I search for "John Smith", then first first batch of results better be people named "John Smith". Then show me similar names near the end of the results. I don't want to search for "John Smith" and get "Jon Smithers" as my second result.

Well, I'm with Matt in asking "Why?"
I think rmeador has a good suggestion as well. You could randomly sort by a different field or some sort of algorithm. Just from the permutations of DESC / ASC on last updated or some other result field.
Other option would be to do an initial search the first time and return only related ID's and then store the full ID's string in the database and each subsequent page is then a lookup against those ID's.
My two cents.
I can see a scenario where a randomized result set is useful but not for searching but for browsing profiles or artists or local events. It offers more exposure to those that wouldn't show up in a traditionally directed search.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.