I am searching for a string in a MySQL database from PHP, using the LIKE keyword. The problem is how to handle synonyms entered by the user. For example, the database lists all the products under the name shoes, and the user enters footwear or some other mismatched word. How can I search under these different conditions?
Please guide me on this.
Currently I am using the following select query:
Select * from table where name like '%user entered string%'
Please guide on how to tackle these conditions
Let's sum it up.
Let's assume that you have a table products which, I assume and hope, consists of products with unique ids.
products
id  product
1   Shoes
2   Trousers
And so on. If you were to add another column, let's say synonyms, it would look something like this:
products
id  product   synonyms
1   Shoes     Footwear, stuff on feet,
2   Trousers
We don't want this. You'd have to select the synonyms column, parse the string and make sure you don't ruin the column when you somehow wish to add new synonyms to each product.
It makes much more sense to keep the values atomic: have a table of synonyms where each synonym references a unique id in your products table. This way it's easy to delete old synonyms and add new ones.
products
id  product
1   Shoes
2   Trousers
synonyms
id  product_id  synonym
1   1           Footwear
2   1           stuff on feet
You can then look in this table if the original SELECT ... LIKE statement returns nothing.
Using an external data-source is also a possibility but this is probably the most suitable way to go if you want to control the flow and avoid external sources.
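As a rough sketch of the fallback lookup described above, here it is in Python with an in-memory SQLite database standing in for MySQL (an assumption made only so the example is self-contained; the same SQL should work from PHP against MySQL). The function name search_products is hypothetical; the table and column names follow the tables shown above.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product TEXT NOT NULL);
    CREATE TABLE synonyms (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(id),
        synonym TEXT NOT NULL
    );
    INSERT INTO products VALUES (1, 'Shoes'), (2, 'Trousers');
    INSERT INTO synonyms VALUES (1, 1, 'Footwear'), (2, 1, 'stuff on feet');
""")

def search_products(term):
    """Try the product name first; fall back to synonyms if nothing matches."""
    pattern = f"%{term}%"  # note: % and _ typed by the user are not escaped here
    rows = conn.execute(
        "SELECT id, product FROM products WHERE product LIKE ? ORDER BY id",
        (pattern,),
    ).fetchall()
    if rows:
        return rows
    # No direct hit: look the term up among the synonyms instead.
    return conn.execute(
        """SELECT DISTINCT p.id, p.product
           FROM synonyms AS s
           JOIN products AS p ON p.id = s.product_id
           WHERE s.synonym LIKE ?
           ORDER BY p.id""",
        (pattern,),
    ).fetchall()

print(search_products("shoe"))      # direct match on the product name
print(search_products("footwear"))  # found through the synonyms table
```

Passing the search term as a bound parameter, rather than concatenating it into the query string, also avoids SQL injection.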
For me, the best way is to attach every keyword that could match (shoes, footwear, men dress, etc.) to each product, in a keywords column or table, at the time you add the product to the database. Then run the user's search against those keywords.
Related
I am working on a website which now holds millions of records (apologies, I cannot reveal which site). Initially it had a few hundred records, so the query below was acceptable:
Query: SELECT * FROM….WHERE category LIKE '%,3,%';
But now it just kills the database, because each query has to scan all 2 million records.
Category table
ID NAME
1 Female
2 Fashion
3 Clothing
4 Accessories
5 Top
6 Dress
7 Earring
8 Short dress
9 Long dress
10 Male
Product table
ID…..Category….other bits
1 ,1,2,3,6,9, ……
2 ,1,2,4,7,
3 ,1,2,3,5,
4 ,10,2,3,4,
You get the picture of what is happening above. Now if I put a FULLTEXT index on the category column in the product table, it gives a cardinality of only 1 :(
How can I overcome this?
I have considered duplicating the row for each category, but the database is huge, currently 2 GB, and with duplicates it would grow to roughly 10 GB… more of a problem than a solution.
Keep in mind that storing numbers as strings takes about twice as many bytes per digit as storing numbers as integers. Plus all those commas.
So if you're concerned about space, it won't be as much expansion as you fear to store the data in a normalized fashion.
And it will allow you to write proper queries that take advantage of indexes. So if there is some expansion, you will have traded a little bit of storage space for a big improvement in speed.
Tip: if you're using InnoDB, the primary key doesn't cost any storage because the table itself is stored as the primary key index. You should define your normalized table with the category id first and then the product id second, if you need to optimize for searches by category.
CREATE TABLE CategoryProduct (
categoryid INT,
productid INT,
PRIMARY KEY (categoryid, productid)
);
See also my answer to Is storing a delimited list in a database column really that bad? for more disadvantages to using comma-separated lists.
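To make the normalized layout concrete, here is a minimal sketch of a one-off migration from the comma-delimited column into CategoryProduct, followed by an indexed lookup. It uses Python with an in-memory SQLite database as a stand-in for MySQL/InnoDB (an assumption for runnability). The table name Product and the helper products_in_category are hypothetical; the sample rows are the ones from the question.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE CategoryProduct (
        categoryid INT,
        productid INT,
        PRIMARY KEY (categoryid, productid)
    );
    INSERT INTO Product VALUES
        (1, ',1,2,3,6,9,'), (2, ',1,2,4,7,'), (3, ',1,2,3,5,'), (4, ',10,2,3,4,');
""")

# One-off migration: split each delimited string into one row per category.
pairs = [
    (int(cat), product_id)
    for product_id, categories in conn.execute("SELECT id, category FROM Product")
    for cat in categories.split(",")
    if cat  # skip the empty strings produced by the leading/trailing commas
]
conn.executemany("INSERT INTO CategoryProduct VALUES (?, ?)", pairs)

def products_in_category(category_id):
    """Exact-match lookup that can use the (categoryid, productid) primary key."""
    rows = conn.execute(
        "SELECT productid FROM CategoryProduct WHERE categoryid = ? ORDER BY productid",
        (category_id,),
    )
    return [r[0] for r in rows]

print(products_in_category(3))
```

Because the lookup is an equality test on the leading column of the primary key, it no longer needs a LIKE '%,3,%' scan over every row.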
I would consider a new table, say Product_Category (unimaginative I know) where each row contains a column for a Foreign Key (FK) relation to the Product.id and a column for the category.
The category column can probably be a TINYINT, which would only require 1 byte to store, while I guess the FK column would be the same type as the Product.id column (probably INT, 4 bytes). You could then index both columns, so you can find out both which categories a product belongs to and which products belong in a category. Also, this table wouldn't need to have a Primary Key (i.e. id), saving you an extra 4 bytes.
(see MySQL Data Type Storage Requirements)
With this solution each row in this new table would take up about 5 bytes. Since each character in the string takes up 1 byte (assuming ASCII or latin1 encoding), a single-digit category plus its comma currently costs 2 bytes, so you would be looking at an increase of about 3 bytes per category per product by removing Product.category and putting the items into Product_Category. That's nowhere near as big an increase as duplicating entire product rows. However, there is the cost of changing your code (unless you're far better than I am at joins).
Does this help any?
One solution I've seen is to use three tables:
categories lists your categories
products lists your products, without any attached category information
category_map is a special table: each row links a product_id to a category_id
To look up products by category, you can then match rows in category_map against rows in products.
This is an imperfect example, but it gets the gist of it:
SELECT * FROM
(
SELECT * FROM category_map
WHERE category_id=1
) AS map
INNER JOIN products
ON products.id = map.product_id;
Table joins are a very powerful tool; you may want to spend some time reading up on them, if you're new to using them. Coding Horror has a visual explanation that skims over the details.
It would be a good idea to set up foreign key constraints or otherwise make sure that entries in category_map correspond to existing entries in products and categories.
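A minimal sketch of what those foreign key constraints buy you, again using Python with in-memory SQLite as a stand-in for MySQL (an assumption; InnoDB enforces foreign keys without the PRAGMA shown here). The column types and the helper try_link are hypothetical; the table names follow the three tables described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch (assumption: not needed on InnoDB)
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category_map (
        product_id INTEGER NOT NULL REFERENCES products(id),
        category_id INTEGER NOT NULL REFERENCES categories(id),
        PRIMARY KEY (category_id, product_id)
    );
    INSERT INTO categories VALUES (1, 'Female'), (2, 'Fashion');
    INSERT INTO products VALUES (1, 'Short dress');
    INSERT INTO category_map (product_id, category_id) VALUES (1, 1);
""")

def try_link(product_id, category_id):
    """Return True if the link was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO category_map (product_id, category_id) VALUES (?, ?)",
                (product_id, category_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# A valid link, a link to a missing category, and a link from a missing product.
results = [try_link(1, 2), try_link(1, 99), try_link(99, 1)]
print(results)
```

With the constraints in place, the database itself refuses rows in category_map that point at products or categories that do not exist.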
I have a system in which I have to select "similar" records. Imagine a database containing a big list of products; when the user enters a partial name of a product, a list of products comes up as suggestions for what he is searching for. These products have a longer description field too.
This is NOT about a WHERE product_name LIKE '%entered_string%' query, I think. The logic is akin to the one Stack Overflow might use, id est: when you ask a question, it prompts you with Questions that may already have your answer and Similar questions, both obviously using a method to derive what I want to ask from my question title/content and search against the database, showing the results.
I just wonder whether it is accomplishable with PHP and using MySQL as the database.
Example:
Entering food should give us results like 1kg oranges, bread and cookies. All of these would have something in common which could help to link them programmatically to each other.
There can be lots of ways to approach this scenario, but I think the most straightforward one is to have multiple keywords/tags mapped to every item. When the user types something in, you would not search the item table; you would search the mapped keywords and load the relevant items based on those matches.
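One way the keyword mapping could look, sketched in Python with in-memory SQLite standing in for MySQL (an assumption for runnability). The table names items and item_keywords, the sample keywords, and the function suggest are all hypothetical; the product names come from the example in the question.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item_keywords (
        item_id INTEGER NOT NULL REFERENCES items(id),
        keyword TEXT NOT NULL,
        PRIMARY KEY (keyword, item_id)
    );
    INSERT INTO items VALUES
        (1, '1kg oranges'), (2, 'bread'), (3, 'cookies'), (4, 'Ford Focus');
    INSERT INTO item_keywords (item_id, keyword) VALUES
        (1, 'food'), (1, 'fruit'),
        (2, 'food'), (2, 'bakery'),
        (3, 'food'), (3, 'bakery'),
        (4, 'car');
""")

def suggest(typed):
    """Return (id, name) of items having a keyword that starts with the typed text."""
    rows = conn.execute(
        """SELECT DISTINCT i.id, i.name
           FROM item_keywords AS k
           JOIN items AS i ON i.id = k.item_id
           WHERE k.keyword LIKE ?
           ORDER BY i.id""",
        (typed + "%",),  # prefix match only, so an index on keyword can help
    )
    return rows.fetchall()

print(suggest("foo"))
print(suggest("bak"))
```

The search runs against the small keyword table only, and the item rows are loaded through the join.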
If you want similar products to show up, you need to put that information in your database.
So, make a category for foods, and assign every food product to that category. That way you can select similar products easily. There is no other efficient way to do this.
So your database:
categories:
id  name
1   fruit
2   Cars
Products
id  name        category_id
1   apple       1
2   Ford focus  2
And you can select like this:
SELECT `name`,`id` FROM `products` WHERE category_id = 1;
Another way (as suggested in a comment) is to use tags:
Products
id  name        tags
1   apple       "fruit food delicious"
2   Ford focus  "Car wheels bumper"
Best way is to use a fulltext search on the tags:
SELECT * FROM `products` WHERE MATCH(tags) AGAINST ('fruit')
Make sure to have a fulltext index on tags.
I have a table with products that fall under specific categories, but the products within each category can contain multiple metadata tracking fields.
Table: products
id  name            category  metadata
1   something       1         blue,red,purple
2   something else  2         left,right,middle
I have been trying to work out the best way to keep a single products table, but can't seem to squeeze the metadata in conveniently. For now I have created a table with all the metadata, with fields for tracking the related category (the sequence is so I can order them within a dropdown, etc.).
Updated table: products
id  name            category  metadata
1   something       1         1,2,3
2   something else  2         4,5,6
Table: metadata
id  category  sequence  option
1   1         1         blue
2   1         2         red
3   1         3         purple
4   2         1         left
5   2         2         right
6   2         3         middle
If this format makes sense: I am trying to write a query that searches for values in my products table and grabs all of the related meta values. The issue I am having is finding a unique value in the products table's metadata field. If I do a MySQL search with LIKE '%1%' I will get matches for 1, 11, 21, 31, etc. I thought of adding a leading and trailing comma to the field by default and then searching for ",1,", which would be unique, but there has to be a better way...
Any recommendations (regarding format or query)?
It's not an ideal design to have comma-separated values within a single database field. Aside from the problem you mentioned (difficult to search), your queries will be less efficient, as the DB won't be able to use indices for the lookup.
I'd recommend making a separate table products_metadata with a many-to-one relationship to the products table. Give it a metadata_id column and a product_id column, the latter being a foreign key linking back to the products table. That will make your job much easier.
You want to add another table, which links products to their metadata. It will have two columns: productid and metadataid which refer to the relevant entries in the products and metadata tables respectively. Then you no longer keep metadata in the products table, but JOIN them together as required.
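A minimal sketch of that link table, using the sample rows from the question, in Python with in-memory SQLite standing in for MySQL (an assumption for runnability). The link table name products_metadata, its column types, and the helpers options_for and products_with are hypothetical. The option column is backtick-quoted because OPTION is a reserved word in MySQL.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category INTEGER);
    CREATE TABLE metadata (
        id INTEGER PRIMARY KEY, category INTEGER, sequence INTEGER, `option` TEXT
    );
    CREATE TABLE products_metadata (
        productid INTEGER NOT NULL REFERENCES products(id),
        metadataid INTEGER NOT NULL REFERENCES metadata(id),
        PRIMARY KEY (productid, metadataid)
    );
    INSERT INTO products VALUES (1, 'something', 1), (2, 'something else', 2);
    INSERT INTO metadata VALUES
        (1, 1, 1, 'blue'), (2, 1, 2, 'red'), (3, 1, 3, 'purple'),
        (4, 2, 1, 'left'), (5, 2, 2, 'right'), (6, 2, 3, 'middle');
    INSERT INTO products_metadata VALUES
        (1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, 6);
""")

def options_for(product_id):
    """All metadata options of one product, in dropdown (sequence) order."""
    rows = conn.execute(
        """SELECT m.`option`
           FROM products_metadata AS pm
           JOIN metadata AS m ON m.id = pm.metadataid
           WHERE pm.productid = ?
           ORDER BY m.sequence""",
        (product_id,),
    )
    return [r[0] for r in rows]

def products_with(metadata_id):
    """Exact match on the id, so 1 never matches 11, 21, 31 and so on."""
    rows = conn.execute(
        "SELECT productid FROM products_metadata WHERE metadataid = ? ORDER BY productid",
        (metadata_id,),
    )
    return [r[0] for r in rows]

print(options_for(1))
print(products_with(1))
```

The equality test on metadataid replaces the LIKE search, so the comma-wrapping workaround is no longer needed.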
Let's say I have 10 books, each book has assigned some categories (ex. :php, programming, cooking, cookies etc).
After storing this data in a DB I want to search the books that match some categories, and also output the matched categories for each pair of books.
What would be the best approach for a fast and easy to code search:
1) Make a column with all categories for each book, the book rows would be unique (categs separated by comma in each row ) -> denormalisation from 1NF
2) Make a column with only 1 category in each row and multiple rows per book
I think it is easier for other queries if I store the categories 1 by 1 (method 2), but harder for that specific type of search. Is this correct?
I am using PHP and MySQL.
PPS: I know multi-relational design; I prefer not to join the tables every time. I'm using a different connection for some tables, but that's not the problem. I'm asking what the best approach to the db design is for this type of search: a user types cooking, cookies, potatoes and I want to output pairs of books that have 1, 2, more, or all matched categs. I'm looking for a fast query, or a PHP matching technique for this... Tell me your point of view. Hope I've made myself understood.
Use method 2 -- multiple rows per book, storing one category per row. It's the only way to make searching for a given category easy.
This design avoids repeating groups within a column, so it's good for First Normal Form.
But it's not just an academic exercise, it's a practical design that is good for all sorts of things. See my answer to Is storing a comma separated list in a database column really that bad?
What you want to do is have one table for books, one table for categories, and one table for connecting books and categories. Something like this:
books
book_id | title | etc
categories
category_id | title | etc
book_categories
book_id | category_id
This is called a many-to-many relationship. You should probably google it to learn more.
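With those three tables, the search from the question (which books match some of the typed categories, and which ones matched) can be sketched as a single grouped query. The example uses Python with in-memory SQLite standing in for MySQL (an assumption for runnability). The book titles, the sample rows, and the function match_books are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
    CREATE TABLE book_categories (
        book_id INTEGER NOT NULL REFERENCES books(book_id),
        category_id INTEGER NOT NULL REFERENCES categories(category_id),
        PRIMARY KEY (category_id, book_id)
    );
    INSERT INTO books VALUES (1, 'PHP Basics'), (2, 'Baking at Home'), (3, 'Potato Dishes');
    INSERT INTO categories VALUES
        (1, 'php'), (2, 'programming'), (3, 'cooking'), (4, 'cookies'), (5, 'potatoes');
    INSERT INTO book_categories (book_id, category_id) VALUES
        (1, 1), (1, 2), (2, 3), (2, 4), (3, 3), (3, 5);
""")

def match_books(wanted):
    """Return (title, matched_count, matched_categories), best match first."""
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT b.title, COUNT(*) AS matched, GROUP_CONCAT(c.title) AS names
        FROM book_categories AS bc
        JOIN books AS b ON b.book_id = bc.book_id
        JOIN categories AS c ON c.category_id = bc.category_id
        WHERE c.title IN ({placeholders})
        GROUP BY b.book_id, b.title
        ORDER BY matched DESC, b.book_id
    """
    rows = conn.execute(sql, list(wanted)).fetchall()
    # GROUP_CONCAT order is not guaranteed, so sort the names on the client side.
    return [(title, matched, sorted(names.split(","))) for title, matched, names in rows]

for row in match_books(["cooking", "cookies", "potatoes"]):
    print(row)
```

Only the placeholders are interpolated into the SQL string; the category names themselves are passed as bound parameters.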
This relationship is a Many-To-Many (a book can have multiple categories and a category can be used in several books).
Then we have the following:
Got it?
=]
I would recommend approach number 2. This is because approach 1 requires a full text search of the category column.
You may have some success by splitting it up into two tables: one table has one line per book and a unique id (call the table books), and the other has one line per book per category and references the book id from the first table (call the table bookcategories). Then if you only need book data you use the books table, whereas if you need categories you join both tables.
The first time I tried to do this, I created a field in the category table called query. That contained strings like:
brand = "Burberry" AND type != "Watch"
Which I then inserted into the WHERE clause of a query to find a category's products.
That probably wasn't the best design.
My second attempt was to use a tagging system. I would create a tag table with tags like Burberry and Watch. I had a table tying the tags to the products (HABTM). I also had a table tying the tags to the categories.
The table tying tags to categories had an extra field called include which if it was a 1 then all products selected must also have that tag. Or if it was a 0 then all products selected must NOT have that tag.
This seemed to be a better design than my original, but it required some pretty complex joins.
Now I need to approach this problem once again.
One difference is I am now using the CakePHP (1.3) framework.
Before I try reinventing the wheel again, I was wondering if there are any known patterns/solutions I could use?
Probably you've already done that somehow by now, but here are my 2cents:
I'd drop Categories<->Tags, because I feel that you're unnecessarily duplicating data with it.
I.e. tables should be just categories, categories_products, products, products_tags and tags.
This way:
you wouldn't have to bother about changing category tags when products are added or removed from category
your searches would become more uniform (since there's only one tagging table)
and your tags still would be no more than 3 JOINS away - which is quite comfortable :)
From what I can understand you should have 5 tables:
Categories
Products
Tags
Categories_Tags
Products_Tags
UPDATE: When the user defines what should be selected, the HABTM tables are updated so that the tags/categories link to the products they should be linked to only.
So the query will look something like:
SELECT * FROM products WHERE ID in (SELECT product_id from tag list to include) AND ID NOT IN (select product_id FROM tag list to NOT include)
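One possible concrete form of that include/exclude query, sketched in Python with in-memory SQLite standing in for MySQL (an assumption for runnability). The column names, sample rows, and the function select_products are hypothetical. The sketch also assumes "include" means the product must carry every listed tag, as the question describes, which is why the first subquery counts matches.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE products_tags (
        product_id INTEGER NOT NULL REFERENCES products(id),
        tag_id INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (tag_id, product_id)
    );
    INSERT INTO products VALUES
        (1, 'Burberry scarf'), (2, 'Burberry watch'), (3, 'Plain watch');
    INSERT INTO tags VALUES (1, 'Burberry'), (2, 'Watch');
    INSERT INTO products_tags (product_id, tag_id) VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")

def select_products(include, exclude):
    """Names of products carrying every tag in include and no tag in exclude."""
    include = sorted(set(include))
    exclude = sorted(set(exclude))
    if not include:
        return []
    inc = ", ".join("?" for _ in include)
    # An empty IN () list is not valid in MySQL, so use a never-matching NULL.
    exc = ", ".join("?" for _ in exclude) or "NULL"
    sql = f"""
        SELECT p.name FROM products AS p
        WHERE p.id IN (
            SELECT pt.product_id FROM products_tags AS pt
            JOIN tags AS t ON t.id = pt.tag_id
            WHERE t.name IN ({inc})
            GROUP BY pt.product_id
            HAVING COUNT(*) = ?
        )
        AND p.id NOT IN (
            SELECT pt.product_id FROM products_tags AS pt
            JOIN tags AS t ON t.id = pt.tag_id
            WHERE t.name IN ({exc})
        )
        ORDER BY p.id
    """
    params = [*include, len(include), *exclude]
    return [r[0] for r in conn.execute(sql, params)]

print(select_products(["Burberry"], ["Watch"]))
```

This mirrors the original brand = "Burberry" AND type != "Watch" rule using only the tagging tables.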
Maybe I'm missing what you're trying to accomplish here, but this sounds like you're making it more complicated than it needs to be.
Create three tables: Product, Category, and ProductCategory. Product and Category each have an id. Then ProductCategory includes ProductId / CategoryId pairs.
Like:
Product
ProductId Name
1 Lamp
2 Carpet
3 Drill
4 Power cord
5 3/8" bolt
Category
CategoryId Name
1 Electrical
2 Home decor
3 Hardware
ProductCategory
ProductId CategoryId
1 1
1 2
2 2
3 1
3 3
4 1
5 3
Then if you want, e.g., to know all the "Hardware" items:
select product.*
from category
join productcategory using (categoryid)
join product using (productid)
where category.name='Hardware'