I'm using a self-built database model for a webshop application. This is what it looks like:
First of all, I have a table for my products. It contains only general data such as id and articlenr. For the product attributes (name, price, etc.) I have made separate tables per data type, so I have the following tables:
product_att_varchar
product_att_decimal
product_att_int
product_att_select
product_att_text
product_att_date
These tables are tied to the products through a relation table, procuct_att_relational.
My problem is the performance of this structure: if I want all the attributes of a specific product, I have to use so many joins that the query slows down badly.
Does anyone have a solution for this?
Thanks
This model is called EAV (entity-attribute-value) and has its drawbacks and benefits.
The benefits are that it is very flexible and easy to extend. It can be useful if you have a very large number of very sparse attributes, if the attributes cannot be predicted at design time (say, they are user-provided), or if the attributes are rarely used.
The drawbacks are performance and the inability to index several attributes at the same time. However, if your database system supports indexed views (like SQL Server) or clustered storage of multiple tables (like Oracle), these techniques can improve performance.
However, storing all attributes in one record will still be faster.
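To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a simplified version of the question's layout: three of the typed tables, with the attribute name stored directly in each table instead of going through the relation table. The sample rows and the helper name load_attributes are made up for illustration. Reading one product this way needs one SELECT per typed table, combined with UNION ALL, and the attributes come back as name/value rows rather than as columns of a single record.

```python
import sqlite3

# Simplified layout (an assumption): each typed table stores the attribute
# name directly instead of going through a separate relation table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, articlenr TEXT NOT NULL);
CREATE TABLE product_att_varchar (product_id INTEGER, name TEXT, value TEXT);
CREATE TABLE product_att_decimal (product_id INTEGER, name TEXT, value REAL);
CREATE TABLE product_att_int     (product_id INTEGER, name TEXT, value INTEGER);
INSERT INTO product VALUES (1, 'A-100');
INSERT INTO product_att_varchar VALUES (1, 'name', 'Desk lamp');
INSERT INTO product_att_decimal VALUES (1, 'price', 19.95);
INSERT INTO product_att_int     VALUES (1, 'stock', 12);
""")

def load_attributes(conn, product_id):
    """Return all attributes of one product as a dict of name -> value."""
    rows = conn.execute(
        """
        SELECT name, value FROM product_att_varchar WHERE product_id = :id
        UNION ALL
        SELECT name, value FROM product_att_decimal WHERE product_id = :id
        UNION ALL
        SELECT name, value FROM product_att_int WHERE product_id = :id
        """,
        {"id": product_id},
    ).fetchall()
    return dict(rows)

attrs = load_attributes(conn, 1)
print(attrs)
```

With all attributes in one record, the same read would be a single-row SELECT on the product table.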
I don't see any good reason to move those attributes out of the product table. It'd be one thing if you did it because you had some data that suggested a problem, but it looks like you thought "this will be better". Why did you do it this way right off the bat?
If you did it this way because it was generated for you, I'd recommend abandoning that generator.
People keep coming back to this model because they think it's "flexible". Well, it is I suppose, but that flexibility comes at a huge price: Every update and every query is slow and complex. Quassnoi mentions that if the attributes are sparse, i.e. most entity instances have only a small percentage of the possible attributes, this can save space. This is true, but the flip side is that if it is not sparse, this takes hugely more space, because now you have to store the attribute name or code for every attribute in addition to the value, plus you need to repeat some sort of key to identify the logical entity instance for every attribute.
The only time I can think of when this would be a good idea is if the list of attributes needs to be updatable on the fly, that is, a user needs to be able to decide to create a new attribute whenever he likes. But then what will the system do with this attribute? If you just want the user to be able to type it in and then later retrieve what he typed, easy enough. But will it affect processing in any way? Like, if the user decides to add a "clearance sale code", how will your program know how this affects the sale price? It could be done of course: You could have additional screens where the user enters data that somehow describes how each field affects pricing or re-ordering or whatever. But that would add yet more layers of complexity.
So my short answer is: Unless you have a very specialized requirement, don't do this. If you are trying to build a database describing items that you sell, with things like description and price and quantity on hand, then create one table with fields like description and price and quantity on hand. Life is hard enough without going out of your way to make it harder.
I do not have much experience in table design. My goal is to create one or more product tables that meet the requirements below:
Support many kinds of products (TV, Phone, PC, ...). Each kind of product has a different set of parameters, like:
Phone will have Color, Size, Weight, OS...
PC will have CPU, HDD, RAM...
The set of parameters must be dynamic. You can add or edit any parameter you like.
How can I meet these requirements without a separate table for each kind of product?
You have at least these five options for modeling the type hierarchy you describe:
Single Table Inheritance: one table for all Product types, with enough columns to store all attributes of all types. This means a lot of columns, most of which are NULL on any given row.
Class Table Inheritance: one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
Concrete Table Inheritance: no table for common Products attributes. Instead, one table per product type, storing both common product attributes, and product-specific attributes.
Serialized LOB: One table for Products, storing attributes common to all product types. One extra column stores a BLOB of semi-structured data, in XML, YAML, JSON, or some other format. This BLOB allows you to store the attributes specific to each product type. You can use fancy Design Patterns to describe this, such as Facade and Memento. But regardless you have a blob of attributes that can't be easily queried within SQL; you have to fetch the whole blob back to the application and sort it out there.
Entity-Attribute-Value: One table for Products, and one table that pivots attributes to rows, instead of columns. EAV is not a valid design with respect to the relational paradigm, but many people use it anyway. This is the "Properties Pattern" mentioned by another answer. See other questions with the eav tag on StackOverflow for some of the pitfalls.
I have written more about this in a presentation, Extensible Data Modeling.
Additional thoughts about EAV: Although many people seem to favor EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex (or worse) than it would have been to solve the original problem in a more conventional way.
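The cost of fetching a tabular layout can be seen in a small sketch. It uses Python's sqlite3 module and a hypothetical two-table EAV layout (product and product_attribute, with invented sample data); each attribute that should appear as a column needs its own join against the attribute table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, sku TEXT);
-- Hypothetical EAV table: one row per (product, attribute) pair.
CREATE TABLE product_attribute (
    product_id INTEGER,
    name       TEXT,
    value      TEXT,
    PRIMARY KEY (product_id, name)
);
INSERT INTO product VALUES (1, 'PH-1'), (2, 'PH-2');
INSERT INTO product_attribute VALUES
    (1, 'color', 'black'), (1, 'os', 'Android'), (1, 'weight', '180'),
    (2, 'color', 'white'), (2, 'os', 'iOS');
""")

# One LEFT JOIN per attribute that should show up as a column.
PIVOT_SQL = """
SELECT p.sku, c.value AS color, o.value AS os, w.value AS weight
FROM product p
LEFT JOIN product_attribute c ON c.product_id = p.id AND c.name = 'color'
LEFT JOIN product_attribute o ON o.product_id = p.id AND o.name = 'os'
LEFT JOIN product_attribute w ON w.product_id = p.id AND w.name = 'weight'
ORDER BY p.id
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Three attributes already mean three joins, and a missing attribute simply shows up as NULL, since nothing in the schema can require it.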
And in most cases, it's unnecessary to have that degree of flexibility. In the OP's question about product types, it's much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.
I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.
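As a rough sketch of Class Table Inheritance, assuming two product types and invented table and column names (product, product_phone, product_pc), the layout could look like this in SQLite via Python. It also shows the kind of check EAV cannot give you: a type-specific column declared NOT NULL is enforced by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Attributes common to every product type.
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL NOT NULL
);
-- One table per product type, keyed by the product id.
CREATE TABLE product_phone (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    os         TEXT NOT NULL,
    color      TEXT
);
CREATE TABLE product_pc (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    cpu        TEXT NOT NULL,
    ram_gb     INTEGER NOT NULL
);
INSERT INTO product VALUES (1, 'Phone X', 499.0), (2, 'Desktop Y', 899.0);
INSERT INTO product_phone VALUES (1, 'Android', 'black');
""")

# A PC without RAM is rejected by the schema, not by application code.
try:
    conn.execute("INSERT INTO product_pc VALUES (2, 'i5', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Reading a phone takes a single join, however many phone columns exist.
phone = conn.execute("""
    SELECT p.name, ph.os, ph.color
    FROM product p
    JOIN product_phone ph ON ph.product_id = p.id
    WHERE p.id = 1
""").fetchone()
print(rejected, phone)
```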
Update 2019: The more I see people using JSON as a solution for the "many custom attributes" problem, the less I like that solution. It makes queries too complex, even when using special JSON functions to support them. It takes a lot more storage space to store JSON documents, versus storing in normal rows and columns.
Basically, none of these solutions are easy or efficient in a relational database. The whole idea of having "variable attributes" is fundamentally at odds with relational theory.
What it comes down to is that you have to choose one of the solutions based on which is the least bad for your app. Therefore you need to know how you're going to query the data before you choose a database design. There's no way to choose one solution that is "best" because any of the solutions might be best for a given application.
#StoneHeart
I would go here with EAV and MVC all the way.
#Bill Karwin
Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
All those things that you have mentioned here:
data validation
attribute names spelling validation
mandatory columns/fields
handling the destruction of dependent attributes
in my opinion do not belong in a database at all, because no database can handle those interactions and requirements as well as the application's programming language does.
In my opinion, using a database this way is like using a rock to hammer a nail. You can do it with a rock, but aren't you supposed to use a hammer, which is more precise and designed specifically for this sort of work?
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
This problem can be solved by running a few queries on partial data and assembling the results into a tabular layout in your application. Even if you have 600GB of product data, you can process it in batches if you need data from every single row in this table.
Going further, if you want to improve query performance, you can pick certain operations, e.g. reporting or global text search, and prepare index tables for them that store the required data and are regenerated periodically, let's say every 30 minutes.
You don't even need to be concerned with the cost of extra data storage because it gets cheaper and cheaper every day.
If you are still concerned about the performance of operations done by the application, you can always use Erlang, C++, or Go to pre-process the data and then process the optimised data further in your main app.
If I use Class Table Inheritance meaning:
one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
-Bill Karwin
This is the one I like best of Bill Karwin's suggestions. I can foresee one drawback, though, and I will try to explain how to keep it from becoming a problem.
What contingency plan should I have in place when an attribute that belongs to only one type later becomes common to 2, then 3, and so on?
For example: (this is just an example, not my real issue)
If we sell furniture, we might sell chairs, lamps, sofas, TVs, etc. The TV type might be the only type we carry that has a power consumption. So I would put the power_consumption attribute on the tv_type_table. But then we start to carry home theater systems, which also have a power_consumption property. OK, it's just one other product, so I'll add this field to the stereo_type_table as well, since that is probably easiest at this point. But over time, as we start to carry more and more electronics, we realize that power_consumption is broad enough that it should be in the main_product_table. What should I do now?
Add the field to the main_product_table. Write a script to loop through the electronics and put the correct value from each type_table to the main_product_table. Then drop that column from each type_table.
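A minimal sketch of such a script, using Python's sqlite3 module. The table names come from the example above; the other columns, the sample rows and the helper name promote_column are assumptions made for illustration. The final step, dropping the column from each type table, is left as a comment because support for dropping columns depends on the database engine and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_product_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tv_type_table (
    product_id INTEGER PRIMARY KEY, screen_inches INTEGER, power_consumption INTEGER);
CREATE TABLE stereo_type_table (
    product_id INTEGER PRIMARY KEY, channels INTEGER, power_consumption INTEGER);
INSERT INTO main_product_table VALUES (1, 'TV'), (2, 'Home theater'), (3, 'Chair');
INSERT INTO tv_type_table VALUES (1, 55, 120);
INSERT INTO stereo_type_table VALUES (2, 5, 300);
""")

def promote_column(conn, column, type_tables):
    """Copy `column` from each type table into main_product_table.

    The identifiers are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(f"ALTER TABLE main_product_table ADD COLUMN {column} INTEGER")
        for table in type_tables:
            conn.execute(f"""
                UPDATE main_product_table
                SET {column} = (SELECT t.{column} FROM {table} AS t
                                WHERE t.product_id = main_product_table.id)
                WHERE id IN (SELECT product_id FROM {table})
            """)
        # Last step, once the values are verified: drop the column from each
        # type table (omitted here; the syntax and support vary by engine).

promote_column(conn, "power_consumption", ["tv_type_table", "stereo_type_table"])
result = conn.execute(
    "SELECT id, power_consumption FROM main_product_table ORDER BY id"
).fetchall()
print(result)
```

Products whose type never had the attribute (the chair here) keep NULL in the new column.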
Now, if I always use the same GetProductData class to interact with the database and pull the product info, then any code that needs refactoring after such a change should be confined to that class.
You can have a Product table and a separate ProductAdditionalInfo table with 3 columns: product ID, additional info name, additional info value. If color is used by many but not all kinds of Products, you could make it a nullable column in the Product table, or just put it in ProductAdditionalInfo.
This approach is not a traditional technique for a relational database, but I have seen it used a lot in practice. It can be flexible and have good performance.
Steve Yegge calls this the Properties pattern and wrote a long post about using it.
I'm developing a stock and warehouse management system using a relational database (MySQL) and PHP. Because the stock products will have multiple characteristics (widths, heights, weights, measures, colors, etc.), I need a database model that stores the attributes and makes it possible to add and edit attributes, alter product types and so on.
So, in the current concept I can see only 3 viable models:
1. Store all attributes in a single table, as separate columns, and serve them to the end user to fill in based on product type (probably category).
2. The EAV (Entity - Attribute - Value) model, which would involve something like this:
   a category table containing classes of attributes
   a class of attributes table that will contain separate classes with multiple attributes (in this manner we ensure that we can add a class of attributes to a category without having to manually add the attributes one after the other to similar categories)
   an attributes table responsible for the attribute itself
   an attribute values table where we store the values
3. Store all common attributes in a single table and create multiple tables for the different category types: this model would require changing the database every time we encounter a new category type.
The second model is inspired from here.
After reading a lot about the EAV model, I now have doubts about it, and I am a little concerned about how I will have to connect different product attributes in orders, invoices and so on. Even form validation looks like it will be a real pain with the EAV model. Still, I wouldn't like to have a single table with 100+ columns and then have to add new columns whenever a new attribute comes along.
So, the question would be: is there a cheaper solution? Or could the EAV model be improved?
I know it's a long and old debate, but everybody is just pointing to NoSQL and I only rely on RDBMS..
EDIT:
The downside of those approaches (or of most of the approaches found) is that:
a given attribute probably needs a unit of measure (e.g. the weight attribute should have a drop-down with measuring units)
a given attribute should be either mandatory or optional
all attributes should be validated on form submit
So far, the only feasible solution would be to create a new table for every new category and handle all custom attributes and rules in that table. But, yet again, that would turn into a real pain whenever a new category has to be set up.
EDIT 2:
The option of using a Json column in MySQL, does not solve from my point of view any of the downsides mentioned above.. OR, maybe I am wrong and I don't clearly see the big picture..
I gather that these are your primary requirements:
Flexible attributes
Your exact need here is unclear: it sounds like you either expect the attributes to change, or at least expect that all attributes will not always be applicable to all products (i.e. a sparse matrix)
Products are also categorized, and the category will (at least partially) determine what attributes are applicable to a product
The attributes themselves may have additional properties aside from their value, that must be provided by the user (i.e. a unit that goes with a weight)
Input validation is a must, and checks things like:
All required attributes are present
Attributes which are not applicable are not present
Attributes have valid values
User-provided attribute properties have valid values
You probably also want to make sure you can search/filter efficiently by attributes
These different requirements all result in different technical needs, and different technical solutions. Some are matters of database, and some will have to be solved in code regardless of database choice. Obviously you are aware of some of these issues, but I think it is worth really breaking it down:
Flexible Attributes
Having a list of flexible attributes (as you know) does not work well with RDBMS systems where your table schema has to be pre-defined. This includes pretty much all of the SQLs, and definitely MySQL. The issue is that changing the table schema is expensive and for large tables can take minutes or hours, making it practically impossible to add attributes if you have to add a column to a table to do it.
Even if your list of attributes rarely changes, a large table of attributes is very inefficient if most products don't have a value for most attributes (i.e. a sparse matrix).
In the long run, you just won't get anywhere if your attributes are stored as a column in tables. Even if you break it down per-category, you are still going to have large empty tables that you can't add columns to dynamically.
If you stick with an RDBMS your only option is really an EAV system. Having considered, researched, and implemented EAV systems, I wouldn't worry too much about all the hype you hear about them on the internet. I know that there are lots of articles out there talking about the EAV "anti-pattern", and I'm the kind of person who takes proper use of software design patterns seriously, but EAV does have a perfectly valid time and place, and this is it. In the long run you will not be able to do this on an RDBMS without EAV.
You could certainly look at a NoSQL system that is designed for this specific kind of problem, but when the rest of your database is in a standard RDBMS, installing or switching to a NoSQL system just to store your attribute values is almost certainly overkill. You certainly aren't going to want to lose the ACID compliance that an RDBMS comes with, and most NoSQL systems don't guarantee ACID compliance. There is a wave of NewSQL systems out there that are designed to get the best of both worlds, but if this is just one part of a larger application (which I'm sure is the case), it probably isn't worth investigating completely new technologies just to make this one feature happen.
You could also consider using something like JSON storage inside MySQL to store your attribute values. That is a viable option now that MySQL has better JSON support, but that only makes a small change to the big picture: you would still need all your other EAV tables to keep track of allowed attributes, categories, etc. It is only the attribute values that you would be able to place inside of the JSON data, so the potential benefits of JSON storage are relatively small (and it has other issues that I will mention down the road).
So in summary, I would say that as long as the rest of your application runs on a RDBMS, it is perfectly reasonable to use EAV to manage flexible attributes. If you were trying to build your entire system in an EAV inside of a RDBMS, then you would definitely be wasting your time and I'd tell you to go find a good NoSQL database that fits the problem you are trying to solve. The disadvantages of EAV do still apply though: you can't easily perform consistency checks within your RDBMS system, and will have to do that yourself in code.
Categorized products with category-specific attributes
You've pretty much got it here. This is relatively straight-forward inside an EAV system. You will have your attributes table, you will have a category table, and then you will need a standard one-to-many or many-to-many relationship between the attributes and categories table which will determine which attributes are available to which category. You obviously also have a relationship between products and categories, so you know which products therefore need which attributes.
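A minimal sketch of that relationship, using Python's sqlite3 module. All table and column names, the sample data and the helper applicable_attributes are assumptions made for illustration, including the "required" flag on the link table. The many-to-many table category_attribute decides which attributes apply to which category, and the product's category_id ties a product to its set of attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Many-to-many link: which attributes are available to which category.
CREATE TABLE category_attribute (
    category_id  INTEGER NOT NULL REFERENCES category(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    required     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (category_id, attribute_id)
);
CREATE TABLE product (
    id          INTEGER PRIMARY KEY,
    sku         TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(id)
);
INSERT INTO category  VALUES (1, 'Phones'), (2, 'Lamps');
INSERT INTO attribute VALUES (1, 'color'), (2, 'os'), (3, 'wattage');
INSERT INTO category_attribute VALUES (1, 1, 0), (1, 2, 1), (2, 1, 0), (2, 3, 1);
INSERT INTO product VALUES (10, 'PH-1', 1), (20, 'LM-1', 2);
""")

def applicable_attributes(conn, product_id):
    """Return (attribute name, required flag) pairs for one product."""
    return conn.execute(
        """
        SELECT a.name, ca.required
        FROM product p
        JOIN category_attribute ca ON ca.category_id = p.category_id
        JOIN attribute a ON a.id = ca.attribute_id
        WHERE p.id = ?
        ORDER BY a.name
        """,
        (product_id,),
    ).fetchall()

phone_attrs = applicable_attributes(conn, 10)
lamp_attrs = applicable_attributes(conn, 20)
print(phone_attrs, lamp_attrs)
```

Adding a new attribute to a category is then an INSERT into the link table rather than a schema change.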
Your option #3 is designed to fulfill this requirement, but having a table with each attribute as a column will scale very poorly as your system grows, and will definitely break if you ever need to dynamically add attributes. You don't want to be running ALTER TABLE statements on the fly, especially if you have more than a few thousand records.
Managing attribute properties
It is one thing to store dynamic attributes and values. It is another problem entirely to store dynamic attributes, values, and associated meta data (e.g. store a weight as well as the unit the weight is in). This however is no longer a database problem, but rather a code problem. In terms of actually storing the information, your best bet is probably to store your meta data inside your attribute values table, and rely upon some code abstractions to handle the input validation as well as form building. That can get quite complicated quite fast, especially if done wrong, and talking through such a system would take another entire post.
However, I think you are on the right track: for a fancier attribute that requires both a value and meta data, you need to somehow assign a class that is responsible for input processing and form validation. For instance, for a simple text field you have a "text" class that reads the user's value out of the form and stores it in the proper "attribute_values" table, with no meta data stored. Then for your "weight" attribute you would have a "weight" class that stores the number given by the user (e.g. 0.5) but also stores the unit the user specified with that number (e.g. 'lbs') and persists both to the "attribute_values" table (in pseudo-SQL): INSERT INTO attribute_values (value, meta_data, product_id, attribute_id) VALUES ('0.5', '{"unit":"lbs"}', X, X). Ironically, JSON probably would be a good way to store this meta data, since the exact meta data kept will also vary by attribute type, and I doubt you would want another level of tables to handle that variation in your EAV tables.
Again, this is more of a code problem than storage problem. If you decided to do JSON tables the overall picture to meet this requirement wouldn't change: your "attribute type classes" would simply store the meta data in a different way. That would probably look something like: UPDATE products SET attributes='{"weight":0.5,"unit":"lbs"}' WHERE id=X
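The "attribute type class" idea could be sketched roughly as follows, in Python with sqlite3. The class names, the form field names ("value", "unit"), the set of allowed units and the save_attribute helper are all hypothetical; only the attribute_values table and its columns follow the pseudo-SQL above. Each class turns raw form input into a stored value plus optional meta data and rejects bad input before anything is written.

```python
import json
import sqlite3

class TextAttribute:
    """Plain text value, no meta data."""
    def parse(self, form):
        value = str(form.get("value", "")).strip()
        if not value:
            raise ValueError("value must not be empty")
        return value, None

class WeightAttribute:
    """Positive number plus a unit, which is kept as meta data."""
    UNITS = {"lbs", "kg", "oz", "g"}  # assumed list of allowed units

    def parse(self, form):
        value = float(form["value"])
        if value <= 0:
            raise ValueError("weight must be positive")
        unit = form.get("unit")
        if unit not in self.UNITS:
            raise ValueError(f"unknown unit: {unit!r}")
        return str(value), {"unit": unit}

def save_attribute(conn, product_id, attribute_id, handler, form):
    """Validate the form input with the handler, then store value and meta data."""
    value, meta = handler.parse(form)
    conn.execute(
        "INSERT INTO attribute_values (product_id, attribute_id, value, meta_data) "
        "VALUES (?, ?, ?, ?)",
        (product_id, attribute_id, value, json.dumps(meta) if meta is not None else None),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute_values "
    "(product_id INTEGER, attribute_id INTEGER, value TEXT, meta_data TEXT)"
)
save_attribute(conn, 1, 7, WeightAttribute(), {"value": "0.5", "unit": "lbs"})
save_attribute(conn, 1, 8, TextAttribute(), {"value": "Blue truck"})
stored = conn.execute(
    "SELECT attribute_id, value, meta_data FROM attribute_values ORDER BY attribute_id"
).fetchall()
print(stored)
```

The same classes could later build the form fields as well, so storage, validation and rendering for one attribute type stay in one place.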
Input Validation
This will have to be handled exclusively by code regardless of how you store your data, so this requirement doesn't matter much in terms of deciding your database structure. A class-based system as described above will also be able to handle input validation, if properly executed.
Sort/Search/Filter
This doesn't matter if you are exclusively using your attributes for data storage/retrieval, but will you be searching on attributes at all? With a proper EAV system and good indexes, you can actually search/sort efficiently in an RDBMS system (although it can start to get painful if you search by more than a handful of indexes at a time). I haven't looked in detail, but I'm pretty sure that using JSON for storage won't scale well when it comes to searching. While MySQL can work with JSON now and search the columns directly, I seriously doubt that such searching/sorting makes use of MySQL indexes, which means that it won't work with large databases. I could be wrong on that one though. It would be worth digging into before committing to a MySQL/JSON storage setup, if you were going to do something like that.
Depending on your needs, this is also a good place to complement an RDBMS system with a NoSQL system. Having managed large-ish (~1.5 million product) e-commerce systems before, I have found that MySQL tends to fall flat in the searching/sorting category, especially if you are doing any kind of text searching. In an e-commerce system a query like "Show me the results that best match the term 'blue truck' and have the attribute 'For ages 3-5'" is common, but doing something like that in MySQL is all but impossible, primarily because of the need for relevancy-based sorting and scoring.
We solved this problem by using Apache Solr (Elastic is a similar solution) and it managed our searching/sorting/search term scoring very well. In this case it was a two-database solution. MySQL kept all the actual data and stored attributes in EAV tables, and anytime something got updated we pushed a record of everything to Apache Solr for additional storage. When a query came in from a user we would query Apache Solr, which was an expert at text searching and could also handle the attribute filtering with no trouble, and then we would pull the full product record out of our MySQL database. The system worked beautifully. We had 1.5 million products, thousands of custom attributes, and had no trouble running the whole thing off of a single virtual server. Obviously there was a lot of code going on behind the scenes, but the point is that it definitely worked and wasn't difficult to maintain. We never had any issues with performance from either MySQL or Solr.
Well, this is just one approach. You could simplify this if you don't need or want all of this.
You could, for example, use a JSON column in MySQL to store all of the extra attributes. Another idea: add a JSON column to the product type to store the custom attributes and their types, and use it to draw the form on the screen.
I would recommend working through an existing EAV database first, in order to understand how the structure is built and how its values are stored.
You can follow the Magento DB structure, which uses the EAV model.
EAV stands for Entity-Attribute-Value model. Let's have a close look at each part.
Entity: Data items are represented as entities; an entity can be a product, a customer or a category. In the database each entity has a record.
Attribute: Attributes belong to an entity; for example, a Customer entity has attributes like Name, Age, Address etc. In the Magento database all attributes are listed in a single table.
Value: Simply the values of the attributes, for example for the Name attribute the value will be “Rajat”.
EAV is used when you have many attributes for an entity and these attributes are dynamic (added/removed).
There is also a high possibility that many of these attributes will be empty or null most of the time.
In such a situation the EAV structure has many advantages, mainly more efficient MySQL storage.
In your case, categories can have attributes, products can have attributes, and so can customers etc.
Let's take categories as an example. These are the tables provided by Magento:
1. catalog_category_entity
2. catalog_category_entity_datetime
3. catalog_category_entity_decimal
4. catalog_category_entity_int
5. catalog_category_entity_text
6. catalog_category_entity_varchar
7. catalog_category_flat
See Magento Category Tables for more about these tables.
For attributes that are select boxes, you can store the dropdown values as option values.
See magento table structure for an overview of the Magento EAV structure; it gives a clear picture of how the EAV model works and how you can make the best use of it.
There are three approaches if you want to stick with a relational database.
The first is best if you know in advance the attributes for all the products. You choose one of the three ways to store polymorphic data in a relational model.
It's "clean" from a relational point of view - you're just using rows and columns, but each of the 3 options has its own benefits and drawbacks.
If you don't know your attributes at development time, I'd recommend against these solutions - they'd require significant additional tooling.
The next option is EAV. The benefits and drawbacks are well documented - but your focus on "validating input forms" is only one use case for the data, and I think you could easily find your data becomes "write only". Providing sorting/filtering, for instance, becomes really hard ("find all products with a height of at least 12, and sort by material_type" is almost impossible using the EAV model).
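To show what that example query takes in an EAV layout, here is a sketch with Python's sqlite3 module, using a hypothetical attribute_values table and invented sample data. The query can be written, but it needs one self-join per attribute involved, and because every value is stored as text the height has to be cast before it is compared (compared as text, '9' would sort after '12').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, sku TEXT);
-- Hypothetical EAV table; every value is stored as text.
CREATE TABLE attribute_values (product_id INTEGER, attribute TEXT, value TEXT);
INSERT INTO product VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO attribute_values VALUES
    (1, 'height', '15'), (1, 'material_type', 'wood'),
    (2, 'height', '9'),  (2, 'material_type', 'steel'),
    (3, 'height', '12'), (3, 'material_type', 'glass');
""")

# One join to filter on height, another to sort by material_type.
FILTER_SORT_SQL = """
SELECT p.sku, m.value AS material_type
FROM product p
JOIN attribute_values h
  ON h.product_id = p.id AND h.attribute = 'height'
LEFT JOIN attribute_values m
  ON m.product_id = p.id AND m.attribute = 'material_type'
WHERE CAST(h.value AS REAL) >= 12
ORDER BY m.value
"""

rows = conn.execute(FILTER_SORT_SQL).fetchall()
print(rows)
```

Each extra filter or sort key adds another join, and a plain index on the value column cannot serve the cast comparison, which is where this gets painful on large tables.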
The option I prefer is a combination of relational data for the core, invariant data, and document-centric (JSON/XML) for the variant data.
MySQL can query JSON natively - so you can sort/filter by the variant attributes. You'd have to create your own validation logic, though - perhaps by integrating JSON Schema in your data entry applications.
By using JSON Schema, you can introduce concepts that "belong together", and provide lookup values. For instance, if you have product weight, your schema might say weight always must have a unit of measure, with the valid options being kilogram, milligram, ounce, pound etc.
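As a rough illustration of the weight example, the schema below is written as a Python dict using standard JSON Schema keywords (type, required, properties, enum, exclusiveMinimum). The check function is a tiny hand-written checker that understands only those keywords; it is an assumption made to keep the sketch self-contained, and a real application would use a full JSON Schema validator library instead.

```python
# JSON Schema for a weight attribute, expressed as a Python dict.
WEIGHT_SCHEMA = {
    "type": "object",
    "required": ["value", "unit"],
    "properties": {
        "value": {"type": "number", "exclusiveMinimum": 0},
        "unit": {"enum": ["kilogram", "milligram", "ounce", "pound"]},
    },
}

def check(instance, schema):
    """Minimal checker for the few keywords used above; returns error messages."""
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return ["expected an object"]
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property '{key}'")
    for key, rule in schema.get("properties", {}).items():
        if key not in instance:
            continue
        value = instance[key]
        if rule.get("type") == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"'{key}' must be a number")
                continue
        if "exclusiveMinimum" in rule and not value > rule["exclusiveMinimum"]:
            errors.append(f"'{key}' must be greater than {rule['exclusiveMinimum']}")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"'{key}' must be one of {rule['enum']}")
    return errors

print(check({"value": 0.5, "unit": "pound"}, WEIGHT_SCHEMA))
print(check({"value": 0.5}, WEIGHT_SCHEMA))
print(check({"value": -1, "unit": "stone"}, WEIGHT_SCHEMA))
```

The same schema document can drive the data entry form: the enum list becomes the drop-down of units, and the required list marks the mandatory fields.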
If you have foreign key relationships in the variant data, you have a problem - for instance, "manufacturer" might link to a manufacturers table. You can either model this as an explicit column, or in the JSON and do without SQL's built-in foreign key tools like joins.
Currently, I am dealing with database structure and I would like to get a piece of advice.
I have 2 objects: banner and ad.
For them I may create banner table and ad table, which will hold all the info about each entity. As main advantage I see that everything related to 1 entity is in this entity table.
On the other hand, I could create a table like:
entity_properties.
It will hold value_id, entity_id, property and value. The main advantage is that the entity tables need only some basic fields; the other fields can be put in this table.
But I am not sure which is the better practice and performance?
Thanks in advance.
For the sake of normalization it is always better to have one table per entity. Normalization is an approach to minimizing redundancy and dependency in relational databases. In your case banner and ad are different entities. For now it seems that you could keep them in the same table, so redundancy is not the issue. However, what if you want to add some additional fields later?
In addition, code complexity and readability are another issue. For instance, when you put different types of objects in the same table, you need internal logic in your code to tell them apart. This means more complex and probably less readable code.
That depends on the exact use of your system and the attributes/values you're trying to store.
As I see it, I think it would be good to save the important and required information in one table, your 'ad' table, and the rest in the 'ad_entities' table, with an ad_id, entity_name, entity_value, or something similar for your application.
This is a good performance choice since you'll be able to get all the information about the current Ad or all Ads using just one quite simple query, which your objects can easily figure out.
I kind of get how the indexing in Magento works, but I haven't seen any good documentation on this. I would kind of like to know the following.
How it works
What is its purpose
Why is it important
What are the details everyone should know about it
Anything else that can help someone fully understand what indexing is and how it is used in Magento
I think having this information will be of great use for others in my boat that don't fully get the indexing process.
UPDATE:
After the comment on my question and Ankur's answer, I am thinking I am missing something in my knowledge of just normal Database Indexing. So is this just Magento's version of handling indexing and is it better for me to get my answer in terms of Database indexing in general, such as this link here How does database indexing work?
Magento's indexing is only similar to database-level indexing in spirit. As Anton states, it is a process of denormalization to allow faster operation of a site. Let me try to explain some of the thoughts behind the Magento database structure and why it makes indexing necessary to operate at speed.
In a more "typical" MySQL database, a table for storing catalog products would be structured something like this:
PRODUCT:
product_id INT
sku VARCHAR
name VARCHAR
size VARCHAR
longdesc VARCHAR
shortdesc VARCHAR
... etc ...
This is fast for retrieval, but it leaves a fundamental problem for a piece of eCommerce software: what do you do when you want to add more attributes? What if you sell toys, and rather than a size column, you need age_range? Well, you could add another column, but it should be clear that in a large store (think Walmart, for instance), this would result in rows that are 90% empty, and attempting to maintain new attributes would be nigh impossible.
To combat this problem, Magento splits tables into smaller units. I don't want to recreate the entire EAV system in this answer, so please accept this simplified model:
PRODUCT:
product_id INT
sku VARCHAR
PRODUCT_ATTRIBUTE_VALUES
product_id INT
attribute_id INT
value MISC
PRODUCT_ATTRIBUTES
attribute_id
name
Now it's possible to add attributes at will by entering new values into product_attributes and then putting adjoining records into product_attribute_values. This is basically what Magento does (with a little more respect for datatypes than I've displayed here). In fact, now there's no reason for two products to have identical fields at all, so we can create entire product types with different sets of attributes!
However, this flexibility comes at a cost. If I want to find the color of a shirt in my system (a trivial example), I need to find:
The product_id of the item (in the product table)
The attribute_id for color (in the attribute table)
Finally, the actual value (in the attribute_values table)
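The three lookups above can be sketched as a single query with two joins. This is only a minimal illustration, assuming Python's built-in sqlite3 module and the simplified table names from this answer; the sample rows and the `get_attribute` helper are made up and are not Magento's actual schema or API.

```python
import sqlite3


def build_eav_db():
    """Create an in-memory database using the simplified three-table model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, sku TEXT);
        CREATE TABLE product_attributes (attribute_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product_attribute_values (
            product_id INTEGER, attribute_id INTEGER, value TEXT);
        -- Hypothetical sample data
        INSERT INTO product VALUES (1, 'SHIRT-001');
        INSERT INTO product_attributes VALUES (10, 'color'), (11, 'size');
        INSERT INTO product_attribute_values VALUES (1, 10, 'blue'), (1, 11, 'M');
    """)
    return conn


def get_attribute(conn, sku, attribute_name):
    """Resolve product -> attribute -> value; returns None if there is no row."""
    row = conn.execute(
        """
        SELECT v.value
        FROM product p
        JOIN product_attribute_values v ON v.product_id = p.product_id
        JOIN product_attributes a ON a.attribute_id = v.attribute_id
        WHERE p.sku = ? AND a.name = ?
        """,
        (sku, attribute_name),
    ).fetchone()
    return row[0] if row else None


conn = build_eav_db()
print(get_attribute(conn, "SHIRT-001", "color"))
```

Every additional attribute you want in the same result row costs another join against the values table, which is where the slowdown comes from.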
Magento used to work like this, but it was dead slow. So, to allow better performance, they made a compromise: once the shop owner has defined the attributes they want, go ahead and generate the big table from the beginning. When something changes, nuke it from space and generate it over again. That way, data is stored primarily in our nice flexible format, but queried from a single table.
These resulting lookup tables are the Magento "indexes". When you re-index, you are blowing up the old table and generating it again.
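As a rough sketch of the "nuke it and generate it again" idea, the function below drops a flat table and rebuilds it with one column per defined attribute. It assumes Python's sqlite3 and the simplified tables from this answer; `product_flat`, `reindex` and the sample data are hypothetical names for illustration, not what Magento really calls them.

```python
import sqlite3


def build_eav_db():
    """In-memory copy of the simplified EAV tables, with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, sku TEXT);
        CREATE TABLE product_attributes (attribute_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product_attribute_values (
            product_id INTEGER, attribute_id INTEGER, value TEXT);
        INSERT INTO product VALUES (1, 'SHIRT-001'), (2, 'TOY-001');
        INSERT INTO product_attributes VALUES (10, 'color'), (11, 'size');
        INSERT INTO product_attribute_values VALUES (1, 10, 'blue'), (1, 11, 'M');
    """)
    return conn


def quote_identifier(name):
    """Quote an attribute name so it can be used safely as a column name."""
    return '"' + name.replace('"', '""') + '"'


def reindex(conn):
    """Drop the flat lookup table and regenerate it from the EAV tables."""
    attributes = conn.execute(
        "SELECT attribute_id, name FROM product_attributes ORDER BY attribute_id"
    ).fetchall()
    # One pivoted column per attribute: pick that attribute's value for each product.
    columns = "".join(
        ", MAX(CASE WHEN v.attribute_id = {} THEN v.value END) AS {}".format(
            int(attribute_id), quote_identifier(name)
        )
        for attribute_id, name in attributes
    )
    conn.execute("DROP TABLE IF EXISTS product_flat")
    conn.execute(
        "CREATE TABLE product_flat AS "
        "SELECT p.product_id, p.sku" + columns + " "
        "FROM product p "
        "LEFT JOIN product_attribute_values v ON v.product_id = p.product_id "
        "GROUP BY p.product_id, p.sku"
    )


conn = build_eav_db()
reindex(conn)
# Reads now hit one table and need no joins.
print(conn.execute("SELECT * FROM product_flat ORDER BY product_id").fetchall())
```

If an attribute is added or a value changes, the flat table is stale until `reindex` runs again, which is the trade-off described above.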
Hope that clarifies things a bit!
Thanks,
Joe
Magento indexing is not similar to normal database indexing; it is more like a database denormalization process (http://en.wikipedia.org/wiki/Denormalization). In most cases it takes the EAV structure and makes it available as a flat table structure, which is without doubt faster to access and search through.
If your normal EAV query would need 200 left joins to get all the products in the catalog along with their attribute data and layered navigation values, then after "indexing" this data is available through a denormalized data structure for faster querying and access.
Magento indexing is somewhat similar to normal database indexing, but the difference is that you need to run it manually in some cases.
When you run an index, for example the catalog index, it writes an entry for each catalog product into a separate table for each type of sorting. A small example is stores: suppose you have a product with different details for different stores. When you run the indexing, it fetches the records through the complex joins once and puts them in the separate table.
Another good example is layered navigation indexing: when you run it, it checks the product database for all the "shop by" filter attributes and, for every attribute, also stores how many products are available.
This type of indexing is mainly required if you make changes directly in the database or through your own custom code.
Please let me know if you have any other questions about indexing.
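To make the layered navigation case concrete, here is a small sketch that precomputes product counts per filter value into a separate table. It assumes Python's sqlite3; the table names, the `is_filterable` flag and the sample rows are all hypothetical and only illustrate the idea, they are not Magento's real index tables.

```python
import sqlite3


def build_db():
    """In-memory EAV tables with a made-up flag marking 'shop by' attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_attributes (
            attribute_id INTEGER PRIMARY KEY, name TEXT, is_filterable INTEGER);
        CREATE TABLE product_attribute_values (
            product_id INTEGER, attribute_id INTEGER, value TEXT);
        INSERT INTO product_attributes VALUES (10, 'color', 1), (11, 'longdesc', 0);
        INSERT INTO product_attribute_values VALUES
            (1, 10, 'blue'), (2, 10, 'blue'), (3, 10, 'red'),
            (1, 11, 'A long description');
    """)
    return conn


def reindex_layered_navigation(conn):
    """Rebuild the table of product counts per filterable attribute value."""
    conn.executescript("""
        DROP TABLE IF EXISTS layered_nav_index;
        CREATE TABLE layered_nav_index AS
        SELECT a.name AS attribute,
               v.value AS value,
               COUNT(DISTINCT v.product_id) AS product_count
        FROM product_attribute_values v
        JOIN product_attributes a ON a.attribute_id = v.attribute_id
        WHERE a.is_filterable = 1
        GROUP BY a.name, v.value;
    """)


def filter_counts(conn, attribute):
    """Read the precomputed counts for one attribute, e.g. for a sidebar."""
    return dict(conn.execute(
        "SELECT value, product_count FROM layered_nav_index WHERE attribute = ?",
        (attribute,),
    ))


conn = build_db()
reindex_layered_navigation(conn)
print(filter_counts(conn, "color"))
```

The sidebar then reads the counts from the small index table instead of aggregating over the value table on every page view.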
I am designing my database/domain for an eCommerce application and I'm having a hard time figuring out how to store products.
The website will sell a wide range of products: pens, thongs, tattoos, umbrellas, everything. Each of these products will share a few common attributes (height, width, length, weight, etc.), but some products have special data. For example, pens have different ink colors and tips/lids, and brochures can have different types of folds. So far I have thought up some 20+ extra attributes, but these attributes may only apply to 1% of products on the website.
So I am wondering if it is appropriate to implement an EAV model to handle the extra data. Keep in mind that when customers are viewing the site in the frontend, there will be a filtering sidebar like on eBay and carsales.com.au, so there will be a fair bit of querying.
I don't think it's practical to implement Class Table Inheritance, as the system needs to remain flexible: down the track we may have more attributes with new types of products.
The other thing I have considered is using a NoSQL database (probably MongoDB). However, I have little experience with these types of databases. Will it even solve my problem?
Review of options:
Single products entity with lots of columns
Separate attributes entity (EAV)
Switch to schema-less persistence
I'm in the process of building a prototype with an attributes entity to see how flexible it is, and testing the performance and how out of control the querying gets.
EDIT: I am, of course, open to any other solutions.
Great question, but of course, there is no "one true way". As per @BenV, Magento does use the EAV model. My experience with it has been overwhelmingly positive; however, it does trip up other users. Some considerations:
1. Performance.
EAV requires complex, multi-table joins to populate your object with the relevant attributes. That does incur a performance hit. However, that can be mitigated through careful caching (at all levels through the stack, including query caching) and the selective use of denormalization. Magento does allow administrators to select a denormalized model for categories and products where the number of SKUs warrants it (generally in the thousands). That in turn requires Observers that trigger re-indexing (always good!) and updates to the "flat" denormalized tables when product data changes. That can also be scheduled or manually triggered with a prompt to the administrator.
2. 3rd Party User Complexity
If you ever plan to make this application available to other users, many will find EAV too complex and you'll end up dealing with a lot of bleating and uninformed abuse on the user forums (ref Magento!!).
3. Future extensibility and plugin architecture.
There is no doubt that the EAV model really comes into its own when extensibility is a factor. It is very simple to add new attributes into the model while minimizing the risk of breaking existing ORM and controller code.
4. Changes in datatype
EAV does make it a little harder to alter attribute datatypes. If your initial design calls for a particular attribute datatype that changes in future (say int to varchar), it means that you will have to migrate all the records for that attribute to the corresponding table that matches the new datatype. Of course, purists would suggest that you get the design right first time, but reality does intrude sometimes!
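A minimal sketch of such a migration, assuming Python's sqlite3 and one value table per datatype. The table names (`attributes`, `attribute_values_int`, `attribute_values_varchar`), the `datatype` column and the sample rows are hypothetical, chosen only to show the three steps involved.

```python
import sqlite3


def build_db():
    """In-memory tables with one value table per datatype (made-up names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE attributes (
            attribute_id INTEGER PRIMARY KEY, name TEXT, datatype TEXT);
        CREATE TABLE attribute_values_int (
            product_id INTEGER, attribute_id INTEGER, value INTEGER);
        CREATE TABLE attribute_values_varchar (
            product_id INTEGER, attribute_id INTEGER, value TEXT);
        INSERT INTO attributes VALUES (20, 'size', 'int');
        INSERT INTO attribute_values_int VALUES (1, 20, 42), (2, 20, 44);
    """)
    return conn


def migrate_int_to_varchar(conn, attribute_id):
    """Move every value of one attribute from the int table to the varchar table."""
    with conn:  # one transaction: commit if all steps succeed, roll back otherwise
        # 1. Copy the values, converting them to text.
        conn.execute(
            "INSERT INTO attribute_values_varchar (product_id, attribute_id, value) "
            "SELECT product_id, attribute_id, CAST(value AS TEXT) "
            "FROM attribute_values_int WHERE attribute_id = ?",
            (attribute_id,),
        )
        # 2. Remove the old rows.
        conn.execute(
            "DELETE FROM attribute_values_int WHERE attribute_id = ?",
            (attribute_id,),
        )
        # 3. Record the new datatype so later reads look in the right table.
        conn.execute(
            "UPDATE attributes SET datatype = 'varchar' WHERE attribute_id = ?",
            (attribute_id,),
        )


conn = build_db()
migrate_int_to_varchar(conn, 20)
print(conn.execute("SELECT * FROM attribute_values_varchar").fetchall())
```

Doing all three steps in one transaction matters: if the copy succeeded but the datatype update did not, the attribute's values would be invisible to code that still looks in the old table.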
5. Manual product imports
One thing that EAV makes almost impossible is importing products (or other entities) into the database using SQL and/or phpMyAdmin-style CSV/XML. You'll need to write an Importer module that accepts the structured data and passes it through the application's Model layer to persist it to the database. That does add to your complexity.
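To give a feel for what such an importer involves, here is a small sketch that reads CSV rows and hands each one to a model class, which then spreads it across the EAV tables. It assumes Python's csv and sqlite3 modules; `ProductModel`, `import_products`, the table layout and the sample CSV are all hypothetical and far simpler than a real application's model layer.

```python
import csv
import io
import sqlite3


def create_schema():
    """In-memory EAV tables using simplified, made-up names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, sku TEXT);
        CREATE TABLE product_attributes (
            attribute_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE product_attribute_values (
            product_id INTEGER, attribute_id INTEGER, value TEXT);
    """)
    return conn


class ProductModel:
    """Hypothetical model layer that knows how to persist a product as EAV rows."""

    def __init__(self, conn):
        self.conn = conn

    def _attribute_id(self, name):
        """Look up an attribute by name, creating it if it does not exist yet."""
        row = self.conn.execute(
            "SELECT attribute_id FROM product_attributes WHERE name = ?", (name,)
        ).fetchone()
        if row:
            return row[0]
        cursor = self.conn.execute(
            "INSERT INTO product_attributes (name) VALUES (?)", (name,)
        )
        return cursor.lastrowid

    def save(self, sku, attributes):
        """Insert the product row, then one value row per non-empty attribute."""
        cursor = self.conn.execute("INSERT INTO product (sku) VALUES (?)", (sku,))
        product_id = cursor.lastrowid
        for name, value in attributes.items():
            if not value:
                continue  # sparse storage: no row for a missing attribute
            self.conn.execute(
                "INSERT INTO product_attribute_values VALUES (?, ?, ?)",
                (product_id, self._attribute_id(name), value),
            )
        return product_id


def import_products(conn, csv_text):
    """Parse CSV text and pass every row through the model; returns new ids."""
    model = ProductModel(conn)
    reader = csv.DictReader(io.StringIO(csv_text))
    with conn:  # all rows are imported in one transaction
        return [model.save(row.pop("sku"), row) for row in reader]


conn = create_schema()
import_products(conn, "sku,color,age_range\nSHIRT-001,blue,\nTOY-001,,3-5\n")
print(conn.execute("SELECT * FROM product_attribute_values").fetchall())
```

One flat CSV row turns into several inserts across three tables, which is why a plain SQL or phpMyAdmin import does not map onto this structure directly.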
The open source shopping cart Magento allows custom attributes for their products using an EAV design. You can check out their database schema here.
I would suggest taking a closer look at the Doctrine 2 ORM with its OXM plugin (https://github.com/doctrine/oxm). It will solve your problem with different attributes. Of course, you will need to build indexes for searchable custom attributes, but I don't think that will be a problem :)
If you don't care about number of community members, then you can use MongoDB as well.