We have a php/mysql system with about 5 core entities. We now need to add the ability for customers to create custom fields for some of these entities on a per project basis.
They would contain a label, key, type, default value, and possible allowed values.
This is so they could add a custom date field, or a custom dropdown to the UI and save this value against the specific entity.
What is the best approach for storing this kind of data in a mySQL database? I need to store both the config for the field, and then the current value for a specific entity.
I've had a look at various options here.. https://ayende.com/blog/3498/multi-tenancy-extensible-data-model
But this is not really at a tenancy level, more a project level.
I was thinking...
A CustomFields table to hold the configuration of a field against an entity type and project id.
A CustomFieldValues table to hold the value saved against the field - a row per field ( entity_id | field_id | field_value)
Then we create relationships between the entities and these custom values when retrieving the entities.
The issue with this is that there will be as many rows in the Values table as there are custom fields - so saving a entity will result in X extra rows. On top of that, these are versioned, so once a new version is created, there will be another X rows created for that new version.
Also, you can't index the fields on name, joins would become pretty complex i think as you have to join to the configuration and the values to build the key value pair to return against the entity, and how would you select based on a custom field name, when the filed name was actually a value?
I don't want to add dynamic columns to the table, as this will affect ALL the entites in the whole system - not just the ones in the current client / project.
The other option is to store the values in a JSON column.
This could be on the entity row itself customFields or similar. This would prevent the extra rows per field, but also has issues with lack of indexing etc, and still need to join to the config table. However, you could perform queries by the property name if the key=value was stored in the JSON... WHERE entity.customFields->"$.myCustomFieldName" > 1.
Storing the filed name in the json does mean you cannot change it once created, without a lot of pain.
If anyone has any advice on approaches for this, or articles to point me at that would be much appreciated - Im sure this has been solved many times before....
JSON records: No! A thousand times no! If you do that, just wait until somebody actually uses your system for a few tens of millions of records, then asks you to search on one of your extra fields. Your support people will curse your name.
Key-value store. Probably yes. There's a very widely deployed existence proof of this design: WordPress. It has a table called wp_postmeta, containing metadata fields applying to wp_posts (blog pages and posts). It's proven successful.
You will need to do some multiple joining to use this stuff. For example, to search on height and eye-color, you'd need
SELECT p.person_id, p.first, p.last, h.value height, e.value eye_color
FROM person p
LEFT JOIN attrib h ON p.person_id = h.person_id AND h.key='eye_color'
LEFT JOIN attrib e ON p.person_id = e.person_id AND e.key='height'
WHERE e.value='green' and CAST(h.value AS INT) < 160
As the CAST in that WHERE clause shows, you'll have some struggles with data type as well.
You'll need LEFT JOIN operations in this sort of attribute lookup; ordinary inner JOIN operations will suppress rows with missing attributes, and that might not work for you.
But, if you do a good job with indexes, you'll be able to get decent performance from this approach.
The table structure envisioned in my example doesn't have your table describing each additional field, but you know how to add that. It also doesn't have explicit support for multi-project / multitenant data separation. But you can add that as well.
Related
I'm currently working on an app backend (business directory). Main "actor" is an "Entry", which will have:
- main category
- subcategory
- tags (instead of unlimited sub-levels of division)
I'm pretty new to OOP but I still want to use it here. The database is MySql and I'll be using PDO.
In an attempt to figure out what database table structure should I use in order to support the above classification of entries, I was thinking about a solution that Wordpress uses - establish relationship between an entry and cats/subcats/tags through several tables (terms, taxonomies, relationships). What keeps me from this solution at the moment is the fact that each relationship of any kind is represented by a row in the relationships table. Given 50,000 entries I would have, attaching to a particular entry: main cat, subcat and up to 15 tags might slow down the app (or I am wrong)?
I then learned a bit about Table Data Gateway which seemed an excellent solution because I liked the idea of having one table per a class but then I read there is virtually no way of successful combating the impedence missmatch between the OOP and relational-mapping.
Are there any other approaches that you may see fit for this situation? I think I will be going with:
tblentry
tblcategory
tblsubcategory
tbltag
structure. Relationships would be based on the parent IDs but I+'m wondering is that enough? Can I be using foreign key and cascade delete options here (that is something I am not too familiar with and it seems to me as a more intuitive way of having relationships between the elements in tables)?
having a table where you store the relationship between your table is a good idea, and through indexes and careful thinking you can achieve very fast results.
since each entry must represent a different kind of link between two entities (subcategory to main entry, tag to subcategory) you need at least (and at the very most) three fields:
id1 (or the unique id of the first entity)
linkid (linking to a fourth table where each link is described)
id2 (or the unique id of the second entity)
those three fields can and should be indexed.
now the fourth table to achieve this kind of many-to-many relationship will describe the nature of the link. since many different type of relationship will exist in the table, you can't keep what the type is (child of, tag of, parent of) in the same table.
that fourth table (reference) could look like this:
id nature table1 table2
1 parent of entry tags
2 tag of tags entry
the table 1 field tells you which table the first id refers to, likewise with table2
the id is the number between the two fields in your relationship table. only the id field should be indexed. the nature field is more for the human reader then for joining tables or organizing data
So I'm a visual designer type guy who has learned a respectable amount of PHP and a little SQL.
I am putting together a personal multimedia portfolio site. I'm using CI and loving it. The problem is I don't know squat about DB design and I keep rewriting (and breaking) my tables. Here is what I need.
I have a table to store the projects:
I want to do fulltext searcheson titles and descriptions so I think this needs to be MyISAM
PROJECTS
id
name (admin-only human readable)
title (headline for visitors to read)
description
date (the date the project was finished)
posted (timestamp when the project was posted)
Then I need tags:
I think I've figured this out. from researching.
TAGS
tag_id
tag_name
PROJECT_TAGS
project_id (foreign key PROJECTS TABLE)
tag_id (foreign key TAGS TABLE)
Here is the problem I have FOUR media types; Photo Albums, Flash Apps, Print Pieces, and Website Designs. no project can be of two types because (with one exception) they all require different logic to be displayed in the view. I am not sure whether to put the media type in the project table and join directly to the types table or use an intermediate table to define the relationships like the tags. I also thinking about parent-types/sub-types i.e.; Blogs, Projects - Flash, Projects - Web. I would really appreciate some direction.
Also maybe some help on how to efficiently query for the projects with the given solution.
The first think to address is your database engine, MyISAM. The database engine is how MySQL stores the data. For more information regarding MyISAM you can view: http://dev.mysql.com/doc/refman/5.0/en/myisam-storage-engine.html. If you want to have referential integrity (which is recommended), you want your database engine to be InnoDB (http://dev.mysql.com/doc/refman/5.0/en/innodb-storage-engine.html). InnoDB allows you to create foreign keys and enforce that foreign key relationship (I found out the hard way the MyISAM does not). MyISAM is the default engine for MySQL databases. If you are using phpMyAdmin (which is a highly recommended tool for MySQL and PHP development), you can easily change the engine type of the database (See: http://www.electrictoolbox.com/mysql-change-table-storage-engine/).
With that said, searches or queries can be done in both MyISAM and InnoDB database engines. You can also index the columns to make search queries (SELECT statements) faster, but the trade off will be that INSERT statements will take longer. If you database is not huge (i.e. millions of records), you shouldn't see a noticeable difference though.
In terms of your design, there are several things to address. The first thing to understand is an entity relationship diagram or an ERD. This is a diagram of your tables and their corresponding relationships.
There are several types of relationships that can exist: a one-to-one relationship, a one-to-many relationship, a many-to-many relationship, and a hierarchical or recursive relationship . A many-to-many relationship is the most complicated and cannot be produced directly within the database and must be resolved with an intermittent table (I will explain further with an example).
A one-to-one relationship is straightforward. An example of this is if you have an employee table with a list of all employees and a salary table with a list of all salaries. One employee can only have one salary and one salary can only belong to one employee.
With that being said, another element to add to the mix is cardinality. Cardinality refers to whether or not the relationship may exist or must exist. In the previous example of an employee, there has to be a relationship between the salary and the employee (or else the employee may not be paid). This the relationship is read as, an employee must have one and only one salary and a salary may or may not have one and only one employee (as a salary can exist without belonging to an employee).
The phrases "one and only one" refers to it being a one-to-one relationship. The phrases "must" and "may or may not" referring to a relationship requiring to exist or not being required. This translates into the design as my foreign key of salary id in the employee table cannot be null and in the salary table there is no foreign key referencing the employee.
EMPLOYEE
id PRIMARY KEY
name VARCHAR(100)
salary_id NOT NULL UNIQUE
SALARY
id PRIMARY KEY
amount INTEGER NOT NULL
The one-to-many relationship is defined as the potential of having more than one. For example, relating to your portfolio, a client may have one or more projects. Thus the foreign key field in the projects table client_id cannot be unique as it may be repeated.
The many-to-many relationship is defined where more than one can both ways. For example, as you have correctly shown, projects may have one or more tags and tags may assigned to one or more projects. Thus, you need the PROJECT_TAGS table to resolve that many-to-many.
In regards to addressing your question directly, you will want to create a separate media type table and if any potential exists whatsoever where a project is can be associated to multiple types, you would want to have an intermittent table and could add a field to the project_media_type table called primary_type which would allow you to distinguish the project type as primarily that media type although it could fall under other categories if you were to filter by category.
This brings me to recursive relationships. Because you have the potential to have a recursive relationship or media_types you will want to add a field called parent_id. You would add a foreign key index to parent_id referencing the id of the media_type table. It must allow nulls as all of your top level parent media_types will have a null value for parent_id. Thus to select all parent media_types you could use:
SELECT * FROM media_type WHERE parent_id IS NULL
Then, to get the children you loop through each of the parents and could use the following query:
SELECT * FROM media_type WHERE parent_id = {$media_type_row->id}
This would need to be in a recursive function so you loop until there are no more children. An example of this using PHP related to hierarchical categories can be viewed at recursive function category database.
I hope this helps and know it's a lot but essentially, I tried to highlight a whole semester of database design and modeling. If you need any more information, I can attach an example ERD as well.
Another posibble idea is to add columns to projects table that would satisfy all media types needs and then while editting data you will use only certain columns needed for given media type.
That would be more database efficient (less joins).
If your media types are not very different in columns you need I would choose that aproach.
If they differ a lot, I would choose #cosmicsafari recommendation.
Why don't you take whats common to all and put that in a table & have the specific stuff in tables themelves, that way you can search through all the titles & descriptions in one.
Basic Table
- ID int
- Name varchar()
- Title varchar()
etc
Blogs
-ID int (just an auto_increment key)
-basicID int (this matches the id of the item in the basic table)
etc
Have one for each media type. That way you can do a search on all the descriptions & titles at the one time and load the appropriate data when the person clicked through the link from a search page. (I assume thats the sort of functionality you mean when you say you want to be able to let people search.)
I'm building a private social network with Yii that will have "comments" all over the site - in Profiles, Events pages, Group Threads, etc. When a user makes a post, they will be able to select the visibility of that content as:
Anyone
Registered Users Only
Friends Only
Custom (specific list of friends)
I'm trying to figure out how to model this for speed. I've considered using MySQL for writing the setting into a binary "is_secure" field in the Comments table - if it is true, then go to a table with three columns: comment_id, user_id, and group_id. Groups (group_id) would be for groups of users - Registered Users, Friends. Custom would make one row for each user that is selected (user_id).
This table will get huge (perhaps several dozen rows for each comment), so I'm wondering if using NoSQL is worth considering here for retrieval only, or if there's a better way to model this.
Thanks so much!
Similar question to database "flags". Search for related SO questions.
Instead of an IF true/false with the is_secure field, just add 1-bit fields for read_all (anyone), registered, friends, custom. Add another table which holds the custom list would have comment_id (from the previous table) and friend_id (multiple rows). That way, in a single query with a LEFT JOIN on custom_friends_list_for_comments you can determine whether or not to show the page to a user. Optionally, custom could be a comma separated list (char field) but size limits might be an issue. Assuming 3-letter friend ids with a comma, each 255 char field can have 64 friends.
what do you think would be performance-wise the better way to get the category-names of a news-system:
add an extra field for the cat-names inside a table, which allreade contains a field for the cat-ids
no extra field for the cat-names, but cat-ids and read in the cat-names (comma-seperated string: "cat1,cat2,cat3,cat4") into the php-file by an existing config-file and then build the cat-names with the help of the db-field "cat-ids" an array and a for-loop?
Thanx in advance,
Jayden
edit: cant seem to add a "hi" or "hallo" on top of the post, the editor just deletes it...
If you are measuring milliseconds and the disk IO of your system is not extremely slow, then option 2 would yield better performance. But, we are talking a negligible gain in execution time. Since you already will be querying the DB to get the news item it would be highly optimized to just get the category name at the same time. I would add a mapping table of category-name-id to category-names. And the join on that when getting news items.
From a flexibility standpoint and the standpoint of eliminating as many possible sources of error I would also go with my above idea. Since it adds flexibility to your system and keeps all your data in one spot. Changing the name of a category would require editing one column i the database instead of editing a php config file or, if option 1 was used, updating each and every news record.
So my best advise, add a table with category-name-id to category-names mappings and then have the news-items contain the id of the category they belong to.
For performance you could then cache the data you retrieve about existing categories and other data so you don't have to poll the DB for that information all the time.
For instance. You could, instead of joining at all, get all the categories from the category table I described above. Cache it in the application and only get it once the cache is invalidated. i.e. a timeout occurs or the data in the db is manipulated.
I think of two possible ways.
Have a category table, a articles table and a relationship table, and have a many-to-many relationship between categories and articles (as described in the relationship table).
If you feel smart today, declare each category as a binary number (0, 1, 2, 4, 8, 16 etc), and add them in a field on the articles table. If an article has a category value of 11, it has categories 1+2+8.
I like the first solution better, quite frankly.
I would create a categories table like this:
Categories
-----------
category_id name
-------------------------
1 Weather
2 Local
3 Sports
Then create a junction table, so each article can have 0 or more categories:
Article_Categories
-------------------
article_id category_id
-----------------------------
1 2
1 3
2 1
To get the articles with their categories (comma delimited) from MySQL server, you can use GROUP_CONCACT():
SELECT a.*, GROUP_CONCAT(c.name) AS cats
FROM Articles a
LEFT JOIN Article_Categories ac
ON ac.article_id = a.article_id
LEFT JOIN Categories c
ON c.category_id = ac.category_id
GROUP BY a.article_id
Add an additional table, that will save lots of issues in future for you. It is just the recommended way.
By the way, that idea of multiple id's in one field, don't try that way. It will give lots of code and issues which are totally unnecessary. If you really find performance issues you can always decide to take a step further and de-normalize or cache some of the data. There are lots of caching options available.
I think your first option is the suitable one. Because it make sense with the relationship with your data. And in a situation you want to display the category name with your news you can simply get everything by single select query with join.
So I recommend Option 1 You have mentioned.
And performance also can measure in two ways. Execution performance and development performance I feel both performance are in good position with your option 1. You don't need to do much just a one query. If you go for the option 2, then you have to load from config file, explode it with comma, then search using array elements which is time consuming.
I may be wrong, but since you already query the database, it's probably faster if you add a name field there..
Please also take into account that having the name in the same table as the ID provides consistency - if you have a config file you'll have to add a new category there plus in the table.
Also think of possible errors that may put wrong data into your config file - if this'd be the case your category names might get messed up..
So, not having come from a database design background, I've been tasked with designing a web app where the end user will be entering products, and specs for their products. Normally I think I would just create rows for each of the types of spec that they would be entering. Instead, they have a variety of products that don't share the same spec types, so my question is, what's the most efficient and future-proof way to organize this data? I was leaning towards pushing a serialized object into a generic "data" row, but then are you able to do full-text searches on this data? Any other avenues to explore?
split products and specifications into two tables like this:
products
id name
specifications
id name value product_id
get all the specifations of a product when you know the product id:
SELECT name,
value
FROM specifications
WHERE product_id = ?;
add a specification to a product when you know the product id, the specification's name and the value of said specification:
INSERT INTO specifications(
name,
value,
product_id
) VALUES(
?,
?,
?
);
so before you can add specifications to a product, this product must exist. also, you can't reuse specifications for several products. that would require a somewhat more complex solution :) namely...
three tables this time:
products
id name
specifications
id name value
products_specifications
product_id specification_id
get all the specifations of a product when you know the product id:
SELECT specifications.name,
specifications.value
FROM specifications
JOIN products_specifications
ON products_specifications.specification_id = specifications.id
WHERE products_specifications.product_id = ?;
now, adding a specification becomes a little bit more tricky, cause you have to check if that specification already exists. so this will be a little heavier than the first way of doing this, since there are more queries on the db, and there's more logic in the application.
first, find the id of the specification:
SELECT id
FROM specifications
WHERE name = ?
AND value = ?;
if no id is returned, this means that said specification doesn't exist, so it must be created:
INSERT INTO specifications(
name,
value
) VALUES(
?,
?
);
next, either use the id from the select query, or get the last insert id to find the id of the newly created specification. use that id together with the id of the product that's getting the new specification, and link the two together:
INSERT INTO products_specifications(
product_id,
specification_id
) VALUES(
?,
?
);
however, this means that you have to create one row for every specific specification. e.g. if you have size for shoes, there would be one row for every known shoe size
specifications
id name value
1 size 7
2 size 7½
3 size 8
and so on. i think this should be enough though.
You could take a look at using an EAV model.
I've never built a products database, but I can point you to a data model for that. It's one of over 200 models available for the taking, at Database Answers. Here is the model
If you don't like this one, you can find 15 different data models for Product oriented databases. Click on "Data Models" to get a list and scroll down to "Products".
You should pick up some good design ideas there.
This is a pretty common problem - and there are different solutions for different scenarios.
If the different types of product and their attributes are fixed and known at development time, you could look at the description in Craig Larman's book (http://www.amazon.com/Applying-UML-Patterns-Introduction-Object-Oriented/dp/0131489062/ref=sr_1_1/002-2801511-2159202?ie=UTF8&s=books&qid=1194351090&sr=1-1) - there's a section on object-relational mapping and how to handle inheritance.
This boils down to "put all the possible columns into one table", "create one table for each sub class" or "put all base class items into a common table, and put sub class data into their own tables".
This is by far the most natural way of working with a relational database - it allows you to create reports, use off-the-shelf tools for object relational mapping if that takes your fancy, and you can use standard concepts such as "not null", indexing etc.
Of course, if you don't know the data attributes at development time, you have to create a flexible database schema.
I've seen 3 general approaches.
The first is the one described by davogotland. I built a solution on similar lines for an ecommerce store; it worked great, and allowed us to be very flexible about the product database. It performed very well, even with half a million products.
Major drawbacks were creating retrieval queries - e.g. "find all products with a price under x, in category y, whose manufacturer is z". It was also tricky bringing in new developers - they had a fairly steep learning curve.
It also forced us to push a lot of relational concepts into the application layer. For instance, it was hard to create foreign keys to other tables (e.g. "manufacturer") and enforce them using standard SQL functionality.
The second approach I've seen is the one you mention - storing the variable data in some kind of serialized format. This is a pain when querying, and suffers from the same drawbacks with the relational model. Overall, I'd only want to use serialization for data you don't have to be able to query or reason about.
The final solution I've seen is to accept that the addition of new product types will always require some level of development effort - you have to build the UI, if nothing else. I've seen applications which use a scaffolding style approach to automatically generate the underlying database structures when a new product type is created.
This is a fairly major undertaking - only really suitable for major projects, though the use of ORM tools often helps.