I currently have 10 almost identical tables, 1 for each section of content in my app. There are no queries in my app that access more than 1 of these tables at a time.
Option 1
Create a single articles table having all fields the multiple tables have in common. Add a new section field indicating the old table name.
Reduce each of the 10 original tables to only the 4 or 5 fields that are unique to that section.
Add a foreign key pointing to the original record in the articles table.
Rename the 10 tables to start with the prefix "extra_".
Create 10 views that inner join the articles table to each extra_ table.
These views would have the same name as the original 10 tables.
Thoughts:
Since the resulting views would contain all the fields from the original there would be very little existing PHP/SQL code that needs updating.
If I can't INSERT, UPDATE, or DELETE against the views I'd need to make many changes to existing PHP code.
Adding to the articles table a new field and making it available in the views requires recreating 10 views.
Removing from the articles table a field which was used in the views requires recreating 10 views.
... other benefits or drawbacks?
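A minimal sketch of Option 1, using Python's built-in sqlite3 module as a stand-in for MySQL so it can be run as-is. The table `news` and the columns `byline` and `dateline` are hypothetical examples of one section and its unique fields; they are not taken from the real schema.

```python
import sqlite3

# Hypothetical schema: "news" stands in for one of the 10 section tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id      INTEGER PRIMARY KEY,
    section TEXT NOT NULL,          -- the old table name
    title   TEXT NOT NULL,
    body    TEXT NOT NULL
);
CREATE TABLE extra_news (
    article_id INTEGER PRIMARY KEY REFERENCES articles(id),
    byline     TEXT,                -- assumed section-specific field
    dateline   TEXT                 -- assumed section-specific field
);
-- The view reuses the original table name so existing SELECTs keep working.
CREATE VIEW news AS
SELECT a.id, a.title, a.body, x.byline, x.dateline
FROM articles a
INNER JOIN extra_news x ON x.article_id = a.id
WHERE a.section = 'news';

INSERT INTO articles VALUES (1, 'news',   'Storm warning', 'Heavy rain expected');
INSERT INTO articles VALUES (2, 'sports', 'Cup final',     'Kickoff at noon');
INSERT INTO extra_news VALUES (1, 'J. Doe', 'Boston');
""")

news_rows = conn.execute("SELECT id, title, byline FROM news").fetchall()
print(news_rows)
```

On the write side, as far as I know MySQL only lets an UPDATE or INSERT through a join view touch columns of one underlying table at a time, and does not support DELETE on such views, so the existing PHP write paths would probably need to target articles and the extra_ tables directly.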
Option 2
Keep the current 10 tables, create a view that UNION ALLs all the tables together.
Thoughts:
Changes to any of the 10 tables require recreating only 1 view. Zero changes needed to existing PHP code.
... other benefits or drawbacks?
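Option 2 could look roughly like this. It is a sketch run through Python's sqlite3 module in place of MySQL; the tables `news` and `sports`, their columns, and the view name `all_articles` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news   (id INTEGER PRIMARY KEY, title TEXT, body TEXT, byline TEXT);
CREATE TABLE sports (id INTEGER PRIMARY KEY, title TEXT, body TEXT, team TEXT);

-- Only the shared columns go into the view; a literal tags each row's origin.
CREATE VIEW all_articles AS
SELECT 'news'   AS section, id, title, body FROM news
UNION ALL
SELECT 'sports' AS section, id, title, body FROM sports;

INSERT INTO news   VALUES (1, 'Storm warning', 'Heavy rain expected', 'J. Doe');
INSERT INTO sports VALUES (1, 'Cup final', 'Rain delays kickoff', 'Rovers');
""")

# One query now searches every section at once.
hits = conn.execute(
    "SELECT section, id, title FROM all_articles WHERE body LIKE ? ORDER BY section",
    ("%rain%",),
).fetchall()
print(hits)
```

Because ids are only unique within each original table, the `section` literal is what lets a search result be traced back to its source row.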
Option 3
Do everything in PHP. Make an array listing all the tables and which fields are desired from them. Dynamically generate and search the big UNION SQL query with PHP.
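The idea behind Option 3, sketched in Python rather than PHP so it can be run against an in-memory SQLite database; the same shape should translate to PHP with PDO. The `SECTIONS` map, the helper `build_union_search`, and the table and column names are all hypothetical.

```python
import sqlite3

# Hypothetical map of table name -> shared columns to expose.
# These names are trusted configuration, never user input.
SECTIONS = {
    "news": ["id", "title", "body"],
    "sports": ["id", "title", "body"],
}

def build_union_search(sections, search_column="body"):
    """Build one UNION ALL query that runs the same LIKE search on every table."""
    parts = []
    for table, columns in sections.items():
        cols = ", ".join(columns)
        parts.append(
            f"SELECT '{table}' AS section, {cols} FROM {table} "
            f"WHERE {search_column} LIKE ?"
        )
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news   (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE sports (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
INSERT INTO news   VALUES (1, 'Storm warning', 'Heavy rain expected');
INSERT INTO sports VALUES (1, 'Cup final', 'Sunny skies for kickoff');
""")

sql = build_union_search(SECTIONS)
# One placeholder per table, so the search term is repeated once per SELECT.
params = ["%rain%"] * len(SECTIONS)
found = conn.execute(sql, params).fetchall()
print(found)
```

Only the search term is passed as a bound parameter. Table and column names cannot be bound, which is why the map has to stay under the application's control.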
Anyone have a better suggestion or advice on the options proposed?
Related
We have a php/mysql system with about 5 core entities. We now need to add the ability for customers to create custom fields for some of these entities on a per project basis.
They would contain a label, key, type, default value, and possible allowed values.
This is so they could add a custom date field, or a custom dropdown to the UI and save this value against the specific entity.
What is the best approach for storing this kind of data in a MySQL database? I need to store both the config for the field, and then the current value for a specific entity.
I've had a look at various options here.. https://ayende.com/blog/3498/multi-tenancy-extensible-data-model
But this is not really at a tenancy level, more a project level.
I was thinking...
A CustomFields table to hold the configuration of a field against an entity type and project id.
A CustomFieldValues table to hold the value saved against the field - a row per field ( entity_id | field_id | field_value)
Then we create relationships between the entities and these custom values when retrieving the entities.
The issue with this is that there will be as many rows in the Values table as there are custom fields, so saving an entity will result in X extra rows. On top of that, these are versioned, so once a new version is created, there will be another X rows created for that new version.
Also, you can't index the fields by name, and I think the joins would become pretty complex, as you have to join to both the configuration and the values to build the key-value pair to return against the entity. And how would you select based on a custom field name, when the field name is actually a value?
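To make the two-table idea concrete, here is a rough sketch run through Python's sqlite3 module instead of MySQL. The snake_case table names, the column names, and the sample `due_date` field are assumptions for illustration; the versioning mentioned above is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE custom_fields (
    id            INTEGER PRIMARY KEY,
    project_id    INTEGER NOT NULL,
    entity_type   TEXT NOT NULL,
    field_key     TEXT NOT NULL,
    label         TEXT NOT NULL,
    field_type    TEXT NOT NULL,
    default_value TEXT,
    UNIQUE (project_id, entity_type, field_key)
);
CREATE TABLE custom_field_values (
    entity_id   INTEGER NOT NULL,
    field_id    INTEGER NOT NULL REFERENCES custom_fields(id),
    field_value TEXT,
    PRIMARY KEY (entity_id, field_id)
);
INSERT INTO custom_fields VALUES (1, 7, 'task', 'due_date', 'Due date', 'date', NULL);
INSERT INTO custom_field_values VALUES (100, 1, '2024-05-01');
INSERT INTO custom_field_values VALUES (101, 1, '2024-09-15');
""")

# Selecting by field *name*: the key is resolved through the config table,
# and the values table only ever stores the numeric field_id.
due_early = conn.execute("""
    SELECT v.entity_id, v.field_value
    FROM custom_field_values v
    JOIN custom_fields f ON f.id = v.field_id
    WHERE f.project_id = ? AND f.entity_type = ? AND f.field_key = ?
      AND v.field_value < ?
    ORDER BY v.entity_id
""", (7, "task", "due_date", "2024-06-01")).fetchall()
print(due_early)
```

In this layout, renaming a field means updating a single row in the config table, since the stored values refer to it by id.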
I don't want to add dynamic columns to the table, as this will affect ALL the entities in the whole system, not just the ones in the current client / project.
The other option is to store the values in a JSON column.
This could be on the entity row itself, in a customFields column or similar. This would prevent the extra rows per field, but it also has issues with lack of indexing etc., and you still need to join to the config table. However, you could perform queries by the property name if the key=value was stored in the JSON... WHERE entity.customFields->"$.myCustomFieldName" > 1.
Storing the field name in the JSON does mean you cannot change it once created, without a lot of pain.
If anyone has any advice on approaches for this, or articles to point me at, that would be much appreciated. I'm sure this has been solved many times before....
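For comparison, a small sketch of the JSON-column variant, again using sqlite3 as a stand-in. It assumes the SQLite build includes the JSON functions (most current builds do); `json_extract(col, '$.key')` is the long form of the `->` lookup written above. The `priority` key is a made-up custom field.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, custom_fields TEXT)")
conn.executemany("INSERT INTO entity VALUES (?, ?)", [
    (1, json.dumps({"priority": 3})),
    (2, json.dumps({"priority": 1})),
    (3, json.dumps({})),              # entity without the custom field
])

# Filter on a property stored inside the JSON document.
# A missing key extracts as NULL, so entity 3 drops out of the comparison.
rows = conn.execute(
    "SELECT id FROM entity "
    "WHERE json_extract(custom_fields, '$.priority') > 1 ORDER BY id"
).fetchall()
print(rows)
```

One possible way around the rename problem would be to key the JSON by the config row's id instead of its name, at the cost of less readable queries.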
JSON records: No! A thousand times no! If you do that, just wait until somebody actually uses your system for a few tens of millions of records, then asks you to search on one of your extra fields. Your support people will curse your name.
Key-value store. Probably yes. There's a very widely deployed existence proof of this design: WordPress. It has a table called wp_postmeta, containing metadata fields applying to wp_posts (blog pages and posts). It's proven successful.
You will need to do some multiple joining to use this stuff. For example, to search on height and eye-color, you'd need
-- KEY is a reserved word in MySQL, hence the backticks.
-- MySQL's CAST takes SIGNED or UNSIGNED rather than INT.
SELECT p.person_id, p.first, p.last, h.value AS height, e.value AS eye_color
FROM person p
LEFT JOIN attrib h ON p.person_id = h.person_id AND h.`key` = 'height'
LEFT JOIN attrib e ON p.person_id = e.person_id AND e.`key` = 'eye_color'
WHERE e.value = 'green' AND CAST(h.value AS SIGNED) < 160
As the CAST in that WHERE clause shows, you'll have some struggles with data type as well.
You'll need LEFT JOIN operations in this sort of attribute lookup; ordinary inner JOIN operations will suppress rows with missing attributes, and that might not work for you.
But, if you do a good job with indexes, you'll be able to get decent performance from this approach.
The table structure envisioned in my example doesn't have your table describing each additional field, but you know how to add that. It also doesn't have explicit support for multi-project / multitenant data separation. But you can add that as well.
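Here is a runnable sketch of that attribute lookup, with SQLite (via Python's sqlite3) standing in for MySQL. The columns are named `attr_key` and `attr_value` here purely to sidestep reserved words, and the three sample people are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, first TEXT, last TEXT);
CREATE TABLE attrib (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    attr_key   TEXT NOT NULL,
    attr_value TEXT,
    PRIMARY KEY (person_id, attr_key)   -- doubles as the lookup index
);
INSERT INTO person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Poe');
INSERT INTO attrib VALUES
    (1, 'height', '155'), (1, 'eye_color', 'green'),
    (2, 'height', '180'), (2, 'eye_color', 'green'),
    (3, 'eye_color', 'green');          -- no height recorded for person 3
""")

# One join per attribute; each alias is pinned to a single key.
matches = conn.execute("""
    SELECT p.person_id, h.attr_value AS height, e.attr_value AS eye_color
    FROM person p
    LEFT JOIN attrib h ON h.person_id = p.person_id AND h.attr_key = 'height'
    LEFT JOIN attrib e ON e.person_id = p.person_id AND e.attr_key = 'eye_color'
    WHERE e.attr_value = 'green' AND CAST(h.attr_value AS INTEGER) < 160
""").fetchall()

# Without a filter on the attribute, the LEFT JOIN keeps people who lack it.
listing = conn.execute("""
    SELECT p.person_id, h.attr_value
    FROM person p
    LEFT JOIN attrib h ON h.person_id = p.person_id AND h.attr_key = 'height'
    ORDER BY p.person_id
""").fetchall()
print(matches, listing)
```

Note that a WHERE condition on a joined attribute filters out rows where that attribute is missing, so the LEFT JOIN only preserves such rows for attributes that are displayed but not filtered on.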
I need to search for a given string of text in every field of a very large (50+ columns) table.
there is one main table and many smaller ones that are connected to it, but have 'ids' in the main table, not actual searchable text values.
such as:
TABLE SALES LOCATIONS
id   location_name   customer_id   ... other fields
2    normalville     4
where customer id is the primary key in another table:
TABLE CUSTOMER
id   name   industry
4    EXC    Selling Things
is there any elegant way to accomplish this with SQL without using a whole slew of joins or subqueries specifically targeted at each of the 50+ fields?
there are probably 11 fields in the main table that are actually just ids pointing to other tables.
also of note, I am using the Yii framework.
Thanks!
I'm currently working on an app backend (business directory). Main "actor" is an "Entry", which will have:
- main category
- subcategory
- tags (instead of unlimited sub-levels of division)
I'm pretty new to OOP but I still want to use it here. The database is MySql and I'll be using PDO.
In an attempt to figure out what database table structure I should use to support the above classification of entries, I was thinking about the solution that WordPress uses: establish the relationship between an entry and cats/subcats/tags through several tables (terms, taxonomies, relationships). What keeps me from this solution at the moment is the fact that each relationship of any kind is represented by a row in the relationships table. Given the 50,000 entries I would have, attaching a main cat, a subcat and up to 15 tags to each entry might slow down the app (or am I wrong?).
I then learned a bit about Table Data Gateway, which seemed an excellent solution because I liked the idea of having one table per class, but then I read there is virtually no way of successfully combating the impedance mismatch between OOP and relational mapping.
Are there any other approaches that you may see fit for this situation? I think I will be going with:
tblentry
tblcategory
tblsubcategory
tbltag
structure. Relationships would be based on the parent IDs, but I'm wondering whether that is enough. Can I use foreign keys and cascade delete options here (that is something I am not too familiar with, and it seems to me a more intuitive way of having relationships between the elements in tables)?
having a table where you store the relationship between your table is a good idea, and through indexes and careful thinking you can achieve very fast results.
since each entry must represent a different kind of link between two entities (subcategory to main entry, tag to subcategory) you need exactly three fields:
id1 (or the unique id of the first entity)
linkid (linking to a fourth table where each link is described)
id2 (or the unique id of the second entity)
those three fields can and should be indexed.
now the fourth table in this kind of many-to-many relationship will describe the nature of the link. since many different types of relationship will exist in the relationship table, you can't keep what the type is (child of, tag of, parent of) in that same table.
that fourth table (reference) could look like this:
id   nature      table1   table2
1    parent of   entry    tags
2    tag of      tags     entry
the table1 field tells you which table the first id refers to, likewise with table2
the id is the number stored in linkid, the field that sits between the two id fields in your relationship table. in the reference table only the id field should be indexed. the nature field is more for the human reader than for joining tables or organizing data
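a rough sketch of the relationship table and its reference table, run through Python's sqlite3 module in place of MySQL. the sample entries, tags and the index name are made up; the three link fields and the reference columns follow the layout described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT);

-- Describes what each kind of link means and which tables it connects.
CREATE TABLE reference (
    id     INTEGER PRIMARY KEY,
    nature TEXT,
    table1 TEXT,
    table2 TEXT
);
-- One row per link: id1 lives in reference.table1, id2 in reference.table2.
CREATE TABLE relationship (
    id1    INTEGER NOT NULL,
    linkid INTEGER NOT NULL REFERENCES reference(id),
    id2    INTEGER NOT NULL
);
CREATE INDEX idx_relationship_lookup ON relationship (linkid, id2, id1);

INSERT INTO reference VALUES (1, 'parent of', 'entry', 'tags'),
                             (2, 'tag of',    'tags',  'entry');
INSERT INTO entry VALUES (10, 'Corner Diner'), (11, 'Book Nook');
INSERT INTO tags  VALUES (1, 'cafe'), (2, 'wifi'), (3, 'parking');
INSERT INTO relationship VALUES (1, 2, 10), (2, 2, 10), (3, 2, 11);
""")

# All tags of entry 10: link type 2 ("tag of") with the entry on the id2 side.
tag_names = [row[0] for row in conn.execute("""
    SELECT t.name
    FROM relationship r
    JOIN tags t ON t.id = r.id1
    WHERE r.linkid = 2 AND r.id2 = ?
    ORDER BY t.name
""", (10,))]
print(tag_names)
```

with 50,000 entries and around 17 links each, this table would hold under a million narrow rows, which an index like the one above should handle comfortably.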
what do you think would be performance-wise the better way to get the category-names of a news-system:
add an extra field for the cat-names inside a table, which already contains a field for the cat-ids
no extra field for the cat-names, but cat-ids, and read the cat-names (comma-separated string: "cat1,cat2,cat3,cat4") into the php-file from an existing config-file and then build the cat-names with the help of the db-field "cat-ids", an array and a for-loop?
Thanks in advance,
Jayden
edit: can't seem to add a "hi" or "hallo" on top of the post, the editor just deletes it...
If you are measuring milliseconds and the disk IO of your system is not extremely slow, then option 2 would yield better performance, but we are talking about a negligible gain in execution time. Since you will already be querying the DB to get the news item, it would be more efficient to just get the category name at the same time. I would add a mapping table from category ID to category name, and then join on that when getting news items.
From a flexibility standpoint, and to eliminate as many sources of error as possible, I would also go with the idea above, since it adds flexibility to your system and keeps all your data in one spot. Changing the name of a category would require editing one row in the database instead of editing a PHP config file or, if option 1 was used, updating each and every news record.
So my best advice: add a table that maps category IDs to category names, and have the news items contain the ID of the category they belong to.
For performance you could then cache the data you retrieve about existing categories and other data so you don't have to poll the DB for that information all the time.
For instance, instead of joining at all, you could get all the categories from the category table I described above, cache them in the application, and reload them only once the cache is invalidated, i.e. when a timeout occurs or the data in the DB is changed.
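A minimal sketch of that caching idea, written in Python with sqlite3 rather than PHP/MySQL so it can be run directly. The class `CategoryCache`, the `category` table layout and the 300-second timeout are all hypothetical choices.

```python
import sqlite3
import time

class CategoryCache:
    """Keeps the id -> name map in memory and reloads it when stale."""

    def __init__(self, conn, ttl_seconds=300.0, clock=time.monotonic):
        self._conn = conn
        self._ttl = ttl_seconds
        self._clock = clock
        self._names = None
        self._loaded_at = 0.0
        self.loads = 0              # number of times the DB was actually queried

    def get(self):
        now = self._clock()
        if self._names is None or now - self._loaded_at >= self._ttl:
            self._names = dict(self._conn.execute("SELECT id, name FROM category"))
            self._loaded_at = now
            self.loads += 1
        return self._names

    def invalidate(self):
        # Call this from whatever code path changes the category table.
        self._names = None

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO category VALUES (1, 'Weather'), (2, 'Local');
""")

cache = CategoryCache(conn)
first = cache.get()
second = cache.get()                # served from memory, no query
conn.execute("UPDATE category SET name = 'Regional' WHERE id = 2")
cache.invalidate()
third = cache.get()                 # reloaded after invalidation
print(first, third, cache.loads)
```

News items then only need to carry the category id, and the name is looked up in the cached map when rendering.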
I think of two possible ways.
Have a category table, an articles table and a relationship table, and have a many-to-many relationship between categories and articles (as described in the relationship table).
If you feel smart today, assign each category a power of two (1, 2, 4, 8, 16, etc.), and store their sum in a field on the articles table. If an article has a category value of 11, it has categories 1+2+8.
I like the first solution better, quite frankly.
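For completeness, this is roughly what the second (bit flag) approach involves, sketched in Python with sqlite3 as a stand-in for MySQL. The category names and their bit values are invented for the example.

```python
import sqlite3

# Hypothetical categories, each assigned one distinct bit.
CATEGORY_BITS = {"Weather": 1, "Local": 2, "Sports": 4, "Business": 8}

def encode(names):
    """Combine category names into a single integer mask."""
    mask = 0
    for name in names:
        mask |= CATEGORY_BITS[name]
    return mask

def decode(mask):
    """List the category names whose bit is set in the mask."""
    return [name for name, bit in CATEGORY_BITS.items() if mask & bit]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, categories INTEGER NOT NULL)")
conn.executemany("INSERT INTO articles VALUES (?, ?)", [(1, 11), (2, 4), (3, 6)])

# Finding every article in one category needs a bitwise test on each row.
sports_ids = [row[0] for row in conn.execute(
    "SELECT id FROM articles WHERE (categories & ?) != 0 ORDER BY id",
    (CATEGORY_BITS["Sports"],),
)]
print(decode(11), sports_ids)
```

A bitwise predicate like this generally cannot use an ordinary index on the column, and the number of categories is capped by the integer width, which are two more reasons to prefer the relationship table.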
I would create a categories table like this:
Categories
-----------
category_id name
-------------------------
1 Weather
2 Local
3 Sports
Then create a junction table, so each article can have 0 or more categories:
Article_Categories
-------------------
article_id category_id
-----------------------------
1 2
1 3
2 1
To get the articles with their categories (comma delimited) from the MySQL server, you can use GROUP_CONCAT():
SELECT a.*, GROUP_CONCAT(c.name) AS cats
FROM Articles a
LEFT JOIN Article_Categories ac
ON ac.article_id = a.article_id
LEFT JOIN Categories c
ON c.category_id = ac.category_id
GROUP BY a.article_id
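The query above can be tried end to end with the sample rows from the two tables. This sketch uses Python's sqlite3 module, whose GROUP_CONCAT behaves much like MySQL's for this purpose; the `Articles` columns and the third, uncategorised article are assumptions added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Articles   (article_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Categories (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Article_Categories (
    article_id  INTEGER NOT NULL REFERENCES Articles(article_id),
    category_id INTEGER NOT NULL REFERENCES Categories(category_id),
    PRIMARY KEY (article_id, category_id)
);
INSERT INTO Categories VALUES (1, 'Weather'), (2, 'Local'), (3, 'Sports');
INSERT INTO Articles VALUES (1, 'Derby day'), (2, 'Cold front'), (3, 'Untagged');
INSERT INTO Article_Categories VALUES (1, 2), (1, 3), (2, 1);
""")

rows = conn.execute("""
    SELECT a.*, GROUP_CONCAT(c.name) AS cats
    FROM Articles a
    LEFT JOIN Article_Categories ac ON ac.article_id = a.article_id
    LEFT JOIN Categories c ON c.category_id = ac.category_id
    GROUP BY a.article_id
""").fetchall()

# The order inside the concatenated string is not guaranteed, so sort it here.
# An article with no categories comes back with cats = NULL (None).
cats_by_article = {
    article_id: sorted(cats.split(",")) if cats else []
    for article_id, _title, cats in rows
}
print(cats_by_article)
```

The LEFT JOINs are what keep the uncategorised article in the result; with inner joins it would disappear.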
Add an additional table; that will save you lots of issues in the future. It is simply the recommended way.
By the way, don't go with that idea of multiple IDs in one field. It leads to a lot of code and issues that are totally unnecessary. If you really run into performance issues, you can always decide to take a step further and de-normalize or cache some of the data. There are lots of caching options available.
I think your first option is the suitable one, because it makes sense given the relationships in your data. And when you want to display the category name with your news, you can simply get everything with a single SELECT query with a join.
So I recommend the Option 1 you mentioned.
Performance can also be measured in two ways: execution performance and development performance. I feel both are in a good position with your option 1. You don't need to do much, just one query. If you go for option 2, you have to load from the config file, explode the string on commas, and then search through the array elements, which is time consuming.
I may be wrong, but since you already query the database, it's probably faster if you add a name field there.
Please also take into account that having the name in the same table as the ID provides consistency. If you have a config file, you'll have to add a new category there as well as in the table.
Also think of possible errors that may put wrong data into your config file. If that happened, your category names might get messed up.
Let's take the example from Yelp: http://www.yelp.com/boston
You can see that it's a website with several different categories, each category containing a listing of places. Should I include all the different places/listing in a single table, or let each category have its own tables?
EDIT: this means having tables 'places_restaurants' and 'places_nightlife', instead of just having the single table 'places' and every entry of every different category will be stored in one huge table... Will this affect performance?
One table per category will require that you CREATE a table every time there's a new category. I'd prefer CATEGORY and PLACE tables, with a one-to-many or many-to-many relationship between them.
You should keep everything in the same table and have a CategoryID column that maps each entry to its specific category. Your application should be built in a way that is inherently extensible, which creating a table for each new category definitely is not.
It depends. You could normalize the database so that all categories are in their own table, and only referred to from other tables by a foreign key. But there are some arguments that performance outweighs normalization, and so it may be beneficial to keep category names both in their own table of record, and also to include a category name column in other, frequently-joined tables.
If you took the second approach, you would need to ensure data integrity by implementing UPDATE and DELETE triggers such that whenever a category changes in the table of record (presumably, not often), that other tables containing copies of category names also get updated.
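A sketch of what such triggers might look like, using SQLite through Python's sqlite3 module. MySQL trigger syntax differs in the details (for example FOR EACH ROW is required), and the `category` and `place` tables and trigger names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE place (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    category_id   INTEGER REFERENCES category(id),
    category_name TEXT               -- denormalized copy, kept in sync below
);

-- Propagate a rename from the table of record to every copy.
CREATE TRIGGER category_rename AFTER UPDATE OF name ON category
BEGIN
    UPDATE place SET category_name = NEW.name WHERE category_id = NEW.id;
END;

-- Clear the copy (and the reference) when a category is removed.
CREATE TRIGGER category_delete AFTER DELETE ON category
BEGIN
    UPDATE place SET category_id = NULL, category_name = NULL
    WHERE category_id = OLD.id;
END;

INSERT INTO category VALUES (1, 'Restaurants');
INSERT INTO place VALUES (1, 'Corner Diner', 1, 'Restaurants');
""")

conn.execute("UPDATE category SET name = 'Dining' WHERE id = 1")
after_rename = conn.execute("SELECT category_name FROM place WHERE id = 1").fetchone()

conn.execute("DELETE FROM category WHERE id = 1")
after_delete = conn.execute(
    "SELECT category_id, category_name FROM place WHERE id = 1"
).fetchone()
print(after_rename, after_delete)
```

Every table that carries a copy of the name needs its own UPDATE inside these triggers, which is part of the maintenance cost of denormalizing.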
It still depends on the application. Also, the categories could be modelled as a many-to-many relationship with a main table, assuming of course that you have some unique columns in each table.