Thoughts/Input about Database Design for a CMS

I'm just about to expand the functionality of our own CMS but was thinking of restructuring the database to make it simpler to add/edit data types and values.
Currently, the CMS is quite flat - the CMS requires a field in the database for every type of stored value (manually created).
The first option that comes to mind is simply a table which keeps the data types (e.g. Address 1, Suburb, Email Address etc.) and another table which holds values for each of these data types. Just like how WordPress keeps values in the 'options' table, PHP serialize would be used to store an array of values.
The second option is how Drupal works: the CMS creates tables for every data type. Compared with the WordPress approach this can be a bit of overkill, but it is really useful for SQL queries when ordering and grouping by a particular value.
What are everyone's thoughts?

In my opinion, you should avoid serialization where possible. Your relational database should be relational, and thus be structured as such. This would include the 'Drupal method', i.e. one table per data type. This also keeps your database healthy in the sense that it can be indexed and easily queried.
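To make the indexing and querying point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, column names and sample rows are made up for illustration. Because each value lives in its own typed column, it can be indexed, grouped and ordered directly in SQL:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table: one real column per stored value.
conn.execute("""
    CREATE TABLE contact (
        id     INTEGER PRIMARY KEY,
        suburb TEXT NOT NULL,
        email  TEXT NOT NULL
    )
""")
# An index is only possible because 'suburb' is a real column.
conn.execute("CREATE INDEX idx_contact_suburb ON contact (suburb)")

conn.executemany(
    "INSERT INTO contact (suburb, email) VALUES (?, ?)",
    [
        ("Newtown", "a@example.com"),
        ("Glebe", "b@example.com"),
        ("Newtown", "c@example.com"),
    ],
)

# Grouping and ordering happen inside the database.
rows_per_suburb = conn.execute(
    "SELECT suburb, COUNT(*) FROM contact GROUP BY suburb ORDER BY suburb"
).fetchall()
print(rows_per_suburb)
```

With a serialized array in a single text column, the same question would mean fetching every row and unserializing it in application code first.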

Unless you plan to have lots of different data types that will be added in the future which are unknown now, this is not really going to help you and would be overkill. If you have very wide tables and lots of holes in your data (i.e. lots of columns that seem to be NULL at random), then that pattern is screaming for a separate table for data that may only belong to certain entries.
Keep it simple and logical. Don't abstract for the sake of abstraction. Indeed, storing integers is cheaper with regards to storage space but unless that is a problem then don't do it in this case.

Related

Should I store product lists to users with arrays in MySQL?

Well, I have a big dilemma about how to store product lists when I need to link those lists to an ID or something like that.
For this first example I'm going to use the user cart.
I've always used this, even though I never liked it:
ID | PRODUCT
12 | Ring
12 | Necklace
12 | Bracelet
But lately I've been thinking about storing arrays in MySQL. While it sounds like a good idea (at first glance), by using arrays I'd only be able to manage the carts through PHP, by creating some kind of control panel or something.
Unfortunately, there is a downside. Although the whole system takes less space than the other way, I wouldn't be able to handle things through MySQL itself. Like, if someone makes an order, I wouldn't be able to SUM the prices*quantity to get the order value.
What is the best way? Is there another way?

As soon as you store unstructured information in MySQL you lose most of the benefits of using a relational database and you only have an over-engineered file system (albeit one with excellent multi-user capabilities). If you aren't familiar with SQL (which I suspect is the case) you'll initially think you're speeding up development. But one day, if the shop hopefully grows, you'll realise that retrieving everything for further PHP postprocessing doesn't scale well.
While it can certainly make sense to outsource certain features (not the complete app) to a NoSQL database if you have a high-concurrency site, if you use a relational database you'd better use it as it's meant to be used: with properly structured and normalised information.

If you are going to need these operations over elements:
searching
sorting
filtering elements
aggregate functions
I would recommend storing the elements in separate rows. This way you get full control over the elements in "your arrays".
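As a rough sketch of what separate rows buy you, here is the cart from the question with one row per product. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the price_cents and quantity columns and all the numbers are assumptions added for illustration, since the question only shows ID and PRODUCT:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# One row per product in a cart. price_cents and quantity are
# hypothetical columns; the question only lists ID and PRODUCT.
conn.execute("""
    CREATE TABLE cart_item (
        cart_id     INTEGER NOT NULL,
        product     TEXT    NOT NULL,
        price_cents INTEGER NOT NULL,
        quantity    INTEGER NOT NULL,
        PRIMARY KEY (cart_id, product)
    )
""")
conn.executemany(
    "INSERT INTO cart_item VALUES (?, ?, ?, ?)",
    [
        (12, "Ring", 5000, 1),
        (12, "Necklace", 7500, 2),
        (12, "Bracelet", 2000, 3),
        (13, "Ring", 5000, 1),
    ],
)

# The order value is computed by the database, not in application code.
order_total = conn.execute(
    "SELECT SUM(price_cents * quantity) FROM cart_item WHERE cart_id = ?",
    (12,),
).fetchone()[0]
print(order_total)
```

Searching, sorting and filtering work the same way: they become WHERE and ORDER BY clauses on cart_item instead of loops over an unserialized array.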

I wouldn't be able to handle things through MySQL itself. Like, if someone makes an order, I wouldn't be able to SUM the prices*quantity to get the order value.
It seems you understand why your 'arrays' idea isn't a good one :)
Consider that XML (or JSON or similar) was invented because distributed systems get disconnected. It is OK to store such 'complex' values in a database, even better if the DBMS has support for XML as a data type.
However, if you need to query the 'complex' value in the DBMS then you should be looking for first class support for such data types. For example, most SQL DBMSs have support for a timestamp/datetime temporal type: being able to declare a column of that type and compare two values of the type is one thing but what you really need are the temporal functions that will e.g. get time granules (years, days, etc), formats (weekday etc), support UTC and timezones, etc.
For your 'arrays' I'm guessing you will need to handle serialization and storage yourself, plus all that entails e.g. convert to a delimited string, escaping delimiting characters, and storing in a text column. Once you have done that work you will be left with no support on the DBMS side.

MySQL multiple tables or serialized data

I'm writing a job recruitment site for ex-military personnel in PHP/MySQL and I'm at a stage where I'm not sure which direction to take.
I have a "candidate" table with all the usual firstname, surname etc. and a unique candidate_id, and an address table linked by the unique candidate_id.
The client has asked for more data to be captured, such as driving licence type, religion, SIA Level (Security Industry Authority), languages spoken etc.
My question is, with all this different data, is it worth setting up dedicated tables for each? So for example having a driving licence table, with all the different types of driving licence, each with a unique id, then linked to the candidate table with a driving_licence_id cell?
Or should I just serialize all the extra data as text and put it in one cell in the candidate table?

My question is, with all this different data, is it worth setting up dedicated tables for each?
Yes.
That is what databases are for.
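A minimal sketch of the dedicated-table layout the question describes, using Python's built-in sqlite3 module as a stand-in for MySQL. The candidate and driving_licence names follow the question; the licence names and the sample candidates are invented for illustration:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
    CREATE TABLE driving_licence (
        driving_licence_id INTEGER PRIMARY KEY,
        name               TEXT NOT NULL UNIQUE
    );
    CREATE TABLE candidate (
        candidate_id       INTEGER PRIMARY KEY,
        firstname          TEXT NOT NULL,
        surname            TEXT NOT NULL,
        driving_licence_id INTEGER REFERENCES driving_licence (driving_licence_id)
    );
    -- Hypothetical sample data.
    INSERT INTO driving_licence VALUES (1, 'Car'), (2, 'HGV');
    INSERT INTO candidate VALUES
        (1, 'Ann', 'Smith', 2),
        (2, 'Bob', 'Jones', 1),
        (3, 'Cy',  'Young', 2);
""")

# "Which candidates hold an HGV licence?" is a plain join.
hgv_candidates = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.surname
        FROM candidate c
        JOIN driving_licence d ON d.driving_licence_id = c.driving_licence_id
        WHERE d.name = ?
        ORDER BY c.surname
        """,
        ("HGV",),
    )
]
print(hgv_candidates)
```

For attributes where a candidate can have several values, such as languages spoken, the usual variation is a linking table with one row per candidate and language rather than a single id column on candidate.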

Dedicated tables versus serialized data is called Database Normalization and Denormalization, respectively. In some cases both options are acceptable, but you should really make an educated choice, by reading up on the subject (for example here on about.com).
Personally I usually prefer working with normalized databases, as they are much easier to query for complex aggregated data. Besides, I feel they are easier to maintain as well, since you usually don't have to refactor when adding new fields and tables.
Finally, unless you have a lot of tables, you are unlikely to run into performance problems due to the number of one-to-one joins (the kind of data that's easy to denormalize).

It depends on whether you wish to query this data. If so, keep the data normalised (e.g. in its own logically separated table); otherwise, if it's just metadata to be pulled along for the ride, whatever is simplest seems reasonable.
Neither approach necessarily precludes the other in the future, simple migration scripts can be created to move the data from one format to the other. I would suggest doing what is simplest to enable you to work on other features of the site soonest.

You must always go for normalization, believe me.
I made the mistake of taking the easy way and storing data improperly (not only serialized, but also imploded strings of multidimensional arrays); then, when the time came, I had to redesign the whole thing and a lot of time was wasted.
I will never go the wrong way again; clients can say "no" today, but "report (queries)" tomorrow.

Optimal data storage - triple store / relational db / other?

I'm building a web-app with PHP on an Apache server.
The app contains a lot of optional data about persons. Depending on the category of the person (one person can be in many categories), they can choose to specify data or not: home-address (== 5 fields for street, city, country, ...), work-address (again 5 fields), age, telephone number, .... The app stores some additional data too, of course (created, last updated, username, password, userlevel, ...).
The current/outdated version of the app has 86 fields in the "users" table, and is (depending on the category of the person) extended with an additional table with another 23 fields (1-1 relationship).
All this is stored in a Postgresql database.
I'm wondering if this is the best way to handle this type of data. Most records have (a lot of) empty fields, making the db larger and the queries slower. Is it worth looking into another solution like a triple store, or am I worrying too much about it and should I just keep the current setup? It seems odd and feels awkward to just add fields to a table for every new purpose of the site. On the other hand, I have the impression that triple stores are not that common yet. Any pointers, or suggestions on how to approach this?
I've read "Programming the semantic web" by Toby Segaran and others, but from that book I get the impression that the main advantage of triple stores and RDF is the exchange of information over the web (which is not the goal of my app)

Most records have (a lot of) empty fields
This implies that your data is far from normalized.
The current/outdated version of the app has 86 fields in the "users" table, and is (depending on the category of the person) extended with an additional table with another 23 fields (1-1 relationship).
Indeed, yes, it's a very long way from being normalized.
If you've got a good reason to move away from where you are just now, then the first step would be to structure your data much better, even if you choose to move to a different type of DBMS, e.g. NoSQL or an object db.
This does not just save space in your DBMS, it makes retrieving the data faster and reduces the amount of code you need to write (e.g. you can re-use the same code for maintaining a home address as maintaining a work address if you have a single table for 'address' with a field flagging the type of address).
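Here is one way the single 'address' table with a type flag could look. This is only a sketch: it uses Python's built-in sqlite3 module instead of PostgreSQL, and the column names, the 'kind' flag values and the save_address helper are all hypothetical:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    -- One table for every address; 'kind' flags home vs. work.
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users (user_id),
        kind       TEXT NOT NULL CHECK (kind IN ('home', 'work')),
        street     TEXT,
        city       TEXT,
        country    TEXT,
        UNIQUE (user_id, kind)
    );
""")


def save_address(conn, user_id, kind, street, city, country):
    """Hypothetical helper: the same code stores home and work addresses."""
    conn.execute(
        "INSERT OR REPLACE INTO address (user_id, kind, street, city, country) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, kind, street, city, country),
    )


conn.execute("INSERT INTO users VALUES (1, 'alice')")
save_address(conn, 1, "home", "1 Main St", "Ghent", "BE")
save_address(conn, 1, "work", "2 Office Rd", "Brussels", "BE")

kinds = [
    row[0]
    for row in conn.execute(
        "SELECT kind FROM address WHERE user_id = ? ORDER BY kind", (1,)
    )
]
print(kinds)
```

A user who never fills in an address simply has no rows in address, instead of ten empty columns in users.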
There are lots of resources on the web (in addition to the Wikipedia link above) describing how to apply the rules of normalization (it starts getting a little involved after the first, second and third normal forms, but if you can master these then you're well equipped to take on most tasks).

MySQL many tables or few tables

I'm building a very large website. Currently it uses around 13 tables, and by the time it's done it should be about 20.
I came up with an idea to change the preferences table to use ID, Key, Value instead of many columns; however, I have recently thought I could also store other data inside the table.
Would it be efficient / smart to store almost everything in one table?
Edit: Here is some more information. I am building a social network that may end up with thousands of users. MySQL Cluster will be used when the site is launched. For now I am testing on a development VPS, but everything will be moved to a dedicated server before launch. I know barely anything about NDB so this should be fun :)

This model is called EAV (entity-attribute-value).
It is usable for some scenarios; however, it's less efficient due to larger records, a larger number of joins and the impossibility of creating composite indexes on multiple attributes.
Basically, it's used when entities have lots of attributes which are extremely sparse (rarely filled) and/or cannot be predicted at design time, like user tags, custom fields etc.
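To illustrate the extra joins, here is a small sketch of an ID, Key, Value preferences table. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column names and sample preferences are made up. Filtering on two attributes at once takes one self-join per attribute:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical EAV layout: one row per (user, key) pair.
    CREATE TABLE preference (
        user_id    INTEGER NOT NULL,
        pref_key   TEXT    NOT NULL,
        pref_value TEXT    NOT NULL,
        PRIMARY KEY (user_id, pref_key)
    );
    INSERT INTO preference VALUES
        (1, 'theme', 'dark'), (1, 'language', 'en'),
        (2, 'theme', 'dark'), (2, 'language', 'fr');
""")

# "Users with a dark theme AND English language" needs a self-join,
# one alias of the table per attribute being filtered on.
matching_users = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.user_id
        FROM preference t
        JOIN preference l ON l.user_id = t.user_id
        WHERE t.pref_key = 'theme'    AND t.pref_value = 'dark'
          AND l.pref_key = 'language' AND l.pref_value = 'en'
        ORDER BY t.user_id
        """
    )
]
print(matching_users)
```

With ordinary theme and language columns the same filter is a single WHERE clause on one table, and a composite index on (theme, language) could cover it.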

Granted, I don't know too much about large database designs, but from what I've seen, even extremely large applications store their data in a very small number of tables (20GB per table).
For me, I would rather have more info in one table, as it means that data is not littered everywhere and I don't have to perform operations on multiple tables. Though one table also means messy (usually, for me, each object would have its own table, and an object is something you have in your application logic, like a User class or a BlogPost class).
I guess what I'm trying to say is: do whatever makes sense. Don't put information about the same thing in two different tables, and don't put information about two things in one table. Stick with one table describing one kind of object (this is very difficult to explain, but if you do object-oriented programming, you should understand).

nope. preferences should be stored as-they-are (in users table)
for example private messages can't be stored in users table ...
you don't have to think about joining different tables ...

I would first say that 20 tables is not a lot.
In general (it's hard to say from the limited info you give) the key-value model is not as efficient speed wise, though it can be more efficient space wise.

I would definitely not do this. Basically, the reason being if you have a large set of data stored in a single table you will see performance issues pretty fast when constantly querying the same table. Then think about the joins and complexity of queries you're going to need (depending on your site)... not a task I would personally like to undertake.
Using multiple tables splits the data into smaller sets, so the resources required for each query are lower and, as an extra bonus, it's easier to program!
There are some applications for doing this but they are rare: more or less only when you have a large table with a ton of columns, most of which aren't going to have a value.
I hope this helps :-)

I think 20 tables in a project is not a lot. I do see your point and interest in using EAV but I don't think it's necessary. I would stick to tables in 3NF with proper FK relationships etc and you should be OK :)

the simple answer is that 20 tables won't make it a big DB and MySQL won't need any optimization for that. So focus on clean DB structures and normalization instead.

crunching serialized data vs adding more fields - php - mysql

okay, let's pretend i've got fifty pieces of information that i want to store in each record of a table. when i pull the data out, i'm going to be doing basic maths on some of them. on any given page request, i'm going to pull out a hundred records and do the calculations.
what are the performance impacts of:
A - storing the data as a serialized array in a single field and doing the crunching in php
vs
B - storing the data as fifty numeric fields and having mysql do some sums and avgs instead
please assume that normalization is not an issue in those fifty fields.
please also assume that i don't need to sort by any of these fields.
thanks in advance!

First, I would never store data serialized, it's just not portable enough. Perhaps in a JSON encoded field, but not serialized.
Second, if you're doing anything with the data (searching, aggregating, etc), make them columns in the table. And I do mean anything (sorting, etc).
The only time it's even acceptable to store formatted data (serialized, json, etc) in a column is if it's read only. Meaning that you're not sorting on it, you're not using it in a where clause, you're not aggregating the data, etc.
Database servers are very efficient at doing set-based operations. So if you're doing any kind of aggregation (summing, etc), do it in MySQL. It'll be significantly more efficient than you could make PHP be...
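For comparison, here is a rough sketch of both options from the question side by side. It uses Python's built-in sqlite3 module as a stand-in for MySQL and Python as a stand-in for PHP, with two hypothetical numeric fields instead of fifty. It only shows that the results match and where the work happens; it does not measure anything:

```python
import json
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Option B: one numeric column per value (two hypothetical fields here).
conn.execute("CREATE TABLE stats_columns (id INTEGER PRIMARY KEY, a REAL, b REAL)")
# Option A: everything packed into a single text column.
conn.execute("CREATE TABLE stats_blob (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
conn.executemany("INSERT INTO stats_columns (a, b) VALUES (?, ?)", rows)
conn.executemany(
    "INSERT INTO stats_blob (payload) VALUES (?)",
    [(json.dumps({"a": a, "b": b}),) for a, b in rows],
)

# Option B: the database does the sums and averages, one row comes back.
sql_result = conn.execute("SELECT SUM(a), AVG(b) FROM stats_columns").fetchone()

# Option A: every row is fetched and decoded before any maths can start.
decoded = [json.loads(payload) for (payload,) in conn.execute("SELECT payload FROM stats_blob")]
app_result = (
    sum(d["a"] for d in decoded),
    sum(d["b"] for d in decoded) / len(decoded),
)

print(sql_result, app_result)
```

In option A the amount of data shipped to the application grows with the number of records, while in option B only the aggregated row is returned.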

MySQL will almost certainly do these calculations faster than PHP.

While I would almost always recommend option B, I'm running into a unique situation myself where storing serialized into a text field might make more sense.
I have a client who has an application form on their website. There are around 50 fields on the form, and all the data will only ever be read only.
Moreover, this application may change over time. Fields may be added, fields may be removed. By using serialized data, I can save all the questions and answers in a serialized format. If the form changes, the old data stays intact, along with the original questions.
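A small sketch of that read-only snapshot idea, using Python's built-in sqlite3 and json modules as stand-ins for MySQL and PHP. The table layout, the date and the sample questions are invented, and JSON is used here instead of PHP serialize purely as an assumption for the example:

```python
import json
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table: fixed columns for what is queried, one text
# column holding the whole submitted form as a read-only snapshot.
conn.execute("""
    CREATE TABLE application (
        id           INTEGER PRIMARY KEY,
        submitted_at TEXT NOT NULL,
        answers_json TEXT NOT NULL
    )
""")

# Questions are stored next to the answers, so later changes to the
# form do not alter what an old submission looked like.
submission = [
    {"question": "Full name", "answer": "Ann Smith"},
    {"question": "Years of experience", "answer": "5"},
]
conn.execute(
    "INSERT INTO application (submitted_at, answers_json) VALUES (?, ?)",
    ("2024-01-01", json.dumps(submission)),
)

# Reading it back is a single fetch and decode; nothing is filtered,
# sorted or aggregated on the contents of the snapshot.
stored = json.loads(
    conn.execute("SELECT answers_json FROM application WHERE id = 1").fetchone()[0]
)
print(stored)
```

Anything that does need to be searched or sorted, such as the submission date here, still gets its own column.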

I go with Jonathan! If you have a table where the number of fields would vary depending on the options or contents the user makes, and those fields are neither aggregated nor calculated, I would serialize (and base64_encode) or json_encode the values too.
Joomla and WordPress do this too. TYPO3 has some tables with lots and lots of columns, and that is kind of ugly :-)
