While designing a MySQL database for a dating website, I have run into a doubt about how to store reference data. The database currently has 33 tables, and nearly 32 different fields need to reference a fixed set of values. We also have to consider that many of these values need to be translated.
After reading several opinions, I have almost ruled out using ENUM, like this:
CREATE TABLE profile (
`user_id` INT NOT NULL,
...
`relationship_status` ENUM('Single','Married') NOT NULL,
...
);
And normally I would be using a reference table like:
CREATE TABLE profile (
`user_id` INT NOT NULL,
...
`relationship_status_id` INT NOT NULL,
...
);
CREATE TABLE relationship_status (
`id` INT NOT NULL,
`name` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`)
);
But it might be overkill to create 32 tables, so I'm considering coding it in PHP like this:
class RelationshipStatusLookUp{
const SINGLE = 1;
const MARRIED = 2;
public static function getLabel($status){
if($status == self::SINGLE)
return 'Single';
if($status == self::MARRIED)
return 'Married';
return false;
}
}
What do you think? I guess it could improve query performance and also make development of the whole site easier.
Thanks.
Definitely a good idea to steer clear of ENUM, IMHO (see "why ENUM is evil"). Technically a lookup table would be the preferred solution, although for simple values a PHP class would work. You do need to be careful with this for the same reason as with ENUM: if the set of values grows, it could become difficult to maintain. (What about "co-habiting", "divorced", "civil partnership", "widowed", etc.?) It is also not trivial to query for lists of values using PHP classes; it's possible using reflection, but not as easy as a simple MySQL SELECT. This is probably one of those cases where I wouldn't worry about performance until it becomes a problem. Use the best solution for your code/application first, then optimise if you need to.
enum fields present some issues:
Once they're set, they can't easily be changed
`relationship_status` ENUM('Single','Married') NOT NULL,
would need 'Civil Partnership' adding in this country nowadays
You can't easily create a dropdown list of options from the enum lists
However, data in the database can be subjected to referential integrity constraints, so using a foreign key link against a reference table gives you that degree of validation without the constraints of an enum.
Maintaining the options in a class requires a code change for any new options that have to be added to the data, which may increase the work involved depending on your release procedures, and doesn't prevent bad data being inserted into the database.
Personally, I'd go for a reference table
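To make the reference-table option concrete, here is a minimal sketch of a lookup table with a foreign key, plus a separate label table for the translations the question mentions. The `relationship_status_label` table, the `code` and `locale` columns, and the 'en_GB' locale value are assumptions for illustration and are not part of the original schema. Foreign keys are only enforced on InnoDB tables.

```sql
-- Lookup table: one row per allowed status.
CREATE TABLE relationship_status (
  id   TINYINT UNSIGNED NOT NULL,
  code VARCHAR(45) NOT NULL,          -- language-neutral identifier, e.g. 'single'
  PRIMARY KEY (id),
  UNIQUE KEY uq_status_code (code)
) ENGINE=InnoDB;

-- Hypothetical translation table: one label per status and locale.
CREATE TABLE relationship_status_label (
  status_id TINYINT UNSIGNED NOT NULL,
  locale    CHAR(5) NOT NULL,         -- e.g. 'en_GB', 'es_ES'
  label     VARCHAR(100) NOT NULL,
  PRIMARY KEY (status_id, locale),
  FOREIGN KEY (status_id) REFERENCES relationship_status (id)
) ENGINE=InnoDB;

-- The profile row can only point at a status that exists.
CREATE TABLE profile (
  user_id                INT NOT NULL,
  relationship_status_id TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id),
  FOREIGN KEY (relationship_status_id) REFERENCES relationship_status (id)
) ENGINE=InnoDB;

-- Building a dropdown list is then a plain SELECT.
SELECT s.id, l.label
FROM relationship_status s
JOIN relationship_status_label l ON l.status_id = s.id
WHERE l.locale = 'en_GB'
ORDER BY l.label;
```

Adding 'Civil Partnership' in this layout is an INSERT into the two lookup tables rather than an ALTER TABLE or a code release.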
First off, you wouldn't need id and relationship_status_id in the Relationship_status table.
Personally, I would use an enum unless you need to associate more data than just the name of the person's relationship status (or if you foresee needing to expand on this in the future). It will be much easier when you're looking at the database to see what's what if it is in an easily readable language versus having to query against a second table.
When you are considering performance, sure it's faster to query a table by a unique ID but you have to track that relationship and you will always be joining multiple tables to get the same data. If the enum solution ends up being slower, I don't think it will be enough that the human brain will be able to perceive the difference even with large data sets.
I have a MySQL/PHP performance related question.
I need to store an index list associated with each record in a table. Each list contains 1000 indices. I need to be able to quickly access any index value in the list associated to a given record. I am not sure about the best way to go. I've thought of the following ways and would like your input on them:
Store the list in a string as a comma-separated value list or as JSON. Probably terrible performance, since I need to pull the whole list out of the DB into PHP only to retrieve a single value. Parsing the string won't exactly be fast either... I could keep a number of expanded lists in a Least Recently Used cache on the PHP side to reduce load.
Make a list table with 1001 columns that will store the list and its primary key. I'm not sure how costly this is regarding storage? This also feels like abusing the system. And then, what if I need to store 100000 indices?
Store in SQL only the name of the binary file containing my indices, and perform a fopen(); fseek(); fread(); fclose() cycle for each access? I'm not sure how the filesystem cache will react to that. If it goes badly, there are many solutions available to address the issues... but that sounds a bit like overkill, no?
What do you think of that?
What about a good old one-to-many relationship?
records
-------
id int
record ...
indices
-------
record_id int
index varchar
Then:
SELECT *
FROM records
LEFT JOIN indices
ON records.id = indices.record_id
WHERE indices.`index` = 'foo' -- INDEX is a reserved word in MySQL, so the column name must be quoted
The standard solution is to create another table with one row per (record, index) pair, and to add a MySQL index to allow fast searching:
CREATE TABLE IF NOT EXISTS `table_list` (
`IDrecord` int(11) NOT NULL,
`item` int(11) NOT NULL,
KEY `IDrecord` (`IDrecord`)
)
Change the item's type according to your needs - I used int in my example.
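Since the question asks for fast access to one particular position in a record's list, a composite primary key on (record, position) may fit better than an index on the record alone. The sketch below is an assumption-laden variant of the table above: the `record_index` table name and the `pos` column are hypothetical, and `pos` is assumed to be the 0-based position within the list.

```sql
-- One row per (record, position); the primary key doubles as the lookup index.
CREATE TABLE record_index (
  record_id INT NOT NULL,
  pos       INT UNSIGNED NOT NULL,   -- position in the list, e.g. 0..999
  item      INT NOT NULL,            -- the stored index value
  PRIMARY KEY (record_id, pos)
) ENGINE=InnoDB;

-- Fetch a single value without loading the other 999.
SELECT item
FROM record_index
WHERE record_id = 42 AND pos = 517;

-- Fetch the whole list for one record, in order.
SELECT pos, item
FROM record_index
WHERE record_id = 42
ORDER BY pos;
```

The same layout works unchanged if a list grows to 100000 entries; only the row count changes.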
The most logical solution would be to put each value in its own tuple. Adding a MySQL index on the lookup column will let the DBMS locate the value quickly, and should improve performance.
The reasons we're not going with your other answers are as follows:
Option 1
Storing multiple values in one MySQL cell violates first normal form (1NF), the first stage of database normalisation.
Option 3
This has heavy reliance on other files. You want to localize your data storage as much as possible, to make it easier to maintain in the future.
I just came across the idea of writing a special database which will fit for exactly one purpose. I have looked into several other database-systems and came to the conclusion that I need a custom type. However my question is not about if it is a good idea, but how to implement this best.
The application itself is written in php and needs to write to a custom database system.
Because there can be simultaneous read/write operations I can forget the idea of implementing the database directly into my application. (correct me please if I'm wrong).
That means I have to create 2 scripts:
The database-server-script
The application.
This means that the application has to communicate with the server. My idea was to use PHP in CLI mode for the database server. The question is whether this is effective, or whether I should look into a language like C++ to develop the server application. The second question is the communication. When using PHP in CLI mode, I thought about passing a serialized array query as a parameter. When using C++, should I still serialize it, or use JSON, or something else?
I should note that a database to search through can consist of several thousand entries, so I don't know whether PHP is really the right choice.
Secondly, I should note that queries aren't strings that have to be parsed, but arrays giving a key/value filter or dataset. Perhaps the only more complex thing the database server has to be able to do is compare strings like the MySQL LIKE '%VALUE%', which could be slow with several thousand entries.
Thanks for the Help.
writing a special database which will fit for exactly one purpose
I presume you mean a custom database management system,
I'm having a lot of trouble understanding why this would ever be necessary.
Databases and tables like usual databases have, but I don't have columns. Each entry can have its own columns, except for the id
That's not a very good reason for putting yourself (and your users) through a great deal of pain and effort.
i could use mysql id | serialized data... but then much fun searching over a specific parameter in a entry
So what's wrong with a fully polymorphic model implemented on top of a relational database:
CREATE TABLE relation (
id INTEGER NOT NULL auto_increment,
....
PRIMARY KEY (id)
);
CREATE TABLE col_string (
relation_id INTEGER NOT NULL, /* references relation.id */
name VARCHAR(20),
val_string VARCHAR(40),
PRIMARY KEY (relation_id, name)
);
CREATE TABLE col_integer (
relation_id INTEGER NOT NULL, /* references relation.id */
name VARCHAR(20),
val_integer INTEGER,
PRIMARY KEY (relation_id, name)
);
CREATE TABLE col_float (
relation_id INTEGER NOT NULL, /* references relation.id */
name VARCHAR(20),
val_float FLOAT,
PRIMARY KEY (relation_id, name)
);
... and tables for BLOBs, DATEs, etc
Or if scalability is not a big problem....
CREATE TABLE all_cols (
relation_id INTEGER NOT NULL, /* references relation.id */
name VARCHAR(20),
ctype ENUM('string','integer','float',...),
val_string VARCHAR(40),
val_integer INTEGER,
val_float FLOAT,
...
PRIMARY KEY (relation_id, name)
);
Yes, inserts and selecting 'rows' is more complicated than for a normal relational table - but a lot simpler than writing your own DBMS from scratch. And you can wrap most of the functionality in stored procedures. The method described would also map easily to a NoSQL db.
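As a rough illustration of what selecting a 'row' looks like in the per-type layout, the sketch below joins one typed table per attribute. The attribute names 'title' and 'stock' and the id 42 are made up for the example; only the table and column names come from the schema above.

```sql
-- Reassemble one logical row: each attribute is a LEFT JOIN on its typed table,
-- so a missing attribute simply comes back as NULL.
SELECT r.id,
       s.val_string  AS title,
       i.val_integer AS stock
FROM relation r
LEFT JOIN col_string  s ON s.relation_id = r.id AND s.name = 'title'
LEFT JOIN col_integer i ON i.relation_id = r.id AND i.name = 'stock'
WHERE r.id = 42;

-- Search on one parameter across all entries, the case that is awkward
-- with an "id | serialized data" table.
SELECT relation_id
FROM col_string
WHERE name = 'title'
  AND val_string LIKE '%VALUE%';
```

A LIKE pattern with a leading wildcard cannot use an ordinary index, so the second query scans the matching rows, but with several thousand entries that is unlikely to be a concern.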
I'm designing a website using PHP and MySQL currently and as the site proceeds I find myself adding more and more columns to the users table to store various variables.
Which got me thinking, is there a better way to store this information? Just to clarify, the information is global, can be affected by other users so cookies won't work, also I'd lose the information if they clear their cookies.
The second part of my question is, if it does turn out that storing it in a database is the best way, would it be less expensive to have a large number of columns or rather to combine related columns into delimited varchar columns and then explode them in PHP?
Thanks!
In my experience, I'd rather get the database right than start adding comma separated fields holding multiple items. Having to sift through multiple comma separated fields is only going to hurt your program's efficiency and the readability of your code.
Also, if your table is growing too much, then perhaps you need to look into splitting it into multiple tables joined by foreign keys?
I'd create a user_meta table, with three columns: user_id, key, value.
I wouldn't go for the option of grouping columns together and exploding them. It's untidy work and very unmanageable. Instead maybe try spreading those columns over a few tables and using InnoDb's transaction feature.
If you still dislike the idea of frequently updating the database, and if this method complies with what you're trying to achieve, you can use APC's caching function to store (cache) information "globally" on the server.
MongoDB (and its NoSQL cousins) are great for stuff like this.
The database is a perfectly fine place to store such data, as long as they're variables and not, say, huge image files. The database has all the optimizations and specifications for storing and retrieving large amounts of data. Anything you set up on file system level will always be beaten by what the database already has in terms of speed and functionality.
would it be less expensive to have a large number of columns or rather to combine related columns into delimited varchar columns and then explode them in PHP?
It's not so much a performance question as a maintenance question, IMO: it's not fun to manage hundreds of columns. Storing such data, perhaps as serialized objects, in a TEXT field is a viable option, as long as it's 100% certain you will never have to run queries on that data.
But why not use a normalized user_variables table like so:
id | user_id | variable_name | variable_value
?
It is a bit more complex to query, but provides for a very clean table structure all round. You can easily add arbitrary user variables that way.
If you are doing a lot of queries like SELECT * FROM users WHERE variable257 = 'green', you may have to stick to having specific columns.
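A minimal sketch of the user_variables layout, under some assumptions: the column sizes, the key names, and the 'favourite_colour' variable are invented for illustration. The unique key keeps one value per user and variable, and the secondary index is there so that searching by value stays possible.

```sql
CREATE TABLE user_variables (
  id             INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id        INT UNSIGNED NOT NULL,
  variable_name  VARCHAR(50)  NOT NULL,
  variable_value VARCHAR(100) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_user_variable (user_id, variable_name),
  KEY idx_name_value (variable_name, variable_value)
) ENGINE=InnoDB;

-- Set (or overwrite) one variable for one user.
INSERT INTO user_variables (user_id, variable_name, variable_value)
VALUES (7, 'favourite_colour', 'green')
ON DUPLICATE KEY UPDATE variable_value = 'green';

-- Load every variable for one user.
SELECT variable_name, variable_value
FROM user_variables
WHERE user_id = 7;

-- Find users by a variable's value.
SELECT user_id
FROM user_variables
WHERE variable_name = 'favourite_colour'
  AND variable_value = 'green';
```

Adding a new kind of variable needs no schema change, at the cost of storing every value as a string.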
The database is definitely the best place to store the data. (I'm assuming you were thinking of storing it in flat files otherwise) You'd definitely get better performance and security from using a DB over storing in files.
With regards to the storing your data in multiple columns or delimiting them... It's a personal choice but you should consider a few things
If you're going to delimit the items, you need to think about what you're going to delimit them with (something that's not likely to crop up within the text you're delimiting)
I often find that it helps to try and visualise whether another programmer of your level would be able to understand what you've done with little help.
Yes, as Pekka said, if you want to perform queries on the stored data you should stick with the separate columns
You may also get a slight performance boost from not retrieving and parsing ALL your data every time if you just want a couple of fields of information
I'd suggest going with the separate columns, as it offers you much greater flexibility in the future. And there's nothing worse than having to drastically change your data structure and migrate information down the track!
I would recommend setting up a memcached server (see http://memcached.org/). It has proven to be viable with lots of the big sites. PHP has two extensions that integrate a client into your runtime (see http://php.net/manual/en/book.memcached.php).
Give it a try, you won't regret it.
EDIT
Sure, this will only be an option for data that's frequently used and would otherwise have to be loaded from your database again and again. Keep in mind though that you will still have to save your data to some kind of persistent storage.
A document-oriented database might be what you need.
If you want to stick to a relational database, don't take the naïve approach of just creating a table with oh so many fields:
CREATE TABLE SomeEntity (
ENTITY_ID CHAR(10) NOT NULL,
PROPERTY_1 VARCHAR(50),
PROPERTY_2 VARCHAR(50),
PROPERTY_3 VARCHAR(50),
...
PROPERTY_915 VARCHAR(50),
PRIMARY KEY (ENTITY_ID)
);
Instead, define an Attribute table:
CREATE TABLE Attribute (
ATTRIBUTE_ID CHAR(10) NOT NULL,
DESCRIPTION VARCHAR(30),
/* optionally */
DEFAULT_VALUE /* whatever type you want */,
/* end_optionally */
PRIMARY KEY (ATTRIBUTE_ID)
);
Then define your SomeEntity table, which only includes the essential attributes (for example, required fields in a registration form):
CREATE TABLE SomeEntity (
ENTITY_ID CHAR(10) NOT NULL,
ESSENTIAL_1 VARCHAR(30),
ESSENTIAL_2 VARCHAR(30),
ESSENTIAL_3 VARCHAR(30),
PRIMARY KEY (ENTITY_ID)
);
And then define a table for those attributes that you might or might not want to store.
CREATE TABLE EntityAttribute (
ATTRIBUTE_ID CHAR(10) NOT NULL,
ENTITY_ID CHAR(10) NOT NULL,
ATTRIBUTE_VALUE /* the same type as Attribute.DEFAULT_VALUE;
if you didn't create that field, then any type */,
PRIMARY KEY (ATTRIBUTE_ID, ENTITY_ID)
);
Evidently, in your case, that SomeEntity is the user.
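Reading the effective value of every attribute for one entity could then look like the sketch below. It assumes the optional DEFAULT_VALUE column was created and has the same type as ATTRIBUTE_VALUE; the entity id 'USER000001' is a made-up example value.

```sql
-- One row per defined attribute: the entity's own value if it stored one,
-- otherwise the attribute's default.
SELECT a.ATTRIBUTE_ID,
       a.DESCRIPTION,
       COALESCE(ea.ATTRIBUTE_VALUE, a.DEFAULT_VALUE) AS EFFECTIVE_VALUE
FROM Attribute a
LEFT JOIN EntityAttribute ea
       ON ea.ATTRIBUTE_ID = a.ATTRIBUTE_ID
      AND ea.ENTITY_ID = 'USER000001'
ORDER BY a.ATTRIBUTE_ID;
```

Keeping the ENTITY_ID condition in the ON clause rather than in WHERE is what preserves attributes the entity has no row for.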
Instead of MySQL you might consider using a triplestore or a key-value store.
That way you get the benefits of having all the multithreading, multi-user, performance and caching voodoo figured out, without the trouble of trying to work out ahead of time what kind of values you really want to store.
Downsides: it's a bit more costly to figure out the average salary of all the people in idaho who also own hats.
It depends on what kind of user info you are storing. If it's session-pertinent data, use PHP sessions together with session event handlers to store your session data in a single data field in the DB.
I have a similar issue to the one described in "How to design a product table for many kinds of product where each product has many parameters".
I am now convinced I should use RDF, solely because of one of the comments made by Bill Karwin in the answer to the above issue,
but I already have a database in MySQL and the code is in PHP.
1) So what RDF database should I use?
2) Do I combine the approaches? Meaning I have class table inheritance in the MySQL database and just the weird product attributes in RDF? I don't think I should move everything to an RDF database, since it is only the products and the wide array of possible attributes and values that is giving me the problem.
3) What PHP resources and articles should I look at that will help me build this?
4) More articles or resources that help me better understand RDF in the context of the above challenge, building something that can hold all sorts of product attributes and values, will be greatly appreciated. I tend to work better when I have a conceptual understanding of what is going on.
Do bear in mind that I am a complete novice at this and my knowledge of programming and databases is average at best.
Ok, one of the main benefits of RDF is global identity, so if you use surrogate ids in your RDBMS schema, then you could assign the surrogate ids from a single sequence. This will make certain RDF extensions to your schema easier.
So in your case you would use a single sequence for the ids of products and other entities in your schema (maybe users etc.)
You should probably keep essential fields in normalized RDBMS tables, for example a products table with the fields which have a cardinality of 0-1.
The others you can keep in an additional table.
e.g. like this
create table product (
product_id int primary key,
-- core data here
)
create table product_meta (
product_id int,
property_id int,
value varchar(255)
)
create table properties (
property_id int primary key,
namespace varchar(255),
local_name varchar(255)
)
If you want also reference data in dynamic form you can add the following meta table :
create table product_meta_reference (
product_id int,
property_id int,
reference int
)
Here reference refers via the surrogate id to another entity in your schema.
And if you want to provide metadata for another table, let's say user, you can add the following table :
create table user_meta (
user_id int,
property_id int,
value varchar(255)
)
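To show how these tables stay "RDF mappable", the sketch below reads a product's dynamic properties back as subject / predicate / object rows. It assumes that `namespace` holds a URI prefix and `local_name` the property name, so that concatenating them gives a full property URI; the product id 1001 is a made-up example.

```sql
-- Each product_meta row reads as one triple:
-- subject = the product, predicate = the property, object = the value.
SELECT m.product_id                      AS subject,
       CONCAT(p.namespace, p.local_name) AS predicate,
       m.value                           AS object
FROM product_meta m
JOIN properties p ON p.property_id = m.property_id
WHERE m.product_id = 1001;

-- Find all products that carry a given property value.
SELECT m.product_id
FROM product_meta m
JOIN properties p ON p.property_id = m.property_id
WHERE p.local_name = 'colour'
  AND m.value = 'red';
```

The 'colour' property in the second query is hypothetical; any row in `properties` would be used the same way.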
I put this as a different answer, because it is a more specific suggestion than my first answer.
1 & 3) As you're using PHP and MySQL, your best bet would be either ARC 2 (although the website states this is a preview release, this is the release you want) or RAP, both of which allow for database-based storage, allowing you to store your data in MySQL.
ARC 2 seems to be more popular and well maintained in my experience
2) Depends how much of your existing code base would have to change if you move everything to RDF and what kinds of queries you do with the existing data in your database. Some things may be quicker using RDF and SPARQL while some may be slower.
I haven't used RDF with PHP, but in general if you use two persistence technologies in one project then the result is probably more complex and risky than using one alone.
If you stay with the RDBMS approach you can make it more RDF like by introducing the following attributes :
use a single sequence for all surrogate ids, this way you get unique identifiers, which is a requirement for RDF
use base tables for mandatory properties and extension tables with subject + property + values columns for additional data
You don't have to use an RDF engine to keep your data in RDF mappable form.
Compared to EAV, RDF is a more expressive paradigm for dynamic data modeling.
I'm working on a PHP app which requires various settings to be stored in a database. The client often asks if certain things can be added or changed/removed, which has been causing problems with the table design. Basically, I had a lot of boolean fields which simply indicated if various settings were enabled for a particular record.
In order to avoid messing around with the table any more, I'm considering storing the data as a serialized array. I have read that this is considered bad practice, but I think this is a justified case for using such an approach.
Is there any real reason to avoid doing this?
Any advice appreciated.
Thanks.
The real reason is normalisation: you will break first normal form by doing it.
However, there are many cases in which a breach of the normal forms could be considered. How many fields are you dealing with and are they all booleans?
Storing an array serialized as a string in your database will have the following disadvantages (among others):
When you need to update your settings you must first extract the current settings from the database, unserialize the array, change the array, serialize the array and update the data in the table.
When searching, you will not be able to just ask the database whether a given user (or a set of users) has a given setting disabled or enabled, thus you won't have any chances of searching.
Instead, you should really consider creating another table holding the records you need, in a one-to-many relation to your other table. That way you won't have 30 empty fields; instead you can just have a row for each option that deviates from the default (note that this option has some disadvantages as well, for example if you change the default).
In sum: I think you should avoid serializing arrays and putting them into the databases, at least if you care just a tiny bit about the aforementioned disadvantages.
The proper way (which isn't always the best way)
CREATE TABLE mytable (
myid INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
mytitle VARCHAR(100) NOT NULL
);
CREATE TABLE myarrayelements (
myarrayid INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
myid INT UNSIGNED NOT NULL,
mykey VARCHAR(100) NOT NULL,
myval VARCHAR(100) NOT NULL,
INDEX(myid)
);
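// Assumes $pdo is an open PDO connection (the mysql_* functions were removed in PHP 7).
$stmt = $pdo->prepare('SELECT mykey, myval FROM myarrayelements WHERE myid = ?');
$stmt->execute(array($myid)); // bound parameter instead of interpolating $myid into the SQL
// FETCH_KEY_PAIR maps the first column to array keys and the second to values.
$myarray = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);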
Although sometimes it's more convenient to store a comma separated list.
One thing is that extensibility is limited. The database should not be mixed with the programming environment. Also, changing values in the database and debugging are much easier. The database or the CGI layer can then be swapped for another database or another CGI language such as Perl.
One of the reasons to use a relational database is to help maintain data integrity. If you just have a serialized array dumped into a blob in a table there is no way for the database to do any checking that what you have in that blob makes any sense.
Any reason you can't store your settings in a configuration file on the server? For example, I save website or application settings in a config.php rather than a database.