I'm working on an old web application my company uses to create surveys. I looked at the database schema through the mysql command prompt and thought the tables looked pretty solid. Though I'm not a DB guru I'm well versed in the theory behind it (having taken a few database design courses in my software engineering program).
That being said, I dumped the CREATE statements into an SQL file and imported them into MySQL Workbench, and saw that they make no use of any "actual" foreign keys. They'll store another table's primary key like you would with a FK, but they don't declare it as one.
So, seeing that their DB is designed the way I would have designed it from what I know (minus the FK issue), I'm left wondering whether there's a reason behind it. Is this a case of lazy programming, or could you get some performance gains by doing all the error checking programmatically?
In case you'd like an example: they basically have Surveys, and a survey has a series of Questions. A question is part of a survey, so it holds the survey's PK in a column. That's pretty much it, but they use the pattern everywhere.
I'd appreciate any insight :) (I understand that this question might not have a right/wrong answer, but I'm looking for information on why they would do this. The system has been pretty solid ever since we started using it, so I'm led to believe that these guys knew what they were doing.)
The original developers might have opted to use MyISAM or any other storage engine that does not support foreign key constraints.
MySQL only supports defining actual foreign key relationships on InnoDB tables; maybe yours are MyISAM, or something else?
More important is that the proper columns have indices defined on them (so the ones holding the PK of another table should be indexed). This is also possible in MyISAM.
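For instance, here's a quick way to check the engines and, if they turn out to be InnoDB, add the missing constraint. This is only a sketch; the table and column names (surveys, questions, survey_id) are guesses based on the example in the question:

```php
<?php
// A sketch; table/column names (surveys, questions, survey_id) are
// guesses based on the example in the question.
$db = new mysqli('localhost', 'user', 'password', 'survey_app');

// MyISAM silently ignores FOREIGN KEY clauses; InnoDB enforces them.
$result = $db->query("SHOW TABLE STATUS WHERE Name IN ('surveys', 'questions')");
while ($row = $result->fetch_assoc()) {
    echo $row['Name'] . ' uses ' . $row['Engine'] . "\n";
}

// On InnoDB, the relationship can be declared explicitly:
$db->query("ALTER TABLE questions
            ADD CONSTRAINT fk_questions_survey
            FOREIGN KEY (survey_id) REFERENCES surveys (id)");

// On MyISAM, at least index the referencing column:
$db->query("CREATE INDEX idx_questions_survey ON questions (survey_id)");
```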
As a general point: keys speed up reads (if they are applicable to the read taking place, they help the optimizer) and slow down writes (because they add overhead to the tables).
In the vast majority of cases the improvement of speed for reading and maintenance of referential integrity outweighs the minor overhead they add to writes.
This distinction has been blurred by caching, mirroring, etc., as so many reads on the very big sites don't actually hit the 'live' database; but this is not very relevant unless you are working for Amazon, Twitter or the like.
On very large databases (the type that Teradata supports) you find that they don't use foreign keys. The reason is performance: every time you write to the database, which happens often enough in a data warehouse, you have the added overhead of checking all the FKs on a table. If you already know the data to be valid, what's the point?
Good design on a small db would just mean you put them in, but there are performance gains to be had by leaving them out.
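As an aside, MySQL offers a middle ground here: keep the constraints declared, but skip the per-row checks for a bulk load you already trust. A minimal sketch (table and column names are made up):

```php
<?php
// Hypothetical warehouse load: the fact table and its columns are made up.
$db = new mysqli('localhost', 'user', 'password', 'warehouse');

$db->query("SET foreign_key_checks = 0");   // skip per-row FK validation
$db->query("INSERT INTO facts (id, dim_id, amount)
            VALUES (1, 10, 9.99), (2, 11, 4.50)");
$db->query("SET foreign_key_checks = 1");   // restore integrity checking
```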
You don't really have to use foreign keys.
If you don't have them, data might become inconsistent, and you won't be able to use cascading deletes and updates.
If you have them, you might lose some of your users' data due to a bug in your SQL statements that creeps in after a schema change.
Some prefer to have them, some prefer life without them. There are no real advantages in either case.
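To illustrate both the convenience and the risk, here is a sketch of a cascading delete (using the hypothetical surveys/questions tables from the question): one statement removes the survey and all of its questions, which is exactly how a buggy DELETE can silently take user data with it.

```php
<?php
// Hypothetical surveys/questions tables, as in the original question.
$db = new mysqli('localhost', 'user', 'password', 'survey_app');

$db->query("ALTER TABLE questions
            ADD CONSTRAINT fk_questions_survey
            FOREIGN KEY (survey_id) REFERENCES surveys (id)
            ON DELETE CASCADE ON UPDATE CASCADE");

// This now deletes the survey AND every question belonging to it:
$db->query("DELETE FROM surveys WHERE id = 42");
```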
Here is a real life instance where I'm not using a foreign key.
I needed a way to store a parent child relationship where the child may not exist, and the child is an abstract class. Since the child could be of a few types, I use one field to name the type of the child and one field to list the id of the child. The application handles most of the logic.
I'm not sure if this was the best design decision, but it was the best I could come up with under the deadline. It's been working well so far!
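A minimal sketch of how that lookup works in practice; all table, column and type names here are illustrative, not from the actual application:

```php
<?php
// Illustrative names only: a 'parent' table with child_type/child_id
// columns, and one concrete table per child type.
function loadChild(mysqli $db, $parentId) {
    $stmt = $db->prepare("SELECT child_type, child_id FROM parent WHERE id = ?");
    $stmt->bind_param('i', $parentId);
    $stmt->execute();
    $row = $stmt->get_result()->fetch_assoc();

    if (!$row || $row['child_id'] === null) {
        return null;  // the child may not exist
    }

    // The application, not the database, knows which table to look in:
    $tables = array('pdf' => 'pdf_documents', 'video' => 'video_clips');
    $table  = $tables[$row['child_type']];

    $stmt = $db->prepare("SELECT * FROM {$table} WHERE id = ?");
    $stmt->bind_param('i', $row['child_id']);
    $stmt->execute();
    return $stmt->get_result()->fetch_assoc();
}
```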
Let's say I have a MySQL DB, and all the tables in the DB are related to one another: primary keys, foreign keys, etc. are all set. Now, is it possible to predict, just from the database design, what queries will be used by the application? Since the database dictates the application's capabilities, can we therefore predict from the design what queries the application will use?
If it is possible, is there a strategy or automated way to generate the possible queries?
I have written a book on the subject of analyzing data using SQL and Excel, and have spent many years working with databases.
Yes, from a database structure, you can figure out how tables are going to be joined together. You are not going to figure out the harder -- and generally more business relevant -- things that users need. Here are some examples:
You can have a database where the primary table is telephone calls, with the associated information. From this database, you may need to know the maximum number of active calls at one time. Or you may need to know how many different people someone calls in a month (a sketch of this one appears after the list).
You can have a database of subscriber records. You may need to figure out the probability that someone will stop after a given amount of time.
You can have a database of products and purchases. You may need to figure out the most common combinations of three products that occur together.
You can have a database of credit card purchases. You may need to figure out who spends more than $200 in a restaurant more than 50 miles from their billing address.
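To make the contrast concrete, here is the "different people called in a month" question as a query. It is trivial to write once you know the question is worth asking, but nothing in the schema tells you to ask it. All names here (calls, caller, callee, call_time) are made up:

```php
<?php
// Hypothetical calls table with caller, callee and call_time columns.
$db = new mysqli('localhost', 'user', 'password', 'calls_db');

$result = $db->query(
    "SELECT caller, COUNT(DISTINCT callee) AS distinct_callees
     FROM calls
     WHERE call_time >= '2011-01-01' AND call_time < '2011-02-01'
     GROUP BY caller"
);
while ($row = $result->fetch_assoc()) {
    echo $row['caller'] . ' called ' . $row['distinct_callees'] . " people\n";
}
```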
The point is: a database does not represent "application capabilities". A database represents entities and the relationships between them, presumably in the real world. It is hubris to think that you can look at a database and know what the business questions are.
Instead, the purpose of a database is to support data, which in turn, supports applications. The needs of applications will change over time. The beauty of databases, as opposed to many other data storage technologies, is that the technology scales as the data increases, supports changes to the structure, and allows new entities and relationships to be added into the system, without completely rewriting it.
Over time, and with experience, you might develop intuition on what's important. Even if you do, you will be constantly surprised at the varied needs of your users.
I am sincerely not trying to be a smart-aleck here, but the answer is: yes and no.
Yes, because a 3NF design usually outlines the business rules behind it pretty well, so you can tell to a degree what the business logic is; you can create an object or graph model from it and get a good idea of what kinds of questions can be asked, based on the connections/relations and accessible properties.
No, because combinatorially you might have an intractable number of possible question combinations from a graph. Hence, you can't really tell what questions one might ask within a reasonable, non-exponential amount of time.
In general, if the design is good and the tables are meaningfully named, you can get a pretty good idea of what is going on.
Theoretically it's possible, but due to the combinatorial explosion (N rows by X columns by Z tables by W possible functions by Q possible values on each column/row) the number of candidate queries is astronomically large.
The issue here is that you need to take the data into account too. Some queries only make sense when particular data is present, and others don't. So you are essentially considering a massively large hypercube.
I work with multidimensional databases (denormalised cubes), which are essentially denormalised databases. Have a read up on OLAP theory and you'll see why.
So, in short: no, it's practically impossible.
Now, is it possible to predict, just from the database design, what queries will be used by the application?
You can, at least in principle, predict which queries can be answered efficiently. Which queries the applications will actually try to execute is another matter.
In an ideal world, the database model would take into account all the querying needs of all the applications, now and in the future. We don't live in that world yet ;)
If it is possible, is there a strategy or automated way to generate the possible queries?
No, that requires human understanding of what the model actually means. Unfortunately, there is no good way to teach a tool to have that level of understanding.
A good model will immediately make sense to a person experienced in database modeling and in the domain being modeled. Such a person will typically be able to predict a fair portion of the queries actually used, but rarely all of them, so documentation beside the database model itself is desirable. And of course, not all models are good...
I am trying to build a database in NoSQL for learning purposes.
It's a simple notice management (add/edit/delete notices from a notice board) application in PHP.
I have Memcached (Membase, actually) where I can store data as key-value pairs.
For adding a notice, I generate a unique id (using the uniqid() function) and store the notice details under it. But two problems remain:
1. How do I list all the notices?
2. I also want to assign a serial number to each notice. To do that, I need to know the serial number of the last inserted notice. How do I find the last inserted notice?
If you find this question inappropriate, because this is a somewhat relational data model (or, you might say, it should be implemented in a relational database), please let me know of a use-case scenario where I can use NoSQL, to learn more about it.
Natural connections between entities are relational, so almost every data model can be designed well using a relational schema, and almost every NoSQL schema could be represented in a relational data model.
You use NoSQL where the standard relational model is not comfortable (for example, when foreign components need to add their own data and you don't know it in advance) or where you need better performance and scaling; then you denormalize your data into a NoSQL schema.
MongoDB (http://www.mongodb.org/) is a good starting point for NoSQL because it allows you to mix a denormalized schema with an (almost) relational design.
A nice use case is implementing a data model for storing custom form data, where the number and types of fields aren't known upfront (see the sketch below).
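Something like this, say, using the legacy PHP Mongo driver; the collection and field names are made up:

```php
<?php
// Made-up collection and fields, using the legacy PHP Mongo driver.
$mongo = new MongoClient();
$forms = $mongo->selectDB('app')->selectCollection('form_submissions');

// Two submissions of "the same" form with completely different fields;
// no schema change is required:
$forms->insert(array('form' => 'contact', 'name' => 'Ann', 'email' => 'ann@example.com'));
$forms->insert(array('form' => 'survey', 'age' => 31, 'answers' => array(3, 5, 2)));

$doc = $forms->findOne(array('form' => 'contact'));
```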
And about your questions:
I don't know Membase well, but if it's a simple key-value store, the only solution is to create another key under which you store the list of all ids; concurrent updates are a big concern here, though.
You can also store the last insert id somewhere else (under another key); there, concurrent updates are easier to master. A sketch of both ideas follows.
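Roughly like this, with the PHP Memcached client (key names are made up; note that the read-modify-write on the index key is exactly the racy part):

```php
<?php
// Key names are made up. The counter increment is atomic on the server,
// but the read-modify-write on the index key is NOT safe under
// concurrent writers without extra care (cas tokens).
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Serial number via a counter key:
$id = $mc->increment('notice:last_id');
if ($id === false) {                     // counter doesn't exist yet
    $mc->set('notice:last_id', 1);
    $id = 1;
}

// Store the notice, then append its id to an index key for listing:
$mc->set("notice:$id", 'Meeting moved to 3pm');
$ids   = $mc->get('notice:ids') ?: array();
$ids[] = $id;
$mc->set('notice:ids', $ids);            // <-- the racy part

// 1. List all notices:
foreach ($mc->get('notice:ids') as $noticeId) {
    echo $mc->get("notice:$noticeId"), "\n";
}

// 2. Last inserted notice:
$last = $mc->get('notice:' . $mc->get('notice:last_id'));
```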
The first thing to learn about NoSQL is that there are a lot of NoSQL solutions out there, with different capabilities. You need to pick the one that is most appropriate. In this case Redis will make your life a lot easier with the design that you've chosen.
The heart of the issue is the CAP theorem. Many NoSQL solutions deliberately choose not to guarantee consistency. Once you have thrown that away, you can't guarantee that the same ID is not handed out twice. Therefore it makes sense to either use timestamps, or use something else (like Redis) to generate the unique ids, which you can then store wherever you want.
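For example, with phpredis (key names are made up; INCR is atomic on the server, so two clients can never receive the same id):

```php
<?php
// Key names are made up; INCR is atomic, so no two clients ever get
// the same id regardless of timing.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$id = $redis->incr('notice:next_id');   // unique, monotonically increasing
$redis->set("notice:$id", 'Server maintenance tonight');
$redis->rPush('notice:ids', $id);       // ordered index for listing

// Listing everything and finding the newest are now trivial:
$allIds = $redis->lRange('notice:ids', 0, -1);
$lastId = $redis->lIndex('notice:ids', -1);
```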
My question is very similar to this question but a bit more specific.
My application has multiple companies and multiple users per company. It makes the most sense to me (at this point) for each company to have a "private" set of tables. This makes security extremely simple as I don't have to worry about JOIN-ing up my structure tree to be sure I only get data for the specific company. I can also extend the mysqli database extension and have it put a prefix on the table names in the query so that I never have to worry about security while writing my queries.
One other major advantage that I can see is that if one of the companies needs a customization, I can modify their specific tables and not have to take into account everyone else. The way that my app is designed it is extremely modular and implementing custom code is very simple.
There are some disadvantages that I can see, but so far it seems the above advantages would outweigh them. The proposed system does sort of grate against my (possibly) hyper-normalized database schema preferences up to this point. Another obvious disadvantage is implementing schema alterations, but I can script them and be safe enough. One point I'm not sure about is performance: if I have MySQL working with so many tables, will I create bottlenecks for myself?
I look forward to your thoughts!
Your proposal sounds reasonable to me. I would suggest that instead of prefixing your tables with the company name, you store the tables for each company in a separate schema. That way you can have tables with the same name, reducing your problems in the code, and have each set of tables protected by a different username and password in a convenient manner. Backups and replication would then all be distinguishable at need.
Lookup tables could be stored in yet another schema to which all users have access.
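A sketch of what that setup looks like (all names are illustrative): each company gets its own schema and its own MySQL account, lookup tables live in a shared schema, and the application simply connects to the right schema so every unqualified table name is automatically company-private.

```php
<?php
// All names are illustrative.
$admin = new mysqli('localhost', 'root', 'password');

$admin->query("CREATE SCHEMA acme_corp");
$admin->query("CREATE USER 'acme'@'%' IDENTIFIED BY 'secret'");
$admin->query("GRANT ALL ON acme_corp.* TO 'acme'@'%'");

// Shared lookup tables live in a schema every company account can read:
$admin->query("GRANT SELECT ON shared_lookups.* TO 'acme'@'%'");

// The app connects straight to the company's schema, so every
// unqualified table name is automatically company-private:
$db = new mysqli('localhost', 'acme', 'secret', 'acme_corp');
$db->query("SELECT * FROM users");  // really acme_corp.users
```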
I searched Google for this without a good result. The only topic I found, in the CakePHP Trac, was closed without a "real" explanation. Since CakePHP is like one of the Rails ports for PHP, and Rails does support this, I would like to know why CakePHP doesn't support multi-column primary keys.
OK, but I would like to decide how my DB schema will look; in RoR you have the tool, and if you want to use it, you do so at your own risk.
BTW: I don't know whether Symfony allows it either.
Only the CakePHP team would know for sure. One of the team, Nate Abele, said this about multi-column primary keys back in February 2007:
I could come up with a million other reasons why multi-column primary keys are a dumb idea, but I think the most important one for 2007 is that it breaks REST architecture on the web, as there is no single point of reference to a piece of data, and that data may now change up on you without you knowing it, so objects can no longer be consistently referenced from a single URI.
I assume this would be his argument against multi-column foreign keys too.
Someone learning cake said it best:
I'm learning that, if something is ridiculously difficult in CakePHP, you've probably got design problems. -- asciimo
Can you achieve the same result by adding a condition with the 2nd column to the association?
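Something along these lines might work (an untested sketch in CakePHP 1.x style; the model, column and condition names are all made up):

```php
<?php
// Untested sketch, CakePHP 1.x style; model, column and condition
// names are all made up.
class OrderLine extends AppModel {
    var $belongsTo = array(
        'Order' => array(
            'className'  => 'Order',
            'foreignKey' => 'order_id',  // first column of the composite key
            // the second column is pushed into the association conditions:
            'conditions' => array('Order.region = OrderLine.region'),
        ),
    );
}
```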
No; the real reason to support multi-column primary keys is retrofitting CakePHP onto an existing application. This type of practice shouldn't be promoted, of course, because it is poor design, but if the choice were between using multi-column primary keys and redesigning a large chunk of an existing administration system, having the simple option would be a very nice feature.
Here's my scenario:
I've got a table of (let's call them) nodes. Primary key on each one is simply "node_id".
I've got a table maintaining a hierarchy of nodes, with only two columns: parent_node_id and child_node_id.
The hierarchy is maintained in a separate table because nodes can have an N:N relationship. That is to say, one node can have multiple children, and multiple parents.
If I start with a node and want to get all of its ancestors (i.e. everything higher up the hierarchy), I could either do several selects, or do it all in one stored procedure.
Anyone with any practical experience with this question know which one is likely to have the best performance? I've read things online that recommend both ways.
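For concreteness, here's roughly what the several-selects version would look like, climbing one level at a time from PHP (I've assumed the hierarchy table is called node_hierarchy; the columns are as described above):

```php
<?php
// Table name node_hierarchy is assumed; columns as described above.
function getAncestors(mysqli $db, $nodeId) {
    $ancestors = array();
    $frontier  = array((int) $nodeId);

    while ($frontier) {                  // one SELECT per level of the graph
        $in     = implode(',', $frontier);
        $result = $db->query(
            "SELECT DISTINCT parent_node_id
             FROM node_hierarchy
             WHERE child_node_id IN ($in)"
        );

        $frontier = array();
        while ($row = $result->fetch_assoc()) {
            $id = (int) $row['parent_node_id'];
            if (!isset($ancestors[$id])) {   // N:N: a node can be reached twice
                $ancestors[$id] = true;
                $frontier[]     = $id;
            }
        }
    }
    return array_keys($ancestors);
}
```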
"which one is likely to have the best performance? " : No one can know ! The only thing you can do is try both and MEASURE. That's sadly enough the main answer to all performance related questions... except in cases where you clearly have a O(n) difference between algorithms.
And, by the way, "multiple parents" does not make a hierarchy (otherwise I would recommend to read some books by Joe Celko) but a DAG (Direct Acyclic Graph) a much harder beast to tame...
If performance is your concern, then that schema design is not going to work as well for you as others could.
See More Trees & Hierarchies in SQL for more info.
I think a general statement could lead you into problems, because it depends on how your queries, and likewise the stored procedure, make use of the indexes.
To make a helpful assessment, it would be necessary to compare the SQL of your selects with that of the stored procedure.