I have a Magento shop (using a MySQL database) and just noticed that a developer introduced a custom database for capturing some structured data.
Now I have noticed that the tables are not linked to each other via foreign keys; instead a column was simply added, e.g. priceListID = 01124, which is the same ID as in the price list table. So linking the data together must happen within the code by firing separate SELECT statements, I assume.
Now I am wondering if this needs to be fixed soon, or if it is actually OK not to use foreign keys on the database level to link data together.
What are the downsides of doing this, and are there maybe some benefits (like flexibility)?
Hope you can help me with this! Thanks a lot!
There are a few advantages to keeping such constraints inside the database:
Performance. Most constraints, such as foreign keys, are better implemented inside the database, close to the data. Want to check data integrity with an additional SELECT? That means an extra round trip to the database, which takes time.
What if several applications work with your database? You would have to write the integrity-checking code in every one of them, which means extra development cost.
Synchronization. While you are checking data integrity with an additional SELECT, another user may delete that data at the same moment, and you will not know about it. These checks can be implemented properly, but that is still extra work you have to do.
To me, this all smells of bad, non-scalable design that can bring many problems. Data integrity is what databases are built for, and these kinds of verifications should stay inside the database.
From your description, I understand that the tables are indeed functionally related, as they share a common piece of information (priceListID in the new table relates to id in the original table). On the one hand, this set-up would still allow writing queries that join the tables together.
The downside of not creating a foreign key to represent that relationship, however, is that, from the database's perspective, the consistency of the relationship cannot be guaranteed. It is, for example, possible to create records in the new table whose priceListID does not exist in the original table. It would also be possible to delete records in the old table while related records exist in the new one, turning the children into orphans.
As a conclusion: by not using foreign keys, the developers rely solely on the application to maintain data integrity. There is no obvious benefit to forgoing the built-in features that the RDBMS offers to protect data consistency, and chances are the developers simply forgot that critical part of the table definition. I would suggest having a talk with them and urging them to create the missing foreign key (unless they can give a clear explanation of why they did not).
This should be as simple as:
ALTER TABLE newtable
ADD CONSTRAINT fk_new_to_original_fk
FOREIGN KEY (priceListID)
REFERENCES originaltable(id);
Please note that this requires all values in the referencing column to be available in the parent table.
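You can look for violating rows up front. A quick sketch, using the same hypothetical newtable/originaltable names as above:

SELECT n.priceListID
FROM newtable n
LEFT JOIN originaltable o ON o.id = n.priceListID
WHERE o.id IS NULL;

Any rows returned here are orphans that must be corrected (or removed) before the ALTER TABLE above will succeed.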
Related
I have this code $table->integer('card_id')->unsigned()->index(); in a table that I created using the Laravel framework. Just to make sure, what does the index() do?
It's the way to tell the Laravel migration to add an index to that column, in order to get faster results when searching through that particular column.
It's a common procedure in DB design when building tables: just index the particular columns you plan to search the table by.
I just realized you added the "indexing" tag to your question, and the description of that tag answers your question.
A little bit more explanation to the answer:
It means the database server will create, well, an 'index' on that column. It makes queries on that column faster - so usually you'd use it on your primary key, for instance. But maybe you find out you're looking up users by their email address a lot, so you might add an index to that too.
There is a small performance hit for the database server maintaining the index (it has to update the index whenever you write a record to the db) - so you usually add indexes only where needed.
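For illustration, adding such an index by hand in MySQL might look like this (a sketch with an assumed users table and email column; this is roughly what Laravel's ->index() call generates for you):

-- create the index so lookups by email no longer scan the whole table
ALTER TABLE users ADD INDEX users_email_index (email);

-- this query can now use users_email_index instead of a full table scan
SELECT id, name FROM users WHERE email = 'someone@example.com';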
I am currently maintaining a rather large office web application. I recently became aware, via the various developer tools within web browsers, that the values of select boxes can easily be modified by a user (among other things). On the server side I validate whether the posted data is numerical or not (for drop-downs), but I don't actually check if the value exists in a database table. For example, I have a dropdown box for salutation ('mr', 'ms', 'mrs', 'mr/ms') etc., whose options correspond with numerical values.
Currently I use MySQL's MyISAM tables, which don't offer foreign key referential integrity, so I am thinking about moving to InnoDB, yet this poses the following issue:
If I want to apply referential integrity (to ensure valid IDs are inserted), it would mean I'd have to index all the columns involved in the integrity checks, even ones that don't need an index for performance reasons at all (e.g. a salutation dropdown). If a very large client table has, say, 10 similar dropdowns (e.g. client group, no. of employees, country/region, etc.), it would seem overkill to index every linked table.
My questions:
1) when using referential integrity, do columns really need to be indexed also?
2) are there other practical solutions I may be overlooking? (e.g. use a separate query for every dropdown-list to see if the value exists in a table?)
3) How do other web-applications deal with such issues?
Help Appreciated!
thanks
Patrick
You only have to index the fields used in the foreign key relationships, and recent versions of MySQL do this automatically for you anyway. It's not "overkill"; it's actually an optimization.
Consider that any time you update/delete/insert a record, the foreign tables have to be checked for matching records - without the indexes, those checks could be glacially slow.
InnoDB automatically creates an index when you define a foreign key. If an index on that column already exists, InnoDB uses it instead of creating a new index.
As @MarcB mentioned in his answer, InnoDB uses these indexes to make referential integrity checks more efficient during some types of data changes. These changes include updating or deleting values in the parent table, and cascading operations.
You could use the ENUM data type to restrict a column to a fixed set of values. But ENUM has some disadvantages too.
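As a minimal sketch of the ENUM approach, using the salutation example from the question (the table and column names are assumptions):

CREATE TABLE clients (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  -- only the listed values are accepted; in strict SQL mode anything else is rejected
  salutation ENUM('mr', 'ms', 'mrs', 'mr/ms') NOT NULL
);

One of those disadvantages: changing the set of allowed values later requires an ALTER TABLE, which is a reason a lookup table plus foreign key is often preferred.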
Some web developers eschew foreign keys. To provide the same data integrity assurances, they have to write application code for every such case. So if you like to write and test lots of repetitive code, unnecessarily duplicating features the RDBMS already provides more efficiently, then go ahead! :-)
Most developers who don't use foreign keys don't write those extra checks either. They just don't have data integrity enforcement. I.e. they have sacrificed quality.
PS: I do recommend switching to InnoDB, and referential integrity is just one of the reasons to do so. Basically, if you want a database that supports ACID, InnoDB supports all aspects of that and MyISAM supports none.
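The conversion itself is a one-liner per table (your_table is a placeholder; rebuilding a large table can take a while, so back up first):

ALTER TABLE your_table ENGINE=InnoDB;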
I want to implement a filter-function in my PHP project.
To implement a filter, I usually just add a WHERE clause in my query to show filtered results.
My Problem is:
These filters require not just a simple added WHERE clause, but a huge query including multiple JOINs. The resulting query is more than 30 lines long.
Later there should also be a search function, which would then also require this huge query.
I wonder if this is good practice, or if I should instead add a "redundant" column to my database table, in which I compute the attribute I need for filtering on every update.
With this column I wouldn't have my huge query in different places all over my project, but I would have a redundant column.
What do you think?
Greetings
As requested, here is the table structure/code. This is not the exact code, because there is also a revision system which makes it even more complex, but this is enough for understanding:
table submissions:
ID (primary)
(additionalColumns)
table reports:
ID (primary)
submissionID (reference to submission table)
(additionalColumns)
table report_objects:
reportID (reference to reports table, multiple report_object for one report)
table accounting:
ID (primary)
reportID (reference to reports table, multiple accountings for one report)
(additionalColumns)
table accounting_objects:
ID
accountingID (reference to accounting table, multiple accounting_object for one accounting)
(additionalColumns)
For a submission, one or multiple reports are created, with multiple objects to account for (report_objects).
For each report, I can create multiple accountings, where each accounting covers a few objects of the report. The accounted report_objects are stored in accounting_objects.
My query/filter checks, for one submissionID, whether each of its report_objects has been accounted for (i.e. an accounting_object exists).
There isn't one definitive answer and, in practice, if it works and runs quickly enough for your needs then you can leave it as is. Optimization is always something you can come back to.
Joining correctly
If you are simply checking for the existence of rows in a joined table, and only including results that have that join, you can do this through the correct LEFT / RIGHT JOIN expressions. This is always the first thing to try.
Expressiveness
Also, be as expressive as you can with SQL: you want to give it the best chance to optimize your query. There are keywords such as EXISTS, for example - make sure to use them.
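For instance, the "is every report_object accounted?" check from the question could be phrased with NOT EXISTS roughly like this. This is only a sketch: it assumes report_objects has its own ID column, and that accounting_objects links back to it through a hypothetical reportObjectID column:

-- submissions for which no unaccounted report_object remains
SELECT s.ID
FROM submissions s
WHERE NOT EXISTS (
    -- a report_object of this submission with no matching accounting_object
    SELECT 1
    FROM reports r
    JOIN report_objects ro ON ro.reportID = r.ID
    WHERE r.submissionID = s.ID
      AND NOT EXISTS (
          SELECT 1
          FROM accounting a
          JOIN accounting_objects ao ON ao.accountingID = a.ID
          WHERE a.reportID = r.ID
            AND ao.reportObjectID = ro.ID  -- assumed linking column
      )
);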
Denormalization
You can add a column that stores the computed value; the complexity that arises out of this is ensuring that the value is always up to date. This can be done by triggers or manually (see the sketch after this list). The pros:
It is the easiest method of getting around slowness introduced by computed columns.
The cons:
Ruins your nice normalized schema
If you do it manually in code, you will forget to do it somewhere, causing headaches.
Triggers can be a bit of a pain.
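As a sketch of the trigger route (is_fully_accounted is a hypothetical denormalized column on the submissions table; one simple variant just invalidates the cached value rather than recomputing it inline):

DELIMITER //
CREATE TRIGGER accounting_objects_after_insert
AFTER INSERT ON accounting_objects
FOR EACH ROW
BEGIN
  -- walk back up to the submission this row belongs to and mark its
  -- cached flag as unknown, forcing a recompute on the next read
  UPDATE submissions
  SET is_fully_accounted = NULL
  WHERE ID = (SELECT r.submissionID
              FROM accounting a
              JOIN reports r ON r.ID = a.reportID
              WHERE a.ID = NEW.accountingID);
END//
DELIMITER ;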
Materialized view
This is like denormalization, but prevents polluting your normalized tables by creating a separate stored result. This is achieved in MySQL by storing the result of your complex SELECT in a results table whenever the underlying values change. Again, as with denormalization, the complexity is keeping this up to date. It is typically done with triggers. This can be a pain, but it keeps the complexity out of your schema. As mentioned by @eggyal, it isn't a supported feature of MySQL yet, so you will have to DIY...
Materialized views with MySQL
Pros:
Keeps dirty denormalized stuff away from your nice normalized schema.
Cons:
Materialized views aren't supported so setting them up requires work.
If you trigger the refresh of your views from code you can get stale data, but it isn't quite as painful as the single-column staleness of denormalization.
Triggers can be a bit of a pain.
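A DIY refresh in MySQL could look roughly like this (a sketch; submission_accounting_status and its columns are placeholder names):

-- one-time setup: a plain table caching the expensive query's output
CREATE TABLE submission_accounting_status (
  submissionID INT NOT NULL PRIMARY KEY,
  fully_accounted TINYINT(1) NOT NULL
);

-- refresh step, run from a trigger, a cron job, or application code
REPLACE INTO submission_accounting_status (submissionID, fully_accounted)
SELECT s.ID,
       0  -- placeholder: the real expression is the > 30 line accounting check
FROM submissions s;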
If you aren't sure, and it really matters, do some benchmarking.
EDIT: If your code has this query in one form or another across the code base, then that could cause headaches in future, as you will have to remember to change the statements in all of those places if and when they change.
If, by doing the above, you have made your statements really simple and concise, then they may differ enough from each other for this not to be a problem.
You can do some things to help you out:
Put all of the related queries in a single place, i.e. a single class or script that handles this query in its various forms. This way at least all of the changes are limited to the one file.
You can, to help yourself out a bit more, do a bit of refactoring to remove duplication between the queries.
Also, if you feel the database structure is too exposed to the code, you may want to abstract it behind a view.
I am in the process of creating a website where I need to have the activity for a user (similar to your inbox in stackoverflow) stored in sql. Currently, my teammates and I are arguing over the most effective way to do this; so far, we have come up with two alternate ways to do this:
Create a new table for each user and have the table name be theirusername_activity. Then when I need to get their activity (posting, being commented on, etc.) I simply get that table and see the rows in it...
In the end I will have a TON of tables
Possibly Faster
Have one huge table called activity, with an extra field for their username; when I want to get their activity I simply get the rows from that table "...WHERE username=".$loggedInUser
Fewer tables, cleaner
(assuming I index the tables correctly, will this still be slower?)
Any alternate methods would also be appreciated
"Create a new table for each user ... In the end I will have a TON of tables"
That is never a good way to use relational databases.
SQL databases can cope perfectly well with millions of rows (and more), even on commodity hardware. As you have already mentioned, you will obviously need usable indexes to cover all the possible queries that will be performed on this table.
Number 1 is just plain crazy. Can you imagine having to manage it, and seeing all those tables?
Can you imagine the backup! Or the dump! That many CREATE TABLEs... that would be crazy.
Get a good index, and you will have no problem sorting through the records.
Here we are talking about MySQL. So why would it be faster to make separate tables?
Query cache efficiency: each insert from one user wouldn't empty the query cache for the others.
Memory and paging: actively used tables would fit in the buffers, and unused data would simply not be loaded there.
But as everybody here has said, it seems quite crazy in terms of management. And in terms of performance, having a lot of tables adds another problem in MySQL: you may run out of file descriptors, or simply wipe out your table cache.
It may be more important here to choose the right engine, like MyISAM instead of InnoDB, as this is an insert-only table. And as @RC said, a good partitioning policy would fix the memory and paging problem by avoiding the loading of rarely used data into active memory buffers. This should be paired with an intelligent application design as well: avoid loading the whole activity history by default, reduce it to recent activity, and restrict full-history table parsing to batch processes and advanced screens, and you'll get a nice effect from the partitioning. You could even try a user-based partitioning policy.
For query cache efficiency, you'll get a bigger gain by using an application-level cache (like memcache), with per-user history elements saved there and invalidated on each new insert.
You want the second option, and you add the userId (and possibly a separate table for user id, username, etc.).
If you do a lookup on that id via a properly indexed field, you'd only need something like log(n) steps to find your rows. This is hardly anything at all. It will be way faster, way clearer, and way better than option 1. Option 1 is just silly.
In some cases, the first option is, in spite of not being strictly "the relational way", slightly better, because it makes it simpler to shard your database across multiple servers as you grow. (Doing this is precisely what allows wordpress.com to scale to millions of blogs.)
The key is to only do this with tables that are entirely independent from a user to the next -- i.e. never queried together.
In your case, option 2 makes the most sense: you'll almost certainly want to query the activity across all or some users at some point.
Use option 2, and not only index the username column, but partition the table (consider a hash partition) on that column as well. Partitioning on username will provide you with some of the same benefits as the first option while allowing you to keep your sanity. Partitioning and indexing the column this way provide a very fast and efficient means of accessing data based on the username/user_key. When querying a partitioned table, the SQL engine can immediately lop off the partitions it doesn't need to scan, since it can tell from the queried username value which partition that username could reside in (in this case, only one partition could contain records tied to that user). If you have a need to shard the table across multiple servers in the future, partitioning doesn't hinder that ability.
You will also want to normalize the table by separating the username field (and any other elements related to the username) into its own table with a user_key. Ensure a primary key on the user_key field in the username table.
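Putting both suggestions together, the DDL could look something like this (a sketch; the column names are assumptions, and note that MySQL does not allow foreign keys on partitioned tables, so the user_key link stays application-enforced):

CREATE TABLE users (
  user_key INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(64) NOT NULL UNIQUE
) ENGINE=InnoDB;

CREATE TABLE activity (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_key INT UNSIGNED NOT NULL,
  activity_type VARCHAR(32) NOT NULL,
  created_at DATETIME NOT NULL,
  -- the partitioning column must be part of every unique key,
  -- hence the composite primary key
  PRIMARY KEY (id, user_key),
  KEY idx_activity_user (user_key)
) ENGINE=InnoDB
PARTITION BY HASH (user_key)
PARTITIONS 16;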
This mainly depends on where you need to retrieve the values. If it's a page for a single user, then use the first approach. If you are showing data of all users, you should use a single table. The multiple-table approach is also clean, but in SQL, if the number of records in a single table is very high, data retrieval becomes very slow.
Right now in a database I have a Members table and a Products table with a joining Favorites table that consists of primary foreign keys from both the Members and Products tables. I have a requirement to place a restriction on amount of products that a member can place in their favorites at 5.
Where can this restriction come from? Is it something done within the database (MySQL) and hence would be part of my existing schema? Or is this a programming function that could be accomplished with something like PHP?
The question has been answered, however, since you are seeking understanding ...
The idea with Databases is that all such limits and Constraints on data are placed in the Database itself (as a self-contained unit). Data Constraints should be in the Database, not only in the app. ISO/IEC/ANSI SQL provides several types of Constraints, for different purposes:
FOREIGN KEY Constraints, for Referential Integrity (as well as performance; Open Architecture compliance, etc)
CHECK Constraints, to check against data values of other columns, and disallow violations
RULE Constraints, to disallow data that is out-of-range or specify exact data value formats
Yours is a classic simple RULE or CHECK. And the correct answer for Database and Database Design is a RULE or CHECK, not code.
That is not to say that the app should not check the count and avoid attempting an invalid action; that is just good sense. And it is not a repetition: it stops invalid actions at a higher level, which saves resource use. But data in the Db cannot be relied upon if its integrity is managed outside, in app code written by developers. The rules implemented inside the server can be relied upon; they are enforced for all apps and app components.
But the freeware Non-SQLs do not have the basics of Standard SQL: no Checks or Rules. Therefore the integrity of data in the database relies solely on the developers: their quality, knowledge, consistency, etc.
And the correct answer for MySQL/PHP is code. In every location that attempts that insert.
You would do this in PHP.
Just do a SELECT COUNT(*) FROM members_products WHERE member_id = 3 before inserting.
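A sketch of that check wrapped in a transaction, so two concurrent requests can't both slip under the limit (the names follow the answer above; the count comparison itself happens in PHP between the two statements):

START TRANSACTION;

-- lock this member's favorite rows so a concurrent insert can't race past the count
SELECT COUNT(*) FROM members_products WHERE member_id = 3 FOR UPDATE;

-- if the application sees a count below 5, it proceeds with the insert
INSERT INTO members_products (member_id, product_id) VALUES (3, 42);

COMMIT;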