mysql - Many tables to one table - multiple entries - php

I have a system which has (for the sake of a simple example) tables consisting of Customers, Vendors, Products, Sales, etc. Total 24 tables at present.
I want to add the ability to have multiple notes for each record in these tables. E.g., Each Customers record could have 0 to many notes, each Vendor record could have 0 to many notes, etc.
My thinking is that I would like to have one "Notes" table, indexed by a Noteid and Date/time stamp. The actual note data would be a varchar(255).
I am looking for a creative way to bi-directionally tie the source tables to the Notes table. The idea of having 24 foreign key type cross reference tables or 24 Notes tables doesn't really grab me.
Programming is being done in PHP with Apache server. Database is mysql/InnoDB.
Open to creative ideas.
Thanks
Ralph

I would suggest a table like this:
note_id : int, autoincrement, primary key
type_id : int, foreign key from Customers, Vendors, Products, etc.
type : varchar, a code indicating the type, like Vendors, VENDORS or just V
note : varchar, the actual note
CREATE TABLE IF NOT EXISTS `notes` (
`note_id` int(11) NOT NULL AUTO_INCREMENT,
`type_id` int(11) NOT NULL,
`type` varchar(20) CHARACTER SET utf8 NOT NULL,
`note` varchar(255) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`note_id`)
)
With a setup like that you can have multiple notes for each type, like Vendors, and also hold notes for multiple types.
Data sample:
note_id  type_id  type       note
-------  -------  ---------  ----------------------
1        45       Vendors    a note
2        45       Vendors    another note
3        3        Customers  a note for customer #3
4        67       Products   a note for product #67
SQL sample
select note from notes where type="Vendors" and type_id=45
To reduce table size, I would prefer aliases for the types, like V, P, C and so on.
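For anyone who wants to try this layout without a server, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere). The index name idx_notes_type and the helper notes_for are made up for illustration, and the composite index on (type, type_id) is my own addition so the lookup does not have to scan the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notes (
    note_id INTEGER PRIMARY KEY AUTOINCREMENT,
    type_id INTEGER NOT NULL,      -- id of the row in the source table
    type    VARCHAR(20) NOT NULL,  -- which source table the id belongs to
    note    VARCHAR(255) NOT NULL
);
-- Hypothetical index so lookups by (type, type_id) avoid a full scan.
CREATE INDEX idx_notes_type ON notes (type, type_id);
""")

conn.executemany(
    "INSERT INTO notes (type_id, type, note) VALUES (?, ?, ?)",
    [
        (45, "Vendors", "a note"),
        (45, "Vendors", "another note"),
        (3, "Customers", "a note for customer #3"),
        (67, "Products", "a note for product #67"),
    ],
)

def notes_for(conn, type_name, type_id):
    """Return all note texts attached to one record of one source table."""
    rows = conn.execute(
        "SELECT note FROM notes WHERE type = ? AND type_id = ? ORDER BY note_id",
        (type_name, type_id),
    )
    return [r[0] for r in rows]

vendor_notes = notes_for(conn, "Vendors", 45)
```

The same CREATE INDEX statement should carry over to MySQL more or less unchanged; the placeholders would become whatever your PHP database layer uses for bound parameters.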

Don't do a "universal" table, e.g.
id, source_table, source_record_id, note_text
might sound good in theory, but you can NOT join this table against your others without writing dynamic SQL.
It's far better to simply add a dedicated notes field to every table. This eliminates any need for dynamic sql, and the extra space usage will be minimal if you use varchar/text fields, since those aren't stored in-table anyways.

I've done a structure like this before where I used a format like this:
id (int)
target_type (enum/varchar)
target_id (int)
note (text)
Each data element just has to query for its own type then, so for your customer object you would query for notes attached to it like this:
SELECT * FROM notes where target_type='customer' AND target_id=$this->id
You can also link target_type to the actual class, so that you write to the database using get_class($this) to fill out target type, in which case a single function inside of the Note class could take in any other object type you have.
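A rough sketch of that idea follows, written in Python with sqlite3 rather than PHP/MySQL purely so it can be run as-is. type(obj).__name__ plays the role of get_class($this) here; the Customer and Vendor classes and the add_note / get_notes helpers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    " id INTEGER PRIMARY KEY,"
    " target_type TEXT NOT NULL,"
    " target_id INTEGER NOT NULL,"
    " note TEXT NOT NULL)"
)

class Customer:
    def __init__(self, id):
        self.id = id

class Vendor:
    def __init__(self, id):
        self.id = id

def target_type_of(obj):
    # Python analogue of PHP's get_class($this): the class name, lowercased.
    return type(obj).__name__.lower()

def add_note(conn, obj, text):
    """Attach a note to any object that has an id attribute."""
    conn.execute(
        "INSERT INTO notes (target_type, target_id, note) VALUES (?, ?, ?)",
        (target_type_of(obj), obj.id, text),
    )

def get_notes(conn, obj):
    """Fetch the notes attached to one object, oldest first."""
    rows = conn.execute(
        "SELECT note FROM notes WHERE target_type = ? AND target_id = ? ORDER BY id",
        (target_type_of(obj), obj.id),
    )
    return [r[0] for r in rows]

# Same numeric id, different types: the notes stay separate.
add_note(conn, Customer(7), "prefers email")
add_note(conn, Vendor(7), "net 30 terms")
customer_notes = get_notes(conn, Customer(7))
```

Note that the id is passed as a bound parameter instead of being interpolated into the SQL string; the PHP version can do the same with prepared statements.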

In my opinion, there isn't a clean solution for this.
option 1: Master entity table
Every (relevant) row of every (relevant) table has a master entry inside a table (let's call it entities_tbl). The id of each derived table isn't an autoincrement; it is a foreign key referencing the master table.
Now you can easily link the notes table with the master entity id.
PRO: It's an object-oriented idea. Like a base "Object" class which is the parent of every other class. Also, each entity has a unique id across the database.
CON: It's a mess. Every entity ID is scattered among (at least) two tables. You'd need JOINs every single time, and the master entity table will be HUGE (it will contain the same number of rows as the sum of every other child table, combined)
option 2: meta-attributes
inside the notes table, the primary key would contain an autoincrement, the entity_id and item_table_name. This way you can easily extract the notes of any entity from any table.
PRO: Easy to write, easy to populate
CON: It needs meta-values to extract real values. No foreign keys to grant referential integrity, messy and sloppy joins, table names as where conditions.
option 3: database denormalization
(sigh, I've never considered to ever give this suggestion)
Add a column inside each table where you need notes. Store the notes as json encoded strings. (this means to denormalize a database because you will introduce non-atomic values)
PRO: easy and fast to write, uses some form of standard even for future database users, the notes are centralized and easily accessible from each entity
CON: the database isn't normalized, poor search and comparison between notes
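To make option 1 a bit more concrete, here is a small sketch of the master entity table. It runs on Python's sqlite3 as a stand-in for MySQL/InnoDB; apart from entities_tbl, every table, column and helper name is an assumption for the example. The point it tries to show is that the notes table gets a real foreign key, so a note for a non-existent entity is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE entities_tbl (
    entity_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_type TEXT NOT NULL
);
CREATE TABLE customers (
    entity_id INTEGER PRIMARY KEY REFERENCES entities_tbl (entity_id),
    name      TEXT NOT NULL
);
CREATE TABLE vendors (
    entity_id INTEGER PRIMARY KEY REFERENCES entities_tbl (entity_id),
    name      TEXT NOT NULL
);
CREATE TABLE notes (
    note_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_id INTEGER NOT NULL REFERENCES entities_tbl (entity_id),
    note      TEXT NOT NULL
);
""")

def create_entity(conn, table, entity_type, name):
    """Insert the master row first, then the child row that reuses its id."""
    cur = conn.execute(
        "INSERT INTO entities_tbl (entity_type) VALUES (?)", (entity_type,)
    )
    entity_id = cur.lastrowid
    # `table` comes from trusted code here, never from user input.
    conn.execute(
        f"INSERT INTO {table} (entity_id, name) VALUES (?, ?)", (entity_id, name)
    )
    return entity_id

customer_id = create_entity(conn, "customers", "customer", "Acme")
vendor_id = create_entity(conn, "vendors", "vendor", "Globex")
conn.execute(
    "INSERT INTO notes (entity_id, note) VALUES (?, ?)", (vendor_id, "late delivery")
)

def orphan_note_rejected(conn):
    """Return True if a note for a non-existent entity is refused."""
    try:
        conn.execute("INSERT INTO notes (entity_id, note) VALUES (999, 'x')")
    except sqlite3.IntegrityError:
        return True
    return False
```

The cost mentioned in the CON shows up in create_entity: every new row needs two inserts, and every read of an entity together with its type needs a join.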

Related

Common child table to many master tables in database

I am using a MySQL database with PHP to build a web application.
I have a child table attachment, which is a common table for many master tables: teacher, student, classRoom (and others).
The number of master tables exceeds 10; let's say there are n tables.
My question is, is it good practice to:
1. Create just one table in the database called 'attachment' and relate it to its masters.
This will result in n foreign keys in the attachment table (i.e. n-1 unused columns), which will in turn lead to n-1 attributes in the model that are never initialized or used each time I create a model.
2. Create a table for each master table (master_i) called (master_i_Attachment) and relate it just to its master. But this will lead to n attachment tables and n models for attachment in my code.
Any advice?
What you can do is to just have a table with the following fields: id, reference_id (one of your parent tables), reference_type (ie to which table the reference_id belongs), (all the other fields in your attachment table).
Then, for example, if you want to get the attachments for the particular parent type, you can run SELECT query filtering on that type, e.g. WHERE reference_type='classroom'.
Or if you want to get the attachment for the classroom with a specific ID:
SELECT * FROM attachment WHERE reference_id=<ID> AND reference_type = 'classroom';
You will probably want to have a composite unique key on (reference_id, reference_type) which will ensure that you won't get duplicated attachments (unless you want the possibility for the given ID of the given type to have more than one attachment, in which case the key should not be unique).
Whether this solution suits your needs depends on how you are going to use the data, i.e. what kind of queries you are going to run most often.
Based on the concept of database normalization, redundant and uninitialized (or null) values in a database are discouraged. Normalization tries to isolate data more and more (which means more tables, one for each anomaly). BUT you can choose to ignore the rules and denormalize your database for performance reasons.
In your case, I think the simplest (and normalized) way would be choice #2 (a separate table for each attachment type). But you can tweak your design as Ashalynd says: put a type column in your table to specify the parent table. BTW, using this method will add complexity for cascading changes in the database.
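For comparison, here is a sketch of the asker's first option (one attachment table with one nullable foreign key per master), since that is the variant where the database can still cascade changes by itself. It uses Python's sqlite3 as a stand-in for MySQL; the file_name column, the sample rows and the CHECK constraint are my assumptions, not part of the question. Older MySQL versions parse CHECK constraints but do not enforce them, so check your server version before relying on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attachment (
    id         INTEGER PRIMARY KEY,
    file_name  TEXT NOT NULL,
    teacher_id INTEGER REFERENCES teacher (id) ON DELETE CASCADE,
    student_id INTEGER REFERENCES student (id) ON DELETE CASCADE,
    -- Exactly one parent column must be filled in.
    CHECK ((teacher_id IS NOT NULL) + (student_id IS NOT NULL) = 1)
);
INSERT INTO teacher (id, name) VALUES (1, 'Ms. Gray');
INSERT INTO student (id, name) VALUES (1, 'Sam');
INSERT INTO attachment (file_name, teacher_id) VALUES ('cv.pdf', 1);
INSERT INTO attachment (file_name, student_id) VALUES ('report.pdf', 1);
""")

def insert_is_rejected(conn, teacher_id, student_id):
    """Return True if the database refuses this combination of parents."""
    try:
        conn.execute(
            "INSERT INTO attachment (file_name, teacher_id, student_id)"
            " VALUES ('x', ?, ?)",
            (teacher_id, student_id),
        )
    except sqlite3.IntegrityError:
        return True
    return False

both_rejected = insert_is_rejected(conn, 1, 1)           # two parents
neither_rejected = insert_is_rejected(conn, None, None)  # no parent

# Deleting the teacher removes the teacher's attachment automatically.
conn.execute("DELETE FROM teacher WHERE id = 1")
remaining = [r[0] for r in conn.execute("SELECT file_name FROM attachment ORDER BY id")]
```

With the reference_id / reference_type design the DELETE at the end would leave the attachment row behind unless the application or a trigger cleans it up, which is the extra complexity mentioned above.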

MySQL Database I18N, a JSON approach?

UPDATE: I've come back to this question after some years: now I know this is a very bad approach. Please don't use this. You can always use additional tables for i18n (for example products and products_lang), with separate entries for every locale: better for indexes, better for search, etc.
I'm trying to implement i18n in a MySQL/PHP site.
I've read answers stating that "i18n is not part of database normally", which I think is a somewhat narrow-minded approach.
What about product names, or, like in my instance, a menu structure and contents stored in the db?
I would like to know what you think of my approach, taking into account that the languages should be extensible, so I'm trying to avoid the "one column for each language" solution.
One solution would be to use a reference (id) for the string to translate and for every translatable column have a table with primary key, string id, language id and translation.
Another solution I thought was to use JSON. So a menu entry in my db would look like:
idmenu  label
------  -------------------------------------------
5       {"en":"Homepage", "it":"pagina principale"}
What do you think of this approach?
"One solution would be to use a reference (id) for the string to translate and for every translatable column have a table with primary key, string id, language id and translation."
I implemented it once. What I did was take the existing database schema and look for all tables with translatable text columns; for each such table I created a separate table containing only those text columns, plus a language id and an id to tie it to the "data" row in the original table. So if I had:
create table product (
id int not null primary key
, sku varchar(12) not null
, price decimal(8,2) not null
, name varchar(64) not null
, description text
)
I would create:
create table product_text (
product_id int not null
, language_id int not null
, name varchar(64) not null
, description text
, primary key (product_id, language_id)
, foreign key (product_id) references product(id)
, foreign key (language_id) references language(id)
)
And I would query like so:
SELECT product.id
, COALESCE(product_text.name, product.name) name
, COALESCE(product_text.description, product.description) description
FROM product
LEFT JOIN product_text
ON product.id = product_text.product_id
AND 10 = product_text.language_id
(10 would happen to be the language id which you're interested in right now.)
As you can see the original table retains the text columns - these serve as default in case no translation is available for the current language.
So no need to create a separate table for each text column, just one table for all text columns (per original table)
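Here is a small runnable sketch of the fallback behaviour described above, using Python's sqlite3 in place of MySQL (the COALESCE / LEFT JOIN part is standard SQL). The sample products, the language id 10 and the helper product_names are made up for the example, and the sku, price and language tables are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id          INTEGER PRIMARY KEY,
    name        VARCHAR(64) NOT NULL,
    description TEXT
);
CREATE TABLE product_text (
    product_id  INTEGER NOT NULL REFERENCES product (id),
    language_id INTEGER NOT NULL,
    name        VARCHAR(64) NOT NULL,
    description TEXT,
    PRIMARY KEY (product_id, language_id)
);
INSERT INTO product VALUES (1, 'Chair', 'A wooden chair');
INSERT INTO product VALUES (2, 'Table', 'A wooden table');
-- Language 10 is a made-up id; only product 1 has a translation.
INSERT INTO product_text VALUES (1, 10, 'Sedia', 'Una sedia di legno');
""")

def product_names(conn, language_id):
    """Return (id, name) pairs, falling back to the default name."""
    rows = conn.execute(
        """
        SELECT product.id,
               COALESCE(product_text.name, product.name) AS name
        FROM product
        LEFT JOIN product_text
               ON product.id = product_text.product_id
              AND product_text.language_id = ?
        ORDER BY product.id
        """,
        (language_id,),
    )
    return rows.fetchall()

translated = product_names(conn, 10)    # product 1 translated, product 2 falls back
untranslated = product_names(conn, 99)  # no translations at all for this language
```

The language filter has to stay in the ON clause; moving it to WHERE would turn the LEFT JOIN into an inner join and drop the untranslated products.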
Like others pointed out, the JSON idea has the problem that it will be pretty impossible to query it, which in turn means being unable to extract only the translation you need at a particular time.
This is not an extension. You lose all the advantages of using a relational database. With an approach like yours you might as well use serialize(), for much better decoding performance, and store the data in files. There is no particular reason to use SQL with such structures.
I see no problem with using a column for each language. That is even easier when programming a CMS. A relational database is not only for storing data. It is for working with data rationally (e.g. using powerful built-in mechanisms) and for controlling the structure and integrity of the data.
First thought: this would obviously break exact searching in SQL, e.g. WHERE label='Homepage'.
Second: a user running a search could see irrelevant results (e.g. when the query matches a string in another language).
I would recommend keeping a single primary language in the database and using an extra sub-system to maintain the translations. This is the standard approach for web applications like Drupal. Most likely in the domain of your software/application there will be a single translation for each primary language string, so you don't have to worry about context or ambiguity. (In fact, for the best user experience you should strive to have unique labels for unique functionality anyway.)
If you want to roll your own table, you could have something like:
create table translations (
id int not null primary key
, source varchar(255) not null -- the text in the primary language
, lang varchar(5) not null -- the language of the translation
, translation varchar(255) not null -- the text of the translation
)
You probably want more than 2 characters for the language since you'll likely want en_US, en_GB etc.

database table design dilemma, a lot of check boxes?

I want to begin with Thank you, you guys have been good to me.
I will go straight to the question.
Having a table with over 400 columns, is that bad?
I have web forms that consists mainly of questions that require check box answers.
The total number of check boxes can run up to 400 if not more.
I actually modeled one of the forms, and put each check box in a column (took me hours to do).
Because of my unfamiliarity with database design, I did not feel like that was the right way to go.
So I read somewhere that some people use the serialize function, to store a group of check boxes as text in a column.
I just want to know if that would be the best way to store these check boxes.
Oh and some more info I will be using cakephp orm with these tables.
Thanks again in advance.
My database looks something like this
Table : Patients, Table : admitForm, Table : SomeOtherFOrm
each form table will have a PatientId
As I stated above, I first attempted creating a table for each form and putting each check box in a column. That took me forever to do.
So I read somewhere that serializing the check boxes per question would be a good idea.
So I'm asking what would be a good approach.
For questions with multiple options, just add another table.
The question that nobody has asked you yet is do you need to do data mining or put the answers to these checkbox questions into a where clause in a query. If you don't need to do any queries on the data that look at the data contained in these answers then you can simply serialize them up into a few fields. You could even pack them into numbers. (all who come after you will hate you if you pack the data though)
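To show what "packing them into numbers" could look like, here is a minimal sketch in Python (the same bit operations exist in PHP). The checkbox names are invented; each checkbox gets one bit in an integer, so a whole group fits in a single integer column.

```python
# Hypothetical checkbox names, in a fixed order that must never change
# once data has been stored.
CHECKBOXES = ["smoker", "diabetic", "allergies", "prior_surgery"]

def pack(checked):
    """Pack a set of ticked checkbox names into one integer."""
    value = 0
    for bit, name in enumerate(CHECKBOXES):
        if name in checked:
            value |= 1 << bit
    return value

def unpack(value):
    """Recover the set of ticked checkbox names from the integer."""
    return {name for bit, name in enumerate(CHECKBOXES) if value & (1 << bit)}

# "smoker" is bit 0 and "allergies" is bit 2.
packed = pack({"smoker", "allergies"})
```

The warning above applies in full: the stored number means nothing without the CHECKBOXES list, and reordering or removing an entry silently changes the meaning of every stored row.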
Here's my idea of a schema.
== Edit #3 ==
Updated ERD with the ability to store free-form answers; also linked patient_reponse_option to the question_option_link table so a patient's response will be saved with the correct option context (we know which question the response is to). I will post a few queries soon.
== Edit #2 ==
Updated ERD with form data
== Edit #1 ==
The short answer to your question is no, 400 columns is not the right approach. As an alternative, check out the following schema:
== Original ==
According to your recent edit, you will want to incorporate a pivot table. A pivot table breaks up a M:M relationship between 'patients' and 'options', for example, many patients can have many options. For this to work, you don't need a table with 400 columns, you just need to incorporate the aforementioned pivot table.
Example schema:
// patient table
tableName: patient
id: int(11), autoincrement, unsigned, not null, primary key
name_first: varchar(100), not null
name_last: varchar(100), not null
// Options table
tableName: option
id: int(11), autoincrement, unsigned, not null, primary key
name: varchar(100), not null, unique key
// pivot table
tableName: patient_option_link
id: int(11), autoincrement, unsigned, not null, primary key
patient_id: Foreign key to patient (`id`) table
option_id: Foreign key to option (`id`) table
With this schema you can have any number of 'options' without having to add a new column to the patient table. Adding a column with ALTER TABLE can crush your database if the table has a large number of rows.
I added an id to the pivot table, so if you ever need to handle individual rows, they will be easier to work with, vs having to know the patient_id and option_id.
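A quick runnable version of the example schema, for experimenting. It uses Python's sqlite3 instead of MySQL; the sample patient, the option names, the UNIQUE (patient_id, option_id) constraint and the helpers tick / ticked_options are my assumptions. The option table name is quoted defensively because OPTION is a keyword in some SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE patient (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    name_first VARCHAR(100) NOT NULL,
    name_last  VARCHAR(100) NOT NULL
);
CREATE TABLE "option" (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(100) NOT NULL UNIQUE
);
CREATE TABLE patient_option_link (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    patient_id INTEGER NOT NULL REFERENCES patient (id),
    option_id  INTEGER NOT NULL REFERENCES "option" (id),
    UNIQUE (patient_id, option_id)  -- a box is ticked at most once
);
""")

conn.execute("INSERT INTO patient (name_first, name_last) VALUES ('Ann', 'Lee')")
conn.executemany(
    'INSERT INTO "option" (name) VALUES (?)',
    [("asthma",), ("diabetes",), ("smoker",)],
)

def tick(conn, patient_id, option_name):
    """Record that a patient ticked the checkbox with this name."""
    conn.execute(
        "INSERT INTO patient_option_link (patient_id, option_id) "
        'SELECT ?, id FROM "option" WHERE name = ?',
        (patient_id, option_name),
    )

def ticked_options(conn, patient_id):
    """Return the names of all checkboxes a patient ticked."""
    rows = conn.execute(
        "SELECT o.name FROM patient_option_link l "
        'JOIN "option" o ON o.id = l.option_id '
        "WHERE l.patient_id = ? ORDER BY o.name",
        (patient_id,),
    )
    return [r[0] for r in rows]

tick(conn, 1, "smoker")
tick(conn, 1, "asthma")
ann_options = ticked_options(conn, 1)
```

Adding a 401st checkbox is then one INSERT into the option table rather than an ALTER TABLE, and an unticked box is simply the absence of a link row.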
I think I would split this out into 3 tables. One table representing whatever entity is answering the questions. A second table containing the questions themselves. Finally, a third junction table that will be populated with the primary key of the first table and the id of the question from the second table whenever the entity from the first table selects the check box for that question.
Usually 400 columns means your data could be normalized better and broken into multiple tables. 400 columns might actually be appropriate, though, depending on the use case. An example where it might be appropriate is if you need these fields on every single query AND you need to filter records using these columns (ie: use them in your WHERE clause)... in that case the SQL JOINs will likely be more expensive than having a sparsely populated "wide" table.
If you never need to use SQL to filter out records based on these "checkboxes" (I'm guessing they are yes/no boolean/tinyint type values) then serializing is a valid approach. I would go this route if I needed to use the checkbox values most of time I query the table, but don't need to use them in a WHERE clause.
If you don't need these checkbox values, or only need a small subset of them, on a majority of requests to your table, then it's likely you should work on breaking your table into multiple tables. One approach is to have a table with the checkbox values (id, record_id, checkbox_name, checkbox_value) where record_id is the id of your primary table record. This implies a one-to-many relationship between your primary records and your checkbox values.

Merge several mySQL databases with equivalent structure

I would like to write a PHP script that merges several databases, and I would like to be sure of how to go about it before I start anything.
I have 4 databases which have the same structure and almost the same data. I want to merge them without any duplicate entries while preserving (or re-linking) the foreign keys.
For example, there is a db1.products table which is almost the same as db2.products, so I think I would have to use a LIKE comparison on the name and description columns to be sure that I only insert new rows. But then, when merging the orders table, I have to make sure that the productID still indicates the right product.
So I thought of 2 solutions:
Either I run, for each table, insert into db1.x select * from db2.x and then make new links and check for duplicates using triggers.
Or I delete duplicate entries and update the foreign keys (after having dropped constraints) and then insert the rows into the main database.
I just heard of MySQL Data Compare and Toad for MySQL; could they help me to merge tables?
Could someone tell me which would be the right solution?
Sorry for my English, and thank you!
First thing is: how are you determining whether products are the same? You mentioned a LIKE comparison on name and description. You need to establish a rule that says when a product is one and the same in db1, db2 and so on.
However, let's assume that product's name and description are the attributes that define it.
ALTER TABLE products ADD UNIQUE (name, description);
Run this on all of your databases.
After you've done that, select one of the databases you wish to import into and run the following query:
INSERT IGNORE INTO db1.products SELECT * FROM db2.products;
Repeat for the remaining databases.
Naturally, this all fails if you can't determine how you're going to compare the products.
Note: avoid using SQL keywords such as "name" for your column names.
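Here is a small sketch of the unique-key-plus-ignore merge, runnable with Python's sqlite3 (SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE). The tables db1_products and db2_products stand in for db1.products and db2.products, and the rows are invented. One deliberate difference from the query above: the columns are listed and the id is left out, which is my own adjustment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- db1_products / db2_products stand in for db1.products / db2.products.
CREATE TABLE db1_products (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT NOT NULL,
    description TEXT NOT NULL,
    UNIQUE (name, description)
);
CREATE TABLE db2_products (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT NOT NULL,
    description TEXT NOT NULL,
    UNIQUE (name, description)
);
INSERT INTO db1_products (name, description)
VALUES ('Lamp', 'Desk lamp'), ('Chair', 'Oak chair');
INSERT INTO db2_products (name, description)
VALUES ('Chair', 'Oak chair'), ('Rug', 'Wool rug');
""")

# Listing the columns (and leaving out id) lets the target assign fresh ids,
# so a clash between unrelated primary keys cannot silently drop a row.
conn.execute("""
    INSERT OR IGNORE INTO db1_products (name, description)
    SELECT name, description FROM db2_products
""")

merged = [r[0] for r in conn.execute("SELECT name FROM db1_products ORDER BY id")]
```

With SELECT * the ids are copied too, and the ignore clause also swallows primary key collisions: in this sample, 'Rug' has id 2 in the second table, which is already taken by 'Chair' in the first, so it would be skipped even though it is a new product. Because the ids change, tables that reference products still need re-linking afterwards.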
Firstly, good luck with this - sounds like a tricky job.
Secondly, I wouldn't do this with PHP - I'd write SQL to do the work, assuming this is a one-off migration task and not a recurring task.
As an approach, I would do the following.
Create a database with the schema you want - it sounds like each of your 4 databases have small variations in the schema. Just create the schema for now, don't worry about the data.
Create a "working" database, with the same schema, but with columns for "old" primary keys. For instance:
table ORDER
order_id int primary key auto increment
old_order_id int not null
...other columns...
table ORDER_LINE
order_line_id int primary key auto increment
old_order_line_id int not null
order_id int foreign key
...other columns...
Table by table, Insert into your working database from your first source database. Let the primary keys auto_increment, but put the original primary key into the "old_" column.
For instance:
insert into workingdb.orders
select null, order_id, ....other columns...
from db1.orders
Where you have a foreign key, populate it by finding the record in the old_ column.
For instance:
insert into workingdb.order_line
select null, ol.order_line_id, o.order_id
from db1.order_line ol,
workingdb.orders o
where ol.order_id = o.old_order_id
Rinse and repeat for the other databases.
Finally, copy the data from your working database into the "proper" database. This is optional - it may help to retain the old IDs for lookups etc.
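The re-linking step can be tried out with the sketch below. It uses Python's sqlite3 with everything in one in-memory database, so src_orders and src_order_line stand in for db1.orders and db1.order_line; the customer and sku columns and all sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source tables standing in for db1.orders and db1.order_line.
CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE src_order_line (
    order_line_id INTEGER PRIMARY KEY,
    order_id      INTEGER,
    sku           TEXT
);
INSERT INTO src_orders VALUES (10, 'alice'), (11, 'bob');
INSERT INTO src_order_line VALUES (100, 10, 'A-1'), (101, 11, 'B-2'), (102, 11, 'C-3');

-- Working tables: fresh auto-increment keys plus the old key for lookup.
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    old_order_id INTEGER NOT NULL,
    customer     TEXT
);
CREATE TABLE order_line (
    order_line_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    old_order_line_id INTEGER NOT NULL,
    order_id          INTEGER NOT NULL REFERENCES orders (order_id),
    sku               TEXT
);

-- Step 1: copy the parents, keeping the original key in old_order_id.
INSERT INTO orders (old_order_id, customer)
SELECT order_id, customer FROM src_orders ORDER BY order_id;

-- Step 2: re-link each line to the NEW order id by matching on the old one.
INSERT INTO order_line (old_order_line_id, order_id, sku)
SELECT ol.order_line_id, o.order_id, ol.sku
FROM src_order_line ol
JOIN orders o ON o.old_order_id = ol.order_id
ORDER BY ol.order_line_id;
""")

relinked = conn.execute("""
    SELECT o.customer, ol.sku
    FROM order_line ol
    JOIN orders o ON o.order_id = ol.order_id
    ORDER BY ol.old_order_line_id
""").fetchall()
```

One thing to watch when repeating this for the other databases: old ids from different sources can collide, so the lookup on old_order_id should be restricted to the rows of the source currently being imported, for instance with an extra (hypothetical) source_db column next to each old_ column.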

Bulletin board - Database optimisation [closed]

This question is a follow-on from this question.
The project and problem
The project I am currently working on is a bulletin board for a large non-profit organisation. The bulletin board will be used to allow inter-office communication within the organisation.
I am building the application and have been having trouble extracting the results that I need from my database, because I don't think it is properly normalized and because of limitations in my knowledge of relational database theory and MySQL. I would appreciate input into the design of the board in general and, in particular, ways that the database structure can be improved to facilitate efficient queries and help me develop this application and future applications faster.
Business Logic
The bulletin board will be used in the following way
Posting bulletins and responses to bulletins
Employees or 'users' in offices around the country will be able to post messages to the bulletin board. Bulletins must be posted to a location and categorised - I'll call these 'bulletins'.
Users will be able to post any number of replies to any one bulletin and users will be able to reply to their own bulletin - I'll call these 'replies'.
Rating bulletins and replies
Users will be able to either 'like' or 'dislike' a bulletin or a reply and the total number of likes or dislikes will be shown for each bulletin or reply.
Viewing the bulletin board and responses
Bulletins can be displayed chronologically.
Users can sort bulletins chronologically or chronologically by the latest reply to that bulletin(let me know if you need more explanation)
When a particular bulletin is selected, replies to that bulletin will be displayed chronologically
-- phpMyAdmin SQL Dump
-- version 3.2.4
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Jan 16, 2011 at 06:44 PM
-- Server version: 5.1.41
-- PHP Version: 5.3.1
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
--
-- Database: `bulletinboard`
--
-- --------------------------------------------------------
--
-- Table structure for table `bbs`
--
CREATE TABLE IF NOT EXISTS `bbs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bb_locations_id` int(11) NOT NULL,
`bb_categories_id` int(11) NOT NULL,
`users_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`content` text NOT NULL,
`created_date` int(11) NOT NULL,
`rank` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=87 ;
--
-- Dumping data for table `bbs`
--
INSERT INTO `bbs` (`id`, `bb_locations_id`, `bb_categories_id`, `users_id`, `title`, `content`, `created_date`, `rank`) VALUES
(83, 8, 28, 44, 'sdaf', 'asdfasdf', 1292712797, 0),
(84, 8, 28, 44, 'asdf', 'asdfasd', 1292875089, 0),
(86, 8, 28, 44, 'Robert is leaving', 'Robert is leaving and going back to the states ', 1294344916, 0);
-- --------------------------------------------------------
--
-- Table structure for table `bb_categories`
--
CREATE TABLE IF NOT EXISTS `bb_categories` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`description` varchar(255) NOT NULL,
`list_order` varchar(255) NOT NULL,
`admin` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=30 ;
--
-- Dumping data for table `bb_categories`
--
INSERT INTO `bb_categories` (`id`, `title`, `description`, `list_order`, `admin`) VALUES
(28, 'Travel', 'Rideshares, proposed trips etc', '1', 1);
-- --------------------------------------------------------
--
-- Table structure for table `bb_locations`
--
CREATE TABLE IF NOT EXISTS `bb_locations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`description` varchar(255) NOT NULL,
`address` varchar(255) NOT NULL,
`post_code` int(11) NOT NULL,
`list_order` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=15 ;
--
-- Dumping data for table `bb_locations`
--
INSERT INTO `bb_locations` (`id`, `title`, `description`, `address`, `post_code`, `list_order`) VALUES
(8, 'Washington DC', 'asdkf', 'dsf', 0, 1);
-- --------------------------------------------------------
--
-- Table structure for table `bb_ratings`
--
CREATE TABLE IF NOT EXISTS `bb_ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bbs_id` int(11) NOT NULL,
`users_id` int(11) NOT NULL,
`like_id` int(2) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=68 ;
--
-- Dumping data for table `bb_ratings`
--
-- --------------------------------------------------------
--
-- Table structure for table `bb_replies`
--
CREATE TABLE IF NOT EXISTS `bb_replies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`users_id` int(11) NOT NULL,
`bbs_id` int(11) NOT NULL,
`content` text NOT NULL,
`created_date` int(11) NOT NULL,
`rank` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=158 ;
--
-- Dumping data for table `bb_replies`
--
INSERT INTO `bb_replies` (`id`, `users_id`, `bbs_id`, `content`, `created_date`, `rank`) VALUES
(156, 44, 86, 'good ridance i say\r\n', 1294788444, 0),
(157, 44, 86, 'And stay away\r\n', 1294892751, 0);
-- --------------------------------------------------------
--
-- Table structure for table `bb_reply_ratings`
--
CREATE TABLE IF NOT EXISTS `bb_reply_ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bb_replies_id` int(11) NOT NULL,
`users_id` int(11) NOT NULL,
`like_id` tinyint(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=115 ;
--
-- Dumping data for table `bb_reply_ratings`
--
-- --------------------------------------------------------
--
-- Table structure for table `bb_sort_bys`
--
CREATE TABLE IF NOT EXISTS `bb_sort_bys` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(20) NOT NULL,
`description` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
--
-- Dumping data for table `bb_sort_bys`
--
INSERT INTO `bb_sort_bys` (`id`, `title`, `description`) VALUES
(1, 'Newest', 'Posts are sorted by their creation date'),
(2, 'Popular', 'Posts are sorted by the date of their lates reply, or by post date if they have now replies');
-- --------------------------------------------------------
--
-- Table structure for table `users`
--
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_name` varchar(10) NOT NULL,
`first_name` varchar(100) NOT NULL,
`last_name` varchar(100) NOT NULL,
`permission` int(1) NOT NULL,
`bb_sort_bys_id` varchar(10) NOT NULL,
`bb_locations_csv` varchar(255) NOT NULL,
`defaultLocation` int(11) NOT NULL,
`bb_categories_csv` varchar(255) NOT NULL,
`total_bulletins` int(5) NOT NULL,
`bulletins_per_page` int(5) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=45 ;
Part I
Revised 09 Dec 10 01:00 EST
Looked at your DDL. Ok. We need to take a step back and organise your database first. That will solve half your problems (your SQL will be straight-forward; and fast; less indices; no temp tables required). For a while I thought, aha, you have your columns, it must be stable, but there is no chance. Top down from scratch, ok. Have a look at this Entity Relation Diagram (no use working on the Data Model, which is Entities, Relations and Attributes, until we get the ERs right), and check that it is correct.
The way to do that is, answer the following questions (short answers are fine). These questions are clarifying the Entities and Business Rules. How you understand databases in general, and your data in particular is crucial. You have come a long way, on your own, so we can take it from there.
I think ▶this post◀ might be helpful to you, in order to understand the formal stages that should be followed; which we are short-circuiting here.
Most important, totally, and completely, forget about the function and any coding requirements. Data has to be modelled independent of the application, simply as Data. Function Modelling is a different science. First get one right; then get the other right; and the two together play beautiful tunes. Try jamming them together; doing both tasks at the same time, and they won't even make a suburban garage band.
For brevity, and for the sake of anyone reading this, I will use a Closed and an Open Section; when an Open item (discussion) is closed, I will make it concise, and move it to the Closed section. Maintain the numbering, because things sometimes come back to haunt us. You may wish to do the same, or even delete the discussion on your side.
The links for the pretty pictures are at the end.
Apologies: the editing does not work; sub-numbering is inconsistent
Closed Issues
1. users.bb_locations_csv is a many-to-many relation between users and locations:
Each of those elements should be an entry in a discrete column, in a discrete row
One user can have many locations and one location can have many users, which is many-to-many
Read ▶this post◀ for a discussion of how that is treated and at what stage it is dealt with
At this Logical Stage, that is just an n::n relation, as I have drawn; you can forget about it for now, it will be supplied, simply, when we get to the Physical Stage.
Trust me, I will provide code that is no more complex than ...WHERE IN () for your declared purpose.
On second thought, if I break your fingers, you will type even slower, so I better not
Ok, your app is browser based, and the page is dynamic (my advice was for static pages that need to be touched up); go ahead with check boxes.
2. users.bb_categories_csv is a many-to-many relation between users and categories
Ditto.
3. Confirmed: a bulletin (bbs) does not exist without a user; a user issues a bulletin, and that starts the whole cycle; then invites replies and ratings.
3.1 Confirmed: There is really only one bulletin board and it does not exist as a Thing in the database.
3.2 Confirmed: that the org will never have more than one bulletin board, and the classifications and categorisations are all adequately handled by the Category table/function
4. Deleted.
5. Confirmed: The difference between bulletins and replies is that replies are dependent on a bulletin to exist, they do not have a title and they are not categorised by location or category because they are dependent on the bulletin itself to exist.
6. Deleted.
7. Comments noted. Resolved.
7.1. For each single bulletin submitted by another user, each user can post more than one reply.
7.2. For each single bulletin submitted by a user, that user can post one, or more than one reply.
7.3. Deleted.
7.4. Deleted.
The Data Model now allows more than one reply per user per bulletin; including the User who submitted the bulletin.
8. Confirmed: each user can post at most one rating to a bulletin (which can be revoked/changed)
9. Confirmed: each user can post at most one rating to a reply (ditto)
10.1. Given: username comes from the organisation and is the unique name that identifies employees. For example emails are username#organisation.org - authentication is done with ldap and this is required in order to connect an retrieve other information about the employees
Confirmed: UserName is an excellent Identifier
10.2. Confirmed: FirstName, LastName ... BirthPlace, etc remain as (the traditional) columns for ensuring People are not duplicated.
.
11. Given: At the moment we can Identify our offices by casual names which are generally know within the organisation, since we only have about 3 main offices and many field offices. So examples would be Washington DC or virginia field office. In total I think we will try and keep the total below 20. I want to record the exact address of each location as well because that could be used to uniquely identify offices to users.
Provided: StateCode+Town as PK; IsMainOffice as boolean.
.
12. Confirmed: Description and Name for Category are required.
.
13. Given: Users will not be able to post to some categories. Only users with sufficiently high rights will have the right to post to certain categories.
Provided: Permission in User, Location, Category is a method of evaluating such rights.
.
14. Confirmed: Location.Administrator is UserId of admin for the Location.
.
15. Given: There will only ever be a need for a like or a dislike. I don't think there needs to be a neutral position, because that is the same as just not voting. Liking seems more relevant to bulletin replies than to posts, to be honest, i.e. 'I see your response and instead of writing my own I will just agree with you'. The existing bulletin board is somewhat of a social aspect in the organisation, and I think liking and disliking (agreeing and disagreeing) creates a level of controversy that encourages participation. However, liking or disliking a bulletin may not always be entirely appropriate.
15.1 Provided: Like as boolean in BulletinRating and ResponseRating. This will require interpretation on every access.
15.2. When it is no longer a boolean, it can be changed to a RatingCode, and implemented as a Lookup table. The names are then determined by Joins, and interpretation is eliminated. I drew this in the First Data Model, so that you could see what I meant
15.3. Removed in the Second Data Model.
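To make (15.2) concrete, here is a minimal sketch of the Lookup-table form. It is run through Python's sqlite3 module purely so that it is self-contained; SQLite is a stand-in for MySQL here, and BulletinId, the 'L'/'D' codes and the sample rows are assumptions for illustration, not part of the Data Model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

conn.executescript("""
-- Lookup table: one row per allowed rating (codes are hypothetical)
CREATE TABLE Rating (
    RatingCode TEXT NOT NULL PRIMARY KEY,
    Name       TEXT NOT NULL UNIQUE
);
-- BulletinId is a placeholder key, used only to keep the sketch short
CREATE TABLE BulletinRating (
    BulletinId INTEGER NOT NULL,
    UserId     INTEGER NOT NULL,
    RatingCode TEXT    NOT NULL REFERENCES Rating (RatingCode),
    PRIMARY KEY (BulletinId, UserId)   -- at most one rating per user per bulletin
);
INSERT INTO Rating VALUES ('L', 'Like'), ('D', 'Dislike');
INSERT INTO BulletinRating VALUES (1, 10, 'L'), (1, 11, 'D'), (2, 10, 'L');
""")

# The name comes from the join; no boolean has to be interpreted in code
rows = conn.execute("""
    SELECT br.BulletinId, br.UserId, r.Name
      FROM BulletinRating br
      JOIN Rating r ON r.RatingCode = br.RatingCode
     ORDER BY br.BulletinId, br.UserId
""").fetchall()
```

Adding a third kind of rating later is then an INSERT into Rating, with no change to BulletinRating or to the code that reads it.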
.
16. Confirmed: each user has a home Location (other than the list of Locations that they are interested in).
.
17. Confirmed: Permission as per (13).
.
18. Confirmed: Further Permissions may be required, as per Data Model.
18.1. If you do this now, you won't have to worry about when the organisation decides to prevent a certain Person from posting Responses or Bulletins, or Rating them; and wants that feature implemented yesterday.
18.2. Even if you do not implement it, leave gaps between the values you do implement.
.
19 Confirmed: a Bulletin is about a Location.
19.1. Confirmed: There are no Bulletins without a Location
19.2. Confirmed: There are no Bulletins without a Location.
19.3 Confirmed: There are no Bulletins without a User (declarative). But so far we have no way of constraining that User; therefore any User can insert a Bulletin for any Location (you could constrain it in code, eg. to Locations each User Is Interested In).
19.4 Confirmed: There are no BulletinRatings without a Bulletin and a rating User.
19.5 Confirmed: There are no Responses without a Bulletin.
19.6 Confirmed: There are no ResponseRatings without a Response and a rating User.
19.7. But, there can be Users, Locations, and Categories, independently.
.
20. If you do not mind, I will provide naming conventions, etc. They should be self explanatory, and the value will show up only when you start coding SQL. Please ask, if anything isn't. For starters, all names are singular. Mixed Case is easier to read (you are supposed to use capitals for SQL language).
20.1. My experience is that table_name as opposed to tableName are really techie forms, and users do not like them; consistent mixed case is liked by everyone. It is one of those things that is impossible to change, so choose carefully.
.
21. For your need to group tables together, which is good, keep in mind that that is a Physical issue. At the Logical Data Model level, the tables have normal names, uncluttered by physical issues. Imagine that the physical tables are prefixed with something like (and please use capitals for this):
- REF_ for reference (such as User) and lookup tables
- BUL_ for Bulletin system
.
I am not able to name tables with uppercase letters? I'm not sure why. I don't know why I can't have uppercase table names. Is it to do with using MyISAM database tables?
The universal convention is that SQL Language is expressed in upper case; every report and admin tool I have ever used generates such SQL code. So we can't use upper case. Lower case or mixed case only. So the choices boil down to table_name or TableName; we need a separator of some kind. For reasons already provided, I strongly recommend mixed case, capitalised, and not the OO style with the leading letter uncapitalised.
.
22. rank (all) can be derived directly from the database (remember, do not worry about the code during Data Modelling). If you store it, it is a Normalisation error; a duplicated column; which has to be kept up-to-date; which can get out of synch with the derived value; which is called an Update Anomaly. Fifth Normal Form eliminates Update Anomalies. That is my minimum level of Normalisation, so that is what you will get from me.
22.1. I am not interfering with the sort order or popularity issue at all; in fact, by the sounds of it, you haven't closed that functionality. I am only taking redundant data, the rank column, out, as part of the Normalisation process.
22.2. Here's a ▶Quick Tutorial◀ on the RANK() operator (as it is commonly known). It is not ANSI SQL; it is an Oracle and MS extension. However it is not required if you understand Subqueries, which is why Sybase does not have it. I doubt MySQL has it, so you need to get your head around it. Understanding Scalar Subqueries is a pre-requisite. Sybase syntax, so whack your semi-colons in, etc. Feel free to ask specific questions.
.
I have never seen that approach of writing Rank = (SELECT.... Is that the same as (SELECT ...) as Rank?
I have posted a separate Answer for that.
.
22.3. Needing to understand why, is no problem at all. Only children blindly follow simple rules, and you are certainly not one of them.
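As a rough illustration of (22), rank derived from the database rather than stored, here is a minimal sketch. It runs on SQLite via Python's sqlite3 module so that it is self-contained; the same subquery form should carry over to MySQL with minor changes (MySQL 8.0 and later also provide RANK() as a window function). BulletinId, IsLike, the BulletinLikes view and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bulletin (
    BulletinId INTEGER PRIMARY KEY,    -- hypothetical key, for brevity only
    Title      TEXT NOT NULL
);
CREATE TABLE BulletinRating (
    BulletinId INTEGER NOT NULL REFERENCES Bulletin (BulletinId),
    UserId     INTEGER NOT NULL,
    IsLike     INTEGER NOT NULL,       -- 1 = like, 0 = dislike
    PRIMARY KEY (BulletinId, UserId)
);
INSERT INTO Bulletin VALUES (1, 'Fire drill'), (2, 'New printer'), (3, 'Parking');
INSERT INTO BulletinRating VALUES
    (1, 10, 1), (1, 11, 1), (1, 12, 0),
    (2, 10, 1), (2, 11, 1), (2, 12, 1),
    (3, 10, 0);

-- Likes per Bulletin, derived on demand (a view stores no data)
CREATE VIEW BulletinLikes AS
SELECT b.BulletinId, b.Title,
       (SELECT COUNT(*)
          FROM BulletinRating r
         WHERE r.BulletinId = b.BulletinId
           AND r.IsLike = 1) AS Likes
  FROM Bulletin b;
""")

# Rank = 1 + the number of Bulletins with strictly more likes
rows = conn.execute("""
    SELECT o.Title, o.Likes,
           (SELECT COUNT(*) + 1
              FROM BulletinLikes i
             WHERE i.Likes > o.Likes) AS BulletinRank
      FROM BulletinLikes o
     ORDER BY BulletinRank, o.Title
""").fetchall()
```

Nothing is stored, so there is nothing to get out of synch: change a rating and the next SELECT returns the new rank.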
.
23. Confirmed: users.total_bulletins is redundant; it can be derived. Removed.
.
24. All your PKs are Ids. Haven't you gotten tired of getting lost in the code yet? Forget about sticking Idiot PKs on everything that moves; let's find out how your users Identify their Entities, which Entities are truly Independent, and which others depend on Independent Entities.
24.1. Never use Id or any such form. Where it is a PK, use the full form.
24.2. Call location_id, location_id, wherever it is, including the PK table. The exception is when you need to show the role. This will become clear in the Data Model.
.
25. You have no Declarative Referential Integrity, no Defined Foreign keys. That is bad news for many different reasons. Once these questions are clarified, please add them in. DRI means that as much as possible, if not all, Integrity is Declared in SQL. ISO/IEC/ANSI SQL standard allows for this, but the freeware end of the market does not provide the standard, and is slowly catching up. It means the server will not allow a row in the FK table to be added unless the PK exists in the parent table. MySQL recently provided DRI for Foreign Keys. For FKs, refer to ▶this article◀.
25.1. For CHECK constraints and RULES, you will have to implement those in code.
My foreign keys are like users-id(fk) = users.id(pk). I'm not sure how to add them other than what I have done, but will certainly do so once I know how to.
That's not adding them into your db; that's merely referencing columns in a WHERE clause in Data Manipulation Language, not Data Definition Language. Adding them, so that they function at the db/server level, means declaring them in DDL, as per the linked article. Then MySQL will stop a row from being inserted to a child table (FK) where the parent PK does not exist. That is Referential Integrity. If it is declared in DDL, it is Declarative Referential Integrity.
In addition to enforcement of RI, everyone can see the definition: report tools can be used by the users to access and report from the db, without having to get someone to code a report.
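Here is a minimal sketch of what "declared in DDL" buys you. It uses SQLite through Python's sqlite3 module as a stand-in, so the PRAGMA line is SQLite-specific; on MySQL the equivalent requirement is an InnoDB table. The table layout and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for enforcement

conn.executescript("""
CREATE TABLE User (
    UserId   INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL UNIQUE
);
CREATE TABLE Bulletin (
    BulletinId INTEGER PRIMARY KEY,     -- hypothetical key, for brevity only
    UserId     INTEGER NOT NULL,
    Title      TEXT    NOT NULL,
    -- The Foreign Key is declared here, in DDL, not implied by a WHERE clause
    FOREIGN KEY (UserId) REFERENCES User (UserId)
);
""")

conn.execute("INSERT INTO User VALUES (1, 'sally')")
conn.execute("INSERT INTO Bulletin VALUES (1, 1, 'Fire drill')")  # parent exists

try:
    # There is no User 99, so the server itself refuses the row
    conn.execute("INSERT INTO Bulletin VALUES (2, 99, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

bulletin_count = conn.execute("SELECT COUNT(*) FROM Bulletin").fetchone()[0]
```

The application code contains no check at all; the rejection comes from the declaration.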
Yes, as far as I know. Confirmed at ▶this site◀. The code I have provided for the subquery uses DRI, so we can test that and get it out of the road early. You have to check for your specific version of MySQL.
Twenty-Five. Comments Noted. I am not a MySQL specialist. Yes, those are the issues you have to figure out for yourself. In general, from my perusing, MySQL is legless; for anything SQL-ish, you need InnoDB.
But do not let that hold you back. Use Engine=MyISAM for now, without the Declarative SQL, and keep going with both the Data Model and the Subquery. Work on InnoDB in the background.
To be clear, the DDL I have provided should work for MyISAM (and "do nothing" in the DRI department, until you get InnoDB).
.
27. Given: I have rethought the sorting requirements for bulletins. Users could sort chronologically: easy, makes sense. Users could sort bulletins by the date of the latest reply to the bulletin. Then we can forget about rank, and it should be really easy to sort bulletins chronologically by the time of their last response? What are your thoughts?
Yes, that is sensible and quite common; most people understand chronological order. You will have to mess with the filters they choose in the search window (choose: Location or list; choose: Category or list; choose: My Bulletins or all).
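A minimal sketch of that ordering, with the latest Response derived by a scalar subquery rather than stored on the Bulletin. It runs on SQLite via Python's sqlite3 module as a stand-in for MySQL; BulletinId, the ...Dtm column names and the sample rows are assumptions for illustration. A Bulletin with no Response falls back to its own date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bulletin (
    BulletinId  INTEGER PRIMARY KEY,   -- hypothetical key, for brevity only
    Title       TEXT NOT NULL,
    BulletinDtm TEXT NOT NULL          -- ISO text, so string order = time order
);
CREATE TABLE Response (
    BulletinId  INTEGER NOT NULL REFERENCES Bulletin (BulletinId),
    UserId      INTEGER NOT NULL,
    ResponseDtm TEXT    NOT NULL,
    PRIMARY KEY (BulletinId, UserId, ResponseDtm)
);
INSERT INTO Bulletin VALUES
    (1, 'Fire drill',  '2010-12-01 09:00'),
    (2, 'New printer', '2010-12-02 09:00'),
    (3, 'Parking',     '2010-12-03 09:00');
INSERT INTO Response VALUES
    (1, 11, '2010-12-03 08:00'),
    (1, 12, '2010-12-04 10:00'),
    (2, 10, '2010-12-02 12:00');
""")

# Most recently active Bulletin first
rows = conn.execute("""
    SELECT b.Title,
           COALESCE(
               (SELECT MAX(r.ResponseDtm)
                  FROM Response r
                 WHERE r.BulletinId = b.BulletinId),
               b.BulletinDtm) AS LastActivityDtm
      FROM Bulletin b
     ORDER BY LastActivityDtm DESC
""").fetchall()
```

The Location, Category and My Bulletins filters would be added to the outer WHERE clause.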
Open Issues
(Nil)
Data Model
Ok, assuming you do not have issues with the ERD, and implementing all Closed Issues, I have modelled the data, and prepared a Fifth Data Model 09 Dec 10 for your review. I definitely need much more feedback, questions, etc, on this. I am experiencing difficulty accepting that it is done. Probably best to start writing real code for your problem areas.
Links
▶Link to IDEF1X Notation◀ You really need to read and understand this, before you read the Data Model.
▶Link to Fifth Bulletin Data Model◀ The Entity Relation Diagram is on the first page, followed by the Data Model.
The Keys are pretty much straight IDEF1X (except for UserId which I provided as a counterpoint); which means pure Relational Keys. Un-enhanced and not optimised for Physical considerations. Before you baulk at them, first notice them, register them, and evaluate them. Of course we can add Idiot keys, but before we do that, let's make sure we understand what we are going to lose.
Notice the Identifiers (solid lines) as per the Notation document. The spine, the vertebrae of the system is Location ... Bulletin ... Response.
Notice that Keys actually implement many Business Rules.
Notice the Natural Hierarchy that I have rendered. See if there is any meaning in it for you.
The VerbPhrases are really important; see if they mean anything.
Comments re First Data Model and Responses
One question I have is that the primary key of the location will be used to form the child primary key?(they are joined by a solid line) I don't really understand that concept
Yes. The PK for Location (above the line) is (StateCode, Town). That PK, the two columns together (a compound key), is migrated from Location to Bulletin anyway, as an FK (bold). We are additionally using it to form the Bulletin PK (above the line).
If and when we need a Surrogate key, we will add it. For now, we are working out the Identifiers. So the question to contemplate is:
What is a good Identifier for Bulletin ?, what do your users naturally use to Identify a Bulletin ...
"have you seen the bulletin from Virginia FO yesterday ?",
"Sally from Washington sure writes good bulletins", etc.
or why that relationship does not exist between the user and the bulletin?
Well, that relation cannot exist between User and Bulletin, but a relation exists, the dotted line, meaning UserId is an FK in Bulletin (bold), but it is not used to form its PK (below the line).
Or do you mean: the User is a strong Identifier for Bulletin (and therefore should be used to form the Bulletin PK, therefore the line should be solid)?
Fine. Excellent. That is what modelling re Identifiers is all about. That clears up an area that I did not like, in that we had non-unique indices. That resolves my issue as well.
As per the intention stated further above, since I have now shown Rating as a table and what the rendering would be, once, I shall remove it.
I think Permission should be an Entity.
Bulletin PK is now (StateCode, Town, UserId, SequenceNo). To be clear, SequenceNo is within StateCode, Town, UserId: it will be 5 for Sally's 5th bulletin re MO/Billings FO.
Note that user Settings (BulletinsPerPage, etc) are 1::1 with User, so they are in User; a child table would be incorrect.
Typographical errors corrected.
Comments re Second Data Model and Responses
The PKs for both Bulletin and Response have been changed to reflect (7). BulletinNo and ResponseNo have been replaced with BulletinDate and ResponseDate (which used to be CreatedDate), in order to allow multiple replies per User per Bulletin.
Comments re Third Data Model and Responses
Trust you had a good break.
At least 30 years ago (that I am aware of), the giants in the industry had this debate. Names are always singular. Tables are nouns. VerbPhrases are verbs. This is not limited to db naming conventions, it applies to documents, theses, dissertations, etc. You may have 5 conclusions at the end of the doc, but the section or chapter title, in both the ToC and the top of the page, is "Conclusion".
After fighting them all the way through Uni, as soon as I started my first paid programming job, and saw the importance of the rules in the real world, as opposed to the theoretical arguments we had in college, I gave it up as a waste of time. All that time and energy I wasted was released to do productive work. Since then, I don't question the giants; I just accept. That their minds are greater than mine. It is like accepting Standards, or behaving within the law, or God. I have no really, really good reasons for doing anything illegal.
Anyway, the ease of languaging (discussion, SQL, documentation) that is supported by such rules cannot be adequately explained; as you write more and more SQL code, it will become clear.
You are always free to use whatever you want. I deliver singular only.
Fine with me.
But you need to keep in mind, those two elements, in the identified sequence (ala non-PK Unique Index, or Alternate Key) are universally required to establish Uniqueness for a Person. Removing them will result in two things. First, you will no longer be able to identify uniqueness across Users (and thus you may have duplicate rows). Second, the AK becomes non-unique, an Inversion Entry.
The point is (contrary to one of the posts), any column that is 1::1 with the User PK, should reside in User. All preference settings. Since we cleaned up the InterestedLocations and InterestedCategories, I know of only BulletinsPerPage remaining; but I am sure there are others. IsPreference2 is an eg. of a boolean; NumPreference3 is an eg. of an Integer. Etc. You can tell me what the real Preferences are.
(Let's try that in plural: ... any column that is 1::1 with the Users PK, should reside in Users. Just doesn't do it for me, I get hung up on the broken English, and I am a bit precious about my mother tongue.)
Data Model Updated.
Excellent. Let me know when you are comfortable with that, and I will give you the Physical Model.
How about the VerbPhrases ?
Comments re 06 Dec 10 20:38 EST (Small Updates)
.
28. Where there is only one occurrence of a PK as an FK, of course, the FK column name is the same as the PK column name. However, when there is more than one occurrence of the FK (take a look at ResponseRating, where there are three UserIds), we need to differentiate them. In IDEF1X terminology this is called Roles. The Role of the User who issued the Bulletin is Issuer, and so on. Obviously it is better to use that name, and keep it consistent throughout the hierarchy (not UserId in Bulletin and then, when we get to Response, where there are two and a differentiation is demanded, change it to IssuerId). I thought you might have a problem with that; in the early stages, the usage is Issuer.UserId so that it is absolutely clear that it is UserId as an FK, and the Role is Issuer; when we get to the physical model, it gets simplified to IssuerId.
Likewise, we have many DateTime columns (Date for short if you like; otherwise Dtm), that need to be differentiated.
.
29. Did the IDEF1X Notation doc not make sense ?
The PK for each table is above the line, in the specified order.
Remember we are carrying the PKs of the parent tables anyway, and if there is meaning, using those FKs to form the child PK.
For Bulletin:
The Location FK (StateCode, Town) for which it is Issued
The UserId of the Issuer
and DateTime it was Issued, to make it unique.
therefore (StateCode, Town, IssuerId, BulletinDate)
To delete all ResponseRatings for this Bulletin, use WHERE = on those four Bulletin columns.
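A minimal sketch of that, assuming the key layout described in (29): the Bulletin PK is carried down through Response into ResponseRating, so the grandchild rows can be reached directly, with no join. SQLite via Python's sqlite3 module is used as a stand-in for MySQL; the Role names ResponderId and RaterId, the IsLike column and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Bulletin (
    StateCode TEXT NOT NULL, Town TEXT NOT NULL,
    IssuerId INTEGER NOT NULL, BulletinDate TEXT NOT NULL,
    Title TEXT NOT NULL,
    PRIMARY KEY (StateCode, Town, IssuerId, BulletinDate)
);
CREATE TABLE Response (
    StateCode TEXT NOT NULL, Town TEXT NOT NULL,
    IssuerId INTEGER NOT NULL, BulletinDate TEXT NOT NULL,
    ResponderId INTEGER NOT NULL, ResponseDate TEXT NOT NULL,
    PRIMARY KEY (StateCode, Town, IssuerId, BulletinDate,
                 ResponderId, ResponseDate),
    FOREIGN KEY (StateCode, Town, IssuerId, BulletinDate)
        REFERENCES Bulletin (StateCode, Town, IssuerId, BulletinDate)
);
CREATE TABLE ResponseRating (
    StateCode TEXT NOT NULL, Town TEXT NOT NULL,
    IssuerId INTEGER NOT NULL, BulletinDate TEXT NOT NULL,
    ResponderId INTEGER NOT NULL, ResponseDate TEXT NOT NULL,
    RaterId INTEGER NOT NULL, IsLike INTEGER NOT NULL,
    PRIMARY KEY (StateCode, Town, IssuerId, BulletinDate,
                 ResponderId, ResponseDate, RaterId),
    FOREIGN KEY (StateCode, Town, IssuerId, BulletinDate,
                 ResponderId, ResponseDate)
        REFERENCES Response (StateCode, Town, IssuerId, BulletinDate,
                             ResponderId, ResponseDate)
);
INSERT INTO Bulletin VALUES
    ('VA', 'Richmond',   1, '2010-12-01 09:00', 'Fire drill'),
    ('DC', 'Washington', 2, '2010-12-02 09:00', 'Parking');
INSERT INTO Response VALUES
    ('VA', 'Richmond',   1, '2010-12-01 09:00', 2, '2010-12-01 10:00'),
    ('VA', 'Richmond',   1, '2010-12-01 09:00', 3, '2010-12-01 11:00'),
    ('DC', 'Washington', 2, '2010-12-02 09:00', 1, '2010-12-02 10:00');
INSERT INTO ResponseRating VALUES
    ('VA', 'Richmond',   1, '2010-12-01 09:00', 2, '2010-12-01 10:00', 3, 1),
    ('VA', 'Richmond',   1, '2010-12-01 09:00', 2, '2010-12-01 10:00', 1, 1),
    ('VA', 'Richmond',   1, '2010-12-01 09:00', 3, '2010-12-01 11:00', 1, 0),
    ('DC', 'Washington', 2, '2010-12-02 09:00', 1, '2010-12-02 10:00', 3, 1);
""")

# The four Bulletin PK columns identify every rating under that Bulletin
bulletin_key = ("VA", "Richmond", 1, "2010-12-01 09:00")
cur = conn.execute("""
    DELETE FROM ResponseRating
     WHERE StateCode = ? AND Town = ? AND IssuerId = ? AND BulletinDate = ?
""", bulletin_key)
deleted = cur.rowcount
remaining = conn.execute("SELECT COUNT(*) FROM ResponseRating").fetchone()[0]
```

With a surrogate Id on each table, the same DELETE would need a subquery or join through Response to find out which ratings belong to the Bulletin.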
.
30. Because (State, Town) is the PK of Location, it is carried wherever Location is referenced. And it forms part of the Bulletin PK, so any dependent tables carry those columns because they are carrying the Bulletin PK.
Look for the coloured Tabs (This version only)
.
32. Those are Verb Phrases. The way to read them is detailed in the Notation doc. It appears you have a good handle on it. It is really important to get the table names (and the Verb Phrases) right, because change is difficult after implementation. If you tell me Office is better than Location, that's fine with me.
Read: Office Is Activated By Bulletin
Feel free to supply another Verb Phrase.
AFAIC, the Office is dead to the rest of the org, and only comes alive on their radar (is activated by) the issue of a Bulletin.
I realise it sounds silly here, but ignore that for a moment, something along the lines of "Office expresses its aliveness; advertises its activity, by issuing a Bulletin".
Have a quiz at Mark's Sensor Data Model, for some nice Verb Phrases.
We had previously identified that (State, Town) is the PK; I will leave that as is. Refer to (38) for the change.
.
33. Worth discussion. Yes, if you are going to display it when (eg) displaying Responses, and the users understand UserName. No, if it is 30 bytes, and there is also an unique 4 byte UserId. The idea is to make these choices consciously, aware of what you are giving up, when you eventually decide that some 6 column 30-byte key is too cumbersome to migrate to the children.
I did state at the outset, I would use UserId as a typical Id Pk, because it is carried/migrated to several child tables.
We can leave how that is created for later. But it is a pure Surrogate PK.
.
34. No problem. Category already has it. I'll change Order to ListOrder.
.
35. Sure. Based on what I have read and heard, I am quite happy with it. But I would like more back-and-forth to achieve some confidence, before you write code. Alternately, view it as a learning experience, and accept that the model and code may change later. Would you like me to produce the Physical now ? If you give me any and all corrections, I will publish the next version. I am expecting preferences in User. Also, quickly run through the functions and check that you have all the columns you need.
Do look at some of the other answers, for the purpose of learning, and interest.
.
36. Joins. You just join on three columns as opposed to one. SQL is cumbersome with joins, and the new syntax which was supposed to make it easier, is actually more cumbersome. My coders never write joins: we save time and typos. I have a proc that given two or more tables, will generate the code with all the columns and joins. I don't know enough of MySQL to convert that for you.
Data Model Updated.
.
Comments re 08 Dec 10 20:49, Fourth Data Model and Responses
.
Check the previous section immediately above, there are small updates.
IDEF1X: Your speed is fine.
Note the child always "inherits" the Parent PK, as an FK (either solid or broken line), otherwise there is no Relation between them. By using these columns that exist in the child anyway, to form the child PK, we carry the meaning (and that is the difference between solid and broken). And thus we do not need to look for an independent Identifier for the child. The Relational power in this method will become clear later, when you are coding.
The section we are dealing with is about Identifiers: natural vs unnatural; meaningful vs meaningless. Later you will see how we can use the Relational capability of the engine, when the child PK is formed from the parent PK. (Isn't your surname the same as your father's ?)
It is also important to understand Relational databases and their capability. That is lost when we approach the database (eg) from an OO perspective, and treat it as a location to make our classes "persistent". Therefore, we will try to learn and use Relational terms. It gets difficult when you go to France and expect that they speak American, and use the same currency; learn to speak 10 words of French, and they welcome you with open arms, and you'll have quite a different experience with the locals.
Anyway, go ahead with implementing the model. Just realise we will probably make a change at some point. Save all your DDL. Save all your test data as insert statements or as a table backup or character format export (no idea what MySQL can/cannot do in this area).
.
37.1. Handled, the n::n Relation with Office & Category. You will only "see" that when we get to the Physical Model.
37.2. Done.
37.3 Done.
.
38. Excellent. Shorter as well. Note they will never be able to have two Offices in the same Zip Code. NUMERIC(5,0) is good, but I thought the US was moving towards 7 digits. Doesn't matter, you can figure it out; it is an excellent PK for Office. Now this column, which was part of Address, probably ZipCode, has been elevated to a higher purpose, without duplication; since we are carrying it in 5 child tables, and we want the PK name to be clear, as per previously explained conventions, we will call it OfficeCode; OfficeZipCode might be silly.
We need an Unique Index on Name to ensure they do not add two Offices with the same name. Note, for explanation purposes, this is actually the logical key of Office, replacing (StateCode, Town), and it remains so.
I still think you may need StateCode and Town as a quick reference (other than sitting somewhere in Address)
Data Model updated, Fifth now available for review. You did not state your preference, for ...Date vs ...Dtm. I am going with the latter, as it is more specific, identifying the time component as well. Easy to change.
This Answer has reached maximum length. Continued in "Part II"
The key to having an efficient database is to simplify. The main goal of a relational database is not to repeat any information. I took your SQL dump and quickly drafted a simpler version that is normalized, to the best of my knowledge. I did leave in some of the fields you had for CSVs, etc. I have removed fields that it would be simpler to just recalculate by querying the db when the information is needed, such as a user's total posts and the ranking of a given post. I also removed your bb_replies, as you can accomplish the same result by referencing a parent post. I have renamed the tables slightly to what made sense to me; you can use whatever naming scheme you feel comfortable with. I find that using terms that are simple makes it easier to understand how the data relates to each other.
I must admit that I do agree with some of the comments above, there are plenty of BBs out there that work just fine and would have all the functionality you are looking for. And you are lucky I am in the reading mood tonight lol that was one long question. Simplification is key in everything :)
SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='TRADITIONAL';
-- -----------------------------------------------------
-- Table `users`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `users` (
`id` INT NOT NULL AUTO_INCREMENT ,
`username` VARCHAR(45) NULL ,
`password` VARCHAR(100) NULL ,
`email` VARCHAR(255) NULL ,
`first_name` VARCHAR(100) NULL ,
`last_name` VARCHAR(100) NULL ,
`permission` INT NULL ,
`created` DATETIME NULL ,
`modified` DATETIME NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `categories`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `categories` (
`id` INT NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(45) NULL ,
`description` TEXT NULL ,
`order` INT NULL ,
`admin` INT NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `locations`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `locations` (
`id` INT NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(45) NULL ,
`description` TEXT NULL ,
`address` TEXT NULL ,
`order` INT NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `posts`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `posts` (
`id` INT NOT NULL AUTO_INCREMENT ,
`post_id` INT NOT NULL ,
`user_id` INT NOT NULL ,
`category_id` INT NOT NULL ,
`location_id` INT NOT NULL ,
`title` VARCHAR(45) NULL ,
`content` TEXT NULL ,
`created` DATETIME NULL ,
`modified` DATETIME NULL ,
PRIMARY KEY (`id`, `post_id`, `user_id`, `category_id`, `location_id`) ,
INDEX `fk_posts_users` (`user_id` ASC) ,
INDEX `fk_posts_posts1` (`post_id` ASC) ,
INDEX `fk_posts_categories1` (`category_id` ASC) ,
INDEX `fk_posts_locations1` (`location_id` ASC) ,
CONSTRAINT `fk_posts_users`
FOREIGN KEY (`user_id` )
REFERENCES `users` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_posts_posts1`
FOREIGN KEY (`post_id` )
REFERENCES `posts` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_posts_categories1`
FOREIGN KEY (`category_id` )
REFERENCES `categories` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_posts_locations1`
FOREIGN KEY (`location_id` )
REFERENCES `locations` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `likes`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `likes` (
`id` INT NOT NULL AUTO_INCREMENT ,
`user_id` INT NOT NULL ,
`post_id` INT NOT NULL ,
`like` TINYINT(1) NULL ,
PRIMARY KEY (`id`, `user_id`, `post_id`) ,
INDEX `fk_posts_users_users1` (`user_id` ASC) ,
INDEX `fk_posts_users_posts1` (`post_id` ASC) ,
CONSTRAINT `fk_posts_users_users1`
FOREIGN KEY (`user_id` )
REFERENCES `users` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_posts_users_posts1`
FOREIGN KEY (`post_id` )
REFERENCES `posts` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `sort_options`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `sort_options` (
`id` INT NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(45) NULL ,
`description` TEXT NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `preferences`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `preferences` (
`id` INT NOT NULL AUTO_INCREMENT ,
`user_id` INT NOT NULL ,
`pagination` INT NULL ,
`sort_option_id` INT NOT NULL ,
`categories_csv` VARCHAR(45) NULL ,
`locations_csv` VARCHAR(45) NULL ,
PRIMARY KEY (`id`, `user_id`, `sort_option_id`) ,
INDEX `fk_preferences_users1` (`user_id` ASC) ,
INDEX `fk_preferences_sort_options1` (`sort_option_id` ASC) ,
CONSTRAINT `fk_preferences_users1`
FOREIGN KEY (`user_id` )
REFERENCES `users` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_preferences_sort_options1`
FOREIGN KEY (`sort_option_id` )
REFERENCES `sort_options` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
SET SQL_MODE=@OLD_SQL_MODE;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;
Subquery First, then the RANK() Function
Relax, son, we'll get there! Your speed is fine.
Preparation
The first thing, you really need to get access to a decent set of manuals, for your specific flavour of MySQL. I found ▶this one◀. As before, you have to do your own debugging, but I am now providing SQL that is as close to generic MySQL as possible. I've confirmed that everything we are going to be doing is entirely possible in that flavour of MySQL (I don't know what flavour/version yours is, except ENGINE=MyISAM).
Subquery
Ok, let's start again. I have written a ▶series of SELECTS◀, to lead you through the process. Please complete each one, and understand it completely before progressing to the next. If you have any questions, stop, and post the question.
The code is written and tested in Sybase; then downgraded for MySQL (from perusing the web, eg. the above site), and tested as much as possible in that state.
The first bit creates and loads three tables for use.
The first SELECT is a straight join of the three tables, no subquery. You need to get that to work; that is, understand what it does, fix any syntax problems, and figure out the differences between the SQL I provide and the SQL that runs on your server. And get used to making those changes. We can't keep stopping for that.
The second SELECT produces exactly the same result set. It introduces the concept of a Subquery, which is used to populate a single column.
Drive that bus. Respond when you're done or if you have problems.
Responses to Your Comments of 03 Dec 10 17:51
Straight Join
I have never seen that way of doing joins before, I have always used left join, right join or inner join. Ok so for this first query we are just joining the two tables student and course with the studentcourse table sitting in the middle as the associative table. Results are repeated as expected because one student might be on more than one course and they will have a result for that course.
Yes.
That ( x=y in the WHERE clause ) is the traditional way of identifying joins, it is much more clear; the LEFT/RIGHT/INNER/OUTER JOIN syntax is the "new" way. Much more cumbersome AFAIC, but the learning is relevant because it is fundamental to what comes later. Feel free to convert to the latter syntax, and back again, for purposes of understanding.
Repeats ? That is not what repeats or duplicates mean. All the rows are discrete, true rows in CS. You should get the same 15 rows in every report (as we progress).
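For purposes of comparison, here is a minimal sketch of the same three-table join written both ways. It is not the linked series of SELECTs: the column names, the Mark column and the three sample rows are assumptions for illustration (the real test data produces 15 rows), and it runs on SQLite via Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Course  (CourseId  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE StudentCourse (             -- the associative table in the middle
    StudentId INTEGER NOT NULL REFERENCES Student (StudentId),
    CourseId  INTEGER NOT NULL REFERENCES Course (CourseId),
    Mark      INTEGER NOT NULL,          -- hypothetical result column
    PRIMARY KEY (StudentId, CourseId)
);
INSERT INTO Student VALUES (1, 'Fred'), (2, 'Sally');
INSERT INTO Course  VALUES (10, 'SQL'), (20, 'PHP');
INSERT INTO StudentCourse VALUES (1, 10, 70), (1, 20, 55), (2, 10, 90);
""")

# Traditional form: tables listed in FROM, joins identified in WHERE
where_join = conn.execute("""
    SELECT s.Name, c.Name, sc.Mark
      FROM Student s, Course c, StudentCourse sc
     WHERE s.StudentId = sc.StudentId
       AND c.CourseId  = sc.CourseId
     ORDER BY s.StudentId, c.CourseId
""").fetchall()

# "New" form: the same two conditions moved into INNER JOIN ... ON
keyword_join = conn.execute("""
    SELECT s.Name, c.Name, sc.Mark
      FROM StudentCourse sc
     INNER JOIN Student s ON s.StudentId = sc.StudentId
     INNER JOIN Course  c ON c.CourseId  = sc.CourseId
     ORDER BY s.StudentId, c.CourseId
""").fetchall()
```

Fred appears twice, once per Course; those are two discrete rows of StudentCourse, not duplicates.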
(PS: when I directly create the tables using the queries you provided, the names are converted to all lowercase, while the column names can still be camel case.)
MySQL is very strange. (It appears to be doing the naming conventions for us!)
.
2. Simple scalar query
A few issues with query. You use the alias(in the scalar subquery) before you have defined what it is?(StudentCourse sc) I guess I always incorrectly assumed that you have to say define an alias before you use it.
You are thinking procedurally. SQL is a set-oriented language, for manipulating Relational sets of data.
The whole query gets evaluated and optimised in one pass. There is no "before" or "after". I am defining it in the same batch of SQL that I am using it.
I don't entirely understand the use of the alias 'in-ner' in the scalar subquery, is this to say that you want it to check each row individually(not sure how to explain this) instead of on a table wide check?Ie when you are doing this check make it local to the particular row you are on?(terrible explanation sorry).
For purposes of understanding/debugging, evaluate the subquery first (the contents of the brackets), alone. Understand it fully. Note the use of "sc" and keep it in your hat.
in_ner and sc are ALIASES, that is, handles for the table name that it sits next to in the FROM clause; that we use elsewhere in the code for convenience
in_ner is a descriptive name for the table referenced in the Inner Query, the Subquery
sc is a descriptive name for the table referenced in the Outer Query, which is only Outer because it has an Inner query, otherwise it would be a flat query
we could just as easy use fred and sally
Aliases such as in_ner and out_er are meaningful when the same table is referenced in both the Inner and Outer queries.
notice the join between the Inner query and the Outer query WHERE in_ner.CourseId = sc.CourseId
I have related the table referenced in the in_ner query to the table sc referenced in the out_er query
Such a subquery is called a Correlated Subquery
See if you can visualise the Outer query (result set) as a grid, a spreadsheet, 15 rows by 4 columns.
Make sure you understand that Outer query, "easy" as it is. Notice that it is the same as (1. Straight Join), with a different method of populating one column.
As i understand it the scalar subquery asks for Name where the courseId's in Course and studentcourse are the same.(pretty straight forward) and is an alternative to saying that in the where,
Yes, exactly.
And notice that we are after only the Course.Name which is a 1::1 join from StudentCourse to Course, on CourseId. Notice exactly the WHERE clause in (1) that we are replacing in (2); in (1) it applies to all rows.
But because we are grabbing one datum; one cell; one item for a specific row/column; not all rows; not all columns, it is called a Scalar.
We are obtaining it using a subquery, which has to be constrained to the specific row. Therefore we need to relate the row from the outer query to the row in the Inner query.
so the Correlation between the Inner Subquery and Outer (specific row) is required.
And if we did not have that identification of the specific row, we would be loading rubbish into the Scalar, or it would return a Table (not a Scalar value) and the query would fail.
Try that, take the WHERE CourseId = sc.CourseId out
So that you know what the error message is, so that when it happens in future, you will know "Aha, I am returning a table, not a scalar; I am missing something in my Inner WHERE clause; I am not identifying a specific Correlated row".
.
it is not quite "asks for Name where the courseId's in Course and studentcourse are the same"; it is getting the Course.Name for a specific StudentCourse.CourseId, which is identified from the outside, whatever sc row it is.
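A minimal sketch of the point above, run through Python's sqlite3 module so it is self-contained. The three sample rows and the Course names are hypothetical (the real tables in this thread hold 15 StudentCourse rows), and SQLite stands in for MySQL. It shows that the correlated scalar subquery in (2) fills the CourseName column with the same values the straight join in (1) produces.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Course (CourseId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE StudentCourse (
        StudentId INTEGER NOT NULL,
        CourseId  INTEGER NOT NULL REFERENCES Course (CourseId),
        Mark      INTEGER NOT NULL,
        PRIMARY KEY (StudentId, CourseId)
    );
    INSERT INTO Course VALUES (66, 'Algebra');
    INSERT INTO Course VALUES (67, 'Biology');
    INSERT INTO StudentCourse VALUES (1, 66, 80);
    INSERT INTO StudentCourse VALUES (2, 66, 90);
    INSERT INTO StudentCourse VALUES (1, 67, 70);
""")

# (1) Straight join: the relation between the tables sits in the WHERE clause.
join_rows = conn.execute("""
    SELECT c.Name, sc.StudentId, sc.Mark
    FROM StudentCourse sc, Course c
    WHERE c.CourseId = sc.CourseId
    ORDER BY 1, 3 DESC
""").fetchall()

# (2) Correlated scalar subquery: one Course.Name per outer row,
# identified from the outside via sc.CourseId.
subquery_rows = conn.execute("""
    SELECT (SELECT Name
            FROM Course in_ner
            WHERE in_ner.CourseId = sc.CourseId) AS CourseName,
           sc.StudentId,
           sc.Mark
    FROM StudentCourse sc
    ORDER BY 1, 3 DESC
""").fetchall()

print(subquery_rows)
print(join_rows == subquery_rows)
```

Note that SQLite is lenient when the correlation is removed, so use your MySQL server for the "take the WHERE out" experiment; the error message you want to learn is the MySQL one.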
with the difference that you can make this check row by row before the where.
you are thinking procedurally; there is no "row-by-row"; the dbms is set-oriented; the result set you are building is a set. Re-state the question in set terminology.
I used Course instead of in-ner, what is the point of using an alias in this case, is it just to show that aliases can be used?
Yes. And to highlight issues. And to differentiate the Inner Query from the Outer query. In the Inner query, the "inner" Alias, or any alias is not demanded. Only the Alias relating to the outer query is demanded.
Something I don't understand here is that when I try to do this, 'course.Name' it says unknown Course.Name in field list. this is the way that I have always defined that i mean Name in the Course table and not some other table. What would happen if I had two tables with a name column?
Exactly. If it were ambiguous, then you would have to supply the table name or alias; where it is not ambiguous, it is not demanded, but nice to have for documentary, clarity, purposes. You have to figure out why MySQL is not accepting it. Mixed case/lower case madness ?
I have also never seen that order by syntax, I can see that 1 and 4 mean the column numbers but why bother passing it two columns?
Huh ? Because I want the result set ordered by Course.Name in ascending order, and within that, by StudentCourse.Mark in Descending order.
If I did not state the order, MySql would produce the result set in whatever order it gets it from StudentCourse (chronological ?; by index ?). Whatever that default order is, find that out, you need to know it, and thus avoid an ORDER BY, when it is unnecessary.
Take the ORDER BY out and play with it.
Try ORDER BY 4 DESC, 1
It is not "passing", I am telling it what to do with my result set, in the one SQL command. The only passing you are doing is between your app (PHP ?) and MySQL.
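To make the two orderings concrete, here is a small sketch using Python's sqlite3 module. The Report table and its three rows are hypothetical, a flattened stand-in for the four-column result set of (1) and (2). The numbers in ORDER BY are positions in the SELECT list: 1 is CourseName, 4 is Mark.

```python
import sqlite3

# Hypothetical flattened result set: CourseName, FirstName, LastName, Mark.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Report (CourseName TEXT, FirstName TEXT, LastName TEXT, Mark INTEGER);
    INSERT INTO Report VALUES ('Algebra', 'Ann', 'Lee', 80);
    INSERT INTO Report VALUES ('Biology', 'Bob', 'Ray', 95);
    INSERT INTO Report VALUES ('Algebra', 'Cy',  'Orr', 90);
""")

columns = "SELECT CourseName, FirstName, LastName, Mark FROM Report "

# Course name ascending, and within each course, Mark descending.
by_course = conn.execute(columns + "ORDER BY 1, 4 DESC").fetchall()

# Mark descending first, course name only as a tie-breaker.
by_mark = conn.execute(columns + "ORDER BY 4 DESC, 1").fetchall()

print(by_course)
print(by_mark)
```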
2.1. Ok, when you finished with (2), and completely happy that you understand it, do this exercise.
SELECT (SELECT Name
FROM Course
WHERE CourseId = sc.CourseId
) AS CourseName,
() AS FirstName,
() AS LastName,
Mark
FROM StudentCourse sc
ORDER BY 1, 4 DESC
Produce the same grid format, we want the exact same result set as (1) and (2).
Fill in the two pairs of empty brackets with the appropriate subquery; ie. write a subquery to populate the FirstName column, and another to populate the LastName column
Responses to your Comments re Third Data Model
2.1. Perfect, yes, we move on.
.
You are cooking with gas, so if you don't mind, I will take your text, and annotate it a bit; notice the differences, they may or may not be subtle.
The correlated scalar subquery says that for each course id we need the highest mark, as opposed to the highest mark for all the courses. This is where the correlated aspect of this subquery comes into play because we are relating the outer query to the inner query for this particular row. [Yes!] The way that I am currently visualizing [That's it, use the visual part of your mind, not the serial part] it is that the outer query runs through the tables putting together the result table set, and each time it creates a row it runs the scalar subquery and picks out [a single value to fill the cell; here it is] the highest mark where the courseId's match, so when it is on a row where the course id is 66 then the scalar subquery is only looking for the max mark where the courseId is 66.
I could hardly have said it better myself.
There is no such thing as "result table".
Add one more definitive item.
The outer query defines the result set.
The subquery is independent of that; it is merely Correlated or Indexed.
Ok, so you have that SQL working, right ?
Now that you understand that, the next step is to visualise the result set, and to visualise the subquery (3, unchanged) filling the entire column. If the above text was a balloon filling one cell at a time, then visualise hundreds of balloons, filling consecutive cells. Then visualise a bucket poured into the column.
Now leave that two dimensional result set alone for a minute, and visualise another layer on top of it. This is the parallel layer, where you write your subquery code.
If ever you have difficulty getting a subquery to work, go back to this, your way of visualising, one result set, and another layer for the subquery, which pours a bucket of scalars in, to fill the column. It eliminates all the well-known subquery coding bugs; removes the use of GROUP BY, DISTINCT, and all those ham-fisted methods of getting a long angry snake to fit into a jam jar.
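The "bucket of scalars" picture can be sketched as follows, again via Python's sqlite3 module with hypothetical sample rows. Query (3) itself is not reproduced here; this is only an illustration of the kind of query described above, where the outer query defines the result set and a correlated subquery pours the highest mark for each row's course into one column, with no GROUP BY.

```python
import sqlite3

# Hypothetical sample rows; course 66 is the one used in the discussion above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentCourse (
        StudentId INTEGER NOT NULL,
        CourseId  INTEGER NOT NULL,
        Mark      INTEGER NOT NULL,
        PRIMARY KEY (StudentId, CourseId)
    );
    INSERT INTO StudentCourse VALUES (1, 66, 80);
    INSERT INTO StudentCourse VALUES (2, 66, 90);
    INSERT INTO StudentCourse VALUES (1, 67, 70);
""")

# The outer query defines the result set (one row per StudentCourse row).
# The subquery is correlated on CourseId and fills the HighestMark column.
rows = conn.execute("""
    SELECT sc.StudentId,
           sc.CourseId,
           sc.Mark,
           (SELECT MAX(in_ner.Mark)
            FROM StudentCourse in_ner
            WHERE in_ner.CourseId = sc.CourseId) AS HighestMark
    FROM StudentCourse sc
    ORDER BY 2, 1
""").fetchall()

print(rows)
```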
.
Three more small steps before you proceed to (4).
2.2 Re-read my response (2) above, all the way down to this point. No skimming. This is because when you teach your mind something new and different, you need to re-inforce it. It is an officially recognised and labelled technique.
Responses to Comments of 08 Dec 10 20:49
2.3. Write that query (3) without using subqueries, and ensure you check the results. If you catch yourself laughing when you are writing the code, it is a good sign. As long as you produce the correct result set, you pass, but try to write the most efficient code (fewest COUNTS and GROUP BYs, etc). Do this only if you want to run circles around your peers, to be able to answer any "how do I code ..." question on your database.
I'm not sure what you mean by write that query without using sub-queries? I thought we wanted to avoid the use of group by's etc
Yes. Absolutely. You've walked forward. Now walk backward without tripping. This will really help your understanding of walking forwards, when it is better to use a subquery vs a join. Code the query with GROUP BYs and COUNTs. The fewest. Don't laugh.
2.4. Write the subquery (3) on your database, to produce a list of Bulletins, the outer query has to be FROM bbs only; with a count of likes, and a count of dislikes. So truncate the tables and do 10 or 12 meaningful INSERTS, five minutes, big deal.
I used the method of using sub-queries on my database to put together a list of bulletins replies, count the number of reply likes and dislikes and get a particular users rating. it was great because I didn't have to use any group by's or counts and I didn't have to create temporary tables like I did for the bulletins.
Well, that's perfect. Now we are getting a bit of Relational Power in your spinach.
Now, go and look at this question and answer; ensure you compare the code. You've come a long way in just a few days.
When you finish (2.3), read your (2.4) query again, to refresh yourself, and move onto (4).
If you get stuck, replace the word "Rank" with "CountOfStudsWithHigherMark", and give it another go.
Responses to Comments of 11 Dec 10 13:14
2.3 I am having trouble writing that query without a scalar subquery. Scalar subqueries always made more logical sense to me even before I knew how to do them. That is why I said "I guess the problem I am running into here is, how do you refer to user-id = x in this particular row, not in all the rows" in that previous question. Correlating the scalar subquery to the main query with an alias was the answer.
The (2.3) exercise is intended for you to:
really understand the incorrectness of the fat query with the GROUP BY (in a relational database using a set-processing relational engine) vs the correctness, elegance, and speed of the Correlated Subquery. You have achieved that. That will place you above your peers, in terms of SQL coding ability.
be able to identify when a fat WHERE clause and when a Correlated Subquery is appropriate. I am not sure, but it looks like you have achieved that.
be able to correct and debug this kind of issue when maintaining code written by others, and to be able to teach them the distinction. It sounds like you have a good visual, relational ability; which has been re-inforced by the exercise; and now you cannot go back to inferior methods. That is, you can understand and fix incorrect SQL code, but you cannot communicate that to others.
As long as you understand those distinctions and accept that, I am happy to drop (2.3) and move on.
Read your (2.4) query again, to refresh yourself, and move onto (4).
If you get stuck, replace the word "Rank" with "NumStudentsWithHigherMark", and give it another go.
Don't read further. The following is "old code"
Here's a ▶Quick Tutorial◀ on the RANK() operator (as it is commonly known). It is not ANSI SQL; it is an Oracle and MS extension. However it is not required if you understand Subqueries, which is why Sybase does not have it. I doubt MySQL has it, so you need to get your head around it. Understanding Scalar Subqueries is a pre-requisite. Sybase syntax, so whack your semi-colons in, etc. Feel free to ask specific questions.
I have never seen that approach of writing Rank = (SELECT.... Is that the same as (SELECT ...) as Rank?
Yes, () AS Rank instead of Rank = () are both legal SQL; MySQL may not like the latter form. The brackets containing the Subquery, of course. Note that Rank is the name of the derived column.
I have already stated that understanding subqueries is prerequisite. That means that millions before you have had this problem, and the lecturers figured out that you would suffer less frustration if you followed the lessons in the prescribed order. So forget RANK for now, and learn subqueries.
Try this (I supply ANSI Standard SQL; I do not have MySQL; you will have to syntax-fix it for MySQL; I don't fix syntax problems; that's your job):
SELECT (SELECT title FROM bb_locations in_ner WHERE out_er.bb_locations_id = in_ner.id) AS Location,
title AS Bulletin,
created_date AS Date
FROM bbs out_er
in_ner and out_er are ALIASES, that is, handles for the table name that it sits next to in the FROM clause; that we use elsewhere in the code for convenience
in_ner is a descriptive name for the table referenced in the Inner Query, the Subquery
out_er is a descriptive name for the table referenced in the Outer Query, which is only Outer because it has an Inner query, otherwise it would be a flat query
we could just as easy use fred and sally
notice the join
I have related the table referenced in the in_ner query to the table referenced in the out_er query
Such a subquery is called a Correlated Subquery
This is just an example, simple, so that you can learn Subqueries; purposely chosen to provide the same result set as one you are familiar with producing, using straight joins (bbs and bb_locations in the FROM clause, joining via the WHERE clause or JOIN syntax).
Because it produces a single value, it is called a Scalar Subquery (those that produce rows are Table Subqueries; and cannot be used like this, to load a single value into each row)
There is no suggestion that anyone should "use Subqueries instead of Joins". Absurd. Subqueries have their place, and Joins have theirs. Misuse is a different thing.
Now, drive that bus. And don't talk to me about RANK until you can drive that bus around every corner in your database neighbourhood without killing any children.
I don't understand inner and outer, when I google them I get INNER JOIN what are they called so I can research further
Aliases. Refer above.
When I run that select statement I get this error You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE inner.Mark >= outer.Mark ) FROM studentmark outer ORDER B' at line 5
first, as per reasons detailed above, I can't write MySQL syntax, and debugging is your job
second, I realise that you can't debug what you can't understand, so drop it for now (it has to do with RANK) and as you learn the MySQL flavour of SQL, all these things will be resolved
third, let me assure you that it runs on any Standard SQL server. It gets used in about 10 courses a year, so hundreds of participants per year. I just ran it again on Sybase, just to check.
first thing I would suggest is, since the MySQL optimiser sucks dead bears; it does not understand context, inner and outer are probably being treated as reserved words. So change that as per the above code.
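INNER and OUTER are reserved words in MySQL, which is consistent with the error quoted above. A sketch of the same idea with the aliases renamed to in_ner and out_er follows, run through Python's sqlite3 module. The StudentMark columns and rows are hypothetical, the column is named MarkRank to stay clear of keywords, and the "count of students with a higher mark, plus one" formulation is an assumption based on the hint in (2.3); it is not necessarily the exact query from the tutorial.

```python
import sqlite3

# Hypothetical table contents; only the table name appears in the error message.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentMark (StudentId INTEGER PRIMARY KEY, Mark INTEGER NOT NULL);
    INSERT INTO StudentMark VALUES (1, 80);
    INSERT INTO StudentMark VALUES (2, 90);
    INSERT INTO StudentMark VALUES (3, 80);
    INSERT INTO StudentMark VALUES (4, 60);
""")

# Rank = number of students with a strictly higher mark, plus one.
# Aliases are in_ner / out_er rather than the reserved words inner / outer.
ranked = conn.execute("""
    SELECT (SELECT COUNT(*) + 1
            FROM StudentMark in_ner
            WHERE in_ner.Mark > out_er.Mark) AS MarkRank,
           out_er.StudentId,
           out_er.Mark
    FROM StudentMark out_er
    ORDER BY 1, 2
""").fetchall()

print(ranked)
```

Students with equal marks share a rank, and the next rank is skipped, which is the usual competition-style ranking.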
Part II
Continuation of Part I, due to that Answer reaching maximum length.
Revised 14 Jan 11 - 05:40 PST
Comments re 11 Dec 10 13:14, Fifth Data Model and Responses
a. IDEF1X Design/Diagramming tool.
I do not know of any freeware options. The MySQL design tool reportedly crashes often. If you are happy with my diagrams, I am happy to work with you for the duration, until the final model is resolved; ie. I provide the Data Model, and you can skip that task. For ongoing work, yes, you need a diagramming tool, perhaps not a database design tool. Refer my comments at the end of p2 in the Notation doc.
.
29. Are you clear about the PKs and FKs in each table, as per the coloured tabs in the Fifth Data Model; can I remove the tabs now ?
.
38. Closed.
.
39. All the Dtm columns will be MySQL DATETIME datatypes. The variables you use for those columns should be the same. TimeStamp has a different meaning. Using the correct Datatypes is the first (big) step towards ensuring that the data is correct and no illegal values are allowed to enter the db. Ie, only valid dates and times will be allowed. Further, you can interrogate any date or time component (eg. month or day name) from it. Check this document.
.
40. No Problem. Instead of having just the one category hardcoded, how about (like your handling of Permission), we implement a Category.IsRestricted and then Permission 5 becomes Post Restricted Bulletins.
.
41. Done.
You should think about doing the same for Category and User. You want to be able to delete them by setting the Indicator, without removing the entry (and all the Bulletins, Responses, Replies, etc) from the database. It has to be retained for historical purposes, but you need to disallow the User from logging in and doing anything. I have included this in the DM.
For such column names and Booleans in general, personally I prefer to identify the minority or exception case, as in IsObsolete.
20.2. Done. Table and column naming now progressed to InnoDB format.
.
Subquery responses in Subquery Answer.
Comments 13 Dec 10 13:14 EST and Responses
.
41. See (41) and next para, above.
.
42. I meant: either Title or Description is enough; we do not need both.
.
43.1. Implementation of Data Model. Go ahead. That's why I gave you the Physical yesterday.
.
43.2. Design/Drawing. Go ahead. I have already commented in (a) above.
Data Model
Sixth Data Model supplied, containing all changes as per above.
The Physical means a lot more detail required for implementation/coding: Datatypes; n::n Relations implemented as Associative tables; etc. You are pretty much ready to implement the Data Model, which means you need the Physical. And you already have the Associative tables figured out. Therefore I have taken the liberty of providing you with the Physical DM, even though you said you were in no hurry.
Note that Domains (User Defined Datatypes) should always be used in a database, both for the DDL and for the $variables you use. And a private Domain for each Primary Key. But this is not possible in MySQL, therefore the Datatypes are raw, regrettably.
Fixed length columns are much faster than variable length; I do not provide (advise) Var length. You are free to implement what you like.
Are you sure you need both Category.Title and Description ? I think not, but I have left it in until you confirm.
Enjoy the little blue glass buttons, and the navigation from the Collapsed Entities.
Please read the IDEF1X Notation document again, I expanded it last week.
Depending on how the Open Issues close, and any issues you may have, we can progress another edition, in the next day/night.
Comments 28 Dec 10 10:34 and Response
I have begun implementing the data model. I assume that the 6th data model is the physical model because it contains the associative tables.
Yes, I supplied that, and the Datatypes, because you said you were ready to implement.
There are still a few minor outstanding items. May be a good time to go through your question; all three of my answers, and check. Category.Title and Description, for instance.
I will put up a database dump once I am done.
That is not necessary, given that the model has the Datatypes defined; but if you do post it, sure, I will check it for you. Email may be better.
I will then put up a list of all the queries that I need to run on the database and begin writing them.
Very good idea, to take a structured and planned approach to the job.
Implementation of Physical model
(39) With mysql I am not able to assign more than one primary key so I am just going to make them unique and not null as you suggest in the documentation. Do you think it would be a good idea to index them as well?
Not sure what you mean, what is "them" ?:
you can never have more than one Primary Key on a table; the Alternate Keys are Unique, one of them is "primary"; that is carried as FK in the child tables.
with InnoDB (what you said you will get), you can define PRIMARY KEY constraints (which is equivalent to UNIQUE, NOT NULL)
with MyISAM (what you have now), you need an Index, UNIQUE, NOT NULL for the Primary Key (above the line in the model)
for either InnoDB or MyISAM, each Alternate Key (AKx[.y] in the model) must be defined as an additional index, UNIQUE, NOT NULL.
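A small sketch of the Primary Key plus Alternate Key arrangement described above, using Python's sqlite3 module as a stand-in for MySQL. The category.CategoryCode column comes from the model; treating Title as an Alternate Key, the index name category_AK1, and the sample values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        CategoryCode CHAR(4)  NOT NULL,
        Title        CHAR(30) NOT NULL,
        PRIMARY KEY (CategoryCode)          -- the one Primary Key
    );
    -- Alternate Key (assumed here to be Title): an additional UNIQUE index
    -- on NOT NULL column(s).
    CREATE UNIQUE INDEX category_AK1 ON category (Title);
""")

conn.execute("INSERT INTO category VALUES ('TRVL', 'Travel')")

# A second row with a different PK but the same Alternate Key must be rejected.
try:
    conn.execute("INSERT INTO category VALUES ('TRV2', 'Travel')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```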
Comments 07 Jan 11 14:08 and Response
(40) Could you explain why the category.CategoryCode is a char of 4 characters. Why not just use an number like we do for user?
40.1. The idea is to use good natural Identifiers. Numbers are meaningless to users. If we didn't have a large no of Users, and User churn, I would not have used a number there either. A CHAR(2) or (3) or (4) allows them to pick meaningful short code for the long Category.Description, and it is small enough to be carried as a Foreign Key in user_category and bulletin.
For the developer, when testing and debugging, that short code in a list of say bulletins, will be very handy.
(40) I don't quite understand permission for category and location. Lets say that I want all users to be able to post to the Travel category. Would I set the permission of this category at 4? Why do we need to bool IsRestricted when we are giving a permission to the category and location?
40.2. I have not changed the concept or essence of Permission; it was your idea, and it remains exactly as you explained it to me.
(All I did was implement permission as a table.)
40.3. Refer (13) and (40) original exchange. category.IsRestricted defines restricted categories; there are two categories of categories, Restricted and Unrestricted. The users need a permission of 5 to post bulletins re Restricted categories, and 4 to post bulletins re Unrestricted categories.
40.4. But whoa, son, are you introducing a change or extension; eg. match the permission of the user to the permission of the category, thus allowing far more than two categories of categories ? Please don't. That would mean permission means one thing re category and a different thing re the rest of the system. Or if you do, then we have to resolve the exact need first, then implement it as a change.
40.5. Location (now office) is exactly the same for that bit (40.2) and (40.3). If you are referring to the text at the bottom, it is my small mistake, will correct it.
40.6. As per (14), office additionally has a single AdministratorId (UserId). Shown on the model as (permitted) user.
40.7. But that brings up an issue: who can administer categories ? Right now anyone with permission 5 or greater, which is a different thing. I think we need something explicit, a permission 6 = Administer Category.
Other
Processed your DDL and returned.
Data Model Updated. Number of small clarifications and two minor errors corrected.
Comments 08 Jan 11 14:08 and Response
(I think that was 09 Jan not 08 Jan ... I did check for updates.)
(40) I don't quite understand permission for category and location. I haven't changed anything. Disregard the content of the previous question and explain what permission would be set to allow users to post to the 'Travel' Category. In my implementation I simply had a permission column. If a given user had the required permission or greater then they could post to a category, is that how the new system works?
Yes. Unchanged. category.permission has nothing to do with it. They need user.permission 4 for unrestricted categories.
If the category.IsRestricted, they need user.permission 5.
Quite separately, an user needs user.Permission of whatever category.Permission is, in order to administer category. Do not use values less than 4.
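The rules in (40.3) and above can be sketched as two small checks. The function and constant names are hypothetical, and "required permission or greater" is taken from the question; the values 4 and 5 are the ones stated in this answer.

```python
POST_UNRESTRICTED = 4  # permission needed to post in an unrestricted category
POST_RESTRICTED = 5    # permission needed when category.IsRestricted is set


def can_post(user_permission: int, category_is_restricted: bool) -> bool:
    """Posting depends on user.Permission and category.IsRestricted only."""
    required = POST_RESTRICTED if category_is_restricted else POST_UNRESTRICTED
    return user_permission >= required


def can_administer(user_permission: int, category_permission: int) -> bool:
    """Administering a category is a separate check against category.Permission."""
    return user_permission >= category_permission


print(can_post(4, False), can_post(4, True), can_post(5, True))
```

So for the 'Travel' example: if Travel is unrestricted, any user with permission 4 or greater can post to it, whatever category.Permission holds.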
(41) Query Delete a bulletin and all its associated replies and ratings.
I did not expect that, are you sure they have no need to keep all past bulletins for historical or audit purposes ?
Anyway, let's deal with that on the basis that deleting bulletins is allowed ...
I don't even know where to get started on this. In the past I would have joined up the bulletin table with the response table and the response table with the response rating table and the bulletin-rating table where the bulletin id = x and deleted them. But now any one particular bulletin is identified by three columns: the OfficeCode, IssuerId and BulletinDtm, which are carried to the child tables as foreign keys. For a start, how do I indicate which bulletin is to be deleted in my php? Usually I would have a link like this index.php?action=delete&bulletin-id=5. Now will I have to have a link that is index.php?action=delete&OfficeCode=20001&IssuerId=34&BulletinDtm=14:02... I really have no clue how to do this?
a. I can't help you there, I am a database and SQL expert, not a php or MyISAM expert. You will need to post that as a new question on SO or the MySQL boards.
b. As far as my reading of that subject has taken me, I don't know the syntax, but yes, it can be done, it is normal. I checked before recommending composite keys to you. The corrected DDL succeeded, and the indices are confirmed, correct ?
c. The issue is simply the syntax required for composite or compound keys; and working with index.php. Something like:
index.php? action=delete & OfficeCode=x and IssuerId=y and BulletinDtm=z
d. Why can't you use mysql_query instead of index.php and thus use full SQL ? As I understand it, it works with MyISAM. Then you can use:
`$sql = "DELETE FROM $table WHERE OfficeCode=$OfficeCode AND IssuerId=$IssuerId AND BulletinDtm='$BulletinDtm'";`
e. Delete response_rating first; then response; then bulletin_rating; then bulletin. When they switch to InnoDB they will have less changes to make.
f. Most important, you will have to get them to identify the basis on which a bulletin can be deleted. Not any and all bulletins should be deleted. Something like "no activity for one year" or "closed" (which means an added column), etc.
Comments 10 Jan 11 14:08 EST and Response
(41.10-Jan-11) No problem, the method is fine, and I have detailed related issues which need address under (41.*) above. (41.f) still needs an answer ... other than permissions, is there any basis for deleting bulletins ?
Comments 10 Jan 11 13:48 pst and Response
SO Editing. Don't worry, it is not you. The site is of poor technical quality. The editing is hopeless (and believe me, I have tried to work with it and around it, to make my Answers appear even somewhat like I want them to appear). It cannot handle indents or more than one level of numbering correctly.
Delete Basis. Ok, you have a valid basis. And the users who wrote responses would not mind if they were deleted without being asked ?
(41) What you are looking for is a "cascading delete" in Standard SQL, which is defined in the Foreign Key clause (which you do not have in MyISAM). Each INSERT/UPDATE/DELETE verb applies to one table only, and may affect other tables by REFERENCE.
For non-standard SQLs, you have the DELETE multiple_table method (non-standard syntax).
First, it is very important to understand this, before anything else. The FROM and WHERE (or JOIN) clauses in a DELETE command are separate to the DELETE itself; they are in fact a SELECT. The idea is: DELETE table_one (SELECT FROM table_one, table_two WHERE join_conditions).
Therefore:
name the four tables in the DELETE (target)
name the four tables in the FROM (how you find them, via SELECT)
ensure you have the correct (complete) JOIN clauses for the four tables; which you can test via a SELECT
which means, JOIN ON OfficeCode, IssuerId, BulletinDtm (the bulletin PK affecting the three child tables)
use NATURAL or INNER joins, not left joins (be explicit, do not mix them up, as a general rule)
ensure the WHERE identifies the specific bulletin composite Primary Key to be deleted.
Here's a link to the DELETE syntax and the JOIN syntax.
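The "test it via a SELECT first" step can be sketched like this, using Python's sqlite3 module. Only two of the four tables are shown, and the Title and ResponderId columns and all data values are hypothetical; the three-column key (OfficeCode, IssuerId, BulletinDtm) is the one from the model. The join is an explicit INNER join on the full composite key, and the WHERE pins down the one bulletin.

```python
import sqlite3

# Hypothetical cut-down tables; only the three-column key comes from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bulletin (
        OfficeCode TEXT NOT NULL, IssuerId INTEGER NOT NULL, BulletinDtm TEXT NOT NULL,
        Title TEXT NOT NULL,
        PRIMARY KEY (OfficeCode, IssuerId, BulletinDtm)
    );
    CREATE TABLE response (
        OfficeCode TEXT NOT NULL, IssuerId INTEGER NOT NULL, BulletinDtm TEXT NOT NULL,
        ResponderId INTEGER NOT NULL,
        PRIMARY KEY (OfficeCode, IssuerId, BulletinDtm, ResponderId)
    );
    INSERT INTO bulletin VALUES ('20001', 34, '2011-01-10 14:02:00', 'Car pool');
    INSERT INTO bulletin VALUES ('20001', 35, '2011-01-11 09:00:00', 'Lost keys');
    INSERT INTO response VALUES ('20001', 34, '2011-01-10 14:02:00', 8);
    INSERT INTO response VALUES ('20001', 34, '2011-01-10 14:02:00', 9);
    INSERT INTO response VALUES ('20001', 35, '2011-01-11 09:00:00', 8);
""")

# The composite Primary Key of the bulletin to be deleted, bound as parameters.
target = ("20001", 34, "2011-01-10 14:02:00")

# Preview the rows the join finds before converting the SELECT to a DELETE.
preview = conn.execute("""
    SELECT b.Title, r.ResponderId
    FROM bulletin b
    INNER JOIN response r
        ON  r.OfficeCode  = b.OfficeCode
        AND r.IssuerId    = b.IssuerId
        AND r.BulletinDtm = b.BulletinDtm
    WHERE b.OfficeCode = ? AND b.IssuerId = ? AND b.BulletinDtm = ?
    ORDER BY r.ResponderId
""", target).fetchall()

print(preview)
```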
Comments 12 Jan 11 21:48 and Response
a. Don't be lazy. Write four delete statements, bottom up. That's what we have to do in the big end of town, where we do not have "cascading deletes". Write the delete for rating_response; then copy-and-paste, and delete one line of code each time. I do not understand the angst or avoidance.
b. I repeat, do not use left, right, or any kind of outer join (which is only required for the single all-encompassing delete). Use straight inner joins only (which is not a problem with 4 delete statements). Any and all upset you are experiencing is due to your need to use one delete. Give that up, and the upset and complication disappear.
I just wrote this code for another question. That is a single SELECT command. Three-column PK times four Subqueries. I do not understand the need to avoid long (demanded, again, due to SQL being cumbersome) commands. And I didn't even use the JOIN syntax. Took me all of ten mins to write, plus five mins to test. What exactly, is the big deal ?
c. You have not forgotten the power of Relational keys, that you recognised some weeks ago, have you. Eg. ability to grab bulletin from rating_response, without having to join with rating. If you succumb to your single-column-key desires, you will lose all that. SQL is cumbersome. But that is all we have. Deal with it. The non-SQLs try to "make life easy" but in fact, introduce all sorts of unnecessary and avoidable complications. Case in point.
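The four-statement approach in (a) can be sketched as below, with Python's sqlite3 module standing in for PHP and MySQL. The table names follow (41.e); the extra ResponderId and RaterId columns and all data values are hypothetical. The same three-column WHERE is reused for every table, children first, and the key values are bound as parameters rather than pasted into the SQL string.

```python
import sqlite3

KEY = "OfficeCode TEXT NOT NULL, IssuerId INTEGER NOT NULL, BulletinDtm TEXT NOT NULL"

# Child tables first, parent last: the order in which rows must be deleted.
TABLES = {
    "response_rating": ", ResponderId INTEGER NOT NULL, RaterId INTEGER NOT NULL",
    "response": ", ResponderId INTEGER NOT NULL",
    "bulletin_rating": ", RaterId INTEGER NOT NULL",
    "bulletin": "",
}

conn = sqlite3.connect(":memory:")
for table, extra_columns in TABLES.items():
    conn.execute(f"CREATE TABLE {table} ({KEY}{extra_columns})")

# Two hypothetical bulletins, each with one row in every child table.
doomed = ("20001", 34, "2011-01-10 14:02:00")
kept = ("20001", 35, "2011-01-11 09:00:00")
for key in (doomed, kept):
    conn.execute("INSERT INTO response_rating VALUES (?, ?, ?, 8, 9)", key)
    conn.execute("INSERT INTO response VALUES (?, ?, ?, 8)", key)
    conn.execute("INSERT INTO bulletin_rating VALUES (?, ?, ?, 7)", key)
    conn.execute("INSERT INTO bulletin VALUES (?, ?, ?)", key)

where_key = "WHERE OfficeCode = ? AND IssuerId = ? AND BulletinDtm = ?"

# Four deletes, bottom up. The transaction is an addition that only helps on an
# engine that supports it (InnoDB, not MyISAM).
with conn:
    for table in TABLES:
        conn.execute(f"DELETE FROM {table} {where_key}", doomed)

remaining = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in TABLES}
print(remaining)
```

Each statement is the previous one with the table name changed, which is the copy-and-paste exercise described in (a); no outer joins are involved anywhere.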
Comments of 13 Jan 2011 21:18 PST and Response
Deletes. Three flavours. Great. Hopefully you will have the data values in $variables, so there will not be that form of repetition. For testing, that is fine.
Delete x Four Tables. (not "Individually deleting records", which is a different thing altogether; each delete except the last could net hundreds of rows). I trust the cut-and-paste took seconds. You need to be careful about forgetting to change the table names.
Single Delete Command. $variables for the first triplet. You could use column names in all but the first triplet.
Ok, so you will convert the SELECT to a DELETE, after testing. Left Joins. Required for the Single Delete but not otherwise. That's identical to (2) with the WHERE replaced with JOIN.
I've already recommended (1), but you are more likely to go with (3).
Don't be afraid of joins. If I were you I would cut down on all the DB logic you need to write and use an ORM like Doctrine or Propel, it will make things infinitely easier to design and maintain - including all those joins you're trying to avoid.
