I'm building an API, but I'm afraid of getting the DB design wrong. I'm practicing with an address book, where an employee can have addresses (home, work, other). So is this a many-to-many relationship?
Is my DB design correct? I created a compound (bridge) table for flexibility.
Are ON DELETE and ON UPDATE important here? How do I set them so that when an employee is removed, we don't keep their records in the other two tables?
First off, I feel compelled to point out that SO is not really the place for this; it wouldn't surprise me if there were a site or board just for databases. A lot of this is personal preference and opinion.
Probably something like this would be a more appropriate place:
https://dba.stackexchange.com/
That said:
I would change the PK in each table to just id, so the address_type key would simply be address_type.id, and the same for person. Writing person.person_id is just redundant.
The IDs should be INT(10) UNSIGNED AUTO_INCREMENT. The (10) is only a display width, not a limit: an unsigned INT holds values up to 4,294,967,295, which is ten digits. You can't have negative IDs, so the DB should enforce this. I use 10 because MySQL displays a signed INT as INT(11), which keeps a place for the sign. It's not really necessary, but I do it out of habit for any unsigned int.
I would pluralize the bridge table: persons_addresses. Records in person or address each describe one entity, while records in the bridge table relate multiple entities. That makes it easier for me to tell it's a bridge table: all other tables are singular, bridge tables are plural.
The main thing with naming conventions is to be consistent. If you use {table}_id for your IDs, then name them all that way. If you name a table person, don't name another one zipcodes. The same goes for column names: if you use person_id, then don't have columns like FullName, fullName or Full_name. Pick a convention and stick to it; writing code is much easier when you know ahead of time that table names will be singular. As I said, I like the plural for bridge tables, since you would seldom use them by themselves.
As for the relationship: you would still have to delete person and address rows separately, but the rows in persons_addresses would be updated or deleted automatically if you set their foreign keys to CASCADE. I think of it this way: the table that defines the relationship is the one that receives the changes.
This is how it should be, though. Imagine two person records share the same address. If you delete one person, you don't want the address deleted for both of them. You also probably would not want a person deleted if their address was deleted. So at most it should be:
person > persons_addresses > address
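Here is a minimal sketch of those cascade rules. It uses Python's built-in sqlite3 module instead of MySQL, so it runs without a server. The table names follow the answer's suggestions, but the exact columns are assumptions, because the original schema isn't shown. Deleting a person removes only that person's bridge rows. The shared address survives, and it cannot be deleted while someone still references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE person       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address      (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE address_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- The bridge table holds the foreign keys, so it receives the cascaded changes.
CREATE TABLE persons_addresses (
    person_id       INTEGER NOT NULL REFERENCES person(id)
                        ON DELETE CASCADE ON UPDATE CASCADE,
    address_id      INTEGER NOT NULL REFERENCES address(id)
                        ON DELETE RESTRICT ON UPDATE CASCADE,
    address_type_id INTEGER NOT NULL REFERENCES address_type(id),
    PRIMARY KEY (person_id, address_id, address_type_id)
);
INSERT INTO person       VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO address      VALUES (10, '1 Shared St');
INSERT INTO address_type VALUES (1, 'home'), (2, 'work'), (3, 'other');
-- Alice and Bob share the same home address.
INSERT INTO persons_addresses VALUES (1, 10, 1), (2, 10, 1);
""")

# Deleting Alice cascades to her bridge row only; the address stays.
conn.execute("DELETE FROM person WHERE id = 1")
print(conn.execute("SELECT person_id, address_id FROM persons_addresses").fetchall())  # [(2, 10)]
print(conn.execute("SELECT id FROM address").fetchall())  # [(10,)]

# Deleting an address that is still in use is refused by RESTRICT.
try:
    conn.execute("DELETE FROM address WHERE id = 10")
except sqlite3.IntegrityError as exc:
    print("refused:", exc)
```

In MySQL the same ON DELETE / ON UPDATE clauses go on the FOREIGN KEY definitions of InnoDB tables. RESTRICT on address_id is one reasonable choice, not the only one.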
I am not sure whether there is an automatic way to delete the address once it has no records left in the bridge table. I've always done it manually, but you could use a trigger if there isn't a better way.
For reference:
A trigger is a named database object that is associated with a table, and that activates when a particular event occurs for the table.
https://dev.mysql.com/doc/refman/5.7/en/triggers.html
To be honest, I've never done it for that. Also, MySQL triggers are not activated by cascaded foreign key actions, only by SQL statements, so it may be better to handle the delete from person entirely with a trigger. You would delete a person, and the trigger would fire and check whether anyone else uses the address. If not, it deletes both the persons_addresses record and the address record; if so, it deletes only the persons_addresses record.
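Here is a sketch of that trigger-only approach. It again uses SQLite rather than MySQL, and trigger syntax differs between the two: MySQL needs FOR EACH ROW and usually a DELIMITER change in the command-line client. So treat it as an outline, with hypothetical table and trigger names. The trigger on person issues an ordinary DELETE against the bridge table. A second trigger on the bridge table then removes an address once nothing references it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE persons_addresses (
    person_id  INTEGER NOT NULL REFERENCES person(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    PRIMARY KEY (person_id, address_id)
);
-- Step 1: before a person is deleted, remove their bridge rows with a
-- plain DELETE statement, which in turn fires the bridge-table trigger.
CREATE TRIGGER person_before_delete BEFORE DELETE ON person
BEGIN
    DELETE FROM persons_addresses WHERE person_id = OLD.id;
END;
-- Step 2: after a bridge row is deleted, delete the address only if
-- no other person still uses it.
CREATE TRIGGER persons_addresses_after_delete AFTER DELETE ON persons_addresses
BEGIN
    DELETE FROM address
    WHERE id = OLD.address_id
      AND NOT EXISTS (SELECT 1 FROM persons_addresses
                      WHERE address_id = OLD.address_id);
END;
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO person  VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO address VALUES (10, '1 Shared St'), (20, '2 Alice Ave');
-- Both people use address 10; only Alice uses address 20.
INSERT INTO persons_addresses VALUES (1, 10), (1, 20), (2, 10);
""")

conn.execute("DELETE FROM person WHERE id = 1")
print(conn.execute("SELECT id FROM address ORDER BY id").fetchall())  # [(10,)]
```

Address 20 disappears with Alice. Address 10 stays, because Bob still points at it.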
One other thing I would do is break the address down to have a separate zipcode. At my work we purchased a DB table with all the US zipcodes, which contains the city, state, county, zip (of course), latitude and longitude.
By using that, our address table has a many-to-one relationship to zipcode: one zipcode can have many addresses associated with it. We also break that down by state using a state table. So it becomes:
address
id | street | street2 | zipcode_id
zipcode
id | city | state_id | county | zip | latitude | longitude
state
id | name | abbreviation
Then when users enter a zipcode it shows an auto-complete with all that information in it.
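Here is a sketch of that three-table layout plus the kind of prefix lookup an autocomplete could call. It uses SQLite via Python. The function name suggest_zip is hypothetical, and the sample rows are illustrative, with coordinates left NULL. In MySQL, an index on zip lets a LIKE 'prefix%' lookup avoid a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state   (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      abbreviation TEXT NOT NULL);
CREATE TABLE zipcode (id INTEGER PRIMARY KEY, city TEXT NOT NULL,
                      state_id INTEGER NOT NULL REFERENCES state(id),
                      county TEXT, zip TEXT NOT NULL,
                      latitude REAL, longitude REAL);
-- Many addresses point at one zipcode row.
CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL,
                      street2 TEXT,
                      zipcode_id INTEGER NOT NULL REFERENCES zipcode(id));
CREATE INDEX idx_zipcode_zip ON zipcode(zip);

-- Illustrative rows only; coordinates left NULL here.
INSERT INTO state   VALUES (1, 'California', 'CA'), (2, 'New York', 'NY');
INSERT INTO zipcode VALUES (1, 'Beverly Hills', 1, 'Los Angeles', '90210', NULL, NULL),
                           (2, 'New York', 2, 'New York', '10001', NULL, NULL);
""")

def suggest_zip(prefix, limit=10):
    """Autocomplete rows (zip, city, state abbreviation, county) for a typed prefix."""
    return conn.execute(
        """SELECT z.zip, z.city, s.abbreviation, z.county
           FROM zipcode z JOIN state s ON s.id = z.state_id
           WHERE z.zip LIKE ?
           ORDER BY z.zip LIMIT ?""",
        (prefix + "%", limit)).fetchall()

print(suggest_zip("902"))  # [('90210', 'Beverly Hills', 'CA', 'Los Angeles')]
```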
The final thing we do is normalize abbreviations such as ST, N, and NW. We chose to expand them to the full word, so ST becomes STREET when saved. We went that way because abbreviating instead would turn an address like 187 NORTH PARK into 187 N PARK, which is far worse than 187 PARK NE becoming 187 PARK NORTH EAST. You would be amazed at the variation in addresses, what I call the "dirt" or "dirtiness".
All of this combined removes a lot of errors. But as I said in the comments, we deal with lawsuit data, so we need more accuracy, and thus more complexity, than just an address book.
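Here is a minimal sketch of that "expand on save" rule. The abbreviation table is a hypothetical, deliberately tiny stand-in. It splits on whitespace after dropping periods and commas, then swaps each known abbreviation token for its full word.

```python
import re

# Hypothetical, deliberately tiny abbreviation table; a production list
# would be far longer and would need context rules (e.g. "ST" can mean SAINT).
ABBREVIATIONS = {
    "ST": "STREET", "AVE": "AVENUE", "RD": "ROAD", "BLVD": "BOULEVARD",
    "N": "NORTH", "S": "SOUTH", "E": "EAST", "W": "WEST",
    "NE": "NORTH EAST", "NW": "NORTH WEST", "SE": "SOUTH EAST", "SW": "SOUTH WEST",
}

def normalize_street(raw: str) -> str:
    """Upper-case, turn periods/commas into spaces, expand known abbreviations."""
    tokens = re.sub(r"[.,]", " ", raw.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

print(normalize_street("187 Park NE"))  # 187 PARK NORTH EAST
print(normalize_street("187 n. park"))  # 187 NORTH PARK
print(normalize_street("42 Elm St."))   # 42 ELM STREET
```

Token-by-token replacement is the simplest version. It will misread cases like "ST JAMES ST", which is part of the "dirt" a real cleaner has to handle.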
Related
I am not sure how to formulate the question, so I didn't find anything useful enough regarding my problem.
1. In PRO(projects) I have 3 columns: "organizer", "partners", "other". I want to use information from table ORG(organisations) in all of these 3. Also, I need to show more than one partner. Is it possible? For example, I have
org:
name |country|city |
apple |shop |fruits part |
cherry|plate |big |
orange|plate |little |
banana|shop |frozen fruits|
I want to show in view.php:
All projects:
name |organizer|partner |other |place |
salad |banana |apple,cherry |orange |plate,little|
salad2|banana |apple |orange |plate |
The place info in PRO comes from two tables, country and city, but country and city are also used by ORG. An organisation's country doesn't necessarily equal the project's country (for example, a project takes place in London, but none of the participants' organisations are based in London).
Are all of these things doable with what I already have?
I get a "circular" relationship because country/city are used twice. Is that allowed? (My teacher said it isn't, or that it should be avoided, I don't remember which, and I got different opinions from the web.)
If you want to add multiple partners, there are really two ways to do it.
1. You could add a table to store partner information, with a column containing a foreign key to the primary key of the project the partner is assigned to.
(This would be best if a partner is never assigned to multiple projects.)
2. This is really an extension of #1, except that if you need projects assigned to multiple partners and partners assigned to multiple projects, you could add another table with only two columns: the first a foreign key to a project's primary key, the second a foreign key to a partner's primary key. This way, if you want to query for all of the partners assigned to a project, you could do:
SELECT * FROM Partners_Table WHERE id IN (SELECT partner_id FROM CrossReference_Table WHERE project_id = 1);
You can do the same the other way around too.
Your country fields should be foreign keys to your country table, and the same with city. Because your projects are not necessarily in the same city as their respective organizations, they should not be linked to the organization's city and country fields. Everything else looks good from what I can tell.
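Here is a sketch that combines both suggestions, using SQLite via Python and the fruit sample data from the question. The table names country, city, org, project and project_partner, and their columns, are assumptions. To keep the sketch short, a city row carries its country, so org and project each store only a city_id. Organisations and projects reference city independently, which is the "circle" the question asks about. This sketch treats it as acceptable, since the two references record independent facts. The partner list is joined in Python. In MySQL, GROUP_CONCAT(o.name ORDER BY o.name SEPARATOR ',') can do the same inside the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE city    (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      country_id INTEGER NOT NULL REFERENCES country(id));
-- Organisations and projects each point at city on their own,
-- so a project's place never depends on any organisation's place.
CREATE TABLE org     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      city_id INTEGER REFERENCES city(id));
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      organizer_id INTEGER REFERENCES org(id),
                      other_id     INTEGER REFERENCES org(id),
                      city_id      INTEGER REFERENCES city(id));
-- Bridge table: one row per (project, partner organisation).
CREATE TABLE project_partner (
    project_id INTEGER NOT NULL REFERENCES project(id),
    org_id     INTEGER NOT NULL REFERENCES org(id),
    PRIMARY KEY (project_id, org_id));

INSERT INTO country VALUES (1, 'shop'), (2, 'plate');
INSERT INTO city VALUES (1, 'fruits part', 1), (2, 'big', 2),
                        (3, 'little', 2), (4, 'frozen fruits', 1);
INSERT INTO org VALUES (1, 'apple', 1), (2, 'cherry', 2),
                       (3, 'orange', 3), (4, 'banana', 4);
INSERT INTO project VALUES (1, 'salad', 4, 3, 3);
INSERT INTO project_partner VALUES (1, 1), (1, 2);
""")

def partners_of(project_id):
    # IN (not =) because the subquery can return several partner ids.
    rows = conn.execute(
        "SELECT name FROM org WHERE id IN "
        "(SELECT org_id FROM project_partner WHERE project_id = ?) "
        "ORDER BY name", (project_id,))
    return [name for (name,) in rows]

def project_row(project_id):
    # org is joined twice (organizer, other) under different aliases.
    name, organizer, other, city, country = conn.execute("""
        SELECT p.name, o1.name, o2.name, ci.name, co.name
        FROM project p
        JOIN org o1 ON o1.id = p.organizer_id
        JOIN org o2 ON o2.id = p.other_id
        JOIN city ci ON ci.id = p.city_id
        JOIN country co ON co.id = ci.country_id
        WHERE p.id = ?""", (project_id,)).fetchone()
    return (name, organizer, ",".join(partners_of(project_id)), other,
            f"{country},{city}")

print(project_row(1))  # ('salad', 'banana', 'apple,cherry', 'orange', 'plate,little')
```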
I am developing a web portal that will store job requirements like experience, salary, etc. in a database, and whenever any user (new or existing) matches those criteria, the job should be displayed on their dashboard after they log in.
My Columns in employees are
Age, City, Industry, Marital status.
So, when admin post the jobs, he will define the criteria which user can see this. For ex. Age between 20-30, City only Mumbai like that.
How do I store this information in the database efficiently?
I am using PHP/MySQL.
You would ideally create a table with the user's:
Unique ID
Name
Marital Status
City
Age
Create a second table to pair industry and UUID, like so:
Unique ID
Industry
This is so that a given user can belong to more than a single industry.
Third, create a table to pair user IDs and experience:
Unique ID
Position
Industry
Start date
End date
Since industry and experience are data which a given user can possess an arbitrary quantity of, you need to abstract the data into its own tables. Don't try representing all of this information in a single table - it's a solution that scales poorly past a single employer.
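Here is a sketch of those three tables in SQLite via Python, with a query that applies one job's criteria from the question (age between 20 and 30, city Mumbai) plus an industry filter. The table name users, the function matching_users, and the sample users are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    marital_status TEXT, city TEXT, age INTEGER);
-- One row per (user, industry): a user can belong to several industries.
CREATE TABLE user_industry (
    user_id  INTEGER NOT NULL REFERENCES users(id),
    industry TEXT NOT NULL,
    PRIMARY KEY (user_id, industry));
-- One row per position held.
CREATE TABLE experience (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    position TEXT, industry TEXT, start_date TEXT, end_date TEXT);

-- Hypothetical sample users.
INSERT INTO users VALUES (1, 'Asha', 'single', 'Mumbai', 28),
                         (2, 'Ravi', 'married', 'Mumbai', 35),
                         (3, 'Meera', 'single', 'Pune', 25);
INSERT INTO user_industry VALUES (1, 'IT'), (1, 'Finance'), (2, 'IT'), (3, 'IT');
""")

def matching_users(min_age, max_age, city, industry):
    """Names of users meeting one job's criteria."""
    rows = conn.execute(
        """SELECT u.name FROM users u
           WHERE u.age BETWEEN ? AND ?
             AND u.city = ?
             AND EXISTS (SELECT 1 FROM user_industry ui
                         WHERE ui.user_id = u.id AND ui.industry = ?)
           ORDER BY u.name""",
        (min_age, max_age, city, industry))
    return [name for (name,) in rows]

print(matching_users(20, 30, "Mumbai", "IT"))  # ['Asha']
```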
I'd also like to note that if your application is going to be deployed in the United States and several other nations, it's actually illegal for employers to discriminate based on age and marital status. I'm assuming this doesn't apply to you, but there it is.
In terms of speeding up your lookups, the most important thing is to make sure you index the columns you will be searching against.
So for instance, if you want to do a search based on someone's start date:
like:
select * from tablename where start_date > 'some date';
then it's very important that you index the start_date column on the 'tablename' table.
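Here is a sketch of how to check that an index is actually used. It runs in SQLite via Python, where EXPLAIN QUERY PLAN reports whether a query scans the table or searches an index. The column layout and index name are assumptions. The exact plan wording varies between SQLite versions, and MySQL's equivalent is EXPLAIN SELECT ....

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experience (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "position TEXT, start_date TEXT, end_date TEXT)")

# ISO-formatted date strings compare correctly as text.
QUERY = "SELECT * FROM experience WHERE start_date > '2015-01-01'"

def plan(sql):
    """Detail strings from EXPLAIN QUERY PLAN (the last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)   # no index yet: the planner has to scan the table
conn.execute("CREATE INDEX idx_experience_start_date ON experience (start_date)")
after = plan(QUERY)    # the range condition can now use the index
print(before)
print(after)
```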
Apart from making sure your tables are orthogonal, when deciding how best to set up your database you'll want to ask yourself what kinds of questions you will be asking it, and design your tables around those questions.
Say I have a table customers with the following fields and records:
id | first_name | last_name | email                | phone
------------------------------------------------------------------------
1  | Michael    | Turley    | mturley#whatever.com | 555-123-4567
2  | John       | Dohe      | jdoe#whatever.com    |
3  | Jack       | Smith     | jsmith#whatever.com  | 555-555-5555
4  | Johnathan  | Doe       |                      | 123-456-7890
There are several other tables, such as orders, rewards, receipts which have foreign keys customer_id relating to this table's customers.id.
As you can see, in their infinite wisdom, my users have created duplicate records for John Doe, complete with inconsistent spelling and missing data. An administrator notices this, selects customers 2 and 4, and clicks "Merge". They are then prompted to select which value is correct for each field, etc etc and my PHP determines that the merged record should look like this:
id | first_name | last_name | email             | phone
------------------------------------------------------------------------
?  | John       | Doe       | jdoe#whatever.com | 123-456-7890
Let's assume Mr. Doe has placed several orders, earned rewards and generated receipts, but some of these have been associated with id 2 and some with id 4. The merged row needs to match all of the foreign keys in other tables that matched the original rows.
Here's where I'm not sure what to do. My instinct is to do this:
DELETE FROM customers WHERE id = 4;
UPDATE customers
SET first_name = 'John',
last_name = 'Doe',
email = 'jdoe#whatever.com',
phone = '123-456-7890'
WHERE id = 2;
UPDATE orders SET customer_id = 2 WHERE customer_id = 4;
UPDATE rewards SET customer_id = 2 WHERE customer_id = 4;
UPDATE receipts SET customer_id = 2 WHERE customer_id = 4;
I think that would work, but if I later add another table that has a customer_id foreign key, I have to remember to go back and add that table to the UPDATE queries in my merge function, or risk loss of integrity.
There has to be a better way to do this.
I got here from Google; this is my 2 cents:
SELECT `TABLE_NAME`, `COLUMN_NAME`
FROM `information_schema`.`KEY_COLUMN_USAGE`
WHERE REFERENCED_TABLE_SCHEMA = 'DATABASE'
AND REFERENCED_TABLE_NAME = 'customers'
AND REFERENCED_COLUMN_NAME = 'id'
Filter on the schema name too ('DATABASE' above; replace it with yours) as insurance: you never know when somebody copies the database.
Instead of looking for a column name, this looks at the foreign keys themselves.
If you set the ON DELETE actions to RESTRICT, nothing can be deleted before its children are deleted or migrated.
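Here is a sketch of the same idea in SQLite via Python. PRAGMA foreign_key_list stands in for MySQL's information_schema.KEY_COLUMN_USAGE. The merge discovers every column referencing customers.id, re-points child rows first so RESTRICT never blocks the delete, and does it all in one transaction. The helper names are hypothetical, and the sample rows follow the question's John Doe example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                        email TEXT, phone TEXT);
CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customers(id) ON DELETE RESTRICT);
CREATE TABLE rewards  (id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customers(id) ON DELETE RESTRICT);
CREATE TABLE receipts (id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customers(id) ON DELETE RESTRICT);
INSERT INTO customers VALUES (2, 'John', 'Dohe', 'jdoe#whatever.com', NULL),
                             (4, 'Johnathan', 'Doe', NULL, '123-456-7890');
INSERT INTO orders   VALUES (1, 2), (2, 4);
INSERT INTO rewards  VALUES (1, 4);
INSERT INTO receipts VALUES (1, 2);
""")

def referencing_columns(parent, parent_col="id"):
    """(child_table, child_column) pairs whose foreign key points at parent.parent_col."""
    found = []
    tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        # Row layout: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
            if fk[2] == parent and fk[4] == parent_col:
                found.append((table, fk[3]))
    return found

def merge_customers(keep_id, drop_id, first, last, email, phone):
    with conn:  # single transaction: commit on success, roll back on error
        # 1. Re-point every child row first, so RESTRICT never blocks the delete.
        for table, column in referencing_columns("customers"):
            conn.execute(f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" = ?',
                         (keep_id, drop_id))
        # 2. Store the merged values on the surviving row.
        conn.execute("UPDATE customers SET first_name = ?, last_name = ?, "
                     "email = ?, phone = ? WHERE id = ?",
                     (first, last, email, phone, keep_id))
        # 3. Only now remove the duplicate.
        conn.execute("DELETE FROM customers WHERE id = ?", (drop_id,))

merge_customers(2, 4, "John", "Doe", "jdoe#whatever.com", "123-456-7890")
```

A table added later with a proper FOREIGN KEY to customers(id) is picked up automatically. A column that merely happens to be named customer_id without a constraint is not.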
The short answer is, no there isn't a better way (that I can think of).
It's a trade off. If you find there are a lot of these instances, it might be worthwhile to invest some time writing a more robust algorithm for checking existing customers prior to adding a new one (i.e. checking variations on first / last names, presenting them to whoever is adding the customer, asking them 2 or 3 times if they are REALLY sure they want to add this new customer, etc.). If there are not a lot of these instances, it might not be worth investing that time.
Short of that, your approach is the only way I can think of. I would actually delete both records, and create a new one with the merged data, resulting in a new customer id rather than re-using an old one, but that's just personal preference - functionally it's the same as your approach. You still have to remember to go back and modify your merge function to reflect new relationships on the customer.id field.
At a minimum, to prevent any triggers on deletions causing some cascading effect, I would FIRST do
update SomeTable set CustomerID = CorrectValue where CustomerID = WrongValue
(do that across all tables)...
THEN
Delete from Customers where CustomerID = WrongValue
As for duplicate data, try figuring out which of "Will Smith", "Bill Smith" and "William Smith" are the same person when you lack certain information. Some could be completely legitimate different people.
As an update to my comment:
use information_schema;
select table_name from columns where column_name = 'customer_id';
Then loop through the resulting tables and update accordingly.
Personally, I would use your instinctive solution, as this one may be dangerous if there are tables containing customer_id columns that need to be exempt.
I am trying to figure out the best method to relate country, region and town tables.
On my website I want the user to be able to just enter a town. Then optionally country and region, both of which will be required to be entered or not at all.
Currently my tables are as such
tbl>User has townID (FK)
tbl>town has id(PK) townName regionID(FK DEFAULT NULL)
tbl>region has id(PK) regionName countryID(FK NOT NULL)
tbl>country has id(PK) countryName
I thought about further splitting the user-to-town relation into:
tbl>User has locationID (FK)
tbl>location has id (PK) townID(FK) regionID(FK) countryID(FK)
But I think that is unnecessary and just further complicates the issue?
The country table is already populated. I intend to build up my own town > region > country relations as users enter them. So if a user enters a town with no region and country, it is inserted into tbl>town without a regionID, unless a town with the same name and no region ID already exists. The same goes for a town where the user has entered a region and country; I just check that the same town > region > country relation doesn't already exist before inserting. Later in the site's development I will provide Ajax suggestions for country/region based on the town a user enters.
So to the questions:
I can envisage pitfalls with this such as duplicate data or data possibly being overwritten. Is there a better way to construct the tables to fit in with my desired methods?
This may be answered by the previous question, but is there anything I can do to reduce the PHP processing of the tables? Obviously I'd prefer to insert with a single PHP statement, but I think there are too many caveats to do it all at once.
Also, since the user's town entry may be null, and a town may or may not have a foreign key to a region, what is the best way to create a view that takes that into account?
As it will be hosted I would rather not be using MySQL functions.
Please let me know if you need any clarification. I really want to get this right the first time before continuing, so your help will be invaluable.
I don't think you can reduce the code, because it's already very explicit. You can change it, but it won't be better.
Accepting a town name without a region and country is like letting someone enter their first name without their middle or last. It's data, but it's not an identifier.
Fullerton's full name is "Fullerton, California, USA". By not requiring Fullerton's full name, you abandon foreign keys for data integrity. ("Fullerton, California, USA" is a city; "Fullerton, Alabama, USA" is not.) Good luck with that.
If you're going down this path, the best advice I can offer you is to get rid of the id numbers. ISO publishes standard codes for countries (ISO 3166-1) and for subdivisions of countries (ISO 3166-2). You can look them up in Wikipedia. Storing natural keys will reduce the number of joins from 3 to zero, and zero joins will almost always outperform 3 joins.
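Here is a sketch of the natural-key idea in SQLite via Python. Countries are keyed by ISO 3166-1 alpha-2 codes and regions by ISO 3166-2 codes, while town keeps a surrogate id. The table and view names are assumptions. The town's region_code already tells you the region and country without any join. The view uses LEFT (outer) joins so that towns without a region still appear. The second "Fullerton" row, entered without a region, shows the ambiguity the answer warns about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Natural keys: ISO 3166-1 alpha-2 for countries, ISO 3166-2 for subdivisions.
CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE region  (code TEXT PRIMARY KEY, name TEXT NOT NULL,
                      country_code TEXT NOT NULL REFERENCES country(code));
-- region_code may be NULL when the user typed only a town name.
CREATE TABLE town    (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      region_code TEXT REFERENCES region(code));
-- LEFT (outer) joins keep towns that have no region.
CREATE VIEW town_full AS
SELECT t.name AS town, r.name AS region, c.name AS country
FROM town t
LEFT JOIN region  r ON r.code = t.region_code
LEFT JOIN country c ON c.code = r.country_code;

INSERT INTO country VALUES ('US', 'United States of America');
INSERT INTO region  VALUES ('US-CA', 'California', 'US');
INSERT INTO town (name, region_code) VALUES ('Fullerton', 'US-CA'), ('Fullerton', NULL);
""")

# The code alone identifies region and country, with zero joins.
print(conn.execute("SELECT name, region_code FROM town ORDER BY id").fetchall())
# The view supplies full names and keeps the region-less town (NULLs).
print(conn.execute("SELECT * FROM town_full").fetchall())
```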
You'll probably need to use outer joins to create your views.
I may not be asking this in the best way possible but i will try my hardest. Thank you ahead of time for your help:
I am creating an enrollment website that allows an individual OR a manager to enroll professional athletes for medical testing services. I will NOT be using the site as a database that anybody can query to view the stored information. Instead, the information is simply stored and passed along in CSV format to our network provider, so they can use it as needed after the fact. There are two possible scenarios:
Scenario 1 - Individual Enrollment
If an individual athlete chooses to enroll him/herself, they enter their personal information, submit their payment information (credit/bank account) for processing, and their information is stored in an online database as Athlete1.
Scenario 2 - Manager Enrollment
If a manager chooses to enroll several athletes he manages/ promotes for, he enters his personal information, then enters the personal information for each athlete he wishes to pay for (name, address, ssn, dob, etc), then submits payment information for ALL athletes he is enrolling. This number can range from 1 single athlete, up to 20 athletes per single enrollment (he can return and complete a follow up enrollment for additional athletes).
Initially, I was building the database to house ALL information regardless of enrollment type in a single table which housed over 400 columns (think 20 athletes with over 10 fields per athlete such as name, dob, ssn, etc).
Now that I think about it more, I believe creating multiple tables (manager(s), athlete(s)) may be a better idea here, but I'm still not quite sure how to go about it, for the following very important reasons:
Issue 1
If I list the manager as the parent table, I am afraid the individual enrolling athlete will not show up in the primary table and will not be included in the overall registration file which needs to be sent on to the network providers.
Issue 2
All athletes being enrolled by a manager are stored in SESSION as F1FirstName, F2FirstName, etc., where F1 and F2 relate to the id of the fighter. I am not sure, technically speaking, how to store multiple sets of information in the same table as separate rows using PHP. For example, all athletes will have a first name. The very basic theory of what I am trying to do is:
If number_of_athletes >1,
store F1FirstName in row 1, column 1 of Table "Athletes";
store F1LastName in row 1, column 2 of Table "Athletes";
store F2FirstName in row 2, column 1 of Table "Athletes";
store F2LastName in row 2, column 2 of table "Athletes";
Does this make sense? I know this question is very long and probably difficult so i appreciate the guidance.
You should create two tables: managers and athletes
The athletes table would contain a column named manager_id which would contain the id of the manager who signed the athlete up or NULL if the athlete signed himself up.
During output, create two CSV files (one for each table).
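Here is a sketch of the two-table answer in SQLite via Python (the question uses PHP). It turns the flat F1FirstName / F2FirstName session keys into one athletes row each and exports every athlete, self-enrolled or not, to CSV. The function names, the session dictionary and the sample names are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE managers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
-- manager_id is NULL when the athlete enrolled him/herself.
CREATE TABLE athletes (id INTEGER PRIMARY KEY,
                       manager_id INTEGER REFERENCES managers(id),
                       first_name TEXT, last_name TEXT);
""")

def save_manager_enrollment(manager, session, number_of_athletes):
    """Turn flat F1FirstName/F2FirstName-style keys into one athletes row each."""
    with conn:
        cur = conn.execute("INSERT INTO managers (first_name, last_name) VALUES (?, ?)",
                           (manager["first_name"], manager["last_name"]))
        manager_id = cur.lastrowid
        rows = [(manager_id, session[f"F{i}FirstName"], session[f"F{i}LastName"])
                for i in range(1, number_of_athletes + 1)]
        conn.executemany("INSERT INTO athletes (manager_id, first_name, last_name) "
                         "VALUES (?, ?, ?)", rows)
    return manager_id

def save_individual_enrollment(first_name, last_name):
    with conn:
        conn.execute("INSERT INTO athletes (manager_id, first_name, last_name) "
                     "VALUES (NULL, ?, ?)", (first_name, last_name))

def athletes_csv():
    """CSV of every athlete for the network provider; NULL manager_id becomes ''."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "manager_id", "first_name", "last_name"])
    writer.writerows(conn.execute("SELECT id, manager_id, first_name, last_name "
                                  "FROM athletes ORDER BY id"))
    return buf.getvalue()

# Hypothetical session contents for a manager enrolling two athletes.
session = {"F1FirstName": "Ana", "F1LastName": "Silva",
           "F2FirstName": "Ben", "F2LastName": "Cole"}
save_manager_enrollment({"first_name": "Max", "last_name": "Ortiz"}, session, 2)
save_individual_enrollment("Cara", "Diaz")
print(athletes_csv())
```

Because every athlete lives in the same table, the self-enrolled athlete lands in the export too, which addresses Issue 1.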
Further reading:
Defining Relationships
If you will retain the names for a future submission, then you should use a different design. You should also consider whether a manager can also be an athlete. With those points in mind, consider having three tables: PEOPLE, REGISTRATION and REGISTRATION_ATHLETE. PEOPLE contains all athletes and managers. REGISTRATION is the master table holding all the information for one submission of one or more individuals for testing. REGISTRATION_ATHLETE has one row for every athlete to be tested.
People table:
---------------
People_ID
Type (A for Athlete, M for Manager, B for Both)
First Name
Last Name
Birthdate
other columns of value
Registration table:
-------------------
Registration_ID
Registration_Date
People_ID (person requesting registration - Foreign Key to PEOPLE)
Payment columns....
Registration_Athlete table:
---------------------------
Registration_ID (Foreign Key to REGISTRATION)
People_ID (Foreign Key to PEOPLE)
I am not a mysql person, but I would think this simple type of structure would work.
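Here is a sketch of the three-table layout in SQLite via Python. The payment columns are omitted, and the sample people, dates and IDs are hypothetical. The query lists everyone to be tested along with who requested each registration, whether that was a manager or the athlete herself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE people (
    people_id  INTEGER PRIMARY KEY,
    type       TEXT NOT NULL CHECK (type IN ('A', 'M', 'B')),  -- Athlete/Manager/Both
    first_name TEXT, last_name TEXT, birthdate TEXT);
CREATE TABLE registration (
    registration_id   INTEGER PRIMARY KEY,
    registration_date TEXT NOT NULL,
    people_id         INTEGER NOT NULL REFERENCES people(people_id));  -- requester
CREATE TABLE registration_athlete (
    registration_id INTEGER NOT NULL REFERENCES registration(registration_id),
    people_id       INTEGER NOT NULL REFERENCES people(people_id),
    PRIMARY KEY (registration_id, people_id));

-- Hypothetical data: a manager registers two athletes; an athlete registers herself.
INSERT INTO people VALUES (1, 'M', 'Max', 'Ortiz', NULL),
                          (2, 'A', 'Ana', 'Silva', NULL),
                          (3, 'A', 'Ben', 'Cole', NULL),
                          (4, 'A', 'Cara', 'Diaz', NULL);
INSERT INTO registration VALUES (100, '2024-01-15', 1), (101, '2024-01-16', 4);
INSERT INTO registration_athlete VALUES (100, 2), (100, 3), (101, 4);
""")

# Everyone to be tested, with the last name of whoever requested the registration.
rows = conn.execute("""
    SELECT r.registration_id, a.first_name, a.last_name, req.last_name AS requested_by
    FROM registration r
    JOIN registration_athlete ra ON ra.registration_id = r.registration_id
    JOIN people a   ON a.people_id   = ra.people_id
    JOIN people req ON req.people_id = r.people_id
    ORDER BY r.registration_id, a.last_name""").fetchall()
for row in rows:
    print(row)
```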
Finally, storing credit card information is problematic because it runs into PCI (Payment Card Industry) rules, which you will want to avoid (think complicated and expensive). Consider processing payments through a third party, such as Google Checkout (since discontinued) or a similar service, and not capturing the credit card at all.
Well, based on your reply in the comments and what you are looking for, you could do this:
Create one table for registrations.
Create the columns ID, name, regDate, isManager, ManagerID (and whatever else you need).
When a manager enrolls, set isManager to 1 and form a hash based on name and regDate; that becomes the manager's unique ID, which is added to all of the athlete entries the manager registers.
When a lone athlete registers, don't worry about the ID and just set isManager to 0.
I think I may be oversimplifying it, though. It wouldn't be the greatest for forming different types of queries, but it should be all right if you are trying to minimize your DB footprint.