I am developing a (potentially) large-scale tracking software that tracks customer data, along with tickets that are created for tasks associated with said customers. This system is written entirely in PHP, and the database is MySQL.
The system currently supports multiple "locations" (stores, for example), and each location has its own table for customer data in the same database (each database can host a completely different business's installation). For example:
store1_customers
customer_id | customer_firstname | customer_lastname
----------------------------------------------------
1 | John | Doe
2 | Bill | Bob
store2_customers
customer_id | customer_firstname | customer_lastname
----------------------------------------------------
1 | Jill | Smith
2 | Jimmy | Person
This works great for keeping locations separate for different business needs. However, we are running into the need to have "global" customers for other instances that can be accessed from any location, while keeping other customers separate.
The two options I can think of are to either make a new "global_customers" table that can then be pulled from separately, or to merge all of the data into one large table.
I have concerns with both methods. The first would require a new column in every table that references a customer, to determine which customer table to pull from. For example, store1_tickets would have to know whether to pull the customer with ID 1 from store1_customers or from global_customers. This seems a bit dirty, and I think it would complicate my multi-table JOIN queries.
The second method of making one giant table concerns me in two ways. The first is the size of the table (each table so far can have potentially 20k+ records, and there are 7 locations for just one particular installation of the "software"); I know this point may be moot given how MySQL works and what it can handle. The second concern is merging the existing data. I see it being a nightmare, since each table's customer IDs run from 1 to 20k+ (so they collide), and I would need some way of updating thousands upon thousands of existing records in other tables to match the new numbering of the merged table.
Is there a better way, or more proper way of accomplishing this? I'm sorry if this question does seem subjective, but it does come down to a database problem and how to handle the data in a reasonable way.
Merge all the data into one large table. That is how databases are designed to be used.
For data migration, you will end up with new keys; there is no way around that. You could, however, add a new column to store the 'legacy' ID. This is just some of the pain associated with normalizing a database. Take the pain now rather than persisting with a sub-optimal database design.
Customer type would be another column within the customer table; depending on your requirements, it would probably be a FK to a CustomerType table.
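A minimal sketch of what the merged table might look like (a sketch only; the table and column names below, such as customer_types and legacy_customer_id, are my own assumptions, not from the question):

-- Sketch only: names and types are assumptions.
CREATE TABLE customer_types (
    customer_type_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    type_name        VARCHAR(50) NOT NULL         -- e.g. 'store' or 'global'
);

CREATE TABLE customers (
    customer_id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    location_id        INT UNSIGNED NULL,          -- NULL for "global" customers
    customer_type_id   INT UNSIGNED NOT NULL,
    legacy_customer_id INT UNSIGNED NULL,           -- old per-store ID, kept for the migration
    customer_firstname VARCHAR(100) NOT NULL,
    customer_lastname  VARCHAR(100) NOT NULL,
    FOREIGN KEY (customer_type_id) REFERENCES customer_types (customer_type_id)
);

-- Migration sketch: copy one store's customers across, remembering the old key.
INSERT INTO customers (location_id, customer_type_id, legacy_customer_id,
                       customer_firstname, customer_lastname)
SELECT 1, 1, customer_id, customer_firstname, customer_lastname
FROM store1_customers;

Existing ticket rows can then be re-pointed by joining each store's tickets table to customers on legacy_customer_id (plus the location), so the renumbering never has to be reconciled by hand.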
I'm building a yellow pages site. I've tried multiple database structures, and I'm not sure which one is best. Here are a few I considered:
Saving all business data - name, phone, email, etc. - in one table, the list of tags in another, and a mapping of data ID to tag ID for the tag-data relationship in a third table. I found this cumbersome, since I'll be doing most things directly in the database (at least initially, before launch), and spreading everything across tables can be problematic in my case. I must admit it is a clean solution, though.
Saving business entries in one table with a separate column for tags (containing comma-separated (or JSON) tags for every entry), then retrieving results with a LIKE query or full-text search on a tag. This one will be slower, and will only get slower as the database grows. It's also not easy to maintain - suppose I have to rename a tag.
(My preferred choice) Distributing business data across different tables based on type - all banks in one table; hotels, restaurants, etc. in separate tables - with a single tags table containing a rule for searching data in the relevant table. Here is a detailed explanation.
Biz Tables:
college_tbl, bank_tbl, hotel_tbl, restaurant_tbl...so on
Tags Table
ID | Biz Table | Tag Name | Tag Key | Match Rule (col:like_query_part)
1 | bank_tbl | Citi Bank Branches | ['citi','bank'] | 'name:%$1%$2%'
2 | restaurant_tbl | Pizza Hut Restaurants | ['pizza','hut'] | 'name:%$1%$2%'
3 | hotel_tbl | The Leela Hotels | ['the leela'] | 'name:%$1%'
I'll then use the 'Match Rule' in a LIKE query to fetch results from the 'Biz Table' for a given 'Tag Name'.
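For example, the first rule above would expand to something like this (just a sketch of the intended expansion; $1 and $2 come from the tag keys, and the part before ':' names the column):

-- 'name:%$1%$2%' with keys ['citi','bank'] applied to bank_tbl
SELECT *
FROM bank_tbl
WHERE name LIKE '%citi%bank%';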
I'm going forward with the third approach. I feel it's simple, removes the need for a third data-tag relationship table, makes renaming easy, and performance won't suffer as long as each table has a limited number of entries - say 1 million max per table.
I've been scratching my head for the last 15 days trying to find the best structure, and I feel this one is pretty good for my case.
Please suggest a better approach, or point out any issues this approach could run into later on.
Use Number 1. Period, full stop.
The mistake is "doing things directly in the database" rather than developing the API first.
Number 2 has one advantage -- FULLTEXT search. That can be tacked onto #1 after you have a working API and some data to play with.
Number 3 (multiple similar tables) is a fiasco. Numerous Q&As ask about such designs; the reply is always "NO".
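For reference, option 1 could look roughly like this (a sketch; the table and column names are my own, not from the question):

CREATE TABLE business (
    business_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    phone       VARCHAR(30),
    email       VARCHAR(255)
);

CREATE TABLE tag (
    tag_id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    tag_name VARCHAR(100) NOT NULL UNIQUE
);

-- Mapping table for the many-to-many business-tag relationship.
CREATE TABLE business_tag (
    business_id INT UNSIGNED NOT NULL,
    tag_id      INT UNSIGNED NOT NULL,
    PRIMARY KEY (business_id, tag_id),
    FOREIGN KEY (business_id) REFERENCES business (business_id),
    FOREIGN KEY (tag_id)      REFERENCES tag (tag_id)
);

-- Finding all businesses for a tag is then a straightforward join.
SELECT b.name
FROM business b
JOIN business_tag bt ON bt.business_id = b.business_id
JOIN tag t           ON t.tag_id = bt.tag_id
WHERE t.tag_name = 'bank';

Renaming a tag becomes a single-row UPDATE on tag, and a FULLTEXT index on business.name can be added later if you want option 2's search behaviour as well.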
What would be an efficient way to store "Quests" in an SQL database? Let's say the context is an RPG. (Here was a previous question: How to store Goals (think RPG Quest) in SQL)
To summarize a Quest may be a combination of the following:
Discover [Location]
Kill n [MOB Type]
Acquire n of [Object]
Achieve a [Skill] in [Skillset]
All the other things you get in RPGs
The answer listed out in the link was:
For the Quest table:
| ID | Title | FirstStep (Foreign key to QuestStep table) | etc.
The QuestStep table
| ID | Title | Goal (Foreign key to Goal table) | NextStep (ID of next QuestStep)
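Rendered as DDL, that would be something like the following (a sketch; the column types are my guesses):

CREATE TABLE QuestStep (
    ID       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Title    VARCHAR(255) NOT NULL,
    Goal     INT UNSIGNED NOT NULL,    -- FK to the Goal table (not shown here)
    NextStep INT UNSIGNED NULL,        -- ID of the next QuestStep, NULL for the final step
    FOREIGN KEY (NextStep) REFERENCES QuestStep (ID)
);

CREATE TABLE Quest (
    ID        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Title     VARCHAR(255) NOT NULL,
    FirstStep INT UNSIGNED NOT NULL,
    FOREIGN KEY (FirstStep) REFERENCES QuestStep (ID)
);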
I actually think it's pretty neat, but I have two things I would like to add:
Let's say I want a quest that can be active only on certain days (e.g. M, W, F only) and/or only during a certain time span (e.g. Halloween). What would be the ideal way of doing this?
Another thing: let's say I want a quest with two steps and a quest with 8 steps. We could create a table that is 8 columns wide, but we would have lots of empty space. And what if the stars align and I need a 9-step quest?
The QuestStep table actually has a NextStep, sort of like a linked list, but what about Quests that you can do out of order?
P.S.: As you can see, it is potentially read-heavy, and the schema is potentially... non-schematic. Is NoSQL a viable option? (Redis seems memory-only, so I'd more likely go with MongoDB.)
I have an online shopping cart; at checkout the user enters his zipcode.
There are 2 payment methods, cash-on-delivery (COD) and net-banking. The courier service ships only to certain areas (identified by zipcode), and the allowed lists of zipcodes for COD and net-banking differ (about 2,500 zipcodes for COD, and about 10,000 for the latter).
Should I store these lists in database or a flat file?
With a database, I would query using SELECT; with a file, I could read the entire (or partial) list into an array and then do a binary search on it.
Which one would be faster, considering the following points?
There is only one courier service now, but in the future there will be more, each with a different list of its own, so I will need to search multiple lists.
The workload is mostly reads; writes would be much rarer. The lists should also be customisable at a later point.
I would have selected the database, but I don't know whether it would make things slower, and I don't want to spend time designing a database when a file might be better.
EDIT:
Say there are 2 courier companies ABC and DEF.
With files, I would have 4 files, say ABC_COD.txt, ABC_net.txt, DEF_COD.txt, and DEF_net.txt. If a customer goes for COD, I search ABC_COD.txt; if the zipcode isn't there, I search DEF_COD.txt, and so on. OK, this seems costly, but it is also easily extensible.
Now consider the database: I would have a table Allowed_zipcodes with five columns: zipcode (int/varchar(6)), ABC_COD (boolean), ABC_net (boolean), DEF_COD (boolean), DEF_net (boolean). If company X offers COD for zipcode Y, the corresponding column is true, otherwise false.
While this seems good for lookup, adding a company involves a change in schema.
Please consider future changes and design as well.
Database, without any hint of a doubt. More logical, and more scalable.
For some reason I think you should look at the Magento framework; isn't this already handled in some of its packages?
But if you want to do it yourself, just to give you a starting point on the database model:
carrier
id(int) | name (varchar)
zipcodes
start(int) | end(int) | carrier(fk::carrier.id)
For instance:
carrier
1 | UPS
2 | fedex
zipcodes
1000 | 1199 | 2
1000 | 1099 | 1
Querying your zipcode and available carriers:
SELECT carrier.name
FROM zipcodes
LEFT JOIN carrier ON zipcodes.carrier = carrier.id
WHERE
zipcodes.end >= :code
AND
zipcodes.start <= :code
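The corresponding tables could be created roughly like this (a sketch; the types, keys, and index choices are assumptions):

CREATE TABLE carrier (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- Each row covers a contiguous range of zipcodes served by one carrier.
CREATE TABLE zipcodes (
    `start` INT UNSIGNED NOT NULL,
    `end`   INT UNSIGNED NOT NULL,
    carrier INT UNSIGNED NOT NULL,
    PRIMARY KEY (carrier, `start`),
    KEY idx_range (`start`, `end`),   -- keeps the range lookup above fast
    FOREIGN KEY (carrier) REFERENCES carrier (id)
);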
So, I'm dealing with an enormous online form right now. It's separated into different sections visually, but those sections tend to change. With around 300 fields, it almost seems ridiculous to put them into a single table; yet if I split them up by section and someone decides to move a field to a different section on the front end (which happens on several occasions), the database will become a mess and fields won't match their front-end sections.
I'm essentially asking: What is the best way to organize something like this in a normalized fashion?
You could move the field names to another table and reference them in the value table.
Example
field_id | field_name
------------------------
1 | first_name
2 | last_name
Then reference from the values:
value_id | field_id | value
--------------------------------
1 | 1 | John
2 | 2 | Doe
3 | 1 | Max
4 | 2 | Jefferson
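Reassembling a record from this layout is the usual pain point of the approach. Assuming a record_id column that groups the values belonging to one form submission (not shown above) and tables named form_fields and form_values (names are mine), the pivot back into columns looks roughly like:

-- Sketch only: form_fields / form_values and record_id are assumed names.
SELECT v.record_id,
       MAX(CASE WHEN f.field_name = 'first_name' THEN v.value END) AS first_name,
       MAX(CASE WHEN f.field_name = 'last_name'  THEN v.value END) AS last_name
FROM form_values v
JOIN form_fields f ON f.field_id = v.field_id
GROUP BY v.record_id;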
If you're going to use a SQL database, then the Entity-Attribute-Value (EAV) model described above is probably a good answer. You might also want to mix in a couple of denormalized tables for common or specialized data.
Another option might be a document store, though; this sounds like just the kind of problem that inspired data stores like MongoDB. In MongoDB you simply store everything as one big JSON document. If some data isn't needed for some records and is left out, that isn't considered "bad" in the way sparsely populated, wide SQL tables are.
You can group your fields. Separate them into components and you will probably notice that you can make multiple tables out of that one. By separating the tables you can also build the form with, for example:
fieldset tags
separate steps (I think this is the best solution)
multiple AJAX requests, one for each form after the previous one is filled
forms separated by open/close JavaScript windows
Database design, object design, and form design are three very different things. If there are one-to-many relationships in the data, you should have separate tables to normalize it. If, however, everything is a one-to-one relationship, then having all 300 fields in the same table is perfectly acceptable. I find it difficult to believe that there is a logical or even physical construct that has 300 elements unto itself, but it's possible.
If you start getting into attribute data, let's say we're talking about a vehicle: it could be a car, a truck, a semi, a motorcycle, a bicycle, etc., and each of those types of vehicles has different properties, which would be managed in separate tables to normalize the data. Moving elements of them to different pages wouldn't make a whole lot of sense, but moving common attributes might. For example, I wouldn't ask about color in section 1 and again in section 4, but I might section things out to describe make, model, and then the custom attributes.
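As a rough sketch of that vehicle example (all names are made up for illustration):

-- Common attributes, asked once, live on the main table.
CREATE TABLE vehicle (
    vehicle_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    make       VARCHAR(100) NOT NULL,
    model      VARCHAR(100) NOT NULL,
    color      VARCHAR(50)
);

-- Type-specific attributes go in their own one-to-one table.
CREATE TABLE motorcycle_details (
    vehicle_id  INT UNSIGNED PRIMARY KEY,
    engine_cc   INT UNSIGNED,
    has_sidecar BOOLEAN,
    FOREIGN KEY (vehicle_id) REFERENCES vehicle (vehicle_id)
);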
I have been browsing this site for an answer, but I'm still a little unsure how to plan a similar system's database structure and implementation.
In PHP and MySQL, some achievements would clearly be earned immediately (when a specific action is taken; in SO's case: filled out all profile fields), although I know SO updates and assigns badges after a certain amount of time. With so many users and badges, wouldn't this create performance problems (in terms of scale: a high number of both users and badges)?
So the database structure, I assume, would be something as simple as:
Badges | Badges_User | User
----------------------------------------------
bd_id | bd_id | user_id
bd_name | user_id | etc
bd_desc | assigned(bool) |
| assigned_at |
But as some people have said, it would be better to have an incremental-style approach so that a user with 1,000,000 forum posts won't slow any function down.
Would that mean another table for badges that can be incremental, or just a 'progress' field in the badges_user table above?
Thanks for reading, and please focus on the scalability of the desired system (like SO: thousands of users and 20 to 40 badges).
EDIT: To iron out some confusion: I had assigned_at as a date/time. The criteria for awarding each badge would be best placed inside prepared queries/functions per badge, wouldn't they? (Better flexibility.)
I think the structure you've suggested (without the "assigned" field, as per the comments) would work, with the addition of another table, say "Submissions_User", containing a reference to user_id and an incrementing field for counting submissions. Then all you'd need is an "event listener" as per this post, and methinks you'd be set.
EDIT: For the achievement badges, run the event listener upon each submission (only for the user making the submission of course), and award any relevant badge on the spot. For the time-based badges, I would run a CRON job each night. Loop through the complete user list once and award badges as applicable.
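The nightly job could be little more than one INSERT ... SELECT per badge, roughly like this (a sketch: the badge ID, the 1,000-submission threshold, and the submission_count column name are all made up, and INSERT IGNORE assumes a unique key over (bd_id, user_id)):

-- Award badge 5 to every qualifying user; users who already have it are
-- skipped by INSERT IGNORE thanks to the unique (bd_id, user_id) key.
INSERT IGNORE INTO Badges_User (bd_id, user_id, assigned_at)
SELECT 5, su.user_id, NOW()
FROM Submissions_User su
WHERE su.submission_count >= 1000;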
Regarding the sketch you included: get rid of the boolean column on badges_user. It makes no sense there; that relation is defined by the predicate "user user_id earned the badge bd_id at assigned_at".
As for your overall question: define the schema relationally, without regard for speed, first (that will get rid of half of the potential performance problems, possibly in exchange for different ones), index it properly (what's proper depends on the query patterns), and then, if it's slow, derive a faster (still relational) design from it. For example, you may need to precompute some aggregates.
I would keep a structure similar to what you have:
Badges(badge_id, badge_name, badge_desc)
Users(user_id, etc)
UserBadges(badge_id, user_id, date_awarded)
Then add tracking table(s), depending on what you want to track and at what level of detail... you can then update them accordingly and set triggers on them to "award" the badges:
User_Activity(user_id, posts, upvotes, downvotes, etc...)
You can also track stats from the other direction and trigger badge awards:
Posts(post_id, user_id, upvotes, downvotes, etc...)
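A trigger on one of those tracking tables might look roughly like this (a sketch: the badge ID and the 100-upvote threshold are made up, and INSERT IGNORE assumes (badge_id, user_id) is the primary key of UserBadges):

DELIMITER //
CREATE TRIGGER award_upvote_badge
AFTER UPDATE ON User_Activity
FOR EACH ROW
BEGIN
    -- Award badge 3 once a user crosses 100 upvotes, if they don't already have it.
    IF NEW.upvotes >= 100 AND OLD.upvotes < 100 THEN
        INSERT IGNORE INTO UserBadges (badge_id, user_id, date_awarded)
        VALUES (3, NEW.user_id, NOW());
    END IF;
END //
DELIMITER ;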
Some other good points are made here
I think this is one of those cases where your many-to-many table (Badges_User) is appropriate.
But with a small alteration, so that unassigned badges aren't stored.
I assume assigned_at is a date and/or time.
The default is that the user does not have the badge.
Badges | Badges_User | User
----------------------------------------------
bd_id | bd_id | user_id
bd_name | user_id | etc
bd_desc | assigned_at |
| |
This way, only badges actually awarded are stored.
A Badges_User row is only created when a user gets a badge.
Regards
Sigersted