need efficient database model [closed] - php

Closed 9 years ago.
I have a project where clients can submit changes. Administrators can see the proposals and either accept them, edit and approve them, edit them and send them back to the client for approval, or deny them.
I am going to need two sets of data, which is going to be a nightmare: I want to preserve the original proposal when edits have been made, in case someone doesn't like the edit and wants to revert to the original.
Here is my current system. I think it needs to be more efficient, and I need a better idea.
I need a status for every field, because I want to see exactly which field has been edited.
MySQL
Table proposal_deal
+---------+--------+-------------+-------+--------------+
| deal_id | name   | name_status | price | price_status |
+---------+--------+-------------+-------+--------------+
| 1       | deal 1 | 1           | 12.00 | 1            |
+---------+--------+-------------+-------+--------------+
Table deal
+----+---------+--------+-------+
| id | deal_id | name   | price |
+----+---------+--------+-------+
| 1  | 1       | deal 1 | 12.00 |
+----+---------+--------+-------+
These tables have a lot of fields, so there are a lot of status columns as well. I am wondering if a better approach would be to add a third table called status that would house all the field status info like this.
MySQL
Table proposal_deal
+---------+--------+-------+
| deal_id | name   | price |
+---------+--------+-------+
| 1       | deal 1 | 12.00 |
+---------+--------+-------+
Table deal
+----+---------+--------+-------+
| id | deal_id | name   | price |
+----+---------+--------+-------+
| 1  | 1       | deal 1 | 12.00 |
+----+---------+--------+-------+
Table status
+----+---------+-------------+--------+
| id | deal_id | column_name | status |
+----+---------+-------------+--------+
| 1  | 1       | name        | 1      |
+----+---------+-------------+--------+
Which is going to be easier for design purposes, as well as more efficient when making a lot of calls to the DB?
I've already started with the first approach, but it's giving me a headache. I don't want to change, though, if the other approach is going to be just as bad.
Does anyone have an opinion (I'm sure you do) or an alternate approach?
Thanks

So, summary:
deals are a one-off between two clients
clients are people
proposals form the guts of the deals, and there can only be one active per deal
proposals have a status: approved, declined, pending. A client needs to approve them
All history needs to be preserved on the following: proposals, deals
Following these guidelines, I would set it up as follows:
deals
    id
    name
    proposal_id (FK proposals.id, UPDATE:CASCADE, DELETE:SETNULL)
    client_id (FK clients.id, UPDATE:CASCADE, DELETE:SETNULL)
    company_id (FK clients.id, UPDATE:CASCADE, DELETE:SETNULL)
proposals
    id
    name
    price
    (more data fields)
    modified_by (FK clients.id, UPDATE:CASCADE, DELETE:SETNULL)
status
    id
    proposal_id (FK proposals.id, UPDATE:CASCADE, DELETE:RESTRICT)
    status
    modified_by (FK clients.id, UPDATE:CASCADE, DELETE:RESTRICT)
Add timestamps as you see fit. I would personally handle proposal editing with an UPDATE trigger that duplicates the row before the update and then updates the row accordingly. This guarantees that you have the set of foreign keys necessary to track all your status changes.
Oh, and whenever a deal gets a new latest proposal, the proposal_id in the deals table should be changed as well. You can also do this with a trigger to make your life easier.
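The trigger idea above can be sketched roughly as follows. This is a minimal illustration, not the schema from the answer: it uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere, and the proposal_history table, the trigger name and the sample values are assumptions made up for the example. It also shows that the "which field was edited" information can be derived by comparing versions, rather than stored in one status column per field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proposals (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    price       REAL NOT NULL,
    modified_by INTEGER
);

-- Hypothetical history table: one row per superseded version.
CREATE TABLE proposal_history (
    history_id  INTEGER PRIMARY KEY,
    proposal_id INTEGER NOT NULL REFERENCES proposals(id),
    name        TEXT NOT NULL,
    price       REAL NOT NULL,
    modified_by INTEGER,
    archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Copy the old values aside before every update.
CREATE TRIGGER proposals_keep_history
BEFORE UPDATE ON proposals
FOR EACH ROW
BEGIN
    INSERT INTO proposal_history (proposal_id, name, price, modified_by)
    VALUES (OLD.id, OLD.name, OLD.price, OLD.modified_by);
END;
""")

# A client (id 10) submits a proposal, then an administrator (id 20) edits the price.
conn.execute(
    "INSERT INTO proposals (id, name, price, modified_by) VALUES (1, 'deal 1', 12.00, 10)"
)
conn.execute("UPDATE proposals SET price = 10.50, modified_by = 20 WHERE id = 1")

columns = ("name", "price", "modified_by")
current = conn.execute(
    "SELECT name, price, modified_by FROM proposals WHERE id = 1"
).fetchone()
history = conn.execute(
    "SELECT name, price, modified_by FROM proposal_history "
    "WHERE proposal_id = 1 ORDER BY history_id"
).fetchall()

# Derive the edited fields by comparing the last archived version with the current one.
changed = [col for col, old, new in zip(columns, history[-1], current) if old != new]
print(changed)
```

Reverting to the original is then a matter of copying the wanted row from the history table back into proposals. MySQL trigger syntax differs slightly, so treat the DDL as a starting point only.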

I think the first approach will handle a lot of reads/writes better but will be harder to maintain, while the second approach is easier to maintain and might prove more useful later (should you need some stats from the tables).
I personally would go with the second, or pick MongoDB or similar if you like the first approach better.


Which is faster and more preferred when keeping track of a row's status? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Let's say for instance you have a table which contains orders and this table has a column which keeps track if the order is either pending/shipped/denied/approved. Which is the better way to keep the status for the record?
Option one. Keeping the status as a string in an indexed column.
--------------------------------------------------------
| id | customer_id | status  | created_at | shipped_at |
--------------------------------------------------------
| 1  | 1           | pending | . . .      | . . .      |
--------------------------------------------------------
or
Option two. Have a separate table which contains the possible statuses and have the status column be a foreign key which points to this table which contains the statuses.
Table: Statuses
-----------------
| id | name     |
-----------------
| 1  | pending  |
| 2  | approved |
| 3  | denied   |
| 4  | shipped  |
-----------------
Table: Orders
-------------------------------------------------------
| id | customer_id | status | created_at | shipped_at |
-------------------------------------------------------
| 1  | 1           | 1      | . . .      | . . .      |
-------------------------------------------------------
In my opinion, the first is simpler but would become slow if the table becomes massive while the second would be faster in that case.
Option two is better, because:
It will consume less space.
Searching will also be quicker with numbers than with strings.
If you later want to change "approved" to "temporary approve", you will need to change it in one place, not across your whole data.
You can also do something like WHERE status > 2, which is not meaningful with strings.
Many more reasons probably exist; these are just the first that came to mind. I see no reason to use the first option.
For me, the answer is: it depends.
As already said, option 2 looks better because of space consumption, integrity and extensibility.
But I'm a strong believer in the YAGNI principle: if it's not needed yet (say, you don't plan to add more statuses, make them configurable, or hold a lot of data in the coming years), it's probably not needed and it's perfectly OK to build option 1.
And it will be ok to change data structure in a few years if needed. Maybe in a different way, to fulfill needs you can't foresee right now.
"Better" is highly dependent on opinion. You could define what you mean by "better" - which attributes do you want to optimize for?
In terms of "faster" - relational databases (including MySQL) are really, really good at joins. So good that in most cases, when you are joining on foreign keys, there is no measurable performance impact, even with tens or hundreds of millions of records. So, I don't think option 1 is "faster" unless you reach Amazon scale.
Other attributes you may consider are "easy to maintain". Option 1 is open to bugs, because you have to be sure that every bit of code that wants to know whether an order is "pending" includes the right bit of text in the query. A simple typo could mean you stop shipping orders to your customers. It's likely you'll want to create some moderately complex business rules about the transitions between states - "do not allow an order to transition from denied to shipped", and you'll have lots of opportunity for typos. If you are worried about "easy to maintain", either option 2 or using an enum is probably better.
Another attribute may be "easy to extend". Right now, your statuses have no additional attributes, but that may not remain the case. For instance, you may decide to store the amount of time an order may stay in a given status, or the roles which are allowed to override a status change. Again, in that case, option 2 is probably easier to work with.
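To make option 2 concrete, here is a small sketch of the lookup-table design with an enforced foreign key. It assumes SQLite through Python's sqlite3 module rather than MySQL, and the sample rows are invented for illustration. The point it tries to show is the maintainability argument above: an unknown status is rejected by the database, and renaming a status touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE statuses (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    status      INTEGER NOT NULL REFERENCES statuses(id)
);
CREATE INDEX orders_by_status ON orders (status);

INSERT INTO statuses (id, name) VALUES
    (1, 'pending'), (2, 'approved'), (3, 'denied'), (4, 'shipped');
INSERT INTO orders (id, customer_id, status) VALUES (1, 1, 1), (2, 1, 4);
""")

# Reading a status by name is one join on an indexed key.
pending = conn.execute(
    "SELECT o.id FROM orders AS o "
    "JOIN statuses AS s ON s.id = o.status "
    "WHERE s.name = ?",
    ("pending",),
).fetchall()

# A status that does not exist is refused by the database itself.
try:
    conn.execute("INSERT INTO orders (id, customer_id, status) VALUES (3, 2, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Renaming a status changes one row; no order rows are rewritten.
conn.execute("UPDATE statuses SET name = 'temporarily approved' WHERE id = 2")
approved_label = conn.execute("SELECT name FROM statuses WHERE id = 2").fetchone()[0]
print(pending, rejected, approved_label)
```

In MySQL, an ENUM column or a CHECK constraint would give similar typo protection for option 1; which trade-off is preferable depends on how often the list of statuses changes.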

Database for Poll + Explanation which can be updated

I've got the following poll form for my users but I am not sure how I should structure the DB around it.
The user will get 20-30 poll questions like the following:
What is your favorite color?
Blue
Green
Yellow
Red
Other
The user will be able to choose one of the above answers and must also provide around 100 words explaining why he chose that answer.
I've currently got two tables. One that holds the poll questions and one that holds the poll options. What I am not sure about is how should I hold the user answers.
Because the poll is so big, the user can complete it partially, come back at a later time, alter his answers, and keep going until he is 100% done. Only then will I be able to view the whole result in my panel. So he can basically save his progress and alter it at any time. On top of that, I would like to be able to "remove" the whole poll for a specific user so that he can redo it from scratch, while keeping a history of his previous answers.
So I am not sure if a table like this would be the best option for my needs:
id
user_id
poll_question_id
poll_answer
poll_text
last_update
status
Seems like something like this will create a huge mess. Is there a better way to do this?
I would create 3 tables:
table_polls
+----+---------------------+--------+
| id | description | status |
+----+---------------------+--------+
| 1 | Example description | 1 |
+----+---------------------+--------+
table_poll_options
+----+---------+-------------+
| id | poll_id | description |
+----+---------+-------------+
| 1 | 1 | Question 1 |
| 2 | 1 | Question 2 |
+----+---------+-------------+
table_poll_answers
+----+-----------+---------+----------------+---------------------+
| id | option_id | user_id | description | created_at |
+----+-----------+---------+----------------+---------------------+
| 1 | 1 | 1 | A valid reason | 1970-01-01 00:00:00 |
| 2 | 2 | 2 | Another reason | 1970-01-01 00:00:00 |
+----+-----------+---------+----------------+---------------------+
To recapitulate the above:
A poll has many questions.
A poll question has many answers.
A poll answer has one user.
This way you have everything split up with pivot tables and you no longer need to create messy rows in your poll table.
You can expand on the tables of course, if you need extra information for dates etc.
I think you will find it easier if you define your problem domain in semi-structured language. You may decide to delegate some of the logic to the application layer, rather than embedding in the database schema - for instance, saving partially-completed polls might be easier within the application layer.
You might start with something like this, capturing the major entities in your domain, and their relationships, but not their attributes.
The system consists of many polls.
One poll has many questions. A question belongs to 1 poll.
A question can be of type multiple choice (select one) or multiple choice (select n) or free text.
A question may be mandatory or optional.
A multiple choice question has 1..n answer options.
A question has a sequential relationship to other questions (e.g. the free text question must follow the favourite colour question).
The system consists of many users.
A user answers many polls.
When the user answers a poll, they answer 0..n questions.
When the user answers a poll, the answer is only valid if the user completes all mandatory questions.
This suggests that you have two polymorphic entities - question (which can be free text or multiple choice), and answer (as the answer is related to the question, it is also free text or multiple choice).
You have to decide how to model this in your schema - Stack Overflow has many questions on this topic - but I'll pick the simplest.
Poll
-----
Poll_id
Question
------
Question_id
Poll_id
Sequence
Is_mandatory
Description
Question_option
-------------
Question_option_id
Question_id
Option_id
Sequence
Description
User
-----
User_id
Poll_session
------
Poll_session_id
User_id
Poll_id
Date
Status
Poll_session_answer
-----
Poll_session_id
Question_id
Free_text_answer
Question_option_id_answer
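As a rough sketch of how the Poll_session and Poll_session_answer tables could cover "save progress, alter answers, redo but keep history", consider the following. It uses Python's sqlite3 module for portability; the status values 'in_progress' and 'archived', the helper names start_session and save_answer, and the composite primary key are assumptions for illustration and are not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE poll_session (
    poll_session_id INTEGER PRIMARY KEY,
    user_id         INTEGER NOT NULL,
    poll_id         INTEGER NOT NULL,
    status          TEXT NOT NULL DEFAULT 'in_progress'
);
CREATE TABLE poll_session_answer (
    poll_session_id           INTEGER NOT NULL REFERENCES poll_session(poll_session_id),
    question_id               INTEGER NOT NULL,
    free_text_answer          TEXT,
    question_option_id_answer INTEGER,
    PRIMARY KEY (poll_session_id, question_id)  -- one answer per question per session
);
""")


def start_session(user_id, poll_id):
    """Archive any open session for this user and poll, then open a fresh one."""
    conn.execute(
        "UPDATE poll_session SET status = 'archived' "
        "WHERE user_id = ? AND poll_id = ? AND status = 'in_progress'",
        (user_id, poll_id),
    )
    cur = conn.execute(
        "INSERT INTO poll_session (user_id, poll_id) VALUES (?, ?)",
        (user_id, poll_id),
    )
    return cur.lastrowid


def save_answer(session_id, question_id, option_id, text):
    """Insert the answer, or overwrite it if this question was already answered."""
    conn.execute(
        "INSERT OR REPLACE INTO poll_session_answer "
        "(poll_session_id, question_id, free_text_answer, question_option_id_answer) "
        "VALUES (?, ?, ?, ?)",
        (session_id, question_id, text, option_id),
    )


# First attempt: the user answers, then changes their mind.
first = start_session(user_id=1, poll_id=1)
save_answer(first, question_id=1, option_id=2, text="Green is calm.")
save_answer(first, question_id=1, option_id=1, text="Changed my mind: blue.")

# "Removing" the poll for the user: the old session is kept as history.
second = start_session(user_id=1, poll_id=1)
save_answer(second, question_id=1, option_id=4, text="Red this time.")

sessions = conn.execute(
    "SELECT poll_session_id, status FROM poll_session ORDER BY poll_session_id"
).fetchall()
answers = conn.execute(
    "SELECT poll_session_id, question_option_id_answer "
    "FROM poll_session_answer ORDER BY poll_session_id"
).fetchall()
print(sessions, answers)
```

The admin panel would then read only sessions marked as finished, while archived sessions remain available as the user's history.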

Cheapest way of managing relational SQL data [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to find the best method for managing a large set of relational game data.
Let me explain my data structure.
There are three main entities: users, bets and coupons.
+----------------------------------------------------+
| bets |
+----------------------------------------------------+
| id | status | yes | no |
+----+-----------+-----------------+-----------------+
| 1 | 0 | 1.45 | 2.52 |
+----+-----------+-----------------+-----------------+
| 2 | 1 | 3.00 | 1.08 |
+----+-----------+-----------------+-----------------+
| 3 | 2 | 2.43 | 1.42 |
+----+-----------+-----------------+-----------------+
+----------------------------------------------------+
| coupons |
+----------------------------------------------------+
| id | played_by | bets | status |
+----+-----------+-----------------+-----------------+
| 1 | 1 |1,yes;2,no;3,yes;| 0 |
+----+-----------+-----------------+-----------------+
| 2 | 2 |2,yes;3,no;1,no; | 0 |
+----+-----------+-----------------+-----------------+
| 3 | 3 |1,yes;2,no; | 0 |
+----+-----------+-----------------+-----------------+
Information: Every bet has a yes/no choice. Users play bets, and we register them inside coupons. If all bets inside a coupon WIN, the coupon wins and the user gets extra balance. Classic. Please note that there will be many bets (avg. 5 per coupon), many coupons played by users (thousands), and thousands of users.
So I'm trying to find the best method for finalizing bets and checking whether coupons have won or lost.
Method 1 I tried;
We finalized Bet ID: 2 as yes;
Check 2,yes; with "LIKE" operator in coupon, if there is, concat(append) 1 to progress field.
Check how many bets are there inside the coupon.
If count of 1s equals to numbers of bet inside this coupon, set coupon status to WON.
Method 2 I tried;
Finalize bets; YES or NO
Check related coupons with a cron task.
I liked both methods, but I want users to see their progress immediately, so I am not sure about the cron method. Both methods work fine, but I have doubts about what will happen when there are thousands of users.
I hope I have described my issue clearly. I'm looking for comments and suggestions.
Thanks.
Instead of appending a user's bet to a value in a coupon (which is highly inefficient, since you have to use the LIKE operator), it makes more sense to create a table of coupons that stores the ID of the bet each coupon is associated with, the ID of the user it belongs to, and the value of the coupon (YES or NO). So your Coupon table would look like the following:
Coupons
ID BetID UserID Value
1 1 10 YES
2 1 11 NO
Now if you want to acquire all of the coupons associated with Bet #1, you would just do a SELECT * FROM coupons WHERE BetID=1.
If Bet #1 wins, all you would need to do is acquire the value of the bet for the winning choice, and update all of the users who fall under the choice. For example:
# Select the winning value (the payout stored in the column of the winning choice, yes or no):
SELECT <column of winning choice>
FROM bets
WHERE id = <id of completed bet>;
# Update the users whose coupon picked the winning choice:
UPDATE users
SET balance = balance + <winning value>
WHERE id IN (SELECT UserID FROM coupons WHERE BetID = <id of completed bet> AND Value = '<winning choice>');
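Since a coupon in the question holds several bets and only wins when all of them win, one possible extension of this idea is a junction table with one row per (coupon, bet) pair. The sketch below is only an illustration under assumptions: the coupon_bets table, the outcome column on bets and the helper coupon_progress are hypothetical names, and it runs on SQLite through Python's sqlite3 module rather than MySQL. The sample rows mirror the three coupons from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bets (
    id      INTEGER PRIMARY KEY,
    outcome TEXT  -- NULL until finalized, then 'yes' or 'no' (hypothetical column)
);
CREATE TABLE coupons (
    id        INTEGER PRIMARY KEY,
    played_by INTEGER NOT NULL
);
-- One row per bet on a coupon replaces the '1,yes;2,no;' string.
CREATE TABLE coupon_bets (
    coupon_id INTEGER NOT NULL REFERENCES coupons(id),
    bet_id    INTEGER NOT NULL REFERENCES bets(id),
    choice    TEXT NOT NULL CHECK (choice IN ('yes', 'no')),
    PRIMARY KEY (coupon_id, bet_id)
);
CREATE INDEX coupon_bets_by_bet ON coupon_bets (bet_id);

INSERT INTO bets (id) VALUES (1), (2), (3);
INSERT INTO coupons (id, played_by) VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO coupon_bets (coupon_id, bet_id, choice) VALUES
    (1, 1, 'yes'), (1, 2, 'no'), (1, 3, 'yes'),
    (2, 2, 'yes'), (2, 3, 'no'), (2, 1, 'no'),
    (3, 1, 'yes'), (3, 2, 'no');
""")

# Per coupon: how many bets it holds, how many are already won, how many are lost.
PROGRESS_SQL = """
SELECT cb.coupon_id,
       COUNT(*) AS total,
       SUM(CASE WHEN b.outcome = cb.choice THEN 1 ELSE 0 END) AS won,
       SUM(CASE WHEN b.outcome <> cb.choice THEN 1 ELSE 0 END) AS lost
FROM coupon_bets AS cb
JOIN bets AS b ON b.id = cb.bet_id
GROUP BY cb.coupon_id
ORDER BY cb.coupon_id
"""


def coupon_progress():
    """Return {coupon_id: (total, won, lost)} computed on demand, without LIKE or cron."""
    return {cid: (total, won, lost) for cid, total, won, lost in conn.execute(PROGRESS_SQL)}


# Finalize bet 1 as yes and bet 2 as no; bet 3 is still open.
conn.execute("UPDATE bets SET outcome = 'yes' WHERE id = 1")
conn.execute("UPDATE bets SET outcome = 'no' WHERE id = 2")

progress = coupon_progress()
winners = [cid for cid, (total, won, lost) in progress.items() if won == total]
print(progress, winners)
```

Because progress is derived from indexed joins at read time, users can see it immediately after a bet is finalized. Whether to also cache a status on the coupon row is a separate decision that depends on read volume.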

"horizontal" vs. "vertical" table design, SQL

Apologies if this has been covered thoroughly in the past - I've seen some related posts but haven't found anything that satisfies me with regards to this specific scenario.
I've been recently looking over a relatively simple game with around 10k players. In the game you can catch and breed pets that have certain attributes (i.e. wings, horns, manes). There's currently a table in the database that looks something like this:
-------------------------------------------------------------------------------
| pet_id | wings1 | wings1_hex | wings2 | wings2_hex | horns1 | horns1_hex | ...
-------------------------------------------------------------------------------
| 1 | 1 | ffffff | NULL | NULL | 2 | 000000 | ...
| 2 | NULL | NULL | NULL | NULL | NULL | NULL | ...
| 3 | 2 | ff0000 | 1 | ffffff | 3 | 00ff00 | ...
| 4 | NULL | NULL | NULL | NULL | 1 | 0000ff | ...
etc...
The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes. A new attribute is added every 1-2 months which requires table columns to be added. The table is rarely updated and read frequently.
I've been proposing that we move to a more vertical design scheme for better flexibility as we want to start adding larger volumes of attributes in the future, i.e.:
----------------------------------------------------------------
| pet_id | attribute_id | attribute_color | attribute_position |
----------------------------------------------------------------
| 1 | 1 | ffffff | 1 |
| 1 | 3 | 000000 | 2 |
| 3 | 2 | ffffff | 1 |
| 3 | 1 | ff0000 | 2 |
| 3 | 3 | 00ff00 | 3 |
| 4 | 3 | 0000ff | 1 |
etc...
The old developer has raised concerns that this will create performance issues as users very frequently search for pets with specific attributes (i.e. must have these attributes, must have at least one in this colour or position, must have > 30 attributes). Currently the search is quite fast as there are no JOINS required, but introducing a vertical table would presumably mean an additional join for every attribute searched and would also triple the number of rows or so.
The first part of my question is if anyone has any recommendations with regards to this? I'm not particularly experienced with database design or optimisation.
I've run tests for a variety of cases but they've been largely inconclusive - the times vary quite significantly for all of the queries that I ran (i.e. between half a second and 20+ seconds), so I suppose the second part of my question is whether there's a more reliable way of profiling query times than using microtime(true) in PHP.
Thanks.
This is called the Entity-Attribute-Value-Model, and relational database systems are really not suited for it at all.
To quote someone who deems it one of the five errors not to make:
So what are the benefits that are touted for EAV? Well, there are none. Since EAV tables will contain any kind of data, we have to PIVOT the data to a tabular representation, with appropriate columns, in order to make it useful. In many cases, there is middleware or client-side software that does this behind the scenes, thereby providing the illusion to the user that they are dealing with well-designed data.
EAV models have a host of problems.
Firstly, the massive amount of data is, in itself, essentially unmanageable.
Secondly, there is no possible way to define the necessary constraints -- any potential check constraints will have to include extensive hard-coding for appropriate attribute names. Since a single column holds all possible values, the datatype is usually VARCHAR(n).
Thirdly, don't even think about having any useful foreign keys.
Finally, there is the complexity and awkwardness of queries. Some folks consider it a benefit to be able to jam a variety of data into a single table when necessary -- they call it "scalable". In reality, since EAV mixes up data with metadata, it is lot more difficult to manipulate data even for simple requirements.
The solution to the EAV nightmare is simple: Analyze and research the users' needs and identify the data requirements up-front. A relational database maintains the integrity and consistency of data. It is virtually impossible to make a case for designing such a database without well-defined requirements. Period.
The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes.
That looks like a case for normalization: Break the table into multiple, for example one for horns, one for wings, all connected by foreign key to the main entity table. But do make sure that every attribute still maps to one or more columns, so that you can define constraints, data types, indexes, and so on.
Do the join. The database was specifically designed to support joins for your use case. If there is any doubt, then benchmark.
EDIT: A better way to profile the queries is to run them directly in the MySQL interpreter on the CLI. It reports the time the server took to run each query. Timing with PHP's microtime() also includes other latencies (Apache, PHP, server resource allocation, the network if you are connecting to a remote MySQL instance, etc.).
What you are proposing is called 'normalization'. This is exactly what relational databases were made for - if you take care of your indexes, the joins will run almost as fast as if the data were in one table.
Actually, they might even go faster: instead of loading 1 table row with 100 columns, you can just load the columns you need. If a pet only has 8 attributes, you only load those 8.
This question is very subjective. If you have the resources to update the middleware every time a column is added then, by all means, go with horizontal; there is nothing safer and easier to learn than a fixed structure. One thing to remember: any time you update a table's structure you have to update each of its dependencies, unless there is some catch-all like *, which I suggest you stay away from unless you are just dumping data to a screen and the order of columns is irrelevant.
With that said, vertical is the way to go if you don't have all of your requirements in place or don't want to update code in n places. Most of the time you just need storage containers to store data. I would segregate things like numbers, dates, binary, and text into separate columns to preserve some data integrity, but there is nothing wrong with vertical storage, as long as you know how to formulate and structure queries to bring back the data in the appropriate format.
FYI, WordPress uses vertical data storage for the majority of the dynamic content it has to store for the millions of users it has.
First, from a database point of view, your data should grow vertically, not horizontally, so adding a new column is not a good design at all. Second, this is a very common scenario in DB design, and the way to solve it is to create three tables: the first for pets, the second for attributes, and the third a mapping table between these two. Here is an example:
Table 1 (Pet)
Pet_ID | Pet_Name
1 | Dog
2 | Cat
Table 2 (Attribute)
Attribute_ID | Attribute_Name
1 | Wings
2 | Eyes
Table 3 (Pet_Attribute)
Pet_ID | Attribute_ID | Attribute_Value
1 | 1 | 0
1 | 2 | 2
About Performance:
Pet_ID and Attribute_ID are the primary keys, which are indexed (http://developer.mimer.com/documentation/html_92/Mimer_SQL_Engine_DocSet/Basic_concepts4.html), so the search is very fast. This is the right way to solve the problem. I hope it is clear to you now.
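On the concern that a vertical table needs one extra join per searched attribute: searches of the "must have all of these attributes" kind can often be written with a single GROUP BY / HAVING pass instead. The following sketch uses the sample rows from the question's vertical table and runs on SQLite through Python's sqlite3 module. The table name pet_attribute, the composite primary key, the index and the helper functions are assumptions for illustration; real timings on MySQL would still need to be benchmarked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pet_attribute (
    pet_id             INTEGER NOT NULL,
    attribute_id       INTEGER NOT NULL,
    attribute_color    TEXT NOT NULL,
    attribute_position INTEGER NOT NULL,
    PRIMARY KEY (pet_id, attribute_id)  -- assumes one row per attribute per pet
);
-- Supports "which pets have attribute X" lookups.
CREATE INDEX pet_attribute_by_attribute ON pet_attribute (attribute_id, pet_id);

INSERT INTO pet_attribute VALUES
    (1, 1, 'ffffff', 1), (1, 3, '000000', 2),
    (3, 2, 'ffffff', 1), (3, 1, 'ff0000', 2), (3, 3, '00ff00', 3),
    (4, 3, '0000ff', 1);
""")


def pets_with_all(attribute_ids):
    """Pets that have every attribute in attribute_ids, without one join per attribute."""
    ids = sorted(set(attribute_ids))
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT pet_id
        FROM pet_attribute
        WHERE attribute_id IN ({placeholders})
        GROUP BY pet_id
        HAVING COUNT(DISTINCT attribute_id) = ?
        ORDER BY pet_id
    """
    return [row[0] for row in conn.execute(sql, (*ids, len(ids)))]


def pets_with_at_least(n):
    """Pets that have at least n attributes in total."""
    sql = "SELECT pet_id FROM pet_attribute GROUP BY pet_id HAVING COUNT(*) >= ? ORDER BY pet_id"
    return [row[0] for row in conn.execute(sql, (n,))]


wings_and_horns = pets_with_all([1, 3])
all_three = pets_with_all([1, 2, 3])
three_or_more = pets_with_at_least(3)
print(wings_and_horns, all_three, three_or_more)
```

Conditions on colour or position for a specific attribute can be added to the WHERE clause in the same way, as long as each condition is counted once per pet.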

Dynamic survey application logic PHP/MSSQL

Firstly I think this question can be related to any language, but I specified what I was using.
Excuse me if I start to bore also, but I am trying to find out the best way to build a dynamic survey management system.
My client basically has said to me that the data has to be stored in MS SQL as his client has only got MS SQL connector for SAS, which is going to do reporting.
My logic so far is this:
1st. Setup the survey itself, i.e. ask for title, quick overview, etc, etc.
2nd. Define your questions.
3rd. Publish survey.
Now what I have done so far is that when they "publish survey", I have created a dedicated database table for this survey which will house the responses.
From the admin side of this, they will not be able to modify the questions, except maybe the question title. They can't add or remove questions.
Question is, is creating individual database tables a good thing? My only worry really is that say the admin creates like 30 questions, I will have 30 columns in that dedicated table. To go with that, this way might be easy for the SAS system to pull in data for reporting. The administrator will not see the survey responses in the admin panel btw.
I have done something similar for a language grading exam. I opted for a more flexible approach with the following tables
+------+  +-------------+  +-------------+  +-------------+  +----------+
| Exam |  | Question    |  | Choice      |  | Answer      |  | User     |
+------+  +-------------+  +-------------+  +-------------+  +----------+
| id   |  | id          |  | id          |  | id          |  | id       |
| name |  | questionNb  |  | choice      |  | user_id     |  | name     |
+------+  | question    |  | question_id |  | exam_id     |  | email    |
          | exam_id     |  | isAnswer    |  | question_id |  | password |
          +-------------+  +-------------+  | choice_id   |  +----------+
                                            | isGood      |
                                            +-------------+
This model allowed me to easily have a 15-question exam, a 30-question exam and a 50-question exam. To adapt this model for a survey, you might just have to remove the isAnswer and isGood parts and replace the user data with anonymous general data like age, income and sex.
Creating a column for each question is totally wrong. Altering the database at runtime for business-oriented purposes is a "never ever do".
Read something about relational databases. Things should look like this:
table_surveys
    id
    survey_name
table_questions
    id
    fk_survey (foreign key to table_surveys)
    question_text
    (question value? maybe)
table_questions_options
    id
    question_id (foreign key to table_questions)
    option_value (this can be true/false for a test or a numeric value for a survey)
    option_label
table_users
    id
    username
    pass
    name
table_answers
    id
    options_fk (foreign key to table_questions_options)
    users_fk (foreign key to table_users)
This way everything is linked together (with no reuse of options or questions across different surveys).
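If the reporting tool prefers one column per question, the normalized tables can still be pivoted into a wide result at query time instead of creating a table per survey. The sketch below is one possible way to do that, under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than MS SQL, the sample questions and users are invented, and the helper wide_report and the q1, q2 column aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_questions (
    id            INTEGER PRIMARY KEY,
    fk_survey     INTEGER NOT NULL,
    question_text TEXT NOT NULL
);
CREATE TABLE table_questions_options (
    id           INTEGER PRIMARY KEY,
    question_id  INTEGER NOT NULL REFERENCES table_questions(id),
    option_label TEXT NOT NULL
);
CREATE TABLE table_answers (
    id         INTEGER PRIMARY KEY,
    options_fk INTEGER NOT NULL REFERENCES table_questions_options(id),
    users_fk   INTEGER NOT NULL
);

INSERT INTO table_questions VALUES (1, 1, 'Favourite colour?'), (2, 1, 'Own a car?');
INSERT INTO table_questions_options VALUES
    (1, 1, 'Blue'), (2, 1, 'Red'), (3, 2, 'Yes'), (4, 2, 'No');
INSERT INTO table_answers (options_fk, users_fk) VALUES
    (1, 100), (3, 100), (2, 101), (4, 101);
""")


def wide_report(survey_id):
    """Return one row per user with one column per question of the survey."""
    question_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM table_questions WHERE fk_survey = ? ORDER BY id",
            (survey_id,),
        )
    ]
    if not question_ids:
        return []
    # One MAX(CASE ...) expression per question turns answer rows into columns.
    # The ids come from the database and are cast to int before being formatted in.
    columns = ", ".join(
        f"MAX(CASE WHEN o.question_id = {int(qid)} THEN o.option_label END) AS q{int(qid)}"
        for qid in question_ids
    )
    sql = f"""
        SELECT a.users_fk, {columns}
        FROM table_answers AS a
        JOIN table_questions_options AS o ON o.id = a.options_fk
        JOIN table_questions AS q ON q.id = o.question_id
        WHERE q.fk_survey = ?
        GROUP BY a.users_fk
        ORDER BY a.users_fk
    """
    return conn.execute(sql, (survey_id,)).fetchall()


report = wide_report(1)
print(report)
```

The same query could be exposed as a view or exported to a flat table for the reporting side, so the 30-column layout exists only as output and never as the storage schema.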
According to the comments in the documentation, MS SQL Support in PHP is iffy at best. Is PHP the only language you are allowed to use for the project? If not, you might want to consider using C#, VB.Net or something more compatible with SQL Server. Otherwise, you could initially store the data in MySQL, and export it to MS SQL Server when you needed to do analysis.
I don't know if I really understand your question, but I once built such a survey system. It came together pretty quickly and easily with roughly the following tables (if I remember right):
USER, SURVEYS, QUESTIONS, ANSWERS, [some mapping tables]
SAS will fetch the data from virtually any table. If everything is in one or two tables, it will be even easier.
With all due respect to Kibbee, PHP/MSSQL support is actually VERY good. We do it quite often, and the performance bests PHP/MySQL and matches compiled C#/MSSQL (in our very limited and unscientific testing). This is assuming you're running PHP on a Win machine. Running PHP with a TLS connector to a separate MSSQL box is another ball of wax and can be a pain to configure.
Anyway, we had a similar scenario and went with one table to manage forms (Forms w/ FormID as the primary), another to manage fields/questions (Fields w/FieldID, FieldType such as Y/N, text, select, etc.), and another to "assign" a field to a form (FormFields w/ FormFieldID, FormID, FieldID, parameters in an array for select items, etc.). Then yet another set of tables to deal with the answering of the questions.
I agree with the rest of the group. Make sure to normalize and don't create a separate column for each question. It'll be more work initially, but you'll appreciate it when you simply have to add a few rows to a table instead of re-writing your queries and re-designing your tables.
