Description:
I am building a rating system with mysql/php. I am confused as to how I would set up the database.
Here is my article setup:
Article table:
id | user_id | title | body | date_posted
This is my assumed rating table:
Rating table:
id | article_id | score | ? user_id ?
Problem:
I don't know if I should place the user_id in the rating table. My plan is to use a query like this:
SELECT ... WHERE user_id = 1 AND article_id = 10
But I know that it's redundant data as it stores the user_id twice. Should I figure out a JOIN on the tables or is the structure good as is?
It depends. I'm assuming that the articles are unique to individual users? In that case, I would retain the user_id in your rating table and then just alter your query to:
SELECT ... WHERE article_id = 10
or
SELECT ... WHERE user_id = 1
Depending on what info you're trying to pull.
You're not "storing the user_id twice" so much as using the user_id to link the article to unique data associated to the user in another table. You're taking the right approach, except in your query.
I don't see anything wrong with this approach. Storing the user id twice is not a problem here, since one copy belongs to a rating entry and the other, I assume, identifies the article's owner.
The benefit of doing it this way is that you can prevent a user from recording multiple scores for the same article by making the combination of article_id and user_id unique, and then use REPLACE INTO to manage scoring.
There is more to elaborate on, depending on whether this rating system needs to be smart enough to prevent gaming, how large the user base is, and so on.
For typical use, I would bet this setup holds up fine even in a relatively large-scale system.
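To make the unique-key idea concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also accepts REPLACE INTO); the table and column names follow the question, while the rate() helper and the sample scores are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rating (
        id         INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL,
        user_id    INTEGER NOT NULL,
        score      INTEGER NOT NULL,
        UNIQUE (article_id, user_id)   -- one rating per user per article
    )
""")

def rate(article_id, user_id, score):
    """Hypothetical helper: record a score, overwriting any earlier one."""
    # REPLACE INTO removes the row that collides on the unique key, then inserts.
    conn.execute(
        "REPLACE INTO rating (article_id, user_id, score) VALUES (?, ?, ?)",
        (article_id, user_id, score),
    )

rate(10, 1, 3)
rate(10, 1, 5)   # the same user re-rates the same article
rate(10, 2, 4)

rows = conn.execute(
    "SELECT user_id, score FROM rating WHERE article_id = 10 ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 5), (2, 4)]
```

Note that REPLACE deletes and re-inserts the row, so the auto-generated id changes on every re-rating. If that matters, MySQL's INSERT ... ON DUPLICATE KEY UPDATE is the usual alternative.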
... semi irrelevant:
Just FYI, depending on the importance and gaming aspects of this score, you could use STDDEV() to get the standard deviation of the score column and take it into account alongside the average...
SELECT STDDEV(`score`) FROM `rating` WHERE `article_id` = {article_id}
That would factor outliers supposing you cared whether or not it looked like people were ganging up on a particular article to shoot it down or praise it without valid cause.
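As a rough illustration of how the standard deviation could be used, here is a sketch in plain Python. The scores and the two-standard-deviation cutoff are made-up assumptions; statistics.pstdev computes the population standard deviation, which is also what MySQL's STDDEV() returns.

```python
import statistics

# Assumption: made-up scores for a single article.
scores = [5, 4, 5, 4, 1, 5]

mean = statistics.mean(scores)
spread = statistics.pstdev(scores)  # population standard deviation

# Flag scores far from the mean; the 2x cutoff is an arbitrary choice.
outliers = [s for s in scores if spread and abs(s - mean) > 2 * spread]

print(mean, round(spread, 3), outliers)
```

What to do with flagged scores (ignore them, weight them down, review them by hand) is a policy decision the query alone cannot make.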
You should not, because of third normal form: you need to keep the tables independent.
"The third normal form (3NF) is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971.[1] Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:
The relation R (table) is in second normal form (2NF)
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R."
Source here: http://en.wikipedia.org/wiki/Third_normal_form
First normal Form: http://en.wikipedia.org/wiki/First_normal_form
Second normal Form: http://en.wikipedia.org/wiki/Second_normal_form
You should take a look at normalization and the E/R model; it will help you a lot.
Normalization on Wikipedia: http://en.wikipedia.org/wiki/Database_normalization
Related
For an MySQL table I am using the InnoDB engine and the structure of my tables looks like this:
Table user
id | username | etc...
----|------------|--------
1 | bruce | ...
2 | clark | ...
3 | tony | ...
Table user-emails
id | person_id | email
----|-------------|---------
1 | 1 | bruce#wayne-ent.com
2 | 1 | ceo#wayne-ent.com
3 | 2 | clark.k#daily-planet.com
To fetch data from the database I've written a tiny framework. For example, __construct($id) checks whether there is a person with the given id; if so, it creates the corresponding model and saves only the id field to an array. At runtime, if I need another field from the model, it fetches just that value from the database, saves it to the array and returns it. The same goes for the emails field: my code queries the user-emails table and gets all the emails for the corresponding user.
For small models this works all right, but now I am working on another project where I have to fetch a lot of data at once for a list, and that takes some time. I also know that many connections to MySQL and many queries put quite a load on the server, so..
My question now is: Should I fetch all data at once (with left joins etc.) while constructing the model and save the fields as an array or should I use some other method?
Why do people insist on referring to the entities and domain objects as "models"?
Unless your entities are extremely large, I would populate the entire entity, when you need it. And, if "email list" is part of that entity, I would populate that too.
As I see it, the question is more related to "what to do with tables, that are related by foreign keys".
Let's say you have Users and Articles tables, where each article has a specific owner associated by a user_id foreign key. In this case, when populating the Article entity, I would only retrieve the user_id value instead of pulling in all the information about the user.
But in your example with Users and UserEmails, the emails seem to be a part of the User entity, and something that you would often call via $user->getEmailList().
TL;DR
I would do this in two queries, when populating User entity:
select all you need from Users table and apply to User entity
select all user's emails from the UserEmails table and apply it to User entity.
P.S
You might want to look at data mapper pattern for "how" part.
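The two-query approach from the TL;DR could look roughly like the sketch below. It assumes Python's sqlite3 module in place of MySQL/PHP; the User class, the load_user() function, the underscore table names and the example.com addresses are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class User:
    """Hypothetical entity; the email list is part of it."""
    id: int
    username: str
    emails: list = field(default_factory=list)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE user_emails (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES users(id),
        email     TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'bruce'), (2, 'clark'), (3, 'tony');
    INSERT INTO user_emails (person_id, email) VALUES
        (1, 'bruce@example.com'), (1, 'ceo@example.com'), (2, 'clark@example.com');
""")

def load_user(conn, user_id):
    """Populate the whole entity with exactly two queries."""
    # Query 1: every scalar field of the user in a single row lookup.
    row = conn.execute(
        "SELECT id, username FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    user = User(id=row[0], username=row[1])
    # Query 2: all of the user's emails at once.
    user.emails = [email for (email,) in conn.execute(
        "SELECT email FROM user_emails WHERE person_id = ? ORDER BY id", (user_id,)
    )]
    return user

bruce = load_user(conn, 1)
print(bruce)
```

In a data mapper setup, load_user() would live in the mapper and the User class would know nothing about SQL.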
In my opinion you should fetch all your fields at once, and divide queries in a way that makes your code easier to read/manage.
When we're talking about one query or two, the difference is usually negligible unless the combined query (with JOINs or whatever) is overly complex. Usually an index or two is the solution to a very slow query.
If we're talking about one vs hundreds or thousands of queries, that's when the connection/transmission overhead becomes more significant, and reducing the number of queries can make an impact.
It seems that your framework suffers from premature optimization. You are hyper-concerned about fetching too many fields from a row, but why? Do you have thousands of columns or something?
The time consuming part of your query is almost always the lookup, not the transmission of data. You are causing the database to do the "hard" part over and over again as you pull one field at a time.
I want to improve the speed of a notification board. It retrieves data from the event table.
At this moment the events MySQL table looks like this
id | event_type | who_added_id | date
In the event table I store one row with information regarding a particular event. Each time user A asks for new notifications, the query runs through the table and checks whether the notifications added by user B are relevant to A (they have to be friends, members of the same groups, have previously chatted).
The events table has become big, and because of the bulky query the page loads slowly.
I'm thinking of changing this design entirely: instead of adding one event row and then checking whether the event is relevant to each user, I would add one row per interested user. I would change the events table structure as follows:
id | event_type | who_added_id | forwho_id | date
Now, if user B creates an event that interests 50 other members, I create 50 rows with the same information, and the 'forwho_id' field of each row names one of the 50 members who must get this notification.
I think the query will become much simpler and take less time to run.
What do you think:
1. Is this a good approach to storing this kind of data, or should we avoid duplicate data at any cost?
2. How will the events table behave if the number of interested users is not 50 but hundreds?
Thank you for reading this and I hope I made myself understandable.
Duplicated data is not "bad", and it's not to be "avoided at all cost".
What is "bad" is uncontrolled redundancy, and the kind of problems that come up when the logical data model isn't third normal form. It is acceptable and expected that an implementation will deviate from a logical data model, and introduce redundancy for performance.
Your revised design looks appropriate for your needs.
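A minimal sketch of the revised design, assuming Python's sqlite3 module in place of MySQL. The column names come from the question; the index, the function names and the sample events are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        id           INTEGER PRIMARY KEY,
        event_type   TEXT NOT NULL,
        who_added_id INTEGER NOT NULL,
        forwho_id    INTEGER NOT NULL,
        date         TEXT NOT NULL
    );
    -- Assumed index: lets "notifications for user X, newest first" avoid a full scan.
    CREATE INDEX idx_events_forwho_date ON events (forwho_id, date);
""")

def publish_event(conn, event_type, who_added_id, recipient_ids, date):
    """Write one row per interested user (the relevance check happens once, here)."""
    conn.executemany(
        "INSERT INTO events (event_type, who_added_id, forwho_id, date) "
        "VALUES (?, ?, ?, ?)",
        [(event_type, who_added_id, rid, date) for rid in recipient_ids],
    )

def notifications_for(conn, user_id, limit=20):
    """Reading is now a plain indexed lookup with no friendship/group joins."""
    return conn.execute(
        "SELECT event_type, who_added_id, date FROM events "
        "WHERE forwho_id = ? ORDER BY date DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()

publish_event(conn, "photo_added", 2, [1, 3, 4], "2024-01-01 10:00:00")
publish_event(conn, "comment", 3, [1], "2024-01-02 09:00:00")

inbox = notifications_for(conn, 1)
print(inbox)
```

The trade-off is more rows and more work at write time in exchange for a cheap read. With hundreds of recipients per event, the row count grows accordingly, so old notifications may need pruning.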
I don't even know if "serialized column" is the right name for it, so let me explain. I have a table for users, and I want to store each user's phone numbers (cellphone, home, office, etc.). I was thinking of making a column for each number type, but then another idea came to mind: what if I save a JSON string in a single column? That way I would never have a column that will probably never be used, and I can turn the string into a PHP array when reading the data from the database. I would like to hear the pros and cons of this practice. Maybe it is just a bad idea, but first I want to know what other people have to say about it.
Thanks
Short Answer, Multiple columns.
Long Answer:
For the love of all that is holy in the world, please do not store multiple data sets in a single text column.
I am assuming you will have a table that will either be
+------------------------------+ +----------------------+
| User | cell | office | home | OR | User | JSON String |
+------------------------------+ +----------------------+
First I will say that neither of these is the best solution, but if you have to pick between the two, the first is better. There are a couple of reasons, but mainly the ability to modify and query specific values is really important. Think about the algorithm to modify the second option.
SELECT `JSON` FROM `table` WHERE `User` = ?
Then you have to do a search and replace in either your server-side or client-side language.
Finally you have to write the JSON string back.
This solution totals two queries and a search-and-replace algorithm. No good!
Now think about the first solution.
SELECT * FROM `table` WHERE `User` = ?
Then you can do a simple JSON encode to send it down
To modify you only need one Query.
UPDATE `table` SET `cell` = ? WHERE `User` = ?
To update more than one, it's again a single simple query:
UPDATE `table` SET `cell` = ?, `home` = ? WHERE `User` = ?
This is clearly better, but it is not the best.
There is a third solution. Say you want a user to be able to insert an unlimited number of phone numbers.
Let's use a relation table for that, so now you have two tables.
+---------+      +-------------------------------------+
| Users   |      | Phone                               |
+---------+      +-------------------------------------+
| U_name  |      | user_name | phone_number | type     |
+---------+      +-------------------------------------+
Now you can query all the phone numbers of a user via a join, with something like this:
SELECT Users.*, Phone.* FROM Phone, Users WHERE Phone.user_name = ? AND Users.U_name = ?
Inserts are just as easy and type checking is easy too.
Remember this is a simple example, but SQL really gives your data structure a ton of power; you should use it rather than avoid it.
I would only do this with non-essential data, for example, the user's favorite color, favorite type of marsupial (obviously 'non-essential' is for you to decide). The problem with doing this for essential data (phone number, username, email, first name, last name, etc) is that you limit yourself to what you can accomplish with the database. These include indexing fields, using ORDER BY clauses, or even searching for a specific piece of data. If later on you realize you need to perform any of these tasks it's going to be a major headache.
Your best bet in this situation is using a relational table for one-to-many objects, e.g. UserPhoneNumbers. It would have 3 columns: user_id, phone_number, and type. The user_id lets you link the rows in this table to the appropriate User table row, the phone_number is self-explanatory, and the type could be 'home', 'cell', 'office', etc. This lets you still perform the tasks I mentioned above, and it also has the added benefit of not wasting space on empty columns, as you only add rows to this table as you need to.
I don't know how familiar you are with MySQL, but if you haven't heard of database normalization and query JOINs, now is a good time to start reading up on them :)
Hope this helps.
If you work with JSON, there are more elegant options than MySQL. I would recommend using either another database that works better with JSON, like MongoDB, or a wrapper for SQL like Persevere, http://www.persvr.org/Documentation (see "Perstore")
I'm not sure what the advantages of this approach would be. You say "so, i will never have a column that probably will never be used..." What I think you meant was (in your system) that sometimes a user may not have a value for each type of phone number available, and that being the case, why store records with empty columns?
Storing records with some empty columns is not necessarily bad. However, if you wanted to normalize your database, you could have a separate table for user_phonenumber, and create a 1:many relationship between user and user_phonenumber records. The user_phonenumber table would basically have four columns:
id (primary key)
userid (foreign key to user table)
type (e.g. cellphone, home, office, etc.)
value (the phone number)
Constraints would be that id is a primary key, userid is a foreign key for user.id, and type would be an enum (of all possible phone number types).
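Here is a minimal sketch of that table, assuming Python's sqlite3 module rather than MySQL. SQLite has no ENUM type, so a CHECK constraint stands in for it; the add_phone() helper, the name column on user and the sample numbers are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_phonenumber (
        id     INTEGER PRIMARY KEY,
        userid INTEGER NOT NULL REFERENCES user(id),
        -- CHECK plays the role of MySQL's ENUM here
        type   TEXT NOT NULL CHECK (type IN ('cellphone', 'home', 'office')),
        value  TEXT NOT NULL
    );
    INSERT INTO user (id, name) VALUES (1, 'alice');
""")

def add_phone(userid, type_, value):
    """Hypothetical helper: insert one number, return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO user_phonenumber (userid, type, value) VALUES (?, ?, ?)",
            (userid, type_, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    add_phone(1, "cellphone", "555-0100"),
    add_phone(1, "office", "555-0101"),
    add_phone(1, "fax", "555-0102"),    # rejected: not an allowed type
    add_phone(99, "home", "555-0103"),  # rejected: no such user
]

numbers = conn.execute(
    "SELECT type, value FROM user_phonenumber WHERE userid = 1 ORDER BY type"
).fetchall()
print(results, numbers)
```

A user with no office phone simply has no office row, so nothing is stored for numbers that do not exist.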
Soon I'll be working on a catalog (PHP + MySQL) that will support multilingual content, and I'm now considering the best approach to designing the database structure. At the moment I see 3 ways of handling multiple languages:
1) Having separate tables for each language's data, i.e. schematically it'll look like this:
There will be one table, Main_Content_Items, storing basic data that cannot be translated, like ID, creation_date, hits, votes and so on. There will be only one such table, and it will serve all languages.
And here are the tables that will be duplicated for each language:
Common_Data_LANG table (example: common_data_en_us) (storing common/"static" fields that can be translated and are present for any catalog item: title, desc and so on...)
Extra_Fields_Data_LANG table (storing extra fields data that can be translated, but can be different for custom item groups, i.e. like: | id | item_id | field_type | value | ...)
Then on an item request we will look in the table for the user's/default language and join the translatable data with the main_content table.
Pros:
we can update "main" data (i.e. hits, votes...), which is updated most often, with only one query
we don't need to duplicate data 4x or more if we have 4 or more languages, in comparison with a structure using only one table with a 'lang' field. So MySQL queries would take less time to go through a catalog of 100000 (for example) records rather than 400000 or more
Cons:
+2 tables for each language
2) Using 'lang' field in content tables:
Main_Content_Items table (storing basic data that cannot be translated like ID, creation_date, hits, votes and so on...)
Common_Data table (storing common/"static" fields that can be translated and are present for any catalog item: | id | item_id | lang | title | desc | and so on...)
Extra_Fields_Data table (storing extra fields data that can be translated, but can be different for custom item groups, i.e. like: | id | item_id | lang | field_type | value | ...)
So we'll join common_data and extra_fields to main_content_items according to the 'lang' field.
Pros:
we can update "main" data (i.e. hits, votes...), which is updated most often, with only one query
we have only 3 tables for content data
Cons:
we have common_data and extra_fields tables filled with data for all languages, so they are X times bigger and queries run slower
3) Same as 2nd way, but with Main_Content_Items table merged with Common_Data, that has 'lang' field:
Pros:
...?
Cons:
we need to update "main" data (i.e. hits, votes...), which is updated most often, for every language
we have common_data and extra_fields tables filled with data for all languages, so they are X times bigger and queries run slower
Will be glad to hear suggestions about "what is better" and "why"? Or are there better ways?
Thanks in advance...
I've given a similar answer in this question and highlighted the advantages of this technique (it would be, for example, important for me to let the application decide on the language and build the query accordingly by only changing the lang parameter in the WHERE clause of the SQL query).
This gets pretty close to your second solution. I didn't quite get the "extra_fields" part, but if it makes sense, you could(!) merge it into the common_data table. I would advise against the first idea, since there will be too many tables and it can be easy to lose track of the items in there.
To your edit: I still consider the second approach the better one (it's my opinion, so it's relative ;)). I'm no expert on optimization, but I think that with proper indexes and a proper table structure speed should not be a problem. As always, the best way to find the most effective approach is to try both methods and see which is better, since speed will vary with data, structure, ....
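A small sketch of the second approach, assuming Python's sqlite3 module in place of MySQL. The table names follow the question, but the columns are trimmed down, and the get_item() helper, the UNIQUE constraint and the sample titles are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_content_items (
        id            INTEGER PRIMARY KEY,
        creation_date TEXT NOT NULL,
        hits          INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE common_data (
        id      INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL REFERENCES main_content_items(id),
        lang    TEXT NOT NULL,
        title   TEXT NOT NULL,
        UNIQUE (item_id, lang)  -- one translation per item per language; also an index
    );
    INSERT INTO main_content_items (id, creation_date) VALUES (1, '2024-01-01');
    INSERT INTO common_data (item_id, lang, title) VALUES
        (1, 'en_us', 'Red chair'),
        (1, 'de_de', 'Roter Stuhl');
""")

def get_item(item_id, lang):
    """Same query for every language; only the lang parameter changes."""
    return conn.execute(
        "SELECT m.id, m.hits, c.title "
        "FROM main_content_items m "
        "JOIN common_data c ON c.item_id = m.id "
        "WHERE m.id = ? AND c.lang = ?",
        (item_id, lang),
    ).fetchone()

# Untranslatable data is updated once, regardless of how many languages exist.
conn.execute("UPDATE main_content_items SET hits = hits + 1 WHERE id = 1")

english = get_item(1, "en_us")
german = get_item(1, "de_de")
missing = get_item(1, "fr_fr")
print(english, german, missing)
```

Because (item_id, lang) is indexed, a lookup touches only the rows for the requested language, which is why the larger table should not by itself make queries slow.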
I am a new php and mysql programmer. I am handling quite large amount of data, and in future it will grow slowly, thus I am using hash table. I have couple of questions:
1. Does MySQL have a built-in hash table function? If yes, how do I use it?
2. After a couple of days researching hash tables, I roughly know what a hash table is, but I just could not understand how to start creating one. I saw a lot of hash table code on the internet. In most of it, the first step is to create a hashtable class. Does that mean they store the hash table values in a temporary table instead of inserting them into the MySQL database?
For questions 3, 4 & 5, example scenario:
Users can collect items on the website. I would like to use a hash table to insert and retrieve the items that a user has collected.
3. [Important] What would a possible MySQL database structure look like?
e.g. create items and users tables
in items table have: item_id, item_name, and item_hash_value
in users table have: user_id, username, item_name, item_hash_value
I am not sure if the users table is correct?
4. [Important] What are the steps for creating a hash table in PHP and MySQL?
(If there is any sample code, that would be great :))
5. [Important] How do I insert and retrieve data from a hash table? I am talking about PHP and MySQL, so I hope the answers can be like: "you can use mysql query i.e SELECT * from blabla..."
(sorry about the italics, underscores can trigger them but I can't find a good way to disable that in the middle of a paragraph. Ignore the italics, I didn't mean to put them there)
You don't need to worry about using a hashtable with MySQL. If you intend to have a large number of items in memory while you operate on them a hashtable is a good data structure to use since it can find things much faster than a simple list.
But at the database level, you don't need to worry about the hashtable. Figuring out how to best hold and access records is MySQL's job, so as long as you give it the correct information it will be happy.
Database Structure
items table would be: item_id, item_name
Primary key is item_id
users table would be: user_id, username
Primary key is user_id
user_items table would be: user_id, item_id
Primary key is the combination of user_id and item_id
Index on item_id
Each item gets one (and only one) entry in the items table. Each user gets one (and only one) entry in the users table. When a user selects an item, it goes in the user items table. Example:
Users:
1 | Bob
2 | Alice
3 | Robert
Items
1 | Headphones
2 | Computer
3 | Beanie Baby
So if Bob has selected the headphones and Robert has selected the computer and beanie baby, the user_items table would look like this:
User_items (user_id, item_id)
1 | 1 (This shows Bob (user 1) selected headphones (item 1))
3 | 2 (This shows Robert (user 3) selected a computer (item 2))
3 | 3 (This shows Robert (user 3) selected a beanie baby (item 3))
Since the user_id and item_id on the users and items tables are primary keys, MySQL will let you access them very fast, just like a hashmap. On the user_items table having both the user_id and item_id in the primary key means you won't have duplicates and you should be able to get fast access (an index on item_id wouldn't hurt).
Example Queries
With this setup, it's really easy to find out what you want to know. Here are some examples:
Who has selected item 2?
SELECT users.user_id, users.username FROM users, user_items
WHERE users.user_id = user_items.user_id AND user_items.item_id = 2
How many things has Robert selected?
SELECT COUNT(user_items.item_id) FROM user_items, users
WHERE users.user_id = user_items.user_id AND users.username = 'Robert'
I want a list of each user and what they've selected, ordered by the user name
SELECT users.username, items.item_name FROM users, items, user_items
WHERE users.user_id = user_items.user_id AND items.item_id = user_items.item_id
ORDER BY username, item_name
There are many guides to SQL on the internet, such as the W3C's tutorial.
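The structure and example queries above can be tried end to end with the sketch below. It assumes Python's sqlite3 module in place of PHP and MySQL, and it writes the joins with explicit JOIN ... ON syntax; the variable names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_name TEXT NOT NULL);
    CREATE TABLE user_items (
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        item_id INTEGER NOT NULL REFERENCES items(item_id),
        PRIMARY KEY (user_id, item_id)   -- no duplicate selections
    );
    CREATE INDEX idx_user_items_item ON user_items (item_id);

    INSERT INTO users VALUES (1, 'Bob'), (2, 'Alice'), (3, 'Robert');
    INSERT INTO items VALUES (1, 'Headphones'), (2, 'Computer'), (3, 'Beanie Baby');
    INSERT INTO user_items VALUES (1, 1), (3, 2), (3, 3);
""")

# Who has selected item 2?
who_has_item_2 = conn.execute(
    "SELECT u.user_id, u.username FROM users u "
    "JOIN user_items ui ON ui.user_id = u.user_id "
    "WHERE ui.item_id = 2"
).fetchall()

# How many things has Robert selected?
robert_count = conn.execute(
    "SELECT COUNT(ui.item_id) FROM user_items ui "
    "JOIN users u ON u.user_id = ui.user_id "
    "WHERE u.username = 'Robert'"
).fetchone()[0]

# Each user and what they've selected, ordered by user name.
listing = conn.execute(
    "SELECT u.username, i.item_name FROM users u "
    "JOIN user_items ui ON ui.user_id = u.user_id "
    "JOIN items i ON i.item_id = ui.item_id "
    "ORDER BY u.username, i.item_name"
).fetchall()

print(who_has_item_2, robert_count, listing)
```

Alice does not appear in the listing because she has no rows in user_items; a LEFT JOIN from users would include her with an empty item.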
1) Hashtables do exist in MySQL but are used to keep internal track of keys on tables.
2) Hashtables work by hashing a data cell to create a number of different keys that separate the data by these keys making it easier to search through. The hashtable is used to find what the key is that should be used to bring up the correct list to search through.
Example: you have 100 items, and searching 100 items in a row takes 10 seconds. Suppose you know that they can be separated by type of item, and you break them up into 25 t-shirts, 25 clocks, 25 watches, and 25 shoes. Then when you need to find a t-shirt, you only have to search through the 25 t-shirts, which takes 2.5 seconds.
3) Not sure what your question means, a MySQL database is a binary file that contains all the rows in the database.
4) As in #2 you would need to decide what you want your key to be.
5) #2 you need to know what your key is.
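The bucketing idea from point 2 can be sketched in a few lines of Python. The item names and both functions are made up, and the example groups by a hand-picked key (the item type) rather than by a computed hash value, which is what a real hash table would use.

```python
from collections import defaultdict

# Assumption: 100 made-up items, 25 of each type.
kinds = ["t-shirt", "clock", "watch", "shoe"]
items = [(kind, f"{kind} {n}") for kind in kinds for n in range(25)]

def build_index(items):
    """Group item names into buckets keyed by item type."""
    buckets = defaultdict(list)
    for kind, name in items:
        buckets[kind].append(name)
    return buckets

def find(buckets, kind, name):
    """Search only the bucket for this key; also report how many items were candidates."""
    candidates = buckets.get(kind, [])
    return name in candidates, len(candidates)

index = build_index(items)
found, scanned = find(index, "clock", "clock 7")
print(found, scanned, len(items))  # True 25 100
```

Python's dict (and PHP's array) is already a hash table internally, which is why neither language needs a hand-written hashtable class for this.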
If you think a hash table is the right way to store your data, you may want to use a key-value database like CouchDB instead of MySQL. They show you how to get started with PHP.
I am a new php and mysql programmer. I am handling quite large amount of data, and in future it will grow slowly, thus I am using hash table.
Looking at your original purpose, use "memcache" instead. It is the most scalable solution and requires minimal changes to your code; you can add memcache servers as your data grows larger and larger.