I have a table called users and a table called pages. Users of the system can subscribe to a page and receive updates about the page. My problem is that users and pages will be updated dynamically (i.e. no manual intervention to the tables) and I don't want to keep adding another column every time someone subscribes to a page.
How can I achieve updating both the users table and the pages table dynamically to reflect that they have subscribed to that page?
My idea would be to add a comma-separated list of usernames to the pages table and update it as users subscribe/unsubscribe.
Just making it an official answer:
While the initial hunch may be to use comma-separated values to represent the link between those 2 tables (or any other way of saving the data in one column, like saving a JSON string), it is actually bad practice because it does not conform to First Normal Form (and definitely not 2nd and 3rd).
First Normal Form - Wikipedia
First Normal Form says you should never store more than 1 value in 1 table cell.
The problem, in short, starts when you need to use that data, which actually takes at least 2 actions - 1 is reading the data from the database and the 2nd is parsing it in your application code. Imagine what happens when you then need to use that data to read some other data from the database - you are making more SQL queries than you need and taking at least twice the time (+resources). It becomes even more complicated when you need to use JOIN queries or have other one-to-many data relationships.
The solution then is simple - you need to create a 3rd table that serves as an intermediate table.
You can call it users_pages or user2pages; it represents the many-to-many relationship between users and pages, with each row linking 1 user to 1 page.
The structure of the table is as simple as:
users_pages
-----------
-- id // a unique id for the relationship, can be auto generated
-- user_id // the user id
-- page_id // the page id
-----------
This allows you to build a more robust application, as well as run advanced queries and calculations without needing to parse the data in your script (e.g. count the number of pages each user is subscribed to, or the number of users subscribed to a single page).
Unsubscribing is also much easier this way, since you don't need to read the users or pages table at all. You simply delete the relation from the users_pages table.
Without it, you will need to (a) first read the users table (b) get the pages data comma separated (c) parse the data and remove the specific page from it (d) save the new data again to the database. That's 4 actions and 2 SQL queries...
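With the junction table in place, subscribing, unsubscribing and counting each become a single, index-friendly statement. A rough sketch (the IDs are made up for illustration):
INSERT INTO users_pages (user_id, page_id) VALUES (42, 7);   -- subscribe user 42 to page 7
DELETE FROM users_pages WHERE user_id = 42 AND page_id = 7;  -- unsubscribe
SELECT COUNT(*) FROM users_pages WHERE user_id = 42;         -- pages user 42 follows
SELECT COUNT(*) FROM users_pages WHERE page_id = 7;          -- subscribers of page 7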
I hope this helps!
I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.
Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes. But on one page I only need the information that is the newest (not the whole history of data). I achieve this by a simple MAX(date) in my query.
Now I'm wondering if it wouldn't be better to make a separate table that just stores the latest data so that the query doesn't have to search for the latest data from a specific user between millions of rows but instead just has a table with only the latest data from every user.
The con here would be that I have to run 2 queries to insert the latest history in my database every 5 minutes, i.e. insert the new data in the history table and update the data in the latest history table.
The pro would be that MySQL has a lot less data to go through.
What are common ways to handle this kind of issue?
There are a number of ways to handle slow queries in large tables. The three most basic ways are:
1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.
For example, if you're querying something like: select max(active_date) from activity where user_id=?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.
CREATE INDEX idx_user ON activity (user_id)
2: Use summary/"cache" tables. This is what you have suggested. In your case, you could apply an insert trigger to your activity table, which will update your summary table whenever a new row gets inserted. This means you won't need your code to execute two queries. For example:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
UPDATE activity_summary SET last_active_date=new.active_date WHERE user_id=new.user_id
You can change that to check if a row exists for the user already and do an insert if it is their first activity. Or you can insert a row into the summary table when a user registers...Or whatever.
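One way to fold that existence check into the trigger itself is an INSERT ... ON DUPLICATE KEY UPDATE. This is only a sketch, replacing the trigger above, and it assumes user_id is the primary (or a unique) key of activity_summary:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
INSERT INTO activity_summary (user_id, last_active_date)
VALUES (new.user_id, new.active_date)
ON DUPLICATE KEY UPDATE last_active_date = new.active_date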
3: Review the query! Use MySQL's EXPLAIN command to grab a query plan and see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necessary).
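For the example query above, that would look something like this (the user id is made up):
EXPLAIN SELECT MAX(active_date) FROM activity WHERE user_id = 42;
In the output, check the type and key columns: a ref access using idx_user is what you want, while ALL means a full table scan.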
I have a MySQL database that stores user emails and news articles that my service provides. I want users to be able to save/bookmark articles they would like to read later.
My plan for accomplishing this was to have a column, in the table where I store the users' emails, that holds comma-delimited strings of unique IDs, where the unique IDs are values assigned to each article as it is added to the database. These articles are stored in a separate table, and I use UUID_SHORT() to generate the unique IDs of type BIGINT.
For example, let's say in the table where I store my articles, I have
ArticleID OtherColumn
4419350002044764160 other stuff
4419351050184556544 other stuff
In the table where I store user data, I would have
UserEmail ArticlesSaved OtherColumn
example1@email.com 4419350002044764160,4419351050184556544,... other stuff
example2@email.com 4419350002044764160,4419351050184556544,... other stuff
to indicate the first two users have saved the articles with IDs 4419350002044764160 and 4419351050184556544.
Is this a proper way to store something like this on a database? If there is a better method, could someone explain it please?
One other option I was thinking of was having a separate table for each user, where I could store the IDs of the articles they saved in a column, though the answer to this post explains that this is not very efficient: Database efficiency - table per user vs. table of users
I would suggest one table for the user and one table for his/her bookmarked articles.
USERS
id - int autoincrement
user_email - varchar(50)
PREFERENCES
id - int autoincrement
article_index - (whatever datatype you find appropriate for your structure)
id_user - int
This way it will be easy for a user to bookmark and unbookmark an article. Connecting the two tables is done with id in users and id_user in preferences. Make sure that each row in the preferences/bookmarks table is one article (don't do anything comma separated). Doing it this way will save you much time and many complications - I promise!
A typical query to fetch a user's bookmarked articles would look something like this:
SELECT u.id, p.article_index, p.id_user FROM users u
LEFT JOIN preferences p ON u.id = p.id_user
WHERE u.id = 1 -- user id goes here; make sure it's an int and apply appropriate security (e.g. prepared statements) to your queries.
"Proper" is a squirrely word, but the approach you suggest is pretty flawed. The resulting database no longer satisfies even first normal form, and that predicts practical problems even if you don't immediately see them. Some of the problems you would be likely to encounter are
the number of articles each user can "save" will be limited by the data type of the ArticlesSaved column;
you will have issues around duplicate "saved" article IDs; and
queries about which articles are saved will be more difficult to formulate and will probably run slower; in part because
you cannot meaningfully index the ArticlesSaved column.
The usual way to model a many-to-many relationship (such as between users and articles) is via a separate table. In this case, such a table would have one row for each (user, saved article) pair.
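A sketch of what that could look like here (the table name is made up, and in practice you would likely key on a numeric user id rather than the email):
CREATE TABLE user_saved_articles (
  UserEmail VARCHAR(255) NOT NULL,
  ArticleID BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (UserEmail, ArticleID)  -- also prevents duplicate "saves"
);
SELECT ArticleID FROM user_saved_articles WHERE UserEmail = 'example1@email.com';  -- all articles one user saved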
Saving data in CSV format in a database field is (almost) never a good idea. You should have 3 tables:
1 table describing users, with everything that directly concerns the user
1 table describing articles, with data about them
1 table with 2 columns, "userid" and "articleid", linking the two. If a user bookmarks 10 articles, this table will have 10 records, with a different articleid each time.
I'm making a table (with MySQL) to store some data, but I'm not sure how to do it properly because of the amount of data. For example, say it's an address book database.
So there is a table for users and a table for contacts. Each user can own hundreds of contacts, and there could be thousands of users. Should I add a new row for every single contact (it will make a lot of rows!), or can I just concatenate all of them in one row with the user id?
This is just an example, but in my case, once contacts are INSERTED they will never be UPDATED - no modifications, they can only be DELETED.
To go by the normal forms, you should have three tables
1) Users -> {User_id} (primary key)
2) Contacts -> {Contact_id} (primary key)
3) Users_Contacts -> {User_id, Contact_id} (Compound key)
The Junction table Users_Contacts will have one record per contact - meaning for each unique value of User_id+Contact_id, there will be one record.
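A sketch of that junction table (assuming the Users and Contacts tables already exist with those primary keys):
CREATE TABLE Users_Contacts (
  User_id INT NOT NULL,
  Contact_id INT NOT NULL,
  PRIMARY KEY (User_id, Contact_id),                        -- the compound key
  FOREIGN KEY (User_id) REFERENCES Users (User_id),
  FOREIGN KEY (Contact_id) REFERENCES Contacts (Contact_id)
);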
However, in practice, it is not always necessary to stick to the rule book. Depending on the use case, it is often advisable to have a denormalized table. The call is yours.
There is also the option of using NoSQL-style storage within MySQL. For example, the contacts can be serialized into JSON and stored. MySQL 5.7 supports this with a native JSON data type. See this for details.
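As a rough sketch of that option (assuming MySQL 5.7 or later, with made-up table and column names):
CREATE TABLE user_contacts_json (
  user_id INT PRIMARY KEY,
  contacts JSON  -- e.g. '[{"name": "Alice", "phone": "555-0100"}, ...]'
);
SELECT JSON_EXTRACT(contacts, '$[0].name') FROM user_contacts_json WHERE user_id = 1;  -- name of the first stored contact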
Say, for example, you add 3 contacts for a single user. Since, as you mentioned, you will be deleting contacts, it is better to insert all three contacts, each in a new row with its user id, because then deleting any one of the three is easy.
If you concatenate all the contacts for a user into one row, you could run into many issues. What if in the future the requirements change and you need to build a layout showing all the contacts for a user, with the ability to edit/delete individual contacts? So you should have one contact in each row.
You can optimize your query by indexing the columns.
Say user #1234 has 1000 contacts in the contact table, where the primary key is idcontact (indexed by default), and the contact table has another field called "iduser" which is also indexed; then SELECTs filtering on iduser against the contact table will be fast.
This is the standard approach with a MySQL database. There are many apps that maintain millions of rows this way, so it will be fine with a contact table and a new row for each contact.
I wouldn't worry about lots of rows. You have to keep in mind the granularity of control the user would expect (deleting/adding a contact, rearranging the list based on different factors, etc.). It's always better to break things out into their own rows if they are going to be treated independently from similar items (contacts, users, addresses, etc.). Additionally, if you were to concatenate your data, re-ordering for display or removing data becomes extremely resource intensive, whereas MySQL is designed to do exactly that "on the cheap".
MySQL can easily handle millions of rows of data. If you are worried about speed, just make sure your indexes are in place before your data collection gets too big (I would venture a guess and say you'll need to index the user ID the contact belongs to and the first/last names). Indexes are a double-edged sword, however: they take up disk space, but allow fast querying of large data sets. So you don't want to go overboard and index everything, only what you'll be sorting/searching by.
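A sketch of those indexes, with the table and column names assumed from the discussion above:
CREATE INDEX idx_contact_user ON contact (iduser);             -- for "all contacts of user X" lookups
CREATE INDEX idx_contact_name ON contact (last_name, first_name);  -- for sorting/searching by name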
(Why on earth will contacts never be updated?...)
I'm working on an app in JavaScript, jQuery, PHP & MySQL that consists of ~100 lessons. I am trying to think of an efficient way to store the status of each user's progress through the lessons, without having to query the MySQL database too much.
Right now, I am thinking the easiest implementation is to create a table for each user, and then store each lesson's status in that table. The only problem with that is if I add new lessons, I would have to update every user's table.
The second implementation I considered would be to store each lesson as a table, and record the user ID for each user that completed that lesson there - but then generating a status report (what lessons a user completed, how well they did, etc.) would mean pulling data from 100 tables.
Is there an obvious solution I am missing? How would you store your users' progress through 100 lessons, so it's quick and simple to generate a status report showing their progress?
Cheers!
The table structure I would recommend would be to keep a single table with non-unique fields userid and lessonid, as well as the relevant progress fields. When you want the progress of user x on lesson y, you would do this:
SELECT * FROM lessonProgress WHERE userid=x AND lessonid=y LIMIT 1;
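The backing table might look something like this sketch (the progress columns beyond userid and lessonid are assumptions, and one reasonable choice is a compound primary key on the pair):
CREATE TABLE lessonProgress (
  userid INT NOT NULL,
  lessonid INT NOT NULL,
  status TINYINT NOT NULL DEFAULT 0,  -- e.g. 0 = not started, 1 = in progress, 2 = complete
  score INT NULL,                     -- hypothetical extra progress field
  PRIMARY KEY (userid, lessonid)      -- one row per user per lesson, fast lookups either way
);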
You don't need to worry about performance unless you see that it's actually an issue. Having a table for each user or a table for each lesson is a bad solution, because databases aren't meant to have a dynamic number of tables.
If reporting is restricted to one user at a time - that is, when generating a report, it's for a specific user and not a large clump of users - why not consider JavaScript Object Notation stored in a file? If extensibility is key, it would make it a simple matter.
Obviously, if you're going to run reports against an arbitrarily large number of users at once, separate data files would become inefficient.
Discarding the efficiency argument, JSON would also give you a very human-readable and interchangeable format.
Lastly, if the security of the report output isn't a big sticking point, you'd also gain the ability to easily offload view rendering onto the client.
Use relations between 2 tables. One for users, with user-specific columns like ID, username, email, and whatever else you want to store about them.
Then a status table that has a UID foreign key: ID, UID, Status, etc.
It's good to keep datecreated and dateupdated on tables as well.
Then just join the tables ON status.UID = users.ID
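A sketch of that join (the selected columns are taken from the names suggested above):
SELECT u.id, u.username, s.status, s.dateupdated
FROM users u
JOIN status s ON s.UID = u.id
WHERE u.id = 1;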
A good option would be to create one table with user_ID as the primary key and a status (int); each row of the table will represent a user. Accessing a user's progress would be fast and simple since you have an index on user IDs.
This way, adding new lessons would not require you to change the DB.
I'm used to building websites with user accounts, so I can simply auto-increment the user id, then let them log in while I identify that user by user id internally. What I need to do in this case is a bit different. I need to anonymously collect a few rows of data from people, and tie those rows together so I can easily discern which data rows belong to which user.
The difficulty I'm having is in generating the id to tie the data rows together. My first thought was to poll the database for the highest user ID in existence, and write to the database with user ID +1. This will fail, however, if two submissions poll the database before either of them writes to it - they will each share the same user ID.
Another thought I had was to create a separate user ID table that would be set to auto-increment, and simply generate a new row, then poll that table for the id of the last row created. That also fails for the same reason as above - if two submissions create a row before either of them polls for the latest user ID, then they'll end up sharing an ID.
Any ideas? I get the impression I'm missing something obvious.
I think I'm understanding you right; I was having a similar issue. There's a super handy PHP function, though. After you run the query that inserts a new row and auto-increments their user ID, do:
$user_id = mysql_insert_id();
That just returns the auto-increment value from the previous query on the current mysql connection. You can read more about it here if you need to.
You can then use this to populate the second table's data, being sure nobody will get a duplicate ID from the first one.
You need to insert the user, get the auto-generated id, and then use that id as a foreign key in the couple of rows you need to associate with the parent record. The hat rack must exist before you can hang hats on it.
This is a common issue, and to solve it you would use a transaction. This gives you atomicity: being able to do more than one thing, but have it tied to either success or failure as a package. It's an advanced DB feature, and it does require awareness of some more advanced programming in order to implement it in as fault-tolerant a manner as possible.
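A minimal sketch of the whole pattern in MySQL (the table and column names are made up; LAST_INSERT_ID() is the SQL counterpart of mysql_insert_id() and is scoped to the current connection, so concurrent inserts cannot collide):
START TRANSACTION;
INSERT INTO users (created_at) VALUES (NOW());   -- create the anonymous user row
SET @uid = LAST_INSERT_ID();                     -- id generated by this connection's insert
INSERT INTO data_rows (user_id, payload) VALUES (@uid, 'first row');
INSERT INTO data_rows (user_id, payload) VALUES (@uid, 'second row');
COMMIT;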