I am working on a new web app that allows users to "save" pictures they like. I have a big table with many pictures and another table with user information. My question: how can I record the pictures each user "saves"? I mean, what is the proper way to store that information? I thought of making a new table with the user id and the picture id, but I think maybe it is too messy and in the future queries will take too long.
Thank you very much.
I'd introduce only a single association table, not one per user.
A "user" can "save" zero, one or more "picture"
A "picture" can be saved by zero, one or more "user"
We introduce a third table, call it "user_picture" or "picture_user", or "saved_picture", (it's just a table name; but it should just make "sense" to someone looking at the model.)
That table will have two foreign keys:
user_id REFERENCES user(id)
picture_id REFERENCES picture(id)
The combination of these two columns can serve as the PRIMARY KEY.
PRIMARY KEY (user_id, picture_id)
To get the saved pictures for a user:
SELECT p.*
FROM picture p
JOIN saved_picture s
ON s.picture_id = p.id
JOIN user u
ON u.id = s.user_id
WHERE u.username = 'foo'
With suitable indexes available, selecting a small subset of rows from large tables should still be very efficient.
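As a rough, runnable sketch of the model above: the snippet below builds the three tables in an in-memory SQLite database through Python's sqlite3 module and runs the "saved pictures for a user" query. SQLite, the title and username columns, the index name, the sample rows and the helper functions are all assumptions made for illustration; the DDL for your own database may differ slightly.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE user    (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
    CREATE TABLE picture (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE saved_picture (
        user_id    INTEGER NOT NULL REFERENCES user(id),
        picture_id INTEGER NOT NULL REFERENCES picture(id),
        PRIMARY KEY (user_id, picture_id)
    );
    -- The primary key covers lookups by user; this index covers lookups by picture.
    CREATE INDEX saved_picture_by_picture ON saved_picture (picture_id);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO user (id, username) VALUES (?, ?)",
                 [(1, "foo"), (2, "bar")])
conn.executemany("INSERT INTO picture (id, title) VALUES (?, ?)",
                 [(10, "sunset"), (11, "harbor"), (12, "forest")])
conn.executemany("INSERT INTO saved_picture (user_id, picture_id) VALUES (?, ?)",
                 [(1, 10), (1, 12), (2, 10)])
conn.commit()


def saved_pictures(conn, username):
    """Return (id, title) of every picture the named user has saved."""
    return conn.execute("""
        SELECT p.id, p.title
          FROM picture p
          JOIN saved_picture s ON s.picture_id = p.id
          JOIN user u          ON u.id = s.user_id
         WHERE u.username = ?
         ORDER BY p.id
    """, (username,)).fetchall()


def most_saved(conn):
    """Return (picture_id, save_count), most-saved pictures first."""
    return conn.execute("""
        SELECT s.picture_id, COUNT(*) AS save_count
          FROM saved_picture s
         GROUP BY s.picture_id
         ORDER BY save_count DESC, s.picture_id
    """).fetchall()
```

The username is passed as a bound parameter rather than pasted into the SQL string, which is what you would want in the web app as well.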
This design makes it easy to answer some questions, such as "which pictures are the 'most' saved?"
SELECT s.picture_id
, COUNT(1) AS save_count
FROM saved_picture s
GROUP BY s.picture_id
ORDER BY COUNT(1) DESC
On very large tables, this can crank a while, so this is where having simple, short, surrogate primary keys really helps.
Compare this to the query (or queries) that would be required to answer that same question if you had a separate "save" table for each user; consider the number of tables that would need to be queried.
If you start adding attributes to the saved_picture table (e.g. date_saved), you may consider adding a surrogate primary key on the table, and using a UNIQUE constraint on (user_id,picture_id).
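A minimal sketch of that variant, again using in-memory SQLite from Python as an assumed stand-in for the real database. The save_picture helper is hypothetical, and the foreign keys are left out only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE saved_picture (
        id         INTEGER PRIMARY KEY,                   -- surrogate key
        user_id    INTEGER NOT NULL,
        picture_id INTEGER NOT NULL,
        date_saved TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (user_id, picture_id)                      -- still one row per pair
    );
""")


def save_picture(conn, user_id, picture_id):
    """Record a save; return False if this user already saved this picture."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO saved_picture (user_id, picture_id) VALUES (?, ?)",
                (user_id, picture_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate (user_id, picture_id) pair.
        return False
```

The UNIQUE constraint does the job the composite primary key did before, so a duplicate save is still rejected by the database rather than by application code.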
You can do either: make one table the way you mentioned, or create a separate table for each user (which makes it easier to display the images saved by a given user).
You can use a naming scheme for the tables such as <user_id>_savedimages.
For example, you could have a table 21_savedimages for the user with id 21. This makes the task faster and less messy, but it results in a large number of tables in the database.
You can decide for yourself depending on the number of users you have and the average number of images each user saves.
I have a MySQL database that stores user emails and news articles that my service provides. I want users to be able to save/bookmark articles they would like to read later.
My plan for accomplishing this was to have a column, in the table where I store the users' emails, that holds comma-delimited strings of unique IDs, where the unique IDs are values assigned to each article as it is added to the database. These articles are stored in a separate table and I use UUID_SHORT() to generate the unique IDs of type BIGINT.
For example, let's say in the table where I store my articles, I have
ArticleID            OtherColumn
4419350002044764160  other stuff
4419351050184556544  other stuff
In the table where I store user data, I would have
UserEmail            ArticlesSaved                                OtherColumn
examlple1#email.com  4419350002044764160,4419351050184556544,...  other stuff
examlple2#email.com  4419350002044764160,4419351050184556544,...  other stuff
to indicate the first two users have saved the articles with IDs 4419350002044764160 and 4419351050184556544.
Is this a proper way to store something like this on a database? If there is a better method, could someone explain it please?
One other option I was thinking of was having a separate table for each user, where I can store the IDs of the articles they saved in a column, though the answer to this post suggests that this is not very efficient: Database efficiency - table per user vs. table of users
I would suggest one table for the users and one table for their bookmarked articles.
USERS
id - int, auto-increment
user_email - varchar(50)
PREFERENCES
id - int, auto-increment
article_index - (a datatype that fits your structure)
id_user - int
This way it will be easy for a user to bookmark and unbookmark an article. The two tables are connected through id in users and id_user in preferences. Make sure that each row in the preferences/bookmarks table is one article (don't do anything comma separated). Doing it this way will save you much time and many complications - I promise!
A typical query to fetch a user's bookmarked pages would look something like this.
SELECT u.id, p.article_index, p.id_user FROM users u
LEFT JOIN preferences p ON u.id = p.id_user
WHERE u.id = 1 -- user id goes here; make sure it's an int and apply appropriate security to your queries
"Proper" is a squirrely word, but the approach you suggest is pretty flawed. The resulting database no longer satisfies even first normal form, and that predicts practical problems even if you don't immediately see them. Some of the problems you would be likely to encounter are
the number of articles each user can "save" will be limited by the data type of the ArticlesSaved column;
you will have issues around duplicate "saved" article IDs; and
queries about which articles are saved will be more difficult to formulate and will probably run slower; in part because
you cannot meaningfully index the ArticlesSaved column.
The usual way to model a many-to-many relationship (such as between users and articles) is via a separate table. In this case, such a table would have one row for each (user, saved article) pair.
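To make that concrete, here is a small sketch with one row per (user, saved article) pair. It runs against in-memory SQLite through Python's sqlite3 module purely so that it is self-contained; the table name saved_article, the index name and the three helper functions are assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE saved_article (
        user_id    INTEGER NOT NULL,
        article_id INTEGER NOT NULL,          -- holds the BIGINT article IDs
        PRIMARY KEY (user_id, article_id)     -- no duplicate saves possible
    );
    -- Unlike a comma-separated column, article_id can be indexed.
    CREATE INDEX saved_article_by_article ON saved_article (article_id);
""")


def bookmark(conn, user_id, article_id):
    """Save an article for a user; saving it twice is a harmless no-op."""
    with conn:
        # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
        conn.execute("INSERT OR IGNORE INTO saved_article VALUES (?, ?)",
                     (user_id, article_id))


def unbookmark(conn, user_id, article_id):
    """Remove a saved article; only that one row is touched."""
    with conn:
        conn.execute(
            "DELETE FROM saved_article WHERE user_id = ? AND article_id = ?",
            (user_id, article_id),
        )


def saved_by(conn, user_id):
    """Return the IDs of all articles the user has saved."""
    rows = conn.execute(
        "SELECT article_id FROM saved_article WHERE user_id = ? ORDER BY article_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

There is no upper bound on saves per user beyond table size, duplicates are ruled out by the primary key, and "who saved article X" is an ordinary indexed lookup instead of a string search.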
Saving data in CSV format in a database field is (almost) never a good idea. You should have 3 tables:
1 table describing users, with everything that directly concerns the user
1 table describing articles, with the data about each article
1 table with 2 columns, "userid" and "articleid", linking the two. If a user bookmarks 10 articles, this table will have 10 records, each with a different articleid.
I have to create a system to save users' votes for two different types of module: News and Video.
Each table should have the same fields:
id
entry_id
vote
user_id
So I thought of adding a new field to store the name of the module (module) as well; this way I can have just one table in the DB, filter it when needed, and create two views for statistics purposes.
I don't really know whether the best solution is one table with the new field or two different tables.
Let's assume that I have 1000 news items and 1000 users, and every user votes on every news item: I will have 1000000 rows in the table.
Now assume that I also have 1000 videos and that all my users vote on them too: another 1000000 rows, for a total of 2000000 rows in a single table.
Will I have any performance problems in this case? And what if I have many more videos, news items and users?
Operations that I need to perform:
Insert
Update
Search
If you need more information, please ask.
I think the way to answer this question is based on entry_id. The votes are going to be about something and that something is going to reference another table.
So, if you have two separate tables for News and Videos, then you should have two separate votes tables. Neither will have entry_id. One will have news_id and the other video_id.
If you have one table, say Entries for both News and Videos, then have one table.
In other words, I am advising against having one table conditionally reference multiple other tables. It becomes very difficult to express foreign key constraints, for one thing. In addition, join operations are cumbersome to express. Someone else might visit the table and not realize that entry_id can refer to multiple tables, and incorrectly set up queries.
All of these problems can be overcome (and there are situations where one table may be the preferred solution). However, if the original entities are in different tables, then put the votes in different tables.
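A short sketch of the two-table layout, assuming separate news and video tables. It uses in-memory SQLite from Python only to be runnable on its own; the table names, the title columns, the composite primary keys and the vote_on_news helper are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE news  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Each votes table references exactly one parent table, so the
    -- foreign key can be declared and enforced by the database.
    CREATE TABLE news_vote (
        news_id INTEGER NOT NULL REFERENCES news(id),
        user_id INTEGER NOT NULL,
        vote    INTEGER NOT NULL,
        PRIMARY KEY (news_id, user_id)
    );
    CREATE TABLE video_vote (
        video_id INTEGER NOT NULL REFERENCES video(id),
        user_id  INTEGER NOT NULL,
        vote     INTEGER NOT NULL,
        PRIMARY KEY (video_id, user_id)
    );

    INSERT INTO news  (id, title) VALUES (1, 'headline');
    INSERT INTO video (id, title) VALUES (7, 'clip');
""")


def vote_on_news(conn, news_id, user_id, vote):
    """Store a vote on a news item; return False if the database rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO news_vote (news_id, user_id, vote) VALUES (?, ?, ?)",
                (news_id, user_id, vote),
            )
        return True
    except sqlite3.IntegrityError:
        # Either the news item does not exist or this user already voted on it.
        return False
```

With a single votes table and a module column, the database could not tell that id 7 is a video rather than a news item; here the mistake is caught by the foreign key.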
I have 2 tables users and comments.
In terms of performance, should I use a join (on user id) to get the username from the users table, or should I add a username column to comments, so that I won't need a join for only one piece of data (username) but will need to store one more value for each comment?
Is a join slow if there are a lot of comments in the table?
Which one should I do: join, or add a username column to comments?
Thank you.
Join is probably the best so you're not storing data in two places. What if a user name is changed? It won't change the user name for the comments. Also, joining two tables is not very time consuming.
It depends on the number of users and comments. Having a denormalized db, which is what you are asking about, can be faster, but then you need to take care yourself to update the username in both tables. Don't forget to add an index on userid in the comments table if you go the join way.
So the correct answer, I believe, is to go with the join and denormalize later if needed.
If you're using InnoDB, you can add the column and add foreign key restrictions. This will allow you to increase efficiency and worry less about updating indexes.
The one reason why you would store the user name in the comments table is if you wanted to know the user name when the comment was created. If the user name is subsequently changed, you'll still have the name at the time of the comment.
In general, though, you want to use join. You want to have the primary key on a table defined, probably as an auto-incremented (identity) integer value.
If you are concerned about performance for getting all comments for a single user, then you should build an index on the comments table on the user id field.
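A small sketch of the join approach with an index on the user id column. It uses in-memory SQLite through Python's sqlite3 module as an assumed stand-in for MySQL; the column names, the index name, the sample rows and both helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE comments (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body    TEXT NOT NULL
    );
    -- Speeds up "all comments by user X".
    CREATE INDEX comments_user_id ON comments (user_id);

    INSERT INTO users    VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comments VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
""")


def comments_with_usernames(conn):
    """Return (comment id, username, body), looking the username up via a join."""
    return conn.execute("""
        SELECT c.id, u.username, c.body
          FROM comments c
          JOIN users u ON u.id = c.user_id
         ORDER BY c.id
    """).fetchall()


def rename_user(conn, user_id, new_name):
    """Change a username in the one place it is stored."""
    with conn:
        conn.execute("UPDATE users SET username = ? WHERE id = ?",
                     (new_name, user_id))
```

Because the username lives only in users, a rename is a single-row update and every comment shows the new name on the next read.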
Personally, I would rather use two simple, separate queries. I do not like joins all that much. Joins just produce duplicated data by definition. You might want to check http://www.notorm.com/, a simple PHP db access layer that goes the joinless way.
I'm trying to replicate SE's voting system. Users on my website should only be able to vote on a post one time; then they are locked in. I currently have two tables, users and posts.
How should I store information on which posts a user has voted on? I was thinking of having a column in posts which would store the uids of users which have voted on it. Another idea would have a column in users with the ids of the posts he/she has voted on.
How should I do this? I want to take scalability into account, as well as the ease of determining whether or not a vote is valid.
Create another table to store information from the users and posts tables:
CREATE TABLE votes (
user_id INT
, post_id INT
, PRIMARY KEY (user_id, post_id)
);
With this approach:
"I was thinking of having a column in posts which would store the uids of users which have voted on it. Another idea would have a column in users with the ids of the posts he/she has voted on."
Unless you store the values as delimited values (to fit in one cell) or JSON, you'll end up with many rows for just one post. But then, that's a bad approach to start with.
Stick with creating a new table which contains the relationship determining "voting". The table is simple enough to check:
SELECT COUNT(t1.post_id) AS vote_count
FROM votes AS t1
WHERE
t1.user_id = SOME_INTEGER
AND t1.post_id = SOME_INTEGER
Best practice, for something the size of stackoverflow, is to store the individual votes in a separate table. But also keep a derived vote count in a field directly attached to the post. As database sizes grow, it'll become prohibitively expensive to sum up all the votes on every viewing of a post.
Keeping this derived field is relatively easy by use of triggers on this user/votes table.
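One way the trigger idea could look, sketched against in-memory SQLite from Python so it runs on its own. The vote_count column, the trigger names and the helper functions are assumptions, and trigger syntax differs somewhat between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id         INTEGER PRIMARY KEY,
        vote_count INTEGER NOT NULL DEFAULT 0   -- derived from the votes table
    );
    CREATE TABLE votes (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        PRIMARY KEY (user_id, post_id)
    );

    -- Keep the derived count in step with the individual vote rows.
    CREATE TRIGGER votes_after_insert AFTER INSERT ON votes
    BEGIN
        UPDATE posts SET vote_count = vote_count + 1 WHERE id = NEW.post_id;
    END;
    CREATE TRIGGER votes_after_delete AFTER DELETE ON votes
    BEGIN
        UPDATE posts SET vote_count = vote_count - 1 WHERE id = OLD.post_id;
    END;

    INSERT INTO posts (id) VALUES (100);
""")


def cast_vote(conn, user_id, post_id):
    with conn:
        conn.execute("INSERT INTO votes (user_id, post_id) VALUES (?, ?)",
                     (user_id, post_id))


def retract_vote(conn, user_id, post_id):
    with conn:
        conn.execute("DELETE FROM votes WHERE user_id = ? AND post_id = ?",
                     (user_id, post_id))


def stored_count(conn, post_id):
    """Read the derived count directly from the post row (no aggregation)."""
    return conn.execute("SELECT vote_count FROM posts WHERE id = ?",
                        (post_id,)).fetchone()[0]
```

Displaying a post then reads one integer from its own row, while the votes table remains the source of truth for who voted.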
I'd like to add a like/dislike or upvote/downvote-type feature to each one of the posts in the forum script I'm writing (much like the one here on SO). I'm having two difficulties trying to figure out how it can be done:
1) I can't figure out a db schema that'd do it efficiently. I could use a separate `likeordislike` table to make a relation between user and post (xyz likes post #123), or I can use a column of type 'text' in the `posts` table listing all the users who have liked (or disliked) the post. The latter of course means I'd have to parse the field for userIDs to make any use of it.
2) Make sure the user doesn't get to like/dislike a post twice.
It's probably trivial but I can only think of ways that make a lot of mysql calls on server side processes. Thanks.
Make a separate table in which you keep track of who likes something and who doesn't. That table will be used to check if a user already did something, so you can prevent him doing it twice.
Then add another field (if you will have votes) or two (if you will have likes/dislikes) to the posts table, in which you store the total number of likes/dislikes or the score, so you don't have to calculate it on the fly every time you display the post. And you will, of course, update this column (or columns) when somebody votes on the post.
And don't bother disabling the vote link. Just check if the user has already voted when he clicks on the link and deny him the vote if he already cast one.
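The check-on-click idea can be sketched as follows, using in-memory SQLite through Python's sqlite3 module as an assumed stand-in for MySQL. The post_votes table, the likes and dislikes columns and the vote helper are hypothetical names. Instead of a separate "has this user voted?" query, the sketch lets the primary key reject the second vote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id       INTEGER PRIMARY KEY,
        likes    INTEGER NOT NULL DEFAULT 0,   -- stored totals, updated on each vote
        dislikes INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE post_votes (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        liked   INTEGER NOT NULL CHECK (liked IN (0, 1)),
        PRIMARY KEY (user_id, post_id)         -- one vote per user per post
    );
    INSERT INTO posts (id) VALUES (123);
""")


def vote(conn, user_id, post_id, liked):
    """Try to record a like (True) or dislike (False); return False if denied."""
    column = "likes" if liked else "dislikes"  # fixed set of names, never user input
    try:
        with conn:  # one transaction: both statements commit or neither does
            conn.execute(
                "INSERT INTO post_votes (user_id, post_id, liked) VALUES (?, ?, ?)",
                (user_id, post_id, int(liked)),
            )
            conn.execute(
                f"UPDATE posts SET {column} = {column} + 1 WHERE id = ?",
                (post_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # The user already voted on this post, so the vote is denied.
        return False
```

That is one statement pair per click and no extra round trip to check first, which addresses the concern about making a lot of MySQL calls.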
(Similar answer to Jan Hančič here, but I decided my take on the ratings was different enough...)
Your initial thought of a separate table to store likes/dislikes is absolutely the way I would go. I'd put indexes on the two main columns (user ID and post ID), which are critical to the next bit.
For example:
create table users (
userId varchar(254) not null,
-- ...
primary key (userId)
)
ENGINE=...;
create table posts (
postId int(11) not null,
title varchar(254) not null,
content text not null,
-- ...
primary key (postId)
)
ENGINE=...;
create table userPostRatings (
userId varchar(254) not null,
postId int(11) not null,
rating int(2) not null,
-- ...
)
ENGINE=...;
create index userPostRatings_userId on userPostRatings(userId);
create index userPostRatings_postId on userPostRatings(postId);
I'd then use a joined query (whether in a stored procedure, in the code, or in a view) to get the post information, e.g.:
select p.postId, p.title, p.content, avg(r.rating)
from posts p
left outer join userPostRatings r on p.postId = r.postId
where p.postId = ?
group by p.postId;
(That will return NULL for the average for any posts that don't have any ratings yet.)
Although this is a join, because of the indexes on the userPostRatings table it's a fairly efficient one.
If you find that the join is killing you (very high concurrency sites do), then you may want to de-normalize a bit in the way Jan suggested, by adding an average rating column to posts and keeping it updated. You'd just change your query to use it. But I wouldn't start out denormalized, it's more code and arguably premature optimisation. Your database is there to keep track of this information; as soon as you duplicate the information it's tracking, you're introducing maintenance and synchronisation issues. Those may be justified in the end, but I wouldn't start out assuming my DB couldn't help me with this. Making the adjustment (if you plan ahead for it) if the join is a problem in your particular situation isn't a big deal.