I have a table where I store user-uploaded files. There can be 5 different file types: profile picture, cpr file, degree file, video file, background check file.
Table structure is this:
file_id, user_id, file_type, file_size, file_name, file_new_name, file_path, file_cat, date_created
My questions:
Is this structure efficient or should I create 5 different tables?
If I would like to update, let's say, the user's profile picture row, what would be the best way to do it? I came up with a solution that is probably not the best one: I update the row where file_cat = "profile_picture" and user_id = :user_id. Would that put a lot of load on the system?
At first, when a user signs up, he doesn't have any files. Should I use insert into ... VALUES ... on duplicate key update with a hidden value in a form?
Thank you in advance.
This is three questions not one.
Is this structure efficient or should I create 5 different tables?
One table is good enough
If I would like to update, let's say, the user's profile picture row, what
would be the best way to do it? I came up with a solution that is
probably not the best one: I update the row where file_cat =
"profile_picture" and user_id = :user_id. Would that put a lot of load on the system?
Not if you have an index on file_cat, user_id (a composite index on both fields). If you want to make things a bit leaner, you can store integer constants instead of 'profile_picture' etc., e.g.
profile_picture = 1
cpr = 2
....
background = 5
This would make the tables and indexes a bit smaller. It might make the queries slightly faster.
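To make the composite index and the integer constants concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs on its own; on MySQL the CREATE INDEX and UPDATE statements would look essentially the same. The table name files, the index name, the constant names and the sample rows are assumptions made for illustration, not taken from the question.

```python
import sqlite3

# Hypothetical integer codes replacing the file_cat strings.
PROFILE_PICTURE = 1
CPR = 2

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        file_id   INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        file_cat  INTEGER NOT NULL,
        file_name TEXT NOT NULL
    )
    """
)
# Composite index covering both columns used in the WHERE clause.
conn.execute("CREATE INDEX idx_files_cat_user ON files (file_cat, user_id)")
conn.executemany(
    "INSERT INTO files (user_id, file_cat, file_name) VALUES (?, ?, ?)",
    [(7, PROFILE_PICTURE, "old.png"), (7, CPR, "cpr.pdf"), (8, PROFILE_PICTURE, "other.png")],
)


def replace_file(conn, user_id, file_cat, new_name):
    """Update one user's file of one category; returns the number of rows changed."""
    cur = conn.execute(
        "UPDATE files SET file_name = ? WHERE file_cat = ? AND user_id = ?",
        (new_name, file_cat, user_id),
    )
    return cur.rowcount


updated = replace_file(conn, 7, PROFILE_PICTURE, "new.png")
```

Only the row matching both the category and the user is touched, and the index lets the database find it without scanning the table.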
At first, when a user signs up, he doesn't have any files. Should I use
insert into ... VALUES ... on duplicate key update with a hidden value
in a form?
No need for that. Not having a record for new users actually makes things easier. You can do a COUNT(*) = 0 query or, better still, an EXISTS query without having to fetch rows and examine them.
Update:
These EXISTS queries are really useful when you are dealing with JOINs or subqueries, for example to quickly find the users who have uploaded a profile picture:
SELECT * FROM users WHERE EXISTS (SELECT * FROM pictures WHERE pictures.user_id = users.id)
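For the single-user case ("does this user have a profile picture yet?"), the same idea can be sketched as a small helper. This is only an illustration using Python's sqlite3 module so it can be run as-is; the table name files and the function name has_file are assumptions, and SELECT EXISTS (...) is also accepted by MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (file_id INTEGER PRIMARY KEY, user_id INTEGER, file_cat TEXT)"
)
conn.execute("INSERT INTO files (user_id, file_cat) VALUES (1, 'profile_picture')")


def has_file(conn, user_id, file_cat):
    """Return True if the user has at least one file of this category."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM files WHERE user_id = ? AND file_cat = ?)",
        (user_id, file_cat),
    ).fetchone()
    # EXISTS yields 1 or 0, so no rows have to be fetched and inspected.
    return bool(row[0])
```

A brand-new user simply has no rows, so has_file returns False and the application can INSERT instead of UPDATE.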
If you use the primary key properly then your insert ... on duplicate key update ... query will do everything for you.
For your table you need to define a primary key column. In this case I would say it is your file_id column. So when you do your insert, the MySQL server will check whether a row with that file_id value already exists. If so, it will update that row with the new values; otherwise it will add a new row of data with the new file_id.
It should be easy enough to separate the two, though: make one script for creating new rows and another for updating. Usually you will know when you are creating as opposed to updating in an application. Again, using a primary key correctly will help you out a lot. I am pretty sure that using a primary key in your WHERE clause is one of the most efficient ways to update.
https://dev.mysql.com/doc/refman/5.5/en/optimizing-primary-keys.html
Related
I have this MySQL table, where the column contact_id is unique for each user_id.
history:
- hist_id: int(11) auto_increment primary key
- user_id: int(11)
- contact_id: int(11)
- name: varchar(50)
- phone: varchar(30)
From time to time, the server will receive a new list of contacts for a specific user_id and needs to update this table, inserting, deleting or updating data that is different from the previous information.
For example, the current data is:
Then the server receives this data:
And the new data is:
As you can see, the first row (John) was updated, the second row (Mary) was deleted and a new row (Jeniffer) was added.
Today what I am doing is deleting all rows with a specific user_id, and inserting the new data. But the autoincrement field (hist_id) is getting bigger and bigger...
Note: The table has about 80 thousand records, and this update will occur 30 times a day or more.
I have some (related) questions:
1. In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
2. What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
3. Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
PS. By better approach I mean the most efficient way.
Thank you so much for any help!
In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
Short Answer
No. You should be taking advantage of 'upsert', which is short for 'insert on duplicate key update'. What this means is that if the key pair you're inserting already exists, the specified columns are updated with the specified data. You then shorten your logic and reduce increments. Here's an example, using your table structure, that should work. This assumes that you have a unique key on the combination of the user_id and contact_id fields.
INSERT INTO history (user_id, contact_id, name, phone)
VALUES
(1, 23, 'James Jr.', '(619)-543-6222')
ON DUPLICATE KEY UPDATE
name=VALUES(name),
phone=VALUES(phone);
This query should retain the contact_id but overwrite the preexisting data with the new data.
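Here is a runnable sketch of the whole sync (update changed contacts, add new ones, drop the ones that disappeared). It uses Python's sqlite3 module so it can be tried without a MySQL server, which means the upsert is written in SQLite's syntax (ON CONFLICT ... DO UPDATE, available from SQLite 3.24) rather than MySQL's ON DUPLICATE KEY UPDATE shown above. The function name sync_contacts, the DELETE step and the sample contacts are my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE history (
        hist_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id    INTEGER NOT NULL,
        contact_id INTEGER NOT NULL,
        name       TEXT,
        phone      TEXT,
        UNIQUE (user_id, contact_id)  -- the key pair the upsert relies on
    )
    """
)


def sync_contacts(conn, user_id, contacts):
    """contacts: list of (contact_id, name, phone) tuples for one user."""
    with conn:  # run both steps in one transaction
        # Insert new contacts and overwrite existing ones in place.
        conn.executemany(
            """
            INSERT INTO history (user_id, contact_id, name, phone)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (user_id, contact_id) DO UPDATE SET
                name = excluded.name,
                phone = excluded.phone
            """,
            [(user_id, cid, name, phone) for cid, name, phone in contacts],
        )
        # Remove contacts that are no longer in the received list.
        keep_ids = [cid for cid, _, _ in contacts]
        placeholders = ",".join("?" * len(keep_ids))
        conn.execute(
            "DELETE FROM history WHERE user_id = ? "
            f"AND contact_id NOT IN ({placeholders})",
            [user_id, *keep_ids],
        )


# Hypothetical sample data.
sync_contacts(conn, 1, [(23, "James", "111"), (24, "Mary", "222")])
sync_contacts(conn, 1, [(23, "James Jr.", "(619)-543-6222"), (25, "Jennifer", "333")])
rows = conn.execute(
    "SELECT contact_id, name FROM history WHERE user_id = 1 ORDER BY contact_id"
).fetchall()
```

The row for contact 23 keeps its original hist_id because it is updated rather than deleted and re-inserted.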
What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
Primary keys do not imply auto-incremented values. I could have a varchar field as the primary key containing names of fruits and vegetables. Is this optimized for performance? Probably not. There are many situations that might call for auto increment, and there are definite reasons to avoid it. It all depends on how you wish to access the data and how this can impact future expansion. In your situation, I would start over on the table structure and rethink how you wish to store and access the data. Do you want to write more logic to control the data OR do you want the data to flow naturally by itself? At first glance, you've made a history table that is functioning more like a hybrid many-to-one crosswalk. Without looking at the remaining table structure, I can't necessarily say on a whim that it's not a good idea. What I can say is that I would do this a bit differently. I will answer this more specifically in the next question.
Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
I would avoid looping through the data in order to update it. That is a job for SQL, and it does this job well. Sometimes we might find ourselves in a situation where we must do this, either to extract data in a specific format or to repair data in some way. However, avoid doing this for inserting or updating the data. It can negatively impact performance and you will likely paint yourself into a corner.
Back to what I said toward the end of your second question, which will help you see what I am talking about. I am going to assume that user_id is a primary key that is auto-incremented in your user table. I will do some guesstimation here and show you an example of how you can redesign your user, contact and phone number structure. The following is a quick model I threw together that shows the foreign key relationship between the tables.
Note: The column names and overall data arrangement could be done differently but I did this quickly to give you a decent example of a normalized database structure. All of the foreign keys have a structural layout which separates your data in a way that enables you to control the flow of data as it enters and leaves your system. Here's the screenshot of the database model I threw together using MySQL Workbench.
(source: xonos.net)
Here's the SQL so that you can look at it more closely.
You'll notice that the "person" table is extracted from users but shares data with contacts. This enables you to store all "people" in one place, all "users" in another and all "contacts" in another. Now, why would we do this? The number one reason can be explained in two scenarios.
1.) Say we have someone, in this example I'll call him "Jim Bean". "Jim Bean" works for the company, so he is a user of the system. But, "Jim Bean" happens to own a side business and does contact work for the company at the same time. So, he is both a contact and a user of the system. In a more "flat table" environment, we would have two records for Jim Bean that contain the same data which could become outdated or incorrect, quickly.
2.) Let's say that Jim did some bad things and the company wants nothing to do with him anymore. They don't want any record of him, as if he never existed. All that we have to do is delete Jim Bean from the Person table. That's it. Since the foreign key relationship has "CASCADE" on update/delete, this automatically propagates and clears out the rows related to him in the other tables.
I highly recommend that you do some reading on normalized data structure. It has saved me many hours once I got the hang of it and I will never go back.
I have a MySQL database that stores user emails and news articles that my service provides. I want users to be able to save/bookmark articles they would like to read later.
My plan for accomplishing this was to have a column, in the table where I store the users' emails, that holds comma-delimited strings of unique IDs, where the unique IDs are values assigned to each article as they are added into the database. These articles are stored in a separate table and I use UUID_SHORT() to generate the unique IDs of type BIGINT.
For example, let's say in the table where I store my articles, I have
ArticleID             OtherColumn
4419350002044764160   other stuff
4419351050184556544   other stuff
In the table where I store user data, I would have
UserEmail            ArticlesSaved                                 OtherColumn
example1@email.com   4419350002044764160,4419351050184556544,...   other stuff
example2@email.com   4419350002044764160,4419351050184556544,...   other stuff
to indicate the first two users have saved the articles with IDs 4419350002044764160 and 4419351050184556544.
Is this a proper way to store something like this on a database? If there is a better method, could someone explain it please?
One other option I was thinking of was having a separate table for each user where I can store the IDs of the articles they saved in a column, though the answer to this post says that this is not very efficient: Database efficiency - table per user vs. table of users
I would suggest one table for the users and one table for their bookmarked articles.
USERS
id - int autoincrement
user_email - varchar(50)
PREFERENCES
id - int autoincrement
article_index - (datatype that you find accurate according to your structure)
id_user - int
This way it will be easy for a user to bookmark and unbookmark an article. The two tables are connected through id in users and id_user in preferences. Make sure that each row in the preferences/bookmarks table is one article (don't do anything comma-separated). Doing it this way will save you much time and many complications, I promise!
A typical query to fetch a user's bookmarked pages would look something like this.
SELECT u.id, p.article_index, p.id_user FROM users u
LEFT JOIN preferences p ON u.id = p.id_user
WHERE u.id = 1 -- user id goes here; make sure it's an int and apply appropriate security to your queries.
"Proper" is a squirrely word, but the approach you suggest is pretty flawed. The resulting database no longer satisfies even first normal form, and that predicts practical problems even if you don't immediately see them. Some of the problems you would be likely to encounter are
the number of articles each user can "save" will be limited by the data type of the ArticlesSaved column;
you will have issues around duplicate "saved" article IDs; and
queries about which articles are saved will be more difficult to formulate and will probably run slower; in part because
you cannot meaningfully index the ArticlesSaved column.
The usual way to model a many-to-many relationship (such as between users and articles) is via a separate table. In this case, such a table would have one row for each (user, saved article) pair.
Saving data in CSV format in a database field is (almost) never a good idea. You should have 3 tables:
1 table describing users with everything concerning directly the user
1 table describing articles with data about it
1 table with 2 columns "userid" and "articleid" linking both. If a user bookmarks 10 articles, this table will have 10 records with a different articleid each time.
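A minimal sketch of that three-table layout, written with Python's sqlite3 module so it can be run directly (the DDL carries over to MySQL with minor type changes, e.g. BIGINT for the article IDs). The table name bookmarks and the helper names are assumptions for illustration; the composite primary key is what rules out duplicate saves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        userid INTEGER PRIMARY KEY,
        email  TEXT NOT NULL UNIQUE
    );
    CREATE TABLE articles (
        articleid INTEGER PRIMARY KEY,
        title     TEXT NOT NULL
    );
    -- One row per (user, saved article) pair.
    CREATE TABLE bookmarks (
        userid    INTEGER NOT NULL REFERENCES users (userid),
        articleid INTEGER NOT NULL REFERENCES articles (articleid),
        PRIMARY KEY (userid, articleid)
    );
    """
)


def save_article(conn, userid, articleid):
    """Bookmark an article; saving the same article twice is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO bookmarks (userid, articleid) VALUES (?, ?)",
        (userid, articleid),
    )


def unsave_article(conn, userid, articleid):
    conn.execute(
        "DELETE FROM bookmarks WHERE userid = ? AND articleid = ?",
        (userid, articleid),
    )


def saved_articles(conn, userid):
    """Return the IDs of all articles the user has saved."""
    rows = conn.execute(
        "SELECT articleid FROM bookmarks WHERE userid = ? ORDER BY articleid",
        (userid,),
    )
    return [r[0] for r in rows]


# Hypothetical sample data reusing the IDs from the question.
conn.execute("INSERT INTO users (userid, email) VALUES (1, 'example1@email.com')")
conn.executemany(
    "INSERT INTO articles (articleid, title) VALUES (?, ?)",
    [(4419350002044764160, "first"), (4419351050184556544, "second")],
)
save_article(conn, 1, 4419350002044764160)
save_article(conn, 1, 4419350002044764160)  # duplicate save is ignored
save_article(conn, 1, 4419351050184556544)
```

Bookmarking and unbookmarking become a single INSERT or DELETE, and there is no string parsing anywhere.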
I am storing user ID values in a table field separated by a | (user_id1|user_id2|user_id3|user_id17).
A user ID will be added and removed from this field at certain points.
How can I check if the current user's ID exists in the field or not using a query?
And it of course needs to be an exact match. Can't look for user_id1 and find user_id17.
I know I could use a SELECT query, explode the field, then use in_array but if there's a way to do it using a query it'd be better.
I guess I'll explain what I am doing: I made a forum for a small private website (7 users), but coding it for larger scale.
My table structure is pretty good: forum_categories, forum_topics, forum_posts. Using foreign keys between the tables for delete and update queries.
What I am seeking help on is how to mark topics as unread for each user. I could create a new table with topic_id and user_id, each pair being a new row, but that wouldn't be good with a lot of users and topics.
If somebody has a better solution I am all for it. Or can prove to me that 1 row per user_id is the best way then I'll be more than willing to do that.
I think you want to track read messages, not the other way around. If you tracked unread messages, every time you add a user you would have to add that user to every topic's "unread list".
I looked into SMF like my comment suggested. They are using a separate table to track read messages.
A simple table that holds user_id and topic_id is all you need. When a user reads a topic, make sure there is a row in the table for that user and topic.
Another reason to use a separate table: it's going to be faster to query against 2 int values in the database than to use LIKE '%...%' patterns.
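A small sketch of such a read-tracking table, again using Python's sqlite3 module so it runs standalone. The table name topic_reads and the helper names are assumptions; forum_topics is the table name from the question. A topic counts as unread for a user when no (user_id, topic_id) row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE forum_topics (
        topic_id INTEGER PRIMARY KEY,
        title    TEXT NOT NULL
    );
    -- One row per topic a user has read.
    CREATE TABLE topic_reads (
        user_id  INTEGER NOT NULL,
        topic_id INTEGER NOT NULL REFERENCES forum_topics (topic_id),
        PRIMARY KEY (user_id, topic_id)
    );
    """
)


def mark_read(conn, user_id, topic_id):
    """Make sure a read marker exists; reading a topic twice changes nothing."""
    conn.execute(
        "INSERT OR IGNORE INTO topic_reads (user_id, topic_id) VALUES (?, ?)",
        (user_id, topic_id),
    )


def unread_topics(conn, user_id):
    """Topics with no read marker for this user."""
    rows = conn.execute(
        """
        SELECT t.topic_id
        FROM forum_topics t
        WHERE NOT EXISTS (
            SELECT 1 FROM topic_reads r
            WHERE r.topic_id = t.topic_id AND r.user_id = ?
        )
        ORDER BY t.topic_id
        """,
        (user_id,),
    )
    return [r[0] for r in rows]


# Hypothetical sample data.
conn.executemany(
    "INSERT INTO forum_topics (topic_id, title) VALUES (?, ?)",
    [(1, "Welcome"), (2, "Rules"), (3, "Off-topic")],
)
mark_read(conn, 17, 2)
mark_read(conn, 17, 2)  # second read is ignored
```

New users and new topics need no extra bookkeeping: everything is unread until a row says otherwise, and the exact-match problem (user 1 versus user 17) cannot occur.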
This is about creating a many-to-many table in a relational database. Say, for example, you are enabling users to scrape images from around the web and tag them.
Would it be better to:
1. Check to see if the image is already in the database; if it is, create a link in a relational table, and if it is not, create a new image.
2. Create a unique instance of the image for every user and, when looking to display the most popular images, SELECT and ORDER BY the image with the most duplicates.
I hope this makes sense. Thanks in advance for your help.
I assume you have something equivalent to a USERS table and a PICTURES table. Also a table to break up the many to many relationship. U2P I will call it.
The option you listed as option 1 would seem to be the preferred way. Check to see if the picture is in the DB, if it is get primary key from PICTURES corresponding to it. If not, put the picture in the PICTURES table.
Regardless of whether it is a new image or one that is already in there, you will insert the event into the U2P table. This would reference the USERS primary key and the PICTURES primary key corresponding to the event. You would also record other data, such as the time.
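Roughly, option 1 could look like the sketch below. It uses Python's sqlite3 module so it is runnable on its own; the column names, the unique url column used to recognise an existing picture, and the function names are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pictures (
        picture_id INTEGER PRIMARY KEY,
        url        TEXT NOT NULL UNIQUE
    );
    -- One row per "user tagged picture" event.
    CREATE TABLE u2p (
        user_id    INTEGER NOT NULL,
        picture_id INTEGER NOT NULL REFERENCES pictures (picture_id),
        tagged_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    """
)


def tag_image(conn, user_id, url):
    """Store the picture once, then record the event in U2P."""
    # Does nothing if a picture with this URL is already stored.
    conn.execute("INSERT OR IGNORE INTO pictures (url) VALUES (?)", (url,))
    picture_id = conn.execute(
        "SELECT picture_id FROM pictures WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO u2p (user_id, picture_id) VALUES (?, ?)",
        (user_id, picture_id),
    )
    return picture_id


def most_popular(conn, limit=10):
    """Pictures ordered by how many tagging events reference them."""
    return conn.execute(
        """
        SELECT p.url, COUNT(*) AS tags
        FROM u2p
        JOIN pictures p ON p.picture_id = u2p.picture_id
        GROUP BY p.picture_id
        ORDER BY tags DESC, p.picture_id
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


# Hypothetical sample data.
first = tag_image(conn, 1, "http://example.com/cat.jpg")
second = tag_image(conn, 2, "http://example.com/cat.jpg")
third = tag_image(conn, 2, "http://example.com/dog.jpg")
```

Popularity is then a COUNT over U2P rather than a count of duplicated image rows.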
How can we re-use the deleted id from any MySQL-DB table?
If I want to rollback the deleted ID , can we do it anyhow?
It may be possible by finding the lowest unused ID and forcing it, but it's terribly bad practice, mainly because of referential integrity: It could be, for example, that relationships from other tables point to a deleted record, which would not be recognizable as "deleted" any more if IDs were reused.
Bottom line: Don't do it. It's a really bad idea.
Related reading: Using auto_increment in the MySQL manual
Re your update: Even if you have a legitimate reason to do this, I don't think there is an automatic way to re-use values in an auto_increment field. If at all, you would have to find the lowest unused value (maybe using a stored procedure or an external script) and force that as the ID (if that's even possible).
You shouldn't do it.
Don't think of it as a number at all.
It is not a number. It's a unique identifier. Think of this word: unique. No two records should ever be identified by the same id.
1.
As per the explanation you provided, "@Pekka, I am tracking the INsert Update and delete query...", I assume you somehow just want to put your old data back under the same ID.
In that case you may consider using a delete-flag column in your table.
If the delete flag is set for some row, your program should treat that row as deleted. You can later make it available again by setting the delete flag to false.
A similar way is to move the whole row to some temporary table, from which you can bring it back when required with the same data and ID.
The previous idea (the delete flag) is better, though.
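The delete-flag idea can be sketched like this, using Python's sqlite3 module so the example runs by itself. The table name items, the column name is_deleted and the helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE items (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0  -- the delete flag
    )
    """
)


def soft_delete(conn, item_id):
    """Mark the row as deleted instead of removing it."""
    conn.execute("UPDATE items SET is_deleted = 1 WHERE id = ?", (item_id,))


def restore(conn, item_id):
    """Bring the row back with the same data and the same id."""
    conn.execute("UPDATE items SET is_deleted = 0 WHERE id = ?", (item_id,))


def active_items(conn):
    """What the application should treat as existing rows."""
    return conn.execute(
        "SELECT id, name FROM items WHERE is_deleted = 0 ORDER BY id"
    ).fetchall()


# Hypothetical sample data.
conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])
soft_delete(conn, 2)
after_delete = active_items(conn)
restore(conn, 2)
after_restore = active_items(conn)
```

Every query that reads the table has to filter on the flag, which is the price of keeping the ID stable.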
2.
If this is not what you meant by your explanation, and you want to delete and still use all the values of ID (auto-generated), I have a few ideas you may implement:
- Create a table (IDSTORE) for storing Deleted IDs.
- Create a trigger activated on row delete which will note the ID and store it to the table.
- While inserting take minimum ID from IDSTORE and insert it with that value. If IDSTORE is empty you can pass NULL ID to generate Auto Incremented number.
Of course if you have references / relations (FK) implemented, you manually have to look after it, as your requirement is so.
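For completeness, here is a rough sketch of the IDSTORE steps above, with the caveat from the other answers that reusing IDs is usually a bad idea. It uses Python's sqlite3 module and SQLite trigger syntax so it can be run directly; a MySQL trigger would be written with CREATE TRIGGER ... AFTER DELETE ... FOR EACH ROW instead. The table name items, the trigger name and the function name are assumptions for illustration, and concurrent writers are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE idstore (
        id INTEGER PRIMARY KEY
    );
    -- Note every deleted id so it can be handed out again.
    CREATE TRIGGER items_after_delete AFTER DELETE ON items
    BEGIN
        INSERT INTO idstore (id) VALUES (OLD.id);
    END;
    """
)


def insert_item(conn, name):
    """Insert using the lowest stored id, or an auto-generated one if none is stored."""
    with conn:  # keep the lookup, the cleanup and the insert in one transaction
        reuse_id = conn.execute("SELECT MIN(id) FROM idstore").fetchone()[0]
        if reuse_id is not None:
            conn.execute("DELETE FROM idstore WHERE id = ?", (reuse_id,))
        # Passing NULL (None) as the id lets the database generate one.
        cur = conn.execute(
            "INSERT INTO items (id, name) VALUES (?, ?)", (reuse_id, name)
        )
        return cur.lastrowid


# Hypothetical sample data.
ids = [insert_item(conn, n) for n in ("a", "b", "c")]
conn.execute("DELETE FROM items WHERE id = 2")
reused = insert_item(conn, "d")
fresh = insert_item(conn, "e")
```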
Further Read:
http://www.databasejournal.com/features/mysql/article.php/10897_2201621_3/Deleting-Duplicate-Rows-in-a-MySQL-Database.htm
Here is my case for a MySQL DB:
I had a menu table, and the menu id was being used in a content table as a foreign key. But there was no direct relation between the tables (bad table design, I know, but the project was done by another developer and my client later approached me to handle it). One day my client realised that some of the content was not showing up. I looked at the problem and found that one of the menus had been deleted from the menu table, but luckily its menu id still existed in the content table. I took the deleted menu id from the content table and ran a normal insert query on the menu table with the same menu id along with the other fields (id is the primary key), and it worked.
insert into tbl_menu(id, col1, col2, ...) values(12, val1, val2, ...)